[mvapich-discuss] process limits

Mark Potts potts at hpcapplications.com
Fri Aug 31 23:13:37 EDT 2007


Jonathan,
    That patch seems to solve the non-bash shell problems.  I've asked
    some others to try it for themselves, and if I learn anything new,
    I'll let you know.  But for now, it appears that problem can be
    checked off.
           regards,


Jonathan L. Perkins wrote:
> Mark Potts wrote:
>> DK,
>>    Thanks for the quick reply.
>>
>>    I'm maybe jumping ahead here, but assuming the "timeout" is the
>>    one that I suggested from the 10 July patch (mentioned below),
>>    I have a question related to that patch.
>>    Could you or your team suggest why this patch seems to fail for
>>    users running tcsh (as opposed to bash)?  My initial testing of
>>    this patch, which used bash, found that it worked quite nicely
>>    to clean up jobs when one or more of the processes killed
>>    themselves.  An mpirun_rsh thread detected this condition and
>>    then killed the remaining remote tasks.  However, we now find
>>    that users employing a tcsh login shell are not so lucky.  The
>>    detection part of the patch works and the remote sshd tasks are
>>    killed successfully, but the remote MPI processes continue to
>>    run.  It would be quite interesting if it weren't so bad.  We
>>    can be left with large numbers of CPU-burning tasks that are
>>    difficult to find and kill.
> 
> Mark:
> Attached you will find a patch that should solve the issue with remote 
> processes not being killed when using tcsh.  The problem arose from 
> using the signal name with the kill command.  Since in many cases the 
> kill command is a shell built-in, there can be minor differences in the 
> accepted signal names, such as SIGKILL in bash but just KILL in tcsh.
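> 
> For illustration only (a minimal sketch; exact behavior and error text 
> may vary between shell versions, and the PID is a placeholder):
> 
>     $ kill -SIGKILL 12345     # bash builtin: SIG-prefixed name accepted
>     % kill -SIGKILL 12345     # tcsh builtin: SIG-prefixed name rejected
>     % kill -KILL 12345        # short name accepted by both shells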
> 
> After applying this patch to the MVAPICH source you can simply 
> re-install using the proper make script.  Let me know if you have any 
> further questions regarding this.
> 
> As for your original question, it seems that the problem isn't the 
> timeout while waiting for the other child process to end.  More 
> troublesome is that the child died in the first place.  We have 
> reproduced this issue and are working on finding the best solution to 
> resolve it.  We'll keep you posted.
> 
>>    Shell experts' ideas welcome.
>>
>>            regards,
>>
>> Dhabaleswar Panda wrote:
>>> Hi Mark,
>>>
>>> Thanks for providing us with the details. There appears to be some
>>> `timeout' with the new mpirun_rsh. We are taking a look at it
>>> and will send you a solution soon.
>>>
>>> Thanks,
>>>
>>> DK
>>>
>>> On Tue, 28 Aug 2007, Mark Potts wrote:
>>>
>>>> Hi,
>>>>     I tried VIADEV_USE_SHMEM_COLL=0 and separately tried
>>>>     VIADEV_USE_BLOCKING=1, with no change in results.  During task
>>>>     startup I get "Unable to find child nnnn!", "Child died.
>>>>     Timeout while waiting", and/or simply "done."
>>>>
>>>>     I tried repeatedly but was never able to consistently run more
>>>>     than 10 ranks (-np 10) on a single node.  I am, of course, able
>>>>     to run many more ranks when I spread the targets across more
>>>>     nodes.
>>>>
>>>>    My experiment is to start a very simple code with multiple
>>>>    processes on a single node.  The specific details of my setup on
>>>>    the two machines I used are below; the results were the same:
>>>>
>>>>     Machine  CPUs per  Cores per  Avail   CPU      MVAPICH      MVAPICH
>>>>               Node      CPU       Nodes   Type     Version      Device
>>>>       A         1         2          3    x86_64   0.9.9-1168   ch_gen2
>>>>       B         2         4         16    x86_64   0.9.9-1326   ch_gen2
>>>>
>>>>     The MVAPICH code, which was obtained from the OFED 1.2
>>>>     installation, has two patches applied, as follows:
>>>>      (1) to mpirun_rsh.c, from Sayantan Sur on 10 Jul, for MVAPICH
>>>>          errant process/job cleanup.
>>>>      (2) to comm_free.c, from Amith Rajith Mamidala on 11 Jul, for an
>>>>          MVAPICH segmentation fault during MPI_Finalize() in large
>>>>          jobs.
>>>>
>>>>     Is it possible that the mpirun_rsh.c patch is prematurely killing
>>>>     tasks when it determines that the processes on the oversubscribed
>>>>     node are not responding fast enough?  Or is there another
>>>>     clean explanation?  As I understand the note from DK this morning,
>>>>     oversubscription should work...
>>>>           regards,
>>>>
>>>> amith rajith mamidala wrote:
>>>>> Hi Mark,
>>>>>
>>>>> Can you check if you get this error by setting the environment
>>>>> variable VIADEV_USE_SHMEM_COLL to 0, e.g.:
>>>>>
>>>>>   mpirun_rsh -np N VIADEV_USE_SHMEM_COLL=0 ./a.out
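>>>>>
>>>>> As a concrete illustration (hostnames here are placeholders; on the
>>>>> mpirun_rsh command line the target hosts are listed before any
>>>>> environment variables and the executable):
>>>>>
>>>>>   mpirun_rsh -np 4 node01 node01 node02 node02 \
>>>>>     VIADEV_USE_SHMEM_COLL=0 ./a.out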
>>>>>
>>>>> -thanks,
>>>>> Amith
>>>>>
>>>>> On Tue, 28 Aug 2007, Mark Potts wrote:
>>>>>
>>>>>> Hi,
>>>>>>     Is there an effective or hard limit on the number of MVAPICH
>>>>>>     processes that can be run on a single node?
>>>>>>
>>>>>>     Given N CPUs, each having M cores, on a single node, I've been
>>>>>>     told that one cannot run more than N*M MVAPICH processes on a
>>>>>>     single node.  In fact, if I even try to approach this number
>>>>>>     with "-np 16" (for a node with N=8 and M=4), I observe an
>>>>>>     "unable to find child nnnn!" or "Child died" message.  Is this
>>>>>>     a configuration problem with this system or somehow an
>>>>>>     expected behavior?
>>>>>>
>>>>>>     More pointedly, should oversubscription of cores, np > N*M, on a
>>>>>>     single node work in MVAPICH?  How about in MVAPICH2?
>>>>>>
>>>>>>             regards,
>>
> 
> 

-- 
***********************************
 >> Mark J. Potts, PhD
 >>
 >> HPC Applications Inc.
 >> phone: 410-992-8360 Bus
 >>        410-313-9318 Home
 >>        443-418-4375 Cell
 >> email: potts at hpcapplications.com
 >>        potts at excray.com
***********************************

