[mvapich-discuss] process limits

Jonathan L. Perkins perkinjo at cse.ohio-state.edu
Thu Aug 30 14:37:16 EDT 2007


Mark Potts wrote:
> DK,
>    Thanks for the quick reply.
> 
>    I'm maybe jumping ahead here, but assuming the "timeout" is the
>    one that I suggested from the 10 July patch (mentioned below),
>    I have a question related to that patch.
>    Could you/your people suggest why this patch seems to fail for
>    users running tcsh (as opposed to bash)?  My initial testing of
>    this patch, which was done under bash, found that it worked quite
>    nicely to clean up jobs when one or more of the processes killed
>    themselves.  An mpirun_rsh thread detected this condition and
>    then killed the remaining remote tasks.  However, we now find
>    that users employing tcsh login shell are not so lucky.  The
>    detection part of the patch works successfully and the remote
>    sshd tasks are killed successfully but the remote MPI processes
>    continue to run.  It would be quite interesting if it weren't so
>    bad.  We can be left with large numbers of CPU-burning tasks that
>    are difficult to find and kill.

Mark:
Attached you will find a patch that should solve the issue with remote
processes not being killed when using tcsh.  The problem arose from
passing the signal name to the kill command.  Because kill is in many
cases a shell built-in, the accepted signal names can differ slightly
between shells: bash accepts SIGKILL, while tcsh's built-in accepts
only the unprefixed form, KILL.
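
For illustration, here is a minimal sketch of the shell behavior (not
the patch itself) that you can try from a terminal:

    # bash's built-in kill accepts the signal name with or without the
    # SIG prefix, so both of these kill the spawned subshell:
    bash -c 'kill -SIGKILL $$'
    bash -c 'kill -KILL $$'

    # tcsh's built-in kill rejects the prefixed form:
    tcsh -c 'kill -SIGKILL $$'   # fails with an "Unknown signal" error
    tcsh -c 'kill -KILL $$'      # kills the subshell as expected

    # The unprefixed name (KILL) and the numeric form (kill -9) are
    # accepted by both shells.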

After applying this patch to the MVAPICH source, you can simply
re-install using the appropriate make script; a rough sketch of those
steps is included below.  Let me know if you have any further questions
regarding this.
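
The paths and build script name in this sketch are examples only --
adjust them to wherever your OFED 1.2 MVAPICH source tree lives and to
the build script you normally use for the ch_gen2 device:

    # example locations only -- substitute your actual source directory
    cd /path/to/mvapich-0.9.9
    # apply the attached patch (the -p level depends on how it was generated)
    patch -p0 < mpirun_rsh.patch
    # rebuild and re-install; make.mvapich.gen2 is the usual script for ch_gen2
    ./make.mvapich.gen2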

As for your original question, it seems that the problem isn't the
timeout while waiting for the other child process to end.  The more
troubling issue is that the child died in the first place.  We have
reproduced this issue and are working on the best way to resolve it.
We'll keep you posted.

>    Shell experts' ideas welcome.
> 
>            regards,
> 
> Dhabaleswar Panda wrote:
>> Hi Mark,
>>
>> Thanks for providing us the details.  There appears to be some
>> `timeout' issue with the new mpirun_rsh.  We are taking a look at it
>> and will send you a solution soon.
>>
>> Thanks,
>>
>> DK
>>
>> On Tue, 28 Aug 2007, Mark Potts wrote:
>>
>>> Hi,
>>>     I tried VIADEV_USE_SHMEM_COLL=0 and separately tried
>>>     VIADEV_USE_BLOCKING=1, with no change in results.  During task
>>>     startup I get "Unable to find child nnnn!", "Child died.
>>>     Timeout while waiting", and/or simply "done."
>>>
>>>     I tried repeatedly but was never able to consistently run more
>>>     than 10 ranks (-np 10) on a single node.  I, of course, am
>>>     able to run many more ranks, when I spread the targets across
>>>     more nodes.
>>>
>>>    My experiment is to start a very simple code with multiple
>>>    processes on a single node.  Specific details of my setup on the
>>>    two machines are below; the results were the same:
>>>
>>>     Machine  CPUs per  Cores per  Avail   CPU      MVAPICH      MVAPICH
>>>              Node      CPU        Nodes   Type     Version      Device
>>>       A         1         2         3     x86_64   0.9.9-1168   ch_gen2
>>>       B         2         4        16     x86_64   0.9.9-1326   ch_gen2
>>>
>>>     The MVAPICH code, which was obtained from the OFED 1.2 installation,
>>>     has two patches applied, as follows:
>>>      (1) to mpirun_rsh.c, from Sayantan Sur on 10 Jul, for MVAPICH errant
>>>          process/job cleanup.
>>>      (2) to comm_free.c, from Amith Rajith Mamidala on 11 Jul, for a
>>>          MVAPICH segmentation fault during MPI_Finalize() in large jobs.
>>>
>>>     Is it possible that the mpirun_rsh.c patch is prematurely killing
>>>     tasks when it determines that the processes on the oversubscribed
>>>     node are not responding fast enough?  Or is there another
>>>     clean explanation?  As I understand the note from DK this morning,
>>>     oversubscription should work...
>>>           regards,
>>>
>>> amith rajith mamidala wrote:
>>>> Hi Mark,
>>>>
>>>> Can you check whether you still get this error after setting the
>>>> environment variable VIADEV_USE_SHMEM_COLL to 0, e.g.:
>>>>
>>>>     mpirun_rsh -np N VIADEV_USE_SHMEM_COLL=0 ./a.out
>>>>
>>>> -thanks,
>>>> Amith
>>>>
>>>> On Tue, 28 Aug 2007, Mark Potts wrote:
>>>>
>>>>> Hi,
>>>>>     Is there an effective or hard limit on the number of MVAPICH
>>>>>     processes that can be run on a single node?
>>>>>
>>>>>     Given N CPUs, each having M cores, on a single node, I've been
>>>>>     told that one cannot run more than N*M MVAPICH processes on a
>>>>>     single node.  In fact, if I try to even approach this number
>>>>>     with "-np 16" (for a node with N=8 and M=4), I observe an
>>>>>     "Unable to find child nnnn!" or "Child died" message.  Is this
>>>>>     a configuration problem with this system, or is it somehow
>>>>>     expected behavior?
>>>>>
>>>>>     More pointedly, should oversubscription of cores, np > N*M, on a
>>>>>     single node work in MVAPICH?  How about in MVAPICH2?
>>>>>
>>>>>             regards,
>>>>> -- 
>>>>> ***********************************
>>>>>  >> Mark J. Potts, PhD
>>>>>  >>
>>>>>  >> HPC Applications Inc.
>>>>>  >> phone: 410-992-8360 Bus
>>>>>  >>        410-313-9318 Home
>>>>>  >>        443-418-4375 Cell
>>>>>  >> email: potts at hpcapplications.com
>>>>>  >>        potts at excray.com
>>>>> ***********************************
> 


-- 
Jonathan Perkins
http://www.cse.ohio-state.edu/~perkinjo
-------------- next part --------------
A non-text attachment was scrubbed...
Name: mpirun_rsh.patch
Type: text/x-patch
Size: 638 bytes
Desc: not available
Url : http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/attachments/20070830/a4411d46/mpirun_rsh.bin

