[mvapich-discuss] process limits

Mark Potts potts at hpcapplications.com
Wed Aug 29 13:36:13 EDT 2007


DK,
    Thanks for the quick reply.

    I may be jumping ahead here, but assuming the "timeout" is, as I
    suspect, the one introduced by the 10 July patch (mentioned below),
    I have a question related to that patch.
    Could you or your team suggest why this patch seems to fail for
    users running tcsh (as opposed to bash)?  My initial testing of
    this patch, done under bash, found that it worked quite nicely
    to clean up jobs when one or more of the processes killed
    themselves: an mpirun_rsh thread detected this condition and
    then killed the remaining remote tasks.  However, we now find
    that users whose login shell is tcsh are not so lucky.  The
    detection part of the patch works, and the remote sshd tasks are
    killed successfully, but the remote MPI processes continue to
    run.  It would be quite interesting if it weren't so bad: we can
    be left with large numbers of CPU-burning tasks that are
    difficult to find and kill.

    Shell experts' ideas welcome.
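
    To make the question concrete, here is a rough sketch (an
    illustration on my part, not the actual patch code) of a
    process-group-based kill that would not depend on the login shell
    forwarding signals: put the remote process in its own process
    group and signal the whole group, rather than a single pid whose
    parent tcsh may simply ignore the death of its sshd.  Here
    ./a.out is just a placeholder binary.

        /* sketch only: group-based cleanup, not the mpirun_rsh.c patch */
        #include <signal.h>
        #include <sys/types.h>
        #include <sys/wait.h>
        #include <unistd.h>

        int main(void)
        {
            pid_t child = fork();
            if (child == 0) {
                setpgid(0, 0);      /* child leads a new process group */
                execlp("./a.out", "./a.out", (char *)NULL);  /* placeholder */
                _exit(127);
            }
            setpgid(child, child);  /* set it from the parent too, to avoid a race */

            sleep(30);              /* stand-in for "a peer died, clean up now" */

            /* Signal the whole group: this reaches the MPI process and
               anything it forked, even if an intervening shell does not
               pass the signal along when its sshd dies. */
            kill(-child, SIGTERM);
            waitpid(child, NULL, 0);
            return 0;
        }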

            regards,

Dhabaleswar Panda wrote:
> Hi Mark,
> 
> Thanks for providing us with the details. There appears to be some
> `time out' with the new mpirun_rsh. We are taking a look at it
> and will send you a solution soon.
> 
> Thanks,
> 
> DK
> 
> On Tue, 28 Aug 2007, Mark Potts wrote:
> 
>> Hi,
>>     I tried VIADEV_USE_SHMEM_COLL=0 and, separately,
>>     VIADEV_USE_BLOCKING=1, with no change in results.  During task
>>     startup I get "Unable to find child nnnn!", "Child died.
>>     Timeout while waiting", or simply "done."
>>
>>     I tried repeatedly but was never able to consistently run more
>>     than 10 ranks (-np 10) on a single node.  I am, of course, able
>>     to run many more ranks when I spread them across more nodes.
>>
>>    My experiment is to start a very simple code with multiple
>>    processes on a single node.  Specific details of my setup on the
>>    two machines I tested (the results were the same on both):
>>
>>     Machine  CPUs per  Cores per  Avail.  CPU     MVAPICH      MVAPICH
>>              node      CPU        nodes   type    version      device
>>        A        1         2          3    x86_64  0.9.9-1168   ch_gen2
>>        B        2         4         16    x86_64  0.9.9-1326   ch_gen2
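>>
>>     (For concreteness, a representative sketch of this sort of test;
>>     the exact program does not matter, it is just a rank/size hello
>>     world along these lines:)
>>
>>         #include <mpi.h>
>>         #include <stdio.h>
>>
>>         int main(int argc, char **argv)
>>         {
>>             int rank, size;
>>             MPI_Init(&argc, &argv);
>>             MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>>             MPI_Comm_size(MPI_COMM_WORLD, &size);
>>             printf("rank %d of %d\n", rank, size);
>>             MPI_Barrier(MPI_COMM_WORLD);  /* make ranks wait for each other */
>>             MPI_Finalize();
>>             return 0;
>>         }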
>>
>>     The MVAPICH code, which was obtained from the OFED 1.2 installation,
>>     has two patches applied:
>>      (1) to mpirun_rsh.c, from Sayantan Sur, 10 Jul, for MVAPICH errant
>>          process/job cleanup.
>>      (2) to comm_free.c, from Amith Rajith Mamidala, 11 Jul, for a MVAPICH
>>          segmentation fault during MPI_Finalize() in large jobs.
>>
>>     Is it possible that the mpirun_rsh.c patch is prematurely killing
>>     tasks when it decides that the processes on the oversubscribed
>>     node are not responding fast enough?  Or is there another,
>>     cleaner explanation?  As I understand the note from DK this
>>     morning, oversubscription should work...
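>>
>>     To illustrate what I mean by "prematurely killing": if the
>>     launcher monitors startup with a fixed timeout, roughly like the
>>     sketch below (my guess at the shape of the logic, not the actual
>>     mpirun_rsh.c code; rank_checked_in() is a made-up stand-in), then
>>     the timeout could expire simply because many ranks are
>>     time-slicing a few cores during startup:
>>
>>         #include <stdio.h>
>>         #include <time.h>
>>         #include <unistd.h>
>>
>>         /* stand-in: the real launcher would check whether rank i has
>>            connected back to the launcher */
>>         static int rank_checked_in(int i) { (void)i; return 0; }
>>
>>         static int wait_for_startup(int nranks, int timeout_secs)
>>         {
>>             time_t deadline = time(NULL) + timeout_secs;
>>             for (;;) {
>>                 int ready = 0;
>>                 for (int i = 0; i < nranks; i++)
>>                     if (rank_checked_in(i))
>>                         ready++;
>>                 if (ready == nranks)
>>                     return 0;
>>                 if (time(NULL) > deadline) {
>>                     fprintf(stderr, "Timeout while waiting\n");
>>                     return -1;   /* caller would kill the remaining tasks */
>>                 }
>>                 sleep(1);        /* poll again */
>>             }
>>         }
>>
>>         int main(void) { return wait_for_startup(10, 30) ? 1 : 0; }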
>>           regards,
>>
>> amith rajith mamidala wrote:
>>> Hi Mark,
>>>
>>> Can you check whether you still get this error after setting the
>>> environment variable VIADEV_USE_SHMEM_COLL to 0?  E.g.:
>>> mpirun_rsh -np N VIADEV_USE_SHMEM_COLL=0 ./a.out
>>>
>>> -thanks,
>>> Amith
>>>
>>> On Tue, 28 Aug 2007, Mark Potts wrote:
>>>
>>>> Hi,
>>>>     Is there an effective or hard limit on the number of MVAPICH
>>>>     processes that can be run on a single node?
>>>>
>>>>     Given N CPUs, each having M cores, on a single node, I've been told
>>>>     that one cannot run more than N*M MVAPICH processes on that node.
>>>>     In fact, if I even approach this number with "-np 16" (for a node
>>>>     with N=8 and M=4), I get an "unable to find child nnnn!" or
>>>>     "Child died" message.  Is this a configuration problem with this
>>>>     system, or is it somehow expected behavior?
>>>>
>>>>     More pointedly, should oversubscription of cores, np > N*M, on a
>>>>     single node work in MVAPICH?  How about in MVAPICH2?
>>>>
>>>>             regards,

-- 
***********************************
 >> Mark J. Potts, PhD
 >>
 >> HPC Applications Inc.
 >> phone: 410-992-8360 Bus
 >>        410-313-9318 Home
 >>        443-418-4375 Cell
 >> email: potts at hpcapplications.com
 >>        potts at excray.com
***********************************

