[mvapich-discuss] process limits

Dhabaleswar Panda panda at cse.ohio-state.edu
Wed Aug 29 17:33:46 EDT 2007


Mark, 

Thanks for letting us know about this issue with tcsh. We are taking a
look at it as well.

Best Regards, 

DK

> DK,
>     Thanks for the quick reply.
> 
>     I may be jumping ahead here, but assuming the "timeout" you mention
>     is the one related to the 10 July patch (described below), I have a
>     question about that patch.
>     Could you or your people suggest why this patch seems to fail for
>     users running tcsh (as opposed to bash)?  My initial testing of
>     this patch, which used bash, found that it worked quite nicely
>     to clean up jobs when one or more of the processes killed
>     themselves.  An mpirun_rsh thread detected this condition and
>     then killed the remaining remote tasks.  However, we now find
>     that users with a tcsh login shell are not so lucky.  The
>     detection part of the patch works and the remote sshd tasks are
>     killed, but the remote MPI processes continue to run.  It would
>     be quite interesting if it weren't so bad.  We can be left with
>     large numbers of CPU-burning tasks that are difficult to find
>     and kill.
> 
>     Shell experts' ideas welcome.
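> 
>     My own guess (unverified; I have not dug into either shell's
>     behavior here) is that when the remote sshd is killed, bash passes
>     a hangup along to its children while tcsh leaves them running, or
>     the a.out sits in a process group that never sees the signal at
>     all.  Either way, the robust cure is not to rely on the shell: put
>     the remote command in its own process group and signal the group
>     directly.  A rough C sketch of that technique (illustrative only,
>     not the actual patch code):
> 
>         #include <signal.h>
>         #include <stdio.h>
>         #include <sys/types.h>
>         #include <sys/wait.h>
>         #include <unistd.h>
> 
>         int main(void)
>         {
>             pid_t child = fork();
>             if (child < 0) { perror("fork"); return 1; }
>             if (child == 0) {
>                 setpgid(0, 0);         /* child leads a new process group */
>                 execlp("sh", "sh", "-c", "sleep 600", (char *)NULL);
>                 _exit(127);            /* only reached if exec fails */
>             }
>             setpgid(child, child);     /* parent sets it too, avoiding a race */
>             sleep(2);                  /* stand-in for "a remote rank died" */
>             kill(-child, SIGTERM);     /* negative pid: signal the entire group */
>             waitpid(child, NULL, 0);
>             printf("process group %d cleaned up\n", (int)child);
>             return 0;
>         }
> 
>     If the cleanup only ever signals the sshd (or the login shell) by
>     PID, a shell that detaches its children from that process group
>     would explain exactly the behavior we are seeing.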
> 
>             regards,
> 
> Dhabaleswar Panda wrote:
> > Hi Mark,
> > 
> > Thanks for providing us with the details. There appears to be some
> > `timeout' issue with the new mpirun_rsh. We are taking a look at it
> > and will send you a solution soon.
> > 
> > Thanks,
> > 
> > DK
> > 
> > On Tue, 28 Aug 2007, Mark Potts wrote:
> > 
> >> Hi,
> >>     I tried VIADEV_USE_SHMEM_COLL=0 and separately tried
> >>     VIADEV_USE_BLOCKING=1, with no change in results.  During task
> >>     startup I get "Unable to find child nnnn!", "Child died.
> >>     Timeout while waiting", or sometimes simply "done."
> >>
> >>     I tried repeatedly but was never able to consistently run more
> >>     than 10 ranks (-np 10) on a single node.  I am, of course, able
> >>     to run many more ranks when I spread the targets across more
> >>     nodes.
> >>
> >>     My experiment is to start a very simple code with multiple
> >>     processes on a single node.  Specific details of my setup on the
> >>     two machines I tried are below; the results were the same on both:
> >>
> >>     Machine  CPUs per  Cores per  Avail   CPU      MVAPICH      MVAPICH
> >>              node      CPU        nodes   type     version      device
> >>        A        1         2         3    x86_64   0.9.9-1168   ch_gen2
> >>        B        2         4        16    x86_64   0.9.9-1326   ch_gen2
> >>
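> >>     For reference, the "very simple code" is essentially an MPI
> >>     hello-world.  A hypothetical sketch of the sort of program I am
> >>     launching (not my exact source, but representative of it):
> >>
> >>         #include <mpi.h>
> >>         #include <stdio.h>
> >>         #include <unistd.h>
> >>
> >>         int main(int argc, char **argv)
> >>         {
> >>             int rank, size;
> >>             char host[256];
> >>
> >>             MPI_Init(&argc, &argv);
> >>             MPI_Comm_rank(MPI_COMM_WORLD, &rank);
> >>             MPI_Comm_size(MPI_COMM_WORLD, &size);
> >>             gethostname(host, sizeof(host));
> >>             printf("rank %d of %d running on %s\n", rank, size, host);
> >>             MPI_Barrier(MPI_COMM_WORLD);  /* every rank must check in */
> >>             MPI_Finalize();
> >>             return 0;
> >>         }
> >>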
> >>     The MVAPICH code, which was obtained from the OFED 1.2
> >>     installation, has two patches as follows:
> >>      (1) for mpirun_rsh.c, from Sayantan Sur, 10 Jul, for MVAPICH
> >>          errant process/job cleanup.
> >>      (2) for comm_free.c, from Amith Rajith Mamidala, 11 Jul, for a
> >>          MVAPICH segmentation fault during MPI_Finalize() in large
> >>          jobs.
> >>
> >>     Is it possible that the mpirun_rsh.c patch is prematurely killing
> >>     tasks when it determines that the processes on the oversubscribed
> >>     node are not responding fast enough?  Or is there some other,
> >>     cleaner explanation?  As I understand the note from DK this
> >>     morning, oversubscription should work...
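> >>
> >>     If I had to guess at the mechanism (pure speculation on my part,
> >>     not a reading of the actual patch), it would be a fixed wait for
> >>     each child to report back before the launcher gives up and starts
> >>     killing things.  Roughly, with made-up names and behavior:
> >>
> >>         #include <signal.h>
> >>         #include <sys/select.h>
> >>         #include <sys/types.h>
> >>         #include <unistd.h>
> >>
> >>         /* Wait up to timeout_s seconds for a byte from the child on
> >>          * fd; if nothing arrives, treat the child as hung and kill
> >>          * it.  On a heavily oversubscribed node the child may simply
> >>          * not have been scheduled yet, making the kill premature. */
> >>         static int child_checked_in(int fd, pid_t child, int timeout_s)
> >>         {
> >>             fd_set rd;
> >>             struct timeval tv = { timeout_s, 0 };
> >>
> >>             FD_ZERO(&rd);
> >>             FD_SET(fd, &rd);
> >>             if (select(fd + 1, &rd, NULL, NULL, &tv) <= 0) {
> >>                 kill(child, SIGTERM);   /* looks like "Child died" to us */
> >>                 return 0;
> >>             }
> >>             return 1;   /* child wrote its "I am alive" byte in time */
> >>         }
> >>
> >>     If the real code does anything similar, 10+ ranks fighting over
> >>     2 cores could easily take longer to start up than the launcher
> >>     is willing to wait.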
> >>           regards,
> >>
> >> amith rajith mamidala wrote:
> >>> Hi Mark,
> >>>
> >>> Can you check whether you still get this error after setting the
> >>> environment variable VIADEV_USE_SHMEM_COLL to 0, e.g.:
> >>> mpirun_rsh -np N VIADEV_USE_SHMEM_COLL=0 ./a.out
> >>>
> >>> -thanks,
> >>> Amith
> >>>
> >>> On Tue, 28 Aug 2007, Mark Potts wrote:
> >>>
> >>>> Hi,
> >>>>     Is there an effective or hard limit on the number of MVAPICH
> >>>>     processes that can be run on a single node?
> >>>>
> >>>>     Given N CPUs, each having M cores, on a single node, I've been
> >>>>     told that one cannot run more than N*M MVAPICH processes on
> >>>>     that node.  In fact, if I even approach this number with
> >>>>     "-np 16" (for a node with N=8 and M=4), I observe an "Unable to
> >>>>     find child nnnn!" or "Child died" message.  Is this a
> >>>>     configuration problem with this system or somehow expected
> >>>>     behavior?
> >>>>
> >>>>     More pointedly, should oversubscription of cores, np > N*M, on a
> >>>>     single node work in MVAPICH?  How about in MVAPICH2?
> >>>>
> >>>>             regards,
> 
> -- 
> ***********************************
>  >> Mark J. Potts, PhD
>  >>
>  >> HPC Applications Inc.
>  >> phone: 410-992-8360 Bus
>  >>        410-313-9318 Home
>  >>        443-418-4375 Cell
>  >> email: potts at hpcapplications.com
>  >>        potts at excray.com
> ***********************************
> 


