[mvapich-discuss] process limits
Mark Potts
potts at hpcapplications.com
Fri Aug 31 23:13:37 EDT 2007
Jonathan,
That patch seems to solve the non-bash shell problems. I've asked
some others to try it for themselves, and if I learn anything new,
I'll let you know. But for now, it appears that problem can be
checked off.
regards,
Jonathan L. Perkins wrote:
> Mark Potts wrote:
>> DK,
>> Thanks for the quick reply.
>>
>> I may be jumping ahead here, but assuming the "timeout" is the
>> one I referred to from the 10 July patch (mentioned below), I
>> have a question related to that patch.
>> Could you or your people suggest why this patch seems to fail for
>> users running tcsh (as opposed to bash)? My initial testing of
>> this patch, done under bash, found that it worked quite nicely
>> to clean up jobs when one or more of the processes killed
>> themselves. An mpirun_rsh thread detected this condition and
>> then killed the remaining remote tasks. However, we now find
>> that users whose login shell is tcsh are not so lucky. The
>> detection part of the patch works, and the remote sshd tasks
>> are killed, but the remote MPI processes continue to run. It
>> would be quite interesting if it weren't so bad: we can be left
>> with large numbers of CPU-burning tasks that are difficult to
>> find and kill.
>
> Mark:
> Attached you will find a patch that should solve the issue with
> remote processes not being killed when using tcsh. The problem
> arose from using the signal name with the kill command. Since the
> kill command is in many cases a shell built-in, there can be minor
> differences in the accepted signal names, such as SIGKILL in bash
> but just KILL in tcsh.
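>
> A minimal sketch of the idea (hypothetical code, not the actual
> patch): passing the numeric signal value instead of its name
> sidesteps the per-shell naming differences, since "kill -9" is
> accepted by every common shell's kill, built-in or /bin/kill alike:
>
>     /* Hypothetical sketch: build the remote cleanup command with a
>      * numeric signal so it behaves the same under bash and tcsh. */
>     #include <signal.h>
>     #include <stdio.h>
>
>     void build_kill_cmd(char *buf, size_t len, int remote_pid)
>     {
>         /* SIGKILL expands to an integer (9 on Linux), so the
>          * resulting "kill -9 <pid>" avoids signal-name parsing in
>          * the remote shell entirely. */
>         snprintf(buf, len, "kill -%d %d", SIGKILL, remote_pid);
>     }
>
>     int main(void)
>     {
>         char cmd[64];
>         build_kill_cmd(cmd, sizeof cmd, 12345);
>         puts(cmd);   /* prints: kill -9 12345 */
>         return 0;
>     }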
>
> After applying this patch to the MVAPICH source, you can simply
> re-install using the appropriate make script. Let me know if you
> have any further questions regarding this.
>
> As for your original question, it seems that the problem isn't the
> timeout while waiting for the other child process to end. More
> troubling is that the child died in the first place. We have
> reproduced this issue and are working on finding the best way to
> resolve it. We'll keep you posted.
>
>> Shell experts' ideas welcome.
>>
>> regards,
>>
>> Dhabaleswar Panda wrote:
>>> Hi Mark,
>>>
>>> Thanks for providing us the details. There appears to be some
>>> `timeout' issue with the new mpirun_rsh. We are taking a look at
>>> it and will send you a solution soon.
>>>
>>> Thanks,
>>>
>>> DK
>>>
>>> On Tue, 28 Aug 2007, Mark Potts wrote:
>>>
>>>> Hi,
>>>> I tried VIADEV_USE_SHMEM_COLL=0 and separately tried
>>>> VIADEV_USE_BLOCKING=1, with no change in results. During task
>>>> startup I get "Unable to find child nnnn!", "Child died.
>>>> Timeout while waiting", and/or simply "done."
>>>>
>>>> I tried repeatedly but was never able to consistently run more
>>>> than 10 ranks (-np 10) on a single node. I am, of course, able
>>>> to run many more ranks when I spread the targets across more
>>>> nodes.
>>>>
>>>> My experiment is to start a very simple code (sketched just
>>>> below the table) with multiple processes on a single node.
>>>> Specific details of my setup on two machines follow; the
>>>> results were the same on both:
>>>>
>>>> Machine  CPUs per Node  Cores per CPU  Avail Nodes  CPU Type  MVAPICH Version  MVAPICH Device
>>>> A        1              2              3            X86_64    0.9.9-1168       ch_gen2
>>>> B        2              4              16           X86_64    0.9.9-1326       ch_gen2
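>>>>
>>>> The test code is along the lines of this minimal MPI
>>>> hello-world (a sketch, not the exact code):
>>>>
>>>>     /* Minimal MPI test: each rank reports in, synchronizes at a
>>>>      * barrier, and exits cleanly through MPI_Finalize(). */
>>>>     #include <mpi.h>
>>>>     #include <stdio.h>
>>>>
>>>>     int main(int argc, char **argv)
>>>>     {
>>>>         int rank, size;
>>>>         MPI_Init(&argc, &argv);
>>>>         MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>>>>         MPI_Comm_size(MPI_COMM_WORLD, &size);
>>>>         printf("rank %d of %d alive\n", rank, size);
>>>>         MPI_Barrier(MPI_COMM_WORLD);
>>>>         MPI_Finalize();
>>>>         return 0;
>>>>     }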
>>>>
>>>> The MVAPICH code, which was obtained from the OFED 1.2
>>>> installation, has two patches applied:
>>>> (1) to mpirun_rsh.c, from Sayantan Sur on 10 Jul, for MVAPICH
>>>>     errant process/job cleanup.
>>>> (2) to comm_free.c, from Amith Rajith Mamidala on 11 Jul, for an
>>>>     MVAPICH segmentation fault during MPI_Finalize() in large
>>>>     jobs.
>>>>
>>>> Is it possible that the mpirun_rsh.c patch is prematurely killing
>>>> tasks when it determines that the processes on the oversubscribed
>>>> node are not responding fast enough? Or is there another, cleaner
>>>> explanation? As I understand the note from DK this morning,
>>>> oversubscription should work...
>>>> regards,
>>>>
>>>> amith rajith mamidala wrote:
>>>>> Hi Mark,
>>>>>
>>>>> Can you check if you get this error by setting the environment
>>>>> variable VIADEV_USE_SHMEM_COLL to 0? For example:
>>>>>
>>>>>     mpirun_rsh -np N VIADEV_USE_SHMEM_COLL=0 ./a.out
>>>>>
>>>>> -thanks,
>>>>> Amith
>>>>>
>>>>> On Tue, 28 Aug 2007, Mark Potts wrote:
>>>>>
>>>>>> Hi,
>>>>>> Is there an effective or hard limit on the number of MVAPICH
>>>>>> processes that can be run on a single node?
>>>>>>
>>>>>> Given N CPUs, each having M cores, on a single node, I've been
>>>>>> told that one cannot run more than N*M MVAPICH processes on
>>>>>> that node. In fact, if I try to even approach this number with
>>>>>> "-np 16" (for a node with N=8 and M=4), I see an "Unable to
>>>>>> find child nnnn!" or "Child died" message. Is this a
>>>>>> configuration problem with this system or somehow expected
>>>>>> behavior?
>>>>>>
>>>>>> More pointedly, should oversubscription of cores, np > N*M, on a
>>>>>> single node work in MVAPICH? How about in MVAPICH2?
>>>>>>
>>>>>> regards,
>>
>
>
--
***********************************
>> Mark J. Potts, PhD
>>
>> HPC Applications Inc.
>> phone: 410-992-8360 Bus
>> 410-313-9318 Home
>> 443-418-4375 Cell
>> email: potts at hpcapplications.com
>> potts at excray.com
***********************************