[mvapich-discuss] scaling problem and stray mpd daemon

Vishwas vvasisht at locuz.com
Sat Oct 28 06:15:11 EDT 2006


Hi,

My earlier message was not clear, so here is the problem again.

a. I have a 128-core (32-node) machine on InfiniBand. I am using the VAPI
stack of mvapich2. I start the mpd daemons on the nodes with the following
command:

    mpdboot --totalnum=32 --file=< mpd.hosts file with path > --mpd=< path to mpd on local machines > --verbose --ncpus=4 --ifhn=infinigj

    The problem is that when I submit a job (a simple task-farming job)
with -np greater than 32, the job gets stuck and never finishes. With -np
less than 32 it runs.

    Also, once a job is submitted, I see many mpd daemons start running on
the nodes.
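
One way to narrow down where the hang starts is to run a trivial command at
several process counts with a time limit. The following is only a sketch
under assumptions: probe_np and the MPIEXEC override variable are
hypothetical names, the timeout utility from coreutils is assumed to be
installed, and hostname stands in for the real job.

```shell
#!/bin/sh
# Sketch only: probe_np and MPIEXEC are hypothetical, not part of mvapich2.

# Launch "hostname" on $1 processes and give up after $2 seconds (default 60).
# Prints one line saying whether the launch completed within the limit.
probe_np() {
    np=$1
    limit=${2:-60}
    launcher=${MPIEXEC:-mpiexec}   # hypothetical override for the launcher
    if timeout "$limit" "$launcher" -n "$np" hostname >/dev/null 2>&1; then
        echo "np=$np ok"
    else
        echo "np=$np failed or timed out after ${limit}s"
    fi
}

# Example use, after mpdboot has brought the ring up:
#   mpdtrace -l                                      # list the mpds in the ring
#   for n in 16 32 33 64 128; do probe_np "$n"; done
```

If the small counts report ok and the first count above 32 times out, the
hang is tied to placing more than one process per node rather than to the
farming job itself.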

 

Vishwas

 

  _____  

From: mvapich-discuss-bounces at cse.ohio-state.edu
[mailto:mvapich-discuss-bounces at cse.ohio-state.edu] On Behalf Of Vishwas
Sent: Saturday, October 28, 2006 2:08 PM
To: mvapich-discuss at cse.ohio-state.edu
Subject: [mvapich-discuss] scaling problem and stray mpd daemon

 

Hello,

 

I am using mvapich2 for the InfiniBand interconnect.
a. I have a 128-core (32-node) machine on InfiniBand. I am using the VAPI
stack of mvapich2. I start the mpd daemons on the nodes with the following
command:

    mpdboot --totalnum=32 --file=< mpd.hosts file with path > --mpd=< path to mpd on local machines > --verbose --ncpus=4 --ifhn=infinigj

    The problem is that when I submit a job (a simple task-farming job)
using totalnum >= 32, the job gets stuck and never finishes. With less
than 32 it runs.

    Also, once a job is submitted, I see many mpd daemons start running on
the nodes.

b.  If I am correct, mpdallexit causes all mpds in the ring to exit, and
mpdcleanup removes the sockets on the local and remote machines.
But even after I run both, I still see many mpd daemons running on the
master (I check with ps -ef | grep mpd).
How do I clean these up? (For now I am using kill <pid>.)
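
The manual steps in (b) could be collected into a small script, sketched
here under assumptions. The helper names mpd_pids and report_stray_mpds are
hypothetical, and matching on the basename mpd or mpd.py is an assumption
about how the daemon shows up in ps output. The script only prints the kill
commands so they can be reviewed before anything is run.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of mvapich2 or MPICH2.

# Read "PID COMMAND [ARGS...]" lines on stdin and print each PID whose
# command, or first argument (the "python .../mpd.py" case), has the
# basename mpd or mpd.py.
mpd_pids() {
    awk '
        {
            for (i = 2; i <= 3; i++) {
                n = split($i, parts, "/")
                if (n > 0 && (parts[n] == "mpd" || parts[n] == "mpd.py")) {
                    print $1
                    next
                }
            }
        }'
}

# Shut the ring down the normal way, then print (not run) a kill command
# for every mpd owned by the current user that is still alive on this host.
report_stray_mpds() {
    mpdallexit >/dev/null 2>&1 || true
    mpdcleanup >/dev/null 2>&1 || true
    ps -u "$(id -un)" -o pid= -o args= | mpd_pids | while read -r pid; do
        echo "kill $pid"
    done
}
```

Calling report_stray_mpds on the master prints one kill line per leftover
daemon. The same function would have to be run on each compute node to cover
the daemons seen there.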

 

Vishwas
