[mvapich-discuss] Regarding mpiallexit and scalability

Vishwas vvasisht at locuz.com
Sat Oct 28 01:58:39 EDT 2006


Hi,

I am Vishwas, from Bangalore, India. I work in the area of phase
transitions, and I also do cluster installation.

I have two problems.

I am using mvapich2 over an InfiniBand interconnect.

1. I have a 128-core (32-node) machine on InfiniBand, and I am using the
VAPI stack of mvapich2. I used the following command to start the mpd
daemons on the nodes:

    mpdboot --totalnum=32 --file=< mpd.hosts file with path > --mpd=< path to mpd on local machines > --verbose --ncpus=4 --ifhn=infinigj

    The problem I am facing is that if I submit a job (a simple farming
    kind of job) using more than 32 nodes, the job hangs; it never ends.

    Also, I see lots of mpd daemons start running on the nodes once a job
    is submitted.
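
One way to check that the ring came up at full size before submitting a job is to count the entries mpdtrace reports and compare that with the --totalnum value given to mpdboot. This is only a minimal sketch: it assumes mpdtrace prints one mpd per line, and the function names ring_size and ring_complete are made up for illustration. It does not by itself explain the hang.

```shell
#!/bin/sh
# Sketch only: ring_size and ring_complete are hypothetical helper names.

# Count the mpds listed on stdin (assumes one mpd per non-empty line,
# which is how mpdtrace lists the ring members).
ring_size() {
    awk 'NF > 0 { n++ } END { print n + 0 }'
}

# Succeed only if the number of mpds on stdin equals $1
# (the value that was passed to mpdboot as --totalnum).
ring_complete() {
    [ "$(ring_size)" -eq "$1" ]
}
```

With the ring from the command above this could be used as: mpdtrace | ring_complete 32 && echo "all 32 mpds are in the ring"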

2. If I am correct, mpdallexit causes all mpds in the ring to exit, and
mpdcleanup removes the sockets on the local and remote machines. But even
after I run both, I still see lots of mpd daemons running on the master
(I use ps -ef | grep mpd to check). How do I clean these up? (For now I am
using kill <pid>.)
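
The manual ps/kill step described above can be sketched as a small filter that pulls the PIDs of one user's mpd processes out of ps -ef output. It assumes the usual ps -ef column layout (UID first, PID second, command starting in column 8) and that the daemons show up either as mpd / mpd.py directly or as the first argument to a python interpreter. mpd_pids is a hypothetical helper name, not part of mvapich2.

```shell
#!/bin/sh
# Sketch only: mpd_pids is a hypothetical helper, not an mvapich2 tool.
# Reads `ps -ef`-style lines on stdin and prints the PID of every process
# owned by user $1 whose command is mpd or mpd.py, either run directly or
# through a python interpreter. Assumes the UID column shows the user name
# (some systems show a numeric or truncated UID instead).
mpd_pids() {
    awk -v user="$1" '
        # Return the last path component, e.g. /opt/bin/mpd.py -> mpd.py
        function base(path,    parts, n) {
            n = split(path, parts, "/")
            return parts[n]
        }
        function is_mpd(name) {
            return name == "mpd" || name == "mpd.py"
        }
        $1 == user {
            cmd = base($8)
            if (is_mpd(cmd) || (cmd ~ /^python/ && is_mpd(base($9))))
                print $2
        }
    '
}
```

For example, ps -ef | mpd_pids "$(id -un)" prints the candidate PIDs. Unlike a plain grep, the filter skips the grep process itself and other users' daemons; the list should still be reviewed before it is passed to kill.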

Thanks

Vishwas
