[mvapich-discuss] mpiexec.hydra hangs

Jonathan Perkins perkinjo at cse.ohio-state.edu
Mon Jul 8 18:29:43 EDT 2013


Thanks for the reply.  My responses are inline.

On Mon, Jul 08, 2013 at 10:10:34PM +0000, Igor Podladtchikov wrote:
> Hi Jonathan,
> 
> thanks for your reply.
> 
> which mpiexec.hydra
> /usr/local/bin/mpiexec.hydra
> 
> which mpirun_rsh
> /usr/local/bin/mpirun_rsh

Okay, it looks like the mvapich2 installation is in /usr/local and
mpiexec.hydra likely comes from that installation.

> ldd /usr/local/bin/mpiexec.hydra
> <snip>

Actually, can you provide the ldd output of the MPI program that you're
trying to run, rather than the launcher?
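A minimal sketch for collecting this, assuming a POSIX shell. The function name show_mpi_libs and the default path ./my_mpi_app are hypothetical. It runs ldd on the program and keeps only the MPI and libstdc++/libgcc_s lines, which makes the output easier to compare between the working node and the hanging one.

```shell
#!/bin/sh
# Sketch: list the shared libraries an MPI program resolves at run time,
# keeping only the MPI and C++/GCC runtime entries relevant to this thread.
# Assumption: the program path is the first argument; ./my_mpi_app is a
# hypothetical default.
show_mpi_libs() {
    prog=${1:-./my_mpi_app}
    if [ ! -x "$prog" ]; then
        echo "not an executable: $prog" >&2
        return 1
    fi
    # grep exits 1 when nothing matches; that is not an error here.
    ldd "$prog" | grep -E 'mpi|libstdc\+\+|libgcc_s' || true
}
```

For example, run `show_mpi_libs ./my_mpi_app > libs.$(hostname)` on each node and diff the two files; a program that picks up a different libmpich than the one under /usr/local would show up there.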

> On the working system, mpiexec.hydra gives the same output, but
> mpirun_rsh doesn't seem to link against libstdc++.so.6 and
> libgcc_s.so.1. Also, if I run ldd on mpirun_rsh as root, I don't get
> stdc++ and gcc_s either...
> 
> The other thing that I noticed is, if I tell my program not to create
> a context on the GPUs, mpiexec.hydra still hangs, but doesn't produce
> all those messages if I cancel it with Ctrl+C. I am not setting
> MV2_USE_CUDA, so it shouldn't matter?

Since you're not setting MV2_USE_CUDA, I do not think that this will
matter.

> And the last thing you may need to know: for some reason unknown to
> me, root on the compute node can't modify files created by Active
> Directory users. So I copied the tarball from the shared location
> (owned by an Active Directory user) to root's home on the compute node,
> where I ran configure; make; make install as root. It hangs for both
> root and the AD user, though. On the node where everything is OK, I
> first tried to configure and make as the AD user, then failed to make
> install because I can't modify /usr/... as the AD user. I also failed
> to make install as root, because I couldn't modify some files in the
> make dir. So I ended up copying everything to root's home on the
> compute node and doing everything from there. Do you think that would
> be a problem?

Something funky could have happened here; this is a gray area.  Maybe
you can repeat the process, but set the prefix to a separate location,
for example

    ./configure --prefix=/usr/local/mvapich2-1.9/ ...

Then you can check to see if you have the same problem with this
installation.
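If you do build a second copy, a sketch like the following (assuming the /usr/local/mvapich2-1.9 prefix above and a POSIX shell) puts it ahead of /usr/local in PATH and LD_LIBRARY_PATH for the current shell, so the launchers and mpicc you test come from the new installation. It only adjusts the environment; it does not build or install anything.

```shell
#!/bin/sh
# Sketch, assuming MVAPICH2 was reinstalled with
# ./configure --prefix=/usr/local/mvapich2-1.9 so that it takes precedence
# over the copy in /usr/local.
PREFIX=${PREFIX:-/usr/local/mvapich2-1.9}

# Launchers and compiler wrappers from the new prefix are found first, and
# its libraries are searched before the default locations.
PATH="$PREFIX/bin:$PATH"
LD_LIBRARY_PATH="$PREFIX/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
export PATH LD_LIBRARY_PATH

# Report which launcher and wrapper will actually be used.
for c in mpiexec.hydra mpirun_rsh mpicc; do
    command -v "$c" || echo "$c: not found"
done
```

Source it (for example `. ./use-mvapich2.sh`, a hypothetical file name) rather than executing it, so the exports stay in your shell, and rebuild the program with the new mpicc before retesting.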

> I never quite understood how mpirun_rsh works. It gives me
> "incorrect number of arguments" when I run mpirun_rsh -n 7 <prog +
> args>.

You can try the following link as a primer:
http://mvapich.cse.ohio-state.edu/support/mvapich2-1.9-quick-start.html

Have you tried the osu-micro-benchmarks?  See if you can run the
following command.

    mpirun_rsh -n 2 localhost localhost /usr/local/libexec/mvapich2/osu_latency
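The "incorrect number of arguments" error is most likely because mpirun_rsh expects to be told where each rank runs: either one host name per rank on the command line, as in the osu_latency command above, or a hostfile passed with -hostfile. The sketch below (the file name hosts and the program ./my_mpi_app are hypothetical) writes a hostfile with seven localhost entries and prints the matching command; set RUN=1 to actually launch it.

```shell
#!/bin/sh
# Sketch: launch 7 ranks on the local node with mpirun_rsh via a hostfile.
# Assumptions: "hosts" is a hypothetical hostfile name and ./my_mpi_app
# stands in for the real program and its arguments.
NP=7
HOSTFILE=hosts

# Write one host name per rank; every rank goes to localhost here.
: > "$HOSTFILE"
i=0
while [ "$i" -lt "$NP" ]; do
    echo localhost >> "$HOSTFILE"
    i=$((i + 1))
done

# Print the command; set RUN=1 in the environment to actually run it.
CMD="mpirun_rsh -n $NP -hostfile $HOSTFILE ./my_mpi_app"
echo "$CMD"
if [ "${RUN:-0}" = 1 ]; then
    $CMD
fi
```

Listing the hosts inline (`mpirun_rsh -n 7 localhost localhost ... ./my_mpi_app`) is equivalent; the hostfile just scales better once more than one node is involved.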

-- 
Jonathan Perkins
http://www.cse.ohio-state.edu/~perkinjo
