[mvapich-discuss] mpiexec.hydra hangs
Jonathan Perkins
perkinjo at cse.ohio-state.edu
Mon Jul 8 18:29:43 EDT 2013
Thanks for the reply. My responses are inline.
On Mon, Jul 08, 2013 at 10:10:34PM +0000, Igor Podladtchikov wrote:
> Hi Jonathan,
>
> thanks for your reply.
>
> which mpiexec.hydra
> /usr/local/bin/mpiexec.hydra
>
> which mpirun_rsh
> /usr/local/bin/mpirun_rsh
Okay, it looks like the mvapich2 installation is in /usr/local and
mpiexec.hydra likely comes from that installation.
> ldd /usr/local/bin/mpiexec.hydra
> <snip>
Actually, can you provide the ldd output of the MPI program that you're
trying to run, rather than the launcher?
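A minimal sketch of that check follows. The path ./my_mpi_app is a placeholder that does not come from this thread, and the filter pattern is only an assumption about which library names are worth looking at (MPI, CUDA, and anything the loader cannot resolve).

```shell
#!/bin/sh
# Hypothetical path to the MPI application; pass the real binary as $1.
prog="${1:-./my_mpi_app}"

# Keep only the lines of ldd output that mention MPI or CUDA libraries,
# or that the dynamic loader reports as "not found".
filter_libs() {
    grep -E 'mpi|cuda|not found'
}

if [ -x "$prog" ]; then
    # Inspect the application, not the launcher.
    ldd "$prog" | filter_libs
else
    echo "no executable at $prog" >&2
fi
```

Running the same script on the working node and on the hanging node, then comparing the two outputs, should show whether the program picks up libraries from different installations.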
> On the working system, mpiexec.hydra gives the same output, but
> mpirun_rsh doesn't seem to have libstdc++.so.6 and libgcc_s.so.1..
> Also, if I do ldd mpirun_rsh as root, I don't get stdc++ and gcc_s
> either...
>
> The other thing that I noticed is, if I tell my program not to create
> a context on the GPUs, mpiexec.hydra still hangs, but doesn't produce
> all those messages if I cancel it with Ctrl+C. I am not setting
> MV2_USE_CUDA, so it shouldn't matter?
Since you're not using MV2_USE_CUDA, I do not think that this will
matter.
> And the last thing you may need to know is, for some reason unknown to
> me, root on the compute node can't modify files generated by Active
> Directory users.. so I copied the tarball from the shared location
> (owned by active directory user) to root's home on the compute node,
> where I did configure; make; make install as root. It hangs for both
> root and AD user though.. on the node where everything is OK, I first
> tried to configure and make as AD user, then failed to make install
> because I can't modify /usr/... as AD user. I also failed to make
> install as root, because I couldn't modify some files in the make dir.
> So I ended up copying everything to root's home on the compute node
> and doing everything from there.. would that be a problem, you think?
Something funky could have happened here. This seems to be a gray area;
maybe you can repeat the whole process, this time setting the prefix to
something else, like
./configure --prefix=/usr/local/mvapich2-1.9/ ...
Then you can check to see if you have the same problem with this
installation.
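Once the build is installed under the separate prefix, something like the sketch below would put that installation first in the search paths for the current shell. The lib subdirectory is an assumption based on the usual configure defaults; adjust it if your build installs libraries elsewhere.

```shell
#!/bin/sh
# Install prefix from the configure line above.
prefix=/usr/local/mvapich2-1.9

# Prepend one installation's bin and lib directories so that its launcher
# and libraries are found before any other copy on the system.
use_prefix() {
    PATH="$1/bin:$PATH"
    # Assumption: libraries live in $1/lib (the default libdir).
    LD_LIBRARY_PATH="$1/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
    export PATH LD_LIBRARY_PATH
}

use_prefix "$prefix"

# Show which launcher is resolved now; prints nothing if none is found.
command -v mpiexec.hydra || true
```

If the reported path is still /usr/local/bin/mpiexec.hydra, the new installation is not the one being used.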
> I never seem to have quite understood how mpirun_rsh works.. it gives
> me "incorrect number of arguments" when I do mpirun_rsh -n 7 <prog +
> args>.
You can try the following link as a primer:
http://mvapich.cse.ohio-state.edu/support/mvapich2-1.9-quick-start.html
Have you tried the osu-micro-benchmarks? See if you can run the
following command.
mpirun_rsh -n 2 localhost localhost /usr/local/libexec/mvapich2/osu_latency
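Note that the command above names one host per process after the process count. As far as I can tell, the "incorrect number of arguments" error comes from leaving the hosts out. For seven processes, a hostfile is more convenient than repeating the host name; the sketch below assumes everything runs on localhost, and ./my_mpi_app is a placeholder for your program.

```shell
#!/bin/sh
# Write a hostfile with one line per process slot (here: all on localhost).
make_hostfile() {
    n=$1
    file=$2
    : > "$file"
    i=0
    while [ "$i" -lt "$n" ]; do
        echo localhost >> "$file"
        i=$((i + 1))
    done
}

hosts=$(mktemp)
make_hostfile 7 "$hosts"

# Launch only if the launcher and the (hypothetical) program are present.
if command -v mpirun_rsh >/dev/null 2>&1 && [ -x ./my_mpi_app ]; then
    mpirun_rsh -np 7 -hostfile "$hosts" ./my_mpi_app
fi
```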
--
Jonathan Perkins
http://www.cse.ohio-state.edu/~perkinjo