[mvapich-discuss] mpiexec.hydra hangs

Igor Podladtchikov igor.podladtchikov at spectraseis.com
Mon Jul 8 18:56:37 EDT 2013


So I try the prefix thing as AD user? Or as root?

Here's my program's ldd output:

ldd /spectraseis/share/applications/s6fd/current/s6fd
        linux-vdso.so.1 =>  (0x00007fffd3bff000)
        libdl.so.2 => /lib64/libdl.so.2 (0x0000003a1c400000)
        libm.so.6 => /lib64/libm.so.6 (0x0000003a1d000000)
        libs6fd.so => /spectraseis/share/applications/s6fd/current/libs6fd.so (0x00007f356c363000)
        libcudart.so.4 => /spectraseis/share/applications/s6fd/current/lib/libcudart.so.4 (0x00007f356c108000)
        libfdcoords.so => /spectraseis/share/applications/s6fd/current/lib/libfdcoords.so (0x00007f356bf03000)
        libfddomain.so => /spectraseis/share/applications/s6fd/current/lib/libfddomain.so (0x00007f356bcfb000)
        libfdfiles.so => /spectraseis/share/applications/s6fd/current/lib/libfdfiles.so (0x00007f356baf9000)
        libfdindex.so => /spectraseis/share/applications/s6fd/current/lib/libfdindex.so (0x00007f356b8f6000)
        libfdinjrec.so => /spectraseis/share/applications/s6fd/current/lib/libfdinjrec.so (0x00007f356b6f0000)
        libfdlogger.so => /spectraseis/share/applications/s6fd/current/lib/libfdlogger.so (0x00007f356b4ed000)
        libfdmat.so => /spectraseis/share/applications/s6fd/current/lib/libfdmat.so (0x00007f356b2e7000)
        libfdmp.so => /spectraseis/share/applications/s6fd/current/lib/libfdmp.so (0x00007f356b0e2000)
        libfdsolvers.so => /spectraseis/share/applications/s6fd/current/lib/libfdsolvers.so (0x00007f356aecf000)
        libfdsplit.so => /spectraseis/share/applications/s6fd/current/lib/libfdsplit.so (0x00007f356acc9000)
        libfdmon.so => /spectraseis/share/applications/s6fd/current/lib/libfdmon.so (0x00007f356aac1000)
        libfdui.so => /spectraseis/share/applications/s6fd/current/lib/libfdui.so (0x00007f356a8a4000)
        libfdtest.so => /spectraseis/share/applications/s6fd/current/lib/libfdtest.so (0x00007f356a69b000)
        libfdpick.so => /spectraseis/share/applications/s6fd/current/lib/libfdpick.so (0x00007f356a495000)
        libfdsolversgpu.so => /spectraseis/share/applications/s6fd/current/lib/libfdsolversgpu.so (0x00007f356a292000)
        libfdsolversgpu_cuda.so => /spectraseis/share/applications/s6fd/current/lib/libfdsolversgpu_cuda.so (0x00007f356a08b000)
        libfdel3dcpu.so => /spectraseis/share/applications/s6fd/current/lib/libfdel3dcpu.so (0x00007f3569e7c000)
        libfdel3dgpu.so => /spectraseis/share/applications/s6fd/current/lib/libfdel3dgpu.so (0x00007f3569c76000)
        libfdel3dgpu_cuda.so => /spectraseis/share/applications/s6fd/current/lib/libfdel3dgpu_cuda.so (0x00007f3569a1f000)
        libfdac3dcpu.so => /spectraseis/share/applications/s6fd/current/lib/libfdac3dcpu.so (0x00007f3569819000)
        libfdac3dgpu.so => /spectraseis/share/applications/s6fd/current/lib/libfdac3dgpu.so (0x00007f3569614000)
        libfdac3dgpu_cuda.so => /spectraseis/share/applications/s6fd/current/lib/libfdac3dgpu_cuda.so (0x00007f35693eb000)
        libspeclib.so => /spectraseis/share/applications/speclib/current/libspeclib.so (0x00007f35691ba000)
        libpthread.so.0 => /lib64/libpthread.so.0 (0x0000003a1cc00000)
        librt.so.1 => /lib64/librt.so.1 (0x0000003a1d800000)
        libc.so.6 => /lib64/libc.so.6 (0x0000003a1c800000)
        /lib64/ld-linux-x86-64.so.2 (0x0000003a1c000000)
        libstdc++.so.6 => /usr/lib64/libstdc++.so.6 (0x0000003a23400000)
        libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x0000003a20c00000)

Sorry about that.

Igor Podladtchikov
Spectraseis
1899 Wynkoop St, Suite 350
Denver, CO 80202
Tel. +1 303 658 9172 (direct)
Tel. +1 303 330 8296 (cell)
www.spectraseis.com

________________________________________
From: Jonathan Perkins
Sent: Monday, July 08, 2013 4:29 PM
To: Igor Podladtchikov
Cc: mvapich-discuss at cse.ohio-state.edu
Subject: Re: [mvapich-discuss] mpiexec.hydra hangs

Thanks for the reply.  My responses are inline.

On Mon, Jul 08, 2013 at 10:10:34PM +0000, Igor Podladtchikov wrote:
> Hi Jonathan,
>
> thanks for your reply.
>
> which mpiexec.hydra
> /usr/local/bin/mpiexec.hydra
>
> which mpirun_rsh
> /usr/local/bin/mpirun_rsh

Okay, it looks like the mvapich2 installation is in /usr/local and
mpiexec.hydra likely comes from that installation.

> ldd /usr/local/bin/mpiexec.hydra
> <snip>

Actually, can you provide the ldd output of the MPI program that you're
trying to run, rather than the launcher?

> On the working system, mpiexec.hydra gives the same output, but
> mpirun_rsh doesn't seem to have libstdc++.so.6 and libgcc_s.so.1..
> Also, if I do ldd mpirun_rsh as root, I don't get stdc++ and gcc_s
> either...
>
> The other thing that I noticed is, if I tell my program not to create
> a context on the GPUs, mpiexec.hydra still hangs, but doesn't produce
> all those messages if I cancel it with Ctrl+C. I am not setting
> MV2_USE_CUDA, so it shouldn't matter?

Since you're not using MV2_USE_CUDA, I do not think this will matter.

> And the last thing you may need to know: for some reason unknown to
> me, root on the compute node can't modify files generated by Active
> Directory users. So I copied the tarball from the shared location
> (owned by an Active Directory user) to root's home on the compute
> node, where I did configure; make; make install as root. It hangs for
> both root and the AD user, though. On the node where everything is
> OK, I first tried to configure and make as the AD user, then failed
> to make install because I can't modify /usr/... as the AD user. I
> also failed to make install as root, because I couldn't modify some
> files in the make dir. So I ended up copying everything to root's
> home on the compute node and doing everything from there. Would that
> be a problem, you think?

Something funky could have happened here.  This seems to be a gray
area; maybe you can try the process over again, this time setting the
prefix to something else, like

    ./configure --prefix=/usr/local/mvapich2-1.9/ ...

Then you can check to see if you have the same problem with this
installation.
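For reference, a hedged sketch of the rebuild-into-a-new-prefix steps
(the 1.9 version string follows the suggestion above; add whatever
configure flags your original build used):

```shell
# Build in a directory the building user can write to, then install
# into a fresh prefix instead of the existing /usr/local install:
./configure --prefix=/usr/local/mvapich2-1.9
make
make install    # run as a user that can write to the prefix

# Put the new installation first on PATH before re-testing, so the
# launcher comes from the new prefix:
export PATH=/usr/local/mvapich2-1.9/bin:$PATH
which mpiexec.hydra   # should now point inside /usr/local/mvapich2-1.9
```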

> I never seem to have quite understood how mpirun_rsh works. It gives
> me "incorrect number of arguments" when I do mpirun_rsh -n 7 <prog +
> args>.

You can try the following link as a primer:
http://mvapich.cse.ohio-state.edu/support/mvapich2-1.9-quick-start.html
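A likely cause of the "incorrect number of arguments" error: mpirun_rsh
expects the target hosts (one per process, or a -hostfile) between the
process count and the program. A sketch of correct invocations, with
the hostnames and program path purely illustrative:

```shell
# Hosts listed inline, one per MPI rank:
mpirun_rsh -n 2 node01 node02 /path/to/s6fd arg1 arg2

# Equivalent form using a hostfile (one hostname per line):
mpirun_rsh -n 2 -hostfile ./hosts /path/to/s6fd arg1 arg2
```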

Have you tried the osu-micro-benchmarks?  See if you can run the
following command.

    mpirun_rsh -n 2 localhost localhost /usr/local/libexec/mvapich2/osu_latency

--
Jonathan Perkins
http://www.cse.ohio-state.edu/~perkinjo



