[mvapich-discuss] mpiexec.hydra "envall" functionality broken across multiple nodes

Jonathan Perkins perkinjo at cse.ohio-state.edu
Mon Mar 9 19:54:45 EDT 2015


Hi all, this was resolved offline but I wanted to post some information
here for everyone's info.

This is one of the design differences between mpirun_rsh and
mpiexec.hydra.  In mpirun_rsh we explicitly take the LD_LIBRARY_PATH found
in the environment where mpirun_rsh is invoked and set it in the environment
of the remote mpispawn when mpispawn is called.

Something like:
env LD_LIBRARY_PATH=$LD_LIBRARY_PATH /path/to/mpispawn

In mpiexec.hydra this does not happen.  Hydra forwards all of the detected
environment variables to its version of mpispawn, hydra_pmi_proxy, but only
after hydra_pmi_proxy has already been exec'd.

In Mehmet's case LD_LIBRARY_PATH will be forwarded by hydra just like the
other variables, but by then it is too late: hydra_pmi_proxy needs the
Intel runtime libraries (libimf.so among them) simply to be loaded, so it
never gets far enough to start properly.
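
A quick way to see the environment the proxy actually starts in is to run
the same kind of non-interactive remote command that the launcher uses
(a rough sketch; <remote-node> is a placeholder for one of the compute
nodes, and this assumes passwordless ssh, which hydra's default ssh
launcher relies on as well):

ssh <remote-node> 'echo $LD_LIBRARY_PATH; ldd /usr/local/packages/mvapich2/1.9rc1/intel-14.0.2/bin/hydra_pmi_proxy | grep libimf'

If LD_LIBRARY_PATH comes back without the Intel lib directory and ldd
reports libimf.so as "not found", the proxy will fail exactly as in the
error quoted below.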

The most straightforward fix I can think of would be for Mehmet to rebuild
mvapich2 1.9 and 2.0, adding an rpath for
/usr/local/packages/intel/compiler/14.0.2/lib/intel64 to the LDFLAGS.
Something like the following:

./configure ... LDFLAGS='-Wl,-rpath,/usr/local/packages/intel/compiler/14.0.2/lib/intel64' ...

This will allow hydra_pmi_proxy to execute properly.
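
Once rebuilt, a quick sanity check along these lines should confirm it
(a sketch; the /path/to/new/install prefix is a placeholder for wherever
the rebuilt mvapich2 is installed):

# confirm the rpath made it into the proxy binary
readelf -d /path/to/new/install/bin/hydra_pmi_proxy | grep -iE 'rpath|runpath'

# confirm libimf.so now resolves even with LD_LIBRARY_PATH cleared
env LD_LIBRARY_PATH= ldd /path/to/new/install/bin/hydra_pmi_proxy | grep libimf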

On Thu, Mar 5, 2015 at 7:16 PM Mehmet Belgin <mehmet.belgin at oit.gatech.edu>
wrote:

> Hello all,
>
> We are using modules to set up env variables. However, when I run an MPI
> code across multiple nodes, the processes crash with errors complaining
> that they cannot find the dynamic libraries (i.e. bad LD_LIBRARY_PATH).
> E.g.:
>
> $ mpirun -np 64 hostname
> ...
> /usr/local/packages/mvapich2/1.9rc1/intel-14.0.2/bin/hydra_pmi_proxy:
> error while loading shared libraries: libimf.so: cannot open shared object
> file: No such file or directory
> libimf.so is provided by Intel, and it is in the $LD_LIBRARY_PATH defined
> by the Intel compiler module that is loaded on the node that MPI is
> launched from.
>
> I also tried with “-envall” (although I know it is the default) to no
> avail.
>
> Here’s the interesting part: when I include the "module load" statements
> in .bashrc (or other rc files), it runs without a problem, since they are
> sourced by the MPI processes upon login. However, isn’t mpirun (which
> points to mpiexec.hydra) supposed to forward all of my env settings to all
> nodes automatically? I confirmed that the lib path is correct:
>
> $ env | grep LD_LIBRARY_PATH
>
> LD_LIBRARY_PATH=/gpfs/packages/python/2.7.2/intel-14.0.2/lib:/usr/local/packages//mvapich2/1.9rc1/intel-14.0.2/lib:/usr/local/packages/intel/compiler/14.0.2/lib/intel64:/usr/local/packages/python/2.5.1/lib:/opt/torque/current/lib:/opt/oracle/current/lib:/usr/local/packages/hwloc/1.5/lib
> DYLD_LIBRARY_PATH=/usr/local/packages/intel/compiler/14.0.2/lib/intel64
> $ ls /usr/local/packages/intel/compiler/14.0.2/lib/intel64 | grep libimf.so
> libimf.so
>
> I will appreciate any help!
>
> Thanks,
>
> -Mehmet