[mvapich-discuss] mpiexec.hydra "envall" functionality broken across multiple nodes

Mehmet Belgin mehmet.belgin at oit.gatech.edu
Thu Mar 5 19:16:08 EST 2015


Hello all,

We are using modules to set up env variables. However, when I run an MPI code across multiple nodes, the processes crash with errors complaining that they cannot find the dynamic libraries (i.e., a bad LD_LIBRARY_PATH). For example:

$ mpirun -np 64 hostname
...
/usr/local/packages/mvapich2/1.9rc1/intel-14.0.2/bin/hydra_pmi_proxy: error while loading shared libraries: libimf.so: cannot open shared object file: No such file or directory
...

libimf.so is provided by Intel, and it is in the $LD_LIBRARY_PATH defined by the Intel compiler module that is loaded on the node MPI is launched from.
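
For reference, the proxy's dependency can be confirmed with ldd on the launch node (output elided here):

$ ldd /usr/local/packages/mvapich2/1.9rc1/intel-14.0.2/bin/hydra_pmi_proxy | grep libimf

which resolves fine locally, so the problem appears to be the environment the proxy gets on the remote nodes.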

I also tried passing "-envall" explicitly (although I know it is the default), to no avail.
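
For completeness, I suppose I could force the variable onto the remote processes with an explicit "-genv" (a sketch, using my current shell value):

$ mpirun -np 64 -genv LD_LIBRARY_PATH "$LD_LIBRARY_PATH" hostname

Though if hydra_pmi_proxy itself fails to load, the failure presumably happens before any forwarded environment could apply, so I would rather understand why the default forwarding is not enough.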

Here's the interesting part: when I include the "module load" statements in .bashrc (or other rc files), everything runs without a problem, since those files are sourced by the remote shells at login. However, isn't mpirun (which points to mpiexec.hydra) supposed to forward all of my env settings to all nodes automatically? I confirmed that the lib path is correct on the launch node:

$ env | grep LD_LIBRARY_PATH
LD_LIBRARY_PATH=/gpfs/packages/python/2.7.2/intel-14.0.2/lib:/usr/local/packages//mvapich2/1.9rc1/intel-14.0.2/lib:/usr/local/packages/intel/compiler/14.0.2/lib/intel64:/usr/local/packages/python/2.5.1/lib:/opt/torque/current/lib:/opt/oracle/current/lib:/usr/local/packages/hwloc/1.5/lib
DYLD_LIBRARY_PATH=/usr/local/packages/intel/compiler/14.0.2/lib/intel64
$ ls /usr/local/packages/intel/compiler/14.0.2/lib/intel64 | grep libimf.so
libimf.so
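
If it is relevant, here is how I would check what a non-interactive remote shell sees (the node name below is just a placeholder for one of our compute nodes):

$ ssh <compute-node> 'echo $LD_LIBRARY_PATH'

I suspect that without the module loads in an rc file this comes back without the Intel paths, which would match the behavior above.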

I would appreciate any help!

Thanks,

-Mehmet