[mvapich-discuss] Problem when running a calculation on multiple nodes : libibverbs.so.1 can not open ...

Jonathan Perkins perkinjo at cse.ohio-state.edu
Fri Feb 5 18:28:44 EST 2010


On Fri, Feb 05, 2010 at 05:53:01PM -0500, irfan bulu wrote:
> We have a small cluster. 14 nodes in total. Fedora 7 x64 is installed. The
> system is equipped with infiniband and OFED 1.2 is installed. The system has
> shared directory structure (home, usr, etc. are shared). I built mvapich2-1.4
> with the following settings:
> ./configure --prefix=/usr/local/mvapich2 --enable-sharedlibs=gcc
> --with-rdma=gen2 --with-ib-libpath=/usr/lib64
> --with-ib-include=/usr/include/infiniband

Since the InfiniBand libraries are installed in the system path, you
don't need to specify the --with-ib-* options at all.  If you do keep
them, use --with-ib-include=/usr/include rather than
/usr/include/infiniband, since the headers are included as
<infiniband/verbs.h>.

> 
> I have confirmed that libibverbs.so.1.0.0 is in /usr/lib64 and verbs.h is in
> /usr/include/infiniband directory. After installing mvapich2 I first tried
> running one of the examples on the master node. Everything runs fine. But
> when I tried running the same example on multiple nodes, I am getting the
> following error:

Does libibverbs.so point to libibverbs.so.1 and libibverbs.so.1 point to
libibverbs.so.1.0.0?  What about on the other nodes that you're using?
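
One way to check the chain, as a minimal sketch: it assumes a POSIX shell
and GNU readlink on each node, and check_verbs_links is a hypothetical
helper name, not something shipped with OFED or MVAPICH2.

```shell
# Hypothetical helper: report how each libibverbs name resolves in a directory.
check_verbs_links() {
    dir="${1:-/usr/lib64}"
    for name in libibverbs.so libibverbs.so.1 libibverbs.so.1.0.0; do
        if [ -e "$dir/$name" ]; then
            # -e follows symlinks, so a dangling link is reported as MISSING
            echo "$name -> $(readlink -f "$dir/$name")"
        else
            echo "$name MISSING"
        fi
    done
}

check_verbs_links /usr/lib64

# To run the same check on a compute node from a bash login shell
# (node2-ib is the host name from the report):
#   ssh node2-ib "$(declare -f check_verbs_links); check_verbs_links /usr/lib64"
```

Run it on both master-ib and node2-ib and compare the output; any line
reading MISSING on one node but not the other is the place to look.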

> 
> $ /usr/local/mvapich2/bin/mpirun_rsh -ssh -np 2 master-ib node2-ib
> /home/user1/Desktop/mvapich2-1.2-2009-10-28/examples/cpi
> 
> /home/user1/Desktop/mvapich2-1.2-2009-10-28/examples/cpi: error while
> loading shared libraries: libibverbs.so.1: cannot open shared object file:
> No such file or directory
> MPI process terminated unexpectedly
> Exit code -5 signaled from node2-ib
> cleanupKilling remote processes...id: cannot find name for group ID 523

I'm not sure what is emitting the id message above.  Is there anything
special going on with your user having different group membership on the
compute nodes compared to your login node?  This probably doesn't matter
for this issue though.
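
If you want to rule the group question out, a small sketch like the
following shows whether a numeric GID maps to a name on a given node.  It
assumes getent is available (it is on glibc-based systems); describe_gid
is a hypothetical helper name.

```shell
# Hypothetical helper: print the group name for a numeric GID, if any.
describe_gid() {
    gid="$1"
    # getent consults /etc/group as well as NIS/LDAP, per nsswitch.conf
    entry=$(getent group "$gid" 2>/dev/null)
    if [ -n "$entry" ]; then
        echo "GID $gid: ${entry%%:*}"
    else
        echo "GID $gid: no name on this host"
    fi
}

describe_gid 523   # 523 is the GID from the error message above
```

Running it on the login node and on node2-ib would show whether the group
is defined on one and not the other.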

> 
> I have tried creating a symbolic link in  /usr/local/mvapich2/lib  and in
> /usr/local/mvapich2/bin directories to  /usr/lib64/libibverbs.so.1.0.0. But,
> I am still getting the same error. I tried running the same file with
> Openmpi. It runs fine without any error. Any help is appreciated. Thanks in
> advance.

This link won't have any effect.  libibverbs.so.1 needs to be found by
the runtime loader, via the symlink chain mentioned in my earlier inline
comment.  Check that the chain is correct on both nodes and let us know
what you find.
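
It can also help to ask the loader directly which libraries it cannot
resolve for the cpi binary on each node.  A minimal sketch, assuming ldd
is installed; missing_libs is a hypothetical helper name.

```shell
# Hypothetical helper: list shared libraries the loader cannot resolve.
missing_libs() {
    # ldd prints "libfoo.so.N => not found" for each unresolved dependency
    ldd "$1" 2>/dev/null | awk '/not found/ {print $1}'
}

# Path taken from the command line in the report
missing_libs /home/user1/Desktop/mvapich2-1.2-2009-10-28/examples/cpi
```

Empty output means every dependency resolved on that node; a line such as
libibverbs.so.1 means the loader on that node cannot find it.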

-- 
Jonathan Perkins
http://www.cse.ohio-state.edu/~perkinjo