[mvapich-discuss] Problem when running a calculation on multiple nodes : libibverbs.so.1 can not open ...

irfan bulu irfanbulu at gmail.com
Fri Feb 5 17:53:01 EST 2010


We have a small cluster of 14 nodes running Fedora 7 x64. The system is
equipped with InfiniBand, and OFED 1.2 is installed. The directory structure
is shared (home, usr, etc are shared). I built mvapich2-1.4 with the
following settings:
./configure --prefix=/usr/local/mvapich2 --enable-sharedlibs=gcc \
    --with-rdma=gen2 --with-ib-libpath=/usr/lib64 \
    --with-ib-include=/usr/include/infiniband

I have confirmed that libibverbs.so.1.0.0 is in /usr/lib64 and verbs.h is in
the /usr/include/infiniband directory. After installing mvapich2, I first ran
one of the examples on the master node, and everything ran fine. But when I
tried running the same example on multiple nodes, I got the following error:

$ /usr/local/mvapich2/bin/mpirun_rsh -ssh -np 2 master-ib node2-ib \
    /home/user1/Desktop/mvapich2-1.2-2009-10-28/examples/cpi

/home/user1/Desktop/mvapich2-1.2-2009-10-28/examples/cpi: error while loading shared libraries: libibverbs.so.1: cannot open shared object file: No such file or directory
MPI process terminated unexpectedly
Exit code -5 signaled from node2-ib
cleanupKilling remote processes...id: cannot find name for group ID 523
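A minimal diagnostic sketch, assuming passwordless ssh between the nodes (which mpirun_rsh -ssh already requires): the loader error means the dynamic linker could not resolve libibverbs.so.1 when starting cpi, and the exit message points at node2-ib. Running ldd on each host over a non-interactive ssh session, similar to how mpirun_rsh -ssh starts remote processes, lists which libraries are unresolved on which node. The script and the helper names unresolved and missing_libs are hypothetical; the program path is the one from the command above.

```shell
#!/bin/sh
# Sketch: for each host given on the command line, list the shared libraries
# that the dynamic loader on that host cannot resolve for the MPI program.
# PROG is the example path from the mpirun_rsh command; the helper names
# (unresolved, missing_libs) are hypothetical.
PROG=/home/user1/Desktop/mvapich2-1.2-2009-10-28/examples/cpi

# unresolved: read ldd output on stdin and print the name of every
# library that ldd reports as "not found"
unresolved() {
    awk '/=> not found/ { print $1 }'
}

# missing_libs HOST PROGRAM
#   HOST is an ssh destination, or "" for the local machine.
#   Prints the unresolved libraries of PROGRAM; ssh/ldd errors go to stderr.
missing_libs() {
    if [ -n "$1" ]; then
        # BatchMode: fail instead of prompting for a password
        ssh -o BatchMode=yes -o ConnectTimeout=5 "$1" ldd "$2"
    else
        ldd "$2"
    fi | unresolved
}

for host in "$@"; do
    echo "== $host =="
    missing_libs "$host" "$PROG"
done
```

Saved under a hypothetical name such as check-libs.sh, it would be run as `sh check-libs.sh master-ib node2-ib`. A host whose section lists libibverbs.so.1 is one where the loader cannot resolve that name.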

I have tried creating symbolic links to /usr/lib64/libibverbs.so.1.0.0 in the
/usr/local/mvapich2/lib and /usr/local/mvapich2/bin directories, but I am
still getting the same error. I also tried running the same program with
Open MPI, and it runs fine without any error. Any help is appreciated. Thanks
in advance.




-- 
Irfan

