[mvapich-discuss] Problem when running a calculation on multiple nodes : libibverbs.so.1 can not open ...

irfan bulu irfanbulu at gmail.com
Fri Feb 5 18:55:54 EST 2010


Thanks, Jonathan, for your reply. The head node has both libibverbs.so and
libibverbs.so.1, and both point to libibverbs.so.1.0.0. However, I couldn't
locate libibverbs.so.1.0.0 on the other nodes (nor libibverbs.so or
libibverbs.so.1.0). Can I just copy libibverbs.so.1.0.0 from the head node
to the other nodes and create the appropriate links?
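
A minimal sketch for checking, on any node, the symlink chain Jonathan asks about below. It assumes the libraries live in /usr/lib64 (as on the head node here), only inspects files, and changes nothing; the function name check_chain is hypothetical. The error message itself names libibverbs.so.1, so that link, and the file it points to, must resolve on every node that runs cpi.

```shell
#!/bin/sh
# Hypothetical helper: report how the libibverbs symlinks in a library
# directory resolve. Assumes the /usr/lib64 layout seen on the head node;
# it only inspects files and changes nothing.

check_chain() {
    dir=${1:-/usr/lib64}
    rc=0
    for name in libibverbs.so libibverbs.so.1; do
        path="$dir/$name"
        if [ -L "$path" ] && [ -e "$path" ]; then
            # Symlink whose final target exists (test -e follows links).
            echo "OK: $name -> $(readlink "$path")"
        elif [ -L "$path" ]; then
            # Symlink whose target is missing on this node.
            echo "BROKEN: $name -> $(readlink "$path")"
            rc=1
        elif [ -e "$path" ]; then
            # A regular file in place of the expected symlink.
            echo "FILE: $name (not a symlink)"
        else
            echo "MISSING: $name"
            rc=1
        fi
    done
    return $rc
}
```

For example, define the function on node2-ib and run check_chain /usr/lib64 there. Running ldd /home/user1/Desktop/mvapich2-1.2-2009-10-28/examples/cpi on the same node also lists any shared library it cannot resolve as "not found".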

On Fri, Feb 5, 2010 at 6:28 PM, Jonathan Perkins
<perkinjo at cse.ohio-state.edu> wrote:

> On Fri, Feb 05, 2010 at 05:53:01PM -0500, irfan bulu wrote:
> > We have a small cluster, 14 nodes in total, running Fedora 7 x64. The
> > system is equipped with InfiniBand, and OFED 1.2 is installed. The
> > system has a shared directory structure (home, usr, etc are shared). I
> > built mvapich2-1.4 with the following settings:
> > ./configure --prefix=/usr/local/mvapich2 --enable-sharedlibs=gcc \
> >     --with-rdma=gen2 --with-ib-libpath=/usr/lib64 \
> >     --with-ib-include=/usr/include/infiniband
>
> Since the InfiniBand libraries are installed in the system path, you
> don't need to specify the --with-ib-* options. In any case, you'd want
> to use --with-ib-include=/usr/include rather than
> /usr/include/infiniband.
>
> >
> > I have confirmed that libibverbs.so.1.0.0 is in /usr/lib64 and verbs.h
> > is in the /usr/include/infiniband directory. After installing mvapich2,
> > I first tried running one of the examples on the master node, and it
> > ran fine. But when I tried running the same example on multiple nodes,
> > I got the following error:
>
> Does libibverbs.so point to libibverbs.so.1 and libibverbs.so.1 point to
> libibverbs.so.1.0.0?  What about on the other nodes that you're using?
>
> >
> > $ /usr/local/mvapich2/bin/mpirun_rsh -ssh -np 2 master-ib node2-ib \
> >     /home/user1/Desktop/mvapich2-1.2-2009-10-28/examples/cpi
> >
> > /home/user1/Desktop/mvapich2-1.2-2009-10-28/examples/cpi: error while
> > loading shared libraries: libibverbs.so.1: cannot open shared object
> > file: No such file or directory
> > MPI process terminated unexpectedly
> > Exit code -5 signaled from node2-ib
> > cleanupKilling remote processes...id: cannot find name for group ID 523
>
> I'm not sure what is emitting the id message above. Does your user have
> a different group membership on the compute nodes than on the login
> node? This probably doesn't matter for this issue, though.
>
> >
> > I have tried creating a symbolic link to /usr/lib64/libibverbs.so.1.0.0
> > in the /usr/local/mvapich2/lib and /usr/local/mvapich2/bin directories,
> > but I am still getting the same error. I tried running the same program
> > with Open MPI, and it runs fine without any error. Any help is
> > appreciated. Thanks in advance.
>
> Creating that link won't have any effect. As mentioned in my earlier
> inline comment, libibverbs.so needs to be found. Check that this is
> correct on both nodes and let us know what you find.
>
> --
> Jonathan Perkins
> http://www.cse.ohio-state.edu/~perkinjo
>



-- 
Irfan