[mvapich-discuss] Re: [openfabrics-ewg] Announcing the release of MVAPICH2 0.9.8 with Checkpoint/Restart, iWARP, RDMA CM-based connection manageme

Shaun Rowland rowland at cse.ohio-state.edu
Thu Nov 16 19:52:05 EST 2006


david elsen wrote:
> Shaun,
> 
> On my system, the following is shown in /usr/local/lib :
> [root at ammasso1 lib]# ls -la

> -rwxr-xr-x  1 root root    837 Nov 16 10:20 librdmacm.la
> -rwxr-xr-x  1 root root  54472 Nov 16 10:20 librdmacm.so
> [root at ammasso1 lib]#
> 
> which is different from your system. It does not have the symlink. I am 
> not sure why is it so and how to correct it if it is not correct.

It did seem strange, which is why I thought this might be a shared
library installation issue, but it's not actually an error by itself. I
can easily create my own test shared library and just name it
libtest.so, and it works just fine. Your ldd and objdump output shows
that the mpdroot binary is expecting the exact name "librdmacm.so", so
it _should_ work. This is why I wanted to see that output: to determine
whether there was a shared library installation issue. That does not
seem to be the case here.

> Please see the output of ldd /usr/local/mvapich2/bin/mpdroot :
> [root at ammasso1 ~]# ldd /usr/local/mvapich2/bin/mpdroot
>         linux-gate.so.1 =>  (0xffffe000)
>         librdmacm.so => /usr/local/lib/librdmacm.so (0xb7fec000)
>         libibverbs.so.2 => /usr/local/lib/libibverbs.so.2 (0xb7fe5000)
>         libibumad.so.1 => /usr/local/lib/libibumad.so.1 (0xb7fdc000)
>         libpthread.so.0 => /lib/libpthread.so.0 (0x0012a000)
>         libc.so.6 => /lib/libc.so.6 (0x00ca7000)
>         libsysfs.so.2 => /usr/lib/libsysfs.so.2 (0x00369000)
>         libdl.so.2 => /lib/libdl.so.2 (0x00de6000)
>         libibcommon.so.1 => /usr/local/lib/libibcommon.so.1 (0xb7fcb000)
>         /lib/ld-linux.so.2 (0x002d8000)
> [root at ammasso1 ~]#
> [root at ammasso1 ~]#

If ldd can resolve the library, which it does here, then running the
mpdroot program normally should work. What happens if you run:

/usr/local/mvapich2/bin/mpdroot

directly? Do you get an error message from the program or do you get an
error about not being able to load librdmacm? Also, just to be sure,
does this fail on the same machine right after "ldd" has found
librdmacm.so?

If so, then I think it has to be some other system problem in resolving
the library, and in that case I am not sure what it could be. I can't
reproduce that sort of condition locally.

You could run "file /usr/local/lib/librdmacm.so" to make sure it's a
32-bit binary or "ldd /usr/local/lib/librdmacm.so" to make sure it can
find all of its shared library dependencies, but from what I am seeing
so far, those should show what is expected.
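
The two checks above could be wrapped in one small shell function. This
is only a sketch: the name check_lib is made up for illustration, and it
assumes the "file" and "ldd" utilities are installed (each is skipped if
it is missing).

```shell
# check_lib is a hypothetical helper, not part of MVAPICH2 or librdmacm.
# Usage: check_lib /usr/local/lib/librdmacm.so
check_lib() {
    lib=$1
    # A missing or unreadable file cannot be loaded at all.
    if [ ! -r "$lib" ]; then
        echo "cannot read $lib" >&2
        return 1
    fi
    # Show the ELF class (32-bit or 64-bit); -L follows symlinks.
    if command -v file >/dev/null 2>&1; then
        file -L "$lib"
    fi
    # List the library's own dependencies; fail if any are unresolved.
    if command -v ldd >/dev/null 2>&1; then
        if ldd "$lib" 2>&1 | grep 'not found'; then
            return 2
        fi
    fi
    return 0
}
```

A return value of 1 means the file itself could not be read, and 2 means
one of its dependencies was reported as "not found".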

> See below the output of objdump -x /usr/local/mvapich2/bin/mpdroot :
> 
> [root at ammasso1 ~]# objdump -x /usr/local/mvapich2/bin/mpdroot

> Dynamic Section:
>   NEEDED      librdmacm.so
>   NEEDED      libibverbs.so.2
>   NEEDED      libibumad.so.1
>   NEEDED      libpthread.so.0
>   NEEDED      libc.so.6

This makes sense because you only have the librdmacm.so file, so it
should be finding it. The "ldd" output shows that it should be finding
it at runtime as well.
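
To compare the recorded names against what is installed, the NEEDED
entries can be pulled out of the objdump output. A minimal sketch,
assuming awk is available; needed_libs is a hypothetical name:

```shell
# needed_libs is a hypothetical helper: it reads objdump output on stdin
# and prints the library name from each NEEDED line of the dynamic section.
needed_libs() {
    awk '$1 == "NEEDED" { print $2 }'
}

# Usage: objdump -x /usr/local/mvapich2/bin/mpdroot | needed_libs
```

Each name printed is exactly what the dynamic loader will search for, so
"librdmacm.so" here matches the file name in /usr/local/lib.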

> It seems like  shared library installation issue. But if you can help me 
> to root cause the issue, I will appreciate it.
> 
> I am using Fedora 6 with 2.6.17.13 . See below:
> [root at ammasso1 ~]# uname -a
> Linux ammasso1.qlogic.org 2.6.17.13 #1 SMP Wed Nov 8 17:34:14 PST 2006 
> i686 i686 i386 GNU/Linux
> [root at ammasso1 ~]#
> 
> 
> I have sort of good news also. I tried to run the Mvapich2 on other 
> system with SLES10 installed, I did not see this issue there. I am still 
> trying to investigate it further.

On what system was librdmacm built? In any case, it does not seem to
be a problem with the library itself because the error message says it
cannot open the shared object file because there is no such file or
directory. As long as your user account can read that file, it should
load as shown by "ldd". Just to be sure:

1. Run "ldd /usr/local/mvapich2/bin/mpdroot" again. If it shows
librdmacm.so resolved to a path, as above, then:

2. Run /usr/local/mvapich2/bin/mpdboot.

Do you get the same error? In this case, I just want to be sure that
your account can find the library with ldd and then still fails to
execute /usr/local/mvapich2/bin/mpdboot in the exact same session.

If this happens, I am at a loss. It should work.
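
The two steps above could be run back to back from one shell with
something like the following. This is a sketch under assumptions: both
function names are made up, and it relies on ldd marking an unresolved
library with "=> not found".

```shell
# Both helpers are hypothetical names used only for this sketch.

# Read ldd output on stdin and print each library reported as "not found".
unresolved_libs() {
    awk '/=> not found/ { print $1 }'
}

# Check a binary with ldd, then run a command from the same shell session.
# Usage: ldd_then_run /usr/local/mvapich2/bin/mpdroot \
#            /usr/local/mvapich2/bin/mpdboot
ldd_then_run() {
    bin=$1
    shift
    if [ ! -x "$bin" ]; then
        echo "not executable: $bin" >&2
        return 1
    fi
    missing=$(ldd "$bin" 2>/dev/null | unresolved_libs)
    if [ -n "$missing" ]; then
        echo "ldd could not resolve: $missing" >&2
        return 2
    fi
    # ldd resolved everything, so a loader error from the command below
    # would have to come from somewhere other than the search ldd just did.
    "$@"
    echo "exit status: $?"
}
```

Running it as the same user and in the same session keeps the
environment identical between the ldd check and the actual launch.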
-- 
Shaun Rowland	rowland at cse.ohio-state.edu
http://www.cse.ohio-state.edu/~rowland/

