[mvapich-discuss] Problems with running mpi applications

Abhinav Vishnu vishnu at cse.ohio-state.edu
Wed Jan 10 15:44:18 EST 2007


Hi Eric,

Thanks for trying MVAPICH and reporting the problem to us.

* On Jan,1 Eric Zhang <maillistbox at 126.com> wrote:
> Hi, mvapich-discuss:
> 
> I am new to MVAPICH. I have a cluster running CentOS 4.4, which
> includes the OFED packages (under the /usr/ofed directory). All InfiniBand
> drivers and libraries have been installed, and I have configured IPoIB;
> it works well.

Thanks for this information. Can you please let us know whether you have
run any tests to verify the functionality of IPoIB? Please also provide
the following information, if possible:

1. Version of OFED
2. MVAPICH version (tarball /svn)
3. Is it possible that your machines also have a previous (possibly old)
installation of InfiniBand libraries? In that case, a different library
installation may be picked up at run time, and some of the symbols (like
ib_error_str) may not be present. Please let us know if this is the case.
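To check point 3 on a node, a minimal sketch like the following could help. It assumes a POSIX shell; the helper name `ib_lib_dirs` and its default directory list are hypothetical, so pass your own directories if your layout differs. It prints every directory that holds a copy of libibverbs:

```sh
#!/bin/sh
# Hypothetical helper: print each directory that contains a copy of
# libibverbs, so a stale InfiniBand installation shows up next to OFED.
# The default search list below is an assumption; override it with arguments.
ib_lib_dirs() {
    if [ "$#" -eq 0 ]; then
        set -- /usr/lib /usr/lib64 /usr/local/lib /usr/local/lib64 /usr/ofed/lib64
    fi
    for dir in "$@"; do
        for lib in "$dir"/libibverbs.so*; do
            # When nothing matches, the glob stays literal, so test existence.
            if [ -e "$lib" ]; then
                echo "$dir"
                break
            fi
        done
    done
}
```

If it prints more than one directory on a node, the dynamic linker may resolve a different copy at run time than the one MVAPICH was built against. On a real node, `ldd ./mpihello | grep ibverbs` shows which copy is actually resolved, and `/sbin/ldconfig -p | grep libibverbs` lists what the linker cache knows about.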

I think the best way to debug this would be to make sure that the
verbs-level tests are working correctly on the cluster. Since your
installation is in /usr/ofed, you should have some compiled verbs-level
tests in /usr/ofed/bin. May I ask you to do the following and send us
the results of your experiments?

For every pair of nodes you are using (say node0 and node1), please
first execute the following command on node0:

% /usr/ofed/bin/ib_rdma_bw

and execute the following on node1:

% /usr/ofed/bin/ib_rdma_bw node0

Please let us know whether you see the same errors as with the
MPI-level tests.
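With more than two nodes, the per-pair procedure above can be scripted. Below is a minimal sketch, assuming passwordless ssh between the nodes. The helper name `rdma_pair_cmds` and the 2-second start-up delay are hypothetical choices. It only prints the commands for every pair (first node as server, second as client) rather than running them:

```sh
#!/bin/sh
# Hypothetical helper: for every pair of the given nodes, print the commands
# that start ib_rdma_bw as a server on the first node and as a client on the
# second. "ssh -n" keeps ssh from reading the command stream on stdin.
IB_RDMA_BW=/usr/ofed/bin/ib_rdma_bw

rdma_pair_cmds() {
    while [ "$#" -gt 1 ]; do
        server=$1
        shift
        for client in "$@"; do
            echo "ssh -n $server $IB_RDMA_BW &"
            echo "sleep 2"    # assumed delay so the server is listening first
            echo "ssh -n $client $IB_RDMA_BW $server"
            echo "wait"       # let the backgrounded server finish
        done
    done
}
```

Running `rdma_pair_cmds node0 node1 node2 | sh` would test the pairs one after another. It assumes that each ib_rdma_bw server exits after serving one client; if it does not, the `wait` line will block.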

PS: My replies to your questions are inline, further down in this
e-mail.

> 
> Now I am trying to install MVAPICH so that I can run my MPI applications
> over InfiniBand. I modified the make.mvapich.gen2 script (set IBHOME to
> /usr/ofed, and set IBHOMELIB to /usr/ofed/lib64; this directory
> contains libibverbs.so, libibcommon.so, etc.), and the installation was
> successful (MVAPICH recognized my HCA adapter, a Mellanox PCI-Express
> SDR card, and there seemed to be no errors during configure, make and
> install).
> 
> Then I wrote a simple mpihello.c program to verify the installation.
> This program just prints "helloworld" from every process. I used mpicc
> to compile it, and when I run it, the following problem occurs:
> 
> [eric at cfx1 testcodes]$ /usr/local/mvapich/bin/mpirun -np 4 -hostfile hostfile2 mpihello
> libibverbs: Warning: couldn't load driver /usr/ofed/lib64/infiniband/libopensm.so: /usr/ofed/lib64/infiniband/libopensm.so: undefined symbol: ib_error_str
> libibverbs: Warning: couldn't load driver /usr/ofed/lib64/infiniband/libopensm.so: /usr/ofed/lib64/infiniband/libopensm.so: undefined symbol: ib_error_str
> libibverbs: Warning: couldn't load driver /usr/ofed/lib64/infiniband/libopensm.so: /usr/ofed/lib64/infiniband/libopensm.so: undefined symbol: ib_error_str
> libibverbs: Warning: couldn't load driver /usr/ofed/lib64/infiniband/libopensm.so: /usr/ofed/lib64/infiniband/libopensm.so: undefined symbol: ib_error_str
> libibverbs: Warning: couldn't load driver /usr/ofed/lib64/infiniband/libosmcomp-1.2.1.so: /usr/ofed/lib64/infiniband/libosmcomp-1.2.1.so: undefined symbol: osm_log
> libibverbs: Warning: couldn't load driver /usr/ofed/lib64/infiniband/libosmcomp.so: /usr/ofed/lib64/infiniband/libosmcomp.so: undefined symbol: osm_log
> libibverbs: Warning: couldn't load driver /usr/ofed/lib64/infiniband/libosmvendor-1.2.1.so: /usr/ofed/lib64/infiniband/libosmvendor-1.2.1.so: undefined symbol: ib_error_str
> libibverbs: Warning: couldn't load driver /usr/ofed/lib64/infiniband/libosmvendor.so: /usr/ofed/lib64/infiniband/libosmvendor.so: undefined symbol: ib_error_str
> libibverbs: Warning: couldn't load driver /usr/ofed/lib64/infiniband/libosmvendor_openib.so: /usr/ofed/lib64/infiniband/libosmvendor_openib.so: undefined symbol: ib_error_str
> mpirun: executable version 1 does not match our version 3.
> done.
> 
> I have two questions here:
> 
> 1. Why does libibverbs look for the libraries in the
> /usr/ofed/lib64/infiniband directory?

I think this directory may be present in your LD_LIBRARY_PATH. 
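One quick way to check this is to print LD_LIBRARY_PATH one entry per line. The minimal sketch below assumes a POSIX shell, and the helper name `show_ld_path` is hypothetical. It flags any entry that ends in an "infiniband" subdirectory:

```sh
#!/bin/sh
# Hypothetical helper: list LD_LIBRARY_PATH entries one per line and mark
# those that point into an "infiniband" subdirectory.
show_ld_path() {
    old_ifs=$IFS
    IFS=:
    # Unquoted on purpose: IFS=: splits the value at each colon.
    for entry in ${LD_LIBRARY_PATH-}; do
        case $entry in
            */infiniband|*/infiniband/) echo "$entry  <-- infiniband subdirectory" ;;
            *) echo "$entry" ;;
        esac
    done
    IFS=$old_ifs
}
```

Running `show_ld_path` on each node, in the same kind of shell that mpirun uses to start processes, makes it easy to compare the paths across machines.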

> The libraries are under the /usr/ofed/lib64 directory, but even after I
> copied all the library files into /usr/ofed/lib64/infiniband, the
> problems still exist.
> 
> 2. What do the error messages listed above mean, and how can I solve
> them? I have also tried the command:
> /usr/local/mvapich/bin/mpirun_rsh -np 4 -hostfile ./hostfile2 ./mpihello
> but it also fails with the same error message.
> 
> Thanks. Any suggestions are greatly appreciated.

I think the best way forward would be to make sure that previous
installations of InfiniBand are not present on any of the systems you
are trying to use (unless you need them for some reason). Also, please
make sure that LD_LIBRARY_PATH and PATH point to the location of your
current InfiniBand installation. Then do a fresh re-install of OFED
(version 1.1 is the latest as of now) and let us know the status.
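As a minimal sketch of the PATH/LD_LIBRARY_PATH step, the function below puts one installation first on both variables. It assumes a POSIX shell. The helper name `use_ib_prefix` is hypothetical. The default prefix /usr/ofed and the lib64 layout are taken from this thread and may differ elsewhere:

```sh
#!/bin/sh
# Hypothetical helper: put one InfiniBand installation first on PATH and
# LD_LIBRARY_PATH. The prefix defaults to /usr/ofed; the lib64 subdirectory
# matches the layout reported in this thread and is an assumption elsewhere.
use_ib_prefix() {
    prefix=${1:-/usr/ofed}
    PATH="$prefix/bin:$PATH"
    # Append the old value only if it was set and non-empty.
    LD_LIBRARY_PATH="$prefix/lib64${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
    export PATH LD_LIBRARY_PATH
}
```

Because mpirun_rsh starts the remote processes through rsh/ssh, the same settings typically need to go into a shell startup file (for example ~/.bashrc) on every node, not only into the interactive shell.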

Thanks,

:- Abhinav
> 
> Eric Zhang
> 2007-01-10
> 
> _______________________________________________
> mvapich-discuss mailing list
> mvapich-discuss at cse.ohio-state.edu
> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss

