[mvapich-discuss] Problems with running mpi applications

Eric Zhang maillistbox at 126.com
Wed Jan 10 20:35:55 EST 2007


Hi, Abhinav:

    Thank you for your reply. I am very glad to provide some information 
to you.

    1. About verifying the functionality of IPoIB: I used "ping" to 
test IPoIB, and it works well. I also ran an MPI program over IPoIB, 
and that works well too.

    2. The version of OFED: starting with CentOS 4.4 (Red Hat Enterprise 
Linux AS 4 Update 4), the OFED package is bundled with the OS 
distribution. The bundled OFED version is 1.0.

    3. The version of MVAPICH -- 0.9.8, the latest stable version 
downloaded from ohio-state.edu.

    4. I am sure that no other InfiniBand libraries are installed, 
because I have not installed any; I only installed the OS. I also ran 
the command "locate libib*", and all of the files it found are under 
the /usr/ofed/lib64 directory.

    5. The experiment results (ib_rdma_bw): I will run this test this 
weekend, because the cluster is not at my workplace right now. :(

    6. About my first question ("1. Why does libibverbs look for the 
libraries in the /usr/ofed/lib64/infiniband directory?"): I have not 
added "/usr/ofed/lib64/infiniband" to my LD_LIBRARY_PATH, and I have 
not configured ld.so.conf either. To me, this is the strangest part 
of the problem.

    7. About my second question: as you suggested, I will try setting 
LD_LIBRARY_PATH and PATH to the directories /usr/ofed/lib64 and 
/usr/ofed/lib64/infiniband. I will let you know as soon as I have 
results.
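
    The settings in point 7 could be sketched as follows. This is a 
minimal sketch, not a verified fix: prepend_dir is a hypothetical 
helper name, /usr/ofed is the prefix mentioned in this thread, and 
putting /usr/ofed/bin (rather than a library directory) on PATH is an 
assumption based on where ib_rdma_bw is said to live.

```shell
#!/bin/sh
# prepend_dir DIR LIST: print LIST with DIR added at the front,
# unless DIR is already one of its colon-separated entries.
# (Hypothetical helper, not part of OFED or MVAPICH.)
prepend_dir() {
    dir=$1
    list=$2
    case ":$list:" in
        *":$dir:"*) printf '%s\n' "$list" ;;             # already present
        *)          printf '%s\n' "$dir${list:+:$list}" ;; # add at the front
    esac
}

OFED_HOME=/usr/ofed   # install prefix named in this thread

# Library directories named in point 7; lib64 ends up first.
LD_LIBRARY_PATH=$(prepend_dir "$OFED_HOME/lib64/infiniband" "${LD_LIBRARY_PATH:-}")
LD_LIBRARY_PATH=$(prepend_dir "$OFED_HOME/lib64" "$LD_LIBRARY_PATH")

# Assumption: executables such as ib_rdma_bw are under $OFED_HOME/bin.
PATH=$(prepend_dir "$OFED_HOME/bin" "$PATH")

export LD_LIBRARY_PATH PATH
```

    After sourcing this, "ldd ./mpihello" shows which libibverbs the 
binary actually resolves. Because the MPI processes are started on 
remote nodes, the same settings probably also need to be in the shell 
startup file on every node in the hostfile.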

    Thanks, Abhinav. Your suggestions are very helpful. I suspect that 
something is wrong with the OFED bundled in CentOS 4.4, so I will try to 
install OFED 1.1 manually as soon as possible.
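
    Before replacing the bundled OFED, the check from point 4 can be 
made a little stricter. The sketch below filters a list of paths and 
prints anything outside the expected tree; outside_prefix is a 
hypothetical helper name, and /usr/ofed/lib64 is the location reported 
above.

```shell
#!/bin/sh
# outside_prefix PREFIX: read paths from stdin, one per line, and print
# those that are NOT located under PREFIX.
# (Hypothetical helper, not part of OFED or MVAPICH.)
outside_prefix() {
    prefix=$1
    while IFS= read -r path; do
        case "$path" in
            "$prefix"/*) ;;                     # under the expected tree: skip
            *) printf '%s\n' "$path" ;;         # anywhere else: report it
        esac
    done
}

# Intended use (not run here, since locate may not be installed):
#   locate 'libib*' | outside_prefix /usr/ofed/lib64
```

    Quoting the pattern ('libib*') keeps the shell from expanding it 
against files in the current directory before locate sees it. No output 
means that every match is under /usr/ofed/lib64. The locate database can 
be stale, so running updatedb first gives a more reliable answer.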

Eric Zhang
2007-01-11

> Hi Eric,
>
> Thanks for trying MVAPICH and reporting the problem to us.
>
> * On Jan,1 Eric Zhang<maillistbox at 126.com> wrote :
>   
>> Hi, mvapich-discuss:
>>
>> I am new to MVAPICH. I have a cluster running CentOS 4.4, which
>> bundles the OFED packages (under the /usr/ofed directory). All
>> InfiniBand drivers and libraries have been installed, and I have
>> configured IPoIB, which also works well.
>>     
>
> Thanks for this information. Can you please let us know if you tried
> any tests for verifying the functionality of IPoIB? Please also provide
> the following information, if possible:
>
> 1. Version of OFED
> 2. MVAPICH version (tarball /svn)
> 3. Is it possible that your machines also have a previous installation
> (possibly old) of InfiniBand libraries? In that case, the different
> library installations may be picked up at run time, and some
> of the symbols (like ib_error_str) may not be present. Please let us
> know if this is the case.
>
> I think the best way to debug would be to make sure that the verbs-level
> tests are working correctly on the cluster. Since your installation is
> in /usr/ofed, you should have some compiled verbs-level tests in
> /usr/ofed/bin. May I ask you to do the following and send us the
> results of your experiments?
>
> For every pair of nodes you are using (say node0 and node1), 
> please first execute the following command on node0:
>
> %/usr/ofed/bin/ib_rdma_bw
>
> and execute the following on node1:
>
> %/usr/ofed/bin/ib_rdma_bw node0
>
> Please let us know if you see the same errors as you are seeing with the
> MPI level tests. 
>
> PS: The replies to your questions are inline in the later part of the
> e-mail.
>
>   
>> Now I am trying to install MVAPICH so that I can run my MPI applications
>> over InfiniBand. I modified the make.mvapich.gen2 script (setting IBHOME
>> to /usr/ofed and IBHOMELIB to /usr/ofed/lib64, the directory that
>> contains libibverbs.so, libibcommon.so, etc.). The installation was
>> successful: MVAPICH recognized my HCA (a Mellanox PCI-Express SDR
>> adapter), and there appeared to be no errors during configure, make, and
>> install.
>>
>> Then I wrote a simple mpihello.c program to verify the installation.
>> This program just prints "helloworld" from every process. I compiled it
>> with mpicc, and when I run it, the problem occurs:
>>
>> [eric at cfx1 testcodes]$ /usr/local/mvapich/bin/mpirun -np 4 -hostfile
>> hostfile2 mpihello
>> libibverbs: Warning: couldn't load driver
>> /usr/ofed/lib64/infiniband/libopensm.so:
>> /usr/ofed/lib64/infiniband/libopensm.so: undefined symbol: ib_error_str
>> libibverbs: Warning: couldn't load driver
>> /usr/ofed/lib64/infiniband/libopensm.so:
>> /usr/ofed/lib64/infiniband/libopensm.so: undefined symbol: ib_error_str
>> libibverbs: Warning: couldn't load driver
>> /usr/ofed/lib64/infiniband/libopensm.so:
>> /usr/ofed/lib64/infiniband/libopensm.so: undefined symbol: ib_error_str
>> libibverbs: Warning: couldn't load driver
>> /usr/ofed/lib64/infiniband/libopensm.so:
>> /usr/ofed/lib64/infiniband/libopensm.so: undefined symbol: ib_error_str
>> libibverbs: Warning: couldn't load driver
>> /usr/ofed/lib64/infiniband/libosmcomp-1.2.1.so:
>> /usr/ofed/lib64/infiniband/libosmcomp-1.2.1.so: undefined symbol: osm_log
>> libibverbs: Warning: couldn't load driver
>> /usr/ofed/lib64/infiniband/libosmcomp.so:
>> /usr/ofed/lib64/infiniband/libosmcomp.so: undefined symbol: osm_log
>> libibverbs: Warning: couldn't load driver
>> /usr/ofed/lib64/infiniband/libosmvendor-1.2.1.so:
>> /usr/ofed/lib64/infiniband/libosmvendor-1.2.1.so: undefined symbol:
>> ib_error_str
>> libibverbs: Warning: couldn't load driver
>> /usr/ofed/lib64/infiniband/libosmvendor.so:
>> /usr/ofed/lib64/infiniband/libosmvendor.so: undefined symbol: ib_error_str
>> libibverbs: Warning: couldn't load driver
>> /usr/ofed/lib64/infiniband/libosmvendor_openib.so:
>> /usr/ofed/lib64/infiniband/libosmvendor_openib.so: undefined symbol:
>> ib_error_str
>> mpirun: executable version 1 does not match our version 3.
>> done.
>>
>> I have two questions here:
>>
>> 1. Why does libibverbs look for the libraries in the
>> /usr/ofed/lib64/infiniband directory? 
>>     
>
> I think this directory may be present in your LD_LIBRARY_PATH. 
>
>   
>> The libraries are under the /usr/ofed/lib64 directory, so I copied all
>> of the library files into /usr/ofed/lib64/infiniband as well, but the
>> problems still exist.
>>
>> 2. What do the error messages listed above mean, and how can I fix
>> them? I have also tried the command: /usr/local/mvapich/bin/mpirun_rsh
>> -np 4 -hostfile ./hostfile2 ./mpihello , which also fails with the same
>> error messages.
>>
>> Thanks. Any suggestions are greatly appreciated.
>>     
>
> I think the best way would be to make sure that previous installations
> of InfiniBand are not present on any of the systems, which you are
> trying to use (Unless you need them for some reason). Also, please make
> sure that LD_LIBRARY_PATH and PATH are pointing to your current location
> of InfiniBand installation. Then do a fresh re-install of the OFED
> (version 1.1 is the latest as of now) and please let us know the status.
>
> Thanks,
>
> :- Abhinav
>   
>> Eric Zhang
>> 2007-01-10
>>
>> _______________________________________________
>> mvapich-discuss mailing list
>> mvapich-discuss at cse.ohio-state.edu
>> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>>     
>
>
>   



