[mvapich-discuss] Running latency tests (fwd)

wei huang huanwei at cse.ohio-state.edu
Thu Apr 10 13:49:50 EDT 2008


Hi Chris,

You have to make sure the related kernel modules are loaded (including
rdma_ucm, ib_uverbs, ib_mthca, etc.). Thanks.
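
A rough way to check this on each node is to compare the module names
against /proc/modules. The sketch below is only an illustration: the
function name missing_modules and the MODULES_FILE override are made up
for this example and are not part of MVAPICH2 or OFED.

```shell
#!/bin/sh
# Hypothetical helper: print the names of the given kernel modules that
# are NOT currently loaded. MODULES_FILE defaults to /proc/modules; the
# override exists only so the function can be tried against a saved copy.
missing_modules() {
    modules_file="${MODULES_FILE:-/proc/modules}"
    missing=""
    for mod in "$@"; do
        # /proc/modules has one module per line, with the name in field 1.
        if ! awk -v m="$mod" '$1 == m { found = 1 } END { exit !found }' "$modules_file"; then
            missing="$missing $mod"
        fi
    done
    # Strip the leading space; prints an empty line if nothing is missing.
    echo "${missing# }"
}
```

Running "missing_modules rdma_ucm ib_uverbs ib_mthca" prints whichever
of those are not loaded; each can then be loaded as root with
"modprobe <name>". Note that ib_mthca is the driver for Mellanox
InfiniHost adapters, so other HCAs will need a different driver module.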

Regards,
Wei Huang

774 Dreese Lab, 2015 Neil Ave,
Dept. of Computer Science and Engineering
Ohio State University
OH 43210
Tel: (614)292-8501


On Thu, 10 Apr 2008, Christopher Tanner wrote:

> Ok Wei -
>
> Even though I've copied the libib* libraries from the master node to
> all of the other nodes and included the /usr/local/lib directory in
> the LD_LIBRARY_PATH, it seems that osu_latency still cannot find
> libibverbs.so.1. I'm kind of stuck... Any thoughts?
>
> Also, whenever I try to execute osu_latency using just one core on the
> master node (mpiexec -n 1 ./osu_latency), I receive the following error:
>
> libibverbs: Fatal: couldn't read uverbs ABI version.
> Fatal error in MPI_Init:
> Other MPI error, error stack:
> MPIR_Init_thread(259)...........: Initialization failed
> MPID_Init(102)..................: channel initialization failed
> MPIDI_CH3_Init(178).............:
> MPIDI_CH3I_RMDA_init(115).......: rdma_get_control_parameters
> rdma_get_control_parameters(432):
> rdma_open_hca(367)..............: No IB device found
> rank 0 in job 15  master.cl.ae.gatech.edu_42042   caused collective
> abort of all ranks
>    exit status of rank 0: return code 1
>
> Does this output help solve the other problem?
>
> -------------------------------------------
> Chris Tanner
> Space Systems Design Lab
> Georgia Institute of Technology
> christopher.tanner at gatech.edu
> -------------------------------------------
>
>
>
> On Apr 10, 2008, at 11:53 AM, wei huang wrote:
> >
> > Do you see the same error?
> >
> > Try:
> > export LD_LIBRARY_PATH=/usr/local/lib64:$LD_LIBRARY_PATH
> >
> > Regards,
> > Wei Huang
> >
> > 774 Dreese Lab, 2015 Neil Ave,
> > Dept. of Computer Science and Engineering
> > Ohio State University
> > OH 43210
> > Tel: (614)292-8501
> >
> >
> > On Thu, 10 Apr 2008, Christopher Tanner wrote:
> >
> >> Thanks Wei. Of course, the problem isn't solved yet...
> >>
> >> So I found the file in the /usr/local/lib64 directory on the master
> >> node only. I copied the file to the /usr/local/lib64 directory on
> >> the rest of the nodes and included the directory in my path. When I
> >> tried to execute the osu_latency program, it gave me the same
> >> error. A 'which librdmacm.so.1' command reveals that it can indeed
> >> find the library.
> >>
> >> Any clues? Or perhaps, any other ways to determine if the Infiniband
> >> is working?
> >>
> >> -------------------------------------------
> >> Chris Tanner
> >> Space Systems Design Lab
> >> Georgia Institute of Technology
> >> christopher.tanner at gatech.edu
> >> -------------------------------------------
> >>
> >>
> >>
> >> On Apr 10, 2008, at 11:18 AM, wei huang wrote:
> >>> Hi Chris,
> >>>
> >>> It seems that some ib libraries are not in your default path. You
> >>> may need to explicitly export the path to the ib libraries in your
> >>> environment variables (bash profile or similar places). To find
> >>> where those libraries are, you can check the /etc/infiniband/info
> >>> file. Or you can ask your system administrator about the path.
> >>>
> >>> Thanks.
> >>>
> >>> Regards,
> >>> Wei Huang
> >>>
> >>> 774 Dreese Lab, 2015 Neil Ave,
> >>> Dept. of Computer Science and Engineering
> >>> Ohio State University
> >>> OH 43210
> >>> Tel: (614)292-8501
> >>>
> >>>
> >>> On Thu, 10 Apr 2008, Dhabaleswar Panda wrote:
> >>>
> >>>> ---------- Forwarded message ----------
> >>>> Date: Wed, 9 Apr 2008 20:01:00 -0400
> >>>> From: Christopher Tanner <christopher.tanner at gatech.edu>
> >>>> To: mvapich-discuss at cse.ohio-state.edu
> >>>> Subject: [mvapich-discuss] Running latency tests
> >>>>
> >>>> All -
> >>>>
> >>>> I believe I am gravy with the mvapich2 install so now I'm trying to
> >>>> run the latency tests to see if it's really working. But, I'm a
> >>>> dummy and can't get it to work. Here's what I've done so far:
> >>>>
> >>>> a) Initiated an mpd ring with 16 hosts (i.e. mpdboot --rsh=rsh -n 16
> >>>> -1). I have multiple processors, each with multiple cores on each
> >>>> node, thus the '-1'.
> >>>> b) Compiled osu_latency.c using mpicc (to an executable called
> >>>> osu_latency)
> >>>> c) Tried to execute the compiled file via 'mpiexec -machinefile
> >>>> machine.list -n 16 ./osu_latency'
> >>>>
> >>>> I receive the following error (16 times naturally):
> >>>> ./osu_latency: error while loading shared libraries: librdmacm.so.1:
> >>>> cannot open shared object file: No such file or directory
> >>>>
> >>>> I don't know where this file would be -- it's not in /usr/lib with
> >>>> all of the other *.so.* files.
> >>>> Any thoughts? Thanks.
> >>>>
> >>>> -------------------------------------------
> >>>> Chris Tanner
> >>>> Space Systems Design Lab
> >>>> Georgia Institute of Technology
> >>>> christopher.tanner at gatech.edu
> >>>> -------------------------------------------
> >>>>
> >>>>
> >>>>
> >>>> On Apr 9, 2008, at 2:17 PM, Matthew Koop wrote:
> >>>>> Hi Fred,
> >>>>>
> >>>>> If InfiniBand is not working then the job will not run. There is
> >>>>> currently no method by which it will fall back to TCP/IP.
> >>>>>
> >>>>> Does this answer your question?
> >>>>>
> >>>>> Matt
> >>>>>
> >>>>> On Wed, 9 Apr 2008, Stecher, Fred wrote:
> >>>>>
> >>>>>> Hi,
> >>>>>> When I installed MVAPICH, I used the default. If InfiniBand is
> >>>>>> not working, will my executable still run?
> >>>>>>
> >>>>>> Thanks,
> >>>>>>
> >>>>>> Fred
> >>>>>>
> >>>>>>
> >>>>>
> >>>>> _______________________________________________
> >>>>> mvapich-discuss mailing list
> >>>>> mvapich-discuss at cse.ohio-state.edu
> >>>>> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
> >>>>
> >>>
> >>>
> >>
> >
> >
>
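
On the earlier loader errors quoted above (librdmacm.so.1 and
libibverbs.so.1 not found): 'which' searches PATH for executables, not
the directories the dynamic loader uses, so it is not a reliable test
for shared libraries. A minimal sketch using ldd instead is below; the
function name unresolved_libs is hypothetical, chosen just for this
example.

```shell
#!/bin/sh
# Hypothetical helper: print the shared libraries that the dynamic loader
# cannot resolve for the given binary in the current environment.
unresolved_libs() {
    # ldd reports an unresolved dependency as "<name> => not found".
    ldd "$1" 2>/dev/null | awk '/not found/ { print $1 }'
}
```

For example, "unresolved_libs ./osu_latency" should print nothing on a
node where every library resolves. It is worth running it on each node,
since an LD_LIBRARY_PATH exported in an interactive shell may not be
visible to processes started remotely through mpd.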


