[mvapich-discuss] Problem initializing IB device (gen2)
Roland Fehrenbacher
Roland.Fehrenbacher at transtec.de
Fri Jan 26 04:28:38 EST 2007
>>>>> "Sayantan" == Sayantan Sur <surs at cse.ohio-state.edu> writes:
Hi Sayantan,
>> Anybody got an idea how to debug the error below?
>>
Sayantan> Could you try this patch out to see if there is any
Sayantan> additional error message?
That actually just showed the message "success" ;-) After digging
deeper and putting debug statements into the IB libraries, it finally
turned out that there was an old installation of OFED under
/usr/local, from which the headers were taken when compiling mvapich.
So there was a mismatch between compile-time headers and run-time
libraries, which was bound to lead to erratic behaviour. Now
everything is working fine. Thanks for your help.
Roland
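
A minimal sketch of how such a header mismatch could be spotted,
assuming the OFED headers are installed as
<prefix>/include/infiniband/verbs.h (the usual libibverbs header name).
The function names find_header_copies and count_header_copies are made
up for illustration, and /usr as a second prefix is an assumption; only
/usr/local is mentioned in this thread.

```shell
#!/bin/sh
# List every copy of a header found under the given install prefixes.
# Usage: find_header_copies <relative-header-path> <prefix>...
find_header_copies() {
    header=$1
    shift
    for prefix in "$@"; do
        if [ -f "$prefix/include/$header" ]; then
            printf '%s\n' "$prefix/include/$header"
        fi
    done
}

# Print how many copies exist; more than one means the compiler may
# pick up a stale header depending on the include search order.
count_header_copies() {
    find_header_copies "$@" | wc -l | tr -d ' '
}

# Example: /usr/local is where the stale OFED install sat in this
# thread; /usr is an assumed location for the current one.
find_header_copies infiniband/verbs.h /usr/local /usr
```

If more than one path is printed, the include order used when building
mvapich decides which copy wins. The run-time side can be compared with
"ldd ./cpi | grep ibverbs", assuming a dynamically linked binary.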
Sayantan> Thanks, Sayantan.
Sayantan> Index: viapriv.c
Sayantan> ===================================================================
Sayantan> --- viapriv.c (revision 879)
Sayantan> +++ viapriv.c (working copy)
Sayantan> @@ -624,6 +624,7 @@
Sayantan>      if(ibv_post_recv(c->vi, &(v->desc.u.rr), &bad_wr))
Sayantan>      {
Sayantan> +        perror("ibv_post_recv");
Sayantan>          error_abort_all(IBV_RETURN_ERR, "Error posting recv\n");
Sayantan>      }
>> Thanks,
>>
>> Roland
>>
>>
>>>>>>> "Roland" == Roland Fehrenbacher
>>>>>>> <Roland.Fehrenbacher at transtec.de> writes:
>>>>>>>
>> Hi,
>>
>> Meanwhile I got past the previous error. I noticed that I had
>> linked in the static version of libibverbs by mistake (strange
>> that it makes a difference). When I used the dynamic version,
>> the error changed to:
>>
>> $ mpiexec -comm mpich-ib ./cpi
>> [2] Abort: Error posting recv at line 628 in file viapriv.c
>> [3] Abort: Error posting recv at line 628 in file viapriv.c
>> [0] Abort: Error posting recv
>> mpiexec: Warning: accept_abort_conn: MPI_Abort from IP 192.168.42.105, rank 2, killing all.
>> at line 628 in file viapriv.c
>> [1] Abort: Error posting recv at line 628 in file viapriv.c
>> mpiexec: Warning: tasks 0-3 exited with status 253.
>>
>> Again, the same happens when using mpirun_rsh.
>>
>> Roland
>>
Roland> Hi, I have problems getting my IB adapters initialized
Roland> when running an mvapich binary:
>>
Roland> $ mpiexec -comm mpich-ib ./cpi
Roland> [0] Abort: Error getting HCA context at line 260 in file viainit.c
Roland> [1] Abort: Error getting HCA context at line 260 in file viainit.c
Roland> [2] Abort: Error getting HCA context at line 260 in file viainit.c
Roland> [3] Abort: Error getting HCA context at line 260 in file viainit.c
Roland> mpiexec: Warning: tasks 0-3 exited with status 255.
>>
Roland> The same happens when using mpirun_rsh.
>>
Roland> I'm using mvapich 0.9.8 compiled against OFED 1.1.
>>
Roland> Basic IB connectivity works as shown by a ping over the IB
Roland> network between the two test nodes I use. I can also run
Roland> ib_rdma_bw, ib_rdma_lat, etc. programs from the OFED
Roland> release without any problem.
>>
Roland> Loaded IB modules are:
>>
Roland> beo-104:~# lsmod | grep ib_
Roland> ib_ipoib   49944  0
Roland> ib_sa      17292  1 ib_ipoib
Roland> ib_uverbs  41520  0
Roland> ib_umad    18480  4
Roland> ib_mthca  120880  0
Roland> ib_mad     39588  3 ib_sa,ib_umad,ib_mthca
Roland> ib_core    56192  6 ib_ipoib,ib_sa,ib_uverbs,ib_umad,ib_mthca,ib_mad
Roland> and the following devices exist:
>>
Roland> beo-104:~# ls -l /dev/infiniband/
Roland> total 0
Roland> crw-rw---- 1 root root 231,  64 Jan 22 14:11 issm0
Roland> crw-rw---- 1 root root 231,   0 Jan 22 14:11 umad0
Roland> crw-rw-rw- 1 root root 231, 192 Jan 22 14:11 uverbs0
>>
Roland> I have read section "7.2.2 Error getting HCA Context" in
Roland> the Mvapich User Guide, but this didn't get me any
Roland> further.
>>
Roland> What is going wrong?
>>
Roland> Thanks,
>>
Roland> Roland
>>
Roland> _______________________________________________
Roland> mvapich-discuss mailing list
Roland> mvapich-discuss at cse.ohio-state.edu
Roland> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>>
Sayantan> -- http://www.cse.ohio-state.edu/~surs