[mvapich-discuss] Problem initializing IB device (gen2)
Sayantan Sur
surs at cse.ohio-state.edu
Thu Jan 25 00:57:11 EST 2007
Hi Roland,
> Anybody got an idea how to debug the error below?
>
Could you try this patch out to see if there is any additional error
message?
Thanks,
Sayantan.
Index: viapriv.c
===================================================================
--- viapriv.c (revision 879)
+++ viapriv.c (working copy)
@@ -624,6 +624,7 @@
if(ibv_post_recv(c->vi, &(v->desc.u.rr), &bad_wr)) {
+ perror("ibv_post_recv");
error_abort_all(IBV_RETURN_ERR,
"Error posting recv\n");
}
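The perror() call in the patch prints the text for whatever errno holds at that point. The libibverbs man page documents ibv_post_recv() as returning 0 on success or the value of errno on failure. A possible variant, sketched minimally here, is to print the returned code itself. The helper name format_verbs_error is hypothetical and is not part of MVAPICH or libibverbs. Preferring the return value over errno is an assumption that it is the more dependable value across providers.

```c
#include <assert.h>
#include <errno.h>   /* return codes are errno values such as ENOMEM */
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not part of MVAPICH or libibverbs): format a
 * diagnostic from the error code a verbs call returned, e.g.
 *
 *   int ret = ibv_post_recv(qp, &wr, &bad_wr);
 *   if (ret) {
 *       char msg[256];
 *       format_verbs_error(msg, sizeof msg, "ibv_post_recv", ret);
 *       fprintf(stderr, "%s\n", msg);
 *   }
 *
 * Assumption: the returned code is used instead of errno in case a
 * provider does not set errno on failure.
 */
int format_verbs_error(char *buf, size_t len, const char *what, int ret)
{
    assert(buf != NULL && len > 0 && what != NULL);
    /* strerror() maps the errno-style code to human-readable text. */
    return snprintf(buf, len, "%s: %s (ret=%d)", what, strerror(ret), ret);
}
```

In viapriv.c this would mean storing the result of ibv_post_recv() in a variable and formatting it before calling error_abort_all(), so the abort message carries the actual failure reason.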
> Thanks,
>
> Roland
>
>
>>>>>> "Roland" == Roland Fehrenbacher <Roland.Fehrenbacher at transtec.de> writes:
>>>>>>
> Hi,
>
> Meanwhile I got past the previous error. I noticed that I had linked
> in the static version of libibverbs by mistake (strange that it makes
> a difference). When I used the dynamic version, the error changed to:
>
> $ mpiexec -comm mpich-ib ./cpi
> [2] Abort: Error posting recv
> at line 628 in file viapriv.c
> [3] Abort: Error posting recv
> at line 628 in file viapriv.c
> [0] Abort: Error posting recv
> mpiexec: Warning: accept_abort_conn: MPI_Abort from IP 192.168.42.105, rank 2, killing all.
> at line 628 in file viapriv.c
> [1] Abort: Error posting recv
> at line 628 in file viapriv.c
> mpiexec: Warning: tasks 0-3 exited with status 253.
>
> Again, the same happens when using mpirun_rsh.
>
> Roland
>
> Roland> Hi, I have problems getting my IB adapters initialized
> Roland> when running an mvapich binary:
>
> Roland> $ mpiexec -comm mpich-ib ./cpi
> Roland> [0] Abort: Error getting HCA context
> Roland> at line 260 in file viainit.c
> Roland> [1] Abort: Error getting HCA context
> Roland> at line 260 in file viainit.c
> Roland> [2] Abort: Error getting HCA context
> Roland> at line 260 in file viainit.c
> Roland> [3] Abort: Error getting HCA context
> Roland> at line 260 in file viainit.c
> Roland> mpiexec: Warning: tasks 0-3 exited with status 255.
>
> Roland> The same happens when using mpirun_rsh.
>
> Roland> I'm using mvapich 0.9.8 compiled against OFED 1.1.
>
> Roland> Basic IB connectivity works as shown by a ping over the IB
> Roland> network between the two test nodes I use. I can also run
> Roland> ib_rdma_bw, ib_rdma_lat, etc. programs from the OFED
> Roland> release without any problem.
>
> Roland> Loaded IB modules are:
>
> Roland> beo-104:~# lsmod | grep ib_
> Roland> ib_ipoib 49944 0
> Roland> ib_sa 17292 1 ib_ipoib
> Roland> ib_uverbs 41520 0
> Roland> ib_umad 18480 4
> Roland> ib_mthca 120880 0
> Roland> ib_mad 39588 3 ib_sa,ib_umad,ib_mthca
> Roland> ib_core 56192 6 ib_ipoib,ib_sa,ib_uverbs,ib_umad,ib_mthca,ib_mad
> Roland> and the following devices exist:
>
> Roland> beo-104:~# ls -l /dev/infiniband/
> Roland> total 0
> Roland> crw-rw---- 1 root root 231, 64 Jan 22 14:11 issm0
> Roland> crw-rw---- 1 root root 231, 0 Jan 22 14:11 umad0
> Roland> crw-rw-rw- 1 root root 231, 192 Jan 22 14:11 uverbs0
>
> Roland> I have read section "7.2.2 Error getting HCA Context" in
> Roland> the MVAPICH User Guide, but it didn't get me any
> Roland> further.
>
> Roland> What is going wrong?
>
> Roland> Thanks,
>
> Roland> Roland
>
> Roland> _______________________________________________
> Roland> mvapich-discuss mailing list
> Roland> mvapich-discuss at cse.ohio-state.edu
> Roland> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>
>
--
http://www.cse.ohio-state.edu/~surs