[mvapich-discuss] Problem initializing IB device (gen2)
Q-Leap Support
support at q-leap.de
Wed Jan 24 17:44:07 EST 2007
Anybody got an idea how to debug the error below?
Thanks,
Roland
>>>>> "Roland" == Roland Fehrenbacher <Roland.Fehrenbacher at transtec.de> writes:
Hi,
meanwhile I got past the previous error. I noticed that I had linked
in the static version of libibverbs by mistake (strange that it makes
a difference). When I used the dynamic version, the error changed to:
$ mpiexec -comm mpich-ib ./cpi
[2] Abort: Error posting recv
at line 628 in file viapriv.c
[3] Abort: Error posting recv
at line 628 in file viapriv.c
[0] Abort: Error posting recv
mpiexec: Warning: accept_abort_conn: MPI_Abort from IP 192.168.42.105, rank 2, killing all.
at line 628 in file viapriv.c
[1] Abort: Error posting recv
at line 628 in file viapriv.c
mpiexec: Warning: tasks 0-3 exited with status 253.
Again, the same happens when using mpirun_rsh.
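
Since switching from the static to the dynamic libibverbs changed the
failure, it may help to confirm what a given binary actually links
against. The following is only a rough sketch in Python, not part of
mvapich or OFED: it runs ldd on the binary and looks for a libibverbs
entry. The helper names ibverbs_linkage and check_binary and the
returned labels are made up for this illustration.

```python
import subprocess
import sys


def ibverbs_linkage(ldd_output: str) -> str:
    """Classify libibverbs linkage from the text printed by `ldd`."""
    for line in ldd_output.splitlines():
        if "libibverbs" in line:
            # ldd prints "=> not found" when the shared object cannot be resolved.
            if "not found" in line:
                return "dynamic, unresolved"
            return "dynamic"
    # No libibverbs entry: linked statically, or not used at all.
    return "not dynamic"


def check_binary(path: str) -> str:
    """Run ldd on `path` and classify its libibverbs linkage (hypothetical helper)."""
    result = subprocess.run(["ldd", path], capture_output=True, text=True, check=False)
    return ibverbs_linkage(result.stdout)


if __name__ == "__main__" and len(sys.argv) > 1:
    print(check_binary(sys.argv[1]))
```

Run with the MPI binary as the only argument (for example ./cpi) on
each compute node, since the library search path may differ between
nodes.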
Roland
Roland> Hi, I have problems getting my IB adapters initialized
Roland> when running an mvapich binary:
Roland> $ mpiexec -comm mpich-ib ./cpi
Roland> [0] Abort: Error getting HCA context
Roland> at line 260 in file viainit.c
Roland> [1] Abort: Error getting HCA context
Roland> at line 260 in file viainit.c
Roland> [2] Abort: Error getting HCA context
Roland> at line 260 in file viainit.c
Roland> [3] Abort: Error getting HCA context
Roland> at line 260 in file viainit.c
Roland> mpiexec: Warning: tasks 0-3 exited with status 255.
Roland> The same happens when using mpirun_rsh.
Roland> I'm using mvapich 0.9.8 compiled against OFED 1.1.
Roland> Basic IB connectivity works as shown by a ping over the IB
Roland> network between the two test nodes I use. I can also run
Roland> ib_rdma_bw, ib_rdma_lat, etc. programs from the OFED
Roland> release without any problem.
Roland> Loaded IB modules are:
Roland> beo-104:~# lsmod | grep ib_
Roland> ib_ipoib 49944 0
Roland> ib_sa 17292 1 ib_ipoib
Roland> ib_uverbs 41520 0
Roland> ib_umad 18480 4
Roland> ib_mthca 120880 0
Roland> ib_mad 39588 3 ib_sa,ib_umad,ib_mthca
Roland> ib_core 56192 6 ib_ipoib,ib_sa,ib_uverbs,ib_umad,ib_mthca,ib_mad
Roland> and the following devices exist:
Roland> beo-104:~# ls -l /dev/infiniband/
Roland> total 0
Roland> crw-rw---- 1 root root 231, 64 Jan 22 14:11 issm0
Roland> crw-rw---- 1 root root 231, 0 Jan 22 14:11 umad0
Roland> crw-rw-rw- 1 root root 231, 192 Jan 22 14:11 uverbs0
Roland> I have read section "7.2.2 Error getting HCA Context" in
Roland> the MVAPICH User Guide, but it didn't get me any
Roland> further.
Roland> What is going wrong?
Roland> Thanks,
Roland> Roland
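
In the listing quoted above, only uverbs0 is world read/writable,
while issm0 and umad0 are restricted to root. As a rough sketch, and
assuming the MPI job runs as a non-root user, the Python snippet below
lists which nodes under /dev/infiniband the current user cannot open
for reading and writing. The function name inaccessible_nodes is
hypothetical, and a clean result does not rule out other causes of
the HCA context error.

```python
import os


def inaccessible_nodes(directory: str = "/dev/infiniband") -> list:
    """Return paths under `directory` that the current user cannot read and write."""
    if not os.path.isdir(directory):
        # A missing directory is reported as a single inaccessible entry.
        return [directory]
    blocked = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if not os.access(path, os.R_OK | os.W_OK):
            blocked.append(path)
    return blocked


if __name__ == "__main__":
    for path in inaccessible_nodes():
        print("no read/write access:", path)
```

Running it as the same user that launches mpiexec, on every node in
the job, shows whether the permissions differ between nodes.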