[mvapich-discuss] Problem initializing IB device (gen2)

Roland Fehrenbacher Roland.Fehrenbacher at transtec.de
Tue Jan 23 09:41:15 EST 2007


>>>>> "Roland" == Roland Fehrenbacher <Roland.Fehrenbacher at transtec.de> writes:

Hi,

Meanwhile I got past the previous error. I noticed that I had linked
in the static version of libibverbs by mistake (strange that this makes
a difference). With the dynamic version, the error changed to

$ mpiexec -comm mpich-ib ./cpi
[2] Abort: Error posting recv
 at line 628 in file viapriv.c
[3] Abort: Error posting recv
 at line 628 in file viapriv.c
[0] Abort: Error posting recv
mpiexec: Warning: accept_abort_conn: MPI_Abort from IP 192.168.42.105, rank 2, killing all.
 at line 628 in file viapriv.c
[1] Abort: Error posting recv
 at line 628 in file viapriv.c
mpiexec: Warning: tasks 0-3 exited with status 253.

Again, the same happens when using mpirun_rsh.
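One plausible reason the static/dynamic choice matters is that libibverbs loads the HCA-specific user-space driver (libmthca for this adapter) as a separate shared object at run time, and a statically linked libibverbs may fail to pick it up. To confirm which libibverbs a binary actually resolves, `ldd ./cpi` shows it. The sketch below assumes the usual `name => path (address)` ldd output format; the helper name `find_shared_lib` is hypothetical, not part of MVAPICH or OFED.

```python
from typing import Optional


def find_shared_lib(ldd_output: str, prefix: str = "libibverbs.so") -> Optional[str]:
    """Return the path ldd resolved for the library whose soname starts with
    `prefix`, "not found" if ldd could not resolve it, or None if the binary
    has no such dynamic dependency (e.g. it was linked statically)."""
    for line in ldd_output.splitlines():
        # Typical ldd line: "\tlibibverbs.so.1 => /usr/lib/libibverbs.so.1 (0x...)"
        name, sep, target = line.strip().partition(" => ")
        if not sep or not name.startswith(prefix):
            continue
        target = target.strip()
        if target.startswith("not found"):
            return "not found"
        # Drop the trailing load address, keeping only the path.
        return target.split(" (")[0]
    return None
```

Usage: `find_shared_lib(subprocess.run(["ldd", "./cpi"], capture_output=True, text=True).stdout)`. A result of `None` means libibverbs is not a dynamic dependency of the binary, which is what a static link would look like.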

Roland

    Roland> Hi, I have problems getting my IB adapters initialized
    Roland> when running an mvapich binary:

    Roland> $ mpiexec -comm mpich-ib ./cpi
    Roland> [0] Abort: Error getting HCA context
    Roland>  at line 260 in file viainit.c
    Roland> [1] Abort: Error getting HCA context
    Roland>  at line 260 in file viainit.c
    Roland> [2] Abort: Error getting HCA context
    Roland>  at line 260 in file viainit.c
    Roland> [3] Abort: Error getting HCA context
    Roland>  at line 260 in file viainit.c
    Roland> mpiexec: Warning: tasks 0-3 exited with status 255.

    Roland> The same happens when using mpirun_rsh.
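If "Error getting HCA context" means, as its wording suggests, that the verbs library could not find or open a device, one way to test that outside MPI is to ask libibverbs for its device list directly. OFED ships the `ibv_devices` and `ibv_devinfo` utilities for this; the sketch below does the same through Python's ctypes, assuming a shared libibverbs is installed (the function name `list_verbs_devices` is hypothetical). It only calls `ibv_get_device_list`, `ibv_get_device_name` and `ibv_free_device_list`.

```python
import ctypes
import ctypes.util
from typing import List, Optional


def list_verbs_devices() -> Optional[List[str]]:
    """Return the names of the devices libibverbs can see, or None if the
    shared libibverbs library cannot be located on this system."""
    path = ctypes.util.find_library("ibverbs")
    if path is None:
        return None
    lib = ctypes.CDLL(path)
    # struct ibv_device **ibv_get_device_list(int *num_devices);
    lib.ibv_get_device_list.argtypes = [ctypes.POINTER(ctypes.c_int)]
    lib.ibv_get_device_list.restype = ctypes.POINTER(ctypes.c_void_p)
    # const char *ibv_get_device_name(struct ibv_device *device);
    lib.ibv_get_device_name.argtypes = [ctypes.c_void_p]
    lib.ibv_get_device_name.restype = ctypes.c_char_p
    # void ibv_free_device_list(struct ibv_device **list);
    lib.ibv_free_device_list.argtypes = [ctypes.POINTER(ctypes.c_void_p)]
    lib.ibv_free_device_list.restype = None

    count = ctypes.c_int(0)
    devices = lib.ibv_get_device_list(ctypes.byref(count))
    if not devices:  # NULL: the library could not build a device list
        return []
    try:
        return [lib.ibv_get_device_name(devices[i]).decode()
                for i in range(count.value)]
    finally:
        lib.ibv_free_device_list(devices)
```

With the ib_mthca driver loaded, a working setup would typically report something like `mthca0`; an empty list would point at the same "no usable device" condition independent of MPI.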

    Roland> I'm using mvapich 0.9.8 compiled against OFED 1.1.

    Roland> Basic IB connectivity works as shown by a ping over the IB
    Roland> network between the two test nodes I use. I can also run
    Roland> ib_rdma_bw, ib_rdma_lat, etc. programs from the OFED
    Roland> release without any problem.

    Roland> Loaded IB modules are:

    Roland> beo-104:~# lsmod | grep ib_
    Roland> ib_ipoib               49944  0
    Roland> ib_sa                  17292  1 ib_ipoib
    Roland> ib_uverbs              41520  0
    Roland> ib_umad                18480  4
    Roland> ib_mthca              120880  0
    Roland> ib_mad                 39588  3 ib_sa,ib_umad,ib_mthca
    Roland> ib_core                56192  6 ib_ipoib,ib_sa,ib_uverbs,ib_umad,ib_mthca,ib_mad

    Roland> The following devices exist:

    Roland> beo-104:~# ls -l /dev/infiniband/
    Roland> total 0
    Roland> crw-rw----  1 root root 231,  64 Jan 22 14:11 issm0
    Roland> crw-rw----  1 root root 231,   0 Jan 22 14:11 umad0
    Roland> crw-rw-rw-  1 root root 231, 192 Jan 22 14:11 uverbs0
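The two listings above cover the kernel side of user-space verbs: `ib_uverbs` provides the `/dev/infiniband/uverbsN` nodes, `ib_mthca` drives the Mellanox HCA, and the user running the MPI job needs read/write access to `uverbs0` (here it is `crw-rw-rw-`, so that part looks fine). A small sketch that automates both checks on each node, assuming plain `lsmod` output as input; the function names are hypothetical.

```python
import os
from typing import Iterable, List


def missing_modules(lsmod_output: str, required: Iterable[str]) -> List[str]:
    """Return the entries of `required` that are not listed in lsmod output.
    The module name is the first whitespace-separated field of each line."""
    loaded = {line.split()[0] for line in lsmod_output.splitlines() if line.split()}
    return [name for name in required if name not in loaded]


def verbs_device_accessible(path: str = "/dev/infiniband/uverbs0") -> bool:
    """True if `path` exists and the current user may open it for reading
    and writing, which user-space verbs access requires."""
    return os.path.exists(path) and os.access(path, os.R_OK | os.W_OK)
```

Usage: pass the stdout of `lsmod` together with `["ib_uverbs", "ib_mthca"]`, and call `verbs_device_accessible()` as the same non-root user that launches `mpiexec`.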

    Roland> I have read section "7.2.2 Error getting HCA Context" in
    Roland> the MVAPICH User Guide, but it didn't get me any
    Roland> further.

    Roland> What is going wrong?

    Roland> Thanks,

    Roland> Roland

    Roland> _______________________________________________
    Roland> mvapich-discuss mailing list
    Roland> mvapich-discuss at cse.ohio-state.edu
    Roland> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss


