[mvapich-discuss] RE: BUG REPORT: MVAPICH2 over OFED 1.5.4.1 fails in heterogeneous fabrics

Heinz, Michael William michael.william.heinz at intel.com
Tue Apr 17 11:23:40 EDT 2012


Devendar,

The fix prevents the crash, but the jobs now appear to hang immediately during initialization.

Running a simple osu_bibw test, I captured the following stacks. If I had to guess, neither rank is receiving IB traffic.

Mellanox system:

#0  0x00007f293a92c7b5 in MPIDI_CH3I_MRAILI_Get_next_vbuf ()
   from /usr/mpi/gcc/mvapich2-1.7/lib/libmpich.so.3
#1  0x00007f293a8e39f0 in MPIDI_CH3I_read_progress () from /usr/mpi/gcc/mvapich2-1.7/lib/libmpich.so.3
#2  0x00007f293a8e3648 in MPIDI_CH3I_Progress () from /usr/mpi/gcc/mvapich2-1.7/lib/libmpich.so.3
#3  0x00007f293a928c5f in MPIC_Wait () from /usr/mpi/gcc/mvapich2-1.7/lib/libmpich.so.3
#4  0x00007f293a9298ba in MPIC_Sendrecv () from /usr/mpi/gcc/mvapich2-1.7/lib/libmpich.so.3
#5  0x00007f293a929aaf in MPIC_Sendrecv_ft () from /usr/mpi/gcc/mvapich2-1.7/lib/libmpich.so.3
#6  0x00007f293a8cab0e in MPIR_Allreduce_pt2pt_MV2 () from /usr/mpi/gcc/mvapich2-1.7/lib/libmpich.so.3
#7  0x00007f293a8cb59e in MPIR_Allreduce_MV2 () from /usr/mpi/gcc/mvapich2-1.7/lib/libmpich.so.3
#8  0x00007f293a906bca in MPIR_Get_contextid_sparse ()
   from /usr/mpi/gcc/mvapich2-1.7/lib/libmpich.so.3
#9  0x00007f293a9055f6 in MPIR_Comm_split_impl () from /usr/mpi/gcc/mvapich2-1.7/lib/libmpich.so.3
#10 0x00007f293a905c14 in PMPI_Comm_split () from /usr/mpi/gcc/mvapich2-1.7/lib/libmpich.so.3
#11 0x00007f293a909c36 in create_2level_comm () from /usr/mpi/gcc/mvapich2-1.7/lib/libmpich.so.3
#12 0x00007f293a93778e in MPIR_Init_thread () from /usr/mpi/gcc/mvapich2-1.7/lib/libmpich.so.3
#13 0x00007f293a937192 in PMPI_Init () from /usr/mpi/gcc/mvapich2-1.7/lib/libmpich.so.3
#14 0x00000000004009c5 in main ()

QIB system:

#0  0x00007f293a92c7b5 in MPIDI_CH3I_MRAILI_Get_next_vbuf ()
   from /usr/mpi/gcc/mvapich2-1.7/lib/libmpich.so.3
#1  0x00007f293a8e39f0 in MPIDI_CH3I_read_progress () from /usr/mpi/gcc/mvapich2-1.7/lib/libmpich.so.3
#2  0x00007f293a8e3648 in MPIDI_CH3I_Progress () from /usr/mpi/gcc/mvapich2-1.7/lib/libmpich.so.3
#3  0x00007f293a928c5f in MPIC_Wait () from /usr/mpi/gcc/mvapich2-1.7/lib/libmpich.so.3
#4  0x00007f293a9298ba in MPIC_Sendrecv () from /usr/mpi/gcc/mvapich2-1.7/lib/libmpich.so.3
#5  0x00007f293a929aaf in MPIC_Sendrecv_ft () from /usr/mpi/gcc/mvapich2-1.7/lib/libmpich.so.3
#6  0x00007f293a8cab0e in MPIR_Allreduce_pt2pt_MV2 () from /usr/mpi/gcc/mvapich2-1.7/lib/libmpich.so.3
#7  0x00007f293a8cb59e in MPIR_Allreduce_MV2 () from /usr/mpi/gcc/mvapich2-1.7/lib/libmpich.so.3
#8  0x00007f293a906bca in MPIR_Get_contextid_sparse ()
   from /usr/mpi/gcc/mvapich2-1.7/lib/libmpich.so.3
#9  0x00007f293a9055f6 in MPIR_Comm_split_impl () from /usr/mpi/gcc/mvapich2-1.7/lib/libmpich.so.3
#10 0x00007f293a905c14 in PMPI_Comm_split () from /usr/mpi/gcc/mvapich2-1.7/lib/libmpich.so.3
#11 0x00007f293a909c36 in create_2level_comm () from /usr/mpi/gcc/mvapich2-1.7/lib/libmpich.so.3
#12 0x00007f293a93778e in MPIR_Init_thread () from /usr/mpi/gcc/mvapich2-1.7/lib/libmpich.so.3
#13 0x00007f293a937192 in PMPI_Init () from /usr/mpi/gcc/mvapich2-1.7/lib/libmpich.so.3
#14 0x00000000004009c5 in main ()
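For reference, stacks like the ones above can be captured from a hung rank with gdb. This is only a sketch: it assumes gdb is installed on the node and that osu_bibw is the name of the running test binary.

```shell
# Sketch: attach to the most recent osu_bibw process and dump its backtrace.
# "-batch -ex bt" runs the bt command non-interactively and exits.
gdb -batch -ex "bt" -p "$(pgrep -n osu_bibw)"
```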

-----Original Message-----
From: Devendar Bureddy [mailto:bureddy at cse.ohio-state.edu] 
Sent: Monday, April 09, 2012 11:28 AM
To: Heinz, Michael William
Cc: mvapich-discuss at cse.ohio-state.edu; Marciniszyn, Mike; Rimmer, Todd
Subject: Re: [mvapich-discuss] BUG REPORT: MVAPICH2 over OFED 1.5.4.1 fails in heterogeneous fabrics

Hi Michael

Can you please try the attached patch with the latest 1.7 nightly tarball and see if this issue is resolved with it?

Please follow the instructions below to apply the patch:

$tar xvf mvapich2-latest.tar.gz
$cd mvapich2-1.7-r5225
$patch -p0 < diff.patch
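After patching, the tree needs to be rebuilt before rerunning the test. A minimal sketch; the configure prefix is illustrative (it matches the library path in the traces above), and you should reuse whatever options the original build used:

```shell
# Rebuild the patched source tree (configure flags are illustrative).
./configure --prefix=/usr/mpi/gcc/mvapich2-1.7
make -j4
make install
```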

-Devendar

On Mon, Apr 2, 2012 at 2:13 PM, Heinz, Michael William <michael.william.heinz at intel.com> wrote:
> Basically, the problem is this: in version 1.7 of mvapich2, handling of a mixed fabric was set up before the IB queue pairs were initialized. This was done by calling rdma_ring_based_allgather() to collect information about the HCA types and then calling rdma_param_handle_heterogenity(). (See lines 250-270 of rdma_iba_init.c.)
>
> Working this way permitted each rank to correctly determine whether to create a shared receive queue or not.
>
> Unfortunately, this was eliminated in 1.7-r5140. In the new version, rdma_param_handle_heterogenity() is not called until *after* the shared receive queue has already been created and the QP has been moved to the ready-to-receive state - and when rdma_param_handle_heterogenity() turns the shared receive queue off, the queue pairs are left in an unusable state.
>
> This problem affects fabrics using HCAs from IBM, older Tavor-style Mellanox HCAs and QLogic HCAs.
>
> We've reviewed the changes and, unfortunately, we can't see a way to fix this without going back to using rdma_ring_based_allgather() to collect information about the HCA types before initializing the queue pairs. The workaround is to manually specify MV2_USE_SRQ=0 when using mvapich2-1.7-r5140.
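[For clarity, the MV2_USE_SRQ=0 workaround can be passed at launch time; a sketch using mpirun_rsh, with hypothetical hostnames:]

```shell
# Sketch: disable the shared receive queue for a single run.
# Hostnames (mlxhost, qibhost) are hypothetical; mpirun_rsh accepts
# NAME=VALUE environment settings before the binary name.
mpirun_rsh -np 2 mlxhost qibhost MV2_USE_SRQ=0 ./osu_bibw
```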
>
> _______________________________________________
> mvapich-discuss mailing list
> mvapich-discuss at cse.ohio-state.edu
> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss



--
Devendar
-------------- next part --------------
A non-text attachment was scrubbed...
Name: diff.patch
Type: application/octet-stream
Size: 12848 bytes
Desc: diff.patch
Url : http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/attachments/20120417/7fa1b1e3/diff.obj

