[mvapich-discuss] BUG REPORT: MVAPICH2 over OFED 1.5.4.1 fails in heterogeneous fabrics

Devendar Bureddy bureddy at cse.ohio-state.edu
Mon Apr 9 11:27:58 EDT 2012


Hi Michael

Can you please try the attached patch with the latest 1.7 nightly tarball
and see whether it resolves this issue?

Please follow the instructions below to apply the patch:

$ tar xvf mvapich2-latest.tar.gz
$ cd mvapich2-1.7-r5225
$ patch -p0 < diff.patch

-Devendar

On Mon, Apr 2, 2012 at 2:13 PM, Heinz, Michael William
<michael.william.heinz at intel.com> wrote:
> Basically, the problem is this: in version 1.7 of mvapich2, handling of a mixed fabric was set up before the IB queue pairs were initialized. This was done by calling rdma_ring_based_allgather() to collect information about the HCA types and then calling rdma_param_handle_heterogenity(). (See lines 250-270 of rdma_iba_init.c.)
>
> Doing it in this order let each rank correctly determine whether or not to create a shared receive queue, since every rank had already seen the HCA types of all the others.
>
> Unfortunately, this was removed in 1.7-r5140. In the new version, rdma_param_handle_heterogenity() is not called until *after* the shared receive queue has already been created and the QP has been moved to the ready-to-receive state, so when rdma_param_handle_heterogenity() turns the shared receive queue off, the queue pairs are left in an unusable state.
>
> This problem affects fabrics that use IBM HCAs, older Tavor-style Mellanox HCAs, or QLogic HCAs.
>
> We've reviewed the changes and, unfortunately, we can't see a way to fix this without going back to using rdma_ring_based_allgather() to collect information about the HCA types before initializing the queue pairs. The workaround is to set MV2_USE_SRQ=0 manually when using mvapich2-1.7-r5140.



-- 
Devendar
-------------- next part --------------
A non-text attachment was scrubbed...
Name: diff.patch
Type: application/octet-stream
Size: 12847 bytes
Desc: not available
Url : http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/attachments/20120409/53f2a61e/diff-0001.obj

