[mvapich-discuss] Hang in CH3 SMP Rendezvous protocol w/ CUDA w/o Infiniband

Paul Sathre sath6220 at cs.vt.edu
Thu Jan 22 17:46:19 EST 2015


Hi Khaled,

Thanks for the feedback. What additional information would be most useful?
The full config.log, or some subset of it? /proc/cpuinfo? Something else?

I've dug a little deeper and tried two other non-InfiniBand systems I have
access to, and both succeed. (These use a modified configure line that points
to a userspace build of libibverbs.so v1.1.8-1 from the Debian repos and a
non-standard CUDA 6.0 path:

../mvapich2-2.1rc1/configure
--prefix=/home/psath/mvapich2-2.1rc1/build/install --enable-cuda
--disable-mcast --with-ib-libpath=/home/psath/libibverbs/install/lib
--with-ib-include=/home/psath/libibverbs/install/include
--with-libcuda=/usr/local/cuda-6.0/lib64
--with-libcudart=/usr/local/cuda-6.0/lib64/
)
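
For what it's worth, the runs on those systems boil down to roughly the
following; the install path, launcher choice, and the 64K eager size shown
here are just illustrative, with MV2_USE_CUDA=1 set for the device-buffer
runs:

  # build and install from ~/mvapich2-2.1rc1/build
  make -j4 && make install
  export PATH=/home/psath/mvapich2-2.1rc1/build/install/bin:$PATH

  # single-node, 2-rank latency test; H H = host buffers, D D = device buffers
  mpiexec -n 2 -env MV2_SMP_EAGERSIZE 65536 ./osu_latency H H
  mpiexec -n 2 -env MV2_USE_CUDA 1 -env MV2_SMP_EAGERSIZE 65536 ./osu_latency D D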

One successful system has dual K20Xm GPUs running Nvidia driver version
331.67.

The other has a single Tesla C2070 running the same Nvidia driver.

The hanging system has 4x Tesla C2070s running Nvidia driver 319.32 and
libibverbs 1.1.6. (I have tried swapping in libibverbs 1.1.8 and gcc 4.8 to
make it more like the successful systems, to no avail. A vimdiff of the
failing system's config.log against either succeeding system's shows no
significant differences.)

I don't really suspect the CUDA/driver version to be the root of the failure,
but since that's the only visible difference I haven't yet tested, it seems
like the logical next step.

Any other ideas on how to further diagnose the issue? With GDB I've managed to
deduce that both sender and receiver are spinning in MPIDI_CH3I_Progress and
its callees (mainly write_progress and read_progress, respectively), but I
suspect that's too high up the call graph to be of much use; I've been unable
to reliably trace the busy looping any further down.
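
In case it helps, this is roughly how I've been grabbing those backtraces
(the process name and pids below are just placeholders):

  # find the two spinning ranks on the node
  pgrep -f osu_latency

  # dump all-thread backtraces from each rank without an interactive session
  gdb -batch -p <pid-of-rank-0> -ex 'thread apply all bt'
  gdb -batch -p <pid-of-rank-1> -ex 'thread apply all bt'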

Thanks!

-Paul Sathre
Research Programmer - Synergy Lab
Dept. of Computer Science
Virginia Tech

On Thu, Jan 22, 2015 at 12:30 PM, khaled hamidouche <
hamidouc at cse.ohio-state.edu> wrote:

> Hi Paul,
>
>  We are not able to reproduce your issue. I tried both H-H and D-D with
> different MV2_SMP_EAGERSIZE values (4K, 8K, 16K, ...) on a node without an
> IB HCA, and all the tests passed. Would you please provide more information
> about your platform/system?
>
> Thanks
>
>
>
> On Wed, Jan 21, 2015 at 4:38 PM, Paul Sathre <sath6220 at cs.vt.edu> wrote:
>
>> Hello all,
>>
>> I am in the process of developing some GPGPU library code atop MPI, and
>> we selected MVAPICH due to its demonstrated support for GPUDirect
>> communication. However, in shared-memory tests on a local node that is not
>> equipped with InfiniBand, we are unable to complete MPI_Send/MPI_Recv pairs
>> for messages larger than the MV2_SMP_EAGERSIZE threshold, due to a hang
>> internal to MVAPICH, for both device and host buffers.
>>
>> I have confirmed this is present in both mvapich2-2.1rc1 and mvapich2-2.0,
>> and that the hang is not restricted to our code, as the same behavior is
>> exhibited by the osu_latency benchmark: the last message size printed
>> before the hang is exactly half of MV2_SMP_EAGERSIZE. (I've tested all
>> power-of-two sizes from 16K to 1M with MV2_SMPI_LENGTH_QUEUE fixed to 4x
>> the eager size, observing the same behavior.)
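>>
>> Roughly, that sweep looks like the following; the launcher, benchmark path,
>> and exact byte values here are illustrative:
>>
>>   # power-of-two eager sizes from 16K to 1M; length queue held at 4x
>>   for kb in 16 32 64 128 256 512 1024; do
>>     mpiexec -n 2 -env MV2_USE_CUDA 1 \
>>       -env MV2_SMP_EAGERSIZE $((kb * 1024)) \
>>       -env MV2_SMPI_LENGTH_QUEUE $((4 * kb * 1024)) \
>>       ./osu_latency D D
>>   done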
>>
>> I have been unable to diagnose whether the hang is in the initial
>> rendezvous handshake, or the actual transfer of the large buffer.
>>
>> My configure line is:
>>  ../mvapich2-2.1rc1/configure
>> --prefix=/home/psath/mvapich2-2.1rc1/build/install --enable-cuda
>> --disable-mcast
>>
>> (Run from ~/mvapich2-2.1rc1/build, with the source in
>> ~/mvapich2-2.1rc1/mvapich2-2.1rc1/.) I am forced to disable multicast, as
>> our dev node doesn't have InfiniBand or the associated header files.
>> Enabling CUDA is a requirement for us.
>>
>> The node is running 64-bit Ubuntu Linux (kernel 3.11.0-14-generic) with
>> gcc 4.6.4.
>>
>> We are able to continue debugging our non-GPUDirect fallback code paths
>> (our own host-staging of buffers) with standard MPICH in the meantime, but
>> going forward we would prefer the performance afforded by sidestepping the
>> host when possible.
>>
>> Please let me know if there is any other information I can provide that
>> would help with diagnosing the issue.
>>
>> Thanks!
>>
>> -Paul Sathre
>> Research Programmer - Synergy Lab
>> Dept. of Computer Science
>> Virginia Tech
>>
>>
>