[mvapich-discuss] Deadlock with CUDA and InfiniBand

Hari Subramoni subramoni.1 at osu.edu
Wed Sep 10 23:41:00 EDT 2014


Hello Freddie,

This is a little strange. We have not encountered this issue before. Could
you please let us know which version of MVAPICH2 you are using and with
what configure / run time options?

We recently released MVAPICH2-2.0. Could you please try with that and see
if the same issue exists there as well? You can download MVAPICH2-2.0 from
the following site.

http://mvapich.cse.ohio-state.edu/downloads/

Regards,
Hari.

On Wed, Sep 10, 2014 at 6:12 PM, Witherden, Freddie <
freddie.witherden08 at imperial.ac.uk> wrote:

> Hello,
>
> I have a somewhat obscure deadlock issue that only occurs under a specific
> set of circumstances.  The application I develop, PyFR (http://pyfr.org/),
> is written in Python and uses CUDA and MPI to run on GPU clusters (although
> we do not use any CUDA-aware-MPI functionality, all copying is marshaled by
> us).
>
> When using either TCP, SHM, or a combination thereof as a transport layer
> no problems are observed.  Further, when running over IB with
> one-rank-per-node no problems are observed.  However, when running over IB
> with some ranks on the same node, PyFR deadlocks.  The deadlock can be
> avoided by setting MV2_USE_RDMA_FAST_PATH=0.
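As a concrete illustration of the workaround (the application name, rank count, and hostfile are placeholders, not PyFR's actual invocation), the variable can be exported before launch or, with mpirun_rsh, passed on the command line:

```shell
# Workaround, not a fix: disable MVAPICH2's RDMA fast path.
# MVAPICH2 reads MV2_* variables from the environment at launch time.
export MV2_USE_RDMA_FAST_PATH=0
mpiexec -n 4 ./app          # ./app is a placeholder application

# Equivalent, passing the variable through mpirun_rsh:
mpirun_rsh -np 4 -hostfile hosts MV2_USE_RDMA_FAST_PATH=0 ./app
```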
>
> Interestingly, we observed a similar issue with Intel MPI a while back;
> specifically version 3 would deadlock when running over IB (but not TCP or
> SHM).  This was resolved by upgrading to version 4.  No issues have ever
> been observed when using Platform MPI.
>
> Searching around, I found the following mailing list posting describing a
> similar behaviour:
>
>
> http://mailman.cse.ohio-state.edu/pipermail/mvapich-discuss/2013-May/004435.html
>
> which, like PyFR's, appears to occur when combining CUDA and IB and can be
> resolved by setting MV2_USE_RDMA_FAST_PATH=0.
>
> From an API standpoint PyFR is relatively simple: all requests are
> persistent, point-to-point and non-blocking.  Unfortunately my attempts to
> produce a reduced test case have never got very far -- only the complete
> application is able to reliably produce deadlocks.
>
> What is the best means of proceeding?
>
> Regards, Freddie.
> _______________________________________________
> mvapich-discuss mailing list
> mvapich-discuss at cse.ohio-state.edu
> http://mailman.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>