[mvapich-discuss] Deadlock with CUDA and InfiniBand
Witherden, Freddie
freddie.witherden08 at imperial.ac.uk
Wed Sep 10 19:12:50 EDT 2014
Hello,
I have a somewhat obscure deadlock issue that only occurs under a specific set of circumstances. The application I develop, PyFR (http://pyfr.org/), is written in Python and uses CUDA and MPI to run on GPU clusters (although we do not use any CUDA-aware MPI functionality; all device-host copying is marshalled by us).
When using TCP, SHM, or a combination thereof as the transport layer, no problems are observed. Further, when running over IB with one rank per node, no problems are observed. However, when running over IB with some ranks on the same node, PyFR deadlocks. The deadlock can be avoided by setting MV2_USE_RDMA_FAST_PATH=0.
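For reference, this is how we apply the workaround; the launcher invocation below is illustrative only (rank count, hostfile, and application arguments are placeholders, not from an actual run):

```shell
# Disable MVAPICH2's RDMA fast path; this avoids the deadlock for us.
export MV2_USE_RDMA_FAST_PATH=0

# Equivalent one-shot form when launching with mpirun_rsh (arguments are
# placeholders):
# mpirun_rsh -np 4 -hostfile hosts MV2_USE_RDMA_FAST_PATH=0 python pyfr ...
```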
Interestingly, we observed a similar issue with Intel MPI a while back; specifically version 3 would deadlock when running over IB (but not TCP or SHM). This was resolved by upgrading to version 4. No issues have ever been observed when using Platform MPI.
Searching, I found the following mailing list post describing similar behaviour:
http://mailman.cse.ohio-state.edu/pipermail/mvapich-discuss/2013-May/004435.html
which, like the issue in PyFR, appears to occur when combining CUDA and IB and can be resolved by setting MV2_USE_RDMA_FAST_PATH=0.
From an API standpoint PyFR is relatively simple: all requests are persistent, point-to-point, and non-blocking. Unfortunately, my attempts to produce a reduced test case have never got very far -- only the complete application is able to reliably produce deadlocks.
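To make the communication pattern concrete, here is a minimal mpi4py-style sketch of the persistent, point-to-point, non-blocking usage described above. It is an illustration of the pattern only, not code from PyFR, and the buffer sizes, tags, and partner-rank scheme are hypothetical; running it requires an MPI launcher (e.g. two ranks).

```python
# Illustrative sketch (hypothetical names/sizes); requires an MPI runtime,
# e.g. "mpirun -np 2 python this_script.py".
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
peer = rank ^ 1  # exchange with a partner rank (hypothetical pairing)

sbuf = np.empty(1024, dtype='d')
rbuf = np.empty(1024, dtype='d')

# Persistent requests are set up once...
sreq = comm.Send_init(sbuf, dest=peer, tag=0)
rreq = comm.Recv_init(rbuf, source=peer, tag=0)

# ...and restarted every step.  In PyFR the send buffer would be filled
# from data explicitly copied off the GPU before the start call.
for step in range(100):
    sbuf[:] = rank + step
    MPI.Prequest.Startall([rreq, sreq])
    MPI.Request.Waitall([rreq, sreq])
```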
What is the best way to proceed?
Regards, Freddie.