[mvapich-discuss] deadlock behavior

Brody Huval brodyh at stanford.edu
Tue May 21 04:27:34 EDT 2013


Hi,

I'm currently getting a deadlock in a CUDA+MPI program, and I think it's coming from MVAPICH2. The behavior is the same whether I send from device memory or from the host (after copying the data off the GPU). Running the same code with MPICH2 does not produce the deadlock.

The deadlock seems to occur when I try to perform the equivalent of an asynchronous MPI_Allreduce on a single float using MPI_Isend and MPI_Irecv: one of the ranks never receives its message, even though the matching send was posted. With 4 processes, the deadlock does not happen when they all run on a single node, only when they are split across 2 nodes, and even then it depends on the rank order.
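
Roughly, the exchange looks like the sketch below (simplified, with illustrative names rather than my exact code). Each rank posts an MPI_Isend of its single float to every other rank, posts the matching MPI_Irecvs, and then waits on all the requests; the buffer can be either a device pointer or a host copy made with cudaMemcpy beforehand.

    #include <mpi.h>
    #include <stdlib.h>

    /* Sketch of the exchange (not the actual code): every rank sends its
     * float to every other rank, receives theirs, and sums them up. */
    static float allreduce_sum_sketch(float val, MPI_Comm comm)
    {
        int rank, size, n = 0;
        MPI_Comm_rank(comm, &rank);
        MPI_Comm_size(comm, &size);

        float *recvbuf = malloc(size * sizeof(float));
        MPI_Request *reqs = malloc(2 * (size - 1) * sizeof(MPI_Request));

        for (int r = 0; r < size; ++r) {
            if (r == rank) continue;
            MPI_Isend(&val, 1, MPI_FLOAT, r, 0, comm, &reqs[n++]);
            MPI_Irecv(&recvbuf[r], 1, MPI_FLOAT, r, 0, comm, &reqs[n++]);
        }

        /* This is where it hangs: one rank's MPI_Irecv never completes
         * when the job spans two nodes, although the send was posted. */
        MPI_Waitall(n, reqs, MPI_STATUSES_IGNORE);

        float sum = val;
        for (int r = 0; r < size; ++r)
            if (r != rank) sum += recvbuf[r];

        free(recvbuf);
        free(reqs);
        return sum;
    }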

The problem goes away if I replace the Isends and Irecvs with MPI_Allreduce and set the flag MV2_CUDA_SMP_IPC=1. Using that flag with the Isends and Irecvs from the host causes a segfault, while using it after transferring to the host works.
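
Concretely, the variant that works for me is just the blocking collective on the device buffers, launched with MV2_CUDA_SMP_IPC=1 set in the environment; something like the following (again a sketch, names illustrative):

    #include <mpi.h>
    #include <cuda_runtime.h>

    /* Workaround sketch: a single blocking MPI_Allreduce on device
     * buffers, run with MV2_CUDA_SMP_IPC=1 set in the environment
     * (MVAPICH2 built with CUDA support, so it accepts device pointers). */
    void allreduce_on_device(MPI_Comm comm)
    {
        float *d_val, *d_sum;
        cudaMalloc((void **)&d_val, sizeof(float));
        cudaMalloc((void **)&d_sum, sizeof(float));

        /* ... compute d_val on the GPU ... */

        MPI_Allreduce(d_val, d_sum, 1, MPI_FLOAT, MPI_SUM, comm);

        cudaFree(d_val);
        cudaFree(d_sum);
    }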

Any idea on what could cause this? Thanks in advance for any help.

Best,
Brody Huval

More information about the mvapich-discuss mailing list