[mvapich-discuss] deadlock behavior

Brody Huval brodyh at stanford.edu
Wed May 22 02:47:29 EDT 2013


Hi Sreeram,

I found a parameter that seems to have fixed the problem:

MV2_USE_RDMA_FAST_PATH=0

How crucial are the RDMA fast-path features, and are there recommended values for MV2_RDMA_FAST_PATH_BUF_SIZE that might also resolve this?
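
For reference, here is roughly how I am launching now with that parameter set (the process count, hostfile, and binary name below are just placeholders):

    mpirun_rsh -np 4 -hostfile ./hosts MV2_USE_RDMA_FAST_PATH=0 ./my_app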

Thanks again,
Brody


On May 21, 2013, at 9:51 AM, sreeram potluri <potluri at cse.ohio-state.edu> wrote:

> Hi Brody,
> 
> Thanks for reporting the issue. Can you send us a reproducer, if possible?
> 
> Sreeram Potluri
> 
> 
> On Tue, May 21, 2013 at 4:27 AM, Brody Huval <brodyh at stanford.edu> wrote:
> Hi,
> 
> I'm currently getting a deadlock from a CUDA+MPI program, and I think it is coming from MVAPICH2. The behavior is the same whether I send from device memory or from host memory (after a transfer from the GPU). Running the same code with MPICH2 does not produce the deadlock.
> 
> The deadlock seems to occur when I try to perform the equivalent of an asynchronous MPI_Allreduce on a single float using MPI_Isends and MPI_Irecvs. However, one of the nodes never receives its message, even though it was sent. When I run with 4 instances, the deadlock does not happen when they run on a single node, only when they are split across 2 nodes, and even then it depends on the rank order.
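> 
> A minimal sketch of the kind of exchange I mean (the function and buffer names and the simple all-to-all shape here are only illustrative, not the actual code): each rank posts an Irecv and an Isend of one float for every peer, waits on all of them, and then sums the values locally.
> 
>     #include <mpi.h>
> 
>     /* Emulated MPI_Allreduce(MPI_SUM) on a single float, built from
>        nonblocking point-to-point calls. */
>     static float emulated_allreduce_sum(float local, MPI_Comm comm)
>     {
>         int rank, size;
>         MPI_Comm_rank(comm, &rank);
>         MPI_Comm_size(comm, &size);
> 
>         float recvbuf[size];          /* one incoming value per peer  */
>         MPI_Request reqs[2 * size];   /* one recv + one send per peer */
>         int n = 0;
> 
>         for (int r = 0; r < size; ++r) {
>             if (r == rank) continue;
>             MPI_Irecv(&recvbuf[r], 1, MPI_FLOAT, r, 0, comm, &reqs[n++]);
>             MPI_Isend(&local,      1, MPI_FLOAT, r, 0, comm, &reqs[n++]);
>         }
>         MPI_Waitall(n, reqs, MPI_STATUSES_IGNORE);  /* this is where it hangs */
> 
>         float sum = local;
>         for (int r = 0; r < size; ++r)
>             if (r != rank) sum += recvbuf[r];
>         return sum;
>     }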
> 
> The problem goes away if I replace the Isends and Irecvs with MPI_Allreduce and use the flag MV2_CUDA_SMP_IPC=1. Using that flag with the Isends and Irecvs from host causes a segfault, while using it after transferring to host works.
> 
> Any idea what could cause this? Thanks in advance for any help.
> 
> Best,
> Brody Huval
