[mvapich-discuss] Segmentation fault using mvapich2 to do gpu-gpu communication

Ye Wang wang1351 at purdue.edu
Thu Jun 27 10:29:13 EDT 2013


Hi,

I got a segmentation fault when I tried to run the following test case.

If I ran 2 processes using the command <<<mpirun -genv MV2_USE_CUDA=1 -n 2 ./a.out>>>, it worked fine.
But if I ran 3 processes, it failed. According to the backtrace, the failure seems to be in the function MPIDI_CH3_CUDAIPC_Rendezvous_push.
If I set the environment variable MV2_CUDA_IPC to 0, it worked fine (see the example command below).
Is there a problem in my test case, or is it a bug in mvapich2?
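
For example, with the IPC path disabled the 3-process run completes (assuming the variable is passed with -genv as above):
<<<mpirun -genv MV2_USE_CUDA=1 -genv MV2_CUDA_IPC=0 -n 3 ./a.out>>>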

Backtrace:
(gdb) bt
#0  0x00002b9c07a18270 in ?? () from /usr/lib64/libcuda.so.1
#1  0x00002b9c079faef5 in ?? () from /usr/lib64/libcuda.so.1
#2  0x00002b9c07a065c7 in ?? () from /usr/lib64/libcuda.so.1
#3  0x00002b9c0793a652 in ?? () from /usr/lib64/libcuda.so.1
#4  0x00002b9c07918d08 in ?? () from /usr/lib64/libcuda.so.1
#5  0x00002b9c075fb7d5 in ?? () from /apps/rhel6/cuda/5.0.35/lib64/libcudart.so.5.0
#6  0x00002b9c0762cecb in cudaStreamWaitEvent () from /apps/rhel6/cuda/5.0.35/lib64/libcudart.so.5.0
#7  0x00002b9c06e584a6 in MPIDI_CH3_CUDAIPC_Rendezvous_push (vc=0x1442b10, sreq=0x2b9c071bed80)
    at src/mpid/ch3/channels/mrail/src/gen2/ibv_cuda_rndv.c:121
#8  0x00002b9c06e1be5b in MPIDI_CH3_Rendezvous_push (vc=0x1442b10, sreq=0x2b9c071bed80)
    at src/mpid/ch3/channels/mrail/src/rdma/ch3_rndvtransfer.c:386
#9  0x00002b9c06e1c228 in MPIDI_CH3I_MRAILI_Process_rndv ()
    at src/mpid/ch3/channels/mrail/src/rdma/ch3_rndvtransfer.c:814
#10 0x00002b9c06e19ff2 in MPIDI_CH3I_Progress (is_blocking=1, state=<value optimized out>)
    at src/mpid/ch3/channels/mrail/src/rdma/ch3_progress.c:323
#11 0x00002b9c06eec9df in PMPI_Send (buf=0x2300300000, count=20, datatype=<value optimized out>, dest=2, 
    tag=<value optimized out>, comm=<value optimized out>) at src/mpi/pt2pt/send.c:161
#12 0x0000000000400b82 in main ()

Test Program:
********************************begin program*******************************************
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>   /* CUDA runtime API: cudaSetDevice, cudaMalloc, cudaMemset */

void send_recv(int n, int myid, int numprocs){
        char *t, *r;
        int prev, next;

        /* allocate send and receive buffers in device memory */
        cudaMalloc((void **)&t, n);
        cudaMalloc((void **)&r, n);

        cudaMemset(t, 'a', n);
        cudaMemset(r, 'b', n);

        /* nearest-neighbor exchange along the rank order */
        prev = (myid==0) ? MPI_PROC_NULL : (myid-1);
        next = (myid==numprocs-1) ? MPI_PROC_NULL : (myid+1);

        /* even ranks send first, odd ranks receive first, to avoid deadlock */
        if ((myid%2)==0){
                MPI_Send(t, n, MPI_CHAR, next, 0, MPI_COMM_WORLD);
                MPI_Recv(r, n, MPI_CHAR, prev, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        }
        else {
                MPI_Recv(r, n, MPI_CHAR, prev, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
                MPI_Send(t, n, MPI_CHAR, next, 0, MPI_COMM_WORLD);
        }

        cudaFree(t);
        cudaFree(r);
}

int main(int argc, char *argv[]){
        int numprocs, myid;
        int n;

        MPI_Init(&argc, &argv);
        MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
        MPI_Comm_rank(MPI_COMM_WORLD, &myid);

        /* 3 GPUs per node: map ranks to devices round-robin */
        cudaSetDevice(myid%3);

        n = 20;
        send_recv(n, myid, numprocs);
        MPI_Finalize();
        return 0;
}
*****************************************end program**************************************
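
In case it helps reproduce the problem, the test can be built and run along these lines (the CUDA include/lib paths below are taken from the libcudart path in the backtrace and may differ on other systems):
<<<mpicc test.c -o a.out -I/apps/rhel6/cuda/5.0.35/include -L/apps/rhel6/cuda/5.0.35/lib64 -lcudart>>>
<<<mpirun -genv MV2_USE_CUDA=1 -n 3 ./a.out>>>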

