[mvapich-discuss] Segmentation fault using mvapich2 to do gpu-gpu communication

Devendar Bureddy bureddy at cse.ohio-state.edu
Thu Jun 27 10:59:03 EDT 2013


Hi Ye Wang

Currently, MVAPICH2 requires the CUDA context to be set before MPI_Init()
(see the user guide section:
http://mvapich.cse.ohio-state.edu/support/user_guide_mvapich2-1.9.html#x1-840006.20).
We will remove this limitation in future releases.  Can you try moving
cudaSetDevice() before MPI_Init()?
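
For reference, a minimal sketch of that change (this assumes the launcher
exports the MV2_COMM_WORLD_LOCAL_RANK environment variable; please check the
user guide section above for the exact variable on your setup, and it keeps
your "% 3" GPU mapping only as an example):

    #include <mpi.h>
    #include <stdlib.h>
    #include <cuda_runtime.h>

    int main(int argc, char *argv[])
    {
        /* Node-local rank from the environment (assumed to be exported by
           the MVAPICH2 launcher); fall back to 0 if it is absent. */
        int local_rank = 0;
        char *str = getenv("MV2_COMM_WORLD_LOCAL_RANK");
        if (str != NULL)
            local_rank = atoi(str);

        /* Select the GPU (and thereby create the CUDA context) first ... */
        cudaSetDevice(local_rank % 3);   /* adjust to the GPUs per node */

        /* ... and only then initialize MPI. */
        MPI_Init(&argc, &argv);

        /* ... rest of the application (send_recv, etc.) ... */

        MPI_Finalize();
        return 0;
    }

With the device selected before MPI_Init(), the 3-process run
(mpirun -genv MV2_USE_CUDA=1 -n 3 ./a.out) should no longer need the
MV2_CUDA_IPC=0 workaround.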

-Devendar


On Thu, Jun 27, 2013 at 10:29 AM, Ye Wang <wang1351 at purdue.edu> wrote:

> Hi,
>
> I got a segmentation fault when I tried to run the following test case.
>
> If I ran 2 processes with the command <<<mpirun -genv MV2_USE_CUDA=1 -n 2
> ./a.out>>>, it worked fine.
> But if I ran 3 processes, it failed. According to the backtrace, it seemed
> to fail in the subroutine MPIDI_CH3_CUDAIPC_Rendezvous_push.
> If I set the variable MV2_CUDA_IPC to 0, it worked fine.
> Is there a problem with my test case, or is it a bug in mvapich2?
>
> Backtrace:
> (gdb) bt
> #0  0x00002b9c07a18270 in ?? () from /usr/lib64/libcuda.so.1
> #1  0x00002b9c079faef5 in ?? () from /usr/lib64/libcuda.so.1
> #2  0x00002b9c07a065c7 in ?? () from /usr/lib64/libcuda.so.1
> #3  0x00002b9c0793a652 in ?? () from /usr/lib64/libcuda.so.1
> #4  0x00002b9c07918d08 in ?? () from /usr/lib64/libcuda.so.1
> #5  0x00002b9c075fb7d5 in ?? () from /apps/rhel6/cuda/5.0.35/lib64/libcudart.so.5.0
> #6  0x00002b9c0762cecb in cudaStreamWaitEvent () from /apps/rhel6/cuda/5.0.35/lib64/libcudart.so.5.0
> #7  0x00002b9c06e584a6 in MPIDI_CH3_CUDAIPC_Rendezvous_push (vc=0x1442b10, sreq=0x2b9c071bed80)
>     at src/mpid/ch3/channels/mrail/src/gen2/ibv_cuda_rndv.c:121
> #8  0x00002b9c06e1be5b in MPIDI_CH3_Rendezvous_push (vc=0x1442b10, sreq=0x2b9c071bed80)
>     at src/mpid/ch3/channels/mrail/src/rdma/ch3_rndvtransfer.c:386
> #9  0x00002b9c06e1c228 in MPIDI_CH3I_MRAILI_Process_rndv ()
>     at src/mpid/ch3/channels/mrail/src/rdma/ch3_rndvtransfer.c:814
> #10 0x00002b9c06e19ff2 in MPIDI_CH3I_Progress (is_blocking=1, state=<value optimized out>)
>     at src/mpid/ch3/channels/mrail/src/rdma/ch3_progress.c:323
> #11 0x00002b9c06eec9df in PMPI_Send (buf=0x2300300000, count=20, datatype=<value optimized out>, dest=2,
>     tag=<value optimized out>, comm=<value optimized out>) at src/mpi/pt2pt/send.c:161
> #12 0x0000000000400b82 in main ()
>
> Test Program:
> ********************************begin program*******************************************
> #include <mpi.h>
> #include <stdio.h>
> #include <stdlib.h>
> #include <cuda_runtime.h>
>
> /* Exchange n bytes of device memory with the neighboring ranks. */
> void send_recv(int n, int myid, int numprocs){
>         char *t, *r;
>         int prev, next;
>
>         cudaMalloc((void **)&t, n);
>         cudaMalloc((void **)&r, n);
>
>         cudaMemset(t, 'a', n);
>         cudaMemset(r, 'b', n);
>         prev = (myid==0) ? MPI_PROC_NULL : (myid-1);
>         next = (myid==numprocs-1) ? MPI_PROC_NULL : (myid+1);
>
>         if ((myid%2)==0){
>                 MPI_Send(t, n, MPI_CHAR, next, 0, MPI_COMM_WORLD);
>                 MPI_Recv(r, n, MPI_CHAR, prev, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
>         }
>         else {
>                 MPI_Recv(r, n, MPI_CHAR, prev, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
>                 MPI_Send(t, n, MPI_CHAR, next, 0, MPI_COMM_WORLD);
>         }
>
>         cudaFree(t);
>         cudaFree(r);
> }
>
> int main(int argc, char *argv[]){
>         int numprocs, myid;
>         int n, ierr;
>
>         MPI_Init(&argc, &argv);
>         MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
>         MPI_Comm_rank(MPI_COMM_WORLD, &myid);
>
>         ierr = cudaSetDevice(myid%3);
>         n = 20;
>         send_recv(n, myid, numprocs);
>         MPI_Finalize();
>         return 0;
> }
> *****************************************end program**************************************



-- 
Devendar