[mvapich-discuss] asynchronous progress with CUDA

Carlos Osuna carlos.osuna at env.ethz.ch
Wed May 8 07:51:37 EDT 2013


Dear Devendar,

thanks for providing the patch. I was recently able to try it, and it works 
perfectly.

Cheers, Carlos

On 04/29/2013 01:29 AM, Devendar Bureddy wrote:
> Hi Carlos
>
> This issue is because no CUDA context is set in that thread. MVAPICH2 
> supports an internal asynchronous communication progress thread, enabled 
> with the run-time parameter MPICH_ASYNC_PROGRESS=1; you can use this 
> functionality instead of your own thread. The attached patch fixes the 
> context issue in the internal async progress thread. You can try the 
> patch together with the above-mentioned run-time flag. Please follow the 
> instructions below to apply the patch.
>
> [mvapich2-1.9rc1]$ patch -p1 < ./diff.patch
> patching file src/mpi/init/async.c
> patching file src/mpid/ch3/channels/mrail/src/gen2/ibv_cuda_util.c
> patching file src/mpid/ch3/include/mpidimpl.h
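>
> For reference, here is a minimal sketch of what the code could look like
> when relying on the internal progress thread instead of a user-created
> one (illustrative only; it assumes a two-rank job run with
> MPICH_ASYNC_PROGRESS=1 set in the environment and a CUDA-enabled build):
>
>     #include <mpi.h>
>     #include <cuda_runtime.h>
>
>     int main(int argc, char **argv)
>     {
>         int provided, rank;
>         /* the internal async progress thread needs full thread support */
>         MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
>         MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>
>         double *b;
>         cudaMalloc((void **)&b, sizeof(double));   /* device buffer */
>
>         MPI_Request req;
>         if (rank == 0)
>             MPI_Isend(b, 1, MPI_DOUBLE, 1, 599999, MPI_COMM_WORLD, &req);
>         else
>             MPI_Irecv(b, 1, MPI_DOUBLE, 0, 599999, MPI_COMM_WORLD, &req);
>
>         /* no user-level MPI_Test loop: with MPICH_ASYNC_PROGRESS=1 the
>            library's own thread can advance the transfer while the
>            application does other (GPU) work between Isend/Irecv and Wait */
>         MPI_Wait(&req, MPI_STATUS_IGNORE);
>
>         cudaFree(b);
>         MPI_Finalize();
>         return 0;
>     }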
>
> -Devendar
>
>
> On Tue, Apr 23, 2013 at 12:02 PM, Devendar Bureddy 
> <bureddy at cse.ohio-state.edu> wrote:
>
>     Hi Carlos
>
>     Thanks for your report. We will take a look at it
>
>     -Devendar
>
>
>     On Tue, Apr 23, 2013 at 3:47 AM, Osuna Escamilla Carlos
>     <carlos.osuna at env.ethz.ch> wrote:
>
>         Dear mvapich2 team
>
>         I have a fat node with 8 GPUs and a simple communication pattern
>         with MPI_Isend & MPI_Irecv on GPU pointers, which I would like to
>         progress with an additional thread.
>
>         Below I post a snippet with the function that is called via
>         pthread_create (the tag within the MPI_Irecv is never fulfilled,
>         so the loop keeps polling).
>
>         void* mpi_test_fn(void* ptr)
>         {
>           MPI_Request req;
>           MPI_Status status;
>           double* b;
>           cudaMalloc((void**)&b, sizeof(double));  /* device receive buffer */
>
>           /* receive directly into the device pointer */
>           MPI_Irecv(b, 1, MPI_DOUBLE, 0, 599999, MPI_COMM_WORLD, &req);
>           int flag = 0;
>           while (!flag)   /* poll until the request completes */
>             MPI_Test(&req, &flag, &status);
>           return NULL;
>         }
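>
>         For completeness, the thread is created roughly like this (a
>         sketch; the helper name is made up, error handling is omitted,
>         and MPI is initialized with MPI_THREAD_MULTIPLE beforehand):
>
>         #include <pthread.h>
>
>         void start_progress_thread(pthread_t* tid)
>         {
>           /* mpi_test_fn is the polling function shown above; it runs
>              until its request completes */
>           pthread_create(tid, NULL, mpi_test_fn, NULL);
>         }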
>
>         The trick works with CPU communication, i.e. if the pointer I
>         pass to MPI_Isend & MPI_Irecv is a host pointer, the
>         asynchronous progress seems to work as well.
>         But it crashes when I use GPU pointers (it is the thread
>         created with pthread_create, the one calling MPI_Test, that crashes).
>
>         The segmentation fault happens in
>         src/mpid/ch3/channels/mrail/src/gen2/ibv_cuda_rndv.c
>         in the MPIDI_CH3_CUDAIPC_Rendezvous_push function.
>
>         Early in this function there is some code like (simplifying)
>                cudaStream_t strm = 0;
>                strm = stream_d2h;
>         But stream_d2h was never created, and therefore strm contains
>         a null pointer, which later on triggers the segfault.
>
>         The crash only happens with VAPI_PROTOCOL_CUDAIPC. I also
>         tested with devices without peer-to-peer capability, where the
>         whole communication has to go via VAPI_PROTOCOL_R3; that path
>         seems to work, i.e. there is no crash and the progress happens.
>
>         Am I missing something? Perhaps someone has already succeeded
>         with asynchronous progress for CUDA device communication using
>         a different approach?
>
>         For reference, I am using mvapich2/1.9rc1 with the following
>         configure line:
>         ./configure --enable-threads=multiple --enable-shared
>         --enable-sharedlibs=gcc --enable-fc --enable-cxx --with-mpe
>         --enable-rdma-cm --enable-fast --enable-smpcoll --with-hwloc
>         --enable-xrc --with-device=ch3:mrail --with-rdma=gen2
>         --enable-cuda --enable-g=dbg --enable-debuginfo
>         --enable-async-progress CC=gcc CXX=g++ FC=gfortran F77=gfortran
>
>
>         thanks for the help, Carlos
>
>
>
>
>
>     -- 
>     Devendar
>
>
>
>
> -- 
> Devendar


-- 
----------------------------------------------------

Carlos Osuna
ETH Zürich
Institute for Atmospheric and Climate Sciences
Universitätstrasse 16
CH-8092 Zurich, Switzerland
Tel: +41 (44) 632 82 66
                                                        
----------------------------------------------------
