[mvapich-discuss] asynchronous progress with CUDA

Devendar Bureddy bureddy at cse.ohio-state.edu
Wed May 8 10:41:13 EDT 2013


Hi Carlos

Good to know that the patch resolved the issue. This fix is also included
in the latest MVAPICH2-1.9 GA release, so you can simply upgrade to the latest version.

-Devendar


On Wed, May 8, 2013 at 7:51 AM, Carlos Osuna <carlos.osuna at env.ethz.ch> wrote:

>  Dear Devendar,
>
> thanks for providing the patch. I was recently able to try it, and it works
> perfectly.
>
> Cheers, Carlos
>
>
> On 04/29/2013 01:29 AM, Devendar Bureddy wrote:
>
> Hi Carlos
>
> This issue occurs because no CUDA context is set in the thread. MVAPICH2
> supports an internal asynchronous communication progress thread, enabled with
> the run-time parameter MPICH_ASYNC_PROGRESS=1; you can use this functionality
> instead of creating your own thread. The attached patch fixes the context issue
> in the internal async progress thread. You can try the patch together with the
> above-mentioned run-time flag. Please follow the instructions below to apply it.
>
>  [mvapich2-1.9rc1]$ patch -p1 < ./diff.patch
> patching file src/mpi/init/async.c
> patching file src/mpid/ch3/channels/mrail/src/gen2/ibv_cuda_util.c
> patching file src/mpid/ch3/include/mpidimpl.h
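>
>  Once the patch is applied and the library is rebuilt, the flag is set at run
> time. As a rough example of a launch (the node names and executable here are
> placeholders, and MV2_USE_CUDA=1 is assumed for communication from GPU buffers):
>
>  $ mpirun_rsh -np 2 node1 node2 MV2_USE_CUDA=1 MPICH_ASYNC_PROGRESS=1 ./your_app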
>
>  -Devendar
>
>
> On Tue, Apr 23, 2013 at 12:02 PM, Devendar Bureddy <
> bureddy at cse.ohio-state.edu> wrote:
>
>> Hi Carlos
>>
>>  Thanks for your report. We will take a look at it
>>
>>  -Devendar
>>
>>
>> On Tue, Apr 23, 2013 at 3:47 AM, Osuna Escamilla Carlos <
>> carlos.osuna at env.ethz.ch> wrote:
>>
>>> Dear mvapich2 team
>>>
>>> I have a fat node with 8 GPUs and a simple communication pattern using
>>> MPI_Isend & MPI_Irecv on GPU pointers, which I would like to progress with an
>>> additional thread.
>>>
>>> Below is a snippet with the function that is started via pthread_create
>>> (the tag used in the MPI_Irecv is never matched, so the request never
>>> completes and the thread keeps polling).
>>>
>>> void* mpi_test_fn(void* ptr)
>>> {
>>>   MPI_Request req;
>>>   MPI_Status status;
>>>   double* b;
>>>   /* allocate a single double on the device */
>>>   cudaMalloc((void**)&b, sizeof(double));
>>>
>>>   /* post a receive directly on the device pointer (b, not &b) */
>>>   MPI_Irecv(b, 1, MPI_DOUBLE, 0, 599999, MPI_COMM_WORLD, &req);
>>>   int flag = 0;
>>>   /* busy-poll to drive MPI progress; the receive is never matched */
>>>   while (true)
>>>     MPI_Test(&req, &flag, &status);
>>>   return NULL;
>>> }
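>>>
>>> For completeness, the surrounding setup is roughly as sketched below (a
>>> simplified version of my code; device selection and error checking are
>>> omitted, and the names are placeholders):
>>>
>>> #include <mpi.h>
>>> #include <pthread.h>
>>>
>>> void* mpi_test_fn(void* ptr);  /* the function shown above */
>>>
>>> int main(int argc, char** argv)
>>> {
>>>   int provided = 0;
>>>   /* the extra thread makes MPI calls concurrently with the main thread,
>>>      so full thread support is requested */
>>>   MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
>>>
>>>   pthread_t tid;
>>>   pthread_create(&tid, NULL, mpi_test_fn, NULL);
>>>
>>>   /* ... the main thread posts its own MPI_Isend / MPI_Irecv on GPU
>>>      buffers here, while mpi_test_fn spins in MPI_Test ... */
>>>
>>>   pthread_join(tid, NULL);  /* blocks forever in this experiment, since
>>>                                the test loop never exits */
>>>   MPI_Finalize();
>>>   return 0;
>>> }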
>>>
>>> This trick works for CPU communication, i.e. if the pointer I pass to
>>> MPI_Isend & MPI_Irecv is a host pointer, and the asynchronous progress
>>> also seems to work in that case.
>>> However, it crashes when I use GPU pointers (the thread that crashes is the
>>> one created with pthread_create and calling MPI_Test).
>>>
>>> The segmentation fault happens in
>>> src/mpid/ch3/channels/mrail/src/gen2/ibv_cuda_rndv.c
>>> in the MPIDI_CH3_CUDAIPC_Rendezvous_push function.
>>>
>>> Early in this function there is code like the following (simplifying):
>>>
>>>   cudaStream_t strm = 0;
>>>   strm = stream_d2h;
>>>
>>> But stream_d2h was never created, so strm contains a null pointer,
>>> which later triggers the segmentation fault.
>>>
>>> The crash only happens with VAPI_PROTOCOL_CUDAIPC. I also tested devices
>>> without peer-to-peer capability, where the whole communication has to go
>>> through VAPI_PROTOCOL_R3; that path seems to work, i.e. there is no crash
>>> and progress happens.
>>>
>>> Am I missing something? Perhaps someone has already succeeded with
>>> asynchronous progress for CUDA device communication using a different
>>> approach?
>>>
>>> For reference, I am using mvapich2/1.9rc1 with the following configure options:
>>>
>>> ./configure --enable-threads=multiple --enable-shared
>>> --enable-sharedlibs=gcc --enable-fc --enable-cxx --with-mpe
>>> --enable-rdma-cm --enable-fast --enable-smpcoll --with-hwloc --enable-xrc
>>> --with-device=ch3:mrail --with-rdma=gen2 --enable-cuda --enable-g=dbg
>>> --enable-debuginfo --enable-async-progress CC=gcc CXX=g++ FC=gfortran
>>> F77=gfortran
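>>>
>>> (For completeness: the installed build's configuration can be double-checked
>>> with the mpiname utility that comes with MVAPICH2, assuming it is in the PATH:
>>>
>>>   $ mpiname -a
>>>
>>> which prints the version together with the configure flags used.)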
>>>
>>>
>>> thanks for the help, Carlos
>>>
>>> _______________________________________________
>>> mvapich-discuss mailing list
>>> mvapich-discuss at cse.ohio-state.edu
>>> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>>>
>>
>>
>>
>>   --
>> Devendar
>>
>
>
>
>  --
> Devendar
>
>
>
> --
> ----------------------------------------------------
>
> Carlos Osuna
> ETH Zürich
> Institute for Atmospheric and Climate Sciences
> Universitätstrasse 16
> CH-8092 Zurich, Switzerland
> Tel: +41 (44) 632 82 66
>
> ----------------------------------------------------
>
>


-- 
Devendar