[mvapich-discuss] Problem with mvapich2 using CUDA

Devendar Bureddy bureddy at cse.ohio-state.edu
Wed Feb 8 10:46:41 EST 2012


Hi Michael

It is good to know that things are running fine now with MVAPICH2 CUDA
support. Thanks for the suggestion. We will improve the error messages
in future releases.

-Devendar

On Wed, Feb 8, 2012 at 5:04 AM, Michael Haidl
<michael.haidl at uni-muenster.de> wrote:
> Hi Devendar,
>
> First of all, thank you for your quick reply. It is not possible to share
> my code, so I tried to reproduce the error. I wasn't able to do so, even with
> the same number of Isend and Irecv calls and the same amount of data sent from
> node to node. So I went back to modifying my simulation and found an
> "unspecified launch error" coming from one of my kernels. Since these errors
> are asynchronous and mvapich2 checks against cudaSuccess, the lib caught this
> error and printed:
>
>
> [4] Abort: Cuda Stream Creation failed
>
> Now I can't see any problem with mvapich2 and CUDA support. It does what it
> has to. May I suggest that you print the error code of the failing CUDA
> API call along with the mvapich2 error messages? This could help in debugging
> such problems.
>
> Thanks again.
>
> Michael Haidl
>
>> Hi Michael
>>
>> Thanks for trying out MVAPICH2 GPU features.  It looks like CUDA
>> resources (streams/events) are running out here.  Is this happening if
>> you run 2 processes per node (i.e. -np 4) with increased problem size?
>> If you are using MPI_Irecv and MPI_Isend, are you progressing them
>> to completion? Is it possible for you to share your program so
>> that we can better analyze the reasons for these resource limitations?
>>
>> -Devendar
>> On Tue, Feb 7, 2012 at 12:56 PM, Michael Haidl<michael.haidl at gmx.de>
>>  wrote:
>>>
>>> I have the following problem:
>>>
>>> I am running a simulation with mvapich2 and CUDA support on 2 nodes. Each
>>> node has 5 GPUs. The nodes are connected via InfiniBand. With a small
>>> problem size the simulation must send 1.2 MB from process to process every
>>> loop (~76 loops per second). This works! If I increase the problem size,
>>> which also increases the amount of data transferred (now 15.2 MB per
>>> loop at ~3 loops per second), I get the following:
>>>
>>> [4] Abort: Cuda Stream Creation failed
>>>  at line 73 in file ibv_cuda_stream.c
>>>
>>> This is reproducible after ~740 loops.
>>>
>>> I also tried MV2_CUDA_EVENT_SYNC, with the same result, except that Event
>>> Creation failed instead of Stream Creation.
>>>
>>> My start-up command looks like this:
>>> mpirun_rsh -hostfile hosts -np 10 MV2_USE_CUDA=1 MV2_ENABLE_AFFINITY=1
>>> ./sim.x --mpi
>>>
>>> Any advice would be highly appreciated.
>>>
>>> Michael Haidl
>>>
>>> _______________________________________________
>>> mvapich-discuss mailing list
>>> mvapich-discuss at cse.ohio-state.edu
>>> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>>>
>
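
The root cause in this thread was a kernel failure that only surfaced later, inside an unrelated CUDA call made by the MPI library. A minimal sketch of how application code might catch such errors right after its own kernel launches, and print the error code as Michael suggests, could look like the following. The names cuda_ok, CUDA_OK, scale and scale_checked are hypothetical and are not part of MVAPICH2 or of the original simulation; only the CUDA runtime calls are real API.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper (not part of MVAPICH2): report a CUDA error with its
// numeric code and readable string. Returns true when err is cudaSuccess.
static bool cuda_ok(cudaError_t err, const char *what,
                    const char *file, int line)
{
    if (err == cudaSuccess) {
        return true;
    }
    fprintf(stderr, "%s:%d: %s failed: error %d (%s)\n",
            file, line, what, (int)err, cudaGetErrorString(err));
    return false;
}
#define CUDA_OK(call) cuda_ok((call), #call, __FILE__, __LINE__)

// Illustrative kernel standing in for one of the simulation kernels.
__global__ void scale(float *data, int n, float factor)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        data[i] *= factor;
    }
}

// Launch the kernel and surface any failure before control returns to code
// (such as an MPI library) that makes its own CUDA calls.
// Assumes n > 0 so that the grid size is valid.
bool scale_checked(float *d_data, int n, float factor)
{
    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    scale<<<blocks, threads>>>(d_data, n, factor);

    // Launch-time errors, e.g. an invalid launch configuration.
    if (!CUDA_OK(cudaGetLastError())) {
        return false;
    }
    // Execution-time errors are asynchronous and show up at the next
    // synchronizing call, so synchronize here while debugging.
    return CUDA_OK(cudaDeviceSynchronize());
}
```

Synchronizing after every launch costs performance, so a check like this is probably best kept to debug builds. Its value is that a failing kernel is reported where it happens, rather than as a stream or event creation failure somewhere else.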

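Devendar's question about progressing MPI_Isend and MPI_Irecv to completion can be illustrated with a short sketch in which every request posted in a loop iteration is completed before the next iteration starts. The function exchange_with_neighbour and its parameters are hypothetical. Passing device pointers is an assumption that only holds with a CUDA-aware MPI library, such as MVAPICH2 started with MV2_USE_CUDA=1 as in the original command line.

```cuda
#include <mpi.h>

// Hypothetical helper: exchange `count` floats with one neighbour rank and
// complete both requests before returning.
// Assumption: send_buf and recv_buf may be device pointers when the MPI
// library is CUDA-aware; with a plain MPI library they must be host pointers.
int exchange_with_neighbour(float *send_buf, float *recv_buf, int count,
                            int neighbour, MPI_Comm comm)
{
    MPI_Request reqs[2];

    // Post the receive first so that the matching send can be delivered.
    MPI_Irecv(recv_buf, count, MPI_FLOAT, neighbour, 0, comm, &reqs[0]);
    MPI_Isend(send_buf, count, MPI_FLOAT, neighbour, 0, comm, &reqs[1]);

    // Completing both requests here lets the library release whatever it
    // allocated for them. With the default error handler
    // (MPI_ERRORS_ARE_FATAL) a failing call aborts instead of returning.
    return MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);
}
```

Calling a helper like this once per simulation loop keeps the number of outstanding requests bounded. Requests that are posted but never waited on or tested can accumulate resources inside the library over many iterations.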

