[mvapich-discuss] Error registering memory with CUDA
Adam T. Moody
moody20 at llnl.gov
Thu Aug 8 19:44:55 EDT 2013
Hi Sreeram,
As far as we can tell, the procs are picking different devices, and the GPU
is in the correct mode. However, one clue that we uncovered is that
forcing procs to sleep for different amounts of time before registering
helps. It seems the problem is a race condition when two procs call
cudaHostRegister at the same time. If we force a delay between procs,
there is no error. Any idea what's going on here?
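
For reference, here is a minimal sketch of the delay workaround in C. It
assumes each proc knows its local rank and the number of procs on its node
(with MVAPICH2 the local rank can typically be read from the
MV2_COMM_WORLD_LOCAL_RANK environment variable). The names register_fn,
stagger_slot, sleep_ms, register_staggered and fail_n_times are hypothetical,
and the callback stands in for a wrapper around cudaHostRegister so that the
sketch compiles without CUDA. It only spaces the calls out in time; it does
not explain or fix the underlying race.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L   /* for nanosleep() */
#endif
#include <stddef.h>
#include <time.h>

/* Hypothetical callback: returns 0 on success, nonzero on failure.
 * In a real program it would wrap cudaHostRegister(ptr, size, flags). */
typedef int (*register_fn)(void *ptr, size_t size, void *ctx);

/* Time slot for a given attempt. Procs on the same node with different
 * local ranks never share a slot, on the first try or on any retry. */
long stagger_slot(int local_rank, int local_size, int attempt)
{
    return (long)attempt * local_size + local_rank;
}

/* Sleep for ms milliseconds; zero or negative values return at once. */
void sleep_ms(long ms)
{
    struct timespec ts;
    if (ms <= 0)
        return;
    ts.tv_sec = ms / 1000;
    ts.tv_nsec = (ms % 1000) * 1000000L;
    nanosleep(&ts, NULL);
}

/* Wait for this proc's slot, then try to register; retry in later slots.
 * Returns the number of attempts used, or -1 if every attempt failed. */
int register_staggered(register_fn fn, void *ptr, size_t size, void *ctx,
                       int local_rank, int local_size,
                       int max_attempts, long step_ms)
{
    int attempt;
    for (attempt = 0; attempt < max_attempts; attempt++) {
        sleep_ms(stagger_slot(local_rank, local_size, attempt) * step_ms);
        if (fn(ptr, size, ctx) == 0)
            return attempt + 1;
    }
    return -1;
}

/* Stub for trying the sketch without a GPU: fails as many times as the
 * int counter that ctx points to, then succeeds. */
int fail_n_times(void *ptr, size_t size, void *ctx)
{
    int *left = (int *)ctx;
    (void)ptr;
    (void)size;
    if (*left > 0) {
        (*left)--;
        return 1;
    }
    return 0;
}
```

In an MPI program the stub would be replaced by a callback that calls
cudaHostRegister and returns nonzero when the result is not cudaSuccess.
The step length (step_ms) is a guess that would need tuning on the system.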
-Adam
sreeram potluri wrote:
>Hi Adam,
>
>I have seen this error earlier when a user tries to share a GPU between two
>processes but the GPU is set in thread exclusive or process exclusive mode.
>Can you check with the user if this is the case?
>
>This can also happen in other cases, such as when devices are not initialized
>properly using deviceQuery. However, I suspect that the former is the case.
>
>Best
>Sreeram Potluri
>
>On Fri, Jul 19, 2013 at 8:49 PM, Adam T. Moody <moody20 at llnl.gov> wrote:
>
>>Hello MVAPICH team,
>>Someone is running on a system using MVAPICH2-1.9 with CUDA enabled, but
>>most of his runs (90% of them) fail with the following error.
>>
>>[edge42:mpi_rank_0][ibv_cuda_register] src/mpid/ch3/channels/mrail/src/gen2/ibv_cuda_util.c:704:
>>cudaHostRegister Failed
>>[edge42:mpi_rank_1][ibv_cuda_register] src/mpid/ch3/channels/mrail/src/gen2/ibv_cuda_util.c:704:
>>cudaHostRegister Failed
>>[edge63:mpi_rank_2][ibv_cuda_register] src/mpid/ch3/channels/mrail/src/gen2/ibv_cuda_util.c:704:
>>cudaHostRegister Failed
>>[edge63:mpi_rank_3][ibv_cuda_register] src/mpid/ch3/channels/mrail/src/gen2/ibv_cuda_util.c:704:
>>cudaHostRegister Failed
>>
>>Have you seen this before? Do you know why it might happen?
>>Thanks,
>>-Adam
>>_______________________________________________
>>mvapich-discuss mailing list
>>mvapich-discuss at cse.ohio-state.edu
>>http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss