[mvapich-discuss] Error registering memory with CUDA

Adam T. Moody moody20 at llnl.gov
Thu Aug 8 19:44:55 EDT 2013


Hi Sreeram,
As far as we can tell, the procs are picking different devices, and the GPU 
is in the correct mode.  However, one clue that we uncovered is that 
forcing procs to sleep for different amounts of time before registering 
helps.  It seems the problem is a race condition when two procs call 
cudaHostRegister at the same time.  If we force a delay between procs, 
there is no error.  Any idea what's going on here?
-Adam
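
The staggering workaround described above can be sketched roughly as
follows. This is only an illustration under assumptions, not the code
used in the failing runs and not a fix inside MVAPICH2: the names
stagger_delay_us, staggered_register, register_fn and fake_register are
hypothetical, the caller is assumed to know its node-local rank, and the
registration call is passed in as a callback so that the sketch compiles
without the CUDA toolkit. In a real program the callback would wrap
cudaHostRegister(ptr, size, flags).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical callback type: returns 0 on success, nonzero on failure.
 * A real callback would wrap cudaHostRegister(ptr, size, flags). */
typedef int (*register_fn)(void *ptr, size_t size);

/* Delay for one process, chosen so that processes with different
 * node-local ranks do not start registering at the same moment. */
long stagger_delay_us(int local_rank, long step_us)
{
    assert(local_rank >= 0 && step_us >= 0);
    return (long)local_rank * step_us;
}

/* Sleep for the given number of microseconds using POSIX nanosleep. */
void sleep_us(long us)
{
    struct timespec ts;
    ts.tv_sec = us / 1000000L;
    ts.tv_nsec = (us % 1000000L) * 1000L;
    nanosleep(&ts, NULL);
}

/* Wait for this process's staggered delay, then register once and
 * return whatever the registration callback returns. */
int staggered_register(register_fn reg, void *ptr, size_t size,
                       int local_rank, long step_us)
{
    sleep_us(stagger_delay_us(local_rank, step_us));
    return reg(ptr, size);
}

/* Stand-in for the real registration call (assumption: used here only
 * so the sketch can run without a GPU). Rejects empty buffers. */
int fake_register(void *ptr, size_t size)
{
    return (ptr != NULL && size > 0) ? 0 : 1;
}
```

With a step of 1000 microseconds, local rank 0 registers immediately,
local rank 1 after about 1 ms, and so on. A delay like this only makes
the overlap less likely; it does not explain or remove the underlying
race.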

 
sreeram potluri wrote:

>Hi Adam,
>
>I have seen this error earlier when a user tries to share a GPU between two
>processes but the GPU is set in thread exclusive or process exclusive mode.
>Can you check with the user if this is the case?
>
>This can also happen in other cases like when devices are not initialized
>properly using deviceQuery. However, I suspect that the former is the case.
>
>Best
>Sreeram Potluri
>
>On Fri, Jul 19, 2013 at 8:49 PM, Adam T. Moody <moody20 at llnl.gov> wrote:
>
>  
>
>>Hello MVAPICH team,
>>Someone is running on a system using MVAPICH2-1.9 with CUDA enabled, but
>>he is sometimes (90% of his runs) failing with the following error.
>>
>>[edge42:mpi_rank_0][ibv_cuda_register] src/mpid/ch3/channels/mrail/src/gen2/ibv_cuda_util.c:704:
>>cudaHostRegister Failed
>>[edge42:mpi_rank_1][ibv_cuda_register] src/mpid/ch3/channels/mrail/src/gen2/ibv_cuda_util.c:704:
>>cudaHostRegister Failed
>>[edge63:mpi_rank_2][ibv_cuda_register] src/mpid/ch3/channels/mrail/src/gen2/ibv_cuda_util.c:704:
>>cudaHostRegister Failed
>>[edge63:mpi_rank_3][ibv_cuda_register] src/mpid/ch3/channels/mrail/src/gen2/ibv_cuda_util.c:704:
>>cudaHostRegister Failed
>>
>>Have you seen this before?  Do you know why it might happen?
>>Thanks,
>>-Adam


