[EXTERNAL] Re: [mvapich-discuss] MVA1.9a --enable-cuda with ch3:socks compile errors

Christian Trott crtrott at sandia.gov
Mon Oct 29 12:16:50 EDT 2012


The patch solved the issue. Everything is now working as expected :-).

Thanks
Christian

On 10/25/2012 08:20 PM, sreeram potluri wrote:
> Hi Christian,
>
>> [perseus.sandia.gov:mpi_rank_0][cuda_stage_free] cudaMemcpy failed with 11 at 1564
>>
> I assume you are using the CUDA 5.0 toolkit. There was a change in the
> behavior of device pointer detection in the driver, and we added a patch to
> handle it in our MVAPICH2 1.8.1 release. The attached patch should fix this
> in the 1.9a version you're using.
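>
> For context, that detection amounts to asking the CUDA driver whether a
> buffer is device memory before picking the copy path. A rough sketch of such
> a check (illustrative only, not the actual MVAPICH2 code; the helper name is
> made up, and it assumes the driver API from cuda.h with a context already
> established):
>
>     #include <cuda.h>
>     #include <stdint.h>
>
>     /* Hypothetical helper: return 1 if buf is device memory, 0 otherwise. */
>     static int is_device_pointer(const void *buf)
>     {
>         CUmemorytype mem_type = (CUmemorytype)0;
>         CUresult res = cuPointerGetAttribute(&mem_type,
>                                              CU_POINTER_ATTRIBUTE_MEMORY_TYPE,
>                                              (CUdeviceptr)(uintptr_t)buf);
>         /* Ordinary host memory is unknown to the driver, so the query can
>          * fail instead of reporting CU_MEMORYTYPE_HOST; treat any failure
>          * as a host buffer. */
>         if (res != CUDA_SUCCESS)
>             return 0;
>         return (mem_type == CU_MEMORYTYPE_DEVICE);
>     }
>
> How the driver answers this query for host pointers is the sort of behavior
> the patch accounts for; the real code may differ from this sketch.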
>
>
>> MPI_Init always attaches my processes to GPU 0 only, no matter whether I set
>> the device to something else via cudaSetDevice before calling MPI_Init. Even
>> adding a cudaThreadSynchronize before MPI_Init doesn't change that. For
>> example, when running this way with two processes and telling them to use
>> GPU 0 and GPU 1 respectively, nvidia-smi reports three contexts: two on
>> GPU 0 and one on GPU 1. So I assume rank 1 got two contexts, one chosen by
>> me and one chosen by MPI (which defaults to GPU 0). But when MPI and I
>> choose different GPUs, the code crashes. It does that even when running just
>> one process (in which case no actual communication using GPU buffers takes
>> place).
>>
> MVAPICH2 should respect your device selection as long as it is done before
> MPI_Init. I tried the test case below and it works as expected; I have pasted
> the corresponding nvidia-smi output. Is this similar to what you are trying
> to do? If you can share the test case with which you are seeing the issue,
> that would be great.
>
> -------------------------------------------------------
>      int mydev = 1;                /* pick the non-default GPU            */
>      cudaSetDevice(mydev);         /* device selection before MPI_Init    */
>
>      MPI_Init(&c, &v);             /* &argc, &argv                        */
>
>      fprintf(stderr, "After MPI_Init \n");
>      sleep(30);                    /* stay alive so nvidia-smi can be run */
>
>      MPI_Finalize();
> -------------------------------------------------------
> nvidia-smi output
>
> +-----------------------------------------------------------------------------+
> | Compute processes:                                               GPU Memory |
> |  GPU       PID  Process name                                     Usage      |
> |=============================================================================|
> |    1     28448  ./test_event_destroy                                  51MB  |
> +-----------------------------------------------------------------------------+
> --------------------------------------------------------
>
> Best
> Sreeram Potluri
>
>
>>
>> On 10/24/2012 09:06 AM, sreeram potluri wrote:
>>
>>> You should be able to use MVAPICH2 within a node as long as libibverbs is
>>> available. I am assuming you are using 1.9a for this test too.
>>>
>>> Can you try using these options when you configure: --disable-rdmacm
>>> --disable-mcast
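>>>
>>> For example, reusing the CUDA and install paths from your earlier configure
>>> line and letting the device default to OFA-IB-CH3 (an untested sketch,
>>> adjust the paths to your setup):
>>>
>>>     ./configure --enable-cuda --with-cuda=/opt/nvidia/cuda/5.0.36/ \
>>>         --disable-rdmacm --disable-mcast \
>>>         --prefix=/opt/mpi/mvapich2-1.9/intel-12.1/cuda5036 \
>>>         CC=/opt/intel/composer_xe_2011_sp1.9.293/bin/intel64/icc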
>>>
>>> If that does not work, can you give us more details on the issue you are
>>> facing?
>>>
>>> The designs for internode GPU communication in MVAPICH2 take advantage of
>>> features offered by InfiniBand. There are no plans to move these to
>>> ch3:sock at this point.
>>>
>>> Sreeram Potluri
>>>
>>> On Wed, Oct 24, 2012 at 10:50 AM, Christian Trott <crtrott at sandia.gov>
>>> wrote:
>>>
>>>> Thanks,
>>>> that's what I thought. Do you know if I can compile for that interface on
>>>> my local workstation, which does not have InfiniBand? And if yes, do you
>>>> have a link to a list of what I need to install? (Just adding libibverbs
>>>> via yum didn't seem to be sufficient.) Also, is support for GPU-to-GPU
>>>> transfer planned for the CH3:sock interface?
>>>> I am currently deciding whether or not to rely on direct CUDA support
>>>> within MPI for a number of projects (currently just an evaluation, but
>>>> potentially that would include Trilinos and LAMMPS from Sandia) instead of
>>>> writing my own data shuffling stuff. My current status is that it seems we
>>>> have support on InfiniBand clusters from both MVAPICH2 and OpenMPI, Cray
>>>> seems to have something in release soon for their network, and OpenMPI
>>>> seems to work on my local machine as well.
>>>>
>>>> Cheers
>>>> Christian
>>>>
>>>>
>>>>
>>>> On 10/24/2012 08:40 AM, sreeram potluri wrote:
>>>>
>>>>> Hi Christian,
>>>>>
>>>>> GPU support is only available with the InfiniBand Gen2 (OFA-IB-CH3)
>>>>> interface.
>>>>>
>>>>> Please refer to these sections of our user guide on how to build and run:
>>>>>
>>>>> http://mvapich.cse.ohio-state.edu/support/user_guide_mvapich2-1.9a.html#x1-140004.5
>>>>> http://mvapich.cse.ohio-state.edu/support/user_guide_mvapich2-1.9a.html#x1-780006.18
>>>>>
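>>>>> At run time the GPU path also has to be switched on explicitly; as a
>>>>> sketch (mpirun_rsh syntax per the user guide, hostnames and the
>>>>> executable name are placeholders):
>>>>>
>>>>>     mpirun_rsh -np 2 node1 node2 MV2_USE_CUDA=1 ./a.out
>>>>>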
>>>>> Best
>>>>> Sreeram
>>>>>
>>>>> On Wed, Oct 24, 2012 at 10:19 AM, Christian Trott <crtrott at sandia.gov>
>>>>> wrote:
>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> is it possible to use the GPU support with the CH3:sock interface? When
>>>>>> I try to compile the 1.9a release with
>>>>>> ./configure --enable-cuda --with-cuda=/opt/nvidia/cuda/5.0.36/ \
>>>>>>     --with-device=ch3:sock \
>>>>>>     --prefix=/opt/mpi/mvapich2-1.9/intel-12.1/cuda5036 \
>>>>>>     CC=/opt/intel/composer_xe_2011_sp1.9.293/bin/intel64/icc
>>>>>>
>>>>>>
>>>>>>
>>>>>> I run into these errors:
>>>>>>
>>>>>>   CC              ch3_isend.c
>>>>>> ch3_isend.c(20): error: a value of type "MPIDI_CH3_PktGeneric_t" cannot
>>>>>>     be assigned to an entity of type "void *"
>>>>>>           sreq->dev.pending_pkt = *(MPIDI_CH3_PktGeneric_t *) hdr;
>>>>>>
>>>>>>   CC              ch3_isendv.c
>>>>>> ch3_isendv.c(28): error: a value of type "MPIDI_CH3_PktGeneric_t" cannot
>>>>>>     be assigned to an entity of type "void *"
>>>>>>           sreq->dev.pending_pkt =
>>>>>>               *(MPIDI_CH3_PktGeneric_t *) iov[0].MPID_IOV_BUF;
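>>>>>>
>>>>>> For reference, the diagnostic is the generic C error of assigning a
>>>>>> struct value to a pointer-typed field. A minimal standalone snippet that
>>>>>> provokes the same class of error (the type and field names below are
>>>>>> invented for illustration, not the real MVAPICH2 declarations):
>>>>>>
>>>>>>     typedef struct { char body[64]; } MPIDI_CH3_PktGeneric_t;
>>>>>>     struct dev_part { void *pending_pkt; };   /* pointer-typed field */
>>>>>>     struct request  { struct dev_part dev; };
>>>>>>
>>>>>>     void repro(struct request *sreq, void *hdr)
>>>>>>     {
>>>>>>         /* error: a value of type "MPIDI_CH3_PktGeneric_t" cannot be
>>>>>>          * assigned to an entity of type "void *"                     */
>>>>>>         sreq->dev.pending_pkt = *(MPIDI_CH3_PktGeneric_t *) hdr;
>>>>>>     }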
>>>>>>
>>>>>> Thanks for your help
>>>>>> Christian
>>>>>>
>>>>>> _______________________________________________
>>>>>> mvapich-discuss mailing list
>>>>>> mvapich-discuss at cse.ohio-state.edu
>>>>>> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>>>>>>
>>>>>>
>>>>>>
>>



