[EXTERNAL] Re: [mvapich-discuss] MVA1.9a --enable-cuda with ch3:socks compile errors

sreeram potluri potluri at cse.ohio-state.edu
Wed Oct 31 13:19:13 EDT 2012


Hi Christian,

Good to know that the patch helped.

For everyone's information, we have applied this patch to the MVAPICH2
trunk. The nightly tarballs of trunk are available for download at:

http://mvapich.cse.ohio-state.edu/nightly/mvapich2/trunk/

Sreeram Potluri

On Mon, Oct 29, 2012 at 12:16 PM, Christian Trott <crtrott at sandia.gov> wrote:

> The patch solved the issue. Everything is now working as expected :-).
>
> Thanks
> Christian
>
> On 10/25/2012 08:20 PM, sreeram potluri wrote:
>
>> Hi Christian,
>>
>>> [perseus.sandia.gov:mpi_rank_0][cuda_stage_free] cudaMemcpy failed with
>>> 11 at 1564
>>>
>> I assume you are using the CUDA 5.0 toolkit. There has been a change in the
>> behavior of device pointer detection in the driver, and we had a patch to
>> handle this in our MVAPICH2 1.8.1 release. The attached patch should fix
>> this in the 1.9a version you're using.
>>
>>
>>> MPI_Init always attaches my processes to GPU 0 only, no matter whether I
>>> set the device to something else via cudaSetDevice before calling
>>> MPI_Init. Even adding a cudaThreadSynchronize before MPI_Init doesn't
>>> change that. For example, when running this way with two processes and
>>> telling them to use GPU 0 and GPU 1 respectively, nvidia-smi reports three
>>> contexts: two on GPU 0 and one on GPU 1. So I assume that for rank 1 I got
>>> two contexts, one chosen by me and one chosen by MPI (which defaults to
>>> GPU 0). But when MPI and I choose different GPUs, the code crashes. It
>>> does that even when running just one process (in which case no actual
>>> communication using GPU buffers takes place).
>>>
>>>  MVAPICH2 should respect your device selection if it is done before
>> MPI_Init. I have tried the test case below and I see that it is working as
>> expected. I have pasted the corresponding output from nvidia-smi. Is this
>> similar to what you are trying to do? If you can share the test case with
>> which you are seeing issues, that would be great.
>>
>> ---------------------------------------------------------
>>      mydev = 1;
>>      cudaSetDevice(mydev);
>>
>>      MPI_Init(&c,&v);
>>
>>      fprintf(stderr, "After MPI_Init \n");
>>      sleep(30);
>>
>>      MPI_Finalize();
>> ---------------------------------------------------------
>> nvidia-smi output
>>
>> +------------------------------------------------------------------------------+
>> | Compute processes:                                               GPU Memory   |
>> |  GPU       PID  Process name                                     Usage        |
>> |================================================================================|
>> |    1     28448  ./test_event_destroy                              51MB        |
>> +------------------------------------------------------------------------------+
>> ---------------------------------------------------------
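>>
>> In case it helps, one way to turn this into a complete, compilable test is
>> sketched below (error checking kept minimal; deriving the device from the
>> MV2_COMM_WORLD_LOCAL_RANK environment variable is only one option and
>> assumes the launcher sets it):
>>
>> ---------------------------------------------------------
>>     /* select_device_before_init.c: pick a CUDA device before MPI_Init
>>        so that MVAPICH2 uses the selected device. */
>>     #include <stdio.h>
>>     #include <stdlib.h>
>>     #include <unistd.h>
>>     #include <mpi.h>
>>     #include <cuda_runtime.h>
>>
>>     int main(int argc, char **argv)
>>     {
>>         int mydev = 1;
>>
>>         /* Optionally map local rank -> device (assumes the launcher
>>            exports MV2_COMM_WORLD_LOCAL_RANK). */
>>         char *lrank = getenv("MV2_COMM_WORLD_LOCAL_RANK");
>>         if (lrank != NULL) {
>>             int ndev = 0;
>>             cudaGetDeviceCount(&ndev);
>>             if (ndev > 0)
>>                 mydev = atoi(lrank) % ndev;
>>         }
>>
>>         /* The device has to be selected before MPI_Init. */
>>         cudaSetDevice(mydev);
>>
>>         MPI_Init(&argc, &argv);
>>
>>         fprintf(stderr, "After MPI_Init, device %d\n", mydev);
>>         sleep(30);   /* time to check contexts with nvidia-smi */
>>
>>         MPI_Finalize();
>>         return 0;
>>     }
>> ---------------------------------------------------------
>>
>> Compiled with the MVAPICH2 mpicc and run with two processes, nvidia-smi
>> should then show one context per GPU rather than two contexts on GPU 0.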
>>
>> Best
>> Sreeram Potluri
>>
>>
>>
>>> On 10/24/2012 09:06 AM, sreeram potluri wrote:
>>>
>>>> You should be able to use MVAPICH2 within a node as long as libibverbs is
>>>> available. I am assuming you are using 1.9a for this test too.
>>>>
>>>> Can you try using these options when you configure: --disable-rdmacm
>>>> --disable-mcast
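>>>>
>>>> For example, something along these lines (only a sketch; the CUDA, prefix,
>>>> and compiler paths are the ones from your original configure line and will
>>>> differ on your system, and --with-device=ch3:sock is left out so the
>>>> default OFA-IB-CH3 interface is built):
>>>>
>>>> ------------------------------------------------------
>>>>     ./configure --enable-cuda --with-cuda=/opt/nvidia/cuda/5.0.36/ \
>>>>         --disable-rdmacm --disable-mcast \
>>>>         --prefix=/opt/mpi/mvapich2-1.9/intel-12.1/cuda5036 \
>>>>         CC=/opt/intel/composer_xe_2011_sp1.9.293/bin/intel64/icc
>>>> ------------------------------------------------------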
>>>>
>>>> If that does not work, can you give us more details on the issue you are
>>>> facing?
>>>>
>>>> The designs for internode GPU communication in MVAPICH2 take advantage of
>>>> features offered by InfiniBand. There are no plans to move these to
>>>> CH3:sock at this point.
>>>>
>>>> Sreeram Potluri
>>>>
>>>> On Wed, Oct 24, 2012 at 10:50 AM, Christian Trott <crtrott at sandia.gov>
>>>> wrote:
>>>>
>>>>> Thanks, that's what I thought. Do you know if I can compile for that
>>>>> interface on my local workstation, which does not have InfiniBand? And
>>>>> if yes, do you have a link to a list of what I need to install? (Just
>>>>> adding libibverbs via yum didn't seem to be sufficient.) Also, is
>>>>> support for GPU-to-GPU transfers planned for the CH3:sock interface?
>>>>> I am currently in the process of deciding whether or not to rely on
>>>>> direct CUDA support within MPI for a number of projects (currently just
>>>>> evaluation, but potentially that would include Trilinos and LAMMPS from
>>>>> Sandia) instead of writing my own data-shuffling code. My current
>>>>> status is that we seem to have support on InfiniBand clusters from both
>>>>> MVAPICH2 and OpenMPI, Cray seems to have something coming out soon for
>>>>> their network, and OpenMPI seems to work on my local machine as well.
>>>>>
>>>>> Cheers
>>>>> Christian
>>>>>
>>>>>
>>>>>
>>>>> On 10/24/2012 08:40 AM, sreeram potluri wrote:
>>>>>
>>>>>> Hi Christian,
>>>>>>
>>>>>> GPU support is only available with the InfiniBand Gen2 (OFA-IB-CH3)
>>>>>> interface.
>>>>>>
>>>>>> Please refer to these sections of our user guide on how to build and run:
>>>>>>
>>>>>> http://mvapich.cse.ohio-state.edu/support/user_guide_mvapich2-1.9a.html#x1-140004.5
>>>>>> http://mvapich.cse.ohio-state.edu/support/user_guide_mvapich2-1.9a.html#x1-780006.18
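>>>>>>
>>>>>> Roughly, a build and run for the GPU path would look like the following
>>>>>> (a sketch only; paths and hostnames are placeholders, and MV2_USE_CUDA
>>>>>> is the run-time parameter described in the second section above):
>>>>>>
>>>>>> ------------------------------------------------------
>>>>>>     ./configure --enable-cuda --with-cuda=/path/to/cuda --prefix=/path/to/install
>>>>>>     make && make install
>>>>>>     mpirun_rsh -np 2 host1 host2 MV2_USE_CUDA=1 ./your_gpu_app
>>>>>> ------------------------------------------------------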
>>>>>>
>>>>>> Best
>>>>>> Sreeram
>>>>>>
>>>>>> On Wed, Oct 24, 2012 at 10:19 AM, Christian Trott <crtrott at sandia.gov>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi all,
>>>>>>>
>>>>>>> is it possible to use the GPU support with the ch3:sock interface? When
>>>>>>> I try to compile the 1.9a release with
>>>>>>>
>>>>>>> ./configure --enable-cuda --with-cuda=/opt/nvidia/cuda/5.0.36/
>>>>>>> --with-device=ch3:sock --prefix=/opt/mpi/mvapich2-1.9/intel-12.1/cuda5036
>>>>>>> CC=/opt/intel/composer_xe_2011_sp1.9.293/bin/intel64/icc
>>>>>>>
>>>>>>> I run into these errors:
>>>>>>>
>>>>>>>   CC              ch3_isend.c
>>>>>>> ch3_isend.c(20): error: a value of type "MPIDI_CH3_PktGeneric_t" cannot
>>>>>>> be assigned to an entity of type "void *"
>>>>>>>         sreq->dev.pending_pkt = *(MPIDI_CH3_PktGeneric_t *) hdr;
>>>>>>>
>>>>>>>   CC              ch3_isendv.c
>>>>>>> ch3_isendv.c(28): error: a value of type "MPIDI_CH3_PktGeneric_t" cannot
>>>>>>> be assigned to an entity of type "void *"
>>>>>>>         sreq->dev.pending_pkt = *(MPIDI_CH3_PktGeneric_t *)
>>>>>>>             iov[0].MPID_IOV_BUF;
>>>>>>>
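>>>>>>> As far as I can tell, the complaint is that a struct value is being
>>>>>>> assigned to a member declared as void * in this configuration. A
>>>>>>> standalone illustration of the same mismatch (made-up types, not the
>>>>>>> actual MVAPICH2 code) compiles only if the pointer is stored or the
>>>>>>> packet is copied:
>>>>>>>
>>>>>>> ------------------------------------------------------
>>>>>>>     /* type_mismatch.c: illustrates the error above with hypothetical
>>>>>>>        stand-ins for the MVAPICH2 types. */
>>>>>>>     #include <string.h>
>>>>>>>
>>>>>>>     typedef struct { int type; char payload[64]; } PktGeneric_t;
>>>>>>>     typedef struct { void *pending_pkt; } Dev_t;  /* void *, as in the
>>>>>>>                                                       failing build */
>>>>>>>
>>>>>>>     int main(void)
>>>>>>>     {
>>>>>>>         PktGeneric_t pkt = { 0, { 0 } };
>>>>>>>         PktGeneric_t storage;
>>>>>>>         Dev_t dev;
>>>>>>>         void *hdr = &pkt;
>>>>>>>
>>>>>>>         /* dev.pending_pkt = *(PktGeneric_t *) hdr;   <-- what the
>>>>>>>            compiler rejects: struct value assigned to void * member */
>>>>>>>
>>>>>>>         dev.pending_pkt = hdr;                  /* storing the pointer
>>>>>>>                                                    is fine...         */
>>>>>>>         memcpy(&storage, hdr, sizeof(storage)); /* ...as is copying the
>>>>>>>                                                    packet into storage
>>>>>>>                                                    of the right type  */
>>>>>>>         dev.pending_pkt = &storage;
>>>>>>>         return 0;
>>>>>>>     }
>>>>>>> ------------------------------------------------------
>>>>>>>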
>>>>>>> Thanks for your help
>>>>>>> Christian
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> mvapich-discuss mailing list
>>>>>>> mvapich-discuss at cse.ohio-state.edu
>>>>>>> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>
>
>

