[mvapich-discuss] code fails running on GPUs attached to different sockets

khaled hamidouche hamidouc at cse.ohio-state.edu
Mon Apr 18 10:22:17 EDT 2016


Hi Sebastiano,

Can you please try these configurations one by one and see if they help
fix the issue:

1) - MV2_USE_GPUDIRECT_LOOPBACK=0
2) - MV2_CUDA_NONBLOCKING_STREAMS=0
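For example, the two settings can be tried one at a time at launch (a sketch assuming mpirun_rsh from the MVAPICH2-GDR install is on PATH; the hostnames `node1`/`node2` and the executable name `./lbm_app` are placeholders):

```shell
# MVAPICH2's mpirun_rsh accepts MV2_* variables as KEY=VALUE pairs
# between the host list and the executable.

# 1) Disable GPUDirect loopback:
mpirun_rsh -np 2 node1 node2 MV2_USE_GPUDIRECT_LOOPBACK=0 ./lbm_app

# 2) Disable non-blocking CUDA streams:
mpirun_rsh -np 2 node1 node2 MV2_CUDA_NONBLOCKING_STREAMS=0 ./lbm_app
```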

Thanks

On Mon, Apr 18, 2016 at 10:16 AM, Sebastiano Fabio Schifano <
schifano at fe.infn.it> wrote:

> Hi Hari,
>
> No, we are using the latest MVAPICH2-GDR version 2.2b. Any other
> idea/suggestion?
>
> Regards,
> fabio
>
>
> On 04/18/2016 02:58 PM, Hari Subramoni wrote:
>
> Hello Fabio,
>
> Thanks for your note. It looks like you are using the regular MVAPICH2
> version on your GPU system. Many issues related to GPUs and their various
> code paths (including those involving QPI) have been resolved in the
> MVAPICH2-GDR version, which also has many other features for performance
> and scalability. We recommend using this version on GPU systems. Please use
> the latest MVAPICH2-GDR 2.2b release and let us know if you see any issues.
>
> Regards,
> Hari.
> On Apr 18, 2016 2:25 AM, "Sebastiano Fabio Schifano" <schifano at fe.infn.it>
> wrote:
>
>> Hi,
>>
>> we are experiencing some issues running our "Lattice Boltzmann" code
>> on a machine with:
>> - 2 CPU sockets E5-2630-v3 (Haswell class)
>> - 4 K80 per socket
>> - 1 IB card per socket
>>
>> At each iteration, the code updates the halo boundaries of the lattice
>> portion allocated on each GPU.
>>
>> The issue we are facing is the following:
>>
>> - running on two GPUs attached to the same CPU-socket the result is
>> correct
>> - running on two GPUs each attached to a different CPU-socket the result
>> is wrong
>>
>> However, if we set MV2_USE_SHARED_MEM=0 the result is correct in both
>> cases.
>>
>> Investigating the problem further, we found that it happens only for
>> certain halo sizes:
>>
>> - for halo sizes 8192 and 16384 (double values) the code fails
>> - for halo sizes 2048, 4096, 9216, 10240 and 12288 (double values) the
>> result is correct
>>
>> With LY being the halo size, the size of each MPI message is
>> 26*(LY+6)*8 bytes.
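To illustrate, the resulting message sizes for the halo sizes listed above can be worked out directly from that formula (a quick sketch; only the formula itself comes from the message):

```python
# Message size in bytes for a halo of LY double values,
# per the formula 26*(LY+6)*8 given above.
def msg_size(ly: int) -> int:
    return 26 * (ly + 6) * 8

failing = [8192, 16384]
passing = [2048, 4096, 9216, 10240, 12288]

for ly in failing + passing:
    print(ly, msg_size(ly))
# e.g. LY=8192 -> 1705184 bytes (~1.6 MiB), LY=16384 -> 3409120 bytes
```

Notably, the failing and passing sizes do not separate at an obvious power-of-two message-size boundary, which may help when checking against protocol thresholds.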
>>
>> We are running CentOS 7.2, MVAPICH2-2.2 and we are NOT using GDRCOPY.
>>
>> Any idea how to investigate this problem further? Any suggestion is
>> welcome.
>>
>> Best Regards
>> fabio
>>
>>
>> --
>>   ------------------------------------------------------------------
>>   Schifano Sebastiano Fabio
>>   Department of Mathematics and Computer Science - University of Ferrara
>>   c/o Polo Scientifico e Tecnologico, Edificio B stanza 208
>>   via Saragat 1, I-44122 Ferrara (Italy)
>>   Tel: +39 0532 97 4614
>>   -------------------------------------------------------------------
>>
>> _______________________________________________
>> mvapich-discuss mailing list
>> mvapich-discuss at cse.ohio-state.edu
>> http://mailman.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>>
>
>
>
>