[mvapich-discuss] code fails running on GPUs attached to different sockets
Sebastiano Fabio Schifano
schifano at fe.infn.it
Mon Apr 18 10:16:20 EDT 2016
Hi Hari,
No, we are already using the latest MVAPICH2-GDR version, 2.2b. Any other
ideas or suggestions?
Regards,
fabio
On 04/18/2016 02:58 PM, Hari Subramoni wrote:
>
> Hello Fabio,
>
> Thanks for your note. It looks like you are using the regular MVAPICH2
> release on your GPU system. Many issues related to GPUs and their
> various code paths (including those involving QPI) have been resolved
> in the MVAPICH2-GDR version, which also has many other performance and
> scalability features. We recommend it for GPU systems. Please try the
> latest MVAPICH2-GDR 2.2b release and let us know if you still see any
> issues.
>
> Regards,
> Hari.
>
> On Apr 18, 2016 2:25 AM, "Sebastiano Fabio Schifano"
> <schifano at fe.infn.it> wrote:
>
> Hi,
>
> we are experiencing some issues running our "Lattice
> Boltzmann" code on a machine with
> - 2 CPU sockets E5-2630-v3 (Haswell class)
> - 4 K80 per socket
> - 1 IB card per socket
>
> At each iteration the code updates the halo boundaries of the
> lattice portion allocated on each GPU.
>
> The issue we are facing is the following:
>
> - running on two GPUs attached to the same CPU socket, the result
> is correct
> - running on two GPUs, each attached to a different CPU socket, the
> result is wrong
>
> However, if we set MV2_USE_SHARED_MEM=0 the result is correct in
> both cases.
>
> Investigating the problem further, we found that it occurs only for
> certain halo sizes:
>
> - for halo sizes 8192 and 16384 (double values) the code fails
> - for halo sizes 2048, 4096, 9216, 10240, 12288 (double values) the
> result is correct
>
> With LY the halo size, the size of each MPI communication is
> 26*(LY+6)*8 bytes.
>
> We are running CentOS 7.2, MVAPICH2-2.2 and we are NOT using GDRCOPY.
>
> Any idea how to further investigate this problem ? Any suggestion
> is welcome.
>
> Best Regards
> fabio
>
>
> --
> ------------------------------------------------------------------
> Schifano Sebastiano Fabio
> Department of Mathematics and Computer Science - University of
> Ferrara
> c/o Polo Scientifico e Tecnologico, Edificio B stanza 208
> via Saragat 1, I-44122 Ferrara (Italy)
> Tel: +39 0532 97 4614
> -------------------------------------------------------------------
>
> _______________________________________________
> mvapich-discuss mailing list
> mvapich-discuss at cse.ohio-state.edu
> http://mailman.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>
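[Editor's note: as a quick cross-check of the size pattern reported above, the quoted formula 26*(LY+6)*8 can be evaluated for the failing and passing halo sizes. This is a minimal sketch; the helper name msg_bytes is ours, not from the code in question.]

```python
# Failing and passing halo sizes (in double values), as reported above.
FAILING = [8192, 16384]
CORRECT = [2048, 4096, 9216, 10240, 12288]

def msg_bytes(ly):
    # Per the report: 26 messages of (LY+6) doubles, 8 bytes per double.
    return 26 * (ly + 6) * 8

for ly in FAILING + CORRECT:
    status = "fails" if ly in FAILING else "ok"
    print(f"LY={ly:6d} -> {msg_bytes(ly):9d} bytes ({status})")
```

Note that both lists contain powers of two (e.g. 4096 passes while 8192 fails), so the trigger appears to be the absolute message size rather than alignment alone; the failing sizes correspond to messages of roughly 1.7 MB and 3.4 MB.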