[mvapich-discuss] Performance of CUDA Managed Memory and Device Memory for GDR 2.3a

Ammar Ahmad Awan ammar.ahmad.awan at gmail.com
Thu Jan 11 10:47:34 EST 2018


Hi Yussuf,

Can you please share the details of your system? Is this an OpenPOWER or an
x86 system?

It would also be helpful if you could share the output of
'nvidia-smi topo -m'.
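
For reference, something like the following is all we need (run on one of
the compute nodes; the redirect file names are just placeholders):

    nvidia-smi topo -m > topo.txt   # GPU/NIC topology matrix (NVLink/PCIe/QPI hops)
    nvidia-smi > gpus.txt           # driver version and GPU inventory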

Regards,
Ammar


On Thu, Jan 11, 2018 at 2:18 AM, Yussuf Ali <Yussuf.ali at jaea.go.jp> wrote:

> Dear MVAPICH2 developers and users,
>
>
>
> I measured the intra-node performance of our GPU cluster system (4 x NVIDIA
> Tesla P100-SXM2-16GB, CUDA 8.0) with the OSU bi-directional bandwidth
> benchmark (osu_bibw) using the current MVAPICH2-GDR 2.3a release.
>
>
>
> I executed the benchmark for two configurations:
>
>   Device Memory  <-> Device Memory   (D <-> D)
>   Managed Memory <-> Managed Memory  (M <-> M)
>
>
> The following environment variables were set during both benchmarks in the
> PBS script:
>
> _______________________________________________
> export MV2_USE_CUDA=1
> export MV2_GPUDIRECT_GDRCOPY_LIB=./libgdrapi.so
> export MV2_USE_GPUDIRECT=1
> export MV2_GPUDIRECT_GDRCOPY=1
> export MV2_USE_GPUDIRECT_GDRCOPY=1
> export MV2_CUDA_IPC=1
> export MV2_CUDA_ENABLE_MANAGED=1
> export MV2_CUDA_MANAGED_IPC=1
> _______________________________________________
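>
> For completeness, this is roughly how the script launches the two runs (a
> sketch only: the benchmark install path and host file are assumptions about
> our setup, and mpirun_rsh expects the MV2_* variables on its command line
> rather than inheriting them from the environment):
>
>   #!/bin/bash
>   #PBS -l nodes=1:ppn=2
>   cd $PBS_O_WORKDIR
>   BIBW=/opt/osu-micro-benchmarks/mpi/pt2pt/osu_bibw   # assumed path
>   # Device <-> Device
>   mpirun_rsh -np 2 -hostfile $PBS_NODEFILE MV2_USE_CUDA=1 $BIBW D D
>   # Managed <-> Managed (remaining MV2_* settings above passed the same way)
>   mpirun_rsh -np 2 -hostfile $PBS_NODEFILE MV2_USE_CUDA=1 \
>       MV2_CUDA_ENABLE_MANAGED=1 $BIBW M M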
>
> I obtained the following results:
>
>
>
> Size (bytes)     M <-> M (MB/s)     D <-> D (MB/s)
>            1                3.1                1.1
>            2                6.1                2.2
>            4               12.3                4.4
>            8               24.6                8.9
>           16               49.3               17.4
>           32               95.3               17.2
>           64              182.0               34.0
>          128              373.7               67.3
>          256              663.5              130.9
>          512            1,211.0              250.0
>        1,024            1,927.6              406.9
>        2,048            2,490.1              653.1
>        4,096            3,116.4              488.6
>        8,192            5,528.9              481.6
>       16,384            8,980.7            2,528.6
>       32,768            1,118.2            6,553.0
>       65,536            2,178.6           12,729.1
>      131,072            4,026.9           18,738.3
>      262,144            6,930.5           26,631.6
>      524,288           10,566.6           28,645.9
>    1,048,576            9,229.6           32,114.8
>    2,097,152            8,908.8           32,776.5
>    4,194,304            8,818.7           33,884.9
>
>
>
> It seems that for message sizes up to 16,384 bytes Managed Memory performs
> better than Device Memory, while for message sizes of 32,768 bytes and
> larger Device Memory achieves higher bandwidth.
>
> Is there a way to tune Managed Memory so that it matches Device Memory
> performance for message sizes of 32,768 bytes and larger? For convenience,
> we would like to use CUDA Managed Memory.
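>
> For reference, the large-message tuning parameters we found in the
> MVAPICH2-GDR user guide are below (the values shown are the documented
> defaults; whether they affect the managed-memory path at all is exactly
> what we are unsure about):
>
>   export MV2_CUDA_BLOCK_SIZE=262144   # pipeline block size for large device transfers
>   export MV2_GPUDIRECT_LIMIT=8192     # message-size cutoff for the GPUDirect RDMA path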
>
>
>
> Thank you for your help,
>
> Yussuf