[mvapich-discuss] Announcing the Release of MVAPICH2-GDR 2.3.3 GA

Subramoni, Hari subramoni.1 at osu.edu
Thu Jan 9 17:14:57 EST 2020


Happy New Year!!

The MVAPICH team is pleased to announce the release of MVAPICH2-GDR 2.3.3 GA.

MVAPICH2-GDR 2.3.3 is based on the standard MVAPICH2 2.3.3 release and
incorporates designs that take advantage of GPUDirect RDMA (GDR) technology
for inter-node data movement on NVIDIA GPU clusters with Mellanox InfiniBand
interconnects. It also provides support for DGX-2, OpenPOWER, and NVLink2,
efficient intra-node CUDA-Aware unified memory communication, and support for
RDMA_CM, RoCE-V1, and RoCE-V2. Further, MVAPICH2-GDR 2.3.3 provides optimized
large-message collectives (broadcast, reduce, and allreduce) for emerging Deep
Learning and Streaming frameworks.
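
As a minimal illustration of what CUDA-Aware communication means in practice
(a sketch for this announcement, not code taken from the library), an
application can pass device pointers directly to MPI calls and let
MVAPICH2-GDR handle the data movement. The buffer size and tag below are
arbitrary example values.

    /* Minimal sketch of CUDA-Aware MPI point-to-point communication.
     * Device pointers are handed directly to MPI_Send/MPI_Recv; the
     * application writes no staging copies to host memory. */
    #include <mpi.h>
    #include <cuda_runtime.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        const int count = 1 << 20;            /* example: 1M doubles */
        double *d_buf;
        cudaMalloc((void **)&d_buf, count * sizeof(double));

        if (rank == 0) {
            cudaMemset(d_buf, 0, count * sizeof(double));
            MPI_Send(d_buf, count, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            MPI_Recv(d_buf, count, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        }

        cudaFree(d_buf);
        MPI_Finalize();
        return 0;
    }

Such a program would typically be built with the mpicc wrapper shipped with
MVAPICH2-GDR and launched across two GPU-equipped processes with run-time
CUDA support enabled (e.g. MV2_USE_CUDA=1).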

Features, Enhancements, and Bug Fixes for MVAPICH2-GDR 2.3.3 GA are listed here.

* Features and Enhancements (Since 2.3.2)
    - Based on MVAPICH2 2.3.3
    - Support for GDRCopy v2.0
    - Enhanced datatype support for CUDA kernel-based Allreduce
    - Enhanced inter-node point-to-point performance for CUDA managed buffers
      on POWER9 system
    - Enhanced CUDA-Aware MPI_Allreduce on NVLink-enabled GPU systems
      (see the sketch after this list)
    - Enhanced CUDA-Aware MPI_Pack and MPI_Unpack
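
As a quick, hedged sketch of the CUDA-Aware MPI_Allreduce usage referenced
above (illustrative only; the element count is an arbitrary example value),
device pointers are passed directly as the send and receive buffers of the
collective:

    /* Minimal sketch: MPI_Allreduce operating on device buffers with a
     * CUDA-Aware MPI library such as MVAPICH2-GDR. */
    #include <mpi.h>
    #include <cuda_runtime.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        const int count = 1 << 22;             /* example: 4M floats */
        float *d_send, *d_recv;
        cudaMalloc((void **)&d_send, count * sizeof(float));
        cudaMalloc((void **)&d_recv, count * sizeof(float));
        cudaMemset(d_send, 0, count * sizeof(float));

        /* The library selects the GPU-aware algorithm (e.g. the
         * kernel-based or NVLink-optimized path) internally. */
        MPI_Allreduce(d_send, d_recv, count, MPI_FLOAT, MPI_SUM,
                      MPI_COMM_WORLD);

        cudaFree(d_send);
        cudaFree(d_recv);
        MPI_Finalize();
        return 0;
    }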

* Bug Fixes (Since 2.3.2)
    - Fix data validation in datatype support for device buffers
    - Fix segfault for MPI_Gather on device buffers
    - Fix segfault and hang on intra-node communication on device buffers
    - Fix issues in intra-node MPI_Bcast design on device buffers
    - Fix incorrect initialization of CUDA support for collective communication
    - Fix compilation warnings

Further, MVAPICH2-GDR 2.3.3 GA provides support on GPU clusters using regular
OFED (without GPUDirect RDMA).

MVAPICH2-GDR 2.3.3 GA continues to deliver excellent performance. It provides
inter-node Device-to-Device latency of 1.85 microseconds (8 bytes) with CUDA 10.1
and Volta GPUs. On OpenPOWER platforms with NVLink2, it delivers up to 70.4 GB/s
of unidirectional intra-node Device-to-Device bandwidth for large messages. On
DGX-2 platforms, it delivers up to 144.79 GB/s of unidirectional intra-node
Device-to-Device bandwidth for large messages. More performance numbers are
available from the MVAPICH website (under the Performance link).

To download MVAPICH2-GDR 2.3.3 GA and the associated user guide and
quick start guide, please visit the following URL:

http://mvapich.cse.ohio-state.edu

All questions, feedback, bug reports, hints for performance tuning, patches, and
enhancements are welcome. Please post them to the mvapich-discuss mailing list
(mvapich-discuss at cse.ohio-state.edu).

Thanks,

The MVAPICH Team

PS: We are also happy to report that the number of organizations using
MVAPICH2 libraries (and registered at the MVAPICH site) has crossed
3,050 worldwide (in 89 countries). The number of downloads from the
MVAPICH site has crossed 650,000 (0.6 million). The MVAPICH team
would like to thank all its users and organizations!!