[mvapich-discuss] Announcing the release of MVAPICH2-GDR 2.3a

Panda, Dhabaleswar panda at cse.ohio-state.edu
Fri Nov 10 00:08:32 EST 2017


The MVAPICH team is pleased to announce the release of MVAPICH2-GDR
2.3a.

MVAPICH2-GDR 2.3a is based on the standard MVAPICH2 2.2 release and
incorporates designs that take advantage of the GPUDirect RDMA (GDR)
technology for inter-node data movement on NVIDIA GPU clusters with
Mellanox InfiniBand interconnects. It also provides support for
OpenPOWER and NVLink, efficient intra-node CUDA-aware unified memory
communication, and support for RDMA_CM, RoCE-V1, and RoCE-V2. Further,
MVAPICH2-GDR 2.3a provides optimized large message collectives
(broadcast, reduce, and allreduce) for emerging Deep Learning and
Streaming frameworks.
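
For readers new to the CUDA-aware interface, the minimal sketch below
(illustrative only, not part of the release sources) shows the usage
pattern these designs accelerate: a device pointer is handed directly
to the MPI point-to-point calls, and the library moves the data, over
GPUDirect RDMA when the software listed later in this announcement is
present. Build with the mpicc wrapper, link against the CUDA runtime
(-lcudart), and run with two ranks.

    #include <mpi.h>
    #include <cuda_runtime.h>

    int main(int argc, char **argv)
    {
        const int n = 1 << 20;          /* 1 Mi floats */
        int rank;
        float *d_buf;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        cudaMalloc((void **)&d_buf, n * sizeof(float));

        if (rank == 0) {
            /* The device buffer is passed directly to MPI_Send;
             * no cudaMemcpy staging through host memory is needed. */
            MPI_Send(d_buf, n, MPI_FLOAT, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            MPI_Recv(d_buf, n, MPI_FLOAT, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        }

        cudaFree(d_buf);
        MPI_Finalize();
        return 0;
    }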

Features, Enhancements, and Bug Fixes for MVAPICH2-GDR 2.3a are listed
below.

* Features and Enhancements (since MVAPICH2-GDR 2.2 GA)

    - Based on MVAPICH2 2.2
    - Support for CUDA 9.0
    - Support for Volta (V100) GPU
    - Support for OpenPOWER with NVLink
    - Efficient Multiple CUDA stream-based IPC communication for
      multi-GPU systems with and without NVLink
    - Enhanced performance of GPU-based point-to-point communication
    - Leverage Linux Cross Memory Attach (CMA) feature for enhanced
      host-based communication
    - Enhanced performance of MPI_Allreduce for GPU-resident data
      (see the sketch after this list)
    - InfiniBand Multicast (IB-MCAST) based designs for GPU-based
      broadcast and streaming applications
        - Basic support for IB-MCAST designs with GPUDirect RDMA
        - Advanced support for zero-copy IB-MCAST designs with GPUDirect RDMA
        - Advanced reliability support for IB-MCAST designs
    - Efficient broadcast designs for Deep Learning applications
    - Enhanced collective tuning on Xeon, OpenPOWER, and NVIDIA DGX-1 systems
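
As a concrete illustration of the GPU-resident MPI_Allreduce path
above, here is a minimal sketch (illustrative only): both buffers live
in device memory, as in the gradient-averaging step of data-parallel
Deep Learning training.

    #include <mpi.h>
    #include <cuda_runtime.h>

    int main(int argc, char **argv)
    {
        const int n = 4096;
        float *d_grad, *d_avg;

        MPI_Init(&argc, &argv);

        cudaMalloc((void **)&d_grad, n * sizeof(float));
        cudaMalloc((void **)&d_avg,  n * sizeof(float));
        /* Zero fill stands in for locally computed gradients. */
        cudaMemset(d_grad, 0, n * sizeof(float));

        /* Both buffers are GPU-resident; the library selects a
         * GPU-optimized allreduce design internally. */
        MPI_Allreduce(d_grad, d_avg, n, MPI_FLOAT, MPI_SUM,
                      MPI_COMM_WORLD);

        cudaFree(d_grad);
        cudaFree(d_avg);
        MPI_Finalize();
        return 0;
    }

Such a program is typically launched with CUDA support enabled at run
time, e.g. mpirun_rsh -np 2 node1 node2 MV2_USE_CUDA=1 ./allreduce_gdr
(node1, node2, and the binary name are placeholders; MV2_USE_CUDA is
the runtime parameter referenced in the bug-fix list below).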

* Bug Fixes (since 2.2 GA)
    - Fix issue with MPI_Finalize when MV2_USE_GPUDIRECT=0
    - Fix data validation issue with GDRCOPY and Loopback
    - Fix issue with runtime error when MV2_USE_CUDA=0
    - Fix issue with MPI_Allreduce for R3 protocol
    - Fix warning message when GDRCOPY module cannot be used

The MVAPICH2-GDR 2.3a release requires the following software to be
installed on your system:

  - Mellanox OFED 3.2 or later
  - NVIDIA Driver 367.48 or later
  - NVIDIA CUDA Toolkit 7.5/8.0/9.0
  - Plugin module to enable GPUDirect RDMA
  - [Strongly recommended] GDRCOPY kernel module and library

Further, MVAPICH2-GDR 2.3a provides support for GPU clusters using
regular OFED (without GPUDirect RDMA).

MVAPICH2-GDR 2.3a continues to deliver excellent performance. It
provides inter-node Device-to-Device latency of 1.88 microseconds (8
bytes) with CUDA 9.0 and Volta GPUs. On OpenPOWER platforms with
NVLink, it delivers up to 34.4 GB/s unidirectional intra-node
Device-to-Device bandwidth for large messages. More performance
numbers are available from the MVAPICH website (under the Performance
link).

To download MVAPICH2-GDR 2.3a, the associated user guide, and sample
performance numbers, please visit the following URL:

http://mvapich.cse.ohio-state.edu

All questions, feedback, bug reports, hints for performance tuning,
and enhancements are welcome. Please post them to the mvapich-discuss
mailing list (mvapich-discuss at cse.ohio-state.edu).

Thanks,

The MVAPICH Team

PS: We are also happy to report that the number of organizations using
MVAPICH2 libraries (and registered at the MVAPICH site) has crossed
2,840 worldwide (in 85 countries). The number of downloads from the
MVAPICH site has crossed 432,000 (0.43 million). The MVAPICH team
would like to thank all its users and organizations!
