[mvapich-discuss] Announcing the release of MVAPICH2-GDR 2.0 and OSU Micro-Benchmarks (OMB) 4.4

Panda, Dhabaleswar panda at cse.ohio-state.edu
Sun Aug 24 01:46:31 EDT 2014


The MVAPICH team is pleased to announce the release of MVAPICH2-GDR
2.0 and OSU Micro-Benchmarks (OMB) 4.4.

MVAPICH2-GDR 2.0 is based on the standard MVAPICH2 2.0 release and
incorporates designs that take advantage of the new GPUDirect RDMA
technology for inter-node data movement on NVIDIA GPU clusters with
Mellanox InfiniBand interconnect.

Features, Enhancements, and Bug Fixes for MVAPICH2-GDR 2.0 are listed
here.

* Features and Enhancements (since MVAPICH2-GDR 2.0b):

  - Based on MVAPICH2 2.0 (OFA-IB-CH3 interface)
  - Support for MPI-3 RMA (one-sided) communication and atomics
    using GPU buffers with GPUDirect RDMA and pipelining
  - Efficient small message inter-node communication using the new
    NVIDIA GDRCOPY module
  - Efficient small message inter-node communication using loopback design
  - Automatic communication channel selection for different GPU
    communication modes (DD, HH and HD) in different configurations
    (intra-IOH and inter-IOH)
  - Automatic selection and binding of the best GPU/HCA pair in a
    multi-GPU/HCA system configuration
  - Optimized and tuned support for collective communication from
    GPU buffers
  - Enhanced and efficient support for datatype processing on GPU buffers
    including support for vector/h-vector, index/h-index, array and
    subarray

* Bug-Fixes (since MVAPICH2-GDR 2.0b):
  - Fix a bug in the registration cache of GPU buffers
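
As an illustration of the vector datatype named in the feature list
above, the following sketch computes which elements of a buffer a
vector layout selects for a given count, block length and stride. It
is plain Python, does not call MPI or touch a GPU, and the helper name
vector_offsets is hypothetical. It only mirrors the layout rule of
MPI_Type_vector, to show why such data is non-contiguous in memory.

```python
def vector_offsets(count, blocklength, stride):
    """Return element offsets selected by an MPI-style vector layout.

    Hypothetical helper for illustration only: `count` blocks of
    `blocklength` consecutive elements, with block starts `stride`
    elements apart (the layout rule of MPI_Type_vector).
    """
    return [block * stride + i
            for block in range(count)
            for i in range(blocklength)]


# Two adjacent columns of a 4x6 row-major matrix: 4 blocks of 2, stride 6.
offsets = vector_offsets(count=4, blocklength=2, stride=6)
print(offsets)
```

The gaps between the selected offsets are what a datatype-aware
library has to pack or transfer piecewise when the buffer is in GPU
memory.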

The MVAPICH2-GDR 2.0 release requires the following software to be
installed on your system:

  - Mellanox OFED 2.1 or later
  - NVIDIA Driver 331.20 or later
  - NVIDIA CUDA Toolkit 6.0 or later
  - Plugin module to enable GPUDirect RDMA
  - (Strongly recommended) NVIDIA GDRCOPY module
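
The minimum versions above can be checked with a small script. The
sketch below is only one possible way to do it: the function names are
hypothetical, version strings are assumed to be purely numeric and
dot-separated, and the installed versions are passed in by hand, since
how to query them on a given system is not covered here.

```python
def parse_version(text):
    """Turn a dotted version string such as '331.20' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


# Minimum versions taken from the requirements list above.
MINIMUMS = {
    "Mellanox OFED": "2.1",
    "NVIDIA Driver": "331.20",
    "NVIDIA CUDA Toolkit": "6.0",
}


def unmet_requirements(installed):
    """Return the names whose installed version is missing or too old.

    `installed` maps component name to a version string; how those
    strings are obtained on a real system is outside this sketch.
    """
    missing = []
    for name, minimum in MINIMUMS.items():
        found = installed.get(name)
        if found is None or parse_version(found) < parse_version(minimum):
            missing.append(name)
    return missing


# Hypothetical example values, not measured on any real system.
example = {
    "Mellanox OFED": "2.2",
    "NVIDIA Driver": "331.20",
    "NVIDIA CUDA Toolkit": "5.5",
}
print(unmet_requirements(example))
```

Versions are compared component by component as integers, so 331.20
is treated as newer than 331.5. The GPUDirect RDMA plugin and the
GDRCOPY module are not covered, because the list gives no version
numbers for them.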

New features, enhancements and bug fixes for OSU Micro-Benchmarks
(OMB) 4.4 are listed here.

* New Features & Enhancements (since OMB 4.3.1)
  - Support for MPI-3 RMA (one-sided) and atomic operations
      using GPU buffers
        * osu_acc_latency
        * osu_cas_latency
        * osu_fop_latency
        * osu_get_bw
        * osu_get_latency
        * osu_put_bibw
        * osu_put_bw
        * osu_put_latency

* Bug Fixes
  - Remove use of AC_FUNC_MALLOC to avoid undefined rpl_malloc
    reference
  - Add missing UPC benchmarks to the make dist rule

Sample performance numbers using MVAPICH2-GDR 2.0 are as follows:

  - GPU-GPU (D-D) two-sided communication latency of 2.18 microsec (4 bytes)

  - GPU-GPU (D-D) MPI-3 RMA (one-sided) Put latency of 2.88 microsec (4 bytes)

Additional performance numbers (including Hoomd-Blue performance on
Wilkes) can be viewed by visiting the `Performance' section of the
project's web page.

For downloading MVAPICH2-GDR 2.0 and OMB 4.4, please visit the
following URL:

http://mvapich.cse.ohio-state.edu

All questions, feedback, bug reports, hints for performance tuning,
patches and enhancements are welcome. Please post them to the
mvapich-discuss mailing list (mvapich-discuss at cse.ohio-state.edu).

Thanks,

The MVAPICH Team

