[mvapich-discuss] Announcing the release of MVAPICH2-GDR 2.0 and OSU Micro-Benchmarks (OMB) 4.4
Panda, Dhabaleswar
panda at cse.ohio-state.edu
Sun Aug 24 01:46:31 EDT 2014
The MVAPICH team is pleased to announce the release of MVAPICH2-GDR
2.0 and OSU Micro-Benchmarks (OMB) 4.4.
MVAPICH2-GDR 2.0 is based on the standard MVAPICH2 2.0 release and
incorporates designs that take advantage of the new GPUDirect RDMA
technology for inter-node data movement on NVIDIA GPU clusters with
Mellanox InfiniBand interconnect.
Features, Enhancements, and Bug Fixes for MVAPICH2-GDR 2.0 are listed
here.
* Features and Enhancements (since MVAPICH2-GDR 2.0b):
- Based on MVAPICH2 2.0 (OFA-IB-CH3 interface)
- Support for MPI-3 RMA (one-sided) communication and atomics
using GPU buffers with GPUDirect RDMA and pipelining
- Efficient small message inter-node communication using the new
NVIDIA GDRCOPY module
- Efficient small message inter-node communication using loopback design
- Automatic communication channel selection for different GPU
communication modes (DD, HH and HD) in different configurations
(intra-IOH and inter-IOH)
- Automatic selection and binding of the best GPU/HCA pair in a
multi-GPU/HCA system configuration
- Optimized and tuned support for collective communication from
GPU buffers
- Enhanced and efficient support for datatype processing on GPU buffers
including support for vector/h-vector, index/h-index, array and
subarray
* Bug-Fixes (since MVAPICH2-GDR 2.0b):
- Fix a bug in the registration cache of GPU buffers
The MVAPICH2-GDR 2.0 release requires the following software to be
installed on your system:
- Mellanox OFED 2.1 or later
- NVIDIA Driver 331.20 or later
- NVIDIA CUDA Toolkit 6.0 or later
- Plugin module to enable GPUDirect RDMA
- (Strongly recommended) NVIDIA GDRCOPY module
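The minimum versions listed above can be checked mechanically before
installing. The following is a minimal sketch, not part of the MVAPICH2-GDR
distribution: the labels in MINIMUM_VERSIONS and the helper names are
hypothetical, and the caller is assumed to supply the installed version
strings. It also assumes that a version can be compared by its numeric
components, so a string such as "2.1-1.0.0" is read as 2, 1, 1, 0, 0.

```python
import re

# Minimum versions taken from the requirements list in this announcement.
# The dictionary keys are illustrative labels, not official package names.
MINIMUM_VERSIONS = {
    "mlnx_ofed": "2.1",
    "nvidia_driver": "331.20",
    "cuda_toolkit": "6.0",
}


def parse_version(text):
    """Extract the numeric components of a version string as a tuple of ints."""
    return tuple(int(number) for number in re.findall(r"\d+", text))


def at_least(found, minimum):
    """Return True if version `found` is greater than or equal to `minimum`."""
    a, b = parse_version(found), parse_version(minimum)
    # Pad the shorter tuple with zeros so that "6" and "6.0" compare equal.
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def unmet_requirements(installed):
    """Return the labels whose installed version is missing or too old.

    `installed` maps the labels used in MINIMUM_VERSIONS to version strings
    supplied by the caller.
    """
    unmet = []
    for name, minimum in MINIMUM_VERSIONS.items():
        found = installed.get(name)
        if found is None or not at_least(found, minimum):
            unmet.append(name)
    return unmet
```

An empty list from unmet_requirements() means all three version minimums are
met. The GPUDirect RDMA plugin and the GDRCOPY module have no version number
in the list above, so this sketch does not cover them.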
New features, enhancements and bug fixes for OSU Micro-Benchmarks
(OMB) 4.4 are listed here.
* New Features & Enhancements (since OMB 4.3.1)
- Support for MPI-3 RMA (one-sided) and atomic operations
using GPU buffers
* osu_acc_latency
* osu_cas_latency
* osu_fop_latency
* osu_get_bw
* osu_get_latency
* osu_put_bibw
* osu_put_bw
* osu_put_latency
* Bug Fixes
- remove use of AC_FUNC_MALLOC to avoid undefined rpl_malloc
reference
- add missing upc benchmarks for make dist rule
Sample performance numbers using MVAPICH2-GDR 2.0 are as follows:
- GPU-GPU (D-D) two-sided communication latency of 2.18 microsec (4 bytes)
- GPU-GPU (D-D) MPI-3 RMA (one-sided) Put latency of 2.88 microsec (4 bytes)
Additional performance numbers (including Hoomd-Blue performance on
Wilkes) can be viewed by visiting the `Performance' section of the
project's web page.
For downloading MVAPICH2-GDR 2.0 and OMB 4.4, please visit the
following URL:
http://mvapich.cse.ohio-state.edu
All questions, feedback, bug reports, hints for performance tuning,
patches and enhancements are welcome. Please post them to the
mvapich-discuss mailing list (mvapich-discuss at cse.ohio-state.edu).
Thanks,
The MVAPICH Team