[mvapich-discuss] Announcing the release of MVAPICH2 2.0 GA, MVAPICH2-X 2.0 GA and OSU Micro-Benchmarks (OMB) 4.3.1
Panda, Dhabaleswar
panda at cse.ohio-state.edu
Fri Jun 20 17:01:06 EDT 2014
The MVAPICH team is pleased to announce the release of MVAPICH2 2.0 GA,
MVAPICH2-X 2.0 GA (Hybrid MPI+PGAS (OpenSHMEM) with Unified
Communication Runtime) and OSU Micro-Benchmarks (OMB) 4.3.1.
Features, Enhancements, and Bug Fixes for MVAPICH2 2.0 are
listed here.
* Features and Enhancements (since MVAPICH2 1.9 GA):
- Based on MPICH-3.1
- Extended support for MPI-3 RMA in OFA-IB-CH3, OFA-IWARP-CH3,
and OFA-RoCE-CH3 interfaces
- RMA optimizations for shared memory and atomic operations
- Optimized communication when using MPI_Win_allocate for
OFA-IB-CH3 channel
- MPI-3 RMA support for CH3-PSM channel
- Support for MPI_T performance and control variables
- Optimized and tuned blocking and non-blocking collectives
for OFA-IB-CH3, OFA-IB-Nemesis, and CH3-PSM channels
- Large message transfer support for PSM interface
- CMA support is now enabled by default
- Enhanced intra-node SMP performance
- Tuned SMP eager threshold parameters
- Tuned RGET and atomic operations
- Dynamic CUDA initialization: support GPU device selection
after MPI_Init and initialize GPU resources only when they
are used by MPI transfers
- Support for running on heterogeneous clusters with GPU and non-GPU nodes
- Multi-rail support for GPU communication
- Non-blocking streams in asynchronous CUDA transfers for better overlap
- Optimized sub-array data-type processing for GPU-to-GPU communication
- Added options to specify CUDA library paths
- Tuned RDMA FP-based communication
- Tuning for Ivy-Bridge architecture
- Tuning for Mellanox Connect-IB adapters
- Reduced memory footprint
- Improved job-startup performance for large-scale mpirun_rsh jobs
- Introduced retry mechanism in mpirun_rsh for socket binding
- Capability to checkpoint CH3 channel using the Hydra process manager
- Warn and continue when ptmalloc fails to initialize
- Enable hierarchical SSH-based startup with Checkpoint-Restart
- Multi-rail support for UD-Hybrid channel
- Updated compiler wrappers to remove application dependency on
network and other extra libraries
(Thanks to Adam Moody from LLNL for the suggestion)
- Deprecation of uDAPL-CH3 channel
- Updated to hwloc v1.9
* Bug-Fixes (since MVAPICH2 1.9 GA):
- Fix data validation issue with MPI_Bcast
- Thanks to Claudio J. Margulis from University of Iowa for the report
- Fix issue with very large message (>2GB) MPI_Bcast
- Thanks to Lu Qiyue for the report
- Fix multicast hang when there is a single process on one node
and more than one process on other nodes
- Fix for bcastzero type hang during finalize
- Fix non-power-of-two usage of scatter-doubling-allgather algorithm
- Fix issues related to large message transfers for OFA-IB-CH3
and OFA-IB-Nemesis channels
- Fix buffer alignment for large message shared memory transfers
- Initialize using better defaults for ibv_modify_qp (initial ring)
- Enhanced handling of failures in RDMA_CM based connection establishment
- Fix for hangs in connection setup and finalize when using RDMA_CM
- Fix warning in job launch, when using DPM
- Fix issues in Nemesis interface with --with-ch3-rank-bits=32
- Better cleanup of XRC files in corner cases
- Fix a flow-control bug in UD transport
- Thanks to Benjamin M. Auer from NASA for the report
- Fix issues related to MPI-3 RMA locks
- Fix a bug in One-Sided shared memory backed windows
- Fix an issue related to MPI-3 dynamic window
- Fix issues related to MPI_Win_allocate backed by shared memory
- Fix bugs with MPI-3 RMA in Nemesis IB interface
- Fix an issue related to MPI atomic operations on HCAs without
atomics support
- Handle case where $HOME is not set during search for MV2 user config file
- Thanks to Adam Moody from LLNL for the patch
- Fix compilation error with --enable-g=all in PSM interface
- Prevent printing out inter-node runtime parameters for pure
intra-node runs
- Thanks to Jerome Vienne from TACC for the report
- MPI_Get_library_version updated with proper MVAPICH2 branding
- Thanks to Jerome Vienne from TACC for the report
- Finish receive request when RDMA READ completes in RGET protocol
- Always use direct RDMA when flush is used
- Fix an issue related to compiler selection (the GNU, Intel,
PGI, and Ekopath compilers are preferred, in that order)
- Thanks to Uday R Bondhugula from IISc for the report
- Fix an issue in message coalescing
- Consider list provided by MV2_IBA_HCA when scanning device list
- Add unconditional check and addition of pthread library
- Fix an issue related to ordering of messages for GPU-to-GPU transfers
- Fix multiple warnings and memory leaks
The MVAPICH2-X 2.0 software package provides support for hybrid
MPI+PGAS (UPC and OpenSHMEM) programming models with a unified
communication runtime for emerging exascale systems. This package
gives users the flexibility to write applications using the following
programming models on a unified communication runtime: MPI,
MPI+OpenMP, pure UPC, and pure OpenSHMEM programs, as well as
hybrid MPI(+OpenMP) + PGAS (UPC and OpenSHMEM) programs.
Features and enhancements for MVAPICH2-X 2.0 are as follows:
* Features and Enhancements (since MVAPICH2-X 1.9 GA):
- MPI Features
- Based on MVAPICH2 2.0 (OFA-IB-CH3 interface)
- Unified Runtime Features
- Based on MVAPICH2 2.0 (OFA-IB-CH3 interface). All the
runtime features enabled by default in OFA-IB-CH3 interface
of MVAPICH2 2.0 are available in MVAPICH2-X 2.0
- OpenSHMEM Features
- Based on OpenSHMEM reference implementation 1.0f
- Improved intra-node communication performance using
shared memory and Cross Memory Attach (CMA)
- Optimized OpenSHMEM collectives (improved performance for
shmem_collect, shmem_fcollect, shmem_barrier,
shmem_reduce, and shmem_broadcast)
- Optimized shmalloc routine
- UPC Features
- Based on Berkeley UPC 2.18.0 (contains changes/additions
in preparation for upcoming UPC 1.3 specification)
- Optimized UPC collectives (improved performance for
upc_all_broadcast, upc_all_scatter, upc_all_gather,
upc_all_gather_all, and upc_all_exchange)
- Support for GUPC translator
* Bug Fixes (since MVAPICH2-X 1.9 GA):
- OpenSHMEM Bug Fixes
- Fix an issue related to atomics on HCAs without atomics support
- Fix synchronization issue in shmem_fence
- Fix issue in shmem_collect that prevented variable-length
collect operations
Bug Fixes for OSU Micro-Benchmarks (OMB) 4.3.1 are
listed here.
* Bug Fixes (since OMB 4.3)
- Fix typo in MPI collective benchmark help message
- Explicitly mention that -m and -M parameters are specified in bytes
Various performance numbers for MVAPICH2 2.0 and MVAPICH2-X 2.0
on different platforms and system configurations can be viewed
by visiting the `Performance' section of the project's web page.
To download MVAPICH2 2.0 and MVAPICH2-X 2.0 and the associated user
guides and quick start guide, or to access the SVN repository, please
visit the following URL:
http://mvapich.cse.ohio-state.edu
All questions, feedback, bug reports, hints for performance tuning,
patches, and enhancements are welcome. Please post them to the
mvapich-discuss mailing list (mvapich-discuss at cse.ohio-state.edu).
Thanks,
The MVAPICH Team