[mvapich-discuss] Announcing the Release of MVAPICH2 1.9 GA,
MVAPICH2-X 1.9 GA and OSU Micro-Benchmarks (OMB) 4.0.1
Dhabaleswar Panda
panda at cse.ohio-state.edu
Mon May 6 23:54:06 EDT 2013
The MVAPICH team is pleased to announce the release of MVAPICH2 1.9 GA,
MVAPICH2-X 1.9 GA (Hybrid MPI+PGAS with UPC and OpenSHMEM support through
Unified Communication Runtime) and OSU Micro-Benchmarks (OMB) 4.0.1.
Features, Enhancements, and Bug Fixes for MVAPICH2 1.9 are listed
here.
* New Features and Enhancements (since MVAPICH2 1.8.1). (**) indicates an
enhancement added since 1.9RC1:
- Based on MPICH-3.0.3
- Support for all MPI-3 features (a usage sketch follows this list)
(Available for all interfaces: OFA-IB-CH3, OFA-iWARP-CH3,
OFA-RoCE-CH3, uDAPL-CH3, OFA-IB-Nemesis and PSM-CH3)
- Support for Mellanox Connect-IB HCA
- Adaptive number of registration cache entries based on job size
- Support for single copy intra-node communication using Linux supported
CMA (Cross Memory Attach)
- Provides flexibility for intra-node communication: shared memory,
LiMIC2, and CMA
- New version of LiMIC2 (v0.5.6)
- Provides support for unlocked ioctl calls
- Checkpoint/Restart using LLNL's Scalable Checkpoint/Restart Library (SCR)
- Using SCR version 1.1.8
- Support for application-level checkpointing
- Support for hierarchical system-level checkpointing
- Install utility scripts included with SCR
- Scalable UD-multicast-based designs for collectives
(Bcast, Allreduce and Scatter)
- LiMIC-based design for Gather collective
- Improved performance for shared-memory-aware collectives
(Reduce and Bcast)
- (**) Tuned Bcast, Alltoall, AllReduce, Allgather, Reduce, Scatter,
Reduce_Scatter, Allgatherv collectives
- Tuned MPI performance on Kepler GPUs (a CUDA-aware usage sketch
follows this list)
- Improved intra-node communication performance with GPU buffers
using pipelined design
- Improved inter-node communication performance with GPU buffers
with non-blocking CUDA copies
- Improved small message communication performance with
GPU buffers using CUDA IPC design
- Efficient vector, hindexed datatype processing on GPU buffers
- Improved automatic GPU device selection and CUDA context management
- Optimal communication channel selection for different
GPU communication modes (DD, DH and HD, i.e., device-device,
device-host and host-device) in different configurations
(intra-IOH and inter-IOH, i.e., within and across I/O hubs)
- Provided an option to use a CUDA library call instead of the CUDA
driver to check the buffer pointer type
- Thanks to Christian Robert from Sandia for the suggestion
- Revamped Build system:
- Uses automake instead of simplemake
- Renamed "maint/updatefiles" to "autogen.sh"
- Allows for parallel builds ("make -j8" and similar)
- Improved job startup time
- A new runtime variable, MV2_HOMOGENEOUS_CLUSTER, for optimized
startup on homogeneous clusters
- Introduced option to export environment variables automatically with
mpirun_rsh
- Support for automatic detection of path to utilities used by
mpirun_rsh during configuration
- Utilities supported: rsh, ssh, xterm, TotalView
- Support for launching jobs on heterogeneous networks with mpirun_rsh
- Removed libibumad dependency for building the library
- Tuned thresholds for various architectures
- Set DAPL-2.0 as the default version for the uDAPL interface
- (**) Updated to hwloc v1.7
- Option to use IP address as a fallback if hostname
cannot be resolved
- Introduced MV2_RDMA_CM_CONF_FILE_PATH parameter which specifies
path to mv2.conf
- Improved debug messages and error reporting
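
As an illustration of the MPI-3 support listed above, the short C sketch
below overlaps an MPI_Iallreduce (a non-blocking collective introduced in
MPI-3) with independent local work. It is only a minimal usage sketch;
the buffer contents and the placeholder for local work are illustrative
and not part of the release itself.

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, in = 1, out = 0;
        MPI_Request req;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        /* MPI-3 non-blocking collective: start the reduction ... */
        MPI_Iallreduce(&in, &out, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD, &req);

        /* ... overlap it with independent local computation here ... */

        MPI_Wait(&req, MPI_STATUS_IGNORE);
        if (rank == 0)
            printf("sum across ranks = %d\n", out);

        MPI_Finalize();
        return 0;
    }
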
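The GPU-related items above refer to MVAPICH2's CUDA-aware communication
path, in which device pointers returned by cudaMalloc can be passed
directly to MPI calls and the library handles the staging and pipelining
internally. The following minimal C sketch assumes a CUDA-enabled build
of the library and that the CUDA path is enabled at run time (e.g., via
MV2_USE_CUDA=1); the message size is illustrative.

    #include <mpi.h>
    #include <cuda_runtime.h>

    /* Run with two or more processes. */
    int main(int argc, char **argv)
    {
        int rank;
        const int count = 1 << 20;          /* 1M floats, illustrative size */
        float *d_buf = NULL;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        cudaMalloc((void **)&d_buf, count * sizeof(float));

        /* With a CUDA-aware build, the device pointer is passed directly
           to MPI; no explicit host staging is needed in user code. */
        if (rank == 0)
            MPI_Send(d_buf, count, MPI_FLOAT, 1, 0, MPI_COMM_WORLD);
        else if (rank == 1)
            MPI_Recv(d_buf, count, MPI_FLOAT, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);

        cudaFree(d_buf);
        MPI_Finalize();
        return 0;
    }
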
* Bug Fixes (since 1.9RC1):
- Fix cuda context issue with async progress thread
- Thanks to Osuna Escamilla Carlos from env.ethz.ch for the report
- Overwrite pre-existing PSM environment variables
- Thanks to Adam Moody from LLNL for the patch
- Fix several warnings
- Thanks to Adam Moody from LLNL for some of the patches
For a complete set of bug fixes of MVAPICH2 1.9 (compared to 1.8.1),
please refer to the following URL:
http://mvapich.cse.ohio-state.edu/download/mvapich2/changes-1.9.shtml
The MVAPICH2-X 1.9 software package (released as a technology preview)
provides support for hybrid MPI+PGAS (UPC and OpenSHMEM) programming
models with a unified communication runtime for emerging exascale
systems. This software package gives users the flexibility to write
applications using the following programming models on a unified
communication runtime: MPI, MPI+OpenMP, pure UPC, and pure OpenSHMEM,
as well as hybrid MPI(+OpenMP) + PGAS (UPC and OpenSHMEM) programs.
Features for MVAPICH2-X 1.9 are as follows. (**) indicates features
added since 1.9RC1:
* MPI Features
- (**) Based on MVAPICH2 1.9 (OFA-IB-CH3 interface) including
MPI-3 features. MPI programs can take advantage of all
the features enabled by default in OFA-IB-CH3 interface
of MVAPICH2 1.9
- High performance two-sided communication scalable to
multi-thousand nodes
- Optimized collective communication operations:
- Shared-memory optimized algorithms for barrier, broadcast,
reduce and allreduce operations
- Optimized two-level designs for scatter and gather operations
- Improved implementation of allgather, alltoall operations
- High-performance and scalable support for one-sided communication
- Direct RDMA based designs for one-sided communication
- Shared memory backed windows for one-sided communication
(a usage sketch covering this and passive locking follows this list)
- Support for truly passive locking for intra-node RMA
in shared memory backed windows
- Multi-threading support
- Enhanced support for multi-threaded MPI applications
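
To illustrate the one-sided items above (shared memory backed windows
and truly passive locking), here is a minimal C sketch built on the
standard MPI-3 calls MPI_Win_allocate_shared and MPI_Win_lock. The
communicator split and window size are illustrative and not specific to
MVAPICH2-X.

    #include <mpi.h>

    int main(int argc, char **argv)
    {
        MPI_Comm node_comm;
        MPI_Win win;
        int *base, rank;

        MPI_Init(&argc, &argv);

        /* Group ranks that can share memory (ranks on the same node). */
        MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0,
                            MPI_INFO_NULL, &node_comm);
        MPI_Comm_rank(node_comm, &rank);

        /* Window backed by shared memory on the node. */
        MPI_Win_allocate_shared(sizeof(int), sizeof(int), MPI_INFO_NULL,
                                node_comm, &base, &win);

        /* Passive-target RMA: the target process does not participate. */
        MPI_Win_lock(MPI_LOCK_EXCLUSIVE, 0, 0, win);
        if (rank != 0)
            MPI_Put(&rank, 1, MPI_INT, 0, 0, 1, MPI_INT, win);
        MPI_Win_unlock(0, win);

        MPI_Win_free(&win);
        MPI_Comm_free(&node_comm);
        MPI_Finalize();
        return 0;
    }
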
* Unified Parallel C (UPC) Features
- UPC Language Specification v1.2 standard compliance
- Based on Berkeley UPC v2.16.2
- Optimized RDMA-based implementation of UPC data movement routines
- Improved UPC memput design for small/medium size messages
* OpenSHMEM Features:
- (**) Added 'shmem_ptr' functionality (a usage sketch follows this list)
- OpenSHMEM v1.0d standard compliance
- Optimized RDMA-based implementation of OpenSHMEM
data movement routines
- Efficient implementation of OpenSHMEM atomics using RDMA atomics
- High performance intra-node communication using
shared memory based schemes
- Optimized OpenSHMEM put routines for small/medium message sizes
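
The 'shmem_ptr' item above refers to the standard OpenSHMEM routine that
returns a load/store-accessible pointer to a symmetric object on another
PE when the two PEs can share memory (for example, on the same node),
and NULL otherwise. A minimal C sketch, with an illustrative symmetric
variable and a put-based fallback:

    #include <shmem.h>
    #include <stdio.h>

    int counter = 0;   /* symmetric: the same global exists on every PE */

    int main(void)
    {
        start_pes(0);                      /* OpenSHMEM 1.0-style init */
        int me   = _my_pe();
        int npes = _num_pes();
        int peer = (me + 1) % npes;

        /* Try to map the peer's copy of 'counter' into our address space. */
        int *remote = (int *)shmem_ptr(&counter, peer);

        if (remote != NULL)
            *remote = me;                     /* direct store, intra-node case */
        else
            shmem_int_p(&counter, me, peer);  /* fall back to a put */

        shmem_barrier_all();
        printf("PE %d: counter = %d\n", me, counter);
        return 0;
    }
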
* Hybrid Program Features:
- (**) Based on MVAPICH2 1.9 (OFA-IB-CH3 interface). All the runtime
features enabled by default in OFA-IB-CH3 interface of MVAPICH2 1.9
are available in MVAPICH2-X 1.9
- Supports hybrid programming using MPI(+OpenMP),
MPI(+OpenMP)+UPC and MPI(+OpenMP)+OpenSHMEM
(a usage sketch follows this list)
- Support for MPI-3, UPC v1.2 and OpenSHMEM v1.0d
- Optimized network resource utilization through the
unified communication runtime
- Efficient deadlock-free progress of MPI and UPC/OpenSHMEM calls
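
The following minimal C sketch only illustrates the hybrid structure
described above: MPI and OpenSHMEM calls used side by side in one program
on the unified runtime. The exact initialization requirements are
described in the MVAPICH2-X user guide; the calls and the data exchange
shown here are purely illustrative.

    #include <mpi.h>
    #include <shmem.h>
    #include <stdio.h>

    long shared_value = 0;   /* symmetric variable used on the OpenSHMEM side */

    int main(int argc, char **argv)
    {
        int rank, size;

        /* Initialization requirements are runtime-specific; consult the
           MVAPICH2-X user guide.  Here both interfaces are brought up. */
        MPI_Init(&argc, &argv);
        start_pes(0);

        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* OpenSHMEM one-sided put to the neighbouring PE ... */
        shmem_long_p(&shared_value, (long)rank, (rank + 1) % size);
        shmem_barrier_all();

        /* ... alongside an MPI collective in the same program. */
        long sum = 0;
        MPI_Allreduce(&shared_value, &sum, 1, MPI_LONG, MPI_SUM,
                      MPI_COMM_WORLD);

        if (rank == 0)
            printf("sum of neighbour ranks = %ld\n", sum);

        MPI_Finalize();
        return 0;
    }
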
* Unified Runtime Features:
- (**) Based on MVAPICH2 1.9 (OFA-IB-CH3 interface). All the
runtime features enabled by default in OFA-IB-CH3 interface of
MVAPICH2 1.9 are available in MVAPICH2-X 1.9. MPI, UPC,
OpenSHMEM and Hybrid programs benefit from its features
listed below:
- Scalable inter-node communication with highest performance
and reduced memory usage
- Integrated RC/XRC design to get best performance on
large-scale systems with reduced/constant memory footprint
- RDMA Fast Path connections for efficient small
message communication
- Shared Receive Queue (SRQ) with flow control to significantly
reduce memory footprint of the library
- AVL tree-based resource-aware registration cache
- Automatic tuning based on network adapter and host architecture
- Optimized intra-node communication support by taking
advantage of shared-memory communication
- Efficient buffer organization for memory scalability of
intra-node communication
- Automatic intra-node communication parameter tuning
based on platform
- Flexible CPU binding capabilities
- Portable Hardware Locality (hwloc v1.7) support for
defining CPU affinity
- Efficient CPU binding policies (bunch and scatter patterns,
socket and numanode granularities) to specify CPU binding
per job for modern multi-core platforms
- Allow user-defined flexible processor affinity
- Two modes of communication progress
- Polling
- Blocking (enables running multiple processes per processor)
- Flexible process manager support
- Support for mpirun_rsh, hydra and oshrun process managers
- Support for upcrun process manager
Bug Fixes for OSU Micro-Benchmarks (OMB) 4.0.1 are listed here.
* Bug Fixes (since OMB 4.0)
- Fix several warnings
For a complete list of changes, please refer to the following URL:
http://mvapich.cse.ohio-state.edu/svn/mpi-benchmarks/branches/4.0/CHANGES
Various performance numbers for MVAPICH2 1.9 and MVAPICH2-X 1.9
on different platforms and system configurations can be viewed
by visiting the 'Performance' section of the project's web page.
For downloading MVAPICH2 1.9, MVAPICH2-X 1.9, OMB 4.0.1,
associated user guides, quick start guide, and accessing the SVN,
please visit the following URL:
http://mvapich.cse.ohio-state.edu
All questions, feedback, bug reports, hints for performance tuning,
patches, and enhancements are welcome. Please post them to the
mvapich-discuss mailing list (mvapich-discuss at cse.ohio-state.edu).
Thanks,
The MVAPICH Team