[mvapich-discuss] Announcing the release of MVAPICH2 2.0 GA, MVAPICH2-X 2.0 GA and OSU Micro-Benchmarks (OMB) 4.3.1

Panda, Dhabaleswar panda at cse.ohio-state.edu
Fri Jun 20 17:01:06 EDT 2014


The MVAPICH team is pleased to announce the release of MVAPICH2 2.0 GA,
MVAPICH2-X 2.0 GA (Hybrid MPI+PGAS (OpenSHMEM) with Unified
Communication Runtime) and OSU Micro-Benchmarks (OMB) 4.3.1.

Features, Enhancements, and Bug Fixes for MVAPICH2 2.0 are
listed here.

* Features and Enhancements (since MVAPICH2 1.9 GA):

    - Based on MPICH-3.1
    - Extended support for MPI-3 RMA in OFA-IB-CH3, OFA-IWARP-CH3,
      and OFA-RoCE-CH3 interfaces
    - RMA optimizations for shared memory and atomic operations
    - Optimized communication when using MPI_Win_allocate for the
      OFA-IB-CH3 channel (a usage sketch follows this list)
    - MPI-3 RMA support for CH3-PSM channel
    - Support for MPI_T performance and control variables
    - Optimized and tuned blocking and non-blocking collectives
      for OFA-IB-CH3, OFA-IB-Nemesis, and CH3-PSM channels
    - Large message transfer support for PSM interface
    - CMA support is now enabled by default
    - Enhanced intra-node SMP performance
    - Tuned SMP eager threshold parameters
    - Tuned RGET and atomic operations
    - Dynamic CUDA initialization: GPU devices can be selected after
      MPI_Init, and GPU resources are initialized only when they are
      used by an MPI transfer (a usage sketch follows this list)
    - Support for running on heterogeneous clusters with GPU and non-GPU nodes
    - Multi-rail support for GPU communication
    - Non-blocking streams in asynchronous CUDA transfers for better overlap
    - Optimized sub-array data-type processing for GPU-to-GPU communication
    - Added options to specify CUDA library paths
    - Tuned RDMA FP-based communication
    - Tuning for Ivy-Bridge architecture
    - Tuning for Mellanox Connect-IB adapters
    - Reduced memory footprint
    - Improved job-startup performance for large-scale mpirun_rsh jobs
    - Introduced retry mechanism in mpirun_rsh for socket binding
    - Capability to checkpoint CH3 channel using the Hydra process manager
    - Warn and continue when ptmalloc fails to initialize
    - Enable hierarchical SSH-based startup with Checkpoint-Restart
    - Multi-rail support for UD-Hybrid channel
    - Updated compiler wrappers to remove application dependency on
      network and other extra libraries
      (Thanks to Adam Moody from LLNL for the suggestion)
    - Deprecation of uDAPL-CH3 channel
    - Updated to hwloc v1.9
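
For reference, here is a minimal sketch of the MPI-3 RMA usage pattern
that the MPI_Win_allocate optimization above targets. It uses only
standard MPI-3 calls and is an illustration, not code from the MVAPICH2
sources.

    /*
     * Minimal MPI-3 RMA sketch using MPI_Win_allocate (illustrative only).
     * Rank 0 writes its rank into a one-integer window exposed by every
     * other rank.
     */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank = 0, size = 0, *buf = NULL;
        MPI_Win win;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* Let MPI allocate the window memory so the library can place it
           where RMA operations are cheapest (the optimization noted above). */
        MPI_Win_allocate(sizeof(int), sizeof(int), MPI_INFO_NULL,
                         MPI_COMM_WORLD, &buf, &win);
        *buf = -1;

        MPI_Win_fence(0, win);
        if (rank == 0) {
            for (int i = 1; i < size; i++)
                MPI_Put(&rank, 1, MPI_INT, i, 0, 1, MPI_INT, win);
        }
        MPI_Win_fence(0, win);

        printf("rank %d: window value = %d\n", rank, *buf);
        MPI_Win_free(&win);
        MPI_Finalize();
        return 0;
    }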

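The dynamic CUDA initialization item above permits a pattern like the
following sketch, where the GPU device is chosen after MPI_Init and
device buffers are then passed directly to MPI calls. The MV2_USE_CUDA=1
job setting and the MV2_COMM_WORLD_LOCAL_RANK environment variable are
assumptions here; please verify the exact controls against the user
guide for your installation.

    /*
     * Sketch of GPU device selection after MPI_Init (illustrative only).
     * Assumes the job is launched with MV2_USE_CUDA=1 and that the
     * launcher exports MV2_COMM_WORLD_LOCAL_RANK.
     */
    #include <mpi.h>
    #include <cuda_runtime.h>
    #include <stdlib.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);              /* no GPU context is needed yet */

        int rank = 0, size = 0;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* Choose a GPU only after MPI_Init, e.g. one device per local rank. */
        const char *lrank = getenv("MV2_COMM_WORLD_LOCAL_RANK");
        int ndev = 0;
        cudaGetDeviceCount(&ndev);
        cudaSetDevice((lrank && ndev > 0) ? atoi(lrank) % ndev : 0);

        /* Device buffers can now be passed directly to MPI calls. */
        double *dbuf = NULL;
        cudaMalloc((void **)&dbuf, 1024 * sizeof(double));
        if (size >= 2) {
            if (rank == 0)
                MPI_Send(dbuf, 1024, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
            else if (rank == 1)
                MPI_Recv(dbuf, 1024, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
        }

        cudaFree(dbuf);
        MPI_Finalize();
        return 0;
    }
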
* Bug-Fixes (since MVAPICH2 1.9 GA):

    - Fix data validation issue with MPI_Bcast
        - Thanks to Claudio J. Margulis from University of Iowa for the report
    - Fix issue with very large message (>2 GB) MPI_Bcast
        - Thanks to Lu Qiyue for the report
    - Fix multicast hang when there is a single process on one node
      and more than one process on other nodes
    - Fix for bcastzero type hang during finalize
    - Fix non-power-of-two usage of scatter-doubling-allgather algorithm
    - Fix issues related to large message transfers for OFA-IB-CH3
      and OFA-IB-Nemesis channels
    - Fix buffer alignment for large message shared memory transfers
    - Initialize using better defaults for ibv_modify_qp (initial ring)
    - Enhanced handling of failures in RDMA_CM based connection establishment
    - Fix for hangs in connection setup and finalize when using RDMA_CM
    - Fix warning in job launch, when using DPM
    - Fix issues in Nemesis interface with --with-ch3-rank-bits=32
    - Better cleanup of XRC files in corner cases
    - Fix a flow-control bug in UD transport
        - Thanks to Benjamin M. Auer from NASA for the report
    - Fix issues related to MPI-3 RMA locks
    - Fix a bug in One-Sided shared memory backed windows
    - Fix an issue related to MPI-3 dynamic window
    - Fix issues related to MPI_Win_allocate backed by shared memory
    - Fix bugs with MPI-3 RMA in Nemesis IB interface
    - Fix an issue related to MPI atomic operations on HCAs without
      atomics support
    - Handle case where $HOME is not set during search for MV2 user config file
        - Thanks to Adam Moody from LLNL for the patch
    - Fix compilation error with --enable-g=all in PSM interface
    - Prevent printing out inter-node runtime parameters for pure
      intra-node runs
        - Thanks to Jerome Vienne from TACC for the report
    - MPI_Get_library_version updated with proper MVAPICH2 branding
        - Thanks to Jerome Vienne from TACC for the report
    - Finish receive request when RDMA READ completes in RGET protocol
    - Always use direct RDMA when flush is used
    - Fix an issue related to compiler selection (the GNU, Intel, PGI,
      and EKOPath compilers are preferred, in that order)
        - Thanks to Uday R Bondhugula from IISc for the report
    - Fix an issue in message coalescing
    - Consider list provided by MV2_IBA_HCA when scanning device list
    - Unconditionally check for and add the pthread library
    - Fix an issue related to ordering of messages for GPU-to-GPU transfers
    - Fix multiple warnings and memory leaks

The MVAPICH2-X 2.0 software package provides support for hybrid
MPI+PGAS (UPC and OpenSHMEM) programming models with a unified
communication runtime for emerging exascale systems. This software
package gives users the flexibility to write applications using the
following programming models on a unified communication runtime:
MPI, MPI+OpenMP, pure UPC, and pure OpenSHMEM programs, as well as
hybrid MPI(+OpenMP) + PGAS (UPC and OpenSHMEM) programs.
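
As a rough illustration of the hybrid model, the sketch below mixes
OpenSHMEM one-sided calls with an MPI collective in a single program.
The initialization order shown (MPI_Init before start_pes) and the
assumption that PE i corresponds to MPI rank i are not taken from this
announcement; please consult the MVAPICH2-X user guide for the
supported usage.

    /*
     * Hybrid MPI + OpenSHMEM sketch (illustrative only, not from the
     * MVAPICH2-X sources).  Initialization order and PE/rank mapping
     * are assumptions; check the MVAPICH2-X user guide.
     */
    #include <mpi.h>
    #include <shmem.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);
        start_pes(0);                          /* OpenSHMEM initialization */

        int me = _my_pe();
        int npes = _num_pes();

        /* One-sided PGAS-style communication on the symmetric heap. */
        int *flag = (int *)shmalloc(sizeof(int));
        *flag = -1;
        shmem_barrier_all();
        shmem_int_put(flag, &me, 1, (me + 1) % npes);  /* write to neighbour */
        shmem_barrier_all();

        /* An MPI collective in the same program, over the same processes. */
        int sum = 0;
        MPI_Allreduce(&me, &sum, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);

        printf("PE %d of %d: neighbour wrote %d, allreduce sum = %d\n",
               me, npes, *flag, sum);

        shfree(flag);
        MPI_Finalize();
        return 0;
    }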

Features and enhancements for MVAPICH2-X 2.0 are as follows:

* Features and Enhancements (since MVAPICH2-X 1.9 GA):
    - MPI Features
        - Based on MVAPICH2 2.0 (OFA-IB-CH3 interface)

    - Unified Runtime Features
        - Based on MVAPICH2 2.0 (OFA-IB-CH3 interface). All the
          runtime features enabled by default in OFA-IB-CH3 interface
          of MVAPICH2 2.0 are available in MVAPICH2-X 2.0

    - OpenSHMEM Features
        - Based on OpenSHMEM reference implementation 1.0f
        - Improved intra-node communication performance using shared
          memory and Cross Memory Attach (CMA)
        - Optimized OpenSHMEM collectives (improved performance for
          shmem_collect, shmem_fcollect, shmem_barrier,
          shmem_reduce, and shmem_broadcast)
        - Optimized shmalloc routine (a usage sketch follows this
          feature list)

    - UPC Features
        - Based on Berkeley UPC 2.18.0 (contains changes/additions
          in preparation for upcoming UPC 1.3 specification)
        - Optimized UPC collectives (improved performance for
          upc_all_broadcast, upc_all_scatter, upc_all_gather,
          upc_all_gather_all, and upc_all_exchange)
        - Support for GUPC translator
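
As a usage illustration for the optimized shmalloc and broadcast paths
listed above, the following sketch broadcasts a small buffer from PE 0
with shmem_broadcast64. It uses only OpenSHMEM 1.0 calls and constants
and is not taken from the MVAPICH2-X sources.

    /*
     * OpenSHMEM collective sketch (illustrative only): broadcast eight
     * long values from PE 0 using symmetric memory from shmalloc and
     * shmem_broadcast64 (assumes 64-bit long).
     */
    #include <shmem.h>
    #include <stdio.h>

    #define NELEMS 8

    long pSync[_SHMEM_BCAST_SYNC_SIZE];        /* symmetric work array */

    int main(void)
    {
        start_pes(0);
        int me = _my_pe();
        int npes = _num_pes();

        for (int i = 0; i < _SHMEM_BCAST_SYNC_SIZE; i++)
            pSync[i] = _SHMEM_SYNC_VALUE;

        /* Symmetric heap allocation (the shmalloc path noted above). */
        long *src = (long *)shmalloc(NELEMS * sizeof(long));
        long *dst = (long *)shmalloc(NELEMS * sizeof(long));
        for (int i = 0; i < NELEMS; i++) {
            src[i] = 100 + i;
            dst[i] = -1;
        }
        shmem_barrier_all();                   /* buffers and pSync are ready */

        /* Broadcast from PE 0 to the active set of all PEs; note that
           the root's own dst buffer is not updated by shmem_broadcast. */
        shmem_broadcast64(dst, src, NELEMS, 0, 0, 0, npes, pSync);
        shmem_barrier_all();

        printf("PE %d of %d: first element = %ld\n",
               me, npes, me == 0 ? src[0] : dst[0]);

        shfree(dst);
        shfree(src);
        return 0;
    }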

* Bug Fixes (since MVAPICH2-X 1.9 GA):
    - OpenSHMEM Bug Fixes
        - Fix an issue related to atomics on HCAs without atomics support
        - Fix a synchronization issue in shmem_fence
        - Fix an issue in shmem_collect that prevented variable-length
          collect operations

Bug Fixes for OSU Micro-Benchmarks (OMB) 4.3.1 are
listed here.

* Bug Fixes (since OMB 4.3)
    - Fix typo in MPI collective benchmark help message
    - Explicitly mention that -m and -M parameters are specified in bytes

Various performance numbers for MVAPICH2 2.0 and MVAPICH2-X 2.0
on different platforms and system configurations can be viewed
by visiting the `Performance' section of the project's web page.

To download MVAPICH2 2.0 and MVAPICH2-X 2.0, the associated user
guides and quick start guide, or to access the SVN repository, please
visit the following URL:

http://mvapich.cse.ohio-state.edu

All questions, feedback, bug reports, hints for performance tuning,
patches, and enhancements are welcome. Please post them to the
mvapich-discuss mailing list (mvapich-discuss at cse.ohio-state.edu).

Thanks,

The MVAPICH Team