[mvapich-discuss] Announcing the Release of MVAPICH2 2.0rc1, MVAPICH2-X 2.0rc1 and OSU Micro-Benchmarks (OMB) 4.3

Panda, Dhabaleswar panda at cse.ohio-state.edu
Tue Mar 25 03:06:06 EDT 2014


The MVAPICH team is pleased to announce the release of MVAPICH2
2.0rc1, MVAPICH2-X 2.0rc1 (Hybrid MPI+PGAS (OpenSHMEM) with Unified
Communication Runtime) and OSU Micro-Benchmarks (OMB) 4.3.

Features, Enhancements, and Bug Fixes for MVAPICH2 2.0rc1 (since the
MVAPICH2 2.0b release) are listed here.

* Features and Enhancements (since 2.0b):
    - Based on MPICH-3.1
    - Enhanced direct RDMA-based designs for MPI_Put and MPI_Get operations
      in the OFA-IB-CH3 channel
    - Optimized communication when using MPI_Win_allocate for the OFA-IB-CH3
      channel (an RMA usage sketch follows the bug-fix list below)
    - MPI-3 RMA support for CH3-PSM channel
    - Multi-rail support for UD-Hybrid channel
    - Optimized and tuned blocking and non-blocking collectives for
      OFA-IB-CH3, OFA-IB-Nemesis, and CH3-PSM channels
    - Improved hierarchical job startup performance
    - Optimized sub-array data-type processing for GPU-to-GPU communication
    - Tuning for Mellanox Connect-IB adapters
    - Updated hwloc to version 1.8
    - Added options to specify CUDA library paths (see the configure example
      after this list)
    - Deprecated the uDAPL-CH3 channel
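
As an illustration of the new CUDA path options, a GPU-enabled build
might be configured as below. The exact option names here are an
assumption based on common MVAPICH2 configure conventions; please
consult the user guide for your version:

    ./configure --enable-cuda --with-cuda=/usr/local/cuda \
                --with-cuda-include=/usr/local/cuda/include \
                --with-cuda-libpath=/usr/local/cuda/lib64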

* Bug-Fixes (since 2.0b):
    - Fix issues related to MPI-3 RMA locks
    - Fix an issue related to MPI-3 dynamic windows
    - Fix issues related to MPI_Win_allocate backed by shared memory
    - Fix issues related to large message transfers for OFA-IB-CH3 and
      OFA-IB-Nemesis channels
    - Fix a warning in job launch when using DPM
    - Fix an issue related to MPI atomic operations on HCAs without atomics
      support
    - Fix an issue related to compiler selection (the GNU, Intel, PGI, and
      Ekopath compilers are preferred, in that order)
        - Thanks to Uday R Bondhugula from IISc for the report
    - Fix an issue in message coalescing
    - Prevent printing out inter-node runtime parameters for pure intra-node
      runs
        - Thanks to Jerome Vienne from TACC for the report
    - Fix an issue related to ordering of messages for GPU-to-GPU transfers
    - Fix a few memory leaks and warnings
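
To make the RMA items above concrete, here is a minimal sketch (not
code from the release itself) of MPI-3 one-sided communication using
MPI_Win_allocate and MPI_Put with passive-target synchronization, the
pattern targeted by the direct RDMA and MPI_Win_allocate
optimizations:

    #include <mpi.h>
    #include <stdio.h>

    /* Each rank puts its id into the window of its right neighbor.
     * Window memory comes from MPI_Win_allocate, so the library can
     * back it with registered memory for direct RDMA. */
    int main(int argc, char **argv)
    {
        int rank, nprocs, *buf;
        MPI_Win win;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

        MPI_Win_allocate(sizeof(int), sizeof(int), MPI_INFO_NULL,
                         MPI_COMM_WORLD, &buf, &win);
        *buf = -1;
        MPI_Barrier(MPI_COMM_WORLD);     /* all windows initialized */

        int peer = (rank + 1) % nprocs;
        MPI_Win_lock(MPI_LOCK_EXCLUSIVE, peer, 0, win);
        MPI_Put(&rank, 1, MPI_INT, peer, 0, 1, MPI_INT, win);
        MPI_Win_unlock(peer, win);       /* completes the put */

        MPI_Barrier(MPI_COMM_WORLD);
        printf("rank %d received %d\n", rank, *buf);

        MPI_Win_free(&win);
        MPI_Finalize();
        return 0;
    }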

The MVAPICH2-X 2.0rc1 software package provides support for hybrid
MPI+PGAS (UPC and OpenSHMEM) programming models with a unified
communication runtime for emerging exascale systems. It gives users
the flexibility to write MPI, MPI+OpenMP, pure UPC, and pure
OpenSHMEM programs, as well as hybrid MPI(+OpenMP) + PGAS (UPC and
OpenSHMEM) programs, all on top of a single unified communication
runtime.
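
For instance, a single executable can mix OpenSHMEM one-sided
operations with MPI collectives on the same set of processes. The
sketch below is illustrative only; in particular, the assumption that
start_pes(0) brings up both models, and the finalization convention,
are implementation details, so consult the MVAPICH2-X user guide for
the supported pattern:

    #include <mpi.h>
    #include <shmem.h>
    #include <stdio.h>

    int main(void)
    {
        /* Assumption: under the unified runtime, start_pes() also
         * makes MPI usable (common for OpenSHMEM layered over MPI) */
        start_pes(0);

        int me   = _my_pe();
        int npes = _num_pes();

        /* Symmetric-heap allocation: the object exists on every PE */
        int *val = (int *) shmalloc(sizeof(int));
        *val = 0;
        shmem_barrier_all();

        /* OpenSHMEM one-sided put to the right neighbor */
        shmem_int_p(val, me, (me + 1) % npes);
        shmem_barrier_all();

        /* MPI collective over the same processes */
        int sum = 0;
        MPI_Allreduce(val, &sum, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);
        printf("PE %d: received %d, sum of ids = %d\n", me, *val, sum);

        shfree(val);
        /* OpenSHMEM 1.0 has no explicit finalize; whether MPI_Finalize
         * is required here is implementation-specific */
        return 0;
    }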

Features, enhancements, and bug fixes for MVAPICH2-X 2.0rc1 (since
MVAPICH2-X 2.0b) are as follows:

* Features and Enhancements (since 2.0b):
    - OpenSHMEM Features
        - Based on OpenSHMEM reference implementation 1.0f
        - Improved intra-node communication performance using
          shared memory and Cross Memory Attach (CMA)

    - UPC Features
        - Based on Berkeley UPC 2.18.0 (contains changes/additions
          in preparation for upcoming UPC 1.3 specification)
        - Optimized UPC collectives (improved performance for
          upc_all_broadcast, upc_all_scatter, upc_all_gather,
          upc_all_gather_all, and upc_all_exchange)

    - MPI Features
        - Based on MVAPICH2 2.0rc1 (OFA-IB-CH3 interface)

    - Unified Runtime Features
        - Based on MVAPICH2 2.0rc1 (OFA-IB-CH3 interface). All the
          runtime features enabled by default in the OFA-IB-CH3 interface
          of MVAPICH2 2.0rc1 are available in MVAPICH2-X 2.0rc1

* Bug Fixes (since 2.0b):
    - OpenSHMEM Bug Fixes
        - Fix an issue related to atomics on HCAs without atomics support

New features and enhancements of OSU Micro-Benchmarks (OMB) 4.3 (since
the OMB 4.2 release) are listed here.

* New Features & Enhancements (since 4.2)

    - This new suite includes several new (or updated) benchmarks to
      measure the performance of MPI-3 RMA communication operations, with
      options to select different window creation (WIN_CREATE,
      WIN_DYNAMIC, and WIN_ALLOCATE) and synchronization functions
      (LOCK, PSCW, FENCE, FLUSH, FLUSH_LOCAL, and LOCK_ALL) in each
      benchmark (a run example appears at the end of this section)
        * osu_acc_latency
        * osu_cas_latency
        * osu_fop_latency
        * osu_get_acc_latency
        * osu_get_bw
        * osu_get_latency
        * osu_put_bibw
        * osu_put_bw
        * osu_put_latency

    - New UPC Collective Benchmarks
        * osu_upc_all_barrier
        * osu_upc_all_broadcast
        * osu_upc_all_exchange
        * osu_upc_all_gather
        * osu_upc_all_gather_all
        * osu_upc_all_reduce
        * osu_upc_all_scatter

    - Build MPI-3 benchmarks when support is detected in the MPI library
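
As a usage illustration for the RMA benchmarks above, each binary
accepts options selecting the window-creation and synchronization
method. The -w/-s flag spellings and keywords below are assumptions,
and hostA/hostB are placeholder hostnames; check each benchmark's help
output for the exact syntax:

    mpirun_rsh -np 2 hostA hostB ./osu_put_latency -w allocate -s flush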

* Bug Fixes (since 4.2)
    - Add shmem_quiet() in the OpenSHMEM Message Rate benchmark to ensure all
      previously issued operations are completed
    - Allocate pWrk from the symmetric heap in the OpenSHMEM Reduce benchmark
      (both fixes are illustrated in the sketch below)
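
Both fixes follow standard OpenSHMEM usage: reduction work arrays
(pWrk/pSync) must live in symmetric memory, and shmem_quiet() is the
call that forces completion of outstanding one-sided operations. A
minimal sketch against the OpenSHMEM 1.0 API (illustrative, not the
benchmark code itself):

    #include <shmem.h>
    #include <stdio.h>

    /* Symmetric (global) work/sync arrays, as the reduction API
     * requires; a pWrk on the private stack would be incorrect */
    long pSync[_SHMEM_REDUCE_SYNC_SIZE];
    int  pWrk[_SHMEM_REDUCE_MIN_WRKDATA_SIZE];

    int main(void)
    {
        int i;

        start_pes(0);
        int me = _my_pe();

        for (i = 0; i < _SHMEM_REDUCE_SYNC_SIZE; i++)
            pSync[i] = _SHMEM_SYNC_VALUE;

        /* src/dst on the symmetric heap, like pWrk in the fixed
         * Reduce benchmark */
        int *src = (int *) shmalloc(sizeof(int));
        int *dst = (int *) shmalloc(sizeof(int));
        *src = me;
        shmem_barrier_all();          /* pSync initialized everywhere */

        /* Sum of PE ids across all PEs */
        shmem_int_sum_to_all(dst, src, 1, 0, 0, _num_pes(), pWrk, pSync);

        /* After issuing puts, shmem_quiet() guarantees completion of
         * all operations from this PE (the Message Rate fix) */
        shmem_int_p(dst, *dst, (me + 1) % _num_pes());
        shmem_quiet();
        shmem_barrier_all();

        printf("PE %d: sum = %d\n", me, *dst);
        shfree(src);
        shfree(dst);
        return 0;
    }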

To download MVAPICH2 2.0rc1, MVAPICH2-X 2.0rc1, and OMB 4.3, along
with the associated user guides and quick start guide, or to access
the SVN repository, please visit the following URL:

http://mvapich.cse.ohio-state.edu

All questions, feedback, bug reports, hints for performance tuning,
patches, and enhancements are welcome. Please post them to the
mvapich-discuss mailing list (mvapich-discuss at cse.ohio-state.edu).

Thanks,

The MVAPICH Team