[mvapich-discuss] Announcing the release of MVAPICH2 2.2 GA
Panda, Dhabaleswar
panda at cse.ohio-state.edu
Fri Sep 9 19:15:02 EDT 2016
The MVAPICH team is pleased to announce the release of MVAPICH2 2.2 GA.
* Features and Enhancements (since 2.1 GA):
- Based on MPICH 3.1.4
- Support for OpenPower architecture
- Optimized inter-node and intra-node communication
- Support for Intel Knights Landing architecture
- Optimized point-to-point and collective performance
- Support for Intel Omni-Path architecture through a new PSM2 channel
- Thanks to Intel for contributing the patch
- Support for RoCEv2
- Enhanced performance for MPI_Comm_split through new bitonic algorithm
- Thanks to Adam T. Moody at LLNL for the patch
- Enable support for multiple MPI initializations
- Enhanced performance for small messages
- Enhanced startup performance and reduced memory footprint for storing
InfiniBand end-point information with SLURM
- Support for PMIX_Iallgather and PMIX_Ifence
- Support for shared memory based PMI operations
- An updated patch with this support for SLURM installations is
available from the MVAPICH project website
- Support for backing on-demand UD CM information with shared memory
to minimize memory footprint
- Improved startup performance for QLogic PSM-CH3 channel
- Thanks to Maksym Planeta at TU Dresden for the patch
- Enable efficient process affinity for asynchronous progress thread
- Unify process affinity support in Gen2, PSM and PSM2 channels
- Enable affinity by default for TrueScale(PSM) and Omni-Path(PSM2)
channels
- Allow processes to request MPI_THREAD_MULTIPLE when socket or NUMA node
level affinity is specified
- Enable graceful fallback to Shared Memory if LiMIC2 or CMA transfer fails
- Reorganized HCA-aware process mapping
- Update to hwloc version 1.11.2
- Add support for HCAs that return result of atomics in big endian notation
- Establish loopback connections by default if HCA supports atomics
- Dynamic identification of maximum read/atomic operations supported by HCA
- Automatic detection and tuning for InfiniBand EDR HCAs
- Enable support for intra-node communication in RoCE mode without
shared memory
- Enhanced support for MPIT based performance variables
- Architecture detection for PSC Bridges system with Omni-Path
- Automatic detection and tuning for 24-core Intel Haswell architecture
- Automatic detection and tuning for 28-core Intel Broadwell processors
- Enhanced tuning for shared-memory based MPI_Bcast
- Optimized point-to-point and collective tuning for the Chameleon
InfiniBand systems at TACC/UoC
- Collective tuning for Bridges at PSC, Stampede at TACC and other
architectures
- Collective tuning for Opal at LLNL, Bridges at PSC, Stampede at TACC,
and Stampede-1.5 at TACC
- Warn user to reconfigure library if rank type is not large enough to
represent all ranks in job
- Add ability to avoid using --enable-new-dtags with ld
- Thanks to Adam T. Moody at LLNL for the suggestion
- Add LIBTVMPICH specific CFLAGS and LDFLAGS
- Thanks to Adam T. Moody at LLNL for the suggestion
- Enable PSM builds when both PSM and PSM2 libraries are present
- Thanks to Adam T. Moody at LLNL for the report and patch
- Remove verbs dependency when building the PSM and PSM2 channels
- Thanks to Jeff Hammond at Intel for the report
- Updated to sm_20 kernel optimizations for MPI Datatypes
- Enhanced debugging support and error messages
* Bug Fixes (since MVAPICH2 2.1 GA):
- Fix minor error in use of communicator object in collectives
- Fix missing u_int64_t declaration with PGI compilers
- Thanks to Adam T. Moody at LLNL for the report and patch
- Fix memory leak in RMA rendezvous code path
- Thanks to Min Si at ANL for the report and patch
- Disable optimization that removes use of calloc in ptmalloc hook
detection code
- Thanks to Karl W. Schulz at Intel
- Fix weak alias typos (allows successful compilation with CLANG compiler)
- Thanks to Min Dong at Old Dominion University for the patch
- Fix issues in PSM large message gather operations
- Thanks to Adam T. Moody at LLNL for the report
- Enhance error checking in collective tuning code
- Thanks to Jan Bierbaum at Technical University of Dresden for the patch
- Fix issues with UD based communication in RoCE mode
- Fix issues with PMI2 support in singleton mode
- Fix default binding bug in hydra launcher
- Fix issues with Checkpoint Restart when launched with mpirun_rsh
- Fix Fortran binding issues with Intel 2016 compilers
- Fix issues with socket/NUMA node level binding
- Disable atomics when using Connect-IB with RDMA_CM
- Fix hang in MPI_Finalize when using hybrid channel
- Fix memory leaks
- Fix issue in some of the internal algorithms used for MPI_Bcast,
MPI_Alltoall and MPI_Reduce
- Fix hang in one of the internal algorithms used for MPI_Scatter
- Thanks to Ivan Raikov at Stanford for reporting this issue
- Fix issue with rdma_connect operation
- Fix issue with Dynamic Process Management feature
- Fix issue with de-allocating InfiniBand resources in blocking mode
- Fix build errors caused by improper compile-time guards
- Thanks to Adam Moody at LLNL for the report
- Fix finalize hang when running in hybrid or UD-only mode
- Thanks to Jerome Vienne at TACC for reporting this issue
- Fix issue in MPI_Win_flush operation
- Thanks to Nenad Vukicevic for reporting this issue
- Fix out of memory issues with non-blocking collectives code
- Thanks to Phanisri Pradeep Pratapa and Fang Liu at GaTech for
reporting this issue
- Fix fall-through bug in external32 pack
- Thanks to Adam Moody at LLNL for the report and patch
- Fix issue with on-demand connection establishment and blocking mode
- Thanks to Maksym Planeta at TU Dresden for the report
- Fix memory leaks in hardware multicast based broadcast code
- Fix memory leaks in TrueScale(PSM) channel
- Fix compilation warnings
- Fix issue with MPI_Get_count in QLogic PSM-CH3 channel with very large
messages (>2GB)
- Fix issues with shared memory collectives and checkpoint-restart
- Fix hang with checkpoint-restart
- Fix issue with unlinking shared memory files
- Fix memory leak with MPIT
- Fix minor typos and usage of inline and static keywords
- Thanks to Maksym Planeta at TU Dresden for the patch
- Continue with warning if user asks to enable XRC when the system does not
support XRC
- Fix for error with multi-vbuf design for GPU-based communication
- Fix bugs with hybrid UD/RC/XRC communications
- Fix for MPICH putfence/getfence for large messages
- Fix for error in collective tuning framework
- Fix validation failure with Alltoall with IN_PLACE option
- Thanks to Mahidhar Tatineni @SDSC for the report
- Fix bug with MPI_Reduce with IN_PLACE option
- Thanks to Markus Geimer for the report
- Fix for compilation failures with multicast disabled
- Thanks to Devesh Sharma @Emulex for the report
- Fix bug with MPI_Bcast
- Fix IPC selection for shared GPU mode systems
- Fix for build time warnings and memory leaks
- Fix issues with Dynamic Process Management
- Thanks to Neil Spruit for the report
- Fix bug in architecture detection code
- Thanks to Adam Moody @LLNL for the report
To download MVAPICH2 2.2 GA, the associated user guides, and the quick
start guide, or to access the SVN repository, please visit the following URL:
http://mvapich.cse.ohio-state.edu
All questions, feedback, bug reports, performance tuning hints, patches,
and enhancements are welcome. Please post them to the mvapich-discuss
mailing list (mvapich-discuss at cse.ohio-state.edu).
Thanks,
The MVAPICH Team