[mvapich-discuss] Announcing the release of MVAPICH2 2.2 GA
Panda, Dhabaleswar
panda at cse.ohio-state.edu
Fri Sep 9 19:15:02 EDT 2016
The MVAPICH team is pleased to announce the release of MVAPICH2 2.2 GA.
* Features and Enhancements (since 2.1 GA):
- Based on MPICH 3.1.4
- Support for OpenPower architecture
- Optimized inter-node and intra-node communication
- Support for Intel Knights Landing architecture
- Optimized point-to-point and collective performance
- Support for Intel Omni-Path architecture through a new PSM2 channel
- Thanks to Intel for contributing the patch
- Support for RoCEv2
- Enhanced performance for MPI_Comm_split through new bitonic algorithm
- Thanks to Adam T. Moody at LLNL for the patch
- Enable support for multiple MPI initializations
- Enhanced performance for small messages
- Enhanced startup performance and reduced memory footprint for storing
InfiniBand end-point information with SLURM
- Support for PMIX_Iallgather and PMIX_Ifence
- Support for shared memory based PMI operations
- An updated patch with this support for SLURM installations is
available from the MVAPICH project website
- Support for backing on-demand UD CM information with shared memory
to minimize memory footprint
- Improved startup performance for QLogic PSM-CH3 channel
- Thanks to Maksym Planeta at TU Dresden for the patch
- Enable efficient process affinity for asynchronous progress thread
- Unify process affinity support in Gen2, PSM and PSM2 channels
- Enable affinity by default for TrueScale(PSM) and Omni-Path(PSM2)
channels
- Allow processes to request MPI_THREAD_MULTIPLE when socket or NUMA node
level affinity is specified
- Enable graceful fallback to Shared Memory if LiMIC2 or CMA transfer fails
- Reorganized HCA-aware process mapping
- Update to hwloc version 1.11.2
- Add support for HCAs that return result of atomics in big endian notation
- Establish loopback connections by default if HCA supports atomics
- Dynamic identification of maximum read/atomic operations supported by HCA
- Automatic detection and tuning for InfiniBand EDR HCAs
- Enable support for intra-node communication in RoCE mode without
shared memory
- Enhanced support for MPIT based performance variables
- Architecture detection for PSC Bridges system with Omni-Path
- Automatic detection and tuning for 24-core Intel Haswell architecture
- Automatic detection and tuning for 28-core Intel Broadwell processors
- Enhanced tuning for shared-memory based MPI_Bcast
- Optimized point-to-point and collective tuning for the Chameleon
InfiniBand systems at TACC/UoC
- Collective tuning for Bridges at PSC, Stampede at TACC and other
architectures
- Collective tuning for Opal at LLNL, Bridges at PSC, Stampede at TACC,
and Stampede-1.5 at TACC
- Warn user to reconfigure library if rank type is not large enough to
represent all ranks in job
- Add ability to avoid using --enable-new-dtags with ld
- Thanks to Adam T. Moody at LLNL for the suggestion
- Add LIBTVMPICH specific CFLAGS and LDFLAGS
- Thanks to Adam T. Moody at LLNL for the suggestion
- Enable PSM builds when both PSM and PSM2 libraries are present
- Thanks to Adam T. Moody at LLNL for the report and patch
- Remove verbs dependency when building the PSM and PSM2 channels
- Thanks to Jeff Hammond at Intel for the report
- Updated to sm_20 kernel optimizations for MPI Datatypes
- Enhanced debugging support and error messages
* Bug Fixes (since MVAPICH2 2.1 GA):
- Fix minor error in use of communicator object in collectives
- Fix missing u_int64_t declaration with PGI compilers
- Thanks to Adam T. Moody at LLNL for the report and patch
- Fix memory leak in RMA rendezvous code path
- Thanks to Min Si at ANL for the report and patch
- Disable optimization that removes use of calloc in ptmalloc hook
detection code
- Thanks to Karl W. Schulz at Intel
- Fix weak alias typos (allows successful compilation with CLANG compiler)
- Thanks to Min Dong at Old Dominion University for the patch
- Fix issues in PSM large message gather operations
- Thanks to Adam T. Moody at LLNL for the report
- Enhance error checking in collective tuning code
- Thanks to Jan Bierbaum at Technical University of Dresden for the patch
- Fix issues with UD based communication in RoCE mode
- Fix issues with PMI2 support in singleton mode
- Fix default binding bug in hydra launcher
- Fix issues with Checkpoint Restart when launched with mpirun_rsh
- Fix Fortran binding issues with Intel 2016 compilers
- Fix issues with socket/NUMA node level binding
- Disable atomics when using Connect-IB with RDMA_CM
- Fix hang in MPI_Finalize when using hybrid channel
- Fix memory leaks
- Fix issue in some of the internal algorithms used for MPI_Bcast,
MPI_Alltoall and MPI_Reduce
- Fix hang in one of the internal algorithms used for MPI_Scatter
- Thanks to Ivan Raikov at Stanford for reporting this issue
- Fix issue with rdma_connect operation
- Fix issue with Dynamic Process Management feature
- Fix issue with de-allocating InfiniBand resources in blocking mode
- Fix build errors caused by improper compile-time guards
- Thanks to Adam Moody at LLNL for the report
- Fix finalize hang when running in hybrid or UD-only mode
- Thanks to Jerome Vienne at TACC for reporting this issue
- Fix issue in MPI_Win_flush operation
- Thanks to Nenad Vukicevic for reporting this issue
- Fix out of memory issues with non-blocking collectives code
- Thanks to Phanisri Pradeep Pratapa and Fang Liu at GaTech for
reporting this issue
- Fix fall-through bug in external32 pack
- Thanks to Adam Moody at LLNL for the report and patch
- Fix issue with on-demand connection establishment and blocking mode
- Thanks to Maksym Planeta at TU Dresden for the report
- Fix memory leaks in hardware multicast based broadcast code
- Fix memory leaks in TrueScale(PSM) channel
- Fix compilation warnings
- Fix issue with MPI_Get_count in QLogic PSM-CH3 channel with very large
messages (>2GB)
- Fix issues with shared memory collectives and checkpoint-restart
- Fix hang with checkpoint-restart
- Fix issue with unlinking shared memory files
- Fix memory leak with MPIT
- Fix minor typos and usage of inline and static keywords
- Thanks to Maksym Planeta at TU Dresden for the patch
- Continue with warning if user asks to enable XRC when the system does not
support XRC
- Fix for error with multi-vbuf design for GPU-based communication
- Fix bugs with hybrid UD/RC/XRC communications
- Fix for MPICH putfence/getfence for large messages
- Fix for error in collective tuning framework
- Fix validation failure with Alltoall with IN_PLACE option
- Thanks to Mahidhar Tatineni @SDSC for the report
- Fix bug with MPI_Reduce with IN_PLACE option
- Thanks to Markus Geimer for the report
- Fix for compilation failures with multicast disabled
- Thanks to Devesh Sharma @Emulex for the report
- Fix bug with MPI_Bcast
- Fix IPC selection for shared GPU mode systems
- Fix for build time warnings and memory leaks
- Fix issues with Dynamic Process Management
- Thanks to Neil Spruit for the report
- Fix bug in architecture detection code
- Thanks to Adam Moody @LLNL for the report
To download MVAPICH2 2.2 GA, the associated user guides, and the quick
start guide, or to access the SVN repository, please visit the following URL:
http://mvapich.cse.ohio-state.edu
All questions, feedback, bug reports, performance tuning hints, patches,
and enhancements are welcome. Please post them to the mvapich-discuss
mailing list (mvapich-discuss at cse.ohio-state.edu).
Thanks,
The MVAPICH Team