[mvapich-discuss] Announcing the release of MVAPICH2 2.3 GA and OMB 5.4.3
Subramoni, Hari
subramoni.1 at osu.edu
Mon Jul 23 17:44:43 EDT 2018
The MVAPICH team is pleased to announce the release of MVAPICH2 2.3 GA and
OSU Micro-Benchmarks (OMB) 5.4.3.
Features and enhancements for MVAPICH2 2.3 GA are as follows:
* Features and Enhancements (since 2.2 GA):
- Based on MPICH v3.2.1
- Enhanced small message performance for MPI_Alltoallv
- Improve performance for host-based transfers when CUDA is enabled
- Add architecture detection for IBM POWER9 CPUs
- Add point-to-point and collective tuning for IBM POWER9 CPUs
- Enhance architecture detection for Intel Skylake CPUs
- Enhance MPI initialization to gracefully handle RDMA_CM failures
- Improve algorithm selection of several collectives
- Enhance detection of the number and IP addresses of IB devices
- Enhanced performance for Allreduce, Reduce_scatter_block, Allgather,
Allgatherv through new algorithms
- Thanks to Danielle Sikich and Adam Moody @ LLNL for the patch
- Enhance support for MPI_T PVARs and CVARs
- Improved job startup time for OFA-IB-CH3, PSM-CH3, and PSM2-CH3
- Support to automatically detect IP address of IB/RoCE interfaces when
RDMA_CM is enabled without relying on mv2.conf file
- Enhance HCA detection to handle cases where node has both IB and RoCE HCAs
- Automatically detect and use the maximum MTU supported by the HCA
- Added logic to detect heterogeneous CPU/HFI configurations in PSM-CH3 and
PSM2-CH3 channels
- Thanks to Matias Cabral at Intel for the report
- Enhanced intra-node and inter-node tuning for PSM-CH3 and PSM2-CH3
channels
- Enhanced HFI selection logic for systems with multiple Omni-Path HFIs
- Enhanced tuning and architecture detection for OpenPOWER, Intel Skylake
and Cavium ARM (ThunderX) systems
- Added 'SPREAD', 'BUNCH', and 'SCATTER' binding options for hybrid CPU
binding policy
- Rename MV2_THREADS_BINDING_POLICY to MV2_HYBRID_BINDING_POLICY
- Added support for MV2_SHOW_CPU_BINDING to display number of OMP threads
- Enhance performance of point-to-point operations for CH3-Gen2 (InfiniBand),
CH3-PSM, and CH3-PSM2 (Omni-Path) channels
- Improve performance for MPI-3 RMA operations
- Introduce support for Cavium ARM (ThunderX) systems
- Improve support for process to core mapping on many-core systems
- New environment variable MV2_THREADS_BINDING_POLICY for
multi-threaded MPI and MPI+OpenMP applications
- Support 'linear' and 'compact' placement of threads
- Warn user if oversubscription of cores is detected
- Improve launch time for large-scale jobs with mpirun_rsh
- Add support for non-blocking Allreduce using Mellanox SHARP
- Efficient support for different Intel Knights Landing (KNL) models
- Improve performance for Intra- and Inter-node communication for OpenPOWER
architecture
- Improve support for large numbers of processes per node and hugepages on
SMP systems
- Enhance collective tuning for Intel Knights Landing and Intel Omni-Path
based systems
- Enhance collective tuning for Bebop at ANL, Bridges at PSC, and Stampede2 at TACC
systems
- Enhanced collective tuning for IBM POWER8, Intel Skylake, Intel KNL, Intel
Broadwell architectures
- Enhance large message intra-node performance with CH3-IB-Gen2 channel on
Intel Knights Landing
- Based on and ABI compatible with MPICH 3.2
- Support collective offload using Mellanox's SHArP for Allreduce
- Enhance tuning framework for Allreduce using SHArP
- Introduce capability to run MPI jobs across multiple InfiniBand subnets
- Introduce basic support for executing MPI jobs in Singularity
- Enhance collective tuning for Intel Knights Landing and Intel Omni-Path
- Enhance process mapping support for multi-threaded MPI applications
- Introduce MV2_CPU_BINDING_POLICY=hybrid
- Introduce MV2_THREADS_PER_PROCESS
- On-demand connection management for PSM-CH3 and PSM2-CH3 channels
- Enhance PSM-CH3 and PSM2-CH3 job startup to use non-blocking PMI calls
- Enhance debugging support for PSM-CH3 and PSM2-CH3 channels
- Improve performance of architecture detection
- Introduce run time parameter MV2_SHOW_HCA_BINDING to show process to HCA
bindings
- Enhance MV2_SHOW_CPU_BINDING to enable display of CPU bindings on all
nodes
- Deprecate OFA-IB-Nemesis channel
- Update to hwloc version 1.11.9
- Tested with CLANG v5.0.0
* Bug Fixes (since 2.2 GA):
- Fix issues in CH3-TCP/IP channel
- Fix build and runtime issues with CUDA support
- Fix error when XRC and RoCE were enabled at the same time
- Fix issue with XRC connection establishment
- Fix for failure at finalize seen on iWARP enabled devices
- Fix issue with MPI_IN_PLACE-based communication in MPI_Reduce and
MPI_Reduce_scatter
- Fix issue with allocating a large number of shared-memory-based MPI-3 RMA
windows
- Fix failure in mpirun_rsh with a large number of nodes
- Fix singleton initialization issue with SLURM/PMI2 and PSM/Omni-Path
- Thanks to Adam Moody @LLNL for the report
- Fix build failure when enabling GPFS support in ROMIO
- Thanks to Doug Johnson @OHTech for the report
- Fix issues with architecture detection in PSM-CH3 and PSM2-CH3 channels
- Fix failures with CMA read at very large message sizes
- Fix failures with MV2_SHOW_HCA_BINDING on single-node jobs
- Fix issue in autogen step with duplicate error messages
- Fix build issue with SLES 15 and Perl 5.26.1
- Thanks to Matias A Cabral @Intel for the report and patch
- Fix segfault when manually selecting collective algorithms
- Fix cleanup of preallocated RDMA_FP regions at RDMA_CM finalize
- Fix issue with RDMA_CM in multi-rail scenario
- Fix issues in nullpscw RMA test.
- Fix issue with reduce and allreduce algorithms for large message sizes
- Fix hang issue in hydra when no SLURM environment is present
- Thanks to Vaibhav Sundriyal for the report
- Fix issue to test Fortran KIND with FFLAGS
- Thanks to Rob Latham at mcs.anl.gov for the patch
- Fix issue in parsing environment variables
- Fix issue in displaying process to HCA binding
- Enhance CPU binding logic to handle vendor-specific core mappings
- Fix issue with bcast algorithm selection
- Fix issue with large message transfers using CMA
- Fix issue in Scatter and Gather with large messages
- Fix tuning tables for various collectives
- Fix issue with launching single-process MPI jobs
- Fix compilation error in the CH3-TCP/IP channel
- Thanks to Isaac Carroll at Lightfleet for the patch
- Fix issue with memory barrier instructions on ARM
- Thanks to Pavel (Pasha) Shamis at ARM for reporting the issue
- Fix issue with ring startup in multi-rail systems
- Fix startup issue with SLURM and PMI-1
- Thanks to Manuel Rodriguez for the report
- Fix startup issue caused by fix for bash 'shellshock' bug
- Fix issue with very large messages in PSM
- Fix issue with singleton jobs and PMI-2
- Thanks to Adam T. Moody at LLNL for the report
- Fix incorrect reporting of non-existing files with Lustre ADIO
- Thanks to Wei Kang at NWU for the report
- Fix hang in MPI_Probe
- Thanks to John Westlund at Intel for the report
- Fix issue while setting affinity with Torque Cgroups
- Thanks to Doug Johnson at OSC for the report
- Fix runtime errors observed when running MVAPICH2 on aarch64 platforms
- Thanks to Sreenidhi Bharathkar Ramesh at Broadcom for posting
the original patch
- Thanks to Michal Schmidt at RedHat for reposting it
- Fix failure in mv2_show_cpu_affinity with affinity disabled
- Thanks to Carlos Rosales-Fernandez at TACC for the report
- Fix mpirun_rsh error when running short-lived non-MPI jobs
- Thanks to Kevin Manalo at OSC for the report
- Fix comment and spelling mistake
- Thanks to Maksym Planeta for the report
- Ignore cpusets and cgroups that may have been set by resource manager
- Thanks to Adam T. Moody at LLNL for the report and the patch
- Fix reduce tuning table entry for 2ppn 2node
- Fix compilation issues due to inline keyword with GCC 5 and newer
- Fix compilation warnings and memory leaks
New features, enhancements and bug fixes for OSU Micro-Benchmarks
(OMB) 5.4.3 are listed here.
* Bug Fixes
- Fix buffer overflow in osu_reduce_scatter
- Thanks to Matias A Cabral @Intel for reporting the issue and patch
- Thanks to Gilles Gouaillardet for creating the patch
- Fix buffer overflow in one-sided tests
- Thanks to John Byrne @HPE for reporting this issue
- Fix buffer overflow in multi-threaded latency test
- Fix issues with freeing buffers for one-sided tests
- Fix issues with freeing buffers for CUDA-enabled tests
- Fix warning messages for benchmarks that do not support CUDA and/or
managed memory
- Thanks to Carl Ponder at NVIDIA for reporting this issue
- Fix compilation warnings
To download MVAPICH2 2.3 GA, OSU Micro-Benchmarks (OMB) 5.4.3, the associated
user guides and quick start guide, and to access the SVN, please visit the
following URL:
http://mvapich.cse.ohio-state.edu
All questions, feedback, bug reports, hints for performance tuning,
patches and enhancements are welcome. Please post them to the
mvapich-discuss mailing list (mvapich-discuss at cse.ohio-state.edu).
Thanks,
The MVAPICH Team