[Mvapich] Announcing the release of MVAPICH-Plus 3.0 GA
Panda, Dhabaleswar
panda at cse.ohio-state.edu
Sat Mar 9 14:54:14 EST 2024
The MVAPICH team is pleased to announce the release of MVAPICH-Plus
3.0 GA.
The new MVAPICH-Plus series is an advanced version of the MVAPICH MPI
library. It is targeted to provide unified support for the MVAPICH2-GDR
and MVAPICH2-X features, as well as optimized support for modern platforms
(CPUs, GPUs, and interconnects) for HPC, Deep Learning, Machine Learning,
Big Data, and Data Science applications.
The major features and enhancements available in MVAPICH-Plus 3.0 GA are
as follows:
- Based on MVAPICH 3.0
- Support for various high-performance communication fabrics
- InfiniBand, Slingshot-10/11, Omni-Path, OPX, RoCE, and Ethernet
- Support for a naive CPU staging approach for small-message collectives
- Tuned naive staging limits for the following systems
- Frontier at OLCF, Pitzer at OSC, Owens at OSC, Ascend at OSC, Frontera at TACC,
Lonestar6 at TACC, ThetaGPU at ALCF, Polaris at ALCF, Tioga at LLNL
- Initial support for blocking collectives on NVIDIA and AMD GPUs (see the
first sketch after this feature list)
- Allgather, Allgatherv, Allreduce, Alltoall, Alltoallv, Bcast, Gather,
Gatherv, Reduce, Reduce_scatter, Scatter, Scatterv, Reduce_local,
Reduce_scatter_block
- Initial support for non-blocking GPU collectives on NVIDIA and AMD GPUs
- Iallgather, Iallgatherv, Iallreduce, Ialltoall, Ialltoallv, Ibcast,
Igather, Igatherv, Ireduce, Ireduce_scatter, Iscatter, Iscatterv
- Enhanced collective and point-to-point tuning for NVIDIA Grace Hopper systems
- Enhanced collective tuning for NVIDIA V100, A100, H100 GPUs
- Enhanced collective tuning for AMD MI100 and MI250X GPUs
- Enhanced support for blocking and non-blocking GPU-to-GPU point-to-point
operations on NVIDIA and AMD GPUs (see the second sketch after this
feature list), taking advantage of:
- NVIDIA GDRCopy and AMD LargeBar support
- CUDA and ROCm IPC support
- Enhanced CPU tuning on various HPC systems and architectures
- Stampede3 at TACC, Frontier at OLCF, Lonestar6 at TACC
- AMD Rome, AMD Milan, Intel Sapphire Rapids
- Tested with
- Various HPC applications, mini-applications, and benchmarks
- HiDL, MPI4DL, and MCR-DL packages for MPI-driven distributed training
- MPI4cuML (a custom cuML package with MPI support)
for scalable machine learning
- Tested with CUDA <= 12.3
- Tested with ROCm <= 5.6.0
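As an illustration of the GPU collective support listed above, the following
minimal sketch passes CUDA device buffers directly to a blocking and a
non-blocking Allreduce. It assumes a CUDA-aware MVAPICH-Plus build with one
GPU per MPI rank; the buffer size, datatype, and reduction operation are
illustrative only, and the same pattern applies to the other collectives
listed above.

#include <mpi.h>
#include <cuda_runtime.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    const int n = 1 << 20;                      /* 1M doubles per rank */
    double *sendbuf, *recvbuf;
    cudaMalloc((void **)&sendbuf, n * sizeof(double));
    cudaMalloc((void **)&recvbuf, n * sizeof(double));
    cudaMemset(sendbuf, 0, n * sizeof(double)); /* real payload elided */

    /* Blocking collective: device pointers are passed directly to MPI. */
    MPI_Allreduce(sendbuf, recvbuf, n, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

    /* Non-blocking variant: overlap the reduction with independent work. */
    MPI_Request req;
    MPI_Iallreduce(sendbuf, recvbuf, n, MPI_DOUBLE, MPI_SUM,
                   MPI_COMM_WORLD, &req);
    /* ... independent host/device computation could go here ... */
    MPI_Wait(&req, MPI_STATUS_IGNORE);

    cudaFree(sendbuf);
    cudaFree(recvbuf);
    MPI_Finalize();
    return 0;
}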
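A second minimal sketch shows a GPU-to-GPU point-to-point transfer, again
assuming a CUDA-aware build: rank 0 sends a device buffer to rank 1, and the
message size and tag are arbitrary.

#include <mpi.h>
#include <cuda_runtime.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const int n = 4096;                         /* illustrative message size */
    float *buf;
    cudaMalloc((void **)&buf, n * sizeof(float));

    if (size >= 2) {
        if (rank == 0) {
            cudaMemset(buf, 0, n * sizeof(float));  /* real payload elided */
            /* The device pointer is handed straight to MPI_Send; the library
             * selects the transfer path (e.g. GDRCopy/LargeBar or IPC). */
            MPI_Send(buf, n, MPI_FLOAT, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            MPI_Recv(buf, n, MPI_FLOAT, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        }
    }

    cudaFree(buf);
    MPI_Finalize();
    return 0;
}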
To download the MVAPICH-Plus 3.0 GA library and the associated user guide,
please visit the following URL:
http://mvapich.cse.ohio-state.edu
All questions, feedback, bug reports, hints for performance tuning,
patches, and enhancements are welcome. Please post them to the
mvapich-discuss mailing list (mvapich-discuss at lists.osu.edu).
Thanks,
The MVAPICH Team
PS: We are also happy to report that the number of organizations using
MVAPICH libraries (and registered at the MVAPICH site) has crossed
3,375 worldwide (in 91 countries). The number of downloads from the
MVAPICH site has crossed 1,765,000 (1.765 million). The MVAPICH team
would like to thank all its users and organizations!!