[Hidl-discuss] Announcing the release of High-Performance Deep Learning (HiDL) 2.0 package with MPI backend
Panda, Dhabaleswar
panda at cse.ohio-state.edu
Tue Jul 1 22:50:12 EDT 2025
The High-Performance Deep Learning (HiDL) team is pleased to announce
the 2.0 release of HiDL, a vendor-neutral, high-performance deep
learning stack based on the MVAPICH-Plus MPI backend. HiDL 2.0 uses
PyTorch 2.0 and later versions with the MVAPICH-Plus backend to
support large-scale Distributed Data Parallel (DDP) training workloads
and targets modern GPU clusters and high-performance interconnects.
This vendor-neutral approach does not require any vendor-supported
collective communication library (such as NCCL or RCCL) and delivers
competitive performance on the latest GPU clusters.
The PyTorch 2.0 stack, modified to use the latest MVAPICH-Plus, is
available as open source from the following location:
https://github.com/OSU-Nowlab/pytorch/tree/hidl-2.0
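As a rough illustration (not taken from the HiDL user guide) of how DDP
training is driven through the MPI backend, the minimal Python sketch
below assumes the HiDL PyTorch build linked against MVAPICH-Plus and an
MPI launcher such as mpirun; the MV2_COMM_WORLD_LOCAL_RANK variable used
for local-rank selection is an assumption and may differ on your system:

    # Minimal sketch: PyTorch DDP over the MPI backend (launch with,
    # e.g., "mpirun -np 4 python ddp_mpi_sketch.py").
    import os
    import torch
    import torch.distributed as dist
    from torch.nn.parallel import DistributedDataParallel as DDP

    # With the MPI backend, rank and world size come from the MPI runtime,
    # so no MASTER_ADDR/MASTER_PORT environment variables are needed.
    dist.init_process_group(backend="mpi")
    rank = dist.get_rank()

    # Local-rank discovery is launcher-specific; MV2_COMM_WORLD_LOCAL_RANK
    # is assumed here, with a single-node fallback.
    local_rank = int(os.environ.get("MV2_COMM_WORLD_LOCAL_RANK",
                                    rank % max(torch.cuda.device_count(), 1)))
    use_cuda = torch.cuda.is_available()
    device = torch.device(f"cuda:{local_rank}" if use_cuda else "cpu")

    model = torch.nn.Linear(1024, 1024).to(device)
    ddp_model = DDP(model, device_ids=[local_rank] if use_cuda else None)

    optimizer = torch.optim.SGD(ddp_model.parameters(), lr=0.01)
    inputs = torch.randn(32, 1024, device=device)
    loss = ddp_model(inputs).sum()
    loss.backward()   # gradient buckets are Allreduced via the MPI backend
    optimizer.step()
    dist.destroy_process_group()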
* HiDL 2.0: PyTorch 2.0 with MVAPICH-Plus Features
- Support for PyTorch 2.0 and later versions
- Full support for PyTorch Native Distributed Data Parallel (DDP) training
- Optimized support for MPI communication backend in model training workloads
- Efficient large-message collectives (e.g., Allreduce) on various CPUs and GPUs
- GPU-Direct Ring and Two-level multi-leader algorithms for Allreduce
  operations (a short Allreduce sketch appears after this feature list)
- Support for fork safety in distributed training environments
- Exploits efficient large message collectives in MVAPICH-Plus 4.0 and later
- Open-source PyTorch version with advanced MPI backend support
- Available in our PyTorch tag (https://github.com/OSU-Nowlab/pytorch/tree/hidl-2.0)
- Vendor-neutral stack with performance and throughput competitive with
  GPU-based collective libraries (e.g., NCCL, RCCL)
- Battle-tested on modern HPC clusters (OLCF Frontier, TACC Vista, etc.) with
  current-generation NVIDIA and AMD GPUs
- Compatible with
- InfiniBand Networks: Mellanox InfiniBand adapters (EDR, FDR, HDR, NDR)
- Slingshot Networks: HPE Slingshot
- GPU & CPU Support:
- NVIDIA A100, H100, and GH200 GPUs
- AMD MI200-series GPUs
- Software Stack:
- CUDA [12.x] and latest cuDNN
- ROCm [6.x]
- (NEW) PyTorch [2.x]
- (NEW) Python [3.x]
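To give a concrete (hypothetical) feel for the large-message collective
path mentioned in the feature list, the short sketch below issues an
Allreduce on a GPU-resident tensor through the same MPI backend; GPU
buffers in MPI collectives rely on the GPU-aware support in
MVAPICH-Plus, and the buffer size is arbitrary:

    # Minimal sketch: a large-message Allreduce on a GPU tensor via the
    # MPI backend (same launch method and assumptions as the DDP sketch).
    import torch
    import torch.distributed as dist

    dist.init_process_group(backend="mpi")
    rank = dist.get_rank()
    use_cuda = torch.cuda.is_available()
    device = torch.device(f"cuda:{rank % max(torch.cuda.device_count(), 1)}"
                          if use_cuda else "cpu")

    # 256 MB of float32 (64M elements) to exercise the large-message path.
    buf = torch.ones(64 * 1024 * 1024, device=device)
    dist.all_reduce(buf, op=dist.ReduceOp.SUM)
    if use_cuda:
        torch.cuda.synchronize()
    if rank == 0:
        # Each element should now equal the number of ranks.
        print("allreduce result:", buf[0].item())
    dist.destroy_process_group()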
For instructions on setting up the HiDL stack and for the associated
user guide, please visit the following URL:
http://hidl.cse.ohio-state.edu
Sample performance numbers for DDP training using the HiDL 2.0 stack
are available from:
http://hidl.cse.ohio-state.edu/performance/pytorch-ddp-gpu/
All questions, feedback, and bug reports are welcome. Please post to
hidl-discuss at lists.osu.edu.
Thanks,
The High-Performance Deep Learning (HiDL) Team
http://hidl.cse.ohio-state.edu