From: panda at cse.ohio-state.edu (Panda, Dhabaleswar)
Date: Wed, 2 Jul 2025 02:50:12 +0000
Subject: [Hidl-announce] Announcing the release of the High-Performance Deep Learning (HiDL) 2.0 package with MPI backend

The High-Performance Deep Learning (HiDL) team is pleased to announce the 2.0 release of HiDL, a vendor-neutral, high-performance deep learning stack based on the MVAPICH-Plus MPI backend.

HiDL 2.0 uses PyTorch 2.0 and later versions with the MVAPICH-Plus backend to support large-scale Distributed Data Parallel (DDP) training workloads, targeting modern GPU clusters and high-performance interconnects. This vendor-neutral approach does not require any vendor-supplied collective communication library (such as NCCL or RCCL) and delivers competitive performance on the latest GPU clusters. (A minimal usage sketch appears at the end of this announcement.)

The modified PyTorch 2.0 stack that uses the latest MVAPICH-Plus is available as open source from the following location:

https://github.com/OSU-Nowlab/pytorch/tree/hidl-2.0

* HiDL 2.0: PyTorch 2.0 with MVAPICH-Plus Features

- Support for PyTorch 2.0 and later versions
- Full support for PyTorch native Distributed Data Parallel (DDP) training
- Optimized support for the MPI communication backend in model training workloads
- Efficient large-message collectives (e.g., Allreduce) on various CPUs and GPUs
- GPU-Direct Ring and two-level multi-leader algorithms for Allreduce operations
- Support for fork safety in distributed training environments
- Exploits efficient large-message collectives in MVAPICH-Plus 4.0 and later
- Open-source PyTorch version with advanced MPI backend support
  - Available in our PyTorch tag (https://github.com/OSU-Nowlab/pytorch/tree/hidl-2.0)
- Vendor-neutral stack with performance and throughput competitive with GPU-based collective libraries (e.g., NCCL, RCCL)
- Battle-tested on modern HPC clusters (OLCF Frontier, TACC Vista, etc.) with current-generation NVIDIA and AMD GPUs
- Compatible with:
  - InfiniBand networks: Mellanox InfiniBand adapters (EDR, FDR, HDR, NDR)
  - Slingshot networks: HPE Slingshot
  - GPU and CPU support:
    - NVIDIA A100, H100, and GH200 GPUs
    - AMD MI200-series GPUs
  - Software stack:
    - CUDA 12.x and the latest cuDNN
    - ROCm 6.x
    - (NEW) PyTorch 2.x
    - (NEW) Python 3.x

For setting up the HiDL stack and the associated user guide, please visit the following URL:

http://hidl.cse.ohio-state.edu

Sample performance numbers for DDP training using the HiDL 2.0 stack are available from:

http://hidl.cse.ohio-state.edu/performance/pytorch-ddp-gpu/

All questions, feedback, and bug reports are welcome. Please post to hidl-discuss at lists.osu.edu.

Thanks,

The High-Performance Deep Learning (HiDL) Team
http://hidl.cse.ohio-state.edu
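
----

Appendix: For reference, below is a minimal sketch of what a DDP training script driven by the MPI backend might look like. The model, dimensions, and rank-to-GPU mapping are hypothetical illustrations, and the sketch assumes a PyTorch build with MPI support (such as the hidl-2.0 tree above) linked against a CUDA-aware MPI like MVAPICH-Plus; consult the user guide for the supported configuration.

    import torch
    import torch.distributed as dist
    import torch.nn as nn
    from torch.nn.parallel import DistributedDataParallel as DDP

    # With the MPI backend, rank and world size come from the MPI launcher,
    # not from environment variables or explicit arguments.
    dist.init_process_group(backend="mpi")
    rank = dist.get_rank()

    # Hypothetical mapping of one rank per local GPU.
    device = torch.device("cuda", rank % torch.cuda.device_count())
    torch.cuda.set_device(device)

    # Toy model for illustration only.
    model = nn.Linear(1024, 1024).to(device)
    ddp_model = DDP(model)  # gradient Allreduce is routed through the MPI backend

    optimizer = torch.optim.SGD(ddp_model.parameters(), lr=0.01)
    loss_fn = nn.MSELoss()

    for step in range(10):
        inputs = torch.randn(32, 1024, device=device)   # stand-in for real data
        targets = torch.randn(32, 1024, device=device)
        optimizer.zero_grad()
        loss = loss_fn(ddp_model(inputs), targets)
        loss.backward()  # triggers bucketed Allreduce over MPI on GPU buffers
        optimizer.step()

    dist.destroy_process_group()

Such a script would typically be launched with an MPI launcher, e.g. "mpirun -np 8 python train.py" (the script name here is a placeholder); the exact launcher and flags depend on your MVAPICH-Plus installation and are covered in the user guide above.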