From panda at cse.ohio-state.edu Sun Dec 24 11:28:36 2023
From: panda at cse.ohio-state.edu (Panda, Dhabaleswar)
Date: Sun, 24 Dec 2023 16:28:36 +0000
Subject: [Hidl-discuss] Announcing the release of MCR-DL v0.1

The High-Performance Deep Learning (HiDL) team is pleased to announce the release of MCR-DL 0.1. MCR-DL is an interface between PyTorch and communication backends such as MPI and NCCL. It enables high-performance communication and simple communication extensibility, and it includes a suite of PyTorch communication benchmarks. By letting users mix and match communication backends, it helps them achieve the best performance and scalability for distributed DL training.

The new features available with this release of the MCR-DL package are as follows:

* MCR-DL 0.1:
    - (NEW) Support for several communication backends, enabling MPI communication without PyTorch source builds
        o (NEW) PyTorch distributed module
        o (NEW) Pure MPI
        o (NEW) Pure NCCL
    - (NEW) PyTorch communication benchmarking suite
    - (NEW) Testing suite
    - (NEW) Tested with
        . NVIDIA GPU A100 and H100
        . CUDA [11.7, 12.1]
        . Python >= 3.8
        . PyTorch [1.13.1, 2.0.1]
        . MVAPICH2-GDR = 2.3.7
        . MVAPICH-Plus = 3.0b

The MCR-DL package is open-source and hosted at the following URL:

https://github.com/OSU-Nowlab/MCR-DL

For associated release information, please visit the following URL:

http://hidl.cse.ohio-state.edu

All questions, feedback, and bug reports are welcome. Please post to hidl-discuss at lists.osu.edu.

Thanks,

The High-Performance Deep Learning (HiDL) Team
http://hidl.cse.ohio-state.edu

PS: The number of organizations using the HiDL stacks has crossed 88 (from 21 countries). The HiDL team would like to thank all its users and organizations!!