[Hidl-discuss] Announcing the release of MCR-DL v0.1

Panda, Dhabaleswar panda at cse.ohio-state.edu
Sun Dec 24 11:28:36 EST 2023


The High-Performance Deep Learning (HiDL) team is pleased to announce
the release of MCR-DL 0.1. MCR-DL is an interface between PyTorch and
communication backends such as MPI and NCCL. It enables
high-performance communication and simple communication extensibility,
and it includes a suite of PyTorch communication benchmarks. It helps
users achieve the best performance and scalability for distributed DL
training by mixing and matching communication backends.
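
To illustrate what mixing and matching backends means, here is a
minimal pure-Python sketch that routes each collective operation to a
separately chosen backend. It is not MCR-DL code and does not use the
MCR-DL API: BackendRouter, sim_all_reduce, and sim_all_to_all are
hypothetical names, and the "backends" are simulated in-process on
plain lists, one list per rank.

```python
from typing import Callable, Dict, List, Tuple

Ranks = List[List[int]]  # one list of values per simulated rank


def sim_all_reduce(data: Ranks) -> Ranks:
    """Every rank ends up with the element-wise sum across all ranks."""
    total = [sum(column) for column in zip(*data)]
    return [list(total) for _ in data]


def sim_all_to_all(data: Ranks) -> Ranks:
    """Rank i sends its j-th element to rank j (a transpose)."""
    return [list(column) for column in zip(*data)]


class BackendRouter:
    """Hypothetical dispatcher: one backend choice per operation."""

    def __init__(self) -> None:
        self._impls: Dict[Tuple[str, str], Callable[[Ranks], Ranks]] = {}
        self._route: Dict[str, str] = {}

    def register(self, backend: str, op: str,
                 fn: Callable[[Ranks], Ranks]) -> None:
        self._impls[(backend, op)] = fn

    def use(self, op: str, backend: str) -> None:
        if (backend, op) not in self._impls:
            raise KeyError(f"{backend!r} has no implementation of {op!r}")
        self._route[op] = backend

    def run(self, op: str, data: Ranks) -> Tuple[str, Ranks]:
        backend = self._route[op]
        return backend, self._impls[(backend, op)](data)


router = BackendRouter()
# Both simulated backends share the same in-process implementations.
for name in ("mpi", "nccl"):
    router.register(name, "all_reduce", sim_all_reduce)
    router.register(name, "all_to_all", sim_all_to_all)

# Mix and match: a different backend for each operation.
router.use("all_reduce", "nccl")
router.use("all_to_all", "mpi")
```

Calling router.run("all_reduce", data) returns the name of the backend
that handled the call together with the result, so the routing choice
stays visible at each call site.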

The new features available with this release of the MCR-DL package are
as follows:

* MCR-DL 0.1:
      • (NEW) Support for several communication backends, enabling
         MPI communication without PyTorch source builds
          o (NEW) PyTorch distributed module
          o (NEW) Pure MPI
          o (NEW) Pure NCCL
      • (NEW) PyTorch communication benchmarking suite
      • (NEW) Testing suite
      • (NEW) Tested with
          . NVIDIA GPU A100 and H100
          . CUDA [11.7, 12.1]
          . Python >= 3.8
          . PyTorch [1.13.1 , 2.0.1]
          . MVAPICH2-GDR = 2.3.7
          . MVAPICH-Plus = 3.0b
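
As a convenience, a local environment can be compared against the
tested software versions above with a short script. The sketch below
is not part of MCR-DL. The helper names (parse_version, in_range,
check_tested) are hypothetical, and it assumes that the bracketed
pairs for CUDA and PyTorch are inclusive ranges, which the list does
not state explicitly.

```python
import sys
from typing import Dict, Sequence, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '2.0.1+cu117' into (2, 0, 1); stop at the first non-numeric part."""
    parts = []
    for piece in text.split("+")[0].split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def in_range(version: str, low: str, high: str) -> bool:
    """Inclusive range check on parsed version tuples."""
    return parse_version(low) <= parse_version(version) <= parse_version(high)


def check_tested(torch_version: str, cuda_version: str,
                 python: Sequence[int] = sys.version_info[:2]) -> Dict[str, bool]:
    """Report which components fall inside the tested versions (assumed ranges)."""
    return {
        "python": tuple(python) >= (3, 8),
        "pytorch": in_range(torch_version, "1.13.1", "2.0.1"),
        "cuda": in_range(cuda_version, "11.7", "12.1"),
    }
```

With PyTorch installed, the two strings can be taken from
torch.__version__ and torch.version.cuda; the latter is None on
CPU-only builds and would need to be handled before calling
check_tested.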

The MCR-DL package is open-source, and hosted at the following URL:

https://github.com/OSU-Nowlab/MCR-DL

For associated release information, please visit the following URL:

http://hidl.cse.ohio-state.edu

All questions, feedback, and bug reports are welcome. Please post to
hidl-discuss at lists.osu.edu.

Thanks,

The High-Performance Deep Learning (HiDL) Team
http://hidl.cse.ohio-state.edu

PS: The number of organizations using the HiDL stacks has passed 88
(in 21 countries). The HiDL team would like to thank all its users
and organizations!

