[Hidl-discuss] Announcing the release of MPI4DL 0.6

Panda, Dhabaleswar panda at cse.ohio-state.edu
Thu Nov 9 14:10:10 EST 2023


The High-Performance Deep Learning (HiDL) team is pleased to announce
the release of MPI4DL 0.6, which is a distributed and accelerated training
framework for very high-resolution images that integrates Spatial Parallelism,
Layer Parallelism, Bidirectional Parallelism, and Pipeline Parallelism with
support for the MVAPICH2-GDR high-performance CUDA-aware
communication backend.

This library allows MPI-driven converged software infrastructure to extract
maximum performance and scalability for AI, Big Data and Data Science
applications and workflows on modern heterogeneous clusters consisting
of diverse CPUs, GPUs, and Interconnects (InfiniBand, RoCE, Omni-Path, iWARP,
and Slingshot).

The new features available with this release of the MPI4DL package are as follows:

* MPI4DL 0.6:

  *   Based on PyTorch
  *   (NEW) Support for training very high-resolution images
     *   Distributed training support for:
        *   Layer Parallelism (LP)
        *   Pipeline Parallelism (PP)
        *   Spatial Parallelism (SP)
        *   Spatial and Layer Parallelism (SP+LP)
        *   Spatial and Pipeline Parallelism (SP+PP)
        *   (NEW) Bidirectional and Layer Parallelism (GEMS+LP)
        *   (NEW) Bidirectional and Pipeline Parallelism (GEMS+PP)
        *   (NEW) Spatial, Bidirectional and Layer Parallelism (SP+GEMS+LP)
        *   (NEW) Spatial, Bidirectional and Pipeline Parallelism (SP+GEMS+PP)
     *   (NEW) Support for AmoebaNet and ResNet models
     *   (NEW) Support for different image sizes and custom datasets
  *   Exploits collective features of MVAPICH2-GDR
  *   Compatible with
     *   NVIDIA GPU A100 and V100
     *   CUDA [11.6, 11.7]
     *   Python >= 3.8
     *   PyTorch [1.12.1, 1.13.1]
     *   MVAPICH2-GDR = 2.3.7
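
As a rough illustration, the Python and library versions in the compatibility
list above could be checked with a small script like the one below. This is
only a sketch and is not part of MPI4DL: the function name check_environment
is hypothetical, and it assumes the bracketed entries are the exact supported
releases rather than ranges. It does not check the GPU model or the
MVAPICH2-GDR version.

```python
import sys

# Versions copied from the compatibility list above. Treating the bracketed
# entries as exact supported releases is an assumption of this sketch.
SUPPORTED_CUDA = {"11.6", "11.7"}
SUPPORTED_TORCH = {"1.12.1", "1.13.1"}
MIN_PYTHON = (3, 8)


def check_environment(python_version, torch_version, cuda_version):
    """Return a list of human-readable problems; an empty list means OK.

    python_version: tuple such as (3, 9)
    torch_version:  string such as "1.13.1+cu117" (the build suffix is ignored)
    cuda_version:   string such as "11.7", or None for a CPU-only build
    """
    problems = []
    if tuple(python_version[:2]) < MIN_PYTHON:
        problems.append("Python >= 3.8 is required")
    base_torch = torch_version.split("+")[0]
    if base_torch not in SUPPORTED_TORCH:
        problems.append(f"PyTorch {base_torch} is not in {sorted(SUPPORTED_TORCH)}")
    if cuda_version not in SUPPORTED_CUDA:
        problems.append(f"CUDA {cuda_version} is not in {sorted(SUPPORTED_CUDA)}")
    return problems


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed")
    else:
        issues = check_environment(
            sys.version_info, torch.__version__, torch.version.cuda
        )
        print("\n".join(issues) if issues else "Environment matches the listed versions")
```

Run directly, the script reads the versions from the local PyTorch install;
check_environment can also be called with explicit values, for example when
validating a planned environment before building it.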

The MPI4DL package is open source and is hosted at the following URL:

https://github.com/OSU-Nowlab/MPI4DL

For associated release information, please visit the following URL:

http://hidl.cse.ohio-state.edu

Sample performance numbers for MPI4DL using deep learning
application benchmarks can be viewed by visiting the `Performance' tab
of the above website.

All questions, feedback, and bug reports are welcome. Please post to
hidl-discuss at lists.osu.edu.

Thanks,

The High-Performance Deep Learning (HiDL) Team
http://hidl.cse.ohio-state.edu

PS: The number of organizations using the HiDL stacks has crossed 88
(from 21 countries).  The HiDL team would like to thank all its users
and organizations!!
