[Hidl-discuss] Announcing the release of MPI4DL 0.5

Panda, Dhabaleswar panda at cse.ohio-state.edu
Wed Sep 6 09:06:35 EDT 2023


The High-Performance Deep Learning (HiDL) team is pleased to announce
the first release of MPI4DL, version 0.5: a distributed, accelerated
training framework for very high-resolution images that integrates
Spatial Parallelism, Layer Parallelism, and Pipeline Parallelism, with
support for the high-performance MVAPICH2 CUDA-aware MPI library as
the backend.
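Spatial parallelism partitions each very high-resolution image across
workers so that no single GPU has to hold the full activation maps. A
minimal single-process sketch of that spatial split, using NumPy only
(the function name and 2x2 grid are illustrative, not MPI4DL's actual
API):

```python
import numpy as np

def spatial_partition(image, grid=(2, 2)):
    """Split an H x W x C image into grid[0] * grid[1] tiles.

    In real spatial parallelism each tile would be processed by a
    different worker (with halo exchange at tile borders); here we
    only illustrate the partitioning itself.
    """
    rows = np.array_split(image, grid[0], axis=0)
    return [tile for row in rows
            for tile in np.array_split(row, grid[1], axis=1)]

# a toy 1024 x 1024 RGB "image" split into four 512 x 512 tiles
tiles = spatial_partition(np.zeros((1024, 1024, 3)))
```

Each worker then runs the early convolutional layers on its own tile,
which is what makes otherwise out-of-memory image sizes trainable.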

MPI4DL for Deep Learning, together with the other associated releases
(MPI4cuML for Machine Learning, MPI4Spark for Spark, and MPI4Dask for
Dask), allows an MPI-driven converged software infrastructure to
extract maximum performance and scalability for HPC, AI, Big Data, and
Data Science applications and workflows on modern heterogeneous
clusters consisting of diverse CPUs, GPUs, and interconnects
(InfiniBand, RoCE, Omni-Path, iWARP, and Slingshot).

The first release of the MPI4DL package is equipped with the following
features:

* MPI4DL 0.5:

    - Based on PyTorch
    - Support for training very high-resolution images
        - Distributed training support for:
            - Layer Parallelism (LP)
            - Pipeline Parallelism (PP)
            - Spatial Parallelism (SP)
            - Spatial and Layer Parallelism (SP+LP)
            - Spatial and Pipeline Parallelism (SP+PP)
        - Support for AmoebaNet and ResNet models
        - Support for different image sizes and custom datasets
    - Exploits collective features of the MVAPICH2-GDR MPI library
    - Compatible with
        - NVIDIA GPU A100 and V100
        - CUDA [11.6, 11.7]
        - Python >= 3.8
        - PyTorch [1.12.1, 1.13.1]
        - MVAPICH2-GDR = 2.3.7
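
Layer parallelism assigns consecutive groups of layers to different
workers, and pipeline parallelism keeps those workers busy by
streaming micro-batches through the stages. A minimal single-process
sketch of the stage/micro-batch structure (function and variable names
are mine for illustration, not MPI4DL's API):

```python
import numpy as np

def pipeline_forward(stages, batch, num_micro_batches=4):
    """Push micro-batches through a sequence of pipeline stages.

    Each stage stands in for a group of layers that would live on a
    separate worker; streaming micro-batches is what lets real
    pipeline parallelism overlap the stages' work across workers.
    """
    outputs = []
    for micro_batch in np.array_split(batch, num_micro_batches):
        activations = micro_batch
        for stage in stages:  # in a real run: a send/recv between ranks
            activations = stage(activations)
        outputs.append(activations)
    return np.concatenate(outputs)

# two toy "stages": scale the activations, then shift them
stages = [lambda x: 2 * x, lambda x: x + 1]
result = pipeline_forward(stages, np.arange(8, dtype=float))
```

In this sequential sketch the micro-batches run one after another; the
distributed framework gains its speedup by having stage k process
micro-batch i while stage k+1 processes micro-batch i-1.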

The MPI4DL package is open-source and hosted at the following URL:

https://github.com/OSU-Nowlab/MPI4DL

For associated release information, please visit the following URL:

http://hidl.cse.ohio-state.edu

Sample performance numbers for MPI4DL using deep learning application
benchmarks can be viewed by visiting the 'Performance' tab of the
above website.

All questions, feedback, and bug reports are welcome. Please post to
hidl-discuss at lists.osu.edu.

Thanks,

The High-Performance Deep Learning (HiDL) Team
http://hidl.cse.ohio-state.edu

PS: The number of organizations using the HiDL stack has crossed 85
(from 21 countries). The HiDL team would like to thank all of its
users and organizations!
