From panda at cse.ohio-state.edu Wed Sep 6 09:06:35 2023
From: panda at cse.ohio-state.edu (Panda, Dhabaleswar)
Date: Wed, 6 Sep 2023 13:06:35 +0000
Subject: [Hidl-discuss] Announcing the release of MPI4DL 0.5
Message-ID:

The High-Performance Deep Learning (HiDL) team is pleased to announce
the first release of MPI4DL (version 0.5), a distributed and accelerated
training framework for very high-resolution images. It integrates Spatial
Parallelism, Layer Parallelism, and Pipeline Parallelism, with the
high-performance CUDA-aware MVAPICH2 MPI library as its backend.

MPI4DL for Deep Learning applications, together with its associated
releases (MPI4cuML for Machine Learning, MPI4Spark for Spark, and
MPI4Dask for Dask), provides an MPI-driven converged software
infrastructure that extracts maximum performance and scalability for
HPC, AI, Big Data, and Data Science applications and workflows on
modern heterogeneous clusters consisting of diverse CPUs, GPUs, and
interconnects (InfiniBand, RoCE, Omni-Path, iWARP, and Slingshot).

The first release of the MPI4DL package is equipped with the following
features:

* MPI4DL 0.5:
    - Based on PyTorch
    - Support for training very high-resolution images
    - Distributed training support for:
        - Layer Parallelism (LP)
        - Pipeline Parallelism (PP)
        - Spatial Parallelism (SP)
        - Spatial and Layer Parallelism (SP+LP)
        - Spatial and Pipeline Parallelism (SP+PP)
    - Support for AmoebaNet and ResNet models
    - Support for different image sizes and custom datasets
    - Exploits collective communication features of the MVAPICH2-GDR
      MPI library
    - Compatible with:
        - NVIDIA A100 and V100 GPUs
        - CUDA [11.6, 11.7]
        - Python >= 3.8
        - PyTorch [1.12.1, 1.13.1]
        - MVAPICH2-GDR 2.3.7

The MPI4DL package is open-source, and hosted at the following URL:

https://github.com/OSU-Nowlab/MPI4DL
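To illustrate the idea behind Spatial Parallelism (SP) listed above, the
sketch below shows how a very high-resolution image might be partitioned
into strips, one per GPU/rank, so that each rank processes only its own
tile. This is a minimal, hypothetical illustration in plain Python; the
function name and partitioning scheme are assumptions for exposition and
are not the MPI4DL API.

```python
# Hypothetical sketch of spatial partitioning: split an image's rows
# into one strip per rank. Not MPI4DL's actual API -- illustration only.

def partition_spatial(height, width, num_parts):
    """Split an image of shape (height, width) into num_parts horizontal
    strips, returning a (row_start, row_end) pair per part. The last
    strip absorbs any remainder rows."""
    base = height // num_parts
    parts = []
    for rank in range(num_parts):
        start = rank * base
        end = height if rank == num_parts - 1 else start + base
        parts.append((start, end))
    return parts

# Example: a 1024x1024 image split across 4 ranks.
tiles = partition_spatial(1024, 1024, 4)
print(tiles)  # [(0, 256), (256, 512), (512, 768), (768, 1024)]
```

In an actual SP run, each rank would apply the early convolutional
layers to its strip (exchanging boundary rows with neighboring ranks as
needed) before the partial results are gathered for the later layers.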
For associated release information, please visit the following URL:

http://hidl.cse.ohio-state.edu

Sample performance numbers for MPI4DL on deep learning application
benchmarks are available under the `Performance' tab of the above
website.

All questions, feedback, and bug reports are welcome. Please post to
hidl-discuss at lists.osu.edu.

Thanks,

The High-Performance Deep Learning (HiDL) Team
http://hidl.cse.ohio-state.edu

PS: The number of organizations using the HiDL stack has crossed 85
(from 21 countries). The HiDL team would like to thank all of its users
and organizations!