From panda at cse.ohio-state.edu  Sun Dec 24 02:00:29 2023
From: panda at cse.ohio-state.edu (Panda, Dhabaleswar)
Date: Sun, 24 Dec 2023 07:00:29 +0000
Subject: [Mvapich] Announcing the release of MVAPICH-Plus 3.0rc
Message-ID:

The MVAPICH team is pleased to announce the release of MVAPICH-Plus 3.0rc.

The new MVAPICH-Plus series is an advanced version of the MVAPICH MPI
library. It unifies the MVAPICH2-GDR and MVAPICH2-X feature sets and
provides optimized support for modern platforms (CPUs, GPUs, and
interconnects) running HPC, Deep Learning, Machine Learning, Big Data,
and Data Science applications.

The major features and enhancements available in MVAPICH-Plus 3.0rc are
as follows:

- Based on MVAPICH 3.0
- Support for various high-performance communication fabrics
    - InfiniBand, Slingshot-10/11, Omni-Path, OPX, RoCE, and Ethernet
- Supports naive CPU staging for small-message collective operations
    - Tuned naive limits for the following systems
        - Pitzer at OSC, Owens at OSC, Ascend at OSC, Frontera at TACC,
          Lonestar6 at TACC, ThetaGPU at ALCF, Polaris at ALCF, Tioga at LLNL
- Initial support for blocking collectives on NVIDIA and AMD GPUs
  (see the usage sketch at the end of this message)
    - Allgather, Allgatherv, Allreduce, Alltoall, Alltoallv, Bcast,
      Gather, Gatherv, Reduce, Reduce_local, Reduce_scatter,
      Reduce_scatter_block, Scatter, Scatterv
- Initial support for non-blocking collectives on NVIDIA and AMD GPUs
    - Iallgather, Iallgatherv, Iallreduce, Ialltoall, Ialltoallv, Ibcast,
      Igather, Igatherv, Ireduce, Ireduce_scatter, Iscatter, Iscatterv
- Enhanced collective tuning for NVIDIA V100, A100, and H100 GPUs
- Enhanced collective tuning for AMD MI100 and MI250X GPUs
- Enhanced support for blocking GPU-to-GPU point-to-point operations on
  NVIDIA and AMD GPUs
    - Send, Recv
    - NVIDIA GDRCopy and AMD LargeBar support
    - CUDA and ROCm IPC support
- Support for non-blocking GPU-to-GPU point-to-point operations on
  NVIDIA and AMD GPUs
    - Isend, Irecv
- Tested with
    - Various HPC applications, mini-applications, and benchmarks
    - MPI4cuML (a custom cuML package with MPI support)
- Tested with CUDA <= 12.3
- Tested with ROCm <= 5.6.0

To download the MVAPICH-Plus 3.0rc library and the associated user
guide, please visit the following URL:

http://mvapich.cse.ohio-state.edu

All questions, feedback, bug reports, hints for performance tuning,
patches, and enhancements are welcome. Please post them to the
mvapich-discuss mailing list (mvapich-discuss at lists.osu.edu).

Thanks,

The MVAPICH Team

PS: We are also happy to report that the number of organizations using
the MVAPICH2 libraries (and registered at the MVAPICH site) has crossed
3,325 worldwide (in 90 countries). The number of downloads from the
MVAPICH site has crossed 1,747,000 (1.74 million). The MVAPICH team
would like to thank all its users and organizations!!
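As an illustration of the GPU-aware collectives listed above, here is a
minimal sketch of a device-buffer Allreduce driven from Python. It is a
sketch under stated assumptions, not part of the MVAPICH-Plus
distribution: it assumes mpi4py built against a CUDA-aware MPI such as
MVAPICH-Plus, and CuPy for the device arrays.

    # GPU-aware Allreduce sketch: device buffers are handed directly
    # to MPI.  Assumes mpi4py linked against a CUDA-aware MPI library
    # and CuPy installed on the node.
    from mpi4py import MPI
    import cupy as cp

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()

    # Each rank contributes a vector that lives entirely in GPU memory.
    sendbuf = cp.full(1 << 20, rank, dtype=cp.float32)
    recvbuf = cp.empty_like(sendbuf)

    # MPI is not stream-aware: make sure the device buffers are ready
    # before their pointers are handed to the library.
    cp.cuda.get_current_stream().synchronize()

    # Blocking GPU-to-GPU collective.  CuPy arrays expose
    # __cuda_array_interface__, so mpi4py passes the device pointers
    # straight to the CUDA-aware MPI library, with no host staging.
    comm.Allreduce(sendbuf, recvbuf, op=MPI.SUM)

    # Non-blocking variant (MPI_Iallreduce): start the collective,
    # overlap other work, then wait for completion.
    req = comm.Iallreduce(sendbuf, recvbuf, op=MPI.SUM)
    req.Wait()

    if rank == 0:
        print("reduced value:", float(recvbuf[0]))

Launch with your usual MPI launcher, e.g. "mpirun -np 2 python
allreduce_gpu.py" (the file name is illustrative); the runtime flags
that enable GPU support are described in the user guide.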
From panda at cse.ohio-state.edu  Sun Dec 24 11:28:36 2023
From: panda at cse.ohio-state.edu (Panda, Dhabaleswar)
Date: Sun, 24 Dec 2023 16:28:36 +0000
Subject: [Mvapich] Announcing the release of MCR-DL v0.1
Message-ID:

The High-Performance Deep Learning (HiDL) team is pleased to announce
the release of MCR-DL 0.1, an interface between PyTorch and
communication backends such as MPI and NCCL that enables
high-performance communication, simple communication extensibility,
and a suite of PyTorch communication benchmarks. It helps a user
achieve the best performance and scalability for distributed DL
training by mixing and matching communication backends.

The new features available with this release of the MCR-DL package are
as follows:

* MCR-DL 0.1:
    - (NEW) Support for several communication backends, enabling MPI
      communication without PyTorch source builds
        - (NEW) PyTorch distributed module
        - (NEW) Pure MPI
        - (NEW) Pure NCCL
    - (NEW) PyTorch communication benchmarking suite (see the timing
      sketch at the end of this message)
    - (NEW) Testing suite
    - (NEW) Tested with
        - NVIDIA A100 and H100 GPUs
        - CUDA [11.7, 12.1]
        - Python >= 3.8
        - PyTorch [1.13.1, 2.0.1]
        - MVAPICH2-GDR = 2.3.7
        - MVAPICH-Plus = 3.0b

The MCR-DL package is open-source and hosted at the following URL:

https://github.com/OSU-Nowlab/MCR-DL

For associated release information, please visit the following URL:

http://hidl.cse.ohio-state.edu

All questions, feedback, and bug reports are welcome. Please post them
to hidl-discuss at lists.osu.edu.

Thanks,

The High-Performance Deep Learning (HiDL) Team
http://hidl.cse.ohio-state.edu

PS: The number of organizations using the HiDL stacks has crossed 88
(from 21 countries). The HiDL team would like to thank all its users
and organizations!!
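To give a flavor of what the communication benchmarking suite measures,
here is a minimal all-reduce timing sketch. It is written against plain
torch.distributed rather than MCR-DL's own interface (whose function
names this announcement does not spell out), so treat it as a stand-in
for the bundled benchmarks, not as MCR-DL's API.

    # Minimal all-reduce timing loop using plain torch.distributed.
    import time
    import torch
    import torch.distributed as dist

    dist.init_process_group(backend="nccl")  # "mpi" exercises an MPI build
    rank = dist.get_rank()
    device = torch.device("cuda", rank % torch.cuda.device_count())
    torch.cuda.set_device(device)

    tensor = torch.ones(1 << 22, dtype=torch.float32, device=device)

    # Warm-up iterations so the timed loop excludes one-time setup costs.
    for _ in range(10):
        dist.all_reduce(tensor)
    torch.cuda.synchronize(device)

    iters = 100
    start = time.perf_counter()
    for _ in range(iters):
        dist.all_reduce(tensor)
    torch.cuda.synchronize(device)  # GPU collectives complete asynchronously
    elapsed = time.perf_counter() - start

    if rank == 0:
        us = elapsed / iters * 1e6
        # Rough per-rank throughput (tensor bytes per iteration over
        # wall time); not a precise bus-bandwidth figure.
        gbps = tensor.numel() * tensor.element_size() * iters / elapsed / 1e9
        print(f"all_reduce: {us:.1f} us/iter, ~{gbps:.2f} GB/s")

    dist.destroy_process_group()

Launch with torchrun, e.g. "torchrun --nproc_per_node=2
bench_allreduce.py" (the file name is illustrative).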