From panda at cse.ohio-state.edu  Tue May  5 21:55:00 2026
From: panda at cse.ohio-state.edu (Panda, Dhabaleswar)
Date: Wed, 6 May 2026 01:55:00 +0000
Subject: [Mvapich-discuss] Announcing the release of a new HPC-Accelerated
 AI (HPC-AI) v1.0 software stack
Message-ID: <DM6PR01MB4315C1A29163D07A90E50142DA3F2@DM6PR01MB4315.prod.exchangelabs.com>

The HPC-Accelerated AI (HPC-AI) team, formerly known as the
High-Performance Deep Learning (HiDL) team, is pleased to announce the
1.0 release of the HPC-AI software stack.  The HPC-AI project
introduces a vendor-neutral software stack for high-performance and
scalable distributed training and inference. It uses the popular
MVAPICH-Plus MPI-based communication library, which supports
excellent scale-up and scale-out with modern CPUs, GPUs, and
interconnects. The objective of the HPC-AI project is to exploit
modern HPC technologies to provide high-performance and scalable
solutions for foundational model training, agentic workflows, and
reinforcement learning.

The 1.0 release of the HPC-AI stack introduces the following key
features:

    - Full-stack integration of Training and Inference Frameworks:
      PyTorch, DeepSpeed, vLLM, and SGLang with MVAPICH-Plus
    - Native PyTorch Distributed Data Parallel (DDP) training
      with MPI backend
    - Advanced decoding method (MAC-Attention) and communication
      runtime (MCR-DL)
    - Efficient large-message collectives (e.g., Allreduce) on
      various CPUs and GPUs
    - GPU-Direct Ring and Two-level multi-leader algorithms for
      Allreduce operations
    - Support for fork safety in distributed training and
      inference environments
    - Exploits efficient large-message collectives in MVAPICH-Plus 4.1
      and later
    - Open-source framework builds with advanced MPI backend support
    - Vendor-neutral stack with performance competitive with GPU-based
      collective libraries (e.g., NCCL, RCCL)
    - Battle-tested on modern HPC clusters (e.g., OLCF Frontier,
      TACC Vista, SDSC Cosmos) with up-to-date accelerator
      generations from AMD and NVIDIA
    - Compatible with
        - InfiniBand Networks: Mellanox InfiniBand adapters
          (EDR, FDR, HDR, NDR)
        - Slingshot Networks: HPE Slingshot
        - GPU & CPU Support:
            - NVIDIA GPUs: A100, H100, GH200
            - AMD GPUs: MI250X, MI300A
        - Software Stack:
            - CUDA [12.x] and latest cuDNN
            - (NEW) ROCm [7.x]
            - (NEW) PyTorch [2.10.0]
            - (NEW) Training & Inference: DeepSpeed, vLLM, SGLang
            - (NEW) Advanced: MAC-Attention, MCR-DL
            - (NEW) Python [3.x]
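
For readers unfamiliar with the ring approach to Allreduce named in the
feature list above, the following is a minimal pure-Python simulation
of the textbook ring schedule (a reduce-scatter phase followed by an
allgather phase). It is only a conceptual sketch: the function name
ring_allreduce and the list-of-lists representation of ranks are
assumptions made for illustration, and it does not reflect how the
GPU-Direct Ring algorithm is actually implemented in the HPC-AI stack
or in MVAPICH-Plus.

```python
def ring_allreduce(buffers):
    """Simulate a ring allreduce (sum) over len(buffers) ranks.

    buffers[r] is the input vector of rank r; all vectors have the same
    length. Returns one result vector per rank; each equals the
    elementwise sum of all inputs. (Hypothetical helper, for illustration.)
    """
    p = len(buffers)
    n = len(buffers[0])
    # Split every vector into p contiguous chunks (sizes differ by at most 1).
    bounds = [(i * n) // p for i in range(p + 1)]
    data = [list(b) for b in buffers]  # work on copies of the inputs

    def chunk(c):
        return range(bounds[c], bounds[c + 1])

    # Phase 1: reduce-scatter. In step s, rank r sends chunk (r - s) mod p
    # to its right neighbor, which adds it to its own copy of that chunk.
    for s in range(p - 1):
        # Snapshot all messages first so the sends are simultaneous.
        msgs = []
        for r in range(p):
            c = (r - s) % p
            msgs.append((c, [data[r][i] for i in chunk(c)]))
        for r in range(p):
            c, payload = msgs[r]
            dst = (r + 1) % p
            for i, v in zip(chunk(c), payload):
                data[dst][i] += v
    # Now rank r holds the fully reduced chunk (r + 1) mod p.

    # Phase 2: allgather. In step s, rank r forwards chunk (r + 1 - s) mod p
    # to its right neighbor, which overwrites its copy with the final values.
    for s in range(p - 1):
        msgs = []
        for r in range(p):
            c = (r + 1 - s) % p
            msgs.append((c, [data[r][i] for i in chunk(c)]))
        for r in range(p):
            c, payload = msgs[r]
            dst = (r + 1) % p
            for i, v in zip(chunk(c), payload):
                data[dst][i] = v
    return data
```

In this schedule each rank sends 2(p - 1) messages of roughly n/p
elements, so the per-rank traffic stays close to 2n regardless of the
number of ranks, which is why ring schedules are commonly chosen for
large messages.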

The HPC-AI 1.0 stack is available as open source at the
following location:
https://github.com/OSU-Nowlab/pytorch/tree/hpc_ai_v1.0

To set up the HPC-AI stack and to access the associated user guide,
please visit the following URL:

https://hpc-ai.engineering.osu.edu/

Sample performance numbers for DDP training using the HPC-AI 1.0 stack
on a set of representative systems are available from:
https://hpc-ai.engineering.osu.edu/performance/pytorch-ddp-gpu/

All questions, feedback, and bug reports are welcome. Please post them to
hidl-discuss at lists.osu.edu.

Thanks,

The HPC-Accelerated AI (HPC-AI) Team
https://hpc-ai.engineering.osu.edu/

From panda at cse.ohio-state.edu  Sun May 10 08:16:09 2026
From: panda at cse.ohio-state.edu (Panda, Dhabaleswar)
Date: Sun, 10 May 2026 12:16:09 +0000
Subject: [Mvapich-discuss] MUG '26 Call for Presentation
Message-ID: <DM6PR01MB431532A53EC9E33BAE45FDAADA3B2@DM6PR01MB4315.prod.exchangelabs.com>

The MVAPICH team is excited to host the 14th annual MVAPICH User Group (MUG) conference. It will take place on August 17-19, 2026 in Columbus, Ohio, USA, in a hybrid format. The MUG conference aims to bring together MVAPICH users, researchers, developers, and system administrators to share their experiences and knowledge and to learn from each other. The event includes keynote talks, invited tutorials, invited talks, contributed presentations, an Open Mic session, and hands-on sessions with the MVAPICH developers.

This year, we will hold special tutorials and demo/hands-on sessions on the first day of the event (August 17th). Other talks and sessions will be held on August 18-19.

A set of short contributed presentations from the MVAPICH users will be included in the event.

Topics for presentations include, but are not limited to:

- Case studies and best practices of novel applications from different application domains, such as: astronomy, bioinformatics, biology, earth and atmospheric sciences, fluid dynamics, materials science and engineering, medicine, physics, and AI (machine learning, deep learning (training and inference), and agentic computing)

- Performance and scalability studies of applications on large-scale systems

- Special tuning and optimization strategies to exploit maximum performance and scalability

- Tools and code instrumentation for measuring and monitoring performance and/or resilience

- Tools for parallel program development (e.g. debuggers and integrated development environments)

- Unique usage scenarios with GPUs, DPUs, APUs, FPGAs, Energy-Awareness, Virtualization, Quantum simulation, etc. 

The submission should include the title of the presentation, speaker(s), short bio of the speaker(s), and a draft version of the presentation (around 10-15 slides in PDF or PowerPoint format).

Please send your submission in a single file to mug at cse.ohio-state.edu. 

Presentation Submission Deadline: July 13, 2026
Notification of Acceptance:       July 20, 2026

More details on the conference and Call for Presentations are available from 
http://mug.mvapich.cse.ohio-state.edu/

Thanks, 

The MVAPICH Team

From panda at cse.ohio-state.edu  Sat May 16 18:36:07 2026
From: panda at cse.ohio-state.edu (Panda, Dhabaleswar)
Date: Sat, 16 May 2026 22:36:07 +0000
Subject: [Mvapich-discuss] Announcing the release of MVAPICH-Plus 5.0 GA
In-Reply-To: <DM6PR01MB43159D93A5634E27DBA32049DA052@DM6PR01MB4315.prod.exchangelabs.com>
References: <DM6PR01MB43159D93A5634E27DBA32049DA052@DM6PR01MB4315.prod.exchangelabs.com>
Message-ID: <DM6PR01MB4315FEA0392EE91E6ADB9880DA052@DM6PR01MB4315.prod.exchangelabs.com>

The MVAPICH team is pleased to announce the release of MVAPICH-Plus
5.0 GA.

The new MVAPICH-Plus series is an advanced version of the MVAPICH MPI
library.  It is targeted to support unified MVAPICH2-GDR and
MVAPICH2-X features. It is also targeted to provide optimized support
for modern platforms (CPU, GPU, and interconnects) for HPC, AI
(training and inference), Big Data and Data Science applications.

The major features and enhancements available in MVAPICH-Plus 5.0 GA
are as follows:

* Features and Enhancements (since 4.1)
    - Based on MPICH 5.0.0
    - Added new algorithms for alltoall
        - Added pipelined pairwise composition and push/pull alltoall
    - Added support for GPU on-the-fly compression with default tuning
        - Can be enabled for all communicators with a CVAR
        - Can be enabled for select communicators using hints
    - Enabled enhanced collective support on AMD systems with XNACK support

* Bug Fixes (since 4.1)
    - Resolved performance degradation in Alltoall compression
    - Resolved error with HIP based compression in certain applications
        - Thanks to Natalie Beams @UTK for the report
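
As background for the alltoall items in the feature list above, here is
a minimal pure-Python simulation of the textbook pairwise-exchange
schedule for alltoall. It is a sketch under stated assumptions: the
function name pairwise_alltoall and the nested-list representation of
send and receive buffers are hypothetical, and the pipelining and
push/pull variants mentioned in the release notes are not modeled,
since the announcement does not describe them in detail.

```python
def pairwise_alltoall(send):
    """Simulate an alltoall using the pairwise-exchange schedule.

    send[r][d] is the block that rank r wants to deliver to rank d.
    Returns recv, where recv[r][s] is the block rank r received from
    rank s. (Hypothetical helper, for illustration only.)
    """
    p = len(send)
    recv = [[None] * p for _ in range(p)]

    # Each rank copies its own block locally; no communication is needed.
    for r in range(p):
        recv[r][r] = send[r][r]

    # In step k, rank r sends to (r + k) mod p and, symmetrically,
    # receives from (r - k) mod p, so every rank exchanges with exactly
    # one send partner and one receive partner per step.
    for k in range(1, p):
        for r in range(p):
            dst = (r + k) % p
            recv[dst][r] = send[r][dst]
    return recv
```

After p - 1 steps every block has been delivered, and the result is the
transpose of the send matrix: recv[r][s] equals send[s][r].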

To download the MVAPICH-Plus 5.0 GA library and the associated user
guide, please visit the following URL:

http://mvapich.cse.ohio-state.edu

All questions, feedback, bug reports, hints for performance tuning,
patches, and enhancements are welcome. Please post them to the
mvapich-discuss mailing list (mvapich-discuss at lists.osu.edu).

Thanks,

The MVAPICH Team

PS: We are also happy to announce that the number of organizations
using MVAPICH libraries (and registered at the MVAPICH site) has
crossed 3,500 worldwide (in 94 countries). The number of downloads
from the MVAPICH site has crossed 2,000,000 (2.0 million).  The
MVAPICH team would like to thank all its users and organizations!!