From panda at cse.ohio-state.edu Tue May 5 21:55:00 2026
From: panda at cse.ohio-state.edu (Panda, Dhabaleswar)
Date: Wed, 6 May 2026 01:55:00 +0000
Subject: [Mvapich-discuss] Announcing the release of a new HPC-Accelerated AI (HPC-AI) v1.0 software stack
Message-ID:

The HPC-Accelerated AI (HPC-AI) team, formerly known as the
High-Performance Deep Learning (HiDL) team, is pleased to announce the
1.0 release of the HPC-AI software stack.

The HPC-AI project introduces a vendor-neutral software stack for
high-performance and scalable distributed training and inference. It is
built on the popular MVAPICH-Plus MPI-based communication library, which
supports excellent scale-up and scale-out with modern CPUs, GPUs, and
interconnects. The objective of the HPC-AI project is to exploit modern
HPC technologies to provide high-performance and scalable solutions for
foundational model training, agentic workflows, and reinforcement
learning.

The 1.0 release of the HPC-AI stack introduces the following key
features:

- Full-stack integration of Training and Inference Frameworks: PyTorch,
  DeepSpeed, vLLM, and SGLang with MVAPICH-Plus
- Native PyTorch Distributed Data Parallel (DDP) training with MPI
  backend
- Advanced decoding method (MAC-Attention) and communication runtime
  (MCR-DL)
- Efficient large-message collectives (e.g., Allreduce) on various CPUs
  and GPUs
- GPU-Direct Ring and Two-level multi-leader algorithms for Allreduce
  operations
- Support for fork safety in distributed training and inference
  environments
- Exploits efficient large-message collectives in MVAPICH-Plus 4.1 and
  later
- Open-source framework builds with advanced MPI backend support
- Vendor-neutral stack with competitive performance to GPU-based
  collective libraries (e.g., NCCL, RCCL)
- Battle tested on modern HPC clusters (e.g., OLCF Frontier, TACC Vista,
  SDSC Cosmos) with up-to-date accelerator generations (e.g., AMD,
  NVIDIA)
- Compatible with
    - InfiniBand Networks: Mellanox InfiniBand adapters (EDR, FDR, HDR,
      NDR)
    - Slingshot Networks: HPE Slingshot
    - GPU&CPU Support:
        - NVIDIA GPU A100, H100, GH200
        - AMD MI250X, MI300A GPUs
    - Software Stack:
        - CUDA [12.x] and latest cuDNN
        - (NEW) ROCm [7.x]
        - (NEW) PyTorch [2.10.0]
        - (NEW) Training & Inference: DeepSpeed, vLLM, SGLang
        - (NEW) Advanced: MAC-Attention, MCR-DL
        - (NEW) Python [3.x]

The HPC-AI 1.0 stack is available as open source from the following
location:

https://github.com/OSU-Nowlab/pytorch/tree/hpc_ai_v1.0

For setting up the HPC-AI stack and the associated user guide, please
visit the following URL:

https://hpc-ai.engineering.osu.edu/

Sample performance numbers for DDP training using the HPC-AI 1.0 stack
on a set of representative systems are available from:

https://hpc-ai.engineering.osu.edu/performance/pytorch-ddp-gpu/

All questions, feedback, and bug reports are welcome. Please post to
hidl-discuss at lists.osu.edu.

Thanks,

The HPC-Accelerated AI (HPC-AI) Team
https://hpc-ai.engineering.osu.edu/

From panda at cse.ohio-state.edu Sun May 10 08:16:09 2026
From: panda at cse.ohio-state.edu (Panda, Dhabaleswar)
Date: Sun, 10 May 2026 12:16:09 +0000
Subject: [Mvapich-discuss] MUG '26 Call for Presentation
Message-ID:

The MVAPICH team is excited to host the 14th annual MVAPICH User Group
(MUG) conference. It will take place August 17-19, 2026 in Columbus,
Ohio, USA, and will be held in a hybrid manner. The MUG conference aims
to bring together MVAPICH users, researchers, developers, and system
administrators to share their experiences and knowledge and to learn
from each other. The event includes keynote talks, invited tutorials,
invited talks, contributed presentations, an Open Mic session, hands-on
sessions with the MVAPICH developers, etc.

This year, we will be holding special tutorials and demo/hands-on
sessions during the first day of the event (August 17th). Other talks
and sessions will be held on August 18-19.

A set of short contributed presentations from the MVAPICH users will be
included in the event.
Topics for presentations include, but are not limited to:

- Case studies and best practices of novel applications from different
  application domains, such as: astronomy, bioinformatics, biology,
  earth and atmospheric sciences, fluid dynamics, materials science and
  engineering, medicine, physics, and AI (machine learning, deep
  learning (training and inference), and agentic computing)
- Performance and scalability studies of applications on large-scale
  systems
- Special tuning and optimization strategies to exploit maximum
  performance and scalability
- Tools and code instrumentation for measuring and monitoring
  performance and/or resilience
- Tools for parallel program development (e.g. debuggers and integrated
  development environments)
- Unique usage scenarios with GPUs, DPUs, APUs, FPGAs, Energy-Awareness,
  Virtualization, Quantum simulation, etc.

The submission should include the title of the presentation,
speaker(s), short bio of the speaker(s), and a draft version of the
presentation (around 10-15 slides in PDF or PowerPoint format). Please
send your submission in a single file to mug at cse.ohio-state.edu.

Presentation Submission Deadline: July 13, 2026
Notification of Acceptance: July 20, 2026

More details on the conference and Call for Presentations are available
from http://mug.mvapich.cse.ohio-state.edu/

Thanks,

The MVAPICH Team

From panda at cse.ohio-state.edu Sat May 16 18:36:07 2026
From: panda at cse.ohio-state.edu (Panda, Dhabaleswar)
Date: Sat, 16 May 2026 22:36:07 +0000
Subject: [Mvapich-discuss] Announcing the release of MVAPICH-Plus 5.0 GA
In-Reply-To:
References:
Message-ID:

The MVAPICH team is pleased to announce the release of MVAPICH-Plus 5.0
GA.

The new MVAPICH-Plus series is an advanced version of the MVAPICH MPI
library. It is targeted to support unified MVAPICH2-GDR and MVAPICH2-X
features.
It is also targeted to provide optimized support for modern platforms
(CPU, GPU, and interconnects) for HPC, AI (training and inference), Big
Data and Data Science applications.

The major features and enhancements available in MVAPICH-Plus 5.0 GA are
as follows:

* Features and Enhancements (since 4.1)
    - Based on MPICH 5.0.0
    - Added new algorithms for alltoall
        - Added pipelined pairwise composition and push/pull alltoall
    - Added support for GPU on-the-fly compression with default tuning
        - Can be enabled for all communicators with a CVAR
        - Can be enabled for select communicators using hints
    - Enabled enhanced collective support on AMD systems with XNACK
      support

* Bug Fixes (since 4.1)
    - Resolved performance degradation in Alltoall compression
    - Resolved error with HIP-based compression in certain applications
        - Thanks to Natalie Beams @UTK for the report

For downloading the MVAPICH-Plus 5.0 GA library and associated user
guide, please visit the following URL:

http://mvapich.cse.ohio-state.edu

All questions, feedback, bug reports, hints for performance tuning,
patches, and enhancements are welcome. Please post them to the
mvapich-discuss mailing list (mvapich-discuss at lists.osu.edu).

Thanks,

The MVAPICH Team

PS: We are also happy to announce that the number of organizations using
MVAPICH libraries (and registered at the MVAPICH site) has crossed 3,500
worldwide (in 94 countries). The number of downloads from the MVAPICH
site has crossed 2,000,000 (2.0 million). The MVAPICH team would like to
thank all its users and organizations!!
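
Editorial illustration (not part of the original announcement): for
readers unfamiliar with the pairwise pattern named in the alltoall
feature list, the following is a minimal single-process Python sketch of
the textbook pairwise-exchange schedule. It is not the MVAPICH-Plus
implementation. The function name pairwise_alltoall and the buffer
layout are assumptions made for this example, and pipelining, push/pull
variants, and real MPI calls are not modelled.

```python
def pairwise_alltoall(send_buffers):
    """Simulate an alltoall among p ranks with a pairwise-exchange schedule.

    send_buffers[r][d] is the block that rank r wants delivered to rank d.
    Returns recv_buffers, where recv_buffers[r][s] is the block that
    rank r received from rank s.

    Hypothetical helper for illustration only; it runs in one process
    and does not call any MPI library.
    """
    p = len(send_buffers)
    recv_buffers = [[None] * p for _ in range(p)]

    # In step k, every rank sends one block to (rank + k) % p and, by
    # symmetry, receives one block from (rank - k) % p. Step 0 is the
    # local copy of a rank's own block.
    for step in range(p):
        for rank in range(p):
            dest = (rank + step) % p
            recv_buffers[dest][rank] = send_buffers[rank][dest]

    return recv_buffers
```

After p steps every rank holds exactly one block from each peer, so the
result is the transpose of the send layout. Each step pairs a rank with
a single destination and a single source, which is why the schedule is
usually considered for large messages.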