From panda at cse.ohio-state.edu Sun Nov 9 06:02:03 2025
From: panda at cse.ohio-state.edu (Panda, Dhabaleswar)
Date: Sun, 9 Nov 2025 11:02:03 +0000
Subject: [Mvapich-discuss] Join the MVAPICH team for multiple events at Supercomputing '25 conference
Message-ID:

The MVAPICH team members will be participating in multiple events during
the Supercomputing '25 conference. The Ohio State University (OSU) booth
(#414) will also feature leading speakers from academia (Sameer Shende of
the University of Oregon, Scott Callaghan of the University of Southern
California, Toshihiro Hanawa of the University of Tokyo, and Ahmad
Abdelfattah of the University of Tennessee), national laboratories and HPC
centers (Mahidhar Tatineni of the San Diego Supercomputer Center, and John
Cazes and Amit Ruhela of the Texas Advanced Computing Center), and
industry (Hemal Shah of Broadcom, Sameer Shende of ParaTools, Inc., Soham
Ghosh of X-ScaleSolutions, and Parikshit Godbole of C-DAC, India).

Join us for these events and talk in person with the project team members
and the invited speakers!

More details of the events are provided at:

http://mvapich.cse.ohio-state.edu/conference/997/talks/

Alternatively, you can use the attached QR code to view the event details.

Thanks,

The MVAPICH Team

-------------- next part --------------
A non-text attachment was scrubbed...
Name: sc25qr.png
Type: image/png
Size: 56279 bytes
Desc: sc25qr.png
URL:

From mgs.rus.52 at gmail.com Thu Nov 13 05:55:24 2025
From: mgs.rus.52 at gmail.com (Alex)
Date: Thu, 13 Nov 2025 10:55:24 +0000
Subject: [Mvapich-discuss] mvapich-4.0, libmpi_so_version
In-Reply-To:
References:
Message-ID:

Hmmm. 4.1 is out, but the issue is still there.

Cheers,
Alex

On Mon, 10 Nov 2025 at 20:53, Shineman, Nat wrote:

> Alex,
>
> Thanks for pointing out this oversight. There were some changes to our
> release process, and it looks like we missed updating the .so number for
> a stable release version. We will have this fixed in future versions.
>
> Thanks,
>
> Nat Shineman
> Senior Software Engineer - MVAPICH
>
> The Ohio State University
> College of Engineering
> Network Based Computing Laboratory
> 799 Dreese Lab
> 2015 Neil Ave, Columbus, OH 43210
> shineman.5 at osu.edu
> ------------------------------
> From: Mvapich-discuss on behalf of Alex via Mvapich-discuss
> Sent: Wednesday, July 9, 2025 16:37
> To: mvapich-discuss at lists.osu.edu
> Subject: [Mvapich-discuss] mvapich-4.0, libmpi_so_version
>
> Hi,
> In mvapich-4.0, libmpi_so_version is reset to 0:0:0. Is this how it
> should be, and if it is, would you mind explaining why?
>
> Cheers, Alex
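For readers following this thread: the libmpi_so_version triple follows
libtool's -version-info current:revision:age convention, which on Linux
names the library libmpi.so.(current-age).(age).(revision), so a 0:0:0
value produces libmpi.so.0.0.0 rather than a versioned name like
libmpi.so.12. Below is a minimal sketch for checking which libmpi the
dynamic linker actually resolves at run time. It assumes a glibc system
(dlinfo/link.h); the SONAME "libmpi.so.12" and the file name
check_libmpi.c are illustrative, not from the thread.

/*
 * Sketch: print which libmpi shared object the dynamic linker resolves,
 * and therefore which .so version a binary will actually load.
 * Assumes glibc; the SONAME "libmpi.so.12" is an example and may differ
 * on a build affected by the 0:0:0 reset.
 *
 * Build: cc check_libmpi.c -o check_libmpi -ldl
 */
#define _GNU_SOURCE
#include <stdio.h>
#include <dlfcn.h>
#include <link.h>

int main(void)
{
    void *handle = dlopen("libmpi.so.12", RTLD_NOW);
    if (handle == NULL) {
        fprintf(stderr, "dlopen failed: %s\n", dlerror());
        return 1;
    }

    /* Ask the loader for the link map of the object we just opened. */
    struct link_map *map = NULL;
    if (dlinfo(handle, RTLD_DI_LINKMAP, &map) == 0 && map != NULL)
        printf("libmpi resolved to: %s\n", map->l_name);

    dlclose(handle);
    return 0;
}

Comparing the printed path against the intended installation prefix is a
quick way to confirm that applications are linking against the MVAPICH
build you expect, rather than a stale copy with a different .so number.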
From panda at cse.ohio-state.edu Fri Nov 14 11:58:48 2025
From: panda at cse.ohio-state.edu (Panda, Dhabaleswar)
Date: Fri, 14 Nov 2025 16:58:48 +0000
Subject: [Mvapich-discuss] Announcing the release of High-Performance Deep Learning (HiDL) 2.1 package with MPI backend
Message-ID:

The High-Performance Deep Learning (HiDL) team is pleased to announce the
2.1 release of HiDL, a vendor-neutral, high-performance deep learning
stack based on the MVAPICH-Plus MPI backend.

HiDL 2.1 uses PyTorch 2.8.0 with the MVAPICH-Plus backend to support
large-scale Distributed Data Parallel (DDP) training workloads, and it
targets modern GPU clusters and high-performance interconnects. This
vendor-neutral approach does not require any vendor-supported collective
communication library (such as NCCL or RCCL) and delivers competitive
performance on the latest GPU clusters. The PyTorch 2.8.0 stack, modified
to use the latest MVAPICH-Plus, is available as open source from the
following location:

https://github.com/OSU-Nowlab/pytorch/tree/HiDL-2.0-torch2.8.0

* HiDL 2.1: PyTorch 2.8.0 with MVAPICH-Plus Features
    - Support for PyTorch 2.8.0
    - Full support for PyTorch native Distributed Data Parallel (DDP)
      training
    - Optimized support for the MPI communication backend in model
      training workloads
    - Efficient large-message collectives (e.g., Allreduce) on various
      CPUs and GPUs (see the sketch after this announcement for the
      underlying pattern)
    - GPU-Direct Ring and Two-level Multi-leader algorithms for Allreduce
      operations
    - Support for fork safety in distributed training environments
    - Exploits efficient large-message collectives in MVAPICH-Plus 4.1
      and later
    - Open-source PyTorch version with advanced MPI backend support
    - Available in our PyTorch tag
      (https://github.com/OSU-Nowlab/pytorch/tree/HiDL-2.0-torch2.8.0)
    - Vendor-neutral stack with performance and throughput competitive
      with GPU-based collective libraries (e.g., NCCL, RCCL)
    - Battle-tested on modern HPC clusters (OLCF Frontier, TACC Vista,
      and SDSC Cosmos (new)) with up-to-date GPUs (NVIDIA and AMD)
    - Compatible with
        - InfiniBand networks: Mellanox InfiniBand adapters (EDR, FDR,
          HDR, NDR)
        - Slingshot networks: HPE Slingshot
    - GPU & CPU support:
        - NVIDIA A100, H100, and GH200 GPUs
        - AMD MI250X and MI300A (NEW) GPUs
    - Software stack:
        - CUDA [12.x] and the latest cuDNN
        - ROCm [6.x]
        - (NEW) PyTorch [2.8.0]
        - (NEW) Python [3.x]

For setting up the HiDL stack and the associated user guide, please visit
the following URL:

http://hidl.cse.ohio-state.edu

Sample performance numbers for DDP training using the HiDL 2.1 stack on a
set of representative systems are available from:

http://hidl.cse.ohio-state.edu/performance/pytorch-ddp-gpu/

All questions, feedback, and bug reports are welcome. Please post them to
hidl-discuss at lists.osu.edu.

Thanks,

The High-Performance Deep Learning (HiDL) Team
http://hidl.cse.ohio-state.edu
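The DDP support above ultimately drives MPI collectives: after each
backward pass, the local gradient buffers are summed across all ranks and
averaged, which is the large-message Allreduce workload the release notes
highlight. The following is a minimal C sketch of that underlying pattern;
the buffer length and fill values are illustrative, and this is not HiDL
code (HiDL wires PyTorch's DDP to such calls through its MPI backend).

/*
 * Minimal sketch of the MPI-level pattern behind DDP gradient averaging:
 * every rank contributes a local gradient buffer, an Allreduce sums the
 * buffers across ranks, and each rank divides by the world size.
 */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const int n = 1 << 20;                 /* stand-in gradient length */
    float *grad = malloc(n * sizeof *grad);
    for (int i = 0; i < n; i++)
        grad[i] = (float)rank;             /* stand-in local gradients */

    /* Sum gradients across all ranks in place, then average. */
    MPI_Allreduce(MPI_IN_PLACE, grad, n, MPI_FLOAT, MPI_SUM,
                  MPI_COMM_WORLD);
    for (int i = 0; i < n; i++)
        grad[i] /= (float)size;

    if (rank == 0)
        printf("averaged grad[0] = %f (world size %d)\n", grad[0], size);

    free(grad);
    MPI_Finalize();
    return 0;
}

An in-place Allreduce over large float buffers is exactly the message
pattern that the GPU-Direct Ring and Two-level Multi-leader algorithms
mentioned in the feature list are designed to accelerate.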
From christopher.washburn at villanova.edu Fri Nov 14 13:53:10 2025
From: christopher.washburn at villanova.edu (Christopher Washburn)
Date: Fri, 14 Nov 2025 18:53:10 +0000
Subject: [Mvapich-discuss] Why are all RANK=0 when I'm using MVAPICH?
Message-ID:

Greetings, community. I'm building MVAPICH with Spack and trying to use it
in production.

The build is fairly straightforward:

Concretized
--------------------------------
[+] mvapich@3.0%gcc@12.3.0~alloca~cuda~debug+regcache+wrapperrpath build_system=autotools ch3_rank_bits=32 file_systems=auto netmod=ucx pmi_version=simple process_managers=slurm threads=multiple arch=linux-ubuntu22.04-zen2
[e] ^bison@3.8.2%gcc@12.3.0~color build_system=autotools arch=linux-ubuntu22.04-zen2
[e] ^findutils@4.8.0%gcc@12.3.0 build_system=autotools patches=440b954 arch=linux-ubuntu22.04-zen2
[+] ^gcc-runtime@12.3.0%gcc@12.3.0 build_system=generic arch=linux-ubuntu22.04-zen2
[e] ^glibc@2.35%gcc@12.3.0 build_system=autotools arch=linux-ubuntu22.04-zen2
[e] ^gmake@4.3%gcc@12.3.0~guile build_system=generic patches=599f134 arch=linux-ubuntu22.04-zen2
[+] ^libpciaccess@0.17%gcc@12.3.0 build_system=autotools arch=linux-ubuntu22.04-zen2
[e] ^libtool@2.4.6%gcc@12.3.0 build_system=autotools arch=linux-ubuntu22.04-zen2
[+] ^util-macros@1.19.3%gcc@12.3.0 build_system=autotools arch=linux-ubuntu22.04-zen2
[+] ^libxml2@2.10.3%gcc@12.3.0+pic~python+shared build_system=autotools arch=linux-ubuntu22.04-zen2
[+] ^xz@5.4.6%gcc@12.3.0~pic build_system=autotools libs=shared,static arch=linux-ubuntu22.04-zen2
[+] ^pkgconf@2.2.0%gcc@12.3.0 build_system=autotools arch=linux-ubuntu22.04-zen2
[e] ^slurm@24.05.7%gcc@12.3.0~cgroup~gtk~hdf5~hwloc~mariadb~nvml~pam~pmix+readline~restd~rsmi build_system=autotools sysconfdir=PREFIX/etc arch=linux-ubuntu22.04-zen2
[e] ^ucx@1.18%gcc@12.3.0~assertions~backtrace_detail~cma~cuda~dc~debug~dm+examples~gdrcopy~gtest~ib_hw_tm~java~knem~logging~mlx5_dv+openmp+optimizations~parameter_checking+pic~rc~rdmacm~rocm~thread_multiple~ucg~ud~verbs~vfs~xpmem build_system=autotools libs=shared,static opt=3 simd=auto arch=linux-ubuntu22.04-zen2
[+] ^zlib-ng@2.1.6%gcc@12.3.0+compat+new_strategies+opt+pic+shared build_system=autotools arch=linux-ubuntu22.04-zen2

And the application program is a small test case:

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <zlib.h>

int main(int argc, char **argv)
{
    int rank;
    char hostname[256];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    gethostname(hostname, sizeof(hostname));
    printf("Hello world from rank %d on %s\n", rank, hostname);

    if (rank == 0) {
        printf("zlib version: %s\n", ZLIB_VERSION);
        /* printf("zlib-ng version: %s\n", ZLIBNG_VERSION); */
    }

    MPI_Finalize();
}
The usual answer is that I might be using two different MPI packages
(e.g., openmpi + mvapich); I checked that, and it does not appear to be
the case:

chris@augie:~$ ldd ./a.out
    linux-vdso.so.1 (0x00007ffe0e28d000)
    libmpi.so.12 => /mnt/beegfs/home/spack/spack/opt/spack/linux-ubuntu22.04-zen2/gcc-12.3.0/mvapich-3.0-cosed2xwznawlvn3khfemwdzbcjozp3d/lib/libmpi.so.12 (0x00001491c9f59000)
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00001491c9d19000)
    libpciaccess.so.0 => /mnt/beegfs/home/spack/spack/opt/spack/linux-ubuntu22.04-zen2/gcc-12.3.0/libpciaccess-0.17-fqdfrz6bxnslv5prv6iok55bz3i4oupu/lib/libpciaccess.so.0 (0x00001491c9d0b000)
    libxml2.so.2 => /mnt/beegfs/home/spack/spack/opt/spack/linux-ubuntu22.04-zen2/gcc-12.3.0/libxml2-2.10.3-u6g3r33et3pxq62z4jjyypv5goormddr/lib/libxml2.so.2 (0x00001491c9b9a000)
    libucp.so.0 => /usr/lib/libucp.so.0 (0x00001491c9ac5000)
    libucs.so.0 => /usr/lib/libucs.so.0 (0x00001491c9a62000)
    libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00001491c997b000)
    /lib64/ld-linux-x86-64.so.2 (0x00001491cb068000)
    libz.so.1 => /mnt/beegfs/home/spack/spack/opt/spack/linux-ubuntu22.04-zen2/gcc-12.3.0/zlib-ng-2.1.6-lzzj5kvgu4zn72nruzsbnod554swe4rb/lib/libz.so.1 (0x00001491c9952000)
    liblzma.so.5 => /mnt/beegfs/home/spack/spack/opt/spack/linux-ubuntu22.04-zen2/gcc-12.3.0/xz-5.4.6-eusbwoeb7he7a3lyizkglevxnotl6d3x/lib/liblzma.so.5 (0x00001491c9922000)
    libuct.so.0 => /usr/lib/libuct.so.0 (0x00001491c98e5000)
    libucm.so.0 => /usr/lib/libucm.so.0 (0x00001491c98ca000)

chris@augie:~$ ldd /mnt/beegfs/home/spack/spack/opt/spack/linux-ubuntu22.04-zen2/gcc-12.3.0/mvapich-3.0-cosed2xwznawlvn3khfemwdzbcjozp3d/bin/mpiexec
    linux-vdso.so.1 (0x00007ffd9010c000)
    liblz4.so.1 => /lib/x86_64-linux-gnu/liblz4.so.1 (0x000014ecd44ee000)
    libslurmfull.so => /usr/lib/x86_64-linux-gnu/slurm/libslurmfull.so (0x000014ecd42c8000)
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x000014ecd409f000)
    libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x000014ecd3fb8000)
    libresolv.so.2 => /lib/x86_64-linux-gnu/libresolv.so.2 (0x000014ecd3fa4000)
    /lib64/ld-linux-x86-64.so.2 (0x000014ecd4550000)

So what am I missing?

Christopher A. Washburn
Villanova University
Research Computing Administrator
Villanova Research
800 Lancaster Avenue
Villanova, Pennsylvania 19085
Phone: 610 519-4711
Cell: 484 431-6619
Christopher.washburn at villanova.edu
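A pattern consistent with the symptom above is that each process
initializes as an MPI singleton: the launcher starts N tasks, but the MPI
library never receives the PMI wire-up information, so every task believes
it is a one-process job with rank 0. The following diagnostic sketch
(hypothetical, not from the thread) separates the two layers by printing
both the MPI view and Slurm's own task ID:

/*
 * Diagnostic sketch (hypothetical): if every process prints "size 1"
 * while SLURM_PROCID varies across processes, the tasks were launched
 * correctly but each MPI_Init ran as a singleton, i.e. the MVAPICH
 * build and the launcher did not agree on a PMI interface.
 */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* SLURM_PROCID is set per task by Slurm, independently of MPI. */
    const char *procid = getenv("SLURM_PROCID");
    printf("MPI rank %d of size %d, SLURM_PROCID=%s\n",
           rank, size, procid ? procid : "(unset)");

    MPI_Finalize();
    return 0;
}

If the singleton pattern shows up, a reasonable next step is to confirm
that the launcher provides a PMI matching the pmi_version=simple /
process_managers=slurm build options; srun --mpi=list shows which PMI
plugins the local Slurm installation offers.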
From panda at cse.ohio-state.edu Fri Nov 21 16:31:32 2025
From: panda at cse.ohio-state.edu (Panda, Dhabaleswar)
Date: Fri, 21 Nov 2025 21:31:32 +0000
Subject: [Mvapich-discuss] Announcing the release of OSU Micro-Benchmarks (OMB) 8.0b
In-Reply-To:
References:
Message-ID:

The MVAPICH team is pleased to announce the release of OSU Micro-Benchmarks
(OMB) 8.0b. The new features, enhancements, and bug fixes in OSU
Micro-Benchmarks (OMB) 8.0b are listed here:

* New Features & Enhancements (since 7.5.1)
    - Fully rebuilt benchmarks on common template files
        - Reduces variance across benchmark code to facilitate easier
          community contributions
        - Allows for single-point-of-change contributions
    - Added a unified OMB launcher for batch testing of pt2pt, collective,
      and RMA tests
        - Allows one-step runs of some or all benchmarks of a given type
        - Configurable behavior allows for faster comprehensive
          benchmarking
        - Supports batch testing of some or all tests in a given category
        - Reports test failures or incompatible arguments on exit
        - Fully backwards compatible with existing scripts
        - Full details on using the launcher are available in the README
    - Added support for colored terminal output for enhanced visual output

* Bug Fixes
    - Fixed a bug where a value of zero for warmup messages would result
      in test failure

For downloading OMB 8.0b and the associated README instructions, please
visit the following URL:

http://mvapich.cse.ohio-state.edu (under the Benchmarks tab)

All questions, feedback, bug reports, hints for performance tuning,
patches, and enhancements are welcome. Please post them to the
mvapich-discuss mailing list (mvapich-discuss at lists.osu.edu).

Thanks,

The MVAPICH Team

PS: We are also happy to inform you that the number of organizations using
the MVAPICH libraries (and registered at the MVAPICH site) has crossed
3,475 worldwide (in 93 countries). The number of downloads from the MVAPICH
site has crossed 1,983,000 (1.983 million). The MVAPICH team would like to
thank all its users and organizations!!