From subramoni.1 at osu.edu Wed Nov 1 14:17:36 2023
From: subramoni.1 at osu.edu (Subramoni, Hari)
Date: Wed, 1 Nov 2023 18:17:36 +0000
Subject: [Mvapich] Announcing the release of MVAPICH-Plus 3.0b

The MVAPICH team is pleased to announce the release of MVAPICH-Plus 3.0b. Please let me know if you have any comments or feedback.

The new MVAPICH-Plus series is an advanced version of the MVAPICH MPI library. It is targeted to support unified MVAPICH2-GDR and MVAPICH2-X features, and to provide optimized support for modern platforms (CPUs, GPUs, and interconnects) for HPC, Deep Learning, Machine Learning, Big Data, and Data Science applications.

The major features and enhancements available in MVAPICH-Plus 3.0b are as follows:

- Based on MVAPICH 3.0
- Support for various high-performance communication fabrics
    - InfiniBand, Slingshot-10/11, Omni-Path, OPX, RoCE, and Ethernet
- Supports naive CPU staging for small-message collective operations
    - Tuned naive limits for the following systems
        - Pitzer at OSC, Owens at OSC, Ascend at OSC, Frontera at TACC, Lonestar6 at TACC, ThetaGPU at ALCF, Polaris at ALCF, Tioga at LLNL
- Initial support for blocking collectives on NVIDIA and AMD GPUs (see the usage sketch after this announcement)
    - Allgather, Allgatherv, Allreduce, Alltoall, Alltoallv, Bcast, Gather, Gatherv, Reduce, Reduce_local, Reduce_scatter, Reduce_scatter_block, Scatter, Scatterv
- Initial support for non-blocking GPU collectives on NVIDIA and AMD GPUs
    - Iallgather, Iallgatherv, Iallreduce, Ialltoall, Ialltoallv, Ibcast, Igather, Igatherv, Ireduce, Ireduce_scatter, Iscatter, Iscatterv
- Enhanced support for blocking GPU-to-GPU point-to-point operations on NVIDIA and AMD GPUs
    - Send, Recv
    - NVIDIA GDRCopy and AMD LargeBar support
    - CUDA and ROCm IPC support
- Alpha support for non-blocking GPU-to-GPU point-to-point operations on NVIDIA and AMD GPUs
    - Isend, Irecv
- Tested with
    - Various HPC applications, mini-applications, and benchmarks
    - MPI4cuML (a custom cuML package with MPI support)
    - CUDA <= 11.8 and CUDA 12.0
    - ROCm <= 5.6.0

For downloading the MVAPICH-Plus 3.0b library and the associated user guide, please visit the following URL:

http://mvapich.cse.ohio-state.edu

All questions, feedback, bug reports, hints for performance tuning, patches, and enhancements are welcome. Please post them to the mvapich-discuss mailing list (mvapich-discuss at lists.osu.edu).

Thanks,

The MVAPICH Team

PS: We are also happy to report that the number of organizations using MVAPICH2 libraries (and registered at the MVAPICH site) has crossed 3,325 worldwide (in 90 countries). The number of downloads from the MVAPICH site has crossed 1,732,000 (1.73 million). The MVAPICH team would like to thank all its users and organizations!!
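For readers unfamiliar with CUDA-aware MPI, the sketch below shows the usage model behind the GPU collectives listed above: the application hands a device pointer directly to the collective, and the library performs the GPU-to-GPU data movement internally. This is a minimal sketch, assuming a CUDA-aware MVAPICH-Plus build and the CUDA runtime; the file name and buffer size are illustrative, and error checking is omitted for brevity.

    /* allreduce_gpu.c -- minimal CUDA-aware MPI_Allreduce sketch.
     * Build with the library's MPI compiler wrapper (e.g., mpicc)
     * plus the CUDA runtime (-lcudart). */
    #include <mpi.h>
    #include <cuda_runtime.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        const int n = 1024;
        double *h_buf = (double *)malloc(n * sizeof(double));
        double *d_buf;                          /* device buffer */
        cudaMalloc((void **)&d_buf, n * sizeof(double));
        for (int i = 0; i < n; i++) h_buf[i] = (double)rank;
        cudaMemcpy(d_buf, h_buf, n * sizeof(double), cudaMemcpyHostToDevice);

        /* The device pointer is passed straight to the collective; a
         * CUDA-aware MPI detects it and moves the data itself. */
        MPI_Allreduce(MPI_IN_PLACE, d_buf, n, MPI_DOUBLE, MPI_SUM,
                      MPI_COMM_WORLD);

        cudaMemcpy(h_buf, d_buf, n * sizeof(double), cudaMemcpyDeviceToHost);
        if (rank == 0) printf("sum of ranks = %.0f\n", h_buf[0]);

        cudaFree(d_buf);
        free(h_buf);
        MPI_Finalize();
        return 0;
    }

The non-blocking variants listed above follow the same pattern, with MPI_Iallreduce returning an MPI_Request that is later completed with MPI_Wait.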
From panda at cse.ohio-state.edu Wed Nov 8 11:56:32 2023
From: panda at cse.ohio-state.edu (Panda, Dhabaleswar)
Date: Wed, 8 Nov 2023 16:56:32 +0000
Subject: [Mvapich] Join the MVAPICH team for multiple events at SC '23

The MVAPICH team members will be participating in multiple events during the Supercomputing '23 conference. The Ohio State University (OSU) booth (#1680) will also feature leading speakers from academia (Case Western Reserve University, KAUST-Saudi Arabia, and Univ. of Oregon), national laboratories/centers (ETRI-South Korea, Idaho National Lab, Ohio Supercomputer Center, and San Diego Supercomputer Center), and industry (Broadcom, C-DAC-India, Dell, ParaTools, and X-ScaleSolutions)!!

Join us for these events and talk in person with the project team members and the invited speakers!! More details of the events are provided at:

http://mvapich.cse.ohio-state.edu/conference/964/talks/

Alternatively, you can use the attached QR code (sc23-qr-code.png) to view the event details. Pick up a free T-shirt at the OSU booth after attending the events!

Thanks,

The MVAPICH Team

From panda at cse.ohio-state.edu Thu Nov 9 14:10:10 2023
From: panda at cse.ohio-state.edu (Panda, Dhabaleswar)
Date: Thu, 9 Nov 2023 19:10:10 +0000
Subject: [Mvapich] Announcing the release of MPI4DL 0.6

The High-Performance Deep Learning (HiDL) team is pleased to announce the release of MPI4DL 0.6, a distributed and accelerated training framework for very high-resolution images that integrates Spatial Parallelism, Layer Parallelism, Bidirectional Parallelism, and Pipeline Parallelism with support for the MVAPICH2-GDR high-performance CUDA-aware communication backend. This library allows MPI-driven converged software infrastructure to extract maximum performance and scalability for AI, Big Data, and Data Science applications and workflows on modern heterogeneous clusters consisting of diverse CPUs, GPUs, and interconnects (InfiniBand, RoCE, Omni-Path, iWARP, and Slingshot).

The new features available with this release of the MPI4DL package are as follows (a spatial-decomposition sketch follows this announcement):

* MPI4DL 0.6:
    * Based on PyTorch
    * (NEW) Support for training very high-resolution images
    * Distributed training support for:
        * Layer Parallelism (LP)
        * Pipeline Parallelism (PP)
        * Spatial Parallelism (SP)
        * Spatial and Layer Parallelism (SP+LP)
        * Spatial and Pipeline Parallelism (SP+PP)
        * (NEW) Bidirectional and Layer Parallelism (GEMS+LP)
        * (NEW) Bidirectional and Pipeline Parallelism (GEMS+PP)
        * (NEW) Spatial, Bidirectional, and Layer Parallelism (SP+GEMS+LP)
        * (NEW) Spatial, Bidirectional, and Pipeline Parallelism (SP+GEMS+PP)
    * (NEW) Support for AmoebaNet and ResNet models
    * (NEW) Support for different image sizes and custom datasets
    * Exploits collective features of MVAPICH2-GDR
    * Compatible with
        * NVIDIA A100 and V100 GPUs
        * CUDA [11.6, 11.7]
        * Python >= 3.8
        * PyTorch [1.12.1, 1.13.1]
        * MVAPICH2-GDR = 2.3.7

The MPI4DL package is open-source and hosted at the following URL:

https://github.com/OSU-Nowlab/MPI4DL

For associated release information, please visit the following URL:

http://hidl.cse.ohio-state.edu

Sample performance numbers for MPI4DL using deep learning application benchmarks can be viewed by visiting the 'Performance' tab of the above website.

All questions, feedback, and bug reports are welcome. Please post them to hidl-discuss at lists.osu.edu.

Thanks,

The High-Performance Deep Learning (HiDL) Team
http://hidl.cse.ohio-state.edu

PS: The number of organizations using the HiDL stacks has crossed 88 (from 21 countries). The HiDL team would like to thank all its users and organizations!!
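MPI4DL's own Python API is not reproduced here. Instead, the generic C sketch below illustrates the communication pattern that spatial parallelism rests on: each rank owns a horizontal strip of a high-resolution image and exchanges one-row boundary "halos" with its neighbors before applying a local stencil or convolution. The dimensions and file name are illustrative assumptions, not MPI4DL code.

    /* halo_exchange.c -- generic spatial-decomposition sketch
     * (conceptual illustration only; not MPI4DL's API). */
    #include <mpi.h>
    #include <stdio.h>

    #define WIDTH      8   /* illustrative image width        */
    #define LOCAL_ROWS 4   /* image rows owned by each rank   */

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);
        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* LOCAL_ROWS owned rows plus two halo rows:
         * row 0 = top halo, row LOCAL_ROWS+1 = bottom halo */
        float strip[(LOCAL_ROWS + 2) * WIDTH];
        for (int i = 0; i < (LOCAL_ROWS + 2) * WIDTH; i++)
            strip[i] = (float)rank;

        int up   = (rank == 0)        ? MPI_PROC_NULL : rank - 1;
        int down = (rank == size - 1) ? MPI_PROC_NULL : rank + 1;

        /* Send my top owned row up; receive my bottom halo from below. */
        MPI_Sendrecv(&strip[1 * WIDTH], WIDTH, MPI_FLOAT, up, 0,
                     &strip[(LOCAL_ROWS + 1) * WIDTH], WIDTH, MPI_FLOAT,
                     down, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        /* Send my bottom owned row down; receive my top halo from above. */
        MPI_Sendrecv(&strip[LOCAL_ROWS * WIDTH], WIDTH, MPI_FLOAT, down, 1,
                     &strip[0], WIDTH, MPI_FLOAT,
                     up, 1, MPI_COMM_WORLD, MPI_STATUS_IGNORE);

        /* A spatial-parallel layer would now run its convolution on
         * rows 1..LOCAL_ROWS using the freshly received halo rows. */
        if (rank == 0 && size > 1)
            printf("rank 0 bottom halo now holds %.0f (from rank 1)\n",
                   strip[(LOCAL_ROWS + 1) * WIDTH]);

        MPI_Finalize();
        return 0;
    }

In a real MPI4DL run the exchanged buffers would be GPU tensors moved over the CUDA-aware MVAPICH2-GDR backend rather than host arrays.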
From panda at cse.ohio-state.edu Thu Nov 9 17:42:39 2023
From: panda at cse.ohio-state.edu (Panda, Dhabaleswar)
Date: Thu, 9 Nov 2023 22:42:39 +0000
Subject: [Mvapich] Announcing the release of ParaInfer-X v1.0 for High-Performance Parallel Inference

The High-Performance Deep Learning (HiDL) team is pleased to announce the release of ParaInfer-X v1.0, a collection of parallel inference techniques that facilitate the deployment of emerging AI models on edge devices and HPC clusters. The package combines high-throughput GPU kernels, scheduling strategies that balance load across resources, and distributed communication libraries that coordinate data exchange among distributed systems for large-scale inference. ParaInfer-X v1.0 introduces a temporal fusion framework, named Flover, that batches multiple requests on the fly during LLM generation, a technique also known as temporal fusion or in-flight batching (a toy illustration follows this announcement).

The new features available with this release of the ParaInfer-X package are as follows:

* Based on FasterTransformer
* (NEW) Support for inference of various large language models:
    * (NEW) GPT-J 6B
    * (NEW) LLaMA 7B
    * (NEW) LLaMA 13B
    * (NEW) LLaMA 33B
    * (NEW) LLaMA 65B
* (NEW) Support for persistent model inference streams
* (NEW) Support for temporal fusion/in-flight batching of multiple requests
* (NEW) Support for multi-GPU tensor parallelism
* (NEW) Support for asynchronous memory reordering for evicting finished requests
* (NEW) Support for float32, float16, and bfloat16 model inference
* (NEW) Customized CUDA kernels
* (NEW) Support for visualization output of inference progress
* Compatible with
    * NVIDIA A100 and V100 GPUs
    * CUDA [11.2, 11.3, 11.4, 11.6]
    * GCC >= 8.5.0
    * CMake >= 3.18
    * Intel oneTBB >= v2020.0

The ParaInfer-X package is open-source and hosted at the following URL:

https://github.com/OSU-Nowlab/Flover

For associated release information, please visit the following URL:

http://hidl.cse.ohio-state.edu

Sample performance numbers for ParaInfer-X using inference benchmarks can be viewed by visiting the 'Performance' tab of the above website.

All questions, feedback, and bug reports are welcome. Please post them to hidl-discuss at lists.osu.edu.

Thanks,

The High-Performance Deep Learning (HiDL) Team
http://hidl.cse.ohio-state.edu

PS: The number of organizations using the HiDL stacks has crossed 88 (from 21 countries). The HiDL team would like to thank all its users and organizations!!
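Flover's interfaces are not reproduced here; the toy C loop below only illustrates the in-flight batching idea the announcement describes. Sequences in a batch finish at different lengths, so instead of waiting for the whole batch to drain, a finished slot is evicted and immediately refilled from the pending-request queue. The slot count and token budgets are made-up values for the illustration.

    /* inflight_batching.c -- toy illustration of temporal fusion /
     * in-flight batching (conceptual only; not Flover's API). */
    #include <stdio.h>

    #define SLOTS 3   /* concurrent sequences in the fused batch */
    #define REQS  6   /* total requests to serve                 */

    int main(void)
    {
        int remaining[REQS] = {2, 5, 3, 4, 1, 2}; /* tokens left per request */
        int slot[SLOTS], next = 0, done = 0;

        for (int s = 0; s < SLOTS; s++) slot[s] = next++;  /* initial batch */

        for (int step = 1; done < REQS; step++) {
            for (int s = 0; s < SLOTS; s++) {
                if (slot[s] < 0) continue;            /* idle slot */
                if (--remaining[slot[s]] == 0) {      /* one decode step */
                    printf("step %d: request %d finished, ", step, slot[s]);
                    done++;
                    /* evict and immediately admit the next request */
                    slot[s] = (next < REQS) ? next++ : -1;
                    if (slot[s] >= 0)
                        printf("slot refilled with request %d\n", slot[s]);
                    else
                        printf("slot left idle\n");
                }
            }
        }
        return 0;
    }

The real framework does the same bookkeeping per decoding iteration on the GPU, which is what the "asynchronous memory reordering for evicting finished requests" feature above refers to.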
From subramoni.1 at osu.edu Thu Nov 9 21:35:51 2023
From: subramoni.1 at osu.edu (Subramoni, Hari)
Date: Fri, 10 Nov 2023 02:35:51 +0000
Subject: [Mvapich] Announcing the release of MVAPICH 3.0rc

The MVAPICH team is pleased to announce the release of MVAPICH 3.0rc.

The major features and enhancements available in MVAPICH 3.0rc are as follows:

* Features and Enhancements (since 2.3.7):
    - Name changed from MVAPICH2 to MVAPICH
    - Based on MPICH 3.4.3
    - Added support for the ch4:ucx and ch4:ofi devices
        * Support for MVAPICH enhanced collectives over OFI and UCX
    - Enhanced MVAPICH-based designs for intra-node communication operations
        * Supports shared-memory/kernel-based protocols for different message sizes
        * Supports UCX/OFI for inter-node and MVAPICH-based shared memory for intra-node communication operations
    - Added support for the Cray Slingshot 11 interconnect over OFI
        * Supports Cray Slingshot 11 network adapters
    - Added support for the Cornelis OPX library over OFI
        * Supports Intel Omni-Path adapters
    - Added support for the Intel PSM3 library over OFI
        * Supports Intel Columbiaville network adapters
    - Added support for IB verbs over UCX
        * Supports IB and RoCE network adapters
    - Unified MVAPICH environment variables with the MPICH CVAR interface (see the MPI_T sketch after this announcement)
    - Disabled the ch3:mrail device
    - Removed the ch3:psm device
    - Re-implemented MPI_T PVAR support

* Bug Fixes (since 2.3.7):
    - Fixed error in Slurm startup using PMI1
        * Thanks to Christof Kohler @Universitaet Bremen for the report
    - Fixed segfault in bespoke CPU mapping
        * Thanks to Brian Smith and James Erwin @Cornelis for the report

For downloading the MVAPICH 3.0rc library and the associated user guide, please visit the following URL:

http://mvapich.cse.ohio-state.edu

All questions, feedback, bug reports, hints for performance tuning, patches, and enhancements are welcome. Please post them to the mvapich-discuss mailing list (mvapich-discuss at lists.osu.edu).

Thanks,

The MVAPICH Team

PS: We are also happy to report that the number of organizations using MVAPICH2 libraries (and registered at the MVAPICH site) has crossed 3,325 worldwide (in 90 countries). The number of downloads from the MVAPICH site has crossed 1,735,000 (1.74 million). The MVAPICH team would like to thank all its users and organizations!!
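Since this release unifies MVAPICH's tuning knobs with MPICH's CVAR interface, the standard MPI_T tools interface can be used to discover the control variables a given build exposes. The sketch below uses only standard MPI_T calls, not MVAPICH-specific ones; which variables it prints depends on the library it is linked against.

    /* list_cvars.c -- enumerate control variables via the standard
     * MPI_T tools interface (how unified MVAPICH/MPICH CVARs can be
     * discovered at run time). */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int provided, num;
        MPI_T_init_thread(MPI_THREAD_SINGLE, &provided);
        MPI_Init(&argc, &argv);

        MPI_T_cvar_get_num(&num);
        for (int i = 0; i < num; i++) {
            char name[256], desc[512];
            int name_len = sizeof(name), desc_len = sizeof(desc);
            int verbosity, bind, scope;
            MPI_Datatype dt;
            MPI_T_enum et;
            MPI_T_cvar_get_info(i, name, &name_len, &verbosity,
                                &dt, &et, desc, &desc_len, &bind, &scope);
            printf("%s\n", name);   /* e.g., MPIR_CVAR_* names */
        }

        MPI_Finalize();
        MPI_T_finalize();
        return 0;
    }

Following the MPICH convention, a control variable reported here as MPIR_CVAR_FOO can typically also be set as an environment variable of the same name; consult the MVAPICH 3.0 user guide for the variables this release actually exposes.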