From panda at cse.ohio-state.edu Sat Mar 9 14:54:14 2024
From: panda at cse.ohio-state.edu (Panda, Dhabaleswar)
Date: Sat, 9 Mar 2024 19:54:14 +0000
Subject: [Mvapich] Announcing the release of MVAPICH-Plus 3.0 GA
Message-ID:

The MVAPICH team is pleased to announce the release of MVAPICH-Plus 3.0 GA.

The new MVAPICH-Plus series is an advanced version of the MVAPICH MPI
library. It is designed to support the unified feature set of MVAPICH2-GDR
and MVAPICH2-X. It is also designed to provide optimized support for modern
platforms (CPUs, GPUs, and interconnects) for HPC, Deep Learning, Machine
Learning, Big Data, and Data Science applications.

The major features and enhancements available in MVAPICH-Plus 3.0 GA are
as follows:

- Based on MVAPICH 3.0
- Support for various high-performance communication fabrics
    - InfiniBand, Slingshot-10/11, Omni-Path, OPX, RoCE, and Ethernet
- Support naive CPU staging approach for collectives for small messages
    - Tune naive limits for the following systems
        - Frontier at OLCF, Pitzer at OSC, Owens at OSC, Ascend at OSC,
          Frontera at TACC, Lonestar6 at TACC, ThetaGPU at ALCF,
          Polaris at ALCF, Tioga at LLNL
- Initial support for blocking collectives on NVIDIA and AMD GPUs
    - Allgather, Allgatherv, Allreduce, Alltoall, Alltoallv, Bcast,
      Gather, Gatherv, Reduce, Reduce_scatter, Scatter, Scatterv,
      Reduce_local, Reduce_scatter_block
- Initial support for non-blocking GPU collectives on NVIDIA and AMD GPUs
    - Iallgather, Iallgatherv, Iallreduce, Ialltoall, Ialltoallv, Ibcast,
      Igather, Igatherv, Ireduce, Ireduce_scatter, Iscatter, Iscatterv
- Enhanced collective and pt2pt tuning for NVIDIA Grace-Hopper systems
- Enhanced collective tuning for NVIDIA V100, A100, and H100 GPUs
- Enhanced collective tuning for AMD MI100 and MI250X GPUs
- Enhanced support for blocking and non-blocking GPU-to-GPU point-to-point
  operations on NVIDIA and AMD GPUs, taking advantage of:
    - NVIDIA GDRCopy, AMD LargeBar support
    - CUDA and ROCm IPC support
- Enhanced CPU tuning on various HPC systems and architectures
    - Stampede3 at TACC, Frontier at OLCF, Lonestar6 at TACC
    - AMD Rome, AMD Milan, Intel Sapphire Rapids
- Tested with
    - Various HPC applications, mini-applications, and benchmarks
    - HiDL, MPI4DL, and MCR-DL packages for MPI-driven distributed training
    - MPI4cuML (a custom cuML package with MPI support) for scalable
      machine learning
    - CUDA <= 12.3
    - ROCm <= 5.6.0

To download the MVAPICH-Plus 3.0 GA library and the associated user guide,
please visit the following URL:

http://mvapich.cse.ohio-state.edu

All questions, feedback, bug reports, hints for performance tuning,
patches, and enhancements are welcome. Please post them to the
mvapich-discuss mailing list (mvapich-discuss at lists.osu.edu).

Thanks,

The MVAPICH Team

PS: We are also happy to report that the number of organizations using
MVAPICH libraries (and registered at the MVAPICH site) has crossed 3,375
worldwide (in 91 countries). The number of downloads from the MVAPICH site
has crossed 1,765,000 (1.765 million). The MVAPICH team would like to thank
all its users and organizations!!

From subramoni.1 at osu.edu Mon Mar 11 11:07:00 2024
From: subramoni.1 at osu.edu (Subramoni, Hari)
Date: Mon, 11 Mar 2024 15:07:00 +0000
Subject: [Mvapich] Announcing the Release of OSU InfiniBand Analysis and Monitoring (INAM) Tool v1.1
Message-ID:

The MVAPICH team is pleased to announce the release of OSU InfiniBand
Network Analysis and Monitoring (INAM) Tool v1.1.

OSU INAM monitors InfiniBand clusters in real time by querying various
subnet management entities in the network. It can also interact with the
MVAPICH2-X software stack to gain insights into the communication pattern
of the application and classify the data transferred into Point-to-Point,
Collective, and Remote Memory Access (RMA). OSU INAM can also remotely
monitor several parameters of MPI processes in conjunction with MVAPICH2-X.

OSU INAM v1.1 (03/11/2024)

* Major Features & Enhancements (since 1.0):
    - Support for ClickHouse Database to support real-time querying and
      visualization of very large HPC clusters (20,000+ nodes)
    - Support for up to 64 parallel insertions for multiple sources of
      profiling data
    - Support for up to 64 concurrent users to access OSU INAM with
      sub-second latency by using ClickHouse
    - Improved stability of OSU INAM operation
    - Reduced disk space usage by using ClickHouse
    - Changed the default bulk insertion size based on the database used,
      to improve the real-time view of network traffic
    - Extended notifications to support multiple criteria

* Bug fixes
    - Fix issues loading certain switch nickname files
    - Fix a bug in showing link-level information for live jobs

To download OSU INAM v1.1 and the associated user guide, please visit the
following URL:

http://mvapich.cse.ohio-state.edu

All questions, feedback, bug reports, hints for performance tuning, and
enhancements are welcome. Please post them to the mvapich-discuss mailing
list (mvapich-discuss at lists.osu.edu).

Thanks,

The MVAPICH Team

PS: We are also happy to report that the number of organizations using
MVAPICH libraries (and registered at the MVAPICH site) has crossed 3,375
worldwide (in 91 countries). The number of downloads from the MVAPICH site
has crossed 1,765,000 (1.765 million). The MVAPICH team would like to thank
all its users and organizations!!

From panda at cse.ohio-state.edu Mon Mar 11 18:41:45 2024
From: panda at cse.ohio-state.edu (Panda, Dhabaleswar K.)
Date: Mon, 11 Mar 2024 22:41:45 +0000
Subject: [Mvapich] Save the Dates for MUG '24 Conference
Message-ID:

We are happy to announce that the 12th annual MVAPICH User Group (MUG)
conference will take place in Columbus, OH, USA during August 19-21, 2024.
It will be an in-person event with an option for remote attendance.

Please save the dates and stay tuned for future announcements!!

More details on the conference are available from
http://mug.mvapich.cse.ohio-state.edu/

Thanks,

The MUG '24 Organizers

PS: Interested in getting announcements related to the MUG events? Please
subscribe to the MUG Conference Mailing list (available from the MUG
conference page).