From panda at cse.ohio-state.edu  Fri May 13 00:42:23 2022
From: panda at cse.ohio-state.edu (Panda, Dhabaleswar)
Date: Fri, 13 May 2022 04:42:23 +0000
Subject: [Mvapich] Save the dates for the 10th Annual MVAPICH User Group Conference (MUG '22)
In-Reply-To:
References:
Message-ID:

We have finalized the dates for the 10th annual MVAPICH User Group Conference (MUG '22). It will be held from August 22nd to 24th. As in previous years, the first day (August 22nd) will feature tutorials by sponsors, and the main conference will take place on August 23rd and 24th. More details are available at

http://mug.mvapich.cse.ohio-state.edu/

At this time, we are planning for the event to be held in person with an option for remote attendance. As in previous years, the in-person event will be held in Columbus, Ohio, USA. We encourage all speakers, and as many attendees as possible, to attend the event in person.

Please save these dates on your calendar. More information about the conference will be provided during the coming weeks.

Looking forward to your participation in MUG '22!!

Thanks,

The MVAPICH Team

From panda at cse.ohio-state.edu  Thu May 19 04:46:36 2022
From: panda at cse.ohio-state.edu (Panda, Dhabaleswar)
Date: Thu, 19 May 2022 08:46:36 +0000
Subject: [Mvapich] Join the MVAPICH team for multiple events at the ISC '22 conference (May 29 - June 2, 2022)
Message-ID:

MVAPICH team members will be participating in multiple events at the upcoming ISC '22 conference, to be held in Hamburg, Germany. Join us for these events and interact with the project team members!! More details of the events are available at:

https://mvapich.cse.ohio-state.edu/conference/896/talks/

Thanks,

The MVAPICH Team

From panda at cse.ohio-state.edu  Sat May 28 07:56:10 2022
From: panda at cse.ohio-state.edu (Panda, Dhabaleswar)
Date: Sat, 28 May 2022 11:56:10 +0000
Subject: [Mvapich] Announcing the release of MVAPICH2-GDR 2.3.7 GA
Message-ID:

The MVAPICH team is pleased to announce the release of MVAPICH2-GDR 2.3.7 GA.

The MVAPICH2-GDR 2.3.7 release incorporates several novel features, as listed below:

* Support for 'on-the-fly' compression of point-to-point messages used for GPU-to-GPU communication on NVIDIA GPUs

* Support for hybrid communication protocols using NCCL-based, CUDA-based, and IB verbs-based primitives for the following MPI blocking and non-blocking collective operations:
    - MPI_Allreduce, MPI_Reduce, MPI_Allgather, MPI_Allgatherv, MPI_Alltoall, MPI_Alltoallv, MPI_Scatter, MPI_Scatterv, MPI_Gather, MPI_Gatherv, and MPI_Bcast
    - MPI_Iallreduce, MPI_Ireduce, MPI_Iallgather, MPI_Iallgatherv, MPI_Ialltoall, MPI_Ialltoallv, MPI_Iscatter, MPI_Iscatterv, MPI_Igather, MPI_Igatherv, and MPI_Ibcast

* Full support for NVIDIA DGX, NVIDIA DGX V-100, NVIDIA DGX A-100, and AMD systems with Mi100 GPUs

MVAPICH2-GDR 2.3.7 provides optimized support at the MPI level for HPC, deep learning, machine learning, and data science workloads. These include efficient large-message collectives (e.g., Allreduce) on CPUs and GPUs, and GPU-Direct algorithms for all collective operations (including those commonly used for model parallelism, e.g., Allgather and Alltoall).

MVAPICH2-GDR 2.3.7 is based on the standard MVAPICH2 2.3.7 release and incorporates designs that take advantage of GPUDirect RDMA (GDR) on NVIDIA GPUs and ROCmRDMA on AMD GPUs for inter-node data movement on GPU clusters with Mellanox InfiniBand interconnects. It also provides support for DGX-2, OpenPOWER with NVLink2, GDRCopy v2, efficient intra-node CUDA-aware unified memory communication, and support for RDMA_CM, RoCE-V1, and RoCE-V2.
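As a quick orientation for readers new to the library, here is a minimal sketch of the CUDA-aware usage model that the point-to-point features above apply to: device pointers obtained from cudaMalloc are passed directly to MPI calls, and the library moves the data GPU-to-GPU internally (via GPUDirect RDMA, with the new on-the-fly compression applied to eligible messages). This is an illustrative sketch, not part of the official release material; the message size and tag are arbitrary, and CUDA error checking is omitted for brevity.

    /* Sketch: GPU-to-GPU point-to-point transfer with a CUDA-aware MPI
     * library such as MVAPICH2-GDR. Device pointers go straight into
     * MPI_Send/MPI_Recv; no staging copy through host memory is needed. */
    #include <mpi.h>
    #include <cuda_runtime.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        if (size < 2) {
            if (rank == 0) fprintf(stderr, "Run with at least 2 ranks\n");
            MPI_Abort(MPI_COMM_WORLD, 1);
        }

        const size_t n = 1 << 20;        /* 1 Mi floats, ~4 MB */
        float *dbuf;                     /* buffer resides on the GPU */
        cudaMalloc((void **)&dbuf, n * sizeof(float));
        cudaMemset(dbuf, 0, n * sizeof(float));

        /* The device pointer is passed directly to MPI. */
        if (rank == 0)
            MPI_Send(dbuf, (int)n, MPI_FLOAT, 1, 42, MPI_COMM_WORLD);
        else if (rank == 1)
            MPI_Recv(dbuf, (int)n, MPI_FLOAT, 0, 42, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);

        cudaFree(dbuf);
        MPI_Finalize();
        return 0;
    }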
Features, Enhancements, and Bug Fixes for MVAPICH2-GDR 2.3.7 GA are listed below.

* Features and Enhancements (since 2.3.6)
    - Enhanced performance for GPU-aware MPI_Alltoall and MPI_Alltoallv
    - Added automatic rebinding of processes to cores based on the GPU NUMA domain
        - Enabled by setting the environment variable MV2_GPU_AUTO_REBIND=1 (a usage sketch appears at the end of this message)
    - Added an NCCL communication substrate for various non-blocking MPI collectives:
        - MPI_Iallreduce, MPI_Ireduce, MPI_Iallgather, MPI_Iallgatherv, MPI_Ialltoall, MPI_Ialltoallv, MPI_Iscatter, MPI_Iscatterv, MPI_Igather, MPI_Igatherv, and MPI_Ibcast
    - Enhanced point-to-point and collective tuning for AMD Milan processors with NVIDIA A-100 and AMD Mi100 GPUs
    - Enhanced point-to-point and collective tuning for NVIDIA DGX A-100 systems
    - Added support for the Cray Slingshot-10 interconnect

Further, MVAPICH2-GDR 2.3.7 GA provides support for GPU clusters using regular OFED (without GPUDirect RDMA).

MVAPICH2-GDR 2.3.7 GA continues to deliver excellent performance. It provides an inter-node Device-to-Device latency of 1.85 microseconds (8 bytes) with CUDA 10.1 and Volta GPUs. On OpenPOWER platforms with NVLink2, it delivers up to 70.4 GBps unidirectional intra-node Device-to-Device bandwidth for large messages. On DGX-2 platforms, it delivers up to 144.79 GBps unidirectional intra-node Device-to-Device bandwidth for large messages. More performance numbers are available from the MVAPICH website (under Performance -> MV2-GDR -> CUDA).

For downloading MVAPICH2-GDR 2.3.7 GA and the associated user guides, please visit the following URL:

http://mvapich.cse.ohio-state.edu

All questions, feedback, bug reports, hints for performance tuning, patches, and enhancements are welcome. Please post them to the mvapich-discuss mailing list (mvapich-discuss at lists.osu.edu).

Thanks,

The MVAPICH Team

PS: We are also happy to inform you that the number of organizations using the MVAPICH2 libraries (and registered at the MVAPICH site) has crossed 3,200 worldwide (in 89 countries). The number of downloads from the MVAPICH site has crossed 1,590,000 (1.59 million). The MVAPICH team would like to thank all its users and organizations!!
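As referenced in the feature list above, here is a minimal sketch of how the new non-blocking GPU collectives and the MV2_GPU_AUTO_REBIND setting might be exercised. The buffer length is an arbitrary illustration value; the choice of substrate (NCCL, CUDA, or IB verbs based) is internal to the library. In the launch line, the hostnames and program name are placeholders, and MV2_USE_CUDA=1 follows the usual MVAPICH2 convention for enabling CUDA-aware communication.

    /* Sketch: non-blocking MPI_Iallreduce on GPU-resident buffers,
     * overlapping the reduction with independent computation. */
    #include <mpi.h>
    #include <cuda_runtime.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        const int n = 1 << 22;           /* 4 Mi floats, ~16 MB */
        float *sendbuf, *recvbuf;        /* both buffers on the GPU */
        cudaMalloc((void **)&sendbuf, n * sizeof(float));
        cudaMalloc((void **)&recvbuf, n * sizeof(float));
        cudaMemset(sendbuf, 0, n * sizeof(float));

        MPI_Request req;
        MPI_Iallreduce(sendbuf, recvbuf, n, MPI_FLOAT, MPI_SUM,
                       MPI_COMM_WORLD, &req);

        /* ... independent work can proceed here ... */

        MPI_Wait(&req, MPI_STATUS_IGNORE);

        cudaFree(sendbuf);
        cudaFree(recvbuf);
        MPI_Finalize();
        return 0;
    }

A possible build and launch, with node01/node02 as placeholder hostnames:

    $ mpicc gpu_iallreduce.c -o gpu_iallreduce -lcudart
    $ mpirun_rsh -np 2 node01 node02 MV2_USE_CUDA=1 MV2_GPU_AUTO_REBIND=1 ./gpu_iallreduce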