[mvapich-discuss] Question on the new APIs for checking CUDA support

Fang, Leo leofang at bnl.gov
Wed Jul 1 12:35:19 EDT 2020


Hello Dhabaleswar and the MVAPICH team,


Thank you for making the new release! Regarding the new feature ("Added compilation and runtime methods for checking CUDA support"), I’d like to know how to use these methods. Are they documented somewhere? I couldn’t find anything…
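For reference, Open MPI exposes a compile-time macro MPIX_CUDA_AWARE_SUPPORT (in mpi-ext.h) and a runtime function MPIX_Query_cuda_support() for this purpose. Something along these lines is what I would expect, assuming MVAPICH2-GDR follows the same MPIX convention; the names in the sketch below are my guess and are not confirmed from the MVAPICH2-GDR documentation:

    /* Hedged sketch: compile-time and runtime CUDA-support checks.
     * Assumes an Open MPI-style MPIX convention; the macro and function
     * names are assumptions, not confirmed MVAPICH2-GDR API. */
    #include <stdio.h>
    #include <mpi.h>
    #if defined(__has_include)
    # if __has_include(<mpi-ext.h>)
    #  include <mpi-ext.h>   /* may define MPIX_CUDA_AWARE_SUPPORT */
    # endif
    #endif

    int main(int argc, char *argv[])
    {
        MPI_Init(&argc, &argv);

        /* Compile-time check: macro reflects how the library was built */
    #if defined(MPIX_CUDA_AWARE_SUPPORT) && MPIX_CUDA_AWARE_SUPPORT
        printf("Compile time: built with CUDA-aware support\n");
    #elif defined(MPIX_CUDA_AWARE_SUPPORT)
        printf("Compile time: built without CUDA-aware support\n");
    #else
        printf("Compile time: no CUDA-awareness macro exposed\n");
    #endif

        /* Runtime check: reports whether CUDA support is active right now */
    #if defined(MPIX_CUDA_AWARE_SUPPORT)
        printf("Run time: CUDA support is %s\n",
               MPIX_Query_cuda_support() ? "enabled" : "disabled");
    #endif

        MPI_Finalize();
        return 0;
    }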

Thanks.


Sincerely,
Leo

---
Yao-Lung Leo Fang
Assistant Computational Scientist
Computational Science Initiative
Brookhaven National Laboratory
Bldg. 725, Room 2-169
P.O. Box 5000, Upton, NY 11973-5000
Office: (631) 344-3265
Email: leofang at bnl.gov
Website: https://leofang.github.io/

Panda, Dhabaleswar <panda at cse.ohio-state.edu> wrote on June 4, 2020 at 10:32 PM:

The MVAPICH team is pleased to announce the release of MVAPICH2-GDR
2.3.4 GA.

MVAPICH2-GDR 2.3.4 is based on the standard MVAPICH2 2.3.4 release and
incorporates designs that take advantage of the GPUDirect RDMA (GDR)
technology for inter-node data movement on NVIDIA GPU clusters with
Mellanox InfiniBand interconnect. It also provides support for DGX-2,
OpenPOWER and NVLink2, GDRCopyv2, efficient intra-node CUDA-Aware
unified memory communication and support for RDMA_CM, RoCE-V1, and
RoCE-V2. Further, MVAPICH2-GDR 2.3.4 provides optimized large message
collectives (broadcast, reduce, and allreduce) for emerging Deep
Learning and Streaming frameworks.

Features, Enhancements, and Bug Fixes for MVAPICH2-GDR 2.3.4 GA are
listed below.

* Features and Enhancements (Since 2.3.3)

   - Based on MVAPICH2 2.3.4
   - Enhanced MPI_Allreduce performance on DGX-2 systems
   - Enhanced MPI_Allreduce performance on POWER9 systems
   - Reduced the CUDA interception overhead for non-CUDA symbols
   - Enhanced performance for point-to-point and collective operations on
     Frontera's RTX nodes
   - Added new runtime variable 'MV2_SUPPORT_DL' to replace
     'MV2_SUPPORT_TENSOR_FLOW'
   - Added compilation and runtime methods for checking CUDA support
   - Enhanced GDR output for runtime variable MV2_SHOW_ENV_INFO
   - Tested with Horovod and common DL Frameworks (TensorFlow, PyTorch, and
     MXNet)
   - Tested with PyTorch Distributed

* Bug Fixes (Since 2.3.3)

   - Fix hang caused by the use of multiple communicators
   - Fix detection of Intel CPU Model name
   - Fix intermediate buffer size for Allreduce when DL workload is expected
   - Fix the random hangs in IMB4-RMA tests
   - Fix hang in OMP offloading
   - Fix hang with -w dynamic option when using one-sided benchmarks for
     device buffers
   - Add proper fallback and warning message when shared RMA window cannot be
     created
   - Fix potential FP exception error in MPI_Allreduce
     - Thanks to Shinichiro Takizawa at AIST for the report
   - Fix data validation issue of MPI_Allreduce
     - Thanks to Andreas Herten at JSC for the report
   - Fix the need for preloading libmpi.so
     - Thanks to Andreas Herten at JSC for the feedback
   - Fix compilation warnings and memory leaks

Further, MVAPICH2-GDR 2.3.4 GA provides support on GPU clusters using
regular OFED (without GPUDirect RDMA).

MVAPICH2-GDR 2.3.4 GA continues to deliver excellent performance. It
provides inter-node Device-to-Device latency of 1.85 microseconds (8
bytes) with CUDA 10.1 and Volta GPUs. On OpenPOWER platforms with
NVLink2, it delivers up to 70.4 GBps unidirectional intra-node
Device-to-Device bandwidth for large messages. On DGX-2 platforms, it
delivers up to 144.79 GBps unidirectional intra-node Device-to-Device
bandwidth for large messages. More performance numbers are available
from the MVAPICH website (under the Performance link).
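These device-to-device numbers come from OSU Micro-Benchmark style latency tests that pass device buffers directly to MPI calls. A minimal sketch of such a latency ping-pong is shown below, assuming a CUDA-aware build, two ranks, and one GPU per rank; the message size and iteration count are illustrative.

    /* Minimal device-to-device latency ping-pong sketch (OSU-style).
     * Assumes a CUDA-aware MPI build so cudaMalloc'd pointers can be
     * passed directly to MPI; size and iteration count are illustrative. */
    #include <stdio.h>
    #include <mpi.h>
    #include <cuda_runtime.h>

    int main(int argc, char *argv[])
    {
        const int msg_size = 8;      /* 8-byte messages, as quoted above */
        const int iters = 1000;
        char *d_buf;
        int rank, nranks;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nranks);
        if (nranks < 2) {
            if (rank == 0) fprintf(stderr, "run with at least 2 ranks\n");
            MPI_Abort(MPI_COMM_WORLD, 1);
        }

        cudaSetDevice(0);                      /* one GPU per rank assumed */
        cudaMalloc((void **)&d_buf, msg_size);

        MPI_Barrier(MPI_COMM_WORLD);
        double t0 = MPI_Wtime();
        for (int i = 0; i < iters; i++) {
            if (rank == 0) {
                MPI_Send(d_buf, msg_size, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
                MPI_Recv(d_buf, msg_size, MPI_CHAR, 1, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
            } else if (rank == 1) {
                MPI_Recv(d_buf, msg_size, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
                MPI_Send(d_buf, msg_size, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
            }
        }
        double t1 = MPI_Wtime();

        if (rank == 0)
            printf("one-way D2D latency: %.2f us\n",
                   (t1 - t0) * 1e6 / (2.0 * iters));

        cudaFree(d_buf);
        MPI_Finalize();
        return 0;
    }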

To download MVAPICH2-GDR 2.3.4 GA and the associated user guide and
quick start guide, please visit the following URL:

http://mvapich.cse.ohio-state.edu

All questions, feedback, bug reports, hints for performance tuning,
patches, and enhancements are welcome. Please post them to the
mvapich-discuss mailing list (mvapich-discuss at cse.ohio-state.edu).

Thanks,

The MVAPICH Team

PS: We are also happy to report that the number of organizations using
MVAPICH2 libraries (and registered at the MVAPICH site) has crossed
3,075 worldwide (in 89 countries). The number of downloads from the
MVAPICH site has crossed 760,000 (0.76 million). The MVAPICH team
would like to thank all of its users and organizations!

_______________________________________________
mvapich-discuss mailing list
mvapich-discuss at cse.ohio-state.edu
http://mailman.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
