[Mvapich-discuss] MVAPICH2 GDR from source code?
Shineman, Nat
shineman.5 at osu.edu
Wed Jan 12 11:16:45 EST 2022
Hi John,
Can you tell us the ofed version on your system?
Thanks,
Nat
________________________________
From: John Moore <john at flexcompute.com>
Sent: Wednesday, January 12, 2022 11:14
To: Shineman, Nat <shineman.5 at osu.edu>
Cc: Panda, Dhabaleswar <panda at cse.ohio-state.edu>; Maitham Alhubail <maitham at flexcompute.com>; mvapich-discuss at lists.osu.edu <mvapich-discuss at lists.osu.edu>
Subject: Re: [Mvapich-discuss] MVAPICH2 GDR from source code?
Hi Nat,
We have been struggling to get the RPM to work for about a week now. We are using this RPM:
http://mvapich.cse.ohio-state.edu/download/mvapich/gdr/2.3.6/mofed5.4/mvapich2-gdr-cuda11.3.mofed5.4.gnu8.4.1-2.3.6-1.el8.x86_64.rpm
If you could build us a custom RPM for our system, that would be very helpful.
We're running Ubuntu 20.04 with kernel 5.4.0-92-generic.
GCC version: gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
CUDA version: 11.4
CUDA driver: 470.82.01
Please let me know if there is any other information that you need.
Thanks,
John
On Wed, Jan 12, 2022 at 9:26 AM Shineman, Nat <shineman.5 at osu.edu> wrote:
Hi John,
You should be able to use the RPMs on Ubuntu by converting them with alien. Regarding the CUDA and compiler versioning, you will want to make sure CUDA is an exact match, but the compiler should only need to be the same major version. You will also want to make sure that you match the mofed major version as well, though we recommend matching the exact version if possible. Please take a look at the download page and see if any of the RPMs there match your needs. Otherwise, we would be happy to generate a custom RPM based on your system specifications.
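The three matching rules above (exact CUDA version, same compiler major version, same MOFED major version with an exact match preferred) can be checked mechanically. The following is a minimal Python sketch, not an MVAPICH tool. It assumes that the version fields can be read from the RPM file name in the layout of the URL quoted in this thread (`cuda11.3.mofed5.4.gnu8.4.1`) and that "major version" means the first dotted component. Both are assumptions, not documented conventions. Given the system details John reports (GCC 9.3.0, CUDA 11.4), it flags the CUDA and GCC mismatches and leaves MOFED open, because the thread does not state that version.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# ASSUMPTION: layout inferred from the one RPM name quoted in this thread
# (mvapich2-gdr-cuda11.3.mofed5.4.gnu8.4.1-2.3.6-1.el8.x86_64.rpm);
# it is not a documented naming scheme.
RPM_NAME = re.compile(
    r"mvapich2-gdr-cuda(?P<cuda>[\d.]+)"
    r"\.mofed(?P<mofed>[\d.]+)"
    r"\.gnu(?P<gcc>[\d.]+)-"
)


@dataclass
class Toolchain:
    cuda: str
    gcc: str
    mofed: Optional[str] = None  # None means "not known yet"


def parse_rpm_name(name: str) -> Toolchain:
    """Extract the CUDA, GCC and MOFED versions encoded in the RPM file name."""
    m = RPM_NAME.search(name)
    if m is None:
        raise ValueError(f"unrecognized RPM name: {name}")
    return Toolchain(cuda=m["cuda"], gcc=m["gcc"], mofed=m["mofed"])


def major(version: str) -> str:
    # ASSUMPTION: "major version" = first dotted component (9.3.0 -> 9, 5.4 -> 5).
    return version.split(".")[0]


def check(rpm: Toolchain, system: Toolchain) -> List[str]:
    """Apply the matching rules from the thread; return a list of problems."""
    problems = []
    # Rule 1: CUDA must match exactly.
    if rpm.cuda != system.cuda:
        problems.append(f"CUDA: RPM built for {rpm.cuda}, system has {system.cuda}")
    # Rule 2: the compiler only needs the same major version.
    if major(rpm.gcc) != major(system.gcc):
        problems.append(f"GCC: RPM built with {rpm.gcc}, system has {system.gcc}")
    # Rule 3: MOFED major version must match; an exact match is recommended.
    if system.mofed is None or rpm.mofed is None:
        problems.append("MOFED: version unknown, check it before choosing an RPM")
    elif major(rpm.mofed) != major(system.mofed):
        problems.append(f"MOFED: RPM built for {rpm.mofed}, system has {system.mofed}")
    elif rpm.mofed != system.mofed:
        problems.append(
            f"MOFED: {rpm.mofed} vs {system.mofed} (major matches; exact match preferred)"
        )
    return problems


if __name__ == "__main__":
    rpm = parse_rpm_name(
        "mvapich2-gdr-cuda11.3.mofed5.4.gnu8.4.1-2.3.6-1.el8.x86_64.rpm"
    )
    # System values reported in the thread; the MOFED version was not given.
    system = Toolchain(cuda="11.4", gcc="9.3.0")
    for problem in check(rpm, system):
        print(problem)
```

With an empty problem list, the remaining step in this thread's advice would be converting the RPM for Ubuntu with alien.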
Thanks,
Nat
________________________________
From: Mvapich-discuss <mvapich-discuss-bounces+shineman.5=osu.edu at lists.osu.edu> on behalf of John Moore via Mvapich-discuss <mvapich-discuss at lists.osu.edu>
Sent: Tuesday, January 11, 2022 14:58
To: Panda, Dhabaleswar <panda at cse.ohio-state.edu>
Cc: Maitham Alhubail <maitham at flexcompute.com>; mvapich-discuss at lists.osu.edu <mvapich-discuss at lists.osu.edu>
Subject: Re: [Mvapich-discuss] MVAPICH2 GDR from source code?
Hi DK,
Do the CUDA and GCC versions on our system need to match the RPM exactly? We are running on Ubuntu, and there is no GCC 8.4.1 on Ubuntu.
Thank you,
John
On Tue, Jan 11, 2022 at 2:55 PM Panda, Dhabaleswar <panda at cse.ohio-state.edu> wrote:
Hi,
Thanks for your note. For GPU support with MVAPICH2, it is strongly recommended to use the MVAPICH2-GDR package. This package supports many features related to GPUs and delivers the best performance and scalability on GPU clusters. Please use a suitable RPM package from the MVAPICH2-GDR download page for your system. Please refer to the corresponding user guide also. The MVAPICH2-GDR package can also be installed through Spack. Let us know if you experience any issues in using the MVAPICH2-GDR package on your GPU cluster.
Thanks,
DK
________________________________________
From: Mvapich-discuss <mvapich-discuss-bounces+panda.2=osu.edu at lists.osu.edu> on behalf of John Moore via Mvapich-discuss <mvapich-discuss at lists.osu.edu>
Sent: Tuesday, January 11, 2022 2:48 PM
To: mvapich-discuss at lists.osu.edu
Cc: Maitham Alhubail
Subject: [Mvapich-discuss] MVAPICH2 GDR from source code?
Hello,
We have been struggling to get MVAPICH2 to work with CUDA-aware support and RDMA. We compiled MVAPICH2 from source with the --enable-cuda option, but when we run the osu_bibw bandwidth test with Device-to-Device communication, we get a segmentation fault.
Below is the output from osu_bibw using MVAPICH2:
MVAPICH2-2.3.6 Parameters
---------------------------------------------------------------------
PROCESSOR ARCH NAME : MV2_ARCH_AMD_EPYC_7401_48
PROCESSOR FAMILY NAME : MV2_CPU_FAMILY_AMD
PROCESSOR MODEL NUMBER : 1
HCA NAME : MV2_HCA_MLX_CX_HDR
HETEROGENEOUS HCA : NO
MV2_EAGERSIZE_1SC : 0
MV2_SMP_EAGERSIZE : 16385
MV2_SMP_QUEUE_LENGTH : 65536
MV2_SMP_NUM_SEND_BUFFER : 16
MV2_SMP_BATCH_SIZE : 8
Tuning Table: : MV2_ARCH_AMD_EPYC_7401_48 MV2_HCA_MLX_CX_HDR
---------------------------------------------------------------------
# OSU MPI-CUDA Bi-Directional Bandwidth Test v5.7.1
# Send Buffer on DEVICE (D) and Receive Buffer on DEVICE (D)
# Size Bandwidth (MB/s)
1 0.07
2 0.15
4 0.29
8 0.57
16 1.12
32 2.30
64 4.75
128 9.41
256 18.44
512 37.22
1024 74.82
2048 144.70
4096 289.96
8192 577.33
[cell3:mpi_rank_0][error_sighandler] Caught error: Segmentation fault (signal 11)
[cell3:mpi_rank_1][error_sighandler] Caught error: Segmentation fault (signal 11)
===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= PID 471850 RUNNING AT cell3
= EXIT CODE: 139
= CLEANING UP REMAINING PROCESSES
= YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
And this is with OpenMPI:
# OSU MPI-CUDA Bi-Directional Bandwidth Test v5.8
# Send Buffer on DEVICE (D) and Receive Buffer on DEVICE (D)
# Size Bandwidth (MB/s)
1 0.43
2 0.83
4 1.68
8 3.37
16 6.72
32 13.42
64 27.02
128 53.78
256 107.88
512 219.45
1024 437.81
2048 875.12
4096 1747.23
8192 3528.97
16384 7015.15
32768 13973.59
65536 27702.68
131072 51877.67
262144 94556.99
524288 157755.18
1048576 236772.67
2097152 333635.13
4194304 408865.93
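To put the two runs side by side, here is a short Python sketch that computes the OpenMPI/MVAPICH2 bandwidth ratio at each message size the MVAPICH2 run completed. The numbers are copied from the two outputs above and nothing is re-measured. Note that the runs used different OSU benchmark versions (v5.7.1 and v5.8). At every size from 1 to 8192 bytes, the OpenMPI run reports roughly 5.5 to 6 times the MVAPICH2 bandwidth. The MVAPICH2 run segfaults before it reports any larger size.

```python
# Device-to-device bandwidth (MB/s) from the two osu_bibw runs in this message.
MVAPICH2 = {
    1: 0.07, 2: 0.15, 4: 0.29, 8: 0.57, 16: 1.12, 32: 2.30, 64: 4.75,
    128: 9.41, 256: 18.44, 512: 37.22, 1024: 74.82, 2048: 144.70,
    4096: 289.96, 8192: 577.33,
}
OPENMPI = {
    1: 0.43, 2: 0.83, 4: 1.68, 8: 3.37, 16: 6.72, 32: 13.42, 64: 27.02,
    128: 53.78, 256: 107.88, 512: 219.45, 1024: 437.81, 2048: 875.12,
    4096: 1747.23, 8192: 3528.97, 16384: 7015.15, 32768: 13973.59,
    65536: 27702.68, 131072: 51877.67, 262144: 94556.99,
    524288: 157755.18, 1048576: 236772.67, 2097152: 333635.13,
    4194304: 408865.93,
}


def ratios(slow, fast):
    """Return {size: fast/slow} for every message size present in both runs."""
    return {s: fast[s] / slow[s] for s in sorted(slow) if s in fast}


if __name__ == "__main__":
    print(f"{'Size':>6} {'MVAPICH2':>10} {'OpenMPI':>10} {'Ratio':>6}")
    for size, r in ratios(MVAPICH2, OPENMPI).items():
        print(f"{size:>6} {MVAPICH2[size]:>10.2f} {OPENMPI[size]:>10.2f} {r:>5.1f}x")
```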
Can GDR support be obtained by compiling from source, as we are trying to do, or do we have to use an RPM? We export MV2_USE_CUDA=1. Any recommendations would be greatly appreciated.
Thanks,
John