[Mvapich-discuss] MVAPICH2 GDR from source code?
John Moore
john at flexcompute.com
Wed Jan 12 11:21:05 EST 2022
Great, thank you.
On Wed, Jan 12, 2022 at 11:20 AM Shineman, Nat <shineman.5 at osu.edu> wrote:
> John,
>
> Thanks, we will get started on generating this RPM shortly.
>
> Nat
> ------------------------------
> *From:* John Moore <john at flexcompute.com>
> *Sent:* Wednesday, January 12, 2022 11:19
> *To:* Shineman, Nat <shineman.5 at osu.edu>
> *Cc:* Panda, Dhabaleswar <panda at cse.ohio-state.edu>; Maitham Alhubail <
> maitham at flexcompute.com>; mvapich-discuss at lists.osu.edu <
> mvapich-discuss at lists.osu.edu>
> *Subject:* Re: [Mvapich-discuss] MVAPICH2 GDR from source code?
>
> Hi Nat,
>
> We are using: MLNX_OFED_LINUX-5.5-1.0.3.2-ubuntu20.04-x86_64
>
> Thanks,
> John
>
> On Wed, Jan 12, 2022 at 11:16 AM Shineman, Nat <shineman.5 at osu.edu> wrote:
>
> Hi John,
>
> Can you tell us the OFED version on your system?
>
> Thanks,
> Nat
> ------------------------------
> *From:* John Moore <john at flexcompute.com>
> *Sent:* Wednesday, January 12, 2022 11:14
> *To:* Shineman, Nat <shineman.5 at osu.edu>
> *Cc:* Panda, Dhabaleswar <panda at cse.ohio-state.edu>; Maitham Alhubail <
> maitham at flexcompute.com>; mvapich-discuss at lists.osu.edu <
> mvapich-discuss at lists.osu.edu>
> *Subject:* Re: [Mvapich-discuss] MVAPICH2 GDR from source code?
>
> Hi Nat,
>
> We have been struggling to get the RPM to work for us -- we've been
> working on it for about a week. We are using this RPM:
>
> http://mvapich.cse.ohio-state.edu/download/mvapich/gdr/2.3.6/mofed5.4/mvapich2-gdr-cuda11.3.mofed5.4.gnu8.4.1-2.3.6-1.el8.x86_64.rpm
>
> If you could build us a custom RPM for our system, that would be very
> helpful.
>
> We're running Ubuntu 20.04 with kernel 5.4.0-92-generic
>
> GCC version is: gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
>
> CUDA version is CUDA 11.4
> CUDA driver: 470.82.01
>
> Please let me know if there is any other information that you need.
>
> Thanks,
> John
>
>
> On Wed, Jan 12, 2022 at 9:26 AM Shineman, Nat <shineman.5 at osu.edu> wrote:
>
> Hi John,
>
> You should be able to use the RPMs on Ubuntu by converting them with
> alien. Regarding the CUDA and compiler versioning, you will want to make
> sure CUDA is an exact match, but the compiler should only need to be the
> same major version. You will also want to match the
> MOFED major version, though we recommend matching the exact version
> if possible. Please take a look at the download page and see if any of the
> RPMs there match your needs. Otherwise, we would be happy to generate a
> custom RPM based on your system specifications.
>
> Thanks,
> Nat
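
The rule of thumb above (exact CUDA version, same compiler major version, same MOFED major version) can be sketched as a small check. This is only an illustration: the helper names are hypothetical, and the filename layout (`cuda…`, `mofed…`, `gnu…`) is inferred from the single RPM name quoted in this thread, not from a documented naming scheme.

```python
import re

# Assumption: RPM names follow the pattern seen in this thread,
# e.g. "...-cuda11.3.mofed5.4.gnu8.4.1-...". Other names may differ.
RPM_PATTERN = re.compile(
    r"cuda(?P<cuda>\d+(?:\.\d+)*)"
    r"\.mofed(?P<mofed>\d+(?:\.\d+)*)"
    r"\.gnu(?P<gcc>\d+(?:\.\d+)*)-"
)


def parse_rpm_name(name):
    """Extract the CUDA, MOFED, and GCC versions encoded in an RPM file name."""
    match = RPM_PATTERN.search(name)
    if match is None:
        raise ValueError(f"unrecognized RPM name: {name}")
    return match.groupdict()


def major(version):
    """Return the leading component of a dotted version string."""
    return version.split(".")[0]


def check_match(rpm, system):
    """Apply the rule of thumb: exact CUDA, same GCC major, same MOFED major."""
    return {
        "cuda": rpm["cuda"] == system["cuda"],
        "gcc": major(rpm["gcc"]) == major(system["gcc"]),
        "mofed": major(rpm["mofed"]) == major(system["mofed"]),
    }


# Values taken from the messages in this thread.
rpm = parse_rpm_name(
    "mvapich2-gdr-cuda11.3.mofed5.4.gnu8.4.1-2.3.6-1.el8.x86_64.rpm"
)
system = {"cuda": "11.4", "gcc": "9.3.0", "mofed": "5.5-1.0.3.2"}
result = check_match(rpm, system)
print(result)  # {'cuda': False, 'gcc': False, 'mofed': True}
```

For the system described in this thread, only the MOFED major version lines up with that RPM; the CUDA version and the GCC major version both differ.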
> ------------------------------
> *From:* Mvapich-discuss <mvapich-discuss-bounces+shineman.5=
> osu.edu at lists.osu.edu> on behalf of John Moore via Mvapich-discuss <
> mvapich-discuss at lists.osu.edu>
> *Sent:* Tuesday, January 11, 2022 14:58
> *To:* Panda, Dhabaleswar <panda at cse.ohio-state.edu>
> *Cc:* Maitham Alhubail <maitham at flexcompute.com>;
> mvapich-discuss at lists.osu.edu <mvapich-discuss at lists.osu.edu>
> *Subject:* Re: [Mvapich-discuss] MVAPICH2 GDR from source code?
>
> Hi DK,
>
> Do the CUDA and GCC versions on our system need to match the RPM exactly?
> We are running on Ubuntu, and there is no GCC 8.4.1 on Ubuntu.
>
> Thank you,
> John
>
> On Tue, Jan 11, 2022 at 2:55 PM Panda, Dhabaleswar <
> panda at cse.ohio-state.edu> wrote:
>
> Hi,
>
> Thanks for your note. For GPU support with MVAPICH2, it is strongly
> recommended to use the MVAPICH2-GDR package. This package supports many
> features related to GPUs and delivers the best performance and scalability
> on GPU clusters. Please use a suitable RPM package from the MVAPICH2-GDR
> download page for your system. Please refer to the corresponding user guide
> also. The MVAPICH2-GDR package can also be installed through Spack. Let us
> know if you experience any issues in using the MVAPICH2-GDR package on your
> GPU cluster.
>
> Thanks,
>
> DK
>
>
> ________________________________________
> From: Mvapich-discuss <mvapich-discuss-bounces+panda.2=
> osu.edu at lists.osu.edu> on behalf of John Moore via Mvapich-discuss <
> mvapich-discuss at lists.osu.edu>
> Sent: Tuesday, January 11, 2022 2:48 PM
> To: mvapich-discuss at lists.osu.edu
> Cc: Maitham Alhubail
> Subject: [Mvapich-discuss] MVAPICH2 GDR from source code?
>
> Hello,
>
> We have been struggling to get MVAPICH2 to work with CUDA-aware support
> and RDMA. We have compiled MVAPICH2 from source with the --enable-cuda
> option, but when we run the osu_bibw bandwidth test using device-to-device
> communication, we get a segmentation fault.
>
> Below is the output from osu_bibw using MVAPICH2:
> MVAPICH2-2.3.6 Parameters
> ---------------------------------------------------------------------
> PROCESSOR ARCH NAME : MV2_ARCH_AMD_EPYC_7401_48
> PROCESSOR FAMILY NAME : MV2_CPU_FAMILY_AMD
> PROCESSOR MODEL NUMBER : 1
> HCA NAME : MV2_HCA_MLX_CX_HDR
> HETEROGENEOUS HCA : NO
> MV2_EAGERSIZE_1SC : 0
> MV2_SMP_EAGERSIZE : 16385
> MV2_SMP_QUEUE_LENGTH : 65536
> MV2_SMP_NUM_SEND_BUFFER : 16
> MV2_SMP_BATCH_SIZE : 8
> Tuning Table: : MV2_ARCH_AMD_EPYC_7401_48 MV2_HCA_MLX_CX_HDR
> ---------------------------------------------------------------------
> # OSU MPI-CUDA Bi-Directional Bandwidth Test v5.7.1
> # Send Buffer on DEVICE (D) and Receive Buffer on DEVICE (D)
> # Size Bandwidth (MB/s)
> 1 0.07
> 2 0.15
> 4 0.29
> 8 0.57
> 16 1.12
> 32 2.30
> 64 4.75
> 128 9.41
> 256 18.44
> 512 37.22
> 1024 74.82
> 2048 144.70
> 4096 289.96
> 8192 577.33
> [cell3:mpi_rank_0][error_sighandler] Caught error: Segmentation fault (signal 11)
> [cell3:mpi_rank_1][error_sighandler] Caught error: Segmentation fault (signal 11)
>
>
> ===================================================================================
> = BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
> = PID 471850 RUNNING AT cell3
> = EXIT CODE: 139
> = CLEANING UP REMAINING PROCESSES
> = YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
>
> ===================================================================================
> And this is with OpenMPI:
> # OSU MPI-CUDA Bi-Directional Bandwidth Test v5.8
> # Send Buffer on DEVICE (D) and Receive Buffer on DEVICE (D)
> # Size Bandwidth (MB/s)
> 1 0.43
> 2 0.83
> 4 1.68
> 8 3.37
> 16 6.72
> 32 13.42
> 64 27.02
> 128 53.78
> 256 107.88
> 512 219.45
> 1024 437.81
> 2048 875.12
> 4096 1747.23
> 8192 3528.97
> 16384 7015.15
> 32768 13973.59
> 65536 27702.68
> 131072 51877.67
> 262144 94556.99
> 524288 157755.18
> 1048576 236772.67
> 2097152 333635.13
> 4194304 408865.93
>
>
> Can GDR support be obtained by compiling from source, as we are trying to
> do, or do we have to use an RPM? We export MV2_USE_CUDA=1. Any
> recommendations would be greatly appreciated.
>
> Thanks,
> John
>
>