[Mvapich-discuss] mvapich2-3.0a and slingshot 11 - performance questions

Shineman, Nat shineman.5 at osu.edu
Tue May 9 08:10:55 EDT 2023


Hi Ben,

Looks like you are using the OFI sockets provider instead of Cray's CXI provider. We are still working on provider detection so that MVAPICH will automatically pick the best-performing provider on a given system. For now, please use your chosen launcher to set the environment variable MPIR_CVAR_OFI_USE_PROVIDER=cxi for your run. For additional safety, you can also set FI_PROVIDER=cxi so that libfabric itself exposes only the cxi provider. The former is required; the latter is optional but can give a nominal improvement in startup time.
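
As a minimal sketch, assuming a bash-compatible shell or job script, the two variables can be exported before the launch line. The launcher command in the final comment is only a placeholder and is not taken from this thread; whether exported variables reach the remote ranks depends on your launcher, so you may need its own option for passing environment variables.

```shell
# Required: tell MVAPICH to select the CXI provider.
export MPIR_CVAR_OFI_USE_PROVIDER=cxi
# Optional: restrict libfabric itself to the CXI provider.
export FI_PROVIDER=cxi

# Confirm both variables are set in this environment.
env | grep -E '^(MPIR_CVAR_OFI_USE_PROVIDER|FI_PROVIDER)='

# Then launch as usual (placeholder; launcher and arguments are assumptions):
# <your-launcher> -n 2 ./osu_bw
```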

Please let me know if you have any issues.

Thanks,
Nat
________________________________
From: Mvapich-discuss <mvapich-discuss-bounces at lists.osu.edu> on behalf of Ben Kirk via Mvapich-discuss <mvapich-discuss at lists.osu.edu>
Sent: Wednesday, May 3, 2023 15:30
To: mvapich-discuss at lists.osu.edu <mvapich-discuss at lists.osu.edu>
Subject: [Mvapich-discuss] mvapich2-3.0a and slingshot 11 - performance questions


Hi, I've recently installed mvapich2-3.0a with the hopes of testing slingshot 11 support.

When ./configuring with the recommended

--with-device=ch4:ofi --with-libfabric=/opt/cray/libfabric/1.15.2.0

I am able to build the library, but find even simple pt2pt performance exceptionally slow.  Comparing intra (inside 1 node) to inter (across 2 nodes):


#********* Intra-Node-CPU (Bare Metal) *****************
#/glade/work/benkirk/codes/dev_stack/install/osu-micro-benchmarks-6.2-mvapich2-3.0a-cray_libfabric-derecho-gcc-12.2.0/libexec/osu-micro-benchmarks/mpi/pt2pt/osu_bw
# OSU MPI Bandwidth Test v6.2
# Size      Bandwidth (MB/s)
1                       2.90
2                       6.10
4                      11.44
8                      24.28
16                     46.73
32                     93.20
64                    194.77
128                   346.13
256                   759.24
512                  1389.39
1024                 2655.62
2048                 5258.23
4096                 9112.13
8192                12012.65
16384               11235.18
32768               10769.48
65536               19743.95
131072              29157.27
262144              27715.51
524288              27497.12
1048576             13662.50
2097152             10807.83
4194304             10830.14


#********* Inter-Node-CPU (Bare Metal) *****************
#/glade/work/benkirk/codes/dev_stack/install/osu-micro-benchmarks-6.2-mvapich2-3.0a-cray_libfabric-derecho-gcc-12.2.0/libexec/osu-micro-benchmarks/mpi/pt2pt/osu_bw
# OSU MPI Bandwidth Test v6.2
# Size      Bandwidth (MB/s)
1                       0.30
2                       0.60
4                       1.20
8                       2.40
16                      4.78
32                      9.52
64                     17.50
128                    34.09
256                    54.02
512                   101.24
1024                  196.80
2048                  195.56
4096                  262.10
8192                  316.01
16384                 291.34
32768                 289.64
65536                 290.30
131072                260.94
262144                298.70
524288                342.67
1048576               295.31
2097152               351.07
4194304               334.36


Are there any debugging variables or other tricks I should be aware of?


Thanks!!

--
Ben Kirk
NCAR Computational & Information Systems Laboratory
Consulting Services Group Head
