[Mvapich-discuss] mvapich2-3.0a and slingshot 11 - performance questions
Shineman, Nat
shineman.5 at osu.edu
Tue May 9 08:10:55 EDT 2023
Hi Ben,
Looks like you are using the OFI sockets provider instead of Cray's CXI provider. We are still working on provider detection so that MVAPICH automatically selects the best-performing provider on a given system. For now, please use your chosen launcher to set the environment variable MPIR_CVAR_OFI_USE_PROVIDER=cxi for your run. For additional safety, you can also set FI_PROVIDER=cxi so that libfabric itself only uses the cxi provider. The former is required; the latter is optional but can give a small improvement in startup time.
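As a minimal sketch of the settings described above, the two variables could be exported in the job script before launching. The variable names and values come from this message. The commented launch line is only an assumption (a Hydra-style mpiexec that forwards the exported environment, with ./osu_bw as a placeholder path), so adapt it to whatever launcher your site uses.

```shell
# Required: tell MVAPICH to use the CXI provider.
export MPIR_CVAR_OFI_USE_PROVIDER=cxi
# Optional: restrict libfabric itself to the cxi provider.
export FI_PROVIDER=cxi

# Echo the settings so they show up in the job log.
echo "MPIR_CVAR_OFI_USE_PROVIDER=${MPIR_CVAR_OFI_USE_PROVIDER}"
echo "FI_PROVIDER=${FI_PROVIDER}"

# Hypothetical launch line (assumption, not taken from this thread):
# mpiexec -n 2 -ppn 1 ./osu_bw
```

If the launcher does not forward the calling shell's environment, the variables would need to be passed through the launcher's own mechanism instead.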
Please let me know if you have any issues.
Thanks,
Nat
________________________________
From: Mvapich-discuss <mvapich-discuss-bounces at lists.osu.edu> on behalf of Ben Kirk via Mvapich-discuss <mvapich-discuss at lists.osu.edu>
Sent: Wednesday, May 3, 2023 15:30
To: mvapich-discuss at lists.osu.edu <mvapich-discuss at lists.osu.edu>
Subject: [Mvapich-discuss] mvapich2-3.0a and slingshot 11 - performance questions
Hi, I've recently installed mvapich2-3.0a with the hopes of testing slingshot 11 support.
When ./configuring with the recommended
--with-device=ch4:ofi --with-libfabric=/opt/cray/libfabric/1.15.2.0
I am able to build the library, but find even simple pt2pt performance exceptionally slow. Comparing intra (inside 1 node) to inter (across 2 nodes):
#********* Intra-Node-CPU (Bare Metal) *****************
#/glade/work/benkirk/codes/dev_stack/install/osu-micro-benchmarks-6.2-mvapich2-3.0a-cray_libfabric-derecho-gcc-12.2.0/libexec/osu-micro-benchmarks/mpi/pt2pt/osu_bw
# OSU MPI Bandwidth Test v6.2
# Size Bandwidth (MB/s)
1 2.90
2 6.10
4 11.44
8 24.28
16 46.73
32 93.20
64 194.77
128 346.13
256 759.24
512 1389.39
1024 2655.62
2048 5258.23
4096 9112.13
8192 12012.65
16384 11235.18
32768 10769.48
65536 19743.95
131072 29157.27
262144 27715.51
524288 27497.12
1048576 13662.50
2097152 10807.83
4194304 10830.14
#********* Inter-Node-CPU (Bare Metal) *****************
#/glade/work/benkirk/codes/dev_stack/install/osu-micro-benchmarks-6.2-mvapich2-3.0a-cray_libfabric-derecho-gcc-12.2.0/libexec/osu-micro-benchmarks/mpi/pt2pt/osu_bw
# OSU MPI Bandwidth Test v6.2
# Size Bandwidth (MB/s)
1 0.30
2 0.60
4 1.20
8 2.40
16 4.78
32 9.52
64 17.50
128 34.09
256 54.02
512 101.24
1024 196.80
2048 195.56
4096 262.10
8192 316.01
16384 291.34
32768 289.64
65536 290.30
131072 260.94
262144 298.70
524288 342.67
1048576 295.31
2097152 351.07
4194304 334.36
Are there any debugging variables or other tricks I should be aware of?
Thanks!!
--
Ben Kirk
NCAR Computational & Information Systems Laboratory
Consulting Services Group Head