[Mvapich-discuss] Very slow startup on mvapich 4.0

Shineman, Nat shineman.5 at osu.edu
Wed Aug 27 11:32:00 EDT 2025


Alex,

Thanks for reporting this. First, as a general rule we suggest not forcing the posix shared memory module (--with-ch4-shmmods=posix) when building with UCX. UCX ships its own shared memory transport, and because of conflicts between the UCX and internal shared memory implementations, both MPICH and MVAPICH perform best when the UCX shared memory path is allowed to operate independently. That is the behavior you get when --with-ch4-shmmods is not set.
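For reference, here is a minimal configure sketch based on the invocation in your message below, with only --with-ch4-shmmods=posix dropped (the paths, compilers, and remaining flags are simply carried over from your build, so adjust as needed):

    ./configure --prefix=$HOMEINIT/mvapich/4.0x-mt-ucx \
        --with-device=ch4:ucx:shm --with-pm=hydra --enable-threads=multiple \
        --enable-shared --with-hwloc=embedded --with-ucx=embedded \
        CC=icx F77=ifx FC=ifx CXX=icpx
    # note: no --with-ch4-shmmods=posix, so UCX's own shared memory transport is used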

That said, if you need the internal posix shared memory module for some reason, I have attached a patch. MVAPICH and MVAPICH-Plus use a different eager module from standard MPICH, one that includes some optimizations for weakly ordered architectures; that is why the difference is observed only in MVAPICH. It looks like we were doing a memset of the entire shmem region, which grows exponentially with the local process count, and based on my investigation that memset is unnecessary.
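If you do want to keep testing the posix path with the fix, applying the attached patch looks roughly like the sketch below (the source directory name and the -p level are assumptions that depend on how your tree is laid out):

    cd mvapich-4.0.x            # your MVAPICH source directory (name assumed)
    patch -p1 < shmem.patch     # adjust -p0/-p1 to match the paths inside the patch
    make -j && make install     # rebuild and reinstall into the existing prefix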

Please let me know if you experience any issues with the patch. Otherwise, we will include it in our next release with proper acknowledgement.

Thanks,
Nat
________________________________
From: Mvapich-discuss <mvapich-discuss-bounces at lists.osu.edu> on behalf of Alex via Mvapich-discuss <mvapich-discuss at lists.osu.edu>
Sent: Saturday, August 23, 2025 05:06
To: mvapich-discuss at lists.osu.edu <mvapich-discuss at lists.osu.edu>
Subject: [Mvapich-discuss] Very slow startup on mvapich 4.0

Hi,
Recently I've compared mvapich (based on mpich 4.3.0 as I recall) and mpich 4.3.1 on a single-node Intel Xeon 6972P (it has a Mellanox fabric, but since it's a single node that's not relevant). The application is quite tricky, but a similar issue is observed in IMB: the more ranks you start, the longer the delay (2 ranks start almost instantly). The test was as follows:
1. Both MPIs are configured similarly:
./configure  --prefix=$HOMEINIT/mvapich/4.0x-mt-ucx --enable-silent-rules \
    --with-device=ch4:ucx:shm --with-pm=hydra --enable-romio  --with-ch3-rank-bits=32 --enable-threads=multiple --without-ze --with-file-system=lustre+nfs \
    --enable-shared --with-hwloc=embedded  --with-ucx=embedded --with-libfabric=embedded --enable-fortran=all  --with-ch4-shmmods=posix \
   CC=icx F77=ifx FC=ifx CXX=icpx \
   MPICHLIB_CPPFLAGS="-I$WORKINIT/misc.libs/lustre-release/lustre/include -I$WORKINIT/misc.libs/lustre-release/lustre/include/uapi" \
   MPICHLIB_CFLAGS='-Wno-unused-but-set-variable -Wno-tautological-constant-compare -Wno-initializer-overrides' \
   MPICHLIB_FCFLAGS='-Wno-unused-but-set-variable -Wno-tautological-constant-compare -Wno-initializer-overrides' \
   MPICHLIB_CXXFLAGS='-Wno-unused-but-set-variable -Wno-tautological-constant-compare -Wno-initializer-overrides' \
   2>&1 | tee configure.log
(the only difference is the installation path)
2. Execute the application (mpiexec.hydra -launcher ssh -genvall -bind-to core:1 -np 192 ./app)
3. Review its report.
4. Recompile MVAPICH without --with-ch4-shmmods=posix
5. Repeat MVAPICH test.
So the results are as follows:
1. MPICH 4.3.1
  Initialization time :      4.02 s
  Elapsed time        :     94.39 s
2. MVAPICH
  Initialization time :     55.06 s
  Elapsed time        :    131.06 s
3. MVAPICH with no posix shmem
  Initialization time :      4.03 s
  Elapsed time        :    108.99 s

As you can see, MVAPICH is quite a bit faster in the execution stage (the numbers are inclusive), but the startup ruins the picture.

Are there any differences in shmem (apart from having its own MV_SHM or so), and how can this be fixed?
As I said earlier, you can observe the same issue with IMB (presumably on all high-PPN runs). The only reason I took this application is that it reports its init-phase timing :).
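For reference, a reproduction sketch with IMB on the same node, assuming IMB-MPI1 is built against the same MVAPICH install (the benchmark choice here is only an example):

    mpiexec.hydra -launcher ssh -genvall -bind-to core:1 -np 192 ./IMB-MPI1 Barrier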
-------------- next part --------------
A non-text attachment was scrubbed...
Name: shmem.patch
Type: text/x-patch
Size: 909 bytes
Desc: shmem.patch
URL: <http://lists.osu.edu/pipermail/mvapich-discuss/attachments/20250827/418cc1c7/attachment.bin>

