[Mvapich-discuss] The MVAPICH Group is looking for a talented full-time software developer to join the team

Panda, Dhabaleswar panda at cse.ohio-state.edu
Thu Oct 24 23:57:03 EDT 2024


ZQ,

This warning will appear any time you allocate a full-subscription job with the UCX netmod enabled. It is based on the observed behaviour of UCX, which we have seen allocate an extra progress thread for each process that calls ucx_init. I cannot guarantee this will happen in every case, especially if you are using your own UCX version, but we have observed it in most, if not all, of our testing with UCX. If you are not seeing any performance impact and can observe no oversubscription, then I would encourage you to disregard the warning and trust what you are observing.
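
One way to check for the extra thread on a Linux compute node is to read the thread count the kernel reports for each MPI rank. The sketch below is not part of MVAPICH or UCX: the helper names parse_thread_count and thread_count are hypothetical, and it assumes /proc is available on the node.

```python
import os
from typing import Optional


def parse_thread_count(status_text: str) -> Optional[int]:
    """Return the 'Threads:' field from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("Threads:"):
            return int(line.split(":", 1)[1].strip())
    return None  # field not present in the given text


def thread_count(pid: int) -> Optional[int]:
    """Read the kernel's thread count for a PID (Linux only, assumes /proc)."""
    try:
        with open(f"/proc/{pid}/status") as fh:
            return parse_thread_count(fh.read())
    except OSError:
        return None  # process is gone, or /proc is not available


if __name__ == "__main__":
    # Replace os.getpid() with the PID of a running MPI rank.
    pid = os.getpid()
    print(f"PID {pid} has {thread_count(pid)} thread(s)")
```

A count above what the application itself creates is consistent with an extra progress thread, although this check cannot say which library spawned it or whether it is busy enough to affect performance.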

Regarding your second question, the warning comes from MVAPICH, which is why it does not appear with OpenMPI. I cannot speak to how OpenMPI handles these additional UCX progress threads or what, if anything, it does to notify the user.

Thanks,
Nat
________________________________
From: You, Zhi-Qiang <zyou at osc.edu>
Sent: Thursday, October 31, 2024 12:11
To: Shineman, Nat <shineman.5 at osu.edu>; mvapich-discuss at lists.osu.edu <mvapich-discuss at lists.osu.edu>
Subject: Re: Handling MVAPICH 3.0 full subscription warning


Hi Nat,



Thank you for the suggestion. I have a few questions:



  1.  Does this message indicate that oversubscription is occurring, or is it simply a warning that appears every time a full-node job is run? In one user’s case, I did not observe any oversubscription, although the warning was present.
  2.  UCX is also the default for OpenMPI, but I did not see a similar warning when running a full-node job with OpenMPI. Why does this only happen with MVAPICH?



Thank you,

ZQ



From: Shineman, Nat <shineman.5 at osu.edu>
Date: Wednesday, October 16, 2024 at 2:16 PM
To: mvapich-discuss at lists.osu.edu <mvapich-discuss at lists.osu.edu>, You, Zhi-Qiang <zyou at osc.edu>
Subject: Re: Handling MVAPICH 3.0 full subscription warning

Hi ZQ,



You are probably seeing degraded performance because you are still running the application at full subscription while requesting that MVAPICH reserve 2 cores per process. The warning should more accurately state that you should cap your runs at 1/2 subscription when you set the listed environment variable. This would prevent you from oversubscribing cores.
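
The arithmetic behind this can be sketched as follows. This is only an illustration of the reasoning above, not MVAPICH's actual mapping logic: the function names and the 48-core node are hypothetical, and it assumes that MVP_THREADS_PER_PROCESS=N reserves N cores for every process.

```python
def max_ranks_per_node(cores_per_node: int, threads_per_process: int) -> int:
    """Largest rank count that fits when each rank reserves this many cores."""
    return cores_per_node // threads_per_process


def is_oversubscribed(ranks_per_node: int, cores_per_node: int,
                      threads_per_process: int) -> bool:
    """True when the cores requested exceed the cores the node has."""
    return ranks_per_node * threads_per_process > cores_per_node


if __name__ == "__main__":
    cores = 48  # hypothetical node size, for illustration only
    # Full subscription with 2 cores reserved per rank asks for 96 cores.
    print(is_oversubscribed(48, cores, 2))
    # Half subscription with 2 cores reserved per rank asks for 48 cores.
    print(is_oversubscribed(24, cores, 2))
    print(max_ranks_per_node(cores, 2))
```

Under that assumption, a full-node run with the variable set to 2 requests twice as many cores as the node has, while capping the run at half the cores fits exactly.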



However, if you are seeing satisfactory performance with oversubscribed cores in full subscription, please feel free to ignore the warning.



Thanks,

Nat

________________________________

From: Mvapich-discuss <mvapich-discuss-bounces at lists.osu.edu> on behalf of You, Zhi-Qiang via Mvapich-discuss <mvapich-discuss at lists.osu.edu>
Sent: Wednesday, October 16, 2024 11:50
To: mvapich-discuss at lists.osu.edu <mvapich-discuss at lists.osu.edu>
Subject: [Mvapich-discuss] Handling MVAPICH 3.0 full subscription warning



Hi,



We encountered the following warning message while running a full-node MPI job with MVAPICH 3.0:

[][mvp_generate_implicit_cpu_mapping] WARNING: You appear to be running at full subscription for this job. UCX spawns an additional thread for each process which may result in oversubscribed cores and poor performance. Please consider reserving at least 2 cores per node for the additional threads, enabling SMT, or setting MVP_THREADS_PER_PROCESS=2 to ensure that sufficient resources are available.

The suggestion to set MVP_THREADS_PER_PROCESS=2 not only fails to improve performance but actually degrades it. Can this warning message be safely ignored, or is there any action I need to take to address it?



Best,

ZQ

