[mvapich-discuss] Default of MV2_ENABLE_AFFINITY: why 1?

Stephen Cousins steve.cousins at maine.edu
Tue Jun 19 00:48:20 EDT 2012


Hi Jonathan,

If Torque or Slurm were to do this, it would need to create a potentially
different task set for each node, right? I just want to make sure we're on
the same page.
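
Just to make the idea concrete, here is a rough sketch of what a per-node
prologue might have to do (not anything Torque does today, and the "a
process pinned to a subset of cores counts as taken" heuristic is purely my
own assumption): each node works out its own core list from what is already
running locally.

    import os

    def cores_in_use():
        # Union of the affinity masks of processes that are already pinned
        # to a strict subset of this node's cores (i.e. look deliberately
        # pinned, the way MVAPICH2 ranks do with affinity enabled).
        all_cores = set(range(os.cpu_count()))
        used = set()
        for pid in (p for p in os.listdir('/proc') if p.isdigit()):
            try:
                mask = os.sched_getaffinity(int(pid))
            except OSError:
                continue  # process exited between listing and querying
            if 0 < len(mask) < len(all_cores):
                used |= mask
        return used

    def free_cores():
        # This node's task set: whatever is left over.
        return sorted(set(range(os.cpu_count())) - cores_in_use())

    print(free_cores())  # node A might print [8, 9, 10, 11]; node B, [0, 1, 2, 3]

So the same job could end up with a completely different core list on every
node it lands on, which is why I don't see how a single static setting can
cover it.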

Thanks,

Steve

On Mon, Jun 18, 2012 at 9:48 PM, Jonathan Perkins <perkinjo at cse.ohio-state.edu> wrote:

> On Mon, Jun 18, 2012 at 01:17:28PM -0400, Stephen Cousins wrote:
> > Hi,
> >
> > I am revisiting some affinity issues. For jobs that run on all cores
> > of a node, I can see that having MV2_ENABLE_AFFINITY=1 is beneficial.
> > However, if you have users who run on a subset of the cores, on nodes
> > where other jobs might also get scheduled, this is a big problem. It
> > is fine for the first job, but the second job uses the same cores as
> > the first and performance drops dramatically.  We are using Torque
> > and Moab.
> >
> > I checked the mvapich2 configure script to see if there was a way to
> > compile the code with a different default, but I didn't find one.
> > Rather than changing the code, I have set the environment variable in
> > the module file that loads the MVAPICH2 environment.
> >
> > To my mind, setting it to 0 by default is the better choice, since
> > currently the penalty of it being set wrong is much larger than the
> > benefit of it being set right. That is, if you get it wrong you see
> > at least a 100% time penalty in your job, whereas if you get it right
> > (that is, you really do want affinity set) you get maybe a 10% to 20%
> > benefit.
> >
> > In general, I'd much rather have affinity enabled, but not the way it
> > is currently implemented. How about this: if affinity is enabled,
> > then when new processes are started, make sure they are placed on
> > cores that aren't already being used, at least not by other MVAPICH2
> > programs. For non-MVAPICH2 programs (at least the ones I'm seeing
> > with CHARM or OpenMP; I'll have to check OpenMPI jobs), the Linux
> > scheduler seems to bounce them around appropriately, scattering them
> > amongst the free sockets/cores.
> >
> > I have seen on the list that the general answer to this problem is to
> > use CPU mapping, but unless this can be done automatically by
> > Moab/Torque it will not work. For one thing, each node assigned to
> > the job may require a different mapping depending on what else is
> > running on that node.
> >
> > What do you think?
>
> Thank you for your note.  We are discussing the issue that you're
> bringing up to see if we can do anything additional to address this
> situation.
>
> It seems it would be ideal for the job manager, such as Torque or
> SLURM, to set the cpuset that the jobs can run on.  If that is
> configured, this should not be an issue.
>
> --
> Jonathan Perkins
> http://www.cse.ohio-state.edu/~perkinjo
>
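
One more thought on changing the effective default without patching the
source: besides the module file, a thin launcher wrapper can do it, since as
far as I can tell MV2_ENABLE_AFFINITY is only read from the environment. A
minimal sketch (the wrapper itself is hypothetical, not part of MVAPICH2):

    #!/usr/bin/env python
    # Hypothetical site wrapper around mpirun_rsh/mpiexec: affinity
    # defaults to off here, but a user can still export
    # MV2_ENABLE_AFFINITY=1 to override it.
    import os, subprocess, sys

    if len(sys.argv) < 2:
        sys.exit('usage: mv2run <mpirun_rsh command ...>')

    env = dict(os.environ)
    env.setdefault('MV2_ENABLE_AFFINITY', '0')
    sys.exit(subprocess.call(sys.argv[1:], env=env))

Users would then call something like "mv2run mpirun_rsh -np 16 ./a.out" (the
wrapper name is made up).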
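
On the CPU-mapping point quoted above: if a prologue knew each node's free
cores (as in the earlier sketch), building the per-node mapping string would
be the easy part. A rough sketch, assuming the colon-separated
one-core-per-local-rank form of MV2_CPU_MAPPING; please check the user guide
for the exact syntax:

    def mapping_string(free_cores, local_ranks):
        # e.g. free_cores=[4, 5, 6, 7], local_ranks=4  ->  "4:5:6:7"
        # Assumes one core per rank, colon-separated; verify against the
        # MVAPICH2 user guide before relying on this format.
        if local_ranks > len(free_cores):
            raise ValueError('more local ranks than free cores on this node')
        return ':'.join(str(c) for c in free_cores[:local_ranks])

The hard part is still getting Moab/Torque to compute and export this per
node, which is exactly the automation that is missing.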
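
And on the cpuset suggestion at the end of the quoted message: once the
resource manager confines a job with a cpuset, the job's processes only see
the cores they were given, so any binding MVAPICH2 does should stay inside
that set (as I understand it, affinity changes are limited to the cpuset). A
small check from inside a job, just as an illustration (the core counts are
made up):

    import os

    # With cpuset support enabled in Torque/SLURM this should print only
    # the cores the job was given, e.g. [0, 1, 2, 3] on a 16-core node,
    # rather than all 16 cores.
    print(sorted(os.sched_getaffinity(0)))

If it still prints every core on the node, the cpuset side isn't configured,
which I suspect is our current situation.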



-- 
______________________________________________________________________
Steve Cousins - Supercomputer Engineer/Administrator - Univ of Maine
Marine Sciences, 452 Aubert Hall       Target Tech, 20 Godfrey Drive
Orono, ME 04469    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~     Orono, ME 04473
(207) 581-4302     ~ steve.cousins at maine.edu ~     (207) 866-6552