[mvapich-discuss] Cannot utilize CPU core

MORRIS LAW morris at hkbu.edu.hk
Tue Apr 6 22:35:00 EDT 2010


Dear Krishna,

Thanks for your advice.  Our cluster (2048 cores in 256 nodes) mainly uses mvapich 1.1, as most of our software is compiled with this version and gcc.  It relies on maui and torque to submit parallel jobs to the queue.

Therefore, a parallel job requiring 8 cores may be given 4 of 8 cores on the first node, 3 of 8 on the second node and 1 of 8 on the third node.  As I understand it, VIADEV_USE_AFFINITY=0 turns off CPU affinity, so that the processes of each job do not have to share or compete for cores with other jobs running on the same node.  However, after consulting the manuals, I am still not sure how to set VIADEV_USE_AFFINITY=0 with maui or torque.
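
A minimal sketch of one possible approach, not a confirmed recipe: the variable could be passed on the mpirun_rsh command line inside the job script that torque runs.  The helper name build_mpirun_rsh, the program name ./a.out and the fallback file name "hosts" are hypothetical.  The sketch assumes that mpirun_rsh accepts NAME=VALUE pairs placed before the executable and that torque exports the allocated host list in PBS_NODEFILE.  It only prints the command; it does not launch anything.

```python
import os


def build_mpirun_rsh(nprocs, hostfile, program, env=None):
    """Assemble an mpirun_rsh argument list with run-time variables.

    Assumption: mpirun_rsh takes NAME=VALUE pairs before the executable.
    """
    if env is None:
        # Default to turning MVAPICH's own core binding off.
        env = {"VIADEV_USE_AFFINITY": "0"}
    argv = ["mpirun_rsh", "-np", str(nprocs), "-hostfile", hostfile]
    argv += [f"{name}={value}" for name, value in env.items()]
    argv.append(program)
    return argv


if __name__ == "__main__":
    # Inside a torque job, PBS_NODEFILE points at the allocated host list;
    # "hosts" is only a placeholder for running the sketch outside a job.
    hostfile = os.environ.get("PBS_NODEFILE", "hosts")
    print(" ".join(build_mpirun_rsh(8, hostfile, "./a.out")))
```

The printed line is what would go into the job script in place of the existing mpirun_rsh call.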

Is VIADEV_CPU_MAPPING a hard mapping of processes to cores?  For example, with VIADEV_CPU_MAPPING=0:4:1:5, if process 0 has used up 5 CPUs and process 1 cannot be mapped to CPU 4, will process 1 be given another idle core, as it would be with VIADEV_USE_AFFINITY=0?
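
To make the idea of non-competing mappings concrete, here is a small sketch that hands each job on a node its own block of cores and prints one mapping string per job.  The function name split_node is hypothetical.  It assumes that the colon-separated list names the core for local rank 0, 1, 2, ... in order, and that cores are numbered 0 to N-1; both should be checked against the user guide and the actual node layout.

```python
def split_node(total_cores, job_sizes):
    """Give each job its own contiguous, non-overlapping block of core IDs.

    Returns one colon-separated mapping string per job, in job order.
    """
    if sum(job_sizes) > total_cores:
        raise ValueError("jobs need more cores than the node has")
    mappings = []
    next_core = 0
    for size in job_sizes:
        cores = range(next_core, next_core + size)
        mappings.append(":".join(str(core) for core in cores))
        next_core += size
    return mappings


if __name__ == "__main__":
    # Two 4-process jobs sharing one 8-core node.
    for job, mapping in enumerate(split_node(8, [4, 4])):
        print(f"job {job}: VIADEV_CPU_MAPPING={mapping}")
```

Because the blocks never overlap, two jobs started with these strings would not be bound to the same core, provided the cores really are idle when the second job starts.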

Sorry for the basic questions.  Many processes are now running very slowly and competing for the same CPU cores, each getting only 25-46% of the CPU time.  I have to solve this quickly to avoid wasting CPU time.

Thanks in advance.

--
Morris


----- Original Message -----
From: Krishna Chaitanya <kandalla at cse.ohio-state.edu>
Date: Wednesday, April 7, 2010 9:56 am
Subject: Re: [mvapich-discuss] Cannot utilize CPU core
To: MORRIS LAW <morris at hkbu.edu.hk>
Cc: Dhabaleswar Panda <panda at cse.ohio-state.edu>, mvapich-discuss at cse.ohio-state.edu


> Morris,
>  If a parallel job is run with VIADEV_USE_AFFINITY=1, we try to map the
>  processes to cores during MPI_Init and the processes stay bound to those
>  cores for the rest of the time. We allow users to experiment with various
>  CPU mapping patterns by using the run-time variable VIADEV_CPU_MAPPING.
>  You can use these variables with the mpirun_rsh command. You can find
>  more details about these variables here:
>  http://nowlab.cse.ohio-state.edu/mvapich-website/support/mvapich_user_guide-1.2rc1.html#x1-1340009.6.5
>
>  However, if the job is run with VIADEV_USE_AFFINITY=0, we let the kernel
>  take care of binding processes to cores. In this case, the kernel "can"
>  move the processes in any fashion during the application execution, which
>  might not be good for the performance of applications.
>
>  In your case, since you are trying to run multiple jobs on the same node
>  at the same time, it would be better if you set VIADEV_USE_AFFINITY to 0.
>  Or, you can also use VIADEV_CPU_MAPPING to bind the processes of different
>  jobs such that they do not compete for the same set of cores. (This is
>  assuming that you do have idle cores when you are submitting the second
>  job).
>
>  Please let us know if this helps. Also, as Dr. Panda had indicated in the
>  last mail, we would recommend you to use the latest version of MVAPICH2,
>  as we now offer better CPU mapping techniques.
>  
>  Thanks,
>  Krishna
>  
>  
>  On Tue, Apr 6, 2010 at 9:12 PM, MORRIS LAW <morris at hkbu.edu.hk> wrote:
>  
>  > The 'affinity' problem is very new to me.  May I know exactly how I can
>  > control running of the jobs with VIADEV_USE_AFFINITY=0?
>  >
>  > Should I place the line in /etc/profile or in the mpirun script?
>  >
>  > Best Regards,
>  >
>  > --
>  > Morris
>  >
>  >
>  > ----- Original Message -----
>  > From: Dhabaleswar Panda <panda at cse.ohio-state.edu>
>  > Date: Tuesday, March 30, 2010 8:29 pm
>  > Subject: Re: [mvapich-discuss] Cannot utilize CPU core
>  > To: MORRIS LAW <morris at hkbu.edu.hk>
>  > Cc: mvapich-discuss at cse.ohio-state.edu
>  >
>  >
>  > > You are seeing the effect of `affinity' of processes to cores here.
>  > > Try to run your jobs with VIADEV_USE_AFFINITY=0. More details on this
>  > > are available from MVAPICH 1.1 user guide at the following location:
>  > >
>  > >
>  > http://mvapich.cse.ohio-state.edu/support/mvapich_user_guide-1.1.html#x1-1340009.6.5
>  > >
>  > >  Let us know if the problem goes away with this option.
>  > >
>  > >  Also note that MVAPICH2 has many more flexible ways to bind processes
>  > >  to cores. You can use the latest version of MVAPICH2 (1.4.1) to take
>  > >  advantage of these features.
>  > >
>  > >  DK
>  > >
>  > >
>  > >
>  > >  On Tue, 30 Mar 2010, MORRIS LAW wrote:
>  > >
>  > >  > Dear all,
>  > >  >
>  > >  > I am new to the discussion group.
>  > >  >
>  > >  > Recently I found a problem running mvapich 1.1 on a Gen2-IB device
>  > >  > (Qlogic 9120).  When subsequent mvapich jobs are delivered to the
>  > >  > nodes, the later jobs do not run on free CPU cores but compete for
>  > >  > the cores already in use by running jobs.  Thus the whole node
>  > >  > cannot be utilized.  I don't know where the problem is.  Is it
>  > >  > related to the IB switch or to some parameter I used when building
>  > >  > mvapich 1.1?
>  > >  >
>  > >  > I built mvapich 1.1 using gcc 4.1 on CentOS 5.3.  I have also built
>  > >  > another version of mvapich 1.1 using icc and ifort on the same
>  > >  > CentOS 5.3.  Jobs run with either build show the same problem.
>  > >  >
>  > >  > Would someone give me some hints to tackle the problem?
>  > >  >
>  > >  > Thanks in advance.
>  > >  >
>  > >  > --
>  > >  > Morris Law
>  > >  > HK Baptist University
>  > >  >
>  > >  > _______________________________________________
>  > >  > mvapich-discuss mailing list
>  > >  > mvapich-discuss at cse.ohio-state.edu
>  > >  > http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>  > >  >
>  > >
>  > >
>  >
>  

