[mvapich-discuss] Cannot utilize CPU core

Krishna Chaitanya Kandalla kandalla at cse.ohio-state.edu
Fri Apr 16 13:16:08 EDT 2010


Morris,
           Good to know that you are now able to get things to work. To
answer your questions:
1. When you are launching a job with mpirun_rsh, you can turn affinity
off for all the processes by setting the parameter
MV2_ENABLE_AFFINITY=0, in the manner discussed in the previous mails.
2. mpirun_rsh is a scalable, hierarchical job-startup mechanism that we
use in MVAPICH2, and we have observed lower startup costs on large-scale
clusters compared to mpdboot.
3. Yes, we recommend using mpirun_rsh for launching jobs with MVAPICH2.
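To illustrate point 1, here is a minimal sketch of how the variable could be passed to every process, both from the command line and from a Torque/Maui submission script of the kind described earlier in this thread. The node counts, job name, hostfile, and binary name are placeholders, not values from this cluster:

```shell
#!/bin/bash
# Hypothetical Torque/PBS submission script: 8 processes on 2 nodes,
# CPU affinity disabled for every process in the job.
#PBS -l nodes=2:ppn=4
#PBS -N affinity_off_job

cd $PBS_O_WORKDIR

# Environment variables named on the mpirun_rsh command line (before the
# executable) are exported to all processes, so MV2_ENABLE_AFFINITY=0
# takes effect everywhere without touching .bashrc or /etc/profile.
mpirun_rsh -np 8 -hostfile $PBS_NODEFILE MV2_ENABLE_AFFINITY=0 ./a.out
```

The same `MV2_ENABLE_AFFINITY=0 ./a.out` form works for an interactive `mpirun_rsh -np <num> -hostfile <hosts>` launch outside the queue.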

            Please let us know if you have any further queries.

Thanks,
Krishna


MORRIS LAW wrote:
> Dear Krishna,
>
> Thanks for your kind help and detailed instructions.  It works with mpirun_rsh and MV2_ENABLE_AFFINITY=0.  The CPU affinity problem has now vanished.
>
> May I follow up and ask,
>
> 1. Can I set up mpirun_rsh so that all processes will run with MV2_ENABLE_AFFINITY=0?
> 2. What is the difference between mpirun and mpirun_rsh in MVAPICH2?
> 3. Shall I abandon using mpdboot for other applications compiled with MVAPICH2?
>
> Thanks in advance.
>
> --
> Morris Law
> Hong Kong Baptist University
>
>
> ----- Original Message -----
> From: Krishna Chaitanya Kandalla <kandalla at cse.ohio-state.edu>
> Date: Thursday, April 15, 2010 7:45 pm
> Subject: Re: [mvapich-discuss] Cannot utilize CPU core
> To: MORRIS LAW <morris at hkbu.edu.hk>
> Cc: Dhabaleswar Panda <panda at cse.ohio-state.edu>, mvapich2 <mvapich-core at cse.ohio-state.edu>
>
>
>   
>> Hi Morris,
>>                In MVAPICH2, we rely on "mpirun_rsh" - which is more
>>  scalable than mpiexec - and we no longer use mpdboot. Is it possible
>>  for you to find out if the scheduler is using mpirun_rsh or not?  If
>>  you are able to launch jobs from the command line, instead of
>>  submitting them into a queue, you should be able to turn off
>>  affinity by doing:
>>  
>>  mpirun_rsh -np <num> -hostfile <hosts> MV2_ENABLE_AFFINITY=0 ./<executable>
>>  
>>               Please let us know if this helps.
>>  
>>  Thanks,
>>  Krishna
>>  
>>  MORRIS LAW wrote:
>>  > Dear Krishna,
>>  >
>>  > I have tried to export the variable via .bashrc and used the -v
>>  > option through the scheduler.  It works with mvapich1; however, it
>>  > doesn't work with mvapich2.  Both mpirun and mpiexec of mvapich2
>>  > were tested in the foreground, background, and queue.  The jobs
>>  > compete for CPU resources with the existing jobs.  Is there
>>  > anything related to mpdboot?
>>  >
>>  > Best Regards,
>>  >
>>  > --
>>  > Morris Law
>>  > Hong Kong Baptist University
>>  >
>>  >
>>  > ----- Original Message -----
>>  > From: Krishna Chaitanya Kandalla <kandalla at cse.ohio-state.edu>
>>  > Date: Tuesday, April 13, 2010 9:21 am
>>  > Subject: Re: [mvapich-discuss] Cannot utilize CPU core
>>  > To: MORRIS LAW <morris at hkbu.edu.hk>
>>  > Cc: Dhabaleswar Panda <panda at cse.ohio-state.edu>, mvapich2 <mvapich-core at cse.ohio-state.edu>
>>  >
>>  >
>>  >   
>>  >> Morris,
>>  >>              Just following up on our previous discussion. We were
>>  >>  wondering if you were able to export the variable or if you tried
>>  >>  using the -v option through the scheduler. Please let us know if
>>  >>  there were any developments.
>>  >>  
>>  >>  Thanks,
>>  >>  Krishna
>>  >>  
>>  >>  Krishna Chaitanya Kandalla wrote:
>>  >>  > Morris,
>>  >>  >           Since you are not fully aware of how the scheduler is
>>  >>  > behaving and which cores are already being used on a given
>>  >>  > node, you would probably be better off setting affinity to 0
>>  >>  > and letting the kernel deal with it. But, to answer your other
>>  >>  > question: suppose you run a 4-process job on one compute node
>>  >>  > with VIADEV_CPU_MAPPING=0:4:1:5; the MVAPICH library will bind
>>  >>  > processes to cores in the following manner:
>>  >>  >
>>  >>  > process 0 to core 0
>>  >>  > process 1 to core 4
>>  >>  > process 2 to core 1
>>  >>  > process 3 to core 5
>>  >>  >
>>  >>  >           About your scheduler, do you happen to know whether
>>  >>  > it uses mpiexec or mpirun_rsh to launch the parallel jobs? I am
>>  >>  > not very sure if this can help, but is it possible for you to
>>  >>  > try setting the VIADEV_USE_AFFINITY parameter to 0 in your job
>>  >>  > submission script? Can you please let us know if this makes a
>>  >>  > difference in the behavior?
>>  >>  >
>>  >>  > Thanks,
>>  >>  > Krishna
>>  >>  >
>>  >>  >
>>  >>  > MORRIS LAW wrote:
>>  >>  >> Dear Krishna,
>>  >>  >>
>>  >>  >> Thanks for your advice.  Our 2048-core (256-node) cluster uses
>>  >>  >> mainly mvapich ver 1.1, as most software is compiled with this
>>  >>  >> version and gcc.  It relies on maui and torque to submit
>>  >>  >> parallel jobs into the queue.
>>  >>  >> Therefore, it may come to a situation where one parallel job
>>  >>  >> requiring 8 cores to run is given 4/8 cores on the 1st node,
>>  >>  >> 3/8 on the 2nd node, and 1/8 on the 3rd node.
>>  >>  >> VIADEV_USE_AFFINITY=0 will avoid CPU affinity so that the
>>  >>  >> cores allocated to each job will not be shared with or
>>  >>  >> competed for by other jobs running on the same node.  I am
>>  >>  >> still not sure how to implement VIADEV_USE_AFFINITY=0 with
>>  >>  >> maui or torque after consulting the manual.
>>  >>  >>
>>  >>  >> For VIADEV_CPU_MAPPING, is it a hard mapping of processes to
>>  >>  >> cores?  In a case where VIADEV_CPU_MAPPING=0:4:1:5, if
>>  >>  >> process 0 uses up 5 CPUs and process 1 cannot map to CPU 4,
>>  >>  >> will process 1 be allocated other idle cores with
>>  >>  >> VIADEV_USE_AFFINITY=0?
>>  >>  >>
>>  >>  >> Sorry for my stupid question, but many processes are now
>>  >>  >> running very slowly and competing for some CPU cores.  They
>>  >>  >> are running at 25-46% of the CPU time.  I have to solve this
>>  >>  >> quickly to avoid wasting CPU time.
>>  >>  >>
>>  >>  >> Thanks in advance.
>>  >>  >>
>>  >>  >> -- 
>>  >>  >> Morris
>>  >>  >>
>>  >>  >>
>>  >>  >> ----- Original Message -----
>>  >>  >> From: Krishna Chaitanya <kandalla at cse.ohio-state.edu>
>>  >>  >> Date: Wednesday, April 7, 2010 9:56 am
>>  >>  >> Subject: Re: [mvapich-discuss] Cannot utilize CPU core
>>  >>  >> To: MORRIS LAW <morris at hkbu.edu.hk>
>>  >>  >> Cc: Dhabaleswar Panda <panda at cse.ohio-state.edu>, 
>>  >>  >> mvapich-discuss at cse.ohio-state.edu
>>  >>  >>
>>  >>  >>
>>  >>  >>  
>>  >>  >>> Morris,
>>  >>  >>>              If a parallel job is run with
>>  >>  >>> VIADEV_USE_AFFINITY=1, we try to map the processes to cores
>>  >>  >>> during MPI_Init, and the processes stay bound to those cores
>>  >>  >>> for the rest of the time. We allow users to experiment with
>>  >>  >>> various CPU mapping patterns by using the run-time variable
>>  >>  >>> VIADEV_CPU_MAPPING. You can use these variables with the
>>  >>  >>> mpirun_rsh command. You can find more details about these
>>  >>  >>> variables here:
>>  >>  >>> http://nowlab.cse.ohio-state.edu/mvapich-website/support/mvapich_user_guide-1.2rc1.html#x1-1340009.6.5
>>  >>  >>>
>>  >>  >>>             However, if the job is run with
>>  >>  >>> VIADEV_USE_AFFINITY=0, we let the kernel take care of binding
>>  >>  >>> processes to cores.  In this case, the kernel "can" move the
>>  >>  >>> processes in any fashion during the application execution,
>>  >>  >>> which might not be good for the performance of applications.
>>  >>  >>>             In your case, since you are trying to run
>>  >>  >>> multiple jobs on the same node at the same time, it would be
>>  >>  >>> better if you set VIADEV_USE_AFFINITY to 0.  Or, you can also
>>  >>  >>> use VIADEV_CPU_MAPPING to bind the processes of different
>>  >>  >>> jobs such that they do not compete for the same set of cores.
>>  >>  >>> (This is assuming that you do have idle cores when you are
>>  >>  >>> submitting the second job.)
>>  >>  >>>             Please let us know if this helps. Also, as Dr.
>>  >>  >>> Panda had indicated in the last mail, we would recommend
>>  >>  >>> using the latest version of MVAPICH2, as we now offer better
>>  >>  >>> CPU mapping techniques.
>>  >>  >>>  
>>  >>  >>>  Thanks,
>>  >>  >>>  Krishna
>>  >>  >>>  
>>  >>  >>>  
>>  >>  >>>  On Tue, Apr 6, 2010 at 9:12 PM, MORRIS LAW 
>> <morris at hkbu.edu.hk> 
>>  >> wrote:
>>  >>  >>>  
>>  >>  >>>  > The 'affinity' problem is very new to me.  May I know
>>  >>  >>>  > exactly how I can control running of the jobs with
>>  >>  >>>  > VIADEV_USE_AFFINITY=0?
>>  >>  >>>  >
>>  >>  >>>  > Should I place the line in /etc/profile or in the mpirun script?
>>  >>  >>>  >
>>  >>  >>>  > Best Regards,
>>  >>  >>>  >
>>  >>  >>>  > --
>>  >>  >>>  > Morris
>>  >>  >>>  >
>>  >>  >>>  >
>>  >>  >>>  > ----- Original Message -----
>>  >>  >>>  > From: Dhabaleswar Panda <panda at cse.ohio-state.edu>
>>  >>  >>>  > Date: Tuesday, March 30, 2010 8:29 pm
>>  >>  >>>  > Subject: Re: [mvapich-discuss] Cannot utilize CPU core
>>  >>  >>>  > To: MORRIS LAW <morris at hkbu.edu.hk>
>>  >>  >>>  > Cc: mvapich-discuss at cse.ohio-state.edu
>>  >>  >>>  >
>>  >>  >>>  >
>>  >>  >>>  > > You are seeing the effect of `affinity' of processes to
>>  >>  >>>  > > cores here. Try to run your jobs with
>>  >>  >>>  > > VIADEV_USE_AFFINITY=0. More details on this are
>>  >>  >>>  > > available from the MVAPICH 1.1 user guide at the
>>  >>  >>>  > > following location:
>>  >>  >>>  > >
>>  >>  >>>  > > http://mvapich.cse.ohio-state.edu/support/mvapich_user_guide-1.1.html#x1-1340009.6.5
>>  >>  >>>  > >
>>  >>  >>>  > >  Let us know if the problem goes away with this option.
>>  >>  >>>  > >
>>  >>  >>>  > >  Also note that MVAPICH2 has many more flexible ways to
>>  >>  >>>  > >  bind processes to cores. You can use the latest version
>>  >>  >>>  > >  of MVAPICH2 (1.4.1) to take advantage of these features.
>>  >>  >>>  > >
>>  >>  >>>  > >  DK
>>  >>  >>>  > >
>>  >>  >>>  > >
>>  >>  >>>  > >
>>  >>  >>>  > >  On Tue, 30 Mar 2010, MORRIS LAW wrote:
>>  >>  >>>  > >
>>  >>  >>>  > >  > Dear all,
>>  >>  >>>  > >  >
>>  >>  >>>  > >  > I am new to the discussion group.
>>  >>  >>>  > >  >
>>  >>  >>>  > >  > Recently I found a problem running mvapich 1.1 on a
>>  >>  >>>  > >  > Gen2-IB device (Qlogic 9120).  When a subsequent
>>  >>  >>>  > >  > mvapich job was delivered to the nodes, the later
>>  >>  >>>  > >  > jobs would not run on free CPU cores but would
>>  >>  >>>  > >  > compete with the cores already in use.  Thus the
>>  >>  >>>  > >  > whole node cannot be utilized.  I don't know where
>>  >>  >>>  > >  > the problem is.  Is it related to the IB switch or
>>  >>  >>>  > >  > some parameter from when I built mvapich 1.1?
>>  >>  >>>  > >  >
>>  >>  >>>  > >  > I built mvapich 1.1 using gcc 4.1 on CentOS 5.3.  I
>>  >>  >>>  > >  > have also built another version of mvapich 1.1 using
>>  >>  >>>  > >  > icc and ifort on the same CentOS 5.3.  Jobs built
>>  >>  >>>  > >  > with either version exhibit the same problem.
>>  >>  >>>  > >  >
>>  >>  >>>  > >  > Would someone give me some hints to tackle the problem?
>>  >>  >>>  > >  >
>>  >>  >>>  > >  > Thanks in advance.
>>  >>  >>>  > >  >
>>  >>  >>>  > >  > --
>>  >>  >>>  > >  > Morris Law
>>  >>  >>>  > >  > HK Baptist University
>>  >>  >>>  > >  >
>>  >>  >>>  > >  > _______________________________________________
>>  >>  >>>  > >  > mvapich-discuss mailing list
>>  >>  >>>  > >  > mvapich-discuss at cse.ohio-state.edu
>>  >>  >>>  > >  > http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>>  >>  >>>  > >  >
>>  >>  >>>  > >
>>  >>  >>>  > >
>>  >>  >>>  >

