[mvapich-discuss] (no subject)

Hari Subramoni subramoni.1 at osu.edu
Mon Jun 8 13:29:22 EDT 2015


Hi Prasad,

It looks like oversubscription is happening: in both tests the CPU AFFINITY
output shows ranks 0 and 2 bound to the same core, and likewise ranks 1 and
3. There may be some interaction here between MVAPICH2's internal process
mapping and SLURM's process mapping (if you're using it). Can you please send
the output of "mpiname -a" so that we can see whether MVAPICH2 has been
configured properly to run with SLURM?
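
One way to spot this kind of oversubscription mechanically is to scan the
MV2_SHOW_CPU_BINDING output for CPUs that appear under more than one rank.
The following is a minimal sketch, assuming each RANK line lists a single CPU
as in the output quoted below; the function name find_shared_cpus and its
output format are made up for illustration and are not part of MVAPICH2.

```shell
#!/bin/sh
# find_shared_cpus: read MV2_SHOW_CPU_BINDING-style lines on stdin, e.g.
#   "RANK:0  CPU_SET:   1"
# and print one line per CPU that has more than one rank bound to it.
# Assumption: each rank is bound to exactly one CPU (the last field).
find_shared_cpus() {
    # Drop any leading mail-quote markers ("> ") and indentation first.
    sed 's/^[> ]*//' |
    awk '
        /^RANK:[0-9]+/ && /CPU_SET:/ {
            rank = $1
            sub(/^RANK:/, "", rank)
            cpu = $NF
            if (cpu in ranks)
                ranks[cpu] = ranks[cpu] "," rank
            else
                ranks[cpu] = rank
            count[cpu]++
        }
        END {
            for (cpu in count)
                if (count[cpu] > 1)
                    printf "CPU %s shared by ranks %s\n", cpu, ranks[cpu]
        }
    ' | sort
}
```

Fed the Test 1 affinity block quoted below, this reports CPU 1 shared by
ranks 0 and 2, and CPU 3 shared by ranks 1 and 3. An empty result means no
CPU was listed twice.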

Could you also tell us how many HCAs you have on your system?

In the meantime, please try to run with MV2_ENABLE_AFFINITY=0. This
disables MVAPICH2's internal process mapping feature. This may result in
degraded performance when compared to a case with proper CPU mapping, but
you should see that oversubscription does not happen.
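
With MVAPICH2's own affinity disabled, each rank should simply keep whatever
CPU set the launcher gave it. A rough way to look at that launcher-side set
is to read the Cpus_allowed_list field from the Linux /proc status file in
each task. The sketch below assumes a Linux node; the function names
allowed_cpus and report_binding are hypothetical helpers, not SLURM or
MVAPICH2 commands.

```shell
#!/bin/sh
# allowed_cpus: print the CPU list the kernel lets a process run on.
# Reads the Cpus_allowed_list field of a Linux /proc status file. The
# optional argument (default /proc/self/status) lets the function be
# pointed at another process or at a saved copy of the file.
allowed_cpus() {
    status_file=${1:-/proc/self/status}
    awk -F':[ \t]*' '$1 == "Cpus_allowed_list" { print $2 }' "$status_file"
}

# report_binding: print one line tagged with the host name and SLURM task id.
# SLURM_PROCID is set by srun for each task; "?" is printed outside SLURM.
report_binding() {
    printf '%s task %s cpus %s\n' \
        "$(uname -n)" "${SLURM_PROCID:-?}" "$(allowed_cpus "$@")"
}
```

Assuming the functions are saved in a file called binding.sh (a made-up
name), something like "export MV2_ENABLE_AFFINITY=0" followed by
"srun sh -c '. ./binding.sh; report_binding'" would print one line per task.
Tasks on the same node that report the same single CPU would still point to
oversubscription, this time on the launcher side.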

Regards,
Hari.

On Mon, Jun 8, 2015 at 1:01 PM, Prasad Maddumage <MHEMANTHA at fsu.edu> wrote:

>  Hello Hari,
>
> Thank you for your quick response. We have run two test jobs, and the
> results follow. Both jobs are allocated the correct number of cores but
> use only half of them.
>
> Test 1:
>
> $ cat submit.sh
> #!/bin/sh
>
> #SBATCH -N 2
> #SBATCH --ntasks-per-node=4
>
> export MV2_SHOW_CPU_BINDING=1
> srun --cpu_bind=verbose ./mbp-gnu-mvapich2 primates.nex
>
>
> $ cat slurm-732.out
> cpu_bind=NULL - hpc-tc-2, task  6  2 [6819]: mask 0x20
> cpu_bind=NULL - hpc-tc-1, task  3  3 [1691]: mask 0x80
> cpu_bind=NULL - hpc-tc-2, task  5  1 [6818]: mask 0x8
> cpu_bind=NULL - hpc-tc-2, task  7  3 [6820]: mask 0x80
> cpu_bind=NULL - hpc-tc-2, task  4  0 [6817]: mask 0x2
> cpu_bind=NULL - hpc-tc-1, task  0  0 [1688]: mask 0x2
> cpu_bind=NULL - hpc-tc-1, task  1  1 [1689]: mask 0x8
> cpu_bind=NULL - hpc-tc-1, task  2  2 [1690]: mask 0x20
> -------------CPU AFFINITY-------------
> RANK:0  CPU_SET:   1
> RANK:1  CPU_SET:   3
> RANK:2  CPU_SET:   1
> RANK:3  CPU_SET:   3
> -------------------------------------
>                                MrBayes v3.1.2
>
>                       (Bayesian Analysis of Phylogeny)
>
>                              (Parallel version)
>                          (8 processors available)
>
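
The masks printed by srun --cpu_bind=verbose above are hexadecimal bitmaps in
which bit n stands for CPU n. To compare them with the CPU ids in the CPU
AFFINITY block, they can be decoded with a small helper. This is only a
sketch: the name mask_to_cpus is invented here, and it relies on shell
arithmetic, so it only handles masks that fit in the shell's integer width.

```shell
#!/bin/sh
# mask_to_cpus: print the CPU ids whose bits are set in a hex affinity mask
# such as 0x20, as a comma-separated list. Bit n set means CPU n is allowed.
mask_to_cpus() {
    mask=$(( $1 ))      # shell arithmetic accepts the 0x prefix
    cpu=0
    list=
    while [ "$mask" -gt 0 ]; do
        if [ $(( mask & 1 )) -eq 1 ]; then
            # Append the CPU id, with a comma once the list is non-empty.
            list=${list:+$list,}$cpu
        fi
        mask=$(( mask >> 1 ))
        cpu=$(( cpu + 1 ))
    done
    printf '%s\n' "$list"
}
```

Applied to the Test 1 masks, 0x2, 0x8, 0x20 and 0x80 decode to CPUs 1, 3, 5
and 7, so SLURM hands each node four distinct CPUs, while the CPU AFFINITY
block lists only CPUs 1 and 3.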
>
> Test 2:
>
>  #!/bin/bash
>  #
>  #SBATCH --ntasks=8
>  #SBATCH --nodes=2
>  #SBATCH --core-spec=8
>  #SBATCH --ntasks-per-node=4
>  #SBATCH --ntasks-per-core=1
>  #SBATCH --ntasks-per-socket=4
>  #SBATCH --cpus-per-task=1
>  #SBATCH --extra-node-info=2:4:1
>  #SBATCH --job-name=mva_test
>  #SBATCH -p genacc_q
>  #SBATCH -t 00:30:00
>  #SBATCH --mail-type=ALL
>
>  module purge
>  module load gnu-mvapich2
>
>  echo $SLURM_NODELIST
>  export MV2_SHOW_CPU_BINDING=1
>  srun --cpu_bind=verbose,cores  ./micro_gnu_pich2
>
>   hpc-tc-[1-2]
>  cpu_bind_cores=UNK  - hpc-tc-1, task  1  1 [1579]: mask 0x4 set
>  cpu_bind_cores=UNK  - hpc-tc-2, task  6  2 [6797]: mask 0x10 set
>  cpu_bind_cores=UNK  - hpc-tc-1, task  0  0 [1578]: mask 0x1 set
>  cpu_bind_cores=UNK  - hpc-tc-1, task  2  2 [1580]: mask 0x10 set
>  cpu_bind_cores=UNK  - hpc-tc-1, task  3  3 [1581]: mask 0x40 set
>  cpu_bind_cores=UNK  - hpc-tc-2, task  4  0 [6795]: mask 0x1 set
>  cpu_bind_cores=UNK  - hpc-tc-2, task  5  1 [6796]: mask 0x4 set
>  cpu_bind_cores=UNK  - hpc-tc-2, task  7  3 [6798]: mask 0x40 set
>  -------------CPU AFFINITY-------------
>  RANK:0  CPU_SET:   0
>  RANK:1  CPU_SET:   2
>  RANK:2  CPU_SET:   0
>  RANK:3  CPU_SET:   2
>  -------------------------------------
>
> Please note that no other jobs exist on this test cluster.
>
> Best,
> Prasad
>
>
> On 06/08/2015 12:35 PM, Hari Subramoni wrote:
>
> Hello Prasad,
>
>  It could be due to oversubscription of processes, i.e., multiple processes
> getting bound to the same CPU core. Can you please run your program after
> setting MV2_SHOW_CPU_BINDING=1 and see how the process mapping is happening?
>
>  Regards,
> Hari.
>
> On Mon, Jun 8, 2015 at 12:12 PM, Prasad Maddumage <MHEMANTHA at fsu.edu>
> wrote:
>
>> Hi,
>>
>> I have installed the latest version of mvapich2 and am getting only 50%
>> CPU utilization. This is on a test cluster with no other jobs running. We
>> have hwloc 1.10 already installed, and I did not make any changes to the
>> hwloc 1.9 that came with mvapich2. Could this be a problem? Is it possible
>> to configure mvapich2 with the (already available) hwloc 1.10?
>>
>> Thank you
>> Prasad Maddumage
>>
>> --
>> Prasad Maddumage
>> Application Specialist, Research Computing Center
>> 150F, Dirac Science Library, Florida State University
>> Tallahassee Fl 32306-4120
>>
>
>
> --
> Prasad Maddumage
> Application Specialist, Research Computing Center
> 150F, Dirac Science Library, Florida State University
> Tallahassee Fl 32306-4120
>
>