[mvapich-discuss] mvapich2 slow with default mapping
Götz Waschk
goetz.waschk at gmail.com
Fri Apr 29 07:41:28 EDT 2016
Dear Hari,

here is the output with the default CPU binding, together with the
PingPong benchmark result:
-------------CPU AFFINITY-------------
RANK:0 CPU_SET: 4
RANK:2 CPU_SET: 6
RANK:4 CPU_SET: 4
RANK:6 CPU_SET: 6
RANK:8 CPU_SET: 1
RANK:10 CPU_SET: 7
RANK:12 CPU_SET: 1
RANK:14 CPU_SET: 7
-------------------------------------
#------------------------------------------------------------
# Intel (R) MPI Benchmarks 4.1, MPI-1 part
#------------------------------------------------------------
# Date : Fri Apr 29 13:36:33 2016
# Machine : x86_64
# System : Linux
# Release : 3.10.0-327.10.1.el7.x86_64
# Version : #1 SMP Tue Feb 16 06:09:11 CST 2016
# MPI Version : 3.0
# MPI Thread Environment:
# New default behavior from Version 3.2 on:
# the number of iterations per message size is cut down
# dynamically when a certain run time (per message size sample)
# is expected to be exceeded. Time limit is defined by variable
# "SECS_PER_SAMPLE" (=> IMB_settings.h)
# or through the flag => -time
# Calling sequence was:
# /opt/ohpc/pub/libs/gnu/mvapich2/imb/4.1/bin/IMB-MPI1
# Minimum message length in bytes: 0
# Maximum message length in bytes: 4194304
#
# MPI_Datatype : MPI_BYTE
# MPI_Datatype for reductions : MPI_FLOAT
# MPI_Op : MPI_SUM
#
#
# List of Benchmarks to run:
# PingPong
# PingPing
# Sendrecv
# Exchange
# Allreduce
# Reduce
# Reduce_scatter
# Allgather
# Allgatherv
# Gather
# Gatherv
# Scatter
# Scatterv
# Alltoall
# Alltoallv
# Bcast
# Barrier
#---------------------------------------------------
# Benchmarking PingPong
# #processes = 2
# ( 14 additional processes waiting in MPI_Barrier)
#---------------------------------------------------
       #bytes #repetitions      t[usec]   Mbytes/sec
            0         1000        39.98         0.00
            1         1000         1.59         0.60
            2         1000         1.55         1.23
            4         1000         1.54         2.48
            8         1000         1.55         4.91
           16         1000         1.63         9.33
           32         1000         1.66        18.43
           64         1000         1.69        36.09
          128         1000         1.92        63.60
          256         1000         3.10        78.71
          512         1000         3.37       144.87
         1024         1000         3.94       248.16
         2048         1000        15.67       124.60
         4096         1000        16.61       235.20
         8192         1000        19.52       400.14
        16384         1000        54.38       287.33
        32768         1000        44.83       697.01
        65536          640        62.69       996.96
       131072          320       119.91      1042.41
       262144          160       225.87      1106.84
       524288           80       789.38       633.41
      1048576           40       875.05      1142.79
      2097152           20      1784.90      1120.51
      4194304           10     18072.80       221.33
This is the InfiniBand information of one node; the other looks the same:

hca_id: mlx4_0
        transport:                      InfiniBand (0)
        fw_ver:                         2.7.000
        node_guid:                      0018:8b90:97fe:ef8d
        sys_image_guid:                 0018:8b90:97fe:ef90
        vendor_id:                      0x02c9
        vendor_part_id:                 26428
        hw_ver:                         0xA0
        board_id:                       DEL08C0000009
        phys_port_cnt:                  2
                port:   1
                        state:                  PORT_ACTIVE (4)
                        max_mtu:                4096 (5)
                        active_mtu:             4096 (5)
                        sm_lid:                 318
                        port_lid:               127
                        port_lmc:               0x00
                        link_layer:             InfiniBand

                port:   2
                        state:                  PORT_DOWN (1)
                        max_mtu:                4096 (5)
                        active_mtu:             4096 (5)
                        sm_lid:                 0
                        port_lid:               0
                        port_lmc:               0x00
                        link_layer:             InfiniBand
Regards, Götz Waschk
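
[Editor's note: the binding dump above already hints at the problem: several
ranks share the same core (for example, ranks 0 and 4 both report CPU_SET: 4,
and ranks 2 and 6 both report CPU_SET: 6). As a minimal sketch, the workaround
from the GitHub issue can be applied at launch time as below. The mpirun_rsh
launcher, the host file name, and the process count are assumptions for
illustration, not taken from this thread; the MV2_CPU_MAPPING value assumes
8 cores per node and must be adjusted to the actual node topology.]

```shell
# Show each rank's binding and pin ranks to distinct cores 0-7.
# MV2_CPU_MAPPING assumes 8 cores per node; adjust to your topology.
export MV2_SHOW_CPU_BINDING=1
export MV2_CPU_MAPPING=0:1:2:3:4:5:6:7

# Hypothetical launch line; the host file is a placeholder.
mpirun_rsh -np 16 -hostfile ./hosts \
    /opt/ohpc/pub/libs/gnu/mvapich2/imb/4.1/bin/IMB-MPI1 PingPong
```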
On Thu, Apr 28, 2016 at 5:57 PM, Hari Subramoni <subramoni.1 at osu.edu> wrote:
> Hello Götz,
>
> It looks like some sort of oversubscription is happening here. Could you
> please send us the following information?
>
> 1. Output of program run after setting MV2_SHOW_CPU_BINDING=1
>
> 2. Output of ibv_devinfo executed on the system where you're seeing the
> degradation.
>
> Thanks,
> Hari.
>
> On Apr 28, 2016 10:11 AM, "Götz Waschk" <goetz.waschk at gmail.com> wrote:
>>
>> Dear Mvapich2 experts,
>>
>> I'm currently evaluating OpenHPC packages, including mvapich2 2.1.
>> I've tested the speed using the Intel MPI Benchmarks and have
>> noticed that the first benchmark, PingPong, behaves differently
>> when run with 16 cores vs. 2 cores, although only two cores are in
>> use and the remaining processes simply wait. The full results are
>> in OpenHPC's issue tracker on GitHub:
>> https://github.com/openhpc/ohpc/issues/207#issuecomment-212319647
>>
>> As you can see there, the configuration change to set these variables
>> helped:
>>
>> export MV2_SHOW_CPU_BINDING=1
>> export MV2_CPU_MAPPING=0:1:2:3:4:5:6:7
>>
>> I still wonder why these variables have such an influence and why
>> the default setting isn't sufficient here.
>>
>> Regards,
>> Götz Waschk
>>
>> _______________________________________________
>> mvapich-discuss mailing list
>> mvapich-discuss at cse.ohio-state.edu
>> http://mailman.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
--
AL I:40: Do what thou wilt shall be the whole of the Law.