[mvapich-discuss] Performance differences between mvapich2-1.0
and mvapich2-1.2 (fwd)
wei huang
huanwei at cse.ohio-state.edu
Wed Jul 30 10:47:16 EDT 2008
Hi Bernd,
Thanks for trying out mvapich2-1.2rc1 and letting us know about the
problem. We are in the process of performance tuning and are looking at
this issue. We will get back to you soon. Thanks.
-- Wei
> ---------- Forwarded message ----------
> Date: Tue, 29 Jul 2008 18:45:38 +0200
> From: Bernd Kallies <kallies at zib.de>
> To: mvapich-discuss at cse.ohio-state.edu
> Subject: [mvapich-discuss] Performance differences between mvapich2-1.0 and
> mvapich2-1.2
>
> It seems to me that mvapich2-1.2rc1 is slower than previous
> versions when compiled and used with defaults. I'd like to know if I
> forgot some secret preprocessor flag or configure option for 1.2.
>
> I compiled the nightly build for mvapich2-1.0 as of July 28 (I guess it
> is something like mvapich2-1.0.5) with the following settings:
>
> export CC=icc
> export CXX=icpc
> export F77=ifort
> export F90=ifort
> export CFLAGS='-D_EM64T_ -D_SMP_ -DUSE_HEADER_CACHING -DONE_SIDED
> -DMPIDI_CH3_CHANNEL_RNDV -DMPID_USE_SEQUENCE_NUMBERS -DRDMA_CM -O2'
> configure --with-device=osu_ch3:mrail --with-rdma=gen2 --with-pm=mpd
> --disable-romio --enable-sharedlibs=gcc --without-mpe
>
> I compiled the tarball source of mvapich2-1.2rc1 with
> unset CFLAGS
> ./configure --enable-romio --with-file-system=lustre+nfs
> --enable-fast=defopt --with-rdma=gen2 --with-thread-package
> --enable-sharedlibs=gcc --without-mpe
>
> I get the following when running osu_alltoall with 1 task per node on
> two nodes after setting MV2_NUM_PORTS=2 MV2_ENABLE_AFFINITY=0:
>
> mvapich2-1.0.5-intel:
> # OSU MPI All-to-All Personalized Exchange Latency Test v3.1
> # Size Latency (us)
> 1 1.62
> 2 1.71
> 4 1.66
> 8 1.64
> 16 1.68
> 32 1.74
> 64 1.97
> 128 3.04
> 256 3.42
> 512 4.01
> 1024 5.26
> 2048 6.62
> 4096 9.45
> 8192 15.20
> 16384 17.76
> 32768 23.21
> 65536 38.60
> 131072 76.32
> 262144 151.70
> 524288 296.74
> 1048576 591.68
>
> mvapich2-1.2rc1-intel:
> # OSU MPI All-to-All Personalized Exchange Latency Test v3.1
> # Size Latency (us)
> 1 1.87
> 2 1.80
> 4 1.81
> 8 1.82
> 16 1.86
> 32 1.92
> 64 2.10
> 128 3.16
> 256 3.53
> 512 4.07
> 1024 5.33
> 2048 6.79
> 4096 9.54
> 8192 15.34
> 16384 17.48
> 32768 22.88
> 65536 38.78
> 131072 76.55
> 262144 149.74
> 524288 297.11
> 1048576 591.25
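>
> To put numbers on the gap, the relative slowdown of 1.2rc1 versus
> 1.0.5 at a few representative message sizes (latencies copied from the
> two tables above) works out to roughly 15% at 1 byte, shrinking to
> noise at large messages:

```python
# osu_alltoall latencies (us) at selected sizes, from the tables above
lat_105 = {1: 1.62, 64: 1.97, 1024: 5.26, 1048576: 591.68}   # mvapich2-1.0.5
lat_12rc1 = {1: 1.87, 64: 2.10, 1024: 5.33, 1048576: 591.25}  # mvapich2-1.2rc1

# percentage slowdown of 1.2rc1 relative to 1.0.5 per message size
slowdown = {s: round(100 * (lat_12rc1[s] - lat_105[s]) / lat_105[s], 1)
            for s in lat_105}
print(slowdown)  # → {1: 15.4, 64: 6.6, 1024: 1.3, 1048576: -0.1}
```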
>
> Other OSU benchmarks yield no visible differences between the two
> builds, e.g. osu_mbw_mr with 2 nodes and 4 tasks per node:
>
> mvapich2-1.0.5-intel:
> # OSU MPI Multiple Bandwidth / Message Rate Test v3.1
> # [ pairs: 4 ] [ window size: 64 ]
> # Size MB/s Messages/s
> 1 3.45 3447336.26
> 2 6.93 3463236.43
> 4 13.83 3458551.26
> 8 27.68 3460000.08
> 16 62.91 3931824.03
> 32 109.74 3429389.41
> 64 213.14 3330258.12
> 128 353.90 2764881.74
> 256 624.27 2438548.84
> 512 980.57 1915173.15
> 1024 1241.38 1212281.33
> 2048 1463.71 714703.42
> 4096 1612.25 393616.25
> 8192 1721.11 210096.00
> 16384 1851.29 112993.94
> 32768 2051.28 62600.09
> 65536 2062.08 31464.92
> 131072 2065.59 15759.17
> 262144 2074.04 7911.82
> 524288 2082.66 3972.35
> 1048576 2087.94 1991.22
> 2097152 2090.20 996.69
> 4194304 2075.23 494.77
>
> mvapich2-1.2rc1-intel:
> # OSU MPI Multiple Bandwidth / Message Rate Test v3.1
> # [ pairs: 4 ] [ window size: 64 ]
> # Size MB/s Messages/s
> 1 3.42 3424686.07
> 2 6.92 3459442.70
> 4 13.73 3431691.09
> 8 27.59 3449218.84
> 16 62.63 3914337.15
> 32 108.91 3403302.14
> 64 210.89 3295101.65
> 128 347.89 2717920.88
> 256 621.49 2427687.32
> 512 982.32 1918595.24
> 1024 1246.40 1217187.35
> 2048 1490.18 727625.11
> 4096 1684.54 411264.55
> 8192 1768.11 215833.58
> 16384 1852.36 113059.37
> 32768 2048.83 62525.18
> 65536 2062.01 31463.76
> 131072 2066.38 15765.20
> 262144 2074.90 7915.12
> 524288 2082.75 3972.54
> 1048576 2088.07 1991.34
> 2097152 2090.04 996.61
> 4194304 2077.47 495.31
>
> I also compiled the quantum chemistry code CPMD 3.11.1 with both libs.
> The code has own profiling. A benchmark run yields for a run with 64
> nodes, 1 task per node, 1 thread per task, application-defined task
> pinning, MV2_NUM_PORTS=2 MV2_ENABLE_AFFINITY=0:
>
> mvapich2-1.0.5-intel:
> ..
> CPU TIME : 0 HOURS 17 MINUTES 7.53 SECONDS
> ELAPSED TIME : 0 HOURS 17 MINUTES 40.26 SECONDS
> ..
> ================================================================
> = COMMUNICATION TASK AVERAGE MESSAGE LENGTH NUMBER OF CALLS =
> = SEND/RECEIVE 36385. BYTES 722421. =
> = BROADCAST 37880. BYTES 368. =
> = GLOBAL SUMMATION 393974. BYTES 10556. =
> = GLOBAL MULTIPLICATION 0. BYTES 1. =
> = ALL TO ALL COMM 484310. BYTES 46464. =
> = PERFORMANCE TOTAL TIME =
> = SEND/RECEIVE 681.133 MB/S 38.591 SEC =
> = BROADCAST 87.115 MB/S 0.160 SEC =
> = GLOBAL SUMMATION 1520.563 MB/S 16.410 SEC =
> = GLOBAL MULTIPLICATION 0.000 MB/S 0.001 SEC =
> = ALL TO ALL COMM 86.898 MB/S 258.959 SEC =
> = SYNCHRONISATION 1.750 SEC =
> ================================================================
>
> mvapich2-1.2rc1-intel:
> ..
> CPU TIME : 0 HOURS 18 MINUTES 59.23 SECONDS
> ELAPSED TIME : 0 HOURS 19 MINUTES 31.68 SECONDS
> ..
> ================================================================
> = COMMUNICATION TASK AVERAGE MESSAGE LENGTH NUMBER OF CALLS =
> = SEND/RECEIVE 36385. BYTES 722421. =
> = BROADCAST 37880. BYTES 368. =
> = GLOBAL SUMMATION 393974. BYTES 10556. =
> = GLOBAL MULTIPLICATION 0. BYTES 1. =
> = ALL TO ALL COMM 484310. BYTES 46464. =
> = PERFORMANCE TOTAL TIME =
> = SEND/RECEIVE 699.651 MB/S 37.570 SEC =
> = BROADCAST 87.114 MB/S 0.160 SEC =
> = GLOBAL SUMMATION 1557.608 MB/S 16.020 SEC =
> = GLOBAL MULTIPLICATION 0.000 MB/S 0.001 SEC =
> = ALL TO ALL COMM 61.302 MB/S 367.082 SEC =
> = SYNCHRONISATION 1.950 SEC =
> ================================================================
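>
> The elapsed-time numbers above can be related back to the all-to-all
> section directly; using the profiling figures from the two tables, the
> extra all-to-all time accounts for essentially all of the difference
> between the runs:

```python
# Elapsed times and all-to-all times, taken from the two CPMD tables above
elapsed_105 = 17 * 60 + 40.26    # mvapich2-1.0.5 run, seconds
elapsed_12rc1 = 19 * 60 + 31.68  # mvapich2-1.2rc1 run, seconds
a2a_105, a2a_12rc1 = 258.959, 367.082  # ALL TO ALL COMM time, seconds

run_slowdown = round(100 * (elapsed_12rc1 - elapsed_105) / elapsed_105, 1)
a2a_slowdown = round(100 * (a2a_12rc1 - a2a_105) / a2a_105, 1)
# share of the extra elapsed time explained by alltoall alone
a2a_share = round(100 * (a2a_12rc1 - a2a_105) / (elapsed_12rc1 - elapsed_105), 1)
print(run_slowdown, a2a_slowdown, a2a_share)  # → 10.5 41.8 97.0
```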
>
> The difference is reproducible (mvapich2-1.2rc1-intel is slower; the
> reason appears to be the slow all-to-all communication), also compared
> to mvapich2-1.0.3 built from the tarball, and to mvapich2-1.0.1 and
> mvapich-0.9.9 (both precompiled binaries from SGI). Note that the
> benchmarks are run with no intra-node communication.
>
> Sincerely, BK
> --
> Dr. Bernd Kallies
> Konrad-Zuse-Zentrum für Informationstechnik Berlin
> Takustr. 7
> 14195 Berlin
> Tel: +49-30-84185-270
> Fax: +49-30-84185-311
> e-mail: kallies at zib.de
>
>
> _______________________________________________
> mvapich-discuss mailing list
> mvapich-discuss at cse.ohio-state.edu
> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>
>