[mvapich-discuss] proposal to fix MPI_Allreduce bandwidth
amith rajith mamidala
mamidala at cse.ohio-state.edu
Wed Apr 18 11:03:35 EDT 2007
Hi Shalnov,
Thanks for sending the performance data. We are looking into this.
Thanks,
Amith
On Wed, 18 Apr 2007, Shalnov, Sergey wrote:
> Hello,
> I downloaded a fresh copy of mvapich-0.9.9 via svn and ran several
> experiments with collective operations such as MPI_Allreduce and
> MPI_Allgatherv. I found that MPI_Allreduce bandwidth has a dip for
> message sizes from 16 KB to 512 KB. I am not sure about other
> architectures, but it appears on my Intel-based InfiniBand clusters
> (I tested on two clusters, but the results below are from one of
> them).
>
> The attached Microsoft spreadsheet contains the results and graphs to
> help you examine them. There are three columns:
> 1 - mvapich-0.9.9: the version of mvapich-0.9.9 from the tarball.
> 2 - mvapich-0.9.9-trunk: the version from the main trunk (with Dmitri
> Mishura's fix,
> http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/2007-April/000734.html,
> included).
> 3 - mvapich-0.9.9-fixed1: #2 plus the attached patch.
>
> The attached patch applies to line 73 of
> $MVAPICH_BUILD_HOME/src/coll/intra_fns_new.c. That line currently
> reads:
>
> #define SHMEM_COLL_ALLREDUCE_THRESHOLD (1<<19)
>
> I propose changing it to:
>
> #define SHMEM_COLL_ALLREDUCE_THRESHOLD (1<<15)
>
> This change improves bandwidth on my cluster, as shown below:
>
> Message size (bytes)  mvapich-0.9.9  mvapich-0.9.9-fixed1  mvapich-0.9.9-trunk
>                 4096        89.6467               93.0733              93.2853
>                 8192       117.332               139.493              137.496
>                16384       142.787               184.153              185.812
>                32768       158.847               286.147              206.245
>                65536       144.555               328.089              192.31
>               131072       152.266               289.667              190.743
>               262144       166.436               279.73               203.1
>               524288        32.6395              253.501              252.428
>              1048576        30.8811              231.03               229.957
>              2097152        27.8332              199.249              201.419
>              4194304        26.4895              191.914              192.835
>              8388608        26.157               183.449              184.363
>             16777216        25.4449              178.985              181.572
>             33554432        25.9249              177.411              179.012
>
> The testing method sends the same total amount of data (167772160
> bytes) in each measurement, split into messages of the given block
> size. Each measurement therefore yields the network bandwidth for a
> particular message size of the MPI collective operation.
>
> Thank you
> Sergey
>
>
>
>