[mvapich-discuss] the bug in mvapich-0.9.9-beta2

Shalnov, Sergey Sergey.Shalnov at intel.com
Wed Apr 11 12:30:27 EDT 2007


Amith,
I forgot to mention that this MPI_Allreduce behavior shows up on runs with
64 or more CPUs. I get the same results for 64-, 128-, 256- and 512-core
runs on our cluster.

Thank you
Sergey


-----Original Message-----
From: amith rajith mamidala [mailto:mamidala at cse.ohio-state.edu] 
Sent: Wednesday, April 11, 2007 20:13
To: Shalnov, Sergey
Cc: mvapich-discuss at cse.ohio-state.edu
Subject: Re: [mvapich-discuss] the bug in mvapich-0.9.9-beta2

Hi Shalnov,

Thanks for letting us know about this. We are looking into it.

Thanks,
Amith

On Wed, 11 Apr 2007, Shalnov, Sergey wrote:

> Hello,
> I started working with mvapich-0.9.9-beta2 and found that the performance
> of collective operations like MPI_Allreduce drops dramatically once the
> message size reaches 512 KB. Here are the numbers from my experiments:
>
> Message size (bytes)    mvapich-0.9.9 (MB/s)
>                4 096    89.6467
>                8 192    117.332
>               16 384    142.787
>               32 768    158.847
>               65 536    144.555
>              131 072    152.266
>              262 144    166.436
>              524 288    32.6395
>            1 048 576    30.8811
>            2 097 152    27.8332
>            4 194 304    26.4895
>            8 388 608    26.157
>           16 777 216    25.4449
>           33 554 432    25.9249
>
> The first column is the message size for the MPI_Allreduce routine and the
> second column is the network bandwidth in MB/s.
>
> We looked into the code and found a bug in
> $MVAPICH_BUILD_HOME/src/coll/intra_fns_new.c:103. This line is the
> definition of the array coll_table. The second dimension of this array is
> the macro COLL_SIZE, which is defined as 5 on line 98. As I understand it,
> defining COLL_SIZE as 5 is not correct: it must be defined as 4, and the
> definition of coll_table must be rewritten as
> int coll_table[COLL_COUNT][COLL_SIZE+1]...
> This should be done because on line 555 of the same file we can see the
> following code:
> if (lgn > COLL_SIZE) lgn = COLL_SIZE;
> After this, lgn is used as an array index. With COLL_SIZE defined as 5 the
> clamped value can be 5, while a dimension of size COLL_SIZE only has valid
> indices 0 to 4, so the access goes out of bounds.
>
> After making a small fix in this file I got the following results:
>
> Message size (bytes)    mvapich-0.9.9-fixed (MB/s)
>                4 096    87.7509
>                8 192    118.255
>               16 384    139.281
>               32 768    153.702
>               65 536    137.334
>              131 072    141.985
>              262 144    152.65
>              524 288    187.648
>            1 048 576    153.005
>            2 097 152    120.034
>            4 194 304    108.731
>            8 388 608    99.7531
>           16 777 216    97.5817
>           33 554 432    96.4089
>
> So, I think I found the bug in the mvapich-0.9.9-beta2 code.
>
> Thank you
> Sergey
>


