[mvapich-discuss] Bug in mvapich-0.9.9-beta collective operations
amith rajith mamidala
mamidala at cse.ohio-state.edu
Thu Apr 12 11:41:50 EDT 2007
Hi all,
We have applied this patch to both the trunk and the branch. Thanks!
Amith
On Wed, 11 Apr 2007, Mishura, Dmitri wrote:
> Hi all,
> Sorry, my previous mail doesn't appear to have been sent to the list properly.
>
> I would like to post a patch that fixes a collective bandwidth issue with large (>0.5 MB) vector sizes in mvapich-0.9.9-beta.
> Without this patch, mvapich shows poor bandwidth at core counts >=64 on several Intel clusters. This appears to be due to an error in the indexing of the collective threshold table in the file intra_fns_new.c. The issue causes an unexpected switch to the old method (e.g. recursive doubling in intra_AllReduce). With the fix, bandwidth (allreduce in this particular case) improved substantially: 6x on our InfiniBand clusters, from 25Mb/s to 150Mb/s on 64 cores for sizes larger than 512KB.
>
> Patch of src/coll/intra_fns_new.c:
> ========================================
> 98c98
> < #define COLL_SIZE 4
> ---
> > #define COLL_SIZE 5
> 103c103
> < int coll_table[COLL_COUNT][COLL_SIZE+1] = {{-1, -1, -1, 16384, 16384},
> ---
> > int coll_table[COLL_COUNT][COLL_SIZE] = {{-1, -1, -1, 16384, 16384},
>
> =========================================
>
>
> Dmitry Mishura, Intel Nizhny Novgorod Lab
>
> _______________________________________________
> mvapich-discuss mailing list
> mvapich-discuss at cse.ohio-state.edu
> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>