[mvapich-discuss] Bug in mvapich-0.9.9-beta collective operations

amith rajith mamidala mamidala at cse.ohio-state.edu
Thu Apr 12 11:41:50 EDT 2007


Hi all,

We have applied this patch to both the trunk and the branch. Thanks!

Amith

On Wed, 11 Apr 2007, Mishura, Dmitri wrote:

> Hi all,
> Sorry, my previous mail does not appear to have been sent to the list properly.
>
> I would like to post a patch that fixes a collective bandwidth issue with large (>0.5 MB) vector sizes in mvapich-0.9.9-beta.
> Without this patch, mvapich shows poor bandwidth at core counts >=64 on several Intel clusters. This appears to be due to an error in the indexing of the collective threshold table in intra_fns_new.c, which causes an unexpected switch to the old method (e.g. "recursive doubling" in intra_AllReduce). With the fix, bandwidth (allreduce, in this particular case) improved substantially: 6x on our InfiniBand clusters, from 25Mb/s to 150Mb/s on 64 cores for sizes larger than 512KB.
>  
> Patch of src/coll/intra_fns_new.c:
> ========================================
> 98c98
> < #define COLL_SIZE  4
> ---
> > #define COLL_SIZE  5
> 103c103
> < int coll_table[COLL_COUNT][COLL_SIZE+1] = {{-1, -1, -1, 16384, 16384},
> ---
> > int coll_table[COLL_COUNT][COLL_SIZE] = {{-1, -1, -1, 16384, 16384},
>
> =========================================
>
>
> Dmitry Mishura, Intel Nizhny Novgorod Lab
>
> _______________________________________________
> mvapich-discuss mailing list
> mvapich-discuss at cse.ohio-state.edu
> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>
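
For readers following the patch: it leaves the table at five columns per row but raises COLL_SIZE from 4 to 5. Below is a minimal sketch of how such a mismatch could hide the last column from a lookup. The function lookup_threshold, the constant NUM_COLS, and the reading of -1 as "fall back to the older algorithm" are assumptions made for illustration; they are not taken from intra_fns_new.c.

```c
#include <assert.h>

/* Actual width of each table row: five entries, as in the patch. */
#define NUM_COLS 5

/* First row of coll_table as quoted in the patch. */
static const int first_row[NUM_COLS] = {-1, -1, -1, 16384, 16384};

/*
 * Hypothetical lookup, NOT the MVAPICH implementation.
 * declared_cols plays the role of COLL_SIZE: the number of columns the
 * calling code believes the table has. A column index at or beyond that
 * count is treated as "no entry", and -1 is returned, which this sketch
 * assumes makes the caller fall back to the older algorithm.
 */
static int lookup_threshold(const int *row, int declared_cols, int col)
{
    assert(col >= 0 && col < NUM_COLS);   /* never read past the real row */
    if (col >= declared_cols)
        return -1;                        /* column hidden by a too-small count */
    return row[col];
}
```

Under these assumptions, lookup_threshold(first_row, 4, 4) returns -1 (the last column is never consulted), while lookup_threshold(first_row, 5, 4) returns 16384. That is the kind of behaviour change the COLL_SIZE fix would produce if the last column corresponds to the largest core counts.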




