[mvapich-discuss] ROMIO bug in ADIOI_LUSTRE_Get_striping_info

Jonathan Perkins perkinjo at cse.ohio-state.edu
Fri Apr 1 11:31:07 EDT 2011


Mark:
Hi, thanks for using MVAPICH2.  We usually update ROMIO when we rebase
our code on MPICH2 (upstream).  Thanks for this bug report; we'll look
at pulling in this fix, and any others we find, for our next
release.

On Fri, Apr 1, 2011 at 9:56 AM, Mark Dixon <m.c.dixon at leeds.ac.uk> wrote:
> Thanks for all the work on MVAPICH2 - it's really a great MPI.
>
> However, I've been looking at a problem one of my users is having with
> MVAPICH2 1.6. His code fails because an MPI-IO routine attempts to allocate
> a 4 GB buffer for every process.
>
> I've traced the problem to ADIOI_LUSTRE_Get_striping_info, in the file
> mvapich2-1.6/src/mpi/romio/adio/ad_lustre/ad_lustre_aggregate.c, which
> contains this (from line 304):
>
>    if (avail_cb_nodes ==  CO_nodes) {
>        do {
>            /* find the divisor of CO_nodes */
>            divisor = 1;
>            do {
>                divisor ++;
>            } while (CO_nodes % divisor);
>            CO_nodes = CO_nodes / divisor;
>            /* if stripe_count*CO is a prime number, change nothing */
>            if ((CO_nodes <= avail_cb_nodes) && (CO_nodes != 1)) {
>                avail_cb_nodes = CO_nodes;
>                break;
>            }
>        } while (CO_nodes != 1);
>    }
>
> When his program enters this segment, both avail_cb_nodes and CO_nodes equal
> 1. Very bad things then happen in the innermost while loop, as it attempts
> and fails to count to infinity: with CO_nodes == 1, CO_nodes % divisor is
> never zero for any divisor >= 2, so divisor just keeps incrementing.
>
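> To illustrate what I mean (this is just a standalone sketch with a
> made-up function name, not a proposed patch and not what upstream does),
> guarding the factoring step so it only runs when CO_nodes > 1 avoids
> the hang:
>
>    #include <stdio.h>
>
>    /* Sketch of the same smallest-divisor logic, with a guard so that
>     * CO_nodes == 1 is left alone instead of looping forever. */
>    static int pick_avail_cb_nodes(int avail_cb_nodes, int CO_nodes)
>    {
>        if (avail_cb_nodes == CO_nodes && CO_nodes > 1) {
>            do {
>                /* find the smallest divisor (> 1) of CO_nodes */
>                int divisor = 2;
>                while (CO_nodes % divisor)
>                    divisor++;
>                CO_nodes /= divisor;
>                /* if CO_nodes was prime, this leaves 1 and nothing changes */
>                if (CO_nodes <= avail_cb_nodes && CO_nodes != 1) {
>                    avail_cb_nodes = CO_nodes;
>                    break;
>                }
>            } while (CO_nodes != 1);
>        }
>        return avail_cb_nodes;
>    }
>
>    int main(void)
>    {
>        printf("%d\n", pick_avail_cb_nodes(1, 1));  /* prints 1, no hang */
>        printf("%d\n", pick_avail_cb_nodes(6, 6));  /* prints 3 (6 / 2)  */
>        return 0;
>    }
>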
> This code seems to have been replaced in the mpich2 codebase (I think this
> is the upstream?) in r6323 by Pascal Deveze. I don't know what
> avail_cb_nodes *should* be set to, but a quick check shows that "1" looks a
> whole lot more sensible than the "4294967295" I'm getting now.
>
> Is anyone looking at refreshing the Lustre ROMIO driver in MVAPICH2?
>
> How does the upstream/downstream business work with ROMIO anyway? From the
> outside, it looks like MPICH2, MVAPICH2 and OpenMPI, for example, each
> maintain their own separate copy.
>
> Thanks,
>
> Mark
> --
> -----------------------------------------------------------------
> Mark Dixon                       Email    : m.c.dixon at leeds.ac.uk
> HPC/Grid Systems Support         Tel (int): 35429
> Information Systems Services     Tel (ext): +44(0)113 343 5429
> University of Leeds, LS2 9JT, UK
> -----------------------------------------------------------------
> _______________________________________________
> mvapich-discuss mailing list
> mvapich-discuss at cse.ohio-state.edu
> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>
>



-- 
Jonathan Perkins
http://www.cse.ohio-state.edu/~perkinjo


