[mvapich-discuss] Request to detect need for --with-ch3-rank-bits=32

Jonathan Perkins perkinjo at cse.ohio-state.edu
Tue Sep 6 21:06:41 EDT 2016


Hi Adam, thanks for this note. We've applied a fix that addresses this
issue in the MV2 v2.2rc2 release. Let us know if you're still facing any
issues related to this.

On Tue, Sep 6, 2016, 8:28 PM Adam T. Moody <moody20 at llnl.gov> wrote:

> Hi Jonathan,
> I didn't have a chance to look at the code.  Does MV2-2.2rc2 now include
> a catch for this problem?
> Thanks,
> -Adam
>
>
> On 06/16/2016 12:20 PM, Jonathan Perkins wrote:
> > Hi Adam, that sounds like a good idea.  We'll look into it and check
> > with MPICH as well.
> >
> > On Thu, Jun 16, 2016 at 2:17 PM Adam T. Moody <moody20 at llnl.gov> wrote:
> >
> >> Hi MVAPICH team,
> >> We've got a new system that we're bringing online with 80k+ cores.  I
> >> hit a hang in the first call to MPI_Gather in mpiBench above a certain
> >> node count.  After a binary search, I found that things ran fine at
> >> 32768 procs but hung at 32769 procs or larger.  This suggested we were
> >> overflowing some bit field, and that led me to CH3_RANK_BITS which
> >> apparently defaults to 16 bits unless you throw the
> >> --with-ch3-rank-bits=32 flag during configure.
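> >>
> >> To make the overflow concrete (a minimal standalone sketch, not
> >> MVAPICH source): storing ranks in a 16-bit signed field wraps exactly
> >> at the boundary we observed, on typical two's-complement systems.
> >>
> >> #include <stdio.h>
> >> #include <stdint.h>
> >>
> >> int main(void)
> >> {
> >>     for (int rank = 32766; rank <= 32769; rank++) {
> >>         /* what a 16-bit signed rank field would hold */
> >>         int16_t rank16 = (int16_t)rank;
> >>         printf("rank %d stored as %d\n", rank, rank16);
> >>     }
> >>     return 0;
> >> }
> >>
> >> Ranks up to 32767 round-trip intact, but 32768 comes back as -32768,
> >> which matches the 32768-procs-ok / 32769-procs-hang boundary.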
> >>
> >> I think it's fine to default to 16 bits here, since most users will not
> >> need the larger rank count, and I'm guessing there could be some
> >> performance penalty when using 32 bits (if not, perhaps just bump the
> >> default to 32).
> >>
> >> It would be helpful to detect this problem in MPI and throw a fatal
> >> error pointing users to the option.  Would you please add a patch like
> >> the following:
> >>
> >> #include "mpichconf.h"
> >>
> >> #if CH3_RANK_BITS == 16
> >> if (numprocs > 32768) {
> >>    // inform user about --with-ch3-rank-bits=32 configure option
> >>    // bail out with fatal error (it's not going to work anyway)
> >> }
> >> #endif
> >>
> >> This could go in MPI_Init, or, to handle dynamic process support, into
> >> communicator creation.
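> >>
> >> As a standalone illustration of the same guard (a sketch only; the
> >> real patch would use MVAPICH's internal error machinery rather than
> >> MPI_Abort, and would read CH3_RANK_BITS from mpichconf.h):
> >>
> >> #include <stdio.h>
> >> #include <mpi.h>
> >>
> >> static void check_rank_bits(MPI_Comm comm, int rank_bits)
> >> {
> >>     int numprocs;
> >>     MPI_Comm_size(comm, &numprocs);
> >>     /* N signed rank bits represent ranks 0 .. 2^(N-1)-1,
> >>      * i.e. at most 2^(N-1) processes. */
> >>     long max_procs = 1L << (rank_bits - 1);
> >>     if (numprocs > max_procs) {
> >>         fprintf(stderr, "Fatal: %d procs exceeds the %d-bit rank "
> >>                 "limit of %ld; reconfigure with "
> >>                 "--with-ch3-rank-bits=32\n",
> >>                 numprocs, rank_bits, max_procs);
> >>         MPI_Abort(comm, 1);
> >>     }
> >> }
> >>
> >> int main(int argc, char **argv)
> >> {
> >>     MPI_Init(&argc, &argv);
> >>     check_rank_bits(MPI_COMM_WORLD, 16);  /* assume 16 rank bits */
> >>     MPI_Finalize();
> >>     return 0;
> >> }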
> >>
> >> If upstream MPICH does not already have something like this, let's
> >> also elevate this request up the chain.
> >> Thanks!
> >> -Adam

