[mvapich-discuss] Request to detect need for --with-ch3-rank-bits=32

Adam T. Moody moody20 at llnl.gov
Thu Jun 16 14:16:42 EDT 2016

Hi MVAPICH team,
We've got a new system that we're bringing online with 80k+ cores.  I 
hit a hang in the first call to MPI_Gather in mpiBench above a certain 
node count.  After a binary search, I found that things ran fine at 
32768 procs but hung at 32769 procs or more.  This suggested we were 
overflowing some bit field, and that led me to CH3_RANK_BITS, which 
apparently defaults to 16 bits unless you pass the 
--with-ch3-rank-bits=32 flag during configure.
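
To illustrate the arithmetic behind that threshold, here is a minimal 
sketch.  It assumes the rank is held in a signed 16-bit field, which is 
consistent with the hang appearing at exactly 32769 procs but is not 
confirmed here.  The function names are hypothetical and are not part of 
MVAPICH or MPICH.

```c
#include <stdint.h>

/* Assumption: the rank lives in a signed 16-bit field (int16_t).
 * These helpers are illustrative only, not MVAPICH/MPICH code. */

/* Returns 1 if `rank` can be represented in a signed 16-bit field. */
int rank_fits_16(int32_t rank)
{
    return rank >= 0 && rank <= INT16_MAX;
}

/* Largest job size whose highest rank (nprocs - 1) still fits:
 * ranks 0..32767 give 32768 processes. */
int32_t max_procs_16(void)
{
    return (int32_t)INT16_MAX + 1;
}

/* Returns 1 if every rank in a job of `nprocs` processes fits. */
int job_fits_16(int32_t nprocs)
{
    return nprocs >= 1 && rank_fits_16(nprocs - 1);
}
```

Under that assumption, a 32768-proc job tops out at rank 32767 and fits, 
while a 32769-proc job needs rank 32768, which does not.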

I think it's fine to default to 16 bits here, since most users will not 
need the larger rank count, and I'm guessing there could be some 
performance penalty when using 32 bits (if not, perhaps just bump the 
default to 32).

It would be helpful to detect this problem in MPI and throw a fatal 
error pointing users to the option.  Would you please add a patch like 
the following:

#include "mpichconf.h"

#if CH3_RANK_BITS == 16
if (numprocs > 32768) {
  // inform user about --with-ch3-rank-bits=32 configure option
  // bail out with fatal error (it's not going to work anyway)
}
#endif

This could go in MPI_Init or, to handle dynamic process support, in 
communicator creation.

If the upstream MPICH does not already have something like this, let's 
also elevate this request up the chain.
