[mvapich-discuss] Request to detect need for --with-ch3-rank-bits=32
Adam T. Moody
moody20 at llnl.gov
Thu Jun 16 14:16:42 EDT 2016
Hi MVAPICH team,
We've got a new system that we're bringing online with 80k+ cores. I
hit a hang in the first call to MPI_Gather in mpiBench above a certain
node count. After a binary search, I found that things ran fine at
32768 procs but hung at 32769 procs or more. This suggested we were
overflowing some bit field, and that led me to CH3_RANK_BITS, which
apparently defaults to 16 bits unless you pass the
--with-ch3-rank-bits=32 flag during configure.
I think it's fine to default to 16 bits here, since most users will not
need the larger rank count, and I'm guessing there could be some
performance penalty when using 32 bits (if not, perhaps just bump the
default to 32).
It would be helpful to detect this problem in MPI and throw a fatal
error pointing users to the option. Would you please add a patch like
the following:
#include "mpichconf.h"
#if CH3_RANK_BITS == 16
if (numprocs > 32768) {
    // inform user about --with-ch3-rank-bits=32 configure option
    // bail out with fatal error (it's not going to work anyway)
}
#endif
This could go in MPI_Init or, to handle dynamic process support, it
should go into communicator creation.
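
For illustration only, here is a slightly fuller sketch of the same check
as a standalone helper. The names check_rank_bits and
MAX_PROCS_16BIT_RANKS are hypothetical, not existing MVAPICH or MPICH
symbols, and the 32768 limit is simply the largest job size I saw work.
The #ifndef fallback only keeps the sketch compilable on its own; in the
library the value would come from mpichconf.h.

```c
#include <stdio.h>

/* Assumption: CH3_RANK_BITS is normally provided by mpichconf.h.
 * This fallback exists only so the sketch compiles standalone. */
#ifndef CH3_RANK_BITS
#define CH3_RANK_BITS 16
#endif

/* Hypothetical constant: largest process count observed to work
 * with the default 16-bit rank field. */
#define MAX_PROCS_16BIT_RANKS 32768

/* Hypothetical helper: returns 0 if numprocs is supported by this
 * build, or -1 after writing an explanatory message into err. */
static int check_rank_bits(int numprocs, char *err, size_t errlen)
{
    if (errlen > 0)
        err[0] = '\0';
#if CH3_RANK_BITS == 16
    if (numprocs > MAX_PROCS_16BIT_RANKS) {
        snprintf(err, errlen,
                 "%d processes exceeds the limit of %d for a %d-bit rank "
                 "field; rebuild with --with-ch3-rank-bits=32",
                 numprocs, MAX_PROCS_16BIT_RANKS, CH3_RANK_BITS);
        return -1;
    }
#else
    (void)numprocs; /* wider rank field: no limit checked in this sketch */
#endif
    return 0;
}
```

On a -1 return, the caller would print the message and take whatever
fatal-error path the library already uses at that point.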
If the upstream MPICH does not already have something like this, let's
also elevate this request up the chain.
Thanks!
-Adam