[mvapich-discuss] MVAPICH2 2.1a: Code stalls on Sandy Bridge, works on Westmere

Thompson, Matt (GSFC-610.1)[SCIENCE SYSTEMS AND APPLICATIONS INC] matthew.thompson at nasa.gov
Tue Jan 6 09:58:29 EST 2015


Hari,

First, this is MVAPICH2 2.1a we are using. I've asked the admins to 
install MVAPICH2 2.1rc1 for us, so I'll update with more info when that 
occurs.

For this experiment, the hanging jobs I've been experimenting with are 
96 processes: 16 processes per node across 6 nodes on Sandy Bridge, and 
12 per node across 8 nodes on Westmere.
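
In case it helps, the launch looks roughly like this (the hostfile 
layout and executable name are illustrative; the real runs go through 
our batch system):

    # Each of the 6 Sandy Bridge nodes is listed 16 times in the hostfile,
    # giving 96 ranks at 16 per node.
    mpirun_rsh -np 96 -hostfile ./hosts.sandy \
        MV2_SHOW_ENV_INFO=2 ./GEOSgcm.x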

As for building MVAPICH2, we used:

> (1098) $ mpiname -a
> MVAPICH2 2.1a Sun Sep 21 12:00:00 EDT 2014 ch3:mrail
>
> Compilation
> CC: icc -fpic -m64   -DNDEBUG -DNVALGRIND -O2
> CXX: icpc -fpic -m64  -DNDEBUG -DNVALGRIND -O2
> F77: ifort -L/lib -L/lib -m64 -fpic  -O2
> FC: ifort -m64 -fpic  -O2
>
> Configuration
> --without-cma --disable-wrapper-rpath --with-device=ch3:mrail
> --with-rdma=gen2 CC=icc CXX=icpc F77=ifort FC=ifort CFLAGS=-fpic -m64
> CXXFLAGS=-fpic -m64 FFLAGS=-m64 -fpic FCFLAGS=-m64 -fpic --enable-f77
> --enable-fc --enable-cxx --enable-romio --enable-threads=default
> --with-hwloc -disable-multi-aliases -enable-xrc=yes -enable-hybrid
> --prefix=/usr/local/other/SLES11.1/mvapich2/2.1a/intel-13.1.2.183

As you can see, we compile with --without-cma because we must: MVAPICH2 
seems to assume that all supercomputers are now running Linux 3.2 or 
higher, which is, I think, when Cross Memory Attach was added. (The 
first time we tried to compile MVAPICH2 with CMA on by default, it 
failed quite excitingly due to the missing kernel support.)

Sadly, Discover at NCCS is running SLES 11 SP1, which is Linux 
2.6.32.54-0.3-default. Even after an upcoming upgrade to SLES 11 SP3, I 
think we'll only be running Linux 3.0 or so, if NAS Pleiades is any 
indication; though, perhaps, SuSE has backported CMA?
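
(For anyone else stuck on an older kernel: CMA is just the 
process_vm_readv/process_vm_writev syscall pair that landed in Linux 
3.2, so a quick-and-dirty way to see whether a node could support 
MV2_SMP_USE_CMA=1 is something like the following; the kallsyms grep is 
a heuristic, not gospel.)

    uname -r                                  # 2.6.32.54-0.3-default here, i.e. pre-3.2
    grep -c process_vm_readv /proc/kallsyms   # 0 means the kernel has no CMA support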

Matt

On 01/06/2015 09:35 AM, Hari Subramoni wrote:
> Hello Matt,
>
> Sorry to hear that you're seeing issues with MVAPICH2-2.1rc1. Could you
> please give us some more information about the experimental setup like
> number of processes, number of nodes, processes per node as well as the
> config flags and compilers used to build MVAPICH2? This will enable us
> to debug the issue further.
>
> Are you using CMA here? If not, could you please try using CMA
> (MV2_SMP_USE_CMA=1) to see if the hang goes away?
>
> Regards,
> Hari.
>
> On Mon, Jan 5, 2015 at 12:37 PM, Thompson, Matt (GSFC-610.1)[SCIENCE
> SYSTEMS AND APPLICATIONS INC] <matthew.thompson at nasa.gov> wrote:
>
>     All,
>
>     I'm trying to diagnose an issue that is appearing in a model I work
>     on: GEOS-5. The problem seems to be architecture-dependent and, most
>     likely, due to MVAPICH2 (as the same code compiled with Intel MPI 5
>     and the same Fortran compiler seems to have no problem).
>
>     I can try to go into more detail (for example, if I start adding
>     print statements to find the stall, it can sometimes cure it!), but
>     my first question is:
>
>        Are there environment variables that control architecture-dependent
>        behaviour of MVAPICH2?
>
>     I ask because I saw in the recent MVAPICH2 2.1rc1 announcement:
>
>        (NEW) MVAPICH2 2.1rc1 (based on MPICH 3.1.3) with ...
>         *optimization and tuning for Haswell architecture*
>
>     (I tried searching the User's Guide for "Haswell", but no luck.
>     Could you point me to possible switches?)
>
>     Note, also, that this might not be due to Westmere/Sandy Bridge
>     tuning at all, but to the underlying fabric. Here at NCCS, the
>     Westmeres, I believe, are on DDR interconnects, while the Sandy
>     Bridges I was using are on FDR (which, I think, is actually
>     connected to a QDR main switch) and some are on QDR.
>
>     If I turn on MV2_SHOW_ENV_INFO=2, I see these differences (left,
>     Sandy; right, Westmere):
>
>         PROCESSOR ARCH NAME         : MV2_ARCH_INTEL_XEON_E5_2670_16 | MV2_ARCH_INTEL_XEON_X5650_12
>         PROCESSOR MODEL NUMBER      : 45                             | 44
>         HCA NAME                    : MV2_HCA_MLX_CX_FDR             | MV2_HCA_MLX_CX_DDR
>         MV2_RDMA_FAST_PATH_BUF_SIZE : 5120                           | 9216
>         MV2_EAGERSIZE_1SC           : 8192                           | 4096
>         MV2_SMP_EAGERSIZE           : 32769                          | 65537
>         MV2_SMPI_LENGTH_QUEUE       : 131072                         | 262144
>         MV2_SMP_NUM_SEND_BUFFER     : 16                             | 32
>         MPISPAWN_MPIRUN_HOST        : borg01y001                     | borgi117
>         MPISPAWN_MPIRUN_ID          : 21662                          | 23359
>         MPISPAWN_NNODES             : 6                              | 8
>         PMI_PORT                    : borg01y001:44036               | borgi117:37003
>         MV2_DEFAULT_MTU             : 4                              | 3
>         MV2_DEFAULT_PKEY            : 393216                         | 524288
>         MV2_NUM_NODES_IN_JOB        : 6                              | 8
>
>
>     Now some of these can be ignored (MPISPAWN, PROCESSOR, etc.), but
>     the MV2_ differences here look like an opportunity for tuning.
>
>     Some testing showed that if we set:
>
>         MV2_SMP_NUM_SEND_BUFFER=32
>
>     on the Sandy Bridge, the issue was avoided. Huzzah, right? Well,
>     when an end-user tried it...it hung for him at some point.
>     So...yeah. Should I perhaps use all 5 settings from the DDR run?
>
>     Any ideas from the experts on why IMPI 5 would not be affected in
>     the same situation?
>
>     Matt
>
>     --
>     Matt Thompson          SSAI, Sr Software Test Engr
>     NASA GSFC, Global Modeling and Assimilation Office
>     Code 610.1, 8800 Greenbelt Rd, Greenbelt, MD 20771
>     Phone: 301-614-6712              Fax: 301-614-6246
>     _________________________________________________
>     mvapich-discuss mailing list
>     mvapich-discuss at cse.ohio-state.edu
>     http://mailman.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>
>
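
P.S. For completeness, forcing all five of the DDR-side MV2_ values 
from the comparison above onto a Sandy Bridge run would look roughly 
like this. Only MV2_SMP_NUM_SEND_BUFFER has actually been tried here, 
and I'm not certain every one of these (MV2_EAGERSIZE_1SC in 
particular) is honored as a user-settable environment variable:

    # Untested sketch: Westmere/DDR defaults applied to the Sandy Bridge job
    export MV2_RDMA_FAST_PATH_BUF_SIZE=9216
    export MV2_EAGERSIZE_1SC=4096
    export MV2_SMP_EAGERSIZE=65537
    export MV2_SMPI_LENGTH_QUEUE=262144
    export MV2_SMP_NUM_SEND_BUFFER=32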


-- 
Matt Thompson          SSAI, Sr Software Test Engr
NASA GSFC, Global Modeling and Assimilation Office
Code 610.1, 8800 Greenbelt Rd, Greenbelt, MD 20771
Phone: 301-614-6712              Fax: 301-614-6246

