[mvapich-discuss] MVAPICH2 2.1a: Code stalls on Sandy Bridge, works on Westmere

Thompson, Matt (GSFC-610.1)[SCIENCE SYSTEMS AND APPLICATIONS INC] matthew.thompson at nasa.gov
Wed Jan 7 09:15:20 EST 2015


Hari, et al,

Our admins installed MVAPICH2 2.1rc1 for us and it shows the same hang. 
So it looks like whatever is happening was not corrected/changed between 
2.1a and 2.1rc1.

Matt

On 01/06/2015 09:58 AM, Thompson, Matt (GSFC-610.1)[SCIENCE SYSTEMS AND 
APPLICATIONS INC] wrote:
> Hari,
>
> First, this is MVAPICH2 2.1a we are using. I've asked the admins to
> install MVAPICH2 2.1rc1 for us, so I'll update with more info when that
> occurs.
>
> For this experiment, the hanging jobs I've been experimenting with use
> 96 processes: 16 processes per node on 6 Sandy Bridge nodes, and 12 per
> node on 8 Westmere nodes.
>
> As for building MVAPICH2, we used:
>
>> (1098) $ mpiname -a
>> MVAPICH2 2.1a Sun Sep 21 12:00:00 EDT 2014 ch3:mrail
>>
>> Compilation
>> CC: icc -fpic -m64   -DNDEBUG -DNVALGRIND -O2
>> CXX: icpc -fpic -m64  -DNDEBUG -DNVALGRIND -O2
>> F77: ifort -L/lib -L/lib -m64 -fpic  -O2
>> FC: ifort -m64 -fpic  -O2
>>
>> Configuration
>> --without-cma --disable-wrapper-rpath --with-device=ch3:mrail
>> --with-rdma=gen2 CC=icc CXX=icpc F77=ifort FC=ifort CFLAGS=-fpic -m64
>> CXXFLAGS=-fpic -m64 FFLAGS=-m64 -fpic FCFLAGS=-m64 -fpic --enable-f77
>> --enable-fc --enable-cxx --enable-romio --enable-threads=default
>> --with-hwloc -disable-multi-aliases -enable-xrc=yes -enable-hybrid
>> --prefix=/usr/local/other/SLES11.1/mvapich2/2.1a/intel-13.1.2.183
>
> As you can see, we compile with --without-cma because we must. MVAPICH2
> seems to assume that all supercomputers are now running Linux 3.2 or
> higher as, I think, that is when Cross Memory Attach was added. (The
> first time we tried to compile MVAPICH2 where CMA was on by default it
> failed quite excitingly due to missing kernel modules.)
>
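[Editor's aside: the CMA constraint above can be checked mechanically. CMA is built on the process_vm_readv/process_vm_writev syscalls, which entered the kernel in Linux 3.2. A minimal sketch, assuming only the kernel version string is available; the check_cma helper is hypothetical, not part of MVAPICH2:]

```shell
# check_cma VERSION: prints "cma" if the kernel version is >= 3.2
# (i.e. the process_vm_readv/writev syscalls should exist),
# else "no-cma" (configure MVAPICH2 with --without-cma).
check_cma() {
    major=$(echo "$1" | cut -d. -f1)
    minor=$(echo "$1" | cut -d. -f2)
    if [ "$major" -gt 3 ] || { [ "$major" -eq 3 ] && [ "$minor" -ge 2 ]; }; then
        echo cma
    else
        echo no-cma
    fi
}

# Strip any "-default"-style suffix before comparing, e.g. 2.6.32.54-0.3-default
check_cma "$(uname -r | cut -d- -f1)"
```

On the SLES 11 SP1 kernel mentioned below (2.6.32.54), this reports no-cma, matching the failed build.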
> Sadly, discover at NCCS is running SLES 11 SP1 which is Linux
> 2.6.32.54-0.3-default. Even after an upcoming upgrade to SLES 11 SP3, I
> think we'll only be running Linux 3.0 or so, if NAS-pleiades is any
> indication; though, perhaps, SuSE has backported CMA?
>
> Matt
>
> On 01/06/2015 09:35 AM, Hari Subramoni wrote:
>> Hello Matt,
>>
>> Sorry to hear that you're seeing issues with MVAPICH2-2.1rc1. Could you
>> please give us some more information about the experimental setup like
>> number of processes, number of nodes, processes per node as well as the
>> config flags and compilers used to build MVAPICH2? This will enable us
>> to debug the issue further.
>>
>> Are you using CMA here? If not, could you please try using CMA
>> (MV2_SMP_USE_CMA=1) to see if the hang goes away?
>>
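[Editor's aside: with MVAPICH2's mpirun_rsh, runtime knobs like the one Hari suggests are passed as KEY=VALUE arguments before the executable. A sketch of such a launch line; the hostfile name, process count, and executable are placeholders, not from the thread:]

```shell
# Hypothetical launch line: enable CMA for intra-node transfers at runtime.
# (Only meaningful if the build and kernel support CMA in the first place.)
mpirun_rsh -np 96 -hostfile ./hosts MV2_SMP_USE_CMA=1 ./geos5.x
```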
>> Regards,
>> Hari.
>>
>> On Mon, Jan 5, 2015 at 12:37 PM, Thompson, Matt (GSFC-610.1)[SCIENCE
>> SYSTEMS AND APPLICATIONS INC] <matthew.thompson at nasa.gov
>> <mailto:matthew.thompson at nasa.gov>> wrote:
>>
>>     All,
>>
>>     I'm trying to diagnose an issue that is appearing in a model I work
>>     on: GEOS-5. The problem seems to be architecture-dependent and, most
>>     likely, due to MVAPICH2 (as the same code compiled with Intel MPI 5
>>     and the same Fortran compiler seems to have no problem).
>>
>>     I can try to go into more detail (for example if I start adding
>>     print statements to find the stall, it can sometimes cure it!), but
>>     my first question is:
>>
>>        Are there environment variables that control
>> architecture-dependent
>>        behaviour of MVAPICH2?
>>
>>     I ask because I saw in the recent MVAPICH2 2.1rc1 announcement:
>>
>>        (NEW) MVAPICH2 2.1rc1 (based on MPICH 3.1.3) with ...
>>         *optimization and tuning for Haswell architecture*
>>
>>     (I tried searching the User's Guide for "Haswell", but no luck.
>>     Could you point me to possible switches?)
>>
>>     Note, also, that this might not be due to Westmere/Sandy Bridge
>>     tuning, but to the underlying fabric. Here at NCCS, the Westmeres, I
>>     believe, are on DDR interconnects, while the Sandy Bridges I was
>>     using are on FDR (which, I think, is actually connected to a QDR
>>     main switch) and some are on QDR.
>>
>>     If I turn on MV2_SHOW_ENV_INFO=2, I see these differences (left,
>>     Sandy; right, Westmere):
>>
>>                                       Sandy Bridge                     Westmere
>>         PROCESSOR ARCH NAME         : MV2_ARCH_INTEL_XEON_E5_2670_16 | MV2_ARCH_INTEL_XEON_X5650_12
>>         PROCESSOR MODEL NUMBER      : 45                             | 44
>>         HCA NAME                    : MV2_HCA_MLX_CX_FDR             | MV2_HCA_MLX_CX_DDR
>>         MV2_RDMA_FAST_PATH_BUF_SIZE : 5120                           | 9216
>>         MV2_EAGERSIZE_1SC           : 8192                           | 4096
>>         MV2_SMP_EAGERSIZE           : 32769                          | 65537
>>         MV2_SMPI_LENGTH_QUEUE       : 131072                         | 262144
>>         MV2_SMP_NUM_SEND_BUFFER     : 16                             | 32
>>         MPISPAWN_MPIRUN_HOST        : borg01y001                     | borgi117
>>         MPISPAWN_MPIRUN_ID          : 21662                          | 23359
>>         MPISPAWN_NNODES             : 6                              | 8
>>         PMI_PORT                    : borg01y001:44036               | borgi117:37003
>>         MV2_DEFAULT_MTU             : 4                              | 3
>>         MV2_DEFAULT_PKEY            : 393216                         | 524288
>>         MV2_NUM_NODES_IN_JOB        : 6                              | 8
>>
>>
>>     Now some of these can be ignored (MPISPAWN, PROCESSOR, etc.), but
>>     the MV2_ differences here do look like an opportunity.
>>
>>     Some testing showed that if we set:
>>
>>         MV2_SMP_NUM_SEND_BUFFER=32
>>
>>     on the Sandy Bridge, the issue was avoided. Huzzah, right? Well,
>>     when an end-user tried it...it hung for him at some point.
>>     So...yeah. Should I perhaps use all 5 settings from the DDR run?
>>
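[Editor's aside: the "all 5 settings" idea amounts to exporting the Westmere/DDR column's tuned values before launching the Sandy Bridge job. The values below are copied verbatim from the MV2_SHOW_ENV_INFO=2 output above; whether overriding the FDR defaults this way is safe is exactly the open question in the thread:]

```shell
# Sketch: force the five DDR-tuned defaults from the Westmere run onto
# the Sandy Bridge job (values from the MV2_SHOW_ENV_INFO=2 diff above).
export MV2_RDMA_FAST_PATH_BUF_SIZE=9216
export MV2_EAGERSIZE_1SC=4096
export MV2_SMP_EAGERSIZE=65537
export MV2_SMPI_LENGTH_QUEUE=262144
export MV2_SMP_NUM_SEND_BUFFER=32
```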
>>     Any ideas from the experts on why IMPI 5 would not be affected in
>>     the same situation?
>>
>>     Matt
>>
>>     --
>>     Matt Thompson          SSAI, Sr Software Test Engr
>>     NASA GSFC, Global Modeling and Assimilation Office
>>     Code 610.1, 8800 Greenbelt Rd, Greenbelt, MD 20771
>>     Phone: 301-614-6712              Fax: 301-614-6246
>>     _________________________________________________
>>     mvapich-discuss mailing list
>>     mvapich-discuss at cse.ohio-state.edu
>>     http://mailman.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>>
>>
>
>


-- 
Matt Thompson          SSAI, Sr Software Test Engr
NASA GSFC, Global Modeling and Assimilation Office
Code 610.1, 8800 Greenbelt Rd, Greenbelt, MD 20771
Phone: 301-614-6712              Fax: 301-614-6246

