[mvapich-discuss] MVAPICH2 2.1a: Code stalls on Sandy Bridge, works on Westmere
Thompson, Matt (GSFC-610.1)[SCIENCE SYSTEMS AND APPLICATIONS INC]
matthew.thompson at nasa.gov
Wed Jan 7 09:15:20 EST 2015
Hari, et al,
Our admins installed MVAPICH2 2.1rc1 for us and it shows the same hang.
So it looks like whatever is happening was not corrected or changed
between 2.1a and 2.1rc1.
Matt
On 01/06/2015 09:58 AM, Thompson, Matt (GSFC-610.1)[SCIENCE SYSTEMS AND
APPLICATIONS INC] wrote:
> Hari,
>
> First, this is MVAPICH2 2.1a we are using. I've asked the admins to
> install MVAPICH2 2.1rc1 for us, so I'll update with more info when that
> occurs.
>
> For this experiment, the hanging jobs I've been experimenting with are
> 96 processes: 16 processes per node on 6 nodes on Sandy Bridge, and 12
> per node on 8 nodes on Westmere.
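>
> In mpirun_rsh terms that corresponds to launch lines roughly like the
> following (hostfile and executable names here are just placeholders,
> with each node repeated once per process slot in the hostfile):
>
>    mpirun_rsh -np 96 -hostfile hosts.sandy    ./geos5.x  # 6 nodes x 16
>    mpirun_rsh -np 96 -hostfile hosts.westmere ./geos5.x  # 8 nodes x 12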
>
> As for building MVAPICH2, we used:
>
>> (1098) $ mpiname -a
>> MVAPICH2 2.1a Sun Sep 21 12:00:00 EDT 2014 ch3:mrail
>>
>> Compilation
>> CC: icc -fpic -m64 -DNDEBUG -DNVALGRIND -O2
>> CXX: icpc -fpic -m64 -DNDEBUG -DNVALGRIND -O2
>> F77: ifort -L/lib -L/lib -m64 -fpic -O2
>> FC: ifort -m64 -fpic -O2
>>
>> Configuration
>> --without-cma --disable-wrapper-rpath --with-device=ch3:mrail
>> --with-rdma=gen2 CC=icc CXX=icpc F77=ifort FC=ifort CFLAGS=-fpic -m64
>> CXXFLAGS=-fpic -m64 FFLAGS=-m64 -fpic FCFLAGS=-m64 -fpic --enable-f77
>> --enable-fc --enable-cxx --enable-romio --enable-threads=default
>> --with-hwloc -disable-multi-aliases -enable-xrc=yes -enable-hybrid
>> --prefix=/usr/local/other/SLES11.1/mvapich2/2.1a/intel-13.1.2.183
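>
> Put back on one line (adding shell quoting around the multi-word FLAGS,
> which mpiname's output drops), the configure step would have looked
> roughly like this:
>
>    ./configure \
>        --prefix=/usr/local/other/SLES11.1/mvapich2/2.1a/intel-13.1.2.183 \
>        --with-device=ch3:mrail --with-rdma=gen2 --without-cma \
>        --disable-wrapper-rpath --enable-f77 --enable-fc --enable-cxx \
>        --enable-romio --enable-threads=default --with-hwloc \
>        -disable-multi-aliases -enable-xrc=yes -enable-hybrid \
>        CC=icc CXX=icpc F77=ifort FC=ifort \
>        CFLAGS="-fpic -m64" CXXFLAGS="-fpic -m64" \
>        FFLAGS="-m64 -fpic" FCFLAGS="-m64 -fpic"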
>
> As you can see, we compile with --without-cma because we must. MVAPICH2
> seems to assume that all supercomputers are now running Linux 3.2 or
> higher as, I think, that is when Cross Memory Attach was added. (The
> first time we tried to compile MVAPICH2 with CMA on by default, it
> failed quite excitingly due to missing kernel support.)
>
> Sadly, discover at NCCS is running SLES 11 SP1, which is Linux
> 2.6.32.54-0.3-default. Even after an upcoming upgrade to SLES 11 SP3, I
> think we'll only be running Linux 3.0 or so, if NAS-pleiades is any
> indication; though, perhaps, SuSE has backported CMA?
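>
> (A quick way to check whether a given kernel actually has the CMA
> syscalls, backport or not, is to look for them on a compute node, e.g.:
>
>    uname -r                              # 2.6.32.54-0.3-default here
>    grep process_vm_readv /proc/kallsyms  # no output => no CMA syscall
>
> No output from the grep would mean no CMA, whatever the version number
> says.)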
>
> Matt
>
> On 01/06/2015 09:35 AM, Hari Subramoni wrote:
>> Hello Matt,
>>
>> Sorry to hear that you're seeing issues with MVAPICH2-2.1rc1. Could you
>> please give us some more information about the experimental setup like
>> number of processes, number of nodes, processes per node as well as the
>> config flags and compilers used to build MVAPICH2? This will enable us
>> to debug the issue further.
>>
>> Are you using CMA here? If not, could you please try using CMA
>> (MV2_SMP_USE_CMA=1) to see if the hang goes away?
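>>
>> For example, with mpirun_rsh the setting can be passed directly on the
>> command line (process count and application name below are just
>> placeholders):
>>
>>    mpirun_rsh -np <N> -hostfile hosts MV2_SMP_USE_CMA=1 ./a.out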
>>
>> Regards,
>> Hari.
>>
>> On Mon, Jan 5, 2015 at 12:37 PM, Thompson, Matt (GSFC-610.1)[SCIENCE
>> SYSTEMS AND APPLICATIONS INC] <matthew.thompson at nasa.gov
>> <mailto:matthew.thompson at nasa.gov>> wrote:
>>
>> All,
>>
>> I'm trying to diagnose an issue that is appearing in a model I work
>> on: GEOS-5. The problem seems to be architecture-dependent and, most
>> likely, due to MVAPICH2 (as the same code compiled with Intel MPI 5
>> and the same Fortran compiler seems to have no problem).
>>
>> I can try to go into more detail (for example if I start adding
>> print statements to find the stall, it can sometimes cure it!), but
>> my first question is:
>>
>> Are there environment variables that control
>> architecture-dependent
>> behaviour of MVAPICH2?
>>
>> I ask because I saw in the recent MVAPICH2 2.1rc1 announcement:
>>
>> (NEW) MVAPICH2 2.1rc1 (based on MPICH 3.1.3) with ...
>> *optimization and tuning for Haswell architecture*
>>
>> (I tried searching the User's Guide for "Haswell", but no luck.
>> Could you point me to possible switches?)
>>
>> Note, also, that this might not be due to Westmere/Sandy Bridge
>> tuning at all, but rather to the underlying fabric. Here at NCCS the
>> Westmeres, I believe, are on DDR interconnects, while the Sandy
>> Bridges I was using are on FDR (which, I think, is actually connected
>> to a QDR main switch) and some are on QDR.
>>
>> If I turn on MV2_SHOW_ENV_INFO=2, I see these differences (left,
>> Sandy; right, Westmere):
>>
>> PROCESSOR ARCH NAME         : MV2_ARCH_INTEL_XEON_E5_2670_16 | MV2_ARCH_INTEL_XEON_X5650_12
>> PROCESSOR MODEL NUMBER      : 45                             | 44
>> HCA NAME                    : MV2_HCA_MLX_CX_FDR             | MV2_HCA_MLX_CX_DDR
>> MV2_RDMA_FAST_PATH_BUF_SIZE : 5120                           | 9216
>> MV2_EAGERSIZE_1SC           : 8192                           | 4096
>> MV2_SMP_EAGERSIZE           : 32769                          | 65537
>> MV2_SMPI_LENGTH_QUEUE       : 131072                         | 262144
>> MV2_SMP_NUM_SEND_BUFFER     : 16                             | 32
>> MPISPAWN_MPIRUN_HOST        : borg01y001                     | borgi117
>> MPISPAWN_MPIRUN_ID          : 21662                          | 23359
>> MPISPAWN_NNODES             : 6                              | 8
>> PMI_PORT                    : borg01y001:44036               | borgi117:37003
>> MV2_DEFAULT_MTU             : 4                              | 3
>> MV2_DEFAULT_PKEY            : 393216                         | 524288
>> MV2_NUM_NODES_IN_JOB        : 6                              | 8
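>>
>> (That listing is essentially the MV2_SHOW_ENV_INFO=2 output of the two
>> runs captured to files and compared side by side, roughly:
>>
>>    diff -y --suppress-common-lines env_info.sandy env_info.westmere
>>
>> with the file names being placeholders.)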
>>
>>
>> Now some of these can be ignored (MPISPAWN, PROCESSOR, etc.), but the
>> MV2_ flag differences here look like something worth experimenting with.
>>
>> Some testing showed that if we set:
>>
>> MV2_SMP_NUM_SEND_BUFFER=32
>>
>> on the Sandy Bridge, the issue was avoided. Huzzah, right? Well,
>> when an end-user tried it...it hung for him at some point.
>> So...yeah. Should I perhaps use all 5 settings from the DDR run?
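>>
>> (By "all 5 settings" I mean forcing the Westmere/DDR values onto the
>> Sandy Bridge run, assuming each of the names printed by
>> MV2_SHOW_ENV_INFO is also accepted as an environment variable, i.e.
>> roughly:
>>
>>    mpirun_rsh -np 96 -hostfile hosts.sandy \
>>        MV2_RDMA_FAST_PATH_BUF_SIZE=9216 MV2_EAGERSIZE_1SC=4096 \
>>        MV2_SMP_EAGERSIZE=65537 MV2_SMPI_LENGTH_QUEUE=262144 \
>>        MV2_SMP_NUM_SEND_BUFFER=32 ./geos5.x
>>
>> with the hostfile and binary names again being placeholders.)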
>>
>> Any ideas from the experts on why Intel MPI 5 would not be affected
>> in the same situation?
>>
>> Matt
>>
>> --
>> Matt Thompson SSAI, Sr Software Test Engr
>> NASA GSFC, Global Modeling and Assimilation Office
>> Code 610.1, 8800 Greenbelt Rd, Greenbelt, MD 20771
>> Phone: 301-614-6712 Fax: 301-614-6246
>
>
--
Matt Thompson SSAI, Sr Software Test Engr
NASA GSFC, Global Modeling and Assimilation Office
Code 610.1, 8800 Greenbelt Rd, Greenbelt, MD 20771
Phone: 301-614-6712 Fax: 301-614-6246