[mvapich-discuss] MVAPICH2 2.1a: Code stalls on Sandy Bridge, works on Westmere

Thompson, Matt (GSFC-610.1)[SCIENCE SYSTEMS AND APPLICATIONS INC] matthew.thompson at nasa.gov
Mon Jan 5 12:37:19 EST 2015


All,

I'm trying to diagnose an issue that appears in a model I work on,
GEOS-5. The problem seems to be architecture-dependent and, most
likely, due to MVAPICH2, as the same code built with Intel MPI 5 and
the same Fortran compiler runs without any problem.

I can try to go into more detail (for example, if I start adding print
statements to locate the stall, doing so can sometimes cure it!), but my
first question is:

   Are there environment variables that control architecture-dependent
   behaviour of MVAPICH2?

I ask because I saw in the recent MVAPICH2 2.1rc1 announcement:

   (NEW) MVAPICH2 2.1rc1 (based on MPICH 3.1.3) with ...
    *optimization and tuning for Haswell architecture*

(I tried searching the User's Guide for "Haswell", but no luck. Could 
you point me to possible switches?)

Note, also, that this might not be due to Westmere/Sandy Bridge tuning
at all, but to the underlying fabric. Here at NCCS, the Westmeres, I
believe, are on DDR interconnects, while the Sandy Bridges I was using
are on FDR (which, I think, is actually connected to a QDR main switch)
and some are on QDR.
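
As a sanity check on the fabric question, a generic verbs query
(nothing MVAPICH2-specific; I'm assuming the standard libibverbs
utilities are installed on the nodes) will show each port's lane count
and per-lane rate, which should be 5.0 Gbps for DDR, 10.0 for QDR, and
14.0625 for FDR:

    ibv_devinfo -v | grep -E 'hca_id|active_width|active_speed'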

If I turn on MV2_SHOW_ENV_INFO=2 (the command I use is sketched just
after the table), I see these differences (left, Sandy; right,
Westmere):

>PROCESSOR ARCH NAME         : MV2_ARCH_INTEL_XEON_E5_2670_16 | PROCESSOR ARCH NAME         : MV2_ARCH_INTEL_XEON_X5650_12
>PROCESSOR MODEL NUMBER      : 45                             | PROCESSOR MODEL NUMBER      : 44
>HCA NAME                    : MV2_HCA_MLX_CX_FDR             | HCA NAME                    : MV2_HCA_MLX_CX_DDR
>MV2_RDMA_FAST_PATH_BUF_SIZE : 5120                           | MV2_RDMA_FAST_PATH_BUF_SIZE : 9216
>MV2_EAGERSIZE_1SC           : 8192                           | MV2_EAGERSIZE_1SC           : 4096
>MV2_SMP_EAGERSIZE           : 32769                          | MV2_SMP_EAGERSIZE           : 65537
>MV2_SMPI_LENGTH_QUEUE       : 131072                         | MV2_SMPI_LENGTH_QUEUE       : 262144
>MV2_SMP_NUM_SEND_BUFFER     : 16                             | MV2_SMP_NUM_SEND_BUFFER     : 32
>MPISPAWN_MPIRUN_HOST        : borg01y001                     | MPISPAWN_MPIRUN_HOST        : borgi117
>MPISPAWN_MPIRUN_ID          : 21662                          | MPISPAWN_MPIRUN_ID          : 23359
>MPISPAWN_NNODES             : 6                              | MPISPAWN_NNODES             : 8
>PMI_PORT                    : borg01y001:44036               | PMI_PORT                    : borgi117:37003
>MV2_DEFAULT_MTU             : 4                              | MV2_DEFAULT_MTU             : 3
>MV2_DEFAULT_PKEY            : 393216                         | MV2_DEFAULT_PKEY            : 524288
>MV2_NUM_NODES_IN_JOB        : 6                              | MV2_NUM_NODES_IN_JOB        : 8
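
For reference, here is roughly how I generate that output; the
hostfile and executable names below are placeholders, and -np 96 just
matches the 6-node x 16-core Sandy Bridge run above (mpirun_rsh takes
MV2_ variables as VAR=value pairs on the command line; other launchers
pick them up from the environment instead):

    mpirun_rsh -np 96 -hostfile hosts MV2_SHOW_ENV_INFO=2 ./model.x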

Now, some of these can be ignored (the MPISPAWN, PROCESSOR, and
PMI_PORT lines, etc.), but the MV2_ differences here look like a real
opportunity for tuning.

Some testing showed that if we set:

    MV2_SMP_NUM_SEND_BUFFER=32

on the Sandy Bridge nodes, the issue was avoided. Huzzah, right? Well,
when an end user tried it...it hung for him at some point. So...yeah.
Should I perhaps use all five MV2_ settings from the DDR run, as
sketched below?
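
If I did, I think it would look like the following (values copied
straight from the Westmere column above; exported here for a launcher
that reads the environment, though with mpirun_rsh I'd put each
VAR=value pair on the command line instead):

    export MV2_RDMA_FAST_PATH_BUF_SIZE=9216
    export MV2_EAGERSIZE_1SC=4096
    export MV2_SMP_EAGERSIZE=65537
    export MV2_SMPI_LENGTH_QUEUE=262144
    export MV2_SMP_NUM_SEND_BUFFER=32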

Any ideas from the experts on why Intel MPI 5 would not be affected in
the same situation?

Matt

-- 
Matt Thompson          SSAI, Sr Software Test Engr
NASA GSFC, Global Modeling and Assimilation Office
Code 610.1, 8800 Greenbelt Rd, Greenbelt, MD 20771
Phone: 301-614-6712              Fax: 301-614-6246

