[mvapich-discuss] (no subject)
Mehmet Belgin
mehmet.belgin at oit.gatech.edu
Tue Feb 23 15:04:33 EST 2016
Hi Hari,
Thank you for your fast reply!
I picked a healthy node (identical hardware) and I am getting the same
environment output from both (copied below). The OSU bandwidth tests run
roughly 50% slower on the problem node. This is really puzzling.
First, the 'tail' of the results (columns are message size in bytes and bandwidth in MB/s):
Bad node:
=====================
1048576 4464.94
2097152 4473.40
4194304 4182.60
Healthy node:
=====================
1048576 8688.42
2097152 8662.93
4194304 8440.66
Then, the details you requested:
$ mpiname -a
MVAPICH2 2.1rc1 Thu Dec 18 20:00:00 EDT 2014 ch3:mrail
Compilation
CC: icc -DNDEBUG -DNVALGRIND -O2
CXX: icpc -DNDEBUG -DNVALGRIND -O2
F77: ifort -L/lib -L/lib -O2
FC: ifort -O2
Configuration
--prefix=/usr/local/pacerepov1/mvapich2/2.1/intel-15.0/ --with-hwloc
--enable-romio --with-file-system=ufs+nfs --enable-shared
--enable-sharedlibs=gcc
$ mpirun -np 2 -env MV2_ENABLE_AFFINITY=1 -env MV2_SHOW_CPU_BINDING=1 \
    -env MV2_SHOW_ENV_INFO=2 osu_bw
(the copied part is identical for both nodes, diff'ed to confirm)
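For reference, the two environment dumps were compared roughly like this
(the file names are just placeholders I am using here):

$ mpirun -np 2 -env MV2_SHOW_ENV_INFO=2 osu_bw > env_bad.txt    # on the problem node
$ mpirun -np 2 -env MV2_SHOW_ENV_INFO=2 osu_bw > env_good.txt   # on the healthy node
$ diff env_bad.txt env_good.txt   # parameter sections match; only the bandwidth numbers differ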
MVAPICH2-2.1rc1 Parameters
---------------------------------------------------------------------
PROCESSOR ARCH NAME : MV2_ARCH_INTEL_XEON_E5_2670_16
PROCESSOR FAMILY NAME : MV2_CPU_FAMILY_INTEL
PROCESSOR MODEL NUMBER : 45
HCA NAME : MV2_HCA_MLX_CX_QDR
HETEROGENEOUS HCA : NO
MV2_EAGERSIZE_1SC : 0
MV2_SMP_EAGERSIZE : 32769
MV2_SMPI_LENGTH_QUEUE : 131072
MV2_SMP_NUM_SEND_BUFFER : 16
MV2_SMP_BATCH_SIZE : 8
---------------------------------------------------------------------
MVAPICH2 All Parameters
MV2_COMM_WORLD_LOCAL_RANK : 0
MPIRUN_RSH_LAUNCH : 0
MV2_3DTORUS_SUPPORT : 0
MV2_NUM_SA_QUERY_RETRIES : 20
MV2_NUM_SLS : 8
MV2_DEFAULT_SERVICE_LEVEL : 0
MV2_PATH_SL_QUERY : 0
MV2_USE_QOS : 0
MV2_ALLGATHER_BRUCK_THRESHOLD : 524288
MV2_ALLGATHER_RD_THRESHOLD : 81920
MV2_ALLGATHER_REVERSE_RANKING : 1
MV2_ALLGATHERV_RD_THRESHOLD : 0
MV2_ALLREDUCE_2LEVEL_MSG : 262144
MV2_ALLREDUCE_SHORT_MSG : 2048
MV2_ALLTOALL_MEDIUM_MSG : 16384
MV2_ALLTOALL_SMALL_MSG : 2048
MV2_ALLTOALL_THROTTLE_FACTOR : 4
MV2_BCAST_TWO_LEVEL_SYSTEM_SIZE : 64
MV2_GATHER_SWITCH_PT : 0
MV2_INTRA_SHMEM_REDUCE_MSG : 2048
MV2_KNOMIAL_2LEVEL_BCAST_MESSAGE_SIZE_THRESHOLD : 2048
MV2_KNOMIAL_2LEVEL_BCAST_SYSTEM_SIZE_THRESHOLD : 64
MV2_KNOMIAL_INTER_LEADER_THRESHOLD : 65536
MV2_KNOMIAL_INTER_NODE_FACTOR : 4
MV2_KNOMIAL_INTRA_NODE_FACTOR : 4
MV2_KNOMIAL_INTRA_NODE_THRESHOLD : 131072
MV2_RED_SCAT_LARGE_MSG : 524288
MV2_RED_SCAT_SHORT_MSG : 64
MV2_REDUCE_2LEVEL_MSG : 16384
MV2_REDUCE_SHORT_MSG : 8192
MV2_SCATTER_MEDIUM_MSG : 0
MV2_SCATTER_SMALL_MSG : 0
MV2_SHMEM_ALLREDUCE_MSG : 32768
MV2_SHMEM_COLL_MAX_MSG_SIZE : 131072
MV2_SHMEM_COLL_NUM_COMM : 8
MV2_SHMEM_COLL_NUM_PROCS : 2
MV2_SHMEM_COLL_SPIN_COUNT : 5
MV2_SHMEM_REDUCE_MSG : 4096
MV2_USE_BCAST_SHORT_MSG : 16384
MV2_USE_DIRECT_GATHER : 1
MV2_USE_DIRECT_GATHER_SYSTEM_SIZE_MEDIUM : 1024
MV2_USE_DIRECT_GATHER_SYSTEM_SIZE_SMALL : 384
MV2_USE_DIRECT_SCATTER : 1
MV2_USE_OSU_COLLECTIVES : 1
MV2_USE_OSU_NB_COLLECTIVES : 1
MV2_USE_KNOMIAL_2LEVEL_BCAST : 1
MV2_USE_KNOMIAL_INTER_LEADER_BCAST : 1
MV2_USE_SCATTER_RD_INTER_LEADER_BCAST : 1
MV2_USE_SCATTER_RING_INTER_LEADER_BCAST : 1
MV2_USE_SHMEM_ALLREDUCE : 1
MV2_USE_SHMEM_BARRIER : 1
MV2_USE_SHMEM_BCAST : 1
MV2_USE_SHMEM_COLL : 1
MV2_USE_SHMEM_REDUCE : 1
MV2_USE_TWO_LEVEL_GATHER : 1
MV2_USE_TWO_LEVEL_SCATTER : 1
MV2_USE_XOR_ALLTOALL : 1
MV2_DEFAULT_SRC_PATH_BITS : 0
MV2_DEFAULT_STATIC_RATE : 0
MV2_DEFAULT_TIME_OUT : 17237780
MV2_DEFAULT_MTU : 0
MV2_DEFAULT_PKEY : 0
MV2_DEFAULT_PORT : -1
MV2_DEFAULT_GID_INDEX : 0
MV2_DEFAULT_PSN : 0
MV2_DEFAULT_MAX_RECV_WQE : 128
MV2_DEFAULT_MAX_SEND_WQE : 64
MV2_DEFAULT_MAX_SG_LIST : 1
MV2_DEFAULT_MIN_RNR_TIMER : 12
MV2_DEFAULT_QP_OUS_RD_ATOM : 67371265
MV2_DEFAULT_RETRY_COUNT : 16844551
MV2_DEFAULT_RNR_RETRY : 65799
MV2_DEFAULT_MAX_CQ_SIZE : 40000
MV2_DEFAULT_MAX_RDMA_DST_OPS : 4
MV2_INITIAL_PREPOST_DEPTH : 10
MV2_IWARP_MULTIPLE_CQ_THRESHOLD : 32
MV2_NUM_HCAS : 1
MV2_NUM_PORTS : 1
MV2_NUM_QP_PER_PORT : 1
MV2_MAX_RDMA_CONNECT_ATTEMPTS : 20
MV2_ON_DEMAND_UD_INFO_EXCHANGE : 0
MV2_PREPOST_DEPTH : 64
MV2_HOMOGENEOUS_CLUSTER : 0
MV2_COALESCE_THRESHOLD : 6
MV2_DREG_CACHE_LIMIT : 0
MV2_IBA_EAGER_THRESHOLD : 0
MV2_MAX_INLINE_SIZE : 0
MV2_MAX_R3_PENDING_DATA : 524288
MV2_MED_MSG_RAIL_SHARING_POLICY : 0
MV2_NDREG_ENTRIES : 0
MV2_NUM_RDMA_BUFFER : 0
MV2_NUM_SPINS_BEFORE_LOCK : 2000
MV2_POLLING_LEVEL : 1
MV2_POLLING_SET_LIMIT : -1
MV2_POLLING_SET_THRESHOLD : 256
MV2_R3_NOCACHE_THRESHOLD : 32768
MV2_R3_THRESHOLD : 4096
MV2_RAIL_SHARING_LARGE_MSG_THRESHOLD : 16384
MV2_RAIL_SHARING_MED_MSG_THRESHOLD : 2048
MV2_RAIL_SHARING_POLICY : 4
MV2_RDMA_EAGER_LIMIT : 32
MV2_RDMA_FAST_PATH_BUF_SIZE : 4096
MV2_RDMA_NUM_EXTRA_POLLS : 1
MV2_RNDV_EXT_SENDQ_SIZE : 5
MV2_RNDV_PROTOCOL : 3
MV2_SMALL_MSG_RAIL_SHARING_POLICY : 0
MV2_SPIN_COUNT : 5000
MV2_SRQ_LIMIT : 30
MV2_SRQ_MAX_SIZE : 4096
MV2_SRQ_SIZE : 256
MV2_STRIPING_THRESHOLD : 8192
MV2_USE_COALESCE : 0
MV2_USE_XRC : 0
MV2_VBUF_MAX : -1
MV2_VBUF_POOL_SIZE : 512
MV2_VBUF_SECONDARY_POOL_SIZE : 256
MV2_VBUF_TOTAL_SIZE : 0
MV2_USE_HWLOC_CPU_BINDING : 1
MV2_ENABLE_AFFINITY : 1
MV2_ENABLE_LEASTLOAD : 0
MV2_SMP_BATCH_SIZE : 8
MV2_SMP_EAGERSIZE : 32769
MV2_SMPI_LENGTH_QUEUE : 131072
MV2_SMP_NUM_SEND_BUFFER : 16
MV2_SMP_SEND_BUF_SIZE : 16384
MV2_USE_SHARED_MEM : 1
MV2_SHOW_ENV_INFO : 2
MV2_DEFAULT_PUT_GET_LIST_SIZE : 200
MV2_EAGERSIZE_1SC : 0
MV2_GET_FALLBACK_THRESHOLD : 0
MV2_PIN_POOL_SIZE : 2097152
MV2_PUT_FALLBACK_THRESHOLD : 0
MV2_ASYNC_THREAD_STACK_SIZE : 1048576
MV2_THREAD_YIELD_SPIN_THRESHOLD : 5
MV2_USE_HUGEPAGES : 1
---------------------------------------------------------------------
-------------CPU AFFINITY-------------
RANK:0 CPU_SET: 0
RANK:1 CPU_SET: 1
-------------------------------------
(omitting the rest)
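If it helps rule out a binding difference, I can also force both ranks onto
explicit cores and re-run on each node; a minimal sketch, assuming this build
honors MV2_CPU_MAPPING (the core IDs below are just an example):

$ mpirun -np 2 -env MV2_ENABLE_AFFINITY=1 -env MV2_CPU_MAPPING=0:1 \
    -env MV2_SHOW_CPU_BINDING=1 osu_bw

The binding output above already shows ranks 0 and 1 on cores 0 and 1 on this
node, so this would mainly confirm the healthy node binds the same way.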
On 2/23/16 2:39 PM, Hari Subramoni wrote:
> Hello Mehmet,
>
> As you say, InfiniBand HCA should not have any impact on intra-node
> communication performance as long as shared memory support is enabled.
> I have a few follow-up questions.
>
> 1. Did you use the same process-to-core mapping for both runs? Could
> you please re-run after setting MV2_SHOW_CPU_BINDING=1
> and MV2_SHOW_ENV_INFO=2?
> 2. Can you please send the output of mpiname -a?
>
> Thx,
> Hari.
>
> On Tue, Feb 23, 2016 at 2:33 PM, Mehmet Belgin
> <mehmet.belgin at oit.gatech.edu <mailto:mehmet.belgin at oit.gatech.edu>>
> wrote:
>
> Greetings!
>
> I am troubleshooting a slowness issue on a single 16-core node.
> Compared to profiling data I had from earlier, I can very clearly see
> that the slowness is caused by MPI routines (the MPI send rate dropped
> from 21M/s to 13M/s) for the very same code. The memory and CPU
> profiles of the code look identical.
>
> I was wondering if IB problems would have any impact at all, despite
> the fact that I am not using the network (single node only). I would
> not expect it to be a factor, but I am asking just in case. I will now
> run a few OSU benchmarks, but I would appreciate any other suggestions
> you might have.
>
> (using mvapich2/2.1 with intel/15.0 on a 16-core Intel node)
>
> Thanks!
> -Mehmet
>
--
=========================================
Mehmet Belgin, Ph.D. (mehmet.belgin at oit.gatech.edu)
Scientific Computing Consultant | OIT - Academic and Research Technologies
Georgia Institute of Technology
258 4th Str NW, Rich Building, Room 326
Atlanta, GA 30332-0700
Office: (404) 385-0665