[mvapich-discuss] (no subject)
Mehmet Belgin
mehmet.belgin at oit.gatech.edu
Tue Feb 23 15:04:33 EST 2016
Hi Hari,
Thank you for your fast reply!
I picked a healthy node (identical hardware) and I am getting the same
environment output from both (copied below). The OSU bandwidth tests run
roughly 50% slower on the problem node. This is really puzzling.
First, the 'tail' of the results (columns are message size in bytes and bandwidth in MB/s):
Bad node:
=====================
1048576 4464.94
2097152 4473.40
4194304 4182.60
Healthy node:
=====================
1048576 8688.42
2097152 8662.93
4194304 8440.66
Then, the details you requested:
$ mpiname -a
MVAPICH2 2.1rc1 Thu Dec 18 20:00:00 EDT 2014 ch3:mrail
Compilation
CC: icc -DNDEBUG -DNVALGRIND -O2
CXX: icpc -DNDEBUG -DNVALGRIND -O2
F77: ifort -L/lib -L/lib -O2
FC: ifort -O2
Configuration
--prefix=/usr/local/pacerepov1/mvapich2/2.1/intel-15.0/ --with-hwloc
--enable-romio --with-file-system=ufs+nfs --enable-shared
--enable-sharedlibs=gcc
$ mpirun -np 2 -env MV2_ENABLE_AFFINITY=1 -env MV2_SHOW_CPU_BINDING=1 \
    -env MV2_SHOW_ENV_INFO=2 osu_bw
(the copied part is identical for both nodes, diff'ed to confirm)
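For reference, the two environment dumps were compared roughly like this
(the file names are just placeholders I am using here):

$ mpirun -np 2 -env MV2_SHOW_ENV_INFO=2 osu_bw > env_bad.txt    # on the problem node
$ mpirun -np 2 -env MV2_SHOW_ENV_INFO=2 osu_bw > env_good.txt   # on the healthy node
$ diff env_bad.txt env_good.txt   # parameter sections match; only the bandwidth numbers differ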
MVAPICH2-2.1rc1 Parameters
---------------------------------------------------------------------
PROCESSOR ARCH NAME : MV2_ARCH_INTEL_XEON_E5_2670_16
PROCESSOR FAMILY NAME : MV2_CPU_FAMILY_INTEL
PROCESSOR MODEL NUMBER : 45
HCA NAME : MV2_HCA_MLX_CX_QDR
HETEROGENEOUS HCA : NO
MV2_EAGERSIZE_1SC : 0
MV2_SMP_EAGERSIZE : 32769
MV2_SMPI_LENGTH_QUEUE : 131072
MV2_SMP_NUM_SEND_BUFFER : 16
MV2_SMP_BATCH_SIZE : 8
---------------------------------------------------------------------
MVAPICH2 All Parameters
MV2_COMM_WORLD_LOCAL_RANK : 0
MPIRUN_RSH_LAUNCH : 0
MV2_3DTORUS_SUPPORT : 0
MV2_NUM_SA_QUERY_RETRIES : 20
MV2_NUM_SLS : 8
MV2_DEFAULT_SERVICE_LEVEL : 0
MV2_PATH_SL_QUERY : 0
MV2_USE_QOS : 0
MV2_ALLGATHER_BRUCK_THRESHOLD : 524288
MV2_ALLGATHER_RD_THRESHOLD : 81920
MV2_ALLGATHER_REVERSE_RANKING : 1
MV2_ALLGATHERV_RD_THRESHOLD : 0
MV2_ALLREDUCE_2LEVEL_MSG : 262144
MV2_ALLREDUCE_SHORT_MSG : 2048
MV2_ALLTOALL_MEDIUM_MSG : 16384
MV2_ALLTOALL_SMALL_MSG : 2048
MV2_ALLTOALL_THROTTLE_FACTOR : 4
MV2_BCAST_TWO_LEVEL_SYSTEM_SIZE : 64
MV2_GATHER_SWITCH_PT : 0
MV2_INTRA_SHMEM_REDUCE_MSG : 2048
MV2_KNOMIAL_2LEVEL_BCAST_MESSAGE_SIZE_THRESHOLD : 2048
MV2_KNOMIAL_2LEVEL_BCAST_SYSTEM_SIZE_THRESHOLD : 64
MV2_KNOMIAL_INTER_LEADER_THRESHOLD : 65536
MV2_KNOMIAL_INTER_NODE_FACTOR : 4
MV2_KNOMIAL_INTRA_NODE_FACTOR : 4
MV2_KNOMIAL_INTRA_NODE_THRESHOLD : 131072
MV2_RED_SCAT_LARGE_MSG : 524288
MV2_RED_SCAT_SHORT_MSG : 64
MV2_REDUCE_2LEVEL_MSG : 16384
MV2_REDUCE_SHORT_MSG : 8192
MV2_SCATTER_MEDIUM_MSG : 0
MV2_SCATTER_SMALL_MSG : 0
MV2_SHMEM_ALLREDUCE_MSG : 32768
MV2_SHMEM_COLL_MAX_MSG_SIZE : 131072
MV2_SHMEM_COLL_NUM_COMM : 8
MV2_SHMEM_COLL_NUM_PROCS : 2
MV2_SHMEM_COLL_SPIN_COUNT : 5
MV2_SHMEM_REDUCE_MSG : 4096
MV2_USE_BCAST_SHORT_MSG : 16384
MV2_USE_DIRECT_GATHER : 1
MV2_USE_DIRECT_GATHER_SYSTEM_SIZE_MEDIUM : 1024
MV2_USE_DIRECT_GATHER_SYSTEM_SIZE_SMALL : 384
MV2_USE_DIRECT_SCATTER : 1
MV2_USE_OSU_COLLECTIVES : 1
MV2_USE_OSU_NB_COLLECTIVES : 1
MV2_USE_KNOMIAL_2LEVEL_BCAST : 1
MV2_USE_KNOMIAL_INTER_LEADER_BCAST : 1
MV2_USE_SCATTER_RD_INTER_LEADER_BCAST : 1
MV2_USE_SCATTER_RING_INTER_LEADER_BCAST : 1
MV2_USE_SHMEM_ALLREDUCE : 1
MV2_USE_SHMEM_BARRIER : 1
MV2_USE_SHMEM_BCAST : 1
MV2_USE_SHMEM_COLL : 1
MV2_USE_SHMEM_REDUCE : 1
MV2_USE_TWO_LEVEL_GATHER : 1
MV2_USE_TWO_LEVEL_SCATTER : 1
MV2_USE_XOR_ALLTOALL : 1
MV2_DEFAULT_SRC_PATH_BITS : 0
MV2_DEFAULT_STATIC_RATE : 0
MV2_DEFAULT_TIME_OUT : 17237780
MV2_DEFAULT_MTU : 0
MV2_DEFAULT_PKEY : 0
MV2_DEFAULT_PORT : -1
MV2_DEFAULT_GID_INDEX : 0
MV2_DEFAULT_PSN : 0
MV2_DEFAULT_MAX_RECV_WQE : 128
MV2_DEFAULT_MAX_SEND_WQE : 64
MV2_DEFAULT_MAX_SG_LIST : 1
MV2_DEFAULT_MIN_RNR_TIMER : 12
MV2_DEFAULT_QP_OUS_RD_ATOM : 67371265
MV2_DEFAULT_RETRY_COUNT : 16844551
MV2_DEFAULT_RNR_RETRY : 65799
MV2_DEFAULT_MAX_CQ_SIZE : 40000
MV2_DEFAULT_MAX_RDMA_DST_OPS : 4
MV2_INITIAL_PREPOST_DEPTH : 10
MV2_IWARP_MULTIPLE_CQ_THRESHOLD : 32
MV2_NUM_HCAS : 1
MV2_NUM_PORTS : 1
MV2_NUM_QP_PER_PORT : 1
MV2_MAX_RDMA_CONNECT_ATTEMPTS : 20
MV2_ON_DEMAND_UD_INFO_EXCHANGE : 0
MV2_PREPOST_DEPTH : 64
MV2_HOMOGENEOUS_CLUSTER : 0
MV2_COALESCE_THRESHOLD : 6
MV2_DREG_CACHE_LIMIT : 0
MV2_IBA_EAGER_THRESHOLD : 0
MV2_MAX_INLINE_SIZE : 0
MV2_MAX_R3_PENDING_DATA : 524288
MV2_MED_MSG_RAIL_SHARING_POLICY : 0
MV2_NDREG_ENTRIES : 0
MV2_NUM_RDMA_BUFFER : 0
MV2_NUM_SPINS_BEFORE_LOCK : 2000
MV2_POLLING_LEVEL : 1
MV2_POLLING_SET_LIMIT : -1
MV2_POLLING_SET_THRESHOLD : 256
MV2_R3_NOCACHE_THRESHOLD : 32768
MV2_R3_THRESHOLD : 4096
MV2_RAIL_SHARING_LARGE_MSG_THRESHOLD : 16384
MV2_RAIL_SHARING_MED_MSG_THRESHOLD : 2048
MV2_RAIL_SHARING_POLICY : 4
MV2_RDMA_EAGER_LIMIT : 32
MV2_RDMA_FAST_PATH_BUF_SIZE : 4096
MV2_RDMA_NUM_EXTRA_POLLS : 1
MV2_RNDV_EXT_SENDQ_SIZE : 5
MV2_RNDV_PROTOCOL : 3
MV2_SMALL_MSG_RAIL_SHARING_POLICY : 0
MV2_SPIN_COUNT : 5000
MV2_SRQ_LIMIT : 30
MV2_SRQ_MAX_SIZE : 4096
MV2_SRQ_SIZE : 256
MV2_STRIPING_THRESHOLD : 8192
MV2_USE_COALESCE : 0
MV2_USE_XRC : 0
MV2_VBUF_MAX : -1
MV2_VBUF_POOL_SIZE : 512
MV2_VBUF_SECONDARY_POOL_SIZE : 256
MV2_VBUF_TOTAL_SIZE : 0
MV2_USE_HWLOC_CPU_BINDING : 1
MV2_ENABLE_AFFINITY : 1
MV2_ENABLE_LEASTLOAD : 0
MV2_SMP_BATCH_SIZE : 8
MV2_SMP_EAGERSIZE : 32769
MV2_SMPI_LENGTH_QUEUE : 131072
MV2_SMP_NUM_SEND_BUFFER : 16
MV2_SMP_SEND_BUF_SIZE : 16384
MV2_USE_SHARED_MEM : 1
MV2_SHOW_ENV_INFO : 2
MV2_DEFAULT_PUT_GET_LIST_SIZE : 200
MV2_EAGERSIZE_1SC : 0
MV2_GET_FALLBACK_THRESHOLD : 0
MV2_PIN_POOL_SIZE : 2097152
MV2_PUT_FALLBACK_THRESHOLD : 0
MV2_ASYNC_THREAD_STACK_SIZE : 1048576
MV2_THREAD_YIELD_SPIN_THRESHOLD : 5
MV2_USE_HUGEPAGES : 1
---------------------------------------------------------------------
-------------CPU AFFINITY-------------
RANK:0 CPU_SET: 0
RANK:1 CPU_SET: 1
-------------------------------------
(omitting the rest)
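If it helps rule out a binding difference, I can also force both ranks onto
explicit cores and re-run on each node; a minimal sketch, assuming this build
honors MV2_CPU_MAPPING (the core IDs below are just an example):

$ mpirun -np 2 -env MV2_ENABLE_AFFINITY=1 -env MV2_CPU_MAPPING=0:1 \
    -env MV2_SHOW_CPU_BINDING=1 osu_bw

The binding output above already shows ranks 0 and 1 on cores 0 and 1 on this
node, so this would mainly confirm the healthy node binds the same way.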
On 2/23/16 2:39 PM, Hari Subramoni wrote:
> Hello Mehmet,
>
> As you say, InfiniBand HCA should not have any impact on intra-node
> communication performance as long as shared memory support is enabled.
> I have a few follow-up questions.
>
> 1. Did you use the same process-to-core mapping for both runs? Could
> you please re-run after setting MV2_SHOW_CPU_BINDING=1
> and MV2_SHOW_ENV_INFO=2?
> 2. Can you please send the output of mpiname -a?
>
> Thx,
> Hari.
>
> On Tue, Feb 23, 2016 at 2:33 PM, Mehmet Belgin
> <mehmet.belgin at oit.gatech.edu <mailto:mehmet.belgin at oit.gatech.edu>>
> wrote:
>
> Greetings!
>
> I am troubleshooting a slowness issue on a single 16-core node.
> Compared to profiling data I had from earlier, I can very clearly see
> that the slowness is caused by MPI routines (the MPI send rate dropped
> from 21M/s to 13M/s) for the very same code. The memory and CPU
> profiles of the code look identical.
>
> I was wondering if IB problems would have any impact at all, despite
> the fact that I am not using the network (single node only). I would
> not expect it to be a factor, but I am asking just in case. I will now
> run a few OSU benchmarks, but I would appreciate any other suggestions
> you might have.
>
> (using mvapich2/2.1 with intel/15.0 on a 16-core Intel node)
>
> Thanks!
> -Mehmet
>
--
=========================================
Mehmet Belgin, Ph.D. (mehmet.belgin at oit.gatech.edu)
Scientific Computing Consultant | OIT - Academic and Research Technologies
Georgia Institute of Technology
258 4th Str NW, Rich Building, Room 326
Atlanta, GA 30332-0700
Office: (404) 385-0665