[mvapich-discuss] Odd process to CPU mapping
Jonathan Perkins
perkinjo at cse.ohio-state.edu
Tue Nov 4 16:57:53 EST 2014
Thanks for providing this info. We'll try to determine why this is
happening and provide a workaround or fix for you soon.
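In the meantime, one interim measure worth trying is to bypass the automatic binding and pin ranks explicitly with MV2_CPU_MAPPING. This is only a sketch, not a confirmed fix: the core list below assumes two 8-core sockets numbered 0-15, and the hostfile name and application are placeholders.

```shell
# Hypothetical workaround sketch: spell out one core per local rank so both
# sockets are used. Adjust the core ids to your node's actual numbering
# (check with lscpu or hwloc-ls).
export MV2_CPU_MAPPING=0:1:2:3:4:5:6:7:8:9:10:11:12:13:14:15
mpirun_rsh -np 32 -hostfile hosts MV2_SHOW_CPU_BINDING=1 ./app
```

Keeping MV2_SHOW_CPU_BINDING=1 set lets you confirm the explicit mapping actually took effect.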
On Tue, Nov 04, 2014 at 02:57:11PM -0600, Vladimir Florinski wrote:
> Here goes:
>
> Linux node62e 3.16.6-203.fc20.x86_64 #1 SMP Sat Oct 25 12:44:32 UTC 2014
> x86_64 x86_64 x86_64 GNU/Linux
>
> CA 'mlx4_0'
> CA type: MT4099
> Number of ports: 2
> Firmware version: 2.11.550
> Hardware version: 0
> Node GUID: 0x0002c90300ebbbf0
> System image GUID: 0x0002c90300ebbbf3
> Port 1:
> State: Active
> Physical state: LinkUp
> Rate: 56
> Base lid: 18
> LMC: 0
> SM lid: 1
> Capability mask: 0x02514868
> Port GUID: 0x0002c90300ebbbf1
> Link layer: InfiniBand
> Port 2:
> State: Down
> Physical state: Disabled
> Rate: 10
> Base lid: 0
> LMC: 0
> SM lid: 0
> Capability mask: 0x02514868
> Port GUID: 0x0002c90300ebbbf2
> Link layer: InfiniBand
>
> -------------CPU AFFINITY-------------
> RANK:0 CPU_SET: 0
> RANK:1 CPU_SET: 2
> RANK:2 CPU_SET: 4
> RANK:3 CPU_SET: 6
> RANK:4 CPU_SET: 0
> RANK:5 CPU_SET: 2
> RANK:6 CPU_SET: 4
> RANK:7 CPU_SET: 6
> RANK:8 CPU_SET: 9
> RANK:9 CPU_SET: 11
> RANK:10 CPU_SET: 13
> RANK:11 CPU_SET: 15
> RANK:12 CPU_SET: 9
> RANK:13 CPU_SET: 11
> RANK:14 CPU_SET: 13
> RANK:15 CPU_SET: 15
> -------------------------------------
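The oversubscription is visible in the listing above: each core of the first socket appears twice, while cores 1, 3, 5, 7 and the second socket's even cores are never used. A minimal sketch (not part of the original report) that parses this affinity header and flags any core bound to more than one rank:

```python
from collections import defaultdict

# Excerpt of the header printed with MV2_SHOW_CPU_BINDING=1
affinity = """\
RANK:0  CPU_SET:   0
RANK:1  CPU_SET:   2
RANK:4  CPU_SET:   0
RANK:5  CPU_SET:   2
"""

def duplicate_cores(text):
    """Group ranks by the core they are bound to; return oversubscribed cores."""
    by_core = defaultdict(list)
    for line in text.splitlines():
        if "CPU_SET" not in line:
            continue
        rank = int(line.split("RANK:")[1].split()[0])
        core = line.split("CPU_SET:")[1].strip()
        by_core[core].append(rank)
    return {core: ranks for core, ranks in by_core.items() if len(ranks) > 1}

print(duplicate_cores(affinity))  # → {'0': [0, 4], '2': [1, 5]}
```

Run against the full 16-rank listing above, every listed core comes back with two ranks, confirming that half the machine sits idle.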
>
> MVAPICH2-2.1a Parameters
> ---------------------------------------------------------------------
> PROCESSOR ARCH NAME : MV2_ARCH_INTEL_GENERIC
> PROCESSOR FAMILY NAME : MV2_CPU_FAMILY_INTEL
> PROCESSOR MODEL NUMBER : 45
> HCA NAME : MV2_HCA_MLX_CX_FDR
> HETEROGENEOUS HCA : NO
> MV2_VBUF_TOTAL_SIZE : 16384
> MV2_IBA_EAGER_THRESHOLD : 16384
> MV2_RDMA_FAST_PATH_BUF_SIZE : 5120
> MV2_PUT_FALLBACK_THRESHOLD : 8192
> MV2_GET_FALLBACK_THRESHOLD : 0
> MV2_EAGERSIZE_1SC : 4096
> MV2_SMP_EAGERSIZE : 65537
> MV2_SMPI_LENGTH_QUEUE : 262144
> MV2_SMP_NUM_SEND_BUFFER : 256
> MV2_SMP_BATCH_SIZE : 8
> ---------------------------------------------------------------------
>
> MVAPICH2 All Parameters
> MV2_COMM_WORLD_LOCAL_RANK : 0
> PMI_ID : 0
> MPIRUN_RSH_LAUNCH : 1
> MPISPAWN_GLOBAL_NPROCS : 32
> MPISPAWN_MPIRUN_HOST : node62e
> MPISPAWN_MPIRUN_ID : 12707
> MPISPAWN_NNODES : 2
> MPISPAWN_WORKING_DIR : (deleted)
> USE_LINEAR_SSH : 1
> PMI_PORT : node62e:40351
> MV2_3DTORUS_SUPPORT : 0
> MV2_NUM_SA_QUERY_RETRIES : 20
> MV2_NUM_SLS : 8
> MV2_DEFAULT_SERVICE_LEVEL : 0
> MV2_PATH_SL_QUERY : 0
> MV2_USE_QOS : 0
> MV2_ALLGATHER_BRUCK_THRESHOLD : 524288
> MV2_ALLGATHER_RD_THRESHOLD : 81920
> MV2_ALLGATHER_REVERSE_RANKING : 1
> MV2_ALLGATHERV_RD_THRESHOLD : 0
> MV2_ALLREDUCE_2LEVEL_MSG : 262144
> MV2_ALLREDUCE_SHORT_MSG : 2048
> MV2_ALLTOALL_MEDIUM_MSG : 16384
> MV2_ALLTOALL_SMALL_MSG : 2048
> MV2_ALLTOALL_THROTTLE_FACTOR : 4
> MV2_BCAST_TWO_LEVEL_SYSTEM_SIZE : 64
> MV2_GATHER_SWITCH_PT : 0
> MV2_INTRA_SHMEM_REDUCE_MSG : 2048
> MV2_KNOMIAL_2LEVEL_BCAST_MESSAGE_SIZE_THRESHOLD : 2048
> MV2_KNOMIAL_2LEVEL_BCAST_SYSTEM_SIZE_THRESHOLD : 64
> MV2_KNOMIAL_INTER_LEADER_THRESHOLD : 65536
> MV2_KNOMIAL_INTER_NODE_FACTOR : 4
> MV2_KNOMIAL_INTRA_NODE_FACTOR : 4
> MV2_KNOMIAL_INTRA_NODE_THRESHOLD : 131072
> MV2_RED_SCAT_LARGE_MSG : 524288
> MV2_RED_SCAT_SHORT_MSG : 64
> MV2_REDUCE_2LEVEL_MSG : 16384
> MV2_REDUCE_SHORT_MSG : 8192
> MV2_SCATTER_MEDIUM_MSG : 0
> MV2_SCATTER_SMALL_MSG : 0
> MV2_SHMEM_ALLREDUCE_MSG : 32768
> MV2_SHMEM_COLL_MAX_MSG_SIZE : 131072
> MV2_SHMEM_COLL_NUM_COMM : 8
> MV2_SHMEM_COLL_NUM_PROCS : 16
> MV2_SHMEM_COLL_SPIN_COUNT : 5
> MV2_SHMEM_REDUCE_MSG : 4096
> MV2_USE_BCAST_SHORT_MSG : 16384
> MV2_USE_DIRECT_GATHER : 1
> MV2_USE_DIRECT_GATHER_SYSTEM_SIZE_MEDIUM : 1024
> MV2_USE_DIRECT_GATHER_SYSTEM_SIZE_SMALL : 384
> MV2_USE_DIRECT_SCATTER : 1
> MV2_USE_OSU_COLLECTIVES : 1
> MV2_USE_OSU_NB_COLLECTIVES : 1
> MV2_USE_KNOMIAL_2LEVEL_BCAST : 1
> MV2_USE_KNOMIAL_INTER_LEADER_BCAST : 1
> MV2_USE_SCATTER_RD_INTER_LEADER_BCAST : 1
> MV2_USE_SCATTER_RING_INTER_LEADER_BCAST : 1
> MV2_USE_SHMEM_ALLREDUCE : 1
> MV2_USE_SHMEM_BARRIER : 1
> MV2_USE_SHMEM_BCAST : 1
> MV2_USE_SHMEM_COLL : 1
> MV2_USE_SHMEM_REDUCE : 1
> MV2_USE_TWO_LEVEL_GATHER : 1
> MV2_USE_TWO_LEVEL_SCATTER : 1
> MV2_USE_XOR_ALLTOALL : 1
> MV2_DEFAULT_SRC_PATH_BITS : 0
> MV2_DEFAULT_STATIC_RATE : 0
> MV2_DEFAULT_TIME_OUT : -939063532
> MV2_DEFAULT_MTU : 4
> MV2_DEFAULT_PKEY : 0
> MV2_DEFAULT_PORT : 1
> MV2_DEFAULT_GID_INDEX : 0
> MV2_DEFAULT_PSN : 0
> MV2_DEFAULT_MAX_RECV_WQE : 128
> MV2_DEFAULT_MAX_SEND_WQE : 64
> MV2_DEFAULT_MAX_SG_LIST : 1
> MV2_DEFAULT_MIN_RNR_TIMER : 12
> MV2_DEFAULT_QP_OUS_RD_ATOM : 268701700
> MV2_DEFAULT_RETRY_COUNT : 13108999
> MV2_DEFAULT_RNR_RETRY : 51207
> MV2_DEFAULT_MAX_CQ_SIZE : 40000
> MV2_DEFAULT_MAX_RDMA_DST_OPS : 4
> MV2_INITIAL_PREPOST_DEPTH : 10
> MV2_IWARP_MULTIPLE_CQ_THRESHOLD : 32
> MV2_NUM_HCAS : 1
> MV2_NUM_NODES_IN_JOB : 2
> MV2_NUM_PORTS : 1
> MV2_NUM_QP_PER_PORT : 1
> MV2_MAX_RDMA_CONNECT_ATTEMPTS : 10
> MV2_ON_DEMAND_UD_INFO_EXCHANGE : 0
> MV2_PREPOST_DEPTH : 64
> MV2_HOMOGENEOUS_CLUSTER : 0
> MV2_COALESCE_THRESHOLD : 6
> MV2_DREG_CACHE_LIMIT : 0
> MV2_IBA_EAGER_THRESHOLD : 16384
> MV2_MAX_INLINE_SIZE : 168
> MV2_MAX_R3_PENDING_DATA : 524288
> MV2_MED_MSG_RAIL_SHARING_POLICY : 0
> MV2_NDREG_ENTRIES : 1164
> MV2_NUM_RDMA_BUFFER : 16
> MV2_NUM_SPINS_BEFORE_LOCK : 2000
> MV2_POLLING_LEVEL : 1
> MV2_POLLING_SET_LIMIT : 64
> MV2_POLLING_SET_THRESHOLD : 256
> MV2_R3_NOCACHE_THRESHOLD : 32768
> MV2_R3_THRESHOLD : 4096
> MV2_RAIL_SHARING_LARGE_MSG_THRESHOLD : 16384
> MV2_RAIL_SHARING_MED_MSG_THRESHOLD : 2048
> MV2_RAIL_SHARING_POLICY : 4
> MV2_RDMA_EAGER_LIMIT : 32
> MV2_RDMA_FAST_PATH_BUF_SIZE : 5120
> MV2_RDMA_NUM_EXTRA_POLLS : 1
> MV2_RNDV_EXT_SENDQ_SIZE : 5
> MV2_RNDV_PROTOCOL : 3
> MV2_SMALL_MSG_RAIL_SHARING_POLICY : 0
> MV2_SPIN_COUNT : 5000
> MV2_SRQ_LIMIT : 30
> MV2_SRQ_MAX_SIZE : 4096
> MV2_SRQ_SIZE : 128
> MV2_STRIPING_THRESHOLD : 16384
> MV2_USE_COALESCE : 0
> MV2_USE_XRC : 0
> MV2_VBUF_MAX : -1
> MV2_VBUF_POOL_SIZE : 256
> MV2_VBUF_SECONDARY_POOL_SIZE : 128
> MV2_VBUF_TOTAL_SIZE : 16384
> MV2_USE_HWLOC_CPU_BINDING : 1
> MV2_ENABLE_AFFINITY : 1
> MV2_ENABLE_LEASTLOAD : 0
> MV2_SMP_BATCH_SIZE : 8
> MV2_SMP_EAGERSIZE : 65537
> MV2_SMPI_LENGTH_QUEUE : 262144
> MV2_SMP_NUM_SEND_BUFFER : 256
> MV2_SMP_SEND_BUF_SIZE : 8192
> MV2_USE_SHARED_MEM : 1
> MV2_SHOW_ENV_INFO : 2
> MV2_DEFAULT_PUT_GET_LIST_SIZE : 200
> MV2_EAGERSIZE_1SC : 4096
> MV2_GET_FALLBACK_THRESHOLD : 0
> MV2_PIN_POOL_SIZE : 2097152
> MV2_PUT_FALLBACK_THRESHOLD : 8192
> MV2_ASYNC_THREAD_STACK_SIZE : 1048576
> MV2_THREAD_YIELD_SPIN_THRESHOLD : 5
> MV2_USE_HUGEPAGES : 1
> ---------------------------------------------------------------------
>
>
>
>
> On Tue, Nov 4, 2014 at 2:36 PM, Jonathan Perkins <
> perkinjo at cse.ohio-state.edu> wrote:
>
> > On Tue, Nov 04, 2014 at 02:30:13PM -0600, Vladimir Florinski wrote:
> > > Hi,
> > >
> > > We are seeing a rather odd behavior since upgrading to mvapich2-x
> > > version 2.1 (RPM install). If one requests, say, 2 nodes with 2
> > > octa-core CPUs per node (32 cores in all), the software would place
> > > two processes on each core of the first CPU in the node, leaving the
> > > second CPU idle. Because hyperthreading is disabled, this policy does
> > > not make any sense at all. Any ideas on what could be wrong here?
> >
> > Thanks for your note. This is not intended behavior. Can you send us
> > the output of the following:
> >
> > uname -a
> > ibstat
> >
> > The output from your run with MV2_SHOW_CPU_BINDING=1 set in your
> > environment (you only need to send us the header showing the binding).
> >
> > The output from your run with MV2_SHOW_ENV_INFO=2 set may also be
> > useful.
> >
> > --
> > Jonathan Perkins
> >
>
>
>
> --
> Vladimir Florinski
--
Jonathan Perkins