[mvapich-discuss] 2.3.5-1 performance

Lana Deere lana.deere at gmail.com
Mon Dec 21 16:23:58 EST 2020


Here are the outputs with MV2_SHOW_ENV_INFO=3 for a run using each version
of MPI.  I will experiment with the AFFINITY and THRESHOLD variables.  I'll
also try to drill down into the MPI calls to see if I can tell which ones
specifically are slower, and by how much; all I have at the moment is the
aggregate statistic.
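
For comparing the two dumps, here is a minimal sketch that lists the
parameters whose values differ between versions.  It assumes each dump uses
the "NAME : VALUE" layout seen in the attachments; the function names and the
short sample strings are only illustrative (the sample values are copied from
the attached 2.3.4 and 2.3.5 output), not part of MVAPICH2.

```python
import re

# Matches dump lines such as "    MV2_NDREG_ENTRIES                   : 1116".
# Header rows, warnings and the collective tuning tables do not match.
PARAM_LINE = re.compile(r"^\s*((?:MV2|MPIRUN)_\w+)\s*:\s*(\S+)\s*$")


def parse_env_dump(text):
    """Return {parameter name: value string} for one MV2_SHOW_ENV_INFO dump."""
    params = {}
    for line in text.splitlines():
        match = PARAM_LINE.match(line)
        if match:
            params[match.group(1)] = match.group(2)
    return params


def diff_params(old, new):
    """Return sorted (name, old value, new value) tuples for every difference.

    A value of None means the parameter is absent from that dump.
    """
    changes = []
    for name in sorted(set(old) | set(new)):
        before, after = old.get(name), new.get(name)
        if before != after:
            changes.append((name, before, after))
    return changes


# Illustrative excerpts only; in practice read the full saved dumps from files.
OLD_SAMPLE = """
    MV2_SHMEM_COLL_NUM_COMM             : 8
    MV2_NDREG_ENTRIES                   : 1116
    MV2_HCA_AWARE_PROCESS_MAPPING       : 1
    MV2_SPIN_COUNT                      : 5000
"""
NEW_SAMPLE = """
    MV2_SHMEM_COLL_NUM_COMM             : 32
    MV2_NDREG_ENTRIES                   : 8208
    MV2_UD_ZCOPY_NUM_RETRY              : 50000
    MV2_SPIN_COUNT                      : 5000
"""

CHANGES = diff_params(parse_env_dump(OLD_SAMPLE), parse_env_dump(NEW_SAMPLE))
for name, before, after in CHANGES:
    print(f"{name}: {before} -> {after}")
```

A difference in a reported value is only a lead, not proof that the parameter
explains the slowdown.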

Thanks.

.. Lana (lana.deere at gmail.com)




On Mon, Dec 21, 2020 at 1:23 PM Subramoni, Hari <subramoni.1 at osu.edu> wrote:

> Hi, Lana.
>
>
>
> The default configuration will ensure that IB is selected. Can you rerun
> the app after setting MV2_SHOW_ENV_INFO=3 with MVAPICH2 2.3.4 and MVAPICH2
> 2.3.5-1? MVAPICH2 will print a lot of information at the beginning. That is
> what we will be looking for.
>
>
>
> Can you let us know what MPI calls are taking more time and at what scale?
>
>
>
> In the meantime, can you please try the following environment variable
> combinations to see if any of those help?
>
>
>
>    1. MV2_ENABLE_AFFINITY=0
>    2. MV2_HYBRID_ENABLE_THRESHOLD=<nprocs+1>
>
>
>
> Best,
>
> Hari.
>
>
>
> *From:* mvapich-discuss-bounces at cse.ohio-state.edu <
> mvapich-discuss-bounces at mailman.cse.ohio-state.edu> *On Behalf Of *Lana
> Deere
> *Sent:* Monday, December 21, 2020 1:16 PM
> *To:* mvapich-discuss at cse.ohio-state.edu <
> mvapich-discuss at mailman.cse.ohio-state.edu>
> *Subject:* [mvapich-discuss] 2.3.5-1 performance
>
>
>
> The runs I have been doing with 2.3.5-1 to try to reproduce the SHMEM_Sync
> bus error are having a new issue, specifically they are running much slower
> than they did with 2.3.1 and 2.3.4.  As best I've managed to measure it,
> under 2.3.5-1 it's spending 2x-4x as much time in the MPI calls as in
> previous versions.  The only difference between the slower and the faster
> runs is the copy of libmpi.so.12.1.1 which is available to the program -- I
> can swap in the 2.3.4 libmpi.so without rebuilding the program and I get
> the performance back.
>
>
>
> There is one configure difference, namely --enable-fast=O2,ndebug on
> 2.3.5-1 vs. --enable-fast=O3,ndebug on the other versions.
>
>
>
> The first thing I thought of was that maybe it had decided to select
> Ethernet rather than InfiniBand for the transport, but there seems to be a
> lot of InfiniBand traffic at the correct times when the program is
> running.  Is there some way to get MPI to output explicitly the transport
> it selects, just to double check?
>
>
>
> Are there any changes in 2.3.5-1 which seem like they might cause the
> performance difference?  Any environment variables which might need to be
> set or set differently than before?
>
>
>
> Thanks.
>
>
> .. Lana (lana.deere at gmail.com)
>
-------------- next part --------------
Command 'mpirun' is pid #494834.
WARNING: Error in initializing MVAPICH2 ptmalloc library.Continuing without InfiniBand registration cache support.

 MVAPICH2-2.3.4 Parameters
---------------------------------------------------------------------
	PROCESSOR ARCH NAME            : MV2_ARCH_INTEL_GENERIC
	PROCESSOR FAMILY NAME          : MV2_CPU_FAMILY_INTEL
	PROCESSOR MODEL NUMBER         : 79
	HCA NAME                       : MV2_HCA_MLX_CX_CONNIB
	HETEROGENEOUS HCA              : NO
	MV2_VBUF_TOTAL_SIZE            : 16384
	MV2_IBA_EAGER_THRESHOLD        : 16384
	MV2_RDMA_FAST_PATH_BUF_SIZE    : 4096
	MV2_PUT_FALLBACK_THRESHOLD     : 8192
	MV2_GET_FALLBACK_THRESHOLD     : 262144
	MV2_EAGERSIZE_1SC              : 4096
	MV2_SMP_EAGERSIZE              : 65537
	MV2_SMP_QUEUE_LENGTH           : 262144
	MV2_SMP_NUM_SEND_BUFFER        : 256
	MV2_SMP_BATCH_SIZE             : 8
	Tuning Table:                  : MV2_ARCH_UNKWN MV2_HCA_UNKWN
---------------------------------------------------------------------

 MVAPICH2 All Parameters
	MV2_COMM_WORLD_LOCAL_RANK           : 0
	MPIRUN_RSH_LAUNCH                   : 0
	MV2_SHMEM_BACKED_UD_CM              : 0
	MV2_3DTORUS_SUPPORT                 : 0
	MV2_NUM_SA_QUERY_RETRIES            : 20
	MV2_NUM_SLS                         : 8
	MV2_DEFAULT_SERVICE_LEVEL           : 0
	MV2_PATH_SL_QUERY                   : 0
	MV2_USE_QOS                         : 0
	MV2_ALLGATHER_BRUCK_THRESHOLD       : 524288
	MV2_ALLGATHER_RD_THRESHOLD          : 81920
	MV2_ALLGATHER_REVERSE_RANKING       : 1
	MV2_ALLGATHERV_RD_THRESHOLD         : 0
	MV2_ALLREDUCE_2LEVEL_MSG            : 262144
	MV2_ALLREDUCE_SHORT_MSG             : 2048
	MV2_ALLTOALL_MEDIUM_MSG             : 16384
	MV2_ALLTOALL_SMALL_MSG              : 2048
	MV2_ALLTOALL_THROTTLE_FACTOR        : 32
	MV2_BCAST_TWO_LEVEL_SYSTEM_SIZE     : 64
	MV2_GATHER_SWITCH_PT                : 0
	MV2_INTRA_SHMEM_REDUCE_MSG          : 2048
	MV2_KNOMIAL_2LEVEL_BCAST_MESSAGE_SIZE_THRESHOLD : 2048
	MV2_KNOMIAL_2LEVEL_BCAST_SYSTEM_SIZE_THRESHOLD : 64
	MV2_KNOMIAL_INTER_LEADER_THRESHOLD  : 65536
	MV2_KNOMIAL_INTER_NODE_FACTOR       : 4
	MV2_KNOMIAL_INTRA_NODE_FACTOR       : 4
	MV2_KNOMIAL_INTRA_NODE_THRESHOLD    : 131072
	MV2_RED_SCAT_LARGE_MSG              : 524288
	MV2_RED_SCAT_SHORT_MSG              : 64
	MV2_REDUCE_2LEVEL_MSG               : 16384
	MV2_REDUCE_SHORT_MSG                : 8192
	MV2_SCATTER_MEDIUM_MSG              : 0
	MV2_SCATTER_SMALL_MSG               : 0
	MV2_SHMEM_ALLREDUCE_MSG             : 32768
	MV2_SHMEM_COLL_MAX_MSG_SIZE         : 131072
	MV2_SHMEM_COLL_NUM_COMM             : 8
	MV2_SHMEM_COLL_NUM_PROCS            : 64
	MV2_SHMEM_COLL_SPIN_COUNT           : 5
	MV2_SHMEM_REDUCE_MSG                : 4096
	MV2_USE_BCAST_SHORT_MSG             : 16384
	MV2_USE_DIRECT_GATHER               : 1
	MV2_USE_DIRECT_GATHER_SYSTEM_SIZE_MEDIUM : 1024
	MV2_USE_DIRECT_GATHER_SYSTEM_SIZE_SMALL : 384
	MV2_USE_DIRECT_SCATTER              : 1
	MV2_USE_OSU_COLLECTIVES             : 1
	MV2_USE_OSU_NB_COLLECTIVES          : 1
	MV2_USE_KNOMIAL_2LEVEL_BCAST        : 1
	MV2_USE_KNOMIAL_INTER_LEADER_BCAST  : 1
	MV2_USE_SCATTER_RD_INTER_LEADER_BCAST : 1
	MV2_USE_SCATTER_RING_INTER_LEADER_BCAST : 1
	MV2_USE_SHMEM_ALLREDUCE             : 1
	MV2_USE_SHMEM_BARRIER               : 1
	MV2_USE_SHMEM_BCAST                 : 1
	MV2_USE_SHMEM_COLL                  : 0
	MV2_USE_SHMEM_REDUCE                : 1
	MV2_USE_TWO_LEVEL_GATHER            : 1
	MV2_USE_TWO_LEVEL_SCATTER           : 1
	MV2_USE_XOR_ALLTOALL                : 1
	MV2_ENABLE_SOCKET_AWARE_COLLECTIVES : 1
	MV2_USE_SOCKET_AWARE_ALLREDUCE      : 1
	MV2_USE_SOCKET_AWARE_BARRIER        : 1
	MV2_USE_SOCKET_AWARE_SHARP_ALLREDUCE : 0
	MV2_SOCKET_AWARE_ALLREDUCE_MAX_MSG  : 2048
	MV2_SOCKET_AWARE_ALLREDUCE_MIN_MSG  : 1
	MV2_DEFAULT_SRC_PATH_BITS           : 0
	MV2_DEFAULT_STATIC_RATE             : 0
	MV2_DEFAULT_TIME_OUT                : 330772
	MV2_DEFAULT_MTU                     : 5
	MV2_DEFAULT_PKEY                    : 0
	MV2_DEFAULT_QKEY                    : 0
	MV2_DEFAULT_PORT                    : 1
	MV2_DEFAULT_GID_INDEX               : 0
	MV2_DEFAULT_PSN                     : 0
	MV2_DEFAULT_MAX_RECV_WQE            : 128
	MV2_DEFAULT_MAX_SEND_WQE            : 64
	MV2_DEFAULT_MAX_SG_LIST             : 1
	MV2_DEFAULT_MIN_RNR_TIMER           : 12
	MV2_DEFAULT_QP_OUS_RD_ATOM          : 260
	MV2_DEFAULT_RETRY_COUNT             : 84677639
	MV2_DEFAULT_RNR_RETRY               : 202639111
	MV2_DEFAULT_MAX_CQ_SIZE             : 40000
	MV2_DEFAULT_MAX_RDMA_DST_OPS        : 4
	MV2_INITIAL_PREPOST_DEPTH           : 10
	MV2_IWARP_MULTIPLE_CQ_THRESHOLD     : 32
	MV2_NUM_HCAS                        : 1
	MV2_NUM_PORTS                       : 1
	MV2_NUM_QP_PER_PORT                 : 1
	MV2_MAX_RDMA_CONNECT_ATTEMPTS       : 20
	MV2_ON_DEMAND_UD_INFO_EXCHANGE      : 0
	MV2_PREPOST_DEPTH                   : 64
	MV2_HOMOGENEOUS_CLUSTER             : 0
	MV2_NUM_CQES_PER_POLL               : 96
	MV2_COALESCE_THRESHOLD              : 6
	MV2_DREG_CACHE_LIMIT                : 0
	MV2_IBA_EAGER_THRESHOLD             : 16384
	MV2_MAX_INLINE_SIZE                 : 168
	MV2_MAX_R3_PENDING_DATA             : 524288
	MV2_MED_MSG_RAIL_SHARING_POLICY     : 0
	MV2_NDREG_ENTRIES                   : 1116
	MV2_NUM_RDMA_BUFFER                 : 16
	MV2_NUM_SPINS_BEFORE_LOCK           : 2000
	MV2_POLLING_LEVEL                   : 1
	MV2_POLLING_SET_LIMIT               : 64
	MV2_POLLING_SET_THRESHOLD           : 256
	MV2_R3_NOCACHE_THRESHOLD            : 32768
	MV2_R3_THRESHOLD                    : 4096
	MV2_RAIL_SHARING_LARGE_MSG_THRESHOLD : 16384
	MV2_RAIL_SHARING_MED_MSG_THRESHOLD  : 2048
	MV2_RAIL_SHARING_POLICY             : 4
	MV2_RDMA_EAGER_LIMIT                : 32
	MV2_RDMA_FAST_PATH_BUF_SIZE         : 4096
	MV2_RDMA_NUM_EXTRA_POLLS            : 1
	MV2_RNDV_EXT_SENDQ_SIZE             : 5
	MV2_RNDV_PROTOCOL                   : 4
	MV2_SMP_RNDV_PROTOCOL               : 4
	MV2_SMALL_MSG_RAIL_SHARING_POLICY   : 0
	MV2_SPIN_COUNT                      : 5000
	MV2_SRQ_LIMIT                       : 10
	MV2_SRQ_MAX_SIZE                    : 32767
	MV2_SRQ_SIZE                        : 80
	MV2_STRIPING_THRESHOLD              : 16384
	MV2_USE_BLOCKING                    : 1
	MV2_USE_COALESCE                    : 1
	MV2_USE_XRC                         : 0
	MV2_VBUF_MAX                        : -1
	MV2_VBUF_POOL_SIZE                  : 80
	MV2_VBUF_SECONDARY_POOL_SIZE        : 16
	MV2_VBUF_TOTAL_SIZE                 : 16384
	MV2_USE_IWARP_MODE                  : 0
	MV2_USE_HWLOC_CPU_BINDING           : 1
	MV2_ENABLE_AFFINITY                 : 1
	MV2_HCA_AWARE_PROCESS_MAPPING       : 1
	MV2_ENABLE_LEASTLOAD                : 0
	MV2_SMP_BATCH_SIZE                  : 8
	MV2_SMP_EAGERSIZE                   : 65537
	MV2_SMP_QUEUE_LENGTH                : 262144
	MV2_SMP_NUM_SEND_BUFFER             : 256
	MV2_SMP_SEND_BUF_SIZE               : 8192
	MV2_USE_SHARED_MEM                  : 1
	MV2_SMP_CMA_MAX_SIZE                : 0
	MV2_SMP_LIMIC2_MAX_SIZE             : 0
	MV2_SHOW_ENV_INFO                   : 3
	MV2_DEFAULT_PUT_GET_LIST_SIZE       : 200
	MV2_EAGERSIZE_1SC                   : 4096
	MV2_GET_FALLBACK_THRESHOLD          : 262144
	MV2_PIN_POOL_SIZE                   : 2097152
	MV2_PUT_FALLBACK_THRESHOLD          : 8192
	MV2_USE_RDMA_CM                     : 0
	MV2_UD_MAX_ACK_PENDING              : 100
	MV2_UD_MAX_RECV_WQE                 : 4096
	MV2_UD_MAX_RETRY_TIMEOUT            : 20000000
	MV2_UD_MAX_SEND_WQE                 : 2048
	MV2_UD_MTU                          : 4096
	MV2_UD_NUM_MSG_LIMIT                : 512
	MV2_UD_NUM_ZCOPY_RNDV_QPS           : 64
	MV2_UD_PROGRESS_SPIN                : 1200
	MV2_UD_PROGRESS_TIMEOUT             : 48000
	MV2_UD_RECVWINDOW_SIZE              : 2501
	MV2_UD_RETRY_COUNT                  : 1024
	MV2_UD_RETRY_TIMEOUT                : 500000
	MV2_UD_SENDWINDOW_SIZE              : 400
	MV2_UD_VBUF_POOL_SIZE               : 8192
	MV2_UD_ZCOPY_RQ_SIZE                : 4096
	MV2_UD_ZCOPY_THRESHOLD              : 16384
	MV2_USE_UD_ZCOPY                    : 1
	MV2_USE_UD_HYBRID                   : 0
	MV2_USE_ONLY_UD                     : 0
	MV2_HYBRID_ENABLE_THRESHOLD         : 1024
	MV2_HYBRID_MAX_RC_CONN              : 32
	MV2_ASYNC_THREAD_STACK_SIZE         : 1048576
	MV2_THREAD_YIELD_SPIN_THRESHOLD     : 5
	MV2_SUPPORT_DPM                     : 1
	MV2_USE_HUGEPAGES                   : 1
---------------------------------------------------------------------

Collective Tuning Tables
	Collective           Architecture                             Interconnect                            
	Allgather            MV2_ARCH_UNKWN                           MV2_HCA_UNKWN                           
	Allreduce            MV2_ARCH_UNKWN                           MV2_HCA_UNKWN                           
	Alltoall             MV2_ARCH_UNKWN                           MV2_HCA_UNKWN                           
	Alltoallv            MV2_ARCH_INTEL_GENERIC                   MV2_HCA_MLX_CX_CONNIB                   
	Broadcast            MV2_ARCH_UNKWN                           MV2_HCA_UNKWN                           
	Gather               MV2_ARCH_UNKWN                           MV2_HCA_UNKWN                           
	Reduce               MV2_ARCH_UNKWN                           MV2_HCA_UNKWN                           
	Scatter              MV2_ARCH_UNKWN                           MV2_HCA_UNKWN                           

---------------------------------------------------------------------
WARNING: Error in initializing MVAPICH2 ptmalloc library.Continuing without InfiniBand registration cache support.

 MVAPICH2-2.3.4 Parameters
---------------------------------------------------------------------
	PROCESSOR ARCH NAME            : MV2_ARCH_INTEL_GENERIC
	PROCESSOR FAMILY NAME          : MV2_CPU_FAMILY_INTEL
	PROCESSOR MODEL NUMBER         : 79
	HCA NAME                       : MV2_HCA_MLX_CX_CONNIB
	HETEROGENEOUS HCA              : NO
	MV2_VBUF_TOTAL_SIZE            : 16384
	MV2_IBA_EAGER_THRESHOLD        : 16384
	MV2_RDMA_FAST_PATH_BUF_SIZE    : 4096
	MV2_PUT_FALLBACK_THRESHOLD     : 8192
	MV2_GET_FALLBACK_THRESHOLD     : 262144
	MV2_EAGERSIZE_1SC              : 4096
	MV2_SMP_EAGERSIZE              : 65537
	MV2_SMP_QUEUE_LENGTH           : 262144
	MV2_SMP_NUM_SEND_BUFFER        : 256
	MV2_SMP_BATCH_SIZE             : 8
	Tuning Table:                  : MV2_ARCH_UNKWN MV2_HCA_UNKWN
---------------------------------------------------------------------

 MVAPICH2 All Parameters
	MV2_COMM_WORLD_LOCAL_RANK           : 0
	MPIRUN_RSH_LAUNCH                   : 0
	MV2_SHMEM_BACKED_UD_CM              : 0
	MV2_3DTORUS_SUPPORT                 : 0
	MV2_NUM_SA_QUERY_RETRIES            : 20
	MV2_NUM_SLS                         : 8
	MV2_DEFAULT_SERVICE_LEVEL           : 0
	MV2_PATH_SL_QUERY                   : 0
	MV2_USE_QOS                         : 0
	MV2_ALLGATHER_BRUCK_THRESHOLD       : 524288
	MV2_ALLGATHER_RD_THRESHOLD          : 81920
	MV2_ALLGATHER_REVERSE_RANKING       : 1
	MV2_ALLGATHERV_RD_THRESHOLD         : 0
	MV2_ALLREDUCE_2LEVEL_MSG            : 262144
	MV2_ALLREDUCE_SHORT_MSG             : 2048
	MV2_ALLTOALL_MEDIUM_MSG             : 16384
	MV2_ALLTOALL_SMALL_MSG              : 2048
	MV2_ALLTOALL_THROTTLE_FACTOR        : 32
	MV2_BCAST_TWO_LEVEL_SYSTEM_SIZE     : 64
	MV2_GATHER_SWITCH_PT                : 0
	MV2_INTRA_SHMEM_REDUCE_MSG          : 2048
	MV2_KNOMIAL_2LEVEL_BCAST_MESSAGE_SIZE_THRESHOLD : 2048
	MV2_KNOMIAL_2LEVEL_BCAST_SYSTEM_SIZE_THRESHOLD : 64
	MV2_KNOMIAL_INTER_LEADER_THRESHOLD  : 65536
	MV2_KNOMIAL_INTER_NODE_FACTOR       : 4
	MV2_KNOMIAL_INTRA_NODE_FACTOR       : 4
	MV2_KNOMIAL_INTRA_NODE_THRESHOLD    : 131072
	MV2_RED_SCAT_LARGE_MSG              : 524288
	MV2_RED_SCAT_SHORT_MSG              : 64
	MV2_REDUCE_2LEVEL_MSG               : 16384
	MV2_REDUCE_SHORT_MSG                : 8192
	MV2_SCATTER_MEDIUM_MSG              : 0
	MV2_SCATTER_SMALL_MSG               : 0
	MV2_SHMEM_ALLREDUCE_MSG             : 32768
	MV2_SHMEM_COLL_MAX_MSG_SIZE         : 131072
	MV2_SHMEM_COLL_NUM_COMM             : 8
	MV2_SHMEM_COLL_NUM_PROCS            : 64
	MV2_SHMEM_COLL_SPIN_COUNT           : 5
	MV2_SHMEM_REDUCE_MSG                : 4096
	MV2_USE_BCAST_SHORT_MSG             : 16384
	MV2_USE_DIRECT_GATHER               : 1
	MV2_USE_DIRECT_GATHER_SYSTEM_SIZE_MEDIUM : 1024
	MV2_USE_DIRECT_GATHER_SYSTEM_SIZE_SMALL : 384
	MV2_USE_DIRECT_SCATTER              : 1
	MV2_USE_OSU_COLLECTIVES             : 1
	MV2_USE_OSU_NB_COLLECTIVES          : 1
	MV2_USE_KNOMIAL_2LEVEL_BCAST        : 1
	MV2_USE_KNOMIAL_INTER_LEADER_BCAST  : 1
	MV2_USE_SCATTER_RD_INTER_LEADER_BCAST : 1
	MV2_USE_SCATTER_RING_INTER_LEADER_BCAST : 1
	MV2_USE_SHMEM_ALLREDUCE             : 1
	MV2_USE_SHMEM_BARRIER               : 1
	MV2_USE_SHMEM_BCAST                 : 1
	MV2_USE_SHMEM_COLL                  : 0
	MV2_USE_SHMEM_REDUCE                : 1
	MV2_USE_TWO_LEVEL_GATHER            : 1
	MV2_USE_TWO_LEVEL_SCATTER           : 1
	MV2_USE_XOR_ALLTOALL                : 1
	MV2_ENABLE_SOCKET_AWARE_COLLECTIVES : 1
	MV2_USE_SOCKET_AWARE_ALLREDUCE      : 1
	MV2_USE_SOCKET_AWARE_BARRIER        : 1
	MV2_USE_SOCKET_AWARE_SHARP_ALLREDUCE : 0
	MV2_SOCKET_AWARE_ALLREDUCE_MAX_MSG  : 2048
	MV2_SOCKET_AWARE_ALLREDUCE_MIN_MSG  : 1
	MV2_DEFAULT_SRC_PATH_BITS           : 0
	MV2_DEFAULT_STATIC_RATE             : 0
	MV2_DEFAULT_TIME_OUT                : 330772
	MV2_DEFAULT_MTU                     : 5
	MV2_DEFAULT_PKEY                    : 0
	MV2_DEFAULT_QKEY                    : 0
	MV2_DEFAULT_PORT                    : 1
	MV2_DEFAULT_GID_INDEX               : 0
	MV2_DEFAULT_PSN                     : 0
	MV2_DEFAULT_MAX_RECV_WQE            : 128
	MV2_DEFAULT_MAX_SEND_WQE            : 64
	MV2_DEFAULT_MAX_SG_LIST             : 1
	MV2_DEFAULT_MIN_RNR_TIMER           : 12
	MV2_DEFAULT_QP_OUS_RD_ATOM          : 260
	MV2_DEFAULT_RETRY_COUNT             : 84677639
	MV2_DEFAULT_RNR_RETRY               : 202639111
	MV2_DEFAULT_MAX_CQ_SIZE             : 40000
	MV2_DEFAULT_MAX_RDMA_DST_OPS        : 4
	MV2_INITIAL_PREPOST_DEPTH           : 10
	MV2_IWARP_MULTIPLE_CQ_THRESHOLD     : 32
	MV2_NUM_HCAS                        : 1
	MV2_NUM_PORTS                       : 1
	MV2_NUM_QP_PER_PORT                 : 1
	MV2_MAX_RDMA_CONNECT_ATTEMPTS       : 20
	MV2_ON_DEMAND_UD_INFO_EXCHANGE      : 0
	MV2_PREPOST_DEPTH                   : 64
	MV2_HOMOGENEOUS_CLUSTER             : 0
	MV2_NUM_CQES_PER_POLL               : 96
	MV2_COALESCE_THRESHOLD              : 6
	MV2_DREG_CACHE_LIMIT                : 0
	MV2_IBA_EAGER_THRESHOLD             : 16384
	MV2_MAX_INLINE_SIZE                 : 168
	MV2_MAX_R3_PENDING_DATA             : 524288
	MV2_MED_MSG_RAIL_SHARING_POLICY     : 0
	MV2_NDREG_ENTRIES                   : 1116
	MV2_NUM_RDMA_BUFFER                 : 16
	MV2_NUM_SPINS_BEFORE_LOCK           : 2000
	MV2_POLLING_LEVEL                   : 1
	MV2_POLLING_SET_LIMIT               : 64
	MV2_POLLING_SET_THRESHOLD           : 256
	MV2_R3_NOCACHE_THRESHOLD            : 32768
	MV2_R3_THRESHOLD                    : 4096
	MV2_RAIL_SHARING_LARGE_MSG_THRESHOLD : 16384
	MV2_RAIL_SHARING_MED_MSG_THRESHOLD  : 2048
	MV2_RAIL_SHARING_POLICY             : 4
	MV2_RDMA_EAGER_LIMIT                : 32
	MV2_RDMA_FAST_PATH_BUF_SIZE         : 4096
	MV2_RDMA_NUM_EXTRA_POLLS            : 1
	MV2_RNDV_EXT_SENDQ_SIZE             : 5
	MV2_RNDV_PROTOCOL                   : 4
	MV2_SMP_RNDV_PROTOCOL               : 4
	MV2_SMALL_MSG_RAIL_SHARING_POLICY   : 0
	MV2_SPIN_COUNT                      : 5000
	MV2_SRQ_LIMIT                       : 10
	MV2_SRQ_MAX_SIZE                    : 32767
	MV2_SRQ_SIZE                        : 80
	MV2_STRIPING_THRESHOLD              : 16384
	MV2_USE_BLOCKING                    : 1
	MV2_USE_COALESCE                    : 1
	MV2_USE_XRC                         : 0
	MV2_VBUF_MAX                        : -1
	MV2_VBUF_POOL_SIZE                  : 80
	MV2_VBUF_SECONDARY_POOL_SIZE        : 16
	MV2_VBUF_TOTAL_SIZE                 : 16384
	MV2_USE_IWARP_MODE                  : 0
	MV2_USE_HWLOC_CPU_BINDING           : 1
	MV2_ENABLE_AFFINITY                 : 1
	MV2_HCA_AWARE_PROCESS_MAPPING       : 1
	MV2_ENABLE_LEASTLOAD                : 0
	MV2_SMP_BATCH_SIZE                  : 8
	MV2_SMP_EAGERSIZE                   : 65537
	MV2_SMP_QUEUE_LENGTH                : 262144
	MV2_SMP_NUM_SEND_BUFFER             : 256
	MV2_SMP_SEND_BUF_SIZE               : 8192
	MV2_USE_SHARED_MEM                  : 1
	MV2_SMP_CMA_MAX_SIZE                : 0
	MV2_SMP_LIMIC2_MAX_SIZE             : 0
	MV2_SHOW_ENV_INFO                   : 3
	MV2_DEFAULT_PUT_GET_LIST_SIZE       : 200
	MV2_EAGERSIZE_1SC                   : 4096
	MV2_GET_FALLBACK_THRESHOLD          : 262144
	MV2_PIN_POOL_SIZE                   : 2097152
	MV2_PUT_FALLBACK_THRESHOLD          : 8192
	MV2_USE_RDMA_CM                     : 0
	MV2_UD_MAX_ACK_PENDING              : 100
	MV2_UD_MAX_RECV_WQE                 : 4096
	MV2_UD_MAX_RETRY_TIMEOUT            : 20000000
	MV2_UD_MAX_SEND_WQE                 : 2048
	MV2_UD_MTU                          : 4096
	MV2_UD_NUM_MSG_LIMIT                : 512
	MV2_UD_NUM_ZCOPY_RNDV_QPS           : 64
	MV2_UD_PROGRESS_SPIN                : 1200
	MV2_UD_PROGRESS_TIMEOUT             : 48000
	MV2_UD_RECVWINDOW_SIZE              : 2501
	MV2_UD_RETRY_COUNT                  : 1024
	MV2_UD_RETRY_TIMEOUT                : 500000
	MV2_UD_SENDWINDOW_SIZE              : 400
	MV2_UD_VBUF_POOL_SIZE               : 8192
	MV2_UD_ZCOPY_RQ_SIZE                : 4096
	MV2_UD_ZCOPY_THRESHOLD              : 16384
	MV2_USE_UD_ZCOPY                    : 1
	MV2_USE_UD_HYBRID                   : 0
	MV2_USE_ONLY_UD                     : 0
	MV2_HYBRID_ENABLE_THRESHOLD         : 1024
	MV2_HYBRID_MAX_RC_CONN              : 32
	MV2_ASYNC_THREAD_STACK_SIZE         : 1048576
	MV2_THREAD_YIELD_SPIN_THRESHOLD     : 5
	MV2_SUPPORT_DPM                     : 1
	MV2_USE_HUGEPAGES                   : 1
---------------------------------------------------------------------

Collective Tuning Tables
	Collective           Architecture                             Interconnect                            
	Allgather            MV2_ARCH_UNKWN                           MV2_HCA_UNKWN                           
	Allreduce            MV2_ARCH_UNKWN                           MV2_HCA_UNKWN                           
	Alltoall             MV2_ARCH_UNKWN                           MV2_HCA_UNKWN                           
	Alltoallv            MV2_ARCH_INTEL_GENERIC                   MV2_HCA_MLX_CX_CONNIB                   
	Broadcast            MV2_ARCH_UNKWN                           MV2_HCA_UNKWN                           
	Gather               MV2_ARCH_UNKWN                           MV2_HCA_UNKWN                           
	Reduce               MV2_ARCH_UNKWN                           MV2_HCA_UNKWN                           
	Scatter              MV2_ARCH_UNKWN                           MV2_HCA_UNKWN                           

---------------------------------------------------------------------
-------------- next part --------------
Command 'mpirun' is pid #605314.
WARNING: Error in initializing MVAPICH2 ptmalloc library.Continuing without InfiniBand registration cache support.
[worker04.local:mpi_rank_0][rdma_param_handle_heterogeneity] All nodes involved in the job were detected to be homogeneous in terms of processors and interconnects. Setting MV2_HOMOGENEOUS_CLUSTER=1 can improve job startup performance on such systems. The following link has more details on enhancing job startup performance. http://mvapich.cse.ohio-state.edu/performance/job-startup/.
[worker04.local:mpi_rank_0][rdma_param_handle_heterogeneity] To suppress this warning, please set MV2_SUPPRESS_JOB_STARTUP_PERFORMANCE_WARNING to 1

 MVAPICH2-2.3.5 Parameters
---------------------------------------------------------------------
	PROCESSOR ARCH NAME            : MV2_ARCH_INTEL_GENERIC
	PROCESSOR FAMILY NAME          : MV2_CPU_FAMILY_INTEL
	PROCESSOR MODEL NUMBER         : 79
	HCA NAME                       : MV2_HCA_MLX_CX_CONNIB
	HETEROGENEOUS HCA              : NO
	MV2_VBUF_TOTAL_SIZE            : 16384
	MV2_IBA_EAGER_THRESHOLD        : 16384
	MV2_RDMA_FAST_PATH_BUF_SIZE    : 4096
	MV2_PUT_FALLBACK_THRESHOLD     : 8192
	MV2_GET_FALLBACK_THRESHOLD     : 262144
	MV2_EAGERSIZE_1SC              : 4096
	MV2_SMP_EAGERSIZE              : 65537
	MV2_SMP_QUEUE_LENGTH           : 262144
	MV2_SMP_NUM_SEND_BUFFER        : 256
	MV2_SMP_BATCH_SIZE             : 8
	Tuning Table:                  : MV2_ARCH_UNKWN MV2_HCA_UNKWN
---------------------------------------------------------------------

 MVAPICH2 All Parameters
	MV2_COMM_WORLD_LOCAL_RANK           : 0
	MPIRUN_RSH_LAUNCH                   : 0
	MV2_SHMEM_BACKED_UD_CM              : 0
	MV2_3DTORUS_SUPPORT                 : 0
	MV2_NUM_SA_QUERY_RETRIES            : 20
	MV2_NUM_SLS                         : 8
	MV2_DEFAULT_SERVICE_LEVEL           : 0
	MV2_PATH_SL_QUERY                   : 0
	MV2_USE_QOS                         : 0
	MV2_ALLGATHER_BRUCK_THRESHOLD       : 524288
	MV2_ALLGATHER_RD_THRESHOLD          : 81920
	MV2_ALLGATHER_REVERSE_RANKING       : 1
	MV2_ALLGATHERV_RD_THRESHOLD         : 0
	MV2_ALLREDUCE_2LEVEL_MSG            : 262144
	MV2_ALLREDUCE_SHORT_MSG             : 2048
	MV2_ALLTOALL_MEDIUM_MSG             : 16384
	MV2_ALLTOALL_SMALL_MSG              : 2048
	MV2_ALLTOALL_THROTTLE_FACTOR        : 32
	MV2_BCAST_TWO_LEVEL_SYSTEM_SIZE     : 64
	MV2_GATHER_SWITCH_PT                : 0
	MV2_INTRA_SHMEM_REDUCE_MSG          : 2048
	MV2_KNOMIAL_2LEVEL_BCAST_MESSAGE_SIZE_THRESHOLD : 2048
	MV2_KNOMIAL_2LEVEL_BCAST_SYSTEM_SIZE_THRESHOLD : 64
	MV2_KNOMIAL_INTER_LEADER_THRESHOLD  : 65536
	MV2_KNOMIAL_INTER_NODE_FACTOR       : 4
	MV2_KNOMIAL_INTRA_NODE_FACTOR       : 4
	MV2_KNOMIAL_INTRA_NODE_THRESHOLD    : 131072
	MV2_RED_SCAT_LARGE_MSG              : 524288
	MV2_RED_SCAT_SHORT_MSG              : 64
	MV2_REDUCE_2LEVEL_MSG               : 16384
	MV2_REDUCE_SHORT_MSG                : 8192
	MV2_SCATTER_MEDIUM_MSG              : 0
	MV2_SCATTER_SMALL_MSG               : 0
	MV2_SHMEM_ALLREDUCE_MSG             : 32768
	MV2_SHMEM_COLL_MAX_MSG_SIZE         : 131072
	MV2_SHMEM_COLL_NUM_COMM             : 32
	MV2_SHMEM_COLL_NUM_PROCS            : 64
	MV2_SHMEM_COLL_SPIN_COUNT           : 5
	MV2_SHMEM_REDUCE_MSG                : 4096
	MV2_USE_BCAST_SHORT_MSG             : 16384
	MV2_USE_DIRECT_GATHER               : 1
	MV2_USE_DIRECT_GATHER_SYSTEM_SIZE_MEDIUM : 1024
	MV2_USE_DIRECT_GATHER_SYSTEM_SIZE_SMALL : 384
	MV2_USE_DIRECT_SCATTER              : 1
	MV2_USE_OSU_COLLECTIVES             : 1
	MV2_USE_OSU_NB_COLLECTIVES          : 1
	MV2_USE_KNOMIAL_2LEVEL_BCAST        : 1
	MV2_USE_KNOMIAL_INTER_LEADER_BCAST  : 1
	MV2_USE_SCATTER_RD_INTER_LEADER_BCAST : 1
	MV2_USE_SCATTER_RING_INTER_LEADER_BCAST : 1
	MV2_USE_SHMEM_ALLREDUCE             : 1
	MV2_USE_SHMEM_BARRIER               : 1
	MV2_USE_SHMEM_BCAST                 : 1
	MV2_USE_SHMEM_COLL                  : 0
	MV2_USE_SHMEM_REDUCE                : 1
	MV2_USE_TWO_LEVEL_GATHER            : 1
	MV2_USE_TWO_LEVEL_SCATTER           : 1
	MV2_USE_XOR_ALLTOALL                : 1
	MV2_ENABLE_SOCKET_AWARE_COLLECTIVES : 1
	MV2_USE_SOCKET_AWARE_ALLREDUCE      : 1
	MV2_USE_SOCKET_AWARE_BARRIER        : 1
	MV2_USE_SOCKET_AWARE_SHARP_ALLREDUCE : 0
	MV2_SOCKET_AWARE_ALLREDUCE_MAX_MSG  : 2048
	MV2_SOCKET_AWARE_ALLREDUCE_MIN_MSG  : 1
	MV2_DEFAULT_SRC_PATH_BITS           : 0
	MV2_DEFAULT_STATIC_RATE             : 0
	MV2_DEFAULT_TIME_OUT                : 330772
	MV2_DEFAULT_MTU                     : 5
	MV2_DEFAULT_PKEY                    : 0
	MV2_DEFAULT_QKEY                    : 0
	MV2_DEFAULT_PORT                    : 1
	MV2_DEFAULT_GID_INDEX               : 0
	MV2_DEFAULT_PSN                     : 0
	MV2_DEFAULT_MAX_RECV_WQE            : 128
	MV2_DEFAULT_MAX_SEND_WQE            : 64
	MV2_DEFAULT_MAX_SG_LIST             : 1
	MV2_DEFAULT_MIN_RNR_TIMER           : 12
	MV2_DEFAULT_QP_OUS_RD_ATOM          : 272
	MV2_DEFAULT_RETRY_COUNT             : 84677639
	MV2_DEFAULT_RNR_RETRY               : 202639111
	MV2_DEFAULT_MAX_CQ_SIZE             : 40000
	MV2_DEFAULT_MAX_RDMA_DST_OPS        : 4
	MV2_INITIAL_PREPOST_DEPTH           : 10
	MV2_IWARP_MULTIPLE_CQ_THRESHOLD     : 32
	MV2_NUM_HCAS                        : 1
	MV2_NUM_PORTS                       : 1
	MV2_NUM_QP_PER_PORT                 : 1
	MV2_MAX_RDMA_CONNECT_ATTEMPTS       : 20
	MV2_ON_DEMAND_UD_INFO_EXCHANGE      : 0
	MV2_PREPOST_DEPTH                   : 64
	MV2_HOMOGENEOUS_CLUSTER             : 0
	MV2_NUM_CQES_PER_POLL               : 96
	MV2_COALESCE_THRESHOLD              : 6
	MV2_DREG_CACHE_LIMIT                : 0
	MV2_IBA_EAGER_THRESHOLD             : 16384
	MV2_MAX_INLINE_SIZE                 : 168
	MV2_MAX_R3_PENDING_DATA             : 524288
	MV2_MED_MSG_RAIL_SHARING_POLICY     : 0
	MV2_NDREG_ENTRIES                   : 8208
	MV2_NUM_RDMA_BUFFER                 : 16
	MV2_NUM_SPINS_BEFORE_LOCK           : 2000
	MV2_POLLING_LEVEL                   : 1
	MV2_POLLING_SET_LIMIT               : 64
	MV2_POLLING_SET_THRESHOLD           : 256
	MV2_R3_NOCACHE_THRESHOLD            : 32768
	MV2_R3_THRESHOLD                    : 4096
	MV2_RAIL_SHARING_LARGE_MSG_THRESHOLD : 16384
	MV2_RAIL_SHARING_MED_MSG_THRESHOLD  : 2048
	MV2_RAIL_SHARING_POLICY             : 4
	MV2_RDMA_EAGER_LIMIT                : 32
	MV2_RDMA_FAST_PATH_BUF_SIZE         : 4096
	MV2_RDMA_NUM_EXTRA_POLLS            : 1
	MV2_RNDV_EXT_SENDQ_SIZE             : 5
	MV2_RNDV_PROTOCOL                   : 4
	MV2_SMP_RNDV_PROTOCOL               : 4
	MV2_SMALL_MSG_RAIL_SHARING_POLICY   : 0
	MV2_SPIN_COUNT                      : 5000
	MV2_SRQ_LIMIT                       : 10
	MV2_SRQ_MAX_SIZE                    : 32767
	MV2_SRQ_SIZE                        : 80
	MV2_STRIPING_THRESHOLD              : 16384
	MV2_USE_BLOCKING                    : 1
	MV2_USE_COALESCE                    : 1
	MV2_USE_XRC                         : 0
	MV2_VBUF_MAX                        : -1
	MV2_VBUF_POOL_SIZE                  : 80
	MV2_VBUF_SECONDARY_POOL_SIZE        : 16
	MV2_VBUF_TOTAL_SIZE                 : 16384
	MV2_USE_IWARP_MODE                  : 0
	MV2_USE_HWLOC_CPU_BINDING           : 1
	MV2_ENABLE_AFFINITY                 : 1
	MV2_ENABLE_LEASTLOAD                : 0
	MV2_SMP_BATCH_SIZE                  : 8
	MV2_SMP_EAGERSIZE                   : 65537
	MV2_SMP_QUEUE_LENGTH                : 262144
	MV2_SMP_NUM_SEND_BUFFER             : 256
	MV2_SMP_SEND_BUF_SIZE               : 8192
	MV2_USE_SHARED_MEM                  : 1
	MV2_SMP_CMA_MAX_SIZE                : 0
	MV2_SMP_LIMIC2_MAX_SIZE             : 0
	MV2_SHOW_ENV_INFO                   : 3
	MV2_DEFAULT_PUT_GET_LIST_SIZE       : 200
	MV2_EAGERSIZE_1SC                   : 4096
	MV2_GET_FALLBACK_THRESHOLD          : 262144
	MV2_PIN_POOL_SIZE                   : 2097152
	MV2_PUT_FALLBACK_THRESHOLD          : 8192
	MV2_USE_RDMA_CM                     : 0
	MV2_UD_MAX_ACK_PENDING              : 100
	MV2_UD_MAX_RECV_WQE                 : 4096
	MV2_UD_MAX_RETRY_TIMEOUT            : 20000000
	MV2_UD_MAX_SEND_WQE                 : 2048
	MV2_UD_MTU                          : 4096
	MV2_UD_NUM_MSG_LIMIT                : 512
	MV2_UD_NUM_ZCOPY_RNDV_QPS           : 64
	MV2_UD_PROGRESS_SPIN                : 1200
	MV2_UD_PROGRESS_TIMEOUT             : 48000
	MV2_UD_RECVWINDOW_SIZE              : 2501
	MV2_UD_RETRY_COUNT                  : 1024
	MV2_UD_RETRY_TIMEOUT                : 500000
	MV2_UD_SENDWINDOW_SIZE              : 400
	MV2_UD_VBUF_POOL_SIZE               : 8192
	MV2_UD_ZCOPY_RQ_SIZE                : 4096
	MV2_UD_ZCOPY_THRESHOLD              : 16384
	MV2_UD_ZCOPY_NUM_RETRY              : 50000
	MV2_USE_UD_ZCOPY                    : 1
	MV2_USE_UD_HYBRID                   : 0
	MV2_USE_ONLY_UD                     : 0
	MV2_HYBRID_ENABLE_THRESHOLD         : 1024
	MV2_HYBRID_MAX_RC_CONN              : 32
	MV2_ASYNC_THREAD_STACK_SIZE         : 1048576
	MV2_THREAD_YIELD_SPIN_THRESHOLD     : 5
	MV2_SUPPORT_DPM                     : 1
	MV2_USE_HUGEPAGES                   : 1
---------------------------------------------------------------------

Collective Tuning Tables
	Collective           Architecture                             Interconnect                            
	Allgather            MV2_ARCH_UNKWN                           MV2_HCA_UNKWN                           
	Allreduce            MV2_ARCH_UNKWN                           MV2_HCA_UNKWN                           
	Alltoall             MV2_ARCH_UNKWN                           MV2_HCA_UNKWN                           
	Alltoallv            MV2_ARCH_INTEL_GENERIC                   MV2_HCA_MLX_CX_CONNIB                   
	Broadcast            MV2_ARCH_UNKWN                           MV2_HCA_UNKWN                           
	Gather               MV2_ARCH_UNKWN                           MV2_HCA_UNKWN                           
	Reduce               MV2_ARCH_UNKWN                           MV2_HCA_UNKWN                           
	Scatter              MV2_ARCH_UNKWN                           MV2_HCA_INTEL_NE020                     

---------------------------------------------------------------------
WARNING: Error in initializing MVAPICH2 ptmalloc library.Continuing without InfiniBand registration cache support.
[worker04.local:mpi_rank_0][rdma_param_handle_heterogeneity] All nodes involved in the job were detected to be homogeneous in terms of processors and interconnects. Setting MV2_HOMOGENEOUS_CLUSTER=1 can improve job startup performance on such systems. The following link has more details on enhancing job startup performance. http://mvapich.cse.ohio-state.edu/performance/job-startup/.
[worker04.local:mpi_rank_0][rdma_param_handle_heterogeneity] To suppress this warning, please set MV2_SUPPRESS_JOB_STARTUP_PERFORMANCE_WARNING to 1

 MVAPICH2-2.3.5 Parameters
---------------------------------------------------------------------
	PROCESSOR ARCH NAME            : MV2_ARCH_INTEL_GENERIC
	PROCESSOR FAMILY NAME          : MV2_CPU_FAMILY_INTEL
	PROCESSOR MODEL NUMBER         : 79
	HCA NAME                       : MV2_HCA_MLX_CX_CONNIB
	HETEROGENEOUS HCA              : NO
	MV2_VBUF_TOTAL_SIZE            : 16384
	MV2_IBA_EAGER_THRESHOLD        : 16384
	MV2_RDMA_FAST_PATH_BUF_SIZE    : 4096
	MV2_PUT_FALLBACK_THRESHOLD     : 8192
	MV2_GET_FALLBACK_THRESHOLD     : 262144
	MV2_EAGERSIZE_1SC              : 4096
	MV2_SMP_EAGERSIZE              : 65537
	MV2_SMP_QUEUE_LENGTH           : 262144
	MV2_SMP_NUM_SEND_BUFFER        : 256
	MV2_SMP_BATCH_SIZE             : 8
	Tuning Table:                  : MV2_ARCH_UNKWN MV2_HCA_UNKWN
---------------------------------------------------------------------

 MVAPICH2 All Parameters
	MV2_COMM_WORLD_LOCAL_RANK           : 0
	MPIRUN_RSH_LAUNCH                   : 0
	MV2_SHMEM_BACKED_UD_CM              : 0
	MV2_3DTORUS_SUPPORT                 : 0
	MV2_NUM_SA_QUERY_RETRIES            : 20
	MV2_NUM_SLS                         : 8
	MV2_DEFAULT_SERVICE_LEVEL           : 0
	MV2_PATH_SL_QUERY                   : 0
	MV2_USE_QOS                         : 0
	MV2_ALLGATHER_BRUCK_THRESHOLD       : 524288
	MV2_ALLGATHER_RD_THRESHOLD          : 81920
	MV2_ALLGATHER_REVERSE_RANKING       : 1
	MV2_ALLGATHERV_RD_THRESHOLD         : 0
	MV2_ALLREDUCE_2LEVEL_MSG            : 262144
	MV2_ALLREDUCE_SHORT_MSG             : 2048
	MV2_ALLTOALL_MEDIUM_MSG             : 16384
	MV2_ALLTOALL_SMALL_MSG              : 2048
	MV2_ALLTOALL_THROTTLE_FACTOR        : 32
	MV2_BCAST_TWO_LEVEL_SYSTEM_SIZE     : 64
	MV2_GATHER_SWITCH_PT                : 0
	MV2_INTRA_SHMEM_REDUCE_MSG          : 2048
	MV2_KNOMIAL_2LEVEL_BCAST_MESSAGE_SIZE_THRESHOLD : 2048
	MV2_KNOMIAL_2LEVEL_BCAST_SYSTEM_SIZE_THRESHOLD : 64
	MV2_KNOMIAL_INTER_LEADER_THRESHOLD  : 65536
	MV2_KNOMIAL_INTER_NODE_FACTOR       : 4
	MV2_KNOMIAL_INTRA_NODE_FACTOR       : 4
	MV2_KNOMIAL_INTRA_NODE_THRESHOLD    : 131072
	MV2_RED_SCAT_LARGE_MSG              : 524288
	MV2_RED_SCAT_SHORT_MSG              : 64
	MV2_REDUCE_2LEVEL_MSG               : 16384
	MV2_REDUCE_SHORT_MSG                : 8192
	MV2_SCATTER_MEDIUM_MSG              : 0
	MV2_SCATTER_SMALL_MSG               : 0
	MV2_SHMEM_ALLREDUCE_MSG             : 32768
	MV2_SHMEM_COLL_MAX_MSG_SIZE         : 131072
	MV2_SHMEM_COLL_NUM_COMM             : 32
	MV2_SHMEM_COLL_NUM_PROCS            : 64
	MV2_SHMEM_COLL_SPIN_COUNT           : 5
	MV2_SHMEM_REDUCE_MSG                : 4096
	MV2_USE_BCAST_SHORT_MSG             : 16384
	MV2_USE_DIRECT_GATHER               : 1
	MV2_USE_DIRECT_GATHER_SYSTEM_SIZE_MEDIUM : 1024
	MV2_USE_DIRECT_GATHER_SYSTEM_SIZE_SMALL : 384
	MV2_USE_DIRECT_SCATTER              : 1
	MV2_USE_OSU_COLLECTIVES             : 1
	MV2_USE_OSU_NB_COLLECTIVES          : 1
	MV2_USE_KNOMIAL_2LEVEL_BCAST        : 1
	MV2_USE_KNOMIAL_INTER_LEADER_BCAST  : 1
	MV2_USE_SCATTER_RD_INTER_LEADER_BCAST : 1
	MV2_USE_SCATTER_RING_INTER_LEADER_BCAST : 1
	MV2_USE_SHMEM_ALLREDUCE             : 1
	MV2_USE_SHMEM_BARRIER               : 1
	MV2_USE_SHMEM_BCAST                 : 1
	MV2_USE_SHMEM_COLL                  : 0
	MV2_USE_SHMEM_REDUCE                : 1
	MV2_USE_TWO_LEVEL_GATHER            : 1
	MV2_USE_TWO_LEVEL_SCATTER           : 1
	MV2_USE_XOR_ALLTOALL                : 1
	MV2_ENABLE_SOCKET_AWARE_COLLECTIVES : 1
	MV2_USE_SOCKET_AWARE_ALLREDUCE      : 1
	MV2_USE_SOCKET_AWARE_BARRIER        : 1
	MV2_USE_SOCKET_AWARE_SHARP_ALLREDUCE : 0
	MV2_SOCKET_AWARE_ALLREDUCE_MAX_MSG  : 2048
	MV2_SOCKET_AWARE_ALLREDUCE_MIN_MSG  : 1
	MV2_DEFAULT_SRC_PATH_BITS           : 0
	MV2_DEFAULT_STATIC_RATE             : 0
	MV2_DEFAULT_TIME_OUT                : 330772
	MV2_DEFAULT_MTU                     : 5
	MV2_DEFAULT_PKEY                    : 0
	MV2_DEFAULT_QKEY                    : 0
	MV2_DEFAULT_PORT                    : 1
	MV2_DEFAULT_GID_INDEX               : 0
	MV2_DEFAULT_PSN                     : 0
	MV2_DEFAULT_MAX_RECV_WQE            : 128
	MV2_DEFAULT_MAX_SEND_WQE            : 64
	MV2_DEFAULT_MAX_SG_LIST             : 1
	MV2_DEFAULT_MIN_RNR_TIMER           : 12
	MV2_DEFAULT_QP_OUS_RD_ATOM          : 272
	MV2_DEFAULT_RETRY_COUNT             : 84677639
	MV2_DEFAULT_RNR_RETRY               : 202639111
	MV2_DEFAULT_MAX_CQ_SIZE             : 40000
	MV2_DEFAULT_MAX_RDMA_DST_OPS        : 4
	MV2_INITIAL_PREPOST_DEPTH           : 10
	MV2_IWARP_MULTIPLE_CQ_THRESHOLD     : 32
	MV2_NUM_HCAS                        : 1
	MV2_NUM_PORTS                       : 1
	MV2_NUM_QP_PER_PORT                 : 1
	MV2_MAX_RDMA_CONNECT_ATTEMPTS       : 20
	MV2_ON_DEMAND_UD_INFO_EXCHANGE      : 0
	MV2_PREPOST_DEPTH                   : 64
	MV2_HOMOGENEOUS_CLUSTER             : 0
	MV2_NUM_CQES_PER_POLL               : 96
	MV2_COALESCE_THRESHOLD              : 6
	MV2_DREG_CACHE_LIMIT                : 0
	MV2_IBA_EAGER_THRESHOLD             : 16384
	MV2_MAX_INLINE_SIZE                 : 168
	MV2_MAX_R3_PENDING_DATA             : 524288
	MV2_MED_MSG_RAIL_SHARING_POLICY     : 0
	MV2_NDREG_ENTRIES                   : 8208
	MV2_NUM_RDMA_BUFFER                 : 16
	MV2_NUM_SPINS_BEFORE_LOCK           : 2000
	MV2_POLLING_LEVEL                   : 1
	MV2_POLLING_SET_LIMIT               : 64
	MV2_POLLING_SET_THRESHOLD           : 256
	MV2_R3_NOCACHE_THRESHOLD            : 32768
	MV2_R3_THRESHOLD                    : 4096
	MV2_RAIL_SHARING_LARGE_MSG_THRESHOLD : 16384
	MV2_RAIL_SHARING_MED_MSG_THRESHOLD  : 2048
	MV2_RAIL_SHARING_POLICY             : 4
	MV2_RDMA_EAGER_LIMIT                : 32
	MV2_RDMA_FAST_PATH_BUF_SIZE         : 4096
	MV2_RDMA_NUM_EXTRA_POLLS            : 1
	MV2_RNDV_EXT_SENDQ_SIZE             : 5
	MV2_RNDV_PROTOCOL                   : 4
	MV2_SMP_RNDV_PROTOCOL               : 4
	MV2_SMALL_MSG_RAIL_SHARING_POLICY   : 0
	MV2_SPIN_COUNT                      : 5000
	MV2_SRQ_LIMIT                       : 10
	MV2_SRQ_MAX_SIZE                    : 32767
	MV2_SRQ_SIZE                        : 80
	MV2_STRIPING_THRESHOLD              : 16384
	MV2_USE_BLOCKING                    : 1
	MV2_USE_COALESCE                    : 1
	MV2_USE_XRC                         : 0
	MV2_VBUF_MAX                        : -1
	MV2_VBUF_POOL_SIZE                  : 80
	MV2_VBUF_SECONDARY_POOL_SIZE        : 16
	MV2_VBUF_TOTAL_SIZE                 : 16384
	MV2_USE_IWARP_MODE                  : 0
	MV2_USE_HWLOC_CPU_BINDING           : 1
	MV2_ENABLE_AFFINITY                 : 1
	MV2_ENABLE_LEASTLOAD                : 0
	MV2_SMP_BATCH_SIZE                  : 8
	MV2_SMP_EAGERSIZE                   : 65537
	MV2_SMP_QUEUE_LENGTH                : 262144
	MV2_SMP_NUM_SEND_BUFFER             : 256
	MV2_SMP_SEND_BUF_SIZE               : 8192
	MV2_USE_SHARED_MEM                  : 1
	MV2_SMP_CMA_MAX_SIZE                : 0
	MV2_SMP_LIMIC2_MAX_SIZE             : 0
	MV2_SHOW_ENV_INFO                   : 3
	MV2_DEFAULT_PUT_GET_LIST_SIZE       : 200
	MV2_EAGERSIZE_1SC                   : 4096
	MV2_GET_FALLBACK_THRESHOLD          : 262144
	MV2_PIN_POOL_SIZE                   : 2097152
	MV2_PUT_FALLBACK_THRESHOLD          : 8192
	MV2_USE_RDMA_CM                     : 0
	MV2_UD_MAX_ACK_PENDING              : 100
	MV2_UD_MAX_RECV_WQE                 : 4096
	MV2_UD_MAX_RETRY_TIMEOUT            : 20000000
	MV2_UD_MAX_SEND_WQE                 : 2048
	MV2_UD_MTU                          : 4096
	MV2_UD_NUM_MSG_LIMIT                : 512
	MV2_UD_NUM_ZCOPY_RNDV_QPS           : 64
	MV2_UD_PROGRESS_SPIN                : 1200
	MV2_UD_PROGRESS_TIMEOUT             : 48000
	MV2_UD_RECVWINDOW_SIZE              : 2501
	MV2_UD_RETRY_COUNT                  : 1024
	MV2_UD_RETRY_TIMEOUT                : 500000
	MV2_UD_SENDWINDOW_SIZE              : 400
	MV2_UD_VBUF_POOL_SIZE               : 8192
	MV2_UD_ZCOPY_RQ_SIZE                : 4096
	MV2_UD_ZCOPY_THRESHOLD              : 16384
	MV2_UD_ZCOPY_NUM_RETRY              : 50000
	MV2_USE_UD_ZCOPY                    : 1
	MV2_USE_UD_HYBRID                   : 0
	MV2_USE_ONLY_UD                     : 0
	MV2_HYBRID_ENABLE_THRESHOLD         : 1024
	MV2_HYBRID_MAX_RC_CONN              : 32
	MV2_ASYNC_THREAD_STACK_SIZE         : 1048576
	MV2_THREAD_YIELD_SPIN_THRESHOLD     : 5
	MV2_SUPPORT_DPM                     : 1
	MV2_USE_HUGEPAGES                   : 1
---------------------------------------------------------------------

Collective Tuning Tables
	Collective           Architecture                             Interconnect                            
	Allgather            MV2_ARCH_UNKWN                           MV2_HCA_UNKWN                           
	Allreduce            MV2_ARCH_UNKWN                           MV2_HCA_UNKWN                           
	Alltoall             MV2_ARCH_UNKWN                           MV2_HCA_UNKWN                           
	Alltoallv            MV2_ARCH_INTEL_GENERIC                   MV2_HCA_MLX_CX_CONNIB                   
	Broadcast            MV2_ARCH_UNKWN                           MV2_HCA_UNKWN                           
	Gather               MV2_ARCH_UNKWN                           MV2_HCA_UNKWN                           
	Reduce               MV2_ARCH_UNKWN                           MV2_HCA_UNKWN                           
	Scatter              MV2_ARCH_UNKWN                           MV2_HCA_UNKWN                           

---------------------------------------------------------------------
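
Since the point of posting both dumps is to compare them, here is a minimal sketch of one way to diff the "NAME : VALUE" parameter lines from two MV2_SHOW_ENV_INFO=3 outputs. It assumes each dump has been saved to its own text file and that the line layout matches what is shown above. The function names parse_params and diff_params are illustrative only and are not part of MVAPICH2.

```python
import re

# Matches dump lines such as "\tMV2_UD_MTU                          : 4096".
# Only names starting with MV2_ or MPIRUN_ are picked up; the summary lines
# ("PROCESSOR ARCH NAME", "Tuning Table:") and the collective tuning table
# are deliberately ignored by this sketch.
PARAM_LINE = re.compile(r"^\s*(MV2_\w+|MPIRUN_\w+)\s*:\s*(\S.*?)\s*$")


def parse_params(text):
    """Return {name: value} for every 'NAME : VALUE' line in a dump.

    A name that appears more than once (for example in both the summary
    block and the 'All Parameters' block) keeps its last value.
    """
    params = {}
    for line in text.splitlines():
        match = PARAM_LINE.match(line)
        if match:
            params[match.group(1)] = match.group(2)
    return params


def diff_params(old, new):
    """Return {name: (old_value, new_value)} for parameters that differ.

    A parameter present in only one dump is reported with None on the
    side where it is missing.
    """
    names = sorted(set(old) | set(new))
    return {
        name: (old.get(name), new.get(name))
        for name in names
        if old.get(name) != new.get(name)
    }
```

To use it, read each saved dump into a string, pass both through parse_params, and print the result of diff_params; an empty result means the MV2_* settings listed in the two runs are identical.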

