[mvapich-discuss] mvapich hangs at startup on some new machines

Dovis Alessandro adovis at student.ethz.ch
Mon Jun 15 07:34:57 EDT 2015


Hello Jonathan,

The CPUs are (from /proc/cpuinfo):
        Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz .
The Mellanox interface (from lspci -vv):
        Network controller: Mellanox Technologies MT27500 Family [ConnectX-3]
        Subsystem: Mellanox Technologies Device 0051
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 64 bytes
        Interrupt: pin A routed to IRQ 80
        Region 0: Memory at d0f00000 (64-bit, non-prefetchable) [size=1M]
        Region 2: Memory at cc000000 (64-bit, prefetchable) [size=8M]
        Expansion ROM at d0000000 [disabled] [size=1M]
        Capabilities: <access denied>
        Kernel driver in use: mlx4_core
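
(For reference, the information above was gathered with commands roughly like these; the exact grep filters are approximations of what I actually ran:)

    # CPU model, from any of the new nodes
    grep -m1 'model name' /proc/cpuinfo
    # HCA details; run as root to also see the capabilities section
    lspci -vv | grep -i -A 20 mellanox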

I have run several experiments with MV2_SHOW_ENV_INFO=1 (the command line is sketched after the outputs below):

- if I run osu_bw across two nodes, both in the new cluster, it hangs without printing any parameters (it probably hangs before reaching that point);

- if I run it with both nodes in the old cluster, I get:

 MVAPICH2-2.1 Parameters
---------------------------------------------------------------------
	PROCESSOR ARCH NAME            : MV2_ARCH_INTEL_GENERIC
	PROCESSOR FAMILY NAME          : MV2_CPU_FAMILY_INTEL
	PROCESSOR MODEL NUMBER         : 62
	HCA NAME                       : MV2_HCA_MLX_CX_FDR
	HETEROGENEOUS HCA              : NO
	MV2_VBUF_TOTAL_SIZE            : 16384
	MV2_IBA_EAGER_THRESHOLD        : 16384
	MV2_RDMA_FAST_PATH_BUF_SIZE    : 5120
	MV2_PUT_FALLBACK_THRESHOLD     : 8192
	MV2_GET_FALLBACK_THRESHOLD     : 0
	MV2_EAGERSIZE_1SC              : 4096
	MV2_SMP_EAGERSIZE              : 65537
	MV2_SMPI_LENGTH_QUEUE          : 262144
	MV2_SMP_NUM_SEND_BUFFER        : 256
	MV2_SMP_BATCH_SIZE             : 8
---------------------------------------------------------------------
---------------------------------------------------------------------

- if I run it with one node in the old cluster and one in the new cluster, I get:

 MVAPICH2-2.1 Parameters
---------------------------------------------------------------------
	PROCESSOR ARCH NAME            : MV2_ARCH_INTEL_GENERIC
	PROCESSOR FAMILY NAME          : MV2_CPU_FAMILY_INTEL
	PROCESSOR MODEL NUMBER         : 62
	HCA NAME                       : MV2_HCA_MLX_CX_FDR
	HETEROGENEOUS HCA              : NO
	MV2_VBUF_TOTAL_SIZE            : 16384
	MV2_IBA_EAGER_THRESHOLD        : 16384
	MV2_RDMA_FAST_PATH_BUF_SIZE    : 5120
	MV2_PUT_FALLBACK_THRESHOLD     : 8192
	MV2_GET_FALLBACK_THRESHOLD     : 0
	MV2_EAGERSIZE_1SC              : 4096
	MV2_SMP_EAGERSIZE              : 65537
	MV2_SMPI_LENGTH_QUEUE          : 262144
	MV2_SMP_NUM_SEND_BUFFER        : 256
	MV2_SMP_BATCH_SIZE             : 8
---------------------------------------------------------------------
---------------------------------------------------------------------

- if I run it with '-n 1' on a single new machine, it shows the following (before failing, as expected, since the benchmark requires exactly two processes):

 MVAPICH2-2.1 Parameters
---------------------------------------------------------------------
	PROCESSOR ARCH NAME            : MV2_ARCH_INTEL_GENERIC
	PROCESSOR FAMILY NAME          : MV2_CPU_FAMILY_INTEL
	PROCESSOR MODEL NUMBER         : 63
	HCA NAME                       : MV2_HCA_MLX_CX_FDR
	HETEROGENEOUS HCA              : NO
	MV2_EAGERSIZE_1SC              : 0
	MV2_SMP_EAGERSIZE              : 65537
	MV2_SMPI_LENGTH_QUEUE          : 262144
	MV2_SMP_NUM_SEND_BUFFER        : 256
	MV2_SMP_BATCH_SIZE             : 8
---------------------------------------------------------------------
---------------------------------------------------------------------
This test requires exactly two processes
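
(For completeness, the two-node experiments above were launched roughly like this; the single-node case is the same command with a single host and '-n 1':)

    MV2_SHOW_ENV_INFO=1 /opt/mvapich2-2.1/bin/mpiexec --host machine1,machine2 -n 2 \
        ~/osu-micro-benchmarks-4.4.1/mpi/pt2pt/osu_bw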

For the backtraces, I will recompile the library with debugging enabled and let you know; a rough plan is sketched below.
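
(A sketch of what I plan to do, using the standard MPICH-style debug configure options and a plain gdb attach; the separate install prefix is just an example and the details may still change:)

    # reconfigure and rebuild MVAPICH2 with debug symbols
    ./configure --prefix=/opt/mvapich2-2.1-dbg --enable-g=dbg --disable-fast
    make -j && make install

    # while osu_bw is hung, attach to it on each node and dump all thread stacks
    gdb -p "$(pgrep -f osu_bw)" -batch -ex 'thread apply all bt'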

Thanks,
Alessandro

 
________________________________________
From: Jonathan Perkins [perkinjo at cse.ohio-state.edu]
Sent: Friday, June 12, 2015 9:12 PM
To: Dovis  Alessandro; mvapich-discuss at cse.ohio-state.edu
Subject: Re: [mvapich-discuss] mvapich hangs at startup on some new machines

Hello.  Can you share with us the CPU architecture and the type of HCA you're using on your new systems?

In addition to this, can you do a one-process run while setting the MV2_SHOW_ENV_INFO variable to 1?  It may also be useful to send us the backtrace of the process(es) when it hangs.

On Fri, Jun 12, 2015 at 7:48 AM Dovis Alessandro <adovis at student.ethz.ch> wrote:
Hello everyone,

A new cluster of machines has been installed and connected to the same Mellanox InfiniBand switch as the machines I was already using with MVAPICH (everything works fine there).
I have installed MVAPICH on the new machines, reconfiguring and recompiling because they have a different architecture and kernel. Both clusters use MVAPICH2 2.1.
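
(For reference, the rebuild on the new machines was essentially the standard sequence below; the configure options shown are just the defaults and not necessarily exactly what I used:)

    # on a new node, from the mvapich2-2.1 source tree
    ./configure --prefix=/opt/mvapich2-2.1
    make -j && make install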

If I run `/opt/mvapich2-2.1/bin/mpiexec --host machine1,machine2 -n 2 ~/osu-micro-benchmarks-4.4.1/mpi/pt2pt/osu_bw`, I see the following behaviour:
- runs well, if both machines in old cluster;
- runs well, if one machine in old cluster and one in new cluster;
- hangs at startup, if both machines in the new cluster.

I have seen the same behaviour with other executables, e.g. a simple 'hello world'.
Below are the outputs for both a hanging execution and a successful one.

Thank you for your help.

Best,
Alessandro Dovis


--------------------------------------------------------------------------------------------------------------------

Output of the execution that hangs (both nodes in the new cluster), run with the '-v' flag:

host: r630-04
host: r630-01

==================================================================================================
mpiexec options:
----------------
  Base path: /opt/mvapich2-2.1/bin/
  Launcher: (null)
  Debug level: 1
  Enable X: -1

  Global environment:
  -------------------
    LC_PAPER=en_DK.UTF-8
    TERM=xterm
    SHELL=/bin/bash
    SSH_CLIENT=10.2.131.222 59209 22
    SSH_TTY=/dev/pts/0
    USER=adovis
    LD_LIBRARY_PATH=:/home/adovis/fbx/opt/lib/:/opt/lib/boost_1_58_0/:/home/adovis/fbx/opt/lib/:/opt/lib/boost_1_58_0/
    MV2_ENABLE_AFFINITY=0
    MAIL=/var/mail/adovis
    PATH=/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games:/home/adovis/bin:/usr/sbin:/usr/local/sbin
    LC_COLLATE=C
    PWD=/home/adovis
    JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64/
    LANG=en_US.UTF-8
    LC_MEASUREMENT=en_DK.UTF-8
    SHLVL=1
    HOME=/home/adovis
    LOGNAME=adovis
    SSH_CONNECTION=10.2.131.222 59209 10.1.212.71 22
    LC_TIME=en_DK.UTF-8
    _=/opt/mvapich2-2.1/bin/mpiexec

  Hydra internal environment:
  ---------------------------
    GFORTRAN_UNBUFFERED_PRECONNECTED=y


    Proxy information:
    *********************
      [1] proxy: r630-04 (1 cores)
      Exec list: /home/adovis/osu-micro-benchmarks-4.4.1/mpi/pt2pt/osu_bw (1 processes);

      [2] proxy: r630-01 (1 cores)
      Exec list: /home/adovis/osu-micro-benchmarks-4.4.1/mpi/pt2pt/osu_bw (1 processes);


==================================================================================================

[mpiexec at r630-01] Timeout set to -1 (-1 means infinite)
[mpiexec at r630-01] Got a control port string of r630-01:48645

Proxy launch args: /opt/mvapich2-2.1/bin/hydra_pmi_proxy --control-port r630-01:48645 --debug --rmk user --launcher ssh --demux poll --pgid 0 --retries 10 --usize -2 --proxy-id

Arguments being passed to proxy 0:
--version 3.1.4 --iface-ip-env-name MPIR_CVAR_CH3_INTERFACE_HOSTNAME --hostname r630-04 --global-core-map 0,1,2 --pmi-id-map 0,0 --global-process-count 2 --auto-cleanup 1 --pmi-kvsname kvs_4979_0 --pmi-process-mapping (vector,(0,2,1)) --ckpoint-num -1 --global-inherited-env 21 'LC_PAPER=en_DK.UTF-8' 'TERM=xterm' 'SHELL=/bin/bash' 'SSH_CLIENT=10.2.131.222 59209 22' 'SSH_TTY=/dev/pts/0' 'USER=adovis' 'LD_LIBRARY_PATH=:/home/adovis/fbx/opt/lib/:/opt/lib/boost_1_58_0/:/home/adovis/fbx/opt/lib/:/opt/lib/boost_1_58_0/' 'MV2_ENABLE_AFFINITY=0' 'MAIL=/var/mail/adovis' 'PATH=/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games:/home/adovis/bin:/usr/sbin:/usr/local/sbin' 'LC_COLLATE=C' 'PWD=/home/adovis' 'JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64/' 'LANG=en_US.UTF-8' 'LC_MEASUREMENT=en_DK.UTF-8' 'SHLVL=1' 'HOME=/home/adovis' 'LOGNAME=adovis' 'SSH_CONNECTION=10.2.131.222 59209 10.1.212.71 22' 'LC_TIME=en_DK.UTF-8' '_=/opt/mvapich2-2.1/bin/mpiexec' --global-user-env 0 --global-system-env 1 'GFORTRAN_UNBUFFERED_PRECONNECTED=y' --proxy-core-count 1 --exec --exec-appnum 0 --exec-proc-count 1 --exec-local-env 0 --exec-wdir /home/adovis --exec-args 1 /home/adovis/osu-micro-benchmarks-4.4.1/mpi/pt2pt/osu_bw

Arguments being passed to proxy 1:
--version 3.1.4 --iface-ip-env-name MPIR_CVAR_CH3_INTERFACE_HOSTNAME --hostname r630-01 --global-core-map 0,1,2 --pmi-id-map 0,1 --global-process-count 2 --auto-cleanup 1 --pmi-kvsname kvs_4979_0 --pmi-process-mapping (vector,(0,2,1)) --ckpoint-num -1 --global-inherited-env 21 'LC_PAPER=en_DK.UTF-8' 'TERM=xterm' 'SHELL=/bin/bash' 'SSH_CLIENT=10.2.131.222 59209 22' 'SSH_TTY=/dev/pts/0' 'USER=adovis' 'LD_LIBRARY_PATH=:/home/adovis/fbx/opt/lib/:/opt/lib/boost_1_58_0/:/home/adovis/fbx/opt/lib/:/opt/lib/boost_1_58_0/' 'MV2_ENABLE_AFFINITY=0' 'MAIL=/var/mail/adovis' 'PATH=/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games:/home/adovis/bin:/usr/sbin:/usr/local/sbin' 'LC_COLLATE=C' 'PWD=/home/adovis' 'JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64/' 'LANG=en_US.UTF-8' 'LC_MEASUREMENT=en_DK.UTF-8' 'SHLVL=1' 'HOME=/home/adovis' 'LOGNAME=adovis' 'SSH_CONNECTION=10.2.131.222 59209 10.1.212.71 22' 'LC_TIME=en_DK.UTF-8' '_=/opt/mvapich2-2.1/bin/mpiexec' --global-user-env 0 --global-system-env 1 'GFORTRAN_UNBUFFERED_PRECONNECTED=y' --proxy-core-count 1 --exec --exec-appnum 0 --exec-proc-count 1 --exec-local-env 0 --exec-wdir /home/adovis --exec-args 1 /home/adovis/osu-micro-benchmarks-4.4.1/mpi/pt2pt/osu_bw

[mpiexec at r630-01] Launch arguments: /usr/bin/ssh -x r630-04 "/opt/mvapich2-2.1/bin/hydra_pmi_proxy" --control-port r630-01:48645 --debug --rmk user --launcher ssh --demux poll --pgid 0 --retries 10 --usize -2 --proxy-id 0
[mpiexec at r630-01] Launch arguments: /opt/mvapich2-2.1/bin/hydra_pmi_proxy --control-port r630-01:48645 --debug --rmk user --launcher ssh --demux poll --pgid 0 --retries 10 --usize -2 --proxy-id 1
[proxy:0:1 at r630-01] got pmi command (from 0): init
pmi_version=1 pmi_subversion=1
[proxy:0:1 at r630-01] PMI response: cmd=response_to_init pmi_version=1 pmi_subversion=1 rc=0
[proxy:0:1 at r630-01] got pmi command (from 0): get_maxes

[proxy:0:1 at r630-01] PMI response: cmd=maxes kvsname_max=256 keylen_max=64 vallen_max=1024
[proxy:0:1 at r630-01] got pmi command (from 0): get_appnum

[proxy:0:1 at r630-01] PMI response: cmd=appnum appnum=0
[proxy:0:1 at r630-01] got pmi command (from 0): get_my_kvsname

[proxy:0:1 at r630-01] PMI response: cmd=my_kvsname kvsname=kvs_4979_0
[proxy:0:1 at r630-01] got pmi command (from 0): get_my_kvsname

[proxy:0:1 at r630-01] PMI response: cmd=my_kvsname kvsname=kvs_4979_0
[proxy:0:1 at r630-01] got pmi command (from 0): get
kvsname=kvs_4979_0 key=PMI_process_mapping
[proxy:0:1 at r630-01] PMI response: cmd=get_result rc=0 msg=success value=(vector,(0,2,1))
[proxy:0:1 at r630-01] got pmi command (from 0): put
kvsname=kvs_4979_0 key=hostname[1] value=08323329
[proxy:0:1 at r630-01] cached command: hostname[1]=08323329
[proxy:0:1 at r630-01] PMI response: cmd=put_result rc=0 msg=success
[proxy:0:1 at r630-01] got pmi command (from 0): barrier_in

[proxy:0:1 at r630-01] flushing 1 put command(s) out
[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=put hostname[1]=08323329
[proxy:0:1 at r630-01] forwarding command (cmd=put hostname[1]=08323329) upstream
[proxy:0:1 at r630-01] forwarding command (cmd=barrier_in) upstream
[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
[proxy:0:0 at r630-04] got pmi command (from 4): init
pmi_version=1 pmi_subversion=1
[proxy:0:0 at r630-04] PMI response: cmd=response_to_init pmi_version=1 pmi_subversion=1 rc=0
[proxy:0:0 at r630-04] got pmi command (from 4): get_maxes

[proxy:0:0 at r630-04] PMI response: cmd=maxes kvsname_max=256 keylen_max=64 vallen_max=1024
[proxy:0:0 at r630-04] got pmi command (from 4): get_appnum

[proxy:0:0 at r630-04] PMI response: cmd=appnum appnum=0
[proxy:0:0 at r630-04] got pmi command (from 4): get_my_kvsname

[proxy:0:0 at r630-04] PMI response: cmd=my_kvsname kvsname=kvs_4979_0
[proxy:0:0 at r630-04] got pmi command (from 4): get_my_kvsname

[proxy:0:0 at r630-04] PMI response: cmd=my_kvsname kvsname=kvs_4979_0
[proxy:0:0 at r630-04] got pmi command (from 4): get
kvsname=kvs_4979_0 key=PMI_process_mapping
[proxy:0:0 at r630-04] PMI response: cmd=get_result rc=0 msg=success value=(vector,(0,2,1))
[proxy:0:0 at r630-04] got pmi command (from 4): put
kvsname=kvs_4979_0 key=hostname[0] value=08323329
[proxy:0:0 at r630-04] cached command: hostname[0]=08323329
[proxy:0:0 at r630-04] PMI response: cmd=put_result rc=0 msg=success
[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=put hostname[0]=08323329
[proxy:0:0 at r630-04] got pmi command (from 4): barrier_in

[proxy:0:0 at r630-04] flushing 1 put command(s) out
[proxy:0:0 at r630-04] forwarding command (cmd=put hostname[0]=08323329) upstream
[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
[mpiexec at r630-01] PMI response to fd 9 pid 4: cmd=keyval_cache hostname[1]=08323329 hostname[0]=08323329
[mpiexec at r630-01] PMI response to fd 6 pid 4: cmd=keyval_cache hostname[1]=08323329 hostname[0]=08323329
[mpiexec at r630-01] PMI response to fd 9 pid 4: cmd=barrier_out
[mpiexec at r630-01] PMI response to fd 6 pid 4: cmd=barrier_out
[proxy:0:0 at r630-04] forwarding command (cmd=barrier_in) upstream
[proxy:0:1 at r630-01] PMI response: cmd=barrier_out
[proxy:0:0 at r630-04] PMI response: cmd=barrier_out
[proxy:0:1 at r630-01] got pmi command (from 0): get
kvsname=kvs_4979_0 key=hostname[0]
[proxy:0:1 at r630-01] PMI response: cmd=get_result rc=0 msg=success value=08323329
[proxy:0:0 at r630-04] got pmi command (from 4): get
kvsname=kvs_4979_0 key=hostname[1]
[proxy:0:0 at r630-04] PMI response: cmd=get_result rc=0 msg=success value=08323329
[proxy:0:0 at r630-04] got pmi command (from 4): put
kvsname=kvs_4979_0 key=MVAPICH2_0000 value=00000008:00000070:00000071:
[proxy:0:1 at r630-01] got pmi command (from 0): put
kvsname=kvs_4979_0 key=MVAPICH2_0001 value=00000011:00000097:00000098:
[proxy:0:1 at r630-01] cached command: MVAPICH2_0001=00000011:00000097:00000098:
[proxy:0:1 at r630-01] PMI response: cmd=put_result rc=0 msg=success
[proxy:0:0 at r630-04] cached command: MVAPICH2_0000=00000008:00000070:00000071:
[proxy:0:0 at r630-04] PMI response: cmd=put_result rc=0 msg=success
[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=put MVAPICH2_0000=00000008:00000070:00000071:
[proxy:0:0 at r630-04] got pmi command (from 4): barrier_in

[proxy:0:0 at r630-04] flushing 1 put command(s) out
[proxy:0:0 at r630-04] forwarding command (cmd=put MVAPICH2_0000=00000008:00000070:00000071:) upstream
[proxy:0:0 at r630-04] forwarding command (cmd=barrier_in) upstream
[proxy:0:1 at r630-01] got pmi command (from 0): barrier_in

[proxy:0:1 at r630-01] flushing 1 put command(s) out
[proxy:0:1 at r630-01] forwarding command (cmd=put MVAPICH2_0001=00000011:00000097:00000098:) upstream
[proxy:0:1 at r630-01] forwarding command (cmd=barrier_in) upstream
[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=put MVAPICH2_0001=00000011:00000097:00000098:
[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
[mpiexec at r630-01] PMI response to fd 9 pid 0: cmd=keyval_cache MVAPICH2_0000=00000008:00000070:00000071: MVAPICH2_0001=00000011:00000097:00000098:
[mpiexec at r630-01] PMI response to fd 6 pid 0: cmd=keyval_cache MVAPICH2_0000=00000008:00000070:00000071: MVAPICH2_0001=00000011:00000097:00000098:
[mpiexec at r630-01] PMI response to fd 9 pid 0: cmd=barrier_out
[mpiexec at r630-01] PMI response to fd 6 pid 0: cmd=barrier_out
[proxy:0:1 at r630-01] PMI response: cmd=barrier_out
[proxy:0:1 at r630-01] got pmi command (from 0): get
kvsname=kvs_4979_0 key=MVAPICH2_0000
[proxy:0:1 at r630-01] PMI response: cmd=get_result rc=0 msg=success value=00000008:00000070:00000071:
[proxy:0:0 at r630-04] PMI response: cmd=barrier_out
[proxy:0:1 at r630-01] got pmi command (from 0): get
kvsname=kvs_4979_0 key=MVAPICH2_0000
[proxy:0:1 at r630-01] PMI response: cmd=get_result rc=0 msg=success value=00000008:00000070:00000071:
[proxy:0:0 at r630-04] got pmi command (from 4): get
kvsname=kvs_4979_0 key=MVAPICH2_0001
[proxy:0:0 at r630-04] PMI response: cmd=get_result rc=0 msg=success value=00000011:00000097:00000098:
[proxy:0:1 at r630-01] got pmi command (from 0): barrier_in

[proxy:0:1 at r630-01] forwarding command (cmd=barrier_in) upstream
[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
[proxy:0:0 at r630-04] got pmi command (from 4): get
kvsname=kvs_4979_0 key=MVAPICH2_0001
[proxy:0:0 at r630-04] PMI response: cmd=get_result rc=0 msg=success value=00000011:00000097:00000098:
[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
[mpiexec at r630-01] PMI response to fd 9 pid 4: cmd=barrier_out
[mpiexec at r630-01] PMI response to fd 6 pid 4: cmd=barrier_out
[proxy:0:0 at r630-04] got pmi command (from 4): barrier_in

[proxy:0:0 at r630-04] forwarding command (cmd=barrier_in) upstream
[proxy:0:1 at r630-01] PMI response: cmd=barrier_out
[proxy:0:0 at r630-04] PMI response: cmd=barrier_out
[proxy:0:1 at r630-01] got pmi command (from 0): barrier_in

[proxy:0:1 at r630-01] forwarding command (cmd=barrier_in) upstream
[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
[mpiexec at r630-01] PMI response to fd 9 pid 4: cmd=barrier_out
[mpiexec at r630-01] PMI response to fd 6 pid 4: cmd=barrier_out
[proxy:0:0 at r630-04] got pmi command (from 4): barrier_in

[proxy:0:0 at r630-04] forwarding command (cmd=barrier_in) upstream
[proxy:0:1 at r630-01] PMI response: cmd=barrier_out
[proxy:0:0 at r630-04] PMI response: cmd=barrier_out
[proxy:0:1 at r630-01] got pmi command (from 0): barrier_in

[proxy:0:1 at r630-01] forwarding command (cmd=barrier_in) upstream
[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
[mpiexec at r630-01] PMI response to fd 9 pid 4: cmd=barrier_out
[mpiexec at r630-01] PMI response to fd 6 pid 4: cmd=barrier_out
[proxy:0:0 at r630-04] got pmi command (from 4): barrier_in

[proxy:0:0 at r630-04] forwarding command (cmd=barrier_in) upstream
[proxy:0:1 at r630-01] PMI response: cmd=barrier_out
[proxy:0:0 at r630-04] PMI response: cmd=barrier_out
[proxy:0:1 at r630-01] got pmi command (from 0): barrier_in

[proxy:0:1 at r630-01] forwarding command (cmd=barrier_in) upstream
[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
[proxy:0:0 at r630-04] got pmi command (from 4): barrier_in

[proxy:0:0 at r630-04] forwarding command (cmd=barrier_in) upstream
[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
[mpiexec at r630-01] PMI response to fd 9 pid 4: cmd=barrier_out
[mpiexec at r630-01] PMI response to fd 6 pid 4: cmd=barrier_out
[proxy:0:1 at r630-01] PMI response: cmd=barrier_out
[proxy:0:0 at r630-04] PMI response: cmd=barrier_out
[proxy:0:1 at r630-01] got pmi command (from 0): barrier_in

[proxy:0:1 at r630-01] forwarding command (cmd=barrier_in) upstream
[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
[mpiexec at r630-01] PMI response to fd 9 pid 4: cmd=barrier_out
[mpiexec at r630-01] PMI response to fd 6 pid 4: cmd=barrier_out
[proxy:0:0 at r630-04] got pmi command (from 4): barrier_in

[proxy:0:0 at r630-04] forwarding command (cmd=barrier_in) upstream
[proxy:0:1 at r630-01] PMI response: cmd=barrier_out
[proxy:0:1 at r630-01] got pmi command (from 0): barrier_in

[proxy:0:1 at r630-01] forwarding command (cmd=barrier_in) upstream
[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
[proxy:0:0 at r630-04] PMI response: cmd=barrier_out
[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
[mpiexec at r630-01] PMI response to fd 9 pid 4: cmd=barrier_out
[mpiexec at r630-01] PMI response to fd 6 pid 4: cmd=barrier_out
[proxy:0:0 at r630-04] got pmi command (from 4): barrier_in

[proxy:0:0 at r630-04] forwarding command (cmd=barrier_in) upstream
[proxy:0:1 at r630-01] PMI response: cmd=barrier_out
[proxy:0:1 at r630-01] got pmi command (from 0): barrier_in

[proxy:0:1 at r630-01] forwarding command (cmd=barrier_in) upstream
[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
[proxy:0:0 at r630-04] PMI response: cmd=barrier_out
[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
[mpiexec at r630-01] PMI response to fd 9 pid 4: cmd=barrier_out
[mpiexec at r630-01] PMI response to fd 6 pid 4: cmd=barrier_out
[proxy:0:0 at r630-04] got pmi command (from 4): barrier_in

[proxy:0:0 at r630-04] forwarding command (cmd=barrier_in) upstream
[proxy:0:1 at r630-01] PMI response: cmd=barrier_out
[proxy:0:0 at r630-04] PMI response: cmd=barrier_out

{at this point it hangs indefinitely}

--------------------------------------------------------------------------------------------------------------------

A successful run, by contrast, looks like the following (again with -v):

host: fdr1
host: r630-01

==================================================================================================
mpiexec options:
----------------
  Base path: /opt/mvapich2-2.1/bin/
  Launcher: (null)
  Debug level: 1
  Enable X: -1

  Global environment:
  -------------------
    LC_PAPER=en_DK.UTF-8
    TERM=xterm
    SHELL=/bin/bash
    SSH_CLIENT=10.2.131.222 59209 22
    SSH_TTY=/dev/pts/0
    USER=adovis
    LD_LIBRARY_PATH=:/home/adovis/fbx/opt/lib/:/opt/lib/boost_1_58_0/:/home/adovis/fbx/opt/lib/:/opt/lib/boost_1_58_0/
    MV2_ENABLE_AFFINITY=0
    MAIL=/var/mail/adovis
    PATH=/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games:/home/adovis/bin:/usr/sbin:/usr/local/sbin
    LC_COLLATE=C
    PWD=/home/adovis
    JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64/
    LANG=en_US.UTF-8
    LC_MEASUREMENT=en_DK.UTF-8
    SHLVL=1
    HOME=/home/adovis
    LOGNAME=adovis
    SSH_CONNECTION=10.2.131.222 59209 10.1.212.71 22
    LC_TIME=en_DK.UTF-8
    _=/opt/mvapich2-2.1/bin/mpiexec

  Hydra internal environment:
  ---------------------------
    GFORTRAN_UNBUFFERED_PRECONNECTED=y


    Proxy information:
    *********************
      [1] proxy: fdr1 (1 cores)
      Exec list: /home/adovis/osu-micro-benchmarks-4.4.1/mpi/pt2pt/osu_bw (1 processes);

      [2] proxy: r630-01 (1 cores)
      Exec list: /home/adovis/osu-micro-benchmarks-4.4.1/mpi/pt2pt/osu_bw (1 processes);


==================================================================================================

[mpiexec at r630-01] Timeout set to -1 (-1 means infinite)
[mpiexec at r630-01] Got a control port string of r630-01:39227

Proxy launch args: /opt/mvapich2-2.1/bin/hydra_pmi_proxy --control-port r630-01:39227 --debug --rmk user --launcher ssh --demux poll --pgid 0 --retries 10 --usize -2 --proxy-id

Arguments being passed to proxy 0:
--version 3.1.4 --iface-ip-env-name MPIR_CVAR_CH3_INTERFACE_HOSTNAME --hostname fdr1 --global-core-map 0,1,2 --pmi-id-map 0,0 --global-process-count 2 --auto-cleanup 1 --pmi-kvsname kvs_4992_0 --pmi-process-mapping (vector,(0,2,1)) --ckpoint-num -1 --global-inherited-env 21 'LC_PAPER=en_DK.UTF-8' 'TERM=xterm' 'SHELL=/bin/bash' 'SSH_CLIENT=10.2.131.222 59209 22' 'SSH_TTY=/dev/pts/0' 'USER=adovis' 'LD_LIBRARY_PATH=:/home/adovis/fbx/opt/lib/:/opt/lib/boost_1_58_0/:/home/adovis/fbx/opt/lib/:/opt/lib/boost_1_58_0/' 'MV2_ENABLE_AFFINITY=0' 'MAIL=/var/mail/adovis' 'PATH=/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games:/home/adovis/bin:/usr/sbin:/usr/local/sbin' 'LC_COLLATE=C' 'PWD=/home/adovis' 'JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64/' 'LANG=en_US.UTF-8' 'LC_MEASUREMENT=en_DK.UTF-8' 'SHLVL=1' 'HOME=/home/adovis' 'LOGNAME=adovis' 'SSH_CONNECTION=10.2.131.222 59209 10.1.212.71 22' 'LC_TIME=en_DK.UTF-8' '_=/opt/mvapich2-2.1/bin/mpiexec' --global-user-env 0 --global-system-env 1 'GFORTRAN_UNBUFFERED_PRECONNECTED=y' --proxy-core-count 1 --exec --exec-appnum 0 --exec-proc-count 1 --exec-local-env 0 --exec-wdir /home/adovis --exec-args 1 /home/adovis/osu-micro-benchmarks-4.4.1/mpi/pt2pt/osu_bw

Arguments being passed to proxy 1:
--version 3.1.4 --iface-ip-env-name MPIR_CVAR_CH3_INTERFACE_HOSTNAME --hostname r630-01 --global-core-map 0,1,2 --pmi-id-map 0,1 --global-process-count 2 --auto-cleanup 1 --pmi-kvsname kvs_4992_0 --pmi-process-mapping (vector,(0,2,1)) --ckpoint-num -1 --global-inherited-env 21 'LC_PAPER=en_DK.UTF-8' 'TERM=xterm' 'SHELL=/bin/bash' 'SSH_CLIENT=10.2.131.222 59209 22' 'SSH_TTY=/dev/pts/0' 'USER=adovis' 'LD_LIBRARY_PATH=:/home/adovis/fbx/opt/lib/:/opt/lib/boost_1_58_0/:/home/adovis/fbx/opt/lib/:/opt/lib/boost_1_58_0/' 'MV2_ENABLE_AFFINITY=0' 'MAIL=/var/mail/adovis' 'PATH=/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games:/home/adovis/bin:/usr/sbin:/usr/local/sbin' 'LC_COLLATE=C' 'PWD=/home/adovis' 'JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64/' 'LANG=en_US.UTF-8' 'LC_MEASUREMENT=en_DK.UTF-8' 'SHLVL=1' 'HOME=/home/adovis' 'LOGNAME=adovis' 'SSH_CONNECTION=10.2.131.222 59209 10.1.212.71 22' 'LC_TIME=en_DK.UTF-8' '_=/opt/mvapich2-2.1/bin/mpiexec' --global-user-env 0 --global-system-env 1 'GFORTRAN_UNBUFFERED_PRECONNECTED=y' --proxy-core-count 1 --exec --exec-appnum 0 --exec-proc-count 1 --exec-local-env 0 --exec-wdir /home/adovis --exec-args 1 /home/adovis/osu-micro-benchmarks-4.4.1/mpi/pt2pt/osu_bw

[mpiexec at r630-01] Launch arguments: /usr/bin/ssh -x fdr1 "/opt/mvapich2-2.1/bin/hydra_pmi_proxy" --control-port r630-01:39227 --debug --rmk user --launcher ssh --demux poll --pgid 0 --retries 10 --usize -2 --proxy-id 0
[mpiexec at r630-01] Launch arguments: /opt/mvapich2-2.1/bin/hydra_pmi_proxy --control-port r630-01:39227 --debug --rmk user --launcher ssh --demux poll --pgid 0 --retries 10 --usize -2 --proxy-id 1
[proxy:0:1 at r630-01] got pmi command (from 0): init
pmi_version=1 pmi_subversion=1
[proxy:0:1 at r630-01] PMI response: cmd=response_to_init pmi_version=1 pmi_subversion=1 rc=0
[proxy:0:1 at r630-01] got pmi command (from 0): get_maxes

[proxy:0:1 at r630-01] PMI response: cmd=maxes kvsname_max=256 keylen_max=64 vallen_max=1024
[proxy:0:1 at r630-01] got pmi command (from 0): get_appnum

[proxy:0:1 at r630-01] PMI response: cmd=appnum appnum=0
[proxy:0:1 at r630-01] got pmi command (from 0): get_my_kvsname

[proxy:0:1 at r630-01] PMI response: cmd=my_kvsname kvsname=kvs_4992_0
[proxy:0:1 at r630-01] got pmi command (from 0): get_my_kvsname

[proxy:0:1 at r630-01] PMI response: cmd=my_kvsname kvsname=kvs_4992_0
[proxy:0:1 at r630-01] got pmi command (from 0): get
kvsname=kvs_4992_0 key=PMI_process_mapping
[proxy:0:1 at r630-01] PMI response: cmd=get_result rc=0 msg=success value=(vector,(0,2,1))
[proxy:0:1 at r630-01] got pmi command (from 0): put
kvsname=kvs_4992_0 key=hostname[1] value=08323329
[proxy:0:1 at r630-01] cached command: hostname[1]=08323329
[proxy:0:1 at r630-01] PMI response: cmd=put_result rc=0 msg=success
[proxy:0:1 at r630-01] got pmi command (from 0): barrier_in

[proxy:0:1 at r630-01] flushing 1 put command(s) out
[proxy:0:1 at r630-01] forwarding command (cmd=put hostname[1]=08323329) upstream
[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=put hostname[1]=08323329
[proxy:0:1 at r630-01] forwarding command (cmd=barrier_in) upstream
[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
[proxy:0:0 at fdr1] got pmi command (from 4): init
pmi_version=1 pmi_subversion=1
[proxy:0:0 at fdr1] PMI response: cmd=response_to_init pmi_version=1 pmi_subversion=1 rc=0
[proxy:0:0 at fdr1] got pmi command (from 4): get_maxes

[proxy:0:0 at fdr1] PMI response: cmd=maxes kvsname_max=256 keylen_max=64 vallen_max=1024
[proxy:0:0 at fdr1] got pmi command (from 4): get_appnum

[proxy:0:0 at fdr1] PMI response: cmd=appnum appnum=0
[proxy:0:0 at fdr1] got pmi command (from 4): get_my_kvsname

[proxy:0:0 at fdr1] PMI response: cmd=my_kvsname kvsname=kvs_4992_0
[proxy:0:0 at fdr1] got pmi command (from 4): get_my_kvsname

[proxy:0:0 at fdr1] PMI response: cmd=my_kvsname kvsname=kvs_4992_0
[proxy:0:0 at fdr1] got pmi command (from 4): get
kvsname=kvs_4992_0 key=PMI_process_mapping
[proxy:0:0 at fdr1] PMI response: cmd=get_result rc=0 msg=success value=(vector,(0,2,1))
[proxy:0:0 at fdr1] got pmi command (from 4): put
kvsname=kvs_4992_0 key=hostname[0] value=17448404
[proxy:0:0 at fdr1] cached command: hostname[0]=17448404
[proxy:0:0 at fdr1] PMI response: cmd=put_result rc=0 msg=success
[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=put hostname[0]=17448404
[proxy:0:0 at fdr1] got pmi command (from 4): barrier_in

[proxy:0:0 at fdr1] flushing 1 put command(s) out
[proxy:0:0 at fdr1] forwarding command (cmd=put hostname[0]=17448404) upstream
[proxy:0:0 at fdr1] forwarding command (cmd=barrier_in) upstream
[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
[mpiexec at r630-01] PMI response to fd 9 pid 4: cmd=keyval_cache hostname[1]=08323329 hostname[0]=17448404
[mpiexec at r630-01] PMI response to fd 6 pid 4: cmd=keyval_cache hostname[1]=08323329 hostname[0]=17448404
[mpiexec at r630-01] PMI response to fd 9 pid 4: cmd=barrier_out
[mpiexec at r630-01] PMI response to fd 6 pid 4: cmd=barrier_out
[proxy:0:1 at r630-01] PMI response: cmd=barrier_out
[proxy:0:1 at r630-01] got pmi command (from 0): get
kvsname=kvs_4992_0 key=hostname[0]
[proxy:0:1 at r630-01] PMI response: cmd=get_result rc=0 msg=success value=17448404
[proxy:0:0 at fdr1] PMI response: cmd=barrier_out
[proxy:0:0 at fdr1] got pmi command (from 4): get
kvsname=kvs_4992_0 key=hostname[1]
[proxy:0:0 at fdr1] PMI response: cmd=get_result rc=0 msg=success value=08323329
[proxy:0:1 at r630-01] got pmi command (from 0): put
kvsname=kvs_4992_0 key=MVAPICH2_0001 value=00000011:00000099:0000009a:
[proxy:0:1 at r630-01] cached command: MVAPICH2_0001=00000011:00000099:0000009a:
[proxy:0:1 at r630-01] PMI response: cmd=put_result rc=0 msg=success
[proxy:0:1 at r630-01] got pmi command (from 0): barrier_in

[proxy:0:1 at r630-01] flushing 1 put command(s) out
[proxy:0:1 at r630-01] forwarding command (cmd=put MVAPICH2_0001=00000011:00000099:0000009a:) upstream
[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=put MVAPICH2_0001=00000011:00000099:0000009a:
[proxy:0:1 at r630-01] forwarding command (cmd=barrier_in) upstream
[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
[proxy:0:0 at fdr1] got pmi command (from 4): put
kvsname=kvs_4992_0 key=MVAPICH2_0000 value=00000004:00000304:00000305:
[proxy:0:0 at fdr1] cached command: MVAPICH2_0000=00000004:00000304:00000305:
[proxy:0:0 at fdr1] PMI response: cmd=put_result rc=0 msg=success
[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=put MVAPICH2_0000=00000004:00000304:00000305:
[proxy:0:0 at fdr1] got pmi command (from 4): barrier_in

[proxy:0:0 at fdr1] flushing 1 put command(s) out
[proxy:0:0 at fdr1] forwarding command (cmd=put MVAPICH2_0000=00000004:00000304:00000305:) upstream
[proxy:0:0 at fdr1] forwarding command (cmd=barrier_in) upstream
[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
[mpiexec at r630-01] PMI response to fd 9 pid 4: cmd=keyval_cache MVAPICH2_0001=00000011:00000099:0000009a: MVAPICH2_0000=00000004:00000304:00000305:
[mpiexec at r630-01] PMI response to fd 6 pid 4: cmd=keyval_cache MVAPICH2_0001=00000011:00000099:0000009a: MVAPICH2_0000=00000004:00000304:00000305:
[mpiexec at r630-01] PMI response to fd 9 pid 4: cmd=barrier_out
[mpiexec at r630-01] PMI response to fd 6 pid 4: cmd=barrier_out
[proxy:0:1 at r630-01] PMI response: cmd=barrier_out
[proxy:0:1 at r630-01] got pmi command (from 0): get
kvsname=kvs_4992_0 key=MVAPICH2_0000
[proxy:0:1 at r630-01] PMI response: cmd=get_result rc=0 msg=success value=00000004:00000304:00000305:
[proxy:0:1 at r630-01] got pmi command (from 0): get
kvsname=kvs_4992_0 key=MVAPICH2_0000
[proxy:0:1 at r630-01] PMI response: cmd=get_result rc=0 msg=success value=00000004:00000304:00000305:
[proxy:0:1 at r630-01] got pmi command (from 0): barrier_in

[proxy:0:1 at r630-01] forwarding command (cmd=barrier_in) upstream
[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
[proxy:0:0 at fdr1] PMI response: cmd=barrier_out
[proxy:0:0 at fdr1] got pmi command (from 4): get
kvsname=kvs_4992_0 key=MVAPICH2_0001
[proxy:0:0 at fdr1] PMI response: cmd=get_result rc=0 msg=success value=00000011:00000099:0000009a:
[proxy:0:0 at fdr1] got pmi command (from 4): get
kvsname=kvs_4992_0 key=MVAPICH2_0001
[proxy:0:0 at fdr1] PMI response: cmd=get_result rc=0 msg=success value=00000011:00000099:0000009a:
[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
[mpiexec at r630-01] PMI response to fd 9 pid 4: cmd=barrier_out
[mpiexec at r630-01] PMI response to fd 6 pid 4: cmd=barrier_out
[proxy:0:0 at fdr1] got pmi command (from 4): barrier_in

[proxy:0:0 at fdr1] forwarding command (cmd=barrier_in) upstream
[proxy:0:1 at r630-01] PMI response: cmd=barrier_out
[proxy:0:0 at fdr1] PMI response: cmd=barrier_out
[proxy:0:1 at r630-01] got pmi command (from 0): barrier_in

[proxy:0:1 at r630-01] forwarding command (cmd=barrier_in) upstream
[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
[mpiexec at r630-01] PMI response to fd 9 pid 4: cmd=barrier_out
[mpiexec at r630-01] PMI response to fd 6 pid 4: cmd=barrier_out
[proxy:0:0 at fdr1] got pmi command (from 4): barrier_in

[proxy:0:0 at fdr1] forwarding command (cmd=barrier_in) upstream
[proxy:0:1 at r630-01] PMI response: cmd=barrier_out
[proxy:0:1 at r630-01] got pmi command (from 0): barrier_in

[proxy:0:1 at r630-01] forwarding command (cmd=barrier_in) upstream
[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
[proxy:0:0 at fdr1] PMI response: cmd=barrier_out
[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
[mpiexec at r630-01] PMI response to fd 9 pid 4: cmd=barrier_out
[mpiexec at r630-01] PMI response to fd 6 pid 4: cmd=barrier_out
[proxy:0:0 at fdr1] got pmi command (from 4): barrier_in

[proxy:0:0 at fdr1] forwarding command (cmd=barrier_in) upstream
[proxy:0:1 at r630-01] PMI response: cmd=barrier_out
[proxy:0:0 at fdr1] PMI response: cmd=barrier_out
[proxy:0:1 at r630-01] got pmi command (from 0): barrier_in

[proxy:0:1 at r630-01] forwarding command (cmd=barrier_in) upstream
[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
[mpiexec at r630-01] PMI response to fd 9 pid 4: cmd=barrier_out
[mpiexec at r630-01] PMI response to fd 6 pid 4: cmd=barrier_out
[proxy:0:0 at fdr1] got pmi command (from 4): barrier_in

[proxy:0:0 at fdr1] forwarding command (cmd=barrier_in) upstream
[proxy:0:1 at r630-01] PMI response: cmd=barrier_out
[proxy:0:0 at fdr1] PMI response: cmd=barrier_out
[proxy:0:1 at r630-01] got pmi command (from 0): barrier_in

[proxy:0:1 at r630-01] forwarding command (cmd=barrier_in) upstream
[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
[mpiexec at r630-01] PMI response to fd 9 pid 4: cmd=barrier_out
[mpiexec at r630-01] PMI response to fd 6 pid 4: cmd=barrier_out
[proxy:0:0 at fdr1] got pmi command (from 4): barrier_in

[proxy:0:0 at fdr1] forwarding command (cmd=barrier_in) upstream
[proxy:0:1 at r630-01] PMI response: cmd=barrier_out
[proxy:0:1 at r630-01] got pmi command (from 0): barrier_in

[proxy:0:1 at r630-01] forwarding command (cmd=barrier_in) upstream
[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
[proxy:0:0 at fdr1] PMI response: cmd=barrier_out
[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
[mpiexec at r630-01] PMI response to fd 9 pid 4: cmd=barrier_out
[mpiexec at r630-01] PMI response to fd 6 pid 4: cmd=barrier_out
[proxy:0:0 at fdr1] got pmi command (from 4): barrier_in

[proxy:0:0 at fdr1] forwarding command (cmd=barrier_in) upstream
[proxy:0:1 at r630-01] PMI response: cmd=barrier_out
[proxy:0:1 at r630-01] got pmi command (from 0): barrier_in

[proxy:0:1 at r630-01] forwarding command (cmd=barrier_in) upstream
[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
[proxy:0:0 at fdr1] PMI response: cmd=barrier_out
[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
[mpiexec at r630-01] PMI response to fd 9 pid 4: cmd=barrier_out
[mpiexec at r630-01] PMI response to fd 6 pid 4: cmd=barrier_out
[proxy:0:0 at fdr1] got pmi command (from 4): barrier_in

[proxy:0:0 at fdr1] forwarding command (cmd=barrier_in) upstream
[proxy:0:1 at r630-01] PMI response: cmd=barrier_out
[proxy:0:0 at fdr1] PMI response: cmd=barrier_out
# OSU MPI Bandwidth Test v4.4.1
# Size      Bandwidth (MB/s)
1                       2.93
2                       5.84
4                      11.61
8                      22.82
16                     44.77
32                     91.21
64                    179.39
128                   341.07
256                   680.59
512                  1313.70
1024                 2463.06
2048                 3993.00
4096                 5147.77
8192                 5669.63
16384                5701.75
32768                5969.73
65536                6117.85
131072               6243.84
262144               6306.77
524288               6340.28
1048576              6356.89
2097152              6362.19
4194304              6273.45
8388608              6334.72
16777216             5762.50
[proxy:0:1 at r630-01] got pmi command (from 0): barrier_in

[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
[proxy:0:1 at r630-01] forwarding command (cmd=barrier_in) upstream
[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
[mpiexec at r630-01] PMI response to fd 9 pid 4: cmd=barrier_out
[mpiexec at r630-01] PMI response to fd 6 pid 4: cmd=barrier_out
[proxy:0:0 at fdr1] got pmi command (from 4): barrier_in

[proxy:0:0 at fdr1] forwarding command (cmd=barrier_in) upstream
[proxy:0:1 at r630-01] PMI response: cmd=barrier_out
[proxy:0:0 at fdr1] PMI response: cmd=barrier_out
[proxy:0:1 at r630-01] got pmi command (from 0): finalize

[proxy:0:1 at r630-01] PMI response: cmd=finalize_ack
[proxy:0:0 at fdr1] got pmi command (from 4): finalize

[proxy:0:0 at fdr1] PMI response: cmd=finalize_ack

