[mvapich-discuss] mvapich hangs at startup on some new machines

Dovis Alessandro adovis at student.ethz.ch
Mon Jun 15 08:16:54 EDT 2015


The backtraces at the point where the processes hang are below; I took them by attaching gdb to each hanging process, along these lines:
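
        gdb -p <pid>
        (gdb) bt
        (gdb) detach

(where <pid> is the process ID of the hanging process; 'detach' leaves it running afterwards.)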

- for the 'mpiexec':

#0  0x00007f8bec4854f0 in poll () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x000000000045a82a in HYDT_dmxu_poll_wait_for_event (wtime=-1) at tools/demux/demux_poll.c:39
#2  0x000000000045a130 in HYDT_dmx_wait_for_event (wtime=-1) at tools/demux/demux.c:171
#3  0x000000000040d003 in HYD_pmci_wait_for_completion (timeout=-1) at pm/pmiserv/pmiserv_pmci.c:195
#4  0x0000000000404aa3 in main (argc=9, argv=0x7ffd55720438) at ui/mpich/mpiexec.c:343

- for the hydra proxy (hydra_pmi_proxy):

#0  0x00007f7f8b4d84f0 in poll () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x000000000043b089 in HYDT_dmxu_poll_wait_for_event (wtime=-1) at tools/demux/demux_poll.c:39
#2  0x000000000043a98f in HYDT_dmx_wait_for_event (wtime=-1) at tools/demux/demux.c:171
#3  0x0000000000403c94 in main (argc=17, argv=0x7ffde625b2c8) at pm/pmiserv/pmip.c:205

- for the actual program on the first machine:

#0  MPIDI_CH3I_SMP_init (pg=0x28c2770) at src/mpid/ch3/channels/mrail/src/rdma/ch3_smp_progress.c:2139
#1  0x00007fa3a45fa8e0 in MPIDI_CH3_Init (has_parent=0, pg=0x28c2770, pg_rank=0) at src/mpid/ch3/channels/mrail/src/rdma/ch3_init.c:445
#2  0x00007fa3a45dfd2f in MPID_Init (argc=0x7ffcc03385bc, argv=0x7ffcc03385b0, requested=0, provided=0x7ffcc0338544, has_args=0x7ffcc033854c, 
    has_env=0x7ffcc0338548) at src/mpid/ch3/src/mpid_init.c:357
#3  0x00007fa3a44be7d6 in MPIR_Init_thread (argc=0x7ffcc03385bc, argv=0x7ffcc03385b0, required=0, provided=0x7ffcc0338580)
    at src/mpi/init/initthread.c:512
#4  0x00007fa3a44bd204 in PMPI_Init (argc=0x7ffcc03385bc, argv=0x7ffcc03385b0) at src/mpi/init/init.c:195
#5  0x0000000000400c86 in main (argc=1, argv=0x7ffcc03386f8) at osu_bw.c:124

- for the actual program on the second machine:

#0  0x00007fb1bfc846fd in nanosleep () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007fb1bfcac9d4 in usleep () from /lib/x86_64-linux-gnu/libc.so.6
#2  0x00007fb1c0433bd4 in MPIDI_CH3I_SMP_init (pg=0x35e8770) at src/mpid/ch3/channels/mrail/src/rdma/ch3_smp_progress.c:1918
#3  0x00007fb1c041b8e0 in MPIDI_CH3_Init (has_parent=0, pg=0x35e8770, pg_rank=1) at src/mpid/ch3/channels/mrail/src/rdma/ch3_init.c:445
#4  0x00007fb1c0400d2f in MPID_Init (argc=0x7fff0a176bfc, argv=0x7fff0a176bf0, requested=0, provided=0x7fff0a176b84, has_args=0x7fff0a176b8c, 
    has_env=0x7fff0a176b88) at src/mpid/ch3/src/mpid_init.c:357
#5  0x00007fb1c02df7d6 in MPIR_Init_thread (argc=0x7fff0a176bfc, argv=0x7fff0a176bf0, required=0, provided=0x7fff0a176bc0)
    at src/mpi/init/initthread.c:512
#6  0x00007fb1c02de204 in PMPI_Init (argc=0x7fff0a176bfc, argv=0x7fff0a176bf0) at src/mpi/init/init.c:195
#7  0x0000000000400c86 in main (argc=1, argv=0x7fff0a176d38) at osu_bw.c:124


Best,
Alessandro
________________________________________
From: Dovis Alessandro
Sent: Monday, June 15, 2015 1:34 PM
To: Jonathan Perkins; mvapich-discuss at cse.ohio-state.edu
Subject: RE: [mvapich-discuss] mvapich hangs at startup on some new machines

Hello Jonathan,

The CPUs are (from /proc/cpuinfo):
        Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz
The Mellanox interface (from lspci -vv):
        Network controller: Mellanox Technologies MT27500 Family [ConnectX-3]
        Subsystem: Mellanox Technologies Device 0051
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 64 bytes
        Interrupt: pin A routed to IRQ 80
        Region 0: Memory at d0f00000 (64-bit, non-prefetchable) [size=1M]
        Region 2: Memory at cc000000 (64-bit, prefetchable) [size=8M]
        Expansion ROM at d0000000 [disabled] [size=1M]
        Capabilities: <access denied>
        Kernel driver in use: mlx4_core

I have run different experiments with MV2_SHOW_ENV_INFO=1 set in the environment, each invoked along these lines (the host list varies per experiment):
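
        MV2_SHOW_ENV_INFO=1 /opt/mvapich2-2.1/bin/mpiexec --host <host1>,<host2> -n 2 ~/osu-micro-benchmarks-4.4.1/mpi/pt2pt/osu_bw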

- if I run osu_bw across two new machines, it hangs without printing any parameters (it probably hangs before reaching that point);

- if I run it across two old machines, I get:

 MVAPICH2-2.1 Parameters
---------------------------------------------------------------------
        PROCESSOR ARCH NAME            : MV2_ARCH_INTEL_GENERIC
        PROCESSOR FAMILY NAME          : MV2_CPU_FAMILY_INTEL
        PROCESSOR MODEL NUMBER         : 62
        HCA NAME                       : MV2_HCA_MLX_CX_FDR
        HETEROGENEOUS HCA              : NO
        MV2_VBUF_TOTAL_SIZE            : 16384
        MV2_IBA_EAGER_THRESHOLD        : 16384
        MV2_RDMA_FAST_PATH_BUF_SIZE    : 5120
        MV2_PUT_FALLBACK_THRESHOLD     : 8192
        MV2_GET_FALLBACK_THRESHOLD     : 0
        MV2_EAGERSIZE_1SC              : 4096
        MV2_SMP_EAGERSIZE              : 65537
        MV2_SMPI_LENGTH_QUEUE          : 262144
        MV2_SMP_NUM_SEND_BUFFER        : 256
        MV2_SMP_BATCH_SIZE             : 8
---------------------------------------------------------------------
---------------------------------------------------------------------

- if I run it with one process on an old machine and one on a new machine, I get:

 MVAPICH2-2.1 Parameters
---------------------------------------------------------------------
        PROCESSOR ARCH NAME            : MV2_ARCH_INTEL_GENERIC
        PROCESSOR FAMILY NAME          : MV2_CPU_FAMILY_INTEL
        PROCESSOR MODEL NUMBER         : 62
        HCA NAME                       : MV2_HCA_MLX_CX_FDR
        HETEROGENEOUS HCA              : NO
        MV2_VBUF_TOTAL_SIZE            : 16384
        MV2_IBA_EAGER_THRESHOLD        : 16384
        MV2_RDMA_FAST_PATH_BUF_SIZE    : 5120
        MV2_PUT_FALLBACK_THRESHOLD     : 8192
        MV2_GET_FALLBACK_THRESHOLD     : 0
        MV2_EAGERSIZE_1SC              : 4096
        MV2_SMP_EAGERSIZE              : 65537
        MV2_SMPI_LENGTH_QUEUE          : 262144
        MV2_SMP_NUM_SEND_BUFFER        : 256
        MV2_SMP_BATCH_SIZE             : 8
---------------------------------------------------------------------
---------------------------------------------------------------------

- if I run it with '-n 1' on a single new machine, it shows the following (before failing, as expected, with a complaint about the number of processes):

 MVAPICH2-2.1 Parameters
---------------------------------------------------------------------
        PROCESSOR ARCH NAME            : MV2_ARCH_INTEL_GENERIC
        PROCESSOR FAMILY NAME          : MV2_CPU_FAMILY_INTEL
        PROCESSOR MODEL NUMBER         : 63
        HCA NAME                       : MV2_HCA_MLX_CX_FDR
        HETEROGENEOUS HCA              : NO
        MV2_EAGERSIZE_1SC              : 0
        MV2_SMP_EAGERSIZE              : 65537
        MV2_SMPI_LENGTH_QUEUE          : 262144
        MV2_SMP_NUM_SEND_BUFFER        : 256
        MV2_SMP_BATCH_SIZE             : 8
---------------------------------------------------------------------
---------------------------------------------------------------------
This test requires exactly two processes

For the backtraces, I will recompile the library with debugging symbols and let you know.
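
Probably something like the following, assuming the usual MPICH-style configure options also apply to MVAPICH2:

        ./configure --prefix=/opt/mvapich2-2.1 --enable-g=dbg --disable-fast
        make && make install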

Thanks,
Alessandro


________________________________________
From: Jonathan Perkins [perkinjo at cse.ohio-state.edu]
Sent: Friday, June 12, 2015 9:12 PM
To: Dovis Alessandro; mvapich-discuss at cse.ohio-state.edu
Subject: Re: [mvapich-discuss] mvapich hangs at startup on some new machines

Hello. Can you share with us the CPU architecture and the type of HCA you're using on your new systems?

In addition to this, can you do a one-process run while setting the MV2_SHOW_ENV_INFO variable to 1? It may also be useful to send us the backtrace of the process(es) when it hangs.
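
For example, one possible invocation (paths adjusted to your setup) would be:

        MV2_SHOW_ENV_INFO=1 mpiexec -n 1 ./osu_bw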

On Fri, Jun 12, 2015 at 7:48 AM Dovis Alessandro <adovis at student.ethz.ch> wrote:
Hello everyone,

A new cluster of machines has been installed and connected to the same Mellanox InfiniBand switch as the other machines I was already using with MVAPICH (everything works fine there).
I have installed MVAPICH on the new machines (reconfiguring and recompiling, because they have a different architecture and kernel). Both clusters use MVAPICH2 2.1.

If I run `/opt/mvapich2-2.1/bin/mpiexec --host machine1,machine2 -n 2 ~/osu-micro-benchmarks-4.4.1/mpi/pt2pt/osu_bw`, I see the following behaviour:
- it runs fine if both machines are in the old cluster;
- it runs fine if one machine is in the old cluster and one in the new cluster;
- it hangs at startup if both machines are in the new cluster.

I have seen the same behaviour with other executables, e.g. a simple 'hello world'.
Below are the outputs for both a hanging execution and a successful one.

Thank you for your help.

Best,
Alessandro Dovis


--------------------------------------------------------------------------------------------------------------------

Copy-paste of the output of the hanging execution (on the new cluster), with the '-v' flag:

host: r630-04
host: r630-01

==================================================================================================
mpiexec options:
----------------
  Base path: /opt/mvapich2-2.1/bin/
  Launcher: (null)
  Debug level: 1
  Enable X: -1

  Global environment:
  -------------------
    LC_PAPER=en_DK.UTF-8
    TERM=xterm
    SHELL=/bin/bash
    SSH_CLIENT=10.2.131.222 59209 22
    SSH_TTY=/dev/pts/0
    USER=adovis
    LD_LIBRARY_PATH=:/home/adovis/fbx/opt/lib/:/opt/lib/boost_1_58_0/:/home/adovis/fbx/opt/lib/:/opt/lib/boost_1_58_0/
    MV2_ENABLE_AFFINITY=0
    MAIL=/var/mail/adovis
    PATH=/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games:/home/adovis/bin:/usr/sbin:/usr/local/sbin
    LC_COLLATE=C
    PWD=/home/adovis
    JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64/
    LANG=en_US.UTF-8
    LC_MEASUREMENT=en_DK.UTF-8
    SHLVL=1
    HOME=/home/adovis
    LOGNAME=adovis
    SSH_CONNECTION=10.2.131.222 59209 10.1.212.71 22
    LC_TIME=en_DK.UTF-8
    _=/opt/mvapich2-2.1/bin/mpiexec

  Hydra internal environment:
  ---------------------------
    GFORTRAN_UNBUFFERED_PRECONNECTED=y


    Proxy information:
    *********************
      [1] proxy: r630-04 (1 cores)
      Exec list: /home/adovis/osu-micro-benchmarks-4.4.1/mpi/pt2pt/osu_bw (1 processes);

      [2] proxy: r630-01 (1 cores)
      Exec list: /home/adovis/osu-micro-benchmarks-4.4.1/mpi/pt2pt/osu_bw (1 processes);


==================================================================================================

[mpiexec at r630-01] Timeout set to -1 (-1 means infinite)
[mpiexec at r630-01] Got a control port string of r630-01:48645

Proxy launch args: /opt/mvapich2-2.1/bin/hydra_pmi_proxy --control-port r630-01:48645 --debug --rmk user --launcher ssh --demux poll --pgid 0 --retries 10 --usize -2 --proxy-id

Arguments being passed to proxy 0:
--version 3.1.4 --iface-ip-env-name MPIR_CVAR_CH3_INTERFACE_HOSTNAME --hostname r630-04 --global-core-map 0,1,2 --pmi-id-map 0,0 --global-process-count 2 --auto-cleanup 1 --pmi-kvsname kvs_4979_0 --pmi-process-mapping (vector,(0,2,1)) --ckpoint-num -1 --global-inherited-env 21 'LC_PAPER=en_DK.UTF-8' 'TERM=xterm' 'SHELL=/bin/bash' 'SSH_CLIENT=10.2.131.222 59209 22' 'SSH_TTY=/dev/pts/0' 'USER=adovis' 'LD_LIBRARY_PATH=:/home/adovis/fbx/opt/lib/:/opt/lib/boost_1_58_0/:/home/adovis/fbx/opt/lib/:/opt/lib/boost_1_58_0/' 'MV2_ENABLE_AFFINITY=0' 'MAIL=/var/mail/adovis' 'PATH=/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games:/home/adovis/bin:/usr/sbin:/usr/local/sbin' 'LC_COLLATE=C' 'PWD=/home/adovis' 'JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64/' 'LANG=en_US.UTF-8' 'LC_MEASUREMENT=en_DK.UTF-8' 'SHLVL=1' 'HOME=/home/adovis' 'LOGNAME=adovis' 'SSH_CONNECTION=10.2.131.222 59209 10.1.212.71 22' 'LC_TIME=en_DK.UTF-8' '_=/opt/mvapich2-2.1/bin/mpiexec' --global-user-env 0 --global-system-env 1 'GFORTRAN_UNBUFFERED_PRECONNECTED=y' --proxy-core-count 1 --exec --exec-appnum 0 --exec-proc-count 1 --exec-local-env 0 --exec-wdir /home/adovis --exec-args 1 /home/adovis/osu-micro-benchmarks-4.4.1/mpi/pt2pt/osu_bw

Arguments being passed to proxy 1:
--version 3.1.4 --iface-ip-env-name MPIR_CVAR_CH3_INTERFACE_HOSTNAME --hostname r630-01 --global-core-map 0,1,2 --pmi-id-map 0,1 --global-process-count 2 --auto-cleanup 1 --pmi-kvsname kvs_4979_0 --pmi-process-mapping (vector,(0,2,1)) --ckpoint-num -1 --global-inherited-env 21 'LC_PAPER=en_DK.UTF-8' 'TERM=xterm' 'SHELL=/bin/bash' 'SSH_CLIENT=10.2.131.222 59209 22' 'SSH_TTY=/dev/pts/0' 'USER=adovis' 'LD_LIBRARY_PATH=:/home/adovis/fbx/opt/lib/:/opt/lib/boost_1_58_0/:/home/adovis/fbx/opt/lib/:/opt/lib/boost_1_58_0/' 'MV2_ENABLE_AFFINITY=0' 'MAIL=/var/mail/adovis' 'PATH=/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games:/home/adovis/bin:/usr/sbin:/usr/local/sbin' 'LC_COLLATE=C' 'PWD=/home/adovis' 'JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64/' 'LANG=en_US.UTF-8' 'LC_MEASUREMENT=en_DK.UTF-8' 'SHLVL=1' 'HOME=/home/adovis' 'LOGNAME=adovis' 'SSH_CONNECTION=10.2.131.222 59209 10.1.212.71 22' 'LC_TIME=en_DK.UTF-8' '_=/opt/mvapich2-2.1/bin/mpiexec' --global-user-env 0 --global-system-env 1 'GFORTRAN_UNBUFFERED_PRECONNECTED=y' --proxy-core-count 1 --exec --exec-appnum 0 --exec-proc-count 1 --exec-local-env 0 --exec-wdir /home/adovis --exec-args 1 /home/adovis/osu-micro-benchmarks-4.4.1/mpi/pt2pt/osu_bw

[mpiexec at r630-01] Launch arguments: /usr/bin/ssh -x r630-04 "/opt/mvapich2-2.1/bin/hydra_pmi_proxy" --control-port r630-01:48645 --debug --rmk user --launcher ssh --demux poll --pgid 0 --retries 10 --usize -2 --proxy-id 0
[mpiexec at r630-01] Launch arguments: /opt/mvapich2-2.1/bin/hydra_pmi_proxy --control-port r630-01:48645 --debug --rmk user --launcher ssh --demux poll --pgid 0 --retries 10 --usize -2 --proxy-id 1
[proxy:0:1 at r630-01] got pmi command (from 0): init
pmi_version=1 pmi_subversion=1
[proxy:0:1 at r630-01] PMI response: cmd=response_to_init pmi_version=1 pmi_subversion=1 rc=0
[proxy:0:1 at r630-01] got pmi command (from 0): get_maxes

[proxy:0:1 at r630-01] PMI response: cmd=maxes kvsname_max=256 keylen_max=64 vallen_max=1024
[proxy:0:1 at r630-01] got pmi command (from 0): get_appnum

[proxy:0:1 at r630-01] PMI response: cmd=appnum appnum=0
[proxy:0:1 at r630-01] got pmi command (from 0): get_my_kvsname

[proxy:0:1 at r630-01] PMI response: cmd=my_kvsname kvsname=kvs_4979_0
[proxy:0:1 at r630-01] got pmi command (from 0): get_my_kvsname

[proxy:0:1 at r630-01] PMI response: cmd=my_kvsname kvsname=kvs_4979_0
[proxy:0:1 at r630-01] got pmi command (from 0): get
kvsname=kvs_4979_0 key=PMI_process_mapping
[proxy:0:1 at r630-01] PMI response: cmd=get_result rc=0 msg=success value=(vector,(0,2,1))
[proxy:0:1 at r630-01] got pmi command (from 0): put
kvsname=kvs_4979_0 key=hostname[1] value=08323329
[proxy:0:1 at r630-01] cached command: hostname[1]=08323329
[proxy:0:1 at r630-01] PMI response: cmd=put_result rc=0 msg=success
[proxy:0:1 at r630-01] got pmi command (from 0): barrier_in

[proxy:0:1 at r630-01] flushing 1 put command(s) out
[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=put hostname[1]=08323329
[proxy:0:1 at r630-01] forwarding command (cmd=put hostname[1]=08323329) upstream
[proxy:0:1 at r630-01] forwarding command (cmd=barrier_in) upstream
[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
[proxy:0:0 at r630-04] got pmi command (from 4): init
pmi_version=1 pmi_subversion=1
[proxy:0:0 at r630-04] PMI response: cmd=response_to_init pmi_version=1 pmi_subversion=1 rc=0
[proxy:0:0 at r630-04] got pmi command (from 4): get_maxes

[proxy:0:0 at r630-04] PMI response: cmd=maxes kvsname_max=256 keylen_max=64 vallen_max=1024
[proxy:0:0 at r630-04] got pmi command (from 4): get_appnum

[proxy:0:0 at r630-04] PMI response: cmd=appnum appnum=0
[proxy:0:0 at r630-04] got pmi command (from 4): get_my_kvsname

[proxy:0:0 at r630-04] PMI response: cmd=my_kvsname kvsname=kvs_4979_0
[proxy:0:0 at r630-04] got pmi command (from 4): get_my_kvsname

[proxy:0:0 at r630-04] PMI response: cmd=my_kvsname kvsname=kvs_4979_0
[proxy:0:0 at r630-04] got pmi command (from 4): get
kvsname=kvs_4979_0 key=PMI_process_mapping
[proxy:0:0 at r630-04] PMI response: cmd=get_result rc=0 msg=success value=(vector,(0,2,1))
[proxy:0:0 at r630-04] got pmi command (from 4): put
kvsname=kvs_4979_0 key=hostname[0] value=08323329
[proxy:0:0 at r630-04] cached command: hostname[0]=08323329
[proxy:0:0 at r630-04] PMI response: cmd=put_result rc=0 msg=success
[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=put hostname[0]=08323329
[proxy:0:0 at r630-04] got pmi command (from 4): barrier_in

[proxy:0:0 at r630-04] flushing 1 put command(s) out
[proxy:0:0 at r630-04] forwarding command (cmd=put hostname[0]=08323329) upstream
[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
[mpiexec at r630-01] PMI response to fd 9 pid 4: cmd=keyval_cache hostname[1]=08323329 hostname[0]=08323329
[mpiexec at r630-01] PMI response to fd 6 pid 4: cmd=keyval_cache hostname[1]=08323329 hostname[0]=08323329
[mpiexec at r630-01] PMI response to fd 9 pid 4: cmd=barrier_out
[mpiexec at r630-01] PMI response to fd 6 pid 4: cmd=barrier_out
[proxy:0:0 at r630-04] forwarding command (cmd=barrier_in) upstream
[proxy:0:1 at r630-01] PMI response: cmd=barrier_out
[proxy:0:0 at r630-04] PMI response: cmd=barrier_out
[proxy:0:1 at r630-01] got pmi command (from 0): get
kvsname=kvs_4979_0 key=hostname[0]
[proxy:0:1 at r630-01] PMI response: cmd=get_result rc=0 msg=success value=08323329
[proxy:0:0 at r630-04] got pmi command (from 4): get
kvsname=kvs_4979_0 key=hostname[1]
[proxy:0:0 at r630-04] PMI response: cmd=get_result rc=0 msg=success value=08323329
[proxy:0:0 at r630-04] got pmi command (from 4): put
kvsname=kvs_4979_0 key=MVAPICH2_0000 value=00000008:00000070:00000071:
[proxy:0:1 at r630-01] got pmi command (from 0): put
kvsname=kvs_4979_0 key=MVAPICH2_0001 value=00000011:00000097:00000098:
[proxy:0:1 at r630-01] cached command: MVAPICH2_0001=00000011:00000097:00000098:
[proxy:0:1 at r630-01] PMI response: cmd=put_result rc=0 msg=success
[proxy:0:0 at r630-04] cached command: MVAPICH2_0000=00000008:00000070:00000071:
[proxy:0:0 at r630-04] PMI response: cmd=put_result rc=0 msg=success
[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=put MVAPICH2_0000=00000008:00000070:00000071:
[proxy:0:0 at r630-04] got pmi command (from 4): barrier_in

[proxy:0:0 at r630-04] flushing 1 put command(s) out
[proxy:0:0 at r630-04] forwarding command (cmd=put MVAPICH2_0000=00000008:00000070:00000071:) upstream
[proxy:0:0 at r630-04] forwarding command (cmd=barrier_in) upstream
[proxy:0:1 at r630-01] got pmi command (from 0): barrier_in

[proxy:0:1 at r630-01] flushing 1 put command(s) out
[proxy:0:1 at r630-01] forwarding command (cmd=put MVAPICH2_0001=00000011:00000097:00000098:) upstream
[proxy:0:1 at r630-01] forwarding command (cmd=barrier_in) upstream
[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=put MVAPICH2_0001=00000011:00000097:00000098:
[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
[mpiexec at r630-01] PMI response to fd 9 pid 0: cmd=keyval_cache MVAPICH2_0000=00000008:00000070:00000071: MVAPICH2_0001=00000011:00000097:00000098:
[mpiexec at r630-01] PMI response to fd 6 pid 0: cmd=keyval_cache MVAPICH2_0000=00000008:00000070:00000071: MVAPICH2_0001=00000011:00000097:00000098:
[mpiexec at r630-01] PMI response to fd 9 pid 0: cmd=barrier_out
[mpiexec at r630-01] PMI response to fd 6 pid 0: cmd=barrier_out
[proxy:0:1 at r630-01] PMI response: cmd=barrier_out
[proxy:0:1 at r630-01] got pmi command (from 0): get
kvsname=kvs_4979_0 key=MVAPICH2_0000
[proxy:0:1 at r630-01] PMI response: cmd=get_result rc=0 msg=success value=00000008:00000070:00000071:
[proxy:0:0 at r630-04] PMI response: cmd=barrier_out
[proxy:0:1 at r630-01] got pmi command (from 0): get
kvsname=kvs_4979_0 key=MVAPICH2_0000
[proxy:0:1 at r630-01] PMI response: cmd=get_result rc=0 msg=success value=00000008:00000070:00000071:
[proxy:0:0 at r630-04] got pmi command (from 4): get
kvsname=kvs_4979_0 key=MVAPICH2_0001
[proxy:0:0 at r630-04] PMI response: cmd=get_result rc=0 msg=success value=00000011:00000097:00000098:
[proxy:0:1 at r630-01] got pmi command (from 0): barrier_in

[proxy:0:1 at r630-01] forwarding command (cmd=barrier_in) upstream
[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
[proxy:0:0 at r630-04] got pmi command (from 4): get
kvsname=kvs_4979_0 key=MVAPICH2_0001
[proxy:0:0 at r630-04] PMI response: cmd=get_result rc=0 msg=success value=00000011:00000097:00000098:
[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
[mpiexec at r630-01] PMI response to fd 9 pid 4: cmd=barrier_out
[mpiexec at r630-01] PMI response to fd 6 pid 4: cmd=barrier_out
[proxy:0:0 at r630-04] got pmi command (from 4): barrier_in

[proxy:0:0 at r630-04] forwarding command (cmd=barrier_in) upstream
[proxy:0:1 at r630-01] PMI response: cmd=barrier_out
[proxy:0:0 at r630-04] PMI response: cmd=barrier_out
[proxy:0:1 at r630-01] got pmi command (from 0): barrier_in

[proxy:0:1 at r630-01] forwarding command (cmd=barrier_in) upstream
[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
[mpiexec at r630-01] PMI response to fd 9 pid 4: cmd=barrier_out
[mpiexec at r630-01] PMI response to fd 6 pid 4: cmd=barrier_out
[proxy:0:0 at r630-04] got pmi command (from 4): barrier_in

[proxy:0:0 at r630-04] forwarding command (cmd=barrier_in) upstream
[proxy:0:1 at r630-01] PMI response: cmd=barrier_out
[proxy:0:0 at r630-04] PMI response: cmd=barrier_out
[proxy:0:1 at r630-01] got pmi command (from 0): barrier_in

[proxy:0:1 at r630-01] forwarding command (cmd=barrier_in) upstream
[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
[mpiexec at r630-01] PMI response to fd 9 pid 4: cmd=barrier_out
[mpiexec at r630-01] PMI response to fd 6 pid 4: cmd=barrier_out
[proxy:0:0 at r630-04] got pmi command (from 4): barrier_in

[proxy:0:0 at r630-04] forwarding command (cmd=barrier_in) upstream
[proxy:0:1 at r630-01] PMI response: cmd=barrier_out
[proxy:0:0 at r630-04] PMI response: cmd=barrier_out
[proxy:0:1 at r630-01] got pmi command (from 0): barrier_in

[proxy:0:1 at r630-01] forwarding command (cmd=barrier_in) upstream
[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
[proxy:0:0 at r630-04] got pmi command (from 4): barrier_in

[proxy:0:0 at r630-04] forwarding command (cmd=barrier_in) upstream
[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
[mpiexec at r630-01] PMI response to fd 9 pid 4: cmd=barrier_out
[mpiexec at r630-01] PMI response to fd 6 pid 4: cmd=barrier_out
[proxy:0:1 at r630-01] PMI response: cmd=barrier_out
[proxy:0:0 at r630-04] PMI response: cmd=barrier_out
[proxy:0:1 at r630-01] got pmi command (from 0): barrier_in

[proxy:0:1 at r630-01] forwarding command (cmd=barrier_in) upstream
[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
[mpiexec at r630-01] PMI response to fd 9 pid 4: cmd=barrier_out
[mpiexec at r630-01] PMI response to fd 6 pid 4: cmd=barrier_out
[proxy:0:0 at r630-04] got pmi command (from 4): barrier_in

[proxy:0:0 at r630-04] forwarding command (cmd=barrier_in) upstream
[proxy:0:1 at r630-01] PMI response: cmd=barrier_out
[proxy:0:1 at r630-01] got pmi command (from 0): barrier_in

[proxy:0:1 at r630-01] forwarding command (cmd=barrier_in) upstream
[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
[proxy:0:0 at r630-04] PMI response: cmd=barrier_out
[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
[mpiexec at r630-01] PMI response to fd 9 pid 4: cmd=barrier_out
[mpiexec at r630-01] PMI response to fd 6 pid 4: cmd=barrier_out
[proxy:0:0 at r630-04] got pmi command (from 4): barrier_in

[proxy:0:0 at r630-04] forwarding command (cmd=barrier_in) upstream
[proxy:0:1 at r630-01] PMI response: cmd=barrier_out
[proxy:0:1 at r630-01] got pmi command (from 0): barrier_in

[proxy:0:1 at r630-01] forwarding command (cmd=barrier_in) upstream
[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
[proxy:0:0 at r630-04] PMI response: cmd=barrier_out
[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
[mpiexec at r630-01] PMI response to fd 9 pid 4: cmd=barrier_out
[mpiexec at r630-01] PMI response to fd 6 pid 4: cmd=barrier_out
[proxy:0:0 at r630-04] got pmi command (from 4): barrier_in

[proxy:0:0 at r630-04] forwarding command (cmd=barrier_in) upstream
[proxy:0:1 at r630-01] PMI response: cmd=barrier_out
[proxy:0:0 at r630-04] PMI response: cmd=barrier_out

{here it hangs forever}

--------------------------------------------------------------------------------------------------------------------

A successful run, by contrast, looks like the following (again with '-v'):

host: fdr1
host: r630-01

==================================================================================================
mpiexec options:
----------------
  Base path: /opt/mvapich2-2.1/bin/
  Launcher: (null)
  Debug level: 1
  Enable X: -1

  Global environment:
  -------------------
    LC_PAPER=en_DK.UTF-8
    TERM=xterm
    SHELL=/bin/bash
    SSH_CLIENT=10.2.131.222 59209 22
    SSH_TTY=/dev/pts/0
    USER=adovis
    LD_LIBRARY_PATH=:/home/adovis/fbx/opt/lib/:/opt/lib/boost_1_58_0/:/home/adovis/fbx/opt/lib/:/opt/lib/boost_1_58_0/
    MV2_ENABLE_AFFINITY=0
    MAIL=/var/mail/adovis
    PATH=/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games:/home/adovis/bin:/usr/sbin:/usr/local/sbin
    LC_COLLATE=C
    PWD=/home/adovis
    JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64/
    LANG=en_US.UTF-8
    LC_MEASUREMENT=en_DK.UTF-8
    SHLVL=1
    HOME=/home/adovis
    LOGNAME=adovis
    SSH_CONNECTION=10.2.131.222 59209 10.1.212.71 22
    LC_TIME=en_DK.UTF-8
    _=/opt/mvapich2-2.1/bin/mpiexec

  Hydra internal environment:
  ---------------------------
    GFORTRAN_UNBUFFERED_PRECONNECTED=y


    Proxy information:
    *********************
      [1] proxy: fdr1 (1 cores)
      Exec list: /home/adovis/osu-micro-benchmarks-4.4.1/mpi/pt2pt/osu_bw (1 processes);

      [2] proxy: r630-01 (1 cores)
      Exec list: /home/adovis/osu-micro-benchmarks-4.4.1/mpi/pt2pt/osu_bw (1 processes);


==================================================================================================

[mpiexec at r630-01] Timeout set to -1 (-1 means infinite)
[mpiexec at r630-01] Got a control port string of r630-01:39227

Proxy launch args: /opt/mvapich2-2.1/bin/hydra_pmi_proxy --control-port r630-01:39227 --debug --rmk user --launcher ssh --demux poll --pgid 0 --retries 10 --usize -2 --proxy-id

Arguments being passed to proxy 0:
--version 3.1.4 --iface-ip-env-name MPIR_CVAR_CH3_INTERFACE_HOSTNAME --hostname fdr1 --global-core-map 0,1,2 --pmi-id-map 0,0 --global-process-count 2 --auto-cleanup 1 --pmi-kvsname kvs_4992_0 --pmi-process-mapping (vector,(0,2,1)) --ckpoint-num -1 --global-inherited-env 21 'LC_PAPER=en_DK.UTF-8' 'TERM=xterm' 'SHELL=/bin/bash' 'SSH_CLIENT=10.2.131.222 59209 22' 'SSH_TTY=/dev/pts/0' 'USER=adovis' 'LD_LIBRARY_PATH=:/home/adovis/fbx/opt/lib/:/opt/lib/boost_1_58_0/:/home/adovis/fbx/opt/lib/:/opt/lib/boost_1_58_0/' 'MV2_ENABLE_AFFINITY=0' 'MAIL=/var/mail/adovis' 'PATH=/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games:/home/adovis/bin:/usr/sbin:/usr/local/sbin' 'LC_COLLATE=C' 'PWD=/home/adovis' 'JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64/' 'LANG=en_US.UTF-8' 'LC_MEASUREMENT=en_DK.UTF-8' 'SHLVL=1' 'HOME=/home/adovis' 'LOGNAME=adovis' 'SSH_CONNECTION=10.2.131.222 59209 10.1.212.71 22' 'LC_TIME=en_DK.UTF-8' '_=/opt/mvapich2-2.1/bin/mpiexec' --global-user-env 0 --global-system-env 1 'GFORTRAN_UNBUFFERED_PRECONNECTED=y' --proxy-core-count 1 --exec --exec-appnum 0 --exec-proc-count 1 --exec-local-env 0 --exec-wdir /home/adovis --exec-args 1 /home/adovis/osu-micro-benchmarks-4.4.1/mpi/pt2pt/osu_bw

Arguments being passed to proxy 1:
--version 3.1.4 --iface-ip-env-name MPIR_CVAR_CH3_INTERFACE_HOSTNAME --hostname r630-01 --global-core-map 0,1,2 --pmi-id-map 0,1 --global-process-count 2 --auto-cleanup 1 --pmi-kvsname kvs_4992_0 --pmi-process-mapping (vector,(0,2,1)) --ckpoint-num -1 --global-inherited-env 21 'LC_PAPER=en_DK.UTF-8' 'TERM=xterm' 'SHELL=/bin/bash' 'SSH_CLIENT=10.2.131.222 59209 22' 'SSH_TTY=/dev/pts/0' 'USER=adovis' 'LD_LIBRARY_PATH=:/home/adovis/fbx/opt/lib/:/opt/lib/boost_1_58_0/:/home/adovis/fbx/opt/lib/:/opt/lib/boost_1_58_0/' 'MV2_ENABLE_AFFINITY=0' 'MAIL=/var/mail/adovis' 'PATH=/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games:/home/adovis/bin:/usr/sbin:/usr/local/sbin' 'LC_COLLATE=C' 'PWD=/home/adovis' 'JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64/' 'LANG=en_US.UTF-8' 'LC_MEASUREMENT=en_DK.UTF-8' 'SHLVL=1' 'HOME=/home/adovis' 'LOGNAME=adovis' 'SSH_CONNECTION=10.2.131.222 59209 10.1.212.71 22' 'LC_TIME=en_DK.UTF-8' '_=/opt/mvapich2-2.1/bin/mpiexec' --global-user-env 0 --global-system-env 1 'GFORTRAN_UNBUFFERED_PRECONNECTED=y' --proxy-core-count 1 --exec --exec-appnum 0 --exec-proc-count 1 --exec-local-env 0 --exec-wdir /home/adovis --exec-args 1 /home/adovis/osu-micro-benchmarks-4.4.1/mpi/pt2pt/osu_bw

[mpiexec at r630-01] Launch arguments: /usr/bin/ssh -x fdr1 "/opt/mvapich2-2.1/bin/hydra_pmi_proxy" --control-port r630-01:39227 --debug --rmk user --launcher ssh --demux poll --pgid 0 --retries 10 --usize -2 --proxy-id 0
[mpiexec at r630-01] Launch arguments: /opt/mvapich2-2.1/bin/hydra_pmi_proxy --control-port r630-01:39227 --debug --rmk user --launcher ssh --demux poll --pgid 0 --retries 10 --usize -2 --proxy-id 1
[proxy:0:1 at r630-01] got pmi command (from 0): init
pmi_version=1 pmi_subversion=1
[proxy:0:1 at r630-01] PMI response: cmd=response_to_init pmi_version=1 pmi_subversion=1 rc=0
[proxy:0:1 at r630-01] got pmi command (from 0): get_maxes

[proxy:0:1 at r630-01] PMI response: cmd=maxes kvsname_max=256 keylen_max=64 vallen_max=1024
[proxy:0:1 at r630-01] got pmi command (from 0): get_appnum

[proxy:0:1 at r630-01] PMI response: cmd=appnum appnum=0
[proxy:0:1 at r630-01] got pmi command (from 0): get_my_kvsname

[proxy:0:1 at r630-01] PMI response: cmd=my_kvsname kvsname=kvs_4992_0
[proxy:0:1 at r630-01] got pmi command (from 0): get_my_kvsname

[proxy:0:1 at r630-01] PMI response: cmd=my_kvsname kvsname=kvs_4992_0
[proxy:0:1 at r630-01] got pmi command (from 0): get
kvsname=kvs_4992_0 key=PMI_process_mapping
[proxy:0:1 at r630-01] PMI response: cmd=get_result rc=0 msg=success value=(vector,(0,2,1))
[proxy:0:1 at r630-01] got pmi command (from 0): put
kvsname=kvs_4992_0 key=hostname[1] value=08323329
[proxy:0:1 at r630-01] cached command: hostname[1]=08323329
[proxy:0:1 at r630-01] PMI response: cmd=put_result rc=0 msg=success
[proxy:0:1 at r630-01] got pmi command (from 0): barrier_in

[proxy:0:1 at r630-01] flushing 1 put command(s) out
[proxy:0:1 at r630-01] forwarding command (cmd=put hostname[1]=08323329) upstream
[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=put hostname[1]=08323329
[proxy:0:1 at r630-01] forwarding command (cmd=barrier_in) upstream
[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
[proxy:0:0 at fdr1] got pmi command (from 4): init
pmi_version=1 pmi_subversion=1
[proxy:0:0 at fdr1] PMI response: cmd=response_to_init pmi_version=1 pmi_subversion=1 rc=0
[proxy:0:0 at fdr1] got pmi command (from 4): get_maxes

[proxy:0:0 at fdr1] PMI response: cmd=maxes kvsname_max=256 keylen_max=64 vallen_max=1024
[proxy:0:0 at fdr1] got pmi command (from 4): get_appnum

[proxy:0:0 at fdr1] PMI response: cmd=appnum appnum=0
[proxy:0:0 at fdr1] got pmi command (from 4): get_my_kvsname

[proxy:0:0 at fdr1] PMI response: cmd=my_kvsname kvsname=kvs_4992_0
[proxy:0:0 at fdr1] got pmi command (from 4): get_my_kvsname

[proxy:0:0 at fdr1] PMI response: cmd=my_kvsname kvsname=kvs_4992_0
[proxy:0:0 at fdr1] got pmi command (from 4): get
kvsname=kvs_4992_0 key=PMI_process_mapping
[proxy:0:0 at fdr1] PMI response: cmd=get_result rc=0 msg=success value=(vector,(0,2,1))
[proxy:0:0 at fdr1] got pmi command (from 4): put
kvsname=kvs_4992_0 key=hostname[0] value=17448404
[proxy:0:0 at fdr1] cached command: hostname[0]=17448404
[proxy:0:0 at fdr1] PMI response: cmd=put_result rc=0 msg=success
[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=put hostname[0]=17448404
[proxy:0:0 at fdr1] got pmi command (from 4): barrier_in

[proxy:0:0 at fdr1] flushing 1 put command(s) out
[proxy:0:0 at fdr1] forwarding command (cmd=put hostname[0]=17448404) upstream
[proxy:0:0 at fdr1] forwarding command (cmd=barrier_in) upstream
[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
[mpiexec at r630-01] PMI response to fd 9 pid 4: cmd=keyval_cache hostname[1]=08323329 hostname[0]=17448404
[mpiexec at r630-01] PMI response to fd 6 pid 4: cmd=keyval_cache hostname[1]=08323329 hostname[0]=17448404
[mpiexec at r630-01] PMI response to fd 9 pid 4: cmd=barrier_out
[mpiexec at r630-01] PMI response to fd 6 pid 4: cmd=barrier_out
[proxy:0:1 at r630-01] PMI response: cmd=barrier_out
[proxy:0:1 at r630-01] got pmi command (from 0): get
kvsname=kvs_4992_0 key=hostname[0]
[proxy:0:1 at r630-01] PMI response: cmd=get_result rc=0 msg=success value=17448404
[proxy:0:0 at fdr1] PMI response: cmd=barrier_out
[proxy:0:0 at fdr1] got pmi command (from 4): get
kvsname=kvs_4992_0 key=hostname[1]
[proxy:0:0 at fdr1] PMI response: cmd=get_result rc=0 msg=success value=08323329
[proxy:0:1 at r630-01] got pmi command (from 0): put
kvsname=kvs_4992_0 key=MVAPICH2_0001 value=00000011:00000099:0000009a:
[proxy:0:1 at r630-01] cached command: MVAPICH2_0001=00000011:00000099:0000009a:
[proxy:0:1 at r630-01] PMI response: cmd=put_result rc=0 msg=success
[proxy:0:1 at r630-01] got pmi command (from 0): barrier_in

[proxy:0:1 at r630-01] flushing 1 put command(s) out
[proxy:0:1 at r630-01] forwarding command (cmd=put MVAPICH2_0001=00000011:00000099:0000009a:) upstream
[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=put MVAPICH2_0001=00000011:00000099:0000009a:
[proxy:0:1 at r630-01] forwarding command (cmd=barrier_in) upstream
[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
[proxy:0:0 at fdr1] got pmi command (from 4): put
kvsname=kvs_4992_0 key=MVAPICH2_0000 value=00000004:00000304:00000305:
[proxy:0:0 at fdr1] cached command: MVAPICH2_0000=00000004:00000304:00000305:
[proxy:0:0 at fdr1] PMI response: cmd=put_result rc=0 msg=success
[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=put MVAPICH2_0000=00000004:00000304:00000305:
[proxy:0:0 at fdr1] got pmi command (from 4): barrier_in

[proxy:0:0 at fdr1] flushing 1 put command(s) out
[proxy:0:0 at fdr1] forwarding command (cmd=put MVAPICH2_0000=00000004:00000304:00000305:) upstream
[proxy:0:0 at fdr1] forwarding command (cmd=barrier_in) upstream
[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
[mpiexec at r630-01] PMI response to fd 9 pid 4: cmd=keyval_cache MVAPICH2_0001=00000011:00000099:0000009a: MVAPICH2_0000=00000004:00000304:00000305:
[mpiexec at r630-01] PMI response to fd 6 pid 4: cmd=keyval_cache MVAPICH2_0001=00000011:00000099:0000009a: MVAPICH2_0000=00000004:00000304:00000305:
[mpiexec at r630-01] PMI response to fd 9 pid 4: cmd=barrier_out
[mpiexec at r630-01] PMI response to fd 6 pid 4: cmd=barrier_out
[proxy:0:1 at r630-01] PMI response: cmd=barrier_out
[proxy:0:1 at r630-01] got pmi command (from 0): get
kvsname=kvs_4992_0 key=MVAPICH2_0000
[proxy:0:1 at r630-01] PMI response: cmd=get_result rc=0 msg=success value=00000004:00000304:00000305:
[proxy:0:1 at r630-01] got pmi command (from 0): get
kvsname=kvs_4992_0 key=MVAPICH2_0000
[proxy:0:1 at r630-01] PMI response: cmd=get_result rc=0 msg=success value=00000004:00000304:00000305:
[proxy:0:1 at r630-01] got pmi command (from 0): barrier_in

[proxy:0:1 at r630-01] forwarding command (cmd=barrier_in) upstream
[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
[proxy:0:0 at fdr1] PMI response: cmd=barrier_out
[proxy:0:0 at fdr1] got pmi command (from 4): get
kvsname=kvs_4992_0 key=MVAPICH2_0001
[proxy:0:0 at fdr1] PMI response: cmd=get_result rc=0 msg=success value=00000011:00000099:0000009a:
[proxy:0:0 at fdr1] got pmi command (from 4): get
kvsname=kvs_4992_0 key=MVAPICH2_0001
[proxy:0:0 at fdr1] PMI response: cmd=get_result rc=0 msg=success value=00000011:00000099:0000009a:
[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
[mpiexec at r630-01] PMI response to fd 9 pid 4: cmd=barrier_out
[mpiexec at r630-01] PMI response to fd 6 pid 4: cmd=barrier_out
[proxy:0:0 at fdr1] got pmi command (from 4): barrier_in

[proxy:0:0 at fdr1] forwarding command (cmd=barrier_in) upstream
[proxy:0:1 at r630-01] PMI response: cmd=barrier_out
[proxy:0:0 at fdr1] PMI response: cmd=barrier_out
[proxy:0:1 at r630-01] got pmi command (from 0): barrier_in

[proxy:0:1 at r630-01] forwarding command (cmd=barrier_in) upstream
[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
[mpiexec at r630-01] PMI response to fd 9 pid 4: cmd=barrier_out
[mpiexec at r630-01] PMI response to fd 6 pid 4: cmd=barrier_out
[proxy:0:0 at fdr1] got pmi command (from 4): barrier_in

[proxy:0:0 at fdr1] forwarding command (cmd=barrier_in) upstream
[proxy:0:1 at r630-01] PMI response: cmd=barrier_out
[proxy:0:1 at r630-01] got pmi command (from 0): barrier_in

[proxy:0:1 at r630-01] forwarding command (cmd=barrier_in) upstream
[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
[proxy:0:0 at fdr1] PMI response: cmd=barrier_out
[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
[mpiexec at r630-01] PMI response to fd 9 pid 4: cmd=barrier_out
[mpiexec at r630-01] PMI response to fd 6 pid 4: cmd=barrier_out
[proxy:0:0 at fdr1] got pmi command (from 4): barrier_in

[proxy:0:0 at fdr1] forwarding command (cmd=barrier_in) upstream
[proxy:0:1 at r630-01] PMI response: cmd=barrier_out
[proxy:0:0 at fdr1] PMI response: cmd=barrier_out
[proxy:0:1 at r630-01] got pmi command (from 0): barrier_in

[proxy:0:1 at r630-01] forwarding command (cmd=barrier_in) upstream
[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
[mpiexec at r630-01] PMI response to fd 9 pid 4: cmd=barrier_out
[mpiexec at r630-01] PMI response to fd 6 pid 4: cmd=barrier_out
[proxy:0:0 at fdr1] got pmi command (from 4): barrier_in

[proxy:0:0 at fdr1] forwarding command (cmd=barrier_in) upstream
[proxy:0:1 at r630-01] PMI response: cmd=barrier_out
[proxy:0:0 at fdr1] PMI response: cmd=barrier_out
[proxy:0:1 at r630-01] got pmi command (from 0): barrier_in

[proxy:0:1 at r630-01] forwarding command (cmd=barrier_in) upstream
[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
[mpiexec at r630-01] PMI response to fd 9 pid 4: cmd=barrier_out
[mpiexec at r630-01] PMI response to fd 6 pid 4: cmd=barrier_out
[proxy:0:0 at fdr1] got pmi command (from 4): barrier_in

[proxy:0:0 at fdr1] forwarding command (cmd=barrier_in) upstream
[proxy:0:1 at r630-01] PMI response: cmd=barrier_out
[proxy:0:1 at r630-01] got pmi command (from 0): barrier_in

[proxy:0:1 at r630-01] forwarding command (cmd=barrier_in) upstream
[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
[proxy:0:0 at fdr1] PMI response: cmd=barrier_out
[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
[mpiexec at r630-01] PMI response to fd 9 pid 4: cmd=barrier_out
[mpiexec at r630-01] PMI response to fd 6 pid 4: cmd=barrier_out
[proxy:0:0 at fdr1] got pmi command (from 4): barrier_in

[proxy:0:0 at fdr1] forwarding command (cmd=barrier_in) upstream
[proxy:0:1 at r630-01] PMI response: cmd=barrier_out
[proxy:0:1 at r630-01] got pmi command (from 0): barrier_in

[proxy:0:1 at r630-01] forwarding command (cmd=barrier_in) upstream
[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
[proxy:0:0 at fdr1] PMI response: cmd=barrier_out
[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
[mpiexec at r630-01] PMI response to fd 9 pid 4: cmd=barrier_out
[mpiexec at r630-01] PMI response to fd 6 pid 4: cmd=barrier_out
[proxy:0:0 at fdr1] got pmi command (from 4): barrier_in

[proxy:0:0 at fdr1] forwarding command (cmd=barrier_in) upstream
[proxy:0:1 at r630-01] PMI response: cmd=barrier_out
[proxy:0:0 at fdr1] PMI response: cmd=barrier_out
# OSU MPI Bandwidth Test v4.4.1
# Size      Bandwidth (MB/s)
1                       2.93
2                       5.84
4                      11.61
8                      22.82
16                     44.77
32                     91.21
64                    179.39
128                   341.07
256                   680.59
512                  1313.70
1024                 2463.06
2048                 3993.00
4096                 5147.77
8192                 5669.63
16384                5701.75
32768                5969.73
65536                6117.85
131072               6243.84
262144               6306.77
524288               6340.28
1048576              6356.89
2097152              6362.19
4194304              6273.45
8388608              6334.72
16777216             5762.50
[proxy:0:1 at r630-01] got pmi command (from 0): barrier_in

[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
[proxy:0:1 at r630-01] forwarding command (cmd=barrier_in) upstream
[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
[mpiexec at r630-01] PMI response to fd 9 pid 4: cmd=barrier_out
[mpiexec at r630-01] PMI response to fd 6 pid 4: cmd=barrier_out
[proxy:0:0 at fdr1] got pmi command (from 4): barrier_in

[proxy:0:0 at fdr1] forwarding command (cmd=barrier_in) upstream
[proxy:0:1 at r630-01] PMI response: cmd=barrier_out
[proxy:0:0 at fdr1] PMI response: cmd=barrier_out
[proxy:0:1 at r630-01] got pmi command (from 0): finalize

[proxy:0:1 at r630-01] PMI response: cmd=finalize_ack
[proxy:0:0 at fdr1] got pmi command (from 4): finalize

[proxy:0:0 at fdr1] PMI response: cmd=finalize_ack

