[mvapich-discuss] mvapich hangs at startup on some new machines

Dovis Alessandro adovis at student.ethz.ch
Fri Jun 12 07:47:25 EDT 2015


Hello everyone,

A new cluster of machines has been installed and connected to the same Mellanox InfiniBand switch as the machines I was already using with MVAPICH (where everything works fine).
I have installed MVAPICH2 on the new machines (reconfiguring and recompiling it, since they have a different architecture and kernel). Both clusters use MVAPICH2 2.1.

If I run `/opt/mvapich2-2.1/bin/mpiexec --host machine1,machine2 -n 2 ~/osu-micro-benchmarks-4.4.1/mpi/pt2pt/osu_bw`, I see the following behaviour:
- it runs fine if both machines are in the old cluster;
- it runs fine if one machine is in the old cluster and the other in the new cluster;
- it hangs at startup if both machines are in the new cluster.

I have seen the same behaviour with other executables, e.g. a simple 'hello world'.
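To be concrete, the 'hello world' test is essentially the minimal MPI program sketched below (the source I actually ran may differ in details, but it does nothing beyond initialization, a print, and finalization):

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, size, len;
        char name[MPI_MAX_PROCESSOR_NAME];

        /* On the new cluster nothing is ever printed, so the hang
         * appears to happen inside MPI_Init, i.e. during the PMI /
         * connection setup shown in the logs below. */
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        MPI_Get_processor_name(name, &len);

        printf("Hello from rank %d of %d on %s\n", rank, size, name);

        MPI_Finalize();
        return 0;
    }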
Below are the outputs for both a hanging execution and a successful one.

Thank you for your help.

Best,
Alessandro Dovis


--------------------------------------------------------------------------------------------------------------------

Copy-paste of the output of the hanging execution (both machines in the new cluster), run with the '-v' flag:

host: r630-04
host: r630-01

==================================================================================================
mpiexec options:
----------------
  Base path: /opt/mvapich2-2.1/bin/
  Launcher: (null)
  Debug level: 1
  Enable X: -1

  Global environment:
  -------------------
    LC_PAPER=en_DK.UTF-8
    TERM=xterm
    SHELL=/bin/bash
    SSH_CLIENT=10.2.131.222 59209 22
    SSH_TTY=/dev/pts/0
    USER=adovis
    LD_LIBRARY_PATH=:/home/adovis/fbx/opt/lib/:/opt/lib/boost_1_58_0/:/home/adovis/fbx/opt/lib/:/opt/lib/boost_1_58_0/
    MV2_ENABLE_AFFINITY=0
    MAIL=/var/mail/adovis
    PATH=/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games:/home/adovis/bin:/usr/sbin:/usr/local/sbin
    LC_COLLATE=C
    PWD=/home/adovis
    JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64/
    LANG=en_US.UTF-8
    LC_MEASUREMENT=en_DK.UTF-8
    SHLVL=1
    HOME=/home/adovis
    LOGNAME=adovis
    SSH_CONNECTION=10.2.131.222 59209 10.1.212.71 22
    LC_TIME=en_DK.UTF-8
    _=/opt/mvapich2-2.1/bin/mpiexec

  Hydra internal environment:
  ---------------------------
    GFORTRAN_UNBUFFERED_PRECONNECTED=y


    Proxy information:
    *********************
      [1] proxy: r630-04 (1 cores)
      Exec list: /home/adovis/osu-micro-benchmarks-4.4.1/mpi/pt2pt/osu_bw (1 processes); 

      [2] proxy: r630-01 (1 cores)
      Exec list: /home/adovis/osu-micro-benchmarks-4.4.1/mpi/pt2pt/osu_bw (1 processes); 


==================================================================================================

[mpiexec at r630-01] Timeout set to -1 (-1 means infinite)
[mpiexec at r630-01] Got a control port string of r630-01:48645

Proxy launch args: /opt/mvapich2-2.1/bin/hydra_pmi_proxy --control-port r630-01:48645 --debug --rmk user --launcher ssh --demux poll --pgid 0 --retries 10 --usize -2 --proxy-id 

Arguments being passed to proxy 0:
--version 3.1.4 --iface-ip-env-name MPIR_CVAR_CH3_INTERFACE_HOSTNAME --hostname r630-04 --global-core-map 0,1,2 --pmi-id-map 0,0 --global-process-count 2 --auto-cleanup 1 --pmi-kvsname kvs_4979_0 --pmi-process-mapping (vector,(0,2,1)) --ckpoint-num -1 --global-inherited-env 21 'LC_PAPER=en_DK.UTF-8' 'TERM=xterm' 'SHELL=/bin/bash' 'SSH_CLIENT=10.2.131.222 59209 22' 'SSH_TTY=/dev/pts/0' 'USER=adovis' 'LD_LIBRARY_PATH=:/home/adovis/fbx/opt/lib/:/opt/lib/boost_1_58_0/:/home/adovis/fbx/opt/lib/:/opt/lib/boost_1_58_0/' 'MV2_ENABLE_AFFINITY=0' 'MAIL=/var/mail/adovis' 'PATH=/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games:/home/adovis/bin:/usr/sbin:/usr/local/sbin' 'LC_COLLATE=C' 'PWD=/home/adovis' 'JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64/' 'LANG=en_US.UTF-8' 'LC_MEASUREMENT=en_DK.UTF-8' 'SHLVL=1' 'HOME=/home/adovis' 'LOGNAME=adovis' 'SSH_CONNECTION=10.2.131.222 59209 10.1.212.71 22' 'LC_TIME=en_DK.UTF-8' '_=/opt/mvapich2-2.1/bin/mpiexec' --global-user-env 0 --global-system-env 1 'GFORTRAN_UNBUFFERED_PRECONNECTED=y' --proxy-core-count 1 --exec --exec-appnum 0 --exec-proc-count 1 --exec-local-env 0 --exec-wdir /home/adovis --exec-args 1 /home/adovis/osu-micro-benchmarks-4.4.1/mpi/pt2pt/osu_bw 

Arguments being passed to proxy 1:
--version 3.1.4 --iface-ip-env-name MPIR_CVAR_CH3_INTERFACE_HOSTNAME --hostname r630-01 --global-core-map 0,1,2 --pmi-id-map 0,1 --global-process-count 2 --auto-cleanup 1 --pmi-kvsname kvs_4979_0 --pmi-process-mapping (vector,(0,2,1)) --ckpoint-num -1 --global-inherited-env 21 'LC_PAPER=en_DK.UTF-8' 'TERM=xterm' 'SHELL=/bin/bash' 'SSH_CLIENT=10.2.131.222 59209 22' 'SSH_TTY=/dev/pts/0' 'USER=adovis' 'LD_LIBRARY_PATH=:/home/adovis/fbx/opt/lib/:/opt/lib/boost_1_58_0/:/home/adovis/fbx/opt/lib/:/opt/lib/boost_1_58_0/' 'MV2_ENABLE_AFFINITY=0' 'MAIL=/var/mail/adovis' 'PATH=/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games:/home/adovis/bin:/usr/sbin:/usr/local/sbin' 'LC_COLLATE=C' 'PWD=/home/adovis' 'JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64/' 'LANG=en_US.UTF-8' 'LC_MEASUREMENT=en_DK.UTF-8' 'SHLVL=1' 'HOME=/home/adovis' 'LOGNAME=adovis' 'SSH_CONNECTION=10.2.131.222 59209 10.1.212.71 22' 'LC_TIME=en_DK.UTF-8' '_=/opt/mvapich2-2.1/bin/mpiexec' --global-user-env 0 --global-system-env 1 'GFORTRAN_UNBUFFERED_PRECONNECTED=y' --proxy-core-count 1 --exec --exec-appnum 0 --exec-proc-count 1 --exec-local-env 0 --exec-wdir /home/adovis --exec-args 1 /home/adovis/osu-micro-benchmarks-4.4.1/mpi/pt2pt/osu_bw 

[mpiexec at r630-01] Launch arguments: /usr/bin/ssh -x r630-04 "/opt/mvapich2-2.1/bin/hydra_pmi_proxy" --control-port r630-01:48645 --debug --rmk user --launcher ssh --demux poll --pgid 0 --retries 10 --usize -2 --proxy-id 0 
[mpiexec at r630-01] Launch arguments: /opt/mvapich2-2.1/bin/hydra_pmi_proxy --control-port r630-01:48645 --debug --rmk user --launcher ssh --demux poll --pgid 0 --retries 10 --usize -2 --proxy-id 1 
[proxy:0:1 at r630-01] got pmi command (from 0): init
pmi_version=1 pmi_subversion=1 
[proxy:0:1 at r630-01] PMI response: cmd=response_to_init pmi_version=1 pmi_subversion=1 rc=0
[proxy:0:1 at r630-01] got pmi command (from 0): get_maxes

[proxy:0:1 at r630-01] PMI response: cmd=maxes kvsname_max=256 keylen_max=64 vallen_max=1024
[proxy:0:1 at r630-01] got pmi command (from 0): get_appnum

[proxy:0:1 at r630-01] PMI response: cmd=appnum appnum=0
[proxy:0:1 at r630-01] got pmi command (from 0): get_my_kvsname

[proxy:0:1 at r630-01] PMI response: cmd=my_kvsname kvsname=kvs_4979_0
[proxy:0:1 at r630-01] got pmi command (from 0): get_my_kvsname

[proxy:0:1 at r630-01] PMI response: cmd=my_kvsname kvsname=kvs_4979_0
[proxy:0:1 at r630-01] got pmi command (from 0): get
kvsname=kvs_4979_0 key=PMI_process_mapping 
[proxy:0:1 at r630-01] PMI response: cmd=get_result rc=0 msg=success value=(vector,(0,2,1))
[proxy:0:1 at r630-01] got pmi command (from 0): put
kvsname=kvs_4979_0 key=hostname[1] value=08323329 
[proxy:0:1 at r630-01] cached command: hostname[1]=08323329
[proxy:0:1 at r630-01] PMI response: cmd=put_result rc=0 msg=success
[proxy:0:1 at r630-01] got pmi command (from 0): barrier_in

[proxy:0:1 at r630-01] flushing 1 put command(s) out
[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=put hostname[1]=08323329
[proxy:0:1 at r630-01] forwarding command (cmd=put hostname[1]=08323329) upstream
[proxy:0:1 at r630-01] forwarding command (cmd=barrier_in) upstream
[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
[proxy:0:0 at r630-04] got pmi command (from 4): init
pmi_version=1 pmi_subversion=1 
[proxy:0:0 at r630-04] PMI response: cmd=response_to_init pmi_version=1 pmi_subversion=1 rc=0
[proxy:0:0 at r630-04] got pmi command (from 4): get_maxes

[proxy:0:0 at r630-04] PMI response: cmd=maxes kvsname_max=256 keylen_max=64 vallen_max=1024
[proxy:0:0 at r630-04] got pmi command (from 4): get_appnum

[proxy:0:0 at r630-04] PMI response: cmd=appnum appnum=0
[proxy:0:0 at r630-04] got pmi command (from 4): get_my_kvsname

[proxy:0:0 at r630-04] PMI response: cmd=my_kvsname kvsname=kvs_4979_0
[proxy:0:0 at r630-04] got pmi command (from 4): get_my_kvsname

[proxy:0:0 at r630-04] PMI response: cmd=my_kvsname kvsname=kvs_4979_0
[proxy:0:0 at r630-04] got pmi command (from 4): get
kvsname=kvs_4979_0 key=PMI_process_mapping 
[proxy:0:0 at r630-04] PMI response: cmd=get_result rc=0 msg=success value=(vector,(0,2,1))
[proxy:0:0 at r630-04] got pmi command (from 4): put
kvsname=kvs_4979_0 key=hostname[0] value=08323329 
[proxy:0:0 at r630-04] cached command: hostname[0]=08323329
[proxy:0:0 at r630-04] PMI response: cmd=put_result rc=0 msg=success
[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=put hostname[0]=08323329
[proxy:0:0 at r630-04] got pmi command (from 4): barrier_in

[proxy:0:0 at r630-04] flushing 1 put command(s) out
[proxy:0:0 at r630-04] forwarding command (cmd=put hostname[0]=08323329) upstream
[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
[mpiexec at r630-01] PMI response to fd 9 pid 4: cmd=keyval_cache hostname[1]=08323329 hostname[0]=08323329 
[mpiexec at r630-01] PMI response to fd 6 pid 4: cmd=keyval_cache hostname[1]=08323329 hostname[0]=08323329 
[mpiexec at r630-01] PMI response to fd 9 pid 4: cmd=barrier_out
[mpiexec at r630-01] PMI response to fd 6 pid 4: cmd=barrier_out
[proxy:0:0 at r630-04] forwarding command (cmd=barrier_in) upstream
[proxy:0:1 at r630-01] PMI response: cmd=barrier_out
[proxy:0:0 at r630-04] PMI response: cmd=barrier_out
[proxy:0:1 at r630-01] got pmi command (from 0): get
kvsname=kvs_4979_0 key=hostname[0] 
[proxy:0:1 at r630-01] PMI response: cmd=get_result rc=0 msg=success value=08323329
[proxy:0:0 at r630-04] got pmi command (from 4): get
kvsname=kvs_4979_0 key=hostname[1] 
[proxy:0:0 at r630-04] PMI response: cmd=get_result rc=0 msg=success value=08323329
[proxy:0:0 at r630-04] got pmi command (from 4): put
kvsname=kvs_4979_0 key=MVAPICH2_0000 value=00000008:00000070:00000071: 
[proxy:0:1 at r630-01] got pmi command (from 0): put
kvsname=kvs_4979_0 key=MVAPICH2_0001 value=00000011:00000097:00000098: 
[proxy:0:1 at r630-01] cached command: MVAPICH2_0001=00000011:00000097:00000098:
[proxy:0:1 at r630-01] PMI response: cmd=put_result rc=0 msg=success
[proxy:0:0 at r630-04] cached command: MVAPICH2_0000=00000008:00000070:00000071:
[proxy:0:0 at r630-04] PMI response: cmd=put_result rc=0 msg=success
[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=put MVAPICH2_0000=00000008:00000070:00000071:
[proxy:0:0 at r630-04] got pmi command (from 4): barrier_in

[proxy:0:0 at r630-04] flushing 1 put command(s) out
[proxy:0:0 at r630-04] forwarding command (cmd=put MVAPICH2_0000=00000008:00000070:00000071:) upstream
[proxy:0:0 at r630-04] forwarding command (cmd=barrier_in) upstream
[proxy:0:1 at r630-01] got pmi command (from 0): barrier_in

[proxy:0:1 at r630-01] flushing 1 put command(s) out
[proxy:0:1 at r630-01] forwarding command (cmd=put MVAPICH2_0001=00000011:00000097:00000098:) upstream
[proxy:0:1 at r630-01] forwarding command (cmd=barrier_in) upstream
[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=put MVAPICH2_0001=00000011:00000097:00000098:
[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
[mpiexec at r630-01] PMI response to fd 9 pid 0: cmd=keyval_cache MVAPICH2_0000=00000008:00000070:00000071: MVAPICH2_0001=00000011:00000097:00000098: 
[mpiexec at r630-01] PMI response to fd 6 pid 0: cmd=keyval_cache MVAPICH2_0000=00000008:00000070:00000071: MVAPICH2_0001=00000011:00000097:00000098: 
[mpiexec at r630-01] PMI response to fd 9 pid 0: cmd=barrier_out
[mpiexec at r630-01] PMI response to fd 6 pid 0: cmd=barrier_out
[proxy:0:1 at r630-01] PMI response: cmd=barrier_out
[proxy:0:1 at r630-01] got pmi command (from 0): get
kvsname=kvs_4979_0 key=MVAPICH2_0000 
[proxy:0:1 at r630-01] PMI response: cmd=get_result rc=0 msg=success value=00000008:00000070:00000071:
[proxy:0:0 at r630-04] PMI response: cmd=barrier_out
[proxy:0:1 at r630-01] got pmi command (from 0): get
kvsname=kvs_4979_0 key=MVAPICH2_0000 
[proxy:0:1 at r630-01] PMI response: cmd=get_result rc=0 msg=success value=00000008:00000070:00000071:
[proxy:0:0 at r630-04] got pmi command (from 4): get
kvsname=kvs_4979_0 key=MVAPICH2_0001 
[proxy:0:0 at r630-04] PMI response: cmd=get_result rc=0 msg=success value=00000011:00000097:00000098:
[proxy:0:1 at r630-01] got pmi command (from 0): barrier_in

[proxy:0:1 at r630-01] forwarding command (cmd=barrier_in) upstream
[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
[proxy:0:0 at r630-04] got pmi command (from 4): get
kvsname=kvs_4979_0 key=MVAPICH2_0001 
[proxy:0:0 at r630-04] PMI response: cmd=get_result rc=0 msg=success value=00000011:00000097:00000098:
[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
[mpiexec at r630-01] PMI response to fd 9 pid 4: cmd=barrier_out
[mpiexec at r630-01] PMI response to fd 6 pid 4: cmd=barrier_out
[proxy:0:0 at r630-04] got pmi command (from 4): barrier_in

[proxy:0:0 at r630-04] forwarding command (cmd=barrier_in) upstream
[proxy:0:1 at r630-01] PMI response: cmd=barrier_out
[proxy:0:0 at r630-04] PMI response: cmd=barrier_out
[proxy:0:1 at r630-01] got pmi command (from 0): barrier_in

[proxy:0:1 at r630-01] forwarding command (cmd=barrier_in) upstream
[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
[mpiexec at r630-01] PMI response to fd 9 pid 4: cmd=barrier_out
[mpiexec at r630-01] PMI response to fd 6 pid 4: cmd=barrier_out
[proxy:0:0 at r630-04] got pmi command (from 4): barrier_in

[proxy:0:0 at r630-04] forwarding command (cmd=barrier_in) upstream
[proxy:0:1 at r630-01] PMI response: cmd=barrier_out
[proxy:0:0 at r630-04] PMI response: cmd=barrier_out
[proxy:0:1 at r630-01] got pmi command (from 0): barrier_in

[proxy:0:1 at r630-01] forwarding command (cmd=barrier_in) upstream
[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
[mpiexec at r630-01] PMI response to fd 9 pid 4: cmd=barrier_out
[mpiexec at r630-01] PMI response to fd 6 pid 4: cmd=barrier_out
[proxy:0:0 at r630-04] got pmi command (from 4): barrier_in

[proxy:0:0 at r630-04] forwarding command (cmd=barrier_in) upstream
[proxy:0:1 at r630-01] PMI response: cmd=barrier_out
[proxy:0:0 at r630-04] PMI response: cmd=barrier_out
[proxy:0:1 at r630-01] got pmi command (from 0): barrier_in

[proxy:0:1 at r630-01] forwarding command (cmd=barrier_in) upstream
[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
[proxy:0:0 at r630-04] got pmi command (from 4): barrier_in

[proxy:0:0 at r630-04] forwarding command (cmd=barrier_in) upstream
[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
[mpiexec at r630-01] PMI response to fd 9 pid 4: cmd=barrier_out
[mpiexec at r630-01] PMI response to fd 6 pid 4: cmd=barrier_out
[proxy:0:1 at r630-01] PMI response: cmd=barrier_out
[proxy:0:0 at r630-04] PMI response: cmd=barrier_out
[proxy:0:1 at r630-01] got pmi command (from 0): barrier_in

[proxy:0:1 at r630-01] forwarding command (cmd=barrier_in) upstream
[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
[mpiexec at r630-01] PMI response to fd 9 pid 4: cmd=barrier_out
[mpiexec at r630-01] PMI response to fd 6 pid 4: cmd=barrier_out
[proxy:0:0 at r630-04] got pmi command (from 4): barrier_in

[proxy:0:0 at r630-04] forwarding command (cmd=barrier_in) upstream
[proxy:0:1 at r630-01] PMI response: cmd=barrier_out
[proxy:0:1 at r630-01] got pmi command (from 0): barrier_in

[proxy:0:1 at r630-01] forwarding command (cmd=barrier_in) upstream
[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
[proxy:0:0 at r630-04] PMI response: cmd=barrier_out
[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
[mpiexec at r630-01] PMI response to fd 9 pid 4: cmd=barrier_out
[mpiexec at r630-01] PMI response to fd 6 pid 4: cmd=barrier_out
[proxy:0:0 at r630-04] got pmi command (from 4): barrier_in

[proxy:0:0 at r630-04] forwarding command (cmd=barrier_in) upstream
[proxy:0:1 at r630-01] PMI response: cmd=barrier_out
[proxy:0:1 at r630-01] got pmi command (from 0): barrier_in

[proxy:0:1 at r630-01] forwarding command (cmd=barrier_in) upstream
[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
[proxy:0:0 at r630-04] PMI response: cmd=barrier_out
[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
[mpiexec at r630-01] PMI response to fd 9 pid 4: cmd=barrier_out
[mpiexec at r630-01] PMI response to fd 6 pid 4: cmd=barrier_out
[proxy:0:0 at r630-04] got pmi command (from 4): barrier_in

[proxy:0:0 at r630-04] forwarding command (cmd=barrier_in) upstream
[proxy:0:1 at r630-01] PMI response: cmd=barrier_out
[proxy:0:0 at r630-04] PMI response: cmd=barrier_out

{here it hangs forever}

--------------------------------------------------------------------------------------------------------------------

For comparison, a successful run (one machine from each cluster) looks like the following, again with '-v':

host: fdr1
host: r630-01

==================================================================================================
mpiexec options:
----------------
  Base path: /opt/mvapich2-2.1/bin/
  Launcher: (null)
  Debug level: 1
  Enable X: -1

  Global environment:
  -------------------
    LC_PAPER=en_DK.UTF-8
    TERM=xterm
    SHELL=/bin/bash
    SSH_CLIENT=10.2.131.222 59209 22
    SSH_TTY=/dev/pts/0
    USER=adovis
    LD_LIBRARY_PATH=:/home/adovis/fbx/opt/lib/:/opt/lib/boost_1_58_0/:/home/adovis/fbx/opt/lib/:/opt/lib/boost_1_58_0/
    MV2_ENABLE_AFFINITY=0
    MAIL=/var/mail/adovis
    PATH=/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games:/home/adovis/bin:/usr/sbin:/usr/local/sbin
    LC_COLLATE=C
    PWD=/home/adovis
    JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64/
    LANG=en_US.UTF-8
    LC_MEASUREMENT=en_DK.UTF-8
    SHLVL=1
    HOME=/home/adovis
    LOGNAME=adovis
    SSH_CONNECTION=10.2.131.222 59209 10.1.212.71 22
    LC_TIME=en_DK.UTF-8
    _=/opt/mvapich2-2.1/bin/mpiexec

  Hydra internal environment:
  ---------------------------
    GFORTRAN_UNBUFFERED_PRECONNECTED=y


    Proxy information:
    *********************
      [1] proxy: fdr1 (1 cores)
      Exec list: /home/adovis/osu-micro-benchmarks-4.4.1/mpi/pt2pt/osu_bw (1 processes); 

      [2] proxy: r630-01 (1 cores)
      Exec list: /home/adovis/osu-micro-benchmarks-4.4.1/mpi/pt2pt/osu_bw (1 processes); 


==================================================================================================

[mpiexec at r630-01] Timeout set to -1 (-1 means infinite)
[mpiexec at r630-01] Got a control port string of r630-01:39227

Proxy launch args: /opt/mvapich2-2.1/bin/hydra_pmi_proxy --control-port r630-01:39227 --debug --rmk user --launcher ssh --demux poll --pgid 0 --retries 10 --usize -2 --proxy-id 

Arguments being passed to proxy 0:
--version 3.1.4 --iface-ip-env-name MPIR_CVAR_CH3_INTERFACE_HOSTNAME --hostname fdr1 --global-core-map 0,1,2 --pmi-id-map 0,0 --global-process-count 2 --auto-cleanup 1 --pmi-kvsname kvs_4992_0 --pmi-process-mapping (vector,(0,2,1)) --ckpoint-num -1 --global-inherited-env 21 'LC_PAPER=en_DK.UTF-8' 'TERM=xterm' 'SHELL=/bin/bash' 'SSH_CLIENT=10.2.131.222 59209 22' 'SSH_TTY=/dev/pts/0' 'USER=adovis' 'LD_LIBRARY_PATH=:/home/adovis/fbx/opt/lib/:/opt/lib/boost_1_58_0/:/home/adovis/fbx/opt/lib/:/opt/lib/boost_1_58_0/' 'MV2_ENABLE_AFFINITY=0' 'MAIL=/var/mail/adovis' 'PATH=/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games:/home/adovis/bin:/usr/sbin:/usr/local/sbin' 'LC_COLLATE=C' 'PWD=/home/adovis' 'JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64/' 'LANG=en_US.UTF-8' 'LC_MEASUREMENT=en_DK.UTF-8' 'SHLVL=1' 'HOME=/home/adovis' 'LOGNAME=adovis' 'SSH_CONNECTION=10.2.131.222 59209 10.1.212.71 22' 'LC_TIME=en_DK.UTF-8' '_=/opt/mvapich2-2.1/bin/mpiexec' --global-user-env 0 --global-system-env 1 'GFORTRAN_UNBUFFERED_PRECONNECTED=y' --proxy-core-count 1 --exec --exec-appnum 0 --exec-proc-count 1 --exec-local-env 0 --exec-wdir /home/adovis --exec-args 1 /home/adovis/osu-micro-benchmarks-4.4.1/mpi/pt2pt/osu_bw 

Arguments being passed to proxy 1:
--version 3.1.4 --iface-ip-env-name MPIR_CVAR_CH3_INTERFACE_HOSTNAME --hostname r630-01 --global-core-map 0,1,2 --pmi-id-map 0,1 --global-process-count 2 --auto-cleanup 1 --pmi-kvsname kvs_4992_0 --pmi-process-mapping (vector,(0,2,1)) --ckpoint-num -1 --global-inherited-env 21 'LC_PAPER=en_DK.UTF-8' 'TERM=xterm' 'SHELL=/bin/bash' 'SSH_CLIENT=10.2.131.222 59209 22' 'SSH_TTY=/dev/pts/0' 'USER=adovis' 'LD_LIBRARY_PATH=:/home/adovis/fbx/opt/lib/:/opt/lib/boost_1_58_0/:/home/adovis/fbx/opt/lib/:/opt/lib/boost_1_58_0/' 'MV2_ENABLE_AFFINITY=0' 'MAIL=/var/mail/adovis' 'PATH=/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games:/home/adovis/bin:/usr/sbin:/usr/local/sbin' 'LC_COLLATE=C' 'PWD=/home/adovis' 'JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64/' 'LANG=en_US.UTF-8' 'LC_MEASUREMENT=en_DK.UTF-8' 'SHLVL=1' 'HOME=/home/adovis' 'LOGNAME=adovis' 'SSH_CONNECTION=10.2.131.222 59209 10.1.212.71 22' 'LC_TIME=en_DK.UTF-8' '_=/opt/mvapich2-2.1/bin/mpiexec' --global-user-env 0 --global-system-env 1 'GFORTRAN_UNBUFFERED_PRECONNECTED=y' --proxy-core-count 1 --exec --exec-appnum 0 --exec-proc-count 1 --exec-local-env 0 --exec-wdir /home/adovis --exec-args 1 /home/adovis/osu-micro-benchmarks-4.4.1/mpi/pt2pt/osu_bw 

[mpiexec at r630-01] Launch arguments: /usr/bin/ssh -x fdr1 "/opt/mvapich2-2.1/bin/hydra_pmi_proxy" --control-port r630-01:39227 --debug --rmk user --launcher ssh --demux poll --pgid 0 --retries 10 --usize -2 --proxy-id 0 
[mpiexec at r630-01] Launch arguments: /opt/mvapich2-2.1/bin/hydra_pmi_proxy --control-port r630-01:39227 --debug --rmk user --launcher ssh --demux poll --pgid 0 --retries 10 --usize -2 --proxy-id 1 
[proxy:0:1 at r630-01] got pmi command (from 0): init
pmi_version=1 pmi_subversion=1 
[proxy:0:1 at r630-01] PMI response: cmd=response_to_init pmi_version=1 pmi_subversion=1 rc=0
[proxy:0:1 at r630-01] got pmi command (from 0): get_maxes

[proxy:0:1 at r630-01] PMI response: cmd=maxes kvsname_max=256 keylen_max=64 vallen_max=1024
[proxy:0:1 at r630-01] got pmi command (from 0): get_appnum

[proxy:0:1 at r630-01] PMI response: cmd=appnum appnum=0
[proxy:0:1 at r630-01] got pmi command (from 0): get_my_kvsname

[proxy:0:1 at r630-01] PMI response: cmd=my_kvsname kvsname=kvs_4992_0
[proxy:0:1 at r630-01] got pmi command (from 0): get_my_kvsname

[proxy:0:1 at r630-01] PMI response: cmd=my_kvsname kvsname=kvs_4992_0
[proxy:0:1 at r630-01] got pmi command (from 0): get
kvsname=kvs_4992_0 key=PMI_process_mapping 
[proxy:0:1 at r630-01] PMI response: cmd=get_result rc=0 msg=success value=(vector,(0,2,1))
[proxy:0:1 at r630-01] got pmi command (from 0): put
kvsname=kvs_4992_0 key=hostname[1] value=08323329 
[proxy:0:1 at r630-01] cached command: hostname[1]=08323329
[proxy:0:1 at r630-01] PMI response: cmd=put_result rc=0 msg=success
[proxy:0:1 at r630-01] got pmi command (from 0): barrier_in

[proxy:0:1 at r630-01] flushing 1 put command(s) out
[proxy:0:1 at r630-01] forwarding command (cmd=put hostname[1]=08323329) upstream
[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=put hostname[1]=08323329
[proxy:0:1 at r630-01] forwarding command (cmd=barrier_in) upstream
[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
[proxy:0:0 at fdr1] got pmi command (from 4): init
pmi_version=1 pmi_subversion=1 
[proxy:0:0 at fdr1] PMI response: cmd=response_to_init pmi_version=1 pmi_subversion=1 rc=0
[proxy:0:0 at fdr1] got pmi command (from 4): get_maxes

[proxy:0:0 at fdr1] PMI response: cmd=maxes kvsname_max=256 keylen_max=64 vallen_max=1024
[proxy:0:0 at fdr1] got pmi command (from 4): get_appnum

[proxy:0:0 at fdr1] PMI response: cmd=appnum appnum=0
[proxy:0:0 at fdr1] got pmi command (from 4): get_my_kvsname

[proxy:0:0 at fdr1] PMI response: cmd=my_kvsname kvsname=kvs_4992_0
[proxy:0:0 at fdr1] got pmi command (from 4): get_my_kvsname

[proxy:0:0 at fdr1] PMI response: cmd=my_kvsname kvsname=kvs_4992_0
[proxy:0:0 at fdr1] got pmi command (from 4): get
kvsname=kvs_4992_0 key=PMI_process_mapping 
[proxy:0:0 at fdr1] PMI response: cmd=get_result rc=0 msg=success value=(vector,(0,2,1))
[proxy:0:0 at fdr1] got pmi command (from 4): put
kvsname=kvs_4992_0 key=hostname[0] value=17448404 
[proxy:0:0 at fdr1] cached command: hostname[0]=17448404
[proxy:0:0 at fdr1] PMI response: cmd=put_result rc=0 msg=success
[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=put hostname[0]=17448404
[proxy:0:0 at fdr1] got pmi command (from 4): barrier_in

[proxy:0:0 at fdr1] flushing 1 put command(s) out
[proxy:0:0 at fdr1] forwarding command (cmd=put hostname[0]=17448404) upstream
[proxy:0:0 at fdr1] forwarding command (cmd=barrier_in) upstream
[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
[mpiexec at r630-01] PMI response to fd 9 pid 4: cmd=keyval_cache hostname[1]=08323329 hostname[0]=17448404 
[mpiexec at r630-01] PMI response to fd 6 pid 4: cmd=keyval_cache hostname[1]=08323329 hostname[0]=17448404 
[mpiexec at r630-01] PMI response to fd 9 pid 4: cmd=barrier_out
[mpiexec at r630-01] PMI response to fd 6 pid 4: cmd=barrier_out
[proxy:0:1 at r630-01] PMI response: cmd=barrier_out
[proxy:0:1 at r630-01] got pmi command (from 0): get
kvsname=kvs_4992_0 key=hostname[0] 
[proxy:0:1 at r630-01] PMI response: cmd=get_result rc=0 msg=success value=17448404
[proxy:0:0 at fdr1] PMI response: cmd=barrier_out
[proxy:0:0 at fdr1] got pmi command (from 4): get
kvsname=kvs_4992_0 key=hostname[1] 
[proxy:0:0 at fdr1] PMI response: cmd=get_result rc=0 msg=success value=08323329
[proxy:0:1 at r630-01] got pmi command (from 0): put
kvsname=kvs_4992_0 key=MVAPICH2_0001 value=00000011:00000099:0000009a: 
[proxy:0:1 at r630-01] cached command: MVAPICH2_0001=00000011:00000099:0000009a:
[proxy:0:1 at r630-01] PMI response: cmd=put_result rc=0 msg=success
[proxy:0:1 at r630-01] got pmi command (from 0): barrier_in

[proxy:0:1 at r630-01] flushing 1 put command(s) out
[proxy:0:1 at r630-01] forwarding command (cmd=put MVAPICH2_0001=00000011:00000099:0000009a:) upstream
[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=put MVAPICH2_0001=00000011:00000099:0000009a:
[proxy:0:1 at r630-01] forwarding command (cmd=barrier_in) upstream
[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
[proxy:0:0 at fdr1] got pmi command (from 4): put
kvsname=kvs_4992_0 key=MVAPICH2_0000 value=00000004:00000304:00000305: 
[proxy:0:0 at fdr1] cached command: MVAPICH2_0000=00000004:00000304:00000305:
[proxy:0:0 at fdr1] PMI response: cmd=put_result rc=0 msg=success
[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=put MVAPICH2_0000=00000004:00000304:00000305:
[proxy:0:0 at fdr1] got pmi command (from 4): barrier_in

[proxy:0:0 at fdr1] flushing 1 put command(s) out
[proxy:0:0 at fdr1] forwarding command (cmd=put MVAPICH2_0000=00000004:00000304:00000305:) upstream
[proxy:0:0 at fdr1] forwarding command (cmd=barrier_in) upstream
[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
[mpiexec at r630-01] PMI response to fd 9 pid 4: cmd=keyval_cache MVAPICH2_0001=00000011:00000099:0000009a: MVAPICH2_0000=00000004:00000304:00000305: 
[mpiexec at r630-01] PMI response to fd 6 pid 4: cmd=keyval_cache MVAPICH2_0001=00000011:00000099:0000009a: MVAPICH2_0000=00000004:00000304:00000305: 
[mpiexec at r630-01] PMI response to fd 9 pid 4: cmd=barrier_out
[mpiexec at r630-01] PMI response to fd 6 pid 4: cmd=barrier_out
[proxy:0:1 at r630-01] PMI response: cmd=barrier_out
[proxy:0:1 at r630-01] got pmi command (from 0): get
kvsname=kvs_4992_0 key=MVAPICH2_0000 
[proxy:0:1 at r630-01] PMI response: cmd=get_result rc=0 msg=success value=00000004:00000304:00000305:
[proxy:0:1 at r630-01] got pmi command (from 0): get
kvsname=kvs_4992_0 key=MVAPICH2_0000 
[proxy:0:1 at r630-01] PMI response: cmd=get_result rc=0 msg=success value=00000004:00000304:00000305:
[proxy:0:1 at r630-01] got pmi command (from 0): barrier_in

[proxy:0:1 at r630-01] forwarding command (cmd=barrier_in) upstream
[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
[proxy:0:0 at fdr1] PMI response: cmd=barrier_out
[proxy:0:0 at fdr1] got pmi command (from 4): get
kvsname=kvs_4992_0 key=MVAPICH2_0001 
[proxy:0:0 at fdr1] PMI response: cmd=get_result rc=0 msg=success value=00000011:00000099:0000009a:
[proxy:0:0 at fdr1] got pmi command (from 4): get
kvsname=kvs_4992_0 key=MVAPICH2_0001 
[proxy:0:0 at fdr1] PMI response: cmd=get_result rc=0 msg=success value=00000011:00000099:0000009a:
[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
[mpiexec at r630-01] PMI response to fd 9 pid 4: cmd=barrier_out
[mpiexec at r630-01] PMI response to fd 6 pid 4: cmd=barrier_out
[proxy:0:0 at fdr1] got pmi command (from 4): barrier_in

[proxy:0:0 at fdr1] forwarding command (cmd=barrier_in) upstream
[proxy:0:1 at r630-01] PMI response: cmd=barrier_out
[proxy:0:0 at fdr1] PMI response: cmd=barrier_out
[proxy:0:1 at r630-01] got pmi command (from 0): barrier_in

[proxy:0:1 at r630-01] forwarding command (cmd=barrier_in) upstream
[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
[mpiexec at r630-01] PMI response to fd 9 pid 4: cmd=barrier_out
[mpiexec at r630-01] PMI response to fd 6 pid 4: cmd=barrier_out
[proxy:0:0 at fdr1] got pmi command (from 4): barrier_in

[proxy:0:0 at fdr1] forwarding command (cmd=barrier_in) upstream
[proxy:0:1 at r630-01] PMI response: cmd=barrier_out
[proxy:0:1 at r630-01] got pmi command (from 0): barrier_in

[proxy:0:1 at r630-01] forwarding command (cmd=barrier_in) upstream
[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
[proxy:0:0 at fdr1] PMI response: cmd=barrier_out
[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
[mpiexec at r630-01] PMI response to fd 9 pid 4: cmd=barrier_out
[mpiexec at r630-01] PMI response to fd 6 pid 4: cmd=barrier_out
[proxy:0:0 at fdr1] got pmi command (from 4): barrier_in

[proxy:0:0 at fdr1] forwarding command (cmd=barrier_in) upstream
[proxy:0:1 at r630-01] PMI response: cmd=barrier_out
[proxy:0:0 at fdr1] PMI response: cmd=barrier_out
[proxy:0:1 at r630-01] got pmi command (from 0): barrier_in

[proxy:0:1 at r630-01] forwarding command (cmd=barrier_in) upstream
[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
[mpiexec at r630-01] PMI response to fd 9 pid 4: cmd=barrier_out
[mpiexec at r630-01] PMI response to fd 6 pid 4: cmd=barrier_out
[proxy:0:0 at fdr1] got pmi command (from 4): barrier_in

[proxy:0:0 at fdr1] forwarding command (cmd=barrier_in) upstream
[proxy:0:1 at r630-01] PMI response: cmd=barrier_out
[proxy:0:0 at fdr1] PMI response: cmd=barrier_out
[proxy:0:1 at r630-01] got pmi command (from 0): barrier_in

[proxy:0:1 at r630-01] forwarding command (cmd=barrier_in) upstream
[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
[mpiexec at r630-01] PMI response to fd 9 pid 4: cmd=barrier_out
[mpiexec at r630-01] PMI response to fd 6 pid 4: cmd=barrier_out
[proxy:0:0 at fdr1] got pmi command (from 4): barrier_in

[proxy:0:0 at fdr1] forwarding command (cmd=barrier_in) upstream
[proxy:0:1 at r630-01] PMI response: cmd=barrier_out
[proxy:0:1 at r630-01] got pmi command (from 0): barrier_in

[proxy:0:1 at r630-01] forwarding command (cmd=barrier_in) upstream
[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
[proxy:0:0 at fdr1] PMI response: cmd=barrier_out
[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
[mpiexec at r630-01] PMI response to fd 9 pid 4: cmd=barrier_out
[mpiexec at r630-01] PMI response to fd 6 pid 4: cmd=barrier_out
[proxy:0:0 at fdr1] got pmi command (from 4): barrier_in

[proxy:0:0 at fdr1] forwarding command (cmd=barrier_in) upstream
[proxy:0:1 at r630-01] PMI response: cmd=barrier_out
[proxy:0:1 at r630-01] got pmi command (from 0): barrier_in

[proxy:0:1 at r630-01] forwarding command (cmd=barrier_in) upstream
[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
[proxy:0:0 at fdr1] PMI response: cmd=barrier_out
[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
[mpiexec at r630-01] PMI response to fd 9 pid 4: cmd=barrier_out
[mpiexec at r630-01] PMI response to fd 6 pid 4: cmd=barrier_out
[proxy:0:0 at fdr1] got pmi command (from 4): barrier_in

[proxy:0:0 at fdr1] forwarding command (cmd=barrier_in) upstream
[proxy:0:1 at r630-01] PMI response: cmd=barrier_out
[proxy:0:0 at fdr1] PMI response: cmd=barrier_out
# OSU MPI Bandwidth Test v4.4.1
# Size      Bandwidth (MB/s)
1                       2.93
2                       5.84
4                      11.61
8                      22.82
16                     44.77
32                     91.21
64                    179.39
128                   341.07
256                   680.59
512                  1313.70
1024                 2463.06
2048                 3993.00
4096                 5147.77
8192                 5669.63
16384                5701.75
32768                5969.73
65536                6117.85
131072               6243.84
262144               6306.77
524288               6340.28
1048576              6356.89
2097152              6362.19
4194304              6273.45
8388608              6334.72
16777216             5762.50
[proxy:0:1 at r630-01] got pmi command (from 0): barrier_in

[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
[proxy:0:1 at r630-01] forwarding command (cmd=barrier_in) upstream
[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
[mpiexec at r630-01] PMI response to fd 9 pid 4: cmd=barrier_out
[mpiexec at r630-01] PMI response to fd 6 pid 4: cmd=barrier_out
[proxy:0:0 at fdr1] got pmi command (from 4): barrier_in

[proxy:0:0 at fdr1] forwarding command (cmd=barrier_in) upstream
[proxy:0:1 at r630-01] PMI response: cmd=barrier_out
[proxy:0:0 at fdr1] PMI response: cmd=barrier_out
[proxy:0:1 at r630-01] got pmi command (from 0): finalize

[proxy:0:1 at r630-01] PMI response: cmd=finalize_ack
[proxy:0:0 at fdr1] got pmi command (from 4): finalize

[proxy:0:0 at fdr1] PMI response: cmd=finalize_ack



