[mvapich-discuss] mvapich hangs at startup on some new machines
Dovis Alessandro
adovis at student.ethz.ch
Mon Jun 15 07:34:57 EDT 2015
Hello Jonathan,
The cpus are (from /proc/cpuinfo):
Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz .
The Mellanox interface (from lspci -vv):
Network controller: Mellanox Technologies MT27500 Family [ConnectX-3]
Subsystem: Mellanox Technologies Device 0051
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 64 bytes
Interrupt: pin A routed to IRQ 80
Region 0: Memory at d0f00000 (64-bit, non-prefetchable) [size=1M]
Region 2: Memory at cc000000 (64-bit, prefetchable) [size=8M]
Expansion ROM at d0000000 [disabled] [size=1M]
Capabilities: <access denied>
Kernel driver in use: mlx4_core
I have run different experiments with MV2_SHOW_ENV_INFO:
- if I run osu_bw on two new machines, it hangs without printing any parameters (it probably hangs before reaching that point);
- if I run it on two old machines, I get:
MVAPICH2-2.1 Parameters
---------------------------------------------------------------------
PROCESSOR ARCH NAME : MV2_ARCH_INTEL_GENERIC
PROCESSOR FAMILY NAME : MV2_CPU_FAMILY_INTEL
PROCESSOR MODEL NUMBER : 62
HCA NAME : MV2_HCA_MLX_CX_FDR
HETEROGENEOUS HCA : NO
MV2_VBUF_TOTAL_SIZE : 16384
MV2_IBA_EAGER_THRESHOLD : 16384
MV2_RDMA_FAST_PATH_BUF_SIZE : 5120
MV2_PUT_FALLBACK_THRESHOLD : 8192
MV2_GET_FALLBACK_THRESHOLD : 0
MV2_EAGERSIZE_1SC : 4096
MV2_SMP_EAGERSIZE : 65537
MV2_SMPI_LENGTH_QUEUE : 262144
MV2_SMP_NUM_SEND_BUFFER : 256
MV2_SMP_BATCH_SIZE : 8
---------------------------------------------------------------------
---------------------------------------------------------------------
- if I run with one process on an old machine and one on a new machine, I get:
MVAPICH2-2.1 Parameters
---------------------------------------------------------------------
PROCESSOR ARCH NAME : MV2_ARCH_INTEL_GENERIC
PROCESSOR FAMILY NAME : MV2_CPU_FAMILY_INTEL
PROCESSOR MODEL NUMBER : 62
HCA NAME : MV2_HCA_MLX_CX_FDR
HETEROGENEOUS HCA : NO
MV2_VBUF_TOTAL_SIZE : 16384
MV2_IBA_EAGER_THRESHOLD : 16384
MV2_RDMA_FAST_PATH_BUF_SIZE : 5120
MV2_PUT_FALLBACK_THRESHOLD : 8192
MV2_GET_FALLBACK_THRESHOLD : 0
MV2_EAGERSIZE_1SC : 4096
MV2_SMP_EAGERSIZE : 65537
MV2_SMPI_LENGTH_QUEUE : 262144
MV2_SMP_NUM_SEND_BUFFER : 256
MV2_SMP_BATCH_SIZE : 8
---------------------------------------------------------------------
---------------------------------------------------------------------
- if I run it with '-n 1' on a single new machine, it shows the following (before failing, as expected, with a complaint about the number of processes):
MVAPICH2-2.1 Parameters
---------------------------------------------------------------------
PROCESSOR ARCH NAME : MV2_ARCH_INTEL_GENERIC
PROCESSOR FAMILY NAME : MV2_CPU_FAMILY_INTEL
PROCESSOR MODEL NUMBER : 63
HCA NAME : MV2_HCA_MLX_CX_FDR
HETEROGENEOUS HCA : NO
MV2_EAGERSIZE_1SC : 0
MV2_SMP_EAGERSIZE : 65537
MV2_SMPI_LENGTH_QUEUE : 262144
MV2_SMP_NUM_SEND_BUFFER : 256
MV2_SMP_BATCH_SIZE : 8
---------------------------------------------------------------------
---------------------------------------------------------------------
This test requires exactly two processes
For the backtraces, I will recompile the library with debugging symbols and follow up.
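Roughly, I plan to reconfigure along these lines (a sketch only; --enable-g=dbg and --disable-fast are the usual MPICH-style debug build flags, and the install prefix is just illustrative):

```shell
# Sketch only: the commands are echoed so they can be reviewed first;
# drop the 'echo' prefixes and run from the MVAPICH2 source tree.
PREFIX=/opt/mvapich2-2.1-dbg   # illustrative debug install prefix
echo "./configure --prefix=$PREFIX --enable-g=dbg --disable-fast"
echo "make -j4 && make install"
```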
Thanks,
Alessandro
________________________________________
From: Jonathan Perkins [perkinjo at cse.ohio-state.edu]
Sent: Friday, June 12, 2015 9:12 PM
To: Dovis Alessandro; mvapich-discuss at cse.ohio-state.edu
Subject: Re: [mvapich-discuss] mvapich hangs at startup on some new machines
Hello. Can you share with us the CPU architecture and the type of HCA you're using on your new systems?
In addition, can you do a one-process run with the MV2_SHOW_ENV_INFO variable set to 1? It may also be useful to send us the backtrace of the process(es) when it hangs.
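Concretely, something like the following should cover both requests (a sketch; the paths are the ones used in this thread, and the commands are echoed so the script is safe to inspect anywhere — drop the echoes to actually run them on a node):

```shell
MPIEXEC=/opt/mvapich2-2.1/bin/mpiexec
BENCH="$HOME/osu-micro-benchmarks-4.4.1/mpi/pt2pt/osu_bw"

# 1) One-process run that prints the selected parameters at startup:
echo "MV2_SHOW_ENV_INFO=1 $MPIEXEC -n 1 $BENCH"

# 2) While a run hangs, attach gdb on each node and dump all thread
#    backtraces of the stuck rank (assumes gdb and pgrep are available):
echo "gdb -p \$(pgrep -n -u \$USER osu_bw) -batch -ex 'thread apply all bt'"
```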
On Fri, Jun 12, 2015 at 7:48 AM Dovis Alessandro <adovis at student.ethz.ch<mailto:adovis at student.ethz.ch>> wrote:
Hello everyone,
A new cluster of machines has been installed and connected to the same Mellanox InfiniBand switch as the machines I was already using with MVAPICH (everything works fine there).
I have installed MVAPICH on the new machines (reconfiguring and recompiling, since they have a different architecture and kernel). Both clusters use MVAPICH2 2.1.
If I run `/opt/mvapich2-2.1/bin/mpiexec --host machine1,machine2 -n 2 ~/osu-micro-benchmarks-4.4.1/mpi/pt2pt/osu_bw`, I see the following behaviour:
- runs well if both machines are in the old cluster;
- runs well if one machine is in the old cluster and one in the new cluster;
- hangs at startup if both machines are in the new cluster.
I have seen the same behaviour with other executables, e.g. a simple 'hello world'.
Below are the outputs for both a hanging execution and a successful one.
Thank you for your help.
Best,
Alessandro Dovis
--------------------------------------------------------------------------------------------------------------------
Copy-paste of the output of a hanging execution (both nodes in the new cluster), run with the '-v' flag:
host: r630-04
host: r630-01
==================================================================================================
mpiexec options:
----------------
Base path: /opt/mvapich2-2.1/bin/
Launcher: (null)
Debug level: 1
Enable X: -1
Global environment:
-------------------
LC_PAPER=en_DK.UTF-8
TERM=xterm
SHELL=/bin/bash
SSH_CLIENT=10.2.131.222 59209 22
SSH_TTY=/dev/pts/0
USER=adovis
LD_LIBRARY_PATH=:/home/adovis/fbx/opt/lib/:/opt/lib/boost_1_58_0/:/home/adovis/fbx/opt/lib/:/opt/lib/boost_1_58_0/
MV2_ENABLE_AFFINITY=0
MAIL=/var/mail/adovis
PATH=/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games:/home/adovis/bin:/usr/sbin:/usr/local/sbin
LC_COLLATE=C
PWD=/home/adovis
JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64/
LANG=en_US.UTF-8
LC_MEASUREMENT=en_DK.UTF-8
SHLVL=1
HOME=/home/adovis
LOGNAME=adovis
SSH_CONNECTION=10.2.131.222 59209 10.1.212.71 22
LC_TIME=en_DK.UTF-8
_=/opt/mvapich2-2.1/bin/mpiexec
Hydra internal environment:
---------------------------
GFORTRAN_UNBUFFERED_PRECONNECTED=y
Proxy information:
*********************
[1] proxy: r630-04 (1 cores)
Exec list: /home/adovis/osu-micro-benchmarks-4.4.1/mpi/pt2pt/osu_bw (1 processes);
[2] proxy: r630-01 (1 cores)
Exec list: /home/adovis/osu-micro-benchmarks-4.4.1/mpi/pt2pt/osu_bw (1 processes);
==================================================================================================
[mpiexec at r630-01] Timeout set to -1 (-1 means infinite)
[mpiexec at r630-01] Got a control port string of r630-01:48645
Proxy launch args: /opt/mvapich2-2.1/bin/hydra_pmi_proxy --control-port r630-01:48645 --debug --rmk user --launcher ssh --demux poll --pgid 0 --retries 10 --usize -2 --proxy-id
Arguments being passed to proxy 0:
--version 3.1.4 --iface-ip-env-name MPIR_CVAR_CH3_INTERFACE_HOSTNAME --hostname r630-04 --global-core-map 0,1,2 --pmi-id-map 0,0 --global-process-count 2 --auto-cleanup 1 --pmi-kvsname kvs_4979_0 --pmi-process-mapping (vector,(0,2,1)) --ckpoint-num -1 --global-inherited-env 21 'LC_PAPER=en_DK.UTF-8' 'TERM=xterm' 'SHELL=/bin/bash' 'SSH_CLIENT=10.2.131.222 59209 22' 'SSH_TTY=/dev/pts/0' 'USER=adovis' 'LD_LIBRARY_PATH=:/home/adovis/fbx/opt/lib/:/opt/lib/boost_1_58_0/:/home/adovis/fbx/opt/lib/:/opt/lib/boost_1_58_0/' 'MV2_ENABLE_AFFINITY=0' 'MAIL=/var/mail/adovis' 'PATH=/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games:/home/adovis/bin:/usr/sbin:/usr/local/sbin' 'LC_COLLATE=C' 'PWD=/home/adovis' 'JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64/' 'LANG=en_US.UTF-8' 'LC_MEASUREMENT=en_DK.UTF-8' 'SHLVL=1' 'HOME=/home/adovis' 'LOGNAME=adovis' 'SSH_CONNECTION=10.2.131.222 59209 10.1.212.71 22' 'LC_TIME=en_DK.UTF-8' '_=/opt/mvapich2-2.1/bin/mpiexec' --global-user-env 0 --global-system-env 1 'GFORTRAN_UNBUFFERED_PRECONNECTED=y' --proxy-core-count 1 --exec --exec-appnum 0 --exec-proc-count 1 --exec-local-env 0 --exec-wdir /home/adovis --exec-args 1 /home/adovis/osu-micro-benchmarks-4.4.1/mpi/pt2pt/osu_bw
Arguments being passed to proxy 1:
--version 3.1.4 --iface-ip-env-name MPIR_CVAR_CH3_INTERFACE_HOSTNAME --hostname r630-01 --global-core-map 0,1,2 --pmi-id-map 0,1 --global-process-count 2 --auto-cleanup 1 --pmi-kvsname kvs_4979_0 --pmi-process-mapping (vector,(0,2,1)) --ckpoint-num -1 --global-inherited-env 21 'LC_PAPER=en_DK.UTF-8' 'TERM=xterm' 'SHELL=/bin/bash' 'SSH_CLIENT=10.2.131.222 59209 22' 'SSH_TTY=/dev/pts/0' 'USER=adovis' 'LD_LIBRARY_PATH=:/home/adovis/fbx/opt/lib/:/opt/lib/boost_1_58_0/:/home/adovis/fbx/opt/lib/:/opt/lib/boost_1_58_0/' 'MV2_ENABLE_AFFINITY=0' 'MAIL=/var/mail/adovis' 'PATH=/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games:/home/adovis/bin:/usr/sbin:/usr/local/sbin' 'LC_COLLATE=C' 'PWD=/home/adovis' 'JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64/' 'LANG=en_US.UTF-8' 'LC_MEASUREMENT=en_DK.UTF-8' 'SHLVL=1' 'HOME=/home/adovis' 'LOGNAME=adovis' 'SSH_CONNECTION=10.2.131.222 59209 10.1.212.71 22' 'LC_TIME=en_DK.UTF-8' '_=/opt/mvapich2-2.1/bin/mpiexec' --global-user-env 0 --global-system-env 1 'GFORTRAN_UNBUFFERED_PRECONNECTED=y' --proxy-core-count 1 --exec --exec-appnum 0 --exec-proc-count 1 --exec-local-env 0 --exec-wdir /home/adovis --exec-args 1 /home/adovis/osu-micro-benchmarks-4.4.1/mpi/pt2pt/osu_bw
[mpiexec at r630-01] Launch arguments: /usr/bin/ssh -x r630-04 "/opt/mvapich2-2.1/bin/hydra_pmi_proxy" --control-port r630-01:48645 --debug --rmk user --launcher ssh --demux poll --pgid 0 --retries 10 --usize -2 --proxy-id 0
[mpiexec at r630-01] Launch arguments: /opt/mvapich2-2.1/bin/hydra_pmi_proxy --control-port r630-01:48645 --debug --rmk user --launcher ssh --demux poll --pgid 0 --retries 10 --usize -2 --proxy-id 1
[proxy:0:1 at r630-01] got pmi command (from 0): init
pmi_version=1 pmi_subversion=1
[proxy:0:1 at r630-01] PMI response: cmd=response_to_init pmi_version=1 pmi_subversion=1 rc=0
[proxy:0:1 at r630-01] got pmi command (from 0): get_maxes
[proxy:0:1 at r630-01] PMI response: cmd=maxes kvsname_max=256 keylen_max=64 vallen_max=1024
[proxy:0:1 at r630-01] got pmi command (from 0): get_appnum
[proxy:0:1 at r630-01] PMI response: cmd=appnum appnum=0
[proxy:0:1 at r630-01] got pmi command (from 0): get_my_kvsname
[proxy:0:1 at r630-01] PMI response: cmd=my_kvsname kvsname=kvs_4979_0
[proxy:0:1 at r630-01] got pmi command (from 0): get_my_kvsname
[proxy:0:1 at r630-01] PMI response: cmd=my_kvsname kvsname=kvs_4979_0
[proxy:0:1 at r630-01] got pmi command (from 0): get
kvsname=kvs_4979_0 key=PMI_process_mapping
[proxy:0:1 at r630-01] PMI response: cmd=get_result rc=0 msg=success value=(vector,(0,2,1))
[proxy:0:1 at r630-01] got pmi command (from 0): put
kvsname=kvs_4979_0 key=hostname[1] value=08323329
[proxy:0:1 at r630-01] cached command: hostname[1]=08323329
[proxy:0:1 at r630-01] PMI response: cmd=put_result rc=0 msg=success
[proxy:0:1 at r630-01] got pmi command (from 0): barrier_in
[proxy:0:1 at r630-01] flushing 1 put command(s) out
[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=put hostname[1]=08323329
[proxy:0:1 at r630-01] forwarding command (cmd=put hostname[1]=08323329) upstream
[proxy:0:1 at r630-01] forwarding command (cmd=barrier_in) upstream
[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
[proxy:0:0 at r630-04] got pmi command (from 4): init
pmi_version=1 pmi_subversion=1
[proxy:0:0 at r630-04] PMI response: cmd=response_to_init pmi_version=1 pmi_subversion=1 rc=0
[proxy:0:0 at r630-04] got pmi command (from 4): get_maxes
[proxy:0:0 at r630-04] PMI response: cmd=maxes kvsname_max=256 keylen_max=64 vallen_max=1024
[proxy:0:0 at r630-04] got pmi command (from 4): get_appnum
[proxy:0:0 at r630-04] PMI response: cmd=appnum appnum=0
[proxy:0:0 at r630-04] got pmi command (from 4): get_my_kvsname
[proxy:0:0 at r630-04] PMI response: cmd=my_kvsname kvsname=kvs_4979_0
[proxy:0:0 at r630-04] got pmi command (from 4): get_my_kvsname
[proxy:0:0 at r630-04] PMI response: cmd=my_kvsname kvsname=kvs_4979_0
[proxy:0:0 at r630-04] got pmi command (from 4): get
kvsname=kvs_4979_0 key=PMI_process_mapping
[proxy:0:0 at r630-04] PMI response: cmd=get_result rc=0 msg=success value=(vector,(0,2,1))
[proxy:0:0 at r630-04] got pmi command (from 4): put
kvsname=kvs_4979_0 key=hostname[0] value=08323329
[proxy:0:0 at r630-04] cached command: hostname[0]=08323329
[proxy:0:0 at r630-04] PMI response: cmd=put_result rc=0 msg=success
[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=put hostname[0]=08323329
[proxy:0:0 at r630-04] got pmi command (from 4): barrier_in
[proxy:0:0 at r630-04] flushing 1 put command(s) out
[proxy:0:0 at r630-04] forwarding command (cmd=put hostname[0]=08323329) upstream
[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
[mpiexec at r630-01] PMI response to fd 9 pid 4: cmd=keyval_cache hostname[1]=08323329 hostname[0]=08323329
[mpiexec at r630-01] PMI response to fd 6 pid 4: cmd=keyval_cache hostname[1]=08323329 hostname[0]=08323329
[mpiexec at r630-01] PMI response to fd 9 pid 4: cmd=barrier_out
[mpiexec at r630-01] PMI response to fd 6 pid 4: cmd=barrier_out
[proxy:0:0 at r630-04] forwarding command (cmd=barrier_in) upstream
[proxy:0:1 at r630-01] PMI response: cmd=barrier_out
[proxy:0:0 at r630-04] PMI response: cmd=barrier_out
[proxy:0:1 at r630-01] got pmi command (from 0): get
kvsname=kvs_4979_0 key=hostname[0]
[proxy:0:1 at r630-01] PMI response: cmd=get_result rc=0 msg=success value=08323329
[proxy:0:0 at r630-04] got pmi command (from 4): get
kvsname=kvs_4979_0 key=hostname[1]
[proxy:0:0 at r630-04] PMI response: cmd=get_result rc=0 msg=success value=08323329
[proxy:0:0 at r630-04] got pmi command (from 4): put
kvsname=kvs_4979_0 key=MVAPICH2_0000 value=00000008:00000070:00000071:
[proxy:0:1 at r630-01] got pmi command (from 0): put
kvsname=kvs_4979_0 key=MVAPICH2_0001 value=00000011:00000097:00000098:
[proxy:0:1 at r630-01] cached command: MVAPICH2_0001=00000011:00000097:00000098:
[proxy:0:1 at r630-01] PMI response: cmd=put_result rc=0 msg=success
[proxy:0:0 at r630-04] cached command: MVAPICH2_0000=00000008:00000070:00000071:
[proxy:0:0 at r630-04] PMI response: cmd=put_result rc=0 msg=success
[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=put MVAPICH2_0000=00000008:00000070:00000071:
[proxy:0:0 at r630-04] got pmi command (from 4): barrier_in
[proxy:0:0 at r630-04] flushing 1 put command(s) out
[proxy:0:0 at r630-04] forwarding command (cmd=put MVAPICH2_0000=00000008:00000070:00000071:) upstream
[proxy:0:0 at r630-04] forwarding command (cmd=barrier_in) upstream
[proxy:0:1 at r630-01] got pmi command (from 0): barrier_in
[proxy:0:1 at r630-01] flushing 1 put command(s) out
[proxy:0:1 at r630-01] forwarding command (cmd=put MVAPICH2_0001=00000011:00000097:00000098:) upstream
[proxy:0:1 at r630-01] forwarding command (cmd=barrier_in) upstream
[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=put MVAPICH2_0001=00000011:00000097:00000098:
[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
[mpiexec at r630-01] PMI response to fd 9 pid 0: cmd=keyval_cache MVAPICH2_0000=00000008:00000070:00000071: MVAPICH2_0001=00000011:00000097:00000098:
[mpiexec at r630-01] PMI response to fd 6 pid 0: cmd=keyval_cache MVAPICH2_0000=00000008:00000070:00000071: MVAPICH2_0001=00000011:00000097:00000098:
[mpiexec at r630-01] PMI response to fd 9 pid 0: cmd=barrier_out
[mpiexec at r630-01] PMI response to fd 6 pid 0: cmd=barrier_out
[proxy:0:1 at r630-01] PMI response: cmd=barrier_out
[proxy:0:1 at r630-01] got pmi command (from 0): get
kvsname=kvs_4979_0 key=MVAPICH2_0000
[proxy:0:1 at r630-01] PMI response: cmd=get_result rc=0 msg=success value=00000008:00000070:00000071:
[proxy:0:0 at r630-04] PMI response: cmd=barrier_out
[proxy:0:1 at r630-01] got pmi command (from 0): get
kvsname=kvs_4979_0 key=MVAPICH2_0000
[proxy:0:1 at r630-01] PMI response: cmd=get_result rc=0 msg=success value=00000008:00000070:00000071:
[proxy:0:0 at r630-04] got pmi command (from 4): get
kvsname=kvs_4979_0 key=MVAPICH2_0001
[proxy:0:0 at r630-04] PMI response: cmd=get_result rc=0 msg=success value=00000011:00000097:00000098:
[proxy:0:1 at r630-01] got pmi command (from 0): barrier_in
[proxy:0:1 at r630-01] forwarding command (cmd=barrier_in) upstream
[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
[proxy:0:0 at r630-04] got pmi command (from 4): get
kvsname=kvs_4979_0 key=MVAPICH2_0001
[proxy:0:0 at r630-04] PMI response: cmd=get_result rc=0 msg=success value=00000011:00000097:00000098:
[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
[mpiexec at r630-01] PMI response to fd 9 pid 4: cmd=barrier_out
[mpiexec at r630-01] PMI response to fd 6 pid 4: cmd=barrier_out
[proxy:0:0 at r630-04] got pmi command (from 4): barrier_in
[proxy:0:0 at r630-04] forwarding command (cmd=barrier_in) upstream
[proxy:0:1 at r630-01] PMI response: cmd=barrier_out
[proxy:0:0 at r630-04] PMI response: cmd=barrier_out
[proxy:0:1 at r630-01] got pmi command (from 0): barrier_in
[proxy:0:1 at r630-01] forwarding command (cmd=barrier_in) upstream
[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
[mpiexec at r630-01] PMI response to fd 9 pid 4: cmd=barrier_out
[mpiexec at r630-01] PMI response to fd 6 pid 4: cmd=barrier_out
[proxy:0:0 at r630-04] got pmi command (from 4): barrier_in
[proxy:0:0 at r630-04] forwarding command (cmd=barrier_in) upstream
[proxy:0:1 at r630-01] PMI response: cmd=barrier_out
[proxy:0:0 at r630-04] PMI response: cmd=barrier_out
[proxy:0:1 at r630-01] got pmi command (from 0): barrier_in
[proxy:0:1 at r630-01] forwarding command (cmd=barrier_in) upstream
[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
[mpiexec at r630-01] PMI response to fd 9 pid 4: cmd=barrier_out
[mpiexec at r630-01] PMI response to fd 6 pid 4: cmd=barrier_out
[proxy:0:0 at r630-04] got pmi command (from 4): barrier_in
[proxy:0:0 at r630-04] forwarding command (cmd=barrier_in) upstream
[proxy:0:1 at r630-01] PMI response: cmd=barrier_out
[proxy:0:0 at r630-04] PMI response: cmd=barrier_out
[proxy:0:1 at r630-01] got pmi command (from 0): barrier_in
[proxy:0:1 at r630-01] forwarding command (cmd=barrier_in) upstream
[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
[proxy:0:0 at r630-04] got pmi command (from 4): barrier_in
[proxy:0:0 at r630-04] forwarding command (cmd=barrier_in) upstream
[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
[mpiexec at r630-01] PMI response to fd 9 pid 4: cmd=barrier_out
[mpiexec at r630-01] PMI response to fd 6 pid 4: cmd=barrier_out
[proxy:0:1 at r630-01] PMI response: cmd=barrier_out
[proxy:0:0 at r630-04] PMI response: cmd=barrier_out
[proxy:0:1 at r630-01] got pmi command (from 0): barrier_in
[proxy:0:1 at r630-01] forwarding command (cmd=barrier_in) upstream
[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
[mpiexec at r630-01] PMI response to fd 9 pid 4: cmd=barrier_out
[mpiexec at r630-01] PMI response to fd 6 pid 4: cmd=barrier_out
[proxy:0:0 at r630-04] got pmi command (from 4): barrier_in
[proxy:0:0 at r630-04] forwarding command (cmd=barrier_in) upstream
[proxy:0:1 at r630-01] PMI response: cmd=barrier_out
[proxy:0:1 at r630-01] got pmi command (from 0): barrier_in
[proxy:0:1 at r630-01] forwarding command (cmd=barrier_in) upstream
[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
[proxy:0:0 at r630-04] PMI response: cmd=barrier_out
[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
[mpiexec at r630-01] PMI response to fd 9 pid 4: cmd=barrier_out
[mpiexec at r630-01] PMI response to fd 6 pid 4: cmd=barrier_out
[proxy:0:0 at r630-04] got pmi command (from 4): barrier_in
[proxy:0:0 at r630-04] forwarding command (cmd=barrier_in) upstream
[proxy:0:1 at r630-01] PMI response: cmd=barrier_out
[proxy:0:1 at r630-01] got pmi command (from 0): barrier_in
[proxy:0:1 at r630-01] forwarding command (cmd=barrier_in) upstream
[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
[proxy:0:0 at r630-04] PMI response: cmd=barrier_out
[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
[mpiexec at r630-01] PMI response to fd 9 pid 4: cmd=barrier_out
[mpiexec at r630-01] PMI response to fd 6 pid 4: cmd=barrier_out
[proxy:0:0 at r630-04] got pmi command (from 4): barrier_in
[proxy:0:0 at r630-04] forwarding command (cmd=barrier_in) upstream
[proxy:0:1 at r630-01] PMI response: cmd=barrier_out
[proxy:0:0 at r630-04] PMI response: cmd=barrier_out
{here it hangs forever}
--------------------------------------------------------------------------------------------------------------------
A successful run (again with '-v') instead looks like the following:
host: fdr1
host: r630-01
==================================================================================================
mpiexec options:
----------------
Base path: /opt/mvapich2-2.1/bin/
Launcher: (null)
Debug level: 1
Enable X: -1
Global environment:
-------------------
LC_PAPER=en_DK.UTF-8
TERM=xterm
SHELL=/bin/bash
SSH_CLIENT=10.2.131.222 59209 22
SSH_TTY=/dev/pts/0
USER=adovis
LD_LIBRARY_PATH=:/home/adovis/fbx/opt/lib/:/opt/lib/boost_1_58_0/:/home/adovis/fbx/opt/lib/:/opt/lib/boost_1_58_0/
MV2_ENABLE_AFFINITY=0
MAIL=/var/mail/adovis
PATH=/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games:/home/adovis/bin:/usr/sbin:/usr/local/sbin
LC_COLLATE=C
PWD=/home/adovis
JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64/
LANG=en_US.UTF-8
LC_MEASUREMENT=en_DK.UTF-8
SHLVL=1
HOME=/home/adovis
LOGNAME=adovis
SSH_CONNECTION=10.2.131.222 59209 10.1.212.71 22
LC_TIME=en_DK.UTF-8
_=/opt/mvapich2-2.1/bin/mpiexec
Hydra internal environment:
---------------------------
GFORTRAN_UNBUFFERED_PRECONNECTED=y
Proxy information:
*********************
[1] proxy: fdr1 (1 cores)
Exec list: /home/adovis/osu-micro-benchmarks-4.4.1/mpi/pt2pt/osu_bw (1 processes);
[2] proxy: r630-01 (1 cores)
Exec list: /home/adovis/osu-micro-benchmarks-4.4.1/mpi/pt2pt/osu_bw (1 processes);
==================================================================================================
[mpiexec at r630-01] Timeout set to -1 (-1 means infinite)
[mpiexec at r630-01] Got a control port string of r630-01:39227
Proxy launch args: /opt/mvapich2-2.1/bin/hydra_pmi_proxy --control-port r630-01:39227 --debug --rmk user --launcher ssh --demux poll --pgid 0 --retries 10 --usize -2 --proxy-id
Arguments being passed to proxy 0:
--version 3.1.4 --iface-ip-env-name MPIR_CVAR_CH3_INTERFACE_HOSTNAME --hostname fdr1 --global-core-map 0,1,2 --pmi-id-map 0,0 --global-process-count 2 --auto-cleanup 1 --pmi-kvsname kvs_4992_0 --pmi-process-mapping (vector,(0,2,1)) --ckpoint-num -1 --global-inherited-env 21 'LC_PAPER=en_DK.UTF-8' 'TERM=xterm' 'SHELL=/bin/bash' 'SSH_CLIENT=10.2.131.222 59209 22' 'SSH_TTY=/dev/pts/0' 'USER=adovis' 'LD_LIBRARY_PATH=:/home/adovis/fbx/opt/lib/:/opt/lib/boost_1_58_0/:/home/adovis/fbx/opt/lib/:/opt/lib/boost_1_58_0/' 'MV2_ENABLE_AFFINITY=0' 'MAIL=/var/mail/adovis' 'PATH=/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games:/home/adovis/bin:/usr/sbin:/usr/local/sbin' 'LC_COLLATE=C' 'PWD=/home/adovis' 'JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64/' 'LANG=en_US.UTF-8' 'LC_MEASUREMENT=en_DK.UTF-8' 'SHLVL=1' 'HOME=/home/adovis' 'LOGNAME=adovis' 'SSH_CONNECTION=10.2.131.222 59209 10.1.212.71 22' 'LC_TIME=en_DK.UTF-8' '_=/opt/mvapich2-2.1/bin/mpiexec' --global-user-env 0 --global-system-env 1 'GFORTRAN_UNBUFFERED_PRECONNECTED=y' --proxy-core-count 1 --exec --exec-appnum 0 --exec-proc-count 1 --exec-local-env 0 --exec-wdir /home/adovis --exec-args 1 /home/adovis/osu-micro-benchmarks-4.4.1/mpi/pt2pt/osu_bw
Arguments being passed to proxy 1:
--version 3.1.4 --iface-ip-env-name MPIR_CVAR_CH3_INTERFACE_HOSTNAME --hostname r630-01 --global-core-map 0,1,2 --pmi-id-map 0,1 --global-process-count 2 --auto-cleanup 1 --pmi-kvsname kvs_4992_0 --pmi-process-mapping (vector,(0,2,1)) --ckpoint-num -1 --global-inherited-env 21 'LC_PAPER=en_DK.UTF-8' 'TERM=xterm' 'SHELL=/bin/bash' 'SSH_CLIENT=10.2.131.222 59209 22' 'SSH_TTY=/dev/pts/0' 'USER=adovis' 'LD_LIBRARY_PATH=:/home/adovis/fbx/opt/lib/:/opt/lib/boost_1_58_0/:/home/adovis/fbx/opt/lib/:/opt/lib/boost_1_58_0/' 'MV2_ENABLE_AFFINITY=0' 'MAIL=/var/mail/adovis' 'PATH=/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games:/home/adovis/bin:/usr/sbin:/usr/local/sbin' 'LC_COLLATE=C' 'PWD=/home/adovis' 'JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64/' 'LANG=en_US.UTF-8' 'LC_MEASUREMENT=en_DK.UTF-8' 'SHLVL=1' 'HOME=/home/adovis' 'LOGNAME=adovis' 'SSH_CONNECTION=10.2.131.222 59209 10.1.212.71 22' 'LC_TIME=en_DK.UTF-8' '_=/opt/mvapich2-2.1/bin/mpiexec' --global-user-env 0 --global-system-env 1 'GFORTRAN_UNBUFFERED_PRECONNECTED=y' --proxy-core-count 1 --exec --exec-appnum 0 --exec-proc-count 1 --exec-local-env 0 --exec-wdir /home/adovis --exec-args 1 /home/adovis/osu-micro-benchmarks-4.4.1/mpi/pt2pt/osu_bw
[mpiexec at r630-01] Launch arguments: /usr/bin/ssh -x fdr1 "/opt/mvapich2-2.1/bin/hydra_pmi_proxy" --control-port r630-01:39227 --debug --rmk user --launcher ssh --demux poll --pgid 0 --retries 10 --usize -2 --proxy-id 0
[mpiexec at r630-01] Launch arguments: /opt/mvapich2-2.1/bin/hydra_pmi_proxy --control-port r630-01:39227 --debug --rmk user --launcher ssh --demux poll --pgid 0 --retries 10 --usize -2 --proxy-id 1
[proxy:0:1 at r630-01] got pmi command (from 0): init
pmi_version=1 pmi_subversion=1
[proxy:0:1 at r630-01] PMI response: cmd=response_to_init pmi_version=1 pmi_subversion=1 rc=0
[proxy:0:1 at r630-01] got pmi command (from 0): get_maxes
[proxy:0:1 at r630-01] PMI response: cmd=maxes kvsname_max=256 keylen_max=64 vallen_max=1024
[proxy:0:1 at r630-01] got pmi command (from 0): get_appnum
[proxy:0:1 at r630-01] PMI response: cmd=appnum appnum=0
[proxy:0:1 at r630-01] got pmi command (from 0): get_my_kvsname
[proxy:0:1 at r630-01] PMI response: cmd=my_kvsname kvsname=kvs_4992_0
[proxy:0:1 at r630-01] got pmi command (from 0): get_my_kvsname
[proxy:0:1 at r630-01] PMI response: cmd=my_kvsname kvsname=kvs_4992_0
[proxy:0:1 at r630-01] got pmi command (from 0): get
kvsname=kvs_4992_0 key=PMI_process_mapping
[proxy:0:1 at r630-01] PMI response: cmd=get_result rc=0 msg=success value=(vector,(0,2,1))
[proxy:0:1 at r630-01] got pmi command (from 0): put
kvsname=kvs_4992_0 key=hostname[1] value=08323329
[proxy:0:1 at r630-01] cached command: hostname[1]=08323329
[proxy:0:1 at r630-01] PMI response: cmd=put_result rc=0 msg=success
[proxy:0:1 at r630-01] got pmi command (from 0): barrier_in
[proxy:0:1 at r630-01] flushing 1 put command(s) out
[proxy:0:1 at r630-01] forwarding command (cmd=put hostname[1]=08323329) upstream
[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=put hostname[1]=08323329
[proxy:0:1 at r630-01] forwarding command (cmd=barrier_in) upstream
[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
[proxy:0:0 at fdr1] got pmi command (from 4): init
pmi_version=1 pmi_subversion=1
[proxy:0:0 at fdr1] PMI response: cmd=response_to_init pmi_version=1 pmi_subversion=1 rc=0
[proxy:0:0 at fdr1] got pmi command (from 4): get_maxes
[proxy:0:0 at fdr1] PMI response: cmd=maxes kvsname_max=256 keylen_max=64 vallen_max=1024
[proxy:0:0 at fdr1] got pmi command (from 4): get_appnum
[proxy:0:0 at fdr1] PMI response: cmd=appnum appnum=0
[proxy:0:0 at fdr1] got pmi command (from 4): get_my_kvsname
[proxy:0:0 at fdr1] PMI response: cmd=my_kvsname kvsname=kvs_4992_0
[proxy:0:0 at fdr1] got pmi command (from 4): get_my_kvsname
[proxy:0:0 at fdr1] PMI response: cmd=my_kvsname kvsname=kvs_4992_0
[proxy:0:0 at fdr1] got pmi command (from 4): get
kvsname=kvs_4992_0 key=PMI_process_mapping
[proxy:0:0 at fdr1] PMI response: cmd=get_result rc=0 msg=success value=(vector,(0,2,1))
[proxy:0:0 at fdr1] got pmi command (from 4): put
kvsname=kvs_4992_0 key=hostname[0] value=17448404
[proxy:0:0 at fdr1] cached command: hostname[0]=17448404
[proxy:0:0 at fdr1] PMI response: cmd=put_result rc=0 msg=success
[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=put hostname[0]=17448404
[proxy:0:0 at fdr1] got pmi command (from 4): barrier_in
[proxy:0:0 at fdr1] flushing 1 put command(s) out
[proxy:0:0 at fdr1] forwarding command (cmd=put hostname[0]=17448404) upstream
[proxy:0:0 at fdr1] forwarding command (cmd=barrier_in) upstream
[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
[mpiexec at r630-01] PMI response to fd 9 pid 4: cmd=keyval_cache hostname[1]=08323329 hostname[0]=17448404
[mpiexec at r630-01] PMI response to fd 6 pid 4: cmd=keyval_cache hostname[1]=08323329 hostname[0]=17448404
[mpiexec at r630-01] PMI response to fd 9 pid 4: cmd=barrier_out
[mpiexec at r630-01] PMI response to fd 6 pid 4: cmd=barrier_out
[proxy:0:1 at r630-01] PMI response: cmd=barrier_out
[proxy:0:1 at r630-01] got pmi command (from 0): get
kvsname=kvs_4992_0 key=hostname[0]
[proxy:0:1 at r630-01] PMI response: cmd=get_result rc=0 msg=success value=17448404
[proxy:0:0 at fdr1] PMI response: cmd=barrier_out
[proxy:0:0 at fdr1] got pmi command (from 4): get
kvsname=kvs_4992_0 key=hostname[1]
[proxy:0:0 at fdr1] PMI response: cmd=get_result rc=0 msg=success value=08323329
[proxy:0:1 at r630-01] got pmi command (from 0): put
kvsname=kvs_4992_0 key=MVAPICH2_0001 value=00000011:00000099:0000009a:
[proxy:0:1 at r630-01] cached command: MVAPICH2_0001=00000011:00000099:0000009a:
[proxy:0:1 at r630-01] PMI response: cmd=put_result rc=0 msg=success
[proxy:0:1 at r630-01] got pmi command (from 0): barrier_in
[proxy:0:1 at r630-01] flushing 1 put command(s) out
[proxy:0:1 at r630-01] forwarding command (cmd=put MVAPICH2_0001=00000011:00000099:0000009a:) upstream
[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=put MVAPICH2_0001=00000011:00000099:0000009a:
[proxy:0:1 at r630-01] forwarding command (cmd=barrier_in) upstream
[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
[proxy:0:0 at fdr1] got pmi command (from 4): put
kvsname=kvs_4992_0 key=MVAPICH2_0000 value=00000004:00000304:00000305:
[proxy:0:0 at fdr1] cached command: MVAPICH2_0000=00000004:00000304:00000305:
[proxy:0:0 at fdr1] PMI response: cmd=put_result rc=0 msg=success
[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=put MVAPICH2_0000=00000004:00000304:00000305:
[proxy:0:0 at fdr1] got pmi command (from 4): barrier_in
[proxy:0:0 at fdr1] flushing 1 put command(s) out
[proxy:0:0 at fdr1] forwarding command (cmd=put MVAPICH2_0000=00000004:00000304:00000305:) upstream
[proxy:0:0 at fdr1] forwarding command (cmd=barrier_in) upstream
[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
[mpiexec at r630-01] PMI response to fd 9 pid 4: cmd=keyval_cache MVAPICH2_0001=00000011:00000099:0000009a: MVAPICH2_0000=00000004:00000304:00000305:
[mpiexec at r630-01] PMI response to fd 6 pid 4: cmd=keyval_cache MVAPICH2_0001=00000011:00000099:0000009a: MVAPICH2_0000=00000004:00000304:00000305:
[mpiexec at r630-01] PMI response to fd 9 pid 4: cmd=barrier_out
[mpiexec at r630-01] PMI response to fd 6 pid 4: cmd=barrier_out
[proxy:0:1 at r630-01] PMI response: cmd=barrier_out
[proxy:0:1 at r630-01] got pmi command (from 0): get
kvsname=kvs_4992_0 key=MVAPICH2_0000
[proxy:0:1 at r630-01] PMI response: cmd=get_result rc=0 msg=success value=00000004:00000304:00000305:
[proxy:0:1 at r630-01] got pmi command (from 0): get
kvsname=kvs_4992_0 key=MVAPICH2_0000
[proxy:0:1 at r630-01] PMI response: cmd=get_result rc=0 msg=success value=00000004:00000304:00000305:
[proxy:0:1 at r630-01] got pmi command (from 0): barrier_in
[proxy:0:1 at r630-01] forwarding command (cmd=barrier_in) upstream
[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
[proxy:0:0 at fdr1] PMI response: cmd=barrier_out
[proxy:0:0 at fdr1] got pmi command (from 4): get
kvsname=kvs_4992_0 key=MVAPICH2_0001
[proxy:0:0 at fdr1] PMI response: cmd=get_result rc=0 msg=success value=00000011:00000099:0000009a:
[proxy:0:0 at fdr1] got pmi command (from 4): get
kvsname=kvs_4992_0 key=MVAPICH2_0001
[proxy:0:0 at fdr1] PMI response: cmd=get_result rc=0 msg=success value=00000011:00000099:0000009a:
[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
[mpiexec at r630-01] PMI response to fd 9 pid 4: cmd=barrier_out
[mpiexec at r630-01] PMI response to fd 6 pid 4: cmd=barrier_out
[proxy:0:0 at fdr1] got pmi command (from 4): barrier_in
[proxy:0:0 at fdr1] forwarding command (cmd=barrier_in) upstream
[proxy:0:1 at r630-01] PMI response: cmd=barrier_out
[proxy:0:0 at fdr1] PMI response: cmd=barrier_out
[proxy:0:1 at r630-01] got pmi command (from 0): barrier_in
[proxy:0:1 at r630-01] forwarding command (cmd=barrier_in) upstream
[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
[mpiexec at r630-01] PMI response to fd 9 pid 4: cmd=barrier_out
[mpiexec at r630-01] PMI response to fd 6 pid 4: cmd=barrier_out
[proxy:0:0 at fdr1] got pmi command (from 4): barrier_in
[proxy:0:0 at fdr1] forwarding command (cmd=barrier_in) upstream
[proxy:0:1 at r630-01] PMI response: cmd=barrier_out
[proxy:0:1 at r630-01] got pmi command (from 0): barrier_in
[proxy:0:1 at r630-01] forwarding command (cmd=barrier_in) upstream
[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
[proxy:0:0 at fdr1] PMI response: cmd=barrier_out
[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
[mpiexec at r630-01] PMI response to fd 9 pid 4: cmd=barrier_out
[mpiexec at r630-01] PMI response to fd 6 pid 4: cmd=barrier_out
[proxy:0:0 at fdr1] got pmi command (from 4): barrier_in
[proxy:0:0 at fdr1] forwarding command (cmd=barrier_in) upstream
[proxy:0:1 at r630-01] PMI response: cmd=barrier_out
[proxy:0:0 at fdr1] PMI response: cmd=barrier_out
[proxy:0:1 at r630-01] got pmi command (from 0): barrier_in
[proxy:0:1 at r630-01] forwarding command (cmd=barrier_in) upstream
[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
[mpiexec at r630-01] PMI response to fd 9 pid 4: cmd=barrier_out
[mpiexec at r630-01] PMI response to fd 6 pid 4: cmd=barrier_out
[proxy:0:0 at fdr1] got pmi command (from 4): barrier_in
[proxy:0:0 at fdr1] forwarding command (cmd=barrier_in) upstream
[proxy:0:1 at r630-01] PMI response: cmd=barrier_out
[proxy:0:0 at fdr1] PMI response: cmd=barrier_out
[proxy:0:1 at r630-01] got pmi command (from 0): barrier_in
[proxy:0:1 at r630-01] forwarding command (cmd=barrier_in) upstream
[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
[mpiexec at r630-01] PMI response to fd 9 pid 4: cmd=barrier_out
[mpiexec at r630-01] PMI response to fd 6 pid 4: cmd=barrier_out
[proxy:0:0 at fdr1] got pmi command (from 4): barrier_in
[proxy:0:0 at fdr1] forwarding command (cmd=barrier_in) upstream
[proxy:0:1 at r630-01] PMI response: cmd=barrier_out
[proxy:0:1 at r630-01] got pmi command (from 0): barrier_in
[proxy:0:1 at r630-01] forwarding command (cmd=barrier_in) upstream
[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
[proxy:0:0 at fdr1] PMI response: cmd=barrier_out
[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
[mpiexec at r630-01] PMI response to fd 9 pid 4: cmd=barrier_out
[mpiexec at r630-01] PMI response to fd 6 pid 4: cmd=barrier_out
[proxy:0:0 at fdr1] got pmi command (from 4): barrier_in
[proxy:0:0 at fdr1] forwarding command (cmd=barrier_in) upstream
[proxy:0:1 at r630-01] PMI response: cmd=barrier_out
[proxy:0:1 at r630-01] got pmi command (from 0): barrier_in
[proxy:0:1 at r630-01] forwarding command (cmd=barrier_in) upstream
[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
[proxy:0:0 at fdr1] PMI response: cmd=barrier_out
[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
[mpiexec at r630-01] PMI response to fd 9 pid 4: cmd=barrier_out
[mpiexec at r630-01] PMI response to fd 6 pid 4: cmd=barrier_out
[proxy:0:0 at fdr1] got pmi command (from 4): barrier_in
[proxy:0:0 at fdr1] forwarding command (cmd=barrier_in) upstream
[proxy:0:1 at r630-01] PMI response: cmd=barrier_out
[proxy:0:0 at fdr1] PMI response: cmd=barrier_out
# OSU MPI Bandwidth Test v4.4.1
# Size      Bandwidth (MB/s)
1                       2.93
2                       5.84
4                      11.61
8                      22.82
16                     44.77
32                     91.21
64                    179.39
128                   341.07
256                   680.59
512                  1313.70
1024                 2463.06
2048                 3993.00
4096                 5147.77
8192                 5669.63
16384                5701.75
32768                5969.73
65536                6117.85
131072               6243.84
262144               6306.77
524288               6340.28
1048576              6356.89
2097152              6362.19
4194304              6273.45
8388608              6334.72
16777216             5762.50
[proxy:0:1 at r630-01] got pmi command (from 0): barrier_in
[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
[proxy:0:1 at r630-01] forwarding command (cmd=barrier_in) upstream
[mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
[mpiexec at r630-01] PMI response to fd 9 pid 4: cmd=barrier_out
[mpiexec at r630-01] PMI response to fd 6 pid 4: cmd=barrier_out
[proxy:0:0 at fdr1] got pmi command (from 4): barrier_in
[proxy:0:0 at fdr1] forwarding command (cmd=barrier_in) upstream
[proxy:0:1 at r630-01] PMI response: cmd=barrier_out
[proxy:0:0 at fdr1] PMI response: cmd=barrier_out
[proxy:0:1 at r630-01] got pmi command (from 0): finalize
[proxy:0:1 at r630-01] PMI response: cmd=finalize_ack
[proxy:0:0 at fdr1] got pmi command (from 4): finalize
[proxy:0:0 at fdr1] PMI response: cmd=finalize_ack
_______________________________________________
mvapich-discuss mailing list
mvapich-discuss at cse.ohio-state.edu
http://mailman.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss