[mvapich-discuss] mvapich hangs at startup on some new machines

Jonathan Perkins perkinjo at cse.ohio-state.edu
Mon Jun 15 08:21:11 EDT 2015


Thank you for the arch info and backtraces.  We'll take a look at this and
get back to you soon.

On Mon, Jun 15, 2015 at 8:17 AM Dovis Alessandro <adovis at student.ethz.ch>
wrote:

> The backtraces at the point where the processes hang are the following
> (taken by attaching gdb to the hanging processes):
>
> - for the 'mpiexec':
>
> #0  0x00007f8bec4854f0 in poll () from /lib/x86_64-linux-gnu/libc.so.6
> #1  0x000000000045a82a in HYDT_dmxu_poll_wait_for_event (wtime=-1) at
> tools/demux/demux_poll.c:39
> #2  0x000000000045a130 in HYDT_dmx_wait_for_event (wtime=-1) at
> tools/demux/demux.c:171
> #3  0x000000000040d003 in HYD_pmci_wait_for_completion (timeout=-1) at
> pm/pmiserv/pmiserv_pmci.c:195
> #4  0x0000000000404aa3 in main (argc=9, argv=0x7ffd55720438) at
> ui/mpich/mpiexec.c:343
>
> - for hydra:
>
> #0  0x00007f7f8b4d84f0 in poll () from /lib/x86_64-linux-gnu/libc.so.6
> #1  0x000000000043b089 in HYDT_dmxu_poll_wait_for_event (wtime=-1) at
> tools/demux/demux_poll.c:39
> #2  0x000000000043a98f in HYDT_dmx_wait_for_event (wtime=-1) at
> tools/demux/demux.c:171
> #3  0x0000000000403c94 in main (argc=17, argv=0x7ffde625b2c8) at
> pm/pmiserv/pmip.c:205
>
> - for the actual program on the first machine:
>
> #0  MPIDI_CH3I_SMP_init (pg=0x28c2770) at
> src/mpid/ch3/channels/mrail/src/rdma/ch3_smp_progress.c:2139
> #1  0x00007fa3a45fa8e0 in MPIDI_CH3_Init (has_parent=0, pg=0x28c2770,
> pg_rank=0) at src/mpid/ch3/channels/mrail/src/rdma/ch3_init.c:445
> #2  0x00007fa3a45dfd2f in MPID_Init (argc=0x7ffcc03385bc,
> argv=0x7ffcc03385b0, requested=0, provided=0x7ffcc0338544,
> has_args=0x7ffcc033854c,
>     has_env=0x7ffcc0338548) at src/mpid/ch3/src/mpid_init.c:357
> #3  0x00007fa3a44be7d6 in MPIR_Init_thread (argc=0x7ffcc03385bc,
> argv=0x7ffcc03385b0, required=0, provided=0x7ffcc0338580)
>     at src/mpi/init/initthread.c:512
> #4  0x00007fa3a44bd204 in PMPI_Init (argc=0x7ffcc03385bc,
> argv=0x7ffcc03385b0) at src/mpi/init/init.c:195
> #5  0x0000000000400c86 in main (argc=1, argv=0x7ffcc03386f8) at
> osu_bw.c:124
>
> - actual program on the second machine:
>
> #0  0x00007fb1bfc846fd in nanosleep () from /lib/x86_64-linux-gnu/libc.so.6
> #1  0x00007fb1bfcac9d4 in usleep () from /lib/x86_64-linux-gnu/libc.so.6
> #2  0x00007fb1c0433bd4 in MPIDI_CH3I_SMP_init (pg=0x35e8770) at
> src/mpid/ch3/channels/mrail/src/rdma/ch3_smp_progress.c:1918
> #3  0x00007fb1c041b8e0 in MPIDI_CH3_Init (has_parent=0, pg=0x35e8770,
> pg_rank=1) at src/mpid/ch3/channels/mrail/src/rdma/ch3_init.c:445
> #4  0x00007fb1c0400d2f in MPID_Init (argc=0x7fff0a176bfc,
> argv=0x7fff0a176bf0, requested=0, provided=0x7fff0a176b84,
> has_args=0x7fff0a176b8c,
>     has_env=0x7fff0a176b88) at src/mpid/ch3/src/mpid_init.c:357
> #5  0x00007fb1c02df7d6 in MPIR_Init_thread (argc=0x7fff0a176bfc,
> argv=0x7fff0a176bf0, required=0, provided=0x7fff0a176bc0)
>     at src/mpi/init/initthread.c:512
> #6  0x00007fb1c02de204 in PMPI_Init (argc=0x7fff0a176bfc,
> argv=0x7fff0a176bf0) at src/mpi/init/init.c:195
> #7  0x0000000000400c86 in main (argc=1, argv=0x7fff0a176d38) at
> osu_bw.c:124
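For reference, backtraces like the above can be captured non-interactively by attaching gdb to each hanging rank. A minimal sketch (the PID below is illustrative; in practice something like `pgrep -f osu_bw` would supply it):

```shell
# Build the gdb invocation used to dump a hanging rank's stack.
# PID 12345 is illustrative, not taken from the runs above.
pid=12345
cmd="gdb -p $pid -batch -ex 'thread apply all bt'"
echo "$cmd"
```

`-batch` makes gdb exit after running the `-ex` commands, so this can be scripted across all hanging processes.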
>
>
> Best,
> Alessandro
> ________________________________________
> From: Dovis  Alessandro
> Sent: Monday, June 15, 2015 1:34 PM
> To: Jonathan Perkins; mvapich-discuss at cse.ohio-state.edu
> Subject: RE: [mvapich-discuss] mvapich hangs at startup on some new
> machines
>
> Hello Jonathan,
>
> The cpus are (from /proc/cpuinfo):
>         Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz .
> The Mellanox interface (from lspci -vv):
>         Network controller: Mellanox Technologies MT27500 Family
> [ConnectX-3]
>         Subsystem: Mellanox Technologies Device 0051
>         Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
> ParErr- Stepping- SERR- FastB2B- DisINTx+
>         Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
> <TAbort- <MAbort- >SERR- <PERR- INTx-
>         Latency: 0, Cache Line Size: 64 bytes
>         Interrupt: pin A routed to IRQ 80
>         Region 0: Memory at d0f00000 (64-bit, non-prefetchable) [size=1M]
>         Region 2: Memory at cc000000 (64-bit, prefetchable) [size=8M]
>         Expansion ROM at d0000000 [disabled] [size=1M]
>         Capabilities: <access denied>
>         Kernel driver in use: mlx4_core
>
> I have run different experiments with MV2_SHOW_ENV_INFO:
>
> - if I run osu_bw across 2 machines, both in the new cluster, it hangs without
> showing any parameters (it probably hangs before reaching that point);
>
> - if I run both on old machines, I get:
>
>  MVAPICH2-2.1 Parameters
> ---------------------------------------------------------------------
>         PROCESSOR ARCH NAME            : MV2_ARCH_INTEL_GENERIC
>         PROCESSOR FAMILY NAME          : MV2_CPU_FAMILY_INTEL
>         PROCESSOR MODEL NUMBER         : 62
>         HCA NAME                       : MV2_HCA_MLX_CX_FDR
>         HETEROGENEOUS HCA              : NO
>         MV2_VBUF_TOTAL_SIZE            : 16384
>         MV2_IBA_EAGER_THRESHOLD        : 16384
>         MV2_RDMA_FAST_PATH_BUF_SIZE    : 5120
>         MV2_PUT_FALLBACK_THRESHOLD     : 8192
>         MV2_GET_FALLBACK_THRESHOLD     : 0
>         MV2_EAGERSIZE_1SC              : 4096
>         MV2_SMP_EAGERSIZE              : 65537
>         MV2_SMPI_LENGTH_QUEUE          : 262144
>         MV2_SMP_NUM_SEND_BUFFER        : 256
>         MV2_SMP_BATCH_SIZE             : 8
> ---------------------------------------------------------------------
> ---------------------------------------------------------------------
>
> - if I run one process on an old machine and one on a new machine, I get:
>
>  MVAPICH2-2.1 Parameters
> ---------------------------------------------------------------------
>         PROCESSOR ARCH NAME            : MV2_ARCH_INTEL_GENERIC
>         PROCESSOR FAMILY NAME          : MV2_CPU_FAMILY_INTEL
>         PROCESSOR MODEL NUMBER         : 62
>         HCA NAME                       : MV2_HCA_MLX_CX_FDR
>         HETEROGENEOUS HCA              : NO
>         MV2_VBUF_TOTAL_SIZE            : 16384
>         MV2_IBA_EAGER_THRESHOLD        : 16384
>         MV2_RDMA_FAST_PATH_BUF_SIZE    : 5120
>         MV2_PUT_FALLBACK_THRESHOLD     : 8192
>         MV2_GET_FALLBACK_THRESHOLD     : 0
>         MV2_EAGERSIZE_1SC              : 4096
>         MV2_SMP_EAGERSIZE              : 65537
>         MV2_SMPI_LENGTH_QUEUE          : 262144
>         MV2_SMP_NUM_SEND_BUFFER        : 256
>         MV2_SMP_BATCH_SIZE             : 8
> ---------------------------------------------------------------------
> ---------------------------------------------------------------------
>
> - if I run it with '-n 1' on a single new machine, it shows (before
> failing, as expected, with a complaint about the number of processes):
>
>  MVAPICH2-2.1 Parameters
> ---------------------------------------------------------------------
>         PROCESSOR ARCH NAME            : MV2_ARCH_INTEL_GENERIC
>         PROCESSOR FAMILY NAME          : MV2_CPU_FAMILY_INTEL
>         PROCESSOR MODEL NUMBER         : 63
>         HCA NAME                       : MV2_HCA_MLX_CX_FDR
>         HETEROGENEOUS HCA              : NO
>         MV2_EAGERSIZE_1SC              : 0
>         MV2_SMP_EAGERSIZE              : 65537
>         MV2_SMPI_LENGTH_QUEUE          : 262144
>         MV2_SMP_NUM_SEND_BUFFER        : 256
>         MV2_SMP_BATCH_SIZE             : 8
> ---------------------------------------------------------------------
> ---------------------------------------------------------------------
> This test requires exactly two processes
>
> For the backtraces, I will recompile the library with debugging enabled and
> let you know.
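A debug rebuild along those lines can be sketched as follows. This is an assumption, not the poster's actual command: the install prefix is illustrative, and `--enable-g=dbg` / `--disable-fast` are the standard MPICH-style configure options that MVAPICH2 2.1 inherits for debug symbols.

```shell
# Sketch: reconfigure MVAPICH2 with debug info so backtraces show file:line.
# Prefix is illustrative; run from the MVAPICH2 source directory.
./configure --prefix=/opt/mvapich2-2.1-dbg --enable-g=dbg --disable-fast
make -j4 && make install
```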
>
> Thanks,
> Alessandro
>
>
> ________________________________________
> From: Jonathan Perkins [perkinjo at cse.ohio-state.edu]
> Sent: Friday, June 12, 2015 9:12 PM
> To: Dovis  Alessandro; mvapich-discuss at cse.ohio-state.edu
> Subject: Re: [mvapich-discuss] mvapich hangs at startup on some new
> machines
>
> Hello.  Can you share with us the CPU architecture and the type of HCA
> you're using on your new systems?
>
> In addition to this, can you do a one-process run while setting the
> MV2_SHOW_ENV_INFO variable to 1?  It may also be useful to send us the
> backtrace of the process(es) when it hangs.
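A minimal sketch of such a one-process run (the mpiexec and benchmark paths are illustrative, not taken from the thread):

```shell
# Sketch: enable parameter printing for a single-rank run.
export MV2_SHOW_ENV_INFO=1
cmd="mpiexec -n 1 ./osu_bw"
echo "MV2_SHOW_ENV_INFO=$MV2_SHOW_ENV_INFO $cmd"
```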
>
> On Fri, Jun 12, 2015 at 7:48 AM Dovis Alessandro <adovis at student.ethz.ch
> <mailto:adovis at student.ethz.ch>> wrote:
> Hello everyone,
>
> a new cluster of machines has been installed and connected to the same
> Mellanox Infiniband switch as other machines I was already using with
> MVAPICH (everything works fine there).
> I have installed MVAPICH on the new machines (reconfiguring and
> recompiling, because they have a different architecture and kernel). Both
> clusters use MVAPICH2 2.1.
>
> If I run `/opt/mvapich2-2.1/bin/mpiexec --host machine1,machine2 -n 2
> ~/osu-micro-benchmarks-4.4.1/mpi/pt2pt/osu_bw`, I see the following
> behaviour:
> - runs fine if both machines are in the old cluster;
> - runs fine if one machine is in the old cluster and one in the new cluster;
> - hangs at startup if both machines are in the new cluster.
>
> I have seen the same behaviour with other executables, e.g. a simple
> 'hello world'.
> Below are the outputs for both a hanging execution and a successful one.
>
> Thank you for your help.
>
> Best,
> Alessandro Dovis
>
>
>
> --------------------------------------------------------------------------------------------------------------------
>
> Copy-paste of the output of the hanging execution (on the new cluster),
> with the '-v' flag:
>
> host: r630-04
> host: r630-01
>
>
> ==================================================================================================
> mpiexec options:
> ----------------
>   Base path: /opt/mvapich2-2.1/bin/
>   Launcher: (null)
>   Debug level: 1
>   Enable X: -1
>
>   Global environment:
>   -------------------
>     LC_PAPER=en_DK.UTF-8
>     TERM=xterm
>     SHELL=/bin/bash
>     SSH_CLIENT=10.2.131.222 59209 22
>     SSH_TTY=/dev/pts/0
>     USER=adovis
>
> LD_LIBRARY_PATH=:/home/adovis/fbx/opt/lib/:/opt/lib/boost_1_58_0/:/home/adovis/fbx/opt/lib/:/opt/lib/boost_1_58_0/
>     MV2_ENABLE_AFFINITY=0
>     MAIL=/var/mail/adovis
>
> PATH=/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games:/home/adovis/bin:/usr/sbin:/usr/local/sbin
>     LC_COLLATE=C
>     PWD=/home/adovis
>     JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64/
>     LANG=en_US.UTF-8
>     LC_MEASUREMENT=en_DK.UTF-8
>     SHLVL=1
>     HOME=/home/adovis
>     LOGNAME=adovis
>     SSH_CONNECTION=10.2.131.222 59209 10.1.212.71 22
>     LC_TIME=en_DK.UTF-8
>     _=/opt/mvapich2-2.1/bin/mpiexec
>
>   Hydra internal environment:
>   ---------------------------
>     GFORTRAN_UNBUFFERED_PRECONNECTED=y
>
>
>     Proxy information:
>     *********************
>       [1] proxy: r630-04 (1 cores)
>       Exec list: /home/adovis/osu-micro-benchmarks-4.4.1/mpi/pt2pt/osu_bw
> (1 processes);
>
>       [2] proxy: r630-01 (1 cores)
>       Exec list: /home/adovis/osu-micro-benchmarks-4.4.1/mpi/pt2pt/osu_bw
> (1 processes);
>
>
>
> ==================================================================================================
>
> [mpiexec at r630-01] Timeout set to -1 (-1 means infinite)
> [mpiexec at r630-01] Got a control port string of r630-01:48645
>
> Proxy launch args: /opt/mvapich2-2.1/bin/hydra_pmi_proxy --control-port
> r630-01:48645 --debug --rmk user --launcher ssh --demux poll --pgid 0
> --retries 10 --usize -2 --proxy-id
>
> Arguments being passed to proxy 0:
> --version 3.1.4 --iface-ip-env-name MPIR_CVAR_CH3_INTERFACE_HOSTNAME
> --hostname r630-04 --global-core-map 0,1,2 --pmi-id-map 0,0
> --global-process-count 2 --auto-cleanup 1 --pmi-kvsname kvs_4979_0
> --pmi-process-mapping (vector,(0,2,1)) --ckpoint-num -1
> --global-inherited-env 21 'LC_PAPER=en_DK.UTF-8' 'TERM=xterm'
> 'SHELL=/bin/bash' 'SSH_CLIENT=10.2.131.222 59209 22' 'SSH_TTY=/dev/pts/0'
> 'USER=adovis'
> 'LD_LIBRARY_PATH=:/home/adovis/fbx/opt/lib/:/opt/lib/boost_1_58_0/:/home/adovis/fbx/opt/lib/:/opt/lib/boost_1_58_0/'
> 'MV2_ENABLE_AFFINITY=0' 'MAIL=/var/mail/adovis'
> 'PATH=/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games:/home/adovis/bin:/usr/sbin:/usr/local/sbin'
> 'LC_COLLATE=C' 'PWD=/home/adovis'
> 'JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64/' 'LANG=en_US.UTF-8'
> 'LC_MEASUREMENT=en_DK.UTF-8' 'SHLVL=1' 'HOME=/home/adovis' 'LOGNAME=adovis'
> 'SSH_CONNECTION=10.2.131.222 59209 10.1.212.71 22' 'LC_TIME=en_DK.UTF-8'
> '_=/opt/mvapich2-2.1/bin/mpiexec' --global-user-env 0 --global-system-env 1
> 'GFORTRAN_UNBUFFERED_PRECONNECTED=y' --proxy-core-count 1 --exec
> --exec-appnum 0 --exec-proc-count 1 --exec-local-env 0 --exec-wdir
> /home/adovis --exec-args 1
> /home/adovis/osu-micro-benchmarks-4.4.1/mpi/pt2pt/osu_bw
>
> Arguments being passed to proxy 1:
> --version 3.1.4 --iface-ip-env-name MPIR_CVAR_CH3_INTERFACE_HOSTNAME
> --hostname r630-01 --global-core-map 0,1,2 --pmi-id-map 0,1
> --global-process-count 2 --auto-cleanup 1 --pmi-kvsname kvs_4979_0
> --pmi-process-mapping (vector,(0,2,1)) --ckpoint-num -1
> --global-inherited-env 21 'LC_PAPER=en_DK.UTF-8' 'TERM=xterm'
> 'SHELL=/bin/bash' 'SSH_CLIENT=10.2.131.222 59209 22' 'SSH_TTY=/dev/pts/0'
> 'USER=adovis'
> 'LD_LIBRARY_PATH=:/home/adovis/fbx/opt/lib/:/opt/lib/boost_1_58_0/:/home/adovis/fbx/opt/lib/:/opt/lib/boost_1_58_0/'
> 'MV2_ENABLE_AFFINITY=0' 'MAIL=/var/mail/adovis'
> 'PATH=/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games:/home/adovis/bin:/usr/sbin:/usr/local/sbin'
> 'LC_COLLATE=C' 'PWD=/home/adovis'
> 'JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64/' 'LANG=en_US.UTF-8'
> 'LC_MEASUREMENT=en_DK.UTF-8' 'SHLVL=1' 'HOME=/home/adovis' 'LOGNAME=adovis'
> 'SSH_CONNECTION=10.2.131.222 59209 10.1.212.71 22' 'LC_TIME=en_DK.UTF-8'
> '_=/opt/mvapich2-2.1/bin/mpiexec' --global-user-env 0 --global-system-env 1
> 'GFORTRAN_UNBUFFERED_PRECONNECTED=y' --proxy-core-count 1 --exec
> --exec-appnum 0 --exec-proc-count 1 --exec-local-env 0 --exec-wdir
> /home/adovis --exec-args 1
> /home/adovis/osu-micro-benchmarks-4.4.1/mpi/pt2pt/osu_bw
>
> [mpiexec at r630-01] Launch arguments: /usr/bin/ssh -x r630-04
> "/opt/mvapich2-2.1/bin/hydra_pmi_proxy" --control-port r630-01:48645
> --debug --rmk user --launcher ssh --demux poll --pgid 0 --retries 10
> --usize -2 --proxy-id 0
> [mpiexec at r630-01] Launch arguments: /opt/mvapich2-2.1/bin/hydra_pmi_proxy
> --control-port r630-01:48645 --debug --rmk user --launcher ssh --demux poll
> --pgid 0 --retries 10 --usize -2 --proxy-id 1
> [proxy:0:1 at r630-01] got pmi command (from 0): init
> pmi_version=1 pmi_subversion=1
> [proxy:0:1 at r630-01] PMI response: cmd=response_to_init pmi_version=1
> pmi_subversion=1 rc=0
> [proxy:0:1 at r630-01] got pmi command (from 0): get_maxes
>
> [proxy:0:1 at r630-01] PMI response: cmd=maxes kvsname_max=256 keylen_max=64
> vallen_max=1024
> [proxy:0:1 at r630-01] got pmi command (from 0): get_appnum
>
> [proxy:0:1 at r630-01] PMI response: cmd=appnum appnum=0
> [proxy:0:1 at r630-01] got pmi command (from 0): get_my_kvsname
>
> [proxy:0:1 at r630-01] PMI response: cmd=my_kvsname kvsname=kvs_4979_0
> [proxy:0:1 at r630-01] got pmi command (from 0): get_my_kvsname
>
> [proxy:0:1 at r630-01] PMI response: cmd=my_kvsname kvsname=kvs_4979_0
> [proxy:0:1 at r630-01] got pmi command (from 0): get
> kvsname=kvs_4979_0 key=PMI_process_mapping
> [proxy:0:1 at r630-01] PMI response: cmd=get_result rc=0 msg=success
> value=(vector,(0,2,1))
> [proxy:0:1 at r630-01] got pmi command (from 0): put
> kvsname=kvs_4979_0 key=hostname[1] value=08323329
> [proxy:0:1 at r630-01] cached command: hostname[1]=08323329
> [proxy:0:1 at r630-01] PMI response: cmd=put_result rc=0 msg=success
> [proxy:0:1 at r630-01] got pmi command (from 0): barrier_in
>
> [proxy:0:1 at r630-01] flushing 1 put command(s) out
> [mpiexec at r630-01] [pgid: 0] got PMI command: cmd=put hostname[1]=08323329
> [proxy:0:1 at r630-01] forwarding command (cmd=put hostname[1]=08323329)
> upstream
> [proxy:0:1 at r630-01] forwarding command (cmd=barrier_in) upstream
> [mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
> [proxy:0:0 at r630-04] got pmi command (from 4): init
> pmi_version=1 pmi_subversion=1
> [proxy:0:0 at r630-04] PMI response: cmd=response_to_init pmi_version=1
> pmi_subversion=1 rc=0
> [proxy:0:0 at r630-04] got pmi command (from 4): get_maxes
>
> [proxy:0:0 at r630-04] PMI response: cmd=maxes kvsname_max=256 keylen_max=64
> vallen_max=1024
> [proxy:0:0 at r630-04] got pmi command (from 4): get_appnum
>
> [proxy:0:0 at r630-04] PMI response: cmd=appnum appnum=0
> [proxy:0:0 at r630-04] got pmi command (from 4): get_my_kvsname
>
> [proxy:0:0 at r630-04] PMI response: cmd=my_kvsname kvsname=kvs_4979_0
> [proxy:0:0 at r630-04] got pmi command (from 4): get_my_kvsname
>
> [proxy:0:0 at r630-04] PMI response: cmd=my_kvsname kvsname=kvs_4979_0
> [proxy:0:0 at r630-04] got pmi command (from 4): get
> kvsname=kvs_4979_0 key=PMI_process_mapping
> [proxy:0:0 at r630-04] PMI response: cmd=get_result rc=0 msg=success
> value=(vector,(0,2,1))
> [proxy:0:0 at r630-04] got pmi command (from 4): put
> kvsname=kvs_4979_0 key=hostname[0] value=08323329
> [proxy:0:0 at r630-04] cached command: hostname[0]=08323329
> [proxy:0:0 at r630-04] PMI response: cmd=put_result rc=0 msg=success
> [mpiexec at r630-01] [pgid: 0] got PMI command: cmd=put hostname[0]=08323329
> [proxy:0:0 at r630-04] got pmi command (from 4): barrier_in
>
> [proxy:0:0 at r630-04] flushing 1 put command(s) out
> [proxy:0:0 at r630-04] forwarding command (cmd=put hostname[0]=08323329)
> upstream
> [mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
> [mpiexec at r630-01] PMI response to fd 9 pid 4: cmd=keyval_cache
> hostname[1]=08323329 hostname[0]=08323329
> [mpiexec at r630-01] PMI response to fd 6 pid 4: cmd=keyval_cache
> hostname[1]=08323329 hostname[0]=08323329
> [mpiexec at r630-01] PMI response to fd 9 pid 4: cmd=barrier_out
> [mpiexec at r630-01] PMI response to fd 6 pid 4: cmd=barrier_out
> [proxy:0:0 at r630-04] forwarding command (cmd=barrier_in) upstream
> [proxy:0:1 at r630-01] PMI response: cmd=barrier_out
> [proxy:0:0 at r630-04] PMI response: cmd=barrier_out
> [proxy:0:1 at r630-01] got pmi command (from 0): get
> kvsname=kvs_4979_0 key=hostname[0]
> [proxy:0:1 at r630-01] PMI response: cmd=get_result rc=0 msg=success
> value=08323329
> [proxy:0:0 at r630-04] got pmi command (from 4): get
> kvsname=kvs_4979_0 key=hostname[1]
> [proxy:0:0 at r630-04] PMI response: cmd=get_result rc=0 msg=success
> value=08323329
> [proxy:0:0 at r630-04] got pmi command (from 4): put
> kvsname=kvs_4979_0 key=MVAPICH2_0000 value=00000008:00000070:00000071:
> [proxy:0:1 at r630-01] got pmi command (from 0): put
> kvsname=kvs_4979_0 key=MVAPICH2_0001 value=00000011:00000097:00000098:
> [proxy:0:1 at r630-01] cached command:
> MVAPICH2_0001=00000011:00000097:00000098:
> [proxy:0:1 at r630-01] PMI response: cmd=put_result rc=0 msg=success
> [proxy:0:0 at r630-04] cached command:
> MVAPICH2_0000=00000008:00000070:00000071:
> [proxy:0:0 at r630-04] PMI response: cmd=put_result rc=0 msg=success
> [mpiexec at r630-01] [pgid: 0] got PMI command: cmd=put
> MVAPICH2_0000=00000008:00000070:00000071:
> [proxy:0:0 at r630-04] got pmi command (from 4): barrier_in
>
> [proxy:0:0 at r630-04] flushing 1 put command(s) out
> [proxy:0:0 at r630-04] forwarding command (cmd=put
> MVAPICH2_0000=00000008:00000070:00000071:) upstream
> [proxy:0:0 at r630-04] forwarding command (cmd=barrier_in) upstream
> [proxy:0:1 at r630-01] got pmi command (from 0): barrier_in
>
> [proxy:0:1 at r630-01] flushing 1 put command(s) out
> [proxy:0:1 at r630-01] forwarding command (cmd=put
> MVAPICH2_0001=00000011:00000097:00000098:) upstream
> [proxy:0:1 at r630-01] forwarding command (cmd=barrier_in) upstream
> [mpiexec at r630-01] [pgid: 0] got PMI command: cmd=put
> MVAPICH2_0001=00000011:00000097:00000098:
> [mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
> [mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
> [mpiexec at r630-01] PMI response to fd 9 pid 0: cmd=keyval_cache
> MVAPICH2_0000=00000008:00000070:00000071:
> MVAPICH2_0001=00000011:00000097:00000098:
> [mpiexec at r630-01] PMI response to fd 6 pid 0: cmd=keyval_cache
> MVAPICH2_0000=00000008:00000070:00000071:
> MVAPICH2_0001=00000011:00000097:00000098:
> [mpiexec at r630-01] PMI response to fd 9 pid 0: cmd=barrier_out
> [mpiexec at r630-01] PMI response to fd 6 pid 0: cmd=barrier_out
> [proxy:0:1 at r630-01] PMI response: cmd=barrier_out
> [proxy:0:1 at r630-01] got pmi command (from 0): get
> kvsname=kvs_4979_0 key=MVAPICH2_0000
> [proxy:0:1 at r630-01] PMI response: cmd=get_result rc=0 msg=success
> value=00000008:00000070:00000071:
> [proxy:0:0 at r630-04] PMI response: cmd=barrier_out
> [proxy:0:1 at r630-01] got pmi command (from 0): get
> kvsname=kvs_4979_0 key=MVAPICH2_0000
> [proxy:0:1 at r630-01] PMI response: cmd=get_result rc=0 msg=success
> value=00000008:00000070:00000071:
> [proxy:0:0 at r630-04] got pmi command (from 4): get
> kvsname=kvs_4979_0 key=MVAPICH2_0001
> [proxy:0:0 at r630-04] PMI response: cmd=get_result rc=0 msg=success
> value=00000011:00000097:00000098:
> [proxy:0:1 at r630-01] got pmi command (from 0): barrier_in
>
> [proxy:0:1 at r630-01] forwarding command (cmd=barrier_in) upstream
> [mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
> [proxy:0:0 at r630-04] got pmi command (from 4): get
> kvsname=kvs_4979_0 key=MVAPICH2_0001
> [proxy:0:0 at r630-04] PMI response: cmd=get_result rc=0 msg=success
> value=00000011:00000097:00000098:
> [mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
> [mpiexec at r630-01] PMI response to fd 9 pid 4: cmd=barrier_out
> [mpiexec at r630-01] PMI response to fd 6 pid 4: cmd=barrier_out
> [proxy:0:0 at r630-04] got pmi command (from 4): barrier_in
>
> [proxy:0:0 at r630-04] forwarding command (cmd=barrier_in) upstream
> [proxy:0:1 at r630-01] PMI response: cmd=barrier_out
> [proxy:0:0 at r630-04] PMI response: cmd=barrier_out
> [proxy:0:1 at r630-01] got pmi command (from 0): barrier_in
>
> [proxy:0:1 at r630-01] forwarding command (cmd=barrier_in) upstream
> [mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
> [mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
> [mpiexec at r630-01] PMI response to fd 9 pid 4: cmd=barrier_out
> [mpiexec at r630-01] PMI response to fd 6 pid 4: cmd=barrier_out
> [proxy:0:0 at r630-04] got pmi command (from 4): barrier_in
>
> [proxy:0:0 at r630-04] forwarding command (cmd=barrier_in) upstream
> [proxy:0:1 at r630-01] PMI response: cmd=barrier_out
> [proxy:0:0 at r630-04] PMI response: cmd=barrier_out
> [proxy:0:1 at r630-01] got pmi command (from 0): barrier_in
>
> [proxy:0:1 at r630-01] forwarding command (cmd=barrier_in) upstream
> [mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
> [mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
> [mpiexec at r630-01] PMI response to fd 9 pid 4: cmd=barrier_out
> [mpiexec at r630-01] PMI response to fd 6 pid 4: cmd=barrier_out
> [proxy:0:0 at r630-04] got pmi command (from 4): barrier_in
>
> [proxy:0:0 at r630-04] forwarding command (cmd=barrier_in) upstream
> [proxy:0:1 at r630-01] PMI response: cmd=barrier_out
> [proxy:0:0 at r630-04] PMI response: cmd=barrier_out
> [proxy:0:1 at r630-01] got pmi command (from 0): barrier_in
>
> [proxy:0:1 at r630-01] forwarding command (cmd=barrier_in) upstream
> [mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
> [proxy:0:0 at r630-04] got pmi command (from 4): barrier_in
>
> [proxy:0:0 at r630-04] forwarding command (cmd=barrier_in) upstream
> [mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
> [mpiexec at r630-01] PMI response to fd 9 pid 4: cmd=barrier_out
> [mpiexec at r630-01] PMI response to fd 6 pid 4: cmd=barrier_out
> [proxy:0:1 at r630-01] PMI response: cmd=barrier_out
> [proxy:0:0 at r630-04] PMI response: cmd=barrier_out
> [proxy:0:1 at r630-01] got pmi command (from 0): barrier_in
>
> [proxy:0:1 at r630-01] forwarding command (cmd=barrier_in) upstream
> [mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
> [mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
> [mpiexec at r630-01] PMI response to fd 9 pid 4: cmd=barrier_out
> [mpiexec at r630-01] PMI response to fd 6 pid 4: cmd=barrier_out
> [proxy:0:0 at r630-04] got pmi command (from 4): barrier_in
>
> [proxy:0:0 at r630-04] forwarding command (cmd=barrier_in) upstream
> [proxy:0:1 at r630-01] PMI response: cmd=barrier_out
> [proxy:0:1 at r630-01] got pmi command (from 0): barrier_in
>
> [proxy:0:1 at r630-01] forwarding command (cmd=barrier_in) upstream
> [mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
> [proxy:0:0 at r630-04] PMI response: cmd=barrier_out
> [mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
> [mpiexec at r630-01] PMI response to fd 9 pid 4: cmd=barrier_out
> [mpiexec at r630-01] PMI response to fd 6 pid 4: cmd=barrier_out
> [proxy:0:0 at r630-04] got pmi command (from 4): barrier_in
>
> [proxy:0:0 at r630-04] forwarding command (cmd=barrier_in) upstream
> [proxy:0:1 at r630-01] PMI response: cmd=barrier_out
> [proxy:0:1 at r630-01] got pmi command (from 0): barrier_in
>
> [proxy:0:1 at r630-01] forwarding command (cmd=barrier_in) upstream
> [mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
> [proxy:0:0 at r630-04] PMI response: cmd=barrier_out
> [mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
> [mpiexec at r630-01] PMI response to fd 9 pid 4: cmd=barrier_out
> [mpiexec at r630-01] PMI response to fd 6 pid 4: cmd=barrier_out
> [proxy:0:0 at r630-04] got pmi command (from 4): barrier_in
>
> [proxy:0:0 at r630-04] forwarding command (cmd=barrier_in) upstream
> [proxy:0:1 at r630-01] PMI response: cmd=barrier_out
> [proxy:0:0 at r630-04] PMI response: cmd=barrier_out
>
> {here it hangs forever}
>
>
> --------------------------------------------------------------------------------------------------------------------
>
> A successful run, by contrast, looks like the following (again with -v):
>
> host: fdr1
> host: r630-01
>
>
> ==================================================================================================
> mpiexec options:
> ----------------
>   Base path: /opt/mvapich2-2.1/bin/
>   Launcher: (null)
>   Debug level: 1
>   Enable X: -1
>
>   Global environment:
>   -------------------
>     LC_PAPER=en_DK.UTF-8
>     TERM=xterm
>     SHELL=/bin/bash
>     SSH_CLIENT=10.2.131.222 59209 22
>     SSH_TTY=/dev/pts/0
>     USER=adovis
>
> LD_LIBRARY_PATH=:/home/adovis/fbx/opt/lib/:/opt/lib/boost_1_58_0/:/home/adovis/fbx/opt/lib/:/opt/lib/boost_1_58_0/
>     MV2_ENABLE_AFFINITY=0
>     MAIL=/var/mail/adovis
>
> PATH=/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games:/home/adovis/bin:/usr/sbin:/usr/local/sbin
>     LC_COLLATE=C
>     PWD=/home/adovis
>     JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64/
>     LANG=en_US.UTF-8
>     LC_MEASUREMENT=en_DK.UTF-8
>     SHLVL=1
>     HOME=/home/adovis
>     LOGNAME=adovis
>     SSH_CONNECTION=10.2.131.222 59209 10.1.212.71 22
>     LC_TIME=en_DK.UTF-8
>     _=/opt/mvapich2-2.1/bin/mpiexec
>
>   Hydra internal environment:
>   ---------------------------
>     GFORTRAN_UNBUFFERED_PRECONNECTED=y
>
>
>     Proxy information:
>     *********************
>       [1] proxy: fdr1 (1 cores)
>       Exec list: /home/adovis/osu-micro-benchmarks-4.4.1/mpi/pt2pt/osu_bw
> (1 processes);
>
>       [2] proxy: r630-01 (1 cores)
>       Exec list: /home/adovis/osu-micro-benchmarks-4.4.1/mpi/pt2pt/osu_bw
> (1 processes);
>
>
>
> ==================================================================================================
>
> [mpiexec at r630-01] Timeout set to -1 (-1 means infinite)
> [mpiexec at r630-01] Got a control port string of r630-01:39227
>
> Proxy launch args: /opt/mvapich2-2.1/bin/hydra_pmi_proxy --control-port
> r630-01:39227 --debug --rmk user --launcher ssh --demux poll --pgid 0
> --retries 10 --usize -2 --proxy-id
>
> Arguments being passed to proxy 0:
> --version 3.1.4 --iface-ip-env-name MPIR_CVAR_CH3_INTERFACE_HOSTNAME
> --hostname fdr1 --global-core-map 0,1,2 --pmi-id-map 0,0
> --global-process-count 2 --auto-cleanup 1 --pmi-kvsname kvs_4992_0
> --pmi-process-mapping (vector,(0,2,1)) --ckpoint-num -1
> --global-inherited-env 21 'LC_PAPER=en_DK.UTF-8' 'TERM=xterm'
> 'SHELL=/bin/bash' 'SSH_CLIENT=10.2.131.222 59209 22' 'SSH_TTY=/dev/pts/0'
> 'USER=adovis'
> 'LD_LIBRARY_PATH=:/home/adovis/fbx/opt/lib/:/opt/lib/boost_1_58_0/:/home/adovis/fbx/opt/lib/:/opt/lib/boost_1_58_0/'
> 'MV2_ENABLE_AFFINITY=0' 'MAIL=/var/mail/adovis'
> 'PATH=/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games:/home/adovis/bin:/usr/sbin:/usr/local/sbin'
> 'LC_COLLATE=C' 'PWD=/home/adovis'
> 'JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64/' 'LANG=en_US.UTF-8'
> 'LC_MEASUREMENT=en_DK.UTF-8' 'SHLVL=1' 'HOME=/home/adovis' 'LOGNAME=adovis'
> 'SSH_CONNECTION=10.2.131.222 59209 10.1.212.71 22' 'LC_TIME=en_DK.UTF-8'
> '_=/opt/mvapich2-2.1/bin/mpiexec' --global-user-env 0 --global-system-env 1
> 'GFORTRAN_UNBUFFERED_PRECONNECTED=y' --proxy-core-count 1 --exec
> --exec-appnum 0 --exec-proc-count 1 --exec-local-env 0 --exec-wdir
> /home/adovis --exec-args 1
> /home/adovis/osu-micro-benchmarks-4.4.1/mpi/pt2pt/osu_bw
>
> Arguments being passed to proxy 1:
> --version 3.1.4 --iface-ip-env-name MPIR_CVAR_CH3_INTERFACE_HOSTNAME
> --hostname r630-01 --global-core-map 0,1,2 --pmi-id-map 0,1
> --global-process-count 2 --auto-cleanup 1 --pmi-kvsname kvs_4992_0
> --pmi-process-mapping (vector,(0,2,1)) --ckpoint-num -1
> --global-inherited-env 21 'LC_PAPER=en_DK.UTF-8' 'TERM=xterm'
> 'SHELL=/bin/bash' 'SSH_CLIENT=10.2.131.222 59209 22' 'SSH_TTY=/dev/pts/0'
> 'USER=adovis'
> 'LD_LIBRARY_PATH=:/home/adovis/fbx/opt/lib/:/opt/lib/boost_1_58_0/:/home/adovis/fbx/opt/lib/:/opt/lib/boost_1_58_0/'
> 'MV2_ENABLE_AFFINITY=0' 'MAIL=/var/mail/adovis'
> 'PATH=/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games:/home/adovis/bin:/usr/sbin:/usr/local/sbin'
> 'LC_COLLATE=C' 'PWD=/home/adovis'
> 'JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64/' 'LANG=en_US.UTF-8'
> 'LC_MEASUREMENT=en_DK.UTF-8' 'SHLVL=1' 'HOME=/home/adovis' 'LOGNAME=adovis'
> 'SSH_CONNECTION=10.2.131.222 59209 10.1.212.71 22' 'LC_TIME=en_DK.UTF-8'
> '_=/opt/mvapich2-2.1/bin/mpiexec' --global-user-env 0 --global-system-env 1
> 'GFORTRAN_UNBUFFERED_PRECONNECTED=y' --proxy-core-count 1 --exec
> --exec-appnum 0 --exec-proc-count 1 --exec-local-env 0 --exec-wdir
> /home/adovis --exec-args 1
> /home/adovis/osu-micro-benchmarks-4.4.1/mpi/pt2pt/osu_bw
>
> [mpiexec at r630-01] Launch arguments: /usr/bin/ssh -x fdr1
> "/opt/mvapich2-2.1/bin/hydra_pmi_proxy" --control-port r630-01:39227
> --debug --rmk user --launcher ssh --demux poll --pgid 0 --retries 10
> --usize -2 --proxy-id 0
> [mpiexec at r630-01] Launch arguments: /opt/mvapich2-2.1/bin/hydra_pmi_proxy
> --control-port r630-01:39227 --debug --rmk user --launcher ssh --demux poll
> --pgid 0 --retries 10 --usize -2 --proxy-id 1
> [proxy:0:1 at r630-01] got pmi command (from 0): init
> pmi_version=1 pmi_subversion=1
> [proxy:0:1 at r630-01] PMI response: cmd=response_to_init pmi_version=1
> pmi_subversion=1 rc=0
> [proxy:0:1 at r630-01] got pmi command (from 0): get_maxes
>
> [proxy:0:1 at r630-01] PMI response: cmd=maxes kvsname_max=256 keylen_max=64
> vallen_max=1024
> [proxy:0:1 at r630-01] got pmi command (from 0): get_appnum
>
> [proxy:0:1 at r630-01] PMI response: cmd=appnum appnum=0
> [proxy:0:1 at r630-01] got pmi command (from 0): get_my_kvsname
>
> [proxy:0:1 at r630-01] PMI response: cmd=my_kvsname kvsname=kvs_4992_0
> [proxy:0:1 at r630-01] got pmi command (from 0): get_my_kvsname
>
> [proxy:0:1 at r630-01] PMI response: cmd=my_kvsname kvsname=kvs_4992_0
> [proxy:0:1 at r630-01] got pmi command (from 0): get
> kvsname=kvs_4992_0 key=PMI_process_mapping
> [proxy:0:1 at r630-01] PMI response: cmd=get_result rc=0 msg=success
> value=(vector,(0,2,1))
> [proxy:0:1 at r630-01] got pmi command (from 0): put
> kvsname=kvs_4992_0 key=hostname[1] value=08323329
> [proxy:0:1 at r630-01] cached command: hostname[1]=08323329
> [proxy:0:1 at r630-01] PMI response: cmd=put_result rc=0 msg=success
> [proxy:0:1 at r630-01] got pmi command (from 0): barrier_in
>
> [proxy:0:1 at r630-01] flushing 1 put command(s) out
> [proxy:0:1 at r630-01] forwarding command (cmd=put hostname[1]=08323329)
> upstream
> [mpiexec at r630-01] [pgid: 0] got PMI command: cmd=put hostname[1]=08323329
> [proxy:0:1 at r630-01] forwarding command (cmd=barrier_in) upstream
> [mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
> [proxy:0:0 at fdr1] got pmi command (from 4): init
> pmi_version=1 pmi_subversion=1
> [proxy:0:0 at fdr1] PMI response: cmd=response_to_init pmi_version=1
> pmi_subversion=1 rc=0
> [proxy:0:0 at fdr1] got pmi command (from 4): get_maxes
>
> [proxy:0:0 at fdr1] PMI response: cmd=maxes kvsname_max=256 keylen_max=64
> vallen_max=1024
> [proxy:0:0 at fdr1] got pmi command (from 4): get_appnum
>
> [proxy:0:0 at fdr1] PMI response: cmd=appnum appnum=0
> [proxy:0:0 at fdr1] got pmi command (from 4): get_my_kvsname
>
> [proxy:0:0 at fdr1] PMI response: cmd=my_kvsname kvsname=kvs_4992_0
> [proxy:0:0 at fdr1] got pmi command (from 4): get_my_kvsname
>
> [proxy:0:0 at fdr1] PMI response: cmd=my_kvsname kvsname=kvs_4992_0
> [proxy:0:0 at fdr1] got pmi command (from 4): get
> kvsname=kvs_4992_0 key=PMI_process_mapping
> [proxy:0:0 at fdr1] PMI response: cmd=get_result rc=0 msg=success
> value=(vector,(0,2,1))
> [proxy:0:0 at fdr1] got pmi command (from 4): put
> kvsname=kvs_4992_0 key=hostname[0] value=17448404
> [proxy:0:0 at fdr1] cached command: hostname[0]=17448404
> [proxy:0:0 at fdr1] PMI response: cmd=put_result rc=0 msg=success
> [mpiexec at r630-01] [pgid: 0] got PMI command: cmd=put hostname[0]=17448404
> [proxy:0:0 at fdr1] got pmi command (from 4): barrier_in
>
> [proxy:0:0 at fdr1] flushing 1 put command(s) out
> [proxy:0:0 at fdr1] forwarding command (cmd=put hostname[0]=17448404)
> upstream
> [proxy:0:0 at fdr1] forwarding command (cmd=barrier_in) upstream
> [mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
> [mpiexec at r630-01] PMI response to fd 9 pid 4: cmd=keyval_cache
> hostname[1]=08323329 hostname[0]=17448404
> [mpiexec at r630-01] PMI response to fd 6 pid 4: cmd=keyval_cache
> hostname[1]=08323329 hostname[0]=17448404
> [mpiexec at r630-01] PMI response to fd 9 pid 4: cmd=barrier_out
> [mpiexec at r630-01] PMI response to fd 6 pid 4: cmd=barrier_out
> [proxy:0:1 at r630-01] PMI response: cmd=barrier_out
> [proxy:0:1 at r630-01] got pmi command (from 0): get
> kvsname=kvs_4992_0 key=hostname[0]
> [proxy:0:1 at r630-01] PMI response: cmd=get_result rc=0 msg=success
> value=17448404
> [proxy:0:0 at fdr1] PMI response: cmd=barrier_out
> [proxy:0:0 at fdr1] got pmi command (from 4): get
> kvsname=kvs_4992_0 key=hostname[1]
> [proxy:0:0 at fdr1] PMI response: cmd=get_result rc=0 msg=success
> value=08323329
> [proxy:0:1 at r630-01] got pmi command (from 0): put
> kvsname=kvs_4992_0 key=MVAPICH2_0001 value=00000011:00000099:0000009a:
> [proxy:0:1 at r630-01] cached command:
> MVAPICH2_0001=00000011:00000099:0000009a:
> [proxy:0:1 at r630-01] PMI response: cmd=put_result rc=0 msg=success
> [proxy:0:1 at r630-01] got pmi command (from 0): barrier_in
>
> [proxy:0:1 at r630-01] flushing 1 put command(s) out
> [proxy:0:1 at r630-01] forwarding command (cmd=put
> MVAPICH2_0001=00000011:00000099:0000009a:) upstream
> [mpiexec at r630-01] [pgid: 0] got PMI command: cmd=put
> MVAPICH2_0001=00000011:00000099:0000009a:
> [proxy:0:1 at r630-01] forwarding command (cmd=barrier_in) upstream
> [mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
> [proxy:0:0 at fdr1] got pmi command (from 4): put
> kvsname=kvs_4992_0 key=MVAPICH2_0000 value=00000004:00000304:00000305:
> [proxy:0:0 at fdr1] cached command: MVAPICH2_0000=00000004:00000304:00000305:
> [proxy:0:0 at fdr1] PMI response: cmd=put_result rc=0 msg=success
> [mpiexec at r630-01] [pgid: 0] got PMI command: cmd=put
> MVAPICH2_0000=00000004:00000304:00000305:
> [proxy:0:0 at fdr1] got pmi command (from 4): barrier_in
>
> [proxy:0:0 at fdr1] flushing 1 put command(s) out
> [proxy:0:0 at fdr1] forwarding command (cmd=put
> MVAPICH2_0000=00000004:00000304:00000305:) upstream
> [proxy:0:0 at fdr1] forwarding command (cmd=barrier_in) upstream
> [mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
> [mpiexec at r630-01] PMI response to fd 9 pid 4: cmd=keyval_cache
> MVAPICH2_0001=00000011:00000099:0000009a:
> MVAPICH2_0000=00000004:00000304:00000305:
> [mpiexec at r630-01] PMI response to fd 6 pid 4: cmd=keyval_cache
> MVAPICH2_0001=00000011:00000099:0000009a:
> MVAPICH2_0000=00000004:00000304:00000305:
> [mpiexec at r630-01] PMI response to fd 9 pid 4: cmd=barrier_out
> [mpiexec at r630-01] PMI response to fd 6 pid 4: cmd=barrier_out
> [proxy:0:1 at r630-01] PMI response: cmd=barrier_out
> [proxy:0:1 at r630-01] got pmi command (from 0): get
> kvsname=kvs_4992_0 key=MVAPICH2_0000
> [proxy:0:1 at r630-01] PMI response: cmd=get_result rc=0 msg=success
> value=00000004:00000304:00000305:
> [proxy:0:1 at r630-01] got pmi command (from 0): get
> kvsname=kvs_4992_0 key=MVAPICH2_0000
> [proxy:0:1 at r630-01] PMI response: cmd=get_result rc=0 msg=success
> value=00000004:00000304:00000305:
> [proxy:0:1 at r630-01] got pmi command (from 0): barrier_in
>
> [proxy:0:1 at r630-01] forwarding command (cmd=barrier_in) upstream
> [mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
> [proxy:0:0 at fdr1] PMI response: cmd=barrier_out
> [proxy:0:0 at fdr1] got pmi command (from 4): get
> kvsname=kvs_4992_0 key=MVAPICH2_0001
> [proxy:0:0 at fdr1] PMI response: cmd=get_result rc=0 msg=success
> value=00000011:00000099:0000009a:
> [proxy:0:0 at fdr1] got pmi command (from 4): get
> kvsname=kvs_4992_0 key=MVAPICH2_0001
> [proxy:0:0 at fdr1] PMI response: cmd=get_result rc=0 msg=success
> value=00000011:00000099:0000009a:
> [mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
> [mpiexec at r630-01] PMI response to fd 9 pid 4: cmd=barrier_out
> [mpiexec at r630-01] PMI response to fd 6 pid 4: cmd=barrier_out
> [proxy:0:0 at fdr1] got pmi command (from 4): barrier_in
>
> [proxy:0:0 at fdr1] forwarding command (cmd=barrier_in) upstream
> [proxy:0:1 at r630-01] PMI response: cmd=barrier_out
> [proxy:0:0 at fdr1] PMI response: cmd=barrier_out
> [proxy:0:1 at r630-01] got pmi command (from 0): barrier_in
>
> [proxy:0:1 at r630-01] forwarding command (cmd=barrier_in) upstream
> [mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
> [mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
> [mpiexec at r630-01] PMI response to fd 9 pid 4: cmd=barrier_out
> [mpiexec at r630-01] PMI response to fd 6 pid 4: cmd=barrier_out
> [proxy:0:0 at fdr1] got pmi command (from 4): barrier_in
>
> [proxy:0:0 at fdr1] forwarding command (cmd=barrier_in) upstream
> [proxy:0:1 at r630-01] PMI response: cmd=barrier_out
> [proxy:0:1 at r630-01] got pmi command (from 0): barrier_in
>
> [proxy:0:1 at r630-01] forwarding command (cmd=barrier_in) upstream
> [mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
> [proxy:0:0 at fdr1] PMI response: cmd=barrier_out
> [mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
> [mpiexec at r630-01] PMI response to fd 9 pid 4: cmd=barrier_out
> [mpiexec at r630-01] PMI response to fd 6 pid 4: cmd=barrier_out
> [proxy:0:0 at fdr1] got pmi command (from 4): barrier_in
>
> [proxy:0:0 at fdr1] forwarding command (cmd=barrier_in) upstream
> [proxy:0:1 at r630-01] PMI response: cmd=barrier_out
> [proxy:0:0 at fdr1] PMI response: cmd=barrier_out
> [proxy:0:1 at r630-01] got pmi command (from 0): barrier_in
>
> [proxy:0:1 at r630-01] forwarding command (cmd=barrier_in) upstream
> [mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
> [mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
> [mpiexec at r630-01] PMI response to fd 9 pid 4: cmd=barrier_out
> [mpiexec at r630-01] PMI response to fd 6 pid 4: cmd=barrier_out
> [proxy:0:0 at fdr1] got pmi command (from 4): barrier_in
>
> [proxy:0:0 at fdr1] forwarding command (cmd=barrier_in) upstream
> [proxy:0:1 at r630-01] PMI response: cmd=barrier_out
> [proxy:0:0 at fdr1] PMI response: cmd=barrier_out
> [proxy:0:1 at r630-01] got pmi command (from 0): barrier_in
>
> [proxy:0:1 at r630-01] forwarding command (cmd=barrier_in) upstream
> [mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
> [mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
> [mpiexec at r630-01] PMI response to fd 9 pid 4: cmd=barrier_out
> [mpiexec at r630-01] PMI response to fd 6 pid 4: cmd=barrier_out
> [proxy:0:0 at fdr1] got pmi command (from 4): barrier_in
>
> [proxy:0:0 at fdr1] forwarding command (cmd=barrier_in) upstream
> [proxy:0:1 at r630-01] PMI response: cmd=barrier_out
> [proxy:0:1 at r630-01] got pmi command (from 0): barrier_in
>
> [proxy:0:1 at r630-01] forwarding command (cmd=barrier_in) upstream
> [mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
> [proxy:0:0 at fdr1] PMI response: cmd=barrier_out
> [mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
> [mpiexec at r630-01] PMI response to fd 9 pid 4: cmd=barrier_out
> [mpiexec at r630-01] PMI response to fd 6 pid 4: cmd=barrier_out
> [proxy:0:0 at fdr1] got pmi command (from 4): barrier_in
>
> [proxy:0:0 at fdr1] forwarding command (cmd=barrier_in) upstream
> [proxy:0:1 at r630-01] PMI response: cmd=barrier_out
> [proxy:0:1 at r630-01] got pmi command (from 0): barrier_in
>
> [proxy:0:1 at r630-01] forwarding command (cmd=barrier_in) upstream
> [mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
> [proxy:0:0 at fdr1] PMI response: cmd=barrier_out
> [mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
> [mpiexec at r630-01] PMI response to fd 9 pid 4: cmd=barrier_out
> [mpiexec at r630-01] PMI response to fd 6 pid 4: cmd=barrier_out
> [proxy:0:0 at fdr1] got pmi command (from 4): barrier_in
>
> [proxy:0:0 at fdr1] forwarding command (cmd=barrier_in) upstream
> [proxy:0:1 at r630-01] PMI response: cmd=barrier_out
> [proxy:0:0 at fdr1] PMI response: cmd=barrier_out
> # OSU MPI Bandwidth Test v4.4.1
> # Size      Bandwidth (MB/s)
> 1                       2.93
> 2                       5.84
> 4                      11.61
> 8                      22.82
> 16                     44.77
> 32                     91.21
> 64                    179.39
> 128                   341.07
> 256                   680.59
> 512                  1313.70
> 1024                 2463.06
> 2048                 3993.00
> 4096                 5147.77
> 8192                 5669.63
> 16384                5701.75
> 32768                5969.73
> 65536                6117.85
> 131072               6243.84
> 262144               6306.77
> 524288               6340.28
> 1048576              6356.89
> 2097152              6362.19
> 4194304              6273.45
> 8388608              6334.72
> 16777216             5762.50
> [proxy:0:1 at r630-01] got pmi command (from 0): barrier_in
>
> [mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
> [proxy:0:1 at r630-01] forwarding command (cmd=barrier_in) upstream
> [mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
> [mpiexec at r630-01] PMI response to fd 9 pid 4: cmd=barrier_out
> [mpiexec at r630-01] PMI response to fd 6 pid 4: cmd=barrier_out
> [proxy:0:0 at fdr1] got pmi command (from 4): barrier_in
>
> [proxy:0:0 at fdr1] forwarding command (cmd=barrier_in) upstream
> [proxy:0:1 at r630-01] PMI response: cmd=barrier_out
> [proxy:0:0 at fdr1] PMI response: cmd=barrier_out
> [proxy:0:1 at r630-01] got pmi command (from 0): finalize
>
> [proxy:0:1 at r630-01] PMI response: cmd=finalize_ack
> [proxy:0:0 at fdr1] got pmi command (from 4): finalize
>
> [proxy:0:0 at fdr1] PMI response: cmd=finalize_ack
>
>
> _______________________________________________
> mvapich-discuss mailing list
> mvapich-discuss at cse.ohio-state.edu
> http://mailman.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>