[mvapich-discuss] mvapich hangs at startup on some new machines

Jonathan Perkins perkinjo at cse.ohio-state.edu
Fri Jun 12 15:12:59 EDT 2015


Hello.  Can you share with us the CPU architecture and the type of HCA
you're using on your new systems?
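
For example, something like the following should capture the relevant
details (assuming lscpu and the libibverbs utilities are available on the
new nodes):

  # CPU model and architecture
  lscpu
  # HCA type, firmware, and port state
  ibv_devinfo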

In addition to this, can you do a one-process run while setting the
MV2_SHOW_ENV_INFO variable to 1?  It may also be useful to send us the
backtrace of the process(es) when it hangs.
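
Roughly something like this should do (here hello_world and <pid> are just
placeholders for your own test program and the pid of a stuck rank):

  # one-process run that prints MVAPICH2's runtime settings
  MV2_SHOW_ENV_INFO=1 /opt/mvapich2-2.1/bin/mpiexec -n 1 ./hello_world

  # on a node where a rank is stuck, dump its stack with gdb
  gdb --batch -ex bt -p <pid>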

On Fri, Jun 12, 2015 at 7:48 AM Dovis Alessandro <adovis at student.ethz.ch>
wrote:

> Hello everyone,
>
> A new cluster of machines has been installed and connected to the same
> Mellanox InfiniBand switch as other machines I was already using with
> MVAPICH (everything works fine there).
> I have installed MVAPICH on the new machines (reconfiguring and
> recompiling, because they have a different architecture and kernel). Both
> clusters use MVAPICH2 2.1.
>
> If I run `/opt/mvapich2-2.1/bin/mpiexec --host machine1,machine2 -n 2
> ~/osu-micro-benchmarks-4.4.1/mpi/pt2pt/osu_bw`, I see the following
> behaviour:
> - it runs well if both machines are in the old cluster;
> - it runs well if one machine is in the old cluster and one is in the new
> cluster;
> - it hangs at startup if both machines are in the new cluster.
>
> I have seen the same behaviour with other executables, e.g. a simple
> 'hello world'.
> Below are the outputs for both a hanging execution and a successful one.
>
> Thank you for your help.
>
> Best,
> Alessandro Dovis
>
>
>
> --------------------------------------------------------------------------------------------------------------------
>
> Output of the execution that hangs (both machines in the new cluster),
> run with the '-v' flag:
>
> host: r630-04
> host: r630-01
>
>
> ==================================================================================================
> mpiexec options:
> ----------------
>   Base path: /opt/mvapich2-2.1/bin/
>   Launcher: (null)
>   Debug level: 1
>   Enable X: -1
>
>   Global environment:
>   -------------------
>     LC_PAPER=en_DK.UTF-8
>     TERM=xterm
>     SHELL=/bin/bash
>     SSH_CLIENT=10.2.131.222 59209 22
>     SSH_TTY=/dev/pts/0
>     USER=adovis
>
> LD_LIBRARY_PATH=:/home/adovis/fbx/opt/lib/:/opt/lib/boost_1_58_0/:/home/adovis/fbx/opt/lib/:/opt/lib/boost_1_58_0/
>     MV2_ENABLE_AFFINITY=0
>     MAIL=/var/mail/adovis
>
> PATH=/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games:/home/adovis/bin:/usr/sbin:/usr/local/sbin
>     LC_COLLATE=C
>     PWD=/home/adovis
>     JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64/
>     LANG=en_US.UTF-8
>     LC_MEASUREMENT=en_DK.UTF-8
>     SHLVL=1
>     HOME=/home/adovis
>     LOGNAME=adovis
>     SSH_CONNECTION=10.2.131.222 59209 10.1.212.71 22
>     LC_TIME=en_DK.UTF-8
>     _=/opt/mvapich2-2.1/bin/mpiexec
>
>   Hydra internal environment:
>   ---------------------------
>     GFORTRAN_UNBUFFERED_PRECONNECTED=y
>
>
>     Proxy information:
>     *********************
>       [1] proxy: r630-04 (1 cores)
>       Exec list: /home/adovis/osu-micro-benchmarks-4.4.1/mpi/pt2pt/osu_bw
> (1 processes);
>
>       [2] proxy: r630-01 (1 cores)
>       Exec list: /home/adovis/osu-micro-benchmarks-4.4.1/mpi/pt2pt/osu_bw
> (1 processes);
>
>
>
> ==================================================================================================
>
> [mpiexec at r630-01] Timeout set to -1 (-1 means infinite)
> [mpiexec at r630-01] Got a control port string of r630-01:48645
>
> Proxy launch args: /opt/mvapich2-2.1/bin/hydra_pmi_proxy --control-port
> r630-01:48645 --debug --rmk user --launcher ssh --demux poll --pgid 0
> --retries 10 --usize -2 --proxy-id
>
> Arguments being passed to proxy 0:
> --version 3.1.4 --iface-ip-env-name MPIR_CVAR_CH3_INTERFACE_HOSTNAME
> --hostname r630-04 --global-core-map 0,1,2 --pmi-id-map 0,0
> --global-process-count 2 --auto-cleanup 1 --pmi-kvsname kvs_4979_0
> --pmi-process-mapping (vector,(0,2,1)) --ckpoint-num -1
> --global-inherited-env 21 'LC_PAPER=en_DK.UTF-8' 'TERM=xterm'
> 'SHELL=/bin/bash' 'SSH_CLIENT=10.2.131.222 59209 22' 'SSH_TTY=/dev/pts/0'
> 'USER=adovis'
> 'LD_LIBRARY_PATH=:/home/adovis/fbx/opt/lib/:/opt/lib/boost_1_58_0/:/home/adovis/fbx/opt/lib/:/opt/lib/boost_1_58_0/'
> 'MV2_ENABLE_AFFINITY=0' 'MAIL=/var/mail/adovis'
> 'PATH=/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games:/home/adovis/bin:/usr/sbin:/usr/local/sbin'
> 'LC_COLLATE=C' 'PWD=/home/adovis'
> 'JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64/' 'LANG=en_US.UTF-8'
> 'LC_MEASUREMENT=en_DK.UTF-8' 'SHLVL=1' 'HOME=/home/adovis' 'LOGNAME=adovis'
> 'SSH_CONNECTION=10.2.131.222 59209 10.1.212.71 22' 'LC_TIME=en_DK.UTF-8'
> '_=/opt/mvapich2-2.1/bin/mpiexec' --global-user-env 0 --global-system-env 1
> 'GFORTRAN_UNBUFFERED_PRECONNECTED=y' --proxy-core-count 1 --exec
> --exec-appnum 0 --exec-proc-count 1 --exec-local-env 0 --exec-wdir
> /home/adovis --exec-args 1
> /home/adovis/osu-micro-benchmarks-4.4.1/mpi/pt2pt/osu_bw
>
> Arguments being passed to proxy 1:
> --version 3.1.4 --iface-ip-env-name MPIR_CVAR_CH3_INTERFACE_HOSTNAME
> --hostname r630-01 --global-core-map 0,1,2 --pmi-id-map 0,1
> --global-process-count 2 --auto-cleanup 1 --pmi-kvsname kvs_4979_0
> --pmi-process-mapping (vector,(0,2,1)) --ckpoint-num -1
> --global-inherited-env 21 'LC_PAPER=en_DK.UTF-8' 'TERM=xterm'
> 'SHELL=/bin/bash' 'SSH_CLIENT=10.2.131.222 59209 22' 'SSH_TTY=/dev/pts/0'
> 'USER=adovis'
> 'LD_LIBRARY_PATH=:/home/adovis/fbx/opt/lib/:/opt/lib/boost_1_58_0/:/home/adovis/fbx/opt/lib/:/opt/lib/boost_1_58_0/'
> 'MV2_ENABLE_AFFINITY=0' 'MAIL=/var/mail/adovis'
> 'PATH=/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games:/home/adovis/bin:/usr/sbin:/usr/local/sbin'
> 'LC_COLLATE=C' 'PWD=/home/adovis'
> 'JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64/' 'LANG=en_US.UTF-8'
> 'LC_MEASUREMENT=en_DK.UTF-8' 'SHLVL=1' 'HOME=/home/adovis' 'LOGNAME=adovis'
> 'SSH_CONNECTION=10.2.131.222 59209 10.1.212.71 22' 'LC_TIME=en_DK.UTF-8'
> '_=/opt/mvapich2-2.1/bin/mpiexec' --global-user-env 0 --global-system-env 1
> 'GFORTRAN_UNBUFFERED_PRECONNECTED=y' --proxy-core-count 1 --exec
> --exec-appnum 0 --exec-proc-count 1 --exec-local-env 0 --exec-wdir
> /home/adovis --exec-args 1
> /home/adovis/osu-micro-benchmarks-4.4.1/mpi/pt2pt/osu_bw
>
> [mpiexec at r630-01] Launch arguments: /usr/bin/ssh -x r630-04
> "/opt/mvapich2-2.1/bin/hydra_pmi_proxy" --control-port r630-01:48645
> --debug --rmk user --launcher ssh --demux poll --pgid 0 --retries 10
> --usize -2 --proxy-id 0
> [mpiexec at r630-01] Launch arguments: /opt/mvapich2-2.1/bin/hydra_pmi_proxy
> --control-port r630-01:48645 --debug --rmk user --launcher ssh --demux poll
> --pgid 0 --retries 10 --usize -2 --proxy-id 1
> [proxy:0:1 at r630-01] got pmi command (from 0): init
> pmi_version=1 pmi_subversion=1
> [proxy:0:1 at r630-01] PMI response: cmd=response_to_init pmi_version=1
> pmi_subversion=1 rc=0
> [proxy:0:1 at r630-01] got pmi command (from 0): get_maxes
>
> [proxy:0:1 at r630-01] PMI response: cmd=maxes kvsname_max=256 keylen_max=64
> vallen_max=1024
> [proxy:0:1 at r630-01] got pmi command (from 0): get_appnum
>
> [proxy:0:1 at r630-01] PMI response: cmd=appnum appnum=0
> [proxy:0:1 at r630-01] got pmi command (from 0): get_my_kvsname
>
> [proxy:0:1 at r630-01] PMI response: cmd=my_kvsname kvsname=kvs_4979_0
> [proxy:0:1 at r630-01] got pmi command (from 0): get_my_kvsname
>
> [proxy:0:1 at r630-01] PMI response: cmd=my_kvsname kvsname=kvs_4979_0
> [proxy:0:1 at r630-01] got pmi command (from 0): get
> kvsname=kvs_4979_0 key=PMI_process_mapping
> [proxy:0:1 at r630-01] PMI response: cmd=get_result rc=0 msg=success
> value=(vector,(0,2,1))
> [proxy:0:1 at r630-01] got pmi command (from 0): put
> kvsname=kvs_4979_0 key=hostname[1] value=08323329
> [proxy:0:1 at r630-01] cached command: hostname[1]=08323329
> [proxy:0:1 at r630-01] PMI response: cmd=put_result rc=0 msg=success
> [proxy:0:1 at r630-01] got pmi command (from 0): barrier_in
>
> [proxy:0:1 at r630-01] flushing 1 put command(s) out
> [mpiexec at r630-01] [pgid: 0] got PMI command: cmd=put hostname[1]=08323329
> [proxy:0:1 at r630-01] forwarding command (cmd=put hostname[1]=08323329)
> upstream
> [proxy:0:1 at r630-01] forwarding command (cmd=barrier_in) upstream
> [mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
> [proxy:0:0 at r630-04] got pmi command (from 4): init
> pmi_version=1 pmi_subversion=1
> [proxy:0:0 at r630-04] PMI response: cmd=response_to_init pmi_version=1
> pmi_subversion=1 rc=0
> [proxy:0:0 at r630-04] got pmi command (from 4): get_maxes
>
> [proxy:0:0 at r630-04] PMI response: cmd=maxes kvsname_max=256 keylen_max=64
> vallen_max=1024
> [proxy:0:0 at r630-04] got pmi command (from 4): get_appnum
>
> [proxy:0:0 at r630-04] PMI response: cmd=appnum appnum=0
> [proxy:0:0 at r630-04] got pmi command (from 4): get_my_kvsname
>
> [proxy:0:0 at r630-04] PMI response: cmd=my_kvsname kvsname=kvs_4979_0
> [proxy:0:0 at r630-04] got pmi command (from 4): get_my_kvsname
>
> [proxy:0:0 at r630-04] PMI response: cmd=my_kvsname kvsname=kvs_4979_0
> [proxy:0:0 at r630-04] got pmi command (from 4): get
> kvsname=kvs_4979_0 key=PMI_process_mapping
> [proxy:0:0 at r630-04] PMI response: cmd=get_result rc=0 msg=success
> value=(vector,(0,2,1))
> [proxy:0:0 at r630-04] got pmi command (from 4): put
> kvsname=kvs_4979_0 key=hostname[0] value=08323329
> [proxy:0:0 at r630-04] cached command: hostname[0]=08323329
> [proxy:0:0 at r630-04] PMI response: cmd=put_result rc=0 msg=success
> [mpiexec at r630-01] [pgid: 0] got PMI command: cmd=put hostname[0]=08323329
> [proxy:0:0 at r630-04] got pmi command (from 4): barrier_in
>
> [proxy:0:0 at r630-04] flushing 1 put command(s) out
> [proxy:0:0 at r630-04] forwarding command (cmd=put hostname[0]=08323329)
> upstream
> [mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
> [mpiexec at r630-01] PMI response to fd 9 pid 4: cmd=keyval_cache
> hostname[1]=08323329 hostname[0]=08323329
> [mpiexec at r630-01] PMI response to fd 6 pid 4: cmd=keyval_cache
> hostname[1]=08323329 hostname[0]=08323329
> [mpiexec at r630-01] PMI response to fd 9 pid 4: cmd=barrier_out
> [mpiexec at r630-01] PMI response to fd 6 pid 4: cmd=barrier_out
> [proxy:0:0 at r630-04] forwarding command (cmd=barrier_in) upstream
> [proxy:0:1 at r630-01] PMI response: cmd=barrier_out
> [proxy:0:0 at r630-04] PMI response: cmd=barrier_out
> [proxy:0:1 at r630-01] got pmi command (from 0): get
> kvsname=kvs_4979_0 key=hostname[0]
> [proxy:0:1 at r630-01] PMI response: cmd=get_result rc=0 msg=success
> value=08323329
> [proxy:0:0 at r630-04] got pmi command (from 4): get
> kvsname=kvs_4979_0 key=hostname[1]
> [proxy:0:0 at r630-04] PMI response: cmd=get_result rc=0 msg=success
> value=08323329
> [proxy:0:0 at r630-04] got pmi command (from 4): put
> kvsname=kvs_4979_0 key=MVAPICH2_0000 value=00000008:00000070:00000071:
> [proxy:0:1 at r630-01] got pmi command (from 0): put
> kvsname=kvs_4979_0 key=MVAPICH2_0001 value=00000011:00000097:00000098:
> [proxy:0:1 at r630-01] cached command:
> MVAPICH2_0001=00000011:00000097:00000098:
> [proxy:0:1 at r630-01] PMI response: cmd=put_result rc=0 msg=success
> [proxy:0:0 at r630-04] cached command:
> MVAPICH2_0000=00000008:00000070:00000071:
> [proxy:0:0 at r630-04] PMI response: cmd=put_result rc=0 msg=success
> [mpiexec at r630-01] [pgid: 0] got PMI command: cmd=put
> MVAPICH2_0000=00000008:00000070:00000071:
> [proxy:0:0 at r630-04] got pmi command (from 4): barrier_in
>
> [proxy:0:0 at r630-04] flushing 1 put command(s) out
> [proxy:0:0 at r630-04] forwarding command (cmd=put
> MVAPICH2_0000=00000008:00000070:00000071:) upstream
> [proxy:0:0 at r630-04] forwarding command (cmd=barrier_in) upstream
> [proxy:0:1 at r630-01] got pmi command (from 0): barrier_in
>
> [proxy:0:1 at r630-01] flushing 1 put command(s) out
> [proxy:0:1 at r630-01] forwarding command (cmd=put
> MVAPICH2_0001=00000011:00000097:00000098:) upstream
> [proxy:0:1 at r630-01] forwarding command (cmd=barrier_in) upstream
> [mpiexec at r630-01] [pgid: 0] got PMI command: cmd=put
> MVAPICH2_0001=00000011:00000097:00000098:
> [mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
> [mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
> [mpiexec at r630-01] PMI response to fd 9 pid 0: cmd=keyval_cache
> MVAPICH2_0000=00000008:00000070:00000071:
> MVAPICH2_0001=00000011:00000097:00000098:
> [mpiexec at r630-01] PMI response to fd 6 pid 0: cmd=keyval_cache
> MVAPICH2_0000=00000008:00000070:00000071:
> MVAPICH2_0001=00000011:00000097:00000098:
> [mpiexec at r630-01] PMI response to fd 9 pid 0: cmd=barrier_out
> [mpiexec at r630-01] PMI response to fd 6 pid 0: cmd=barrier_out
> [proxy:0:1 at r630-01] PMI response: cmd=barrier_out
> [proxy:0:1 at r630-01] got pmi command (from 0): get
> kvsname=kvs_4979_0 key=MVAPICH2_0000
> [proxy:0:1 at r630-01] PMI response: cmd=get_result rc=0 msg=success
> value=00000008:00000070:00000071:
> [proxy:0:0 at r630-04] PMI response: cmd=barrier_out
> [proxy:0:1 at r630-01] got pmi command (from 0): get
> kvsname=kvs_4979_0 key=MVAPICH2_0000
> [proxy:0:1 at r630-01] PMI response: cmd=get_result rc=0 msg=success
> value=00000008:00000070:00000071:
> [proxy:0:0 at r630-04] got pmi command (from 4): get
> kvsname=kvs_4979_0 key=MVAPICH2_0001
> [proxy:0:0 at r630-04] PMI response: cmd=get_result rc=0 msg=success
> value=00000011:00000097:00000098:
> [proxy:0:1 at r630-01] got pmi command (from 0): barrier_in
>
> [proxy:0:1 at r630-01] forwarding command (cmd=barrier_in) upstream
> [mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
> [proxy:0:0 at r630-04] got pmi command (from 4): get
> kvsname=kvs_4979_0 key=MVAPICH2_0001
> [proxy:0:0 at r630-04] PMI response: cmd=get_result rc=0 msg=success
> value=00000011:00000097:00000098:
> [mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
> [mpiexec at r630-01] PMI response to fd 9 pid 4: cmd=barrier_out
> [mpiexec at r630-01] PMI response to fd 6 pid 4: cmd=barrier_out
> [proxy:0:0 at r630-04] got pmi command (from 4): barrier_in
>
> [proxy:0:0 at r630-04] forwarding command (cmd=barrier_in) upstream
> [proxy:0:1 at r630-01] PMI response: cmd=barrier_out
> [proxy:0:0 at r630-04] PMI response: cmd=barrier_out
> [proxy:0:1 at r630-01] got pmi command (from 0): barrier_in
>
> [proxy:0:1 at r630-01] forwarding command (cmd=barrier_in) upstream
> [mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
> [mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
> [mpiexec at r630-01] PMI response to fd 9 pid 4: cmd=barrier_out
> [mpiexec at r630-01] PMI response to fd 6 pid 4: cmd=barrier_out
> [proxy:0:0 at r630-04] got pmi command (from 4): barrier_in
>
> [proxy:0:0 at r630-04] forwarding command (cmd=barrier_in) upstream
> [proxy:0:1 at r630-01] PMI response: cmd=barrier_out
> [proxy:0:0 at r630-04] PMI response: cmd=barrier_out
> [proxy:0:1 at r630-01] got pmi command (from 0): barrier_in
>
> [proxy:0:1 at r630-01] forwarding command (cmd=barrier_in) upstream
> [mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
> [mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
> [mpiexec at r630-01] PMI response to fd 9 pid 4: cmd=barrier_out
> [mpiexec at r630-01] PMI response to fd 6 pid 4: cmd=barrier_out
> [proxy:0:0 at r630-04] got pmi command (from 4): barrier_in
>
> [proxy:0:0 at r630-04] forwarding command (cmd=barrier_in) upstream
> [proxy:0:1 at r630-01] PMI response: cmd=barrier_out
> [proxy:0:0 at r630-04] PMI response: cmd=barrier_out
> [proxy:0:1 at r630-01] got pmi command (from 0): barrier_in
>
> [proxy:0:1 at r630-01] forwarding command (cmd=barrier_in) upstream
> [mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
> [proxy:0:0 at r630-04] got pmi command (from 4): barrier_in
>
> [proxy:0:0 at r630-04] forwarding command (cmd=barrier_in) upstream
> [mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
> [mpiexec at r630-01] PMI response to fd 9 pid 4: cmd=barrier_out
> [mpiexec at r630-01] PMI response to fd 6 pid 4: cmd=barrier_out
> [proxy:0:1 at r630-01] PMI response: cmd=barrier_out
> [proxy:0:0 at r630-04] PMI response: cmd=barrier_out
> [proxy:0:1 at r630-01] got pmi command (from 0): barrier_in
>
> [proxy:0:1 at r630-01] forwarding command (cmd=barrier_in) upstream
> [mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
> [mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
> [mpiexec at r630-01] PMI response to fd 9 pid 4: cmd=barrier_out
> [mpiexec at r630-01] PMI response to fd 6 pid 4: cmd=barrier_out
> [proxy:0:0 at r630-04] got pmi command (from 4): barrier_in
>
> [proxy:0:0 at r630-04] forwarding command (cmd=barrier_in) upstream
> [proxy:0:1 at r630-01] PMI response: cmd=barrier_out
> [proxy:0:1 at r630-01] got pmi command (from 0): barrier_in
>
> [proxy:0:1 at r630-01] forwarding command (cmd=barrier_in) upstream
> [mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
> [proxy:0:0 at r630-04] PMI response: cmd=barrier_out
> [mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
> [mpiexec at r630-01] PMI response to fd 9 pid 4: cmd=barrier_out
> [mpiexec at r630-01] PMI response to fd 6 pid 4: cmd=barrier_out
> [proxy:0:0 at r630-04] got pmi command (from 4): barrier_in
>
> [proxy:0:0 at r630-04] forwarding command (cmd=barrier_in) upstream
> [proxy:0:1 at r630-01] PMI response: cmd=barrier_out
> [proxy:0:1 at r630-01] got pmi command (from 0): barrier_in
>
> [proxy:0:1 at r630-01] forwarding command (cmd=barrier_in) upstream
> [mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
> [proxy:0:0 at r630-04] PMI response: cmd=barrier_out
> [mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
> [mpiexec at r630-01] PMI response to fd 9 pid 4: cmd=barrier_out
> [mpiexec at r630-01] PMI response to fd 6 pid 4: cmd=barrier_out
> [proxy:0:0 at r630-04] got pmi command (from 4): barrier_in
>
> [proxy:0:0 at r630-04] forwarding command (cmd=barrier_in) upstream
> [proxy:0:1 at r630-01] PMI response: cmd=barrier_out
> [proxy:0:0 at r630-04] PMI response: cmd=barrier_out
>
> {here it hangs forever}
>
>
> --------------------------------------------------------------------------------------------------------------------
>
> A successful run, by contrast, looks like the following (again with -v):
>
> host: fdr1
> host: r630-01
>
>
> ==================================================================================================
> mpiexec options:
> ----------------
>   Base path: /opt/mvapich2-2.1/bin/
>   Launcher: (null)
>   Debug level: 1
>   Enable X: -1
>
>   Global environment:
>   -------------------
>     LC_PAPER=en_DK.UTF-8
>     TERM=xterm
>     SHELL=/bin/bash
>     SSH_CLIENT=10.2.131.222 59209 22
>     SSH_TTY=/dev/pts/0
>     USER=adovis
>
> LD_LIBRARY_PATH=:/home/adovis/fbx/opt/lib/:/opt/lib/boost_1_58_0/:/home/adovis/fbx/opt/lib/:/opt/lib/boost_1_58_0/
>     MV2_ENABLE_AFFINITY=0
>     MAIL=/var/mail/adovis
>
> PATH=/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games:/home/adovis/bin:/usr/sbin:/usr/local/sbin
>     LC_COLLATE=C
>     PWD=/home/adovis
>     JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64/
>     LANG=en_US.UTF-8
>     LC_MEASUREMENT=en_DK.UTF-8
>     SHLVL=1
>     HOME=/home/adovis
>     LOGNAME=adovis
>     SSH_CONNECTION=10.2.131.222 59209 10.1.212.71 22
>     LC_TIME=en_DK.UTF-8
>     _=/opt/mvapich2-2.1/bin/mpiexec
>
>   Hydra internal environment:
>   ---------------------------
>     GFORTRAN_UNBUFFERED_PRECONNECTED=y
>
>
>     Proxy information:
>     *********************
>       [1] proxy: fdr1 (1 cores)
>       Exec list: /home/adovis/osu-micro-benchmarks-4.4.1/mpi/pt2pt/osu_bw
> (1 processes);
>
>       [2] proxy: r630-01 (1 cores)
>       Exec list: /home/adovis/osu-micro-benchmarks-4.4.1/mpi/pt2pt/osu_bw
> (1 processes);
>
>
>
> ==================================================================================================
>
> [mpiexec at r630-01] Timeout set to -1 (-1 means infinite)
> [mpiexec at r630-01] Got a control port string of r630-01:39227
>
> Proxy launch args: /opt/mvapich2-2.1/bin/hydra_pmi_proxy --control-port
> r630-01:39227 --debug --rmk user --launcher ssh --demux poll --pgid 0
> --retries 10 --usize -2 --proxy-id
>
> Arguments being passed to proxy 0:
> --version 3.1.4 --iface-ip-env-name MPIR_CVAR_CH3_INTERFACE_HOSTNAME
> --hostname fdr1 --global-core-map 0,1,2 --pmi-id-map 0,0
> --global-process-count 2 --auto-cleanup 1 --pmi-kvsname kvs_4992_0
> --pmi-process-mapping (vector,(0,2,1)) --ckpoint-num -1
> --global-inherited-env 21 'LC_PAPER=en_DK.UTF-8' 'TERM=xterm'
> 'SHELL=/bin/bash' 'SSH_CLIENT=10.2.131.222 59209 22' 'SSH_TTY=/dev/pts/0'
> 'USER=adovis'
> 'LD_LIBRARY_PATH=:/home/adovis/fbx/opt/lib/:/opt/lib/boost_1_58_0/:/home/adovis/fbx/opt/lib/:/opt/lib/boost_1_58_0/'
> 'MV2_ENABLE_AFFINITY=0' 'MAIL=/var/mail/adovis'
> 'PATH=/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games:/home/adovis/bin:/usr/sbin:/usr/local/sbin'
> 'LC_COLLATE=C' 'PWD=/home/adovis'
> 'JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64/' 'LANG=en_US.UTF-8'
> 'LC_MEASUREMENT=en_DK.UTF-8' 'SHLVL=1' 'HOME=/home/adovis' 'LOGNAME=adovis'
> 'SSH_CONNECTION=10.2.131.222 59209 10.1.212.71 22' 'LC_TIME=en_DK.UTF-8'
> '_=/opt/mvapich2-2.1/bin/mpiexec' --global-user-env 0 --global-system-env 1
> 'GFORTRAN_UNBUFFERED_PRECONNECTED=y' --proxy-core-count 1 --exec
> --exec-appnum 0 --exec-proc-count 1 --exec-local-env 0 --exec-wdir
> /home/adovis --exec-args 1
> /home/adovis/osu-micro-benchmarks-4.4.1/mpi/pt2pt/osu_bw
>
> Arguments being passed to proxy 1:
> --version 3.1.4 --iface-ip-env-name MPIR_CVAR_CH3_INTERFACE_HOSTNAME
> --hostname r630-01 --global-core-map 0,1,2 --pmi-id-map 0,1
> --global-process-count 2 --auto-cleanup 1 --pmi-kvsname kvs_4992_0
> --pmi-process-mapping (vector,(0,2,1)) --ckpoint-num -1
> --global-inherited-env 21 'LC_PAPER=en_DK.UTF-8' 'TERM=xterm'
> 'SHELL=/bin/bash' 'SSH_CLIENT=10.2.131.222 59209 22' 'SSH_TTY=/dev/pts/0'
> 'USER=adovis'
> 'LD_LIBRARY_PATH=:/home/adovis/fbx/opt/lib/:/opt/lib/boost_1_58_0/:/home/adovis/fbx/opt/lib/:/opt/lib/boost_1_58_0/'
> 'MV2_ENABLE_AFFINITY=0' 'MAIL=/var/mail/adovis'
> 'PATH=/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games:/home/adovis/bin:/usr/sbin:/usr/local/sbin'
> 'LC_COLLATE=C' 'PWD=/home/adovis'
> 'JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64/' 'LANG=en_US.UTF-8'
> 'LC_MEASUREMENT=en_DK.UTF-8' 'SHLVL=1' 'HOME=/home/adovis' 'LOGNAME=adovis'
> 'SSH_CONNECTION=10.2.131.222 59209 10.1.212.71 22' 'LC_TIME=en_DK.UTF-8'
> '_=/opt/mvapich2-2.1/bin/mpiexec' --global-user-env 0 --global-system-env 1
> 'GFORTRAN_UNBUFFERED_PRECONNECTED=y' --proxy-core-count 1 --exec
> --exec-appnum 0 --exec-proc-count 1 --exec-local-env 0 --exec-wdir
> /home/adovis --exec-args 1
> /home/adovis/osu-micro-benchmarks-4.4.1/mpi/pt2pt/osu_bw
>
> [mpiexec at r630-01] Launch arguments: /usr/bin/ssh -x fdr1
> "/opt/mvapich2-2.1/bin/hydra_pmi_proxy" --control-port r630-01:39227
> --debug --rmk user --launcher ssh --demux poll --pgid 0 --retries 10
> --usize -2 --proxy-id 0
> [mpiexec at r630-01] Launch arguments: /opt/mvapich2-2.1/bin/hydra_pmi_proxy
> --control-port r630-01:39227 --debug --rmk user --launcher ssh --demux poll
> --pgid 0 --retries 10 --usize -2 --proxy-id 1
> [proxy:0:1 at r630-01] got pmi command (from 0): init
> pmi_version=1 pmi_subversion=1
> [proxy:0:1 at r630-01] PMI response: cmd=response_to_init pmi_version=1
> pmi_subversion=1 rc=0
> [proxy:0:1 at r630-01] got pmi command (from 0): get_maxes
>
> [proxy:0:1 at r630-01] PMI response: cmd=maxes kvsname_max=256 keylen_max=64
> vallen_max=1024
> [proxy:0:1 at r630-01] got pmi command (from 0): get_appnum
>
> [proxy:0:1 at r630-01] PMI response: cmd=appnum appnum=0
> [proxy:0:1 at r630-01] got pmi command (from 0): get_my_kvsname
>
> [proxy:0:1 at r630-01] PMI response: cmd=my_kvsname kvsname=kvs_4992_0
> [proxy:0:1 at r630-01] got pmi command (from 0): get_my_kvsname
>
> [proxy:0:1 at r630-01] PMI response: cmd=my_kvsname kvsname=kvs_4992_0
> [proxy:0:1 at r630-01] got pmi command (from 0): get
> kvsname=kvs_4992_0 key=PMI_process_mapping
> [proxy:0:1 at r630-01] PMI response: cmd=get_result rc=0 msg=success
> value=(vector,(0,2,1))
> [proxy:0:1 at r630-01] got pmi command (from 0): put
> kvsname=kvs_4992_0 key=hostname[1] value=08323329
> [proxy:0:1 at r630-01] cached command: hostname[1]=08323329
> [proxy:0:1 at r630-01] PMI response: cmd=put_result rc=0 msg=success
> [proxy:0:1 at r630-01] got pmi command (from 0): barrier_in
>
> [proxy:0:1 at r630-01] flushing 1 put command(s) out
> [proxy:0:1 at r630-01] forwarding command (cmd=put hostname[1]=08323329)
> upstream
> [mpiexec at r630-01] [pgid: 0] got PMI command: cmd=put hostname[1]=08323329
> [proxy:0:1 at r630-01] forwarding command (cmd=barrier_in) upstream
> [mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
> [proxy:0:0 at fdr1] got pmi command (from 4): init
> pmi_version=1 pmi_subversion=1
> [proxy:0:0 at fdr1] PMI response: cmd=response_to_init pmi_version=1
> pmi_subversion=1 rc=0
> [proxy:0:0 at fdr1] got pmi command (from 4): get_maxes
>
> [proxy:0:0 at fdr1] PMI response: cmd=maxes kvsname_max=256 keylen_max=64
> vallen_max=1024
> [proxy:0:0 at fdr1] got pmi command (from 4): get_appnum
>
> [proxy:0:0 at fdr1] PMI response: cmd=appnum appnum=0
> [proxy:0:0 at fdr1] got pmi command (from 4): get_my_kvsname
>
> [proxy:0:0 at fdr1] PMI response: cmd=my_kvsname kvsname=kvs_4992_0
> [proxy:0:0 at fdr1] got pmi command (from 4): get_my_kvsname
>
> [proxy:0:0 at fdr1] PMI response: cmd=my_kvsname kvsname=kvs_4992_0
> [proxy:0:0 at fdr1] got pmi command (from 4): get
> kvsname=kvs_4992_0 key=PMI_process_mapping
> [proxy:0:0 at fdr1] PMI response: cmd=get_result rc=0 msg=success
> value=(vector,(0,2,1))
> [proxy:0:0 at fdr1] got pmi command (from 4): put
> kvsname=kvs_4992_0 key=hostname[0] value=17448404
> [proxy:0:0 at fdr1] cached command: hostname[0]=17448404
> [proxy:0:0 at fdr1] PMI response: cmd=put_result rc=0 msg=success
> [mpiexec at r630-01] [pgid: 0] got PMI command: cmd=put hostname[0]=17448404
> [proxy:0:0 at fdr1] got pmi command (from 4): barrier_in
>
> [proxy:0:0 at fdr1] flushing 1 put command(s) out
> [proxy:0:0 at fdr1] forwarding command (cmd=put hostname[0]=17448404)
> upstream
> [proxy:0:0 at fdr1] forwarding command (cmd=barrier_in) upstream
> [mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
> [mpiexec at r630-01] PMI response to fd 9 pid 4: cmd=keyval_cache
> hostname[1]=08323329 hostname[0]=17448404
> [mpiexec at r630-01] PMI response to fd 6 pid 4: cmd=keyval_cache
> hostname[1]=08323329 hostname[0]=17448404
> [mpiexec at r630-01] PMI response to fd 9 pid 4: cmd=barrier_out
> [mpiexec at r630-01] PMI response to fd 6 pid 4: cmd=barrier_out
> [proxy:0:1 at r630-01] PMI response: cmd=barrier_out
> [proxy:0:1 at r630-01] got pmi command (from 0): get
> kvsname=kvs_4992_0 key=hostname[0]
> [proxy:0:1 at r630-01] PMI response: cmd=get_result rc=0 msg=success
> value=17448404
> [proxy:0:0 at fdr1] PMI response: cmd=barrier_out
> [proxy:0:0 at fdr1] got pmi command (from 4): get
> kvsname=kvs_4992_0 key=hostname[1]
> [proxy:0:0 at fdr1] PMI response: cmd=get_result rc=0 msg=success
> value=08323329
> [proxy:0:1 at r630-01] got pmi command (from 0): put
> kvsname=kvs_4992_0 key=MVAPICH2_0001 value=00000011:00000099:0000009a:
> [proxy:0:1 at r630-01] cached command:
> MVAPICH2_0001=00000011:00000099:0000009a:
> [proxy:0:1 at r630-01] PMI response: cmd=put_result rc=0 msg=success
> [proxy:0:1 at r630-01] got pmi command (from 0): barrier_in
>
> [proxy:0:1 at r630-01] flushing 1 put command(s) out
> [proxy:0:1 at r630-01] forwarding command (cmd=put
> MVAPICH2_0001=00000011:00000099:0000009a:) upstream
> [mpiexec at r630-01] [pgid: 0] got PMI command: cmd=put
> MVAPICH2_0001=00000011:00000099:0000009a:
> [proxy:0:1 at r630-01] forwarding command (cmd=barrier_in) upstream
> [mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
> [proxy:0:0 at fdr1] got pmi command (from 4): put
> kvsname=kvs_4992_0 key=MVAPICH2_0000 value=00000004:00000304:00000305:
> [proxy:0:0 at fdr1] cached command: MVAPICH2_0000=00000004:00000304:00000305:
> [proxy:0:0 at fdr1] PMI response: cmd=put_result rc=0 msg=success
> [mpiexec at r630-01] [pgid: 0] got PMI command: cmd=put
> MVAPICH2_0000=00000004:00000304:00000305:
> [proxy:0:0 at fdr1] got pmi command (from 4): barrier_in
>
> [proxy:0:0 at fdr1] flushing 1 put command(s) out
> [proxy:0:0 at fdr1] forwarding command (cmd=put
> MVAPICH2_0000=00000004:00000304:00000305:) upstream
> [proxy:0:0 at fdr1] forwarding command (cmd=barrier_in) upstream
> [mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
> [mpiexec at r630-01] PMI response to fd 9 pid 4: cmd=keyval_cache
> MVAPICH2_0001=00000011:00000099:0000009a:
> MVAPICH2_0000=00000004:00000304:00000305:
> [mpiexec at r630-01] PMI response to fd 6 pid 4: cmd=keyval_cache
> MVAPICH2_0001=00000011:00000099:0000009a:
> MVAPICH2_0000=00000004:00000304:00000305:
> [mpiexec at r630-01] PMI response to fd 9 pid 4: cmd=barrier_out
> [mpiexec at r630-01] PMI response to fd 6 pid 4: cmd=barrier_out
> [proxy:0:1 at r630-01] PMI response: cmd=barrier_out
> [proxy:0:1 at r630-01] got pmi command (from 0): get
> kvsname=kvs_4992_0 key=MVAPICH2_0000
> [proxy:0:1 at r630-01] PMI response: cmd=get_result rc=0 msg=success
> value=00000004:00000304:00000305:
> [proxy:0:1 at r630-01] got pmi command (from 0): get
> kvsname=kvs_4992_0 key=MVAPICH2_0000
> [proxy:0:1 at r630-01] PMI response: cmd=get_result rc=0 msg=success
> value=00000004:00000304:00000305:
> [proxy:0:1 at r630-01] got pmi command (from 0): barrier_in
>
> [proxy:0:1 at r630-01] forwarding command (cmd=barrier_in) upstream
> [mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
> [proxy:0:0 at fdr1] PMI response: cmd=barrier_out
> [proxy:0:0 at fdr1] got pmi command (from 4): get
> kvsname=kvs_4992_0 key=MVAPICH2_0001
> [proxy:0:0 at fdr1] PMI response: cmd=get_result rc=0 msg=success
> value=00000011:00000099:0000009a:
> [proxy:0:0 at fdr1] got pmi command (from 4): get
> kvsname=kvs_4992_0 key=MVAPICH2_0001
> [proxy:0:0 at fdr1] PMI response: cmd=get_result rc=0 msg=success
> value=00000011:00000099:0000009a:
> [mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
> [mpiexec at r630-01] PMI response to fd 9 pid 4: cmd=barrier_out
> [mpiexec at r630-01] PMI response to fd 6 pid 4: cmd=barrier_out
> [proxy:0:0 at fdr1] got pmi command (from 4): barrier_in
>
> [proxy:0:0 at fdr1] forwarding command (cmd=barrier_in) upstream
> [proxy:0:1 at r630-01] PMI response: cmd=barrier_out
> [proxy:0:0 at fdr1] PMI response: cmd=barrier_out
> [proxy:0:1 at r630-01] got pmi command (from 0): barrier_in
>
> [proxy:0:1 at r630-01] forwarding command (cmd=barrier_in) upstream
> [mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
> [mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
> [mpiexec at r630-01] PMI response to fd 9 pid 4: cmd=barrier_out
> [mpiexec at r630-01] PMI response to fd 6 pid 4: cmd=barrier_out
> [proxy:0:0 at fdr1] got pmi command (from 4): barrier_in
>
> [proxy:0:0 at fdr1] forwarding command (cmd=barrier_in) upstream
> [proxy:0:1 at r630-01] PMI response: cmd=barrier_out
> [proxy:0:1 at r630-01] got pmi command (from 0): barrier_in
>
> [proxy:0:1 at r630-01] forwarding command (cmd=barrier_in) upstream
> [mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
> [proxy:0:0 at fdr1] PMI response: cmd=barrier_out
> [mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
> [mpiexec at r630-01] PMI response to fd 9 pid 4: cmd=barrier_out
> [mpiexec at r630-01] PMI response to fd 6 pid 4: cmd=barrier_out
> [proxy:0:0 at fdr1] got pmi command (from 4): barrier_in
>
> [proxy:0:0 at fdr1] forwarding command (cmd=barrier_in) upstream
> [proxy:0:1 at r630-01] PMI response: cmd=barrier_out
> [proxy:0:0 at fdr1] PMI response: cmd=barrier_out
> [proxy:0:1 at r630-01] got pmi command (from 0): barrier_in
>
> [proxy:0:1 at r630-01] forwarding command (cmd=barrier_in) upstream
> [mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
> [mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
> [mpiexec at r630-01] PMI response to fd 9 pid 4: cmd=barrier_out
> [mpiexec at r630-01] PMI response to fd 6 pid 4: cmd=barrier_out
> [proxy:0:0 at fdr1] got pmi command (from 4): barrier_in
>
> [proxy:0:0 at fdr1] forwarding command (cmd=barrier_in) upstream
> [proxy:0:1 at r630-01] PMI response: cmd=barrier_out
> [proxy:0:0 at fdr1] PMI response: cmd=barrier_out
> [proxy:0:1 at r630-01] got pmi command (from 0): barrier_in
>
> [proxy:0:1 at r630-01] forwarding command (cmd=barrier_in) upstream
> [mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
> [mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
> [mpiexec at r630-01] PMI response to fd 9 pid 4: cmd=barrier_out
> [mpiexec at r630-01] PMI response to fd 6 pid 4: cmd=barrier_out
> [proxy:0:0 at fdr1] got pmi command (from 4): barrier_in
>
> [proxy:0:0 at fdr1] forwarding command (cmd=barrier_in) upstream
> [proxy:0:1 at r630-01] PMI response: cmd=barrier_out
> [proxy:0:1 at r630-01] got pmi command (from 0): barrier_in
>
> [proxy:0:1 at r630-01] forwarding command (cmd=barrier_in) upstream
> [mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
> [proxy:0:0 at fdr1] PMI response: cmd=barrier_out
> [mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
> [mpiexec at r630-01] PMI response to fd 9 pid 4: cmd=barrier_out
> [mpiexec at r630-01] PMI response to fd 6 pid 4: cmd=barrier_out
> [proxy:0:0 at fdr1] got pmi command (from 4): barrier_in
>
> [proxy:0:0 at fdr1] forwarding command (cmd=barrier_in) upstream
> [proxy:0:1 at r630-01] PMI response: cmd=barrier_out
> [proxy:0:1 at r630-01] got pmi command (from 0): barrier_in
>
> [proxy:0:1 at r630-01] forwarding command (cmd=barrier_in) upstream
> [mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
> [proxy:0:0 at fdr1] PMI response: cmd=barrier_out
> [mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
> [mpiexec at r630-01] PMI response to fd 9 pid 4: cmd=barrier_out
> [mpiexec at r630-01] PMI response to fd 6 pid 4: cmd=barrier_out
> [proxy:0:0 at fdr1] got pmi command (from 4): barrier_in
>
> [proxy:0:0 at fdr1] forwarding command (cmd=barrier_in) upstream
> [proxy:0:1 at r630-01] PMI response: cmd=barrier_out
> [proxy:0:0 at fdr1] PMI response: cmd=barrier_out
> # OSU MPI Bandwidth Test v4.4.1
> # Size      Bandwidth (MB/s)
> 1                       2.93
> 2                       5.84
> 4                      11.61
> 8                      22.82
> 16                     44.77
> 32                     91.21
> 64                    179.39
> 128                   341.07
> 256                   680.59
> 512                  1313.70
> 1024                 2463.06
> 2048                 3993.00
> 4096                 5147.77
> 8192                 5669.63
> 16384                5701.75
> 32768                5969.73
> 65536                6117.85
> 131072               6243.84
> 262144               6306.77
> 524288               6340.28
> 1048576              6356.89
> 2097152              6362.19
> 4194304              6273.45
> 8388608              6334.72
> 16777216             5762.50
> [proxy:0:1 at r630-01] got pmi command (from 0): barrier_in
>
> [mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
> [proxy:0:1 at r630-01] forwarding command (cmd=barrier_in) upstream
> [mpiexec at r630-01] [pgid: 0] got PMI command: cmd=barrier_in
> [mpiexec at r630-01] PMI response to fd 9 pid 4: cmd=barrier_out
> [mpiexec at r630-01] PMI response to fd 6 pid 4: cmd=barrier_out
> [proxy:0:0 at fdr1] got pmi command (from 4): barrier_in
>
> [proxy:0:0 at fdr1] forwarding command (cmd=barrier_in) upstream
> [proxy:0:1 at r630-01] PMI response: cmd=barrier_out
> [proxy:0:0 at fdr1] PMI response: cmd=barrier_out
> [proxy:0:1 at r630-01] got pmi command (from 0): finalize
>
> [proxy:0:1 at r630-01] PMI response: cmd=finalize_ack
> [proxy:0:0 at fdr1] got pmi command (from 4): finalize
>
> [proxy:0:0 at fdr1] PMI response: cmd=finalize_ack
>
>
> _______________________________________________
> mvapich-discuss mailing list
> mvapich-discuss at cse.ohio-state.edu
> http://mailman.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>