[mvapich-discuss] Hydra initialization timeout on mvapich2 1.6

Jonathan Perkins perkinjo at cse.ohio-state.edu
Thu Apr 7 08:30:25 EDT 2011


Hi, let's try something simpler to start out.  There could be
something small tripping this up.

First, are you trying to run over IB or IPoIB?  By specifying the
-iface option of mpiexec, I believe you are telling hydra to
communicate over the IPoIB network, but I don't believe this should
affect the network used by the MPI app itself.  Let's remove this
option for the time being.
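
(As a side check, it may also be worth confirming that the IB ports
are up and active on both nodes, for example with `ibv_devinfo' or
`ibstat' from your OFED install, just to rule out a fabric problem.)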

Did you recompile mpi_hello against the mvapich2 library?  Please try
this again just to be sure.
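
In case it is useful, a minimal hello program along these lines (just
a sketch; any equivalent test program is fine) is all that's needed:

  #include <mpi.h>
  #include <stdio.h>

  int main(int argc, char **argv)
  {
      int rank, size, len;
      char name[MPI_MAX_PROCESSOR_NAME];

      /* start MPI and report which rank landed on which host */
      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      MPI_Comm_size(MPI_COMM_WORLD, &size);
      MPI_Get_processor_name(name, &len);
      printf("Hello from rank %d of %d on %s\n", rank, size, name);
      MPI_Finalize();
      return 0;
  }

Rebuild it with the mpicc from the mvapich2 1.6 install, e.g.
`mpicc -o mpi_hello mpi_hello.c' (your module file already puts the
mvapich2 bin directory first in PATH, so that mpicc should be the one
picked up).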

Your run should look like `mpiexec -v -f /tmp/hostfile -n 2
./mpi_hello'.  You can also try running the OSU benchmarks; these are
installed in the same prefix as mvapich2, in the libexec directory
(/usr/local/libexec/osu-micro-benchmarks according to your configure
options).
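
For example, something like `mpiexec -v -f /tmp/hostfile -n 2
/usr/local/libexec/osu-micro-benchmarks/osu_latency' (adjusting the
path if the benchmarks ended up under a different prefix) exercises
the same two-node startup path.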

Please let us know how this goes.

On Wed, Apr 6, 2011 at 4:54 PM,  <davidr at ressman.org> wrote:
> Hello all,
>
> I'm running into a problem with hydra on mvapich2-1.6, and I'm sure
> I'm doing something wrong, but I can't figure out what it is. Here's
> what the test environment currently looks like:
>
> 2 hosts, computea and computeb, running:
>  ubuntu 10.04 LTS
>  ofed 1.5.2_2
>  2 IP interfaces, hosta (gigE) and hosta-ib0 (connectx II QDR IB)
>  mvapich2-1.6, configured with:
>   --enable-sharedlibs=gcc \
>   --with-pm=hydra \
>   --enable-f77 \
>   --enable-fc \
>   --enable-cxx \
>   --enable-romio
>
> I have a simple mpi_hello program, and my hosts file looks like:
>
> computea:1
> computeb:1
>
> When I run the following command:
>
> mpiexec -iface ib0 -v -f /tmp/hostfile -n 2 ./mpi_hello
>
> I get the Hydra initialization debug messages, and then the job
> hangs indefinitely. I can see that the mpi_hello processes have been
> started on both hosts, as well as the hydra_pmi_proxy processes.  If
> I set MPIEXEC_TIMEOUT, the job times out and exits properly.  I don't
> see anything particularly useful in the debug output below, but I'm
> not familiar at all with Hydra, so I doubt very much I'd be able to
> recognize the problem.
>
> What is most confusing is that I tried precisely the same setup on
> mpich2 1.3.2p1 (over ethernet) and it worked perfectly. What did I
> break in mvapich2?
>
> Thanks!
>
> The output follows:
>
> -- BEGIN MPIEXEC OUTPUT --
>
> mpiexec options:
> ----------------
>  Base path: /usr/local/pkg/software/modules_repo/mvapich2/1.6/bin/
>  Launcher: (null)
>  Debug level: 1
>  Enable X: -1
>
>  Global environment:
>  -------------------
>   MPIEXEC_TIMEOUT=20
>   MODULE_VERSION_STACK=3.2.7
>   MANPATH=/usr/local/pkg/software/modules_repo/mvapich2/1.6/share/man:/usr/man
>   TERM=xterm
>   SHELL=/bin/bash
>   HISTSIZE=1000
>   SSH_CLIENT=192.168.1018.100 44940 22
>   LIBRARY_PATH=/usr/local/pkg/software/modules_repo/mvapich2/1.6/lib
>   OLDPWD=/home/myuser
>   SSH_TTY=/dev/pts/0
>   USER=myuser
>   LD_LIBRARY_PATH=/usr/local/pkg/software/modules_repo/mvapich2/1.6/lib
>   CPATH=/usr/local/pkg/software/modules_repo/mvapich2/1.6/include
>   MODULE_VERSION=3.2.7
>   MAIL=/var/mail/myuser
>   PATH=/usr/local/pkg/software/modules_repo/mvapich2/1.6/bin:/usr/local/bin:/usr/bin:/bin:/sbin:/usr/sbin:/usr/local/sbin
>   PWD=/home/myuser/jobs/mpi_hello
>   _LMFILES_=/physical/gpfs/oak-hpc/home_01/grid_software/modules_modulefiles/mpi/mvapich2/1.6
>   LANG=en_US
>   MODULEPATH=/mnt/nfs/GRID_SOFTWARE/modules_modulefiles:/physical/gpfs/oak-hpc/home_01/grid_software/modules_modulefiles
>   LOADEDMODULES=mpi/mvapich2/1.6
>   PS1=\h \t \W [\!/$?] \$
>   HISTCONTROL=ignoreboth
>   PS2=  >
>   SHLVL=1
>   HOME=/home/myuser
>   MODULE_VERSION=3.2.7
>   BASH_ENV=~/.bashrc
>   LOGNAME=myuser
>   SSH_CONNECTION=192.168.1018.100 44940 192.168.100.210 22
>   MODULESHOME=/usr/Modules/3.2.7
>   INCLUDE=/usr/local/pkg/software/modules_repo/mvapich2/1.6/include
>   HISTFILE=/home/myuser/.bash_history.d/history.computea
>   module=() {  eval `/usr/Modules/$MODULE_VERSION/bin/modulecmd bash $*` }
>   _=/usr/local/pkg/software/modules_repo/mvapich2/1.6/bin/mpiexec
>
>  Hydra internal environment:
>  ---------------------------
>   GFORTRAN_UNBUFFERED_PRECONNECTED=y
>
>
>   Proxy information:
>   *********************
>     Proxy ID:  1
>     -----------------
>       Proxy name: computea
>       Process count: 1
>
>       Proxy exec list:
>       ....................
>         Exec: ./mpi_hello; Process count: 1
>     Proxy ID:  2
>     -----------------
>       Proxy name: computeb
>       Process count: 1
>
>       Proxy exec list:
>       ....................
>         Exec: ./mpi_hello; Process count: 1
>
> ==================================================================================================
>
> [mpiexec at computea] Timeout set to 20 (-1 means infinite)
> [mpiexec at computea] Got a control port string of computea:32853
>
> Proxy launch args:
> /usr/local/pkg/software/modules_repo/mvapich2/1.6/bin/hydra_pmi_proxy
> --control-port computea:32853 --debug --demux poll --pgid 0 --proxy-id
>
> [mpiexec at computea] PMI FD: (null); PMI PORT: (null); PMI ID/RANK: -1
> Arguments being passed to proxy 0:
> --version 1.6rc3 --interface-env-name MPICH_INTERFACE_HOSTNAME
> --hostname computea --global-core-map 0,1,1 --filler-process-map 0,1,1
> --global-process-count 2 --auto-cleanup 1 --pmi-rank -1 --pmi-kvsname
> kvs_11662_0 --pmi-process-mapping (vector,(0,2,1)) --ckpoint-num -1
> --global-inherited-env 35 'MPIEXEC_TIMEOUT=20'
> 'MODULE_VERSION_STACK=3.2.7'
> 'MANPATH=/usr/local/pkg/software/modules_repo/mvapich2/1.6/share/man:/usr/man'
> 'TERM=xterm' 'SHELL=/bin/bash' 'HISTSIZE=1000'
> 'SSH_CLIENT=192.168.1018.100 44940 22'
> 'LIBRARY_PATH=/usr/local/pkg/software/modules_repo/mvapich2/1.6/lib'
> 'OLDPWD=/home/myuser' 'SSH_TTY=/dev/pts/0' 'USER=myuser'
> 'LD_LIBRARY_PATH=/usr/local/pkg/software/modules_repo/mvapich2/1.6/lib'
> 'CPATH=/usr/local/pkg/software/modules_repo/mvapich2/1.6/include'
> 'MODULE_VERSION=3.2.7' 'MAIL=/var/mail/myuser'
> 'PATH=/usr/local/pkg/software/modules_repo/mvapich2/1.6/bin:/usr/local/bin:/usr/bin:/bin:/sbin:/usr/sbin:/usr/local/sbin'
> 'PWD=/home/myuser/jobs/mpi_hello' '_LMFILES_
> =/physical/gpfs/oak-hpc/home_01/grid_software/modules_modulefiles/mpi/mvapich2/1.6'
> 'LANG=en_US' 'MODULEPATH=/mnt/nfs/GRID_SOFTWARE/modules_modulefiles:/physical/gpfs/oak-hpc/home_01/grid_software/modules_modulefiles'
> 'LOADEDMODULES=mpi/mvapich2/1.6' 'PS1=\h \t \W [\!/$?] \$ '
> 'HISTCONTROL=ignoreboth' 'PS2=  > ' 'SHLVL=1' 'HOME=/home/myuser'
> 'MODULE_VERSION=3.2.7' 'BASH_ENV=~/.bashrc' 'LOGNAME=myuser'
> 'SSH_CONNECTION=192.168.1018.100 44940 192.168.100.210 22'
> 'MODULESHOME=/usr/Modules/3.2.7'
> 'INCLUDE=/usr/local/pkg/software/modules_repo/mvapich2/1.6/include'
> 'HISTFILE=/home/myuser/.bash_history.d/history.computea' 'module=() {
> eval `/usr/Modules/$MODULE_VERSION/bin/modulecmd bash $*` }'
> '_=/usr/local/pkg/software/modules_repo/mvapich2/1.6/bin/mpiexec'
> --global-user-env 0 --global-system-env 1
> 'GFORTRAN_UNBUFFERED_PRECONNECTED=y' --proxy-core-count 1 --exec
> --exec-appnum 0 --exec-proc-count 1 --exec-local-env 0 --exec-wdir
> /home/myuser/jobs/mpi_hello --exec-args 1 ./mpi_hello
>
> [mpiexec at computea] PMI FD: (null); PMI PORT: (null); PMI ID/RANK: -1
> Arguments being passed to proxy 1:
> --version 1.6rc3 --interface-env-name MPICH_INTERFACE_HOSTNAME
> --hostname computeb --global-core-map 1,1,0 --filler-process-map 1,1,0
> --global-process-count 2 --auto-cleanup 1 --pmi-rank -1 --pmi-kvsname
> kvs_11662_0 --pmi-process-mapping (vector,(0,2,1)) --ckpoint-num -1
> --global-inherited-env 35 'MPIEXEC_TIMEOUT=20'
> 'MODULE_VERSION_STACK=3.2.7'
> 'MANPATH=/usr/local/pkg/software/modules_repo/mvapich2/1.6/share/man:/usr/man'
> 'TERM=xterm' 'SHELL=/bin/bash' 'HISTSIZE=1000'
> 'SSH_CLIENT=192.168.1018.100 44940 22'
> 'LIBRARY_PATH=/usr/local/pkg/software/modules_repo/mvapich2/1.6/lib'
> 'OLDPWD=/home/myuser' 'SSH_TTY=/dev/pts/0' 'USER=myuser'
> 'LD_LIBRARY_PATH=/usr/local/pkg/software/modules_repo/mvapich2/1.6/lib'
> 'CPATH=/usr/local/pkg/software/modules_repo/mvapich2/1.6/include'
> 'MODULE_VERSION=3.2.7' 'MAIL=/var/mail/myuser'
> 'PATH=/usr/local/pkg/software/modules_repo/mvapich2/1.6/bin:/usr/local/bin:/usr/bin:/bin:/sbin:/usr/sbin:/usr/local/sbin'
> 'PWD=/home/myuser/jobs/mpi_hello' '_LMFILES_
> =/physical/gpfs/oak-hpc/home_01/grid_software/modules_modulefiles/mpi/mvapich2/1.6'
> 'LANG=en_US' 'MODULEPATH=/mnt/nfs/GRID_SOFTWARE/modules_modulefiles:/physical/gpfs/oak-hpc/home_01/grid_software/modules_modulefiles'
> 'LOADEDMODULES=mpi/mvapich2/1.6' 'PS1=\h \t \W [\!/$?] \$ '
> 'HISTCONTROL=ignoreboth' 'PS2=  > ' 'SHLVL=1' 'HOME=/home/myuser'
> 'MODULE_VERSION=3.2.7' 'BASH_ENV=~/.bashrc' 'LOGNAME=myuser'
> 'SSH_CONNECTION=192.168.1018.100 44940 192.168.100.210 22'
> 'MODULESHOME=/usr/Modules/3.2.7'
> 'INCLUDE=/usr/local/pkg/software/modules_repo/mvapich2/1.6/include'
> 'HISTFILE=/home/myuser/.bash_history.d/history.computea' 'module=() {
> eval `/usr/Modules/$MODULE_VERSION/bin/modulecmd bash $*` }'
> '_=/usr/local/pkg/software/modules_repo/mvapich2/1.6/bin/mpiexec'
> --global-user-env 0 --global-system-env 1
> 'GFORTRAN_UNBUFFERED_PRECONNECTED=y' --proxy-core-count 1 --exec
> --exec-appnum 0 --exec-proc-count 1 --exec-local-env 0 --exec-wdir
> /home/myuser/jobs/mpi_hello --exec-args 1 ./mpi_hello
>
> [mpiexec at computea] Launch arguments:
> /usr/local/pkg/software/modules_repo/mvapich2/1.6/bin/hydra_pmi_proxy
> --control-port computea:32853 --debug --demux poll --pgid 0 --proxy-id
> 0 [mpiexec at computea] Launch arguments: /usr/bin/ssh -x computeb
> "/usr/local/pkg/software/modules_repo/mvapich2/1.6/bin/hydra_pmi_proxy"
> --control-port computea:32853 --debug --demux poll --pgid 0 --proxy-id
> 1 [proxy:0:0 at computea] got pmi command (from 0): init
> pmi_version=1 pmi_subversion=1
> [proxy:0:0 at computea] PMI response: cmd=response_to_init pmi_version=1
> pmi_subversion=1 rc=0 [proxy:0:0 at computea] got pmi command (from 0):
> get_maxes
>
> [proxy:0:0 at computea] PMI response: cmd=maxes kvsname_max=256
> keylen_max=64 vallen_max=1024 [proxy:0:0 at computea] got pmi command
> (from 0): get_appnum
>
> [proxy:0:0 at computea] PMI response: cmd=appnum appnum=0
> [proxy:0:0 at computea] got pmi command (from 0): get_my_kvsname
>
> [proxy:0:0 at computea] PMI response: cmd=my_kvsname kvsname=kvs_11662_0
> [proxy:0:0 at computea] got pmi command (from 0): get_my_kvsname
>
> [proxy:0:0 at computea] PMI response: cmd=my_kvsname kvsname=kvs_11662_0
> [proxy:0:0 at computea] got pmi command (from 0): get
> kvsname=kvs_11662_0 key=PMI_process_mapping [proxy:0:0 at computea] PMI
> response: cmd=get_result rc=0 msg=success value=(vector,(0,2,1))
> [proxy:0:0 at computea] got pmi command (from 0): put
> kvsname=kvs_11662_0 key=MVAPICH2_0000 value=00000146:004c0051:004c0052:
> [proxy:0:0 at computea] we don't understand this command put; forwarding
> upstream [mpiexec at computea] [pgid: 0] got PMI command: cmd=put
> kvsname=kvs_11662_0 key=MVAPICH2_0000
> value=00000146:004c0051:004c0052:
> [mpiexec at computea] PMI response to fd 6 pid 0: cmd=put_result rc=0
> msg=success [proxy:0:0 at computea] we don't understand the response
> put_result; forwarding downstream [proxy:0:0 at computea] got pmi command
> (from 0): barrier_in
>
> [proxy:0:0 at computea] forwarding command (cmd=barrier_in) upstream
> [mpiexec at computea] [pgid: 0] got PMI command: cmd=barrier_in
> [proxy:0:1 at computeb] got pmi command (from 4): init
> pmi_version=1 pmi_subversion=1
> [proxy:0:1 at computeb] PMI response: cmd=response_to_init pmi_version=1
> pmi_subversion=1 rc=0 [proxy:0:1 at computeb] got pmi command (from 4):
> get_maxes
>
> [proxy:0:1 at computeb] PMI response: cmd=maxes kvsname_max=256
> keylen_max=64 vallen_max=1024 [proxy:0:1 at computeb] got pmi command
> (from 4): get_appnum
>
> [proxy:0:1 at computeb] PMI response: cmd=appnum appnum=0
> [proxy:0:1 at computeb] got pmi command (from 4): get_my_kvsname
>
> [proxy:0:1 at computeb] PMI response: cmd=my_kvsname kvsname=kvs_11662_0
> [proxy:0:1 at computeb] got pmi command (from 4): get_my_kvsname
>
> [proxy:0:1 at computeb] PMI response: cmd=my_kvsname kvsname=kvs_11662_0
> [proxy:0:1 at computeb] got pmi command (from 4): get
> kvsname=kvs_11662_0 key=PMI_process_mapping [proxy:0:1 at computeb] PMI
> response: cmd=get_result rc=0 msg=success value=(vector,(0,2,1))
> [mpiexec at computea] [pgid: 0] got PMI command: cmd=put
> kvsname=kvs_11662_0 key=MVAPICH2_0001
> value=00000147:00580051:00580052:
> [mpiexec at computea] PMI response to fd 7 pid 4: cmd=put_result rc=0
> msg=success [proxy:0:1 at computeb] got pmi command (from 4): put
> kvsname=kvs_11662_0 key=MVAPICH2_0001 value=00000147:00580051:00580052:
> [proxy:0:1 at computeb] we don't understand this command put; forwarding
> upstream [proxy:0:1 at computeb] we don't understand the response
> put_result; forwarding downstream [mpiexec at computea] [pgid: 0] got PMI
> command: cmd=barrier_in [mpiexec at computea] PMI response to fd 6 pid 4:
> cmd=barrier_out [mpiexec at computea] PMI response to fd 7 pid 4:
> cmd=barrier_out [proxy:0:0 at computea] PMI response: cmd=barrier_out
> [proxy:0:1 at computeb] got pmi command (from 4): barrier_in
>
> [proxy:0:1 at computeb] forwarding command (cmd=barrier_in) upstream
> [proxy:0:0 at computea] got pmi command (from 0): get
> kvsname=kvs_11662_0 key=MVAPICH2_0001
> [mpiexec at computea] [pgid: 0] got PMI command: cmd=get
> kvsname=kvs_11662_0 key=MVAPICH2_0001 [mpiexec at computea] PMI response
> to fd 6 pid 0: cmd=get_result rc=0 msg=success
> value=00000147:00580051:00580052:
> [proxy:0:0 at computea] forwarding command (cmd=get kvsname=kvs_11662_0
> key=MVAPICH2_0001) upstream [proxy:0:0 at computea] we don't understand
> the response get_result; forwarding downstream [proxy:0:0 at computea]
> got pmi command (from 0): get
> kvsname=kvs_11662_0 key=MVAPICH2_0001
> [proxy:0:0 at computea] forwarding command (cmd=get kvsname=kvs_11662_0
> key=MVAPICH2_0001) upstream [proxy:0:1 at computeb] PMI response:
> cmd=barrier_out [mpiexec at computea] [pgid: 0] got PMI command: cmd=get
> kvsname=kvs_11662_0 key=MVAPICH2_0001 [mpiexec at computea] PMI response
> to fd 6 pid 0: cmd=get_result rc=0 msg=success
> value=00000147:00580051:00580052:
> [mpiexec at computea] [pgid: 0] got PMI command: cmd=get
> kvsname=kvs_11662_0 key=MVAPICH2_0000 [mpiexec at computea] PMI response
> to fd 7 pid 4: cmd=get_result rc=0 msg=success
> value=00000146:004c0051:004c0052:
> [proxy:0:0 at computea] we don't understand the response get_result;
> forwarding downstream [proxy:0:1 at computeb] got pmi command (from 4):
> get
> kvsname=kvs_11662_0 key=MVAPICH2_0000
> [proxy:0:1 at computeb] forwarding command (cmd=get kvsname=kvs_11662_0
> key=MVAPICH2_0000) upstream [proxy:0:0 at computea] got pmi command (from
> 0): barrier_in
>
> [proxy:0:0 at computea] forwarding command (cmd=barrier_in) upstream
> [mpiexec at computea] [pgid: 0] got PMI command: cmd=barrier_in
> [proxy:0:1 at computeb] we don't understand the response get_result;
> forwarding downstream [mpiexec at computea] [pgid: 0] got PMI command:
> cmd=get kvsname=kvs_11662_0 key=MVAPICH2_0000 [mpiexec at computea] PMI
> response to fd 7 pid 4: cmd=get_result rc=0 msg=success
> value=00000146:004c0051:004c0052:
> [proxy:0:1 at computeb] got pmi command (from 4): get
> kvsname=kvs_11662_0 key=MVAPICH2_0000
> [proxy:0:1 at computeb] forwarding command (cmd=get kvsname=kvs_11662_0
> key=MVAPICH2_0000) upstream [proxy:0:1 at computeb] we don't understand
> the response get_result; forwarding downstream [mpiexec at computea]
> [pgid: 0] got PMI command: cmd=barrier_in [mpiexec at computea] PMI
> response to fd 6 pid 4: cmd=barrier_out [mpiexec at computea] PMI
> response to fd 7 pid 4: cmd=barrier_out [proxy:0:1 at computeb] got pmi
> command (from 4): barrier_in
>
> [proxy:0:1 at computeb] forwarding command (cmd=barrier_in) upstream
> [proxy:0:0 at computea] PMI response: cmd=barrier_out
> [proxy:0:1 at computeb] PMI response: cmd=barrier_out
> [proxy:0:0 at computea] got pmi command (from 0): barrier_in
>
> [proxy:0:0 at computea] forwarding command (cmd=barrier_in) upstream
> [mpiexec at computea] [pgid: 0] got PMI command: cmd=barrier_in
> [mpiexec at computea] [pgid: 0] got PMI command: cmd=barrier_in
> [mpiexec at computea] PMI response to fd 6 pid 4: cmd=barrier_out
> [mpiexec at computea] PMI response to fd 7 pid 4: cmd=barrier_out
> [proxy:0:0 at computea] PMI response: cmd=barrier_out
> [proxy:0:1 at computeb] got pmi command (from 4): barrier_in
>
> [proxy:0:1 at computeb] forwarding command (cmd=barrier_in) upstream
> [proxy:0:1 at computeb] PMI response: cmd=barrier_out
> [proxy:0:0 at computea] got pmi command (from 0): barrier_in
>
> [proxy:0:0 at computea] forwarding command (cmd=barrier_in) upstream
> [mpiexec at computea] [pgid: 0] got PMI command: cmd=barrier_in
> [mpiexec at computea] [pgid: 0] got PMI command: cmd=barrier_in
> [mpiexec at computea] PMI response to fd 6 pid 4: cmd=barrier_out
> [mpiexec at computea] PMI response to fd 7 pid 4: cmd=barrier_out
> [proxy:0:0 at computea] PMI response: cmd=barrier_out
> [proxy:0:1 at computeb] got pmi command (from 4): barrier_in
>
> [proxy:0:1 at computeb] forwarding command (cmd=barrier_in) upstream
> [proxy:0:1 at computeb] PMI response: cmd=barrier_out
> [proxy:0:0 at computea] got pmi command (from 0): barrier_in
>
> [proxy:0:0 at computea] forwarding command (cmd=barrier_in) upstream
> [mpiexec at computea] [pgid: 0] got PMI command: cmd=barrier_in
> [mpiexec at computea] [pgid: 0] got PMI command: cmd=barrier_in
> [mpiexec at computea] PMI response to fd 6 pid 4: cmd=barrier_out
> [mpiexec at computea] PMI response to fd 7 pid 4: cmd=barrier_out
> [proxy:0:0 at computea] PMI response: cmd=barrier_out
> [proxy:0:1 at computeb] got pmi command (from 4): barrier_in
>
> [proxy:0:1 at computeb] forwarding command (cmd=barrier_in) upstream
> [proxy:0:1 at computeb] PMI response: cmd=barrier_out [mpiexec at computea]
> [pgid: 0] got PMI command: cmd=barrier_in [proxy:0:1 at computeb] got pmi
> command (from 4): barrier_in
>
> [proxy:0:1 at computeb] forwarding command (cmd=barrier_in) upstream
> [proxy:0:0 at computea] got pmi command (from 0): barrier_in
>
> [proxy:0:0 at computea] forwarding command (cmd=barrier_in) upstream
> [mpiexec at computea] [pgid: 0] got PMI command: cmd=barrier_in
> [mpiexec at computea] PMI response to fd 6 pid 0: cmd=barrier_out
> [mpiexec at computea] PMI response to fd 7 pid 0: cmd=barrier_out
> [proxy:0:0 at computea] PMI response: cmd=barrier_out
> [proxy:0:1 at computeb] PMI response: cmd=barrier_out [mpiexec at computea]
> [pgid: 0] got PMI command: cmd=barrier_in [proxy:0:1 at computeb] got pmi
> command (from 4): barrier_in
>
> [proxy:0:1 at computeb] forwarding command (cmd=barrier_in) upstream
> [proxy:0:0 at computea] got pmi command (from 0): barrier_in
>
> [proxy:0:0 at computea] forwarding command (cmd=barrier_in) upstream
> [mpiexec at computea] [pgid: 0] got PMI command: cmd=barrier_in
> [mpiexec at computea] PMI response to fd 6 pid 0: cmd=barrier_out
> [mpiexec at computea] PMI response to fd 7 pid 0: cmd=barrier_out
> [proxy:0:0 at computea] PMI response: cmd=barrier_out
> [proxy:0:0 at computea] got pmi command (from 0): barrier_in
>
> [proxy:0:0 at computea] forwarding command (cmd=barrier_in) upstream
> [mpiexec at computea] [pgid: 0] got PMI command: cmd=barrier_in
> [proxy:0:1 at computeb] PMI response: cmd=barrier_out [mpiexec at computea]
> [pgid: 0] got PMI command: cmd=barrier_in [mpiexec at computea] PMI
> response to fd 6 pid 4: cmd=barrier_out [mpiexec at computea] PMI
> response to fd 7 pid 4: cmd=barrier_out [proxy:0:0 at computea] PMI
> response: cmd=barrier_out [proxy:0:1 at computeb] got pmi command (from
> 4): barrier_in
>
> [proxy:0:1 at computeb] forwarding command (cmd=barrier_in) upstream
> [proxy:0:0 at computea] got pmi command (from 0): barrier_in
>
> [proxy:0:0 at computea] forwarding command (cmd=barrier_in) upstream
> [mpiexec at computea] [pgid: 0] got PMI command: cmd=barrier_in
> [proxy:0:1 at computeb] PMI response: cmd=barrier_out [mpiexec at computea]
> [pgid: 0] got PMI command: cmd=barrier_in [mpiexec at computea] PMI
> response to fd 6 pid 4: cmd=barrier_out [mpiexec at computea] PMI
> response to fd 7 pid 4: cmd=barrier_out [proxy:0:0 at computea] PMI
> response: cmd=barrier_out [proxy:0:1 at computeb] got pmi command (from
> 4): barrier_in
>
> [proxy:0:1 at computeb] forwarding command (cmd=barrier_in) upstream
> [proxy:0:1 at computeb] PMI response: cmd=barrier_out
>
> <  at this point, output stops until the job times out or I Ctrl-C it  >
>
> -- END MPIEXEC OUTPUT --
> _______________________________________________
> mvapich-discuss mailing list
> mvapich-discuss at cse.ohio-state.edu
> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>
>



-- 
Jonathan Perkins
http://www.cse.ohio-state.edu/~perkinjo


