[mvapich-discuss] How to debug mvapich2 - pmi_tree.c:296 Assert failed

bright.yang at vaisala.com bright.yang at vaisala.com
Fri Feb 11 12:47:24 EST 2011


I have seen issues with MPI jobs on our cluster, so I installed
MVAPICH2 1.4.1 with the debug flags enabled. Is there a log file I can
use to debug the problem? Even with the debug flags on, I don't see
much information that helps me figure out what the issue is. The
command I ran and its output, including the error message, follow.
I would really appreciate your help.

mpirun_rsh -np 4 -hostfile hostfile ./real.exe

USE_LINEAR = 1

2 forks to be done

final cmd line =
LD_LIBRARY_PATH=/usr/mvapich/lib/shared:/opt/gridengine/lib/lx26-amd64
MPISPAWN_MPIRUN_MPD=0 USE_LINEAR_SSH=1
MPISPAWN_MPIRUN_HOST=caerus.vaisala.com MPIRUN_RSH_LAUNCH=1
MPISPAWN_CHECKIN_PORT=37196 MPISPAWN_MPIRUN_PORT=37196
MPISPAWN_GLOBAL_NPROCS=4 MPISPAWN_MPIRUN_ID=11345 MPISPAWN_ARGC=1
MPDMAN_KVS_TEMPLATE=kvs_306_caerus.vaisala.com_11345
MPISPAWN_LOCAL_NPROCS=2 MPISPAWN_ARGV_0=./real.exe
MPISPAWN_GENERIC_ENV_COUNT=1  MPISPAWN_GENERIC_NAME_0=MV2_XRC_FILE
MPISPAWN_GENERIC_VALUE_0=mv2_xrc_383_caerus.vaisala.com_11345
MPISPAWN_ID=0
MPISPAWN_WORKING_DIR=/home/wrf/wrf-brya-installed/mvapich2-wrf-dm/WRFV3/test/em_real
MPISPAWN_MPIRUN_RANK_0=0 MPISPAWN_VIADEV_DEFAULT_PORT_0=-1
MPISPAWN_MPIRUN_RANK_1=1 MPISPAWN_VIADEV_DEFAULT_PORT_1=-1

final cmd line =
LD_LIBRARY_PATH=/usr/mvapich/lib/shared:/opt/gridengine/lib/lx26-amd64
MPISPAWN_MPIRUN_MPD=0 USE_LINEAR_SSH=1
MPISPAWN_MPIRUN_HOST=caerus.vaisala.com MPIRUN_RSH_LAUNCH=1
MPISPAWN_CHECKIN_PORT=37196 MPISPAWN_MPIRUN_PORT=37196
MPISPAWN_GLOBAL_NPROCS=4 MPISPAWN_MPIRUN_ID=11345 MPISPAWN_ARGC=1
MPDMAN_KVS_TEMPLATE=kvs_306_caerus.vaisala.com_11345
MPISPAWN_LOCAL_NPROCS=2 MPISPAWN_ARGV_0=./real.exe
MPISPAWN_GENERIC_ENV_COUNT=1  MPISPAWN_GENERIC_NAME_0=MV2_XRC_FILE
MPISPAWN_GENERIC_VALUE_0=mv2_xrc_383_caerus.vaisala.com_11345
MPISPAWN_ID=1
MPISPAWN_WORKING_DIR=/home/wrf/wrf-brya-installed/mvapich2-wrf-dm/WRFV3/test/em_real
MPISPAWN_MPIRUN_RANK_0=2 MPISPAWN_VIADEV_DEFAULT_PORT_0=-1
MPISPAWN_MPIRUN_RANK_1=3 MPISPAWN_VIADEV_DEFAULT_PORT_1=-1

entering mpispawn_tree_init [id: 1]

[id: 1] connecting to parent

entering mpispawn_tree_init [id: 0]

[id: 0] connecting to parent

[id: 0] connected to parent

leaving mpispawn_tree_init [id: 0]

[id: 1] connected to parent

leaving mpispawn_tree_init [id: 1]

[id: 1] connecting to parent

entering conn2parent [id: 1]

[id: 0] connecting to children

verifying conn2parent [id: 1]

leaving conn2parent [id: 1]

[id: 1] connected to parent

[id: 0] connected to children

 starting wrf task  starting wrf task             0             1 of
of            4 starting wrf task             3  of             4

 starting wrf task             2  of             4

            4

pmi_tree.c:296 Assert failed (msg[n - 1] == '\n')

pmi_tree.c:296 Assert failed (msg[n - 1] == '\n')

MPI process (rank: 3) terminated unexpectedly on compute-0-6.local

Exit code -5 signaled from compute-0-6

MPI process (rank: 1) terminated unexpectedly on compute-0-5.local

 

Bright Yang
