[mvapich-discuss] How to debug mvapich2 - pmi_tree.c:296 Assert failed
bright.yang at vaisala.com
Fri Feb 11 12:47:24 EST 2011
I have been seeing problems with MPI jobs on our cluster, so I decided to install
MVAPICH2 1.4.1 with debug flags on. Is there a log file I can use to debug the
problem? Even with the debug flags on, I still don't see a whole lot of
information to help me figure out what the issue is. The error output follows.
I would really appreciate your help.
mpirun_rsh -np 4 -hostfile hostfile ./real.exe
USE_LINEAR = 1
2 forks to be done
final cmd line =
LD_LIBRARY_PATH=/usr/mvapich/lib/shared:/opt/gridengine/lib/lx26-amd64
MPISPAWN_MPIRUN_MPD=0 USE_LINEAR_SSH=1
MPISPAWN_MPIRUN_HOST=caerus.vaisala.com MPIRUN_RSH_LAUNCH=1
MPISPAWN_CHECKIN_PORT=37196 MPISPAWN_MPIRUN_PORT=37196
MPISPAWN_GLOBAL_NPROCS=4 MPISPAWN_MPIRUN_ID=11345 MPISPAWN_ARGC=1
MPDMAN_KVS_TEMPLATE=kvs_306_caerus.vaisala.com_11345
MPISPAWN_LOCAL_NPROCS=2 MPISPAWN_ARGV_0=./real.exe
MPISPAWN_GENERIC_ENV_COUNT=1 MPISPAWN_GENERIC_NAME_0=MV2_XRC_FILE
MPISPAWN_GENERIC_VALUE_0=mv2_xrc_383_caerus.vaisala.com_11345
MPISPAWN_ID=0
MPISPAWN_WORKING_DIR=/home/wrf/wrf-brya-installed/mvapich2-wrf-dm/WRFV3/test/em_real
MPISPAWN_MPIRUN_RANK_0=0 MPISPAWN_VIADEV_DEFAULT_PORT_0=-1
MPISPAWN_MPIRUN_RANK_1=1 MPISPAWN_VIADEV_DEFAULT_PORT_1=-1
final cmd line =
LD_LIBRARY_PATH=/usr/mvapich/lib/shared:/opt/gridengine/lib/lx26-amd64
MPISPAWN_MPIRUN_MPD=0 USE_LINEAR_SSH=1
MPISPAWN_MPIRUN_HOST=caerus.vaisala.com MPIRUN_RSH_LAUNCH=1
MPISPAWN_CHECKIN_PORT=37196 MPISPAWN_MPIRUN_PORT=37196
MPISPAWN_GLOBAL_NPROCS=4 MPISPAWN_MPIRUN_ID=11345 MPISPAWN_ARGC=1
MPDMAN_KVS_TEMPLATE=kvs_306_caerus.vaisala.com_11345
MPISPAWN_LOCAL_NPROCS=2 MPISPAWN_ARGV_0=./real.exe
MPISPAWN_GENERIC_ENV_COUNT=1 MPISPAWN_GENERIC_NAME_0=MV2_XRC_FILE
MPISPAWN_GENERIC_VALUE_0=mv2_xrc_383_caerus.vaisala.com_11345
MPISPAWN_ID=1
MPISPAWN_WORKING_DIR=/home/wrf/wrf-brya-installed/mvapich2-wrf-dm/WRFV3/test/em_real
MPISPAWN_MPIRUN_RANK_0=2 MPISPAWN_VIADEV_DEFAULT_PORT_0=-1
MPISPAWN_MPIRUN_RANK_1=3 MPISPAWN_VIADEV_DEFAULT_PORT_1=-1
entering mpispawn_tree_init [id: 1]
[id: 1] connecting to parent
entering mpispawn_tree_init [id: 0]
[id: 0] connecting to parent
[id: 0] connected to parent
leaving mpispawn_tree_init [id: 0]
[id: 1] connected to parent
leaving mpispawn_tree_init [id: 1]
[id: 1] connecting to parent
entering conn2parent [id: 1]
[id: 0] connecting to children
verifying conn2parent [id: 1]
leaving conn2parent [id: 1]
[id: 1] connected to parent
[id: 0] connected to children
starting wrf task starting wrf task 0 1 of
of 4 starting wrf task 3 of 4
starting wrf task 2 of 4
4
pmi_tree.c:296 Assert failed (msg[n - 1] == '\n')
pmi_tree.c:296 Assert failed (msg[n - 1] == '\n')
MPI process (rank: 3) terminated unexpectedly on compute-0-6.local
Exit code -5 signaled from compute-0-6
MPI process (rank: 1) terminated unexpectedly on compute-0-5.local
Bright Yang