[mvapich-discuss] PMGR_COLLECTIVE ERROR - pmgr_collective_mpispawn
Steve Jones
stevejones at stanford.edu
Sun Apr 27 21:26:00 EDT 2008
Hi.
A number of my Intel MPI Benchmark (IMB) jobs are failing with a
PMGR_COLLECTIVE ERROR, shown below. The failure is not consistent: I can
run the benchmark on a large number of nodes, and it seems to fail only
on certain sets of nodes. Can you provide more detail on this error?
I'm using MVAPICH 1.0gen2 with OFED 1.2.5 on RHEL4 (kernel 2.6.9-55.0.12).
The start command is:
$ mpirun_rsh -np 136 -hostfile $PBS_NODEFILE ./IMB-MPI1
mpispawn.c:303 Unexpected exit status
Exit code -1 signaled from COMPUTE-1-3
Killing remote processes...PMGR_COLLECTIVE ERROR: reading from (read()
Success errno=0) @ file pmgr_collective_mpispawn.c:121
PMGR_COLLECTIVE ERROR: reading from (read() Success errno=0) @ file
pmgr_collective_mpispawn.c:121
PMGR_COLLECTIVE ERROR: reading from (read() Success errno=0) @ file
pmgr_collective_mpispawn.c:121
PMGR_COLLECTIVE ERROR: reading from (read() Success errno=0) @ file
pmgr_collective_mpispawn.c:121
PMGR_COLLECTIVE ERROR: reading from (read() Success errno=0) @ file
pmgr_collective_mpispawn.c:121
PMGR_COLLECTIVE ERROR: reading from (read() Success errno=0) @ file
pmgr_collective_mpispawn.c:121
PMGR_COLLECTIVE ERROR: unexpected value: received 0, expecting 7 @
file pmgr_collective_mpispawn.c:137
PMGR_COLLECTIVE ERROR: reading from (read() Success errno=0) @ file
pmgr_collective_mpispawn.c:121
PMGR_COLLECTIVE ERROR: reading from (read() Success errno=0) @ file
pmgr_collective_mpispawn.c:121
PMGR_COLLECTIVE ERROR: reading from (read() Success errno=0) @ file
pmgr_collective_mpispawn.c:121
PMGR_COLLECTIVE ERROR: reading from (read() Success errno=0) @ file
pmgr_collective_mpispawn.c:121
PMGR_COLLECTIVE ERROR: unexpected value: received 0, expecting 7 @
file pmgr_collective_mpispawn.c:137
PMGR_COLLECTIVE ERROR: reading from (read() Success errno=0) @ file
pmgr_collective_mpispawn.c:121
PMGR_COLLECTIVE ERROR: reading from (read() Success errno=0) @ file
pmgr_collective_mpispawn.c:121
PMGR_COLLECTIVE ERROR: reading from (read() Success errno=0) @ file
pmgr_collective_mpispawn.c:121
PMGR_COLLECTIVE ERROR: reading from (read() Success errno=0) @ file
pmgr_collective_mpispawn.c:121
PMGR_COLLECTIVE ERROR: reading from (read() Success errno=0) @ file
pmgr_collective_mpispawn.c:121PMGR_COLLECTIVE ERROR: reading from
(read() Success errno=0) @ file pmgr_collective_mpispawn.c:121
PMGR_COLLECTIVE ERROR: reading from (read() Success errno=0) @ file
pmgr_collective_mpispawn.c:121
PMGR_COLLECTIVE ERROR: reading from (read() Success errno=0) @ file
pmgr_collective_mpispawn.c:121
PMGR_COLLECTIVE ERROR: reading from (read() Success errno=0) @ file
pmgr_collective_mpispawn.c:121
PMGR_COLLECTIVE ERROR: reading from (read() Success errno=0) @ file
pmgr_collective_mpispawn.c:121
PMGR_COLLECTIVE ERROR: reading from (read() Success errno=0) @ file
pmgr_collective_mpispawn.c:121
PMGR_COLLECTIVE ERROR: reading from (read() Success errno=0) @ file
pmgr_collective_mpispawn.c:121
PMGR_COLLECTIVE ERROR: PMGR_COLLECTIVE ERROR: reading from (read()
Success errno=0) @ file pmgr_collective_mpispawn.c:121
PMGR_COLLECTIVE ERROR: reading from (read() Success errno=0) @ file
pmgr_collective_mpispawn.c:121
PMGR_COLLECTIVE ERROR: reading from (read() Success errno=0) @ file
pmgr_collective_mpispawn.c:121
reading from (read() Success errno=0) @ file pmgr_collective_mpispawn.c:121
reading from (read() Success errno=0) @ file pmgr_collective_mpispawn.c:121
reading from (read() Success errno=0) @ file pmgr_collective_mpispawn.c:121
DONE
Signal 15 received.
Signal 15 received.
Signal 15 received.
Signal 15 received.
Signal 15 received.
Signal 15 received.
Signal 15 received.
Signal 15 received.
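
Since the failure seems tied to particular sets of nodes, one way to narrow it down might be to rerun the benchmark over small groups of hosts and note which groups fail. The following is only a rough sketch under several assumptions that are not part of the report above: the helper names split_hosts and run_groups, the group size, and the output directory are hypothetical, and it assumes that a failed run either returns a non-zero exit status from mpirun_rsh or leaves "PMGR_COLLECTIVE ERROR" in its output. It reuses only the mpirun_rsh options from the start command.

```shell
#!/bin/sh
# Sketch: run the benchmark on small groups of hosts to see which groups fail.
# split_hosts and run_groups are hypothetical helper names for illustration.

# Write the unique hosts from hostfile $1 into files of $2 hosts each
# under directory $3 (group.000, group.001, ...).
split_hosts() {
    hostfile=$1; size=$2; outdir=$3
    mkdir -p "$outdir" || return 1
    rm -f "$outdir"/group.*
    # The PBS node file usually repeats a host once per slot; keep one entry each.
    sort -u "$hostfile" | awk -v n="$size" -v d="$outdir" '
        { f = sprintf("%s/group.%03d", d, int((NR - 1) / n)); print >> f; close(f) }'
}

# Run binary $2 once per group file in directory $1, one process per host.
# Assumption: a bad run exits non-zero or prints PMGR_COLLECTIVE ERROR.
run_groups() {
    for g in "$1"/group.*; do
        np=$(wc -l < "$g" | tr -d ' ')
        log="$1/out.${g##*/}"
        if mpirun_rsh -np "$np" -hostfile "$g" "$2" > "$log" 2>&1 &&
           ! grep -q 'PMGR_COLLECTIVE ERROR' "$log"; then
            echo "PASS $(tr '\n' ' ' < "$g")"
        else
            echo "FAIL $(tr '\n' ' ' < "$g")"
        fi
    done
}
```

Inside a job this could be called as: split_hosts "$PBS_NODEFILE" 2 ./groups followed by run_groups ./groups ./IMB-MPI1. Running one process per host keeps each test focused on traffic between nodes. A group that fails repeatedly would point at a node or link to inspect, although a node that only fails at larger scale would not show up this way.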