[mvapich-discuss] PMGR_COLLECTIVE ERROR - pmgr_collective_mpispawn

Steve Jones stevejones at stanford.edu
Sun Apr 27 21:26:00 EDT 2008


Hi.

A number of my Intel MPI Benchmark (IMB) jobs are failing with a
PMGR_COLLECTIVE ERROR, shown below. The failure is not consistent: the
benchmark runs successfully on a large number of nodes, and the error
seems to occur only on certain sets of nodes. Can you provide more
detail on this error?

I'm using MVAPICH 1.0 gen2 with OFED 1.2.5 on RHEL4 (kernel 2.6.9-55.0.12).
The start command is:

$ mpirun_rsh -np 136 -hostfile $PBS_NODEFILE ./IMB-MPI1

mpispawn.c:303 Unexpected exit status
Exit code -1 signaled from COMPUTE-1-3
Killing remote processes...PMGR_COLLECTIVE ERROR: reading from (read()  
Success errno=0) @ file pmgr_collective_mpispawn.c:121
PMGR_COLLECTIVE ERROR: reading from (read() Success errno=0) @ file  
pmgr_collective_mpispawn.c:121
PMGR_COLLECTIVE ERROR: reading from (read() Success errno=0) @ file  
pmgr_collective_mpispawn.c:121
PMGR_COLLECTIVE ERROR: reading from (read() Success errno=0) @ file  
pmgr_collective_mpispawn.c:121
PMGR_COLLECTIVE ERROR: reading from (read() Success errno=0) @ file  
pmgr_collective_mpispawn.c:121
PMGR_COLLECTIVE ERROR: reading from (read() Success errno=0) @ file  
pmgr_collective_mpispawn.c:121
PMGR_COLLECTIVE ERROR: unexpected value: received 0, expecting 7 @  
file pmgr_collective_mpispawn.c:137
PMGR_COLLECTIVE ERROR: reading from (read() Success errno=0) @ file  
pmgr_collective_mpispawn.c:121
PMGR_COLLECTIVE ERROR: reading from (read() Success errno=0) @ file  
pmgr_collective_mpispawn.c:121
PMGR_COLLECTIVE ERROR: reading from (read() Success errno=0) @ file  
pmgr_collective_mpispawn.c:121
PMGR_COLLECTIVE ERROR: reading from (read() Success errno=0) @ file  
pmgr_collective_mpispawn.c:121
PMGR_COLLECTIVE ERROR: unexpected value: received 0, expecting 7 @  
file pmgr_collective_mpispawn.c:137
PMGR_COLLECTIVE ERROR: reading from (read() Success errno=0) @ file  
pmgr_collective_mpispawn.c:121
PMGR_COLLECTIVE ERROR: reading from (read() Success errno=0) @ file  
pmgr_collective_mpispawn.c:121
PMGR_COLLECTIVE ERROR: reading from (read() Success errno=0) @ file  
pmgr_collective_mpispawn.c:121
PMGR_COLLECTIVE ERROR: reading from (read() Success errno=0) @ file  
pmgr_collective_mpispawn.c:121
PMGR_COLLECTIVE ERROR: reading from (read() Success errno=0) @ file  
pmgr_collective_mpispawn.c:121PMGR_COLLECTIVE ERROR: reading from  
(read() Success errno=0) @ file pmgr_collective_mpispawn.c:121
PMGR_COLLECTIVE ERROR: reading from (read() Success errno=0) @ file  
pmgr_collective_mpispawn.c:121
PMGR_COLLECTIVE ERROR: reading from (read() Success errno=0) @ file  
pmgr_collective_mpispawn.c:121
PMGR_COLLECTIVE ERROR: reading from (read() Success errno=0) @ file  
pmgr_collective_mpispawn.c:121
PMGR_COLLECTIVE ERROR: reading from (read() Success errno=0) @ file  
pmgr_collective_mpispawn.c:121
PMGR_COLLECTIVE ERROR: reading from (read() Success errno=0) @ file  
pmgr_collective_mpispawn.c:121
PMGR_COLLECTIVE ERROR: reading from (read() Success errno=0) @ file  
pmgr_collective_mpispawn.c:121
PMGR_COLLECTIVE ERROR: PMGR_COLLECTIVE ERROR: reading from (read()  
Success errno=0) @ file pmgr_collective_mpispawn.c:121
PMGR_COLLECTIVE ERROR: reading from (read() Success errno=0) @ file  
pmgr_collective_mpispawn.c:121
PMGR_COLLECTIVE ERROR: reading from (read() Success errno=0) @ file  
pmgr_collective_mpispawn.c:121
reading from (read() Success errno=0) @ file pmgr_collective_mpispawn.c:121

reading from (read() Success errno=0) @ file pmgr_collective_mpispawn.c:121
reading from (read() Success errno=0) @ file pmgr_collective_mpispawn.c:121
DONE
Signal 15 received.
Signal 15 received.
Signal 15 received.
Signal 15 received.
Signal 15 received.
Signal 15 received.
Signal 15 received.
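
Since the failure seems tied to particular sets of nodes, one way to narrow it down is to collect the launcher output from several failed runs and tally which hosts are named in the "Exit code ... signaled from" lines. The following is only a minimal sketch in Python: the function name summarize_launch_log is hypothetical, and the patterns are assumed from the message text shown above, not from any documented MVAPICH log format.

```python
import re
from collections import Counter

# Patterns assumed from the log text in this message (not a documented format).
EXIT_RE = re.compile(r"Exit code (-?\d+) signaled from (\S+)")
PMGR_RE = re.compile(r"pmgr_collective_mpispawn\.c:(\d+)")


def summarize_launch_log(text):
    """Hypothetical helper: return the (host, exit code) pairs reported by the
    launcher and a count of PMGR_COLLECTIVE messages per source line number."""
    exits = [(host, int(code)) for code, host in EXIT_RE.findall(text)]
    pmgr_lines = Counter(int(n) for n in PMGR_RE.findall(text))
    return exits, pmgr_lines


# A few lines copied from the output above, unwrapped for readability.
sample = """mpispawn.c:303 Unexpected exit status
Exit code -1 signaled from COMPUTE-1-3
PMGR_COLLECTIVE ERROR: reading from (read() Success errno=0) @ file pmgr_collective_mpispawn.c:121
PMGR_COLLECTIVE ERROR: unexpected value: received 0, expecting 7 @ file pmgr_collective_mpispawn.c:137
"""

exits, pmgr_lines = summarize_launch_log(sample)
print(exits)
print(dict(pmgr_lines))
```

Running this over the saved output of each failed job and comparing the host names would show whether the same nodes keep appearing. Because the second pattern matches only the file name and line number, it is not affected by the mail client's line wrapping.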
Signal 15 received.


