[mvapich-discuss] help: Poll CQ failed!

Jeff Haferman jeff at haferman.com
Mon Dec 28 20:09:16 EST 2009


I've built four mvapich 1.0.1 stacks (PGI, GNU, Intel, Sun) and
one mvapich2 1.4 stack (PGI), and I'm getting the same problem with all of
them when running just the simple "cpi" test:

With mvapich1:
mpirun -np 16 -machinefile ./hostfile.16 ./cpi
Abort signaled by rank 6: Error polling CQ
MPI process terminated unexpectedly
Signal 15 received.
DONE

With mvapich2:
mpirun_rsh -ssh -np 3 -hostfile ./hostfile.16 ./cpi
Fatal error in MPI_Init:
Internal MPI error!, error stack:
MPIR_Init_thread(311).........: Initialization failed
MPID_Init(191)................: channel initialization failed
MPIDI_CH3_Init(163)...........: 
MPIDI_CH3I_RDMA_init(190).....: 
rdma_ring_based_allgather(545): Poll CQ failed!


The INTERESTING thing is that sometimes these run successfully! They
almost always run with 2-4 processors, but generally fail with more than
4 processors (and my hostfile is set up to ensure that the processors are
on physically separate nodes). Today, though, I've had a hard time
getting mvapich1 to fail with any number of processors.
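
Since the failures are intermittent, a sweep over process counts could show
roughly where they begin. The following is only a sketch, not something taken
from the mvapich documentation: the count_failures helper, the list of process
counts, and the 10 runs per count are all hypothetical choices, and it assumes
mpirun exits with a non-zero status when a rank aborts, which may not hold for
every launcher. The mpirun command line is the mvapich1 one quoted above.

```shell
#!/bin/sh
# Hypothetical helper: run a command N times and print how many of those
# runs exited with a non-zero status.
count_failures() {
    runs=$1; shift
    fails=0
    i=0
    while [ "$i" -lt "$runs" ]; do
        # Discard program output; only the exit status matters here.
        "$@" >/dev/null 2>&1 || fails=$((fails + 1))
        i=$((i + 1))
    done
    echo "$fails"
}

# Sweep process counts with the mvapich1 command line shown earlier.
# Guarded so the script does nothing on a machine without mpirun or
# without the hostfile.
if command -v mpirun >/dev/null 2>&1 && [ -f ./hostfile.16 ]; then
    for np in 2 4 8 16; do
        f=$(count_failures 10 mpirun -np "$np" -machinefile ./hostfile.16 ./cpi)
        echo "np=$np: $f of 10 runs failed"
    done
fi
```

If the failure count climbs with np, that points at scale or at a particular
node in the hostfile; if it is flat, the problem is more likely unrelated to
process count.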

The ibdiagnet tests show no problems.  

Where do I go from here?

Jeff


