[mvapich-discuss] Abort: Error creating CQ

Pete Wyckoff pw at osc.edu
Fri Oct 12 13:58:50 EDT 2007


koop at cse.ohio-state.edu wrote on Fri, 12 Oct 2007 12:27 -0400:
> [egor said:]
> > This error occurs after applying this patch.
> > I have now fetched the latest version of mpiexec from svn, compiled and
> > installed it, but the errors are the same.
> 
> The problem of memory locking is dependent on the startup method used. As
> you noted, it works for mpirun_rsh, so the general setup sounds correct.
> Per the discussion on this list regarding other process managers, there
> should be a startup parameters file for mpiexec that will allow setting
> the lockable memory limit. Unfortunately, I am not very familiar with
> mpiexec and don't know any details for it. Perhaps someone else with more
> knowledge of mpiexec can give their insights.

Egor, I lost the thread here.  Were you having problems running out
of locked memory for IB comms?  Or protocol issues due to changes in
mvapich from 0.9.<older> to 0.9.9?  Maybe Matt has some insight I
missed.

For memory locking, put this in /etc/security/limits.conf on Red Hat
(values are in kB):
    * hard memlock 4194304
    * soft memlock 4194304
or run "ulimit -l 4194304" in the PBS startup script.
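
To see which limit a job process actually gets, something like the
following rough sketch may help. It only reads RLIMIT_MEMLOCK for the
current process and prints it in kB, the same unit that limits.conf and
"ulimit -l" use. The file name memlock_check.py and the helper names are
made up for illustration; this is not part of mpiexec or mvapich.

```python
#!/usr/bin/env python
"""Print the locked-memory limit (RLIMIT_MEMLOCK) seen by this process."""
import resource
import socket


def format_memlock(limit_bytes):
    """Render a limit in kB, the unit used by limits.conf and "ulimit -l"."""
    if limit_bytes == resource.RLIM_INFINITY:
        return "unlimited"
    return "%d kB" % (limit_bytes // 1024)


def memlock_limits():
    """Return (soft, hard) RLIMIT_MEMLOCK for the current process, in bytes."""
    return resource.getrlimit(resource.RLIMIT_MEMLOCK)


if __name__ == "__main__":
    soft, hard = memlock_limits()
    print("%s: memlock soft=%s hard=%s"
          % (socket.gethostname(), format_memlock(soft), format_memlock(hard)))
```

Run it once from an interactive shell on a compute node and once from
inside the PBS job script. If the two outputs differ, the job is probably
not picking up the limits.conf setting. As far as I know, job processes
inherit their limits from the daemon that starts them, not from a login
session.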

For protocol issues, you should run with "-v -v" and try to figure
out what's going wrong.

Some history below.  Please avoid top posting so we can keep track
of who said what when.

		-- Pete

> > > MVAPICH 0.9.9 changed the startup protocol from previous versions, which
> > > requires a patch to mpiexec. The patch is at the top of the mpiexec
> > > webpage:
> > >
> > > http://www.osc.edu/~pw/mpiexec/index.php
> > >
> > > Let us know if you continue to have problems after applying this patch.
> > >
> > > Thanks,
> > > Matt
> > >
> > > On Mon, 8 Oct 2007, Egor Tur wrote:
> > >
> > > >
> > > > >  Hi folks.
> > > > >
> > > > > I have a problem when I try to submit jobs with mpiexec under mvapich1.
> > > > > See the errors below:
> > > > [0:node001] Abort: Error creating CQ
> > > >  at line 358 in file viainit.c
> > > > [7:node008] Abort: Error creating CQ
> > > >  at line 358 in file viainit.c
> > > > [2:node003] Abort: Error creating CQ
> > > >  at line 358 in file viainit.c
> > > > [1:node002] Abort: Error creating CQ
> > > >  at line 358 in file viainit.c
> > > > [3:node004] Abort: Error creating CQ
> > > >  at line 358 in file viainit.c
> > > > [4:node005] Abort: Error creating CQ
> > > >  at line 358 in file viainit.c
> > > > [5:node006] Abort: Error creating CQ
> > > >  at line 358 in file viainit.c
> > > > [6:node007] Abort: Error creating CQ
> > > >  at line 358 in file viainit.c
> > > > mpiexec: Warning: tasks 0-7 exited before completing MPI startup.
> > > > [2:node003] Abort: Error creating CQ
> > > >  at line 358 in file viainit.c
> > > > [6:node007] Abort: Error creating CQ
> > > >  at line 358 in file viainit.c
> > > > [4:node005] Abort: Error creating CQ
> > > >  at line 358 in file viainit.c
> > > > [3:node004] Abort: Error creating CQ
> > > >  at line 358 in file viainit.c
> > > > [5:node006] Abort: Error creating CQ
> > > >  at line 358 in file viainit.c
> > > > [1:node002] Abort: Error creating CQ
> > > >  at line 358 in file viainit.c
> > > > [7:node008] Abort: Error creating CQ
> > > >  at line 358 in file viainit.c
> > > > [0:node001] Abort: Error creating CQ
> > > >  at line 358 in file viainit.c
> > > > mpiexec: Error: read_ib_one: rank -3 out of bounds [0..8).
> > > >
> > > > Command line is:
> > > > mpiexec -comm mpich-ib -n 8 ./progs
> > > >
> > > > > mpiexec is version 0.82. mvapich1 is version 0.9.9.
> > > >
> > > > > When I use mpirun for mvapich1, the jobs run correctly.
> > > > > Also, when I submit jobs over Ethernet with mpiexec, they run without errors.
> > > > > ulimit is unlimited.
> > > > >
> > > > > How can I fix this?

