[mvapich-discuss] hang in mpi_comm_create

Dan Kokron daniel.kokron at nasa.gov
Mon Mar 7 15:44:19 EST 2011


There are some indications that I am running out of communicators.  If I
use Intel MPI, the application also fails in the same place, but it is nice
enough to print the following:

MPI_Comm_create(119): Too many communicators

Does MVAPICH have some hidden limit on the number of communicators?
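
In case it helps frame the question, here is a minimal sketch of the failure
mode I suspect (my assumption, not code from our application): as I understand
it, MPICH2-derived implementations draw communicators from a finite pool of
context IDs, so repeated calls to MPI_Comm_create without matching calls to
MPI_Comm_free eventually exhaust that pool.

  #include <mpi.h>

  int main(int argc, char **argv)
  {
      MPI_Group world_group;
      int i;

      MPI_Init(&argc, &argv);
      MPI_Comm_group(MPI_COMM_WORLD, &world_group);

      /* Each MPI_Comm_create consumes a context ID; without a matching
       * MPI_Comm_free the pool is eventually exhausted and the call
       * fails or hangs. */
      for (i = 0; i < 100000; i++) {
          MPI_Comm newcomm;
          MPI_Comm_create(MPI_COMM_WORLD, world_group, &newcomm);
          /* MPI_Comm_free(&newcomm);  <-- deliberately missing here */
      }

      MPI_Group_free(&world_group);
      MPI_Finalize();
      return 0;
  }

If our application is creating communicators every time step, adding
MPI_Comm_free for communicators that are no longer needed should rule this
out.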


I don't know whether I actually need the -DRDMA_CM flag.  I added the
--enable-error-checking and --enable-error-messages flags to help debug this
issue.
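
One thing worth double-checking on my side (this is an assumption about what
was intended, since the command quoted below may just have been rewrapped by
my mail client): if -DRDMA_CM is supposed to end up in the compile flags, it
has to be quoted together with -fpic, e.g.

  ./configure CC=icc CXX=icpc F77=ifort F90=ifort \
      CFLAGS="-fpic -DRDMA_CM" CXXFLAGS="-fpic -DRDMA_CM" ...

Otherwise the shell hands -DRDMA_CM to configure as a separate argument
rather than as part of CFLAGS.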

Dan

On Mon, 2011-03-07 at 14:31 -0600, Krishna Kandalla wrote:
> Hi Dan,
>        Thanks for reporting this issue. I have built mvapich2-1.6rc3
> in exactly the same way as you indicated and tried a few simple mpich2
> test cases with MPI_Comm_create, and they seem to be working. We were
> wondering if you have any thoughts on how we could reproduce this
> issue on our systems. It would certainly help if you could let us know
> more about your application. Also, is there a specific reason why you
> might require the -DRDMA_CM and/or the --enable-error flags?
> 
> Thanks,
> Krishna
> 
> 
> On Mon, Mar 7, 2011 at 12:21 PM, Dan Kokron <daniel.kokron at nasa.gov> wrote:
> > I am using mvapich2-1.6rc3 on a large Linux cluster (see
> > http://www.nas.nasa.gov/Resources/Systems/pleiades.html for more
> > information about the cluster)
> >
> > MVAPICH is configured as follows.
> > ./configure CC=icc CXX=icpc F77=ifort F90=ifort CFLAGS=-fpic -DRDMA_CM
> > CXXFLAGS=-fpic -DRDMA_CM FFLAGS=-fpic F90FLAGS=-fpic
> > --prefix=/u/dkokron/play/mvapich2-1.6rc3/install.dbg --enable-f77
> > --enable-f90 --enable-cxx --enable-mpe --enable-romio
> > --with-file-system=lustre --enable-threads=default --with-rdma=gen2
> > --with-hwloc --enable-error-checking=all --enable-error-messages=all
> > --enable-g=all --enable-fast=none
> >
> > The configure log is attached.
> >
> > My application gets wedged on a call to MPI_COMM_CREATE.  It does not
> > matter whether SHMEM collectives are enabled or not.  Also attached are
> > summaries of stack traces from all MPI processes.
> >
> > Any ideas on how to proceed in debugging this issue?
> >
> > --
> > Dan Kokron
> > Global Modeling and Assimilation Office
> > NASA Goddard Space Flight Center
> > Greenbelt, MD 20771
> > Daniel.S.Kokron at nasa.gov
> > Phone: (301) 614-5192
> > Fax:   (301) 614-5304
> >
-- 
Dan Kokron
Global Modeling and Assimilation Office
NASA Goddard Space Flight Center
Greenbelt, MD 20771
Daniel.S.Kokron at nasa.gov
Phone: (301) 614-5192
Fax:   (301) 614-5304


