[mvapich-discuss] MVAPICH2 Invalid communicator errors

Dhabaleswar Panda panda at cse.ohio-state.edu
Wed Sep 12 23:07:38 EDT 2007


Mark, 

Sorry to hear that you are facing this problem. We have used MVAPICH2
0.9.8 with the Intel 9.1 compiler and do not see any problem. In fact,
as part of our regular testing, we test the code with all four
compilers - gcc, Intel, PathScale and PGI. It appears to be a
set-up issue.

May we know which version of MVAPICH2 0.9.8 you are using (from OFED
or from the MVAPICH page)?

Thanks, 

DK


>     MVAPICH2 was built with Intel 9.1-038, and I attempted to build
>     and run the app with the identical compiler.
> 
>     I use modules to properly set PATH, LD_LIBRARY_PATH, MAN_PATH,
>     LIBRARY_PATH, etc. for MVAPICH2, to avoid some of the problems
>     you describe.
> 
>     The simple test codes are all pure C, no Fortran.
> 
>     My description of building and running with the gcc compilers was to
>     indicate that I had most of MVAPICH2 working, and that even the
>     same test pgms. worked when using the gcc compilers.
> 
>     Has anyone had similar problems when building and running
>     MVAPICH2-0.9.8 with the Intel compilers?
> 
>          regards,
> 
> Tom Mitchell wrote:
> > On Sep 12 03:41, Mark Potts wrote:
> >> Hi,
> >>    Taking my first steps with MVAPICH2-0.9.8 from OFED 1.2.
> >>
> >>    I am able to build and run apps using gcc to build MVAPICH2 and apps.
> >>
> >>    I am also (apparently) successfully building MVAPICH2 and simple C
> >>    apps with the icc Intel 9.1 compiler.  However, any
> >>    built-with-Intel app that I attempt to run fails with output
> >>    similar to that shown at the bottom.  The same apps build and run
> >>    quite nicely when using gcc.  The failure occurs at the very
> >>    beginning of the pgm, after:
> > 
> > Are you compiling MVAPICH2 with one compiler and the application
> > with another?  In general you do not want to do this.
> > 
> > When building any MPI, the helper scripts (mpicc and friends)
> > have specific knowledge of the compiler and libs.  After
> > building the binary, it is important to ensure that the paths
> > (both PATH and LD_LIBRARY_PATH, etc., and csh's path vs. PATH too)
> > find all the correct bits first.  These environment variables
> > matter both on the command line and on the compute nodes,
> > including localhost.
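
As a quick check along those lines, a small test program can report
which compiler built the binary and which MPI shared objects the
process actually loaded at run time.  This is only a sketch (the file
name whichmpi.c is illustrative, and reading /proc/self/maps is
Linux-specific), but it makes a PATH/LD_LIBRARY_PATH mix-up visible
immediately:

    /* whichmpi.c -- illustrative diagnostic: print the compiler that
     * built this binary, the LD_LIBRARY_PATH in effect, and every
     * mapped shared object whose path mentions "mpi". */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <mpi.h>

    int main(int argc, char **argv)
    {
        FILE *maps;
        char line[1024];
        const char *llp;

    #if defined(__INTEL_COMPILER)
        printf("compiled with icc %d\n", __INTEL_COMPILER);
    #elif defined(__GNUC__)
        printf("compiled with gcc %d.%d\n", __GNUC__, __GNUC_MINOR__);
    #endif

        llp = getenv("LD_LIBRARY_PATH");
        printf("LD_LIBRARY_PATH=%s\n", llp ? llp : "(unset)");

        /* Linux-specific: list the MPI-related libraries actually mapped. */
        maps = fopen("/proc/self/maps", "r");
        if (maps != NULL) {
            while (fgets(line, sizeof(line), maps) != NULL)
                if (strstr(line, "mpi") != NULL)
                    fputs(line, stdout);
            fclose(maps);
        }

        MPI_Init(&argc, &argv);
        MPI_Finalize();
        return 0;
    }

Built with the Intel install's mpicc and run under its mpiexec, the
output should point at that install's libraries; if it points anywhere
else, the environment on the command line or on the compute nodes is
picking up the wrong bits.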
> > 
> > When you get to Fortran there is a risk of subtle binary
> > incompatibility issues confounding you if you are mixing
> > compilers and other common libs.  So always start with
> > all of the bits compiled with a single compiler suite.
> > 
> >>       MPI_Init (&argc, &argv);
> >>
> >>    and before the completion of the next statement:
> >>
> >>       MPI_Comm_rank (MPI_COMM_WORLD, &rank);
> >>
> >>    Since we need both gcc and Intel versions of MVAPICH2, I would
> >>    appreciate any help understanding what is going wrong during
> >>    startup of these apps.
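
For reference, the pattern being described reduces to something like
the following.  This is only a minimal sketch (the actual mpisanity_mv2
source is not shown here), but it is small enough to rebuild from
scratch with the Intel install's own mpicc to rule out stale objects or
a mismatched mpi.h:

    /* sanity.c -- minimal illustration of the reported pattern: the
     * failure occurs after MPI_Init and before MPI_Comm_rank returns. */
    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char **argv)
    {
        int rank = -1, size = -1;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* reported failure point */
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        printf("Hello from rank %d of %d\n", rank, size);
        MPI_Finalize();
        return 0;
    }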
> >>
> >>    The output from mpich2version is:
> >> Version:           MVAPICH2-0.9.8
> >> Device:            osu_ch3:mrail
> >> Configure Options: --prefix=/usr/mpi/mvapich2-0.9.8-12/intel 
> >> --with-device=osu_ch3:mrail --with-rdma=gen2 --with-pm=mpd 
> >> --enable-romio --enable-sharedlibs=gcc --without-mpe
> >>
> >>
> >>   Typical failed mpiexec output:
> >>
> >>  mpiexec -machinefile hostdews -n 3 ./mpisanity_mv2
> >> Fatal error in MPI_Comm_rank: Invalid communicator, error stack:
> >> MPI_Comm_rank(105): MPI_Comm_rank(comm=0x1, rank=0x7fff4fb01cd8) failed
> >> MPI_Comm_rank(64).: Invalid communicator
> >> rank 2 in job 1  dewberry4_32925 caused collective abort of all ranks
> >>   exit status of rank 2: killed by signal 9
> >> Fatal error in MPI_Comm_rank: Invalid communicator, error stack:
> >> MPI_Comm_rank(105): MPI_Comm_rank(comm=0x1, rank=0x7fffddc500c8) failed
> >> MPI_Comm_rank(64).: Invalid communicator
> >> rank 1 in job 1  dewberry4_32925 caused collective abort of all ranks
> >>   exit status of rank 1: killed by signal 9
> >> Fatal error in MPI_Comm_rank: Invalid communicator, error stack:
> >> MPI_Comm_rank(105): MPI_Comm_rank(comm=0x1, rank=0x7fff13581138) failed
> >> MPI_Comm_rank(64).: Invalid communicator
> >> rank 0 in job 1  dewberry4_32925 caused collective abort of all ranks
> >>   exit status of rank 0: killed by signal 9
> >>
> >>    regards,
> >> -- 
> >> ***********************************
> >>>> Mark J. Potts, PhD
> >>>>
> >>>> HPC Applications Inc.
> >>>> phone: 410-992-8360 Bus
> >>>>        410-313-9318 Home
> >>>>        443-418-4375 Cell
> >>>> email: potts at hpcapplications.com
> >>>>        potts at excray.com
> >> ***********************************
> > 
> 
> -- 
> ***********************************
>  >> Mark J. Potts, PhD
>  >>
>  >> HPC Applications Inc.
>  >> phone: 410-992-8360 Bus
>  >>        410-313-9318 Home
>  >>        443-418-4375 Cell
>  >> email: potts at hpcapplications.com
>  >>        potts at excray.com
> ***********************************
> _______________________________________________
> mvapich-discuss mailing list
> mvapich-discuss at cse.ohio-state.edu
> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
> 


