[mvapich-discuss] MVAPICH2 Invalid communicator errors

Shaun Rowland rowland at cse.ohio-state.edu
Thu Sep 13 01:07:46 EDT 2007


Mark Potts wrote:
> DK and Tom,
>    Thanks for your interest.
> 
>    I'm not certain what version info you wanted.  However, one
>    designator is "mvapich2-0.9.8-12".  The MVAPICH2 source was
>    obtained as part of OFED 1.2.  I'll get more explicit
>    version info (OFED and MVAPICH2) if you tell me what and where
>    to look.

That's the information we were looking for. The -12 is the RPM version
number, which has to be incremented whenever there is any SRPM change.
That should correspond to the latest MVAPICH2. A slightly updated
version is included with OFED 1.2.5.

>    We have built MVAPICH (and lots of other packages) with Intel
>    compilers and are using them without problem.  However, the
>    responses received to date indicate that the problem is not
>    a known issue with MVAPICH2 and Intel compilers and thus must
>    be a setup issue on our end.

It seems we have seen a similar error before on one of the clusters we
use. The cluster had a modules system to set up user environments, and
it ended up causing a different mpi.h file to be included, instead of
the one that was supposed to be used with the package the user expected
(from their specific build). You should check your user environment to
make sure nothing like that is happening, and that there is no stray
mpi.h in /usr/include or another default include directory. Also, run
the mpicc command with the -show argument I suggested and check the
include and library paths it prints. The type of error we would see was:

Fatal error in MPI_Comm_size: Invalid communicator, error stack:
MPI_Comm_size(110): MPI_Comm_size(comm=0x5b, size=0x7ffffff2b308) failed
MPI_Comm_size(69).: Invalid communicator
rank 0 in job 4  bm1_48690 caused collective abort of all ranks
  exit status of rank 0: killed by signal 9

which looks like your error.
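
If it helps, the header check described above could be scripted roughly
as follows. This is only a sketch, assuming Python 3.7 or later is
available on the cluster and that the mpicc you intend to use is first
in your PATH. The helper names (include_dirs, find_mpi_headers,
check_mpicc) are made up for illustration, and the two extra system
directories are an assumption about a typical Linux layout, not
something specific to MVAPICH2.

```python
import os
import shlex
import subprocess


def include_dirs(show_output):
    """Return the -I directories from an `mpicc -show` command line, in order."""
    tokens = shlex.split(show_output)
    dirs = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok == "-I" and i + 1 < len(tokens):
            # Form "-I /some/dir": the directory is the next token.
            dirs.append(tokens[i + 1])
            i += 2
            continue
        if tok.startswith("-I") and len(tok) > 2:
            # Form "-I/some/dir": the directory is attached to the flag.
            dirs.append(tok[2:])
        i += 1
    return dirs


def find_mpi_headers(dirs):
    """Return every directory in `dirs` that contains an mpi.h file."""
    return [d for d in dirs if os.path.isfile(os.path.join(d, "mpi.h"))]


def check_mpicc(mpicc="mpicc"):
    """Print what the wrapper would run and where mpi.h files are found."""
    shown = subprocess.run(
        [mpicc, "-show"], capture_output=True, text=True, check=True
    ).stdout
    print("mpicc -show:", shown.strip())
    # Assumption: these are searched by default on a typical Linux system,
    # so an mpi.h in either one could be picked up unintentionally.
    candidates = include_dirs(shown) + ["/usr/include", "/usr/local/include"]
    for d in find_mpi_headers(candidates):
        print("mpi.h found in:", d)
```

Calling check_mpicc() from the same shell (and with the same modules
loaded) that you use to build should list each directory holding an
mpi.h. More than one entry, or an entry outside your MVAPICH2 install,
would be worth a closer look.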
-- 
Shaun Rowland	rowland at cse.ohio-state.edu
http://www.cse.ohio-state.edu/~rowland/

