[mvapich-discuss] MVAPICH2 Invalid communicator errors

Nathan Dauchy Nathan.Dauchy at noaa.gov
Thu Sep 13 13:17:55 EDT 2007


We have also run into a very similar-sounding problem, with
mvapich2-0.9.8-2007.08.30 and intel-9.1.

mpiexec -np 54 /rt1/rtruc/13km_wjet/exec/hybcst_sp
Fatal error in MPI_Comm_rank: Invalid communicator, error stack:
MPI_Comm_rank(105): MPI_Comm_rank(comm=0x5b, rank=0x7fbfffc898) failed
MPI_Comm_rank(64).: Invalid communicatorFatal error in MPI_Comm_rank:
Invalid communicator, error stack:

Unfortunately, I haven't found a /usr/include/mpi.h file or other quick
fix yet.

Is there supposed to be "-I/usr/include" in the output of "mpicc -show"?
Perhaps something went wrong in the build process?  Here is the output
on the system with the Invalid communicator errors:

$ mpicc -show
icc -D_EM64T_ -D_SMP_ -DUSE_HEADER_CACHING -DONE_SIDED
-DMPID_USE_SEQUENCE_NUMBERS -D_SHMEM_COLL_ -I/usr/include -O2
-I/opt/mvapich/2-0.9.8-2007.08.30/include
-L/opt/mvapich/2-0.9.8-2007.08.30/lib -lmpich -L/usr/lib64 -libverbs
-libumad -lpthread

Whereas another system using mvapich-0.9.9 works fine and does not have
"-I/usr/include":

$ mpicc -show
icc -DUSE_STDARG -DHAVE_STDLIB_H=1 -DHAVE_STRING_H=1 -DHAVE_UNISTD_H=1
-DHAVE_STDARG_H=1 -DUSE_STDARG=1 -DMALLOC_RET_VOID=1
-L/opt/mvapich/0.9.9-1326_single_rail_intel_9.1/lib -lmpich -L/usr/lib64
-Wl,-rpath=/usr/lib64 -libverbs -libumad -lpthread -lpthread -lrt
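
One way to compare the two wrappers is to walk the -I directories from
"mpicc -show" in the order given, since the compiler searches them left
to right, and see which ones hold an mpi.h. A minimal sketch, assuming a
POSIX shell; list_mpi_headers is a made-up helper name, and it only
looks at explicit "-Idir" flags, not at the compiler's default search
path or any include-path environment variables:

```shell
#!/bin/sh
# Hypothetical helper (not part of MVAPICH2): print every -I directory
# from a compiler command line, in search order, and say whether it
# contains an mpi.h. The first "has mpi.h" line is the header that wins
# among the explicitly listed directories.
list_mpi_headers() {
    # $1: the full text printed by "mpicc -show"
    # Left unquoted on purpose so the shell splits it into words.
    for word in $1; do
        case "$word" in
            -I?*)
                dir=${word#-I}          # strip the leading "-I"
                if [ -f "$dir/mpi.h" ]; then
                    echo "$dir: has mpi.h"
                else
                    echo "$dir: no mpi.h"
                fi
                ;;
        esac
    done
}
```

It would be run as: list_mpi_headers "$(mpicc -show)". If a directory
ahead of the MVAPICH2 include directory reports "has mpi.h", that header
is the one being compiled against.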


"ldd" on the executable shows:

        libibverbs.so.1 => /usr/lib64/libibverbs.so.1 (0x0000002a9566c000)
        libibumad.so.1 => /usr/lib64/libibumad.so.1 (0x0000002a95778000)
        libpthread.so.0 => /lib64/tls/libpthread.so.0 (0x0000002a9589a000)
        libm.so.6 => /lib64/tls/libm.so.6 (0x0000002a959af000)
        libdl.so.2 => /lib64/libdl.so.2 (0x0000002a95b36000)
        libc.so.6 => /lib64/tls/libc.so.6 (0x0000002a95c39000)
        libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x0000002a95e6d000)
        libibcommon.so.1 => /usr/lib64/libibcommon.so.1 (0x0000002a95f7b000)
        /lib64/ld-linux-x86-64.so.2 (0x0000002a95556000)

Any other suggestions of where to look?  Hopefully I'm missing something
obvious!

Thanks much,
Nathan




Shaun Rowland wrote:
> Mark Potts wrote:
>> DK and Tom,
>>    Thanks for your interest.
>>
>>    I'm not certain what version info you wanted.  However, one
>>    designator is "mvapich2-0.9.8-12".  The MVAPICH2 source was
>>    obtained as part of OFED 1.2.  I'll get more explicit
>>    version info (OFED and MVAPICH2) if you tell me what and where
>>    to look.
> 
> That's the information we were looking for. The -12 is the RPM version
> number, which has to be incremented whenever there is any SRPM change.
> That should correspond to the latest MVAPICH2. There's a slightly
> updated one with OFED 1.2.5.
> 
>>    We have built MVAPICH (and lots of other packages) with Intel
>>    compilers and are using them without problem.  However, the
>>    responses received to date indicate that the problem is not
>>    a known issue with MVAPICH2 and Intel compilers and thus must
>>    be a setup issue on our end.
> 
> It seems we have seen a similar error before on one of the clusters we
> use. The cluster had a modules system to set up user environments, and
> it ended up causing a different mpi.h file to be included, instead of
> the one that was supposed to be used with the package the user expected
> (from their specific build). You should check your user environment to
> make sure there's not something like that happening, or that there's no
> mpi.h in /usr/include or something. Also, check the mpicc command with
> the -show argument I suggested and check the paths. The type of error we
> would see was:
> 
> Fatal error in MPI_Comm_size: Invalid communicator, error stack:
> MPI_Comm_size(110): MPI_Comm_size(comm=0x5b, size=0x7ffffff2b308) failed
> MPI_Comm_size(69).: Invalidcommunicatorrank 0 in job 4  bm1_48690
> caused collective abort of all ranks
>  exit status of rank 0: killed by signal 9
> 
> which looks like your error.
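
For what it is worth, comm=0x5b in both traces is decimal 91, which as
far as I know is the value MPICH1-style headers (and so MVAPICH 1) use
for MPI_COMM_WORLD, while MPICH2-style headers use 0x44000000. The
sketch below decodes a handle under the MPICH2 layout as I understand
it (top two bits are the handle kind, the next four the object type,
with 1 meaning communicator). Both the layout and the decode_handle
name are assumptions, not something stated in this thread:

```shell
#!/bin/sh
# Hypothetical helper: decode an MPICH2-style handle value.
# Assumed layout: bits 31-30 = handle kind, bits 29-26 = object type.
decode_handle() {
    h=$(($1))                       # accepts hex such as 0x5b
    kind=$(( (h >> 30) & 3 ))
    objtype=$(( (h >> 26) & 15 ))
    case "$kind" in
        0) k=invalid ;;
        1) k=builtin ;;
        2) k=direct ;;
        3) k=indirect ;;
    esac
    echo "kind=$k type=$objtype"
}
```

With this, "decode_handle 0x5b" prints "kind=invalid type=0" and
"decode_handle 0x44000000" prints "kind=builtin type=1", which would be
consistent with the application having been compiled against a
different mpi.h than the library it was linked with.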


