[mvapich-discuss] MVAPICH2 Invalid communicator errors

Mark Potts potts at hpcapplications.com
Thu Sep 13 15:45:41 EDT 2007


Hi,
    Our build guru here pointed out that the -I/usr/include in this
    line of the MVAPICH2 mpicc script:
    MPI_CFLAGS="-D_EM64T_ -D_SMP_ -DUSE_HEADER_CACHING  -DONE_SIDED -DMPID_USE_SEQUENCE_NUMBERS -D_SHMEM_COLL_ -DRDMA_CM   -I/usr/include -O2"
    is _not_ the result of any changes made here.  The same line appears
    in both our gcc and icc MVAPICH2 mpicc scripts.  Unfortunately,
    MPI_CFLAGS goes into the icc command line first, before the MVAPICH2
    include directory reference, so /usr/include comes ahead of the
    MVAPICH2 include directory in the header search order.
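
    To see the order in which a given mpicc passes include directories,
    and which of them would supply mpi.h, a small Python sketch like the
    one below can be fed the text printed by `mpicc -show`.  It only looks
    at -I flags in command-line order and ignores the compiler's built-in
    directories; the function names are illustrative and not part of
    MVAPICH2.

```python
import os
import shlex

def include_dirs(show_output):
    """Return the -I directories in `mpicc -show` output, in command-line order."""
    tokens = shlex.split(show_output)
    dirs = []
    for i, tok in enumerate(tokens):
        if tok == "-I" and i + 1 < len(tokens):
            dirs.append(tokens[i + 1])   # "-I dir" written as two words
        elif tok.startswith("-I") and len(tok) > 2:
            dirs.append(tok[2:])         # "-Idir" written as one word
    return dirs

def first_with_header(dirs, header="mpi.h"):
    """Return the first directory in `dirs` containing `header`, or None."""
    for d in dirs:
        if os.path.isfile(os.path.join(d, header)):
            return d
    return None
```

    With the `mpicc -show` output Nathan posted, include_dirs lists
    /usr/include ahead of /opt/mvapich/2-0.9.8-2007.08.30/include, so an
    mpi.h in /usr/include would be found first, assuming the compiler
    honours the -I order literally.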

    The injection of the MPI_CFLAGS definition in the mpicc script
    apparently occurs as a result of the MVAPICH2 build sequence.
    There is no similar inclusion of a "-I/usr/include" in the
    MVAPICH-0.9.9 mpicc script(s).

    I'm also puzzled why this apparent error in the MVAPICH2 mpicc
    script causes problems for icc-built codes but not for gcc-built
    ones.
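
    One plausible explanation for the gcc/icc difference: the GCC manual
    says that when a standard system include directory such as
    /usr/include is also given with -I, the -I option is ignored and the
    directory is still searched in its normal place, after the user -I
    directories.  If icc instead honours the -I as written (an inference
    from the failures, not something checked against Intel's
    documentation), a stray /usr/include/mpi.h, if present, would shadow
    the MVAPICH2 one only in icc builds.  The sketch below models both
    behaviours for a `#include <mpi.h>` lookup; treating /usr/include as
    the only system directory is a simplification.

```python
# Simplification: /usr/include is the only standard system directory
# modelled here; real compilers search several built-in directories.
SYSTEM_DIRS = ("/usr/include",)

def effective_order(i_dirs, drop_system_i=True):
    """Approximate the <header> search order for a list of -I directories.

    drop_system_i=True mimics GCC's documented rule: a -I naming a standard
    system directory is ignored, so that directory keeps its normal place
    after all user -I directories.  drop_system_i=False models a compiler
    that honours the -I literally (assumed, not documented, for icc).
    """
    user = [d for d in i_dirs if not (drop_system_i and d in SYSTEM_DIRS)]
    # System directories not already listed are searched last.
    return user + [d for d in SYSTEM_DIRS if d not in user]
```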

    I now know how to fix the problem manually, but I would appreciate
    more input from Shaun Rowland or another developer at OSU on this
    issue.  If /usr/include belongs in mpicc, it must come later in the
    command line, after the MVAPICH2 include directory.  Most likely it
    should not be in the script at all.
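
    As a sketch of the manual fix, the Python below rewrites the
    MPI_CFLAGS line of an mpicc script without the -I/usr/include flag,
    keeping a .orig backup.  It assumes MPI_CFLAGS is assigned on a single
    line of the form MPI_CFLAGS="..." as in the script line quoted earlier;
    the function names are made up for illustration.

```python
import re
import shutil

# Assumption: MPI_CFLAGS is assigned on one line as MPI_CFLAGS="flags ...".
_CFLAGS_LINE = re.compile(r'^(\s*MPI_CFLAGS=")([^"]*)(".*)$', re.DOTALL)
_DROP = {"-I/usr/include", "-I/usr/include/"}

def strip_usr_include(script_text):
    """Return script_text with -I/usr/include removed from MPI_CFLAGS only."""
    out = []
    for line in script_text.splitlines(keepends=True):
        m = _CFLAGS_LINE.match(line)
        if m:
            # Rebuild the quoted value without the unwanted flag(s).
            flags = [f for f in m.group(2).split() if f not in _DROP]
            line = m.group(1) + " ".join(flags) + m.group(3)
        out.append(line)
    return "".join(out)

def patch_mpicc(path):
    """Rewrite the mpicc script at `path` in place, keeping path + '.orig'."""
    shutil.copy2(path, path + ".orig")  # backup preserves permissions
    with open(path) as f:
        text = f.read()
    with open(path, "w") as f:
        f.write(strip_usr_include(text))
```

    After patching, `mpicc -show` should no longer list -I/usr/include,
    provided the script builds its command line from MPI_CFLAGS as the
    excerpt suggests.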

          regards,

Nathan Dauchy wrote:
> We have also run into a very similar sounding problem, with
> mvapich2-0.9.8-2007.08.30 and intel-9.1.
> 
> mpiexec -np 54 /rt1/rtruc/13km_wjet/exec/hybcst_sp
> Fatal error in MPI_Comm_rank: Invalid communicator, error stack:
> MPI_Comm_rank(105): MPI_Comm_rank(comm=0x5b, rank=0x7fbfffc898) failed
> MPI_Comm_rank(64).: Invalid communicatorFatal error in MPI_Comm_rank:
> Invalid communicator, error stack:
> 
> Unfortunately, I haven't found a /usr/include/mpi.h file or any other quick
> fix yet.
> 
> Is there supposed to be "-I/usr/include" in the output of "mpicc -show"?
>   Perhaps something went wrong in the build process?  Here is the output
> on the system with the Invalid communicator errors:
> 
> $ mpicc -show
> icc -D_EM64T_ -D_SMP_ -DUSE_HEADER_CACHING -DONE_SIDED
> -DMPID_USE_SEQUENCE_NUMBERS -D_SHMEM_COLL_ -I/usr/include -O2
> -I/opt/mvapich/2-0.9.8-2007.08.30/include
> -L/opt/mvapich/2-0.9.8-2007.08.30/lib -lmpich -L/usr/lib64 -libverbs
> -libumad -lpthread
> 
> Whereas another system using mvapich-0.9.9 works fine and does not have
> "-I/usr/include":
> 
> $ mpicc -show
> icc -DUSE_STDARG -DHAVE_STDLIB_H=1 -DHAVE_STRING_H=1 -DHAVE_UNISTD_H=1
> -DHAVE_STDARG_H=1 -DUSE_STDARG=1 -DMALLOC_RET_VOID=1
> -L/opt/mvapich/0.9.9-1326_single_rail_intel_9.1/lib -lmpich -L/usr/lib64
> -Wl,-rpath=/usr/lib64 -libverbs -libumad -lpthread -lpthread -lrt
> 
> 
> "ldd" on the executable shows:
> 
>         libibverbs.so.1 => /usr/lib64/libibverbs.so.1 (0x0000002a9566c000)
>         libibumad.so.1 => /usr/lib64/libibumad.so.1 (0x0000002a95778000)
>         libpthread.so.0 => /lib64/tls/libpthread.so.0 (0x0000002a9589a000)
>         libm.so.6 => /lib64/tls/libm.so.6 (0x0000002a959af000)
>         libdl.so.2 => /lib64/libdl.so.2 (0x0000002a95b36000)
>         libc.so.6 => /lib64/tls/libc.so.6 (0x0000002a95c39000)
>         libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x0000002a95e6d000)
>         libibcommon.so.1 => /usr/lib64/libibcommon.so.1 (0x0000002a95f7b000)
>         /lib64/ld-linux-x86-64.so.2 (0x0000002a95556000)
> 
> Any other suggestions of where to look?  Hopefully I'm missing something
> obvious!
> 
> Thanks much,
> Nathan
> 
> 
> 
> 
> Shaun Rowland wrote:
>> Mark Potts wrote:
>>> DK and Tom,
>>>    Thanks for your interest.
>>>
>>>    I'm not certain what version info you wanted.  However, one
>>>    designator is "mvapich2-0.9.8-12".  The MVAPICH2 source was
>>>    obtained as part of OFED 1.2.  I'll get more explicit
>>>    version info (OFED and MVAPICH2) if you tell me what and where
>>>    to look.
>> That's the information we were looking for. The -12 is the RPM version
>> number, which has to be incremented whenever there is any SRPM change.
>> That should correspond to the latest MVAPICH2. There's a slightly
>> updated one with OFED 1.2.5.
>>
>>>    We have built MVAPICH (and lots of other packages) with Intel
>>>    compilers and are using them without problem.  However, the
>>>    responses received to date indicate that the problem is not
>>>    a known issue with MVAPICH2 and Intel compilers and thus must
>>>    be a setup issue on our end.
>> It seems we have seen a similar error before on one of the clusters we
>> use. The cluster had a modules system to set up user environments, and
>> it ended up causing a different mpi.h file to be included, instead of
>> the one that was supposed to be used with the package the user expected
>> (from their specific build). You should check your user environment to
>> make sure there's not something like that happening, or that there's no
>> mpi.h in /usr/include or something. Also, check the mpicc command with
>> the -show argument I suggested and check the paths. The type of error we
>> would see was:
>>
>> Fatal error in MPI_Comm_size: Invalid communicator, error stack:
>> MPI_Comm_size(110): MPI_Comm_size(comm=0x5b, size=0x7ffffff2b308) failed
>> MPI_Comm_size(69).: Invalidcommunicatorrank 0 in job 4  bm1_48690
>> caused collective abort of all ranks
>>  exit status of rank 0: killed by signal 9
>>
>> which looks like your error.
> 
> _______________________________________________
> mvapich-discuss mailing list
> mvapich-discuss at cse.ohio-state.edu
> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
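
    Both error stacks in this thread report comm=0x5b, i.e. the integer
    91.  If I remember correctly, the MPICH-1 mpi.h that MVAPICH 0.9.x
    derives from defines MPI_COMM_WORLD as 91, while MPICH2 encodes
    communicators as bit-packed handles (MPI_COMM_WORLD is 0x44000000).
    The sketch below decodes a handle under that assumed MPICH2 layout,
    based on MPICH2's internal headers rather than any public API, to show
    why a check of roughly this kind would reject 0x5b.  That would fit
    code compiled against a stray MPICH-1 style mpi.h but linked against
    MVAPICH2's libmpich.

```python
# Assumed MPICH2 handle layout (internal to MPICH2, not a public API):
#   bits 30-31: handle kind (0 invalid, 1 builtin, 2 direct, 3 indirect)
#   bits 26-29: object kind (1 = communicator)
HANDLE_KINDS = {0: "invalid", 1: "builtin", 2: "direct", 3: "indirect"}
OBJ_COMM = 0x1
MPICH2_COMM_WORLD = 0x44000000   # assumed value from MPICH2's mpi.h
MPICH1_COMM_WORLD = 91           # assumed value from MPICH-1's mpi.h

def describe_handle(handle):
    """Split a 32-bit handle into (handle kind name, object kind number)."""
    kind = HANDLE_KINDS[(handle >> 30) & 0x3]
    obj = (handle >> 26) & 0xF
    return kind, obj

def plausible_mpich2_comm(handle):
    """True if the handle has a valid kind and the communicator object bits."""
    kind, obj = describe_handle(handle)
    return kind != "invalid" and obj == OBJ_COMM

for h in (MPICH2_COMM_WORLD, MPICH1_COMM_WORLD):
    print(hex(h), describe_handle(h), plausible_mpich2_comm(h))
# 0x44000000 ('builtin', 1) True
# 0x5b ('invalid', 0) False
```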

-- 
***********************************
 >> Mark J. Potts, PhD
 >>
 >> HPC Applications Inc.
 >> phone: 410-992-8360 Bus
 >>        410-313-9318 Home
 >>        443-418-4375 Cell
 >> email: potts at hpcapplications.com
 >>        potts at excray.com
***********************************

