[mvapich-discuss] MVAPICH2 Invalid communicator errors
Mark Potts
potts at hpcapplications.com
Thu Sep 13 11:12:36 EDT 2007
Shaun,
You were dead on. A spurious reference to /usr/include/mpi.h in
a script originally intended for other MPI builds was the culprit.
Thanks -- until the next problem...
regards,
Shaun Rowland wrote:
> Mark Potts wrote:
>> DK and Tom,
>> Thanks for your interest.
>>
>> I'm not certain what version info you wanted. However, one
>> designator is "mvapich2-0.9.8-12". The MVAPICH2 source was
>> obtained as part of OFED 1.2. I'll get more explicit
>> version info (OFED and MVAPICH2) if you tell me what and where
>> to look.
>
> That's the information we were looking for. The -12 is the RPM version
> number, which has to be incremented whenever there is any SRPM change.
> That should correspond to the latest MVAPICH2. There's a slightly
> updated one with OFED 1.2.5.
>
>> We have built MVAPICH (and lots of other packages) with Intel
>> compilers and are using them without problem. However, the
>> responses received to date indicate that the problem is not
>> a known issue with MVAPICH2 and Intel compilers and thus must
>> be a setup issue on our end.
>
> It seems we have seen a similar error before on one of the clusters we
> use. The cluster had a modules system to set up user environments, and
> it ended up causing a different mpi.h file to be included, instead of
> the one that was supposed to be used with the package the user expected
> (from their specific build). You should check your user environment to
> make sure there's not something like that happening, or that there's no
> mpi.h in /usr/include or something. Also, check the mpicc command with
> the -show argument I suggested and check the paths. The type of error we
> would see was:
>
> Fatal error in MPI_Comm_size: Invalid communicator, error stack:
> MPI_Comm_size(110): MPI_Comm_size(comm=0x5b, size=0x7ffffff2b308) failed
> MPI_Comm_size(69).: Invalid communicator
> rank 0 in job 4 bm1_48690 caused collective abort of all ranks
> exit status of rank 0: killed by signal 9
>
> which looks like your error.
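
For anyone hitting the same error, the check suggested above (look at the paths from "mpicc -show" and look for a stray mpi.h) can be roughly automated. The following is only a minimal sketch, not a tool from MVAPICH2 or OFED. The helper names include_dirs_from_show and find_mpi_headers are hypothetical, and the sketch assumes that "mpicc -show" prints an ordinary compiler command line containing -I flags, which may not hold for every wrapper. The two extra directories searched at the end are likewise just guesses at common locations.

```python
import os
import shlex
import subprocess


def include_dirs_from_show(show_output):
    """Return the -I directories found in a compiler command line.

    Assumption: the text looks like what `mpicc -show` prints, i.e. a
    normal compiler invocation with `-Idir` or `-I dir` arguments.
    """
    tokens = shlex.split(show_output)
    dirs = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok == "-I" and i + 1 < len(tokens):
            # Form: -I dir (directory is the next token)
            dirs.append(tokens[i + 1])
            i += 2
            continue
        if tok.startswith("-I") and len(tok) > 2:
            # Form: -Idir
            dirs.append(tok[2:])
        i += 1
    return dirs


def find_mpi_headers(search_dirs, header="mpi.h"):
    """Return every existing <dir>/mpi.h, in search order, without duplicates."""
    found = []
    for d in search_dirs:
        path = os.path.join(d, header)
        if os.path.isfile(path) and path not in found:
            found.append(path)
    return found


if __name__ == "__main__":
    try:
        show = subprocess.run(["mpicc", "-show"],
                              capture_output=True, text=True).stdout
    except OSError:
        show = ""  # mpicc is not on PATH; only the fallback dirs are checked
    # The fallback directories below are an assumption, not an exhaustive list.
    dirs = include_dirs_from_show(show) + ["/usr/include", "/usr/local/include"]
    for path in find_mpi_headers(dirs):
        print(path)
```

If this prints more than one mpi.h, or one that does not belong to the MVAPICH2 build you meant to use, the build is probably picking up the wrong header. Note that it will not catch a build script that names a header path explicitly, so those still need to be checked by hand.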
--
***********************************
>> Mark J. Potts, PhD
>>
>> HPC Applications Inc.
>> phone: 410-992-8360 Bus
>> 410-313-9318 Home
>> 443-418-4375 Cell
>> email: potts at hpcapplications.com
>> potts at excray.com
***********************************