[mvapich-discuss] MPI_Iallreduce() Segfault Over 512 Processes

Derek Gaston friedmud at gmail.com
Thu Sep 29 21:04:29 EDT 2016


No problem, Hari: stuff happens :-)

As it turns out, I just compiled MVAPICH2/2.2 and tried it out, and it
works just fine!  I ran it up to 2400 processes without issue.

Thanks for the quick response!

Derek

On Thu, Sep 29, 2016 at 9:10 AM Hari Subramoni <subramoni.1 at osu.edu> wrote:

> Hi Derek,
>
> Sorry to hear that you're facing issues. Can you please try with MVAPICH2
> 2.2 and see if the failure still occurs?
>
> Can you also send us the output of mpiname -a and the runtime flags (if
> any) that you're using?
>
> Thanks,
> Hari.
>
> On Sep 29, 2016 9:04 AM, "Derek Gaston" <friedmud at gmail.com> wrote:
>
>> Hello all... I'm running into a segfault with MPI_Iallreduce().  It
>> segfaults when using over 512 processes (yes, exactly 512: it works at
>> 512 and segfaults at 513!).
>>
>> It feels like MVAPICH is switching algorithms or something... and the one
>> it's switching to isn't happy!
>>
>> I'm on an SGI ICE-X cluster with Mellanox ConnectX-3 (MT27500 family)
>> FDR InfiniBand cards.
>>
>> My test application is down at the bottom of this email.  Using it, I've
>> found that MVAPICH2/2.0.1 and MVAPICH2/2.1 both segfault,
>> while MVAPICH2/1.9 does NOT.  I haven't tried 2.2 yet, but I'll try to do
>> that tomorrow.
>>
>> Any advice?  Maybe there's a compile switch we missed or a runtime option
>> I should try?
>>
>> Thanks for any help!
>>
>> Derek
>>
>>
>> #include <mpi.h>
>>
>> int main(int argc, char** argv)
>> {
>>   MPI_Init(&argc, &argv);
>>
>>   double r = 1.2;
>>   double o;
>>
>>   MPI_Request req;
>>   MPI_Status  stat;
>>
>>   MPI_Iallreduce(&r, &o, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD, &req);
>>
>>   MPI_Wait(&req, &stat);
>>
>>   MPI_Finalize();
>> }
>>
>> _______________________________________________
>> mvapich-discuss mailing list
>> mvapich-discuss at cse.ohio-state.edu
>> http://mailman.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>>
>>
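For anyone trying to reproduce or narrow this down, below is a minimal sketch of a
variant of the test program above. The error-handler switch, the result check, and
the printed output are illustrative additions, not part of the original test: they
make MPI_Iallreduce and MPI_Wait return error codes rather than aborting, which can
help tell a failure while posting the nonblocking collective apart from one while
completing it.

#include <mpi.h>
#include <stdio.h>

/* Sketch of a variant of the test above: switch MPI_COMM_WORLD to
   MPI_ERRORS_RETURN and check the reduced value.  A hard segfault will
   still crash the run, but any softer failure surfaces as an error code
   from MPI_Iallreduce or MPI_Wait instead of an abort. */
int main(int argc, char** argv)
{
  MPI_Init(&argc, &argv);

  int rank, size;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &size);

  /* Return error codes instead of aborting on MPI-level errors. */
  MPI_Comm_set_errhandler(MPI_COMM_WORLD, MPI_ERRORS_RETURN);

  double r = 1.2;
  double o = 0.0;

  MPI_Request req;
  int err = MPI_Iallreduce(&r, &o, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD, &req);
  if (err != MPI_SUCCESS)
    fprintf(stderr, "[rank %d] MPI_Iallreduce returned %d\n", rank, err);

  err = MPI_Wait(&req, MPI_STATUS_IGNORE);
  if (err != MPI_SUCCESS)
    fprintf(stderr, "[rank %d] MPI_Wait returned %d\n", rank, err);

  /* Sanity-check the reduction result on rank 0. */
  if (rank == 0)
    printf("sum = %f (expected %f)\n", o, 1.2 * size);

  MPI_Finalize();
  return 0;
}

Compiled with the MPI compiler wrapper and launched at 513 ranks (the smallest count
reported to fail above), this should make it easier to see where the crash occurs.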