[mvapich-discuss] MPI_Iallreduce() Segfault Over 512 Processes

Hari Subramoni subramoni.1 at osu.edu
Thu Sep 29 21:06:43 EDT 2016


Hi Derek,

Good to know that MVAPICH2-2.2 works fine :-)

Regards,
Hari.

On Thu, Sep 29, 2016 at 9:04 PM, Derek Gaston <friedmud at gmail.com> wrote:

> No problem Hari: stuff happens :-)
>
> As it turns out... I just compiled MVAPICH2/2.2 and tried it out... and it
> works just fine!  Ran it up to 2400 processors without issue.
>
> Thanks for the quick response!
>
> Derek
>
> On Thu, Sep 29, 2016 at 9:10 AM Hari Subramoni <subramoni.1 at osu.edu>
> wrote:
>
>> Hi Derek,
>>
>> Sorry to hear that you're facing issues. Can you please try with MVAPICH2
>> 2.2 and see if the failures persist?
>>
>> Can you also send us the output of mpiname -a and the runtime flags (if
>> any) that you're using?
>>
>> Thanks,
>> Hari.
>>
>> On Sep 29, 2016 9:04 AM, "Derek Gaston" <friedmud at gmail.com> wrote:
>>
>>> Hello all... I'm running into a segfault with MPI_Iallreduce().  It is
>>> segfaulting when using over 512 processors (yes, exactly 512: it works at
>>> 512 and segfaults at 513!).
>>>
>>> It feels like MVAPICH is switching algorithms or something... and the
>>> one it's switching to isn't happy!
>>>
>>> I'm on an SGI ICE-X cluster with Mellanox ConnectX-3 (MT27500 family)
>>> FDR InfiniBand cards.
>>>
>>> My test application is down at the bottom of this email.  Using it I've
>>> found that MVAPICH2/2.0.1 and MVAPICH2/2.1 both segfault...
>>> while MVAPICH2/1.9 does NOT.  I haven't tried 2.2 yet, but I'll try to do
>>> that tomorrow.
>>>
>>> Any advice?  Maybe there's a compile switch we missed or a runtime
>>> option I should try?
>>>
>>> Thanks for any help!
>>>
>>> Derek
>>>
>>>
>>> #include <mpi.h>
>>>
>>> int main(int argc, char** argv)
>>> {
>>>   MPI_Init(&argc, &argv);
>>>
>>>   double r = 1.2;
>>>   double o;
>>>
>>>   MPI_Request req;
>>>   MPI_Status  stat;
>>>
>>>   MPI_Iallreduce(&r, &o, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD, &req);
>>>
>>>   MPI_Wait(&req, &stat);
>>>
>>>   MPI_Finalize();
>>> }
>>>
>>> _______________________________________________
>>> mvapich-discuss mailing list
>>> mvapich-discuss at cse.ohio-state.edu
>>> http://mailman.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>>>
>>>
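A minimal sketch (not part of the original report) of the same reproducer with the error handler on MPI_COMM_WORLD switched to MPI_ERRORS_RETURN, so a failing rank can print an MPI error string before aborting. A hard segfault inside the library will still crash before anything is printed, so this only helps when the failure surfaces as a returned error code.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char** argv)
{
  MPI_Init(&argc, &argv);

  /* Return errors to the caller instead of aborting immediately. */
  MPI_Comm_set_errhandler(MPI_COMM_WORLD, MPI_ERRORS_RETURN);

  int rank;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);

  double r = 1.2;
  double o;
  MPI_Request req;

  int rc = MPI_Iallreduce(&r, &o, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD, &req);
  if (rc == MPI_SUCCESS)
    rc = MPI_Wait(&req, MPI_STATUS_IGNORE);

  if (rc != MPI_SUCCESS)
  {
    /* Report which rank saw the error and what the library says about it. */
    char msg[MPI_MAX_ERROR_STRING];
    int len;
    MPI_Error_string(rc, msg, &len);
    fprintf(stderr, "rank %d: %s\n", rank, msg);
    MPI_Abort(MPI_COMM_WORLD, rc);
  }

  MPI_Finalize();
  return 0;
}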