[mvapich-discuss] Error in MPI_Neighbor_alltoallv

Hari Subramoni subramoni.1 at osu.edu
Mon Feb 29 23:19:05 EST 2016


This issue has been resolved through off-list discussion. The fix
will be available in the upcoming MVAPICH2 release.

Thx,
Hari.
On Nov 19, 2015 3:08 PM, "Hari Subramoni" <subramoni.1 at osu.edu> wrote:

> Hello,
>
> This is really an out of memory situation. We are working on a patch for
> this. We will get back to you soon. Do you happen to have a reproducer for
> the error? Could you also let us know your system configuration and the
> version of MVAPICH2 you're using?
>
> Thx,
> Hari.
> On Nov 18, 2015 2:17 PM, "Phanisri Pradeep Pratapa" <ppratapa at gatech.edu>
> wrote:
>
>> Hi,
>>
>> I am running a C++ code with MPI 3.0 through mvapich2/2.1.
>>
>> I use MPI_Neighbor_alltoallv in my code and it needs to be called in
>> every iteration. I have created a periodic cartesian topology to enable
>> local communication. I found that this function works correctly for a few
>> iterations and then fails after that giving the following error:
>>
>> %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
>> [cli_187]: aborting job:
>> Fatal error in PMPI_Ineighbor_alltoallv:
>> Other MPI error, error stack:
>> PMPI_Ineighbor_alltoallv(229).......: MPI_Ineighbor_alltoallv(sendbuf=0x2aaac9fa5a20, sendcounts=0x2aaac81f1ab0, sdispls=0x2aaac81f05d0, sendtype=MPI_DOUBLE, recvbuf=0x2aaac9f96050, recvcounts=0x2aaac81f4470, rdispls=0x2aaac81f2f90, recvtype=MPI_DOUBLE, comm=comm=0x84000006, request=0x7fffff
>> PMPI_Ineighbor_alltoallv(215).......:
>> MPIR_Ineighbor_alltoallv_impl(112)..:
>> MPIR_Ineighbor_alltoallv_default(78):
>> MPID_Sched_recv(599)................:
>> MPIDU_Sched_add_entry(425)..........: Out of memory
>> %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
>>
>> This happens only when each processor communicates with all the other
>> processors (or more, because the topology is periodic) and the total
>> number of processors is greater than or equal to 216 (4 nodes). The
>> function works fine in all other cases I have tested. The failure occurs
>> with both the blocking and the non-blocking versions. Moreover, I see this
>> behaviour in roughly 8 out of 10 runs (with the same inputs, commands,
>> options, etc.); the other 2 runs complete successfully. I have debugged
>> the code and run memory checks, and found no memory leaks.
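
For anyone putting together a reproducer, the bookkeeping around the call
is worth pinning down first. The sketch below is not from the original
code. It assumes the communicator comes from MPI_Cart_create (so the
neighborhood collective addresses 2*ndims neighbors per process) and that
each neighbor's block is packed contiguously in the buffer. The helper
names cartNeighborCount, packedDispls and totalCount are hypothetical, and
the snippet deliberately has no MPI dependency.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper: number of neighbors a neighborhood collective
// addresses on a communicator created with MPI_Cart_create (assumption:
// a Cartesian topology, which always has 2 neighbors per dimension).
inline int cartNeighborCount(int ndims) {
    assert(ndims >= 0);
    return 2 * ndims;
}

// Hypothetical helper: displacements for contiguous packing, i.e. the
// exclusive prefix sum of the per-neighbor element counts.
inline std::vector<int> packedDispls(const std::vector<int>& counts) {
    std::vector<int> displs(counts.size(), 0);
    int offset = 0;
    for (std::size_t i = 0; i < counts.size(); ++i) {
        assert(counts[i] >= 0);   // MPI counts must be non-negative
        displs[i] = offset;       // block i starts where block i-1 ended
        offset += counts[i];
    }
    return displs;
}

// Hypothetical helper: number of elements the packed buffer must hold.
inline int totalCount(const std::vector<int>& counts) {
    return std::accumulate(counts.begin(), counts.end(), 0);
}
```

With made-up counts of {4, 4, 2, 2, 1, 1} doubles on a 3-D grid,
packedDispls gives {0, 4, 8, 10, 12, 13} and the buffer needs 14 elements.
These arrays would be passed as sendcounts and sdispls, with the receive
side built the same way.
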
>>
>> There was a similar problem I found on this forum which someone else had
>> experienced, but there seems to be no final response to it:
>>
>> http://mailman.cse.ohio-state.edu/pipermail/mvapich-discuss/2014-June/005002.html
>>
>> Please let me know if somebody can help.
>>
>> Thank you,
>>
>> Regards,
>>
>> Pradeep