[mvapich-discuss] RMA data corruption in 2.0, 2.1a
Hari Subramoni
subramoni.1 at osu.edu
Thu Oct 9 18:53:03 EDT 2014
We had an offline discussion on this issue. A patch has been provided and the
issue is resolved. We are closing this thread. The patch will be included in
upcoming releases.
Regards,
Hari.
On Mon, Oct 6, 2014 at 2:44 PM, Hari Subramoni <subramoni.1 at osu.edu> wrote:
> Thank you for the additional data points Dr. Fujita. We are taking a look
> at the issue. We will get back to you soon.
>
> Regards,
> Hari.
>
> On Mon, Oct 6, 2014 at 2:28 PM, Hajime Fujita <hfujita at uchicago.edu>
> wrote:
>
>> Thanks Hari,
>>
>> Some additional information that might help:
>> - On the same machine (Midway), OpenMPI 1.8.1 worked correctly.
>> - On NERSC Edison (Cray XC30 with Cray MPI 7.0), my test program also
>> worked correctly.
>>
>> This is why I suspect this could be an issue in MVAPICH.
>>
>>
>> Thanks,
>> Hajime
>>
>> Hari Subramoni wrote:
>> > Dear Dr. Fujita,
>> >
>> > Thank you for the report. We will take a look at this issue and get back
>> > to you soon.
>> >
>> > Regards,
>> > Hari.
>> >
>> > On Fri, Oct 3, 2014 at 5:22 PM, Hajime Fujita <hfujita at uchicago.edu
>> > <mailto:hfujita at uchicago.edu>> wrote:
>> >
>> > Dear MVAPICH2 team,
>> >
>> > We found a potential bug in MVAPICH2 2.0 and 2.1a regarding RMA.
>> >
>> > When we run the attached program on two nodes (1 process/node), it
>> > produces the wrong result. In this setting, inter-process
>> > communication goes over InfiniBand.
>> >
>> > # Requesting interactive job with 2 nodes
>> > $ sinteractive -N 2
>> > $ mpiexec ./rma_putget_test # launches 2 proc
>> > loc_buff[1024]=-1 != 1024
>> >
>> > When it works correctly, there is no output.
>> >
>> > If we run this on a single machine with multiple processes, it runs
>> > correctly.
>> > If I'm using MPI RMA functions in some incorrect way, please let me
>> > know.
>> >
>> >
>> > Hardware platform:
>> > UChicago RCC Midway
>> > http://rcc.uchicago.edu/resources/midway_specs.html
>> >
>> > MVAPICH versions and configurations:
>> >
>> > [hfujita at midway-login1 ~]$ mpichversion
>> > MVAPICH2 Version: 2.1a
>> > MVAPICH2 Release date: Sun Sep 21 12:00:00 EDT 2014
>> > MVAPICH2 Device: ch3:mrail
>> > MVAPICH2 configure:
>> > --prefix=/project/aachien/local/mvapich2-2.1a-gcc-4.8 --enable-shared
>> > MVAPICH2 CC: gcc -DNDEBUG -DNVALGRIND -O2
>> > MVAPICH2 CXX: g++ -DNDEBUG -DNVALGRIND -O2
>> > MVAPICH2 F77: gfortran -L/lib -L/lib -O2
>> > MVAPICH2 FC: gfortran -O2
>> >
>> > [hfujita at midway-login1 ~]$ mpichversion
>> > MVAPICH2 Version: 2.0
>> > MVAPICH2 Release date: Fri Jun 20 20:00:00 EDT 2014
>> > MVAPICH2 Device: ch3:mrail
>> > MVAPICH2 configure: --prefix=/software/mvapich2-2.0-el6-x86_64
>> > --enable-shared
>> > MVAPICH2 CC: gcc -DNDEBUG -DNVALGRIND -O2
>> > MVAPICH2 CXX: g++ -DNDEBUG -DNVALGRIND
>> > MVAPICH2 F77: gfortran -L/lib -L/lib -O2
>> > MVAPICH2 FC: gfortran
>> >
>> >
>> > Thank you,
>> > Hajime
>> >
>> > --
>> > Hajime Fujita
>> > Postdoctoral Scholar, Large-Scale Systems Group
>> > Department of Computer Science, The University of Chicago
>> > http://www.cs.uchicago.edu/people/hfujita
>> >
>> > _______________________________________________
>> > mvapich-discuss mailing list
>> > mvapich-discuss at cse.ohio-state.edu
>> > <mailto:mvapich-discuss at cse.ohio-state.edu>
>> > http://mailman.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>> >
>> >
>>
>>
>
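The rma_putget_test attachment referenced above was scrubbed from this
archive. Below is a minimal two-rank Put/Get sketch consistent with the
reported symptom (a sentinel of -1 surviving where loc_buff[i] should equal
i). All identifiers, the buffer size, and the fence-based synchronization
are assumptions for illustration, not the original program.

```c
/* Hypothetical reconstruction of the scrubbed rma_putget_test:
 * each of two ranks exposes a window, Puts i into the peer's window,
 * then Gets the peer's data back and checks loc_buff[i] == i.
 * Run as: mpiexec -n 2 ./rma_putget_test  (one process per node to
 * force the InfiniBand path). No output means success. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define N 4096  /* assumed buffer length; the report implies N > 1024 */

int main(int argc, char **argv)
{
    int rank, nprocs, i;
    int *win_buff, *loc_buff;
    MPI_Win win;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
    int peer = (rank + 1) % nprocs;

    /* Allocate the exposed window and a local staging buffer. */
    MPI_Win_allocate(N * sizeof(int), sizeof(int), MPI_INFO_NULL,
                     MPI_COMM_WORLD, &win_buff, &win);
    loc_buff = malloc(N * sizeof(int));
    for (i = 0; i < N; i++) {
        win_buff[i] = -1;   /* sentinel: must be overwritten by the Put */
        loc_buff[i] = i;    /* data written into the peer's window */
    }

    MPI_Win_fence(0, win);
    MPI_Put(loc_buff, N, MPI_INT, peer, 0, N, MPI_INT, win);
    MPI_Win_fence(0, win);

    /* Reset the local buffer, read the peer's window back, verify. */
    for (i = 0; i < N; i++)
        loc_buff[i] = -1;
    MPI_Get(loc_buff, N, MPI_INT, peer, 0, N, MPI_INT, win);
    MPI_Win_fence(0, win);

    for (i = 0; i < N; i++)
        if (loc_buff[i] != i)
            printf("loc_buff[%d]=%d != %d\n", i, loc_buff[i], i);

    free(loc_buff);
    MPI_Win_free(&win);
    MPI_Finalize();
    return 0;
}
```

With a correct RMA implementation this prints nothing; the reported
failure corresponds to a Put or Get whose data never reached memory, so
the -1 sentinel survives. (Requires mpicc and a two-process launch, so
it cannot be exercised outside an MPI environment.)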