[mvapich-discuss] RMA data corruption in 2.0, 2.1a

Hari Subramoni subramoni.1 at osu.edu
Thu Oct 9 18:53:03 EDT 2014


We had an offline discussion about this issue. A patch has been provided and
the issue is resolved. We are closing this thread. The patch will be included
in upcoming releases.

Regards,
Hari.

On Mon, Oct 6, 2014 at 2:44 PM, Hari Subramoni <subramoni.1 at osu.edu> wrote:

> Thank you for the additional data points, Dr. Fujita. We are taking a look
> at the issue and will get back to you soon.
>
> Regards,
> Hari.
>
> On Mon, Oct 6, 2014 at 2:28 PM, Hajime Fujita <hfujita at uchicago.edu>
> wrote:
>
>> Thanks Hari,
>>
>> Some additional information that might help:
>> - On the same machine (Midway), OpenMPI 1.8.1 worked correctly.
>> - On NERSC Edison (Cray XC30 with Cray MPI 7.0), my test program also
>> worked correctly.
>>
>> This is why I suspect this could be an issue in MVAPICH.
>>
>>
>> Thanks,
>> Hajime
>>
>> Hari Subramoni wrote:
>> > Dear Dr. Fujita,
>> >
>> > Thank you for the report. We will take a look at this issue and get back
>> > to you soon.
>> >
>> > Regards,
>> > Hari.
>> >
>> > On Fri, Oct 3, 2014 at 5:22 PM, Hajime Fujita <hfujita at uchicago.edu
>> > <mailto:hfujita at uchicago.edu>> wrote:
>> >
>> >     Dear MVAPICH2 team,
>> >
>> >     We found a potential bug in MVAPICH2 2.0 and 2.1a regarding RMA.
>> >
>> >     When we run the attached program on two nodes (1 process per node),
>> >     it produces a wrong result. With this setting, inter-process
>> >     communication goes over InfiniBand.
>> >
>> >       # Requesting interactive job with 2 nodes
>> >       $ sinteractive -N 2
>> >       $ mpiexec ./rma_putget_test # launches 2 proc
>> >       loc_buff[1024]=-1 != 1024
>> >
>> >     If the program works correctly, there is no output.
>> >
>> >     If we run the program on a single machine with multiple processes,
>> >     it runs correctly.
>> >     If I'm using the MPI RMA functions in some incorrect way, please
>> >     let me know.
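>> >
>> >     For reference, a minimal sketch of this kind of put/get test is
>> >     shown below. This is only a sketch, not the attached program
>> >     itself; the use of MPI_Win_allocate and fence synchronization is
>> >     an assumption. Each rank exposes an int window; rank 0 puts an
>> >     index pattern into rank 1's window, reads it back with MPI_Get,
>> >     and reports any mismatch.
>> >
>> >       /* Hypothetical sketch of an RMA put/get test
>> >        * (not the original attachment). */
>> >       #include <mpi.h>
>> >       #include <stdio.h>
>> >       #include <stdlib.h>
>> >
>> >       #define N 4096
>> >
>> >       int main(int argc, char **argv)
>> >       {
>> >           int rank, nprocs, i;
>> >           MPI_Init(&argc, &argv);
>> >           MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>> >           MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
>> >           if (nprocs < 2)
>> >               MPI_Abort(MPI_COMM_WORLD, 1);
>> >
>> >           /* Every rank exposes an N-int window. */
>> >           int *win_buff;
>> >           MPI_Win win;
>> >           MPI_Win_allocate(N * sizeof(int), sizeof(int), MPI_INFO_NULL,
>> >                            MPI_COMM_WORLD, &win_buff, &win);
>> >
>> >           int *loc_buff = malloc(N * sizeof(int));
>> >           for (i = 0; i < N; i++)
>> >               loc_buff[i] = (rank == 0) ? i : -1;
>> >
>> >           MPI_Win_fence(0, win);
>> >           if (rank == 0)  /* write the index pattern into rank 1 */
>> >               MPI_Put(loc_buff, N, MPI_INT, 1, 0, N, MPI_INT, win);
>> >           MPI_Win_fence(0, win);
>> >
>> >           /* Clear the local buffer, then read the pattern back. */
>> >           for (i = 0; i < N; i++)
>> >               loc_buff[i] = -1;
>> >           if (rank == 0)
>> >               MPI_Get(loc_buff, N, MPI_INT, 1, 0, N, MPI_INT, win);
>> >           MPI_Win_fence(0, win);
>> >
>> >           /* A mismatch here reproduces output like:
>> >            * loc_buff[1024]=-1 != 1024 */
>> >           if (rank == 0)
>> >               for (i = 0; i < N; i++)
>> >                   if (loc_buff[i] != i)
>> >                       printf("loc_buff[%d]=%d != %d\n", i, loc_buff[i], i);
>> >
>> >           free(loc_buff);
>> >           MPI_Win_free(&win);
>> >           MPI_Finalize();
>> >           return 0;
>> >       }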
>> >
>> >
>> >     Hardware platform:
>> >       UChicago RCC Midway
>> >       http://rcc.uchicago.edu/resources/midway_specs.html
>> >
>> >     MVAPICH versions and configurations:
>> >
>> >     [hfujita at midway-login1 ~]$ mpichversion
>> >     MVAPICH2 Version:       2.1a
>> >     MVAPICH2 Release date:  Sun Sep 21 12:00:00 EDT 2014
>> >     MVAPICH2 Device:        ch3:mrail
>> >     MVAPICH2 configure:     --prefix=/project/aachien/local/mvapich2-2.1a-gcc-4.8 --enable-shared
>> >     MVAPICH2 CC:    gcc    -DNDEBUG -DNVALGRIND -O2
>> >     MVAPICH2 CXX:   g++   -DNDEBUG -DNVALGRIND -O2
>> >     MVAPICH2 F77:   gfortran -L/lib -L/lib   -O2
>> >     MVAPICH2 FC:    gfortran   -O2
>> >
>> >     [hfujita at midway-login1 ~]$ mpichversion
>> >     MVAPICH2 Version:       2.0
>> >     MVAPICH2 Release date:  Fri Jun 20 20:00:00 EDT 2014
>> >     MVAPICH2 Device:        ch3:mrail
>> >     MVAPICH2 configure:     --prefix=/software/mvapich2-2.0-el6-x86_64 --enable-shared
>> >     MVAPICH2 CC:    gcc    -DNDEBUG -DNVALGRIND -O2
>> >     MVAPICH2 CXX:   g++   -DNDEBUG -DNVALGRIND
>> >     MVAPICH2 F77:   gfortran -L/lib -L/lib   -O2
>> >     MVAPICH2 FC:    gfortran
>> >
>> >
>> >     Thank you,
>> >     Hajime
>> >
>> >     --
>> >     Hajime Fujita
>> >     Postdoctoral Scholar, Large-Scale Systems Group
>> >     Department of Computer Science, The University of Chicago
>> >     http://www.cs.uchicago.edu/people/hfujita
>> >
>> >     _______________________________________________
>> >     mvapich-discuss mailing list
>> >     mvapich-discuss at cse.ohio-state.edu
>> >     <mailto:mvapich-discuss at cse.ohio-state.edu>
>> >     http://mailman.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>> >
>> >
>>
>>
>