[mvapich-discuss] RMA data corruption in 2.0, 2.1a

Hari Subramoni subramoni.1 at osu.edu
Mon Oct 6 14:44:23 EDT 2014


Thank you for the additional data points, Dr. Fujita. We are taking a look
at the issue and will get back to you soon.

Regards,
Hari.

On Mon, Oct 6, 2014 at 2:28 PM, Hajime Fujita <hfujita at uchicago.edu> wrote:

> Thanks, Hari,
>
> Some additional information that might help:
> - On the same machine (Midway), OpenMPI 1.8.1 worked correctly.
> - On NERSC Edison (Cray XC30 with Cray MPI 7.0), my test program also
> worked correctly.
>
> This is why I suspect this could be an issue in MVAPICH.
>
>
> Thanks,
> Hajime
>
> Hari Subramoni wrote:
> > Dear Dr. Fujita,
> >
> > Thank you for the report. We will take a look at this issue and get back
> > to you soon.
> >
> > Regards,
> > Hari.
> >
> > On Fri, Oct 3, 2014 at 5:22 PM, Hajime Fujita <hfujita at uchicago.edu>
> > wrote:
> >
> >     Dear MVAPICH2 team,
> >
> >     We found a potential bug in MVAPICH2 2.0 and 2.1a regarding RMA.
> >
> >     When we run the attached program on two nodes (1 process/node), it
> >     produces a wrong result. In this setup, inter-process communication
> >     goes over InfiniBand.
> >
> >       # Requesting interactive job with 2 nodes
> >       $ sinteractive -N 2
> >       $ mpiexec ./rma_putget_test # launches 2 proc
> >       loc_buff[1024]=-1 != 1024
> >
> >     When it works correctly, there is no output.
> >
> >     If we run this on a single machine with multiple processes, it runs
> >     correctly.
> >     If I'm using the MPI RMA functions in some incorrect way, please let
> >     me know.
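> >
> >     [The attached rma_putget_test is not preserved in this archive. The
> >     following is only a minimal sketch of such a fence-synchronized
> >     put/get round-trip test, consistent with the output shown above;
> >     the buffer size N, the value pattern, and the use of MPI_Win_fence
> >     synchronization are assumptions, not the actual attached code.]
> >
> >       #include <mpi.h>
> >       #include <stdio.h>
> >       #include <stdlib.h>
> >
> >       #define N 2048                 /* window size in ints (assumed) */
> >
> >       int main(int argc, char **argv)
> >       {
> >           int rank, nprocs, i, peer;
> >           int *win_buff, *loc_buff;
> >           MPI_Win win;
> >
> >           MPI_Init(&argc, &argv);
> >           MPI_Comm_rank(MPI_COMM_WORLD, &rank);
> >           MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
> >           peer = (rank + 1) % nprocs;
> >
> >           /* Expose win_buff as an RMA window on every rank. */
> >           MPI_Alloc_mem(N * sizeof(int), MPI_INFO_NULL, &win_buff);
> >           loc_buff = malloc(N * sizeof(int));
> >           MPI_Win_create(win_buff, N * sizeof(int), sizeof(int),
> >                          MPI_INFO_NULL, MPI_COMM_WORLD, &win);
> >
> >           /* Put a known pattern into the peer's window. */
> >           for (i = 0; i < N; i++)
> >               loc_buff[i] = i;
> >           MPI_Win_fence(0, win);
> >           MPI_Put(loc_buff, N, MPI_INT, peer, 0, N, MPI_INT, win);
> >           MPI_Win_fence(0, win);
> >
> >           /* Clear the local buffer, read the pattern back with MPI_Get,
> >            * and report any element that did not survive the trip. */
> >           for (i = 0; i < N; i++)
> >               loc_buff[i] = -1;
> >           MPI_Win_fence(0, win);
> >           MPI_Get(loc_buff, N, MPI_INT, peer, 0, N, MPI_INT, win);
> >           MPI_Win_fence(0, win);
> >
> >           for (i = 0; i < N; i++)
> >               if (loc_buff[i] != i)
> >                   printf("loc_buff[%d]=%d != %d\n", i, loc_buff[i], i);
> >
> >           MPI_Win_free(&win);
> >           MPI_Free_mem(win_buff);
> >           free(loc_buff);
> >           MPI_Finalize();
> >           return 0;
> >       }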
> >
> >
> >     Hardware platform:
> >       UChicago RCC Midway
> >       http://rcc.uchicago.edu/resources/midway_specs.html
> >
> >     MVAPICH versions and configurations:
> >
> >     [hfujita at midway-login1 ~]$ mpichversion
> >     MVAPICH2 Version:       2.1a
> >     MVAPICH2 Release date:  Sun Sep 21 12:00:00 EDT 2014
> >     MVAPICH2 Device:        ch3:mrail
> >     MVAPICH2 configure:
> >     --prefix=/project/aachien/local/mvapich2-2.1a-gcc-4.8 --enable-shared
> >     MVAPICH2 CC:    gcc    -DNDEBUG -DNVALGRIND -O2
> >     MVAPICH2 CXX:   g++   -DNDEBUG -DNVALGRIND -O2
> >     MVAPICH2 F77:   gfortran -L/lib -L/lib   -O2
> >     MVAPICH2 FC:    gfortran   -O2
> >
> >     [hfujita at midway-login1 ~]$ mpichversion
> >     MVAPICH2 Version:       2.0
> >     MVAPICH2 Release date:  Fri Jun 20 20:00:00 EDT 2014
> >     MVAPICH2 Device:        ch3:mrail
> >     MVAPICH2 configure:     --prefix=/software/mvapich2-2.0-el6-x86_64
> >     --enable-shared
> >     MVAPICH2 CC:    gcc    -DNDEBUG -DNVALGRIND -O2
> >     MVAPICH2 CXX:   g++   -DNDEBUG -DNVALGRIND
> >     MVAPICH2 F77:   gfortran -L/lib -L/lib   -O2
> >     MVAPICH2 FC:    gfortran
> >
> >
> >     Thank you,
> >     Hajime
> >
> >     --
> >     Hajime Fujita
> >     Postdoctoral Scholar, Large-Scale Systems Group
> >     Department of Computer Science, The University of Chicago
> >     http://www.cs.uchicago.edu/people/hfujita
> >
>
>