[mvapich-discuss] RMA data corruption in 2.0, 2.1a
Hari Subramoni
subramoni.1 at osu.edu
Mon Oct 6 14:44:23 EDT 2014
Thank you for the additional data points, Dr. Fujita. We are taking a look
at the issue and will get back to you soon.
Regards,
Hari.
On Mon, Oct 6, 2014 at 2:28 PM, Hajime Fujita <hfujita at uchicago.edu> wrote:
> Thanks Hari,
>
> Some additional information that might help:
> - On the same machine (Midway), OpenMPI 1.8.1 worked correctly.
> - On NERSC Edison (Cray XC30 with Cray MPI 7.0), my test program also
> worked correctly.
>
> This is why I suspect this could be an issue in MVAPICH.
>
>
> Thanks,
> Hajime
>
> Hari Subramoni wrote:
> > Dear Dr. Fujita,
> >
> > Thank you for the report. We will take a look at this issue and get back
> > to you soon.
> >
> > Regards,
> > Hari.
> >
> > On Fri, Oct 3, 2014 at 5:22 PM, Hajime Fujita <hfujita at uchicago.edu> wrote:
> >
> > Dear MVAPICH2 team,
> >
> > We found a potential bug in MVAPICH2 2.0 and 2.1a regarding RMA.
> >
> > When we run the attached program on two nodes (1 process/node), it
> > produces the wrong result. This setting means inter-process
> > communication goes over InfiniBand.
> >
> > # Requesting interactive job with 2 nodes
> > $ sinteractive -N 2
> > $ mpiexec ./rma_putget_test # launches 2 proc
> > loc_buff[1024]=-1 != 1024
> >
> > If it works correctly, there would be no output.
> >
> > If we run this on a single machine with multiple processes, it runs
> > correctly.
> > If I'm using MPI RMA functions in some incorrect way please let me
> > know.
> >
> >
> > Hardware platform:
> > UChicago RCC Midway
> > http://rcc.uchicago.edu/resources/midway_specs.html
> >
> > MVAPICH versions and configurations:
> >
> > [hfujita at midway-login1 ~]$ mpichversion
> > MVAPICH2 Version: 2.1a
> > MVAPICH2 Release date: Sun Sep 21 12:00:00 EDT 2014
> > MVAPICH2 Device: ch3:mrail
> > MVAPICH2 configure:
> > --prefix=/project/aachien/local/mvapich2-2.1a-gcc-4.8 --enable-shared
> > MVAPICH2 CC: gcc -DNDEBUG -DNVALGRIND -O2
> > MVAPICH2 CXX: g++ -DNDEBUG -DNVALGRIND -O2
> > MVAPICH2 F77: gfortran -L/lib -L/lib -O2
> > MVAPICH2 FC: gfortran -O2
> >
> > [hfujita at midway-login1 ~]$ mpichversion
> > MVAPICH2 Version: 2.0
> > MVAPICH2 Release date: Fri Jun 20 20:00:00 EDT 2014
> > MVAPICH2 Device: ch3:mrail
> > MVAPICH2 configure: --prefix=/software/mvapich2-2.0-el6-x86_64
> > --enable-shared
> > MVAPICH2 CC: gcc -DNDEBUG -DNVALGRIND -O2
> > MVAPICH2 CXX: g++ -DNDEBUG -DNVALGRIND
> > MVAPICH2 F77: gfortran -L/lib -L/lib -O2
> > MVAPICH2 FC: gfortran
> >
> >
> > Thank you,
> > Hajime
> >
> > --
> > Hajime Fujita
> > Postdoctoral Scholar, Large-Scale Systems Group
> > Department of Computer Science, The University of Chicago
> > http://www.cs.uchicago.edu/people/hfujita
> >
> > _______________________________________________
> > mvapich-discuss mailing list
> > mvapich-discuss at cse.ohio-state.edu
> > http://mailman.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
> >
> >
>
>