[mvapich-discuss] RMA data corruption in 2.0, 2.1a

Hajime Fujita hfujita at uchicago.edu
Mon Oct 6 14:28:29 EDT 2014


Thanks Hari,

Some additional information that might help:
- On the same machine (Midway), OpenMPI 1.8.1 worked correctly.
- On NERSC Edison (Cray XC30 with Cray MPI 7.0), my test program also
worked correctly.

This is why I suspect this could be an issue in MVAPICH.


Thanks,
Hajime

Hari Subramoni wrote:
> Dear Dr. Fujita,
> 
> Thank you for the report. We will take a look at this issue and get back
> to you soon.
> 
> Regards,
> Hari.
> 
> On Fri, Oct 3, 2014 at 5:22 PM, Hajime Fujita <hfujita at uchicago.edu>
> wrote:
> 
>     Dear MVAPICH2 team,
> 
>     We found a potential bug in MVAPICH2 2.0 and 2.1a regarding RMA.
> 
>     When we run the attached program on two nodes (1 process/node), it
>     produces the wrong result. This setting means inter-process
>     communication goes over InfiniBand.
> 
>       # Requesting interactive job with 2 nodes
>       $ sinteractive -N 2
>       $ mpiexec ./rma_putget_test # launches 2 proc
>       loc_buff[1024]=-1 != 1024
> 
>     When it works correctly, the program produces no output.
> 
>     If we run this on a single machine with multiple processes, it runs
>     correctly.
>     If I'm using the MPI RMA functions incorrectly, please let me know.
> 
> 
>     Hardware platform:
>       UChicago RCC Midway
>       http://rcc.uchicago.edu/resources/midway_specs.html
> 
>     MVAPICH versions and configurations:
> 
>     [hfujita at midway-login1 ~]$ mpichversion
>     MVAPICH2 Version:       2.1a
>     MVAPICH2 Release date:  Sun Sep 21 12:00:00 EDT 2014
>     MVAPICH2 Device:        ch3:mrail
>     MVAPICH2 configure:
>     --prefix=/project/aachien/local/mvapich2-2.1a-gcc-4.8 --enable-shared
>     MVAPICH2 CC:    gcc    -DNDEBUG -DNVALGRIND -O2
>     MVAPICH2 CXX:   g++   -DNDEBUG -DNVALGRIND -O2
>     MVAPICH2 F77:   gfortran -L/lib -L/lib   -O2
>     MVAPICH2 FC:    gfortran   -O2
> 
>     [hfujita at midway-login1 ~]$ mpichversion
>     MVAPICH2 Version:       2.0
>     MVAPICH2 Release date:  Fri Jun 20 20:00:00 EDT 2014
>     MVAPICH2 Device:        ch3:mrail
>     MVAPICH2 configure:     --prefix=/software/mvapich2-2.0-el6-x86_64
>     --enable-shared
>     MVAPICH2 CC:    gcc    -DNDEBUG -DNVALGRIND -O2
>     MVAPICH2 CXX:   g++   -DNDEBUG -DNVALGRIND
>     MVAPICH2 F77:   gfortran -L/lib -L/lib   -O2
>     MVAPICH2 FC:    gfortran
> 
> 
>     Thank you,
>     Hajime
> 
>     --
>     Hajime Fujita
>     Postdoctoral Scholar, Large-Scale Systems Group
>     Department of Computer Science, The University of Chicago
>     http://www.cs.uchicago.edu/people/hfujita
> 
> 
> 


