[mvapich-discuss] MPI_Win_flush_all() crashes on shared memory

Mingzhe Li li.2192 at osu.edu
Fri Nov 14 15:28:09 EST 2014


Hi Hajime,

Thanks for your note. We saw a similar issue in our internal testing
framework and already have a fix for it. The fix has been applied to the
2.0.1 release and will also be available in the 2.1b release within the
next few days. If you want to keep using 2.1a, could you please apply the
attached patch to your local 2.1a codebase?

$ cd /path/to/2.1a
$ patch -p1 < /path/to/flush-all
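
If you would like to confirm that a patch applies cleanly before touching
the source tree, a dry run can help. The script below is only a sketch of
that workflow on a made-up scratch tree: the directory layout, the file
src/file.c, and demo.patch are all hypothetical and have nothing to do with
the actual flush-all patch. It assumes a patch(1) that supports --dry-run
(GNU patch does).

```shell
#!/bin/sh
# Sketch of the "dry run, then apply" workflow using a throwaway tree.
# All names here (tree/, a/, b/, src/file.c, demo.patch) are hypothetical.
set -e

work=$(mktemp -d)
mkdir -p "$work/tree/src" "$work/a/src" "$work/b/src"

# "tree" stands in for the local source tree; a/ and b/ are the
# before/after copies used to generate a unified diff.
printf 'old line\n' > "$work/tree/src/file.c"
printf 'old line\n' > "$work/a/src/file.c"
printf 'new line\n' > "$work/b/src/file.c"

# diff exits with status 1 when the files differ, so guard it under set -e.
(cd "$work" && diff -u a/src/file.c b/src/file.c > demo.patch || true)

cd "$work/tree"

# -p1 strips the leading "a/" or "b/" path component from the diff headers.
# --dry-run only reports whether the hunks would apply; no file is changed.
patch -p1 --dry-run < ../demo.patch

# Apply for real once the dry run succeeds.
patch -p1 < ../demo.patch

cat src/file.c
```

With the real patch, the same two patch invocations would be run from
/path/to/2.1a with /path/to/flush-all as input.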

Thanks,
Mingzhe

On Fri, Nov 14, 2014 at 1:35 PM, Hajime Fujita <hfujita at uchicago.edu> wrote:

> Hello,
>
> I have found a crash issue in MVAPICH2-2.1a.
>
> When I launch the attached program (even with one process), it crashes
> with SIGSEGV. However, if I specify MV2_USE_SHARED_MEM=0, it runs correctly.
>
> I think I encountered a similar problem at the beginning of April this
> year. At that time Mingzhe suggested MV2_USE_SHARED_MEM=0 as a
> workaround. However, I'm curious whether there is a way to fix this
> completely, as I now have a simpler reproducer.
>
> The MVAPICH version is MVAPICH2-2.1a, which also includes a patch provided
> by Mingzhe. I don't know how to identify the patch, but I believe it
> corresponds to this item in 2.0.1:
> >    - Add check for pending operations in one-sided channel in flush_all
>
> Hardware platform:
>   UChicago RCC Midway
>   http://rcc.uchicago.edu/resources/midway_specs.html
>
>
> The following is the log taken on the Midway system. I had one node, one
> process allocation, so just typing "mpiexec" implies "mpiexec -n 1".
> ----
>
> [hfujita at midway070 rma_winflush_test]$ mpiexec ./rma_winflush_test
> Process 0: going to issue Win_flush_all()
> [midway070:mpi_rank_0][error_sighandler] Caught error: Segmentation fault (signal 11)
>
> ===================================================================================
> =   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
> =   PID 9363 RUNNING AT midway070
> =   EXIT CODE: 11
> =   CLEANING UP REMAINING PROCESSES
> =   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
> ===================================================================================
> YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Segmentation fault (signal 11)
> This typically refers to a problem with your application.
> Please see the FAQ page for debugging suggestions
> [hfujita at midway070 rma_winflush_test]$ MV2_USE_SHARED_MEM=0 mpiexec ./rma_winflush_test
> Process 0: going to issue Win_flush_all()
> Process 0: after Win_flush_all()
>
>
>
> [hfujita at midway070 rma_winflush_test]$ mpichversion
> MVAPICH2 Version:       2.1a
> MVAPICH2 Release date:  Sun Sep 21 12:00:00 EDT 2014
> MVAPICH2 Device:        ch3:mrail
> MVAPICH2 configure: --prefix=/project/aachien/local/mvapich2-2.1a-gcc-4.8-rma-patch --enable-shared --no-create --no-recursion
> MVAPICH2 CC:    gcc    -DNDEBUG -DNVALGRIND -O2
> MVAPICH2 CXX:   g++   -DNDEBUG -DNVALGRIND -O2
> MVAPICH2 F77:   gfortran -L/lib -L/lib   -O2
> MVAPICH2 FC:    gfortran   -O2
> ----
>
>
> Thank you,
> Hajime
>
> --
> Hajime Fujita
> Postdoctoral Scholar, Large-Scale Systems Group
> Department of Computer Science, The University of Chicago
> http://www.cs.uchicago.edu/people/hfujita
>
> _______________________________________________
> mvapich-discuss mailing list
> mvapich-discuss at cse.ohio-state.edu
> http://mailman.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: flush-all
Type: application/octet-stream
Size: 807 bytes
Desc: not available
URL: <http://mailman.cse.ohio-state.edu/pipermail/mvapich-discuss/attachments/20141114/5aa49994/attachment.obj>

