[mvapich-discuss] MPI_Win_flush_all() crashes on shared memory

Mingzhe Li li.2192 at osu.edu
Tue Nov 18 09:33:57 EST 2014


Hi Hajime,

Good to know that it solved your issue. Thank you for trying it out.

Thanks,
Mingzhe

Hajime Fujita wrote:

Hi Mingzhe,

Thanks! The patch worked for 2.1a.


Best,
Hajime

Mingzhe Li wrote:

> Hi Hajime,
>
> Thanks for your note. We saw a similar issue in our internal testing
> framework and already have a fix for it. The fix has been applied to
> the 2.0.1 release. It will also be available in the 2.1b release
> within the next few days. If you want to keep using 2.1a, could you
> please apply the attached patch to your local 2.1a codebase?
>
> $ cd /path/to/2.1a
> $ patch -p1 < /path/to/flush-all
>
> Thanks,
> Mingzhe
>
> On Fri, Nov 14, 2014 at 1:35 PM, Hajime Fujita <hfujita at uchicago.edu> wrote:
>
>     Hello,
>
>     I have found a crash issue in MVAPICH2-2.1a.
>
>     When I launch the attached program (even with one process), it
>     crashes with SIGSEGV. However, if I set MV2_USE_SHARED_MEM=0, it
>     runs correctly.
>
>     I think I encountered a similar problem at the beginning of April
>     this year. At that time Mingzhe suggested MV2_USE_SHARED_MEM=0 as
>     a workaround. However, I'm curious whether there is a way to fix
>     this completely, now that I have a simpler reproducer.
>
>     The MVAPICH version is MVAPICH2-2.1a, which also includes a patch
>     given by Mingzhe. I don't know how to identify the patch, but I
>     believe it corresponds to this item in 2.0.1:
>       - Add check for pending operations in one-sided channel in flush_all
>
>     Hardware platform:
>        UChicago RCC Midway
>     http://rcc.uchicago.edu/resources/midway_specs.html
>
>
>     The following is the log taken on the Midway system. I had one node,
>     one process allocation, so just typing "mpiexec" implies
>     "mpiexec -n 1".
>     ----
>
>     [hfujita at midway070 rma_winflush_test]$ mpiexec ./rma_winflush_test
>     Process 0: going to issue Win_flush_all()
>     [midway070:mpi_rank_0][error_sighandler] Caught error: Segmentation fault (signal 11)
>
>     ===================================================================================
>     =   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
>     =   PID 9363 RUNNING AT midway070
>     =   EXIT CODE: 11
>     =   CLEANING UP REMAINING PROCESSES
>     =   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
>     ===================================================================================
>     YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Segmentation fault (signal 11)
>     This typically refers to a problem with your application.
>     Please see the FAQ page for debugging suggestions
>     [hfujita at midway070 rma_winflush_test]$ MV2_USE_SHARED_MEM=0 mpiexec ./rma_winflush_test
>     Process 0: going to issue Win_flush_all()
>     Process 0: after Win_flush_all()
>
>
>
>     [hfujita at midway070 rma_winflush_test]$ mpichversion
>     MVAPICH2 Version:       2.1a
>     MVAPICH2 Release date:  Sun Sep 21 12:00:00 EDT 2014
>     MVAPICH2 Device:        ch3:mrail
>     MVAPICH2 configure:     --prefix=/project/aachien/local/mvapich2-2.1a-gcc-4.8-rma-patch --enable-shared --no-create --no-recursion
>     MVAPICH2 CC:    gcc    -DNDEBUG -DNVALGRIND -O2
>     MVAPICH2 CXX:   g++   -DNDEBUG -DNVALGRIND -O2
>     MVAPICH2 F77:   gfortran -L/lib -L/lib   -O2
>     MVAPICH2 FC:    gfortran   -O2
>     ----
>
>
>     Thank you,
>     Hajime
>
>     --
>     Hajime Fujita
>     Postdoctoral Scholar, Large-Scale Systems Group
>     Department of Computer Science, The University of Chicago
>     http://www.cs.uchicago.edu/people/hfujita
>
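
The patch step quoted above (`patch -p1 < /path/to/flush-all`) can be rehearsed before touching a real source tree. The following is only a sketch: it builds a throwaway directory and a stand-in patch (`flush-all.demo`, with hypothetical file names and contents, not the actual MVAPICH2 fix) and assumes a `patch` that supports `--dry-run`, as GNU patch does. It shows how `-p1` strips the leading `a/` or `b/` path component and how a dry run reports whether the patch would apply without modifying any files.

```shell
#!/bin/sh
# Sketch only: the tree and patch below are throwaway stand-ins,
# not the real MVAPICH2 2.1a source or the flush-all patch.
set -eu

work=$(mktemp -d)
mkdir -p "$work/a/src" "$work/b/src"
printf 'old line\n' > "$work/a/src/file.c"
printf 'new line\n' > "$work/b/src/file.c"

# Build a -p1 style patch (header paths start with a/ and b/).
# diff exits with status 1 when the files differ, so tolerate that.
(cd "$work" && diff -u a/src/file.c b/src/file.c > flush-all.demo) || true

# Play the role of "cd /path/to/2.1a": enter the tree to be patched.
cd "$work/a"

# Dry run first: checks that the patch applies, changes nothing.
if patch -p1 --dry-run < ../flush-all.demo > /dev/null; then
    patch -p1 < ../flush-all.demo > /dev/null
    result=applied
else
    result=rejected
fi

content=$(cat src/file.c)
echo "$result: $content"
```

If the dry run fails against a real tree, the usual causes are running `patch` from the wrong directory or using the wrong `-p` level. The temporary directory is left in place for inspection and can be deleted afterwards.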