[mvapich-discuss] MPI_Cancel bug?

Andrew Friedley friedley3 at llnl.gov
Mon Oct 29 11:36:23 EDT 2012


Hi,

I've attached a small program that posts an Irecv, then cancels it.  It works correctly for MPICH2 v1.4.1p1 and v1.5 and Open MPI v1.6.2, but crashes on both MVAPICH2 1.8 and 1.8.1.  On Open MPI I ran over 150 million iterations before killing it; MVAPICH2 crashes consistently on iteration 261896.

The output, when run under valgrind, is shown below.  I guess this is a bug?  Am I canceling (and testing for cancellation) properly?  Any ideas for a workaround?

Thanks,

Andrew

-------------- next part --------------
A non-text attachment was scrubbed...
Name: foo.c
Type: application/octet-stream
Size: 731 bytes
Desc: not available
Url : http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/attachments/20121029/8f187367/foo.obj
-------------- next part --------------

i 261894
i 261895
i 261896
==33586== Invalid write of size 4
==33586==    at 0x513EF43: MPID_Irecv (in /g/g19/friedley/local/mvapich2-1.8-gcc-cab/lib/libmpich.so.3.3)
==33586==    by 0x513B0B6: PMPI_Irecv (in /g/g19/friedley/local/mvapich2-1.8-gcc-cab/lib/libmpich.so.3.3)
==33586==    by 0x400A3F: main (in /g/g19/friedley/svn/afriedle/hmpi2/foo)
==33586==  Address 0x4 is not stack'd, malloc'd or (recently) free'd
==33586== 
[cab26:mpi_rank_0][error_sighandler] Caught error: Segmentation fault (signal 11)
[cab26:mpi_rank_0][error_sighandler] Caught error: Segmentation fault (signal 11)
[cab26:mpi_rank_0][error_sighandler] Caught error: Segmentation fault (signal 11)

(the last line repeats many times, followed by this:)

==33586== Stack overflow in thread 1: can't grow stack to 0x7fe001ec0
==33586== 
==33586== Process terminating with default action of signal 11 (SIGSEGV): dumping core
==33586==  Access not within mapped region at address 0x7FE001EC0
==33586==    at 0x5E96E7B: buffered_vfprintf (vfprintf.c:2255)
==33586==  If you believe this happened as a result of a stack
==33586==  overflow in your program's main thread (unlikely but
==33586==  possible), you can try to increase the size of the
==33586==  main thread stack using the --main-stacksize= flag.
==33586==  The main thread stack size used in this run was 16777216.
==33586== Stack overflow in thread 1: can't grow stack to 0x7fe001eb8


