[mvapich-discuss] 2.0rc1: Crash in MPI-3 RMA program over Infiniband
Hajime Fujita
hfujita at uchicago.edu
Tue Mar 25 14:35:06 EDT 2014
Dear MVAPICH team,
I was glad to hear about the release of MVAPICH2-2.0rc1, and immediately
tried it. I then found that my MPI-3 RMA program started crashing.
The attached simple program is enough to reproduce the issue. Here's the
output:
[hfujita@midway-login1 mpimbench]$ mpiexec -n 2 -host midway-login1,midway-login2 ./mpimbench
Message-based ping pong
4, 1.272331
8, 0.620984
16, 0.323668
32, 0.221903
64, 0.076136
128, 0.033388
256, 0.016455
512, 0.007715
1024, 0.004121
2048, 0.002435
4096, 0.002345
8192, 0.002069
16384, 0.002067
32768, 0.006494
65536, 0.001325
131072, 0.000686
262144, 0.000491
524288, 0.000423
1048576, 0.000395
RMA-based put
16, 0.491239
32, 0.299855
64, 0.155028
128, 0.078400
256, 0.040418
512, 0.020406
1024, 0.009608
2048, 0.004888
4096, 0.002399
8192, 0.002702
[midway-login1:mpi_rank_0][error_sighandler] Caught error: Segmentation fault (signal 11)
===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= PID 9519 RUNNING AT midway-login1
= EXIT CODE: 11
= CLEANING UP REMAINING PROCESSES
= YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
[proxy:0:1@midway-login2] HYD_pmcd_pmip_control_cmd_cb (pm/pmiserv/pmip_cb.c:913): assert (!closed) failed
[proxy:0:1@midway-login2] HYDT_dmxu_poll_wait_for_event (tools/demux/demux_poll.c:76): callback returned error status
[proxy:0:1@midway-login2] main (pm/pmiserv/pmip.c:206): demux engine error waiting for event
[mpiexec@midway-login1] HYDT_bscu_wait_for_completion (tools/bootstrap/utils/bscu_wait.c:76): one of the processes terminated badly; aborting
[mpiexec@midway-login1] HYDT_bsci_wait_for_completion (tools/bootstrap/src/bsci_wait.c:23): launcher returned error waiting for completion
[mpiexec@midway-login1] HYD_pmci_wait_for_completion (pm/pmiserv/pmiserv_pmci.c:218): launcher returned error waiting for completion
[mpiexec@midway-login1] main (ui/mpich/mpiexec.c:336): process manager error waiting for completion
This run was done on the UChicago Midway Cluster.
http://rcc.uchicago.edu/resources/midway_specs.html
One observation: the crash happens only when the processes communicate
over InfiniBand. If I launch the same program with both processes on a
single node, it finishes successfully.
Here is the output of the mpichversion command:
[hfujita@midway-login1 mpimbench]$ mpichversion
MVAPICH2 Version: 2.0rc1
MVAPICH2 Release date: Sun Mar 23 21:35:26 EDT 2014
MVAPICH2 Device: ch3:mrail
MVAPICH2 configure: --disable-option-checking
--prefix=/project/aachien/local/mvapich2-2.0rc1-gcc-4.8 --enable-shared
--disable-checkerrors --cache-file=/dev/null --srcdir=. CC=gcc
CFLAGS=-DNDEBUG -DNVALGRIND -O2 LDFLAGS=-L/lib -Wl,-rpath,/lib -L/lib
-Wl,-rpath,/lib LIBS=-libmad -libumad -libverbs -lrt -lhwloc -lpthread
-lhwloc
CPPFLAGS=-I/project/aachien/local/src/mvapich2-2.0rc1-gcc-4.8/src/mpid/ch3/channels/mrail/include
-I/project/aachien/local/src/mvapich2-2.0rc1-gcc-4.8/src/mpid/ch3/channels/mrail/include
-I/project/aachien/local/src/mvapich2-2.0rc1-gcc-4.8/src/mpid/ch3/channels/common/include
-I/project/aachien/local/src/mvapich2-2.0rc1-gcc-4.8/src/mpid/ch3/channels/common/include
-I/project/aachien/local/src/mvapich2-2.0rc1-gcc-4.8/src/mpid/ch3/channels/mrail/src/gen2
-I/project/aachien/local/src/mvapich2-2.0rc1-gcc-4.8/src/mpid/ch3/channels/mrail/src/gen2
-I/project/aachien/local/src/mvapich2-2.0rc1-gcc-4.8/src/mpid/common/locks
-I/project/aachien/local/src/mvapich2-2.0rc1-gcc-4.8/src/mpid/common/locks
-I/project/aachien/local/src/mvapich2-2.0rc1-gcc-4.8/src/util/wrappers
-I/project/aachien/local/src/mvapich2-2.0rc1-gcc-4.8/src/util/wrappers
-I/project/aachien/local/src/mvapich2-2.0rc1-gcc-4.8/src/mpl/include
-I/project/aachien/local/src/mvapich2-2.0rc1-gcc-4.8/src/mpl/include
-I/project/aachien/local/src/mvapich2-2.0rc1-gcc-4.8/src/openpa/src
-I/project/aachien/local/src/mvapich2-2.0rc1-gcc-4.8/src/openpa/src
-I/project/aachien/local/src/mvapich2-2.0rc1-gcc-4.8/src/mpi/romio/include
-I/include --with-cross=src/mpid/pamid/cross/bgq8 --enable-threads=multiple
MVAPICH2 CC: gcc -DNDEBUG -DNVALGRIND -O2 -DNDEBUG -DNVALGRIND -O2
MVAPICH2 CXX: g++ -DNDEBUG -DNVALGRIND
MVAPICH2 F77: gfortran -O2
MVAPICH2 FC: gfortran
If you need more explanation or information, please let me know.
Thanks,
Hajime
-------------- next part --------------
A non-text attachment was scrubbed...
Name: mpimbench.c
Type: text/x-csrc
Size: 5175 bytes
Desc: not available
URL: <http://mailman.cse.ohio-state.edu/pipermail/mvapich-discuss/attachments/20140325/6d4656ca/attachment.bin>