[mvapich-discuss] segmentation fault in MPI_Win_fence with #PE = 96
Dorian Krause
doriankrause at web.de
Tue Aug 11 17:36:53 EDT 2009
Dear list members,
I have a code which uses MPI_Put + MPI_Win_fence for communication. The
code runs fine with OpenMPI (tested with 8, 16, 32, 48, 64, and 96 processors
without problems) and with mvapich for fewer than 96 processors (the
maximum number I currently have access to). The core dump I got shows the
following:
#0 Post_Put_Put_Get_List (winptr=0x6e06a0, size=-1, dreg_tmp=<value
optimized out>, vc_ptr=0x10013c60, local_buf=0x7ffffac96e10,
remote_buf=0x7ffffac96e08, length=4, lkeys=0x7ffffac96e1c,
rkeys=0x7ffffac96e18, use_multi=0) at rdma_iba_1sc.c:1137
1137 ++(vc_ptr->mrail.rails[rail].postsend_times_1sc);
(gdb) p rail
No symbol "rail" in current context.
(gdb) p vc_ptr
$1 = (MPIDI_VC_t *) 0x10013c60
Current language: auto; currently c
(gdb) p vc_ptr->mrail
$2 = {num_rails = 1, rails = 0x0, next_packet_expected = 0,
next_packet_tosend = 0, outstanding_eager_vbufs = 0, coalesce_vbuf =
0x0, rfp = {RDMA_send_buf_DMA = 0x0, RDMA_recv_buf_DMA = 0x0,
RDMA_send_buf = 0x0, RDMA_recv_buf = 0x0, RDMA_send_buf_mr = {0x0,
0x0, 0x0, 0x0}, RDMA_recv_buf_mr = {0x0, 0x0, 0x0, 0x0},
RDMA_remote_buf_rkey = {0, 0, 0, 0}, rdma_credit = 0 '\0',
remote_RDMA_buf = 0x0, phead_RDMA_send = 0, ptail_RDMA_send = 0,
p_RDMA_recv = 0, p_RDMA_recv_tail = 0, eager_start_cnt = 0,
in_polling_set = 0, cached_outgoing = 0x0, cached_incoming = 0x0,
cached_hit = 0, cached_miss = 0}, srp = {credits = 0x0}, cmanager =
{num_channels = 0, num_local_pollings = 0, msg_channels = 0x0,
next_arriving = 0x0, inqueue = 0, prev = 0x0, next = 0x0,
pending_vbuf = 0, vc = 0x0}, packetized_recv = 0x0, sreq_head = 0x0,
sreq_tail = 0x0, nextflow = 0x0, inflow = 0, remote_vc_addr = 0}
(gdb) p vc_ptr->mrail.rails
$3 = (struct mrail_rail *) 0x0
(gdb) bt
#0 Post_Put_Put_Get_List (winptr=0x6e06a0, size=-1, dreg_tmp=<value
optimized out>, vc_ptr=0x10013c60, local_buf=0x7ffffac96e10,
remote_buf=0x7ffffac96e08, length=4, lkeys=0x7ffffac96e1c,
rkeys=0x7ffffac96e18, use_multi=0) at rdma_iba_1sc.c:1137
#1 0x000000000044a09a in MPIDI_CH3I_RDMA_post (win_ptr=0x6e06a0,
target_rank=0) at rdma_iba_1sc.c:476
#2 0x000000000045f434 in MPIDI_Win_fence (assert=12288,
win_ptr=0x6e06a0) at ch3u_rma_sync.c:165
#3 0x000000000041fecd in PMPI_Win_fence (assert=12288, win=-1610612736)
at win_fence.c:108
#4 0x0000000000409dfc in hgc::OscPt2PtCommunicationGraph::sendP2M
(this=0x10806650, list=@0x10278fe0) at comm/Window.hh:81
#5 0x0000000000404a5d in main (argc=2, argv=0x7ffffac97398) at
Scale4Bonn/scale.cc:129
Obviously vc_ptr->mrail.rails is NULL. Can you help me understand why?
The relevant code snippet is:
mWindow.fence(MPI_MODE_NOPUT | MPI_MODE_NOPRECEDE);
for(int k = 0; k < mTop.numprocs(); ++k) {
if(1 == mMustResend[k]) {
mWindow.put(&mSendBuf[k], 1, MPI_INT, k,
mLocalGroup.myrank(), 1, MPI_INT);
}
}
mWindow.fence(MPI_MODE_NOSTORE | MPI_MODE_NOSUCCEED |
MPI_MODE_NOPUT);
and on the receiver side I just have:
mWindow.fence(MPI_MODE_NOSTORE | MPI_MODE_NOPRECEDE);
mWindow.fence(MPI_MODE_NOSTORE | MPI_MODE_NOSUCCEED |
MPI_MODE_NOPUT);
mWindow is an instance of a wrapper class around an MPI window. The
functions put and fence map directly to MPI_Put and MPI_Win_fence.
For this test I used mvapich2 1.4rc1, configured with
./configure --prefix=/home/kraused/mvapich2/1.4rc1/gcc-4.1.2/ CFLAGS="-O0
-ggdb" CXXFLAGS=-ggdb FCFLAGS=-ggdb
Thanks for your help!
Regards,
Dorian