[mvapich-discuss] segmentation fault in MPI_Win_fence with #PE = 96

Dorian Krause doriankrause at web.de
Tue Aug 11 17:36:53 EDT 2009


Dear list members,

I have a code which uses MPI_Put + MPI_Win_fence for communication. The 
code runs fine with OpenMPI (tested with 8, 16, 32, 48, 64 and 96 processors 
without problems) and with mvapich for fewer than 96 processors (the 
maximum number I currently have access to). The core dump I got shows the 
following:

#0  Post_Put_Put_Get_List (winptr=0x6e06a0, size=-1, dreg_tmp=<value 
optimized out>, vc_ptr=0x10013c60, local_buf=0x7ffffac96e10, 
remote_buf=0x7ffffac96e08, length=4, lkeys=0x7ffffac96e1c,
    rkeys=0x7ffffac96e18, use_multi=0) at rdma_iba_1sc.c:1137
1137            ++(vc_ptr->mrail.rails[rail].postsend_times_1sc);
(gdb) p rail
No symbol "rail" in current context.
(gdb) p vc_ptr
$1 = (MPIDI_VC_t *) 0x10013c60
Current language:  auto; currently c
(gdb) p vc_ptr->mrail
$2 = {num_rails = 1, rails = 0x0, next_packet_expected = 0, 
next_packet_tosend = 0, outstanding_eager_vbufs = 0, coalesce_vbuf = 
0x0, rfp = {RDMA_send_buf_DMA = 0x0, RDMA_recv_buf_DMA = 0x0,
    RDMA_send_buf = 0x0, RDMA_recv_buf = 0x0, RDMA_send_buf_mr = {0x0, 
0x0, 0x0, 0x0}, RDMA_recv_buf_mr = {0x0, 0x0, 0x0, 0x0}, 
RDMA_remote_buf_rkey = {0, 0, 0, 0}, rdma_credit = 0 '\0',
    remote_RDMA_buf = 0x0, phead_RDMA_send = 0, ptail_RDMA_send = 0, 
p_RDMA_recv = 0, p_RDMA_recv_tail = 0, eager_start_cnt = 0, 
in_polling_set = 0, cached_outgoing = 0x0, cached_incoming = 0x0,
    cached_hit = 0, cached_miss = 0}, srp = {credits = 0x0}, cmanager = 
{num_channels = 0, num_local_pollings = 0, msg_channels = 0x0, 
next_arriving = 0x0, inqueue = 0, prev = 0x0, next = 0x0,
    pending_vbuf = 0, vc = 0x0}, packetized_recv = 0x0, sreq_head = 0x0, 
sreq_tail = 0x0, nextflow = 0x0, inflow = 0, remote_vc_addr = 0}
(gdb) p vc_ptr->mrail.rails
$3 = (struct mrail_rail *) 0x0
(gdb) bt
#0  Post_Put_Put_Get_List (winptr=0x6e06a0, size=-1, dreg_tmp=<value 
optimized out>, vc_ptr=0x10013c60, local_buf=0x7ffffac96e10, 
remote_buf=0x7ffffac96e08, length=4, lkeys=0x7ffffac96e1c,
    rkeys=0x7ffffac96e18, use_multi=0) at rdma_iba_1sc.c:1137
#1  0x000000000044a09a in MPIDI_CH3I_RDMA_post (win_ptr=0x6e06a0, 
target_rank=0) at rdma_iba_1sc.c:476
#2  0x000000000045f434 in MPIDI_Win_fence (assert=12288, 
win_ptr=0x6e06a0) at ch3u_rma_sync.c:165
#3  0x000000000041fecd in PMPI_Win_fence (assert=12288, win=-1610612736) 
at win_fence.c:108
#4  0x0000000000409dfc in hgc::OscPt2PtCommunicationGraph::sendP2M 
(this=0x10806650, list=@0x10278fe0) at comm/Window.hh:81
#5  0x0000000000404a5d in main (argc=2, argv=0x7ffffac97398) at 
Scale4Bonn/scale.cc:129


Obviously vc_ptr->mrail.rails is NULL. Can you help me to understand why?

The relevant code snippet on the sender side is

        // open the access epoch (no puts will target the local window)
        mWindow.fence(MPI_MODE_NOPUT | MPI_MODE_NOPRECEDE);
        for(int k = 0; k < mTop.numprocs(); ++k) {
                if(1 == mMustResend[k]) {
                        // one int into slot mLocalGroup.myrank() of rank k's window
                        mWindow.put(&mSendBuf[k], 1, MPI_INT, k,
                                    mLocalGroup.myrank(), 1, MPI_INT);
                }
        }
        // close the epoch
        mWindow.fence(MPI_MODE_NOSTORE | MPI_MODE_NOSUCCEED | MPI_MODE_NOPUT);

and on the receiver side I just have

        // matching fences on the receiver side; no RMA calls in between
        mWindow.fence(MPI_MODE_NOSTORE | MPI_MODE_NOPRECEDE);
        mWindow.fence(MPI_MODE_NOSTORE | MPI_MODE_NOSUCCEED | MPI_MODE_NOPUT);

mWindow is an instance of a thin wrapper class around an MPI window; its 
put and fence member functions map directly to MPI_Put and MPI_Win_fence.
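
In case it helps, here is a stripped-down plain-MPI sketch of the same 
pattern with the wrapper removed. The sender/receiver split (lower half of 
the ranks puts, upper half only fences) and the buffer names are made up 
for illustration; only the fence assertions and the put layout follow the 
snippets above:

#include <stdlib.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, nprocs, k;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    /* expose one int slot per peer in the window */
    int *recvbuf = (int *) calloc(nprocs, sizeof(int));
    int  sendval = 1;

    MPI_Win win;
    MPI_Win_create(recvbuf, nprocs * sizeof(int), sizeof(int),
                   MPI_INFO_NULL, MPI_COMM_WORLD, &win);

    if (rank < nprocs / 2) {
        /* "sender" ranks: put one int into slot 'rank' of every receiver */
        MPI_Win_fence(MPI_MODE_NOPUT | MPI_MODE_NOPRECEDE, win);
        for (k = nprocs / 2; k < nprocs; ++k)
            MPI_Put(&sendval, 1, MPI_INT, k, rank, 1, MPI_INT, win);
        MPI_Win_fence(MPI_MODE_NOSTORE | MPI_MODE_NOSUCCEED | MPI_MODE_NOPUT, win);
    } else {
        /* "receiver" ranks: matching fences, no RMA calls in between */
        MPI_Win_fence(MPI_MODE_NOSTORE | MPI_MODE_NOPRECEDE, win);
        MPI_Win_fence(MPI_MODE_NOSTORE | MPI_MODE_NOSUCCEED | MPI_MODE_NOPUT, win);
    }

    MPI_Win_free(&win);
    free(recvbuf);
    MPI_Finalize();
    return 0;
}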

For this test I used mvapich2 1.4 rc1 configured with

./configure --prefix=/home/kraused/mvapich2/1.4rc1/gcc-4.1.2/ CFLAGS="-O0 -ggdb" CXXFLAGS=-ggdb FCFLAGS=-ggdb

Thanks for your help!

Regards,
Dorian


