[mvapich-discuss] segmentation fault in MPI_Win_fence with #PE = 96

Dhabaleswar Panda panda at cse.ohio-state.edu
Tue Aug 11 17:55:00 EDT 2009


Dorian,

Thanks for your report. Do you see this error with the latest trunk
version of MVAPICH2 1.4? After the RC1 release, some fixes have gone into
the trunk. We are preparing to release RC2.

We will also be taking a look at this issue in the meantime.

Thanks,

DK

On Tue, 11 Aug 2009, Dorian Krause wrote:

> Dear list members,
>
> I have a code which uses MPI_Put + MPI_Win_fence for communication. The
> code runs fine with OpenMPI (tested with 8, 16, 32, 48, 64, and 96
> processes without problems) and with MVAPICH2 for fewer than 96 processes
> (96 is the maximum number I currently have access to). The core dump
> shows the following:
>
> #0  Post_Put_Put_Get_List (winptr=0x6e06a0, size=-1, dreg_tmp=<value
> optimized out>, vc_ptr=0x10013c60, local_buf=0x7ffffac96e10,
> remote_buf=0x7ffffac96e08, length=4, lkeys=0x7ffffac96e1c,
>     rkeys=0x7ffffac96e18, use_multi=0) at rdma_iba_1sc.c:1137
> 1137            ++(vc_ptr->mrail.rails[rail].postsend_times_1sc);
> (gdb) p rail
> No symbol "rail" in current context.
> (gdb) p vc_ptr
> $1 = (MPIDI_VC_t *) 0x10013c60
> Current language:  auto; currently c
> (gdb) p vc_ptr->mrail
> $2 = {num_rails = 1, rails = 0x0, next_packet_expected = 0,
> next_packet_tosend = 0, outstanding_eager_vbufs = 0, coalesce_vbuf =
> 0x0, rfp = {RDMA_send_buf_DMA = 0x0, RDMA_recv_buf_DMA = 0x0,
>     RDMA_send_buf = 0x0, RDMA_recv_buf = 0x0, RDMA_send_buf_mr = {0x0,
> 0x0, 0x0, 0x0}, RDMA_recv_buf_mr = {0x0, 0x0, 0x0, 0x0},
> RDMA_remote_buf_rkey = {0, 0, 0, 0}, rdma_credit = 0 '\0',
>     remote_RDMA_buf = 0x0, phead_RDMA_send = 0, ptail_RDMA_send = 0,
> p_RDMA_recv = 0, p_RDMA_recv_tail = 0, eager_start_cnt = 0,
> in_polling_set = 0, cached_outgoing = 0x0, cached_incoming = 0x0,
>     cached_hit = 0, cached_miss = 0}, srp = {credits = 0x0}, cmanager =
> {num_channels = 0, num_local_pollings = 0, msg_channels = 0x0,
> next_arriving = 0x0, inqueue = 0, prev = 0x0, next = 0x0,
>     pending_vbuf = 0, vc = 0x0}, packetized_recv = 0x0, sreq_head = 0x0,
> sreq_tail = 0x0, nextflow = 0x0, inflow = 0, remote_vc_addr = 0}
> (gdb) p vc_ptr->mrail.rails
> $3 = (struct mrail_rail *) 0x0
> (gdb) bt
> #0  Post_Put_Put_Get_List (winptr=0x6e06a0, size=-1, dreg_tmp=<value
> optimized out>, vc_ptr=0x10013c60, local_buf=0x7ffffac96e10,
> remote_buf=0x7ffffac96e08, length=4, lkeys=0x7ffffac96e1c,
>     rkeys=0x7ffffac96e18, use_multi=0) at rdma_iba_1sc.c:1137
> #1  0x000000000044a09a in MPIDI_CH3I_RDMA_post (win_ptr=0x6e06a0,
> target_rank=0) at rdma_iba_1sc.c:476
> #2  0x000000000045f434 in MPIDI_Win_fence (assert=12288,
> win_ptr=0x6e06a0) at ch3u_rma_sync.c:165
> #3  0x000000000041fecd in PMPI_Win_fence (assert=12288, win=-1610612736)
> at win_fence.c:108
> #4  0x0000000000409dfc in hgc::OscPt2PtCommunicationGraph::sendP2M
> (this=0x10806650, list=@0x10278fe0) at comm/Window.hh:81
> #5  0x0000000000404a5d in main (argc=2, argv=0x7ffffac97398) at
> Scale4Bonn/scale.cc:129
>
>
> Obviously vc_ptr->mrail.rails is NULL. Can you help me to understand why?
>
> The relevant code snippet is
>
>         mWindow.fence(MPI_MODE_NOPUT | MPI_MODE_NOPRECEDE);
>         for(int k = 0; k < mTop.numprocs(); ++k) {
>                 if(1 == mMustResend[k]) {
>                         mWindow.put(&mSendBuf[k], 1, MPI_INT, k,
>                                 mLocalGroup.myrank(), 1, MPI_INT);
>                 }
>         }
>         mWindow.fence(MPI_MODE_NOSTORE | MPI_MODE_NOSUCCEED |
> MPI_MODE_NOPUT);
>
> and on the receiver side I just have
>
>         mWindow.fence(MPI_MODE_NOSTORE | MPI_MODE_NOPRECEDE);
>         mWindow.fence(MPI_MODE_NOSTORE | MPI_MODE_NOSUCCEED |
> MPI_MODE_NOPUT);
>
> mWindow is an instance of a wrapper class around MPI_Win; its put and
> fence methods map directly to MPI_Put and MPI_Win_fence ...
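>
> The pattern is roughly equivalent to the following standalone sketch
> (the buffer names and the all-to-all put loop are made up for
> illustration, and the fence assert arguments are simplified; my real
> code only puts to the ranks flagged in mMustResend):
>
>         #include <mpi.h>
>         #include <stdlib.h>
>
>         int main(int argc, char **argv)
>         {
>                 int rank, nprocs, k;
>                 MPI_Win win;
>
>                 MPI_Init(&argc, &argv);
>                 MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>                 MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
>
>                 /* expose one int slot per peer through the window */
>                 int *winbuf = calloc(nprocs, sizeof(int));
>                 MPI_Win_create(winbuf, nprocs * sizeof(int), sizeof(int),
>                                MPI_INFO_NULL, MPI_COMM_WORLD, &win);
>
>                 int sendval = rank;
>
>                 /* open the access/exposure epoch */
>                 MPI_Win_fence(MPI_MODE_NOPRECEDE, win);
>                 /* put one int into slot 'rank' on every peer */
>                 for (k = 0; k < nprocs; ++k)
>                         MPI_Put(&sendval, 1, MPI_INT, k, rank,
>                                 1, MPI_INT, win);
>                 /* close the epoch */
>                 MPI_Win_fence(MPI_MODE_NOSUCCEED, win);
>
>                 MPI_Win_free(&win);
>                 free(winbuf);
>                 MPI_Finalize();
>                 return 0;
>         }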
>
> For this test I used mvapich2 1.4 rc1 configured with
>
> ./configure --prefix=/home/kraused/mvapich2/1.4rc1/gcc-4.1.2/ \
>         CFLAGS="-O0 -ggdb" CXXFLAGS=-ggdb FCFLAGS=-ggdb
>
> Thanks for your help!
>
> Regards,
> Dorian
>
> _______________________________________________
> mvapich-discuss mailing list
> mvapich-discuss at cse.ohio-state.edu
> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>
