[mvapich-discuss] segmentation fault in MPI_Win_fence with #PE = 96

Dorian Krause doriankrause at web.de
Wed Aug 12 12:04:09 EDT 2009


Hi,

thanks for the note. The error is not present with the trunk version!

Btw: I had a hard time configuring the trunk with autoconf 2.64 (which 
seems to be the newest version). The maint/updatefiles script always 
failed in the F90 binding folder. You can see the error below:

~/mvapich2/src/binding/f90 kraused$ autoconf -I ../../../confdb
configure.in:73: error: AC_LANG_CONFTEST: unknown language: Fortran 90
autoconf/lang.m4:215: AC_LANG_CONFTEST is expanded from...
autoconf/general.m4:2585: _AC_COMPILE_IFELSE is expanded from...
../../lib/m4sugar/m4sh.m4:624: AS_IF is expanded from...
autoconf/general.m4:2033: AC_CACHE_VAL is expanded from...
autoconf/general.m4:2046: AC_CACHE_CHECK is expanded from...
fortran90.m4:332: AC_PROG_F90 is expanded from...
fortran90.m4:953: PAC_PROG_F90 is expanded from...
configure.in:73: the top level
autom4te: /usr/bin/m4 failed with exit status: 1

I solved it by reverting to autoconf 2.63 ...
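
For reference, the workaround was simply to put an autoconf 2.63 
installation first in the PATH before rerunning maint/updatefiles (the 
install prefix below is only an example):

    # hypothetical prefix of a local autoconf 2.63 install -- adjust as needed
    export PATH=$HOME/sw/autoconf-2.63/bin:$PATH
    autoconf --version                  # should now report 2.63
    cd ~/mvapich2 && ./maint/updatefiles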

Should I post this to the mpich2 mailing list, or is there a difference 
between the mpich2 and mvapich2 configure scripts?

Thanks,
Dorian


Dhabaleswar Panda wrote:
> Dorian,
>
> Thanks for your report. Do you see this error with the latest trunk
> version of MVAPICH2 1.4? After the RC1 release, some fixes have gone into
> the trunk. We are preparing to bring out RC2.
>
> We will also be taking a look at this issue in the meantime.
>
> Thanks,
>
> DK
>
> On Tue, 11 Aug 2009, Dorian Krause wrote:
>
>   
>> Dear list members,
>>
>> I have a code which uses MPI_Put + MPI_Win_fence for communication. The
>> code runs fine with Open MPI (tested with 8, 16, 32, 48, 64, and 96
>> processes without problems) and with MVAPICH2 for fewer than 96 processes
>> (the maximum number I currently have access to). The backtrace from the
>> core dump shows the following:
>>
>> #0  Post_Put_Put_Get_List (winptr=0x6e06a0, size=-1, dreg_tmp=<value
>> optimized out>, vc_ptr=0x10013c60, local_buf=0x7ffffac96e10,
>> remote_buf=0x7ffffac96e08, length=4, lkeys=0x7ffffac96e1c,
>>     rkeys=0x7ffffac96e18, use_multi=0) at rdma_iba_1sc.c:1137
>> 1137            ++(vc_ptr->mrail.rails[rail].postsend_times_1sc);
>> (gdb) p rail
>> No symbol "rail" in current context.
>> (gdb) p vc_ptr
>> $1 = (MPIDI_VC_t *) 0x10013c60
>> Current language:  auto; currently c
>> (gdb) p vc_ptr->mrail
>> $2 = {num_rails = 1, rails = 0x0, next_packet_expected = 0,
>> next_packet_tosend = 0, outstanding_eager_vbufs = 0, coalesce_vbuf =
>> 0x0, rfp = {RDMA_send_buf_DMA = 0x0, RDMA_recv_buf_DMA = 0x0,
>>     RDMA_send_buf = 0x0, RDMA_recv_buf = 0x0, RDMA_send_buf_mr = {0x0,
>> 0x0, 0x0, 0x0}, RDMA_recv_buf_mr = {0x0, 0x0, 0x0, 0x0},
>> RDMA_remote_buf_rkey = {0, 0, 0, 0}, rdma_credit = 0 '\0',
>>     remote_RDMA_buf = 0x0, phead_RDMA_send = 0, ptail_RDMA_send = 0,
>> p_RDMA_recv = 0, p_RDMA_recv_tail = 0, eager_start_cnt = 0,
>> in_polling_set = 0, cached_outgoing = 0x0, cached_incoming = 0x0,
>>     cached_hit = 0, cached_miss = 0}, srp = {credits = 0x0}, cmanager =
>> {num_channels = 0, num_local_pollings = 0, msg_channels = 0x0,
>> next_arriving = 0x0, inqueue = 0, prev = 0x0, next = 0x0,
>>     pending_vbuf = 0, vc = 0x0}, packetized_recv = 0x0, sreq_head = 0x0,
>> sreq_tail = 0x0, nextflow = 0x0, inflow = 0, remote_vc_addr = 0}
>> (gdb) p vc_ptr->mrail.rails
>> $3 = (struct mrail_rail *) 0x0
>> (gdb) bt
>> #0  Post_Put_Put_Get_List (winptr=0x6e06a0, size=-1, dreg_tmp=<value
>> optimized out>, vc_ptr=0x10013c60, local_buf=0x7ffffac96e10,
>> remote_buf=0x7ffffac96e08, length=4, lkeys=0x7ffffac96e1c,
>>     rkeys=0x7ffffac96e18, use_multi=0) at rdma_iba_1sc.c:1137
>> #1  0x000000000044a09a in MPIDI_CH3I_RDMA_post (win_ptr=0x6e06a0,
>> target_rank=0) at rdma_iba_1sc.c:476
>> #2  0x000000000045f434 in MPIDI_Win_fence (assert=12288,
>> win_ptr=0x6e06a0) at ch3u_rma_sync.c:165
>> #3  0x000000000041fecd in PMPI_Win_fence (assert=12288, win=-1610612736)
>> at win_fence.c:108
>> #4  0x0000000000409dfc in hgc::OscPt2PtCommunicationGraph::sendP2M
>> (this=0x10806650, list=@0x10278fe0) at comm/Window.hh:81
>> #5  0x0000000000404a5d in main (argc=2, argv=0x7ffffac97398) at
>> Scale4Bonn/scale.cc:129
>>
>>
>> Obviously vc_ptr->mrail.rails is NULL. Can you help me understand why?
>>
>> The relevant code snippet is
>>
>>         mWindow.fence(MPI_MODE_NOPUT | MPI_MODE_NOPRECEDE);
>>         for(int k = 0; k < mTop.numprocs(); ++k) {
>>                 if(1 == mMustResend[k]) {
>>                         mWindow.put(&mSendBuf[k], 1, MPI_INT, k,
>>                                 mLocalGroup.myrank(), 1, MPI_INT);
>>                 }
>>         }
>>         mWindow.fence(MPI_MODE_NOSTORE | MPI_MODE_NOSUCCEED |
>> MPI_MODE_NOPUT);
>>
>> and on the receiver side I just have
>>
>>         mWindow.fence(MPI_MODE_NOSTORE | MPI_MODE_NOPRECEDE);
>>         mWindow.fence(MPI_MODE_NOSTORE | MPI_MODE_NOSUCCEED |
>> MPI_MODE_NOPUT);
>>
>> mWindow is an instance of a wrapper class around an MPI window. The
>> functions put and fence map directly to MPI_Put and MPI_Win_fence ...
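>>
>> For reference, a stripped-down, symmetric sketch of the same
>> fence/put/fence pattern in plain MPI calls (hypothetical buffer names,
>> not the actual application code; every rank here is also a target, so
>> the NOPUT assertion from above is dropped) would look like this:
>>
>>         #include <mpi.h>
>>
>>         int main(int argc, char** argv) {
>>                 MPI_Init(&argc, &argv);
>>                 int rank, nprocs;
>>                 MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>>                 MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
>>
>>                 /* expose one int per peer in the window */
>>                 int* recvBuf = 0;
>>                 MPI_Alloc_mem(nprocs*sizeof(int), MPI_INFO_NULL, &recvBuf);
>>                 MPI_Win win;
>>                 MPI_Win_create(recvBuf, nprocs*sizeof(int), sizeof(int),
>>                         MPI_INFO_NULL, MPI_COMM_WORLD, &win);
>>
>>                 int sendVal = rank;
>>
>>                 /* open the epoch; no RMA calls precede this fence */
>>                 MPI_Win_fence(MPI_MODE_NOPRECEDE, win);
>>                 for(int k = 0; k < nprocs; ++k) {
>>                         /* write one int at offset 'rank' of target k */
>>                         MPI_Put(&sendVal, 1, MPI_INT, k, rank, 1,
>>                                 MPI_INT, win);
>>                 }
>>                 /* close the epoch; no local stores happened and
>>                    no further RMA calls follow this fence */
>>                 MPI_Win_fence(MPI_MODE_NOSTORE | MPI_MODE_NOSUCCEED, win);
>>
>>                 MPI_Win_free(&win);
>>                 MPI_Free_mem(recvBuf);
>>                 MPI_Finalize();
>>                 return 0;
>>         }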
>>
>> For this test I used mvapich2 1.4 rc1 configured with
>>
>> ./configure --prefix=/home/kraused/mvapich2/1.4rc1/gcc-4.1.2/ \
>>         CFLAGS="-O0 -ggdb" CXXFLAGS=-ggdb FCFLAGS=-ggdb
>>
>> Thanks for your help!
>>
>> Regards,
>> Dorian
>>
>> _______________________________________________
>> mvapich-discuss mailing list
>> mvapich-discuss at cse.ohio-state.edu
>> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>>
>>     
>
>
>   


