[mvapich-discuss] dat_evd_dequeue erroneous condition is not handled
Lei Chai
chai.15 at osu.edu
Wed Jul 9 15:01:47 EDT 2008
Hi Nilesh,
Thanks for the patch. It has been applied to the latest mvapich2 svn
trunk with minor enhancement.
Lei
nilesh awate wrote:
> Hi Lei,
>
> I have created a small patch that takes care of the transport error:
> it aborts the MPI application and exits.
> I have tried it on mvapich2-1.0.1 and mvapich2-1.0.3.
>
> Here is the patch:
>
> --- orig_mvapich2-1.0.1/src/mpid/osu_ch3/channels/mrail/src/udapl/udapl_channel_manager.c	2007-09-06 02:14:15.000000000 +0530
> +++ mvapich2-1.0.1_patched/src/mpid/osu_ch3/channels/mrail/src/udapl/udapl_channel_manager.c	2008-07-02 15:30:45.000000000 +0530
> @@ -455,6 +455,8 @@
> int i, j, needed;
> static int last_poll = 0;
> int type = T_CHANNEL_NO_ARRIVE;
> + int rank;
> + PMI_Get_rank(&rank);
>
> *vbuf_handle = NULL;
> for (i = last_poll, j = 0;
> @@ -467,6 +469,16 @@
> {
> DEBUG_PRINT ("[poll cq]: get complete queue entry\n");
> assert (event.event_number == DAT_DTO_COMPLETION_EVENT);
> +
> + /* Following is the patch to come out in case of a fatal error like
> + DAT_DTO_ERR_TRANSPORT (occurs when the network malfunctions) */
> +
> + if (event.event_data.dto_completion_event_data.status != DAT_DTO_SUCCESS)
> + {
> + udapl_error_abort(UDAPL_STATUS_ERR,"[%d]DAT_EVD_ERROR in Consume_signals %x \n",rank,
> + event.event_data.dto_completion_event_data.status);
> + }
> +
> sc = ((struct vbuf *) event.event_data.
> dto_completion_event_data.user_cookie.as_ptr)->desc;
> v = (vbuf *) ((aint_t) sc.cookie.as_ptr);
>
>
> Regards,
>
> Nilesh
>
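The polling logic of the patch can be tried outside MPI with a small simulation. This is a minimal sketch under stated assumptions, not MVAPICH2 code: `dto_status`, `fake_cq`, `poll_cqs()` and `report_and_abort()` are hypothetical stand-ins for the uDAPL status type, the completion queues, the patched polling routine and `udapl_error_abort()`. It keeps the round-robin scan starting at `last_poll` and treats any completion whose status is not success as fatal; the stand-in abort only prints and records the failure so the behavior can be checked.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-ins for the uDAPL types used in the patch;
 * these are NOT the real <dat/udat.h> definitions. */
typedef enum { DTO_SUCCESS = 0, DTO_ERR_TRANSPORT = 1 } dto_status;

typedef struct {
    int has_event;      /* 1 if the simulated queue holds a completion */
    dto_status status;  /* completion status carried by that event */
} fake_cq;

#define NUM_CQS 2

/* Hypothetical abort hook: the real patch calls udapl_error_abort(),
 * which terminates the job; here we only report and record it. */
static int abort_called = 0;
static void report_and_abort(int rank, dto_status status)
{
    fprintf(stderr, "[%d] DTO completion error in poll: status %x\n",
            rank, (unsigned) status);
    abort_called = 1;
}

/* Round-robin poll starting at last_poll, like the patched loop.
 * Returns the index of the queue that yielded a good completion,
 * -1 if nothing arrived, or -2 after a failed completion. */
static int poll_cqs(fake_cq cqs[NUM_CQS], int rank)
{
    static int last_poll = 0;
    for (int j = 0; j < NUM_CQS; ++j) {
        int i = (last_poll + j) % NUM_CQS;
        if (!cqs[i].has_event)
            continue;
        cqs[i].has_event = 0;            /* "dequeue" the event */
        last_poll = (i + 1) % NUM_CQS;   /* start after it next time */
        if (cqs[i].status != DTO_SUCCESS) {
            report_and_abort(rank, cqs[i].status);  /* the patched check */
            return -2;
        }
        return i;
    }
    return -1;
}
```

One deliberate difference from the patch: the sketch receives the rank as a parameter rather than calling `PMI_Get_rank()` on every poll. A process's rank is fixed for the life of the job, so fetching it once (for example at initialization) keeps the extra call out of the hot polling path.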
>
> ----- Original Message ----
> From: LEI CHAI <chai.15 at osu.edu>
> To: nilesh awate <nilesh_awate at yahoo.com>
> Cc: MVAPICH2 <mvapich-discuss at cse.ohio-state.edu>
> Sent: Wednesday, 18 June, 2008 2:27:32 AM
> Subject: Re: [mvapich-discuss] dat_evd_dequeue erroneous condition is not handled
>
> Hi,
>
> We have never encountered the DAT_DTO_ERR_TRANSPORT error before. This error
> usually means the network has a problem and is not functioning properly. I
> think the proper way to handle it is to report the error and abort the
> MPI program, since it is essentially a fatal error.
>
> Lei
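Following the advice above, the report should say which process failed and with which status before the job is torn down. A minimal sketch, assuming hypothetical names (`dto_status`, `dto_status_name()`, `format_dto_error()`) and made-up numeric values rather than the real DAT definitions; inside MVAPICH2 the resulting message would be handed to an abort routine such as the `udapl_error_abort()` used in the patch, and in application code the standard call for terminating the job is `MPI_Abort()`.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical status codes standing in for DAT_DTO_* values; the real
 * uDAPL enumeration has more members and different numeric values. */
typedef enum { DTO_SUCCESS = 0, DTO_ERR_TRANSPORT = 1 } dto_status;

/* Map a status to a readable name for the error report. */
static const char *dto_status_name(dto_status s)
{
    switch (s) {
    case DTO_SUCCESS:       return "DTO_SUCCESS";
    case DTO_ERR_TRANSPORT: return "DTO_ERR_TRANSPORT";
    }
    return "UNKNOWN";
}

/* Build the one-line report printed before aborting. Returns what
 * snprintf returns, so a caller can detect truncation. */
static int format_dto_error(char *buf, size_t len, int rank, dto_status s)
{
    return snprintf(buf, len, "[%d] fatal completion error: %s (0x%x)",
                    rank, dto_status_name(s), (unsigned) s);
}
```

Including both the symbolic name and the raw hexadecimal value keeps the message useful even when a status code is not in the name table.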
>
>
> ----- Original Message -----
> From: nilesh awate <nilesh_awate at yahoo.com>
> Date: Tuesday, June 17, 2008 10:58 am
> Subject: [mvapich-discuss] dat_evd_dequeue erroneous condition is not handled
> To: MVAPICH2 <mvapich-discuss at cse.ohio-state.edu>
>
>
>
> > Hi All,
> >
> > I am using mvapich2-1.0.1 over the uDAPL stack.
> >
> > I am getting a DAT_DTO_ERR_TRANSPORT error at the uDAPL level, but the
> > MPI application does not terminate with any error.
> >
> > As I browsed through the code, I observed the following:
> >
> > ret1 = dat_evd_dequeue (MPIDI_CH3I_RDMA_Process.cq_hndl[i], &event);
> > if (ret1 == DAT_SUCCESS)
> > {
> >     assert (event.event_number == DAT_DTO_COMPLETION_EVENT);
> >     /* but there is no check for
> >        event.event_data.dto_completion_event_data.status */
> >     . . . .
> >     . . . .
> > }
> >
> > However, this condition is handled in rdma_udapl_1sc.c while dequeuing.
> >
> > What is the expected behavior of MPI when uDAPL reports an error such as
> > DAT_DTO_ERR_TRANSPORT?
> >
> > How is this kind of error handled at the MPI level?
> > OR
> > How are underlying uDAPL errors reflected by MPI?
> >
> > I am using Pallas as the application for testing purposes.
> >
> > Waiting for your reply.
> > Thanks,
> > Nilesh
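The gap described in the question is that two different results get conflated: the return code of the dequeue call says whether an event was taken off the queue, while the status inside a completion event says whether the data transfer itself succeeded. A minimal sketch of checking both levels, using hypothetical enums (`deq_ret`, `dto_status`, `poll_result`) and a hypothetical `classify_poll()` in place of the real uDAPL return and status codes:

```c
#include <assert.h>

/* Hypothetical stand-ins, not the real DAT types: the dequeue call's
 * return code and the completion status inside the event are separate. */
typedef enum { DEQ_SUCCESS, DEQ_QUEUE_EMPTY } deq_ret;
typedef enum { DTO_SUCCESS, DTO_ERR_TRANSPORT } dto_status;
typedef enum { POLL_NO_EVENT, POLL_OK, POLL_FATAL } poll_result;

/* Classify one poll. A successful dequeue only means an event was
 * removed from the queue; the transfer succeeded only if the event's
 * own completion status is also success. */
static poll_result classify_poll(deq_ret ret, dto_status status)
{
    if (ret != DEQ_SUCCESS)
        return POLL_NO_EVENT;   /* nothing to process on this poll */
    if (status != DTO_SUCCESS)
        return POLL_FATAL;      /* e.g. transport error: report and abort */
    return POLL_OK;             /* safe to use the completed buffer */
}
```

Only the `POLL_FATAL` case needs the report-and-abort path; an empty queue is the normal idle case for a polling loop.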