[mvapich-discuss] dat_evd_dequeue erroneous condition is not handled

Lei Chai chai.15 at osu.edu
Wed Jul 9 15:01:47 EDT 2008


Hi Nilesh,

Thanks for the patch. It has been applied to the latest mvapich2 svn
trunk with minor enhancements.

Lei


nilesh awate wrote:
> Hi Lei,
>
> I have created a small patch that handles the transport error: it
> aborts the MPI application and exits.
> I have tried it on mvapich2-1.0.1 and mvapich2-1.0.3.
>
> Here is the patch:
>
> --- orig_mvapich2-1.0.1/src/mpid/osu_ch3/channels/mrail/src/udapl/udapl_channel_manager.c       2007-09-06 02:14:15.000000000 +0530
> +++ mvapich2-1.0.1_patched/src/mpid/osu_ch3/channels/mrail/src/udapl/udapl_channel_manager.c    2008-07-02 15:30:45.000000000 +0530
> @@ -455,6 +455,8 @@
>      int i, j, needed;
>      static int last_poll = 0;
>      int type = T_CHANNEL_NO_ARRIVE;
> +    int rank;
> +    PMI_Get_rank(&rank);
>
>      *vbuf_handle = NULL;
>      for (i = last_poll, j = 0;
> @@ -467,6 +469,16 @@
>              {
>                  DEBUG_PRINT ("[poll cq]: get complete queue entry\n");
>                  assert (event.event_number == DAT_DTO_COMPLETION_EVENT);
> +
> +               /* Following is the patch to come out in case of fatal error like
> +                   DAT_DTO_ERR_TRANSPORT (occurs when the network malfunctions) */
> +
> +               if (event.event_data.dto_completion_event_data.status != DAT_DTO_SUCCESS)
> +               {
> +                      udapl_error_abort(UDAPL_STATUS_ERR,"[%d]DAT_EVD_ERROR in Consume_signals %x  \n",rank,
> +                                        event.event_data.dto_completion_event_data.status);
> +                }
> +
>                  sc = ((struct vbuf *) event.event_data.
>                        dto_completion_event_data.user_cookie.as_ptr)->desc;
>                  v = (vbuf *) ((aint_t) sc.cookie.as_ptr);
>
>
> regards
>
> Nilesh
>
>
> ----- Original Message ----
> From: LEI CHAI <chai.15 at osu.edu>
> To: nilesh awate <nilesh_awate at yahoo.com>
> Cc: MVAPICH2 <mvapich-discuss at cse.ohio-state.edu>
> Sent: Wednesday, 18 June, 2008 2:27:32 AM
> Subject: Re: [mvapich-discuss] dat_evd_dequeue erroneous condition is 
> not handled
>
> Hi,
>  
> We have never seen the DAT_DTO_ERR_TRANSPORT error before. This error
> usually means the network has a problem and is not functioning properly. I
> think the proper way to handle it is to report the error and abort the
> MPI program, since it is effectively a fatal error.
>  
> Lei  
>
>
> ----- Original Message -----
> From: nilesh awate <nilesh_awate at yahoo.com>
> Date: Tuesday, June 17, 2008 10:58 am
> Subject: [mvapich-discuss] dat_evd_dequeue erroneous condition is not 
> handled
> To: MVAPICH2 <mvapich-discuss at cse.ohio-state.edu>
>
>
>
> > Hi All,
> >
> > I am using mvapich2-1.0.1 over the uDAPL stack.
> >
> > I am getting a DAT_DTO_ERR_TRANSPORT error at the uDAPL level, but the
> > MPI application does not terminate with an error.
> >
> > While browsing through the code I observed the following:
> >
> > ret1 = dat_evd_dequeue (MPIDI_CH3I_RDMA_Process.cq_hndl[i], &event);
> > if (ret1 == DAT_SUCCESS)
> > {
> >     assert (event.event_number == DAT_DTO_COMPLETION_EVENT);
> >     /* but there is no check for event.event_data.dto_completion_event_data.status */
> >     . . . .
> >     . . . .
> > }
> >
> > However, this status is checked in the rdma_udapl_1sc.c file when dequeuing.
> >
> > What is the expected behavior of MPI when uDAPL reports an error like
> > DAT_DTO_ERR_TRANSPORT?
> >
> > How is this kind of error going to be handled at the MPI level?
> > OR
> > How are underlying uDAPL errors reflected by MPI?
> >
> > I am using Pallas as the test application.
> >
> > Waiting for your reply.
> > Thanks,
> > Nilesh
>
>
>
>
>
>
>
>
> > _______________________________________________
> > mvapich-discuss mailing list
> > mvapich-discuss at cse.ohio-state.edu
> > http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>
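
For readers following the thread, the check discussed above (test the completion status before using the event's cookie, and report and abort on failure) can be sketched in isolation. This is only an illustration under stated assumptions, not the MVAPICH2 code: the mock_* types and the functions check_dto_completion and consume_events are hypothetical stand-ins invented here so the example compiles without the DAT headers, and the enum values are not the real DAT values.

```c
#include <stdio.h>

/* Hypothetical stand-ins for the uDAPL event types; real code would use
 * the definitions from the DAT headers instead. */
typedef enum {
    MOCK_DTO_SUCCESS = 0,
    MOCK_DTO_ERR_TRANSPORT      /* illustrative only, not the real DAT value */
} mock_dto_status;

typedef struct {
    mock_dto_status status;     /* completion status of the transfer */
    void *user_cookie;          /* descriptor pointer carried by the event */
} mock_dto_event;

/* Returns 0 when the completion is usable. On failure, writes a
 * diagnostic into msg and returns -1 so the caller can report it and
 * abort before touching user_cookie. */
static int check_dto_completion(const mock_dto_event *ev, int rank,
                                char *msg, size_t msg_len)
{
    if (ev->status == MOCK_DTO_SUCCESS)
        return 0;
    snprintf(msg, msg_len, "[%d] DTO completion failed, status 0x%x",
             rank, (unsigned) ev->status);
    return -1;
}

/* Walks a batch of already-dequeued events and stops at the first failed
 * one. Returns the number of events that were safe to process. */
static size_t consume_events(const mock_dto_event *evs, size_t n, int rank,
                             char *msg, size_t msg_len)
{
    size_t i;
    for (i = 0; i < n; i++) {
        if (check_dto_completion(&evs[i], rank, msg, msg_len) != 0)
            break;  /* real code would print msg and abort the job here */
        /* only at this point is it safe to use evs[i].user_cookie */
    }
    return i;
}
```

Returning an error code rather than calling an abort routine directly is a choice made for this sketch so the check can be exercised on its own; in the patch quoted above the failure path calls udapl_error_abort instead.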


