[mvapich-discuss] mvapich crash with error 12

Devendar Bureddy bureddy at cse.ohio-state.edu
Tue Aug 20 09:30:16 EDT 2013


Hi Ben

The verbs completion error 12 (IBV_WC_RETRY_EXC_ERR) usually happens for one
of the following reasons:

- bad QP attributes
- loose cable, bad HCA or a bad switch blade
- remote side is in a bad state
- heavy congestion in the network can also cause this

This error indicates that, in the low-level network transport, the sender's
retry counter (default: 7) was exceeded while trying to send this message.
In other words, the remote side never returned an ACK or NACK.

Is this error happening at the start of the application run or in the
middle of the run?

Can you set a higher retry count with the run-time parameter
MV2_DEFAULT_RETRY_COUNT=16 or 32 (default: 7, max: 255) and see if that
helps?

-Devendar



On Tue, Aug 20, 2013 at 8:37 AM, Ben <Benjamin.M.Auer at nasa.gov> wrote:

> I'm occasionally getting a random crash in a code with the message
>
> [0->6150] send desc error, wc_opcode=0
> [0->6150] wc.status=12, wc.wr_id=0x28193068, wc.opcode=0,
> vbuf->phead->type=25 = MPIDI_CH3_PKT_RNDV_REQ_TO_SEND
> [4979] Abort: [] Got completion with error 12, vendor code=0x81, dest
> rank=6150
>  at line 583 in file ibv_channel_manager.c
>
> I saw another post suggesting playing with the
>
>   MV2_DEFAULT_TIME_OUT
>   MV2_DEFAULT_RETRY_COUNT
>   MV2_DEFAULT_RNR_RETRY
>
> parameters, although I don't see these as options in the user guide. Does
> anyone have more insight into what this error message means?
>
> I'm using mvapich 1.8.1
>
> --
> Ben Auer, PhD   SSAI, Scientific Programmer/Analyst
> NASA GSFC,  Global Modeling and Assimilation Office
> Code 610.1, 8800 Greenbelt Rd, Greenbelt, MD  20771
> Phone: 301-286-9176               Fax: 301-614-6246
>


-- 
Devendar