[mvapich-discuss] mvapich crash with error 12
Devendar Bureddy
bureddy at cse.ohio-state.edu
Tue Aug 20 09:30:16 EDT 2013
Hi Ben
The verbs completion error 12 (IBV_WC_RETRY_EXC_ERR) usually happens for
one of the following reasons:
- bad QP attributes
- loose cable, bad HCA or a bad switch blade
- remote side is in a bad state
- heavy congestion in the network can cause this too
This error indicates that, in the low-level network transport, the sender's
retry counter (default: 7) was exceeded while trying to send this message.
This means that the remote side didn't send any ACK or NACK.
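
For reference, the numeric wc.status in the log can be mapped back to its
verbs name. The following is only a minimal sketch: it assumes the values
follow the order of enum ibv_wc_status in <infiniband/verbs.h>, and the
helper name wc_status_name is hypothetical. In C code, ibv_wc_status_str()
from libibverbs returns a description for the same status.

```python
# Names in the order of enum ibv_wc_status in <infiniband/verbs.h>.
# Assumption: your libibverbs header uses the same ordering; check it.
IBV_WC_STATUS = [
    "IBV_WC_SUCCESS",             # 0
    "IBV_WC_LOC_LEN_ERR",         # 1
    "IBV_WC_LOC_QP_OP_ERR",       # 2
    "IBV_WC_LOC_EEC_OP_ERR",      # 3
    "IBV_WC_LOC_PROT_ERR",        # 4
    "IBV_WC_WR_FLUSH_ERR",        # 5
    "IBV_WC_MW_BIND_ERR",         # 6
    "IBV_WC_BAD_RESP_ERR",        # 7
    "IBV_WC_LOC_ACCESS_ERR",      # 8
    "IBV_WC_REM_INV_REQ_ERR",     # 9
    "IBV_WC_REM_ACCESS_ERR",      # 10
    "IBV_WC_REM_OP_ERR",          # 11
    "IBV_WC_RETRY_EXC_ERR",       # 12: transport retry counter exceeded
    "IBV_WC_RNR_RETRY_EXC_ERR",   # 13: receiver-not-ready retries exceeded
]


def wc_status_name(status):
    """Hypothetical helper: map a numeric wc.status to its verbs name."""
    if 0 <= status < len(IBV_WC_STATUS):
        return IBV_WC_STATUS[status]
    return "unknown status %d" % status


# Status value taken from the "wc.status=12" line in the report.
print(wc_status_name(12))
```

Status 13 is worth telling apart from 12: it means the remote side did
answer, but with "receiver not ready", which points at a different cause.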
Is this error happening at the start of the application run or in the
middle of the run?
Can you set a higher retry count with the run-time parameter
MV2_DEFAULT_RETRY_COUNT=16 or 32 (default: 7, max: 255) and see if that
helps?
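
To get a feel for how long the sender waits before giving up, here is a
rough sketch of the arithmetic. It relies on the InfiniBand local ACK
timeout formula (4.096 us * 2^timeout, which is what MV2_DEFAULT_TIME_OUT
controls) and assumes one full timeout per attempt. The function names and
the timeout values in the loop are illustrative assumptions, not MVAPICH
defaults, and the hardware may wait longer than the nominal value.

```python
import math


def ack_timeout_seconds(timeout):
    """Nominal local ACK timeout of an InfiniBand QP: 4.096 us * 2**timeout.

    A timeout value of 0 means the sender waits indefinitely.
    """
    if timeout == 0:
        return math.inf
    return 4.096e-6 * (2 ** timeout)


def min_seconds_before_error(timeout, retry_count):
    """Rough lower bound on the time before the retry counter is exceeded:
    the initial send plus retry_count retries, each waiting one timeout."""
    return ack_timeout_seconds(timeout) * (retry_count + 1)


# retry_count=7 is the default quoted above; the timeout values are
# illustrative only.
for timeout in (14, 18, 20):
    print(timeout, round(min_seconds_before_error(timeout, 7), 3))
```

If the estimate is shorter than a plausible stall on the remote side (for
example during heavy congestion), raising the timeout or the retry count
gives the remote side more time to respond.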
-Devendar
On Tue, Aug 20, 2013 at 8:37 AM, Ben <Benjamin.M.Auer at nasa.gov> wrote:
> I'm getting a random crash on occasion in a code with the message
>
> [0->6150] send desc error, wc_opcode=0
> [0->6150] wc.status=12, wc.wr_id=0x28193068, wc.opcode=0,
> vbuf->phead->type=25 = MPIDI_CH3_PKT_RNDV_REQ_TO_SEND
> [4979] Abort: [] Got completion with error 12, vendor code=0x81, dest
> rank=6150
> at line 583 in file ibv_channel_manager.c
>
> I saw another post suggesting playing with the
>
> MV2_DEFAULT_TIME_OUT
> MV2_DEFAULT_RETRY_COUNT
> MV2_DEFAULT_RNR_RETRY
>
> Although I don't see these as options in the user guide. Does anyone have
> any more insight on what this error message means?
>
> I'm using mvapich 1.8.1
>
> --
> Ben Auer, PhD SSAI, Scientific Programmer/Analyst
> NASA GSFC, Global Modeling and Assimilation Office
> Code 610.1, 8800 Greenbelt Rd, Greenbelt, MD 20771
> Phone: 301-286-9176 Fax: 301-614-6246
>
>
> _______________________________________________
> mvapich-discuss mailing list
> mvapich-discuss at cse.ohio-state.edu
> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>
>
--
Devendar