[mvapich-discuss] completion with error 12, vendor code=0x81

Devendar Bureddy bureddy at cse.ohio-state.edu
Tue Dec 21 11:01:05 EST 2010


Hi Aleksander

Thanks for reporting the problem. The following are the equivalent
environment variables for setting the time-out and retry values for OFA in
mvapich2. Could you please try them with different values and see whether
your job runs successfully?

MV2_DEFAULT_TIME_OUT
MV2_DEFAULT_RETRY_COUNT
MV2_DEFAULT_RNR_RETRY
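A minimal sketch of setting these variables from a POSIX shell. The values are an assumption that simply mirrors the DAPL "large cluster" settings quoted below; they are not a tested recommendation. The script also gives a rough estimate of how long a sender waits before a completion with error 12 (IBV_WC_RETRY_EXC_ERR) is reported. That estimate assumes mvapich2 passes MV2_DEFAULT_TIME_OUT through as the InfiniBand Local ACK Timeout exponent (4.096 us * 2^value). The helper name ib_timeout_estimate is hypothetical.

```sh
#!/bin/sh
# Example values only (an assumption): they mirror the DAPL "large cluster"
# settings DAPL_ACK_TIMER=20 and DAPL_ACK_RETRY=7. Tune for your fabric.
MV2_DEFAULT_TIME_OUT=20     # ACK timeout exponent (assumed IB encoding: 4.096 us * 2^value)
MV2_DEFAULT_RETRY_COUNT=7   # transport retries; 3-bit IB field, max 7
MV2_DEFAULT_RNR_RETRY=7     # RNR retries; 7 means "retry indefinitely" in IB
export MV2_DEFAULT_TIME_OUT MV2_DEFAULT_RETRY_COUNT MV2_DEFAULT_RNR_RETRY

# Hypothetical helper: prints the nominal per-attempt ACK timeout and the
# rough worst case (1 initial attempt + $2 retries) before the HCA reports
# IBV_WC_RETRY_EXC_ERR (wc.status=12). Real hardware timing may deviate.
ib_timeout_estimate() {
    awk -v t="$1" -v r="$2" 'BEGIN {
        per_try = 4.096e-6 * 2 ^ t
        printf "per-attempt: %.2f s, worst case: %.2f s\n", per_try, per_try * (r + 1)
    }'
}

ib_timeout_estimate "$MV2_DEFAULT_TIME_OUT" "$MV2_DEFAULT_RETRY_COUNT"
```

With mpirun_rsh, the variables can also be given on the command line before the executable, for example `mpirun_rsh -np 512 -hostfile hosts MV2_DEFAULT_TIME_OUT=20 ./app` (the hostfile and program names are placeholders). Keep in mind that a larger timeout also delays detection of a genuinely dead peer, so it is worth raising it only as far as needed.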

Thanks
-Devendar

On Mon, Dec 20, 2010 at 1:31 PM,  <aleksander at clustervision.com> wrote:
> Hi all,
>
> I am getting the following error from mvapich2:
>
> [0<-131] recv desc error, wc_opcode=128
> [0->131] wc.status=12, wc.wr_id=0x7cbcb80, wc.opcode=128, vbuf->phead->type=0 = MPIDI_CH3_PKT_EAGER_SEND
> [221] Abort: [] Got completion with error 12, vendor code=0x81, dest rank=131 at line 607 in file ibv_channel_manager.c
>
> As far as I know, error 12 is a timeout.
>
> mvapich2 version: 1.5.1p1
> OFED: 1.5.1-mlnx9
> Mellanox Technologies MT26428 [ConnectX VPI PCIe 2.0 5GT/s - IB QDR / 10GigE]
> Westmere nodes (12 cores per node); 512 cores used for the run
> Transport: OFA-IB-CH3
>
> I had a similar error with Intel MPI using DAPL, and defining the
> following environment variables solved the problem:
>        setenv DAPL_ACK_RETRY 7         # IB RC Ack retry count
>        setenv DAPL_ACK_TIMER 20        # IB RC Ack retry timer
> These values are taken from the DAPL release notes, under "settings for
> large clusters".
>
> What would be the equivalent settings for OFA under mvapich2, and where
> should they be set?
>
> Best regards,
> Aleksander
>
>
> _______________________________________________
> mvapich-discuss mailing list
> mvapich-discuss at cse.ohio-state.edu
> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>


