[mvapich-discuss] completion with error 12, vendor code=0x81
aleksander at clustervision.com
Mon Dec 20 13:31:44 EST 2010
Hi all,
I am getting the following error from mvapich2:
[0<-131] recv desc error, wc_opcode=128
[0->131] wc.status=12, wc.wr_id=0x7cbcb80, wc.opcode=128, vbuf->phead->type=0 = MPIDI_CH3_PKT_EAGER_SEND
[221] Abort: [] Got completion with error 12, vendor code=0x81, dest rank=131 at line 607 in file ibv_channel_manager.c
As far as I know, error 12 is a timeout.
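For anyone decoding a log like the one above: the wc.status number is a value of enum ibv_wc_status from libibverbs, where 12 is IBV_WC_RETRY_EXC_ERR (transport retry counter exceeded), which matches the timeout reading. A minimal sketch of the lookup follows; the table is only a subset of the enum, and the helper name decode_wc_status is made up for illustration, not part of mvapich2 or libibverbs.

```python
import re

# Subset of enum ibv_wc_status from <infiniband/verbs.h>; not the full enum.
IBV_WC_STATUS = {
    0: "IBV_WC_SUCCESS",
    5: "IBV_WC_WR_FLUSH_ERR",
    12: "IBV_WC_RETRY_EXC_ERR",      # transport retry counter exceeded
    13: "IBV_WC_RNR_RETRY_EXC_ERR",  # receiver-not-ready retry counter exceeded
}


def decode_wc_status(log_line):
    """Return the symbolic name for the wc.status value in a log line.

    Hypothetical helper: returns None when the line has no wc.status field,
    and an 'unknown status N' string for codes missing from the subset above.
    """
    match = re.search(r"wc\.status=(\d+)", log_line)
    if match is None:
        return None
    code = int(match.group(1))
    return IBV_WC_STATUS.get(code, "unknown status %d" % code)


line = "[0->131] wc.status=12, wc.wr_id=0x7cbcb80, wc.opcode=128,"
print(decode_wc_status(line))
```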
mvapich2 version: 1.5.1p1
OFED: 1.5.1-mlnx9
HCA: Mellanox Technologies MT26428 [ConnectX VPI PCIe 2.0 5GT/s - IB QDR / 10GigE]
Nodes: Westmere (12 cores each); 512 cores used for the run
Transport: OFA-IB-CH3
I had a similar error from Intel MPI used with DAPL, and defining the
following environment variables solved the problem:
setenv DAPL_ACK_RETRY 7    # IB RC Ack retry count
setenv DAPL_ACK_TIMER 20   # IB RC Ack retry timer
The above are taken from the DAPL release notes under "settings for
large clusters".
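To put the timer value in perspective: the InfiniBand RC local ACK timeout is encoded as an exponent, with the interval being 4.096 microseconds times 2 to the power of the value, and 0 meaning no timeout. Assuming DAPL_ACK_TIMER maps directly onto that field, a setting of 20 corresponds to roughly 4.3 seconds per attempt. The sketch below only does that arithmetic; the function name is made up for illustration.

```python
def ack_timeout_seconds(timer):
    """Convert an IB local ACK timeout exponent (0..31) to seconds.

    Uses the encoding 4.096 us * 2**timer; 0 is treated as 'infinite'.
    """
    if not 0 <= timer <= 31:
        raise ValueError("timer must fit in 5 bits (0..31)")
    if timer == 0:
        return float("inf")
    return 4.096e-6 * (2 ** timer)


for value in (16, 20):
    print("timer=%d -> %.3f s" % (value, ack_timeout_seconds(value)))
```

The actual time before a completion error also depends on the retry count and on how the HCA rounds the interval, so treat the result as an order-of-magnitude figure.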
What are the equivalent settings for the OFA transport in mvapich2, and
where should they be set?
Best regards,
Aleksander