[mvapich-discuss] Hardware problem or code bug?

Lana Deere lana.deere at gmail.com
Fri Jul 24 12:59:34 EDT 2020


On Tue, Jul 21, 2020 at 1:53 PM Lana Deere <lana.deere at gmail.com> wrote:

> mlx5: compute-0-8.local: got completion with error:
> 00000000 00000000 00000000 00000000
> 00000000 00000000 00000000 00000000
> 00000005 00000000 00000000 00000000
> 00000000 12006802 00004016 1d35add3
> [compute-0-8.local:mpi_rank_6][handle_cqe] Send desc error in msg to 6, wc_opcode=0
> [compute-0-8.local:mpi_rank_6][handle_cqe] Msg from 6: wc.status=2, wc.wr_id=0xc58e040, wc.opcode=0, vbuf->phead->type=0 = MPIDI_CH3_PKT_EAGER_SEND
> [compute-0-8.local:mpi_rank_6][handle_cqe] src/mpid/ch3/channels/mrail/src/gen2/ibv_channel_manager.c:548: [] Got completion with error 2, vendor code=0x68, dest rank=6
>
>

I was able to get a translation of the vendor code 0x68:

> vendor_code=0x68  malformed WQE (Work Queue Element)
>



.. Lana (lana.deere at gmail.com)

