[mvapich-discuss] Hardware problem or code bug?

Subramoni, Hari subramoni.1 at osu.edu
Fri Jul 24 13:02:41 EDT 2020


Thank you, Lana.

Does this occur with MVAPICH2-2.3.4 GA too?

Thx,
Hari.

From: mvapich-discuss-bounces at cse.ohio-state.edu <mvapich-discuss-bounces at mailman.cse.ohio-state.edu> On Behalf Of Lana Deere
Sent: Friday, July 24, 2020 1:00 PM
To: mvapich-discuss at cse.ohio-state.edu <mvapich-discuss at mailman.cse.ohio-state.edu>
Subject: Re: [mvapich-discuss] Hardware problem or code bug?

On Tue, Jul 21, 2020 at 1:53 PM Lana Deere <lana.deere at gmail.com> wrote:
mlx5: compute-0-8.local: got completion with error:
00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000
00000005 00000000 00000000 00000000
00000000 12006802 00004016 1d35add3
[compute-0-8.local:mpi_rank_6][handle_cqe] Send desc error in msg to 6, wc_opcode=0
[compute-0-8.local:mpi_rank_6][handle_cqe] Msg from 6: wc.status=2, wc.wr_id=0xc58e040, wc.opcode=0, vbuf->phead->type=0 = MPIDI_CH3_PKT_EAGER_SEND
[compute-0-8.local:mpi_rank_6][handle_cqe] src/mpid/ch3/channels/mrail/src/gen2/ibv_channel_manager.c:548: [] Got completion with error 2, vendor code=0x68, dest rank=6

I was able to get a translation of the vendor code 0x68:
vendor_code=0x68  malformed WQE (Work Queue Element)
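
For anyone reading these logs later, here is a minimal sketch of how the last handle_cqe line could be decoded. It assumes that "error 2" is the numeric ibv_wc_status value from libibverbs (where 2 is IBV_WC_LOC_QP_OP_ERR), which would be consistent with a malformed WQE. The function name decode_completion_error and the two lookup tables are hypothetical helpers written for illustration, not part of MVAPICH2, and the vendor-code table contains only the single translation reported above.

```python
import re

# Subset of enum ibv_wc_status from libibverbs' <infiniband/verbs.h>.
# Assumption: the "error N" in the MVAPICH2 message is this status value.
IBV_WC_STATUS = {
    0: "IBV_WC_SUCCESS",
    1: "IBV_WC_LOC_LEN_ERR",
    2: "IBV_WC_LOC_QP_OP_ERR",
    3: "IBV_WC_LOC_EEC_OP_ERR",
    4: "IBV_WC_LOC_PROT_ERR",
    5: "IBV_WC_WR_FLUSH_ERR",
}

# Vendor codes are device specific; only the translation from this
# thread is listed here.
VENDOR_CODES = {0x68: "malformed WQE (Work Queue Element)"}

# Matches the tail of the handle_cqe message shown in the log above.
PATTERN = re.compile(
    r"Got completion with error (\d+), "
    r"vendor code=(0x[0-9a-fA-F]+), dest rank=(\d+)"
)


def decode_completion_error(line):
    """Return the decoded fields of a handle_cqe error line, or None."""
    match = PATTERN.search(line)
    if match is None:
        return None
    status = int(match.group(1))
    vendor_code = int(match.group(2), 16)
    return {
        "status": status,
        "status_name": IBV_WC_STATUS.get(status, "unknown"),
        "vendor_code": vendor_code,
        "vendor_meaning": VENDOR_CODES.get(vendor_code, "unknown"),
        "dest_rank": int(match.group(3)),
    }


if __name__ == "__main__":
    sample = ("[compute-0-8.local:mpi_rank_6][handle_cqe] "
              "Got completion with error 2, vendor code=0x68, dest rank=6")
    print(decode_completion_error(sample))
```

Codes that are not in either table come back as "unknown" rather than being guessed.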



.. Lana (lana.deere at gmail.com)