[mvapich-discuss] Hardware problem or code bug?

Lana Deere lana.deere at gmail.com
Fri Jul 24 13:14:27 EDT 2020


I have not been able to try 2.3.4 yet.  It will be a while before I get
the chance, I think, because our cluster is currently tied up doing QA for
a release.

.. Lana (lana.deere at gmail.com)




On Fri, Jul 24, 2020 at 1:02 PM Subramoni, Hari <subramoni.1 at osu.edu> wrote:

> Thank you, Lana.
>
> Does this occur with MVAPICH2-2.3.4 GA too?
>
> Thx,
> Hari.
>
> *From:* mvapich-discuss-bounces at cse.ohio-state.edu <mvapich-discuss-bounces at mailman.cse.ohio-state.edu> *On Behalf Of* Lana Deere
> *Sent:* Friday, July 24, 2020 1:00 PM
> *To:* mvapich-discuss at cse.ohio-state.edu <mvapich-discuss at mailman.cse.ohio-state.edu>
> *Subject:* Re: [mvapich-discuss] Hardware problem or code bug?
>
> On Tue, Jul 21, 2020 at 1:53 PM Lana Deere <lana.deere at gmail.com> wrote:
>
> mlx5: compute-0-8.local: got completion with error:
> 00000000 00000000 00000000 00000000
> 00000000 00000000 00000000 00000000
> 00000005 00000000 00000000 00000000
> 00000000 12006802 00004016 1d35add3
> [compute-0-8.local:mpi_rank_6][handle_cqe] Send desc error in msg to 6,
> wc_opcode=0
> [compute-0-8.local:mpi_rank_6][handle_cqe] Msg from 6: wc.status=2,
> wc.wr_id=0xc58e040, wc.opcode=0, vbuf->phead->type=0 =
> MPIDI_CH3_PKT_EAGER_SEND
> [compute-0-8.local:mpi_rank_6][handle_cqe]
> src/mpid/ch3/channels/mrail/src/gen2/ibv_channel_manager.c:548: [] Got
> completion with error 2, vendor code=0x68, dest rank=6
>
> I was able to get a translation of the vendor code 0x68:
>
> vendor_code=0x68  malformed WQE (Work Queue Element)
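>
> As a side note for anyone mapping that log onto the verbs API: wc.status=2
> is IBV_WC_LOC_QP_OP_ERR in libibverbs, and the vendor code is what comes
> back in wc.vendor_err. The sketch below is only an illustration of that
> mapping, not the MVAPICH2 handle_cqe code; the helper name is made up and
> the CQ handle is assumed to have been created elsewhere.
>
>     #include <stdio.h>
>     #include <infiniband/verbs.h>
>
>     /* Drain one completion from an existing CQ and report errors using
>      * the same fields that appear in the handle_cqe message above.
>      * 'cq' is assumed to be a valid struct ibv_cq * created elsewhere. */
>     static int check_one_completion(struct ibv_cq *cq)
>     {
>         struct ibv_wc wc;
>         int n = ibv_poll_cq(cq, 1, &wc);  /* <0 on error, 0 if CQ is empty */
>         if (n <= 0)
>             return n;
>
>         if (wc.status != IBV_WC_SUCCESS) {
>             /* For the log above: status 2 = IBV_WC_LOC_QP_OP_ERR, and
>              * vendor_err 0x68 is the HCA-specific "malformed WQE" code. */
>             fprintf(stderr,
>                     "completion error: status=%d (%s), vendor_err=0x%x, "
>                     "wr_id=0x%llx, opcode=%d\n",
>                     wc.status, ibv_wc_status_str(wc.status), wc.vendor_err,
>                     (unsigned long long)wc.wr_id, wc.opcode);
>             return -1;
>         }
>         return 1;
>     }
>
> Note that when a completion comes back with an error status, only wr_id,
> status, qp_num and vendor_err are guaranteed to be valid, so the wc.opcode
> value in the log is not necessarily meaningful for the failed send.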
>
> .. Lana (lana.deere at gmail.com)
>