[mvapich-discuss] VAPI_PROTOCOL_R3' failed

Sangamesh B forum.san at gmail.com
Tue May 27 07:00:03 EDT 2008


Hi All,

   A DL-POLY application job on a 5-node InfiniBand cluster failed with the
following error:

infin_check: ch3_rndvtransfer.c:226: MPIDI_CH3_Rndv_transfer: Assertion
`rndv->protocol == VAPI_PROTOCOL_R3' failed.
infin_check: ch3_rndvtransfer.c:226: MPIDI_CH3_Rndv_transfer: Assertion
`rndv->protocol == VAPI_PROTOCOL_R3' failed.
infin_check: ch3_rndvtransfer.c:226: MPIDI_CH3_Rndv_transfer: Assertion
`rndv->protocol == VAPI_PROTOCOL_R3' failed.
rank 1 in job 20  compute-0-12.local_32785   caused collective abort of all
ranks
  exit status of rank 1: killed by signal 9

The job runs for 20-30 minutes and then fails with the above error.

This is with mvapich2 + ofed-1.2.5.5 + Mellanox HCAs.

Any idea what might be wrong?

Thanks,
Sangamesh