[mvapich-discuss] infrequent error in ibv_channel_manager
Martin Pokorny
mpokorny at nrao.edu
Fri Mar 10 17:07:25 EST 2017
Hi Hari,
On 03/10/2017 02:34 PM, Hari Subramoni wrote:
> Thank you for the details. Can you also see if there is a segfault
> happening at any process causing this failure?
No evidence of such that I can find. You should know that the
application that's failing is part of a streaming data acquisition
system that runs continuously. I usually have only log files to look at
after the fact, occasionally a core file, but not this time. You did get
me looking at a few more log files, and I noticed that the failures
coincide with times that a new MPI job was starting. The vast majority
of times, a job start isn't coincident with any failure, but the last
two failures did occur at such times. It's just something I happened to
notice, and may not be significant.
> Output of "ibv_devinfo -v" will help.
Here it is:
> $ ibv_devinfo -v
> hca_id: mlx4_0
> transport: InfiniBand (0)
> fw_ver: 2.9.1000
> node_guid: 0002:c903:0028:25ca
> sys_image_guid: 0002:c903:0028:25cd
> vendor_id: 0x02c9
> vendor_part_id: 26428
> hw_ver: 0xB0
> board_id: MT_0D90110009
> phys_port_cnt: 1
> max_mr_size: 0xffffffffffffffff
> page_size_cap: 0xfffffe00
> max_qp: 163256
> max_qp_wr: 16351
> device_cap_flags: 0x007c9c76
> max_sge: 32
> max_sge_rd: 0
> max_cq: 65408
> max_cqe: 4194303
> max_mr: 524272
> max_pd: 32764
> max_qp_rd_atom: 16
> max_ee_rd_atom: 0
> max_res_rd_atom: 2612096
> max_qp_init_rd_atom: 128
> max_ee_init_rd_atom: 0
> atomic_cap: ATOMIC_HCA (1)
> max_ee: 0
> max_rdd: 0
> max_mw: 0
> max_raw_ipv6_qp: 0
> max_raw_ethy_qp: 0
> max_mcast_grp: 8192
> max_mcast_qp_attach: 248
> max_total_mcast_qp_attach: 2031616
> max_ah: 0
> max_fmr: 0
> max_srq: 65472
> max_srq_wr: 16383
> max_srq_sge: 31
> max_pkeys: 128
> local_ca_ack_delay: 15
> port: 1
> state: PORT_ACTIVE (4)
> max_mtu: 4096 (5)
> active_mtu: 4096 (5)
> sm_lid: 3
> port_lid: 28
> port_lmc: 0x00
> link_layer: InfiniBand
> max_msg_sz: 0x40000000
> port_cap_flags: 0x02510868
> max_vl_num: 4 (3)
> bad_pkey_cntr: 0x0
> qkey_viol_cntr: 0x0
> sm_sl: 0
> pkey_tbl_len: 128
> gid_tbl_len: 128
> subnet_timeout: 18
> init_type_reply: 0
> active_width: 4X (2)
> active_speed: 10.0 Gbps (4)
> phys_state: LINK_UP (5)
> GID[ 0]: fe80:0000:0000:0000:0002:c903:0028:25cb
>
> Regards,
> Hari.
>
> On Fri, Mar 10, 2017 at 11:57 AM, Martin Pokorny <mpokorny at nrao.edu> wrote:
>
> Hi Hari,
>
> Please see below for my comments.
>
> On 03/10/2017 09:37 AM, Hari Subramoni wrote:
>
> Sorry to hear that you're facing issues.
>
> Event 3 is IBV_EVENT_QP_ACCESS_ERR. From the man pages, this can be
> caused by one of the following:
>
> 1. Misaligned atomic request
> 2. Too many RDMA Read or Atomic requests
> 3. R_Key violation
> 4. Length errors without immediate data
>
> Of these, #2 could be related to the application's communication
> pattern. Do you think the application is issuing several
> back-to-back large-message send operations or MPI-3 RMA operations?
>
>
> The majority of MPI traffic is from MPI-IO. I don't recall seeing
> lots of RMA operations in the source code of the Lustre ADIO module
> (with which I'm somewhat familiar), but I'll have another look at that.
>
> For the others, it could be some issue inside the MVAPICH2 library.
> Since you're using MVAPICH2-2.1, which is more than a year old, may I
> request that you retry the application with MVAPICH2-2.2-GA? We've
> fixed several issues since MVAPICH2-2.1, and those fixes are
> available in MVAPICH2-2.2-GA.
>
>
> That's on my list of things to try, but it will have to wait until I
> can get some testing time, meaning mid next week at the earliest.
>
> Could you give us some more details about the underlying IB fabric?
>
>
> Sure -- what sorts of details might be useful?
>
>
> Regards,
> Hari.
>
> On Fri, Mar 10, 2017 at 11:06 AM, Martin Pokorny
> <mpokorny at nrao.edu> wrote:
>
> We've recently been seeing the following sorts of errors at a small
> yet noticeable rate:
>
> [cbe-node-24:mpi_rank_9][async_thread] ../src/mpid/ch3/channels/mrail/src/gen2/ibv_channel_manager.c:1152: Got FATAL event 3
> : Invalid argument (22)
> [cbe-node-28:mpi_rank_24][handle_cqe] Send desc error in msg to 9, wc_opcode=0
> [cbe-node-28:mpi_rank_24][handle_cqe] Msg from 9: wc.status=10, wc.wr_id=0x249f5b0, wc.opcode=0, vbuf->phead->type=4 = MPIDI_CH3_PKT_RPUT_FINISH
> [cbe-node-28:mpi_rank_24][handle_cqe] ../src/mpid/ch3/channels/mrail/src/gen2/ibv_channel_manager.c:587: [] Got completion with error 10, vendor code=0x88, dest rank=9
>
>
> Unfortunately, I can't send the source for the program that is
> experiencing this error, nor am I able to come up with a simpler
> reproducer. I'm hoping that perhaps you might have some advice for
> helping me diagnose the cause of the error. For example, is there
> some environment variable that might be worth looking at?
>
> I'm using mvapich2-2.1 on a cluster with an IB network. I built
> mvapich2 as follows:
>
> ../configure --enable-romio --with-file-system=lustre
> --enable-debuginfo --enable-g=dbg,log --with-limic2
> --enable-rdma-cm
>
> --
> Martin Pokorny
> Software Engineer
> Jansky Very Large Array correlator backend and CASA software
> National Radio Astronomy Observatory - New Mexico Operations
> _______________________________________________
> mvapich-discuss mailing list
> mvapich-discuss at cse.ohio-state.edu
> http://mailman.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>
--
Martin Pokorny
Software Engineer
Jansky Very Large Array correlator back-end and CASA software
National Radio Astronomy Observatory - New Mexico Operations