[mvapich-discuss] QP failed: Cannot allocate memory
Riley, Douglas (AS)
Douglas.Riley at ngc.com
Thu Mar 22 13:07:09 EDT 2012
MVAPICH Team:
I'm currently using:
MVAPICH 1.2-SingleRail
Build-ID: 3635
My cluster has 6 nodes, each with 48 AMD Opteron cores and 192 GB of RAM. I'm running RHEL 5.5 with Linux kernel 2.6.35.
My applications often use MVAPICH to significantly oversubscribe the available cores (288). Up to about -n 1200, everything works fine under mpirun_rsh; however, at about -n 1250, the job aborts with the fatal error:
QP failed: Cannot allocate memory
As described in the User Manual, I've increased the memlock limit to the maximum memory on each node; however, the problem persists. If I set the environment variable VIADEV_USE_XRC=1, the startup error no longer appears, but the application then hangs indefinitely (for both small and large MPI jobs). XRC may well solve the issue, but either my MVAPICH version was not built with XRC support, or perhaps my hardware doesn't support it. The following is output from the IB adapter:
hca_id: mlx4_0
transport: InfiniBand (0)
fw_ver: 2.7.000
node_guid: 0002:c903:000b:9b1c
sys_image_guid: 0002:c903:000b:9b1f
vendor_id: 0x02c9
vendor_part_id: 26428
hw_ver: 0xB0
board_id: MT_0D30110008
phys_port_cnt: 1
max_mr_size: 0xffffffffffffffff
page_size_cap: 0xfffffe00
max_qp: 261056
max_qp_wr: 16351
device_cap_flags: 0x007c9c76
max_sge: 32
max_sge_rd: 0
max_cq: 65408
max_cqe: 4194303
max_mr: 524272
max_pd: 32764
max_qp_rd_atom: 16
max_ee_rd_atom: 0
max_res_rd_atom: 4176896
max_qp_init_rd_atom: 128
max_ee_init_rd_atom: 0
atomic_cap: ATOMIC_HCA (1)
max_ee: 0
max_rdd: 0
max_mw: 0
max_raw_ipv6_qp: 0
max_raw_ethy_qp: 1
max_mcast_grp: 8192
max_mcast_qp_attach: 56
max_total_mcast_qp_attach: 458752
max_ah: 0
max_fmr: 0
max_srq: 65472
max_srq_wr: 16383
max_srq_sge: 31
max_pkeys: 128
local_ca_ack_delay: 15
port: 1
state: PORT_ACTIVE (4)
max_mtu: 2048 (4)
active_mtu: 2048 (4)
sm_lid: 1
port_lid: 1
port_lmc: 0x00
link_layer: IB
max_msg_sz: 0x40000000
port_cap_flags: 0x0251086a
max_vl_num: 8 (4)
bad_pkey_cntr: 0x0
qkey_viol_cntr: 0x0
sm_sl: 0
pkey_tbl_len: 128
gid_tbl_len: 128
subnet_timeout: 18
init_type_reply: 0
active_width: 4X (2)
active_speed: 5.0 Gbps (2)
phys_state: LINK_UP (5)
GID[ 0]: fe80:0000:0000:0000:0002:c903:000b:9b1d
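As a rough sanity check (assuming one RC QP per remote rank, as in a fully connected RC transport — a simplification, since this MVAPICH build may multiplex differently and SRQ/service QPs add more), the per-node QP demand at -n 1250 on 6 nodes lands very close to the max_qp value reported above:

```shell
# Back-of-envelope per-node QP count at the failure point,
# assuming ranks are spread evenly across 6 nodes and each rank
# opens one RC QP per remote rank:
ranks=1250
nodes=6
echo $(( (ranks / nodes) * (ranks - 1) ))
# 259792 — just under the HCA's reported max_qp of 261056,
# so any additional per-rank QPs would push it over the limit
```

At -n 1200 the same estimate gives 200 x 1199 = 239,800, comfortably under the limit, which would be consistent with the observed failure threshold.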
Any recommendations to enable larger number of MPI processes on my hardware would be most appreciated.
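For reference, here is how I checked the memlock limit and probed for XRC support (the "xrc" grep keyword is a guess on my part — the exact capability names vary by OFED release):

```shell
# Limit in the current shell; should report "unlimited" (or the
# full node RAM in kB) if /etc/security/limits.conf took effect:
ulimit -l

# Note: limits.conf applies to PAM logins; ranks spawned by
# mpirun_rsh over rsh/ssh may not inherit it, so this is worth
# running from inside an MPI job on each node as well.

# Does the verbs stack advertise any XRC-related capability?
ibv_devinfo -v | grep -i xrc || echo "no XRC capability reported"
```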
Many Thanks,
Doug
------------------------
Douglas J Riley, PhD