[mvapich-discuss] out of registration memory when running graph500

Sayan Ghosh sayandeep52 at gmail.com
Sun Aug 2 20:12:21 EDT 2015


Hi,

I ran into some IB memory registration issues while trying to run the "toy"
graph500 benchmark (both the one-sided and two-sided versions)
[http://www.graph500.org/specifications#sec-3_4] on ALCF Cooley
(https://www.alcf.anl.gov/user-guides/cooley). I am also setting MV2_IBA_HCA
to "mlx5_0", as suggested here:
https://www.alcf.anl.gov/user-guides/changes-tukey-cooley.

Excerpt of the error I am getting:

[9] 9600.0 MB was used for memory usage tracing!
[6] 9600.0 MB was used for memory usage tracing!
[src/mpid/ch3/channels/mrail/src/gen2/vbuf.c 459] Cannot register vbuf region
[cc016:mpi_rank_13][MRAILI_Get_Vbuf] src/mpid/ch3/channels/mrail/src/gen2/ibv_send.c:989: vbuf pool allocation failed: Cannot allocate memory (12)

The MVAPICH2 2.1 user guide (section 9.1, page 74) says to increase the
OFED kernel module parameter log_num_mtt so that the maximum registerable
memory is twice the amount of physical memory, but Cooley's
/etc/modprobe.d/mlx4_core.conf already contains:

options mlx4_core log_num_mtt=24 log_mtts_per_seg=4

which means the maximum registered memory is
2^24 * 2^4 * 4096 bytes = 2^40 bytes = 1 TB.
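
As a sanity check on the arithmetic above, here is a minimal Python sketch. It
assumes the limit is 2^log_num_mtt * 2^log_mtts_per_seg * page size, as in the
calculation above, and a 4096-byte page. The helper names and the 64 GiB
memory size are hypothetical, chosen only for illustration; they are not
Cooley's actual values.

```python
PAGE_SIZE = 4096  # bytes; assumed, matches the 4096 used in the calculation above


def parse_mlx4_options(line):
    """Parse 'options mlx4_core key=value ...' into a dict of integers."""
    params = {}
    for token in line.split()[2:]:  # skip 'options' and the module name
        key, _, value = token.partition("=")
        params[key] = int(value)
    return params


def max_registered_bytes(log_num_mtt, log_mtts_per_seg, page_size=PAGE_SIZE):
    """Registration limit implied by the two module parameters."""
    return (1 << log_num_mtt) * (1 << log_mtts_per_seg) * page_size


def min_log_num_mtt(physical_bytes, log_mtts_per_seg, page_size=PAGE_SIZE):
    """Smallest log_num_mtt whose limit is at least twice physical memory."""
    bytes_per_unit = page_size * (1 << log_mtts_per_seg)
    # Ceiling division, kept in integers to avoid floating-point rounding.
    units_needed = -(-(2 * physical_bytes) // bytes_per_unit)
    return (units_needed - 1).bit_length()


opts = parse_mlx4_options("options mlx4_core log_num_mtt=24 log_mtts_per_seg=4")
limit = max_registered_bytes(opts["log_num_mtt"], opts["log_mtts_per_seg"])
print(limit == 2**40)  # True: the configured limit is 2^40 bytes

# Hypothetical node with 64 GiB of RAM (illustration only).
print(min_log_num_mtt(64 * 2**30, opts["log_mtts_per_seg"]))
```

With the configured values the limit comes out to 2^40 bytes, so by this
formula the mlx4_core setting should already cover twice the physical memory
of any node with up to 512 GiB of RAM.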

Please advise.

MVAPICH version:

MVAPICH2 Version:       2.1
MVAPICH2 Release date:  Fri Apr 03 20:00:00 EDT 2015
MVAPICH2 Device:        ch3:mrail
MVAPICH2 configure:     --enable-shared --enable-debuginfo --enable-g=all --prefix=/soft/libraries/mpi/mvapich2-2.1/gccdbg
MVAPICH2 CC:    gcc    -DNDEBUG -DNVALGRIND -g -O2
MVAPICH2 CXX:   g++   -DNDEBUG -DNVALGRIND -g -O2
MVAPICH2 F77:   gfortran -L/lib -L/lib   -g -O2
MVAPICH2 FC:    gfortran   -g -O2

Thank you,
Sayan