[mvapich-discuss] out of registration memory when running graph500

Hari Subramoni subramoni.1 at osu.edu
Tue Aug 4 10:34:30 EDT 2015


Hello Sayan,

Apologies about the delay in getting back to you on this.

Can you please try disabling the registration cache mechanism
(MV2_USE_LAZY_MEM_UNREGISTER=0) and retry?

http://mvapich.cse.ohio-state.edu/static/media/mvapich/mvapich2-2.1-userguide.html#x1-24000011.82

Do you have an idea about what the median message size for Graph500 is? If
it is small, can you try reducing the size of VBUF
(MV2_VBUF_TOTAL_SIZE=<value>) and see if it helps?

http://mvapich.cse.ohio-state.edu/static/media/mvapich/mvapich2-2.1-userguide.html#x1-26300011.105
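
For reference, the two settings above could be passed on an mpirun_rsh
command line roughly as sketched below. This is only an illustration: the
helper name, the hostfile name, and the executable name are hypothetical,
and the VBUF size is deliberately left as a parameter because the right
value depends on the message sizes the application actually sends. It
assumes the job is launched with mpirun_rsh, which accepts NAME=VALUE
pairs before the executable.

```python
def build_mpirun_rsh_command(num_procs, hostfile, executable,
                             vbuf_total_size=None, hca="mlx5_0"):
    """Return an argv list for mpirun_rsh (hypothetical helper).

    The registration cache is disabled by setting
    MV2_USE_LAZY_MEM_UNREGISTER to 0. MV2_VBUF_TOTAL_SIZE is only
    added when the caller supplies a value.
    """
    cmd = ["mpirun_rsh", "-np", str(num_procs), "-hostfile", hostfile]

    env = {
        "MV2_IBA_HCA": hca,                  # HCA named in the original report
        "MV2_USE_LAZY_MEM_UNREGISTER": "0",  # disable the registration cache
    }
    if vbuf_total_size is not None:
        env["MV2_VBUF_TOTAL_SIZE"] = str(vbuf_total_size)

    # mpirun_rsh takes NAME=VALUE pairs before the executable.
    cmd += [f"{name}={value}" for name, value in env.items()]
    cmd.append(executable)
    return cmd


if __name__ == "__main__":
    # "hosts" and "./graph500_binary" are placeholder names.
    print(" ".join(build_mpirun_rsh_command(16, "hosts", "./graph500_binary")))
```

Trying the registration cache setting first, without a VBUF size, keeps
the two changes separate so it is clear which one makes the difference.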

As a side note, I see that you're using a debug version of the MVAPICH2
build. This is not good for performance. If you are doing these runs to
measure performance, I would suggest that you use a build where debugging
is turned off.

Regards,
Hari.

On Sun, Aug 2, 2015 at 8:12 PM, Sayan Ghosh <sayandeep52 at gmail.com> wrote:

> Hi,
>
> I ran into some IB registration issues while trying to run the "toy"
> graph500 benchmark (one-sided, as well as 2-sided)[
> http://www.graph500.org/specifications#sec-3_4] on ALCF Cooley (
> https://www.alcf.anl.gov/user-guides/cooley). I am also setting
> MV2_IBA_HCA to "mlx5_0" as suggested here:
> https://www.alcf.anl.gov/user-guides/changes-tukey-cooley.
>
> Excerpt of error that I am getting:
>
> [9] 9600.0 MB was used for memory usage tracing!
> [6] 9600.0 MB was used for memory usage tracing!
> [src/mpid/ch3/channels/mrail/src/gen2/vbuf.c 459] Cannot register vbuf
> region
> [cc016:mpi_rank_13][MRAILI_Get_Vbuf]
> src/mpid/ch3/channels/mrail/src/gen2/ibv_send.c:989: vbuf pool allocation
> failed: Cannot allocate memory (12)
>
> The MVAPICH2 2.1 user guide (section 9.1, page 74) says to increase the
> OFED kernel module parameter (log_num_mtt) to twice the amount of physical
> memory, but I see Cooley's /etc/modprobe.d/mlx4_core.conf to be:
>
> options mlx4_core log_num_mtt=24 log_mtts_per_seg=4
>
> which means max registered memory is 2^24 * 2^4 * 4096 = 1 TB
>
> Please advise.
>
> MVAPICH version:
>
> MVAPICH2 Version:       2.1
> MVAPICH2 Release date:  Fri Apr 03 20:00:00 EDT 2015
> MVAPICH2 Device:        ch3:mrail
> MVAPICH2 configure:     --enable-shared --enable-debuginfo --enable-g=all
> --prefix=/soft/libraries/mpi/mvapich2-2.1/gccdbg
> MVAPICH2 CC:    gcc    -DNDEBUG -DNVALGRIND -g -O2
> MVAPICH2 CXX:   g++   -DNDEBUG -DNVALGRIND -g -O2
> MVAPICH2 F77:   gfortran -L/lib -L/lib   -g -O2
> MVAPICH2 FC:    gfortran   -g -O2
>
> Thank you,
> Sayan
>
> _______________________________________________
> mvapich-discuss mailing list
> mvapich-discuss at cse.ohio-state.edu
> http://mailman.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>
>
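
The registration limit quoted in the original message (log_num_mtt=24,
log_mtts_per_seg=4, 4096-byte pages) can be checked with a few lines of
Python. This is a minimal sketch: the function names are made up for
illustration, the 4096-byte page size is the one used in the quoted
arithmetic, and the physical memory size is left as an input rather than
assumed for any particular node.

```python
def max_registerable_bytes(log_num_mtt, log_mtts_per_seg, page_size=4096):
    """Maximum registerable memory implied by the mlx4_core parameters.

    Follows the formula from the quoted message:
    2^log_num_mtt * 2^log_mtts_per_seg * page_size.
    """
    return (1 << log_num_mtt) * (1 << log_mtts_per_seg) * page_size


def meets_twice_physical_guideline(limit_bytes, physical_bytes):
    """True if the limit is at least twice the physical memory,
    which is the guideline the quoted message attributes to the
    user guide."""
    return limit_bytes >= 2 * physical_bytes


if __name__ == "__main__":
    limit = max_registerable_bytes(24, 4)
    print(limit)           # bytes
    print(limit // 2**40)  # whole TiB
```

With the values from the quoted modprobe line this gives 2^40 bytes, which
matches the 1 TB figure in the message, so the kernel module limit alone
would only be a problem on a node with more than 512 GiB of physical
memory.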