[mvapich-discuss] out of registration memory when running graph500
Hari Subramoni
subramoni.1 at osu.edu
Tue Aug 4 10:34:30 EDT 2015
Hello Sayan,
Apologies about the delay in getting back to you on this.
Can you please try disabling the registration cache mechanism
(MV2_USE_LAZY_MEM_UNREGISTER=0) and retry?
http://mvapich.cse.ohio-state.edu/static/media/mvapich/mvapich2-2.1-userguide.html#x1-24000011.82
Do you have an idea about what the median message size for Graph500 is? If
it is small, can you try reducing the size of VBUF
(MV2_VBUF_TOTAL_SIZE=<value>) and see if it helps?
http://mvapich.cse.ohio-state.edu/static/media/mvapich/mvapich2-2.1-userguide.html#x1-26300011.105
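As a rough illustration of how the two settings could be passed on a single
run, here is a minimal sketch. It assumes the job is launched with mpirun_rsh,
which accepts NAME=VALUE pairs before the executable. The host file name, the
executable path, the process count, and the VBUF size of 4096 are hypothetical
placeholders, not recommendations. The sketch only builds and prints the
command line; it does not launch anything.

```python
def build_launch_command(nprocs, hostfile, exe, vbuf_total_size=None):
    """Build an mpirun_rsh command line with the MVAPICH2 settings discussed above."""
    # 0 disables the registration cache (lazy memory unregistration).
    env_args = ["MV2_USE_LAZY_MEM_UNREGISTER=0"]
    if vbuf_total_size is not None:
        # Optional: override the VBUF size in bytes (value chosen by the caller).
        env_args.append(f"MV2_VBUF_TOTAL_SIZE={vbuf_total_size}")
    return ["mpirun_rsh", "-np", str(nprocs), "-hostfile", hostfile, *env_args, exe]


# Hypothetical values for illustration only.
cmd = build_launch_command(16, "hosts", "./graph500_exe", vbuf_total_size=4096)
print(" ".join(cmd))
```

Trying the first setting alone (leave vbuf_total_size unset) and then adding
the second makes it easier to tell which one changes the behaviour.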
As a side note, I see that you're using a debug version of the MVAPICH2
build. This is not good for performance. If you are doing these runs to
measure performance, I would suggest that you use a build where debugging
is turned off.
Regards,
Hari.
On Sun, Aug 2, 2015 at 8:12 PM, Sayan Ghosh <sayandeep52 at gmail.com> wrote:
> Hi,
>
> I ran into some IB registration issues while trying to run the "toy"
> graph500 benchmark (one-sided, as well as 2-sided)[
> http://www.graph500.org/specifications#sec-3_4] on ALCF Cooley (
> https://www.alcf.anl.gov/user-guides/cooley). I am also setting
> MV2_IBA_HCA to "mlx5_0" as suggested here:
> https://www.alcf.anl.gov/user-guides/changes-tukey-cooley.
>
> Excerpt of error that I am getting:
>
> [9] 9600.0 MB was used for memory usage tracing!
> [6] 9600.0 MB was used for memory usage tracing!
> [src/mpid/ch3/channels/mrail/src/gen2/vbuf.c 459] Cannot register vbuf
> region
> [cc016:mpi_rank_13][MRAILI_Get_Vbuf]
> src/mpid/ch3/channels/mrail/src/gen2/ibv_send.c:989: vbuf pool allocation
> failed: Cannot allocate memory (12)
>
> The MVAPICH2 2.1 user guide (section 9.1, page 74) says to increase the
> OFED kernel module parameter (log_num_mtt) to twice the amount of physical
> memory, but I see Cooley's /etc/modprobe.d/mlx4_core.conf to be:
>
> options mlx4_core log_num_mtt=24 log_mtts_per_seg=4
>
> which means max registered memory is 2^24 * 2^4 * 4096 = 1 TB
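
For reference, the arithmetic in the quoted line can be checked with a short
sketch. It assumes the same formula used there (number of MTT segments times
MTT entries per segment times page size) and a 4096-byte page; the function
name is illustrative only.

```python
def max_registrable_bytes(log_num_mtt, log_mtts_per_seg, page_size=4096):
    """Upper bound on registrable memory implied by the two mlx4_core options."""
    # 2^log_num_mtt segments * 2^log_mtts_per_seg entries per segment * bytes per page
    return (1 << log_num_mtt) * (1 << log_mtts_per_seg) * page_size


limit = max_registrable_bytes(24, 4)
print(limit, "bytes =", limit // 2**40, "TiB")
```

With log_num_mtt=24 and log_mtts_per_seg=4 this gives 2^40 bytes, which
matches the 1 TB figure quoted above.
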
>
> Please advise.
>
> MVAPICH version:
>
> MVAPICH2 Version: 2.1
> MVAPICH2 Release date: Fri Apr 03 20:00:00 EDT 2015
> MVAPICH2 Device: ch3:mrail
> MVAPICH2 configure: --enable-shared --enable-debuginfo --enable-g=all
> --prefix=/soft/libraries/mpi/mvapich2-2.1/gccdbg
> MVAPICH2 CC: gcc -DNDEBUG -DNVALGRIND -g -O2
> MVAPICH2 CXX: g++ -DNDEBUG -DNVALGRIND -g -O2
> MVAPICH2 F77: gfortran -L/lib -L/lib -g -O2
> MVAPICH2 FC: gfortran -g -O2
>
> Thank you,
> Sayan