[mvapich-discuss] [../src/mpid/ch3/channels/mrail/src/gen2/vbuf.c 397] Cannot register vbuf region

Deva devendar.bureddy at gmail.com
Mon Dec 16 01:19:29 EST 2013


Can you try MV2_USE_LAZY_MEM_UNREGISTER=0? Disabling lazy de-registration
should reduce the amount of registered memory held at any given time, which
may help to some extent.
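
If it helps, here is roughly how the variable can be passed through the Hydra
launcher you configured with (the process count, executable, and input file
below are only placeholders):

    $ mpiexec -genv MV2_USE_LAZY_MEM_UNREGISTER 0 -n 512 ./nwchem input.nw

Exporting the variable in the job environment before calling mpiexec should
also work.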

-Devendar


On Sun, Dec 15, 2013 at 6:52 PM, Jeff Hammond <jeff.science at gmail.com> wrote:

> So there's nothing I can do in userspace?  I've requested the
> sysadmins change the IB settings, but since the machine I'm using
> shares its IB network with the GPFS servers for Mira
> [https://www.alcf.anl.gov/mira], they might balk at it.
>
> Jeff
>
> On Sun, Dec 15, 2013 at 3:42 PM, Deva <devendar.bureddy at gmail.com> wrote:
> > Jeff,
> >
> > This could be related to the OFED memory registration limits (log_num_mtt,
> > log_mtts_per_seg). A similar issue was discussed here:
> >
> > http://mailman.cse.ohio-state.edu/pipermail/mvapich-discuss/2013-February/004261.html
> >
> > Can you verify whether that solution works for you?
> >
> > A few details on these OFED parameters from the user guide:
> >
> > http://mvapich.cse.ohio-state.edu/support/user_guide_mvapich2-2.0b.html#x1-1130009.1.1
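> >
> > For reference, that section of the user guide describes the limit these
> > parameters impose on registrable memory, roughly:
> >
> >     max_reg_mem = (2^log_num_mtt) * (2^log_mtts_per_seg) * PAGE_SIZE
> >
> > and it is raised by reloading the HCA driver with larger values, along the
> > lines of the sketch below (the values are only an illustration, assuming a
> > ConnectX adapter using the mlx4 driver and a 4 KB page size):
> >
> >     # /etc/modprobe.d/mlx4_core.conf
> >     options mlx4_core log_num_mtt=24 log_mtts_per_seg=3
> >     # => 2^24 * 2^3 * 4 KB = 512 GB of registrable memory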
> >
> >
> > -Devendar
> >
> > On Sun, Dec 15, 2013 at 9:39 AM, Jeff Hammond <jeff.science at gmail.com>
> > wrote:
> >>
> >> I am running NWChem using ARMCI over MPI-3 RMA
> >> [http://git.mpich.org/armci-mpi.git/shortlog/refs/heads/mpi3rma].  Two
> >> attempts to run a relatively large job failed as follows:
> >>
> >> [../src/mpid/ch3/channels/mrail/src/gen2/vbuf.c 397] Cannot register vbuf region
> >> [vs9:mpi_rank_350][get_vbuf] ../src/mpid/ch3/channels/mrail/src/gen2/vbuf.c:798: VBUF reagion allocation failed. Pool size 640
> >> : Cannot allocate memory (12)
> >>
> >> [../src/mpid/ch3/channels/mrail/src/gen2/vbuf.c 397] Cannot register vbuf region
> >> [vs28:mpi_rank_8][get_vbuf] ../src/mpid/ch3/channels/mrail/src/gen2/vbuf.c:798: VBUF reagion allocation failed. Pool size 4736
> >> : Cannot allocate memory (12)
> >>
> >> NWChem is attempting to allocate a relatively large amount of memory
> >> using MPI_Win_allocate, so it doesn't surprise me that this happens.
> >> However, it is not entirely clear whether the problem is that generic
> >> memory allocation has failed, i.e. malloc (or equivalent) returned
> >> NULL, or that something related to IB has been exhausted, e.g.
> >> ibv_reg_mr has failed.
> >>
> >> If this is not just a simple out-of-memory error, can you suggest
> >> environment variables or source changes (in ARMCI-MPI, not MVAPICH2)
> >> that might alleviate these problems?  I don't know that the installed
> >> Linux has large page support and I can't readily request a new OS
> >> image, but I can switch machines if this is likely to have a positive
> >> impact.
> >>
> >> These are the MVAPICH installation details:
> >>
> >> $ /home/jhammond/TUKEY/MPI/mv2-trunk-gcc/bin/mpichversion
> >> MVAPICH2 Version:       2.0b
> >> MVAPICH2 Release date:  unreleased development copy
> >> MVAPICH2 Device:        ch3:mrail
> >> MVAPICH2 configure:     CC=gcc CXX=g++ --enable-fc FC=gfortran
> >> --enable-f77 F77=gfortran --with-pm=hydra --enable-mcast
> >> --enable-static --prefix=/home/jhammond/TUKEY/MPI/mv2-trunk-gcc
> >> MVAPICH2 CC:    gcc    -DNDEBUG -DNVALGRIND -O2
> >> MVAPICH2 CXX:   g++   -DNDEBUG -DNVALGRIND -O2
> >> MVAPICH2 F77:   gfortran   -O2
> >> MVAPICH2 FC:    gfortran   -O2
> >>
> >> I looked at the code and it seems that there might be a way to fix
> >> this, but obviously I'll have to wait for you all to do it.
> >>
> >>     /*
> >>      * It will often be possible for higher layers to recover
> >>      * when no vbuf is available, but waiting for more descriptors
> >>      * to complete. For now, just abort.
> >>      */
> >>     if (NULL == free_vbuf_head)
> >>     {
> >>         if(allocate_vbuf_region(rdma_vbuf_secondary_pool_size) != 0) {
> >>             ibv_va_error_abort(GEN_EXIT_ERR,
> >>                 "VBUF reagion allocation failed. Pool size %d\n",
> >> vbuf_n_allocated);
> >>         }
> >>     }
> >>
> >> Thanks!
> >>
> >> Jeff
> >>
> >> --
> >> Jeff Hammond
> >> jeff.science at gmail.com
> >>
> >> _______________________________________________
> >> mvapich-discuss mailing list
> >> mvapich-discuss at cse.ohio-state.edu
> >> http://mailman.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
> >
> >
> >
> >
> > --
> > -Devendar
>
>
>
> --
> Jeff Hammond
> jeff.science at gmail.com
>



-- 
-Devendar