[mvapich-discuss] vbuf pool allocation failure

Devendar Bureddy bureddy at cse.ohio-state.edu
Mon Feb 11 15:58:27 EST 2013


Hi Adam

Good to know that it fixed your issue. I don't have any particular
reason for altering log_mtts_per_seg. I guess changing either
parameter should be fine.  The formula to compute the limit is
something like:

2^log_num_mtt x 2^log_mtts_per_seg x PAGE_SIZE

I think you should be fine with log_num_mtt = 24 on your 128 GB systems.
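
For example, with log_num_mtt = 24, the default log_mtts_per_seg = 3,
and the usual 4 KB page size, that works out to 2^24 x 2^3 x 4096
bytes = 512 GB of registrable memory, comfortably more than 2x your
128 GB of RAM.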

-Devendar
On Mon, Feb 11, 2013 at 3:20 PM, Adam Coates <acoates at cs.stanford.edu> wrote:
> Hi Devendar,
>
> Thanks for the reply;  we might have it fixed over here, but just to
> sanity check (and to populate your discussion list for posterity):
>
> Some outputs from us:
>
> $ mpiname -a
> MVAPICH2 1.9a2 Thu Nov  8 11:43:52 EST 2012 ch3:mrail
> ..<snip>..
> Configuration
> --with-cuda=/usr/local/cuda-5.0 --enable-cuda
>
> $ ulimit -l
> unlimited
> [We're not using PBS, etc., so that should be good.]
>
> Following other notes on the list + your suggestion, we set
> log_num_mtt=24 for the mlx4_core module and reloaded the driver, which
> appears to have fixed our issue.
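>
> For posterity, what we did is roughly the following (the config file
> name is ours, and on our OFED install the IB stack restarts via
> openibd; your setup may differ):
>
> $ echo "options mlx4_core log_num_mtt=24" | sudo tee /etc/modprobe.d/mlx4_core.conf
> $ sudo /etc/init.d/openibd restart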
>
> Is there a reason to prefer altering log_mtts_per_seg?  The current value is:
> $ more /sys/module/mlx4_core/parameters/log_mtts_per_seg
> 0
>
> which I assume means the driver is falling back to the default of 3.
> Our nodes have 128 GB of memory and 4 GPUs (16 GB of GPU memory
> total), so my concern is that the factor-of-16 increase gained by
> raising log_num_mtt alone may not completely fix the issue.
>
> Thanks a lot for your help.
>
> Best,
> Adam
>
> On Mon, Feb 11, 2013 at 2:46 PM, Devendar Bureddy
> <bureddy at cse.ohio-state.edu> wrote:
>> Hi Brody
>>
>> It seems it is hitting a limit on the amount of memory that can be
>> registered with the HCA.  Can you provide the following details?
>>
>> - Is lockable memory set to unlimited on compute nodes?
>> $ ulimit -l
>> unlimited
>>
>> - How much RAM do these nodes have?  Can you check the OFED parameter
>> log_mtts_per_seg?  With most standard OFED installations, the default
>> value of this parameter is '3'.  If your system has more than 16 GB,
>> you need to set this parameter to '4' or more.
>>
>> $ more /sys/module/mlx4_core/parameters/log_mtts_per_seg
>> 3
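>>
>> (That 16 GB rule of thumb follows from the formula above: assuming
>> the common default log_num_mtt = 20 together with log_mtts_per_seg
>> = 3 and 4 KB pages, 2^20 x 2^3 x 4096 bytes = 32 GB of registrable
>> memory, which at the usual 2x-RAM headroom covers only about 16 GB
>> of RAM.)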
>>
>> - What is the size of the cudaHostRegister() buffer you mentioned?
>>
>> - What version of MVAPICH2 are you using, and with what configuration options?
>>
>> -Devendar
>>
>> On Mon, Feb 11, 2013 at 2:01 PM, Brody Huval <brodyh at stanford.edu> wrote:
>>> Hi,
>>>
>>> Our job is running on 64 GPUs (64 MPI nodes) in a small cluster with
>>> ConnectX-3 IB adapters.  We've been running into abort() calls that
>>> bring down the system after perhaps 10 or 15 minutes of running with
>>> the following error:
>>>
>>> [src/mpid/ch3/channels/mrail/src/gen2/vbuf.c 540] Cannot register vbuf region
>>> [8] Abort: vbuf pool allocation failed at line 607 in file
>>> src/mpid/ch3/channels/mrail/src/gen2/vbuf.c
>>>
>>> Unfortunately, MV2_DEBUG_SHOW_BACKTRACE hasn't shown us anything useful
>>> here, so we're still hunting for the call that's triggering this.  We
>>> have tried a suggested solution from the archives, setting
>>> MV2_USE_LAZY_MEM_UNREGISTER to 0, but that leads to an immediate
>>> crash.  Our code does not make significant use of pinned memory;
>>> in the one place we do use it, the buffer is registered with
>>> cudaHostRegister() and is never touched by MPI.
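>>>
>>> For reference, we launch roughly like this (the hostfile and binary
>>> names are illustrative; mpirun_rsh takes VAR=VALUE pairs before the
>>> executable):
>>>
>>> $ mpirun_rsh -np 64 -hostfile hosts MV2_DEBUG_SHOW_BACKTRACE=1 ./app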
>>>
>>> This problem cropped up just recently as we've moved to larger problem
>>> sizes (and thus larger message sizes).  Previous runs with smaller
>>> models have worked just fine.
>>>
>>> Do you have advice on how to find the problem, or a possible solution? Thank you in advance for any help.
>>>
>>> _______________________________________________
>>> mvapich-discuss mailing list
>>> mvapich-discuss at cse.ohio-state.edu
>>> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>>
>>
>>
>> --
>> Devendar



-- 
Devendar

