[mvapich-discuss] Abort: unable to register vbuf DMA buffer at line 211 in file vbuf.c

Jeff Squyres jsquyres at cisco.com
Sun Jan 27 20:26:33 EST 2008


Are you running through a resource manager (such as Torque/PBS, SLURM,  
SGE/N1GE, LSF, ...etc.)?

Resource managers will usually have different limits for jobs launched
through queueing mechanisms vs. normal ssh-launched interactive
logins.  A good test is to launch a job *through the resource manager*
that runs "ulimit -l" (or whatever flavor of ulimit is appropriate for
your shell) and see what value you get.
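
If it is easier to submit a script than a shell one-liner, the same check
can be sketched in Python.  This is only an illustration, not something
from the MVAPICH docs: the function names format_limit and memlock_limits
are made up for this example, and it assumes a Linux node with a Python 3
interpreter available to the job.

```python
import resource
import socket


def format_limit(value):
    """Render an rlimit value (given in bytes) in the units "ulimit -l" uses."""
    if value == resource.RLIM_INFINITY:
        return "unlimited"
    # bash's "ulimit -l" reports 1024-byte units, so convert bytes to kB.
    return f"{value // 1024} kB"


def memlock_limits():
    """Return the (soft, hard) locked-memory limits of this process as strings."""
    soft, hard = resource.getrlimit(resource.RLIMIT_MEMLOCK)
    return format_limit(soft), format_limit(hard)


if __name__ == "__main__":
    # Hypothetical usage: submit this script through the resource manager and
    # compare the output with what an interactive ssh login prints.
    soft, hard = memlock_limits()
    print(f"{socket.gethostname()}: memlock soft={soft} hard={hard}")
```

Run it once from an interactive login and once as a queued job on the same
node.  If the two disagree, the limit is being set differently for jobs
started by the resource manager's daemons.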

IIRC, the MVAPICH web pages/documentation have some good docs on how  
to set the ulimit properly...?  You might want to check those out for  
some more details.



On Jan 27, 2008, at 8:07 PM, Ben Held wrote:

> Matt,
>
> We have set it to unlimited and are still seeing the same failure.
>
> Any other suggestions?
>
> Regards,
> Ben
>
> -----Original Message-----
> From: Matthew Koop [mailto:koop at cse.ohio-state.edu]
> Sent: Friday, January 18, 2008 6:55 PM
> To: Ben Held
> Cc: mvapich-discuss at cse.ohio-state.edu
> Subject: RE: [mvapich-discuss] Abort: unable to register vbuf DMA buffer at
> line 211 in file vbuf.c
>
> Ben,
>
> The maximum locked memory you are allowing on the system is lower than
> expected. Can you try increasing that value to something closer to the
> total memory of the node?
>
> Matt
>
> On Fri, 18 Jan 2008, Ben Held wrote:
>
>> Matt,
>>
>> The version of MVAPICH is mvapich_gcc-0.9.9-1458.  I believe this is part of
>> the OFED distro - it was installed by the manufacturer of the cluster.
>>
>> ulimit -l reports 131072 on all nodes.
>>
>> Ben
>> -----Original Message-----
>> From: Matthew Koop [mailto:koop at cse.ohio-state.edu]
>> Sent: Thursday, January 17, 2008 9:29 PM
>> To: Ben Held
>> Cc: mvapich-discuss at cse.ohio-state.edu
>> Subject: Re: [mvapich-discuss] Abort: unable to register vbuf DMA buffer at
>> line 211 in file vbuf.c
>>
>> Ben,
>>
>> Sorry to hear about this issue. Can you give me some more details on your
>> installation -- what distro are you using and is OFED being used? Also,
>> what version of MVAPICH are you using?
>>
>> Additionally, what is the output of 'ulimit -l' on your system (or
>> equivalent shell command)? You may want to check all nodes. Memory
>> registration generally does not fail unless the amount of lockable memory
>> is too low.
>>
>> Matt
>>
>> On Thu, 17 Jan 2008, Ben Held wrote:
>>
>>> We have recently built our MPI application using MVAPICH1 under Linux and
>>> are seeing certain runs fail (success or failure seems to be a function of
>>> the # of processes - 8 will work, 16 will fail, 32 will work, etc). This
>>> code has been thoroughly tested using the standard MPICH (Ethernet based)
>>> and LAM and everything is fine.
>>>
>>>
>>>
>>> Does this error:
>>>
>>>
>>>
>>> Abort: unable to register vbuf DMA buffer at line 211 in file vbuf.c
>>>
>>>
>>>
>>> Mean anything?  This is a new cluster (8 nodes, 8 cores per node) that has
>>> been tested using stress tests provided by the cluster manufacturer
>>> (Microway).  This is out of my area of expertise and this is the first IB
>>> based system I have worked on.
>>>
>>>
>>>
>>> Any thoughts?
>>>
>>>
>>> Regards,
>>>
>>>
>>>
>>> Ben Held
>>> Simulation Technology & Applied Research, Inc.
>>> 11520 N. Port Washington Rd., Suite 201
>>> Mequon, WI 53092
>>> P: 1.262.240.0291 x101
>>> F: 1.262.240.0294
>>> E: ben.held at staarinc.com
>>> http://www.staarinc.com
>>>
>>>
>>>
>>>
>>>
>>>
>>
>>
>
>
> _______________________________________________
> mvapich-discuss mailing list
> mvapich-discuss at cse.ohio-state.edu
> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss


-- 
Jeff Squyres
Cisco Systems


