[mvapich-discuss] Abort: unable to register vbuf DMA buffer at line 211 in file vbuf.c
Jeff Squyres
jsquyres at cisco.com
Sun Jan 27 20:26:33 EST 2008
Are you running through a resource manager (such as Torque/PBS, SLURM,
SGE/N1GE, LSF, etc.)?
Resource managers will usually have different limits for jobs launched
through queueing mechanisms vs. normal ssh-launched interactive
logins. A good test is to launch a job *through the resource manager*
that runs "ulimit -l" (or whatever flavor of ulimit is appropriate for
your shell) and see what value you get.
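As a rough sketch of that test (an illustration, not something taken from the MVAPICH documentation), a small Python script can be submitted through the resource manager to print the locked-memory limit the job's processes actually see. The function names here are made up for the example. `resource.getrlimit` reports bytes, so the value is divided by 1024 to match the kilobyte units that bash's `ulimit -l` prints.

```python
import resource
import socket


def format_limit(value):
    """Render an RLIMIT_MEMLOCK value the way bash's `ulimit -l` does (KB)."""
    if value == resource.RLIM_INFINITY:
        return "unlimited"
    return str(value // 1024)


def memlock_report():
    """Return the soft and hard locked-memory limits of this process."""
    soft, hard = resource.getrlimit(resource.RLIMIT_MEMLOCK)
    return "soft=%s hard=%s" % (format_limit(soft), format_limit(hard))


# Print the hostname too, so output from several nodes can be told apart.
print(socket.gethostname(), memlock_report())
```

Run it once as the command of a queued job and once from an ordinary ssh login on the same node. If the queued run prints a smaller number, the limit is probably being inherited from the resource manager's daemon rather than from your login settings.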
IIRC, the MVAPICH web pages/documentation have some good docs on how
to set the ulimit properly...? You might want to check those out for
some more details.
On Jan 27, 2008, at 8:07 PM, Ben Held wrote:
> Matt,
>
> We have set it to unlimited and are still seeing the same failure.
>
> Any other suggestions?
>
> Regards,
> Ben
>
> -----Original Message-----
> From: Matthew Koop [mailto:koop at cse.ohio-state.edu]
> Sent: Friday, January 18, 2008 6:55 PM
> To: Ben Held
> Cc: mvapich-discuss at cse.ohio-state.edu
> Subject: RE: [mvapich-discuss] Abort: unable to register vbuf DMA buffer at
> line 211 in file vbuf.c
>
> Ben,
>
> The maximum locked memory you are allowing on the system is lower than is
> expected. Can you try increasing that value to closer to the maximum
> memory of the node?
>
> Matt
>
> On Fri, 18 Jan 2008, Ben Held wrote:
>
>> Matt,
>>
>> The version of MVAPICH is mvapich_gcc-0.9.9-1458. I believe this is part of
>> the OFED distro - it was installed by the manufacturer of the cluster.
>>
>> ulimit -l reports 131072 on all nodes.
>>
>> Ben
>> -----Original Message-----
>> From: Matthew Koop [mailto:koop at cse.ohio-state.edu]
>> Sent: Thursday, January 17, 2008 9:29 PM
>> To: Ben Held
>> Cc: mvapich-discuss at cse.ohio-state.edu
>> Subject: Re: [mvapich-discuss] Abort: unable to register vbuf DMA buffer at
>> line 211 in file vbuf.c
>>
>> Ben,
>>
>> Sorry to hear about this issue. Can you give me some more details on your
>> installation -- what distro are you using and is OFED being used? Also,
>> what version of MVAPICH are you using?
>>
>> Additionally, what is the output of 'ulimit -l' on your system (or
>> equivalent shell command)? You may want to check all nodes. Memory
>> registration generally does not fail unless the amount of lockable memory
>> is too low.
>>
>> Matt
>>
>> On Thu, 17 Jan 2008, Ben Held wrote:
>>
>>> We have recently built our MPI application using MVAPICH1 under Linux and
>>> are seeing certain runs fail (success or failure seems to be a function of
>>> the # of processes - 8 will work, 16 will fail, 32 will work, etc). This
>>> code has been thoroughly tested using the standard MPICH (Ethernet based)
>>> and LAM and everything is fine.
>>>
>>>
>>>
>>> Does this error:
>>>
>>>
>>>
>>> Abort: unable to register vbuf DMA buffer at line 211 in file vbuf.c
>>>
>>>
>>>
>>> Mean anything? This is a new cluster (8 nodes, 8 cores per node) that has
>>> been tested using stress tests provided by the cluster manufacturer
>>> (Microway). This is out of my area of expertise and this is the first IB
>>> based system I have worked on.
>>>
>>>
>>>
>>> Any thoughts?
>>>
>>>
>>> Regards,
>>>
>>>
>>>
>>> Ben Held
>>> Simulation Technology & Applied Research, Inc.
>>> 11520 N. Port Washington Rd., Suite 201
>>> Mequon, WI 53092
>>> P: 1.262.240.0291 x101
>>> F: 1.262.240.0294
>>> E: ben.held at staarinc.com
>>> http://www.staarinc.com
>>>
>>>
>>>
>>>
>>>
>>>
>>
>>
>
>
> _______________________________________________
> mvapich-discuss mailing list
> mvapich-discuss at cse.ohio-state.edu
> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
--
Jeff Squyres
Cisco Systems