[mvapich-discuss] crash of runs over InfiniBand

Dhabaleswar Panda panda at cse.ohio-state.edu
Mon Oct 19 09:48:48 EDT 2009


Iris - InfiniBand communication requires that communication buffers (the
associated memory) be pinned and registered before any data is
transferred. It appears that you are running out of pinnable memory when
the application runs for a longer period of time. Please carry out the
second step from section 9.3.4 (adding the ulimit -l line to
/etc/init.d/sshd) and let us know whether the problem goes away.
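
To see which locked-memory limit remotely started processes actually get,
a rough check along the following lines may help. It is only a sketch: the
host names are placeholders, the helper names classify_memlock and
check_hosts are made up for illustration, and it assumes Linux nodes with
/proc/meminfo, bash, and password-less ssh. It compares the limit seen
through ssh (which can differ from the limit of a login shell) against
MemTotal, used here as an approximation of physical memory in KB.

```bash
#!/bin/bash
# Sketch only: helper names and host names are illustrative assumptions.

# Print "ok" if a locked-memory limit (in KB, or "unlimited") is at least
# required_kb, otherwise print "too low".
classify_memlock() {
    local limit=$1 required_kb=$2
    if [ "$limit" = "unlimited" ] || [ "$limit" -ge "$required_kb" ]; then
        echo "ok"
    else
        echo "too low"
    fi
}

# For each host, report the limit that a command started through ssh gets,
# next to MemTotal from /proc/meminfo (approximate physical memory in KB).
check_hosts() {
    local host limit mem_kb
    for host in "$@"; do
        limit=$(ssh "$host" 'ulimit -l')
        mem_kb=$(ssh "$host" "awk '/^MemTotal:/ {print \$2}' /proc/meminfo")
        echo "$host: memlock=$limit KB, physical=$mem_kb KB," \
             "$(classify_memlock "$limit" "$mem_kb")"
    done
}

# Example with hypothetical host names:
#   check_hosts node01 node02
```

If a node reports "too low" here while a login shell on the same node shows
a larger value, that would suggest the limit is not reaching processes
started through sshd.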

Thanks,

DK

On Mon, 19 Oct 2009, Iris Pernille Lohmann wrote:

> Dear list members,
>
> I am using MVAPICH 1.4 on a Linux cluster. I have run some computations on 1 and 2 nodes using mpirun_rsh. A relatively small computation runs fine on 2 nodes, whereas a relatively large computation crashes on 2 nodes without any error message. Running on 1 node works fine.
>
> I think it may have something to do with memory; section 9.3.4 of the User Guide describes how to set the soft memlock limit.
>
> In my limits.conf the soft memlock and hard memlock are already set to 6000000.
>
> Could the problem be that the second step mentioned in section 9.3.4, namely to add the following to /etc/init.d/sshd:
> ulimit -l <phys mem in KB>
>
> has not been done? What does it actually mean?
>
> Or can it be something completely different?
>
>
> Best regards,
>
> Iris Lohmann
>
> Iris Pernille Lohmann
> MSc, PhD
> Ports & Offshore Technology (POT)
>
> DHI
> Agern Allé 5
> DK-2970 Hørsholm
> Denmark
>
> Tel: +45 4516 9200
> Direct: 45169427
>
> ipl at dhigroup.com
> www.dhigroup.com
>
> WATER  *  ENVIRONMENT  *  HEALTH



