[mvapich-discuss] MVAPICH2 Hugepage Issue with RHEL 6.2

Tom Crockett twcroc at wm.edu
Wed Feb 20 12:04:41 EST 2013


Hi,

I'm trying to get MVAPICH2 1.9a2 running under Red Hat Enterprise Linux 
6.2, and I keep getting kernel abort messages that are triggered by the 
first invocation of a multi-node MPI program after a host is booted. 
The complaint is:

    "Using mlock ulimits for SHM_HUGETLB deprecated"

A complete copy of the kernel trace is attached.

When the memlock limits in /etc/security/limits.conf are set to 
"unlimited", this abort sometimes crashes the node.  When the memlock 
limits are set to match the size of the hugepage allocation, the aborts 
do not appear to be fatal, and the initial and subsequent MVAPICH2 
programs run satisfactorily.  However, the abort messages are worrisome 
and somewhat annoying, so it would be nice to understand why they occur 
and what should be done about them.
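
For anyone trying to reproduce this, a rough way to see which memlock 
limit a process actually inherits is sketched below.  This is only an 
illustration using Python's standard "resource" module; the helper name 
describe_limit is made up for this example, and the values printed 
depend on how the process was launched.

```python
import resource


def describe_limit(value):
    """Render an rlimit value (reported in bytes) as kB, or 'unlimited'."""
    if value == resource.RLIM_INFINITY:
        return "unlimited"
    return "%d kB" % (value // 1024)


# RLIMIT_MEMLOCK is the limit that the "memlock" entries control.
soft, hard = resource.getrlimit(resource.RLIMIT_MEMLOCK)
print("memlock soft:", describe_limit(soft))
print("memlock hard:", describe_limit(hard))
```

Running this from inside a batch job or under the MPI launcher, rather 
than from a login shell, shows the limit the MPI ranks would see.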

MPI programs which run entirely within a single node (up to 8 cores in 
our present configuration) do not trigger this problem.

Here are some specifics about our setup:

Software:
    MVAPICH2 1.9a2
    Mellanox OFED 1.5.3-3.1.0
    RHEL 6.2 (kernel 2.6.32-220.el6.x86_64)
    PGI 11.10

Configuration:

    Hugepage entries from /proc/meminfo:

    AnonHugePages:      2048 kB
    HugePages_Total:     512
    HugePages_Free:      512
    HugePages_Rsvd:        0
    HugePages_Surp:        0
    Hugepagesize:       2048 kB

    memlock limits from /etc/security/limits.conf (values in kB):

    * soft memlock 1048576
    * hard memlock 1048576
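
As a sanity check on these numbers: the hugepage pool is 512 pages x 
2048 kB = 1048576 kB (1 GiB), which is the value used for the memlock 
limits (limits.conf expresses memlock in kB).  A minimal Python sketch of 
that arithmetic follows.  The function name hugepage_pool_kb and the 
embedded sample text are illustrative only and are not part of any 
MVAPICH2 or system tool.

```python
def hugepage_pool_kb(meminfo_text):
    """Return HugePages_Total * Hugepagesize (kB) from /proc/meminfo-style text."""
    fields = {}
    for line in meminfo_text.splitlines():
        if ":" not in line:
            continue
        key, _, value = line.partition(":")
        parts = value.split()
        if parts:
            # The first token is the number; a trailing "kB" unit is ignored.
            fields[key.strip()] = int(parts[0])
    return fields["HugePages_Total"] * fields["Hugepagesize"]


# Sample copied from the configuration listed in this message.
SAMPLE = """\
AnonHugePages:      2048 kB
HugePages_Total:     512
HugePages_Free:      512
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
"""

MEMLOCK_KB = 1048576  # soft/hard memlock value from limits.conf, in kB

pool_kb = hugepage_pool_kb(SAMPLE)
print(pool_kb, pool_kb == MEMLOCK_KB)
```

On a live node the same function could be fed the contents of 
/proc/meminfo instead of the sample text.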

Hardware:
    Dell C6100 w/ 2 x Xeon X5672, 64 GB mem.
    Mellanox ConnectX-2 VPI

We have a total of 64 nodes in this cluster.

Thanks for any insights that you can provide on this issue,

Tom Crockett

College of William and Mary               email:  twcroc at wm.edu
IT/High Performance Computing Group       phone:  (757) 221-2762
Jones Hall, Rm. 304A                      fax:    (757) 221-1321
P.O. Box 8795
Williamsburg, VA  23187-8795
-------------- next part --------------
An embedded message was scrubbed...
From: <user at monsoon.sciclone.wm.edu>
Subject: [abrt] full crash report
Date: Wed, 20 Feb 2013 10:18:46 -0500
Size: 4615
Url: http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/attachments/20130220/0c593a3a/AttachedMessage-0001.mht