[mvapich-discuss] Simple problem with MVAPICH2-X 2.3-1? "VBUF CUDA region allocation failed"

Subramoni, Hari subramoni.1 at osu.edu
Fri Jun 19 10:53:40 EDT 2020


Hi, Andy.

The SLURM-based RPM we built was for PMI1. I am not sure if that is having an impact. Can you please check if you're able to run MVAPICH2 on these nodes?
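
Since your cluster is set up for PMIx, one quick way to see what the launcher is actually handing to each task is a small check along these lines (just a sketch, not part of MVAPICH2; compile with plain gcc and run it under srun, e.g. srun -N1 ./pmi_env_check). It prints any PMI-prefixed environment variables Slurm exports, which should make a PMI1/PMI2/PMIx mismatch easy to spot:

/* pmi_env_check.c -- sketch: print the PMI-related environment
 * variables that the launcher exports to each task. */
#include <stdio.h>
#include <string.h>

extern char **environ;

int main(void)
{
    char **e;

    for (e = environ; *e != NULL; e++) {
        /* matches PMI_*, PMI2_*, and PMIX_* style variables */
        if (strncmp(*e, "PMI", 3) == 0)
            printf("%s\n", *e);
    }
    return 0;
}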

I will follow up with you offline on this.

Thx,
Hari.

From: Riebs, Andy <andy.riebs at hpe.com>
Sent: Friday, June 19, 2020 9:05 AM
To: Subramoni, Hari <subramoni.1 at osu.edu>; mvapich-discuss at cse.ohio-state.edu <mvapich-discuss at mailman.cse.ohio-state.edu>
Subject: RE: [mvapich-discuss] Simple problem with MVAPICH2-X 2.3-1? "VBUF CUDA region allocation failed"


[Apologies if this is a duplicate. The first 3 iterations over the past couple of days, from both my gmail and corporate accounts, were returned with the message "... while talking to mail.us.messaging.microsoft.com.:
Mail sent to the wrong Office 365 region." I see that someone else's mail has gotten through, so I'll try again. How I long for sendmail!]

Hi Hari,

I don't know if it matters here, but I did forget to mention that


  *   We're using Slurm 18.08.5, with PMIx
  *   The compute nodes have 48 GB of memory
  *   Mellanox ConnectX-4 IB cards

Here are the ulimit data for the compute nodes:

$ srun bash -c "ulimit -a"
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 191205
max locked memory       (kbytes, -l) unlimited
max memory size         (kbytes, -m) unlimited
open files                      (-n) 100000
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 32768
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

The login node on which I compiled and submitted the program has the same parameters, except for more pending signals.

No apparent difference when I added "MV2_USE_SHARED_MEM=0", except that the single-node job now fails as well:

$ MV2_USE_SHARED_MEM=0 srun -N1 ./a.out
[src/mpid/ch3/channels/mrail/src/gen2/vbuf.c 555] Cannot register vbuf region
[node01:mpi_rank_0][allocate_vbufs] src/mpid/ch3/channels/mrail/src/gen2/vbuf.c:788: VBUF CUDA region allocation failed.
: Invalid argument (22)
srun: error: node01: task 0: Exited with exit code 255
srun: Terminating job step 1993.0

$ MV2_USE_SHARED_MEM=0 srun -N2 ./a.out
[src/mpid/ch3/channels/mrail/src/gen2/vbuf.c 555] Cannot register vbuf region
[node02:mpi_rank_1][allocate_vbufs] src/mpid/ch3/channels/mrail/src/gen2/vbuf.c:788: VBUF CUDA region allocation failed.
: Invalid argument (22)
[src/mpid/ch3/channels/mrail/src/gen2/vbuf.c 555] Cannot register vbuf region
[node01:mpi_rank_0][allocate_vbufs] src/mpid/ch3/channels/mrail/src/gen2/vbuf.c:788: VBUF CUDA region allocation failed.
: Invalid argument (22)
srun: error: node02: task 1: Exited with exit code 255
srun: Terminating job step 1994.0

Regards,
Andy


From: Subramoni, Hari [mailto:subramoni.1 at osu.edu]
Sent: Tuesday, June 16, 2020 10:15 PM
To: Riebs, Andy <andy.riebs at hpe.com>; mvapich-discuss at cse.ohio-state.edu <mvapich-discuss at mailman.cse.ohio-state.edu>
Cc: Subramoni, Hari <subramoni.1 at osu.edu>
Subject: RE: [mvapich-discuss] Simple problem with MVAPICH2-X 2.3-1? "VBUF CUDA region allocation failed"

Hi, Andy.

It looks like MVAPICH2-X was not able to register memory with the IB device. This typically happens when there is an issue with the maximum amount of registerable memory (for example, the locked-memory limit). Can you please send the output of ulimit -a on your system? Please refer to the following section of the user guide for more details.

http://mvapich.cse.ohio-state.edu/static/media/mvapich/mvapich2-2.3.4-userguide.html#x1-1370009.1.5
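
If the limits look sane, a standalone registration probe outside of MPI can also help isolate whether the problem is in the IB stack itself rather than in MVAPICH2-X. Something along these lines (only a sketch, with minimal error handling; compile with gcc -o reg_test reg_test.c -libverbs) attempts to register a pinned buffer in roughly the same way the library does for its vbuf pool:

/* reg_test.c -- sketch: register a buffer with the first IB device,
 * roughly what the vbuf allocation path needs to be able to do. */
#include <stdio.h>
#include <stdlib.h>
#include <infiniband/verbs.h>

int main(void)
{
    struct ibv_device **devs;
    struct ibv_context *ctx;
    struct ibv_pd *pd;
    struct ibv_mr *mr;
    void *buf;
    size_t len = 2 * 1024 * 1024;   /* 2 MB test buffer */
    int num = 0;

    devs = ibv_get_device_list(&num);
    if (devs == NULL || num == 0) {
        fprintf(stderr, "no IB devices found\n");
        return 1;
    }
    ctx = ibv_open_device(devs[0]);
    if (ctx == NULL) { perror("ibv_open_device"); return 1; }
    pd = ibv_alloc_pd(ctx);
    if (pd == NULL) { perror("ibv_alloc_pd"); return 1; }

    buf = malloc(len);
    mr = ibv_reg_mr(pd, buf, len,
                    IBV_ACCESS_LOCAL_WRITE |
                    IBV_ACCESS_REMOTE_READ |
                    IBV_ACCESS_REMOTE_WRITE);
    if (mr == NULL) {
        /* "Invalid argument" here would mirror the error in your log */
        perror("ibv_reg_mr");
        return 1;
    }
    printf("registered %zu bytes on %s\n", len, ibv_get_device_name(devs[0]));

    ibv_dereg_mr(mr);
    ibv_dealloc_pd(pd);
    ibv_close_device(ctx);
    ibv_free_device_list(devs);
    free(buf);
    return 0;
}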

Can you also try running the program on a single node after setting MV2_USE_SHARED_MEM=0 to force MVAPICH2 to use the inter-node communication channel even for intra-node communication operations? This will help us narrow the issue down further.
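
If MPI_Init gets past the vbuf allocation with that setting, a tiny point-to-point exchange along the lines below (just a sketch) would also push actual traffic through that channel; running it with two ranks on one node (for example, srun -N1 -n2) while MV2_USE_SHARED_MEM=0 is set exercises the same path:

/* ping.c -- sketch: minimal two-rank exchange to exercise the
 * communication channel once MPI_Init succeeds. */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, size, val = 0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (size >= 2) {
        if (rank == 0) {
            val = 42;
            MPI_Send(&val, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            MPI_Recv(&val, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            printf("rank 1 received %d from rank 0\n", val);
        }
    }

    MPI_Finalize();
    return 0;
}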

Best,
Hari.

From: mvapich-discuss-bounces at cse.ohio-state.edu <mvapich-discuss-bounces at mailman.cse.ohio-state.edu> On Behalf Of Riebs, Andy
Sent: Tuesday, June 16, 2020 5:50 PM
To: mvapich-discuss at cse.ohio-state.edu <mvapich-discuss at mailman.cse.ohio-state.edu>
Subject: [mvapich-discuss] Simple problem with MVAPICH2-X 2.3-1? "VBUF CUDA region allocation failed"

Summary: Attempts to run MPI jobs on two or more nodes fail with "VBUF CUDA region allocation failed" on a cluster with no GPUs.

Long form:

I tried to install simple MPI support with the commands

$ cd ./mvapich2
$  rpm2cpio ~/tmp/mvapich2-x/mvapich2-x-mofed4.5-gnu4.8.5-2.3-1.el7/mvapich2-x-basic-mofed4.5-gnu4.8.5-slurm-2.3-1.el7.x86_64.rpm | cpio -id
$  mv ./opt/mvapich2-x/gnu4.8.5/mofed4.5/basic/slurm/*  ./2.3-1

Compiling a simple MPI "hello world" works fine, and it runs fine on a single node:

$ cat mpi_hello.c
#include <stdlib.h>
#include <stdio.h>
#include <mpi.h>

int
main(int argc, char *argv[])
{
        int             rank, size, len;
        char            name[MPI_MAX_PROCESSOR_NAME];

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        MPI_Get_processor_name(name, &len);
        printf ("Hello world! I'm %d of %d on %s\n", rank, size, name);

        MPI_Finalize();
        exit(0);
}
$ mpicc -o a.out mpi_hello.c
$ srun -N1 ./a.out
Hello world! I'm 0 of 1 on node01
$

But it fails when I try to run on 2 or more nodes:

$ srun -N2 ./a.out
[src/mpid/ch3/channels/mrail/src/gen2/vbuf.c 555] Cannot register vbuf region
[node02:mpi_rank_1][allocate_vbufs] src/mpid/ch3/channels/mrail/src/gen2/vbuf.c:788: VBUF CUDA region allocation failed.
: Invalid argument (22)
[src/mpid/ch3/channels/mrail/src/gen2/vbuf.c 555] Cannot register vbuf region
[node01:mpi_rank_0][allocate_vbufs] src/mpid/ch3/channels/mrail/src/gen2/vbuf.c:788: VBUF CUDA region allocation failed.
: Invalid argument (22)
srun: error: node02: task 1: Exited with exit code 255
srun: Terminating job step 1799.0
$ which mpicc
/home/riebs/mvapich2/2.3-1/bin/mpicc
$ ls /home/riebs/mvapich2/2.3-1/
bin  etc  include  lib64  share
$ echo $LD_LIBRARY_PATH
/opt/mellanox/sharp/lib:/home/riebs/mvapich2/2.3-1/lib64:/opt/slurm/18.08.5-2/lib64:/opt/slurm/18.08.5-2/lib
$

The environment:
- CentOS 7.4
- MOFED 4.2
- Arch x86_64
- mvapich2:
$ mpichversion
MVAPICH2 Version:       2.3
MVAPICH2 Release date:  Mon June 8 22:00:00 EST 2020
MVAPICH2 Device:        ch3:mrail
MVAPICH2 configure:     --build=x86_64-redhat-linux-gnu --host=x86_64-redhat-linux-gnu --program-prefix= --disable-dependency-tracking --prefix=/opt/mvapich2-x/gnu4.8.5/mofed4.5/basic/slurm --exec-prefix=/opt/mvapich2-x/gnu4.8.5/mofed4.5/basic/slurm --bindir=/opt/mvapich2-x/gnu4.8.5/mofed4.5/basic/slurm/bin --sbindir=/opt/mvapich2-x/gnu4.8.5/mofed4.5/basic/slurm/sbin --sysconfdir=/opt/mvapich2-x/gnu4.8.5/mofed4.5/basic/slurm/etc --datadir=/opt/mvapich2-x/gnu4.8.5/mofed4.5/basic/slurm/share --includedir=/opt/mvapich2-x/gnu4.8.5/mofed4.5/basic/slurm/include --libdir=/opt/mvapich2-x/gnu4.8.5/mofed4.5/basic/slurm/lib64 --libexecdir=/opt/mvapich2-x/gnu4.8.5/mofed4.5/basic/slurm/libexec --localstatedir=/var --sharedstatedir=/var/lib --mandir=/opt/mvapich2-x/gnu4.8.5/mofed4.5/basic/slurm/share/man --infodir=/opt/mvapich2-x/gnu4.8.5/mofed4.5/basic/slurm/share/info CC=gcc CXX=g++ F77=gfortran FC=gfortran --disable-gl --enable-fortran=yes --enable-cxx=yes --enable-romio --with-ch3-rank-bits=32 --enable-ucr --disable-rpath --disable-static --enable-shared --disable-rdma-cm --without-hydra-ckpointlib --with-pm=slurm --with-pmi=pmi1 --enable-mpit-tool --enable-hybrid CPPFLAGS= CFLAGS=-pipe CXXFLAGS= FFLAGS= FCFLAGS= LDFLAGS=-Wl,-rpath,XORIGIN/placeholder
MVAPICH2 CC:    gcc -pipe     -DNDEBUG -DNVALGRIND -O2
MVAPICH2 CXX:   g++   -DNDEBUG -DNVALGRIND -O2
MVAPICH2 F77:   gfortran   -O2
MVAPICH2 FC:    gfortran   -O2
$

There are no GPUs installed on the cluster, and no sign of CUDA in the environment. I tried specifying "MV2_USE_CUDA=0", but that didn't help.

It seems that I must be missing something pretty obvious here, but I'm not seeing it.

Any suggestions?

Andy

--
Andy Riebs
andy.riebs at hpe.com
Hewlett Packard Enterprise
High Performance Computing Software Engineering
