[mvapich-discuss] GROMACS: Memory Allocation (mv2)/Segmentation Fault (mv2-gdr)

Le, Viet Duc vdle at moasys.com
Wed Sep 16 01:36:33 EDT 2020


Hi

Unfortunately, pmi1 is not possible since the available options are
none/openmpi/pmi2.
Per your request, below is the output of 'mpiname -a':

MVAPICH2-GDR 2.3.4 Thu June 4 22:00:00 EST 2020 ch3:mrail

Compilation
CC: gcc -I/apps/cuda/10.1/include   -DNDEBUG -DNVALGRIND -O2
CXX: g++ -I/apps/cuda/10.1/include  -DNDEBUG -DNVALGRIND -O2
F77: gfortran -I/apps/cuda/10.1/include  -O2
FC: gfortran -I/apps/cuda/10.1/include  -O2

Configuration
--build=x86_64-redhat-linux-gnu
--host=x86_64-redhat-linux-gnu
--program-prefix=
--disable-dependency-tracking
--prefix=/opt/mvapich2/gdr/2.3.4/mcast/no-openacc/cuda10.1/mofed4.4/slurm/gnu8.3.0
--exec-prefix=/opt/mvapich2/gdr/2.3.4/mcast/no-openacc/cuda10.1/mofed4.4/slurm/gnu8.3.0
--bindir=/opt/mvapich2/gdr/2.3.4/mcast/no-openacc/cuda10.1/mofed4.4/slurm/gnu8.3.0/bin
--sbindir=/opt/mvapich2/gdr/2.3.4/mcast/no-openacc/cuda10.1/mofed4.4/slurm/gnu8.3.0/sbin
--sysconfdir=/opt/mvapich2/gdr/2.3.4/mcast/no-openacc/cuda10.1/mofed4.4/slurm/gnu8.3.0/etc
--datadir=/opt/mvapich2/gdr/2.3.4/mcast/no-openacc/cuda10.1/mofed4.4/slurm/gnu8.3.0/share
--includedir=/opt/mvapich2/gdr/2.3.4/mcast/no-openacc/cuda10.1/mofed4.4/slurm/gnu8.3.0/include
--libdir=/opt/mvapich2/gdr/2.3.4/mcast/no-openacc/cuda10.1/mofed4.4/slurm/gnu8.3.0/lib64
--libexecdir=/opt/mvapich2/gdr/2.3.4/mcast/no-openacc/cuda10.1/mofed4.4/slurm/gnu8.3.0/libexec
--localstatedir=/var
--sharedstatedir=/var/lib
--mandir=/opt/mvapich2/gdr/2.3.4/mcast/no-openacc/cuda10.1/mofed4.4/slurm/gnu8.3.0/share/man
--infodir=/opt/mvapich2/gdr/2.3.4/mcast/no-openacc/cuda10.1/mofed4.4/slurm/gnu8.3.0/share/info
--disable-hybrid
--with-ch3-rank-bits=32
--disable-gl
--disable-static
--enable-shared
--without-hydra-ckpointlib
--with-pm=slurm
--with-pmi=pmi1
--enable-cuda
CPPFLAGS=-I/apps/cuda/10.1/include
CFLAGS=-I/apps/cuda/10.1/include
CXXFLAGS=-I/apps/cuda/10.1/include
FFLAGS=-I/apps/cuda/10.1/include
FCFLAGS=-I/apps/cuda/10.1/include
LDFLAGS=-lcuda
-L/apps/cuda/10.1/lib64/stubs
-L/apps/cuda/10.1/lib64
-lcudart
-lrt
-Wl,-rpath,/apps/cuda/10.1/lib64
-Wl,-rpath,XORIGIN/placeholder
-Wl,--build-id
-L/apps/cuda/10.1/lib64/
-lm
CC=gcc
CXX=g++
FC=gfortran
F77=gfortran
F90=
F90FLAGS=

According to our tests, LAMMPS and TensorFlow work well with mv2-gdr/pmi2, so
this is a rather subtle problem.
It could be that our Slurm (18.08) is two releases behind the latest one (20.02).
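
(The installed Slurm release can be confirmed with, e.g.:)
$ sinfo --version    # reports 18.08.6 on our cluster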

Regards.
Viet-Duc

On Tue, Sep 15, 2020 at 1:09 AM Shafie Khorassani, Kawthar <
shafiekhorassani.1 at buckeyemail.osu.edu> wrote:

> For the srun issue you are seeing at startup, is it possible to try with
> PMI1 instead? In addition, can you provide us with the output of the
> following for the MVAPICH2 version you are using: *bin/mpiname -a*?
>
>
> Thank you,
>
>
> Kawthar Shafie Khorassani
>
>
> ------------------------------
> *From:* Le, Viet Duc <vdle at moasys.com>
> *Sent:* Wednesday, September 9, 2020 10:40 PM
> *To:* Shafie Khorassani, Kawthar <shafiekhorassani.1 at buckeyemail.osu.edu>
> *Cc:* mvapich-discuss at cse.ohio-state.edu <
> mvapich-discuss at mailman.cse.ohio-state.edu>; _ENG CSE Mvapich-Core <
> ENG-cse-mvapich-core at osu.edu>
> *Subject:* Re: [mvapich-discuss] GROMACS: Memory Allocation
> (mv2)/Segmentation Fault (mv2-gdr)
>
> Dear Kawthar,
>
> Thanks for taking the time to test and confirm the issue with gromacs.
> We tested with 2019.6 (gcc/4.8.5), but the same error was observed with the
> latest version, 2020.3 (gcc/8.3.0), as you stated.
> Regressing further back to version 2016.4 didn't help either.
>
> Unfortunately, tuning MV2_CUDA_BLOCK_SIZE didn't circumvent the issue on
> either the Ivy Bridge or the Skylake CPUs at our disposal.
> The failure rate is highest when the CPU:GPU ratio is lowered; for instance,
> 8:1 is the minimum for benchRIB.tpr.
> I will reach out to the GROMACS forum regarding their memory allocation
> routine.
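>
> (For reference, a sketch of how the block size would be tuned at run time in
> the job script; 8388608 is the value suggested below:)
> export MV2_CUDA_BLOCK_SIZE=8388608
> srun gmx_mpi mdrun -s ./benchRIB.tpr -nsteps 2000 -notunepme -noconfout -pin on -v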
>
> $ srun --mpi=list
> srun: MPI types are...
> srun: none
> srun: openmpi
> srun: pmi2
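>
> (pmi2 was selected explicitly via srun's --mpi flag, roughly along these lines:)
> $ srun --mpi=pmi2 gmx_mpi mdrun -s ./benchRIB.tpr -nsteps 2000 -notunepme -noconfout -pin on -v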
>
> Using 'pmi2' explicitly, the following error was observed in addition to the
> segmentation fault:
> srun: error: eio_message_socket_accept: slurm_receive_msg[10.151.0.7]:
> Zero Bytes were transmitted or received
> From gdb:
> #0  0x0000000001d13b68 in debug ()
> #1  0x00002ba0bbf37161 in ?? ()
> #2  0x00002ba0bbfecd3c in ?? ()
> #3  0x0000000000000000 in ?? ()
> So the backtrace is not really helpful. For now, we have settled on the
> without-slurm rpm.
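>
> (With the without-slurm rpm, launching happens outside of srun, e.g. with
> mpirun_rsh; a minimal sketch, assuming a hostfile ./hosts listing the
> allocated nodes:)
> $ mpirun_rsh -np 8 -hostfile ./hosts MV2_ENABLE_AFFINITY=0 gmx_mpi mdrun -s ./benchRIB.tpr -nsteps 2000 -notunepme -noconfout -pin on -v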
>
> Regards.
> Viet-Duc
>
> On Tue, Sep 8, 2020 at 5:39 AM Shafie Khorassani, Kawthar <
> shafiekhorassani.1 at buckeyemail.osu.edu> wrote:
>
> Hi Viet-Duc,
>
> We were able to reproduce the "Not enough memory" issue you were seeing
> using MVAPICH2 with GROMACS 2020.3 and GCC/8.3.0.
> We were only able to reproduce this on x86-based systems with Skylake. Can
> you set the following at run-time and let us know if you are able to
> resolve the memory issue: MV2_CUDA_BLOCK_SIZE=8388608? We were however
> unable to reproduce the segfault you were seeing at startup with
> MVAPICH2-GDR + srun. Can you let us know what version of pmi you are using
> here with the MVAPICH2-GDR run (i.e. PMIv1 or PMIv2)?
>
>
> Thank you,
>
>
> Kawthar Shafie Khorassani
>
>
> ________________________________________
> From: mvapich-discuss-bounces at cse.ohio-state.edu <
> mvapich-discuss-bounces at mailman.cse.ohio-state.edu> on behalf of Le, Viet
> Duc <vdle at moasys.com>
> Sent: Friday, August 28, 2020 2:18 AM
> To: mvapich-discuss at cse.ohio-state.edu
> Subject: [mvapich-discuss] GROMACS: Memory Allocation
> (mv2)/Segmentation Fault (mv2-gdr)
>
> Hello,
>
> When testing the latest version of mvapich2/mvapich2-gdr (2.3.4) with
> gromacs (2019.6), we encountered two peculiar issues.
> Below is our setup and build environment; we hope it helps with reproducing
> the issues.
>
> [hardware]
> - Xeon Gold 6230 (Skylake)
> - 2 x Tesla V100 (PIX connection)
>
> [software]
> - CentOS Linux release 7.4.1708
> - slurm 18.08.6
> - gcc/4.8.5, cuda/10.1
> - MLNX_OFED_LINUX-4.4-2.0.7.0
> - mvapich2: ./configure --with-pm=slurm --with-pmi=pmi2
> --with-slurm=/usr/local --enable-cuda  --with-cuda=/apps/cuda/10.1
> - mvapich2-gdr:
> mvapich2-gdr-mcast.cuda10.1.mofed4.4.gnu4.8.5.slurm-2.3.4-1.el7.x86_64.rpm
> (from the mvapich2 homepage)
> - reference: openmpi (3.1.5)
>
> [gromacs] 2019.6 is the last version that can be built with gcc/4.8.5
> $ tar xzvf gromacs-2019.6.tar.gz
> $ cd gromacs-2019.6
> $ mkdir build
> $ cd build
> $ cmake ..  -DCMAKE_C_COMPILER=mpicc -DCMAKE_CXX_COMPILER=mpicxx
> -DGMX_SIMD=AVX_512 -DGMX_MPI=on -DGMX_CUDA_TARGET_SM=70
> -DGMX_BUILD_OWN_FFTW=ON
> $ make
> The resulting binary, gmx_mpi, is located in the ./bin directory under the
> gromacs source directory.
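>
> (A quick sanity check of the build; gmx_mpi --version should list the MPI
> library and CUDA support in its build configuration:)
> $ ./bin/gmx_mpi --version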
>
> [input files]
> Inputs are taken from MPIBPC:
> https://www.mpibpc.mpg.de/grubmueller/bench
> (benchRIB, 2 M atoms, ribosome in water)
>
> [job scripts]
> Important MV2_* variables such as MV2_USE_CUDA/MV2_USE_GDRCOPY are
> properly set via environment modules.
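>
> (A sketch of what the module effectively exports; the exact settings on our
> system may differ:)
> export MV2_USE_CUDA=1
> export MV2_USE_GDRCOPY=1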
> >>> begin of slurm script
> #!/usr/bin/env bash
> #SBATCH --partition=skl_v100_2
> #SBATCH --nodes=1
> #SBATCH --ntasks-per-node=8
> #SBATCH --gres=gpu:2
> #SBATCH --job-name=test-mv2
> #SBATCH --error=%j.stderr
> #SBATCH --output=%j.stdout
> #SBATCH --time=24:00:00
> #SBATCH --comment=gromacs
>
> # use gromacs internal affinity setting.
> export MV2_ENABLE_AFFINITY=0
>
> module load gcc/4.8.5 cuda/10.1
> module load cudampi/mvapich2-2.3.4 # or cudampi/mvapich2-gdr-2.3.4,
> respectively.
>
> srun gmx_mpi mdrun -s ./benchRIB.tpr -nsteps 2000 -notunepme -noconfout
> -pin on -v
> <<< end of slurm script
>
> [mvapich2-2.3.4 error: failure to allocate small memory]
> >>> begin of error message
> Source file: src/gromacs/utility/smalloc.cpp (line 226)
> MPI rank:    3 (out of 8)
>
> Fatal error:
> Not enough memory. Failed to realloc 308080 bytes for nbs->cell,
> nbs->cell=5206b8d0
> (called from file [...]/nbnxn_grid.cpp, line 1502)
> <<< end of error message
> Description of the error:
> - The job crashes randomly when a random MPI rank fails to allocate memory.
> Jobs do sometimes run to completion, making this error unpredictable.
> - The input benchRIB.tpr is rather small, taking up only 10 GB of host memory
> on the Skylake node and about 1.5 GB per GPU, as shown in the attached file.
> - If memory is truly insufficient, gromacs will return the above message
> with a very large negative value, for example: 'Failed to reallocate
> -12415232232 bytes...'
> - OpenMPI works reliably without issue. Thus we think there is a memory
> allocation issue related to mvapich2.
>
> [mvapich2-gdr-2.3.4 error: srun segmentation fault]
> >>> begin of error message
> [gpu31:mpi_rank_2][error_sighandler] Caught error: Segmentation fault
> (signal 11)
> srun: error: gpu31: tasks 0-7: Segmentation fault (core dumped)
> <<< end of error message
> Description of the error:
> - The Slurm job crashes immediately at startup; srun does not play well with
> mvapich2-gdr.
>
> The two issues above were also observed with the latest version of gromacs
> (2020.3) and gcc/8.3.0.
> We appreciate your insights into this matter.
>
> Regards.
> Viet-Duc
>
>