[mvapich-discuss] GROMACS: Memory Allocation (mv2)/Segmentation Fault (mv2-gdr)

Shafie Khorassani, Kawthar shafiekhorassani.1 at buckeyemail.osu.edu
Mon Sep 7 16:39:49 EDT 2020


Hi Viet-Duc,

We were able to reproduce the "Not enough memory" issue you were seeing with MVAPICH2, GROMACS 2020.3, and GCC/8.3.0, but only on x86-based systems with Skylake. Can you set MV2_CUDA_BLOCK_SIZE=8388608 at run-time and let us know whether it resolves the memory issue? We were, however, unable to reproduce the segfault you were seeing at startup with MVAPICH2-GDR + srun. Can you let us know which version of PMI you are using with the MVAPICH2-GDR run (i.e. PMIv1 or PMIv2)?
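For reference, a minimal sketch of how this could be applied in the Slurm script you posted is shown below; the srun line is copied from your script, and `srun --mpi=list` is just one way to see which PMI plugins your Slurm installation offers:

>>> begin of sketch
# Suggested run-time setting for the memory issue (value from above):
export MV2_CUDA_BLOCK_SIZE=8388608

# One way to list the PMI plugins available in your Slurm installation:
srun --mpi=list

# Launch as before:
srun gmx_mpi mdrun -s ./benchRIB.tpr -nsteps 2000 -notunepme -noconfout -pin on -v
<<< end of sketch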



Thank you,


Kawthar Shafie Khorassani


________________________________________
From: mvapich-discuss-bounces at cse.ohio-state.edu <mvapich-discuss-bounces at mailman.cse.ohio-state.edu> on behalf of Le, Viet Duc <vdle at moasys.com>
Sent: Friday, August 28, 2020 2:18 AM
To: mvapich-discuss at cse.ohio-state.edu
Subject: [mvapich-discuss] GROMACS: Memory Allocation (mv2)/Segmentation Fault (mv2-gdr)

Hello,

While testing the latest version of mvapich2/mvapich2-gdr (2.3.4) with GROMACS (2019.6), we encountered two peculiar issues.
Our setup and build environment are described below; we hope this helps with reproducing the issues.

[hardware]
- Xeon Gold 6230 (Skylake)
- 2 x Tesla V100 (PIX connection)

[software]
- CentOS Linux release 7.4.1708
- slurm 18.08.6
- gcc/4.8.5, cuda/10.1
- MLNX_OFED_LINUX-4.4-2.0.7.0
- mvapich2: ./configure --with-pm=slurm --with-pmi=pmi2 --with-slurm=/usr/local --enable-cuda  --with-cuda=/apps/cuda/10.1
- mvapich2-gdr: mvapich2-gdr-mcast.cuda10.1.mofed4.4.gnu4.8.5.slurm-2.3.4-1.el7.x86_64.rpm (from the mvapich2 homepage; see the install sketch after this list)
- reference: openmpi (3.1.5)
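For reference, installing the downloaded GDR RPM typically amounts to something like the sketch below; the --prefix path is a placeholder and may differ from our actual install location:

$ rpm -Uvh --nodeps --prefix=/apps/mvapich2-gdr-2.3.4 \
    mvapich2-gdr-mcast.cuda10.1.mofed4.4.gnu4.8.5.slurm-2.3.4-1.el7.x86_64.rpm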

[gromacs] 2019.6 is the last version that can be built with gcc/4.8.5
$ tar xzvf gromacs-2019.6.tar.gz
$ cd gromacs-2019.6
$ mkdir build
$ cd build
$ cmake ..  -DCMAKE_C_COMPILER=mpicc -DCMAKE_CXX_COMPILER=mpicxx -DGMX_SIMD=AVX_512 -DGMX_MPI=on -DGMX_CUDA_TARGET_SM=70 -DGMX_BUILD_OWN_FFTW=ON
$ make
The resulting binary, gmx_mpi, is located in the ./bin directory under the GROMACS source directory.

[input files]
Inputs are taken from MPIBPC: https://www.mpibpc.mpg.de/grubmueller/bench (benchRIB, 2 M atoms, ribosome in water)

[job scripts]
Important MV2_* variables such as MV2_USE_CUDA/MV2_USE_GDRCOPY are properly set via environment modules (a rough sketch of these settings follows the script below).
>>> begin of slurm script
#!/usr/bin/env bash
#SBATCH --partition=skl_v100_2
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=8
#SBATCH --gres=gpu:2
#SBATCH --job-name=test-mv2
#SBATCH --error=%j.stderr
#SBATCH --output=%j.stdout
#SBATCH --time=24:00:00
#SBATCH --comment=gromacs

# use gromacs internal affinity setting.
export MV2_ENABLE_AFFINITY=0

module load gcc/4.8.5 cuda/10.1
module load cudampi/mvapich2-2.3.4 # or cudampi/mvapich2-gdr-2.3.4, respectively.

srun gmx_mpi mdrun -s ./benchRIB.tpr -nsteps 2000 -notunepme -noconfout -pin on -v
<<< end of slurm script
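For clarity, the settings applied by the cudampi module amount roughly to the following sketch; the values are approximate and the exact exports live in our module files:

>>> begin of sketch (approximate module contents)
# Enable CUDA-aware transfers and GDRCOPY in MVAPICH2 / MVAPICH2-GDR:
export MV2_USE_CUDA=1
export MV2_USE_GDRCOPY=1
# PATH / LD_LIBRARY_PATH are prepended to point at the selected MPI install.
<<< end of sketch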

[mvapich2-2.3.4 error: failure to allocate a small amount of memory]
>>> begin of error message
Source file: src/gromacs/utility/smalloc.cpp (line 226)
MPI rank:    3 (out of 8)

Fatal error:
Not enough memory. Failed to realloc 308080 bytes for nbs->cell, nbs->cell=5206b8d0
(called from file [...]/nbnxn_grid.cpp, line 1502)
<<< end of error message
Description of the error:
- The job crashes randomly when an arbitrary MPI rank fails to allocate memory. Jobs sometimes run to completion, which makes the error unpredictable.
- The benchRIB.tpr input is rather small, using only about 10 GB of host memory on the Skylake node and about 1.5 GB per GPU, as shown in the attached file.
- If memory were truly insufficient, GROMACS would return the above message with a very large negative value, for example: 'Failed to reallocate -12415232232 bytes...'
- OpenMPI works reliably without issue, so we suspect a memory allocation problem related to mvapich2.

[mvapich2-gdr-2.3.4 error: srun segmentation fault]
>>> begin of error message
[gpu31:mpi_rank_2][error_sighandler] Caught error: Segmentation fault (signal 11)
srun: error: gpu31: tasks 0-7: Segmentation fault (core dumped)
<<< end of error message
Description of the error:
- The Slurm job crashes immediately at startup; srun does not play well with mvapich2-gdr.
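A possible diagnostic, sketched here only under the assumption that the crash is related to the PMI interface rather than to GROMACS itself, would be to force Slurm's PMI plugin explicitly on the srun line:

>>> begin of sketch (untested diagnostic)
# Explicitly select Slurm's PMI-2 plugin when launching:
srun --mpi=pmi2 gmx_mpi mdrun -s ./benchRIB.tpr -nsteps 2000 -notunepme -noconfout -pin on -v
<<< end of sketch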

The two issues above were also observed with the latest version of GROMACS (2020.3) and gcc/8.3.0.
We appreciate your insights into this matter.

Regards.
Viet-Duc

