[mvapich-discuss] GROMACS: Memory Allocation (mv2)/Segmentation Fault (mv2-gdr)

Le, Viet Duc vdle at moasys.com
Fri Aug 28 02:18:01 EDT 2020


Hello,

When testing the latest version of mvapich2/mvapich2-gdr (2.3.4) with
gromacs (2019.6), we encountered two peculiar issues.
Our setup and build environment are described below; we hope this helps with
reproducing the issues.

[hardware]
- Xeon Gold 6230 (Skylake)
- 2 x Tesla V100 (PIX connection)
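(The PIX designation above is the label reported by nvidia-smi's topology view;
for reference, it can be checked with the standard tool:)
$ nvidia-smi topo -m   # PIX = at most a single PCIe bridge between the two GPUs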

[software]
- CentOS Linux release 7.4.1708
- slurm 18.08.6
- gcc/4.8.5, cuda/10.1
- MLNX_OFED_LINUX-4.4-2.0.7.0
- mvapich2: ./configure --with-pm=slurm --with-pmi=pmi2 \
      --with-slurm=/usr/local --enable-cuda --with-cuda=/apps/cuda/10.1
- mvapich2-gdr:
mvapich2-gdr-mcast.cuda10.1.mofed4.4.gnu4.8.5.slurm-2.3.4-1.el7.x86_64.rpm
(from the mvapich2 homepage)
- reference: openmpi (3.1.5)
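For what it's worth, a quick way to double-check which MPI build a module
actually exposes (both commands ship with MVAPICH2):
$ mpiname -a    # prints the MVAPICH2 version and the configure options it was built with
$ mpicc -show   # prints the underlying compile/link line used by the wrapper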

[gromacs] 2019.6 is the last version that can be built with gcc/4.8.5
$ tar xzvf gromacs-2019.6.tar.gz
$ cd gromacs-2019.6
$ mkdir build
$ cd build
$ cmake .. -DCMAKE_C_COMPILER=mpicc -DCMAKE_CXX_COMPILER=mpicxx \
      -DGMX_SIMD=AVX_512 -DGMX_MPI=on -DGMX_CUDA_TARGET_SM=70 \
      -DGMX_BUILD_OWN_FFTW=ON
$ make
The resulting binary, gmx_mpi, is located in the ./bin directory under the
gromacs source directory.
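As a quick sanity check (a sketch; the exact wording of the output differs
between GROMACS versions), MPI and CUDA support of the resulting binary can be
confirmed with:
$ ./bin/gmx_mpi --version | grep -iE 'mpi|gpu|cuda'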

[input files]
Inputs are taken from MPIBPC: https://www.mpibpc.mpg.de/grubmueller/bench
(benchRIB, 2 M atoms, ribosome in water)
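(A sketch for inspecting the downloaded input; gmx dump is part of the regular
GROMACS tool set, and the header it prints should include the atom count:)
$ ./bin/gmx_mpi dump -s benchRIB.tpr | head -n 30   # tpx header, including natoms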

[job scripts]
Important MV2_* variables such as MV2_USE_CUDA and MV2_USE_GDRCOPY are set
via environment modules; a sketch of these settings follows the script below.
>>> begin of slurm script
#!/usr/bin/env bash
#SBATCH --partition=skl_v100_2
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=8
#SBATCH --gres=gpu:2
#SBATCH --job-name=test-mv2
#SBATCH --error=%j.stderr
#SBATCH --output=%j.stdout
#SBATCH --time=24:00:00
#SBATCH --comment=gromacs

# use gromacs internal affinity setting.
export MV2_ENABLE_AFFINITY=0

module load gcc/4.8.5 cuda/10.1
module load cudampi/mvapich2-2.3.4  # or cudampi/mvapich2-gdr-2.3.4, respectively

srun gmx_mpi mdrun -s ./benchRIB.tpr -nsteps 2000 -notunepme -noconfout \
     -pin on -v
<<< end of slurm script
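For reference, a sketch of the MV2_* settings we rely on the modules to export
(values shown for illustration; the exact set differs between the mvapich2 and
mvapich2-gdr modules):
>>> begin of module environment sketch
export MV2_USE_CUDA=1      # enable CUDA-aware communication at runtime
export MV2_USE_GDRCOPY=1   # enable GDRCOPY for small GPU buffers (gdr build)
<<< end of module environment sketch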

[mvapich2-2.3.4 error: failure to allocate small memory]
>>> begin of error message
Source file: src/gromacs/utility/smalloc.cpp (line 226)
MPI rank:    3 (out of 8)

Fatal error:
Not enough memory. Failed to realloc 308080 bytes for nbs->cell,
nbs->cell=5206b8d0
(called from file [...]/nbnxn_grid.cpp, line 1502)
<<< end of error message
Description of the error:
- The job crashes at random when an arbitrary MPI rank fails to allocate
memory. Jobs do sometimes run to completion, which makes the error
unpredictable.
- The benchRIB input is rather small, using only about 10 GB of host memory on
the Skylake node and roughly 1.5 GB per GPU, as shown in the attached file.
- If memory were genuinely insufficient, GROMACS would return the message
above with a very large negative value, for example: 'Failed to reallocate
-12415232232 bytes...'
- OpenMPI works reliably without issue, so we suspect a memory allocation
problem related to mvapich2 (a simple per-node memory check is sketched below).
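To help rule out genuine memory pressure, a minimal per-node diagnostic can be
added before the mdrun line; a sketch using only standard Linux tools:
>>> begin of diagnostic sketch
# print hostname, free host memory and the locked-memory limit on each node
srun --ntasks-per-node=1 bash -c 'hostname; free -g; ulimit -l'
<<< end of diagnostic sketch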

[mvapich2-gdr-2.3.4 error: segmentation fault under srun]
>>> begin of error message
[gpu31:mpi_rank_2][error_sighandler] Caught error: Segmentation fault
(signal 11)
srun: error: gpu31: tasks 0-7: Segmentation fault (core dumped)
<<< end of error message
Description of the error:
- The Slurm job crashes immediately at startup; srun does not appear to play
well with mvapich2-gdr (an alternative mpirun_rsh launch is sketched below).
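To isolate whether the srun/PMI2 integration is at fault, a launch via
mpirun_rsh (shipped with MVAPICH2) could be tried instead; a sketch, with the
hostfile and rank placement shown only for illustration:
>>> begin of mpirun_rsh sketch
# ./hosts lists the node name once per rank, e.g. eight lines containing gpu31
mpirun_rsh -np 8 -hostfile ./hosts \
    MV2_USE_CUDA=1 MV2_ENABLE_AFFINITY=0 \
    ./gmx_mpi mdrun -s ./benchRIB.tpr -nsteps 2000 -notunepme -noconfout -pin on -v
<<< end of mpirun_rsh sketch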

The two issues above were also observed with the latest version of gromacs
(2020.3) built with gcc/8.3.0.
We would appreciate your insights into this matter.

Regards.
Viet-Duc
-------------- next part --------------
A non-text attachment was scrubbed...
Name: gmx-mv2.JPG
Type: image/jpeg
Size: 55100 bytes
Desc: not available
URL: <http://mailman.cse.ohio-state.edu/pipermail/mvapich-discuss/attachments/20200828/35dfbbd7/attachment-0001.jpe>

