Hi, Viet-Duc.

Thanks. We will generate a new GDR RPM and provide it to you. That will help you circumvent the issue with SLURM/PMI2.


Unfortunately, pmi1 is not possible since the available options are none/openmpi/pmi2.
Per your request, below is the output of 'mpiname -a':

MVAPICH2-GDR 2.3.4 Thu June 4 22:00:00 EST 2020 ch3:mrail

CC: gcc -I/apps/cuda/10.1/include   -DNDEBUG -DNVALGRIND -O2
CXX: g++ -I/apps/cuda/10.1/include  -DNDEBUG -DNVALGRIND -O2
F77: gfortran -I/apps/cuda/10.1/include  -O2
FC: gfortran -I/apps/cuda/10.1/include  -O2


According to our tests, LAMMPS/Tensorflow work well with mv2-gdr/pmi2, so this is a rather subtle problem.
It could be that our Slurm version (18.08) is two releases behind the latest one (20.02).


For the srun issue you are seeing at startup, is it possible to try with PMI1 instead? In addition, can you provide us with the output of the following for the MVAPICH2 version you are using: bin/mpiname -a?

Thanks for taking time to test and confirm the issue with gromacs.
We tested with 2019.6 (gcc/4.8.5) but the same error was observed with the latest version 2020.3 (gcc/8.3.0) as you stated.
Rolling back further to version 2016.4 didn't help either.

Unfortunately, tuning MV2_CUDA_BLOCK_SIZE didn't circumvent the issue on either the Ivy Bridge or the Skylake CPUs at our disposal.
The failure rate is highest when the cpu:gpu ratio is lowered; for instance, 8:1 is the minimum for benchRIB.tpr.
I will reach out to the GROMACS forum regarding their memory allocation routine.

$ srun --mpi=list
srun: MPI types are...
srun: none
srun: openmpi
srun: pmi2

Using 'pmi2' explicitly, the following error was observed in addition to the segmentation fault:
srun: error: eio_message_socket_accept: slurm_receive_msg[]: Zero Bytes were transmitted or received
From gdb:
#0  0x0000000001d13b68 in debug ()
#1  0x00002ba0bbf37161 in ?? ()
#2  0x00002ba0bbfecd3c in ?? ()
#3  0x0000000000000000 in ?? ()
So the backtrace is not really helpful. For now, we have settled on the without-slurm RPM.
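
For reference, the plugin check and the explicit 'pmi2' launch described above could be scripted roughly as follows. This is only a sketch: has_mpi_type is a hypothetical helper name, and it assumes the list is printed in the "srun: <type>" form shown above (2>&1 is used in case srun writes the list to stderr).

```shell
# has_mpi_type is a hypothetical helper (not part of Slurm).
# It reads the output of `srun --mpi=list` on stdin and succeeds only if
# the given plugin type appears as a full line of the form "srun: <type>".
has_mpi_type() {
    grep -Fxq "srun: ${1}"
}

# Possible usage on a cluster (commented out, since it needs srun and gmx_mpi):
# if srun --mpi=list 2>&1 | has_mpi_type pmi2; then
#     srun --mpi=pmi2 gmx_mpi mdrun -s ./benchRIB.tpr -nsteps 2000 -notunepme -noconfout -pin on -v
# fi
```

With the list shown above, the check would succeed for pmi2 and fail for pmi1.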


We were able to reproduce the "Not enough memory" issue you were seeing using MVAPICH2 with GROMACS 2020.3 and GCC/8.3.0. We were only able to reproduce this on x86-based systems with Skylake. Can you set the following at run-time and let us know whether it resolves the memory issue: MV2_CUDA_BLOCK_SIZE=8388608? However, we were unable to reproduce the segfault you were seeing at startup with MVAPICH2-GDR + srun. Can you let us know which version of PMI you are using with the MVAPICH2-GDR run (i.e. PMIv1 or PMIv2)?
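
As a sketch of how the suggested setting could be applied, assuming the variable is exported in the job script before the srun line (the launch line in the comment is the one from the job script in this thread):

```shell
# 8388608 bytes is 8 MiB (8 * 1024 * 1024).
export MV2_CUDA_BLOCK_SIZE=$((8 * 1024 * 1024))
echo "MV2_CUDA_BLOCK_SIZE=${MV2_CUDA_BLOCK_SIZE}"

# Then launch as before (commented out, since it needs srun and gmx_mpi):
# srun gmx_mpi mdrun -s ./benchRIB.tpr -nsteps 2000 -notunepme -noconfout -pin on -v
```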

When testing the latest version of mvapich2/mvapich2-gdr (2.3.4) with gromacs (2019.6), we encountered two peculiar issues.
Below are our setup and build environment. We hope they help with reproducing the issues.

- Xeon Gold 6230 (Skylake)
- 2 x Tesla V100 (PIX connection)

- CentOS Linux release 7.4.1708
- slurm 18.08.6
- gcc/4.8.5, cuda/10.1
- mvapich2: ./configure --with-pm=slurm --with-pmi=pmi2 --with-slurm=/usr/local --enable-cuda  --with-cuda=/apps/cuda/10.1
- mvapich2-gdr: mvapich2-gdr-mcast.cuda10.1.mofed4.4.gnu4.8.5.slurm-2.3.4-1.el7.x86_64.rpm (from mvapich2 homepage)
- reference: openmpi (3.1.5)

[gromacs] 2019.6 is the last version that can be built with gcc/4.8.5
$ tar xzvf gromacs-2019.6.tar.gz
$ cd gromacs-2019.6
$ mkdir build
$ cd build
$ make
The resulting binary, gmx_mpi, is located in the ./bin directory under the gromacs source directory.

[input files]
Inputs are taken from MPIBPC: https://www.mpibpc.mpg.de/grubmueller/bench (benchRIB, 2 M atoms, ribosome in water)

[job scripts]
Important MV2_* variables such as MV2_USE_CUDA/MV2_USE_GDRCOPY are properly set via environment modules.
>>> begin of slurm script
#!/usr/bin/env bash
#SBATCH --partition=skl_v100_2
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=8
#SBATCH --gres=gpu:2
#SBATCH --job-name=test-mv2
#SBATCH --error=%j.stderr
#SBATCH --output=%j.stdout
#SBATCH --time=24:00:00
#SBATCH --comment=gromacs

# use gromacs internal affinity setting.

module load gcc/4.8.5 cuda/10.1
module load cudampi/mvapich2-2.3.4 # or cudampi/mvapich2-gdr-2.3.4, respectively.

srun gmx_mpi mdrun -s ./benchRIB.tpr -nsteps 2000 -notunepme -noconfout -pin on -v
<<< end of slurm script

[mvapich2-2.3.4 error: failure to allocate small memory]
>>> begin of error message
Source file: src/gromacs/utility/smalloc.cpp (line 226)
MPI rank:    3 (out of 8)

Fatal error:
Not enough memory. Failed to realloc 308080 bytes for nbs->cell, nbs->cell=5206b8d0
(called from file [...]/nbnxn_grid.cpp, line 1502)
<<< end of error message
Description of error:
- Crashes occur randomly, whenever an arbitrary MPI rank fails to allocate memory. Jobs do run sometimes, making this error unpredictable.
- The input benchRIB.tpr is rather small, taking up only 10 GB on the Skylake host and about 1.5 GB per GPU, as shown in the attached file.
- If memory is truly insufficient, gromacs will return the above message with a very large negative value, for example: 'Failed to reallocate -12415232232 bytes...'
- OpenMPI works reliably without issue. Thus we think that there is a memory allocation issue related to mvapich2.
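
Since the crash is intermittent, a rough failure count over repeated submissions may help when comparing settings. The following is only a sketch: count_alloc_failures is a hypothetical helper, and it assumes each job writes its stderr to a <jobid>.stderr file in one directory, as in the job script above.

```shell
# count_alloc_failures is a hypothetical helper.
# It scans <dir>/*.stderr and prints "<failed>/<total>", where <failed> is
# the number of files containing the "Not enough memory" message.
count_alloc_failures() {
    dir="${1:-.}"
    total=0
    failed=0
    for f in "$dir"/*.stderr; do
        # Skip the unexpanded pattern when no stderr files exist.
        [ -e "$f" ] || continue
        total=$((total + 1))
        if grep -q "Not enough memory" "$f"; then
            failed=$((failed + 1))
        fi
    done
    echo "${failed}/${total}"
}

# Possible usage after submitting the same job several times:
# count_alloc_failures .
```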

[mvapich2-gdr-2.3.4 error: srun segmentation]
>>> begin of error message
[gpu31:mpi_rank_2][error_sighandler] Caught error: Segmentation fault (signal 11)
srun: error: gpu31: tasks 0-7: Segmentation fault (core dumped)
<<< end of error message
Description of error:
- Slurm job crashes immediately at startup. Srun does not play well with mvapich2-gdr.

The two issues above were also observed when using the latest version of gromacs (2020.3) and gcc/8.3.0.
We appreciate your insights into this matter.

