[mvapich-discuss] Segfault on HDF5 1.10.4 "make check" with MVAPICH 2.3 compiled by GCC 8.2

Ryan Novosielski novosirj at rutgers.edu
Thu Feb 21 20:20:50 EST 2019


Of course, sorry — I didn’t think to include that.

I have no particular reason to believe that this is related to GPFS; I only mention it on the off chance that it matters. I can try on XFS as well, which is probably easier than you getting access to GPFS to rule it out.

For both, I built in a separate directory outside the source tree (an out-of-source, or VPATH, build).
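For anyone trying to reproduce this, the out-of-source approach can be sketched roughly as follows. This is only an illustration: the helper name `out_of_source_configure` and the directory names in the usage comment are hypothetical, not part of either package's build system.

```shell
#!/bin/sh
# Hypothetical helper: run a package's configure script from a separate
# build directory so that generated files stay out of the source tree.
#   $1 = source directory (must contain ./configure)
#   $2 = build directory (created if missing)
#   remaining arguments are passed to configure unchanged
out_of_source_configure() {
    src_dir=$1
    build_dir=$2
    shift 2
    # Resolve the source path before changing directory.
    src_abs=$(cd "$src_dir" && pwd) || return 1
    mkdir -p "$build_dir" || return 1
    # Use a subshell so the caller's working directory is left untouched.
    ( cd "$build_dir" && "$src_abs/configure" "$@" )
}

# Example (paths are placeholders):
#   out_of_source_configure ./mvapich2-2.3 ./mvapich2-2.3-build --with-pm=slurm
```

The scripts below do the same thing by hand: they are run from inside the build directory and call `../<source>/configure`.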

MVAPICH2:

#!/bin/sh

module purge
module load gcc/8.2
module list

../mvapich2-2.3/configure --with-pmi=pmi2 --with-pm=slurm --prefix=/opt/sw/packages/gcc-8_2/mvapich2/2.3 && \
        make -j32

HDF5:

#!/bin/sh

module purge
module load gcc/8.2 mvapich2/2.3
module list

RUNPARALLEL="srun --mpi=pmi2 --mem=12G -p main -t 1:00:00 -n6 -N1" \
CC=mpicc F9X=mpifort CXX=mpicxx \
../hdf5-1.10.4/configure \
        --prefix=/opt/sw/packages/gcc-8_2/mvapich2-2_3/hdf5/1.10.4 \
        --enable-fortran --enable-build-mode=production --enable-parallel \
        && make -j32 && make check

Thank you!

--
____
|| \\UTGERS,       |---------------------------*O*---------------------------
||_// the State     |         Ryan Novosielski - novosirj at rutgers.edu
|| \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus
||  \\    of NJ     | Office of Advanced Research Computing - MSB C630, Newark
    `'

On Feb 21, 2019, at 19:55, Subramoni, Hari <subramoni.1 at osu.edu> wrote:

Hi, Ryan.

Can you please let us know how you configured MVAPICH2 and how you built and ran HDF5 with the said version of MVAPICH2?

Unfortunately, we do not have GPFS locally. However, let me try to reproduce the problem locally.

Thx,
Hari.

-----Original Message-----
From: mvapich-discuss <mvapich-discuss-bounces at cse.ohio-state.edu> On Behalf Of Ryan Novosielski
Sent: Thursday, February 21, 2019 3:40 PM
To: mvapich-discuss at cse.ohio-state.edu <mvapich-discuss at mailman.cse.ohio-state.edu>
Subject: [mvapich-discuss] Segfault on HDF5 1.10.4 "make check" with MVAPICH 2.3 compiled by GCC 8.2

Hi there,

I’m only seeing this particular failure with GCC 8.2, MVAPICH 2.3 (that’s the only version I’ve tried on though), and HDF5 1.10.4. GCC 4.8 and 7.4 both allow the make check on HDF5 to pass properly. All of this is on CentOS 7.5, compiling on GPFS 4.2 storage (I’ve seen some screwy FS-dependent things lately, so I mention it).

Below is what happens. Is there any more data I can gather to help with this? It appears to hang for almost exactly 20 minutes each time before something kills it, whereas a successful run takes only 2-3 seconds. Note that running a “sleep 1800” (30 minutes) does not do this. Possibly related: the combination of OpenMPI 3.1.3 and GCC 4.8 (but not 7.4 or 8.2) does a similar thing, but on the t_mpi test rather than t_filters_parallel, and without mentioning signal 11 (that might just be a difference in the way they present errors; I don’t know):

I’m launching the tests via srun via make check with these options:

RUNPARALLEL = srun --mpi=pmi2 --mem=12G -p main -t 1:00:00 -n6 -N1
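One way to gather more data might be to rerun only the failing test by hand with the same launcher and with core dumps enabled, so that a backtrace can be pulled from the core file afterwards. The sketch below only prints such a command rather than running it. It assumes that the core-size limit set in the submitting shell is propagated to the job's tasks (site configuration can change that) and that the test binary lives in the testpar build directory; the function name `print_rerun_command` is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print (do not execute) a command line that reruns a
# single parallel test under a given launcher with core dumps enabled.
#   $1 = launcher command, e.g. the RUNPARALLEL value
#   $2 = test binary, e.g. ./t_filters_parallel
print_rerun_command() {
    launcher=$1
    test_bin=$2
    # "ulimit -c unlimited" lifts the core-file size limit in the calling
    # shell; whether the tasks inherit it is an assumption to verify.
    printf 'ulimit -c unlimited && %s %s\n' "$launcher" "$test_bin"
}

# Show the command for the launcher options used in this report.
print_rerun_command "srun --mpi=pmi2 --mem=12G -p main -t 1:00:00 -n6 -N1" ./t_filters_parallel
```

The printed line would be run from the testpar directory of the HDF5 build tree.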

HDF5 make check when it gets to the sticking point:

Testing  t_filters_parallel
============================
t_filters_parallel  Test Log
============================
srun: job 84117363 queued and waiting for resources
srun: job 84117363 has been allocated resources
[slepner063.amarel.rutgers.edu:mpi_rank_0][error_sighandler] Caught error: Segmentation fault (signal 11)
srun: error: slepner063: task 0: Segmentation fault
srun: error: slepner063: tasks 1-3: Alarm clock
0.01user 0.01system 20:01.44elapsed 0%CPU (0avgtext+0avgdata 5144maxresident)k
0inputs+0outputs (0major+1524minor)pagefaults 0swaps
make[4]: *** [t_filters_parallel.chkexe_] Error 1
make[4]: Leaving directory `/scratch/novosirj/install-files/hdf5-1.10.4-build-gcc-4.8-mvapich2-2.3/testpar'
make[3]: *** [build-check-p] Error 1
make[3]: Leaving directory `/scratch/novosirj/install-files/hdf5-1.10.4-build-gcc-4.8-mvapich2-2.3/testpar'
make[2]: *** [test] Error 2
make[2]: Leaving directory `/scratch/novosirj/install-files/hdf5-1.10.4-build-gcc-4.8-mvapich2-2.3/testpar'
make[1]: *** [check-am] Error 2
make[1]: Leaving directory `/scratch/novosirj/install-files/hdf5-1.10.4-build-gcc-4.8-mvapich2-2.3/testpar'
make: *** [check-recursive] Error 1
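
A note on reading the log above: "Alarm clock" is how a process killed by SIGALRM is normally reported on Linux, and the elapsed time of 20:01.44 is just over 1200 seconds. That would be consistent with a 20-minute watchdog timer expiring in the remaining ranks after task 0 crashed, rather than with an outside process killing the job. Whether the HDF5 test harness is what arms that timer is an assumption worth checking against its sources. The small sketch below (the function name and the 1200-second figure are illustrative) converts the elapsed field to seconds for comparison.

```shell
#!/bin/sh
# Hypothetical helper: convert an elapsed-time field as printed by time(1),
# such as "20:01.44" (m:ss.cc) or "1:00:00" (h:mm:ss), to whole seconds.
elapsed_to_seconds() {
    echo "$1" | awk -F: '{
        s = 0
        # Fold the colon-separated fields left to right in base 60.
        for (i = 1; i <= NF; i++) s = s * 60 + $i
        printf "%d\n", s
    }'
}

# Assumed watchdog length (20 minutes), used only for comparison.
watchdog=1200
elapsed=$(elapsed_to_seconds "20:01.44")
echo "elapsed=${elapsed}s, watchdog=${watchdog}s"
```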


