[mvapich-discuss] Segfault on HDF5 1.10.4 "make check" with MVAPICH 2.3 compiled by GCC 8.2

Subramoni, Hari subramoni.1 at osu.edu
Thu Feb 21 19:55:21 EST 2019


Hi, Ryan.

Can you please let us know how you configured MVAPICH2, and how you built and ran HDF5 with that version of MVAPICH2?

Unfortunately, we do not have GPFS here. However, I will try to reproduce the problem on our local systems.

Thx,
Hari.

-----Original Message-----
From: mvapich-discuss <mvapich-discuss-bounces at cse.ohio-state.edu> On Behalf Of Ryan Novosielski
Sent: Thursday, February 21, 2019 3:40 PM
To: mvapich-discuss at cse.ohio-state.edu <mvapich-discuss at mailman.cse.ohio-state.edu>
Subject: [mvapich-discuss] Segfault on HDF5 1.10.4 "make check" with MVAPICH 2.3 compiled by GCC 8.2

Hi there,

I’m seeing this particular failure only with the combination of GCC 8.2, MVAPICH2 2.3 (though that’s the only MVAPICH2 version I’ve tried), and HDF5 1.10.4. With GCC 4.8 and 7.4, HDF5’s make check passes. All of this is on CentOS 7.5, compiling on GPFS 4.2 storage (I’ve seen some screwy FS-dependent things lately, so I mention it).

Below is what happens. Is there any more data I can gather to help with this? The test appears to hang for almost exactly 20 minutes each time before something whacks it; a successful run takes only 2-3 seconds. Note that running a “sleep 1800” (30 minutes) does not do this. Whether or not it is related, the combination of OpenMPI 3.1.3 and GCC 4.8 (but not 7.4 or 8.2) does something similar, but on the t_mpi test rather than t_filters_parallel, and without mentioning signal 11 (that might just be a difference in the way they present errors; I don’t know).
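A note on the 20-minute pattern: a task status of “Alarm clock” (as srun reports for tasks 1-3 in the log below) normally means the process was killed by SIGALRM, the signal delivered when an alarm(2) timer expires, and 1200 seconds is exactly 20 minutes. One plausible reading, an assumption rather than a confirmed diagnosis, is that a per-test watchdog timer fires while the ranks are hung. HDF5’s test library is understood to arm such a timer with a 1200-second default that can be overridden through the HDF5_ALARM_SECONDS environment variable; that is worth verifying against h5test.c in the HDF5 source tree. The minimal POSIX-only sketch below reproduces the mechanism with a hypothetical hung child process and a 1-second alarm standing in for 1200 seconds:

```python
import signal
import subprocess
import sys

# Hypothetical stand-in for a hung MPI rank: arm a SIGALRM watchdog, then
# block. The 1-second alarm is a placeholder for a 1200-second (20-minute) one.
HUNG_CHILD = (
    "import signal, time\n"
    "signal.alarm(1)   # watchdog: kernel sends SIGALRM after 1 s\n"
    "time.sleep(60)    # simulate a rank stuck waiting on its peers\n"
)


def run_with_watchdog(code: str) -> int:
    """Run `code` in a child Python interpreter and return its exit status.

    On POSIX, subprocess reports death by signal N as returncode -N.
    Python installs no SIGALRM handler by default, so the signal's default
    action (terminate the process) applies when the alarm fires.
    """
    proc = subprocess.run([sys.executable, "-c", code])
    return proc.returncode


if __name__ == "__main__":
    status = run_with_watchdog(HUNG_CHILD)
    if status < 0:
        print(f"child killed by {signal.Signals(-status).name}")
    else:
        print(f"child exited normally with status {status}")
```

Run as a script, this should report the child as killed by SIGALRM after about one second. A rank that hangs until a real watchdog fires would be terminated the same way, which would be consistent with the “Alarm clock” status, though it does not explain why the ranks hang in the first place.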

I’m launching the tests through make check, which runs them with srun using these options:

RUNPARALLEL = srun --mpi=pmi2 --mem=12G -p main -t 1:00:00 -n6 -N1

Here is the HDF5 make check output when it reaches the sticking point:

Testing  t_filters_parallel
============================
 t_filters_parallel  Test Log
============================
srun: job 84117363 queued and waiting for resources
srun: job 84117363 has been allocated resources
[slepner063.amarel.rutgers.edu:mpi_rank_0][error_sighandler] Caught error: Segmentation fault (signal 11)
srun: error: slepner063: task 0: Segmentation fault
srun: error: slepner063: tasks 1-3: Alarm clock
0.01user 0.01system 20:01.44elapsed 0%CPU (0avgtext+0avgdata 5144maxresident)k
0inputs+0outputs (0major+1524minor)pagefaults 0swaps
make[4]: *** [t_filters_parallel.chkexe_] Error 1
make[4]: Leaving directory `/scratch/novosirj/install-files/hdf5-1.10.4-build-gcc-4.8-mvapich2-2.3/testpar'
make[3]: *** [build-check-p] Error 1
make[3]: Leaving directory `/scratch/novosirj/install-files/hdf5-1.10.4-build-gcc-4.8-mvapich2-2.3/testpar'
make[2]: *** [test] Error 2
make[2]: Leaving directory `/scratch/novosirj/install-files/hdf5-1.10.4-build-gcc-4.8-mvapich2-2.3/testpar'
make[1]: *** [check-am] Error 2
make[1]: Leaving directory `/scratch/novosirj/install-files/hdf5-1.10.4-build-gcc-4.8-mvapich2-2.3/testpar'
make: *** [check-recursive] Error 1
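The “almost exactly 20 minutes” can be checked against the `0.01user 0.01system 20:01.44elapsed` line in the log above, which appears to be the default output format of GNU time (its elapsed field is wall-clock time as [hours:]minutes:seconds). The small Python sketch below converts it: 20:01.44 is 1201.44 seconds, about 1.4 s past 1200 s and well inside the 3600-second limit set by `-t 1:00:00` in RUNPARALLEL. That suggests, but does not prove, that the srun time limit is not what ended the run. The 1200-second constant is a hypothesis, not a value taken from HDF5 or Slurm:

```python
import re

# Values taken from the report above; SUSPECTED_WATCHDOG_S is a hypothesis.
SRUN_TIME_LIMIT_S = 1 * 3600   # "-t 1:00:00" in RUNPARALLEL
SUSPECTED_WATCHDOG_S = 1200    # hypothetical 20-minute test timeout
TIME_LINE = "0.01user 0.01system 20:01.44elapsed 0%CPU (0avgtext+0avgdata 5144maxresident)k"


def elapsed_seconds(time_line: str) -> float:
    """Convert the '[h:]mm:ss.ss' value just before 'elapsed' into seconds."""
    match = re.search(r"(\d+(?::\d+){1,2}(?:\.\d+)?)elapsed", time_line)
    if match is None:
        raise ValueError("no 'elapsed' field in line")
    seconds = 0.0
    for part in match.group(1).split(":"):
        seconds = seconds * 60 + float(part)  # fold hours -> minutes -> seconds
    return seconds


if __name__ == "__main__":
    t = elapsed_seconds(TIME_LINE)
    print(f"elapsed: {t:.2f} s")
    print(f"past the 20-minute mark by: {t - SUSPECTED_WATCHDOG_S:.2f} s")
    print(f"headroom left on the srun limit: {SRUN_TIME_LIMIT_S - t:.2f} s")
```

The regex requires at least one colon, so the `0.01user` and `0.01system` fields are not mistaken for the elapsed time.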


--
____
|| \\UTGERS,  	 |---------------------------*O*---------------------------
||_// the State	 |         Ryan Novosielski - novosirj at rutgers.edu
|| \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus
||  \\    of NJ	 | Office of Advanced Research Computing - MSB C630, Newark
     `'



