[mvapich-discuss] Segfault on HDF5 1.10.4 "make check" with MVAPICH 2.3 compiled by GCC 8.2

Ryan Novosielski novosirj at rutgers.edu
Thu Feb 21 15:39:57 EST 2019


Hi there,

I’m only seeing this particular failure with GCC 8.2, MVAPICH 2.3 (the only MVAPICH version I’ve tried, though), and HDF5 1.10.4. With GCC 4.8 and 7.4, the make check on HDF5 passes properly. All of this is on CentOS 7.5, compiling on GPFS 4.2 storage (I’ve seen some screwy FS-dependent things lately, so I mention it).

Below is what happens. Is there any more data I can gather to help with this? It appears to hang for almost exactly 20 minutes each time, and then something kills it. A successful run takes only 2-3 seconds. Note that running a “sleep 1800” (30 minutes) does not do this. Related or not, the combination of OpenMPI 3.1.3 and GCC 4.8 (but not 7.4 or 8.2) does a similar thing, but on the t_mpi test, not t_filters_parallel, and without mentioning signal 11 (that might just be a difference in the way they present errors; I don’t know).

I’m launching the tests with make check, which runs them through srun with these options:

RUNPARALLEL = srun --mpi=pmi2 --mem=12G -p main -t 1:00:00 -n6 -N1

Output of the HDF5 make check when it reaches the sticking point:

Testing  t_filters_parallel
============================
 t_filters_parallel  Test Log
============================
srun: job 84117363 queued and waiting for resources
srun: job 84117363 has been allocated resources
[slepner063.amarel.rutgers.edu:mpi_rank_0][error_sighandler] Caught error: Segmentation fault (signal 11)
srun: error: slepner063: task 0: Segmentation fault
srun: error: slepner063: tasks 1-3: Alarm clock
0.01user 0.01system 20:01.44elapsed 0%CPU (0avgtext+0avgdata 5144maxresident)k
0inputs+0outputs (0major+1524minor)pagefaults 0swaps
make[4]: *** [t_filters_parallel.chkexe_] Error 1
make[4]: Leaving directory `/scratch/novosirj/install-files/hdf5-1.10.4-build-gcc-4.8-mvapich2-2.3/testpar'
make[3]: *** [build-check-p] Error 1
make[3]: Leaving directory `/scratch/novosirj/install-files/hdf5-1.10.4-build-gcc-4.8-mvapich2-2.3/testpar'
make[2]: *** [test] Error 2
make[2]: Leaving directory `/scratch/novosirj/install-files/hdf5-1.10.4-build-gcc-4.8-mvapich2-2.3/testpar'
make[1]: *** [check-am] Error 2
make[1]: Leaving directory `/scratch/novosirj/install-files/hdf5-1.10.4-build-gcc-4.8-mvapich2-2.3/testpar'
make: *** [check-recursive] Error 1
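
One note on the "Alarm clock" lines in the log above: that is the standard description of SIGALRM, the signal that kills a process when an alarm() timer expires and no handler is installed. A hang that ends after almost exactly 20 minutes would fit a timer of about 1200 seconds armed somewhere in the test program, but that is an assumption and not something confirmed in this report. The sketch below only reproduces the mechanism with a 1-second timer; the names CHILD and run_until_alarm are hypothetical and unrelated to HDF5 or MVAPICH.

```python
import signal
import subprocess
import sys

# Child process: arm a 1-second alarm, then block for much longer.
# No SIGALRM handler is installed, so the default action (terminate) applies.
CHILD = "import signal, time; signal.alarm(1); time.sleep(30)"


def run_until_alarm():
    """Run the child and return (return code, signal description or None)."""
    proc = subprocess.run([sys.executable, "-c", CHILD])
    # On POSIX, subprocess reports death by signal N as return code -N.
    description = None
    if proc.returncode < 0:
        description = signal.strsignal(-proc.returncode)
    return proc.returncode, description


if __name__ == "__main__":
    code, description = run_until_alarm()
    print("return code:", code)
    print("signal description:", description)
```

On a POSIX system the return code should be the negated SIGALRM number, and the description is the text a launcher such as srun or a shell would typically print for that task. This needs Python 3.8 or later for signal.strsignal.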


--
____
|| \\UTGERS,  	 |---------------------------*O*---------------------------
||_// the State	 |         Ryan Novosielski - novosirj at rutgers.edu
|| \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus
||  \\    of NJ	 | Office of Advanced Research Computing - MSB C630, Newark
     `'
