[mvapich-discuss] Segfault on HDF5 1.10.4 "make check" with MVAPICH 2.3 compiled by GCC 8.2

Ryan Novosielski novosirj at rutgers.edu
Fri Mar 22 14:16:46 EDT 2019


I don’t think I ultimately saw a response to this. Any feedback?

I suppose I shall try with MVAPICH2 2.3.1 in the meantime.
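
A quick check on the timing in the log quoted below: the elapsed time reported there (20:01.44) is just over 1200 seconds, and "Alarm clock" is how SIGALRM gets reported, so the surviving ranks look like they were killed by a 20-minute alarm timer rather than by the 1-hour limit given to srun. Where that timer is set is not confirmed here; the test program itself seems the likeliest candidate, but that is an assumption. A minimal sketch of the arithmetic, where `elapsed_to_seconds` is a helper name made up for illustration:

```shell
#!/bin/sh
# Hypothetical helper: convert a GNU time "elapsed" field
# (m:ss.cc or h:mm:ss) into whole seconds, truncating fractions.
elapsed_to_seconds() {
    echo "$1" | awk -F: '{
        s = 0
        for (i = 1; i <= NF; i++) s = s * 60 + $i
        printf "%d\n", s
    }'
}

# Elapsed value copied from the failing t_filters_parallel run.
secs=$(elapsed_to_seconds "20:01.44")
echo "elapsed: ${secs}s"

# Assumption: a 1200 s (20 min) alarm; allow a few seconds of launch overhead.
if [ "$secs" -ge 1200 ] && [ "$secs" -le 1210 ]; then
    echo "consistent with a 20-minute alarm"
fi
```

If that reading is right, the segfault on task 0 is the real failure and the other ranks are only waiting on it until the alarm fires.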

> On Feb 21, 2019, at 8:20 PM, Ryan Novosielski <novosirj at rutgers.edu> wrote:
> 
> Of course, sorry — I didn’t think to include that. 
> 
> I have no particular reason to believe that this is related to GPFS; I only mention it on the off chance that it matters. I can also try on XFS to rule it out, which is probably easier than you getting access to GPFS. 
> 
> For both, I did an out-of-tree build (configuring and building in a separate directory outside the source tree). 
> 
> MVAPICH2:
> 
> #!/bin/sh
> 
> module purge
> module load gcc/8.2
> module list
> 
> ../mvapich2-2.3/configure --with-pmi=pmi2 --with-pm=slurm --prefix=/opt/sw/packages/gcc-8_2/mvapich2/2.3 && \
>         make -j32
> 
> HDF5:
> 
> #!/bin/sh
> 
> module purge
> module load gcc/8.2 mvapich2/2.3
> module list
> 
> RUNPARALLEL="srun --mpi=pmi2 --mem=12G -p main -t 1:00:00 -n6 -N1" CC=mpicc F9X=mpifort CXX=mpicxx ../hdf5-1.10.4/configure --prefix=/opt/sw/packages/gcc-8_2/mvapich2-2_3/hdf5/1.10.4 --enable-fortran --enable-build-mode=production --enable-parallel \
>         && make -j32 && make check
> 
> Thank you!
> 
> --
> ____
> || \\UTGERS,       |---------------------------*O*---------------------------
> ||_// the State     |         Ryan Novosielski - novosirj at rutgers.edu
> || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus
> ||  \\    of NJ     | Office of Advanced Research Computing - MSB C630, Newark
>     `'
> 
> On Feb 21, 2019, at 19:55, Subramoni, Hari <subramoni.1 at osu.edu> wrote:
> 
>> Hi, Ryan.
>> 
>> Can you please let us know how you configured MVAPICH2 and how you built and ran HDF5 with the said version of MVAPICH2?
>> 
>> Unfortunately, we do not have GPFS locally. However, let me try to reproduce the problem locally.
>> 
>> Thx,
>> Hari.
>> 
>> -----Original Message-----
>> From: mvapich-discuss <mvapich-discuss-bounces at cse.ohio-state.edu> On Behalf Of Ryan Novosielski
>> Sent: Thursday, February 21, 2019 3:40 PM
>> To: mvapich-discuss at cse.ohio-state.edu <mvapich-discuss at mailman.cse.ohio-state.edu>
>> Subject: [mvapich-discuss] Segfault on HDF5 1.10.4 "make check" with MVAPICH 2.3 compiled by GCC 8.2
>> 
>> Hi there,
>> 
>> I’m only seeing this particular failure with GCC 8.2, MVAPICH 2.3 (that’s the only version I’ve tried on though), and HDF5 1.10.4. GCC 4.8 and 7.4 both allow the make check on HDF5 to pass properly. All of this is on CentOS 7.5, compiling on GPFS 4.2 storage (I’ve seen some screwy FS-dependent things lately, so I mention it).
>> 
>> The below is what happens. Is there any more data I can gather to help with this? It appears to hang for almost exactly 20 minutes each time before something kills it. A successful run takes only 2-3 seconds. Note that running a “sleep 1800” (30 minutes) does not do this. Possibly related: the combination of OpenMPI 3.1.3 and GCC 4.8 (but not 7.4 or 8.2) does a similar thing, but on the t_mpi test, not t_filters_parallel, and without mentioning signal 11 (that might just be a difference in the way they present errors; I don’t know):
>> 
>> I’m launching the tests via srun via make check with these options:
>> 
>> RUNPARALLEL = srun --mpi=pmi2 --mem=12G -p main -t 1:00:00 -n6 -N1
>> 
>> HDF5 make check when it gets to the sticking point:
>> 
>> Testing  t_filters_parallel
>> ============================
>> t_filters_parallel  Test Log
>> ============================
>> srun: job 84117363 queued and waiting for resources
>> srun: job 84117363 has been allocated resources
>> [slepner063.amarel.rutgers.edu:mpi_rank_0][error_sighandler] Caught error: Segmentation fault (signal 11)
>> srun: error: slepner063: task 0: Segmentation fault
>> srun: error: slepner063: tasks 1-3: Alarm clock
>> 0.01user 0.01system 20:01.44elapsed 0%CPU (0avgtext+0avgdata 5144maxresident)k
>> 0inputs+0outputs (0major+1524minor)pagefaults 0swaps
>> make[4]: *** [t_filters_parallel.chkexe_] Error 1
>> make[4]: Leaving directory `/scratch/novosirj/install-files/hdf5-1.10.4-build-gcc-4.8-mvapich2-2.3/testpar'
>> make[3]: *** [build-check-p] Error 1
>> make[3]: Leaving directory `/scratch/novosirj/install-files/hdf5-1.10.4-build-gcc-4.8-mvapich2-2.3/testpar'
>> make[2]: *** [test] Error 2
>> make[2]: Leaving directory `/scratch/novosirj/install-files/hdf5-1.10.4-build-gcc-4.8-mvapich2-2.3/testpar'
>> make[1]: *** [check-am] Error 2
>> make[1]: Leaving directory `/scratch/novosirj/install-files/hdf5-1.10.4-build-gcc-4.8-mvapich2-2.3/testpar'
>> make: *** [check-recursive] Error 1
>> 
>> 
