[Mvapich-discuss] [mvapich-discuss] MVAPICH2 2.3.5 and HDF5 1.10.7 parallel tests
Mark Dixon
mark.c.dixon at durham.ac.uk
Wed Mar 10 10:52:06 EST 2021
Hi both,
Thanks so much for taking a look at this, it's really appreciated.
Unfortunately, I still seem to be having difficulties, even with
mvapich2-2.3.5-2.tar.gz.
- x86_64 / centos7 / ext4
testphdf5 still times out at the same point (after printing 6 lines of
"Testing -- multi-chunk collective chunk io (cchunk3)").
- power9 / centos7 / xfs
- power9 / centos7 / lustre
For both of these, when running "make check", t_mpi does not complete
even after leaving it running over the weekend:
foo 115598 115596 0 Mar05 pts/13 00:00:00 mpiexec -n 6 ./t_mpi
foo 115600 115599 0 Mar05 ? 00:00:00 /nobackup/users/foo/mvpi/src/hdf5-1.10.7/testpar/.libs/t_mpi
foo 115601 115599 0 Mar05 ? 00:13:44 /nobackup/users/foo/mvpi/src/hdf5-1.10.7/testpar/.libs/t_mpi
foo 115602 115599 0 Mar05 ? 00:13:39 /nobackup/users/foo/mvpi/src/hdf5-1.10.7/testpar/.libs/t_mpi
foo 115603 115599 0 Mar05 ? 00:13:43 /nobackup/users/foo/mvpi/src/hdf5-1.10.7/testpar/.libs/t_mpi
foo 115604 115599 0 Mar05 ? 00:13:46 /nobackup/users/foo/mvpi/src/hdf5-1.10.7/testpar/.libs/t_mpi
foo 115605 115599 0 Mar05 ? 00:13:29 /nobackup/users/foo/mvpi/src/hdf5-1.10.7/testpar/.libs/t_mpi
Note that this is before it hits testphdf5.
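A quick way to see where the hung ranks are stuck is to pull a backtrace from each t_mpi PID. This is a sketch, assuming gdb is installed on the node; `bt_pids` is a hypothetical helper name, and the PIDs would be the ones from the `ps` output above:

```shell
# Sketch: dump a backtrace from each apparently-hung t_mpi rank.
# Assumes gdb is available; bt_pids is a hypothetical helper name.
bt_pids() {
  for pid in "$@"; do
    echo "=== backtrace for pid $pid ==="
    # -batch exits after the commands; stderr is dropped so a dead
    # PID just produces an empty backtrace section.
    gdb -batch -p "$pid" -ex 'thread apply all bt' 2>/dev/null | head -n 40
  done
}

# e.g.: bt_pids 115600 115601 115602 115603 115604 115605
```

If all ranks are sitting in the same MPI collective inside the ROMIO/ADIO layer, that would point at the same collective-I/O path as the testphdf5 hang.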
Any ideas what I'm doing wrong, please?
Thanks,
Mark
On Wed, 3 Mar 2021, Smith, Jeff wrote:
> [EXTERNAL EMAIL]
> Hi Mark,
>
> Here is the download link for the latest tarball.
>
> https://mvapich.cse.ohio-state.edu/download/mvapich/mv2/mvapich2-2.3.5-2.tar.gz
>
> -Jeff
> ________________________________
> From: Subramoni, Hari <subramoni.1 at osu.edu>
> Sent: Wednesday, March 3, 2021 12:12 PM
> To: Mark Dixon <mark.c.dixon at durham.ac.uk>
> Cc: _ENG CSE Mvapich-Core <ENG-cse-mvapich-core at osu.edu>; Subramoni, Hari <subramoni.1 at osu.edu>
> Subject: RE: [mvapich-discuss] MVAPICH2 2.3.5 and HDF5 1.10.7 parallel tests
>
> Hi, Mark.
>
> Please accept my sincere apologies for the delay here again.
>
> It looks like we have been able to fix the issue after the MVAPICH2 2.3.5 release. If we provide you with a new tarball, could you please verify it at your end as well?
>
> Jeff - can you please provide Mark with a tarball of the latest master?
>
> Best,
> Hari.
>
> -----Original Message-----
> From: Subramoni, Hari <subramoni.1 at osu.edu>
> Sent: Thursday, February 11, 2021 9:18 AM
> To: Mark Dixon <mark.c.dixon at durham.ac.uk>
> Cc: mvapich-discuss at lists.osu.edu; Subramoni, Hari <subramoni.1 at osu.edu>
> Subject: RE: [mvapich-discuss] MVAPICH2 2.3.5 and HDF5 1.10.7 parallel tests
>
> Hi, Mark.
>
> Please accept my sincere apologies for the delay here.
>
> Unfortunately, we have not got around to looking into this yet. We will look into it over the next couple of days and get back to you.
>
> Best,
> Hari.
>
> PS: I've CC'ed the e-mail to the new address for MVAPICH-Discuss
>
> -----Original Message-----
> From: Mark Dixon <mark.c.dixon at durham.ac.uk>
> Sent: Thursday, February 11, 2021 4:23 AM
> To: Subramoni, Hari <subramoni.1 at osu.edu>
> Cc: mvapich-discuss at cse.ohio-state.edu <mvapich-discuss at mailman.cse.ohio-state.edu>
> Subject: RE: [mvapich-discuss] MVAPICH2 2.3.5 and HDF5 1.10.7 parallel tests
>
> (Resending, as this doesn't appear in the list archives so may have been lost in the move of the mailing list...)
>
> Hi all,
>
> Has anyone had the time to take a look at this and run my script demonstrating this problem with MPI-IO / HDF5, please?
>
> I've verified this as an issue on the two platforms I've tested:
>
> - rhel 7 / POWER9 / Lustre
> - centos 7 / Intel / ext4
>
> Thanks,
>
> Mark
>
>
> On Tue, 15 Dec 2020, Mark Dixon wrote:
>
>> Hi Hari,
>>
>> Thanks for replying. Just tried it out on an x86 box with Truescale
>> IB, running centos7.8 - same result.
>>
>> This time, files were on an ext4 filesystem and no lustre was
>> available (so compiled with "--enable-romio --with-file-system=ufs")
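As an aside, ROMIO's file-system driver can also be selected at run time rather than configure time: ROMIO interprets a "driver:" prefix on the file name (e.g. `ufs:`), and HDF5's parallel tests prepend the HDF5_PARAPREFIX environment variable to their test files, so a prefix set there should reach MPI_File_open. A sketch; the directory is hypothetical and the combination is an assumption worth verifying:

```shell
# Sketch: force ROMIO's ufs driver at run time via the file-name
# prefix that ROMIO recognises. HDF5_PARAPREFIX relocates the
# parallel test files; the path below is hypothetical.
export HDF5_PARAPREFIX="ufs:/tmp/hdf5-par-test"
# then rerun the tests, e.g.: make check
```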
>>
>> Best,
>>
>> Mark
>>
>> On Tue, 15 Dec 2020, Subramoni, Hari wrote:
>>
>>>
>>> [EXTERNAL EMAIL] Do not open links or attachments unless you
>>> recognise the sender and know the content is safe. Otherwise, use
>>> the Report Message button or report to phishing at durham.ac.uk.
>>>
>>> Hi, Mark.
>>>
>>> Sorry to hear that you're facing issues.
>>>
>>> Can you please let us know if the issue is particular to the POWER9
>>> + Lustre + UFS combination, or whether it happens on x86 systems as well?
>>>
>>> We will try out the steps you've mentioned locally and see if we are
>>> able to reproduce it.
>>>
>>> Thx,
>>> Hari.
>>>
>>> -----Original Message-----
>>> From: mvapich-discuss-bounces at cse.ohio-state.edu
>>> <mvapich-discuss-bounces at mailman.cse.ohio-state.edu> On Behalf Of
>>> Mark Dixon
>>> Sent: Tuesday, December 15, 2020 12:11 PM
>>> To: mvapich-discuss at cse.ohio-state.edu
>>> <mvapich-discuss at mailman.cse.ohio-state.edu>
>>> Subject: [mvapich-discuss] MVAPICH2 2.3.5 and HDF5 1.10.7 parallel
>>> tests
>>>
>>> Hi there,
>>>
>>> I'm having trouble getting HDF5's parallel tests to pass when built
>>> on top of MVAPICH2. I was wondering if anyone else is seeing this, please?
>>>
>>> For reference (not sure it's relevant), I had similar trouble with
>>> the version of ROMIO bundled inside OpenMPI
>>> (https://github.com/open-mpi/ompi/issues/6871)
>>>
>>> Thanks,
>>>
>>> Mark
>>>
>>>
>>> #!/bin/bash
>>>
>>> # We have run this on an IBM POWER9 rhel7.6-alt system with MOFED 4.7.
>>> # The hdf5 test "testphdf5" does not complete until it is terminated
>>> # by (in this case) a 1 hour alarm timeout.
>>> #
>>> # The last lines that the test printed were 6 copies of this:
>>> #
>>> #   Testing -- multi-chunk collective chunk io (cchunk3)
>>> #
>>> # This has been run from a location on an xfs filesystem, and on a
>>> # lustre filesystem, with the same result.
>>>
>>> set -x
>>> set -e
>>>
>>> # (needed on our system to ensure we are using the OS-provided
>>> # version of GCC, etc.)
>>> module purge || true
>>>
>>> test -d build || mkdir build
>>> test -d src || mkdir src
>>>
>>> prefix=`pwd`/build
>>> export PATH=${prefix}/bin:$PATH
>>>
>>> cd src
>>>
>>>
>>> # mvapich2
>>>
>>> wget https://mvapich.cse.ohio-state.edu/download/mvapich/mv2/mvapich2-2.3.5.tar.gz
>>> tar xf mvapich2-2.3.5.tar.gz
>>> (
>>> cd mvapich2-2.3.5
>>>
>>> ./configure --prefix=$prefix \
>>> --enable-romio \
>>> --with-file-system=lustre+ufs
>>>
>>> make -j12
>>> make install
>>> )
>>>
>>>
>>> # hdf5
>>>
>>> wget https://support.hdfgroup.org/ftp/HDF5/releases/hdf5-1.10/hdf5-1.10.7/src/hdf5-1.10.7.tar.gz
>>> tar xf hdf5-1.10.7.tar.gz
>>> (
>>> cd hdf5-1.10.7
>>>
>>> export CC=mpicc
>>> export CXX=mpicxx
>>> export FC=mpif90
>>> export F77=mpif77
>>> export F90=mpif90
>>>
>>> export HDF5_ALARM_SECONDS=3600
>>>
>>> ./configure --prefix=$prefix --enable-parallel
>>> make -j12
>>> make check
>>> make install
>>> )
>>>
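To avoid waiting on the full "make check" each time, the hanging binary can be rerun on its own under a hard deadline. A minimal sketch, assuming coreutils `timeout` is available; `run_with_timeout` is a hypothetical helper, and the paths assume the build tree created by the script above:

```shell
# Sketch: rerun only the hanging test under a hard timeout so the
# failure reproduces in minutes rather than days. run_with_timeout is
# a hypothetical helper; coreutils timeout exits with status 124 when
# the deadline is hit.
run_with_timeout() {
  secs=$1; shift
  timeout "$secs" "$@"
  status=$?
  if [ "$status" -eq 124 ]; then
    echo "TIMED OUT after ${secs}s: $*"
  fi
  return "$status"
}

# e.g. (paths from the build script above):
#   cd src/hdf5-1.10.7/testpar
#   run_with_timeout 600 mpiexec -n 6 ./t_mpi
```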
>>> _______________________________________________
>>> mvapich-discuss mailing list
>>> mvapich-discuss at cse.ohio-state.edu
>>> http://mailman.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>>>
>>
>