[Mvapich-discuss] [mvapich-discuss] MVAPICH2 2.3.5 and HDF5 1.10.7 parallel tests

Mark Dixon mark.c.dixon at durham.ac.uk
Wed Mar 10 10:52:06 EST 2021


Hi both,

Thanks so much for taking a look at this, it's really appreciated. 
Unfortunately, I still seem to be having difficulties, even with 
mvapich2-2.3.5-2.tar.gz.

- x86_64 / centos7 / ext4

   testphdf5 still times out at the same point (after printing 6 lines of
   "Testing  -- multi-chunk collective chunk io (cchunk3)").

- power9 / centos7 / xfs
- power9 / centos7 / lustre

   For both of these, when running "make check" t_mpi does not complete
   even after leaving it running over the weekend:

   foo     115598 115596  0 Mar05 pts/13   00:00:00 mpiexec -n 6 ./t_mpi
   foo     115600 115599  0 Mar05 ?        00:00:00 /nobackup/users/foo/mvpi/src/hdf5-1.10.7/testpar/.libs/t_mpi
   foo     115601 115599  0 Mar05 ?        00:13:44 /nobackup/users/foo/mvpi/src/hdf5-1.10.7/testpar/.libs/t_mpi
   foo     115602 115599  0 Mar05 ?        00:13:39 /nobackup/users/foo/mvpi/src/hdf5-1.10.7/testpar/.libs/t_mpi
   foo     115603 115599  0 Mar05 ?        00:13:43 /nobackup/users/foo/mvpi/src/hdf5-1.10.7/testpar/.libs/t_mpi
   foo     115604 115599  0 Mar05 ?        00:13:46 /nobackup/users/foo/mvpi/src/hdf5-1.10.7/testpar/.libs/t_mpi
   foo     115605 115599  0 Mar05 ?        00:13:29 /nobackup/users/foo/mvpi/src/hdf5-1.10.7/testpar/.libs/t_mpi

   Note that this is before it hits testphdf5 (backtrace sketch below).
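
   In case it's useful, here is a rough sketch of how I can grab
   backtraces from those stuck t_mpi ranks (PIDs taken from the ps
   listing above; it assumes gdb is available on the node):

   for pid in 115600 115601 115602 115603 115604 115605; do
      # dump a backtrace of every thread in the hung rank, then detach
      gdb -p "$pid" -batch -ex "thread apply all bt"
   done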

Any ideas what I'm doing wrong, please?

Thanks,

Mark

On Wed, 3 Mar 2021, Smith, Jeff wrote:

> Hi Mark,
>
> Here is the download link for the latest tarball.
>
> https://mvapich.cse.ohio-state.edu/download/mvapich/mv2/mvapich2-2.3.5-2.tar.gz
>
> -Jeff
> ________________________________
> From: Subramoni, Hari <subramoni.1 at osu.edu>
> Sent: Wednesday, March 3, 2021 12:12 PM
> To: Mark Dixon <mark.c.dixon at durham.ac.uk>
> Cc: _ENG CSE Mvapich-Core <ENG-cse-mvapich-core at osu.edu>; Subramoni, Hari <subramoni.1 at osu.edu>
> Subject: RE: [mvapich-discuss] MVAPICH2 2.3.5 and HDF5 1.10.7 parallel tests
>
> Hi, Mark.
>
> Please accept my sincere apologies for the delay here again.
>
> It looks like we have been able to fix the issue after the MVAPICH2 2.3.5 release. If we provide you with a new tarball, could you please verify it at your end as well?
>
> Jeff - can you please provide Mark with a tarball of the latest master?
>
> Best,
> Hari.
>
> -----Original Message-----
> From: Subramoni, Hari <subramoni.1 at osu.edu>
> Sent: Thursday, February 11, 2021 9:18 AM
> To: Mark Dixon <mark.c.dixon at durham.ac.uk>
> Cc: mvapich-discuss at lists.osu.edu; Subramoni, Hari <subramoni.1 at osu.edu>
> Subject: RE: [mvapich-discuss] MVAPICH2 2.3.5 and HDF5 1.10.7 parallel tests
>
> Hi, Mark.
>
> Please accept my sincere apologies for the delay here.
>
> Unfortunately, we have not got around to looking into this yet. We will look into it over the next couple of days and get back to you.
>
> Best,
> Hari.
>
> PS: I've CC'ed the e-mail to the new address for MVAPICH-Discuss
>
> -----Original Message-----
> From: Mark Dixon <mark.c.dixon at durham.ac.uk>
> Sent: Thursday, February 11, 2021 4:23 AM
> To: Subramoni, Hari <subramoni.1 at osu.edu>
> Cc: mvapich-discuss at cse.ohio-state.edu <mvapich-discuss at mailman.cse.ohio-state.edu>
> Subject: RE: [mvapich-discuss] MVAPICH2 2.3.5 and HDF5 1.10.7 parallel tests
>
> (Resending, as this doesn't appear in the list archives so may have been lost in the move of the mailing list....)
>
> Hi all,
>
> Has anyone had the time to take a look at this and run my script demonstrating this problem with MPI-IO / HDF5, please?
>
> I've verified this as an issue on the two platforms I've tested:
>
> - rhel 7 / POWER9 / Lustre
> - centos 7 / Intel / ext4
>
> Thanks,
>
> Mark
>
>
> On Tue, 15 Dec 2020, Mark Dixon wrote:
>
>> Hi Hari,
>>
>> Thanks for replying. Just tried it out on an x86 box with Truescale
>> IB, running centos7.8 - same result.
>>
>> This time, files were on an ext4 filesystem and no lustre was
>> available (so compiled with "--enable-romio --with-file-system=ufs").
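>>
>> For reference, a minimal sketch of the configure used on that box (the
>> same as in the script further down, just without lustre; $prefix is
>> illustrative):
>>
>>    ./configure --prefix=$prefix \
>>        --enable-romio \
>>        --with-file-system=ufs
>>    make -j12 && make install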
>>
>> Best,
>>
>> Mark
>>
>> On Tue, 15 Dec 2020, Subramoni, Hari wrote:
>>
>>>
>>>  Hi, Mark.
>>>
>>>  Sorry to hear that you're facing issues.
>>>
>>>  Can you please let us know if the issue is particular to the POWER9 +
>>>  Lustre + UFS combination, or does it happen on x86 systems as well?
>>>
>>>  We will try out the steps you've mentioned locally and see if we are
>>> able  to reproduce it.
>>>
>>>  Thx,
>>>  Hari.
>>>
>>>  -----Original Message-----
>>>  From: mvapich-discuss-bounces at cse.ohio-state.edu
>>>  <mvapich-discuss-bounces at mailman.cse.ohio-state.edu> On Behalf Of Mark Dixon
>>>  Sent: Tuesday, December 15, 2020 12:11 PM
>>>  To: mvapich-discuss at cse.ohio-state.edu
>>>  <mvapich-discuss at mailman.cse.ohio-state.edu>
>>>  Subject: [mvapich-discuss] MVAPICH2 2.3.5 and HDF5 1.10.7 parallel
>>> tests
>>>
>>>  Hi there,
>>>
>>>  I'm having trouble getting HDF5's parallel tests to pass when built
>>> on top  of MVAPICH2. I was wondering if anyone else is seeing this, please?
>>>
>>>  For reference (not sure it's relevant), I had similar trouble with
>>>  the version of ROMIO bundled inside OpenMPI
>>>  (https://github.com/open-mpi/ompi/issues/6871).
>>>
>>>  Thanks,
>>>
>>>  Mark
>>>
>>>
>>>  #!/bin/bash
>>>
>>>  # We have run this on an IBM POWER9 rhel7.6-alt system with MOFED 4.7.
>>>  # The hdf5 test "testphdf5" does not complete until it is terminated by
>>>  # (in this case) a 1 hour alarm timeout.
>>>  #
>>>  # The last lines that the test printed were 6 copies of this:
>>>  #
>>>  # Testing  -- multi-chunk collective chunk io (cchunk3)
>>>  #
>>>  # This has been run from a location on an xfs filesystem, and on a
>>>  # lustre filesystem, with the same result.
>>>
>>>  set -x
>>>  set -e
>>>
>>>  # (needed on our system to ensure we are using the OS-provided
>>>  # version of GCC, etc.)
>>>  module purge || true
>>>
>>>  test -d build || mkdir build
>>>  test -d src || mkdir src
>>>
>>>  prefix=`pwd`/build
>>>  export PATH=${prefix}/bin:$PATH
>>>
>>>  cd src
>>>
>>>
>>>  # mvapich2
>>>
>>>  wget https://mvapich.cse.ohio-state.edu/download/mvapich/mv2/mvapich2-2.3.5.tar.gz
>>>  tar xf mvapich2-2.3.5.tar.gz
>>>  (
>>>     cd mvapich2-2.3.5
>>>
>>>     ./configure --prefix=$prefix \
>>>         --enable-romio \
>>>         --with-file-system=lustre+ufs
>>>
>>>     make -j12
>>>     make install
>>> )
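>>>
>>>  # Optional sanity check (a sketch): confirm the freshly built
>>>  # wrappers are the ones on PATH and show the configure options
>>>  # recorded by this mvapich2 build.
>>>  which mpicc
>>>  mpicc -show
>>>  mpichversion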
>>>
>>>
>>>  # hdf5
>>>
>>>  wget https://support.hdfgroup.org/ftp/HDF5/releases/hdf5-1.10/hdf5-1.10.7/src/hdf5-1.10.7.tar.gz
>>>  tar xf hdf5-1.10.7.tar.gz
>>>  (
>>>     cd hdf5-1.10.7
>>>
>>>     export CC=mpicc
>>>     export CXX=mpicxx
>>>     export FC=mpif90
>>>     export F77=mpif77
>>>     export F90=mpif90
>>>
>>>     export HDF5_ALARM_SECONDS=3600
>>>
>>>     ./configure --prefix=$prefix --enable-parallel
>>>     make -j12
>>>     make check
>>>     make install
>>> )
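>>>
>>>  # A sketch for iterating on the hang without redoing the full build:
>>>  # rerun only the parallel tests from testpar, with a shorter alarm
>>>  # than the 1 hour used above.
>>>  (
>>>     cd hdf5-1.10.7/testpar
>>>     HDF5_ALARM_SECONDS=600 make check
>>>  )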
>>>
>>>  _______________________________________________
>>>  mvapich-discuss mailing list
>>>  mvapich-discuss at cse.ohio-state.edu
>>>  http://mailman.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>>>
>>
>


