[mvapich-discuss] (no subject)

Jonathan Perkins perkinjo at cse.ohio-state.edu
Thu Jan 7 15:58:43 EST 2016


Hello David.  Thanks for your report.  I haven't seen this issue before but
I have taken a look at HWLOC's FAQ and think we can do the following:

1. Download and install hwloc 1.11.0 to see if lstopo gives you the same
type of error.
2. If it does give the same error try hwloc 1.11.2 to see if it is still
present.
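
A minimal sketch of how steps 1 and 2 could be scripted, assuming each hwloc version is installed under its own prefix. The install paths in the comment are hypothetical, and the marker string is taken from the warning banner quoted in this thread.

```python
import subprocess

# Phrase that appears in the hwloc warning banner quoted in this thread.
ERROR_MARKER = "has encountered what looks like an error"


def reports_os_error(output):
    """Return True if the text contains the hwloc warning banner."""
    return ERROR_MARKER in output


def check_lstopo(lstopo_path):
    """Run one lstopo binary and report whether the banner shows up.

    Both stdout and stderr are searched, since this sketch does not
    assume which stream the banner is written to.
    """
    result = subprocess.run([lstopo_path], capture_output=True, text=True)
    return reports_os_error(result.stdout + result.stderr)


# Hypothetical usage, one call per installed version:
#   check_lstopo("/opt/hwloc-1.11.0/bin/lstopo")
#   check_lstopo("/opt/hwloc-1.11.2/bin/lstopo")
```

If lstopo was built with graphical support it may try to open a window, so a text-only build of lstopo is the safer choice for a scripted check.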

If either 1 or 2 does not show the error then this is an issue that will be
resolved in an upcoming MVAPICH2 release where we will update to hwloc
1.11.2.  In the meantime you can set HWLOC_HIDE_ERRORS=1 in your
environment.
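
As an illustration only, here is one way to set the variable for a single launched command rather than for the whole login session. The helper names are made up for this sketch, and whether the variable reaches ranks on remote hosts depends on how the launcher forwards the environment.

```python
import os
import subprocess


def env_with_hidden_hwloc_errors(base=None):
    """Return a copy of the environment with HWLOC_HIDE_ERRORS=1 added."""
    env = dict(os.environ if base is None else base)
    env["HWLOC_HIDE_ERRORS"] = "1"
    return env


def run_with_hidden_errors(command):
    """Run a command (given as a list of arguments) with the warning hidden."""
    return subprocess.run(
        command,
        env=env_with_hidden_hwloc_errors(),
        capture_output=True,
        text=True,
    )
```

Note that this only hides the banner; the topology hwloc builds is unchanged.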

If both show the error, then I think you should follow the advice given in
the FAQ, which I have pasted below for your benefit: specifically, run
hwloc-gather-topology and send the resulting tarball to the hwloc developers.

http://www.open-mpi.org/projects/hwloc/doc/v1.11.0/a00030.php
…
What should I do when hwloc reports "operating system" warnings?

When the operating system reports invalid locality information (because of
either software or hardware bugs), hwloc may fail to insert some objects in
the topology because they cannot fit in the already built tree of
resources. If so, hwloc will report a warning like the following. The
object causing this error is ignored, the discovery continues but the
resulting topology will miss some object and may be asymmetric (see also
What happens if my topology is asymmetric?).

****************************************************************************
* hwloc has encountered what looks like an error from the operating system.
…
*
* L3 (cpuset 0x000003f0) intersects with NUMANode (P#0 cpuset 0x0000003f) without inclusion!
* Error occurred in topology.c line 940
*
* Please report this error message to the hwloc user's mailing list,
* along with the output from the hwloc-gather-topology script.
****************************************************************************
As explained in the message, reporting this issue to the hwloc developers
(by sending the tarball that is generated by the hwloc-gather-topology
script on this platform) is a good way to make sure that this is a software
(operating system) or hardware bug (BIOS, etc).

These errors are common on large AMD platforms because several BIOS
releases fail to properly report L3 caches. In the above example, the
hardware reports a L3 cache that is shared by 2 cores in the first NUMA
node and 4 cores in the second NUMA node. That's wrong, it should actually
be shared by all 6 cores in a single NUMA node. The resulting topology will
miss some L3 caches. If your application does not care about cache sharing, or
if you do not plan to request cache-aware binding in your process launcher,
you may likely ignore this error. The warning may be hidden by setting
HWLOC_HIDE_ERRORS=1 in your environment.
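
To make the "intersects without inclusion" wording concrete, the two cpuset masks from the warning can be compared bit by bit. This is only a worked illustration of the bitmask arithmetic, not hwloc's own code; the helper name is invented for the example.

```python
def cpus_in(mask):
    """List the bit positions (CPU indexes) that are set in a cpuset mask."""
    return [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]


l3_cache = 0x000003F0   # cpuset reported for the L3 cache
numa_node = 0x0000003F  # cpuset reported for NUMANode P#0

overlap = l3_cache & numa_node

# The two sets share CPUs, but neither contains the other, so the L3
# object cannot be placed above or below the NUMA node in the tree.
intersects = overlap != 0
l3_inside_numa = overlap == l3_cache
numa_inside_l3 = overlap == numa_node

print("L3 CPUs:       ", cpus_in(l3_cache))
print("NUMA node CPUs:", cpus_in(numa_node))
print("Shared CPUs:   ", cpus_in(overlap))
```

The L3 mask covers CPUs 4 through 9 and the NUMA node covers CPUs 0 through 5, so they share only CPUs 4 and 5. That matches the FAQ's description of a cache reported as spanning 2 cores of the first NUMA node and 4 cores of the second.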

On Wed, Jan 6, 2016 at 5:23 PM David Winslow <
david.winslow at serendipitynow.com> wrote:

> We just upgraded our servers from CentOS 6.5 to 7.2. With the upgrade,
> we recompiled MVAPICH2 2.2b using the same method as before. We have two
> types of servers: older Dells and fairly new Supermicros. When I run the
> osu_bw command below on the Dell machines, it works as it always did; however,
> on our Supermicros with AMD Opteron Processor 6344 (48 cores), we now
> get an hwloc message. Our software appears to work, but this error never
> showed up before the OS upgrade.
>
> I’m not sure what is wrong; I see this error has occurred with
> others and may be a kernel bug or a BIOS problem. I’ve tried
> updating the kernel to the latest from CentOS
> (3.10.0-327.3.1.el7.x86_64) and that didn’t solve it. I’ve
> even upgraded it to 4.4.3. I still see a similar error when I run
> hwloc-info.
>
> Question: if I compile with "--without-hwloc", I suspect
> that the error will go away, but how would CPUs then be mapped? Where
> would MVAPICH2 get the information? Is running without hwloc
> problematic?
>
>
> mpirun -demux select -np 2 -hostfile /home/david.winslow/DISTRIBUTED_COMPUTING/hostfile -genv MV2_ENABLE_AFFINITY=0 -genv IPATH_NO_CPUAFFINITY=1 -genv MV2_DEBUG_SHOW_BACKTRACE=1 /opt/mvapich2-2.2b-install-psm/libexec/mvapich2/osu_bw
> ****************************************************************************
> * hwloc 1.11.0rc3-git has encountered what looks like an error from the operating system.
> *
> * L3 (cpuset 0x000003f0) intersects with NUMANode (P#0 cpuset 0x0000003f) without inclusion!
> * Error occurred in topology.c line 983
> *
> * The following FAQ entry in the hwloc documentation may help:
> *   What should I do when hwloc reports "operating system" warnings?
> * Otherwise please report this error message to the hwloc user's mailing list,
> * along with the output+tarball generated by the hwloc-gather-topology script.
> ****************************************************************************
> Authorized uses only. All activity may be monitored and reported.
> ****************************************************************************
> * hwloc 1.11.0rc3-git has encountered what looks like an error from the operating system.
> *
> * L3 (cpuset 0x000003f0) intersects with NUMANode (P#0 cpuset 0x0000003f) without inclusion!
> * Error occurred in topology.c line 983
> *
> * The following FAQ entry in the hwloc documentation may help:
> *   What should I do when hwloc reports "operating system" warnings?
> * Otherwise please report this error message to the hwloc user's mailing list,
> * along with the output+tarball generated by the hwloc-gather-topology script.
> ****************************************************************************
> # OSU MPI Bandwidth Test
> # Size      Bandwidth (MB/s)
> 1                       0.96
> 2                       1.93
> 4                       4.53
> 8                       9.21
> 16                     17.20
> 32                     34.98
> 64                     69.60
> 128                   136.12
> 256                   273.36
> 512                   513.37
> 1024                  874.99
> 2048                 1369.63
> 4096                 2143.63
> 8192                 1836.13
> 16384                2776.28
> 32768                2837.68
> 65536                2814.84
> 131072               2849.19
> 262144               2871.47
> 524288               2881.64
> 1048576              2886.59
> 2097152              2888.57
> 4194304              2891.03
>
> Output of mpirun --version:
>
> HYDRA build details:
>     Version:                                 3.1.4
>     Release Date:                            Thu Nov 12 06:32:40 EST 2015
>     CC:                              gcc
>     CXX:                             g++
>     F77:
>     F90:
>     Configure options:                       '--disable-option-checking'
> '--prefix=/opt/mvapich2-2.2b-install-psm' '--with-device=ch3:psm'
> '--disable-fortran' '--cache-file=/dev/null' '--srcdir=.' 'CC=gcc'
> 'CFLAGS= -DNDEBUG -DNVALGRIND -O2' 'LDFLAGS= -L/lib -Wl,-rpath,/lib
> -L/lib -Wl,-rpath,/lib' 'LIBS=-libverbs -lpsm_infinipath -lm -lpthread
> ' 'CPPFLAGS= -I/opt/mvapich2-2.2b/src/mpid/ch3/channels/psm/include
> -I/opt/mvapich2-2.2b/src/mpid/ch3/channels/psm/include
> -I/opt/mvapich2-2.2b/src/util/wrappers
> -I/opt/mvapich2-2.2b/src/util/wrappers
> -I/opt/mvapich2-2.2b/src/mpl/include
> -I/opt/mvapich2-2.2b/src/mpl/include -I/opt/mvapich2-2.2b/src/openpa/src
> -I/opt/mvapich2-2.2b/src/openpa/src -D_REENTRANT
> -I/opt/mvapich2-2.2b/src/mpi/romio/include -I/include -I/include'
>     Process Manager:                         pmi
>     Launchers available:                     ssh rsh fork slurm ll lsf sge manual persist
>     Topology libraries available:            hwloc
>     Resource management kernels available:   user slurm ll lsf sge pbs cobalt
>     Checkpointing libraries available:
>     Demux engines available:                 poll select