[mvapich-discuss] RESEND Re: mvapich2-2.3.2 crash when CPU affinity is enabled]

Hashmi, Jahanzeb hashmi.29 at buckeyemail.osu.edu
Thu Apr 2 14:05:59 EDT 2020


We had an offline discussion with the user about this issue. The error was caused by a specific scenario involving non-NUMA machines. The user has posted a small patch, which we will take in and make available in the next MVAPICH2 release.
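
For context, the sketch below only illustrates the kind of guard such a patch typically adds; it is not the actual patch. It assumes the standard hwloc API and simply falls back to the machine object when a topology reports zero NUMA-node objects, which is the non-NUMA scenario described above:

    /* Hypothetical illustration (not the user's patch): guard hwloc-based
     * topology queries against machines that report no NUMA-node objects. */
    #include <hwloc.h>
    #include <stdio.h>

    int main(void)
    {
        hwloc_topology_t topo;
        int nnodes;

        hwloc_topology_init(&topo);
        hwloc_topology_load(topo);

        nnodes = hwloc_get_nbobjs_by_type(topo, HWLOC_OBJ_NUMANODE);
        if (nnodes <= 0) {
            /* Non-NUMA machine: do not index into a NUMA-node array;
             * use the root (machine) object instead. */
            hwloc_obj_t machine = hwloc_get_root_obj(topo);
            printf("no NUMA nodes reported; falling back to %s\n",
                   hwloc_obj_type_string(machine->type));
        } else {
            printf("%d NUMA node(s) reported\n", nnodes);
        }

        hwloc_topology_destroy(topo);
        return 0;
    }

(Something like "cc numa_check.c -lhwloc", built against the same hwloc that MVAPICH2 uses, should be enough to try this on an affected node.)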

Best,

Jahanzeb

________________________________
From: mvapich-discuss <mvapich-discuss-bounces at cse.ohio-state.edu> on behalf of Honggang LI <honli at redhat.com>
Sent: Thursday, March 26, 2020 6:38 AM
To: Subramoni, Hari <subramoni.1 at osu.edu>
Cc: mvapich-discuss at cse.ohio-state.edu <mvapich-discuss at mailman.cse.ohio-state.edu>
Subject: Re: [mvapich-discuss] RESEND Re: mvapich2-2.3.2 crash when CPU affinity is enabled]

I don't know why, but the message has not been delivered again, so I am
resending it.

==========================================
   The message you sent to mvapich-discuss at mailman.cse.ohio-state.edu hasn't
   been delivered yet due to: Recipient server unavailable or busy.
==========================================

On Thu, Mar 26, 2020 at 12:13:36PM +0800, Honggang LI wrote:
> On Thu, Mar 26, 2020 at 02:50:09AM +0000, Subramoni, Hari wrote:
> > Hi, Honggang.
> >
> > We have some follow up questions and clarifications.
> >
> > From the original backtrace you sent, the failure seemed to be in a hwloc function that is used to get intra-node processor architecture. Given this, the following statement is a little confusing for us.
> >
>
> Sorry, that was my mistake. We observed this issue with both mlx4 and mlx5
> devices. mvapich2 works for me over an OPA device, so it is not an
> mlx5-specific issue.
>
> > "It seems it is a Mellanox MLX5 specific issue. We had test mlx4 and OPA/HFI1. It works for me with mlx4 and OPA"
> >
> > Could you please let us know what you mean by this? Did you happen to test MVAPICH2 2.3.3GA with OPA/HFI1 NICs and Mellanox MLX4 NICs on the same compute node and had everything work fine? Did the failure happen only when you installed a Mellanox MLX5 NIC on the same compute node?
> >
>
> We ran mvapich2 over several pairs of machines, each pair with the same HCA
> installed. The tests were not run on the same compute node.
>
> > Could you also send us the output of "lscpu" for the compute node in question?
>
> Node with mlx4 that failed the test:
> [root at rdma-dev-00 ~]$ ibstat
> CA 'mlx4_0'
>        CA type: MT4099
>        Number of ports: 2
>        Firmware version: 2.42.5000
>        Hardware version: 1
>        Node GUID: 0x0002c90300317b10
>        System image GUID: 0x0002c90300317b13
>        Port 1:
>                State: Active
>                Physical state: LinkUp
>                Rate: 56
>                Base lid: 6
>                LMC: 0
>                SM lid: 13
>                Capability mask: 0x02594868
>                Port GUID: 0x0002c90300317b11
>                Link layer: InfiniBand
>        Port 2:
>                State: Active
>                Physical state: LinkUp
>                Rate: 56
>                Base lid: 8
>                LMC: 0
>                SM lid: 1
>                Capability mask: 0x02594868
>                Port GUID: 0x0002c90300317b12
>                Link layer: InfiniBand
>
> [root at rdma-dev-00 ~]$ lscpu
> Architecture:        x86_64
> CPU op-mode(s):      32-bit, 64-bit
> Byte Order:          Little Endian
> CPU(s):              12
> On-line CPU(s) list: 0-11
> Thread(s) per core:  2
> Core(s) per socket:  6
> Socket(s):           1
> NUMA node(s):        1
> Vendor ID:           GenuineIntel
> CPU family:          6
> Model:               62
> Model name:          Intel(R) Xeon(R) CPU E5-2430 v2 @ 2.50GHz
> Stepping:            4
> CPU MHz:             1883.701
> CPU max MHz:         3000.0000
> CPU min MHz:         1200.0000
> BogoMIPS:            4999.79
> Virtualization:      VT-x
> L1d cache:           32K
> L1i cache:           32K
> L2 cache:            256K
> L3 cache:            15360K
> NUMA node0 CPU(s):   0-11
> Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm cpuid_fault pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase smep erms xsaveopt dtherm ida arat pln pts md_clear flush_l1d
>
> ===============================
> Node with mlx5 that failed the test:
> [root at rdma-qe-06 ~]$ ibstat
> CA 'mlx5_0'
>        CA type: MT4113
>        Number of ports: 2
>        Firmware version: 10.16.1200
>        Hardware version: 0
>        Node GUID: 0xf452140300085ef0
>        System image GUID: 0xf452140300085ef0
>        Port 1:
>                State: Active
>                Physical state: LinkUp
>                Rate: 56
>                Base lid: 3
>                LMC: 0
>                SM lid: 13
>                Capability mask: 0x26596848
>                Port GUID: 0xf452140300085ef0
>                Link layer: InfiniBand
>        Port 2:
>                State: Active
>                Physical state: LinkUp
>                Rate: 56
>                Base lid: 26
>                LMC: 0
>                SM lid: 1
>                Capability mask: 0x26596848
>                Port GUID: 0xf452140300085ef8
>                Link layer: InfiniBand
> [root at rdma-qe-06 ~]$ lscpu
> Architecture:        x86_64
> CPU op-mode(s):      32-bit, 64-bit
> Byte Order:          Little Endian
> CPU(s):              8
> On-line CPU(s) list: 0-7
> Thread(s) per core:  2
> Core(s) per socket:  4
> Socket(s):           1
> NUMA node(s):        1
> Vendor ID:           GenuineIntel
> CPU family:          6
> Model:               58
> Model name:          Intel(R) Xeon(R) CPU E3-1240 V2 @ 3.40GHz
> Stepping:            9
> CPU MHz:             1597.783
> CPU max MHz:         3800.0000
> CPU min MHz:         1600.0000
> BogoMIPS:            6784.51
> Virtualization:      VT-x
> L1d cache:           32K
> L1i cache:           32K
> L2 cache:            256K
> L3 cache:            8192K
> NUMA node0 CPU(s):   0-7
> Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm cpuid_fault epb pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase smep erms xsaveopt dtherm ida arat pln pts md_clear flush_l1d
> [root at rdma-qe-06 ~]$
>
>
> ============================
> Node with OPA that works for me:
> [root at rdma-qe-14 ~]$ ibstat
> CA 'hfi1_0'
>        CA type:
>        Number of ports: 1
>        Firmware version: 1.27.0
>        Hardware version: 10
>        Node GUID: 0x00117501016710f0
>        System image GUID: 0x00117501016710f0
>        Port 1:
>                State: Active
>                Physical state: LinkUp
>                Rate: 100
>                Base lid: 3
>                LMC: 0
>                SM lid: 1
>                Capability mask: 0x00490020
>                Port GUID: 0x00117501016710f0
>                Link layer: InfiniBand
> [root at rdma-qe-14 ~]$ lscpu
> Architecture:        x86_64
> CPU op-mode(s):      32-bit, 64-bit
> Byte Order:          Little Endian
> CPU(s):              16
> On-line CPU(s) list: 0-15
> Thread(s) per core:  2
> Core(s) per socket:  4
> Socket(s):           2
> NUMA node(s):        2
> Vendor ID:           GenuineIntel
> CPU family:          6
> Model:               63
> Model name:          Intel(R) Xeon(R) CPU E5-2623 v3 @ 3.00GHz
> Stepping:            2
> CPU MHz:             3293.331
> CPU max MHz:         3500.0000
> CPU min MHz:         1200.0000
> BogoMIPS:            5993.06
> Virtualization:      VT-x
> L1d cache:           32K
> L1i cache:           32K
> L2 cache:            256K
> L3 cache:            10240K
> NUMA node0 CPU(s):   0,2,4,6,8,10,12,14
> NUMA node1 CPU(s):   1,3,5,7,9,11,13,15
> Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm cpuid_fault epb invpcid_single pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid cqm xsaveopt cqm_llc cqm_occup_llc dtherm ida arat pln pts flush_l1d
>
>
>
> > Regards,
> > Hari.
> >
> > -----Original Message-----
> > From: mvapich-discuss-bounces at cse.ohio-state.edu <mvapich-discuss-bounces at mailman.cse.ohio-state.edu> On Behalf Of Honggang LI
> > Sent: Wednesday, March 25, 2020 9:17 AM
> > To: mvapich-discuss at cse.ohio-state.edu <mvapich-discuss at mailman.cse.ohio-state.edu>
> > Subject: [mvapich-discuss] RESEND Re: mvapich2-2.3.2 crash when CPU affinity is enabled]
> >
> > Resending this email, as our mail system notified me that the previous reply was lost.
> >
> > On Wed, Mar 25, 2020 at 06:20:37AM +0000, Hashmi, Jahanzeb wrote:
> >
> > >    Hi,
> > >    Sorry to know that you are facing issues with mvapich2-2.3.3. We have
> > >    tried to reproduce this at our end with the information you provided,
> > >    however, we are unable to reproduce it. The build configuration and
> > >    the output is given below. Could you please let us know your system
> > >    configuration e.g., architecture, kernel version, hostfile format
> > >    (intra/inter), compiler version?
> >
> > [root at rdma-qe-06 ~]$  uname -r
> > 4.18.0-189.el8.x86_64
> >
> > [root at rdma-qe-06 ~]$ cat hfile_one_core
> > 172.31.0.6
> > 172.31.0.7
> >
> > [root at rdma-qe-06 ~]$ gcc -v
> > Using built-in specs.
> > COLLECT_GCC=gcc
> > COLLECT_LTO_WRAPPER=/usr/libexec/gcc/x86_64-redhat-linux/8/lto-wrapper
> > OFFLOAD_TARGET_NAMES=nvptx-none
> > OFFLOAD_TARGET_DEFAULT=1
> > Target: x86_64-redhat-linux
> > Configured with: ../configure --enable-bootstrap --enable-languages=c,c++,fortran,lto --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-shared --enable-threads=posix --enable-checking=release --enable-multilib --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-gnu-unique-object --enable-linker-build-id --with-gcc-major-version-only --with-linker-hash-style=gnu --enable-plugin --enable-initfini-array --with-isl --disable-libmpx --enable-offload-targets=nvptx-none --without-cuda-driver --enable-gnu-indirect-function --enable-cet --with-tune=generic --with-arch_32=x86-64 --build=x86_64-redhat-linux
> > Thread model: posix
> > gcc version 8.3.1 20191121 (Red Hat 8.3.1-5) (GCC)
> >
> > I also attached the "config.log" of mvapich2-2.3.3. It seems to be a Mellanox MLX5-specific issue. We have tested mlx4 and OPA/HFI1; it works for me with mlx4 and OPA.
> >
> > http://people.redhat.com/honli/mvapich2/config.log
> >
> > >    [hashmij at haswell3 mvapich2-2.3.3]$ ./install/bin/mpiname -a
> > >    MVAPICH2 2.3.3 Thu January 09 22:00:00 EST 2019 ch3:mrail
> > >    Compilation
> > >    CC: gcc    -DNDEBUG -DNVALGRIND -g -O2
> > >    CXX: g++   -DNDEBUG -DNVALGRIND -g -O2
> > >    F77: gfortran -L/lib -L/lib   -g -O2
> > >    FC: gfortran   -g -O2
> > >    Configuration
> > >    --prefix=/home/hashmij/release-testing/mvapich2-2.3.3/install
> > >    --enable-error-messages=all --enable-g=dbg,debug
> > >    [hashmij at haswell3 mvapich2-2.3.3]$ ./install/bin/mpirun -genv
> > >    MV2_ENABLE_AFFINITY 1 -genv MV2_DEBUG_SHOW_BACKTRACE 1 -hostfile ~/hosts
> > >    -np 2 ./install/libexec/osu-micro-benchmarks/mpi/pt2pt/osu_bw
> > >    # OSU MPI Bandwidth Test v5.6.2
> > >    # Size      Bandwidth (MB/s)
> > >    1                       7.33
> > >    2                      14.43
> > >    4                      30.03
> > >    8                      60.00
> > >    16                    119.77
> > >    32                    237.23
> > >    64                    469.27
> > >    128                   861.36
> > >    256                  1616.93
> > >    512                  2804.30
> > >    1024                 3900.08
> > >    2048                 5328.12
> > >    4096                 6738.44
> > >    8192                 8009.26
> > >    16384                8477.34
> > >    32768               12230.39
> > >    65536               13757.87
> > >    131072              13768.37
> > >    262144              12064.52
> > >    524288              11696.69
> > >    1048576             11877.53
> > >    2097152             11720.65
> > >    4194304             10932.82
> >
> > The bandwidth is quite high. It seems you were not running the test with mlx5.
> >
> > thanks
> >
> >
> > _______________________________________________
> > mvapich-discuss mailing list
> > mvapich-discuss at cse.ohio-state.edu
> > http://mailman.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
> >


_______________________________________________
mvapich-discuss mailing list
mvapich-discuss at cse.ohio-state.edu
http://mailman.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss

