[mvapich-discuss] RESEND Re: mvapich2-2.3.2 crash when CPU affinity is enabled

Honggang LI honli at redhat.com
Wed Mar 25 09:17:21 EDT 2020


Resending this email because our mail system notified me that the previous
reply was lost.

On Wed, Mar 25, 2020 at 06:20:37AM +0000, Hashmi, Jahanzeb wrote:

>    Hi,
>    Sorry to know that you are facing issues with mvapich2-2.3.3. We have
>    tried to reproduce this at our end with the information you provided,
>    however we are unable to reproduce the issue. The build configuration and
>    the output is given below. Could you please let us know your system
>    configuration e.g., architecture, kernel version, hostfile format
>    (intra/inter), compiler version?

[root at rdma-qe-06 ~]$  uname -r
4.18.0-189.el8.x86_64

[root at rdma-qe-06 ~]$ cat hfile_one_core
172.31.0.6
172.31.0.7

[root at rdma-qe-06 ~]$ gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/libexec/gcc/x86_64-redhat-linux/8/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-redhat-linux
Configured with: ../configure --enable-bootstrap --enable-languages=c,c++,fortran,lto --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-shared --enable-threads=posix --enable-checking=release --enable-multilib --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-gnu-unique-object --enable-linker-build-id --with-gcc-major-version-only --with-linker-hash-style=gnu --enable-plugin --enable-initfini-array --with-isl --disable-libmpx --enable-offload-targets=nvptx-none --without-cuda-driver --enable-gnu-indirect-function --enable-cet --with-tune=generic --with-arch_32=x86-64 --build=x86_64-redhat-linux
Thread model: posix
gcc version 8.3.1 20191121 (Red Hat 8.3.1-5) (GCC)

I have also attached the config.log from my mvapich2-2.3.3 build. This
seems to be a Mellanox mlx5-specific issue: we have tested mlx4 and
OPA/HFI1, and it works for me with both.

http://people.redhat.com/honli/mvapich2/config.log

>    [hashmij at haswell3 mvapich2-2.3.3]$ ./install/bin/mpiname -a
>    MVAPICH2 2.3.3 Thu January 09 22:00:00 EST 2019 ch3:mrail
>    Compilation
>    CC: gcc    -DNDEBUG -DNVALGRIND -g -O2
>    CXX: g++   -DNDEBUG -DNVALGRIND -g -O2
>    F77: gfortran -L/lib -L/lib   -g -O2
>    FC: gfortran   -g -O2
>    Configuration
>    --prefix=/home/hashmij/release-testing/mvapich2-2.3.3/install
>    --enable-error-messages=all --enable-g=dbg,debug
>    [hashmij at haswell3 mvapich2-2.3.3]$ ./install/bin/mpirun -genv
>    MV2_ENABLE_AFFINITY 1 -genv MV2_DEBUG_SHOW_BACKTRACE 1 -hostfile ~/hosts
>    -np 2 ./install/libexec/osu-micro-benchmarks/mpi/pt2pt/osu_bw
>    # OSU MPI Bandwidth Test v5.6.2
>    # Size      Bandwidth (MB/s)
>    1                       7.33
>    2                      14.43
>    4                      30.03
>    8                      60.00
>    16                    119.77
>    32                    237.23
>    64                    469.27
>    128                   861.36
>    256                  1616.93
>    512                  2804.30
>    1024                 3900.08
>    2048                 5328.12
>    4096                 6738.44
>    8192                 8009.26
>    16384                8477.34
>    32768               12230.39
>    65536               13757.87
>    131072              13768.37
>    262144              12064.52
>    524288              11696.69
>    1048576             11877.53
>    2097152             11720.65
>    4194304             10932.82

That bandwidth is high, so it seems you were not running the test on
mlx5.
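One way to check which adapters a host actually has before re-running is
sketched below. It is a minimal bash sketch, assuming the usual Linux sysfs
layout (/sys/class/infiniband) and kernel default device names (mlx5_0,
mlx4_0, hfi1_0). The function name list_rdma_devices is hypothetical, and
matching on the name prefix is only a heuristic, since udev persistent
naming can rename RDMA devices.

```shell
#!/usr/bin/env bash
# list_rdma_devices [DIR]: print each RDMA device found under DIR
# (default /sys/class/infiniband) and guess its driver family from the
# device-name prefix. Heuristic only: renamed devices show as "unknown".
list_rdma_devices() {
    local sysdir="${1:-/sys/class/infiniband}"
    local found_mlx5=0 dev name

    if [ ! -d "$sysdir" ]; then
        echo "no RDMA device directory at $sysdir" >&2
        return 1
    fi

    for dev in "$sysdir"/*; do
        [ -e "$dev" ] || continue      # empty dir: the glob stays literal
        name=$(basename "$dev")
        case "$name" in
            mlx5_*) echo "$name: mlx5"; found_mlx5=1 ;;
            mlx4_*) echo "$name: mlx4" ;;
            hfi1_*) echo "$name: OPA/hfi1" ;;
            *)      echo "$name: unknown" ;;
        esac
    done

    # Say so explicitly when no mlx5 adapter is present on this host.
    [ "$found_mlx5" -eq 1 ] || echo "no mlx5 device found"
}
```

Running list_rdma_devices on each host in the hostfile shows whether an
mlx5 adapter is present at all. If it is, MVAPICH2's MV2_IBA_HCA runtime
parameter can be used to pin the benchmark to it, e.g. by adding
`-genv MV2_IBA_HCA mlx5_0` to the mpirun line (the device name is only an
example; check the user guide for your version).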

thanks
