[mvapich-discuss] Undefined behaviour in MVAPICH-2.2-RC1

Jan Bierbaum jan.bierbaum at tudos.org
Tue Apr 12 13:39:03 EDT 2016


Hi!

I built the newly released MVAPICH 2.2rc1 with full debug options and,
on top of that, the sanitizers of gcc (Debian 5.3.1-14); see the end of
this mail for all options used. Running it turned up three classes of
(potential) problems:


1) Null pointers passed where they should not be
> /home/user/mvapich2-2.2rc1/contrib/hwloc/src/topology.c:1326:3: runtime error: null pointer passed as argument 1, which is declared to never be null
[...]
> /home/user/mvapich2-2.2rc1/src/mpid/ch3/channels/mrail/src/rdma/ch3_smp_progress.c:1699:9: runtime error: null pointer passed as argument 1, which is declared to never be null

The first instance is odd, since I configured MVAPICH explicitly with
the '--without-hwloc' option, yet the hwloc copy under contrib/hwloc is
still used. I guess this one should be reported to hwloc instead, unless
the null pointer was originally passed in by MVAPICH.
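
For illustration, this diagnostic is typical of calls such as
memcpy() or memset() that receive a NULL pointer together with a length
of zero: the libc prototypes declare the pointer arguments as nonnull,
so the call is undefined even when nothing is copied. I have not checked
what the two reported lines actually call, so the following is only a
sketch of the usual guard; copy_checked() is a hypothetical helper, not
an MVAPICH or hwloc function.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: skips the libc call for empty copies, so a NULL
 * pointer that accompanies a zero length never reaches memcpy(). */
static void *copy_checked(void *dst, const void *src, size_t n)
{
    if (n == 0)
        return dst;                     /* nothing to copy */
    assert(dst != NULL && src != NULL); /* a real copy needs real buffers */
    return memcpy(dst, src, n);
}
```

Whether an empty copy with NULL is legitimate at those call sites, or a
sign of a missing allocation further up, would still have to be checked
in the code itself.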


2) Buffer overflow
> ==13170==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7ffe5ebfdb60 at pc 0x2b6158738445 bp 0x7ffe5ebf9210 sp 0x7ffe5ebf89c0
> READ of size 11872 at 0x7ffe5ebfdb60 thread T0
>     #0 0x2b6158738444 in __asan_memcpy (/usr/lib/x86_64-linux-gnu/libasan.so.2+0x88444)
>     #1 0x2b615a7b6306 in MPIUI_Memcpy /home/user/mvapich2-2.2rc1/src/include/mpiimpl.h:179
>     #2 0x2b615a7b7413 in MV2_set_iallreduce_tuning_table /home/user/mvapich2-2.2rc1/src/mpi/coll/iallreduce_tuning.c:217
>     #3 0x2b615a741703 in MV2_collectives_arch_init /home/user/mvapich2-2.2rc1/src/mpi/coll/ch3_shmem_coll.c:407
>     #4 0x2b615ac4fae6 in MPIDI_CH3_Init /home/user/mvapich2-2.2rc1/src/mpid/ch3/channels/mrail/src/rdma/ch3_init.c:497
>     #5 0x2b615ac0baab in MPID_Init /home/user/mvapich2-2.2rc1/src/mpid/ch3/src/mpid_init.c:363
>     #6 0x2b615a9ac338 in MPIR_Init_thread /home/user/mvapich2-2.2rc1/src/mpi/init/initthread.c:512
>     #7 0x2b615a9a9d7c in PMPI_Init /home/user/mvapich2-2.2rc1/src/mpi/init/init.c:195
>     #8 0x2b61596b8678 in pmpi_init_ /home/user/mvapich2-2.2rc1/src/binding/fortran/mpif_h/initf.c:275
[...]
> 
> Address 0x7ffe5ebfdb60 is located in stack of thread T0 at offset 18720 in frame
>     #0 0x2b615a7b63c1 in MV2_set_iallreduce_tuning_table /home/user/mvapich2-2.2rc1/src/mpi/coll/iallreduce_tuning.c:23
> 
>   This frame has 3 object(s):
>     [32, 8512) 'mv2_tmp_iallreduce_thresholds_table'
>     [8544, 18720) 'mv2_tmp_iallreduce_thresholds_table'
>     [18752, 32320) 'mv2_tmp_iallreduce_thresholds_table' <== Memory access at offset 18720 partially underflows this variable

A patch for this one is attached. The patch also inserts a few
additional asserts that keep this kind of bug from being introduced again.
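
The patch itself is not reproduced here. As a rough sketch of the kind
of assert meant, assuming the overflow comes from a memcpy() whose byte
count exceeds the size of the source table, a copy routine can check the
element count against both arrays first. entry_t, ARRAY_LEN and
copy_table() are made-up names for this example, not identifiers from
iallreduce_tuning.c.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Number of elements in a true array (not a pointer). */
#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical stand-in for one tuning-table entry. */
typedef struct {
    int min;
    int max;
} entry_t;

/* Copies `count` entries after checking that neither array is smaller
 * than the requested count. Returns the number of entries copied. */
static size_t copy_table(entry_t *dst, size_t dst_len,
                         const entry_t *src, size_t src_len,
                         size_t count)
{
    assert(count <= src_len);   /* would catch the over-read reported above */
    assert(count <= dst_len);   /* and the matching over-write */
    memcpy(dst, src, count * sizeof(*src));
    return count;
}
```

A call would pass ARRAY_LEN(table) for both lengths, so a mismatch
between the table that is copied and the count that is used trips the
assert instead of reading past the end of the stack array.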


3) Erroneous bitshifts
> /home/user/mvapich2-2.2rc1/src/mpi/comm/commutil.c:846:5: runtime error: left shift of 1 by 31 places cannot be represented in type 'int'
> /home/user/mvapich2-2.2rc1/src/mpi/comm/commutil.c:849:21: runtime error: left shift of 1 by 31 places cannot be represented in type 'int'
[...]
> /home/user/mvapich2-2.2rc1/src/mpi/comm/commutil.c:1760:36: runtime error: left shift of 1 by 31 places cannot be represented in type 'int'
> /home/user/mvapich2-2.2rc1/src/mpi/comm/commutil.c:1776:31: runtime error: left shift of 1 by 31 places cannot be represented in type 'int'

I cannot easily provide a fix for this class or the first one, since I'm
not aware of the precise semantics of the code involved. The bit shifts
might be off-by-one errors, though.
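
For reference, the diagnostic itself is about the operand type: with a
32-bit int, 1 << 31 is not representable, and C leaves such a signed
left shift undefined. If bit 31 really is meant to be set (an
assumption; the shift count may equally be the actual bug), shifting an
unsigned value avoids the problem. bit_mask() below is a hypothetical
helper, not code from commutil.c.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: returns a 32-bit mask with only `bit` set.
 * The shift is done on an unsigned 32-bit value, so bit 31 is fine. */
static uint32_t bit_mask(unsigned bit)
{
    assert(bit < 32);           /* shifting by 32 or more is undefined too */
    return (uint32_t)1 << bit;
}
```

The same effect can be had in place by writing 1u << n instead of
1 << n, provided the surrounding variables are unsigned as well.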



Regards, Jan


PS: Here's the full configuration in case you are interested.
> $ mpichversion
> MVAPICH2 Version:       2.2rc1
> MVAPICH2 Release date:  Tue Mar 29 22:00:00 EST 2016
> MVAPICH2 Device:        ch3:mrail
> MVAPICH2 configure:     --prefix=/home/user/mvapich_2.2rc1_debug --enable-error-checking=all --enable-error-messages=all --enable-timing=none --enable-g=most --enable-mpit-pvars=none --enable-fast=Og --enable-check-compiler-flags --enable-fortran=all --enable-cxx --enable-threads=multiple --enable-weak-symbols --disable-dependency-tracking --enable-fast-install --enable-alloca --with-device=ch3:mrail --with-rdma=gen2 --disable-rdma-cm --without-hwloc
> MVAPICH2 CC:    gcc -march=native -Og -ftrapv -fsanitize=undefined -fsanitize=address   -g -Og
> MVAPICH2 CXX:   g++ -march=native -Og -ftrapv -fsanitize=undefined -fsanitize=address  -g -Og
> MVAPICH2 F77:   gfortran -L/lib -L/lib -march=native -Og -ftrapv -fsanitize=undefined -fsanitize=address  -g -Og
> MVAPICH2 FC:    gfortran -march=native -Og -ftrapv -fsanitize=undefined -fsanitize=address  -g -Og

-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-Fix-undefinied-behaviour-and-add-asserts.patch
Type: text/x-patch
Size: 3058 bytes
Desc: not available
URL: <http://mailman.cse.ohio-state.edu/pipermail/mvapich-discuss/attachments/20160412/3dd6f5b0/attachment.bin>

