[mvapich-discuss] Undefined behaviour in MVAPICH-2.2-RC1

Hari Subramoni subramoni.1 at osu.edu
Tue Apr 12 13:51:29 EDT 2016


Hello Jan,

Thank you for the patch. We will take a look at it and include it with the
next release of MVAPICH2.

Regards,
Hari.

On Tue, Apr 12, 2016 at 1:39 PM, Jan Bierbaum <jan.bierbaum at tudos.org> wrote:

> Hi!
>
> Running the newly released MVAPICH 2.2rc1 with full debug options and
> the sanitizers of gcc (Debian 5.3.1-14) enabled on top (see the end of
> this mail for all options used), I found three classes of (potential) problems:
>
>
> 1) Null pointers passed where they should not be
> > /home/user/mvapich2-2.2rc1/contrib/hwloc/src/topology.c:1326:3: runtime error: null pointer passed as argument 1, which is declared to never be null
> [...]
> >
> > /home/user/mvapich2-2.2rc1/src/mpid/ch3/channels/mrail/src/rdma/ch3_smp_progress.c:1699:9: runtime error: null pointer passed as argument 1, which is declared to never be null
>
> The first instance is a bit odd, since I explicitly configured MVAPICH
> with the '--without-hwloc' option, yet hwloc is still used. I guess this
> bug should be reported to hwloc instead, unless the null pointer was
> originally passed in by MVAPICH.
>
>
> 2) Buffer overflow
> > ==13170==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7ffe5ebfdb60 at pc 0x2b6158738445 bp 0x7ffe5ebf9210 sp 0x7ffe5ebf89c0
> > READ of size 11872 at 0x7ffe5ebfdb60 thread T0
> >     #0 0x2b6158738444 in __asan_memcpy (/usr/lib/x86_64-linux-gnu/libasan.so.2+0x88444)
> >     #1 0x2b615a7b6306 in MPIUI_Memcpy /home/user/mvapich2-2.2rc1/src/include/mpiimpl.h:179
> >     #2 0x2b615a7b7413 in MV2_set_iallreduce_tuning_table /home/user/mvapich2-2.2rc1/src/mpi/coll/iallreduce_tuning.c:217
> >     #3 0x2b615a741703 in MV2_collectives_arch_init /home/user/mvapich2-2.2rc1/src/mpi/coll/ch3_shmem_coll.c:407
> >     #4 0x2b615ac4fae6 in MPIDI_CH3_Init /home/user/mvapich2-2.2rc1/src/mpid/ch3/channels/mrail/src/rdma/ch3_init.c:497
> >     #5 0x2b615ac0baab in MPID_Init /home/user/mvapich2-2.2rc1/src/mpid/ch3/src/mpid_init.c:363
> >     #6 0x2b615a9ac338 in MPIR_Init_thread /home/user/mvapich2-2.2rc1/src/mpi/init/initthread.c:512
> >     #7 0x2b615a9a9d7c in PMPI_Init /home/user/mvapich2-2.2rc1/src/mpi/init/init.c:195
> >     #8 0x2b61596b8678 in pmpi_init_ /home/user/mvapich2-2.2rc1/src/binding/fortran/mpif_h/initf.c:275
> [...]
> >
> > Address 0x7ffe5ebfdb60 is located in stack of thread T0 at offset 18720 in frame
> >     #0 0x2b615a7b63c1 in MV2_set_iallreduce_tuning_table /home/user/mvapich2-2.2rc1/src/mpi/coll/iallreduce_tuning.c:23
> >
> >   This frame has 3 object(s):
> >     [32, 8512) 'mv2_tmp_iallreduce_thresholds_table'
> >     [8544, 18720) 'mv2_tmp_iallreduce_thresholds_table'
> >     [18752, 32320) 'mv2_tmp_iallreduce_thresholds_table' <== Memory access at offset 18720 partially underflows this variable
>
> A patch for this one is attached. The patch also inserts a few
> additional asserts that keep this kind of bug from being introduced again.
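
The frame layout already bounds the problem: the reported address (offset 18720) is the first byte past the second table, which is 18720 - 8544 = 10176 bytes long, so the 11872-byte read is larger than that entire table. This is not the attached patch, only a minimal sketch of the kind of assert that catches such a copy; `tuning_row`, `copy_tuning_rows` and `ROWS` are hypothetical names, and the real MVAPICH2 tuning-table structure differs.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one row of a collective tuning table; the real
 * MVAPICH2 structure has different fields. */
typedef struct {
    int numproc;
    int size_inter_table;
    int algo[8];
} tuning_row;

/* Number of elements in an array; pass the array itself, not a pointer. */
#define ROWS(a) (sizeof(a) / sizeof((a)[0]))

/*
 * Copy `count` rows from a source table holding `src_rows` rows into a
 * destination with room for `dst_rows` rows.  The asserts catch the bug
 * class ASan reported: a memcpy whose byte count comes from a row count
 * larger than the array actually being read.
 */
static void copy_tuning_rows(tuning_row *dst, size_t dst_rows,
                             const tuning_row *src, size_t src_rows,
                             size_t count)
{
    assert(count <= src_rows);  /* never read past the source table */
    assert(count <= dst_rows);  /* never write past the destination */
    memcpy(dst, src, count * sizeof *src);
}
```

Deriving both the row count and the byte count from the same array via `ROWS()` keeps the two from drifting apart when one table is edited and another is not.
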
>
>
> 3) Erroneous bitshifts
> > /home/user/mvapich2-2.2rc1/src/mpi/comm/commutil.c:846:5: runtime error: left shift of 1 by 31 places cannot be represented in type 'int'
> > /home/user/mvapich2-2.2rc1/src/mpi/comm/commutil.c:849:21: runtime error: left shift of 1 by 31 places cannot be represented in type 'int'
> [...]
> > /home/user/mvapich2-2.2rc1/src/mpi/comm/commutil.c:1760:36: runtime error: left shift of 1 by 31 places cannot be represented in type 'int'
> > /home/user/mvapich2-2.2rc1/src/mpi/comm/commutil.c:1776:31: runtime error: left shift of 1 by 31 places cannot be represented in type 'int'
>
> For this and the first problem I cannot easily provide a fix, since I
> don't know the precise semantics of the code involved. The bitshifts
> might be off-by-one errors, though.
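
With a 32-bit int, the literal 1 is a signed int and 1 << 31 is not representable, which is undefined behaviour in C. If these shifts are meant to address bit 31 of a 32-bit mask rather than being off-by-one errors (an assumption; the commutil.c code is not quoted here), the usual fix is to shift an unsigned value. `bit_mask` and `bit_is_set` are hypothetical helpers for illustration only.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: returns a mask with only bit `pos` set.  Shifting the
 * unsigned value (uint32_t)1 instead of the signed literal 1 makes bit 31
 * well defined on platforms with 32-bit int, where 1 << 31 overflows int.
 */
static uint32_t bit_mask(unsigned int pos)
{
    assert(pos < 32);  /* shifting by the type's width or more is also UB */
    return (uint32_t)1 << pos;
}

/* Hypothetical helper: nonzero if bit `pos` of `word` is set. */
static int bit_is_set(uint32_t word, unsigned int pos)
{
    return (word & bit_mask(pos)) != 0;
}
```

Where unsigned int is known to be 32 bits wide, writing `1U << 31` directly has the same effect.
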
>
>
>
> Regards, Jan
>
>
> PS: Here's the full configuration in case you are interested.
> > $ mpichversion
> > MVAPICH2 Version:       2.2rc1
> > MVAPICH2 Release date:  Tue Mar 29 22:00:00 EST 2016
> > MVAPICH2 Device:        ch3:mrail
> > MVAPICH2 configure:     --prefix=/home/user/mvapich_2.2rc1_debug --enable-error-checking=all --enable-error-messages=all --enable-timing=none --enable-g=most --enable-mpit-pvars=none --enable-fast=Og --enable-check-compiler-flags --enable-fortran=all --enable-cxx --enable-threads=multiple --enable-weak-symbols --disable-dependency-tracking --enable-fast-install --enable-alloca --with-device=ch3:mrail --with-rdma=gen2 --disable-rdma-cm --without-hwloc
> > MVAPICH2 CC:    gcc -march=native -Og -ftrapv -fsanitize=undefined -fsanitize=address   -g -Og
> > MVAPICH2 CXX:   g++ -march=native -Og -ftrapv -fsanitize=undefined -fsanitize=address  -g -Og
> > MVAPICH2 F77:   gfortran -L/lib -L/lib -march=native -Og -ftrapv -fsanitize=undefined -fsanitize=address  -g -Og
> > MVAPICH2 FC:    gfortran -march=native -Og -ftrapv -fsanitize=undefined -fsanitize=address  -g -Og
>
>
> _______________________________________________
> mvapich-discuss mailing list
> mvapich-discuss at cse.ohio-state.edu
> http://mailman.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>
>

