[mvapich-discuss] mvapich2 + MALLOC_CHECK_

Sayantan Sur surs at cse.ohio-state.edu
Wed Jun 2 15:47:32 EDT 2010


Hi Dan,

Thanks for reporting this. I don't think anyone has reported it
before. I was able to reproduce it on our systems and am currently
looking into the issue.

Thanks.

On Tue, Jun 1, 2010 at 6:25 PM, Dan Kokron <daniel.kokron at nasa.gov> wrote:
> I am attempting to debug an application that fails during MPI_Finalize.
> After trying the usual debugging options (-g etc), I set MALLOC_CHECK_=2
> to see what would happen.  It now fails with the following trace during
> MPI_Init.  I didn't see any mention of this issue in the archives.
> Maybe I missed it.
>
> #0  0x00000000052e5bb5 in raise () from /lib64/libc.so.6
> #1  0x00000000052e6fb0 in abort () from /lib64/libc.so.6
> #2  0x00000000005718f9 in for__signal_handler ()
> #3  <signal handler called>
> #4  0x00000000052e5bb5 in raise () from /lib64/libc.so.6
> #5  0x00000000052e6fb0 in abort () from /lib64/libc.so.6
> #6  0x0000000000412126 in free_check (mem=0x4138000, caller=0x0) at hooks.c:274
> #7  0x000000000041480a in free (mem=0x4138000) at mvapich_malloc.c:3443
> #8  0x00000000004180ce in mvapich2_minit () at mem_hooks.c:86
> #9  0x00000000005526a8 in MPIDI_CH3I_RDMA_init (pg=0x411f618, pg_rank=21) at rdma_iba_init.c:153
> #10 0x000000000054d148 in MPIDI_CH3_Init (has_parent=0, pg=0x411f618, pg_rank=21) at ch3_init.c:161
> #11 0x00000000004d9cce in MPID_Init (argc=0x0, argv=0x0, requested=0, provided=0x7feffba78, has_args=0x7feffba80, has_env=0x7feffba7c) at mpid_init.c:189
> #12 0x0000000000435780 in MPIR_Init_thread (argc=0x0, argv=0x0, required=0, provided=0x0) at initthread.c:305
> #13 0x0000000000434582 in PMPI_Init (argc=0x0, argv=0x0) at init.c:135
> #14 0x0000000000410e0f in pmpi_init_ (ierr=0x7feffe774) at initf.c:129
> #15 0x000000000040bdbf in gcrm_test_io () at gcrm_test_io.f90:27
> #16 0x000000000040bcdc in main ()
>
> Valgrind-3.5.0 gives the following
>
> ==21574== Conditional jump or move depends on uninitialised value(s)
> ==21574==    at 0x41182C: mem2chunk_check (hooks.c:165)
> ==21574==    by 0x4120C3: free_check (hooks.c:268)
> ==21574==    by 0x414809: free (mvapich_malloc.c:3443)
> ==21574==    by 0x4180CD: mvapich2_minit (mem_hooks.c:86)
> ==21574==    by 0x5526A7: MPIDI_CH3I_RDMA_init (rdma_iba_init.c:153)
> ==21574==    by 0x54D147: MPIDI_CH3_Init (ch3_init.c:161)
> ==21574==    by 0x4D9CCD: MPID_Init (mpid_init.c:189)
> ==21574==    by 0x43577F: MPIR_Init_thread (initthread.c:305)
> ==21574==    by 0x434581: PMPI_Init (init.c:135)
> ==21574==    by 0x410E0E: mpi_init_ (initf.c:129)
> ==21574==    by 0x40BDBE: MAIN__ (gcrm_test_io.f90:27)
> ==21574==    by 0x40BCDB: main (in /gpfsm/dhome/dkokron/play/mpi-io/gcrm_test_io.x)
> ==21574==  Uninitialised value was created
> ==21574==    at 0x536FC7A: brk (in /lib64/libc-2.4.so)
> ==21574==    by 0x536FD41: sbrk (in /lib64/libc-2.4.so)
> ==21574==    by 0x418251: mvapich2_sbrk (mem_hooks.c:148)
> ==21574==    by 0x414058: sYSMALLOc (mvapich_malloc.c:2983)
> ==21574==    by 0x41647E: _int_malloc (mvapich_malloc.c:4318)
> ==21574==    by 0x411FE8: malloc_check (hooks.c:252)
> ==21574==    by 0x414607: malloc (mvapich_malloc.c:3395)
> ==21574==    by 0x4113AA: malloc_hook_ini (hooks.c:28)
> ==21574==    by 0x414607: malloc (mvapich_malloc.c:3395)
> ==21574==    by 0x57E382: for__get_vm (in /gpfsm/dhome/dkokron/play/mpi-io/gcrm_test_io.x)
> ==21574==    by 0x5722B2: for_rtl_init_ (in /gpfsm/dhome/dkokron/play/mpi-io/gcrm_test_io.x)
> ==21574==    by 0x40BCD6: main (in /gpfsm/dhome/dkokron/play/mpi-io/gcrm_test_io.x)
>
> I am using mvapich2-1.4-2010-05-25 configured as follows
>
> ./configure CC=icc CXX=icpc F77=ifort F90=ifort CFLAGS="-DRDMA_CM -fpic
> -O0 -traceback -debug" CXXFLAGS="-DRDMA_CM -fpic -O0 -traceback -debug"
> FFLAGS="-fpic -O0 -traceback -debug -nolib-inline -check bounds -check
> uninit -fp-stack-check -ftrapuv" F90FLAGS="-fpic -O0 -traceback -debug
> -nolib-inline -check bounds -check uninit -fp-stack-check -ftrapuv"
> --prefix=/discover/nobackup/dkokron/mv2-1.4.1_debug
> --enable-error-checking=all --enable-error-messages=all --enable-g=all
> --enable-f77 --enable-f90 --enable-cxx --enable-mpe --enable-romio
> --enable-threads=multiple --with-rdma=gen2
>
> on Linux
> 2.6.16.60-0.42.5-smp
>
> and Intel compilers (v 11.0.083)
>
> Note that line 86 in my copy of mem_hooks.c (I added some debug
> prints) is the marked call below:
>
>    free(ptr_calloc);
> --->free(ptr_valloc);  <---
>    free(ptr_memalign);
>
> --
> Dan Kokron
> Global Modeling and Assimilation Office
> NASA Goddard Space Flight Center
> Greenbelt, MD 20771
> Daniel.S.Kokron at nasa.gov
> Phone: (301) 614-5192
> Fax:   (301) 614-5304



-- 
Sayantan Sur

Research Scientist
Department of Computer Science
The Ohio State University.


