[mvapich-discuss] mvapich2 + MALLOC_CHECK_

Sayantan Sur surs at cse.ohio-state.edu
Wed Jun 2 20:07:56 EDT 2010


Hi Dan,

I looked into this issue, and it appears to be a bug in ptmalloc2.
MVAPICH/MVAPICH2 uses the ptmalloc2 implementation of malloc to provide
safe memory registration/de-registration. The malloc checking code
appears to be buggy for memory allocated through 'valloc'.

This bug seems to have been fixed in ptmalloc3. The last time we
looked into upgrading to ptmalloc3, we saw this message on ptmalloc's
website: "In multi-thread Applications, ptmalloc2 is currently
slightly more memory-efficient than ptmalloc3."
[http://www.malloc.de/en/] For that reason, we decided not to upgrade to ptmalloc3.

If you set MALLOC_CHECK_=1 instead, you will get a warning, but your
program will continue. Presumably you enabled this checking to find
bugs in your MPI program? Perhaps you can overlook this one warning
for now and let us know how it goes. We will also investigate
ptmalloc3 and plan to incorporate it in a future release.

Thanks.

On Wed, Jun 2, 2010 at 3:47 PM, Sayantan Sur <surs at cse.ohio-state.edu> wrote:
> Hi Dan,
>
> Thanks for reporting this. I don't think anyone has reported this
> before. I was able to reproduce it on our systems, and am currently
> looking into this issue.
>
> Thanks.
>
> On Tue, Jun 1, 2010 at 6:25 PM, Dan Kokron <daniel.kokron at nasa.gov> wrote:
>> I am attempting to debug an application that fails during MPI_Finalize.
>> After trying the usual debugging options (-g etc), I set MALLOC_CHECK_=2
>> to see what would happen.  It now fails with the following trace during
>> MPI_Init.  I didn't see any mention of this issue in the archives.
>> Maybe I missed it.
>>
>> #0  0x00000000052e5bb5 in raise () from /lib64/libc.so.6
>> #1  0x00000000052e6fb0 in abort () from /lib64/libc.so.6
>> #2  0x00000000005718f9 in for__signal_handler ()
>> #3  <signal handler called>
>> #4  0x00000000052e5bb5 in raise () from /lib64/libc.so.6
>> #5  0x00000000052e6fb0 in abort () from /lib64/libc.so.6
>> #6  0x0000000000412126 in free_check (mem=0x4138000, caller=0x0) at hooks.c:274
>> #7  0x000000000041480a in free (mem=0x4138000) at mvapich_malloc.c:3443
>> #8  0x00000000004180ce in mvapich2_minit () at mem_hooks.c:86
>> #9  0x00000000005526a8 in MPIDI_CH3I_RDMA_init (pg=0x411f618, pg_rank=21) at rdma_iba_init.c:153
>> #10 0x000000000054d148 in MPIDI_CH3_Init (has_parent=0, pg=0x411f618, pg_rank=21) at ch3_init.c:161
>> #11 0x00000000004d9cce in MPID_Init (argc=0x0, argv=0x0, requested=0, provided=0x7feffba78, has_args=0x7feffba80, has_env=0x7feffba7c) at mpid_init.c:189
>> #12 0x0000000000435780 in MPIR_Init_thread (argc=0x0, argv=0x0, required=0, provided=0x0) at initthread.c:305
>> #13 0x0000000000434582 in PMPI_Init (argc=0x0, argv=0x0) at init.c:135
>> #14 0x0000000000410e0f in pmpi_init_ (ierr=0x7feffe774) at initf.c:129
>> #15 0x000000000040bdbf in gcrm_test_io () at gcrm_test_io.f90:27
>> #16 0x000000000040bcdc in main ()
>>
>> Valgrind-3.5.0 gives the following
>>
>> ==21574== Conditional jump or move depends on uninitialised value(s)
>> ==21574==    at 0x41182C: mem2chunk_check (hooks.c:165)
>> ==21574==    by 0x4120C3: free_check (hooks.c:268)
>> ==21574==    by 0x414809: free (mvapich_malloc.c:3443)
>> ==21574==    by 0x4180CD: mvapich2_minit (mem_hooks.c:86)
>> ==21574==    by 0x5526A7: MPIDI_CH3I_RDMA_init (rdma_iba_init.c:153)
>> ==21574==    by 0x54D147: MPIDI_CH3_Init (ch3_init.c:161)
>> ==21574==    by 0x4D9CCD: MPID_Init (mpid_init.c:189)
>> ==21574==    by 0x43577F: MPIR_Init_thread (initthread.c:305)
>> ==21574==    by 0x434581: PMPI_Init (init.c:135)
>> ==21574==    by 0x410E0E: mpi_init_ (initf.c:129)
>> ==21574==    by 0x40BDBE: MAIN__ (gcrm_test_io.f90:27)
>> ==21574==    by 0x40BCDB: main (in /gpfsm/dhome/dkokron/play/mpi-io/gcrm_test_io.x)
>> ==21574==  Uninitialised value was created
>> ==21574==    at 0x536FC7A: brk (in /lib64/libc-2.4.so)
>> ==21574==    by 0x536FD41: sbrk (in /lib64/libc-2.4.so)
>> ==21574==    by 0x418251: mvapich2_sbrk (mem_hooks.c:148)
>> ==21574==    by 0x414058: sYSMALLOc (mvapich_malloc.c:2983)
>> ==21574==    by 0x41647E: _int_malloc (mvapich_malloc.c:4318)
>> ==21574==    by 0x411FE8: malloc_check (hooks.c:252)
>> ==21574==    by 0x414607: malloc (mvapich_malloc.c:3395)
>> ==21574==    by 0x4113AA: malloc_hook_ini (hooks.c:28)
>> ==21574==    by 0x414607: malloc (mvapich_malloc.c:3395)
>> ==21574==    by 0x57E382: for__get_vm (in /gpfsm/dhome/dkokron/play/mpi-io/gcrm_test_io.x)
>> ==21574==    by 0x5722B2: for_rtl_init_ (in /gpfsm/dhome/dkokron/play/mpi-io/gcrm_test_io.x)
>> ==21574==    by 0x40BCD6: main (in /gpfsm/dhome/dkokron/play/mpi-io/gcrm_test_io.x)
>>
>> I am using mvapich2-1.4-2010-05-25 configured as follows
>>
>> ./configure CC=icc CXX=icpc F77=ifort F90=ifort CFLAGS="-DRDMA_CM -fpic
>> -O0 -traceback -debug" CXXFLAGS="-DRDMA_CM -fpic -O0 -traceback -debug"
>> FFLAGS="-fpic -O0 -traceback -debug -nolib-inline -check bounds -check
>> uninit -fp-stack-check -ftrapuv" F90FLAGS="-fpic -O0 -traceback -debug
>> -nolib-inline -check bounds -check uninit -fp-stack-check -ftrapuv"
>> --prefix=/discover/nobackup/dkokron/mv2-1.4.1_debug
>> --enable-error-checking=all --enable-error-messages=all --enable-g=all
>> --enable-f77 --enable-f90 --enable-cxx --enable-mpe --enable-romio
>> --enable-threads=multiple --with-rdma=gen2
>>
>> on Linux 2.6.16.60-0.42.5-smp
>> with the Intel compilers (v11.0.083).
>>
>> Note that line 86 in my mem_hooks.c (I added some debug prints)
>> is the marked line below:
>>
>>    free(ptr_calloc);
>> --->free(ptr_valloc);  <---
>>    free(ptr_memalign);
>>
>> --
>> Dan Kokron
>> Global Modeling and Assimilation Office
>> NASA Goddard Space Flight Center
>> Greenbelt, MD 20771
>> Daniel.S.Kokron at nasa.gov
>> Phone: (301) 614-5192
>> Fax:   (301) 614-5304
>>
>>
>
>
>
> --
> Sayantan Sur
>
> Research Scientist
> Department of Computer Science
> The Ohio State University.
>



-- 
Sayantan Sur

Research Scientist
Department of Computer Science
The Ohio State University.


