[mvapich-discuss] Runtime warning - Error in initializing MVAPICH2 ptmalloc library

Nenad Vukicevic nenad at intrepid.com
Thu Mar 10 17:37:43 EST 2016


Jonathan, I quickly made all the pointers file-scope static variables and
that resolved the issue.   Gary pointed to the following gcc issues,
but they are all related to optimizing memset() calls (which are
indeed removed).

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67618
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68034
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69976
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70090
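
A minimal sketch of that workaround, for illustration only.  It is not
the actual MVAPICH2 change: the helper name probe_allocators() and the
value of PT_TEST_ALLOC_SIZE are assumptions, and only the malloc, calloc
and realloc probes are shown.  The point is that the probe pointers live
at file scope rather than as locals of the init function.

```c
#include <stdlib.h>
#include <string.h>

/* Assumed value for illustration; MVAPICH2 defines its own PT_TEST_ALLOC_SIZE. */
#define PT_TEST_ALLOC_SIZE 64

/* File-scope static pointers (the workaround reported in this thread),
 * instead of locals inside the init function. */
static void *ptr_malloc;
static void *ptr_calloc;
static void *ptr_realloc;

/* Hypothetical helper, not MVAPICH2 code: performs the probe allocations
 * and returns 0 if all of them succeeded, -1 otherwise. */
int probe_allocators(void)
{
    int ok;

    ptr_malloc = malloc(PT_TEST_ALLOC_SIZE);
    ptr_calloc = calloc(PT_TEST_ALLOC_SIZE, 1);
    ptr_realloc = realloc(ptr_malloc, PT_TEST_ALLOC_SIZE);

    ok = (ptr_calloc != NULL) && (ptr_realloc != NULL);

    if (ok) {
        memset(ptr_calloc, 0, PT_TEST_ALLOC_SIZE);
        memset(ptr_realloc, 0, PT_TEST_ALLOC_SIZE);
    }

    free(ptr_calloc);
    if (ptr_realloc != NULL) {
        free(ptr_realloc);
    } else {
        free(ptr_malloc);   /* realloc failed, so the original block is still ours */
    }
    ptr_malloc = ptr_calloc = ptr_realloc = NULL;

    return ok ? 0 : -1;
}
```

Whether file-scope storage alone is enough to keep the calls may depend
on the compiler version and flags, so it is worth checking the
disassembly again after a change like this.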



On Thu, Mar 10, 2016 at 2:18 PM, Jonathan Perkins
<perkinjo at cse.ohio-state.edu> wrote:
> Good find.  I have a guess about what is going on here.  Let me know if this
> makes sense to you.
>
> Could it be that the compiler is noticing that calloc sets the memory to 0
> and thus optimizes out the memset.  Then after it optimizes out memset it
> then optimizes out calloc since nothing else is done with that memory?
>
> As a test, can you change the memset for the ptr_calloc to something that is
> non-zero and see if this resolves the issue for you?
>
>
> On Thu, Mar 10, 2016 at 4:59 PM Nenad Vukicevic <nenad at intrepid.com> wrote:
>>
>> Jonathan, I tracked this down to the following problem in the code in
>> mvapich2_minit():
>>
>> int mvapich2_minit()
>> {
>> ...
>>
>>     ptr_malloc = malloc(PT_TEST_ALLOC_SIZE);
>>     ptr_calloc = calloc(PT_TEST_ALLOC_SIZE, 1);
>>     ptr_realloc = realloc(ptr_malloc, PT_TEST_ALLOC_SIZE);
>>     ptr_valloc = valloc(PT_TEST_ALLOC_SIZE);
>>     ptr_memalign = memalign(64, PT_TEST_ALLOC_SIZE);
>>
>>     memset(ptr_calloc, 0, PT_TEST_ALLOC_SIZE);
>>     memset(ptr_realloc, 0, PT_TEST_ALLOC_SIZE);
>>     memset(ptr_valloc, 0, PT_TEST_ALLOC_SIZE);
>>     memset(ptr_memalign, 0, PT_TEST_ALLOC_SIZE);
>>
>>     free(ptr_calloc);
>>     free(ptr_valloc);
>>     free(ptr_memalign);
>>
>> I added some debugging print statements that confirmed that only
>> 'mvapich2_minfo.is_our_calloc' is set to 0.  After disassembling the
>> object file I can see that the calls to calloc() and memset() are
>> missing.  This is probably an optimization where the compiler notices
>> that the memory is never used and removes the calls.
>>
>> Our gcc is 5.3.1 20151207.
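
The flag check described above can be sketched roughly as follows.  This
is an illustration under stated assumptions, not MVAPICH2 code: only the
field name is_our_calloc comes from this thread, while alloc_info,
our_malloc(), our_calloc() and hooks_were_called() are hypothetical.
Calls are routed through volatile function pointers as a stand-in for
real symbol interception, which also shows one way to keep a compiler
from dropping a probe call whose result is never used.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical subset of a bookkeeping struct; only the field name
 * is_our_calloc appears in this thread. */
struct alloc_info {
    int is_our_malloc;
    int is_our_calloc;
};

static struct alloc_info info;

/* Hypothetical hooks: record that they ran, then defer to libc. */
static void *our_malloc(size_t size)
{
    info.is_our_malloc = 1;
    return malloc(size);
}

static void *our_calloc(size_t nmemb, size_t size)
{
    info.is_our_calloc = 1;
    return calloc(nmemb, size);
}

/* The pointers are volatile, so the compiler must load them and make
 * the calls; it cannot assume which function is behind them. */
static void *(*volatile malloc_fn)(size_t) = our_malloc;
static void *(*volatile calloc_fn)(size_t, size_t) = our_calloc;

/* Returns 1 if both hooks were reached by the probe calls, 0 otherwise. */
int hooks_were_called(void)
{
    void *a;
    void *b;

    info.is_our_malloc = 0;
    info.is_our_calloc = 0;

    a = malloc_fn(64);
    b = calloc_fn(64, 1);

    free(a);
    free(b);

    return info.is_our_malloc && info.is_our_calloc;
}
```

If a probe call is optimized away, its flag stays 0 even though the hook
itself is fine, which matches the symptom of only is_our_calloc being 0.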
>>
>> On Wed, Feb 24, 2016 at 2:10 PM, Jonathan Perkins
>> <perkinjo at cse.ohio-state.edu> wrote:
>> > Thank you for providing this.  I'll see if my build matches up once I
>> > get my
>> > hands on a fedora environment.
>> >
>> > As far as RPM goes, I wasn't actually asking for the RPM file itself, I
>> > was
>> > asking for the name of the RPM.  We provide multiple RPMs (X, GDR, EA,
>> > etc.)
>> > and I wanted to be sure we were debugging the correct code/build.  Your
>> > output of mpiname is sufficient so you do not need to send any more info
>> > at
>> > this time.
>> >
>> > On Wed, Feb 24, 2016 at 4:40 PM Nenad Vukicevic <nenad at intrepid.com>
>> > wrote:
>> >>
>> >> I can provide you with the RPM, but it is pretty much the same as what
>> >> you provided for RHEL6.  I just rebuilt it on Fedora.
>> >>
>> >> On the other hand, the latest runs are all done from the locally built
>> >> tree.  Here is the output from mpiname:
>> >>
>> >>
>> >>
>> >> MVAPICH2 2.2b Mon Nov 12 20:00:00 EST 2015 ch3:mrail
>> >>
>> >> Compilation
>> >> CC: gcc    -DNDEBUG -DNVALGRIND -g -O2
>> >> CXX: g++   -DNDEBUG -DNVALGRIND -g -O2
>> >> F77: gfortran -L/lib -L/lib   -g -O2
>> >> FC: gfortran   -g -O2
>> >>
>> >> Configuration
>> >> --prefix=/usr/local/mvapich-debug --enable-g=all --enable-error-messages=all
>> >>
>> >>
>> >>
>> >> And ldd:
>> >>
>> >>
>> >>
>> >> [nenad at dev one-sided]$ ldd osu_acc_latency
>> >>         linux-vdso.so.1 (0x00007ffff7ffd000)
>> >>         libpthread.so.0 => /usr/lib64/libpthread.so.0 (0x00007ffff7d69000)
>> >>         libm.so.6 => /usr/lib64/libm.so.6 (0x00007ffff7a66000)
>> >>         libmpi.so.12 => /usr/local/mvapich-debug/lib/libmpi.so.12 (0x00007ffff72f7000)
>> >>         libc.so.6 => /usr/lib64/libc.so.6 (0x00007ffff6f36000)
>> >>         /lib64/ld-linux-x86-64.so.2 (0x0000555555554000)
>> >>         libnuma.so.1 => /usr/lib64/libnuma.so.1 (0x00007ffff6d2a000)
>> >>         libudev.so.1 => /usr/lib64/libudev.so.1 (0x00007ffff6d09000)
>> >>         libpciaccess.so.0 => /usr/lib64/libpciaccess.so.0 (0x00007ffff6aff000)
>> >>         libxml2.so.2 => /usr/lib64/libxml2.so.2 (0x00007ffff6795000)
>> >>         libibmad.so.5 => /usr/lib64/libibmad.so.5 (0x00007ffff657b000)
>> >>         librdmacm.so.1 => /usr/lib64/librdmacm.so.1 (0x00007ffff6365000)
>> >>         libibumad.so.3 => /usr/lib64/libibumad.so.3 (0x00007ffff615c000)
>> >>         libibverbs.so.1 => /usr/lib64/libibverbs.so.1 (0x00007ffff5f49000)
>> >>         libdl.so.2 => /usr/lib64/libdl.so.2 (0x00007ffff5d45000)
>> >>         librt.so.1 => /usr/lib64/librt.so.1 (0x00007ffff5b3c000)
>> >>         libgfortran.so.3 => /usr/lib64/libgfortran.so.3 (0x00007ffff5810000)
>> >>         libgcc_s.so.1 => /usr/lib64/libgcc_s.so.1 (0x00007ffff55f9000)
>> >>         libquadmath.so.0 => /usr/lib64/libquadmath.so.0 (0x00007ffff53b9000)
>> >>         libselinux.so.1 => /usr/lib64/libselinux.so.1 (0x00007ffff5196000)
>> >>         libresolv.so.2 => /usr/lib64/libresolv.so.2 (0x00007ffff4f7a000)
>> >>         libdw.so.1 => /usr/lib64/libdw.so.1 (0x00007ffff4d30000)
>> >>         libcap.so.2 => /usr/lib64/libcap.so.2 (0x00007ffff4b2b000)
>> >>         libz.so.1 => /usr/lib64/libz.so.1 (0x00007ffff4915000)
>> >>         liblzma.so.5 => /usr/lib64/liblzma.so.5 (0x00007ffff46ee000)
>> >>         libnl-route-3.so.200 => /usr/lib64/libnl-route-3.so.200 (0x00007ffff4488000)
>> >>         libnl-3.so.200 => /usr/lib64/libnl-3.so.200 (0x00007ffff4267000)
>> >>         libpcre.so.1 => /usr/lib64/libpcre.so.1 (0x00007ffff3ff6000)
>> >>         libelf.so.1 => /usr/lib64/libelf.so.1 (0x00007ffff3dde000)
>> >>         libbz2.so.1 => /usr/lib64/libbz2.so.1 (0x00007ffff3bcd000)
>> >>         libattr.so.1 => /usr/lib64/libattr.so.1 (0x00007ffff39c7000)
>> >>
>> >>
>> >>
>> >>
>> >>
>> >> From: Jonathan Perkins [mailto:perkinjo at cse.ohio-state.edu]
>> >> Sent: Wednesday, February 24, 2016 12:28 PM
>> >> To: Nenad Vukicevic <nenad at intrepid.com>
>> >> Cc: mvapich-discuss at cse.ohio-state.edu
>> >> Subject: Re:
>> >>
>> >>
>> >>
>> >> Thanks for trying this out.  I believe you said that this was with
>> >> Fedora
>> >> 23 while using one of our RPMs.  Can you share with us the RPM used and
>> >> the
>> >> output of mpiname -a?  Can you also send us the output of ldd
>> >> osu_acc_latency?  We'll try to reproduce this issue and think of a
>> >> workaround.
>> >>
>> >>
>> >>
>> >> On Wed, Feb 24, 2016 at 12:54 PM Nenad Vukicevic <nenad at intrepid.com>
>> >> wrote:
>> >>
>> >> I got the same result.
>> >>
>> >>
>> >>
>> >> [nenad at dev one-sided]$ mpirun -n 2 osu_acc_latency
>> >> WARNING: Error in initializing MVAPICH2 ptmalloc library.Continuing without InfiniBand registration cache support.
>> >> # OSU MPI_Accumulate latency Test v5.2
>> >> # Window creation: MPI_Win_allocate
>> >> # Synchronization: MPI_Win_flush
>> >> # Size          Latency (us)
>> >> 0                       0.14
>> >> 1                       3.50
>> >> 2                       3.12
>> >> 4                       3.01
>> >> 8                       3.99
>> >> 16                      4.02
>> >> 32                      4.04
>> >> 64                      4.14
>> >> 128                     4.79
>> >> 256                     5.08
>> >> 512                     5.50
>> >> 1024                    6.42
>> >> 2048                    7.51
>> >> 4096                    8.93
>> >> 8192                   13.53
>> >> 16384                  26.76
>> >> 32768                  43.19
>> >> 65536                 193.44
>> >> 131072                261.60
>> >> 262144                397.56
>> >> 524288                670.97
>> >> 1048576              1214.61
>> >> 2097152              2692.48
>> >> 4194304              5265.03
>> >>
>> >>
>> >>
>> >> On Wed, Feb 24, 2016 at 8:22 AM, Nenad Vukicevic <nenad at intrepid.com>
>> >> wrote:
>> >>
>> >> We are not doing anything special; the warning appears on a simple
>> >> hello program. I have MVAPICH2 built from 2.2b with and without
>> >> debugging, and also from an RPM created from the RHEL 6 spec.  Note
>> >> that this is a fairly new Fedora (FC23).
>> >>
>> >>
>> >>
>> >> I'll try OMB for warnings.
>> >>
>> >>
>> >>
>> >>
>> >>
>> >> On Wed, Feb 24, 2016 at 7:28 AM, Jonathan Perkins
>> >> <perkinjo at cse.ohio-state.edu> wrote:
>> >>
>> >> Hello Nenad.  Can you tell us more about your application?
>> >> Specifically
>> >> if there are any libraries or special handling of memory allocation
>> >> that may
>> >> be linked in.  This type of problem usually occurs if MVAPICH2 isn't
>> >> able to
>> >> properly intercept the malloc and free calls.
>> >>
>> >> If you don't believe that your application is doing anything like this,
>> >> can you verify that you're able to run the OMB suite (such as
>> >> osu_latency)
>> >> without this warning being emitted.
>> >>
>> >>
>> >>
>> >> On Wed, Feb 24, 2016 at 3:11 AM Nenad Vukicevic <nenad at intrepid.com>
>> >> wrote:
>> >>
>> >> We are running mvapich 2.2b on Fedora 23. We are getting the following
>> >> warning when running the code:
>> >>
>> >> WARNING: Error in initializing MVAPICH2 ptmalloc library.Continuing
>> >> without
>> >> InfiniBand registration cache support.
>> >>
>> >> I saw some previous discussions on the subject but none of the
>> >> suggested solutions worked (there was a patch on 2.1a(?), plus
>> >> LD_PRELOAD of the mpich library, etc.).
>> >>
>> >> This error shows up on a simple hello test but only if we run on
>> >> multiple
>> >> nodes.  I can run multiple threads on the same node without causing
>> >> this
>> >> warning.
>> >>
>> >> Any idea what we can try?  I understand that there will be a slight
>> >> decrease in performance if we disable ptmalloc.
>> >>
>> >> --
>> >> Nenad
>> >>
>> >>
>> >>
>> >>
>> >>
>> >>
>> >> --
>> >>
>> >> Nenad
>> >>
>> >>
>> >>
>> >>
>> >>
>> >> --
>> >>
>> >> Nenad
>>
>>
>>
>> --
>> Nenad



-- 
Nenad

