[mvapich-discuss] Runtime warning - Error in initializing MVAPICH2 ptmalloc library

Jonathan Perkins perkinjo at cse.ohio-state.edu
Thu Mar 10 17:59:07 EST 2016


Thanks for providing the bug pointers and debugging the issue.  We'll
update our initialization routine to work around this *feature* for our
next release.

On Thu, Mar 10, 2016 at 5:38 PM Nenad Vukicevic <nenad at intrepid.com> wrote:

> Jonathan, I quickly made all of the pointers static at file scope and
> that resolved the issue.  Gary pointed me to the following gcc issues,
> but they are all related to optimizing away memset() calls (which are
> indeed removed).
>
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67618
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68034
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69976
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70090
>
>
>
> On Thu, Mar 10, 2016 at 2:18 PM, Jonathan Perkins
> <perkinjo at cse.ohio-state.edu> wrote:
> > Good find.  I have a guess about what is going on here.  Let me know
> > if this makes sense to you.
> >
> > Could it be that the compiler notices that calloc sets the memory to
> > 0 and thus optimizes out the memset?  Then, after it optimizes out
> > the memset, it optimizes out the calloc as well, since nothing else
> > is done with that memory?
> >
> > As a test, can you change the memset for ptr_calloc to something that
> > is non-zero and see if this resolves the issue for you?
> >
> >
> > On Thu, Mar 10, 2016 at 4:59 PM Nenad Vukicevic <nenad at intrepid.com>
> > wrote:
> >>
> >> Jonathan, I tracked this down to the following problem in the code in
> >> mvapich2_minit():
> >>
> >> int mvapich2_minit()
> >> {
> >> ...
> >>
> >>     ptr_malloc = malloc(PT_TEST_ALLOC_SIZE);
> >>     ptr_calloc = calloc(PT_TEST_ALLOC_SIZE, 1);
> >>     ptr_realloc = realloc(ptr_malloc, PT_TEST_ALLOC_SIZE);
> >>     ptr_valloc = valloc(PT_TEST_ALLOC_SIZE);
> >>     ptr_memalign = memalign(64, PT_TEST_ALLOC_SIZE);
> >>
> >>     memset(ptr_calloc, 0, PT_TEST_ALLOC_SIZE);
> >>     memset(ptr_realloc, 0, PT_TEST_ALLOC_SIZE);
> >>     memset(ptr_valloc, 0, PT_TEST_ALLOC_SIZE);
> >>     memset(ptr_memalign, 0, PT_TEST_ALLOC_SIZE);
> >>
> >>     free(ptr_calloc);
> >>     free(ptr_valloc);
> >>     free(ptr_memalign);
> >>
> >> I added some debugging print statements that confirmed that only
> >> 'mvapich2_minfo.is_our_calloc' is left at 0.  After disassembling the
> >> object file I can see that the calls to calloc() and memset() are
> >> missing.  This is probably an optimization where the compiler notices
> >> that the memory is never used and removes the calls.
> >>
> >> Our gcc is 5.3.1 20151207.
> >>
> >> On Wed, Feb 24, 2016 at 2:10 PM, Jonathan Perkins
> >> <perkinjo at cse.ohio-state.edu> wrote:
> >> > Thank you for providing this.  I'll see if my build matches up once
> >> > I get my hands on a Fedora environment.
> >> >
> >> > As far as the RPM goes, I wasn't actually asking for the RPM file
> >> > itself, I was asking for the name of the RPM.  We provide multiple
> >> > RPMs (X, GDR, EA, etc.) and I wanted to be sure we were debugging
> >> > the correct code/build.  Your output of mpiname is sufficient, so
> >> > you do not need to send any more info at this time.
> >> >
> >> > On Wed, Feb 24, 2016 at 4:40 PM Nenad Vukicevic <nenad at intrepid.com>
> >> > wrote:
> >> >>
> >> >> I can provide you with the RPM, but it is pretty much the same as
> >> >> what you provided for RHEL6.  I just rebuilt it on Fedora.
> >> >>
> >> >> On the other hand, the latest runs are all done from the locally
> >> >> built tree.  Here is the output from mpiname:
> >> >>
> >> >> MVAPICH2 2.2b Mon Nov 12 20:00:00 EST 2015 ch3:mrail
> >> >>
> >> >> Compilation
> >> >> CC: gcc    -DNDEBUG -DNVALGRIND -g -O2
> >> >> CXX: g++   -DNDEBUG -DNVALGRIND -g -O2
> >> >> F77: gfortran -L/lib -L/lib   -g -O2
> >> >> FC: gfortran   -g -O2
> >> >>
> >> >> Configuration
> >> >> --prefix=/usr/local/mvapich-debug --enable-g=all
> >> >> --enable-error-messages=all
> >> >>
> >> >> And ldd:
> >> >>
> >> >> [nenad at dev one-sided]$ ldd osu_acc_latency
> >> >>         linux-vdso.so.1 (0x00007ffff7ffd000)
> >> >>         libpthread.so.0 => /usr/lib64/libpthread.so.0 (0x00007ffff7d69000)
> >> >>         libm.so.6 => /usr/lib64/libm.so.6 (0x00007ffff7a66000)
> >> >>         libmpi.so.12 => /usr/local/mvapich-debug/lib/libmpi.so.12 (0x00007ffff72f7000)
> >> >>         libc.so.6 => /usr/lib64/libc.so.6 (0x00007ffff6f36000)
> >> >>         /lib64/ld-linux-x86-64.so.2 (0x0000555555554000)
> >> >>         libnuma.so.1 => /usr/lib64/libnuma.so.1 (0x00007ffff6d2a000)
> >> >>         libudev.so.1 => /usr/lib64/libudev.so.1 (0x00007ffff6d09000)
> >> >>         libpciaccess.so.0 => /usr/lib64/libpciaccess.so.0 (0x00007ffff6aff000)
> >> >>         libxml2.so.2 => /usr/lib64/libxml2.so.2 (0x00007ffff6795000)
> >> >>         libibmad.so.5 => /usr/lib64/libibmad.so.5 (0x00007ffff657b000)
> >> >>         librdmacm.so.1 => /usr/lib64/librdmacm.so.1 (0x00007ffff6365000)
> >> >>         libibumad.so.3 => /usr/lib64/libibumad.so.3 (0x00007ffff615c000)
> >> >>         libibverbs.so.1 => /usr/lib64/libibverbs.so.1 (0x00007ffff5f49000)
> >> >>         libdl.so.2 => /usr/lib64/libdl.so.2 (0x00007ffff5d45000)
> >> >>         librt.so.1 => /usr/lib64/librt.so.1 (0x00007ffff5b3c000)
> >> >>         libgfortran.so.3 => /usr/lib64/libgfortran.so.3 (0x00007ffff5810000)
> >> >>         libgcc_s.so.1 => /usr/lib64/libgcc_s.so.1 (0x00007ffff55f9000)
> >> >>         libquadmath.so.0 => /usr/lib64/libquadmath.so.0 (0x00007ffff53b9000)
> >> >>         libselinux.so.1 => /usr/lib64/libselinux.so.1 (0x00007ffff5196000)
> >> >>         libresolv.so.2 => /usr/lib64/libresolv.so.2 (0x00007ffff4f7a000)
> >> >>         libdw.so.1 => /usr/lib64/libdw.so.1 (0x00007ffff4d30000)
> >> >>         libcap.so.2 => /usr/lib64/libcap.so.2 (0x00007ffff4b2b000)
> >> >>         libz.so.1 => /usr/lib64/libz.so.1 (0x00007ffff4915000)
> >> >>         liblzma.so.5 => /usr/lib64/liblzma.so.5 (0x00007ffff46ee000)
> >> >>         libnl-route-3.so.200 => /usr/lib64/libnl-route-3.so.200 (0x00007ffff4488000)
> >> >>         libnl-3.so.200 => /usr/lib64/libnl-3.so.200 (0x00007ffff4267000)
> >> >>         libpcre.so.1 => /usr/lib64/libpcre.so.1 (0x00007ffff3ff6000)
> >> >>         libelf.so.1 => /usr/lib64/libelf.so.1 (0x00007ffff3dde000)
> >> >>         libbz2.so.1 => /usr/lib64/libbz2.so.1 (0x00007ffff3bcd000)
> >> >>         libattr.so.1 => /usr/lib64/libattr.so.1 (0x00007ffff39c7000)
> >> >>
> >> >> From: Jonathan Perkins [mailto:perkinjo at cse.ohio-state.edu]
> >> >> Sent: Wednesday, February 24, 2016 12:28 PM
> >> >> To: Nenad Vukicevic <nenad at intrepid.com>
> >> >> Cc: mvapich-discuss at cse.ohio-state.edu
> >> >> Subject: Re:
> >> >>
> >> >> Thanks for trying this out.  I believe you said that this was with
> >> >> Fedora 23 while using one of our RPMs.  Can you share with us the
> >> >> RPM used and the output of mpiname -a?  Can you also send us the
> >> >> output of ldd osu_acc_latency?  We'll try to reproduce this issue
> >> >> and think of a workaround.
> >> >>
> >> >> On Wed, Feb 24, 2016 at 12:54 PM Nenad Vukicevic <nenad at intrepid.com>
> >> >> wrote:
> >> >>
> >> >> I got the same result.
> >> >>
> >> >> [nenad at dev one-sided]$ mpirun -n 2 osu_acc_latency
> >> >> WARNING: Error in initializing MVAPICH2 ptmalloc library.Continuing
> >> >> without InfiniBand registration cache support.
> >> >>
> >> >> # OSU MPI_Accumulate latency Test v5.2
> >> >> # Window creation: MPI_Win_allocate
> >> >> # Synchronization: MPI_Win_flush
> >> >> # Size          Latency (us)
> >> >> 0                       0.14
> >> >> 1                       3.50
> >> >> 2                       3.12
> >> >> 4                       3.01
> >> >> 8                       3.99
> >> >> 16                      4.02
> >> >> 32                      4.04
> >> >> 64                      4.14
> >> >> 128                     4.79
> >> >> 256                     5.08
> >> >> 512                     5.50
> >> >> 1024                    6.42
> >> >> 2048                    7.51
> >> >> 4096                    8.93
> >> >> 8192                   13.53
> >> >> 16384                  26.76
> >> >> 32768                  43.19
> >> >> 65536                 193.44
> >> >> 131072                261.60
> >> >> 262144                397.56
> >> >> 524288                670.97
> >> >> 1048576              1214.61
> >> >> 2097152              2692.48
> >> >> 4194304              5265.03
> >> >>
> >> >> On Wed, Feb 24, 2016 at 8:22 AM, Nenad Vukicevic <nenad at intrepid.com>
> >> >> wrote:
> >> >>
> >> >> We are not doing anything special, the warning appears on a simple
> >> >> hello program.  I have MVAPICH2 built from 2.2b with and without
> >> >> debugging, and also with an RPM created from the RHEL 6 spec.  Note
> >> >> that this is a fairly new Fedora (FC23).
> >> >>
> >> >> I'll try OMB for warnings.
> >> >>
> >> >> On Wed, Feb 24, 2016 at 7:28 AM, Jonathan Perkins
> >> >> <perkinjo at cse.ohio-state.edu> wrote:
> >> >>
> >> >> Hello Nenad.  Can you tell us more about your application?
> >> >> Specifically, are there any libraries or special handling of memory
> >> >> allocation that may be linked in?  This type of problem usually
> >> >> occurs if MVAPICH2 isn't able to properly intercept the malloc and
> >> >> free calls.
> >> >>
> >> >> If you don't believe that your application is doing anything like
> >> >> this, can you verify that you're able to run the OMB suite (such as
> >> >> osu_latency) without this warning being emitted?
> >> >>
> >> >> On Wed, Feb 24, 2016 at 3:11 AM Nenad Vukicevic <nenad at intrepid.com>
> >> >> wrote:
> >> >>
> >> >> We are running mvapich 2.2b on Fedora 23.  We are getting the
> >> >> following warning when running the code:
> >> >>
> >> >> WARNING: Error in initializing MVAPICH2 ptmalloc library.Continuing
> >> >> without InfiniBand registration cache support.
> >> >>
> >> >> I saw some previous discussions on the subject but none of the
> >> >> suggested solutions worked (there was a patch on 2.1a(?), plus
> >> >> LD_PRELOAD of the mpich library, etc.).
> >> >>
> >> >> This error shows up on a simple hello test but only if we run on
> >> >> multiple nodes.  I can run multiple threads on the same node
> >> >> without causing this warning.
> >> >>
> >> >> Any idea what we can try?  I understand that there will be a slight
> >> >> decrease in performance if we disable ptmalloc.
> >> >> --
> >> >> Nenad
> >> >>
> >> >> _______________________________________________
> >> >> mvapich-discuss mailing list
> >> >> mvapich-discuss at cse.ohio-state.edu
> >> >> http://mailman.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
> >> >>
> >> >>
> >> >>
> >> >>
> >> >>
> >> >> --
> >> >>
> >> >> Nenad
> >> >>
> >> >>
> >> >>
> >> >>
> >> >>
> >> >> --
> >> >>
> >> >> Nenad
> >>
> >>
> >>
> >> --
> >> Nenad
>
>
>
> --
> Nenad
>


More information about the mvapich-discuss mailing list