[mvapich-discuss] Runtime warning - Error in initializing MVAPICH2 ptmalloc library

Jonathan Perkins perkinjo at cse.ohio-state.edu
Thu Mar 10 17:18:14 EST 2016


Good find.  I have a guess about what is going on here.  Let me know if
this makes sense to you.

Could it be that the compiler notices that calloc already sets the memory to
0 and thus optimizes out the memset, and then, once the memset is gone, also
optimizes out the calloc since nothing else is done with that memory?

As a test, can you change the memset for ptr_calloc to a non-zero fill value
and see if this resolves the issue for you?
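
A sketch of the change I have in mind (the fill value 1 is arbitrary; any
non-zero byte should defeat the "calloc already zeroes it" reasoning):

    /* In mvapich2_minit(): leave the calloc alone, change only its memset */
    ptr_calloc = calloc(PT_TEST_ALLOC_SIZE, 1);
    memset(ptr_calloc, 1, PT_TEST_ALLOC_SIZE);  /* was memset(ptr_calloc, 0, ...) */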

On Thu, Mar 10, 2016 at 4:59 PM Nenad Vukicevic <nenad at intrepid.com> wrote:

> Jonathan, I tracked this down to the following problem in the code in
> mvapich2_minit():
>
> 152 int mvapich2_minit()
> 153 {
> ...
>
> 168     ptr_malloc = malloc(PT_TEST_ALLOC_SIZE);
> 169     ptr_calloc = calloc(PT_TEST_ALLOC_SIZE, 1);
> 170     ptr_realloc = realloc(ptr_malloc, PT_TEST_ALLOC_SIZE);
> 171     ptr_valloc = valloc(PT_TEST_ALLOC_SIZE);
> 172     ptr_memalign = memalign(64, PT_TEST_ALLOC_SIZE);
> 173
> 174     memset(ptr_calloc, 0, PT_TEST_ALLOC_SIZE);
> 175     memset(ptr_realloc, 0, PT_TEST_ALLOC_SIZE);
> 176     memset(ptr_valloc, 0, PT_TEST_ALLOC_SIZE);
> 177     memset(ptr_memalign, 0, PT_TEST_ALLOC_SIZE);
> 178
> 179     free(ptr_calloc);
> 180     free(ptr_valloc);
> 181     free(ptr_memalign);
>
> I added some debugging print statements that confirmed that only
> 'mvapich2_minfo.is_our_calloc' is set to 0.  After disassembling the
> object file I can see that the calls to 'calloc()' and 'memset()' are
> missing.  This is probably an optimization where the compiler notices that
> the memory is never used and removes the calls.
>
> Our gcc is 5.3.1 20151207.
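>
> A standalone sketch along these lines (my own test program, not code from
> the MVAPICH2 tree) should make it easy to check whether the compiler drops
> the calls; compiling with 'gcc -O2 -S' and looking at the generated
> assembly shows whether the calloc and memset survive:
>
>     #include <stdlib.h>
>     #include <string.h>
>
>     int main(void)
>     {
>         /* Mirrors the ptr_calloc path in mvapich2_minit(): allocate,
>          * redundantly zero the memory, and free it without ever
>          * reading it. */
>         void *p = calloc(4096, 1);
>         memset(p, 0, 4096);
>         free(p);
>         return 0;
>     }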
>
> On Wed, Feb 24, 2016 at 2:10 PM, Jonathan Perkins
> <perkinjo at cse.ohio-state.edu> wrote:
> > Thank you for providing this.  I'll see if my build matches up once I get
> > my hands on a Fedora environment.
> >
> > As far as the RPM goes, I wasn't actually asking for the RPM file itself; I
> > was asking for the name of the RPM.  We provide multiple RPMs (X, GDR, EA,
> > etc.) and I wanted to be sure we were debugging the correct code/build.
> > Your output of mpiname is sufficient, so you do not need to send any more
> > info at this time.
> >
> > On Wed, Feb 24, 2016 at 4:40 PM Nenad Vukicevic <nenad at intrepid.com> wrote:
> >>
> >> I can provide you with the RPM, but it is pretty much the same as what you
> >> provided for RHEL6.  I just rebuilt it on Fedora.
> >>
> >>
> >>
> >> On the other hand, the latest runs are all done from the locally built
> >> tree.  Here is the output from mpiname:
> >>
> >> MVAPICH2 2.2b Mon Nov 12 20:00:00 EST 2015 ch3:mrail
> >>
> >> Compilation
> >> CC: gcc    -DNDEBUG -DNVALGRIND -g -O2
> >> CXX: g++   -DNDEBUG -DNVALGRIND -g -O2
> >> F77: gfortran -L/lib -L/lib   -g -O2
> >> FC: gfortran   -g -O2
> >>
> >> Configuration
> >> --prefix=/usr/local/mvapich-debug --enable-g=all --enable-error-messages=all
> >>
> >> And ldd:
> >>
> >> [nenad@dev one-sided]$ ldd osu_acc_latency
> >>         linux-vdso.so.1 (0x00007ffff7ffd000)
> >>         libpthread.so.0 => /usr/lib64/libpthread.so.0 (0x00007ffff7d69000)
> >>         libm.so.6 => /usr/lib64/libm.so.6 (0x00007ffff7a66000)
> >>         libmpi.so.12 => /usr/local/mvapich-debug/lib/libmpi.so.12 (0x00007ffff72f7000)
> >>         libc.so.6 => /usr/lib64/libc.so.6 (0x00007ffff6f36000)
> >>         /lib64/ld-linux-x86-64.so.2 (0x0000555555554000)
> >>         libnuma.so.1 => /usr/lib64/libnuma.so.1 (0x00007ffff6d2a000)
> >>         libudev.so.1 => /usr/lib64/libudev.so.1 (0x00007ffff6d09000)
> >>         libpciaccess.so.0 => /usr/lib64/libpciaccess.so.0 (0x00007ffff6aff000)
> >>         libxml2.so.2 => /usr/lib64/libxml2.so.2 (0x00007ffff6795000)
> >>         libibmad.so.5 => /usr/lib64/libibmad.so.5 (0x00007ffff657b000)
> >>         librdmacm.so.1 => /usr/lib64/librdmacm.so.1 (0x00007ffff6365000)
> >>         libibumad.so.3 => /usr/lib64/libibumad.so.3 (0x00007ffff615c000)
> >>         libibverbs.so.1 => /usr/lib64/libibverbs.so.1 (0x00007ffff5f49000)
> >>         libdl.so.2 => /usr/lib64/libdl.so.2 (0x00007ffff5d45000)
> >>         librt.so.1 => /usr/lib64/librt.so.1 (0x00007ffff5b3c000)
> >>         libgfortran.so.3 => /usr/lib64/libgfortran.so.3 (0x00007ffff5810000)
> >>         libgcc_s.so.1 => /usr/lib64/libgcc_s.so.1 (0x00007ffff55f9000)
> >>         libquadmath.so.0 => /usr/lib64/libquadmath.so.0 (0x00007ffff53b9000)
> >>         libselinux.so.1 => /usr/lib64/libselinux.so.1 (0x00007ffff5196000)
> >>         libresolv.so.2 => /usr/lib64/libresolv.so.2 (0x00007ffff4f7a000)
> >>         libdw.so.1 => /usr/lib64/libdw.so.1 (0x00007ffff4d30000)
> >>         libcap.so.2 => /usr/lib64/libcap.so.2 (0x00007ffff4b2b000)
> >>         libz.so.1 => /usr/lib64/libz.so.1 (0x00007ffff4915000)
> >>         liblzma.so.5 => /usr/lib64/liblzma.so.5 (0x00007ffff46ee000)
> >>         libnl-route-3.so.200 => /usr/lib64/libnl-route-3.so.200 (0x00007ffff4488000)
> >>         libnl-3.so.200 => /usr/lib64/libnl-3.so.200 (0x00007ffff4267000)
> >>         libpcre.so.1 => /usr/lib64/libpcre.so.1 (0x00007ffff3ff6000)
> >>         libelf.so.1 => /usr/lib64/libelf.so.1 (0x00007ffff3dde000)
> >>         libbz2.so.1 => /usr/lib64/libbz2.so.1 (0x00007ffff3bcd000)
> >>         libattr.so.1 => /usr/lib64/libattr.so.1 (0x00007ffff39c7000)
> >>
> >>
> >>
> >>
> >> From: Jonathan Perkins [mailto:perkinjo at cse.ohio-state.edu]
> >> Sent: Wednesday, February 24, 2016 12:28 PM
> >> To: Nenad Vukicevic <nenad at intrepid.com>
> >> Cc: mvapich-discuss at cse.ohio-state.edu
> >> Subject: Re:
> >>
> >>
> >>
> >> Thanks for trying this out.  I believe you said that this was with Fedora
> >> 23 while using one of our RPMs.  Can you share with us the RPM used and
> >> the output of mpiname -a?  Can you also send us the output of ldd
> >> osu_acc_latency?  We'll try to reproduce this issue and think of a
> >> workaround.
> >>
> >>
> >>
> >> On Wed, Feb 24, 2016 at 12:54 PM Nenad Vukicevic <nenad at intrepid.com>
> >> wrote:
> >>
> >> I got the same result.
> >>
> >>
> >>
> >> [nenad@dev one-sided]$ mpirun -n 2 osu_acc_latency
> >>
> >> WARNING: Error in initializing MVAPICH2 ptmalloc library.Continuing
> >> without InfiniBand registration cache support.
> >>
> >> # OSU MPI_Accumulate latency Test v5.2
> >> # Window creation: MPI_Win_allocate
> >> # Synchronization: MPI_Win_flush
> >> # Size          Latency (us)
> >> 0                       0.14
> >> 1                       3.50
> >> 2                       3.12
> >> 4                       3.01
> >> 8                       3.99
> >> 16                      4.02
> >> 32                      4.04
> >> 64                      4.14
> >> 128                     4.79
> >> 256                     5.08
> >> 512                     5.50
> >> 1024                    6.42
> >> 2048                    7.51
> >> 4096                    8.93
> >> 8192                   13.53
> >> 16384                  26.76
> >> 32768                  43.19
> >> 65536                 193.44
> >> 131072                261.60
> >> 262144                397.56
> >> 524288                670.97
> >> 1048576              1214.61
> >> 2097152              2692.48
> >> 4194304              5265.03
> >>
> >>
> >>
> >> On Wed, Feb 24, 2016 at 8:22 AM, Nenad Vukicevic <nenad at intrepid.com>
> >> wrote:
> >>
> >> We are not doing anything special; the warning appears on a simple hello
> >> program.  I have MVAPICH2 built from 2.2b with and without debugging, and
> >> also with an RPM created from the RHEL 6 spec.  Note that this is a fairly
> >> new Fedora (FC23).
> >>
> >>
> >>
> >> I'll try OMB for warnings.
> >>
> >>
> >>
> >>
> >>
> >> On Wed, Feb 24, 2016 at 7:28 AM, Jonathan Perkins
> >> <perkinjo at cse.ohio-state.edu> wrote:
> >>
> >> Hello Nenad.  Can you tell us more about your application?  Specifically,
> >> are there any libraries, or any special handling of memory allocation,
> >> that may be linked in?  This type of problem usually occurs if MVAPICH2
> >> isn't able to properly intercept the malloc and free calls.
> >>
> >> If you don't believe that your application is doing anything like this,
> >> can you verify that you're able to run the OMB suite (such as osu_latency)
> >> without this warning being emitted?
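> >>
> >> For reference, this kind of interception is typically done through symbol
> >> interposition: the library defines malloc/free itself, does its
> >> bookkeeping, and forwards to the real allocator.  A rough generic sketch
> >> of the technique (not MVAPICH2's actual code) looks like this, built as a
> >> shared object and loaded ahead of libc:
> >>
> >>     #define _GNU_SOURCE
> >>     #include <dlfcn.h>
> >>     #include <stddef.h>
> >>
> >>     /* Interposed malloc: do any bookkeeping, then forward to the next
> >>      * malloc in the symbol lookup chain.  (A real implementation must
> >>      * also guard against dlsym itself allocating.) */
> >>     void *malloc(size_t size)
> >>     {
> >>         static void *(*real_malloc)(size_t);
> >>         if (!real_malloc)
> >>             real_malloc = (void *(*)(size_t)) dlsym(RTLD_NEXT, "malloc");
> >>         /* ...registration-cache bookkeeping would go here... */
> >>         return real_malloc(size);
> >>     }
> >>
> >> If some other library in the process takes over these symbols first, or
> >> the hooks are otherwise bypassed, that is the kind of failure that
> >> produces the warning above.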
> >>
> >>
> >>
> >> On Wed, Feb 24, 2016 at 3:11 AM Nenad Vukicevic <nenad at intrepid.com>
> >> wrote:
> >>
> >> We are running MVAPICH2 2.2b on Fedora 23.  We are getting the following
> >> warning when running the code:
> >>
> >> WARNING: Error in initializing MVAPICH2 ptmalloc library.Continuing
> >> without
> >> InfiniBand registration cache support.
> >>
> >> I saw some previous discussions on the subject but none of the suggested
> >> solutions worked (there was a patch on 2.1a(?), plus LD_PRELOAD of the MPI
> >> library, etc.).
> >>
> >> This error shows up on a simple hello test but only if we run on multiple
> >> nodes.  I can run multiple threads on the same node without causing this
> >> warning.
> >>
> >> Any idea what we can try?  I understand that there will be a slight
> >> decrease in performance if we disable ptmalloc.
> >>
> >> --
> >> Nenad
> >>
> >> --089e0112c5cce95adb052c794745
> >> Content-Type: text/html; charset="UTF-8"
> >> Content-Transfer-Encoding: quoted-printable
> >>
> >> <div dir=3D"ltr">We are running mvapich 2.2b on Fedora 23. We are
> getting
> >> t=
> >> he following warning when running the code:<div>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >> <p class=3D""><span class=3D"">WARNING: Error in initializing MVAPICH2
> >> ptma=
> >> lloc library.Continuing without InfiniBand registration cache
> >> support.</spa=
> >> n></p><p class=3D"">I saw some previous discussions on the subject but
> >> none=
> >>  of the suggested solutions worked (there was a patch on 2.1a(?), plus
> >> LD_P=
> >> RELAOD of mpich library, etc..).</p><p class=3D"">This error shows up
> on a
> >> =
> >> simple hello test but only if we run on multiple nodes.=C2=A0 I can run
> >> mul=
> >> tiple threads on the same node without causing this warning.</p><p
> >> class=3D=
> >> "">Any idea what we can try?=C2=A0 I understand that there will be a
> >> slight=
> >>  decrease in performance if we disable ptmalloc.</p><div><br></div>--
> >> <br><=
> >> div class=3D"gmail_signature"><div dir=3D"ltr"><div><div
> >> dir=3D"ltr">Nenad<=
> >> /div></div></div></div>
> >> </div></div>
> >>
> >> --089e0112c5cce95adb052c794745--
> >>
> >> --===============3181330637561286136==
> >> Content-Type: text/plain; charset="us-ascii"
> >> MIME-Version: 1.0
> >> Content-Transfer-Encoding: 7bit
> >> Content-Disposition: inline
> >>
> >> _______________________________________________
> >> mvapich-discuss mailing list
> >> mvapich-discuss at cse.ohio-state.edu
> >> http://mailman.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
> >>
> >> --===============3181330637561286136==--
> >>
> >> --
> >> Nenad
>
>
>
> --
> Nenad
>

