[mvapich-discuss] Runtime warning - Error in initializing MVAPICH2 ptmalloc library

Nenad Vukicevic nenad at intrepid.com
Thu Mar 10 16:59:23 EST 2016


Jonathan, I tracked this down to the following problem in the code of
mvapich2_minit() (starting at line 152 of the source):

int mvapich2_minit()
{
...

    ptr_malloc = malloc(PT_TEST_ALLOC_SIZE);
    ptr_calloc = calloc(PT_TEST_ALLOC_SIZE, 1);
    ptr_realloc = realloc(ptr_malloc, PT_TEST_ALLOC_SIZE);
    ptr_valloc = valloc(PT_TEST_ALLOC_SIZE);
    ptr_memalign = memalign(64, PT_TEST_ALLOC_SIZE);

    memset(ptr_calloc, 0, PT_TEST_ALLOC_SIZE);
    memset(ptr_realloc, 0, PT_TEST_ALLOC_SIZE);
    memset(ptr_valloc, 0, PT_TEST_ALLOC_SIZE);
    memset(ptr_memalign, 0, PT_TEST_ALLOC_SIZE);

    free(ptr_calloc);
    free(ptr_valloc);
    free(ptr_memalign);

I added some debugging print statements which confirmed that only
'mvapich2_minfo.is_our_calloc' is set to 0.  After disassembling the
object file I can see that the call to 'calloc()' and its following
memset() are missing.  This is most likely a compiler optimization:
the allocated memory is written but never read before being freed, so
the compiler removes the calls entirely.

Our gcc is 5.3.1 20151207.

On Wed, Feb 24, 2016 at 2:10 PM, Jonathan Perkins
<perkinjo at cse.ohio-state.edu> wrote:
> Thank you for providing this.  I'll see if my build matches up once I get my
> hands on a Fedora environment.
>
> As far as the RPM goes, I wasn't actually asking for the RPM file itself; I was
> asking for the name of the RPM.  We provide multiple RPMs (X, GDR, EA, etc.)
> and I wanted to be sure we were debugging the correct code/build.  Your
> output of mpiname is sufficient, so you do not need to send any more info at
> this time.
>
> On Wed, Feb 24, 2016 at 4:40 PM Nenad Vukicevic <nenad at intrepid.com> wrote:
>>
>> I can provide you with the RPM, but it is pretty much the same as the one
>> you provided for RHEL6; I just rebuilt it on Fedora.
>>
>>
>>
>> On the other hand, the latest runs are all done from the locally built
>> tree.  Here is the output from mpiname:
>>
>>
>>
>> MVAPICH2 2.2b Mon Nov 12 20:00:00 EST 2015 ch3:mrail
>>
>> Compilation
>> CC: gcc    -DNDEBUG -DNVALGRIND -g -O2
>> CXX: g++   -DNDEBUG -DNVALGRIND -g -O2
>> F77: gfortran -L/lib -L/lib   -g -O2
>> FC: gfortran   -g -O2
>>
>> Configuration
>> --prefix=/usr/local/mvapich-debug --enable-g=all
>> --enable-error-messages=all
>>
>>
>>
>> And ldd:
>>
>>
>>
>> [nenad at dev one-sided]$ ldd osu_acc_latency
>>         linux-vdso.so.1 (0x00007ffff7ffd000)
>>         libpthread.so.0 => /usr/lib64/libpthread.so.0 (0x00007ffff7d69000)
>>         libm.so.6 => /usr/lib64/libm.so.6 (0x00007ffff7a66000)
>>         libmpi.so.12 => /usr/local/mvapich-debug/lib/libmpi.so.12 (0x00007ffff72f7000)
>>         libc.so.6 => /usr/lib64/libc.so.6 (0x00007ffff6f36000)
>>         /lib64/ld-linux-x86-64.so.2 (0x0000555555554000)
>>         libnuma.so.1 => /usr/lib64/libnuma.so.1 (0x00007ffff6d2a000)
>>         libudev.so.1 => /usr/lib64/libudev.so.1 (0x00007ffff6d09000)
>>         libpciaccess.so.0 => /usr/lib64/libpciaccess.so.0 (0x00007ffff6aff000)
>>         libxml2.so.2 => /usr/lib64/libxml2.so.2 (0x00007ffff6795000)
>>         libibmad.so.5 => /usr/lib64/libibmad.so.5 (0x00007ffff657b000)
>>         librdmacm.so.1 => /usr/lib64/librdmacm.so.1 (0x00007ffff6365000)
>>         libibumad.so.3 => /usr/lib64/libibumad.so.3 (0x00007ffff615c000)
>>         libibverbs.so.1 => /usr/lib64/libibverbs.so.1 (0x00007ffff5f49000)
>>         libdl.so.2 => /usr/lib64/libdl.so.2 (0x00007ffff5d45000)
>>         librt.so.1 => /usr/lib64/librt.so.1 (0x00007ffff5b3c000)
>>         libgfortran.so.3 => /usr/lib64/libgfortran.so.3 (0x00007ffff5810000)
>>         libgcc_s.so.1 => /usr/lib64/libgcc_s.so.1 (0x00007ffff55f9000)
>>         libquadmath.so.0 => /usr/lib64/libquadmath.so.0 (0x00007ffff53b9000)
>>         libselinux.so.1 => /usr/lib64/libselinux.so.1 (0x00007ffff5196000)
>>         libresolv.so.2 => /usr/lib64/libresolv.so.2 (0x00007ffff4f7a000)
>>         libdw.so.1 => /usr/lib64/libdw.so.1 (0x00007ffff4d30000)
>>         libcap.so.2 => /usr/lib64/libcap.so.2 (0x00007ffff4b2b000)
>>         libz.so.1 => /usr/lib64/libz.so.1 (0x00007ffff4915000)
>>         liblzma.so.5 => /usr/lib64/liblzma.so.5 (0x00007ffff46ee000)
>>         libnl-route-3.so.200 => /usr/lib64/libnl-route-3.so.200 (0x00007ffff4488000)
>>         libnl-3.so.200 => /usr/lib64/libnl-3.so.200 (0x00007ffff4267000)
>>         libpcre.so.1 => /usr/lib64/libpcre.so.1 (0x00007ffff3ff6000)
>>         libelf.so.1 => /usr/lib64/libelf.so.1 (0x00007ffff3dde000)
>>         libbz2.so.1 => /usr/lib64/libbz2.so.1 (0x00007ffff3bcd000)
>>         libattr.so.1 => /usr/lib64/libattr.so.1 (0x00007ffff39c7000)
>>
>>
>>
>>
>>
>> From: Jonathan Perkins [mailto:perkinjo at cse.ohio-state.edu]
>> Sent: Wednesday, February 24, 2016 12:28 PM
>> To: Nenad Vukicevic <nenad at intrepid.com>
>> Cc: mvapich-discuss at cse.ohio-state.edu
>> Subject: Re:
>>
>>
>>
>> Thanks for trying this out.  I believe you said that this was with Fedora
>> 23 while using one of our RPMs.  Can you share with us the RPM used and the
>> output of mpiname -a?  Can you also send us the output of ldd
>> osu_acc_latency?  We'll try to reproduce this issue and think of a
>> workaround.
>>
>>
>>
>> On Wed, Feb 24, 2016 at 12:54 PM Nenad Vukicevic <nenad at intrepid.com>
>> wrote:
>>
>> I got the same result.
>>
>>
>>
>> [nenad at dev one-sided]$ mpirun -n 2 osu_acc_latency
>>
>> WARNING: Error in initializing MVAPICH2 ptmalloc library.Continuing
>> without InfiniBand registration cache support.
>>
>> # OSU MPI_Accumulate latency Test v5.2
>> # Window creation: MPI_Win_allocate
>> # Synchronization: MPI_Win_flush
>> # Size          Latency (us)
>> 0                       0.14
>> 1                       3.50
>> 2                       3.12
>> 4                       3.01
>> 8                       3.99
>> 16                      4.02
>> 32                      4.04
>> 64                      4.14
>> 128                     4.79
>> 256                     5.08
>> 512                     5.50
>> 1024                    6.42
>> 2048                    7.51
>> 4096                    8.93
>> 8192                   13.53
>> 16384                  26.76
>> 32768                  43.19
>> 65536                 193.44
>> 131072                261.60
>> 262144                397.56
>> 524288                670.97
>> 1048576              1214.61
>> 2097152              2692.48
>> 4194304              5265.03
>>
>>
>>
>> On Wed, Feb 24, 2016 at 8:22 AM, Nenad Vukicevic <nenad at intrepid.com>
>> wrote:
>>
>> We are not doing anything special; the warning appears on a simple hello
>> program.  I have MVAPICH2 2.2b built with and without debugging, and also
>> with an RPM created from the RHEL 6 spec.  Note that this is a fairly new
>> Fedora (FC23).
>>
>>
>>
>> I'll try OMB for warnings.
>>
>>
>>
>>
>>
>> On Wed, Feb 24, 2016 at 7:28 AM, Jonathan Perkins
>> <perkinjo at cse.ohio-state.edu> wrote:
>>
>> Hello Nenad.  Can you tell us more about your application?  Specifically,
>> whether there are any libraries or any special handling of memory allocation
>> that may be linked in.  This type of problem usually occurs when MVAPICH2
>> isn't able to properly intercept the malloc and free calls.
>>
>> If you don't believe that your application is doing anything like this,
>> can you verify that you're able to run the OMB suite (such as osu_latency)
>> without this warning being emitted?
>>
>>
>>
>> On Wed, Feb 24, 2016 at 3:11 AM Nenad Vukicevic <nenad at intrepid.com>
>> wrote:
>>
>>
>> We are running mvapich 2.2b on Fedora 23. We are getting the following
>> warning when running the code:
>>
>> WARNING: Error in initializing MVAPICH2 ptmalloc library.Continuing
>> without
>> InfiniBand registration cache support.
>>
>> I saw some previous discussions on this subject, but none of the suggested
>> solutions worked (there was a patch on 2.1a(?), plus LD_PRELOAD of the
>> mpich library, etc.).
>>
>> This error shows up on a simple hello test but only if we run on multiple
>> nodes.  I can run multiple threads on the same node without causing this
>> warning.
>>
>> Any idea what we can try?  I understand that there will be a slight
>> decrease in performance if we disable ptmalloc.
>>
>> --
>> Nenad
>>
>> _______________________________________________
>> mvapich-discuss mailing list
>> mvapich-discuss at cse.ohio-state.edu
>> http://mailman.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>>
>>
>>
>>
>>
>>
>> --
>> Nenad



-- 
Nenad


More information about the mvapich-discuss mailing list