[mvapich-discuss] Runtime warning - Error in initializing MVAPICH2 ptmalloc library
Nenad Vukicevic
nenad at intrepid.com
Thu Mar 10 16:59:23 EST 2016
Jonathan, I tracked this down to the following problem in the code in
mvapich2_minit():
    int mvapich2_minit()
    {
    ...
        ptr_malloc = malloc(PT_TEST_ALLOC_SIZE);
        ptr_calloc = calloc(PT_TEST_ALLOC_SIZE, 1);
        ptr_realloc = realloc(ptr_malloc, PT_TEST_ALLOC_SIZE);
        ptr_valloc = valloc(PT_TEST_ALLOC_SIZE);
        ptr_memalign = memalign(64, PT_TEST_ALLOC_SIZE);

        memset(ptr_calloc, 0, PT_TEST_ALLOC_SIZE);
        memset(ptr_realloc, 0, PT_TEST_ALLOC_SIZE);
        memset(ptr_valloc, 0, PT_TEST_ALLOC_SIZE);
        memset(ptr_memalign, 0, PT_TEST_ALLOC_SIZE);

        free(ptr_calloc);
        free(ptr_valloc);
        free(ptr_memalign);
I added some debugging print statements, which confirmed that only
'mvapich2_minfo.is_our_calloc' is set to 0. After disassembling the
object file I can see that the calls to calloc() and memset() are
missing. This is probably an optimization: the compiler notices that
the memory is never used and removes the calls.
Our gcc is 5.3.1 20151207.
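
If that diagnosis is right, one possible way to keep such a probe alive at -O2 is to make each returned pointer visible to the compiler as "used". The following is only a minimal sketch under that assumption, not the MVAPICH2 code or its eventual fix: PROBE_SIZE, keep_alive() and probe_allocators() are hypothetical names introduced for illustration, and valloc()/memalign() are left out to keep the example portable.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define PROBE_SIZE 64  /* hypothetical stand-in for PT_TEST_ALLOC_SIZE */

/* Hypothetical helper: makes the pointer value look "used" to the
 * optimizer, so the allocation that produced it is not dead code. */
static void keep_alive(void *p)
{
#if defined(__GNUC__) || defined(__clang__)
    /* Empty asm that takes p as an input and clobbers memory. */
    __asm__ __volatile__("" : : "r"(p) : "memory");
#else
    /* Fallback: a volatile store cannot be optimized away. */
    static void *volatile sink;
    sink = p;
#endif
}

/* Exercises malloc, calloc and realloc once each.
 * Returns the number of allocator calls exercised, or -1 on failure. */
int probe_allocators(void)
{
    int exercised = 0;

    void *ptr_malloc = malloc(PROBE_SIZE);
    if (ptr_malloc == NULL)
        return -1;
    keep_alive(ptr_malloc);
    exercised++;

    void *ptr_calloc = calloc(PROBE_SIZE, 1);
    if (ptr_calloc == NULL) {
        free(ptr_malloc);
        return -1;
    }
    keep_alive(ptr_calloc);
    exercised++;

    void *ptr_realloc = realloc(ptr_malloc, PROBE_SIZE);
    if (ptr_realloc == NULL) {
        free(ptr_malloc);   /* the original block is still valid here */
        free(ptr_calloc);
        return -1;
    }
    keep_alive(ptr_realloc);
    exercised++;

    memset(ptr_calloc, 0, PROBE_SIZE);
    memset(ptr_realloc, 0, PROBE_SIZE);
    keep_alive(ptr_calloc);
    keep_alive(ptr_realloc);

    /* Sanity check: the calloc'ed block reads back as zero. */
    assert(((unsigned char *)ptr_calloc)[0] == 0);

    free(ptr_calloc);
    free(ptr_realloc);
    return exercised;
}
```

Disassembling the object file again after a change along these lines would show whether the calloc() call survives.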
On Wed, Feb 24, 2016 at 2:10 PM, Jonathan Perkins
<perkinjo at cse.ohio-state.edu> wrote:
> Thank you for providing this. I'll see if my build matches up once I get my
> hands on a Fedora environment.
>
> As far as RPM goes, I wasn't actually asking for the RPM file itself, I was
> asking for the name of the RPM. We provide multiple RPMs (X, GDR, EA, etc.)
> and I wanted to be sure we were debugging the correct code/build. Your
> output of mpiname is sufficient so you do not need to send any more info at
> this time.
>
> On Wed, Feb 24, 2016 at 4:40 PM Nenad Vukicevic <nenad at intrepid.com> wrote:
>>
>> I can provide you with the RPM, but it is pretty much the same as what you
>> provided for RHEL6. I just rebuilt it on Fedora.
>>
>> On the other hand, the latest runs are all done from the locally built
>> tree. Here is the output from mpiname:
>>
>> MVAPICH2 2.2b Mon Nov 12 20:00:00 EST 2015 ch3:mrail
>>
>> Compilation
>> CC: gcc -DNDEBUG -DNVALGRIND -g -O2
>> CXX: g++ -DNDEBUG -DNVALGRIND -g -O2
>> F77: gfortran -L/lib -L/lib -g -O2
>> FC: gfortran -g -O2
>>
>> Configuration
>> --prefix=/usr/local/mvapich-debug --enable-g=all --enable-error-messages=all
>>
>> And ldd:
>>
>> [nenad at dev one-sided]$ ldd osu_acc_latency
>> linux-vdso.so.1 (0x00007ffff7ffd000)
>> libpthread.so.0 => /usr/lib64/libpthread.so.0 (0x00007ffff7d69000)
>> libm.so.6 => /usr/lib64/libm.so.6 (0x00007ffff7a66000)
>> libmpi.so.12 => /usr/local/mvapich-debug/lib/libmpi.so.12 (0x00007ffff72f7000)
>> libc.so.6 => /usr/lib64/libc.so.6 (0x00007ffff6f36000)
>> /lib64/ld-linux-x86-64.so.2 (0x0000555555554000)
>> libnuma.so.1 => /usr/lib64/libnuma.so.1 (0x00007ffff6d2a000)
>> libudev.so.1 => /usr/lib64/libudev.so.1 (0x00007ffff6d09000)
>> libpciaccess.so.0 => /usr/lib64/libpciaccess.so.0 (0x00007ffff6aff000)
>> libxml2.so.2 => /usr/lib64/libxml2.so.2 (0x00007ffff6795000)
>> libibmad.so.5 => /usr/lib64/libibmad.so.5 (0x00007ffff657b000)
>> librdmacm.so.1 => /usr/lib64/librdmacm.so.1 (0x00007ffff6365000)
>> libibumad.so.3 => /usr/lib64/libibumad.so.3 (0x00007ffff615c000)
>> libibverbs.so.1 => /usr/lib64/libibverbs.so.1 (0x00007ffff5f49000)
>> libdl.so.2 => /usr/lib64/libdl.so.2 (0x00007ffff5d45000)
>> librt.so.1 => /usr/lib64/librt.so.1 (0x00007ffff5b3c000)
>> libgfortran.so.3 => /usr/lib64/libgfortran.so.3 (0x00007ffff5810000)
>> libgcc_s.so.1 => /usr/lib64/libgcc_s.so.1 (0x00007ffff55f9000)
>> libquadmath.so.0 => /usr/lib64/libquadmath.so.0 (0x00007ffff53b9000)
>> libselinux.so.1 => /usr/lib64/libselinux.so.1 (0x00007ffff5196000)
>> libresolv.so.2 => /usr/lib64/libresolv.so.2 (0x00007ffff4f7a000)
>> libdw.so.1 => /usr/lib64/libdw.so.1 (0x00007ffff4d30000)
>> libcap.so.2 => /usr/lib64/libcap.so.2 (0x00007ffff4b2b000)
>> libz.so.1 => /usr/lib64/libz.so.1 (0x00007ffff4915000)
>> liblzma.so.5 => /usr/lib64/liblzma.so.5 (0x00007ffff46ee000)
>> libnl-route-3.so.200 => /usr/lib64/libnl-route-3.so.200 (0x00007ffff4488000)
>> libnl-3.so.200 => /usr/lib64/libnl-3.so.200 (0x00007ffff4267000)
>> libpcre.so.1 => /usr/lib64/libpcre.so.1 (0x00007ffff3ff6000)
>> libelf.so.1 => /usr/lib64/libelf.so.1 (0x00007ffff3dde000)
>> libbz2.so.1 => /usr/lib64/libbz2.so.1 (0x00007ffff3bcd000)
>> libattr.so.1 => /usr/lib64/libattr.so.1 (0x00007ffff39c7000)
>>
>> From: Jonathan Perkins [mailto:perkinjo at cse.ohio-state.edu]
>> Sent: Wednesday, February 24, 2016 12:28 PM
>> To: Nenad Vukicevic <nenad at intrepid.com>
>> Cc: mvapich-discuss at cse.ohio-state.edu
>> Subject: Re:
>>
>>
>>
>> Thanks for trying this out. I believe you said that this was with Fedora
>> 23 while using one of our RPMs. Can you share with us the RPM used and the
>> output of mpiname -a? Can you also send us the output of ldd
>> osu_acc_latency? We'll try to reproduce this issue and think of a
>> workaround.
>>
>>
>>
>> On Wed, Feb 24, 2016 at 12:54 PM Nenad Vukicevic <nenad at intrepid.com>
>> wrote:
>>
>> I got the same result.
>>
>> [nenad at dev one-sided]$ mpirun -n 2 osu_acc_latency
>> WARNING: Error in initializing MVAPICH2 ptmalloc library.Continuing
>> without InfiniBand registration cache support.
>> # OSU MPI_Accumulate latency Test v5.2
>> # Window creation: MPI_Win_allocate
>> # Synchronization: MPI_Win_flush
>> # Size      Latency (us)
>> 0                0.14
>> 1                3.50
>> 2                3.12
>> 4                3.01
>> 8                3.99
>> 16               4.02
>> 32               4.04
>> 64               4.14
>> 128              4.79
>> 256              5.08
>> 512              5.50
>> 1024             6.42
>> 2048             7.51
>> 4096             8.93
>> 8192            13.53
>> 16384           26.76
>> 32768           43.19
>> 65536          193.44
>> 131072         261.60
>> 262144         397.56
>> 524288         670.97
>> 1048576       1214.61
>> 2097152       2692.48
>> 4194304       5265.03
>>
>>
>> On Wed, Feb 24, 2016 at 8:22 AM, Nenad Vukicevic <nenad at intrepid.com>
>> wrote:
>>
>> We are not doing anything special; the warning appears on a simple hello
>> program. I have MVAPICH built from 2.2b with and without debugging, and also
>> with an RPM created from the RHEL 6 spec. Note that this is a fairly new
>> Fedora (FC23).
>>
>>
>>
>> I'll try OMB for warnings.
>>
>>
>>
>>
>>
>> On Wed, Feb 24, 2016 at 7:28 AM, Jonathan Perkins
>> <perkinjo at cse.ohio-state.edu> wrote:
>>
>> Hello Nenad. Can you tell us more about your application? Specifically,
>> are there any libraries or any special handling of memory allocation that
>> may be linked in? This type of problem usually occurs if MVAPICH2 isn't able
>> to properly intercept the malloc and free calls.
>>
>> If you don't believe that your application is doing anything like this,
>> can you verify that you're able to run the OMB suite (such as osu_latency)
>> without this warning being emitted?
>>
>>
>>
>> On Wed, Feb 24, 2016 at 3:11 AM Nenad Vukicevic <nenad at intrepid.com>
>> wrote:
>>
>> We are running mvapich 2.2b on Fedora 23. We are getting the following
>> warning when running the code:
>>
>> WARNING: Error in initializing MVAPICH2 ptmalloc library.Continuing
>> without InfiniBand registration cache support.
>>
>> I saw some previous discussions on the subject but none of the suggested
>> solutions worked (there was a patch on 2.1a(?), plus LD_PRELOAD of the mpich
>> library, etc.).
>>
>> This error shows up on a simple hello test but only if we run on multiple
>> nodes. I can run multiple threads on the same node without causing this
>> warning.
>>
>> Any idea what we can try? I understand that there will be a slight
>> decrease in performance if we disable ptmalloc.
>>
>> --
>> Nenad
>>
>> _______________________________________________
>> mvapich-discuss mailing list
>> mvapich-discuss at cse.ohio-state.edu
>> http://mailman.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
--
Nenad