[mvapich-discuss] malloc_consolidate failure in mvapich2-2.1

Hari Subramoni subramoni.1 at osu.edu
Tue Sep 1 12:15:18 EDT 2015


Hello Sashi,

Can you please run the application after
setting MV2_USE_LAZY_MEM_UNREGISTER=0 and see if the error goes away?

Please refer to the following section of the user guide for more details
about this parameter.

http://mvapich.cse.ohio-state.edu/static/media/mvapich/mvapich2-2.1-userguide.html#x1-24000011.82

Best Regards,
Hari.


> ------------------------------
> *From:* mvapich-discuss-bounces at cse.ohio-state.edu on behalf of Sashi
> Balasingam [sashibala2 at yahoo.com]
> *Sent:* Tuesday, September 01, 2015 2:22 AM
> *To:* mvapich-discuss at cse.ohio-state.edu
> *Subject:* [mvapich-discuss] malloc_consolidate failure in mvapich2-2.1
>
> Hi All:
> We have a subsystem in which a host communicates with a client system
> using omniORB (a CORBA implementation). The client system runs an
> mvapich2-2.1 application that embeds a CORBA server. During one of the
> calls from the host to the client, the call hangs.
>
> Using gdb, we obtained the following stack trace, which seems to indicate
> a failure in malloc_consolidate(), reached through the mvapich2 library.
>
> The client system runs SUSE Linux 11 SP3 with an InfiniBand HCA and
> mvapich2-2.1. We build with gcc 4.8.3, which was also used to compile
> all libraries (mvapich2, omniORB) and our MPI application.
>
> Stack trace:
>
> #0  0x00007fbc9ea15ca5 in malloc_consolidate () from
>     /usr/mpi/gcc/mvapich2-2.1/lib/libmpi.so.12
> #1  0x00007fbc9ea16fa0 in _int_malloc () from
>     /usr/mpi/gcc/mvapich2-2.1/lib/libmpi.so.12
> #2  0x00007fbc9ea17e0a in malloc () from
>     /usr/mpi/gcc/mvapich2-2.1/lib/libmpi.so.12
> #3  0x00007fbc9e41908d in operator new(unsigned long) () from
>     /usr/lib64/libstdc++.so.6
> #4  0x00007fbc9e419189 in operator new[](unsigned long) () from
>     /usr/lib64/libstdc++.so.6
> #5  0x00007fbc9fa580ea in
>     omni::omniCodeSet::TCS_C_8bit::fastUnmarshalString(cdrStream&,
>     omni::omniCodeSet::NCS_C*, unsigned int, unsigned int&, char*&) () from
>     /root/m31/blazer/nIMC/Exec/ThirdParty/libomniORB4.so.2
> #6  0x00007fbc9fa5839a in
>     omni::omniCodeSet::NCS_C_8bit::unmarshalString(cdrStream&,
>     omni::omniCodeSet::TCS_C*, unsigned int, char*&) () from
>     /root/m31/blazer/nIMC/Exec/ThirdParty/libomniORB4.so.2
> #7  0x00007fbc9efc30c0 in unmarshalString (bounded=0,
>     this=0x1485298) at /usr/local/include/omniORB4/cdrStream.h:462
> #8  _0RL_cd_e5de0fd0b13285ea_d0000000::unmarshalArguments
>     (this=0x7fbc94535660, _n=...)
>     at ../../../../IMCBridge/Src/IDLGen/nIMC/Linux/EverestInterface.omni.cpp:1101
>
> Any suggestions on the potential cause or possible workarounds would be
> much appreciated.
>
> Thanks,
> Sashi
>
>