We encountered a malloc deadlock in the RTI Linux signal handler.
DDS version: rti_connext_dds-6.1.0-evaluation
The gdb backtrace is as follows. I want to know why this happens, and which Linux signals does RTI use?
Thread 9 (Thread 0x7f082cd63700 (LWP 3924)):
#0 __lll_lock_wait_private () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:95
#1 0x00007f086a67d372 in __libc_calloc (n=<optimized out>, elem_size=<optimized out>) at malloc.c:3417
#2 0x00007f086cd29300 in RTIOsapiHeap_reallocateMemoryInternal ()
from /home/hong/.cache/bazel/_bazel_hong/e73c529f37cc9d389c4f694be1c86dd1/execroot/ados/bazel-out/k8-fastbuild/bin/ados/tools/channel_monitor/../../../_solib_unknown/_U@rti_Uconnext_Udds_U6_U1_U0_S_S_Cx64_Ugcc7_U3_U0_Ulibnddscore___Ulib_Sx64Linux4gcc7.3.0/libnddscore.so
#3 0x00007f086cd6932c in RTIOsapiThread_logBacktrace ()
from /home/hong/.cache/bazel/_bazel_hong/e73c529f37cc9d389c4f694be1c86dd1/execroot/ados/bazel-out/k8-fastbuild/bin/ados/tools/channel_monitor/../../../_solib_unknown/_U@rti_Uconnext_Udds_U6_U1_U0_S_S_Cx64_Ugcc7_U3_U0_Ulibnddscore___Ulib_Sx64Linux4gcc7.3.0/libnddscore.so
#4 0x00007f086cd25efb in RTILog_generatePrintFormatString ()
from /home/hong/.cache/bazel/_bazel_hong/e73c529f37cc9d389c4f694be1c86dd1/execroot/ados/bazel-out/k8-fastbuild/bin/ados/tools/channel_monitor/../../../_solib_unknown/_U@rti_Uconnext_Udds_U6_U1_U0_S_S_Cx64_Ugcc7_U3_U0_Ulibnddscore___Ulib_Sx64Linux4gcc7.3.0/libnddscore.so
#5 0x00007f086cd2667f in RTILogMessage_vprintWithParams ()
from /home/hong/.cache/bazel/_bazel_hong/e73c529f37cc9d389c4f694be1c86dd1/execroot/ados/bazel-out/k8-fastbuild/bin/ados/tools/channel_monitor/../../../_solib_unknown/_U@rti_Uconnext_Udds_U6_U1_U0_S_S_Cx64_Ugcc7_U3_U0_Ulibnddscore___Ulib_Sx64Linux4gcc7.3.0/libnddscore.so
#6 0x00007f086cd267ea in RTILogMessage_printWithParams ()
from /home/hong/.cache/bazel/_bazel_hong/e73c529f37cc9d389c4f694be1c86dd1/execroot/ados/bazel-out/k8-fastbuild/bin/ados/tools/channel_monitor/../../../_solib_unknown/_U@rti_Uconnext_Udds_U6_U1_U0_S_S_Cx64_Ugcc7_U3_U0_Ulibnddscore___Ulib_Sx64Linux4gcc7.3.0/libnddscore.so
#7 0x00007f086cd6b158 in RTIOsapiThread_onSigsegvHandler ()
from /home/hong/.cache/bazel/_bazel_hong/e73c529f37cc9d389c4f694be1c86dd1/execroot/ados/bazel-out/k8-fastbuild/bin/ados/tools/channel_monitor/../../../_solib_unknown/_U@rti_Uconnext_Udds_U6_U1_U0_S_S_Cx64_Ugcc7_U3_U0_Ulibnddscore___Ulib_Sx64Linux4gcc7.3.0/libnddscore.so
#8 <signal handler called>
#9 0x00007f086a677425 in _int_malloc (av=av@entry=0x7f081c000020, bytes=bytes@entry=48) at malloc.c:3622
In libnddscore we have a signal handler for SIGSEGV. The reason for this is that we print a backtrace when a segfault occurs (to aid with debugging).
It is important to note that we do not overwrite any signal handlers that are already installed. So if you are also installing a SIGSEGV handler, we will not overwrite it.
To debug your problem, you could disable RTI's SIGSEGV handler by calling RTIOsapiThread_disableBacktraceSupport(); in your code, which removes our signal handler.
Sam
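To illustrate the "do not overwrite an existing handler" behavior described above, here is a minimal sketch in plain POSIX C. It is not RTI's actual code; the function names segv_handler_is_installed and install_segv_handler_if_unset are hypothetical and exist only for this example. It queries the current SIGSEGV disposition with sigaction() and installs a handler only when the default disposition is still in place.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Hypothetical helper: returns 1 if SIGSEGV has a non-default disposition,
 * 0 if it is still SIG_DFL, and -1 if the query failed. */
int segv_handler_is_installed(void)
{
    struct sigaction current;

    /* Passing NULL as the new action only reads the current disposition. */
    if (sigaction(SIGSEGV, NULL, &current) != 0) {
        return -1;
    }
    /* With SA_SIGINFO set, a three-argument handler is in use. */
    if (current.sa_flags & SA_SIGINFO) {
        return 1;
    }
    return current.sa_handler != SIG_DFL;
}

/* Hypothetical helper: installs `handler` only if nothing else is installed.
 * Returns 1 if installed, 0 if an existing handler was left alone, -1 on error. */
int install_segv_handler_if_unset(void (*handler)(int))
{
    struct sigaction action;
    int installed = segv_handler_is_installed();

    if (installed != 0) {
        return installed < 0 ? -1 : 0;
    }
    memset(&action, 0, sizeof(action));
    action.sa_handler = handler;
    sigemptyset(&action.sa_mask);
    return sigaction(SIGSEGV, &action, NULL) == 0 ? 1 : -1;
}
```

With a library that follows this pattern, an application handler registered first stays in place, which is consistent with what is described above.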
Thanks Sam, our code does indeed have a risk of segmentation faults.
Hi Sam, is calling malloc() in a signal handler a bug in RTI?
Hi again,
Yes, I have confirmed that this is a bug in our code. The internal reference is CORE-12794. Please use the workaround I mentioned above: disable the printing of a backtrace.
Thanks,
Sam
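For context on why this deadlocks: in the backtrace, the thread was interrupted inside _int_malloc (frame #9), where glibc holds the allocator lock, and the handler then calls calloc, which waits for that same lock forever. A handler that needs to print something can avoid the heap entirely. The following is a minimal sketch of that idea, not RTI's fix; format_hex and log_fault_address are hypothetical names. It formats a value into a caller-supplied stack buffer and emits it with write(), which POSIX lists as async-signal-safe.

```c
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/* Hypothetical helper: writes `value` as lowercase hex into `buf` without
 * malloc or stdio. Returns the number of characters written (excluding the
 * terminating NUL). If `buf` is too small, the output is truncated to the
 * most significant digits. */
size_t format_hex(uintptr_t value, char *buf, size_t buf_len)
{
    static const char digits[] = "0123456789abcdef";
    char tmp[2 * sizeof(uintptr_t)];
    size_t n = 0;
    size_t out = 0;

    if (buf_len == 0) {
        return 0;
    }
    /* Collect digits least significant first. */
    do {
        tmp[n++] = digits[value & 0xfu];
        value >>= 4;
    } while (value != 0 && n < sizeof(tmp));

    /* Copy them out in reverse so the most significant digit comes first. */
    while (n > 0 && out + 1 < buf_len) {
        buf[out++] = tmp[--n];
    }
    buf[out] = '\0';
    return out;
}

/* Hypothetical helper: safe to call from a signal handler because it only
 * uses stack memory and write(). */
void log_fault_address(const void *addr)
{
    static const char prefix[] = "fault at 0x";
    char hex[2 * sizeof(uintptr_t) + 1];
    size_t len = format_hex((uintptr_t)addr, hex, sizeof(hex));
    ssize_t rc;

    rc = write(STDERR_FILENO, prefix, sizeof(prefix) - 1);
    rc = write(STDERR_FILENO, hex, len);
    rc = write(STDERR_FILENO, "\n", 1);
    (void)rc; /* nothing useful can be done on failure inside a handler */
}
```

The design choice here is simply to keep every byte on the stack or in static storage, so the handler never touches the lock that the interrupted malloc may be holding.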
Hi Sam,
I have added RTIOsapiThread_disableBacktraceSupport(); to our code, but the malloc deadlock still occurs.
I only added one line, RTIOsapiThread_disableBacktraceSupport. Am I missing anything?
Thanks again
Hi,
When are you calling RTIOsapiThread_disableBacktraceSupport?
Please call it after creating the DomainParticipant.
Let me know if that works,
Sam
Hi,
1. My code is as below, but it doesn't work:
2. Do I still need to register my own SIGSEGV handler to keep RTI's handler from being used?
Like this: signal(SIGSEGV, SIGSEGVHandle);
Hi,
When you say it doesn't work, what happens?
Did disabling backtrace support fix the deadlock?
Hi Sam:
What I mean is that disabling the backtrace didn't fix the deadlock.
The deadlock still occurs (backtrace obtained by attaching gdb):
Thread 9 (Thread 0x7efdf5ffb700 (LWP 11858)):
#0 __lll_lock_wait_private () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:95
#1 0x00007efdfe07b372 in __libc_calloc (n=<optimized out>, elem_size=<optimized out>) at malloc.c:3417
#2 0x00007efe00727300 in RTIOsapiHeap_reallocateMemoryInternal ()
from /home/hong/.cache/bazel/_bazel_hong/e73c529f37cc9d389c4f694be1c86dd1/execroot/ados/bazel-out/k8-fastbuild/bin/ados/tools/channel_monitor/../../../_solib_unknown/_U@rti_Uconnext_Udds_U6_U1_U0_S_S_Cx64_Ugcc7_U3_U0_Ulibnddscore___Ulib_Sx64Linux4gcc7.3.0/libnddscore.so
#3 0x00007efe00796236 in ADVLOGLogger_createMessageQueue ()
from /home/hong/.cache/bazel/_bazel_hong/e73c529f37cc9d389c4f694be1c86dd1/execroot/ados/bazel-out/k8-fastbuild/bin/ados/tools/channel_monitor/../../../_solib_unknown/_U@rti_Uconnext_Udds_U6_U1_U0_S_S_Cx64_Ugcc7_U3_U0_Ulibnddscore___Ulib_Sx64Linux4gcc7.3.0/libnddscore.so
#4 0x00007efe00795ee2 in ADVLOGLogger_assertMessageQueueLNOOP ()
from /home/hong/.cache/bazel/_bazel_hong/e73c529f37cc9d389c4f694be1c86dd1/execroot/ados/bazel-out/k8-fastbuild/bin/ados/tools/channel_monitor/../../../_solib_unknown/_U@rti_Uconnext_Udds_U6_U1_U0_S_S_Cx64_Ugcc7_U3_U0_Ulibnddscore___Ulib_Sx64Linux4gcc7.3.0/libnddscore.so
#5 0x00007efe00797274 in ADVLOGLogger_installedRtiLogMsgLNP ()
from /home/hong/.cache/bazel/_bazel_hong/e73c529f37cc9d389c4f694be1c86dd1/execroot/ados/bazel-out/k8-fastbuild/bin/ados/tools/channel_monitor/../../../_solib_unknown/_U@rti_Uconnext_Udds_U6_U1_U0_S_S_Cx64_Ugcc7_U3_U0_Ulibnddscore___Ulib_Sx64Linux4gcc7.3.0/libnddscore.so
#6 0x00007efe0072458a in RTILogMessage_vprintWithParams ()
from /home/hong/.cache/bazel/_bazel_hong/e73c529f37cc9d389c4f694be1c86dd1/execroot/ados/bazel-out/k8-fastbuild/bin/ados/tools/channel_monitor/../../../_solib_unknown/_U@rti_Uconnext_Udds_U6_U1_U0_S_S_Cx64_Ugcc7_U3_U0_Ulibnddscore___Ulib_Sx64Linux4gcc7.3.0/libnddscore.so
#7 0x00007efe007247ea in RTILogMessage_printWithParams ()
from /home/hong/.cache/bazel/_bazel_hong/e73c529f37cc9d389c4f694be1c86dd1/execroot/ados/bazel-out/k8-fastbuild/bin/ados/tools/channel_monitor/../../../_solib_unknown/_U@rti_Uconnext_Udds_U6_U1_U0_S_S_Cx64_Ugcc7_U3_U0_Ulibnddscore___Ulib_Sx64Linux4gcc7.3.0/libnddscore.so
#8 0x00007efe00769158 in RTIOsapiThread_onSigsegvHandler ()
from /home/hong/.cache/bazel/_bazel_hong/e73c529f37cc9d389c4f694be1c86dd1/execroot/ados/bazel-out/k8-fastbuild/bin/ados/tools/channel_monitor/../../../_solib_unknown/_U@rti_Uconnext_Udds_U6_U1_U0_S_S_Cx64_Ugcc7_U3_U0_Ulibnddscore___Ulib_Sx64Linux4gcc7.3.0/libnddscore.so
#9 <signal handler called>
#10 0x00007efdfe075425 in _int_malloc (av=av@entry=0x7efda4000020, bytes=bytes@entry=55) at malloc.c:3622
#11 0x00007efdfe0782ad in __GI___libc_malloc (bytes=55) at malloc.c:3075
#12 0x00007efdfe67d298 in operator new(unsigned long) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#13 0x000055ee54b01ce0 in void std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_construct<char*>(char*, char*, std::forward_iterator_tag) ()
#14 0x000055ee54d5d235 in google::protobuf::internal::ArenaStringPtr::Set(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, google::protobuf::Arena*) ()
#15 0x000055ee54d5d716 in google::protobuf::internal::ArenaStringPtr::Set(google::protobuf::internal::ArenaStringPtr::EmptyDefault, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, google::protobuf::Arena*) ()
#16 0x000055ee54c61354 in google::protobuf::FileDescriptorProto::_internal_set_name(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) ()
#17 0x000055ee54c612b7 in google::protobuf::FileDescriptorProto::set_name(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) ()
#18 0x000055ee54c3b931 in google::protobuf::FileDescriptor::CopyTo(google::protobuf::FileDescriptorProto*) const ()
#19 0x000055ee54c47447 in google::protobuf::ExistingFileMatchesProto(google::protobuf::FileDescriptor const*, google::protobuf::FileDescriptorProto const&) ()
#20 0x000055ee54c475e8 in google::protobuf::DescriptorBuilder::BuildFile(google::protobuf::FileDescriptorProto const&) ()
#21 0x000055ee54c43fac in google::protobuf::DescriptorPool::BuildFileCollectingErrors(google::protobuf::FileDescriptorProto const&, google::protobuf::DescriptorPool::ErrorCollector*) ()
#22 0x000055ee54af3e34 in apollo::cyber::message::ProtobufFactory::RegisterMessage(google::protobuf::FileDescriptorProto const&) ()
#23 0x000055ee54af3c75 in apollo::cyber::message::ProtobufFactory::RegisterMessage(apollo::cyber::proto::ProtoDesc const&) ()
#24 0x000055ee54af3dab in apollo::cyber::message::ProtobufFactory::RegisterMessage(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) ()
#25 0x000055ee54ad369f in apollo::cyber::service_discovery::ChannelManager::DisposeJoin(apollo::cyber::proto::ChangeMsg const&) ()
#26 0x000055ee54ad312f in apollo::cyber::service_discovery::ChannelManager::Dispose(apollo::cyber::proto::ChangeMsg const&) ()
#27 0x000055ee54adf88c in apollo::cyber::service_discovery::Manager::OnRemoteChangeProcess() ()
Ok. I see in your backtrace that we are still ending up in the signal handler (frame #8 RTIOsapiThread_onSigsegvHandler).
As another option, please install another signal handler for SIGSEGV over the top of ours.
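A minimal sketch of what installing your own SIGSEGV handler could look like, assuming plain POSIX sigaction(); minimal_segv_handler and override_segv_handler are hypothetical names, and nothing here is RTI API. The handler only calls async-signal-safe functions (write and raise), and SA_RESETHAND restores the default disposition on entry so that re-raising the signal terminates the process in the usual way (including a core dump where enabled).

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical handler: no malloc, no stdio, no locks. */
static void minimal_segv_handler(int signo)
{
    static const char msg[] = "fatal: SIGSEGV, terminating\n";
    ssize_t rc = write(STDERR_FILENO, msg, sizeof(msg) - 1);

    (void)rc;
    /* SA_RESETHAND has already put SIG_DFL back, so re-raising the signal
     * ends the process with the default action once the handler returns. */
    raise(signo);
}

/* Hypothetical helper: unconditionally replaces the SIGSEGV handler.
 * If `previous` is not NULL, the replaced disposition is stored there.
 * Returns 0 on success and -1 on failure, like sigaction(). */
int override_segv_handler(struct sigaction *previous)
{
    struct sigaction action;

    memset(&action, 0, sizeof(action));
    action.sa_handler = minimal_segv_handler;
    action.sa_flags = SA_RESETHAND;
    sigemptyset(&action.sa_mask);
    return sigaction(SIGSEGV, &action, previous);
}
```

The thread does not say exactly when RTI installs its handler. Calling something like override_segv_handler(NULL) after the DomainParticipant is created would replace it, and installing a handler before RTI starts should also work given the statement that existing handlers are not overwritten.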
Thanks Sam, our test passed after installing our own SIGSEGV signal handler.
Hi Sam,
We also need to know: will disabling the RTI signal handler for SIGSEGV cause any problems? For example, will it affect any of RTI's cleanup work, or anything else?
Furthermore, what should I do in the SIGSEGV signal handler?
Best Regards!