malloc deadlock in RTI Linux signal handler

Offline
Last seen: 2 years 2 months ago
Joined: 06/09/2022
Posts: 15

We encountered a malloc deadlock in the RTI Linux signal handler.

DDS version: rti_connext_dds-6.1.0-evaluation

The gdb backtrace is below. I would like to know why this happens, and which Linux signals RTI uses.

Thread 9 (Thread 0x7f082cd63700 (LWP 3924)):
#0 __lll_lock_wait_private () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:95
#1 0x00007f086a67d372 in __libc_calloc (n=<optimized out>, elem_size=<optimized out>) at malloc.c:3417
#2 0x00007f086cd29300 in RTIOsapiHeap_reallocateMemoryInternal ()
from /home/hong/.cache/bazel/_bazel_hong/e73c529f37cc9d389c4f694be1c86dd1/execroot/ados/bazel-out/k8-fastbuild/bin/ados/tools/channel_monitor/../../../_solib_unknown/_U@rti_Uconnext_Udds_U6_U1_U0_S_S_Cx64_Ugcc7_U3_U0_Ulibnddscore___Ulib_Sx64Linux4gcc7.3.0/libnddscore.so
#3 0x00007f086cd6932c in RTIOsapiThread_logBacktrace ()
from /home/hong/.cache/bazel/_bazel_hong/e73c529f37cc9d389c4f694be1c86dd1/execroot/ados/bazel-out/k8-fastbuild/bin/ados/tools/channel_monitor/../../../_solib_unknown/_U@rti_Uconnext_Udds_U6_U1_U0_S_S_Cx64_Ugcc7_U3_U0_Ulibnddscore___Ulib_Sx64Linux4gcc7.3.0/libnddscore.so
#4 0x00007f086cd25efb in RTILog_generatePrintFormatString ()
from /home/hong/.cache/bazel/_bazel_hong/e73c529f37cc9d389c4f694be1c86dd1/execroot/ados/bazel-out/k8-fastbuild/bin/ados/tools/channel_monitor/../../../_solib_unknown/_U@rti_Uconnext_Udds_U6_U1_U0_S_S_Cx64_Ugcc7_U3_U0_Ulibnddscore___Ulib_Sx64Linux4gcc7.3.0/libnddscore.so
#5 0x00007f086cd2667f in RTILogMessage_vprintWithParams ()
from /home/hong/.cache/bazel/_bazel_hong/e73c529f37cc9d389c4f694be1c86dd1/execroot/ados/bazel-out/k8-fastbuild/bin/ados/tools/channel_monitor/../../../_solib_unknown/_U@rti_Uconnext_Udds_U6_U1_U0_S_S_Cx64_Ugcc7_U3_U0_Ulibnddscore___Ulib_Sx64Linux4gcc7.3.0/libnddscore.so
#6 0x00007f086cd267ea in RTILogMessage_printWithParams ()
from /home/hong/.cache/bazel/_bazel_hong/e73c529f37cc9d389c4f694be1c86dd1/execroot/ados/bazel-out/k8-fastbuild/bin/ados/tools/channel_monitor/../../../_solib_unknown/_U@rti_Uconnext_Udds_U6_U1_U0_S_S_Cx64_Ugcc7_U3_U0_Ulibnddscore___Ulib_Sx64Linux4gcc7.3.0/libnddscore.so
#7 0x00007f086cd6b158 in RTIOsapiThread_onSigsegvHandler ()
from /home/hong/.cache/bazel/_bazel_hong/e73c529f37cc9d389c4f694be1c86dd1/execroot/ados/bazel-out/k8-fastbuild/bin/ados/tools/channel_monitor/../../../_solib_unknown/_U@rti_Uconnext_Udds_U6_U1_U0_S_S_Cx64_Ugcc7_U3_U0_Ulibnddscore___Ulib_Sx64Linux4gcc7.3.0/libnddscore.so
#8 <signal handler called>
#9 0x00007f086a677425 in _int_malloc (av=av@entry=0x7f081c000020, bytes=bytes@entry=48) at malloc.c:3622

Offline
Last seen: 1 month 3 weeks ago
Joined: 10/22/2018
Posts: 91

In libnddscore we install a signal handler for SIGSEGV. We do this so that we can print a backtrace when a segfault occurs (to aid with debugging).

It is important to note that we do not overwrite any signal handlers that are already installed. So if you are also installing a SIGSEGV handler, we will not overwrite it.

To debug your problem, you could disable RTI's SIGSEGV handler by calling RTIOsapiThread_disableBacktraceSupport(); in your code, which removes our signal handler.

Sam
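
To check whether any SIGSEGV handler is currently installed (for example, before and after calling RTIOsapiThread_disableBacktraceSupport()), the standard sigaction() call can report the current action without changing it. The following is a minimal sketch in C. The names query_disposition and disposition_t are hypothetical and not part of the RTI API, and the helper can only tell that a handler is installed, not whose handler it is.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>

/* What is currently installed for a signal (hypothetical type). */
typedef enum {
    DISPOSITION_ERROR = -1,
    DISPOSITION_DEFAULT,
    DISPOSITION_IGNORED,
    DISPOSITION_HANDLER
} disposition_t;

/* Hypothetical helper: report the disposition of `signo` without changing it. */
disposition_t query_disposition(int signo) {
    struct sigaction current;

    /* Passing NULL as the new action makes sigaction() a pure query. */
    if (sigaction(signo, NULL, &current) != 0) {
        return DISPOSITION_ERROR;
    }
    /* With SA_SIGINFO set, a three-argument handler was installed. */
    if (current.sa_flags & SA_SIGINFO) {
        return DISPOSITION_HANDLER;
    }
    if (current.sa_handler == SIG_DFL) {
        return DISPOSITION_DEFAULT;
    }
    if (current.sa_handler == SIG_IGN) {
        return DISPOSITION_IGNORED;
    }
    return DISPOSITION_HANDLER;
}
```

Calling query_disposition(SIGSEGV) after creating the DomainParticipant and again after the disable call would show whether a handler is still in place at that point.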

Offline
Last seen: 2 years 2 months ago
Joined: 06/09/2022
Posts: 15

Thanks Sam, our code does indeed have a risk of segmentation faults.

Offline
Last seen: 2 years 2 months ago
Joined: 06/09/2022
Posts: 15

Hi Sam, is calling malloc() in the signal handler a bug in RTI?

Offline
Last seen: 1 month 3 weeks ago
Joined: 10/22/2018
Posts: 91

Hi again,

Yes, I have confirmed that this is a bug in our code. The internal reference is CORE-12794. Please use the workaround of disabling the printing of a backtrace that I mentioned above.

Thanks,

Sam
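
In the backtraces in this thread, the thread is interrupted inside _int_malloc and the handler then calls calloc, which blocks in __lll_lock_wait_private. A plausible reading is that the handler waits on a malloc lock already held by the code it interrupted on the same thread, so the lock is never released. POSIX lists the functions that are safe to call from a signal handler; write() is on that list, while malloc() and printf() are not. The sketch below shows a handler that logs a fixed message using only write(). The names safe_log_handler and install_safe_log_handler are hypothetical, and this is not how RTI's handler is implemented.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* File descriptor the handler writes to; set before the handler is installed. */
static volatile sig_atomic_t g_log_fd = STDERR_FILENO;

/* Fixed message in static storage, so the handler never allocates. */
static const char k_msg[] = "fatal signal caught\n";

/* Handler restricted to async-signal-safe calls (only write is used). */
static void safe_log_handler(int signo) {
    int saved_errno = errno; /* write() may modify errno */
    ssize_t ignored;

    (void)signo;
    ignored = write((int)g_log_fd, k_msg, sizeof(k_msg) - 1);
    (void)ignored;
    errno = saved_errno;
}

/* Hypothetical helper: install the handler for `signo`, logging to `fd`.
 * Returns 0 on success, -1 on failure. */
int install_safe_log_handler(int signo, int fd) {
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = safe_log_handler;
    if (sigemptyset(&sa.sa_mask) != 0) {
        return -1;
    }
    g_log_fd = fd;
    return sigaction(signo, &sa, NULL);
}
```

A handler written this way cannot block on the allocator lock, because it never enters malloc, calloc, or any stdio function.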

Offline
Last seen: 2 years 2 months ago
Joined: 06/09/2022
Posts: 15

Hi Sam,

I have added RTIOsapiThread_disableBacktraceSupport(); to our code, but the malloc deadlock still occurs.

The only change I made was adding that one call to RTIOsapiThread_disableBacktraceSupport(). Am I missing anything?

Thanks again.

Offline
Last seen: 1 month 3 weeks ago
Joined: 10/22/2018
Posts: 91

Hi,

When are you calling RTIOsapiThread_disableBacktraceSupport?
Please call it after creating the DomainParticipant.

Let me know if that works,

Sam

Offline
Last seen: 2 years 2 months ago
Joined: 06/09/2022
Posts: 15

Hi,

1. My code is below, but it doesn't work:

domain_participant_ = std::make_unique<dds::domain::DomainParticipant>(
    domain_id_, participant_qos);

publisher_ = std::make_unique<dds::pub::Publisher>(*domain_participant_);
subscriber_ = std::make_unique<dds::sub::Subscriber>(*domain_participant_);

RTIOsapiThread_disableBacktraceSupport();  // newly added line

2. Do I still need to register my own SIGSEGV handler to keep RTI's handler from being used?

For example: signal(SIGSEGV, SIGSEGVHandle);

Offline
Last seen: 1 month 3 weeks ago
Joined: 10/22/2018
Posts: 91

Hi,

When you say it doesn't work, what happens?

Did disabling backtrace support fix the deadlock?

Offline
Last seen: 2 years 2 months ago
Joined: 06/09/2022
Posts: 15

Hi Sam:

What I mean is that disabling the backtrace didn't fix the deadlock.

The deadlock still occurs (backtrace obtained by attaching gdb):

 

Thread 9 (Thread 0x7efdf5ffb700 (LWP 11858)):
#0 __lll_lock_wait_private () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:95
#1 0x00007efdfe07b372 in __libc_calloc (n=<optimized out>, elem_size=<optimized out>) at malloc.c:3417
#2 0x00007efe00727300 in RTIOsapiHeap_reallocateMemoryInternal ()
from /home/hong/.cache/bazel/_bazel_hong/e73c529f37cc9d389c4f694be1c86dd1/execroot/ados/bazel-out/k8-fastbuild/bin/ados/tools/channel_monitor/../../../_solib_unknown/_U@rti_Uconnext_Udds_U6_U1_U0_S_S_Cx64_Ugcc7_U3_U0_Ulibnddscore___Ulib_Sx64Linux4gcc7.3.0/libnddscore.so
#3 0x00007efe00796236 in ADVLOGLogger_createMessageQueue ()
from /home/hong/.cache/bazel/_bazel_hong/e73c529f37cc9d389c4f694be1c86dd1/execroot/ados/bazel-out/k8-fastbuild/bin/ados/tools/channel_monitor/../../../_solib_unknown/_U@rti_Uconnext_Udds_U6_U1_U0_S_S_Cx64_Ugcc7_U3_U0_Ulibnddscore___Ulib_Sx64Linux4gcc7.3.0/libnddscore.so
#4 0x00007efe00795ee2 in ADVLOGLogger_assertMessageQueueLNOOP ()
from /home/hong/.cache/bazel/_bazel_hong/e73c529f37cc9d389c4f694be1c86dd1/execroot/ados/bazel-out/k8-fastbuild/bin/ados/tools/channel_monitor/../../../_solib_unknown/_U@rti_Uconnext_Udds_U6_U1_U0_S_S_Cx64_Ugcc7_U3_U0_Ulibnddscore___Ulib_Sx64Linux4gcc7.3.0/libnddscore.so
#5 0x00007efe00797274 in ADVLOGLogger_installedRtiLogMsgLNP ()
from /home/hong/.cache/bazel/_bazel_hong/e73c529f37cc9d389c4f694be1c86dd1/execroot/ados/bazel-out/k8-fastbuild/bin/ados/tools/channel_monitor/../../../_solib_unknown/_U@rti_Uconnext_Udds_U6_U1_U0_S_S_Cx64_Ugcc7_U3_U0_Ulibnddscore___Ulib_Sx64Linux4gcc7.3.0/libnddscore.so
#6 0x00007efe0072458a in RTILogMessage_vprintWithParams ()
from /home/hong/.cache/bazel/_bazel_hong/e73c529f37cc9d389c4f694be1c86dd1/execroot/ados/bazel-out/k8-fastbuild/bin/ados/tools/channel_monitor/../../../_solib_unknown/_U@rti_Uconnext_Udds_U6_U1_U0_S_S_Cx64_Ugcc7_U3_U0_Ulibnddscore___Ulib_Sx64Linux4gcc7.3.0/libnddscore.so
#7 0x00007efe007247ea in RTILogMessage_printWithParams ()
from /home/hong/.cache/bazel/_bazel_hong/e73c529f37cc9d389c4f694be1c86dd1/execroot/ados/bazel-out/k8-fastbuild/bin/ados/tools/channel_monitor/../../../_solib_unknown/_U@rti_Uconnext_Udds_U6_U1_U0_S_S_Cx64_Ugcc7_U3_U0_Ulibnddscore___Ulib_Sx64Linux4gcc7.3.0/libnddscore.so
#8 0x00007efe00769158 in RTIOsapiThread_onSigsegvHandler ()
from /home/hong/.cache/bazel/_bazel_hong/e73c529f37cc9d389c4f694be1c86dd1/execroot/ados/bazel-out/k8-fastbuild/bin/ados/tools/channel_monitor/../../../_solib_unknown/_U@rti_Uconnext_Udds_U6_U1_U0_S_S_Cx64_Ugcc7_U3_U0_Ulibnddscore___Ulib_Sx64Linux4gcc7.3.0/libnddscore.so
#9 <signal handler called>
#10 0x00007efdfe075425 in _int_malloc (av=av@entry=0x7efda4000020, bytes=bytes@entry=55) at malloc.c:3622
#11 0x00007efdfe0782ad in __GI___libc_malloc (bytes=55) at malloc.c:3075
#12 0x00007efdfe67d298 in operator new(unsigned long) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#13 0x000055ee54b01ce0 in void std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_construct<char*>(char*, char*, std::forward_iterator_tag) ()
#14 0x000055ee54d5d235 in google::protobuf::internal::ArenaStringPtr::Set(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, google::protobuf::Arena*) ()
#15 0x000055ee54d5d716 in google::protobuf::internal::ArenaStringPtr::Set(google::protobuf::internal::ArenaStringPtr::EmptyDefault, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, google::protobuf::Arena*) ()
#16 0x000055ee54c61354 in google::protobuf::FileDescriptorProto::_internal_set_name(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) ()
#17 0x000055ee54c612b7 in google::protobuf::FileDescriptorProto::set_name(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) ()
#18 0x000055ee54c3b931 in google::protobuf::FileDescriptor::CopyTo(google::protobuf::FileDescriptorProto*) const ()
#19 0x000055ee54c47447 in google::protobuf::ExistingFileMatchesProto(google::protobuf::FileDescriptor const*, google::protobuf::FileDescriptorProto const&) ()
#20 0x000055ee54c475e8 in google::protobuf::DescriptorBuilder::BuildFile(google::protobuf::FileDescriptorProto const&) ()
#21 0x000055ee54c43fac in google::protobuf::DescriptorPool::BuildFileCollectingErrors(google::protobuf::FileDescriptorProto const&, google::protobuf::DescriptorPool::ErrorCollector*) ()
#22 0x000055ee54af3e34 in apollo::cyber::message::ProtobufFactory::RegisterMessage(google::protobuf::FileDescriptorProto const&) ()
#23 0x000055ee54af3c75 in apollo::cyber::message::ProtobufFactory::RegisterMessage(apollo::cyber::proto::ProtoDesc const&) ()
#24 0x000055ee54af3dab in apollo::cyber::message::ProtobufFactory::RegisterMessage(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) ()
#25 0x000055ee54ad369f in apollo::cyber::service_discovery::ChannelManager::DisposeJoin(apollo::cyber::proto::ChangeMsg const&) ()
#26 0x000055ee54ad312f in apollo::cyber::service_discovery::ChannelManager::Dispose(apollo::cyber::proto::ChangeMsg const&) ()
#27 0x000055ee54adf88c in apollo::cyber::service_discovery::Manager::OnRemoteChangeProcess() ()

Offline
Last seen: 1 month 3 weeks ago
Joined: 10/22/2018
Posts: 91

Ok. I see in your backtrace that we are still ending up in the signal handler (frame #8 RTIOsapiThread_onSigsegvHandler).

As another option, please install another signal handler for SIGSEGV over the top of ours.
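
One variant of this suggestion, not discussed in the thread, is to put SIGSEGV back to its default action instead of installing a custom handler, so that a segfault terminates the process (and produces a core dump if the system is configured for it). The sketch below uses only the standard sigaction() call. The name reset_sigsegv_to_default is hypothetical, and whether RTI installs its handler again later (for example, when new threads are created) is not established here, so treat this as an assumption to verify.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Hypothetical helper: restore the default SIGSEGV action.
 * If `previous` is non-NULL, the action that was installed before the
 * call is stored there. Returns 0 on success, -1 on failure. */
int reset_sigsegv_to_default(struct sigaction *previous) {
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = SIG_DFL;
    if (sigemptyset(&sa.sa_mask) != 0) {
        return -1;
    }
    return sigaction(SIGSEGV, &sa, previous);
}
```

As with the other workarounds in this thread, the call would go after the DomainParticipant has been created, so that it replaces the handler the library has already installed.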

Offline
Last seen: 2 years 2 months ago
Joined: 06/09/2022
Posts: 15

Thanks Sam, our tests pass after installing our own SIGSEGV signal handler.

Offline
Last seen: 2 years 2 months ago
Joined: 06/09/2022
Posts: 15

Hi Sam,

We also need to know whether disabling RTI's SIGSEGV signal handler will cause any problems. For example, will it affect any of RTI's cleanup work, or anything else?

Furthermore, what should I do in my SIGSEGV signal handler?

Best Regards!

Offline
Last seen: 1 month 3 weeks ago
Joined: 10/22/2018
Posts: 91

In RTI's SIGSEGV handler we do not perform any cleanup (since you cannot recover from a SEGV); we only print a backtrace to aid with debugging. There should be no problems with disabling the handler.

In your new SEGV handler you can do nothing. Here is an example in C showing a no-op handler:

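#include <signal.h>

/* No-op handler: it calls nothing, so it cannot block on the malloc lock. */
void onSigsegvHandler(int sig) {
    (void)sig; /* unused */
    return;
}
...
{
    struct sigaction newHandler;
    newHandler.sa_handler = &onSigsegvHandler;
    newHandler.sa_flags = 0;
    if (sigemptyset(&newHandler.sa_mask) != 0) {
        /* handle error */
    }
    /* Replace whichever SIGSEGV handler is currently installed. */
    if (sigaction(SIGSEGV, &newHandler, NULL) != 0) {
        /* handle error */
    }
}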