Hi,
I was wondering what can cause a participant creation failure with error:
[D0102|ENABLE]DDS_DomainParticipant_enableI:Automatic participant index failed to initialize. PLEASE VERIFY CONSISTENT TRANSPORT / DISCOVERY CONFIGURATION.
DDSDomainParticipant_impl::createI:ERROR: Failed to auto-enable entity
DomainParticipantFactory_impl::create_participant():!create failure creating participant
I had this issue when launching an application that creates a participant by calling the create_participant function. My application usually works fine; this is the first time I have encountered this. I have not changed the QoS file since the last successful run.
After rebooting the computer, it went back to normal, the participant creation succeeded. I am using UDP transport by the way.
Do you have any idea of what could have happened? Perhaps a network issue might have caused this?
Thank you,
Lucie
Hi Lucie,
Firstly, I would like to know which version of RTI Connext you are using.
The error that you are reproducing is due to a problem in the network interface configuration.
Best,
Antonio
We have seen this lately using 5.3.0 on Linux in a Docker container.
The exact error:
We repeat our unit tests to find flaky ones. Our tests with Connext output this error very rarely, and only when they are being repeated.
However, I understood from this page: https://community.rti.com/static/documentation/connext-dds/5.2.0/doc/manuals/connext_dds/html_files/RTI_ConnextDDS_CoreLibraries_ReleaseNotes/Content/ReleaseNotes/WhatsFixed/Transports/FailureEnablingDomainParticipant.htm that the error should not occur anymore. Can you give any insight into what causes it to occur (and only occasionally)?
Hi,
Sorry for the delay. To answer your question, we are using RTI Connext 5.2.
We use the UDPv4 transport, and set <initial_peers> to "localhost" in the discovery configuration.
The problem occurs rarely, but it always happens after a reboot of the system. And to correct this we have to reboot one more time.
Do you know what kind of network issue can cause this?
Thanks,
Lucie
Hi Lucie and DHood,
The error you are seeing occurs because no network interface is up and running when the participant is created.
So, there are two workarounds:
1. Make sure there is always an interface up and running on your system.
2. Set the <participant_id> QoS manually. You can find more information in the “8.5.9 WIRE_PROTOCOL QosPolicy (DDS Extension)” section of the User’s Manual.
Example:
Best,
Antonio
Thanks for the advice Antonio.
Unfortunately for our use case it is not feasible to set the participant ID manually, we need to resolve the root cause. I am from the ROS 2 project; we need to have automatically generated IDs to support dynamic communication graphs for users.
In contrast to Lucie, we do not encounter this issue after a system reboot. We only encounter it after, say, the 400th time we test a particular application; the first 400 times it works fine. As such, in our case we can be sure that the network interface is already up and running on the system.
Is there any more information you can provide about what specifically is causing this issue, namely if anything may have changed between 5.2.3 and 5.3.0 that would have introduced this behaviour? We have been using Connext for years without seeing this error message, it has only started appearing since upgrading to 5.3.0.
With thanks
Hi! Did you ever find a solution to this? We're facing the same issue, though in Docker containers in our CI system, and we're using Connext 6.0.0. I'm not able to reproduce it locally by shutting down the interfaces (eth0 and lo) in the container, which seems to contradict @ajimenez's explanation that the error is caused by no interface being up.
Take this with a grain of salt, but this *might* be a result of too many orphaned shared memory segments. Linux doesn't clean them up automatically, so you have to do it manually if DDS applications don't close gracefully.
I've used this hacky command line:
ipcs -m | cut -d ' ' -f2 | grep '^[0-9]' | while read x; do sudo ipcrm -m $x; done
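A more cautious variant of the same idea, as a rough sketch only: instead of removing every segment, list only those with no process attached (nattch of 0). This assumes Linux with the util-linux ipcs column order (key, shmid, owner, perms, bytes, nattch, status). The function name orphaned_shm_ids is made up for illustration, and a zero attach count does not prove a segment was left behind by a DDS application, so check the list before removing anything.

```shell
#!/bin/sh
# Read `ipcs -m` output on stdin and print the shmid of every segment
# whose nattch column is 0, i.e. no process is currently attached.
# Assumed column order (util-linux): key shmid owner perms bytes nattch status.
# Header and blank lines are skipped because their second field is not numeric.
orphaned_shm_ids() {
    awk 'NF >= 6 && $2 ~ /^[0-9]+$/ && $6 == 0 { print $2 }'
}

# Dry run: list the candidates without removing anything.
#   ipcs -m | orphaned_shm_ids
#
# Remove them once the list looks right:
#   ipcs -m | orphaned_shm_ids | while read -r id; do sudo ipcrm -m "$id"; done
```

Segments that are still attached to a running process are left alone, which avoids pulling shared memory out from under applications that are working fine.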
We see this pretty often too. Not so often that I can readily reproduce it, but after a few hundred reboots it'll happen at least once. Nothing changes on the system. It just seems like an RTI bug and doesn't have much to do with the network interfaces, because the system can be fully up. It may be something like a race condition with the very first RTI process during boot. But once you are in this state, there is nothing we have found that resolves it other than a reboot. Really annoying.
Hi! This was solved for us by updating to Connext 6.0.1. The specific fix is described under section 4.2.3 (Shared memory locator ignored if it matched a multicast address) here.
It does sound like you have a slightly different issue though.
Hope this helps!