delete_participant() never returns

Joined: 06/07/2014
Posts: 7

Hey,

In the following Java snippet,

if (this.participant != null) {
  this.participant.delete_contained_entities();
  DomainParticipantFactory.get_instance().delete_participant(this.participant);
}

delete_participant() never returns.  The code is running on JVM 1.8.11 in a CentOS 6.5 (Linux 2.6.32) VM.

Any idea how to address this issue?

Thank you,

Joined: 01/15/2013
Posts: 94

Hi,

It's strange that the method is never completing. Two questions:

1) Have you tried running the example with a 1.7 (or previous) JVM? Do you get the same error?

2) Every DDS Entity (DomainParticipantFactory, DomainParticipant, Publisher, Subscriber) is a factory in its own right. Recall that if any elements (DataWriters or DataReaders) were not created by the Participant itself (e.g. they were created via a Publisher or Subscriber), they also have to be deleted explicitly. Are you creating any explicit Publishers or Subscribers? If so, are you also calling delete_contained_entities() on each Publisher and Subscriber?

Thanks,

Juanlu

Joined: 06/07/2014
Posts: 7

Hey,

First off, I am using RTI Connext 5.1.0.

1) I haven't tried it recently with JVM 1.8.  However, in the past, I have observed similar behavior with JVM 1.7.  (I did not report it earlier as I figured it was a one-off anomaly.)  I cannot try JVM 1.7 now as that would require fixing JVM dependencies.

2) The app creates topics and data writers/readers from the participant (without using publishers or subscribers).  So, the app only invokes DomainParticipant.delete_contained_entities() before invoking delete_participant().  I assume this is in line with the "shutdown protocol" of RTI Connext.

Let me know if you need more info.

Cheers,

Joined: 01/15/2013
Posts: 94

Hi,

The error you're seeing is quite strange. At this point I would need more information.

Is your application code shareable so I can try to reproduce quickly on my environment? Are you creating more than one Domain Participant? Are you using RTI Connext DDS inside a bigger framework with other technologies involved? Are you modifying any QoS settings?
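
One way to gather more information without touching the DDS API is to run the shutdown on a worker thread and print a stack dump of every thread if it does not finish in time. The following is only a sketch using the plain JDK: the class name HangWatchdog, the method names, and the timeout values are hypothetical, and the sleeping Runnable in main() is a stand-in for the real delete_contained_entities() / delete_participant() calls.

```java
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class HangWatchdog {

    /** Formats the name, state, and stack trace of every live thread. */
    public static String dumpAllThreads() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            Thread t = e.getKey();
            sb.append('"').append(t.getName()).append("\" state=").append(t.getState()).append('\n');
            for (StackTraceElement frame : e.getValue()) {
                sb.append("    at ").append(frame).append('\n');
            }
        }
        return sb.toString();
    }

    /**
     * Runs the task on a daemon thread. Returns true if it finished within the
     * timeout; otherwise prints a thread dump to stderr and returns false.
     */
    public static boolean runWithTimeout(Runnable task, long timeout, TimeUnit unit) throws Exception {
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "shutdown-worker");
            t.setDaemon(true); // a stuck task must not keep the JVM alive
            return t;
        });
        Future<?> future = executor.submit(task);
        try {
            future.get(timeout, unit);
            return true;
        } catch (TimeoutException e) {
            System.err.println("Task did not finish in time; thread dump:");
            System.err.println(dumpAllThreads());
            return false;
        } finally {
            // Interrupts the worker; a call blocked in native code may ignore this.
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for the real shutdown sequence, which would be wrapped here.
        boolean finished = runWithTimeout(() -> {
            try {
                Thread.sleep(10_000);
            } catch (InterruptedException ignored) {
                // interrupted by shutdownNow() after the timeout
            }
        }, 1, TimeUnit.SECONDS);
        System.out.println("finished = " + finished);
    }
}
```

The "shutdown-worker" entry in the dump shows where the call is blocked, and the other entries show what the remaining threads are doing at that moment. Running the JDK's jstack tool against the hung process gives similar information without any code changes.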

Thanks,

Juanlu

Joined: 06/07/2014
Posts: 7

Hey,

The system contains a small set of applications, and each application creates a domain participant with the same domain id.  I will try to construct and share a small example that exhibits this problem.  In the worst case, I'll share the source of the original application.

Each application is built using the Griffon framework.  A library is compiled against DDS, and this library is used by each application.

I am not providing a QoS configuration to the application.  So, I'm guessing it is picking up the default QoS settings.

Hope this helps,

Joined: 01/16/2013
Posts: 128

Hi,

I've seen this behavior when integrating DDS in the LabVIEW framework. In my case, LabVIEW was creating the threads and then loaning them to the DDS application. DDS needs a lot of threads internally, and deleting the participant releases many of them. However, since LabVIEW and not DDS was the parent of the threads, the call blocked when deleting the participant. Note that I was using C++, so it may not be the same issue.

There's an easy workaround that worked for me. Sleep for a second before deleting the participant. This forces the context to switch. If this works for you, we can discuss "more elegant" solutions to your issue. 

Thanks,

Sara

Joined: 06/07/2014
Posts: 7

Hey Sara,

Since the apps have a GUI, there is some amount of under-the-hood thread management.  However, I doubt that it interferes with DDS threading.  Nevertheless, I will try this workaround.

Thank you,

Joined: 06/07/2014
Posts: 7

Hey Sara,

Adding the delay did not work :(

Joined: 01/16/2013
Posts: 128

Hi,

:(

Well, send out a reproducer if you can, because as Juanlu said, this is a really weird error.

Thanks,

Sara

Joined: 01/15/2013
Posts: 94

Hi,

Are you seeing this kind of error when the participant deletion is stuck?

REDAWorker_enterExclusiveArea:worker U70f73cc0 deadlock risk: cannot enter 0x10182a180 of level 30 from level 50

Thanks,

Juanlu

Joined: 06/07/2014
Posts: 7

Hey Juan,

I don't see any such message on the console.  If there is a way to turn on detailed logging to check for this message, then I can try it out.

Cheers,

Joined: 06/07/2014
Posts: 7

Hey Sara and Juan,

I have attached the code for a repro.  It is composed of two Groovy scripts -- pub.groovy and sub.groovy -- along with a thin DDS wrapper.  The classpath should include build/libs/ddslibrary-0.1.jar and libs/*.jar while executing the scripts.  The failure is intermittent, but more prevalent on the subscriber's end.  Use Groovy 2.3.6.

Hope this helps,

Joined: 01/16/2013
Posts: 128

Hi,

I haven't had time to give this a try, sorry about that. Until I get some time to try your reproducer, here are some ideas you may want to try:

  1. Does this work properly when run outside Groovy?  Try a simple helloWorld that calls the same functions as your application does.
  2. Try disabling your firewall/antivirus. We have had customers run into this kind of error when using certain firewalls/antivirus.
  3. Disable multicast communication in your QoS configuration. You will need to set up the discovery peers. You can find some documentation on doing it here.

Let me know if any of those works for you.

Thanks,

Sara

PS: If you are in a hurry and you have paid support, you may want to consider contacting them directly for a more timely answer :)