What does this Deadlock Error Message mean?

Note: Applies to RTI DDS 4.0x. 

The following applies to RTI DDS 4.0g (or earlier). This problem has been corrected in RTI DDS 4.0h and later. 

You may see a deadlock during application shutdown, with or without an error message. This deadlock typically occurs when the value of the max_gather_destinations field in the DomainParticipantResourceLimits QosPolicy is smaller than the total number of NICs across all the machines in the system (plus one per machine, if shared memory is enabled). 

In some cases, your application may stop receiving user data or discovery data. In other cases, an RTI DDS call may fail to return. 
You may also see an error message such as: 

REDAWorker_enterExclusiveArea:worker G/69ec0001 deadlock risk: cannot enter 2227b0 of level 1 from level 3 

In this message, worker G/xxxxx refers to the Event thread. The error indicates that the thread is trying to enter an Exclusive Area (take a mutex) at the wrong level. In short, the deadlock-prevention mechanism is flagging a programming mistake inside the core of RTI DDS that could lead to a deadlock. 

Another possible symptom is that threads other than the Event thread become stuck:

  • discovery stops and deleted entities are not purged
  • and/or user data messages are not received
  • and/or a DDS-level API call blocks forever 


Possible workarounds if you are using RTI DDS 4.0g (or earlier)

Action

Set the value of participant_qos.resource_limits.max_gather_destinations to a number large enough to hold all the destinations of all the transports on all the nodes that exist in the network. For example, the host mammoth would have 3 destinations (2 for its two IPv4 NICs and 1 for shared memory). 

Side effect

RTI DDS internally allocates max_gather_destinations * 16 bytes of memory, according to the value set in the QoS policy. The default value of max_gather_destinations is 8. The same array is used by all send calls, so the memory increase caused by the changed QoS is incurred only once. 

Note that you need to account for all the possible destinations of a node when determining your QoS setting. In particular, if a reader chooses a port different from the default port, or chooses a multicast address not yet in use on that node, that adds another possible destination for that node.
