What does the "PRESParticipant_assertRemoteParticipant:!assert .. due to different ro area" error message mean?

Note: Applies to RTI Connext 4.x

This solution addresses the following exception: 

PRESParticipant_assertRemoteParticipant:!assert remote participant c0c80aa3 7e520001 due to different ro area
DISCDynamicParticipantSelfDetector_onDataAvailable:!assert remote participant

The likely cause of this message is that a newly started RTI Connext application is using a GUID that has already been observed from an existing application, but with different declaration information.

Each DomainParticipant has a Globally Unique Identifier (GUID) that is used to uniquely identify the application in its communications with other applications. Each object within the participant also has its own GUID, which consists of its application's GUID plus a unique identifier for that particular object.

The GUID is supposed to be unique; if two DomainParticipants have the same GUID, this will cause communications problems, including the message above. Generally, these problems will occur either if there are two applications that are using the same GUID simultaneously, or if a new application starts up with a GUID that was used by some previous (now dead) application, but the dead application's liveliness duration (as defined by the LIVELINESS QoS policy on the data readers and data writers) has not yet elapsed. 

You can use the GUID displayed in the message to determine which node, and perhaps which application, is causing the GUID problem. The displayed hexadecimal numbers consist of: 

  • 32 bits for the host ID. The host ID is currently based upon the IP address (although this might change in future versions). In the above case, a host ID of c0c80aa3 translates to IP address 192.200.10.163. This will allow you to determine the node having the GUID problem.
  • The low 16 bits of the process ID.
    • If the originating node is a Windows system, the relevant "process ID" will be the value of GetCurrentProcessId().
    • On a VxWorks system, it will be the task ID of a specific task.
    • On a Linux system, it will be the process ID that you can see as the output of the command ps -ef.
  • 8 bits for an internal counter. This counter allows an application to create multiple DomainParticipants in the same domain.
  • 8 bits containing a constant value (0x01).
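
As an illustration of the breakdown above, the following minimal Python sketch decodes the two hexadecimal words printed in the message. The function name decode_participant_guid and the dictionary keys are hypothetical, chosen only for this example. It assumes the fields of the second word appear in the order listed above (process ID, counter, constant), which is consistent with the trailing 01 in the sample message.

```python
import ipaddress


def decode_participant_guid(host_word, app_word):
    """Split the two hex words from the error message into their fields.

    Assumption: the second word is laid out as 16 bits of process ID,
    then 8 bits of counter, then the 8-bit constant, in that order.
    """
    host = int(host_word, 16)
    app = int(app_word, 16)
    return {
        # The host ID is currently derived from the IPv4 address.
        "host_ip": str(ipaddress.IPv4Address(host)),
        # Only the low 16 bits of the process ID are carried in the GUID.
        "pid_low16": (app >> 16) & 0xFFFF,
        "counter": (app >> 8) & 0xFF,
        "constant": app & 0xFF,
    }


info = decode_participant_guid("c0c80aa3", "7e520001")
print(info)
```

For the sample message, the low 16 bits of the process ID are 0x7e52 (32338 decimal), so on the node at 192.200.10.163 you would look for a process whose ID matches that value in its low 2 bytes.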

Some likely causes for this error include:

  • A node in the system is rebooting with the same process ID (or with a different process ID that matches in the low 2 bytes), and the reboot occurs more quickly than the liveliness duration of the application(s) on that node. This could definitely happen on a VxWorks system; it is unlikely but still possible on a Windows system. The probability of a collision grows quickly (roughly with the square of the count, while the count is small) with the number of participants in different applications on the same host. If that number is n, the probability that at least two of them have the same 16 low bits of process ID is about 1-e^(-n*(n-1)/2^17).
  • On a Windows system, there are two processes using RTI Connext that have the same low 2 bytes of process ID and are running simultaneously.
  • On a multi-core system, there are two threads in the same process that are being executed concurrently and both are creating a DomainParticipant.
  • We have also seen this error when using multiple DLLs that use RTI Connext. When all DLLs are linked statically with RTI Connext, you will end up with multiple Participant Factories. Each factory maintains its own object ID counter and uses it in the GUID. This could result in a scenario where two domain participants have different participant indexes but the same GUID. 
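
To get a feel for the collision probability formula given in the list above, the following short Python sketch evaluates it for a few values of n. The function name collision_probability is hypothetical. The formula is the usual birthday-problem approximation, so it assumes the low 16 bits of the process IDs are uniformly distributed, which real process IDs may not be.

```python
import math


def collision_probability(n, id_bits=16):
    """Approximate chance that at least two of n participants share
    the same low id_bits bits of process ID: 1 - e^(-n*(n-1)/2^(id_bits+1)).

    Assumption: the low bits are uniformly distributed.
    """
    if n < 2:
        return 0.0
    return 1.0 - math.exp(-n * (n - 1) / 2 ** (id_bits + 1))


for n in (2, 10, 100, 300):
    print(n, collision_probability(n))
```

Under that assumption, 100 participants on one host already give a collision chance of roughly 7%, which is why setting the application portion of the GUID explicitly is worth considering on busy hosts.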

Some ways to correct this problem:

  • Determine which node is causing the problem and which (if any) of the causes listed above applies.
  • The most robust fix is to change the application code so that it sets the application portion of the GUID itself. You can do this by assigning a unique value to participantQos.wire_protocol.rtps_app_id before creating the DomainParticipant. The solution "Why is my VxWorks application unable to discover other RTI Connext applications after a restart?" provides an example of how to do this. That solution is applicable to other platforms beyond VxWorks.
  • On a Windows system, the application portion of the GUID needs to be unique between processes. If you are creating multiple DomainParticipants (in the same Windows process, or on the same VxWorks target), the value also needs to be unique between them. Perhaps most importantly, the value needs to be unique across reboots—any app_id value after a reboot should not match any app_id value on the same node before the reboot.
  • The most bulletproof approach involves using a counter in a file (on a Windows system, you may need to use file locking to ensure non-concurrent access) or in NVRAM (on a VxWorks system) to generate a part of your app_id.
  • If your problem turns out to be associated with the reboot of a node, you should likely change your DataReaders' and DataWriters' Liveliness lease_duration to a non-infinite value. If your application is not already changing the Liveliness QoS policy, we recommend leaving the Liveliness kind set to DDS_AUTOMATIC_LIVELINESS_QOS, which causes RTI Connext to maintain the liveliness for as long as the DomainParticipant lives. (Of course, if your application already asserts liveliness manually, or you have a reason to want it that way, you can use one of the other settings.)
  • If the problem is related to concurrent DomainParticipant creations in parallel threads that belong to the same process, serializing the creation of the DomainParticipants will resolve the problem. 
  • If the problem is related to having multiple DLLs statically linked to RTI Connext, you should link your DLLs dynamically.
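
The file-based counter suggested above could look something like the following Python sketch. It is only an illustration of the idea, not RTI code: the function name next_app_id_counter, the file names, and the retry parameters are all hypothetical. Instead of an operating-system file lock, it uses an exclusively created lock file as a simple portable stand-in; note that a lock file left behind by a crashed process would have to be removed by hand. How the returned value is combined with other bits to form the value assigned to rtps_app_id is left to the application.

```python
import os
import tempfile
import time


def next_app_id_counter(path, retries=50, delay=0.05):
    """Increment and return a counter stored in a file.

    The counter survives process restarts and reboots, so each call
    yields a value not handed out before (assuming the file persists).
    """
    lock_path = path + ".lock"
    for _ in range(retries):
        try:
            # O_EXCL makes creation fail if another process holds the lock.
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            break
        except FileExistsError:
            time.sleep(delay)
    else:
        raise TimeoutError("could not acquire " + lock_path)
    try:
        try:
            with open(path) as f:
                value = int(f.read().strip() or 0)
        except FileNotFoundError:
            value = 0  # first use on this node
        value += 1
        # Write to a temporary file, then rename, so a crash mid-write
        # does not leave a truncated counter file behind.
        tmp_path = path + ".tmp"
        with open(tmp_path, "w") as f:
            f.write(str(value))
        os.replace(tmp_path, path)
        return value
    finally:
        os.close(fd)
        os.remove(lock_path)


# Usage example in a throwaway directory; a real application would
# keep the counter file in a location that persists across reboots.
with tempfile.TemporaryDirectory() as d:
    counter_file = os.path.join(d, "app_id_counter")
    first = next_app_id_counter(counter_file)
    second = next_app_id_counter(counter_file)
    print(first, second)
```

On a VxWorks target, the same read, increment, and write-back sequence would be applied to a location in NVRAM rather than to a file.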