3.6. Discovery

This section discusses the implementation of discovery plugins in RTI Connext Micro. For a general overview of discovery in RTI Connext Micro, see What is Discovery?.

Connext Micro discovery traffic is conducted through transports. Please see the Transports section for more information about registering and configuring transports.

3.6.1. What is Discovery?

Discovery is the behind-the-scenes way in which RTI Connext Micro objects (DomainParticipants, DataWriters, and DataReaders) on different nodes find out about each other. Each DomainParticipant maintains a database of information about all the active DataReaders and DataWriters that are in the same DDS domain. This database is what makes it possible for DataWriters and DataReaders to communicate. To create and refresh the database, each application follows a common discovery process.

This section describes the default discovery mechanism known as the Simple Discovery Protocol, which includes two phases: Simple Participant Discovery and Simple Endpoint Discovery.

The goal of these two phases is to build, for each DomainParticipant, a complete picture of all the entities that belong to the remote participants that are in its peers list. The peers list is the list of nodes with which a participant may communicate. It starts out the same as the initial_peers list that you configure in the DISCOVERY QosPolicy. If the accept_unknown_peers flag in that same QosPolicy is TRUE, then other nodes may also be added as they are discovered; if it is FALSE, then the peers list will match the initial_peers list, plus any peers added using the DomainParticipant’s add_peer() operation.
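The peers-list rules above can be modeled in a few lines of C. This is a minimal sketch, not the Connext Micro implementation: `PeerList`, `peers_init()`, `peers_add()`, and `peers_on_discovered()` are hypothetical names, peers are plain locator strings, and capacity is a fixed array.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_PEERS 8

/* Hypothetical model of a participant's peers list (not the Connext Micro API). */
typedef struct {
    const char *peers[MAX_PEERS];
    int count;
    bool accept_unknown_peers;
} PeerList;

static bool peers_contains(const PeerList *pl, const char *peer)
{
    for (int i = 0; i < pl->count; ++i) {
        if (strcmp(pl->peers[i], peer) == 0) {
            return true;
        }
    }
    return false;
}

/* Explicitly add a peer (models initial_peers entries and add_peer()). */
static bool peers_add(PeerList *pl, const char *peer)
{
    assert(pl != NULL && peer != NULL);
    if (peers_contains(pl, peer)) {
        return true;
    }
    if (pl->count == MAX_PEERS) {
        return false; /* no room left */
    }
    pl->peers[pl->count++] = peer;
    return true;
}

/* The peers list starts out as a copy of initial_peers. */
static void peers_init(PeerList *pl, const char *const *initial, int n, bool accept_unknown)
{
    pl->count = 0;
    pl->accept_unknown_peers = accept_unknown;
    for (int i = 0; i < n; ++i) {
        peers_add(pl, initial[i]);
    }
}

/* A node learned through discovery is added only if accept_unknown_peers is true. */
static bool peers_on_discovered(PeerList *pl, const char *peer)
{
    if (peers_contains(pl, peer)) {
        return true;
    }
    return pl->accept_unknown_peers && peers_add(pl, peer);
}
```

For example, with accept_unknown_peers set to FALSE, a node learned through discovery is rejected until it is added explicitly with `peers_add()`, which plays the role of add_peer() in this model.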

The following section discusses how Connext Micro objects on different nodes find out about each other using the default Simple Discovery Protocol (SDP). It describes the sequence of messages that are passed between Connext Micro on the sending and receiving sides.

The discovery process occurs automatically, so you do not have to implement any special code. For more information about advanced topics related to Discovery, please refer to the Discovery chapter in the RTI Connext DDS Core Libraries User’s Manual.

3.6.1.1. Simple Participant Discovery

This phase of the Simple Discovery Protocol is performed by the Simple Participant Discovery Protocol (SPDP).

During the Participant Discovery phase, DomainParticipants learn about each other. The DomainParticipant’s details are communicated to all other DomainParticipants in the same DDS domain by sending participant declaration messages, also known as participant DATA submessages. The details include the DomainParticipant’s unique identifying key (GUID, or Globally Unique ID, described below), transport locators (addresses and port numbers), and QoS. These messages are sent on a periodic basis using best-effort communication.

Participant DATAs are sent periodically to maintain the liveliness of the DomainParticipant. They are also used to communicate changes in the DomainParticipant’s QoS. Only changes to QosPolicies that are part of the DomainParticipant’s built-in data need to be propagated.

When receiving remote participant discovery information, RTI Connext Micro determines if the local participant matches the remote one. A ‘match’ between the local and remote participant occurs only if the local and remote participant have the same Domain ID and Domain Tag. This matching process occurs as soon as the local participant receives discovery information from the remote one. If there is no match, the discovery DATA is ignored, resulting in the remote participant (and all its associated entities) not being discovered.
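The matching rule reduces to a single comparison. In this sketch, `ParticipantInfo` and `participant_matches()` are hypothetical, and representing an unset Domain Tag as an empty string is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical subset of the information carried by a participant announcement. */
typedef struct {
    int32_t domain_id;
    const char *domain_tag; /* assumption: "" models "no tag" */
} ParticipantInfo;

/* Remote discovery data is processed only if both Domain ID and Domain Tag match;
 * otherwise it is ignored and the remote participant is never discovered. */
static bool participant_matches(const ParticipantInfo *local, const ParticipantInfo *remote)
{
    assert(local != NULL && remote != NULL);
    return local->domain_id == remote->domain_id &&
           strcmp(local->domain_tag, remote->domain_tag) == 0;
}
```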

When a DomainParticipant is deleted, a participant DATA (delete) submessage with the DomainParticipant’s identifying GUID is sent.

The GUID is a unique reference to an entity. It is composed of a GUID prefix and an Entity ID. By default, the GUID prefix is calculated from the IP address and the process ID. The Entity ID is set by Connext Micro (you may be able to change it in a future version).
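The GUID layout (a 12-byte prefix followed by a 4-byte entity ID) comes from the DDSI-RTPS specification. The sketch below assumes a hypothetical prefix built from an IPv4 address, a process ID, and a counter; the exact bytes Connext Micro uses may differ, and `Guid`, `guid_make_prefix()`, `guid_set_entity_id()`, and `guid_same_participant()` are illustrative names only.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* RTPS-style GUID: a 12-byte prefix shared by all entities of one participant,
 * followed by a 4-byte entity ID that distinguishes entities within it. */
typedef struct {
    uint8_t prefix[12];
    uint8_t entity_id[4];
} Guid;

/* Hypothetical prefix layout: IPv4 address, process ID, and a counter,
 * each stored big-endian. Connext Micro's actual layout may differ. */
static void guid_make_prefix(Guid *g, uint32_t ipv4, uint32_t pid, uint32_t counter)
{
    const uint32_t parts[3] = {ipv4, pid, counter};
    assert(g != NULL);
    for (int p = 0; p < 3; ++p) {
        for (int b = 0; b < 4; ++b) {
            g->prefix[p * 4 + b] = (uint8_t)(parts[p] >> (24 - 8 * b));
        }
    }
}

static void guid_set_entity_id(Guid *g, uint32_t entity_id)
{
    for (int b = 0; b < 4; ++b) {
        g->entity_id[b] = (uint8_t)(entity_id >> (24 - 8 * b));
    }
}

/* Two entities belong to the same participant iff their prefixes are equal. */
static int guid_same_participant(const Guid *a, const Guid *b)
{
    return memcmp(a->prefix, b->prefix, sizeof a->prefix) == 0;
}
```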

Once a pair of remote participants have discovered each other, they can move on to the Endpoint Discovery phase, which is how DataWriters and DataReaders find each other.

3.6.1.2. Simple Endpoint Discovery

This phase of the Simple Discovery Protocol is performed by the Simple Endpoint Discovery Protocol (SEDP).

During the Endpoint Discovery phase, RTI Connext Micro matches DataWriters and DataReaders. Information (GUID, QoS, etc.) about your application’s DataReaders and DataWriters is exchanged by sending publication/subscription declarations in DATA messages that we will refer to as publication DATAs and subscription DATAs. The Endpoint Discovery phase uses reliable communication.

These declaration or DATA messages are exchanged until each DomainParticipant has a complete database of information about the participants in its peers list and their entities. Then the discovery process is complete and the system switches to a steady state. During steady state, participant DATAs are still sent periodically to maintain the liveliness status of participants. They may also be sent to communicate QoS changes or the deletion of a DomainParticipant.

When a remote DataWriter/DataReader is discovered, Connext Micro determines if the local application has a matching DataReader/DataWriter. A ‘match’ between the local and remote entities occurs only if the DataReader and DataWriter have the same Topic, same data type, and compatible QosPolicies. Furthermore, if the DomainParticipant has been set up to ignore certain DataWriters/DataReaders, those entities will not be considered during the matching process.

This ‘matching’ process occurs as soon as a remote entity is discovered, even if the entire database is not yet complete: that is, the application may still be discovering other remote entities.

A DataReader and DataWriter can only communicate with each other if each one’s application has hooked up its local entity with the matching remote entity. That is, both sides must agree to the connection.
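A simplified sketch of the endpoint matching rule, assuming RELIABILITY is the only QosPolicy compared (real compatibility checks cover more policies). Under the standard request/offered rule, a RELIABLE DataReader requires a RELIABLE DataWriter, while a BEST_EFFORT DataReader accepts either. `EndpointInfo`, `reliability_compatible()`, and `endpoints_match()` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

typedef enum { BEST_EFFORT = 0, RELIABLE = 1 } ReliabilityKind;

/* Hypothetical subset of an endpoint's discovery information. */
typedef struct {
    const char *topic_name;
    const char *type_name;
    ReliabilityKind reliability;
} EndpointInfo;

/* Request/offered rule: the writer must offer at least the level the reader requests. */
static bool reliability_compatible(ReliabilityKind offered, ReliabilityKind requested)
{
    return offered >= requested;
}

/* Same Topic, same data type, and compatible QoS. */
static bool endpoints_match(const EndpointInfo *writer, const EndpointInfo *reader)
{
    assert(writer != NULL && reader != NULL);
    return strcmp(writer->topic_name, reader->topic_name) == 0 &&
           strcmp(writer->type_name, reader->type_name) == 0 &&
           reliability_compatible(writer->reliability, reader->reliability);
}
```

Each application evaluates this check for its own local entity, which is why both sides must match before data flows.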

Please refer to the section on Discovery Implementation in the RTI Connext DDS Core Libraries User’s Manual for more details about the discovery process.

3.6.2. Configuring Participant Discovery Peers

An RTI Connext Micro DomainParticipant must be able to send participant discovery announcement messages so that other DomainParticipants can discover it, and it must receive announcements from other DomainParticipants in order to discover them.

To do so, each DomainParticipant sends its discovery announcements to a set of locators known as its peer list, where each peer is a transport locator at which one or more potential DomainParticipants may be discovered.

3.6.2.1. The Peer Descriptor

Each peer descriptor string in the initial_peers sequence specifies the transport and address of the locator to send to, as well as the indices of the participants to send to. The peer descriptor format is:

< > denotes optional
[ ] denotes range or discrete values, unless enclosed in ''
    which means a literal.

PEER_DESCRIPTOR = <INDEX@>TRANSPORT_NAME://<ADDRESS>

TRANSPORT_NAME = [a-zA-Z_][0-9a-zA-Z_]+

ADDRESS = 0 or more 8bit characters

INDEX = INTEGER | '[' INTEGER ']' | '[' INTEGER-INTEGER ']' | '[' -INTEGER ']'

INTEGER = DEC_INTEGER | HEX_INTEGER

DEC_INTEGER = [0-9]+

HEX_INTEGER = [0x|0X][0-9a-fA-F]+

The TRANSPORT_NAME prefix refers to a registered transport name, such as _udp or _shmem for Connext Micro’s builtin transports, or a custom name for a user-defined transport.

ADDRESS is a string that you must format as the transport requires; for example, Connext Micro’s builtin UDP transport requires an IPv4 address formatted as w.x.y.z, while the SHMEM transport omits the ADDRESS string entirely. For user-created transports, format the ADDRESS as needed.

INDEX selects the participant indices that will be contacted at an address for discovery. A plain INTEGER N sets the highest index, so N@ contacts participant indices 0 through N (that is, N+1 DomainParticipants); [N]@ contacts only index N, and [M-N]@ contacts indices M through N. If INDEX is left blank, Connext Micro uses an implied index value of 4@. For example, _shmem:// parses as 4@_shmem:// and tries to connect to the first 5 DomainParticipants on the same host via the SHMEM transport, while 9@_udp://192.168.0.1 uses the _udp transport to try to connect to the first 10 DomainParticipants at the IP address 192.168.0.1.
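The grammar and INDEX rules can be exercised with a small parser. This is a sketch under stated assumptions: it handles the N@, [N]@, and [M-N]@ forms and the implied 4@ default, rejects the '[' -INTEGER ']' form because its meaning is not described here, and uses the hypothetical names `PeerDescriptor`, `parse_int()`, and `parse_peer()`.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define DEFAULT_MAX_INDEX 4L /* implied "4@" when INDEX is omitted */

/* Hypothetical parsed form of a peer descriptor. */
typedef struct {
    long first_index;   /* lowest participant index to contact */
    long last_index;    /* highest participant index to contact */
    char transport[32]; /* TRANSPORT_NAME, e.g. "_udp" */
    char address[64];   /* ADDRESS; may be empty, e.g. for "_shmem://" */
} PeerDescriptor;

/* Parses DEC_INTEGER or HEX_INTEGER at *s and advances *s past it. */
static bool parse_int(const char **s, long *out)
{
    const char *p = *s;
    int base = 10;
    char *end;

    if (p[0] == '0' && (p[1] == 'x' || p[1] == 'X')) {
        base = 16;
        p += 2;
    }
    if (base == 16 ? !isxdigit((unsigned char)*p) : !isdigit((unsigned char)*p)) {
        return false;
    }
    *out = strtol(p, &end, base);
    *s = end;
    return true;
}

static bool parse_peer(const char *desc, PeerDescriptor *out)
{
    const char *p = desc;
    const char *sep;
    const char *at;
    size_t name_len;

    assert(desc != NULL && out != NULL);
    sep = strstr(desc, "://");
    at = strchr(desc, '@');
    if (sep == NULL) {
        return false;
    }
    out->first_index = 0;
    out->last_index = DEFAULT_MAX_INDEX;

    if (at != NULL && at < sep) {            /* INDEX@ prefix is present */
        if (*p == '[') {                     /* '[' N ']' or '[' M-N ']' */
            ++p;
            if (!parse_int(&p, &out->first_index)) return false;
            out->last_index = out->first_index;
            if (*p == '-') {
                ++p;
                if (!parse_int(&p, &out->last_index)) return false;
            }
            if (*p++ != ']') return false;
        } else if (!parse_int(&p, &out->last_index)) { /* N: indices 0..N */
            return false;
        }
        if (p != at) return false;           /* junk between INDEX and '@' */
        p = at + 1;
    }

    /* TRANSPORT_NAME = [a-zA-Z_][0-9a-zA-Z_]+ */
    name_len = (size_t)(sep - p);
    if (name_len < 2 || name_len >= sizeof out->transport) return false;
    for (size_t i = 0; i < name_len; ++i) {
        unsigned char c = (unsigned char)p[i];
        if (!(isalpha(c) || c == '_' || (i > 0 && isdigit(c)))) return false;
    }
    memcpy(out->transport, p, name_len);
    out->transport[name_len] = '\0';

    if (strlen(sep + 3) >= sizeof out->address) return false;
    strcpy(out->address, sep + 3);
    return out->first_index <= out->last_index;
}
```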

Remember that every DomainParticipant has a participant index that is unique within a DDS domain. The participant index (also referred to as the participant ID), together with the DDS domain ID, is used to calculate the network port on which DataReaders of that participant receive messages. Thus, specifying a participant index, or a range of indices, for a peer locator means announcements are sent only to the ports of those particular DomainParticipants. Because this restricts the number of participant announcements sent to an address where other DomainParticipants exist, consider specifying indices to minimize network bandwidth usage.
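To see how a participant index becomes a port, the sketch below assumes the default well-known port mapping from the OMG DDSI-RTPS specification (PB = 7400, DG = 250, PG = 2, d0 = 0, d1 = 10, d3 = 11). Whether your Connext Micro system uses these defaults is an assumption, and all function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Default well-known-port parameters from the OMG DDSI-RTPS specification. */
#define RTPS_PB 7400u /* port base */
#define RTPS_DG 250u  /* domain ID gain */
#define RTPS_PG 2u    /* participant ID gain */
#define RTPS_D0 0u    /* discovery (metatraffic) multicast offset */
#define RTPS_D1 10u   /* discovery (metatraffic) unicast offset */
#define RTPS_D3 11u   /* user-traffic unicast offset */

static uint32_t checked_port(uint32_t port)
{
    assert(port <= 65535u); /* must fit in a UDP port number */
    return port;
}

/* Port on which a participant receives discovery traffic sent to it directly. */
static uint32_t discovery_unicast_port(uint32_t domain_id, uint32_t participant_index)
{
    return checked_port(RTPS_PB + RTPS_DG * domain_id + RTPS_D1 + RTPS_PG * participant_index);
}

/* Port on which a participant receives user data sent to it directly. */
static uint32_t user_unicast_port(uint32_t domain_id, uint32_t participant_index)
{
    return checked_port(RTPS_PB + RTPS_DG * domain_id + RTPS_D3 + RTPS_PG * participant_index);
}

/* Multicast discovery port: the address is shared, so there is no index term. */
static uint32_t discovery_multicast_port(uint32_t domain_id)
{
    return checked_port(RTPS_PB + RTPS_DG * domain_id + RTPS_D0);
}

/* Ports that announcements for an INDEX range [first, last] are sent to. */
static size_t announcement_ports(uint32_t domain_id, uint32_t first, uint32_t last,
                                 uint32_t *out, size_t max)
{
    size_t n = 0;
    for (uint32_t i = first; i <= last && n < max; ++i) {
        out[n++] = discovery_unicast_port(domain_id, i);
    }
    return n;
}
```

Under these assumptions, [1-4]@_udp://10.10.30.101 in DDS domain 0 sends announcements to ports 7412, 7414, 7416, and 7418 on that host.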

For example:

DDS_StringSeq_set_maximum(&dp_qos.discovery.initial_peers, 5);
DDS_StringSeq_set_length(&dp_qos.discovery.initial_peers, 5);

/* If the index is not specified, it defaults to 4, thus sending to
 * the first 5 participant IDs (0-4) at IP address 192.168.1.1 using
 * the transport registered as _udp.
 */
*DDS_StringSeq_get_reference(&dp_qos.discovery.initial_peers, 0) =
     DDS_String_dup("_udp://192.168.1.1");

/* Only send participant announcements to multicast address 239.255.0.1
 * using the transport registered as _udp. Note that for multicast
 * addresses the index is not relevant since it is a shared address.
 */
*DDS_StringSeq_get_reference(&dp_qos.discovery.initial_peers, 1) =
     DDS_String_dup("_udp://239.255.0.1");

/* Send announcements to participant IDs 1, 2, 3, and 4 on 10.10.30.101
 * using the transport registered as _udp.
 */
*DDS_StringSeq_get_reference(&dp_qos.discovery.initial_peers, 2) =
     DDS_String_dup("[1-4]@_udp://10.10.30.101");

/* Send announcements to participant ID 2 on address 10.10.30.102
 * using the transport registered as _udp.
 */
*DDS_StringSeq_get_reference(&dp_qos.discovery.initial_peers, 3) =
     DDS_String_dup("[2]@_udp://10.10.30.102");

/* Send announcements to participant IDs 0-8 on address 10.10.30.102
 * using the transport registered as _udp.
 */
*DDS_StringSeq_get_reference(&dp_qos.discovery.initial_peers, 4) =
     DDS_String_dup("8@_udp://10.10.30.102");

3.6.3. Configuring Initial Peers and Adding Peers

DiscoveryQosPolicy_initial_peers is the list of peers to which a DomainParticipant sends its participant announcement messages, once it is enabled, as part of the discovery process.

DiscoveryQosPolicy_initial_peers is an empty sequence by default.

Peers can also be added to the list, before and after a DomainParticipant has been enabled, by using DomainParticipant_add_peer.

The DomainParticipant starts sending participant announcement messages to a newly added peer as soon as the DomainParticipant is enabled.

3.6.4. Configuring Discovery Data Reception

To receive discovery data, you must configure the DomainParticipantQos.discovery.enabled_transports sequence. This is a sequence of transport addresses on which to listen for discovery data, and it is sent as part of the participant announcements. Other DomainParticipants send their discovery data to these addresses.

Addresses for data reception use the following format:

< > denotes optional
[ ] denotes range or discrete values, unless enclosed in ''
    which means a literal.

ENABLED_TRANSPORTS = TRANSPORT_NAME://<ADDRESS>

TRANSPORT_NAME = [a-zA-Z_][0-9a-zA-Z_]+

ADDRESS = 0 or more 8bit characters

The TRANSPORT_NAME prefix and ADDRESS are functionally identical to those in The Peer Descriptor.

For example, to receive on a single unicast address:

DDS_StringSeq_set_maximum(&DomainParticipantQos.discovery.enabled_transports, 1);
DDS_StringSeq_set_length(&DomainParticipantQos.discovery.enabled_transports, 1);

/* Receive on the unicast address 192.168.1.1 using the transport registered
 * as _udp.
 */
*DDS_StringSeq_get_reference(&DomainParticipantQos.discovery.enabled_transports, 0) =
                             DDS_String_dup("_udp://192.168.1.1");

To receive on all unicast addresses allowed by the transport:

DDS_StringSeq_set_maximum(&DomainParticipantQos.discovery.enabled_transports, 1);
DDS_StringSeq_set_length(&DomainParticipantQos.discovery.enabled_transports, 1);

/* Receive on all unicast addresses allowed by the transport registered
 * as _udp. This is not recommended if more than 4 network interfaces are
 * allowed as it is non-deterministic which interfaces will be used.
 */
*DDS_StringSeq_get_reference(&DomainParticipantQos.discovery.enabled_transports, 0) =
                             DDS_String_dup("_udp://");

To receive on one unicast address and one multicast address:

DDS_StringSeq_set_maximum(&DomainParticipantQos.discovery.enabled_transports, 2);
DDS_StringSeq_set_length(&DomainParticipantQos.discovery.enabled_transports, 2);

*DDS_StringSeq_get_reference(&DomainParticipantQos.discovery.enabled_transports, 0) =
                             DDS_String_dup("_udp://192.168.1.1");

*DDS_StringSeq_get_reference(&DomainParticipantQos.discovery.enabled_transports, 1) =
                             DDS_String_dup("_udp://239.255.0.1");

To receive on one multicast address:

DDS_StringSeq_set_maximum(&DomainParticipantQos.discovery.enabled_transports, 1);
DDS_StringSeq_set_length(&DomainParticipantQos.discovery.enabled_transports, 1);

*DDS_StringSeq_get_reference(&DomainParticipantQos.discovery.enabled_transports, 0) =
                             DDS_String_dup("_udp://239.255.0.1");

3.6.5. Configuring User Data Reception

To receive user data, you must configure the DomainParticipantQos.user_traffic.enabled_transports sequence. This is a sequence of default transport addresses on which to listen for user data, used unless a DataReader or DataWriter specifies its own addresses. It is sent as part of the participant announcements, and other DomainParticipants send user data to these addresses.

Addresses for data reception use the following format:

< > denotes optional
[ ] denotes range or discrete values, unless enclosed in ''
    which means a literal.

ENABLED_TRANSPORTS = TRANSPORT_NAME://<ADDRESS>

TRANSPORT_NAME = [a-zA-Z_][0-9a-zA-Z_]+

ADDRESS = 0 or more 8bit characters

The TRANSPORT_NAME prefix and ADDRESS are functionally identical to those in The Peer Descriptor.

For example, to receive on a single unicast address:

DDS_StringSeq_set_maximum(&DomainParticipantQos.user_traffic.enabled_transports, 1);
DDS_StringSeq_set_length(&DomainParticipantQos.user_traffic.enabled_transports, 1);

/* Receive on the unicast address 192.168.1.1 using the transport registered
 * as _udp.
 */
*DDS_StringSeq_get_reference(&DomainParticipantQos.user_traffic.enabled_transports, 0) =
                             DDS_String_dup("_udp://192.168.1.1");

To receive on all unicast addresses allowed by the transport:

DDS_StringSeq_set_maximum(&DomainParticipantQos.user_traffic.enabled_transports, 1);
DDS_StringSeq_set_length(&DomainParticipantQos.user_traffic.enabled_transports, 1);

/* Receive on all unicast addresses allowed by the transport registered
 * as _udp. This is not recommended if more than 4 network interfaces are
 * allowed as it is non-deterministic which interfaces will be used.
 */
*DDS_StringSeq_get_reference(&DomainParticipantQos.user_traffic.enabled_transports, 0) =
                             DDS_String_dup("_udp://");

To receive on one unicast address and one multicast address:

DDS_StringSeq_set_maximum(&DomainParticipantQos.user_traffic.enabled_transports, 2);
DDS_StringSeq_set_length(&DomainParticipantQos.user_traffic.enabled_transports, 2);

*DDS_StringSeq_get_reference(&DomainParticipantQos.user_traffic.enabled_transports, 0) =
                             DDS_String_dup("_udp://192.168.1.1");

*DDS_StringSeq_get_reference(&DomainParticipantQos.user_traffic.enabled_transports, 1) =
                             DDS_String_dup("_udp://239.255.0.1");

Note

When both multicast and unicast are specified, the following rules apply:

  • New data is sent over multicast.

  • Retransmissions are sent over unicast.

To receive on one multicast address:

DDS_StringSeq_set_maximum(&DomainParticipantQos.user_traffic.enabled_transports, 1);
DDS_StringSeq_set_length(&DomainParticipantQos.user_traffic.enabled_transports, 1);

*DDS_StringSeq_get_reference(&DomainParticipantQos.user_traffic.enabled_transports, 0) =
                             DDS_String_dup("_udp://239.255.0.1");
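The multicast/unicast rules in the note above can be sketched as a per-reader destination choice. This assumes IPv4 dotted-quad addresses and treats 224.0.0.0 through 239.255.255.255 as multicast; `ipv4_is_multicast()`, `choose_destination()`, and `SendKind` are hypothetical names, not Connext Micro APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Returns true if the dotted IPv4 string lies in 224.0.0.0/4 (multicast). */
static bool ipv4_is_multicast(const char *addr)
{
    unsigned a, b, c, d;
    assert(addr != NULL);
    if (sscanf(addr, "%u.%u.%u.%u", &a, &b, &c, &d) != 4 || a > 255) {
        return false;
    }
    return a >= 224 && a <= 239;
}

typedef enum { SEND_NEW_DATA, SEND_RETRANSMISSION } SendKind;

/* Hypothetical destination choice for one remote DataReader that announced a
 * unicast address and, optionally, a multicast address:
 * new data goes over multicast, retransmissions go over unicast. */
static const char *choose_destination(SendKind kind, const char *unicast, const char *multicast)
{
    if (kind == SEND_NEW_DATA && multicast != NULL && ipv4_is_multicast(multicast)) {
        return multicast;
    }
    return unicast;
}
```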

3.6.6. Configuring User Data Reception per DataReader or DataWriter

A DataReader and DataWriter can specify their own addresses in the DataReaderQos.transport.enabled_transports and DataWriterQos.transport.enabled_transports policies. The address format is exactly the same as for DomainParticipantQos.user_traffic.enabled_transports, with the restriction that a DataWriter can only specify its own unicast addresses.

3.6.7. Discovery Plugins

When a DomainParticipant receives a participant discovery message from another DomainParticipant, it begins exchanging information about user-created DataWriter and DataReader endpoints.

RTI Connext Micro provides two ways of determining the endpoint information of other DomainParticipants: the Dynamic Discovery Plugin and the Static Discovery Plugin.

3.6.7.1. Dynamic Discovery Plugin

Dynamic endpoint discovery uses builtin discovery DataWriters and DataReaders to exchange messages about user-created DataWriters and DataReaders. A DomainParticipant using dynamic participant, dynamic endpoint (DPDE) discovery has a pair of builtin DataWriters for sending messages about its own user-created DataWriters and DataReaders, and a pair of builtin DataReaders for receiving messages from other DomainParticipants about their user-created DataWriters and DataReaders.

Given a DomainParticipant with a user DataWriter, receiving an endpoint discovery message for a user DataReader allows the DomainParticipant to get the type, topic, and QoS of the DataReader that determine whether the DataReader is a match. When a matching DataReader is discovered, the DataWriter will include that DataReader and its locators as destinations for its subsequent writes.

Note

RTI Connext uses the acronyms SPDP and SEDP to distinguish between the two phases of Simple Discovery: participant and endpoint phases (see Discovery in the Core Libraries User’s Manual). RTI Connext Micro uses the acronyms DPSE and DPDE to distinguish between the static and dynamic endpoint discovery plugins available in RTI Connext Micro. The DPSE plugin implements the SPDP protocol and DPDE implements the SPDP and SEDP protocols.

3.6.7.2. Static Discovery Plugin

Static endpoint discovery uses function calls to statically assert information about remote endpoints belonging to remote DomainParticipants. An application with a DomainParticipant using dynamic participant, static endpoint (DPSE) discovery has control over which endpoints belonging to particular remote DomainParticipants are discoverable.

Whereas dynamic endpoint discovery can establish matches for all endpoint-discovery messages it receives, static endpoint discovery establishes matches only for the endpoints that have been asserted programmatically.

With DPSE, a user needs to know a priori the configuration of the entities that will need to be discovered by its application. The user must know the names of all DomainParticipants within the DDS domain and the exact QoS of the remote DataWriters and DataReaders.

Please refer to the C API Reference and C++ API Reference for the following remote entity assertion APIs:

3.6.7.2.1. Remote Participant Assertion

Given a local DomainParticipant, static discovery first requires asserting the names of the remote DomainParticipants whose endpoints are to be matched. This is done by calling DPSE_RemoteParticipant_assert with the name of a remote DomainParticipant. The name must match the name contained in the participant discovery announcement produced by that DomainParticipant. This must be done reciprocally so that two DomainParticipants can discover one another.

For example, suppose one DomainParticipant has the entity name “participant_1” and another has the name “participant_2”. participant_1 must call DPSE_RemoteParticipant_assert() with the name “participant_2” in order to discover participant_2. Similarly, participant_2 must assert participant_1 for discovery between the two to succeed.

/* participant_1 is asserting (remote) participant_2 */
retcode = DPSE_RemoteParticipant_assert(participant_1,
                                        "participant_2");
if (retcode != DDS_RETCODE_OK) {
    printf("participant_1 failed to assert participant_2\n");
    goto done;
}
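The reciprocity requirement can be modeled as two tables of asserted names. In this sketch, `DpseParticipant`, `assert_remote()`, `accepts_announcement()`, and `mutually_discovered()` are hypothetical; only the rule that each side must assert the other comes from the text above.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_REMOTE 4

/* Hypothetical DPSE participant: accepts announcements only from asserted names. */
typedef struct {
    const char *name;
    const char *asserted[MAX_REMOTE];
    int n_asserted;
} DpseParticipant;

static bool assert_remote(DpseParticipant *p, const char *remote_name)
{
    assert(p != NULL && remote_name != NULL);
    if (p->n_asserted == MAX_REMOTE) {
        return false;
    }
    p->asserted[p->n_asserted++] = remote_name;
    return true;
}

/* Called when a participant announcement carrying remote_name arrives. */
static bool accepts_announcement(const DpseParticipant *p, const char *remote_name)
{
    for (int i = 0; i < p->n_asserted; ++i) {
        if (strcmp(p->asserted[i], remote_name) == 0) {
            return true;
        }
    }
    return false;
}

/* Discovery succeeds only if each side accepts the other's announcements. */
static bool mutually_discovered(const DpseParticipant *a, const DpseParticipant *b)
{
    return accepts_announcement(a, b->name) && accepts_announcement(b, a->name);
}
```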

3.6.7.2.2. Remote Publication and Subscription Assertion

Next, a DomainParticipant must assert the remote endpoints it wants to match; each must belong to an already-asserted remote DomainParticipant. The endpoint assertion functions take an argument that contains all the QoS and configuration of the remote endpoint. Whereas DPDE obtains remote endpoint QoS from received endpoint-discovery messages, DPSE requires the remote endpoint’s QoS to be configured locally. With the remote endpoints asserted, the DomainParticipant waits until it receives a participant discovery announcement from an asserted remote DomainParticipant. Once it receives that announcement, all endpoints asserted for that remote DomainParticipant are considered discovered and ready to be matched with local endpoints.

Assume participant_1 contains a DataWriter, and participant_2 has a DataReader, both communicating on topic HelloWorld. participant_1 needs to assert the DataReader in participant_2 as a remote subscription. The remote subscription data passed to the operation must match exactly the QoS actually used by the remote DataReader:

/* Set participant_2's reader's QoS in remote subscription data  */
rem_subscription_data.key.value[DDS_BUILTIN_TOPIC_KEY_OBJECT_ID] = 200;
rem_subscription_data.topic_name = DDS_String_dup("Example HelloWorld");
rem_subscription_data.type_name = DDS_String_dup("HelloWorld");
rem_subscription_data.reliability.kind = DDS_RELIABLE_RELIABILITY_QOS;

/* Assert reader as a remote subscription belonging to (remote) participant_2 */
retcode = DPSE_RemoteSubscription_assert(participant_1,
                                         "participant_2",
                                         &rem_subscription_data,
                                         HelloWorld_get_key_kind(HelloWorldTypePlugin_get(), NULL));
if (retcode != DDS_RETCODE_OK)
{
    printf("failed to assert remote subscription\n");
    goto done;
}

Reciprocally, participant_2 must assert participant_1’s DataWriter as a remote publication, also specifying matching QoS parameters:

/* Set participant_1's writer's QoS in remote publication data  */
rem_publication_data.key.value[DDS_BUILTIN_TOPIC_KEY_OBJECT_ID] = 100;
rem_publication_data.topic_name = DDS_String_dup("Example HelloWorld");
rem_publication_data.type_name = DDS_String_dup("HelloWorld");
rem_publication_data.reliability.kind = DDS_RELIABLE_RELIABILITY_QOS;

/* Assert writer as a remote publication belonging to (remote) participant_1 */
retcode = DPSE_RemotePublication_assert(participant_2,
                                        "participant_1",
                                        &rem_publication_data,
                                        HelloWorld_get_key_kind(HelloWorldTypePlugin_get(), NULL));
if (retcode != DDS_RETCODE_OK)
{
    printf("failed to assert remote publication\n");
    goto done;
}

When participant_1 receives a participant discovery message from participant_2, it is aware of participant_2, based on its previous assertion, and it knows participant_2 has a matching DataReader, also based on the previous assertion of the remote endpoint. It therefore establishes a match between its DataWriter and participant_2’s DataReader. Likewise, participant_2 will match participant_1’s DataWriter with its local DataReader, upon receiving one of participant_1’s participant discovery messages.
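The sequence above, in which asserted endpoints become discovered all at once when their owner’s participant announcement arrives, can be sketched as follows. The table layout and the names `AssertedEndpoint`, `table_assert()`, `table_on_announcement()`, and `table_count_matches()` are hypothetical, and this sketch compares only topic and type names, omitting QoS.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_ENDPOINTS 8

/* Hypothetical record of a remote endpoint asserted with DPSE. */
typedef struct {
    const char *participant_name; /* owning remote participant */
    const char *topic_name;
    const char *type_name;
    bool discovered;              /* set when the owner's announcement arrives */
} AssertedEndpoint;

typedef struct {
    AssertedEndpoint endpoints[MAX_ENDPOINTS];
    int count;
} DpseEndpointTable;

static bool table_assert(DpseEndpointTable *t, const char *participant,
                         const char *topic, const char *type)
{
    assert(t != NULL);
    if (t->count == MAX_ENDPOINTS) {
        return false;
    }
    t->endpoints[t->count++] = (AssertedEndpoint){participant, topic, type, false};
    return true;
}

/* On a participant announcement, every endpoint asserted for that participant
 * becomes discovered; no endpoint discovery messages are involved.
 * Returns the number of newly discovered endpoints. */
static int table_on_announcement(DpseEndpointTable *t, const char *participant)
{
    int n = 0;
    for (int i = 0; i < t->count; ++i) {
        AssertedEndpoint *e = &t->endpoints[i];
        if (!e->discovered && strcmp(e->participant_name, participant) == 0) {
            e->discovered = true;
            ++n;
        }
    }
    return n;
}

/* Counts discovered remote endpoints that a local endpoint on topic/type would match. */
static int table_count_matches(const DpseEndpointTable *t, const char *topic, const char *type)
{
    int n = 0;
    for (int i = 0; i < t->count; ++i) {
        const AssertedEndpoint *e = &t->endpoints[i];
        if (e->discovered && strcmp(e->topic_name, topic) == 0 &&
            strcmp(e->type_name, type) == 0) {
            ++n;
        }
    }
    return n;
}
```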

Note that with DPSE there is no runtime check of QoS consistency between DataWriters and DataReaders, because no endpoint discovery messages are exchanged. It is therefore essential that the QoS configured for a local DataWriter or DataReader is exactly the QoS that other DomainParticipants use when asserting it as a remote DataWriter or DataReader.

3.6.8. Asymmetric Matching and Lost Samples

The DDS discovery process is necessary to establish communication between a DataWriter and a DataReader. However, it is important to understand that DDS applications do not connect to each other; there is no handshake protocol to ensure that a DataReader is ready to receive data from a DataWriter. Thus, it is possible that a DataWriter matches a DataReader before the DataReader matches the DataWriter (and vice versa). For this reason, it is possible that data published by a DataWriter is not received by the DataReader, even on a local network.

This asymmetric behavior can have any number of causes, including but not limited to:

  • Network delays

  • Packets taking different paths through the network

  • Address resolution delays

  • OS scheduling

DDS offers some solutions to mitigate this problem, e.g., the DURABILITY QoS policy, but in other cases it may be necessary for applications to implement their own synchronization protocols.
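A toy model can make the lost-sample window concrete. It is a deliberate simplification, not a description of Connext Micro internals: ticks are abstract, samples written before both sides have matched are lost unless the writer still holds them, and `history_depth` is an assumed stand-in for a TRANSIENT_LOCAL-style DURABILITY setting with reliable delivery and a keep-last history. `samples_received()` is a hypothetical name.

```c
#include <assert.h>

/* Toy model: the writer publishes sample i at tick i (i = 1..n). A sample is
 * delivered live only if it is written at or after the tick at which both sides
 * have matched. history_depth models a writer that keeps its most recent samples
 * and resends them once mutual matching happens (0 = nothing is resent). */
static int samples_received(int n, int writer_match_tick, int reader_match_tick,
                            int history_depth)
{
    int mutual_tick = writer_match_tick > reader_match_tick ? writer_match_tick
                                                            : reader_match_tick;
    int missed = 0;
    int live = 0;

    assert(n >= 0 && history_depth >= 0);
    for (int i = 1; i <= n; ++i) {
        if (i >= mutual_tick) {
            ++live;
        } else {
            ++missed;
        }
    }
    /* Only the most recent history_depth missed samples are still kept. */
    return live + (missed < history_depth ? missed : history_depth);
}
```

With five samples and mutual matching at tick 3, the model delivers 3 samples with no kept history, 4 with a depth of 1, and all 5 with a depth of 5; the order in which the two sides match does not matter.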

3.6.9. DomainParticipant Discovery by Name

If a DomainParticipant is restarted after an ungraceful shutdown, other DomainParticipants will not remove the DomainParticipant’s discovery information until its lease_duration has expired. Typically, you can reduce the lease_duration or increase the resource limits to avoid running out of resources to discover restarted DomainParticipants. However, these may not be feasible solutions, since a shorter lease_duration increases network load and larger resource limits increase memory usage.

To mitigate these issues, Connext Micro can be configured to discover DomainParticipants by name instead of by GUID. This speeds up the discovery process because the restarted DomainParticipant can immediately replace the old DomainParticipant without waiting for the old one’s lease_duration to expire.

To enable discovery by name for a DomainParticipant, do the following:

  1. Set a globally unique participant_name in the DomainParticipant’s DomainParticipantQos.

  2. Set the enable_participant_discovery_by_name property to DDS_BOOLEAN_TRUE in the DomainParticipant’s DISCOVERY QosPolicy.

The DomainParticipant can now be rediscovered smoothly by other remote DomainParticipants after a restart.
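The effect of discovery by name on a remote DomainParticipant’s resources can be modeled with a small table. This sketch is hypothetical: `RemoteDb`, `db_on_announcement()`, and the 64-bit stand-in for a GUID are illustrative, and `by_name` only mimics the enable_participant_discovery_by_name setting described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MAX_REMOTE_PARTICIPANTS 2

typedef struct {
    uint64_t guid;    /* simplified stand-in for a 16-byte GUID */
    const char *name; /* participant_name */
    bool in_use;
} RemoteEntry;

typedef struct {
    RemoteEntry slots[MAX_REMOTE_PARTICIPANTS];
    bool by_name;     /* models enable_participant_discovery_by_name */
} RemoteDb;

/* Handles a participant announcement; returns false if no resources are left. */
static bool db_on_announcement(RemoteDb *db, uint64_t guid, const char *name)
{
    RemoteEntry *free_slot = NULL;

    assert(db != NULL && name != NULL);
    for (int i = 0; i < MAX_REMOTE_PARTICIPANTS; ++i) {
        RemoteEntry *e = &db->slots[i];
        if (!e->in_use) {
            if (free_slot == NULL) {
                free_slot = e;
            }
            continue;
        }
        if (e->guid == guid) {
            return true; /* known participant: liveliness refresh */
        }
        if (db->by_name && strcmp(e->name, name) == 0) {
            e->guid = guid; /* restarted participant replaces its stale entry */
            return true;
        }
    }
    if (free_slot == NULL) {
        return false; /* stale entry still holds a slot until its lease expires */
    }
    *free_slot = (RemoteEntry){guid, name, true};
    return true;
}
```

In this model, a restarted "sensor" with a new GUID reuses its old slot when `by_name` is set, but is rejected when it is not. An announcement carrying an already-known GUID is treated as a refresh, which illustrates why the restarted DomainParticipant needs a different GUID.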

Attention

When using discovery by name, a DomainParticipant must be restarted with a different GUID from its previous GUID.

Attention

To avoid undefined behavior while using discovery by name, make sure that each DomainParticipant in a given domain has a unique participant_name.

3.6.10. Queueing Discovery Messages

During a system restart or partial restart, the discovery process also restarts as DomainParticipants come back up. During this period, it is possible that each DomainParticipant is in a different phase. For example:

  • Some DomainParticipants may have terminated gracefully, but have not started up again.

  • Some DomainParticipants may have been abruptly taken off-line; e.g., during a power cycle.

  • Some DomainParticipants may have already restarted.

  • Some DomainParticipants may not have restarted yet, or will not restart.

During this period it is possible for a DomainParticipant to temporarily exceed its resource limits. For example, a DomainParticipant that was not restarted may store discovery information for a DomainParticipant that was ungracefully shut down because its lease duration has not expired, and also discover the restarted DomainParticipant.

By default, if Connext Micro receives an endpoint discovery message but lacks sufficient resources to store the endpoint, it acknowledges the message as received but discards the discovery information. This can lead to a mismatch in discovery states between DomainParticipants. However, as the leases of stale DomainParticipant entries expire, resources become available to discover new DomainParticipants.

To mitigate this temporary lack of resources, you can set the DomainParticipantQos.discovery.enable_endpoint_discovery_queue property to DDS_BOOLEAN_TRUE. Endpoint discovery messages are then queued for later processing when resources become available, instead of being discarded.

Attention

This setting assumes that there are sufficient resources available for discovery information and that the lack of resources is temporary (such as during a system restart). Setting this value to DDS_BOOLEAN_TRUE without sufficient resources may cause undefined behavior.

3.6.11. Restarting Discovery

DDS discovery can be configured so that no action is required from the user application once initial discovery is complete; this is the typical use case. However, there may be cases where the user would like to restart discovery due to some runtime event. Connext Micro includes two functions to restart DomainParticipant discovery:

  1. DDS_DomainParticipant_remove_discovered_participants()

  2. DDS_DomainParticipant_announce()

To illustrate when a user might benefit from such functionality, this document will explore a system that suspends and resumes CPUs as an example. Because this is a known use case for this feature, we refer to these functions as “suspend-resume” APIs.

For more information regarding the use of these APIs, go to How to restart discovery.

3.6.11.1. How CPU suspension affects DDS discovery

Some systems benefit from suspending CPUs when they enter a dormant state, usually to save power. However, applications running on the suspended CPU may be maintaining a state that is time-dependent, and also may not be aware that a suspend or resume operation occurred. This can create a divergence in the system view between applications on the recently-resumed CPU and applications on other CPUs that have been running without pause.

This situation can impact DDS applications because the mechanisms implementing DDS discovery are time-dependent. If a local DomainParticipant does not announce itself in a timely manner, remote DomainParticipants will assume that the local DomainParticipant is no longer present (or “alive”) in the domain.

The DDS standard specifies functionality to recover from lapses in communication; namely, the participant_liveliness_lease_duration and participant_liveliness_assert_period QoS policies. These can be configured to limit the amount of time before a DomainParticipant can be rediscovered after being registered as “dead” by remote peers.

However, you may want to tune these parameters for steady-state use cases and still have a way to bring the system back to a known state after suspending and resuming a CPU.

When a CPU is suspended and then later resumed, any DDS applications located on that CPU could face two discovery-related issues:

  1. The discovery database of the local DomainParticipant may be out of sync with the remote DomainParticipants that are actually present in the domain.

    While the CPU was suspended, some remote DomainParticipants may have shut down and others may have started up. But the newly-resumed local DomainParticipant still has a record of the remote DomainParticipants that were present at the time of CPU suspension. This local DomainParticipant may then waste time and network bandwidth by trying to communicate with remote peers that are no longer present.

  2. If the CPU was suspended for longer than the participant_liveliness_lease_duration of a local DomainParticipant on that CPU, that DomainParticipant could not announce itself to remote DomainParticipants before its lease expired. From a remote peer’s perspective, the local DomainParticipant has therefore disappeared from the domain.

    Once resumed, a local DDS application may continue publishing or subscribing to topics. However, it is unaware that all remote endpoints have been unmatched from its own endpoints because its DomainParticipant liveliness was lost.
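To make issue 2 concrete, the following is a minimal model. It assumes that a remote peer simply compares the time since it last received a DATA(p) against the announced lease duration. The type and function names are hypothetical and are not part of the Connext Micro API. For example, with a 10 s lease, a CPU that is suspended for 60 s right after announcing will be declared gone by remote peers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of how a remote peer tracks a DomainParticipant's
 * liveliness. Times are in milliseconds. Illustrative names only; these
 * are not Connext Micro types or functions. */
typedef struct {
    uint64_t last_heard_ms; /* time the last DATA(p) was received */
    uint64_t lease_ms;      /* announced participant_liveliness_lease_duration */
} RemotePeerRecord;

/* Called by the remote peer whenever a DATA(p) arrives from the participant. */
static void peer_record_announcement(RemotePeerRecord *rec, uint64_t now_ms)
{
    rec->last_heard_ms = now_ms;
}

/* True once the lease has expired. At that point the remote peer treats the
 * participant as gone and unmatches all of its endpoints. A suspended CPU
 * sends no DATA(p), so a long enough suspension always ends up here. */
static bool peer_lease_expired(const RemotePeerRecord *rec, uint64_t now_ms)
{
    return (now_ms - rec->last_heard_ms) > rec->lease_ms;
}
```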

3.6.11.2. How to restart discovery

To restart discovery from the perspective of the local DomainParticipant, call the suspend-resume API functions in the following order:

  1. DDS_DomainParticipant_remove_discovered_participants()

    Calling this function on the local DomainParticipant removes all currently-discovered remote DomainParticipants from the discovery database. If a removed remote DomainParticipant contained endpoints matched with local endpoints, the local endpoints are unmatched from the remote DomainParticipant endpoints.

  2. DDS_DomainParticipant_announce()

    Calling this function on the local DomainParticipant causes a DomainParticipant announcement (DATA(p)) to be sent. The DATA(p) is sent to each peer in the initial_peers list and to all currently-discovered DomainParticipants.

These two functions can be used to restart the discovery process as shown below:


Figure 3.4 Post-resume API usage
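The two-step sequence can be wrapped in a small helper so that the order is enforced in one place. This is a minimal sketch under stated assumptions. The DiscoveryOps table, restart_discovery(), and the stub functions are hypothetical. Each table entry is assumed to wrap the corresponding Connext Micro call and to return 0 on success. Check the Connext Micro API reference for the exact signatures and return codes of the real functions.

```c
/* Hypothetical signature for a participant-level discovery operation.
 * Assumed convention: 0 on success, non-zero on failure. */
typedef int (*participant_op_fn)(void *participant);

typedef struct {
    participant_op_fn remove_discovered; /* would wrap DDS_DomainParticipant_remove_discovered_participants() */
    participant_op_fn announce;          /* would wrap DDS_DomainParticipant_announce() */
} DiscoveryOps;

/* Post-resume restart: purge the discovery database first, then announce.
 * Stops at the first failure and returns its code. */
static int restart_discovery(const DiscoveryOps *ops, void *participant)
{
    int rc = ops->remove_discovered(participant);
    if (rc != 0) {
        return rc;
    }
    return ops->announce(participant);
}

/* Placeholder implementations that only record the call order, so the
 * sequence can be exercised without a Connext Micro installation. */
static char call_log[8];
static int call_count = 0;

static int stub_remove(void *participant)
{
    (void)participant;
    call_log[call_count++] = 'R';
    return 0;
}

static int stub_announce(void *participant)
{
    (void)participant;
    call_log[call_count++] = 'A';
    return 0;
}
```

Keeping both calls in one helper makes it harder to announce before purging. Announcing first would also send the DATA(p) to remote participants that may no longer exist, because the announcement goes to every currently-discovered DomainParticipant.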

The APIs are typically called in the order described above, but they are independent, and either API can be used without the other. If you call only DDS_DomainParticipant_remove_discovered_participants(), the discovery database is reset and participant announcements are sent according to the local participant’s QoS configuration. If you call only DDS_DomainParticipant_announce(), a participant announcement is sent immediately to the currently known peers, as shown in the diagram below:


Figure 3.5 Post-resume announcement without database purge
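The difference between the two usages comes down to who receives the DATA(p). Per the description above, an announcement goes to every peer in the initial_peers list and to every currently-discovered DomainParticipant. The following is a deliberately simplified model of that rule. ParticipantModel, model_remove_discovered(), and model_announce() are hypothetical and do not reflect Connext Micro's internal data structures. Peers are plain strings, and the model does not deduplicate destinations.

```c
#define MAX_PEERS 8

/* Hypothetical, simplified view of a local participant, used only to show
 * which destinations an announcement reaches. */
typedef struct {
    const char *initial_peers[MAX_PEERS]; /* configured initial_peers */
    int num_initial;
    const char *discovered[MAX_PEERS];    /* currently-discovered remote participants */
    int num_discovered;
} ParticipantModel;

/* Models the effect of remove_discovered_participants(): forget every
 * discovered remote participant. initial_peers is left untouched. */
static void model_remove_discovered(ParticipantModel *p)
{
    p->num_discovered = 0;
}

/* Models the destinations of announce(): initial_peers followed by the
 * currently-discovered participants. Returns how many entries were written
 * to out (at most max_out). */
static int model_announce(const ParticipantModel *p, const char *out[], int max_out)
{
    int n = 0;
    for (int i = 0; i < p->num_initial && n < max_out; i++) {
        out[n++] = p->initial_peers[i];
    }
    for (int i = 0; i < p->num_discovered && n < max_out; i++) {
        out[n++] = p->discovered[i];
    }
    return n;
}
```

For example, with one initial peer and two discovered participants, announcing alone reaches three destinations. Purging first leaves only the initial peer.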

3.6.11.3. Usage considerations

It is up to your application to determine how and when to use these APIs.

In the CPU example above, your application needs a mechanism that informs the DDS application, once the CPU resumes, that it has just been through a suspend-resume cycle. This could be a message sent over a UDP socket, a POSIX signal (such as SIGUSR1) that is sent and handled, or a hardware interrupt. The DDS application can then call the suspend-resume APIs to help return the system to its pre-suspension discovery state.
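As one possible mechanism, the sketch below uses the POSIX SIGUSR1 option mentioned above. It assumes that some external component, such as a power-management daemon, sends SIGUSR1 to the process after the CPU resumes. The handler only sets a flag, and the application's main loop does the real work. This split is used because only async-signal-safe operations should run inside a signal handler, and the DDS calls are not assumed to be safe there. The names on_resume_signal(), install_resume_handler(), and consume_resume_event() are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Set from the signal handler. Only writes to a volatile sig_atomic_t are
 * safe inside the handler. */
static volatile sig_atomic_t resume_pending = 0;

static void on_resume_signal(int signo)
{
    (void)signo;
    resume_pending = 1; /* defer all DDS work to the main loop */
}

/* Install the SIGUSR1 handler. Returns 0 on success, -1 on failure. */
static int install_resume_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_resume_signal;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGUSR1, &sa, NULL);
}

/* Poll from the application's main loop. Returns 1 if a resume was
 * signaled since the last call, otherwise 0. Several signals arriving
 * before the next poll are coalesced into one event, which is fine here
 * because one discovery restart covers them all. When this returns 1, the
 * application would call DDS_DomainParticipant_remove_discovered_participants()
 * and then DDS_DomainParticipant_announce() on its participant. */
static int consume_resume_event(void)
{
    if (resume_pending) {
        resume_pending = 0;
        return 1;
    }
    return 0;
}
```

From a shell, a resume notification could then be simulated with `kill -USR1 <pid>`.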

3.6.12. Interoperating with Connext Professional Discovery

When establishing communication between a Connext Micro application that uses the Dynamic Participant Static Endpoint (DPSE) discovery module and an RTI product based on Connext Professional, every participant in the DDS system must be configured with a unique participant name. The static discovery functionality in Connext Professional allows participants on different hosts to share the same name. Connext Micro, however, requires every participant to have a different name, which keeps the complexity of its implementation suitable for smaller targets.

Also, Connext DataWriters that are configured to send compressed data will not match with Connext Micro DataReaders, since Connext Micro does not support sending or receiving compressed data. See DATA_REPRESENTATION QosPolicy in the Core Libraries User’s Manual for more information on the Connext compression feature.