RTI Connext
Core Libraries and Utilities
User’s Manual
Part 3 — Advanced Concepts
Chapters
Version 5.0
© 2012
All rights reserved.
Printed in U.S.A. First printing.
August 2012.
Trademarks
Copy and Use Restrictions
No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form (including electronic, mechanical, photocopy, and facsimile) without the prior written permission of Real-Time Innovations, Inc. The software described in this document is furnished under and subject to the RTI software license agreement. The software may be used or copied only under the terms of the license agreement.
Note: In this section, "the Software" refers to
This product implements the DCPS layer of the Data Distribution Service (DDS) specification version 1.2 and the DDS Interoperability Wire Protocol specification version 2.1, both of which are owned by the Object Management Group, Inc.
Portions of this product were developed using ANTLR (www.ANTLR.org). This product includes software developed by the University of California, Berkeley and its contributors.
Portions of this product were developed using AspectJ, which is distributed per the CPL license. AspectJ source code may be obtained from Eclipse. This product includes software developed by the University of California, Berkeley and its contributors.
Portions of this product were developed using MD5 from Aladdin Enterprises.
Portions of this product include software derived from Fnmatch, (c) 1989, 1993, 1994 The Regents of the University of California. All rights reserved. The Regents and contributors provide this software "as is" without warranty.
Portions of this product were developed using EXPAT from Thai Open Source Software Center Ltd and Clark Cooper Copyright (c) 1998, 1999, 2000 Thai Open Source Software Center Ltd and Clark Cooper Copyright (c) 2001, 2002 Expat maintainers. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
Technical Support
232 E. Java Drive
Sunnyvale, CA 94089
Phone: (408) …
Email: support@rti.com
Website: …
Contents, Part 3
10.3.4 Controlling Heartbeats and Retries with DataWriterProtocol QosPolicy
10.3.5 Avoiding Message Storms with DataReaderProtocol QosPolicy
10.3.6 Resending Samples to …
14.3.1.2 Maintaining DataWriter Liveliness for kinds AUTOMATIC and …
14.5.3 Automatic Selection of participant_id and Port Reservation
Setting Builtin Transport Properties of the Default Transport Instance
Setting Builtin Transport Properties with the PropertyQosPolicy
15.6.2 Setting the Maximum …
15.6.3 Formatting Rules for IPv6 ‘Allow’ and ‘Deny’ Address Lists
Installing Additional Builtin Transport Plugins with register_transport()
Installing Additional Builtin Transport Plugins with PropertyQosPolicy
20.1.4 …
20.2.1 Memory Management for DataReaders Using Generated …
20.2.6 …
Chapter 10 Reliable Communications
Connext uses
This chapter includes the following sections:
❏Sending Data Reliably (Section 10.1)
❏Overview of the Reliable Protocol (Section 10.2)
❏Using QosPolicies to Tune the Reliable Protocol (Section 10.3)
10.1 Sending Data Reliably
The DCPS reliability model recognizes that the optimal balance between determinism and reliability varies from application to application. The QosPolicies provide a way to customize the determinism/reliability trade-off to match each application’s needs.
There are two delivery models:
❏Best-effort delivery model “Send the latest sample as quickly as possible; do not spend extra effort recovering lost samples.”
❏Reliable delivery model “Make sure all samples get there, in order.”
10.1.1 Best-effort Delivery Model
By default, Connext uses the best-effort delivery model: samples are sent once and are not resent if lost. The best-effort model is appropriate for frequently refreshed data, where a lost sample is quickly superseded by a newer one.
10.1.2 Reliable Delivery Model
Reliable delivery means the samples are guaranteed to arrive, in the order published.
The DataWriter maintains a send queue with space to hold the last X samples sent. Similarly, a DataReader maintains a receive queue with space for X consecutive expected samples.
The send and receive queues are used to temporarily cache samples until Connext is sure the samples have been delivered and are not needed anymore. Connext removes samples from a publication’s send queue after the sample has been acknowledged by all reliable subscriptions. When positive acknowledgements are disabled (see DATA_WRITER_PROTOCOL QosPolicy (DDS Extension) (Section 6.5.3) and DATA_READER_PROTOCOL QosPolicy (DDS Extension) (Section 7.6.1)), samples are removed from the send queue after the corresponding keep-duration has elapsed (see Table 6.36, “DDS_RtpsReliableWriterProtocol_t”).
If an
DataReader.
DataWriters can be set up to wait for available queue space when sending samples. This will cause the sending thread to block until there is space in the send queue. (Or, you can decide to sacrifice sending samples reliably so that the sending rate is not compromised.) If the DataWriter is set up to ignore the full queue and sends anyway, then older cached samples will be pushed out of the queue before all DataReaders have received them. In this case, the DataReader (or its Subscriber) is notified of the missing samples through its Listener and/or Conditions.
Connext automatically sends acknowledgments (ACKNACKs) as necessary to maintain reliable communications. The DataWriter may choose to block for a specified duration to wait for these acknowledgments (see Waiting for Acknowledgments in a DataWriter (Section 6.3.11)).
Connext establishes a virtual reliable channel between the matching DataWriter and all DataReaders. This mechanism isolates DataReaders from each other, allows the application to control memory usage, and provides mechanisms for the DataWriter to balance reliability and determinism. Moreover, the use of send and receive queues allows Connext to be implemented efficiently without introducing unnecessary delays in the stream.
Note that a successful return code (DDS_RETCODE_OK) from write() does not necessarily mean that all DataReaders have received the data. It only means that the sample has been added to the DataWriter’s queue. To see if all DataReaders have received the data, look at the RELIABLE_WRITER_CACHE_CHANGED Status (DDS Extension) (Section 6.3.6.7) to see if any samples are unacknowledged.
Suppose DataWriter A reliably publishes a Topic to which DataReaders B and C reliably subscribe. B has space in its queue, but C does not. Will DataWriter A be notified? Will DataReader C receive any error messages or callbacks? The exact behavior depends on the QoS settings:
❏If HISTORY_KEEP_ALL is specified for C, C will reject samples that cannot be put into the queue and request A to resend missing samples. The Listener is notified with the on_sample_rejected() callback (see SAMPLE_REJECTED Status (Section 7.3.7.8)). If A has a queue large enough, or A is no longer writing new samples, A won’t notice unless it checks the RELIABLE_WRITER_CACHE_CHANGED Status (DDS Extension) (Section 6.3.6.7).
❏If HISTORY_KEEP_LAST is specified for C, C will drop old samples and accept new ones. The Listener is notified with the on_sample_lost() callback (see SAMPLE_LOST Status (Section 7.3.7.7)). To A, it is as if all samples have been received by C (that is, they have all been acknowledged).
10.2 Overview of the Reliable Protocol
An important advantage of Connext is that it can offer the reliability and other QoS guarantees mandated by DDS on top of a very wide variety of transports, including
In order to work in this wide range of environments, the reliable protocol defined by RTPS is highly configurable with a set of parameters that let the application
The most important features of the RTPS protocol are:
❏Support for both push and pull operating modes
❏Support for both positive and negative acknowledgments
❏Support for high
❏Support for multicast DataReaders
❏Support for
In order to support these features, RTPS uses several types of messages: Data messages (DATA), acknowledgments (ACKNACKs), and heartbeats (HBs).
❏DATA messages contain snapshots of the value of data-objects; each snapshot is tagged with a sequence number that identifies it within the DataWriter’s history.
❏HB messages announce to the DataReader that it should have received all snapshots in the indicated range of sequence numbers, and they can also request the DataReader to send an acknowledgement back. For example, HB(1) announces that the snapshot with sequence number 1 has been sent and asks the DataReader to confirm receipt.
❏ACKNACK messages communicate to the DataWriter that particular snapshots have been successfully stored in the DataReader’s history. ACKNACKs also tell the DataWriter which snapshots are missing on the DataReader side. The ACKNACK message includes a set of sequence numbers represented as a bit map. The sequence numbers indicate which ones the DataReader is missing. (The bit map contains the base sequence number that has not been received, followed by the number of bits in bit map and the optional bit map.
The maximum size of the bit map is 256.) All numbers up to (not including) those in the set are considered positively acknowledged. They are represented in Figure 10.1 through Figure 10.7 as ACKNACK(<first-missing>). For example, ACKNACK(4) indicates that the snapshots with sequence numbers 1, 2, and 3 have been successfully stored in the DataReader history, and that 4 has not been received.
1. For a link to the RTPS specification, see the RTI website, www.rti.com.
It is important to note that Connext can bundle multiple of the above messages within a single network packet. This ‘submessage bundling’ provides for higher performance communications.
Figure 10.1 Basic RTPS Reliable Protocol
In the figure, the DataWriter’s queue columns show the assigned sequence number, the history of sent data values, and whether the sample has been delivered to the reader history; the DataReader’s queue columns show the assigned sequence number, the DataReader history, and whether the sample is available for the application to read/take. The exchange: write(A) is cached with sequence number 1, DATA(A,1) is sent with a piggybacked HB(1), the DataReader caches (A,1) and makes it available, the HB(1) triggers a history check and an ACKNACK(2) reply, and the DataWriter then marks sample 1 as acknowledged.
Figure 10.1 illustrates the basic behavior of the protocol when an application calls the write() operation on a DataWriter that is associated with a DataReader. As mentioned, the RTPS protocol can bundle multiple submessages into a single network packet. In Figure 10.1 this feature is used to piggyback a HB message to the DATA message. Note that before the message is sent, the data is given a sequence number (1 in this case) which is stored in the DataWriter’s send queue. As soon as the message is received by the DataReader, it places it into the DataReader’s receive queue. From the sequence number the DataReader can tell that it has not missed any messages and therefore it can make the data available immediately to the user (and call the DataReaderListener). This is indicated by the “✔” symbol. The reception of the HB(1) causes the DataReader to check that it has indeed received all updates up to and including the one with sequenceNumber=1. Since this is true, it replies with an ACKNACK(2) to positively acknowledge all messages up to (but not including) sequence number 2. The DataWriter notes that the update has been acknowledged, so it no longer needs to be retained in its send queue. This is indicated by the “✔” symbol.
Figure 10.2 illustrates the behavior of the protocol in the presence of lost messages. Assume that the message containing DATA(A,1) is dropped by the network. When the DataReader receives
Figure 10.2 RTPS Reliable Protocol in the Presence of Message Loss
In the figure, DATA(A,1) is dropped by the network while the DataWriter continues with write(S02) and write(S03). On receiving DATA(B,2) with a piggybacked HB, the DataReader caches (B,2) but holds it back and replies ACKNACK(1) to request sample 1. The DataWriter resends DATA(A,1); the DataReader caches it, commits both A and B, and after receiving DATA(C,3) with a piggybacked HB acknowledges everything with ACKNACK(4). See Figure 10.1 for the meaning of the table columns.
the next message (DATA(B,2); HB(2)), two things happen:
1. The data associated with sequence number 2 (B) is tagged with ‘X’ to indicate that it is not deliverable to the application (that is, it should not be made available to the application, because the application needs to receive the data associated with sample 1 (A) first).
2. An ACKNACK(1) is sent to the DataWriter to request that the data tagged with sequence number 1 be resent.
Reception of the ACKNACK(1) causes the DataWriter to resend DATA(A,1). Once the DataReader receives it, it can ‘commit’ both A and B such that the application can now access both (indicated by the “✔”) and call the DataReaderListener. From there on, the protocol proceeds as before for the next data message (C) and so forth.
A subtle but important feature of the RTPS protocol is that ACKNACK messages are only sent as a direct response to HB messages. This allows the DataWriter to better control the overhead of these ‘administrative’ messages. For example, if the DataWriter knows that it is about to send a chain of DATA messages, it can bundle them all and include a single HB at the end, which minimizes ACKNACK traffic.
10.3 Using QosPolicies to Tune the Reliable Protocol
Reliability is controlled by the QosPolicies in Table 10.1. To enable reliable delivery, read the following sections to learn how to change the QoS for the DataWriter and DataReader:
❏Enabling Reliability (Section 10.3.1)
❏Tuning Queue Sizes and Other Resource Limits (Section 10.3.2)
❏Controlling Heartbeats and Retries with DataWriterProtocol QosPolicy (Section 10.3.4)
❏Avoiding Message Storms with DataReaderProtocol QosPolicy (Section 10.3.5)
❏Resending Samples to
Then see this section to explore example use cases:
Table 10.1 QosPolicies for Reliable Communications
| QosPolicy | Description | Related Entities^a |
|---|---|---|
| Reliability | To establish reliable communication, this QoS must be set to DDS_RELIABLE_RELIABILITY_QOS for the DataWriter and its DataReaders. | DW, DR |
| ResourceLimits | This QoS determines the amount of resources each side can use to manage instances and samples of instances. It therefore controls the size of the DataWriter’s send queue and the DataReader’s receive queue. The send queue stores samples until they have been ACKed by all DataReaders. The DataReader’s receive queue stores samples for the user’s application to access. | DW, DR |
| History | This QoS affects how a DataWriter/DataReader behaves when its send/receive queue fills up. | DW, DR |
| DataWriterProtocol | This QoS configures DataWriter-specific aspects of the reliable protocol; for example, it can disable positive ACKs for its DataReaders. | DW |
| DataReaderProtocol | When a reliable DataReader receives a heartbeat from a DataWriter and needs to return an ACKNACK, the DataReader can choose to delay a while. This QoS sets the minimum and maximum delay. It can also disable positive ACKs for the DataReader. | DR |
| DataReaderResourceLimits | This QoS determines additional amounts of resources that the DataReader can use to manage samples: namely, the size of the DataReader’s internal queues, which cache samples until they are ordered for reliability and can be moved to the DataReader’s receive queue for access by the user’s application. | DR |
| Durability | This QoS affects whether late-joining DataReaders will receive all previously-published samples. | DW, DR |

a. DW = DataWriter, DR = DataReader
10.3.1 Enabling Reliability
You must modify the RELIABILITY QosPolicy (Section 6.5.19) of the DataWriter and each of its reliable DataReaders. Set the kind field to DDS_RELIABLE_RELIABILITY_QOS:
❏ DataWriter
writer_qos.reliability.kind = DDS_RELIABLE_RELIABILITY_QOS;
❏ DataReader
reader_qos.reliability.kind = DDS_RELIABLE_RELIABILITY_QOS;
10.3.1.1 Blocking until the Send Queue Has Space Available
The max_blocking_time property in the RELIABILITY QosPolicy (Section 6.5.19) indicates how long a DataWriter can be blocked during a write().
If max_blocking_time is non-zero and the reliability send queue is full, write() blocks while waiting for space to become available.
If the number of unacknowledged samples in the reliability send queue drops below max_samples (set in the RESOURCE_LIMITS QosPolicy (Section 6.5.20)) before max_blocking_time, the sample is sent and write() returns DDS_RETCODE_OK.
If max_blocking_time is zero and the reliability send queue is full, write() returns DDS_RETCODE_TIMEOUT and the sample is not sent.
10.3.2 Tuning Queue Sizes and Other Resource Limits
Set the HISTORY QosPolicy (Section 6.5.10) appropriately to accommodate however many samples should be saved in the DataWriter’s send queue or the DataReader’s receive queue. The defaults may suit your needs; if so, you do not have to modify this QosPolicy.
Set the DDS_RtpsReliableWriterProtocol_t in the DATA_WRITER_PROTOCOL QosPolicy (DDS Extension) (Section 6.5.3) appropriately to accommodate the number of unacknowledged samples that may be outstanding at a time (the send window).
For more information, see the following sections:
❏Understanding the Send Queue and Setting its Size (Section 10.3.2.1)
❏Understanding the Receive Queue and Setting Its Size (Section 10.3.2.2)
Note: The HistoryQosPolicy’s depth must be less than or equal to the ResourceLimitsQosPolicy’s max_samples_per_instance; max_samples_per_instance must be less than or equal to the ResourceLimitsQosPolicy’s max_samples (see RESOURCE_LIMITS QosPolicy (Section 6.5.20)), and max_samples_per_remote_writer (see
DATA_READER_RESOURCE_LIMITS QosPolicy (DDS Extension) (Section 7.6.2)) must be less than or equal to max_samples.
❏depth <= max_samples_per_instance <= max_samples
❏max_samples_per_remote_writer <= max_samples
Examples:
❏ DataWriter
writer_qos.resource_limits.initial_instances = 10;
writer_qos.resource_limits.initial_samples = 200;
writer_qos.resource_limits.max_instances = 100;
writer_qos.resource_limits.max_samples = 2000;
writer_qos.resource_limits.max_samples_per_instance = 20;
writer_qos.history.depth = 20;
❏ DataReader
reader_qos.resource_limits.initial_instances = 10;
reader_qos.resource_limits.initial_samples = 200;
reader_qos.resource_limits.max_instances = 100;
reader_qos.resource_limits.max_samples = 2000;
reader_qos.resource_limits.max_samples_per_instance = 20;
reader_qos.history.depth = 20;
reader_qos.reader_resource_limits.max_samples_per_remote_writer = 20;
10.3.2.1 Understanding the Send Queue and Setting its Size
A DataWriter’s send queue is used to store each sample it writes. A sample will be removed from the send queue after it has been acknowledged (through an ACKNACK) by all the reliable DataReaders. A DataReader can request that the DataWriter resend a missing sample (through an ACKNACK). If that sample is still available in the send queue, it will be resent. To elicit timely ACKNACKs, the DataWriter will regularly send heartbeats to its reliable DataReaders.
A DataWriter’s send queue size is determined by its RESOURCE_LIMITS QosPolicy (Section 6.5.20), specifically the max_samples field. The appropriate value depends on application parameters such as how fast the publication calls write().
A DataWriter has a "send window": the maximum number of unacknowledged samples allowed in the send queue at a time. The send window enables the number of samples queued for reliability to be configured independently from the number of samples queued for history. This is of great benefit when the size of the history queue is much different than the size of the reliability queue. For example, you may want to resend a large history to late-joining DataReaders while keeping only a small number of samples outstanding on the network at a time.
The send window is determined by the DataWriterProtocolQosPolicy, specifically the fields min_send_window_size and max_send_window_size within the rtps_reliable_writer field of type DDS_RtpsReliableWriterProtocol_t. Other fields control a dynamic send window, where the send window size changes in response to network congestion to maximize the effective send rate. Like for max_samples, the appropriate values depend on application parameters.
Strict reliability: If a DataWriter does not receive ACKNACKs from one or more reliable DataReaders, it is possible for the reliability send queue to fill up with unacknowledged samples. If you want to achieve strict reliability, the application must wait for space in the reliability queue before writing any more samples. Connext provides two mechanisms to do this:
❏Allow the write() operation to block until there is space in the reliability queue again to store the sample. The maximum time this call blocks is determined by the max_blocking_time field in the RELIABILITY QosPolicy (Section 6.5.19) (also discussed in Section 10.3.1.1).
❏Use the DataWriter’s Listener to be notified when the reliability queue fills up or empties again.
When the HISTORY QosPolicy (Section 6.5.10) on the DataWriter is set to KEEP_LAST, strict reliability is not guaranteed. When there are depth number of samples in the queue (set in the HISTORY QosPolicy (Section 6.5.10), see Section 10.3.3) the oldest sample will be dropped from the queue when a new sample is written. Note that in such a reliable mode, when the send window is larger than max_samples, the DataWriter will never block, but strict reliability is no longer guaranteed.
If there is a request for the purged sample from any DataReaders, the DataWriter will send a heartbeat that no longer contains the sequence number of the dropped sample (it will not be able to send the sample).
Alternatively, a DataWriter with KEEP_LAST may block on write() when its send window is smaller than its send queue. The DataWriter will block when its send window is full. Only after the blocking time has elapsed will the DataWriter purge a sample; at that point, strict reliability is no longer guaranteed.
The send queue size is set in the max_samples field of the RESOURCE_LIMITS QosPolicy (Section 6.5.20). The appropriate size for the send queue depends on application parameters (such as the send rate), channel parameters (such as the transport latency and the probability of packet loss), and the desired level of reliability.
The DataReader’s receive queue size should generally be larger than the DataWriter’s send queue size. Receive queue size is discussed in Section 10.3.2.2.
A good rule of thumb, based on a simple model that assumes individual packet drops are not correlated and occur with a fixed probability, is given by the formula in Figure 10.3.
Figure 10.3 Calculating Minimum Send Queue Size for a Desired Level of Reliability
N = 2RT · log(1 - Q) / log(p)
Simple formula for determining the minimum size of the send queue required for strict reliability.
In the above equation, R is the rate of sending samples, T is the one-way transport latency, p is the probability that an individual packet is lost, and Q is the desired probability that a sample is eventually delivered successfully.
Table 10.2 gives the required size of the send queue for several common scenarios.
Table 10.2 Required Size of the Send Queue for Different Network Parameters
| Q^a | p^b | T^c | R^d | N^e |
|---|---|---|---|---|
| 99% | 1% | 0.001^f sec | 100 Hz | 1 |
| 99% | 1% | 0.001 sec | 2000 Hz | 2 |
| 99% | 5% | 0.001 sec | 100 Hz | 1 |
| 99% | 5% | 0.001 sec | 2000 Hz | 4 |
| 99.99% | 1% | 0.001 sec | 100 Hz | 1 |
| 99.99% | 1% | 0.001 sec | 2000 Hz | 6 |
| 99.99% | 5% | 0.001 sec | 100 Hz | 1 |
| 99.99% | 5% | 0.001 sec | 2000 Hz | 8 |
a."Q" is the desired level of reliability measured as the probability that any data update will eventually be delivered successfully. In other words, percentage of samples that will be successfully delivered.
b."p" is the probability that any single packet gets lost in the network.
c."T" is the
d."R" is the rate at which the publisher is sending updates.
e."N" is the minimum required size of the send queue to accomplish the desired level of reliability "Q".
f.The typical
Note: Packet loss on a network frequently happens in bursts, and the packet loss events are correlated. This means that the probability of a packet being lost is much higher if the previous packet was lost because it indicates a congested network or busy receiver. For this situation, it may be better to use a queue size that can accommodate the longest period of network congestion, as illustrated in Figure 10.4.
Figure 10.4 Calculating Minimum Send Queue Size for Networks with Dropouts
N = R · D(Q)

Send queue size as a function of the send rate R and the maximum dropout time D(Q).
In the above equation R is the rate of sending samples, D(Q) is a time such that Q percent of the dropouts are of equal or lesser length, and Q is the required probability that a sample is eventually successfully delivered. The problem with the above formula is that it is hard to determine the value of D(Q) for different values of Q.
For example, if we want to ensure that 99.9% of the samples are eventually delivered successfully, and we know that the 99.9% of the network dropouts are shorter than 0.1 seconds, then we would use N = 0.1*R. So for a rate of 100Hz, we would use a send queue of N = 10; for a rate of 2000Hz, we would use N = 200.
10.3.2.2 Understanding the Receive Queue and Setting Its Size
Samples are stored in the DataReader’s receive queue, which is accessible to the user’s application.
A sample is removed from the receive queue after it has been accessed by take(), as described in Accessing Data Samples with Read or Take (Section 7.4.3). Note that read() does not remove samples from the queue.
A DataReader's receive queue size is limited by its RESOURCE_LIMITS QosPolicy (Section 6.5.20), specifically the max_samples field. The storage of out-of-order samples that are waiting for earlier samples also counts against max_samples.
A DataReader can maintain reliable communications with multiple DataWriters (e.g., in the case of the OWNERSHIP_STRENGTH QosPolicy (Section 6.5.16) setting of SHARED). The maximum number of remote DataWriters it can manage is bounded by the DATA_READER_RESOURCE_LIMITS QosPolicy (DDS Extension) (Section 7.6.2).
The DataReader will cache samples that arrive out of order while waiting for missing samples to be resent. (Up to 256 samples can be resent; this limitation is imposed by the wire protocol.) If there is no room, the DataReader has to reject samples; a rejected sample must later be resent by the DataWriter, and the rejection is reported through the SAMPLE_REJECTED status.
The appropriate size of the receive queue depends on application parameters, such as the DataWriter’s sending rate and the probability of a dropped sample. However, the receive queue size should generally be larger than the send queue size. Send queue size is discussed in Section 10.3.2.1.
Figure 10.5 and Figure 10.6 compare two hypothetical DataReaders, both interacting with the same DataWriter. The queue on the left represents an ordering cache, allocated from receive queue resources, where out-of-order samples wait for missing samples to arrive; the queue on the right represents the receive queue from which the application takes samples.
In Figure 10.6, the receive queue is too small to cache all out-of-order samples, so a sample must be dropped and repaired later.
Figure 10.5 Effect of Receive Queue Size on Performance: Large Queue Size
In the figure, max_samples is 4, which also limits the number of unordered samples that can be cached. Sample 1 arrives, is made available, and is taken; no unordered samples are cached. Sample 2 is lost, so when the heartbeat reveals the gap, space is reserved for the missing sample 2 while samples 3 and 4 are cached. Once sample 2 is resent and received, samples 2, 3, and 4 all become available; sample 5 then arrives in order and is taken normally.
Figure 10.6 Effect of Receive Queue Size on Performance: Small Queue Size
In the figure, max_samples is 2, which also limits the number of unordered samples that can be cached. Sample 1 is moved to the receive queue; no unordered samples are cached. Sample 2 is lost, so space is reserved for it and sample 3 is cached, but sample 4 must be dropped because it does not fit in the queue. After sample 2 is resent and received, samples 2 and 3 move to the receive queue; space is then reserved for the still-missing sample 4 until it too is resent, after which samples 4 and 5 move to the receive queue.
10.3.3 Controlling Queue Depth with the History QosPolicy
If you want to achieve strict reliability, set the kind field in the HISTORY QosPolicy (Section 6.5.10) for both the DataReader and DataWriter to KEEP_ALL; in this case, the depth does not matter.
Or, for non-strict reliability, set the kind to KEEP_LAST; then, when the queue fills up, the oldest sample is dropped to make room for a new one.
The depth field in the HISTORY QosPolicy (Section 6.5.10) controls how many samples Connext will attempt to keep on the DataWriter’s send queue or the DataReader’s receive queue. For reliable communications, depth should be >= 1. The depth can be any value >= 1, but it cannot be more than the max_samples_per_instance in the RESOURCE_LIMITS QosPolicy (Section 6.5.20).
Example:
❏ DataWriter
writer_qos.history.depth = <number of samples to keep in send queue>;
❏ DataReader
reader_qos.history.depth = <number of samples to keep in receive queue>;
10.3.4 Controlling Heartbeats and Retries with DataWriterProtocol QosPolicy
In the Connext reliability model, the DataWriter sends data samples and heartbeats to reliable DataReaders. A DataReader responds to a heartbeat by sending an ACKNACK, which tells the DataWriter what the DataReader has received so far.
In addition, the DataReader can request missing samples (by sending an ACKNACK) and the DataWriter will respond by resending the missing samples. This section describes some advanced timing parameters that control the behavior of this mechanism. Many applications do not need to change these settings. These parameters are contained in the
DATA_WRITER_PROTOCOL QosPolicy (DDS Extension) (Section 6.5.3).
The protocol described in Overview of the Reliable Protocol (Section 10.2) uses very simple rules, such as piggybacking HB messages onto each DATA message and responding immediately to ACKNACKs with the requested repair messages. While correct, this protocol by itself would not achieve optimal performance in more advanced use cases.
This section describes some of the parameters configurable by means of the rtps_reliable_writer structure in the DATA_WRITER_PROTOCOL QosPolicy (DDS Extension) (Section 6.5.3) and how they affect the behavior of the RTPS protocol.
10.3.4.1 How Often Heartbeats are Resent (heartbeat_period)
If a DataReader does not acknowledge a sample that has been sent, the DataWriter resends the heartbeat. These heartbeats are resent at the rate set in the DATA_WRITER_PROTOCOL QosPolicy (DDS Extension) (Section 6.5.3), specifically its heartbeat_period field.
For example, a heartbeat_period of 3 seconds means that if a DataReader does not receive the latest sample (for example, it gets dropped by the network), it might take up to 3 seconds before the DataReader realizes it is missing data. The application can lower this value when it is important that recovery from packet loss is very fast.
The basic approach of sending HB messages as a piggyback to DATA messages has the advantage of minimizing network traffic. However, there is a situation where this approach, by itself, may result in large latencies. Suppose there is a DataWriter that writes bursts of data, separated by relatively long periods of silence. Furthermore assume that the last message in one of the bursts is lost by the network. This is the case shown for message DATA(B, 2) in Figure 10.7. If HBs were only sent piggybacked to DATA messages, the DataReader would not realize it missed the ‘B’ DATA message with sequence number ‘2’ until the DataWriter wrote the next message. This may be a long time if data is written sporadically. To avoid this situation,
Connext can be configured so that HBs are sent periodically as long as there are samples that have not been acknowledged even if no data is being sent. The period at which these HBs are sent is configurable by setting the rtps_reliable_writer.heartbeat_period field in the DATA_WRITER_PROTOCOL QosPolicy (DDS Extension) (Section 6.5.3).
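For example, the 3-second periodic heartbeat mentioned above could be configured as follows (a sketch using the Connext C++ API; writer_qos is assumed to be a DataWriter QoS structure already initialized with defaults):

```cpp
// Send a heartbeat every 3 seconds while any sample remains unacknowledged,
// even if no new data is being written.
writer_qos.protocol.rtps_reliable_writer.heartbeat_period.sec = 3;
writer_qos.protocol.rtps_reliable_writer.heartbeat_period.nanosec = 0;
```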
Note that a small value for the heartbeat_period will result in a small worst-case recovery latency when a sample is lost, at the cost of additional heartbeat traffic on the network.
Also note that the heartbeat_period should not be less than the rtps_reliable_reader.heartbeat_suppression_duration in the DATA_READER_PROTOCOL QosPolicy (DDS Extension) (Section 7.6.1); otherwise those HBs will be lost.
Figure 10.7 Use of heartbeat_period
[Sequence diagram: the DataWriter writes and caches samples (A,1) and (B,2). DATA(A,1) reaches the DataReader, which caches and acknowledges it (acked(1)); DATA(B,2) is lost in the network. Because no further data is written, the loss is only detected when, one heartbeat_period later, the DataWriter sends HB(1-2). The DataReader responds with ACKNACK(2), the DataWriter resends DATA(B,2), and the DataReader finally acknowledges with ACKNACK(3).]
10.3.4.2 How Often Piggyback Heartbeats are Sent (heartbeats_per_max_samples)
A DataWriter will automatically send heartbeats with new samples to request regular ACKNACKs from the DataReader. These are called “piggyback” heartbeats.
If batching is disabled1: one piggyback heartbeat will be sent every [max_samples2/ heartbeats_per_max_samples] number of samples.
If batching is enabled: one piggyback heartbeat will be sent every [max_batches3/ heartbeats_per_max_samples] number of samples.
Furthermore, one piggyback heartbeat will be sent per send window. If the above calculation is greater than the send window size, then the DataWriter will send a piggyback heartbeat for every [send window size] number of samples.
The heartbeats_per_max_samples field is part of the rtps_reliable_writer structure in the
DATA_WRITER_PROTOCOL QosPolicy (DDS Extension) (Section 6.5.3). If heartbeats_per_max_samples is set equal to max_samples, a heartbeat will be sent with each sample. A value of 8 means that a heartbeat will be sent with every [max_samples/8] samples: if max_samples is 1024, a heartbeat will be sent once every 128 samples. If you set heartbeats_per_max_samples to zero, samples are sent without any piggyback heartbeat. The max_samples field is part of the RESOURCE_LIMITS QosPolicy (Section 6.5.20).
(See Figure 10.1 for the meaning of the table columns used in the figures in this chapter.)
There are two reasons to send a HB:
❏To request that a DataReader confirm the receipt of data via an ACKNACK, so that the
DataWriter can remove it from its send queue and therefore prevent the DataWriter’s history from filling up (which could cause the write() operation to temporarily block4).
❏To inform the DataReader of what data it should have received, so that the DataReader can send a request for missing data via an ACKNACK.
The DataWriter’s send queue can buffer many data samples while it waits for them to be acknowledged.
A HB is used to get confirmation from DataReaders so that the DataWriter can remove acknowledged samples from the queue to make space for new samples. Therefore, if the queue size is large, or new samples are added slowly, HBs can be sent less frequently.
In Figure 10.8, the DataWriter piggybacks an HB onto the DATA(C,3) message, prompting the DataReader to respond with ACKNACK(4), acknowledging all three samples at once.
10.3.4.3 Controlling Packet Size for Resent Samples (max_bytes_per_nack_response)
The max_bytes_per_nack_response field limits the maximum amount of data that a DataWriter will resend at a time in a single ‘repair’ packet. For example, if the DataReader requests 20 samples of 10K each, and max_bytes_per_nack_response is set to 100K, the DataWriter will send only the first 10 samples; the DataReader will have to send another ACKNACK to receive the next 10 samples.
1. Batching is enabled with the BATCH QosPolicy (DDS Extension) (Section 6.5.2).
2. max_samples is set in the RESOURCE_LIMITS QosPolicy (Section 6.5.20).
3. max_batches is set in the DATA_WRITER_RESOURCE_LIMITS QosPolicy (DDS Extension) (Section 6.5.4).
4. Note that data could also be removed from the DataWriter’s send queue if it is no longer relevant due to some other QoS, such as HISTORY KEEP_LAST (Section 6.5.10) or LIFESPAN (Section 6.5.12).
Figure 10.8 Use of heartbeats_per_max_samples
[Sequence diagram: the DataWriter writes and caches samples (A,1), (B,2), and (C,3), sending a DATA message for each. Only the third message, DATA(C,3);HB, carries a piggyback heartbeat; upon receiving it, the DataReader responds with ACKNACK(4), acknowledging all three samples at once.]
See Figure 10.1 for meaning of table columns.
A DataWriter may resend multiple missed samples in the same packet. The max_bytes_per_nack_response field in the DATA_WRITER_PROTOCOL QosPolicy (DDS Extension) (Section 6.5.3) limits the size of this ‘repair’ packet.
10.3.4.4 Controlling How Many Times Heartbeats are Resent (max_heartbeat_retries)
If a DataReader does not respond within max_heartbeat_retries number of heartbeats, it will be dropped by the DataWriter and the reliable DataWriter’s Listener will be called with a
RELIABLE_READER_ACTIVITY_CHANGED Status (DDS Extension) (Section 6.3.6.8).
If the dropped DataReader becomes available again (perhaps its network connection was down temporarily), it will be added back to the DataWriter the next time the DataWriter receives some message (ACKNACK) from the DataReader.
When a DataReader is ‘dropped’ by a DataWriter, the DataWriter will not wait for the DataReader to send an ACKNACK before any samples are removed. However, the DataWriter will still send data and HBs to this DataReader as normal.
The max_heartbeat_retries field is part of the DATA_WRITER_PROTOCOL QosPolicy (DDS Extension) (Section 6.5.3).
10.3.4.5 Treating Non-Progressing Readers as Inactive Readers (inactivate_nonprogressing_readers)
In addition to max_heartbeat_retries, if inactivate_nonprogressing_readers is set, then not only are unresponsive DataReaders considered inactive, but DataReaders sending non-progressing ACKNACKs (ACKNACKs whose acknowledged sequence number does not advance) are also considered inactive.
One example for which it could be useful to turn on inactivate_nonprogressing_readers is when a DataReader’s receive queue is full of unprocessed samples: the DataReader may keep sending the same non-progressing ACKNACKs without ever being able to accept the repair samples, and the DataWriter would otherwise never consider it inactive.
10.3.4.6 Coping with Redundant Requests for Missing Samples (max_nack_response_delay)
When a DataWriter receives a request for missing samples from a DataReader and responds by resending the requested samples, it will ignore additional requests for the same samples during the time period max_nack_response_delay.
The rtps_reliable_writer.max_nack_response_delay field is part of the DATA_WRITER_PROTOCOL QosPolicy (DDS Extension) (Section 6.5.3).
If your send period is smaller than the round-trip delay between the DataWriter and a DataReader, the DataReader may send several ACKNACKs for the same missing sample before the first repair arrives.
While these redundant messages provide an extra cushion for the level of reliability desired, you can conserve the CPU and network bandwidth usage by limiting how often the same ACKNACK messages are sent; this is controlled by min_nack_response_delay.
Reliable subscriptions are prevented from resending an ACKNACK within min_nack_response_delay seconds from the last time an ACKNACK was sent for the same sample. Our testing shows that the default min_nack_response_delay of 0 seconds achieves an optimal balance for most applications on typical Ethernet LANs.
However, if your system has very slow computers and/or a slow network, you may want to consider increasing min_nack_response_delay. Sending an ACKNACK and resending a missing sample inherently takes a long time in this system. So you should allow a longer time for recovery of the lost sample before sending another ACKNACK. In this situation, you should increase min_nack_response_delay.
If your system consists of a fast network or computers, and the receive queue size is very small, then you should keep min_nack_response_delay very small (such as the default value of 0). If the queue size is small, recovering a missing sample is more important than conserving CPU and network bandwidth (new samples that are too far ahead of the missing sample are thrown away). A fast system can cope with a smaller min_nack_response_delay value, and the reliable sample stream can normalize more quickly.
Figure 10.9 Resending Missing Samples due to Duplicate ACKNACKs
[Sequence diagram: the DataWriter sends samples “1” through “5”. Sample “2” is lost, so the DataReader responds to each subsequent sample with ACKNACK(2). The DataWriter resends sample “2” for the first ACKNACK(2); a duplicate ACKNACK(2) arriving outside the max_nack_response_delay window causes sample “2” to be resent a second time.]
Space must be reserved for missing sample “2”.
Samples “3” and “4” are cached while waiting for missing sample “2”.
Sample “2” is dropped since it is older than the last sample that has been handed to the application.
10.3.4.7 Disabling Positive Acknowledgements (disable_positive_acks_min_sample_keep_duration)
When ACKNACK storms are a primary concern in a system, an alternative to tuning heartbeat and ACKNACK response delays is to disable positive acknowledgments (ACKs) and rely just on NACKs to maintain reliability. Systems with a large number of DataReaders per DataWriter benefit most from this approach, since the steady flow of positive ACKs from every DataReader is eliminated.
Normally when ACKs are enabled, strict reliability is maintained by the DataWriter, guaranteeing that a sample stays in its send queue until all DataReaders have positively acknowledged it (aside from relevant DURABILITY, HISTORY, and LIFESPAN QoS policies). When ACKs are disabled, strict reliability is no longer guaranteed, but the DataWriter should still keep the sample in its send queue for a sufficient duration for DataReaders to detect the loss and NACK it.
The keep duration should be configured for the expected worst-case time from the original send until a NACK for the sample could arrive, so that a lost sample can still be repaired before it is removed from the queue.
If the peak send rate is known and writer resources are available, the writer queue can be sized so that writes will not block. For this case, the queue size must be greater than the send rate multiplied by the keep duration.
10.3.5 Avoiding Message Storms with DataReaderProtocol QosPolicy
DataWriters send data samples and heartbeats to DataReaders. A DataReader responds to a heartbeat by sending an acknowledgement that tells the DataWriter what the DataReader has received so far and what it is missing. If there are many DataReaders, all sending ACKNACKs to the same DataWriter at the same time, a message storm can result. To prevent this, you can set a delay for each DataReader, so they don’t all send ACKNACKs at the same time. This delay is set in the DATA_READER_PROTOCOL QosPolicy (DDS Extension) (Section 7.6.1).
If you have several DataReaders per DataWriter, varying this delay for each one can avoid ACKNACK message storms to the DataWriter. If you are not concerned about message storms, you do not need to change this QosPolicy.
Example:
reader_qos.protocol.rtps_reliable_reader.min_heartbeat_response_delay.sec = 0;
reader_qos.protocol.rtps_reliable_reader.min_heartbeat_response_delay.nanosec = 0;
reader_qos.protocol.rtps_reliable_reader.max_heartbeat_response_delay.sec = 0;
reader_qos.protocol.rtps_reliable_reader.max_heartbeat_response_delay.nanosec =
    0.5 * 1000000000UL; // 0.5 sec
As the name suggests, the minimum and maximum response delay bounds the random wait time before the response. Setting both to zero will force immediate response, which may be necessary for the fastest recovery in case of lost samples.
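One hedged way to vary the delay per DataReader is to stagger each reader's response window; readerIndex and the 10 ms spacing below are illustrative assumptions, not prescribed values:

```cpp
// Stagger each DataReader's ACKNACK response window so that readers of the
// same DataWriter do not all respond to a heartbeat at the same instant.
// reader_qos is assumed to be a DataReader QoS structure initialized with
// defaults, and readerIndex a small per-reader integer (0, 1, 2, ...).
const unsigned long spacingNanosec = 10UL * 1000000UL;  // 10 ms per reader
reader_qos.protocol.rtps_reliable_reader.min_heartbeat_response_delay.sec = 0;
reader_qos.protocol.rtps_reliable_reader.min_heartbeat_response_delay.nanosec =
    readerIndex * spacingNanosec;
reader_qos.protocol.rtps_reliable_reader.max_heartbeat_response_delay.sec = 0;
reader_qos.protocol.rtps_reliable_reader.max_heartbeat_response_delay.nanosec =
    (readerIndex + 1) * spacingNanosec;
```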
10.3.6 Resending Samples to Late-Joiners with the Durability QosPolicy
The DURABILITY QosPolicy (Section 6.5.7) is also somewhat related to Reliability. Connext requires a finite time to "discover" or match DataReaders to DataWriters. If an application attempts to send data before the DataReader and DataWriter "discover" one another, then the sample will not actually get sent. Whether or not samples are resent when the DataReader and DataWriter eventually "discover" one another depends on how the DURABILITY and HISTORY QoS are set. The default setting for the Durability QosPolicy is VOLATILE, which means that the DataWriter will not store samples for redelivery to late-joining DataReaders.
Connext also supports the TRANSIENT_LOCAL setting for the Durability, which means that the samples will be kept stored for redelivery to late-joining DataReaders, subject to the limits of the HISTORY and RESOURCE_LIMITS QosPolicies.
See also: Waiting for Historical Data (Section 7.3.6).
10.3.7 Use Cases
This section contains advanced material that discusses practical applications of the reliability related QoS.
10.3.7.1 Importance of Relative Thread Priorities
For high throughput, the Connext Event thread’s priority must be sufficiently high on the sending application. Unlike an unreliable writer, a reliable writer relies on internal Connext threads: the Receive thread processes ACKNACKs from the DataReaders, and the Event thread schedules the events necessary to maintain reliable data flow.
❏When samples are sent to the same or another application on the same host, the Receive thread priority should be higher than the writing thread priority (priority of the thread calling write() on the DataWriter). This will allow the Receive thread to process the messages as they are sent by the writing thread. A sustained reliable flow requires the reader to be able to process the samples from the writer at a speed equal to or faster than the writer emits.
❏The default Event thread priority is low. This is adequate if your reliable transfer is not sustained; queued-up events will eventually be processed when the writing thread yields the CPU. Connext can automatically grow the event queue to store all pending events. But if the reliable communication is sustained, reliable events will continue to be scheduled, and the event queue will eventually reach its limit. The default Event thread priority is unsuitable for maintaining a fast and sustained reliable communication and should be increased through participant_qos.event.thread.priority. This value maps directly to the OS thread priority; see EVENT QosPolicy (DDS Extension) (Section 8.5.5).
The Event thread priority should also be increased to minimize the reliable latency: if events are processed at a higher priority, dropped packets will be resent sooner.
Now we consider some practical applications of the reliability related QoS:
❏Aperiodic Use Case: One-at-a-Time (Section 10.3.7.2)
❏Aperiodic, Bursty (Section 10.3.7.3)
❏Periodic (Section 10.3.7.4)
10.3.7.2 Aperiodic Use Case: One-at-a-Time
Suppose you have aperiodically generated data that needs to be delivered reliably, with minimum latency, such as a series of commands (“Ready,” “Aim,” “Fire”). If the writing thread blocks between samples to guarantee reception of the just-sent sample on the reader’s middleware end, a smaller queue provides a smaller upper bound on the sample delivery time. Adequate writer QoS settings for this use case are presented in Figure 10.10.
Figure 10.10 QoS for an Aperiodic, One-at-a-Time Writer
[Code listing not preserved in this copy; surviving fragments:]
5 //use these hard coded value unless you use a key
12 // want to piggyback HB w/ every sample.
20 //consider making
24 // should be faster than the send rate, but be mindful of OS resolution
27 alertReaderWithinThisMs * 1000000;
32 // essentially turn off slow HB period
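Based on the surviving comments and the line-by-line notes that follow, the listing plausibly resembled the sketch below. Field names follow the Connext C/C++ API, but the exact original statements are reconstructions, not the original figure:

```cpp
// Sketch of the aperiodic one-at-a-time writer QoS (reconstruction).
writer_qos.reliability.kind = DDS_RELIABLE_RELIABILITY_QOS;   // default
writer_qos.history.kind = DDS_KEEP_ALL_HISTORY_QOS;           // no sample lost
writer_qos.protocol.push_on_write = DDS_BOOLEAN_TRUE;         // default

// use these hard coded values unless you use a key
writer_qos.resource_limits.initial_samples =
    writer_qos.resource_limits.max_samples = 1;
writer_qos.resource_limits.max_samples_per_instance =
    writer_qos.resource_limits.max_samples;

// want to piggyback HB w/ every sample
writer_qos.protocol.rtps_reliable_writer.heartbeats_per_max_samples =
    writer_qos.resource_limits.max_samples;

// should be faster than the send rate, but be mindful of OS resolution
writer_qos.protocol.rtps_reliable_writer.fast_heartbeat_period.sec = 0;
writer_qos.protocol.rtps_reliable_writer.fast_heartbeat_period.nanosec =
    alertReaderWithinThisMs * 1000000;

// essentially turn off slow HB period
writer_qos.protocol.rtps_reliable_writer.heartbeat_period.sec = 3600 * 24;
```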
Line 1 (Figure 10.10): This is the default setting for a writer, shown here strictly for clarity.
Line 2 (Figure 10.10): Setting the History kind to KEEP_ALL guarantees that no sample is ever lost.
Line 3 (Figure 10.10): This is the default setting for a writer, shown here strictly for clarity. ‘Push’ mode reliability will yield lower latency than ‘pull’ mode reliability in normal situations where there is no sample loss. (See DATA_WRITER_PROTOCOL QosPolicy (DDS Extension) (Section 6.5.3).) Furthermore, it does not matter that each packet sent in response to a command will be small, because our data sent with each command is likely to be small, so that maximizing throughput for this data is not a concern.
Line 5 - Line 10 (Figure 10.10): For this example, we assume a single writer is writing samples one at a time. If we are not using keys (see Section 2.2.2), there is no reason to use a queue with room for more than one sample, because we want to resolve a sample completely before moving on to the next. While this negatively impacts throughput, it minimizes memory usage. In this example, a written sample will remain in the queue until it is acknowledged by all active readers (only 1 for this example).
Line 12 - Line 14 (Figure 10.10): The fastest way for a writer to ensure that a reader is up to date is to force an ACKNACK response to every sample, which is achieved by piggybacking a heartbeat onto each sample sent.
Line
Line
Line
❏Suppose that the sample is dropped on its way to the reader’s middleware end; the loss is only detected when the next fast heartbeat arrives, up to alertReaderWithinThisMs later. Assuming that this retry succeeds, the time to recover the sample from the original publication time is: alertReaderWithinThisMs + 50 μs + 25 μs.
If the OS is capable of sleeping at the requested resolution, this recovery time is dominated by alertReaderWithinThisMs itself.
❏What if two packets are dropped in a row? Then the recovery time would be 2 * alertReaderWithinThisMs + 2 * 50 μs + 25 μs. If alertReaderWithinThisMs is 100 ms, the recovery time now exceeds 200 ms, which can perhaps degrade the user experience.
Line
So if we set blockingTime to about 80 ms, we will have given enough chance for recovery. Of course, in a dynamic system, a reader may drop out at any time, in which case max_heartbeat_retries will be exceeded, and the unresponsive reader will be dropped by the writer. In either case, the writer can continue writing. Inappropriate values will cause a writer to prematurely drop a temporarily unresponsive (but otherwise healthy) reader, or be stuck trying unsuccessfully to feed a crashed reader. In the unfortunate case where a reader becomes temporarily unresponsive for a duration exceeding (alertReaderWithinThisMs * max_heartbeat_retries), the writer may issue gaps to that reader when it becomes active again; the dropped samples are irrecoverable. So estimating the worst case unresponsive time of all potential readers is critical if sample drop is unacceptable.
Line
Figure 10.11 shows how to set the QoS for the reader side, followed by a line-by-line explanation.
Figure 10.11 QoS for an Aperiodic, One-at-a-Time Reader
[Code listing not preserved in this copy; surviving fragment:]
4 // 1 is ok for normal use. 2 allows fast infinite loop
Line
Line
1.The sender application writes sample 1 to the reader. The receiver application processes it and sends a response.
2.The sender application writes sample 2 to the receiving application in response to response 1. Because the reader’s queue is 2, it can accept sample 2 even though it may not yet have acknowledged sample 1. Otherwise, the reader may drop sample 2, and would have to recover it later.
3.At the same time, the receiver application acknowledges sample 1, and frees up one slot in the queue, so that it can accept sample 3, which is on its way.
The above steps can be repeated indefinitely, allowing the two applications to exchange samples in a fast, sustained loop.
Line 7 (Figure 10.11): Since we are not using keys, there is just one instance.
Line
10.3.7.3 Aperiodic, Bursty
Suppose you have aperiodically generated bursts of data, as in the case of a new aircraft approaching an airport. The data may be the same or different, but if they are written by a single writer, the challenge to this writer is to feed all readers as quickly and efficiently as possible when this burst of hundreds or thousands of samples hits the system.
❏If you use an unreliable writer to push this burst of data, some of them may be dropped over an unreliable transport such as UDP.
❏If you try to shape the burst according to however much the slowest reader can process, the system throughput may suffer, and places an additional burden of queueing the samples on the sender application.
❏If you push the data reliably as fast as it is generated, this may cost dearly in repair packets, especially to the slowest reader, which is already burdened with application chores.
Connext pull mode reliability offers an alternative in this case by letting each reader pace its own data stream. It works by notifying the reader what it is missing, then waiting for it to request only as much as it can handle. As in the aperiodic one-at-a-time case (Section 10.3.7.2), suitable writer QoS settings are presented in Figure 10.12, followed by a line-by-line explanation.
Line 1 (Figure 10.12): This is the default setting for a writer, shown here strictly for clarity.
Line 2 (Figure 10.12): Since we do not want any data lost, we want the History kind set to KEEP_ALL.
Figure 10.12 QoS for an Aperiodic, Bursty Writer
[Code listing not preserved in this copy; surviving fragments:]
5 //use these hard coded value until you use key
9 = worstBurstInSample;
13 // piggyback HB not used
25 // should be faster than the send rate, but be mindful of OS resolution
28 alertReaderWithinThisMs * 1000000;
31 // essentially turn off slow HB period
Line 3 (Figure 10.12): The default Connext reliable writer will push, but we want the reader to pull instead.
Line
Line
Line
Line 19 - Line 23 (Figure 10.12): Similar to the one-at-a-time case in Section 10.3.7.2, with a more responsive OS this time will decrease; other factors, such as the traversal time through Connext and the transport, are typically in the microseconds range (depending on the machines, of course).
For example, let’s also say that the worst case burst is 1000 samples. The writing thread will of course not block because it is merely copying each of the 1000 samples to the Connext queue on the writer side; on a typical modern machine, the act of writing these 1000 samples will probably take no more than a few ms. But it would take at least 1000/20 = 50 resend packets for the reader to catch up to the writer, or 50 times 11 ms = 550 ms. Since the burst model deals with one burst at a time, we would expect that another burst would not come within this time, and that we are allowed to block for at least this period. Including a safety margin, it would appear that we can comfortably handle a burst of 1000 every second or so.
But what if there are multiple readers? The writer would then take more time to feed multiple readers, but with a fast transport, a few more readers may only increase the 11 ms to only 12 ms or so. Eventually, however, the number of readers will justify the use of multicast. Even in pull mode, Connext supports multicast by measuring how many multicast readers have requested sample repair. If the writer does not delay response to NACK, then repairs will be sent in unicast. But a suitable NACK delay allows the writer to collect potentially NACKs from multiple readers, and feed a single multicast packet. But as discussed in Section 10.3.7.2, by delaying reply to coalesce response, we may end up waiting much longer than desired. On a Windows system with 10 ms minimum sleep achievable, the delay would add at least 10 ms to the 11 ms delay, so that the time to push 1000 samples now increases to 50 times 21 ms = 1.05 seconds. It would appear that we will not be able to keep up with incoming burst if it came at roughly 1 second, although we put fewer packets on the wire by taking advantage of multicast.
Line
Line 29 (Figure 10.12): With a fast heartbeat period of 50 ms, a writer will take 500 ms (50 ms times the default max_heartbeat_retries of 10) to give up on an unresponsive DataReader and consider it inactive.
Line
Figure 10.13 shows example code for a corresponding aperiodic, bursty reader.
Figure 10.13 QoS for an Aperiodic, Bursty Reader
[Code listing not preserved in this copy; surviving fragments:]
7 //use these hard coded value until you use key
13 // the writer probably has more for the reader; ask right away
Line
Line
Line
Line
10.3.7.4 Periodic
In a periodic reliable model, we can use the writer and the reader queue to keep the data flowing at a smooth rate. The data flows from the sending application to the writer queue, then to the transport, then to the reader queue, and finally to the receiving application. Unless the sending application or any one of the receiving applications becomes unresponsive (including a crash) for a noticeable duration, this flow should continue uninterrupted.
The latency will be low in most cases, but will be several times higher for the recovered and many subsequent samples. In the event of a disruption (e.g., loss in transport, or one of the readers becoming temporarily unresponsive), the writer’s queue level will rise, and may even block in the worst case. If the writing thread must not block, the writer’s queue must be sized sufficiently large to deal with any fluctuation in the system. Figure 10.14 shows an example writer configuration, with a line-by-line explanation below.
Line 1 (Figure 10.14): This is the default setting for a writer, shown here strictly for clarity.
Line 2 (Figure 10.14): Since we do not want any data lost, we set the History kind to KEEP_ALL.
Line 3 (Figure 10.14): This is the default setting for a writer, shown here strictly for clarity. Pushing will yield lower latency than pulling.
Line
Line
Figure 10.14 QoS for a Periodic Reliable Writer
[Code listing not preserved in this copy; surviving fragments:]
5 //use these hard coded value until you use key
9 int unresolvedSamplePerRemoteWriterMax =
10 worstCaseApplicationDelayTimeInMs * dataRateInHz / 1000;
16 int piggybackEvery = 8;
32 alertReaderWithinThisMs * 1000000;
35 // essentially turn off slow HB period
Line 12 (Figure 10.14): Even though we have sized the queue according to the worst case, there is a possibility for saving some memory in the normal case. Here, we initially size the queue to be only half of the worst case, hoping that the worst case will not occur. When it does, Connext will keep increasing the queue size as necessary to accommodate new samples, until the maximum is reached. So when our optimistic initial queue size is breached, we will incur the penalty of dynamic memory allocation. Furthermore, you will wind up using more memory, as the initially allocated memory will be orphaned (note: does not mean a memory leak or dangling pointer); if the initial queue size is M_i and the maximal queue size is M_m, where M_m = M_i * 2^n, the memory wasted in the worst case will be (M_m - 1) * sizeof(sample) bytes. Note that the memory allocation can be avoided by setting the initial queue size equal to its max value.
Line
Line
Line
One reason for the fast heartbeat period is to allow a writer to abandon inactive readers before the queue fills. If the high watermark is set equal to the queue size, the writer would not doubt the status of an unresponsive reader until the queue completely fills.
Line
Line
Line
Figure 10.15 shows how to set the QoS for a matching reader, followed by a line-by-line explanation.
Figure 10.15 QoS for a Periodic Reliable Reader
[Code listing not preserved in this copy; surviving fragments:]
6 ((2*piggybackEvery - 1) + dataRateInHz * delayInMs / 1000);
8 //use these hard coded value until you use key
Line
Line
Without such caching, a reader would have no choice but to drop all subsequent samples received until the one being sought is recovered. Connext uses speculative caching, which minimizes the disruption caused by a few dropped samples. Even for the same duration of disruption, the demand on reader queue size is greater if the writer sends more rapidly. In sizing the reader queue, we consider two factors that determine the lost sample recovery time:
❏How long it takes a reader to request a resend to the writer.
The piggyback heartbeat tells a reader about the writer’s state. If only samples between two piggybacked samples are dropped, the reader must cache piggybackEvery samples before asking the writer for a resend. But if a piggybacked sample is also lost, the reader will not get around to asking the writer until the next piggybacked sample is received. Note that in this worst case calculation, we are ignoring the transport latency of the heartbeat and ACKNACK messages themselves.
❏How long it takes for the writer to respond to the request.
Even ignoring the flight time of the resend request through the transport, the writer takes a finite time to respond to the repair request.
Line
Line
Chapter 11 Collaborative DataWriters
The Collaborative DataWriters feature allows you to have multiple DataWriters publishing samples from a common logical data source. The DataReaders will combine the samples coming from these DataWriters in order to reconstruct the correct order in which they were produced at the source. This combination process for the DataReaders can be configured using the AVAILABILITY QosPolicy (DDS Extension) (Section 6.5.1). It requires the middleware to provide a way to uniquely identify every sample published in a domain independently of the actual DataWriter that published the sample.
In Connext, every modification (sample) to the global dataspace made by a DataWriter within a domain is identified by a pair (virtual GUID, sequence number).
❏The virtual GUID (Global Unique Identifier) is a 16-byte identifier associated with the logical data source. DataWriters can be assigned a virtual GUID independently of their physical (RTPS) GUID.
❏The virtual sequence number is a 64-bit integer that identifies the order of the change within the logical data source.
Several DataWriters can be configured with the same virtual GUID. If each of these DataWriters publishes a sample with sequence number '0', the sample will only be received once by the DataReaders subscribing to the content published by the DataWriters (see Figure 11.1).
Figure 11.1 Global Dataspace Changes
11.1 Collaborative DataWriters Use Cases
❏Ordered delivery of samples in high availability scenarios
One example of this is RTI Persistence Service1. When a DataReader requests durable samples, it may receive them both from the original DataWriter and from a Persistence Service DataWriter; the Collaborative DataWriters feature lets the DataReader combine the two streams, dropping duplicates and preserving the order in which the samples were produced at the source.
❏Ordered delivery of samples in load-balanced scenarios
Multiple instances of the same application can work together to process and deliver samples. When the samples arrive through different paths, the DataReader reconstructs the order in which they were produced at the source.
❏Ordered delivery of samples with Group Ordered Access
The Collaborative DataWriters feature can also be used to configure the sample ordering process when the Subscriber is configured with PRESENTATION QosPolicy (Section 6.4.6) access_scope set to GROUP. In this case, the Subscriber must deliver in order the samples published by a group of DataWriters that belong to the same Publisher and have access_scope set to GROUP.
Figure 11.2
1. For more information on Persistence Service, see Part 6: RTI Persistence Service.
11.2 Sample Combination (Synchronization) Process in a DataReader
A DataReader will deliver a sample (VGUIDn, VSNm) to the application only if one of the following conditions is satisfied:
❏(VGUIDn, VSNm-1) has already been delivered to the application.
❏All the known DataWriters publishing VGUIDn have announced that they do not have (VGUIDn, VSNm-1).
❏None of the known DataWriters publishing VGUIDn have announced potential availability of (VGUIDn, VSNm-1), and the waiting times configured in the AVAILABILITY QosPolicy have expired.
For additional details on how the reconstruction process works see the AVAILABILITY QosPolicy (DDS Extension) (Section 6.5.1).
11.3 Configuring Collaborative DataWriters
11.3.1 Associating Virtual GUIDs with Data Samples
There are two ways to associate a virtual GUID with the samples published by a DataWriter.
❏Per DataWriter: Using virtual_guid in DATA_WRITER_PROTOCOL QosPolicy (DDS Extension) (Section 6.5.3).
❏Per Sample: By setting the writer_guid in the identity field of the WriteParams_t structure provided to the write_w_params operation (see Writing Data (Section 6.3.8)). Since the writer_guid can be set per sample, the same DataWriter can potentially write samples from independent logical data sources. One example of this is RTI Persistence Service where a single persistence service DataWriter can write samples on behalf of multiple original DataWriters.
11.3.2 Associating Virtual Sequence Numbers with Data Samples
You can associate a virtual sequence number with a sample published by a DataWriter by setting the sequence_number in the identity field of the WriteParams_t structure provided to the write_w_params operation (see Writing Data (Section 6.3.8)). Virtual sequence numbers for a given virtual GUID must be strictly monotonically increasing. If you try to write a sample with a sequence number less than or equal to the last sequence number, the write operation will fail.
11.3.3 Specifying which DataWriters will Deliver Samples to the DataReader from a Logical Data Source
The required_matched_endpoint_groups field in the AVAILABILITY QosPolicy (DDS Extension) (Section 6.5.1) can be used to specify the set of DataWriter groups that are expected to provide samples for the same data source (virtual GUID). The quorum count in a group represents the number of DataWriters that must be discovered for that group before the DataReader is allowed to provide non-consecutive samples to the application.
A DataWriter becomes a member of an endpoint group by configuring the role_name in ENTITY_NAME QosPolicy (DDS Extension) (Section 6.5.9).
11.3.4 Specifying How Long to Wait for a Missing Sample
A DataReader’s AVAILABILITY QosPolicy (DDS Extension) (Section 6.5.1) specifies how long to wait for a missing sample. For example, this is important when the first sample is received: how long do you wait to determine the lowest sequence number available in the system?
❏The max_data_availability_waiting_time defines how much time to wait before delivering a sample to the application without having received some of the previous samples.
❏The max_endpoint_availability_waiting_time defines how much time to wait to discover DataWriters providing samples for the same data source (virtual GUID).
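A hypothetical XML QoS sketch combining the role_name and waiting-time settings above. The element names, nesting, and the profile and group names here are assumptions based on the policy and field names in this chapter; verify them against the RTI XML QoS configuration reference before use:

```xml
<!-- Hypothetical sketch; verify element names against the RTI XML QoS
     configuration reference. -->
<qos_profile name="CollaborativeWriters_Sketch">
    <datawriter_qos>
        <entity_name>
            <!-- endpoint group membership (Section 11.3.3) -->
            <role_name>BACKUP_WRITERS</role_name>
        </entity_name>
    </datawriter_qos>
    <datareader_qos>
        <availability>
            <!-- wait at most 0.5 s before delivering past a missing sample -->
            <max_data_availability_waiting_time>
                <sec>0</sec>
                <nanosec>500000000</nanosec>
            </max_data_availability_waiting_time>
            <!-- wait at most 2 s for expected DataWriters to be discovered -->
            <max_endpoint_availability_waiting_time>
                <sec>2</sec>
                <nanosec>0</nanosec>
            </max_endpoint_availability_waiting_time>
        </availability>
    </datareader_qos>
</qos_profile>
```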
11.4 Collaborative DataWriters and Persistence Service
The DataWriters created by Persistence Service are automatically configured to collaborate:
❏Every sample published by the Persistence Service DataWriter keeps its original identity.
❏Persistence Service associates the role name PERSISTENCE_SERVICE with all the DataWriters that it creates. You can overwrite that setting by changing the DataWriter QoS configuration in persistence service.
For more information, see Part 6: RTI Persistence Service.
Chapter 12 Mechanisms for Achieving Information Durability and Persistence
12.1 Introduction
Connext offers the following mechanisms for achieving durability and persistence:
❏Durable Writer History This feature allows a DataWriter to persist its historical cache, perhaps locally, so that it can survive shutdowns, crashes, and restarts. When an application restarts, each DataWriter that has been configured to have durable writer history automatically loads all of the data in this cache from disk and can carry on sending data as if it had never stopped executing. To the rest of the system, it will appear as if the DataWriter had been temporarily disconnected from the network and then reappeared.
❏Durable Reader State This feature allows a DataReader to persist its state and remember which data it has already received. When an application restarts, each DataReader that has been configured to have durable reader state automatically loads its state from disk and can carry on receiving data as if it had never stopped executing. Data that had already been received by the DataReader before the restart will be suppressed so that it is not even sent over the network.
❏Data Durability This feature is a full implementation of the OMG DDS Persistence Profile. The DURABILITY QosPolicy (Section 6.5.7) allows an application to configure a DataWriter so that the information written by the DataWriter survives beyond the lifetime of the DataWriter. In this manner, a
These features can be configured separately or in combination. To use Durable Writer History and Durable Reader State, you need a relational database, which is not included with Connext. Supported databases are listed in the Release Notes. Persistence Service does not require a database when used in TRANSIENT mode (see Section 12.5.1) or in PERSISTENT mode with file-based storage.
To understand how these features interact we will examine the behavior of the system using the following scenarios:
❏Scenario 1. DataReader Joins after DataWriter Restarts (Durable Writer History) (Section 12.1.1)
❏Scenario 2: DataReader Restarts While DataWriter Stays Up (Durable Reader State) (Section 12.1.2)
❏Scenario 3. DataReader Joins after DataWriter Leaves Domain (Durable Data) (Section 12.1.3)
12.1.1 Scenario 1. DataReader Joins after DataWriter Restarts (Durable Writer History)
In this scenario, a DomainParticipant joins the domain, creates a DataWriter and writes some data, then the DataWriter shuts down (gracefully or due to a fault). The DataWriter restarts and a DataReader joins the domain. Depending on whether the DataWriter is configured with durable history, the late-joining DataReader may or may not receive the data published before the restart. This is illustrated in Figure 12.1.
Figure 12.1 Durable Writer History
[Figure content: two timelines in which a DataWriter publishes samples "a" and "b", restarts, and a DataReader then joins the domain.]
Without Durable Writer History: the restarted DataWriter loses its historical cache, so the late-joining DataReader does not receive samples "a" and "b".
With Durable Writer History: the restarted DataWriter will recover its history and deliver its data to the late-joining DataReader.
12.1.2 Scenario 2: DataReader Restarts While DataWriter Stays Up (Durable Reader State)
In this scenario, two DomainParticipants join a domain; one creates a DataWriter and the other a DataReader on the same Topic. The DataWriter publishes some data ("a" and "b") that is received by the DataReader. After this, the DataReader shuts down (gracefully or due to a fault) and then
Depending on whether the DataReader is configured with Durable Reader State, the DataReader may or may not receive a duplicate copy of the data it received before it restarted. This is illustrated in Figure 12.2. For more information, see Durable Reader State (Section 12.4).
Figure 12.2 Durable Reader State
[Figure content: two timelines in which the DataWriter publishes "a" and "b" and the DataReader restarts.]
Without Durable Reader State: the DataReader will again receive the data ("a" and "b") that it already received before the restart.
With Durable Reader State: the DataReader remembers that it already received the data and does not request it again.
12.1.3 Scenario 3. DataReader Joins after DataWriter Leaves Domain (Durable Data)
In this scenario, a DomainParticipant joins a domain, creates a DataWriter, publishes some data on a Topic and then shuts down (gracefully or due to a fault). Later, a DataReader joins the domain and subscribes to the data. Persistence Service is running.
Depending on whether Durable Data is enabled for the Topic, the DataReader may or may not receive the data previously published by the DataWriter. This is illustrated in Figure 12.3. For more information, see Data Durability (Section 12.5).
Figure 12.3 Durable Data
[Figure content: two timelines in which the DataWriter publishes "a" and "b" and then leaves the domain before the DataReader joins.]
Without Durable Data: the late-joining DataReader does not receive samples "a" and "b".
With Durable Data: Persistence Service remembers what data was published and delivers it to the late-joining DataReader.
This third scenario is similar to Scenario 1. DataReader Joins after DataWriter Restarts (Durable Writer History) (Section 12.1.1) except that in this case the DataWriter does not need to restart for the DataReader to get the data previously written by the DataWriter. This is because Persistence Service acts as an intermediary that stores the data so it can be given to
12.2 Durability and Persistence Based on Virtual GUIDs
Every modification to the global dataspace made by a DataWriter is identified by a pair (virtual GUID, sequence number).
❏The virtual GUID (Global Unique Identifier) is a 16-byte identifier associated with a logical data source.
❏The sequence number is a 64-bit integer that identifies changes published by a specific DataWriter.
Several DataWriters can be configured with the same virtual GUID. If each of these DataWriters publishes a sample with sequence number '0', the sample will only be received once by the DataReaders subscribing to the content published by the DataWriters (see Figure 12.4).
Figure 12.4 Global Dataspace Changes
[Figure content: two DataWriters configured with the same virtual GUID (vg: 1) each publish the sample (vg: 1, sn: 0); the DataReader delivers that sample to the application only once. A third DataWriter with virtual GUID vg: 2 publishes (vg: 2, sn: 0), which the DataReader delivers as a distinct sample.]
Additionally, Connext uses the virtual GUID to associate a persisted state (state in permanent storage) to the corresponding Entity.
For example, the history of a DataWriter will be persisted in a database table with a name generated from the virtual GUID of the DataWriter. If the DataWriter is restarted, it must have associated the same virtual GUID to restore its previous history.
Likewise, the state of a DataReader will be persisted in a database table whose name is generated from the DataReader virtual GUID (see Figure 12.5).
Figure 12.5 History/State Persistence Based on the Virtual GUID
[Figure content: a DataWriter with virtual GUID vg: 1 persists its history to a database table whose name is derived from that virtual GUID; a DataReader with virtual GUID vg: 1 persists its state to a table whose name is derived from its own virtual GUID.]
A DataWriter’s virtual GUID can be configured using the member virtual_guid in the DATA_WRITER_PROTOCOL QosPolicy (DDS Extension) (Section 6.5.3).
A DataReader’s virtual GUID can be configured using the member virtual_guid in the DATA_READER_PROTOCOL QosPolicy (DDS Extension) (Section 7.6.1).
The DDS_PublicationBuiltinTopicData and DDS_SubscriptionBuiltinTopicData structures include the virtual GUID associated with the discovered publication or subscription (see
12.3 Durable Writer History
The DURABILITY QosPolicy (Section 6.5.7) controls whether or not, and how, published samples are stored by the DataWriter application for DataReaders that are found after the samples were initially written. The samples stored by the DataWriter constitute the DataWriter’s history.
Connext provides the capability to make the DataWriter history durable, by persisting its content in a relational database. This makes it possible for the history to be restored when the DataWriter restarts. See the Release Notes for the list of supported relational databases.
The association between the history stored in the database and the DataWriter is done using the virtual GUID.
12.3.1 Durable Writer History Use Case
The following use case describes the durable writer history functionality:
1. A DataReader receives two samples with sequence numbers 1 and 2 published by a DataWriter with virtual GUID 1.
[Figure content: the DataWriter (vg: 1) publishes samples 1 and 2, which are received by a DataReader (vg: 1).]
2. The process running the DataWriter is stopped, and a new DataReader (vg: 2) joins the domain.
The new DataReader with virtual GUID 2 does not receive samples 1 and 2 because the original DataWriter has been destroyed. If the samples must be available to
DataReaders after the DataWriter deletion, you can use Persistence Service, described in Chapter 26: Introduction to RTI Persistence Service.
3. The DataWriter is restarted using the same virtual GUID.
[Figure content: the restarted DataWriter (vg: 1) restores samples 1 and 2 from the database and delivers them to the DataReader with vg: 2.]
After being restarted, the DataWriter restores its history. The new DataReader (vg: 2) receives samples 1 and 2; the DataReader with vg: 1 does not receive them again.
4. The DataWriter publishes two new samples.
[Figure content: the DataWriter (vg: 1) publishes samples 3 and 4, which are received by both DataReaders (vg: 1 and vg: 2).]
The two new samples with sequence numbers 3 and 4 will be received by both DataReaders.
12.3.2 How To Configure Durable Writer History
Connext allows a DataWriter’s history to be stored in a relational database that provides an ODBC driver.
For each DataWriter history that is configured to be durable, Connext will create a maximum of two tables:
❏The first table is used to store the samples associated with the writer history. The name of that table is WS<32 uuencoding of the writer virtual GUID>.
❏The second table is only created for keyed topics; it stores the instances (key values) associated with the writer history.
To configure durable writer history, use the PROPERTY QosPolicy (DDS Extension) (Section 6.5.17) associated with DataWriters and DomainParticipants.
A ‘durable writer history’ property defined in the DomainParticipant will be applicable to all the DataWriters belonging to the DomainParticipant unless it is overwritten by the DataWriter. Table 12.1 lists the supported ‘durable writer history’ properties.
Table 12.1 Durable Writer History Properties

dds.data_writer.history.plugin_name
Required. Must be set to "dds.data_writer.history.odbc_plugin.builtin" to enable durable writer history in the DataWriter.

dds.data_writer.history.odbc_plugin.dsn
Required. The ODBC DSN (Data Source Name) associated with the database where the writer history must be persisted.

dds.data_writer.history.odbc_plugin.driver
Tells Connext which ODBC driver to load. If the property is not specified, Connext will try to use the standard ODBC driver manager library (UnixOdbc on UNIX/Linux systems, the Windows ODBC driver manager on Windows systems).

dds.data_writer.history.odbc_plugin.username
dds.data_writer.history.odbc_plugin.password
Configure the username/password used to connect to the database. Default: no username or password.

dds.data_writer.history.odbc_plugin.shared
When set to 1, Connext will create a single connection per DSN that will be shared across DataWriters within the same Publisher. A DataWriter can be configured to create its own database connection by setting this property to 0 (the default).

dds.data_writer.history.odbc_plugin.instance_cache_max_size
dds.data_writer.history.odbc_plugin.instance_cache_init_size
dds.data_writer.history.odbc_plugin.sample_cache_max_size
dds.data_writer.history.odbc_plugin.sample_cache_init_size
These properties configure the resource limits associated with the ODBC writer history caches. To minimize the number of accesses to the database, Connext uses two caches, one for samples and one for instances. The initial size and the maximum size of these caches are configured using these properties. The resource limits initial_instances, max_instances, initial_samples, max_samples, and max_samples_per_instance defined in the RESOURCE_LIMITS QosPolicy (Section 6.5.20) are used to configure the maximum number of samples and instances that can be stored in the relational database.
Defaults:
❏ instance_cache_max_size: max_instances in the RESOURCE_LIMITS QosPolicy
❏ instance_cache_init_size: initial_instances in the RESOURCE_LIMITS QosPolicy
❏ sample_cache_max_size: 32
❏ sample_cache_init_size: 32
Note: If the property in_memory_state (see below in this table) is 1, then instance_cache_max_size is always equal to max_instances in the RESOURCE_LIMITS QosPolicy and cannot be changed.

dds.data_writer.history.odbc_plugin.restore
This property indicates whether or not the persisted writer history must be restored once the DataWriter is restarted. If this property is 0, the content of the database associated with the DataWriter being restarted will be deleted. If it is 1, the DataWriter will restore its previous state from the database content. Default: 1.

dds.data_writer.history.odbc_plugin.in_memory_state
This property determines how much state will be kept in memory by the ODBC writer history in order to avoid accessing the database. If this property is 1, then the property instance_cache_max_size (see above in this table) is always equal to max_instances in the RESOURCE_LIMITS QosPolicy. In addition, the ODBC writer history will keep in memory a fixed state overhead of 24 bytes per sample. This mode provides the best ODBC writer history performance. However, the restore operation will be slower and the maximum number of samples that the writer history can manage is limited by the available physical memory. If it is 0, all the state will be kept in the underlying database. In this mode, the maximum number of samples in the writer history is not limited by the physical memory available. Default: 1.
Note: Durable Writer History is not supported for
See also: Durable Reader State (Section 12.4).
Example C++ Code
/* Get default QoS */
...
retcode = DDSPropertyQosPolicyHelper::add_property(
        writerQos.property,
        "dds.data_writer.history.plugin_name",
        "dds.data_writer.history.odbc_plugin.builtin",
        DDS_BOOLEAN_FALSE);
if (retcode != DDS_RETCODE_OK) {
    /* Report error */
}
retcode = DDSPropertyQosPolicyHelper::add_property(
        writerQos.property,
        "dds.data_writer.history.odbc_plugin.dsn",
        "<user DSN>",
        DDS_BOOLEAN_FALSE);
if (retcode != DDS_RETCODE_OK) {
    /* Report error */
}
retcode = DDSPropertyQosPolicyHelper::add_property(
        writerQos.property,
        "dds.data_writer.history.odbc_plugin.driver",
        "<ODBC library>",
        DDS_BOOLEAN_FALSE);
if (retcode != DDS_RETCODE_OK) {
    /* Report error */
}
retcode = DDSPropertyQosPolicyHelper::add_property(
        writerQos.property,
        "dds.data_writer.history.odbc_plugin.shared",
        "<0|1>",
        DDS_BOOLEAN_FALSE);
if (retcode != DDS_RETCODE_OK) {
    /* Report error */
}
/* Create Data Writer */
...
12.4 Durable Reader State
Durable reader state allows a DataReader to locally store its state on disk and remember the data that has already been processed by the application1. When an application restarts, each DataReader configured to have durable reader state automatically reads its state from disk. Data that has already been processed by the application before the restart will not be provided to the application again.
Important: The DataReader does not persist the full contents of the data in its historical cache; it only persists an identification (e.g. sequence numbers) of the data the application has processed. This distinction is not meaningful if your application always uses the ‘take’ methods to access your data, since these methods remove the data from the cache at the same time they deliver it to your application. (See Read vs. Take (Section 7.4.3.1)) However, if your application uses the ‘read’ methods, leaving the data in the DataReader's cache after you've accessed it for the first time, those previously viewed samples will not be restored to the DataReader's cache in the event of a restart.
Connext requires a relational database to persist the state of a DataReader. This database is accessed using ODBC. See the Release Notes for the list of supported relational databases.
12.4.1 Durable Reader State With Protocol Acknowledgment
For each DataReader configured to have durable state, Connext will create one database table with the following naming convention: RS<32 uuencoding of the reader virtual GUID>. This table will store the last sequence number processed from each virtual GUID. For DataReaders on keyed topics, this state is kept per (instance, virtual GUID).
1. The circumstances under which a data sample is considered “processed by the application” are described in the sections that follow.
Criteria to consider a sample “processed by the application”:
❏For the read/take methods that require calling return_loan(), a sample 's1' with sequence number 's1_seq_num' and virtual GUID ‘vg1’ is considered processed by the application when the DataReader’s return_loan() operation is called for sample 's1' or any other sample with the same virtual GUID and a sequence number greater than 's1_seq_num'. For example:
retcode = reader->take(data_seq, info_seq, DDS_LENGTH_UNLIMITED,
                       DDS_ANY_SAMPLE_STATE, DDS_ANY_VIEW_STATE,
                       DDS_ANY_INSTANCE_STATE);
if (retcode == DDS_RETCODE_NO_DATA) {
    return;
} else if (retcode != DDS_RETCODE_OK) {
    /* Report error */
    return;
}
for (int i = 0; i < data_seq.length(); ++i) {
    /* Operate with the data */
}
/* Return the loan */
retcode = reader->return_loan(data_seq, info_seq);
if (retcode != DDS_RETCODE_OK) {
    /* Report error */
}
/* At this point the samples contained in data_seq will be considered as received. If the DataReader restarts, the samples will not be received again */
❏For the read/take methods that do not require calling return_loan(), a sample 's1' with sequence number 's1_seq_num' and virtual GUID ‘vg1’ will be considered processed after the application reads or takes the sample 's1' or any other sample with the same virtual GUID and with a sequence number greater than 's1_seq_num'. For example:
retcode = reader->take_next_sample(data, info);
/* At this point the sample contained in data will be considered as received. All the samples with a sequence number smaller than the sequence number associated with data will also be considered as received. If the DataReader restarts these sample will not be received again */
Important: If you access the samples in the DataReader cache out of order (for example, by reading different instances), samples with a sequence number lower than the last processed one for a given virtual GUID are also considered processed, even if the application has not actually accessed them.
12.4.1.1 Bandwidth Utilization
To optimize network usage, if a DataReader configured with durable reader state is restarted and it discovers a DataWriter with a virtual GUID ‘vg’, the DataReader will ACK all the samples with a sequence number smaller than ‘sn’, where ‘sn’ is the first sequence number that has not been processed by the application for ‘vg’.
Notice that the previous algorithm can significantly reduce the number of duplicates on the wire. However, it does not suppress them completely in the case of keyed DataReaders where the durable state is kept per (instance, virtual GUID). In this case, and assuming that the application has read samples out of order (e.g., by reading different instances), the ACK is sent for the
lowest sequence number processed across all instances and may cause samples already processed to flow on the network again. These redundant samples waste bandwidth, but they will be dropped by the DataReader and not be delivered to the application.
12.4.2 Durable Reader State with Application Acknowledgment
This section assumes you are familiar with the concept of Application Acknowledgment as described in Section 6.3.12.
For each DataReader configured to be durable and that uses application acknowledgement (see Section 6.3.12), Connext will create one database table with the following naming convention:
RS<32 uuencoding of the reader virtual GUID>. This table will store the list of sequence number intervals that have been acknowledged for each virtual GUID. The size of the column that stores the sequence number intervals is limited to 32767 bytes. If this size is exceeded for a given virtual GUID, the operation that persists the DataReader state into the database will fail.
12.4.2.1 Bandwidth Utilization
To optimize network usage, if a DataReader configured with durable reader state is restarted and it discovers a DataWriter with a virtual GUID ‘vg’, the DataReader will send an APP_ACK message with all the samples that were acknowledged before the restart.
Notice that this algorithm can significantly reduce the number of duplicates on the wire. However, it does not suppress them completely since the DataReader may send a NACK and receive some samples from the DataWriter before the DataWriter receives the APP_ACK message.
12.4.3 Durable Reader State Use Case
The following use case describes the durable reader state functionality:
1. A DataReader receives two samples with sequence numbers 1 and 2 published by a DataWriter with virtual GUID 1. The application takes those samples.
[Figure content: the DataWriter (vg: 1) publishes samples 1 and 2; the DataReader (vg: 1) takes them.]
2. After the application returns the loan on samples 1 and 2, the DataReader considers them as processed and it persists the state change.
[Figure content: after the loan on samples 1 and 2 is returned, the DataReader persists its state: (dw vg: 1, last sn: 2).]
3. The process running the DataReader is stopped.
4. The DataReader is restarted.
[Figure content: the restarted DataReader (vg: 1) restores its persisted state (dw vg: 1, last sn: 2) from the database.]
Because all the samples with sequence numbers less than or equal to 2 were considered received, the DataReader will not request them again from the DataWriter.
12.4.4 How To Configure a DataReader for Durable Reader State
To configure a DataReader with durable reader state, use the PROPERTY QosPolicy (DDS Extension) (Section 6.5.17) associated with DataReaders and DomainParticipants.
A property defined in the DomainParticipant will be applicable to all the DataReaders contained in the participant unless it is overwritten by the DataReaders. Table 12.2 lists the supported properties.
Table 12.2 Durable Reader State Properties

dds.data_reader.state.odbc.dsn
Required. The ODBC DSN (Data Source Name) associated with the database where the DataReader state must be persisted.

dds.data_reader.state.filter_redundant_samples
To enable durable reader state, this property must be set to 1. When set to 0, the reader state is not maintained and Connext does not filter duplicate samples that may be coming from the same virtual writer. Default: 1.

dds.data_reader.state.odbc.driver
This property indicates which ODBC driver to load. If the property is not specified, Connext will try to use the standard ODBC driver manager library (UnixOdbc on UNIX/Linux systems, the Windows ODBC driver manager on Windows systems).

dds.data_reader.state.odbc.username
dds.data_reader.state.odbc.password
These two properties configure the username and password used to connect to the database. Default: no username or password.

dds.data_reader.state.restore
This property indicates whether or not the persisted DataReader state must be restored once the DataReader is restarted. If this property is 0, the previous state will be deleted from the database. If it is 1, the DataReader will restore its previous state from the database content. Default: 1.

dds.data_reader.state.checkpoint_frequency
This property controls how often the reader state is stored into the database. A value of N means store the state once every N samples. A larger value provides better performance; however, if the reader is restarted it may receive some duplicate samples. These samples will be filtered by Connext and will not be propagated to the application. Default: 1.

dds.data_reader.state.persistence_service.request_depth
This property indicates how many of the most recent historical samples the persisted DataReader wants to receive upon start-up. Default: 0.
Example (C++ code):
/* Get default QoS */
...
retcode = DDSPropertyQosPolicyHelper::add_property(
        readerQos.property,
        "dds.data_reader.state.odbc.dsn",
        "<user DSN>",
        DDS_BOOLEAN_FALSE);
if (retcode != DDS_RETCODE_OK) {
    /* Report error */
}
retcode = DDSPropertyQosPolicyHelper::add_property(
        readerQos.property,
        "dds.data_reader.state.odbc.driver",
        "<ODBC library>",
        DDS_BOOLEAN_FALSE);
if (retcode != DDS_RETCODE_OK) {
    /* Report error */
}
retcode = DDSPropertyQosPolicyHelper::add_property(
        readerQos.property,
        "dds.data_reader.state.restore",
        "<0|1>",
        DDS_BOOLEAN_FALSE);
if (retcode != DDS_RETCODE_OK) {
    /* Report error */
}
/* Create Data Reader */
...
12.5 Data Durability
The data durability feature is an implementation of the OMG DDS Persistence Profile. The DURABILITY QosPolicy (Section 6.5.7) allows an application to configure a DataWriter so that the information written by the DataWriter survives beyond the lifetime of the DataWriter.
Connext implements TRANSIENT and PERSISTENT durability using an external service called Persistence Service, available for purchase as a separate RTI product.
Persistence Service receives information from DataWriters configured with TRANSIENT or PERSISTENT durability and makes that information available to late-joining DataReaders.
The samples published by a DataWriter can be made durable by setting the kind field of the DURABILITY QosPolicy (Section 6.5.7) to one of the following values:
❏DDS_TRANSIENT_DURABILITY_QOS: Connext will store previously published samples in memory using Persistence Service, which will send the stored data to newly discovered DataReaders.
❏DDS_PERSISTENT_DURABILITY_QOS: Connext will store previously published samples in permanent storage, like a disk, using Persistence Service, which will send the stored data to newly discovered DataReaders.
A DataReader can request TRANSIENT or PERSISTENT data by setting the kind field of the corresponding DURABILITY QosPolicy (Section 6.5.7). A DataReader requesting PERSISTENT data will not receive data from DataWriters or Persistence Service applications that are configured with TRANSIENT durability.
12.5.1 RTI Persistence Service
Persistence Service is a Connext application that is configured to persist topic data. Persistence Service is included with Connext Messaging. For each one of the topics that must be persisted for a specific domain, the service will create a DataWriter (known as PRSTDataWriter) and a DataReader (known as PRSTDataReader). The samples received by the PRSTDataReaders will be published by the corresponding PRSTDataWriters to be available for late-joining DataReaders.
For more information on Persistence Service, please see:
❏Chapter 26: Introduction to RTI Persistence Service
❏Chapter 27: Configuring Persistence Service
❏Chapter 28: Running RTI Persistence Service
Persistence Service can be configured to operate in PERSISTENT or TRANSIENT mode:
❏TRANSIENT mode The PRSTDataReaders and PRSTDataWriters will be created with TRANSIENT durability and Persistence Service will keep the received samples in memory. Samples published by a TRANSIENT DataWriter will survive the DataWriter lifecycle but will not survive the lifecycle of Persistence Service (unless you are running multiple copies).
❏PERSISTENT mode The PRSTDataWriters and PRSTDataReaders will be created with PERSISTENT durability and Persistence Service will store the received samples in files or in an external relational database. Samples published by a PERSISTENT DataWriter will survive the DataWriter lifecycle as well as any restarts of Persistence Service.
By default, a PERSISTENT/TRANSIENT DataReader will receive samples directly from the original DataWriter if it is still alive. In this scenario, the DataReader may also receive the same samples from Persistence Service. Duplicates will be discarded at the middleware level. This peer-to-peer communication pattern is illustrated in Figure 12.6.
Figure 12.6 Peer-to-Peer Communication
[Figure: the DataWriter delivers each sample (vg: 1, sn: 0) directly to the DataReader and, in parallel, to RTI Persistence Service; the application only receives one sample.]
Relay Communication:
A PERSISTENT/TRANSIENT DataReader may also be configured to not receive samples from the original DataWriter. In this case the traffic is relayed by Persistence Service. This ‘relay communication’ pattern is illustrated in Figure 12.7. To use relay communication, set the direct_communication field in the DURABILITY QosPolicy (Section 6.5.7) to FALSE. A PERSISTENT/TRANSIENT DataReader will then receive all the information from Persistence Service.
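Relay communication would be enabled on the DataReader side by a fragment such as the following hypothetical XML QoS snippet (element names should be checked against your Connext version’s documentation):

```xml
<datareader_qos>
    <durability>
        <kind>TRANSIENT_DURABILITY_QOS</kind>
        <!-- Receive samples only via Persistence Service, not directly -->
        <direct_communication>false</direct_communication>
    </durability>
</datareader_qos>
```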
Figure 12.7 Relay Communication
[Figure: the DataWriter delivers samples (vg: 1, sn: 0) to RTI Persistence Service, which relays them to the DataReader.]
Chapter 13 Guaranteed Delivery of Data
13.1 Introduction
Some application scenarios need to ensure that the information produced by certain producers is delivered to all the intended consumers. This chapter describes the mechanisms available in Connext to guarantee the delivery of information from producers to consumers such that the delivery is robust to many kinds of failures in the infrastructure, deployment, and even the producing/consuming applications themselves.
Guaranteed information delivery is not the same as protocol-level reliability (Chapter 10) or information durability (Chapter 12). There are two important differences:
❏With protocol-level reliability alone, the producing application knows that the information was received by the middleware on the consuming side. However, it has no way to know whether the consuming application actually read that information or was able to process it successfully.
❏With information durability alone, there is no way to specify or characterize the intended consumers of the information. Therefore the infrastructure has no way to know when the information has been consumed by all the intended recipients. The information may be persisted such that it is not lost and is available to future applications, but the infrastructure and producing applications have no way to know that all the intended consumers have joined the system, received the information, and processed it successfully.
The guaranteed data-delivery mechanism provided by Connext overcomes these limitations by combining the following features:
❏Required subscriptions. This feature provides a way to configure, identify and detect the applications that are intended to consume the information. See Required Subscriptions (Section 6.3.13).
❏Application acknowledgment. This feature provides a way for the infrastructure to know that the information was not only received, but also read and successfully processed by the consuming application. See Application Acknowledgment (Section 6.3.12).
❏Durable subscriptions. This feature leverages the RTI Persistence Service to persist samples intended for the required subscriptions such that they are delivered even if the originating application is not available. See Configuring Durable Subscriptions in Persistence Service (Section 27.9).
These features used in combination with the mechanisms provided for Information Durability and Persistence (see Chapter 12: Mechanisms for Achieving Information Durability and Persistence) enable the creation of applications where the information delivery is guaranteed despite application and infrastructure failures. Scenarios (Section 13.2) describes various scenarios in which these features can be combined.
When implementing an application that needs guaranteed data delivery, we have to consider three key aspects:
Key Aspects to Consider and Related Features and QoS:
❏Identifying the required consumers of information
• Required subscriptions
• Durable subscriptions
• EntityName QoS policy
• Availability QoS policy
❏Ensuring the intended consumer applications process the data successfully
• Acknowledgment by a quorum of required and durable subscriptions
• Reliability QoS policy (acknowledgment mode)
• Availability QoS policy
❏Ensuring information is available to late-joining applications
• Persistence Service
• Durable Subscriptions
• Durability QoS policy
• Durable Writer History
13.1.1 Identifying the Required Consumers of Information
The first step towards ensuring that information is processed by the intended consumers is the ability to specify and recognize those intended consumers. This is done using the required subscriptions feature (Required Subscriptions (Section 6.3.13)), configured via the ENTITY_NAME QosPolicy (DDS Extension) (Section 6.5.9) and AVAILABILITY QosPolicy (DDS Extension) (Section 6.5.1).
Connext DDS DataReader entities (as well as DataWriter and DomainParticipant entities) can have a name and a role_name. These names are configured using the ENTITY_NAME QosPolicy (DDS Extension) (Section 6.5.9), which is propagated via DDS discovery and is available as part of the builtin-topic data for the entity (see Chapter 16: Built-In Topics).
The DDS DomainParticipant, DataReader and DataWriter entities created by RTI services and tools have their role_name set to values that identify the service. For example:
DataReaders created by RTI Persistence Service have their role_name set to “PERSISTENCE_SERVICE”.
Unless explicitly set by the user, the DomainParticipant, DataReader and DataWriter entities created by Connext applications have their name and role_name left unset.
Connext uses the role_name of DataReaders to identify the consumer’s logical function. For this reason Connext’s required subscriptions feature relies on the role_name to identify intended consumers of information. The use of the DataReader’s role_name instead of the name is intentional. From the point of view of the information producer, the important thing is not the
concrete DataReader (identified by its name, for example, “Logger123”) but rather its logical function in the system (identified by its role_name, for example “LoggingService”).
A DataWriter that needs to ensure its information is delivered to all the intended consumers uses the AVAILABILITY QosPolicy (DDS Extension) (Section 6.5.1) to configure the role names of the consumers that must receive the information.
The AVAILABILITY QoS Policy set on a DataWriter lets an application configure the required consumers of the data produced by the DataWriter. The required consumers are specified in the required_matched_endpoint_groups attribute within the AVAILABILITY QoS Policy. This attribute is a sequence of DDS EndpointGroup structures. Each EndpointGroup represents a required information consumer characterized by the consumer’s role_name and quorum. The role_name identifies a logical consumer; the quorum specifies the minimum number of consumers with that role_name that must acknowledge the sample before the DataWriter can consider it delivered to that required consumer.
For example, an application that wants to ensure data written by a DataWriter is delivered to at least two Logging Services and one Display Service would configure the DataWriter’s AVAILABILITY QoS Policy with a required_matched_endpoint_groups consisting of two elements. The first element would specify a required consumer with the role_name “LoggingService” and a quorum of 2. The second element would specify a required consumer with the role_name “DisplayService” and a quorum of 1. Furthermore, the application would set the logging service DataReader ENTITY_NAME policy to have a role_name of “LoggingService” and similarly the display service DataReader ENTITY_NAME policy to have the role_name of “DisplayService.”
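The example above might be sketched in XML as follows. This is a hypothetical fragment; the element names follow RTI’s usual XML QoS conventions but should be confirmed for your version:

```xml
<!-- DataWriter: require two LoggingService consumers and one DisplayService. -->
<datawriter_qos>
    <availability>
        <required_matched_endpoint_groups>
            <element>
                <role_name>LoggingService</role_name>
                <quorum_count>2</quorum_count>
            </element>
            <element>
                <role_name>DisplayService</role_name>
                <quorum_count>1</quorum_count>
            </element>
        </required_matched_endpoint_groups>
    </availability>
</datawriter_qos>
<!-- On each logging-service DataReader, identify its logical function: -->
<datareader_qos>
    <entity_name>
        <role_name>LoggingService</role_name>
    </entity_name>
</datareader_qos>
```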
A DataWriter that has been configured with an AVAILABILITY QoS policy will not remove samples from the DataWriter cache until they have been “delivered” to both the already-discovered DataReaders and the minimum number (quorum) of DataReaders specified for each role. In particular, samples will be retained by the DataWriter if the quorum of matched DataReaders with a particular role_name have not been discovered yet.
We used the word “delivered” in quotes above because the level of assurance a DataWriter has that a particular sample has been delivered depends on the setting of the RELIABILITY QosPolicy (Section 6.5.19). We discuss this next in Section 13.1.2.
13.1.2 Ensuring Consumer Applications Process the Data Successfully
Section 13.1.1 described mechanisms by which an application could configure who the required consumers of information are. This section is about the criteria, mechanisms, and assurance provided by Connext to ensure consumers have the information delivered to them and process it in a successful manner.
RTI provides four levels of information delivery guarantee. You can set your desired level using the RELIABILITY QosPolicy (Section 6.5.19). The levels are:
❏ Best-effort delivery The sample is sent at most once; if it is lost on the network, it is not resent. There is no guarantee the sample reaches any DataReader.
❏ Reliable with protocol acknowledgment The DDS reliability protocol ensures the sample is delivered to, and acknowledged by, the DataReader, meaning the sample has reached the DataReader’s cache. However, there is no guarantee the application actually processed the sample. The application might crash before processing the sample, or it might simply fail to read it from the cache.
❏ Reliable with Application Acknowledgment (Auto) Application Acknowledgment in Auto mode causes Connext to send an additional application-level acknowledgment automatically after the subscribing application has read or taken the sample and returned the loan to the DataReader.
❏ Reliable with Application Acknowledgment (Explicit) Application Acknowledgment in Explicit mode causes Connext to send an application-level acknowledgment only after the subscribing application explicitly acknowledges the sample by calling acknowledge_sample() or acknowledge_all() on the DataReader.
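As a sketch, the acknowledgment mode is selected through the RELIABILITY policy. A hypothetical XML fragment for the Auto mode might look like this (confirm the element names against your Connext version):

```xml
<datawriter_qos>
    <reliability>
        <kind>RELIABLE_RELIABILITY_QOS</kind>
        <!-- Wait for application-level acknowledgments, sent automatically
             after the subscriber reads/takes and returns the loan. -->
        <acknowledgment_kind>APPLICATION_AUTO_ACKNOWLEDGMENT_MODE</acknowledgment_kind>
    </reliability>
</datawriter_qos>
```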
13.1.3 Ensuring Information is Available to Late-Joining Applications
The third aspect of guaranteed data delivery addresses situations where the application needs to ensure that the information produced by a particular DataWriter is available to DataReaders that join the system after the data was produced. The need for data delivery may even extend beyond the lifetime of the producing application; that is, it may be required that the information is delivered to applications that join the system after the producing application has left the system.
Connext provides four mechanisms to handle these scenarios:
❏The DDS Durability QoS Policy. The DURABILITY QosPolicy (Section 6.5.7) specifies whether samples should be available to late joiners. The policy is set on the DataWriter and the DataReader and supports four kinds: VOLATILE, TRANSIENT_LOCAL, TRANSIENT, or PERSISTENT. If the DataWriter’s Durability QoS policy is set to VOLATILE kind, the DataWriter’s samples will not be made available to any late joiners. If the DataWriter’s policy kind is set to TRANSIENT_LOCAL, TRANSIENT, or PERSISTENT, the samples will be made available for late-joining DataReaders.
❏Durable Writer History. A DataWriter configured with a DURABILITY QoS policy kind other than VOLATILE keeps its data in a local cache so that it is available when the late- joining application appears. The data is maintained in the DataWriter’s cache until it is considered to be no longer needed. The precise criteria depends on the configuration of additional QoS policies such as LIFESPAN QoS Policy (Section 6.5.12), HISTORY QosPolicy (Section 6.5.10), RESOURCE_LIMITS QosPolicy (Section 6.5.20), etc. For the purposes of guaranteeing information delivery it is important to note that the
DataWriter’s cache can be configured to be a memory cache or a durable cache stored on permanent storage (see Durable Writer History (Section 12.3)). Only a durable cache will survive a restart of the producing application.
❏RTI Persistence Service. This service allows the information produced by a DataWriter to survive beyond the lifetime of the producing application. Persistence Service is a stand-alone application that runs on many supported platforms. This service complies with the Persistent Profile of the OMG DDS specification. The service uses DDS to subscribe to the DataWriters that specify a DURABILITY QosPolicy (Section 6.5.7) kind of TRANSIENT or PERSISTENT. Persistence Service receives the data from those DataWriters, stores the data in its internal caches, and makes the data available via DataWriters (which are automatically created by Persistence Service) to late-joining DataReaders.
❏Durable Subscriptions. This is a Persistence Service configuration setting that allows configuration of the required subscriptions (Identifying the Required Consumers of Information (Section 13.1.1)) for the data stored by Persistence Service (Managing Data Instances (Working with Keyed Data Types) (Section 6.3.14)). Configuring required subscriptions for Persistence Service ensures that the service will store the samples until they have been delivered to the configured number (quorum) of DataReaders that have each of the specified roles.
13.2 Scenarios
In each of the scenarios below, we assume both the DataWriter and DataReader are configured for strict reliability (RELIABLE ReliabilityQosPolicyKind and KEEP_ALL HistoryQosPolicyKind, see Section 10.3.3). As a result, when the DataWriter’s cache is full of unacknowledged samples, the write() operation will block until samples are acknowledged by all the intended consumers.
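For reference, a minimal sketch of the strict-reliability configuration assumed in these scenarios, as a hypothetical XML QoS fragment (verify element names for your version):

```xml
<!-- Strict reliability: RELIABLE delivery plus KEEP_ALL history,
     so write() blocks when the cache fills with unacknowledged samples. -->
<datawriter_qos>
    <reliability>
        <kind>RELIABLE_RELIABILITY_QOS</kind>
    </reliability>
    <history>
        <kind>KEEP_ALL_HISTORY_QOS</kind>
    </history>
</datawriter_qos>
<datareader_qos>
    <reliability>
        <kind>RELIABLE_RELIABILITY_QOS</kind>
    </reliability>
    <history>
        <kind>KEEP_ALL_HISTORY_QOS</kind>
    </history>
</datareader_qos>
```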
13.2.1 Scenario 1: Guaranteed Delivery to a priori Known Subscribers
A common use case is to guarantee delivery to a set of known subscribers. These subscribers may be already running and have been discovered, they may be temporarily non-responsive, or they may not yet be present in the system.
To guarantee delivery, the list of required subscribers should be configured using the AVAILABILITY QosPolicy (DDS Extension) (Section 6.5.1) on the DataWriters to specify the role_name and quorum for each required subscription. Similarly the ENTITY_NAME QosPolicy (DDS Extension) (Section 6.5.9) should be used on the DataReaders to specify their role_name. In
addition we use Application Acknowledgment (Section 6.3.12) to guarantee the sample was delivered and processed by the DataReader.
Figure 13.1 Guaranteed Delivery Scenario 1
The DataWriter and DataReader RELIABILITY QoS Policy can be configured for either AUTO or EXPLICIT application acknowledgment kind. As the DataWriter publishes the sample, it will await acknowledgment from the DataReader (through the protocol-level acknowledgment) and from the subscribing application (through the additional application-level acknowledgment).
In this specific scenario, DataReader #1 is configured for EXPLICIT application acknowledgment. After reading and processing the sample, the subscribing application calls acknowledge_sample() or acknowledge_all() (see Section 7.4.4). As a result, Connext will send an application-level acknowledgment to the DataWriter.
If the sample was lost in transit, the reliability protocol will repair the sample. Since it has not been acknowledged, it remains available in the writer’s queue to be automatically resent by Connext. The sample will remain available until acknowledged by the application. If the subscribing application crashes while processing the sample and restarts, Connext will repair the unacknowledged sample. Samples that have already been processed and acknowledged will not be resent.
In this scenario, DataReader #2 may be a late joiner. When it starts up, because it is configured with TRANSIENT_LOCAL Durability, the reliability protocol will resend the samples still held in the DataWriter’s queue. These samples remained available in the DataWriter because they had not been confirmed yet by the required subscription (identified by its role_name: ‘logger’).
DataReader #2 does not explicitly acknowledge the samples it reads. It is configured to use AUTO application acknowledgment, which will automatically acknowledge samples that have been read or taken after the application calls the DataReader return_loan operation.
This configuration works well for situations where the DataReader may not be immediately available or may restart. However, this configuration does not provide any guarantee if the DataWriter restarts. When the DataWriter restarts, samples previously unacknowledged are lost and will no longer be available to any late joining DataReaders.
13.2.2 Scenario 2: Surviving a Writer Restart when Delivering Samples to a priori Known Subscribers
Scenario 1 describes a use case where samples are delivered to a list of a priori known subscribers. In that scenario, Connext will deliver samples to the required subscribers even if they restart or are temporarily unreachable. However, that guarantee does not extend to a restart of the publishing application.
To handle a situation where the producing application is restarted, we will use the Durable Writer History (Section 12.3) feature. See Figure 13.2.
A DataWriter can be configured to maintain its data and state in durable storage. This configuration is done using the PROPERTY QoS policy as described in Section 12.3.2. With this configuration, the data samples written by the DataWriter and any necessary internal state are persisted by the DataWriter into durable storage. As a result, when the DataWriter restarts, samples that had not been acknowledged by the set of required subscriptions will be resent to those subscriptions.
13.2.3 Scenario 3: Delivery Guaranteed by Persistence Service (Store and Forward) to a priori Known Subscribers
Previous scenarios illustrated that using the DURABILITY, RELIABILITY, and AVAILABILITY QoS policies we can ensure that as long as the DataWriter is present in the system, samples written by a DataWriter will be delivered to the intended consumers. The use of the durable writer history in the previous scenario extended this guarantee even in the presence of a restart of the application writing the data.
This scenario addresses the situation where the originating application that produced the data is no longer available. For example, the network could have become partitioned, the application could have been terminated, it could have crashed and not have been restarted, etc.
In order to deliver data to applications that appear after the producing application is no longer available on the network it is necessary to have another service that stores those samples and delivers them. This is the purpose of the RTI Persistence Service.
The RTI Persistence Service can be configured to automatically discover DataWriters that specify a DURABILITY QoS with kind TRANSIENT or PERSISTENT and automatically create pairs (DataReader, DataWriter) that receive and store that information (see Chapter 26: Introduction to RTI Persistence Service). All the DataReaders created by the RTI Persistence Service have the ENTITY_NAME QoS policy set with the role_name of “PERSISTENCE_SERVICE”. This allows an application to specify Persistence Service as one of the required subscriptions for its DataWriters.
In this third scenario, we take advantage of this capability to configure the DataWriter to have the RTI Persistence Service as a required subscription. See Figure 13.3.
The RTI Persistence Service can also have its DataWriters configured with required subscriptions. This feature is known as Persistence Service “durable subscriptions”. DataReader #1 is preconfigured in Persistence Service as a Durable Subscription. (Alternatively, DataReader #1 could
have registered itself dynamically as a Durable Subscription using the DomainParticipant’s register_durable_subscription() operation.)
Figure 13.2 Guaranteed Delivery Scenario 2
Figure 13.3 Guaranteed Delivery Scenario 3
We also configure the RELIABILITY QoS policy, setting the acknowledgment kind to APPLICATION_AUTO_ACKNOWLEDGMENT_MODE, to ensure samples are stored in the Persistence Service and properly processed by the consuming application before they are removed from the DataWriter cache.
With this configuration in place, the DataWriter will deliver samples to the DataReader and to the Persistence Service reliably and wait for the Application Acknowledgment from both. Delivery of samples to DataReader #1 and the Persistence Service occurs concurrently. The Persistence Service in turn takes responsibility for delivering the samples to the configured “logger” durable subscription. If the original publisher is no longer available, samples can still be delivered by the Persistence Service to DataReader #1 and to any other late-joining DataReaders with that role_name.
When DataReader #1 acknowledges the sample through an application-level acknowledgment, both the original DataWriter and the Persistence Service are notified and can remove the sample from their caches.
13.2.3.1 Variation: Using Redundant Persistence Services
Using a single Persistence Service to guarantee delivery can still raise concerns about having the Persistence Service as a single point of failure. To provide a level of added redundancy, the publisher may be configured to await acknowledgment from a quorum of multiple persistence services (the role_name remains “PERSISTENCE_SERVICE”). Using this configuration we can achieve higher levels of redundancy, as illustrated in Figure 13.4.
Figure 13.4 Guaranteed Delivery Scenario 3 with Redundant Persistence Service
The RTI Persistence Services will automatically share information to keep each other synchronized. This includes both the data and the information on the durable subscriptions. That is, when a Persistence Service discovers a durable subscription, information about that durable subscription is automatically replicated and synchronized among the persistence services.
13.2.3.2 Variation: Using Load-Balanced Persistence Services
The Persistence Service will store samples on behalf of many DataWriters and, depending on the configuration, it might write those samples to a database or to disk. For this reason the Persistence Service may become a bottleneck in systems with high durable sample throughput.
It is possible to run multiple instances of the Persistence Service in a manner where each is responsible for the guaranteed delivery of only a certain subset of the durable data being published. These Persistence Service instances can also be run on different computers, achieving much higher aggregate throughput.
The data to be persisted can be partitioned among the persistence services by specifying different Topics to be persisted by each Persistence Service. If a single Topic has more data than can be handled by a single Persistence Service, it is also possible to assign each Persistence Service a subset of the Topic’s data (for example, by means of a content filter).
Chapter 14 Discovery
This chapter discusses how Connext objects on different nodes find out about each other using the default Simple Discovery Protocol (SDP). It describes the sequence of messages that are passed between Connext applications on the sending and receiving sides.
This chapter includes the following sections:
❏What is Discovery? (Section 14.1)
❏Configuring the Peers List Used in Discovery (Section 14.2)
❏Discovery Implementation (Section 14.3)
❏Debugging Discovery (Section 14.4)
❏Ports Used for Discovery (Section 14.5)
The discovery process occurs automatically, so you do not have to implement any special code. We recommend that all users read What is Discovery? (Section 14.1) and Configuring the Peers List Used in Discovery (Section 14.2). The remaining sections contain advanced material for those who have a particular need to understand what is happening ‘under the hood.’ This information can help you debug a system in which objects are not communicating.
You may also be interested in reading Chapter 15: Transport Plugins, as well as learning about these QosPolicies:
❏TRANSPORT_SELECTION QosPolicy (DDS Extension) (Section 6.5.22)
❏TRANSPORT_BUILTIN QosPolicy (DDS Extension) (Section 8.5.7)
❏TRANSPORT_UNICAST QosPolicy (DDS Extension) (Section 6.5.23)
❏TRANSPORT_MULTICAST QosPolicy (DDS Extension) (Section 7.6.5)
14.1 What is Discovery?
Discovery is the behind-the-scenes way in which Connext objects (DomainParticipants, DataWriters, and DataReaders) on different nodes find out about each other. Each DomainParticipant maintains a database of information about all the active DataReaders and DataWriters in the same domain.
This chapter describes the default discovery mechanism known as the Simple Discovery Protocol, which includes two phases: Simple Participant Discovery (Section 14.1.1) and Simple
Endpoint Discovery (Section 14.1.2). (Discovery can also be performed using the Enterprise Discovery Protocol, which requires a separately purchased package.)
The goal of these two phases is to build, for each DomainParticipant, a complete picture of all the entities that belong to the remote participants that are in its peers list. The peers list is the list of nodes with which a participant may communicate. It starts out the same as the initial_peers list that you configure in the DISCOVERY QosPolicy (DDS Extension) (Section 8.5.2). If the accept_unknown_peers flag in that same QosPolicy is TRUE, then other nodes may also be added as they are discovered; if it is FALSE, then the peers list will match the initial_peers list, plus any peers added using the DomainParticipant’s add_peer() operation.
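For example, an initial peers list and the accept_unknown_peers flag can be set in a DomainParticipant’s QoS profile. The addresses below are hypothetical, and the element names follow RTI’s XML QoS conventions (confirm for your version):

```xml
<domain_participant_qos>
    <discovery>
        <initial_peers>
            <!-- Unicast peer, multicast group, and local shared memory -->
            <element>udpv4://192.168.1.10</element>
            <element>udpv4://239.255.0.1</element>
            <element>shmem://</element>
        </initial_peers>
        <!-- Allow peers outside this list to be added as discovered -->
        <accept_unknown_peers>true</accept_unknown_peers>
    </discovery>
</domain_participant_qos>
```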
14.1.1 Simple Participant Discovery
This phase of the Simple Discovery Protocol is performed by the Simple Participant Discovery Protocol (SPDP).
During the Participant Discovery phase, DomainParticipants learn about each other. The DomainParticipant’s details are communicated to all other DomainParticipants in the same domain by sending participant declaration messages, also known as participant DATA submessages. The details include the DomainParticipant’s unique identifying key (GUID or Globally Unique ID described below), transport locators (addresses and port numbers), and QoS. These messages are sent on a periodic basis using best-effort communication.
Participant DATAs are sent periodically to maintain the liveliness of the DomainParticipant. They are also used to communicate changes in the DomainParticipant’s QoS. Only changes to QosPolicies that are part of the DomainParticipant’s built-in data (see Chapter 16: Built-In Topics) are propagated.
When a DomainParticipant is deleted, a participant DATA (delete) submessage with the
DomainParticipant's identifying GUID is sent.
The GUID is a unique reference to an entity. It is composed of a GUID prefix and an Entity ID. By default, the GUID prefix is calculated from the IP address and the process ID. (For more on how the GUID is calculated, see Controlling How the GUID is Set (rtps_auto_id_kind) (Section 8.5.9.4).) The IP address and process ID are stored in the DomainParticipant’s WIRE_PROTOCOL QosPolicy (DDS Extension) (Section 8.5.9). The entityID is set by Connext (you may be able to change it in a future version).
Once a pair of remote participants have discovered each other, they can move on to the Endpoint Discovery phase, which is how DataWriters and DataReaders find each other.
14.1.2 Simple Endpoint Discovery
This phase of the Simple Discovery Protocol is performed by the Simple Endpoint Discovery Protocol (SEDP).
During the Endpoint Discovery phase, Connext matches DataWriters and DataReaders. Information (GUID, QoS, etc.) about your application’s DataReaders and DataWriters is exchanged by sending publication/subscription declarations in DATA messages that we will refer to as publication DATAs and subscription DATAs. The Endpoint Discovery phase uses reliable communication.
As described in Section 14.3, these declaration or DATA messages are exchanged until each DomainParticipant has a complete database of information about the participants in its peers list and their entities. Then the discovery process is complete and the system switches to a steady state. During steady state, participant DATAs are still sent periodically to maintain the liveliness status of participants. They may also be sent to communicate QoS changes or the deletion of a
DomainParticipant.
When a remote DataWriter/DataReader is discovered, Connext determines if the local application has a matching DataReader/DataWriter. A ‘match’ between the local and remote entities occurs only if the DataReader and DataWriter have the same Topic, same data type, and compatible QosPolicies (which includes having the same partition name string, see Section 6.4.5). Furthermore, if the DomainParticipant has been set up to ignore certain DataWriters/DataReaders, those entities will not be considered during the matching process. See Section 16.4.2 for more on ignoring specific publications and subscriptions.
This ‘matching’ process occurs as soon as a remote entity is discovered, even if the entire database is not yet complete: that is, the application may still be discovering other remote entities.
A DataReader and DataWriter can only communicate with each other if each one’s application has hooked up its local entity with the matching remote entity. That is, both sides must agree to the connection.
Section 14.3 describes the details about the discovery process.
14.2 Configuring the Peers List Used in Discovery
The Connext discovery process will try to contact all possible participants on each remote node in the ‘initial peers list,’ which comes from the initial_peers field of the DomainParticipant’s DISCOVERY QosPolicy.
The ‘initial peers list’ is just that: an initial list of peers to contact. Furthermore, the peers list merely contains potential peers; there is no requirement that the peers in the list are currently running or reachable.
After startup, you can add to the ‘peers list’ with the add_peer() operation (see Adding and Removing Peers List Entries (Section 8.5.2.3)). The ‘peer list’ may also grow as peers are automatically discovered (if accept_unknown_peers is TRUE, see Controlling Acceptance of Unknown Peers (Section 8.5.2.6)).
When you call get_default_participant_qos() for a DomainParticipantFactory, the values used for the DiscoveryQosPolicy’s initial_peers and multicast_receive_addresses may come from the following:
❏A file named NDDS_DISCOVERY_PEERS, which is formatted as described in NDDS_DISCOVERY_PEERS File Format (Section 14.2.3). The file must be in the same directory as your application’s executable.
❏An environment variable named NDDS_DISCOVERY_PEERS, defined as a comma- separated list of peer descriptors (see NDDS_DISCOVERY_PEERS Environment Variable Format (Section 14.2.2)).
❏The value specified in the default XML QoS profile (see Overwriting Default QoS Values (Section 17.9.4)).
If NDDS_DISCOVERY_PEERS (file or environment variable) does not contain a multicast address, then multicast_receive_addresses is cleared and the RTI discovery process will not listen for discovery messages via multicast.
If NDDS_DISCOVERY_PEERS (file or environment variable) contains one or more multicast addresses, the addresses are stored in multicast_receive_addresses, starting at element 0. They will be stored in the order in which they appear in NDDS_DISCOVERY_PEERS.
Note: Setting initial_peers in the default XML QoS Profile does not modify the value of multicast_receive_addresses.
If both the file and environment variable are found, the file takes precedence and the environment variable will be ignored.1 The settings in the default XML QoS Profile take
precedence over the file and environment variable. In the absence of a file, environment variable, or default XML QoS profile values, Connext will use a default value. See the API Reference HTML documentation for details (in the section on the DISCOVERY QosPolicy).
If initial peers are specified in both the currently loaded QoS XML profile and in the NDDS_DISCOVERY_PEERS file, the values in the profile take precedence.
The file, environment variable, and default XML QoS Profile make it easy to reconfigure which nodes will take part in the discovery process, without recompiling your application.
The file, environment variable, and default XML QoS Profile are the possible sources for the default initial peers list. You can, of course, explicitly set the initial list by changing the values in the QoS provided to the DomainParticipantFactory's create_participant() operation, or by adding to the list after startup with the DomainParticipant’s add_peer() operation (see Section 8.5.2.3).
If you set NDDS_DISCOVERY_PEERS and You Want to Communicate over Shared Memory:
Suppose you want to communicate with other Connext applications on the same host and you are explicitly setting NDDS_DISCOVERY_PEERS (generally in order to use unicast discovery with applications on other hosts).
If the local host platform does not support the shared memory transport, then you can include the name of the local host in the NDDS_DISCOVERY_PEERS list. (To check if your platform supports shared memory, see the Platform Notes document.)
If the local host platform supports the shared memory transport, then you must do one of the following:
❏Include "shmem://" in the NDDS_DISCOVERY_PEERS list. This will cause shared memory to be used for discovery and data traffic for applications on the same host.
or:
❏Include the name of the local host in the NDDS_DISCOVERY_PEERS list, and disable the shared memory transport in the TRANSPORT_BUILTIN QosPolicy (DDS Extension) (Section 8.5.7) of the DomainParticipant. This will cause UDP loopback to be used for discovery and data traffic for applications on the same host.
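For example (host addresses hypothetical), an NDDS_DISCOVERY_PEERS list that uses shared memory for same-host applications and unicast UDP for two remote hosts could contain:

```
shmem://
udpv4://192.168.1.10
udpv4://192.168.1.11
```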
14.2.1 Peer Descriptor Format
A peer descriptor string specifies a range of participants at a given locator. Peer descriptor strings are used in the DISCOVERY QosPolicy (DDS Extension) (Section 8.5.2) initial_peers field (see Section 8.5.2.2) and the DomainParticipant’s add_peer() and remove_peer() operations (see Section 8.5.2.3).
The anatomy of a peer descriptor is illustrated in Figure 14.1 using a special "StarFabric" transport example.
A peer descriptor consists of:
❏[optional] A participant ID. If a simple integer is specified, it indicates the maximum participant ID to be contacted by the Connext discovery mechanism at the given locator. If that integer is enclosed in square brackets (e.g., [2]), then only that Participant ID will be used. You can also specify a range in the form of [a,b]: in this case only the Participant IDs in that specific range are contacted. If omitted, a default value of 4 is implied.
❏A locator, as described in Section 14.2.1.1.
These are separated by the '@' character. The separator may be omitted if a participant ID limit is not explicitly specified.
1. This is true even if the file is empty.
Figure 14.1 Peer Descriptor Address String
The "participant ID limit" only applies to unicast locators; it is ignored for multicast locators (and therefore should be omitted for multicast peer descriptors).
14.2.1.1 Locator Format
A locator string specifies a transport and an address in string format. Locators are used to form peer descriptors. A locator is equivalent to a peer descriptor with the default participant ID limit (4).
A locator consists of:
❏[optional] Transport name (alias or class). This identifies the set of transport plugins (transport aliases) that may be used to parse the address portion of the locator.
❏[optional] An address, as described in Section 14.2.1.2.
These are separated by the "://" string. The separator is specified if and only if a transport name is specified.
If a transport name is specified, the address may be omitted; in that case, all the unicast addresses (across all transport plugin instances) of that transport class are implied.
If an address is specified, the transport name and the separator string may be omitted; in that case, all the available transport plugins for the participant may be used to parse the address string.
The transport names for the builtin transport plugins are:
❏shmem - Shared Memory Transport
❏udpv4 - UDPv4 Transport
❏udpv6 - UDPv6 Transport
14.2.1.2 Address Format
An address string specifies the address portion of a locator. Address strings are used to form locators; they also appear on their own in the multicast_receive_addresses and DDS_TransportMulticastSettings_t::receive_address fields. An address is equivalent to a locator in which the transport name and separator are omitted.
An address consists of:
❏[optional] A network address in IPv4 or IPv6 string notation. If omitted, the network address of the transport is implied.
❏[optional] A transport address, which is a string that is passed to the transport for processing. The transport maps this string into
NDDS_Transport_Property_t::address_bit_count bits. If omitted, the network address is used as the fully qualified address.
These are separated by the '#' character. If a separator is specified, it must be followed by a non-empty string, which is passed to the transport plugin for processing.
The bits resulting from the transport address string are prepended with the network address. The least significant NDDS_Transport_Property_t::address_bit_count bits of the network address are ignored.
If you omit the '#' separator and the string is not a valid IPv4 or IPv6 address, it is treated as a transport address with an implicit network address (that of the transport plugin).
14.2.2 NDDS_DISCOVERY_PEERS Environment Variable Format
You can set the default value for the initial peers list in an environment variable named NDDS_DISCOVERY_PEERS. Multiple peer descriptor entries must be separated by commas. Table 14.1 shows some examples. The examples use an implied maximum participant ID of 4 unless otherwise noted. (If you need instructions on how to set environment variables, see the Getting Started Guide).
Table 14.1 NDDS_DISCOVERY_PEERS Environment Variable Examples

NDDS_DISCOVERY_PEERS         Description of Host(s)
---------------------------  -------------------------------------------------
239.255.0.1                  multicast
localhost                    localhost
192.168.1.1                  10.10.30.232 (IPv4)
FAA0::1                      FAA0::0 (IPv6)
himalaya,gangotri            himalaya and gangotri
1@himalaya,1@gangotri        himalaya and gangotri (with a maximum
                             participant ID of 1 on each host)
FAA0::0#localhost            localhost (could be a UDPv4 transport registered
                             at network address of FAA0::0) (IPv6)
udpv4://himalaya             himalaya accessed using the "udpv4" transport
udpv4://FAA0::0#localhost    localhost using the "udpv4" transport registered
                             at network address FAA0::0
udpv4://                     all unicast addresses accessed via the "udpv4"
                             (UDPv4) transport
0/0/R                        0/0/R (StarFabric)
#0/0/R
starfabric://0/0/R           0/0/R (StarFabric) using the "starfabric"
starfabric://#0/0/R          (StarFabric) transport plugins
starfabric://FBB0::0#0/0/R   0/0/R (StarFabric) using the "starfabric"
                             (StarFabric) transport plugins registered at
                             network address FBB0::0
starfabric://                all unicast addresses accessed via the
                             "starfabric" (StarFabric) transport
shmem://                     all unicast addresses accessed via the "shmem"
                             (shared memory) transport
shmem://FCC0::0              all unicast addresses accessed via the "shmem"
                             (shared memory) transport registered at network
                             address FCC0::0
14.2.3 NDDS_DISCOVERY_PEERS File Format
You can set the default value for the initial peers list in a file named NDDS_DISCOVERY_PEERS. The file must be in your application's current working directory.
The file is optional. If it is found, it supersedes the values in any environment variable of the same name.
The file must contain a sequence of peer descriptors separated by whitespace or the comma (',') character. The file may also contain comments, which start with a semicolon (';') character and extend to the end of the line.
Example file contents:
;; NDDS_DISCOVERY_PEERS - Default Discovery Configuration File

;; Multicast
builtin.udpv4://239.255.0.1   ; Default discovery multicast address

;; Unicast
localhost,192.168.1.1         ; A comma can be used as a separator
FAA0::1 FAA0::0#localhost     ; Whitespace can be used as a separator
1@himalaya                    ; Max participant ID of 1 on 'himalaya'
1@gangotri

;; UDPv4
udpv4://himalaya              ; 'himalaya' via 'udpv4' transport plugin(s)
udpv4://FAA0::0#localhost     ; 'localhost' via 'udpv4' transport plugin
                              ;   registered at network address FAA0::0

;; Shared Memory
shmem://                      ; All 'shmem' transport plugin(s)
builtin.shmem://              ; The builtin 'shmem' transport plugin
shmem://FCC0::0               ; Shared memory transport plugin registered
                              ;   at network address FCC0::0

;; StarFabric
0/0/R                         ; StarFabric node 0/0/R
starfabric://0/0/R            ; 0/0/R accessed via 'starfabric'
                              ;   transport plugin(s)
starfabric://FBB0::0#0/0/R    ; StarFabric transport plugin registered
                              ;   at network address FBB0::0
starfabric://                 ; All 'starfabric' transport plugin(s)
14.3 Discovery Implementation
Note: this section contains advanced material not required by most users.
Discovery is implemented using special builtin DataWriters and DataReaders. These are the same classes of objects (DDSDataWriter/DDSDataReader) that your application uses to send and receive user data. For each DomainParticipant, three builtin Topics are used for discovery: "DCPSParticipant," "DCPSPublication," and "DCPSSubscription." Each has a corresponding builtin DataWriter and builtin DataReader, as illustrated in Figure 14.2.
Figure 14.2
[Diagram: for each DomainParticipant, builtin endpoints handle the two discovery phases over the network. Participant Discovery Phase: a builtin participant DataWriter advertises this participant, and a builtin participant DataReader discovers other participants, by exchanging participant DATA on the "DCPSParticipant" builtin topic. Endpoint Discovery Phase (Writer/Reader Discovery): builtin publication and subscription DataWriters advertise this participant's DataWriters and DataReaders, and builtin publication and subscription DataReaders discover other participants' DataWriters and DataReaders, by exchanging publication DATA on the "DCPSPublication" builtin topic and subscription DATA on the "DCPSSubscription" builtin topic.]
For each DomainParticipant, there are six objects automatically created for discovery purposes. The top two objects are used to send/receive participant DATA messages, which are used in the Participant Discovery phase to find remote DomainParticipants. This phase uses best-effort communication.
The implementation is split into two separate protocols:

  Simple Participant Discovery Protocol (SPDP)
+ Simple Endpoint Discovery Protocol (SEDP)
= Simple Discovery Protocol (SDP)
14.3.1 Participant Discovery
When a DomainParticipant is created, a DataWriter and a DataReader are automatically created to exchange participant DATA messages in the network. These DataWriters and DataReaders are "special" because the DataWriter can send to a given list of destinations, regardless of whether there is a Connext application at the destination, and the DataReader can receive data from any
source, whether the source is previously known or not. In other words, these special readers and writers do not need to discover the remote entity and perform a match before they can communicate with each other.
When a DomainParticipant joins or leaves the network, it needs to notify its peer participants. The list of remote participants to use during discovery comes from the peer list described in the DISCOVERY QosPolicy (DDS Extension) (Section 8.5.2). The remote participants are notified via participant DATA messages. In addition, if a participant’s QoS is modified in such a way that other participants need to know about the change (that is, changes to the USER_DATA QosPolicy (Section 6.5.25)), a new participant DATA will be sent immediately.
Participant DATAs are also used to maintain a participant’s liveliness status. These are sent at the rate set in the participant_liveliness_assert_period in the DISCOVERY_CONFIG QosPolicy (DDS Extension) (Section 8.5.3).
Let’s examine what happens when a new remote participant is discovered. If the new remote participant is in the local participant's peer list, the local participant will add that remote participant into its database. If the new remote participant is not in the local application's peer list, it may still be added, if the accept_unknown_peers field in the DISCOVERY QosPolicy (DDS Extension) (Section 8.5.2) is set to TRUE.
Once a remote participant has been added to the Connext database, Connext keeps track of that remote participant’s participant_liveliness_lease_duration. If a participant DATA for that participant (identified by the GUID) is not received at least once within the participant_liveliness_lease_duration, the remote participant is considered stale, and the remote participant, together with all its entities, will be removed from the database of the local participant.
To keep from being purged by other participants, each participant needs to periodically send a participant DATA to refresh its liveliness. The rate at which the participant DATA is sent is controlled by the participant_liveliness_assert_period in the participant’s DISCOVERY_CONFIG QosPolicy (DDS Extension) (Section 8.5.3). This exchange, which keeps Participant A from appearing ‘stale,’ is illustrated in Figure 14.3. Figure 14.4 shows what happens when Participant A terminates ungracefully and therefore needs to be seen as ‘stale.’
14.3.1.1 Refresh Mechanism
To ensure that a late-joining participant does not have to wait until the next periodic participant DATA refresh to complete discovery, Connext resends its own participant DATA whenever it receives a participant DATA from a never-before-seen remote participant. The resend may be repeated several times, with a random sleep period in between.
The number of retries and the random amount of sleep between them are controlled by each participant's DISCOVERY_CONFIG QosPolicy (DDS Extension) (Section 8.5.3) (see ➀ and ➁ in Figure 14.5).
Figure 14.6 provides a summary of the messages sent during the participant discovery phase.
Figure 14.3 Periodic ‘participant DATAs’
[Diagram: the DomainParticipant on Node A sends 'participant A DATA' messages to Node B when the participant is created and when the participant's UserDataQosPolicy is modified, with periodic announcements (①) and initial repeat announcements (➁) in between; a final 'participant A DATA (delete)' is sent when the participant is destroyed.]
① Participant A's DDS_DomainParticipantQos.discovery_config.participant_liveliness_assert_period
➁ Random time between min_initial_participant_announcement_period and max_initial_participant_announcement_period (in A's DDS_DomainParticipantQos.discovery_config)
The DomainParticipant on Node A sends a ‘participant DATA’ to Node B, which is in Node A’s peers list. This occurs regardless of whether or not there is a Connext application on Node B.
The green short dashed lines are periodic participant DATAs. The time between these messages is controlled by the participant_liveliness_assert_period in the DiscoveryConfig QosPolicy.
➁ In addition to the periodic participant DATAs, ‘initial repeat messages’ (shown in blue, with longer dashes) are sent from A to B. These messages are sent at a random time between min_initial_participant_announcement_period and max_initial_participant_announcement_period (in A’s DiscoveryConfig QosPolicy). The number of these initial repeat messages is set in initial_participant_announcements.
Figure 14.4 Ungraceful Termination of a Participant
[Diagram: participants are created on Node A and Node B. Node B receives 'participant A DATA' and adds new remote participant A to its database. Participant A then terminates ungracefully; when no refresh arrives within the liveliness lease duration (➁), remote participant A is considered 'stale' and removed from Node B's database.]
➀ Participant A's DDS_DomainParticipantQos.discovery_config.participant_liveliness_assert_period
➁ Participant A's DDS_DomainParticipantQos.discovery_config.participant_liveliness_lease_duration
Participant A is removed from participant B’s database if it is not refreshed within the liveliness lease duration. Dashed lines are periodic participant DATA messages.
(Periodic resends of 'participant B DATA' from B to A are omitted from this diagram for simplicity. Initial repeat messages from A to B are also omitted from this diagram.)
Figure 14.5 Resending 'participant DATA' to a Late-Joiner
[Diagram: a participant is created on Node A and sends 'participant A DATA' to Node B. Later, a participant is created on Node B and sends 'participant B DATA'; Node A adds new remote participant B to its database and, after a random sleep (➀, controlled by A's DiscoveryConfig QosPolicy), resends 'participant A DATA' so that the late-joining participant discovers A. When the next periodic 'participant B DATA' arrives, participant B is already in the database and no action is taken.]