Use Cases

You are here:

Use Cases

This section contains advanced material that discusses practical applications of the reliability related QoS.

Importance of Relative Thread Priorities

For high throughput, the Connext DDS Event thread’s priority must be sufficiently high on the sending application. Unlike an unreliable writer, a reliable writer relies on internal Connext DDS threads: the Receive thread processes ACKNACKs from the DataReaders, and the Event thread schedules the events necessary to maintain reliable data flow.

When DDS samples are sent to the same or another application on the same host, the Receive thread priority should be higher than the writing thread priority (priority of the thread calling write() on the DataWriter). This will allow the Receive thread to process the messages as they are sent by the writing thread. A sustained reliable flow requires the reader to be able to process the DDS samples from the writer at a speed equal to or faster than the writer emits.
The default Event thread priority is low. This is adequate if your reliable transfer is not sustained; queued up events will eventually be processed when the writing thread yields the CPU. The Connext DDS can automatically grow the event queue to store all pending events. But if the reliable communication is sustained, reliable events will continue to be scheduled, and the event queue will eventually reach its limit. The default Event thread priority is unsuitable for maintaining a fast and sustained reliable communication and should be increased through the participant_qos.event.thread.priority. This value maps directly to the OS thread priority, see EVENT QosPolicy (DDS Extension)).

The Event thread should also be increased to minimize the reliable latency. If events are processed at a higher priority, dropped packets will be resent sooner.

Now we consider some practical applications of the reliability related QoS:

Aperiodic Use Case: One-at-a-Time
Aperiodic, Bursty
Periodic

Aperiodic Use Case: One-at-a-Time

Suppose you have aperiodically generated data that needs to be delivered reliably, with minimum latency, such as a series of commands (“Ready,” “Aim,” “Fire”). If a writing thread may block between each DDS sample to guarantee reception of the just-sent DDS sample on the reader’s middleware end, a smaller queue will provide a smaller upper bound on the DDS sample delivery time. Adequate writer QoS for this use case are presented in Figure: QoS for an Aperiodic, One-at-a-time Reliable Writer.

Figure: QoS for an Aperiodic, One-at-a-time Reliable Writer

1. qos->reliability.kind = DDS_RELIABLE_RELIABILITY_QOS;

2. qos->history.kind = DDS_KEEP_ALL_HISTORY_QOS;

3. qos->protocol.push_on_write = DDS_BOOLEAN_TRUE;

4.

5. //use these hard coded value unless you use a key

6. qos->resource_limits.initial_samples = qos->resource_limits.max_samples = 1;

7. qos->resource_limits.max_samples_per_instance =

8. qos->resource_limits.max_samples;

9. qos->resource_limits.initial_instances =

10. qos->resource_limits.max_instances = 1;

11.

12. // want to piggyback HB w/ every sample.

13. qos->protocol.rtps_reliable_writer.heartbeats_per_max_samples =

14. qos->resource_limits.max_samples;

15.

16. qos->protocol.rtps_reliable_writer.high_watermark = 1;

17. qos->protocol.rtps_reliable_writer.low_watermark = 0;

18. qos->protocol.rtps_reliable_writer.min_nack_response_delay.sec = 0;

19. qos->protocol.rtps_reliable_writer.min_nack_response_delay.nanosec = 0;

20. //consider making non-zero for reliable multicast

21. qos->protocol.rtps_reliable_writer.max_nack_response_delay.sec = 0;

22. qos->protocol.rtps_reliable_writer.max_nack_response_delay.nanosec = 0;

23.

24. // should be faster than the send rate, but be mindful of OS resolution

25. 25 qos->protocol.rtps_reliable_writer.fast_heartbeat_period.sec = 0;

26. 26 qos->protocol.rtps_reliable_writer.fast_heartbeat_period.nanosec =

27. alertReaderWithinThisMs * 1000000;

28.

29. qos->reliability.max_blocking_time = blockingTime;

30. qos->protocol.rtps_reliable_writer.max_heartbeat_retries = 7;

31.

32. // essentially turn off slow HB period

33. qos->protocol.rtps_reliable_writer.heartbeat_period.sec = 3600 * 24 * 7;

Line 1 (Figure: QoS for an Aperiodic, One-at-a-time Reliable Writer): This is the default setting for a writer, shown here strictly for clarity.

Line 2 (Figure: QoS for an Aperiodic, One-at-a-time Reliable Writer): Setting the History kind to KEEP_ALL guarantees that no DDS sample is ever lost.

Line 3 (Figure: QoS for an Aperiodic, One-at-a-time Reliable Writer): This is the default setting for a writer, shown here strictly for clarity. ‘Push’ mode reliability will yield lower latency than ‘pull’ mode reliability in normal situations where there is no DDS sample loss. (See DATA_WRITER_PROTOCOL QosPolicy (DDS Extension).) Furthermore, it does not matter that each packet sent in response to a command will be small, because our data sent with each command is likely to be small, so that maximizing throughput for this data is not a concern.

Line 5 - Line 10 (Figure: QoS for an Aperiodic, One-at-a-time Reliable Writer): For this example, we assume a single writer is writing DDS samples one at a time. If we are not using keys (see DDS Samples, Instances, and Keys), there is no reason to use a queue with room for more than one DDS sample, because we want to resolve a DDS sample completely before moving on to the next. While this negatively impacts throughput, it minimizes memory usage. In this example, a written DDS sample will remain in the queue until it is acknowledged by all active readers (only 1 for this example).

Line 12 - Line 14 (Figure: QoS for an Aperiodic, One-at-a-time Reliable Writer): The fastest way for a writer to ensure that a reader is up-to-date is to force an acknowledgment with every DDS sample. We do this by appending a Heartbeat with every DDS sample. This is akin to a certified mail; the writer learns—as soon as the system will allow—whether a reader has received the letter, and can take corrective action if the reader has not. As with certified mail, this model has significant overhead compared to the unreliable case, trading off lower packet efficiency in favor of latency and fast recovery.

Line 16-Line 17 (Figure: QoS for an Aperiodic, One-at-a-time Reliable Writer): Since the writer takes responsibility for pushing the DDS samples out to the reader, a writer will go into a “heightened alert” mode as soon as the high water mark is reached (which is when any DDS sample is written for this writer) and only come out of this mode when the low water mark is reached (when all DDS samples have been acknowledged for this writer). Note that the selected high and low watermarks are actually the default values.

Line 18-Line 22 (Figure: QoS for an Aperiodic, One-at-a-time Reliable Writer): When a reader requests a lost DDS sample, we respond to the reader immediately in the interest of faster recovery. If the readers receive packets on unicast, there is no reason to wait, since the writer will eventually have to feed individual readers separately anyway. In case of multicast readers, it makes sense to consider further. If the writer delayed its response enough so that all or most of the readers have had a chance to NACK a DDS sample, the writer may coalesce the requests and send just one packet to all the multicast readers. Suppose that all multicast readers do indeed NACK within approximately 100 msec. Setting the minimum and maximum delays at 100 msec will allow the writer to collect all these NACKs and send a single response over multicast. (See DATA_WRITER_PROTOCOL QosPolicy (DDS Extension) for information on setting min_nack_response_delay and max_nack_response_delay.) Note that Connext DDS relies on the OS to wait for this 100 msec. Unfortunately, not all operating systems can sleep for such a fine duration. On Windows systems, for example, the minimum achievable sleep time is somewhere between 1 to 20 milliseconds, depending on the version. On VxWorks systems, the minimum resolution of the wait time is based on the tick resolution, which is 1/system clock rate (thus, if the system clock rate is 100 Hz, the tick resolution is 10 millisecond). On such systems, the achievable minimum wait is actually far larger than the desired wait time. This could have an unintended consequence due to the delay caused by the OS; at a minimum, the time to repair a packet may be longer than you specified.

Line 24-Line 27 (Figure: QoS for an Aperiodic, One-at-a-time Reliable Writer): If a reader drops a DDS sample, the writer recovers by notifying the reader of what it has sent, so that the reader may request resending of the lost DDS sample. Therefore, the recovery time depends primarily on how quickly the writer pings the reader that has fallen behind. If commands will not be generated faster than one every few seconds, it may be acceptable for the writer to ping the reader several hundred milliseconds after the DDS sample is sent.

Suppose that the round-trip time of fairly small packets between the writer and the reader application is 50 microseconds, and that the reader does not delay response to a Heartbeat from the writer (see DATA_READER_PROTOCOL QosPolicy (DDS Extension) for how to change this). If a DDS sample is dropped, the writer will ping the reader after a maximum of the OS delay resolution discussed above and alertReaderWithinThisMs (let’s say 10 ms for this example). The reader will request the missing DDS sample immediately, and with the code set as above, the writer will feed the missing DDS sample immediately. Neglecting the processing time on the writer or the reader end, and assuming that this retry succeeds, the time to recover the DDS sample from the original publication time is: alertReaderWithinThisMs + 50 msec + 25 msec.

If the OS is capable of micro-sleep, the recovery time can be within 100 msec, barely noticeable to a human operator. If the OS minimum wait resolution is much larger, the recovery time is dominated by the wait resolution of the OS. Since ergonomic studies suggest that delays in excess of a 0.25 seconds start hampering operations that require low latency data, even a 10 ms limitation seems to be acceptable.

What if two packets are dropped in a row? Then the recovery time would be
2 * alertReaderWithinThisMs + 2 * 50 msec + 25 msec. If alertReaderWithinThisMs is 100 ms, the recovery time now exceeds 200 ms, and can perhaps degrade user experience.

Line 29-Line 30 (Figure: QoS for an Aperiodic, One-at-a-time Reliable Writer): What if another command (like another button press) is issued before the recovery? Since we must not drop this new DDS sample, we block the writer until the recovery completes. If alertReaderWithinThisMs is 10 ms, and we assume no more than 7 consecutive drops, the longest time for recovery will be just above (alertReaderWithinThisMs * max_heartbeat_retries), or 70 ms.

So if we set blockingTime to about 80 ms, we will have given enough chance for recovery. Of course, in a dynamic system, a reader may drop out at any time, in which case max_heartbeat_retries will be exceeded, and the unresponsive reader will be dropped by the writer. In either case, the writer can continue writing. Inappropriate values will cause a writer to prematurely drop a temporarily unresponsive (but otherwise healthy) reader, or be stuck trying unsuccessfully to feed a crashed reader. In the unfortunate case where a reader becomes temporarily unresponsive for a duration exceeding (alertReaderWithinThisMs * max_heartbeat_retries), the writer may issue gaps to that reader when it becomes active again; the dropped DDS samples are irrecoverable. So estimating the worst case unresponsive time of all potential readers is critical if DDS sample drop is unacceptable.

Line 33 (Figure: QoS for an Aperiodic, One-at-a-time Reliable Writer): Since the command may not be issued for hours or even days on end, there is no reason to keep announcing the writer’s state to the readers.

Figure: QoS for an Aperiodic, One-at-a-time Reliable Reader shows how to set the QoS for the reader side, followed by a line-by-line explanation.

Figure: QoS for an Aperiodic, One-at-a-time Reliable Reader

1. qos->reliability.kind = DDS_RELIABLE_RELIABILITY_QOS;

2. qos->history.kind = DDS_KEEP_ALL_HISTORY_QOS;

3.

4. // 1 is ok for normal use. 2 allows fast infinite loop

5. qos->reader_resource_limits.max_samples_per_remote_writer = 2;

6. qos->resource_limits.initial_samples = 2;

7. qos->resource_limits.initial_instances = 1;

8.

9. qos->protocol.rtps_reliable_reader.max_heartbeat_response_delay.sec = 0;

10. qos->protocol.rtps_reliable_reader.max_heartbeat_response_delay.nanosec = 0;

11. qos->protocol.rtps_reliable_reader.min_heartbeat_response_delay.sec = 0;

12. qos->protocol.rtps_reliable_reader.min_heartbeat_response_delay.nanosec = 0;

Line 1-Line 2 (Figure: QoS for an Aperiodic, One-at-a-time Reliable Reader): Unlike a writer, the reader’s default reliability setting is best-effort, so reliability must be turned on. Since we don’t want to drop anything, we choose KEEP_ALL history.

Line 4-Line 6 (Figure: QoS for an Aperiodic, One-at-a-time Reliable Reader): Since we enforce reliability on each DDS sample, it would be sufficient to keep the queue size at 1, except in the following case: suppose that the reader takes some action in response to the command received, which in turn causes the writer to issue another command right away. Because Connext DDS passes the user data up to the application even before acknowledging the DDS sample to the writer (for minimum latency), the first DDS sample is still pending for acknowledgement in the writer’s queue when the writer attempts to write the second DDS sample, and will cause the writing thread to block until the reader completes processing the first DDS sample and acknowledges it to the writer; all are as they should be. But if you want to run this infinite loop at full throttle, the reader should buffer one more DDS sample. Let’s follow the packets flow under a normal circumstance:

The sender application writes DDS sample 1 to the reader. The receiver application processes it and sends a user-level response 1 to the sender application, but has not yet ACK’d DDS sample 1.
The sender application writes DDS sample 2 to the receiving application in response to response 1. Because the reader’s queue is 2, it can accept DDS sample 2 even though it may not yet have acknowledged DDS sample 1. Otherwise, the reader may drop DDS sample 2, and would have to recover it later.
At the same time, the receiver application acknowledges DDS sample 1, and frees up one slot in the queue, so that it can accept DDS sample 3, which it on its way.

The above steps can be repeated ad-infinitum in a continuous traffic.

Line 7 (Figure: QoS for an Aperiodic, One-at-a-time Reliable Reader): Since we are not using keys, there is just one instance.

Line 9-Line 12 (Use Cases): We choose immediate response in the interest of fastest recovery. In high throughput, multicast scenario, delaying the response (with event thread priority set high of course) may decrease the likelihood of NACK storm causing a writer to drop some NACKs. This random delay reduces this chance by staggering the NACK response. But the minimum delay achievable once again depends on the OS.

Aperiodic, Bursty

Suppose you have aperiodically generated bursts of data, as in the case of a new aircraft approaching an airport. The data may be the same or different, but if they are written by a single writer, the challenge to this writer is to feed all readers as quickly and efficiently as possible when this burst of hundreds or thousands of DDS samples hits the system.

If you use an unreliable writer to push this burst of data, some of them may be dropped over an unreliable transport such as UDP.

If you try to shape the burst according to however much the slowest reader can process, the system throughput may suffer, and places an additional burden of queueing the DDS samples on the sender application.

If you push the data reliably as fast they are generated, this may cost dearly in repair packets, especially to the slowest reader, which is already burdened with application chores.

Connext DDS pull mode reliability offers an alternative in this case by letting each reader pace its own data stream. It works by notifying the reader what it is missing, then waiting for it to request only as much as it can handle. As in the aperiodic one-at-a-time case (Aperiodic Use Case: One-at-a-Time), multicast is supported, but its performance depends on the resolution of the minimum delay supported by the OS. At the cost of greater latency, this model can deliver reliability while using far fewer packets than in the push mode. The writer QoS is given in Figure: QoS for an Aperiodic, Bursty Writer, with a line-by-line explanation below.

Figure: QoS for an Aperiodic, Bursty Writer

1. qos->reliability.kind = DDS_RELIABLE_RELIABILITY_QOS;

2. qos->history.kind = DDS_KEEP_ALL_HISTORY_QOS;

3. qos->protocol.push_on_write = DDS_BOOLEAN_FALSE;

4.

5. //use these hard coded value until you use key

6. qos->resource_limits.initial_instances =

7. qos->resource_limits.max_instances = 1;

8. qos->resource_limits.initial_samples = qos->resource_limits.max_samples

9.  			= worstBurstInSample;

10. qos->resource_limits.max_samples_per_instance =

11. qos->resource_limits.max_samples;

12.

13. // piggyback HB not used

14. qos->protocol.rtps_reliable_writer.heartbeats_per_max_samples = 0;

15.

16. qos->protocol.rtps_reliable_writer.high_watermark = 1;

17. qos->protocol.rtps_reliable_writer.low_watermark = 0;

18.

19. qos->protocol.rtps_reliable_writer.min_nack_response_delay.sec = 0;

20. qos->protocol.rtps_reliable_writer.min_nack_response_delay.nanosec = 0;

21. qos->protocol.rtps_reliable_writer.max_nack_response_delay.sec = 0;

22. qos->protocol.rtps_reliable_writer.max_nack_response_delay.nanosec = 0;

23. qos->reliability.max_blocking_time = blockingTime;

24.

25. // should be faster than the send rate, but be mindful of OS resolution

26. qos->protocol.rtps_reliable_writer.fast_heartbeat_period.sec = 0;

27. qos->protocol.rtps_reliable_writer.fast_heartbeat_period.nanosec =

28. 				alertReaderWithinThisMs * 1000000;

29. qos->protocol.rtps_reliable_writer.max_heartbeat_retries = 5;

30.

31. // essentially turn off slow HB period

32.  qos->protocol.rtps_reliable_writer.heartbeat_period.sec = 3600 * 24 * 7;

Line 1 (Figure: QoS for an Aperiodic, Bursty Writer): This is the default setting for a writer, shown here strictly for clarity.

Line 2 (Figure: QoS for an Aperiodic, Bursty Writer): Since we do not want any data lost, we want the History kind set to KEEP_ALL.

Line 3 (Figure: QoS for an Aperiodic, Bursty Writer): The default Connext DDS reliable writer will push, but we want the reader to pull instead.

Line 5-Line 11 (Figure: QoS for an Aperiodic, Bursty Writer): We assume a single instance, in which case the maximum DDS sample count will be the same as the maximum DDS sample count per writer. In contrast to the one-at-a-time case discussed in Aperiodic Use Case: One-at-a-Time, the writer’s queue is large; as big as the burst size in fact, but no more because this model tries to resolve a burst within a reasonable period, to be computed shortly. Of course, we could block the writing thread in the middle of the burst, but that might complicate the design of the sending application.

Line 13-Line 14 (Figure: QoS for an Aperiodic, Bursty Writer): By a ‘piggyback’ Heartbeat, we mean only a Heartbeat that is appended to data being pushed from the writer. Strictly speaking, the writer will also append a Heartbeat with each reply to a reader’s lost DDS sample request, but we call that a ‘framing’ Heartbeat. Since data is pulled, heartbeats_per_max_samples is ignored.

Line 16-Line 17 (Figure: QoS for an Aperiodic, Bursty Writer): Similar to the previous aperiodic writer, this writer spends most of its time idle. But as the name suggests, even a single new DDS sample implies more DDS sample to follow in a burst. Putting the writer into a fast mode quickly will allow readers to be notified soon. Only when all DDS samples have been delivered, the writer can rest.

Line 19- Line 23 (Figure: QoS for an Aperiodic, Bursty Writer): Similar to the one-at-a-time case, there is no reason to delay response with only one reader. In this case, we can estimate the time to resolve a burst with only a few parameters. Let’s say that the reader figures it can safely receive and process 20 DDS samples at a time without being overwhelmed, and that the time it takes a writer to fetch these 20 DDS samples and send a single packet containing these 20 DDS samples, plus the time it takes a reader to receive and process these DDS samples, and send another request back to the writer for the next 20 DDS samples is 11 ms. Even on the same hardware, if the reader’s processing time can be reduced, this time will decrease; other factors such as the traversal time through Connext DDS and the transport are typically in microseconds range (depending on machines of course).

For example, let’s also say that the worst case burst is 1000 DDS samples. The writing thread will of course not block because it is merely copying each of the 1000 DDS samples to the Connext DDS queue on the writer side; on a typical modern machine, the act of writing these 1000 DDS samples will probably take no more than a few ms. But it would take at least 1000/20 = 50 resend packets for the reader to catch up to the writer, or 50 times 11 ms = 550 ms. Since the burst model deals with one burst at a time, we would expect that another burst would not come within this time, and that we are allowed to block for at least this period. Including a safety margin, it would appear that we can comfortably handle a burst of 1000 every second or so.

But what if there are multiple readers? The writer would then take more time to feed multiple readers, but with a fast transport, a few more readers may only increase the 11 ms to only 12 ms or so. Eventually, however, the number of readers will justify the use of multicast. Even in pull mode, Connext DDS supports multicast by measuring how many multicast readers have requested DDS sample repair. If the writer does not delay response to NACK, then repairs will be sent in unicast. But a suitable NACK delay allows the writer to collect potentially NACKs from multiple readers, and feed a single multicast packet. But as discussed in Aperiodic Use Case: One-at-a-Time, by delaying reply to coalesce response, we may end up waiting much longer than desired. On a Windows system with 10 ms minimum sleep achievable, the delay would add at least 10 ms to the 11 ms delay, so that the time to push 1000 DDS samples now increases to 50 times 21 ms = 1.05 seconds. It would appear that we will not be able to keep up with incoming burst if it came at roughly 1 second, although we put fewer packets on the wire by taking advantage of multicast.

Line 25-Line 28 (Use Cases): We now understand how the writer feeds the reader in response to the NACKs. But how does the reader realize that it is behind? The writer notifies the reader with a Heartbeat to kick-start the exchange. Therefore, the latency will be lower bound by the writer’s fast heartbeat period. If the application is not particularly sensitive to latency, the minimum wait time supported by the OS (10 ms on Windows systems, for example) might be a reasonable value.

Line 29 (Figure: QoS for an Aperiodic, Bursty Writer): With a fast heartbeat period of 50 ms, a writer will take 500 ms (50 ms times the default max_heartbeat_retries of 10) to write-off an unresponsive reader. If a reader crashes while we are writing a lot of DDS samples per second, the writer queue may completely fill up before the writer has a chance to drop the crashed reader. Lowering max_heartbeat_retries will prevent that scenario.

Line 31-Line 32 (Figure: QoS for an Aperiodic, Bursty Writer): For an aperiodic writer, turning off slow periodic Heartbeats will remove unwanted traffic from the network.

Figure: QoS for an Aperiodic, Bursty Reader shows example code for a corresponding aperiodic, bursty reader.

Figure: QoS for an Aperiodic, Bursty Reader

1. qos->reliability.kind = DDS_RELIABLE_RELIABILITY_QOS;

2. qos->history.kind = DDS_KEEP_ALL_HISTORY_QOS;

3. qos->resource_limits.initial_samples =

4. qos->resource_limits.max_samples =

5. qos->reader_resource_limits.max_samples_per_remote_writer = 32;

6.

7. //use these hard coded value until you use key

8. qos->resource_limits.max_samples_per_instance =

9. qos->resource_limits.max_samples;

10. qos->resource_limits.initial_instances =

11. qos->resource_limits.max_instances = 1;

12.

13. // the writer probably has more for the reader; ask right away

14. qos->protocol.rtps_reliable_reader.min_heartbeat_response_delay.sec = 0;

15. qos->protocol.rtps_reliable_reader.min_heartbeat_response_delay.nanosec = 0;

16. qos->protocol.rtps_reliable_reader.max_heartbeat_response_delay.sec = 0;

17. qos->protocol.rtps_reliable_reader.max_heartbeat_response_delay.nanosec = 0;

Line 1-Line 2 (Figure: QoS for an Aperiodic, Bursty Reader ): Unlike a writer, the reader’s default reliability setting is best-effort, so reliability must be turned on. Since we don’t want to drop anything, we choose KEEP_ALL for the History QoS kind.

Line 3-Line 5 (Figure: QoS for an Aperiodic, Bursty Reader ): Unlike the writer, the reader’s queue can be kept small, since the reader is free to send ACKs for as much as it wants anyway. In general, the larger the queue, the larger the packet needs to be, and the higher the throughput will be. When the reader NACKs for lost DDS sample, it will only ask for this much.

Line 7-Line 11 (Figure: QoS for an Aperiodic, Bursty Reader ): We do not use keys in this example.

Line 13-Line 17 (Figure: QoS for an Aperiodic, Bursty Reader ): We respond immediately to catch up as soon as possible. When there are many readers, this may cause a NACK storm, as discussed in the reader code for one-at-a-time reliable reader.

Periodic

In a periodic reliable model, we can use the writer and the reader queue to keep the data flowing at a smooth rate. The data flows from the sending application to the writer queue, then to the transport, then to the reader queue, and finally to the receiving application. Unless the sending application or any one of the receiving applications becomes unresponsive (including a crash) for a noticeable duration, this flow should continue uninterrupted.

The latency will be low in most cases, but will be several times higher for the recovered and many subsequent DDS samples. In the event of a disruption (e.g., loss in transport, or one of the readers becoming temporarily unresponsive), the writer’s queue level will rise, and may even block in the worst case. If the writing thread must not block, the writer’s queue must be sized sufficiently large to deal with any fluctuation in the system. Figure: QoS for a Periodic Reliable Writer shows an example, with line-by-line analysis below.

Figure: QoS for a Periodic Reliable Writer

1. qos->reliability.kind = DDS_RELIABLE_RELIABILITY_QOS;

2. qos->history.kind = DDS_KEEP_ALL_HISTORY_QOS;

3. qos->protocol.push_on_write = DDS_BOOLEAN_TRUE;

4.

5. //use these hard coded value until you use key

6. qos->resource_limits.initial_instances =

7. qos->resource_limits.max_instances = 1;

8.

9. int unresolvedSamplePerRemoteWriterMax =

10. 	worstCaseApplicationDelayTimeInMs * dataRateInHz / 1000;

11. qos->resource_limits.max_samples = unresolvedSamplePerRemoteWriterMax;

12. qos->resource_limits.initial_samples = qos->resource_limits.max_samples/2;

13. qos->resource_limits.max_samples_per_instance =

14. 	qos->resource_limits.max_samples;

15.

16. int piggybackEvery = 8;

17. qos->protocol.rtps_reliable_writer.heartbeats_per_max_samples =

18. 	qos->resource_limits.max_samples / piggybackEvery;

19.

20. qos->protocol.rtps_reliable_writer.high_watermark = piggybackEvery * 4;

21. qos->protocol.rtps_reliable_writer.low_watermark = piggybackEvery * 2;

22. qos->reliability.max_blocking_time = blockingTime;

23.

24. qos->protocol.rtps_reliable_writer.min_nack_response_delay.sec = 0;

25. qos->protocol.rtps_reliable_writer.min_nack_response_delay.nanosec = 0;

26.

27. qos->protocol.rtps_reliable_writer.max_nack_response_delay.sec = 0;

28. qos->protocol.rtps_reliable_writer.max_nack_response_delay.nanosec = 0;

29.

30. qos->protocol.rtps_reliable_writer.fast_heartbeat_period.sec = 0;

31. qos->protocol.rtps_reliable_writer.fast_heartbeat_period.nanosec =

32. `	alertReaderWithinThisMs * 1000000;

33. qos->protocol.rtps_reliable_writer.max_heartbeat_retries = 7;

34.

35. // essentially turn off slow HB period

36. qos->protocol.rtps_reliable_writer.heartbeat_period.sec = 3600 * 24 * 7;

Line 1 (Figure: QoS for a Periodic Reliable Writer): This is the default setting for a writer, shown here strictly for clarity.

Line 2 (Figure: QoS for a Periodic Reliable Writer): Since we do not want any data lost, we set the History kind to KEEP_ALL.

Line 3 (Figure: QoS for a Periodic Reliable Writer): This is the default setting for a writer, shown here strictly for clarity. Pushing will yield lower latency than pulling.

Line 5-Line 7 (Figure: QoS for a Periodic Reliable Writer): We do not use keys in this example, so there is only one instance.

Line 9-Line 11 (Figure: QoS for a Periodic Reliable Writer): Though a simplistic model of queue, this is consistent with the idea that the queue size should be proportional to the data rate and the wort case jitter in communication.

Line 12 (Figure: QoS for a Periodic Reliable Writer): Even though we have sized the queue according to the worst case, there is a possibility for saving some memory in the normal case. Here, we initially size the queue to be only half of the worst case, hoping that the worst case will not occur. When it does, Connext DDS will keep increasing the queue size as necessary to accommodate new DDS samples, until the maximum is reached. So when our optimistic initial queue size is breached, we will incur the penalty of dynamic memory allocation. Furthermore, you will wind up using more memory, as the initially allocated memory will be orphaned (note: does not mean a memory leak or dangling pointer); if the initial queue size is M_i and the maximal queue size is M_m, where M_m = M_i * 2^n, the memory wasted in the worst case will be (M_m - 1) * sizeof(DDS sample) bytes. Note that the memory allocation can be avoided by setting the initial queue size equal to its max value.

Line 13-Line 14 (Figure: QoS for a Periodic Reliable Writer): If there is only one instance, maximum DDS samples per instance is the same as maximum DDS samples allowed.

Line 16-Line 18 (Figure: QoS for a Periodic Reliable Writer): Since we are pushing out the data at a potentially rapid rate, the piggyback heartbeat will be useful in letting the reader know about any missing DDS samples. The piggybackEvery can be increased if the writer is writing at a fast rate, with the cost that more DDS samples will need to queue up for possible resend. That is, you can consider the piggyback heartbeat to be taking over one of the roles of the periodic heartbeat in the case of a push. So sending fewer DDS samples between piggyback heartbeats is akin to decreasing the fast heartbeat period seen in previous sections. Please note that we cannot express piggybackEvery directly as its own QoS, but indirectly through the maximum DDS samples.

Line 20-Line 22 (Figure: QoS for a Periodic Reliable Writer): If piggybackEvery was exactly identical to the fast heartbeat, there would be no need for fast heartbeat or the high watermark. But one of the important roles for the fast heartbeat period is to allow a writer to abandon inactive readers before the queue fills. If the high watermark is set equal to the queue size, the writer would not doubt the status of an unresponsive reader until the queue completely fills—blocking on the next write (up to blockingTime). By lowering the high watermark, you can control how vigilant a writer is about checking the status of unresponsive readers. By scaling the high watermark to piggybackEvery, the writer is expressing confidence that an alive reader will respond promptly within the time it would take a writer to send 4 times piggybackEvery DDS samples. If the reader does not delay the response too long, this would be a good assumption. Even if the writer estimated on the low side and does go into fast mode (suspecting that the reader has crashed) when a reader is temporarily unresponsive (e.g., when it is performing heavy computation for a few milliseconds), a response from the reader in question will resolve any doubt, and data delivery can continue uninterrupted. As the reader catches up to the writer and the queue level falls below the low watermark, the writer will pop out to the normal, relaxed mode.

Line 24-Line 28 (Figure: QoS for a Periodic Reliable Writer): When a reader is behind (including a reader whose Durability QoS is non-VOLATILE and therefore needs to catch up to the writer as soon as it is created), how quickly the writer responds to the reader’s request will determine the catch-up rate. While a multicast writer (that is, a writer with multicast readers) may consider delaying for some time to take advantage of coalesced multicast packets. Keep in mind the OS delay resolution issue discussed in the previous section.

Line 30-Line 33 (Figure: QoS for a Periodic Reliable Writer): The fast heartbeat mechanism allows a writer to detect a crashed reader and move along with the remaining readers when a reader does not respond to any of the max_heartbeat_retries number of heartbeats sent at the fast_heartbeat_period rate. So if you want a more cautious writer, decrease either numbers; conversely, increasing either number will result in a writer that is more reluctant to write-off an unresponsive reader.

Line 35-Line 36 (Figure: QoS for a Periodic Reliable Writer): Since this a periodic model, a separate periodic heartbeat to notify the writer’s status would seem unwarranted; the piggyback heartbeat sent with DDS samples takes over that role.

Figure: QoS for a Periodic Reliable Reader shows how to set the QoS for a matching reader, followed by a line-by-line explanation.

Figure: QoS for a Periodic Reliable Reader

1. qos->reliability.kind = DDS_RELIABLE_RELIABILITY_QOS;

2. qos->history.kind = DDS_KEEP_ALL_HISTORY_QOS;

3. qos->resource_limits.initial_samples =

4. qos->resource_limits.max_samples =

5. qos->reader_resource_limits.max_samples_per_remote_writer =

6.  ((2*piggybackEvery - 1) + dataRateInHz * delayInMs / 1000);

7.

8. //use these hard coded value until you use key

9. qos->resource_limits.max_samples_per_instance =

10.      qos->resource_limits.max_samples;

11. qos->resource_limits.initial_instances =

12.      qos->resource_limits.max_instances = 1;

13.

14. qos->protocol.rtps_reliable_reader.min_heartbeat_response_delay.sec = 0;

15. qos->protocol.rtps_reliable_reader.min_heartbeat_response_delay.nanosec = 0;

16. qos->protocol.rtps_reliable_reader.max_heartbeat_response_delay.sec = 0;

17. qos->protocol.rtps_reliable_reader.max_heartbeat_response_delay.nanosec = 0;

Line 1-Line 2 (Figure: QoS for a Periodic Reliable Reader): Unlike a writer, the reader’s default reliability setting is best-effort, so reliability must be turned on. Since we don’t want to drop anything, we choose KEEP_ALL for the History QoS.

Line 3-Line 6 (Figure: QoS for a Periodic Reliable Reader) Unlike the writer, the reader queue is sized not according to the jitter of the reader, but rather how many DDS samples you want to cache speculatively in case of a gap in sequence of DDS samples that the reader must recover. Remember that a reader will stop giving a sequence of DDS samples as soon as an unintended gap appears, because the definition of strict reliability includes in-order delivery. If the queue size were 1, the reader would have no choice but to drop all subsequent DDS samples received until the one being sought is recovered. Connext DDS uses speculative caching, which minimizes the disruption caused by a few dropped DDS samples. Even for the same duration of disruption, the demand on reader queue size is greater if the writer will send more rapidly. In sizing the reader queue, we consider 2 factors that comprise the lost DDS sample recovery time:

How long it takes a reader to request a resend to the writer.

The piggyback heartbeat tells a reader about the writer’s state. If only DDS samples between two piggybacked DDS samples are dropped, the reader must cache piggybackEvery DDS samples before asking the writer for resend. But if a piggybacked DDS sample is also lost, the reader will not get around to asking the writer until the next piggybacked DDS sample is received. Note that in this worst case calculation, we are ignoring stand-alone heartbeats (i.e., not piggybacked heartbeat from the writer). Of course, the reader may drop any number of heartbeats, including the stand-alone heartbeat; in this sense, there is no such thing as the absolute worst case—just reasonable worst case, where the probability of consecutive drops is acceptably low. For the majority of applications, even two consecutive drops is unlikely, in which case we need to cache at most (2*piggybackEvery - 1) DDS samples before the reader will ask the writer to resend, assuming no delay (Line 14-Line 17, Figure: QoS for a Periodic Reliable Reader).

How long it takes for the writer to respond to the request.

Even ignoring the flight time of the resend request through the transport, the writer takes a finite time to respond to the repair request--mostly if the writer delays reply for multicast readers. In case of immediate response, the processing time on the writer end, as well as the flight time of the messages to and from the writer do not matter unless very larger data rate; that is, it is the product term that matters. In case the delay for multicast is random (that is, the minimum and the maximum delay are not equal), one would have to use the maximum delay to be conservative.

Line 8-Line 12 (Figure: QoS for a Periodic Reliable Reader): Since we are not using keys, there is just one instance.

Line 14-Line 17 (Figure: QoS for a Periodic Reliable Reader): If we are not using multicast, or the number of readers being fed by the writer, there is no reason to delay.