KEEP_LAST=N vs KEEP_ALL + ResourceLimits.max_samples=N

Offline
Last seen: 11 months 4 days ago
Joined: 07/01/2016
Posts: 3

Hi,

Is there any difference between those two settings for a topic with one instance?

HistoryQosPolicy.kind = KEEP_LAST_HISTORY_QOS
HistoryQosPolicy.depth = N

and

HistoryQosPolicy.kind = KEEP_ALL_HISTORY_QOS
ResourceLimitsQosPolicy.max_samples = N
ResourceLimitsQosPolicy.max_samples_per_instance = N
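
In code (using the traditional DDS C++ API), I mean roughly the following sketch (error handling omitted; publisher is an existing DDSPublisher*):

// Case 1: history keeps only the most recent N samples per instance
DDS_DataWriterQos writer_qos;
publisher->get_default_datawriter_qos(writer_qos);
writer_qos.history.kind = DDS_KEEP_LAST_HISTORY_QOS;
writer_qos.history.depth = N;

// Case 2: keep everything, bounded only by the resource limits
writer_qos.history.kind = DDS_KEEP_ALL_HISTORY_QOS;
writer_qos.resource_limits.max_samples = N;
writer_qos.resource_limits.max_samples_per_instance = N;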

Best regards,
Rafał

Offline
Last seen: 3 weeks 1 day ago
Joined: 06/02/2010
Posts: 601

Hi,

Yes, it is quite different.

In the first case you are telling the middleware that it only needs to keep the last N samples (per instance) in its history. In other words, once you have the latest N samples, anything older no longer matters. So if the DataWriter is writing faster than some of the DataReaders can handle, and/or some messages are "lost" by the network, the DataWriter is free to replace the old samples with newer ones even if some of the old samples have not been acknowledged by some of the reliable readers.

In the second case you are saying that all samples matter and nothing should be lost or replaced. You are setting a resource limit by which the DataWriter cannot hold more than N samples in its writer cache, but that gives the DataWriter no license to lose (or replace) old samples that are still waiting to be delivered/acknowledged. In the scenario of a DataWriter writing faster than some of the subscribers can handle, and/or some messages being "lost" by the network, the DataWriter will block in the write operation (or return a TIMEOUT error) when the sample does not fit into its cache, because the samples already there still need to be acknowledged by some of the reliable readers. Effectively this creates back-pressure, similar to what TCP would give you, so the DataWriter self-regulates to send only as fast as the reliable readers can handle.
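
For example, in the traditional C++ API the blocking surfaces roughly like this (a sketch; writer_qos is a DDS_DataWriterQos, foo_writer a generated FooDataWriter and sample a Foo, all placeholder names):

// Bound how long write() may block before giving up with TIMEOUT
writer_qos.reliability.kind = DDS_RELIABLE_RELIABILITY_QOS;
writer_qos.reliability.max_blocking_time.sec = 1;
writer_qos.reliability.max_blocking_time.nanosec = 0;

// With a cache full of unacknowledged samples, write() exerts back-pressure
DDS_ReturnCode_t retcode = foo_writer->write(sample, DDS_HANDLE_NIL);
if (retcode == DDS_RETCODE_TIMEOUT) {
    // reliable readers have not yet acknowledged the older samples;
    // the application should slow down or retry
}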

Gerardo


Offline
Last seen: 11 months 4 days ago
Joined: 07/01/2016
Posts: 3

So if the DataWriter is writing faster than some of the DataReaders can handle, and/or some messages are "lost" by the network, the DataWriter is free to replace the old samples with newer ones even if some of the old samples have not been acknowledged by some of the reliable readers.

I thought that a reliable DataWriter, writing to a reliable DataReader, guarantees that all messages from its cache will be delivered to the DataReader's cache. In other words, when the write() method on the DataWriter returns success, I can be sure that this message will not be deleted before it is delivered to the DataReader's cache. Doesn't it work this way?

Rafał

Offline
Last seen: 3 months 6 days ago
Joined: 02/11/2016
Posts: 144

Hey Rafal,

There's no practical way of implementing perfect reliability (that is, a guarantee that if a writer wrote a sample, all readers will receive it).

There are ways of obtaining a certain level of reliability.

Within DDS, the QoS lets you set many different properties which affect reliability either directly or indirectly, allowing you to shape reliability in the way that best fits your use case.

It's important to note that the problem of perfect reliability is not unique to DDS.

You wouldn't want all of your reliable writes to block until every reader acknowledges your sample, and you wouldn't be able to store an infinite number of samples while a reader (or readers) haven't acknowledged any of them.

Because perfect reliability is sort of an impossible dream, DDS allows you (through the QoS) to tweak how your reliability works, hopefully yielding a solution best fitted to your use case.

Hope this helps,

Roy.

Offline
Last seen: 3 weeks 1 day ago
Joined: 06/02/2010
Posts: 601

Hi Rafal,

Roy's explanation is spot on! There is always a tradeoff when it comes to reliability, timeliness, and resources.

Further, I would like to explain the underlying philosophy in DDS.

The "DDS" reliability contract is between the DataWriter cache and the DataReader cache. In other words, it is not a guarantee of reliability on the samples "written" by the DataWriter; rather, it is a guarantee on the samples that "exist" in the DataWriter cache. But that cache can be a moving target, so unless the DataWriter stops writing, the behavior can be affected by timing and sample loss. See below.

This is a subtle but important difference. Samples can be removed from the DataWriter cache for various reasons. One is the History Qos policy kind set to KEEP_LAST (the contract is to keep the last "N" samples per instance); another is the Lifespan Qos policy set to a finite value. In these cases samples can be removed from the DataWriter cache independently of DataReader activity. Depending on its configuration, the DataWriter will try to send the samples when they are first written, but if a sample is not received by some readers and in the meantime it is replaced in the DataWriter cache because of Qos policies, then the DataWriter can proceed to "remove" that not-fully-acknowledged sample without violating the "RELIABILITY" contract. Stated differently, the DataWriter had its own autonomous reasons to remove the samples from its cache, and reliability only offers guarantees on samples present in the DataWriter cache; it does not prevent those "autonomous" Qos-policy-driven removals.

Of course you can get "TCP-style" reliability by setting Qos that does not allow the DataWriter to remove anything from the cache that has not been fully acknowledged by all reliable readers. You do this by setting the History Qos policy kind to KEEP_ALL and the Lifespan to infinite. But as Roy said, this can result in the DataWriter growing its memory use up to what is allowed by the RESOURCE_LIMITS Qos and then blocking, in the case where some readers cannot keep up. Even in this case there are policies by which the DataWriter can demote pathologically non-responsive DataReaders to best-effort, because you would not want to stop delivery of data to all readers just because some can't keep up... How long the DataWriter waits before doing that is also configurable via Qos.
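
In the traditional C++ API that combination would look roughly like this (a sketch; the resource limit value is arbitrary, and the lifespan is infinite by default, shown here only for emphasis):

// "TCP-style" reliability: nothing leaves the writer cache until acknowledged
writer_qos.history.kind = DDS_KEEP_ALL_HISTORY_QOS;
writer_qos.lifespan.duration.sec = DDS_DURATION_INFINITE_SEC;
writer_qos.lifespan.duration.nanosec = DDS_DURATION_INFINITE_NSEC;
writer_qos.reliability.kind = DDS_RELIABLE_RELIABILITY_QOS;
writer_qos.resource_limits.max_samples = 1000;  // the point where write() starts blocking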

Note that in many cases the "keep last" reliability is exactly what the application needs. Why waste time processing old data at the expense of looking at the current values? For many real-time variables it is better to look at the present than at the past... But there are other situations where every message counts and nothing can be lost. This is why with DDS you can configure the behavior per DataWriter/DataReader via Qos, so you get the tradeoff that makes sense for the application.

Gerardo

Offline
Last seen: 11 months 4 days ago
Joined: 07/01/2016
Posts: 3

Thank you for your explanations! My understanding of KEEP_LAST=N was incorrect; I thought that the DataWriter would not remove anything from its cache when new samples come in. But it makes sense to remove them if we are only interested in the N newest samples...

Rafał

Offline
Last seen: 4 years 9 months ago
Joined: 03/25/2015
Posts: 33

Gerardo/Roy,

This is the perfect thread; I came across it while trying to understand the difference between KEEP_LAST and KEEP_ALL in conjunction with resource limits. Good post. I have one question though, about case II, which is

HistoryQosPolicy.kind = KEEP_ALL_HISTORY_QOS
ResourceLimitsQosPolicy.max_samples = N
ResourceLimitsQosPolicy.max_samples_per_instance = N

I didn't see any reference to KEEP_ALL in your explanation. Can I assume that when max_samples is set to some finite value, KEEP_ALL has no effect?

Uday

Offline
Last seen: 3 months 6 days ago
Joined: 02/11/2016
Posts: 144

Hey Uday,

I believe (based on my experience) that your claim is false.

HistoryQosPolicy.kind affects durability and reliability.

In your example (and assuming a history.depth of N or higher):

If this is set on the reader:

N samples will be received and then new samples will be rejected, unless you use "take" on the DataReader to free space (see the sketch after this list).

If this is set on the writer:

if the writer is reliable and has N unacknowledged samples, it will be "stuck" until some timeout occurs (or until an acknowledgment arrives).
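
For example, on the reader side it is take() that frees space in the cache (a sketch using the traditional C++ API; FooDataReader/FooSeq are the usual generated placeholder types):

FooSeq samples;
DDS_SampleInfoSeq infos;
// take() removes the samples from the reader cache, making room for new ones;
// read() would leave them there, so the cache could still fill up
DDS_ReturnCode_t retcode = foo_reader->take(
    samples, infos, DDS_LENGTH_UNLIMITED,
    DDS_ANY_SAMPLE_STATE, DDS_ANY_VIEW_STATE, DDS_ANY_INSTANCE_STATE);
if (retcode == DDS_RETCODE_OK) {
    // ... process samples[i] where infos[i].valid_data ...
    foo_reader->return_loan(samples, infos);
}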


But that's what I know,

You may want to take a deeper look at the User Manual (or test it yourself).


Have fun,

Roy.

Offline
Last seen: 4 years 9 months ago
Joined: 03/25/2015
Posts: 33

Hi Roy,

Thanks for the quick response. I see that even in your current explanation you are referring to a finite depth value, and a resource limit of N is what is used to ensure reliability. What I am not yet able to get is the contribution of KEEP_ALL in this case.

I tried to look into the documentation too. I can follow the explanations of each policy individually, but how they work together is still unclear to me.

Uday

Offline
Last seen: 3 months 6 days ago
Joined: 02/11/2016
Posts: 144

Hey Uday,


I'll try to explain it better this time:

A writer that has KEEP_ALL should not remove samples before they are considered useless.

That means that a writer with KEEP_ALL can become stuck when it reaches the resource limits (or the history depth).

Similarly, a reader that has KEEP_ALL will not remove samples unless they are considered useless (a "take", for example, makes a sample "useless" for the reader).

That means that a reader with KEEP_ALL can become stuck when it reaches the resource limits (or history depth).

Setting the history depth to infinite means that the limit is based on the resource-limits settings.

Setting those to infinite means that you are limited only by what the OS will give you.

Using KEEP_LAST means that the writer (or reader) will keep the last N samples (N being the history depth) and, when a new sample arrives, it will (in the default case) replace the oldest one.
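
On the reader side, for example, that default replacement is configured roughly like this (a sketch; subscriber is an existing DDSSubscriber*):

DDS_DataReaderQos reader_qos;
subscriber->get_default_datareader_qos(reader_qos);
reader_qos.history.kind = DDS_KEEP_LAST_HISTORY_QOS;
reader_qos.history.depth = 10;  // an 11th sample of an instance overwrites the oldest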

This is how I understand it to work.


Hopefully this is more relevant?


Good luck,

Roy.

Offline
Last seen: 4 years 9 months ago
Joined: 03/25/2015
Posts: 33

Hi Roy,

Thanks for such a detailed explanation. Just yesterday I reached the same understanding, but through some experiments. Good that your explanation and my experiments are in sync; otherwise I would have been left in a confused state :-)

Can you throw some light on the following experiment I did:

Publisher - publishes 10000 samples in a row without a break or waiting for any ack. History kind is KEEP_LAST with depth 5. Resource limits are set to some small number (under 100).

Subscriber - blocks (for 10 sec) after the first sample reception. History kind is KEEP_LAST with depth 10. Resource limits are again some finite value (under 100).

Reliability kind is RELIABLE_RELIABILITY_QOS.

My doubt:

- Though my subscribing application resumed only after my publisher was done, it got "all" the samples. Is it the DDS middleware that book-keeps the samples because the communication is reliable?

One additional point: when I performed the same experiment with 100000 samples, I could see some sample loss, with confirmed delivery of the last 5 samples. This is in line with the history settings, I guess.

What I couldn't figure out is which settings ensured the successful delivery of all 10000 samples in the first case.

Thanks.

Uday

Offline
Last seen: 3 months 6 days ago
Joined: 02/11/2016
Posts: 144

Hey Uday,

I have a few questions:

1. Are you setting the resource limits to about 100 samples per instance and sending different instances?

2. How are you "blocking" on the subscriber side?

Given how I understand KEEP_LAST to work, if you aren't sending different instances, only the last 10 samples can be received.

So assuming you are sending different instances:

In the 10,000-sample run it is possible that you are sending between 1,000 and 10,000 different instances and that the reader manages to acknowledge all of them in time, so the publisher doesn't have to skip any.

In the 100,000-sample run I'm guessing the publisher has to "give up" on some of the samples, but I still expect that the last 5 samples per instance should arrive.
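
To illustrate what I mean by "different instances" (a sketch with a hypothetical keyed type; SensorReading, sensor_id and reading_writer are made-up names):

// IDL (hypothetical):
// struct SensorReading { long sensor_id; //@key
//                        double value; };
SensorReading sample;
sample.sensor_id = 7;  // each distinct key value is a separate instance,
sample.value = 3.14;   // each with its own per-instance history and limits
reading_writer->write(sample, DDS_HANDLE_NIL);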


Hope this helps,

Roy.