gap and heartbeat difference discussion

rip

This topic is an off-shoot from another, on which I Made a Mistake :)  As Neil Gaiman says... "If you're making mistakes, it means you're out there doing something."

The upshot of the pertinent bit of the discussion was what happens when KEEP_LAST is set to some arbitrary N, the Writer side cache is filled with unacknowledged samples, and the application calls .write(...) on that writer.  My (imperfect) understanding (alas!) was that the writer would block until a slot was available.  That turned out to be incorrect.

The writer doesn't hold on to keep_last data, even if it wasn't acknowledged... unexpected. I guess I assumed "keep_last" semantics were "Keep the last N unacknowledged samples".  Turns out that the semantics of KEEP_LAST are exactly what the name says: keep the last N, acknowledged or not.  If your application uses KEEP_LAST with a depth of 1 then exactly 1 sample will be kept, and then replaced on the next .write(...), because the previous one is no longer relevant.
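To make those semantics concrete, here is a minimal toy model of a KEEP_LAST writer history for a single instance. It is only a sketch: `KeepLastHistory` and its methods are hypothetical names made up for this post, not part of any DDS or RTPS API, and the model deliberately ignores acknowledgment state, which is the point of the behavior described above.

```python
from collections import deque


class KeepLastHistory:
    """Toy model of a writer-side KEEP_LAST history for one instance.

    Hypothetical illustration only; not a DDS API.
    """

    def __init__(self, depth):
        # A deque with maxlen discards its oldest entry when a new one
        # is appended to a full deque.
        self._samples = deque(maxlen=depth)
        self._next_seq = 0

    def write(self, value):
        """Store a sample and return its sequence number.

        Never blocks in this model, even if older samples were never
        acknowledged: the oldest one is simply replaced.
        """
        seq = self._next_seq
        self._next_seq += 1
        self._samples.append((seq, value))
        return seq

    def available(self):
        """Sequence numbers still held in the cache."""
        return [seq for seq, _ in self._samples]


history = KeepLastHistory(depth=1)
history.write("a")
history.write("b")
print(history.available())  # [1]
```

With a depth of 1, the second write replaces the first, so only sequence number 1 remains.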

Reading through the protocol, RTPS would in this event (eventually) send a gap submessage to indicate that the dropped sequence numbers were no longer relevant, which is different from sending a heartbeat to indicate that the sample is no longer available.  My assumption here is that the heartbeat, which includes information about which samples are available, would show that the sample was no longer available and that the reader should react accordingly.

The consequence is that a reader only has a limited window of opportunity to request a missed sample: once newer writes have replaced it in the writer's cache, it can no longer be resent.

What happens on the Reader side... when a gap submessage is sent, will the reader's state/callback on_sample_lost get called?  If the underlying key concept is relevance (gap == sample is no longer relevant, heartbeat == sample is no longer available), then my assumption is that on_sample_lost is NOT called, because the sample was not relevant according to the writer.  A quick test validates the behavior:  Reliable, keep_last 3, send sequence #0-4, artificial pause on the reader before reading; only sequence #2-4 are displayed, and on_sample_lost is not called.
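The quick test above can be replayed with a small toy model. This is a sketch under stated assumptions, not the real RTPS state machine: `ToyReader`, `on_gap`, `on_data` and `replay` are hypothetical names, and the rule that an unexplained jump in sequence numbers counts as lost is an assumption of the model, added only so there is something to contrast the gap case with.

```python
class ToyReader:
    """Toy reader that tracks the next expected sequence number.

    Hypothetical illustration only; not a DDS or RTPS API.
    """

    def __init__(self):
        self.next_expected = 0
        self.delivered = []
        self.sample_lost_count = 0

    def on_gap(self, first, last):
        # The writer declares [first, last] not relevant to this reader:
        # skip past them without counting anything as lost.
        if first <= self.next_expected <= last:
            self.next_expected = last + 1

    def on_data(self, seq):
        if seq < self.next_expected:
            return  # duplicate or already skipped
        if seq > self.next_expected:
            # Assumption of this toy model: a jump that no gap explained
            # is counted as lost samples.
            self.sample_lost_count += seq - self.next_expected
        self.delivered.append(seq)
        self.next_expected = seq + 1


def replay(depth, count):
    """Write `count` samples with KEEP_LAST `depth` while the reader is
    paused, then let the reader catch up."""
    available = list(range(count))[-depth:]  # what KEEP_LAST still holds
    reader = ToyReader()
    if available and available[0] > 0:
        reader.on_gap(0, available[0] - 1)   # replaced samples are announced as a gap
    for seq in available:
        reader.on_data(seq)
    return reader.delivered, reader.sample_lost_count


print(replay(depth=3, count=5))  # ([2, 3, 4], 0)
```

In this model the reader ends up with sequence numbers 2-4 and a lost count of zero, which matches what the test showed.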

Feel free to continue the discussion below.



Gerardo Pardo

Hi Rip,

If a sample in the history cache is replaced because the History kind is set to KEEP_LAST and newer samples are written, then those 'replaced' samples are not considered lost.

Setting KEEP_LAST tells the DataWriter that it is not important to communicate every single change to a particular instance. Rather, the important thing is to communicate the 'latest values'. This is actually quite helpful in situations where samples are written faster than certain DataReaders can handle. The DataWriter can just keep and send the most recent/relevant data per instance rather than delaying the writer or consuming a lot of memory saving the intermediate values.  Of course this is only sensible for certain kinds of data, like samples of continuous data signals. It is not appropriate for messages such as alarms.  Those other types of data, where all samples must be delivered, should set HISTORY kind to KEEP_ALL.
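The memory side of that trade-off can be sketched with a toy calculation. This is only an illustration under simplifying assumptions: `pending_after_burst` is a hypothetical helper, it models a single instance and a reader that acknowledges nothing during the burst, and it treats KEEP_ALL as unbounded, whereas a real writer is capped by its resource limits and may delay the write instead.

```python
from collections import deque


def pending_after_burst(n_written, depth=None):
    """Samples a writer still holds after a burst of n_written writes.

    depth=None stands in for KEEP_ALL (unbounded in this toy model);
    an integer depth stands in for KEEP_LAST with that depth.
    Hypothetical helper, not a DDS API.
    """
    cache = deque(maxlen=depth)  # maxlen=None means no bound
    for seq in range(n_written):
        cache.append(seq)
    return list(cache)


print(len(pending_after_burst(1000, depth=1)))  # 1
print(len(pending_after_burst(1000)))           # 1000
```

With KEEP_LAST the writer's footprint stays at the configured depth no matter how far behind the reader is; with KEEP_ALL every unacknowledged sample has to be kept.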

As you suggested, at the wire-protocol level those samples replaced due to history result in RTPS GAP messages. As you also said, the GAP tells the DataReader that those samples are not relevant to the DataReader. This does not indicate the samples were lost, so on_sample_lost is indeed not called. Incidentally, this mechanism is also used for writer-side content filtering. Samples that are filtered by the DataWriter for a specific DataReader result in GAP messages to the DataReader and are also not considered 'lost'.