This question is about TRANSIENT_LOCAL, RELIABLE topics when the communication link goes down for a long enough period for the samples to get invalidated. The ConnextDDS user manual states that TRANSIENT_LOCAL samples will be delivered if the samples have not yet been acknowledged.
Consider one DataReader and one DataWriter with the above-mentioned QoS settings, where the communication link goes down for long enough that the samples become invalidated. If that link gets restored, will the samples be delivered to the DataReader again? Or will the acknowledged state persist, so that the data is not delivered?
From testing, it seems the actual samples do not get delivered. However, when one monitors the received_sample_count and alive_count, these counts change as if the sample was delivered. Yet when one does a read, there are no samples available. (The read is made with SampleStateKind.ANY_SAMPLE_STATE, ViewStateKind.ANY_VIEW_STATE and InstanceStateKind.ALIVE_INSTANCE_STATE.)
The Use Case:
The use case here is that the reader should be notified if the data samples become invalid (after an extended communication-down state) and are removed from the reader cache (this works). When communication is restored, the reader should be notified again (this also seems to work). What does not seem to happen is the reader cache being updated with a sample, even though the "samples available" count increases. It is also not desirable to restart either application to refresh the reader's cache.
Maybe another way to ask this question is: Is there a way for a reader to request a refresh of its cache after it has detected that data became invalid following a broken communication link?
Thanks
Nico
Hello Nico,
I do not quite understand what you mean by "samples becoming invalidated". In general, samples are valid forever unless you set a finite LIFESPAN, which would cause them to be "removed" after the lifespan time elapses. Another thing that can remove samples is a HISTORY kind of KEEP_LAST, if newer samples replace the older ones for the same key. Do you mean they are marked with an instance state of NOT_ALIVE_NO_WRITERS?
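To make the two removal mechanisms concrete, here is a minimal toy model in Python. It is only a sketch and not the Connext DDS API: the class ToyHistoryCache and its depth, lifespan and now parameters are hypothetical names invented for illustration, and time is passed in explicitly rather than read from a clock.

```python
from collections import defaultdict, deque


class ToyHistoryCache:
    """Toy per-key sample cache (hypothetical; not the Connext DDS API)."""

    def __init__(self, depth, lifespan=None):
        self.depth = depth        # models the KEEP_LAST history depth per key
        self.lifespan = lifespan  # seconds; None models an infinite LIFESPAN
        self._samples = defaultdict(lambda: deque(maxlen=depth))

    def write(self, key, value, now):
        # A deque with maxlen drops the oldest entry for this key when full,
        # which is how a newer sample "replaces" an older one under KEEP_LAST.
        self._samples[key].append((now, value))

    def read(self, key, now):
        # Samples whose age has reached the lifespan are treated as removed.
        return [value for (written_at, value) in self._samples[key]
                if self.lifespan is None or now - written_at < self.lifespan]


cache = ToyHistoryCache(depth=1)          # KEEP_LAST 1, infinite lifespan
cache.write("cfg", "v1", now=0)
cache.write("cfg", "v2", now=1)           # replaces "v1" for the same key
print(cache.read("cfg", now=1000))        # ['v2'] : still there, no expiry
```

With a finite lifespan (for example ToyHistoryCache(depth=2, lifespan=10)), the same read returns an empty list once the sample's age reaches 10, without any link outage being involved.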
In the case you mentioned, if the link is down long enough, the discovery process will mark the remote Participant as "stale". This will cause the DataWriter to "forget" about the matched DataReader, and likewise the DataReader to "forget" about the matched DataWriter. This may fire notifications such as "NOT_ALIVE_NO_WRITERS" on some DataReader instances, and you will also see it if you monitor the match status.
Even if the DataWriter forgets about the DataReader, the DataReader does not totally forget the DataWriter. It keeps minimal state to remember the GUID of the DataWriter and the highest sequence number it had received from that DataWriter.
Then, when the link is up again, the DataWriter and DataReader will discover each other and create a new "match". The DataWriter will then heartbeat the DataReader to announce the sequence numbers that the DataReader should have, and the DataReader will realize that it already has many of them (because of the GUID of the DataWriter and the highest sequence number associated with that GUID, which it kept). Based on this it will "NACK" only the things it had not received before.
Even if the samples were somehow sent by the DataWriter, the DataReader will drop them based on the fact that it had received them already. Maybe this is causing the increase in the received_sample_count you mention.
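The behaviour described above can be sketched with a small toy model. This is not the Connext DDS implementation or API: ToyReliableReader, on_heartbeat and on_data are hypothetical names, and the sketch assumes in-order delivery and tracks only the highest sequence number per writer GUID, as in the description.

```python
class ToyReliableReader:
    """Toy model of the reader-side state kept per DataWriter GUID (hypothetical)."""

    def __init__(self):
        # Writer GUID -> highest sequence number received. In this model the
        # entry survives the loss of the match, as described above.
        self.highest_received = {}
        self.delivered = []

    def on_heartbeat(self, writer_guid, first_sn, last_sn):
        """Return the sequence numbers the reader would NACK."""
        highest = self.highest_received.get(writer_guid, 0)
        start = max(first_sn, highest + 1)
        return list(range(start, last_sn + 1))

    def on_data(self, writer_guid, sn, payload):
        """Deliver a sample unless it was already received; return True if delivered."""
        if sn <= self.highest_received.get(writer_guid, 0):
            return False  # already received before the outage: dropped
        self.highest_received[writer_guid] = sn
        self.delivered.append(payload)
        return True


reader = ToyReliableReader()
for sn in (1, 2, 3):
    reader.on_data("writer-A", sn, f"sample-{sn}")

# Link goes down and comes back; the writer announces samples 1..3 again.
print(reader.on_heartbeat("writer-A", 1, 3))   # [] : nothing to request
print(reader.on_data("writer-A", 2, "sample-2"))  # False : resent sample is dropped
```

In this model, a writer that published nothing new during the outage gives the reader nothing to NACK, so no sample reaches the application after the link is restored.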
Now, if the DataWriter has not written new samples for an instance since the link went down, then after the communication is re-established no samples will be received by the DataReader, so it will not know that the instance state is no longer NOT_ALIVE_NO_WRITERS. I agree this use case is not well covered. We do in fact have an RFE to address this situation and "refresh" the instances.
There are some potential workarounds. There is a new TopicQuery feature available with Connext DDS 5.3 which can help the DataReader query a "snapshot" of the DataWriter cache. You can find more about this feature in Chapter 22 of the Connext DDS 5.3 User's Guide, titled "Topic Queries".
Gerardo
Hi Gerardo
Thanks for that information, it was helpful. I experimented again with some of what you said and it cleared up the misunderstandings I had. One of the realizations was that samples can still be valid (from SampleInfo) even if the state is NOT_ALIVE_NO_WRITERS_INSTANCE_STATE.
Since the samples in our use case are configuration data that is a one-time publication on startup, it will be OK to query the data with InstanceStateKind.ANY_INSTANCE_STATE instead of InstanceStateKind.ALIVE_INSTANCE_STATE. The valid samples can then still be queried even though the InstanceStateKind is not alive.
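For anyone reading later, the difference between the two reads can be illustrated with a toy model. This is only a sketch in Python and not the Connext DDS API: ToySample, ToyReaderCache, writer_lost and the string constants are hypothetical stand-ins for the SampleInfo valid-data flag and the InstanceStateKind masks.

```python
from dataclasses import dataclass

ALIVE = "ALIVE"
NOT_ALIVE_NO_WRITERS = "NOT_ALIVE_NO_WRITERS"
ANY = None  # stands in for ANY_INSTANCE_STATE


@dataclass
class ToySample:
    key: str
    data: str
    valid_data: bool = True  # stands in for the valid-data flag in SampleInfo


class ToyReaderCache:
    """Toy reader cache with a per-instance state (hypothetical)."""

    def __init__(self):
        self.samples = []
        self.instance_state = {}

    def add(self, sample):
        self.samples.append(sample)
        self.instance_state[sample.key] = ALIVE

    def writer_lost(self, key):
        # The samples stay in the cache; only the instance state changes.
        self.instance_state[key] = NOT_ALIVE_NO_WRITERS

    def read(self, instance_state=ANY):
        return [s for s in self.samples
                if instance_state is ANY
                or self.instance_state[s.key] == instance_state]


cache = ToyReaderCache()
cache.add(ToySample(key="config", data="startup settings"))
cache.writer_lost("config")            # link down long enough to lose the match

print(len(cache.read(ALIVE)))          # 0 : filtered out by the instance state
print(len(cache.read(ANY)))            # 1 : the sample is still readable
```

The sample returned by the second read still has valid_data set to True in this model, which matches what was observed: the data is valid even though the instance is not alive.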
The only information missing now is the update to instance state when the writer is alive again and it seems the RFE you mentioned will address this situation in the future. That is good news.
Thank you
Nico