Hi,
We are trying to solve a potential problem with the following scenario:
1. A writer is set up and sends 1 sample of a single instance
2. Persistence Service reads the sample and stores it.
3. The writer is closed.
4. Persistence Service crashes due to some problem (maybe the machine crashes for some reason).
5. All readers receive the on_liveliness_changed callback, with the instance state being "NOT_ALIVE_NO_WRITERS".
6. Persistence Service starts up again.
We would like to notify all readers that all instances that were resolved as "NOT_ALIVE_NO_WRITERS" are now alive again, because Persistence Service is up.
However, I've noticed that the instance is not sent to the readers again. I also see that the on_liveliness_changed callback is called, but the read operation throws the "NO_DATA" exception, so we can't tell which instances are back alive.
How can we make readers aware that the state of an instance has changed from "NOT_ALIVE_NO_WRITERS" to "ALIVE"?
Hello Meir,
I am not 100% sure this will work as I describe here, but I think the only way to make the readers aware that the PersistenceService has restarted and holds the instances is to use the DataReader PropertyQos and set the property dds.data_reader.state.filter_redundant_samples to the value 0 (Connext DDS User's Manual, Section 12.4.4, "How to Configure a DataReader for Durable Reader State"). When you do this, the DataReader will not "remember" the samples that the PersistenceService sent to it, so when the PersistenceService restarts, the DataReader will receive all the samples from the PersistenceService again. This will give you the notification, but you will also get duplicate samples (i.e., samples you have already received), which may be undesirable in some cases.
As a side note, I am not convinced that the behavior you are describing, where the mere presence of a PersistenceService causes the DataWriters to be considered 'alive', is really the most "correct" one from the DDS point of view. Yes, this is what RTI PersistenceService does by default, but the PersistenceService can also be configured to operate differently... See below.
When configuring a PersistenceGroup within the PersistenceService configuration, there is an option called propagate_dispose (see RTI Connext DDS User's Manual, Section 27.8, "Creating Persistence Groups"). When this option is set to TRUE and the PersistenceService notices that all the DataWriters writing an instance have disappeared, the PersistenceService will also unregister the instance, so that applications know that no DataWriter is updating the instance anymore. The DataReader will therefore get a NOT_ALIVE_NO_WRITERS even with the PersistenceService running. To me this behavior appears more correct: the PersistenceService holds the last value(s) of the instance, but by itself it will not update the instance, and this is really what NOT_ALIVE_NO_WRITERS indicates. I can even imagine that a future version of RTI PersistenceService may make this behavior (propagate_dispose set to TRUE) the default.
Given the above, I wonder if there may be a better way than using the NOT_ALIVE_NO_WRITERS notification to accomplish what you are ultimately after... If you post additional details here on why you want the behavior you describe, perhaps we could think of other ways to accomplish it.
Gerardo
Hi Gerardo,
Thank you for your answer.
If I'm not wrong, setting dds.data_reader.state.filter_redundant_samples to the value 0 will cause duplicate samples when using two PersistenceServices (we use two PersistenceServices for redundancy). If that is the case, we probably can't use this option.
Also, I'm not sure I understand what you wrote about PersistenceService propagating dispose messages. Isn't the point of PersistenceService to back up durable data for cases where the original writer has stopped working?
I also think that our solution might not be the right one. What we're really trying to accomplish is:
We have many topics with all readers and writers being persistent. For those topics, we don't need to use the on_liveliness_changed callback; we are only interested in the data itself. Also, most of the time the writers are just used to send a batch of data and are then closed, and the data is held by PersistenceService.
Also, we have a few topics which are durable, but the data is only valid while the original writer is up and working (like "State" information that is not relevant if the application crashed). PersistenceService is not configured to keep data from these topics.
We have one API that we wrote in order to wrap DDS with something simpler. In this API we've implemented a callback interface that has two methods: onDataArrival and onDataRemoval. We use this callback in all of our reader applications. In order to support the requirement that data is invalidated after the writer has closed (which is relevant only for a small number of our topics), we call onDataRemoval for disposes and for on_liveliness_changed events.
Because we work with PersistenceService (PS), our solution is fine while PS is running, but it doesn't support crashes of the PS and is therefore not fully fault-tolerant.
I now think that there is no option other than creating two callback interfaces in our API in order to separate the two cases. Unfortunately, this will result in code changes throughout the entire system...
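A minimal sketch of what such a split could look like, assuming hypothetical names throughout: on_data_arrival and on_data_removal stand in for our onDataArrival and onDataRemoval, while on_writers_lost, TopicDispatcher and the writer_bound flag do not exist in our API and are invented here for illustration. Nothing below calls DDS; handle_dispose and handle_no_writers are assumed to be invoked by the wrapper when it sees a dispose or a NOT_ALIVE_NO_WRITERS transition.

```python
from dataclasses import dataclass


class ReaderCallback:
    """Hypothetical wrapper callback; on_writers_lost is the newly added hook."""

    def on_data_arrival(self, key, sample):
        pass

    def on_data_removal(self, key):
        pass

    def on_writers_lost(self, key):
        # No-op default, so existing subclasses keep working unchanged.
        pass


@dataclass
class TopicDispatcher:
    """Routes wrapper-level events for one topic to a ReaderCallback."""

    callback: ReaderCallback
    # True only for the few topics whose data is valid only while a writer is alive.
    writer_bound: bool = False

    def handle_dispose(self, key):
        # An explicit dispose always invalidates the data.
        self.callback.on_data_removal(key)

    def handle_no_writers(self, key):
        # Every topic is told that the writers are gone...
        self.callback.on_writers_lost(key)
        # ...but only writer-bound topics treat that as data removal.
        if self.writer_bound:
            self.callback.on_data_removal(key)
```

With a default no-op for the new hook and a per-topic flag, existing reader applications would only need to change if they care about the distinction, which might limit how far the change spreads through the system.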
To further explain the rationale for trying to detect changes, suppose we have a topic on which we send the state of a certain module.
When the module is up, it sends an 'OK' state to all DataReaders, and it might send 'ERROR' when it's faulty. But there could also be a case where the module is OK but disconnected from the network. In such a case, we'd like the reader applications to be aware of the change and act accordingly (the same goes for when the module reconnects).
The problem is that, as far as the module is concerned, its state didn't change, so no new data is sent. The module can disconnect and reconnect multiple times, and each time we get only an on_liveliness_changed callback, with no actual way to infer the change in state.
What could be done in such a scenario?
Regards,
Michael
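One possible direction for the disconnect/reconnect case above, sketched under assumptions rather than offered as a confirmed solution: the reader side caches the last sample per instance and replays it when liveliness returns. StateTracker and its methods are hypothetical names, and no DDS call is made. The only DDS detail relied on is that the LivelinessChangedStatus passed to on_liveliness_changed carries an alive_count field. The sketch also assumes that all writers matched to the reader belong to the same module; telling several modules apart would need extra bookkeeping per writer.

```python
class StateTracker:
    """Caches the last sample per instance and replays it when writers return."""

    def __init__(self, on_data_arrival, on_data_removal):
        self._on_data_arrival = on_data_arrival
        self._on_data_removal = on_data_removal
        self._last = {}       # instance key -> last received sample
        self._alive = False   # whether at least one matched writer is alive

    def sample_received(self, key, sample):
        # Remember the sample so it can be replayed after a reconnect.
        self._last[key] = sample
        self._alive = True
        self._on_data_arrival(key, sample)

    def liveliness_changed(self, alive_count):
        # alive_count is assumed to come from LivelinessChangedStatus.
        now_alive = alive_count > 0
        if now_alive == self._alive:
            return  # no transition, nothing to report
        self._alive = now_alive
        for key, sample in self._last.items():
            if now_alive:
                # Writers are back: re-announce the last known state.
                self._on_data_arrival(key, sample)
            else:
                # All writers gone: treat the cached state as invalid.
                self._on_data_removal(key)
```

For example, a module that sent 'OK', dropped off the network and came back would produce an arrival, a removal and a second arrival of the same 'OK' sample, even though the module itself never wrote again.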