Is there a way to have strictly reliable communication between a topic's datawriter and the persistence service's datareader, and likewise between the persistence service's datawriter and the topic's datareader? I would like my datawriter to block and time out after some specified period if an acknowledgement is not received. For my topic's datawriters and datareaders I have the direct_communication XML value set to false, so that all messages being written have to go through the persistence service before being received by the datareader. This is necessary to make sure data gets persisted, but with my current settings I get no feedback if the persistence service goes down for whatever reason; I just eventually lose messages once the writer's history depth is reached.
Currently I am using the built-in profile Generic.KeepLastReliable, with some modifications, for the datareader and datawriter QoS on my topic. In the QoS for the persistence service the history kind is set to KEEP_LAST, which I've noticed differs from the built-in Generic.StrictReliable profile, where it is set to KEEP_ALL_HISTORY.
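To make that routing concrete, the topic reader settings amount to something like the sketch below, expressed with the RTI Connext Java API rather than the XML profile I actually use. The PERSISTENT durability kind is just a placeholder; the part that matters for this question is direct_communication = false, which sits under the durability policy.

```java
import com.rti.dds.infrastructure.DurabilityQosPolicyKind;
import com.rti.dds.infrastructure.ReliabilityQosPolicyKind;
import com.rti.dds.subscription.DataReaderQos;
import com.rti.dds.subscription.Subscriber;

public final class TopicReaderQos {
    // Reader QoS for the topic: reliable, durable, and with direct
    // communication disabled so every sample reaches the reader via the
    // persistence service instead of directly from the original writer.
    public static DataReaderQos build(Subscriber subscriber) {
        DataReaderQos qos = new DataReaderQos();
        subscriber.get_default_datareader_qos(qos);

        qos.reliability.kind = ReliabilityQosPolicyKind.RELIABLE_RELIABILITY_QOS;
        qos.durability.kind = DurabilityQosPolicyKind.PERSISTENT_DURABILITY_QOS; // placeholder kind
        qos.durability.direct_communication = false; // force the hop through the persistence service

        return qos;
    }
}
```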
Could you possibly provide an example of how to set up strictly reliable communication between a datawriter and a persistence service datareader?
Thanks,
Jake
I am trying the following scenario:
I run my application and see the expected communication with the persistence service. Then, to simulate a system error, I shut down just the persistence service. With the datawriter that talks to the persistence service configured for strict reliability, I expect my call to .write() to block when the persistence service is down, but it currently isn't doing that. My configuration is described below. Any feedback would be extremely helpful!
XML Configuration:
For the datawriter that writes to the persistence service I have tried using strict reliability by setting the reliability kind to RELIABLE and the history kind to KEEP_ALL_HISTORY. I have also set "max_blocking_time" to a value under 500 ms for now, and set max_samples_per_instance to 1 to try to make sure nothing gets queued by the writer. The datareader on the persistence service side is also configured with strict reliability.
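Expressed with the Java API instead of the XML I actually use, the writer settings I'm trying amount to roughly this sketch (the 500 ms figure is just my current placeholder):

```java
import com.rti.dds.infrastructure.HistoryQosPolicyKind;
import com.rti.dds.infrastructure.ReliabilityQosPolicyKind;
import com.rti.dds.publication.DataWriterQos;
import com.rti.dds.publication.Publisher;

public final class StrictReliableWriterQos {
    public static DataWriterQos build(Publisher publisher) {
        DataWriterQos qos = new DataWriterQos();
        publisher.get_default_datawriter_qos(qos);

        // Strict reliability, as in the built-in Generic.StrictReliable profile.
        qos.reliability.kind = ReliabilityQosPolicyKind.RELIABLE_RELIABILITY_QOS;
        qos.history.kind = HistoryQosPolicyKind.KEEP_ALL_HISTORY_QOS;

        // Block write() for at most ~500 ms while waiting for acknowledgements.
        qos.reliability.max_blocking_time.sec = 0;
        qos.reliability.max_blocking_time.nanosec = 500000000;

        // Keep the writer from queueing anything: at most one sample per instance.
        qos.resource_limits.max_samples_per_instance = 1;

        return qos;
    }
}
```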
A little more background:
Messages are sent in response to a user click, and I want each click to immediately result in either the event being stored by the persistence service or a failure notification. I never want a click that timed out against the persistence service to be queued up and sent when the persistence service comes back; I want that sample to be dropped immediately if no acknowledgement arrives.
I want a write to either succeed or fail. For a write to succeed I expect an acknowledgement from the persistence service; if it fails I expect a timeout, and on a timeout I would like the sample to be dropped immediately.
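In code, the behavior I'm after is essentially the sketch below. Event and EventDataWriter are stand-ins for my actual rtiddsgen-generated type, and the Java API reports a write timeout by throwing RETCODE_TIMEOUT, which is the failure path I want to surface to the user:

```java
import com.rti.dds.infrastructure.InstanceHandle_t;
import com.rti.dds.infrastructure.RETCODE_TIMEOUT;

public final class ClickPublisher {
    private final EventDataWriter writer; // hypothetical generated writer for my "Event" type

    public ClickPublisher(EventDataWriter writer) {
        this.writer = writer;
    }

    // Returns true if write() came back within max_blocking_time (what I want
    // to treat as "stored by the persistence service"), false on a timeout.
    // A timed-out sample is reported as a failure and never retried.
    public boolean publishClick(Event event) {
        try {
            writer.write(event, InstanceHandle_t.HANDLE_NIL);
            return true;
        } catch (RETCODE_TIMEOUT timedOut) {
            return false; // no acknowledgement in time: drop the sample, notify the user
        }
    }
}
```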
The observed behavior:
What is happening is that when max_samples_per_instance is fixed to some small value, like 1, I see the timeout exception if I send messages too quickly. I'm guessing this is due to the delay between the write and the persist. If I remove the resource limit, I no longer get the timeout exception.
Regardless of what max_samples_per_instance is set to, when I kill my persistence service I never get a timeout exception when writing. The write simply returns as if nothing went wrong. When the persistence service comes back, the samples the writer had stored (up to its resource limits) are all dumped on the persistence service, which is not what I want.
My question at this point:
How can I tell RTI to treat the persistence service as a known datareader that I always need an acknowledgement from? The persistence service is my single point of failure, so I need it to always be up and I need to quickly detect if it goes down or is not running at any point. As mentioned before, any feedback is very helpful.
Thanks,
Jake
I would like to close out this thread since I found a solution. Anyone is welcome, however, to contribute in case this might be helpful to others.
The solution I am using involves having my reader listen for the LivelinessChangedStatus and checking for when alive_count drops below one. For my purposes I categorize the reader as "healthy" as long as this alive_count field is >= 1. If the field is less than 1, or there has not yet been any liveliness indication to let me know the other end is up, I assume my reader is not ready to receive messages yet.
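For anyone interested, the gist of it is the sketch below (Java API; the class and flag names are my own, and the only RTI pieces are the on_liveliness_changed callback and the alive_count field):

```java
import com.rti.dds.subscription.DataReader;
import com.rti.dds.subscription.DataReaderAdapter;
import com.rti.dds.subscription.LivelinessChangedStatus;

// Tracks the liveliness of the matched datawriters (in my case, the
// persistence service's datawriter) as seen by my datareader.
public class PersistenceHealthListener extends DataReaderAdapter {

    private volatile boolean healthy = false;

    @Override
    public void on_liveliness_changed(DataReader reader, LivelinessChangedStatus status) {
        // Healthy only while at least one matched writer is alive; before the
        // first callback ever fires, the flag stays false ("not ready yet").
        healthy = (status.alive_count >= 1);
    }

    public boolean isPersistenceHealthy() {
        return healthy;
    }
}
```

I install the listener when creating the datareader, with a status mask that includes StatusKind.LIVELINESS_CHANGED_STATUS, and check isPersistenceHealthy() before letting a user click turn into a write.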