Hello,
I am new to RTI DDS and I am trying to understand the following behaviour which I've encountered in my program:
I have a DataWriter that writes (using write_untyped) 5 messages on a specific topic name (all messages are samples of the same instance).
The DataWriter is being deleted by its publisher right after sending the messages.
I also have a DataReader that listens for and reads messages on the same topic name. For some reason, my reader only receives the first 2 or 3 samples and not the rest.
However, if I wait a little before deleting the writer, my reader receives all 5 messages.
I don't understand why the reader isn't getting all the samples sent by the writer, or how waiting between sending the last message and deleting the writer fixes the problem.
Any help would be appreciated!
So, if the code is deleting the DataWriter right after it calls datawriter->write(), it's possible that the data hasn't yet been received by the datareader. In the default configuration, DDS will definitely send it before the write() call returns, but it's not waiting for the DataReader to receive the data before letting you delete the DataWriter.
How fast are you sending the data (how long is your code waiting between sending each data sample)? I would only expect the data loss to happen every now and then (it's a race condition) for only the last sample sent if you're waiting a few milliseconds between each data sample.
Also, if you're using default QOS, the default is BEST_EFFORT and a HISTORY setting of KEEP_LAST 1, which means that at any time the datawriter or datareader will only buffer 1 data sample, and the connection is not reliable. When a datawriter is deleted, DDS will send a last data sample that doesn't have real value, but does let the Reader know that the Writer has been deleted.
If you have a history buffer of 1, then the "disposed writer data sample" can overwrite any data previously received but not yet processed by the DataReader, so the application code may not see the last real data sample sent, since it was immediately overwritten by a "disposed writer data sample". This can/will happen even when the network delivers every single packet from writer to reader successfully.
There are APIs that your code can call to know if the reader has acknowledged receipt of data sent by a writer when using a RELIABLE Reliability QOS. Look at
https://community.rti.com/static/documentation/connext-dds/6.0.1/doc/api/connext_dds/api_cpp/classDDSDataWriter.html#a57db2fe3bff153b070463e62f9cc4925
but you have to enable RELIABLE Reliability and KEEP_ALL History QOS for both reader and writer. The recommendation is to use the Builtin QOS profile, Generic.StrictReliable, to create your DataReaders and DataWriters if you want a reliable connection between them.
See this:
https://community.rti.com/examples/built-qos-profiles
Finally, what is your use case? DDS is not really designed for an application to start, connect with DDS, send some "messages", and then shut down. While you could do that, as you can see, you have to do a lot more to ensure that all data sent is received. There's also a race condition at startup, where your application can start sending data before it has successfully discovered the other applications that want the data. In that case, the initial data sent is never received by subscribers. See this:
https://community.rti.com/kb/why-does-my-dds-datareader-miss-first-few-samples
Fundamentally, in systems that use DDS, applications are started when the system is started and the applications don't shut down until you shut down the entire system, thus avoiding race conditions that can affect messages sent around startup and shutdown.
Hi Howard, thanks for the quick response.
There is no waiting (in my code) between sending the data samples. The data loss happens every time and always for the last few samples (never the first ones) unless I wait a few millis before deleting the datawriter. My datareader has a KEEP_ALL History QOS so I don't believe that the samples are being overwritten at the reader's side. I also don't want to have a RELIABLE reliability QOS for my reader/writer.
I still don't understand why waiting a few millis before deleting the datawriter helps in fixing this problem. As you mentioned, DDS will definitely send the data before the write() call returns, so the reader is supposed to receive the messages anyway (and it does when we wait). Also, the reader is waiting for the messages and isn't deleted during the test in my code.
So...not sure what you're trying to do. If you need to get all of the data, you should be using the RELIABLE Reliability. Otherwise, there's no guarantee that you'll get all of the data.
With BEST_EFFORT Reliability, KEEP_ALL History doesn't prevent DDS from overwriting data samples in the queue that the user code hasn't yet removed with take(). It's basically in a mode of using the queue as a circular buffer, with new data overwriting the oldest when the queue becomes full.
So, the physical size of the queue and how it's managed with regards to instances (unique key values) is controlled by the ResourceLimits QOS Policy. By default, with Connext 5.x/6.x, the queue is "unlimited" in size, i.e., it'll allocate as much memory as needed if the user doesn't take data out of the queue. Did you change the ResourceLimits QOS Policy?
How is your subscribing application getting the received data? Through a listener called by a DDS internal thread or directly in your own thread blocked on a waitset or periodically polling?
You can certainly try to use Wireshark to confirm whether all of the data is sent to the subscribing application (assuming you're running on 2 different hosts). If you run on the same host, then Wireshark won't be useful, since by default DDS will use shared memory and bypass the network stack.