5.5. Reliability Protocol and Wire Representation
5.5.1. [Critical] Writer-side filtered samples not always marked as acknowledged when application acknowledgment was used
When application acknowledgments were used in conjunction with ContentFilteredTopics, samples that were writer-side filtered were not always immediately marked as acknowledged. This problem, which was documented as fixed via RTI Issue ID CORE-6132 in previous releases (such as 7.3.0), was not completely fixed in those releases. Specifically, DataWriters failed to acknowledge writer-side filtered samples when the DataReader used a filter expression that was based only on key fields or only on metadata fields. This problem is now fixed, and DataWriters now always mark writer-side filtered samples as acknowledged.
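The scenario in question might be configured with an XML QoS profile along these lines (the profile name and the remote-reader-filter limit are illustrative; the key-field-only filter expression itself would be supplied when creating the ContentFilteredTopic in application code):

```xml
<qos_profile name="AppAckWriterSideFiltering">
    <datawriter_qos>
        <reliability>
            <kind>RELIABLE_RELIABILITY_QOS</kind>
            <!-- Enable application-level acknowledgments -->
            <acknowledgment_kind>APPLICATION_AUTO_ACKNOWLEDGMENT_MODE</acknowledgment_kind>
        </reliability>
        <writer_resource_limits>
            <!-- Allow writer-side filtering for remote filtered readers -->
            <max_remote_reader_filters>32</max_remote_reader_filters>
        </writer_resource_limits>
    </datawriter_qos>
    <datareader_qos>
        <reliability>
            <kind>RELIABLE_RELIABILITY_QOS</kind>
            <acknowledgment_kind>APPLICATION_AUTO_ACKNOWLEDGMENT_MODE</acknowledgment_kind>
        </reliability>
    </datareader_qos>
</qos_profile>
```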
Note: In release 5.3.1.45, CORE-6132 and CORE-14980 were fixed together.
[RTI Issue ID CORE-14980]
5.5.2. [Critical] Late-joiner DataReader may have stopped receiving samples from DataWriters using a finite durability.writer_depth
A late-joining DataReader may have stopped receiving samples from a
DataWriter configured with a finite durability.writer_depth. This
issue might have occurred under the following conditions:
- The DataReader was using multicast.
- The DataWriter was configured to use an asynchronous publisher.
When the issue occurred, you would have observed an infinite exchange of Heartbeat messages, followed by NACK messages, on the wire without the DataWriter re-sending the samples mentioned in the NACK messages.
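A QoS configuration matching these conditions might look like the following XML sketch (the profile name, depth, and multicast address are illustrative):

```xml
<qos_profile name="FiniteWriterDepthMulticast">
    <datawriter_qos>
        <durability>
            <kind>TRANSIENT_LOCAL_DURABILITY_QOS</kind>
            <!-- Finite writer_depth, one of the triggering conditions -->
            <writer_depth>10</writer_depth>
        </durability>
        <publish_mode>
            <!-- Asynchronous publisher, the other triggering condition -->
            <kind>ASYNCHRONOUS_PUBLISH_MODE_QOS</kind>
        </publish_mode>
    </datawriter_qos>
    <datareader_qos>
        <multicast>
            <value>
                <element>
                    <receive_address>239.255.0.1</receive_address>
                </element>
            </value>
        </multicast>
    </datareader_qos>
</qos_profile>
```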
[RTI Issue ID CORE-14930]
5.5.3. [Critical] Unexpected data loss when using batching and finite reader resource limits
Consider the following scenario:
- A DataWriter enables batching.
- A reliable, keep-all history DataReader sets its resource_limits.max_samples to a finite value. For example, max_samples = 4.
- The DataWriter writes enough batches to exceed the resource_limits.max_samples, but the DataReader initially fails to receive one of them. For example, the reader receives batch sequence numbers 1, 3, 4, 5, and then batch sequence number 2 gets repaired.
- The application using the DataReader does not call take() to remove any of the received samples from the reader queue.
For the simplicity of this example, let’s also assume that there is only one sample per batch.
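The writer and reader sides of this scenario might be configured with an XML QoS profile along these lines (the profile name is illustrative; max_samples = 4 matches the example above):

```xml
<qos_profile name="BatchingFiniteReaderLimits">
    <datawriter_qos>
        <reliability>
            <kind>RELIABLE_RELIABILITY_QOS</kind>
        </reliability>
        <batch>
            <!-- Batching enabled on the writer -->
            <enable>true</enable>
        </batch>
    </datawriter_qos>
    <datareader_qos>
        <reliability>
            <kind>RELIABLE_RELIABILITY_QOS</kind>
        </reliability>
        <history>
            <kind>KEEP_ALL_HISTORY_QOS</kind>
        </history>
        <resource_limits>
            <!-- Finite limit from the example scenario -->
            <max_samples>4</max_samples>
        </resource_limits>
    </datareader_qos>
</qos_profile>
```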
The problem was that the DataReader would never deliver batches 2 through 4 to the user application. After fixing the problem, the DataReader now delivers batches 1 through 4 to the user application.
[RTI Issue ID CORE-16005]
5.5.4. [Critical] Repair sample was not sent if the sample fit within only one transport’s message_size_max
This issue was fixed in previous releases but not documented at those times.
Consider the following scenario:
- The DomainParticipant is using two transports.
- The transports have different values for message_size_max.
- The DataWriter’s publish_mode.kind is SYNCHRONOUS.
- The DataWriter writes a sample whose serialized size is between the two values of message_size_max.
- The DataReader’s reliability.kind is RELIABLE.
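As a sketch, the two-transport setup might be expressed in an XML QoS profile like this (the profile name and the specific message_size_max values are illustrative; any two different values that straddle the sample’s serialized size reproduce the scenario):

```xml
<qos_profile name="TwoTransportsDifferentMsgSizeMax">
    <domain_participant_qos>
        <transport_builtin>
            <mask>UDPv4 | SHMEM</mask>
        </transport_builtin>
        <property>
            <value>
                <element>
                    <name>dds.transport.UDPv4.builtin.parent.message_size_max</name>
                    <value>65530</value>
                </element>
                <element>
                    <!-- Deliberately smaller than the UDPv4 value -->
                    <name>dds.transport.shmem.builtin.parent.message_size_max</name>
                    <value>9216</value>
                </element>
            </value>
        </property>
    </domain_participant_qos>
    <datawriter_qos>
        <publish_mode>
            <kind>SYNCHRONOUS_PUBLISH_MODE_QOS</kind>
        </publish_mode>
        <reliability>
            <kind>RELIABLE_RELIABILITY_QOS</kind>
        </reliability>
    </datawriter_qos>
    <datareader_qos>
        <reliability>
            <kind>RELIABLE_RELIABILITY_QOS</kind>
        </reliability>
    </datareader_qos>
</qos_profile>
```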
In release 5.0.0, this sample was successfully sent as live data using
the transport with the larger message_size_max. But if this sample
ever had to be resent as repair data, the resend attempt would fail with
the following error message:
!write resend. Reliable large data requires asynchronous writer
The problem was the inconsistency between the behavior of live data and repair
data. This problem affected release 5.0.0. A fix was made in 5.3.0
(undocumented) so both the live data and repair data were successfully sent
using the transport with the larger message_size_max.
Further changes were made in 5.3.1.20 and 6.0.1, as part of RTI Issue ID
CORE-9287 (see Section 4.3.1 Invalid fragment size when using FlowController
in the Release Notes for 6.0.1).
Starting in those releases, both live data and repair data in the above
scenario fail to be sent because one of the message_size_max values
is too small for the sample. The send attempt fails with the following error
message:
COMMENDFacade_canSampleBeSent:NOT SUPPORTED | Reliable fragmented data requires asynchronous writer.
The behaviors for sending live data and repair data are now consistent.
If either transport’s message_size_max is too small, neither live
nor repair data will be sent.
[RTI Issue ID CORE-9297]
5.5.5. [Critical] Sample may not have been delivered to a DataReader in a Required Subscription
A DataWriter configured with Required Subscriptions might not have delivered some samples to any DataReader that was part of those subscriptions. These samples were incorrectly lost.
This issue occurred when some samples had to be gapped (through GAP messages) to the DataReaders in the Required Subscription—for example, when a DataReader used a ContentFilteredTopic. It could also occur when DataReaders belonging to a Durable Subscription were restarted.
[RTI Issue ID CORE-16166]
5.5.6. [Major] DataWriters with finite durability.writer_depth may have sent repairs over unicast instead of multicast
A DataWriter configured with finite durability.writer_depth is
intended to use unicast to send repairs to a late-joining DataReader
configured with a multicast address, then switch to multicast once the
DataReader has caught up. This switch from unicast to multicast did
not occur as expected. As a result, late-joining DataReaders may have
continued receiving repair data via unicast instead of multicast even
after catching up, potentially affecting network performance.
The issue only affected DataWriters using batching or multichannel features.
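For the batching case, an affected configuration might look like the following XML sketch (profile name, depth, and multicast address are illustrative):

```xml
<qos_profile name="BatchingFiniteWriterDepth">
    <datawriter_qos>
        <durability>
            <kind>TRANSIENT_LOCAL_DURABILITY_QOS</kind>
            <!-- Finite writer_depth on a batching DataWriter -->
            <writer_depth>10</writer_depth>
        </durability>
        <batch>
            <enable>true</enable>
        </batch>
    </datawriter_qos>
    <datareader_qos>
        <multicast>
            <value>
                <element>
                    <receive_address>239.255.0.1</receive_address>
                </element>
            </value>
        </multicast>
    </datareader_qos>
</qos_profile>
```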
[RTI Issue ID CORE-14823]
5.5.7. [Major] Reliable large data performance issues due to redundant fragment repairs
While publishing large data reliably, a DataWriter was in the middle of sending a sample and the DataReader had received some of that sample's fragments. If the DataReader then received a Heartbeat message, it may have requested all of the fragments missing from that sample (by sending a NACK_FRAG message), including fragments that the DataWriter had not yet sent.
Upon receiving the NACK_FRAG message, the DataWriter sent redundant repair messages for those not-yet-sent fragments, degrading its performance. Now the DataReader only requests fragments that the DataWriter has already sent and that are missing.
[RTI Issue ID CORE-14599]