5.5. Reliability Protocol and Wire Representation

5.5.1. [Critical] Writer-side filtered samples not always marked as acknowledged when application acknowledgment was used

When application acknowledgments were used in conjunction with ContentFilteredTopics, samples that were writer-side filtered were not always immediately marked as acknowledged. This problem, which was documented as fixed via RTI Issue ID CORE-6132 in previous releases (such as 7.3.0), was not completely fixed in those releases. Specifically, DataWriters failed to acknowledge writer-side filtered samples when the DataReader used a filter expression that was based only on key fields or only on metadata fields. This problem is now fixed, and DataWriters now always mark writer-side filtered samples as acknowledged.

Note: In release 5.3.1.45, both CORE-6132 and CORE-14980 were fixed together at the same time.
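
As a rough illustration of the QoS combination involved, the following XML sketches a DataWriter and DataReader using application acknowledgment. The library, profile, and field names (`ExampleLibrary`, `AppAckProfile`, `key_field`) are hypothetical placeholders; only the policy values reflect the scenario described above.

```xml
<qos_library name="ExampleLibrary">
  <qos_profile name="AppAckProfile">
    <datawriter_qos>
      <reliability>
        <kind>RELIABLE_RELIABILITY_QOS</kind>
        <!-- Application-level acknowledgment, as described in this issue -->
        <acknowledgment_kind>APPLICATION_AUTO_ACKNOWLEDGMENT_MODE</acknowledgment_kind>
      </reliability>
    </datawriter_qos>
    <datareader_qos>
      <reliability>
        <kind>RELIABLE_RELIABILITY_QOS</kind>
        <acknowledgment_kind>APPLICATION_AUTO_ACKNOWLEDGMENT_MODE</acknowledgment_kind>
      </reliability>
      <!-- The issue was triggered when this DataReader was created from a
           ContentFilteredTopic whose expression referenced only key fields
           (e.g., "key_field = 0") or only metadata fields. -->
    </datareader_qos>
  </qos_profile>
</qos_library>
```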

[RTI Issue ID CORE-14980]

5.5.2. [Critical] Late-joiner DataReader may have stopped receiving samples from DataWriters using a finite durability.writer_depth

A late-joining DataReader may have stopped receiving samples from a DataWriter configured with a finite durability.writer_depth. This issue might have occurred under the following conditions:

  • The DataReader was using multicast.

  • The DataWriter was configured to use an asynchronous publisher.

When the issue occurred, you would have observed an infinite exchange of Heartbeat and NACK messages on the wire, without the DataWriter resending the samples requested in the NACK messages.
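
The conditions above can be sketched as a QoS profile. The profile name, multicast address, and depth value are hypothetical placeholders chosen for illustration:

```xml
<qos_profile name="FiniteWriterDepthExample">
  <datawriter_qos>
    <durability>
      <kind>TRANSIENT_LOCAL_DURABILITY_QOS</kind>
      <!-- A finite durability.writer_depth (the default is AUTO) -->
      <writer_depth>10</writer_depth>
    </durability>
    <!-- Second condition: asynchronous publisher -->
    <publish_mode>
      <kind>ASYNCHRONOUS_PUBLISH_MODE_QOS</kind>
    </publish_mode>
  </datawriter_qos>
  <datareader_qos>
    <durability>
      <kind>TRANSIENT_LOCAL_DURABILITY_QOS</kind>
    </durability>
    <!-- First condition: the DataReader receives over multicast -->
    <multicast>
      <value>
        <element>
          <receive_address>239.255.0.1</receive_address>
        </element>
      </value>
    </multicast>
  </datareader_qos>
</qos_profile>
```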

[RTI Issue ID CORE-14930]

5.5.3. [Critical] Unexpected data loss when using batching and finite reader resource limits

Consider the following scenario:

  • A DataWriter enables batching.

  • A reliable, keep-all history DataReader sets its resource_limits.max_samples to a finite value. For example, max_samples = 4.

  • The DataWriter writes enough batches to exceed the resource_limits.max_samples, but the DataReader initially fails to receive one of them. For example, the reader receives batch sequence numbers 1, 3, 4, 5, and then batch sequence number 2 gets repaired.

  • The application using the DataReader does not call take() to remove any of the received samples from the reader queue.

For simplicity, let’s also assume that each batch contains only one sample.

The problem was that the DataReader would never deliver batches 2 through 4 to the user application. After fixing the problem, the DataReader now delivers batches 1 through 4 to the user application.
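
The scenario can be sketched as a QoS profile. The profile name is hypothetical; the `max_samples` value of 4 follows the example above:

```xml
<qos_profile name="BatchingFiniteLimitsExample">
  <datawriter_qos>
    <!-- The DataWriter enables batching -->
    <batch>
      <enable>true</enable>
    </batch>
    <reliability>
      <kind>RELIABLE_RELIABILITY_QOS</kind>
    </reliability>
  </datawriter_qos>
  <datareader_qos>
    <!-- A reliable, keep-all history DataReader... -->
    <reliability>
      <kind>RELIABLE_RELIABILITY_QOS</kind>
    </reliability>
    <history>
      <kind>KEEP_ALL_HISTORY_QOS</kind>
    </history>
    <!-- ...with a finite resource limit on samples -->
    <resource_limits>
      <max_samples>4</max_samples>
    </resource_limits>
  </datareader_qos>
</qos_profile>
```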

[RTI Issue ID CORE-16005]

5.5.4. [Critical] Repair sample was not sent if the sample was larger than one transport’s message_size_max but smaller than the other’s

This issue was fixed in previous releases but not documented at those times.

Consider the following scenario:

  • The DomainParticipant is using two transports.

  • The transports have different values for message_size_max.

  • The DataWriter’s publish_mode.kind is SYNCHRONOUS.

  • The DataWriter writes a sample whose serialized size is between the two values of message_size_max.

  • The DataReader’s reliability.kind is RELIABLE.

In release 5.0.0, this sample was successfully sent as live data using the transport with the larger message_size_max. But if this sample ever had to be resent as repair data, the resend attempt would fail with the following error message:

!write resend. Reliable large data requires asynchronous writer

The problem was the inconsistency between the behavior of live data and repair data. This problem affected release 5.0.0. A fix was made in 5.3.0 (undocumented) so both the live data and repair data were successfully sent using the transport with the larger message_size_max.

Further changes were made in 5.3.1.20 and 6.0.1, as part of RTI Issue ID CORE-9287 (see Section 4.3.1 Invalid fragment size when using FlowController in the Release Notes for 6.0.1). Starting in those releases, both live data and repair data in the above scenario fail to be sent because one of the message_size_max values is too small for the sample. The send attempt fails with the following error message:

COMMENDFacade_canSampleBeSent:NOT SUPPORTED | Reliable fragmented data requires asynchronous writer.

The behaviors for sending live data and repair data are now consistent. If either transport’s message_size_max is too small, neither live nor repair data will be sent.
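
For illustration, assuming the two transports are the built-in UDPv4 and shared-memory transports (the section does not name them, and the sizes below are hypothetical), the scenario might be configured as follows:

```xml
<participant_qos>
  <!-- Two transports enabled on the DomainParticipant -->
  <transport_builtin>
    <mask>UDPv4|SHMEM</mask>
  </transport_builtin>
  <!-- Different message_size_max values per transport -->
  <property>
    <value>
      <element>
        <name>dds.transport.UDPv4.builtin.parent.message_size_max</name>
        <value>9216</value>
      </element>
      <element>
        <name>dds.transport.shmem.builtin.parent.message_size_max</name>
        <value>65536</value>
      </element>
    </value>
  </property>
</participant_qos>
```

With this sketch, a reliable, synchronous DataWriter writing a sample whose serialized size falls between 9216 and 65536 bytes would match the scenario described above.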

[RTI Issue ID CORE-9297]

5.5.5. [Critical] Sample may not have been delivered to a DataReader in a Required Subscription

A DataWriter configured with Required Subscriptions might not have delivered some samples to any DataReader that was part of those subscriptions. These samples were incorrectly lost.

This issue occurred when some samples had to be gapped (through GAP messages) to the DataReaders in the Required Subscription—for example, when a DataReader used a ContentFilteredTopic. It could also occur when DataReaders belonging to a Durable Subscription were restarted.

[RTI Issue ID CORE-16166]

5.5.6. [Major] DataWriters with finite durability.writer_depth may have sent repairs over unicast instead of multicast

A DataWriter configured with finite durability.writer_depth is intended to use unicast to send repairs to a late-joining DataReader configured with a multicast address, then switch to multicast once the DataReader has caught up. This switch from unicast to multicast did not occur as expected. As a result, late-joining DataReaders may have continued receiving repair data via unicast instead of multicast even after catching up, potentially affecting network performance.

The issue only affected DataWriters using batching or multichannel features.

[RTI Issue ID CORE-14823]

5.5.7. [Major] Reliable large data performance issues due to redundant fragment repairs

While publishing large data reliably, a DataWriter was in the middle of sending a sample, and a DataReader had received some of that sample’s fragments. If the DataReader received a Heartbeat message at that point, it may have requested all of the fragments missing from the sample (by sending a NACK_FRAG message), including fragments the DataWriter had not yet sent.

When the DataWriter received the NACK_FRAG message, it sent redundant repair messages for those not-yet-sent fragments, degrading the DataWriter’s performance. Now the DataReader only requests fragments that the DataWriter has already sent and that the DataReader is missing.
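
For context, a reliable large-data DataWriter of the kind described here typically enables asynchronous publishing, since fragmented reliable data requires an asynchronous writer (see the error messages quoted in Section 5.5.4). A minimal sketch, with a hypothetical profile name:

```xml
<qos_profile name="ReliableLargeDataExample">
  <datawriter_qos>
    <reliability>
      <kind>RELIABLE_RELIABILITY_QOS</kind>
    </reliability>
    <!-- Required for samples that must be fragmented and sent reliably -->
    <publish_mode>
      <kind>ASYNCHRONOUS_PUBLISH_MODE_QOS</kind>
    </publish_mode>
  </datawriter_qos>
</qos_profile>
```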

[RTI Issue ID CORE-14599]