Status changes
Note: This article applies to RTI Connext 6.1.0 and above.
Connext DDS 6.1.0 has introduced some changes and new behaviors for statuses. This article introduces and clarifies some of those changes. First, let's explain some status definitions:
Lost: This status indicates that one or more DDS samples written by a matched DataWriter have failed to be received and will never be received. By reporting a sample as lost, the DataReader has declared that the sample will never be received, and will therefore not NACK it. It cannot be repaired by a DataWriter or resent to the DataReader.
Rejected: This status indicates that one or more DDS samples received from a matched DataWriter have been rejected by the DataReader because a resource limit would have been exceeded: for example, when the receive queue is full because the number of DDS samples in the queue is equal to the max_samples parameter of the RESOURCE_LIMITS QosPolicy. These rejected samples could be accepted later once the conditions for acceptance are met (e.g., once the number of samples in the queue becomes less than max_samples). A sample that is rejected can be resent any number of times until it is eventually reported as lost, dropped, or accepted.
Before understanding the status changes in release 6.1.0, let's first review sample and instance limits in 6.1.0:
Samples Limit: Happens when reaching max_samples in the RESOURCE_LIMITS QosPolicy.
Rejected Reason: DDS_REJECTED_BY_SAMPLES_LIMIT
Lost Reason: LOST_BY_SAMPLES_LIMIT
Instances Limit: Happens when reaching max_instances in the RESOURCE_LIMITS QosPolicy.
Rejected Reason: DDS_REJECTED_BY_INSTANCES_LIMIT
Lost Reason: LOST_BY_INSTANCES_LIMIT
Before release 6.1.0, when the samples limit or the instances limit was hit, both the Rejected and the Lost statuses were reported. Reporting a sample as both lost and rejected was confusing and hard to justify. In addition, samples were sometimes rejected in best-effort communication, which is incorrect according to the above definitions. As of release 6.1.0, any situation in which a sample would be rejected in reliable communication is mapped to a lost event in best-effort communication.
To address the problem in previous releases of samples being both lost and rejected, we have modified Connext's behavior so that only one callback can be triggered at a time.
1. Instances Limit
When the max_instances limit is reached, a DataReader will try to make space for a new instance by replacing an existing instance according to the instance replacement kind set in instance_replacement in the DATA_READER_RESOURCE_LIMITS QosPolicy.
If there is no space for this new instance, the sample for the new instance will be lost with the reason: LOST_BY_INSTANCES_LIMIT.
Connext will optimize its resources and, after triggering LOST_BY_INSTANCES_LIMIT, won’t put any effort into trying to recover the sample, since it’s very unlikely that the DataReader will have space in its queue in the near future, although it could happen in some scenarios. This behavior was already present in previous versions.
Example:
Conditions:
Keyed type
max_instances 2
initial_instances 1
autopurge_disposed_samples_delay 1 nanosecond
Procedure:
Send Sample: Instance 1 - Sample 1 (“Sample 1.1”)
Send Sample: Instance 2 - Sample 1 (“Sample 2.1”)
Send Sample: Instance 3 - Sample 1 (“Sample 3.1”) → triggers on_sample_lost() by instance limit (LOST_BY_INSTANCES_LIMIT) in the DataReader
Dispose Instance: Instance 1 → triggers autopurge_disposed_samples_delay in the DataReader
Send Sample: Instance 3 - Sample 2 (“Sample 3.2”)
Conclusions:
It is important to note that Connext triggers on_sample_lost() regardless of the RELIABILITY QosPolicy that it is using. “Sample 3.2” will be received, since Connext disposed Instance 1, but “Sample 3.1” won’t be repaired since it is marked as lost.
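The procedure above can be replayed with a small toy model. This is only a sketch, not Connext code: the ToyReader class and its method names are hypothetical, it assumes no existing instance is eligible for replacement, and it assumes that the 1 nanosecond purge delay frees the disposed instance's slot immediately.

```python
from dataclasses import dataclass, field


@dataclass
class ToyReader:
    """Hypothetical toy DataReader that only tracks which instances hold a slot."""
    max_instances: int
    instances: set = field(default_factory=set)
    received: list = field(default_factory=list)
    lost: list = field(default_factory=list)

    def on_data(self, instance: int, sample: str) -> None:
        if instance not in self.instances:
            if len(self.instances) >= self.max_instances:
                # Assumption: no instance can be replaced, so the sample is
                # reported as lost and is never repaired afterwards.
                self.lost.append((sample, "LOST_BY_INSTANCES_LIMIT"))
                return
            self.instances.add(instance)
        self.received.append(sample)

    def on_dispose(self, instance: int) -> None:
        # Assumption: the purge delay is so short that the slot is freed at once.
        self.instances.discard(instance)


reader = ToyReader(max_instances=2)
reader.on_data(1, "Sample 1.1")
reader.on_data(2, "Sample 2.1")
reader.on_data(3, "Sample 3.1")  # no free instance slot: reported as lost
reader.on_dispose(1)             # frees the slot held by Instance 1
reader.on_data(3, "Sample 3.2")  # accepted; Sample 3.1 stays lost

print(reader.received)
print(reader.lost)
```

In this model “Sample 3.2” ends up in the received list, while “Sample 3.1” remains in the lost list because nothing ever asks for it again.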
2. Samples Limit
To decide whether to trigger a Lost or Rejected reason kind for the Samples Limit, Connext checks the RELIABILITY QosPolicy and HISTORY QosPolicy. There are two possible scenarios:
2.1 Using DDS_RELIABLE_RELIABILITY_QOS and DDS_KEEP_ALL_HISTORY_QOS:
In this scenario, Connext triggers the on_sample_rejected() callback with the reason kind DDS_REJECTED_BY_SAMPLES_LIMIT. Then, it may ask for this sample to be repaired:
2.1.1 If the DataReader queue is full, Connext won’t ask for it to be repaired since there isn't space for it.
Example:
Conditions:
Keyed type
initial_instances 1
max_instances 3
initial_samples 1
max_samples 5
Strict Reliability (DDS_RELIABLE_RELIABILITY_QOS + DDS_KEEP_ALL_HISTORY_QOS)
Procedure:
Send Sample: Instance 1 - Sample 1
Send Sample: Instance 2 - Sample 1
Send Sample: Instance 3 - Sample 1
Send Sample: Instance 1 - Sample 2
Send Sample: Instance 2 - Sample 2
Send Sample: Instance 3 - Sample 2 → triggers on_sample_rejected() by sample limit (DDS_REJECTED_BY_SAMPLES_LIMIT) in the DataReader
Dispose Instance: Instance 1 → triggers on_sample_rejected() by sample limit (DDS_REJECTED_BY_SAMPLES_LIMIT) in the DataReader
Conclusions:
If Connext tries to make room in the DataReader by disposing an instance, it will trigger another on_sample_rejected(). The dispose cannot take effect because it is itself another DDS message, and the full DataReader queue has no space for it.
In this scenario, we will see a loop of HBs and ACKNACKs until Connext somehow makes room in the DataReader queue or we shut down the application. If there is space in the DataReader queue, Connext will try to repair the rejected sample. During this HB-ACKNACK loop, we will see the following:
The Heartbeats will announce what samples are available in the DataWriter. In this example, samples 1 through 7 are available. (That's the 6 samples listed above, plus the dispose message.) It’s important to note that in this scenario, sample 6 was rejected.
The ACKNACKs indicate that the DataReader is expecting sample 6, but they do not ask the DataWriter to repair it.
Here is the loop:
Heartbeat:
ACKNACK:
Note that here numBits is 0, meaning that Connext is not asking for a sample to be repaired since there is no space available in the DataReader queue.
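The full-queue case can be illustrated with a toy model. This is a sketch under stated assumptions, not the Connext implementation: ToyReliableReader is a hypothetical class, the dictionary keys bitmap_base and num_bits only mirror the ACKNACK fields discussed above, and the rule "request at most as many samples as there are free slots" is an assumption chosen to reproduce the behavior described in this section.

```python
class ToyReliableReader:
    """Hypothetical strict-reliability reader with a bounded queue."""

    def __init__(self, max_samples: int) -> None:
        self.max_samples = max_samples
        self.queue = []          # sequence numbers currently stored
        self.next_expected = 1   # lowest sequence number not yet accepted
        self.rejected = []       # (sequence number, reason) per rejection

    def on_message(self, sn: int) -> None:
        """Handle a data sample or a dispose; both need a queue slot."""
        if len(self.queue) >= self.max_samples:
            self.rejected.append((sn, "DDS_REJECTED_BY_SAMPLES_LIMIT"))
            return
        if sn == self.next_expected:
            self.queue.append(sn)
            self.next_expected += 1

    def acknack(self, last_available_sn: int) -> dict:
        """Answer a Heartbeat announcing samples up to last_available_sn."""
        free = self.max_samples - len(self.queue)
        missing = last_available_sn - self.next_expected + 1
        # Assumption: never request more samples than there is room for.
        return {"bitmap_base": self.next_expected,
                "num_bits": max(0, min(free, missing))}


reader = ToyReliableReader(max_samples=5)
for sn in range(1, 8):       # six data samples plus the dispose message
    reader.on_message(sn)

ack = reader.acknack(last_available_sn=7)
print(reader.rejected)
print(ack)
```

With the queue full, the model rejects sequence numbers 6 and 7 and answers the Heartbeat with a bitmap base of 6 and zero bits, that is, "expecting sample 6, requesting nothing".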
2.1.2 If we, somehow, make room in the DataReader queue, Connext asks for the sample to be repaired.
In order to make room in the DataReader queue, we need to remove samples from it. This can be done, for example, by using the method take().
Example:
Conditions:
Keyed type
max_samples 5
initial_samples 1
Strict Reliability (DDS_RELIABLE_RELIABILITY_QOS + DDS_KEEP_ALL_HISTORY_QOS)
Procedure:
Send Sample: Instance 1 - Sample 1
Send Sample: Instance 2 - Sample 1
Send Sample: Instance 3 - Sample 1
Send Sample: Instance 1 - Sample 2
Send Sample: Instance 2 - Sample 2
Send Sample: Instance 3 - Sample 2 → triggers on_sample_rejected() by sample limit (DDS_REJECTED_BY_SAMPLES_LIMIT) in the DataReader
Within the on_sample_rejected() callback, Connext makes room in the DataReader queue using the method take().
Instance 3 - Sample 2 gets repaired
Conclusions:
If there is room in the DataReader queue, Connext asks the DataWriter to repair the rejected sample, and Connext is able to repair it.
In this scenario, Connext made room using the take() method from the on_sample_rejected() callback. The take() method made space in the queue, and therefore Connext could ask for the sample to be repaired. In this case, the ACKNACK message is:
Note that here numBits is 1, meaning that we are asking for a sample to be repaired.
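The repair path can be sketched the same way. Everything below is a hypothetical toy model rather than the Connext API: ToyWriter, ToyReader, and make_room are illustrative names, take() here simply empties the whole queue, and the ACKNACK is reduced to a (base, num_bits) pair.

```python
class ToyWriter:
    """Hypothetical writer that keeps every sample so it can repair it."""

    def __init__(self) -> None:
        self.history = {}  # sequence number -> payload

    def write(self, reader, payload: str) -> None:
        sn = len(self.history) + 1
        self.history[sn] = payload
        reader.on_message(sn, payload)

    def on_acknack(self, reader, base: int, num_bits: int) -> None:
        # Resend exactly the samples the reader asked for.
        for sn in range(base, base + num_bits):
            reader.on_message(sn, self.history[sn])


class ToyReader:
    """Hypothetical strict-reliability reader with a rejection callback."""

    def __init__(self, max_samples: int, on_sample_rejected) -> None:
        self.max_samples = max_samples
        self.on_sample_rejected = on_sample_rejected
        self.queue = []
        self.next_expected = 1

    def take(self) -> list:
        taken, self.queue = self.queue, []
        return taken

    def on_message(self, sn: int, payload: str) -> None:
        if len(self.queue) >= self.max_samples:
            self.on_sample_rejected(self, "DDS_REJECTED_BY_SAMPLES_LIMIT")
            return  # this delivery attempt is still rejected
        if sn == self.next_expected:
            self.queue.append(payload)
            self.next_expected += 1

    def acknack(self, last_sn: int) -> tuple:
        free = self.max_samples - len(self.queue)
        missing = last_sn - self.next_expected + 1
        return self.next_expected, max(0, min(free, missing))


taken = []

def make_room(reader, reason):
    taken.extend(reader.take())  # free the queue from inside the callback

writer = ToyWriter()
reader = ToyReader(max_samples=5, on_sample_rejected=make_room)
for instance, sample in [(1, 1), (2, 1), (3, 1), (1, 2), (2, 2), (3, 2)]:
    writer.write(reader, f"Instance {instance} - Sample {sample}")

base, num_bits = reader.acknack(last_sn=len(writer.history))
writer.on_acknack(reader, base, num_bits)
print(base, num_bits)
print(reader.queue)
```

Because the callback frees the queue, the next ACKNACK in this model requests one sample starting at sequence number 6, and the resent "Instance 3 - Sample 2" is then accepted.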
2.2 Using DDS_BEST_EFFORT_RELIABILITY_QOS and DDS_KEEP_ALL_HISTORY_QOS:
In this scenario, Connext triggers on_sample_lost() with the reason kind LOST_BY_SAMPLES_LIMIT and does not try to repair it.
Note: If using DDS_KEEP_LAST_HISTORY_QOS or the instance_replacement setting, max_samples will never be reached, since Connext replaces existing samples before the limit is hit.
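As a recap, the cases walked through in this article can be written as one small decision function. It is an illustrative summary only: reported_status and the Limit enum are hypothetical names, the function covers just the scenarios described above, and it returns None where the article says the samples limit is not reached.

```python
from enum import Enum
from typing import Optional


class Limit(Enum):
    SAMPLES = "samples"
    INSTANCES = "instances"


def reported_status(limit: Limit, reliable: bool, keep_all: bool) -> Optional[tuple]:
    """Return (callback, reason) for the cases described in this article."""
    if limit is Limit.INSTANCES:
        # Reported as lost regardless of the RELIABILITY QosPolicy.
        return ("on_sample_lost", "LOST_BY_INSTANCES_LIMIT")
    if not keep_all:
        # KEEP_LAST replaces older samples, so max_samples is not reached.
        return None
    if reliable:
        return ("on_sample_rejected", "DDS_REJECTED_BY_SAMPLES_LIMIT")
    return ("on_sample_lost", "LOST_BY_SAMPLES_LIMIT")


print(reported_status(Limit.SAMPLES, reliable=True, keep_all=True))
print(reported_status(Limit.SAMPLES, reliable=False, keep_all=True))
print(reported_status(Limit.INSTANCES, reliable=True, keep_all=True))
```

Each call returns at most one callback, which reflects the 6.1.0 rule that a sample is never reported as both lost and rejected.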