Why am I receiving LOST_BY_WRITER?

The API reference defines “LOST_BY_WRITER” as follows:

A DDSDataWriter removed the sample before being received by the DDSDataReader. This constant is an extension to the DDS standard.

If you are using KEEP_LAST as the History kind, the data loss can happen as follows. The DataWriter queue is full, and a matching reliable DataReader has not yet acknowledged the oldest sample in it. When the DataWriter writes the next sample, that oldest sample is overwritten anyway. When the DataReader later requests the overwritten sample, the DataWriter sends a gap message, because the sample no longer exists. (A separate knowledge base article explains the different kinds of non-user-data messages, including gap messages.)

In a Reliable scenario, the DataWriter sends a gap message when a sample has been lost. LOST_BY_WRITER is the DDS_SampleLostStatusKind that the DataReader reports for that lost sample.
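
The overwrite-then-gap sequence described above can be sketched as a toy model. It is plain Python, not the RTI Connext API. The class `KeepLastWriterQueue`, its methods, and the reader-side `reader_status` mapping are hypothetical names chosen for illustration. Plain integers stand in for RTPS sequence numbers.

```python
from collections import OrderedDict


class KeepLastWriterQueue:
    """Toy model of a KEEP_LAST DataWriter queue (hypothetical, not the RTI API)."""

    def __init__(self, depth):
        self.depth = depth
        self.next_sn = 1               # next sequence number to assign
        self.samples = OrderedDict()   # sequence number -> payload, oldest first

    def write(self, payload):
        # When the queue is full, KEEP_LAST drops the oldest sample, even if a
        # reliable DataReader has not acknowledged it yet.
        if len(self.samples) == self.depth:
            self.samples.popitem(last=False)
        sn = self.next_sn
        self.samples[sn] = payload
        self.next_sn += 1
        return sn

    def on_nack(self, sn):
        # Repair a NACKed sample if it is still queued; otherwise answer with a gap.
        if sn in self.samples:
            return ("DATA", sn, self.samples[sn])
        return ("GAP", sn)


def reader_status(reply):
    # Hypothetical reader-side mapping: a gap means the writer lost the sample.
    return "LOST_BY_WRITER" if reply[0] == "GAP" else "RECEIVED"


q = KeepLastWriterQueue(depth=3)
for i in range(5):
    q.write(f"sample-{i}")
# Sequence numbers 1 and 2 were overwritten when 4 and 5 were written.
print(q.on_nack(1), reader_status(q.on_nack(1)))
print(q.on_nack(4), reader_status(q.on_nack(4)))
```

With `depth=3`, writing five samples pushes sequence numbers 1 and 2 out of the queue. A NACK for either one is answered with a gap, and the sample is reported as LOST_BY_WRITER. Sequence numbers 3 to 5 can still be repaired.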

A sample can be lost this way (producing the LOST_BY_WRITER status) for two reasons. The History kind may be KEEP_LAST with a configured depth that is too small, or the DataWriter may be writing faster than the DataReader is acknowledging.
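
To see how the History depth and the write/acknowledge rates interact, here is a hypothetical tick-based simulation. Again, this is a toy model, not the RTI API. On each tick, the writer produces `writes_per_tick` samples and the reader acknowledges up to `acks_per_tick` samples in order. The function `count_lost` counts the samples that a KEEP_LAST queue of the given `depth` overwrites before they are acknowledged. All function and parameter names are assumptions for this sketch.

```python
from collections import deque


def count_lost(depth, writes_per_tick, acks_per_tick, ticks):
    """Toy model (hypothetical, not the RTI API): count samples a KEEP_LAST
    DataWriter overwrites before a reliable DataReader acknowledges them."""
    queue = deque()     # sequence numbers the writer still holds, oldest first
    acked_up_to = 0     # highest sequence number the reader has acknowledged
    next_sn = 1
    lost = 0
    for _ in range(ticks):
        for _ in range(writes_per_tick):
            if len(queue) == depth:
                oldest = queue.popleft()
                if oldest > acked_up_to:
                    lost += 1  # overwritten before it was acknowledged
            queue.append(next_sn)
            next_sn += 1
        # The reader acknowledges, in order, up to acks_per_tick of the samples
        # still in the queue; samples already overwritten are skipped (gapped).
        pending = [sn for sn in queue if sn > acked_up_to]
        if pending and acks_per_tick > 0:
            acked_up_to = pending[min(acks_per_tick, len(pending)) - 1]
    return lost


for depth in (10, 20, 1000):
    print(depth, count_lost(depth, writes_per_tick=3, acks_per_tick=2, ticks=100))
```

In this run the writer is persistently faster, with 3 writes against 2 acknowledgments per tick. A larger depth only delays the first loss. Once the backlog exceeds the depth, about one sample per tick is lost, which is the difference between the two rates. Only a depth that covers the whole backlog for the run avoids losses entirely (1000 in this example).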

You can reduce or avoid these gaps either by (a) increasing the History depth or (b) using KEEP_ALL as the History kind together with Reliable as the Reliability QosPolicy:

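(a) Increasing the History depth makes gaps less frequent. It does not prevent them unless the depth is large enough to hold every sample the DataReader has not yet acknowledged. If the DataWriter is persistently faster than the DataReader, that backlog keeps growing, and no fixed depth is enough.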

(b) Suppose you set the History kind to KEEP_ALL in a Reliable profile, and the DataWriter queue is full because a resource limit has been reached. In that case, the write call blocks for up to max_blocking_time while waiting for the DataReader to acknowledge a sample. The acknowledgment lets the DataWriter remove that sample from its queue, which frees resources for the next sample.

If max_blocking_time elapses before the DataWriter can free space in its queue, the write call returns the DDS_RETCODE_TIMEOUT error code without writing anything.
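
The blocking behavior of a KEEP_ALL, Reliable DataWriter can be sketched with a condition variable. This is a simplified Python model, not the RTI Connext implementation. The names `KeepAllWriterQueue`, `max_samples`, and `on_ack_oldest` are hypothetical. The `RETCODE_*` strings only stand in for the real DDS return codes.

```python
import threading

# Stand-in labels for the real DDS return codes (this is not the RTI API).
RETCODE_OK = "DDS_RETCODE_OK"
RETCODE_TIMEOUT = "DDS_RETCODE_TIMEOUT"


class KeepAllWriterQueue:
    """Toy model of a RELIABLE, KEEP_ALL DataWriter queue with a resource limit."""

    def __init__(self, max_samples, max_blocking_time):
        self.max_samples = max_samples              # hypothetical resource limit
        self.max_blocking_time = max_blocking_time  # seconds
        self.unacked = []                           # samples waiting for an ACK
        self.cond = threading.Condition()

    def write(self, payload):
        with self.cond:
            # Wait until an ACK frees a slot, or until max_blocking_time elapses.
            has_room = self.cond.wait_for(
                lambda: len(self.unacked) < self.max_samples,
                timeout=self.max_blocking_time,
            )
            if not has_room:
                return RETCODE_TIMEOUT  # nothing was written
            self.unacked.append(payload)
            return RETCODE_OK

    def on_ack_oldest(self):
        # The DataReader acknowledged the oldest sample, so its slot is released.
        with self.cond:
            if self.unacked:
                self.unacked.pop(0)
                self.cond.notify_all()


queue = KeepAllWriterQueue(max_samples=2, max_blocking_time=0.5)
queue.write("a")
queue.write("b")
print(queue.write("c"))  # times out: the queue is full and nobody acknowledges
threading.Timer(0.05, queue.on_ack_oldest).start()
print(queue.write("c"))  # succeeds once the simulated ACK frees a slot
```

`Condition.wait_for` returns the predicate's value when it wakes up. A `False` result therefore means that `max_blocking_time` elapsed with the queue still full. The model reports that case as a timeout and does not store the sample.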

Even with KEEP_ALL as the History kind and RELIABLE as the Reliability kind, a sample can end up LOST_BY_WRITER if the DataWriter considers the DataReader unresponsive (inactive). This is how it happens:

  • When a DataReader stops responding, the DataWriter first asks it to acknowledge the outstanding samples. It does this by sending heartbeats at a faster rate, set by fast_heartbeat_period, and the number of unanswered attempts is limited by max_heartbeat_retries. Meanwhile, if the DataWriter’s queue fills up because send_window_size has been reached, write calls block for up to max_blocking_time.

To keep a non-progressing DataReader from blocking the DataWriter, you can use the inactivate_nonprogressing_readers property.

  • Once the DataWriter considers the DataReader inactive, it keeps sending heartbeats to that DataReader but no longer expects an ACKNACK. Because the DataWriter stops waiting for this DataReader’s acknowledgments, a sample the DataReader never received can be overwritten in the DataWriter’s queue, as explained above. The DataReader then reports that sample with the LOST_BY_WRITER DDS_SampleLostStatusKind.
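
The two steps above can be condensed into a toy model of a Reliable, KEEP_ALL DataWriter with a single DataReader. It is a deliberate simplification, not the RTI Connext API. The class `ReliableWriterModel` and its methods are hypothetical. Each `heartbeat` call stands for one heartbeat period, and `"BLOCKED"` stands for a write call that would wait up to max_blocking_time.

```python
class ReliableWriterModel:
    """Toy model (hypothetical, not the RTI API): RELIABLE, KEEP_ALL writer, one reader."""

    def __init__(self, capacity, max_heartbeat_retries):
        self.capacity = capacity
        self.max_heartbeat_retries = max_heartbeat_retries  # None means unlimited
        self.queue = []          # sequence numbers still held, oldest first
        self.acked_up_to = 0     # highest sequence number the reader acknowledged
        self.reader_active = True
        self.unanswered = 0      # heartbeats sent without an ACKNACK in return
        self.next_sn = 1
        self.lost = []           # removed before the reader acknowledged them

    def heartbeat(self, reader_answers):
        if not self.reader_active:
            return  # heartbeats continue, but no ACKNACK is expected any more
        if reader_answers:
            self.unanswered = 0
            self.acked_up_to = self.next_sn - 1  # reader confirms everything so far
            return
        self.unanswered += 1
        limit = self.max_heartbeat_retries
        if limit is not None and self.unanswered >= limit:
            self.reader_active = False  # reader is now considered inactive

    def write(self):
        if len(self.queue) == self.capacity:
            oldest = self.queue[0]
            if self.reader_active and oldest > self.acked_up_to:
                return "BLOCKED"  # KEEP_ALL waits (up to max_blocking_time) for an ACK
            self.queue.pop(0)
            if oldest > self.acked_up_to:
                self.lost.append(oldest)  # the inactive reader never acknowledged it
        self.queue.append(self.next_sn)
        self.next_sn += 1
        return "OK"

    def on_nack(self, sn):
        # A sample no longer in the queue can only be answered with a gap,
        # which the DataReader reports as LOST_BY_WRITER.
        return "DATA" if sn in self.queue else "GAP"


w = ReliableWriterModel(capacity=3, max_heartbeat_retries=2)
for _ in range(3):
    w.write()
print(w.write())      # sample 1 is unacknowledged and the reader is still active
w.heartbeat(False)
w.heartbeat(False)    # second unanswered heartbeat: reader marked inactive
print(w.write())      # sample 1 is dropped without waiting for the reader
print(w.on_nack(1))   # the reader can only receive a gap for sample 1
```

If you set `max_heartbeat_retries=None`, which plays the role of DDS_LENGTH_UNLIMITED in this model, the reader is never marked inactive. The fourth write then keeps returning `"BLOCKED"` instead of dropping sample 1.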

The max_heartbeat_retries property determines the maximum number of heartbeat retries before a DataReader is considered inactive. To fix this situation, set max_heartbeat_retries to a larger value. If you don’t want a slow DataReader to be considered inactive at all, set it to DDS_LENGTH_UNLIMITED.
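
Choosing a value for max_heartbeat_retries depends on how many heartbeats in a row a slow DataReader may leave unanswered before it replies. The hypothetical helper below is not part of any RTI API. It models a DataReader that answers only every `answer_every`-th heartbeat and reports whether that DataReader would be marked inactive. The model assumes that the retry count starts over whenever an ACKNACK arrives, and it uses `None` as a stand-in for DDS_LENGTH_UNLIMITED.

```python
def marked_inactive(answer_every, max_heartbeat_retries, heartbeats=1000):
    """Toy check (hypothetical, not the RTI API): would a DataReader that sends
    an ACKNACK only on every `answer_every`-th heartbeat be marked inactive?
    max_heartbeat_retries=None stands in for DDS_LENGTH_UNLIMITED."""
    unanswered = 0
    for hb in range(1, heartbeats + 1):
        if hb % answer_every == 0:
            unanswered = 0  # an ACKNACK arrived, so the retry count starts over
            continue
        unanswered += 1
        if max_heartbeat_retries is not None and unanswered >= max_heartbeat_retries:
            return True
    return False


for limit in (3, 4, 5, None):
    print(limit, marked_inactive(answer_every=5, max_heartbeat_retries=limit))
```

A DataReader that answers every fifth heartbeat leaves four heartbeats unanswered in a row. In this model, it is therefore marked inactive with a limit of 3 or 4, but not with a limit of 5 or with an unlimited setting.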

For more information about this property, see the section “Controlling How Many Times Heartbeats are Resent (max_heartbeat_retries)” in the User’s Manual.