Why am I receiving LOST_BY_WRITER?

You can find the definition of “LOST_BY_WRITER” in the API documentation:

A DDSDataWriter removed the sample before being received by the DDSDataReader. This constant is an extension to the DDS standard.

If you are using KEEP_LAST as the History kind, the data loss may happen as follows: when the DataWriter queue is full and one of the matching reliable DataReaders has not yet acknowledged a sample in the queue, the next write overwrites the oldest sample in the queue. When the DataReader then receives a heartbeat (HB) message announcing that a sample it is still waiting for is no longer in the DataWriter's queue, the DataReader reports that sample as lost.

The lost sample (and the LOST_BY_WRITER status) can occur either because the History kind is KEEP_LAST and the configured depth is too small, or because the DataWriter is writing faster than the DataReader can acknowledge.
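As an illustration, the overwrite scenario above corresponds to a DataWriter configuration like the following XML QoS profile sketch (the profile name is made up; the elements follow RTI Connext's XML QoS format):

```xml
<!-- Illustrative profile: with KEEP_LAST and a small depth, an
     unacknowledged sample can be overwritten once the queue wraps,
     and the DataReader will later report it as LOST_BY_WRITER. -->
<qos_profile name="LossyReliableWriterExample">
  <datawriter_qos>
    <reliability>
      <kind>RELIABLE_RELIABILITY_QOS</kind>
    </reliability>
    <history>
      <kind>KEEP_LAST_HISTORY_QOS</kind>
      <depth>1</depth> <!-- only the most recent sample is kept -->
    </history>
  </datawriter_qos>
</qos_profile>
```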

You can avoid these sample losses by either (a) increasing the History depth or (b) using KEEP_ALL as the History kind together with a Reliable Reliability QosPolicy:

(a) Increasing the History depth will not necessarily prevent the sample losses (it only makes them less frequent) unless the depth is large enough.

(b) If you set the History kind to KEEP_ALL in a Reliable profile and the DataWriter queue fills up because a resource limit is reached, the DataWriter blocks for up to max_blocking_time (described in the documentation for the current release and for release 5.3.1) until a DataReader acknowledges a sample, allowing the DataWriter to remove it from its queue and free resources for the next sample.
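A sketch of option (b) as an XML QoS profile (the profile name and the 5-second max_blocking_time are illustrative values, not recommendations):

```xml
<!-- Illustrative profile: KEEP_ALL + RELIABLE means write() blocks for
     up to max_blocking_time instead of overwriting unacknowledged
     samples when resource limits are reached. -->
<qos_profile name="ReliableKeepAllExample">
  <datawriter_qos>
    <reliability>
      <kind>RELIABLE_RELIABILITY_QOS</kind>
      <max_blocking_time>
        <sec>5</sec>
        <nanosec>0</nanosec>
      </max_blocking_time>
    </reliability>
    <history>
      <kind>KEEP_ALL_HISTORY_QOS</kind>
    </history>
  </datawriter_qos>
</qos_profile>
```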

If max_blocking_time elapses without the DataWriter releasing space in its queue, the write call returns a DDS_RETCODE_TIMEOUT error code without writing anything.

Even with KEEP_ALL and RELIABLE, a sample can still end up in the LOST_BY_WRITER state if the DataWriter considers the DataReader unresponsive/inactive. Here is how that happens:

  • When a DataReader stops acknowledging samples, the DataWriter first asks it to acknowledge them by sending heartbeats at a faster rate (configured with fast_heartbeat_period and max_heartbeat_retries). In addition, when there is no more space in the DataWriter's queue (because send_window_size has been reached), the DataWriter blocks for up to max_blocking_time.

To avoid the DataWriter blocking on a non-progressing DataReader, use the inactivate_nonprogressing_reader setting (described in the documentation for the current release and for release 5.3.1).
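The fast-heartbeat behavior and the non-progressing-reader setting live in the writer's reliable protocol parameters. A sketch in XML QoS form (the 100 ms period and retry count are illustrative values, not recommendations):

```xml
<!-- Illustrative fragment: when a reader falls behind, send heartbeats
     every 100 ms; after 10 unanswered retries, consider it inactive.
     inactivate_nonprogressing_reader also deactivates readers that
     ACKNACK without making progress, so they cannot block the writer. -->
<datawriter_qos>
  <protocol>
    <rtps_reliable_writer>
      <inactivate_nonprogressing_reader>true</inactivate_nonprogressing_reader>
      <fast_heartbeat_period>
        <sec>0</sec>
        <nanosec>100000000</nanosec> <!-- 100 ms -->
      </fast_heartbeat_period>
      <max_heartbeat_retries>10</max_heartbeat_retries>
    </rtps_reliable_writer>
  </protocol>
</datawriter_qos>
```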

  • Once the DataWriter considers the DataReader unresponsive, it keeps sending heartbeats to the DataReader but no longer expects an ACKNACK. At that point, a sample can be overwritten in the DataWriter's queue as explained above, producing the LOST_BY_WRITER DDS_SampleLostStatusKind.

To fix this situation, you can increase max_heartbeat_retries (described in the documentation for the current release and for release 5.3.1) or, if you never want the slow DataReader to be considered unresponsive, set it to DDS_LENGTH_UNLIMITED. The max_heartbeat_retries setting determines the maximum number of heartbeat retries before a DataReader is declared inactive.
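In XML QoS form, the unlimited-retries option looks like this sketch (assuming the LENGTH_UNLIMITED token accepted by RTI's XML QoS parser):

```xml
<!-- Illustrative fragment: never declare a slow DataReader inactive,
     so the writer keeps expecting ACKNACKs and does not overwrite
     samples the reader has yet to acknowledge. -->
<datawriter_qos>
  <protocol>
    <rtps_reliable_writer>
      <max_heartbeat_retries>LENGTH_UNLIMITED</max_heartbeat_retries>
    </rtps_reliable_writer>
  </protocol>
</datawriter_qos>
```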

For more information, see Controlling Heartbeats and Retries with DataWriterProtocolQosPolicy in the User's Manual, available for the current release and for release 5.3.1.