Understanding on_sample_lost LOST_BY_WRITER

Thu, 02/21/2019 - 15:51

jasontiller2

Hello, all,

We have recently begun to see a significant number of samples being lost in our system, with on_sample_lost reporting LOST_BY_WRITER as the reason, i.e., that the samples were dropped by the writer. Most of our topics use best-effort reliability.

Does anyone know how this particular error is detected and communicated?

My initial assumption is that the publisher is trying to send samples faster than they can be sent over the network. Is that assumption valid?

For some background, our system consists of:

2 nodes (Linux & QNX)
12 hosts (2 on QNX, rest on Linux)
30 topics
~10,000 samples/sec
Shared memory & UDP transports

Any suggestions? Thanks!

Thu, 02/21/2019 - 17:21

garyb

You may need to experiment with some of the parameters in your Resource Limits QoS to get a better idea of whether the loss is due to history, lifespan, queue sizes, etc.

I've found this community article useful for problems such as this:

Tuning Queue Sizes and Other Resource Limits:

https://community.rti.com/static/documentation/connext-dds/5.3.1/doc/manuals/connext_dds/html_files/RTI_ConnextDDS_CoreLibraries_UsersManual/Content/UsersManual/Tuning_Queue_Sizes_and_Other_Resource_Li.htm

I would start with the history kind and depth to see if that has an effect.

Thu, 02/21/2019 - 18:28

jasontiller2

Thank you for your reply, Gary. That is indeed an instructive (and dense) article. I haven't fully digested it yet, but one thing jumped out at me:

Aren't those queue sizes and tuning strategies mostly geared towards samples with RELIABLE delivery? Almost all of our samples are best effort, and almost all of the on_sample_lost errors are reported on topics with best-effort delivery.

Am I misunderstanding the contents of the article?

Fri, 02/22/2019 - 10:26

garyb

Hi Jason,

You are correct: most of the tuning strategies are for reliable delivery, so there's no need to go down that route. Sorry for the detour.

As for the cause of the LOST_BY_WRITER status reported by on_sample_lost, I discussed this with a few colleagues, and it is a fairly general indication. It is triggered when a reader is waiting for a data packet with a specific sequence number but instead receives one with a different sequence number. The reader assumes that the sample with the expected sequence number was dropped by the writer; however, it could have been lost in transit for any number of reasons.
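To make that mechanism concrete, here is a minimal Python sketch of how a best-effort reader could infer loss purely from sequence-number gaps. It is not RTI's implementation: the `GapTracker` class and its fields are hypothetical, and it assumes a single writer whose sequence numbers start at 1 and increase by 1 per sample.

```python
class GapTracker:
    """Hypothetical per-writer tracker that infers lost samples from sequence-number gaps."""

    def __init__(self):
        self.expected = 1      # assumption: the writer's first sequence number is 1
        self.lost_total = 0    # cumulative count of samples inferred as lost
        self.late_total = 0    # samples that arrived after the reader had moved past them

    def on_receive(self, seq):
        """Process one received sequence number; return how many samples the gap implies were lost."""
        if seq < self.expected:
            # Older than expected: a duplicate or an out-of-order arrival.
            # In this sketch a best-effort reader simply discards it.
            self.late_total += 1
            return 0
        gap = seq - self.expected
        # The reader cannot tell whether the missing samples were dropped by the
        # writer or lost in transit; all it sees is the gap.
        self.lost_total += gap
        self.expected = seq + 1
        return gap


tracker = GapTracker()
for seq in [1, 2, 3, 6, 7, 5, 10]:
    lost = tracker.on_receive(seq)
    if lost:
        print(f"received {seq}: {lost} sample(s) missing before it")
print("total lost:", tracker.lost_total, "late:", tracker.late_total)
```

In this run, sample 5 is counted as lost when 6 arrives and then shows up late, where the sketch discards it rather than un-counting it. That illustrates why a gap seen at the reader says nothing by itself about where the sample actually went missing.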

Since you started seeing a lot more of these messages recently, it does raise the question of what has changed in general: network changes or congestion, faster/slower machines, etc.

You could probably get a better idea of the cause by enabling logging and/or instrumenting your code.
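One low-effort form of instrumentation is to track, per topic, how many samples are reported lost in each time window, then correlate spikes with network or load changes. The sketch below assumes your on_sample_lost handler can read a cumulative lost-sample count (the DDS SampleLostStatus carries a total_count field). The `LossMonitor` class, its `record` method, and the one-second window are hypothetical, and the code does not call any RTI API.

```python
from collections import defaultdict


class LossMonitor:
    """Hypothetical helper that turns cumulative lost-sample counts into per-topic rates."""

    def __init__(self, window_s=1.0):
        self.window_s = window_s
        self.last_total = defaultdict(int)   # topic -> last cumulative count seen
        self.window_lost = defaultdict(int)  # topic -> samples lost in the current window
        self.window_start = None

    def record(self, topic, total_count, now):
        """Feed a cumulative lost count and a timestamp in seconds.
        Returns {topic: lost_per_second} when a window closes, else None."""
        report = None
        if self.window_start is None:
            self.window_start = now
        elif now - self.window_start >= self.window_s:
            elapsed = now - self.window_start
            report = {t: n / elapsed for t, n in self.window_lost.items()}
            self.window_lost.clear()
            self.window_start = now
        # The delta since the last call for this topic goes into the current window.
        self.window_lost[topic] += total_count - self.last_total[topic]
        self.last_total[topic] = total_count
        return report


monitor = LossMonitor(window_s=1.0)
monitor.record("Pose", 3, now=0.0)
monitor.record("Imu", 2, now=0.5)
print(monitor.record("Pose", 7, now=1.0))  # {'Pose': 3.0, 'Imu': 2.0}
```

Logging these reports with timestamps alongside network statistics would help answer the "what has changed" question above.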


RTI Community Portal Terms of Use