Samples for InstanceState Change from NOT_ALIVE_NO_WRITERS to ALIVE

Joined: 10/29/2015
Posts: 12

Hi everyone,

I'm currently experimenting with the following situation:

  • keyed topic, Reliable, TransientLocal
  • Keep All History on DataWriter side, Keep Last 2 on DataReader side
  • A single DataWriter on one host, a single DataReader on another
  • I always take() all samples from the DataReader

I go through the following steps:

  • Start the reader program; the writer is already running on the other host
  • I receive a number of samples for different instances from the writer
  • I then pull my network cable
  • After the default liveliness timeout, I receive a single invalid sample for each instance indicating the new InstanceState not_alive_no_writers (as expected)
  • I then plug the cable back in
  • Almost immediately I see the liveliness change for the DataReader in general
  • I do not receive any samples, valid or invalid, showing that the aforementioned instances have gone back to InstanceState alive
  • If I let the writer republish a sample for one of the instances, I see that the reader receives it (and then considers that instance to be alive again). It still does not receive any samples for the other instances, so I have to assume they are still not alive (even though they should be).
  • If I restart my reader program, it finds the writer as expected and gets all samples again.

What is the reason I get an invalid sample when the instance goes from alive to not_alive_no_writers, but I do not get a sample when it goes back the other way? Is that a bug of some sort in RTI DDS v5.2.0, or am I misunderstanding something?

Note that I cannot use read() instead of take(). That means I apparently have no means of finding out whether my instances have come back alive. There doesn't seem to be any way to find out an instance's InstanceState from a DataReader when I have nothing but its InstanceHandle. Or is there?

Thanks and best regards,

Jan

Joined: 02/11/2016
Posts: 144

Hello,

This is the intended behavior, although it regularly surprises customers.

When the writer updates an instance, it sends a sample belonging to that instance. Each sample carries a unique sequence number, which allows RTI to maintain the order of delivery and detect gaps.

Each sample is received by the DataReader and then passed to the application. This happens exactly once, even if duplicates of a specific sample are received.

When the network cable is pulled out, the reader detects this. Since the instances are only held by the writer that went "missing", they are moved to the NOT_ALIVE_NO_WRITERS state.

When the network cable is plugged back in, keep in mind that all of that writer's samples have already been sent by the writer, received by the reader, and passed on to the application.

Since the same sample should not be delivered to the application twice, nothing is delivered again.
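To make the explanation above concrete, here is a toy model in plain Python. It is not RTI's implementation: `ToyReader` and its methods are invented for illustration, and it assumes that duplicates are filtered by sequence number and that an instance only returns to ALIVE when a new sample for it arrives.

```python
from dataclasses import dataclass, field

ALIVE = "ALIVE"
NOT_ALIVE_NO_WRITERS = "NOT_ALIVE_NO_WRITERS"


@dataclass
class ToyReader:
    """Hypothetical model of a DataReader; not the RTI API."""
    last_seq: int = 0                              # highest sequence number delivered so far
    state: dict = field(default_factory=dict)      # instance key -> instance state
    delivered: list = field(default_factory=list)  # everything the application gets to see

    def on_sample(self, seq, key, value):
        # A sample that was already delivered is dropped silently.
        if seq <= self.last_seq:
            return
        self.last_seq = seq
        self.state[key] = ALIVE
        self.delivered.append(("valid", key, value))

    def on_writer_lost(self):
        # One invalid sample per instance announces the state change.
        for key, current in self.state.items():
            if current == ALIVE:
                self.state[key] = NOT_ALIVE_NO_WRITERS
                self.delivered.append(("invalid", key, NOT_ALIVE_NO_WRITERS))


def run_scenario():
    reader = ToyReader()
    history = [(1, "a", 10), (2, "b", 20)]   # what the writer has published so far
    for sample in history:
        reader.on_sample(*sample)
    reader.on_writer_lost()                  # cable pulled, liveliness expires
    before = len(reader.delivered)
    for sample in history:                   # cable back: only already-seen samples exist
        reader.on_sample(*sample)
    assert len(reader.delivered) == before   # nothing new reaches the application
    reader.on_sample(3, "a", 11)             # writer republishes instance "a"
    return reader
```

In this model, after `run_scenario()` instance "a" is ALIVE again while "b" stays NOT_ALIVE_NO_WRITERS, which matches what Jan observed.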

One way to reduce the likelihood of this scenario is to use Persistence Service (or a similar solution) to maintain the liveliness of the instances even when their original writer is somehow temporarily "disconnected" from the network.

Of course this does not solve the issue of a reader losing network connectivity.

RTI has mentioned in the past that they are working on a solution that will allow customers to change this specific behavior. However, I believe that was a long time ago, and I haven't seen any announcement that this ability will be introduced anytime soon.

Another option would be to use application logic to overcome this "gap". For example, you could use writer listeners to republish data when a reader is matched, at the cost of network bandwidth and some pressure on the writer. Alternatively, you could maintain the received data in application-level data structures and stop relying on instance state altogether.
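As a rough illustration of the first workaround, here is a minimal sketch in plain Python. `ToyWriter`, `send` and `on_reader_matched` are hypothetical names, not the RTI API; in a real application the republish step would presumably be triggered from a DataWriter listener callback such as `on_publication_matched`. The sketch also assumes the writer actually sees the reader as newly matched, which may not happen after a short disconnection.

```python
class ToyWriter:
    """Hypothetical stand-in for a DataWriter plus its listener; not the RTI API."""

    def __init__(self, send):
        self.send = send    # callable(key, value) that puts a new sample on the wire
        self.latest = {}    # instance key -> last value written for that instance

    def write(self, key, value):
        # Remember the most recent value per instance, then publish it.
        self.latest[key] = value
        self.send(key, value)

    def on_reader_matched(self):
        # Republish the latest value of every instance as a NEW sample,
        # so the reader sees fresh data and can mark the instances alive again.
        for key, value in self.latest.items():
            self.send(key, value)
```

Usage: wrap the real write call in `send`, call `write()` for normal updates, and call `on_reader_matched()` from the match callback. Every match costs one extra sample per instance, which is the bandwidth trade-off mentioned above.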

I hope this helps,

Roy.

Joined: 10/29/2015
Posts: 12

Dear Roy,

Thanks for your comment. However, I still don't quite understand the logic. When my DataReader loses connectivity, it injects an invalid sample representing the InstanceState change to NOT_ALIVE_NO_WRITERS, even if the application has previously taken and processed all samples.

When the DataReader regains connectivity to any DataWriter, it already detects this, presumably because it receives a heartbeat from the DataWriter. The way I see it, the DataReader should now show perfectly symmetric behavior, and inject another invalid sample representing the InstanceState change back to ALIVE. But this does not happen.

Of course it is possible that while the reader was disconnected, the DataWriter actually unregistered some of its instances. But I am on a Reliable topic with unlimited writer history, so surely I would still receive a sample in this case when connectivity returns.

In summary, I am still not sure why the DataReader gives me an invalid sample going from ALIVE to NOT_ALIVE_NO_WRITERS, but doesn't do the same going the other way. I understand that you don't want to explicitly send a sample twice, but I don't see how that is actually necessary.

As I see it, the only workaround currently would be to build my own data structure on the DataReader side which tracks the known DataWriter IDs for every instance, using a DataReaderListener to (hopefully?) catch the publication handles appearing and disappearing, and that way track its own version of an InstanceState for every instance. Needless to say this would require quite a bit of effort and I'm not sure if the API even provides all the means to allow me to do that reliably. For instance, the LivelinessChangedStatus only gives me the last publication handle that changed, so if more than one publication changes state at a time, I already lose information.
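A minimal sketch of the bookkeeping described above, in plain Python, might look like the following. `InstanceTracker` and its methods are hypothetical and not part of any DDS API. The sketch assumes the application can obtain the writer of each sample (for example from the sample's publication handle) and is told about every writer becoming alive or not alive, which is exactly the part that may be hard to get reliably.

```python
from collections import defaultdict


class InstanceTracker:
    """Hypothetical application-side bookkeeping; not part of any DDS API."""

    def __init__(self):
        self.writers_of = defaultdict(set)  # instance key -> writers that published it
        self.alive_writers = set()          # writers currently considered alive

    def on_sample(self, key, writer):
        # `writer` is assumed to come from the sample's publication handle.
        self.writers_of[key].add(writer)
        self.alive_writers.add(writer)

    def on_writer_alive(self, writer):
        self.alive_writers.add(writer)

    def on_writer_not_alive(self, writer):
        self.alive_writers.discard(writer)

    def is_alive(self, key):
        # An instance counts as alive if at least one of its known writers is alive.
        return bool(self.writers_of.get(key, set()) & self.alive_writers)
```

Note that this sketch ignores unregistered or disposed instances; handling those would need extra state per instance.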