Reconnection with invalid data


Hello,

 

I am running into a problem that I have not been able to figure out. I have a setup with two apps, where each app sends data to and receives data from the other.

It works as expected until one of the apps shuts down its domain connection and recreates a new one. After that, at some point one message arrives as invalid while another message is received just fine by the second app.

I am attaching logs that show the sequence of events after the reconnection. Each message type has its own writer and reader.

App1 disconnects and reconnects

App2 fires off LIVELINESS_CHANGED events after the reconnection

App1 sends MESSAGE_1 of type 1

App2 receives the message

App1 sends MESSAGE_2 of type 2

App2 receives it as invalid, with instance_state = NOT_ALIVE_NO_WRITERS

 

Log sequence for App2 after reconnection

03 Aug 2021 16:25:41,217 DEBUG [Thread-3] (dds:97) - >>>>>>> LIVELINESS_CHANGED aliveCount 1, aliveCountChanged: 1, noteAlive: 0, notAliveCountChanged: 0
03 Aug 2021 16:25:41,222 DEBUG [Thread-3] (dds:180) - >>>>>>> SUBSCRIPTION_MATCHED currentCount 1, currentCountChanged: 1, totalCount: 2, totalCountChanged: 1
03 Aug 2021 16:25:41,222 DEBUG [Thread-3] (dds:97) - >>>>>>> LIVELINESS_CHANGED aliveCount 1, aliveCountChanged: 1, noteAlive: 0, notAliveCountChanged: 0
03 Aug 2021 16:25:42,165 DEBUG [pool-2-thread-1] (dds:444) - <<<<<<< Publishing: InstanceHandle_t[-74,-76,106,-121,-36,63,-36,-4,92,-112,117,88,123,92,98,-117], MESSAGE_XXX, :

03 Aug 2021 16:25:55,923 DEBUG [Thread-4] (dds:121) - >>>>>>> Receiving number of data: 1 SOME_MESSAGE
03 Aug 2021 16:25:55,924 DEBUG [Thread-4] (dds:132) - >>>>>>> Receiving sample info SampleInfo [sample_state=2, view_state=1, instance_state=1, source_timestamp=Time_t[sec=1628033155, nanosec=922988117], instance_handle=InstanceHandle_t[17,-86,-128,-86,-122,-97,-34,93,-101,32,19,-50,-12,85,-98,-41], publication_handle=InstanceHandle_t[1,1,-38,-45,30,66,4,66,-33,0,1,70,-128,0,-128,2], disposed_generation_count=0, no_writers_generation_count=0, sample_rank=0, generation_rank=0, absolute_generation_rank=0, valid_data=true, reception_timestamp=Time_t[sec=1628033155, nanosec=922988117], publication_sequence_number=: 

03 Aug 2021 16:26:09,183 DEBUG [Thread-5] (dds:121) - >>>>>>> Receiving number of data: 1 MESSAGE_1
03 Aug 2021 16:26:09,183 DEBUG [Thread-5] (dds:132) - >>>>>>> Receiving sample info SampleInfo [sample_state=2, view_state=1, instance_state=1, source_timestamp=Time_t[sec=1628033169, nanosec=182997644], instance_handle=InstanceHandle_t[-119,49,74,-68,-86,-16,65,109,-15,-76,-124,80,-101,-64,24,96], publication_handle=InstanceHandle_t[1,1,-38,-45,30,66,4,66,-33,0,1,70,-128,0,-74,2], disposed_generation_count=0, no_writers_generation_count=0, sample_rank=0, generation_rank=0, absolute_generation_rank=0, valid_data=true, reception_timestamp=Time_t[sec=1628033169, nanosec=182997644], publication_sequence_number=: 
high: 0
low: 1
, reception_sequence_number=:
high: 0
low: 3
, publication_virtual_guid=:
value: 1, 1, -38, -45, 30, 66, 4, 66, -33, 0, 1, 70, -128, 0, -74, 2,
, publication_virtual_sequence_number=:
high: 0
low: 1
, original_publication_virtual_guid=:
value: 1, 1, -38, -45, 30, 66, 4, 66, -33, 0, 1, 70, -128, 0, -74, 2,
, original_publication_virtual_sequence_number=:
high: 0
low: 1
, related_original_publication_virtual_guid=:
value: 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
, related_original_publication_virtual_sequence_number=:
high: -1
low: 4294967295
, flag=0, source_guid=:
value: 1, 1, -38, -45, 30, 66, 4, 66, -33, 0, 1, 70, -128, 0, -74, 2,
, related_source_guid=:
value: 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
, related_subscription_guid=:
value: 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
, topic_query_guid=:
value: 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
], data :

03 Aug 2021 16:26:10,921 DEBUG [Thread-5] (dds:121) - >>>>>>> Receiving number of data: 1 SOME_MESSAGE
03 Aug 2021 16:26:10,921 DEBUG [Thread-5] (dds:132) - >>>>>>> Receiving sample info SampleInfo [sample_state=2, view_state=1, instance_state=1, source_timestamp=Time_t[sec=1628033170, nanosec=919988156], instance_handle=InstanceHandle_t[117,-28,-108,58,81,65,-47,1,108,119,-45,-80,39,110,-14,97], publication_handle=InstanceHandle_t[1,1,-38,-45,30,66,4,66,-33,0,1,70,-128,0,-128,2], disposed_generation_count=0, no_writers_generation_count=0, sample_rank=0, generation_rank=0, absolute_generation_rank=0, valid_data=true, reception_timestamp=Time_t[sec=1628033170, nanosec=920988143], publication_sequence_number=: 
high: 0
low: 2

03 Aug 2021 16:26:12,379 DEBUG [Thread-5] (dds:121) - >>>>>>> Receiving number of data: 1 MESSAGE_2 <============ This is the message received from App1, but with no writers.
03 Aug 2021 16:26:12,379 DEBUG [Thread-5] (dds:132) - >>>>>>> Receiving sample info SampleInfo [sample_state=2, view_state=2, instance_state=4, source_timestamp=Time_t[sec=-1, nanosec=0], instance_handle=InstanceHandle_t[-41,-51,75,-106,-60,-13,-16,62,-29,-86,121,102,94,2,-5,-81], publication_handle=InstanceHandle_t[1,1,-97,77,-59,97,-66,-48,23,50,-2,100,-128,0,-102,2], disposed_generation_count=0, no_writers_generation_count=0, sample_rank=0, generation_rank=0, absolute_generation_rank=0, valid_data=false, reception_timestamp=Time_t[sec=1628033058, nanosec=968987525], publication_sequence_number=:
high: -1
low: 4294967295
, reception_sequence_number=:
high: 0
low: 11
, publication_virtual_guid=:
value: 1, 1, -97, 77, -59, 97, -66, -48, 23, 50, -2, 100, -128, 0, -102, 2,
, publication_virtual_sequence_number=:
high: -1
low: 4294967295
, original_publication_virtual_guid=:
value: 1, 1, -97, 77, -59, 97, -66, -48, 23, 50, -2, 100, -128, 0, -102, 2,
, original_publication_virtual_sequence_number=:
high: -1
low: 4294967295
, related_original_publication_virtual_guid=:
value: 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
, related_original_publication_virtual_sequence_number=:
high: -1
low: 4294967295
, flag=0, source_guid=:
value: 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
, related_source_guid=:
value: 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
, related_subscription_guid=:
value: 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
, topic_query_guid=:
value: 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
], data :

 

Log from App1 after reconnection

03 Aug 2021 16:25:45,908 DEBUG [Thread-96] (dds:180) - >>>>>>> SUBSCRIPTION_MATCHED currentCount 1, currentCountChanged: 1, totalCount: 1, totalCountChanged: 1
03 Aug 2021 16:25:45,908 DEBUG [Thread-96] (dds:97) - >>>>>>> LIVELINESS_CHANGED aliveCount 1, aliveCountChanged: 1, noteAlive: 0, notAliveCountChanged: 0
03 Aug 2021 16:25:45,912 DEBUG [Thread-96] (dds:180) - >>>>>>> SUBSCRIPTION_MATCHED currentCount 1, currentCountChanged: 1, totalCount: 1, totalCountChanged: 1
03 Aug 2021 16:25:45,914 DEBUG [Thread-96] (dds:97) - >>>>>>> LIVELINESS_CHANGED aliveCount 1, aliveCountChanged: 1, noteAlive: 0, notAliveCountChanged: 0

03 Aug 2021 16:25:47,165 DEBUG [Thread-98] (dds:121) - >>>>>>> Receiving number of data: 1
03 Aug 2021 16:25:47,166 DEBUG [Thread-98] (dds:132) - >>>>>>> Receiving sample info SampleInfo [sample_state=2, view_state=1, instance_state=1, source_timestamp=Time_t[sec=1628033147, nanosec=164997876], instance_handle=InstanceHandle_t[-3,-101,44,4,-17,-49,2,-40,85,-15,6,120,23,-28,54,-21], publication_handle=InstanceHandle_t[1,1,-121,-83,-4,-118,103,29,24,110,38,-56,-128,0,-70,2], disposed_generation_count=0, no_writers_generation_count=0, sample_rank=0, generation_rank=0, absolute_generation_rank=0, valid_data=true, reception_timestamp=Time_t[sec=1628033147, nanosec=164997876], publication_sequence_number=:
high: 0
low: 21

03 Aug 2021 16:26:09,182 DEBUG [AWT-EventQueue-0] (dds:444) - <<<<<<< Publishing: InstanceHandle_t[-119,49,74,-68,-86,-16,65,109,-15,-76,-124,80,-101,-64,24,96] MESSAGE_1 <=== this message is received above

03 Aug 2021 16:26:12,377 DEBUG [AWT-EventQueue-0] (dds:444) - <<<<<<< Publishing: InstanceHandle_t[-80,86,-64,40,-50,42,63,-38,-41,96,-19,66,55,35,73,0] MESSAGE_2 <====== this message is not received (App2 reports it as invalid, and the instance_handles don't match)

 

So I don't understand why some messages are received as valid and some as invalid. Any suggestions or pointers on where I can debug, or what additional information would help, would be appreciated.

Please let me know if there are any additional questions I can answer.

 

Thank you so much!

Tyler

Howard:

When a DataReader receives data in which the corresponding SampleInfo.valid_data flag is FALSE, it is usually because DDS is telling you about the state of the Topic (or, for keyed Topics, the state of an instance of the Topic).

So, when the valid_data flag is true, that means that the data sample provided is actually data sent by a DataWriter.

When the valid_data flag is false, you have to use the other information provided in the SampleInfo structure to determine why. For example, when a DomainParticipant shuts down, all of its DataWriters (and other Entities) are deleted. This sends packets on the network that tell other apps that this has happened. DataReaders in those apps then get a pseudo-sample that fundamentally says there are no more writers for that Topic (or instance).
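
For illustration, here is a minimal sketch of what that check looks like with the Connext Java API (which your logs appear to use). Foo, FooSeq, and the reader variable stand in for your generated types and your actual DataReader, and process() is a placeholder:

import com.rti.dds.infrastructure.RETCODE_NO_DATA;
import com.rti.dds.infrastructure.ResourceLimitsQosPolicy;
import com.rti.dds.subscription.InstanceStateKind;
import com.rti.dds.subscription.SampleInfo;
import com.rti.dds.subscription.SampleInfoSeq;
import com.rti.dds.subscription.SampleStateKind;
import com.rti.dds.subscription.ViewStateKind;

// Inside on_data_available(): take all samples and inspect each SampleInfo.
FooSeq dataSeq = new FooSeq();
SampleInfoSeq infoSeq = new SampleInfoSeq();
try {
    reader.take(dataSeq, infoSeq,
            ResourceLimitsQosPolicy.LENGTH_UNLIMITED,
            SampleStateKind.ANY_SAMPLE_STATE,
            ViewStateKind.ANY_VIEW_STATE,
            InstanceStateKind.ANY_INSTANCE_STATE);
    for (int i = 0; i < infoSeq.size(); i++) {
        SampleInfo info = (SampleInfo) infoSeq.get(i);
        if (info.valid_data) {
            // Actual application data written by a DataWriter.
            process((Foo) dataSeq.get(i));
        } else if (info.instance_state
                == InstanceStateKind.NOT_ALIVE_NO_WRITERS_INSTANCE_STATE) {
            // Pseudo-sample: the last writer for this instance went away,
            // e.g. because its DomainParticipant was shut down.
        }
    }
} catch (RETCODE_NO_DATA noData) {
    // Nothing to take.
} finally {
    reader.return_loan(dataSeq, infoSeq);
}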

Please see the Connext User's Manual for more information. You can start with the API documentation for SampleInfo here:

https://community.rti.com/static/documentation/connext-dds/6.1.0/doc/api/connext_dds/api_cpp/structDDS__SampleInfo.html#aa0c5177864ed1aeeb85544fee45904da

You are NOT getting an invalid message as a result of sending data. You are getting a message with valid_data = false as a result of the deletion of the DataWriter (as a consequence of shutting down the participant).

 

Tyler:

Thanks, Howard, for the response. I appreciate the link above. I will go through it again to see if there's anything else I may have missed.

I understand what you're saying. There are a couple of things I am still trying to figure out:

1. After reconnection, I assume all the subscriptions are matched up correctly, i.e., the old reader from App2 and the new writer from App1. Is this not correct? If not, is there any way for me to figure out from the subscription-matched callback which topic it is?

2. I assumed that the SUBSCRIPTION_MATCHED log messages covered all of the subscriptions, including the one I got invalid data for. I may be wrong, and this particular topic may not have been matched, which would be odd.

3. Are you saying that the writer/reader for this particular topic are not matched, so the reader in App2 is not receiving from the new writer in App1?

4. For your last point, I agree that the older writer has already been shut down. But I am now using the new writer to write, so why wouldn't the message be received? Again, this goes back to my assumption above.

5. Unless no "writer deleted" notification is sent to the readers on disconnect/shutdown, in which case the reader may be holding on to the old writer. Is there any way for me to force the cleanup on the reader side?

 

Thanks again, Howard,

 

Tyler

Howard:

1. After reconnection, I assume all the subscriptions are matched up correctly, i.e., the old reader from App2 and the new writer from App1. Is this not correct? If not, is there any way for me to figure out from the subscription-matched callback which topic it is?

Yes, when you restart App1, App2 should discover the new instance of App1 and vice versa. I'm not sure what you mean by "which topic it is". If a DataReader matches a DataWriter and the subscription_matched callback is called, the Topic is the Topic of the DataReader (and thus of the DataWriter). DataReaders cannot match DataWriters of different Topics.
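
For example, the reader argument passed to the callback identifies the Topic directly; a small sketch (the printed format just mirrors your logs):

import com.rti.dds.subscription.DataReader;
import com.rti.dds.subscription.SubscriptionMatchedStatus;

// Inside your DataReaderListener: the reader passed into the callback
// tells you which Topic was matched.
public void on_subscription_matched(DataReader reader,
        SubscriptionMatchedStatus status) {
    String topicName = reader.get_topicdescription().get_name();
    System.out.println(">>>>>>> SUBSCRIPTION_MATCHED on topic " + topicName
            + ", currentCount " + status.current_count
            + ", currentCountChanged: " + status.current_count_change);
}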

2. I assumed that the SUBSCRIPTION_MATCHED log messages covered all of the subscriptions, including the one I got invalid data for. I may be wrong, and this particular topic may not have been matched, which would be odd.

Sorry, I don't understand this statement.

3. Are you saying that the writer/reader for this particular topic are not matched, so the reader in App2 is not receiving from the new writer in App1?

No, after App2 discovers App1 and vice versa, App2's DataReader should receive data from App1.

Of course, if App1 is sending data before the discovery process is complete, it is possible that the DataReader in App2 does NOT receive the data sent by App1.

Please see this article

https://community.rti.com/kb/why-does-my-dds-datareader-miss-first-few-samples
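
One common way to avoid writing before discovery completes is to have the writer wait until it has matched at least one reader. A minimal sketch, assuming the Java API (waitForMatch is a hypothetical helper, not a Connext call):

import com.rti.dds.publication.DataWriter;
import com.rti.dds.publication.PublicationMatchedStatus;

// Hypothetical helper: block until this writer has discovered at least
// one matching DataReader before publishing the first sample.
static void waitForMatch(DataWriter writer) throws InterruptedException {
    PublicationMatchedStatus status = new PublicationMatchedStatus();
    writer.get_publication_matched_status(status);
    while (status.current_count == 0) {
        Thread.sleep(100); // poll every 100 ms
        writer.get_publication_matched_status(status);
    }
}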

4. For your last point, I agree that the older writer has already been shut down. But I am now using the new writer to write, so why wouldn't the message be received? Again, this goes back to my assumption above.

Please see the article above. 

In your testing, is your new DataWriter sending data constantly, or does it just send a single message and stop?

If it's sending data periodically, then I would expect your DataReader to eventually receive the data (as soon as discovery completes, which in a simple system should take less than 1 second).
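
If it only sends once and your writer uses RELIABLE reliability, you can also confirm delivery by blocking on acknowledgments after the write; a sketch (writer is your DataWriter):

import com.rti.dds.infrastructure.Duration_t;
import com.rti.dds.infrastructure.RETCODE_TIMEOUT;

// After writing, block until all matched reliable DataReaders have
// acknowledged the sample, or the timeout expires.
try {
    writer.wait_for_acknowledgments(new Duration_t(5, 0)); // 5-second timeout
} catch (RETCODE_TIMEOUT timedOut) {
    // Likely written before discovery completed, or no reader is matched.
}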

5. Unless no "writer deleted" notification is sent to the readers on disconnect/shutdown, in which case the reader may be holding on to the old writer. Is there any way for me to force the cleanup on the reader side?

There is no need to force cleanup. The DataReader will receive data from all matched DataWriters (assuming you're not using the OWNERSHIP QoS policy). So even if you have two instances of App1 running, it will receive data from both DataWriters.

Generally, the logs that you are printing are not very useful for understanding the behavior; they contain too much unnecessary info.