Error when subscriber process goes offline


Hello,

I have a simple test case of a reliable writer sending to a reliable reader. When I close the reader, about 100 seconds later the writer prints:

ERROR [0x0101000C,0x664E0AB1,0x7085D5A2:0x000001C1|SET DR LEASE DURATION|LC:Discovery] data_available_forward:!Precondition not met error: Invalid data
ERROR [0x0101000C,0x664E0AB1,0x7085D5A2:0x000001C1|SET DR LEASE DURATION|LC:Discovery] data_available_forward:!Precondition not met error: Invalid data
ERROR [0x0101000C,0x664E0AB1,0x7085D5A2:0x00000000|REMOVE MATCHING ENDPOINT 0x00000000|REMOVE REMOTE ENDPOINTS|:0x000001C1{Domain=0}|REMOVE REMOTE ENDPOINTS] publication_matched_forward:!Invalid argument error:
ERROR [0x0101000C,0x664E0AB1,0x7085D5A2:0x000001C1|SET DR LEASE DURATION|LC:Discovery] data_available_forward:!Precondition not met error: Invalid data
ERROR [0x0101000C,0x664E0AB1,0x7085D5A2:0x000001C1|SET DR LEASE DURATION|LC:Discovery] data_available_forward:!Precondition not met error: Invalid data
get matched subscription data
ERROR [0x0101000C,0x664E0AB1,0x7085D5A2:0x00000000|REMOVE MATCHING ENDPOINT 0x00000000|REMOVE REMOTE ENDPOINTS|:0x000001C1{Domain=0}|REMOVE REMOTE ENDPOINTS] publication_matched_forward:!Invalid argument error:
ERROR [0x0101000C,0x664E0AB1,0x7085D5A2:0x00000000|REMOVE MATCHING ENDPOINT 0x00000000|REMOVE REMOTE ENDPOINTS|:0x000001C1{Domain=0}|REMOVE REMOTE ENDPOINTS] publication_matched_forward:!Invalid argument error:
ERROR [0x0101000C,0x664E0AB1,0x7085D5A2:0x000001C1|SET DR LEASE DURATION|LC:Discovery] data_available_forward:!Precondition not met error: Invalid data
ERROR [0x0101000C,0x664E0AB1,0x7085D5A2:0x000001C1|SET DR LEASE DURATION|LC:Discovery] data_available_forward:!Precondition not met error: Invalid data
get matched subscription data
get matched subscription data

Where is this coming from? Is there a way to catch and detect that the reader went away? Is there a way to detect it in a shorter time?

I tried

participant_liveliness_assert_period(dds::core::Duration(0,0.5e9));
participant_liveliness_lease_duration(dds::core::Duration(1,0));

on both the reader and the writer, but it doesn't seem to make a difference.

Thoughts?

Howard

Well, generally you shouldn't see any error messages from Connext DDS just because a remote application terminates or crashes. We have no reports from anyone of Connext printing error messages in this situation.

What version of Connext DDS are you using?

Is it possible that your own application is trying to access information about the remote application after it has terminated, e.g., information about a remote DataReader?

I see a printout of

"get matched subscription data"

which is NOT something that Connext prints. There is a DDS API called "get_matched_subscription_data". Could it be that some code in that application is calling that function on a handle that is no longer valid because the DataReader (aka subscription) no longer exists?


As for detecting the failure in a shorter time: yes, you can configure Connext DDS to detect that the DataReader's application has crashed sooner, at the cost of the additional network and CPU resources this requires.

I don't know exactly what you did here, but these properties must be set in the DomainParticipant's QoS when the participant is created. You can't change those values after the participant has been created.

participant_liveliness_assert_period(dds::core::Duration(0,0.5e9));
participant_liveliness_lease_duration(dds::core::Duration(1,0));

However, if you set the lease duration to 1 second and the assert period to 0.5 seconds, Connext may falsely determine that a remote participant is dead if a single "I'm alive" packet (which is sent at the assert period) is dropped or otherwise lost by the network. So, with this aggressive level of detection, your system may falsely conclude that a remote DataReader is dead when it isn't.

If you need to detect within 1 second that a remote DataReader no longer exists, and are therefore setting participant_liveliness_lease_duration to 1.0 second, you should set the assert period to 0.3 seconds. The remote application then sends 3 "I'm alive" messages within each 1-second period, and all 3 have to be lost before DDS determines that the other application is dead.

ONE OTHER thing: depending on your exact use case (what failure the application actually needs to detect, why, and how it should respond), setting participant liveliness may or may not be the best solution.

I suggest describing your use case to "https://chatbot.rti.com" and asking it for suggestions on different ways to implement your use case.

And finally, if you are working on a project that has RTI developer licenses and thus a support contract, questions like this are best sent directly to RTI's support team at "support@rti.com" or through your customer login portal.