How can I react to a participant going stale?

Offline
Last seen: 10 years 4 months ago
Joined: 12/08/2011
Posts: 10

I am trying to find a way for my application to react to a participant going stale without using my own timers.
In the RTI Core Libraries and Utilities Manual, section 14.3.1, I found the following:

Once a remote participant has been added to the Connext database, Connext keeps track of that
remote participant’s participant_liveliness_lease_duration. If a participant DATA for that
participant (identified by the GUID) is not received at least once within the
participant_liveliness_lease_duration, the remote participant is considered stale, and the
remote participant, together with all its entities, will be removed from the database of the local
participant.

Looking up the participant_liveliness_lease_duration and the participant_liveliness_assert_period, I found that they have defaults of 100 seconds and 30 seconds, respectively, leading me to believe that if a participant doesn't assert its liveliness at least once within 100 seconds, others will/can/should mark the participant as stale. (However, the participant should also assert its liveliness automatically every 30 seconds.)
Unfortunately, I am confused about which user-detectable events these numbers actually affect. According to Figure 14.5 of the RTI Core Libraries and Utilities Manual, these numbers relate to sending, receiving, and reacting to participant X Data, which I interpret to be the information communicated through the built-in topic DCPSParticipant, i.e. DDS_ParticipantBuiltinTopicData.
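
The timing rule quoted above can be modeled in a few lines of plain C++. This is only a sketch of the rule as the manual excerpt states it, not Connext code: `RemoteParticipantLease` and `is_stale` are hypothetical names, and the model ignores how often the middleware actually checks the lease.

```cpp
#include <cassert>
#include <chrono>

using Clock = std::chrono::steady_clock;

// Hypothetical model of the bookkeeping kept for one remote participant.
struct RemoteParticipantLease {
    Clock::time_point last_data_received;  // when participant DATA last arrived
    std::chrono::seconds lease_duration;   // participant_liveliness_lease_duration
};

// True once no participant DATA has arrived within the lease duration.
bool is_stale(const RemoteParticipantLease& lease, Clock::time_point now) {
    assert(now >= lease.last_data_received);  // time must not run backwards
    return (now - lease.last_data_received) > lease.lease_duration;
}
```

With the default 100-second lease, a participant that last sent DATA 30 seconds ago is still considered alive in this model, while one that has been silent for 101 seconds is stale.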

As the default of DDS_RemoteParticipantPurgeKind is DDS_LIVELINESS_BASED_REMOTE_PARTICIPANT_PURGE, I expected the participant to do "the right thing" (which is automatically marking a participant as stale and purging it if it doesn't transmit) and wanted to react to that.
I tried to react to a DDS_LIVELINESS_CHANGED_STATUS directly at the participant level, but I couldn't get my listener to trigger. This is how I set up the listener:

participant  = DDSDomainParticipantFactory::get_instance()->create_participant(
                         domainId             // Domain ID
                        ,participant_qos      // QoS
                        ,participant_listener // Listener
                        ,DDS_LIVELINESS_LOST_STATUS    // Listener Mask
                        |DDS_LIVELINESS_CHANGED_STATUS // Listener Mask
                        );

The participant_listener is constructed as an instance of the following class:

struct Participant_Listener : public DDSDomainParticipantListener {
    virtual void on_liveliness_changed( DDSDataReader* reader, DDS_LivelinessChangedStatus const & status)
    {  
      qDebug() 
        << ": DataReader liveliness changed for topic" << reader->get_topicdescription()->get_name() << ".\n"
        << "    alive count        :" << status.alive_count << "\n"
        << "not alive count        :" << status.not_alive_count << "\n"
        << "    alive count change :" << status.alive_count_change << "\n"
        << "not alive count change :" << status.not_alive_count_change << "\n";
    };
    
    virtual void on_liveliness_lost( DDSDataWriter *writer,  DDS_LivelinessLostStatus const & status)
    {
      qDebug() 
        << ": DataWriter liveliness lost for topic" <<writer->get_topic()->get_name() << ".\n"
        << "total count        :" << status.total_count << "\n"
        << "total count change :" << status.total_count_change;
    };
};

The only callbacks I got from this listener were DDS_LIVELINESS_CHANGED_STATUS events of my other DataWriters, which transport my data via non-built-in topics. That makes sense: I did not implement listeners on those entities that react to this status, so the participant listener acts as a last-resort catch-all, in accordance with the documentation. But shouldn't this also be the case for the liveliness changed events of the built-in DataWriters and DataReaders attached to the DCPSParticipant topic?

Looking up the liveliness of DataReaders just increased my confusion even more: the LIVELINESS QoS for DataReaders seems to have a default lease_duration of infinity. And indeed, that seems to be the case for the built-in readers as well, checked as follows:

DDS_DataReaderQos defReaderQoS;
participant->get_builtin_subscriber()->lookup_datareader(DDS_PARTICIPANT_TOPIC_NAME)->get_qos(defReaderQoS);
bool infLease = defReaderQoS.liveliness.lease_duration.is_infinite();

I apologize if I am making some horrible newbie mistakes here, but I couldn't find any example that I could have studied. Does anybody know of a "follow-up" example to http://community.rti.com/content/forum-topic/detect-presence-domainparticipants-datawriters-and-datareaders-dds-domain that deals with the detection of stale participants in user code?

What I was hoping to find (or create) was a listener style callback in the form of  virtual void participant_purged(DDS_ParticipantBuiltinTopicData&); but so far I haven't been able to reproduce anything like it...

Thanks for any hints, examples, links, or other sources of follow-up information that could help me get this done.

Claus

Offline
Last seen: 10 years 4 months ago
Joined: 12/08/2011
Posts: 10

While looking further into this, I am curious about any thoughts on how I could determine a posteriori which participant went stale, more precisely how I could get the GUID of that participant. I am currently using the GUID of a participant in my user code to uniquely identify participants and use the methods mentioned in this post to get to the GUID data. However, I haven't found any way to get the GUID of a participant that went stale. All I could find so far was data containing the instance handle, mainly via the get_discovered_participants() method -- which in turn would require that I keep an old snapshot and compare it to the current one to determine the stale participant as the one not present in both.
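
The snapshot comparison described above could be sketched as follows. This is plain C++ rather than Connext code: `Handle` is a hypothetical stand-in for an instance handle (real handles are opaque types), and `departed` is an illustrative helper name. It assumes the application copies the discovered handles into a `std::set` each time it polls.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <iterator>
#include <set>
#include <vector>

// Hypothetical stand-in for an instance handle.
using Handle = std::uint64_t;

// Returns the handles present in `previous` but missing from `current`,
// i.e. the participants that disappeared between the two snapshots.
std::vector<Handle> departed(const std::set<Handle>& previous,
                             const std::set<Handle>& current) {
    std::vector<Handle> gone;
    std::set_difference(previous.begin(), previous.end(),
                        current.begin(), current.end(),
                        std::back_inserter(gone));
    return gone;
}
```

Because `std::set` keeps its elements sorted, `std::set_difference` can be applied directly. A participant that newly appears in `current` is simply ignored by this helper.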

However, all that is based upon my (potentially wrong) understanding of how to detect a stale participant in the first place--which, as stated above, is also not working...

Any tips are greatly appreciated.


Claus

Gerardo Pardo
Offline
Last seen: 3 weeks 6 days ago
Joined: 06/02/2010
Posts: 602

Hi Claus,

In order to detect remote participants going stale (losing their liveliness) or gracefully terminating, you need to install a listener on, or monitor the data received by, the ParticipantBuiltinTopicDataDataReader.

The FileExchange contains a Java example called MonitorDiscoveryInformation.java showing how to do this.

Gerardo

Offline
Last seen: 2 years 2 months ago
Joined: 07/29/2022
Posts: 6

I have the same problem. Did you solve it? I would be very grateful for a solution.

Howard
Offline
Last seen: 22 hours 48 min ago
Joined: 11/29/2012
Posts: 622

Have you looked into what Gerardo wrote in his post on 4/12/2013? 

Offline
Last seen: 2 years 2 months ago
Joined: 07/29/2022
Posts: 6

Is what Gerardo wrote about getting the built-in ParticipantBuiltinTopicData DataReader? I've tried it that way. My test code:

code 

-------------------

class BuiltinParticipantListener
    : public dds::sub::DataReaderListener<dds::topic::ParticipantBuiltinTopicData> {
public:
    void on_liveliness_changed(
            dds::sub::DataReader<dds::topic::ParticipantBuiltinTopicData>& reader,
            const dds::core::status::LivelinessChangedStatus& status) override {
        std::cout << "------dds on_liveliness_changed" << std::endl;
    }

    void on_data_available(
            dds::sub::DataReader<dds::topic::ParticipantBuiltinTopicData>& reader) override {
        // Take only the samples that match the new_instance() state filter.
        dds::sub::LoanedSamples<dds::topic::ParticipantBuiltinTopicData> samples =
                reader.select().state(dds::sub::status::DataState::new_instance()).take();
        for (const auto& sample : samples) {
            if (sample.info().valid()) {
                std::cout << "Built-in Reader: found participant" << std::endl;
            }
        }
    }
};

auto& dis_config = participant_qos.policy<rti::core::policy::DiscoveryConfig>();

dis_config.participant_liveliness_lease_duration(dds::core::Duration(15, 10));
dis_config.participant_liveliness_assert_period( dds::core::Duration(1, 0));
dis_config.max_liveliness_loss_detection_period( dds::core::Duration(5, 0));

auto my_participant_listener = std::make_shared<MyParticipantListener>();
dds::domain::DomainParticipant participant(domain_id, participant_qos, my_participant_listener);

dds::sub::Subscriber builtin_subscriber =
        dds::sub::builtin_subscriber(participant);

auto participant_listener = std::make_shared<BuiltinParticipantListener>();

std::vector<dds::sub::DataReader<dds::topic::ParticipantBuiltinTopicData>>
        participant_reader;
dds::sub::find<dds::sub::DataReader<dds::topic::ParticipantBuiltinTopicData>>(
        builtin_subscriber,
        dds::topic::participant_topic_name(),
        std::back_inserter(participant_reader));

participant_reader[0].set_listener(participant_listener, dds::core::status::StatusMask::all());

----------------------

The test above can detect a new participant joining: on_data_available is called and can report information about the new participant. When a participant leaves because it violated the liveliness lease duration, on_liveliness_changed/on_liveliness_lost is not called. on_data_available may be called, but it cannot tell which participant is leaving.

So my goal is that whenever a participant joins or leaves, DDS notifies me with information about that participant.

A not-so-good but workable way is what hcc23 mentioned: keep track of all joined participants in on_data_available, and query the discovered participants via dds::domain::discovered_participants() in on_data_available().

Is there a better way?

thanks!

 
Howard
Offline
Last seen: 22 hours 48 min ago
Joined: 11/29/2012
Posts: 622

When a remote participant is "deleted", i.e., leaving, the on_data_available() callback of the DCPSParticipant DataReader will be called.

In that case, the value of

sample.info().valid()

will be false.

sample.info().state().instance_state()

will reflect that the instance (i.e., remote participant) is no longer alive.

The value of

sample.info().instance_handle()

will indicate which Remote Participant the callback is about.  You may want to cache a map of instance_handle() to information about the remote participant that is useful to your own application.

if (sample.info().state().instance_state() != dds::sub::status::InstanceState::alive()) {
    // sample.info().instance_handle() identifies the Remote Participant that is no longer alive
}
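
The caching idea suggested above might look roughly like this. It is a library-independent sketch: `Handle`, `ParticipantInfo`, and `ParticipantCache` are hypothetical names, and the assumption is that the application calls `on_alive` for samples with valid data and `on_not_alive` for samples whose instance state is no longer alive, using the instance handle as the key.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <utility>

using Handle = std::uint64_t;  // hypothetical stand-in for an instance handle

struct ParticipantInfo {       // hypothetical application-side record
    std::string guid;
    std::string name;
};

class ParticipantCache {
public:
    // Call for samples with valid data: remember (or update) the participant.
    void on_alive(Handle h, ParticipantInfo info) {
        known_[h] = std::move(info);
    }

    // Call for samples whose instance is no longer alive. If the handle is
    // known, copies the cached record into `out`, forgets it, and returns true.
    bool on_not_alive(Handle h, ParticipantInfo& out) {
        auto it = known_.find(h);
        if (it == known_.end()) {
            return false;
        }
        out = std::move(it->second);
        known_.erase(it);
        return true;
    }

    std::size_t size() const { return known_.size(); }

private:
    std::map<Handle, ParticipantInfo> known_;
};
```

The record returned by `on_not_alive` is what gives the application the GUID (or any other cached detail) of the participant that just left, since the not-alive sample itself carries no valid data.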

Offline
Last seen: 2 years 2 months ago
Joined: 07/29/2022
Posts: 6

This solution works for me.

My mistake was in the state filter of the selection. The state filter should be any() instead of new_instance():

 reader.select().state(dds::sub::status::DataState::any()).take();

 

Thanks.