What settings affect DomainParticipant’s liveliness?
Participants are able to know when other participants in the same domain are reachable or not (alive or dead). This is called Participant’s Liveliness. Discovery_Config QosPolicy has three properties that define this Participant’s Liveliness behavior:
participant_liveliness_assert_period is the period of time at which the local participant sends out packets asserting that it is alive (these are DATA(p) packets in Wireshark).
For example, if you set participant_liveliness_assert_period equal to 5 seconds, the local participant will declare itself alive at least once every 5 seconds.
participant_liveliness_lease_duration is the time period after which remote participants can consider this local participant dead. If remote participants don't receive any messages from the local participant asserting that the local participant is alive, the remote participants will declare the local participant dead. This value should be greater than the participant_liveliness_assert_period.
For example, if the local participant has participant_liveliness_lease_duration equal to 30 seconds, remote participants may declare this local participant dead if they don't receive any liveliness assertion from it within a 30-second period.
max_liveliness_loss_detection_period determines the rate at which the local participant checks if any remote participant should be declared dead or alive.
If this period is lowered, the local participant checks more frequently whether remote participants should be declared dead or alive. More frequent checks also increase the application's CPU usage, because the liveliness of remote entities is examined more often.
participant_liveliness_assert_period and max_liveliness_loss_detection_period are applied locally. In contrast, participant_liveliness_lease_duration by definition must be applied by remote participants, so its value has to be sent to them. This exchange happens during the discovery phase.
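The relationship between the three settings can be sketched in a few lines of Python. This is only an illustrative model, not the Connext API: the class ParticipantLivelinessSettings and its method names are hypothetical, all values are in seconds, and treating a lease duration that is not greater than the assert period as a hard error is an assumption of this sketch (the text above only says it "should" be greater).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParticipantLivelinessSettings:
    """Hypothetical container for the three liveliness settings (seconds)."""
    assert_period: float              # participant_liveliness_assert_period
    lease_duration: float             # participant_liveliness_lease_duration
    max_loss_detection_period: float  # max_liveliness_loss_detection_period

    def __post_init__(self):
        # All three settings are durations, so they must be positive.
        for name in ("assert_period", "lease_duration",
                     "max_loss_detection_period"):
            if getattr(self, name) <= 0:
                raise ValueError(f"{name} must be positive")
        # Assumption of this sketch: enforce "lease > assert period" strictly.
        if self.lease_duration <= self.assert_period:
            raise ValueError(
                "lease_duration should be greater than assert_period")

    def announced_to_remote(self) -> dict:
        """Only the lease duration has to reach remote participants."""
        return {"participant_liveliness_lease_duration": self.lease_duration}

    def applied_locally(self) -> dict:
        """The other two settings only drive local behavior."""
        return {
            "participant_liveliness_assert_period": self.assert_period,
            "max_liveliness_loss_detection_period":
                self.max_loss_detection_period,
        }


settings = ParticipantLivelinessSettings(
    assert_period=15, lease_duration=45, max_loss_detection_period=5)
print(settings.announced_to_remote())
print(settings.applied_locally())
```

Splitting the settings into "announced" and "applied locally" mirrors the distinction made above: a remote participant needs the lease duration to decide when this participant is dead, while the assert period and the detection period never leave the local application.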
Example
Consider the following configuration for two DomainParticipants (PartA and PartB):
participant_liveliness_assert_period = 15 seconds
participant_liveliness_lease_duration = 45 seconds
max_liveliness_loss_detection_period = 5 seconds
Figure 1 (above): Participant B receives a liveliness message from PartA
These figures are from the perspective of Participant B. In Figure 1, Participant A would normally declare itself alive every 15 seconds by sending a message to Participant B.
Participant B received the first liveliness message designated in blue.
However, in this example none of the later liveliness messages sent by Participant A were received by Participant B. This could happen for many reasons, such as a network failure, the application being shut down, or the application crashing.
In figure 1 above, the dashed gray lines show where Participant A's liveliness messages would normally have been received.
Figure 2 (above): Shows the times where Participant B checks for liveliness of Participant A
Figure 2's green lines show where Participant B checks for the liveliness of Participant A. How frequently Participant B checks is based on the max_liveliness_loss_detection_period (in this example it is 5 seconds).
The timing of these liveliness loss checks is not coordinated or aligned with when Participant A sends its liveliness messages. We will explain why that can matter in later figures.
Figure 3 (above): Ideal case where Participant B will quickly determine Participant A is dead
Figure 3 now shows the participant_liveliness_lease_duration time in red. This 45 second window is calculated from the last liveliness message Participant B received from Participant A.
Participant B expects to receive a liveliness message from Participant A at least every 45 seconds, which didn't happen in this example.
However, Participant B doesn't declare Participant A dead at the instant the 45 seconds elapse. Participant B only changes the liveliness of Participant A at its next liveliness check, designated in green.
In this example, the rightmost green line [1], where Participant B checks for liveliness, falls shortly after the red line marking the expiration of Participant A's participant_liveliness_lease_duration. This is the ideal case for quickly detecting a change in liveliness.
As a result, the elapsed time between the last liveliness message Participant B received from Participant A and the moment Participant B detects Participant A's loss of liveliness is roughly 45.1 seconds, shown in purple.
Figure 4 (above): Worst case liveliness loss detection timing
In Figure 4, the liveliness checks by Participant B, shown in green, have shifted in time (but they still occur every 5 seconds per max_liveliness_loss_detection_period).
The participant_liveliness_lease_duration expired shortly after the second-to-last time Participant B checked for liveliness [2].
As a result, Participant B did not detect that Participant A had lost liveliness [3] until roughly 49.9 seconds after the last liveliness message it received from Participant A.
Note that once participant_liveliness_lease_duration expires, Participant B will never take longer than an additional max_liveliness_loss_detection_period to realize Participant A is dead.
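The timing in Figures 3 and 4 can be reproduced with a small calculation. The sketch below is a simplified model, not Connext code: detection_time_ms and its parameters are hypothetical names, and times are integer milliseconds measured from the last liveliness message Participant B received. It assumes that Participant B's checks are perfectly periodic and that a check landing exactly on the lease expiration already counts as expired.

```python
def detection_time_ms(lease_ms: int, check_period_ms: int,
                      first_check_ms: int) -> int:
    """Return when the loss of liveliness is noticed, in ms after the last
    received liveliness message: the first periodic check that happens at
    or after the lease expiration.

    first_check_ms is the time of Participant B's first check after that
    last message; later checks follow every check_period_ms.
    """
    if lease_ms <= 0 or check_period_ms <= 0 or first_check_ms < 0:
        raise ValueError("durations must be positive, offset non-negative")
    if first_check_ms >= lease_ms:
        return first_check_ms
    remaining = lease_ms - first_check_ms
    # Ceiling division: number of further checks until the lease has expired.
    checks_needed = -(-remaining // check_period_ms)
    return first_check_ms + checks_needed * check_period_ms


LEASE_MS = 45_000         # participant_liveliness_lease_duration = 45 s
CHECK_PERIOD_MS = 5_000   # max_liveliness_loss_detection_period = 5 s

# Figure 3: a check happens 0.1 s after the lease expires.
ideal = detection_time_ms(LEASE_MS, CHECK_PERIOD_MS, first_check_ms=100)
# Figure 4: a check happens 0.1 s before the lease expires.
worst = detection_time_ms(LEASE_MS, CHECK_PERIOD_MS, first_check_ms=4_900)

print(ideal / 1000)  # 45.1
print(worst / 1000)  # 49.9
```

Under these assumptions, whatever value is chosen for first_check_ms, the result stays below lease_ms + check_period_ms, which matches the note above: detection never takes longer than one additional max_liveliness_loss_detection_period after the lease expires.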