What are "heartbeats" and how are they used in RTI Connext 4.x and above?
Note: Applies to RTI Connext 4.x and above. There is a separate FAQ for 3.x.
RTI Connext supports various transports to deliver data samples. Not all of these transports guarantee that all samples are delivered. For example, when using UDP, samples can be lost or can arrive out of order. The OMG Real-Time Publish-Subscribe (RTPS) wire-protocol specification defines a reliability protocol independent of the underlying transport. Heartbeats play a key role in the reliability model: a reliable DataWriter sends data samples and heartbeats to reliable DataReaders. A DataReader responds to a heartbeat by sending an ACKNACK, which tells the DataWriter what the DataReader has received so far. If a sample is missing, the DataWriter can resend the sample to the reliable DataReader.
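The exchange described above can be illustrated with a small model. This is only a sketch of the idea, not RTI Connext code: the function name `missing_samples` is hypothetical, and it assumes a heartbeat announces the first and last sequence numbers the DataWriter still holds.

```python
def missing_samples(first_available, last_written, received):
    """Sequence numbers a reader would still request after a heartbeat.

    The heartbeat announces the range [first_available, last_written];
    the reader compares it with the set of samples it has received.
    """
    return [sn for sn in range(first_available, last_written + 1)
            if sn not in received]


# The writer published samples 1..5; samples 2 and 4 were lost in transit.
received = {1, 3, 5}
nack = missing_samples(1, 5, received)
print(nack)  # [2, 4]

# After the writer resends the requested samples, the next heartbeat
# finds no gaps, so the reply is a purely positive acknowledgement.
received.update(nack)
print(missing_samples(1, 5, received))  # []
```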
Heartbeats (HBs) can be sent periodically or piggybacked onto a data sample. Every (max_samples / DataWriterQos.protocol.rtps_reliable_writer.heartbeats_per_max_samples) samples, RTI Connext appends a HB to the outgoing sample to request that the DataReader acknowledge the samples it has received. This piggybacked HB is usually what lets the DataWriter get samples acknowledged as the DataReader receives them.
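As a worked example of the piggyback arithmetic, the following sketch computes which samples would carry a heartbeat. The helper names are hypothetical, and two details are assumptions of the sketch rather than documented behavior: the division is rounded down, and only positive values of heartbeats_per_max_samples are modeled.

```python
def piggyback_interval(max_samples, heartbeats_per_max_samples):
    """Number of samples between piggybacked heartbeats (rounded down)."""
    if heartbeats_per_max_samples <= 0:
        raise ValueError("this sketch only models positive values")
    return max(1, max_samples // heartbeats_per_max_samples)


def samples_with_heartbeat(total_samples, max_samples, heartbeats_per_max_samples):
    """1-based positions of the samples that would carry a heartbeat."""
    interval = piggyback_interval(max_samples, heartbeats_per_max_samples)
    return list(range(interval, total_samples + 1, interval))


# With max_samples = 32 and heartbeats_per_max_samples = 8,
# every 4th sample carries a heartbeat.
print(piggyback_interval(32, 8))          # 4
print(samples_with_heartbeat(12, 32, 8))  # [4, 8, 12]
```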
Periodic heartbeats are used when data is sent infrequently. The DataWriterProtocol QosPolicy defines how fast periodic heartbeats are sent. There are two types of heartbeat periods: heartbeat_period and fast_heartbeat_period. Until a predefined high_watermark in the send queue is reached, heartbeats are sent every heartbeat_period. Once the high_watermark is reached, the middleware switches over to the fast_heartbeat_period and keeps using this until a low_watermark is reached.
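The switch between the two periods behaves like a hysteresis loop. The sketch below models that logic only; the class `HeartbeatPeriodSelector` is hypothetical, and treating both watermarks as inclusive thresholds on the number of unacknowledged samples is an assumption.

```python
class HeartbeatPeriodSelector:
    """Chooses between the normal and fast heartbeat periods."""

    def __init__(self, heartbeat_period, fast_heartbeat_period,
                 low_watermark, high_watermark):
        if low_watermark >= high_watermark:
            raise ValueError("low_watermark must be below high_watermark")
        self.heartbeat_period = heartbeat_period
        self.fast_heartbeat_period = fast_heartbeat_period
        self.low_watermark = low_watermark
        self.high_watermark = high_watermark
        self.fast = False  # start out in the normal-rate state

    def period_for(self, unacked_samples):
        """Update the state for the current queue level, return the period."""
        if not self.fast and unacked_samples >= self.high_watermark:
            self.fast = True   # queue filled up: heartbeat faster
        elif self.fast and unacked_samples <= self.low_watermark:
            self.fast = False  # queue drained: back to the normal rate
        return self.fast_heartbeat_period if self.fast else self.heartbeat_period


selector = HeartbeatPeriodSelector(3.0, 0.5, low_watermark=2, high_watermark=8)
periods = [selector.period_for(n) for n in (0, 5, 8, 5, 2, 5)]
print(periods)  # [3.0, 3.0, 0.5, 0.5, 3.0, 3.0]
```

Note that a queue level of 5 yields a different period depending on which watermark was crossed last; that is what keeps the writer from flapping between the two rates.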
There is one more important heartbeat-related configuration to be aware of: max_heartbeat_retries. A reliable writer will send a heartbeat to every reliable reader that has unacknowledged samples. What if the reader is hanging or temporarily occupied? This matters most when using strict reliability (that is, "reliable" Reliability QoS + "keep all" History QoS), because the writer will then block when the send queue is full of unacknowledged samples. By using max_heartbeat_retries, you can avoid having a slow reader bring the entire system to its knees. If a DataReader does not respond within max_heartbeat_retries heartbeats, it will be dropped by the DataWriter (and the reliable DataWriter’s Listener will be called with a RELIABLE_READER_ACTIVITY_CHANGED status).
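The retry limit can be pictured as a counter of consecutive unanswered heartbeats per reader. The following is a minimal model, not the middleware's implementation: the class `ReaderLiveness` is hypothetical, it assumes the reader is dropped once the count reaches max_heartbeat_retries, and it does not model what happens if a dropped reader responds again.

```python
class ReaderLiveness:
    """Tracks consecutive unanswered heartbeats for one reader."""

    def __init__(self, max_heartbeat_retries):
        self.max_heartbeat_retries = max_heartbeat_retries
        self.unanswered = 0
        self.active = True

    def on_heartbeat_sent(self):
        if not self.active:
            return
        self.unanswered += 1
        if self.unanswered >= self.max_heartbeat_retries:
            # Here the writer would stop waiting for this reader and
            # report the change through its listener.
            self.active = False

    def on_acknack_received(self):
        if self.active:
            self.unanswered = 0  # any response resets the count


reader = ReaderLiveness(max_heartbeat_retries=3)
states = []
for event in ("HB", "HB", "ACK", "HB", "HB", "HB"):
    if event == "HB":
        reader.on_heartbeat_sent()
    else:
        reader.on_acknack_received()
    states.append(reader.active)
print(states)  # [True, True, True, True, True, False]
```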
It is also important to know that when the DataReader receives a heartbeat from a DataWriter (indicating (a) that the DataWriter still exists on the network and (b) what sequence numbers it has published), the following parameters indicate how long it will wait before replying with a positive (assuming they aren't disabled) or negative acknowledgement: rtps_reliable_reader.min_heartbeat_response_delay and rtps_reliable_reader.max_heartbeat_response_delay.
The time the reader waits will be a random duration in between the minimum and maximum values. Narrowing this range, and shifting it towards zero, will make the system more reactive. However, it will increase the chance of (N)ACK spikes. The higher the number of readers on the topic, and the greater the load on your network, the more you should consider specifying a range here.
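To see why the width of the range matters, consider the sketch below, which draws one response delay per reader. The function name is hypothetical, and a uniform distribution is an assumption; the text only states that the delay is random between the two bounds.

```python
import random


def heartbeat_response_delay(min_delay, max_delay, rng=random):
    """Pick a delay in seconds between the two bounds (uniform assumed)."""
    if min_delay < 0 or max_delay < min_delay:
        raise ValueError("need 0 <= min_delay <= max_delay")
    return rng.uniform(min_delay, max_delay)


rng = random.Random(7)  # fixed seed so the run is repeatable

# Ten readers answering the same heartbeat with a 0 to 0.5 s range:
# the replies are spread out over the half-second window.
spread = sorted(heartbeat_response_delay(0.0, 0.5, rng) for _ in range(10))
print([round(d, 3) for d in spread])

# With min == max == 0, every reader replies at the same instant,
# which is the (N)ACK spike mentioned above.
spike = [heartbeat_response_delay(0.0, 0.0, rng) for _ in range(10)]
print(spike)
```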
For further information please see the Reliability chapter in the Core Libraries and Utilities User's Manual.