7.3. Metrics
This section details the metrics you can collect from Connext observable resources. Each metric has a unique name and specifies a general feature of a Connext observable resource. For example, a DataWriter is an observable resource; the metric `dds_data_writer_protocol_sent_heartbeats_total` specifies the total number of heartbeats sent by a DataWriter. There are two metric types:
Counters. A counter is a cumulative metric that represents a single monotonically increasing counter whose value can only increase or be reset to zero on restart.
Gauges. A gauge is a metric that represents a single numerical value that can arbitrarily go up and down.
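The distinction matters when you query these metrics in Prometheus: counters are normally wrapped in `rate()` or `increase()`, while gauges are read directly. A minimal PromQL sketch, using two metric names that appear later in this section:

```promql
# Counter: convert the monotonically increasing total into a per-second rate
# over the last minute before graphing or alerting on it.
rate(dds_topic_inconsistent_total[1m])

# Gauge: the sampled value is already meaningful, so query it directly.
dds_data_writer_cache_samples
```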
Observability Framework uses a Prometheus time-series database to store collected metrics. A time series is an instantiation of a metric and represents a stream of timestamped values (measurements) belonging to the same resource as the metric. For example, we could have a time series for the metric `dds_data_writer_protocol_sent_heartbeats_total` corresponding to a DataWriter `DW1` identified by a resource GUID `GUID1`.
Labels (in Prometheus) or attributes (in OpenTelemetry) identify each metric instantiation or time series. A label is a key/value pair associated with a metric. Any given combination of labels for the same metric name identifies a specific instantiation of that metric. For example, the metric `dds_data_writer_protocol_sent_heartbeats_total` for the DataWriter `DW1` will have the label `{guid=GUID1}`. All metrics have at least one label, called `guid`, that uniquely identifies a resource in a Connext system.
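For instance, to look at this time series for a single DataWriter, you can filter on the `guid` label in a PromQL query. The GUID value below is a placeholder; substitute the `guid` label value of the resource you are interested in.

```promql
# Heartbeats sent by one specific DataWriter, selected by its resource GUID.
dds_data_writer_protocol_sent_heartbeats_total{guid="<datawriter-resource-guid>"}
```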
In Observability Framework there is a special kind of metric called a presence metric. Presence metrics indicate the existence of a resource in a Connext system. For example, the `dds_domain_participant_presence` metric indicates the presence of a DomainParticipant in a Connext system; there will be a time series for each DomainParticipant ever created in the system. The labels associated with a presence metric describe the resource, and they depend on the type of resource. For example, a DomainParticipant resource has labels such as `domain_id` and `name`.
For metrics that are not presence metrics, the only label is the `guid` label identifying the resource to which the metrics apply. You can use the `guid` label to query the description labels of a resource by looking at the presence metric for the resource class.
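As a sketch of that pattern, the following PromQL query joins a DataWriter metric with its presence metric on `guid` so that descriptive labels appear on the result. It assumes the presence metric reports a value of 1 for a live resource, and the label names pulled in with `group_left()` (shown here as `domain_id` and `topic_name`) are assumptions; use the label names you see on the presence metric in your deployment.

```promql
# Copy descriptive labels from the presence metric onto the heartbeat counter
# by joining the two series on the shared guid label.
dds_data_writer_protocol_sent_heartbeats_total
  * on (guid) group_left (domain_id, topic_name)
    dds_data_writer_presence
```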
Observability Framework provides the ability to create an initial configuration for the collection and forwarding of metrics on each observable resource, as well as the ability to dynamically change this configuration at run time. The initial metric collection configuration is set in Monitoring Library 2.0, as explained in Monitoring Library 2.0. Dynamic changes to the metric collection configuration are made using the REST API, as detailed in Collector Service REST API Reference. For an example of how to dynamically change the metric collection configuration using the Observability Dashboards, see Change the Metric Configuration.
7.3.1. Metric Pattern Definitions
Observability Framework enables you to select the set of metrics collected and forwarded for a resource, both before and during run time. To select metrics, you use metric selector strings. Metric selector strings use POSIX® fnmatch pattern matching, as described in Table 7.2. The most common pattern is an asterisk (*), which matches zero or more characters. Some example metric selectors using POSIX® fnmatch are shown below.
| Metric Selector | Description |
|---|---|
| `dds_application_process_memory_usage_resident_memory_bytes` | Refers to the metric `dds_application_process_memory_usage_resident_memory_bytes` |
| `dds_application_process_*` | Refers to all metrics that begin with `dds_application_process_` |
| `dds_*_bytes` | Refers to metrics that start with `dds_` and end with `_bytes` |
7.3.2. Application Metrics
The following tables describe the metrics and labels generated for Connext applications. Only the `dds_application_presence` metric has all of the application labels listed in the table below. All other application metrics have the `guid` label only.
| Label or Attribute Name | Description |
|---|---|
| | The URL and port for the control server on the Collector Service that forwards data for the application. This URL is used when sending remote commands to the Collector Service to configure the telemetry data for the application. The remote commands use the Collector Service REST API. See Collector Service REST API Reference for details on the Collector Service REST API. |
| `guid` | Application resource GUID |
| | Name of the host computer for the application |
| | Process ID for the application |
| | Fully qualified resource name (`/applications/<AppName>`) |
| Metric Name | Description | Type |
|---|---|---|
| `dds_application_presence` | Indicates the presence of the application and provides all label values for an application instance | Gauge |
| `dds_application_process_memory_usage_resident_memory_bytes` | The application resident memory utilization | Gauge |
| `dds_application_process_memory_usage_virtual_memory_bytes` | The application virtual memory utilization | Gauge |
| | The middleware collection syslog logging level. See Logs for valid values. | Gauge |
| | The middleware forwarding syslog logging level. See Logs for valid values. | Gauge |
7.3.3. Participant Metrics
The following tables describe the metrics and labels generated for Connext DomainParticipants. Only the `dds_domain_participant_presence` metric has all of the DomainParticipant labels listed in the table below. All other DomainParticipant metrics have the `guid` label only.
The DomainParticipant resource contains statistic variable metrics such as `dds_domain_participant_udpv4_usage_in_net_pkts_count`, `dds_domain_participant_udpv4_usage_in_net_pkts_mean`, `dds_domain_participant_udpv4_usage_in_net_pkts_min`, and `dds_domain_participant_udpv4_usage_in_net_pkts_max`.
These variables are interpreted as follows (see the example query after this list):

- The metrics with suffix `_count` represent the total number of packets or bytes over the last Prometheus scraping period.
- The metrics with suffix `_min` represent the minimum mean over the last Prometheus scraping period. For example, `dds_domain_participant_udpv4_usage_in_net_pkts_min` contains the minimum packets/sec over the last scraping period. The min mean is calculated by choosing the minimum of the individual mean values reported by Monitoring Library 2.0 every `participant_factory_qos.monitoring.distribution_settings.periodic_settings.polling_period`.
- The metrics with suffix `_max` represent the maximum mean over the last Prometheus scraping period. For example, `dds_domain_participant_udpv4_usage_in_net_pkts_max` contains the maximum packets/sec over the last scraping period. The max mean is calculated by choosing the maximum of the individual mean values reported by Monitoring Library 2.0 every `participant_factory_qos.monitoring.distribution_settings.periodic_settings.polling_period`.
- The metrics with suffix `_mean` represent the mean over the last Prometheus scraping period. For example, `dds_domain_participant_udpv4_usage_in_net_pkts_mean` contains the packets/sec over the last scraping period. If the scraping period is 30 seconds, the metric contains the packets/sec generated within the last 30 seconds. The `dds_domain_participant_udpv4_usage_in_net_pkts_mean` is calculated by averaging all individual mean metrics sent by Monitoring Library 2.0 to Observability Collector Service over the last scraping period.
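For example, a minimal PromQL sketch that plots the mean together with its min/max envelope for one DomainParticipant (the GUID value is a placeholder for the `guid` label of your DomainParticipant; run each line as a separate query, or combine them in one Grafana panel):

```promql
# Mean inbound UDPv4 packets/sec for one DomainParticipant, plus the min and max
# of the underlying polling-period means over the same scraping period.
dds_domain_participant_udpv4_usage_in_net_pkts_mean{guid="<participant-guid>"}
dds_domain_participant_udpv4_usage_in_net_pkts_min{guid="<participant-guid>"}
dds_domain_participant_udpv4_usage_in_net_pkts_max{guid="<participant-guid>"}
```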
| Label or Attribute Name | Description |
|---|---|
| `guid` | DomainParticipant resource GUID |
| | Resource GUID of the owner entity (application) |
| | DomainParticipant DDS GUID |
| | Name of the host computer for the DomainParticipant |
| | Process ID for the DomainParticipant |
| `domain_id` | DDS domain ID for the DomainParticipant |
| | Connext architecture as described in the RTI Architecture Abbreviation column in the Platform Notes |
| | Connext product version |
| | Fully qualified resource name (`/applications/<AppName>/domain_participants/<ParticipantName>`) |
| Metric Name | Description | Type |
|---|---|---|
| `dds_domain_participant_presence` | Indicates the presence of the DomainParticipant and provides all label values for a DomainParticipant instance | Gauge |
| `dds_domain_participant_udpv4_usage_in_net_pkts_count` | The UDPv4 transport in packets count over the last scraping period | Gauge |
| `dds_domain_participant_udpv4_usage_in_net_pkts_mean` | The UDPv4 transport in packets mean (packets/sec) over the last scraping period | Gauge |
| `dds_domain_participant_udpv4_usage_in_net_pkts_min` | The UDPv4 transport in packets min mean (packets/sec) over the last scraping period | Gauge |
| `dds_domain_participant_udpv4_usage_in_net_pkts_max` | The UDPv4 transport in packets max mean (packets/sec) over the last scraping period | Gauge |
| `dds_domain_participant_udpv4_usage_in_net_bytes_count` | The UDPv4 transport in bytes count over the last scraping period | Gauge |
| `dds_domain_participant_udpv4_usage_in_net_bytes_mean` | The UDPv4 transport in bytes mean (bytes/sec) over the last scraping period | Gauge |
| `dds_domain_participant_udpv4_usage_in_net_bytes_min` | The UDPv4 transport in bytes min mean (bytes/sec) over the last scraping period | Gauge |
| `dds_domain_participant_udpv4_usage_in_net_bytes_max` | The UDPv4 transport in bytes max mean (bytes/sec) over the last scraping period | Gauge |
| `dds_domain_participant_udpv4_usage_out_net_pkts_count` | The UDPv4 transport out packets count over the last scraping period | Gauge |
| `dds_domain_participant_udpv4_usage_out_net_pkts_mean` | The UDPv4 transport out packets mean (packets/sec) over the last scraping period | Gauge |
| `dds_domain_participant_udpv4_usage_out_net_pkts_min` | The UDPv4 transport out packets min mean (packets/sec) over the last scraping period | Gauge |
| `dds_domain_participant_udpv4_usage_out_net_pkts_max` | The UDPv4 transport out packets max mean (packets/sec) over the last scraping period | Gauge |
| `dds_domain_participant_udpv4_usage_out_net_bytes_count` | The UDPv4 transport out bytes count over the last scraping period | Gauge |
| `dds_domain_participant_udpv4_usage_out_net_bytes_mean` | The UDPv4 transport out bytes mean (bytes/sec) over the last scraping period | Gauge |
| `dds_domain_participant_udpv4_usage_out_net_bytes_min` | The UDPv4 transport out bytes min mean (bytes/sec) over the last scraping period | Gauge |
| `dds_domain_participant_udpv4_usage_out_net_bytes_max` | The UDPv4 transport out bytes max mean (bytes/sec) over the last scraping period | Gauge |
| `dds_domain_participant_udpv6_usage_in_net_pkts_count` | The UDPv6 transport in packets count over the last scraping period | Gauge |
| `dds_domain_participant_udpv6_usage_in_net_pkts_mean` | The UDPv6 transport in packets mean (packets/sec) over the last scraping period | Gauge |
| `dds_domain_participant_udpv6_usage_in_net_pkts_min` | The UDPv6 transport in packets min mean (packets/sec) over the last scraping period | Gauge |
| `dds_domain_participant_udpv6_usage_in_net_pkts_max` | The UDPv6 transport in packets max mean (packets/sec) over the last scraping period | Gauge |
| `dds_domain_participant_udpv6_usage_in_net_bytes_count` | The UDPv6 transport in bytes count over the last scraping period | Gauge |
| `dds_domain_participant_udpv6_usage_in_net_bytes_mean` | The UDPv6 transport in bytes mean (bytes/sec) over the last scraping period | Gauge |
| `dds_domain_participant_udpv6_usage_in_net_bytes_min` | The UDPv6 transport in bytes min mean (bytes/sec) over the last scraping period | Gauge |
| `dds_domain_participant_udpv6_usage_in_net_bytes_max` | The UDPv6 transport in bytes max mean (bytes/sec) over the last scraping period | Gauge |
| `dds_domain_participant_udpv6_usage_out_net_pkts_count` | The UDPv6 transport out packets count over the last scraping period | Gauge |
| `dds_domain_participant_udpv6_usage_out_net_pkts_mean` | The UDPv6 transport out packets mean (packets/sec) over the last scraping period | Gauge |
| `dds_domain_participant_udpv6_usage_out_net_pkts_min` | The UDPv6 transport out packets min mean (packets/sec) over the last scraping period | Gauge |
| `dds_domain_participant_udpv6_usage_out_net_pkts_max` | The UDPv6 transport out packets max mean (packets/sec) over the last scraping period | Gauge |
| `dds_domain_participant_udpv6_usage_out_net_bytes_count` | The UDPv6 transport out bytes count over the last scraping period | Gauge |
| `dds_domain_participant_udpv6_usage_out_net_bytes_mean` | The UDPv6 transport out bytes mean (bytes/sec) over the last scraping period | Gauge |
| `dds_domain_participant_udpv6_usage_out_net_bytes_min` | The UDPv6 transport out bytes min mean (bytes/sec) over the last scraping period | Gauge |
| `dds_domain_participant_udpv6_usage_out_net_bytes_max` | The UDPv6 transport out bytes max mean (bytes/sec) over the last scraping period | Gauge |
7.3.4. Topic Metrics
The following tables describe the metrics and labels generated for Connext Topics. Only the `dds_topic_presence` metric has all of the Topic labels listed in the table below. All other Topic metrics have the `guid` label only.
| Label or Attribute Name | Description |
|---|---|
| `guid` | Topic resource GUID |
| | Resource GUID of the owner entity (DomainParticipant) |
| | Topic DDS GUID |
| | Name of the host computer for the DomainParticipant this Topic is registered with |
| | DDS domain ID for the DomainParticipant this Topic is registered with |
| | The Topic name |
| | The registered type name for this Topic |
| | Fully qualified resource name (`/applications/<AppName>/domain_participants/<ParticipantName>/topics/<TopicName>`) |
| Metric Name | Description | Type |
|---|---|---|
| `dds_topic_presence` | Indicates the presence of the Topic and provides all label values for a Topic instance | Gauge |
| `dds_topic_inconsistent_total` | See total_count field in the INCONSISTENT_TOPIC Status | Counter |
7.3.5. DataWriter Metrics
The following tables describe the metrics and labels generated for Connext DataWriters. Only the `dds_data_writer_presence` metric has all of the DataWriter labels listed in the table below. All other DataWriter metrics have the `guid` label only.
| Label or Attribute Name | Description |
|---|---|
| `guid` | DataWriter resource GUID |
| | Resource GUID of the owner entity (Publisher) |
| | DataWriter DDS GUID |
| | Name of the host computer for the DomainParticipant this DataWriter is registered with |
| | DDS domain ID for the DomainParticipant this DataWriter is registered with |
| | The Topic name for this DataWriter |
| | The registered type name for this DataWriter |
| | Fully qualified resource name (`/applications/<AppName>/domain_participants/<ParticipantName>/publishers/<PublisherName>/data_writers/<DataWriterName>`) |
| | Resource GUID of the DomainParticipant this DataWriter is registered with |
| Metric Name | Description | Type |
|---|---|---|
| `dds_data_writer_presence` | Indicates the presence of the DataWriter and provides all label values for a DataWriter instance | Gauge |
| `dds_data_writer_liveliness_lost_total` | See total_count field in the LIVELINESS_LOST Status | Counter |
| `dds_data_writer_deadline_missed_total` | See total_count field in the OFFERED_DEADLINE_MISSED Status | Counter |
| `dds_data_writer_incompatible_qos_total` | See total_count field in the OFFERED_INCOMPATIBLE_QOS Status | Counter |
| `dds_data_writer_reliable_cache_full_total` | See full_reliable_writer_cache field in the RELIABLE_WRITER_CACHE_CHANGED Status | Counter |
| `dds_data_writer_reliable_cache_high_watermark_total` | See high_watermark_reliable_writer_cache field in the RELIABLE_WRITER_CACHE_CHANGED Status | Counter |
| `dds_data_writer_reliable_cache_unack_samples` | See unacknowledged_sample_count field in the RELIABLE_WRITER_CACHE_CHANGED Status | Gauge |
| `dds_data_writer_reliable_cache_unack_samples_peak` | See unacknowledged_sample_count_peak field in the RELIABLE_WRITER_CACHE_CHANGED Status | Gauge |
| `dds_data_writer_reliable_cache_replaced_unack_samples_total` | See replaced_unacknowledged_sample_count field in the RELIABLE_WRITER_CACHE_CHANGED Status | Counter |
| `dds_data_writer_reliable_reader_activity_inactive_count` | See inactive_count field in the RELIABLE_READER_ACTIVITY_CHANGED Status | Gauge |
| `dds_data_writer_cache_samples_peak` | See sample_count_peak field in the DATA_WRITER_CACHE_STATUS | Gauge |
| `dds_data_writer_cache_samples` | See sample_count field in the DATA_WRITER_CACHE_STATUS | Gauge |
| `dds_data_writer_cache_alive_instances` | See alive_instance_count field in the DATA_WRITER_CACHE_STATUS | Gauge |
| `dds_data_writer_cache_alive_instances_peak` | See alive_instance_count_peak field in the DATA_WRITER_CACHE_STATUS | Gauge |
| `dds_data_writer_protocol_pushed_samples_total` | See pushed_sample_count field in the DATA_WRITER_PROTOCOL_STATUS | Counter |
| `dds_data_writer_protocol_pushed_sample_bytes_total` | See pushed_sample_bytes field in the DATA_WRITER_PROTOCOL_STATUS | Counter |
| `dds_data_writer_protocol_sent_heartbeats_total` | See sent_heartbeat_count field in the DATA_WRITER_PROTOCOL_STATUS | Counter |
| `dds_data_writer_protocol_pulled_samples_total` | See pulled_sample_count field in the DATA_WRITER_PROTOCOL_STATUS | Counter |
| `dds_data_writer_protocol_pulled_sample_bytes_total` | See pulled_sample_bytes field in the DATA_WRITER_PROTOCOL_STATUS | Counter |
| `dds_data_writer_protocol_received_nacks_total` | See received_nack_count field in the DATA_WRITER_PROTOCOL_STATUS | Counter |
| `dds_data_writer_protocol_received_nack_bytes_total` | See received_nack_bytes field in the DATA_WRITER_PROTOCOL_STATUS | Counter |
| `dds_data_writer_protocol_send_window_size` | See send_window_size field in the DATA_WRITER_PROTOCOL_STATUS | Gauge |
| `dds_data_writer_protocol_pushed_fragments_total` | See pushed_fragment_count field in the DATA_WRITER_PROTOCOL_STATUS | Counter |
| `dds_data_writer_protocol_pushed_fragment_bytes_total` | See pushed_fragment_bytes field in the DATA_WRITER_PROTOCOL_STATUS | Counter |
| `dds_data_writer_protocol_pulled_fragments_total` | See pulled_fragment_count field in the DATA_WRITER_PROTOCOL_STATUS | Counter |
| `dds_data_writer_protocol_pulled_fragment_bytes_total` | See pulled_fragment_bytes field in the DATA_WRITER_PROTOCOL_STATUS | Counter |
| `dds_data_writer_protocol_received_nack_fragments_total` | See received_nack_fragment_count field in the DATA_WRITER_PROTOCOL_STATUS | Counter |
| `dds_data_writer_protocol_received_nack_fragment_bytes_total` | See received_nack_fragment_bytes field in the DATA_WRITER_PROTOCOL_STATUS | Counter |
7.3.6. DataReader Metrics
The following tables describe the metrics and labels generated for Connext DataReaders. Only the `dds_data_reader_presence` metric has all of the DataReader labels listed in the table below. All other DataReader metrics have the `guid` label only.
| Label or Attribute Name | Description |
|---|---|
| `guid` | DataReader resource GUID |
| | Resource GUID of the owner entity (Subscriber) |
| | DataReader DDS GUID |
| | Name of the host computer for the DomainParticipant this DataReader is registered with |
| | DDS domain ID for the DomainParticipant this DataReader is registered with |
| | The Topic name for this DataReader |
| | The registered type name for this DataReader |
| | Fully qualified resource name (`/applications/<AppName>/domain_participants/<ParticipantName>/subscribers/<SubscriberName>/data_readers/<DataReaderName>`) |
| | Resource GUID of the DomainParticipant this DataReader is registered with |
| Metric Name | Description | Type |
|---|---|---|
| `dds_data_reader_presence` | Indicates the presence of the DataReader and provides all label values for a DataReader instance | Gauge |
| `dds_data_reader_sample_rejected_total` | See total_count field in the SAMPLE_REJECTED Status | Counter |
| `dds_data_reader_liveliness_not_alive_count` | See not_alive_count field in the LIVELINESS_CHANGED Status | Gauge |
| `dds_data_reader_deadline_missed_total` | See total_count field in the REQUESTED_DEADLINE_MISSED Status | Counter |
| `dds_data_reader_incompatible_qos_total` | See total_count field in the REQUESTED_INCOMPATIBLE_QOS Status | Counter |
| `dds_data_reader_sample_lost_total` | See total_count field in the SAMPLE_LOST Status | Counter |
| `dds_data_reader_cache_samples_peak` | See sample_count_peak field in the DATA_READER_CACHE_STATUS | Gauge |
| `dds_data_reader_cache_samples` | See sample_count field in the DATA_READER_CACHE_STATUS | Gauge |
| `dds_data_reader_cache_old_source_ts_dropped_samples_total` | See old_source_timestamp_dropped_sample_count field in the DATA_READER_CACHE_STATUS | Counter |
| `dds_data_reader_cache_tolerance_source_ts_dropped_samples_total` | See tolerance_source_timestamp_dropped_sample_count field in the DATA_READER_CACHE_STATUS | Counter |
| `dds_data_reader_cache_content_filter_dropped_samples_total` | See content_filter_dropped_sample_count field in the DATA_READER_CACHE_STATUS | Counter |
| `dds_data_reader_cache_replaced_dropped_samples_total` | See replaced_dropped_sample_count field in the DATA_READER_CACHE_STATUS | Counter |
| `dds_data_reader_cache_samples_dropped_by_instance_replaced_total` | See total_samples_dropped_by_instance_replacement field in the DATA_READER_CACHE_STATUS | Counter |
| `dds_data_reader_cache_alive_instances` | See alive_instance_count field in the DATA_READER_CACHE_STATUS | Gauge |
| `dds_data_reader_cache_alive_instances_peak` | See alive_instance_count_peak field in the DATA_READER_CACHE_STATUS | Gauge |
| `dds_data_reader_cache_no_writers_instances` | See no_writers_instance_count field in the DATA_READER_CACHE_STATUS | Gauge |
| `dds_data_reader_cache_no_writers_instances_peak` | See no_writers_instance_count_peak field in the DATA_READER_CACHE_STATUS | Gauge |
| `dds_data_reader_cache_disposed_instances` | See disposed_instance_count field in the DATA_READER_CACHE_STATUS | Gauge |
| `dds_data_reader_cache_disposed_instances_peak` | See disposed_instance_count_peak field in the DATA_READER_CACHE_STATUS | Gauge |
| `dds_data_reader_cache_compressed_samples_total` | See compressed_sample_count field in the DATA_READER_CACHE_STATUS | Counter |
| `dds_data_reader_protocol_received_samples_total` | See received_sample_count field in the DATA_READER_PROTOCOL_STATUS | Counter |
| `dds_data_reader_protocol_received_sample_bytes_total` | See received_sample_bytes field in the DATA_READER_PROTOCOL_STATUS | Counter |
| `dds_data_reader_protocol_duplicate_samples_total` | See duplicate_sample_count field in the DATA_READER_PROTOCOL_STATUS | Counter |
| `dds_data_reader_protocol_duplicate_sample_bytes_total` | See duplicate_sample_bytes field in the DATA_READER_PROTOCOL_STATUS | Counter |
| `dds_data_reader_protocol_received_heartbeats_total` | See received_heartbeat_count field in the DATA_READER_PROTOCOL_STATUS | Counter |
| `dds_data_reader_protocol_sent_nacks_total` | See sent_nack_count field in the DATA_READER_PROTOCOL_STATUS | Counter |
| `dds_data_reader_protocol_sent_nack_bytes_total` | See sent_nack_bytes field in the DATA_READER_PROTOCOL_STATUS | Counter |
| `dds_data_reader_protocol_rejected_samples_total` | See rejected_sample_count field in the DATA_READER_PROTOCOL_STATUS | Counter |
| `dds_data_reader_protocol_out_of_range_rejected_samples_total` | See out_of_range_rejected_sample_count field in the DATA_READER_PROTOCOL_STATUS | Counter |
| `dds_data_reader_protocol_received_fragments_total` | See received_fragment_count field in the DATA_READER_PROTOCOL_STATUS | Counter |
| `dds_data_reader_protocol_dropped_fragments_total` | See dropped_fragment_count field in the DATA_READER_PROTOCOL_STATUS | Counter |
| `dds_data_reader_protocol_reassembled_samples_total` | See reassembled_sample_count field in the DATA_READER_PROTOCOL_STATUS | Counter |
| `dds_data_reader_protocol_sent_nack_fragments_total` | See sent_nack_fragment_count field in the DATA_READER_PROTOCOL_STATUS | Counter |
| `dds_data_reader_protocol_sent_nack_fragment_bytes_total` | See sent_nack_fragment_bytes field in the DATA_READER_PROTOCOL_STATUS | Counter |
7.3.7. Derived Metrics Generated by Prometheus Recording Rules
Prometheus provides a capability called Recording Rules. The following text is an excerpt from the Prometheus documentation.
Recording rules allow you to precompute frequently needed or computationally
expensive expressions and save their result as a new set of time series.
Querying the precomputed result will then often be much faster than executing
the original expression every time it is needed. This is especially useful for
dashboards, which need to query the same expression repeatedly every time they
refresh.
A Prometheus recording rule generates a new metric time series with new values calculated at the frequency at which the rule is run. The recording rules in Observability Framework are run every 10 seconds, meaning there is an evaluation and update to the associated derived metric every 10 seconds. Observability Framework uses Prometheus recording rules to generate three types of derived metrics.
- DDS entity proxy metrics
- raw error metrics
- aggregated error metrics
Each of these derived metric types is discussed in detail below.
The Grafana dashboards provided with Observability Framework make use of the error metrics generated by Prometheus recording rules. The aggregated error metrics are used on the Alert Home dashboard, while the raw error metrics are used on other dashboards.
7.3.7.1. DDS Entity Proxy Metrics
The DDS entity proxy metrics are used in the recording rules for the raw error metrics and are always 0. The proxy metrics make sure the rules evaluate to known good values in cases where the underlying metrics are not available.
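The sketch below, copied from one of the enabled recording rules later in this section, shows where a proxy metric fits: the PromQL `or` operator keeps the left-hand result where it exists and falls back to the always-zero proxy series where no matching underlying time series is available, so the derived error metric is always defined.

```promql
# If dds_topic_inconsistent_total has no samples yet, the expression still
# yields the always-zero dds_topic_empty_metric proxy instead of no data.
rate(dds_topic_inconsistent_total[1m]) >bool 0 or dds_topic_empty_metric
```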
| Metric Name | Description |
|---|---|
| `dds_application_empty_metric` | A proxy for application metrics that always provides a value of zero. |
| `dds_domain_participant_empty_metric` | A proxy for DomainParticipant metrics that always provides a value of zero. |
| `dds_topic_empty_metric` | A proxy for Topic metrics that always provides a value of zero. |
| `dds_data_writer_empty_metric` | A proxy for DataWriter metrics that always provides a value of zero. |
| `dds_data_reader_empty_metric` | A proxy for DataReader metrics that always provides a value of zero. |
7.3.7.2. Raw Error Metrics
Raw error metrics are derived for select metrics by doing a boolean comparison against a predefined limit. The raw error metrics are created by converting the monotonically increasing value of a counter metric into a rate, comparing that rate to a limit, and returning a boolean value. The returned boolean value is 1 if the limit is exceeded, otherwise 0. In the Grafana dashboards, a value of 0 indicates a healthy condition for the error metric, while a value of 1 indicates a fail condition.
Recording rules have been created to generate a derived raw error metric for all of the metrics listed in Table 7.18 and Table 7.19.
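Because every raw error metric is either 0 (healthy) or 1 (fail), you can query the failing entities directly. A minimal sketch, using one of the derived error metric names that appears in the aggregation rules in Table 7.20 (the `guid` label on each result identifies the affected DataWriter):

```promql
# DataWriters whose liveliness-lost raw error is currently asserted.
dds_data_writer_liveliness_lost_errors == 1
```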
7.3.7.2.1. Enabled Raw Error Metrics
A set of recording rules has been created that is useful for detecting failures in all systems. These rules detect conditions that are not expected to occur in a system that is operating correctly. The rules for these “enabled” metrics test whether the underlying metric has exceeded a limit of 0. Note the `>bool 0` comparison operator in each of the recording rules. A value greater than 0 in any of these metrics will result in an alert indication in the dashboards. This set of metrics is “enabled” because any increase in the underlying metric indicates an unexpected condition in DDS. Table 7.18 lists the derived raw error metrics that are “enabled”.
| Metric Name | Recording Rule |
|---|---|
| `dds_data_reader_cache_content_filter_dropped_samples_errors` | `rate(dds_data_reader_cache_content_filter_dropped_samples_total[1m]) >bool 0 or dds_data_reader_empty_metric` |
| `dds_data_reader_cache_replaced_dropped_samples_errors` | `rate(dds_data_reader_cache_replaced_dropped_samples_total[1m]) >bool 0 or dds_data_reader_empty_metric` |
| `dds_data_reader_cache_samples_dropped_by_instance_replaced_errors` | `rate(dds_data_reader_cache_samples_dropped_by_instance_replaced_total[1m]) >bool 0 or dds_data_reader_empty_metric` |
| `dds_data_reader_protocol_rejected_samples_errors` | `rate(dds_data_reader_protocol_rejected_samples_total[1m]) >bool 0 or dds_data_reader_empty_metric` |
| `dds_data_reader_protocol_out_of_range_rejected_samples_errors` | `rate(dds_data_reader_protocol_out_of_range_rejected_samples_total[1m]) >bool 0 or dds_data_reader_empty_metric` |
| `dds_data_reader_protocol_dropped_fragments_errors` | `rate(dds_data_reader_protocol_dropped_fragments_total[1m]) >bool 0 or dds_data_reader_empty_metric` |
| `dds_topic_inconsistent_errors` | `rate(dds_topic_inconsistent_total[1m]) >bool 0 or dds_topic_empty_metric` |
| `dds_data_writer_incompatible_qos_errors` | `rate(dds_data_writer_incompatible_qos_total[1m]) >bool 0 or dds_data_writer_empty_metric` |
| `dds_data_reader_incompatible_qos_errors` | `rate(dds_data_reader_incompatible_qos_total[1m]) >bool 0 or dds_data_reader_empty_metric` |
| `dds_data_writer_liveliness_lost_errors` | `rate(dds_data_writer_liveliness_lost_total[1m]) >bool 0 or dds_data_writer_empty_metric` |
| `dds_data_writer_reliable_reader_activity_inactive_count_errors` | `rate(dds_data_writer_reliable_reader_activity_inactive_count[1m]) >bool 0 or dds_data_writer_empty_metric` |
| `dds_data_reader_liveliness_not_alive_count_errors` | `rate(dds_data_reader_liveliness_not_alive_count[1m]) >bool 0 or dds_data_reader_empty_metric` |
| `dds_data_reader_cache_tolerance_source_ts_dropped_samples_errors` | `rate(dds_data_reader_cache_tolerance_source_ts_dropped_samples_total[1m]) >bool 0 or dds_data_reader_empty_metric` |
| `dds_data_writer_deadline_missed_errors` | `rate(dds_data_writer_deadline_missed_total[1m]) >bool 0 or dds_data_writer_empty_metric` |
| `dds_data_reader_deadline_missed_errors` | `rate(dds_data_reader_deadline_missed_total[1m]) >bool 0 or dds_data_reader_empty_metric` |
| `dds_data_writer_reliable_cache_replaced_unack_samples_errors` | `rate(dds_data_writer_reliable_cache_replaced_unack_samples_total[1m]) >bool 0 or dds_data_writer_empty_metric` |
| `dds_data_reader_sample_lost_errors` | `rate(dds_data_reader_sample_lost_total[1m]) >bool 0 or dds_data_reader_empty_metric` |
7.3.7.2.2. Disabled Raw Error Metrics
Additional recording rules have been created that, by default, are not useful for detecting failures, because meaningful rules depend on comparisons to values that are dependent on actual system requirements. The rules for the “disabled” metrics test whether the underlying metric is less than a limit of 0, ensuring that the derived raw error metric never indicates a failure; hence, they are disabled. Note the `<bool 0` comparison operator in each of the recording rules. This set of metrics is “disabled” because a meaningful limit that would indicate a fail condition cannot be determined without additional knowledge of the system.
Users may modify a “disabled” rule to compare against a value that is meaningful to their system. For example, to be notified when the number of repaired samples over the last minute exceeds 10, change the rule

`rate(dds_data_writer_protocol_pulled_samples_total[1m]) <bool 0 or dds_data_writer_empty_metric`

to

`rate(dds_data_writer_protocol_pulled_samples_total[1m]) >bool 10 or dds_data_writer_empty_metric`

For complete instructions on how to enable these metrics and display them in the dashboards, see Enable a Raw Error Metric.
The “disabled” rules have been created as a convenience for the user. However, only a few of these rules may be useful for any specific system. Table 7.19 lists the derived raw error metrics that are “disabled”.
| Metric Name | Recording Rule |
|---|---|
| `dds_data_writer_protocol_sent_heartbeats_errors` | `rate(dds_data_writer_protocol_sent_heartbeats_total[1m]) <bool 0 or dds_data_writer_empty_metric` |
| `dds_data_writer_protocol_received_nacks_errors` | `rate(dds_data_writer_protocol_received_nacks_total[1m]) <bool 0 or dds_data_writer_empty_metric` |
| `dds_data_writer_protocol_received_nack_bytes_errors` | `rate(dds_data_writer_protocol_received_nack_bytes_total[1m]) <bool 0 or dds_data_writer_empty_metric` |
| `dds_data_writer_protocol_received_nack_fragments_errors` | `rate(dds_data_writer_protocol_received_nack_fragments_total[1m]) <bool 0 or dds_data_writer_empty_metric` |
| `dds_data_writer_protocol_received_nack_fragment_bytes_errors` | `rate(dds_data_writer_protocol_received_nack_fragment_bytes_total[1m]) <bool 0 or dds_data_writer_empty_metric` |
| `dds_data_reader_protocol_received_heartbeats_errors` | `rate(dds_data_reader_protocol_received_heartbeats_total[1m]) <bool 0 or dds_data_reader_empty_metric` |
| `dds_data_reader_protocol_sent_nacks_errors` | `rate(dds_data_reader_protocol_sent_nacks_total[1m]) <bool 0 or dds_data_reader_empty_metric` |
| `dds_data_reader_protocol_sent_nack_bytes_errors` | `rate(dds_data_reader_protocol_sent_nack_bytes_total[1m]) <bool 0 or dds_data_reader_empty_metric` |
| `dds_data_reader_protocol_sent_nack_fragments_errors` | `rate(dds_data_reader_protocol_sent_nack_fragments_total[1m]) <bool 0 or dds_data_reader_empty_metric` |
| `dds_data_reader_protocol_sent_nack_fragment_bytes_errors` | `rate(dds_data_reader_protocol_sent_nack_fragment_bytes_total[1m]) <bool 0 or dds_data_reader_empty_metric` |
| `dds_data_writer_protocol_pulled_samples_errors` | `rate(dds_data_writer_protocol_pulled_samples_total[1m]) <bool 0 or dds_data_writer_empty_metric` |
| `dds_data_writer_protocol_pulled_sample_bytes_errors` | `rate(dds_data_writer_protocol_pulled_sample_bytes_total[1m]) <bool 0 or dds_data_writer_empty_metric` |
| `dds_data_writer_protocol_pulled_fragments_errors` | `rate(dds_data_writer_protocol_pulled_fragments_total[1m]) <bool 0 or dds_data_writer_empty_metric` |
| `dds_data_writer_protocol_pulled_fragment_bytes_errors` | `rate(dds_data_writer_protocol_pulled_fragment_bytes_total[1m]) <bool 0 or dds_data_writer_empty_metric` |
| `dds_data_writer_protocol_pushed_samples_errors` | `rate(dds_data_writer_protocol_pushed_samples_total[1m]) <bool 0 or dds_data_writer_empty_metric` |
| `dds_data_writer_protocol_pushed_sample_bytes_errors` | `rate(dds_data_writer_protocol_pushed_sample_bytes_total[1m]) <bool 0 or dds_data_writer_empty_metric` |
| `dds_data_writer_protocol_pushed_fragments_errors` | `rate(dds_data_writer_protocol_pushed_fragments_total[1m]) <bool 0 or dds_data_writer_empty_metric` |
| `dds_data_writer_protocol_pushed_fragment_bytes_errors` | `rate(dds_data_writer_protocol_pushed_fragment_bytes_total[1m]) <bool 0 or dds_data_writer_empty_metric` |
| `dds_data_reader_cache_compressed_samples_errors` | `rate(dds_data_reader_cache_compressed_samples_total[1m]) <bool 0 or dds_data_reader_empty_metric` |
| `dds_data_reader_protocol_duplicate_samples_errors` | `rate(dds_data_reader_protocol_duplicate_samples_total[1m]) <bool 0 or dds_data_reader_empty_metric` |
| `dds_data_reader_protocol_duplicate_sample_bytes_errors` | `rate(dds_data_reader_protocol_duplicate_sample_bytes_total[1m]) <bool 0 or dds_data_reader_empty_metric` |
| `dds_data_reader_protocol_received_samples_errors` | `rate(dds_data_reader_protocol_received_samples_total[1m]) <bool 0 or dds_data_reader_empty_metric` |
| `dds_data_reader_protocol_received_sample_bytes_errors` | `rate(dds_data_reader_protocol_received_sample_bytes_total[1m]) <bool 0 or dds_data_reader_empty_metric` |
| `dds_data_reader_protocol_received_fragments_errors` | `rate(dds_data_reader_protocol_received_fragments_total[1m]) <bool 0 or dds_data_reader_empty_metric` |
| `dds_data_reader_protocol_reassembled_samples_errors` | `rate(dds_data_reader_protocol_reassembled_samples_total[1m]) <bool 0 or dds_data_reader_empty_metric` |
| `dds_application_process_memory_usage_resident_memory_bytes_errors` | `rate(dds_application_process_memory_usage_resident_memory_bytes[1m]) <bool 0 or dds_application_empty_metric` |
| `dds_application_process_memory_usage_virtual_memory_bytes_errors` | `rate(dds_application_process_memory_usage_virtual_memory_bytes[1m]) <bool 0 or dds_application_empty_metric` |
| `dds_domain_participant_udpv4_usage_in_net_pkts_errors` | `rate(dds_domain_participant_udpv4_usage_in_net_pkts_mean[1m]) <bool 0 or dds_domain_participant_empty_metric` |
| `dds_domain_participant_udpv4_usage_in_net_bytes_errors` | `rate(dds_domain_participant_udpv4_usage_in_net_bytes_mean[1m]) <bool 0 or dds_domain_participant_empty_metric` |
| `dds_domain_participant_udpv4_usage_out_net_pkts_errors` | `rate(dds_domain_participant_udpv4_usage_out_net_pkts_mean[1m]) <bool 0 or dds_domain_participant_empty_metric` |
| `dds_domain_participant_udpv4_usage_out_net_bytes_errors` | `rate(dds_domain_participant_udpv4_usage_out_net_bytes_mean[1m]) <bool 0 or dds_domain_participant_empty_metric` |
| `dds_domain_participant_udpv6_usage_in_net_pkts_errors` | `rate(dds_domain_participant_udpv6_usage_in_net_pkts_mean[1m]) <bool 0 or dds_domain_participant_empty_metric` |
| `dds_domain_participant_udpv6_usage_in_net_bytes_errors` | `rate(dds_domain_participant_udpv6_usage_in_net_bytes_mean[1m]) <bool 0 or dds_domain_participant_empty_metric` |
| `dds_domain_participant_udpv6_usage_out_net_pkts_errors` | `rate(dds_domain_participant_udpv6_usage_out_net_pkts_mean[1m]) <bool 0 or dds_domain_participant_empty_metric` |
| `dds_domain_participant_udpv6_usage_out_net_bytes_errors` | `rate(dds_domain_participant_udpv6_usage_out_net_bytes_mean[1m]) <bool 0 or dds_domain_participant_empty_metric` |
| `dds_data_writer_reliable_cache_full_errors` | `rate(dds_data_writer_reliable_cache_full_total[1m]) <bool 0 or dds_data_writer_empty_metric` |
| `dds_data_writer_reliable_cache_high_watermark_errors` | `rate(dds_data_writer_reliable_cache_high_watermark_total[1m]) <bool 0 or dds_data_writer_empty_metric` |
| `dds_data_writer_reliable_cache_unack_samples_errors` | `rate(dds_data_writer_reliable_cache_unack_samples[1m]) <bool 0 or dds_data_writer_empty_metric` |
| `dds_data_writer_reliable_cache_unack_samples_peak_errors` | `rate(dds_data_writer_reliable_cache_unack_samples_peak[1m]) <bool 0 or dds_data_writer_empty_metric` |
| `dds_data_writer_protocol_send_window_size_errors` | `rate(dds_data_writer_protocol_send_window_size[1m]) <bool 0 or dds_data_writer_empty_metric` |
| `dds_data_writer_cache_samples_errors` | `rate(dds_data_writer_cache_samples[1m]) <bool 0 or dds_data_writer_empty_metric` |
| `dds_data_writer_cache_samples_peak_errors` | `rate(dds_data_writer_cache_samples_peak[1m]) <bool 0 or dds_data_writer_empty_metric` |
| `dds_data_writer_cache_alive_instances_errors` | `rate(dds_data_writer_cache_alive_instances[1m]) <bool 0 or dds_data_writer_empty_metric` |
| `dds_data_writer_cache_alive_instances_peak_errors` | `rate(dds_data_writer_cache_alive_instances_peak[1m]) <bool 0 or dds_data_writer_empty_metric` |
| `dds_data_reader_sample_rejected_errors` | `rate(dds_data_reader_sample_rejected_total[1m]) <bool 0 or dds_data_reader_empty_metric` |
| `dds_data_reader_cache_samples_errors` | `rate(dds_data_reader_cache_samples[1m]) <bool 0 or dds_data_reader_empty_metric` |
| `dds_data_reader_cache_samples_peak_errors` | `rate(dds_data_reader_cache_samples_peak[1m]) <bool 0 or dds_data_reader_empty_metric` |
| `dds_data_reader_cache_alive_instances_errors` | `rate(dds_data_reader_cache_alive_instances[1m]) <bool 0 or dds_data_reader_empty_metric` |
| `dds_data_reader_cache_alive_instances_peak_errors` | `rate(dds_data_reader_cache_alive_instances_peak[1m]) <bool 0 or dds_data_reader_empty_metric` |
| `dds_data_reader_cache_no_writers_instances_errors` | `rate(dds_data_reader_cache_no_writers_instances[1m]) <bool 0 or dds_data_reader_empty_metric` |
| `dds_data_reader_cache_no_writers_instances_peak_errors` | `rate(dds_data_reader_cache_no_writers_instances_peak[1m]) <bool 0 or dds_data_reader_empty_metric` |
| `dds_data_reader_cache_disposed_instances_errors` | `rate(dds_data_reader_cache_disposed_instances[1m]) <bool 0 or dds_data_reader_empty_metric` |
| `dds_data_reader_cache_disposed_instances_peak_errors` | `rate(dds_data_reader_cache_disposed_instances_peak[1m]) <bool 0 or dds_data_reader_empty_metric` |
| `dds_data_reader_cache_old_source_ts_dropped_samples_errors` | `rate(dds_data_reader_cache_old_source_ts_dropped_samples_total[1m]) <bool 0 or dds_data_reader_empty_metric` |
7.3.7.3. Aggregated Error Metrics
The aggregated error metrics create a status roll-up for a group of metrics in a particular category. These aggregated error metrics are used in the Alert Home dashboard to provide a high-level view of alerts grouped by category. The categories are Bandwidth, Saturation, Data Loss, System Errors, and Delays. The aggregated error metrics are created by adding together all of the raw error metrics assigned to a category and clamping the values at 1, the value that indicates a failed condition.
Table 7.20 shows all of the aggregated error metrics and the rule used to generate them. Note the use of the raw error metrics in the rules.
| Metric Name | Recording Rule |
|---|---|
| | clamp_max ((sum (dds_custom_excessive_bandwidth_errors) + sum (dds_data_writer_protocol_sent_heartbeats_errors) + sum (dds_data_writer_protocol_received_nacks_errors) + sum (dds_data_writer_protocol_received_nack_bytes_errors) + sum (dds_data_writer_protocol_received_nack_fragments_errors) + sum (dds_data_writer_protocol_received_nack_fragment_bytes_errors) + sum (dds_data_reader_protocol_received_heartbeats_errors) + sum (dds_data_reader_protocol_sent_nacks_errors) + sum (dds_data_reader_protocol_sent_nack_bytes_errors) + sum (dds_data_reader_protocol_sent_nack_fragments_errors) + sum (dds_data_reader_protocol_sent_nack_fragment_bytes_errors) + sum (dds_data_writer_protocol_pulled_samples_errors) + sum (dds_data_writer_protocol_pulled_sample_bytes_errors) + sum (dds_data_writer_protocol_pulled_fragments_errors) + sum (dds_data_writer_protocol_pulled_fragment_bytes_errors) + sum (dds_data_writer_protocol_pushed_samples_errors) + sum (dds_data_writer_protocol_pushed_sample_bytes_errors) + sum (dds_data_writer_protocol_pushed_fragments_errors) + sum (dds_data_writer_protocol_pushed_fragment_bytes_errors) + sum (dds_data_reader_cache_content_filter_dropped_samples_errors) + sum (dds_data_reader_cache_compressed_samples_errors) + sum (dds_data_reader_protocol_duplicate_samples_errors) + sum (dds_data_reader_protocol_duplicate_sample_bytes_errors) + sum (dds_data_reader_protocol_received_samples_errors) + sum (dds_data_reader_protocol_received_sample_bytes_errors) + sum (dds_data_reader_protocol_received_fragments_errors) + sum (dds_data_reader_protocol_reassembled_samples_errors)), 1) |
| | clamp_max ((sum (dds_custom_saturation_errors) + sum (dds_application_process_memory_usage_resident_memory_bytes_errors) + sum (dds_application_process_memory_usage_virtual_memory_bytes_errors) + sum (dds_domain_participant_udpv4_usage_in_net_pkts_errors) + sum (dds_domain_participant_udpv4_usage_in_net_bytes_errors) + sum (dds_domain_participant_udpv4_usage_out_net_pkts_errors) + sum (dds_domain_participant_udpv4_usage_out_net_bytes_errors) + sum (dds_domain_participant_udpv6_usage_in_net_pkts_errors) + sum (dds_domain_participant_udpv6_usage_in_net_bytes_errors) + sum (dds_domain_participant_udpv6_usage_out_net_pkts_errors) + sum (dds_domain_participant_udpv6_usage_out_net_bytes_errors) + sum (dds_data_writer_reliable_cache_full_errors) + sum (dds_data_writer_reliable_cache_high_watermark_errors) + sum (dds_data_writer_reliable_cache_unack_samples_errors) + sum (dds_data_writer_reliable_cache_unack_samples_peak_errors) + sum (dds_data_writer_protocol_send_window_size_errors) + sum (dds_data_writer_cache_samples_errors) + sum (dds_data_writer_cache_samples_peak_errors) + sum (dds_data_writer_cache_alive_instances_errors) + sum (dds_data_writer_cache_alive_instances_peak_errors) + sum (dds_data_reader_sample_rejected_errors) + sum (dds_data_reader_cache_samples_errors) + sum (dds_data_reader_cache_samples_peak_errors) + sum (dds_data_reader_cache_replaced_dropped_samples_errors) + sum (dds_data_reader_cache_samples_dropped_by_instance_replaced_errors) + sum (dds_data_reader_cache_alive_instances_errors) + sum (dds_data_reader_cache_alive_instances_peak_errors) + sum (dds_data_reader_cache_no_writers_instances_errors) + sum (dds_data_reader_cache_no_writers_instances_peak_errors) + sum (dds_data_reader_cache_disposed_instances_errors) + sum (dds_data_reader_cache_disposed_instances_peak_errors) + sum (dds_data_reader_protocol_rejected_samples_errors) + sum (dds_data_reader_protocol_out_of_range_rejected_samples_errors) + sum (dds_data_reader_protocol_dropped_fragments_errors)), 1) |
| | clamp_max ((sum (dds_custom_errors) + sum (dds_topic_inconsistent_errors) + sum (dds_data_writer_incompatible_qos_errors) + sum (dds_data_reader_incompatible_qos_errors) + sum (dds_data_writer_liveliness_lost_errors) + sum (dds_data_writer_reliable_reader_activity_inactive_count_errors) + sum (dds_data_reader_liveliness_not_alive_count_errors) + sum (dds_data_reader_cache_old_source_ts_dropped_samples_errors) + sum (dds_data_reader_cache_tolerance_source_ts_dropped_samples_errors)), 1) |
| | clamp_max ((sum (dds_custom_delays_errors) + sum (dds_data_writer_deadline_missed_errors) + sum (dds_data_reader_deadline_missed_errors)), 1) |
| | clamp_max ((sum (dds_custom_data_loss_errors) + sum (dds_data_writer_reliable_cache_replaced_unack_samples_errors) + sum (dds_data_reader_sample_lost_errors) + sum (dds_data_reader_cache_replaced_dropped_samples_errors) + sum (dds_data_reader_cache_samples_dropped_by_instance_replaced_errors) + sum (dds_data_reader_cache_tolerance_source_ts_dropped_samples_errors)), 1) |
7.3.7.4. Enable a Raw Error Metric
Note
The Grafana user must have Admin privileges to make any changes to the Grafana dashboards.
Use the following steps to enable any of the “disabled” metrics in your system:
1. Update the raw error rule to enable the calculation and provide a limit. See Update the Recording Rule for the Derived Metric below.
2. Update the Alert “Category” dashboard to update the background color of the OK/ERROR and State panels for the enabled metric. See Update the Alert “Category” Dashboard below.
3. Update the “Entity” status dashboard to update the query and background color in the State panel. See Update the “Entity” Status Dashboard below.
The example that follows uses the `dds_data_reader_cache_alive_instances_errors` metric to update and enable a rule that detects any DataReader that has more than 3 ALIVE instances in its cache.
7.3.7.4.1. Update the Recording Rule for the Derived Metric
Locate the recording rule for the `dds_data_reader_cache_alive_instances_errors` metric in the `monitoring_recording_rules.yml` file located in the `rti_workspace/<version>/observability/prometheus` directory.
# User Config Required
- record: dds_data_reader_cache_alive_instances_errors
  expr: >
    rate(dds_data_reader_cache_alive_instances[1m]) <bool 0 or dds_data_reader_empty_metric
The `dds_data_reader_cache_alive_instances` metric is a gauge metric, meaning we want to use the absolute value for our limit check rather than the rate. In the following example recording rule, we update the limit test so that the error will be active whenever the value is greater than 3.
# User Config Required
- record: dds_data_reader_cache_alive_instances_errors
  expr: >
    dds_data_reader_cache_alive_instances >bool 3 or dds_data_reader_empty_metric
Important
After updating the `monitoring_recording_rules.yml` file, you must restart all Docker containers for Observability Framework by running `rtiobservability -t` followed by `rtiobservability -s`. The Prometheus server will read the updated file after the containers restart.
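Once the containers are back up, a quick way to confirm the modified rule is active is to query the derived metric in Prometheus. A minimal sketch (the metric name comes from the rule above; the result lists any DataReader currently exceeding the new limit):

```promql
# DataReaders with more than 3 ALIVE instances in their cache, per the modified rule.
dds_data_reader_cache_alive_instances_errors == 1
```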
7.3.7.4.2. Update the Alert “Category” Dashboard
Note
The Grafana images in this section were generated with Grafana version 9.2.1. If you are using a different version of Grafana, the interface may be slightly different.
Locate the Alert “Category” dashboard for the metric rule you are enabling. The metric in our example, `dds_data_reader_cache_alive_instances_errors`, is in the Saturation group (see Table 7.20), so the Alert Saturation dashboard is used in the following steps.
1. Go to Dashboards > Browse to open the list of dashboards.
2. Select the Alert Saturation dashboard from the list.
3. Once on the Alert Saturation dashboard, scroll down to the Alive Instances row under the Reader Cache section.
4. Select Alive Instances > Edit from the status indicator panel menu.
5. In the right panel, scroll down until you find the Value mappings section.
6. Click the gray color circle next to the OK mapping to select a new color for the panel “OK” indication.
7. Select the large green circle in the panel. The updated OK value should change from gray to green.
8. Select Apply at the top right to apply the change and return to the Alert Saturation dashboard.
9. Select Alive Instances > Edit from the status indicator panel menu.
10. In the right panel, scroll down to the Thresholds section.
11. Click the gray circle next to Base to select a new base color for the Thresholds panel.
12. Select the large green circle in the panel. The updated Threshold base value should change to green.
13. Select Apply at the top right to apply the changes and return to the Alert Saturation dashboard.
14. Select the Save Dashboard icon at the top right.
15. When prompted to confirm, select Save.

The Alive Instances row under the Reader Cache section should now be green, indicating it is enabled.
7.3.7.4.3. Update the “Entity” Status Dashboard
Locate the “Entity” status dashboard for the metric rule you are enabling. For the metric in our example, `dds_data_reader_cache_alive_instances_errors`, we need to update the Alert DataReader Status dashboard.
1. Go to Dashboards > Browse to open the list of dashboards.
2. Select the Alert DataReader Status dashboard from the list.
3. Once on the Alert DataReader Status dashboard, scroll down to the Alive Instances row under the Saturation/Reader Cache section.
4. Select Alive Instances > Edit from the status indicator panel menu. The query for the panel is shown below.
5. Edit the query to match the rule that was created for the `dds_data_reader_cache_alive_instances_errors` metric. In the Metrics browser field, remove the irate calculation and set the limit check to `>bool 3`, as shown below.
6. In the right panel, scroll down to the Thresholds section.
7. Click the gray circle next to Base to select a new base color for the Thresholds panel.
8. Select the large green circle in the panel. The updated Threshold base value should change from gray to green.
9. Select Apply at the top right to apply the change and return to the Alert DataReader Status dashboard.
10. Select the Save Dashboard icon at the top right.
11. When prompted to confirm, select Save.
You have now enabled a rule for `dds_data_reader_cache_alive_instances` that detects any DataReader that has more than 3 sample instances in its queue with an instance state of ALIVE. The indication of this condition will display on all relevant dashboards.
You can test this rule by running the applications as described in Start the Applications. Start any combination of publishing applications with the `-s, --sensor-count` command-line arguments totaling more than 3. Any time this condition occurs, you will see this error indicated.
7.3.7.5. Custom Error Metrics
Table 7.21 shows metrics that are not fully implemented.
| Metric Name | Description |
|---|---|
| `dds_custom_excessive_bandwidth_errors` | Not fully implemented. Not to be modified or used. |
| `dds_custom_saturation_errors` | Not fully implemented. Not to be modified or used. |
| `dds_custom_errors` | Not fully implemented. Not to be modified or used. |
| `dds_custom_delays_errors` | Not fully implemented. Not to be modified or used. |
| `dds_custom_data_loss_errors` | Not fully implemented. Not to be modified or used. |