5. Monitoring Library 2.0

RTI Monitoring Library 2.0 is one component of Connext Observability Framework. It allows collecting and distributing telemetry data, which includes observables (metrics and non-metrics) and logs associated with the resources created by a DDS application. These observable resources are DomainParticipants, Publishers, Subscribers, DataWriters, DataReaders, Topics, types, and applications (refer to Resources). The library also accepts remote commands to change the set of collected and forwarded metrics and logs at runtime.

The observables and logs collected by Monitoring Library 2.0 are distributed to a Collector Service instance. Collector Service forwards the data to other Collector Service instances, or stores it in a third-party observability backend such as Prometheus or Grafana Loki. The collected non-metric observables are used by RTI Admin Console to enable its remote debugging feature.

The non-metric observables required by the remote debugging feature are only collected and forwarded if an Admin Console session requests them.

Monitoring Library 2.0 is a separate library (rtimonitoring2); applications can use it in three different modes:

  • Dynamically loaded: This is the default mode, which does not require linking your application with rtimonitoring2. The only requirement is that the rtimonitoring2 shared library is in the library search path. The library is loaded when monitoring is enabled. See Enabling Monitoring Library 2.0.

  • Dynamic Linking: The application is linked with the rtimonitoring2 shared library. When the application runs, the rtimonitoring2 shared library must be in the library search path.

  • Static Linking: The application is linked with the rtimonitoring2 static library.

The last two modes (dynamic and static linking) are only supported in C and C++ and require calling the API RTI_Monitoring_initialize in your application before any other Connext APIs. This API is defined in the header file ndds/monitoring/monitoring_monitoringClass.h. If you link with the rtimonitoring2 shared library but do not call RTI_Monitoring_initialize, the library operates in the default dynamically loaded mode. The advantage of calling RTI_Monitoring_initialize is that the application will fail to start if the library is not present in the library search path, which can help detect configuration issues early.

Regardless of the mode, to start monitoring your application, enable monitoring as described in Enabling Monitoring Library 2.0.

Monitoring Library 2.0 creates a dedicated Participant and uses three different built-in Topics to forward telemetry data to Collector Service:

  • Periodic: A best-effort Topic for distributing periodic metric data (for example, dds_data_writer_protocol_pushed_samples_total). The data is sent periodically, with a configurable period.

  • Event: A reliable Topic for distributing event metric data (for example, dds_data_writer_liveliness_lost_total). The data is sent when it changes.

  • Logging: A reliable Topic for distributing log data. The data is sent when a log event occurs.

The library creates one DomainParticipant and three DataWriters, one for each Topic type (periodic, event, and logging). Each DataWriter is created within its own Publisher.
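
As a sketch of how the period of the Periodic Topic might be set in XML, the snippet below assumes that periodic_settings has a polling_period field expressed as a duration (sec and nanosec). That field name and the 10-second value are assumptions, not taken from this section; confirm them against the MONITORING QosPolicy (DDS Extension) reference before use.

<qos_library name="MyQosLibrary">
    <qos_profile name="MyApplicationProfile" is_default_participant_factory_profile="true">
        <participant_factory_qos>
            <monitoring>
                <enable>true</enable>
                <distribution_settings>
                    <periodic_settings>
                        <!-- Assumed field name: send periodic metrics
                             every 10 seconds -->
                        <polling_period>
                            <sec>10</sec>
                            <nanosec>0</nanosec>
                        </polling_period>
                    </periodic_settings>
                </distribution_settings>
            </monitoring>
        </participant_factory_qos>
    </qos_profile>
</qos_library>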

When Monitoring Library 2.0 is enabled for an application (participant_factory_qos.monitoring.enable is true), every DDS Entity created by the application is registered with the library as an observable resource. Monitoring Library 2.0 can monitor all DDS Entities across multiple DomainParticipants.

You can select the metrics and logs that are collected and forwarded for an observable resource via an initial configuration (see Setting Initial Metrics and Log Configuration), and change them at runtime using remote commands. To change the metric collection configuration at runtime, use the REST API as described in REST API Reference. For an example that uses the Observability Dashboards to change the metric collection configuration, see Change the Metric Configuration.

Monitoring Library 2.0 receives remote commands on the built-in ServiceRequest Topic. The Monitoring Library 2.0 DomainParticipant creates a DataReader for this Topic.

Note

You are not expected to use the built-in Topics directly in your applications. The built-in Topics are internal channels between Monitoring Library 2.0 and Collector Service.

To send remote commands, use the REST API (see REST API Reference). This API sends configuration commands to Collector Service, which forwards the commands to the appropriate Monitoring Library 2.0 instance.

To access the telemetry data, connect to the third-party backends where the data is stored by Collector Service. You can visualize the telemetry data through the reference Grafana dashboards (see Observability Dashboards).

5.1. Enabling Monitoring Library 2.0

Monitoring Library 2.0 is automatically enabled under the following conditions:

  • For Connext applications that link with the Connext shared libraries, as long as the rtimonitoring2 shared library is in the library search path. There is no need to link with rtimonitoring2.

  • For Connext applications that link with the Connext static libraries, including rtimonitoring2, as long as the application calls the API RTI_Monitoring_initialize.

By default, Monitoring Library 2.0 is not automatically enabled for Collector Service and some RTI Tools (Admin Console, DDS Spy, DDS Ping, and Monitor).

You can override the default behavior and enable or disable Monitoring Library 2.0 in two ways, listed in order of precedence:

  • Environment variable: Set RTI_MONITORING2_ENABLE to true or false. This is the simplest way to enable or disable the library without modifying your application or its configuration.

  • QoS policy: Set participant_factory_qos.monitoring.enable to true or false using the MONITORING QosPolicy (DDS Extension). This can be done programmatically or via XML (see the example below).

The environment variable should not be changed at runtime. If you need to enable or disable Monitoring Library 2.0 at runtime, use the QoS policy instead.

Note

Enabling and disabling Monitoring Library 2.0 while DDS Entities are being created or deleted is not a safe operation. The entities created while Monitoring Library 2.0 is being enabled may not be monitored.

The following example shows how to explicitly disable Monitoring Library 2.0 via XML:

<qos_library name="MyQosLibrary">
    <qos_profile name="MyApplicationProfile" is_default_participant_factory_profile="true">
        <participant_factory_qos>
            <!-- Explicitly disable monitoring -->
            <monitoring>
                <enable>false</enable>
            </monitoring>
        </participant_factory_qos>
    </qos_profile>
</qos_library>
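
The same element can also enable the library explicitly. The following is a minimal sketch that reuses the profile names from the example above; keep in mind that, per the order of precedence described earlier, a value set in RTI_MONITORING2_ENABLE overrides this setting.

<qos_library name="MyQosLibrary">
    <qos_profile name="MyApplicationProfile" is_default_participant_factory_profile="true">
        <participant_factory_qos>
            <!-- Explicitly enable monitoring -->
            <monitoring>
                <enable>true</enable>
            </monitoring>
        </participant_factory_qos>
    </qos_profile>
</qos_library>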

The default configuration of Monitoring Library 2.0 uses the SHMEM and UDPv4 transports, along with the default Connext initial peers (builtin.udpv4://127.0.0.1, builtin.shmem://, builtin.udpv4://239.255.0.1) to connect to Collector Service.

For details on how to override the initial peers, see Setting Collector Service initial peers.

For details on how to secure the monitoring data between your Connext application and Collector Service, see the Support for RTI Observability Framework section in the RTI Security Plugins User’s Manual.

If you want to connect to a Collector Service instance over the WAN, see Connecting to Collector Service Over WAN.

The following sections describe in detail the most common configuration options for Monitoring Library 2.0. For a complete list of configuration options, refer to the MONITORING QosPolicy (DDS Extension).

5.2. Setting Initial Metrics and Log Configuration

The initial set of metrics and logs collected and forwarded by Monitoring Library 2.0 is configured through the participant_factory_qos.monitoring.telemetry_data structure in the MONITORING QosPolicy (DDS Extension). This QoS policy can be configured programmatically or via XML. It can also be changed at runtime, which lets you adjust the set of collected metrics and logs while the application is running.

For details on how to set the resource_selection fields, see Resource Pattern Definitions. For details on how to set the enabled_metrics_selection and disabled_metrics_selection fields, see Metric Pattern Definitions.

The following XML snippet shows the default value of telemetry_data:

<telemetry_data>
    <metrics>
        <element>
            <resource_selection>//applications/*</resource_selection>
            <enabled_metrics_selection>
                <!-- Periodic metrics -->
                <element>dds_application_process_memory_usage_*</element>
            </enabled_metrics_selection>
        </element>
        <element>
            <resource_selection>//domain_participants/*</resource_selection>
            <enabled_metrics_selection>
                <!-- Periodic metrics -->
                <element>dds_domain_participant_udpv4_usage_in_net_pkts_*</element>
                <element>dds_domain_participant_udpv4_usage_in_net_bytes_*</element>
                <element>dds_domain_participant_udpv4_usage_out_net_pkts_*</element>
                <element>dds_domain_participant_udpv4_usage_out_net_bytes_*</element>
            </enabled_metrics_selection>
        </element>
        <element>
            <resource_selection>//topics/*</resource_selection>
            <enabled_metrics_selection>
                <!-- Event metrics -->
                <element>dds_topic_inconsistent_total</element>
            </enabled_metrics_selection>
        </element>
        <element>
            <resource_selection>//data_writers/*</resource_selection>
            <enabled_metrics_selection>
                <!-- Periodic metrics -->
                <element>dds_data_writer_reliable_cache_*</element>
                <!-- Event metrics -->
                <element>dds_data_writer_liveliness_lost_total</element>
                <element>dds_data_writer_deadline_missed_total</element>
                <element>dds_data_writer_incompatible_qos_total</element>
                <element>dds_data_writer_publication_matched_*</element>
                <element>dds_data_writer_reliable_reader_activity_*</element>
            </enabled_metrics_selection>
        </element>
        <element>
            <resource_selection>//data_readers/*</resource_selection>
            <enabled_metrics_selection>
                <!-- Periodic metrics -->
                <element>dds_data_reader_cache_*</element>
                <element>dds_data_reader_protocol_*</element>
                <!-- Event metrics -->
                <element>dds_data_reader_liveliness_*</element>
                <element>dds_data_reader_deadline_missed_total</element>
                <element>dds_data_reader_incompatible_qos_total</element>
                <element>dds_data_reader_sample_lost_total</element>
                <element>dds_data_reader_subscription_matched_*</element>
            </enabled_metrics_selection>
        </element>
    </metrics>
    <logs>
        <middleware_forwarding_level>WARNING</middleware_forwarding_level>
        <security_event_forwarding_level>WARNING</security_event_forwarding_level>
        <service_forwarding_level>WARNING</service_forwarding_level>
        <user_forwarding_level>WARNING</user_forwarding_level>
    </logs>
</telemetry_data>

Note

The default metrics do not include all the available metrics, only those that are used for alerting in the reference Grafana dashboards (see Observability Dashboards). To enable all metrics, see Enable all metrics.

Note

When you redefine <metrics> in XML, the entire value is replaced, not merged with the default. If you specify a custom <metrics> configuration, you must include all the metrics you want collected; any default metrics not explicitly listed will no longer be collected.
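
For instance, to add one DataWriter metric while keeping the default DataWriter metrics, the default entries have to be repeated alongside the new one. The sketch below shows only the data_writers rule and uses dds_data_writer_protocol_pushed_samples_total as the additional metric, chosen purely for illustration. A complete configuration would also repeat the default rules for applications, domain_participants, topics, and data_readers listed above.

<telemetry_data>
    <metrics>
        <!-- To keep the other defaults, repeat the default rules for
             //applications/*, //domain_participants/*, //topics/*,
             and //data_readers/* here -->
        <element>
            <resource_selection>//data_writers/*</resource_selection>
            <enabled_metrics_selection>
                <!-- Default DataWriter metrics, repeated -->
                <element>dds_data_writer_reliable_cache_*</element>
                <element>dds_data_writer_liveliness_lost_total</element>
                <element>dds_data_writer_deadline_missed_total</element>
                <element>dds_data_writer_incompatible_qos_total</element>
                <element>dds_data_writer_publication_matched_*</element>
                <element>dds_data_writer_reliable_reader_activity_*</element>
                <!-- Additional metric (illustrative choice) -->
                <element>dds_data_writer_protocol_pushed_samples_total</element>
            </enabled_metrics_selection>
        </element>
    </metrics>
</telemetry_data>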

5.2.1. Enable all metrics

The following XML example shows how to configure Monitoring Library 2.0 parameters to collect all metrics. Because <telemetry_data> completely overwrites the default configuration (it is not merged), this example replaces the default per-resource metric selections with a single rule that enables every metric for every resource. The example also sets the application name, the observability domain ID, and the Collector Service initial peers.

 <qos_library name="MyQosLibrary">
     <qos_profile name="MyApplicationProfile" is_default_participant_factory_profile="true">
         <participant_factory_qos>
             <monitoring>
                 <!-- Enable monitoring -->
                 <enable>true</enable>
                 <!-- Enable all metrics -->
                 <telemetry_data>
                     <metrics>
                         <element>
                             <resource_selection>//*</resource_selection>
                             <enabled_metrics_selection>
                                 <element>*</element>
                             </enabled_metrics_selection>
                         </element>
                     </metrics>
                 </telemetry_data>
                 <!-- Change the application name -->
                 <application_name>MyApplication</application_name>
                 <distribution_settings>
                     <dedicated_participant>
                         <!-- Change the Observability Domain ID -->
                         <domain_id>7</domain_id>
                         <!-- Change the initial peers of the
                              Observability DomainParticipant -->
                         <collector_initial_peers>
                             <element>192.168.1.2</element>
                         </collector_initial_peers>
                     </dedicated_participant>
                 </distribution_settings>
             </monitoring>
         </participant_factory_qos>
     </qos_profile>
 </qos_library>

5.2.2. Enable a custom set of metrics

The following XML example shows how to configure Monitoring Library 2.0 parameters to collect a custom set of metrics and to set the initial forwarding level of every log category to ERROR. For a list of all available metrics, see Telemetry Data.

<qos_library name="MyQosLibrary">
    <qos_profile name="MyApplicationProfile" is_default_participant_factory_profile="true">
        <participant_factory_qos>
            <monitoring>
                <enable>true</enable>
                <telemetry_data>
                    <metrics>
                        <element>
                            <!-- enable all application metrics -->
                            <resource_selection>/applications/*</resource_selection>
                            <enabled_metrics_selection>
                                <element>*</element>
                            </enabled_metrics_selection>
                        </element>
                        <element>
                            <!-- enable all domain_participant metrics -->
                            <resource_selection>//domain_participants/*</resource_selection>
                            <enabled_metrics_selection>
                                <element>*</element>
                            </enabled_metrics_selection>
                        </element>
                        <element>
                            <!-- enable all topic metrics -->
                            <resource_selection>//topics/*</resource_selection>
                            <enabled_metrics_selection>
                                <element>*</element>
                            </enabled_metrics_selection>
                        </element>
                        <element>
                            <!-- enable all data_writer metrics except those that end in "_bytes" -->
                            <resource_selection>//data_writers/*</resource_selection>
                            <enabled_metrics_selection>
                                <element>*</element>
                            </enabled_metrics_selection>
                            <disabled_metrics_selection>
                                <element>dds_data_writer_*_bytes</element>
                            </disabled_metrics_selection>
                        </element>
                        <element>
                            <!-- enable all data_reader metrics except those related to "protocol" -->
                            <resource_selection>//data_readers/*</resource_selection>
                            <enabled_metrics_selection>
                                <element>*</element>
                            </enabled_metrics_selection>
                            <disabled_metrics_selection>
                                <element>dds_data_reader_protocol_*</element>
                            </disabled_metrics_selection>
                        </element>
                    </metrics>
                    <logs>
                        <!-- set initial MIDDLEWARE forwarding level to ERROR -->
                        <middleware_forwarding_level>ERROR</middleware_forwarding_level>
                        <!-- set initial SECURITY_EVENT forwarding level to ERROR -->
                        <security_event_forwarding_level>ERROR</security_event_forwarding_level>
                        <!-- set initial SERVICE forwarding level to ERROR -->
                        <service_forwarding_level>ERROR</service_forwarding_level>
                        <!-- set initial USER forwarding level to ERROR -->
                        <user_forwarding_level>ERROR</user_forwarding_level>
                    </logs>
                </telemetry_data>
            </monitoring>
        </participant_factory_qos>
    </qos_profile>
</qos_library>

5.3. Configuring Distribution Settings

In a typical application, after enabling Monitoring Library 2.0, you can configure the following parameters in your XML configuration file:

  • A name for the application being monitored

  • The DDS domain ID to use for observability

  • The locator (address), as an initial_peer, of the Collector Service instance to which the telemetry data will be forwarded
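
These three parameters can be set together in one profile. The following sketch reuses the placeholder values that appear in the examples of this chapter (MyApplication, domain ID 7, and peer 192.168.1.2); replace them with values for your system. Each parameter is described in the subsections that follow.

<qos_library name="MyQosLibrary">
    <qos_profile name="MyApplicationProfile" is_default_participant_factory_profile="true">
        <participant_factory_qos>
            <monitoring>
                <enable>true</enable>
                <!-- Name of the application being monitored -->
                <application_name>MyApplication</application_name>
                <distribution_settings>
                    <dedicated_participant>
                        <!-- Observability domain ID -->
                        <domain_id>7</domain_id>
                        <!-- Locator of the Collector Service instance -->
                        <collector_initial_peers>
                            <element>192.168.1.2</element>
                        </collector_initial_peers>
                    </dedicated_participant>
                </distribution_settings>
            </monitoring>
        </participant_factory_qos>
    </qos_profile>
</qos_library>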

5.3.1. Setting application name

To modify the application name used by Monitoring Library 2.0, use the participant_factory_qos.monitoring.application_name field. For example:

 <qos_library name="MyQosLibrary">
     <qos_profile name="MyApplicationProfile" is_default_participant_factory_profile="true">
         <participant_factory_qos>
             <monitoring>
                 <enable>true</enable>
                 <application_name>MyApplication</application_name>
             </monitoring>
         </participant_factory_qos>
     </qos_profile>
 </qos_library>

Assigning an application name is important because it helps identify the resource that represents your Connext application. The resource identifier representing the application will be:

/applications/<application_name>

This is the resource identifier that will be used to send commands to this application from the Observability Dashboards.

The application_name should be unique across the Connext system; however, Monitoring Library 2.0 does not currently enforce uniqueness.

When application_name is not set, Monitoring Library 2.0 will automatically assign a resource identifier with this format:

/applications/<host_name:process_id:uuid>

5.3.2. Changing the default observability domain ID

To modify the domain used by Monitoring Library 2.0’s DomainParticipant to connect to Collector Service, use the participant_factory_qos.monitoring.distribution_settings.dedicated_participant.domain_id field. The default value is 101.

By default, Monitoring Library 2.0’s DomainParticipant also sets the domain tag RTI_o11y to isolate observability traffic from application traffic. Changing the domain tag is not recommended. However, if you need to change it, you can do so by configuring the <domain_participant_qos> as described in Configuring QoS for Entities.

The following example sets the observability domain ID to 7:

 <qos_library name="MyQosLibrary">
     <qos_profile name="MyApplicationProfile" is_default_participant_factory_profile="true">
         <participant_factory_qos>
             <monitoring>
                 <enable>true</enable>
                 <distribution_settings>
                     <dedicated_participant>
                         <domain_id>7</domain_id>
                     </dedicated_participant>
                 </distribution_settings>
             </monitoring>
         </participant_factory_qos>
     </qos_profile>
 </qos_library>
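
If you do need to change the domain tag, one possible sketch is shown below. It assumes that the tag is set through the dds.domain_participant.domain_tag property of the PROPERTY QosPolicy, as for any other DomainParticipant; the profile name MyObservabilityProfile and the tag value MyO11yTag are hypothetical. DomainParticipants with different domain tags do not communicate, so Collector Service would need to be configured with the same tag.

<qos_library name="MyQosLibrary">
    <qos_profile name="MyObservabilityProfile" base_name="BuiltinQosLib::Generic.Monitoring2">
        <domain_participant_qos>
            <property>
                <value>
                    <element>
                        <!-- Assumed property name for the domain tag -->
                        <name>dds.domain_participant.domain_tag</name>
                        <!-- Hypothetical tag value -->
                        <value>MyO11yTag</value>
                    </element>
                </value>
            </property>
        </domain_participant_qos>
    </qos_profile>

    <qos_profile name="MyApplicationProfile" is_default_participant_factory_profile="true">
        <participant_factory_qos>
            <monitoring>
                <enable>true</enable>
                <distribution_settings>
                    <dedicated_participant>
                        <participant_qos_profile_name>
                            MyQosLibrary::MyObservabilityProfile
                        </participant_qos_profile_name>
                    </dedicated_participant>
                </distribution_settings>
            </monitoring>
        </participant_factory_qos>
    </qos_profile>
</qos_library>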

5.3.3. Setting Collector Service initial peers

To connect Monitoring Library 2.0 to Collector Service, configure the library with the locator/address of Collector Service via the initial peers list of the Monitoring Library 2.0 DomainParticipant. Set this list (usually just a single locator) using the participant_factory_qos.monitoring.distribution_settings.dedicated_participant.collector_initial_peers field in the Monitoring Library 2.0 XML QoS configuration. The locator/address of Collector Service uses the same format as the DISCOVERY QosPolicy (DDS Extension) initial_peers field. For example:

 <qos_library name="MyQosLibrary">
     <qos_profile name="MyApplicationProfile" is_default_participant_factory_profile="true">
         <participant_factory_qos>
             <monitoring>
                 <enable>true</enable>
                 <distribution_settings>
                     <dedicated_participant>
                         <collector_initial_peers>
                             <element>192.168.1.2</element>
                         </collector_initial_peers>
                     </dedicated_participant>
                 </distribution_settings>
             </monitoring>
         </participant_factory_qos>
     </qos_profile>
 </qos_library>

If collector_initial_peers is not specified, or is explicitly set to an empty list, the Monitoring Library 2.0 DomainParticipant takes its initial peers from domain_participant_qos.discovery.initial_peers in the QoS profile named by participant_factory_qos.monitoring.distribution_settings.dedicated_participant.participant_qos_profile_name.

If neither collector_initial_peers nor domain_participant_qos.discovery.initial_peers is set, Monitoring Library 2.0 will use the default Connext initial peers (builtin.udpv4://127.0.0.1, builtin.shmem://, builtin.udpv4://239.255.0.1), unless overridden with the environment variable NDDS_DISCOVERY_PEERS.

If both collector_initial_peers and initial_peers are set, the Monitoring Library 2.0 DomainParticipant uses the value of collector_initial_peers in the MONITORING QosPolicy instead of the value of initial_peers in the DISCOVERY QosPolicy.
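
The precedence rules above can be illustrated with a profile that sets both fields. In this sketch, the profile name MyObservabilityProfile and the address 192.168.1.3 are hypothetical; based on the rules described above, the Monitoring Library 2.0 DomainParticipant would use 192.168.1.2 and would fall back to 192.168.1.3 only if collector_initial_peers were removed or left empty.

<qos_library name="MyQosLibrary">
    <qos_profile name="MyObservabilityProfile" base_name="BuiltinQosLib::Generic.Monitoring2">
        <domain_participant_qos>
            <discovery>
                <!-- Used only when collector_initial_peers is unset or empty -->
                <initial_peers>
                    <element>192.168.1.3</element>
                </initial_peers>
            </discovery>
        </domain_participant_qos>
    </qos_profile>

    <qos_profile name="MyApplicationProfile" is_default_participant_factory_profile="true">
        <participant_factory_qos>
            <monitoring>
                <enable>true</enable>
                <distribution_settings>
                    <dedicated_participant>
                        <!-- Takes precedence over discovery.initial_peers -->
                        <collector_initial_peers>
                            <element>192.168.1.2</element>
                        </collector_initial_peers>
                        <participant_qos_profile_name>
                            MyQosLibrary::MyObservabilityProfile
                        </participant_qos_profile_name>
                    </dedicated_participant>
                </distribution_settings>
            </monitoring>
        </participant_factory_qos>
    </qos_profile>
</qos_library>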

Warning

Monitoring Library 2.0 only supports connecting to a single Collector Service instance. Because the default initial peers include a multicast address, more than one Collector Service instance on the network may be discovered. If that happens, Monitoring Library 2.0 will print a warning: Multiple active Collector Services detected. To avoid this situation, set collector_initial_peers to the unicast address of the specific Collector Service instance you want to connect to.

5.4. Configuring QoS for Entities

You may want to change the QoS of the Entities responsible for distributing the monitoring data. By default, the DDS Entities created by Monitoring Library 2.0 use the built-in profile BuiltinQosLib::Generic.Monitoring2 (as documented in <install dir>/resource/xml/BuiltinProfiles.documentationONLY.xml) to configure their QoS. You can provide a different profile name (MyObservabilityProfile in the example below) for each Entity by changing the MONITORING QosPolicy. Any profile you provide must inherit from BuiltinQosLib::Generic.Monitoring2 or BuiltinQosLib::Generic.Monitoring2.WAN, depending on whether you connect to a Collector Service instance over a LAN or a WAN.

The BuiltinQosLib::Generic.Monitoring2.WAN profile replaces the UDPv4 and SHMEM transports with the RTI Real-Time WAN Transport in the Monitoring Library 2.0 DomainParticipant.

The following example demonstrates the mechanism for changing the QoS of the monitoring Entities. It defines a custom profile that changes the names of the Entities and then references that profile from the MONITORING QosPolicy.

<qos_library name="MyQosLibrary">
    <qos_profile name="MyObservabilityProfile" base_name="BuiltinQosLib::Generic.Monitoring2">
        <domain_participant_qos>
            <participant_name>
                <!-- Change the name of the Observability
                     DomainParticipant
                -->
                <name>Monitoring Participant</name>
            </participant_name>
        </domain_participant_qos>

        <datawriter_qos topic_filter="DCPSEventStatusMonitoring">
            <publication_name>
                <!-- Change the name of the Observability
                     Event DataWriter
                -->
                <name>Monitoring Event DataWriter</name>
            </publication_name>
        </datawriter_qos>

        <datawriter_qos topic_filter="DCPSPeriodicStatusMonitoring">
            <publication_name>
                <!-- Change the name of the Observability
                     Periodic DataWriter
                -->
                <name>Monitoring Periodic DataWriter</name>
            </publication_name>
        </datawriter_qos>

        <datawriter_qos topic_filter="DCPSLoggingStatusMonitoring">
            <publication_name>
                <!-- Change the name of the Observability
                     Logging DataWriter
                -->
                <name>Monitoring Logging DataWriter</name>
            </publication_name>
        </datawriter_qos>
    </qos_profile>

    <qos_profile name="MyApplicationProfile" is_default_participant_factory_profile="true">
        <participant_factory_qos>
            <monitoring>
                <enable>true</enable>
                <distribution_settings>
                    <dedicated_participant>
                        <!-- Change the configuration of the
                             Observability DomainParticipant -->
                        <participant_qos_profile_name>
                            MyQosLibrary::MyObservabilityProfile
                        </participant_qos_profile_name>
                    </dedicated_participant>
                    <!-- Change the configuration of the
                         Observability Publishers -->
                    <publisher_qos_profile_name>
                        MyQosLibrary::MyObservabilityProfile
                    </publisher_qos_profile_name>
                    <event_settings>
                        <!-- Change the configuration of the
                             Observability Event DataWriter -->
                        <datawriter_qos_profile_name>
                            MyQosLibrary::MyObservabilityProfile
                        </datawriter_qos_profile_name>
                    </event_settings>
                    <periodic_settings>
                        <!-- Change the configuration of the
                             Observability Periodic DataWriter -->
                        <datawriter_qos_profile_name>
                            MyQosLibrary::MyObservabilityProfile
                        </datawriter_qos_profile_name>
                    </periodic_settings>
                    <logging_settings>
                        <!-- Change the configuration of the
                             Observability Logging DataWriter -->
                        <datawriter_qos_profile_name>
                            MyQosLibrary::MyObservabilityProfile
                        </datawriter_qos_profile_name>
                    </logging_settings>
                </distribution_settings>
            </monitoring>
        </participant_factory_qos>
    </qos_profile>
</qos_library>

5.5. Connecting to Collector Service Over WAN

To connect to a Collector Service instance over a WAN, set participant_factory_qos.monitoring.distribution_settings.dedicated_participant.participant_qos_profile_name to the built-in profile BuiltinQosLib::Generic.Monitoring2.WAN. This profile is configured to use the Real-Time WAN Transport, and it inherits from BuiltinQosLib::Generic.Monitoring2.

In addition, you must set participant_factory_qos.monitoring.distribution_settings.dedicated_participant.collector_initial_peers to the locator/address of the Collector Service instance you want to connect to.

 <qos_library name="MyQosLibrary">
     <qos_profile name="MyApplicationProfile" is_default_participant_factory_profile="true">
         <participant_factory_qos>
             <monitoring>
                 <enable>true</enable>
                 <distribution_settings>
                     <dedicated_participant>
                         <collector_initial_peers>
                             <element>udpv4_wan://38.21.45.34:4500</element>
                         </collector_initial_peers>
                         <participant_qos_profile_name>
                             BuiltinQosLib::Generic.Monitoring2.WAN
                         </participant_qos_profile_name>
                     </dedicated_participant>
                 </distribution_settings>
             </monitoring>
         </participant_factory_qos>
     </qos_profile>
 </qos_library>
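
When the WAN connection also needs custom QoS, the built-in WAN profile can serve as the base of a user-defined profile, following the inheritance rule described in Configuring QoS for Entities. The following is a sketch only: the profile name MyWanObservabilityProfile and the participant name are hypothetical, and the peer address is the placeholder used in the example above.

<qos_library name="MyQosLibrary">
    <!-- Hypothetical custom profile that inherits the WAN transport settings -->
    <qos_profile name="MyWanObservabilityProfile" base_name="BuiltinQosLib::Generic.Monitoring2.WAN">
        <domain_participant_qos>
            <participant_name>
                <name>Monitoring Participant (WAN)</name>
            </participant_name>
        </domain_participant_qos>
    </qos_profile>

    <qos_profile name="MyApplicationProfile" is_default_participant_factory_profile="true">
        <participant_factory_qos>
            <monitoring>
                <enable>true</enable>
                <distribution_settings>
                    <dedicated_participant>
                        <collector_initial_peers>
                            <element>udpv4_wan://38.21.45.34:4500</element>
                        </collector_initial_peers>
                        <participant_qos_profile_name>
                            MyQosLibrary::MyWanObservabilityProfile
                        </participant_qos_profile_name>
                    </dedicated_participant>
                </distribution_settings>
            </monitoring>
        </participant_factory_qos>
    </qos_profile>
</qos_library>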