8. Monitoring Library 2.0

RTI Monitoring Library 2.0 is one component of Connext Observability Framework. It collects and distributes telemetry data (metrics and logs) associated with the resources created by a DDS application. These observable resources are DomainParticipants, Publishers, Subscribers, DataWriters, DataReaders, Topics, and applications (refer to Resources). The library also accepts remote commands that change the set of collected and forwarded telemetry data at runtime.

The data collected by Monitoring Library 2.0 is distributed to an Observability Collector Service instance. Observability Collector Service forwards the data to other Observability Collector Service instances or stores it in a third-party observability backend such as Prometheus or Grafana Loki.

Monitoring Library 2.0 is a separate library (rtimonitoring2); applications can use it in three different modes:

  • Dynamically loaded: This is the default mode, which does not require linking with your application. The only requirement is that the rtimonitoring2 shared library must be in the library search path. The library is loaded when Monitoring Library 2.0 is enabled. See Enabling Monitoring Library 2.0.

  • Dynamic Linking: The application is linked with the rtimonitoring2 shared library. When the application runs, the rtimonitoring2 shared library must be in the library search path.

  • Static Linking: The application is linked with the rtimonitoring2 static library.

The last two modes (dynamic and static linking) are supported only in C and C++, and they require your application to call RTI_Monitoring_initialize before any other Connext API. This API is declared in the header file ndds/monitoring/monitoring_monitoringClass.h.

Regardless of the mode, to start monitoring your application, enable monitoring as described in Enabling Monitoring Library 2.0.

Monitoring Library 2.0 creates a dedicated DomainParticipant and uses three different built-in Topics to forward telemetry data to Observability Collector Service:

  • Periodic: A best-effort Topic for distributing periodic metric data (for example, dds_data_writer_protocol_pushed_samples_total). The data is sent periodically, with a configurable period.

  • Event: A reliable Topic for distributing event metric data (for example, dds_data_writer_liveliness_lost_total). The data is sent when it changes.

  • Logging: A reliable Topic for distributing log data. The data is sent when a log event occurs.

The library creates one DomainParticipant and three DataWriters, one for each Topic type (periodic, event, and logging). Each DataWriter is created within its own Publisher.

When Monitoring Library 2.0 is enabled for an application (participant_factory_qos.monitoring.enable is TRUE), every DDS Entity the application creates is “registered” with the library as an observable resource. Monitoring Library 2.0 monitors DDS Entities across all of the application's DomainParticipants. You can select the telemetry data to be collected and forwarded for each observable resource through an initial configuration, change it at runtime using remote commands, or both. To set the initial configuration for metric collection and log forwarding, see Setting the Initial Metrics and Log Configuration. To change the metric collection configuration dynamically at runtime, use the REST API as described in Collector Service REST API Reference. For an example of how to change the metric collection configuration dynamically using the Observability Dashboards, see Change the Metric Configuration.
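
As an illustration of what an initial configuration can look like, the following sketch enables only the two DataWriter metrics named above: one periodic metric and one event metric. It assumes that a metric pattern without wildcards selects exactly the metric with that name (see Metric Pattern Definitions); every other metric keeps its default state, which is disabled.

<qos_library name="MyQosLibrary">
    <qos_profile name="MyApplicationProfile" is_default_participant_factory_profile="true">
        <participant_factory_qos>
            <monitoring>
                <enable>true</enable>
                <telemetry_data>
                    <metrics>
                        <element>
                            <!-- Select every DataWriter in the application -->
                            <resource_selection>//data_writers/*</resource_selection>
                            <enabled_metrics_selection>
                                <!-- One periodic metric and one event metric -->
                                <element>dds_data_writer_protocol_pushed_samples_total</element>
                                <element>dds_data_writer_liveliness_lost_total</element>
                            </enabled_metrics_selection>
                        </element>
                    </metrics>
                </telemetry_data>
            </monitoring>
        </participant_factory_qos>
    </qos_profile>
</qos_library>

With this configuration, the first metric would be sent on the Periodic Topic and the second on the Event Topic.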

Monitoring Library 2.0 receives remote commands on the built-in ServiceRequest Topic. The Monitoring Library 2.0 DomainParticipant creates a DataReader for this Topic.

Note

You are not expected to use the built-in Topics directly in your applications. The built-in Topics are internal channels between Monitoring Library 2.0 and Observability Collector Service.

To send remote commands, use the REST API (see Collector Service REST API Reference). This API sends configuration commands to Observability Collector Service, which forwards the commands to the appropriate Monitoring Library 2.0 instance.

To access the telemetry data, connect to the third-party backends where the data is stored by Observability Collector Service. You can visualize the telemetry data through the reference Grafana dashboards (see Observability Dashboards).

8.1. Enabling Monitoring Library 2.0

To enable Monitoring Library 2.0 and configure its behavior, use the MONITORING QosPolicy (DDS Extension) on the DomainParticipantFactory and set participant_factory_qos.monitoring.enable to true. You can configure this QoS policy programmatically or via XML. The following example shows how to enable Monitoring Library 2.0 in your XML configuration file:

<qos_library name="MyQosLibrary">
    <qos_profile name="MyApplicationProfile" is_default_participant_factory_profile="true">
        <participant_factory_qos>
            <!-- Enable monitoring -->
            <monitoring>
                <enable>true</enable>
            </monitoring>
        </participant_factory_qos>
    </qos_profile>
</qos_library>

In a typical application, after enabling Monitoring Library 2.0, you also configure which metrics to collect from which resources, the DDS domain ID used for observability, a name for the monitored application, and the locator (address) of the Observability Collector Service instance that receives the telemetry data, specified as an initial peer. The following XML example shows how to configure these parameters:

 <qos_library name="MyQosLibrary">
     <qos_profile name="MyApplicationProfile" is_default_participant_factory_profile="true">
         <participant_factory_qos>
             <monitoring>
                 <!-- Enable monitoring -->
                 <enable>true</enable>
                 <!-- Enable all metrics -->
                 <telemetry_data>
                     <metrics>
                         <element>
                             <resource_selection>//*</resource_selection>
                             <enabled_metrics_selection>
                                 <element>*</element>
                             </enabled_metrics_selection>
                         </element>
                     </metrics>
                 </telemetry_data>
                 <!-- Change the application name -->
                 <application_name>MyApplication</application_name>
                 <distribution_settings>
                     <dedicated_participant>
                         <!-- Change the Observability Domain ID -->
                         <domain_id>7</domain_id>
                         <!-- Change the initial peers of the
                              Observability DomainParticipant -->
                         <collector_initial_peers>
                             <element>192.168.1.2</element>
                         </collector_initial_peers>
                     </dedicated_participant>
                 </distribution_settings>
             </monitoring>
         </participant_factory_qos>
     </qos_profile>
 </qos_library>

Alternatively, you can use the snippet BuiltinQosSnippetLib::Feature.Monitoring2.Enable in your XML configuration file. This snippet enables Monitoring Library 2.0 and turns on the collection and forwarding of all metrics:

<qos_library name="MyQosLibrary">
    <qos_profile name="MyApplicationProfile" is_default_participant_factory_profile="true">
        <base_name>
            <element>BuiltinQosSnippetLib::Feature.Monitoring2.Enable</element>
        </base_name>
        <participant_factory_qos>
            <monitoring>
                <application_name>MyApplication</application_name>
                <distribution_settings>
                    <dedicated_participant>
                        <!-- Change the Observability Domain ID -->
                        <domain_id>7</domain_id>
                        <!-- Change the initial peers of the
                             Observability DomainParticipant -->
                        <collector_initial_peers>
                            <element>192.168.1.2</element>
                        </collector_initial_peers>
                    </dedicated_participant>
                </distribution_settings>
            </monitoring>
        </participant_factory_qos>
    </qos_profile>
</qos_library>

The MONITORING QosPolicy (DDS Extension) is changeable at runtime, so you can enable or disable Monitoring Library 2.0 while the application is running.

The following sections describe in detail the most common configuration options for Monitoring Library 2.0. For a complete list of configuration options, refer to the MONITORING QosPolicy (DDS Extension).

8.2. Setting the Initial Metrics and Log Configuration

By default, all metric collection is disabled and the forwarding level for all logs is WARNING. To configure the initial telemetry data behavior of Monitoring Library 2.0, use the MONITORING QosPolicy (DDS Extension) on the DomainParticipantFactory and configure the participant_factory_qos.monitoring.telemetry_data structure. You can configure this QoS policy programmatically or via XML. For details on how to set the resource_selection fields, see Resource Pattern Definitions. For details on how to set the enabled_metrics_selection and disabled_metrics_selection fields, see Metric Pattern Definitions. The following example shows how to configure the initial metric collection and log forwarding for Monitoring Library 2.0 in your XML configuration file:

<qos_library name="MyQosLibrary">
    <qos_profile name="MyApplicationProfile" is_default_participant_factory_profile="true">
        <participant_factory_qos>
            <monitoring>
                <enable>true</enable>
                <telemetry_data>
                    <metrics>
                        <element>
                            <!-- enable all application metrics -->
                            <resource_selection>/applications/*</resource_selection>
                            <enabled_metrics_selection>
                                <element>*</element>
                            </enabled_metrics_selection>
                        </element>
                        <element>
                            <!-- enable all domain_participant metrics -->
                            <resource_selection>//domain_participants/*</resource_selection>
                            <enabled_metrics_selection>
                                <element>*</element>
                            </enabled_metrics_selection>
                        </element>
                        <element>
                            <!-- enable all topic metrics -->
                            <resource_selection>//topics/*</resource_selection>
                            <enabled_metrics_selection>
                                <element>*</element>
                            </enabled_metrics_selection>
                        </element>
                        <element>
                            <!-- enable all data_writer metrics except those that end in "_bytes" -->
                            <resource_selection>//data_writers/*</resource_selection>
                            <enabled_metrics_selection>
                                <element>*</element>
                            </enabled_metrics_selection>
                            <disabled_metrics_selection>
                                <element>dds_data_writer_*_bytes</element>
                            </disabled_metrics_selection>
                        </element>
                        <element>
                            <!-- enable all data_reader metrics except those related to "protocol" -->
                            <resource_selection>//data_readers/*</resource_selection>
                            <enabled_metrics_selection>
                                <element>*</element>
                            </enabled_metrics_selection>
                            <disabled_metrics_selection>
                                <element>dds_data_reader_protocol_*</element>
                            </disabled_metrics_selection>
                        </element>
                    </metrics>
                    <logs>
                        <!-- set initial MIDDLEWARE forwarding level to ERROR -->
                        <middleware_forwarding_level>ERROR</middleware_forwarding_level>
                        <!-- set initial SECURITY_EVENT forwarding level to ERROR -->
                        <security_event_forwarding_level>ERROR</security_event_forwarding_level>
                        <!-- set initial SERVICE forwarding level to ERROR -->
                        <service_forwarding_level>ERROR</service_forwarding_level>
                        <!-- set initial USER forwarding level to ERROR -->
                        <user_forwarding_level>ERROR</user_forwarding_level>
                    </logs>
                </telemetry_data>
            </monitoring>
        </participant_factory_qos>
    </qos_profile>
</qos_library>
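
Because metric collection is disabled by default, a configuration that only needs to adjust log forwarding can omit the metrics sequence entirely. The following sketch raises the MIDDLEWARE forwarding level to ERROR; it assumes that log categories not listed under logs keep their default WARNING level.

<qos_library name="MyQosLibrary">
    <qos_profile name="MyApplicationProfile" is_default_participant_factory_profile="true">
        <participant_factory_qos>
            <monitoring>
                <enable>true</enable>
                <telemetry_data>
                    <!-- No metrics sequence: metric collection stays disabled -->
                    <logs>
                        <!-- Raise the MIDDLEWARE forwarding level to ERROR;
                             the other categories are not listed here -->
                        <middleware_forwarding_level>ERROR</middleware_forwarding_level>
                    </logs>
                </telemetry_data>
            </monitoring>
        </participant_factory_qos>
    </qos_profile>
</qos_library>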

8.3. Setting the Application Name

To modify the application name used by Monitoring Library 2.0, use the participant_factory_qos.monitoring.application_name field. For example:

 <qos_library name="MyQosLibrary">
     <qos_profile name="MyApplicationProfile" is_default_participant_factory_profile="true">
         <participant_factory_qos>
             <monitoring>
                 <enable>true</enable>
                 <application_name>MyApplication</application_name>
             </monitoring>
         </participant_factory_qos>
     </qos_profile>
 </qos_library>

Assigning an application name is important because it helps identify the resource that represents your Connext application. The resource identifier representing the application will be:

/applications/<application_name>

This is the resource identifier that will be used to send commands to this application from the Observability Dashboards.

The application_name should be unique across the Connext system; however, Monitoring Library 2.0 does not currently enforce uniqueness.
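
Since uniqueness is not enforced, one way to avoid collisions when several processes share the same XML file is to take the name from the environment. The sketch below assumes Connext's support for referencing environment variables in XML configuration files with the $(VARIABLE) notation; MONITORING_APP_NAME is a hypothetical variable that you would set to a different value for each process.

<qos_library name="MyQosLibrary">
    <qos_profile name="MyApplicationProfile" is_default_participant_factory_profile="true">
        <participant_factory_qos>
            <monitoring>
                <enable>true</enable>
                <!-- MONITORING_APP_NAME is a hypothetical environment variable,
                     set to a distinct value for each running process -->
                <application_name>$(MONITORING_APP_NAME)</application_name>
            </monitoring>
        </participant_factory_qos>
    </qos_profile>
</qos_library>

For example, a process started with MONITORING_APP_NAME set to SensorApp1 would be identified as /applications/SensorApp1.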

When application_name is not set, Monitoring Library 2.0 will automatically assign a resource identifier with this format:

/applications/<host_name:process_id:uuid>

8.4. Changing the Default Observability Domain ID

To modify the domain used by Monitoring Library 2.0’s DomainParticipant to connect to Observability Collector Service, use the participant_factory_qos.monitoring.distribution_settings.dedicated_participant.domain_id field. The default value is 2. For example:

 <qos_library name="MyQosLibrary">
     <qos_profile name="MyApplicationProfile" is_default_participant_factory_profile="true">
         <participant_factory_qos>
             <monitoring>
                 <enable>true</enable>
                 <distribution_settings>
                     <dedicated_participant>
                         <domain_id>7</domain_id>
                     </dedicated_participant>
                 </distribution_settings>
             </monitoring>
         </participant_factory_qos>
     </qos_profile>
 </qos_library>

8.5. Configuring QoS for Monitoring Library 2.0 Entities

By default, the DDS entities created by Monitoring Library 2.0 use the built-in profile BuiltinQosLib::Generic.Monitoring2 (documented in <install dir>/resource/xml/BuiltinProfiles.documentationONLY.xml) to configure their QoS. You can provide a different profile name for each entity (MyObservabilityProfile in the example below) by changing the MONITORING QosPolicy (DDS Extension). If you provide a different profile, it is recommended that the new profile inherit from the BuiltinQosLib::Generic.Monitoring2 profile. For example:

<qos_library name="MyQosLibrary">
    <qos_profile name="MyObservabilityProfile" base_name="BuiltinQosLib::Generic.Monitoring2">
        <domain_participant_qos>
            <participant_name>
                <!-- Change the name of the Observability
                     DomainParticipant
                -->
                <name>Monitoring Participant</name>
            </participant_name>
        </domain_participant_qos>

        <datawriter_qos topic_filter="DCPSEventStatusMonitoring">
            <publication_name>
                <!-- Change the name of the Observability
                     Event DataWriter
                -->
                <name>Monitoring Event DataWriter</name>
            </publication_name>
        </datawriter_qos>

        <datawriter_qos topic_filter="DCPSPeriodicStatusMonitoring">
            <publication_name>
                <!-- Change the name of the Observability
                     Periodic DataWriter
                -->
                <name>Monitoring Periodic DataWriter</name>
            </publication_name>
        </datawriter_qos>

        <datawriter_qos topic_filter="DCPSLoggingStatusMonitoring">
            <publication_name>
                <!-- Change the name of the Observability
                     Logging DataWriter
                -->
                <name>Monitoring Logging DataWriter</name>
            </publication_name>
        </datawriter_qos>
    </qos_profile>

    <qos_profile name="MyApplicationProfile" is_default_participant_factory_profile="true">
        <participant_factory_qos>
            <monitoring>
                <enable>true</enable>
                <distribution_settings>
                    <dedicated_participant>
                        <!-- Change the configuration of the
                             Observability DomainParticipant -->
                        <participant_qos_profile_name>
                            MyQosLibrary::MyObservabilityProfile
                        </participant_qos_profile_name>
                    </dedicated_participant>
                    <!-- Change the configuration of the
                         Observability Publishers -->
                    <publisher_qos_profile_name>
                        MyQosLibrary::MyObservabilityProfile
                    </publisher_qos_profile_name>
                    <event_settings>
                        <!-- Change the configuration of the
                             Observability Event DataWriter -->
                        <datawriter_qos_profile_name>
                            MyQosLibrary::MyObservabilityProfile
                        </datawriter_qos_profile_name>
                    </event_settings>
                    <periodic_settings>
                        <!-- Change the configuration of the
                             Observability Periodic DataWriter -->
                        <datawriter_qos_profile_name>
                            MyQosLibrary::MyObservabilityProfile
                        </datawriter_qos_profile_name>
                    </periodic_settings>
                    <logging_settings>
                        <!-- Change the configuration of the
                             Observability Logging DataWriter -->
                        <datawriter_qos_profile_name>
                            MyQosLibrary::MyObservabilityProfile
                        </datawriter_qos_profile_name>
                    </logging_settings>
                </distribution_settings>
            </monitoring>
        </participant_factory_qos>
    </qos_profile>
</qos_library>

Note

The BuiltinQosLib::Generic.Monitoring2 profile disables multicast discovery for Monitoring Library 2.0’s DomainParticipant by setting an empty <multicast_receive_addresses/> element. Using multicast may lead to multiple Observability Collector Service instances receiving the same data. Therefore, each application (that is, each instance of Monitoring Library 2.0) should explicitly configure the address (initial_peer) of the Observability Collector Service it connects to, as described in Setting Collector Service Initial Peers.

8.6. Setting Collector Service Initial Peers

To connect Monitoring Library 2.0 to Observability Collector Service, configure the library with the locator/address of the Observability Collector Service via the Monitoring Library 2.0’s DomainParticipant initial peers list. Set this list (usually just a single locator) using the participant_factory_qos.monitoring.distribution_settings.dedicated_participant.collector_initial_peers field in the Monitoring Library 2.0 XML QoS configuration. The locator/address of the collector service uses the same format as the DISCOVERY QosPolicy (DDS Extension) initial_peers field.

 <qos_library name="MyQosLibrary">
     <qos_profile name="MyApplicationProfile" is_default_participant_factory_profile="true">
         <participant_factory_qos>
             <monitoring>
                 <enable>true</enable>
                 <distribution_settings>
                     <dedicated_participant>
                         <collector_initial_peers>
                             <element>192.168.1.2</element>
                         </collector_initial_peers>
                     </dedicated_participant>
                 </distribution_settings>
             </monitoring>
         </participant_factory_qos>
     </qos_profile>
 </qos_library>
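
The collector locator does not have to be a bare IPv4 address. As a sketch, assuming the usual initial_peers locator syntax, the following fragment of the same profile restricts the connection to the UDPv4 transport; a host name (for example, a hypothetical collector-host) could be used in place of the address.

<distribution_settings>
    <dedicated_participant>
        <collector_initial_peers>
            <!-- Reach the collector at 192.168.1.2 using the UDPv4 transport -->
            <element>udpv4://192.168.1.2</element>
        </collector_initial_peers>
    </dedicated_participant>
</distribution_settings>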

If collector_initial_peers is not specified, or if it is explicitly set to an empty list, Monitoring Library 2.0 will use the value set in the domain_participant_qos.discovery.initial_peers of the QoS profile specified by participant_factory_qos.monitoring.distribution_settings.dedicated_participant.participant_qos_profile_name as the initial peers for the Monitoring Library 2.0’s DomainParticipant.

If both values are present, the value in collector_initial_peers in the Monitoring QosPolicy will be used instead of the value of initial_peers in the Discovery QosPolicy for the Monitoring Library 2.0’s DomainParticipant.
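
The fallback described above can be sketched as follows. The configuration leaves collector_initial_peers unset and instead sets initial_peers in the profile referenced by participant_qos_profile_name. MyObservabilityProfile is the same illustrative profile name used in Configuring QoS for Monitoring Library 2.0 Entities, and it inherits from BuiltinQosLib::Generic.Monitoring2 as recommended there.

<qos_library name="MyQosLibrary">
    <qos_profile name="MyObservabilityProfile" base_name="BuiltinQosLib::Generic.Monitoring2">
        <domain_participant_qos>
            <discovery>
                <!-- Used only because collector_initial_peers is not set -->
                <initial_peers>
                    <element>192.168.1.2</element>
                </initial_peers>
            </discovery>
        </domain_participant_qos>
    </qos_profile>

    <qos_profile name="MyApplicationProfile" is_default_participant_factory_profile="true">
        <participant_factory_qos>
            <monitoring>
                <enable>true</enable>
                <distribution_settings>
                    <dedicated_participant>
                        <!-- No collector_initial_peers: the initial peers come
                             from the profile referenced below -->
                        <participant_qos_profile_name>MyQosLibrary::MyObservabilityProfile</participant_qos_profile_name>
                    </dedicated_participant>
                </distribution_settings>
            </monitoring>
        </participant_factory_qos>
    </qos_profile>
</qos_library>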