.. _section-what-is-observability:

.. note:: *RTI® Connext® Observability Framework* is now considered production-ready.
   |OBSERVABILITY| includes example configuration files for use with several
   third-party components (Prometheus®, Grafana Loki™, Grafana®, NGINX®, and
   OpenTelemetry™ Collector). This release supports |Connext| applications with
   new observability features. You can deploy |OBSERVABILITY| components in
   production environments. For support, contact support@rti.com.

What is Connext Observability Framework?
****************************************

*RTI® Connext® Observability Framework* is a holistic solution that uses
telemetry data to provide deep visibility into the current and past states of
your |Connext| applications. This visibility makes it easier to proactively
identify and resolve potential system issues, giving you greater confidence
that the system is operating reliably.

|OBSERVABILITY| use cases include:

- **Debugging**. Find the cause of an undesired behavior, or determine during
  development whether a feature meets performance needs.
- **CI/CD monitoring**. Assess the performance impact of code or configuration
  changes.
- **Monitoring deployed applications**. Confirm that your systems are running
  as expected and proactively fix potential performance issues.

Telemetry Data
==============

Telemetry data can be generated at three different levels:

- **Application**. Telemetry data generated when you instrument your own
  applications.
- **Middleware**. Telemetry data generated by |Connext| DDS entities and
  infrastructure services.
- **System**. DevOps telemetry such as CPU, memory, and disk I/O usage.

In this release, |OBSERVABILITY| supports middleware telemetry (metrics and
logs) and application logs. Future releases could support application metrics
and system telemetry.

Regardless of the level, telemetry data can be categorized as:

- **Metrics**. Collections of application statistics that are analyzed to
  understand application behavior. There are two types of metrics:

  - Counters count the number of events of a specific type; for example, the
    number of ACK messages sent.
  - Gauges describe the state of some part of an application as a numeric
    value within a specified time frame; for example, the number of samples in
    a queue.

- **Logs**. Events captured as text or structured data.
- **Security Events**. Events related to securing a distributed system.

  - Notifications of **Security Events** in |OBSERVABILITY| are communicated
    as **Logs** with a Syslog Facility of **SECURITY_EVENT**. See
    :ref:`section-telemetry-logs` for more information.

- **Traces**. A representation of a series of causally related events that
  encode the end-to-end flow of a piece of information in a software system.
  Traces in a distributed system are called *distributed traces*.

In this release, |OBSERVABILITY| supports metrics, logs, and security events.
Future releases could support traces. See :ref:`Telemetry Data` for more
information.

Distribution of Telemetry Data
==============================

|OBSERVABILITY| enables you to generate telemetry data in individual |Connext|
applications and forward it, in a scalable way, to third-party telemetry
backends such as Prometheus and Grafana Loki. For more information on the
distribution of telemetry data, see :ref:`section-library-component` and
:ref:`section-collector-service-component`.

Flexible Storage
================

|OBSERVABILITY| provides native integration with Prometheus, as the
time-series database that stores |Connext| metrics, and with Grafana Loki, as
the log aggregation system that stores |Connext| logs. Integration with other
backends is possible through `OpenTelemetry `_ and the
`OpenTelemetry Collector `_.
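The two metric types described in Telemetry Data behave differently once they
reach a time-series database such as Prometheus: a counter only ever
increases, while a gauge reports whatever the current state is. The following
minimal Python sketch illustrates that distinction and renders each metric in
the Prometheus text exposition format. The ``Counter`` and ``Gauge`` classes
and the metric names ``dds_ack_messages_sent_total`` and
``dds_samples_in_queue`` are hypothetical, chosen only for illustration; they
are not the names or the implementation used by |OBSERVABILITY|.

.. code-block:: python

   from dataclasses import dataclass


   @dataclass
   class Counter:
       """Hypothetical counter: the number of events of one type, e.g. ACKs sent."""

       name: str
       value: int = 0

       def inc(self, amount: int = 1) -> None:
           # Counters never decrease; Prometheus treats any drop as a counter reset.
           if amount < 0:
               raise ValueError("a counter cannot be decremented")
           self.value += amount

       def expose(self) -> str:
           # Prometheus text exposition format: a TYPE comment, then "name value".
           return f"# TYPE {self.name} counter\n{self.name} {self.value}"


   @dataclass
   class Gauge:
       """Hypothetical gauge: the current state of something, e.g. queue depth."""

       name: str
       value: float = 0

       def set(self, value: float) -> None:
           # Gauges may go up or down; each update replaces the previous state.
           self.value = value

       def expose(self) -> str:
           return f"# TYPE {self.name} gauge\n{self.name} {self.value}"


   # Hypothetical metric names, for illustration only.
   acks = Counter("dds_ack_messages_sent_total")
   queue = Gauge("dds_samples_in_queue")

   for _ in range(3):  # three ACK messages sent
       acks.inc()
   queue.set(5)        # five samples queued...
   queue.set(2)        # ...then the queue partially drains

   print(acks.expose())
   print(queue.expose())

Because Prometheus treats any decrease in a counter as a reset, rate-style
queries are meaningful only on counters; a gauge such as a queue depth is
usually graphed or alerted on directly.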
Visualization of Telemetry Data
===============================

In this release, |OBSERVABILITY| provides a set of Grafana dashboards for
visualizing the telemetry data collected from |Connext| applications. You can
customize these dashboards, or use them as a starting point for building
dashboards in your preferred platform. The |DASHBOARDS| work only with the
Prometheus and Grafana Loki backends. Future releases could support other
backends. For more information, see :ref:`section-dashboards-component`.

Control and Selection of Telemetry Data
=======================================

Your distributed system components can produce a large amount of telemetry
data, but not all of it is needed to detect problems. |OBSERVABILITY| enables
you to control how much telemetry data is generated, forwarded, and stored.
You can manage these settings through an initial configuration and at run
time.

See :ref:`section-setting-initial-metrics` for information on the initial
configuration of telemetry collection.

See :ref:`section-collector-service-rest-api-reference` for information on the
remote commands that the |OCA| provides for changing the configuration of
telemetry collection at run time.

See :ref:`section-change-verbosity` and
:ref:`section-change-metric-configuration` for examples of changing the
configuration of telemetry collection at run time.

Security
========

|OBSERVABILITY| provides a way to secure the telemetry data generated by
|Connext| applications and stored in the telemetry backends. Data in transit is
secured using the |RTI_SP_PRODUCT| and HTTP Basic authentication (BASIC-Auth)
over HTTPS. Data at rest is secured by the third-party telemetry backends. For
more information, see :ref:`section-security`.
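The BASIC-Auth scheme mentioned above is standard HTTP Basic authentication
(RFC 7617): the client sends ``user-id:password``, Base64-encoded, in an
``Authorization`` header. Base64 is an encoding, not encryption, which is why
the credentials must travel over HTTPS. The following minimal Python sketch,
using only the standard library, shows how a client might prepare such a
request. The helper names ``basic_auth_header`` and ``authenticated_request``,
the URL, and the credentials are hypothetical placeholders; this is not an
|OBSERVABILITY| API.

.. code-block:: python

   import base64
   import urllib.request
   from urllib.parse import urlsplit


   def basic_auth_header(user_id: str, password: str) -> dict[str, str]:
       """Hypothetical helper: build an HTTP Basic 'Authorization' header (RFC 7617)."""
       # RFC 7617 forbids ':' in the user-id, since ':' separates it from the password.
       if ":" in user_id:
           raise ValueError("user-id must not contain ':'")
       # Join as "user-id:password", encode as UTF-8, then Base64-encode the bytes.
       token = base64.b64encode(f"{user_id}:{password}".encode("utf-8")).decode("ascii")
       return {"Authorization": f"Basic {token}"}


   def authenticated_request(url: str, user_id: str, password: str) -> urllib.request.Request:
       """Hypothetical helper: prepare (but do not send) a Basic-auth request."""
       # Base64 is trivially reversible, so refuse to attach credentials to plain HTTP.
       if urlsplit(url).scheme != "https":
           raise ValueError("send Basic-auth credentials only over HTTPS")
       return urllib.request.Request(url, headers=basic_auth_header(user_id, password))


   # Placeholder endpoint and credentials (the example pair from RFC 7617).
   req = authenticated_request("https://telemetry.example.com/", "Aladdin", "open sesame")
   print(req.get_header("Authorization"))

In practice, an HTTP client library or the backend's own tooling usually builds
this header for you; the sketch only makes the mechanism and the HTTPS
requirement visible.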