.. _section-what-is-observability:

About Connext Observability Framework
*************************************

|RTI_TM_CONNEXT_TM| |OBSERVABILITY| is a holistic solution that uses telemetry
data to provide deep visibility into the current and past states of your
|CONNEXT| applications. This visibility makes it easier to proactively identify
and resolve potential system issues, providing a higher level of confidence in
the reliable operation of the system. |OBSERVABILITY| is part of |PRO|.

.. _section-use-cases:

Use Cases
=========

**Real-Time Remote Debugging**. The framework delivers real-time access to
telemetry data generated by |CONNEXT| applications, including logs, Quality of
Service (QoS) configurations, entity status changes (for example, liveliness or
dropped samples), and discovery events. Developers can correlate system
behavior with this data to identify the cause and context of anomalies and
failures. Debugging data is typically distributed within milliseconds and is
not stored long-term, because it is primarily used for immediate
troubleshooting in
:link_tools_admin_console:`RTI Admin Console `.

**Operational Monitoring**. The framework enables continuous system health and
performance monitoring through metrics (for example, throughput), logs, and
security events. It integrates with observability backends and dashboard tools
such as Prometheus® and Grafana®, which provide visualizations and proactive
alerting. Operational monitoring data is typically generated every few seconds
to minutes and stored in time-series databases or log aggregators for long-term
analysis.

Components
==========

|OBSERVABILITY| consists of three RTI components:

* :ref:`section-monitoring-library-2` instruments |CONNEXT| applications to
  collect and forward telemetry data.
* :ref:`section-collector-service-component` aggregates telemetry data from
  multiple applications. It can forward the data to backends for storage, make
  it available to *RTI* |ADMINCONSOLE| for real-time debugging, or relay it to
  other |OCS| instances for broader distribution.
* :ref:`section-observability-dashboards` is a set of hierarchical Grafana
  dashboards that display alerts when a problem occurs and provide
  visualizations to help you perform root cause analysis. The dashboards get
  the telemetry data from a Prometheus server and the logs from a Grafana Loki
  server.

Telemetry Data
==============

Telemetry data provides insight into the internal state and behavior of
|CONNEXT| applications. Telemetry data can be generated at the application,
middleware, and system levels. Regardless of the level, telemetry data can be
categorized as observables (such as metrics), logs, security events, or traces.

.. Important::
    In this release, |OBSERVABILITY| only supports middleware telemetry
    (observables, logs, and security events) and application logs.

For details, see the :ref:`section-telemetry-index` chapter.

How Observability Framework Works
=================================

The RTI components that make up the framework work together to collect and
distribute |CONNEXT| application telemetry data.

.. figure:: static/how_framework_works.png
    :alt: How Observability Works
    :name: Figure - How Observability Works
    :align: center
    :figWidth: 100%

Distribution of Telemetry Data
------------------------------

Each |CONNEXT| application is instrumented with |MONITORINGLIBRARY2|, which
collects telemetry data and sends it to a |OCS| instance. This service can
forward the data directly to telemetry backends and monitoring tools, or relay
it to other |OCS| instances for further distribution.

For details on telemetry data distribution, see the
:ref:`section-monitoring-library-2` and
:ref:`section-collector-service-component` chapters.


Telemetry Backends
------------------

|OCS| provides native integration with `Prometheus `_, as the time-series
database that stores |CONNEXT| metrics, and `Grafana Loki® `_, as the log
aggregation system that stores |CONNEXT| logs and security events.

Integration with other backends is possible using `OpenTelemetry `_ and the
`OpenTelemetry Collector `_.

For more information, see the :ref:`section-collector-service-component`
chapter.

Remote Debugging
----------------

|OCS| can be configured to provide telemetry data to
:link_tools_admin_console:`RTI Admin Console ` via a WebSocket API. This
configuration enables real-time debugging of |CONNEXT| systems, even when they
are running across different networks.

For more information, see :link_tools_admin_console:`Remote Debugging ` in the
*RTI* |ADMINCONSOLE_UM|.

Control and Selection of Telemetry Data
---------------------------------------

Distributed system components can generate large volumes of telemetry data, but
not all of it is necessary for effective problem detection. |OBSERVABILITY|
enables you to control the amount of telemetry data that is generated,
forwarded, and stored.

You can manage these settings both through an initial configuration and at run
time. For metrics, logs, and security events sent to third-party backends, you
can manage the data flow using the :ref:`Collector Service REST API `. For
telemetry data sent to |ADMINCONSOLE|, data is collected and forwarded only
when |ADMINCONSOLE| is actively consuming it.

Security
--------

|OBSERVABILITY| provides mechanisms to secure the telemetry data generated by
|CONNEXT| applications and forwarded to telemetry backends or |ADMINCONSOLE|.

* Data in transit is secured using the
  :link_connext_dds_secure_um:`RTI Security Plugins ` and Basic Authentication
  over HTTPS.
* Data at rest is secured by the respective third-party telemetry backends.

For more details, see the :ref:`section-security` chapter.
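
To make the Basic Authentication mentioned above more concrete, the following
minimal Python sketch builds a standard HTTP ``Authorization`` header as
defined by RFC 7617. It is an illustration of the generic HTTP scheme only, not
of any |OCS| API. The function name ``basic_auth_header`` and the credentials
are hypothetical.

```python
import base64


def basic_auth_header(username: str, password: str) -> dict:
    """Build an HTTP Basic Authentication header (RFC 7617).

    Hypothetical helper for illustration; not part of any RTI API.
    """
    # The credentials are joined with a colon, encoded as UTF-8 bytes,
    # and then Base64-encoded.
    credentials = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return {"Authorization": f"Basic {token}"}


if __name__ == "__main__":
    # Placeholder credentials, assumed for this example only.
    print(basic_auth_header("observer", "example-password"))
```

Base64 is an encoding, not encryption, so anyone who can read the header can
recover the credentials. This is why Basic Authentication is paired with HTTPS.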