1. About Connext Observability Framework
RTI® Connext® Observability Framework is a holistic solution that uses telemetry data to provide deep visibility into the current and past states of your Connext applications. This visibility makes it easier to proactively identify and resolve potential system issues, providing a higher level of confidence in the reliable operation of the system.
Observability Framework is part of Connext Professional.
1.1. Use Cases
Real-Time Remote Debugging. The framework delivers real-time access to telemetry data generated by Connext applications, including logs, Quality of Service (QoS) configurations, entity status changes (for example, liveliness or dropped samples), and discovery events. Developers can quickly correlate system behavior with this data to identify the precise cause and context of anomalies and failures. Debugging data is typically distributed within milliseconds and is not stored long-term, as it is primarily used for immediate troubleshooting in RTI Admin Console.
Operational Monitoring. The framework enables continuous system health and performance monitoring through metrics (for example, throughput), logs, and security events. It integrates with observability backends and dashboard tools such as Prometheus® and Grafana®, which provide visualizations and alerting. Operational monitoring data is typically generated at intervals ranging from a few seconds to minutes and is stored in time-series databases or log aggregators for long-term analysis.
1.2. Components
Observability Framework consists of three RTI components:
Monitoring Library 2.0 instruments Connext applications to collect and forward telemetry data.
Collector Service aggregates telemetry data from multiple applications and either forwards it to backends for storage, makes it available to RTI Admin Console for real-time debugging, or relays it to other Collector Service instances for broader distribution.
Observability Dashboards are a set of hierarchical Grafana dashboards that display alerts when a problem occurs and provide visualizations to help perform root cause analysis. The dashboards get the metrics from a Prometheus server and the logs from a Grafana Loki server.
1.3. Telemetry Data
Telemetry data provides insight into the internal state and behavior of Connext applications. Telemetry data can be generated at the application, middleware, and system levels. Regardless of the level, telemetry data can be categorized as observables (such as metrics), logs, security events, or traces.
Important
In this release, Observability Framework only supports middleware telemetry (observables, logs, and security events) and application logs.
For details, see the Telemetry Data chapter.
1.4. How Observability Framework Works
The RTI components that make up the framework work together to collect and distribute Connext application telemetry data.
1.4.1. Distribution of Telemetry Data
Each Connext application is instrumented with Monitoring Library 2.0, which collects telemetry data and sends it to a Collector Service instance. This service can forward the data directly to telemetry backends and monitoring tools, or relay it to other Collector Service instances for further distribution.
For details on telemetry data distribution, see the Monitoring Library 2.0 and Collector Service chapters.
1.4.2. Telemetry Backends
Collector Service integrates natively with Prometheus, the time-series database that stores Connext metrics, and with Grafana Loki®, the log aggregation system that stores Connext logs and security events. Other backends can be integrated using OpenTelemetry and the OpenTelemetry Collector.
For more information, see the Collector Service chapter.
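Once metrics are stored in Prometheus, any client can read them back through the standard Prometheus HTTP API. The following is a minimal sketch of that step, not part of Observability Framework itself. The server address `http://localhost:9090`, the metric name `example_connext_metric`, and the `application` label are hypothetical placeholders; the actual metric names are listed in the Telemetry Data chapter. To stay self-contained, the sketch parses a sample response body instead of contacting a server.

```python
import json
from urllib.parse import urlencode


def build_instant_query_url(base_url: str, promql: str) -> str:
    """Return the URL of a Prometheus instant query (GET /api/v1/query)."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def parse_instant_vector(body: str) -> dict:
    """Map each series' sorted label pairs to its most recent sample value."""
    payload = json.loads(body)
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error', 'unknown error')}")
    data = payload["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"expected a vector result, got {data['resultType']}")
    values = {}
    for series in data["result"]:
        labels = tuple(sorted(series["metric"].items()))
        _timestamp, value = series["value"]  # Prometheus encodes the value as a string
        values[labels] = float(value)
    return values


# Hypothetical response body, shaped like a Prometheus instant-query reply.
SAMPLE_BODY = json.dumps({
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {
                "metric": {"__name__": "example_connext_metric", "application": "app1"},
                "value": [1700000000.0, "12.5"],
            }
        ],
    },
})

url = build_instant_query_url("http://localhost:9090", "example_connext_metric")
samples = parse_instant_vector(SAMPLE_BODY)
```

In a real deployment, the response body would come from an HTTP GET on `url`, and the same parsing function would apply unchanged.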
1.4.3. Remote Debugging
Collector Service can be configured to provide telemetry data to RTI Admin Console via a WebSocket API. This configuration enables real-time debugging of Connext systems, even when they are running across different networks.
For more information, see Remote Debugging in the RTI Admin Console User’s Manual.
1.4.4. Control and Selection of Telemetry Data
Distributed system components can generate large volumes of telemetry data, but not all of it is necessary for effective problem detection. Observability Framework enables you to control the amount of telemetry data that is generated, forwarded, and stored. You can manage these settings at run-time and via an initial configuration.
For metrics, logs, and security events sent to third-party backends, you can manage the data flow using the Collector Service REST API.
For telemetry data sent to Admin Console, data is collected and forwarded only when Admin Console is actively consuming it.
1.4.5. Security
Observability Framework provides mechanisms to secure the telemetry data that Connext applications generate and that is forwarded to telemetry backends or Admin Console.
Data in transit is secured using the RTI Security Plugins and Basic Authentication over HTTPS.
Data at rest is secured by the respective third-party telemetry backends.
For more details, see the Security chapter.
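For readers unfamiliar with HTTP Basic Authentication, the sketch below shows how a client builds the `Authorization` header defined in RFC 7617. It illustrates the general scheme only and is not the framework's own code. The helper name `basic_auth_header` is hypothetical, and the credentials are the example values from the RFC. Because Base64 is an encoding and not encryption, the header protects credentials only when it is sent over HTTPS.

```python
import base64


def basic_auth_header(username: str, password: str) -> dict:
    """Build the HTTP Basic Authentication header described in RFC 7617."""
    if ":" in username:
        # The colon separates user-id from password, so a user-id cannot contain one.
        raise ValueError("username must not contain ':'")
    credentials = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return {"Authorization": f"Basic {token}"}


# Example credentials taken from RFC 7617, not from any real deployment.
headers = basic_auth_header("Aladdin", "open sesame")
```

The resulting dictionary can be passed as request headers to any HTTP client that connects over TLS.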