1. What is Connext Observability Framework?

RTI® Connext® Observability Framework is a holistic solution that uses telemetry data to provide deep visibility into the current and past states of your Connext applications. This visibility makes it easier to proactively identify and resolve potential system issues, providing a higher level of confidence in the reliable operation of the system.

Observability Framework use cases include:

  • Debugging. Find the cause of an undesired behavior, or determine whether a feature meets performance needs during development.

  • CI/CD monitoring. Assess the performance impact of code or configuration changes.

  • Monitoring deployed applications. Confirm that your systems are running as expected and proactively fix potential performance issues.

Important

Observability Framework is an experimental product that includes example configuration files for use with several third-party components (Prometheus®, Grafana Loki™, Grafana®, NGINX®, and OpenTelemetry™ Collector). This release is an evaluation distribution; use it to explore the new observability features that support Connext applications. For support, you may contact support@rti.com.

Do not deploy any Observability Framework components in production. A production-ready version is expected to be available in a future Connext 7.3.x maintenance release.

1.1. Telemetry Data

Telemetry data can be generated at three different levels:

  • Application. Telemetry data generated when you instrument your own applications.

  • Middleware. Telemetry data generated by Connext DDS entities and infrastructure services.

  • System. DevOps telemetry such as CPU, memory, and disk I/O usage.

In this release, Observability Framework supports middleware telemetry (metrics and logs) and application logs. Future releases could support application metrics and system telemetry.

Regardless of the level, telemetry data can be categorized as:

  • Metrics. Collections of application statistics that are analyzed to understand application behavior. There are two types of metrics:

    • Counters count the number of events of a specific type; for example, the number of ACK messages sent.

    • Gauges describe the state of some part of an application as a numeric value within a specified time frame; for example, the number of samples in a queue.

  • Logs. Events captured as text or structured data.

  • Security Events. Events related to securing a distributed system.

    • Security event notifications in Observability Framework are communicated as Logs with a Syslog Facility of SECURITY_EVENT. See Logs for more information.

  • Traces. A representation of a series of causally related events that encodes the end-to-end flow of a piece of information in a software system. Traces in a distributed system are called distributed traces.
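The difference between the two metric types can be illustrated with a minimal Python sketch. The names `Counter`, `Gauge`, and `rate_per_second` are hypothetical and are not part of the Monitoring Library 2.0 API; the sketch only shows the semantics: a counter only accumulates events, while a gauge reports a current value that can rise or fall. A common way to analyze a counter is to compare two readings and turn the difference into a rate.

```python
class Counter:
    """Hypothetical counter: counts events of one type, e.g. ACK messages sent.

    A counter never decreases; it only accumulates events.
    """

    def __init__(self) -> None:
        self.value = 0

    def increment(self, amount: int = 1) -> None:
        if amount < 0:
            raise ValueError("a counter cannot decrease")
        self.value += amount


class Gauge:
    """Hypothetical gauge: holds the current state of something, e.g. samples in a queue."""

    def __init__(self) -> None:
        self.value = 0.0

    def set(self, value: float) -> None:
        # A gauge is simply overwritten with the latest observed value.
        self.value = value


def rate_per_second(earlier: int, later: int, interval_s: float) -> float:
    """Average event rate between two counter readings taken interval_s seconds apart."""
    return (later - earlier) / interval_s


acks_sent = Counter()
for _ in range(3):
    acks_sent.increment()  # one ACK message sent per iteration

queue_depth = Gauge()
queue_depth.set(5)
queue_depth.set(2)  # unlike a counter, a gauge can go down as the queue drains

print(acks_sent.value, queue_depth.value)
print(rate_per_second(120, 180, 30.0))
```

Because a counter's raw value grows for the lifetime of the application, its rate of change is usually more informative than the value itself; a gauge, by contrast, is meaningful as read.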

In this release, Observability Framework supports metrics, logs, and security events. Future releases could support traces. See Telemetry Data for more information.

1.2. Distribution of Telemetry Data

Observability Framework enables you to generate telemetry data in individual Connext applications and forward it, in a scalable way, to third-party telemetry backends such as Prometheus and Grafana Loki. For more information on the distribution of telemetry data, see Monitoring Library 2.0 and Observability Collector Service.

1.3. Flexible Storage

Observability Framework provides native integration with Prometheus as the time-series database to store Connext metrics and Grafana Loki as the log aggregation system to store Connext logs. Integration with other backends is possible through the use of OpenTelemetry and the OpenTelemetry Collector.

1.4. Visualization of Telemetry Data

In this release, Observability Framework provides a way to visualize the telemetry data collected from Connext applications using a set of Grafana dashboards. You can customize these dashboards or use them as an example to enhance and build dashboards in your preferred platform.

The Observability Dashboards only work with the Prometheus and Grafana Loki backends. Future releases could support other backends. For more information, see Observability Dashboards.

1.5. Control and Selection of Telemetry Data

Your distributed system components can produce a large amount of data, but not all of it is required for problem detection. Observability Framework enables you to control the amount of telemetry data that is generated, forwarded, and stored. You can manage these settings through an initial configuration and at run time.

See Setting the Initial Metrics and Log Configuration for information on the initial configuration of telemetry collection. See Collector Service REST API Reference for information on the remote commands that Observability Collector Service provides for changing the telemetry collection configuration at run time. See Change the Application Logging Verbosity and Change the Metric Configuration for examples of changing that configuration at run time.

1.6. Security

Observability Framework provides a way to secure the telemetry data generated by Connext applications and stored in the telemetry backends. Data in transit is secured using the Security Plugins and BASIC-Auth over HTTPS. Data at rest is secured by the third-party telemetry backends. For more information, see Security.
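To see why BASIC-Auth is paired with HTTPS, it helps to look at how any HTTP client builds a Basic `Authorization` header as defined in RFC 7617. The following Python sketch is generic and is not Observability Framework code; the function name `basic_auth_header` and the credentials are hypothetical, for illustration only.

```python
import base64


def basic_auth_header(username: str, password: str) -> str:
    """Build an HTTP Basic Authorization header value (RFC 7617).

    Hypothetical helper for illustration; not part of Observability Framework.
    """
    credentials = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(credentials).decode("ascii")


# Hypothetical credentials, for illustration only.
header = basic_auth_header("observer", "s3cret")
print(header)

# Base64 is an encoding, not encryption: anyone who can read the header
# can recover the credentials, which is why the transport must be HTTPS.
decoded = base64.b64decode(header.split(" ", 1)[1]).decode("utf-8")
print(decoded)
```

Because the credentials can be decoded by anyone who observes the request, BASIC-Auth protects data in transit only when the connection itself is encrypted.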