1. What is Connext Observability Framework?

RTI® Connext® Observability Framework™ is a holistic solution that uses telemetry data to provide deep visibility into the current and past states of your Connext applications. This visibility makes it easier to proactively identify and resolve potential system issues, providing a higher level of confidence in the reliable operation of the system.

Observability Framework use cases include:

  • Debugging. Find the cause of undesired behavior, or determine during development whether a feature meets its performance requirements.

  • CI/CD monitoring. Assess the performance impact of code or configuration changes.

  • Monitoring deployed applications. Confirm that your systems are running as expected and proactively fix potential performance issues.

Important

Observability Framework is an experimental product that includes example configuration files for use with several third-party components (Prometheus®, Grafana Loki™, Grafana®, NGINX®, and OpenTelemetry™ Collector). This release is an evaluation distribution; use it to explore the new observability features that support Connext applications. For support, contact support@rti.com.

Do not deploy any Observability Framework components in production.

1.1. Telemetry Data

Telemetry data can be generated at three different levels:

  • Application. Telemetry data generated when you instrument your own applications.

  • Middleware. Telemetry data generated by Connext DDS entities and infrastructure services.

  • System. DevOps telemetry such as CPU, memory, and disk I/O usage.

In this release, Observability Framework supports middleware telemetry. Future releases could support application and system telemetry.

Regardless of the level, telemetry data can be categorized as:

  • Metrics. Collections of application statistics that are analyzed to understand application behavior. There are two types of metrics:

    • Counters count the number of events of a specific type; for example, the number of ACK messages sent.

    • Gauges describe the state of some part of an application as a numeric value within a specified time frame; for example, the number of samples in a queue.

  • Logs. Events captured as text or structured data.

  • Security Events. Events related to securing a distributed system.

  • Traces. A representation of a series of causally related events that encode the end-to-end flow of a piece of information through a software system. Traces that span a distributed system are called distributed traces.

In this release, Observability Framework only supports metrics and logs. Future releases could support security events and traces.
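
The difference between the two metric types described above can be illustrated with a short sketch. The `Counter` and `Gauge` classes and the variable names below are hypothetical helpers written for this example; they are not part of the Observability Framework API. The sketch also assumes that a counter only ever increases, which is the usual convention in backends such as Prometheus.

```python
from dataclasses import dataclass


@dataclass
class Counter:
    """Hypothetical counter: a running total of events that only increases."""
    value: int = 0

    def inc(self, amount: int = 1) -> None:
        if amount < 0:
            raise ValueError("a counter can only increase")
        self.value += amount


@dataclass
class Gauge:
    """Hypothetical gauge: the current value of some state; it may rise or fall."""
    value: float = 0.0

    def set(self, value: float) -> None:
        self.value = value


acks_sent = Counter()   # counts events, e.g. ACK messages sent
queue_depth = Gauge()   # describes state, e.g. samples currently in a queue

queue = []
for sample in ("s1", "s2", "s3"):
    queue.append(sample)
    queue_depth.set(len(queue))

# Acknowledge two samples: the counter grows while the gauge shrinks.
for _ in range(2):
    queue.pop(0)
    acks_sent.inc()
    queue_depth.set(len(queue))

print(acks_sent.value, queue_depth.value)  # prints: 2 1
```

After three samples are queued and two are acknowledged, the counter reports the total number of ACKs ever sent, while the gauge reports only how many samples remain in the queue at that moment.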

1.2. Distribution of Telemetry Data

Observability Framework enables you to generate telemetry data in individual Connext applications and forward it, at scale, to third-party telemetry backends such as Prometheus and Grafana Loki.

1.3. Flexible Storage

Observability Framework provides native integration with Prometheus as the time-series database to store Connext metrics and Grafana Loki as the log aggregation system to store Connext logs. Integration with other backends is possible through the use of OpenTelemetry and the OpenTelemetry Collector.

1.4. Visualization of Telemetry Data

In this release, Observability Framework provides a way to visualize the telemetry data collected from Connext applications using a set of reference Grafana dashboards. You can customize these dashboards or use them as an example to enhance and build dashboards in your preferred platform.

The reference dashboards only work with the Prometheus and Grafana Loki backends. Future releases could support other backends.

1.5. Control and Selection of Telemetry Data

Your distributed system components can produce a large amount of data, but not all of this data is required for problem detection. Observability Framework enables you to control the amount of telemetry data that is generated, forwarded, and stored. You can set these controls in an initial configuration and change them at run time.

This release of Observability Framework provides a way to control the volume of logs that Connext applications generate and forward. Future releases could support control of metrics.
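
As a rough illustration of controlling logs at the two stages mentioned above (generation and forwarding), the sketch below filters messages against two independent thresholds. The `Level` values, the `LogControl` class, and its attribute names are all assumptions made for this example; they do not reflect the actual Observability Framework configuration or API.

```python
from enum import IntEnum


class Level(IntEnum):
    """Hypothetical severity levels; a lower number means a more severe message."""
    ERROR = 0
    WARNING = 1
    INFO = 2
    DEBUG = 3


class LogControl:
    """Hypothetical two-stage filter: one threshold for generation, one for forwarding."""

    def __init__(self, generate_up_to, forward_up_to):
        self.generate_up_to = generate_up_to
        self.forward_up_to = forward_up_to
        self.generated = []
        self.forwarded = []

    def log(self, level, message):
        if level > self.generate_up_to:
            return  # too verbose: the message is never generated
        self.generated.append((level, message))
        if level <= self.forward_up_to:
            self.forwarded.append((level, message))


# Initial configuration: generate up to INFO, forward only WARNING and above.
control = LogControl(generate_up_to=Level.INFO, forward_up_to=Level.WARNING)
control.log(Level.DEBUG, "cache probe")         # dropped at generation
control.log(Level.INFO, "writer matched")       # generated, not forwarded
control.log(Level.ERROR, "deadline missed")     # generated and forwarded

# Run-time change: start forwarding INFO messages as well.
control.forward_up_to = Level.INFO
control.log(Level.INFO, "reader matched")       # generated and forwarded

print(len(control.generated), len(control.forwarded))  # prints: 3 2
```

Keeping the two thresholds separate means an application can retain detailed logs locally while sending only the more severe ones to the backend, and the forwarding threshold can be relaxed later without restarting the application.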

1.6. Security

Observability Framework provides a way to secure the telemetry data generated by Connext applications and stored in the telemetry backends. Data in transit is secured by using the RTI Security Plugins and HTTP Basic authentication over HTTPS. Data at rest is secured by the third-party telemetry backends.
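
For readers unfamiliar with HTTP Basic authentication, the sketch below shows how a client builds the `Authorization` header defined in RFC 7617. The function name and the credentials are placeholders chosen for this example (the credentials are the sample pair used in the RFC); nothing here describes how Observability Framework components are configured.

```python
import base64


def basic_auth_header(username: str, password: str) -> dict:
    """Build an HTTP Basic Authorization header as described in RFC 7617."""
    credentials = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return {"Authorization": f"Basic {token}"}


# Placeholder credentials taken from the RFC 7617 example.
headers = basic_auth_header("Aladdin", "open sesame")
print(headers["Authorization"])  # prints: Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==
```

Base64 is an encoding, not encryption, so the credentials in this header are readable by anyone who can see the request. That is why Basic authentication is paired with HTTPS, which encrypts the connection that carries the header.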