7.1. What is Telemetry Data
Telemetry data provides insight into the internal state and behavior of Connext applications. Connext Observability Framework enables you to instrument your Connext applications to generate and forward this data, which is then collected, aggregated, and stored in third-party observability backends such as Prometheus (metrics) or Grafana Loki (logs). The data is also made available to the RTI Admin Console for remote debugging.
For operational monitoring, you can then visualize metrics and logs (including security events) by using RTI’s Grafana Observability Dashboards, or your own custom Grafana dashboards, to get a holistic view of your distributed system.
For remote debugging, you can use the RTI Admin Console to inspect Connext systems in which it is impractical or impossible for Admin Console to participate directly in the distributed system; for example, when the system runs over a wide-area network (WAN).
7.1.1. Levels
Telemetry data can be generated at three different levels:
Application. Telemetry data generated when you instrument your own applications.
Middleware. Telemetry data generated by Connext DDS entities and infrastructure services.
System. DevOps telemetry such as CPU, memory, and disk I/O usage.
7.1.2. Categories
Regardless of the level, telemetry data can be categorized as:
Observables. Data that can be observed and analyzed to understand the behavior of a distributed system. Observables can be categorized as:
Metrics. Collections of application-measured statistics that are analyzed to understand application behavior. The generated metric data meets industry standards and is intended for use with third-party backends, such as servers compatible with Prometheus or OpenTelemetry. There are two types of metrics:
Counters count the number of events of a specific type; for example, the number of ACK messages sent.
Gauges describe the state of some part of an application as a numeric value within a specified time frame; for example, the number of samples in a queue.
Non-Metric Data. Observables with more complex structure that cannot easily be represented as a single numeric value. They can include strings, structures, and numeric values that do not represent a measurement. Examples of non-metric data include process IDs, entity names, type specifications, and QoS policies. Currently, the only consumer of non-metric observables is Admin Console when used in remote mode. For more information on how to enable this feature, see Remote Debugging in the RTI Admin Console User’s Manual.
Logs. Events captured as text or structured data.
Security Events. Events related to securing a distributed system.
In Observability Framework, security events are reported as Logs with a Syslog Facility of SECURITY_EVENT. See Logs for more information.
Traces. A representation of a series of causally-related events that encode the end-to-end flow of a piece of information in a software system. The traces in a distributed system are called distributed traces.
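The difference between the two metric types described above (counters and gauges) can be illustrated with a minimal sketch. This is not Connext or Observability Framework code: the Counter and Gauge classes and the metric names acks_sent and queue_depth are hypothetical, and the rule that a counter never decreases follows the usual Prometheus convention rather than anything stated in this section.

```python
from dataclasses import dataclass


@dataclass
class Counter:
    """Hypothetical counter: a running count of events of one type."""
    value: int = 0

    def inc(self, amount: int = 1) -> None:
        # Assumption (Prometheus convention): a counter only increases.
        if amount < 0:
            raise ValueError("a counter can only increase")
        self.value += amount


@dataclass
class Gauge:
    """Hypothetical gauge: the current numeric state; may rise or fall."""
    value: float = 0.0

    def set(self, value: float) -> None:
        self.value = value


acks_sent = Counter()    # illustrative name: number of ACK messages sent
queue_depth = Gauge()    # illustrative name: number of samples in a queue

queue = []

# Three samples arrive: the gauge rises with the queue length.
for sample in ("a", "b", "c"):
    queue.append(sample)
    queue_depth.set(len(queue))

# Each sample is processed and acknowledged: the gauge falls back,
# while the counter keeps accumulating.
while queue:
    queue.pop(0)
    acks_sent.inc()
    queue_depth.set(len(queue))

print(acks_sent.value, queue_depth.value)
```

After the run, the counter holds the total number of acknowledgements, while the gauge reflects only the current queue length. This is why counters are typically analyzed as rates over time and gauges are read as instantaneous values.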
Important
In this release, Observability Framework supports only middleware telemetry (observables, logs, and security events) and application logs.
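Because security events are delivered as logs with a Syslog Facility of SECURITY_EVENT, a log consumer can separate them from other log traffic by checking that one field. The sketch below assumes that log records are already available as Python dictionaries with a "facility" key. That record shape, the split_security_events helper, and the non-security facility values and messages are hypothetical placeholders, not the format actually emitted by Observability Framework.

```python
from typing import Dict, Iterable, List, Tuple

# The only value taken from this section; everything else is illustrative.
SECURITY_FACILITY = "SECURITY_EVENT"

LogRecord = Dict[str, str]


def split_security_events(
    records: Iterable[LogRecord],
) -> Tuple[List[LogRecord], List[LogRecord]]:
    """Partition records into (security events, all other logs)."""
    security: List[LogRecord] = []
    other: List[LogRecord] = []
    for record in records:
        if record.get("facility") == SECURITY_FACILITY:
            security.append(record)
        else:
            other.append(record)
    return security, other


# Hypothetical records; facility names other than SECURITY_EVENT and the
# message texts are placeholders.
records = [
    {"facility": "MIDDLEWARE", "message": "participant created"},
    {"facility": "SECURITY_EVENT", "message": "authentication failed"},
    {"facility": "USER", "message": "application started"},
]

security_events, other_logs = split_security_events(records)
print(len(security_events), len(other_logs))
```

In a real deployment the same filter would more likely be expressed as a query against the log backend than as client-side code; the sketch only shows that the facility value is the distinguishing attribute.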