.. _section-what-is-observability:

.. note::

   *RTI® Connext® Observability Framework* is now considered production-ready.

   |OBSERVABILITY| includes example configuration files for 
   use with several third-party components (Prometheus®, Grafana Loki™, 
   Grafana®, NGINX®, and OpenTelemetry™ Collector). This release 
   supports |CONNEXT| applications with new observability features. For 
   support, you may contact support@rti.com.

   Feel confident deploying |OBSERVABILITY| components in 
   production environments.

What is Connext Observability Framework?
****************************************

*RTI® Connext® Observability Framework* is a holistic solution that uses 
telemetry data to provide deep visibility into the current and past states of 
your |Connext| applications. This visibility makes it easier to proactively 
identify and resolve potential system issues, providing a higher
level of confidence in the reliable operation of the system.

|OBSERVABILITY| use cases include:

-  **Debugging**. Find the cause of an undesired behavior, or determine
   whether a feature meets its performance needs during development.

-  **CI/CD monitoring**. Assess the performance impact of code or
   configuration changes.

-  **Monitoring deployed applications**. Confirm that your systems are
   running as expected and proactively fix potential performance issues.


Telemetry Data
==============

Telemetry data can be generated at three different levels:

-  **Application**. Telemetry data generated when you instrument your own
   applications.

-  **Middleware**. Telemetry data generated by |Connext| DDS entities and
   infrastructure services.

-  **System**. DevOps telemetry such as CPU, memory, and disk I/O usage.

In this release, |OBSERVABILITY| supports middleware telemetry (metrics
and logs) and application logs. Future releases could support application
metrics and system telemetry.

Regardless of the level, telemetry data can be categorized as:

-  **Metrics**. Collections of application statistics that are analyzed to
   understand application behavior. There are two types of metrics:

   -  Counters count the number of events of a specific type; for
      example, the number of ACK messages sent.

   -  Gauges describe the state of some part of an application as a
      numeric value within a specified time frame; for example, the
      number of samples in a queue.

-  **Logs**. Events captured as text or structured data.

-  **Security Events**. Events related to securing a distributed system.

   - Notifications of **Security Events** in |OBSERVABILITY| are communicated
     as **Logs** with a Syslog Facility of **SECURITY_EVENT**. See
     :ref:`section-telemetry-logs` for more information.

-  **Traces**. A representation of a series of causally-related events that
   encode the end-to-end flow of a piece of information in a software
   system. The traces in a distributed system are called *distributed
   traces*.

In this release, |OBSERVABILITY| supports metrics, logs, and security events.
Future releases could support traces. See :ref:`Telemetry Data<section-telemetry-intro>` for
more information.
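
The difference between the two metric types can be illustrated with a minimal
Python sketch. The ``Counter`` and ``Gauge`` classes below are hypothetical
helpers written for this explanation and are not part of any |Connext| API.
They only show that a counter accumulates events, while a gauge reports the
most recently observed value.

```python
class Counter:
    """Monotonically increasing count of events (hypothetical helper)."""

    def __init__(self):
        self.value = 0

    def increment(self, amount=1):
        if amount < 0:
            raise ValueError("a counter can only increase")
        self.value += amount


class Gauge:
    """Current value of some quantity; it can go up or down (hypothetical helper)."""

    def __init__(self):
        self.value = 0

    def set(self, value):
        self.value = value


acks_sent = Counter()    # for example, the number of ACK messages sent
queue_depth = Gauge()    # for example, the number of samples in a queue

for samples_in_queue in (3, 5, 2):
    acks_sent.increment()              # one more ACK was sent
    queue_depth.set(samples_in_queue)  # size of the queue right now

print(acks_sent.value, queue_depth.value)
```

After the loop, the counter holds the total number of events seen, whereas the
gauge holds only the last observed queue size.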

Distribution of Telemetry Data
==============================

|OBSERVABILITY| enables you to generate telemetry data at scale and
forward it from individual |Connext| applications to 
third-party telemetry backends like Prometheus and Grafana Loki. For more
information on the distribution of telemetry data, see :ref:`section-library-component`
and :ref:`section-collector-service-component`.

Flexible Storage
================

|OBSERVABILITY| provides native integration with Prometheus as the
time-series database to store |Connext| metrics and Grafana Loki as the
log aggregation system to store |Connext| logs. Integration with other
backends is possible through the use of `OpenTelemetry 
<https://opentelemetry.io/>`_ and the `OpenTelemetry Collector 
<https://opentelemetry.io/docs/collector/>`_.
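
To give a feel for what a stored metric looks like, the following sketch
renders one counter sample in the Prometheus text exposition format. The
metric name, the label, and the ``to_prometheus_text`` function are
assumptions made for illustration. The sketch does not describe how
|OBSERVABILITY| delivers data to Prometheus, and it does not escape special
characters in label values.

```python
def to_prometheus_text(name, metric_type, samples):
    """Render one metric family in the Prometheus text exposition format.

    samples is a list of (labels, value) pairs, where labels is a
    non-empty dict of simple string values.
    """
    lines = [f"# TYPE {name} {metric_type}"]
    for labels, value in samples:
        label_text = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
        lines.append(f"{name}{{{label_text}}} {value}")
    return "\n".join(lines) + "\n"


# Hypothetical metric and label names.
text = to_prometheus_text(
    "example_acks_sent_total", "counter", [({"writer": "w1"}, 42)]
)
print(text, end="")
```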

Visualization of Telemetry Data
===============================

In this release, |OBSERVABILITY| provides a way to visualize
the telemetry data collected from |Connext| applications using a set of
Grafana dashboards. You can customize these dashboards
or use them as an example to enhance and build dashboards in your
preferred platform.

The |DASHBOARDS| only work with the Prometheus and Grafana Loki
backends. Future releases could support other backends. For more
information, see :ref:`section-dashboards-component`.

Control and Selection of Telemetry Data
=======================================

Your distributed system components can produce a large amount of data,
but not all of this data is required for problem detection. 
|OBSERVABILITY| enables you to control the amount of telemetry 
data that is generated, forwarded, and stored. You can manage these 
settings at run-time and via an initial configuration.

See :ref:`section-setting-initial-metrics` for information on the initial
configuration of telemetry collection. See :ref:`section-collector-service-rest-api-reference`
for information on the remote commands that the |OCA| provides to change
the configuration of telemetry collection at run-time. See
:ref:`section-change-verbosity` and :ref:`section-change-metric-configuration`
for examples of changing the configuration of telemetry collection at
run-time.
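
The idea of starting with an initial selection and narrowing or widening it
later can be sketched as follows. The metric names, the pattern syntax, and the
``select_metrics`` function are hypothetical and are not the configuration
format or API of |OBSERVABILITY|. The sketch only illustrates that changing the
selection changes which telemetry is produced.

```python
from fnmatch import fnmatchcase


def select_metrics(available, enabled_patterns):
    """Return the metric names that match at least one enabled pattern."""
    return [
        name
        for name in available
        if any(fnmatchcase(name, pattern) for pattern in enabled_patterns)
    ]


# Hypothetical metric names.
available = ["writer.acks_sent", "writer.samples_in_queue", "reader.samples_lost"]

# Initial configuration: collect only writer metrics.
initial = select_metrics(available, ["writer.*"])

# Run-time change: drop one writer metric and add the reader metrics.
updated = select_metrics(available, ["writer.acks_sent", "reader.*"])

print(initial)
print(updated)
```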

Security
========

|OBSERVABILITY| provides a way to secure the telemetry data generated
by the |Connext| applications and stored in the telemetry backends. Data in 
transit is secured by using the |RTI_SP_PRODUCT| and BASIC-Auth over HTTPS. 
Data at rest is secured by the third-party telemetry backends. For more
information, see :ref:`section-security`.
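
As background on why Basic authentication is paired with HTTPS, the sketch
below builds a standard HTTP ``Authorization`` header for the Basic scheme.
The ``basic_auth_header`` function and the credentials are illustrative only
and are not part of |OBSERVABILITY|.

```python
import base64


def basic_auth_header(username, password):
    """Build the value of an HTTP Authorization header for the Basic scheme."""
    credentials = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(credentials).decode("ascii")


# Example credentials, for illustration only.
header = basic_auth_header("Aladdin", "open sesame")
print(header)
```

Base64 is an encoding, not encryption, so anyone who can read the header can
recover the credentials. Sending the header over HTTPS protects it in transit.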