RTI Connext Observability Framework
7.7.0
1. Introduction to Connext Observability Framework
1.1. Use Cases
1.2. How Observability Framework Works
1.2.1. Distribution of Telemetry Data
1.2.2. Telemetry Backends
1.2.3. Remote Debugging
1.2.4. Control and Selection of Telemetry Data
1.2.5. Security
1.3. Components
1.3.1. Monitoring Library 2.0
1.3.2. Collector Service
1.3.3. Observability Dashboards
1.4. Telemetry Data
2. Deployments
2.1. Before You Begin
2.2. Evaluation Deployment
2.3. Production Deployments
2.3.1. Single Collector Service instance
2.3.2. Single layer of Collector Service instances
2.3.3. Multiple layers of Collector Service instances
2.3.4. Multiple layers of Collector Service instances with OpenTelemetry Collector
3. Installation
3.1. Monitoring Library 2.0
3.2. Collector Service
4. Usage
4.1. Observability Framework for Production
4.1.1. Monitoring Library 2.0
4.1.2. Collector Service
4.2. Observability Framework for Evaluation
4.2.1. Components used for evaluation
4.2.2. Defining the JSON configuration file
4.2.3. Using the Observability script
4.2.3.1. Create a Docker workspace
4.2.3.2. Initialize and run Docker containers
4.2.3.3. Stop Docker containers
4.2.3.4. Start existing Docker containers
4.2.3.5. Stop and remove Docker containers
4.2.4. Configuring the Docker workspace
4.2.5. Configuring Grafana
4.2.5.1. Initial login
4.2.5.2. Configuration options
4.2.5.3. Create accounts (optional)
4.2.5.4. Change the default time range (optional)
4.2.6. Removing the Observability Framework Docker workspace
5. Monitoring Library 2.0
5.1. Enabling Monitoring Library 2.0
5.2. Setting Initial Metrics and Log Configuration
5.2.1. Enable all metrics
5.2.2. Enable a custom set of metrics
5.3. Configuring Distribution Settings
5.3.1. Setting application name
5.3.2. Changing the default observability domain ID
5.3.3. Setting Collector Service initial peers
5.4. Configuring QoS for Entities
5.5. Connecting to Collector Service Over WAN
6. Collector Service
6.1. Installation
6.1.1. HTTPS/WSS support
6.2. Usage
6.2.1. Executable
6.2.1.1. Starting Collector Service executable
6.2.1.2. Stopping Collector Service executable
6.2.1.3. Collector Service executable command-line parameters
6.2.2. Docker image
6.2.2.1. Starting Collector Service Docker image
6.2.2.2. Stopping Collector Service Docker image
6.2.3. Observability Framework evaluation
6.3. Configuration
6.3.1. Builtin configuration profiles
6.3.2. Configuration parameters
6.3.2.1. General parameters
6.3.2.2. WAN parameters
6.3.2.3. OpenTelemetry parameters
6.3.2.4. Prometheus parameters
6.3.2.5. Grafana Loki parameters
6.3.2.6. Security parameters
6.4. REST API Reference
6.4.1. Definitions
6.4.2. Root endpoint (base URL)
6.4.3. API overview
6.4.4. API reference
7. Observability Dashboards
7.1. System Status Dashboards
7.1.1. System Status Dashboard Common Elements
7.1.2. Alert Home Dashboard
7.1.3. Alert Category Dashboards
7.2. Entity List Dashboards
7.3. Entity Status List Dashboards
7.4. Entity Status Dashboards
7.5. Log Dashboards
7.5.1. Log Dashboard
7.5.2. Entity Log Dashboards
7.6. Control Dashboards
7.6.1. Log Control Dashboard
7.6.2. Metric Control Dashboards
7.6.2.1. Single Entity Metric Control Dashboards
7.6.2.2. Multiple Entity Metric Control Dashboards
8. Security
8.1. Secure Communication between Connext Applications and Collector Service
8.2. Secure Communication with Collector Service HTTP Servers
8.2.1. Secure Collector Service HTTP servers (evaluation deployment)
8.2.2. Secure Collector Service HTTP servers (production deployment)
8.2.2.1. Collector Service Docker image
8.2.2.2. Collector Service executable
8.3. Secure Communication with Third-Party Component HTTP Servers
8.3.1. Secure third-party component HTTP servers (evaluation deployment)
8.3.2. Secure third-party component HTTP servers (production deployment)
8.3.2.1. Collector Service Docker image
8.3.2.2. Collector Service executable
8.4. Generating the Observability Framework Security Artifacts
8.4.1. Generating DDS security artifacts
8.4.2. Generating HTTPS security artifacts
8.4.2.1. Preliminary steps
8.4.2.2. Generating a new root CA
8.4.2.3. Generating server certificates
8.4.2.4. BASIC-Auth password file
9. Telemetry Data
9.1. What is Telemetry Data
9.1.1. Levels
9.1.2. Categories
9.2. Resources
9.2.1. Resource Pattern Definitions
9.3. Metrics
9.3.1. Metric Pattern Definitions
9.3.2. Application Metrics
9.3.3. Participant Metrics
9.3.4. Topic Metrics
9.3.5. DataWriter Metrics
9.3.6. DataReader Metrics
9.3.7. Derived Metrics Generated by Prometheus Recording Rules
9.3.7.1. DDS Entity Proxy Metrics
9.3.7.2. Raw Error Metrics
9.3.7.3. Aggregated Error Metrics
9.3.7.4. Enable a Raw Error Metric
9.3.7.5. Custom Error Metrics
9.4. Non-Metric Observables
9.4.1. Application Observables
9.4.2. Participant Observables
9.4.3. Type Observables
9.4.4. Topic Observables
9.4.5. Publisher Observables
9.4.6. DataWriter Observables
9.4.7. Subscriber Observables
9.4.8. DataReader Observables
9.5. Logs
9.5.1. Syslog Levels and Facilities
9.5.2. Activity Context
9.5.3. Log Labels
9.5.4. Collection and Forwarding Verbosity
9.5.4.1. Changing Verbosity Levels Locally
9.5.4.2. Changing Verbosity Levels Remotely
10. Tutorial
10.1. About the Observability Example
10.1.1. Applications
10.1.2. Data Model
10.1.3. DDS Entity Mapping
10.1.4. Command-Line Parameters
10.1.4.1. Publishing Application
10.1.4.2. Subscribing Application
10.2. Before Running the Example
10.2.1. Set Up Environment Variables
10.2.2. Compile the Example
10.2.2.1. Non-Windows Systems
10.2.2.2. Windows Systems
10.2.3. Install Observability Framework
10.2.3.1. Configure Observability Framework for the Appropriate Operation Mode
10.2.4. Start the Collection, Storage, and Visualization Docker Containers
10.3. Running the Example
10.3.1. Start the Applications
10.3.2. Changing the Time Range in Dashboards
10.3.3. Simulate Sensor Failure
10.3.4. Simulate Slow Sensor Data Consumption
10.3.5. Simulate Time Synchronization Failures
10.3.6. Change the Application Logging Verbosity
10.3.7. Change the Metric Configuration
10.3.7.1. Resources used in this example
10.3.7.2. Changing metrics collected for a single DataWriter
10.3.7.3. Changing metrics collected for all DataWriters of an application
10.3.8. Remote Debugging with Admin Console
10.3.9. Close the Applications
11. Troubleshooting Observability Framework
11.1. Docker Container[s] Failed to Start
11.1.1. Check for port conflicts
11.1.2. Check that you have the correct file permissions
11.2. No Data in Dashboards
11.2.1. Check that Collector Service has discovered your applications
11.2.2. Check that Prometheus can access Collector Service
11.2.3. Check that Grafana can access Prometheus
11.2.4. Check that Grafana can access Loki
11.3. Can Collector Service run in Windows or macOS?
12. Glossary
13. Release Notes
13.1. Supported Platforms
13.2. Compatibility
13.3. Supported Docker Compose Environments
13.4. Supported Docker Environments for Collector Service
13.5. What’s New in 7.7.0
13.5.1. Improved discovery and reduced overhead with Collector Service
13.5.2. Support for Multicast Discovery
13.5.3. Reduced default metrics set in Monitoring Library 2.0, minimizing bandwidth utilization
13.5.4. Improved flexibility in security configuration for Collector Service
13.5.5. Improved isolation for observability traffic
13.5.6. New standalone Collector Service executable simplifies Observability Framework deployments
13.5.7. Monitoring Library 2.0 now enabled by default under specific conditions, making it easier to use remote debugging
13.5.8. Disallow unrealistic values of polling_period, to avoid errors collecting data for Observability Framework
13.5.9. Improved security configuration provides more robust protection for Collector Service
13.5.10. Collector Service 7.7.0 works with Monitoring Library 2.0 in release 7.3.x to improve interoperability
13.5.11. Flexible control over Monitoring Library 2.0 enablement using new environment variable
13.5.12. Monitoring Library 2.0 security enforced when using an RTI security plugin, elevating security standards
13.5.13. Third-Party Software Changes
13.5.13.1. Observability Collector Service
13.5.13.2. Docker containers for Observability Collector Service
13.6. What’s Fixed in 7.7.0
13.6.1. [Critical] Collector Service could crash if application periodic data received before application event *
13.6.2. [Major] Collector Service did not delete application event queues from WebSocket event handler when application removed *
13.6.3. [Minor] Collector Service could not use builtin QoS profiles when configured to skip default files
13.6.4. [Minor] RTI Collector Service sent samples that should have been discarded to the endpoints *
13.6.5. Vulnerabilities
13.6.5.1. [Critical] Potential crash in Collector Service when processing a malicious Set-Cookie header in an HTTP response
13.6.5.2. [Critical] Potential stack buffer overflow in Collector Service when parsing malicious XML
13.6.5.3. [Critical] Potential unauthorized local file system read in Collector Service when parsing a malicious XML configuration document
13.7. Previous Releases
13.7.1. What’s New in 7.6.0
13.7.1.1. Implemented scalable remote system debugging
13.7.1.2. More flexibility when configuring Collector Service for remote debugging
13.7.1.3. New built-in profile makes it easier to monitor Connext applications across geographically separated networks
13.7.1.4. Decreased bandwidth usage when not actively debugging remote systems
13.7.1.5. Third-Party Software Changes
13.7.2. What’s Fixed in 7.6.0
13.7.2.1. [Critical] Potential crash when removing unready resources
13.7.2.2. [Critical] Possible memory leak due to error in processing child resources
13.7.2.3. [Critical] Unbounded memory growth when using remote administration commands in Observability Framework
13.7.2.4. [Major] Query condition errors on discovering more than four applications simultaneously
13.7.2.5. [Major] Potential timeout on commands addressed to more than one application
13.7.2.6. [Minor] RTI Collector Service sent samples that should have been discarded to the endpoints
13.7.3. What’s New in 7.5.0
13.7.3.1. Visualize discovery data from applications running on remote systems (experimental)
13.7.4. What’s Fixed in 7.5.0
13.7.4.1. Crashes
13.7.4.2. Other
13.7.5. What’s Fixed in 7.4.0
13.7.5.1. Hangs
13.7.5.2. Other
13.7.5.3. Vulnerabilities
13.8. Known Issues
13.8.1. Connext applications may crash when using multiple language bindings
Copyrights and Notices