11.3. Monitoring Distribution Platform

Monitoring refers to the distribution of health status metrics from instrumented RTI services. This section describes the architecture of the monitoring capability supported in RTI Routing Service and RTI Recording Service. You will learn what type of information these applications can provide and how to access it.

RTI services provide monitoring information through a Distribution Topic, which is a DDS Topic responsible for distributing information with certain characteristics about the service resources. An RTI service provides monitoring information through the following three distribution topics:

  • ConfigDistributionTopic: Distributes metrics related to the description and configuration of a Resource. This information may be immutable or change rarely.
  • EventDistributionTopic: Distributes metrics related to Resource status notifications of asynchronous nature. This information is provided asynchronously when Resources change after the occurrence of an event.
  • PeriodicDistributionTopic: Distributes metrics related to periodic, sampling-based updates of a Resource. Information is provided periodically at a configurable publication period.

These three Topics are shared across all services for the distribution of the monitoring information. Table 11.6 provides a summary of these topics.

Table 11.6 Monitoring Distribution Topics
Topic                        Name                              Top-level Type Name
ConfigDistributionTopic      rti/service/monitoring/config     rti::service::monitoring::Config
EventDistributionTopic       rti/service/monitoring/event      rti::service::monitoring::Event
PeriodicDistributionTopic    rti/service/monitoring/periodic   rti::service::monitoring::Periodic

Figure 11.5 shows the mapping of the monitoring information into the distribution Topics. A distribution Topic is keyed on service resources categorized as keyed Resources. These are resources whose related monitoring information is provided as an instance on the distribution Topic.

Figure 11.5 Monitoring Distribution Topics of RTI Services

11.3.1. Distribution Topic Definition

All distribution Topics have a common type structure that is composed of two parts: a base type that identifies a resource object and a resource-specific type that contains actual status monitoring information.

The definition of a distribution Topic is shown in Figure 11.6.

Figure 11.6 Monitoring Distribution Topic Definition

Keyed Resource Base Type Fields

This is the base type of all distribution Topics and consists of two fields:

  • object_guid: Key field. It represents a 16-byte sequence that uniquely identifies a Keyed Resource across all the available services in the monitoring domain. Hence, the associated instance handle key hash will be the same for all distribution Topics, allowing easy correlation of a resource. It will also facilitate, as we will discuss later, easy instance data manipulation in a DataReader.
  • parent_guid: Contains the object GUID of the parent resource. This field is set to all zeros if the object is a top-level resource and therefore has no parent.

This base type, KeyedResource, is defined in [NDDSHOME]/resource/idl/ServiceCommon.idl.
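Because object_guid is the key on every distribution Topic, a monitoring application can join what it receives on the three Topics by that key. The following Python sketch illustrates the idea without using any DDS API. The correlate function, the topic labels, and the sample payloads are hypothetical and exist only for illustration.

```python
from collections import defaultdict


def correlate(samples):
    """Group (topic, object_guid, payload) tuples by object_guid.

    Returns {object_guid: {topic: latest payload seen on that topic}}.
    """
    by_resource = defaultdict(dict)
    for topic, object_guid, payload in samples:
        if len(object_guid) != 16:
            raise ValueError("object_guid must be a 16-byte sequence")
        # A later sample on the same topic replaces the earlier one.
        by_resource[object_guid][topic] = payload
    return dict(by_resource)


# Hypothetical samples for one resource seen on all three distribution Topics.
guid = bytes([0x0C]) + bytes(15)
samples = [
    ("config", guid, {"resource_id": "/example"}),
    ("event", guid, {"state": "STARTED"}),
    ("periodic", guid, {"uptime_sec": 12}),
]
merged = correlate(samples)
```

Here merged holds a single entry whose value combines the latest configuration, event, and periodic information for the resource.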

Resource-Specific Type Fields

This is the type that conveys monitoring information for a concrete resource object. Since a distribution Topic is responsible for providing information about different resource classes, the resource-specific type consists of a single field that is a Union of all the possible representations for the keyed resources that provide information on the topic.

As expected, there must be consistency between the two parts of the distribution topic type. That is, a sample for a concrete resource object must contain the resource-specific union discriminator corresponding to the resource object’s class.

11.3.1.1. Example: Monitoring of Generic Application

Assume a generic application that provides monitoring information about the modes of transports Car, Boat and Plane. Each mode is mapped to a keyed resource, each with a custom type that contains metrics specific to each class.

The monitoring distribution Topic top-level type, TransportModeDistribution, would be defined as follows, using IDL v4 notation:

#include "ServiceCommon.idl"

@nested
struct CarType {
    float speed;
    string color;
    string plate_number;
};

@nested
struct BoatType {
    float knots;
    float latitude;
    float longitude;
};

@nested
struct PlaneType {
    float ground_speed;
    int32 air_track;
};

enum TransportModeKind {
    CAR_TRANSPORT_MODE,
    BOAT_TRANSPORT_MODE,
    PLANE_TRANSPORT_MODE
};

@nested
union TransportModeUnion switch (TransportModeKind) {
    case CAR_TRANSPORT_MODE:
    CarType car;

    case BOAT_TRANSPORT_MODE:
    BoatType boat;

    case PLANE_TRANSPORT_MODE:
    PlaneType plane;
};

struct TransportModeDistribution : KeyedResource {
    TransportModeUnion value;

};

Assume now that in the monitoring domain there are three resource objects, one for each resource class: a Car object ‘CarA’, a Boat object ‘Boat1’, and a Plane object ‘PlaneX’. They all have unique resource GUIDs and each object represents an instance in the distribution Topic. Table 11.7 shows example sample values:

Table 11.7 Samples in TransportModeDistribution Topic
Field                  CarA                  Boat1                  PlaneX
object_guid            0x0C                  0xAB                   0xF2
parent_guid            0x00                  0x00                   0x00
value discriminator    CAR_TRANSPORT_MODE    BOAT_TRANSPORT_MODE    PLANE_TRANSPORT_MODE
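The consistency rule between the two parts of the type can be sketched in Python. This is only a model of the IDL above, not generated code: the BRANCH_TYPE mapping, the is_consistent method, the metric values, and the padding of the abbreviated GUIDs to 16 bytes are all assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Union


class TransportModeKind(Enum):
    CAR_TRANSPORT_MODE = 0
    BOAT_TRANSPORT_MODE = 1
    PLANE_TRANSPORT_MODE = 2


@dataclass
class CarType:
    speed: float
    color: str
    plate_number: str


@dataclass
class BoatType:
    knots: float
    latitude: float
    longitude: float


@dataclass
class PlaneType:
    ground_speed: float
    air_track: int


# Payload class selected by each discriminator value, as in TransportModeUnion.
BRANCH_TYPE = {
    TransportModeKind.CAR_TRANSPORT_MODE: CarType,
    TransportModeKind.BOAT_TRANSPORT_MODE: BoatType,
    TransportModeKind.PLANE_TRANSPORT_MODE: PlaneType,
}


@dataclass
class TransportModeDistribution:
    object_guid: bytes   # key field inherited from KeyedResource
    parent_guid: bytes   # all zeros for a top-level resource
    discriminator: TransportModeKind
    value: Union[CarType, BoatType, PlaneType]

    def is_consistent(self) -> bool:
        """True when the payload matches the branch the discriminator selects."""
        return isinstance(self.value, BRANCH_TYPE[self.discriminator])


# 'CarA' with made-up metric values and the abbreviated GUID padded to 16 bytes.
car_a = TransportModeDistribution(
    object_guid=bytes([0x0C]) + bytes(15),
    parent_guid=bytes(16),
    discriminator=TransportModeKind.CAR_TRANSPORT_MODE,
    value=CarType(speed=50.0, color="red", plate_number="ABC123"),
)

# A deliberately inconsistent sample: Boat discriminator with a Car payload.
mismatched = TransportModeDistribution(
    object_guid=bytes([0xAB]) + bytes(15),
    parent_guid=bytes(16),
    discriminator=TransportModeKind.BOAT_TRANSPORT_MODE,
    value=CarType(speed=0.0, color="blue", plate_number="XYZ789"),
)
```

In real IDL-generated code the union type enforces this pairing itself; the explicit check is shown here only to make the rule visible.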

11.3.2. DDS Entities

RTI services allow you to distribute monitoring information in any domain. For that, they create the following DDS entities:

  • A DomainParticipant on the monitoring domain.
  • A single Publisher for all DataWriters.
  • A DataWriter for each distribution Topic.

A service creates these entities with default QoS unless its user’s manual specifies other values. Services allow you to customize the QoS of the DDS entities, typically in the service monitoring configuration under the <monitoring> tag. Refer to each service’s user’s manual for details.

11.3.3. Monitoring Metrics Publication

How services publish monitoring samples depends on the distribution Topic.

11.3.3.1. Configuration Distribution Topic

There are two events that cause the publication of samples in this topic:

  • As soon as a Resource object is created. This event generates the first sample in the Topic for the resource object just created. Since these first samples are published as resources are created, they are guaranteed to be in hierarchical order; that is, the sample for a parent Resource is published before the samples for its children. When Resources are created depends on the service. Typically, Resources are created on service startup. Other cases include manual creation (e.g., through remote administration) or external event-driven creation (e.g., discovery of matching streams, in the case of AutoRoute in Routing Service).
  • On Resource object update. This event occurs when the properties of the object change due to a set or update operation (e.g., through remote administration).
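The parent-first publication order means a consumer can rebuild the resource hierarchy in a single pass over the first samples, using parent_guid and the all-zeros convention for top-level resources. The Python sketch below assumes the samples are processed in publication order; build_tree, the g helper, and the three-level hierarchy are hypothetical.

```python
ZERO_GUID = bytes(16)  # parent_guid of a top-level resource


def build_tree(config_samples):
    """Build (roots, children) from (object_guid, parent_guid) pairs.

    Raises KeyError if a child is seen before its parent.
    """
    children = {}
    roots = []
    for object_guid, parent_guid in config_samples:
        if object_guid in children:
            continue  # an update for a known resource, not a creation
        children[object_guid] = []
        if parent_guid == ZERO_GUID:
            roots.append(object_guid)
        else:
            # Parent-first order: the parent entry must already exist.
            children[parent_guid].append(object_guid)
    return roots, children


def g(n):
    """Hypothetical helper: a 16-byte GUID whose first byte is n."""
    return bytes([n]) + bytes(15)


# A made-up hierarchy: one top-level resource, one child, one grandchild.
samples = [
    (g(1), ZERO_GUID),  # top-level resource
    (g(2), g(1)),       # child of g(1)
    (g(3), g(2)),       # child of g(2)
    (g(2), g(1)),       # later update for g(2), ignored by the tree
]
roots, children = build_tree(samples)
```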

11.3.3.2. Event Distribution Topic

Services publish samples in this Topic in reaction to an internal event, such as a Resource state change. Which events exist, what information they carry, and when they occur depend heavily on the concrete service implementation.

11.3.3.3. Periodic Distribution Topic

Samples in this Topic are published periodically, according to a fixed configurable period. The metrics provided in this Topic are generated in two different ways:

  • As a snapshot of the current value, taken at the publication time (e.g., current number of matching DataReaders). This represents a simple case and the metric is typically represented with an adequate primitive member.
  • As a statistic variable generated from a set of discrete measurements, obtained periodically. This represents a continuous flow of metrics, represented with the StatisticVariable type (see Section 11.3.4.1).

There are two activities involved in the generation of the statistic variables: Calculation and Publication. All the configuration elements for these activities are available under the <monitoring> tag.

11.3.3.3.1. Calculation

The instrumented service periodically performs measurements on the metric. This activity is also known as sampling (not to be confused with data samples). The frequency of the measurements can be configured with the tag <statistics_sampling_period>. As a general recommendation, the sampling period should be a few times smaller than the publication period. A small sampling period provides more accurate statistics generation at the expense of increasing memory and CPU consumption.

11.3.3.3.2. Publication

The service periodically publishes a data sample containing a snapshot of the statistics generated during the calculation phase. The publication period can be configured with the tag <statistics_publication_period>. The value of a statistic variable corresponds to the time window of a publication period.
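As a rough illustration of how the two periods interact, the Python sketch below reduces the measurements taken during one publication window to the fields of the StatisticMetrics type described in Section 11.3.4.1. It is not the services' implementation. The function names and the measurement values are hypothetical, the use of a population standard deviation is an assumption, and count follows the Table 11.8 description (sum of the measurement values).

```python
import math


def measurements_per_window(publication_period_ms, sampling_period_ms):
    """Number of whole sampling periods that fit in one publication period."""
    return publication_period_ms // sampling_period_ms


def summarize(measurements, publication_period_ms):
    """Reduce one window of measurements to StatisticMetrics-like fields."""
    if not measurements:
        raise ValueError("at least one measurement is required")
    n = len(measurements)
    mean = sum(measurements) / n
    # Assumption: population standard deviation (divide by n, not n - 1).
    variance = sum((m - mean) ** 2 for m in measurements) / n
    return {
        "period_ms": publication_period_ms,
        "count": sum(measurements),  # per Table 11.8: sum of the measured values
        "mean": mean,
        "minimum": min(measurements),
        "maximum": max(measurements),
        "std_dev": math.sqrt(variance),
    }


# A 5 s publication period with a 1 s sampling period yields 5 measurements.
window = [10, 12, 8, 14, 16]
stats = summarize(window, publication_period_ms=5000)
```

With a shorter sampling period, each window contains more measurements, which is where the trade-off between accuracy and memory and CPU cost comes from.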

11.3.4. Monitoring Metrics Reference

This section describes the types used as common metrics across services. All the type definitions listed here are in [NDDSHOME]/resource/idl/ServiceCommon.idl.

11.3.4.1. Statistic Variable

Listing 11.3 Statistics
            @appendable @nested
            struct StatisticMetrics {
                int64 period_ms;
                int64 count;
                float mean;
                float minimum;
                float maximum;
                float std_dev;
            };            

            @appendable @nested
            struct StatisticVariable {
                StatisticMetrics publication_period_metrics;
            };

Table 11.8 StatisticMetrics
Field Name Description
period_ms Period in milliseconds at which the metrics are published.
count Sum of all the measurement values obtained during the publication period.
mean Arithmetic mean of all the measurement values during publication period.
For aggregated metrics, this value is the mean of all the aggregated metrics means.
minimum Minimum of all the measurement values during publication period.
For aggregated metrics, this value is the minimum of all the aggregated metrics minimums.
maximum Maximum of all the measurement values during publication period.
For aggregated metrics, this value is the maximum of all the aggregated metrics maximums.
std_dev Standard deviation of all the measurement values during publication period.
For aggregated metrics, this value is the standard deviation of all the aggregated metrics standard deviations.
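The aggregation rules for mean, minimum, and maximum can be sketched in Python as follows. The aggregate function and the input values are hypothetical, and the sketch covers only those three fields.

```python
def aggregate(metrics):
    """Combine per-resource metrics into one aggregated metric.

    metrics: non-empty list of dicts with 'mean', 'minimum', 'maximum'.
    """
    if not metrics:
        raise ValueError("at least one metric is required")
    means = [m["mean"] for m in metrics]
    return {
        "mean": sum(means) / len(means),                  # mean of the means
        "minimum": min(m["minimum"] for m in metrics),    # minimum of the minimums
        "maximum": max(m["maximum"] for m in metrics),    # maximum of the maximums
    }


# Two made-up child metrics aggregated into one.
combined = aggregate([
    {"mean": 10.0, "minimum": 5.0, "maximum": 15.0},
    {"mean": 20.0, "minimum": 8.0, "maximum": 30.0},
])
```

Note that a mean of means is unweighted: it equals the mean over all underlying measurements only when every aggregated metric is built from the same number of measurements.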

11.3.4.2. Host Metrics

Listing 11.4 Host Types
            @appendable @nested
            struct HostPeriodic {
                @optional StatisticVariable cpu_usage_percentage;
                @optional StatisticVariable free_memory_kb;                   
                @optional StatisticVariable free_swap_memory_kb;                    
                int32 uptime_sec;
            };            
           
            @appendable @nested
            struct HostConfig {
                BoundedString name;
                uint32 id;
                int64 total_memory_kb;
                int64 total_swap_memory_kb;
            };

Table 11.9 HostConfig
Field Name Description
name Name of the host where the service is running.
id ID of the host where the service is running.
total_memory_kb Total memory in KiloBytes of the host where the service is running.
Availability of this value is platform dependent.
total_swap_memory_kb Total swap memory in KiloBytes of the host where the service is running.
Availability of this value is platform dependent.

Table 11.10 HostPeriodic
Field Name Description
cpu_usage_percentage Statistic variable that provides the global percentage of CPU usage on the host where the service is running.
Availability of this value is platform dependent.
free_memory_kb Statistic variable that provides the amount of free memory in KiloBytes of the host where the service is running.
Availability of this value is platform dependent.
free_swap_memory_kb Statistic variable that provides the amount of free swap memory in KiloBytes of the host where the service is running.
Availability of this value is platform dependent.
uptime_sec Time in seconds elapsed since the host on which the service is running started.
Availability of this value is platform dependent.

11.3.4.3. Process Metrics

Listing 11.5 Process Types
            @appendable @nested
            struct ProcessConfig {                   
                uint64 id;
            };  
            @mutable @nested
            struct ProcessPeriodic {
                @optional StatisticVariable cpu_usage_percentage;
                @optional StatisticVariable physical_memory_kb;
                @optional StatisticVariable total_memory_kb;     
                int32 uptime_sec;
            };                       
            
Table 11.11 ProcessConfig
Field Name Description
id Identifies the process where the service is running. The meaning of this value is platform dependent.

Table 11.12 ProcessPeriodic
Field Name Description
cpu_usage_percentage Statistic variable that provides the percentage of CPU usage of the process where the service is running.
The field count of the variable contains the total CPU time that the processor spent during the publication period.
Availability of this value is platform dependent.
physical_memory_kb Statistic variable that provides the physical memory utilization in KiloBytes of the process where the service is running.
Availability of this value is platform dependent.
total_memory_kb Statistic variable that provides the virtual memory utilization in KiloBytes of the process where the service is running.
Availability of this value is platform dependent.
uptime_sec Time in seconds elapsed since the running service process started.
Availability of this value is platform dependent.

11.3.4.4. Base Entity Resource Metrics

Listing 11.6 Base Entity Types
            @mutable @nested
            struct EntityConfig {
                ResourceId resource_id;
                XmlString configuration;
            };
            @mutable @nested
            struct EntityEvent{
                EntityStateKind state;
            };

Table 11.13 EntityConfig
Field Name Description
resource_id String representation of the resource identifier associated with the entity resource.
configuration String representation of the XML configuration of the entity resource. The XML contains only child elements that are not entity resources.

Table 11.14 EntityEvent
Field Name Description
state State of the resource entity expressed as an enumeration of type EntityStateKind.

11.3.4.5. Network Performance Metrics

Listing 11.7 Network Performance Type
            @appendable @nested
            struct NetworkPerformance {
                @optional StatisticVariable samples_per_sec;
                @optional StatisticVariable bytes_per_sec;
                @optional StatisticVariable latency_millisec;
            };

Table 11.15 NetworkPerformance
Field Name Description
samples_per_sec Statistic variable that provides information about the number of samples processed (received or sent) per second.
bytes_per_sec Statistic variable that provides information about the number of bytes processed (received or sent) per second.
latency_millisec Statistic variable that provides information about the latency in milliseconds for the data processed.
The latency refers to the total time elapsed during the associated processing of the data, which depends on the type of application.