.. include:: ../../../router.1.0/srcDoc/vars.rst

.. _section-Common-Mon:

Monitoring Distribution Platform
********************************

*Monitoring* refers to the distribution of health status metrics from
instrumented |RTI_SERVICE|\s. This section describes the architecture of the
*monitoring* capability supported in |RTI_RS| and |RTI| *Recording Service*.
You will learn what type of information these applications can provide and how
to access it.

|RTI_SERVICE|\s provide monitoring information through a
*Distribution Topic*, which is a DDS |TOPIC| responsible for distributing
information with certain characteristics about the service resources. An
|RTI_SERVICE| provides monitoring information through the following **three
distribution topics**:

- *ConfigDistributionTopic*: Distributes metrics related to the description and
  configuration of a Resource. This information may be immutable or change
  rarely.
- *EventDistributionTopic*: Distributes metrics related to Resource status
  notifications of asynchronous nature. This information is provided
  asynchronously when Resources change after the occurrence of an event.
- *PeriodicDistributionTopic*: Distributes metrics related to periodic,
  sampling-based updates of a Resource. Information is provided periodically
  at a configurable publication period.

These three |TOPIC|\s are shared across all services for the distribution of
the monitoring information. :numref:`TableMonitoringDistTopics` provides a
summary of these topics.

.. list-table:: Monitoring Distribution |TOPIC|\s
    :name: TableMonitoringDistTopics
    :widths: 30 40 30
    :header-rows: 1

    * - Topic
      - Name
      - Top-level Type Name
    * - *ConfigDistributionTopic*
      - rti/service/monitoring/config
      - :litrep:`rti::service::monitoring::Config`
    * - *EventDistributionTopic*
      - rti/service/monitoring/event
      - :litrep:`rti::service::monitoring::Event`
    * - *PeriodicDistributionTopic*
      - rti/service/monitoring/periodic
      - :litrep:`rti::service::monitoring::Periodic`

:numref:`FigureMonDistributionTopics` shows the mapping of the monitoring
information into the distribution |TOPIC|\s. A distribution |TOPIC| **is
keyed** on service resources categorized as *keyed Resources*. These are
resources whose related monitoring information is provided as an instance on
the distribution |TOPIC|.

.. figure:: ../../../router.1.0/srcDoc/static/CommonMonitoringDistributionTopics.svg
    :figwidth: 70 %
    :name: FigureMonDistributionTopics
    :align: center

    Monitoring Distribution |TOPIC|\s of |RTI| Services

.. _section-Common-Mon-Topic-Definition:

Distribution Topic Definition
=============================

All distribution |TOPIC|\s have a common type structure that is composed of
two parts: a base type that identifies a resource object and a
resource-specific type that contains actual status monitoring information. The
definition of a distribution |TOPIC| is shown in
:numref:`FigureMonDistTopicDefinition`.

.. figure:: ../../../router.1.0/srcDoc/static/CommonMonDistTopicDefinition.svg
    :name: FigureMonDistTopicDefinition
    :align: center

    Monitoring Distribution |TOPIC| Definition

.. rubric:: Keyed Resource Base Type Fields

This is the base type of all distribution |TOPIC|\s and consists of two fields:

- ``object_guid``: Key field. It represents a 16-byte sequence that uniquely
  identifies a *Keyed Resource* across all the available services in the
  monitoring domain.
  Hence, the associated instance handle key hash will be the same for all
  distribution |TOPIC|\s, allowing easy correlation of a resource. It will
  also facilitate, as we will discuss later, easy instance data manipulation
  in a |DR|.
- ``parent_guid``: It contains the object GUID of the parent resource. This
  field is set to all zeros if the object is a top-level resource and thus
  has no parent.

This base type, ``KeyedResource``, is defined in
``[NDDSHOME]/resource/idl/ServiceCommon.idl``.

.. rubric:: Resource-Specific Type Fields

This is the type that conveys monitoring information for a concrete resource
object. Since a distribution |TOPIC| is responsible for providing information
about different resource classes, the resource-specific type consists of a
single field that is a **Union of all the possible representations** for the
keyed resources that provide information on the topic.

As expected, there must be consistency between the two parts of the
distribution topic type. That is, a sample for a concrete resource object must
contain the resource-specific union discriminator corresponding to the
resource object's class.

Example: Monitoring of Generic Application
------------------------------------------

Assume a generic application that provides monitoring information about the
modes of transport ``Car``, ``Boat`` and ``Plane``. Each mode is mapped to a
keyed resource, each with a custom type that contains metrics specific to each
class. The monitoring distribution |TOPIC| top-level type,
``TransportModeDistribution``, would be defined as follows, using IDL v4
notation:

.. code-block:: idl

    #include "ServiceCommon.idl"

    @nested
    struct CarType {
        float speed;
        string color;
        string plate_number;
    };

    @nested
    struct BoatType {
        float knots;
        float latitude;
        float longitude;
    };

    @nested
    struct PlaneType {
        float ground_speed;
        int32 air_track;
    };

    enum TransportModeKind {
        CAR_TRANSPORT_MODE,
        BOAT_TRANSPORT_MODE,
        PLANE_TRANSPORT_MODE
    };

    @nested
    union TransportModeUnion switch (TransportModeKind) {
        case CAR_TRANSPORT_MODE:
            CarType car;
        case BOAT_TRANSPORT_MODE:
            BoatType boat;
        case PLANE_TRANSPORT_MODE:
            PlaneType plane;
    };

    struct TransportModeDistribution : KeyedResource {
        TransportModeUnion value;
    };

Assume now that in the monitoring domain there are three resource objects, one
for each resource class: a ``Car`` object 'CarA', a ``Boat`` object 'Boat1',
and a ``Plane`` object 'PlaneX'. They all have unique resource GUIDs and each
object represents an instance in the distribution |TOPIC|.
:numref:`TableExampleTransportModeSamples` shows example sample values:

.. list-table:: Samples in TransportModeDistribution |TOPIC|
    :name: TableExampleTransportModeSamples
    :widths: 20 20 30 30
    :header-rows: 1

    * -
      - CarA
      - Boat1
      - PlaneX
    * - ``object_guid``
      - 0x0C
      - 0xAB
      - 0xF2
    * - ``parent_guid``
      - 0x00
      - 0x00
      - 0x00
    * - ``value`` discriminator
      - CAR_TRANSPORT_MODE
      - BOAT_TRANSPORT_MODE
      - PLANE_TRANSPORT_MODE

.. _section-Common-Mon-DdsEntities:

DDS Entities
============

|RTI_SERVICE|\s allow you to distribute monitoring information in any domain.
For that, they create the following DDS entities:

- A |DP| on the monitoring domain.
- A single |PUB| for all |DW|\s.
- A |DW| for each distribution |TOPIC|.

A service creates these entities with default QoS unless the corresponding
service user's manual specifies other values.

Services allow you to customize the QoS of the DDS entities, typically in the
service monitoring configuration under the ```` tag. You will need to refer to
each service's user's manual.

.. _section-Common-Mon-Publication:

Monitoring Metrics Publication
==============================

How services publish monitoring samples depends on the distribution |TOPIC|.

.. _section-Common-Mon-Publication-Config:

Configuration Distribution Topic
--------------------------------

There are two events that cause the publication of samples in this topic:

- As soon as a *Resource* object is created. This event generates the first
  sample in the |TOPIC| for the resource object just created. Since these
  first samples are published as resources are created, they are guaranteed to
  be in hierarchical order; that is, the sample for a parent *Resource* is
  published before the samples for its children.

  When *Resources* are created depends on the service. Typically, *Resources*
  are created on service startup. Other cases include manual creation (e.g.,
  through remote administration) or external event-driven creation (e.g.,
  discovery of matching streams, in the case of |AR| in |RS|).

- On *Resource* object update. This event occurs when the properties of the
  object change due to a set or update operation (e.g., through remote
  administration).

Event Distribution Topic
------------------------

Services publish samples in this |TOPIC| in reaction to an internal event,
such as a *Resource* state change. Which events are reported, what information
they carry, and when they occur depend heavily on the concrete service
implementation.

Periodic Distribution Topic
---------------------------

Samples in this |TOPIC| are published periodically, according to a fixed
configurable period. The metrics provided in this |TOPIC| are generated in two
different ways:

- As a snapshot of the current value, taken at the publication time (e.g.,
  current number of matching |DR|\s). This represents a simple case and the
  metric is typically represented with an adequate primitive member.
- As a *statistic variable* generated from a set of discrete measurements,
  obtained periodically.
  This represents a *continuous* flow of metrics, represented with the
  ``StatisticVariable`` type (see
  :numref:`section-Common-Mon-Metrics-StatVar`).

There are two activities involved in the generation of the statistic
variables: Calculation and Publication. All the configuration elements for
these activities are available under the ```` tag.

Calculation
^^^^^^^^^^^

The instrumented service periodically performs measurements on the metric.
This activity is also known as *sampling* (not to be confused with data
samples). The frequency of the measurements can be configured with the tag
````.

As a general recommendation, the sampling period should be a few times smaller
than the publication period. A small sampling period provides more accurate
statistics generation at the expense of increasing memory and CPU consumption.

Publication
^^^^^^^^^^^

The service periodically publishes a data sample containing a snapshot of the
statistics generated during the calculation phase. The publication period can
be configured with the tag ````. The value of a statistic variable corresponds
to the time window of a publication period.

Monitoring Metrics Reference
============================

This section describes the types used as common metrics across services. All
the type definitions listed here are in
``[NDDSHOME]/resource/idl/ServiceCommon.idl``.

.. _section-Common-Mon-Metrics-StatVar:

Statistic Variable
------------------

.. literalinclude:: ../../resource/idl/ServiceCommon.idl
    :caption: Statistics
    :start-after: /* Statistics */
    :end-before: /* CountStatus */

.. list-table:: ``StatisticMetrics``
    :name: TableCommonStatVar
    :widths: 20 80
    :header-rows: 1

    * - Field Name
      - Description
    * - :litrep:`period_ms`
      - Period in milliseconds at which the metrics are published.
    * - :litrep:`count`
      - Sum of all the measurement values obtained during the publication
        period.
    * - :litrep:`mean`
      - Arithmetic mean of all the measurement values during the publication
        period.
        |br| For aggregated metrics, this value is the mean of all the
        aggregated metrics means.
    * - :litrep:`min`
      - Minimum of all the measurement values during the publication period.
        |br| For aggregated metrics, this value is the minimum of all the
        aggregated metrics minimums.
    * - :litrep:`max`
      - Maximum of all the measurement values during the publication period.
        |br| For aggregated metrics, this value is the maximum of all the
        aggregated metrics maximums.
    * - :litrep:`std_dev`
      - Standard deviation of all the measurement values during the
        publication period.
        |br| For aggregated metrics, this value is the standard deviation of
        all the aggregated metrics minimums.

....

Host Metrics
------------

.. literalinclude:: ../../resource/idl/ServiceCommon.idl
    :caption: Host Types
    :start-after: /* Host */
    :end-before: /* NetworkPerformance */

.. list-table:: ``HostConfig``
    :name: TableCommonHostConfig
    :widths: 20 80
    :header-rows: 1

    * - Field Name
      - Description
    * - :litrep:`name`
      - Name of the host where the service is running.
    * - :litrep:`id`
      - ID of the host where the service is running.
    * - :litrep:`total_memory_kb`
      - Total memory in kilobytes of the host where the service is running.
        |br| Availability of this value is platform dependent.
    * - :litrep:`total_swap_memory_kb`
      - Total swap memory in kilobytes of the host where the service is
        running.
        |br| Availability of this value is platform dependent.

....

.. list-table:: ``HostPeriodic``
    :name: TableCommonHostPeriodic
    :widths: 20 80
    :header-rows: 1

    * - Field Name
      - Description
    * - :litrep:`cpu_usage_percentage`
      - Statistic variable that provides the global percentage of CPU usage on
        the host where the service is running.
        |br| Availability of this value is platform dependent.
    * - :litrep:`free_memory_kb`
      - Statistic variable that provides the amount of free memory in
        kilobytes of the host where the service is running.
        |br| Availability of this value is platform dependent.
    * - :litrep:`free_swap_memory_kb`
      - Statistic variable that provides the amount of free swap memory in
        kilobytes of the host where the service is running.
        |br| Availability of this value is platform dependent.
    * - :litrep:`uptime_sec`
      - Time in seconds elapsed since the host on which the service is running
        started.
        |br| Availability of this value is platform dependent.

Process Metrics
---------------

.. literalinclude:: ../../resource/idl/ServiceCommon.idl
    :caption: Process Types
    :start-after: /* Process */
    :end-before: /* Host */

.. list-table:: ``ProcessConfig``
    :name: TableCommonProcessConfig
    :widths: 20 80
    :header-rows: 1

    * - Field Name
      - Description
    * - :litrep:`id`
      - Identifies the process where the service is running. The meaning of
        this value is platform dependent.

....

.. list-table:: ``ProcessPeriodic``
    :name: TableCommonProcessPeriodic
    :widths: 20 80
    :header-rows: 1

    * - Field Name
      - Description
    * - :litrep:`cpu_usage_percentage`
      - Statistic variable that provides the percentage of CPU usage of the
        process where the service is running.
        |br| The field ``count`` of the variable contains the total CPU time
        that the processor spent during the publication period.
        |br| Availability of this value is platform dependent.
    * - :litrep:`physical_memory_kb`
      - Statistic variable that provides the physical memory utilization in
        kilobytes of the process where the service is running.
        |br| Availability of this value is platform dependent.
    * - :litrep:`total_memory_kb`
      - Statistic variable that provides the virtual memory utilization in
        kilobytes of the process where the service is running.
        |br| Availability of this value is platform dependent.
    * - :litrep:`uptime_sec`
      - Time in seconds elapsed since the running service process started.
        |br| Availability of this value is platform dependent.

.. _section-Common-Mon-Metrics-Entity:

Base Entity Resource Metrics
-----------------------------

.. literalinclude:: ../../resource/idl/ServiceCommon.idl
    :caption: Base Entity Types
    :start-after: /* Entity */
    :end-before: /* DistributionTopicKind */

.. list-table:: ``EntityConfig``
    :name: TableCommonEntityConfig
    :widths: 20 80
    :header-rows: 1

    * - Field Name
      - Description
    * - :litrep:`resource_id`
      - String representation of the resource identifier associated with the
        entity resource.
    * - :litrep:`configuration`
      - String representation of the XML configuration of the entity resource.
        The XML contains only child elements that are not entity resources.

....

.. list-table:: ``EntityEvent``
    :name: TableCommonEntityEvent
    :widths: 20 80
    :header-rows: 1

    * - Field Name
      - Description
    * - :litrep:`state`
      - State of the resource entity expressed as an enumeration of type
        ``EntityStateKind``.

.. _section-Common-Mon-Metrics-NetworkPerformance:

Network Performance Metrics
---------------------------

.. literalinclude:: ../../resource/idl/ServiceCommon.idl
    :caption: Network Performance Type
    :start-after: /* NetworkPerformance */
    :end-before: /* Entity */

.. list-table:: ``NetworkPerformance``
    :name: TableCommonNetworkPerformance
    :widths: 20 80
    :header-rows: 1

    * - Field Name
      - Description
    * - :litrep:`samples_per_sec`
      - Statistic variable that provides information about the number of
        samples processed (received or sent) per second.
    * - :litrep:`bytes_per_sec`
      - Statistic variable that provides information about the number of bytes
        processed (received or sent) per second.
    * - :litrep:`latency_millisec`
      - Statistic variable that provides information about the latency in
        milliseconds for the data processed.
        |br| The latency refers to the total time elapsed during the
        associated processing of the data, which depends on the type of
        application.
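
To illustrate how a statistic variable such as ``samples_per_sec`` could
summarize the measurements taken during one publication period, the following
Python sketch computes the fields described for ``StatisticMetrics``. It is
not the services' implementation: the ``StatisticMetricsSketch`` class and the
``summarize`` function are hypothetical names introduced here, and the use of
the population standard deviation is an assumption, since this section does
not state which estimator the services use.

.. code-block:: python

    import math
    from dataclasses import dataclass
    from typing import Sequence


    @dataclass
    class StatisticMetricsSketch:
        """Hypothetical Python mirror of the StatisticMetrics fields."""
        period_ms: int
        count: float
        mean: float
        min: float
        max: float
        std_dev: float


    def summarize(measurements: Sequence[float],
                  period_ms: int) -> StatisticMetricsSketch:
        """Summarize the measurements sampled during one publication period."""
        if not measurements:
            raise ValueError("at least one measurement is required")
        n = len(measurements)
        total = math.fsum(measurements)  # 'count' is the sum of the values
        mean = total / n
        # Assumption: population variance (divide by n, not n - 1).
        variance = math.fsum((m - mean) ** 2 for m in measurements) / n
        return StatisticMetricsSketch(
            period_ms=period_ms,
            count=total,
            mean=mean,
            min=min(measurements),
            max=max(measurements),
            std_dev=math.sqrt(variance),
        )


    # Three measurements taken within a 5-second publication period.
    snapshot = summarize([100.0, 200.0, 300.0], period_ms=5000)
    print(snapshot.count, snapshot.mean, snapshot.min, snapshot.max)

With a sampling period a few times smaller than the publication period, each
published snapshot would be built from a handful of measurements in this way.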