Even in the cloud industry, providers must use a multi-step process for billing. The required steps to bill for usage in a cloud environment are metering, rating, and billing. Because a provider's requirements may be too specific for a shared solution, rating and billing cannot be designed as a common module that satisfies all providers. However, providing users with measurements on cloud services is required to meet the measured service definition of cloud computing.
The Telemetry service was originally designed to support billing systems for OpenStack cloud resources. This project only covers the metering portion of the required processing for billing. This service collects information about the system and stores it in the form of samples in order to provide data about anything that can be billed.
In addition to system measurements, the Telemetry service also captures event notifications triggered when various actions are executed in the OpenStack system. This data is captured as Events and stored alongside metering data.
The list of meters is continuously growing, which makes it possible to use the data collected by Telemetry for purposes other than billing. For example, the autoscaling feature in the Orchestration service can be triggered by alarms that are set and evaluated within Telemetry.
The sections in this document contain information about the architecture and usage of Telemetry. The first section contains a brief summary about the system architecture used in a typical OpenStack deployment. The second section describes the data collection mechanisms. You can also read about alarming to understand how alarm definitions can be posted to Telemetry and what actions can happen if an alarm is raised. The last section contains a troubleshooting guide, which mentions error situations and possible solutions to the problems.
You can retrieve the collected samples in three different ways: with the REST API, with the command-line interface, or with the Metering tab on an OpenStack dashboard.
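For example, the REST API can be exercised with a bare HTTP request, as in the following sketch (the controller host name and the API port 8777 are illustrative and depend on your deployment):

$ OS_TOKEN=$(openstack token issue -f value -c id)
$ curl -H "X-Auth-Token: $OS_TOKEN" "http://controller:8777/v2/meters"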
The Telemetry service uses an agent-based architecture. Several modules combine their responsibilities to collect data, store samples in a database, or provide an API service for handling incoming requests.
The Telemetry service is built from the following agents and services:

ceilometer-api
    Presents aggregated metering data to consumers (such as billing engines and analytics tools).
ceilometer-polling
    Polls for different kinds of meter data by using the polling plug-ins (pollsters) registered in different namespaces. It provides a single polling interface across different namespaces.
ceilometer-agent-central
    Polls the public RESTful APIs of other OpenStack services such as the Compute service and Image service, in order to keep tabs on resource existence, by using the polling plug-ins (pollsters) registered in the central polling namespace.
ceilometer-agent-compute
    Polls the local hypervisor or libvirt daemon to acquire performance data for the local instances, and emits the data as AMQP messages, by using the polling plug-ins (pollsters) registered in the compute polling namespace.
ceilometer-agent-ipmi
    Polls the local node with IPMI support, in order to acquire IPMI sensor data and Intel Node Manager data, by using the polling plug-ins (pollsters) registered in the IPMI polling namespace.
ceilometer-agent-notification
    Consumes AMQP messages from other OpenStack services.
ceilometer-collector
    Consumes AMQP notifications from the agents, then dispatches the data to the appropriate data store.
ceilometer-alarm-evaluator
    Determines when alarms fire due to the associated statistic trend crossing a threshold over a sliding time window.
ceilometer-alarm-notifier
    Initiates alarm actions, for example calling out to a webhook with a description of the alarm state transition.
The ceilometer-polling service is available since the Kilo release. It is intended to replace ceilometer-agent-central, ceilometer-agent-compute, and ceilometer-agent-ipmi.

The ceilometer-api and ceilometer-collector services are no longer supported since the Ocata release.

The ceilometer-alarm-evaluator and ceilometer-alarm-notifier services were removed in the Mitaka release.
Except for the ceilometer-agent-compute and ceilometer-agent-ipmi services, all the other services are placed on one or more controller nodes.
The Telemetry architecture depends heavily on the AMQP service, both for consuming notifications coming from other OpenStack services and for internal communication.
The other key external component of Telemetry is the database, where events, samples, alarm definitions, and alarms are stored.
Multiple database back ends can be configured in order to store events, samples, and alarms separately. We recommend Gnocchi for time-series storage.
The list of supported database back ends:
The Telemetry service collects information about the virtual machines, which requires a close connection to the hypervisor running on the compute hosts.
The following is a list of supported hypervisors.
The following hypervisors are supported via libvirt:
Telemetry is able to retrieve information from OpenStack Networking and external networking services:
OpenStack Networking:
Basic network meters
Firewall-as-a-Service (FWaaS) meters
Load-Balancer-as-a-Service (LBaaS) meters
VPN-as-a-Service (VPNaaS) meters
SDN controller meters:
The Telemetry service uses OpenStack Identity for authenticating and authorizing users. The required configuration options are listed in the Telemetry section in the OpenStack Configuration Reference.
The system uses two roles: admin and non-admin. Authorization happens before processing each API request. The amount of returned data depends on the role of the requestor.

The creation of alarm definitions also depends on the role of the user who initiated the action. Further details about alarm handling can be found in Section 10.5, “Alarms” in this guide.
The main responsibility of Telemetry in OpenStack is to collect information about the system that can be used by billing systems or interpreted by analytic tooling. Telemetry in OpenStack originally focused on the counters used for billing, but the range of recorded measurements is continuously growing.
Collected data can be stored in the form of samples or events in the supported databases, which are listed in Section 10.1.1, “Supported databases”.
Samples can have various sources, which depend on, and adapt to, the needs and configuration of Telemetry. Consequently, the Telemetry service uses multiple methods to collect data samples.
The available data collection mechanisms are:
Processing notifications from other OpenStack services, by consuming messages from the configured message queue system.
Retrieving information directly from the hypervisor or from the host machine using SNMP, or by using the APIs of other OpenStack services.
Pushing samples via the RESTful API of Telemetry.
All OpenStack services send notifications about the executed operations or system state. Several notifications carry information that can be metered, for example, the CPU time of a VM instance created by the OpenStack Compute service.
The notification agent is the component responsible for consuming notifications. It consumes messages from the message bus and transforms notifications into events and measurement samples.
Since the Liberty release, the notification agent is responsible for all data processing such as transformations and publishing. After processing, the data is sent via AMQP to the collector service or any external service. These external services persist the data in configured databases.
The different OpenStack services emit several notifications about the various types of events that happen in the system during normal operation. Not all these notifications are consumed by the Telemetry service, as the intention is only to capture the billable events and notifications that can be used for monitoring or profiling purposes. The notification agent filters by the event type. Each notification message contains the event type. The following table contains the event types by each OpenStack service that Telemetry transforms into samples.
OpenStack service | Event types | Note
---|---|---
OpenStack Compute | scheduler.run_instance.scheduled, scheduler.select_destinations, compute.instance.* | For a more detailed list of Compute notifications, please check the System Usage Data wiki page.
Bare metal service | hardware.ipmi.* |
OpenStack Image | image.update, image.upload, image.delete, image.send | The required configuration for the Image service can be found in the Configure the Image service for Telemetry section in the Installation Tutorials and Guides.
OpenStack Networking | floatingip.create.end, floatingip.update.*, floatingip.exists, network.create.end, network.update.*, network.exists, port.create.end, port.update.*, port.exists, router.create.end, router.update.*, router.exists, subnet.create.end, subnet.update.*, subnet.exists, l3.meter |
Orchestration service | orchestration.stack.create.end, orchestration.stack.update.end, orchestration.stack.delete.end, orchestration.stack.resume.end, orchestration.stack.suspend.end |
OpenStack Block Storage | volume.exists, volume.create.*, volume.delete.*, volume.update.*, volume.resize.*, volume.attach.*, volume.detach.*, snapshot.exists, snapshot.create.*, snapshot.delete.*, snapshot.update.*, volume.backup.create.*, volume.backup.delete.*, volume.backup.restore.* | The required configuration for the Block Storage service can be found in the Add the Block Storage service agent for Telemetry section in the Installation Tutorials and Guides.
Some services require additional configuration to emit the notifications, for example using the correct control exchange on the message queue. These configuration needs are referenced in the above table for each OpenStack service that requires it.
Specific notifications from the Compute service are important for administrators and users. Configuring nova_notifications in the nova.conf file allows administrators to respond to events rapidly. For more information on configuring notifications for the compute service, see Telemetry services in the Installation Tutorials and Guides.
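The exact option names vary by release; as an illustration, an installation-guide style nova.conf configuration of this era looks along the following lines (verify the names against your release's Installation Tutorials and Guides):

[DEFAULT]
instance_usage_audit = True
instance_usage_audit_period = hour
notify_on_state_change = vm_and_task_state
notification_driver = messagingv2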
Prior to the Kilo release, when the store_events option was set to True in ceilometer.conf, the notification agent needed database access in order to work properly.
This agent is responsible for collecting resource usage data of VM instances on individual compute nodes within an OpenStack deployment. This mechanism requires closer interaction with the hypervisor, therefore the collection of the related meters is fulfilled by a separate agent type, which is placed on the host machines to retrieve this information locally.
A Compute agent instance has to be installed on each and every compute node. Installation instructions can be found in the Install the Compute agent for Telemetry section in the Installation Tutorials and Guides.
Just like the central agent, this component also does not need a direct database connection. The samples are sent via AMQP to the notification agent.
The list of supported hypervisors can be found in Section 10.1.2, “Supported hypervisors”. The Compute agent uses the API of the hypervisor installed on the compute hosts. Therefore, the supported meters may differ for each virtualization back end, as each inspection tool provides a different set of meters.
The list of collected meters can be found in Section 10.6.1, “OpenStack Compute”. The support column provides the information about which meter is available for each hypervisor supported by the Telemetry service.
Telemetry supports libvirt, which hides the underlying hypervisor.
A subset of Object Store statistics requires additional middleware to be installed behind the proxy of Object Store. This additional component emits notifications containing data-flow-oriented meters, namely the storage.objects.(incoming|outgoing).bytes values. These meters are listed in Section 10.6.7, “OpenStack Object Storage”, marked with notification as origin.
The instructions on how to install this middleware can be found in Configure the Object Storage service for Telemetry section in the Installation Tutorials and Guides.
Telemetry provides HTTP request and API endpoint counting capability in OpenStack. This is achieved by storing a sample for each event marked as audit.http.request, audit.http.response, http.request, or http.response.

It is recommended that these notifications be consumed as events rather than samples to better index the appropriate values and avoid massive load on the Metering database. If preferred, Telemetry can consume these events as samples if the services are configured to emit http.* notifications.
The Telemetry service is intended to store a complex picture of the infrastructure. This goal requires additional information beyond what is provided by the events and notifications published by each service. Some information, like the resource usage of VM instances, is not emitted directly.
Therefore Telemetry gathers this data by polling the infrastructure, including the APIs of the different OpenStack services and other assets, like hypervisors. The latter case requires closer interaction with the compute hosts, so Telemetry uses an agent-based architecture to fulfill the data collection requirements.
There are three types of agents supporting the polling mechanism: the compute agent, the central agent, and the IPMI agent. Under the hood, all types of polling agents are the same ceilometer-polling agent, except that they load different polling plug-ins (pollsters) from different namespaces to gather data. The following subsections give further information regarding the architectural and configuration details of these components.
Running ceilometer-agent-compute is exactly the same as:
$ ceilometer-polling --polling-namespaces compute
Running ceilometer-agent-central is exactly the same as:
$ ceilometer-polling --polling-namespaces central
Running ceilometer-agent-ipmi is exactly the same as:
$ ceilometer-polling --polling-namespaces ipmi
In addition to loading all the polling plug-ins registered in the specified namespaces, the ceilometer-polling agent can also specify the polling plug-ins to be loaded by using the pollster-list option:
$ ceilometer-polling --polling-namespaces central \
  --pollster-list image image.size storage.*
HA deployment is NOT supported if the pollster-list option is used.
The ceilometer-polling service is available since the Kilo release.
This agent is responsible for polling public REST APIs to retrieve additional information on OpenStack resources not already surfaced via notifications, and also for polling hardware resources over SNMP.
The following services can be polled with this agent:
OpenStack Networking
OpenStack Object Storage
OpenStack Block Storage
Hardware resources via SNMP
Energy consumption meters via Kwapi framework
To install and configure this service, use the Add the Telemetry service section in the Installation Tutorials and Guides.
The central agent does not need a direct database connection. The samples collected by this agent are sent via AMQP to the notification agent to be processed.
Prior to the Liberty release, data from the polling agents was processed locally and published accordingly rather than by the notification agent.
This agent is responsible for collecting IPMI sensor data and Intel Node Manager data on individual compute nodes within an OpenStack deployment. This agent requires an IPMI capable node with the ipmitool utility installed, which is commonly used for IPMI control on various Linux distributions.
An IPMI agent instance could be installed on each and every compute node with IPMI support, except when the node is managed by the Bare metal service and the conductor.send_sensor_data option is set to true in the Bare metal service. It does no harm to install this agent on a compute node without IPMI or Intel Node Manager support, as the agent checks for the hardware and, if none is available, returns empty data. However, it is suggested that you install the IPMI agent only on IPMI-capable nodes for performance reasons.
Just like the central agent, this component also does not need direct database access. The samples are sent via AMQP to the notification agent.
The list of collected meters can be found in Section 10.6.2, “Bare metal service”.
Do not deploy both the IPMI agent and the Bare metal service on one compute node. If conductor.send_sensor_data is set, this misconfiguration causes duplicated IPMI sensor samples.
Both the polling agents and notification agents can run in an HA deployment, which means that multiple instances of these services can run in parallel with workload partitioning among these running instances.
The Tooz library provides the coordination within the groups of service instances. It provides an API above several back ends that can be used for building distributed applications.
Tooz supports various drivers including the following back end solutions:
You must configure a supported Tooz driver for the HA deployment of the Telemetry services.
For information about the required configuration options that have to be set in the ceilometer.conf configuration file for both the central and Compute agents, see the Coordination section in the OpenStack Configuration Reference.
In the Kilo release, workload partitioning support was added to the notification agent. This is particularly useful as the pipeline processing is now handled exclusively by the notification agent, which may result in a heavier load.
To enable workload partitioning by notification agent, the backend_url option must be set in the ceilometer.conf configuration file. Additionally, workload_partitioning should be enabled in the Notification section in the OpenStack Configuration Reference.
In Liberty, the notification agent creates multiple queues to divide the workload across all active agents. The number of queues can be controlled by the pipeline_processing_queues option in the ceilometer.conf configuration file. A larger value results in better distribution of tasks but also requires more memory and a longer startup time. It is recommended to set this value to approximately three times the number of active notification agents. At a minimum, the value should be equal to the number of active agents.
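Put together, a minimal ceilometer.conf sketch for notification-agent workload partitioning could look like the following (the Redis URL is illustrative, and the queue count assumes roughly ten active agents):

[coordination]
backend_url = redis://controller:6379

[notification]
workload_partitioning = True
pipeline_processing_queues = 30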
Without the backend_url option being set, only one instance of both the central and Compute agent service is able to run and function correctly.
The availability check of the instances is provided by heartbeat messages. When the connection with an instance is lost, the workload will be reassigned among the remaining instances in the next polling cycle.
Memcached uses a timeout value, which should always be set higher than the heartbeat value set for Telemetry.
For backward compatibility and to support existing deployments, the central agent configuration also supports using different configuration files for groups of service instances of this type that are running in parallel. To enable this configuration, set a value for the partitioning_group_prefix option in the polling section in the OpenStack Configuration Reference.
For each sub-group of the central agent pool with the same partitioning_group_prefix, a disjoint subset of meters must be polled, otherwise samples may be missing or duplicated. The list of meters to poll can be set in the /etc/ceilometer/pipeline.yaml configuration file. For more information about pipelines see Section 10.3, “Data collection, processing, and pipelines”.
To enable the Compute agent to run multiple instances simultaneously with workload partitioning, the workload_partitioning option has to be set to True under the Compute section in the ceilometer.conf configuration file.
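In other words, a minimal sketch of the relevant ceilometer.conf section is:

[compute]
workload_partitioning = True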
While most parts of the data collection in the Telemetry service are automated, Telemetry provides the possibility to submit samples via the REST API to allow users to send custom samples into this service.
This option makes it possible to send any kind of samples without the need to write extra lines of code or make configuration changes.
The samples that can be sent to Telemetry are not limited to the actual existing meters. There is a possibility to provide data for any new, customer-defined counter by filling out all the required fields of the POST request.
If the sample corresponds to an existing meter, then fields like meter-type and meter name should be matched accordingly.
The required fields for sending a sample using the command-line client are:
ID of the corresponding resource (--resource-id)
Name of meter (--meter-name)
Type of meter (--meter-type); the predefined meter types are Gauge, Delta, and Cumulative
Unit of meter (--meter-unit)
Volume of sample (--sample-volume)
To send samples to Telemetry using the command-line client, the following command should be invoked:
$ ceilometer sample-create -r 37128ad6-daaa-4d22-9509-b7e1c6b08697 \
  -m memory.usage --meter-type gauge --meter-unit MB --sample-volume 48
+-------------------+--------------------------------------------+
| Property          | Value                                      |
+-------------------+--------------------------------------------+
| message_id        | 6118820c-2137-11e4-a429-08002715c7fb       |
| name              | memory.usage                               |
| project_id        | e34eaa91d52a4402b4cb8bc9bbd308c1           |
| resource_id       | 37128ad6-daaa-4d22-9509-b7e1c6b08697       |
| resource_metadata | {}                                         |
| source            | e34eaa91d52a4402b4cb8bc9bbd308c1:openstack |
| timestamp         | 2014-08-11T09:10:46.358926                 |
| type              | gauge                                      |
| unit              | MB                                         |
| user_id           | 679b0499e7a34ccb9d90b64208401f8e           |
| volume            | 48.0                                       |
+-------------------+--------------------------------------------+
The Telemetry service collects a subset of the meters by filtering notifications emitted by other OpenStack services. Starting with the Liberty release, you can find the meter definitions in a separate configuration file called ceilometer/meter/data/meter.yaml. This enables operators/administrators to add new meters to the Telemetry project by updating the meter.yaml file without any need for additional code changes.
The meter.yaml file should be modified with care. Unless intended, do not remove any existing meter definitions from the file. Also, the collected meters can differ in some cases from what is referenced in the documentation.
A standard meter definition looks like:
---
metric:
  - name: 'meter name'
    event_type: 'event name'
    type: 'type of meter eg: gauge, cumulative or delta'
    unit: 'name of unit eg: MB'
    volume: 'path to a measurable value eg: $.payload.size'
    resource_id: 'path to resource id eg: $.payload.id'
    project_id: 'path to project id eg: $.payload.owner'
The definition above shows a simple meter definition with some fields, of which name, event_type, type, unit, and volume are required. If there is a match on the event type, samples are generated for the meter.
If you take a look at the meter.yaml file, it contains the sample definitions for all the meters that Telemetry collects from notifications. The value of each field is specified by using a JSON path in order to find the right value in the notification message. In order to be able to specify the right field you need to be aware of the format of the consumed notification. The values that need to be searched in the notification message are set with a JSON path starting with $. For instance, if you need the size information from the payload, you can define it as $.payload.size.
A notification message may contain multiple meters. You can use * in the meter definition to capture all the meters and generate samples respectively. You can use wild cards as shown in the following example:
---
metric:
  - name: $.payload.measurements.[*].metric.[*].name
    event_type: 'event_name.*'
    type: 'delta'
    unit: $.payload.measurements.[*].metric.[*].unit
    volume: payload.measurements.[*].result
    resource_id: $.payload.target
    user_id: $.payload.initiator.id
    project_id: $.payload.initiator.project_id
In the above example, the name field is a JSON path matching a list of meter names defined in the notification message.
You can even use complex operations on JSON paths. In the following example, the volume and resource_id fields perform arithmetic and string concatenation, respectively:
---
metric:
  - name: 'compute.node.cpu.idle.percent'
    event_type: 'compute.metrics.update'
    type: 'gauge'
    unit: 'percent'
    volume: payload.metrics[?(@.name='cpu.idle.percent')].value * 100
    resource_id: $.payload.host + "_" + $.payload.nodename
You can use the timedelta plug-in to evaluate the difference in seconds between two datetime fields from one notification.
---
metric:
  - name: 'compute.instance.booting.time'
    event_type: 'compute.instance.create.end'
    type: 'gauge'
    unit: 'sec'
    volume:
      fields: [$.payload.created_at, $.payload.launched_at]
      plugin: 'timedelta'
    project_id: $.payload.tenant_id
    resource_id: $.payload.instance_id
You will find some existence meters in the meter.yaml file. These meters have a volume of 1 and are at the bottom of the YAML file, with a note suggesting that they will be removed in the Mitaka release. For example, the meter definition for existence meters is as follows:
---
metric:
  - name: 'meter name'
    type: 'delta'
    unit: 'volume'
    volume: 1
    event_type:
      - 'event type'
    resource_id: $.payload.volume_id
    user_id: $.payload.user_id
    project_id: $.payload.tenant_id
These meters are not loaded by default. To load them, set the disable_non_metric_meters option to False in the ceilometer.conf file.
If you want to collect OpenStack Block Storage notifications on demand, you can use cinder-volume-usage-audit from OpenStack Block Storage. This script becomes available when you install OpenStack Block Storage, so you can use it without any specific settings and you do not need to authenticate to access the data. To use it, run the command in the following format:
$ cinder-volume-usage-audit \
  --start_time='YYYY-MM-DD HH:MM:SS' \
  --end_time='YYYY-MM-DD HH:MM:SS' \
  --send_actions
This script outputs what volumes or snapshots were created, deleted, or existed in a given period of time, along with some information about these volumes or snapshots. Information about the existence and size of volumes and snapshots is stored in the Telemetry service. This data is also stored as an event, which is the recommended usage as it provides better indexing of data.
Using this script via cron, you can get notifications periodically, for example, every 5 minutes:
*/5 * * * * /path/to/cinder-volume-usage-audit --send_actions
The Telemetry service has a separate service that is responsible for persisting the data that comes from the pollsters or is received as notifications. The data can be stored in a file or a database back end, for which the list of supported databases can be found in Section 10.1.1, “Supported databases”. The data can also be sent to an external data store by using an HTTP dispatcher.
The ceilometer-collector service receives the data as messages from the message bus of the configured AMQP service. It sends these datapoints without any modification to the configured target. The service has to run on a host machine from which it has access to the configured dispatcher.
Multiple dispatchers can be configured for Telemetry at one time.
Multiple ceilometer-collector processes can be run at a time. It is also supported to start multiple worker threads per collector process; the collector_workers configuration option has to be modified in the Collector section of the ceilometer.conf configuration file.
When the database dispatcher is configured as the data store, you have the option to set a time_to_live option (ttl) for samples. By default, the time to live value for samples is set to -1, which means that they are kept in the database forever.
The time to live value is specified in seconds. Each sample has a time stamp, and the ttl value indicates that a sample will be deleted from the database when the number of seconds has elapsed since that sample reading was stamped. For example, if the time to live is set to 600, all samples older than 600 seconds will be purged from the database.
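As a sketch, the option is set in the database section of ceilometer.conf; note that, depending on the release, the option may be spelled time_to_live or metering_time_to_live:

[database]
metering_time_to_live = 600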
Certain databases support native TTL expiration. In cases where this is not possible, you can use the ceilometer-expirer command-line script for this purpose. You can run it in a cron job, which helps to keep your database in a consistent state.
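For example, a crontab entry along the following lines runs the expirer once a day (the installation path is an assumption and varies by distribution):

0 2 * * * /usr/bin/ceilometer-expirer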
The level of support differs depending on the configured back end:
Database | TTL value support | Note
---|---|---
MongoDB | Yes | MongoDB has native TTL support for deleting samples that are older than the configured ttl value.
SQL-based back ends | Yes |
HBase | No | Telemetry's HBase support does not include native TTL nor ceilometer-expirer support.
DB2 NoSQL | No | DB2 NoSQL does not have native TTL nor ceilometer-expirer support.
The Telemetry service supports sending samples to an external HTTP target. The samples are sent without any modification. To set this option as the collector's target, the dispatcher has to be changed to http in the ceilometer.conf configuration file. For the list of options that you need to set, see the dispatcher_http section in the OpenStack Configuration Reference.
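A minimal ceilometer.conf sketch for the HTTP dispatcher could look like the following (the target URL is illustrative):

[DEFAULT]
dispatcher = http

[dispatcher_http]
target = http://collector.example.com:8080/telemetry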
You can store samples in a file by setting the dispatcher option to file in the ceilometer.conf file. For the list of configuration options, see the dispatcher_file section in the OpenStack Configuration Reference.
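For instance, a sketch of a file dispatcher configuration with log rotation (the path and size values are illustrative):

[DEFAULT]
dispatcher = file

[dispatcher_file]
file_path = /var/log/ceilometer/samples.log
max_bytes = 10000000
backup_count = 5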
The mechanism by which data is collected and processed is called a pipeline. Pipelines, at the configuration level, describe a coupling between sources of data and the corresponding sinks for transformation and publication of data.
A source is a producer of data: samples or events. In effect, it is a set of pollsters or notification handlers emitting datapoints for a set of matching meters and event types.
Each source configuration encapsulates name matching, polling interval determination, optional resource enumeration or discovery, and mapping to one or more sinks for publication.
Data gathered can be used for different purposes, which can impact how frequently it needs to be published. Typically, a meter published for billing purposes needs to be updated every 30 minutes while the same meter may be needed for performance tuning every minute.
Rapid polling cadences should be avoided, as it results in a huge amount of data in a short time frame, which may negatively affect the performance of both Telemetry and the underlying database back end. We strongly recommend you do not use small granularity values like 10 seconds.
A sink, on the other hand, is a consumer of data, providing logic for the transformation and publication of data emitted from related sources.
In effect, a sink describes a chain of handlers. The chain starts with zero or more transformers and ends with one or more publishers. The first transformer in the chain is passed data from the corresponding source, takes some action such as deriving rate of change, performing unit conversion, or aggregating, before passing the modified data to the next step that is described in Section 10.4.3, “Publishers”.
The pipeline configuration is, by default, stored in separate configuration files called pipeline.yaml and event_pipeline.yaml, next to the ceilometer.conf file. The meter pipeline and event pipeline configuration files can be set by the pipeline_cfg_file and event_pipeline_cfg_file options listed in the Description of configuration options for api table section in the OpenStack Configuration Reference, respectively. Multiple pipelines can be defined in one pipeline configuration file.
The meter pipeline definition looks like:
---
sources:
  - name: 'source name'
    interval: 'how often should the samples be injected into the pipeline'
    meters:
      - 'meter filter'
    resources:
      - 'list of resource URLs'
    sinks:
      - 'sink name'
sinks:
  - name: 'sink name'
    transformers: 'definition of transformers'
    publishers:
      - 'list of publishers'
The interval parameter in the sources section should be defined in seconds. It determines the polling cadence of sample injection into the pipeline, where samples are produced under the direct control of an agent.
There are several ways to define the list of meters for a pipeline source. The list of valid meters can be found in Section 10.6, “Measurements”. There is a possibility to define all the meters, or just included or excluded meters, with which a source should operate:
To include all meters, use the * wildcard symbol. It is highly advisable to select only the meters that you intend to use, to avoid flooding the metering database with unused data.
To define the list of meters, use either of the following:
To define the list of included meters, use the meter_name syntax.
To define the list of excluded meters, use the !meter_name syntax.
For meters which have variants identified by a complex name field, use the wildcard symbol to select all. For example, for instance:m1.tiny, use instance:*.
The OpenStack Telemetry service does not have any duplication check between pipelines, and if you add a meter to multiple pipelines then it is assumed the duplication is intentional and may be stored multiple times according to the specified sinks.
The above definition methods can be used in the following combinations:
Use only the wildcard symbol.
Use the list of included meters.
Use the list of excluded meters.
Use wildcard symbol with the list of excluded meters.
At least one of the above variations should be included in the meters section. Included and excluded meters cannot co-exist in the same pipeline. Wildcard and included meters cannot co-exist in the same pipeline definition section.
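For illustration, a source that combines the wildcard with a list of excluded meters (one of the valid combinations above; the source and sink names are illustrative) could look like:

sources:
  - name: 'meter_source'
    interval: 600
    meters:
      - "*"
      - "!disk.*"
    sinks:
      - meter_sink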
The optional resources section of a pipeline source allows a static list of resource URLs to be configured for polling.
The transformers section of a pipeline sink provides the possibility to add a list of transformer definitions. The available transformers are:
Name of transformer | Reference name for configuration
---|---
Accumulator | accumulator
Aggregator | aggregator
Arithmetic | arithmetic
Rate of change | rate_of_change
Unit conversion | unit_conversion
Delta | delta
The publishers section contains the list of publishers, where the sample data should be sent after the possible transformations.
Similarly, the event pipeline definition looks like:
---
sources:
  - name: 'source name'
    events:
      - 'event filter'
    sinks:
      - 'sink name'
sinks:
  - name: 'sink name'
    publishers:
      - 'list of publishers'
The event filter uses the same filtering logic as the meter pipeline.
The definition of transformers can contain the following fields:

name
    Name of the transformer.
parameters
    Parameters of the transformer.
The parameters section can contain transformer specific fields, like source and target fields with different subfields in case of the rate of change, which depends on the implementation of the transformer.
In the case of the transformer that creates the cpu_util meter, the definition looks like:
transformers:
  - name: "rate_of_change"
    parameters:
      target:
        name: "cpu_util"
        unit: "%"
        type: "gauge"
        scale: "100.0 / (10**9 * (resource_metadata.cpu_number or 1))"
The rate of change transformer generates the cpu_util meter from the sample values of the cpu counter, which represents cumulative CPU time in nanoseconds. The transformer definition above defines a scale factor (for nanoseconds and multiple CPUs), which is applied before the transformation derives a sequence of gauge samples with unit %, from sequential values of the cpu meter.
The definition for the disk I/O rate, which is also generated by the rate of change transformer:
transformers:
  - name: "rate_of_change"
    parameters:
      source:
        map_from:
          name: "disk\\.(read|write)\\.(bytes|requests)"
          unit: "(B|request)"
      target:
        map_to:
          name: "disk.\\1.\\2.rate"
          unit: "\\1/s"
        type: "gauge"
Transformer to apply a unit conversion. It takes the volume of the meter and multiplies it with the given scale expression. It also supports map_from and map_to, like the rate of change transformer.
Sample configuration:
transformers:
  - name: "unit_conversion"
    parameters:
      target:
        name: "disk.kilobytes"
        unit: "KB"
        scale: "volume * 1.0 / 1024.0"
With map_from and map_to:
transformers:
  - name: "unit_conversion"
    parameters:
      source:
        map_from:
          name: "disk\\.(read|write)\\.bytes"
      target:
        map_to:
          name: "disk.\\1.kilobytes"
        scale: "volume * 1.0 / 1024.0"
        unit: "KB"
A transformer that sums up the incoming samples until enough samples have come in or a timeout has been reached.
The timeout can be specified with the retention_time option. If you want to flush the aggregation after a set number of samples have been aggregated, specify the size parameter.
The volume of the created sample is the sum of the volumes of samples that came into the transformer. Samples can be aggregated by the attributes project_id, user_id and resource_metadata. To aggregate by the chosen attributes, specify them in the configuration and set which value of the attribute to take for the new sample (first to take the first sample's attribute, last to take the last sample's attribute, and drop to discard the attribute).
To aggregate 60s worth of samples by resource_metadata and keep the resource_metadata of the latest received sample:
transformers:
  - name: "aggregator"
    parameters:
      retention_time: 60
      resource_metadata: last
To aggregate each 15 samples by user_id and resource_metadata and keep the user_id of the first received sample and drop the resource_metadata:
transformers:
  - name: "aggregator"
    parameters:
      size: 15
      user_id: first
      resource_metadata: drop
This transformer simply caches the samples until enough samples have arrived and then flushes them all down the pipeline at once:
transformers:
  - name: "accumulator"
    parameters:
      size: 15
This transformer enables us to perform arithmetic calculations over one or more meters and/or their metadata, for example:
memory_util = 100 * memory.usage / memory
A new sample is created with the properties described in the target section of the transformer's configuration. The sample's volume is the result of the provided expression. The calculation is performed on samples from the same resource.
The calculation is limited to meters with the same interval.
Example configuration:
transformers:
  - name: "arithmetic"
    parameters:
      target:
        name: "memory_util"
        unit: "%"
        type: "gauge"
        expr: "100 * $(memory.usage) / $(memory)"
To demonstrate the use of metadata, the following implementation of a novel meter shows average CPU time per core:
transformers:
  - name: "arithmetic"
    parameters:
      target:
        name: "avg_cpu_per_core"
        unit: "ns"
        type: "cumulative"
        expr: "$(cpu) / ($(cpu).resource_metadata.cpu_number or 1)"
Expression evaluation gracefully handles NaNs and exceptions. In such a case, it does not create a new sample but only logs a warning.
This transformer calculates the change between two sample datapoints of a resource. It can be configured to capture only the positive growth deltas.
Example configuration:
transformers:
  - name: "delta"
    parameters:
      target:
        name: "cpu.delta"
      growth_only: True
The Telemetry service offers several mechanisms from which the persisted data can be accessed. As described in Section 10.1, “System architecture” and in Section 10.2, “Data collection”, the collected information can be stored in one or more database back ends, which are hidden by the Telemetry RESTful API.
It is highly recommended not to access the database directly and read or modify any data in it. The API layer hides all the changes in the actual database schema and provides a standard interface to expose the samples, alarms and so forth.
The Telemetry service provides a RESTful API, from which the collected samples and all the related information can be retrieved, like the list of meters, alarm definitions and so forth.
The Telemetry API URL can be retrieved from the service catalog provided by OpenStack Identity, which is populated during the installation process. The API access needs a valid token and proper permission to retrieve data, as described in Section 10.1.4, “Users, roles, and projects”.
Further information about the available API endpoints can be found in the Telemetry API Reference.
The API provides some additional functionalities, like querying the collected data set. For the samples and alarms API endpoints, both simple and complex query styles are available, whereas for the other endpoints only simple queries are supported.
After validating the query parameters, the processing is done on the database side in the case of most database back ends in order to achieve better performance.
Simple query
Many of the API endpoints accept a query filter argument, which should be a list of data structures consisting of the following items:

field
op
value
type
Regardless of the endpoint on which the filter is applied, it will always target the fields of the Sample type.
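For example, a filter that matches samples of a single resource might be expressed as follows in the request (a sketch; the resource ID is taken from the examples later in this section):

[
    {
        "field": "resource_id",
        "op": "eq",
        "value": "bb52e52b-1e42-4751-b3ac-45c52d83ba07",
        "type": "string"
    }
]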
Several fields of the API endpoints accept shorter names than the ones defined in the reference. The API will do the transformation internally and return the output with the fields that are listed in the API reference. The fields are the following:
project_id: project
resource_id: resource
user_id: user
When a filter argument contains multiple constraints of the above form, a logical AND relation between them is implied.
Complex query
The filter expressions of the complex query feature operate on the fields of the Sample, Alarm and AlarmChange types. The following comparison operators are supported:
=
!=
<
<=
>
>=
The following logical operators can be used:
and
or
not
The not operator has different behavior in MongoDB and in the SQLAlchemy-based database engines. If the not operator is applied on a non-existent metadata field, then the result depends on the database engine. In case of MongoDB, it will return every sample, as the not operator is evaluated true for every sample where the given field does not exist. On the other hand, the SQL-based database engine will return an empty result because of the underlying join operation.
Complex query supports specifying a list of orderby expressions. This means that the result of the query can be ordered based on the field names provided in this list. When multiple keys are defined for the ordering, these will be applied sequentially in the order of the specification. The second expression will be applied on the groups for which the values of the first expression are the same. The ordering can be ascending or descending.
The number of returned items can be bounded using the limit option. The filter, orderby and limit fields are optional.
As opposed to the simple query, complex query is available via a separate API endpoint. For more information see the Telemetry v2 Web API Reference.
The sample data can be used in various ways for several purposes, like billing or profiling. In external systems the data is often used in the form of aggregated statistics. The Telemetry API provides several built-in functions to make some basic calculations available without any additional coding.
Telemetry supports the following statistics and aggregation functions:
avg
    Average of the sample volumes over each period.
cardinality
    Count of distinct values in each period identified by a key specified as the parameter of this aggregate function. The supported parameter values are project_id, resource_id, and user_id. The aggregate.param option is required.
count
    Number of samples in each period.
max
    Maximum of the sample volumes in each period.
min
    Minimum of the sample volumes in each period.
stddev
    Standard deviation of the sample volumes in each period.
sum
    Sum of the sample volumes over each period.
The simple query and the statistics functionality can be used together in a single API request.
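For example, the following sketch requests statistics for the cpu_util meter in 600-second periods, filtered to a single resource (the resource ID is illustrative):

$ ceilometer statistics --meter cpu_util --period 600 \
  -q resource=bb52e52b-1e42-4751-b3ac-45c52d83ba07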
The Telemetry service provides a command-line client, which gives access to the collected data as well as to the alarm definition and retrieval options. The client uses the Telemetry RESTful API to execute the requested operations.
To be able to use the ceilometer command, the python-ceilometerclient package needs to be installed and configured properly. For details about the installation process, see the Telemetry chapter in the Installation Tutorials and Guides.
The Telemetry service captures the user-visible resource usage data. Therefore the database will not contain any data without the existence of these resources, like VM images in the OpenStack Image service.
Similarly to other OpenStack command-line clients, the ceilometer client uses OpenStack Identity for authentication. The proper credentials and the --auth_url parameter have to be defined via command line parameters or environment variables.
This section provides some examples without the aim of completeness. These commands can be used for instance for validating an installation of Telemetry.
To retrieve the list of collected meters, the following command should be used:
$ ceilometer meter-list
+------------------------+------------+------+------------------------------------------+----------------------------------+----------------------------------+
| Name                   | Type       | Unit | Resource ID                              | User ID                          | Project ID                       |
+------------------------+------------+------+------------------------------------------+----------------------------------+----------------------------------+
| cpu                    | cumulative | ns   | bb52e52b-1e42-4751-b3ac-45c52d83ba07     | b6e62aad26174382bc3781c12fe413c8 | cbfa8e3dfab64a27a87c8e24ecd5c60f |
| cpu                    | cumulative | ns   | c8d2e153-a48f-4cec-9e93-86e7ac6d4b0b     | b6e62aad26174382bc3781c12fe413c8 | cbfa8e3dfab64a27a87c8e24ecd5c60f |
| cpu_util               | gauge      | %    | bb52e52b-1e42-4751-b3ac-45c52d83ba07     | b6e62aad26174382bc3781c12fe413c8 | cbfa8e3dfab64a27a87c8e24ecd5c60f |
| cpu_util               | gauge      | %    | c8d2e153-a48f-4cec-9e93-86e7ac6d4b0b     | b6e62aad26174382bc3781c12fe413c8 | cbfa8e3dfab64a27a87c8e24ecd5c60f |
| disk.device.read.bytes | cumulative | B    | bb52e52b-1e42-4751-b3ac-45c52d83ba07-hdd | b6e62aad26174382bc3781c12fe413c8 | cbfa8e3dfab64a27a87c8e24ecd5c60f |
| disk.device.read.bytes | cumulative | B    | bb52e52b-1e42-4751-b3ac-45c52d83ba07-vda | b6e62aad26174382bc3781c12fe413c8 | cbfa8e3dfab64a27a87c8e24ecd5c60f |
| disk.device.read.bytes | cumulative | B    | c8d2e153-a48f-4cec-9e93-86e7ac6d4b0b-hdd | b6e62aad26174382bc3781c12fe413c8 | cbfa8e3dfab64a27a87c8e24ecd5c60f |
| disk.device.read.bytes | cumulative | B    | c8d2e153-a48f-4cec-9e93-86e7ac6d4b0b-vda | b6e62aad26174382bc3781c12fe413c8 | cbfa8e3dfab64a27a87c8e24ecd5c60f |
| ...                                                                                                                                                          |
+------------------------+------------+------+------------------------------------------+----------------------------------+----------------------------------+
The ceilometer command was run with admin rights, which means that all the data is accessible in the database. For more information about access rights, see Section 10.1.4, “Users, roles, and projects”. As can be seen in the above example, there are two VM instances existing in the system, as there are VM instance related meters at the top of the result list. The existence of these meters does not indicate that these instances are running at the time of the request. The result contains the currently collected meters per resource, in an ascending order based on the name of the meter.
Samples are collected for each meter that is present in the list of meters, except in the case of instances that are not running or have been deleted from the OpenStack Compute database. If an instance no longer exists and there is a time_to_live value set in the ceilometer.conf configuration file, then a group of samples are deleted in each expiration cycle. When the last sample is deleted for a meter, the database can be cleaned up by running ceilometer-expirer and the meter will no longer be present in the list above. For more information about the expiration procedure see Section 10.2.6, “Storing samples”.
The Telemetry API supports simple query on the meter endpoint. The query functionality has the following syntax:
--query <field1><operator1><value1>;...;<field_n><operator_n><value_n>
The following command needs to be invoked to request the meters of one VM instance:
$ ceilometer meter-list --query resource=bb52e52b-1e42-4751-b3ac-45c52d83ba07
+-------------------------+------------+-----------+--------------------------------------+----------------------------------+----------------------------------+
| Name                    | Type       | Unit      | Resource ID                          | User ID                          | Project ID                       |
+-------------------------+------------+-----------+--------------------------------------+----------------------------------+----------------------------------+
| cpu                     | cumulative | ns        | bb52e52b-1e42-4751-b3ac-45c52d83ba07 | b6e62aad26174382bc3781c12fe413c8 | cbfa8e3dfab64a27a87c8e24ecd5c60f |
| cpu_util                | gauge      | %         | bb52e52b-1e42-4751-b3ac-45c52d83ba07 | b6e62aad26174382bc3781c12fe413c8 | cbfa8e3dfab64a27a87c8e24ecd5c60f |
| cpu_l3_cache            | gauge      | B         | bb52e52b-1e42-4751-b3ac-45c52d83ba07 | b6e62aad26174382bc3781c12fe413c8 | cbfa8e3dfab64a27a87c8e24ecd5c60f |
| disk.ephemeral.size     | gauge      | GB        | bb52e52b-1e42-4751-b3ac-45c52d83ba07 | b6e62aad26174382bc3781c12fe413c8 | cbfa8e3dfab64a27a87c8e24ecd5c60f |
| disk.read.bytes         | cumulative | B         | bb52e52b-1e42-4751-b3ac-45c52d83ba07 | b6e62aad26174382bc3781c12fe413c8 | cbfa8e3dfab64a27a87c8e24ecd5c60f |
| disk.read.bytes.rate    | gauge      | B/s       | bb52e52b-1e42-4751-b3ac-45c52d83ba07 | b6e62aad26174382bc3781c12fe413c8 | cbfa8e3dfab64a27a87c8e24ecd5c60f |
| disk.read.requests      | cumulative | request   | bb52e52b-1e42-4751-b3ac-45c52d83ba07 | b6e62aad26174382bc3781c12fe413c8 | cbfa8e3dfab64a27a87c8e24ecd5c60f |
| disk.read.requests.rate | gauge      | request/s | bb52e52b-1e42-4751-b3ac-45c52d83ba07 | b6e62aad26174382bc3781c12fe413c8 | cbfa8e3dfab64a27a87c8e24ecd5c60f |
| disk.root.size          | gauge      | GB        | bb52e52b-1e42-4751-b3ac-45c52d83ba07 | b6e62aad26174382bc3781c12fe413c8 | cbfa8e3dfab64a27a87c8e24ecd5c60f |
| disk.write.bytes        | cumulative | B         | bb52e52b-1e42-4751-b3ac-45c52d83ba07 | b6e62aad26174382bc3781c12fe413c8 | cbfa8e3dfab64a27a87c8e24ecd5c60f |
| disk.write.bytes.rate   | gauge      | B/s       | bb52e52b-1e42-4751-b3ac-45c52d83ba07 | b6e62aad26174382bc3781c12fe413c8 | cbfa8e3dfab64a27a87c8e24ecd5c60f |
| disk.write.requests     | cumulative | request   | bb52e52b-1e42-4751-b3ac-45c52d83ba07 | b6e62aad26174382bc3781c12fe413c8 | cbfa8e3dfab64a27a87c8e24ecd5c60f |
| disk.write.requests.rate| gauge      | request/s | bb52e52b-1e42-4751-b3ac-45c52d83ba07 | b6e62aad26174382bc3781c12fe413c8 | cbfa8e3dfab64a27a87c8e24ecd5c60f |
| instance                | gauge      | instance  | bb52e52b-1e42-4751-b3ac-45c52d83ba07 | b6e62aad26174382bc3781c12fe413c8 | cbfa8e3dfab64a27a87c8e24ecd5c60f |
| instance:m1.tiny        | gauge      | instance  | bb52e52b-1e42-4751-b3ac-45c52d83ba07 | b6e62aad26174382bc3781c12fe413c8 | cbfa8e3dfab64a27a87c8e24ecd5c60f |
| memory                  | gauge      | MB        | bb52e52b-1e42-4751-b3ac-45c52d83ba07 | b6e62aad26174382bc3781c12fe413c8 | cbfa8e3dfab64a27a87c8e24ecd5c60f |
| vcpus                   | gauge      | vcpu      | bb52e52b-1e42-4751-b3ac-45c52d83ba07 | b6e62aad26174382bc3781c12fe413c8 | cbfa8e3dfab64a27a87c8e24ecd5c60f |
+-------------------------+------------+-----------+--------------------------------------+----------------------------------+----------------------------------+
As described above, the whole set of samples stored for a meter can be retrieved, or the result set can be filtered by using one of the available query types. The request for all the samples of the cpu meter without any additional filtering looks like the following:
$ ceilometer sample-list --meter cpu
+--------------------------------------+-------+------------+------------+------+---------------------+
| Resource ID                          | Meter | Type       | Volume     | Unit | Timestamp           |
+--------------------------------------+-------+------------+------------+------+---------------------+
| c8d2e153-a48f-4cec-9e93-86e7ac6d4b0b | cpu   | cumulative | 5.4863e+11 | ns   | 2014-08-31T11:17:03 |
| bb52e52b-1e42-4751-b3ac-45c52d83ba07 | cpu   | cumulative | 5.7848e+11 | ns   | 2014-08-31T11:17:03 |
| c8d2e153-a48f-4cec-9e93-86e7ac6d4b0b | cpu   | cumulative | 5.4811e+11 | ns   | 2014-08-31T11:07:05 |
| bb52e52b-1e42-4751-b3ac-45c52d83ba07 | cpu   | cumulative | 5.7797e+11 | ns   | 2014-08-31T11:07:05 |
| c8d2e153-a48f-4cec-9e93-86e7ac6d4b0b | cpu   | cumulative | 5.3589e+11 | ns   | 2014-08-31T10:27:19 |
| bb52e52b-1e42-4751-b3ac-45c52d83ba07 | cpu   | cumulative | 5.6397e+11 | ns   | 2014-08-31T10:27:19 |
| ...                                                                                                  |
+--------------------------------------+-------+------------+------------+------+---------------------+
The result set of the request contains the samples for both instances ordered by the timestamp field in the default descending order.
The simple query makes it possible to retrieve only a subset of the collected samples. The following command can be executed to request the cpu samples of only one of the VM instances:
$ ceilometer sample-list --meter cpu \
  --query resource=bb52e52b-1e42-4751-b3ac-45c52d83ba07
+--------------------------------------+------+------------+------------+------+---------------------+
| Resource ID                          | Name | Type       | Volume     | Unit | Timestamp           |
+--------------------------------------+------+------------+------------+------+---------------------+
| bb52e52b-1e42-4751-b3ac-45c52d83ba07 | cpu  | cumulative | 5.7906e+11 | ns   | 2014-08-31T11:27:08 |
| bb52e52b-1e42-4751-b3ac-45c52d83ba07 | cpu  | cumulative | 5.7848e+11 | ns   | 2014-08-31T11:17:03 |
| bb52e52b-1e42-4751-b3ac-45c52d83ba07 | cpu  | cumulative | 5.7797e+11 | ns   | 2014-08-31T11:07:05 |
| bb52e52b-1e42-4751-b3ac-45c52d83ba07 | cpu  | cumulative | 5.6397e+11 | ns   | 2014-08-31T10:27:19 |
| bb52e52b-1e42-4751-b3ac-45c52d83ba07 | cpu  | cumulative | 5.6207e+11 | ns   | 2014-08-31T10:17:03 |
| bb52e52b-1e42-4751-b3ac-45c52d83ba07 | cpu  | cumulative | 5.3831e+11 | ns   | 2014-08-31T08:41:57 |
| ...                                                                                                 |
+--------------------------------------+------+------------+------------+------+---------------------+
As can be seen in the output above, the result set contains samples for only one of the two instances.
The ceilometer query-samples command is used to execute rich queries. This command accepts the following parameters:
--filter
    Contains the filter expression for the query in the form of: {complex_op: [{simple_op: {field_name: value}}]}.
--orderby
    Contains the list of orderby expressions in the form of: [{field_name: direction}, {field_name: direction}].
--limit
    Specifies the maximum number of samples to return.
For more information about complex queries see Section 10.4.1.1, “Query”.
As the complex query functionality provides the possibility of using complex operators, it is possible to retrieve a subset of samples for a given VM instance. To request the first six samples for the cpu and disk.read.bytes meters, the following command should be invoked:
$ ceilometer query-samples --filter '{"and": \
  [{"=":{"resource":"bb52e52b-1e42-4751-b3ac-45c52d83ba07"}},{"or":[{"=":{"counter_name":"cpu"}}, \
  {"=":{"counter_name":"disk.read.bytes"}}]}]}' --orderby '[{"timestamp":"asc"}]' --limit 6
+--------------------------------------+-----------------+------------+------------+------+---------------------+
| Resource ID                          | Meter           | Type       | Volume     | Unit | Timestamp           |
+--------------------------------------+-----------------+------------+------------+------+---------------------+
| bb52e52b-1e42-4751-b3ac-45c52d83ba07 | disk.read.bytes | cumulative | 385334.0   | B    | 2014-08-30T13:00:46 |
| bb52e52b-1e42-4751-b3ac-45c52d83ba07 | cpu             | cumulative | 1.2132e+11 | ns   | 2014-08-30T13:00:47 |
| bb52e52b-1e42-4751-b3ac-45c52d83ba07 | cpu             | cumulative | 1.4295e+11 | ns   | 2014-08-30T13:10:51 |
| bb52e52b-1e42-4751-b3ac-45c52d83ba07 | disk.read.bytes | cumulative | 601438.0   | B    | 2014-08-30T13:10:51 |
| bb52e52b-1e42-4751-b3ac-45c52d83ba07 | disk.read.bytes | cumulative | 601438.0   | B    | 2014-08-30T13:20:33 |
| bb52e52b-1e42-4751-b3ac-45c52d83ba07 | cpu             | cumulative | 1.4795e+11 | ns   | 2014-08-30T13:20:34 |
+--------------------------------------+-----------------+------------+------------+------+---------------------+
Ceilometer also captures data as events, which represent the state of a resource. Refer to /telemetry-events for more information regarding events.
To retrieve a list of recent events that occurred in the system, the following command can be executed:
$ ceilometer event-list
+--------------------------------------+---------------+----------------------------+-----------------------------------------------------------------+
| Message ID                           | Event Type    | Generated                  | Traits                                                          |
+--------------------------------------+---------------+----------------------------+-----------------------------------------------------------------+
| dfdb87b6-92c6-4d40-b9b5-ba308f304c13 | image.create  | 2015-09-24T22:17:39.498888 | +---------+--------+-----------------+                          |
|                                      |               |                            | | name    | type   | value           |                          |
|                                      |               |                            | +---------+--------+-----------------+                          |
|                                      |               |                            | | service | string | image.localhost |                          |
|                                      |               |                            | +---------+--------+-----------------+                          |
| 84054bc6-2ae6-4b93-b5e7-06964f151cef | image.prepare | 2015-09-24T22:17:39.594192 | +---------+--------+-----------------+                          |
|                                      |               |                            | | name    | type   | value           |                          |
|                                      |               |                            | +---------+--------+-----------------+                          |
|                                      |               |                            | | service | string | image.localhost |                          |
|                                      |               |                            | +---------+--------+-----------------+                          |
| 2ec99c2c-08ee-4079-bf80-27d4a073ded6 | image.update  | 2015-09-24T22:17:39.578336 | +-------------+--------+--------------------------------------+ |
|                                      |               |                            | | name        | type   | value                                | |
|                                      |               |                            | +-------------+--------+--------------------------------------+ |
|                                      |               |                            | | created_at  | string | 2015-09-24T22:17:39Z                 | |
|                                      |               |                            | | name        | string | cirros-0.3.4-x86_64-uec-kernel       | |
|                                      |               |                            | | project_id  | string | 56ffddea5b4f423496444ea36c31be23     | |
|                                      |               |                            | | resource_id | string | 86eb8273-edd7-4483-a07c-002ff1c5657d | |
|                                      |               |                            | | service     | string | image.localhost                      | |
|                                      |               |                            | | status      | string | saving                               | |
|                                      |               |                            | | user_id     | string | 56ffddea5b4f423496444ea36c31be23     | |
|                                      |               |                            | +-------------+--------+--------------------------------------+ |
+--------------------------------------+---------------+----------------------------+-----------------------------------------------------------------+
As of Liberty, the data returned is scoped to the requesting user's role and project. Non-admin users see only events that are scoped to them. Admin users see all events related to the project they administer as well as all unscoped events.
Similar to querying meters, additional filter parameters can be given to retrieve specific events:
$ ceilometer event-list -q 'event_type=compute.instance.exists; \
  instance_type=m1.tiny'
+--------------------------------------+-------------------------+----------------------------+----------------------------------------------------------------------------------+
| Message ID                           | Event Type              | Generated                  | Traits                                                                           |
+--------------------------------------+-------------------------+----------------------------+----------------------------------------------------------------------------------+
| 134a2ab3-6051-496c-b82f-10a3c367439a | compute.instance.exists | 2015-09-25T03:00:02.152041 | +------------------------+----------+------------------------------------------+ |
|                                      |                         |                            | | name                   | type     | value                                    | |
|                                      |                         |                            | +------------------------+----------+------------------------------------------+ |
|                                      |                         |                            | | audit_period_beginning | datetime | 2015-09-25T02:00:00                      | |
|                                      |                         |                            | | audit_period_ending    | datetime | 2015-09-25T03:00:00                      | |
|                                      |                         |                            | | disk_gb                | integer  | 1                                        | |
|                                      |                         |                            | | ephemeral_gb           | integer  | 0                                        | |
|                                      |                         |                            | | host                   | string   | localhost.localdomain                    | |
|                                      |                         |                            | | instance_id            | string   | 2115f189-c7f1-4228-97bc-d742600839f2     | |
|                                      |                         |                            | | instance_type          | string   | m1.tiny                                  | |
|                                      |                         |                            | | instance_type_id       | integer  | 2                                        | |
|                                      |                         |                            | | launched_at            | datetime | 2015-09-24T22:24:56                      | |
|                                      |                         |                            | | memory_mb              | integer  | 512                                      | |
|                                      |                         |                            | | project_id             | string   | 56ffddea5b4f423496444ea36c31be23         | |
|                                      |                         |                            | | request_id             | string   | req-c6292b21-bf98-4a1d-b40c-cebba4d09a67 | |
|                                      |                         |                            | | root_gb                | integer  | 1                                        | |
|                                      |                         |                            | | service                | string   | compute                                  | |
|                                      |                         |                            | | state                  | string   | active                                   | |
|                                      |                         |                            | | tenant_id              | string   | 56ffddea5b4f423496444ea36c31be23         | |
|                                      |                         |                            | | user_id                | string   | 0b3d725756f94923b9d0c4db864d06a9         | |
|                                      |                         |                            | | vcpus                  | integer  | 1                                        | |
|                                      |                         |                            | +------------------------+----------+------------------------------------------+ |
+--------------------------------------+-------------------------+----------------------------+----------------------------------------------------------------------------------+
As of the Liberty release, the number of items returned will be
restricted to the value defined by default_api_return_limit
in the
ceilometer.conf
configuration file. Alternatively, the value can
be set per query by passing the limit
option in the request.
The command-line client library provides Python bindings so that the Telemetry Python API can be used directly from Python programs.
The first step in setting up the client is to create a client instance with the proper credentials:
>>> import ceilometerclient.client
>>> cclient = ceilometerclient.client.get_client(VERSION, username=USERNAME, password=PASSWORD, tenant_name=PROJECT_NAME, auth_url=AUTH_URL)
The VERSION parameter can be 1 or 2, specifying the API version to be used.
The method calls look like the following:
>>> cclient.meters.list()
[<Meter ...>, ...]
>>> cclient.samples.list()
[<Sample ...>, ...]
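The list calls also accept queries to narrow the results. A minimal sketch, assuming the standard field/op/value query format of the Telemetry API (the resource ID is a placeholder value):
>>> query = [dict(field='resource_id', op='eq',
...               value='bb52e52b-1e42-4751-b3ac-45c52d83ba07')]
>>> cclient.samples.list(q=query)
[<Sample ...>, ...]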
For further details about the python-ceilometerclient package, see the Python bindings to the OpenStack Ceilometer API reference.
The Telemetry service provides several transport methods to forward the
data collected to the ceilometer-collector
service or to an external
system. The consumers of this data differ widely: monitoring systems, for which some data loss is acceptable, and billing systems, which require reliable data transportation. Telemetry provides methods to fulfill the requirements of both kinds of systems, as described below.
The publisher component makes it possible to persist the data into storage through the message bus or to send it to one or more external consumers. One chain can contain multiple publishers.
To solve the problem mentioned above, the notion of multi-publisher can be configured for each datapoint within the Telemetry service, allowing the same technical meter or event to be published multiple times to multiple destinations, each potentially using a different transport.
Publishers can be specified in the publishers section for each pipeline (for further details about pipelines see Section 10.3, "Data collection, processing, and pipelines") that is defined in the pipeline.yaml file.
The following publisher types are supported:
direct
It can be specified in the form of direct://?dispatcher=http. The dispatcher's options include database, file, http, and gnocchi. For more details on dispatchers, see Section 10.2.6, "Storing samples". It emits data directly to the configured dispatcher; the default configuration (the form direct://) uses the database dispatcher. In the Mitaka release, this method can only emit data to the database dispatcher, and the form is direct://.
notifier
It can be specified in the form of notifier://?option1=value1&option2=value2. It emits data over AMQP using oslo.messaging. This is the recommended method of publishing.
rpc
It can be specified in the form of rpc://?option1=value1&option2=value2. It emits metering data over lossy AMQP. This method is synchronous and may experience performance issues. This publisher is deprecated in Liberty in favor of the notifier publisher.
udp
It can be specified in the form of udp://<host>:<port>/. It emits metering data over UDP.
file
It can be specified in the form of file://path?option1=value1&option2=value2. This publisher records metering data into a file. If a file name and location is not specified, this publisher does not log any meters; instead it logs a warning message in the configured log file for Telemetry.
kafka
It can be specified in the form of kafka://kafka_broker_ip:kafka_broker_port?topic=kafka_topic&option1=value1. This publisher sends metering data to a kafka broker. If the topic parameter is missing, this publisher publishes metering data under the topic name ceilometer. When the port number is not specified, this publisher uses 9092 as the broker's port.
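For illustration, a pipeline entry that targets a hypothetical broker at 10.0.0.5 on the default port with an explicit topic might look like the following (the address and topic are placeholder values, not defaults):
publishers:
    - kafka://10.0.0.5:9092?topic=ceilometer_metering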
The following options are available for the rpc and notifier publishers. The policy option can also be used by the kafka publisher:
per_meter_topic
The value of this option is 1. It is used for publishing samples on an additional metering_topic.sample_name topic queue besides the default metering_topic queue.
policy
It is used for configuring the behavior when the publisher fails to send samples. The possible predefined values are the following:
default
Used for waiting and blocking until the samples have been sent.
drop
Used for dropping the samples that failed to be sent.
queue
Used for creating an in-memory queue and retrying to send the samples on the queue during the next samples publishing period (the queue length can be configured with max_queue_length, where 1024 is the default value).
The following option is additionally available for the notifier
publisher:
topic
The topic name of the queue to publish to. Setting this option will override the default topic defined by the metering_topic and event_topic options. This option can be used to support multiple consumers. Support for this feature was added in Kilo.
The following options are available for the file
publisher:
max_bytes
When this option is greater than zero, it will cause a rollover. When the size is about to be exceeded, the file is closed and a new file is silently opened for output. If its value is zero, rollover never occurs.
backup_count
If this value is non-zero, an extension will be appended to the filename of the old log, as '.1', '.2', and so forth until the specified value is reached. The file that is written and contains the newest data is always the one that is specified without any extensions.
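For example, a file publisher entry that rolls the log over at roughly 10 MB and keeps five old files might look like the following (the path and values are illustrative):
publishers:
    - file:///var/test?max_bytes=10000000&backup_count=5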
The default publisher is notifier, without any additional options specified. A sample publishers section in the /etc/ceilometer/pipeline.yaml file looks like the following:
publishers:
    - udp://10.0.0.2:1234
    - rpc://?per_meter_topic=1   # deprecated in Liberty
    - notifier://?policy=drop&max_queue_length=512&topic=custom_target
    - direct://?dispatcher=http
Alarms provide user-oriented Monitoring-as-a-Service for resources running on OpenStack. This type of monitoring enables you to automatically scale a group of instances in or out through the Orchestration service, but you can also use alarms for general-purpose awareness of your cloud resources' health.
These alarms follow a tri-state model:
ok
The rule governing the alarm has been evaluated as False.
alarm
The rule governing the alarm has been evaluated as True.
insufficient data
There are not enough datapoints available in the evaluation periods to meaningfully determine the alarm state.
The definition of an alarm provides the rules that govern when a state transition should occur, and the actions to be taken thereon. The nature of these rules depends on the alarm type.
For conventional threshold-oriented alarms, state transitions are governed by:
A static threshold value with a comparison operator such as greater than or less than.
A statistic selection to aggregate the data.
A sliding time window to indicate how far back into the recent past you want to look.
The Telemetry service also supports the concept of a meta-alarm, which aggregates over the current state of a set of underlying basic alarms combined via a logical operator (AND or OR).
A key associated concept is the notion of dimensioning which defines the set of matching meters that feed into an alarm evaluation. Recall that meters are per-resource-instance, so in the simplest case an alarm might be defined over a particular meter applied to all resources visible to a particular user. More useful however would be the option to explicitly select which specific resources you are interested in alarming on.
At one extreme you might have narrowly dimensioned alarms where this selection would have only a single target (identified by resource ID). At the other extreme, you could have widely dimensioned alarms where this selection identifies many resources over which the statistic is aggregated. For example all instances booted from a particular image or all instances with matching user metadata (the latter is how the Orchestration service identifies autoscaling groups).
Alarms are evaluated by the alarm-evaluator
service on a periodic
basis, defaulting to once every minute.
Any state transition of individual alarm (to ok
, alarm
, or
insufficient data
) may have one or more actions associated with
it. These actions effectively send a signal to a consumer that the
state transition has occurred, and provide some additional context.
This includes the new and previous states, with some reason data
describing the disposition with respect to the threshold, the number
of datapoints involved and most recent of these. State transitions
are detected by the alarm-evaluator
, whereas the
alarm-notifier
effects the actual notification action.
Webhooks
These are the de facto notification type used by Telemetry alarming and simply involve an HTTP POST request being sent to an endpoint, with a request body containing a description of the state transition encoded as a JSON fragment.
Log actions
These are a lightweight alternative to webhooks, whereby the state
transition is simply logged by the alarm-notifier
, and are
intended primarily for testing purposes.
The alarm evaluation process uses the same mechanism for workload partitioning as the central and compute agents. The Tooz library provides the coordination within the groups of service instances. For further information about this approach, see Section 10.2.3, "Support for HA deployment".
To use this workload partitioning solution, set the evaluation_service option to default. For more information, see the alarm section in the OpenStack Configuration Reference.
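A minimal sketch of the corresponding ceilometer.conf entry; the [alarm] section name is an assumption based on the "alarm section" referenced above, so verify it against the Configuration Reference for your release:
[alarm]
evaluation_service = default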
An example of creating a threshold-oriented alarm, based on an upper bound on the CPU utilization for a particular instance:
$ ceilometer alarm-threshold-create --name cpu_hi \
  --description 'instance running hot' \
  --meter-name cpu_util --threshold 70.0 \
  --comparison-operator gt --statistic avg \
  --period 600 --evaluation-periods 3 \
  --alarm-action 'log://' \
  --query resource_id=INSTANCE_ID
This creates an alarm that will fire when the average CPU utilization for an individual instance exceeds 70% for three consecutive 10 minute periods. The notification in this case is simply a log message, though it could alternatively be a webhook URL.
Alarm names must be unique for the alarms associated with an individual project. The administrator can limit the maximum number of resulting actions for the three different states, and the ability for a normal user to create log:// and test:// notifiers is disabled. This prevents unintentional consumption of disk and memory resources by the Telemetry service.
The sliding time window over which the alarm is evaluated is 30 minutes in this example. This window is not clamped to wall-clock time boundaries, rather it's anchored on the current time for each evaluation cycle, and continually creeps forward as each evaluation cycle rolls around (by default, this occurs every minute).
The period length is set to 600s in this case to reflect the out-of-the-box default cadence for collection of the associated meter. This period matching illustrates an important general principle to keep in mind for alarms:
The alarm period should be a whole number multiple (1 or more) of the interval configured in the pipeline corresponding to the target meter.
Otherwise the alarm will tend to flit in and out of the
insufficient data
state due to the mismatch between the actual
frequency of datapoints in the metering store and the statistics
queries used to compare against the alarm threshold. If a shorter
alarm period is needed, then the corresponding interval should be
adjusted in the pipeline.yaml
file.
Other notable alarm attributes that may be set on creation, or via a subsequent update, include:
state
The initial alarm state (defaults to insufficient data).
description
A free-text description of the alarm (defaults to a synopsis of the alarm rule).
enabled
True if evaluation and actioning are to be enabled for this alarm (defaults to True).
repeat-actions
True if actions should be repeatedly notified while the alarm remains in the target state (defaults to False).
ok-action
An action to invoke when the alarm state transitions to ok.
insufficient-data-action
An action to invoke when the alarm state transitions to insufficient data.
time-constraints
Used to restrict evaluation of the alarm to certain times of the day or days of the week (expressed as a cron expression with an optional timezone); see the sketch below.
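A hedged sketch of a time-constrained alarm creation; the --time-constraint option and its name/start/duration fields are assumptions about the client's syntax and should be checked against your client version:
$ ceilometer alarm-threshold-create --name cpu_hi_daytime \
  --meter-name cpu_util --threshold 70.0 \
  --comparison-operator gt --statistic avg \
  --period 600 --evaluation-periods 3 \
  --alarm-action 'log://' \
  --time-constraint 'name=daytime;start="0 9 * * *";duration=28800' \
  --query resource_id=INSTANCE_ID
This would restrict evaluation to an eight-hour window starting at 09:00 each day.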
An example of creating a combination alarm, based on the combined state of two underlying alarms:
$ ceilometer alarm-combination-create --name meta \
  --alarm_ids ALARM_ID1 \
  --alarm_ids ALARM_ID2 \
  --operator or \
  --alarm-action 'http://example.org/notify'
This creates an alarm that will fire when either one of the two underlying alarms transitions into the alarm state. The notification in this case is a webhook call. Any number of underlying alarms can be combined in this way, using either and or or.
You can display all your alarms via (some attributes are omitted for brevity):
$ ceilometer alarm-list
+----------+--------+-------------------+---------------------------------+
| Alarm ID | Name   | State             | Alarm condition                 |
+----------+--------+-------------------+---------------------------------+
| ALARM_ID | cpu_hi | insufficient data | cpu_util > 70.0 during 3 x 600s |
+----------+--------+-------------------+---------------------------------+
In this case, the state is reported as insufficient data, which could indicate that:
meters have not yet been gathered about this instance over the evaluation window into the recent past (for example a brand-new instance)
or, that the identified instance is not visible to the user/project owning the alarm
or, simply that an alarm evaluation cycle hasn't kicked off since the alarm was created (by default, alarms are evaluated once per minute).
The visibility of alarms depends on the role and project associated with the user issuing the query:
admin users see all alarms, regardless of the owner
non-admin users see only the alarms associated with their project (as per the normal project segregation in OpenStack)
Once the state of the alarm has settled down, we might decide that we set that bar too low at 70%, in which case the threshold (or almost any other alarm attribute) can be updated thusly:
$ ceilometer alarm-update --threshold 75 ALARM_ID
The change will take effect from the next evaluation cycle, which by default occurs every minute.
Most alarm attributes can be changed in this way, but there is also a convenient short-cut for getting and setting the alarm state:
$ ceilometer alarm-state-get ALARM_ID
$ ceilometer alarm-state-set --state ok -a ALARM_ID
Over time the state of the alarm may change often, especially if the threshold is chosen to be close to the trending value of the statistic. You can follow the history of an alarm over its lifecycle via the audit API:
$ ceilometer alarm-history ALARM_ID
+------------------+-----------+---------------------------------------+
| Type             | Timestamp | Detail                                |
+------------------+-----------+---------------------------------------+
| creation         | time0     | name: cpu_hi                          |
|                  |           | description: instance running hot     |
|                  |           | type: threshold                       |
|                  |           | rule: cpu_util > 70.0 during 3 x 600s |
| state transition | time1     | state: ok                             |
| rule change      | time2     | rule: cpu_util > 75.0 during 3 x 600s |
+------------------+-----------+---------------------------------------+
An alarm that is no longer required can be disabled so that it is no longer actively evaluated:
$ ceilometer alarm-update --enabled False -a ALARM_ID
or even deleted permanently (an irreversible step):
$ ceilometer alarm-delete ALARM_ID
By default, alarm history is retained for deleted alarms.
The Telemetry service collects meters within an OpenStack deployment. This section provides a brief summary of the format and origin of meters and also contains the list of available meters.
Telemetry collects meters by polling the infrastructure elements and also by consuming the notifications emitted by other OpenStack services. For more information about the polling mechanism and notifications see Section 10.2, “Data collection”. There are several meters which are collected by polling and by consuming. The origin for each meter is listed in the tables below.
You may need to configure Telemetry or other OpenStack services in order to be able to collect all the samples you need. For further information about configuration requirements see the Telemetry chapter in the Installation Tutorials and Guides. Also check the Telemetry manual installation description.
Telemetry uses the following meter types:
| Type | Description |
|---|---|
| Cumulative | Increasing over time (instance hours) |
| Delta | Changing over time (bandwidth) |
| Gauge | Discrete items (floating IPs, image uploads) and fluctuating values (disk I/O) |
Telemetry provides the possibility to store metadata for samples. This metadata can be extended for OpenStack Compute and OpenStack Object Storage.
In order to add additional metadata information to OpenStack Compute, you have two options to choose from. The first one is to specify the metadata when you boot up a new instance. The additional information will be stored with the sample in the form of resource_metadata.user_metadata.*. The new field should be defined by using the metering. prefix. The modified boot command looks like the following:
$ openstack server create --property metering.custom_metadata=a_value my_vm
The other option is to set the reserved_metadata_keys option to the list of metadata keys that you would like to be included in the resource_metadata of the instance related samples that are collected for OpenStack Compute. This option is included in the DEFAULT section of the ceilometer.conf configuration file.
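A minimal sketch of this setting in ceilometer.conf; the key names are illustrative placeholders:
[DEFAULT]
reserved_metadata_keys = custom_key,another_key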
You might also specify headers whose values will be stored along with
the sample data of OpenStack Object Storage. The additional information
is also stored under resource_metadata
. The format of the new field
is resource_metadata.http_header_$name
, where $name
is the name of
the header with -
replaced by _
.
To specify the new header, you need to set the metadata_headers option under the [filter:ceilometer] section in proxy-server.conf under the swift folder. You can use this additional data, for instance, to distinguish external and internal users.
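A minimal sketch of the corresponding proxy-server.conf entry; the header name is an illustrative placeholder:
[filter:ceilometer]
metadata_headers = X-Custom-Origin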
Measurements are grouped by services which are polled by Telemetry or emit notifications that this service consumes.
The Telemetry service supports storing notifications as events. This functionality was added later, therefore the list of meters still contains existence type and other event related items. The proper way of using Telemetry is to configure it to use the event store and turn off the collection of the event related meters. For further information about events see the Events section in the Telemetry documentation. For further information about how to turn meters on and off see Section 10.3.1, "Pipeline configuration". Please also note that currently no migration is available to move the already existing event type samples to the event store.
The following meters are collected for OpenStack Compute:
| Name | Type | Unit | Resource | Origin | Support | Note |
|---|---|---|---|---|---|---|
| **Meters added in the Icehouse release or earlier** | | | | | | |
| instance | Gauge | instance | instance ID | Notification, Pollster | Libvirt, Hyper-V, vSphere | Existence of instance |
| instance:<type> | Gauge | instance | instance ID | Notification, Pollster | Libvirt, Hyper-V, vSphere | Existence of instance <type> (OpenStack types) |
| memory | Gauge | MB | instance ID | Notification | Libvirt, Hyper-V | Volume of RAM allocated to the instance |
| memory.usage | Gauge | MB | instance ID | Pollster | vSphere | Volume of RAM used by the instance from the amount of its allocated memory |
| cpu | Cumulative | ns | instance ID | Pollster | Libvirt, Hyper-V | CPU time used |
| cpu_util | Gauge | % | instance ID | Pollster | vSphere | Average CPU utilization |
| vcpus | Gauge | vcpu | instance ID | Notification | Libvirt, Hyper-V | Number of virtual CPUs allocated to the instance |
| disk.read.requests | Cumulative | request | instance ID | Pollster | Libvirt, Hyper-V | Number of read requests |
| disk.read.requests.rate | Gauge | request/s | instance ID | Pollster | Libvirt, Hyper-V, vSphere | Average rate of read requests |
| disk.write.requests | Cumulative | request | instance ID | Pollster | Libvirt, Hyper-V | Number of write requests |
| disk.write.requests.rate | Gauge | request/s | instance ID | Pollster | Libvirt, Hyper-V, vSphere | Average rate of write requests |
| disk.read.bytes | Cumulative | B | instance ID | Pollster | Libvirt, Hyper-V | Volume of reads |
| disk.read.bytes.rate | Gauge | B/s | instance ID | Pollster | Libvirt, Hyper-V, vSphere | Average rate of reads |
| disk.write.bytes | Cumulative | B | instance ID | Pollster | Libvirt, Hyper-V | Volume of writes |
| disk.write.bytes.rate | Gauge | B/s | instance ID | Pollster | Libvirt, Hyper-V, vSphere | Average rate of writes |
| disk.root.size | Gauge | GB | instance ID | Notification | Libvirt, Hyper-V | Size of root disk |
| disk.ephemeral.size | Gauge | GB | instance ID | Notification | Libvirt, Hyper-V | Size of ephemeral disk |
| network.incoming.bytes | Cumulative | B | interface ID | Pollster | Libvirt, Hyper-V | Number of incoming bytes |
| network.incoming.bytes.rate | Gauge | B/s | interface ID | Pollster | Libvirt, Hyper-V, vSphere | Average rate of incoming bytes |
| network.outgoing.bytes | Cumulative | B | interface ID | Pollster | Libvirt, Hyper-V | Number of outgoing bytes |
| network.outgoing.bytes.rate | Gauge | B/s | interface ID | Pollster | Libvirt, Hyper-V, vSphere | Average rate of outgoing bytes |
| network.incoming.packets | Cumulative | packet | interface ID | Pollster | Libvirt, Hyper-V | Number of incoming packets |
| network.incoming.packets.rate | Gauge | packet/s | interface ID | Pollster | Libvirt, Hyper-V, vSphere | Average rate of incoming packets |
| network.outgoing.packets | Cumulative | packet | interface ID | Pollster | Libvirt, Hyper-V | Number of outgoing packets |
| network.outgoing.packets.rate | Gauge | packet/s | interface ID | Pollster | Libvirt, Hyper-V, vSphere | Average rate of outgoing packets |
| **Meters added or hypervisor support changed in the Juno release** | | | | | | |
| instance | Gauge | instance | instance ID | Notification, Pollster | Libvirt, Hyper-V, vSphere, XenAPI | Existence of instance |
| instance:<type> | Gauge | instance | instance ID | Notification, Pollster | Libvirt, Hyper-V, vSphere, XenAPI | Existence of instance <type> (OpenStack types) |
| memory.usage | Gauge | MB | instance ID | Pollster | vSphere, XenAPI | Volume of RAM used by the instance from the amount of its allocated memory |
| cpu_util | Gauge | % | instance ID | Pollster | vSphere, XenAPI | Average CPU utilization |
| disk.read.bytes.rate | Gauge | B/s | instance ID | Pollster | Libvirt, Hyper-V, vSphere, XenAPI | Average rate of reads |
| disk.write.bytes.rate | Gauge | B/s | instance ID | Pollster | Libvirt, Hyper-V, vSphere, XenAPI | Average rate of writes |
| disk.device.read.requests | Cumulative | request | disk ID | Pollster | Libvirt, Hyper-V | Number of read requests |
| disk.device.read.requests.rate | Gauge | request/s | disk ID | Pollster | Libvirt, Hyper-V, vSphere | Average rate of read requests |
| disk.device.write.requests | Cumulative | request | disk ID | Pollster | Libvirt, Hyper-V | Number of write requests |
| disk.device.write.requests.rate | Gauge | request/s | disk ID | Pollster | Libvirt, Hyper-V, vSphere | Average rate of write requests |
| disk.device.read.bytes | Cumulative | B | disk ID | Pollster | Libvirt, Hyper-V | Volume of reads |
| disk.device.read.bytes.rate | Gauge | B/s | disk ID | Pollster | Libvirt, Hyper-V, vSphere | Average rate of reads |
| disk.device.write.bytes | Cumulative | B | disk ID | Pollster | Libvirt, Hyper-V | Volume of writes |
| disk.device.write.bytes.rate | Gauge | B/s | disk ID | Pollster | Libvirt, Hyper-V, vSphere | Average rate of writes |
| network.incoming.bytes.rate | Gauge | B/s | interface ID | Pollster | Libvirt, Hyper-V, vSphere, XenAPI | Average rate of incoming bytes |
| network.outgoing.bytes.rate | Gauge | B/s | interface ID | Pollster | Libvirt, Hyper-V, vSphere, XenAPI | Average rate of outgoing bytes |
| network.incoming.packets.rate | Gauge | packet/s | interface ID | Pollster | Libvirt, Hyper-V, vSphere, XenAPI | Average rate of incoming packets |
| network.outgoing.packets.rate | Gauge | packet/s | interface ID | Pollster | Libvirt, Hyper-V, vSphere, XenAPI | Average rate of outgoing packets |
| **Meters added or hypervisor support changed in the Kilo release** | | | | | | |
| memory.usage | Gauge | MB | instance ID | Pollster | Libvirt, Hyper-V, vSphere, XenAPI | Volume of RAM used by the instance from the amount of its allocated memory |
| memory.resident | Gauge | MB | instance ID | Pollster | Libvirt | Volume of RAM used by the instance on the physical machine |
| disk.latency | Gauge | ms | instance ID | Pollster | Hyper-V | Average disk latency |
| disk.iops | Gauge | count/s | instance ID | Pollster | Hyper-V | Average disk IOPS |
| disk.device.latency | Gauge | ms | disk ID | Pollster | Hyper-V | Average disk latency per device |
| disk.device.iops | Gauge | count/s | disk ID | Pollster | Hyper-V | Average disk IOPS per device |
| disk.capacity | Gauge | B | instance ID | Pollster | Libvirt | The amount of disk that the instance can see |
| disk.allocation | Gauge | B | instance ID | Pollster | Libvirt | The amount of disk occupied by the instance on the host machine |
| disk.usage | Gauge | B | instance ID | Pollster | Libvirt | The physical size in bytes of the image container on the host |
| disk.device.capacity | Gauge | B | disk ID | Pollster | Libvirt | The amount of disk per device that the instance can see |
| disk.device.allocation | Gauge | B | disk ID | Pollster | Libvirt | The amount of disk per device occupied by the instance on the host machine |
| disk.device.usage | Gauge | B | disk ID | Pollster | Libvirt | The physical size in bytes of the image container on the host per device |
| **Meters deprecated in the Kilo release** | | | | | | |
| instance:<type> | Gauge | instance | instance ID | Notification, Pollster | Libvirt, Hyper-V, vSphere, XenAPI | Existence of instance <type> (OpenStack types) |
| **Meters added in the Liberty release** | | | | | | |
| cpu.delta | Delta | ns | instance ID | Pollster | Libvirt, Hyper-V | CPU time used since previous datapoint |
| **Meters added in the Newton release** | | | | | | |
| cpu_l3_cache | Gauge | B | instance ID | Pollster | Libvirt | L3 cache used by the instance |
| memory.bandwidth.total | Gauge | B/s | instance ID | Pollster | Libvirt | Total system bandwidth from one level of cache |
| memory.bandwidth.local | Gauge | B/s | instance ID | Pollster | Libvirt | Bandwidth of memory traffic for a memory controller |
| perf.cpu.cycles | Gauge | cycle | instance ID | Pollster | Libvirt | The number of CPU cycles one instruction needs |
| perf.instructions | Gauge | instruction | instance ID | Pollster | Libvirt | The count of instructions |
| perf.cache.references | Gauge | count | instance ID | Pollster | Libvirt | The count of cache hits |
| perf.cache.misses | Gauge | count | instance ID | Pollster | Libvirt | The count of cache misses |
In the Ocata release, the instance meter is no longer supported.
The instance:<type>
meter can be replaced by using extra parameters in
both the samples and statistics queries. Sample queries look like:
statistics: ceilometer statistics -m instance -g resource_metadata.instance_type
samples:    ceilometer sample-list -m instance -q metadata.instance_type=<value>
The Telemetry service supports creating new meters by using transformers. For more details about transformers see Section 10.3.1.1, "Transformers". Among the meters gathered from libvirt and Hyper-V, there are a few which are generated from other meters. The following meters from the table above are created by using the rate_of_change transformer:
cpu_util
disk.read.requests.rate
disk.write.requests.rate
disk.read.bytes.rate
disk.write.bytes.rate
disk.device.read.requests.rate
disk.device.write.requests.rate
disk.device.read.bytes.rate
disk.device.write.bytes.rate
network.incoming.bytes.rate
network.outgoing.bytes.rate
network.incoming.packets.rate
network.outgoing.packets.rate
To enable the libvirt memory.usage support, you need to install libvirt version 1.1.1+ and QEMU version 1.5+, and you also need to prepare a suitable balloon driver in the image. This applies particularly to Windows guests; most modern Linux distributions already have it built in. Telemetry is not able to fetch memory.usage samples without the image balloon driver.
OpenStack Compute is capable of collecting CPU related meters from the compute host machines. To use this feature, you need to set the compute_monitors option to ComputeDriverCPUMonitor in the nova.conf configuration file. For further information see the Compute configuration section in the Compute chapter of the OpenStack Configuration Reference.
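A minimal sketch of this nova.conf setting, assuming the option lives in the [DEFAULT] section (verify against the Configuration Reference for your release):
[DEFAULT]
compute_monitors = ComputeDriverCPUMonitor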
The following host machine related meters are collected for OpenStack Compute:
| Name | Type | Unit | Resource | Origin | Note |
|---|---|---|---|---|---|
| **Meters added in the Icehouse release or earlier** | | | | | |
| compute.node.cpu.frequency | Gauge | MHz | host ID | Notification | CPU frequency |
| compute.node.cpu.kernel.time | Cumulative | ns | host ID | Notification | CPU kernel time |
| compute.node.cpu.idle.time | Cumulative | ns | host ID | Notification | CPU idle time |
| compute.node.cpu.user.time | Cumulative | ns | host ID | Notification | CPU user mode time |
| compute.node.cpu.iowait.time | Cumulative | ns | host ID | Notification | CPU I/O wait time |
| compute.node.cpu.kernel.percent | Gauge | % | host ID | Notification | CPU kernel percentage |
| compute.node.cpu.idle.percent | Gauge | % | host ID | Notification | CPU idle percentage |
| compute.node.cpu.user.percent | Gauge | % | host ID | Notification | CPU user mode percentage |
| compute.node.cpu.iowait.percent | Gauge | % | host ID | Notification | CPU I/O wait percentage |
| compute.node.cpu.percent | Gauge | % | host ID | Notification | CPU utilization |
Telemetry captures notifications that are emitted by the Bare metal service. The source of these notifications is IPMI sensors that collect data from the host machine.
The sensor data is not available in the Bare metal service by default. To enable the meters and configure this module to emit notifications about the measured values see the Installation Guide for the Bare metal service.
The following meters are recorded for the Bare metal service:
| Name | Type | Unit | Resource | Origin | Note |
|---|---|---|---|---|---|
| **Meters added in the Juno release** | | | | | |
| hardware.ipmi.fan | Gauge | RPM | fan sensor | Notification | Fan rounds per minute (RPM) |
| hardware.ipmi.temperature | Gauge | C | temperature sensor | Notification | Temperature reading from sensor |
| hardware.ipmi.current | Gauge | W | current sensor | Notification | Current reading from sensor |
| hardware.ipmi.voltage | Gauge | V | voltage sensor | Notification | Voltage reading from sensor |
Another way of gathering IPMI based data is to use IPMI sensors independently of the Bare metal service's components. The same meters as in Section 10.6.2, "Bare metal service" can be fetched, except that the origin is Pollster instead of Notification.
You need to deploy the ceilometer-agent-ipmi on each IPMI-capable node in order to poll local sensor data. For further information about the IPMI agent see Section 10.2.2.2, “IPMI agent”.
To avoid duplication of metering data and unnecessary load on the IPMI interface, do not deploy the IPMI agent on nodes that are managed by the Bare metal service, and keep the conductor.send_sensor_data option set to False in the ironic.conf configuration file.
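A minimal sketch of that ironic.conf entry; the [conductor] section is inferred from the option's dotted name:
[conductor]
send_sensor_data = False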
Besides generic IPMI sensor data, the following Intel Node Manager meters are recorded from capable platforms:
| Name | Type | Unit | Resource | Origin | Note |
|---|---|---|---|---|---|
| **Meters added in the Juno release** | | | | | |
| hardware.ipmi.node.power | Gauge | W | host ID | Pollster | Current power of the system |
| hardware.ipmi.node.temperature | Gauge | C | host ID | Pollster | Current temperature of the system |
| **Meters added in the Kilo release** | | | | | |
| hardware.ipmi.node.inlet_temperature | Gauge | C | host ID | Pollster | Inlet temperature of the system |
| hardware.ipmi.node.outlet_temperature | Gauge | C | host ID | Pollster | Outlet temperature of the system |
| hardware.ipmi.node.airflow | Gauge | CFM | host ID | Pollster | Volumetric airflow of the system, expressed as 1/10th of CFM |
| hardware.ipmi.node.cups | Gauge | CUPS | host ID | Pollster | CUPS (Compute Usage Per Second) index data of the system |
| hardware.ipmi.node.cpu_util | Gauge | % | host ID | Pollster | CPU CUPS utilization of the system |
| hardware.ipmi.node.mem_util | Gauge | % | host ID | Pollster | Memory CUPS utilization of the system |
| hardware.ipmi.node.io_util | Gauge | % | host ID | Pollster | IO CUPS utilization of the system |
The following meter was renamed in the Kilo release:

| Original Name | New Name |
|---|---|
| hardware.ipmi.node.temperature | hardware.ipmi.node.inlet_temperature |
Telemetry supports gathering SNMP based generic host meters. In order to be able to collect this data you need to run snmpd on each target host.
The following meters are available about the host machines by using SNMP:
| Name | Type | Unit | Resource | Origin | Note |
|---|---|---|---|---|---|
| **Meters added in the Kilo release** | | | | | |
| hardware.cpu.load.1min | Gauge | process | host ID | Pollster | CPU load in the past 1 minute |
| hardware.cpu.load.5min | Gauge | process | host ID | Pollster | CPU load in the past 5 minutes |
| hardware.cpu.load.15min | Gauge | process | host ID | Pollster | CPU load in the past 15 minutes |
| hardware.disk.size.total | Gauge | KB | disk ID | Pollster | Total disk size |
| hardware.disk.size.used | Gauge | KB | disk ID | Pollster | Used disk size |
| hardware.memory.total | Gauge | KB | host ID | Pollster | Total physical memory size |
| hardware.memory.used | Gauge | KB | host ID | Pollster | Used physical memory size |
| hardware.memory.buffer | Gauge | KB | host ID | Pollster | Physical memory buffer size |
| hardware.memory.cached | Gauge | KB | host ID | Pollster | Cached physical memory size |
| hardware.memory.swap.total | Gauge | KB | host ID | Pollster | Total swap space size |
| hardware.memory.swap.avail | Gauge | KB | host ID | Pollster | Available swap space size |
| hardware.network.incoming.bytes | Cumulative | B | interface ID | Pollster | Bytes received by network interface |
| hardware.network.outgoing.bytes | Cumulative | B | interface ID | Pollster | Bytes sent by network interface |
| hardware.network.outgoing.errors | Cumulative | packet | interface ID | Pollster | Sending error of network interface |
| hardware.network.ip.incoming.datagrams | Cumulative | datagrams | host ID | Pollster | Number of received datagrams |
| hardware.network.ip.outgoing.datagrams | Cumulative | datagrams | host ID | Pollster | Number of sent datagrams |
| hardware.system_stats.io.incoming.blocks | Cumulative | blocks | host ID | Pollster | Aggregated number of blocks received to block device |
| hardware.system_stats.io.outgoing.blocks | Cumulative | blocks | host ID | Pollster | Aggregated number of blocks sent to block device |
| hardware.system_stats.cpu.idle | Gauge | % | host ID | Pollster | CPU idle percentage |
| **Meters added in the Mitaka release** | | | | | |
| hardware.cpu.util | Gauge | % | host ID | Pollster | CPU usage percentage |
The following meters are collected for OpenStack Image service:
| Name | Type | Unit | Resource | Origin | Note |
|---|---|---|---|---|---|
| **Meters added in the Icehouse release or earlier** | | | | | |
| image | Gauge | image | image ID | Notification, Pollster | Existence of the image |
| image.size | Gauge | image | image ID | Notification, Pollster | Size of the uploaded image |
| image.update | Delta | image | image ID | Notification | Number of updates on the image |
| image.upload | Delta | image | image ID | Notification | Number of uploads on the image |
| image.delete | Delta | image | image ID | Notification | Number of deletes on the image |
| image.download | Delta | B | image ID | Notification | Image is downloaded |
| image.serve | Delta | B | image ID | Notification | Image is served out |
The following meters are collected for OpenStack Block Storage:
| Name | Type | Unit | Resource | Origin | Note |
|---|---|---|---|---|---|
| **Meters added in the Icehouse release or earlier** | | | | | |
| volume | Gauge | volume | volume ID | Notification | Existence of the volume |
| volume.size | Gauge | GB | volume ID | Notification | Size of the volume |
| **Meters added in the Juno release** | | | | | |
| snapshot | Gauge | snapshot | snapshot ID | Notification | Existence of the snapshot |
| snapshot.size | Gauge | GB | snapshot ID | Notification | Size of the snapshot |
| **Meters added in the Kilo release** | | | | | |
| volume.create.(start\|end) | Delta | volume | volume ID | Notification | Creation of the volume |
| volume.delete.(start\|end) | Delta | volume | volume ID | Notification | Deletion of the volume |
| volume.update.(start\|end) | Delta | volume | volume ID | Notification | Update the name or description of the volume |
| volume.resize.(start\|end) | Delta | volume | volume ID | Notification | Update the size of the volume |
| volume.attach.(start\|end) | Delta | volume | volume ID | Notification | Attaching the volume to an instance |
| volume.detach.(start\|end) | Delta | volume | volume ID | Notification | Detaching the volume from an instance |
| snapshot.create.(start\|end) | Delta | snapshot | snapshot ID | Notification | Creation of the snapshot |
| snapshot.delete.(start\|end) | Delta | snapshot | snapshot ID | Notification | Deletion of the snapshot |
| volume.backup.create.(start\|end) | Delta | volume | backup ID | Notification | Creation of the volume backup |
| volume.backup.delete.(start\|end) | Delta | volume | backup ID | Notification | Deletion of the volume backup |
| volume.backup.restore.(start\|end) | Delta | volume | backup ID | Notification | Restoration of the volume backup |
The following meters are collected for OpenStack Object Storage:
| Name | Type | Unit | Resource | Origin | Note |
|---|---|---|---|---|---|
| **Meters added in the Icehouse release or earlier** | | | | | |
| storage.objects | Gauge | object | storage ID | Pollster | Number of objects |
| storage.objects.size | Gauge | B | storage ID | Pollster | Total size of stored objects |
| storage.objects.containers | Gauge | container | storage ID | Pollster | Number of containers |
| storage.objects.incoming.bytes | Delta | B | storage ID | Notification | Number of incoming bytes |
| storage.objects.outgoing.bytes | Delta | B | storage ID | Notification | Number of outgoing bytes |
| storage.api.request | Delta | request | storage ID | Notification | Number of API requests against OpenStack Object Storage |
| storage.containers.objects | Gauge | object | storage ID/container | Pollster | Number of objects in container |
| storage.containers.objects.size | Gauge | B | storage ID/container | Pollster | Total size of stored objects in container |
| **Meters deprecated in the Kilo release** | | | | | |
| storage.objects.incoming.bytes | Delta | B | storage ID | Notification | Number of incoming bytes |
| storage.objects.outgoing.bytes | Delta | B | storage ID | Notification | Number of outgoing bytes |
| storage.api.request | Delta | request | storage ID | Notification | Number of API requests against OpenStack Object Storage |
In order to gather meters from Ceph, you have to install and configure the Ceph Object Gateway (radosgw) as described in the Installation Manual. You have to enable usage logging in order to get the related meters from Ceph. You will also need an admin user with users, buckets, metadata and usage caps configured.
In order to access Ceph from Telemetry, you need to specify a service group for radosgw in the ceilometer.conf configuration file along with the access_key and secret_key of the admin user mentioned above.
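A minimal sketch of such a configuration; the section names below are assumptions modeled on the description above, and the key values are placeholders:
[service_types]
radosgw = object-store

[rgw_admin_credentials]
access_key = ADMIN_ACCESS_KEY
secret_key = ADMIN_SECRET_KEY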
The following meters are collected for Ceph Object Storage:
| Name | Type | Unit | Resource | Origin | Note |
|---|---|---|---|---|---|
| **Meters added in the Kilo release** | | | | | |
| radosgw.objects | Gauge | object | storage ID | Pollster | Number of objects |
| radosgw.objects.size | Gauge | B | storage ID | Pollster | Total size of stored objects |
| radosgw.objects.containers | Gauge | container | storage ID | Pollster | Number of containers |
| radosgw.api.request | Gauge | request | storage ID | Pollster | Number of API requests against Ceph Object Gateway (radosgw) |
| radosgw.containers.objects | Gauge | object | storage ID/container | Pollster | Number of objects in container |
| radosgw.containers.objects.size | Gauge | B | storage ID/container | Pollster | Total size of stored objects in container |
The usage
related information may not be updated right after an
upload or download, because the Ceph Object Gateway needs time to
update the usage properties. For instance, the default configuration
needs approximately 30 minutes to generate the usage logs.
The following meters are collected for OpenStack Identity:
| Name | Type | Unit | Resource | Origin | Note |
|---|---|---|---|---|---|
| **Meters added in the Juno release** | | | | | |
| identity.authenticate.success | Delta | user | user ID | Notification | User successfully authenticated |
| identity.authenticate.pending | Delta | user | user ID | Notification | User pending authentication |
| identity.authenticate.failure | Delta | user | user ID | Notification | User failed to authenticate |
| identity.user.created | Delta | user | user ID | Notification | User is created |
| identity.user.deleted | Delta | user | user ID | Notification | User is deleted |
| identity.user.updated | Delta | user | user ID | Notification | User is updated |
| identity.group.created | Delta | group | group ID | Notification | Group is created |
| identity.group.deleted | Delta | group | group ID | Notification | Group is deleted |
| identity.group.updated | Delta | group | group ID | Notification | Group is updated |
| identity.role.created | Delta | role | role ID | Notification | Role is created |
| identity.role.deleted | Delta | role | role ID | Notification | Role is deleted |
| identity.role.updated | Delta | role | role ID | Notification | Role is updated |
| identity.project.created | Delta | project | project ID | Notification | Project is created |
| identity.project.deleted | Delta | project | project ID | Notification | Project is deleted |
| identity.project.updated | Delta | project | project ID | Notification | Project is updated |
| identity.trust.created | Delta | trust | trust ID | Notification | Trust is created |
| identity.trust.deleted | Delta | trust | trust ID | Notification | Trust is deleted |
| **Meters added in the Kilo release** | | | | | |
| identity.role_assignment.created | Delta | role_assignment | role ID | Notification | Role is added to an actor on a target |
| identity.role_assignment.deleted | Delta | role_assignment | role ID | Notification | Role is removed from an actor on a target |

All of the above meters were deprecated in the Liberty release.
The following meters are collected for OpenStack Networking:
| Name | Type | Unit | Resource | Origin | Note |
|---|---|---|---|---|---|
| **Meters added in the Icehouse release or earlier** | | | | | |
| network | Gauge | network | network ID | Notification | Existence of network |
| network.create | Delta | network | network ID | Notification | Creation requests for this network |
| network.update | Delta | network | network ID | Notification | Update requests for this network |
| subnet | Gauge | subnet | subnet ID | Notification | Existence of subnet |
| subnet.create | Delta | subnet | subnet ID | Notification | Creation requests for this subnet |
| subnet.update | Delta | subnet | subnet ID | Notification | Update requests for this subnet |
| port | Gauge | port | port ID | Notification | Existence of port |
| port.create | Delta | port | port ID | Notification | Creation requests for this port |
| port.update | Delta | port | port ID | Notification | Update requests for this port |
| router | Gauge | router | router ID | Notification | Existence of router |
| router.create | Delta | router | router ID | Notification | Creation requests for this router |
| router.update | Delta | router | router ID | Notification | Update requests for this router |
| ip.floating | Gauge | ip | ip ID | Notification, Pollster | Existence of IP |
| ip.floating.create | Delta | ip | ip ID | Notification | Creation requests for this IP |
| ip.floating.update | Delta | ip | ip ID | Notification | Update requests for this IP |
| bandwidth | Delta | B | label ID | Notification | Bytes through this l3 metering label |
The following meters are collected for SDN:
| Name | Type | Unit | Resource | Origin | Note |
|---|---|---|---|---|---|
| **Meters added in the Icehouse release or earlier** | | | | | |
| switch | Gauge | switch | switch ID | Pollster | Existence of switch |
| switch.port | Gauge | port | switch ID | Pollster | Existence of port |
| switch.port.receive.packets | Cumulative | packet | switch ID | Pollster | Packets received on port |
| switch.port.transmit.packets | Cumulative | packet | switch ID | Pollster | Packets transmitted on port |
| switch.port.receive.bytes | Cumulative | B | switch ID | Pollster | Bytes received on port |
| switch.port.transmit.bytes | Cumulative | B | switch ID | Pollster | Bytes transmitted on port |
| switch.port.receive.drops | Cumulative | packet | switch ID | Pollster | Drops received on port |
| switch.port.transmit.drops | Cumulative | packet | switch ID | Pollster | Drops transmitted on port |
| switch.port.receive.errors | Cumulative | packet | switch ID | Pollster | Errors received on port |
| switch.port.transmit.errors | Cumulative | packet | switch ID | Pollster | Errors transmitted on port |
| switch.port.receive.frame_error | Cumulative | packet | switch ID | Pollster | Frame alignment errors received on port |
| switch.port.receive.overrun_error | Cumulative | packet | switch ID | Pollster | Overrun errors received on port |
| switch.port.receive.crc_error | Cumulative | packet | switch ID | Pollster | CRC errors received on port |
| switch.port.collision.count | Cumulative | count | switch ID | Pollster | Collisions on port |
| switch.table | Gauge | table | switch ID | Pollster | Duration of table |
| switch.table.active.entries | Gauge | entry | switch ID | Pollster | Active entries in table |
| switch.table.lookup.packets | Gauge | packet | switch ID | Pollster | Lookup packets for table |
| switch.table.matched.packets | Gauge | packet | switch ID | Pollster | Packet matches for table |
| switch.flow | Gauge | flow | switch ID | Pollster | Duration of flow |
| switch.flow.duration.seconds | Gauge | s | switch ID | Pollster | Duration of flow in seconds |
| switch.flow.duration.nanoseconds | Gauge | ns | switch ID | Pollster | Duration of flow in nanoseconds |
| switch.flow.packets | Cumulative | packet | switch ID | Pollster | Packets received |
| switch.flow.bytes | Cumulative | B | switch ID | Pollster | Bytes received |
These meters are available for OpenFlow based switches. In order to enable these meters, each driver needs to be properly configured.
The following meters are collected for LBaaS v1:
| Name | Type | Unit | Resource | Origin | Note |
|---|---|---|---|---|---|
| **Meters added in the Juno release** | | | | | |
| network.services.lb.pool | Gauge | pool | pool ID | Notification, Pollster | Existence of a LB pool |
| network.services.lb.vip | Gauge | vip | vip ID | Notification, Pollster | Existence of a LB VIP |
| network.services.lb.member | Gauge | member | member ID | Notification, Pollster | Existence of a LB member |
| network.services.lb.health_monitor | Gauge | health_monitor | monitor ID | Notification, Pollster | Existence of a LB health probe |
| network.services.lb.total.connections | Cumulative | connection | pool ID | Pollster | Total connections on a LB |
| network.services.lb.active.connections | Gauge | connection | pool ID | Pollster | Active connections on a LB |
| network.services.lb.incoming.bytes | Gauge | B | pool ID | Pollster | Number of incoming Bytes |
| network.services.lb.outgoing.bytes | Gauge | B | pool ID | Pollster | Number of outgoing Bytes |
| **Meters added in the Kilo release** | | | | | |
| network.services.lb.pool.create | Delta | pool | pool ID | Notification | LB pool was created |
| network.services.lb.pool.update | Delta | pool | pool ID | Notification | LB pool was updated |
| network.services.lb.vip.create | Delta | vip | vip ID | Notification | LB VIP was created |
| network.services.lb.vip.update | Delta | vip | vip ID | Notification | LB VIP was updated |
| network.services.lb.member.create | Delta | member | member ID | Notification | LB member was created |
| network.services.lb.member.update | Delta | member | member ID | Notification | LB member was updated |
| network.services.lb.health_monitor.create | Delta | health_monitor | monitor ID | Notification | LB health probe was created |
| network.services.lb.health_monitor.update | Delta | health_monitor | monitor ID | Notification | LB health probe was updated |
The following meters are collected for LBaaS v2. They were added in the Mitaka release:
| Name | Type | Unit | Resource | Origin | Note |
|---|---|---|---|---|---|
| network.services.lb.pool | Gauge | pool | pool ID | Notification, Pollster | Existence of a LB pool |
| network.services.lb.listener | Gauge | listener | listener ID | Notification, Pollster | Existence of a LB listener |
| network.services.lb.member | Gauge | member | member ID | Notification, Pollster | Existence of a LB member |
| network.services.lb.health_monitor | Gauge | health_monitor | monitor ID | Notification, Pollster | Existence of a LB health probe |
| network.services.lb.loadbalancer | Gauge | loadbalancer | loadbalancer ID | Notification, Pollster | Existence of a LB loadbalancer |
| network.services.lb.total.connections | Cumulative | connection | pool ID | Pollster | Total connections on a LB |
| network.services.lb.active.connections | Gauge | connection | pool ID | Pollster | Active connections on a LB |
| network.services.lb.incoming.bytes | Gauge | B | pool ID | Pollster | Number of incoming Bytes |
| network.services.lb.outgoing.bytes | Gauge | B | pool ID | Pollster | Number of outgoing Bytes |
| network.services.lb.pool.create | Delta | pool | pool ID | Notification | LB pool was created |
| network.services.lb.pool.update | Delta | pool | pool ID | Notification | LB pool was updated |
| network.services.lb.listener.create | Delta | listener | listener ID | Notification | LB listener was created |
| network.services.lb.listener.update | Delta | listener | listener ID | Notification | LB listener was updated |
| network.services.lb.member.create | Delta | member | member ID | Notification | LB member was created |
| network.services.lb.member.update | Delta | member | member ID | Notification | LB member was updated |
| network.services.lb.healthmonitor.create | Delta | health_monitor | monitor ID | Notification | LB health probe was created |
| network.services.lb.healthmonitor.update | Delta | health_monitor | monitor ID | Notification | LB health probe was updated |
| network.services.lb.loadbalancer.create | Delta | loadbalancer | loadbalancer ID | Notification | LB loadbalancer was created |
| network.services.lb.loadbalancer.update | Delta | loadbalancer | loadbalancer ID | Notification | LB loadbalancer was updated |
The above meters are experimental and may generate a large load against the Neutron APIs. Future enhancements will be implemented when Neutron supports the new APIs.
The following meters are collected for VPNaaS:
| Name | Type | Unit | Resource | Origin | Note |
|---|---|---|---|---|---|
| **Meters added in the Juno release** | | | | | |
| network.services.vpn | Gauge | vpnservice | vpn ID | Notification, Pollster | Existence of a VPN |
| network.services.vpn.connections | Gauge | ipsec_site_connection | connection ID | Notification, Pollster | Existence of an IPSec connection |
| **Meters added in the Kilo release** | | | | | |
| network.services.vpn.create | Delta | vpnservice | vpn ID | Notification | VPN was created |
| network.services.vpn.update | Delta | vpnservice | vpn ID | Notification | VPN was updated |
| network.services.vpn.connections.create | Delta | ipsec_site_connection | connection ID | Notification | IPSec connection was created |
| network.services.vpn.connections.update | Delta | ipsec_site_connection | connection ID | Notification | IPSec connection was updated |
| network.services.vpn.ipsecpolicy | Gauge | ipsecpolicy | ipsecpolicy ID | Notification, Pollster | Existence of an IPSec policy |
| network.services.vpn.ipsecpolicy.create | Delta | ipsecpolicy | ipsecpolicy ID | Notification | IPSec policy was created |
| network.services.vpn.ipsecpolicy.update | Delta | ipsecpolicy | ipsecpolicy ID | Notification | IPSec policy was updated |
| network.services.vpn.ikepolicy | Gauge | ikepolicy | ikepolicy ID | Notification, Pollster | Existence of an IKE policy |
| network.services.vpn.ikepolicy.create | Delta | ikepolicy | ikepolicy ID | Notification | IKE policy was created |
| network.services.vpn.ikepolicy.update | Delta | ikepolicy | ikepolicy ID | Notification | IKE policy was updated |
The following meters are collected for FWaaS:
| Name | Type | Unit | Resource | Origin | Note |
|---|---|---|---|---|---|
| **Meters added in the Juno release** | | | | | |
| network.services.firewall | Gauge | firewall | firewall ID | Notification, Pollster | Existence of a firewall |
| network.services.firewall.policy | Gauge | firewall_policy | firewall ID | Notification, Pollster | Existence of a firewall policy |
| **Meters added in the Kilo release** | | | | | |
| network.services.firewall.create | Delta | firewall | firewall ID | Notification | Firewall was created |
| network.services.firewall.update | Delta | firewall | firewall ID | Notification | Firewall was updated |
| network.services.firewall.policy.create | Delta | firewall_policy | policy ID | Notification | Firewall policy was created |
| network.services.firewall.policy.update | Delta | firewall_policy | policy ID | Notification | Firewall policy was updated |
| network.services.firewall.rule | Gauge | firewall_rule | rule ID | Notification | Existence of a firewall rule |
| network.services.firewall.rule.create | Delta | firewall_rule | rule ID | Notification | Firewall rule was created |
| network.services.firewall.rule.update | Delta | firewall_rule | rule ID | Notification | Firewall rule was updated |
The following meters are collected for the Orchestration service:

Name | Type | Unit | Resource | Origin | Note |
---|---|---|---|---|---|
Meters added in the Icehouse release or earlier | | | | | |
stack.create | Delta | stack | stack ID | Notification | Stack was successfully created |
stack.update | Delta | stack | stack ID | Notification | Stack was successfully updated |
stack.delete | Delta | stack | stack ID | Notification | Stack was successfully deleted |
stack.resume | Delta | stack | stack ID | Notification | Stack was successfully resumed |
stack.suspend | Delta | stack | stack ID | Notification | Stack was successfully suspended |

All of the above meters are deprecated as of the Liberty release.
The following meters are collected for the Data processing service for OpenStack:

Name | Type | Unit | Resource | Origin | Note |
---|---|---|---|---|---|
Meters added in the Juno release | | | | | |
cluster.create | Delta | cluster | cluster ID | Notification | Cluster was successfully created |
cluster.update | Delta | cluster | cluster ID | Notification | Cluster was successfully updated |
cluster.delete | Delta | cluster | cluster ID | Notification | Cluster was successfully deleted |

All of the above meters are deprecated as of the Liberty release.
The following meters are collected for the Key Value Store module:

Name | Type | Unit | Resource | Origin | Note |
---|---|---|---|---|---|
Meters added in the Kilo release | | | | | |
magnetodb.table.create | Gauge | table | table ID | Notification | Table was successfully created |
magnetodb.table.delete | Gauge | table | table ID | Notification | Table was successfully deleted |
magnetodb.table.index.count | Gauge | index | table ID | Notification | Number of indices created in a table |
The Key Value Store meters are not supported in the Newton release and later.
The following energy-related meters are available:

Name | Type | Unit | Resource | Origin | Note |
---|---|---|---|---|---|
Meters added in the Icehouse release or earlier | | | | | |
energy | Cumulative | kWh | probe ID | Pollster | Amount of energy |
power | Gauge | W | probe ID | Pollster | Power consumption |
In addition to meters, the Telemetry service collects events triggered within an OpenStack environment. This section provides a brief summary of the events format in the Telemetry service.
While a sample represents a single, numeric datapoint within a time-series, an event is a broader concept that represents the state of a resource at a point in time. The state may be described using various data types including non-numeric data such as an instance's flavor. In general, events represent any action made in the OpenStack system.
To enable the creation and storage of events in the Telemetry service, the `store_events` option needs to be set to `True`. For further configuration options, see the event section in the OpenStack Configuration Reference.
It is advisable to set `disable_non_metric_meters` to `True` when enabling events in the Telemetry service. The Telemetry service historically represented events as metering data, which may create duplication of data if both events and non-metric meters are enabled.
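A minimal `ceilometer.conf` sketch for this setup might look like the following; both options are assumed to live in the `[notification]` section, as in recent releases:

```ini
[notification]
# Generate events from incoming notifications.
store_events = True
# Avoid duplicating event data as non-metric meters.
disable_non_metric_meters = True
```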
Events captured by the Telemetry service are represented by five key attributes:

- `event_type`: A dotted string defining what event occurred, such as `"compute.instance.resize.start"`.
- `message_id`: A UUID for the event.
- `generated`: A timestamp of when the event occurred in the system.
- `traits`: A flat mapping of key-value pairs which describe the event. The event's traits contain most of the details of the event. Traits are typed, and can be strings, integers, floats, or datetimes.
- `raw`: Mainly for auditing purposes; the full event message can be stored (unindexed) for future evaluation.
The general philosophy of notifications in OpenStack is to emit any and all data someone might need, and let the consumer filter out what they are not interested in. In order to make processing simpler and more efficient, the notifications are stored and processed within Ceilometer as events. The notification payload, which can be an arbitrarily complex JSON data structure, is converted to a flat set of key-value pairs. This conversion is specified by a config file.
The event format is meant for efficient processing and querying. Storage of complete notifications for auditing purposes can be enabled by configuring the `store_raw` option.
The conversion from notifications to events is driven by a configuration file defined by the `definitions_cfg_file` option in the `ceilometer.conf` configuration file.
This includes descriptions of how to map fields in the notification body to Traits, and optional plug-ins for doing any programmatic translations (splitting a string, forcing case).
The mapping of notifications to events is defined per event_type, which can be wildcarded. Traits are added to events if the corresponding fields in the notification exist and are non-null.
The default definition file included with the Telemetry service contains a list of known notifications and useful traits. The mappings provided can be modified to include more or less data according to user requirements.
If the definitions file is not present, a warning will be logged, but an empty set of definitions will be assumed. By default, any notifications that do not have a corresponding event definition in the definitions file will be converted to events with a set of minimal traits. This can be changed by setting the `drop_unmatched_notifications` option in the `ceilometer.conf` file. If this is set to `True`, any unmapped notifications will be dropped.
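For reference, a sketch of the relevant `ceilometer.conf` options (assuming they live in the `[event]` section, as in recent releases):

```ini
[event]
# YAML file holding the notification-to-event mappings.
definitions_cfg_file = event_definitions.yaml
# Keep unmapped notifications as minimal-trait events
# instead of dropping them.
drop_unmatched_notifications = False
```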
The basic set of traits (all are TEXT type) that will be added to all events, if the notification has the relevant data, are: service (the notification's publisher), `tenant_id`, and `request_id`. These do not have to be specified in the event definition; they are automatically added, but their definitions can be overridden for a given event_type.
The event definitions file is in YAML format. It consists of a list of event definitions, which are mappings. Order is significant; the list of definitions is scanned in reverse order to find a definition which matches the notification's event_type, and that definition will be used to generate the event. The reverse ordering is used because it is common to want a more general wildcarded definition (such as `compute.instance.*`) with a set of traits common to all of those events, followed by a few more specific event definitions that have all of the above traits, plus a few more.
Each event definition is a mapping with two keys:

- `event_type`: This is a list (or a string, which will be taken as a one-element list) of event_types this definition will handle. These can be wildcarded with UNIX shell glob syntax. An exclusion listing (starting with a `!`) will exclude any types listed from matching. If only exclusions are listed, the definition will match anything not matching the exclusions.
- `traits`: This is a mapping; the keys are the trait names, and the values are trait definitions.
Each trait definition is a mapping with the following keys:

- `fields`: A path specification for the field(s) in the notification you wish to extract for this trait. Specifications can be written to match multiple possible fields; by default the value will be the first such field. The paths can be specified with a dot syntax (`payload.host`). Square bracket syntax (`payload[host]`) is also supported. In either case, if the key for the field you are looking for contains special characters, like `.`, it will need to be quoted (with double or single quotes): `payload.image_meta.'org.openstack__1__architecture'`. The syntax used for the field specification is a variant of JSONPath.
- `type`: (Optional) The data type for this trait. Valid options are: `text`, `int`, `float`, and `datetime`. Defaults to `text` if not specified.
- `plugin`: (Optional) Used to execute simple programmatic conversions on the value in a notification field.
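As an illustrative sketch (the traits and payload fields below are assumptions modeled on the stock compute notifications, not a verbatim copy of the shipped default file), a definitions file might contain:

```yaml
---
# Definitions are scanned in reverse order, so the more specific
# compute.instance.exists entry below takes precedence over this
# wildcard for that event_type.
- event_type: compute.instance.*
  traits:
    instance_id:
      fields: payload.instance_id
    host:
      fields: payload.host
    memory_mb:
      type: int
      fields: payload.memory_mb
- event_type: compute.instance.exists
  traits:
    instance_id:
      fields: payload.instance_id
    audit_period_beginning:
      type: datetime
      fields: payload.audit_period_beginning
```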
You can configure the Telemetry service to deliver the events into external sinks. These sinks are configurable in the `/etc/ceilometer/event_pipeline.yaml` file.
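A minimal sketch of such a file, modeled on the default layout (the names `event_source` and `event_sink` are conventional, not mandatory):

```yaml
---
sources:
    - name: event_source
      events:
          - "*"          # forward all event types
      sinks:
          - event_sink
sinks:
    - name: event_sink
      transformers:
      publishers:
          - notifier://  # publish to the notification bus
```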
The Telemetry service has log settings similar to those of the other OpenStack services. Multiple options are available to change the target of logging, the format of the log entries, and the log levels.
The log settings can be changed in `ceilometer.conf`. The configuration options are listed in the logging configuration options table in the Telemetry section of the OpenStack Configuration Reference.
By default, `stderr` is used as standard output for the log messages. This can be changed to either a log file or syslog. The `debug` and `verbose` options are also set to `False` in the default settings; the default log levels of the corresponding modules can be found in the table referred to above.
As described in Bug 1355809, the wrong ordering of service startup can result in data loss.
When the services are started for the first time, or restarted along with the message queue service, it takes time for the `ceilometer-collector` service to establish the connection and join, or rejoin, the configured exchanges. Therefore, if the `ceilometer-agent-compute`, `ceilometer-agent-central`, and `ceilometer-agent-notification` services are started before the `ceilometer-collector` service, the `ceilometer-collector` service may lose some messages while connecting to the message queue service. This is more likely to happen when the polling interval is set to a relatively short period. To avoid this situation, start or restart the `ceilometer-collector` service after the message queue service. All the other Telemetry services should be started or restarted after it, and `ceilometer-agent-compute` should be the last in the sequence, as this component emits metering messages in order to send the samples to the collector.
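As a sketch, on a RabbitMQ-based deployment using Red Hat-style systemd unit names (both assumptions; adjust the names to your distribution), the recommended restart order would be:

```console
# systemctl restart rabbitmq-server
# systemctl restart openstack-ceilometer-collector
# systemctl restart openstack-ceilometer-notification
# systemctl restart openstack-ceilometer-central
# systemctl restart openstack-ceilometer-compute
```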
In the Icehouse release of OpenStack, a new service was introduced to be responsible for consuming notifications coming from other OpenStack services. If the `ceilometer-agent-notification` service is not installed and started, samples originating from notifications will not be generated. If notification-based samples are missing, the state of this service and the Telemetry log file should be checked first.
For the list of meters that are originated from notifications, see the Telemetry Measurements Reference.
When using the Telemetry command-line client, the credentials and the `os_auth_url` have to be set in order for the client to authenticate against OpenStack Identity. For further details about the credentials that have to be provided, see the Telemetry Python API.
The service catalog provided by OpenStack Identity contains the URLs that are available for authentication. The URLs have different ports based on whether the type of the given URL is `public`, `internal`, or `admin`.
OpenStack Identity is in the process of changing its API version from v2 to v3. The `adminURL` endpoint (which is available via port 35357) supports only the v3 version, while the other two support both. The Telemetry command-line client is not adapted to the v3 version of the OpenStack Identity API. If the `adminURL` is used as the `os_auth_url`, the `ceilometer` command results in the following error message:
```console
$ ceilometer meter-list
Unable to determine the Keystone version to authenticate with \
using the given auth_url: http://10.0.2.15:35357/v2.0
```
Therefore, when specifying the `os_auth_url` parameter on the command line or by using an environment variable, use the `internalURL` or the `publicURL`.
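For example, using the `OS_AUTH_URL` environment variable (the `controller` host name and the public v2.0 endpoint on port 5000 are placeholders for your deployment's values):

```console
$ export OS_AUTH_URL=http://controller:5000/v2.0
$ ceilometer meter-list
```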
For more details, see the bug report Bug 1351841.
The following are some suggested best practices to follow when deploying and configuring the Telemetry service. The best practices are divided into data collection and storage.
The Telemetry service collects a continuously growing set of data. Not all the data will be relevant for an administrator to monitor.
Based on your needs, you can edit the `pipeline.yaml` configuration file to include a selected number of meters while disregarding the rest.
By default, the Telemetry service polls the service APIs every 10 minutes. You can change the polling interval on a per-meter basis by editing the `pipeline.yaml` configuration file.
If the polling interval is too short, it will likely increase the volume of stored data and the stress on the service APIs.
Expand the configuration to have greater control over different meter intervals, as in the sketch below.
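A minimal `pipeline.yaml` sketch that keeps a 10-minute interval for most meters but polls the `cpu` meter every 60 seconds (the source and sink names are illustrative; this uses the pre-Ocata layout with `interval` defined per source):

```yaml
---
sources:
    - name: meter_source
      interval: 600
      meters:
          - "!cpu"       # everything except cpu
      sinks:
          - meter_sink
    - name: cpu_source
      interval: 60
      meters:
          - "cpu"        # poll cpu more frequently
      sinks:
          - meter_sink
sinks:
    - name: meter_sink
      transformers:
      publishers:
          - notifier://
```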
If you are using the Kilo version of Telemetry, you can delay or adjust polling requests by enabling jitter support. This adds a random delay to how the polling agents send requests to the service APIs. To enable jitter, set `shuffle_time_before_polling_task` in the `ceilometer.conf` configuration file to an integer greater than 0.
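For example, to spread polling requests over a window of up to 60 seconds (the value is a sketch to be tuned per deployment, and the option is assumed to live in the `[DEFAULT]` section):

```ini
[DEFAULT]
# Random delay, in seconds, applied before each polling task.
shuffle_time_before_polling_task = 60
```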
If you are using Juno or later releases, based on the number of resources that will be polled, you can add additional central and compute agents as necessary. The agents are designed to scale horizontally.
If you are using Juno or later releases, use the `notifier://` publisher rather than `rpc://`, as there is a certain level of overhead that comes with RPC.
We recommend that you avoid open-ended queries. In order to get better performance you can use reasonable time ranges and/or other query constraints for retrieving measurements.
For example, this open-ended query might return an unpredictable amount of data:

```console
$ ceilometer sample-list --meter cpu -q 'resource_id=INSTANCE_ID_1'
```
Whereas this well-formed query returns a more reasonable amount of data, and hence better performance:

```console
$ ceilometer sample-list --meter cpu -q 'resource_id=INSTANCE_ID_1;timestamp > 2015-05-01T00:00:00;timestamp < 2015-06-01T00:00:00'
```
As of the Liberty release, the number of items returned is restricted to the value defined by `default_api_return_limit` in the `ceilometer.conf` configuration file. Alternatively, the value can be set per query by passing the `limit` option in the request.
You can install the API behind `mod_wsgi`, as it provides more settings to tweak, such as `threads` and `processes`, when running the API as a WSGI daemon.
The collection service provided by the Telemetry project is not intended to be an archival service. Set a Time to Live (TTL) value to expire data and minimize the database size. If you would like to keep your data for a longer time period, consider storing it in a data warehouse outside of Telemetry.
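For example, to expire samples after 30 days (the option name and `[database]` section are as in recent releases; earlier releases used `time_to_live`):

```ini
[database]
# TTL for samples, in seconds; 2592000 = 30 days.
metering_time_to_live = 2592000
```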
We recommend that you do not use the SQLAlchemy back end prior to the Juno release, as it previously contained extraneous relationships to handle deprecated data models. This resulted in extremely poor query performance.
We recommend that you do not run MongoDB on the same node as the controller. Keep it on a separate node optimized for fast storage for better performance. It is also advisable for the MongoDB node to have a lot of memory.
Use replica sets in MongoDB. Replica sets provide high availability through automatic failover. If your primary node fails, MongoDB will elect a secondary node to replace the primary node, and your cluster will remain functional.
For more information on replica sets, see the MongoDB replica sets docs.
Use sharding in MongoDB. Sharding helps store data records across multiple machines and is MongoDB's approach to meeting the demands of data growth.
For more information on sharding, see the MongoDB sharding docs.