The Gnocchi Experiment
TRANSCRIPT
History
● Ceilometer started in 2012
  ○ Original mission: provide an infrastructure to collect any information needed regarding OpenStack projects
● Added alarming in 2013
  ○ Create rules based on threshold conditions that, when broken, trigger actions
● Added events in 2014
  ○ Capture the state of an object in an OpenStack service at a point in time
● New mission
  ○ To reliably collect data on the utilization of the physical and virtual resources comprising deployed clouds, persist these data for subsequent retrieval and analysis, and trigger actions when defined criteria are met.
Ceilometer Architecture
[Diagram: OpenStack services publish to a notification bus consumed by notification agents; polling agents poll the services directly. Both sets of agents feed pipelines into collectors, which persist alarms, events, and meters in the databases. An API exposes the data to external systems, and an alarm evaluator and alarm notifier act on the alarm data.]
Growing pains
● Too large of a scope - we did everything
● Too complex - must deploy everything
● Too much data - all data in one place
● Too few resources - handful of developers
● Too generic a solution - storage designed to handle any scenario
● Good at nothing, average/bad at everything
Ceilometer
Gnocchi
Ceilometer Architecture
[Diagram: the architecture after the split. Notification and polling agents still consume the notification bus and feed pipelines, but metrics now flow to metric storage behind a Metrics API, events to Panko behind an Events API, and alarms to Aodh, whose alarm evaluator and alarm notifier act on the data. External systems consume both APIs.]
Componentisation
● Split functionality into its own projects
  ○ Faster rate of change
  ○ Less expertise required per project
● Important functionality lives on
● Ceilometer - data gathering and transformation service
● Gnocchi - time series storage service
● Aodh - alarming service
● Panko - event focused storage service
● They all work together and separately
Gnocchi use cases
● Storage brick for a billing system
● Alarm-triggering or monitoring system
● Statistical analysis of usage data
Ceilometer to Gnocchi
● Ceilometer legacy storage captures full-resolution data
  ○ Each datapoint has: timestamp, measurement, IDs, resource metadata, metric metadata, etc…
● Gnocchi stores pre-aggregated data in a time series
  ○ Each datapoint has: timestamp, measurement… that’s it… and then it’s compressed
  ○ Resource metadata is an explicit subset AND not tied to measurement
  ○ Defined archival rules
    ■ e.g. capture data at 1 min granularity for 1 day AND 3 hr granularity for 1 month AND ...
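An archival rule like the one above boils down to a list of (granularity, timespan) pairs; the number of points retained per level is just timespan divided by granularity. A minimal sketch (field names are illustrative, not Gnocchi's exact archive-policy schema):

```python
from datetime import timedelta

# Illustrative archive policy: 1 min granularity for 1 day,
# plus 3 hr granularity for 1 month (assumed to be 30 days here).
archive_policy = [
    {"granularity": timedelta(minutes=1), "timespan": timedelta(days=1)},
    {"granularity": timedelta(hours=3), "timespan": timedelta(days=30)},
]

# Points kept per level = timespan / granularity.
for rule in archive_policy:
    points = int(rule["timespan"] / rule["granularity"])
    print(rule["granularity"], "->", points, "points")
# 1 min over 1 day keeps 1440 points; 3 hr over 30 days keeps 240.
```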
Ceilometer raw sample
{
  "user_id": "0d9d089b8f8340999fbe01354ef84643",
  "resource_id": "a7c7cf84-5bf7-4838-a116-645ea376f4e0",
  "timestamp": "2016-05-11T18:23:46.166000",
  "meter": "disk.write.bytes",
  "volume": 56114794496,
  "source": "openstack",
  "recorded_at": "2016-05-11T18:23:47.177000",
  "project_id": "dec2b73655154e31be903fc93e575146",
  "type": "cumulative",
  "id": "7fbf56ca-17a5-11e6-a210-e8bdd1f62a56",
  "unit": "B",
  "metadata": {
    "instance_host": "cloud03.wz",
    "ephemeral_gb": "0",
    "flavor.vcpus": "8",
    "OS-EXT-AZ.availability_zone": "nova",
    "memory_mb": "16384",
    "display_name": "gord_dev",
    "state": "active",
    "flavor.id": "5",
    "status": "active",
    "ramdisk_id": "None",
    "flavor.name": "m1.xlarge",
    "disk_gb": "160",
    "kernel_id": "None",
    "image.id": "dba2c73c-3f11-45a1-998a-6a4ca2cf243e",
    "flavor.ram": "16384",
    "host": "64fe410a8b602f69fe43a180c62b02d6c00e41c03caba40a092e2fb6",
    "device": "['vda']",
    "flavor.ephemeral": "0",
    "image.name": "fedora-23-x86_64"
  }
}
Separation of value
Resource
● id
● user_id
● project_id
● start_timestamp: timestamp
● end_timestamp: timestamp
● metadata: {attribute: value}
● metrics: list
Measurements
● [ (timestamp, value), ... ]
Metric
● name
● archive_policy
MetricD Aggregation
[Diagram: metricd workers (1) read the raw metric dump from metric storage, (2) compute the aggregations, and (3) write the computed aggregates and the backlog back to metric storage.]
1. Get unprocessed datapoints
2. Compute new aggregations
   a. Update sum, avg, min, max, etc… values based on the defined policy
3. Add datapoints to the backlog for the next computation
   a. Delete datapoints not required for future aggregations
   b. By default, only keep the backlog for a single period.
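The steps above can be sketched as a small processing loop. This is a simplified stand-in for metricd, not its actual implementation, assuming a single 60-second granularity and the default one-period backlog:

```python
GRANULARITY = 60  # seconds; assumed single-level archive policy

def aggregate(backlog, new_points):
    """Fold (unix_timestamp, value) points into per-period aggregates.

    Returns the aggregates and the trimmed backlog for the next run.
    """
    points = sorted(backlog + new_points)
    aggregates = {}
    for ts, value in points:
        period = ts - (ts % GRANULARITY)       # start of the period
        agg = aggregates.setdefault(
            period, {"sum": 0.0, "count": 0, "min": value, "max": value})
        agg["sum"] += value
        agg["count"] += 1
        agg["min"] = min(agg["min"], value)
        agg["max"] = max(agg["max"], value)
    for agg in aggregates.values():
        agg["avg"] = agg["sum"] / agg["count"]
    # Default policy: keep only the most recent period as backlog,
    # since older points can no longer change any future aggregate.
    last_period = max(aggregates)
    new_backlog = [(ts, v) for ts, v in points if ts >= last_period]
    return aggregates, new_backlog

aggs, backlog = aggregate([], [(0, 1.0), (30, 3.0), (65, 5.0)])
# Period 0 -> sum 4.0, avg 2.0, min 1.0, max 3.0; backlog keeps (65, 5.0).
```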
Storage format
[Diagram: metric storage holds three kinds of objects - the raw metric dump, the backlog, and the computed aggregates.]

Raw metric dump
● [ (timestamp, value), (timestamp, value), ... ]
● One object per write

Backlog
● { values: { timestamp: value, timestamp: value }, block_size: max number of points, back_window: number of blocks to retain }
● Binary serialised using msgpack
● One object per metric

Computed aggregates
● { first_timestamp: first timestamp of block, aggregation_method: sum, min, max, etc…, max_size: max number of points, sampling: granularity (60s, 300s, etc…), timestamps: [ time1, time2, … ], values: [ value1, value2, … ] }
● Binary serialised using msgpack
● Compressed with LZ4
● Split into chunks to minimise transfer when updating large series
● (potentially) multiple objects per aggregate per granularity per metric
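To make the aggregate-object layout concrete, here is a minimal sketch of serialising one block. Gnocchi uses msgpack and LZ4; this stand-alone sketch substitutes the standard library's struct and zlib so it runs without third-party packages - the shape (header, then columnar timestamps and values, then compression) is the point, not the exact codec:

```python
import struct
import zlib

def serialise_aggregate(first_timestamp, sampling, timestamps, values):
    # Fixed header: first timestamp of the block and the granularity.
    header = struct.pack("<dd", first_timestamp, sampling)
    # Columnar body: all timestamps, then all values, as doubles.
    body = struct.pack("<%dd" % len(timestamps), *timestamps)
    body += struct.pack("<%dd" % len(values), *values)
    # zlib here stands in for LZ4, which trades ratio for speed.
    return zlib.compress(header + body)

# One day of a flat series at 1 min granularity: 1440 points,
# 23 KB raw as doubles, far less once compressed.
ts = [i * 60.0 for i in range(1440)]
vals = [0.19] * 1440
blob = serialise_aggregate(ts[0], 60.0, ts, vals)
print(len(blob), "bytes vs", 16 * 1440, "raw")
```

Splitting the series into such blocks means an update near the head of the series only rewrites one small object, not the whole history.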
Query path
[Diagram: the API asks the resource indexer to resolve VM1's resource_id to a metric_id, then fetches the measures (all granularities) from metric storage.]
What’s the cpu utilisation for VM1?
+---------------------------+-------------+----------------+
| timestamp                 | granularity | value          |
+---------------------------+-------------+----------------+
| 2016-04-07T00:00:00+00:00 | 86400.0     | 0.30323927544  |
| 2016-04-07T17:00:00+00:00 | 3600.0      | 1.2855184725   |
| 2016-04-07T18:00:00+00:00 | 3600.0      | 0.188613527791 |
| 2016-04-07T19:00:00+00:00 | 3600.0      | 0.188871232024 |
| 2016-04-07T20:00:00+00:00 | 3600.0      | 0.188876901916 |
| 2016-04-07T21:00:00+00:00 | 3600.0      | 0.189646641908 |
| 2016-04-07T21:10:00+00:00 | 300.0       | 0.190019839676 |
| 2016-04-07T21:15:00+00:00 | 300.0       | 0.186565358466 |
| 2016-04-07T21:20:00+00:00 | 300.0       | 0.183166934543 |
| 2016-04-07T21:25:00+00:00 | 300.0       | 0.179994544916 |
| 2016-04-07T21:30:00+00:00 | 300.0       | 0.186649908928 |
| 2016-04-07T21:35:00+00:00 | 300.0       | 0.193315212093 |
| 2016-04-07T21:40:00+00:00 | 300.0       | 0.193272093903 |
| 2016-04-07T21:45:00+00:00 | 300.0       | 0.196677374077 |
| 2016-04-07T21:50:00+00:00 | 300.0       | 0.193300189049 |
+---------------------------+-------------+----------------+
Query path
[Diagram: the API asks the resource indexer for VM1's metadata by resource_id.]
What’s the metadata for VM1?
resource
+-----------------------+----------------------------------------------------------------+
| Field | Value |
+-----------------------+----------------------------------------------------------------+
| created_by_project_id | f7481a38d7c543528d5121fab9eb2b99 |
| created_by_user_id | 9246f424dcb341478067967f495dc133 |
| display_name | test3 |
| ended_at | None |
| flavor_id | 1 |
| host | 7f218c8350a86a71dbe6d14d57e8f74fa60ac360fee825192a6cf624 |
| id | e90974a6-31bf-4e47-8824-ca074cd9b47d |
| image_ref | 671375cc-177b-497a-8551-4351af3f856d |
| metrics | cpu.delta: 20cd1d71-de2f-43d5-90a8-b23ad31a7d04 |
| | cpu_util: 22cd22e7-e48e-4f21-887a-b1c6612b4c98 |
| | disk.iops: 9611a114-d37e-42e7-9b0c-0fb5e61d96c8 |
| | disk.latency: 6205c66f-2a5d-49c8-85e6-aa7572cfb34a |
| | disk.root.size: c9f9ca31-7e54-4dd7-81ad-129d86951dbc |
| | disk.usage: 4f29ca2e-d58f-40a9-94a7-15084233c1bb |
| original_resource_id | e90974a6-31bf-4e47-8824-ca074cd9b47d |
| project_id | 71bf402adea343609f2192ce998fa38e |
| revision_end | None |
| revision_start | 2016-04-07T17:32:33.245924+00:00 |
| server_group | None |
| started_at | 2016-04-07T17:32:25.740862+00:00 |
| type | instance |
| user_id | fd3eb127863b4177bf1abb38dda1f557 |
+-----------------------+----------------------------------------------------------------+
Ceilometer to Gnocchi
Ceilometer legacy storage
● Single datapoint averages to ~1.5KB/point (MongoDB) or ~150B/point (SQL)
● For 1000 VMs, capturing 10 metrics/VM, every minute: ~15MB/minute, ~900MB/hour, ~21GB/day, etc…
Gnocchi
● Single datapoint is AT MOST 9B/point
● For 1000 VMs, capturing 10 metrics/VM, every minute: ~90KB/minute, ~5.4MB/hour, ~130MB/day, etc…
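The figures above are straightforward to check: 1000 VMs times 10 metrics is 10,000 points per minute, multiplied by the per-point cost of each backend.

```python
# Back-of-envelope check of the sizing claims above.
points_per_minute = 1000 * 10          # 1000 VMs x 10 metrics, per minute

legacy_mongo = points_per_minute * 1500   # ~1.5KB/point in MongoDB
gnocchi = points_per_minute * 9           # at most 9B/point

print("legacy:  %.0f MB/minute" % (legacy_mongo / 1e6))    # ~15 MB/minute
print("gnocchi: %.0f KB/minute" % (gnocchi / 1e3))         # ~90 KB/minute
print("gnocchi: %.1f MB/hour" % (gnocchi * 60 / 1e6))      # ~5.4 MB/hour
print("gnocchi: %.0f MB/day" % (gnocchi * 60 * 24 / 1e6))  # ~130 MB/day
```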