The Gnocchi Experiment: playing with timeseries

Upload: gordon-chung

Post on 21-Jan-2018

TRANSCRIPT

The Gnocchi Experiment

playing with timeseries

History

● Ceilometer started in 2012
  ○ Original mission: provide an infrastructure to collect any information needed regarding OpenStack projects
● Added alarming in 2013
  ○ Create rules based on threshold conditions that, when broken, trigger actions
● Added events in 2014
  ○ The state of an object in an OpenStack service at a point in time
● New mission
  ○ To reliably collect data on the utilization of the physical and virtual resources comprising deployed clouds, persist these data for subsequent retrieval and analysis, and trigger actions when defined criteria are met.

Ceilometer Architecture

[Diagram: OpenStack services emit onto a notification bus consumed by notification agents (Agent1..AgentN) via a pipeline; polling agents (Agent1..AgentN) poll the services through their own pipeline. Both feed collectors (Collector1..CollectorN), which persist meters, events, and alarms to databases. An alarm evaluator and alarm notifier act on the stored data, and an API exposes it all to external systems.]

this didn’t work.

Growing pains

● Too large a scope - we did everything
● Too complex - must deploy everything
● Too much data - all data in one place
● Too few resources - handful of developers
● Too generic a solution - storage designed to handle any scenario
● Good at nothing, average/bad at everything

Ceilometer

Gnocchi

Ceilometer Architecture (post-split)

[Diagram: OpenStack services emit onto the notification bus; notification agents (Agent1..AgentN) and polling agents (Agent1..AgentN) run their pipelines and hand samples to collectors (Collector1..CollectorN). Metrics land in Gnocchi behind a Metrics API, events land in Panko behind an Events API, and alarms are handled by Aodh, whose alarm evaluator and alarm notifier act on them. External systems consume the APIs.]

Componentisation

● Split functionality into its own projects
  ○ Faster rate of change
  ○ Less expertise needed
● Important functionality lives on
● Ceilometer - data gathering and transformation service
● Gnocchi - time series storage service
● Aodh - alarming service
● Panko - event-focused storage service
● They all work together and separately

Gnocchi

Gnocchi use cases

● Storage brick for a billing system
● Alarm-triggering or monitoring system
● Statistical analysis of usage data

Ceilometer to Gnocchi

● Ceilometer legacy storage captures full-resolution data
  ○ Each datapoint has: timestamp, measurement, IDs, resource metadata, metric metadata, etc…
● Gnocchi stores pre-aggregated data in a timeseries
  ○ Each datapoint has: timestamp, measurement… that’s it… and then it’s compressed
  ○ Resource metadata is an explicit subset AND not tied to measurement
  ○ Defined archival rules
    ■ e.g. capture data at 1 min granularity for 1 day AND 3 hr granularity for 1 month AND …

Archive Policies

5 minute granularity for a day

1 day granularity for a year
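As a sketch of how such a policy is defined, the snippet below creates it through Gnocchi's REST API. The endpoint URL, token, and policy name are assumptions, not values from the talk; the definition fields (granularity, timespan) follow Gnocchi's archive-policy schema.

# Sketch: create the archive policy above via Gnocchi's REST API.
# Endpoint and token are placeholders.
import requests

policy = {
    "name": "example-policy",
    "definition": [
        {"granularity": "5m", "timespan": "1d"},    # 5 min granularity for a day
        {"granularity": "1d", "timespan": "365d"},  # 1 day granularity for a year
    ],
}
resp = requests.post(
    "http://localhost:8041/v1/archive_policy",      # assumed local endpoint
    json=policy,
    headers={"X-Auth-Token": "ADMIN_TOKEN"},        # hypothetical token
)
resp.raise_for_status()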

How it all works...

Ceilometer raw sample

{
  "user_id": "0d9d089b8f8340999fbe01354ef84643",
  "resource_id": "a7c7cf84-5bf7-4838-a116-645ea376f4e0",
  "timestamp": "2016-05-11T18:23:46.166000",
  "meter": "disk.write.bytes",
  "volume": 56114794496,
  "source": "openstack",
  "recorded_at": "2016-05-11T18:23:47.177000",
  "project_id": "dec2b73655154e31be903fc93e575146",
  "type": "cumulative",
  "id": "7fbf56ca-17a5-11e6-a210-e8bdd1f62a56",
  "unit": "B",
  "metadata": {
    "instance_host": "cloud03.wz",
    "ephemeral_gb": "0",
    "flavor.vcpus": "8",
    "OS-EXT-AZ.availability_zone": "nova",
    "memory_mb": "16384",
    "display_name": "gord_dev",
    "state": "active",
    "flavor.id": "5",
    "status": "active",
    "ramdisk_id": "None",
    "flavor.name": "m1.xlarge",
    "disk_gb": "160",
    "kernel_id": "None",
    "image.id": "dba2c73c-3f11-45a1-998a-6a4ca2cf243e",
    "flavor.ram": "16384",
    "host": "64fe410a8b602f69fe43a180c62b02d6c00e41c03caba40a092e2fb6",
    "device": "['vda']",
    "flavor.ephemeral": "0",
    "image.name": "fedora-23-x86_64"
  }
}

Separation of value

Resource

● Id
● User_id
● Project_id
● Start_timestamp: timestamp
● End_timestamp: timestamp
● Metadata: {attribute: value}
● Metric: list

Measurements

● [ (timestamp, value), ... ]

Metric

● Name
● archive_policy
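To make the separation concrete, here is a minimal sketch of how the raw Ceilometer sample shown earlier splits into the three structures above. The grouping follows this slide; the policy name is hypothetical, and this is not Gnocchi's actual ingestion code.

# Sketch: splitting a (trimmed) Ceilometer raw sample into Gnocchi's
# resource / metric / measurement structures.
raw = {
    "resource_id": "a7c7cf84-5bf7-4838-a116-645ea376f4e0",
    "user_id": "0d9d089b8f8340999fbe01354ef84643",
    "project_id": "dec2b73655154e31be903fc93e575146",
    "meter": "disk.write.bytes",
    "timestamp": "2016-05-11T18:23:46.166000",
    "volume": 56114794496,
    "metadata": {"display_name": "gord_dev", "flavor.name": "m1.xlarge"},
}

resource = {
    "id": raw["resource_id"],
    "user_id": raw["user_id"],
    "project_id": raw["project_id"],
    "metadata": raw["metadata"],         # explicit subset, not tied to measurements
}
metric = {
    "name": raw["meter"],
    "archive_policy": "example-policy",  # hypothetical policy name
}
measurement = (raw["timestamp"], raw["volume"])  # all that is stored per point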

Gnocchi Architecture

[Diagram: the API writes resource attributes to the resource indexer and raw measures (raw metric dump + backlog) to metric storage; MetricD computation workers asynchronously turn the backlog into computed aggregates, written back to metric storage.]

1. Get unprocessed datapoints
2. Compute new aggregations
   a. Update sum, avg, min, max, etc… values based on the defined policy
3. Add datapoints to the backlog for the next computation
   a. Delete datapoints not required for future aggregations
   b. By default, only keep the backlog for a single period.
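The loop above can be sketched in a few lines. This is a simplified, in-memory illustration of the idea with a single 5-minute granularity, not Gnocchi's MetricD code.

# Simplified sketch of the MetricD loop; the real service persists the
# backlog and aggregates to metric storage.
from collections import defaultdict

GRANULARITY = 300  # seconds; one 5-minute period

def process(backlog, new_points):
    """Fold (timestamp, value) points into per-period aggregates, then
    trim the backlog to the most recent period (the back window)."""
    backlog = backlog + new_points                 # 1. unprocessed datapoints
    periods = defaultdict(list)
    for ts, value in backlog:
        periods[ts - ts % GRANULARITY].append(value)
    aggregates = {                                 # 2. recompute aggregations
        start: {"min": min(v), "max": max(v),
                "sum": sum(v), "mean": sum(v) / len(v)}
        for start, v in periods.items()
    }
    newest = max(periods)                          # 3. keep one period of backlog
    backlog = [(ts, v) for ts, v in backlog
               if ts - ts % GRANULARITY == newest]
    return aggregates, backlog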

Storage format

Raw metric dump
● [ (timestamp, value), (timestamp, value) ]
● One object per write

Backlog
● { values: { timestamp: value, timestamp: value },
    block_size: max number of points,
    back_window: number of blocks to retain }
● Binary serialised using msgpack
● One object per metric

Computed aggregates
● { first_timestamp: first timestamp of block,
    aggregation_method: sum, min, max, etc…,
    max_size: max number of points,
    sampling: granularity (60s, 300s, etc…),
    timestamps: [ time1, time2, … ],
    values: [ value1, value2, … ] }
● Binary serialised using msgpack
● Compressed with LZ4
● Split into chunks to minimise transfer when updating large series
● (potentially) multiple objects per aggregate per granularity per metric
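As an illustration of the aggregate layout, the snippet below serialises one such object the same way (msgpack, then LZ4). It assumes the msgpack and lz4 Python packages; the field values are hypothetical, and this is not Gnocchi's storage code.

# Sketch: serialise a computed-aggregate object with msgpack + LZ4.
import lz4.frame
import msgpack

aggregate = {
    "first_timestamp": 1459987200,       # hypothetical block start (epoch s)
    "aggregation_method": "mean",
    "max_size": 8640,                    # max number of points
    "sampling": 300,                     # granularity in seconds
    "timestamps": [1459987200, 1459987500],
    "values": [0.19, 0.18],
}
blob = lz4.frame.compress(msgpack.packb(aggregate))
restored = msgpack.unpackb(lz4.frame.decompress(blob))
assert restored["values"] == [0.19, 0.18]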

Query path

[Diagram: "What's the CPU utilisation for VM1?" hits the API; the resource indexer resolves the resource_id to a metric_id, and metric storage returns the precomputed measures (all granularities).]

+---------------------------+-------------+----------------+
| timestamp                 | granularity | value          |
+---------------------------+-------------+----------------+
| 2016-04-07T00:00:00+00:00 | 86400.0     | 0.30323927544  |
| 2016-04-07T17:00:00+00:00 | 3600.0      | 1.2855184725   |
| 2016-04-07T18:00:00+00:00 | 3600.0      | 0.188613527791 |
| 2016-04-07T19:00:00+00:00 | 3600.0      | 0.188871232024 |
| 2016-04-07T20:00:00+00:00 | 3600.0      | 0.188876901916 |
| 2016-04-07T21:00:00+00:00 | 3600.0      | 0.189646641908 |
| 2016-04-07T21:10:00+00:00 | 300.0       | 0.190019839676 |
| 2016-04-07T21:15:00+00:00 | 300.0       | 0.186565358466 |
| 2016-04-07T21:20:00+00:00 | 300.0       | 0.183166934543 |
| 2016-04-07T21:25:00+00:00 | 300.0       | 0.179994544916 |
| 2016-04-07T21:30:00+00:00 | 300.0       | 0.186649908928 |
| 2016-04-07T21:35:00+00:00 | 300.0       | 0.193315212093 |
| 2016-04-07T21:40:00+00:00 | 300.0       | 0.193272093903 |
| 2016-04-07T21:45:00+00:00 | 300.0       | 0.196677374077 |
| 2016-04-07T21:50:00+00:00 | 300.0       | 0.193300189049 |
+---------------------------+-------------+----------------+

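As a sketch, the same lookup can be made as two REST reads. The endpoint and token are assumptions; the measures response format ([timestamp, granularity, value]) matches the table above.

# Sketch: the query path as two REST reads; no aggregation happens at
# query time, only lookups. Endpoint and token are placeholders.
import requests

GNOCCHI = "http://localhost:8041"             # assumed endpoint
HEADERS = {"X-Auth-Token": "ADMIN_TOKEN"}     # hypothetical token

# 1. Resource indexer: resolve VM1's resource to its cpu_util metric id.
vm1 = requests.get(
    GNOCCHI + "/v1/resource/instance/e90974a6-31bf-4e47-8824-ca074cd9b47d",
    headers=HEADERS,
).json()
metric_id = vm1["metrics"]["cpu_util"]

# 2. Metric storage: fetch the already-computed aggregates.
measures = requests.get(
    GNOCCHI + "/v1/metric/" + metric_id + "/measures",
    headers=HEADERS,
).json()
# -> [[timestamp, granularity, value], ...] as in the table above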

Query path

[Diagram: "What's the metadata for VM1?" hits the API and is answered entirely by the resource indexer via the resource_id; metric storage is not touched.]

resource

+-----------------------+----------------------------------------------------------------+
| Field                 | Value                                                          |
+-----------------------+----------------------------------------------------------------+
| created_by_project_id | f7481a38d7c543528d5121fab9eb2b99                               |
| created_by_user_id    | 9246f424dcb341478067967f495dc133                               |
| display_name          | test3                                                          |
| ended_at              | None                                                           |
| flavor_id             | 1                                                              |
| host                  | 7f218c8350a86a71dbe6d14d57e8f74fa60ac360fee825192a6cf624       |
| id                    | e90974a6-31bf-4e47-8824-ca074cd9b47d                           |
| image_ref             | 671375cc-177b-497a-8551-4351af3f856d                           |
| metrics               | cpu.delta: 20cd1d71-de2f-43d5-90a8-b23ad31a7d04                |
|                       | cpu_util: 22cd22e7-e48e-4f21-887a-b1c6612b4c98                 |
|                       | disk.iops: 9611a114-d37e-42e7-9b0c-0fb5e61d96c8                |
|                       | disk.latency: 6205c66f-2a5d-49c8-85e6-aa7572cfb34a             |
|                       | disk.root.size: c9f9ca31-7e54-4dd7-81ad-129d86951dbc           |
|                       | disk.usage: 4f29ca2e-d58f-40a9-94a7-15084233c1bb               |
| original_resource_id  | e90974a6-31bf-4e47-8824-ca074cd9b47d                           |
| project_id            | 71bf402adea343609f2192ce998fa38e                               |
| revision_end          | None                                                           |
| revision_start        | 2016-04-07T17:32:33.245924+00:00                               |
| server_group          | None                                                           |
| started_at            | 2016-04-07T17:32:25.740862+00:00                               |
| type                  | instance                                                       |
| user_id               | fd3eb127863b4177bf1abb38dda1f557                               |
+-----------------------+----------------------------------------------------------------+

Zero computation at query. Only lookup.

Results (benchmark data, Gnocchi 1.3.x)

Ceilometer to Gnocchi

Ceilometer legacy storage

● A single datapoint averages ~1.5KB/point (MongoDB) or ~150B/point (SQL)

● For 1000 VMs, capturing 10 metrics/VM every minute: ~15MB/minute, ~900MB/hour, ~21GB/day, etc…

Gnocchi

● A single datapoint is AT MOST 9B/point

● For 1000 VMs, capturing 10 metrics/VM every minute: ~90KB/minute, ~5.4MB/hour, ~130MB/day, etc…
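The per-day figures follow directly from the per-point sizes; a quick check of the arithmetic:

# Quick check of the slide's arithmetic (sizes approximate, as stated).
points_per_min = 1000 * 10             # 1000 VMs x 10 metrics, every minute

legacy = points_per_min * 1.5 * 1024   # ~1.5 KB/point (MongoDB)
print(legacy / 1e6)                    # ~15 MB/minute
print(legacy * 60 / 1e9)               # ~0.9 GB/hour
print(legacy * 60 * 24 / 1e9)          # ~21-22 GB/day

gnocchi = points_per_min * 9           # at most 9 B/point
print(gnocchi / 1e3)                   # ~90 KB/minute
print(gnocchi * 60 / 1e6)              # ~5.4 MB/hour
print(gnocchi * 60 * 24 / 1e6)         # ~130 MB/day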