Stabilizing the Jenga tower: Scaling out Ceilometer

Gordon Chung & Pradeep Kilambi, Engineers @ Red Hat, Inc.

Upload: pradeep-kilambi

Post on 11-Aug-2015

Our Mission

“To reliably collect measurements of the utilization of physical & virtual resources comprising deployed clouds, persist this data for subsequent retrieval & analysis, and trigger actions when defined criteria are met.”

Overview

● Collect physical and virtual resource data
● Transform data to something measurable
● Publish data to various targets
● Persist data to storage
● Retrieve data via API for further analysis, billing, triggering actions etc.

Collect → Transform → Publish → Persist → Retrieve
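
The five stages can be sketched as a chain of plain functions. This is only an illustration of the data flow, not Ceilometer's code: the `Sample` fields, every function name, the hard-coded reading, and the in-memory list standing in for the database are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """A minimal, hypothetical sample; real samples carry far more fields."""
    resource_id: str
    meter: str
    volume: float
    unit: str


def collect():
    """Stage 1: gather raw readings (hard-coded, assumed values)."""
    return [{"resource_id": "inst-3", "cpu_time_ns": 22450000000}]


def transform(raw):
    """Stage 2: turn a raw reading into something measurable."""
    return Sample(raw["resource_id"], "cpu", float(raw["cpu_time_ns"]), "ns")


def publish(sample, targets):
    """Stage 3: hand the sample to every configured target."""
    for target in targets:
        target(sample)


storage = []  # Stage 4: an in-memory list stands in for the database.


def persist(sample):
    storage.append(sample)


def retrieve(meter):
    """Stage 5: read samples back, as an API query would."""
    return [s for s in storage if s.meter == meter]


for raw in collect():
    publish(transform(raw), targets=[persist])

cpu_samples = retrieve("cpu")
```

Because `publish` takes a list of targets, adding a second destination (a file, another queue) would not touch the collection or transformation steps.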

Architecture (Icehouse)

[Architecture diagram. Components shown: OpenStack Services; Notification Bus; Notification Agents (Agent1, Agent2, AgentN) with Pipeline; Polling Agents (Agent1, Agent2, AgentN) with Pipeline; Collectors (Collector1, Collector2, CollectorN); a single Database holding Events, Meters and Alarms; Alarm Evaluator; Alarm Notifier; API; External Systems. Legend: Partial HA support; Active-Active HA support.]

Ceilometer as it’s perceived

[Image with labels: Ceilometer; Cloud Admin]

“API response too slow”

“When Ceilometer dies, Glance dies.”

“Ceilometer is leaking memory”

“Ceilometer doesn’t scale”

“HAProxy is messing with MongoDB replica-sets”

“Ceilometer is not Production Ready”

Evolution of Ceilometer

Architecture (Juno)

[Architecture diagram. Components shown: OpenStack Services; Notification Bus; Notification Agents (Agent1, Agent2, AgentN) with Pipeline; Polling Agents (Agent1, Agent2, AgentN) with Pipeline; Collectors (Collector1, Collector2, CollectorN); Databases (Alarms; Events and Meters); Alarm Evaluator; Alarm Notifier; API; External Systems. Legend: Partial HA support; Active-Active HA support.]

Active/Active Workload Partitioning
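
One common way to split work between active/active agents is to have every agent hash each resource and claim only the resources that map to its own position in the member list. The sketch below illustrates that idea under stated assumptions: the member list is assumed to come from some group-membership service, the agent and resource names are made up, and this is not presented as Ceilometer's actual implementation.

```python
import hashlib


def stable_hash(value):
    """Hash that gives the same result on every agent and every run.

    The built-in hash() is randomised per process for strings, so it
    cannot be used to agree on ownership across agents.
    """
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


def my_share(resources, members, me):
    """Return the resources that agent `me` should poll."""
    ordered = sorted(members)  # every agent must see the same ordering
    index = ordered.index(me)
    return [r for r in resources if stable_hash(r) % len(ordered) == index]


# Hypothetical agents and resources.
agents = ["agent-1", "agent-2", "agent-3"]
resources = ["instance-%04d" % i for i in range(20)]
assignment = {a: my_share(resources, agents, a) for a in agents}

# If agent-3 disappears, the survivors recompute their shares from the
# new member list and pick up its resources between them.
survivors = ["agent-1", "agent-2"]
after_failure = {a: my_share(resources, survivors, a) for a in survivors}
```

A plain modulo reshuffles many resources whenever membership changes; a consistent-hash ring would move fewer, at the cost of more code.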

Architecture (Kilo)

[Architecture diagram. Components shown: OpenStack Services; Notification Bus; Notification Agents (Agent1, Agent2, AgentN) with Pipeline; Polling Agents (Agent1, Agent2, AgentN) with Pipeline; Collectors (Collector1, Collector2, CollectorN); Databases (Alarms; Meters; Events); Alarm Evaluator; Alarm Notifier; API; External Systems. Legend: Active-Active HA support.]

Best Practices

Best Practices (Data Collection)

● Modify your pipeline to match requirements
  ○ Collect only meters you need by tuning pipeline.yaml
  ○ Tweak polling interval as needed
● Enable jittering to polling (Kilo+)
● Scale out - add agents as load increases (Juno+)
● Use notifier publisher vs rpc publisher (Juno+)
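
To show why jitter helps, here is a minimal sketch of a polling delay with a random offset added, so that many agents started at the same moment do not all hit the services they poll at the same instant. The function name, the 600-second interval, and the 30-second jitter window are assumptions for illustration, not Ceilometer option names or defaults.

```python
import random


def next_poll_delay(interval, max_jitter, rng=random):
    """Seconds to wait before the next poll.

    The base interval is kept, and a uniformly random offset in
    [0, max_jitter] is added on top.
    """
    return interval + rng.uniform(0, max_jitter)


# A seeded generator keeps the example reproducible.
rng = random.Random(42)
delays = [next_poll_delay(600, 30, rng) for _ in range(5)]
```

Each agent drawing its own offset spreads the polling requests over the jitter window instead of stacking them on the interval boundary.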

Best Practices (Data Storage)

● Avoid open-ended queries, query on a time range
● Install API behind mod_wsgi
● Tweak WSGIDaemonProcess settings such as threads and processes
● Set a TTL, expire data to minimise database size
● Run MongoDB on a separate node
  ○ Use sharding and replica-sets
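
As an illustration of a time-bounded query, the sketch below builds a statistics URL that constrains `timestamp` on both sides, using the `q.field` / `q.op` / `q.value` query convention of the v2 API. The host name is a placeholder, the helper name is hypothetical, and the exact parameters should be checked against the API reference for the release in use.

```python
from datetime import datetime, timedelta
from urllib.parse import urlencode


def bounded_statistics_url(base_url, meter, start, end, period=None):
    """Build a statistics query limited to the window [start, end)."""
    params = [
        ("q.field", "timestamp"), ("q.op", "ge"), ("q.value", start.isoformat()),
        ("q.field", "timestamp"), ("q.op", "lt"), ("q.value", end.isoformat()),
    ]
    if period is not None:
        # Group results into fixed-length periods, in seconds.
        params.append(("period", str(period)))
    return "%s/v2/meters/%s/statistics?%s" % (
        base_url.rstrip("/"), meter, urlencode(params))


end = datetime(2015, 3, 23, 17, 0, 0)
start = end - timedelta(hours=1)
url = bounded_statistics_url(
    "http://ceilometer.example.com:8777",  # placeholder endpoint
    meter="cpu", start=start, end=end, period=600)
```

Bounding both ends of the range keeps the database from scanning every sample ever stored for the meter.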

Different Strokes for Different Folks

Deployment Scenarios (Lambda Design)

[Diagram. Components shown: Polling / Notification Agents; Queue1; Queue2; several Collectors (short-term); one Collector (long-term); Short-Term Database; Archive Database.]
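
A rough sketch of how such a split might be wired, assuming the agents publish every sample to both queues, the short-term collectors drain Queue1 into a database that expires old data, and the long-term collector drains Queue2 into an archive that keeps everything. The wiring, the expiry rule, and all names are assumptions read off the diagram labels, with in-memory structures standing in for the real queues and databases.

```python
from collections import deque

queue1 = deque()  # assumed to feed the short-term collectors
queue2 = deque()  # assumed to feed the long-term collector

short_term_db = []  # stands in for the short-term database
archive_db = []     # stands in for the archive database


def publish(sample):
    """Agent side: fan each sample out to both queues."""
    queue1.append(sample)
    queue2.append(sample)


def short_term_collector():
    while queue1:
        short_term_db.append(queue1.popleft())


def long_term_collector():
    while queue2:
        archive_db.append(queue2.popleft())


def expire_short_term(now, ttl):
    """Drop short-term samples older than `ttl`; the archive is untouched."""
    short_term_db[:] = [s for s in short_term_db if now - s["timestamp"] <= ttl]


for ts in (0, 50, 100):
    publish({"meter": "cpu", "timestamp": ts, "volume": ts * 2})

short_term_collector()
long_term_collector()
expire_short_term(now=100, ttl=60)
```

After expiry the short-term store holds only the recent samples, while the archive still holds all three.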

Deployment Scenarios (Data Segregation)

[Diagram. Components shown: Polling / Notification Agents; Queue1; Queue2; Collector (public); Collector (audit); Collector (short-term); Database; Audit Database.]

Deployment Scenarios (JSON Files)

[Diagram. Components shown: Polling / Notification Agents; Queue1; Collector; Collector (short-term); JSON files; Apache Spark.]

Deployment Scenarios (Fraud Detection)

[Diagram. Components shown: Polling / Notification Agents; Queue; Collector; Collector (short-term); HTTP; Proprietary Alerting System.]

Deployment Scenarios (Custom consumers)

[Diagram. Components shown: Polling / Notification Agents; Kafka; Apache Storm.]

Deployment Scenarios (Debugging)

[Diagram. Components shown: Polling / Notification Agents; Event Queue; Collectors; ElasticSearch; Kibana.]

Deployment Scenarios (Noisy Services)

[Diagram. Components shown: OpenStack Services; two Notification Buses; Notification Agents (Agent1, Agent2, AgentN) with Pipeline; Collectors (Collector1, Collector2, CollectorN); Databases (Alarms; Meters; Events).]

Continual Evolution

Liberty

● Gnocchi Integration
● Building up events
● Declarative data collection
● Minimise the bloat

Gnocchi: Resource Metering as a Service

● Lightweight time-series metadata
● Separate storage and data models for resources and time-series data
● Indexer for metrics and resources
● Eagerly pre-aggregates metric data
● Supports restricted cross-metric aggregation
● Per time-series configurable retention policy
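
The idea of eager pre-aggregation with a retention policy can be sketched as a series that folds each incoming measurement into a fixed-width time bucket at write time and discards the oldest buckets once a limit is reached. This is an illustration only, not Gnocchi's implementation: the class name is hypothetical, timestamps are plain integer seconds, and only a mean is kept.

```python
class AggregatedSeries:
    """Keeps a running mean per time bucket, and at most `max_points` buckets."""

    def __init__(self, granularity, max_points):
        self.granularity = granularity  # bucket width in seconds
        self.max_points = max_points    # retention: number of buckets kept
        self.buckets = {}               # bucket start -> [sum, count]

    def add(self, timestamp, value):
        """Aggregate at write time instead of storing the raw point."""
        start = timestamp - (timestamp % self.granularity)
        total = self.buckets.setdefault(start, [0.0, 0])
        total[0] += value
        total[1] += 1
        # Retention: drop the oldest buckets beyond the configured limit.
        while len(self.buckets) > self.max_points:
            del self.buckets[min(self.buckets)]

    def means(self):
        """Reads are cheap: the aggregates already exist."""
        return [(start, s / c) for start, (s, c) in sorted(self.buckets.items())]


series = AggregatedSeries(granularity=60, max_points=2)
for ts, value in [(0, 1.0), (30, 3.0), (60, 5.0), (125, 7.0)]:
    series.add(ts, value)

result = series.means()
```

Storage per series is bounded by `max_points` regardless of how many raw measurements arrive, which is the point of pairing pre-aggregation with a retention policy.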

Size matters

ceilometer datapoint (mongodb):

{
  "_id": ObjectId("55103dd3bf4d2c7a7de6e319"),
  "counter_name": "cpu",
  "user_id": "72bd0799d496476f9eed16d49e0b86e9",
  "resource_id": "d7f94857-a0d8-4864-8ab1-124055950973",
  "timestamp": ISODate("2015-03-23T16:22:43Z"),
  "message_signature": "539736605d14c0aa8c85058e6e9e67a078146f2e80a218d8dc6711c8d6875ae5",
  "message_id": "d559f244-d178-11e4-9fa9-28b2bd01ed52",
  "source": "openstack",
  "counter_unit": "ns",
  "counter_volume": NumberLong("22450000000"),
  "recorded_at": ISODate("2015-03-23T16:22:43.412Z"),
  "project_id": "99fb96cb63624163975dcbf95d7d2d6f",
  "resource_metadata": {
    "status": "active",
    "cpu_number": 1,
    "ephemeral_gb": 0,
    "display_name": "inst-3",
    "name": "instance-00000003",
    "disk_gb": 0,
    "kernel_id": "4e303a91-ae5b-43c7-b823-fd6f2cceab4e",
    "image": {
      "id": "490af6b0-2402-45d8-bcb1-c81376326e8d",
      "links": [
        {
          "href": "http://10.162.32.175:8774/837660dc95324be594a0607d80a22c53/images/490af6b0-2402-45d8-bcb1-c81376326e8d",
          "rel": "bookmark"
        }
      ],
      "name": "cirros-0.3.2-x86_64-uec"
    },
    "ramdisk_id": "7112ea15-3ece-4805-9f23-f6141a6f27b0",
    "vcpus": 1,
    "memory_mb": 64,
    "instance_type": "42",
    …..
}

Vs

gnocchi datapoint:

{"2015-03-23T16:22:43Z" : 1 }
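
To put a rough number on the difference, the sketch below serialises a trimmed copy of the sample and the single timestamp/value pair to JSON and compares byte counts. It uses only a subset of the fields shown on the slide, and JSON length is merely a proxy: it is not the on-disk size in MongoDB or in Gnocchi's storage.

```python
import json

# A subset of the fields from the slide's sample; the real document is larger.
ceilometer_sample = {
    "counter_name": "cpu",
    "resource_id": "d7f94857-a0d8-4864-8ab1-124055950973",
    "timestamp": "2015-03-23T16:22:43Z",
    "counter_unit": "ns",
    "counter_volume": 22450000000,
    "project_id": "99fb96cb63624163975dcbf95d7d2d6f",
    "resource_metadata": {
        "status": "active",
        "display_name": "inst-3",
        "vcpus": 1,
        "memory_mb": 64,
    },
}

gnocchi_point = {"2015-03-23T16:22:43Z": 1}


def encoded_size(doc):
    """Length in bytes of the document's UTF-8 JSON encoding."""
    return len(json.dumps(doc).encode("utf-8"))


ceilometer_bytes = encoded_size(ceilometer_sample)
gnocchi_bytes = encoded_size(gnocchi_point)
ratio = ceilometer_bytes / gnocchi_bytes
```

Even this trimmed sample is several times the size of the bare datapoint, because the resource metadata is repeated with every measurement.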

Gnocchi Benchmarks

Discussions

● operators session - May 19, 2015 (12:05pm) Rm 306
● design track - May 20, 2015 (9:00am - 3:30pm)
  ○ event alarms; ceilometer componentisation
● design track - May 21, 2015 (9:00am - 12:30pm)
● speaker session:
  ○ The Anatomy of an Action - May 21, 2015 (1:30pm)
● irc: #openstack-ceilometer
● mailing-list: [email protected]

Architecture (Gnocchi)

[Architecture diagram. Components shown: OpenStack Services; Notification Bus; Notification Agents (Agent1, Agent2, AgentN) with Pipeline; Polling Agents (Agent1, Agent2, AgentN) with Pipeline; Collectors (Collector1, Collector2, CollectorN); Databases (Alarms; Events); Gnocchi (labels: API, Metric, Resources); Alarm Evaluator; Alarm Notifier; API; External Systems. Legend: Active-Active HA support.]

Thank You