monasca - netways...... what every software engineer should know about real-time data's...

Monasca Monitoring/Logging-as-a-Service (at-scale)

Upload: hadang

Post on 28-May-2018


Speaker

Roland Hochmuth

Hewlett Packard Enterprise

Fort Collins Colorado USA

Agenda

• Describe how to build a highly scalable monitoring- and logging-as-a-service platform
  • Architectural and design principles
  • Scale, HA
• Provide an overview of Monasca
  • Features
  • API
• Demo

What is Monitoring-as-a-Service?

• A monitoring or logging solution deployed as Software-as-a-Service
  • E.g. CloudWatch, Datadog, New Relic, Librato, Loggly and many others
• First-class, preferably RESTful, HTTP API
• Authentication
• Multi-tenancy
• Provides self-provisioning to users/tenants of the service
• Designed to be highly reliable and operate at scale
• Historically run by an operations team doing web services

What is OpenStack?

• OpenStack is a cloud operating system that controls large pools of compute, storage and networking resources
• Open-source alternative to AWS, Microsoft Azure, Google Cloud and other cloud services
• Deployed in both public and private clouds

What is Monasca?

• Open-source Monitoring/Logging-as-a-Service platform for OpenStack
  • Authentication currently via OpenStack Identity Service (Keystone)
• Microservices, message-bus based architecture
• First-class RESTful API
• Push-based metrics
• Consolidates Operational Monitoring, Monitoring-as-a-Service, Metering & Billing and more
• Designed for elastic cloud environments/deployments
• High-availability clustering built-in
• Horizontally scalable and vertically 4 tiered/layered architecture
• Capable of long-term data retention to address metering, SLA, capacity planning, trend analysis, post-hoc RCA and other use cases
• Extensible and composable

The Log

• "The Log: What every software engineer should know about real-time data's unifying abstraction"
  • https://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying
• Log: An append-only, totally-ordered sequence of records ordered by time

From To

Monitoring Architecture

Kafka

• A performant, distributed, durable publish/subscribe messaging and stream processing system
• Metrics, logs and events are published to topics in Kafka
• Microservices register in a consumer group as a consumer
• Microservices subscribe to topics and consume metrics/logs and events
• Messages are replicated per consumer group
• Messages are load-balanced across all consumers in a consumer group
  • Can add/remove micro-services to handle load or mitigate problems
  • As micro-services expand/contract the partitions are automatically re-balanced
• At-least-once semantic guarantees on message delivery
• Also used for domain events, notification retry events, periodic notifications, grouping notifications and other areas
• Always accept data, never drop data: true elasticity
  • Loggly: https://www.youtube.com/watch?v=LpNbjXFPyZ0
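The load-balancing and re-balancing behaviour described above can be illustrated with a small simulation. This is only a sketch: the round-robin rule and the consumer names (`persister-1` and so on) are assumptions made for illustration, and Kafka's real partition assignors are more involved.

```python
def assign_partitions(partitions, consumers):
    """Spread partitions over the consumers of one group, round-robin.

    Simplified stand-in for a consumer-group rebalance; not Kafka's
    actual assignment algorithm.
    """
    if not consumers:
        raise ValueError("a consumer group needs at least one consumer")
    ordered = sorted(consumers)
    assignment = {consumer: [] for consumer in ordered}
    for index, partition in enumerate(sorted(partitions)):
        assignment[ordered[index % len(ordered)]].append(partition)
    return assignment


partitions = list(range(6))

# Three hypothetical persister instances share the six partitions.
before = assign_partitions(partitions, ["persister-1", "persister-2", "persister-3"])

# One instance fails: its partitions are redistributed to the survivors.
after = assign_partitions(partitions, ["persister-1", "persister-3"])

print(before)
print(after)
```

Every partition is still owned by exactly one consumer after the failure, which is why the remaining services keep consuming without losing messages.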

CQRS

• Command Query Responsibility Segregation (CQRS)
• CQRS involves splitting an application into two parts internally:
  1. Command side, ordering the system to update state
  2. Query side, which gets information without changing state
• Advantages
  • Decouples the read/write load. Allows each to be scaled independently
  • Read store can be optimized for the query pattern of the application
• Reference
  • Event sourcing, CQRS, stream processing and Apache Kafka
  • https://www.confluent.io/blog/event-sourcing-cqrs-stream-processing-apache-kafka-whats-connection/
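A minimal in-memory sketch of the command/query split, assuming an append-only list as a stand-in for a Kafka topic. The class and method names (`MetricLog`, `CommandSide`, `ReadStore`) are hypothetical and do not come from Monasca.

```python
from collections import defaultdict


class MetricLog:
    """Append-only log standing in for a Kafka topic (illustrative only)."""

    def __init__(self):
        self.records = []

    def append(self, record):
        self.records.append(record)
        return len(self.records) - 1  # offset of the new record


class CommandSide:
    """Accepts writes and only appends them to the log."""

    def __init__(self, log):
        self.log = log

    def post_metric(self, name, value):
        return self.log.append({"name": name, "value": value})


class ReadStore:
    """Query side: a view built from the log, shaped for lookups by name."""

    def __init__(self):
        self.by_name = defaultdict(list)
        self.next_offset = 0

    def catch_up(self, log):
        # Apply only the records that have not been consumed yet.
        for record in log.records[self.next_offset:]:
            self.by_name[record["name"]].append(record["value"])
        self.next_offset = len(log.records)

    def measurements(self, name):
        return list(self.by_name.get(name, []))


log = MetricLog()
commands = CommandSide(log)
store = ReadStore()

commands.post_metric("cpu.user_perc", 10.0)
commands.post_metric("cpu.user_perc", 20.0)
store.catch_up(log)
print(store.measurements("cpu.user_perc"))
```

Because the read store only ever consumes from the log, it can be rebuilt or scaled separately from the write path.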

Microservices

• Microservices are small, autonomous, decoupled services that are deployed independently and work together as a single application
• Communication between services occurs via a network
• Services need to be able to change independently of each other and be deployed by themselves without requiring consumers to change
• Benefits
  • Resilience
  • Scale
  • Ease of deployment
  • Organizational alignment
  • Optimized for change/replaceability

POST Metrics Sequence

Domain Events Sequence

Deployment Models (HA/Scale)

• Many ways to deploy Monasca
• Typically deployed in a clustered/HA configuration using three or more nodes
• If any node or microservice fails the cluster remains operational
• Partitions in Kafka are redistributed among the remaining components
• Preferably the database is run on a separate layer from the other components/microservices
• Note: Monasca can also be deployed on a single node, non-clustered
• Has also been containerized and run in Kubernetes

Metrics Model

POST /v2.0/metrics

{
  "name": "http_status",
  "dimensions": {
    "url": "http://host.domain.com:1234/service",
    "cluster": "c1",
    "control_plane": "ccp",
    "service": "compute"
  },
  "timestamp": 0,
  "value": 1.0,
  "value_meta": {
    "status_code": 500,
    "msg": "Internal server error"
  }
}

• Simple, concise, multi-dimensional, flexible description
  • Name (string)
  • Dimensions: Dictionary of user-defined (key, value) pairs that are used to uniquely identify a metric
  • Timestamp: in milliseconds
  • Value meta: Optional dictionary of user-defined (key, value) pairs that can be used to describe a measurement
    • Normally used for errors and messages
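The metric model can be sketched as a small helper that builds the request body. The helper names `build_metric` and `metric_identity` are hypothetical; only the field names (`name`, `dimensions`, `timestamp`, `value`, `value_meta`) come from the slide.

```python
import json
import time


def build_metric(name, dimensions, value, value_meta=None, timestamp_ms=None):
    """Build one metric body in the shape shown on the slide."""
    if timestamp_ms is None:
        timestamp_ms = int(time.time() * 1000)  # timestamps are in milliseconds
    metric = {
        "name": name,
        "dimensions": dict(dimensions),
        "timestamp": timestamp_ms,
        "value": float(value),
    }
    if value_meta:
        # Optional, free-form description of this one measurement.
        metric["value_meta"] = dict(value_meta)
    return metric


def metric_identity(metric):
    """Name plus dimensions identify a metric; value_meta does not."""
    return metric["name"], tuple(sorted(metric["dimensions"].items()))


metric = build_metric(
    "http_status",
    {"cluster": "c1", "service": "compute"},
    1,
    value_meta={"status_code": 500, "msg": "Internal server error"},
    timestamp_ms=0,
)
print(json.dumps(metric, indent=2))
```

Two measurements with the same name and dimensions but different `value_meta` therefore belong to the same metric.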

Push vs Pull

• Monitoring-as-a-Service
  • Can't always pull due to firewalls and network issues
• Low latency: sub-second latency is difficult for the pull model
• Doesn't require service discovery and registration
  • As entities are deployed they can start sending metrics without having to be discovered or registered
• Events
• Temporary caching/buffering of metrics/events while the service is unreachable

Monasca API

• Primary point for pushing metrics and handling queries
• Authenticates all requests against the Keystone identity service
  • Note: auth tokens are cached to reduce the load on Keystone
• Resources: Metrics, Alarm Definitions, Alarms and Notification Methods
• API Specification
  • https://github.com/openstack/monasca-api/tree/master/docs
• Horizontally scalable
• Publishes metrics to Kafka
• Queries time-series DB for measurements and statistics
• Queries Config DB for alarms, alarm definitions and notification methods

Persister

• Consumes both metrics and alarm state transition events from Kafka
• Stores them temporarily in memory and does batch writes to the TSDB, based on batch size or time, to optimize write performance
• At-least-once message delivery semantics
  • No metrics or alarm state transition events are lost
  • The Kafka consumer offset for each batch is only updated after successfully storing the metric or alarm state transition event
  • Note: duplicates are possible
• HA/fault-tolerance
  • Multiple persisters run simultaneously and balance load
  • If a persister fails the load is automatically re-balanced across the remaining persisters
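The batching and offset handling described above might look roughly like the following sketch. `BatchingPersister` and its `store` callback are hypothetical names, the time-based flush is left out for brevity, and no real Kafka client is involved.

```python
class BatchingPersister:
    """Buffer records in memory and advance the offset only after a
    successful batch write (at-least-once delivery)."""

    def __init__(self, store, batch_size=3):
        self.store = store            # callable taking a list; raises on failure
        self.batch_size = batch_size
        self.buffer = []              # list of (offset, record) pairs
        self.committed_offset = -1    # last offset known to be stored

    def handle(self, offset, record):
        self.buffer.append((offset, record))
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        if not self.buffer:
            return
        # If the write raises, the lines below never run: the offset stays
        # where it was and the batch is kept for a later retry.
        self.store([record for _, record in self.buffer])
        self.committed_offset = self.buffer[-1][0]
        self.buffer = []


written = []
persister = BatchingPersister(written.extend, batch_size=2)
persister.handle(0, {"name": "cpu.user_perc", "value": 10.0})
persister.handle(1, {"name": "cpu.user_perc", "value": 20.0})
print(persister.committed_offset, len(written))
```

If a write succeeds but the process dies before the offset is updated, the batch is consumed again after restart, which is where the possible duplicates come from.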

Time Series Databases

• Used for storing
  • Metrics
  • Alarm state history
• Two databases supported
  1. Vertica
     • Enterprise-class, proprietary, closed-source, clustered, HA analytics database
     • Excels at time-series
  2. InfluxDB
     • Open-source single-node time-series DB
     • Clustering is closed-source
     • Note: can replicate to multiple instances of InfluxDB using Kafka
• Investigating support for additional databases

Config Database

• Stores all transactional data for Monasca, such as
  • Alarm Definitions
  • Alarms
  • Notification Methods
• MySQL and Postgres supported
• Typically deployed in a clustered or HA configuration

Threshold Engine

• Near real-time, stream processing, clustered and highly available threshold engine
• Based on Apache Storm
• Consumes metrics from Kafka
• Creates alarms based on metrics that match patterns specified in the alarm definition
• Evaluates whether metrics exceed the threshold
• Publishes alarm state transition events to Kafka
• Supports both simple and compound alarm expressions

Notification Engine

• Consumes alarm state transition events from Kafka produced by the Threshold Engine
• Evaluates whether notifications should be sent based on actions specified in the alarm definition
  • OK, ALARM and UNDETERMINED actions
• Supports email, PagerDuty, webhooks, HipChat, Slack and JIRA
  • Dynamic plugins supported
• Supports both one-shot and periodic notifications
• If sending to the notification address fails, the notification is published to a retry topic in Kafka and retried later
• Grouping notifications: in progress

Kafka Message Schema

• JSON messages published/consumed to/from Kafka by Monasca micro-services
• Well-defined schema is published at
  • https://wiki.openstack.org/wiki/Monasca/Message_Schema

Metrics

Create, query and get statistics for metrics

• GET, POST /v2.0/metrics
• GET /v2.0/metrics/names
  • Returns the unique metric names
• GET /v2.0/metrics/dimensions/names
  • Returns the unique dimension names
• GET /v2.0/metrics/dimensions/names/values
  • Returns the unique dimension values

Measurements

GET /v2.0/metrics/measurements

• Returns a list of measurements
• Query parameters
  • Name and dimensions to filter by
  • Start_time and end_time
  • Offset and limit
  • merge_metrics: allows multiple metrics to be combined into a single list of measurements
  • group_by: list of columns to group the metrics to be returned. Allows multiple unique metrics to be returned in a single query

Statistics

GET /v2.0/metrics/statistics

• Query parameters
  • Name and dimensions to filter by
  • Start_time and end_time
  • Statistics: avg, min, max, sum and count
  • Period: the time period to aggregate measurements by
  • Offset, limit
  • merge_metrics: allows multiple metrics to be combined into a single list of statistics
  • group_by: list of columns to group the metrics to be returned. Allows multiple unique metrics to be returned in a single query
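To make the Statistics and Period parameters concrete, here is a sketch of the aggregation they describe, computed locally over a list of measurements. It assumes integer timestamps in seconds and periods aligned to multiples of the period length; the function name `statistics` and the row layout are illustrative, not the API's response format.

```python
from collections import defaultdict

# The five statistics named on the slide.
STAT_FUNCS = {
    "avg": lambda values: sum(values) / len(values),
    "min": min,
    "max": max,
    "sum": sum,
    "count": len,
}


def statistics(measurements, period_s, stats=("avg",)):
    """Aggregate (timestamp_s, value) pairs into fixed-length periods.

    Returns one row per non-empty period: [period_start, stat1, stat2, ...].
    """
    buckets = defaultdict(list)
    for timestamp_s, value in measurements:
        period_start = timestamp_s - timestamp_s % period_s
        buckets[period_start].append(value)
    return [
        [start] + [STAT_FUNCS[name](values) for name in stats]
        for start, values in sorted(buckets.items())
    ]


data = [(0, 1.0), (30, 3.0), (60, 5.0)]
print(statistics(data, period_s=60, stats=("avg", "max", "count")))
```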

Metrics Names

GET /v2.0/metrics/names

• Returns a list of the unique metric names
• Query parameters
  • Dimensions
  • Offset, limit

Metric Dimension Names

GET /v2.0/metrics/dimensions/names

• Lists the dimension names
• Query parameters
  • Metric name
  • Offset, limit

Metric Dimension Values

GET /v2.0/metrics/dimensions/names/values

• Lists the dimension values
• Query parameters
  • Metric name
  • Dimension name
  • Offset, limit

Alarm Definitions

POST, GET /v2.0/alarm-definitions

• Alarm definitions are templates that are used to automatically and dynamically create alarms based on matching metric names and dimensions
• One alarm definition can result in zero or more alarms
• Simple grammar for creating compound alarm expressions
  • avg(cpu.user_perc) > 85 or avg(disk.read_ops{device=vda}, 120) > 1000
• Alarm states (OK, ALARM and UNDETERMINED)
• Actions associated with alarms for state transitions
• User-assigned severity (LOW, MEDIUM, HIGH, CRITICAL)
• Thresholds can be dynamically adjusted via PATCH
• Minimal lifecycle management: alarm_lifecycle_state and link
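The way a threshold expression maps measurements to one of the three alarm states can be sketched as follows. This is a simplified illustration, not the Threshold Engine's implementation: parsing of the expression grammar is skipped, and the rule used here for combining sub-expressions with `or` (ALARM wins, then UNDETERMINED, then OK) is an assumption.

```python
import operator

OPERATORS = {">": operator.gt, ">=": operator.ge, "<": operator.lt, "<=": operator.le}
FUNCTIONS = {
    "avg": lambda values: sum(values) / len(values),
    "min": min,
    "max": max,
    "sum": sum,
    "count": len,
}


def evaluate(function, values, op, threshold):
    """Evaluate one sub-expression such as avg(cpu.user_perc) > 85."""
    if not values:
        # No measurements arrived in the evaluation window.
        return "UNDETERMINED"
    breached = OPERATORS[op](FUNCTIONS[function](values), threshold)
    return "ALARM" if breached else "OK"


def evaluate_or(*states):
    """Combine sub-expression states joined by `or` (assumed precedence)."""
    if "ALARM" in states:
        return "ALARM"
    if "UNDETERMINED" in states:
        return "UNDETERMINED"
    return "OK"


cpu_state = evaluate("avg", [90.0, 95.0], ">", 85)
disk_state = evaluate("avg", [200.0, 300.0], ">", 1000)
print(evaluate_or(cpu_state, disk_state))
```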

List Alarms

GET /v2.0/alarms

Query parameters
• metric_name: Name of metric to filter by
• metric_dimensions
• State: OK, ALARM or UNDETERMINED
• Severity: One or more severities to filter by, separated with |, e.g. severity=LOW|MEDIUM
• state_updated_start_time: The start time in ISO 8601 combined date and time format in UTC
• Offset, limit
• sort_by
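A query against this endpoint can be assembled with the standard library. The sketch below only builds the URL; the host `http://monasca.example` and the helper name `list_alarms_url` are placeholders, and the request itself (including the Keystone token header) is left out.

```python
from urllib.parse import urlencode


def list_alarms_url(base_url, **filters):
    """Build a GET /v2.0/alarms URL from keyword filters.

    Filters set to None are skipped. "|" and ":" are left unescaped so
    that a severity list stays readable in the resulting URL.
    """
    query = urlencode(
        {key: value for key, value in filters.items() if value is not None},
        safe="|:",
    )
    url = f"{base_url}/v2.0/alarms"
    return f"{url}?{query}" if query else url


print(list_alarms_url("http://monasca.example", state="ALARM", severity="LOW|MEDIUM"))
```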

Alarms

GET, PUT, PATCH, DELETE /v2.0/alarms/{alarm-id}

• Alarms are created by the Threshold Engine based on matching alarm definitions
• When new nodes or components are deployed, alarms are automatically created
• Alarms are resources within Monasca. They have a resource ID and lifecycle
• By default three states: OK, ALARM and UNDETERMINED
  • UNDETERMINED state occurs when metrics are no longer being received
• Deterministic alarms: two states, OK and ALARM
  • Used for systems where metrics are sporadic, e.g. creating metrics when errors in log files occur and no metrics when there aren't any errors

Alarm Counts

GET /v2.0/alarms/count

• Query the total number of alarms in the OK, ALARM or UNDETERMINED state and their severities, grouped by metrics dimension such as OpenStack service, state and severity
• Used for summary dashboards

Example: Helion Ops Console

Alarm History

GET /v2.0/alarms/state-history

• Lists the alarm state history for alarms
• Query parameters
  • Dimensions to filter on
  • Start/end timestamp
  • Offset, limit

GET /v2.0/alarms/{alarm-id}/state-history

• Lists the alarm state history for a specific alarm

Notification Methods

POST, GET, DELETE /v2.0/notification-methods

Notification methods are associated with Actions in alarm definitions

Example

POST /v2.0/notification-methods

{
  "name": "Name of notification method",
  "type": "EMAIL",
  "address": "john.doe@hp.com"
}

Monasca Agent

• System metrics (cpu, memory, network, filesystem, ...)
• Service metrics
  • MySQL, Kafka and many others
• Application metrics
  • Built-in Statsd daemon
  • Python monasca-statsd library: adds support for dimensions
• VM system metrics
• Open vSwitch metrics
• Active checks
  • HTTP status checks and response times
  • System up/down checks (ping and ssh)
• Runs any Nagios plugin or check_mk
• Extensible/pluggable: additional services can be easily added

Agent details

• The Agent Forwarder buffers metrics for a short time to increase the size of the HTTP request body (number of metrics) sent to the Monasca API
• The Agent requests an auth token from the Keystone Identity service, which is supplied on all requests
• The Monasca Agent and API cache auth tokens in memory to reduce the round-trip authorization requests to Keystone
• If network connectivity between the Agent and API is lost, the Agent will buffer metrics and send them when connectivity is restored
• Metrics are submitted using an "agent" role, which only allows metrics to be POST'd to the metrics endpoint
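The buffering behaviour of the forwarder can be sketched as below. This is an illustration under stated assumptions rather than the agent's code: `Forwarder` and its `send` callback are hypothetical names, the buffer is unbounded for simplicity, and the Keystone token handling is omitted.

```python
class Forwarder:
    """Collect metrics into batches and keep them while the API is unreachable."""

    def __init__(self, send, max_batch=2):
        self.send = send            # callable taking a list; raises ConnectionError
        self.max_batch = max_batch  # flush once this many metrics are pending
        self.pending = []

    def submit(self, metric):
        self.pending.append(metric)
        if len(self.pending) >= self.max_batch:
            self.flush()

    def flush(self):
        """Try to send everything pending; return True on success."""
        if not self.pending:
            return True
        try:
            self.send(list(self.pending))
        except ConnectionError:
            return False            # keep the metrics for the next attempt
        self.pending.clear()
        return True


api_up = {"ok": False}
sent_batches = []


def fake_send(batch):
    # Stand-in for an HTTP POST to the metrics endpoint.
    if not api_up["ok"]:
        raise ConnectionError("API unreachable")
    sent_batches.append(batch)


forwarder = Forwarder(fake_send, max_batch=2)
for name in ("m1", "m2", "m3"):
    forwarder.submit(name)      # API is down: metrics accumulate
api_up["ok"] = True
forwarder.flush()               # connectivity restored: one larger batch
print(sent_batches)
```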

Grafana/Monasca Integration

• Datasource: A datasource that can be added to the Grafana dashboard to enable Monasca
  • https://github.com/openstack/monasca-grafana-datasource
• Keystone authentication
  • https://github.com/twc-openstack/grafana
• Support for Alerting will be added in Grafana 4

Grafana Monasca Data Source

Logging Architecture

Logging API

• POST /v3.0/logs
• Batch log messages in a single HTTP request
• Global, local, mixed dimensions
  • Similar to dimensions in metrics
• JSON only
• Specification
  • https://github.com/openstack/monasca-log-api/blob/master/docs/monasca-log-api-spec.md
• Queries not done via API, but via a tenantized version of Kibana
  • https://github.com/FujitsuEnablingSoftwareTechnologyGmbH/fts-keystone

Log Model

{
  "dimensions": {
    "hostname": "devstack",
    "service": "monitoring",
    "component": "monasca-api"
  },
  "logs": [
    {
      "message": "msg1",
      "dimensions": {
        "service": "compute",
        "component": "nova-api",
        "path": "/var/log/mysql.log"
      }
    },
    {
      "message": "msg2",
      "dimensions": {
        "path": "/var/log/monasca/monasca-api.log"
      }
    }
  ]
}
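How global and local dimensions combine for each log entry can be sketched as follows. The rule that a local dimension overrides a global one with the same key is an assumption made for this illustration, and `effective_dimensions` is a hypothetical helper, not part of the Log API.

```python
def effective_dimensions(payload):
    """Return each log message with global and local dimensions merged."""
    merged = []
    for entry in payload["logs"]:
        dimensions = dict(payload.get("dimensions", {}))  # global dimensions
        dimensions.update(entry.get("dimensions", {}))    # local ones win (assumed)
        merged.append({"message": entry["message"], "dimensions": dimensions})
    return merged


payload = {
    "dimensions": {
        "hostname": "devstack",
        "service": "monitoring",
        "component": "monasca-api",
    },
    "logs": [
        {
            "message": "msg1",
            "dimensions": {
                "service": "compute",
                "component": "nova-api",
                "path": "/var/log/mysql.log",
            },
        },
        {
            "message": "msg2",
            "dimensions": {"path": "/var/log/monasca/monasca-api.log"},
        },
    ],
}

for entry in effective_dimensions(payload):
    print(entry)
```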

Log Agents

• Logstash
  • https://github.com/logstash-plugins/logstash-output-monasca_log_api/pull/1
• Beaver
  • https://github.com/python-beaver/python-beaver/pull/406
• Logspout: under investigation

Kibana Integration

• Keystone authentication support for Kibana
• Authentication plugin
  • https://github.com/FujitsuEnablingSoftwareTechnologyGmbH/fts-keystone
  • Note: in the process of moving to the official OpenStack repo

Composability: Logging/Metrics

Transform and Analytics Engine

Monasca Transform

• A new micro-service in Monasca that aggregates and transforms metrics
• Currently based on Apache Spark Streaming
• Use cases
  • Object Storage Disk Capacity
  • Object Storage Capacity
  • Compute Host Capacity
  • VM Capacity
  • More to come
• Metrics are aggregated and published every hour
• Currently in deployment in HPE Helion OpenStack 4.0
• OpenStack project/repo
  • https://github.com/openstack/monasca-transform

Monasca Analytics

• A framework that adds data science tools (parsers, algorithms, etc.)
• Features include
  • Algorithmic flow definition enabling sharing of complex algorithmic recipes
  • Thin orchestration layer that instantiates an execution environment
• Focused on
  • Anomaly detection
  • Reducing alert fatigue via alarm clustering (unsupervised machine learning)
• Example algorithms: One-Class SVM and LiNGAM
• Status: under development
• OpenStack project/repo
  • https://github.com/openstack/monasca-analytics

Distributions & Deployments

• Charter Communications
  • Monasca and Grafana are currently deployed in a production private cloud
  • Monitoring-as-a-Service use cases supported with Grafana as the visualization dashboard
  • 2 datacenters, 600-700 compute nodes, 1000 VMs, 11000 metrics/sec
• FIWARE Lab
  • http://superuser.openstack.org/articles/monitoring-a-multi-region-cloud-based-on-openstack
• Hewlett Packard Enterprise Cloud System, Helion OpenStack
  • Supported and tested up to 65K metrics/sec ingest rates
• Fujitsu
  • FUJITSU Software ServerView Cloud Monitoring Manager
• NEC
  • Planning to include Monasca in Cloud Solution Menus solution
• Others

Statistics: Mitaka/Newton Release

• Organizations: 31
• Contributors: 97
• Commits: 1075
• Reviews: 4080
• Lines of code: 215370

Ecosystem

• Hewlett Packard Enterprise
• Fujitsu
• Charter Communications
• NEC
• Cisco
• Cloudbase Solutions
• SUSE
• SolidFire
• SAP
• Cray Inc
• FIWARE Lab
• Mirantis
• Broadcom

Containers and Kubernetes

• New Monasca Agent plugins
  • Docker plugin
  • cAdvisor plugin
  • Kubernetes plugin: monitors both the Kubernetes control plane and containers
  • Prometheus client plugin: scrapes apps
  • Mesos plugin
• Containerization of Monasca
• Heapster Monasca data sink

Next Steps

• Containerizing Monasca
• Monitoring containers and container managers such as Kubernetes
• Grouping notifications

Page 2: Monasca - NETWAYS...... What every software engineer should know about real-time data's unifying ... systems/log-what-every-software-engineer-should-know-about

Speaker

Roland Hochmuth

Hewlett Packard Enterprise

Fort Collins Colorado USA

Agenda

bull Describe how to build a highly scalable monitoring and logging as a service platform

bull Architectural and design principles

bull Scale HA

bull Provide an overview of Monascabull Features

bull API

bull Demo

What is Monitoring-as-a-Service

bull A Monitoring or Logging solution deployed as Software-as-a-Servicebull Eg CloudWatch Datadog New Relic Librato Loggly and many others

bull First-class preferably RESTful HTTP API

bull Authentication

bull Multi-tenancy

bull Provides self-provisioning to userstenants of the service

bull Designed to be highly reliable and operate at scale

bull Historically run by an operations team doing web services

What is OpenStack

bull OpenStack is a cloud operating system that controls large pools of compute storage and networking resources

bull Open-source alternative to AWS Microsoft Azure Google Cloud and other cloud services

bull Deployed in both public and private clouds

What is Monasca

bull Open-source MonitoringLogging-as-a-Service platform for OpenStackbull Authentication currently via OpenStack Identity Service (Keystone)

bull Microservices message-bus based architecture

bull First-class RESTful APIbull Push-based metricsbull Consolidates Operational Monitoring Monitoring-as-a-Service Metering amp

Billing and morebull Designed for elastic cloud environmentsdeploymentsbull High-availability clustering built-inbull Horizontally scalable and vertically 4 tieredlayered architecturebull Capable of long-term data retention to address metering SLA capacity

planning trend analysis post-hoc RCA and other use casesbull Extensible and Composable

The Log

bull The Log What every software engineer should know about real-time datas unifying abstraction

bull httpsengineeringlinkedincomdistributed-systemslog-what-every-software-engineer-should-know-about-real-time-datas-unifying

bull Log An append-only totally-ordered sequence of records ordered by time

From To

Monitoring Architecture

Kafka

bull A performant distributed durable publishsubscribe messaging and stream processing system

bull Metrics logs and events are published to topics in Kafka

bull Microservices register in a consumer group as a consumer

bull Microservices subscribe to topics and consume metricslogs and events

bull Messages are replicated per consumer group

bull Messages are load-balanced across all consumers in a consumer groupbull Can addremove micro-services to handle load or mitigate problemsbull As micro-services expandcontract the partitions are automatically re-balanced

bull At-least-once semantic guarantees on message delivery

bull Also used for domain events notification retry events periodic notifications grouping notifcations and other areas

bull Always accept data never drop data true elasticitybull Loggly httpswwwyoutubecomwatchv=LpNbjXFPyZ0

CQRS

bull Command Query Responsibility Segregation (CQRS)

bull CQRS involves splitting an application into two parts internally1 Command side ordering the system to update state

2 Query side that gets information without changing state

bull Advantagesbull Decouples the readwrite load Allows each to be scaled independently

bull Read store can be optimized for the query pattern of the application

bull Referencebull Event sourcing CQRS stream processing and Apache Kafka

bull httpswwwconfluentioblogevent-sourcing-cqrs-stream-processing-apache-kafka-whats-connection

Microservices

bull Microservices are small autonomous decoupled services that are deployed independenty and work together as a single application

bull Communication between services occurs via a network

bull Services need to be able to change independently of each other and be deployed by themselves without requiring consumers to change

bull Benefitsbull Resiliencebull Scalebull Ease of deploymentbull Organizational Alignmentbull Optimized for ChangeReplaceability

POST Metrics Sequence

Domain Events Sequence

Deployment Models (HAScale)

bull Many ways to deploy Monasca

bull Typically deployed in a clusteredHA configuration using three nodes or greater

bull If any node or microservice fails the cluster remains operational

bull Partitions in Kafka are redistributed among the remaining components

bull Preferably the database is run on a separate layer from the other componentsmicroservices

bull Note Monasca can also be deployed on a single-node non-clustered

bull Has also been containerized and run in Kubernetes

Metrics ModelPOST v20metrics

name http_statusdimensions

url httphostdomaincom1234servicecluster c1control_plane ccpservice compute

timestamp 0 milliseconds value 10value_meta

status_code 500msg Internal server error

bull Simple concise multi-dimensional flexible descriptionbull Name (string)bull Dimensions Dictionary of user-defined (key value)

pairs that are used to uniquely identify a metric

bull Optional dictionary of user-defined (key value) pairs that can be used to describe a measurement

bull Normally used for errors and messages

Push vs Pull

bull Monitoring-as-a-Servicebull Cant always pull due to firewalls and network issues

bull Low-latency sub-second latency difficult for pull model

bull Doesnt require service discovery and registrationbull As entities are deployed they can start sending metrics without have to be

discovered or registered

bull Events

bull Temporary cachingbuffering of metricsevents while service unreachable

Monasca API

bull Primary point for pushing metrics and handling queries

bull Authenticates all requests against the Keystone identity servicebull Note auth tokens are cached to reduce the load on Keystone

bull Resources Metrics Alarm Definitions Alarms and Notification Methods

bull API Specificationbull httpsgithubcomopenstackmonasca-apitreemasterdocs

bull Horizontally scalable

bull Publishes metrics to Kafka

bull Queries timeseries DB for measurements and statistics

bull Queries Config DB for alarms alarm definitions and notification methods

Persister

bull Consumes both metrics and alarm state transition events from Kafka

bull Stores temporarily in-memory and does batch writes to the TSDB based on batch size or time to optimize write performance

bull At-least once message delivery semanticsbull No metrics or alarm state transition events are lostbull The Kafka consumer offset for each batch is only updated after successfully storing

the metric or alarm state transition eventbull Note duplicates are possible

bull HAfault-tolerancebull Multiple persisters run simultaneously and balance loadbull If a persister fails the load is automatically re-balanced across the remaining

persisters

Time Series Databases

bull Used for storingbull Metricsbull Alarm state history

bull Two databases supported1 Vertica

bull Enterprise class proprietary closed-source clustered HA analytics databasebull Excels at time-series

2 InfluxDBbull Open-source single-node time-series DBbull Clustering is closed-sourcebull Note can replicate to multiple instances of InfluxDB using Kafka

bull Investigating support for additional databases

Config Database

bull Stores all transactional data for Monasca such asbull Alarm Definitions

bull Alarms

bull Notification Methods

bull MySQL and Postgres supported

bull Typically deployed in a clustered or HA configuration

Threshold Engine

bull Near real-time stream processing clustered and highly available threshold engine

bull Based on Apache Storm

bull Consumes metrics from Kafka

bull Creates alarms based on metrics that match patterns specified in the alarm definition

bull Evaluates whether metrics exceed threshold

bull Publishes alarm state transition events to Kafka

bull Supports both simple and compound alarm expressions

Notification Engine

bull Consumes alarm state transition events from Kafka produced by the Threshold Engine

bull Evaluates whether notifications should be sent based on actions specified in the alarm definition

bull OK ALARM and UNDETERMINED actions

bull Supports email PagerDuty webhooks HipChat Slack and JIRAbull Dynamic plugins supportedbull Supports both one-shot and periodic notificationsbull If sending to the notification address fails then notification is published to

retry topic in Kafka and retried laterbull Grouping notifications In progress

Kafka Message Schema

bull JSON messages publishedconsumed tofrom Kafka by Monasca micro-services

bull Well-defined schema is published atbull httpswikiopenstackorgwikiMonascaMessage_Schema

Metrics

Create query and get statistics for metrics

bull GET POST v20metrics

bull GET v20metricsnamesbull Returns the unique metric names

bull GET v20metricsdimensionnamesbull Returns the unique dimension names

bull GET v20metricsdimensionnamesvaluesbull Returns the unique dimension values

Measurements

GET v20metricsmeasurements

bull Returns a list of measurements

bull Query parametersbull Name and dimensions to filter by

bull Start_time and end_time

bull Offset and limit

bull merge_metrics allow multiple metrics to be combined into a single list of measurements

bull group_by list of columns to group the metrics to be returned Allows multiple unique metrics to be returned in a single query

Statistics

GET v20metricsstatistics

bull Query parametersbull Name and dimensions to filter bybull Start_time and end_timebull Statistics avg min max sum and countbull Period The time period to aggregate measurements bybull Offset limitbull merge_metrics allow multiple metrics to be combined into a single list

of statisticsbull group_by list of columns to group the metrics to be returned Allows

multiple unique metrics to be returned in a single query

Metrics Names

GET v20metricsnames

bull Returns a list of the unique metric names

bull Query parametersbull Dimensions

bull Offset limit

Metric Dimension Names

GET v20metricsdimensionsnames

bull List the dimension names

bull Query parametersbull Metric name

bull Offset limit

Metric Dimension Values

GET v20metricsdimensionsnamesvalues

bull List the dimension values

bull Query parametersbull Metric name

bull Dimension name

bull Offset limit

Alarm Definitions

POST GET v20alarm-definitions

bull Alarm definitions are templates that are used to automatically and dynamically create alarms based on matching metric names and dimensions

bull One alarm definition can result in zero or more alarms

bull Simple grammar for creating compound alarm expressionsbull avg(cpuuser_perc) gt 85 or avg(diskread_opsdevice=vda 120) gt 1000

bull Alarm states (OK ALARM and UNDETERMINED)

bull Actions associated with alarms for state transitions

bull User assigned severity (LOW MEDIUM HIGH CRITICAL)

bull Thresholds can be dynamically adjusted via PATCH

bull Minimal lifecycle management alarm_lifecycle_state and link

List Alarms

GET v20alarmsQuery parametersbull metric_name - Name of metric to filter bybull metric_dimensionsbull State OK ALARM or UNDETERMINEDbull Severity One or more severities to filter by separated with |

ex severity=LOW|MEDIUMbull state_updated_start_time The start time in ISO 8601 combined date and

time format in UTCbull Offset limitbull sort_by

Alarms

GET PUT PATCH DELETE v20alarmsalarm-id

bull Alarms created by the Threshold Engine based on matching alarm definitions

bull When new nodes or components are deployed alarms are automatically created

bull Alarms are resources within Monasca They have a resource ID and lifecycle

bull By default three states OK ALARM and UNDETERMINEDbull UNDETERMINED state occurs when metrics are no longer being received

bull Deterministic alarms two states OK and ALARMbull Used for systems where metrics are sporadic Eg Creating metrics when errors in log

files occur and no metrics when there arent any errors

Alarm Counts

GET v20alarmscount

bull Query the total number of alarms in the OK ALARM or UNDETERMINED state and their severities grouped by metrics dimension such as OpenStack service state and severity

bull Used for summary dashboards

Example Helion Ops Console

Alarm History

GET v20alarmsstate-history

bull Lists the alarm state history for alarms

bull Query Parametersbull Dimensions to filter on

bull Startend timestamp

bull Offset limit

GET v20alarmsalarm-idstate-history

bull Lists the alarm state history for a specific alarm

Notification Methods

POST GET DELETE v20notification-methods

Notification methods are associated with Actions in alarm definitions

Example

POST v20notification-methods

nameName of notification method

typeEMAIL

addressjohndoehpcom

Monasca Agent

bull System metrics (cpu memory network filesystem hellip)

bull Service metricsbull MySQL Kafka and many others

bull Application metricsbull Built-in Statsd daemonbull Python monasca-statsd library Adds support for dimensions

bull VM system metrics

bull Open vSwitch metrics

bull Active checksbull HTTP status checks and response timesbull System updown checks (ping and ssh)

bull Runs any Nagios plugin or check_mk

bull ExtensiblePluggable Additional services can be easily added

Agent details

• The Agent Forwarder buffers metrics for a short time to increase the size of the HTTP request body (number of metrics) sent to the Monasca API
• The Agent requests an auth token from the Keystone Identity service, which is supplied on all requests
• The Monasca Agent and API cache auth tokens in memory to reduce round-trip authorization requests to Keystone
• If network connectivity between the Agent and API is lost, the Agent buffers metrics and sends them when connectivity is restored
• Metrics are submitted using an "agent" role, which only allows metrics to be POSTed to the metrics endpoint
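The buffering behaviour described above can be sketched roughly as follows. This is not the Agent's actual code: the `Forwarder` class, the size-based flush trigger, and the `fake_send` stand-in for the HTTP call are all assumptions made for illustration.

```python
class Forwarder:
    """Buffers metrics and sends them in batches; keeps them if sending fails."""

    def __init__(self, send, batch_size=3):
        self.send = send              # callable(list_of_metrics) -> True on success
        self.batch_size = batch_size
        self.buffer = []

    def submit(self, metric):
        self.buffer.append(metric)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        # Clear the buffer only if the API accepted the whole batch.
        if self.buffer and self.send(list(self.buffer)):
            self.buffer.clear()

link = {"up": False}   # simulated connectivity between Agent and API
sent = []

def fake_send(batch):
    if link["up"]:
        sent.append(batch)
    return link["up"]

fwd = Forwarder(fake_send)
for i in range(4):
    fwd.submit({"name": "cpu.user_perc", "value": float(i)})

link["up"] = True      # connectivity restored
fwd.flush()
print(len(sent), len(sent[0]))
```

While the link is down every flush attempt fails and the metrics stay buffered; once it is restored, all buffered metrics go out in a single request.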

Grafana/Monasca Integration

• Datasource: a data source that can be added to the Grafana dashboard to enable Monasca
  • https://github.com/openstack/monasca-grafana-datasource
• Keystone authentication
  • https://github.com/twc-openstack/grafana
• Support for Alerting will be added in Grafana 4

Grafana Monasca Data Source

Logging Architecture

Logging API

• POST /v3.0/logs
• Batch log messages in a single HTTP request
• Global, local, and mixed dimensions
  • Similar to dimensions in metrics
• JSON only
• Specification
  • https://github.com/openstack/monasca-log-api/blob/master/docs/monasca-log-api-spec.md
• Queries are not done via the API but via a tenantized version of Kibana
  • https://github.com/FujitsuEnablingSoftwareTechnologyGmbH/fts-keystone

Log Model

{
  "dimensions": {
    "hostname": "devstack",
    "service": "monitoring",
    "component": "monasca-api"
  },
  "logs": [
    {
      "message": "msg1",
      "dimensions": {
        "service": "compute",
        "component": "nova-api",
        "path": "/var/log/mysql.log"
      }
    },
    {
      "message": "msg2",
      "dimensions": {
        "path": "/var/log/monasca/monasca-api.log"
      }
    }
  ]
}
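To make the interplay of global and local dimensions concrete, here is a small sketch that computes the effective dimensions of each log message in a payload shaped like the one above. The `effective_dimensions` helper is hypothetical, and the rule that a local dimension overrides a global one with the same key is an assumption, not something the slide states.

```python
def effective_dimensions(payload):
    """Pair each message with the global dimensions overlaid by its local ones."""
    global_dims = payload.get("dimensions", {})
    return [
        # Assumption: on a key clash the local (per-log) value wins.
        (log["message"], {**global_dims, **log.get("dimensions", {})})
        for log in payload["logs"]
    ]

payload = {
    "dimensions": {"hostname": "devstack", "service": "monitoring",
                   "component": "monasca-api"},
    "logs": [
        {"message": "msg1",
         "dimensions": {"service": "compute", "component": "nova-api",
                        "path": "/var/log/mysql.log"}},
        {"message": "msg2",
         "dimensions": {"path": "/var/log/monasca/monasca-api.log"}},
    ],
}

result = effective_dimensions(payload)
for message, dims in result:
    print(message, dims)
```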

Log Agents

• Logstash
  • https://github.com/logstash-plugins/logstash-output-monasca_log_api/pull/1
• Beaver
  • https://github.com/python-beaver/python-beaver/pull/406
• Logspout: under investigation

Kibana Integration

• Keystone authentication support for Kibana
• Authentication plugin
  • https://github.com/FujitsuEnablingSoftwareTechnologyGmbH/fts-keystone
  • Note: in the process of moving to an official OpenStack repo

Composability: Logging/Metrics

Transform and Analytics Engine

Monasca Transform

• A new microservice in Monasca that aggregates and transforms metrics
• Currently based on Apache Spark Streaming
• Use cases
  • Object Storage Disk Capacity
  • Object Storage Capacity
  • Compute Host Capacity
  • VM Capacity
  • More to come
• Metrics are aggregated and published every hour
• Currently in deployment in HPE Helion OpenStack 4.0
• OpenStack project/repo
  • https://github.com/openstack/monasca-transform

Monasca Analytics

• A framework that adds data science tools (parsers, algorithms, etc.)
• Features include
  • Algorithmic flow definition, enabling sharing of complex algorithmic recipes
  • Thin orchestration layer that instantiates an execution environment
• Focused on
  • Anomaly detection
  • Reducing alert fatigue via alarm clustering (unsupervised machine learning)
• Example algorithms: One-Class SVM and LiNGAM
• Status: under development
• OpenStack project/repo
  • https://github.com/openstack/monasca-analytics

Distributions & Deployments

• Charter Communications
  • Monasca and Grafana are currently deployed in a production private cloud
  • Monitoring-as-a-Service use cases supported with Grafana as the visualization dashboard
  • 2 datacenters, 600-700 compute nodes, 1000 VMs, 11000 metrics/sec
• FIWARE Lab
  • http://superuser.openstack.org/articles/monitoring-a-multi-region-cloud-based-on-openstack
• Hewlett Packard Enterprise Cloud System, Helion OpenStack
  • Supported and tested up to 65K metrics/sec ingest rates
• Fujitsu
  • FUJITSU Software ServerView Cloud Monitoring Manager
• NEC
  • Planning to include Monasca in Cloud Solution Menus solution
• Others

Statistics: Mitaka/Newton Release

• Organizations: 31
• Contributors: 97
• Commits: 1075
• Reviews: 4080
• Lines of code: 215370

Ecosystem

• Hewlett Packard Enterprise
• Fujitsu
• Charter Communications
• NEC
• Cisco
• Cloudbase Solutions
• SUSE
• SolidFire
• SAP
• Cray Inc
• FIWARE Lab
• Mirantis
• Broadcom

Containers and Kubernetes

• New Monasca Agent plugins
  • Docker plugin
  • cAdvisor plugin
  • Kubernetes plugin: monitors both the Kubernetes control plane and containers
  • Prometheus client plugin: scrapes apps
  • Mesos plugin
• Containerization of Monasca
• Heapster Monasca data sink

Next Steps

• Containerizing Monasca
• Monitoring containers and container managers such as Kubernetes
• Grouping notifications



The Log

• The Log: What every software engineer should know about real-time data's unifying abstraction
  • https://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying
• Log: an append-only, totally-ordered sequence of records ordered by time


Monitoring Architecture

Kafka

• A performant, distributed, durable publish/subscribe messaging and stream processing system
• Metrics, logs, and events are published to topics in Kafka
• Microservices register in a consumer group as a consumer
• Microservices subscribe to topics and consume metrics, logs, and events
• Messages are replicated per consumer group
• Messages are load-balanced across all consumers in a consumer group
  • Can add/remove microservices to handle load or mitigate problems
  • As microservices expand/contract, the partitions are automatically rebalanced
• At-least-once semantic guarantees on message delivery
• Also used for domain events, notification retry events, periodic notifications, grouping notifications, and other areas
• Always accept data, never drop data: true elasticity
  • Loggly: https://www.youtube.com/watch?v=LpNbjXFPyZ0
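The load balancing within a consumer group can be pictured with a simplified round-robin assignment. This is only an illustration of the idea, not Kafka's actual assignor; the `assign_partitions` helper and the consumer names are hypothetical.

```python
def assign_partitions(partitions, consumers):
    """Spread partitions round-robin over the consumers of one consumer group."""
    assignment = {consumer: [] for consumer in consumers}
    for index, partition in enumerate(partitions):
        assignment[consumers[index % len(consumers)]].append(partition)
    return assignment

partitions = list(range(6))

# Three persisters share the six partitions of a topic.
before = assign_partitions(partitions, ["persister-1", "persister-2", "persister-3"])

# persister-2 fails: its partitions are redistributed among the survivors.
after = assign_partitions(partitions, ["persister-1", "persister-3"])

print(before)
print(after)
```

Every partition always has exactly one owner within the group, so removing a consumer increases the share of the remaining ones instead of dropping data.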

CQRS

• Command Query Responsibility Segregation (CQRS)
• CQRS involves splitting an application into two parts internally:
  1. Command side: orders the system to update state
  2. Query side: gets information without changing state
• Advantages
  • Decouples the read/write load, allowing each to be scaled independently
  • Read store can be optimized for the query pattern of the application
• Reference
  • Event sourcing, CQRS, stream processing and Apache Kafka
  • https://www.confluent.io/blog/event-sourcing-cqrs-stream-processing-apache-kafka-whats-connection
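A toy sketch of the command/query split, with a plain Python list standing in for the log between the two sides. The class and method names are hypothetical and chosen only to mirror the two roles described above.

```python
class CommandSide:
    """Accepts state-changing commands and appends them to a log."""

    def __init__(self, log):
        self.log = log

    def post_metric(self, name, value):
        self.log.append({"name": name, "value": value})


class QuerySide:
    """Answers queries from a read store built by consuming the log."""

    def __init__(self, log):
        self.log = log
        self.offset = 0      # position of the next unread log entry
        self.latest = {}     # read store: latest value per metric name

    def catch_up(self):
        for event in self.log[self.offset:]:
            self.latest[event["name"]] = event["value"]
        self.offset = len(self.log)

    def get_latest(self, name):
        return self.latest.get(name)


log = []
commands, queries = CommandSide(log), QuerySide(log)
commands.post_metric("cpu.user_perc", 42.0)
commands.post_metric("cpu.user_perc", 57.0)
queries.catch_up()
print(queries.get_latest("cpu.user_perc"))
```

The write path never reads the read store and the read path never writes to the log, so each side can be scaled or replaced on its own.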

Microservices

• Microservices are small, autonomous, decoupled services that are deployed independently and work together as a single application
• Communication between services occurs via a network
• Services need to be able to change independently of each other and be deployed by themselves without requiring consumers to change
• Benefits
  • Resilience
  • Scale
  • Ease of deployment
  • Organizational alignment
  • Optimized for change/replaceability

POST Metrics Sequence

Domain Events Sequence

Deployment Models (HA/Scale)

• Many ways to deploy Monasca
• Typically deployed in a clustered/HA configuration using three or more nodes
  • If any node or microservice fails, the cluster remains operational
  • Partitions in Kafka are redistributed among the remaining components
• Preferably, the database is run on a separate layer from the other components/microservices
• Note: Monasca can also be deployed on a single node, non-clustered
• Has also been containerized and run in Kubernetes

Metrics Model

POST /v2.0/metrics

{
  "name": "http_status",
  "dimensions": {
    "url": "http://host.domain.com:1234/service",
    "cluster": "c1",
    "control_plane": "ccp",
    "service": "compute"
  },
  "timestamp": 0,
  "value": 1.0,
  "value_meta": {
    "status_code": 500,
    "msg": "Internal server error"
  }
}

(The timestamp is in milliseconds.)

• Simple, concise, multi-dimensional, flexible description
  • Name (string)
  • Dimensions: dictionary of user-defined (key, value) pairs that are used to uniquely identify a metric
  • Value meta: optional dictionary of user-defined (key, value) pairs that can be used to describe a measurement
    • Normally used for errors and messages
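The following sketch builds metric dictionaries in this shape and shows that the name plus dimensions identify a metric, while value_meta only describes an individual measurement. The `make_metric` and `metric_id` helpers are hypothetical and exist only for illustration.

```python
import time

def make_metric(name, dimensions, value, value_meta=None, timestamp_ms=None):
    """Build a metric dict shaped like the POST /v2.0/metrics body above."""
    metric = {
        "name": name,
        "dimensions": dict(dimensions),
        # Timestamps are in milliseconds; default to the current time.
        "timestamp": int(time.time() * 1000) if timestamp_ms is None else timestamp_ms,
        "value": float(value),
    }
    if value_meta:
        metric["value_meta"] = dict(value_meta)
    return metric

def metric_id(metric):
    """Name plus sorted dimensions identify a metric; value_meta does not."""
    return (metric["name"], tuple(sorted(metric["dimensions"].items())))

a = make_metric("http_status", {"service": "compute", "cluster": "c1"}, 1.0,
                {"status_code": 500, "msg": "Internal server error"}, timestamp_ms=0)
b = make_metric("http_status", {"cluster": "c1", "service": "compute"}, 0.0,
                timestamp_ms=1000)
c = make_metric("http_status", {"cluster": "c2", "service": "compute"}, 0.0,
                timestamp_ms=1000)

print(metric_id(a) == metric_id(b))   # same metric, two measurements
print(metric_id(a) == metric_id(c))   # different cluster, different metric
```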

Push vs Pull

• Monitoring-as-a-Service
  • Can't always pull due to firewalls and network issues
• Low latency: sub-second latency is difficult with a pull model
• Doesn't require service discovery and registration
  • As entities are deployed, they can start sending metrics without having to be discovered or registered
• Events
• Temporary caching/buffering of metrics/events while the service is unreachable

Monasca API

• Primary point for pushing metrics and handling queries
• Authenticates all requests against the Keystone identity service
  • Note: auth tokens are cached to reduce the load on Keystone
• Resources: Metrics, Alarm Definitions, Alarms, and Notification Methods
• API specification
  • https://github.com/openstack/monasca-api/tree/master/docs
• Horizontally scalable
• Publishes metrics to Kafka
• Queries the time-series DB for measurements and statistics
• Queries the Config DB for alarms, alarm definitions, and notification methods

Persister

• Consumes both metrics and alarm state transition events from Kafka
• Stores them temporarily in memory and does batch writes to the TSDB, based on batch size or time, to optimize write performance
• At-least-once message delivery semantics
  • No metrics or alarm state transition events are lost
  • The Kafka consumer offset for each batch is only updated after successfully storing the metrics or alarm state transition events
  • Note: duplicates are possible
• HA/fault tolerance
  • Multiple persisters run simultaneously and balance load
  • If a persister fails, the load is automatically rebalanced across the remaining persisters
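The commit-after-write rule and its consequence (nothing lost, duplicates possible) can be sketched as below. This is a simplification under stated assumptions: a Python list stands in for a Kafka partition, batching is by size only, and the `Persister` class and `flaky_store` function are hypothetical.

```python
class Persister:
    """Writes batches to a store and commits the offset only after success."""

    def __init__(self, store, batch_size=2):
        self.store = store            # callable(batch); raises on failure
        self.batch_size = batch_size
        self.committed_offset = 0     # where consumption resumes after a restart

    def run(self, log):
        pos = self.committed_offset
        while pos < len(log):
            batch = log[pos:pos + self.batch_size]
            self.store(batch)         # if this raises, the offset is not advanced
            pos += len(batch)
            self.committed_offset = pos

db = []
state = {"crash_once": True}

def flaky_store(batch):
    db.extend(batch)
    if state["crash_once"]:
        state["crash_once"] = False
        raise IOError("crashed after the write, before the offset commit")

log = ["m0", "m1", "m2"]
persister = Persister(flaky_store)
try:
    persister.run(log)
except IOError:
    pass                  # the persister restarts from the committed offset
persister.run(log)
print(db)
```

The first batch is written twice because the crash happened between the write and the commit, which is exactly the duplicate case the slide notes.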

Time Series Databases

• Used for storing
  • Metrics
  • Alarm state history
• Two databases supported
  1. Vertica
    • Enterprise-class, proprietary, closed-source, clustered, HA analytics database
    • Excels at time series
  2. InfluxDB
    • Open-source, single-node time-series DB
    • Clustering is closed-source
    • Note: can replicate to multiple instances of InfluxDB using Kafka
• Investigating support for additional databases

Config Database

• Stores all transactional data for Monasca, such as
  • Alarm Definitions
  • Alarms
  • Notification Methods
• MySQL and Postgres supported
• Typically deployed in a clustered or HA configuration

Threshold Engine

• Near real-time, stream-processing, clustered, and highly available threshold engine
• Based on Apache Storm
• Consumes metrics from Kafka
• Creates alarms based on metrics that match patterns specified in the alarm definition
• Evaluates whether metrics exceed the threshold
• Publishes alarm state transition events to Kafka
• Supports both simple and compound alarm expressions

Notification Engine

• Consumes alarm state transition events from Kafka produced by the Threshold Engine
• Evaluates whether notifications should be sent based on actions specified in the alarm definition
  • OK, ALARM, and UNDETERMINED actions
• Supports email, PagerDuty, webhooks, HipChat, Slack, and JIRA
• Dynamic plugins supported
• Supports both one-shot and periodic notifications
• If sending to the notification address fails, the notification is published to a retry topic in Kafka and retried later
• Grouping notifications: in progress
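The retry behaviour can be approximated with an in-memory queue playing the role of the retry topic. The `process` function, the `max_retries` limit, and the `flaky_send` stand-in are assumptions for illustration; the slide does not say how many retries the engine performs.

```python
from collections import deque

def process(notifications, send, max_retries=2):
    """Send notifications; failed ones go to a retry queue and are retried later."""
    retry_topic = deque()
    delivered, dropped = [], []
    for notification in notifications:
        if send(notification):
            delivered.append(notification)
        else:
            retry_topic.append((notification, 1))
    while retry_topic:
        notification, attempt = retry_topic.popleft()
        if send(notification):
            delivered.append(notification)
        elif attempt < max_retries:
            retry_topic.append((notification, attempt + 1))
        else:
            dropped.append(notification)   # give up after max_retries retries
    return delivered, dropped

attempts = {}

def flaky_send(notification):
    attempts[notification] = attempts.get(notification, 0) + 1
    # "pager" fails on its first attempt only; everything else succeeds.
    return notification != "pager" or attempts[notification] > 1

delivered, dropped = process(["email", "pager"], flaky_send)
print(delivered, dropped)
```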

Kafka Message Schema

• JSON messages published/consumed to/from Kafka by Monasca microservices
• Well-defined schema is published at
  • https://wiki.openstack.org/wiki/Monasca/Message_Schema

Metrics

Create, query, and get statistics for metrics

• GET, POST /v2.0/metrics
• GET /v2.0/metrics/names
  • Returns the unique metric names
• GET /v2.0/metrics/dimensions/names
  • Returns the unique dimension names
• GET /v2.0/metrics/dimensions/names/values
  • Returns the unique dimension values

Measurements

GET /v2.0/metrics/measurements

• Returns a list of measurements
• Query parameters
  • Name and dimensions to filter by
  • start_time and end_time
  • offset and limit
  • merge_metrics: allows multiple metrics to be combined into a single list of measurements
  • group_by: list of columns to group the returned metrics by; allows multiple unique metrics to be returned in a single query

Statistics

GET /v2.0/metrics/statistics

• Query parameters
  • Name and dimensions to filter by
  • start_time and end_time
  • statistics: avg, min, max, sum, and count
  • period: the time period to aggregate measurements by
  • offset, limit
  • merge_metrics: allows multiple metrics to be combined into a single list of statistics
  • group_by: list of columns to group the returned metrics by; allows multiple unique metrics to be returned in a single query
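To show what the statistics and period parameters mean, the sketch below aggregates measurements into fixed periods locally. It only illustrates the computation; the `statistics` helper, the tuple layout of the measurements, and the shape of the returned rows are assumptions, not the API's response format.

```python
def statistics(measurements, period, functions=("avg", "min", "max", "sum", "count")):
    """Aggregate (timestamp_seconds, value) pairs into fixed-length periods."""
    buckets = {}
    for timestamp, value in measurements:
        # Each measurement belongs to the period that starts at or before it.
        buckets.setdefault(timestamp - timestamp % period, []).append(value)
    calc = {
        "avg": lambda values: sum(values) / len(values),
        "min": min,
        "max": max,
        "sum": sum,
        "count": len,
    }
    return [
        [start] + [calc[name](values) for name in functions]
        for start, values in sorted(buckets.items())
    ]

rows = statistics([(0, 1.0), (30, 3.0), (60, 5.0)], period=60,
                  functions=("avg", "count"))
print(rows)
```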

Metrics Names

GET /v2.0/metrics/names

• Returns a list of the unique metric names
• Query parameters
  • Dimensions
  • Offset, limit

Metric Dimension Names

GET /v2.0/metrics/dimensions/names

• Lists the dimension names
• Query parameters
  • Metric name
  • Offset, limit

Metric Dimension Values

GET /v2.0/metrics/dimensions/names/values

• Lists the dimension values
• Query parameters
  • Metric name
  • Dimension name
  • Offset, limit

Alarm Definitions

POST, GET /v2.0/alarm-definitions

• Alarm definitions are templates that are used to automatically and dynamically create alarms based on matching metric names and dimensions
• One alarm definition can result in zero or more alarms
• Simple grammar for creating compound alarm expressions
  • avg(cpu.user_perc) > 85 or avg(disk.read_ops{device=vda}, 120) > 1000
• Alarm states (OK, ALARM, and UNDETERMINED)
• Actions associated with alarms for state transitions
• User-assigned severity (LOW, MEDIUM, HIGH, CRITICAL)
• Thresholds can be dynamically adjusted via PATCH
• Minimal lifecycle management: alarm_lifecycle_state and link

List Alarms

GET /v2.0/alarms

Query parameters
• metric_name: name of metric to filter by
• metric_dimensions
• state: OK, ALARM, or UNDETERMINED
• severity: one or more severities to filter by, separated with |, e.g. severity=LOW|MEDIUM
• state_updated_start_time: the start time in ISO 8601 combined date and time format in UTC
• offset, limit
• sort_by
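A small sketch of how a client might assemble such a query string with the Python standard library. The base URL is a placeholder and the `alarms_url` helper is hypothetical. The `|` separator is left unescaped here for readability; percent-encoding it as `%7C` would be equally valid.

```python
from urllib.parse import urlencode

def alarms_url(base, **params):
    """Build a GET /v2.0/alarms URL; list values are joined with '|'."""
    query = {
        key: "|".join(value) if isinstance(value, (list, tuple)) else value
        for key, value in params.items()
    }
    # safe="|" keeps the severity separator readable in the resulting URL.
    return base + "/v2.0/alarms?" + urlencode(query, safe="|")

url = alarms_url(
    "http://monasca.example:8070",   # placeholder endpoint
    state="ALARM",
    severity=["LOW", "MEDIUM"],
    limit=10,
)
print(url)
```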

Alarms

GET, PUT, PATCH, DELETE /v2.0/alarms/{alarm-id}

• Alarms are created by the Threshold Engine based on matching alarm definitions
• When new nodes or components are deployed, alarms are automatically created
• Alarms are resources within Monasca; they have a resource ID and lifecycle
• By default, alarms have three states: OK, ALARM, and UNDETERMINED
  • The UNDETERMINED state occurs when metrics are no longer being received
• Deterministic alarms have two states: OK and ALARM
  • Used for systems where metrics are sporadic, e.g. creating metrics when errors occur in log files and no metrics when there aren't any errors
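The difference between the default and the deterministic behaviour can be sketched with a single threshold check. The `evaluate` function is hypothetical, and mapping "no metrics received" to OK for deterministic alarms is an assumption drawn from the two-state description above rather than from Monasca's source.

```python
def evaluate(values, threshold, deterministic=False):
    """Return the alarm state for the expression avg(values) > threshold."""
    if not values:
        # No metrics arrived in the evaluation window.
        # Assumption: a deterministic alarm treats missing data as OK.
        return "OK" if deterministic else "UNDETERMINED"
    average = sum(values) / len(values)
    return "ALARM" if average > threshold else "OK"

print(evaluate([90.0, 95.0], 85))             # metrics above the threshold
print(evaluate([10.0], 85))                   # metrics below the threshold
print(evaluate([], 85))                       # metrics stopped arriving
print(evaluate([], 85, deterministic=True))   # sporadic metric, none received
```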


What is Monitoring-as-a-Service

bull A Monitoring or Logging solution deployed as Software-as-a-Servicebull Eg CloudWatch Datadog New Relic Librato Loggly and many others

bull First-class preferably RESTful HTTP API

bull Authentication

bull Multi-tenancy

bull Provides self-provisioning to userstenants of the service

bull Designed to be highly reliable and operate at scale

bull Historically run by an operations team doing web services

What is OpenStack

bull OpenStack is a cloud operating system that controls large pools of compute storage and networking resources

bull Open-source alternative to AWS Microsoft Azure Google Cloud and other cloud services

bull Deployed in both public and private clouds

What is Monasca

bull Open-source MonitoringLogging-as-a-Service platform for OpenStackbull Authentication currently via OpenStack Identity Service (Keystone)

bull Microservices message-bus based architecture

bull First-class RESTful APIbull Push-based metricsbull Consolidates Operational Monitoring Monitoring-as-a-Service Metering amp

Billing and morebull Designed for elastic cloud environmentsdeploymentsbull High-availability clustering built-inbull Horizontally scalable and vertically 4 tieredlayered architecturebull Capable of long-term data retention to address metering SLA capacity

planning trend analysis post-hoc RCA and other use casesbull Extensible and Composable

The Log

bull The Log What every software engineer should know about real-time datas unifying abstraction

bull httpsengineeringlinkedincomdistributed-systemslog-what-every-software-engineer-should-know-about-real-time-datas-unifying

bull Log An append-only totally-ordered sequence of records ordered by time

From To

Monitoring Architecture

Kafka

bull A performant distributed durable publishsubscribe messaging and stream processing system

bull Metrics logs and events are published to topics in Kafka

bull Microservices register in a consumer group as a consumer

bull Microservices subscribe to topics and consume metricslogs and events

bull Messages are replicated per consumer group

bull Messages are load-balanced across all consumers in a consumer groupbull Can addremove micro-services to handle load or mitigate problemsbull As micro-services expandcontract the partitions are automatically re-balanced

bull At-least-once semantic guarantees on message delivery

bull Also used for domain events notification retry events periodic notifications grouping notifcations and other areas

bull Always accept data never drop data true elasticitybull Loggly httpswwwyoutubecomwatchv=LpNbjXFPyZ0

CQRS

bull Command Query Responsibility Segregation (CQRS)

bull CQRS involves splitting an application into two parts internally1 Command side ordering the system to update state

2 Query side that gets information without changing state

bull Advantagesbull Decouples the readwrite load Allows each to be scaled independently

bull Read store can be optimized for the query pattern of the application

bull Referencebull Event sourcing CQRS stream processing and Apache Kafka

bull httpswwwconfluentioblogevent-sourcing-cqrs-stream-processing-apache-kafka-whats-connection

Microservices

bull Microservices are small autonomous decoupled services that are deployed independenty and work together as a single application

bull Communication between services occurs via a network

bull Services need to be able to change independently of each other and be deployed by themselves without requiring consumers to change

bull Benefitsbull Resiliencebull Scalebull Ease of deploymentbull Organizational Alignmentbull Optimized for ChangeReplaceability

POST Metrics Sequence

Domain Events Sequence

Deployment Models (HAScale)

bull Many ways to deploy Monasca

bull Typically deployed in a clusteredHA configuration using three nodes or greater

bull If any node or microservice fails the cluster remains operational

bull Partitions in Kafka are redistributed among the remaining components

bull Preferably the database is run on a separate layer from the other componentsmicroservices

bull Note Monasca can also be deployed on a single-node non-clustered

bull Has also been containerized and run in Kubernetes

Metrics ModelPOST v20metrics

name http_statusdimensions

url httphostdomaincom1234servicecluster c1control_plane ccpservice compute

timestamp 0 milliseconds value 10value_meta

status_code 500msg Internal server error

bull Simple concise multi-dimensional flexible descriptionbull Name (string)bull Dimensions Dictionary of user-defined (key value)

pairs that are used to uniquely identify a metric

bull Optional dictionary of user-defined (key value) pairs that can be used to describe a measurement

bull Normally used for errors and messages

Push vs Pull

bull Monitoring-as-a-Servicebull Cant always pull due to firewalls and network issues

bull Low-latency sub-second latency difficult for pull model

bull Doesnt require service discovery and registrationbull As entities are deployed they can start sending metrics without have to be

discovered or registered

bull Events

bull Temporary cachingbuffering of metricsevents while service unreachable

Monasca API

bull Primary point for pushing metrics and handling queries

bull Authenticates all requests against the Keystone identity servicebull Note auth tokens are cached to reduce the load on Keystone

bull Resources Metrics Alarm Definitions Alarms and Notification Methods

bull API Specificationbull httpsgithubcomopenstackmonasca-apitreemasterdocs

bull Horizontally scalable

bull Publishes metrics to Kafka

bull Queries timeseries DB for measurements and statistics

bull Queries Config DB for alarms alarm definitions and notification methods

Persister

bull Consumes both metrics and alarm state transition events from Kafka

bull Stores temporarily in-memory and does batch writes to the TSDB based on batch size or time to optimize write performance

bull At-least once message delivery semanticsbull No metrics or alarm state transition events are lostbull The Kafka consumer offset for each batch is only updated after successfully storing

the metric or alarm state transition eventbull Note duplicates are possible

bull HAfault-tolerancebull Multiple persisters run simultaneously and balance loadbull If a persister fails the load is automatically re-balanced across the remaining

persisters

Time Series Databases

bull Used for storingbull Metricsbull Alarm state history

bull Two databases supported1 Vertica

bull Enterprise class proprietary closed-source clustered HA analytics databasebull Excels at time-series

2 InfluxDBbull Open-source single-node time-series DBbull Clustering is closed-sourcebull Note can replicate to multiple instances of InfluxDB using Kafka

bull Investigating support for additional databases

Config Database

bull Stores all transactional data for Monasca such asbull Alarm Definitions

bull Alarms

bull Notification Methods

bull MySQL and Postgres supported

bull Typically deployed in a clustered or HA configuration

Threshold Engine

bull Near real-time stream processing clustered and highly available threshold engine

bull Based on Apache Storm

bull Consumes metrics from Kafka

bull Creates alarms based on metrics that match patterns specified in the alarm definition

bull Evaluates whether metrics exceed threshold

bull Publishes alarm state transition events to Kafka

bull Supports both simple and compound alarm expressions

Notification Engine

bull Consumes alarm state transition events from Kafka produced by the Threshold Engine

bull Evaluates whether notifications should be sent based on actions specified in the alarm definition

bull OK ALARM and UNDETERMINED actions

bull Supports email PagerDuty webhooks HipChat Slack and JIRAbull Dynamic plugins supportedbull Supports both one-shot and periodic notificationsbull If sending to the notification address fails then notification is published to

retry topic in Kafka and retried laterbull Grouping notifications In progress

Kafka Message Schema

bull JSON messages publishedconsumed tofrom Kafka by Monasca micro-services

bull Well-defined schema is published atbull httpswikiopenstackorgwikiMonascaMessage_Schema

Metrics

Create query and get statistics for metrics

bull GET POST v20metrics

bull GET v20metricsnamesbull Returns the unique metric names

bull GET v20metricsdimensionnamesbull Returns the unique dimension names

bull GET v20metricsdimensionnamesvaluesbull Returns the unique dimension values

Measurements

GET v20metricsmeasurements

bull Returns a list of measurements

bull Query parametersbull Name and dimensions to filter by

bull Start_time and end_time

bull Offset and limit

bull merge_metrics allow multiple metrics to be combined into a single list of measurements

bull group_by list of columns to group the metrics to be returned Allows multiple unique metrics to be returned in a single query

Statistics

GET v20metricsstatistics

bull Query parametersbull Name and dimensions to filter bybull Start_time and end_timebull Statistics avg min max sum and countbull Period The time period to aggregate measurements bybull Offset limitbull merge_metrics allow multiple metrics to be combined into a single list

of statisticsbull group_by list of columns to group the metrics to be returned Allows

multiple unique metrics to be returned in a single query

Metric Names

GET /v2.0/metrics/names

• Returns a list of the unique metric names
• Query parameters
  • dimensions
  • offset, limit

Metric Dimension Names

GET /v2.0/metrics/dimensions/names

• Lists the dimension names
• Query parameters
  • Metric name
  • offset, limit

Metric Dimension Values

GET /v2.0/metrics/dimensions/names/values

• Lists the dimension values
• Query parameters
  • Metric name
  • Dimension name
  • offset, limit

Alarm Definitions

POST, GET /v2.0/alarm-definitions

• Alarm definitions are templates that are used to automatically and dynamically create alarms based on matching metric names and dimensions
• One alarm definition can result in zero or more alarms
• Simple grammar for creating compound alarm expressions
  • avg(cpu.user_perc) > 85 or avg(disk.read_ops{device=vda}, 120) > 1000
• Alarm states (OK, ALARM, and UNDETERMINED)
• Actions associated with alarms for state transitions
• User-assigned severity (LOW, MEDIUM, HIGH, CRITICAL)
• Thresholds can be dynamically adjusted via PATCH
• Minimal lifecycle management: alarm_lifecycle_state and link
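The statement that one alarm definition can produce zero or more alarms can be sketched as follows. The dictionary layout of the definition, including the match_by list that decides which dimensions split metrics into separate alarms, is an assumption for illustration and not the Threshold Engine's actual data model.

```python
def alarms_for_definition(definition, metrics):
    """Return the metrics grouped into alarms, one alarm per match_by key."""
    alarms = {}
    for metric in metrics:
        if metric["name"] != definition["metric_name"]:
            continue
        dims = metric["dimensions"]
        # Every dimension named in the definition must match exactly.
        if any(dims.get(k) != v for k, v in definition["dimensions"].items()):
            continue
        key = tuple(dims.get(k) for k in definition["match_by"])
        alarms.setdefault(key, []).append(metric)
    return alarms


definition = {  # hypothetical definition template
    "metric_name": "cpu.user_perc",
    "dimensions": {"service": "compute"},
    "match_by": ["hostname"],
}
metrics = [
    {"name": "cpu.user_perc", "dimensions": {"service": "compute", "hostname": "node1"}},
    {"name": "cpu.user_perc", "dimensions": {"service": "compute", "hostname": "node2"}},
    {"name": "cpu.user_perc", "dimensions": {"service": "storage", "hostname": "node3"}},
]
print(sorted(alarms_for_definition(definition, metrics)))
```

With no matching metrics the result is empty, and each newly deployed host that reports a matching metric adds one more alarm without any change to the definition.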

List Alarms

GET /v2.0/alarms

Query parameters
• metric_name: name of the metric to filter by
• metric_dimensions
• state: OK, ALARM, or UNDETERMINED
• severity: one or more severities to filter by, separated with |, e.g. severity=LOW|MEDIUM
• state_updated_start_time: the start time in ISO 8601 combined date and time format, in UTC
• offset, limit
• sort_by

Alarms

GET, PUT, PATCH, DELETE /v2.0/alarms/{alarm-id}

• Alarms are created by the Threshold Engine based on matching alarm definitions
• When new nodes or components are deployed, alarms are automatically created
• Alarms are resources within Monasca; they have a resource ID and a lifecycle
• By default, alarms have three states: OK, ALARM, and UNDETERMINED
  • The UNDETERMINED state occurs when metrics are no longer being received
• Deterministic alarms have two states: OK and ALARM
  • Used for systems where metrics are sporadic, e.g. metrics that are created when errors occur in log files and are absent when there aren't any errors
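The difference between default and deterministic alarms comes down to how missing data is treated. The sketch below is a simplified model, assuming a single avg-over-threshold expression and assuming that a deterministic alarm with no data evaluates to OK; the function name and signature are hypothetical.

```python
def evaluate_state(values, threshold, deterministic=False):
    """Return the alarm state for the values seen in the current window."""
    if not values:
        # No metrics received: default alarms become UNDETERMINED,
        # deterministic alarms are assumed to report OK.
        return "OK" if deterministic else "UNDETERMINED"
    average = sum(values) / len(values)
    return "ALARM" if average > threshold else "OK"


print(evaluate_state([90.0, 95.0], threshold=85))
print(evaluate_state([], threshold=85))
print(evaluate_state([], threshold=0, deterministic=True))
```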

Alarm Counts

GET /v2.0/alarms/count

• Query the total number of alarms in the OK, ALARM, or UNDETERMINED state and their severities, grouped by metric dimension (such as OpenStack service), state, and severity
• Used for summary dashboards

Example: Helion Ops Console

Alarm History

GET /v2.0/alarms/state-history

• Lists the alarm state history for alarms
• Query parameters
  • Dimensions to filter on
  • Start/end timestamp
  • offset, limit

GET /v2.0/alarms/{alarm-id}/state-history

• Lists the alarm state history for a specific alarm

Notification Methods

POST, GET, DELETE /v2.0/notification-methods

Notification methods are associated with actions in alarm definitions.

Example

POST /v2.0/notification-methods

{
  "name": "Name of notification method",
  "type": "EMAIL",
  "address": "john.doe@hp.com"
}

Monasca Agent

• System metrics (cpu, memory, network, filesystem, …)
• Service metrics
  • MySQL, Kafka, and many others
• Application metrics
  • Built-in Statsd daemon
  • Python monasca-statsd library: adds support for dimensions
• VM system metrics
• Open vSwitch metrics
• Active checks
  • HTTP status checks and response times
  • System up/down checks (ping and ssh)
• Runs any Nagios plugin or check_mk
• Extensible/pluggable: additional services can be easily added

Agent details

• The Agent Forwarder buffers metrics for a short time to increase the size of the HTTP request body (number of metrics) sent to the Monasca API
• The Agent requests an auth token from the Keystone Identity service, which is supplied on all requests
• The Monasca Agent and API cache auth tokens in memory to reduce the round-trip authorization requests to Keystone
• If network connectivity between the Agent and API is lost, the Agent buffers metrics and sends them when connectivity is restored
• Metrics are submitted using an "agent" role, which only allows metrics to be POSTed to the metrics endpoint
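The forwarder behaviour described above (batch metrics before sending, keep them when the API is unreachable) can be sketched like this. The class, its parameters, and the use of ConnectionError to signal an unreachable API are assumptions for illustration; the real agent also flushes on a timer, which is omitted here.

```python
class Forwarder:
    """Buffers metrics and sends them in batches through a send callable."""

    def __init__(self, send, max_batch=3):
        self._send = send          # callable that POSTs a list of metrics
        self._max_batch = max_batch
        self._buffer = []

    def submit(self, metric):
        self._buffer.append(metric)
        if len(self._buffer) >= self._max_batch:
            self.flush()

    def flush(self):
        if not self._buffer:
            return
        try:
            self._send(list(self._buffer))
        except ConnectionError:
            # API unreachable: keep the metrics and retry on the next flush.
            return
        self._buffer.clear()


sent_batches = []
forwarder = Forwarder(sent_batches.append, max_batch=2)
forwarder.submit({"name": "cpu.user_perc", "value": 12.0})
forwarder.submit({"name": "cpu.user_perc", "value": 14.0})
print(len(sent_batches))
```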

Grafana/Monasca Integration

• Datasource: a datasource that can be added to the Grafana dashboard to enable Monasca
  • https://github.com/openstack/monasca-grafana-datasource
• Keystone authentication
  • https://github.com/twc-openstack/grafana
• Support for Alerting will be added in Grafana 4

Grafana Monasca Data Source

Logging Architecture

Logging API

• POST /v3.0/logs
• Batch log messages in a single HTTP request
• Global, local, and mixed dimensions
  • Similar to dimensions in metrics
• JSON only
• Specification
  • https://github.com/openstack/monasca-log-api/blob/master/docs/monasca-log-api-spec.md
• Queries are not done via the API but via a tenantized version of Kibana
  • https://github.com/FujitsuEnablingSoftwareTechnologyGmbH/fts-keystone

Log Model

{
  "dimensions": {
    "hostname": "devstack",
    "service": "monitoring",
    "component": "monasca-api"
  },
  "logs": [
    {
      "message": "msg1",
      "dimensions": {
        "service": "compute",
        "component": "nova-api",
        "path": "/var/log/mysql.log"
      }
    },
    {
      "message": "msg2",
      "dimensions": {
        "path": "/var/log/monasca/monasca-api.log"
      }
    }
  ]
}
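One way to read the log model is that each log entry ends up with the global dimensions plus its own local ones. The sketch below assumes that a local dimension overrides a global dimension of the same name; that precedence rule and the function name are assumptions, so check the log API specification before relying on them.

```python
def effective_dimensions(payload):
    """Merge global dimensions into each log entry (local values win)."""
    global_dims = payload.get("dimensions", {})
    return [
        {**global_dims, **log.get("dimensions", {})}
        for log in payload["logs"]
    ]


payload = {
    "dimensions": {"hostname": "devstack", "service": "monitoring",
                   "component": "monasca-api"},
    "logs": [
        {"message": "msg1",
         "dimensions": {"service": "compute", "component": "nova-api",
                        "path": "/var/log/mysql.log"}},
        {"message": "msg2",
         "dimensions": {"path": "/var/log/monasca/monasca-api.log"}},
    ],
}
for dims in effective_dimensions(payload):
    print(dims)
```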

Log Agents

• Logstash
  • https://github.com/logstash-plugins/logstash-output-monasca_log_api/pull/1
• Beaver
  • https://github.com/python-beaver/python-beaver/pull/406
• Logspout: under investigation

Kibana Integration

• Keystone authentication support for Kibana
• Authentication plugin
  • https://github.com/FujitsuEnablingSoftwareTechnologyGmbH/fts-keystone
• Note: in the process of moving to the official OpenStack repo

Composability: Logging/Metrics

Transform and Analytics Engine

Monasca Transform

• A new micro-service in Monasca that aggregates and transforms metrics
• Currently based on Apache Spark Streaming
• Use cases
  • Object Storage Disk Capacity
  • Object Storage Capacity
  • Compute Host Capacity
  • VM Capacity
  • More to come
• Metrics are aggregated and published every hour
• Currently in deployment in HPE Helion OpenStack 4.0
• OpenStack project/repo
  • https://github.com/openstack/monasca-transform

Monasca Analytics

• A framework that adds data science tools (parsers, algorithms, etc.)
• Features include
  • Algorithmic flow definition, enabling sharing of complex algorithmic recipes
  • Thin orchestration layer that instantiates an execution environment
• Focused on
  • Anomaly detection
  • Reducing alert fatigue via alarm clustering (unsupervised machine learning)
• Example algorithms: One-Class SVM and LiNGAM
• Status: under development
• OpenStack project/repo
  • https://github.com/openstack/monasca-analytics

Distributions & Deployments

• Charter Communications
  • Monasca and Grafana are currently deployed in a production private cloud
  • Monitoring-as-a-Service use cases are supported, with Grafana as the visualization dashboard
  • 2 datacenters, 600-700 compute nodes, 1000 VMs, 11000 metrics/sec
• FIWARE Lab
  • http://superuser.openstack.org/articles/monitoring-a-multi-region-cloud-based-on-openstack
• Hewlett Packard Enterprise Cloud System, Helion OpenStack
  • Supported and tested up to 65K metrics/sec ingest rates
• Fujitsu
  • FUJITSU Software ServerView Cloud Monitoring Manager
• NEC
  • Planning to include Monasca in the Cloud Solution Menus solution
• Others

Statistics: Mitaka/Newton Release

• Organizations: 31
• Contributors: 97
• Commits: 1075
• Reviews: 4080
• Lines of code: 215370

Ecosystem

• Hewlett Packard Enterprise
• Fujitsu
• Charter Communications
• NEC
• Cisco
• Cloudbase Solutions
• SUSE
• SolidFire
• SAP
• Cray Inc
• FIWARE Lab
• Mirantis
• Broadcom

Containers and Kubernetes

• New Monasca Agent plugins
  • Docker plugin
  • cAdvisor plugin
  • Kubernetes plugin: monitors both the Kubernetes control plane and containers
  • Prometheus client plugin: scrapes apps
  • Mesos plugin
• Containerization of Monasca
• Heapster Monasca data sink

Next Steps

• Containerizing Monasca
• Monitoring containers and container managers such as Kubernetes
• Grouping notifications


The Log

• The Log: What every software engineer should know about real-time data's unifying abstraction
  • https://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying
• Log: an append-only, totally-ordered sequence of records ordered by time

From To

Monitoring Architecture

Kafka

• A performant, distributed, durable publish/subscribe messaging and stream processing system
• Metrics, logs, and events are published to topics in Kafka
• Microservices register in a consumer group as a consumer
• Microservices subscribe to topics and consume metrics, logs, and events
• Messages are replicated per consumer group
• Messages are load-balanced across all consumers in a consumer group
  • Can add/remove micro-services to handle load or mitigate problems
  • As micro-services expand/contract, the partitions are automatically re-balanced
• At-least-once semantic guarantees on message delivery
• Also used for domain events, notification retry events, periodic notifications, grouping notifications, and other areas
• Always accept data, never drop data: true elasticity
  • Loggly: https://www.youtube.com/watch?v=LpNbjXFPyZ0
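The load-balancing and re-balancing behaviour of a consumer group can be pictured with a toy partition assignment. This is a simplified round-robin sketch written for illustration only; it is not Kafka's actual assignor, and the consumer names are made up.

```python
def assign(partitions, consumers):
    """Spread partitions across the consumers of one group, round-robin."""
    members = sorted(consumers)
    if not members:
        raise ValueError("a consumer group needs at least one consumer")
    assignment = {member: [] for member in members}
    for index, partition in enumerate(sorted(partitions)):
        assignment[members[index % len(members)]].append(partition)
    return assignment


partitions = range(6)
before = assign(partitions, ["persister-a", "persister-b", "persister-c"])
# persister-c fails: its partitions are redistributed among the survivors.
after = assign(partitions, ["persister-a", "persister-b"])
print(before)
print(after)
```

Every partition always has exactly one owner within the group, which is why adding or removing a micro-service instance changes throughput without losing messages.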

CQRS

• Command Query Responsibility Segregation (CQRS)
• CQRS involves splitting an application into two parts internally
  1. Command side: orders the system to update state
  2. Query side: gets information without changing state
• Advantages
  • Decouples the read/write load, allowing each to be scaled independently
  • Read store can be optimized for the query pattern of the application
• Reference
  • Event sourcing, CQRS, stream processing and Apache Kafka
  • https://www.confluent.io/blog/event-sourcing-cqrs-stream-processing-apache-kafka-whats-connection

Microservices

• Microservices are small, autonomous, decoupled services that are deployed independently and work together as a single application
• Communication between services occurs via a network
• Services need to be able to change independently of each other and be deployed by themselves without requiring consumers to change
• Benefits
  • Resilience
  • Scale
  • Ease of deployment
  • Organizational alignment
  • Optimized for change/replaceability

POST Metrics Sequence

Domain Events Sequence

Deployment Models (HA/Scale)

• Many ways to deploy Monasca
• Typically deployed in a clustered/HA configuration using three or more nodes
• If any node or microservice fails, the cluster remains operational
• Partitions in Kafka are redistributed among the remaining components
• Preferably, the database is run on a separate layer from the other components/microservices
• Note: Monasca can also be deployed on a single node, non-clustered
• Has also been containerized and run in Kubernetes

Metrics Model

POST /v2.0/metrics

{
  name: http_status,
  dimensions: {
    url: http://host.domain.com:1234/service,
    cluster: c1,
    control_plane: ccp,
    service: compute
  },
  timestamp: 0, /* milliseconds */
  value: 1.0,
  value_meta: {
    status_code: 500,
    msg: Internal server error
  }
}

• Simple, concise, multi-dimensional, flexible description
• Name (string)
• Dimensions: dictionary of user-defined (key, value) pairs that are used to uniquely identify a metric
• Value meta: optional dictionary of user-defined (key, value) pairs that can be used to describe a measurement
  • Normally used for errors and messages
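A minimal sketch of assembling a request body that follows this metrics model. The helper name is hypothetical and the example values (URL, cluster, status code) are illustrative; only the field names name, dimensions, timestamp, value, and value_meta come from the slide.

```python
import json
import time


def make_metric(name, dimensions, value, value_meta=None, timestamp_ms=None):
    """Build one metric in the shape shown on the slide."""
    metric = {
        "name": name,
        "dimensions": dict(dimensions),
        # Timestamps are in milliseconds since the epoch.
        "timestamp": int(time.time() * 1000) if timestamp_ms is None else timestamp_ms,
        "value": float(value),
    }
    if value_meta:
        # Optional free-form description of this one measurement.
        metric["value_meta"] = dict(value_meta)
    return metric


metric = make_metric(
    "http_status",
    {"url": "http://host.domain.com:1234/service", "cluster": "c1",
     "control_plane": "ccp", "service": "compute"},
    value=1,
    value_meta={"status_code": 500, "msg": "Internal server error"},
    timestamp_ms=0,
)
print(json.dumps(metric, indent=2))
```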

Push vs Pull

• Monitoring-as-a-Service
  • Can't always pull due to firewalls and network issues
• Low latency: sub-second latency is difficult for a pull model
• Doesn't require service discovery and registration
  • As entities are deployed, they can start sending metrics without having to be discovered or registered
• Events
• Temporary caching/buffering of metrics/events while the service is unreachable

Monasca API

• Primary point for pushing metrics and handling queries
• Authenticates all requests against the Keystone identity service
  • Note: auth tokens are cached to reduce the load on Keystone
• Resources: Metrics, Alarm Definitions, Alarms, and Notification Methods
• API specification
  • https://github.com/openstack/monasca-api/tree/master/docs
• Horizontally scalable
• Publishes metrics to Kafka
• Queries the time-series DB for measurements and statistics
• Queries the Config DB for alarms, alarm definitions, and notification methods

Persister

• Consumes both metrics and alarm state transition events from Kafka
• Stores them temporarily in memory and does batch writes to the TSDB, based on batch size or time, to optimize write performance
• At-least-once message delivery semantics
  • No metrics or alarm state transition events are lost
  • The Kafka consumer offset for each batch is only updated after successfully storing the metric or alarm state transition event
  • Note: duplicates are possible
• HA/fault-tolerance
  • Multiple persisters run simultaneously and balance load
  • If a persister fails, the load is automatically re-balanced across the remaining persisters
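The at-least-once behaviour follows from the order of operations: write the batch first, commit the offset second. The sketch below shows that ordering with hypothetical write_batch and commit callables standing in for the TSDB client and the Kafka consumer; the time-based flush mentioned above is omitted for brevity.

```python
class Persister:
    """Batches messages and commits the offset only after a successful write."""

    def __init__(self, write_batch, commit, batch_size=2):
        self._write_batch = write_batch  # stores a list of messages in the TSDB
        self._commit = commit            # records the next offset to consume
        self._batch_size = batch_size
        self._batch = []
        self._last_offset = None

    def handle(self, offset, message):
        self._batch.append(message)
        self._last_offset = offset
        if len(self._batch) >= self._batch_size:
            self.flush()

    def flush(self):
        if not self._batch:
            return
        # If the write raises, the offset stays uncommitted, so the batch is
        # re-read after a restart: nothing is lost, duplicates are possible.
        self._write_batch(list(self._batch))
        self._commit(self._last_offset + 1)
        self._batch.clear()


stored, committed = [], []
persister = Persister(stored.append, committed.append, batch_size=2)
persister.handle(0, "metric-0")
persister.handle(1, "metric-1")
print(stored, committed)
```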

Time Series Databases

• Used for storing
  • Metrics
  • Alarm state history
• Two databases supported
  1. Vertica
     • Enterprise-class, proprietary, closed-source, clustered, HA analytics database
     • Excels at time series
  2. InfluxDB
     • Open-source, single-node time-series DB
     • Clustering is closed-source
     • Note: can replicate to multiple instances of InfluxDB using Kafka
• Investigating support for additional databases

Config Database

• Stores all transactional data for Monasca, such as
  • Alarm Definitions
  • Alarms
  • Notification Methods
• MySQL and Postgres supported
• Typically deployed in a clustered or HA configuration

Threshold Engine

• Near-real-time, stream-processing, clustered, and highly available threshold engine
• Based on Apache Storm
• Consumes metrics from Kafka
• Creates alarms based on metrics that match patterns specified in the alarm definition
• Evaluates whether metrics exceed the threshold
• Publishes alarm state transition events to Kafka
• Supports both simple and compound alarm expressions
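To illustrate the difference between simple and compound expressions, the sketch below evaluates two sub-expressions and combines them with a logical operator. It does not parse the alarm expression grammar; the function names and the sample values are assumptions for illustration only.

```python
import operator

REDUCERS = {
    "avg": lambda values: sum(values) / len(values),
    "min": min,
    "max": max,
    "sum": sum,
    "count": len,
}
COMPARATORS = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
}


def sub_expression(function, values, comparator, threshold):
    """Evaluate one simple expression such as avg(values) > threshold."""
    return COMPARATORS[comparator](REDUCERS[function](values), threshold)


def combine(results, combinator="or"):
    """Combine sub-expression results into one compound result."""
    return any(results) if combinator == "or" else all(results)


cpu_high = sub_expression("avg", [90.0, 80.0, 91.0], ">", 85)   # average is 87.0
disk_busy = sub_expression("avg", [200.0, 400.0], ">", 1000)    # average is 300.0
state = "ALARM" if combine([cpu_high, disk_busy], "or") else "OK"
print(state)
```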


Page 6: Monasca - NETWAYS...... What every software engineer should know about real-time data's unifying ... systems/log-what-every-software-engineer-should-know-about

What is Monasca

bull Open-source MonitoringLogging-as-a-Service platform for OpenStackbull Authentication currently via OpenStack Identity Service (Keystone)

bull Microservices message-bus based architecture

bull First-class RESTful APIbull Push-based metricsbull Consolidates Operational Monitoring Monitoring-as-a-Service Metering amp

Billing and morebull Designed for elastic cloud environmentsdeploymentsbull High-availability clustering built-inbull Horizontally scalable and vertically 4 tieredlayered architecturebull Capable of long-term data retention to address metering SLA capacity

planning trend analysis post-hoc RCA and other use casesbull Extensible and Composable

The Log

bull The Log What every software engineer should know about real-time datas unifying abstraction

bull httpsengineeringlinkedincomdistributed-systemslog-what-every-software-engineer-should-know-about-real-time-datas-unifying

bull Log An append-only totally-ordered sequence of records ordered by time

From To

Monitoring Architecture

Kafka

bull A performant distributed durable publishsubscribe messaging and stream processing system

bull Metrics logs and events are published to topics in Kafka

bull Microservices register in a consumer group as a consumer

bull Microservices subscribe to topics and consume metricslogs and events

bull Messages are replicated per consumer group

bull Messages are load-balanced across all consumers in a consumer groupbull Can addremove micro-services to handle load or mitigate problemsbull As micro-services expandcontract the partitions are automatically re-balanced

bull At-least-once semantic guarantees on message delivery

bull Also used for domain events notification retry events periodic notifications grouping notifcations and other areas

bull Always accept data never drop data true elasticitybull Loggly httpswwwyoutubecomwatchv=LpNbjXFPyZ0

CQRS

bull Command Query Responsibility Segregation (CQRS)

bull CQRS involves splitting an application into two parts internally1 Command side ordering the system to update state

2 Query side that gets information without changing state

bull Advantagesbull Decouples the readwrite load Allows each to be scaled independently

bull Read store can be optimized for the query pattern of the application

bull Referencebull Event sourcing CQRS stream processing and Apache Kafka

bull httpswwwconfluentioblogevent-sourcing-cqrs-stream-processing-apache-kafka-whats-connection

Microservices

bull Microservices are small autonomous decoupled services that are deployed independenty and work together as a single application

bull Communication between services occurs via a network

bull Services need to be able to change independently of each other and be deployed by themselves without requiring consumers to change

bull Benefitsbull Resiliencebull Scalebull Ease of deploymentbull Organizational Alignmentbull Optimized for ChangeReplaceability

POST Metrics Sequence

Domain Events Sequence

Deployment Models (HAScale)

bull Many ways to deploy Monasca

bull Typically deployed in a clusteredHA configuration using three nodes or greater

bull If any node or microservice fails the cluster remains operational

bull Partitions in Kafka are redistributed among the remaining components

bull Preferably the database is run on a separate layer from the other componentsmicroservices

bull Note Monasca can also be deployed on a single-node non-clustered

bull Has also been containerized and run in Kubernetes

Metrics ModelPOST v20metrics

name http_statusdimensions

url httphostdomaincom1234servicecluster c1control_plane ccpservice compute

timestamp 0 milliseconds value 10value_meta

status_code 500msg Internal server error

bull Simple concise multi-dimensional flexible descriptionbull Name (string)bull Dimensions Dictionary of user-defined (key value)

pairs that are used to uniquely identify a metric

bull Optional dictionary of user-defined (key value) pairs that can be used to describe a measurement

bull Normally used for errors and messages

Push vs Pull

bull Monitoring-as-a-Servicebull Cant always pull due to firewalls and network issues

bull Low-latency sub-second latency difficult for pull model

bull Doesnt require service discovery and registrationbull As entities are deployed they can start sending metrics without have to be

discovered or registered

bull Events

bull Temporary cachingbuffering of metricsevents while service unreachable

Monasca API

bull Primary point for pushing metrics and handling queries

bull Authenticates all requests against the Keystone identity servicebull Note auth tokens are cached to reduce the load on Keystone

bull Resources Metrics Alarm Definitions Alarms and Notification Methods

bull API Specificationbull httpsgithubcomopenstackmonasca-apitreemasterdocs

bull Horizontally scalable

bull Publishes metrics to Kafka

bull Queries timeseries DB for measurements and statistics

bull Queries Config DB for alarms alarm definitions and notification methods

Persister

bull Consumes both metrics and alarm state transition events from Kafka

bull Stores temporarily in-memory and does batch writes to the TSDB based on batch size or time to optimize write performance

bull At-least once message delivery semanticsbull No metrics or alarm state transition events are lostbull The Kafka consumer offset for each batch is only updated after successfully storing

the metric or alarm state transition eventbull Note duplicates are possible

bull HAfault-tolerancebull Multiple persisters run simultaneously and balance loadbull If a persister fails the load is automatically re-balanced across the remaining

persisters

Time Series Databases

bull Used for storingbull Metricsbull Alarm state history

bull Two databases supported1 Vertica

bull Enterprise class proprietary closed-source clustered HA analytics databasebull Excels at time-series

2 InfluxDBbull Open-source single-node time-series DBbull Clustering is closed-sourcebull Note can replicate to multiple instances of InfluxDB using Kafka

bull Investigating support for additional databases

Config Database

bull Stores all transactional data for Monasca such asbull Alarm Definitions

bull Alarms

bull Notification Methods

bull MySQL and Postgres supported

bull Typically deployed in a clustered or HA configuration

Threshold Engine

bull Near real-time stream processing clustered and highly available threshold engine

bull Based on Apache Storm

bull Consumes metrics from Kafka

bull Creates alarms based on metrics that match patterns specified in the alarm definition

bull Evaluates whether metrics exceed threshold

bull Publishes alarm state transition events to Kafka

bull Supports both simple and compound alarm expressions

Notification Engine

• Consumes alarm state transition events from Kafka produced by the Threshold Engine
• Evaluates whether notifications should be sent based on actions specified in the alarm definition
• OK, ALARM, and UNDETERMINED actions
• Supports email, PagerDuty, webhooks, HipChat, Slack, and JIRA
• Dynamic plugins supported
• Supports both one-shot and periodic notifications
• If sending to the notification address fails, the notification is published to a retry topic in Kafka and retried later
• Grouping notifications: in progress
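
The retry behaviour can be illustrated with a small sketch. It is not the Notification Engine's code: the `deque` stands in for the Kafka retry topic, and `deliver`, `retry_count`, and `max_retries` are hypothetical names chosen for the example.

```python
from collections import deque

retry_topic = deque()  # stands in for the Kafka retry topic


def deliver(notification, send, max_retries=3):
    """Try to send; on failure, queue the notification for a later retry."""
    try:
        send(notification)
        return True
    except ConnectionError:
        attempts = notification.get("retry_count", 0) + 1
        if attempts <= max_retries:  # assumed cap; the source does not state one
            retry_topic.append({**notification, "retry_count": attempts})
        return False


def failing_send(notification):
    raise ConnectionError("notification address unreachable")


sent = []
ok = deliver({"alarm_id": "a1", "state": "ALARM"}, sent.append)
failed = deliver({"alarm_id": "a2", "state": "ALARM"}, failing_send)
```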

Kafka Message Schema

• JSON messages published/consumed to/from Kafka by Monasca micro-services
• Well-defined schema is published at
  • https://wiki.openstack.org/wiki/Monasca/Message_Schema

Metrics

Create, query, and get statistics for metrics

• GET, POST /v2.0/metrics
• GET /v2.0/metrics/names
  • Returns the unique metric names
• GET /v2.0/metrics/dimensions/names
  • Returns the unique dimension names
• GET /v2.0/metrics/dimensions/names/values
  • Returns the unique dimension values

Measurements

GET /v2.0/metrics/measurements

• Returns a list of measurements
• Query parameters
  • name and dimensions to filter by
  • start_time and end_time
  • offset and limit
  • merge_metrics: allows multiple metrics to be combined into a single list of measurements
  • group_by: list of columns to group the returned metrics by. Allows multiple unique metrics to be returned in a single query
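
As a rough illustration of how these query parameters combine, the sketch below only builds a request URL. The host and port in `BASE_URL` and the helper `measurements_url` are assumptions made for the example; dimensions are written as comma-separated `key:value` pairs, which is how the Monasca API specification describes them.

```python
from urllib.parse import urlencode

BASE_URL = "http://monasca.example.com:8070"  # hypothetical endpoint


def measurements_url(name, dimensions, start_time, end_time=None, limit=None,
                     merge_metrics=False):
    """Build a GET /v2.0/metrics/measurements URL from the query parameters."""
    params = {"name": name, "start_time": start_time}
    if dimensions:
        params["dimensions"] = ",".join(
            f"{key}:{value}" for key, value in sorted(dimensions.items()))
    if end_time:
        params["end_time"] = end_time
    if limit is not None:
        params["limit"] = limit
    if merge_metrics:
        params["merge_metrics"] = "true"
    return f"{BASE_URL}/v2.0/metrics/measurements?{urlencode(params)}"


url = measurements_url("cpu.user_perc", {"hostname": "devstack"},
                       "2016-10-01T00:00:00Z", limit=100)
```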

Statistics

GET /v2.0/metrics/statistics

• Query parameters
  • name and dimensions to filter by
  • start_time and end_time
  • statistics: avg, min, max, sum, and count
  • period: the time period to aggregate measurements by
  • offset, limit
  • merge_metrics: allows multiple metrics to be combined into a single list of statistics
  • group_by: list of columns to group the returned metrics by. Allows multiple unique metrics to be returned in a single query
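
To make the role of the period parameter concrete, here is a small sketch of aggregating measurements into fixed periods. It illustrates the idea only and is not how the API or the time-series database computes statistics; `statistics_by_period` and the second-based timestamps are assumptions.

```python
from collections import defaultdict


def statistics_by_period(measurements, period_s):
    """Group (timestamp_s, value) pairs into fixed periods and aggregate each."""
    buckets = defaultdict(list)
    for timestamp_s, value in measurements:
        buckets[timestamp_s - timestamp_s % period_s].append(value)
    return {
        start: {
            "avg": sum(values) / len(values),
            "min": min(values),
            "max": max(values),
            "sum": sum(values),
            "count": len(values),
        }
        for start, values in sorted(buckets.items())
    }


stats = statistics_by_period([(0, 1.0), (30, 3.0), (60, 10.0)], period_s=60)
```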

Metric Names

GET /v2.0/metrics/names

• Returns a list of the unique metric names
• Query parameters
  • dimensions
  • offset, limit

Metric Dimension Names

GET /v2.0/metrics/dimensions/names

• Lists the dimension names
• Query parameters
  • Metric name
  • offset, limit

Metric Dimension Values

GET /v2.0/metrics/dimensions/names/values

• Lists the dimension values
• Query parameters
  • Metric name
  • Dimension name
  • offset, limit

Alarm Definitions

POST, GET /v2.0/alarm-definitions

• Alarm definitions are templates that are used to automatically and dynamically create alarms based on matching metric names and dimensions
• One alarm definition can result in zero or more alarms
• Simple grammar for creating compound alarm expressions
  • avg(cpu.user_perc) > 85 or avg(disk.read_ops{device=vda}, 120) > 1000
• Alarm states (OK, ALARM, and UNDETERMINED)
• Actions associated with alarms for state transitions
• User-assigned severity (LOW, MEDIUM, HIGH, CRITICAL)
• Thresholds can be dynamically adjusted via PATCH
• Minimal lifecycle management: alarm_lifecycle_state and link
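
The compound expression from the slide can be read as two sub-expressions joined by "or". The sketch below evaluates that logic by hand over made-up measurement windows. It does not parse the alarm grammar, and `avg_exceeds` and the sample values are hypothetical.

```python
from statistics import mean


def avg_exceeds(measurements, threshold):
    """True when the average of the window's values is above the threshold."""
    return mean(measurements) > threshold


# Hypothetical measurement windows for the two sub-expressions.
cpu_user_perc = [80.0, 90.0, 82.0]            # average is 84.0
disk_read_ops_vda = [900.0, 1200.0, 1050.0]   # average is 1050.0

cpu_alarm = avg_exceeds(cpu_user_perc, 85)
disk_alarm = avg_exceeds(disk_read_ops_vda, 1000)

# "or" in the expression: the compound alarm fires if either side does.
state = "ALARM" if (cpu_alarm or disk_alarm) else "OK"
```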

List Alarms

GET /v2.0/alarms

Query parameters
• metric_name: name of metric to filter by
• metric_dimensions
• state: OK, ALARM, or UNDETERMINED
• severity: one or more severities to filter by, separated with |, e.g. severity=LOW|MEDIUM
• state_updated_start_time: the start time in ISO 8601 combined date and time format in UTC
• offset, limit
• sort_by

Alarms

GET, PUT, PATCH, DELETE /v2.0/alarms/{alarm-id}

• Alarms are created by the Threshold Engine based on matching alarm definitions
• When new nodes or components are deployed, alarms are automatically created
• Alarms are resources within Monasca. They have a resource ID and lifecycle
• By default, three states: OK, ALARM, and UNDETERMINED
  • The UNDETERMINED state occurs when metrics are no longer being received
• Deterministic alarms: two states, OK and ALARM
  • Used for systems where metrics are sporadic, e.g. creating metrics when errors in log files occur and no metrics when there aren't any errors
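
The difference between default and deterministic alarms shows up when no measurements arrive. A minimal sketch, assuming a single `avg(x) > threshold` sub-expression and a hypothetical `next_state` helper (not the Threshold Engine's implementation):

```python
def next_state(values, threshold, deterministic=False):
    """Return the alarm state for one evaluation window of avg(x) > threshold."""
    if not values:
        # No measurements: a deterministic alarm stays OK, a default alarm
        # becomes UNDETERMINED.
        return "OK" if deterministic else "UNDETERMINED"
    average = sum(values) / len(values)
    return "ALARM" if average > threshold else "OK"


default_quiet = next_state([], threshold=0)
deterministic_quiet = next_state([], threshold=0, deterministic=True)
error_seen = next_state([1.0], threshold=0, deterministic=True)
```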

Alarm Counts

GET /v2.0/alarms/count

• Query the total number of alarms in the OK, ALARM, or UNDETERMINED state and their severities, grouped by metric dimension, such as OpenStack service, state, and severity
• Used for summary dashboards

Example: Helion Ops Console
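
The grouping behind a summary dashboard can be pictured with a short sketch. The alarm records and the `alarm_counts` helper are invented for illustration and do not reflect the API's response format.

```python
from collections import Counter


def alarm_counts(alarms, group_by):
    """Count alarms per unique combination of the group_by fields."""
    return Counter(tuple(alarm[field] for field in group_by) for alarm in alarms)


alarms = [
    {"service": "compute", "state": "ALARM", "severity": "HIGH"},
    {"service": "compute", "state": "ALARM", "severity": "HIGH"},
    {"service": "compute", "state": "OK", "severity": "LOW"},
    {"service": "storage", "state": "UNDETERMINED", "severity": "MEDIUM"},
]
counts = alarm_counts(alarms, ["service", "state", "severity"])
```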

Alarm History

GET /v2.0/alarms/state-history

• Lists the alarm state history for alarms
• Query parameters
  • Dimensions to filter on
  • Start/end timestamp
  • offset, limit

GET /v2.0/alarms/{alarm-id}/state-history

• Lists the alarm state history for a specific alarm

Notification Methods

POST, GET, DELETE /v2.0/notification-methods

Notification methods are associated with actions in alarm definitions

Example

POST /v2.0/notification-methods

{
  "name": "Name of notification method",
  "type": "EMAIL",
  "address": "john.doe@hp.com"
}
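
A request like the example could be assembled as in the sketch below, which builds the request object without sending it. The base URL, token value, and the helper `notification_method_request` are placeholders; the token is passed in the `X-Auth-Token` header used by Keystone-authenticated OpenStack APIs.

```python
import json
from urllib.request import Request


def notification_method_request(base_url, token, name, method_type, address):
    """Build (but do not send) a POST /v2.0/notification-methods request."""
    body = json.dumps({"name": name, "type": method_type, "address": address})
    return Request(
        f"{base_url}/v2.0/notification-methods",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json", "X-Auth-Token": token},
        method="POST",
    )


request = notification_method_request(
    "http://monasca.example.com:8070",  # hypothetical endpoint
    "token-placeholder",                # a real token would come from Keystone
    "Name of notification method", "EMAIL", "john.doe@hp.com")
```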

Monasca Agent

• System metrics (CPU, memory, network, filesystem, …)
• Service metrics
  • MySQL, Kafka, and many others
• Application metrics
  • Built-in Statsd daemon
  • Python monasca-statsd library: adds support for dimensions
• VM system metrics
• Open vSwitch metrics
• Active checks
  • HTTP status checks and response times
  • System up/down checks (ping and ssh)
• Runs any Nagios plugin or check_mk
• Extensible/pluggable: additional services can be easily added

Agent details

• The Agent Forwarder buffers metrics for a short time to increase the size of the HTTP request body (number of metrics) sent to the Monasca API
• The Agent requests an auth token from the Keystone Identity service, which is supplied on all requests
• The Monasca Agent and API cache auth tokens in memory to reduce round-trip authorization requests to Keystone
• If network connectivity between the Agent and API is lost, the Agent will buffer metrics and send them when connectivity is restored
• Metrics are submitted using an "agent" role, which only allows metrics to be POST'd to the metrics endpoint
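
The buffering behaviour of the Forwarder can be sketched as follows. This is an illustration only, not the agent's code: `Forwarder`, `post`, and the bounded buffer size are hypothetical, and dropping the oldest entries when the buffer is full is an assumption the slides do not state.

```python
from collections import deque


class Forwarder:
    def __init__(self, post, max_buffered=1000):
        self.post = post  # hypothetical: sends a list of metrics to the API
        # Assumed policy: oldest metrics are dropped once the buffer is full.
        self.pending = deque(maxlen=max_buffered)

    def submit(self, metric):
        self.pending.append(metric)

    def flush(self):
        """Send everything buffered in one request; keep it all on failure."""
        if not self.pending:
            return 0
        batch = list(self.pending)
        try:
            self.post(batch)
        except ConnectionError:
            return 0  # stay buffered until the API is reachable again
        self.pending.clear()
        return len(batch)


api = {"up": False}
received = []


def post(batch):
    if not api["up"]:
        raise ConnectionError("Monasca API unreachable")
    received.extend(batch)


forwarder = Forwarder(post)
forwarder.submit({"name": "cpu.user_perc", "value": 12.0})
forwarder.submit({"name": "cpu.user_perc", "value": 15.0})
first = forwarder.flush()   # API down: nothing is sent
api["up"] = True
second = forwarder.flush()  # API back: both metrics go out in one request
```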

Grafana/Monasca Integration

• Datasource: a datasource that can be added to the Grafana dashboard to enable Monasca
  • https://github.com/openstack/monasca-grafana-datasource
• Keystone authentication
  • https://github.com/twc-openstack/grafana
• Support for alerting will be added in Grafana 4

Grafana Monasca Data Source

Logging Architecture

Logging API

• POST /v3.0/logs
• Batch log messages in a single HTTP request
• Global, local, and mixed dimensions
  • Similar to dimensions in metrics
• JSON only
• Specification
  • https://github.com/openstack/monasca-log-api/blob/master/docs/monasca-log-api-spec.md
• Queries are not done via the API but via a tenantized version of Kibana
  • https://github.com/FujitsuEnablingSoftwareTechnologyGmbH/fts-keystone

Log Model

{
  "dimensions": {
    "hostname": "devstack",
    "service": "monitoring",
    "component": "monasca-api"
  },
  "logs": [
    {
      "message": "msg1",
      "dimensions": {
        "service": "compute",
        "component": "nova-api",
        "path": "/var/log/mysql.log"
      }
    },
    {
      "message": "msg2",
      "dimensions": {
        "path": "/var/log/monasca/monasca-api.log"
      }
    }
  ]
}
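
One way to picture global, local, and mixed dimensions is to merge the top-level dimensions into each log entry. The sketch below assumes that a local dimension overrides a global one with the same key; that precedence and the `effective_dimensions` helper are assumptions, not taken from the log API specification.

```python
def effective_dimensions(payload):
    """Return one merged dimension dict per log entry (local keys win)."""
    global_dims = payload.get("dimensions", {})
    return [
        {**global_dims, **entry.get("dimensions", {})}
        for entry in payload["logs"]
    ]


payload = {
    "dimensions": {"hostname": "devstack", "service": "monitoring",
                   "component": "monasca-api"},
    "logs": [
        {"message": "msg1",
         "dimensions": {"service": "compute", "component": "nova-api",
                        "path": "/var/log/mysql.log"}},
        {"message": "msg2",
         "dimensions": {"path": "/var/log/monasca/monasca-api.log"}},
    ],
}
merged = effective_dimensions(payload)
```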

Log Agents

• Logstash
  • https://github.com/logstash-plugins/logstash-output-monasca_log_api/pull/1
• Beaver
  • https://github.com/python-beaver/python-beaver/pull/406
• Logspout: under investigation

Kibana Integration

• Keystone authentication support for Kibana
• Authentication plugin
  • https://github.com/FujitsuEnablingSoftwareTechnologyGmbH/fts-keystone
• Note: in the process of moving to an official OpenStack repo

Composability: Logging/Metrics

Transform and Analytics Engine

Monasca Transform

• A new micro-service in Monasca that aggregates and transforms metrics
• Currently based on Apache Spark Streaming
• Use cases
  • Object Storage Disk Capacity
  • Object Storage Capacity
  • Compute Host Capacity
  • VM Capacity
  • More to come
• Metrics are aggregated and published every hour
• Currently in deployment in HPE Helion OpenStack 4.0
• OpenStack project/repo
  • https://github.com/openstack/monasca-transform

Monasca Analytics

• A framework that adds data science tools (parsers, algorithms, etc.)
• Features include
  • Algorithmic flow definition, enabling sharing of complex algorithmic recipes
  • Thin orchestration layer that instantiates an execution environment
• Focused on
  • Anomaly detection
  • Reducing alert fatigue via alarm clustering (unsupervised machine learning)
• Example algorithms: One-Class SVM and LiNGAM
• Status: under development
• OpenStack project/repo
  • https://github.com/openstack/monasca-analytics

Distributions & Deployments

• Charter Communications
  • Monasca and Grafana are currently deployed in a production private cloud
  • Monitoring-as-a-Service use cases supported, with Grafana as the visualization dashboard
  • 2 datacenters, 600-700 compute nodes, 1000 VMs, 11000 metrics/sec
• FIWARE Lab
  • http://superuser.openstack.org/articles/monitoring-a-multi-region-cloud-based-on-openstack
• Hewlett Packard Enterprise: Cloud System, Helion OpenStack
  • Supported and tested up to 65K metrics/sec ingest rates
• Fujitsu
  • FUJITSU Software ServerView Cloud Monitoring Manager
• NEC
  • Planning to include Monasca in Cloud Solution Menus solution
• Others

Statistics: Mitaka/Newton Release

• Organizations: 31
• Contributors: 97
• Commits: 1075
• Reviews: 4080
• Lines of code: 215370

Ecosystem

• Hewlett Packard Enterprise
• Fujitsu
• Charter Communications
• NEC
• Cisco
• Cloudbase Solutions
• SUSE
• SolidFire
• SAP
• Cray Inc
• FIWARE Lab
• Mirantis
• Broadcom

Containers and Kubernetes

• New Monasca Agent plugins
  • Docker plugin
  • cAdvisor plugin
  • Kubernetes plugin: monitors both the Kubernetes control plane and containers
  • Prometheus client plugin: scrapes apps
  • Mesos plugin
• Containerization of Monasca
• Heapster Monasca data sink

Next Steps

• Containerizing Monasca
• Monitoring containers and container managers such as Kubernetes
• Grouping notifications


The Log

• The Log: What every software engineer should know about real-time data's unifying abstraction
  • https://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying
• Log: an append-only, totally-ordered sequence of records ordered by time

From To

Monitoring Architecture

Kafka

• A performant, distributed, durable publish/subscribe messaging and stream processing system
• Metrics, logs, and events are published to topics in Kafka
• Microservices register in a consumer group as a consumer
• Microservices subscribe to topics and consume metrics, logs, and events
• Messages are replicated per consumer group
• Messages are load-balanced across all consumers in a consumer group
  • Can add/remove micro-services to handle load or mitigate problems
  • As micro-services expand/contract, the partitions are automatically re-balanced
• At-least-once semantic guarantees on message delivery
• Also used for domain events, notification retry events, periodic notifications, grouping notifications, and other areas
• Always accept data, never drop data: true elasticity
  • Loggly: https://www.youtube.com/watch?v=LpNbjXFPyZ0
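
The effect of adding or removing consumers in a group can be pictured with a simple round-robin assignment. This sketch illustrates the rebalancing idea only; it is not Kafka's actual partition assignor, and `assign_partitions` and the consumer names are made up.

```python
def assign_partitions(partitions, consumers):
    """Spread partitions round-robin over the consumers of one group."""
    assignment = {consumer: [] for consumer in consumers}
    for index, partition in enumerate(partitions):
        assignment[consumers[index % len(consumers)]].append(partition)
    return assignment


partitions = list(range(6))
before = assign_partitions(
    partitions, ["persister-1", "persister-2", "persister-3"])
# persister-2 fails: its partitions are spread over the remaining consumers.
after = assign_partitions(partitions, ["persister-1", "persister-3"])
```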

CQRS

• Command Query Responsibility Segregation (CQRS)
• CQRS involves splitting an application into two parts internally
  1. Command side: ordering the system to update state
  2. Query side: gets information without changing state
• Advantages
  • Decouples the read/write load. Allows each to be scaled independently
  • Read store can be optimized for the query pattern of the application
• Reference
  • Event sourcing, CQRS, stream processing and Apache Kafka
  • https://www.confluent.io/blog/event-sourcing-cqrs-stream-processing-apache-kafka-whats-connection
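
A toy version of the split may help. In this sketch a list stands in for a Kafka topic and a dict for the read store; all function names are hypothetical, and it shows the pattern in general rather than Monasca's implementation.

```python
log = []             # stands in for a Kafka topic
latest_by_name = {}  # read store shaped for "latest value" queries


def post_metric(name, value):
    """Command side: record the change; nothing is read back."""
    log.append({"name": name, "value": value})


def project(records):
    """Consumer that builds the read store from the log."""
    for record in records:
        latest_by_name[record["name"]] = record["value"]


def latest(name):
    """Query side: read-only lookup that never changes state."""
    return latest_by_name.get(name)


post_metric("cpu.user_perc", 40.0)
post_metric("cpu.user_perc", 55.0)
before_projection = latest("cpu.user_perc")  # read store not yet updated
project(log)
after_projection = latest("cpu.user_perc")
```

Because writes and reads touch different structures, each side can be scaled or reshaped without changing the other.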

Microservices

• Microservices are small, autonomous, decoupled services that are deployed independently and work together as a single application
• Communication between services occurs via a network
• Services need to be able to change independently of each other and be deployed by themselves without requiring consumers to change
• Benefits
  • Resilience
  • Scale
  • Ease of deployment
  • Organizational alignment
  • Optimized for change/replaceability

POST Metrics Sequence

Domain Events Sequence

Deployment Models (HA/Scale)

• Many ways to deploy Monasca
• Typically deployed in a clustered/HA configuration using three or more nodes
• If any node or microservice fails, the cluster remains operational
• Partitions in Kafka are redistributed among the remaining components
• Preferably, the database is run on a separate layer from the other components/microservices
• Note: Monasca can also be deployed on a single node, non-clustered
• Has also been containerized and run in Kubernetes

Metrics Model

POST /v2.0/metrics

{
  name: http_status,
  dimensions: {
    url: http://host.domain.com:1234/service,
    cluster: c1,
    control_plane: ccp,
    service: compute
  },
  timestamp: 0, /* milliseconds */
  value: 1.0,
  value_meta: {
    status_code: 500,
    msg: Internal server error
  }
}

• Simple, concise, multi-dimensional, flexible description
  • Name (string)
  • Dimensions: dictionary of user-defined (key, value) pairs that are used to uniquely identify a metric
• Value meta: optional dictionary of user-defined (key, value) pairs that can be used to describe a measurement
  • Normally used for errors and messages
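
Since the name and dimensions together identify a metric, two posts that differ only in value or value_meta belong to the same metric. A small sketch, using a hypothetical `metric_id` helper, makes that concrete:

```python
def metric_id(metric):
    """Identity of a metric: its name plus its (order-independent) dimensions."""
    return (metric["name"], tuple(sorted(metric["dimensions"].items())))


failed = {
    "name": "http_status",
    "dimensions": {"service": "compute", "cluster": "c1"},
    "timestamp": 0,
    "value": 1.0,
    "value_meta": {"status_code": 500, "msg": "Internal server error"},
}
healthy = {
    "name": "http_status",
    "dimensions": {"cluster": "c1", "service": "compute"},
    "timestamp": 60000,
    "value": 0.0,
}
other_cluster = {
    "name": "http_status",
    "dimensions": {"cluster": "c2", "service": "compute"},
    "timestamp": 60000,
    "value": 0.0,
}

same_metric = metric_id(failed) == metric_id(healthy)
different_metric = metric_id(failed) != metric_id(other_cluster)
```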

Push vs Pull

• Monitoring-as-a-Service
  • Can't always pull due to firewalls and network issues
• Low latency: sub-second latency is difficult for a pull model
• Doesn't require service discovery and registration
  • As entities are deployed, they can start sending metrics without having to be discovered or registered
• Events
• Temporary caching/buffering of metrics/events while the service is unreachable

Monasca API

• Primary point for pushing metrics and handling queries
• Authenticates all requests against the Keystone identity service
  • Note: auth tokens are cached to reduce the load on Keystone
• Resources: Metrics, Alarm Definitions, Alarms, and Notification Methods
• API Specification
  • https://github.com/openstack/monasca-api/tree/master/docs
• Horizontally scalable
• Publishes metrics to Kafka
• Queries the time-series DB for measurements and statistics
• Queries the Config DB for alarms, alarm definitions, and notification methods


Page 8: Monasca - NETWAYS...... What every software engineer should know about real-time data's unifying ... systems/log-what-every-software-engineer-should-know-about

Monitoring Architecture

Kafka

bull A performant distributed durable publishsubscribe messaging and stream processing system

bull Metrics logs and events are published to topics in Kafka

bull Microservices register in a consumer group as a consumer

bull Microservices subscribe to topics and consume metricslogs and events

bull Messages are replicated per consumer group

bull Messages are load-balanced across all consumers in a consumer groupbull Can addremove micro-services to handle load or mitigate problemsbull As micro-services expandcontract the partitions are automatically re-balanced

bull At-least-once semantic guarantees on message delivery

bull Also used for domain events notification retry events periodic notifications grouping notifcations and other areas

bull Always accept data never drop data true elasticitybull Loggly httpswwwyoutubecomwatchv=LpNbjXFPyZ0

CQRS

bull Command Query Responsibility Segregation (CQRS)

bull CQRS involves splitting an application into two parts internally1 Command side ordering the system to update state

2 Query side that gets information without changing state

bull Advantagesbull Decouples the readwrite load Allows each to be scaled independently

bull Read store can be optimized for the query pattern of the application

bull Referencebull Event sourcing CQRS stream processing and Apache Kafka

bull httpswwwconfluentioblogevent-sourcing-cqrs-stream-processing-apache-kafka-whats-connection

Microservices

bull Microservices are small autonomous decoupled services that are deployed independenty and work together as a single application

bull Communication between services occurs via a network

bull Services need to be able to change independently of each other and be deployed by themselves without requiring consumers to change

bull Benefitsbull Resiliencebull Scalebull Ease of deploymentbull Organizational Alignmentbull Optimized for ChangeReplaceability

POST Metrics Sequence

Domain Events Sequence

Deployment Models (HAScale)

bull Many ways to deploy Monasca

bull Typically deployed in a clusteredHA configuration using three nodes or greater

bull If any node or microservice fails the cluster remains operational

bull Partitions in Kafka are redistributed among the remaining components

bull Preferably the database is run on a separate layer from the other componentsmicroservices

bull Note Monasca can also be deployed on a single-node non-clustered

bull Has also been containerized and run in Kubernetes

Metrics ModelPOST v20metrics

name http_statusdimensions

url httphostdomaincom1234servicecluster c1control_plane ccpservice compute

timestamp 0 milliseconds value 10value_meta

status_code 500msg Internal server error

bull Simple concise multi-dimensional flexible descriptionbull Name (string)bull Dimensions Dictionary of user-defined (key value)

pairs that are used to uniquely identify a metric

bull Optional dictionary of user-defined (key value) pairs that can be used to describe a measurement

bull Normally used for errors and messages

Push vs Pull

bull Monitoring-as-a-Servicebull Cant always pull due to firewalls and network issues

bull Low-latency sub-second latency difficult for pull model

bull Doesnt require service discovery and registrationbull As entities are deployed they can start sending metrics without have to be

discovered or registered

bull Events

bull Temporary cachingbuffering of metricsevents while service unreachable

Monasca API

bull Primary point for pushing metrics and handling queries

bull Authenticates all requests against the Keystone identity servicebull Note auth tokens are cached to reduce the load on Keystone

bull Resources Metrics Alarm Definitions Alarms and Notification Methods

bull API Specificationbull httpsgithubcomopenstackmonasca-apitreemasterdocs

bull Horizontally scalable

bull Publishes metrics to Kafka

bull Queries timeseries DB for measurements and statistics

bull Queries Config DB for alarms alarm definitions and notification methods

Persister

bull Consumes both metrics and alarm state transition events from Kafka

bull Stores temporarily in-memory and does batch writes to the TSDB based on batch size or time to optimize write performance

bull At-least once message delivery semanticsbull No metrics or alarm state transition events are lostbull The Kafka consumer offset for each batch is only updated after successfully storing

the metric or alarm state transition eventbull Note duplicates are possible

bull HAfault-tolerancebull Multiple persisters run simultaneously and balance loadbull If a persister fails the load is automatically re-balanced across the remaining

persisters

Time Series Databases

bull Used for storingbull Metricsbull Alarm state history

bull Two databases supported1 Vertica

bull Enterprise class proprietary closed-source clustered HA analytics databasebull Excels at time-series

2 InfluxDBbull Open-source single-node time-series DBbull Clustering is closed-sourcebull Note can replicate to multiple instances of InfluxDB using Kafka

bull Investigating support for additional databases

Config Database

bull Stores all transactional data for Monasca such asbull Alarm Definitions

bull Alarms

bull Notification Methods

bull MySQL and Postgres supported

bull Typically deployed in a clustered or HA configuration

Threshold Engine

• Near real-time, stream-processing, clustered and highly available threshold engine
• Based on Apache Storm
• Consumes metrics from Kafka
• Creates alarms based on metrics that match patterns specified in the alarm definition
• Evaluates whether metrics exceed the threshold
• Publishes alarm state transition events to Kafka
• Supports both simple and compound alarm expressions

Notification Engine

• Consumes alarm state transition events from Kafka produced by the Threshold Engine
• Evaluates whether notifications should be sent, based on actions specified in the alarm definition
  • OK, ALARM and UNDETERMINED actions
• Supports email, PagerDuty, webhooks, HipChat, Slack and JIRA
• Dynamic plugins supported
• Supports both one-shot and periodic notifications
• If sending to the notification address fails, the notification is published to a retry topic in Kafka and retried later
• Grouping notifications: in progress

Kafka Message Schema

• JSON messages published/consumed to/from Kafka by Monasca micro-services
• Well-defined schema is published at
  • https://wiki.openstack.org/wiki/Monasca/Message_Schema

Metrics

Create, query and get statistics for metrics

• GET, POST /v2.0/metrics
• GET /v2.0/metrics/names
  • Returns the unique metric names
• GET /v2.0/metrics/dimensions/names
  • Returns the unique dimension names
• GET /v2.0/metrics/dimensions/names/values
  • Returns the unique dimension values

Measurements

GET /v2.0/metrics/measurements

• Returns a list of measurements
• Query parameters
  • name and dimensions to filter by
  • start_time and end_time
  • offset and limit
  • merge_metrics: allows multiple metrics to be combined into a single list of measurements
  • group_by: list of columns to group the returned metrics by; allows multiple unique metrics to be returned in a single query
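As an illustration of how the query parameters on this slide combine, the sketch below builds a measurements URL with the standard library only. The helper name `measurements_url` and the host are hypothetical, and encoding dimensions as comma-separated `key:value` pairs is an assumption that should be checked against the API specification.

```python
from typing import Dict, Optional
from urllib.parse import urlencode


def measurements_url(
    base_url: str,
    name: str,
    start_time: str,
    dimensions: Optional[Dict[str, str]] = None,
    end_time: Optional[str] = None,
    limit: Optional[int] = None,
    merge_metrics: bool = False,
) -> str:
    """Build a GET /v2.0/metrics/measurements URL (hypothetical helper)."""
    params = {"name": name, "start_time": start_time}
    if dimensions:
        # Assumed encoding: key:value pairs joined by commas.
        params["dimensions"] = ",".join(
            f"{key}:{value}" for key, value in sorted(dimensions.items())
        )
    if end_time:
        params["end_time"] = end_time
    if limit is not None:
        params["limit"] = str(limit)
    if merge_metrics:
        params["merge_metrics"] = "true"
    return f"{base_url.rstrip('/')}/v2.0/metrics/measurements?{urlencode(params)}"


# Example with a made-up host; the request itself is not sent here.
url = measurements_url(
    "http://monasca.example:8070",
    "cpu.user_perc",
    "2016-11-01T00:00:00Z",
    dimensions={"hostname": "devstack"},
    limit=10,
)
```

A real request would also carry a Keystone token, as described for the Monasca API.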

Statistics

GET /v2.0/metrics/statistics

• Query parameters
  • name and dimensions to filter by
  • start_time and end_time
  • statistics: avg, min, max, sum and count
  • period: the time period to aggregate measurements by
  • offset, limit
  • merge_metrics: allows multiple metrics to be combined into a single list of statistics
  • group_by: list of columns to group the returned metrics by; allows multiple unique metrics to be returned in a single query
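To make the `period` and `statistics` parameters concrete, here is a small sketch of what aggregating measurements by period means. It only illustrates the semantics; in Monasca the aggregation is done on the server side. The function name `aggregate` and the choice to align buckets to `start` are assumptions.

```python
from collections import defaultdict
from typing import Iterable, List, Sequence, Tuple


def aggregate(
    measurements: Iterable[Tuple[int, float]],
    start: int,
    period: int,
    statistics: Sequence[str] = ("avg",),
) -> List[list]:
    """Bucket (timestamp_seconds, value) pairs into fixed periods.

    Returns one row per non-empty bucket: [bucket_start, stat1, stat2, ...].
    """
    functions = {
        "avg": lambda values: sum(values) / len(values),
        "min": min,
        "max": max,
        "sum": sum,
        "count": len,
    }
    buckets = defaultdict(list)
    for timestamp, value in measurements:
        if timestamp < start:
            continue  # outside the requested time range
        buckets[(timestamp - start) // period].append(value)
    rows = []
    for index in sorted(buckets):
        values = buckets[index]
        rows.append(
            [start + index * period] + [functions[name](values) for name in statistics]
        )
    return rows


rows = aggregate(
    [(0, 1.0), (30, 3.0), (60, 5.0)],
    start=0,
    period=60,
    statistics=("avg", "max", "count"),
)
# rows == [[0, 2.0, 3.0, 2], [60, 5.0, 5.0, 1]]
```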

Metrics Names

GET /v2.0/metrics/names

• Returns a list of the unique metric names
• Query parameters
  • Dimensions
  • Offset, limit

Metric Dimension Names

GET /v2.0/metrics/dimensions/names

• Lists the dimension names
• Query parameters
  • Metric name
  • Offset, limit

Metric Dimension Values

GET /v2.0/metrics/dimensions/names/values

• Lists the dimension values
• Query parameters
  • Metric name
  • Dimension name
  • Offset, limit

Alarm Definitions

POST, GET /v2.0/alarm-definitions

• Alarm definitions are templates that are used to automatically and dynamically create alarms based on matching metric names and dimensions
• One alarm definition can result in zero or more alarms
• Simple grammar for creating compound alarm expressions
  • avg(cpu.user_perc) > 85 or avg(disk.read_ops{device=vda}, 120) > 1000
• Alarm states (OK, ALARM and UNDETERMINED)
• Actions associated with alarms for state transitions
• User-assigned severity (LOW, MEDIUM, HIGH, CRITICAL)
• Thresholds can be dynamically adjusted via PATCH
• Minimal lifecycle management: alarm_lifecycle_state and link
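The idea that one alarm definition acts as a template for zero or more alarms can be sketched as below. This is a simplified model under stated assumptions, not the Threshold Engine's logic: the function `group_into_alarms` is hypothetical, and grouping by a list of dimension names (called `match_by` here) is assumed as the way matching metrics are split into separate alarms.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def group_into_alarms(
    metric_name: str,
    match_by: Sequence[str],
    metrics: List[dict],
) -> Dict[Tuple, List[dict]]:
    """Return one entry per alarm that the definition would produce.

    Metrics whose name matches are grouped by the values of the match_by
    dimensions; each distinct group stands for one alarm.
    """
    alarms = defaultdict(list)
    for metric in metrics:
        if metric["name"] != metric_name:
            continue  # this definition does not apply to the metric
        key = tuple(metric["dimensions"].get(dim) for dim in match_by)
        alarms[key].append(metric)
    return dict(alarms)


incoming = [
    {"name": "cpu.user_perc", "dimensions": {"hostname": "node-a"}},
    {"name": "cpu.user_perc", "dimensions": {"hostname": "node-b"}},
    {"name": "mem.free_mb", "dimensions": {"hostname": "node-a"}},
]
per_host = group_into_alarms("cpu.user_perc", ["hostname"], incoming)
# Two hosts report the metric, so the single definition yields two alarms.
```

When a new host starts reporting `cpu.user_perc`, another group appears without any change to the definition.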

List Alarms

GET /v2.0/alarms

Query parameters
• metric_name: name of metric to filter by
• metric_dimensions
• state: OK, ALARM or UNDETERMINED
• severity: one or more severities to filter by, separated with |, e.g. severity=LOW|MEDIUM
• state_updated_start_time: the start time in ISO 8601 combined date and time format, in UTC
• offset, limit
• sort_by

Alarms

GET, PUT, PATCH, DELETE /v2.0/alarms/{alarm-id}

• Alarms are created by the Threshold Engine based on matching alarm definitions
• When new nodes or components are deployed, alarms are automatically created
• Alarms are resources within Monasca. They have a resource ID and lifecycle
• By default three states: OK, ALARM and UNDETERMINED
  • UNDETERMINED state occurs when metrics are no longer being received
• Deterministic alarms: two states, OK and ALARM
  • Used for systems where metrics are sporadic, e.g. creating metrics when errors occur in log files and no metrics when there aren't any errors
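The difference between default and deterministic alarms comes down to how missing metrics are treated. A minimal sketch, assuming a hypothetical `next_state` function where `None` means no measurements arrived in the evaluation window:

```python
from typing import Optional


def next_state(threshold_exceeded: Optional[bool], deterministic: bool = False) -> str:
    """Return the alarm state for one evaluation window.

    threshold_exceeded is True or False when measurements were received,
    and None when no measurements arrived at all.
    """
    if threshold_exceeded is None:
        # Deterministic alarms never become UNDETERMINED: silence means OK.
        return "OK" if deterministic else "UNDETERMINED"
    return "ALARM" if threshold_exceeded else "OK"
```

For a log-error metric that is only emitted when an error occurs, the deterministic variant keeps the alarm at OK during quiet periods.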

Alarm Counts

GET /v2.0/alarms/count

• Query the total number of alarms in the OK, ALARM or UNDETERMINED state and their severities, grouped by metric dimension (such as OpenStack service), state and severity
• Used for summary dashboards

Example: Helion Ops Console

Alarm History

GET /v2.0/alarms/state-history

• Lists the alarm state history for alarms
• Query parameters
  • Dimensions to filter on
  • Start/end timestamp
  • Offset, limit

GET /v2.0/alarms/{alarm-id}/state-history

• Lists the alarm state history for a specific alarm

Notification Methods

POST, GET, DELETE /v2.0/notification-methods

Notification methods are associated with actions in alarm definitions

Example:

POST /v2.0/notification-methods

{
  "name": "Name of notification method",
  "type": "EMAIL",
  "address": "john.doe@hp.com"
}

Monasca Agent

• System metrics (cpu, memory, network, filesystem, ...)
• Service metrics
  • MySQL, Kafka and many others
• Application metrics
  • Built-in Statsd daemon
  • Python monasca-statsd library: adds support for dimensions
• VM system metrics
• Open vSwitch metrics
• Active checks
  • HTTP status checks and response times
  • System up/down checks (ping and ssh)
• Runs any Nagios plugin or check_mk
• Extensible/pluggable: additional services can be easily added

Agent details

• The Agent Forwarder buffers metrics for a short time to increase the size of the HTTP request body (number of metrics) sent to the Monasca API
• The Agent requests an auth token from the Keystone Identity service, which is supplied on all requests
• The Monasca Agent and API cache auth tokens in memory to reduce the round-trip authorization requests to Keystone
• If network connectivity between the Agent and API is lost, the Agent will buffer metrics and send them when connectivity is restored
• Metrics are submitted using an "agent" role, which only allows metrics to be POSTed to the metrics endpoint
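The buffering behaviour of the forwarder can be sketched as follows. This is an illustration under assumptions, not the agent's code: the `Forwarder` class, the `post_metrics` callable and the batch size are hypothetical, and a real agent would likely bound the buffer.

```python
from typing import Callable, List


class Forwarder:
    """Collects metrics and sends them in batches, keeping them on failure."""

    def __init__(self, post_metrics: Callable[[List[dict]], None], max_batch: int = 100) -> None:
        self._post_metrics = post_metrics  # stand-in for the HTTP POST to the API
        self._max_batch = max_batch
        self._pending: List[dict] = []

    @property
    def pending(self) -> int:
        """Number of metrics still waiting to be sent."""
        return len(self._pending)

    def submit(self, metric: dict) -> None:
        """Queue a metric; nothing is sent until flush() is called."""
        self._pending.append(metric)

    def flush(self) -> int:
        """Send queued metrics in batches. Returns how many were sent."""
        sent = 0
        while self._pending:
            batch = self._pending[: self._max_batch]
            try:
                self._post_metrics(batch)
            except ConnectionError:
                # API unreachable: keep the metrics and try again later.
                break
            del self._pending[: len(batch)]
            sent += len(batch)
        return sent
```

Calling `flush()` on a timer gives the short buffering window described above, and a failed flush simply leaves the metrics queued until connectivity returns.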

Grafana/Monasca Integration

• Datasource: a datasource that can be added to the Grafana dashboard to enable Monasca
  • https://github.com/openstack/monasca-grafana-datasource
• Keystone authentication
  • https://github.com/twc-openstack/grafana
• Support for Alerting will be added in Grafana 4

Grafana Monasca Data Source

Logging Architecture

Logging API

• POST /v3.0/logs
• Batch log messages in a single HTTP request
• Global, local and mixed dimensions
  • Similar to dimensions in metrics
• JSON only
• Specification
  • https://github.com/openstack/monasca-log-api/blob/master/docs/monasca-log-api-spec.md
• Queries are not done via the API but via a tenantized version of Kibana
  • https://github.com/FujitsuEnablingSoftwareTechnologyGmbH/fts-keystone

Log Model

{
  "dimensions": {
    "hostname": "devstack",
    "service": "monitoring",
    "component": "monasca-api"
  },
  "logs": [
    {
      "message": "msg1",
      "dimensions": {
        "service": "compute",
        "component": "nova-api",
        "path": "/var/log/mysql.log"
      }
    },
    {
      "message": "msg2",
      "dimensions": {
        "path": "/var/log/monasca/monasca-api.log"
      }
    }
  ]
}
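One way to read the log model is that every log entry ends up with the global dimensions plus its own local ones. The sketch below assumes that a local dimension overrides a global one with the same key; that precedence is an assumption to verify against the log API specification, and `effective_dimensions` is a hypothetical helper.

```python
from typing import List


def effective_dimensions(envelope: dict) -> List[dict]:
    """Merge global dimensions into each log entry (local keys win)."""
    global_dims = envelope.get("dimensions", {})
    merged = []
    for log in envelope["logs"]:
        merged.append(
            {
                "message": log["message"],
                # Later keys override earlier ones, so local beats global.
                "dimensions": {**global_dims, **log.get("dimensions", {})},
            }
        )
    return merged


envelope = {
    "dimensions": {"hostname": "devstack", "service": "monitoring"},
    "logs": [
        {"message": "msg1", "dimensions": {"service": "compute"}},
        {"message": "msg2"},
    ],
}
flattened = effective_dimensions(envelope)
# msg1 keeps hostname=devstack but reports service=compute;
# msg2 inherits both global dimensions unchanged.
```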

Log Agents

• Logstash
  • https://github.com/logstash-plugins/logstash-output-monasca_log_api/pull/1
• Beaver
  • https://github.com/python-beaver/python-beaver/pull/406
• Logspout: under investigation

Kibana Integration

• Keystone authentication support for Kibana
• Authentication plugin
  • https://github.com/FujitsuEnablingSoftwareTechnologyGmbH/fts-keystone
• Note: in the process of moving to the official OpenStack repo

Composability: Logging/Metrics

Transform and Analytics Engine

Monasca Transform

• A new micro-service in Monasca that aggregates and transforms metrics
• Currently based on Apache Spark Streaming
• Use cases
  • Object Storage Disk Capacity
  • Object Storage Capacity
  • Compute Host Capacity
  • VM Capacity
  • More to come
• Metrics are aggregated and published every hour
• Currently in deployment in HPE Helion OpenStack 4.0
• OpenStack project/repo
  • https://github.com/openstack/monasca-transform

Monasca Analytics

• A framework that adds data science tools (parsers, algorithms, etc.)
• Features include
  • Algorithmic flow definition, enabling sharing of complex algorithmic recipes
  • Thin orchestration layer that instantiates an execution environment
• Focused on
  • Anomaly detection
  • Reducing alert fatigue via alarm clustering (unsupervised machine learning)
• Example algorithms: One-Class SVM and LiNGAM
• Status: under development
• OpenStack project/repo
  • https://github.com/openstack/monasca-analytics

Distributions & Deployments

• Charter Communications
  • Monasca and Grafana are currently deployed in a production private cloud
  • Monitoring-as-a-Service use cases supported, with Grafana as the visualization dashboard
  • 2 datacenters, 600-700 compute nodes, 1000 VMs, 11000 metrics/sec
• FIWARE Lab
  • http://superuser.openstack.org/articles/monitoring-a-multi-region-cloud-based-on-openstack
• Hewlett Packard Enterprise Cloud System, Helion OpenStack
  • Supported and tested up to 65K metrics/sec ingest rates
• Fujitsu
  • FUJITSU Software ServerView Cloud Monitoring Manager
• NEC
  • Planning to include Monasca in Cloud Solution Menus solution
• Others

Statistics: Mitaka/Newton Release

• Organizations: 31
• Contributors: 97
• Commits: 1075
• Reviews: 4080
• Lines of code: 215370

Ecosystem

• Hewlett Packard Enterprise
• Fujitsu
• Charter Communications
• NEC
• Cisco
• Cloudbase Solutions
• SUSE
• SolidFire
• SAP
• Cray Inc
• FIWARE Lab
• Mirantis
• Broadcom

Containers and Kubernetes

• New Monasca Agent plugins
  • Docker plugin
  • cAdvisor plugin
  • Kubernetes plugin: monitors both the Kubernetes control plane and containers
  • Prometheus client plugin: scrapes apps
  • Mesos plugin
• Containerization of Monasca
• Heapster Monasca data sink

Next Steps

• Containerizing Monasca
• Monitoring containers and container managers such as Kubernetes
• Grouping notifications


Kafka

• A performant, distributed, durable publish/subscribe messaging and stream-processing system
• Metrics, logs and events are published to topics in Kafka
• Microservices register in a consumer group as a consumer
• Microservices subscribe to topics and consume metrics/logs and events
• Messages are replicated per consumer group
• Messages are load-balanced across all consumers in a consumer group
  • Can add/remove micro-services to handle load or mitigate problems
  • As micro-services expand/contract, the partitions are automatically re-balanced
• At-least-once semantic guarantees on message delivery
• Also used for domain events, notification retry events, periodic notifications, grouping notifications and other areas
• Always accept data, never drop data: true elasticity
  • Loggly: https://www.youtube.com/watch?v=LpNbjXFPyZ0
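The load balancing and re-balancing within a consumer group can be pictured with a small model. This is not Kafka's actual assignor, just a round-robin sketch; `assign_partitions` and the consumer names are hypothetical.

```python
from typing import Dict, Iterable, List


def assign_partitions(partitions: Iterable[int], consumers: Iterable[str]) -> Dict[str, List[int]]:
    """Spread topic partitions round-robin over the members of one group."""
    members = sorted(consumers)
    if not members:
        return {}
    assignment: Dict[str, List[int]] = {member: [] for member in members}
    for index, partition in enumerate(sorted(partitions)):
        assignment[members[index % len(members)]].append(partition)
    return assignment


# Three persisters share six partitions.
before = assign_partitions(range(6), ["persister-a", "persister-b", "persister-c"])
# persister-c fails: its partitions are redistributed to the survivors.
after = assign_partitions(range(6), ["persister-a", "persister-b"])
```

Each partition is owned by exactly one consumer in the group at a time, which is why adding or removing a microservice instance changes the share of load each one carries.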

CQRS

• Command Query Responsibility Segregation (CQRS)
• CQRS involves splitting an application into two parts internally
  1. Command side: orders the system to update state
  2. Query side: gets information without changing state
• Advantages
  • Decouples the read/write load; allows each to be scaled independently
  • Read store can be optimized for the query pattern of the application
• Reference
  • Event sourcing, CQRS, stream processing and Apache Kafka
  • https://www.confluent.io/blog/event-sourcing-cqrs-stream-processing-apache-kafka-whats-connection

Microservices

• Microservices are small, autonomous, decoupled services that are deployed independently and work together as a single application
• Communication between services occurs via a network
• Services need to be able to change independently of each other and be deployed by themselves without requiring consumers to change
• Benefits
  • Resilience
  • Scale
  • Ease of deployment
  • Organizational alignment
  • Optimized for change/replaceability

POST Metrics Sequence

Domain Events Sequence

Deployment Models (HA/Scale)

• Many ways to deploy Monasca
• Typically deployed in a clustered/HA configuration using three nodes or more
• If any node or microservice fails, the cluster remains operational
• Partitions in Kafka are redistributed among the remaining components
• Preferably the database is run on a separate layer from the other components/microservices
• Note: Monasca can also be deployed on a single node, non-clustered
• Has also been containerized and run in Kubernetes

Metrics Model

POST /v2.0/metrics

{
  "name": "http_status",
  "dimensions": {
    "url": "http://host.domain.com:1234/service",
    "cluster": "c1",
    "control_plane": "ccp",
    "service": "compute"
  },
  "timestamp": 0, /* milliseconds */
  "value": 1.0,
  "value_meta": {
    "status_code": 500,
    "msg": "Internal server error"
  }
}

• Simple, concise, multi-dimensional, flexible description
  • Name (string)
  • Dimensions: dictionary of user-defined (key, value) pairs that are used to uniquely identify a metric
  • Value meta: optional dictionary of user-defined (key, value) pairs that can be used to describe a measurement
    • Normally used for errors and messages
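A metric in this model is easy to assemble in code. The sketch below builds the JSON body only; `build_metric` is a hypothetical helper, the dimension values are made up, and sending the body (with a Keystone token) is left out.

```python
import json
import time
from typing import Dict, Optional


def build_metric(
    name: str,
    value: float,
    dimensions: Dict[str, str],
    value_meta: Optional[Dict[str, str]] = None,
    timestamp_ms: Optional[int] = None,
) -> dict:
    """Assemble one metric in the shape shown on the slide."""
    metric = {
        "name": name,
        "dimensions": dict(dimensions),
        # The model expresses timestamps in milliseconds.
        "timestamp": int(time.time() * 1000) if timestamp_ms is None else timestamp_ms,
        "value": float(value),
    }
    if value_meta:
        metric["value_meta"] = dict(value_meta)
    return metric


metric = build_metric(
    "http_status",
    1,
    {"service": "compute", "cluster": "c1"},
    value_meta={"status_code": "500", "msg": "Internal server error"},
    timestamp_ms=0,
)
body = json.dumps(metric)  # request body for POST /v2.0/metrics
```

Keeping free-form text such as error messages in `value_meta` rather than in dimensions means it describes a single measurement without changing the identity of the metric.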

Push vs Pull

• Monitoring-as-a-Service
  • Can't always pull due to firewalls and network issues
• Low latency: sub-second latency is difficult for the pull model
• Doesn't require service discovery and registration
  • As entities are deployed, they can start sending metrics without having to be discovered or registered
• Events
• Temporary caching/buffering of metrics/events while the service is unreachable

Monasca API

• Primary point for pushing metrics and handling queries
• Authenticates all requests against the Keystone identity service
  • Note: auth tokens are cached to reduce the load on Keystone
• Resources: Metrics, Alarm Definitions, Alarms and Notification Methods
• API Specification
  • https://github.com/openstack/monasca-api/tree/master/docs
• Horizontally scalable
• Publishes metrics to Kafka
• Queries the time-series DB for measurements and statistics
• Queries the Config DB for alarms, alarm definitions and notification methods

Page 10: Monasca - NETWAYS...... What every software engineer should know about real-time data's unifying ... systems/log-what-every-software-engineer-should-know-about

CQRS

bull Command Query Responsibility Segregation (CQRS)

bull CQRS involves splitting an application into two parts internally1 Command side ordering the system to update state

2 Query side that gets information without changing state

bull Advantagesbull Decouples the readwrite load Allows each to be scaled independently

bull Read store can be optimized for the query pattern of the application

bull Referencebull Event sourcing CQRS stream processing and Apache Kafka

bull httpswwwconfluentioblogevent-sourcing-cqrs-stream-processing-apache-kafka-whats-connection

Microservices

bull Microservices are small autonomous decoupled services that are deployed independenty and work together as a single application

bull Communication between services occurs via a network

bull Services need to be able to change independently of each other and be deployed by themselves without requiring consumers to change

bull Benefitsbull Resiliencebull Scalebull Ease of deploymentbull Organizational Alignmentbull Optimized for ChangeReplaceability

POST Metrics Sequence

Domain Events Sequence

Deployment Models (HAScale)

bull Many ways to deploy Monasca

bull Typically deployed in a clusteredHA configuration using three nodes or greater

bull If any node or microservice fails the cluster remains operational

bull Partitions in Kafka are redistributed among the remaining components

bull Preferably the database is run on a separate layer from the other componentsmicroservices

bull Note Monasca can also be deployed on a single-node non-clustered

bull Has also been containerized and run in Kubernetes

Metrics ModelPOST v20metrics

name http_statusdimensions

url httphostdomaincom1234servicecluster c1control_plane ccpservice compute

timestamp 0 milliseconds value 10value_meta

status_code 500msg Internal server error

bull Simple concise multi-dimensional flexible descriptionbull Name (string)bull Dimensions Dictionary of user-defined (key value)

pairs that are used to uniquely identify a metric

bull Optional dictionary of user-defined (key value) pairs that can be used to describe a measurement

bull Normally used for errors and messages

Push vs Pull

bull Monitoring-as-a-Servicebull Cant always pull due to firewalls and network issues

bull Low-latency sub-second latency difficult for pull model

bull Doesnt require service discovery and registrationbull As entities are deployed they can start sending metrics without have to be

discovered or registered

bull Events

bull Temporary cachingbuffering of metricsevents while service unreachable

Monasca API

bull Primary point for pushing metrics and handling queries

bull Authenticates all requests against the Keystone identity servicebull Note auth tokens are cached to reduce the load on Keystone

bull Resources Metrics Alarm Definitions Alarms and Notification Methods

bull API Specificationbull httpsgithubcomopenstackmonasca-apitreemasterdocs

bull Horizontally scalable

bull Publishes metrics to Kafka

bull Queries timeseries DB for measurements and statistics

bull Queries Config DB for alarms alarm definitions and notification methods

Persister

bull Consumes both metrics and alarm state transition events from Kafka

bull Stores temporarily in-memory and does batch writes to the TSDB based on batch size or time to optimize write performance

bull At-least once message delivery semanticsbull No metrics or alarm state transition events are lostbull The Kafka consumer offset for each batch is only updated after successfully storing

the metric or alarm state transition eventbull Note duplicates are possible

bull HAfault-tolerancebull Multiple persisters run simultaneously and balance loadbull If a persister fails the load is automatically re-balanced across the remaining

persisters

Time Series Databases

bull Used for storingbull Metricsbull Alarm state history

bull Two databases supported1 Vertica

bull Enterprise class proprietary closed-source clustered HA analytics databasebull Excels at time-series

2 InfluxDBbull Open-source single-node time-series DBbull Clustering is closed-sourcebull Note can replicate to multiple instances of InfluxDB using Kafka

bull Investigating support for additional databases

Config Database

bull Stores all transactional data for Monasca such asbull Alarm Definitions

bull Alarms

bull Notification Methods

bull MySQL and Postgres supported

bull Typically deployed in a clustered or HA configuration

Threshold Engine

bull Near real-time stream processing clustered and highly available threshold engine

bull Based on Apache Storm

bull Consumes metrics from Kafka

bull Creates alarms based on metrics that match patterns specified in the alarm definition

bull Evaluates whether metrics exceed threshold

bull Publishes alarm state transition events to Kafka

bull Supports both simple and compound alarm expressions

Notification Engine

bull Consumes alarm state transition events from Kafka produced by the Threshold Engine

bull Evaluates whether notifications should be sent based on actions specified in the alarm definition

bull OK ALARM and UNDETERMINED actions

bull Supports email PagerDuty webhooks HipChat Slack and JIRAbull Dynamic plugins supportedbull Supports both one-shot and periodic notificationsbull If sending to the notification address fails then notification is published to

retry topic in Kafka and retried laterbull Grouping notifications In progress

Kafka Message Schema

bull JSON messages publishedconsumed tofrom Kafka by Monasca micro-services

bull Well-defined schema is published atbull httpswikiopenstackorgwikiMonascaMessage_Schema

Metrics

Create, query and get statistics for metrics

• GET, POST /v2.0/metrics
• GET /v2.0/metrics/names
  • Returns the unique metric names
• GET /v2.0/metrics/dimensions/names
  • Returns the unique dimension names
• GET /v2.0/metrics/dimensions/names/values
  • Returns the unique dimension values

Measurements

GET /v2.0/metrics/measurements

• Returns a list of measurements
• Query parameters
  • name and dimensions to filter by
  • start_time and end_time
  • offset and limit
  • merge_metrics: allows multiple metrics to be combined into a single list of measurements
  • group_by: list of columns to group the returned metrics by; allows multiple unique metrics to be returned in a single query
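A measurements query built from the parameters listed above might look like the following sketch. The host name and port are placeholders, `measurements_url` is a hypothetical helper, and the `key:value` comma-separated encoding of `dimensions` is an assumption that should be checked against the API specification.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def measurements_url(base_url, name, start_time, dimensions=None,
                     end_time=None, limit=None, merge_metrics=False):
    """Build a GET URL for the measurements resource (illustrative only)."""
    params = {"name": name, "start_time": start_time}
    if dimensions:
        # Assumed encoding: key:value pairs joined by commas.
        params["dimensions"] = ",".join(
            f"{key}:{value}" for key, value in sorted(dimensions.items()))
    if end_time:
        params["end_time"] = end_time
    if limit is not None:
        params["limit"] = str(limit)
    if merge_metrics:
        params["merge_metrics"] = "true"
    return f"{base_url}/v2.0/metrics/measurements?{urlencode(params)}"


# Placeholder endpoint; a real request would also carry a Keystone token.
url = measurements_url("http://monasca.example.invalid:8070",
                       "cpu.user_perc", "2016-01-01T00:00:00Z",
                       dimensions={"hostname": "devstack"},
                       merge_metrics=True)
query = parse_qs(urlsplit(url).query)
```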

Statistics

GET /v2.0/metrics/statistics

• Query parameters
  • name and dimensions to filter by
  • start_time and end_time
  • statistics: avg, min, max, sum and count
  • period: the time period to aggregate measurements by
  • offset, limit
  • merge_metrics: allows multiple metrics to be combined into a single list of statistics
  • group_by: list of columns to group the returned metrics by; allows multiple unique metrics to be returned in a single query
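What the `period` and `statistics` parameters mean can be illustrated with a small local aggregation. This is a sketch of the idea only: the function name `aggregate` is hypothetical, and the way the real service aligns period boundaries (for example to `start_time`) is not shown here and may differ.

```python
from collections import defaultdict

# The five statistics named on the slide.
STATISTICS = {
    "avg": lambda values: sum(values) / len(values),
    "min": min,
    "max": max,
    "sum": sum,
    "count": len,
}


def aggregate(measurements, period, statistic="avg"):
    """Group (timestamp_seconds, value) pairs into fixed-size periods.

    Buckets are aligned to multiples of `period`, which is an assumption
    of this sketch rather than documented service behaviour.
    """
    func = STATISTICS[statistic]
    buckets = defaultdict(list)
    for timestamp, value in measurements:
        buckets[(timestamp // period) * period].append(value)
    return [(start, func(values)) for start, values in sorted(buckets.items())]
```

For example, three measurements at 0 s, 30 s and 60 s with a 60-second period fall into two buckets, one starting at 0 and one starting at 60.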

Metric Names

GET /v2.0/metrics/names

• Returns a list of the unique metric names
• Query parameters
  • dimensions
  • offset, limit

Metric Dimension Names

GET /v2.0/metrics/dimensions/names

• Lists the dimension names
• Query parameters
  • metric name
  • offset, limit

Metric Dimension Values

GET /v2.0/metrics/dimensions/names/values

• Lists the dimension values
• Query parameters
  • metric name
  • dimension name
  • offset, limit

Alarm Definitions

POST, GET /v2.0/alarm-definitions

• Alarm definitions are templates that are used to automatically and dynamically create alarms based on matching metric names and dimensions
• One alarm definition can result in zero or more alarms
• Simple grammar for creating compound alarm expressions
  • avg(cpu.user_perc) > 85 or avg(disk.read_ops{device=vda}, 120) > 1000
• Alarm states (OK, ALARM and UNDETERMINED)
• Actions associated with alarms for state transitions
• User-assigned severity (LOW, MEDIUM, HIGH, CRITICAL)
• Thresholds can be dynamically adjusted via PATCH
• Minimal lifecycle management: alarm_lifecycle_state and link
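A request body for creating an alarm definition could be assembled as in the sketch below. The helper `alarm_definition` is hypothetical, and the field names (`name`, `expression`, `severity`, `match_by`, `alarm_actions`) are given here as assumptions to be checked against the API specification.

```python
import json

SEVERITIES = ("LOW", "MEDIUM", "HIGH", "CRITICAL")


def alarm_definition(name, expression, severity="LOW",
                     match_by=(), alarm_actions=()):
    """Return a dict suitable for JSON-encoding as a POST body (sketch)."""
    if severity not in SEVERITIES:
        raise ValueError(f"severity must be one of {SEVERITIES}")
    return {
        "name": name,
        "expression": expression,
        "severity": severity,
        # One alarm per distinct value of these dimensions (assumed semantics).
        "match_by": list(match_by),
        # IDs of notification methods to invoke on transition to ALARM.
        "alarm_actions": list(alarm_actions),
    }


body = alarm_definition(
    "high cpu or disk reads",
    "avg(cpu.user_perc) > 85 or avg(disk.read_ops{device=vda}, 120) > 1000",
    severity="HIGH",
    match_by=["hostname"],
)
payload = json.dumps(body)
```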

List Alarms

GET /v2.0/alarms

Query parameters
• metric_name: name of metric to filter by
• metric_dimensions
• state: OK, ALARM or UNDETERMINED
• severity: one or more severities to filter by, separated with |, e.g. severity=LOW|MEDIUM
• state_updated_start_time: the start time in ISO 8601 combined date and time format in UTC
• offset, limit
• sort_by

Alarms

GET, PUT, PATCH, DELETE /v2.0/alarms/{alarm-id}

• Alarms are created by the Threshold Engine based on matching alarm definitions
• When new nodes or components are deployed, alarms are automatically created
• Alarms are resources within Monasca. They have a resource ID and lifecycle
• By default, three states: OK, ALARM and UNDETERMINED
  • The UNDETERMINED state occurs when metrics are no longer being received
• Deterministic alarms, two states: OK and ALARM
  • Used for systems where metrics are sporadic, e.g. creating metrics when errors occur in log files and no metrics when there aren't any errors
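The difference between default and deterministic alarms can be sketched with a single-threshold evaluation. This is a simplification under assumptions: the function `evaluate` is hypothetical, it only handles an `avg(...) > threshold` style sub-expression, and treating "no measurements" as OK for deterministic alarms is this sketch's reading of the two-state description above.

```python
from statistics import mean
from typing import Sequence


def evaluate(values: Sequence[float], threshold: float,
             deterministic: bool = False) -> str:
    """Return the alarm state for one evaluation window (illustrative)."""
    if not values:
        # No metrics received in the window: a default alarm becomes
        # UNDETERMINED, a deterministic alarm is assumed to stay OK.
        return "OK" if deterministic else "UNDETERMINED"
    return "ALARM" if mean(values) > threshold else "OK"
```

With a log-error metric that is only emitted when errors occur, `deterministic=True` keeps the alarm from flapping to UNDETERMINED during quiet periods.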

Alarm Counts

GET /v2.0/alarms/count

• Query the total number of alarms in the OK, ALARM or UNDETERMINED state and their severities, grouped by metric dimension (such as OpenStack service), state and severity
• Used for summary dashboards

Example: Helion Ops Console

Alarm History

GET /v2.0/alarms/state-history

• Lists the alarm state history for alarms
• Query parameters
  • dimensions to filter on
  • start/end timestamp
  • offset, limit

GET /v2.0/alarms/{alarm-id}/state-history

• Lists the alarm state history for a specific alarm

Notification Methods

POST, GET, DELETE /v2.0/notification-methods

Notification methods are associated with actions in alarm definitions

Example:

POST /v2.0/notification-methods

{
  "name": "Name of notification method",
  "type": "EMAIL",
  "address": "john.doe@hp.com"
}

Monasca Agent

• System metrics (cpu, memory, network, filesystem, …)
• Service metrics
  • MySQL, Kafka and many others
• Application metrics
  • Built-in StatsD daemon
  • Python monasca-statsd library: adds support for dimensions
• VM system metrics
• Open vSwitch metrics
• Active checks
  • HTTP status checks and response times
  • System up/down checks (ping and ssh)
• Runs any Nagios plugin or check_mk
• Extensible/pluggable: additional services can be easily added

Agent details

• The Agent Forwarder buffers metrics for a short time to increase the size of the HTTP request body (number of metrics) sent to the Monasca API
• The Agent requests an auth token from the Keystone Identity service, which is supplied on all requests
• The Monasca Agent and API cache auth tokens in memory to reduce the round-trip authorization requests to Keystone
• If network connectivity between the Agent and API is lost, the Agent will buffer metrics and send them when connectivity is restored
• Metrics are submitted using an "agent" role, which only allows metrics to be POSTed to the metrics endpoint
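The buffering behaviour of the Agent Forwarder can be sketched as below. This is not the agent's code: the class `Forwarder`, the injected `post` callable and the bounded buffer size are hypothetical, chosen only to show metrics being held while the API is unreachable and sent as one batch once it is reachable again.

```python
from collections import deque


class Forwarder:
    """Buffer metrics and send them in batches (illustrative sketch)."""

    def __init__(self, post, max_buffered=1000):
        # post: hypothetical callable taking a list of metrics and
        # returning True when the API accepted the batch.
        self._post = post
        # Bounded buffer: the oldest metrics are discarded when full
        # (an assumption of this sketch).
        self._buffer = deque(maxlen=max_buffered)

    def add(self, metric):
        self._buffer.append(metric)

    def flush(self):
        """Send everything buffered; keep it all if the send fails."""
        if not self._buffer:
            return 0
        batch = list(self._buffer)
        if not self._post(batch):
            return 0
        self._buffer.clear()
        return len(batch)
```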

Grafana/Monasca Integration

• Datasource: a datasource that can be added to the Grafana dashboard to enable Monasca
  • https://github.com/openstack/monasca-grafana-datasource
• Keystone authentication
  • https://github.com/twc-openstack/grafana
• Support for alerting will be added in Grafana 4

Grafana Monasca Data Source

Logging Architecture

Logging API

• POST /v3.0/logs
• Batch log messages in a single HTTP request
• Global, local and mixed dimensions
  • Similar to dimensions in metrics
• JSON only
• Specification
  • https://github.com/openstack/monasca-log-api/blob/master/docs/monasca-log-api-spec.md
• Queries are not done via the API but via a tenantized version of Kibana
  • https://github.com/FujitsuEnablingSoftwareTechnologyGmbH/fts-keystone

Log Model

{
  "dimensions": {
    "hostname": "devstack",
    "service": "monitoring",
    "component": "monasca-api"
  },
  "logs": [
    {
      "message": "msg1",
      "dimensions": {
        "service": "compute",
        "component": "nova-api",
        "path": "/var/log/mysql.log"
      }
    },
    {
      "message": "msg2",
      "dimensions": {
        "path": "/var/log/monasca/monasca-api.log"
      }
    }
  ]
}
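How global and local dimensions in the log model combine can be sketched as a simple merge. The function `effective_dimensions` is hypothetical, and the rule that a local dimension overrides a global one with the same key is an assumption of this sketch, to be confirmed against the log API specification.

```python
def effective_dimensions(body):
    """Return one merged dimensions dict per log entry (illustrative)."""
    global_dims = body.get("dimensions", {})
    # Later keys win in a dict merge, so local values take precedence.
    return [{**global_dims, **entry.get("dimensions", {})}
            for entry in body["logs"]]


request_body = {
    "dimensions": {"hostname": "devstack", "service": "monitoring",
                   "component": "monasca-api"},
    "logs": [
        {"message": "msg1",
         "dimensions": {"service": "compute", "component": "nova-api"}},
        {"message": "msg2",
         "dimensions": {"path": "/var/log/monasca/monasca-api.log"}},
    ],
}
merged = effective_dimensions(request_body)
```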

Log Agents

• Logstash
  • https://github.com/logstash-plugins/logstash-output-monasca_log_api/pull/1
• Beaver
  • https://github.com/python-beaver/python-beaver/pull/406
• Logspout: under investigation

Kibana Integration

• Keystone authentication support for Kibana
• Authentication plugin
  • https://github.com/FujitsuEnablingSoftwareTechnologyGmbH/fts-keystone
  • Note: in the process of moving to an official OpenStack repo

Composability: Logging/Metrics

Transform and Analytics Engine

Monasca Transform

• A new micro-service in Monasca that aggregates and transforms metrics
• Currently based on Apache Spark Streaming
• Use cases
  • Object Storage Disk Capacity
  • Object Storage Capacity
  • Compute Host Capacity
  • VM Capacity
  • More to come
• Metrics are aggregated and published every hour
• Currently in deployment in HPE Helion OpenStack 4.0
• OpenStack project/repo
  • https://github.com/openstack/monasca-transform

Monasca Analytics

• A framework that adds data science tools (parsers, algorithms, etc.)
• Features include
  • Algorithmic flow definition, enabling sharing of complex algorithmic recipes
  • Thin orchestration layer that instantiates an execution environment
• Focused on
  • Anomaly detection
  • Reducing alert fatigue via alarm clustering (unsupervised machine learning)
• Example algorithms: One-Class SVM and LiNGAM
• Status: under development
• OpenStack project/repo
  • https://github.com/openstack/monasca-analytics

Distributions & Deployments

• Charter Communications
  • Monasca and Grafana are currently deployed in a production private cloud
  • Monitoring-as-a-Service use cases supported, with Grafana as the visualization dashboard
  • 2 datacenters, 600-700 compute nodes, 1000 VMs, 11000 metrics/sec
• FIWARE Lab
  • http://superuser.openstack.org/articles/monitoring-a-multi-region-cloud-based-on-openstack
• Hewlett Packard Enterprise Cloud System, Helion OpenStack
  • Supported and tested up to 65K metrics/sec ingest rates
• Fujitsu
  • FUJITSU Software ServerView Cloud Monitoring Manager
• NEC
  • Planning to include Monasca in Cloud Solution Menus solution
• Others

Statistics: Mitaka/Newton Release

• Organizations: 31
• Contributors: 97
• Commits: 1075
• Reviews: 4080
• Lines of code: 215370

Ecosystem

• Hewlett Packard Enterprise
• Fujitsu
• Charter Communications
• NEC
• Cisco
• Cloudbase Solutions
• SUSE
• SolidFire
• SAP
• Cray Inc
• FIWARE Lab
• Mirantis
• Broadcom

Containers and Kubernetes

• New Monasca Agent plugins
  • Docker plugin
  • cAdvisor plugin
  • Kubernetes plugin: monitors both Kubernetes control plane and containers
  • Prometheus client plugin: scrapes apps
  • Mesos plugin
• Containerization of Monasca
• Heapster Monasca data sink

Next Steps

• Containerizing Monasca
• Monitoring containers and container managers such as Kubernetes
• Grouping notifications


Microservices

• Microservices are small, autonomous, decoupled services that are deployed independently and work together as a single application
• Communication between services occurs via a network
• Services need to be able to change independently of each other and be deployed by themselves without requiring consumers to change
• Benefits
  • Resilience
  • Scale
  • Ease of deployment
  • Organizational alignment
  • Optimized for change/replaceability

POST Metrics Sequence

Domain Events Sequence

Deployment Models (HA/Scale)

• Many ways to deploy Monasca
• Typically deployed in a clustered/HA configuration using three or more nodes
• If any node or microservice fails, the cluster remains operational
• Partitions in Kafka are redistributed among the remaining components
• Preferably, the database is run on a separate layer from the other components/microservices
• Note: Monasca can also be deployed on a single, non-clustered node
• Has also been containerized and run in Kubernetes
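The redistribution of Kafka partitions after a failure can be pictured with a round-robin assignment over the consumers that are still alive. This is only a conceptual sketch: `assign` and the consumer names are hypothetical, and Kafka's own assignment strategies are more involved than this.

```python
def assign(partitions, consumers):
    """Spread partitions over consumers round-robin (illustrative)."""
    if not consumers:
        raise ValueError("at least one live consumer is required")
    ordered = sorted(consumers)
    assignment = {consumer: [] for consumer in ordered}
    for index, partition in enumerate(sorted(partitions)):
        assignment[ordered[index % len(ordered)]].append(partition)
    return assignment


# Three hypothetical persister instances share six partitions ...
before = assign(range(6), ["persister-1", "persister-2", "persister-3"])
# ... and the two survivors pick up the load when one of them fails.
after = assign(range(6), ["persister-1", "persister-3"])
```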

Metrics Model

POST /v2.0/metrics

{
  "name": "http_status",
  "dimensions": {
    "url": "http://host.domain.com:1234/service",
    "cluster": "c1",
    "control_plane": "ccp",
    "service": "compute"
  },
  "timestamp": 0,
  "value": 1.0,
  "value_meta": {
    "status_code": 500,
    "msg": "Internal server error"
  }
}

The timestamp is in milliseconds.

• Simple, concise, multi-dimensional, flexible description
  • Name (string)
  • Dimensions: dictionary of user-defined (key, value) pairs that are used to uniquely identify a metric
  • Value meta: optional dictionary of user-defined (key, value) pairs that can be used to describe a measurement
    • Normally used for errors and messages
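Posting a metric in this model could be prepared as in the following sketch, which builds the HTTP request without sending it. The helper `build_metric_request`, the endpoint host and the token value are placeholders, and passing the Keystone token in an `X-Auth-Token` header is stated here as an assumption to verify against the API specification.

```python
import json
import time
import urllib.request


def build_metric_request(api_url, token, name, dimensions, value,
                         value_meta=None, timestamp_ms=None):
    """Build (but do not send) a POST request for one metric (sketch)."""
    metric = {
        "name": name,
        "dimensions": dimensions,
        # Timestamps are in milliseconds; default to the current time.
        "timestamp": int(time.time() * 1000) if timestamp_ms is None else timestamp_ms,
        "value": value,
    }
    if value_meta:
        metric["value_meta"] = value_meta
    return urllib.request.Request(
        url=f"{api_url}/v2.0/metrics",
        data=json.dumps(metric).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-Auth-Token": token},
        method="POST",
    )


request = build_metric_request(
    "http://monasca.example.invalid:8070", "placeholder-token",
    "http_status", {"service": "compute", "cluster": "c1"}, 1.0,
    value_meta={"status_code": "500", "msg": "Internal server error"},
    timestamp_ms=0,
)
```

Sending the request would be a matter of passing it to `urllib.request.urlopen`, which is left out so that the sketch has no network dependency.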

Push vs Pull

• Monitoring-as-a-Service
  • Can't always pull due to firewalls and network issues
• Low latency: sub-second latency is difficult for the pull model
• Doesn't require service discovery and registration
  • As entities are deployed they can start sending metrics without having to be discovered or registered
• Events
• Temporary caching/buffering of metrics/events while the service is unreachable

Monasca API

• Primary point for pushing metrics and handling queries
• Authenticates all requests against the Keystone identity service
  • Note: auth tokens are cached to reduce the load on Keystone
• Resources: Metrics, Alarm Definitions, Alarms and Notification Methods
• API specification
  • https://github.com/openstack/monasca-api/tree/master/docs
• Horizontally scalable
• Publishes metrics to Kafka
• Queries the time-series DB for measurements and statistics
• Queries the Config DB for alarms, alarm definitions and notification methods

Persister

• Consumes both metrics and alarm state transition events from Kafka
• Stores them temporarily in memory and does batch writes to the TSDB, based on batch size or time, to optimize write performance
• At-least-once message delivery semantics
  • No metrics or alarm state transition events are lost
  • The Kafka consumer offset for each batch is only updated after successfully storing the metric or alarm state transition event
  • Note: duplicates are possible
• HA/fault-tolerance
  • Multiple persisters run simultaneously and balance load
  • If a persister fails, the load is automatically re-balanced across the remaining persisters
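The at-least-once behaviour of the Persister comes from committing the consumer offset only after the batch has been stored. The sketch below shows that ordering with injected callables in place of a real Kafka consumer and TSDB client; `BatchingPersister`, `write_batch` and `commit_offset` are hypothetical names, and the time-based flush mentioned above is left to the caller, who would invoke `flush()` from a timer.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class BatchingPersister:
    write_batch: Callable[[List[dict]], None]   # stores a batch, raises on failure
    commit_offset: Callable[[int], None]        # records the next offset to read
    max_batch_size: int = 100
    buffer: List[Tuple[int, dict]] = field(default_factory=list, init=False)

    def handle(self, offset: int, payload: dict) -> None:
        """Buffer one consumed message and flush when the batch is full."""
        self.buffer.append((offset, payload))
        if len(self.buffer) >= self.max_batch_size:
            self.flush()

    def flush(self) -> None:
        if not self.buffer:
            return
        # Write first. If this raises, the offset is never committed and the
        # messages stay buffered, so they are retried instead of being lost.
        self.write_batch([payload for _, payload in self.buffer])
        # Commit only after a successful write. A crash between the two steps
        # causes the batch to be re-read, which is why duplicates are possible.
        self.commit_offset(self.buffer[-1][0] + 1)
        self.buffer.clear()
```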


Page 12: Monasca - NETWAYS...... What every software engineer should know about real-time data's unifying ... systems/log-what-every-software-engineer-should-know-about

POST Metrics Sequence

Domain Events Sequence

Deployment Models (HAScale)

bull Many ways to deploy Monasca

bull Typically deployed in a clusteredHA configuration using three nodes or greater

bull If any node or microservice fails the cluster remains operational

bull Partitions in Kafka are redistributed among the remaining components

bull Preferably the database is run on a separate layer from the other componentsmicroservices

bull Note Monasca can also be deployed on a single-node non-clustered

bull Has also been containerized and run in Kubernetes

Metrics ModelPOST v20metrics

name http_statusdimensions

url httphostdomaincom1234servicecluster c1control_plane ccpservice compute

timestamp 0 milliseconds value 10value_meta

status_code 500msg Internal server error

bull Simple concise multi-dimensional flexible descriptionbull Name (string)bull Dimensions Dictionary of user-defined (key value)

pairs that are used to uniquely identify a metric

bull Optional dictionary of user-defined (key value) pairs that can be used to describe a measurement

bull Normally used for errors and messages

Push vs Pull

bull Monitoring-as-a-Servicebull Cant always pull due to firewalls and network issues

bull Low-latency sub-second latency difficult for pull model

bull Doesnt require service discovery and registrationbull As entities are deployed they can start sending metrics without have to be

discovered or registered

bull Events

bull Temporary cachingbuffering of metricsevents while service unreachable

Monasca API

bull Primary point for pushing metrics and handling queries

bull Authenticates all requests against the Keystone identity servicebull Note auth tokens are cached to reduce the load on Keystone

bull Resources Metrics Alarm Definitions Alarms and Notification Methods

bull API Specificationbull httpsgithubcomopenstackmonasca-apitreemasterdocs

bull Horizontally scalable

bull Publishes metrics to Kafka

bull Queries timeseries DB for measurements and statistics

bull Queries Config DB for alarms alarm definitions and notification methods

Persister

bull Consumes both metrics and alarm state transition events from Kafka

bull Stores temporarily in-memory and does batch writes to the TSDB based on batch size or time to optimize write performance

bull At-least once message delivery semanticsbull No metrics or alarm state transition events are lostbull The Kafka consumer offset for each batch is only updated after successfully storing

the metric or alarm state transition eventbull Note duplicates are possible

bull HAfault-tolerancebull Multiple persisters run simultaneously and balance loadbull If a persister fails the load is automatically re-balanced across the remaining

persisters

Time Series Databases

bull Used for storingbull Metricsbull Alarm state history

bull Two databases supported1 Vertica

bull Enterprise class proprietary closed-source clustered HA analytics databasebull Excels at time-series

2 InfluxDBbull Open-source single-node time-series DBbull Clustering is closed-sourcebull Note can replicate to multiple instances of InfluxDB using Kafka

bull Investigating support for additional databases

Config Database

bull Stores all transactional data for Monasca such asbull Alarm Definitions

bull Alarms

bull Notification Methods

bull MySQL and Postgres supported

bull Typically deployed in a clustered or HA configuration

Threshold Engine

bull Near real-time stream processing clustered and highly available threshold engine

bull Based on Apache Storm

bull Consumes metrics from Kafka

bull Creates alarms based on metrics that match patterns specified in the alarm definition

bull Evaluates whether metrics exceed threshold

bull Publishes alarm state transition events to Kafka

bull Supports both simple and compound alarm expressions

Notification Engine

bull Consumes alarm state transition events from Kafka produced by the Threshold Engine

bull Evaluates whether notifications should be sent based on actions specified in the alarm definition

bull OK ALARM and UNDETERMINED actions

bull Supports email PagerDuty webhooks HipChat Slack and JIRAbull Dynamic plugins supportedbull Supports both one-shot and periodic notificationsbull If sending to the notification address fails then notification is published to

retry topic in Kafka and retried laterbull Grouping notifications In progress

Kafka Message Schema

bull JSON messages publishedconsumed tofrom Kafka by Monasca micro-services

bull Well-defined schema is published atbull httpswikiopenstackorgwikiMonascaMessage_Schema

Metrics

Create query and get statistics for metrics

bull GET POST v20metrics

bull GET v20metricsnamesbull Returns the unique metric names

bull GET v20metricsdimensionnamesbull Returns the unique dimension names

bull GET v20metricsdimensionnamesvaluesbull Returns the unique dimension values

Measurements

GET v20metricsmeasurements

bull Returns a list of measurements

bull Query parametersbull Name and dimensions to filter by

bull Start_time and end_time

bull Offset and limit

bull merge_metrics allow multiple metrics to be combined into a single list of measurements

bull group_by list of columns to group the metrics to be returned Allows multiple unique metrics to be returned in a single query

Statistics

GET v20metricsstatistics

bull Query parametersbull Name and dimensions to filter bybull Start_time and end_timebull Statistics avg min max sum and countbull Period The time period to aggregate measurements bybull Offset limitbull merge_metrics allow multiple metrics to be combined into a single list

of statisticsbull group_by list of columns to group the metrics to be returned Allows

multiple unique metrics to be returned in a single query

Metrics Names

GET v20metricsnames

bull Returns a list of the unique metric names

bull Query parametersbull Dimensions

bull Offset limit

Metric Dimension Names

GET v20metricsdimensionsnames

bull List the dimension names

bull Query parametersbull Metric name

bull Offset limit

Metric Dimension Values

GET v20metricsdimensionsnamesvalues

bull List the dimension values

bull Query parametersbull Metric name

bull Dimension name

bull Offset limit

Alarm Definitions

POST GET v20alarm-definitions

bull Alarm definitions are templates that are used to automatically and dynamically create alarms based on matching metric names and dimensions

bull One alarm definition can result in zero or more alarms

bull Simple grammar for creating compound alarm expressionsbull avg(cpuuser_perc) gt 85 or avg(diskread_opsdevice=vda 120) gt 1000

bull Alarm states (OK ALARM and UNDETERMINED)

bull Actions associated with alarms for state transitions

bull User assigned severity (LOW MEDIUM HIGH CRITICAL)

bull Thresholds can be dynamically adjusted via PATCH

bull Minimal lifecycle management alarm_lifecycle_state and link

List Alarms

GET v20alarmsQuery parametersbull metric_name - Name of metric to filter bybull metric_dimensionsbull State OK ALARM or UNDETERMINEDbull Severity One or more severities to filter by separated with |

ex severity=LOW|MEDIUMbull state_updated_start_time The start time in ISO 8601 combined date and

time format in UTCbull Offset limitbull sort_by

Alarms

GET PUT PATCH DELETE v20alarmsalarm-id

bull Alarms created by the Threshold Engine based on matching alarm definitions

bull When new nodes or components are deployed alarms are automatically created

bull Alarms are resources within Monasca They have a resource ID and lifecycle

bull By default three states OK ALARM and UNDETERMINEDbull UNDETERMINED state occurs when metrics are no longer being received

bull Deterministic alarms two states OK and ALARMbull Used for systems where metrics are sporadic Eg Creating metrics when errors in log

files occur and no metrics when there arent any errors

Alarm Counts

GET v20alarmscount

bull Query the total number of alarms in the OK ALARM or UNDETERMINED state and their severities grouped by metrics dimension such as OpenStack service state and severity

bull Used for summary dashboards

Example Helion Ops Console

Alarm History

GET /v2.0/alarms/state-history

• Lists the alarm state history for alarms
• Query parameters
  • Dimensions to filter on
  • Start/end timestamp
  • Offset, limit

GET /v2.0/alarms/{alarm-id}/state-history

• Lists the alarm state history for a specific alarm

Notification Methods

POST, GET, DELETE /v2.0/notification-methods

Notification methods are associated with Actions in alarm definitions

Example

POST /v2.0/notification-methods
{
  "name": "Name of notification method",
  "type": "EMAIL",
  "address": "john.doe@hp.com"
}

Monasca Agent

• System metrics (cpu, memory, network, filesystem, …)
• Service metrics
  • MySQL, Kafka and many others
• Application metrics
  • Built-in Statsd daemon
  • Python monasca-statsd library: Adds support for dimensions
• VM system metrics
• Open vSwitch metrics
• Active checks
  • HTTP status checks and response times
  • System up/down checks (ping and ssh)
• Runs any Nagios plugin or check_mk
• Extensible/Pluggable: Additional services can be easily added

Agent details

• The Agent Forwarder buffers metrics for a short time to increase the size of the HTTP request body (number of metrics) sent to the Monasca API
• The Agent requests an auth token from the Keystone Identity service, which is supplied on all requests
• The Monasca Agent and API cache auth tokens in memory to reduce the round-trip authorization requests to Keystone
• If network connectivity between the Agent and API is lost, the Agent will buffer metrics and send them when connectivity is restored
• Metrics are submitted using an "agent" role, which only allows metrics to be POST'd to the metrics endpoint
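The buffering behaviour described above can be sketched as follows. The `Forwarder` class, its `max_batch` parameter and the `fake_send` stand-in for the HTTP POST are all hypothetical; the real agent's implementation may differ.

```python
class Forwarder:
    """Buffers metrics and sends them in batches; keeps them on failure."""

    def __init__(self, send, max_batch=100):
        self._send = send            # callable taking a list of metrics
        self._max_batch = max_batch
        self._buffer = []

    def add(self, metric):
        self._buffer.append(metric)

    @property
    def pending(self):
        return len(self._buffer)

    def flush(self):
        """Send buffered metrics batch by batch; stop at the first failure."""
        sent = 0
        while self._buffer:
            batch = self._buffer[:self._max_batch]
            try:
                self._send(batch)
            except ConnectionError:
                break                # API unreachable: keep metrics buffered
            del self._buffer[:len(batch)]
            sent += len(batch)
        return sent


api_up = {"ok": False}
received = []


def fake_send(batch):
    # Stand-in for POSTing a batch to the metrics endpoint.
    if not api_up["ok"]:
        raise ConnectionError("API unreachable")
    received.extend(batch)


forwarder = Forwarder(fake_send, max_batch=2)
for i in range(3):
    forwarder.add({"name": "cpu.user_perc", "value": float(i)})

sent_while_down = forwarder.flush()   # nothing is delivered or lost
api_up["ok"] = True
sent_after_restore = forwarder.flush()
print(sent_while_down, sent_after_restore, forwarder.pending)
```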

Grafana/Monasca Integration

• Datasource: A datasource that can be added to the Grafana dashboard to enable Monasca
  • https://github.com/openstack/monasca-grafana-datasource
• Keystone authentication
  • https://github.com/twc-openstack/grafana
• Support for Alerting will be added in Grafana 4

Grafana Monasca Data Source

Logging Architecture

Logging API

• POST /v3.0/logs
• Batch log messages in a single HTTP request
• Global, local and mixed dimensions
  • Similar to dimensions in metrics
• JSON only
• Specification
  • https://github.com/openstack/monasca-log-api/blob/master/docs/monasca-log-api-spec.md
• Queries are not done via the API but via a tenantized version of Kibana
  • https://github.com/FujitsuEnablingSoftwareTechnologyGmbH/fts-keystone

Log Model

{
  "dimensions": {
    "hostname": "devstack",
    "service": "monitoring",
    "component": "monasca-api"
  },
  "logs": [
    {
      "message": "msg1",
      "dimensions": {
        "service": "compute",
        "component": "nova-api",
        "path": "/var/log/mysql.log"
      }
    },
    {
      "message": "msg2",
      "dimensions": {
        "path": "/var/log/monasca/monasca-api.log"
      }
    }
  ]
}
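A sketch of how global and local dimensions in the log model might be combined per message. It assumes that local dimensions override global ones on a key conflict; the slide does not state the precedence, so this should be confirmed in the log API specification. The helper `effective_dimensions` is hypothetical.

```python
def effective_dimensions(request):
    """Merge request-level (global) dimensions into each log entry."""
    global_dims = request.get("dimensions", {})
    merged_logs = []
    for log in request["logs"]:
        # Assumption: local keys win over global keys with the same name.
        dims = {**global_dims, **log.get("dimensions", {})}
        merged_logs.append({"message": log["message"], "dimensions": dims})
    return merged_logs


request = {
    "dimensions": {"hostname": "devstack", "service": "monitoring",
                   "component": "monasca-api"},
    "logs": [
        {"message": "msg1",
         "dimensions": {"service": "compute", "component": "nova-api"}},
        {"message": "msg2", "dimensions": {"path": "/var/log/example.log"}},
    ],
}

merged = effective_dimensions(request)
for entry in merged:
    print(entry)
```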

Log Agents

• Logstash
  • https://github.com/logstash-plugins/logstash-output-monasca_log_api/pull/1
• Beaver
  • https://github.com/python-beaver/python-beaver/pull/406
• Logspout: Under investigation

Kibana Integration

• Keystone authentication support for Kibana
• Authentication plugin
  • https://github.com/FujitsuEnablingSoftwareTechnologyGmbH/fts-keystone
• Note: In the process of moving to the official OpenStack repo

Composability: Logging/Metrics

Transform and Analytics Engine

Monasca Transform

• A new micro-service in Monasca that aggregates and transforms metrics
• Currently based on Apache Spark Streaming
• Use cases
  • Object Storage Disk Capacity
  • Object Storage Capacity
  • Compute Host Capacity
  • VM Capacity
  • More to come
• Metrics are aggregated and published every hour
• Currently in deployment in HPE Helion OpenStack 4.0
• OpenStack project/repo
  • https://github.com/openstack/monasca-transform

Monasca Analytics

• A framework that adds data science tools (parsers, algorithms, etc.)
• Features include
  • Algorithmic flow definition, enabling sharing of complex algorithmic recipes
  • Thin orchestration layer that instantiates an execution environment
• Focused on
  • Anomaly detection
  • Reducing alert fatigue via alarm clustering (unsupervised machine learning)
• Example algorithms: One-Class SVM and LiNGAM
• Status: Under development
• OpenStack project/repo
  • https://github.com/openstack/monasca-analytics

Distributions & Deployments

• Charter Communications
  • Monasca and Grafana are currently deployed in a production private cloud
  • Monitoring-as-a-Service use cases supported with Grafana as the visualization dashboard
  • 2 datacenters, 600-700 compute nodes, 1000 VMs, 11000 metrics/sec
• FIWARE Lab
  • http://superuser.openstack.org/articles/monitoring-a-multi-region-cloud-based-on-openstack
• Hewlett Packard Enterprise: Cloud System, Helion OpenStack
  • Supported and tested up to 65K metrics/sec ingest rates
• Fujitsu
  • FUJITSU Software ServerView Cloud Monitoring Manager
• NEC
  • Planning to include Monasca in Cloud Solution Menus solution
• Others

Statistics: Mitaka/Newton Release

• Organizations: 31
• Contributors: 97
• Commits: 1075
• Reviews: 4080
• Lines of code: 215370

Ecosystem

• Hewlett Packard Enterprise
• Fujitsu
• Charter Communications
• NEC
• Cisco
• Cloudbase Solutions
• SUSE
• SolidFire
• SAP
• Cray Inc
• FIWARE Lab
• Mirantis
• Broadcom

Containers and Kubernetes

• New Monasca Agent plugins
  • Docker plugin
  • cAdvisor plugin
  • Kubernetes plugin: Monitors both Kubernetes control plane and containers
  • Prometheus client plugin: Scrapes apps
  • Mesos plugin
• Containerization of Monasca
• Heapster Monasca data sink

Next Steps

• Containerizing Monasca
• Monitoring containers and container managers such as Kubernetes
• Grouping notifications

Domain Events Sequence

Deployment Models (HA/Scale)

• Many ways to deploy Monasca
• Typically deployed in a clustered/HA configuration using three nodes or greater
  • If any node or microservice fails, the cluster remains operational
  • Partitions in Kafka are redistributed among the remaining components
• Preferably the database is run on a separate layer from the other components/microservices
• Note: Monasca can also be deployed on a single node, non-clustered
• Has also been containerized and run in Kubernetes

Metrics Model

POST /v2.0/metrics
{
  "name": "http_status",
  "dimensions": {
    "url": "http://host.domain.com:1234/service",
    "cluster": "c1",
    "control_plane": "ccp",
    "service": "compute"
  },
  "timestamp": 0, /* milliseconds */
  "value": 1.0,
  "value_meta": {
    "status_code": 500,
    "msg": "Internal server error"
  }
}

• Simple, concise, multi-dimensional, flexible description
  • Name (string)
  • Dimensions: Dictionary of user-defined (key, value) pairs that are used to uniquely identify a metric
• Value meta: Optional dictionary of user-defined (key, value) pairs that can be used to describe a measurement
  • Normally used for errors and messages
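Since a metric is identified by its name plus its dimensions, measurements can be grouped by that pair while value_meta stays attached to the individual measurement. The sketch below illustrates the idea; `metric_id` and the sample data are hypothetical.

```python
from collections import defaultdict


def metric_id(name, dimensions):
    """Order-independent identity of a metric: name plus sorted dimensions."""
    return (name, tuple(sorted(dimensions.items())))


# Hypothetical posted metrics; value_meta describes one measurement only.
posted = [
    {"name": "http_status", "dimensions": {"service": "compute", "cluster": "c1"},
     "timestamp": 0, "value": 0.0},
    {"name": "http_status", "dimensions": {"cluster": "c1", "service": "compute"},
     "timestamp": 60000, "value": 1.0,
     "value_meta": {"status_code": 500, "msg": "Internal server error"}},
    {"name": "http_status", "dimensions": {"service": "storage", "cluster": "c1"},
     "timestamp": 0, "value": 0.0},
]

series = defaultdict(list)
for m in posted:
    series[metric_id(m["name"], m["dimensions"])].append(
        (m["timestamp"], m["value"], m.get("value_meta"))
    )

for key, measurements in series.items():
    print(key, len(measurements))
```

The first two entries differ only in key order and value_meta, so they land in the same series; the third has a different dimension value and forms its own.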

Push vs Pull

• Monitoring-as-a-Service
  • Can't always pull due to firewalls and network issues
• Low latency: sub-second latency is difficult for a pull model
• Doesn't require service discovery and registration
  • As entities are deployed they can start sending metrics without having to be discovered or registered
• Events
• Temporary caching/buffering of metrics/events while the service is unreachable

Monasca API

• Primary point for pushing metrics and handling queries
• Authenticates all requests against the Keystone identity service
  • Note: auth tokens are cached to reduce the load on Keystone
• Resources: Metrics, Alarm Definitions, Alarms and Notification Methods
• API specification
  • https://github.com/openstack/monasca-api/tree/master/docs
• Horizontally scalable
• Publishes metrics to Kafka
• Queries the time-series DB for measurements and statistics
• Queries the Config DB for alarms, alarm definitions and notification methods

Persister

• Consumes both metrics and alarm state transition events from Kafka
• Stores temporarily in memory and does batch writes to the TSDB, based on batch size or time, to optimize write performance
• At-least-once message delivery semantics
  • No metrics or alarm state transition events are lost
  • The Kafka consumer offset for each batch is only updated after successfully storing the metric or alarm state transition event
  • Note: duplicates are possible
• HA/fault-tolerance
  • Multiple persisters run simultaneously and balance load
  • If a persister fails, the load is automatically re-balanced across the remaining persisters
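The at-least-once behaviour above comes from committing the consumer offset only after a successful batch write. The sketch below simulates this without Kafka or a database; the `Persister` class, `flaky_write` and the in-memory `store` are illustrative stand-ins, and the time-based flush is omitted for brevity.

```python
class Persister:
    """Batches messages and commits the offset only after a successful write."""

    def __init__(self, write_batch, batch_size=3):
        self._write_batch = write_batch   # callable that may raise on failure
        self._batch_size = batch_size
        self._pending = []                # list of (offset, message)
        self.committed_offset = -1        # last offset known to be stored

    def on_message(self, offset, message):
        self._pending.append((offset, message))
        if len(self._pending) >= self._batch_size:
            self.flush()

    def flush(self):
        if not self._pending:
            return
        # If the write raises, the offset is NOT advanced, so the batch will
        # be processed again (possibly producing duplicates, never losses).
        self._write_batch([message for _, message in self._pending])
        self.committed_offset = self._pending[-1][0]
        self._pending.clear()


store = []
fail_once = {"armed": True}


def flaky_write(batch):
    # Stand-in for a TSDB batch insert that fails the first time.
    if fail_once["armed"]:
        fail_once["armed"] = False
        raise IOError("TSDB unavailable")
    store.extend(batch)


persister = Persister(flaky_write, batch_size=3)
try:
    for offset, message in enumerate(["m0", "m1", "m2"]):
        persister.on_message(offset, message)
except IOError:
    pass

offset_after_failure = persister.committed_offset
persister.flush()                         # retry the same batch
print(offset_after_failure, persister.committed_offset, store)
```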

Time Series Databases

• Used for storing
  • Metrics
  • Alarm state history
• Two databases supported
  1. Vertica
     • Enterprise-class, proprietary, closed-source, clustered, HA analytics database
     • Excels at time-series
  2. InfluxDB
     • Open-source, single-node time-series DB
     • Clustering is closed-source
     • Note: can replicate to multiple instances of InfluxDB using Kafka
• Investigating support for additional databases

Config Database

• Stores all transactional data for Monasca, such as
  • Alarm Definitions
  • Alarms
  • Notification Methods
• MySQL and Postgres supported
• Typically deployed in a clustered or HA configuration

Threshold Engine

• Near real-time, stream-processing, clustered and highly available threshold engine
• Based on Apache Storm
• Consumes metrics from Kafka
• Creates alarms based on metrics that match patterns specified in the alarm definition
• Evaluates whether metrics exceed the threshold
• Publishes alarm state transition events to Kafka
• Supports both simple and compound alarm expressions
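A simplified sketch of evaluating a compound expression made of sub-expressions joined by `or`. It works on pre-parsed tuples rather than the real grammar, and the rule used to combine sub-results (ALARM if any sub-expression alarms, otherwise UNDETERMINED if any has no data, otherwise OK) is an assumption for illustration, not a statement of the Storm-based engine's logic.

```python
import operator

OPERATORS = {">": operator.gt, ">=": operator.ge,
             "<": operator.lt, "<=": operator.le}
FUNCTIONS = {"avg": lambda v: sum(v) / len(v), "min": min,
             "max": max, "sum": sum, "count": len}


def evaluate_sub(sub_expression, measurements):
    """Return True/False, or None when no measurements were received."""
    function, metric, op, threshold = sub_expression
    values = measurements.get(metric, [])
    if not values:
        return None
    return OPERATORS[op](FUNCTIONS[function](values), threshold)


def evaluate_or(sub_expressions, measurements):
    results = [evaluate_sub(s, measurements) for s in sub_expressions]
    if any(r is True for r in results):
        return "ALARM"
    if any(r is None for r in results):
        return "UNDETERMINED"
    return "OK"


# Pre-parsed form of: avg(cpu.user_perc) > 85 or avg(disk.read_ops) > 1000
expression = [("avg", "cpu.user_perc", ">", 85),
              ("avg", "disk.read_ops", ">", 1000)]

print(evaluate_or(expression, {"cpu.user_perc": [80, 100], "disk.read_ops": [10]}))
print(evaluate_or(expression, {"cpu.user_perc": [10], "disk.read_ops": [10]}))
print(evaluate_or(expression, {"cpu.user_perc": [10]}))
```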

Notification Engine

• Consumes alarm state transition events from Kafka produced by the Threshold Engine
• Evaluates whether notifications should be sent based on actions specified in the alarm definition
  • OK, ALARM and UNDETERMINED actions
• Supports email, PagerDuty, webhooks, HipChat, Slack and JIRA
• Dynamic plugins supported
• Supports both one-shot and periodic notifications
• If sending to the notification address fails, the notification is published to a retry topic in Kafka and retried later
• Grouping notifications: In progress
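The retry behaviour can be sketched with two in-memory queues standing in for the notification and retry topics. The `deliver` function, the `max_retries` limit and the record layout are assumptions; the slide does not say how many retries are attempted.

```python
from collections import deque


def deliver(queue, retry_queue, send, max_retries=2):
    """Drain `queue`; failed notifications are re-queued on `retry_queue`."""
    delivered, dropped = [], []
    while queue:
        note = queue.popleft()
        try:
            send(note)
            delivered.append(note["id"])
        except ConnectionError:
            retries = note.get("retries", 0)
            if retries < max_retries:
                retry_queue.append({**note, "retries": retries + 1})
            else:
                dropped.append(note["id"])
    return delivered, dropped


failed_before = set()


def send(note):
    # Stand-in sender: "flaky" addresses fail on the first attempt only.
    if note["address"].startswith("flaky") and note["id"] not in failed_before:
        failed_before.add(note["id"])
        raise ConnectionError("temporary failure")


notifications = deque([
    {"id": 1, "address": "ok@example.com"},
    {"id": 2, "address": "flaky@example.com"},
])
retry_topic = deque()

first_pass = deliver(notifications, retry_topic, send)
queued_for_retry = [n["id"] for n in retry_topic]
second_pass = deliver(retry_topic, deque(), send)
print(first_pass, queued_for_retry, second_pass)
```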

Kafka Message Schema

• JSON messages published/consumed to/from Kafka by Monasca micro-services
• Well-defined schema is published at
  • https://wiki.openstack.org/wiki/Monasca/Message_Schema

Metrics

Create, query and get statistics for metrics

• GET, POST /v2.0/metrics
• GET /v2.0/metrics/names
  • Returns the unique metric names
• GET /v2.0/metrics/dimensions/names
  • Returns the unique dimension names
• GET /v2.0/metrics/dimensions/names/values
  • Returns the unique dimension values

Measurements

GET /v2.0/metrics/measurements

• Returns a list of measurements
• Query parameters
  • Name and dimensions to filter by
  • start_time and end_time
  • Offset and limit
  • merge_metrics: allows multiple metrics to be combined into a single list of measurements
  • group_by: list of columns to group the returned metrics by. Allows multiple unique metrics to be returned in a single query

Statistics

GET /v2.0/metrics/statistics

• Query parameters
  • Name and dimensions to filter by
  • start_time and end_time
  • Statistics: avg, min, max, sum and count
  • Period: The time period to aggregate measurements by
  • Offset, limit
  • merge_metrics: allows multiple metrics to be combined into a single list of statistics
  • group_by: list of columns to group the returned metrics by. Allows multiple unique metrics to be returned in a single query
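What the statistics and period parameters mean can be sketched as a client-side aggregation: measurements are bucketed into fixed periods and each requested function is applied per bucket. The `statistics` helper and the row layout are illustrative assumptions; the real endpoint computes this in the time-series database.

```python
FUNCTIONS = {
    "avg": lambda values: sum(values) / len(values),
    "min": min,
    "max": max,
    "sum": sum,
    "count": len,
}


def statistics(measurements, period, functions=("avg",)):
    """Aggregate (timestamp_seconds, value) pairs into `period`-second buckets.

    Returns rows of [bucket_start, stat1, stat2, ...] ordered by bucket start.
    """
    buckets = {}
    for timestamp, value in measurements:
        start = int(timestamp // period) * period
        buckets.setdefault(start, []).append(value)
    return [
        [start] + [FUNCTIONS[name](values) for name in functions]
        for start, values in sorted(buckets.items())
    ]


rows = statistics([(0, 1.0), (30, 3.0), (60, 5.0)], period=60,
                  functions=("avg", "count"))
print(rows)
```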

Metrics Names

GET /v2.0/metrics/names

• Returns a list of the unique metric names
• Query parameters
  • Dimensions
  • Offset, limit

Metric Dimension Names

GET /v2.0/metrics/dimensions/names

• Lists the dimension names
• Query parameters
  • Metric name
  • Offset, limit

Metric Dimension Values

GET /v2.0/metrics/dimensions/names/values

• Lists the dimension values
• Query parameters
  • Metric name
  • Dimension name
  • Offset, limit

Page 14: Monasca - NETWAYS...... What every software engineer should know about real-time data's unifying ... systems/log-what-every-software-engineer-should-know-about

Deployment Models (HAScale)

bull Many ways to deploy Monasca

bull Typically deployed in a clusteredHA configuration using three nodes or greater

bull If any node or microservice fails the cluster remains operational

bull Partitions in Kafka are redistributed among the remaining components

bull Preferably the database is run on a separate layer from the other componentsmicroservices

bull Note Monasca can also be deployed on a single-node non-clustered

bull Has also been containerized and run in Kubernetes

Metrics ModelPOST v20metrics

name http_statusdimensions

url httphostdomaincom1234servicecluster c1control_plane ccpservice compute

timestamp 0 milliseconds value 10value_meta

status_code 500msg Internal server error

bull Simple concise multi-dimensional flexible descriptionbull Name (string)bull Dimensions Dictionary of user-defined (key value)

pairs that are used to uniquely identify a metric

bull Optional dictionary of user-defined (key value) pairs that can be used to describe a measurement

bull Normally used for errors and messages

Push vs Pull

bull Monitoring-as-a-Servicebull Cant always pull due to firewalls and network issues

bull Low-latency sub-second latency difficult for pull model

bull Doesnt require service discovery and registrationbull As entities are deployed they can start sending metrics without have to be

discovered or registered

bull Events

bull Temporary cachingbuffering of metricsevents while service unreachable

Monasca API

bull Primary point for pushing metrics and handling queries

bull Authenticates all requests against the Keystone identity servicebull Note auth tokens are cached to reduce the load on Keystone

bull Resources Metrics Alarm Definitions Alarms and Notification Methods

bull API Specificationbull httpsgithubcomopenstackmonasca-apitreemasterdocs

bull Horizontally scalable

bull Publishes metrics to Kafka

bull Queries timeseries DB for measurements and statistics

bull Queries Config DB for alarms alarm definitions and notification methods

Persister

bull Consumes both metrics and alarm state transition events from Kafka

bull Stores temporarily in-memory and does batch writes to the TSDB based on batch size or time to optimize write performance

bull At-least once message delivery semanticsbull No metrics or alarm state transition events are lostbull The Kafka consumer offset for each batch is only updated after successfully storing

the metric or alarm state transition eventbull Note duplicates are possible

bull HAfault-tolerancebull Multiple persisters run simultaneously and balance loadbull If a persister fails the load is automatically re-balanced across the remaining

persisters

Time Series Databases

bull Used for storingbull Metricsbull Alarm state history

bull Two databases supported1 Vertica

bull Enterprise class proprietary closed-source clustered HA analytics databasebull Excels at time-series

2 InfluxDBbull Open-source single-node time-series DBbull Clustering is closed-sourcebull Note can replicate to multiple instances of InfluxDB using Kafka

bull Investigating support for additional databases

Config Database

bull Stores all transactional data for Monasca such asbull Alarm Definitions

bull Alarms

bull Notification Methods

bull MySQL and Postgres supported

bull Typically deployed in a clustered or HA configuration

Threshold Engine

bull Near real-time stream processing clustered and highly available threshold engine

bull Based on Apache Storm

bull Consumes metrics from Kafka

bull Creates alarms based on metrics that match patterns specified in the alarm definition

bull Evaluates whether metrics exceed threshold

bull Publishes alarm state transition events to Kafka

bull Supports both simple and compound alarm expressions

Notification Engine

bull Consumes alarm state transition events from Kafka produced by the Threshold Engine

bull Evaluates whether notifications should be sent based on actions specified in the alarm definition

bull OK ALARM and UNDETERMINED actions

bull Supports email PagerDuty webhooks HipChat Slack and JIRAbull Dynamic plugins supportedbull Supports both one-shot and periodic notificationsbull If sending to the notification address fails then notification is published to

retry topic in Kafka and retried laterbull Grouping notifications In progress

Kafka Message Schema

bull JSON messages publishedconsumed tofrom Kafka by Monasca micro-services

bull Well-defined schema is published atbull httpswikiopenstackorgwikiMonascaMessage_Schema

Metrics

Create query and get statistics for metrics

bull GET POST v20metrics

bull GET v20metricsnamesbull Returns the unique metric names

bull GET v20metricsdimensionnamesbull Returns the unique dimension names

bull GET v20metricsdimensionnamesvaluesbull Returns the unique dimension values

Measurements

GET v20metricsmeasurements

bull Returns a list of measurements

bull Query parametersbull Name and dimensions to filter by

bull Start_time and end_time

bull Offset and limit

bull merge_metrics allow multiple metrics to be combined into a single list of measurements

bull group_by list of columns to group the metrics to be returned Allows multiple unique metrics to be returned in a single query

Statistics

GET v20metricsstatistics

bull Query parametersbull Name and dimensions to filter bybull Start_time and end_timebull Statistics avg min max sum and countbull Period The time period to aggregate measurements bybull Offset limitbull merge_metrics allow multiple metrics to be combined into a single list

of statisticsbull group_by list of columns to group the metrics to be returned Allows

multiple unique metrics to be returned in a single query

Metrics Names

GET v20metricsnames

bull Returns a list of the unique metric names

bull Query parametersbull Dimensions

bull Offset limit

Metric Dimension Names

GET v20metricsdimensionsnames

bull List the dimension names

bull Query parametersbull Metric name

bull Offset limit

Metric Dimension Values

GET v20metricsdimensionsnamesvalues

bull List the dimension values

bull Query parametersbull Metric name

bull Dimension name

bull Offset limit

Alarm Definitions

POST GET v20alarm-definitions

bull Alarm definitions are templates that are used to automatically and dynamically create alarms based on matching metric names and dimensions

bull One alarm definition can result in zero or more alarms

bull Simple grammar for creating compound alarm expressionsbull avg(cpuuser_perc) gt 85 or avg(diskread_opsdevice=vda 120) gt 1000

bull Alarm states (OK ALARM and UNDETERMINED)

bull Actions associated with alarms for state transitions

bull User assigned severity (LOW MEDIUM HIGH CRITICAL)

bull Thresholds can be dynamically adjusted via PATCH

bull Minimal lifecycle management alarm_lifecycle_state and link

List Alarms

GET v20alarmsQuery parametersbull metric_name - Name of metric to filter bybull metric_dimensionsbull State OK ALARM or UNDETERMINEDbull Severity One or more severities to filter by separated with |

ex severity=LOW|MEDIUMbull state_updated_start_time The start time in ISO 8601 combined date and

time format in UTCbull Offset limitbull sort_by

Alarms

GET PUT PATCH DELETE v20alarmsalarm-id

bull Alarms created by the Threshold Engine based on matching alarm definitions

bull When new nodes or components are deployed alarms are automatically created

bull Alarms are resources within Monasca They have a resource ID and lifecycle

bull By default three states OK ALARM and UNDETERMINEDbull UNDETERMINED state occurs when metrics are no longer being received

bull Deterministic alarms two states OK and ALARMbull Used for systems where metrics are sporadic Eg Creating metrics when errors in log

files occur and no metrics when there arent any errors

Alarm Counts

GET v20alarmscount

bull Query the total number of alarms in the OK ALARM or UNDETERMINED state and their severities grouped by metrics dimension such as OpenStack service state and severity

bull Used for summary dashboards

Example Helion Ops Console

Alarm History

GET v20alarmsstate-history

bull Lists the alarm state history for alarms

bull Query Parametersbull Dimensions to filter on

bull Startend timestamp

bull Offset limit

GET v20alarmsalarm-idstate-history

bull Lists the alarm state history for a specific alarm

Notification Methods

POST GET DELETE v20notification-methods

Notification methods are associated with Actions in alarm definitions

Example

POST v20notification-methods

nameName of notification method

typeEMAIL

addressjohndoehpcom

Monasca Agent

bull System metrics (cpu memory network filesystem hellip)

bull Service metricsbull MySQL Kafka and many others

bull Application metricsbull Built-in Statsd daemonbull Python monasca-statsd library Adds support for dimensions

bull VM system metrics

bull Open vSwitch metrics

bull Active checksbull HTTP status checks and response timesbull System updown checks (ping and ssh)

bull Runs any Nagios plugin or check_mk

bull ExtensiblePluggable Additional services can be easily added

Agent details

bull The Agent Forwarder buffers metrics for a short time to increase the size of the http request body (number of metrics) sent to the Monasca API

bull The Agent request an auth token from the Keystone Identity service which is supplied on all requests

bull The Monasca Agent and API caches Monasca Agent and API caches Monasca Agent and API caches auth tokens in-memory to reduce the round-trip authorization requests to Keystone

bull If network connectivity between the Agent and API occurs the Agent will buffer metrics and send when connectivity is restored

bull Metrics are submitted using a ldquoagentrdquo role which only allows metrics to be POSTrsquod to the metrics endpoint

GrafanaMonasca Integration

bull Datasource A datasource that can be added to the Grafana dashboard to enable Monasca

bull httpsgithubcomopenstackmonasca-grafana-datasource

bull Keystone authenticationbull httpsgithubcomtwc-openstackgrafana

bull Support for Alerting will be added in Grafana 4

Grafana Monasca Data Source

Logging Architecture

Logging API

bull POST v30logs

bull Batch log messages in a single http request

bull Global local mixed dimensionsbull Similar to dimensions in metrics

bull JSON only

bull Specificationbull httpsgithubcomopenstackmonasca-log-apiblobmasterdocsmonasca-

log-api-specmd

bull Queries not done via API but via Tenantized version of Kibanabull httpsgithubcomFujitsuEnablingSoftwareTechnologyGmbHfts-keystone

Log Model

bull dimensions

hostnamedevstack

servicemonitoring

componentmonasca-api

logs[

messagemsg1

dimensions

servicecompute

componentnova-api

pathvarlogmysqllog

messagemsg2

dimensions

pathvarlogmonascamonasca-apilog

]

Log Agents

bull Logstashbull httpsgithubcomlogstash-pluginslogstash-output-monasca_log_apipull1

bull Beaverbull httpsgithubcompython-beaverpython-beaverpull406

bull Logspout Under Investigation

Kibana Integration

bull Keystone authentication support for Kibana

bull Authentication pluginbull httpsgithubcomFujitsuEnablingSoftwareTechnologyGmbHfts-keystone

bull Note In progress of moving to official OpenStack repo

Composabilty LoggingMetrics

Transform and Analytics Engine

Monasca Transform

bull A new micro-service in Monasca that aggregates and transforms metrics

bull Currently based on Apache Spark Streaming

bull Use Casesbull Object Storage Disk Capacity

bull Object Storage Capacity

bull Compute Host Capacity

bull VM Capacity

bull More to come

bull Metrics are aggregated and published every hour

bull Currently in deployment in HPE Helion OpenStack 40

bull OpenStack projectrepobull httpsgithubcomopenstackmonasca-transform

Monasca Analytics

• A framework that adds data science tools (parsers, algorithms, etc.)
• Features include
  • Algorithmic flow definition, enabling sharing of complex algorithmic recipes
  • Thin orchestration layer that instantiates an execution environment
• Focused on
  • Anomaly detection
  • Reducing alert fatigue via alarm clustering (unsupervised machine learning)
• Example algorithms: One-Class SVM and LiNGAM
• Status: Under development
• OpenStack project/repo
  • https://github.com/openstack/monasca-analytics

Distributions & Deployments

• Charter Communications
  • Monasca and Grafana are currently deployed in a production private cloud
  • Monitoring-as-a-Service use cases supported with Grafana as the visualization dashboard
  • 2 datacenters, 600-700 compute nodes, 1000 VMs, 11000 metrics/sec
• FIWARE Lab
  • http://superuser.openstack.org/articles/monitoring-a-multi-region-cloud-based-on-openstack
• Hewlett Packard Enterprise Cloud System, Helion OpenStack
  • Supported and tested up to 65K metrics/sec ingest rates
• Fujitsu
  • FUJITSU Software ServerView Cloud Monitoring Manager
• NEC
  • Planning to include Monasca in the Cloud Solution Menus solution
• Others

Statistics: Mitaka/Newton Release

• Organizations: 31
• Contributors: 97
• Commits: 1075
• Reviews: 4080
• Lines of code: 215370

Ecosystem

• Hewlett Packard Enterprise
• Fujitsu
• Charter Communications
• NEC
• Cisco
• Cloudbase Solutions
• SUSE
• SolidFire
• SAP
• Cray Inc.
• FIWARE Lab
• Mirantis
• Broadcom

Containers and Kubernetes

• New Monasca Agent plugins
  • Docker plugin
  • cAdvisor plugin
  • Kubernetes plugin: Monitors both the Kubernetes control plane and containers
  • Prometheus client plugin: Scrapes apps
  • Mesos plugin
• Containerization of Monasca
• Heapster Monasca data sink

Next Steps

• Containerizing Monasca
• Monitoring containers and container managers such as Kubernetes
• Grouping notifications

Metrics Model

POST /v2.0/metrics

    {
      "name": "http_status",
      "dimensions": {
        "url": "http://host.domain.com:1234/service",
        "cluster": "c1",
        "control_plane": "ccp",
        "service": "compute"
      },
      "timestamp": 0,
      "value": 10,
      "value_meta": {
        "status_code": 500,
        "msg": "Internal server error"
      }
    }

(timestamp is in milliseconds)

• Simple, concise, multi-dimensional, flexible description
  • Name (string)
  • Dimensions: Dictionary of user-defined (key, value) pairs that are used to uniquely identify a metric
  • Value meta: Optional dictionary of user-defined (key, value) pairs that can be used to describe a measurement
    • Normally used for errors and messages
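As a rough illustration of the model above, the sketch below builds a metric and derives its identity from the name plus dimensions, which is what the slide says uniquely identifies a metric. The helpers `make_metric` and `metric_identity` are hypothetical names, not part of any Monasca client.

```python
import time


def make_metric(name, dimensions, value, value_meta=None, timestamp_ms=None):
    """Build one metric dict in the shape shown on the slide."""
    metric = {
        "name": name,
        "dimensions": dict(dimensions),
        # Timestamps are in milliseconds.
        "timestamp": int(time.time() * 1000) if timestamp_ms is None else timestamp_ms,
        "value": value,
    }
    if value_meta:
        metric["value_meta"] = dict(value_meta)  # optional, describes this measurement
    return metric


def metric_identity(metric):
    """Name plus dimensions identify a metric; value and value_meta do not."""
    return (metric["name"], tuple(sorted(metric["dimensions"].items())))
```

Two measurements with the same name and dimensions but different values or `value_meta` therefore belong to the same metric.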

Push vs Pull

• Monitoring-as-a-Service
  • Can't always pull due to firewalls and network issues
• Low latency: sub-second latency is difficult for a pull model
• Doesn't require service discovery and registration
  • As entities are deployed they can start sending metrics without having to be discovered or registered
• Events
• Temporary caching/buffering of metrics/events while the service is unreachable

Monasca API

• Primary point for pushing metrics and handling queries
• Authenticates all requests against the Keystone identity service
  • Note: auth tokens are cached to reduce the load on Keystone
• Resources: Metrics, Alarm Definitions, Alarms, and Notification Methods
• API specification
  • https://github.com/openstack/monasca-api/tree/master/docs
• Horizontally scalable
• Publishes metrics to Kafka
• Queries the time-series DB for measurements and statistics
• Queries the Config DB for alarms, alarm definitions, and notification methods
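The token caching mentioned above can be pictured with a small time-based cache. This is only a sketch under assumptions: the class name `TokenCache`, the `validate` callable standing in for a round trip to the identity service, and the 300-second default lifetime are all hypothetical and not taken from Monasca.

```python
import time


class TokenCache:
    """Caches validated tokens so repeated requests skip the identity service."""

    def __init__(self, validate, ttl_s=300.0, clock=time.monotonic):
        self._validate = validate  # hypothetical callable: token -> tenant id
        self._ttl_s = ttl_s
        self._clock = clock
        self._entries = {}  # token -> (tenant_id, expiry time)

    def tenant_for(self, token):
        now = self._clock()
        entry = self._entries.get(token)
        if entry is not None and entry[1] > now:
            return entry[0]  # cache hit: no call to the identity service
        tenant_id = self._validate(token)  # cache miss or expired entry
        self._entries[token] = (tenant_id, now + self._ttl_s)
        return tenant_id
```

The clock is injectable so that expiry can be exercised without sleeping.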

Persister

• Consumes both metrics and alarm state transition events from Kafka
• Stores temporarily in memory and does batch writes to the TSDB, based on batch size or time, to optimize write performance
• At-least-once message delivery semantics
  • No metrics or alarm state transition events are lost
  • The Kafka consumer offset for each batch is only updated after successfully storing the metric or alarm state transition event
  • Note: duplicates are possible
• HA/fault tolerance
  • Multiple persisters run simultaneously and balance load
  • If a persister fails, the load is automatically re-balanced across the remaining persisters
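The batching and at-least-once behaviour described above can be sketched without Kafka or a real database. In this illustration `store` and `commit` are hypothetical callables standing in for the TSDB write and the consumer offset update, and the default batch size and wait time are arbitrary.

```python
import time


class BatchingPersister:
    """Buffers messages and commits the offset only after a successful write."""

    def __init__(self, store, commit, max_batch=1000, max_wait_s=15.0,
                 clock=time.monotonic):
        self._store = store    # hypothetical: writes a list of messages to the TSDB
        self._commit = commit  # hypothetical: records the last processed offset
        self._max_batch = max_batch
        self._max_wait_s = max_wait_s
        self._clock = clock
        self.buffer = []       # (offset, message) pairs held in memory
        self._last_flush = clock()

    def on_message(self, offset, message):
        self.buffer.append((offset, message))
        too_big = len(self.buffer) >= self._max_batch
        too_old = self._clock() - self._last_flush >= self._max_wait_s
        if too_big or too_old:
            self.flush()

    def flush(self):
        if not self.buffer:
            return
        # If the write raises, the offset is not committed, so the batch is
        # consumed again after a restart. That is why duplicates are possible.
        self._store([message for _, message in self.buffer])
        self._commit(self.buffer[-1][0])
        self.buffer.clear()
        self._last_flush = self._clock()
```

Committing after the write, rather than before it, is what trades possible duplicates for no lost messages.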

Time Series Databases

• Used for storing
  • Metrics
  • Alarm state history
• Two databases supported
  1. Vertica
    • Enterprise-class, proprietary, closed-source, clustered, HA analytics database
    • Excels at time-series
  2. InfluxDB
    • Open-source, single-node time-series DB
    • Clustering is closed-source
    • Note: can replicate to multiple instances of InfluxDB using Kafka
• Investigating support for additional databases

Config Database

• Stores all transactional data for Monasca, such as
  • Alarm Definitions
  • Alarms
  • Notification Methods
• MySQL and Postgres supported
• Typically deployed in a clustered or HA configuration

Threshold Engine

• Near real-time, stream-processing, clustered, and highly available threshold engine
• Based on Apache Storm
• Consumes metrics from Kafka
• Creates alarms based on metrics that match patterns specified in the alarm definition
• Evaluates whether metrics exceed the threshold
• Publishes alarm state transition events to Kafka
• Supports both simple and compound alarm expressions
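To make the difference between simple and compound expressions concrete, here is a hand-written evaluation of the compound expression shown on the Alarm Definitions slide. It is a sketch only: it does not parse the expression grammar, it ignores the 120-second period, and the function names are hypothetical.

```python
from statistics import mean


def avg_gt(values, threshold):
    """Simple sub-expression: avg(<metric>) > threshold."""
    return mean(values) > threshold


def evaluate(cpu_user_perc, disk_read_ops):
    """Compound expression: two simple sub-expressions joined by 'or'.

    Mirrors: avg(cpu.user_perc) > 85 or avg(disk.read_ops{device=vda}, 120) > 1000
    """
    breached = avg_gt(cpu_user_perc, 85) or avg_gt(disk_read_ops, 1000)
    return "ALARM" if breached else "OK"
```

Each argument is the list of measurements that fell inside the evaluation window for that metric.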

Notification Engine

• Consumes alarm state transition events from Kafka produced by the Threshold Engine
• Evaluates whether notifications should be sent based on actions specified in the alarm definition
  • OK, ALARM, and UNDETERMINED actions
• Supports email, PagerDuty, webhooks, HipChat, Slack, and JIRA
  • Dynamic plugins supported
• Supports both one-shot and periodic notifications
• If sending to the notification address fails, the notification is published to a retry topic in Kafka and retried later
• Grouping notifications: In progress
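The retry behaviour above can be sketched with an in-memory queue standing in for the Kafka retry topic. The `deliver` function, the `retry_count` field, and the limit of three attempts are assumptions made for illustration, not details from the slide.

```python
from collections import deque


def deliver(notification, send, retry_queue, max_retries=3):
    """Try to send; on failure park the notification for a later attempt.

    Returns True if the notification was sent.
    """
    try:
        send(notification)
        return True
    except Exception:
        attempts = notification.get("retry_count", 0)
        if attempts < max_retries:
            # Stand-in for publishing to the retry topic.
            retry_queue.append({**notification, "retry_count": attempts + 1})
        return False


retry_queue = deque()  # in-memory substitute for the retry topic
```

A separate consumer would later pop entries from the queue and call `deliver` again, which is how a notification survives a temporary outage of the target address.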

Kafka Message Schema

• JSON messages published/consumed to/from Kafka by Monasca micro-services
• Well-defined schema is published at
  • https://wiki.openstack.org/wiki/Monasca/Message_Schema

Metrics

Create, query, and get statistics for metrics

• GET, POST /v2.0/metrics
• GET /v2.0/metrics/names
  • Returns the unique metric names
• GET /v2.0/metrics/dimensions/names
  • Returns the unique dimension names
• GET /v2.0/metrics/dimensions/names/values
  • Returns the unique dimension values

Measurements

GET /v2.0/metrics/measurements

• Returns a list of measurements
• Query parameters
  • name and dimensions to filter by
  • start_time and end_time
  • offset and limit
  • merge_metrics: allows multiple metrics to be combined into a single list of measurements
  • group_by: list of columns to group the returned metrics by. Allows multiple unique metrics to be returned in a single query

Statistics

GET /v2.0/metrics/statistics

• Query parameters
  • name and dimensions to filter by
  • start_time and end_time
  • statistics: avg, min, max, sum, and count
  • period: the time period to aggregate measurements by
  • offset, limit
  • merge_metrics: allows multiple metrics to be combined into a single list of statistics
  • group_by: list of columns to group the returned metrics by. Allows multiple unique metrics to be returned in a single query
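To show what the statistics and period parameters amount to, the sketch below aggregates raw measurements into fixed periods and computes the five statistics for each one. It illustrates the idea only; aligning buckets to multiples of the period is an assumption, and `aggregate` is a hypothetical name.

```python
from collections import defaultdict


def aggregate(measurements, period_s):
    """Group (timestamp_s, value) pairs into periods and summarise each one.

    Returns {period_start: {"avg", "min", "max", "sum", "count"}}.
    """
    buckets = defaultdict(list)
    for timestamp_s, value in measurements:
        # Assumption: buckets start at multiples of the period.
        buckets[timestamp_s - timestamp_s % period_s].append(value)
    return {
        start: {
            "avg": sum(values) / len(values),
            "min": min(values),
            "max": max(values),
            "sum": sum(values),
            "count": len(values),
        }
        for start, values in sorted(buckets.items())
    }
```

For example, three measurements at 0 s, 30 s, and 60 s with a 60-second period produce two buckets, the first holding two values and the second holding one.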

Metric Names

GET /v2.0/metrics/names

• Returns a list of the unique metric names
• Query parameters
  • dimensions
  • offset, limit

Metric Dimension Names

GET /v2.0/metrics/dimensions/names

• Lists the dimension names
• Query parameters
  • Metric name
  • offset, limit

Metric Dimension Values

GET /v2.0/metrics/dimensions/names/values

• Lists the dimension values
• Query parameters
  • Metric name
  • Dimension name
  • offset, limit

Alarm Definitions

POST, GET /v2.0/alarm-definitions

• Alarm definitions are templates that are used to automatically and dynamically create alarms based on matching metric names and dimensions
• One alarm definition can result in zero or more alarms
• Simple grammar for creating compound alarm expressions
  • avg(cpu.user_perc) > 85 or avg(disk.read_ops{device=vda}, 120) > 1000
• Alarm states (OK, ALARM, and UNDETERMINED)
• Actions associated with alarms for state transitions
• User-assigned severity (LOW, MEDIUM, HIGH, CRITICAL)
• Thresholds can be dynamically adjusted via PATCH
• Minimal lifecycle management: alarm_lifecycle_state and link
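The statement that one definition yields zero or more alarms can be illustrated by grouping incoming metrics. In this sketch the `match_by` parameter, a list of dimension names whose distinct value combinations each get their own alarm, is an assumption about how the grouping is configured, and `alarms_for_definition` is a hypothetical helper.

```python
def alarms_for_definition(metric_name, match_by, observed_metrics):
    """Return {match_key: [metrics]}; each key stands for one alarm.

    Assumption: one alarm is created per distinct combination of the
    match_by dimension values among metrics with the given name.
    """
    alarms = {}
    for metric in observed_metrics:
        if metric["name"] != metric_name:
            continue  # the definition does not apply to this metric
        key = tuple(metric["dimensions"].get(dim) for dim in match_by)
        alarms.setdefault(key, []).append(metric)
    return alarms
```

With metrics arriving from two hosts, a definition matched by `hostname` produces two alarms; a definition whose metric name never appears produces none.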

List Alarms

GET /v2.0/alarms

Query parameters
• metric_name: name of metric to filter by
• metric_dimensions
• state: OK, ALARM, or UNDETERMINED
• severity: one or more severities to filter by, separated with |, e.g. severity=LOW|MEDIUM
• state_updated_start_time: the start time in ISO 8601 combined date and time format in UTC
• offset, limit
• sort_by
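A small sketch of assembling such a query with the Python standard library. The host name in the usage note is a placeholder, `list_alarms_url` is a hypothetical helper, and leaving the `|` separator unescaped is a readability choice rather than a documented requirement.

```python
from urllib.parse import urlencode


def list_alarms_url(base_url, **params):
    """Build a GET /v2.0/alarms URL from the given query parameters."""
    present = {key: value for key, value in params.items() if value is not None}
    # safe="|" keeps the severity separator readable instead of %7C.
    query = urlencode(present, safe="|")
    return f"{base_url}/v2.0/alarms?{query}"
```

For instance, `list_alarms_url("http://monasca.example:8070", severity="LOW|MEDIUM", state="ALARM", limit=10)` yields a URL whose query string carries the three filters in the order given.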

Alarms

GET, PUT, PATCH, DELETE /v2.0/alarms/{alarm-id}

• Alarms are created by the Threshold Engine based on matching alarm definitions
• When new nodes or components are deployed, alarms are automatically created
• Alarms are resources within Monasca. They have a resource ID and lifecycle
• By default, three states: OK, ALARM, and UNDETERMINED
  • The UNDETERMINED state occurs when metrics are no longer being received
• Deterministic alarms: two states, OK and ALARM
  • Used for systems where metrics are sporadic, e.g. creating metrics when errors occur in log files and no metrics when there aren't any errors
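The difference between the default and deterministic behaviour comes down to how missing data is treated. The sketch below assumes that a deterministic alarm reports OK when no metrics arrive, which fits the log-error example above but is not spelled out on the slide; `alarm_state` is a hypothetical function and the max-over-window comparison is chosen only for brevity.

```python
def alarm_state(values, threshold, deterministic=False):
    """Return the alarm state for the measurements seen in one window."""
    if not values:
        # No data: the default alarm cannot decide, the deterministic one
        # is assumed to treat silence as healthy.
        return "OK" if deterministic else "UNDETERMINED"
    return "ALARM" if max(values) > threshold else "OK"
```

A count of log errors that is only emitted when errors occur is a natural fit for `deterministic=True`, since silence then means no errors rather than a broken sender.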

Alarm Counts

GET /v2.0/alarms/count

• Query the total number of alarms in the OK, ALARM, or UNDETERMINED state, and their severities, grouped by metric dimension (such as OpenStack service), state, and severity
• Used for summary dashboards

Example: Helion Ops Console

Alarm History

GET /v2.0/alarms/state-history

• Lists the alarm state history for alarms
• Query parameters
  • Dimensions to filter on
  • Start/end timestamp
  • offset, limit

GET /v2.0/alarms/{alarm-id}/state-history

• Lists the alarm state history for a specific alarm

Notification Methods

POST, GET, DELETE /v2.0/notification-methods

Notification methods are associated with actions in alarm definitions

Example

POST /v2.0/notification-methods

    {
      "name": "Name of notification method",
      "type": "EMAIL",
      "address": "john.doe@hp.com"
    }

Monasca Agent

• System metrics (cpu, memory, network, filesystem, …)
• Service metrics
  • MySQL, Kafka, and many others
• Application metrics
  • Built-in Statsd daemon
  • Python monasca-statsd library: Adds support for dimensions
• VM system metrics
• Open vSwitch metrics
• Active checks
  • HTTP status checks and response times
  • System up/down checks (ping and ssh)
• Runs any Nagios plugin or check_mk
• Extensible/pluggable: Additional services can be easily added

Agent details

• The Agent Forwarder buffers metrics for a short time to increase the size of the HTTP request body (number of metrics) sent to the Monasca API
• The Agent requests an auth token from the Keystone identity service, which is supplied on all requests
• The Monasca Agent and API cache auth tokens in memory to reduce the round-trip authorization requests to Keystone
• If network connectivity between the Agent and API is lost, the Agent will buffer metrics and send them when connectivity is restored
• Metrics are submitted using an "agent" role, which only allows metrics to be POSTed to the metrics endpoint
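The buffering behaviour described above can be sketched as a forwarder that keeps unsent metrics until a POST succeeds. Everything here is illustrative: `Forwarder` and its `post` callable are hypothetical, and dropping the oldest metrics once the buffer is full is an assumed policy, not one stated on the slide.

```python
class Forwarder:
    """Holds metrics while the API is unreachable and sends them in one batch."""

    def __init__(self, post, max_buffered=1000):
        self._post = post  # hypothetical: sends a list of metrics, may raise
        self._max_buffered = max_buffered  # must be a positive integer
        self.pending = []

    def submit(self, metrics):
        """Queue metrics and try to send everything pending. True on success."""
        self.pending.extend(metrics)
        # Assumed policy: when over capacity, keep only the newest entries.
        del self.pending[:-self._max_buffered]
        try:
            self._post(list(self.pending))
        except ConnectionError:
            return False  # keep the buffer and retry on the next submit
        self.pending.clear()
        return True
```

Sending the whole backlog in one request also serves the first bullet above: larger bodies mean fewer HTTP round trips.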

Grafana/Monasca Integration

• Datasource: A datasource that can be added to the Grafana dashboard to enable Monasca
  • https://github.com/openstack/monasca-grafana-datasource
• Keystone authentication
  • https://github.com/twc-openstack/grafana
• Support for alerting will be added in Grafana 4

Grafana Monasca Data Source

Page 17: Monasca - NETWAYS...... What every software engineer should know about real-time data's unifying ... systems/log-what-every-software-engineer-should-know-about

Monasca API

bull Primary point for pushing metrics and handling queries

bull Authenticates all requests against the Keystone identity servicebull Note auth tokens are cached to reduce the load on Keystone

bull Resources Metrics Alarm Definitions Alarms and Notification Methods

bull API Specificationbull httpsgithubcomopenstackmonasca-apitreemasterdocs

bull Horizontally scalable

bull Publishes metrics to Kafka

bull Queries timeseries DB for measurements and statistics

bull Queries Config DB for alarms alarm definitions and notification methods

Persister

bull Consumes both metrics and alarm state transition events from Kafka

bull Stores temporarily in-memory and does batch writes to the TSDB based on batch size or time to optimize write performance

bull At-least once message delivery semanticsbull No metrics or alarm state transition events are lostbull The Kafka consumer offset for each batch is only updated after successfully storing

the metric or alarm state transition eventbull Note duplicates are possible

bull HAfault-tolerancebull Multiple persisters run simultaneously and balance loadbull If a persister fails the load is automatically re-balanced across the remaining

persisters

Time Series Databases

bull Used for storingbull Metricsbull Alarm state history

bull Two databases supported1 Vertica

bull Enterprise class proprietary closed-source clustered HA analytics databasebull Excels at time-series

2 InfluxDBbull Open-source single-node time-series DBbull Clustering is closed-sourcebull Note can replicate to multiple instances of InfluxDB using Kafka

bull Investigating support for additional databases

Config Database

bull Stores all transactional data for Monasca such asbull Alarm Definitions

bull Alarms

bull Notification Methods

bull MySQL and Postgres supported

bull Typically deployed in a clustered or HA configuration

Threshold Engine

bull Near real-time stream processing clustered and highly available threshold engine

bull Based on Apache Storm

bull Consumes metrics from Kafka

bull Creates alarms based on metrics that match patterns specified in the alarm definition

bull Evaluates whether metrics exceed threshold

bull Publishes alarm state transition events to Kafka

bull Supports both simple and compound alarm expressions

Notification Engine

bull Consumes alarm state transition events from Kafka produced by the Threshold Engine

bull Evaluates whether notifications should be sent based on actions specified in the alarm definition

bull OK ALARM and UNDETERMINED actions

bull Supports email PagerDuty webhooks HipChat Slack and JIRAbull Dynamic plugins supportedbull Supports both one-shot and periodic notificationsbull If sending to the notification address fails then notification is published to

retry topic in Kafka and retried laterbull Grouping notifications In progress

Kafka Message Schema

bull JSON messages publishedconsumed tofrom Kafka by Monasca micro-services

bull Well-defined schema is published atbull httpswikiopenstackorgwikiMonascaMessage_Schema

Metrics

Create query and get statistics for metrics

bull GET POST v20metrics

bull GET v20metricsnamesbull Returns the unique metric names

bull GET v20metricsdimensionnamesbull Returns the unique dimension names

bull GET v20metricsdimensionnamesvaluesbull Returns the unique dimension values

Measurements

GET v20metricsmeasurements

bull Returns a list of measurements

bull Query parametersbull Name and dimensions to filter by

bull Start_time and end_time

bull Offset and limit

bull merge_metrics allow multiple metrics to be combined into a single list of measurements

bull group_by list of columns to group the metrics to be returned Allows multiple unique metrics to be returned in a single query

Statistics

GET /v2.0/metrics/statistics

• Query parameters
  • name and dimensions to filter by
  • start_time and end_time
  • statistics: avg, min, max, sum and count
  • period: the time period to aggregate measurements by
  • offset, limit
  • merge_metrics: allows multiple metrics to be combined into a single list of statistics
  • group_by: list of columns to group the returned metrics by; allows multiple unique metrics to be returned in a single query
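To make the `period` and `statistics` parameters concrete, here is a small Python sketch of period-based aggregation. It is not how the Monasca API or its time-series database computes statistics; the function name `aggregate` and the `(timestamp, value)` input shape are assumptions chosen for illustration.

```python
from collections import defaultdict

# The five statistics named on the slide.
STAT_FUNCS = {
    "avg": lambda values: sum(values) / len(values),
    "min": min,
    "max": max,
    "sum": sum,
    "count": len,
}


def aggregate(measurements, period, statistics=("avg",)):
    """Group (timestamp_seconds, value) pairs into fixed-size periods.

    Returns one row per non-empty period:
    [period_start, stat_1, stat_2, ...] in the order requested.
    """
    buckets = defaultdict(list)
    for timestamp, value in measurements:
        buckets[timestamp - timestamp % period].append(value)
    return [
        [start] + [STAT_FUNCS[name](buckets[start]) for name in statistics]
        for start in sorted(buckets)
    ]
```

With a 60-second period, measurements taken at 0 s and 30 s fall into the same row, while one taken at 60 s starts the next row.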

Metrics Names

GET /v2.0/metrics/names

• Returns a list of the unique metric names
• Query parameters
  • Dimensions
  • Offset, limit

Metric Dimension Names

GET /v2.0/metrics/dimensions/names

• Lists the dimension names
• Query parameters
  • Metric name
  • Offset, limit

Metric Dimension Values

GET /v2.0/metrics/dimensions/names/values

• Lists the dimension values
• Query parameters
  • Metric name
  • Dimension name
  • Offset, limit

Alarm Definitions

POST, GET /v2.0/alarm-definitions

• Alarm definitions are templates used to automatically and dynamically create alarms based on matching metric names and dimensions
• One alarm definition can result in zero or more alarms
• Simple grammar for creating compound alarm expressions
  • avg(cpu.user_perc) > 85 or avg(disk.read_ops{device=vda}, 120) > 1000
• Alarm states (OK, ALARM and UNDETERMINED)
• Actions associated with alarms for state transitions
• User-assigned severity (LOW, MEDIUM, HIGH, CRITICAL)
• Thresholds can be dynamically adjusted via PATCH
• Minimal lifecycle management: alarm_lifecycle_state and link
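The statement that one definition can yield zero or more alarms can be illustrated with a simplified matcher. This is a sketch under assumptions, not the Threshold Engine's algorithm: the dictionary layout and the `match_by` grouping key used here are illustrative choices, and `alarms_for_definition` is a hypothetical helper.

```python
def alarms_for_definition(definition, metrics):
    """Group the metrics that match a definition into would-be alarms.

    A metric matches when its name equals the definition's metric name and
    it carries every dimension the definition requires. Matching metrics
    are then grouped by the dimensions listed in `match_by`, giving one
    alarm per distinct group (an assumption for this sketch).
    """
    groups = {}
    for metric in metrics:
        if metric["name"] != definition["metric_name"]:
            continue
        dims = metric["dimensions"]
        required = definition["dimensions"]
        if any(dims.get(key) != value for key, value in required.items()):
            continue
        group_key = tuple(dims.get(key) for key in definition["match_by"])
        groups.setdefault(group_key, []).append(metric)
    return groups
```

If no incoming metric matches, the result is empty and no alarm exists yet; as new hosts start reporting, new groups (and so new alarms) appear without changing the definition.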

List Alarms

GET /v2.0/alarms

Query parameters
• metric_name: name of metric to filter by
• metric_dimensions
• state: OK, ALARM or UNDETERMINED
• severity: one or more severities to filter by, separated with |, e.g. severity=LOW|MEDIUM
• state_updated_start_time: the start time in ISO 8601 combined date and time format, in UTC
• offset, limit
• sort_by

Alarms

GET, PUT, PATCH, DELETE /v2.0/alarms/{alarm-id}

• Alarms are created by the Threshold Engine based on matching alarm definitions
• When new nodes or components are deployed, alarms are automatically created
• Alarms are resources within Monasca; they have a resource ID and lifecycle
• By default, three states: OK, ALARM and UNDETERMINED
  • The UNDETERMINED state occurs when metrics are no longer being received
• Deterministic alarms: two states, OK and ALARM
  • Used for systems where metrics are sporadic, e.g. creating metrics when errors occur in log files and no metrics when there aren't any errors
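The difference between default and deterministic alarms comes down to what happens when no measurements arrive. A minimal Python sketch, assuming a single `avg(...) > threshold` sub-expression and a hypothetical `next_state` helper:

```python
def next_state(measurements, threshold, deterministic=False):
    """Return the alarm state for one evaluation window.

    With no measurements, a default alarm becomes UNDETERMINED while a
    deterministic alarm stays OK. Otherwise the average is compared with
    the threshold (a single simple expression is assumed here).
    """
    if not measurements:
        return "OK" if deterministic else "UNDETERMINED"
    average = sum(measurements) / len(measurements)
    return "ALARM" if average > threshold else "OK"
```

For a metric that is only emitted when a log file contains errors, the deterministic variant avoids sitting in UNDETERMINED during the normal, error-free periods.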

Alarm Counts

GET /v2.0/alarms/count

• Query the total number of alarms in the OK, ALARM or UNDETERMINED state and their severities, grouped by metric dimension (such as OpenStack service), state and severity
• Used for summary dashboards

Example: Helion Ops Console

Alarm History

GET /v2.0/alarms/state-history

• Lists the alarm state history for alarms
• Query parameters
  • Dimensions to filter on
  • Start/end timestamp
  • Offset, limit

GET /v2.0/alarms/{alarm-id}/state-history

• Lists the alarm state history for a specific alarm

Notification Methods

POST, GET, DELETE /v2.0/notification-methods

Notification methods are associated with actions in alarm definitions

Example:

POST /v2.0/notification-methods
{
  "name": "Name of notification method",
  "type": "EMAIL",
  "address": "john.doe@hp.com"
}
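The example request above could be prepared from Python roughly as follows. The sketch only constructs the request object and does not send it; the base URL and the helper name are hypothetical, and passing the Keystone token in an `X-Auth-Token` header is the usual OpenStack convention assumed here.

```python
import json
import urllib.request


def notification_method_request(base_url, token, name, address, type_="EMAIL"):
    """Build (but do not send) a POST to /v2.0/notification-methods."""
    body = json.dumps({"name": name, "type": type_, "address": address})
    return urllib.request.Request(
        url=f"{base_url}/v2.0/notification-methods",
        data=body.encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "X-Auth-Token": token,  # Keystone token, assumed header name
        },
        method="POST",
    )
```

Sending it would be a matter of passing the returned object to `urllib.request.urlopen` against a reachable Monasca API.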

Monasca Agent

• System metrics (cpu, memory, network, filesystem, …)
• Service metrics
  • MySQL, Kafka and many others
• Application metrics
  • Built-in Statsd daemon
  • Python monasca-statsd library: adds support for dimensions
• VM system metrics
• Open vSwitch metrics
• Active checks
  • HTTP status checks and response times
  • System up/down checks (ping and ssh)
• Runs any Nagios plugin or check_mk
• Extensible/pluggable: additional services can be easily added

Agent details

• The Agent Forwarder buffers metrics for a short time to increase the size of the HTTP request body (number of metrics) sent to the Monasca API
• The Agent requests an auth token from the Keystone Identity service, which is supplied on all requests
• The Monasca Agent and API cache auth tokens in memory to reduce round-trip authorization requests to Keystone
• If network connectivity between the Agent and API is lost, the Agent will buffer metrics and send them when connectivity is restored
• Metrics are submitted using an "agent" role, which only allows metrics to be POST'd to the metrics endpoint
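The buffering behaviour of the forwarder can be sketched as a small class. This is an illustration only: the real agent buffers for a short time, whereas this sketch flushes on a metric count to stay deterministic, and the `Forwarder` class and its `post` callable are hypothetical.

```python
class Forwarder:
    """Buffer metrics and send them to the API in batches."""

    def __init__(self, post, batch_size=3):
        self.post = post              # callable that sends a list of metrics
        self.batch_size = batch_size  # count-based trigger (assumption)
        self.buffer = []

    def submit(self, metric):
        """Queue one metric and flush once the batch is large enough."""
        self.buffer.append(metric)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        """Try to send everything buffered; keep it all if the API is down."""
        if not self.buffer:
            return False
        try:
            self.post(list(self.buffer))
        except ConnectionError:
            return False  # connectivity lost: metrics stay buffered
        self.buffer.clear()
        return True
```

Because a failed send leaves the buffer untouched, metrics collected during an outage go out in the first successful flush afterwards.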

Grafana/Monasca Integration

• Datasource: a datasource that can be added to the Grafana dashboard to enable Monasca
  • https://github.com/openstack/monasca-grafana-datasource
• Keystone authentication
  • https://github.com/twc-openstack/grafana
• Support for alerting will be added in Grafana 4

Grafana Monasca Data Source

Logging Architecture

Logging API

• POST /v3.0/logs
• Batch log messages in a single HTTP request
• Global, local and mixed dimensions
  • Similar to dimensions in metrics
• JSON only
• Specification
  • https://github.com/openstack/monasca-log-api/blob/master/docs/monasca-log-api-spec.md
• Queries are not done via the API but via a tenantized version of Kibana
  • https://github.com/FujitsuEnablingSoftwareTechnologyGmbH/fts-keystone

Log Model

{
  "dimensions": {
    "hostname": "devstack",
    "service": "monitoring",
    "component": "monasca-api"
  },
  "logs": [
    {
      "message": "msg1",
      "dimensions": {
        "service": "compute",
        "component": "nova-api",
        "path": "/var/log/mysql.log"
      }
    },
    {
      "message": "msg2",
      "dimensions": {
        "path": "/var/log/monasca/monasca-api.log"
      }
    }
  ]
}
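One way to read the mix of global and local dimensions in the log model is that each log entry ends up with the global dimensions plus its own. The Python sketch below shows that interpretation; letting local values override global ones on a key clash is an assumption of this example, and `expand_logs` is a hypothetical helper.

```python
def expand_logs(payload):
    """Return one record per log entry with global and local dimensions merged.

    Local dimensions are applied after the global ones, so a local value
    wins when both define the same key (an assumption for this sketch).
    """
    global_dims = payload.get("dimensions", {})
    return [
        {
            "message": entry["message"],
            "dimensions": {**global_dims, **entry.get("dimensions", {})},
        }
        for entry in payload["logs"]
    ]
```

Applied to the model above, the second entry would carry the global hostname, service and component together with its own path.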

Log Agents

• Logstash
  • https://github.com/logstash-plugins/logstash-output-monasca_log_api/pull/1
• Beaver
  • https://github.com/python-beaver/python-beaver/pull/406
• Logspout: under investigation

Kibana Integration

• Keystone authentication support for Kibana
• Authentication plugin
  • https://github.com/FujitsuEnablingSoftwareTechnologyGmbH/fts-keystone
  • Note: in the process of moving to an official OpenStack repo

Composability: Logging/Metrics

Transform and Analytics Engine

Monasca Transform

• A new micro-service in Monasca that aggregates and transforms metrics
• Currently based on Apache Spark Streaming
• Use cases
  • Object Storage Disk Capacity
  • Object Storage Capacity
  • Compute Host Capacity
  • VM Capacity
  • More to come
• Metrics are aggregated and published every hour
• Currently in deployment in HPE Helion OpenStack 4.0
• OpenStack project/repo
  • https://github.com/openstack/monasca-transform

Monasca Analytics

• A framework that adds data science tools (parsers, algorithms, etc.)
• Features include
  • Algorithmic flow definition, enabling sharing of complex algorithmic recipes
  • Thin orchestration layer that instantiates an execution environment
• Focused on
  • Anomaly detection
  • Reducing alert fatigue via alarm clustering (unsupervised machine learning)
• Example algorithms: One-Class SVM and LiNGAM
• Status: under development
• OpenStack project/repo
  • https://github.com/openstack/monasca-analytics

Distributions & Deployments

• Charter Communications
  • Monasca and Grafana are currently deployed in a production private cloud
  • Monitoring-as-a-Service use cases supported, with Grafana as the visualization dashboard
  • 2 datacenters, 600-700 compute nodes, 1000 VMs, 11000 metrics/sec
• FIWARE Lab
  • http://superuser.openstack.org/articles/monitoring-a-multi-region-cloud-based-on-openstack
• Hewlett Packard Enterprise: Cloud System, Helion OpenStack
  • Supported and tested up to 65K metrics/sec ingest rates
• Fujitsu
  • FUJITSU Software ServerView Cloud Monitoring Manager
• NEC
  • Planning to include Monasca in Cloud Solution Menus solution
• Others

Statistics: Mitaka/Newton Release

• Organizations: 31
• Contributors: 97
• Commits: 1075
• Reviews: 4080
• Lines of code: 215370

Ecosystem

• Hewlett Packard Enterprise
• Fujitsu
• Charter Communications
• NEC
• Cisco
• Cloudbase Solutions
• SUSE
• SolidFire
• SAP
• Cray Inc
• FIWARE Lab
• Mirantis
• Broadcom

Containers and Kubernetes

• New Monasca Agent plugins
  • Docker plugin
  • cAdvisor plugin
  • Kubernetes plugin: monitors both the Kubernetes control plane and containers
  • Prometheus client plugin: scrapes apps
  • Mesos plugin
• Containerization of Monasca
• Heapster Monasca data sink

Next Steps

• Containerizing Monasca
• Monitoring containers and container managers such as Kubernetes
• Grouping notifications


Persister

• Consumes both metrics and alarm state transition events from Kafka
• Stores them temporarily in memory and does batch writes to the TSDB, based on batch size or time, to optimize write performance
• At-least-once message delivery semantics
  • No metrics or alarm state transition events are lost
  • The Kafka consumer offset for each batch is only updated after successfully storing the metric or alarm state transition event
  • Note: duplicates are possible
• HA/fault-tolerance
  • Multiple persisters run simultaneously and balance load
  • If a persister fails, the load is automatically re-balanced across the remaining persisters
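The batching and at-least-once behaviour described above can be sketched as follows. This is a simplified illustration, not the Monasca persister: `write_batch` and `commit_offset` are hypothetical callables standing in for the TSDB write and the Kafka offset commit, and the default sizes are arbitrary.

```python
import time


class Persister:
    """Batch messages in memory and commit the offset only after a write."""

    def __init__(self, write_batch, commit_offset, batch_size=1000,
                 max_wait=15.0, clock=time.monotonic):
        self.write_batch = write_batch      # stand-in for the TSDB write
        self.commit_offset = commit_offset  # stand-in for the Kafka commit
        self.batch_size = batch_size
        self.max_wait = max_wait
        self.clock = clock
        self.batch = []
        self.last_offset = None
        self.started = clock()

    def handle(self, offset, message):
        """Buffer one consumed message; flush on batch size or elapsed time."""
        self.batch.append(message)
        self.last_offset = offset
        too_big = len(self.batch) >= self.batch_size
        too_old = self.clock() - self.started >= self.max_wait
        if too_big or too_old:
            self.flush()

    def flush(self):
        """Write the batch, then commit. A failed write commits nothing."""
        if not self.batch:
            return
        self.write_batch(self.batch)  # may raise; the offset stays uncommitted
        self.commit_offset(self.last_offset)
        self.batch = []
        self.started = self.clock()
```

If the process dies between the write and the commit, the same messages are consumed again after a restart, which is where the possible duplicates come from.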

Time Series Databases

• Used for storing
  • Metrics
  • Alarm state history
• Two databases supported
  1. Vertica
    • Enterprise-class, proprietary, closed-source, clustered, HA analytics database
    • Excels at time-series
  2. InfluxDB
    • Open-source single-node time-series DB
    • Clustering is closed-source
    • Note: can replicate to multiple instances of InfluxDB using Kafka
• Investigating support for additional databases

Config Database

• Stores all transactional data for Monasca, such as
  • Alarm Definitions
  • Alarms
  • Notification Methods
• MySQL and Postgres supported
• Typically deployed in a clustered or HA configuration



Page 20: Monasca - NETWAYS...... What every software engineer should know about real-time data's unifying ... systems/log-what-every-software-engineer-should-know-about

Config Database

• Stores all transactional data for Monasca, such as:
  • Alarm Definitions
  • Alarms
  • Notification Methods
• MySQL and Postgres supported
• Typically deployed in a clustered or HA configuration

Threshold Engine

• Near-real-time, stream-processing threshold engine; clustered and highly available
• Based on Apache Storm
• Consumes metrics from Kafka
• Creates alarms based on metrics that match patterns specified in the alarm definition
• Evaluates whether metrics exceed the threshold
• Publishes alarm state transition events to Kafka
• Supports both simple and compound alarm expressions
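The evaluation and state-transition steps listed above can be sketched in a few lines. This is only an illustration, not the Storm-based implementation: the function names, the rule that an empty window yields UNDETERMINED, and the simplified three-valued and/or logic are all assumptions made for the example.

```python
from statistics import mean

OK, ALARM, UNDETERMINED = "OK", "ALARM", "UNDETERMINED"


def evaluate_sub_expression(values, threshold):
    """Hypothetical helper: state of `avg(values) > threshold` for one window."""
    if not values:
        return UNDETERMINED  # assumption: no measurements arrived in the window
    return ALARM if mean(values) > threshold else OK


def evaluate_compound(states, operator):
    """Combine sub-expression states with 'and' / 'or' (simplified logic)."""
    if operator == "or":
        if ALARM in states:
            return ALARM
        return UNDETERMINED if UNDETERMINED in states else OK
    if operator == "and":
        if OK in states:
            return OK
        return UNDETERMINED if UNDETERMINED in states else ALARM
    raise ValueError(f"unsupported operator: {operator}")


def transition(previous, current):
    """Return a state-transition event only when the state changes."""
    if previous == current:
        return None
    return {"old_state": previous, "new_state": current}
```

In this sketch only the events returned by `transition` would be published onward, which mirrors the idea that downstream consumers see state changes rather than every evaluation.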

Notification Engine

• Consumes alarm state transition events from Kafka produced by the Threshold Engine
• Evaluates whether notifications should be sent based on actions specified in the alarm definition
  • OK, ALARM, and UNDETERMINED actions
• Supports email, PagerDuty, webhooks, HipChat, Slack, and JIRA
  • Dynamic plugins supported
• Supports both one-shot and periodic notifications
• If sending to the notification address fails, the notification is published to a retry topic in Kafka and retried later
• Grouping notifications: in progress
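The retry behaviour described above can be illustrated with a small sketch. A plain Python list stands in for the Kafka retry topic, and the `send` callable, the `attempts` field, and the `max_attempts` limit are assumptions introduced for the example rather than details of the Notification Engine.

```python
def process_notifications(pending, send, retry_queue, max_attempts=3):
    """Try each notification once; park failures on a retry queue.

    `retry_queue` stands in for the Kafka retry topic. `send` is any callable
    that raises on delivery failure. `max_attempts` is an assumed limit.
    """
    delivered = []
    for note in pending:
        attempts = note.get("attempts", 0) + 1
        try:
            send(note)
            delivered.append(note["id"])
        except Exception:
            # Re-queue with an incremented counter until the limit is reached.
            if attempts < max_attempts:
                retry_queue.append({**note, "attempts": attempts})
    return delivered
```

A later pass would read from `retry_queue` and call the same function again, so a notification is dropped only after the assumed attempt limit is exhausted.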

Kafka Message Schema

• JSON messages published/consumed to/from Kafka by Monasca micro-services
• Well-defined schema is published at:
  • https://wiki.openstack.org/wiki/Monasca/Message_Schema

Metrics

Create, query, and get statistics for metrics

• GET, POST /v2.0/metrics
• GET /v2.0/metrics/names
  • Returns the unique metric names
• GET /v2.0/metrics/dimensions/names
  • Returns the unique dimension names
• GET /v2.0/metrics/dimensions/names/values
  • Returns the unique dimension values

Measurements

GET /v2.0/metrics/measurements

• Returns a list of measurements
• Query parameters:
  • name and dimensions to filter by
  • start_time and end_time
  • offset and limit
  • merge_metrics: allows multiple metrics to be combined into a single list of measurements
  • group_by: list of columns to group the returned metrics by; allows multiple unique metrics to be returned in a single query
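As a rough illustration of how the query parameters above fit together, the following sketch builds a measurements URL. The helper name, the base URL used in the usage note, and the comma-separated `key:value` encoding of `dimensions` are assumptions for the example; the API specification is the authority on the exact encoding.

```python
from urllib.parse import urlencode


def measurements_url(base_url, name, dimensions, start_time,
                     end_time=None, limit=None, merge_metrics=False):
    """Build a GET /v2.0/metrics/measurements URL (illustrative only)."""
    params = {"name": name, "start_time": start_time}
    if dimensions:
        # Assumed encoding: comma-separated key:value pairs.
        params["dimensions"] = ",".join(
            f"{k}:{v}" for k, v in sorted(dimensions.items()))
    if end_time:
        params["end_time"] = end_time
    if limit is not None:
        params["limit"] = limit
    if merge_metrics:
        params["merge_metrics"] = "true"
    return f"{base_url}/v2.0/metrics/measurements?{urlencode(params)}"
```

For example, `measurements_url("http://monasca.example:8070", "cpu.user_perc", {"hostname": "devstack"}, "2016-10-01T00:00:00Z", limit=10)` yields a URL whose query string carries the name, start time, dimensions, and limit, with reserved characters percent-encoded.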

Statistics

GET /v2.0/metrics/statistics

• Query parameters:
  • name and dimensions to filter by
  • start_time and end_time
  • statistics: avg, min, max, sum, and count
  • period: the time period to aggregate measurements by
  • offset, limit
  • merge_metrics: allows multiple metrics to be combined into a single list of statistics
  • group_by: list of columns to group the returned metrics by; allows multiple unique metrics to be returned in a single query
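To make the `statistics` and `period` parameters concrete, here is a minimal sketch of period-based aggregation over raw measurements. It runs on in-memory `(timestamp_seconds, value)` pairs; the function name and the output layout are assumptions and do not reflect the API's response format.

```python
from collections import defaultdict


def statistics(measurements, period,
               functions=("avg", "min", "max", "sum", "count")):
    """Aggregate (timestamp_seconds, value) pairs into fixed-size periods."""
    buckets = defaultdict(list)
    for ts, value in measurements:
        buckets[ts - ts % period].append(value)  # key = start of the period
    reducers = {
        "avg": lambda v: sum(v) / len(v),
        "min": min,
        "max": max,
        "sum": sum,
        "count": len,
    }
    return [
        {"start": start, **{f: reducers[f](vals) for f in functions}}
        for start, vals in sorted(buckets.items())
    ]
```

With a period of 60 seconds, measurements at 0 s and 30 s fall into the same bucket while a measurement at 60 s starts a new one.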

Metrics Names

GET /v2.0/metrics/names

• Returns a list of the unique metric names
• Query parameters:
  • Dimensions
  • offset, limit

Metric Dimension Names

GET /v2.0/metrics/dimensions/names

• Lists the dimension names
• Query parameters:
  • Metric name
  • offset, limit

Metric Dimension Values

GET /v2.0/metrics/dimensions/names/values

• Lists the dimension values
• Query parameters:
  • Metric name
  • Dimension name
  • offset, limit

Alarm Definitions

POST, GET /v2.0/alarm-definitions

• Alarm definitions are templates used to automatically and dynamically create alarms based on matching metric names and dimensions
• One alarm definition can result in zero or more alarms
• Simple grammar for creating compound alarm expressions:
  • avg(cpu.user_perc) > 85 or avg(disk.read_ops{device=vda}, 120) > 1000
• Alarm states (OK, ALARM, and UNDETERMINED)
• Actions associated with alarms for state transitions
• User-assigned severity (LOW, MEDIUM, HIGH, CRITICAL)
• Thresholds can be dynamically adjusted via PATCH
• Minimal lifecycle management: alarm_lifecycle_state and link
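A request body for creating an alarm definition might be assembled as in the sketch below. The helper is hypothetical, and the field names (`match_by`, `alarm_actions`, `ok_actions`, `undetermined_actions`) are given from memory of the Monasca API and should be checked against the API specification before use.

```python
import json


def alarm_definition(name, expression, severity="LOW", match_by=None,
                     alarm_actions=None, ok_actions=None,
                     undetermined_actions=None):
    """Build a JSON body for POST /v2.0/alarm-definitions (sketch)."""
    if severity not in ("LOW", "MEDIUM", "HIGH", "CRITICAL"):
        raise ValueError(f"invalid severity: {severity}")
    body = {"name": name, "expression": expression, "severity": severity}
    optional = {
        "match_by": match_by,                          # assumed field name
        "alarm_actions": alarm_actions,                # notification method IDs
        "ok_actions": ok_actions,
        "undetermined_actions": undetermined_actions,
    }
    # Only include optional fields that were actually supplied.
    body.update({k: v for k, v in optional.items() if v})
    return json.dumps(body, sort_keys=True)
```

Passing `match_by=["hostname"]` is meant to express the "one definition, many alarms" idea: one alarm would be created per distinct hostname that reports a matching metric.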

List Alarms

GET /v2.0/alarms

Query parameters:
• metric_name: name of metric to filter by
• metric_dimensions
• state: OK, ALARM, or UNDETERMINED
• severity: one or more severities to filter by, separated with |, e.g. severity=LOW|MEDIUM
• state_updated_start_time: the start time in ISO 8601 combined date and time format in UTC
• offset, limit
• sort_by
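The filter parameters above could be combined into a query string as follows. This is a sketch with a hypothetical helper name; it shows in particular how several severities are joined with `|` before URL encoding.

```python
from urllib.parse import urlencode


def list_alarms_query(state=None, severities=(), metric_name=None,
                      state_updated_start_time=None, limit=None):
    """Build the query string for GET /v2.0/alarms (illustrative)."""
    params = {}
    if metric_name:
        params["metric_name"] = metric_name
    if state:
        if state not in ("OK", "ALARM", "UNDETERMINED"):
            raise ValueError(f"invalid state: {state}")
        params["state"] = state
    if severities:
        params["severity"] = "|".join(severities)  # e.g. LOW|MEDIUM
    if state_updated_start_time:
        params["state_updated_start_time"] = state_updated_start_time
    if limit is not None:
        params["limit"] = limit
    return urlencode(params)
```

Note that `urlencode` percent-encodes the `|` separator, so `LOW|MEDIUM` travels on the wire as `LOW%7CMEDIUM`.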

Alarms

GET, PUT, PATCH, DELETE /v2.0/alarms/{alarm-id}

• Alarms are created by the Threshold Engine based on matching alarm definitions
• When new nodes or components are deployed, alarms are automatically created
• Alarms are resources within Monasca; they have a resource ID and lifecycle
• By default, three states: OK, ALARM, and UNDETERMINED
  • UNDETERMINED state occurs when metrics are no longer being received
• Deterministic alarms: two states, OK and ALARM
  • Used for systems where metrics are sporadic, e.g. creating metrics when errors occur in log files and no metrics when there aren't any errors
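The difference between default and deterministic alarms comes down to what happens when no measurements arrive. The sketch below assumes a simple `max(...) > threshold` check and assumes that a deterministic alarm reports OK for an empty window; both are simplifications chosen for illustration.

```python
def alarm_state(values, threshold, deterministic=False):
    """State of a `max(values) > threshold` check over one window (sketch)."""
    if not values:
        # No measurements arrived: a deterministic alarm is assumed to stay
        # OK, while a default alarm becomes UNDETERMINED.
        return "OK" if deterministic else "UNDETERMINED"
    return "ALARM" if max(values) > threshold else "OK"
```

For a metric such as a count of log errors, which is emitted only when errors occur, `deterministic=True` avoids flapping into UNDETERMINED during quiet periods.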

Alarm Counts

GET /v2.0/alarms/count

• Query the total number of alarms in the OK, ALARM, or UNDETERMINED state and their severities, grouped by metric dimension (such as OpenStack service), state, and severity
• Used for summary dashboards

Example: Helion Ops Console

Alarm History

GET /v2.0/alarms/state-history

• Lists the alarm state history for alarms
• Query parameters:
  • Dimensions to filter on
  • Start/end timestamp
  • offset, limit

GET /v2.0/alarms/{alarm-id}/state-history

• Lists the alarm state history for a specific alarm

Notification Methods

POST, GET, DELETE /v2.0/notification-methods

Notification methods are associated with actions in alarm definitions.

Example:

POST /v2.0/notification-methods

{
  "name": "Name of notification method",
  "type": "EMAIL",
  "address": "john.doe@hp.com"
}
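The example request above could be prepared in Python roughly as follows. The helper name and base URL are placeholders, nothing is sent over the network, and passing the Keystone token in an `X-Auth-Token` header is the usual OpenStack convention rather than something stated on the slide.

```python
import json
from urllib.request import Request


def notification_method_request(base_url, token, name, type_, address):
    """Prepare (but do not send) POST /v2.0/notification-methods."""
    body = json.dumps({"name": name, "type": type_, "address": address})
    return Request(
        url=f"{base_url}/v2.0/notification-methods",
        data=body.encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json", "X-Auth-Token": token},
    )
```

The returned object can be handed to `urllib.request.urlopen` once a real endpoint and token are available.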

Monasca Agent

• System metrics (CPU, memory, network, filesystem, …)
• Service metrics
  • MySQL, Kafka, and many others
• Application metrics
  • Built-in Statsd daemon
  • Python monasca-statsd library: adds support for dimensions
• VM system metrics
• Open vSwitch metrics
• Active checks
  • HTTP status checks and response times
  • System up/down checks (ping and ssh)
• Runs any Nagios plugin or check_mk
• Extensible/pluggable: additional services can be easily added

Agent details

• The Agent Forwarder buffers metrics for a short time to increase the size of the HTTP request body (number of metrics) sent to the Monasca API
• The Agent requests an auth token from the Keystone Identity service, which is supplied on all requests
• The Monasca Agent and API cache auth tokens in memory to reduce round-trip authorization requests to Keystone
• If network connectivity between the Agent and API is lost, the Agent will buffer metrics and send them when connectivity is restored
• Metrics are submitted using an "agent" role, which only allows metrics to be POSTed to the metrics endpoint
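The buffering behaviour of the Forwarder can be modelled with a toy class. This is not the Agent's code: the class, the `post` callable, and the size-based flush threshold are assumptions (the real Forwarder is described as buffering for a short time, not up to a fixed count).

```python
class Forwarder:
    """Toy stand-in for the Agent Forwarder's batching behaviour."""

    def __init__(self, post, batch_size=3):
        self.post = post              # callable that sends a list of metrics
        self.batch_size = batch_size  # assumed flush threshold
        self.buffer = []

    def submit(self, metric):
        """Buffer one metric and flush once the batch is large enough."""
        self.buffer.append(metric)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        """Send the whole buffer in one request; keep it if the API is down."""
        if not self.buffer:
            return False
        try:
            self.post(list(self.buffer))
        except ConnectionError:
            return False  # keep buffering until connectivity returns
        self.buffer.clear()
        return True
```

Because a failed flush leaves the buffer untouched, metrics collected during an outage are delivered together with the next successful request.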

Grafana/Monasca Integration

• Datasource: a data source that can be added to the Grafana dashboard to enable Monasca
  • https://github.com/openstack/monasca-grafana-datasource
• Keystone authentication
  • https://github.com/twc-openstack/grafana
• Support for alerting will be added in Grafana 4

Grafana Monasca Data Source

Logging Architecture

Logging API

• POST /v3.0/logs
• Batch log messages in a single HTTP request
• Global, local, and mixed dimensions
  • Similar to dimensions in metrics
• JSON only
• Specification
  • https://github.com/openstack/monasca-log-api/blob/master/docs/monasca-log-api-spec.md
• Queries are not done via the API but via a tenantized version of Kibana
  • https://github.com/FujitsuEnablingSoftwareTechnologyGmbH/fts-keystone

Log Model

{
  "dimensions": {
    "hostname": "devstack",
    "service": "monitoring",
    "component": "monasca-api"
  },
  "logs": [
    {
      "message": "msg1",
      "dimensions": {
        "service": "compute",
        "component": "nova-api",
        "path": "/var/log/mysql.log"
      }
    },
    {
      "message": "msg2",
      "dimensions": {
        "path": "/var/log/monasca/monasca-api.log"
      }
    }
  ]
}
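One way to read the log model above is that each log entry ends up with the global dimensions plus its own local ones. The sketch below makes that explicit; the helper name is hypothetical, and the rule that a local dimension overrides a global one with the same key is an assumption that should be checked against the log API specification.

```python
def effective_dimensions(payload):
    """Merge global dimensions into each log entry's local dimensions."""
    global_dims = payload.get("dimensions", {})
    merged = []
    for entry in payload.get("logs", []):
        # Assumption: local values win when a key appears at both levels.
        dims = {**global_dims, **entry.get("dimensions", {})}
        merged.append({"message": entry["message"], "dimensions": dims})
    return merged
```

Applied to the payload above, `msg1` would carry `service=compute` (local override) and `hostname=devstack` (inherited), while `msg2` would keep the global `service=monitoring`.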

Log Agents

• Logstash
  • https://github.com/logstash-plugins/logstash-output-monasca_log_api/pull/1
• Beaver
  • https://github.com/python-beaver/python-beaver/pull/406
• Logspout: under investigation

Kibana Integration

• Keystone authentication support for Kibana
• Authentication plugin
  • https://github.com/FujitsuEnablingSoftwareTechnologyGmbH/fts-keystone
  • Note: in the process of moving to the official OpenStack repo

Composability: Logging/Metrics

Transform and Analytics Engine

Monasca Transform

• A new micro-service in Monasca that aggregates and transforms metrics
• Currently based on Apache Spark Streaming
• Use cases
  • Object Storage Disk Capacity
  • Object Storage Capacity
  • Compute Host Capacity
  • VM Capacity
  • More to come
• Metrics are aggregated and published every hour
• Currently in deployment in HPE Helion OpenStack 4.0
• OpenStack project/repo
  • https://github.com/openstack/monasca-transform

Monasca Analytics

• A framework that adds data science tools (parsers, algorithms, etc.)
• Features include:
  • Algorithmic flow definition, enabling sharing of complex algorithmic recipes
  • Thin orchestration layer that instantiates an execution environment
• Focused on:
  • Anomaly detection
  • Reducing alert fatigue via alarm clustering (unsupervised machine learning)
• Example algorithms: One-Class SVM and LiNGAM
• Status: under development
• OpenStack project/repo
  • https://github.com/openstack/monasca-analytics

Distributions & Deployments

• Charter Communications
  • Monasca and Grafana are currently deployed in a production private cloud
  • Monitoring-as-a-Service use cases supported, with Grafana as the visualization dashboard
  • 2 datacenters, 600-700 compute nodes, 1000 VMs, 11,000 metrics/sec
• FIWARE Lab
  • http://superuser.openstack.org/articles/monitoring-a-multi-region-cloud-based-on-openstack
• Hewlett Packard Enterprise Cloud System, Helion OpenStack
  • Supported and tested up to 65K metrics/sec ingest rates
• Fujitsu
  • FUJITSU Software ServerView Cloud Monitoring Manager
• NEC
  • Planning to include Monasca in the Cloud Solution Menus solution
• Others

Statistics: Mitaka/Newton Release

• Organizations: 31
• Contributors: 97
• Commits: 1075
• Reviews: 4080
• Lines of code: 215370

Ecosystem

• Hewlett Packard Enterprise
• Fujitsu
• Charter Communications
• NEC
• Cisco
• Cloudbase Solutions
• SUSE
• SolidFire
• SAP
• Cray Inc.
• FIWARE Lab
• Mirantis
• Broadcom

Containers and Kubernetes

• New Monasca Agent plugins
  • Docker plugin
  • cAdvisor plugin
  • Kubernetes plugin: monitors both the Kubernetes control plane and containers
  • Prometheus client plugin: scrapes apps
  • Mesos plugin
• Containerization of Monasca
• Heapster Monasca data sink

Next Steps

• Containerizing Monasca
• Monitoring containers and container managers such as Kubernetes
• Grouping notifications

Page 23: Monasca - NETWAYS...... What every software engineer should know about real-time data's unifying ... systems/log-what-every-software-engineer-should-know-about

Kafka Message Schema

bull JSON messages publishedconsumed tofrom Kafka by Monasca micro-services

bull Well-defined schema is published atbull httpswikiopenstackorgwikiMonascaMessage_Schema

Metrics

Create query and get statistics for metrics

bull GET POST v20metrics

bull GET v20metricsnamesbull Returns the unique metric names

bull GET v20metricsdimensionnamesbull Returns the unique dimension names

bull GET v20metricsdimensionnamesvaluesbull Returns the unique dimension values

Measurements

GET /v2.0/metrics/measurements

• Returns a list of measurements
• Query parameters
  • Name and dimensions to filter by
  • start_time and end_time
  • Offset and limit
  • merge_metrics: allows multiple metrics to be combined into a single list of measurements
  • group_by: list of columns to group the returned metrics by. Allows multiple unique metrics to be returned in a single query
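
The query parameters above can be assembled into a URL along these lines. This is a sketch: the host is a placeholder, measurements_url is a hypothetical helper, and the key:value,key:value encoding of dimensions is an assumption about the wire format.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def measurements_url(base_url, name, start_time, dimensions=None,
                     end_time=None, limit=None, merge_metrics=False):
    """Build a GET /v2.0/metrics/measurements URL from the filter parameters."""
    params = {"name": name, "start_time": start_time}
    if dimensions:
        # Assumed encoding: comma-separated key:value pairs.
        params["dimensions"] = ",".join(
            f"{key}:{value}" for key, value in sorted(dimensions.items())
        )
    if end_time:
        params["end_time"] = end_time
    if limit is not None:
        params["limit"] = limit
    if merge_metrics:
        params["merge_metrics"] = "true"
    return f"{base_url}/v2.0/metrics/measurements?{urlencode(params)}"


url = measurements_url(
    "http://monasca.example.com:8070",  # hypothetical endpoint
    "cpu.user_perc",
    "2016-11-01T00:00:00Z",
    dimensions={"service": "monitoring", "hostname": "devstack"},
    limit=100,
    merge_metrics=True,
)
query = parse_qs(urlsplit(url).query)
```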

Statistics

GET /v2.0/metrics/statistics

• Query parameters
  • Name and dimensions to filter by
  • start_time and end_time
  • Statistics: avg, min, max, sum and count
  • Period: the time period to aggregate measurements by
  • Offset, limit
  • merge_metrics: allows multiple metrics to be combined into a single list of statistics
  • group_by: list of columns to group the returned metrics by. Allows multiple unique metrics to be returned in a single query
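
To make the period parameter concrete, here is a small local sketch of period-based aggregation over (timestamp, value) pairs. It only illustrates the idea of bucketing measurements by period; it is not the service's implementation, and the function name aggregate is made up for this example.

```python
from collections import defaultdict


def aggregate(measurements, period):
    """Group (timestamp_seconds, value) pairs into period-sized buckets."""
    buckets = defaultdict(list)
    for timestamp, value in measurements:
        buckets[timestamp - timestamp % period].append(value)
    rows = []
    for start in sorted(buckets):
        values = buckets[start]
        rows.append({
            "start": start,
            "avg": sum(values) / len(values),
            "min": min(values),
            "max": max(values),
            "sum": sum(values),
            "count": len(values),
        })
    return rows


samples = [(0, 10.0), (30, 20.0), (60, 40.0), (90, 60.0), (120, 5.0)]
stats = aggregate(samples, period=60)
```

With a 60-second period the five samples collapse into three rows, one per minute.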

Metric Names

GET /v2.0/metrics/names

• Returns a list of the unique metric names
• Query parameters
  • Dimensions
  • Offset, limit

Metric Dimension Names

GET /v2.0/metrics/dimensions/names

• List the dimension names
• Query parameters
  • Metric name
  • Offset, limit

Metric Dimension Values

GET /v2.0/metrics/dimensions/names/values

• List the dimension values
• Query parameters
  • Metric name
  • Dimension name
  • Offset, limit

Alarm Definitions

POST, GET /v2.0/alarm-definitions

• Alarm definitions are templates that are used to automatically and dynamically create alarms based on matching metric names and dimensions
• One alarm definition can result in zero or more alarms
• Simple grammar for creating compound alarm expressions
  • avg(cpu.user_perc) > 85 or avg(disk.read_ops{device=vda}, 120) > 1000
• Alarm states (OK, ALARM and UNDETERMINED)
• Actions associated with alarms for state transitions
• User-assigned severity (LOW, MEDIUM, HIGH, CRITICAL)
• Thresholds can be dynamically adjusted via PATCH
• Minimal lifecycle management: alarm_lifecycle_state and link
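
A sketch of what an alarm-definition request body might look like, followed by a PATCH body that raises one threshold. The field names (match_by, alarm_actions) and the notification method ID are assumptions for illustration; the expression follows the compound grammar described above.

```python
import json

SEVERITIES = {"LOW", "MEDIUM", "HIGH", "CRITICAL"}


def alarm_definition(name, expression, severity="LOW", match_by=(), alarm_actions=()):
    """Build the JSON body for POST /v2.0/alarm-definitions (field names assumed)."""
    if severity not in SEVERITIES:
        raise ValueError(f"unknown severity: {severity}")
    return {
        "name": name,
        "expression": expression,
        "severity": severity,
        "match_by": list(match_by),            # one alarm per distinct value
        "alarm_actions": list(alarm_actions),  # notification method IDs
    }


definition = alarm_definition(
    "High CPU or heavy disk reads",
    "avg(cpu.user_perc) > 85 or avg(disk.read_ops{device=vda}, 120) > 1000",
    severity="HIGH",
    match_by=["hostname"],
    alarm_actions=["hypothetical-notification-method-id"],
)
post_body = json.dumps(definition)

# Adjusting a threshold later only needs the changed field in a PATCH body.
patch_body = json.dumps(
    {"expression": definition["expression"].replace("> 85", "> 90")}
)
```

Because match_by lists hostname, a single definition like this would be expected to yield one alarm per host that reports matching metrics.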

List Alarms

GET /v2.0/alarms

Query parameters
• metric_name: name of metric to filter by
• metric_dimensions
• state: OK, ALARM or UNDETERMINED
• severity: one or more severities to filter by, separated with |, e.g. severity=LOW|MEDIUM
• state_updated_start_time: the start time in ISO 8601 combined date and time format in UTC
• Offset, limit
• sort_by
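
The severity and timestamp parameters are the two that need some care when building the query string. The sketch below joins severities with | and formats the start time as ISO 8601 in UTC; the host and the helper name list_alarms_url are placeholders for this example.

```python
from datetime import datetime, timezone
from urllib.parse import parse_qs, urlencode, urlsplit


def list_alarms_url(base_url, state=None, severities=(), updated_since=None, limit=None):
    """Build a GET /v2.0/alarms URL from the filter parameters."""
    params = {}
    if state:
        params["state"] = state
    if severities:
        params["severity"] = "|".join(severities)
    if updated_since is not None:
        # ISO 8601 combined date and time, normalised to UTC.
        params["state_updated_start_time"] = updated_since.astimezone(
            timezone.utc
        ).strftime("%Y-%m-%dT%H:%M:%SZ")
    if limit is not None:
        params["limit"] = limit
    return f"{base_url}/v2.0/alarms?{urlencode(params)}"


alarms_url = list_alarms_url(
    "http://monasca.example.com:8070",  # hypothetical endpoint
    state="ALARM",
    severities=["LOW", "MEDIUM"],
    updated_since=datetime(2016, 11, 1, 12, 0, tzinfo=timezone.utc),
)
alarms_query = parse_qs(urlsplit(alarms_url).query)
```

Note that urlencode percent-encodes the | separator as %7C, which decodes back to LOW|MEDIUM on the server side.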

Alarms

GET, PUT, PATCH, DELETE /v2.0/alarms/{alarm-id}

• Alarms are created by the Threshold Engine based on matching alarm definitions
• When new nodes or components are deployed, alarms are automatically created
• Alarms are resources within Monasca. They have a resource ID and lifecycle
• By default, three states: OK, ALARM and UNDETERMINED
  • UNDETERMINED state occurs when metrics are no longer being received
• Deterministic alarms: two states, OK and ALARM
  • Used for systems where metrics are sporadic, e.g. creating metrics when errors in log files occur and no metrics when there aren't any errors
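
The difference between the default and deterministic behaviour can be sketched as a small state function. This is a simplified illustration for a single avg(...) > threshold sub-expression, not the Threshold Engine's actual logic.

```python
def evaluate(values, threshold, deterministic=False):
    """Return the alarm state for one evaluation window of measurements."""
    if not values:
        # No data: a deterministic alarm stays OK, a default alarm
        # becomes UNDETERMINED.
        return "OK" if deterministic else "UNDETERMINED"
    average = sum(values) / len(values)
    return "ALARM" if average > threshold else "OK"


default_no_data = evaluate([], threshold=85)
deterministic_no_data = evaluate([], threshold=0, deterministic=True)
error_seen = evaluate([1.0], threshold=0, deterministic=True)
```

For a log-error metric that is only emitted when errors occur, the deterministic variant avoids flapping into UNDETERMINED during quiet periods.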

Alarm Counts

GET /v2.0/alarms/count

• Query the total number of alarms in the OK, ALARM or UNDETERMINED state and their severities, grouped by metric dimensions such as OpenStack service, state and severity
• Used for summary dashboards

Example: Helion Ops Console
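
The kind of grouping a summary dashboard relies on can be sketched locally with a counter keyed by the group-by fields. The alarm records and the count_alarms helper below are invented for illustration and do not reflect the API's response format.

```python
from collections import Counter


def count_alarms(alarms, group_by=("service", "state", "severity")):
    """Count alarms per combination of the group-by fields."""
    return Counter(tuple(alarm[field] for field in group_by) for alarm in alarms)


alarms = [
    {"service": "compute", "state": "ALARM", "severity": "HIGH"},
    {"service": "compute", "state": "ALARM", "severity": "HIGH"},
    {"service": "compute", "state": "OK", "severity": "LOW"},
    {"service": "monitoring", "state": "UNDETERMINED", "severity": "MEDIUM"},
]
counts = count_alarms(alarms)
by_state = count_alarms(alarms, group_by=("state",))
```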

Alarm History

GET /v2.0/alarms/state-history

• Lists the alarm state history for alarms
• Query parameters
  • Dimensions to filter on
  • Start/end timestamp
  • Offset, limit

GET /v2.0/alarms/{alarm-id}/state-history

• Lists the alarm state history for a specific alarm

Notification Methods

POST, GET, DELETE /v2.0/notification-methods

Notification methods are associated with Actions in alarm definitions

Example

POST /v2.0/notification-methods

{
  "name": "Name of notification method",
  "type": "EMAIL",
  "address": "john.doe@hp.com"
}
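
A short sketch of building that request body in code. The set of accepted types and the address are assumptions for this example; only EMAIL appears in the slide.

```python
import json

# Assumed set of types; only EMAIL is shown in the slide.
SUPPORTED_TYPES = {"EMAIL", "WEBHOOK", "PAGERDUTY"}


def notification_method(name, method_type, address):
    """Build the JSON body for POST /v2.0/notification-methods."""
    if method_type not in SUPPORTED_TYPES:
        raise ValueError(f"unsupported notification type: {method_type}")
    return {"name": name, "type": method_type, "address": address}


notification_body = json.dumps(
    notification_method("Ops mailing list", "EMAIL", "ops@example.com")
)
```

The ID returned when the method is created is what an alarm definition would then reference in its actions.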

Monasca Agent

• System metrics (cpu, memory, network, filesystem, …)
• Service metrics
  • MySQL, Kafka and many others
• Application metrics
  • Built-in Statsd daemon
  • Python monasca-statsd library: adds support for dimensions
• VM system metrics
• Open vSwitch metrics
• Active checks
  • HTTP status checks and response times
  • System up/down checks (ping and ssh)
• Runs any Nagios plugin or check_mk
• Extensible/Pluggable: additional services can be easily added

Agent details

• The Agent Forwarder buffers metrics for a short time to increase the size of the HTTP request body (number of metrics) sent to the Monasca API
• The Agent requests an auth token from the Keystone Identity service, which is supplied on all requests
• The Monasca Agent and API cache auth tokens in memory to reduce the round-trip authorization requests to Keystone
• If network connectivity between the Agent and API is lost, the Agent will buffer metrics and send them when connectivity is restored
• Metrics are submitted using an "agent" role, which only allows metrics to be POSTed to the metrics endpoint
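
The buffering behaviour described above can be sketched as a small class that batches metrics and keeps them when a send fails. The class name, the batch-size trigger and the fake transport are all assumptions for this example; the real Forwarder may flush on a timer and differ in other ways.

```python
class Forwarder:
    """Batch metrics and retain them while the API is unreachable."""

    def __init__(self, send, max_batch=2):
        self._send = send            # callable taking a list of metrics
        self._max_batch = max_batch
        self._buffer = []

    @property
    def pending(self):
        return len(self._buffer)

    def submit(self, metric):
        self._buffer.append(metric)
        if len(self._buffer) >= self._max_batch:
            self.flush()

    def flush(self):
        if not self._buffer:
            return True
        try:
            self._send(list(self._buffer))
        except ConnectionError:
            return False             # keep everything for the next attempt
        self._buffer.clear()
        return True


link = {"up": False}
sent_batches = []


def fake_send(batch):
    """Stand-in for the HTTP POST; fails while the link is down."""
    if not link["up"]:
        raise ConnectionError("API unreachable")
    sent_batches.append(batch)


forwarder = Forwarder(fake_send, max_batch=2)
forwarder.submit({"name": "m1"})
forwarder.submit({"name": "m2"})   # flush attempted, fails, metrics stay buffered
pending_while_down = forwarder.pending
link["up"] = True
forwarder.submit({"name": "m3"})   # flush succeeds and drains the buffer
```

A production forwarder would also need an upper bound on the buffer so that a long outage cannot exhaust memory.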

Grafana/Monasca Integration

• Datasource: a datasource that can be added to the Grafana dashboard to enable Monasca
  • https://github.com/openstack/monasca-grafana-datasource
• Keystone authentication
  • https://github.com/twc-openstack/grafana
• Support for Alerting will be added in Grafana 4

Grafana Monasca Data Source

Logging Architecture

Logging API

• POST /v3.0/logs
• Batch log messages in a single HTTP request
• Global, local and mixed dimensions
  • Similar to dimensions in metrics
• JSON only
• Specification
  • https://github.com/openstack/monasca-log-api/blob/master/docs/monasca-log-api-spec.md
• Queries are not done via the API but via a tenantized version of Kibana
  • https://github.com/FujitsuEnablingSoftwareTechnologyGmbH/fts-keystone

Log Model

{
  "dimensions": {
    "hostname": "devstack",
    "service": "monitoring",
    "component": "monasca-api"
  },
  "logs": [
    {
      "message": "msg1",
      "dimensions": {
        "service": "compute",
        "component": "nova-api",
        "path": "/var/log/mysql.log"
      }
    },
    {
      "message": "msg2",
      "dimensions": {
        "path": "/var/log/monasca/monasca-api.log"
      }
    }
  ]
}
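
One way to read the global, local and mixed dimensions in this model is that each log entry inherits the global dimensions and can add or override them locally. The sketch below resolves the effective dimensions per entry under that assumption; the override rule and the helper name are not confirmed by the slides, and the specification is authoritative.

```python
def effective_dimensions(payload):
    """Merge global dimensions into each log entry; local values win (assumed)."""
    global_dims = payload.get("dimensions", {})
    return [
        {
            "message": entry["message"],
            "dimensions": {**global_dims, **entry.get("dimensions", {})},
        }
        for entry in payload["logs"]
    ]


payload = {
    "dimensions": {
        "hostname": "devstack",
        "service": "monitoring",
        "component": "monasca-api",
    },
    "logs": [
        {
            "message": "msg1",
            "dimensions": {"service": "compute", "component": "nova-api"},
        },
        {
            "message": "msg2",
            "dimensions": {"path": "/var/log/monasca/monasca-api.log"},
        },
    ],
}
resolved = effective_dimensions(payload)
```

Here msg1 keeps the global hostname but reports service compute, while msg2 keeps all three global dimensions and adds its own path.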

Log Agents

• Logstash
  • https://github.com/logstash-plugins/logstash-output-monasca_log_api/pull/1
• Beaver
  • https://github.com/python-beaver/python-beaver/pull/406
• Logspout: under investigation

Kibana Integration

• Keystone authentication support for Kibana
• Authentication plugin
  • https://github.com/FujitsuEnablingSoftwareTechnologyGmbH/fts-keystone
  • Note: in the process of moving to the official OpenStack repo

Composability: Logging/Metrics

Transform and Analytics Engine

Monasca Transform

• A new micro-service in Monasca that aggregates and transforms metrics
• Currently based on Apache Spark Streaming
• Use cases
  • Object Storage Disk Capacity
  • Object Storage Capacity
  • Compute Host Capacity
  • VM Capacity
  • More to come
• Metrics are aggregated and published every hour
• Currently in deployment in HPE Helion OpenStack 4.0
• OpenStack project/repo
  • https://github.com/openstack/monasca-transform
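
To give a feel for an hourly capacity roll-up, here is a plain-Python sketch that takes per-host samples, keeps the latest value per host within each hour and sums across hosts. It stands in for the idea only; the actual service runs on Spark Streaming, and the function and sample data are invented for this example.

```python
from collections import defaultdict

HOUR = 3600


def hourly_totals(samples):
    """samples: iterable of (timestamp_seconds, host, value) tuples."""
    latest = defaultdict(dict)  # hour start -> host -> (timestamp, value)
    for timestamp, host, value in samples:
        hour = timestamp - timestamp % HOUR
        previous = latest[hour].get(host)
        if previous is None or timestamp >= previous[0]:
            latest[hour][host] = (timestamp, value)
    return {
        hour: sum(value for _, value in hosts.values())
        for hour, hosts in sorted(latest.items())
    }


capacity_samples = [
    (10, "host-a", 100.0),
    (1800, "host-a", 90.0),
    (20, "host-b", 50.0),
    (3700, "host-a", 80.0),
]
totals = hourly_totals(capacity_samples)
```

Taking the latest sample per host, rather than summing every sample, avoids double counting a gauge-style capacity metric that is reported many times per hour.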

Monasca Analytics

• A framework that adds data science tools (parsers, algorithms, etc.)
• Features include
  • Algorithmic flow definition, enabling sharing of complex algorithmic recipes
  • Thin orchestration layer that instantiates an execution environment
• Focused on
  • Anomaly detection
  • Reducing alert fatigue via alarm clustering (unsupervised machine learning)
• Example algorithms: One-Class SVM and LiNGAM
• Status: Under Development
• OpenStack project/repo
  • https://github.com/openstack/monasca-analytics

Distributions & Deployments

• Charter Communications
  • Monasca and Grafana are currently deployed in a production private cloud
  • Monitoring-as-a-Service use cases supported with Grafana as the visualization dashboard
  • 2 datacenters, 600-700 compute nodes, 1000 VMs, 11000 metrics/sec
• FIWARE Lab
  • http://superuser.openstack.org/articles/monitoring-a-multi-region-cloud-based-on-openstack
• Hewlett Packard Enterprise Cloud System, Helion OpenStack
  • Supported and tested up to 65K metrics/sec ingest rates
• Fujitsu
  • FUJITSU Software ServerView Cloud Monitoring Manager
• NEC
  • Planning to include Monasca in Cloud Solution Menus solution
• Others

Statistics: Mitaka/Newton Release

• Organizations: 31
• Contributors: 97
• Commits: 1075
• Reviews: 4080
• Lines of code: 215370

Ecosystem

• Hewlett Packard Enterprise
• Fujitsu
• Charter Communications
• NEC
• Cisco
• Cloudbase Solutions
• SUSE
• SolidFire
• SAP
• Cray Inc
• FIWARE Lab
• Mirantis
• Broadcom

Containers and Kubernetes

• New Monasca Agent plugins
  • Docker plugin
  • cAdvisor plugin
  • Kubernetes plugin: monitors both the Kubernetes control plane and containers
  • Prometheus client plugin: scrapes apps
  • Mesos plugin
• Containerization of Monasca
• Heapster Monasca data sink
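
As an illustration of what scraping an app involves, the sketch below parses a few lines of the Prometheus text exposition format into name, dimensions and value records, mapping labels to dimensions. It is a simplified parser written for this example (timestamps are ignored and escaped label values are left as-is), not the agent plugin's code.

```python
import re

SAMPLE_LINE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
LABEL = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_exposition(text):
    """Turn Prometheus text-format samples into metric dictionaries."""
    metrics = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and HELP/TYPE comments
        match = SAMPLE_LINE.match(line)
        if match is None:
            continue
        name, labels, value = match.groups()
        metrics.append({
            "name": name,
            "dimensions": dict(LABEL.findall(labels or "")),
            "value": float(value),
        })
    return metrics


scraped = parse_exposition('''# HELP http_requests_total Total requests.
# TYPE http_requests_total counter
http_requests_total{method="get",code="200"} 1027
process_open_fds 12
''')
```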

Next Steps

• Containerizing Monasca
• Monitoring containers and container managers such as Kubernetes
• Grouping notifications



Page 26: Monasca - NETWAYS...... What every software engineer should know about real-time data's unifying ... systems/log-what-every-software-engineer-should-know-about

Statistics

GET v20metricsstatistics

bull Query parametersbull Name and dimensions to filter bybull Start_time and end_timebull Statistics avg min max sum and countbull Period The time period to aggregate measurements bybull Offset limitbull merge_metrics allow multiple metrics to be combined into a single list

of statisticsbull group_by list of columns to group the metrics to be returned Allows

multiple unique metrics to be returned in a single query

Metrics Names

GET v20metricsnames

bull Returns a list of the unique metric names

bull Query parametersbull Dimensions

bull Offset limit

Metric Dimension Names

GET v20metricsdimensionsnames

bull List the dimension names

bull Query parametersbull Metric name

bull Offset limit

Metric Dimension Values

GET v20metricsdimensionsnamesvalues

bull List the dimension values

bull Query parametersbull Metric name

bull Dimension name

bull Offset limit

Alarm Definitions

POST GET v20alarm-definitions

bull Alarm definitions are templates that are used to automatically and dynamically create alarms based on matching metric names and dimensions

bull One alarm definition can result in zero or more alarms

bull Simple grammar for creating compound alarm expressionsbull avg(cpuuser_perc) gt 85 or avg(diskread_opsdevice=vda 120) gt 1000

bull Alarm states (OK ALARM and UNDETERMINED)

bull Actions associated with alarms for state transitions

bull User assigned severity (LOW MEDIUM HIGH CRITICAL)

bull Thresholds can be dynamically adjusted via PATCH

bull Minimal lifecycle management alarm_lifecycle_state and link

List Alarms

GET v20alarmsQuery parametersbull metric_name - Name of metric to filter bybull metric_dimensionsbull State OK ALARM or UNDETERMINEDbull Severity One or more severities to filter by separated with |

ex severity=LOW|MEDIUMbull state_updated_start_time The start time in ISO 8601 combined date and

time format in UTCbull Offset limitbull sort_by

Alarms

GET PUT PATCH DELETE v20alarmsalarm-id

bull Alarms created by the Threshold Engine based on matching alarm definitions

bull When new nodes or components are deployed alarms are automatically created

bull Alarms are resources within Monasca They have a resource ID and lifecycle

bull By default three states OK ALARM and UNDETERMINEDbull UNDETERMINED state occurs when metrics are no longer being received

bull Deterministic alarms two states OK and ALARMbull Used for systems where metrics are sporadic Eg Creating metrics when errors in log

files occur and no metrics when there arent any errors

Alarm Counts

GET v20alarmscount

bull Query the total number of alarms in the OK ALARM or UNDETERMINED state and their severities grouped by metrics dimension such as OpenStack service state and severity

bull Used for summary dashboards

Example Helion Ops Console

Alarm History

GET v20alarmsstate-history

bull Lists the alarm state history for alarms

bull Query Parametersbull Dimensions to filter on

bull Startend timestamp

bull Offset limit

GET v20alarmsalarm-idstate-history

bull Lists the alarm state history for a specific alarm

Notification Methods

POST GET DELETE v20notification-methods

Notification methods are associated with Actions in alarm definitions

Example

POST v20notification-methods

nameName of notification method

typeEMAIL

addressjohndoehpcom

Monasca Agent

bull System metrics (cpu memory network filesystem hellip)

bull Service metricsbull MySQL Kafka and many others

bull Application metricsbull Built-in Statsd daemonbull Python monasca-statsd library Adds support for dimensions

bull VM system metrics

bull Open vSwitch metrics

bull Active checksbull HTTP status checks and response timesbull System updown checks (ping and ssh)

bull Runs any Nagios plugin or check_mk

bull ExtensiblePluggable Additional services can be easily added

Agent details

• The Agent Forwarder buffers metrics for a short time to increase the size of the HTTP request body (the number of metrics) sent to the Monasca API
• The Agent requests an auth token from the Keystone Identity service, which is supplied on all requests
• The Monasca Agent and API cache auth tokens in memory to reduce round-trip authorization requests to Keystone
• If network connectivity between the Agent and API is lost, the Agent buffers metrics and sends them when connectivity is restored
• Metrics are submitted using an "agent" role, which only allows metrics to be POST'd to the metrics endpoint
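The batching and retry behaviour described above can be illustrated with a small buffer class. This is a sketch of the idea, not the Agent Forwarder's actual code: `MetricBuffer`, its parameters, and the `send` callback (which stands in for an HTTP POST to the metrics endpoint and returns `True` on success) are all hypothetical.

```python
from collections import deque
from itertools import islice
from typing import Callable, Deque, List


class MetricBuffer:
    """Collect metrics into batches and keep them while the API is unreachable."""

    def __init__(self, send: Callable[[List[dict]], bool],
                 batch_size: int = 50, max_buffered: int = 10000) -> None:
        self._send = send
        self._batch_size = batch_size
        # When the buffer is full, the oldest metrics are dropped first.
        self._pending: Deque[dict] = deque(maxlen=max_buffered)

    def add(self, metric: dict) -> None:
        self._pending.append(metric)
        if len(self._pending) >= self._batch_size:
            self.flush()

    def flush(self) -> None:
        """Send pending metrics batch by batch; stop at the first failure."""
        while self._pending:
            batch = list(islice(self._pending, self._batch_size))
            if not self._send(batch):
                return  # keep everything; retry on the next flush
            for _ in batch:
                self._pending.popleft()

    def __len__(self) -> int:
        return len(self._pending)
```

Metrics are removed from the buffer only after `send` reports success, so a connectivity loss delays delivery rather than discarding data, up to the `max_buffered` limit.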

Grafana/Monasca Integration

• Data source: a data source that can be added to the Grafana dashboard to enable Monasca
  • https://github.com/openstack/monasca-grafana-datasource
• Keystone authentication
  • https://github.com/twc-openstack/grafana
• Support for alerting will be added in Grafana 4

Grafana Monasca Data Source

Logging Architecture

Logging API

• POST /v3.0/logs
• Batch log messages in a single HTTP request
• Global, local and mixed dimensions
  • Similar to dimensions in metrics
• JSON only
• Specification
  • https://github.com/openstack/monasca-log-api/blob/master/docs/monasca-log-api-spec.md
• Queries are not done via the API but via a tenantized version of Kibana
  • https://github.com/FujitsuEnablingSoftwareTechnologyGmbH/fts-keystone

Log Model

{
  "dimensions": {
    "hostname": "devstack",
    "service": "monitoring",
    "component": "monasca-api"
  },
  "logs": [
    {
      "message": "msg1",
      "dimensions": {
        "service": "compute",
        "component": "nova-api",
        "path": "/var/log/mysql.log"
      }
    },
    {
      "message": "msg2",
      "dimensions": {
        "path": "/var/log/monasca/monasca-api.log"
      }
    }
  ]
}
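One way to read the model above is that each log entry ends up with the global dimensions plus its own local ones. The sketch below shows that merge on the slide's example. It assumes that a local value wins when the same key appears at both levels; the slide does not state the precedence, and `effective_dimensions` is a hypothetical helper, not part of the log API.

```python
from typing import Any, Dict, List

# The slide's example payload, reconstructed as a Python dict.
EXAMPLE_PAYLOAD: Dict[str, Any] = {
    "dimensions": {
        "hostname": "devstack",
        "service": "monitoring",
        "component": "monasca-api",
    },
    "logs": [
        {
            "message": "msg1",
            "dimensions": {
                "service": "compute",
                "component": "nova-api",
                "path": "/var/log/mysql.log",
            },
        },
        {
            "message": "msg2",
            "dimensions": {"path": "/var/log/monasca/monasca-api.log"},
        },
    ],
}


def effective_dimensions(payload: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Combine global and per-entry dimensions for every log entry.

    Assumption: on a key clash the local (per-entry) value overrides
    the global one.
    """
    global_dims = payload.get("dimensions", {})
    return [
        {
            "message": entry["message"],
            "dimensions": {**global_dims, **entry.get("dimensions", {})},
        }
        for entry in payload.get("logs", [])
    ]


merged = effective_dimensions(EXAMPLE_PAYLOAD)
```

Under that assumption, `msg1` keeps the global `hostname` but is tagged with `service` "compute", while `msg2` inherits all three global dimensions and adds only its `path`.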

Log Agents

• Logstash
  • https://github.com/logstash-plugins/logstash-output-monasca_log_api/pull/1
• Beaver
  • https://github.com/python-beaver/python-beaver/pull/406
• Logspout: under investigation

Kibana Integration

• Keystone authentication support for Kibana
• Authentication plugin
  • https://github.com/FujitsuEnablingSoftwareTechnologyGmbH/fts-keystone
• Note: in the process of moving to an official OpenStack repo

Composability: Logging/Metrics

Transform and Analytics Engine

Monasca Transform

• A new microservice in Monasca that aggregates and transforms metrics
• Currently based on Apache Spark Streaming
• Use cases
  • Object Storage Disk Capacity
  • Object Storage Capacity
  • Compute Host Capacity
  • VM Capacity
  • More to come
• Metrics are aggregated and published every hour
• Currently in deployment in HPE Helion OpenStack 4.0
• OpenStack project/repo
  • https://github.com/openstack/monasca-transform
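To make the hourly aggregation concrete, here is a plain-Python stand-in that groups measurements into one-hour buckets and averages them. It is not Spark Streaming and not Monasca Transform's code; the function name, the `(name, timestamp, value)` tuple layout, the metric name in the example, and the choice of an average as the aggregate are assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

SECONDS_PER_HOUR = 3600


def aggregate_hourly(
    measurements: Iterable[Tuple[str, int, float]]
) -> Dict[Tuple[str, int], float]:
    """Average measurements per metric name and hour.

    measurements: (metric_name, epoch_seconds, value) tuples.
    Returns a dict keyed by (metric_name, hour_start_epoch_seconds).
    """
    buckets: Dict[Tuple[str, int], List[float]] = defaultdict(list)
    for name, timestamp, value in measurements:
        # Truncate the timestamp to the start of its hour.
        hour_start = timestamp - (timestamp % SECONDS_PER_HOUR)
        buckets[(name, hour_start)].append(value)
    return {key: sum(values) / len(values) for key, values in buckets.items()}


hourly = aggregate_hourly([
    ("vm.cpu.utilization_perc", 0, 1.0),
    ("vm.cpu.utilization_perc", 1800, 3.0),
    ("vm.cpu.utilization_perc", 3600, 5.0),
])
```

A streaming engine would apply the same grouping incrementally over a time window instead of over an in-memory list.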

Monasca Analytics

• A framework that adds data science tools (parsers, algorithms, etc.)
• Features include
  • Algorithmic flow definition, enabling sharing of complex algorithmic recipes
  • Thin orchestration layer that instantiates an execution environment
• Focused on
  • Anomaly detection
  • Reducing alert fatigue via alarm clustering (unsupervised machine learning)
• Example algorithms: One-Class SVM and LiNGAM
• Status: under development
• OpenStack project/repo
  • https://github.com/openstack/monasca-analytics

Distributions & Deployments

• Charter Communications
  • Monasca and Grafana are currently deployed in a production private cloud
  • Monitoring-as-a-Service use cases supported, with Grafana as the visualization dashboard
  • 2 datacenters, 600-700 compute nodes, 1000 VMs, 11000 metrics/sec
• FIWARE Lab
  • http://superuser.openstack.org/articles/monitoring-a-multi-region-cloud-based-on-openstack
• Hewlett Packard Enterprise Cloud System, Helion OpenStack
  • Supported and tested up to 65K metrics/sec ingest rates
• Fujitsu
  • FUJITSU Software ServerView Cloud Monitoring Manager
• NEC
  • Planning to include Monasca in the Cloud Solution Menus solution
• Others

Statistics: Mitaka/Newton Release

• Organizations: 31
• Contributors: 97
• Commits: 1075
• Reviews: 4080
• Lines of code: 215370

Ecosystem

• Hewlett Packard Enterprise
• Fujitsu
• Charter Communications
• NEC
• Cisco
• Cloudbase Solutions
• SUSE
• SolidFire
• SAP
• Cray Inc.
• FIWARE Lab
• Mirantis
• Broadcom

Containers and Kubernetes

• New Monasca Agent plugins
  • Docker plugin
  • cAdvisor plugin
  • Kubernetes plugin: monitors both the Kubernetes control plane and containers
  • Prometheus client plugin: scrapes apps
  • Mesos plugin
• Containerization of Monasca
• Heapster Monasca data sink

Next Steps

• Containerizing Monasca
• Monitoring containers and container managers such as Kubernetes
• Grouping notifications


Metrics Names

GET /v2.0/metrics/names

• Returns a list of the unique metric names
• Query parameters
  • Dimensions
  • Offset, limit

Metric Dimension Names

GET /v2.0/metrics/dimensions/names

• Lists the dimension names
• Query parameters
  • Metric name
  • Offset, limit

Metric Dimension Values

GET /v2.0/metrics/dimensions/names/values

• Lists the dimension values
• Query parameters
  • Metric name
  • Dimension name
  • Offset, limit

Alarm Definitions

POST, GET /v2.0/alarm-definitions

• Alarm definitions are templates that are used to automatically and dynamically create alarms based on matching metric names and dimensions
• One alarm definition can result in zero or more alarms
• Simple grammar for creating compound alarm expressions
  • avg(cpu.user_perc) > 85 or avg(disk.read_ops{device=vda}, 120) > 1000
• Alarm states (OK, ALARM and UNDETERMINED)
• Actions associated with alarms for state transitions
• User-assigned severity (LOW, MEDIUM, HIGH, CRITICAL)
• Thresholds can be dynamically adjusted via PATCH
• Minimal lifecycle management: alarm_lifecycle_state and link
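A request body for creating an alarm definition might be put together as in the sketch below. The field names (`name`, `expression`, `severity`, `alarm_actions`) follow common usage of the Monasca API but should be treated as assumptions here, since the slide lists the concepts rather than the JSON schema; the helper name and the notification-method ID are hypothetical.

```python
import json
from typing import Any, Dict, Sequence

SEVERITIES = ("LOW", "MEDIUM", "HIGH", "CRITICAL")


def alarm_definition_body(name: str, expression: str,
                          severity: str = "LOW",
                          alarm_actions: Sequence[str] = ()) -> Dict[str, Any]:
    """Build the JSON body for a new alarm definition.

    alarm_actions: IDs of notification methods to invoke on the
    transition to the ALARM state.
    """
    if severity not in SEVERITIES:
        raise ValueError(f"severity must be one of {SEVERITIES}")
    return {
        "name": name,
        "expression": expression,
        "severity": severity,
        "alarm_actions": list(alarm_actions),
    }


body = alarm_definition_body(
    name="High CPU or disk reads",
    expression="avg(cpu.user_perc) > 85 or "
               "avg(disk.read_ops{device=vda}, 120) > 1000",
    severity="HIGH",
    alarm_actions=["hypothetical-notification-id"],
)
payload = json.dumps(body)
```

Because a definition is a template, this single body could yield one alarm per matching host or device rather than a single alarm.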

List Alarms

GET /v2.0/alarms

Query parameters
• metric_name: name of the metric to filter by
• metric_dimensions
• state: OK, ALARM or UNDETERMINED
• severity: one or more severities to filter by, separated with |, e.g. severity=LOW|MEDIUM
• state_updated_start_time: the start time in ISO 8601 combined date and time format, in UTC
• offset, limit
• sort_by
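The two less obvious parameters, the `|`-separated severity list and the ISO 8601 UTC timestamp, can be produced as in this sketch. The helper name is hypothetical, and the exact timestamp layout (seconds precision with a trailing `Z`) is one valid ISO 8601 form chosen for illustration.

```python
from datetime import datetime, timezone
from typing import Optional, Sequence
from urllib.parse import urlencode


def list_alarms_query(severities: Sequence[str] = (),
                      state: Optional[str] = None,
                      state_updated_start_time: Optional[datetime] = None) -> str:
    """Build the query string for GET /v2.0/alarms.

    state_updated_start_time must be timezone-aware; it is converted to UTC.
    """
    params = {}
    if state:
        params["state"] = state
    if severities:
        # Several severities are joined with "|", e.g. LOW|MEDIUM.
        params["severity"] = "|".join(severities)
    if state_updated_start_time is not None:
        utc_time = state_updated_start_time.astimezone(timezone.utc)
        params["state_updated_start_time"] = utc_time.strftime("%Y-%m-%dT%H:%M:%SZ")
    return urlencode(params)


query = list_alarms_query(
    severities=["LOW", "MEDIUM"],
    state="ALARM",
    state_updated_start_time=datetime(2016, 5, 1, 12, 0, 0, tzinfo=timezone.utc),
)
```

`urlencode` percent-encodes the `|` and `:` characters, which a server decodes back to the original values.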

Alarms

GET, PUT, PATCH, DELETE /v2.0/alarms/{alarm-id}

• Alarms are created by the Threshold Engine based on matching alarm definitions
• When new nodes or components are deployed, alarms are automatically created
• Alarms are resources within Monasca; they have a resource ID and a lifecycle




Page 30: Monasca - NETWAYS...... What every software engineer should know about real-time data's unifying ... systems/log-what-every-software-engineer-should-know-about

Alarm Definitions

POST GET v20alarm-definitions

bull Alarm definitions are templates that are used to automatically and dynamically create alarms based on matching metric names and dimensions

bull One alarm definition can result in zero or more alarms

bull Simple grammar for creating compound alarm expressionsbull avg(cpuuser_perc) gt 85 or avg(diskread_opsdevice=vda 120) gt 1000

bull Alarm states (OK ALARM and UNDETERMINED)

bull Actions associated with alarms for state transitions

bull User assigned severity (LOW MEDIUM HIGH CRITICAL)

bull Thresholds can be dynamically adjusted via PATCH

bull Minimal lifecycle management alarm_lifecycle_state and link

List Alarms

GET v20alarmsQuery parametersbull metric_name - Name of metric to filter bybull metric_dimensionsbull State OK ALARM or UNDETERMINEDbull Severity One or more severities to filter by separated with |

ex severity=LOW|MEDIUMbull state_updated_start_time The start time in ISO 8601 combined date and

time format in UTCbull Offset limitbull sort_by

Alarms

GET PUT PATCH DELETE v20alarmsalarm-id

bull Alarms created by the Threshold Engine based on matching alarm definitions

bull When new nodes or components are deployed alarms are automatically created

bull Alarms are resources within Monasca They have a resource ID and lifecycle

bull By default three states OK ALARM and UNDETERMINEDbull UNDETERMINED state occurs when metrics are no longer being received

bull Deterministic alarms two states OK and ALARMbull Used for systems where metrics are sporadic Eg Creating metrics when errors in log

files occur and no metrics when there arent any errors

Alarm Counts

GET v20alarmscount

bull Query the total number of alarms in the OK ALARM or UNDETERMINED state and their severities grouped by metrics dimension such as OpenStack service state and severity

bull Used for summary dashboards

Example Helion Ops Console

Alarm History

GET v20alarmsstate-history

bull Lists the alarm state history for alarms

bull Query Parametersbull Dimensions to filter on

bull Startend timestamp

bull Offset limit

GET v20alarmsalarm-idstate-history

bull Lists the alarm state history for a specific alarm

Notification Methods

POST GET DELETE v20notification-methods

Notification methods are associated with Actions in alarm definitions

Example

POST v20notification-methods

nameName of notification method

typeEMAIL

addressjohndoehpcom

Monasca Agent

bull System metrics (cpu memory network filesystem hellip)

bull Service metricsbull MySQL Kafka and many others

bull Application metricsbull Built-in Statsd daemonbull Python monasca-statsd library Adds support for dimensions

bull VM system metrics

bull Open vSwitch metrics

bull Active checksbull HTTP status checks and response timesbull System updown checks (ping and ssh)

bull Runs any Nagios plugin or check_mk

bull ExtensiblePluggable Additional services can be easily added

Agent details

• The Agent Forwarder buffers metrics for a short time to increase the size of the HTTP request body (number of metrics) sent to the Monasca API

• The Agent requests an auth token from the Keystone Identity service, which is supplied on all requests

• The Monasca Agent and API cache auth tokens in memory to reduce round-trip authorization requests to Keystone

• If network connectivity between the Agent and API is lost, the Agent will buffer metrics and send them when connectivity is restored

• Metrics are submitted using an "agent" role, which only allows metrics to be POSTed to the metrics endpoint
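
The batching and buffer-on-failure behaviour described above can be sketched as follows. This is a toy model, not the Monasca Agent's Forwarder: the class `BufferingForwarder`, the `send` callback, the batch size and the metric name are all hypothetical.

```python
from typing import Callable, Dict, List

Metric = Dict[str, object]


class BufferingForwarder:
    """Collect metrics into batches and keep them if sending fails."""

    def __init__(self, send: Callable[[List[Metric]], bool], batch_size: int = 2):
        self._send = send              # returns True when the API accepted the batch
        self._batch_size = batch_size
        self._buffer: List[Metric] = []

    def submit(self, metric: Metric) -> None:
        self._buffer.append(metric)
        if len(self._buffer) >= self._batch_size:
            self.flush()

    def flush(self) -> bool:
        if not self._buffer:
            return True
        if self._send(list(self._buffer)):
            self._buffer.clear()
            return True
        return False                   # keep the metrics for the next attempt

    @property
    def pending(self) -> int:
        return len(self._buffer)


# Stand-in for the network: flip link["up"] to simulate connectivity.
link = {"up": False}
sent: List[List[Metric]] = []


def fake_send(batch: List[Metric]) -> bool:
    if link["up"]:
        sent.append(batch)
    return link["up"]
```

Sending one larger request instead of many small ones reduces per-request overhead, and keeping the buffer after a failed send is what lets the Agent catch up once the link returns. A real implementation would also bound the buffer size.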

Grafana/Monasca Integration

• Datasource: a datasource that can be added to the Grafana dashboard to enable Monasca
  • https://github.com/openstack/monasca-grafana-datasource

• Keystone authentication
  • https://github.com/twc-openstack/grafana

• Support for alerting will be added in Grafana 4

Grafana Monasca Data Source

Logging Architecture

Logging API

• POST /v3.0/logs

• Batch log messages in a single HTTP request

• Global, local and mixed dimensions
  • Similar to dimensions in metrics

• JSON only

• Specification
  • https://github.com/openstack/monasca-log-api/blob/master/docs/monasca-log-api-spec.md

• Queries are not done via the API but via a tenantized version of Kibana
  • https://github.com/FujitsuEnablingSoftwareTechnologyGmbH/fts-keystone

Log Model

{
  "dimensions": {
    "hostname": "devstack",
    "service": "monitoring",
    "component": "monasca-api"
  },
  "logs": [
    {
      "message": "msg1",
      "dimensions": {
        "service": "compute",
        "component": "nova-api",
        "path": "/var/log/mysql.log"
      }
    },
    {
      "message": "msg2",
      "dimensions": {
        "path": "/var/log/monasca/monasca-api.log"
      }
    }
  ]
}
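
One way to read the log model is that each log entry ends up with the global dimensions plus its own. The sketch below shows that merge under the assumption that local dimensions override global ones with the same key; the function name `expand_logs` is hypothetical and the precedence rule should be checked against the log API specification.

```python
def expand_logs(request_body):
    """Return one record per log entry with global and local dimensions merged."""
    global_dims = request_body.get("dimensions", {})
    expanded = []
    for entry in request_body.get("logs", []):
        dims = dict(global_dims)
        dims.update(entry.get("dimensions", {}))   # assumed: local wins on conflict
        expanded.append({"message": entry["message"], "dimensions": dims})
    return expanded


body = {
    "dimensions": {"hostname": "devstack", "service": "monitoring",
                   "component": "monasca-api"},
    "logs": [
        {"message": "msg1",
         "dimensions": {"service": "compute", "component": "nova-api",
                        "path": "/var/log/mysql.log"}},
        {"message": "msg2",
         "dimensions": {"path": "/var/log/monasca/monasca-api.log"}},
    ],
}

records = expand_logs(body)
```

Here `msg1` keeps the global `hostname` but replaces `service` and `component`, while `msg2` inherits all three global dimensions and only adds a `path`.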

Log Agents

• Logstash
  • https://github.com/logstash-plugins/logstash-output-monasca_log_api/pull/1

• Beaver
  • https://github.com/python-beaver/python-beaver/pull/406

• Logspout: under investigation

Kibana Integration

• Keystone authentication support for Kibana

• Authentication plugin
  • https://github.com/FujitsuEnablingSoftwareTechnologyGmbH/fts-keystone

• Note: in the process of moving to the official OpenStack repo

Composability: Logging/Metrics

Transform and Analytics Engine

Monasca Transform

• A new microservice in Monasca that aggregates and transforms metrics

• Currently based on Apache Spark Streaming

• Use cases
  • Object Storage Disk Capacity
  • Object Storage Capacity
  • Compute Host Capacity
  • VM Capacity
  • More to come

• Metrics are aggregated and published every hour

• Currently in deployment in HPE Helion OpenStack 4.0

• OpenStack project/repo
  • https://github.com/openstack/monasca-transform
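
The idea of hourly aggregation can be shown without Spark. The sketch below is plain Python standing in for the streaming job, not monasca-transform's implementation; the metric name, the tuple layout and the use of a sum as the aggregation function are assumptions for illustration.

```python
from collections import defaultdict


def aggregate_hourly(measurements):
    """Sum values per (metric name, start of hour).

    measurements: iterable of (name, unix_timestamp_seconds, value).
    """
    totals = defaultdict(float)
    for name, timestamp, value in measurements:
        hour_start = timestamp - (timestamp % 3600)   # truncate to the hour
        totals[(name, hour_start)] += value
    return dict(totals)


samples = [
    ("vm.mem.total_mb", 3600, 2048.0),   # hypothetical metric name
    ("vm.mem.total_mb", 3700, 1024.0),
    ("vm.mem.total_mb", 7300, 512.0),
]
hourly = aggregate_hourly(samples)
```

Each resulting entry corresponds to one aggregated metric that would be published once the hour closes.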

Monasca Analytics

• A framework that adds data science tools (parsers, algorithms, etc.)

• Features include
  • Algorithmic flow definition, enabling sharing of complex algorithmic recipes
  • Thin orchestration layer that instantiates an execution environment

• Focused on
  • Anomaly detection
  • Reducing alert fatigue via alarm clustering (unsupervised machine learning)

• Example algorithms: One-Class SVM and LiNGAM

• Status: under development

• OpenStack project/repo
  • https://github.com/openstack/monasca-analytics

Distributions & Deployments

• Charter Communications
  • Monasca and Grafana are currently deployed in a production private cloud
  • Monitoring-as-a-Service use cases supported with Grafana as the visualization dashboard
  • 2 datacenters, 600-700 compute nodes, 1000 VMs, 11000 metrics/sec

• FIWARE Lab
  • http://superuser.openstack.org/articles/monitoring-a-multi-region-cloud-based-on-openstack

• Hewlett Packard Enterprise Cloud System, Helion OpenStack
  • Supported and tested up to 65K metrics/sec ingest rates

• Fujitsu
  • FUJITSU Software ServerView Cloud Monitoring Manager

• NEC
  • Planning to include Monasca in the Cloud Solution Menus solution

• Others

Statistics: Mitaka/Newton Release

• Organizations: 31

• Contributors: 97

• Commits: 1075

• Reviews: 4080

• Lines of code: 215370

Ecosystem

• Hewlett Packard Enterprise

• Fujitsu

• Charter Communications

• NEC

• Cisco

• Cloudbase Solutions

• SUSE

• SolidFire

• SAP

• Cray Inc.

• FIWARE Lab

• Mirantis

• Broadcom

Containers and Kubernetes

• New Monasca Agent plugins
  • Docker plugin
  • cAdvisor plugin
  • Kubernetes plugin: monitors both the Kubernetes control plane and containers
  • Prometheus client plugin: scrapes apps
  • Mesos plugin

• Containerization of Monasca

• Heapster Monasca data sink

Next Steps

• Containerizing Monasca

• Monitoring containers and container managers such as Kubernetes

• Grouping notifications

Page 35: Monasca - NETWAYS...... What every software engineer should know about real-time data's unifying ... systems/log-what-every-software-engineer-should-know-about

Alarm History

GET v20alarmsstate-history

bull Lists the alarm state history for alarms

bull Query Parametersbull Dimensions to filter on

bull Startend timestamp

bull Offset limit

GET v20alarmsalarm-idstate-history

bull Lists the alarm state history for a specific alarm

Notification Methods

POST GET DELETE v20notification-methods

Notification methods are associated with Actions in alarm definitions

Example

POST v20notification-methods

nameName of notification method

typeEMAIL

addressjohndoehpcom

Monasca Agent

bull System metrics (cpu memory network filesystem hellip)

bull Service metricsbull MySQL Kafka and many others

bull Application metricsbull Built-in Statsd daemonbull Python monasca-statsd library Adds support for dimensions

bull VM system metrics

bull Open vSwitch metrics

bull Active checksbull HTTP status checks and response timesbull System updown checks (ping and ssh)

bull Runs any Nagios plugin or check_mk

bull ExtensiblePluggable Additional services can be easily added

Agent details

• The Agent Forwarder buffers metrics for a short time to increase the size of the HTTP request body (number of metrics) sent to the Monasca API

• The Agent requests an auth token from the Keystone Identity service, which is supplied on all requests

• The Monasca Agent and API cache auth tokens in memory to reduce round-trip authorization requests to Keystone

• If network connectivity between the Agent and API is lost, the Agent will buffer metrics and send them when connectivity is restored

• Metrics are submitted using an "agent" role, which only allows metrics to be POST'd to the metrics endpoint
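
The buffering behaviour described above can be sketched roughly as follows. This is not the Agent Forwarder's actual code: the class name, the batch size and wait time, and the `send` callback contract are all hypothetical, chosen only to show batching plus retention of metrics while the API is unreachable.

```python
import time


class MetricBuffer:
    """Collects metrics and hands them to `send` in batches (illustrative only)."""

    def __init__(self, send, max_batch=100, max_wait_seconds=10.0,
                 clock=time.monotonic):
        self._send = send          # callable(list_of_metrics) -> True on success
        self._max_batch = max_batch
        self._max_wait = max_wait_seconds
        self._clock = clock
        self._pending = []
        self._oldest = None        # arrival time of the oldest pending metric

    def add(self, metric):
        if not self._pending:
            self._oldest = self._clock()
        self._pending.append(metric)
        if self._should_flush():
            self.flush()

    def _should_flush(self):
        if len(self._pending) >= self._max_batch:
            return True
        return self._clock() - self._oldest >= self._max_wait

    def flush(self):
        """Try to send everything pending; keep it buffered if the send fails."""
        if not self._pending:
            return True
        try:
            ok = self._send(list(self._pending))
        except OSError:
            ok = False             # treat connection errors as "API unreachable"
        if ok:
            self._pending.clear()
            self._oldest = None
        return ok

    @property
    def pending(self):
        return len(self._pending)
```

A production forwarder would also bound the buffer size so that a long outage cannot exhaust memory. That limit is left out here to keep the sketch short.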

Grafana/Monasca Integration

• Datasource: a datasource that can be added to the Grafana dashboard to enable Monasca
  • https://github.com/openstack/monasca-grafana-datasource

• Keystone authentication
  • https://github.com/twc-openstack/grafana

• Support for Alerting will be added in Grafana 4

Grafana Monasca Data Source

Logging Architecture

Logging API

• POST /v3.0/logs

• Batch log messages in a single HTTP request

• Global, local, mixed dimensions
  • Similar to dimensions in metrics

• JSON only

• Specification
  • https://github.com/openstack/monasca-log-api/blob/master/docs/monasca-log-api-spec.md

• Queries not done via API, but via a tenantized version of Kibana
  • https://github.com/FujitsuEnablingSoftwareTechnologyGmbH/fts-keystone

Log Model

{
  "dimensions": {
    "hostname": "devstack",
    "service": "monitoring",
    "component": "monasca-api"
  },
  "logs": [
    {
      "message": "msg1",
      "dimensions": {
        "service": "compute",
        "component": "nova-api",
        "path": "/var/log/mysql.log"
      }
    },
    {
      "message": "msg2",
      "dimensions": {
        "path": "/var/log/monasca/monasca-api.log"
      }
    }
  ]
}
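
To make the global/local dimension model concrete, here is a small sketch that expands a batch payload into one record per log message. The function name is hypothetical, and the rule that a per-log (local) dimension overrides a global one with the same key is an assumption to verify against the log API specification.

```python
def expand_logs(payload):
    """Return one flat record per log entry with global and local dimensions merged."""
    global_dims = payload.get("dimensions", {})
    records = []
    for entry in payload.get("logs", []):
        # Assumption: on a key clash the per-log (local) value wins.
        dims = {**global_dims, **entry.get("dimensions", {})}
        records.append({"message": entry["message"], "dimensions": dims})
    return records


payload = {
    "dimensions": {"hostname": "devstack", "service": "monitoring",
                   "component": "monasca-api"},
    "logs": [
        {"message": "msg1",
         "dimensions": {"service": "compute", "component": "nova-api",
                        "path": "/var/log/mysql.log"}},
        {"message": "msg2",
         "dimensions": {"path": "/var/log/monasca/monasca-api.log"}},
    ],
}

for record in expand_logs(payload):
    print(record)
```

Under that assumption, msg1 ends up tagged with service "compute" while msg2 inherits service "monitoring" from the global dimensions.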

Log Agents

• Logstash
  • https://github.com/logstash-plugins/logstash-output-monasca_log_api/pull/1

• Beaver
  • https://github.com/python-beaver/python-beaver/pull/406

• Logspout: under investigation

Kibana Integration

• Keystone authentication support for Kibana

• Authentication plugin
  • https://github.com/FujitsuEnablingSoftwareTechnologyGmbH/fts-keystone

• Note: in the process of moving to the official OpenStack repo

Composability: Logging/Metrics

Transform and Analytics Engine

Monasca Transform

• A new micro-service in Monasca that aggregates and transforms metrics

• Currently based on Apache Spark Streaming

• Use cases
  • Object Storage Disk Capacity
  • Object Storage Capacity
  • Compute Host Capacity
  • VM Capacity
  • More to come

• Metrics are aggregated and published every hour

• Currently in deployment in HPE Helion OpenStack 4.0

• OpenStack project/repo
  • https://github.com/openstack/monasca-transform
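
The hourly aggregation idea can be illustrated in plain Python. This is not how monasca-transform is implemented (it runs on Apache Spark Streaming). The function, the tuple layout and the metric name below are hypothetical, and summing is just one possible aggregation.

```python
from collections import defaultdict


def aggregate_hourly(measurements):
    """Sum values per (metric name, hour bucket).

    `measurements` is an iterable of (name, unix_timestamp_seconds, value).
    Returns {(name, hour_start_timestamp): total}.
    """
    totals = defaultdict(float)
    for name, timestamp, value in measurements:
        # Truncate the timestamp to the start of its hour.
        hour_start = int(timestamp) - int(timestamp) % 3600
        totals[(name, hour_start)] += value
    return dict(totals)


# Hypothetical metric name and values.
samples = [
    ("disk.used_gb", 0, 10.0),
    ("disk.used_gb", 1800, 5.0),
    ("disk.used_gb", 3600, 2.0),
]
print(aggregate_hourly(samples))
```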

Monasca Analytics

• A framework that adds data science tools (parsers, algorithms, etc.)

• Features include
  • Algorithmic flow definition, enabling sharing of complex algorithmic recipes
  • Thin orchestration layer that instantiates an execution environment

• Focused on
  • Anomaly detection
  • Reducing alert fatigue via alarm clustering (unsupervised machine learning)

• Example algorithms: One-Class SVM and LiNGAM

• Status: under development

• OpenStack project/repo
  • https://github.com/openstack/monasca-analytics

Distributions & Deployments

• Charter Communications
  • Monasca and Grafana are currently deployed in a production private cloud
  • Monitoring-as-a-Service use cases supported, with Grafana as the visualization dashboard
  • 2 datacenters, 600-700 compute nodes, 1000 VMs, 11,000 metrics/sec

• FIWARE Lab
  • http://superuser.openstack.org/articles/monitoring-a-multi-region-cloud-based-on-openstack

• Hewlett Packard Enterprise Cloud System, Helion OpenStack
  • Supported and tested up to 65K metrics/sec ingest rates

• Fujitsu
  • FUJITSU Software ServerView Cloud Monitoring Manager

• NEC
  • Planning to include Monasca in Cloud Solution Menus solution

• Others

Statistics: Mitaka/Newton Release

• Organizations: 31

• Contributors: 97

• Commits: 1075

• Reviews: 4080

• Lines of code: 215370

Ecosystem

• Hewlett Packard Enterprise

• Fujitsu

• Charter Communications

• NEC

• Cisco

• Cloudbase Solutions

• SUSE

• SolidFire

• SAP

• Cray Inc

• FIWARE Lab

• Mirantis

• Broadcom

Containers and Kubernetes

• New Monasca Agent plugins
  • Docker plugin
  • cAdvisor plugin
  • Kubernetes plugin: monitors both the Kubernetes control plane and containers
  • Prometheus client plugin: scrapes apps
  • Mesos plugin

• Containerization of Monasca

• Heapster Monasca data sink

Next Steps

• Containerizing Monasca

• Monitoring containers and container managers such as Kubernetes

• Grouping notifications

Page 42: Monasca - NETWAYS...... What every software engineer should know about real-time data's unifying ... systems/log-what-every-software-engineer-should-know-about

Logging API

• POST /v3.0/logs
• Batch log messages in a single HTTP request
• Global, local, and mixed dimensions
  • Similar to dimensions in metrics
• JSON only
• Specification
  • https://github.com/openstack/monasca-log-api/blob/master/docs/monasca-log-api-spec.md
• Queries not done via the API but via a tenantized version of Kibana
  • https://github.com/FujitsuEnablingSoftwareTechnologyGmbH/fts-keystone
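As a rough illustration of the batch POST described on this slide, the sketch below builds (without sending) a single HTTP request that carries several log messages. The host, port, and token are placeholders, `build_log_request` is a hypothetical helper, and the `X-Auth-Token` header assumes Keystone token authentication; the body layout follows the dimensions/logs model used in this talk, so check the specification before relying on it.

```python
import json
import urllib.request


def build_log_request(base_url, token, global_dimensions, logs):
    """Build (but do not send) one batch POST to /v3.0/logs.

    Hypothetical helper: the field names follow the talk's log model,
    everything else is an assumption.
    """
    body = {"dimensions": global_dimensions, "logs": logs}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v3.0/logs",
        data=json.dumps(body).encode("utf-8"),  # JSON only
        headers={
            "Content-Type": "application/json",
            "X-Auth-Token": token,  # assumed Keystone token header
        },
        method="POST",
    )


req = build_log_request(
    "http://monasca-log-api.example:5607",  # placeholder endpoint
    "KEYSTONE_TOKEN",  # placeholder token
    {"hostname": "devstack", "service": "monitoring"},
    [
        {"message": "msg1", "dimensions": {"component": "nova-api"}},
        {"message": "msg2"},
    ],
)
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` would actually send it; that step is left out so the sketch runs without a Monasca deployment.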

Page 43

Log Model

    {
      "dimensions": {
        "hostname": "devstack",
        "service": "monitoring",
        "component": "monasca-api"
      },
      "logs": [
        {
          "message": "msg1",
          "dimensions": {
            "service": "compute",
            "component": "nova-api",
            "path": "/var/log/mysql.log"
          }
        },
        {
          "message": "msg2",
          "dimensions": {
            "path": "/var/log/monasca/monasca-api.log"
          }
        }
      ]
    }
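One way to read the model above is that each log entry ends up with the global dimensions plus its own. The sketch below works under that assumption; the rule that a local dimension overrides a global one with the same key is an assumption, not something the slide states, and `effective_dimensions` is a hypothetical helper.

```python
def effective_dimensions(payload):
    """Combine global and per-log dimensions for every log entry.

    Assumption: a local dimension overrides a global one with the same key.
    """
    global_dims = payload.get("dimensions", {})
    combined = []
    for entry in payload.get("logs", []):
        dims = dict(global_dims)  # start from the global dimensions
        dims.update(entry.get("dimensions", {}))  # local values win
        combined.append({"message": entry["message"], "dimensions": dims})
    return combined


payload = {
    "dimensions": {
        "hostname": "devstack",
        "service": "monitoring",
        "component": "monasca-api",
    },
    "logs": [
        {
            "message": "msg1",
            "dimensions": {
                "service": "compute",
                "component": "nova-api",
                "path": "/var/log/mysql.log",
            },
        },
        {
            "message": "msg2",
            "dimensions": {"path": "/var/log/monasca/monasca-api.log"},
        },
    ],
}

for log in effective_dimensions(payload):
    print(log["message"], log["dimensions"])
```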

Page 44

Log Agents

• Logstash
  • https://github.com/logstash-plugins/logstash-output-monasca_log_api/pull/1
• Beaver
  • https://github.com/python-beaver/python-beaver/pull/406
• Logspout: under investigation

Page 45

Kibana Integration

• Keystone authentication support for Kibana
• Authentication plugin
  • https://github.com/FujitsuEnablingSoftwareTechnologyGmbH/fts-keystone
• Note: in the process of moving to an official OpenStack repo

Page 46

Composability: Logging/Metrics

Page 47

Transform and Analytics Engine

Page 48

Monasca Transform

• A new micro-service in Monasca that aggregates and transforms metrics
• Currently based on Apache Spark Streaming
• Use cases
  • Object Storage Disk Capacity
  • Object Storage Capacity
  • Compute Host Capacity
  • VM Capacity
  • More to come
• Metrics are aggregated and published every hour
• Currently in deployment in HPE Helion OpenStack 4.0
• OpenStack project/repo
  • https://github.com/openstack/monasca-transform
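To make the hourly aggregation concrete, here is a minimal plain-Python sketch of bucketing raw samples by hour and averaging them. It is not Monasca Transform's code (which the slide says runs on Apache Spark Streaming); the metric name, the choice of a mean, and the `aggregate_hourly` helper are all assumptions made for illustration.

```python
from collections import defaultdict

SECONDS_PER_HOUR = 3600


def aggregate_hourly(samples):
    """Average samples per (metric name, hour).

    samples: iterable of (name, unix_timestamp_seconds, value) tuples.
    Returns a dict keyed by (name, hour_start_timestamp).
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for name, timestamp, value in samples:
        # Round the timestamp down to the start of its hour.
        hour_start = int(timestamp) // SECONDS_PER_HOUR * SECONDS_PER_HOUR
        key = (name, hour_start)
        sums[key] += value
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


samples = [
    ("vm.capacity_demo", 0, 10.0),  # hypothetical metric name
    ("vm.capacity_demo", 1800, 20.0),
    ("vm.capacity_demo", 3600, 30.0),
]
hourly = aggregate_hourly(samples)
print(hourly)
```

In a streaming setup the same bucketing would be applied per batch, with one aggregated metric published for each hour once it closes.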

Page 49

Monasca Analytics

• A framework that adds data science tools (parsers, algorithms, etc.)
• Features include
  • Algorithmic flow definition, enabling sharing of complex algorithmic recipes
  • Thin orchestration layer that instantiates an execution environment
• Focused on
  • Anomaly detection
  • Reducing alert fatigue via alarm clustering (unsupervised machine learning)
• Example algorithms: One-Class SVM and LiNGAM
• Status: under development
• OpenStack project/repo
  • https://github.com/openstack/monasca-analytics

Page 50

Distributions & Deployments

• Charter Communications
  • Monasca and Grafana are currently deployed in a production private cloud
  • Monitoring-as-a-Service use cases supported, with Grafana as the visualization dashboard
  • 2 datacenters, 600-700 compute nodes, 1000 VMs, 11000 metrics/sec
• FIWARE Lab
  • http://superuser.openstack.org/articles/monitoring-a-multi-region-cloud-based-on-openstack
• Hewlett Packard Enterprise Cloud System, Helion OpenStack
  • Supported and tested up to 65K metrics/sec ingest rates
• Fujitsu
  • FUJITSU Software ServerView Cloud Monitoring Manager
• NEC
  • Planning to include Monasca in Cloud Solution Menus solution
• Others

Page 51

Statistics: Mitaka/Newton Release

• Organizations: 31
• Contributors: 97
• Commits: 1075
• Reviews: 4080
• Lines of code: 215370

Page 52

Ecosystem

• Hewlett Packard Enterprise
• Fujitsu
• Charter Communications
• NEC
• Cisco
• Cloudbase Solutions
• SUSE
• SolidFire
• SAP
• Cray Inc
• FIWARE Lab
• Mirantis
• Broadcom

Page 53

Containers and Kubernetes

• New Monasca Agent plugins
  • Docker plugin
  • cAdvisor plugin
  • Kubernetes plugin: monitors both the Kubernetes control plane and containers
  • Prometheus client plugin: scrapes apps
  • Mesos plugin
• Containerization of Monasca
• Heapster Monasca data sink
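The "scrapes apps" step of a Prometheus client plugin can be pictured as reading an application's Prometheus text exposition output and turning each sample into a metric with a name, dimensions, and a value. The sketch below is not the Monasca Agent plugin's code; `parse_exposition` is a hypothetical helper, the sample text is made up, and the parser ignores escaped characters in label values and optional timestamps.

```python
import re

# name, optional {labels}, then the sample value
LINE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def parse_exposition(text):
    """Convert Prometheus text-format samples into simple metric dicts."""
    metrics = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = LINE_RE.match(line)
        if not match:
            continue  # skip anything this simplified parser cannot read
        name, labels, value = match.groups()
        metrics.append({
            "name": name,
            "dimensions": dict(LABEL_RE.findall(labels or "")),
            "value": float(value),
        })
    return metrics


SAMPLE = """
# HELP http_requests_total Total requests
# TYPE http_requests_total counter
http_requests_total{method="get",code="200"} 1027
process_open_fds 12
"""

for metric in parse_exposition(SAMPLE):
    print(metric)
```

Mapping Prometheus labels onto dimensions is the natural fit here, since both are key/value pairs attached to a metric name.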

Page 54

Next Steps

• Containerizing Monasca
• Monitoring containers and container managers such as Kubernetes
• Grouping notifications