Monasca - Netways
TRANSCRIPT
Monasca: Monitoring/Logging-as-a-Service (at-scale)
Speaker
Roland Hochmuth
Hewlett Packard Enterprise
Fort Collins, Colorado, USA
Agenda
• Describe how to build a highly scalable monitoring and logging as a service platform
• Architectural and design principles
• Scale/HA
• Provide an overview of Monasca
  • Features
  • API
  • Demo
What is Monitoring-as-a-Service?
• A Monitoring or Logging solution deployed as Software-as-a-Service
  • E.g. CloudWatch, Datadog, New Relic, Librato, Loggly and many others
• First-class, preferably RESTful, HTTP API
• Authentication
• Multi-tenancy
• Provides self-provisioning to users/tenants of the service
• Designed to be highly reliable and operate at scale
• Historically run by an operations team doing web services
What is OpenStack?
• OpenStack is a cloud operating system that controls large pools of compute, storage and networking resources
• Open-source alternative to AWS, Microsoft Azure, Google Cloud and other cloud services
• Deployed in both public and private clouds
What is Monasca?
• Open-source Monitoring/Logging-as-a-Service platform for OpenStack
  • Authentication currently via OpenStack Identity Service (Keystone)
• Microservices, message-bus based architecture
• First-class RESTful API
• Push-based metrics
• Consolidates Operational Monitoring, Monitoring-as-a-Service, Metering & Billing and more
• Designed for elastic cloud environments/deployments
• High-availability clustering built-in
• Horizontally scalable and vertically 4 tiered/layered architecture
• Capable of long-term data retention to address metering, SLA, capacity planning, trend analysis, post-hoc RCA and other use cases
• Extensible and Composable
The Log
• The Log: What every software engineer should know about real-time data's unifying abstraction
  • https://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying
• Log: An append-only, totally-ordered sequence of records ordered by time
From To
Monitoring Architecture
Kafka
• A performant, distributed, durable publish/subscribe messaging and stream processing system
• Metrics, logs and events are published to topics in Kafka
• Microservices register in a consumer group as a consumer
• Microservices subscribe to topics and consume metrics/logs and events
• Messages are replicated per consumer group
• Messages are load-balanced across all consumers in a consumer group
  • Can add/remove micro-services to handle load or mitigate problems
  • As micro-services expand/contract the partitions are automatically re-balanced
• At-least-once semantic guarantees on message delivery
• Also used for domain events, notification retry events, periodic notifications, grouping notifications and other areas
• Always accept data, never drop data, true elasticity
  • Loggly: https://www.youtube.com/watch?v=LpNbjXFPyZ0
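The load-balancing and re-balancing bullets above can be illustrated with a small pure-Python sketch. It is not Kafka's actual assignment algorithm (Kafka ships several assignors); it only shows the idea that each partition is owned by exactly one consumer in a group, and that ownership is recomputed when the group shrinks. The function name and the persister names are hypothetical.

```python
def assign_partitions(partitions, consumers):
    """Round-robin the partitions over the consumers of one group (illustrative only)."""
    ordered = sorted(consumers)
    assignment = {name: [] for name in ordered}
    for index, partition in enumerate(sorted(partitions)):
        assignment[ordered[index % len(ordered)]].append(partition)
    return assignment


partitions = list(range(6))

# Three consumers share six partitions.
before = assign_partitions(partitions, ["persister-1", "persister-2", "persister-3"])

# One consumer fails: the same partitions are spread over the survivors.
after = assign_partitions(partitions, ["persister-1", "persister-2"])

print(before)
print(after)
```

Every partition still has an owner after the failure, which is why the remaining consumers can pick up the load without data being dropped.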
CQRS
• Command Query Responsibility Segregation (CQRS)
• CQRS involves splitting an application into two parts internally
  1. Command side, ordering the system to update state
  2. Query side, that gets information without changing state
• Advantages
  • Decouples the read/write load. Allows each to be scaled independently
  • Read store can be optimized for the query pattern of the application
• Reference
  • Event sourcing, CQRS, stream processing and Apache Kafka
  • https://www.confluent.io/blog/event-sourcing-cqrs-stream-processing-apache-kafka-whats-connection
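A minimal sketch of the command/query split, assuming an in-memory list stands in for the log between the two sides. The class and method names are hypothetical and are not taken from Monasca.

```python
from collections import defaultdict


class CommandSide:
    """Accepts writes and appends them to the log; it never answers queries."""

    def __init__(self, log):
        self.log = log

    def post_metric(self, name, value):
        self.log.append({"name": name, "value": value})


class QuerySide:
    """Builds a read-optimized view from the log; it never modifies the log."""

    def __init__(self):
        self.values_by_name = defaultdict(list)
        self.position = 0  # how far into the log this view has read

    def catch_up(self, log):
        for record in log[self.position:]:
            self.values_by_name[record["name"]].append(record["value"])
        self.position = len(log)

    def average(self, name):
        values = self.values_by_name[name]
        return sum(values) / len(values)


log = []
commands = CommandSide(log)
queries = QuerySide()

commands.post_metric("cpu.user_perc", 80.0)
commands.post_metric("cpu.user_perc", 90.0)
queries.catch_up(log)
cpu_average = queries.average("cpu.user_perc")
print(cpu_average)
```

Because the query side only reads the log, more readers (or a differently shaped read store) can be added without touching the write path.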
Microservices
• Microservices are small, autonomous, decoupled services that are deployed independently and work together as a single application
• Communication between services occurs via a network
• Services need to be able to change independently of each other and be deployed by themselves without requiring consumers to change
• Benefits
  • Resilience
  • Scale
  • Ease of deployment
  • Organizational Alignment
  • Optimized for Change/Replaceability
POST Metrics Sequence
Domain Events Sequence
Deployment Models (HA/Scale)
• Many ways to deploy Monasca
• Typically deployed in a clustered/HA configuration using three nodes or greater
• If any node or microservice fails the cluster remains operational
• Partitions in Kafka are redistributed among the remaining components
• Preferably the database is run on a separate layer from the other components/microservices
• Note: Monasca can also be deployed on a single node, non-clustered
• Has also been containerized and run in Kubernetes
Metrics Model
POST /v2.0/metrics
{
  "name": "http_status",
  "dimensions": {
    "url": "http://host.domain.com:1234/service",
    "cluster": "c1",
    "control_plane": "ccp",
    "service": "compute"
  },
  "timestamp": 0, /* milliseconds */
  "value": 1.0,
  "value_meta": {
    "status_code": "500",
    "msg": "Internal server error"
  }
}
• Simple, concise, multi-dimensional, flexible description
  • Name (string)
  • Dimensions: Dictionary of user-defined (key, value) pairs that are used to uniquely identify a metric
  • Value meta: Optional dictionary of user-defined (key, value) pairs that can be used to describe a measurement
    • Normally used for errors and messages
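The metric model above can be sketched in a few lines of Python. The helper names `build_metric` and `metric_identity` are hypothetical; the sketch only illustrates that the name plus dimensions identify a metric, while the value meta describes a single measurement and plays no part in identity.

```python
import json
import time


def build_metric(name, dimensions, value, value_meta=None, timestamp_ms=None):
    """Build a metric body shaped like the slide's example (timestamp in milliseconds)."""
    if timestamp_ms is None:
        timestamp_ms = int(time.time() * 1000)
    metric = {
        "name": name,
        "dimensions": dict(dimensions),
        "timestamp": timestamp_ms,
        "value": float(value),
    }
    if value_meta:
        metric["value_meta"] = dict(value_meta)
    return metric


def metric_identity(metric):
    """Name plus sorted dimensions identify the metric; value_meta is ignored."""
    return (metric["name"], tuple(sorted(metric["dimensions"].items())))


first = build_metric(
    "http_status",
    {"service": "compute", "cluster": "c1"},
    1.0,
    value_meta={"status_code": "500", "msg": "Internal server error"},
    timestamp_ms=0,
)
second = build_metric(
    "http_status", {"cluster": "c1", "service": "compute"}, 0.0, timestamp_ms=1000
)

body = json.dumps(first)  # what a client would send as the request body
print(metric_identity(first) == metric_identity(second))
```

Both measurements belong to the same metric even though only the first one carries a value meta.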
Push vs Pull
• Monitoring-as-a-Service
  • Can't always pull due to firewalls and network issues
• Low-latency: sub-second latency is difficult for the pull model
• Doesn't require service discovery and registration
  • As entities are deployed they can start sending metrics without having to be discovered or registered
• Events
• Temporary caching/buffering of metrics/events while service unreachable
Monasca API
• Primary point for pushing metrics and handling queries
• Authenticates all requests against the Keystone identity service
  • Note: auth tokens are cached to reduce the load on Keystone
• Resources: Metrics, Alarm Definitions, Alarms and Notification Methods
• API Specification
  • https://github.com/openstack/monasca-api/tree/master/docs
• Horizontally scalable
• Publishes metrics to Kafka
• Queries time-series DB for measurements and statistics
• Queries Config DB for alarms, alarm definitions and notification methods
Persister
• Consumes both metrics and alarm state transition events from Kafka
• Stores temporarily in-memory and does batch writes to the TSDB based on batch size or time to optimize write performance
• At-least-once message delivery semantics
  • No metrics or alarm state transition events are lost
  • The Kafka consumer offset for each batch is only updated after successfully storing the metric or alarm state transition event
  • Note: duplicates are possible
• HA/fault-tolerance
  • Multiple persisters run simultaneously and balance load
  • If a persister fails the load is automatically re-balanced across the remaining persisters
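The batching and offset rules above can be sketched as follows. This is not the Monasca persister's code: `BatchingPersister`, the `store` callable and the default sizes are hypothetical, and time is passed in explicitly so the example is deterministic. The point is the ordering: write the batch first, and advance the committed offset only if the write succeeded.

```python
class BatchingPersister:
    """Buffer records and write them in batches; commit offsets only after a write."""

    def __init__(self, store, batch_size=3, max_wait_seconds=15.0):
        self.store = store  # hypothetical callable that writes a list of records
        self.batch_size = batch_size
        self.max_wait_seconds = max_wait_seconds
        self.batch = []
        self.batch_started = None
        self.last_offset = -1
        self.committed_offset = -1

    def handle(self, offset, record, now):
        if not self.batch:
            self.batch_started = now
        self.batch.append(record)
        self.last_offset = offset
        too_big = len(self.batch) >= self.batch_size
        too_old = now - self.batch_started >= self.max_wait_seconds
        if too_big or too_old:
            self.flush()

    def flush(self):
        if not self.batch:
            return
        # If the write raises, the offset stays uncommitted and the batch is
        # kept, so the records are retried (possibly producing duplicates).
        self.store(list(self.batch))
        self.committed_offset = self.last_offset
        self.batch = []


written = []
persister = BatchingPersister(written.extend, batch_size=3)
for offset in range(4):
    persister.handle(offset, {"value": offset}, now=float(offset))

print(len(written), persister.committed_offset, len(persister.batch))
```

After four records, three have been written and committed and one is still buffered. A crash at that point would redeliver only the uncommitted record, which is the at-least-once behaviour the slide describes.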
Time Series Databases
• Used for storing
  • Metrics
  • Alarm state history
• Two databases supported
  1. Vertica
    • Enterprise class, proprietary, closed-source, clustered, HA analytics database
    • Excels at time-series
  2. InfluxDB
    • Open-source, single-node time-series DB
    • Clustering is closed-source
    • Note: can replicate to multiple instances of InfluxDB using Kafka
• Investigating support for additional databases
Config Database
• Stores all transactional data for Monasca such as
  • Alarm Definitions
  • Alarms
  • Notification Methods
• MySQL and Postgres supported
• Typically deployed in a clustered or HA configuration
Threshold Engine
• Near real-time, stream processing, clustered and highly available threshold engine
• Based on Apache Storm
• Consumes metrics from Kafka
• Creates alarms based on metrics that match patterns specified in the alarm definition
• Evaluates whether metrics exceed threshold
• Publishes alarm state transition events to Kafka
• Supports both simple and compound alarm expressions
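As a rough illustration of "evaluate, then publish state transitions", the sketch below averages each window of measurements, compares it to a threshold, and emits an event only when the state changes. It is plain Python rather than an Apache Storm topology, and the class name, event fields and the `publish` callable (a stand-in for a Kafka producer) are hypothetical.

```python
def evaluate(values, threshold):
    """Return 'ALARM' if the window average exceeds the threshold, else 'OK'."""
    average = sum(values) / len(values)
    return "ALARM" if average > threshold else "OK"


class ThresholdSketch:
    def __init__(self, threshold, publish):
        self.threshold = threshold
        self.publish = publish  # hypothetical stand-in for a Kafka producer
        self.state = "OK"

    def on_window(self, values):
        new_state = evaluate(values, self.threshold)
        if new_state != self.state:
            # Only transitions are published, not every evaluation.
            self.publish({"old_state": self.state, "new_state": new_state})
            self.state = new_state


events = []
engine = ThresholdSketch(threshold=85, publish=events.append)
for window in ([70, 80], [90, 95], [92, 99], [60, 65]):
    engine.on_window(window)

print(events)
```

Four windows produce only two events (OK to ALARM, then ALARM to OK), which keeps downstream consumers such as a notification service from seeing repeated identical states.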
Notification Engine
• Consumes alarm state transition events from Kafka produced by the Threshold Engine
• Evaluates whether notifications should be sent based on actions specified in the alarm definition
  • OK, ALARM and UNDETERMINED actions
• Supports email, PagerDuty, webhooks, HipChat, Slack and JIRA
  • Dynamic plugins supported
• Supports both one-shot and periodic notifications
• If sending to the notification address fails then the notification is published to a retry topic in Kafka and retried later
• Grouping notifications: In progress
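The retry behaviour can be sketched with a queue standing in for the Kafka retry topic. Everything here is an assumption for illustration: the `process` function, the `retry_count` field, the `max_retries` limit and the example address are not taken from Monasca.

```python
from collections import deque


def process(notification, send, retry_topic, max_retries=3):
    """Try to send; on failure put the notification on the retry queue."""
    try:
        send(notification)
        return True
    except ConnectionError:
        attempts = notification.get("retry_count", 0) + 1
        if attempts <= max_retries:
            retry_topic.append({**notification, "retry_count": attempts})
        return False


calls = {"count": 0}


def flaky_send(notification):
    """Hypothetical sender that fails twice, then succeeds."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("address unreachable")


retry_topic = deque()  # stand-in for the Kafka retry topic
delivered = process(
    {"address": "ops@example.com", "state": "ALARM"}, flaky_send, retry_topic
)
while not delivered and retry_topic:
    delivered = process(retry_topic.popleft(), flaky_send, retry_topic)

print(delivered, calls["count"])
```

Routing failures through a separate queue means a slow or unreachable address does not block the delivery of other notifications.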
Kafka Message Schema
• JSON messages published/consumed to/from Kafka by Monasca micro-services
• Well-defined schema is published at
  • https://wiki.openstack.org/wiki/Monasca/Message_Schema
Metrics
Create, query and get statistics for metrics
• GET, POST /v2.0/metrics
• GET /v2.0/metrics/names
  • Returns the unique metric names
• GET /v2.0/metrics/dimensions/names
  • Returns the unique dimension names
• GET /v2.0/metrics/dimensions/names/values
  • Returns the unique dimension values
Measurements
GET /v2.0/metrics/measurements
• Returns a list of measurements
• Query parameters
  • Name and dimensions to filter by
  • start_time and end_time
  • Offset and limit
  • merge_metrics: allow multiple metrics to be combined into a single list of measurements
  • group_by: list of columns to group the metrics to be returned. Allows multiple unique metrics to be returned in a single query
Statistics
GET /v2.0/metrics/statistics
• Query parameters
  • Name and dimensions to filter by
  • start_time and end_time
  • Statistics: avg, min, max, sum and count
  • Period: The time period to aggregate measurements by
  • Offset, limit
  • merge_metrics: allow multiple metrics to be combined into a single list of statistics
  • group_by: list of columns to group the metrics to be returned. Allows multiple unique metrics to be returned in a single query
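A small sketch of how a client might assemble a statistics query from the parameters listed above. The base URL is a placeholder, the helper name is hypothetical, and the `key:value` encoding of dimensions and the comma-separated statistics list are assumptions about the wire format that should be checked against the API specification. No request is sent.

```python
from urllib.parse import urlencode


def statistics_url(base_url, name, dimensions, start_time, statistics, period):
    """Build a GET URL for the statistics resource (illustrative only)."""
    query = {
        "name": name,
        # Assumed encoding: comma-separated key:value pairs.
        "dimensions": ",".join(f"{k}:{v}" for k, v in sorted(dimensions.items())),
        "start_time": start_time,
        "statistics": ",".join(statistics),
        "period": period,  # seconds per aggregation bucket
    }
    return f"{base_url}/v2.0/metrics/statistics?{urlencode(query)}"


url = statistics_url(
    "http://monasca.example.com:8070",  # hypothetical endpoint
    "cpu.user_perc",
    {"hostname": "devstack"},
    "2016-11-01T00:00:00Z",
    ["avg", "max"],
    300,
)
print(url)
```

An authenticated client would additionally supply a Keystone token with the request, as described for the Monasca API.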
Metrics Names
GET /v2.0/metrics/names
• Returns a list of the unique metric names
• Query parameters
  • Dimensions
  • Offset, limit
Metric Dimension Names
GET /v2.0/metrics/dimensions/names
• List the dimension names
• Query parameters
  • Metric name
  • Offset, limit
Metric Dimension Values
GET /v2.0/metrics/dimensions/names/values
• List the dimension values
• Query parameters
  • Metric name
  • Dimension name
  • Offset, limit
Alarm Definitions
POST, GET /v2.0/alarm-definitions
• Alarm definitions are templates that are used to automatically and dynamically create alarms based on matching metric names and dimensions
• One alarm definition can result in zero or more alarms
• Simple grammar for creating compound alarm expressions
  • avg(cpu.user_perc) > 85 or avg(disk.read_ops{device=vda}, 120) > 1000
• Alarm states (OK, ALARM and UNDETERMINED)
• Actions associated with alarms for state transitions
• User assigned severity (LOW, MEDIUM, HIGH, CRITICAL)
• Thresholds can be dynamically adjusted via PATCH
• Minimal lifecycle management: alarm_lifecycle_state and link
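To illustrate "one alarm definition can result in zero or more alarms", the sketch below treats a definition as a template and creates one alarm per distinct matching metric. It is a simplification: real definitions carry an expression rather than the hypothetical `metric_name` and `dimensions` fields used here, and the initial UNDETERMINED state is an assumption.

```python
def alarms_for_definition(definition, metrics):
    """Create one alarm per unique metric matching the definition's name and dimensions."""
    alarms = {}
    wanted = definition.get("dimensions", {})
    for metric in metrics:
        if metric["name"] != definition["metric_name"]:
            continue
        if any(metric["dimensions"].get(k) != v for k, v in wanted.items()):
            continue
        key = tuple(sorted(metric["dimensions"].items()))
        # setdefault keeps the existing alarm when the same metric is seen again.
        alarms.setdefault(
            key, {"definition": definition["name"], "state": "UNDETERMINED"}
        )
    return alarms


definition = {
    "name": "High CPU",
    "metric_name": "cpu.user_perc",
    "dimensions": {"service": "compute"},
}
metrics = [
    {"name": "cpu.user_perc", "dimensions": {"hostname": "a", "service": "compute"}},
    {"name": "cpu.user_perc", "dimensions": {"hostname": "b", "service": "compute"}},
    {"name": "cpu.user_perc", "dimensions": {"hostname": "a", "service": "compute"}},
    {"name": "cpu.user_perc", "dimensions": {"hostname": "c", "service": "monitoring"}},
    {"name": "disk.read_ops", "dimensions": {"hostname": "a", "service": "compute"}},
]

alarms = alarms_for_definition(definition, metrics)
print(len(alarms))
```

Two compute hosts report the metric, so two alarms exist; with no matching metrics the same definition yields none.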
List Alarms
GET /v2.0/alarms
Query parameters
• metric_name - Name of metric to filter by
• metric_dimensions
• State: OK, ALARM or UNDETERMINED
• Severity: One or more severities to filter by, separated with |, ex. severity=LOW|MEDIUM
• state_updated_start_time: The start time in ISO 8601 combined date and time format in UTC
• Offset, limit
• sort_by
Alarms
GET, PUT, PATCH, DELETE /v2.0/alarms/{alarm-id}
• Alarms created by the Threshold Engine based on matching alarm definitions
• When new nodes or components are deployed alarms are automatically created
• Alarms are resources within Monasca. They have a resource ID and lifecycle
• By default three states: OK, ALARM and UNDETERMINED
  • UNDETERMINED state occurs when metrics are no longer being received
• Deterministic alarms: two states, OK and ALARM
  • Used for systems where metrics are sporadic. E.g. creating metrics when errors in log files occur and no metrics when there aren't any errors
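The difference between the default and the deterministic behaviour comes down to how an empty evaluation window is treated. The sketch below assumes that, for a deterministic alarm, the absence of data counts as OK, which is how the log-error example reads. The function name and the use of `max` are illustrative choices, not Monasca's implementation.

```python
def alarm_state(values, threshold, deterministic=False):
    """Map a window of measurements to an alarm state."""
    if not values:
        # No data: a default alarm cannot decide, a deterministic one stays OK.
        return "OK" if deterministic else "UNDETERMINED"
    return "ALARM" if max(values) > threshold else "OK"


no_data_default = alarm_state([], threshold=0)
no_data_deterministic = alarm_state([], threshold=0, deterministic=True)
errors_seen = alarm_state([2], threshold=0, deterministic=True)

print(no_data_default, no_data_deterministic, errors_seen)
```

For a metric that is only emitted when a log error occurs, the deterministic variant avoids alarms flapping into UNDETERMINED during healthy periods.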
Alarm Counts
GET /v2.0/alarms/count
• Query the total number of alarms in the OK, ALARM or UNDETERMINED state and their severities, grouped by metrics dimension, such as OpenStack service, state and severity
• Used for summary dashboards
Example: Helion Ops Console
Alarm History
GET /v2.0/alarms/state-history
• Lists the alarm state history for alarms
• Query Parameters
  • Dimensions to filter on
  • Start/end timestamp
  • Offset, limit
GET /v2.0/alarms/{alarm-id}/state-history
• Lists the alarm state history for a specific alarm
Notification Methods
POST, GET, DELETE /v2.0/notification-methods
Notification methods are associated with Actions in alarm definitions
Example
POST /v2.0/notification-methods
{
  "name": "Name of notification method",
  "type": "EMAIL",
  "address": "john.doe@hp.com"
}
Monasca Agent
• System metrics (cpu, memory, network, filesystem, …)
• Service metrics
  • MySQL, Kafka and many others
• Application metrics
  • Built-in Statsd daemon
  • Python monasca-statsd library: Adds support for dimensions
• VM system metrics
• Open vSwitch metrics
• Active checks
  • HTTP status checks and response times
  • System up/down checks (ping and ssh)
• Runs any Nagios plugin or check_mk
• Extensible/Pluggable: Additional services can be easily added
Agent details
• The Agent Forwarder buffers metrics for a short time to increase the size of the HTTP request body (number of metrics) sent to the Monasca API
• The Agent requests an auth token from the Keystone Identity service, which is supplied on all requests
• The Monasca Agent and API cache auth tokens in-memory to reduce the round-trip authorization requests to Keystone
• If network connectivity between the Agent and API is lost, the Agent will buffer metrics and send them when connectivity is restored
• Metrics are submitted using an “agent” role which only allows metrics to be POST’d to the metrics endpoint
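The buffering behaviour described above can be sketched as a small forwarder that batches metrics into one request and keeps them when the API is unreachable. The class name, the `post` callable and the bounded buffer (which drops the oldest metrics once full) are assumptions for illustration, not the agent's actual implementation.

```python
from collections import deque


class ForwarderSketch:
    def __init__(self, post, max_buffered=1000):
        self.post = post  # hypothetical callable that POSTs a list of metrics
        # Bounded buffer: once full, the oldest metrics are discarded (assumption).
        self.buffer = deque(maxlen=max_buffered)

    def submit(self, metric):
        self.buffer.append(metric)

    def flush(self):
        """Send everything buffered in one request; keep it all if the API is unreachable."""
        if not self.buffer:
            return 0
        batch = list(self.buffer)
        try:
            self.post(batch)
        except ConnectionError:
            return 0
        self.buffer.clear()
        return len(batch)


api_up = {"value": False}
received = []


def post(batch):
    if not api_up["value"]:
        raise ConnectionError("API unreachable")
    received.append(batch)


forwarder = ForwarderSketch(post)
forwarder.submit({"name": "cpu.user_perc", "value": 12.0})
forwarder.submit({"name": "mem.free_mb", "value": 2048.0})

sent_while_down = forwarder.flush()   # API is down: nothing sent, metrics kept
api_up["value"] = True
sent_after_restore = forwarder.flush()  # connectivity restored: both sent together

print(sent_while_down, sent_after_restore)
```

Both metrics arrive in a single request once the API is reachable again, which is also why short-term buffering reduces the number of HTTP requests in normal operation.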
Grafana/Monasca Integration
• Datasource: A datasource that can be added to the Grafana dashboard to enable Monasca
  • https://github.com/openstack/monasca-grafana-datasource
• Keystone authentication
  • https://github.com/twc-openstack/grafana
• Support for Alerting will be added in Grafana 4
Grafana Monasca Data Source
Logging Architecture
Logging API
• POST /v3.0/logs
• Batch log messages in a single HTTP request
• Global, local, mixed dimensions
  • Similar to dimensions in metrics
• JSON only
• Specification
  • https://github.com/openstack/monasca-log-api/blob/master/docs/monasca-log-api-spec.md
• Queries not done via API, but via Tenantized version of Kibana
  • https://github.com/FujitsuEnablingSoftwareTechnologyGmbH/fts-keystone
Log Model
{
  "dimensions": {
    "hostname": "devstack",
    "service": "monitoring",
    "component": "monasca-api"
  },
  "logs": [
    {
      "message": "msg1",
      "dimensions": {
        "service": "compute",
        "component": "nova-api",
        "path": "/var/log/mysql.log"
      }
    },
    {
      "message": "msg2",
      "dimensions": {
        "path": "/var/log/monasca/monasca-api.log"
      }
    }
  ]
}
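One way to read the mix of global and local dimensions in the log model is that each log entry inherits the global dimensions and may override or extend them. The sketch below implements that reading; the function name is hypothetical, and "local keys win on conflict" is an assumption suggested by the example rather than a statement of the API's behaviour.

```python
def effective_dimensions(body):
    """Merge global dimensions into each log entry; local keys win on conflict."""
    global_dims = body.get("dimensions", {})
    return [
        {
            "message": entry["message"],
            "dimensions": {**global_dims, **entry.get("dimensions", {})},
        }
        for entry in body["logs"]
    ]


body = {
    "dimensions": {
        "hostname": "devstack",
        "service": "monitoring",
        "component": "monasca-api",
    },
    "logs": [
        {
            "message": "msg1",
            "dimensions": {
                "service": "compute",
                "component": "nova-api",
                "path": "/var/log/mysql.log",
            },
        },
        {
            "message": "msg2",
            "dimensions": {"path": "/var/log/monasca/monasca-api.log"},
        },
    ],
}

merged = effective_dimensions(body)
for entry in merged:
    print(entry["message"], entry["dimensions"])
```

Under this reading, msg1 is attributed to the compute service while msg2 keeps the global monitoring service, and both carry the shared hostname.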
Log Agents
• Logstash
  • https://github.com/logstash-plugins/logstash-output-monasca_log_api/pull/1
• Beaver
  • https://github.com/python-beaver/python-beaver/pull/406
• Logspout: Under Investigation
Kibana Integration
• Keystone authentication support for Kibana
• Authentication plugin
  • https://github.com/FujitsuEnablingSoftwareTechnologyGmbH/fts-keystone
  • Note: In the process of moving to the official OpenStack repo
Composability: Logging/Metrics
Transform and Analytics Engine
Monasca Transform
• A new micro-service in Monasca that aggregates and transforms metrics
• Currently based on Apache Spark Streaming
• Use Cases
  • Object Storage Disk Capacity
  • Object Storage Capacity
  • Compute Host Capacity
  • VM Capacity
  • More to come
• Metrics are aggregated and published every hour
• Currently in deployment in HPE Helion OpenStack 4.0
• OpenStack project/repo
  • https://github.com/openstack/monasca-transform
Monasca Analytics
• A framework that adds data science tools (parsers, algorithms, etc.)
• Features include
  • Algorithmic flow definition enabling sharing of complex algorithmic recipes
  • Thin orchestration layer that instantiates an execution environment
• Focused on
  • Anomaly detection
  • Reducing alert fatigue via alarm clustering (unsupervised machine learning)
• Example algorithms: One-Class SVM and LiNGAM
• Status: Under Development
• OpenStack project/repo
  • https://github.com/openstack/monasca-analytics
Distributions & Deployments
• Charter Communications
  • Monasca and Grafana are currently deployed in production private cloud
  • Monitoring-as-a-Service: Use cases supported with Grafana as the Visualization Dashboard
  • 2 datacenters, 600-700 compute nodes, 1000 VMs, 11000 metrics/sec
• FIWARE Lab
  • http://superuser.openstack.org/articles/monitoring-a-multi-region-cloud-based-on-openstack
• Hewlett Packard Enterprise Cloud System, Helion OpenStack
  • Supported and tested up to 65K metrics/sec ingest rates
• Fujitsu
  • FUJITSU Software ServerView Cloud Monitoring Manager
• NEC
  • Planning to include Monasca in Cloud Solution Menus solution
• Others
Statistics: Mitaka/Newton Release
• Organizations: 31
• Contributors: 97
• Commits: 1075
• Reviews: 4080
• Lines of code: 215370
Ecosystem
• Hewlett Packard Enterprise
• Fujitsu
• Charter Communications
• NEC
• Cisco
• Cloudbase Solutions
• SUSE
• SolidFire
• SAP
• Cray Inc
• FIWARE Lab
• Mirantis
• Broadcom
Containers and Kubernetes
• New Monasca Agent Plugins
  • Docker plugin
  • cAdvisor plugin
  • Kubernetes plugin: Monitors both Kubernetes control plane and containers
  • Prometheus client plugin: Scrapes apps
  • Mesos plugin
• Containerization of Monasca
• Heapster Monasca data sink
Next Steps
• Containerizing Monasca
• Monitoring containers and container managers such as Kubernetes
• Grouping notifications
Speaker
Roland Hochmuth
Hewlett Packard Enterprise
Fort Collins Colorado USA
Agenda
bull Describe how to build a highly scalable monitoring and logging as a service platform
bull Architectural and design principles
bull Scale HA
bull Provide an overview of Monascabull Features
bull API
bull Demo
What is Monitoring-as-a-Service
bull A Monitoring or Logging solution deployed as Software-as-a-Servicebull Eg CloudWatch Datadog New Relic Librato Loggly and many others
bull First-class preferably RESTful HTTP API
bull Authentication
bull Multi-tenancy
bull Provides self-provisioning to userstenants of the service
bull Designed to be highly reliable and operate at scale
bull Historically run by an operations team doing web services
What is OpenStack
bull OpenStack is a cloud operating system that controls large pools of compute storage and networking resources
bull Open-source alternative to AWS Microsoft Azure Google Cloud and other cloud services
bull Deployed in both public and private clouds
What is Monasca
bull Open-source MonitoringLogging-as-a-Service platform for OpenStackbull Authentication currently via OpenStack Identity Service (Keystone)
bull Microservices message-bus based architecture
bull First-class RESTful APIbull Push-based metricsbull Consolidates Operational Monitoring Monitoring-as-a-Service Metering amp
Billing and morebull Designed for elastic cloud environmentsdeploymentsbull High-availability clustering built-inbull Horizontally scalable and vertically 4 tieredlayered architecturebull Capable of long-term data retention to address metering SLA capacity
planning trend analysis post-hoc RCA and other use casesbull Extensible and Composable
The Log
bull The Log What every software engineer should know about real-time datas unifying abstraction
bull httpsengineeringlinkedincomdistributed-systemslog-what-every-software-engineer-should-know-about-real-time-datas-unifying
bull Log An append-only totally-ordered sequence of records ordered by time
From To
Monitoring Architecture
Kafka
bull A performant distributed durable publishsubscribe messaging and stream processing system
bull Metrics logs and events are published to topics in Kafka
bull Microservices register in a consumer group as a consumer
bull Microservices subscribe to topics and consume metricslogs and events
bull Messages are replicated per consumer group
bull Messages are load-balanced across all consumers in a consumer groupbull Can addremove micro-services to handle load or mitigate problemsbull As micro-services expandcontract the partitions are automatically re-balanced
bull At-least-once semantic guarantees on message delivery
bull Also used for domain events notification retry events periodic notifications grouping notifcations and other areas
bull Always accept data never drop data true elasticitybull Loggly httpswwwyoutubecomwatchv=LpNbjXFPyZ0
CQRS
bull Command Query Responsibility Segregation (CQRS)
bull CQRS involves splitting an application into two parts internally1 Command side ordering the system to update state
2 Query side that gets information without changing state
bull Advantagesbull Decouples the readwrite load Allows each to be scaled independently
bull Read store can be optimized for the query pattern of the application
bull Referencebull Event sourcing CQRS stream processing and Apache Kafka
bull httpswwwconfluentioblogevent-sourcing-cqrs-stream-processing-apache-kafka-whats-connection
Microservices
bull Microservices are small autonomous decoupled services that are deployed independenty and work together as a single application
bull Communication between services occurs via a network
bull Services need to be able to change independently of each other and be deployed by themselves without requiring consumers to change
bull Benefitsbull Resiliencebull Scalebull Ease of deploymentbull Organizational Alignmentbull Optimized for ChangeReplaceability
POST Metrics Sequence
Domain Events Sequence
Deployment Models (HAScale)
bull Many ways to deploy Monasca
bull Typically deployed in a clusteredHA configuration using three nodes or greater
bull If any node or microservice fails the cluster remains operational
bull Partitions in Kafka are redistributed among the remaining components
bull Preferably the database is run on a separate layer from the other componentsmicroservices
bull Note Monasca can also be deployed on a single-node non-clustered
bull Has also been containerized and run in Kubernetes
Metrics ModelPOST v20metrics
name http_statusdimensions
url httphostdomaincom1234servicecluster c1control_plane ccpservice compute
timestamp 0 milliseconds value 10value_meta
status_code 500msg Internal server error
bull Simple concise multi-dimensional flexible descriptionbull Name (string)bull Dimensions Dictionary of user-defined (key value)
pairs that are used to uniquely identify a metric
bull Optional dictionary of user-defined (key value) pairs that can be used to describe a measurement
bull Normally used for errors and messages
Push vs Pull
bull Monitoring-as-a-Servicebull Cant always pull due to firewalls and network issues
bull Low-latency sub-second latency difficult for pull model
bull Doesnt require service discovery and registrationbull As entities are deployed they can start sending metrics without have to be
discovered or registered
bull Events
bull Temporary cachingbuffering of metricsevents while service unreachable
Monasca API
bull Primary point for pushing metrics and handling queries
bull Authenticates all requests against the Keystone identity servicebull Note auth tokens are cached to reduce the load on Keystone
bull Resources Metrics Alarm Definitions Alarms and Notification Methods
bull API Specificationbull httpsgithubcomopenstackmonasca-apitreemasterdocs
bull Horizontally scalable
bull Publishes metrics to Kafka
bull Queries timeseries DB for measurements and statistics
bull Queries Config DB for alarms alarm definitions and notification methods
Persister
bull Consumes both metrics and alarm state transition events from Kafka
bull Stores temporarily in-memory and does batch writes to the TSDB based on batch size or time to optimize write performance
bull At-least once message delivery semanticsbull No metrics or alarm state transition events are lostbull The Kafka consumer offset for each batch is only updated after successfully storing
the metric or alarm state transition eventbull Note duplicates are possible
bull HAfault-tolerancebull Multiple persisters run simultaneously and balance loadbull If a persister fails the load is automatically re-balanced across the remaining
persisters
Time Series Databases
bull Used for storingbull Metricsbull Alarm state history
bull Two databases supported1 Vertica
bull Enterprise class proprietary closed-source clustered HA analytics databasebull Excels at time-series
2 InfluxDBbull Open-source single-node time-series DBbull Clustering is closed-sourcebull Note can replicate to multiple instances of InfluxDB using Kafka
bull Investigating support for additional databases
Config Database
bull Stores all transactional data for Monasca such asbull Alarm Definitions
bull Alarms
bull Notification Methods
bull MySQL and Postgres supported
bull Typically deployed in a clustered or HA configuration
Threshold Engine
bull Near real-time stream processing clustered and highly available threshold engine
bull Based on Apache Storm
bull Consumes metrics from Kafka
bull Creates alarms based on metrics that match patterns specified in the alarm definition
bull Evaluates whether metrics exceed threshold
bull Publishes alarm state transition events to Kafka
bull Supports both simple and compound alarm expressions
Notification Engine
bull Consumes alarm state transition events from Kafka produced by the Threshold Engine
bull Evaluates whether notifications should be sent based on actions specified in the alarm definition
bull OK ALARM and UNDETERMINED actions
bull Supports email PagerDuty webhooks HipChat Slack and JIRAbull Dynamic plugins supportedbull Supports both one-shot and periodic notificationsbull If sending to the notification address fails then notification is published to
retry topic in Kafka and retried laterbull Grouping notifications In progress
Kafka Message Schema
bull JSON messages publishedconsumed tofrom Kafka by Monasca micro-services
bull Well-defined schema is published atbull httpswikiopenstackorgwikiMonascaMessage_Schema
Metrics
Create query and get statistics for metrics
bull GET POST v20metrics
bull GET v20metricsnamesbull Returns the unique metric names
bull GET v20metricsdimensionnamesbull Returns the unique dimension names
bull GET v20metricsdimensionnamesvaluesbull Returns the unique dimension values
Measurements
GET v20metricsmeasurements
bull Returns a list of measurements
bull Query parametersbull Name and dimensions to filter by
bull Start_time and end_time
bull Offset and limit
bull merge_metrics allow multiple metrics to be combined into a single list of measurements
bull group_by list of columns to group the metrics to be returned Allows multiple unique metrics to be returned in a single query
Statistics
GET v20metricsstatistics
bull Query parametersbull Name and dimensions to filter bybull Start_time and end_timebull Statistics avg min max sum and countbull Period The time period to aggregate measurements bybull Offset limitbull merge_metrics allow multiple metrics to be combined into a single list
of statisticsbull group_by list of columns to group the metrics to be returned Allows
multiple unique metrics to be returned in a single query
Metrics Names
GET v20metricsnames
bull Returns a list of the unique metric names
bull Query parametersbull Dimensions
bull Offset limit
Metric Dimension Names
GET v20metricsdimensionsnames
bull List the dimension names
bull Query parametersbull Metric name
bull Offset limit
Metric Dimension Values
GET v20metricsdimensionsnamesvalues
bull List the dimension values
bull Query parametersbull Metric name
bull Dimension name
bull Offset limit
Alarm Definitions
POST GET v20alarm-definitions
bull Alarm definitions are templates that are used to automatically and dynamically create alarms based on matching metric names and dimensions
bull One alarm definition can result in zero or more alarms
bull Simple grammar for creating compound alarm expressionsbull avg(cpuuser_perc) gt 85 or avg(diskread_opsdevice=vda 120) gt 1000
bull Alarm states (OK ALARM and UNDETERMINED)
bull Actions associated with alarms for state transitions
bull User assigned severity (LOW MEDIUM HIGH CRITICAL)
bull Thresholds can be dynamically adjusted via PATCH
bull Minimal lifecycle management alarm_lifecycle_state and link
List Alarms
GET v20alarmsQuery parametersbull metric_name - Name of metric to filter bybull metric_dimensionsbull State OK ALARM or UNDETERMINEDbull Severity One or more severities to filter by separated with |
ex severity=LOW|MEDIUMbull state_updated_start_time The start time in ISO 8601 combined date and
time format in UTCbull Offset limitbull sort_by
Alarms
GET PUT PATCH DELETE v20alarmsalarm-id
bull Alarms created by the Threshold Engine based on matching alarm definitions
bull When new nodes or components are deployed alarms are automatically created
bull Alarms are resources within Monasca They have a resource ID and lifecycle
bull By default three states OK ALARM and UNDETERMINEDbull UNDETERMINED state occurs when metrics are no longer being received
bull Deterministic alarms two states OK and ALARMbull Used for systems where metrics are sporadic Eg Creating metrics when errors in log
files occur and no metrics when there arent any errors
Alarm Counts
GET v20alarmscount
bull Query the total number of alarms in the OK ALARM or UNDETERMINED state and their severities grouped by metrics dimension such as OpenStack service state and severity
bull Used for summary dashboards
Example Helion Ops Console
Alarm History
GET v20alarmsstate-history
bull Lists the alarm state history for alarms
bull Query Parametersbull Dimensions to filter on
bull Startend timestamp
bull Offset limit
GET v20alarmsalarm-idstate-history
bull Lists the alarm state history for a specific alarm
Notification Methods
POST GET DELETE v20notification-methods
Notification methods are associated with Actions in alarm definitions
Example
POST v20notification-methods
nameName of notification method
typeEMAIL
addressjohndoehpcom
Monasca Agent
bull System metrics (cpu memory network filesystem hellip)
bull Service metricsbull MySQL Kafka and many others
bull Application metricsbull Built-in Statsd daemonbull Python monasca-statsd library Adds support for dimensions
bull VM system metrics
bull Open vSwitch metrics
bull Active checksbull HTTP status checks and response timesbull System updown checks (ping and ssh)
bull Runs any Nagios plugin or check_mk
bull ExtensiblePluggable Additional services can be easily added
Agent details
bull The Agent Forwarder buffers metrics for a short time to increase the size of the http request body (number of metrics) sent to the Monasca API
bull The Agent request an auth token from the Keystone Identity service which is supplied on all requests
bull The Monasca Agent and API caches Monasca Agent and API caches Monasca Agent and API caches auth tokens in-memory to reduce the round-trip authorization requests to Keystone
bull If network connectivity between the Agent and API occurs the Agent will buffer metrics and send when connectivity is restored
bull Metrics are submitted using a ldquoagentrdquo role which only allows metrics to be POSTrsquod to the metrics endpoint
GrafanaMonasca Integration
bull Datasource A datasource that can be added to the Grafana dashboard to enable Monasca
bull httpsgithubcomopenstackmonasca-grafana-datasource
bull Keystone authenticationbull httpsgithubcomtwc-openstackgrafana
bull Support for Alerting will be added in Grafana 4
Grafana Monasca Data Source
Logging Architecture
Logging API
bull POST v30logs
bull Batch log messages in a single http request
bull Global local mixed dimensionsbull Similar to dimensions in metrics
bull JSON only
bull Specificationbull httpsgithubcomopenstackmonasca-log-apiblobmasterdocsmonasca-
log-api-specmd
bull Queries not done via API but via Tenantized version of Kibanabull httpsgithubcomFujitsuEnablingSoftwareTechnologyGmbHfts-keystone
Log Model
{
  "dimensions": {
    "hostname": "devstack",
    "service": "monitoring",
    "component": "monasca-api"
  },
  "logs": [
    {
      "message": "msg1",
      "dimensions": {
        "service": "compute",
        "component": "nova-api",
        "path": "/var/log/mysql.log"
      }
    },
    {
      "message": "msg2",
      "dimensions": {
        "path": "/var/log/monasca/monasca-api.log"
      }
    }
  ]
}
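To make the global/local/mixed dimensions idea concrete, here is a small Python sketch that computes the dimensions each log message ends up with. The function name effective_dimensions is hypothetical, and the rule that a local value overrides a global one on a key clash is an assumption of this sketch; the log API specification is the authority on the real behaviour.

```python
def effective_dimensions(payload):
    """Return (message, merged dimensions) pairs for a batched log payload."""
    global_dims = payload.get("dimensions", {})
    result = []
    for entry in payload.get("logs", []):
        merged = dict(global_dims)
        # Assumption: a local value wins when the same key appears in both places.
        merged.update(entry.get("dimensions", {}))
        result.append((entry["message"], merged))
    return result


payload = {
    "dimensions": {
        "hostname": "devstack",
        "service": "monitoring",
        "component": "monasca-api",
    },
    "logs": [
        {
            "message": "msg1",
            "dimensions": {
                "service": "compute",
                "component": "nova-api",
                "path": "/var/log/mysql.log",
            },
        },
        {
            "message": "msg2",
            "dimensions": {"path": "/var/log/monasca/monasca-api.log"},
        },
    ],
}

for message, dims in effective_dimensions(payload):
    print(message, dims)
```

Under that assumption msg1 carries the global hostname with its own service and component, while msg2 inherits all three global dimensions and adds only its path.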
Log Agents
• Logstash
  • https://github.com/logstash-plugins/logstash-output-monasca_log_api/pull/1
• Beaver
  • https://github.com/python-beaver/python-beaver/pull/406
• Logspout: under investigation
Kibana Integration
• Keystone authentication support for Kibana
• Authentication plugin
  • https://github.com/FujitsuEnablingSoftwareTechnologyGmbH/fts-keystone
• Note: in the process of moving to an official OpenStack repo
Composability: Logging/Metrics
Transform and Analytics Engine
Monasca Transform
• A new microservice in Monasca that aggregates and transforms metrics
• Currently based on Apache Spark Streaming
• Use cases
  • Object Storage Disk Capacity
  • Object Storage Capacity
  • Compute Host Capacity
  • VM Capacity
  • More to come
• Metrics are aggregated and published every hour
• Currently in deployment in HPE Helion OpenStack 4.0
• OpenStack project/repo
  • https://github.com/openstack/monasca-transform
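The hourly aggregation idea can be illustrated with a few lines of plain Python. This is only a sketch of bucketing by hour: it does not use Spark Streaming and is not how monasca-transform is implemented. The metric name, the choice of a sum as the aggregate, and the (name, timestamp in milliseconds, value) tuple shape are assumptions made for the example.

```python
from collections import defaultdict

HOUR_MS = 60 * 60 * 1000


def aggregate_hourly(measurements):
    """Sum measurement values per (metric name, start of hour in ms).

    Each measurement is a (name, timestamp_ms, value) tuple.
    """
    totals = defaultdict(float)
    for name, timestamp_ms, value in measurements:
        hour_start = timestamp_ms - (timestamp_ms % HOUR_MS)
        totals[(name, hour_start)] += value
    return dict(totals)


samples = [
    ("example.metric", 0, 10.0),          # hypothetical metric name
    ("example.metric", 1_800_000, 5.0),   # same hour as the first sample
    ("example.metric", 3_600_000, 7.0),   # next hour
]
print(aggregate_hourly(samples))
```

In a streaming setting the same bucketing would be applied to each window of incoming metrics, with one aggregated metric published per bucket.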
Monasca Analytics
• A framework that adds data science tools (parsers, algorithms, etc.)
• Features include
  • Algorithmic flow definition, enabling sharing of complex algorithmic recipes
  • Thin orchestration layer that instantiates an execution environment
• Focused on
  • Anomaly detection
  • Reducing alert fatigue via alarm clustering (unsupervised machine learning)
• Example algorithms: One-Class SVM and LiNGAM
• Status: under development
• OpenStack project/repo
  • https://github.com/openstack/monasca-analytics
Distributions & Deployments
• Charter Communications
  • Monasca and Grafana are currently deployed in a production private cloud
  • Monitoring-as-a-Service use cases supported, with Grafana as the visualization dashboard
  • 2 datacenters, 600-700 compute nodes, 1000 VMs, 11000 metrics/sec
• FIWARE Lab
  • http://superuser.openstack.org/articles/monitoring-a-multi-region-cloud-based-on-openstack
• Hewlett Packard Enterprise Cloud System, Helion OpenStack
  • Supported and tested up to 65K metrics/sec ingest rates
• Fujitsu
  • FUJITSU Software ServerView Cloud Monitoring Manager
• NEC
  • Planning to include Monasca in Cloud Solution Menus solution
• Others
Statistics: Mitaka/Newton Release
• Organizations: 31
• Contributors: 97
• Commits: 1075
• Reviews: 4080
• Lines of code: 215370
Ecosystem
• Hewlett Packard Enterprise
• Fujitsu
• Charter Communications
• NEC
• Cisco
• Cloudbase Solutions
• SUSE
• SolidFire
• SAP
• Cray Inc
• FIWARE Lab
• Mirantis
• Broadcom
Containers and Kubernetes
• New Monasca Agent plugins
  • Docker plugin
  • cAdvisor plugin
  • Kubernetes plugin: monitors both the Kubernetes control plane and containers
  • Prometheus client plugin: scrapes apps
  • Mesos plugin
• Containerization of Monasca
• Heapster Monasca data sink
Next Steps
• Containerizing Monasca
• Monitoring containers and container managers such as Kubernetes
• Grouping notifications
Monasca Agent
bull System metrics (cpu memory network filesystem hellip)
bull Service metricsbull MySQL Kafka and many others
bull Application metricsbull Built-in Statsd daemonbull Python monasca-statsd library Adds support for dimensions
bull VM system metrics
bull Open vSwitch metrics
bull Active checksbull HTTP status checks and response timesbull System updown checks (ping and ssh)
bull Runs any Nagios plugin or check_mk
bull ExtensiblePluggable Additional services can be easily added
Agent details
bull The Agent Forwarder buffers metrics for a short time to increase the size of the http request body (number of metrics) sent to the Monasca API
bull The Agent request an auth token from the Keystone Identity service which is supplied on all requests
bull The Monasca Agent and API caches Monasca Agent and API caches Monasca Agent and API caches auth tokens in-memory to reduce the round-trip authorization requests to Keystone
bull If network connectivity between the Agent and API occurs the Agent will buffer metrics and send when connectivity is restored
bull Metrics are submitted using a ldquoagentrdquo role which only allows metrics to be POSTrsquod to the metrics endpoint
GrafanaMonasca Integration
bull Datasource A datasource that can be added to the Grafana dashboard to enable Monasca
bull httpsgithubcomopenstackmonasca-grafana-datasource
bull Keystone authenticationbull httpsgithubcomtwc-openstackgrafana
bull Support for Alerting will be added in Grafana 4
Grafana Monasca Data Source
Logging Architecture
Logging API
bull POST v30logs
bull Batch log messages in a single http request
bull Global local mixed dimensionsbull Similar to dimensions in metrics
bull JSON only
bull Specificationbull httpsgithubcomopenstackmonasca-log-apiblobmasterdocsmonasca-
log-api-specmd
bull Queries not done via API but via Tenantized version of Kibanabull httpsgithubcomFujitsuEnablingSoftwareTechnologyGmbHfts-keystone
Log Model
bull dimensions
hostnamedevstack
servicemonitoring
componentmonasca-api
logs[
messagemsg1
dimensions
servicecompute
componentnova-api
pathvarlogmysqllog
messagemsg2
dimensions
pathvarlogmonascamonasca-apilog
]
Log Agents
bull Logstashbull httpsgithubcomlogstash-pluginslogstash-output-monasca_log_apipull1
bull Beaverbull httpsgithubcompython-beaverpython-beaverpull406
bull Logspout Under Investigation
Kibana Integration
bull Keystone authentication support for Kibana
bull Authentication pluginbull httpsgithubcomFujitsuEnablingSoftwareTechnologyGmbHfts-keystone
bull Note In progress of moving to official OpenStack repo
Composabilty LoggingMetrics
Transform and Analytics Engine
Monasca Transform
bull A new micro-service in Monasca that aggregates and transforms metrics
bull Currently based on Apache Spark Streaming
bull Use Casesbull Object Storage Disk Capacity
bull Object Storage Capacity
bull Compute Host Capacity
bull VM Capacity
bull More to come
bull Metrics are aggregated and published every hour
bull Currently in deployment in HPE Helion OpenStack 40
bull OpenStack projectrepobull httpsgithubcomopenstackmonasca-transform
Monasca Analytics
bull A framework that adds data science tools (parsers algorithms etc)
bull Features include bull Algorithmic flow definition enabling sharing of complex algorithmic recipes
bull Thin orchestration layer that instantiates an execution environment
bull Focused onbull Anomaly detection
bull Reducing alert fatigue via alarm clustering (unsupervised machine learning)
bull Example algorithms One Class SVM and LiNGAM
bull Status Under Development
bull OpenStack projectrepobull httpsgithubcomopenstackmonasca-analytics
Distributions amp Deployments
bull Charter Communicationsbull Monasca and Grafana is currently deployed in production private cloudbull Monitoring-as-a-Service Use cases supported with Grafana as the Visualization
Dashboardbull 2 datacenters 600-700 compute nodes 1000 VMs 11000 metricssec
bull FIWARE Labbull httpsuperuseropenstackorgarticlesmonitoring-a-multi-region-cloud-based-on-openstack
bull Hewlett Packard Enterprise Cloud System Helion OpenStackbull Supported and tested up to 65K metricssec injest rates
bull Fujitsubull FUJITSU Software ServerView Cloud Monitoring Manager
bull NECbull Planning to include Monasca in Cloud Solution Menus solution
bull Others
Statistics MitakaNewton Release
bull Organizations
bull Contributors
bull Commits
bull Reviews
bull Lines of code
31
97
1075
4080
215370
Ecosystem
bull Hewlett Packard Enterprise
bull Fujitsu
bull Charter Communications
bull NEC
bull Cisco
bull Cloudbase Solutions
bull SUSE
bull SolidFire
bull SAP
bull Cray Inc
bull FIWARE Lab
bull Mirantis
bull Broadcom
Containers and Kubernetes
bull New Monasca Agent Pluginsbull Docker plugin
bull cAdviser plugin
bull Kubernetes plugin Monitors both Kubernetes control plane and containers
bull Prometheus client plugin Scrapes apps
bull Mesos pugin
bull Containerization of Monasca
bull Heapster Monasca data sink
Next Steps
bull Containerizing Monasca
bull Monitoring containers and container managers such as Kubernetes
bull Grouping notifications
What is Monitoring-as-a-Service
bull A Monitoring or Logging solution deployed as Software-as-a-Servicebull Eg CloudWatch Datadog New Relic Librato Loggly and many others
bull First-class preferably RESTful HTTP API
bull Authentication
bull Multi-tenancy
bull Provides self-provisioning to userstenants of the service
bull Designed to be highly reliable and operate at scale
bull Historically run by an operations team doing web services
What is OpenStack
bull OpenStack is a cloud operating system that controls large pools of compute storage and networking resources
bull Open-source alternative to AWS Microsoft Azure Google Cloud and other cloud services
bull Deployed in both public and private clouds
What is Monasca
bull Open-source MonitoringLogging-as-a-Service platform for OpenStackbull Authentication currently via OpenStack Identity Service (Keystone)
bull Microservices message-bus based architecture
bull First-class RESTful APIbull Push-based metricsbull Consolidates Operational Monitoring Monitoring-as-a-Service Metering amp
Billing and morebull Designed for elastic cloud environmentsdeploymentsbull High-availability clustering built-inbull Horizontally scalable and vertically 4 tieredlayered architecturebull Capable of long-term data retention to address metering SLA capacity
planning trend analysis post-hoc RCA and other use casesbull Extensible and Composable
The Log
bull The Log What every software engineer should know about real-time datas unifying abstraction
bull httpsengineeringlinkedincomdistributed-systemslog-what-every-software-engineer-should-know-about-real-time-datas-unifying
bull Log An append-only totally-ordered sequence of records ordered by time
From To
Monitoring Architecture
Kafka
bull A performant distributed durable publishsubscribe messaging and stream processing system
bull Metrics logs and events are published to topics in Kafka
bull Microservices register in a consumer group as a consumer
bull Microservices subscribe to topics and consume metricslogs and events
bull Messages are replicated per consumer group
bull Messages are load-balanced across all consumers in a consumer groupbull Can addremove micro-services to handle load or mitigate problemsbull As micro-services expandcontract the partitions are automatically re-balanced
bull At-least-once semantic guarantees on message delivery
bull Also used for domain events notification retry events periodic notifications grouping notifcations and other areas
bull Always accept data never drop data true elasticitybull Loggly httpswwwyoutubecomwatchv=LpNbjXFPyZ0
CQRS
bull Command Query Responsibility Segregation (CQRS)
bull CQRS involves splitting an application into two parts internally1 Command side ordering the system to update state
2 Query side that gets information without changing state
bull Advantagesbull Decouples the readwrite load Allows each to be scaled independently
bull Read store can be optimized for the query pattern of the application
bull Referencebull Event sourcing CQRS stream processing and Apache Kafka
bull httpswwwconfluentioblogevent-sourcing-cqrs-stream-processing-apache-kafka-whats-connection
Microservices
bull Microservices are small autonomous decoupled services that are deployed independenty and work together as a single application
bull Communication between services occurs via a network
bull Services need to be able to change independently of each other and be deployed by themselves without requiring consumers to change
bull Benefitsbull Resiliencebull Scalebull Ease of deploymentbull Organizational Alignmentbull Optimized for ChangeReplaceability
POST Metrics Sequence
Domain Events Sequence
Deployment Models (HAScale)
bull Many ways to deploy Monasca
bull Typically deployed in a clusteredHA configuration using three nodes or greater
bull If any node or microservice fails the cluster remains operational
bull Partitions in Kafka are redistributed among the remaining components
bull Preferably the database is run on a separate layer from the other componentsmicroservices
bull Note Monasca can also be deployed on a single-node non-clustered
bull Has also been containerized and run in Kubernetes
Metrics ModelPOST v20metrics
name http_statusdimensions
url httphostdomaincom1234servicecluster c1control_plane ccpservice compute
timestamp 0 milliseconds value 10value_meta
status_code 500msg Internal server error
bull Simple concise multi-dimensional flexible descriptionbull Name (string)bull Dimensions Dictionary of user-defined (key value)
pairs that are used to uniquely identify a metric
bull Optional dictionary of user-defined (key value) pairs that can be used to describe a measurement
bull Normally used for errors and messages
Push vs Pull
bull Monitoring-as-a-Servicebull Cant always pull due to firewalls and network issues
bull Low-latency sub-second latency difficult for pull model
bull Doesnt require service discovery and registrationbull As entities are deployed they can start sending metrics without have to be
discovered or registered
bull Events
bull Temporary cachingbuffering of metricsevents while service unreachable
Monasca API
bull Primary point for pushing metrics and handling queries
bull Authenticates all requests against the Keystone identity servicebull Note auth tokens are cached to reduce the load on Keystone
bull Resources Metrics Alarm Definitions Alarms and Notification Methods
bull API Specificationbull httpsgithubcomopenstackmonasca-apitreemasterdocs
bull Horizontally scalable
bull Publishes metrics to Kafka
bull Queries timeseries DB for measurements and statistics
bull Queries Config DB for alarms alarm definitions and notification methods
Persister
bull Consumes both metrics and alarm state transition events from Kafka
bull Stores temporarily in-memory and does batch writes to the TSDB based on batch size or time to optimize write performance
bull At-least once message delivery semanticsbull No metrics or alarm state transition events are lostbull The Kafka consumer offset for each batch is only updated after successfully storing
the metric or alarm state transition eventbull Note duplicates are possible
bull HAfault-tolerancebull Multiple persisters run simultaneously and balance loadbull If a persister fails the load is automatically re-balanced across the remaining
persisters
Time Series Databases
bull Used for storingbull Metricsbull Alarm state history
bull Two databases supported1 Vertica
bull Enterprise class proprietary closed-source clustered HA analytics databasebull Excels at time-series
2 InfluxDBbull Open-source single-node time-series DBbull Clustering is closed-sourcebull Note can replicate to multiple instances of InfluxDB using Kafka
bull Investigating support for additional databases
Config Database
bull Stores all transactional data for Monasca such asbull Alarm Definitions
bull Alarms
bull Notification Methods
bull MySQL and Postgres supported
bull Typically deployed in a clustered or HA configuration
Threshold Engine
bull Near real-time stream processing clustered and highly available threshold engine
bull Based on Apache Storm
bull Consumes metrics from Kafka
bull Creates alarms based on metrics that match patterns specified in the alarm definition
bull Evaluates whether metrics exceed threshold
bull Publishes alarm state transition events to Kafka
bull Supports both simple and compound alarm expressions
Notification Engine
bull Consumes alarm state transition events from Kafka produced by the Threshold Engine
bull Evaluates whether notifications should be sent based on actions specified in the alarm definition
bull OK ALARM and UNDETERMINED actions
bull Supports email PagerDuty webhooks HipChat Slack and JIRAbull Dynamic plugins supportedbull Supports both one-shot and periodic notificationsbull If sending to the notification address fails then notification is published to
retry topic in Kafka and retried laterbull Grouping notifications In progress
Kafka Message Schema
bull JSON messages publishedconsumed tofrom Kafka by Monasca micro-services
bull Well-defined schema is published atbull httpswikiopenstackorgwikiMonascaMessage_Schema
Metrics
Create query and get statistics for metrics
bull GET POST v20metrics
bull GET v20metricsnamesbull Returns the unique metric names
bull GET v20metricsdimensionnamesbull Returns the unique dimension names
bull GET v20metricsdimensionnamesvaluesbull Returns the unique dimension values
Measurements
GET v20metricsmeasurements
bull Returns a list of measurements
bull Query parametersbull Name and dimensions to filter by
bull Start_time and end_time
bull Offset and limit
bull merge_metrics allow multiple metrics to be combined into a single list of measurements
bull group_by list of columns to group the metrics to be returned Allows multiple unique metrics to be returned in a single query
Statistics
GET v20metricsstatistics
bull Query parametersbull Name and dimensions to filter bybull Start_time and end_timebull Statistics avg min max sum and countbull Period The time period to aggregate measurements bybull Offset limitbull merge_metrics allow multiple metrics to be combined into a single list
of statisticsbull group_by list of columns to group the metrics to be returned Allows
multiple unique metrics to be returned in a single query
Metrics Names
GET v20metricsnames
bull Returns a list of the unique metric names
bull Query parametersbull Dimensions
bull Offset limit
Metric Dimension Names
GET v20metricsdimensionsnames
bull List the dimension names
bull Query parametersbull Metric name
bull Offset limit
Metric Dimension Values
GET v20metricsdimensionsnamesvalues
bull List the dimension values
bull Query parametersbull Metric name
bull Dimension name
bull Offset limit
Alarm Definitions
POST GET v20alarm-definitions
bull Alarm definitions are templates that are used to automatically and dynamically create alarms based on matching metric names and dimensions
bull One alarm definition can result in zero or more alarms
bull Simple grammar for creating compound alarm expressionsbull avg(cpuuser_perc) gt 85 or avg(diskread_opsdevice=vda 120) gt 1000
bull Alarm states (OK ALARM and UNDETERMINED)
bull Actions associated with alarms for state transitions
bull User assigned severity (LOW MEDIUM HIGH CRITICAL)
bull Thresholds can be dynamically adjusted via PATCH
bull Minimal lifecycle management alarm_lifecycle_state and link
List Alarms
GET v20alarmsQuery parametersbull metric_name - Name of metric to filter bybull metric_dimensionsbull State OK ALARM or UNDETERMINEDbull Severity One or more severities to filter by separated with |
ex severity=LOW|MEDIUMbull state_updated_start_time The start time in ISO 8601 combined date and
time format in UTCbull Offset limitbull sort_by
Alarms
GET PUT PATCH DELETE v20alarmsalarm-id
bull Alarms created by the Threshold Engine based on matching alarm definitions
bull When new nodes or components are deployed alarms are automatically created
bull Alarms are resources within Monasca They have a resource ID and lifecycle
bull By default three states OK ALARM and UNDETERMINEDbull UNDETERMINED state occurs when metrics are no longer being received
bull Deterministic alarms two states OK and ALARMbull Used for systems where metrics are sporadic Eg Creating metrics when errors in log
files occur and no metrics when there arent any errors
Alarm Counts
GET /v2.0/alarms/count
• Query the total number of alarms in the OK, ALARM or UNDETERMINED state and their severities, grouped by metrics dimension such as OpenStack service, state and severity
• Used for summary dashboards
Example: Helion Ops Console
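The grouping that such a count query performs can be pictured with a local sketch. The sample alarms and the helper are made up for illustration; a real dashboard would get the counts from the API rather than compute them client-side.

```python
from collections import Counter

# Made-up sample alarms; a real response would come from the Monasca API.
alarms = [
    {"service": "compute", "state": "ALARM", "severity": "HIGH"},
    {"service": "compute", "state": "ALARM", "severity": "HIGH"},
    {"service": "compute", "state": "OK", "severity": "LOW"},
    {"service": "networking", "state": "UNDETERMINED", "severity": "MEDIUM"},
]


def count_alarms(alarms, group_by):
    """Count alarms per unique combination of the group_by fields."""
    return Counter(tuple(alarm[field] for field in group_by) for alarm in alarms)


counts = count_alarms(alarms, group_by=("service", "state", "severity"))
by_state = count_alarms(alarms, group_by=("state",))
```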
Alarm History
GET /v2.0/alarms/state-history
• Lists the alarm state history for alarms
• Query parameters
  • Dimensions to filter on
  • Start/end timestamp
  • Offset, limit
GET /v2.0/alarms/{alarm-id}/state-history
• Lists the alarm state history for a specific alarm
Notification Methods
POST, GET, DELETE /v2.0/notification-methods
Notification methods are associated with Actions in alarm definitions
Example
POST /v2.0/notification-methods
{
  "name": "Name of notification method",
  "type": "EMAIL",
  "address": "john.doe@hp.com"
}
Monasca Agent
• System metrics (cpu, memory, network, filesystem, …)
• Service metrics
  • MySQL, Kafka, and many others
• Application metrics
  • Built-in Statsd daemon
  • Python monasca-statsd library: Adds support for dimensions
• VM system metrics
• Open vSwitch metrics
• Active checks
  • HTTP status checks and response times
  • System up/down checks (ping and ssh)
• Runs any Nagios plugin or check_mk
• Extensible/Pluggable: Additional services can be easily added
Agent details
• The Agent Forwarder buffers metrics for a short time to increase the size of the HTTP request body (number of metrics) sent to the Monasca API
• The Agent requests an auth token from the Keystone Identity service, which is supplied on all requests
• The Monasca Agent and API cache auth tokens in memory to reduce the round-trip authorization requests to Keystone
• If network connectivity between the Agent and API is lost, the Agent will buffer metrics and send them when connectivity is restored
• Metrics are submitted using an "agent" role, which only allows metrics to be POSTed to the metrics endpoint
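The buffering behaviour described above can be sketched as follows. MetricBuffer and its parameters are hypothetical and not the Agent Forwarder's implementation; the sketch only shows batching by size or age and keeping metrics while the API is unreachable.

```python
import time


class MetricBuffer:
    """Collects metrics and flushes them in batches (hypothetical class)."""

    def __init__(self, send, max_batch=3, max_age_seconds=10.0,
                 clock=time.monotonic):
        self._send = send            # callable(list) -> True on success
        self._max_batch = max_batch
        self._max_age = max_age_seconds
        self._clock = clock
        self._pending = []
        self._oldest = None

    def add(self, metric):
        if not self._pending:
            self._oldest = self._clock()
        self._pending.append(metric)
        too_many = len(self._pending) >= self._max_batch
        too_old = self._clock() - self._oldest >= self._max_age
        if too_many or too_old:
            self.flush()

    def flush(self):
        if not self._pending:
            return
        if self._send(list(self._pending)):
            self._pending.clear()
            self._oldest = None
        # On failure the metrics stay buffered and go out with the next flush.

    def __len__(self):
        return len(self._pending)


sent = []
api = {"up": False}


def fake_send(batch):
    """Stand-in for the HTTP POST to the metrics endpoint."""
    if not api["up"]:
        return False
    sent.append(batch)
    return True


buffer = MetricBuffer(fake_send, max_batch=2)
buffer.add({"name": "cpu.user_perc", "value": 10.0})
buffer.add({"name": "cpu.user_perc", "value": 12.0})  # flush attempted, API down
api["up"] = True
buffer.add({"name": "cpu.user_perc", "value": 14.0})  # all three sent together
```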
Grafana/Monasca Integration
• Datasource: A datasource that can be added to the Grafana dashboard to enable Monasca
  • https://github.com/openstack/monasca-grafana-datasource
• Keystone authentication
  • https://github.com/twc-openstack/grafana
• Support for Alerting will be added in Grafana 4
Grafana Monasca Data Source
Logging Architecture
Logging API
• POST /v3.0/logs
• Batch log messages in a single HTTP request
• Global, local, and mixed dimensions
  • Similar to dimensions in metrics
• JSON only
• Specification
  • https://github.com/openstack/monasca-log-api/blob/master/docs/monasca-log-api-spec.md
• Queries are not done via the API but via a tenantized version of Kibana
  • https://github.com/FujitsuEnablingSoftwareTechnologyGmbH/fts-keystone
Log Model
{
  "dimensions": {
    "hostname": "devstack",
    "service": "monitoring",
    "component": "monasca-api"
  },
  "logs": [
    {
      "message": "msg1",
      "dimensions": {
        "service": "compute",
        "component": "nova-api",
        "path": "/var/log/mysql.log"
      }
    },
    {
      "message": "msg2",
      "dimensions": {
        "path": "/var/log/monasca/monasca-api.log"
      }
    }
  ]
}
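One way to read the global and local dimensions in this model is that each log entry ends up with the global dimensions plus its own. The sketch below makes that reading concrete; the rule that a local dimension overrides a global one of the same name is an assumption of this example, not something the slide states.

```python
request_body = {
    "dimensions": {
        "hostname": "devstack",
        "service": "monitoring",
        "component": "monasca-api",
    },
    "logs": [
        {
            "message": "msg1",
            "dimensions": {
                "service": "compute",
                "component": "nova-api",
                "path": "/var/log/mysql.log",
            },
        },
        {
            "message": "msg2",
            "dimensions": {"path": "/var/log/monasca/monasca-api.log"},
        },
    ],
}


def effective_dimensions(body):
    """Return (message, merged dimensions) pairs for every log entry."""
    global_dims = body.get("dimensions", {})
    result = []
    for entry in body.get("logs", []):
        merged = dict(global_dims)
        # Assumed precedence: local dimensions win over global ones.
        merged.update(entry.get("dimensions", {}))
        result.append((entry["message"], merged))
    return result


entries = effective_dimensions(request_body)
```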
Log Agents
• Logstash
  • https://github.com/logstash-plugins/logstash-output-monasca_log_api/pull/1
• Beaver
  • https://github.com/python-beaver/python-beaver/pull/406
• Logspout: Under investigation
Kibana Integration
• Keystone authentication support for Kibana
• Authentication plugin
  • https://github.com/FujitsuEnablingSoftwareTechnologyGmbH/fts-keystone
• Note: In the process of moving to the official OpenStack repo
Composability: Logging/Metrics
Transform and Analytics Engine
Monasca Transform
• A new micro-service in Monasca that aggregates and transforms metrics
• Currently based on Apache Spark Streaming
• Use cases
  • Object Storage Disk Capacity
  • Object Storage Capacity
  • Compute Host Capacity
  • VM Capacity
  • More to come
• Metrics are aggregated and published every hour
• Currently in deployment in HPE Helion OpenStack 4.0
• OpenStack project/repo
  • https://github.com/openstack/monasca-transform
Monasca Analytics
• A framework that adds data science tools (parsers, algorithms, etc.)
• Features include
  • Algorithmic flow definition, enabling sharing of complex algorithmic recipes
  • Thin orchestration layer that instantiates an execution environment
• Focused on
  • Anomaly detection
  • Reducing alert fatigue via alarm clustering (unsupervised machine learning)
• Example algorithms: One-Class SVM and LiNGAM
• Status: Under development
• OpenStack project/repo
  • https://github.com/openstack/monasca-analytics
Distributions & Deployments
• Charter Communications
  • Monasca and Grafana are currently deployed in a production private cloud
  • Monitoring-as-a-Service use cases supported with Grafana as the visualization dashboard
  • 2 datacenters, 600-700 compute nodes, 1000 VMs, 11000 metrics/sec
• FIWARE Lab
  • http://superuser.openstack.org/articles/monitoring-a-multi-region-cloud-based-on-openstack
• Hewlett Packard Enterprise Cloud System Helion OpenStack
  • Supported and tested up to 65K metrics/sec ingest rates
• Fujitsu
  • FUJITSU Software ServerView Cloud Monitoring Manager
• NEC
  • Planning to include Monasca in Cloud Solution Menus solution
• Others
Statistics: Mitaka/Newton Release
• Organizations: 31
• Contributors: 97
• Commits: 1075
• Reviews: 4080
• Lines of code: 215370
Ecosystem
• Hewlett Packard Enterprise
• Fujitsu
• Charter Communications
• NEC
• Cisco
• Cloudbase Solutions
• SUSE
• SolidFire
• SAP
• Cray Inc.
• FIWARE Lab
• Mirantis
• Broadcom
Containers and Kubernetes
• New Monasca Agent plugins
  • Docker plugin
  • cAdvisor plugin
  • Kubernetes plugin: Monitors both Kubernetes control plane and containers
  • Prometheus client plugin: Scrapes apps
  • Mesos plugin
• Containerization of Monasca
• Heapster Monasca data sink
Next Steps
• Containerizing Monasca
• Monitoring containers and container managers such as Kubernetes
• Grouping notifications
The Log
• The Log: What every software engineer should know about real-time data's unifying abstraction
  • https://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying
• Log: An append-only, totally-ordered sequence of records ordered by time
From To
Monitoring Architecture
Kafka
• A performant, distributed, durable publish/subscribe messaging and stream processing system
• Metrics, logs and events are published to topics in Kafka
• Microservices register in a consumer group as a consumer
• Microservices subscribe to topics and consume metrics/logs and events
• Messages are replicated per consumer group
• Messages are load-balanced across all consumers in a consumer group
  • Can add/remove micro-services to handle load or mitigate problems
  • As micro-services expand/contract, the partitions are automatically re-balanced
• At-least-once semantic guarantees on message delivery
• Also used for domain events, notification retry events, periodic notifications, grouping notifications and other areas
• Always accept data, never drop data: true elasticity
  • Loggly: https://www.youtube.com/watch?v=LpNbjXFPyZ0
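The load-balancing and re-balancing behaviour can be pictured with a toy partition assignment. This is a simplified round-robin model written for illustration, not Kafka's actual assignor; the consumer names are made up.

```python
def assign_partitions(partitions, consumers):
    """Spread the partitions of a topic over the consumers of one group."""
    if not consumers:
        raise ValueError("a consumer group needs at least one consumer")
    ordered = sorted(consumers)
    assignment = {consumer: [] for consumer in ordered}
    for index, partition in enumerate(sorted(partitions)):
        assignment[ordered[index % len(ordered)]].append(partition)
    return assignment


partitions = range(6)
# Three persisters share the six partitions.
before = assign_partitions(
    partitions, ["persister-1", "persister-2", "persister-3"])
# persister-2 fails: its partitions are redistributed among the survivors.
after = assign_partitions(partitions, ["persister-1", "persister-3"])
```

Every partition is owned by exactly one consumer of the group at a time, which is why adding or removing a micro-service changes the share each instance handles.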
CQRS
• Command Query Responsibility Segregation (CQRS)
• CQRS involves splitting an application into two parts internally
  1. Command side: ordering the system to update state
  2. Query side: gets information without changing state
• Advantages
  • Decouples the read/write load. Allows each to be scaled independently
  • Read store can be optimized for the query pattern of the application
• Reference
  • Event sourcing, CQRS, stream processing and Apache Kafka
  • https://www.confluent.io/blog/event-sourcing-cqrs-stream-processing-apache-kafka-whats-connection
Microservices
• Microservices are small, autonomous, decoupled services that are deployed independently and work together as a single application
• Communication between services occurs via a network
• Services need to be able to change independently of each other and be deployed by themselves without requiring consumers to change
• Benefits
  • Resilience
  • Scale
  • Ease of deployment
  • Organizational alignment
  • Optimized for change/replaceability
POST Metrics Sequence
Domain Events Sequence
Deployment Models (HA/Scale)
• Many ways to deploy Monasca
• Typically deployed in a clustered/HA configuration using three nodes or greater
  • If any node or microservice fails, the cluster remains operational
  • Partitions in Kafka are redistributed among the remaining components
• Preferably, the database is run on a separate layer from the other components/microservices
• Note: Monasca can also be deployed on a single, non-clustered node
• Has also been containerized and run in Kubernetes
Metrics Model
POST /v2.0/metrics
{
  "name": "http_status",
  "dimensions": {
    "url": "http://host.domain.com:1234/service",
    "cluster": "c1",
    "control_plane": "ccp",
    "service": "compute"
  },
  "timestamp": 0,  /* milliseconds */
  "value": 1.0,
  "value_meta": {
    "status_code": "500",
    "msg": "Internal server error"
  }
}
• Simple, concise, multi-dimensional, flexible description
  • Name (string)
  • Dimensions: Dictionary of user-defined (key, value) pairs that are used to uniquely identify a metric
  • Value meta: Optional dictionary of user-defined (key, value) pairs that can be used to describe a measurement
    • Normally used for errors and messages
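To make the "name plus dimensions identifies a metric" point concrete, here is a minimal sketch. The helpers make_metric and metric_identity are hypothetical names introduced for this example; only the field names come from the model above.

```python
import json
import time


def make_metric(name, dimensions, value, value_meta=None, timestamp_ms=None):
    """Build one metric in the shape of the model above (hypothetical helper)."""
    metric = {
        "name": name,
        "dimensions": dict(dimensions),
        # Timestamps are in milliseconds.
        "timestamp": int(time.time() * 1000) if timestamp_ms is None else timestamp_ms,
        "value": float(value),
    }
    if value_meta:
        metric["value_meta"] = dict(value_meta)
    return metric


def metric_identity(metric):
    """Name plus sorted dimensions; value, timestamp and value_meta do not count."""
    return (metric["name"], tuple(sorted(metric["dimensions"].items())))


dims = {"url": "http://host.domain.com:1234/service", "cluster": "c1",
        "control_plane": "ccp", "service": "compute"}
failed = make_metric("http_status", dims, 1.0,
                     {"status_code": "500", "msg": "Internal server error"},
                     timestamp_ms=0)
healthy = make_metric("http_status", dims, 0.0, timestamp_ms=60000)
other = make_metric("http_status", dict(dims, service="storage"), 0.0,
                    timestamp_ms=60000)
payload = json.dumps([failed, healthy])
```

Here failed and healthy are two measurements of the same metric, while other is a different metric because one dimension differs.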
Push vs Pull
• Monitoring-as-a-Service
  • Can't always pull due to firewalls and network issues
• Low-latency: sub-second latency is difficult for the pull model
• Doesn't require service discovery and registration
  • As entities are deployed, they can start sending metrics without having to be discovered or registered
• Events
• Temporary caching/buffering of metrics/events while service unreachable
Monasca API
• Primary point for pushing metrics and handling queries
• Authenticates all requests against the Keystone identity service
  • Note: auth tokens are cached to reduce the load on Keystone
• Resources: Metrics, Alarm Definitions, Alarms and Notification Methods
• API Specification
  • https://github.com/openstack/monasca-api/tree/master/docs
• Horizontally scalable
• Publishes metrics to Kafka
• Queries the time-series DB for measurements and statistics
• Queries the Config DB for alarms, alarm definitions and notification methods
Persister
• Consumes both metrics and alarm state transition events from Kafka
• Stores them temporarily in memory and does batch writes to the TSDB, based on batch size or time, to optimize write performance
• At-least-once message delivery semantics
  • No metrics or alarm state transition events are lost
  • The Kafka consumer offset for each batch is only updated after successfully storing the metric or alarm state transition event
  • Note: duplicates are possible
• HA/fault-tolerance
  • Multiple persisters run simultaneously and balance load
  • If a persister fails, the load is automatically re-balanced across the remaining persisters
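The "commit the offset only after a successful write" rule is what gives at-least-once delivery. A minimal sketch, using stand-in callables instead of a real Kafka consumer and TSDB client (both are assumptions of this example), with the time-based flush left out for brevity:

```python
class Persister:
    """Buffers messages; commits the offset only after a successful write."""

    def __init__(self, write_batch, commit_offset, batch_size=2):
        self._write_batch = write_batch      # callable(list); raises on failure
        self._commit_offset = commit_offset  # callable(int)
        self._batch_size = batch_size
        self._batch = []
        self._last_offset = None

    def handle(self, offset, message):
        self._batch.append(message)
        self._last_offset = offset
        if len(self._batch) >= self._batch_size:
            self.flush()

    def flush(self):
        if not self._batch:
            return
        # If the write raises, the offset is never committed, so the same
        # messages are consumed again after a restart (hence possible duplicates).
        self._write_batch(list(self._batch))
        self._commit_offset(self._last_offset)
        self._batch.clear()


stored, committed = [], []
tsdb = {"up": False}


def write_batch(batch):
    """Stand-in for a batch insert into the time series database."""
    if not tsdb["up"]:
        raise IOError("TSDB unavailable")
    stored.extend(batch)


persister = Persister(write_batch, committed.append, batch_size=2)
persister.handle(0, "metric-0")
try:
    persister.handle(1, "metric-1")  # flush fails: nothing stored or committed
except IOError:
    pass
committed_after_failure = list(committed)
tsdb["up"] = True
persister.flush()                    # retry succeeds, then the offset is committed
```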
Time Series Databases
• Used for storing
  • Metrics
  • Alarm state history
• Two databases supported
  1. Vertica
    • Enterprise-class, proprietary, closed-source, clustered HA analytics database
    • Excels at time-series
  2. InfluxDB
    • Open-source, single-node time-series DB
    • Clustering is closed-source
    • Note: can replicate to multiple instances of InfluxDB using Kafka
• Investigating support for additional databases
Config Database
• Stores all transactional data for Monasca, such as
  • Alarm Definitions
  • Alarms
  • Notification Methods
• MySQL and Postgres supported
• Typically deployed in a clustered or HA configuration
Threshold Engine
• Near real-time, stream processing, clustered and highly available threshold engine
• Based on Apache Storm
• Consumes metrics from Kafka
• Creates alarms based on metrics that match patterns specified in the alarm definition
• Evaluates whether metrics exceed the threshold
• Publishes alarm state transition events to Kafka
• Supports both simple and compound alarm expressions
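How a compound expression reduces to a single decision can be sketched in a few lines. This is a toy evaluator over in-memory lists, not the Storm-based engine: the helper name, the sample values, and the choice to combine sub-expressions with Python's own "or" are all assumptions of this example.

```python
import operator

FUNCTIONS = {
    "avg": lambda values: sum(values) / len(values),
    "min": min,
    "max": max,
    "sum": sum,
    "count": len,
}
OPERATORS = {">": operator.gt, "<": operator.lt,
             ">=": operator.ge, "<=": operator.le}


def sub_expression(function, values, op, threshold):
    """Evaluate one sub-expression such as avg(values) > threshold."""
    return OPERATORS[op](FUNCTIONS[function](values), threshold)


# Made-up measurements for the two metrics of a compound expression.
cpu_user_perc = [80.0, 90.0, 95.0]
disk_read_ops = [200.0, 400.0]

cpu_high = sub_expression("avg", cpu_user_perc, ">", 85)
disk_busy = sub_expression("avg", disk_read_ops, ">", 1000)
in_alarm = cpu_high or disk_busy
```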
Notification Engine
• Consumes alarm state transition events from Kafka produced by the Threshold Engine
• Evaluates whether notifications should be sent based on actions specified in the alarm definition
  • OK, ALARM and UNDETERMINED actions
• Supports email, PagerDuty, webhooks, HipChat, Slack and JIRA
  • Dynamic plugins supported
• Supports both one-shot and periodic notifications
• If sending to the notification address fails, the notification is published to a retry topic in Kafka and retried later
• Grouping notifications: In progress
Kafka Message Schema
• JSON messages published/consumed to/from Kafka by Monasca micro-services
• Well-defined schema is published at
  • https://wiki.openstack.org/wiki/Monasca/Message_Schema
Metrics
Create, query and get statistics for metrics
• GET, POST /v2.0/metrics
• GET /v2.0/metrics/names
  • Returns the unique metric names
• GET /v2.0/metrics/dimensions/names
  • Returns the unique dimension names
• GET /v2.0/metrics/dimensions/names/values
  • Returns the unique dimension values
Measurements
GET /v2.0/metrics/measurements
• Returns a list of measurements
• Query parameters
  • Name and dimensions to filter by
  • start_time and end_time
  • offset and limit
  • merge_metrics: allows multiple metrics to be combined into a single list of measurements
  • group_by: list of columns to group the metrics to be returned. Allows multiple unique metrics to be returned in a single query
Statistics
GET /v2.0/metrics/statistics
• Query parameters
  • Name and dimensions to filter by
  • start_time and end_time
  • Statistics: avg, min, max, sum and count
  • Period: The time period to aggregate measurements by
  • offset, limit
  • merge_metrics: allows multiple metrics to be combined into a single list of statistics
  • group_by: list of columns to group the metrics to be returned. Allows multiple unique metrics to be returned in a single query
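What the statistics and period parameters ask for can be illustrated with a local aggregation. The function below is a sketch of the idea only; the row layout and the name period_start are assumptions and do not describe the API's response format.

```python
AGGREGATES = {
    "avg": lambda values: sum(values) / len(values),
    "min": min,
    "max": max,
    "sum": sum,
    "count": len,
}


def statistics(measurements, period_seconds, functions=("avg",)):
    """Aggregate (timestamp_seconds, value) pairs into fixed-size periods."""
    buckets = {}
    for timestamp, value in measurements:
        start = timestamp - timestamp % period_seconds
        buckets.setdefault(start, []).append(value)
    rows = []
    for start in sorted(buckets):
        row = {"period_start": start}
        for function in functions:
            row[function] = AGGREGATES[function](buckets[start])
        rows.append(row)
    return rows


samples = [(0, 1.0), (30, 3.0), (60, 5.0)]
rows = statistics(samples, period_seconds=60, functions=("avg", "max", "count"))
```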
Metrics Names
GET /v2.0/metrics/names
• Returns a list of the unique metric names
• Query parameters
  • Dimensions
  • Offset, limit
Metric Dimension Names
GET /v2.0/metrics/dimensions/names
• List the dimension names
• Query parameters
  • Metric name
  • Offset, limit
What is Monasca
bull Open-source MonitoringLogging-as-a-Service platform for OpenStackbull Authentication currently via OpenStack Identity Service (Keystone)
bull Microservices message-bus based architecture
bull First-class RESTful APIbull Push-based metricsbull Consolidates Operational Monitoring Monitoring-as-a-Service Metering amp
Billing and morebull Designed for elastic cloud environmentsdeploymentsbull High-availability clustering built-inbull Horizontally scalable and vertically 4 tieredlayered architecturebull Capable of long-term data retention to address metering SLA capacity
planning trend analysis post-hoc RCA and other use casesbull Extensible and Composable
The Log
bull The Log What every software engineer should know about real-time datas unifying abstraction
bull httpsengineeringlinkedincomdistributed-systemslog-what-every-software-engineer-should-know-about-real-time-datas-unifying
bull Log An append-only totally-ordered sequence of records ordered by time
From To
Monitoring Architecture
Kafka
bull A performant distributed durable publishsubscribe messaging and stream processing system
bull Metrics logs and events are published to topics in Kafka
bull Microservices register in a consumer group as a consumer
bull Microservices subscribe to topics and consume metricslogs and events
bull Messages are replicated per consumer group
bull Messages are load-balanced across all consumers in a consumer groupbull Can addremove micro-services to handle load or mitigate problemsbull As micro-services expandcontract the partitions are automatically re-balanced
bull At-least-once semantic guarantees on message delivery
bull Also used for domain events notification retry events periodic notifications grouping notifcations and other areas
bull Always accept data never drop data true elasticitybull Loggly httpswwwyoutubecomwatchv=LpNbjXFPyZ0
CQRS
bull Command Query Responsibility Segregation (CQRS)
bull CQRS involves splitting an application into two parts internally1 Command side ordering the system to update state
2 Query side that gets information without changing state
bull Advantagesbull Decouples the readwrite load Allows each to be scaled independently
bull Read store can be optimized for the query pattern of the application
bull Referencebull Event sourcing CQRS stream processing and Apache Kafka
bull httpswwwconfluentioblogevent-sourcing-cqrs-stream-processing-apache-kafka-whats-connection
Microservices
bull Microservices are small autonomous decoupled services that are deployed independenty and work together as a single application
bull Communication between services occurs via a network
bull Services need to be able to change independently of each other and be deployed by themselves without requiring consumers to change
bull Benefitsbull Resiliencebull Scalebull Ease of deploymentbull Organizational Alignmentbull Optimized for ChangeReplaceability
POST Metrics Sequence
Domain Events Sequence
Deployment Models (HAScale)
bull Many ways to deploy Monasca
bull Typically deployed in a clusteredHA configuration using three nodes or greater
bull If any node or microservice fails the cluster remains operational
bull Partitions in Kafka are redistributed among the remaining components
bull Preferably the database is run on a separate layer from the other componentsmicroservices
bull Note Monasca can also be deployed on a single-node non-clustered
bull Has also been containerized and run in Kubernetes
Metrics ModelPOST v20metrics
name http_statusdimensions
url httphostdomaincom1234servicecluster c1control_plane ccpservice compute
timestamp 0 milliseconds value 10value_meta
status_code 500msg Internal server error
bull Simple concise multi-dimensional flexible descriptionbull Name (string)bull Dimensions Dictionary of user-defined (key value)
pairs that are used to uniquely identify a metric
bull Optional dictionary of user-defined (key value) pairs that can be used to describe a measurement
bull Normally used for errors and messages
Push vs Pull
bull Monitoring-as-a-Servicebull Cant always pull due to firewalls and network issues
bull Low-latency sub-second latency difficult for pull model
bull Doesnt require service discovery and registrationbull As entities are deployed they can start sending metrics without have to be
discovered or registered
bull Events
bull Temporary cachingbuffering of metricsevents while service unreachable
Monasca API
bull Primary point for pushing metrics and handling queries
bull Authenticates all requests against the Keystone identity servicebull Note auth tokens are cached to reduce the load on Keystone
bull Resources Metrics Alarm Definitions Alarms and Notification Methods
bull API Specificationbull httpsgithubcomopenstackmonasca-apitreemasterdocs
bull Horizontally scalable
bull Publishes metrics to Kafka
bull Queries timeseries DB for measurements and statistics
bull Queries Config DB for alarms alarm definitions and notification methods
Persister
bull Consumes both metrics and alarm state transition events from Kafka
bull Stores temporarily in-memory and does batch writes to the TSDB based on batch size or time to optimize write performance
bull At-least once message delivery semanticsbull No metrics or alarm state transition events are lostbull The Kafka consumer offset for each batch is only updated after successfully storing
the metric or alarm state transition eventbull Note duplicates are possible
bull HAfault-tolerancebull Multiple persisters run simultaneously and balance loadbull If a persister fails the load is automatically re-balanced across the remaining
persisters
Time Series Databases
bull Used for storingbull Metricsbull Alarm state history
bull Two databases supported1 Vertica
bull Enterprise class proprietary closed-source clustered HA analytics databasebull Excels at time-series
2 InfluxDBbull Open-source single-node time-series DBbull Clustering is closed-sourcebull Note can replicate to multiple instances of InfluxDB using Kafka
bull Investigating support for additional databases
Config Database
bull Stores all transactional data for Monasca such asbull Alarm Definitions
bull Alarms
bull Notification Methods
bull MySQL and Postgres supported
bull Typically deployed in a clustered or HA configuration
Threshold Engine
bull Near real-time stream processing clustered and highly available threshold engine
bull Based on Apache Storm
bull Consumes metrics from Kafka
bull Creates alarms based on metrics that match patterns specified in the alarm definition
bull Evaluates whether metrics exceed threshold
bull Publishes alarm state transition events to Kafka
bull Supports both simple and compound alarm expressions
Notification Engine
bull Consumes alarm state transition events from Kafka produced by the Threshold Engine
bull Evaluates whether notifications should be sent based on actions specified in the alarm definition
bull OK ALARM and UNDETERMINED actions
bull Supports email PagerDuty webhooks HipChat Slack and JIRAbull Dynamic plugins supportedbull Supports both one-shot and periodic notificationsbull If sending to the notification address fails then notification is published to
retry topic in Kafka and retried laterbull Grouping notifications In progress
Kafka Message Schema
bull JSON messages publishedconsumed tofrom Kafka by Monasca micro-services
bull Well-defined schema is published atbull httpswikiopenstackorgwikiMonascaMessage_Schema
Metrics
Create query and get statistics for metrics
bull GET POST v20metrics
bull GET v20metricsnamesbull Returns the unique metric names
bull GET v20metricsdimensionnamesbull Returns the unique dimension names
bull GET v20metricsdimensionnamesvaluesbull Returns the unique dimension values
Measurements
GET v20metricsmeasurements
bull Returns a list of measurements
bull Query parametersbull Name and dimensions to filter by
bull Start_time and end_time
bull Offset and limit
bull merge_metrics allow multiple metrics to be combined into a single list of measurements
bull group_by list of columns to group the metrics to be returned Allows multiple unique metrics to be returned in a single query
Statistics
GET v20metricsstatistics
bull Query parametersbull Name and dimensions to filter bybull Start_time and end_timebull Statistics avg min max sum and countbull Period The time period to aggregate measurements bybull Offset limitbull merge_metrics allow multiple metrics to be combined into a single list
of statisticsbull group_by list of columns to group the metrics to be returned Allows
multiple unique metrics to be returned in a single query
Metrics Names
GET v20metricsnames
bull Returns a list of the unique metric names
bull Query parametersbull Dimensions
bull Offset limit
Metric Dimension Names
GET v20metricsdimensionsnames
bull List the dimension names
bull Query parametersbull Metric name
bull Offset limit
Metric Dimension Values
GET v20metricsdimensionsnamesvalues
bull List the dimension values
bull Query parametersbull Metric name
bull Dimension name
bull Offset limit
Alarm Definitions
POST, GET /v2.0/alarm-definitions
• Alarm definitions are templates that are used to automatically and dynamically create alarms based on matching metric names and dimensions
• One alarm definition can result in zero or more alarms
• Simple grammar for creating compound alarm expressions
  • avg(cpu.user_perc) > 85 or avg(disk.read_ops{device=vda}, 120) > 1000
• Alarm states (OK, ALARM and UNDETERMINED)
• Actions associated with alarms for state transitions
• User-assigned severity (LOW, MEDIUM, HIGH, CRITICAL)
• Thresholds can be dynamically adjusted via PATCH
• Minimal lifecycle management: alarm_lifecycle_state and link
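As an illustration, the body of a POST to the alarm-definitions resource could be assembled as below. The field names (`name`, `expression`, `severity`, `match_by`, `alarm_actions`) are given from memory of the API specification and should be treated as assumptions to verify; the helper function itself is hypothetical.

```python
import json

SEVERITIES = ("LOW", "MEDIUM", "HIGH", "CRITICAL")


def alarm_definition(name, expression, severity="LOW",
                     match_by=(), alarm_actions=()):
    """Return a dict suitable for serializing as an alarm-definition body."""
    if severity not in SEVERITIES:
        raise ValueError(f"unknown severity: {severity}")
    return {
        "name": name,
        "expression": expression,
        "severity": severity,
        # One alarm is created per distinct value of these dimensions.
        "match_by": list(match_by),
        # IDs of notification methods to invoke on transition to ALARM.
        "alarm_actions": list(alarm_actions),
    }


body = alarm_definition(
    "High CPU or disk reads",
    "avg(cpu.user_perc) > 85 or avg(disk.read_ops{device=vda}, 120) > 1000",
    severity="HIGH",
    match_by=["hostname"],
)
print(json.dumps(body, indent=2))
```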
List Alarms
GET /v2.0/alarms
Query parameters
• metric_name: name of metric to filter by
• metric_dimensions
• state: OK, ALARM or UNDETERMINED
• severity: one or more severities to filter by, separated with |, e.g. severity=LOW|MEDIUM
• state_updated_start_time: the start time in ISO 8601 combined date and time format in UTC
• offset, limit
• sort_by
Alarms
GET, PUT, PATCH, DELETE /v2.0/alarms/{alarm-id}
• Alarms are created by the Threshold Engine based on matching alarm definitions
• When new nodes or components are deployed, alarms are automatically created
• Alarms are resources within Monasca. They have a resource ID and lifecycle
• By default three states: OK, ALARM and UNDETERMINED
  • UNDETERMINED state occurs when metrics are no longer being received
• Deterministic alarms: two states, OK and ALARM
  • Used for systems where metrics are sporadic, e.g. creating metrics when errors in log files occur and no metrics when there aren't any errors
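The difference between the default three-state alarm and a deterministic alarm can be sketched as a single evaluation function. This is a simplification and not the Threshold Engine's logic: it assumes an `avg(...) > threshold` expression and assumes that a deterministic alarm reports OK when no measurements arrive.

```python
def evaluate(values, threshold, deterministic=False):
    """Return the alarm state for one evaluation window.

    values: measurements received during the window (may be empty).
    """
    if not values:
        # No data: a default alarm cannot decide, a deterministic one
        # is assumed here to treat silence as "no errors".
        return "OK" if deterministic else "UNDETERMINED"
    average = sum(values) / len(values)
    return "ALARM" if average > threshold else "OK"


print(evaluate([90.0, 95.0], threshold=85))        # above threshold
print(evaluate([], threshold=85))                  # metrics stopped arriving
print(evaluate([], threshold=0, deterministic=True))
```

A log-error counter is the typical deterministic case: the metric only exists when an error was logged, so missing data should not raise doubt about the alarm's state.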
Alarm Counts
GET /v2.0/alarms/count
• Query the total number of alarms in the OK, ALARM or UNDETERMINED state and their severities, grouped by metric dimension (such as OpenStack service), state and severity
• Used for summary dashboards
Example: Helion Ops Console
Alarm History
GET /v2.0/alarms/state-history
• Lists the alarm state history for alarms
• Query parameters
  • Dimensions to filter on
  • Start/end timestamp
  • Offset, limit
GET /v2.0/alarms/{alarm-id}/state-history
• Lists the alarm state history for a specific alarm
Notification Methods
POST, GET, DELETE /v2.0/notification-methods
Notification methods are associated with Actions in alarm definitions
Example:
POST /v2.0/notification-methods
{
  "name": "Name of notification method",
  "type": "EMAIL",
  "address": "john.doe@hp.com"
}
Monasca Agent
• System metrics (cpu, memory, network, filesystem, …)
• Service metrics
  • MySQL, Kafka and many others
• Application metrics
  • Built-in Statsd daemon
  • Python monasca-statsd library: adds support for dimensions
• VM system metrics
• Open vSwitch metrics
• Active checks
  • HTTP status checks and response times
  • System up/down checks (ping and ssh)
• Runs any Nagios plugin or check_mk
• Extensible/Pluggable: additional services can be easily added
Agent details
• The Agent Forwarder buffers metrics for a short time to increase the size of the HTTP request body (number of metrics) sent to the Monasca API
• The Agent requests an auth token from the Keystone Identity service, which is supplied on all requests
• The Monasca Agent and API cache auth tokens in-memory to reduce the round-trip authorization requests to Keystone
• If network connectivity between the Agent and API is lost, the Agent will buffer metrics and send them when connectivity is restored
• Metrics are submitted using an "agent" role, which only allows metrics to be POST'd to the metrics endpoint
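The buffering behaviour described above can be sketched with a small class. This is not the Monasca Agent's code: the `Forwarder` class, its `send` callable, and the use of `ConnectionError` to signal an unreachable API are all hypothetical stand-ins.

```python
class Forwarder:
    """Buffers metrics and sends them in batches; keeps them on failure."""

    def __init__(self, send, batch_size=3):
        self.send = send            # callable taking a list of metrics
        self.batch_size = batch_size
        self.buffer = []

    def submit(self, metric):
        self.buffer.append(metric)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        if not self.buffer:
            return
        try:
            self.send(list(self.buffer))
        except ConnectionError:
            return                  # API unreachable: keep metrics for later
        self.buffer.clear()


sent = []
api_up = False


def fake_send(batch):
    """Stand-in for an HTTP POST to the metrics endpoint."""
    if not api_up:
        raise ConnectionError("API unreachable")
    sent.append(batch)


fwd = Forwarder(fake_send, batch_size=2)
fwd.submit({"name": "cpu.user_perc", "value": 1.0})
fwd.submit({"name": "cpu.user_perc", "value": 2.0})  # flush fails, buffer kept
api_up = True
fwd.submit({"name": "cpu.user_perc", "value": 3.0})  # flush succeeds
print(len(sent[0]), "metrics delivered in one request")
```

A real forwarder would also flush on a timer and cap the buffer size; both are omitted to keep the sketch short.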
Grafana/Monasca Integration
• Datasource: a datasource that can be added to the Grafana dashboard to enable Monasca
  • https://github.com/openstack/monasca-grafana-datasource
• Keystone authentication
  • https://github.com/twc-openstack/grafana
• Support for Alerting will be added in Grafana 4
Grafana Monasca Data Source
Logging Architecture
Logging API
• POST /v3.0/logs
• Batch log messages in a single HTTP request
• Global, local, mixed dimensions
  • Similar to dimensions in metrics
• JSON only
• Specification
  • https://github.com/openstack/monasca-log-api/blob/master/docs/monasca-log-api-spec.md
• Queries are not done via the API but via a tenantized version of Kibana
  • https://github.com/FujitsuEnablingSoftwareTechnologyGmbH/fts-keystone
Log Model
{
  "dimensions": {
    "hostname": "devstack",
    "service": "monitoring",
    "component": "monasca-api"
  },
  "logs": [
    {
      "message": "msg1",
      "dimensions": {
        "service": "compute",
        "component": "nova-api",
        "path": "/var/log/mysql.log"
      }
    },
    {
      "message": "msg2",
      "dimensions": {
        "path": "/var/log/monasca/monasca-api.log"
      }
    }
  ]
}
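One way to read the log model is that each log entry ends up with the global dimensions plus its own local ones. The sketch below assumes that a local dimension overrides a global one with the same key; that precedence rule is an assumption here and should be confirmed in the Log API specification. The function name is hypothetical.

```python
def effective_dimensions(request):
    """Merge global dimensions into each log entry's local dimensions."""
    global_dims = request.get("dimensions", {})
    merged = []
    for entry in request["logs"]:
        dims = dict(global_dims)
        dims.update(entry.get("dimensions", {}))  # assumed: local wins
        merged.append({"message": entry["message"], "dimensions": dims})
    return merged


request = {
    "dimensions": {"hostname": "devstack", "service": "monitoring",
                   "component": "monasca-api"},
    "logs": [
        {"message": "msg1",
         "dimensions": {"service": "compute", "component": "nova-api",
                        "path": "/var/log/mysql.log"}},
        {"message": "msg2",
         "dimensions": {"path": "/var/log/monasca/monasca-api.log"}},
    ],
}

for log in effective_dimensions(request):
    print(log["message"], log["dimensions"])
```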
Log Agents
• Logstash
  • https://github.com/logstash-plugins/logstash-output-monasca_log_api/pull/1
• Beaver
  • https://github.com/python-beaver/python-beaver/pull/406
• Logspout: under investigation
Kibana Integration
• Keystone authentication support for Kibana
• Authentication plugin
  • https://github.com/FujitsuEnablingSoftwareTechnologyGmbH/fts-keystone
• Note: in the process of moving to an official OpenStack repo
Composability: Logging/Metrics
Transform and Analytics Engine
Monasca Transform
• A new micro-service in Monasca that aggregates and transforms metrics
• Currently based on Apache Spark Streaming
• Use cases
  • Object Storage Disk Capacity
  • Object Storage Capacity
  • Compute Host Capacity
  • VM Capacity
  • More to come
• Metrics are aggregated and published every hour
• Currently in deployment in HPE Helion OpenStack 4.0
• OpenStack project/repo
  • https://github.com/openstack/monasca-transform
Monasca Analytics
• A framework that adds data science tools (parsers, algorithms, etc.)
• Features include
  • Algorithmic flow definition, enabling sharing of complex algorithmic recipes
  • Thin orchestration layer that instantiates an execution environment
• Focused on
  • Anomaly detection
  • Reducing alert fatigue via alarm clustering (unsupervised machine learning)
• Example algorithms: One-Class SVM and LiNGAM
• Status: under development
• OpenStack project/repo
  • https://github.com/openstack/monasca-analytics
Distributions & Deployments
• Charter Communications
  • Monasca and Grafana are currently deployed in a production private cloud
  • Monitoring-as-a-Service use cases supported, with Grafana as the visualization dashboard
  • 2 datacenters, 600-700 compute nodes, 1000 VMs, 11000 metrics/sec
• FIWARE Lab
  • http://superuser.openstack.org/articles/monitoring-a-multi-region-cloud-based-on-openstack
• Hewlett Packard Enterprise Cloud System, Helion OpenStack
  • Supported and tested up to 65K metrics/sec ingest rates
• Fujitsu
  • FUJITSU Software ServerView Cloud Monitoring Manager
• NEC
  • Planning to include Monasca in the Cloud Solution Menus solution
• Others
Statistics: Mitaka/Newton Release
• Organizations: 31
• Contributors: 97
• Commits: 1075
• Reviews: 4080
• Lines of code: 215370
Ecosystem
• Hewlett Packard Enterprise
• Fujitsu
• Charter Communications
• NEC
• Cisco
• Cloudbase Solutions
• SUSE
• SolidFire
• SAP
• Cray Inc
• FIWARE Lab
• Mirantis
• Broadcom
Containers and Kubernetes
• New Monasca Agent plugins
  • Docker plugin
  • cAdvisor plugin
  • Kubernetes plugin: monitors both the Kubernetes control plane and containers
  • Prometheus client plugin: scrapes apps
  • Mesos plugin
• Containerization of Monasca
• Heapster Monasca data sink
Next Steps
• Containerizing Monasca
• Monitoring containers and container managers such as Kubernetes
• Grouping notifications
The Log
• The Log: What every software engineer should know about real-time data's unifying abstraction
• https://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying
• Log: an append-only, totally-ordered sequence of records ordered by time
From To
Monitoring Architecture
Kafka
• A performant, distributed, durable publish/subscribe messaging and stream processing system
• Metrics, logs and events are published to topics in Kafka
• Microservices register in a consumer group as a consumer
• Microservices subscribe to topics and consume metrics/logs and events
• Messages are replicated per consumer group
• Messages are load-balanced across all consumers in a consumer group
  • Can add/remove micro-services to handle load or mitigate problems
  • As micro-services expand/contract, the partitions are automatically re-balanced
• At-least-once semantic guarantees on message delivery
• Also used for domain events, notification retry events, periodic notifications, grouping notifications and other areas
• Always accept data, never drop data: true elasticity
  • Loggly: https://www.youtube.com/watch?v=LpNbjXFPyZ0
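The load-balancing and re-balancing points can be illustrated with a toy partition assignment. Kafka performs the real assignment inside its client and broker protocol; the round-robin function and the consumer names below are only an illustrative assumption, not Kafka's API.

```python
def assign_partitions(partitions, consumers):
    """Spread partitions round-robin over the consumers of one group."""
    assignment = {consumer: [] for consumer in consumers}
    for index, partition in enumerate(partitions):
        owner = consumers[index % len(consumers)]
        assignment[owner].append(partition)
    return assignment


partitions = list(range(6))

# Three persister instances share the topic's six partitions.
before = assign_partitions(partitions, ["persister-1", "persister-2", "persister-3"])
print(before)

# One instance fails: its partitions are redistributed to the survivors.
after = assign_partitions(partitions, ["persister-1", "persister-2"])
print(after)
```

Each partition has exactly one owner within a group, which is why adding instances beyond the partition count does not add throughput.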
CQRS
• Command Query Responsibility Segregation (CQRS)
• CQRS involves splitting an application into two parts internally
  1. Command side, ordering the system to update state
  2. Query side, which gets information without changing state
• Advantages
  • Decouples the read/write load. Allows each to be scaled independently
  • Read store can be optimized for the query pattern of the application
• Reference
  • Event sourcing, CQRS, stream processing and Apache Kafka
  • https://www.confluent.io/blog/event-sourcing-cqrs-stream-processing-apache-kafka-whats-connection
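A minimal in-memory sketch of the command/query split, with a plain Python list standing in for the log between the two sides. The class and method names are hypothetical and are not taken from Monasca.

```python
class CommandSide:
    """Accepts writes and only appends them to the log."""

    def __init__(self, log):
        self.log = log

    def post_metric(self, name, value):
        self.log.append({"name": name, "value": value})


class QuerySide:
    """Builds a read-optimized view by consuming the log."""

    def __init__(self):
        self.latest = {}    # read store: last value per metric name
        self.position = 0   # how far into the log this view has read

    def catch_up(self, log):
        for event in log[self.position:]:
            self.latest[event["name"]] = event["value"]
        self.position = len(log)

    def get_latest(self, name):
        return self.latest.get(name)


log = []
commands = CommandSide(log)
queries = QuerySide()

commands.post_metric("cpu.user_perc", 40.0)
commands.post_metric("cpu.user_perc", 90.0)
queries.catch_up(log)
print(queries.get_latest("cpu.user_perc"))
```

Because the query side only reads the log, more read views with different layouts can be added without touching the write path.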
Microservices
• Microservices are small, autonomous, decoupled services that are deployed independently and work together as a single application
• Communication between services occurs via a network
• Services need to be able to change independently of each other and be deployed by themselves without requiring consumers to change
• Benefits
  • Resilience
  • Scale
  • Ease of deployment
  • Organizational alignment
  • Optimized for change/replaceability
POST Metrics Sequence
Domain Events Sequence
Deployment Models (HA/Scale)
• Many ways to deploy Monasca
• Typically deployed in a clustered/HA configuration using three or more nodes
  • If any node or microservice fails, the cluster remains operational
  • Partitions in Kafka are redistributed among the remaining components
• Preferably the database is run on a separate layer from the other components/microservices
• Note: Monasca can also be deployed on a single node, non-clustered
• Has also been containerized and run in Kubernetes
Metrics Model
POST /v2.0/metrics
{
  name: http_status,
  dimensions: {
    url: http://host.domain.com:1234/service,
    cluster: c1,
    control_plane: ccp,
    service: compute
  },
  timestamp: 0, /* milliseconds */
  value: 1.0,
  value_meta: {
    status_code: 500,
    msg: Internal server error
  }
}
• Simple, concise, multi-dimensional, flexible description
  • Name (string)
  • Dimensions: dictionary of user-defined (key, value) pairs that are used to uniquely identify a metric
• value_meta: optional dictionary of user-defined (key, value) pairs that can be used to describe a measurement
  • Normally used for errors and messages
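A short sketch of building a metric in this model and of the rule that the name plus the dimensions identify a metric. The helper names `make_metric` and `metric_identity` are hypothetical; the field names follow the example above.

```python
import json
import time


def make_metric(name, dimensions, value, value_meta=None, timestamp_ms=None):
    """Build one metric dict in the shape of the POST /v2.0/metrics example."""
    metric = {
        "name": name,
        "dimensions": dict(dimensions),
        # Timestamps are in milliseconds since the epoch.
        "timestamp": int(time.time() * 1000) if timestamp_ms is None else timestamp_ms,
        "value": float(value),
    }
    if value_meta:
        metric["value_meta"] = dict(value_meta)
    return metric


def metric_identity(metric):
    """Name plus dimensions identify a metric; value and meta do not."""
    return (metric["name"], tuple(sorted(metric["dimensions"].items())))


dims = {"url": "http://host.domain.com:1234/service", "cluster": "c1",
        "control_plane": "ccp", "service": "compute"}
failed = make_metric("http_status", dims, 1.0, timestamp_ms=0,
                     value_meta={"status_code": 500, "msg": "Internal server error"})
healthy = make_metric("http_status", dims, 0.0, timestamp_ms=60000)

print(json.dumps(failed, indent=2))
print(metric_identity(failed) == metric_identity(healthy))
```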
Push vs Pull
• Monitoring-as-a-Service
  • Can't always pull due to firewalls and network issues
• Low latency: sub-second latency is difficult for a pull model
• Doesn't require service discovery and registration
  • As entities are deployed they can start sending metrics without having to be discovered or registered
• Events
• Temporary caching/buffering of metrics/events while the service is unreachable
Monasca API
• Primary point for pushing metrics and handling queries
• Authenticates all requests against the Keystone identity service
  • Note: auth tokens are cached to reduce the load on Keystone
• Resources: Metrics, Alarm Definitions, Alarms and Notification Methods
• API specification
  • https://github.com/openstack/monasca-api/tree/master/docs
• Horizontally scalable
• Publishes metrics to Kafka
• Queries the time-series DB for measurements and statistics
• Queries the Config DB for alarms, alarm definitions and notification methods
Persister
• Consumes both metrics and alarm state transition events from Kafka
• Stores temporarily in-memory and does batch writes to the TSDB, based on batch size or time, to optimize write performance
• At-least-once message delivery semantics
  • No metrics or alarm state transition events are lost
  • The Kafka consumer offset for each batch is only updated after successfully storing the metric or alarm state transition event
  • Note: duplicates are possible
• HA/fault-tolerance
  • Multiple persisters run simultaneously and balance load
  • If a persister fails, the load is automatically re-balanced across the remaining persisters
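The write-then-commit ordering behind the at-least-once guarantee can be sketched as follows. This is not the Monasca Persister: the class, the `write_batch` callable, and the integer offsets are hypothetical stand-ins for a Kafka consumer and a TSDB client, and the time-based flush is omitted.

```python
class Persister:
    """Batches messages and advances the offset only after a successful write."""

    def __init__(self, write_batch, batch_size=2):
        self.write_batch = write_batch   # callable that stores a list of messages
        self.batch_size = batch_size
        self.pending = []                # (offset, message) not yet stored
        self.committed_offset = -1

    def consume(self, offset, message):
        self.pending.append((offset, message))
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self):
        if not self.pending:
            return
        # If this raises, the offset stays put and the batch is re-read
        # after a restart, which is where duplicates can come from.
        self.write_batch([message for _, message in self.pending])
        self.committed_offset = self.pending[-1][0]
        self.pending.clear()


tsdb = []
persister = Persister(tsdb.extend, batch_size=2)
persister.consume(0, "m0")
persister.consume(1, "m1")
print(tsdb, persister.committed_offset)


def failing_write(batch):
    raise IOError("TSDB unavailable")


broken = Persister(failing_write, batch_size=1)
try:
    broken.consume(0, "m0")
except IOError:
    pass
print(broken.committed_offset, broken.pending)
```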
Time Series Databases
• Used for storing
  • Metrics
  • Alarm state history
• Two databases supported
  1. Vertica
    • Enterprise-class, proprietary, closed-source, clustered HA analytics database
    • Excels at time-series
  2. InfluxDB
    • Open-source single-node time-series DB
    • Clustering is closed-source
    • Note: can replicate to multiple instances of InfluxDB using Kafka
• Investigating support for additional databases
Config Database
• Stores all transactional data for Monasca, such as
  • Alarm Definitions
  • Alarms
  • Notification Methods
• MySQL and Postgres supported
• Typically deployed in a clustered or HA configuration
Threshold Engine
• Near real-time, stream processing, clustered and highly available threshold engine
• Based on Apache Storm
• Consumes metrics from Kafka
• Creates alarms based on metrics that match patterns specified in the alarm definition
• Evaluates whether metrics exceed the threshold
• Publishes alarm state transition events to Kafka
• Supports both simple and compound alarm expressions
Notification Engine
• Consumes alarm state transition events from Kafka produced by the Threshold Engine
• Evaluates whether notifications should be sent based on actions specified in the alarm definition
  • OK, ALARM and UNDETERMINED actions
• Supports email, PagerDuty, webhooks, HipChat, Slack and JIRA
  • Dynamic plugins supported
• Supports both one-shot and periodic notifications
• If sending to the notification address fails, the notification is published to a retry topic in Kafka and retried later
• Grouping notifications: in progress
Kafka Message Schema
• JSON messages published/consumed to/from Kafka by Monasca micro-services
• Well-defined schema is published at
  • https://wiki.openstack.org/wiki/Monasca/Message_Schema
Metrics
Create, query and get statistics for metrics
• GET, POST /v2.0/metrics
• GET /v2.0/metrics/names
  • Returns the unique metric names
• GET /v2.0/metrics/dimensions/names
  • Returns the unique dimension names
• GET /v2.0/metrics/dimensions/names/values
  • Returns the unique dimension values
Monitoring Architecture
Kafka
bull A performant distributed durable publishsubscribe messaging and stream processing system
bull Metrics logs and events are published to topics in Kafka
bull Microservices register in a consumer group as a consumer
bull Microservices subscribe to topics and consume metricslogs and events
bull Messages are replicated per consumer group
bull Messages are load-balanced across all consumers in a consumer groupbull Can addremove micro-services to handle load or mitigate problemsbull As micro-services expandcontract the partitions are automatically re-balanced
bull At-least-once semantic guarantees on message delivery
bull Also used for domain events notification retry events periodic notifications grouping notifcations and other areas
bull Always accept data never drop data true elasticitybull Loggly httpswwwyoutubecomwatchv=LpNbjXFPyZ0
CQRS
bull Command Query Responsibility Segregation (CQRS)
bull CQRS involves splitting an application into two parts internally1 Command side ordering the system to update state
2 Query side that gets information without changing state
bull Advantagesbull Decouples the readwrite load Allows each to be scaled independently
bull Read store can be optimized for the query pattern of the application
bull Referencebull Event sourcing CQRS stream processing and Apache Kafka
bull httpswwwconfluentioblogevent-sourcing-cqrs-stream-processing-apache-kafka-whats-connection
Microservices
bull Microservices are small autonomous decoupled services that are deployed independenty and work together as a single application
bull Communication between services occurs via a network
bull Services need to be able to change independently of each other and be deployed by themselves without requiring consumers to change
bull Benefitsbull Resiliencebull Scalebull Ease of deploymentbull Organizational Alignmentbull Optimized for ChangeReplaceability
POST Metrics Sequence
Domain Events Sequence
Deployment Models (HAScale)
bull Many ways to deploy Monasca
bull Typically deployed in a clusteredHA configuration using three nodes or greater
bull If any node or microservice fails the cluster remains operational
bull Partitions in Kafka are redistributed among the remaining components
bull Preferably the database is run on a separate layer from the other componentsmicroservices
bull Note Monasca can also be deployed on a single-node non-clustered
bull Has also been containerized and run in Kubernetes
Metrics ModelPOST v20metrics
name http_statusdimensions
url httphostdomaincom1234servicecluster c1control_plane ccpservice compute
timestamp 0 milliseconds value 10value_meta
status_code 500msg Internal server error
bull Simple concise multi-dimensional flexible descriptionbull Name (string)bull Dimensions Dictionary of user-defined (key value)
pairs that are used to uniquely identify a metric
bull Optional dictionary of user-defined (key value) pairs that can be used to describe a measurement
bull Normally used for errors and messages
Push vs Pull
bull Monitoring-as-a-Servicebull Cant always pull due to firewalls and network issues
bull Low-latency sub-second latency difficult for pull model
bull Doesnt require service discovery and registrationbull As entities are deployed they can start sending metrics without have to be
discovered or registered
bull Events
bull Temporary cachingbuffering of metricsevents while service unreachable
Monasca API
bull Primary point for pushing metrics and handling queries
bull Authenticates all requests against the Keystone identity servicebull Note auth tokens are cached to reduce the load on Keystone
bull Resources Metrics Alarm Definitions Alarms and Notification Methods
bull API Specificationbull httpsgithubcomopenstackmonasca-apitreemasterdocs
bull Horizontally scalable
bull Publishes metrics to Kafka
bull Queries timeseries DB for measurements and statistics
bull Queries Config DB for alarms alarm definitions and notification methods
Persister
bull Consumes both metrics and alarm state transition events from Kafka
bull Stores temporarily in-memory and does batch writes to the TSDB based on batch size or time to optimize write performance
bull At-least once message delivery semanticsbull No metrics or alarm state transition events are lostbull The Kafka consumer offset for each batch is only updated after successfully storing
the metric or alarm state transition eventbull Note duplicates are possible
bull HAfault-tolerancebull Multiple persisters run simultaneously and balance loadbull If a persister fails the load is automatically re-balanced across the remaining
persisters
Time Series Databases
bull Used for storingbull Metricsbull Alarm state history
bull Two databases supported1 Vertica
bull Enterprise class proprietary closed-source clustered HA analytics databasebull Excels at time-series
2 InfluxDBbull Open-source single-node time-series DBbull Clustering is closed-sourcebull Note can replicate to multiple instances of InfluxDB using Kafka
bull Investigating support for additional databases
Config Database
bull Stores all transactional data for Monasca such asbull Alarm Definitions
bull Alarms
bull Notification Methods
bull MySQL and Postgres supported
bull Typically deployed in a clustered or HA configuration
Threshold Engine
bull Near real-time stream processing clustered and highly available threshold engine
bull Based on Apache Storm
bull Consumes metrics from Kafka
bull Creates alarms based on metrics that match patterns specified in the alarm definition
bull Evaluates whether metrics exceed threshold
bull Publishes alarm state transition events to Kafka
bull Supports both simple and compound alarm expressions
Notification Engine
bull Consumes alarm state transition events from Kafka produced by the Threshold Engine
bull Evaluates whether notifications should be sent based on actions specified in the alarm definition
bull OK ALARM and UNDETERMINED actions
bull Supports email PagerDuty webhooks HipChat Slack and JIRAbull Dynamic plugins supportedbull Supports both one-shot and periodic notificationsbull If sending to the notification address fails then notification is published to
retry topic in Kafka and retried laterbull Grouping notifications In progress
Kafka Message Schema
bull JSON messages publishedconsumed tofrom Kafka by Monasca micro-services
bull Well-defined schema is published atbull httpswikiopenstackorgwikiMonascaMessage_Schema
Metrics
Create query and get statistics for metrics
bull GET POST v20metrics
bull GET v20metricsnamesbull Returns the unique metric names
bull GET v20metricsdimensionnamesbull Returns the unique dimension names
bull GET v20metricsdimensionnamesvaluesbull Returns the unique dimension values
Measurements
GET v20metricsmeasurements
bull Returns a list of measurements
bull Query parametersbull Name and dimensions to filter by
bull Start_time and end_time
bull Offset and limit
bull merge_metrics allow multiple metrics to be combined into a single list of measurements
bull group_by list of columns to group the metrics to be returned Allows multiple unique metrics to be returned in a single query
Statistics
GET v20metricsstatistics
bull Query parametersbull Name and dimensions to filter bybull Start_time and end_timebull Statistics avg min max sum and countbull Period The time period to aggregate measurements bybull Offset limitbull merge_metrics allow multiple metrics to be combined into a single list
of statisticsbull group_by list of columns to group the metrics to be returned Allows
multiple unique metrics to be returned in a single query
Metrics Names
GET v20metricsnames
bull Returns a list of the unique metric names
bull Query parametersbull Dimensions
bull Offset limit
Metric Dimension Names
GET v20metricsdimensionsnames
bull List the dimension names
bull Query parametersbull Metric name
bull Offset limit
Metric Dimension Values
GET v20metricsdimensionsnamesvalues
bull List the dimension values
bull Query parametersbull Metric name
bull Dimension name
bull Offset limit
Alarm Definitions
POST GET v20alarm-definitions
bull Alarm definitions are templates that are used to automatically and dynamically create alarms based on matching metric names and dimensions
bull One alarm definition can result in zero or more alarms
bull Simple grammar for creating compound alarm expressionsbull avg(cpuuser_perc) gt 85 or avg(diskread_opsdevice=vda 120) gt 1000
bull Alarm states (OK ALARM and UNDETERMINED)
bull Actions associated with alarms for state transitions
bull User assigned severity (LOW MEDIUM HIGH CRITICAL)
bull Thresholds can be dynamically adjusted via PATCH
bull Minimal lifecycle management alarm_lifecycle_state and link
List Alarms
GET v20alarmsQuery parametersbull metric_name - Name of metric to filter bybull metric_dimensionsbull State OK ALARM or UNDETERMINEDbull Severity One or more severities to filter by separated with |
ex severity=LOW|MEDIUMbull state_updated_start_time The start time in ISO 8601 combined date and
time format in UTCbull Offset limitbull sort_by
Alarms
GET, PUT, PATCH, DELETE /v2.0/alarms/{alarm-id}
• Alarms are created by the Threshold Engine based on matching alarm definitions
• When new nodes or components are deployed, alarms are automatically created
• Alarms are resources within Monasca. They have a resource ID and lifecycle
• By default, three states: OK, ALARM and UNDETERMINED
  • The UNDETERMINED state occurs when metrics are no longer being received
• Deterministic alarms: two states, OK and ALARM
  • Used for systems where metrics are sporadic, e.g. creating metrics when errors occur in log files and no metrics when there aren't any errors
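The difference between default and deterministic alarms can be sketched as a single function. This is a simplified model and not the Threshold Engine's code. It assumes an avg-style threshold and that a deterministic alarm reports OK when no measurements arrive.

```python
def evaluate_state(values, threshold, deterministic=False):
    """Return the alarm state for the measurements seen in one window.

    values: measurements received in the evaluation window (may be empty).
    """
    if not values:
        # No data: a default alarm cannot tell, a deterministic alarm stays OK.
        return "OK" if deterministic else "UNDETERMINED"
    average = sum(values) / len(values)
    return "ALARM" if average > threshold else "OK"


print(evaluate_state([90.0, 95.0], threshold=85))        # metrics above threshold
print(evaluate_state([], threshold=85))                  # metrics stopped arriving
print(evaluate_state([], threshold=0, deterministic=True))  # sporadic error metric
```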
Alarm Counts
GET /v2.0/alarms/count
• Query the total number of alarms in the OK, ALARM or UNDETERMINED state and their severities, grouped by metric dimension (such as OpenStack service), state and severity
• Used for summary dashboards
Example: Helion Ops Console
Alarm History
GET /v2.0/alarms/state-history
• Lists the alarm state history for alarms
• Query parameters
  • Dimensions to filter on
  • Start/end timestamp
  • Offset, limit
GET /v2.0/alarms/{alarm-id}/state-history
• Lists the alarm state history for a specific alarm
Notification Methods
POST, GET, DELETE /v2.0/notification-methods
Notification methods are associated with actions in alarm definitions
Example
POST /v2.0/notification-methods
{
  "name": "Name of notification method",
  "type": "EMAIL",
  "address": "john.doe@hp.com"
}
Monasca Agent
• System metrics (cpu, memory, network, filesystem, …)
• Service metrics
  • MySQL, Kafka and many others
• Application metrics
  • Built-in Statsd daemon
  • Python monasca-statsd library: adds support for dimensions
• VM system metrics
• Open vSwitch metrics
• Active checks
  • HTTP status checks and response times
  • System up/down checks (ping and ssh)
• Runs any Nagios plugin or check_mk
• Extensible/pluggable: additional services can be easily added
Agent details
• The Agent Forwarder buffers metrics for a short time to increase the size of the HTTP request body (number of metrics) sent to the Monasca API
• The Agent requests an auth token from the Keystone Identity service, which is supplied on all requests
• The Monasca Agent and API cache auth tokens in memory to reduce round-trip authorization requests to Keystone
• If network connectivity between the Agent and API is lost, the Agent will buffer metrics and send them when connectivity is restored
• Metrics are submitted using an “agent” role, which only allows metrics to be POST’d to the metrics endpoint
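The buffering behaviour described above can be sketched as follows. This is not the Agent Forwarder's implementation: the class, its parameters and the fake send function are invented for illustration, and a real forwarder would POST the batch to the API.

```python
import time


class MetricBuffer:
    """Collects metrics and hands them to `send` in batches."""

    def __init__(self, send, max_batch=3, max_age=10.0, clock=time.monotonic):
        self._send = send            # callable(list_of_metrics) -> True on success
        self._max_batch = max_batch
        self._max_age = max_age      # seconds a metric may wait before a flush
        self._clock = clock
        self._pending = []
        self._first_added = None

    def add(self, metric):
        if not self._pending:
            self._first_added = self._clock()
        self._pending.append(metric)
        too_many = len(self._pending) >= self._max_batch
        too_old = self._clock() - self._first_added >= self._max_age
        if too_many or too_old:
            self.flush()

    def flush(self):
        """Try to send everything pending; keep it all if the send fails."""
        if self._pending and self._send(list(self._pending)):
            self._pending.clear()


sent = []
api = {"up": False}          # simulated connectivity to the Monasca API


def fake_send(batch):
    if api["up"]:
        sent.append(batch)
    return api["up"]


buf = MetricBuffer(fake_send, max_batch=2)
buf.add({"name": "cpu.user_perc", "value": 12.0})
buf.add({"name": "cpu.user_perc", "value": 15.0})  # flush attempted, API down
api["up"] = True
buf.add({"name": "cpu.user_perc", "value": 11.0})  # all three go out together
```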
Grafana/Monasca Integration
• Datasource: a datasource that can be added to the Grafana dashboard to enable Monasca
  • https://github.com/openstack/monasca-grafana-datasource
• Keystone authentication
  • https://github.com/twc-openstack/grafana
• Support for Alerting will be added in Grafana 4
Grafana Monasca Data Source
Logging Architecture
Logging API
• POST /v3.0/logs
• Batch log messages in a single HTTP request
• Global, local and mixed dimensions
  • Similar to dimensions in metrics
• JSON only
• Specification
  • https://github.com/openstack/monasca-log-api/blob/master/docs/monasca-log-api-spec.md
• Queries are not done via the API but via a tenantized version of Kibana
  • https://github.com/FujitsuEnablingSoftwareTechnologyGmbH/fts-keystone
Log Model
{
  "dimensions": {
    "hostname": "devstack",
    "service": "monitoring",
    "component": "monasca-api"
  },
  "logs": [
    {
      "message": "msg1",
      "dimensions": {
        "service": "compute",
        "component": "nova-api",
        "path": "/var/log/mysql.log"
      }
    },
    {
      "message": "msg2",
      "dimensions": {
        "path": "/var/log/monasca/monasca-api.log"
      }
    }
  ]
}
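One way to read the global, local and mixed dimensions in this model is sketched below. It assumes that a log entry's local dimensions override global dimensions with the same key. That precedence should be checked against the log API specification, and the function name is hypothetical.

```python
def effective_dimensions(batch):
    """Return a list of (message, dimensions) pairs for one log batch.

    Assumption: local dimensions win over global ones on key conflicts.
    """
    global_dims = batch.get("dimensions", {})
    result = []
    for entry in batch["logs"]:
        dims = dict(global_dims)                  # start from the global set
        dims.update(entry.get("dimensions", {}))  # then apply local overrides
        result.append((entry["message"], dims))
    return result


batch = {
    "dimensions": {"hostname": "devstack", "service": "monitoring",
                   "component": "monasca-api"},
    "logs": [
        {"message": "msg1",
         "dimensions": {"service": "compute", "component": "nova-api",
                        "path": "/var/log/mysql.log"}},
        {"message": "msg2",
         "dimensions": {"path": "/var/log/monasca/monasca-api.log"}},
    ],
}

resolved = effective_dimensions(batch)
for message, dims in resolved:
    print(message, dims)
```

Under that assumption msg1 keeps the global hostname but reports its own service and component, while msg2 inherits everything global and only adds a path.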
Log Agents
• Logstash
  • https://github.com/logstash-plugins/logstash-output-monasca_log_api/pull/1
• Beaver
  • https://github.com/python-beaver/python-beaver/pull/406
• Logspout: under investigation
Kibana Integration
• Keystone authentication support for Kibana
• Authentication plugin
  • https://github.com/FujitsuEnablingSoftwareTechnologyGmbH/fts-keystone
• Note: in the process of moving to an official OpenStack repo
Composability Logging/Metrics
Transform and Analytics Engine
Monasca Transform
• A new micro-service in Monasca that aggregates and transforms metrics
• Currently based on Apache Spark Streaming
• Use cases
  • Object Storage Disk Capacity
  • Object Storage Capacity
  • Compute Host Capacity
  • VM Capacity
  • More to come
• Metrics are aggregated and published every hour
• Currently in deployment in HPE Helion OpenStack 4.0
• OpenStack project/repo
  • https://github.com/openstack/monasca-transform
Monasca Analytics
• A framework that adds data science tools (parsers, algorithms, etc.)
• Features include
  • Algorithmic flow definition enabling sharing of complex algorithmic recipes
  • Thin orchestration layer that instantiates an execution environment
• Focused on
  • Anomaly detection
  • Reducing alert fatigue via alarm clustering (unsupervised machine learning)
• Example algorithms: One-Class SVM and LiNGAM
• Status: under development
• OpenStack project/repo
  • https://github.com/openstack/monasca-analytics
Distributions & Deployments
• Charter Communications
  • Monasca and Grafana are currently deployed in a production private cloud
  • Monitoring-as-a-Service use cases supported with Grafana as the visualization dashboard
  • 2 datacenters, 600-700 compute nodes, 1000 VMs, 11000 metrics/sec
• FIWARE Lab
  • http://superuser.openstack.org/articles/monitoring-a-multi-region-cloud-based-on-openstack
• Hewlett Packard Enterprise Cloud System, Helion OpenStack
  • Supported and tested up to 65K metrics/sec ingest rates
• Fujitsu
  • FUJITSU Software ServerView Cloud Monitoring Manager
• NEC
  • Planning to include Monasca in the Cloud Solution Menus solution
• Others
Statistics: Mitaka/Newton Release
• Organizations: 31
• Contributors: 97
• Commits: 1075
• Reviews: 4080
• Lines of code: 215370
Ecosystem
• Hewlett Packard Enterprise
• Fujitsu
• Charter Communications
• NEC
• Cisco
• Cloudbase Solutions
• SUSE
• SolidFire
• SAP
• Cray Inc
• FIWARE Lab
• Mirantis
• Broadcom
Containers and Kubernetes
• New Monasca Agent plugins
  • Docker plugin
  • cAdvisor plugin
  • Kubernetes plugin: monitors both the Kubernetes control plane and containers
  • Prometheus client plugin: scrapes apps
  • Mesos plugin
• Containerization of Monasca
• Heapster Monasca data sink
Next Steps
• Containerizing Monasca
• Monitoring containers and container managers such as Kubernetes
• Grouping notifications
Kafka
• A performant, distributed, durable publish/subscribe messaging and stream processing system
• Metrics, logs and events are published to topics in Kafka
• Microservices register in a consumer group as a consumer
• Microservices subscribe to topics and consume metrics, logs and events
• Messages are replicated per consumer group
• Messages are load-balanced across all consumers in a consumer group
  • Can add/remove micro-services to handle load or mitigate problems
  • As micro-services expand/contract, the partitions are automatically re-balanced
• At-least-once semantic guarantees on message delivery
• Also used for domain events, notification retry events, periodic notifications, grouping notifications and other areas
• Always accept data, never drop data: true elasticity
  • Loggly: https://www.youtube.com/watch?v=LpNbjXFPyZ0
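The load balancing and re-balancing within a consumer group can be pictured with a toy assignment function. This is a plain round-robin model written for illustration, not Kafka's actual assignor, and the consumer names are made up.

```python
def assign_partitions(partitions, consumers):
    """Spread the partitions of a topic over the consumers of one group."""
    if not consumers:
        raise ValueError("a consumer group needs at least one consumer")
    ordered = sorted(consumers)
    assignment = {consumer: [] for consumer in ordered}
    for index, partition in enumerate(sorted(partitions)):
        # Each partition is owned by exactly one consumer in the group.
        assignment[ordered[index % len(ordered)]].append(partition)
    return assignment


partitions = range(6)
before = assign_partitions(
    partitions, ["persister-1", "persister-2", "persister-3"])
# persister-2 fails: its partitions are redistributed among the survivors.
after = assign_partitions(partitions, ["persister-1", "persister-3"])

print(before)
print(after)
```

Because every partition always has exactly one owner inside the group, adding or removing a consumer changes who does the work but not whether the messages get consumed.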
CQRS
• Command Query Responsibility Segregation (CQRS)
• CQRS involves splitting an application into two parts internally
  1. Command side: ordering the system to update state
  2. Query side: gets information without changing state
• Advantages
  • Decouples the read/write load. Allows each to be scaled independently
  • Read store can be optimized for the query pattern of the application
• Reference
  • Event sourcing, CQRS, stream processing and Apache Kafka
  • https://www.confluent.io/blog/event-sourcing-cqrs-stream-processing-apache-kafka-whats-connection
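A minimal sketch of the split, assuming an in-memory list as a stand-in for the message log between the two sides. The class and method names are invented for illustration and are not Monasca's.

```python
class MetricCommands:
    """Command side: accepts writes and appends them to a shared log."""

    def __init__(self, log):
        self._log = log

    def post_metric(self, name, value):
        self._log.append({"name": name, "value": value})


class MetricQueries:
    """Query side: answers reads from its own store, built from the log."""

    def __init__(self):
        self._by_name = {}   # read store shaped for "measurements by name"
        self._offset = 0     # how far into the log this side has consumed

    def consume(self, log):
        for event in log[self._offset:]:
            self._by_name.setdefault(event["name"], []).append(event["value"])
        self._offset = len(log)

    def measurements(self, name):
        return list(self._by_name.get(name, []))


log = []
commands = MetricCommands(log)
queries = MetricQueries()

commands.post_metric("cpu.user_perc", 12.0)
commands.post_metric("cpu.user_perc", 15.0)
stale = queries.measurements("cpu.user_perc")   # nothing consumed yet
queries.consume(log)
fresh = queries.measurements("cpu.user_perc")
```

The query side only sees a write after it has consumed the log, which is the price paid for being able to scale and shape the two sides independently.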
Microservices
• Microservices are small, autonomous, decoupled services that are deployed independently and work together as a single application
• Communication between services occurs via a network
• Services need to be able to change independently of each other and be deployed by themselves without requiring consumers to change
• Benefits
  • Resilience
  • Scale
  • Ease of deployment
  • Organizational alignment
  • Optimized for change/replaceability
POST Metrics Sequence
Domain Events Sequence
Deployment Models (HA/Scale)
• Many ways to deploy Monasca
• Typically deployed in a clustered/HA configuration using three or more nodes
• If any node or microservice fails, the cluster remains operational
• Partitions in Kafka are redistributed among the remaining components
• Preferably the database is run on a separate layer from the other components/microservices
• Note: Monasca can also be deployed in a single-node, non-clustered configuration
• Has also been containerized and run in Kubernetes
Metrics Model
POST /v2.0/metrics
{
  "name": "http_status",
  "dimensions": {
    "url": "http://host.domain.com:1234/service",
    "cluster": "c1",
    "control_plane": "ccp",
    "service": "compute"
  },
  "timestamp": 0, /* milliseconds */
  "value": 1.0,
  "value_meta": {
    "status_code": 500,
    "msg": "Internal server error"
  }
}
• Simple, concise, multi-dimensional, flexible description
  • Name (string)
  • Dimensions: dictionary of user-defined (key, value) pairs that are used to uniquely identify a metric
  • value_meta: optional dictionary of user-defined (key, value) pairs that can be used to describe a measurement
    • Normally used for errors and messages
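The model can be sketched in a few lines of Python. The helper names are hypothetical. The point is that a metric is identified by its name plus its dimensions, while the value, timestamp and value_meta belong to an individual measurement.

```python
import time


def build_metric(name, value, dimensions=None, value_meta=None,
                 timestamp_ms=None):
    """Return a dict shaped like the metric body shown above."""
    if timestamp_ms is None:
        timestamp_ms = int(time.time() * 1000)   # timestamps are milliseconds
    metric = {
        "name": name,
        "dimensions": dict(dimensions or {}),
        "timestamp": timestamp_ms,
        "value": float(value),
    }
    if value_meta:
        metric["value_meta"] = dict(value_meta)  # optional, per measurement
    return metric


def metric_identity(metric):
    """Name plus dimensions identify a metric; the value does not."""
    return (metric["name"], tuple(sorted(metric["dimensions"].items())))


failed = build_metric(
    "http_status", 1.0,
    dimensions={"service": "compute", "cluster": "c1"},
    value_meta={"status_code": "500", "msg": "Internal server error"},
    timestamp_ms=0,
)
recovered = build_metric(
    "http_status", 0.0,
    dimensions={"cluster": "c1", "service": "compute"},
    timestamp_ms=60000,
)
other = build_metric("http_status", 0.0,
                     dimensions={"service": "storage", "cluster": "c1"},
                     timestamp_ms=0)
```

Here failed and recovered are two measurements of the same metric, while other is a different metric because one dimension differs.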
Push vs Pull
• Monitoring-as-a-Service
  • Can't always pull due to firewalls and network issues
• Low latency: sub-second latency is difficult for a pull model
• Doesn't require service discovery and registration
  • As entities are deployed they can start sending metrics without having to be discovered or registered
• Events
• Temporary caching/buffering of metrics/events while the service is unreachable
Monasca API
• Primary point for pushing metrics and handling queries
• Authenticates all requests against the Keystone identity service
  • Note: auth tokens are cached to reduce the load on Keystone
• Resources: Metrics, Alarm Definitions, Alarms and Notification Methods
• API specification
  • https://github.com/openstack/monasca-api/tree/master/docs
• Horizontally scalable
• Publishes metrics to Kafka
• Queries the time-series DB for measurements and statistics
• Queries the Config DB for alarms, alarm definitions and notification methods
Persister
• Consumes both metrics and alarm state transition events from Kafka
• Stores temporarily in memory and does batch writes to the TSDB, based on batch size or time, to optimize write performance
• At-least-once message delivery semantics
  • No metrics or alarm state transition events are lost
  • The Kafka consumer offset for each batch is only updated after successfully storing the metric or alarm state transition event
  • Note: duplicates are possible
• HA/fault-tolerance
  • Multiple persisters run simultaneously and balance load
  • If a persister fails, the load is automatically re-balanced across the remaining persisters
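The at-least-once behaviour follows from the order of two steps: write the batch, then advance the offset. The sketch below models that with an in-memory list instead of Kafka and a TSDB. The class and the failing store are invented for illustration.

```python
class Persister:
    """Writes messages in batches and commits the offset only afterwards."""

    def __init__(self, store, batch_size):
        self.store = store            # callable(batch); raises on failure
        self.batch_size = batch_size
        self.committed = 0            # stand-in for the Kafka consumer offset

    def run(self, messages):
        while self.committed < len(messages):
            batch = messages[self.committed:self.committed + self.batch_size]
            self.store(batch)              # may raise: offset stays unchanged
            self.committed += len(batch)   # advance only after a good write


db = []
attempts = {"count": 0}


def flaky_store(batch):
    attempts["count"] += 1
    db.extend(batch)
    if attempts["count"] == 2:
        # The write landed, but the persister never learns that it did.
        raise IOError("acknowledgement lost")


messages = ["m0", "m1", "m2", "m3"]
persister = Persister(flaky_store, batch_size=2)
try:
    persister.run(messages)
except IOError:
    persister.run(messages)   # restart from the last committed offset

print(db)
```

Nothing is lost, but the second batch is stored twice, which is the duplicate case noted in the list above.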
Time Series Databases
• Used for storing
  • Metrics
  • Alarm state history
• Two databases supported
  1. Vertica
    • Enterprise-class, proprietary, closed-source, clustered, HA analytics database
    • Excels at time-series
  2. InfluxDB
    • Open-source single-node time-series DB
    • Clustering is closed-source
    • Note: can replicate to multiple instances of InfluxDB using Kafka
• Investigating support for additional databases
Config Database
• Stores all transactional data for Monasca, such as
  • Alarm Definitions
  • Alarms
  • Notification Methods
• MySQL and Postgres supported
• Typically deployed in a clustered or HA configuration
Threshold Engine
• Near real-time, stream-processing, clustered and highly available threshold engine
• Based on Apache Storm
• Consumes metrics from Kafka
• Creates alarms based on metrics that match patterns specified in the alarm definition
• Evaluates whether metrics exceed the threshold
• Publishes alarm state transition events to Kafka
• Supports both simple and compound alarm expressions
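Simple and compound expressions can be modelled as small composable checks. This sketch skips parsing of the expression grammar and has nothing to do with the Storm topology. All names are hypothetical, and a sub-expression with no data is simply treated as not exceeded.

```python
import operator

OPERATORS = {">": operator.gt, ">=": operator.ge,
             "<": operator.lt, "<=": operator.le}
FUNCTIONS = {"avg": lambda v: sum(v) / len(v), "min": min, "max": max,
             "sum": sum, "count": len}


def sub_expression(function, metric, op, threshold):
    """Build a check such as avg(metric) > threshold over a window of values."""
    def check(window):
        values = window.get(metric, [])
        return bool(values) and OPERATORS[op](FUNCTIONS[function](values),
                                              threshold)
    return check


def any_of(*checks):      # models "a or b"
    return lambda window: any(check(window) for check in checks)


def all_of(*checks):      # models "a and b"
    return lambda window: all(check(window) for check in checks)


expression = any_of(
    sub_expression("avg", "cpu.user_perc", ">", 85),
    sub_expression("avg", "disk.read_ops", ">", 1000),
)

busy = {"cpu.user_perc": [90.0, 95.0], "disk.read_ops": [10.0]}
idle = {"cpu.user_perc": [50.0], "disk.read_ops": [10.0]}
print(expression(busy), expression(idle))
```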
Notification Engine
• Consumes alarm state transition events from Kafka produced by the Threshold Engine
• Evaluates whether notifications should be sent based on actions specified in the alarm definition
• OK, ALARM and UNDETERMINED actions
• Supports email, PagerDuty, webhooks, HipChat, Slack and JIRA
  • Dynamic plugins supported
  • Supports both one-shot and periodic notifications
  • If sending to the notification address fails, the notification is published to a retry topic in Kafka and retried later
  • Grouping notifications: in progress
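The retry path can be sketched with a queue standing in for the retry topic. This is an illustration only: the function, the max_retries limit and the fake sender are assumptions, and the real engine waits before retrying rather than looping immediately.

```python
from collections import deque


def deliver(notifications, send, max_retries=2):
    """Send notifications, re-queueing failures up to max_retries times."""
    retry_topic = deque()          # stand-in for the Kafka retry topic
    delivered, abandoned = [], []
    for notification in notifications:
        if send(notification):
            delivered.append(notification)
        else:
            retry_topic.append((notification, 1))
    while retry_topic:
        notification, attempt = retry_topic.popleft()
        if send(notification):
            delivered.append(notification)
        elif attempt < max_retries:
            retry_topic.append((notification, attempt + 1))
        else:
            abandoned.append(notification)
    return delivered, abandoned


failures_left = {"email": 0, "pager": 1, "bad-address": 99}


def fake_send(notification):
    if failures_left[notification] > 0:
        failures_left[notification] -= 1
        return False
    return True


delivered, abandoned = deliver(["email", "pager", "bad-address"], fake_send)
print(delivered, abandoned)
```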
Kafka Message Schema
• JSON messages published/consumed to/from Kafka by Monasca micro-services
• Well-defined schema is published at
  • https://wiki.openstack.org/wiki/Monasca/Message_Schema
Metrics
Create, query and get statistics for metrics
• GET, POST /v2.0/metrics
• GET /v2.0/metrics/names
  • Returns the unique metric names
• GET /v2.0/metrics/dimensions/names
  • Returns the unique dimension names
• GET /v2.0/metrics/dimensions/names/values
  • Returns the unique dimension values
Measurements
GET /v2.0/metrics/measurements
• Returns a list of measurements
• Query parameters
  • Name and dimensions to filter by
  • start_time and end_time
  • Offset and limit
  • merge_metrics: allows multiple metrics to be combined into a single list of measurements
  • group_by: list of columns to group the returned metrics by. Allows multiple unique metrics to be returned in a single query
Statistics
GET /v2.0/metrics/statistics
• Query parameters
  • Name and dimensions to filter by
  • start_time and end_time
  • Statistics: avg, min, max, sum and count
  • Period: the time period to aggregate measurements by
  • Offset, limit
  • merge_metrics: allows multiple metrics to be combined into a single list of statistics
  • group_by: list of columns to group the returned metrics by. Allows multiple unique metrics to be returned in a single query
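What the statistics and period parameters mean can be shown with a small in-memory aggregation. This only illustrates the idea of bucketing measurements by period and applying the requested functions. It is not how the API or the time-series database computes them, and the row layout is an assumption.

```python
from collections import defaultdict

FUNCTIONS = {"avg": lambda v: sum(v) / len(v), "min": min, "max": max,
             "sum": sum, "count": len}


def statistics(measurements, period, functions=("avg",)):
    """Aggregate (timestamp_seconds, value) pairs into period-sized buckets.

    Returns one row per bucket: [bucket_start, stat_1, stat_2, ...].
    """
    buckets = defaultdict(list)
    for timestamp, value in measurements:
        buckets[timestamp - timestamp % period].append(value)
    return [[start] + [FUNCTIONS[name](values) for name in functions]
            for start, values in sorted(buckets.items())]


measurements = [(0, 1.0), (30, 3.0), (60, 10.0), (90, 20.0), (120, 5.0)]
rows = statistics(measurements, period=60, functions=("avg", "max", "count"))
for row in rows:
    print(row)
```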
CQRS
bull Command Query Responsibility Segregation (CQRS)
bull CQRS involves splitting an application into two parts internally1 Command side ordering the system to update state
2 Query side that gets information without changing state
bull Advantagesbull Decouples the readwrite load Allows each to be scaled independently
bull Read store can be optimized for the query pattern of the application
bull Referencebull Event sourcing CQRS stream processing and Apache Kafka
bull httpswwwconfluentioblogevent-sourcing-cqrs-stream-processing-apache-kafka-whats-connection
Microservices
bull Microservices are small autonomous decoupled services that are deployed independenty and work together as a single application
bull Communication between services occurs via a network
bull Services need to be able to change independently of each other and be deployed by themselves without requiring consumers to change
bull Benefitsbull Resiliencebull Scalebull Ease of deploymentbull Organizational Alignmentbull Optimized for ChangeReplaceability
POST Metrics Sequence
Domain Events Sequence
Deployment Models (HAScale)
bull Many ways to deploy Monasca
bull Typically deployed in a clusteredHA configuration using three nodes or greater
bull If any node or microservice fails the cluster remains operational
bull Partitions in Kafka are redistributed among the remaining components
bull Preferably the database is run on a separate layer from the other componentsmicroservices
bull Note Monasca can also be deployed on a single-node non-clustered
bull Has also been containerized and run in Kubernetes
Metrics ModelPOST v20metrics
name http_statusdimensions
url httphostdomaincom1234servicecluster c1control_plane ccpservice compute
timestamp 0 milliseconds value 10value_meta
status_code 500msg Internal server error
bull Simple concise multi-dimensional flexible descriptionbull Name (string)bull Dimensions Dictionary of user-defined (key value)
pairs that are used to uniquely identify a metric
bull Optional dictionary of user-defined (key value) pairs that can be used to describe a measurement
bull Normally used for errors and messages
Push vs Pull
bull Monitoring-as-a-Servicebull Cant always pull due to firewalls and network issues
bull Low-latency sub-second latency difficult for pull model
bull Doesnt require service discovery and registrationbull As entities are deployed they can start sending metrics without have to be
discovered or registered
bull Events
bull Temporary cachingbuffering of metricsevents while service unreachable
Monasca API
bull Primary point for pushing metrics and handling queries
bull Authenticates all requests against the Keystone identity servicebull Note auth tokens are cached to reduce the load on Keystone
bull Resources Metrics Alarm Definitions Alarms and Notification Methods
bull API Specificationbull httpsgithubcomopenstackmonasca-apitreemasterdocs
bull Horizontally scalable
bull Publishes metrics to Kafka
bull Queries timeseries DB for measurements and statistics
bull Queries Config DB for alarms alarm definitions and notification methods
Persister
bull Consumes both metrics and alarm state transition events from Kafka
bull Stores temporarily in-memory and does batch writes to the TSDB based on batch size or time to optimize write performance
bull At-least once message delivery semanticsbull No metrics or alarm state transition events are lostbull The Kafka consumer offset for each batch is only updated after successfully storing
the metric or alarm state transition eventbull Note duplicates are possible
bull HAfault-tolerancebull Multiple persisters run simultaneously and balance loadbull If a persister fails the load is automatically re-balanced across the remaining
persisters
Time Series Databases
bull Used for storingbull Metricsbull Alarm state history
bull Two databases supported1 Vertica
bull Enterprise class proprietary closed-source clustered HA analytics databasebull Excels at time-series
2 InfluxDBbull Open-source single-node time-series DBbull Clustering is closed-sourcebull Note can replicate to multiple instances of InfluxDB using Kafka
bull Investigating support for additional databases
Config Database
bull Stores all transactional data for Monasca such asbull Alarm Definitions
bull Alarms
bull Notification Methods
bull MySQL and Postgres supported
bull Typically deployed in a clustered or HA configuration
Threshold Engine
bull Near real-time stream processing clustered and highly available threshold engine
bull Based on Apache Storm
bull Consumes metrics from Kafka
bull Creates alarms based on metrics that match patterns specified in the alarm definition
bull Evaluates whether metrics exceed threshold
bull Publishes alarm state transition events to Kafka
bull Supports both simple and compound alarm expressions
Notification Engine
bull Consumes alarm state transition events from Kafka produced by the Threshold Engine
bull Evaluates whether notifications should be sent based on actions specified in the alarm definition
bull OK ALARM and UNDETERMINED actions
bull Supports email PagerDuty webhooks HipChat Slack and JIRAbull Dynamic plugins supportedbull Supports both one-shot and periodic notificationsbull If sending to the notification address fails then notification is published to
retry topic in Kafka and retried laterbull Grouping notifications In progress
Kafka Message Schema
bull JSON messages publishedconsumed tofrom Kafka by Monasca micro-services
bull Well-defined schema is published atbull httpswikiopenstackorgwikiMonascaMessage_Schema
Metrics
Create query and get statistics for metrics
bull GET POST v20metrics
bull GET v20metricsnamesbull Returns the unique metric names
bull GET v20metricsdimensionnamesbull Returns the unique dimension names
bull GET v20metricsdimensionnamesvaluesbull Returns the unique dimension values
Measurements
GET v20metricsmeasurements
bull Returns a list of measurements
bull Query parametersbull Name and dimensions to filter by
bull Start_time and end_time
bull Offset and limit
bull merge_metrics allow multiple metrics to be combined into a single list of measurements
bull group_by list of columns to group the metrics to be returned Allows multiple unique metrics to be returned in a single query
Statistics
GET v20metricsstatistics
bull Query parametersbull Name and dimensions to filter bybull Start_time and end_timebull Statistics avg min max sum and countbull Period The time period to aggregate measurements bybull Offset limitbull merge_metrics allow multiple metrics to be combined into a single list
of statisticsbull group_by list of columns to group the metrics to be returned Allows
multiple unique metrics to be returned in a single query
Metrics Names
GET v20metricsnames
bull Returns a list of the unique metric names
bull Query parametersbull Dimensions
bull Offset limit
Metric Dimension Names
GET v20metricsdimensionsnames
bull List the dimension names
bull Query parametersbull Metric name
bull Offset limit
Metric Dimension Values
GET v20metricsdimensionsnamesvalues
bull List the dimension values
bull Query parametersbull Metric name
bull Dimension name
bull Offset limit
Alarm Definitions
POST GET v20alarm-definitions
bull Alarm definitions are templates that are used to automatically and dynamically create alarms based on matching metric names and dimensions
bull One alarm definition can result in zero or more alarms
bull Simple grammar for creating compound alarm expressionsbull avg(cpuuser_perc) gt 85 or avg(diskread_opsdevice=vda 120) gt 1000
bull Alarm states (OK ALARM and UNDETERMINED)
bull Actions associated with alarms for state transitions
bull User assigned severity (LOW MEDIUM HIGH CRITICAL)
bull Thresholds can be dynamically adjusted via PATCH
bull Minimal lifecycle management alarm_lifecycle_state and link
List Alarms
GET v20alarmsQuery parametersbull metric_name - Name of metric to filter bybull metric_dimensionsbull State OK ALARM or UNDETERMINEDbull Severity One or more severities to filter by separated with |
ex severity=LOW|MEDIUMbull state_updated_start_time The start time in ISO 8601 combined date and
time format in UTCbull Offset limitbull sort_by
Alarms
GET PUT PATCH DELETE v20alarmsalarm-id
bull Alarms created by the Threshold Engine based on matching alarm definitions
bull When new nodes or components are deployed alarms are automatically created
bull Alarms are resources within Monasca They have a resource ID and lifecycle
bull By default three states OK ALARM and UNDETERMINEDbull UNDETERMINED state occurs when metrics are no longer being received
bull Deterministic alarms two states OK and ALARMbull Used for systems where metrics are sporadic Eg Creating metrics when errors in log
files occur and no metrics when there arent any errors
Alarm Counts
GET v20alarmscount
bull Query the total number of alarms in the OK ALARM or UNDETERMINED state and their severities grouped by metrics dimension such as OpenStack service state and severity
bull Used for summary dashboards
Example Helion Ops Console
Alarm History
GET v20alarmsstate-history
bull Lists the alarm state history for alarms
bull Query Parametersbull Dimensions to filter on
bull Startend timestamp
bull Offset limit
GET v20alarmsalarm-idstate-history
bull Lists the alarm state history for a specific alarm
Notification Methods
POST GET DELETE v20notification-methods
Notification methods are associated with Actions in alarm definitions
Example
POST v20notification-methods
nameName of notification method
typeEMAIL
addressjohndoehpcom
Monasca Agent
bull System metrics (cpu memory network filesystem hellip)
bull Service metricsbull MySQL Kafka and many others
bull Application metricsbull Built-in Statsd daemonbull Python monasca-statsd library Adds support for dimensions
bull VM system metrics
bull Open vSwitch metrics
bull Active checksbull HTTP status checks and response timesbull System updown checks (ping and ssh)
bull Runs any Nagios plugin or check_mk
bull ExtensiblePluggable Additional services can be easily added
Agent details
• The Agent Forwarder buffers metrics for a short time to increase the size of the HTTP request body (number of metrics) sent to the Monasca API
• The Agent requests an auth token from the Keystone Identity service, which is supplied on all requests
• The Monasca Agent and API cache auth tokens in memory to reduce the round-trip authorization requests to Keystone
• If network connectivity between the Agent and API is lost, the Agent buffers metrics and sends them when connectivity is restored
• Metrics are submitted using an “agent” role, which only allows metrics to be POSTed to the metrics endpoint
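The buffering behaviour described in the agent details can be sketched roughly as follows. This is not the Monasca Agent's code: the class name BufferingForwarder, the send callback, and the max_batch and max_age_s parameters are all hypothetical, chosen only to show batching by size or age and keeping metrics while the API is unreachable.

```python
import time
from typing import Callable, Dict, List, Optional

Metric = Dict[str, object]


class BufferingForwarder:
    """Hypothetical forwarder: batches metrics, keeps them while the API is down."""

    def __init__(self, send: Callable[[List[Metric]], bool],
                 max_batch: int = 3, max_age_s: float = 10.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._send = send            # returns True when the API accepted the batch
        self._max_batch = max_batch
        self._max_age_s = max_age_s
        self._clock = clock
        self._buffer: List[Metric] = []
        self._oldest: Optional[float] = None  # arrival time of oldest buffered metric

    def submit(self, metric: Metric) -> None:
        if not self._buffer:
            self._oldest = self._clock()
        self._buffer.append(metric)
        too_many = len(self._buffer) >= self._max_batch
        too_old = self._clock() - self._oldest >= self._max_age_s
        if too_many or too_old:
            self.flush()

    def flush(self) -> bool:
        """Try to send everything buffered; keep the buffer if the send fails."""
        if not self._buffer:
            return True
        if self._send(list(self._buffer)):
            self._buffer.clear()
            self._oldest = None
            return True
        return False                 # connectivity problem: retry on a later flush

    @property
    def pending(self) -> int:
        return len(self._buffer)
```

A failed send leaves the metrics in the buffer, so the next submit that crosses a limit sends the older metrics together with the new ones. A real agent would also need to bound the buffer size, which this sketch does not do.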
Grafana/Monasca Integration
• Datasource: a datasource that can be added to the Grafana dashboard to enable Monasca
  • https://github.com/openstack/monasca-grafana-datasource
• Keystone authentication
  • https://github.com/twc-openstack/grafana
• Support for alerting will be added in Grafana 4
Grafana Monasca Data Source
Logging Architecture
Logging API
• POST /v3.0/logs
• Batch log messages in a single HTTP request
• Global, local and mixed dimensions
  • Similar to dimensions in metrics
• JSON only
• Specification
  • https://github.com/openstack/monasca-log-api/blob/master/docs/monasca-log-api-spec.md
• Queries are not done via the API but via a tenantized version of Kibana
  • https://github.com/FujitsuEnablingSoftwareTechnologyGmbH/fts-keystone
Log Model
{
  "dimensions": {
    "hostname": "devstack",
    "service": "monitoring",
    "component": "monasca-api"
  },
  "logs": [
    {
      "message": "msg1",
      "dimensions": {
        "service": "compute",
        "component": "nova-api",
        "path": "/var/log/mysql.log"
      }
    },
    {
      "message": "msg2",
      "dimensions": {
        "path": "/var/log/monasca/monasca-api.log"
      }
    }
  ]
}
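One way to read the log model is that each log entry ends up with the global dimensions plus its own local ones. The sketch below illustrates that reading; the function name effective_logs is hypothetical, and the rule that a local dimension overrides a global one with the same key is an assumption here, to be checked against the log API specification.

```python
from typing import Dict, List


def effective_logs(body: Dict) -> List[Dict]:
    """Combine global and per-entry (local) dimensions for each log entry."""
    global_dims = body.get("dimensions", {})
    merged = []
    for entry in body.get("logs", []):
        dims = dict(global_dims)
        # Assumption: a local dimension wins over a global one with the same key.
        dims.update(entry.get("dimensions", {}))
        merged.append({"message": entry["message"], "dimensions": dims})
    return merged


# Request body taken from the Log Model slide.
body = {
    "dimensions": {"hostname": "devstack", "service": "monitoring",
                   "component": "monasca-api"},
    "logs": [
        {"message": "msg1",
         "dimensions": {"service": "compute", "component": "nova-api",
                        "path": "/var/log/mysql.log"}},
        {"message": "msg2",
         "dimensions": {"path": "/var/log/monasca/monasca-api.log"}},
    ],
}
logs = effective_logs(body)
```

Under that assumption, msg1 keeps the global hostname but reports service compute, while msg2 inherits all three global dimensions and adds only its path.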
Log Agents
• Logstash
  • https://github.com/logstash-plugins/logstash-output-monasca_log_api/pull/1
• Beaver
  • https://github.com/python-beaver/python-beaver/pull/406
• Logspout: under investigation
Kibana Integration
• Keystone authentication support for Kibana
• Authentication plugin
  • https://github.com/FujitsuEnablingSoftwareTechnologyGmbH/fts-keystone
• Note: in the process of moving to the official OpenStack repo
Composability: Logging/Metrics
Transform and Analytics Engine
Monasca Transform
• A new micro-service in Monasca that aggregates and transforms metrics
• Currently based on Apache Spark Streaming
• Use cases
  • Object Storage Disk Capacity
  • Object Storage Capacity
  • Compute Host Capacity
  • VM Capacity
  • More to come
• Metrics are aggregated and published every hour
• Currently deployed in HPE Helion OpenStack 4.0
• OpenStack project/repo
  • https://github.com/openstack/monasca-transform
Monasca Analytics
• A framework that adds data science tools (parsers, algorithms, etc.)
• Features include
  • Algorithmic flow definition, enabling sharing of complex algorithmic recipes
  • Thin orchestration layer that instantiates an execution environment
• Focused on
  • Anomaly detection
  • Reducing alert fatigue via alarm clustering (unsupervised machine learning)
• Example algorithms: One-Class SVM and LiNGAM
• Status: under development
• OpenStack project/repo
  • https://github.com/openstack/monasca-analytics
Distributions & Deployments
• Charter Communications
  • Monasca and Grafana are currently deployed in a production private cloud
  • Monitoring-as-a-Service use cases supported, with Grafana as the visualization dashboard
  • 2 datacenters, 600-700 compute nodes, 1000 VMs, 11,000 metrics/sec
• FIWARE Lab
  • http://superuser.openstack.org/articles/monitoring-a-multi-region-cloud-based-on-openstack
• Hewlett Packard Enterprise Cloud System Helion OpenStack
  • Supported and tested up to 65K metrics/sec ingest rates
• Fujitsu
  • FUJITSU Software ServerView Cloud Monitoring Manager
• NEC
  • Planning to include Monasca in the Cloud Solution Menus solution
• Others
Statistics: Mitaka/Newton Release
• Organizations: 31
• Contributors: 97
• Commits: 1,075
• Reviews: 4,080
• Lines of code: 215,370
Ecosystem
• Hewlett Packard Enterprise
• Fujitsu
• Charter Communications
• NEC
• Cisco
• Cloudbase Solutions
• SUSE
• SolidFire
• SAP
• Cray Inc
• FIWARE Lab
• Mirantis
• Broadcom
Containers and Kubernetes
• New Monasca Agent plugins
  • Docker plugin
  • cAdvisor plugin
  • Kubernetes plugin: monitors both the Kubernetes control plane and containers
  • Prometheus client plugin: scrapes apps
  • Mesos plugin
• Containerization of Monasca
• Heapster Monasca data sink
Next Steps
• Containerizing Monasca
• Monitoring containers and container managers such as Kubernetes
• Grouping notifications
Microservices
• Microservices are small, autonomous, decoupled services that are deployed independently and work together as a single application
• Communication between services occurs via a network
• Services need to be able to change independently of each other and be deployed by themselves without requiring consumers to change
• Benefits
  • Resilience
  • Scale
  • Ease of deployment
  • Organizational alignment
  • Optimized for change/replaceability
POST Metrics Sequence
Domain Events Sequence
Deployment Models (HA/Scale)
• Many ways to deploy Monasca
• Typically deployed in a clustered/HA configuration using three or more nodes
• If any node or microservice fails, the cluster remains operational
  • Partitions in Kafka are redistributed among the remaining components
• Preferably, the database is run on a separate layer from the other components/microservices
• Note: Monasca can also be deployed on a single node, non-clustered
• Has also been containerized and run in Kubernetes
Metrics Model
POST /v2.0/metrics
{
  "name": "http_status",
  "dimensions": {
    "url": "http://host.domain.com:1234/service",
    "cluster": "c1",
    "control_plane": "ccp",
    "service": "compute"
  },
  "timestamp": 0, /* milliseconds */
  "value": 1.0,
  "value_meta": {
    "status_code": "500",
    "msg": "Internal server error"
  }
}
• Simple, concise, multi-dimensional, flexible description
  • Name (string)
  • Dimensions: dictionary of user-defined (key, value) pairs that are used to uniquely identify a metric
  • value_meta: optional dictionary of user-defined (key, value) pairs that can be used to describe a measurement
    • Normally used for errors and messages
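The metric model can be sketched in a few lines. The helper names make_metric and metric_identity are hypothetical; the point is that a metric is identified by its name plus its dimensions, while value_meta only describes an individual measurement.

```python
import json
import time
from typing import Dict, Optional, Tuple


def make_metric(name: str, dimensions: Dict[str, str], value: float,
                timestamp_ms: Optional[int] = None,
                value_meta: Optional[Dict[str, str]] = None) -> Dict:
    """Build one metric in the shape shown on the Metrics Model slide."""
    metric = {
        "name": name,
        "dimensions": dict(dimensions),
        # Timestamps are in milliseconds.
        "timestamp": int(time.time() * 1000) if timestamp_ms is None else timestamp_ms,
        "value": float(value),
    }
    if value_meta:
        metric["value_meta"] = dict(value_meta)
    return metric


def metric_identity(metric: Dict) -> Tuple:
    """Name plus sorted dimensions; value, timestamp and value_meta are excluded."""
    return (metric["name"], tuple(sorted(metric["dimensions"].items())))


failed = make_metric(
    "http_status",
    {"url": "http://host.domain.com:1234/service", "cluster": "c1",
     "control_plane": "ccp", "service": "compute"},
    1.0, timestamp_ms=0,
    value_meta={"status_code": "500", "msg": "Internal server error"},
)
request_body = json.dumps(failed)  # JSON body for POST /v2.0/metrics
```

Two measurements with the same name and dimensions belong to the same metric even if their value_meta differs, which is why value_meta suits per-measurement details such as error messages.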
Push vs Pull
• Monitoring-as-a-Service
  • Can't always pull due to firewalls and network issues
• Low latency: sub-second latency is difficult for the pull model
• Doesn't require service discovery and registration
  • As entities are deployed, they can start sending metrics without having to be discovered or registered
• Events
• Temporary caching/buffering of metrics/events while the service is unreachable
Monasca API
• Primary point for pushing metrics and handling queries
• Authenticates all requests against the Keystone identity service
  • Note: auth tokens are cached to reduce the load on Keystone
• Resources: Metrics, Alarm Definitions, Alarms and Notification Methods
• API specification
  • https://github.com/openstack/monasca-api/tree/master/docs
• Horizontally scalable
• Publishes metrics to Kafka
• Queries the time-series DB for measurements and statistics
• Queries the Config DB for alarms, alarm definitions and notification methods
Persister
• Consumes both metrics and alarm state transition events from Kafka
• Stores them temporarily in memory and does batch writes to the TSDB, based on batch size or time, to optimize write performance
• At-least-once message delivery semantics
  • No metrics or alarm state transition events are lost
  • The Kafka consumer offset for each batch is only updated after successfully storing the metric or alarm state transition event
  • Note: duplicates are possible
• HA/fault tolerance
  • Multiple persisters run simultaneously and balance load
  • If a persister fails, the load is automatically re-balanced across the remaining persisters
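The at-least-once behaviour of the Persister comes down to ordering: write the batch first, commit the consumer offset second. The sketch below shows only that ordering. It uses no Kafka client; persist_batches, write_batch and commit_offset are hypothetical stand-ins, and the time-based flush mentioned on the slide is left out for brevity.

```python
from typing import Callable, Iterable, List, Tuple

Message = Tuple[int, dict]  # (offset, decoded payload)


def _flush(batch: List[Message],
           write_batch: Callable[[List[dict]], None],
           commit_offset: Callable[[int], None]) -> None:
    # If the write raises, the offset is never committed, so the same
    # messages are consumed again after a restart (nothing is lost).
    write_batch([payload for _, payload in batch])
    # Commit only after the write succeeded. A crash between these two
    # lines replays the batch, which is where duplicates come from.
    commit_offset(batch[-1][0] + 1)


def persist_batches(messages: Iterable[Message],
                    write_batch: Callable[[List[dict]], None],
                    commit_offset: Callable[[int], None],
                    batch_size: int = 2) -> None:
    """Buffer messages in memory and store them in batches of batch_size."""
    batch: List[Message] = []
    for msg in messages:
        batch.append(msg)
        if len(batch) >= batch_size:
            _flush(batch, write_batch, commit_offset)
            batch = []
    if batch:
        _flush(batch, write_batch, commit_offset)
```

Committing before writing would give at-most-once delivery instead: a crash after the commit would silently drop the batch.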
Time Series Databases
• Used for storing
  • Metrics
  • Alarm state history
• Two databases supported
  1. Vertica
    • Enterprise-class, proprietary, closed-source, clustered, HA analytics database
    • Excels at time series
  2. InfluxDB
    • Open-source, single-node time-series DB
    • Clustering is closed-source
    • Note: can replicate to multiple instances of InfluxDB using Kafka
• Investigating support for additional databases
Config Database
• Stores all transactional data for Monasca, such as
  • Alarm Definitions
  • Alarms
  • Notification Methods
• MySQL and Postgres supported
• Typically deployed in a clustered or HA configuration
Threshold Engine
• Near-real-time, stream-processing, clustered and highly available threshold engine
• Based on Apache Storm
• Consumes metrics from Kafka
• Creates alarms based on metrics that match patterns specified in the alarm definition
• Evaluates whether metrics exceed the threshold
• Publishes alarm state transition events to Kafka
• Supports both simple and compound alarm expressions
Notification Engine
• Consumes alarm state transition events from Kafka produced by the Threshold Engine
• Evaluates whether notifications should be sent based on actions specified in the alarm definition
  • OK, ALARM and UNDETERMINED actions
• Supports email, PagerDuty, webhooks, HipChat, Slack and JIRA
  • Dynamic plugins supported
• Supports both one-shot and periodic notifications
• If sending to the notification address fails, the notification is published to a retry topic in Kafka and retried later
• Grouping notifications: in progress
Kafka Message Schema
• JSON messages published/consumed to/from Kafka by Monasca micro-services
• Well-defined schema is published at
  • https://wiki.openstack.org/wiki/Monasca/Message_Schema
Metrics
Create, query and get statistics for metrics
• GET, POST /v2.0/metrics
• GET /v2.0/metrics/names
  • Returns the unique metric names
• GET /v2.0/metrics/dimensions/names
  • Returns the unique dimension names
• GET /v2.0/metrics/dimensions/names/values
  • Returns the unique dimension values
Measurements
GET /v2.0/metrics/measurements
• Returns a list of measurements
• Query parameters
  • Name and dimensions to filter by
  • start_time and end_time
  • Offset and limit
  • merge_metrics: allows multiple metrics to be combined into a single list of measurements
  • group_by: list of columns to group the returned metrics by; allows multiple unique metrics to be returned in a single query
Statistics
GET /v2.0/metrics/statistics
• Query parameters
  • Name and dimensions to filter by
  • start_time and end_time
  • Statistics: avg, min, max, sum and count
  • Period: the time period to aggregate measurements by
  • Offset, limit
  • merge_metrics: allows multiple metrics to be combined into a single list of statistics
  • group_by: list of columns to group the returned metrics by; allows multiple unique metrics to be returned in a single query
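To make the period and statistics parameters concrete, the sketch below aggregates measurements into fixed periods on the client side. It illustrates what the query asks for and is not how the Monasca API computes it; the function name statistics_by_period and the (timestamp in seconds, value) tuples are assumptions made for the example.

```python
from collections import defaultdict
from statistics import mean
from typing import Iterable, List, Sequence, Tuple

# The statistic names listed on the slide, mapped to plain Python functions.
FUNCTIONS = {"avg": mean, "min": min, "max": max, "sum": sum, "count": len}


def statistics_by_period(measurements: Iterable[Tuple[int, float]],
                         period_s: int,
                         statistics: Sequence[str] = ("avg", "max")) -> List[list]:
    """Group (timestamp_s, value) pairs into periods and aggregate each group."""
    buckets = defaultdict(list)
    for ts, value in measurements:
        period_start = int(ts // period_s) * period_s
        buckets[period_start].append(value)
    return [
        [start] + [FUNCTIONS[name](values) for name in statistics]
        for start, values in sorted(buckets.items())
    ]


rows = statistics_by_period([(0, 1.0), (30, 3.0), (60, 10.0)], period_s=60)
```

Each returned row holds the period start followed by one value per requested statistic, so a period of 60 turns the three measurements above into two rows.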
Metrics Names
GET /v2.0/metrics/names
• Returns a list of the unique metric names
• Query parameters
  • Dimensions
  • Offset, limit
Metric Dimension Names
GET /v2.0/metrics/dimensions/names
• Lists the dimension names
• Query parameters
  • Metric name
  • Offset, limit
Metric Dimension Values
GET /v2.0/metrics/dimensions/names/values
• Lists the dimension values
• Query parameters
  • Metric name
  • Dimension name
  • Offset, limit
Alarm Definitions
POST, GET /v2.0/alarm-definitions
• Alarm definitions are templates that are used to automatically and dynamically create alarms based on matching metric names and dimensions
• One alarm definition can result in zero or more alarms
• Simple grammar for creating compound alarm expressions
  • avg(cpu.user_perc) > 85 or avg(disk.read_ops{device=vda}, 120) > 1000
• Alarm states (OK, ALARM and UNDETERMINED)
• Actions associated with alarms for state transitions
• User-assigned severity (LOW, MEDIUM, HIGH, CRITICAL)
• Thresholds can be dynamically adjusted via PATCH
• Minimal lifecycle management: alarm_lifecycle_state and link
List Alarms
GET /v2.0/alarms
Query parameters
• metric_name: name of metric to filter by
• metric_dimensions
• state: OK, ALARM or UNDETERMINED
• severity: one or more severities to filter by, separated with |, e.g. severity=LOW|MEDIUM
• state_updated_start_time: the start time in ISO 8601 combined date and time format in UTC
• offset, limit
• sort_by
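A list-alarms query with these parameters could be assembled as in the sketch below. The helper list_alarms_url and the base URL are hypothetical, and the key:value,key:value encoding used for metric_dimensions is an assumption to verify against the API specification. The | separator for severities comes from the slide.

```python
from typing import Dict, Optional, Sequence
from urllib.parse import urlencode


def list_alarms_url(base_url: str, *,
                    metric_name: Optional[str] = None,
                    metric_dimensions: Optional[Dict[str, str]] = None,
                    state: Optional[str] = None,
                    severities: Sequence[str] = (),
                    limit: Optional[int] = None) -> str:
    """Build a GET /v2.0/alarms URL from the filters that were supplied."""
    params = {}
    if metric_name:
        params["metric_name"] = metric_name
    if metric_dimensions:
        # Assumed encoding: comma-separated key:value pairs.
        params["metric_dimensions"] = ",".join(
            f"{key}:{value}" for key, value in sorted(metric_dimensions.items()))
    if state:
        params["state"] = state
    if severities:
        params["severity"] = "|".join(severities)  # e.g. LOW|MEDIUM
    if limit is not None:
        params["limit"] = str(limit)
    query = urlencode(params)  # percent-encodes |, : and ,
    return f"{base_url}/v2.0/alarms" + (f"?{query}" if query else "")


url = list_alarms_url("http://monasca.example:8070", state="ALARM",
                      severities=("LOW", "MEDIUM"),
                      metric_dimensions={"service": "compute", "hostname": "h1"})
```

The request would also need a Keystone auth token, which is outside the scope of this sketch.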
Alarms
GET, PUT, PATCH, DELETE /v2.0/alarms/{alarm-id}
• Alarms are created by the Threshold Engine based on matching alarm definitions
• When new nodes or components are deployed, alarms are automatically created
• Alarms are resources within Monasca; they have a resource ID and lifecycle
• By default, three states: OK, ALARM and UNDETERMINED
  • The UNDETERMINED state occurs when metrics are no longer being received
• Deterministic alarms have two states: OK and ALARM
  • Used for systems where metrics are sporadic, e.g. creating metrics when errors occur in log files and no metrics when there aren't any errors
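The difference between default and deterministic alarms shows up only when no measurements arrive. The sketch below evaluates a single simple expression of the form avg(metric) > threshold for one evaluation period; the function name evaluate_alarm is hypothetical, and this is an illustration of the state rules on the slide rather than the Threshold Engine's implementation.

```python
from statistics import mean
from typing import Sequence


def evaluate_alarm(values: Sequence[float], threshold: float,
                   deterministic: bool = False) -> str:
    """Return the alarm state for one period of avg(values) > threshold."""
    if not values:
        # No metrics received in the period: a default alarm cannot tell
        # what is going on, while a deterministic alarm treats silence as OK.
        return "OK" if deterministic else "UNDETERMINED"
    return "ALARM" if mean(values) > threshold else "OK"


cpu_state = evaluate_alarm([90.0, 95.0], threshold=85)            # busy CPU
silent_host = evaluate_alarm([], threshold=85)                    # agent stopped reporting
no_log_errors = evaluate_alarm([], threshold=0, deterministic=True)
```

For a log-error metric that is only emitted when an error occurs, the deterministic variant avoids alarms sitting in UNDETERMINED whenever the logs are clean.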
Domain Events Sequence
Deployment Models (HA/Scale)
• Many ways to deploy Monasca
• Typically deployed in a clustered/HA configuration using three or more nodes
  • If any node or microservice fails, the cluster remains operational
  • Partitions in Kafka are redistributed among the remaining components
• Preferably the database is run on a separate layer from the other components/microservices
• Note: Monasca can also be deployed on a single node, non-clustered
• Has also been containerized and run in Kubernetes
Metrics Model
POST /v2.0/metrics
{
  "name": "http_status",
  "dimensions": {
    "url": "http://host.domain.com:1234/service",
    "cluster": "c1",
    "control_plane": "ccp",
    "service": "compute"
  },
  "timestamp": 0, /* milliseconds */
  "value": 1.0,
  "value_meta": {
    "status_code": 500,
    "msg": "Internal server error"
  }
}
• Simple, concise, multi-dimensional, flexible description
  • Name (string)
  • Dimensions: dictionary of user-defined (key, value) pairs that are used to uniquely identify a metric
  • Value meta: optional dictionary of user-defined (key, value) pairs that can be used to describe a measurement
    • Normally used for errors and messages
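A minimal sketch of the model above in Python. The helper names `build_metric` and `metric_identity` are hypothetical and only illustrate the slide: the name plus dimensions identify a metric, while value_meta merely describes one measurement.

```python
import json
import time


def build_metric(name, dimensions, value, value_meta=None, timestamp_ms=None):
    """Return a dict shaped like the metric body shown on the slide."""
    if timestamp_ms is None:
        timestamp_ms = int(time.time() * 1000)  # timestamps are in milliseconds
    metric = {
        "name": name,
        "dimensions": dict(dimensions),
        "timestamp": timestamp_ms,
        "value": float(value),
    }
    if value_meta:
        # Optional free-form description of this one measurement.
        metric["value_meta"] = dict(value_meta)
    return metric


def metric_identity(metric):
    """Name plus dimensions identify a metric; value and value_meta do not."""
    return (metric["name"], tuple(sorted(metric["dimensions"].items())))


example = build_metric(
    "http_status",
    {"cluster": "c1", "service": "compute"},
    1,
    value_meta={"status_code": 500, "msg": "Internal server error"},
    timestamp_ms=0,
)
body = json.dumps(example)  # what a client would send as the request body
```

Two measurements with the same name and dimensions but different value_meta belong to the same metric.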
Push vs Pull
• Monitoring-as-a-Service
  • Can't always pull due to firewalls and network issues
• Low latency: sub-second latency is difficult for a pull model
• Doesn't require service discovery and registration
  • As entities are deployed, they can start sending metrics without having to be discovered or registered
• Events
• Temporary caching/buffering of metrics/events while the service is unreachable
Monasca API
• Primary point for pushing metrics and handling queries
• Authenticates all requests against the Keystone identity service
  • Note: auth tokens are cached to reduce the load on Keystone
• Resources: Metrics, Alarm Definitions, Alarms, and Notification Methods
• API specification
  • https://github.com/openstack/monasca-api/tree/master/docs
• Horizontally scalable
• Publishes metrics to Kafka
• Queries the time-series DB for measurements and statistics
• Queries the Config DB for alarms, alarm definitions, and notification methods
Persister
• Consumes both metrics and alarm state transition events from Kafka
• Stores them temporarily in memory and does batch writes to the TSDB, based on batch size or time, to optimize write performance
• At-least-once message delivery semantics
  • No metrics or alarm state transition events are lost
  • The Kafka consumer offset for each batch is only updated after successfully storing the metric or alarm state transition event
  • Note: duplicates are possible
• HA/fault tolerance
  • Multiple persisters run simultaneously and balance load
  • If a persister fails, the load is automatically rebalanced across the remaining persisters
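The at-least-once behaviour described above can be sketched without Kafka or a real TSDB. Everything here is a hypothetical stand-in: `log` plays the Kafka partition, `committed` the consumer offset, and `FlakyStore` the database. It is not the Persister's actual code.

```python
class FlakyStore:
    """Stand-in for the TSDB that fails a configurable number of writes."""

    def __init__(self, failures=1, fail_after_write=False):
        self.rows = []
        self.failures = failures
        self.fail_after_write = fail_after_write

    def write_batch(self, batch):
        if self.failures > 0:
            self.failures -= 1
            if self.fail_after_write:
                self.rows.extend(batch)  # data landed, but the caller never learns it
            raise IOError("simulated TSDB failure")
        self.rows.extend(batch)


def persist(log, store, committed=0, batch_size=3, max_attempts=5):
    """Drain `log`, advancing the offset only after a batch is stored."""
    attempts = 0
    while committed < len(log):
        batch = log[committed:committed + batch_size]
        try:
            store.write_batch(batch)
        except IOError:
            attempts += 1
            if attempts >= max_attempts:
                raise
            continue  # offset unchanged, so the same batch is read again
        committed += len(batch)  # commit only after a successful write
    return committed
```

Nothing is lost because the offset never moves past unwritten data. When the failure happens after the write but before the commit, the batch is written twice, which is the duplicate case noted on the slide.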
Time Series Databases
• Used for storing
  • Metrics
  • Alarm state history
• Two databases supported
  1. Vertica
     • Enterprise-class, proprietary, closed-source, clustered, HA analytics database
     • Excels at time series
  2. InfluxDB
     • Open-source, single-node time-series DB
     • Clustering is closed-source
     • Note: can replicate to multiple instances of InfluxDB using Kafka
• Investigating support for additional databases
Config Database
• Stores all transactional data for Monasca, such as
  • Alarm Definitions
  • Alarms
  • Notification Methods
• MySQL and Postgres supported
• Typically deployed in a clustered or HA configuration
Threshold Engine
• Near real-time, stream-processing, clustered, and highly available threshold engine
• Based on Apache Storm
• Consumes metrics from Kafka
• Creates alarms based on metrics that match patterns specified in the alarm definition
• Evaluates whether metrics exceed the threshold
• Publishes alarm state transition events to Kafka
• Supports both simple and compound alarm expressions
Notification Engine
• Consumes alarm state transition events from Kafka produced by the Threshold Engine
• Evaluates whether notifications should be sent based on actions specified in the alarm definition
  • OK, ALARM, and UNDETERMINED actions
• Supports email, PagerDuty, webhooks, HipChat, Slack, and JIRA
  • Dynamic plugins supported
• Supports both one-shot and periodic notifications
• If sending to the notification address fails, the notification is published to a retry topic in Kafka and retried later
• Grouping notifications: in progress
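The retry path above can be illustrated with an in-memory queue standing in for the Kafka retry topic. The function name, the `send` callback, and the `max_retries` limit are assumptions made for this sketch, not the Notification Engine's real interface.

```python
from collections import deque


def process_notifications(events, send, max_retries=3):
    """Send each notification; failures go to a retry queue and are retried later."""
    retry_queue = deque()  # stand-in for the Kafka retry topic
    delivered, dropped = [], []

    for event in events:
        if send(event):
            delivered.append(event)
        else:
            retry_queue.append((event, 1))  # 1 = first retry attempt pending

    while retry_queue:
        event, attempt = retry_queue.popleft()
        if send(event):
            delivered.append(event)
        elif attempt < max_retries:
            retry_queue.append((event, attempt + 1))
        else:
            dropped.append(event)  # give up after max_retries retries
    return delivered, dropped
```

Failed notifications do not block the ones behind them; they are simply delivered later, or dropped once the retry limit is reached.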
Kafka Message Schema
• JSON messages published/consumed to/from Kafka by Monasca microservices
• Well-defined schema is published at
  • https://wiki.openstack.org/wiki/Monasca/Message_Schema
Metrics
Create, query, and get statistics for metrics
• GET, POST /v2.0/metrics
• GET /v2.0/metrics/names
  • Returns the unique metric names
• GET /v2.0/metrics/dimensions/names
  • Returns the unique dimension names
• GET /v2.0/metrics/dimensions/names/values
  • Returns the unique dimension values
Measurements
GET /v2.0/metrics/measurements
• Returns a list of measurements
• Query parameters
  • name and dimensions to filter by
  • start_time and end_time
  • offset and limit
  • merge_metrics: allows multiple metrics to be combined into a single list of measurements
  • group_by: list of columns to group the returned metrics by. Allows multiple unique metrics to be returned in a single query
Statistics
GET /v2.0/metrics/statistics
• Query parameters
  • name and dimensions to filter by
  • start_time and end_time
  • statistics: avg, min, max, sum, and count
  • period: the time period to aggregate measurements by
  • offset, limit
  • merge_metrics: allows multiple metrics to be combined into a single list of statistics
  • group_by: list of columns to group the returned metrics by. Allows multiple unique metrics to be returned in a single query
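As a rough illustration, the query parameters above can be assembled into a request URL with the standard library. The base URL is hypothetical, and the `key:value` comma-separated encoding of dimensions is an assumption to check against the API specification.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def statistics_url(base_url, name, dimensions, start_time, statistics,
                   period=300, **extra):
    """Build a GET /v2.0/metrics/statistics URL from the listed parameters."""
    params = {
        "name": name,
        # Assumed encoding: comma-separated key:value pairs.
        "dimensions": ",".join(f"{k}:{v}" for k, v in sorted(dimensions.items())),
        "start_time": start_time,
        "statistics": ",".join(statistics),
        "period": period,
    }
    params.update(extra)  # e.g. end_time, offset, limit, merge_metrics, group_by
    return f"{base_url}/v2.0/metrics/statistics?{urlencode(params)}"


url = statistics_url(
    "http://monasca.example.com:8070",  # hypothetical endpoint
    "cpu.user_perc",
    {"hostname": "devstack"},
    "2016-01-01T00:00:00Z",
    ["avg", "max"],
    period=60,
    merge_metrics="true",
)
query = parse_qs(urlsplit(url).query)  # decoded view of what would be sent
```

The measurements endpoint takes the same filter parameters, without statistics and period.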
Metric Names
GET /v2.0/metrics/names
• Returns a list of the unique metric names
• Query parameters
  • dimensions
  • offset, limit
Metric Dimension Names
GET /v2.0/metrics/dimensions/names
• Lists the dimension names
• Query parameters
  • metric name
  • offset, limit
Metric Dimension Values
GET /v2.0/metrics/dimensions/names/values
• Lists the dimension values
• Query parameters
  • metric name
  • dimension name
  • offset, limit
Alarm Definitions
POST, GET /v2.0/alarm-definitions
• Alarm definitions are templates that are used to automatically and dynamically create alarms based on matching metric names and dimensions
• One alarm definition can result in zero or more alarms
• Simple grammar for creating compound alarm expressions
  • avg(cpu.user_perc) > 85 or avg(disk.read_ops{device=vda}, 120) > 1000
• Alarm states (OK, ALARM, and UNDETERMINED)
• Actions associated with alarms for state transitions
• User-assigned severity (LOW, MEDIUM, HIGH, CRITICAL)
• Thresholds can be dynamically adjusted via PATCH
• Minimal lifecycle management: alarm_lifecycle_state and link
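A sketch of a request body for creating an alarm definition, using the expression from the slide. The helper `build_alarm_definition` and the notification method ID are hypothetical; the field names follow my reading of the Monasca API and should be confirmed against the specification.

```python
import json

SEVERITIES = ("LOW", "MEDIUM", "HIGH", "CRITICAL")


def build_alarm_definition(name, expression, severity="LOW", match_by=(),
                           alarm_actions=(), ok_actions=(),
                           undetermined_actions=()):
    """Return a dict for POST /v2.0/alarm-definitions (field names assumed)."""
    if severity not in SEVERITIES:
        raise ValueError(f"severity must be one of {SEVERITIES}")
    return {
        "name": name,
        "expression": expression,
        "severity": severity,
        # One alarm is created per distinct value of these dimensions.
        "match_by": list(match_by),
        # Each action list holds notification method IDs, one list per state.
        "alarm_actions": list(alarm_actions),
        "ok_actions": list(ok_actions),
        "undetermined_actions": list(undetermined_actions),
    }


definition = build_alarm_definition(
    "High CPU or disk reads",
    "avg(cpu.user_perc) > 85 or avg(disk.read_ops{device=vda}, 120) > 1000",
    severity="HIGH",
    match_by=["hostname"],
    alarm_actions=["hypothetical-notification-method-id"],
)
body = json.dumps(definition)
```

With match_by set to hostname, a single definition yields one alarm per host that reports matching metrics, which is how one definition results in zero or more alarms.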
List Alarms
GET /v2.0/alarms
Query parameters
• metric_name: name of metric to filter by
• metric_dimensions
• state: OK, ALARM, or UNDETERMINED
• severity: one or more severities to filter by, separated with |, e.g. severity=LOW|MEDIUM
• state_updated_start_time: the start time in ISO 8601 combined date and time format, in UTC
• offset, limit
• sort_by
Alarms
GET, PUT, PATCH, DELETE /v2.0/alarms/{alarm-id}
• Alarms are created by the Threshold Engine based on matching alarm definitions
• When new nodes or components are deployed, alarms are automatically created
• Alarms are resources within Monasca. They have a resource ID and lifecycle
• By default, three states: OK, ALARM, and UNDETERMINED
  • The UNDETERMINED state occurs when metrics are no longer being received
• Deterministic alarms: two states, OK and ALARM
  • Used for systems where metrics are sporadic, e.g. creating metrics when errors occur in log files and no metrics when there aren't any errors
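The difference between default and deterministic alarms can be sketched for a single `avg(...) > threshold` sub-expression. This is a simplification, not the Threshold Engine's logic; in particular, treating a period with no measurements as OK for deterministic alarms is an assumption.

```python
def alarm_state(values, threshold, deterministic=False):
    """Evaluate avg(values) > threshold for one evaluation period."""
    if not values:
        # No measurements arrived during the period.
        # Assumption: a deterministic alarm treats silence as OK.
        return "OK" if deterministic else "UNDETERMINED"
    average = sum(values) / len(values)
    return "ALARM" if average > threshold else "OK"


# A host that stops reporting cpu.user_perc becomes UNDETERMINED by default.
cpu_state = alarm_state([], threshold=85)
# A log-error counter that only emits on errors stays OK while it is silent.
log_state = alarm_state([], threshold=0, deterministic=True)
```

For sporadic metrics such as log-error counts, silence is the normal case, so a third state would only add noise.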
Alarm Counts
GET /v2.0/alarms/count
• Query the total number of alarms in the OK, ALARM, or UNDETERMINED state and their severities, grouped by metric dimension such as OpenStack service, state, and severity
• Used for summary dashboards
Example: Helion Ops Console
Alarm History
GET /v2.0/alarms/state-history
• Lists the alarm state history for alarms
• Query parameters
  • dimensions to filter on
  • start/end timestamp
  • offset, limit
GET /v2.0/alarms/{alarm-id}/state-history
• Lists the alarm state history for a specific alarm
Notification Methods
POST, GET, DELETE /v2.0/notification-methods
Notification methods are associated with actions in alarm definitions.
Example:
POST /v2.0/notification-methods
{
  "name": "Name of notification method",
  "type": "EMAIL",
  "address": "john.doe@hp.com"
}
Monasca Agent
• System metrics (CPU, memory, network, filesystem, ...)
• Service metrics
  • MySQL, Kafka, and many others
• Application metrics
  • Built-in Statsd daemon
  • Python monasca-statsd library: adds support for dimensions
• VM system metrics
• Open vSwitch metrics
• Active checks
  • HTTP status checks and response times
  • System up/down checks (ping and ssh)
• Runs any Nagios plugin or check_mk
• Extensible/pluggable: additional services can be easily added
Agent details
• The Agent Forwarder buffers metrics for a short time to increase the size of the HTTP request body (number of metrics) sent to the Monasca API
• The Agent requests an auth token from the Keystone identity service, which is supplied on all requests
• The Monasca Agent and API cache auth tokens in memory to reduce round-trip authorization requests to Keystone
• If network connectivity between the Agent and API is lost, the Agent will buffer metrics and send them when connectivity is restored
• Metrics are submitted using an "agent" role, which only allows metrics to be POSTed to the metrics endpoint
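The buffering behaviour can be sketched as a small class. `Forwarder` and its `post` callback are hypothetical names for illustration; the real Agent Forwarder also handles auth tokens and timing, which are left out here.

```python
class Forwarder:
    """Collects metrics and sends them to the API in one batched request."""

    def __init__(self, post):
        self.post = post   # callable that sends a list of metrics to the API
        self.buffer = []

    def submit(self, metric):
        self.buffer.append(metric)

    def flush(self):
        """Send everything buffered; keep it all if the API is unreachable."""
        if not self.buffer:
            return 0
        batch = list(self.buffer)
        try:
            self.post(batch)
        except ConnectionError:
            return 0  # nothing removed, so the metrics go out on a later flush
        del self.buffer[:len(batch)]
        return len(batch)
```

Batching keeps the request count low, and leaving the buffer untouched on failure is what lets metrics survive an outage between the Agent and the API. A real agent would presumably also cap the buffer size.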
Grafana/Monasca Integration
• Data source: a data source that can be added to the Grafana dashboard to enable Monasca
  • https://github.com/openstack/monasca-grafana-datasource
• Keystone authentication
  • https://github.com/twc-openstack/grafana
• Support for alerting will be added in Grafana 4
Grafana Monasca Data Source
Logging Architecture
Logging API
• POST /v3.0/logs
• Batch log messages in a single HTTP request
• Global, local, and mixed dimensions
  • Similar to dimensions in metrics
• JSON only
• Specification
  • https://github.com/openstack/monasca-log-api/blob/master/docs/monasca-log-api-spec.md
• Queries are not done via the API but via a tenantized version of Kibana
  • https://github.com/FujitsuEnablingSoftwareTechnologyGmbH/fts-keystone
Log Model
{
  "dimensions": {
    "hostname": "devstack",
    "service": "monitoring",
    "component": "monasca-api"
  },
  "logs": [
    {
      "message": "msg1",
      "dimensions": {
        "service": "compute",
        "component": "nova-api",
        "path": "/var/log/mysql.log"
      }
    },
    {
      "message": "msg2",
      "dimensions": {
        "path": "/var/log/monasca/monasca-api.log"
      }
    }
  ]
}
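One way to read the mix of global and local dimensions in this model is sketched below. The function `effective_dimensions` is hypothetical, and the rule that a local dimension overrides a global one with the same key is an assumption to verify against the log API specification.

```python
PAYLOAD = {
    "dimensions": {  # global: applies to every entry in "logs"
        "hostname": "devstack",
        "service": "monitoring",
        "component": "monasca-api",
    },
    "logs": [
        {
            "message": "msg1",
            "dimensions": {  # local: applies to this entry only
                "service": "compute",
                "component": "nova-api",
                "path": "/var/log/mysql.log",
            },
        },
        {
            "message": "msg2",
            "dimensions": {"path": "/var/log/monasca/monasca-api.log"},
        },
    ],
}


def effective_dimensions(payload):
    """Return (message, merged dimensions) per log entry; local keys win."""
    global_dims = payload.get("dimensions", {})
    return [
        (entry["message"], {**global_dims, **entry.get("dimensions", {})})
        for entry in payload["logs"]
    ]


merged = dict(effective_dimensions(PAYLOAD))
```

Under this reading, msg1 is tagged as coming from compute/nova-api while msg2 inherits monitoring/monasca-api, and both keep the shared hostname.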
Log Agents
• Logstash
  • https://github.com/logstash-plugins/logstash-output-monasca_log_api/pull/1
• Beaver
  • https://github.com/python-beaver/python-beaver/pull/406
• Logspout: under investigation
Kibana Integration
• Keystone authentication support for Kibana
• Authentication plugin
  • https://github.com/FujitsuEnablingSoftwareTechnologyGmbH/fts-keystone
• Note: in the process of moving to an official OpenStack repo
Composability: Logging/Metrics
Transform and Analytics Engine
Monasca Transform
• A new microservice in Monasca that aggregates and transforms metrics
• Currently based on Apache Spark Streaming
• Use cases
  • Object Storage Disk Capacity
  • Object Storage Capacity
  • Compute Host Capacity
  • VM Capacity
  • More to come
• Metrics are aggregated and published every hour
• Currently in deployment in HPE Helion OpenStack 4.0
• OpenStack project/repo
  • https://github.com/openstack/monasca-transform
Monasca Analytics
• A framework that adds data science tools (parsers, algorithms, etc.)
• Features include
  • Algorithmic flow definition, enabling sharing of complex algorithmic recipes
  • Thin orchestration layer that instantiates an execution environment
• Focused on
  • Anomaly detection
  • Reducing alert fatigue via alarm clustering (unsupervised machine learning)
• Example algorithms: One-Class SVM and LiNGAM
• Status: under development
• OpenStack project/repo
  • https://github.com/openstack/monasca-analytics
Metrics ModelPOST v20metrics
name http_statusdimensions
url httphostdomaincom1234servicecluster c1control_plane ccpservice compute
timestamp 0 milliseconds value 10value_meta
status_code 500msg Internal server error
bull Simple concise multi-dimensional flexible descriptionbull Name (string)bull Dimensions Dictionary of user-defined (key value)
pairs that are used to uniquely identify a metric
bull Optional dictionary of user-defined (key value) pairs that can be used to describe a measurement
bull Normally used for errors and messages
Push vs Pull
bull Monitoring-as-a-Servicebull Cant always pull due to firewalls and network issues
bull Low-latency sub-second latency difficult for pull model
bull Doesnt require service discovery and registrationbull As entities are deployed they can start sending metrics without have to be
discovered or registered
bull Events
bull Temporary cachingbuffering of metricsevents while service unreachable
Monasca API
bull Primary point for pushing metrics and handling queries
bull Authenticates all requests against the Keystone identity servicebull Note auth tokens are cached to reduce the load on Keystone
bull Resources Metrics Alarm Definitions Alarms and Notification Methods
bull API Specificationbull httpsgithubcomopenstackmonasca-apitreemasterdocs
bull Horizontally scalable
bull Publishes metrics to Kafka
bull Queries timeseries DB for measurements and statistics
bull Queries Config DB for alarms alarm definitions and notification methods
Persister
bull Consumes both metrics and alarm state transition events from Kafka
bull Stores temporarily in-memory and does batch writes to the TSDB based on batch size or time to optimize write performance
bull At-least once message delivery semanticsbull No metrics or alarm state transition events are lostbull The Kafka consumer offset for each batch is only updated after successfully storing
the metric or alarm state transition eventbull Note duplicates are possible
bull HAfault-tolerancebull Multiple persisters run simultaneously and balance loadbull If a persister fails the load is automatically re-balanced across the remaining
persisters
Time Series Databases
bull Used for storingbull Metricsbull Alarm state history
bull Two databases supported1 Vertica
bull Enterprise class proprietary closed-source clustered HA analytics databasebull Excels at time-series
2 InfluxDBbull Open-source single-node time-series DBbull Clustering is closed-sourcebull Note can replicate to multiple instances of InfluxDB using Kafka
bull Investigating support for additional databases
Config Database
bull Stores all transactional data for Monasca such asbull Alarm Definitions
bull Alarms
bull Notification Methods
bull MySQL and Postgres supported
bull Typically deployed in a clustered or HA configuration
Threshold Engine
bull Near real-time stream processing clustered and highly available threshold engine
bull Based on Apache Storm
bull Consumes metrics from Kafka
bull Creates alarms based on metrics that match patterns specified in the alarm definition
bull Evaluates whether metrics exceed threshold
bull Publishes alarm state transition events to Kafka
bull Supports both simple and compound alarm expressions
Notification Engine
bull Consumes alarm state transition events from Kafka produced by the Threshold Engine
bull Evaluates whether notifications should be sent based on actions specified in the alarm definition
bull OK ALARM and UNDETERMINED actions
bull Supports email PagerDuty webhooks HipChat Slack and JIRAbull Dynamic plugins supportedbull Supports both one-shot and periodic notificationsbull If sending to the notification address fails then notification is published to
retry topic in Kafka and retried laterbull Grouping notifications In progress
Kafka Message Schema
bull JSON messages publishedconsumed tofrom Kafka by Monasca micro-services
bull Well-defined schema is published atbull httpswikiopenstackorgwikiMonascaMessage_Schema
Metrics
Create query and get statistics for metrics
bull GET POST v20metrics
bull GET v20metricsnamesbull Returns the unique metric names
bull GET v20metricsdimensionnamesbull Returns the unique dimension names
bull GET v20metricsdimensionnamesvaluesbull Returns the unique dimension values
Measurements
GET v20metricsmeasurements
bull Returns a list of measurements
bull Query parametersbull Name and dimensions to filter by
bull Start_time and end_time
bull Offset and limit
bull merge_metrics allow multiple metrics to be combined into a single list of measurements
bull group_by list of columns to group the metrics to be returned Allows multiple unique metrics to be returned in a single query
Statistics
GET v20metricsstatistics
bull Query parametersbull Name and dimensions to filter bybull Start_time and end_timebull Statistics avg min max sum and countbull Period The time period to aggregate measurements bybull Offset limitbull merge_metrics allow multiple metrics to be combined into a single list
of statisticsbull group_by list of columns to group the metrics to be returned Allows
multiple unique metrics to be returned in a single query
Metrics Names
GET /v2.0/metrics/names
• Returns a list of the unique metric names
• Query parameters
  • dimensions
  • offset, limit
Metric Dimension Names
GET /v2.0/metrics/dimensions/names
• Lists the dimension names
• Query parameters
  • metric name
  • offset, limit
Metric Dimension Values
GET /v2.0/metrics/dimensions/names/values
• Lists the dimension values
• Query parameters
  • metric name
  • dimension name
  • offset, limit
Alarm Definitions
POST, GET /v2.0/alarm-definitions
• Alarm definitions are templates that are used to automatically and dynamically create alarms based on matching metric names and dimensions
• One alarm definition can result in zero or more alarms
• Simple grammar for creating compound alarm expressions
  • avg(cpu.user_perc) > 85 or avg(disk.read_ops{device=vda}, 120) > 1000
• Alarm states (OK, ALARM, and UNDETERMINED)
• Actions associated with alarms for state transitions
• User-assigned severity (LOW, MEDIUM, HIGH, CRITICAL)
• Thresholds can be dynamically adjusted via PATCH
• Minimal lifecycle management: alarm_lifecycle_state and link
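A request body for creating an alarm definition might look like the sketch below. The field names (`expression`, `match_by`, `severity`, `alarm_actions`) follow my reading of the Monasca API and should be verified against the specification; the definition name and the notification-method ID are hypothetical placeholders.

```python
import json

SEVERITIES = {"LOW", "MEDIUM", "HIGH", "CRITICAL"}


def alarm_definition(name, expression, severity="LOW",
                     match_by=(), alarm_actions=()):
    """Assemble a JSON-serialisable body for POST /v2.0/alarm-definitions."""
    if severity not in SEVERITIES:
        raise ValueError(f"unknown severity: {severity}")
    return {
        "name": name,
        "expression": expression,
        "severity": severity,
        # One alarm is created per unique value of these dimensions.
        "match_by": list(match_by),
        # IDs of notification methods to invoke on transition to ALARM.
        "alarm_actions": list(alarm_actions),
    }


body = alarm_definition(
    "High CPU or disk reads",                       # hypothetical name
    "avg(cpu.user_perc) > 85 or avg(disk.read_ops{device=vda}, 120) > 1000",
    severity="HIGH",
    match_by=["hostname"],
    alarm_actions=["hypothetical-notification-method-id"],
)
payload = json.dumps(body)
```

Adjusting a threshold later would then amount to a PATCH carrying only the changed `expression`.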
List Alarms
GET /v2.0/alarms
Query parameters
• metric_name: name of metric to filter by
• metric_dimensions
• state: OK, ALARM, or UNDETERMINED
• severity: one or more severities to filter by, separated with |, e.g. severity=LOW|MEDIUM
• state_updated_start_time: the start time in ISO 8601 combined date and time format, in UTC
• offset, limit
• sort_by
Alarms
GET, PUT, PATCH, DELETE /v2.0/alarms/{alarm-id}
• Alarms are created by the Threshold Engine based on matching alarm definitions
• When new nodes or components are deployed, alarms are automatically created
• Alarms are resources within Monasca. They have a resource ID and lifecycle
• By default, three states: OK, ALARM, and UNDETERMINED
  • UNDETERMINED state occurs when metrics are no longer being received
• Deterministic alarms: two states, OK and ALARM
  • Used for systems where metrics are sporadic, e.g. creating metrics when errors in log files occur and no metrics when there aren't any errors
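The difference between default and deterministic alarms comes down to what happens when no metrics arrive. The function below is a simplified sketch of that rule only; it assumes a single "average greater than threshold" check and that a deterministic alarm with no data evaluates to OK, which is my reading of the slide rather than a statement of the Threshold Engine's exact behaviour.

```python
def evaluate_state(values, threshold, deterministic=False):
    """Return the alarm state for one evaluation window of metric values."""
    if not values:
        # No metrics received in the window.
        return "OK" if deterministic else "UNDETERMINED"
    average = sum(values) / len(values)
    return "ALARM" if average > threshold else "OK"
```

For a log-error counter that only emits when errors occur, `deterministic=True` keeps the alarm in OK during quiet periods instead of flipping it to UNDETERMINED.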
Alarm Counts
GET /v2.0/alarms/count
• Query the total number of alarms in the OK, ALARM, or UNDETERMINED state and their severities, grouped by metric dimension, such as OpenStack service, state, and severity
• Used for summary dashboards
Example: Helion Ops Console
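The grouping that a summary dashboard needs can be pictured with a few lines of local code. This is only an illustration of the idea of counting alarms per group; the alarm dictionaries and the `group_by` field names are assumptions, and the real endpoint performs the grouping server-side.

```python
from collections import Counter


def alarm_counts(alarms, group_by=("state", "severity")):
    """Count alarms per unique combination of the group_by fields."""
    return dict(Counter(tuple(alarm[field] for field in group_by)
                        for alarm in alarms))


sample_alarms = [
    {"service": "compute", "state": "ALARM", "severity": "HIGH"},
    {"service": "compute", "state": "OK", "severity": "LOW"},
    {"service": "monitoring", "state": "ALARM", "severity": "HIGH"},
]
by_state_and_severity = alarm_counts(sample_alarms)
by_service = alarm_counts(sample_alarms, group_by=("service",))
```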
Alarm History
GET /v2.0/alarms/state-history
• Lists the alarm state history for alarms
• Query parameters
  • dimensions to filter on
  • start/end timestamp
  • offset, limit
GET /v2.0/alarms/{alarm-id}/state-history
• Lists the alarm state history for a specific alarm
Notification Methods
POST, GET, DELETE /v2.0/notification-methods
Notification methods are associated with actions in alarm definitions
Example
POST /v2.0/notification-methods
    {
      "name": "Name of notification method",
      "type": "EMAIL",
      "address": "john.doe@hp.com"
    }
Monasca Agent
• System metrics (CPU, memory, network, filesystem, …)
• Service metrics
  • MySQL, Kafka, and many others
• Application metrics
  • Built-in Statsd daemon
  • Python monasca-statsd library: adds support for dimensions
• VM system metrics
• Open vSwitch metrics
• Active checks
  • HTTP status checks and response times
  • System up/down checks (ping and ssh)
• Runs any Nagios plugin or check_mk
• Extensible/pluggable: additional services can be easily added
Agent details
• The Agent Forwarder buffers metrics for a short time to increase the size of the HTTP request body (number of metrics) sent to the Monasca API
• The Agent requests an auth token from the Keystone Identity service, which is supplied on all requests
• The Monasca Agent and API cache auth tokens in memory to reduce the round-trip authorization requests to Keystone
• If network connectivity between the Agent and API is lost, the Agent will buffer metrics and send them when connectivity is restored
• Metrics are submitted using an "agent" role, which only allows metrics to be POST'd to the metrics endpoint
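The buffering behaviour of the Forwarder can be sketched as a small class. This is an assumption-laden illustration, not the agent's implementation: the `post` callable stands in for the HTTP POST to the metrics endpoint, a size trigger replaces the agent's time-based one, and `ConnectionError` is used as the signal for lost connectivity.

```python
class Forwarder:
    """Batch metrics and keep them buffered while the API is unreachable."""

    def __init__(self, post, max_batch=100):
        self._post = post            # callable taking a list of metrics
        self._max_batch = max_batch
        self._buffer = []

    @property
    def pending(self):
        return len(self._buffer)

    def submit(self, metric):
        self._buffer.append(metric)
        if len(self._buffer) >= self._max_batch:
            self.flush()

    def flush(self):
        """Send everything buffered; on failure keep it for the next attempt."""
        if not self._buffer:
            return True
        try:
            self._post(list(self._buffer))
        except ConnectionError:
            return False             # connectivity lost: retain the buffer
        self._buffer.clear()
        return True
```

A production forwarder would also cap the buffer size so that a long outage cannot exhaust memory.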
Grafana/Monasca Integration
• Datasource: a datasource that can be added to the Grafana dashboard to enable Monasca
  • https://github.com/openstack/monasca-grafana-datasource
• Keystone authentication
  • https://github.com/twc-openstack/grafana
• Support for Alerting will be added in Grafana 4
Grafana Monasca Data Source
Logging Architecture
Logging API
• POST /v3.0/logs
• Batch log messages in a single HTTP request
• Global, local, and mixed dimensions
  • Similar to dimensions in metrics
• JSON only
• Specification
  • https://github.com/openstack/monasca-log-api/blob/master/docs/monasca-log-api-spec.md
• Queries are not done via the API but via a tenantized version of Kibana
  • https://github.com/FujitsuEnablingSoftwareTechnologyGmbH/fts-keystone
Log Model
    {
      "dimensions": {
        "hostname": "devstack",
        "service": "monitoring",
        "component": "monasca-api"
      },
      "logs": [
        {
          "message": "msg1",
          "dimensions": {
            "service": "compute",
            "component": "nova-api",
            "path": "/var/log/mysql.log"
          }
        },
        {
          "message": "msg2",
          "dimensions": {
            "path": "/var/log/monasca/monasca-api.log"
          }
        }
      ]
    }
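One way to read the mix of global and local dimensions in this model is shown below. The sketch assumes that a log entry's own dimensions override the global ones when both define the same key; that precedence is an assumption and should be confirmed in the log API specification.

```python
def effective_dimensions(payload):
    """Return (message, merged dimensions) for each log entry in a batch."""
    global_dimensions = payload.get("dimensions", {})
    merged = []
    for entry in payload.get("logs", []):
        # Assumed precedence: local dimensions win over global ones.
        dimensions = {**global_dimensions, **entry.get("dimensions", {})}
        merged.append((entry["message"], dimensions))
    return merged


batch = {
    "dimensions": {"hostname": "devstack", "service": "monitoring",
                   "component": "monasca-api"},
    "logs": [
        {"message": "msg1",
         "dimensions": {"service": "compute", "component": "nova-api",
                        "path": "/var/log/mysql.log"}},
        {"message": "msg2",
         "dimensions": {"path": "/var/log/monasca/monasca-api.log"}},
    ],
}
resolved = effective_dimensions(batch)
```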
Log Agents
• Logstash
  • https://github.com/logstash-plugins/logstash-output-monasca_log_api/pull/1
• Beaver
  • https://github.com/python-beaver/python-beaver/pull/406
• Logspout: under investigation
Kibana Integration
• Keystone authentication support for Kibana
• Authentication plugin
  • https://github.com/FujitsuEnablingSoftwareTechnologyGmbH/fts-keystone
• Note: in the process of moving to an official OpenStack repo
Composability: Logging/Metrics
Transform and Analytics Engine
Monasca Transform
• A new micro-service in Monasca that aggregates and transforms metrics
• Currently based on Apache Spark Streaming
• Use cases
  • Object Storage Disk Capacity
  • Object Storage Capacity
  • Compute Host Capacity
  • VM Capacity
  • More to come
• Metrics are aggregated and published every hour
• Currently in deployment in HPE Helion OpenStack 4.0
• OpenStack project/repo
  • https://github.com/openstack/monasca-transform
Monasca Analytics
• A framework that adds data science tools (parsers, algorithms, etc.)
• Features include
  • Algorithmic flow definition, enabling sharing of complex algorithmic recipes
  • Thin orchestration layer that instantiates an execution environment
• Focused on
  • Anomaly detection
  • Reducing alert fatigue via alarm clustering (unsupervised machine learning)
• Example algorithms: One-Class SVM and LiNGAM
• Status: under development
• OpenStack project/repo
  • https://github.com/openstack/monasca-analytics
Distributions & Deployments
• Charter Communications
  • Monasca and Grafana are currently deployed in a production private cloud
  • Monitoring-as-a-Service use cases supported, with Grafana as the visualization dashboard
  • 2 datacenters, 600-700 compute nodes, 1000 VMs, 11000 metrics/sec
• FIWARE Lab
  • http://superuser.openstack.org/articles/monitoring-a-multi-region-cloud-based-on-openstack
• Hewlett Packard Enterprise Cloud System Helion OpenStack
  • Supported and tested up to 65K metrics/sec ingest rates
• Fujitsu
  • FUJITSU Software ServerView Cloud Monitoring Manager
• NEC
  • Planning to include Monasca in Cloud Solution Menus solution
• Others
Statistics: Mitaka/Newton Release
• Organizations: 31
• Contributors: 97
• Commits: 1075
• Reviews: 4080
• Lines of code: 215370
Ecosystem
• Hewlett Packard Enterprise
• Fujitsu
• Charter Communications
• NEC
• Cisco
• Cloudbase Solutions
• SUSE
• SolidFire
• SAP
• Cray Inc.
• FIWARE Lab
• Mirantis
• Broadcom
Containers and Kubernetes
• New Monasca Agent plugins
  • Docker plugin
  • cAdvisor plugin
  • Kubernetes plugin: monitors both the Kubernetes control plane and containers
  • Prometheus client plugin: scrapes apps
  • Mesos plugin
• Containerization of Monasca
• Heapster Monasca data sink
Next Steps
• Containerizing Monasca
• Monitoring containers and container managers such as Kubernetes
• Grouping notifications
Push vs Pull
• Monitoring-as-a-Service
  • Can't always pull due to firewalls and network issues
• Low latency: sub-second latency is difficult for a pull model
• Doesn't require service discovery and registration
  • As entities are deployed, they can start sending metrics without having to be discovered or registered
• Events
• Temporary caching/buffering of metrics/events while the service is unreachable
Monasca API
• Primary point for pushing metrics and handling queries
• Authenticates all requests against the Keystone identity service
  • Note: auth tokens are cached to reduce the load on Keystone
• Resources: Metrics, Alarm Definitions, Alarms, and Notification Methods
• API specification
  • https://github.com/openstack/monasca-api/tree/master/docs
• Horizontally scalable
• Publishes metrics to Kafka
• Queries the time-series DB for measurements and statistics
• Queries the Config DB for alarms, alarm definitions, and notification methods
Persister
• Consumes both metrics and alarm state transition events from Kafka
• Stores temporarily in memory and does batch writes to the TSDB, based on batch size or time, to optimize write performance
• At-least-once message delivery semantics
  • No metrics or alarm state transition events are lost
  • The Kafka consumer offset for each batch is only updated after successfully storing the metric or alarm state transition event
  • Note: duplicates are possible
• HA/fault tolerance
  • Multiple persisters run simultaneously and balance load
  • If a persister fails, the load is automatically re-balanced across the remaining persisters
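The at-least-once pattern on this slide (write the batch first, commit the offset second) can be sketched as below. It is a simplified illustration, not the Persister's code: `write_batch` and `commit_offset` are hypothetical callables standing in for the TSDB writer and the Kafka offset commit, and the time-based flush is left out for brevity.

```python
class Persister:
    """Batch messages and commit the consumer offset only after a successful write."""

    def __init__(self, write_batch, commit_offset, batch_size=1000):
        self._write_batch = write_batch      # may raise if the TSDB is unavailable
        self._commit_offset = commit_offset
        self._batch_size = batch_size
        self._batch = []
        self._last_offset = None

    def on_message(self, offset, message):
        self._batch.append(message)
        self._last_offset = offset
        if len(self._batch) >= self._batch_size:
            self.flush()

    def flush(self):
        if not self._batch:
            return
        # If this raises, the offset is not committed and the batch is retained,
        # so nothing is lost; a replay after a crash may produce duplicates.
        self._write_batch(list(self._batch))
        self._commit_offset(self._last_offset)
        self._batch.clear()
```

Committing before the write would flip the guarantee to at-most-once, which is why the order of the two calls in `flush` matters.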
Time Series Databases
• Used for storing
  • Metrics
  • Alarm state history
• Two databases supported
  1. Vertica
    • Enterprise-class, proprietary, closed-source, clustered HA analytics database
    • Excels at time-series
  2. InfluxDB
    • Open-source single-node time-series DB
    • Clustering is closed-source
    • Note: can replicate to multiple instances of InfluxDB using Kafka
• Investigating support for additional databases
Config Database
• Stores all transactional data for Monasca, such as
  • Alarm Definitions
  • Alarms
  • Notification Methods
• MySQL and Postgres supported
• Typically deployed in a clustered or HA configuration
Threshold Engine
• Near real-time, stream-processing, clustered, and highly available threshold engine
• Based on Apache Storm
• Consumes metrics from Kafka
• Creates alarms based on metrics that match patterns specified in the alarm definition
• Evaluates whether metrics exceed threshold
• Publishes alarm state transition events to Kafka
• Supports both simple and compound alarm expressions
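The distinction between simple and compound expressions can be illustrated with a small evaluator. This is a sketch of the idea only, under the assumption that a simple expression is "function over a window of values, compared with a threshold" and a compound one joins such results with `or`/`and`; it does not parse the alarm grammar and is unrelated to the Storm topology.

```python
import operator
from statistics import mean

FUNCTIONS = {"avg": mean, "min": min, "max": max, "sum": sum, "count": len}
COMPARATORS = {">": operator.gt, ">=": operator.ge,
               "<": operator.lt, "<=": operator.le}


def sub_expression(function, values, comparator, threshold):
    """Evaluate one simple expression, e.g. avg(values) > threshold."""
    return COMPARATORS[comparator](FUNCTIONS[function](values), threshold)


def compound(results, combiner="or"):
    """Join simple-expression results with 'or' or 'and'."""
    return any(results) if combiner == "or" else all(results)


cpu_high = sub_expression("avg", [80, 95], ">", 85)
disk_busy = sub_expression("avg", [500, 700], ">", 1000)
in_alarm = compound([cpu_high, disk_busy], "or")
```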
Monasca API
bull Primary point for pushing metrics and handling queries
bull Authenticates all requests against the Keystone identity servicebull Note auth tokens are cached to reduce the load on Keystone
bull Resources Metrics Alarm Definitions Alarms and Notification Methods
bull API Specificationbull httpsgithubcomopenstackmonasca-apitreemasterdocs
bull Horizontally scalable
bull Publishes metrics to Kafka
bull Queries timeseries DB for measurements and statistics
bull Queries Config DB for alarms alarm definitions and notification methods
Persister
bull Consumes both metrics and alarm state transition events from Kafka
bull Stores temporarily in-memory and does batch writes to the TSDB based on batch size or time to optimize write performance
bull At-least once message delivery semanticsbull No metrics or alarm state transition events are lostbull The Kafka consumer offset for each batch is only updated after successfully storing
the metric or alarm state transition eventbull Note duplicates are possible
bull HAfault-tolerancebull Multiple persisters run simultaneously and balance loadbull If a persister fails the load is automatically re-balanced across the remaining
persisters
Time Series Databases
bull Used for storingbull Metricsbull Alarm state history
bull Two databases supported1 Vertica
bull Enterprise class proprietary closed-source clustered HA analytics databasebull Excels at time-series
2 InfluxDBbull Open-source single-node time-series DBbull Clustering is closed-sourcebull Note can replicate to multiple instances of InfluxDB using Kafka
bull Investigating support for additional databases
Config Database
bull Stores all transactional data for Monasca such asbull Alarm Definitions
bull Alarms
bull Notification Methods
bull MySQL and Postgres supported
bull Typically deployed in a clustered or HA configuration
Threshold Engine
bull Near real-time stream processing clustered and highly available threshold engine
bull Based on Apache Storm
bull Consumes metrics from Kafka
bull Creates alarms based on metrics that match patterns specified in the alarm definition
bull Evaluates whether metrics exceed threshold
bull Publishes alarm state transition events to Kafka
bull Supports both simple and compound alarm expressions
Notification Engine
bull Consumes alarm state transition events from Kafka produced by the Threshold Engine
bull Evaluates whether notifications should be sent based on actions specified in the alarm definition
bull OK ALARM and UNDETERMINED actions
bull Supports email PagerDuty webhooks HipChat Slack and JIRAbull Dynamic plugins supportedbull Supports both one-shot and periodic notificationsbull If sending to the notification address fails then notification is published to
retry topic in Kafka and retried laterbull Grouping notifications In progress
Kafka Message Schema
bull JSON messages publishedconsumed tofrom Kafka by Monasca micro-services
bull Well-defined schema is published atbull httpswikiopenstackorgwikiMonascaMessage_Schema
Metrics
Create query and get statistics for metrics
bull GET POST v20metrics
bull GET v20metricsnamesbull Returns the unique metric names
bull GET v20metricsdimensionnamesbull Returns the unique dimension names
bull GET v20metricsdimensionnamesvaluesbull Returns the unique dimension values
Measurements
GET v20metricsmeasurements
bull Returns a list of measurements
bull Query parametersbull Name and dimensions to filter by
bull Start_time and end_time
bull Offset and limit
bull merge_metrics allow multiple metrics to be combined into a single list of measurements
bull group_by list of columns to group the metrics to be returned Allows multiple unique metrics to be returned in a single query
Statistics
GET v20metricsstatistics
bull Query parametersbull Name and dimensions to filter bybull Start_time and end_timebull Statistics avg min max sum and countbull Period The time period to aggregate measurements bybull Offset limitbull merge_metrics allow multiple metrics to be combined into a single list
of statisticsbull group_by list of columns to group the metrics to be returned Allows
multiple unique metrics to be returned in a single query
Metrics Names
GET v20metricsnames
bull Returns a list of the unique metric names
bull Query parametersbull Dimensions
bull Offset limit
Metric Dimension Names
GET v20metricsdimensionsnames
bull List the dimension names
bull Query parametersbull Metric name
bull Offset limit
Metric Dimension Values
GET v20metricsdimensionsnamesvalues
bull List the dimension values
bull Query parametersbull Metric name
bull Dimension name
bull Offset limit
Alarm Definitions
POST GET v20alarm-definitions
bull Alarm definitions are templates that are used to automatically and dynamically create alarms based on matching metric names and dimensions
bull One alarm definition can result in zero or more alarms
bull Simple grammar for creating compound alarm expressionsbull avg(cpuuser_perc) gt 85 or avg(diskread_opsdevice=vda 120) gt 1000
bull Alarm states (OK ALARM and UNDETERMINED)
bull Actions associated with alarms for state transitions
bull User assigned severity (LOW MEDIUM HIGH CRITICAL)
bull Thresholds can be dynamically adjusted via PATCH
bull Minimal lifecycle management alarm_lifecycle_state and link
List Alarms
GET v20alarmsQuery parametersbull metric_name - Name of metric to filter bybull metric_dimensionsbull State OK ALARM or UNDETERMINEDbull Severity One or more severities to filter by separated with |
ex severity=LOW|MEDIUMbull state_updated_start_time The start time in ISO 8601 combined date and
time format in UTCbull Offset limitbull sort_by
Alarms
GET PUT PATCH DELETE v20alarmsalarm-id
bull Alarms created by the Threshold Engine based on matching alarm definitions
bull When new nodes or components are deployed alarms are automatically created
bull Alarms are resources within Monasca They have a resource ID and lifecycle
bull By default three states OK ALARM and UNDETERMINEDbull UNDETERMINED state occurs when metrics are no longer being received
bull Deterministic alarms two states OK and ALARMbull Used for systems where metrics are sporadic Eg Creating metrics when errors in log
files occur and no metrics when there arent any errors
Alarm Counts
GET v20alarmscount
bull Query the total number of alarms in the OK ALARM or UNDETERMINED state and their severities grouped by metrics dimension such as OpenStack service state and severity
bull Used for summary dashboards
Example Helion Ops Console
Alarm History
GET v20alarmsstate-history
bull Lists the alarm state history for alarms
bull Query Parametersbull Dimensions to filter on
bull Startend timestamp
bull Offset limit
GET v20alarmsalarm-idstate-history
bull Lists the alarm state history for a specific alarm
Notification Methods
POST GET DELETE v20notification-methods
Notification methods are associated with Actions in alarm definitions
Example
POST v20notification-methods
nameName of notification method
typeEMAIL
addressjohndoehpcom
Monasca Agent
bull System metrics (cpu memory network filesystem hellip)
bull Service metricsbull MySQL Kafka and many others
bull Application metricsbull Built-in Statsd daemonbull Python monasca-statsd library Adds support for dimensions
bull VM system metrics
bull Open vSwitch metrics
bull Active checksbull HTTP status checks and response timesbull System updown checks (ping and ssh)
bull Runs any Nagios plugin or check_mk
bull ExtensiblePluggable Additional services can be easily added
Agent details
bull The Agent Forwarder buffers metrics for a short time to increase the size of the http request body (number of metrics) sent to the Monasca API
bull The Agent request an auth token from the Keystone Identity service which is supplied on all requests
bull The Monasca Agent and API caches Monasca Agent and API caches Monasca Agent and API caches auth tokens in-memory to reduce the round-trip authorization requests to Keystone
bull If network connectivity between the Agent and API occurs the Agent will buffer metrics and send when connectivity is restored
bull Metrics are submitted using a ldquoagentrdquo role which only allows metrics to be POSTrsquod to the metrics endpoint
GrafanaMonasca Integration
bull Datasource A datasource that can be added to the Grafana dashboard to enable Monasca
bull httpsgithubcomopenstackmonasca-grafana-datasource
bull Keystone authenticationbull httpsgithubcomtwc-openstackgrafana
bull Support for Alerting will be added in Grafana 4
Grafana Monasca Data Source
Logging Architecture
Logging API
bull POST v30logs
bull Batch log messages in a single http request
bull Global local mixed dimensionsbull Similar to dimensions in metrics
bull JSON only
bull Specificationbull httpsgithubcomopenstackmonasca-log-apiblobmasterdocsmonasca-
log-api-specmd
bull Queries not done via API but via Tenantized version of Kibanabull httpsgithubcomFujitsuEnablingSoftwareTechnologyGmbHfts-keystone
Log Model
bull dimensions
hostnamedevstack
servicemonitoring
componentmonasca-api
logs[
messagemsg1
dimensions
servicecompute
componentnova-api
pathvarlogmysqllog
messagemsg2
dimensions
pathvarlogmonascamonasca-apilog
]
Log Agents
bull Logstashbull httpsgithubcomlogstash-pluginslogstash-output-monasca_log_apipull1
bull Beaverbull httpsgithubcompython-beaverpython-beaverpull406
bull Logspout Under Investigation
Kibana Integration
bull Keystone authentication support for Kibana
bull Authentication pluginbull httpsgithubcomFujitsuEnablingSoftwareTechnologyGmbHfts-keystone
bull Note In progress of moving to official OpenStack repo
Composabilty LoggingMetrics
Transform and Analytics Engine
Monasca Transform
bull A new micro-service in Monasca that aggregates and transforms metrics
bull Currently based on Apache Spark Streaming
bull Use Casesbull Object Storage Disk Capacity
bull Object Storage Capacity
bull Compute Host Capacity
bull VM Capacity
bull More to come
bull Metrics are aggregated and published every hour
bull Currently in deployment in HPE Helion OpenStack 40
bull OpenStack projectrepobull httpsgithubcomopenstackmonasca-transform
Monasca Analytics
bull A framework that adds data science tools (parsers algorithms etc)
bull Features include bull Algorithmic flow definition enabling sharing of complex algorithmic recipes
bull Thin orchestration layer that instantiates an execution environment
bull Focused onbull Anomaly detection
bull Reducing alert fatigue via alarm clustering (unsupervised machine learning)
bull Example algorithms One Class SVM and LiNGAM
bull Status Under Development
bull OpenStack projectrepobull httpsgithubcomopenstackmonasca-analytics
Distributions amp Deployments
bull Charter Communicationsbull Monasca and Grafana is currently deployed in production private cloudbull Monitoring-as-a-Service Use cases supported with Grafana as the Visualization
Dashboardbull 2 datacenters 600-700 compute nodes 1000 VMs 11000 metricssec
bull FIWARE Labbull httpsuperuseropenstackorgarticlesmonitoring-a-multi-region-cloud-based-on-openstack
bull Hewlett Packard Enterprise Cloud System Helion OpenStackbull Supported and tested up to 65K metricssec injest rates
bull Fujitsubull FUJITSU Software ServerView Cloud Monitoring Manager
bull NECbull Planning to include Monasca in Cloud Solution Menus solution
bull Others
Statistics MitakaNewton Release
bull Organizations
bull Contributors
bull Commits
bull Reviews
bull Lines of code
31
97
1075
4080
215370
Ecosystem
• Hewlett Packard Enterprise
• Fujitsu
• Charter Communications
• NEC
• Cisco
• Cloudbase Solutions
• SUSE
• SolidFire
• SAP
• Cray Inc.
• FIWARE Lab
• Mirantis
• Broadcom
Containers and Kubernetes
• New Monasca Agent plugins
  • Docker plugin
  • cAdvisor plugin
  • Kubernetes plugin: monitors both the Kubernetes control plane and containers
  • Prometheus client plugin: scrapes apps
  • Mesos plugin
• Containerization of Monasca
• Heapster Monasca data sink
Next Steps
• Containerizing Monasca
• Monitoring containers and container managers such as Kubernetes
• Grouping notifications
Persister
• Consumes both metrics and alarm state transition events from Kafka
• Stores them temporarily in memory and does batch writes to the TSDB, based on batch size or time, to optimize write performance
• At-least-once message delivery semantics
  • No metrics or alarm state transition events are lost
  • The Kafka consumer offset for each batch is only updated after successfully storing the metric or alarm state transition event
  • Note: duplicates are possible
• HA/fault-tolerance
  • Multiple persisters run simultaneously and balance load
  • If a persister fails, the load is automatically re-balanced across the remaining persisters
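The batching and offset handling described on this slide can be sketched in a few lines. This is a minimal illustration, not the Persister's actual code: the class name, the `write_batch` callback and the integer offsets are all assumptions, and a real consumer would also flush from a timer rather than only when a new message arrives.

```python
import time
from typing import Callable, List, Optional, Tuple

Message = Tuple[int, dict]  # (Kafka-style offset, decoded payload)


class BatchingPersister:
    """Buffers messages and flushes them by size or age (illustrative only)."""

    def __init__(self, write_batch: Callable[[List[dict]], None],
                 max_batch: int = 3, max_age_s: float = 5.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.write_batch = write_batch      # hypothetical TSDB writer
        self.max_batch = max_batch
        self.max_age_s = max_age_s
        self.clock = clock
        self.buffer: List[Message] = []
        self.first_buffered_at: Optional[float] = None
        self.committed_offset = -1          # last offset known to be stored

    def handle(self, offset: int, payload: dict) -> None:
        if not self.buffer:
            self.first_buffered_at = self.clock()
        self.buffer.append((offset, payload))
        if self._should_flush():
            self.flush()

    def _should_flush(self) -> bool:
        too_big = len(self.buffer) >= self.max_batch
        too_old = (self.first_buffered_at is not None and
                   self.clock() - self.first_buffered_at >= self.max_age_s)
        return too_big or too_old

    def flush(self) -> None:
        if not self.buffer:
            return
        # Write first. If this raises, the offset is not advanced, so the
        # same messages are consumed again later (hence possible duplicates).
        self.write_batch([payload for _, payload in self.buffer])
        self.committed_offset = self.buffer[-1][0]
        self.buffer = []
        self.first_buffered_at = None


stored: List[dict] = []
persister = BatchingPersister(write_batch=stored.extend, max_batch=3)
for offset in range(4):
    persister.handle(offset, {"name": "cpu.user_perc", "value": float(offset)})
```

After the loop, three messages have been written as one batch and the fourth is still buffered. Because the offset only moves after a successful write, a crash between the write and the offset update replays the batch, which is where the duplicates come from.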
Time Series Databases
• Used for storing
  • Metrics
  • Alarm state history
• Two databases supported
  1. Vertica
     • Enterprise-class, proprietary, closed-source, clustered, HA analytics database
     • Excels at time-series
  2. InfluxDB
     • Open-source, single-node time-series DB
     • Clustering is closed-source
     • Note: can replicate to multiple instances of InfluxDB using Kafka
• Investigating support for additional databases
Config Database
• Stores all transactional data for Monasca, such as
  • Alarm Definitions
  • Alarms
  • Notification Methods
• MySQL and Postgres supported
• Typically deployed in a clustered or HA configuration
Threshold Engine
• Near real-time, stream-processing, clustered and highly available threshold engine
• Based on Apache Storm
• Consumes metrics from Kafka
• Creates alarms based on metrics that match patterns specified in the alarm definition
• Evaluates whether metrics exceed the threshold
• Publishes alarm state transition events to Kafka
• Supports both simple and compound alarm expressions
Notification Engine
• Consumes alarm state transition events from Kafka produced by the Threshold Engine
• Evaluates whether notifications should be sent, based on actions specified in the alarm definition
• OK, ALARM and UNDETERMINED actions
• Supports email, PagerDuty, webhooks, HipChat, Slack and JIRA
  • Dynamic plugins supported
  • Supports both one-shot and periodic notifications
  • If sending to the notification address fails, the notification is published to a retry topic in Kafka and retried later
  • Grouping notifications: in progress
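The retry behaviour can be illustrated with an in-memory queue standing in for the Kafka retry topic. This is a sketch under assumptions: the `retry_count` field, the `MAX_RETRIES` limit and the `dead` list are hypothetical and are not taken from the talk.

```python
from collections import deque
from typing import Callable, Deque, Dict, List

MAX_RETRIES = 3  # assumed limit, not taken from the talk


def process(notification: Dict, send: Callable[[Dict], bool],
            retry_queue: Deque[Dict], dead: List[Dict]) -> bool:
    """Try to send once; on failure, queue the notification for a later retry."""
    if send(notification):
        return True
    attempts = notification.get("retry_count", 0) + 1
    if attempts > MAX_RETRIES:
        dead.append(notification)          # give up after too many attempts
    else:
        retry_queue.append({**notification, "retry_count": attempts})
    return False


calls = {"n": 0}


def flaky_send(notification: Dict) -> bool:
    """Stand-in for an email or webhook sender: fails twice, then succeeds."""
    calls["n"] += 1
    return calls["n"] >= 3


retry_queue: Deque[Dict] = deque()   # stands in for the Kafka retry topic
dead: List[Dict] = []
delivered = process({"alarm_id": "a1", "state": "ALARM"},
                    flaky_send, retry_queue, dead)
while not delivered and retry_queue:
    delivered = process(retry_queue.popleft(), flaky_send, retry_queue, dead)
```

Keeping the retry path on a separate queue means a slow or failing notification address does not block delivery of new alarm transitions.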
Kafka Message Schema
• JSON messages published/consumed to/from Kafka by Monasca micro-services
• Well-defined schema is published at
  • https://wiki.openstack.org/wiki/Monasca/Message_Schema
Metrics
Create, query and get statistics for metrics
• GET, POST /v2.0/metrics
• GET /v2.0/metrics/names
  • Returns the unique metric names
• GET /v2.0/metrics/dimensions/names
  • Returns the unique dimension names
• GET /v2.0/metrics/dimensions/names/values
  • Returns the unique dimension values
Measurements
GET /v2.0/metrics/measurements
• Returns a list of measurements
• Query parameters
  • Name and dimensions to filter by
  • start_time and end_time
  • offset and limit
  • merge_metrics: allows multiple metrics to be combined into a single list of measurements
  • group_by: list of columns to group the returned metrics by. Allows multiple unique metrics to be returned in a single query
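A measurements query built from these parameters might look like the sketch below. The host name and port are hypothetical, and the `key:value,key:value` encoding of dimensions is an assumption about the API that should be checked against the Monasca API specification.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

BASE_URL = "http://monasca.example.com:8070"  # hypothetical endpoint


def measurements_url(name, start_time, dimensions=None, end_time=None,
                     limit=None, merge_metrics=False, group_by=None):
    """Build a GET /v2.0/metrics/measurements URL from the slide's parameters."""
    params = {"name": name, "start_time": start_time}
    if dimensions:
        # Assumed encoding: comma-separated key:value pairs.
        params["dimensions"] = ",".join(
            f"{key}:{value}" for key, value in sorted(dimensions.items()))
    if end_time:
        params["end_time"] = end_time
    if limit is not None:
        params["limit"] = limit
    if merge_metrics:
        params["merge_metrics"] = "true"
    if group_by:
        params["group_by"] = ",".join(group_by)
    return f"{BASE_URL}/v2.0/metrics/measurements?{urlencode(params)}"


url = measurements_url(
    "cpu.user_perc",
    "2016-10-01T00:00:00Z",
    dimensions={"service": "monitoring", "hostname": "devstack"},
    limit=100,
    merge_metrics=True,
)
query = parse_qs(urlsplit(url).query)  # decoded view of the query string
print(url)
```

A real request would also carry a Keystone token; that part is left out here to keep the example free of network calls.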
Statistics
GET /v2.0/metrics/statistics
• Query parameters
  • Name and dimensions to filter by
  • start_time and end_time
  • statistics: avg, min, max, sum and count
  • period: the time period to aggregate measurements by
  • offset, limit
  • merge_metrics: allows multiple metrics to be combined into a single list of statistics
  • group_by: list of columns to group the returned metrics by. Allows multiple unique metrics to be returned in a single query
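To make the `period` and `statistics` parameters concrete, here is a small local sketch of period-based aggregation. It only illustrates the idea: the function name and the bucket alignment (to multiples of the period) are assumptions, and the service may align buckets differently.

```python
from collections import defaultdict
from statistics import mean

FUNCTIONS = {"avg": mean, "min": min, "max": max, "sum": sum, "count": len}


def aggregate(measurements, period, statistics=("avg", "min", "max", "sum", "count")):
    """Group (epoch_seconds, value) pairs into period-sized buckets.

    Returns one row per bucket: [bucket_start, stat1, stat2, ...].
    """
    buckets = defaultdict(list)
    for timestamp, value in measurements:
        buckets[timestamp - timestamp % period].append(value)
    return [
        [start] + [FUNCTIONS[name](values) for name in statistics]
        for start, values in sorted(buckets.items())
    ]


measurements = [(0, 10.0), (30, 20.0), (60, 40.0), (90, 50.0), (130, 5.0)]
rows = aggregate(measurements, period=60, statistics=("avg", "count"))
for row in rows:
    print(row)
```

With a 60-second period the five measurements fall into three buckets, so the caller receives three rows instead of five raw points.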
Metrics Names
GET /v2.0/metrics/names
• Returns a list of the unique metric names
• Query parameters
  • Dimensions
  • offset, limit
Metric Dimension Names
GET /v2.0/metrics/dimensions/names
• Lists the dimension names
• Query parameters
  • Metric name
  • offset, limit
Metric Dimension Values
GET /v2.0/metrics/dimensions/names/values
• Lists the dimension values
• Query parameters
  • Metric name
  • Dimension name
  • offset, limit
Alarm Definitions
POST, GET /v2.0/alarm-definitions
• Alarm definitions are templates that are used to automatically and dynamically create alarms based on matching metric names and dimensions
• One alarm definition can result in zero or more alarms
• Simple grammar for creating compound alarm expressions
  • avg(cpu.user_perc) > 85 or avg(disk.read_ops{device=vda}, 120) > 1000
• Alarm states (OK, ALARM and UNDETERMINED)
• Actions associated with alarms for state transitions
• User-assigned severity (LOW, MEDIUM, HIGH, CRITICAL)
• Thresholds can be dynamically adjusted via PATCH
• Minimal lifecycle management: alarm_lifecycle_state and link
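A request body for creating an alarm definition could be assembled as follows. This is a sketch: the field names (`name`, `expression`, `severity`, `match_by`, `alarm_actions`) reflect my reading of the Monasca API and should be verified against the API specification, and the notification method ID is a placeholder.

```python
import json

SEVERITIES = ("LOW", "MEDIUM", "HIGH", "CRITICAL")


def alarm_definition(name, expression, severity="LOW",
                     match_by=(), alarm_actions=()):
    """Return a dict suitable for POST /v2.0/alarm-definitions (assumed fields)."""
    if severity not in SEVERITIES:
        raise ValueError(f"severity must be one of {SEVERITIES}")
    return {
        "name": name,
        "expression": expression,
        "severity": severity,
        # One alarm is created per unique value of these dimensions.
        "match_by": list(match_by),
        # IDs of notification methods to invoke on transition to ALARM.
        "alarm_actions": list(alarm_actions),
    }


body = alarm_definition(
    "High CPU or disk reads",
    "avg(cpu.user_perc) > 85 or avg(disk.read_ops{device=vda}, 120) > 1000",
    severity="HIGH",
    match_by=["hostname"],
    alarm_actions=["<notification-method-id>"],  # placeholder, not a real ID
)
payload = json.dumps(body, indent=2)
print(payload)
```

Adjusting a threshold later would then be a PATCH carrying only the changed `expression`, which is what makes the definitions cheap to tune.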
List Alarms
GET /v2.0/alarms
Query parameters
• metric_name: name of metric to filter by
• metric_dimensions
• state: OK, ALARM or UNDETERMINED
• severity: one or more severities to filter by, separated with |, e.g. severity=LOW|MEDIUM
• state_updated_start_time: the start time in ISO 8601 combined date and time format in UTC
• offset, limit
• sort_by
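The `|` separator and the ISO 8601 timestamp both need URL-encoding when placed in a query string. The helper below is a hypothetical sketch that shows the encoded form; only the parameter names come from the slide.

```python
from urllib.parse import urlencode

STATES = ("OK", "ALARM", "UNDETERMINED")


def list_alarms_query(metric_name=None, state=None, severities=(),
                      state_updated_start_time=None, limit=None):
    """Build the path and query string for GET /v2.0/alarms."""
    params = {}
    if metric_name:
        params["metric_name"] = metric_name
    if state:
        if state not in STATES:
            raise ValueError(f"state must be one of {STATES}")
        params["state"] = state
    if severities:
        params["severity"] = "|".join(severities)  # e.g. LOW|MEDIUM
    if state_updated_start_time:
        params["state_updated_start_time"] = state_updated_start_time
    if limit is not None:
        params["limit"] = limit
    return "/v2.0/alarms?" + urlencode(params)


alarms_query = list_alarms_query(
    state="ALARM",
    severities=["LOW", "MEDIUM"],
    state_updated_start_time="2016-10-01T00:00:00Z",
)
print(alarms_query)
```

`urlencode` percent-encodes the pipe as `%7C` and the colons as `%3A`, so the server still receives `LOW|MEDIUM` after decoding.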
Alarms
GET, PUT, PATCH, DELETE /v2.0/alarms/{alarm-id}
• Alarms are created by the Threshold Engine based on matching alarm definitions
• When new nodes or components are deployed, alarms are automatically created
• Alarms are resources within Monasca. They have a resource ID and lifecycle
• By default, three states: OK, ALARM and UNDETERMINED
  • UNDETERMINED state occurs when metrics are no longer being received
• Deterministic alarms: two states, OK and ALARM
  • Used for systems where metrics are sporadic, e.g. creating metrics when errors occur in log files and no metrics when there aren't any errors
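The difference between default and deterministic alarms comes down to how an empty evaluation window is treated. The function below is a simplified sketch for a single `avg(...) > threshold` expression; treating "no measurements" as OK for deterministic alarms is my reading of the slide, not a statement about the Threshold Engine's code.

```python
from typing import Sequence


def alarm_state(values: Sequence[float], threshold: float,
                deterministic: bool = False) -> str:
    """Evaluate avg(values) > threshold for one evaluation window."""
    if not values:
        # No metrics received in the window.
        return "OK" if deterministic else "UNDETERMINED"
    average = sum(values) / len(values)
    return "ALARM" if average > threshold else "OK"


# A regularly reported metric: silence is suspicious.
cpu_silent = alarm_state([], threshold=85)
cpu_busy = alarm_state([90.0, 95.0], threshold=85)

# A sporadic metric such as a log error count: silence means "no errors".
log_errors_silent = alarm_state([], threshold=0, deterministic=True)
log_errors_seen = alarm_state([3.0], threshold=0, deterministic=True)
```

For the log error case a non-deterministic alarm would sit in UNDETERMINED most of the time, which is why the two-state variant exists.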
Alarm Counts
GET /v2.0/alarms/count
• Query the total number of alarms in the OK, ALARM or UNDETERMINED state and their severities, grouped by metric dimension (such as OpenStack service), state and severity
• Used for summary dashboards
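The grouping that a summary dashboard relies on can be reproduced locally with a counter. The sample alarms and the flat record layout are made up for illustration; the real endpoint returns counts computed by the service.

```python
from collections import Counter

# Hypothetical, flattened alarm records for illustration only.
alarms = [
    {"service": "compute", "state": "ALARM", "severity": "HIGH"},
    {"service": "compute", "state": "ALARM", "severity": "HIGH"},
    {"service": "compute", "state": "OK", "severity": "LOW"},
    {"service": "monitoring", "state": "UNDETERMINED", "severity": "MEDIUM"},
]


def count_alarms(alarms, group_by=("state", "severity")):
    """Count alarms per unique combination of the group_by fields."""
    return Counter(tuple(alarm[field] for field in group_by) for alarm in alarms)


counts = count_alarms(alarms, group_by=("service", "state", "severity"))
for key, total in sorted(counts.items()):
    print(key, total)
```

Doing the grouping server-side keeps dashboard requests small even when the number of alarms is large.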
Example: Helion Ops Console
Alarm History
GET /v2.0/alarms/state-history
• Lists the alarm state history for alarms
• Query parameters
  • Dimensions to filter on
  • Start/end timestamp
  • offset, limit
GET /v2.0/alarms/{alarm-id}/state-history
• Lists the alarm state history for a specific alarm
Notification Methods
POST, GET, DELETE /v2.0/notification-methods
Notification methods are associated with actions in alarm definitions
Example:
POST /v2.0/notification-methods
{
  "name": "Name of notification method",
  "type": "EMAIL",
  "address": "john.doe@hp.com"
}
Monasca Agent
• System metrics (cpu, memory, network, filesystem, …)
• Service metrics
  • MySQL, Kafka and many others
• Application metrics
  • Built-in Statsd daemon
  • Python monasca-statsd library: adds support for dimensions
• VM system metrics
• Open vSwitch metrics
• Active checks
  • HTTP status checks and response times
  • System up/down checks (ping and ssh)
• Runs any Nagios plugin or check_mk
• Extensible/pluggable: additional services can be easily added
Agent details
• The Agent Forwarder buffers metrics for a short time to increase the size of the HTTP request body (number of metrics) sent to the Monasca API
• The Agent requests an auth token from the Keystone Identity service, which is supplied on all requests
• The Monasca Agent and API cache auth tokens in memory to reduce the round-trip authorization requests to Keystone
• If network connectivity between the Agent and API is lost, the Agent will buffer metrics and send them when connectivity is restored
• Metrics are submitted using an "agent" role, which only allows metrics to be POSTed to the metrics endpoint
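In-memory token caching of the kind described here can be sketched as below. The class, the fixed time-to-live and the fake Keystone call are assumptions made for illustration; a real client would use the expiry time returned with the token.

```python
import time
from typing import Callable, List, Optional


class TokenCache:
    """Caches an auth token in memory until it is considered expired."""

    def __init__(self, fetch_token: Callable[[], str], ttl_s: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.fetch_token = fetch_token   # hypothetical call to Keystone
        self.ttl_s = ttl_s               # assumed lifetime, not from the talk
        self.clock = clock
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def get(self) -> str:
        now = self.clock()
        if self._token is None or now >= self._expires_at:
            # Only here do we pay for a round trip to the identity service.
            self._token = self.fetch_token()
            self._expires_at = now + self.ttl_s
        return self._token


fetches: List[int] = []


def fake_keystone() -> str:
    """Stand-in for a Keystone token request; counts how often it is called."""
    fetches.append(1)
    return f"token-{len(fetches)}"


now = [0.0]  # controllable clock so the example runs instantly
cache = TokenCache(fake_keystone, ttl_s=60.0, clock=lambda: now[0])
first = cache.get()
second = cache.get()   # served from memory
now[0] = 61.0
third = cache.get()    # expired, fetched again
```

Three calls to `get` result in only two token requests, which is the saving the slide refers to when many metric batches are posted per minute.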
Grafana/Monasca Integration
• Datasource: a datasource that can be added to the Grafana dashboard to enable Monasca
  • https://github.com/openstack/monasca-grafana-datasource
• Keystone authentication
  • https://github.com/twc-openstack/grafana
• Support for Alerting will be added in Grafana 4
Grafana Monasca Data Source
Logging Architecture
Logging API
• POST /v3.0/logs
• Batch log messages in a single HTTP request
• Global, local and mixed dimensions
  • Similar to dimensions in metrics
• JSON only
• Specification
  • https://github.com/openstack/monasca-log-api/blob/master/docs/monasca-log-api-spec.md
• Queries are not done via the API, but via a tenantized version of Kibana
  • https://github.com/FujitsuEnablingSoftwareTechnologyGmbH/fts-keystone
Log Model
{
  "dimensions": {
    "hostname": "devstack",
    "service": "monitoring",
    "component": "monasca-api"
  },
  "logs": [
    {
      "message": "msg1",
      "dimensions": {
        "service": "compute",
        "component": "nova-api",
        "path": "/var/log/mysql.log"
      }
    },
    {
      "message": "msg2",
      "dimensions": {
        "path": "/var/log/monasca/monasca-api.log"
      }
    }
  ]
}
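How global and local dimensions combine for each log entry can be shown with a short merge function. This is a sketch: it assumes that a local dimension overrides a global one with the same key, which should be confirmed against the log API specification, and the function name is hypothetical.

```python
def effective_dimensions(envelope):
    """Return each log entry with global and local dimensions merged.

    Assumption: a local dimension wins over a global one with the same key.
    """
    global_dims = envelope.get("dimensions", {})
    return [
        {
            "message": entry["message"],
            "dimensions": {**global_dims, **entry.get("dimensions", {})},
        }
        for entry in envelope["logs"]
    ]


envelope = {
    "dimensions": {"hostname": "devstack", "service": "monitoring",
                   "component": "monasca-api"},
    "logs": [
        {"message": "msg1",
         "dimensions": {"service": "compute", "component": "nova-api",
                        "path": "/var/log/mysql.log"}},
        {"message": "msg2",
         "dimensions": {"path": "/var/log/monasca/monasca-api.log"}},
    ],
}

merged = effective_dimensions(envelope)
for entry in merged:
    print(entry["message"], entry["dimensions"])
```

Under that assumption the first entry ends up tagged as `compute`/`nova-api` while the second inherits `monitoring`/`monasca-api`, and both keep the shared `hostname`.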
Log Agents
• Logstash
  • https://github.com/logstash-plugins/logstash-output-monasca_log_api/pull/1
• Beaver
  • https://github.com/python-beaver/python-beaver/pull/406
• Logspout: under investigation
Kibana Integration
• Keystone authentication support for Kibana
• Authentication plugin
  • https://github.com/FujitsuEnablingSoftwareTechnologyGmbH/fts-keystone
• Note: in the process of moving to the official OpenStack repo
Composability: Logging/Metrics
Transform and Analytics Engine
Monasca Transform
bull A new micro-service in Monasca that aggregates and transforms metrics
bull Currently based on Apache Spark Streaming
bull Use Casesbull Object Storage Disk Capacity
bull Object Storage Capacity
bull Compute Host Capacity
bull VM Capacity
bull More to come
bull Metrics are aggregated and published every hour
bull Currently in deployment in HPE Helion OpenStack 40
bull OpenStack projectrepobull httpsgithubcomopenstackmonasca-transform
Monasca Analytics
bull A framework that adds data science tools (parsers algorithms etc)
bull Features include bull Algorithmic flow definition enabling sharing of complex algorithmic recipes
bull Thin orchestration layer that instantiates an execution environment
bull Focused onbull Anomaly detection
bull Reducing alert fatigue via alarm clustering (unsupervised machine learning)
bull Example algorithms One Class SVM and LiNGAM
bull Status Under Development
bull OpenStack projectrepobull httpsgithubcomopenstackmonasca-analytics
Distributions & Deployments
• Charter Communications
  • Monasca and Grafana are currently deployed in a production private cloud
  • Monitoring-as-a-Service use cases are supported, with Grafana as the visualization dashboard
  • 2 datacenters, 600-700 compute nodes, 1000 VMs, 11000 metrics/sec
• FIWARE Lab
  • http://superuser.openstack.org/articles/monitoring-a-multi-region-cloud-based-on-openstack
• Hewlett Packard Enterprise: Cloud System, Helion OpenStack
  • Supported and tested at ingest rates of up to 65K metrics/sec
• Fujitsu
  • FUJITSU Software ServerView Cloud Monitoring Manager
• NEC
  • Planning to include Monasca in the Cloud Solution Menus solution
• Others
Statistics: Mitaka/Newton Release
• Organizations: 31
• Contributors: 97
• Commits: 1075
• Reviews: 4080
• Lines of code: 215370
Ecosystem
• Hewlett Packard Enterprise
• Fujitsu
• Charter Communications
• NEC
• Cisco
• Cloudbase Solutions
• SUSE
• SolidFire
• SAP
• Cray Inc
• FIWARE Lab
• Mirantis
• Broadcom
Containers and Kubernetes
• New Monasca Agent plugins
  • Docker plugin
  • cAdvisor plugin
  • Kubernetes plugin: monitors both the Kubernetes control plane and containers
  • Prometheus client plugin: scrapes apps
  • Mesos plugin
• Containerization of Monasca
• Heapster Monasca data sink
Next Steps
• Containerizing Monasca
• Monitoring containers and container managers such as Kubernetes
• Grouping notifications
Threshold Engine
• Near real-time, stream-processing, clustered and highly available threshold engine
• Based on Apache Storm
• Consumes metrics from Kafka
• Creates alarms based on metrics that match the patterns specified in the alarm definition
• Evaluates whether metrics exceed their thresholds
• Publishes alarm state transition events to Kafka
• Supports both simple and compound alarm expressions
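The evaluation of simple and compound expressions can be sketched in a few lines of Python. This is only an illustration of the idea, not the Storm-based implementation: the function names `exceeds`, `evaluate_or` and `evaluate_and` are hypothetical, and the sample values are made up.

```python
import operator
from statistics import mean

# Comparison operators that a threshold expression may use.
OPS = {">": operator.gt, ">=": operator.ge, "<": operator.lt, "<=": operator.le}


def exceeds(values, op, threshold, func=mean):
    """Aggregate a window of values with func and compare it to the threshold."""
    return OPS[op](func(values), threshold)


def evaluate_or(*sub_results):
    """A compound `or` expression fires if any sub-expression fires."""
    return any(sub_results)


def evaluate_and(*sub_results):
    """A compound `and` expression fires only if every sub-expression fires."""
    return all(sub_results)


cpu_window = [80.0, 92.0, 95.0]   # hypothetical CPU percentages
disk_window = [500.0, 700.0]      # hypothetical disk read operations

cpu_high = exceeds(cpu_window, ">", 85)
disk_high = exceeds(disk_window, ">", 1000)
fires = evaluate_or(cpu_high, disk_high)
```

Here the CPU average is 89, so the `or` expression fires even though the disk sub-expression does not.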
Notification Engine
• Consumes alarm state transition events from Kafka produced by the Threshold Engine
• Evaluates whether notifications should be sent, based on the actions specified in the alarm definition
  • OK, ALARM and UNDETERMINED actions
• Supports email, PagerDuty, webhooks, HipChat, Slack and JIRA
  • Dynamic plugins supported
• Supports both one-shot and periodic notifications
• If sending to the notification address fails, the notification is published to a retry topic in Kafka and retried later
• Grouping notifications: in progress
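The retry behaviour can be illustrated with a small in-memory sketch. A `deque` stands in for the Kafka retry topic, and `SendError`, `dispatch` and `drain_retries` are hypothetical names; the real engine's retry limits and timing are not described on the slide and are not modelled here.

```python
from collections import deque


class SendError(Exception):
    """Raised by a sender when delivery fails (hypothetical)."""


def dispatch(notification, send, retry_topic):
    """Send one notification; on failure, queue it on the retry topic."""
    try:
        send(notification)
    except SendError:
        retry_topic.append(notification)
        return False
    return True


def drain_retries(send, retry_topic):
    """Retry each queued notification once; failures are queued again."""
    for _ in range(len(retry_topic)):
        dispatch(retry_topic.popleft(), send, retry_topic)
```

A sender that fails once and then recovers would see the same notification twice: once on the first attempt and once when the retry queue is drained.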
Kafka Message Schema
• JSON messages are published to and consumed from Kafka by the Monasca micro-services
• A well-defined schema is published at
  • https://wiki.openstack.org/wiki/Monasca/Message_Schema
Metrics
Create, query and get statistics for metrics
• GET, POST /v2.0/metrics
• GET /v2.0/metrics/names
  • Returns the unique metric names
• GET /v2.0/metrics/dimensions/names
  • Returns the unique dimension names
• GET /v2.0/metrics/dimensions/names/values
  • Returns the unique dimension values
Measurements
GET /v2.0/metrics/measurements
• Returns a list of measurements
• Query parameters
  • name and dimensions to filter by
  • start_time and end_time
  • offset and limit
  • merge_metrics: allows multiple metrics to be combined into a single list of measurements
  • group_by: list of columns to group the returned metrics by; allows multiple unique metrics to be returned in a single query
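A measurements query can be assembled as follows. This is a sketch under stated assumptions: the base URL and port are placeholders, the helper `measurements_url` is hypothetical, and the `key:value` comma-separated encoding of `dimensions` should be checked against the Monasca API specification before use.

```python
from urllib.parse import urlencode


def measurements_url(base, name, start_time, dimensions=None, end_time=None,
                     limit=None, merge_metrics=False):
    """Build a GET /v2.0/metrics/measurements URL from the query parameters."""
    params = {"name": name, "start_time": start_time}
    if dimensions:
        # Assumed encoding: key:value pairs joined by commas.
        params["dimensions"] = ",".join(
            f"{key}:{value}" for key, value in sorted(dimensions.items())
        )
    if end_time:
        params["end_time"] = end_time
    if limit is not None:
        params["limit"] = limit
    if merge_metrics:
        params["merge_metrics"] = "true"
    return f"{base}/v2.0/metrics/measurements?{urlencode(params)}"


url = measurements_url(
    "http://localhost:8070",          # placeholder endpoint
    "cpu.user_perc",
    "2016-01-01T00:00:00Z",
    dimensions={"hostname": "devstack", "service": "monitoring"},
    limit=10,
)
```

An authenticated request would additionally carry a Keystone token header, which is omitted here.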
Statistics
GET /v2.0/metrics/statistics
• Query parameters
  • name and dimensions to filter by
  • start_time and end_time
  • statistics: avg, min, max, sum and count
  • period: the time period to aggregate measurements by
  • offset, limit
  • merge_metrics: allows multiple metrics to be combined into a single list of statistics
  • group_by: list of columns to group the returned metrics by; allows multiple unique metrics to be returned in a single query
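What the `statistics` and `period` parameters ask for can be shown with a plain-Python sketch. It illustrates the aggregation only and says nothing about how the service computes it; the function name, the use of epoch seconds and the row layout are assumptions.

```python
def statistics(measurements, period,
               functions=("avg", "min", "max", "sum", "count")):
    """Aggregate (timestamp_seconds, value) pairs into fixed-size periods.

    Returns one row per period: [period_start, stat1, stat2, ...].
    """
    calc = {
        "avg": lambda values: sum(values) / len(values),
        "min": min,
        "max": max,
        "sum": sum,
        "count": len,
    }
    buckets = {}
    for timestamp, value in measurements:
        # Align each measurement to the start of its period.
        buckets.setdefault(timestamp - timestamp % period, []).append(value)
    return [
        [start] + [calc[name](values) for name in functions]
        for start, values in sorted(buckets.items())
    ]


rows = statistics([(0, 1.0), (100, 3.0), (300, 5.0)], period=300,
                  functions=("avg", "count"))
```

With a 300-second period, the first two measurements fall into one bucket and the third into the next.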
Metric Names
GET /v2.0/metrics/names
• Returns a list of the unique metric names
• Query parameters
  • Dimensions
  • Offset, limit
Metric Dimension Names
GET /v2.0/metrics/dimensions/names
• Lists the dimension names
• Query parameters
  • Metric name
  • Offset, limit
Metric Dimension Values
GET /v2.0/metrics/dimensions/names/values
• Lists the dimension values
• Query parameters
  • Metric name
  • Dimension name
  • Offset, limit
Alarm Definitions
POST, GET /v2.0/alarm-definitions
• Alarm definitions are templates used to automatically and dynamically create alarms based on matching metric names and dimensions
• One alarm definition can result in zero or more alarms
• Simple grammar for creating compound alarm expressions
  • avg(cpu.user_perc) > 85 or avg(disk.read_ops{device=vda}, 120) > 1000
• Alarm states (OK, ALARM and UNDETERMINED)
• Actions associated with alarms for state transitions
• User-assigned severity (LOW, MEDIUM, HIGH, CRITICAL)
• Thresholds can be dynamically adjusted via PATCH
• Minimal lifecycle management: alarm_lifecycle_state and link
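A request body for creating an alarm definition might be built like this. The sketch reuses the slide's expression with its punctuation restored; the helper `alarm_definition` is hypothetical, and the field names (`name`, `expression`, `severity`, `match_by`, `alarm_actions`) are believed to match the Monasca API but should be verified against the API specification.

```python
import json

SEVERITIES = {"LOW", "MEDIUM", "HIGH", "CRITICAL"}


def alarm_definition(name, expression, severity="LOW", match_by=(),
                     alarm_actions=()):
    """Build a request body for POST /v2.0/alarm-definitions."""
    if severity not in SEVERITIES:
        raise ValueError(f"unknown severity: {severity}")
    return {
        "name": name,
        "expression": expression,
        "severity": severity,
        # One alarm is created per unique value of these dimensions.
        "match_by": list(match_by),
        # IDs of notification methods to invoke on transition to ALARM.
        "alarm_actions": list(alarm_actions),
    }


body = alarm_definition(
    "High CPU or disk reads",
    "avg(cpu.user_perc) > 85 or avg(disk.read_ops{device=vda}, 120) > 1000",
    severity="HIGH",
    match_by=["hostname"],
)
request_json = json.dumps(body)
```

Matching by `hostname` is what lets a single definition produce one alarm per host, which is the "zero or more alarms" behaviour described in the list.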
List Alarms
GET /v2.0/alarms
Query parameters
• metric_name: name of the metric to filter by
• metric_dimensions
• state: OK, ALARM or UNDETERMINED
• severity: one or more severities to filter by, separated with |, e.g. severity=LOW|MEDIUM
• state_updated_start_time: the start time in ISO 8601 combined date and time format, in UTC
• offset, limit
• sort_by
Alarms
GET, PUT, PATCH, DELETE /v2.0/alarms/{alarm-id}
• Alarms are created by the Threshold Engine based on matching alarm definitions
• When new nodes or components are deployed, alarms are automatically created
• Alarms are resources within Monasca; they have a resource ID and a lifecycle
• By default, three states: OK, ALARM and UNDETERMINED
  • The UNDETERMINED state occurs when metrics are no longer being received
• Deterministic alarms have two states: OK and ALARM
  • Used for systems where metrics are sporadic, e.g. creating metrics when errors occur in log files and no metrics when there aren't any errors
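The difference between default and deterministic alarms can be sketched as a single function. This is an illustration only: `alarm_state` and its use of `max` are hypothetical, and the rule that a deterministic alarm reports OK when no metrics arrive is inferred from the two-state description rather than stated on the slide.

```python
def alarm_state(values, threshold, deterministic=False):
    """Return the alarm state for one evaluation window.

    values: measurements received in the window (may be empty).
    """
    if not values:
        # No metrics arrived: a deterministic alarm reports OK (assumption),
        # while a default alarm becomes UNDETERMINED.
        return "OK" if deterministic else "UNDETERMINED"
    return "ALARM" if max(values) > threshold else "OK"
```

For a log-error counter that only emits a metric when an error occurs, the deterministic variant avoids sitting in UNDETERMINED during every quiet period.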
Alarm Counts
GET /v2.0/alarms/count
• Query the total number of alarms in the OK, ALARM or UNDETERMINED state and their severities, grouped by metric dimension (such as OpenStack service), state and severity
• Used for summary dashboards
Example: Helion Ops Console
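The grouping behind a summary dashboard can be sketched with a `Counter`. The alarm records, their field names and the helper `alarm_counts` are made up for illustration; the real endpoint performs this grouping on the server.

```python
from collections import Counter


def alarm_counts(alarms, group_by=("state", "severity")):
    """Count alarms per unique combination of the group_by fields."""
    return Counter(tuple(alarm[field] for field in group_by) for alarm in alarms)


# Hypothetical alarm records.
alarms = [
    {"service": "compute", "state": "ALARM", "severity": "HIGH"},
    {"service": "compute", "state": "OK", "severity": "LOW"},
    {"service": "monitoring", "state": "ALARM", "severity": "HIGH"},
]
counts = alarm_counts(alarms)
```

Grouping by `("service",)` instead would give the per-service totals that a dashboard tile typically shows.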
Alarm History
GET /v2.0/alarms/state-history
• Lists the alarm state history for alarms
• Query parameters
  • Dimensions to filter on
  • Start/end timestamp
  • Offset, limit
GET /v2.0/alarms/{alarm-id}/state-history
• Lists the alarm state history for a specific alarm
Notification Methods
POST, GET, DELETE /v2.0/notification-methods
Notification methods are associated with actions in alarm definitions
Example
POST /v2.0/notification-methods
{
  "name": "Name of notification method",
  "type": "EMAIL",
  "address": "john.doe@hp.com"
}
Monasca Agent
• System metrics (CPU, memory, network, filesystem, …)
• Service metrics
  • MySQL, Kafka and many others
• Application metrics
  • Built-in StatsD daemon
  • Python monasca-statsd library: adds support for dimensions
• VM system metrics
• Open vSwitch metrics
• Active checks
  • HTTP status checks and response times
  • System up/down checks (ping and ssh)
• Runs any Nagios plugin or check_mk
• Extensible/pluggable: additional services can be easily added
Agent details
• The Agent Forwarder buffers metrics for a short time to increase the size of the HTTP request body (number of metrics) sent to the Monasca API
• The Agent requests an auth token from the Keystone Identity service, which is supplied on all requests
• The Monasca Agent and API cache auth tokens in memory to reduce round-trip authorization requests to Keystone
• If network connectivity between the Agent and API is lost, the Agent buffers metrics and sends them when connectivity is restored
• Metrics are submitted using an "agent" role, which only allows metrics to be POSTed to the metrics endpoint
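The buffering behaviour of the Forwarder can be sketched as below. The class is a simplified stand-in, not the agent's code: `post` is a caller-supplied function, batching is by count rather than by time, and the name `batch_size` is an assumption.

```python
class Forwarder:
    """Buffers metrics and posts them in batches (in-memory sketch)."""

    def __init__(self, post, batch_size=3):
        self.post = post              # callable that sends a list of metrics
        self.batch_size = batch_size
        self.buffer = []

    def submit(self, metric):
        """Queue a metric and flush once the batch is large enough."""
        self.buffer.append(metric)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        """Send everything buffered; keep it if the API is unreachable."""
        if not self.buffer:
            return False
        try:
            self.post(list(self.buffer))
        except ConnectionError:
            return False              # metrics stay buffered for the next attempt
        self.buffer.clear()
        return True
```

Because a failed flush leaves the buffer untouched, the next successful flush sends the backlog together with any newer metrics in a single request.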
Grafana/Monasca Integration
• Datasource: a datasource that can be added to the Grafana dashboard to enable Monasca
  • https://github.com/openstack/monasca-grafana-datasource
• Keystone authentication
  • https://github.com/twc-openstack/grafana
• Support for alerting will be added in Grafana 4
Grafana Monasca Data Source
Logging Architecture
Logging API
• POST /v3.0/logs
• Batch log messages in a single HTTP request
• Global, local, mixed dimensions
  • Similar to dimensions in metrics
• JSON only
• Specification
  • https://github.com/openstack/monasca-log-api/blob/master/docs/monasca-log-api-spec.md
• Queries are not done via the API but via a tenantized version of Kibana
  • https://github.com/FujitsuEnablingSoftwareTechnologyGmbH/fts-keystone
Metrics
Create query and get statistics for metrics
bull GET POST v20metrics
bull GET v20metricsnamesbull Returns the unique metric names
bull GET v20metricsdimensionnamesbull Returns the unique dimension names
bull GET v20metricsdimensionnamesvaluesbull Returns the unique dimension values
Measurements
GET v20metricsmeasurements
bull Returns a list of measurements
bull Query parametersbull Name and dimensions to filter by
bull Start_time and end_time
bull Offset and limit
bull merge_metrics allow multiple metrics to be combined into a single list of measurements
bull group_by list of columns to group the metrics to be returned Allows multiple unique metrics to be returned in a single query
Statistics
GET v20metricsstatistics
bull Query parametersbull Name and dimensions to filter bybull Start_time and end_timebull Statistics avg min max sum and countbull Period The time period to aggregate measurements bybull Offset limitbull merge_metrics allow multiple metrics to be combined into a single list
of statisticsbull group_by list of columns to group the metrics to be returned Allows
multiple unique metrics to be returned in a single query
Metric Names
GET /v2.0/metrics/names
• Returns a list of the unique metric names
• Query parameters
  • dimensions
  • offset, limit
Metric Dimension Names
GET /v2.0/metrics/dimensions/names
• Lists the dimension names
• Query parameters
  • metric name
  • offset, limit
Metric Dimension Values
GET /v2.0/metrics/dimensions/names/values
• Lists the dimension values
• Query parameters
  • metric name
  • dimension name
  • offset, limit
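The three lookups above (metric names, dimension names, dimension values) can be pictured as set operations over stored metrics. The sketch below works on an in-memory list; the sample metrics and the function names are hypothetical and only mirror the filters listed on the slides.

```python
def metric_names(metrics, dimensions=None):
    """Unique metric names, optionally limited to metrics with the given dimensions."""
    wanted = (dimensions or {}).items()
    return sorted({
        m["name"] for m in metrics
        if all(m["dimensions"].get(key) == value for key, value in wanted)
    })


def dimension_names(metrics, metric_name=None):
    """Unique dimension names, optionally for a single metric name."""
    return sorted({
        key
        for m in metrics if metric_name in (None, m["name"])
        for key in m["dimensions"]
    })


def dimension_values(metrics, dimension_name, metric_name=None):
    """Unique values of one dimension, optionally for a single metric name."""
    return sorted({
        m["dimensions"][dimension_name]
        for m in metrics
        if metric_name in (None, m["name"]) and dimension_name in m["dimensions"]
    })


# Hypothetical sample data.
METRICS = [
    {"name": "cpu.user_perc", "dimensions": {"hostname": "devstack"}},
    {"name": "disk.read_ops", "dimensions": {"hostname": "devstack", "device": "vda"}},
    {"name": "disk.read_ops", "dimensions": {"hostname": "devstack", "device": "vdb"}},
]
```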
Alarm Definitions
POST, GET /v2.0/alarm-definitions
• Alarm definitions are templates that are used to automatically and dynamically create alarms based on matching metric names and dimensions
• One alarm definition can result in zero or more alarms
• Simple grammar for creating compound alarm expressions
  • avg(cpu.user_perc) > 85 or avg(disk.read_ops{device=vda}, 120) > 1000
• Alarm states (OK, ALARM and UNDETERMINED)
• Actions associated with alarms for state transitions
• User-assigned severity (LOW, MEDIUM, HIGH, CRITICAL)
• Thresholds can be dynamically adjusted via PATCH
• Minimal lifecycle management: alarm_lifecycle_state and link
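A minimal sketch of a request body for creating an alarm definition, assuming the JSON field names name, expression, severity and alarm_actions. These field names are not shown on the slide and should be checked against the API specification; the expression is the slide's example with its punctuation restored.

```python
import json

SEVERITIES = ("LOW", "MEDIUM", "HIGH", "CRITICAL")


def alarm_definition_body(name, expression, severity="LOW", alarm_actions=()):
    """Serialize a body for POST /v2.0/alarm-definitions (field names assumed)."""
    if severity not in SEVERITIES:
        raise ValueError(f"severity must be one of {SEVERITIES}")
    return json.dumps({
        "name": name,
        "expression": expression,
        "severity": severity,
        # Assumed to hold IDs of previously created notification methods.
        "alarm_actions": list(alarm_actions),
    })


body = alarm_definition_body(
    "High CPU or disk reads",
    "avg(cpu.user_perc) > 85 or avg(disk.read_ops{device=vda}, 120) > 1000",
    severity="HIGH",
)
```

Validating the severity on the client keeps obviously malformed definitions from reaching the API.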
List Alarms
GET /v2.0/alarms
Query parameters
• metric_name: name of the metric to filter by
• metric_dimensions
• state: OK, ALARM or UNDETERMINED
• severity: one or more severities to filter by, separated with |, e.g. severity=LOW|MEDIUM
• state_updated_start_time: the start time in ISO 8601 combined date and time format in UTC
• offset, limit
• sort_by
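The pipe-separated severity filter needs URL encoding when it is placed in a query string. A small sketch follows; the helper name and its parameter order are hypothetical.

```python
from urllib.parse import urlencode


def list_alarms_query(severities=(), state=None, metric_name=None,
                      offset=None, limit=None, sort_by=None):
    """Build the path and query string for GET /v2.0/alarms."""
    params = {}
    if metric_name:
        params["metric_name"] = metric_name
    if state:
        params["state"] = state
    if severities:
        # Multiple severities are joined with "|"; urlencode escapes it as %7C.
        params["severity"] = "|".join(severities)
    if offset is not None:
        params["offset"] = offset
    if limit is not None:
        params["limit"] = limit
    if sort_by:
        params["sort_by"] = sort_by
    return "/v2.0/alarms?" + urlencode(params)


query = list_alarms_query(["LOW", "MEDIUM"], state="ALARM", limit=50)
```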
Alarms
GET, PUT, PATCH, DELETE /v2.0/alarms/{alarm-id}
• Alarms are created by the Threshold Engine based on matching alarm definitions
• When new nodes or components are deployed, alarms are automatically created
• Alarms are resources within Monasca. They have a resource ID and lifecycle
• By default three states: OK, ALARM and UNDETERMINED
  • The UNDETERMINED state occurs when metrics are no longer being received
• Deterministic alarms: two states, OK and ALARM
  • Used for systems where metrics are sporadic, e.g. creating metrics when errors occur in log files and no metrics when there aren't any errors
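The difference between the default and deterministic behaviour can be sketched as a small state function. This is a simplification that assumes a single avg(...) > threshold expression and that a deterministic alarm reports OK when no metrics arrive; the real Threshold Engine evaluates full expressions over time windows.

```python
def next_state(values, threshold, deterministic=False):
    """Return the alarm state for the measurements seen in one evaluation window."""
    if not values:
        # No metrics received: deterministic alarms never become UNDETERMINED.
        return "OK" if deterministic else "UNDETERMINED"
    average = sum(values) / len(values)
    return "ALARM" if average > threshold else "OK"
```

For a log-error counter that only emits metrics when errors occur, deterministic=True keeps the alarm in OK during quiet periods.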
Alarm Counts
GET /v2.0/alarms/count
• Query the total number of alarms in the OK, ALARM or UNDETERMINED state and their severities, grouped by metric dimension (such as OpenStack service), state and severity
• Used for summary dashboards
Example: Helion Ops Console
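The grouping that a summary dashboard relies on can be illustrated locally with a counter keyed by the group-by columns. The alarm records and the service key below are hypothetical sample data, not the API's response format.

```python
from collections import Counter


def alarm_counts(alarms, group_by=("state", "severity")):
    """Count alarms per combination of the group_by fields."""
    return Counter(tuple(alarm[key] for key in group_by) for alarm in alarms)


# Hypothetical sample alarms.
ALARMS = [
    {"service": "compute", "state": "ALARM", "severity": "HIGH"},
    {"service": "compute", "state": "OK", "severity": "LOW"},
    {"service": "monitoring", "state": "ALARM", "severity": "HIGH"},
]

by_state_and_severity = alarm_counts(ALARMS)
by_service = alarm_counts(ALARMS, group_by=("service",))
```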
Alarm History
GET /v2.0/alarms/state-history
• Lists the alarm state history for alarms
• Query parameters
  • dimensions to filter on
  • start/end timestamp
  • offset, limit
GET /v2.0/alarms/{alarm-id}/state-history
• Lists the alarm state history for a specific alarm
Notification Methods
POST, GET, DELETE /v2.0/notification-methods
Notification methods are associated with actions in alarm definitions
Example
POST /v2.0/notification-methods
{
  "name": "Name of notification method",
  "type": "EMAIL",
  "address": "john.doe@hp.com"
}
Monasca Agent
• System metrics (cpu, memory, network, filesystem, …)
• Service metrics
  • MySQL, Kafka and many others
• Application metrics
  • Built-in Statsd daemon
  • Python monasca-statsd library: adds support for dimensions
• VM system metrics
• Open vSwitch metrics
• Active checks
  • HTTP status checks and response times
  • System up/down checks (ping and ssh)
• Runs any Nagios plugin or check_mk
• Extensible/Pluggable: additional services can be easily added
Agent details
• The Agent Forwarder buffers metrics for a short time to increase the size of the HTTP request body (number of metrics) sent to the Monasca API
• The Agent requests an auth token from the Keystone Identity service, which is supplied on all requests
• The Monasca Agent and API cache auth tokens in memory to reduce the round-trip authorization requests to Keystone
• If network connectivity between the Agent and API is lost, the Agent will buffer metrics and send them when connectivity is restored
• Metrics are submitted using an "agent" role, which only allows metrics to be POSTed to the metrics endpoint
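The buffering behaviour described for the Agent Forwarder can be sketched as a small batching class. This is not the agent's actual code: the class name, the thresholds, and the use of ConnectionError to signal an unreachable API are all assumptions made for illustration.

```python
import time


class Forwarder:
    """Collect metrics and hand them to `send` in batches."""

    def __init__(self, send, max_batch=100, max_age=10.0, clock=time.monotonic):
        self._send = send            # callable that posts a list of metrics
        self._max_batch = max_batch  # flush once this many metrics are buffered
        self._max_age = max_age      # or once the oldest metric is this old (seconds)
        self._clock = clock
        self._buffer = []
        self._oldest = None

    @property
    def pending(self):
        return len(self._buffer)

    def submit(self, metric):
        if not self._buffer:
            self._oldest = self._clock()
        self._buffer.append(metric)
        return self.flush_if_due()

    def flush_if_due(self):
        if not self._buffer:
            return False
        full = len(self._buffer) >= self._max_batch
        stale = self._clock() - self._oldest >= self._max_age
        if not (full or stale):
            return False
        try:
            self._send(list(self._buffer))
        except ConnectionError:
            # API unreachable: keep the metrics and retry on a later call.
            return False
        self._buffer.clear()
        return True
```

Passing the clock in as a parameter makes the age-based flush easy to exercise in tests without sleeping.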
Grafana/Monasca Integration
• Datasource: a datasource that can be added to the Grafana dashboard to enable Monasca
  • https://github.com/openstack/monasca-grafana-datasource
• Keystone authentication
  • https://github.com/twc-openstack/grafana
• Support for Alerting will be added in Grafana 4
Grafana Monasca Data Source
Logging Architecture
Logging API
• POST /v3.0/logs
• Batch log messages in a single HTTP request
• Global, local and mixed dimensions
  • Similar to dimensions in metrics
• JSON only
• Specification
  • https://github.com/openstack/monasca-log-api/blob/master/docs/monasca-log-api-spec.md
• Queries are not done via the API but via a tenantized version of Kibana
  • https://github.com/FujitsuEnablingSoftwareTechnologyGmbH/fts-keystone
Log Model
{
  "dimensions": {
    "hostname": "devstack",
    "service": "monitoring",
    "component": "monasca-api"
  },
  "logs": [
    {
      "message": "msg1",
      "dimensions": {
        "service": "compute",
        "component": "nova-api",
        "path": "/var/log/mysql.log"
      }
    },
    {
      "message": "msg2",
      "dimensions": {
        "path": "/var/log/monasca/monasca-api.log"
      }
    }
  ]
}
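One way to read the global/local dimension model is that each log entry ends up with the global dimensions plus its own. The sketch below assumes that a local dimension overrides a global one with the same key; that precedence is an assumption and should be checked against the log API specification. The request mirrors the slide's example.

```python
def expand_logs(request):
    """Return one record per log entry with global and local dimensions merged."""
    global_dimensions = request.get("dimensions", {})
    return [
        {
            "message": log["message"],
            # Local keys win over global keys with the same name (assumed).
            "dimensions": {**global_dimensions, **log.get("dimensions", {})},
        }
        for log in request["logs"]
    ]


REQUEST = {
    "dimensions": {
        "hostname": "devstack",
        "service": "monitoring",
        "component": "monasca-api",
    },
    "logs": [
        {
            "message": "msg1",
            "dimensions": {
                "service": "compute",
                "component": "nova-api",
                "path": "/var/log/mysql.log",
            },
        },
        {
            "message": "msg2",
            "dimensions": {"path": "/var/log/monasca/monasca-api.log"},
        },
    ],
}

records = expand_logs(REQUEST)
```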
Log Agents
• Logstash
  • https://github.com/logstash-plugins/logstash-output-monasca_log_api/pull/1
• Beaver
  • https://github.com/python-beaver/python-beaver/pull/406
• Logspout: under investigation
Kibana Integration
• Keystone authentication support for Kibana
• Authentication plugin
  • https://github.com/FujitsuEnablingSoftwareTechnologyGmbH/fts-keystone
• Note: in the process of moving to an official OpenStack repo
Composability: Logging/Metrics
Transform and Analytics Engine
Monasca Transform
• A new micro-service in Monasca that aggregates and transforms metrics
• Currently based on Apache Spark Streaming
• Use cases
  • Object Storage Disk Capacity
  • Object Storage Capacity
  • Compute Host Capacity
  • VM Capacity
  • More to come
• Metrics are aggregated and published every hour
• Currently in deployment in HPE Helion OpenStack 4.0
• OpenStack project/repo
  • https://github.com/openstack/monasca-transform
Monasca Analytics
• A framework that adds data science tools (parsers, algorithms, etc.)
• Features include
  • Algorithmic flow definition, enabling sharing of complex algorithmic recipes
  • Thin orchestration layer that instantiates an execution environment
• Focused on
  • Anomaly detection
  • Reducing alert fatigue via alarm clustering (unsupervised machine learning)
• Example algorithms: One-Class SVM and LiNGAM
• Status: under development
• OpenStack project/repo
  • https://github.com/openstack/monasca-analytics
Distributions & Deployments
• Charter Communications
  • Monasca and Grafana are currently deployed in a production private cloud
  • Monitoring-as-a-Service use cases supported with Grafana as the visualization dashboard
  • 2 datacenters, 600-700 compute nodes, 1000 VMs, 11000 metrics/sec
• FIWARE Lab
  • http://superuser.openstack.org/articles/monitoring-a-multi-region-cloud-based-on-openstack
• Hewlett Packard Enterprise Cloud System, Helion OpenStack
  • Supported and tested up to 65K metrics/sec ingest rates
• Fujitsu
  • FUJITSU Software ServerView Cloud Monitoring Manager
• NEC
  • Planning to include Monasca in Cloud Solution Menus solution
• Others
Statistics: Mitaka/Newton Release
• Organizations: 31
• Contributors: 97
• Commits: 1075
• Reviews: 4080
• Lines of code: 215370
Ecosystem
• Hewlett Packard Enterprise
• Fujitsu
• Charter Communications
• NEC
• Cisco
• Cloudbase Solutions
• SUSE
• SolidFire
• SAP
• Cray Inc
• FIWARE Lab
• Mirantis
• Broadcom
Metrics Names
GET v20metricsnames
bull Returns a list of the unique metric names
bull Query parametersbull Dimensions
bull Offset limit
Metric Dimension Names
GET v20metricsdimensionsnames
bull List the dimension names
bull Query parametersbull Metric name
bull Offset limit
Metric Dimension Values
GET v20metricsdimensionsnamesvalues
bull List the dimension values
bull Query parametersbull Metric name
bull Dimension name
bull Offset limit
Alarm Definitions
POST GET v20alarm-definitions
bull Alarm definitions are templates that are used to automatically and dynamically create alarms based on matching metric names and dimensions
bull One alarm definition can result in zero or more alarms
bull Simple grammar for creating compound alarm expressionsbull avg(cpuuser_perc) gt 85 or avg(diskread_opsdevice=vda 120) gt 1000
bull Alarm states (OK ALARM and UNDETERMINED)
bull Actions associated with alarms for state transitions
bull User assigned severity (LOW MEDIUM HIGH CRITICAL)
bull Thresholds can be dynamically adjusted via PATCH
bull Minimal lifecycle management alarm_lifecycle_state and link
List Alarms
GET v20alarmsQuery parametersbull metric_name - Name of metric to filter bybull metric_dimensionsbull State OK ALARM or UNDETERMINEDbull Severity One or more severities to filter by separated with |
ex severity=LOW|MEDIUMbull state_updated_start_time The start time in ISO 8601 combined date and
time format in UTCbull Offset limitbull sort_by
Alarms
GET PUT PATCH DELETE v20alarmsalarm-id
bull Alarms created by the Threshold Engine based on matching alarm definitions
bull When new nodes or components are deployed alarms are automatically created
bull Alarms are resources within Monasca They have a resource ID and lifecycle
bull By default three states OK ALARM and UNDETERMINEDbull UNDETERMINED state occurs when metrics are no longer being received
bull Deterministic alarms two states OK and ALARMbull Used for systems where metrics are sporadic Eg Creating metrics when errors in log
files occur and no metrics when there arent any errors
Alarm Counts
GET v20alarmscount
bull Query the total number of alarms in the OK ALARM or UNDETERMINED state and their severities grouped by metrics dimension such as OpenStack service state and severity
bull Used for summary dashboards
Example Helion Ops Console
Alarm History
GET v20alarmsstate-history
bull Lists the alarm state history for alarms
bull Query Parametersbull Dimensions to filter on
bull Startend timestamp
bull Offset limit
GET v20alarmsalarm-idstate-history
bull Lists the alarm state history for a specific alarm
Notification Methods
POST GET DELETE v20notification-methods
Notification methods are associated with Actions in alarm definitions
Example
POST v20notification-methods
nameName of notification method
typeEMAIL
addressjohndoehpcom
Monasca Agent
bull System metrics (cpu memory network filesystem hellip)
bull Service metricsbull MySQL Kafka and many others
bull Application metricsbull Built-in Statsd daemonbull Python monasca-statsd library Adds support for dimensions
bull VM system metrics
bull Open vSwitch metrics
bull Active checksbull HTTP status checks and response timesbull System updown checks (ping and ssh)
bull Runs any Nagios plugin or check_mk
bull ExtensiblePluggable Additional services can be easily added
Agent details
bull The Agent Forwarder buffers metrics for a short time to increase the size of the http request body (number of metrics) sent to the Monasca API
bull The Agent request an auth token from the Keystone Identity service which is supplied on all requests
bull The Monasca Agent and API caches Monasca Agent and API caches Monasca Agent and API caches auth tokens in-memory to reduce the round-trip authorization requests to Keystone
bull If network connectivity between the Agent and API occurs the Agent will buffer metrics and send when connectivity is restored
bull Metrics are submitted using a ldquoagentrdquo role which only allows metrics to be POSTrsquod to the metrics endpoint
GrafanaMonasca Integration
bull Datasource A datasource that can be added to the Grafana dashboard to enable Monasca
bull httpsgithubcomopenstackmonasca-grafana-datasource
bull Keystone authenticationbull httpsgithubcomtwc-openstackgrafana
bull Support for Alerting will be added in Grafana 4
Grafana Monasca Data Source
Logging Architecture
Logging API
bull POST v30logs
bull Batch log messages in a single http request
bull Global local mixed dimensionsbull Similar to dimensions in metrics
bull JSON only
bull Specificationbull httpsgithubcomopenstackmonasca-log-apiblobmasterdocsmonasca-
log-api-specmd
bull Queries not done via API but via Tenantized version of Kibanabull httpsgithubcomFujitsuEnablingSoftwareTechnologyGmbHfts-keystone
Log Model
bull dimensions
hostnamedevstack
servicemonitoring
componentmonasca-api
logs[
messagemsg1
dimensions
servicecompute
componentnova-api
pathvarlogmysqllog
messagemsg2
dimensions
pathvarlogmonascamonasca-apilog
]
Log Agents
bull Logstashbull httpsgithubcomlogstash-pluginslogstash-output-monasca_log_apipull1
bull Beaverbull httpsgithubcompython-beaverpython-beaverpull406
bull Logspout Under Investigation
Kibana Integration
bull Keystone authentication support for Kibana
bull Authentication pluginbull httpsgithubcomFujitsuEnablingSoftwareTechnologyGmbHfts-keystone
bull Note In progress of moving to official OpenStack repo
Composabilty LoggingMetrics
Transform and Analytics Engine
Monasca Transform
bull A new micro-service in Monasca that aggregates and transforms metrics
bull Currently based on Apache Spark Streaming
bull Use Casesbull Object Storage Disk Capacity
bull Object Storage Capacity
bull Compute Host Capacity
bull VM Capacity
bull More to come
bull Metrics are aggregated and published every hour
bull Currently in deployment in HPE Helion OpenStack 40
bull OpenStack projectrepobull httpsgithubcomopenstackmonasca-transform
Monasca Analytics
bull A framework that adds data science tools (parsers algorithms etc)
bull Features include bull Algorithmic flow definition enabling sharing of complex algorithmic recipes
bull Thin orchestration layer that instantiates an execution environment
bull Focused onbull Anomaly detection
bull Reducing alert fatigue via alarm clustering (unsupervised machine learning)
bull Example algorithms One Class SVM and LiNGAM
bull Status Under Development
bull OpenStack projectrepobull httpsgithubcomopenstackmonasca-analytics
Distributions amp Deployments
bull Charter Communicationsbull Monasca and Grafana is currently deployed in production private cloudbull Monitoring-as-a-Service Use cases supported with Grafana as the Visualization
Dashboardbull 2 datacenters 600-700 compute nodes 1000 VMs 11000 metricssec
bull FIWARE Labbull httpsuperuseropenstackorgarticlesmonitoring-a-multi-region-cloud-based-on-openstack
bull Hewlett Packard Enterprise Cloud System Helion OpenStackbull Supported and tested up to 65K metricssec injest rates
bull Fujitsubull FUJITSU Software ServerView Cloud Monitoring Manager
bull NECbull Planning to include Monasca in Cloud Solution Menus solution
bull Others
Statistics MitakaNewton Release
bull Organizations
bull Contributors
bull Commits
bull Reviews
bull Lines of code
31
97
1075
4080
215370
Ecosystem
• Hewlett Packard Enterprise
• Fujitsu
• Charter Communications
• NEC
• Cisco
• Cloudbase Solutions
• SUSE
• SolidFire
• SAP
• Cray Inc.
• FIWARE Lab
• Mirantis
• Broadcom
Containers and Kubernetes
• New Monasca Agent Plugins
  • Docker plugin
  • cAdvisor plugin
  • Kubernetes plugin: Monitors both Kubernetes control plane and containers
  • Prometheus client plugin: Scrapes apps
  • Mesos plugin
• Containerization of Monasca
• Heapster Monasca data sink
Next Steps
• Containerizing Monasca
• Monitoring containers and container managers such as Kubernetes
• Grouping notifications
Metric Dimension Names
GET /v2.0/metrics/dimensions/names
• List the dimension names
• Query parameters
  • Metric name
  • Offset, limit
Metric Dimension Values
GET /v2.0/metrics/dimensions/names/values
• List the dimension values
• Query parameters
  • Metric name
  • Dimension name
  • Offset, limit
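As a rough sketch of how a client might address the two dimension endpoints, the helpers below only build request URLs and send nothing. The base URL is hypothetical, and the query parameter spellings (`metric_name`, `dimension_name`, `offset`, `limit`) are assumed from the slide wording, so they should be checked against the API specification.

```python
from urllib.parse import urlencode

BASE_URL = "http://monasca.example.com:8070"  # hypothetical endpoint


def _build(url, params):
    """Append only the parameters that were actually supplied."""
    query = urlencode({k: v for k, v in params.items() if v is not None})
    return url + ("?" + query if query else "")


def dimension_names_url(metric_name=None, offset=None, limit=None):
    """URL for listing dimension names, optionally scoped to one metric."""
    params = {"metric_name": metric_name, "offset": offset, "limit": limit}
    return _build(BASE_URL + "/v2.0/metrics/dimensions/names", params)


def dimension_values_url(dimension_name, metric_name=None, offset=None, limit=None):
    """URL for listing the values seen for one dimension name."""
    params = {
        "metric_name": metric_name,
        "dimension_name": dimension_name,
        "offset": offset,
        "limit": limit,
    }
    return _build(BASE_URL + "/v2.0/metrics/dimensions/names/values", params)


names_url = dimension_names_url("cpu.user_perc", limit=10)
values_url = dimension_values_url("hostname")
```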
Alarm Definitions
POST, GET /v2.0/alarm-definitions
• Alarm definitions are templates that are used to automatically and dynamically create alarms based on matching metric names and dimensions
• One alarm definition can result in zero or more alarms
• Simple grammar for creating compound alarm expressions
  • avg(cpu.user_perc) > 85 or avg(disk.read_ops{device=vda}, 120) > 1000
• Alarm states (OK, ALARM and UNDETERMINED)
• Actions associated with alarms for state transitions
• User-assigned severity (LOW, MEDIUM, HIGH, CRITICAL)
• Thresholds can be dynamically adjusted via PATCH
• Minimal lifecycle management: alarm_lifecycle_state and link
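A request body for creating an alarm definition might look roughly like the sketch below. The field names (`name`, `expression`, `severity`, `match_by`, `alarm_actions`) are assumptions to be confirmed against the API specification, and the definition name is made up. The expression is the compound example from the slide.

```python
import json

ALLOWED_SEVERITIES = {"LOW", "MEDIUM", "HIGH", "CRITICAL"}


def alarm_definition(name, expression, severity="LOW", match_by=(), alarm_actions=()):
    """Build a JSON-serialisable alarm definition body (field names assumed)."""
    if severity not in ALLOWED_SEVERITIES:
        raise ValueError("unknown severity: %r" % (severity,))
    return {
        "name": name,
        "expression": expression,
        "severity": severity,
        # One alarm is created per distinct value of the match_by dimensions.
        "match_by": list(match_by),
        # IDs of notification methods to invoke on transition to ALARM.
        "alarm_actions": list(alarm_actions),
    }


body = alarm_definition(
    "High CPU or disk reads",  # hypothetical name
    "avg(cpu.user_perc) > 85 or avg(disk.read_ops{device=vda}, 120) > 1000",
    severity="HIGH",
    match_by=["hostname"],
)
payload = json.dumps(body)
```

With `match_by` set to `hostname`, a single definition would yield one alarm per host, which is how one definition can result in zero or more alarms.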
List Alarms
GET /v2.0/alarms
Query parameters
• metric_name - Name of metric to filter by
• metric_dimensions
• State: OK, ALARM or UNDETERMINED
• Severity: One or more severities to filter by, separated with |, e.g. severity=LOW|MEDIUM
• state_updated_start_time: The start time in ISO 8601 combined date and time format in UTC
• Offset, limit
• sort_by
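The query parameters above could be assembled as in this sketch, which joins severities with `|` and formats the start time as ISO 8601 in UTC. The lowercase parameter spellings `state` and `severity` are assumptions, as is the helper itself. Only the path and query string are built, so nothing is sent.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode

STATES = {"OK", "ALARM", "UNDETERMINED"}


def list_alarms_query(metric_name=None, state=None, severities=(),
                      updated_since=None, offset=None, limit=None, sort_by=None):
    """Return the path and query string for listing alarms."""
    params = {}
    if metric_name:
        params["metric_name"] = metric_name
    if state:
        if state not in STATES:
            raise ValueError("unknown state: %r" % (state,))
        params["state"] = state
    if severities:
        params["severity"] = "|".join(severities)  # e.g. LOW|MEDIUM
    if updated_since is not None:
        # Expects a timezone-aware datetime; converted to UTC before formatting.
        utc = updated_since.astimezone(timezone.utc)
        params["state_updated_start_time"] = utc.strftime("%Y-%m-%dT%H:%M:%SZ")
    if offset is not None:
        params["offset"] = offset
    if limit is not None:
        params["limit"] = limit
    if sort_by:
        params["sort_by"] = sort_by
    return "/v2.0/alarms" + ("?" + urlencode(params) if params else "")


query = list_alarms_query(state="ALARM", severities=["LOW", "MEDIUM"])
```

Note that `urlencode` percent-encodes the `|` separator as `%7C`, which is the safe form for a URL.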
Alarms
GET, PUT, PATCH, DELETE /v2.0/alarms/{alarm-id}
• Alarms are created by the Threshold Engine based on matching alarm definitions
• When new nodes or components are deployed, alarms are automatically created
• Alarms are resources within Monasca. They have a resource ID and lifecycle
• By default, three states: OK, ALARM and UNDETERMINED
  • UNDETERMINED state occurs when metrics are no longer being received
• Deterministic alarms: two states, OK and ALARM
  • Used for systems where metrics are sporadic, e.g. creating metrics when errors in log files occur and no metrics when there aren't any errors
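The difference between default and deterministic alarms can be shown with a deliberately simplified model. This is not the Threshold Engine's logic: it assumes a single `avg(...) > threshold` sub-expression and treats an empty evaluation window as "no metrics received".

```python
def evaluate_state(values, threshold, deterministic=False):
    """Return the alarm state for one evaluation window.

    `values` holds the measurements received in the window (possibly none).
    """
    if not values:
        # No data: a default alarm cannot decide, a deterministic one stays OK.
        return "OK" if deterministic else "UNDETERMINED"
    average = sum(values) / len(values)
    return "ALARM" if average > threshold else "OK"


default_no_data = evaluate_state([], 85)
deterministic_no_data = evaluate_state([], 0, deterministic=True)
breaching = evaluate_state([90, 95], 85)
```

For a metric that is only emitted when an error appears in a log file, the deterministic variant avoids sitting in UNDETERMINED during healthy periods.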
Alarm Counts
GET /v2.0/alarms/count
• Query the total number of alarms in the OK, ALARM or UNDETERMINED state and their severities, grouped by metrics dimension such as OpenStack service, state and severity
• Used for summary dashboards
Example: Helion Ops Console
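What a grouped count means for a summary dashboard can be illustrated client-side. The sketch below counts a hypothetical list of alarm records by state and severity. It mimics the kind of result the endpoint is described as returning, not its actual response format.

```python
from collections import Counter


def count_alarms(alarms, group_by=("state", "severity")):
    """Count alarm records per combination of the group_by fields."""
    return Counter(tuple(alarm[key] for key in group_by) for alarm in alarms)


# Hypothetical alarm records for illustration only.
alarms = [
    {"service": "compute", "state": "ALARM", "severity": "HIGH"},
    {"service": "compute", "state": "OK", "severity": "LOW"},
    {"service": "storage", "state": "ALARM", "severity": "HIGH"},
    {"service": "storage", "state": "UNDETERMINED", "severity": "LOW"},
]
by_state_severity = count_alarms(alarms)
by_service = count_alarms(alarms, group_by=("service",))
```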
Alarm History
GET /v2.0/alarms/state-history
• Lists the alarm state history for alarms
• Query parameters
  • Dimensions to filter on
  • Start/end timestamp
  • Offset, limit
GET /v2.0/alarms/{alarm-id}/state-history
• Lists the alarm state history for a specific alarm
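Paging through a long state history with offset and limit might be sketched as below. The fetch function is injected so the example runs without a server. Treating the offset as a simple count of records to skip is an assumption made here, since the real API may use an opaque marker instead.

```python
def iter_pages(fetch_page, limit):
    """Yield every record by requesting pages until a short page is returned.

    `fetch_page(offset, limit)` is a caller-supplied function returning a list.
    """
    offset = 0
    while True:
        page = fetch_page(offset, limit)
        yield from page
        if len(page) < limit:
            return  # a short (or empty) page means there is nothing left
        offset += len(page)


# Stand-in for the state-history endpoint: five hypothetical transitions.
HISTORY = [{"id": i, "new_state": s}
           for i, s in enumerate(["OK", "ALARM", "OK", "UNDETERMINED", "OK"])]


def fake_fetch(offset, limit):
    return HISTORY[offset:offset + limit]


all_records = list(iter_pages(fake_fetch, limit=2))
```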
Notification Methods
POST, GET, DELETE /v2.0/notification-methods
Notification methods are associated with Actions in alarm definitions
Example
POST /v2.0/notification-methods
{
  "name": "Name of notification method",
  "type": "EMAIL",
  "address": "john.doe@hp.com"
}
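The example request could be prepared with the standard library as sketched here. The base URL and token are placeholders, the request object is only constructed and never sent, and passing the Keystone token in an `X-Auth-Token` header is the usual OpenStack convention rather than something stated on the slide.

```python
import json
from urllib.request import Request


def notification_method_request(base_url, token, name, address, type_="EMAIL"):
    """Build (but do not send) a POST request creating a notification method."""
    body = json.dumps({"name": name, "type": type_, "address": address})
    return Request(
        base_url + "/v2.0/notification-methods",
        data=body.encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json", "X-Auth-Token": token},
    )


req = notification_method_request(
    "http://monasca.example.com:8070",  # hypothetical endpoint
    "secret-token",                     # placeholder Keystone token
    "Name of notification method",
    "john.doe@hp.com",
)
```

Sending it would be a matter of passing `req` to `urllib.request.urlopen`. The ID returned for the new method is what an alarm definition would reference in its actions.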
Monasca Agent
• System metrics (cpu, memory, network, filesystem, ...)
• Service metrics
  • MySQL, Kafka, and many others
• Application metrics
  • Built-in Statsd daemon
  • Python monasca-statsd library: Adds support for dimensions
• VM system metrics
• Open vSwitch metrics
• Active checks
  • HTTP status checks and response times
  • System up/down checks (ping and ssh)
• Runs any Nagios plugin or check_mk
• Extensible/Pluggable: Additional services can be easily added
Agent details
• The Agent Forwarder buffers metrics for a short time to increase the size of the HTTP request body (number of metrics) sent to the Monasca API
• The Agent requests an auth token from the Keystone Identity service, which is supplied on all requests
• The Monasca Agent and API cache auth tokens in memory to reduce round-trip authorization requests to Keystone
• If network connectivity between the Agent and API is lost, the Agent will buffer metrics and send them when connectivity is restored
• Metrics are submitted using an "agent" role, which only allows metrics to be POSTed to the metrics endpoint
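The buffering behaviour of the Forwarder can be sketched as follows. This is an illustration, not the agent's implementation: the class name, the size-only flush trigger (the real agent also flushes on a timer), and the injected `send` callable are all assumptions.

```python
class Forwarder:
    """Batch metrics and keep them buffered while the API is unreachable."""

    def __init__(self, send, max_batch=3):
        self._send = send            # callable(list_of_metrics) -> bool
        self._max_batch = max_batch  # flush once this many metrics are buffered
        self._buffer = []

    @property
    def pending(self):
        return len(self._buffer)

    def submit(self, metric):
        self._buffer.append(metric)
        if len(self._buffer) >= self._max_batch:
            self.flush()

    def flush(self):
        """Try to send everything buffered; keep it all if sending fails."""
        if not self._buffer:
            return True
        batch = list(self._buffer)
        try:
            ok = self._send(batch)
        except OSError:  # includes ConnectionError
            ok = False
        if ok:
            del self._buffer[:len(batch)]
        return ok


# Simulate an outage followed by recovery.
sent = []
link = {"up": False}


def send(batch):
    if not link["up"]:
        raise ConnectionError("API unreachable")
    sent.append(batch)
    return True


fwd = Forwarder(send, max_batch=2)
fwd.submit({"name": "cpu.user_perc", "value": 10})
fwd.submit({"name": "cpu.user_perc", "value": 20})  # flush attempt fails
pending_during_outage = fwd.pending
link["up"] = True
fwd.submit({"name": "cpu.user_perc", "value": 30})  # flush succeeds
```

After recovery the three metrics go out in one request body, which is the batching effect the slide describes.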
Grafana/Monasca Integration
• Datasource: A datasource that can be added to the Grafana dashboard to enable Monasca
  • https://github.com/openstack/monasca-grafana-datasource
• Keystone authentication
  • https://github.com/twc-openstack/grafana
• Support for Alerting will be added in Grafana 4
Grafana Monasca Data Source
Logging Architecture
Logging API
• POST /v3.0/logs
• Batch log messages in a single HTTP request
• Global, local, and mixed dimensions
  • Similar to dimensions in metrics
• JSON only
• Specification
  • https://github.com/openstack/monasca-log-api/blob/master/docs/monasca-log-api-spec.md
• Queries not done via API but via a tenantized version of Kibana
  • https://github.com/FujitsuEnablingSoftwareTechnologyGmbH/fts-keystone
Log Model
{
  "dimensions": {
    "hostname": "devstack",
    "service": "monitoring",
    "component": "monasca-api"
  },
  "logs": [
    {
      "message": "msg1",
      "dimensions": {
        "service": "compute",
        "component": "nova-api",
        "path": "/var/log/mysql.log"
      }
    },
    {
      "message": "msg2",
      "dimensions": {
        "path": "/var/log/monasca/monasca-api.log"
      }
    }
  ]
}
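A batch in this shape, with global dimensions on the envelope and local dimensions per message, could be assembled as sketched below. The helper names are made up, and the merge rule (a local dimension overrides a global one with the same key) is an assumption that should be confirmed in the log API specification.

```python
import json


def build_log_batch(global_dims, entries):
    """Build a log batch body; `entries` is an iterable of (message, local_dims)."""
    return {
        "dimensions": dict(global_dims),
        "logs": [{"message": msg, "dimensions": dict(dims)} for msg, dims in entries],
    }


def effective_dimensions(global_dims, local_dims):
    """Dimensions that apply to one message (local assumed to win on conflict)."""
    merged = dict(global_dims)
    merged.update(local_dims)
    return merged


global_dims = {"hostname": "devstack", "service": "monitoring", "component": "monasca-api"}
entries = [
    ("msg1", {"service": "compute", "component": "nova-api", "path": "/var/log/mysql.log"}),
    ("msg2", {"path": "/var/log/monasca/monasca-api.log"}),
]
batch = build_log_batch(global_dims, entries)
payload = json.dumps(batch)
msg1_dims = effective_dimensions(global_dims, entries[0][1])
msg2_dims = effective_dimensions(global_dims, entries[1][1])
```

Under that assumption `msg1` ends up tagged with the compute service while still inheriting the global hostname, and `msg2` keeps all global dimensions and only adds its path.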
Log Agents
• Logstash
  • https://github.com/logstash-plugins/logstash-output-monasca_log_api/pull/1
• Beaver
  • https://github.com/python-beaver/python-beaver/pull/406
• Logspout: Under Investigation
Kibana Integration
• Keystone authentication support for Kibana
bull Authentication pluginbull httpsgithubcomFujitsuEnablingSoftwareTechnologyGmbHfts-keystone
bull Note In progress of moving to official OpenStack repo
Composabilty LoggingMetrics
Transform and Analytics Engine
Monasca Transform
bull A new micro-service in Monasca that aggregates and transforms metrics
bull Currently based on Apache Spark Streaming
bull Use Casesbull Object Storage Disk Capacity
bull Object Storage Capacity
bull Compute Host Capacity
bull VM Capacity
bull More to come
bull Metrics are aggregated and published every hour
bull Currently in deployment in HPE Helion OpenStack 40
bull OpenStack projectrepobull httpsgithubcomopenstackmonasca-transform
Monasca Analytics
bull A framework that adds data science tools (parsers algorithms etc)
bull Features include bull Algorithmic flow definition enabling sharing of complex algorithmic recipes
bull Thin orchestration layer that instantiates an execution environment
bull Focused onbull Anomaly detection
bull Reducing alert fatigue via alarm clustering (unsupervised machine learning)
bull Example algorithms One Class SVM and LiNGAM
bull Status Under Development
bull OpenStack projectrepobull httpsgithubcomopenstackmonasca-analytics
Distributions amp Deployments
bull Charter Communicationsbull Monasca and Grafana is currently deployed in production private cloudbull Monitoring-as-a-Service Use cases supported with Grafana as the Visualization
Dashboardbull 2 datacenters 600-700 compute nodes 1000 VMs 11000 metricssec
bull FIWARE Labbull httpsuperuseropenstackorgarticlesmonitoring-a-multi-region-cloud-based-on-openstack
bull Hewlett Packard Enterprise Cloud System Helion OpenStackbull Supported and tested up to 65K metricssec injest rates
bull Fujitsubull FUJITSU Software ServerView Cloud Monitoring Manager
bull NECbull Planning to include Monasca in Cloud Solution Menus solution
bull Others
Statistics MitakaNewton Release
bull Organizations
bull Contributors
bull Commits
bull Reviews
bull Lines of code
31
97
1075
4080
215370
Ecosystem
bull Hewlett Packard Enterprise
bull Fujitsu
bull Charter Communications
bull NEC
bull Cisco
bull Cloudbase Solutions
bull SUSE
bull SolidFire
bull SAP
bull Cray Inc
bull FIWARE Lab
bull Mirantis
bull Broadcom
Containers and Kubernetes
bull New Monasca Agent Pluginsbull Docker plugin
bull cAdviser plugin
bull Kubernetes plugin Monitors both Kubernetes control plane and containers
bull Prometheus client plugin Scrapes apps
bull Mesos pugin
bull Containerization of Monasca
bull Heapster Monasca data sink
Next Steps
bull Containerizing Monasca
bull Monitoring containers and container managers such as Kubernetes
bull Grouping notifications
Metric Dimension Values
GET v20metricsdimensionsnamesvalues
bull List the dimension values
bull Query parametersbull Metric name
bull Dimension name
bull Offset limit
Alarm Definitions
POST GET v20alarm-definitions
bull Alarm definitions are templates that are used to automatically and dynamically create alarms based on matching metric names and dimensions
bull One alarm definition can result in zero or more alarms
bull Simple grammar for creating compound alarm expressionsbull avg(cpuuser_perc) gt 85 or avg(diskread_opsdevice=vda 120) gt 1000
bull Alarm states (OK ALARM and UNDETERMINED)
bull Actions associated with alarms for state transitions
bull User assigned severity (LOW MEDIUM HIGH CRITICAL)
bull Thresholds can be dynamically adjusted via PATCH
bull Minimal lifecycle management alarm_lifecycle_state and link
List Alarms
GET v20alarmsQuery parametersbull metric_name - Name of metric to filter bybull metric_dimensionsbull State OK ALARM or UNDETERMINEDbull Severity One or more severities to filter by separated with |
ex severity=LOW|MEDIUMbull state_updated_start_time The start time in ISO 8601 combined date and
time format in UTCbull Offset limitbull sort_by
Alarms
GET PUT PATCH DELETE v20alarmsalarm-id
bull Alarms created by the Threshold Engine based on matching alarm definitions
bull When new nodes or components are deployed alarms are automatically created
bull Alarms are resources within Monasca They have a resource ID and lifecycle
bull By default three states OK ALARM and UNDETERMINEDbull UNDETERMINED state occurs when metrics are no longer being received
bull Deterministic alarms two states OK and ALARMbull Used for systems where metrics are sporadic Eg Creating metrics when errors in log
files occur and no metrics when there arent any errors
Alarm Counts
GET v20alarmscount
bull Query the total number of alarms in the OK ALARM or UNDETERMINED state and their severities grouped by metrics dimension such as OpenStack service state and severity
bull Used for summary dashboards
Example Helion Ops Console
Alarm History
GET v20alarmsstate-history
bull Lists the alarm state history for alarms
bull Query Parametersbull Dimensions to filter on
bull Startend timestamp
bull Offset limit
GET v20alarmsalarm-idstate-history
bull Lists the alarm state history for a specific alarm
Notification Methods
POST GET DELETE v20notification-methods
Notification methods are associated with Actions in alarm definitions
Example
POST v20notification-methods
nameName of notification method
typeEMAIL
addressjohndoehpcom
Monasca Agent
bull System metrics (cpu memory network filesystem hellip)
bull Service metricsbull MySQL Kafka and many others
bull Application metricsbull Built-in Statsd daemonbull Python monasca-statsd library Adds support for dimensions
bull VM system metrics
bull Open vSwitch metrics
bull Active checksbull HTTP status checks and response timesbull System updown checks (ping and ssh)
bull Runs any Nagios plugin or check_mk
bull ExtensiblePluggable Additional services can be easily added
Agent details
bull The Agent Forwarder buffers metrics for a short time to increase the size of the http request body (number of metrics) sent to the Monasca API
bull The Agent request an auth token from the Keystone Identity service which is supplied on all requests
bull The Monasca Agent and API caches Monasca Agent and API caches Monasca Agent and API caches auth tokens in-memory to reduce the round-trip authorization requests to Keystone
bull If network connectivity between the Agent and API occurs the Agent will buffer metrics and send when connectivity is restored
bull Metrics are submitted using a ldquoagentrdquo role which only allows metrics to be POSTrsquod to the metrics endpoint
GrafanaMonasca Integration
bull Datasource A datasource that can be added to the Grafana dashboard to enable Monasca
bull httpsgithubcomopenstackmonasca-grafana-datasource
bull Keystone authenticationbull httpsgithubcomtwc-openstackgrafana
bull Support for Alerting will be added in Grafana 4
Grafana Monasca Data Source
Logging Architecture
Logging API
bull POST v30logs
bull Batch log messages in a single http request
bull Global local mixed dimensionsbull Similar to dimensions in metrics
bull JSON only
bull Specificationbull httpsgithubcomopenstackmonasca-log-apiblobmasterdocsmonasca-
log-api-specmd
bull Queries not done via API but via Tenantized version of Kibanabull httpsgithubcomFujitsuEnablingSoftwareTechnologyGmbHfts-keystone
Log Model
bull dimensions
hostnamedevstack
servicemonitoring
componentmonasca-api
logs[
messagemsg1
dimensions
servicecompute
componentnova-api
pathvarlogmysqllog
messagemsg2
dimensions
pathvarlogmonascamonasca-apilog
]
Log Agents
bull Logstashbull httpsgithubcomlogstash-pluginslogstash-output-monasca_log_apipull1
bull Beaverbull httpsgithubcompython-beaverpython-beaverpull406
bull Logspout Under Investigation
Kibana Integration
bull Keystone authentication support for Kibana
bull Authentication pluginbull httpsgithubcomFujitsuEnablingSoftwareTechnologyGmbHfts-keystone
bull Note In progress of moving to official OpenStack repo
Composabilty LoggingMetrics
Transform and Analytics Engine
Monasca Transform
bull A new micro-service in Monasca that aggregates and transforms metrics
bull Currently based on Apache Spark Streaming
bull Use Casesbull Object Storage Disk Capacity
bull Object Storage Capacity
bull Compute Host Capacity
bull VM Capacity
bull More to come
bull Metrics are aggregated and published every hour
bull Currently in deployment in HPE Helion OpenStack 40
bull OpenStack projectrepobull httpsgithubcomopenstackmonasca-transform
Monasca Analytics
bull A framework that adds data science tools (parsers algorithms etc)
bull Features include bull Algorithmic flow definition enabling sharing of complex algorithmic recipes
bull Thin orchestration layer that instantiates an execution environment
bull Focused onbull Anomaly detection
bull Reducing alert fatigue via alarm clustering (unsupervised machine learning)
bull Example algorithms One Class SVM and LiNGAM
bull Status Under Development
bull OpenStack projectrepobull httpsgithubcomopenstackmonasca-analytics
Distributions amp Deployments
bull Charter Communicationsbull Monasca and Grafana is currently deployed in production private cloudbull Monitoring-as-a-Service Use cases supported with Grafana as the Visualization
Dashboardbull 2 datacenters 600-700 compute nodes 1000 VMs 11000 metricssec
bull FIWARE Labbull httpsuperuseropenstackorgarticlesmonitoring-a-multi-region-cloud-based-on-openstack
bull Hewlett Packard Enterprise Cloud System Helion OpenStackbull Supported and tested up to 65K metricssec injest rates
bull Fujitsubull FUJITSU Software ServerView Cloud Monitoring Manager
bull NECbull Planning to include Monasca in Cloud Solution Menus solution
bull Others
Statistics MitakaNewton Release
bull Organizations
bull Contributors
bull Commits
bull Reviews
bull Lines of code
31
97
1075
4080
215370
Ecosystem
bull Hewlett Packard Enterprise
bull Fujitsu
bull Charter Communications
bull NEC
bull Cisco
bull Cloudbase Solutions
bull SUSE
bull SolidFire
bull SAP
bull Cray Inc
bull FIWARE Lab
bull Mirantis
bull Broadcom
Containers and Kubernetes
bull New Monasca Agent Pluginsbull Docker plugin
bull cAdviser plugin
bull Kubernetes plugin Monitors both Kubernetes control plane and containers
bull Prometheus client plugin Scrapes apps
bull Mesos pugin
bull Containerization of Monasca
bull Heapster Monasca data sink
Next Steps
bull Containerizing Monasca
bull Monitoring containers and container managers such as Kubernetes
bull Grouping notifications
Alarm Definitions
POST GET v20alarm-definitions
bull Alarm definitions are templates that are used to automatically and dynamically create alarms based on matching metric names and dimensions
bull One alarm definition can result in zero or more alarms
bull Simple grammar for creating compound alarm expressionsbull avg(cpuuser_perc) gt 85 or avg(diskread_opsdevice=vda 120) gt 1000
bull Alarm states (OK ALARM and UNDETERMINED)
bull Actions associated with alarms for state transitions
bull User assigned severity (LOW MEDIUM HIGH CRITICAL)
bull Thresholds can be dynamically adjusted via PATCH
bull Minimal lifecycle management alarm_lifecycle_state and link
List Alarms
GET v20alarmsQuery parametersbull metric_name - Name of metric to filter bybull metric_dimensionsbull State OK ALARM or UNDETERMINEDbull Severity One or more severities to filter by separated with |
ex severity=LOW|MEDIUMbull state_updated_start_time The start time in ISO 8601 combined date and
time format in UTCbull Offset limitbull sort_by
Alarms
GET PUT PATCH DELETE v20alarmsalarm-id
bull Alarms created by the Threshold Engine based on matching alarm definitions
bull When new nodes or components are deployed alarms are automatically created
bull Alarms are resources within Monasca They have a resource ID and lifecycle
bull By default three states OK ALARM and UNDETERMINEDbull UNDETERMINED state occurs when metrics are no longer being received
bull Deterministic alarms two states OK and ALARMbull Used for systems where metrics are sporadic Eg Creating metrics when errors in log
files occur and no metrics when there arent any errors
Alarm Counts
GET v20alarmscount
bull Query the total number of alarms in the OK ALARM or UNDETERMINED state and their severities grouped by metrics dimension such as OpenStack service state and severity
bull Used for summary dashboards
Example Helion Ops Console
Alarm History
GET v20alarmsstate-history
bull Lists the alarm state history for alarms
bull Query Parametersbull Dimensions to filter on
bull Startend timestamp
bull Offset limit
GET v20alarmsalarm-idstate-history
bull Lists the alarm state history for a specific alarm
Notification Methods
POST GET DELETE v20notification-methods
Notification methods are associated with Actions in alarm definitions
Example
POST v20notification-methods
nameName of notification method
typeEMAIL
addressjohndoehpcom
Monasca Agent
bull System metrics (cpu memory network filesystem hellip)
bull Service metricsbull MySQL Kafka and many others
bull Application metricsbull Built-in Statsd daemonbull Python monasca-statsd library Adds support for dimensions
bull VM system metrics
bull Open vSwitch metrics
bull Active checksbull HTTP status checks and response timesbull System updown checks (ping and ssh)
bull Runs any Nagios plugin or check_mk
bull ExtensiblePluggable Additional services can be easily added
Agent details
bull The Agent Forwarder buffers metrics for a short time to increase the size of the http request body (number of metrics) sent to the Monasca API
bull The Agent request an auth token from the Keystone Identity service which is supplied on all requests
bull The Monasca Agent and API caches Monasca Agent and API caches Monasca Agent and API caches auth tokens in-memory to reduce the round-trip authorization requests to Keystone
bull If network connectivity between the Agent and API occurs the Agent will buffer metrics and send when connectivity is restored
bull Metrics are submitted using a ldquoagentrdquo role which only allows metrics to be POSTrsquod to the metrics endpoint
GrafanaMonasca Integration
bull Datasource A datasource that can be added to the Grafana dashboard to enable Monasca
bull httpsgithubcomopenstackmonasca-grafana-datasource
bull Keystone authenticationbull httpsgithubcomtwc-openstackgrafana
bull Support for Alerting will be added in Grafana 4
Grafana Monasca Data Source
Logging Architecture
Logging API
bull POST v30logs
bull Batch log messages in a single http request
bull Global local mixed dimensionsbull Similar to dimensions in metrics
bull JSON only
bull Specificationbull httpsgithubcomopenstackmonasca-log-apiblobmasterdocsmonasca-
log-api-specmd
bull Queries not done via API but via Tenantized version of Kibanabull httpsgithubcomFujitsuEnablingSoftwareTechnologyGmbHfts-keystone
Log Model
bull dimensions
hostnamedevstack
servicemonitoring
componentmonasca-api
logs[
messagemsg1
dimensions
servicecompute
componentnova-api
pathvarlogmysqllog
messagemsg2
dimensions
pathvarlogmonascamonasca-apilog
]
Log Agents
bull Logstashbull httpsgithubcomlogstash-pluginslogstash-output-monasca_log_apipull1
bull Beaverbull httpsgithubcompython-beaverpython-beaverpull406
bull Logspout Under Investigation
Kibana Integration
bull Keystone authentication support for Kibana
bull Authentication pluginbull httpsgithubcomFujitsuEnablingSoftwareTechnologyGmbHfts-keystone
bull Note In progress of moving to official OpenStack repo
Composabilty LoggingMetrics
Transform and Analytics Engine
Monasca Transform
bull A new micro-service in Monasca that aggregates and transforms metrics
bull Currently based on Apache Spark Streaming
bull Use Casesbull Object Storage Disk Capacity
bull Object Storage Capacity
bull Compute Host Capacity
bull VM Capacity
bull More to come
bull Metrics are aggregated and published every hour
bull Currently in deployment in HPE Helion OpenStack 40
bull OpenStack projectrepobull httpsgithubcomopenstackmonasca-transform
Monasca Analytics
bull A framework that adds data science tools (parsers algorithms etc)
bull Features include bull Algorithmic flow definition enabling sharing of complex algorithmic recipes
bull Thin orchestration layer that instantiates an execution environment
bull Focused onbull Anomaly detection
bull Reducing alert fatigue via alarm clustering (unsupervised machine learning)
bull Example algorithms One Class SVM and LiNGAM
bull Status Under Development
bull OpenStack projectrepobull httpsgithubcomopenstackmonasca-analytics
Distributions amp Deployments
bull Charter Communicationsbull Monasca and Grafana is currently deployed in production private cloudbull Monitoring-as-a-Service Use cases supported with Grafana as the Visualization
Dashboardbull 2 datacenters 600-700 compute nodes 1000 VMs 11000 metricssec
bull FIWARE Labbull httpsuperuseropenstackorgarticlesmonitoring-a-multi-region-cloud-based-on-openstack
bull Hewlett Packard Enterprise Cloud System Helion OpenStackbull Supported and tested up to 65K metricssec injest rates
bull Fujitsubull FUJITSU Software ServerView Cloud Monitoring Manager
bull NECbull Planning to include Monasca in Cloud Solution Menus solution
bull Others
Statistics MitakaNewton Release
bull Organizations
bull Contributors
bull Commits
bull Reviews
bull Lines of code
31
97
1075
4080
215370
Ecosystem
bull Hewlett Packard Enterprise
bull Fujitsu
bull Charter Communications
bull NEC
bull Cisco
bull Cloudbase Solutions
bull SUSE
bull SolidFire
bull SAP
bull Cray Inc
bull FIWARE Lab
bull Mirantis
bull Broadcom
Containers and Kubernetes
bull New Monasca Agent Pluginsbull Docker plugin
bull cAdviser plugin
bull Kubernetes plugin Monitors both Kubernetes control plane and containers
bull Prometheus client plugin Scrapes apps
bull Mesos pugin
bull Containerization of Monasca
bull Heapster Monasca data sink
Next Steps
bull Containerizing Monasca
bull Monitoring containers and container managers such as Kubernetes
bull Grouping notifications
List Alarms
GET v20alarmsQuery parametersbull metric_name - Name of metric to filter bybull metric_dimensionsbull State OK ALARM or UNDETERMINEDbull Severity One or more severities to filter by separated with |
ex severity=LOW|MEDIUMbull state_updated_start_time The start time in ISO 8601 combined date and
time format in UTCbull Offset limitbull sort_by
Alarms
GET PUT PATCH DELETE v20alarmsalarm-id
bull Alarms created by the Threshold Engine based on matching alarm definitions
bull When new nodes or components are deployed alarms are automatically created
bull Alarms are resources within Monasca They have a resource ID and lifecycle
bull By default three states OK ALARM and UNDETERMINEDbull UNDETERMINED state occurs when metrics are no longer being received
bull Deterministic alarms two states OK and ALARMbull Used for systems where metrics are sporadic Eg Creating metrics when errors in log
files occur and no metrics when there arent any errors
Alarm Counts
GET v20alarmscount
bull Query the total number of alarms in the OK ALARM or UNDETERMINED state and their severities grouped by metrics dimension such as OpenStack service state and severity
bull Used for summary dashboards
Example Helion Ops Console
Alarm History
GET v20alarmsstate-history
bull Lists the alarm state history for alarms
bull Query Parametersbull Dimensions to filter on
bull Startend timestamp
bull Offset limit
GET v20alarmsalarm-idstate-history
bull Lists the alarm state history for a specific alarm
Notification Methods
POST GET DELETE v20notification-methods
Notification methods are associated with Actions in alarm definitions
Example
POST v20notification-methods
nameName of notification method
typeEMAIL
addressjohndoehpcom
Monasca Agent
bull System metrics (cpu memory network filesystem hellip)
bull Service metricsbull MySQL Kafka and many others
bull Application metricsbull Built-in Statsd daemonbull Python monasca-statsd library Adds support for dimensions
bull VM system metrics
bull Open vSwitch metrics
bull Active checksbull HTTP status checks and response timesbull System updown checks (ping and ssh)
bull Runs any Nagios plugin or check_mk
bull ExtensiblePluggable Additional services can be easily added
Agent details
bull The Agent Forwarder buffers metrics for a short time to increase the size of the http request body (number of metrics) sent to the Monasca API
bull The Agent request an auth token from the Keystone Identity service which is supplied on all requests
bull The Monasca Agent and API caches Monasca Agent and API caches Monasca Agent and API caches auth tokens in-memory to reduce the round-trip authorization requests to Keystone
bull If network connectivity between the Agent and API occurs the Agent will buffer metrics and send when connectivity is restored
bull Metrics are submitted using a ldquoagentrdquo role which only allows metrics to be POSTrsquod to the metrics endpoint
GrafanaMonasca Integration
bull Datasource A datasource that can be added to the Grafana dashboard to enable Monasca
bull httpsgithubcomopenstackmonasca-grafana-datasource
bull Keystone authenticationbull httpsgithubcomtwc-openstackgrafana
bull Support for Alerting will be added in Grafana 4
Grafana Monasca Data Source
Logging Architecture
Logging API
bull POST v30logs
bull Batch log messages in a single http request
bull Global local mixed dimensionsbull Similar to dimensions in metrics
bull JSON only
bull Specificationbull httpsgithubcomopenstackmonasca-log-apiblobmasterdocsmonasca-
log-api-specmd
bull Queries not done via API but via Tenantized version of Kibanabull httpsgithubcomFujitsuEnablingSoftwareTechnologyGmbHfts-keystone
Log Model
bull dimensions
hostnamedevstack
servicemonitoring
componentmonasca-api
logs[
messagemsg1
dimensions
servicecompute
componentnova-api
pathvarlogmysqllog
messagemsg2
dimensions
pathvarlogmonascamonasca-apilog
]
Log Agents
bull Logstashbull httpsgithubcomlogstash-pluginslogstash-output-monasca_log_apipull1
bull Beaverbull httpsgithubcompython-beaverpython-beaverpull406
bull Logspout Under Investigation
Kibana Integration
bull Keystone authentication support for Kibana
bull Authentication pluginbull httpsgithubcomFujitsuEnablingSoftwareTechnologyGmbHfts-keystone
bull Note In progress of moving to official OpenStack repo
Composabilty LoggingMetrics
Transform and Analytics Engine
Monasca Transform
bull A new micro-service in Monasca that aggregates and transforms metrics
bull Currently based on Apache Spark Streaming
bull Use Casesbull Object Storage Disk Capacity
bull Object Storage Capacity
bull Compute Host Capacity
bull VM Capacity
bull More to come
bull Metrics are aggregated and published every hour
bull Currently in deployment in HPE Helion OpenStack 40
bull OpenStack projectrepobull httpsgithubcomopenstackmonasca-transform
Monasca Analytics
bull A framework that adds data science tools (parsers algorithms etc)
bull Features include bull Algorithmic flow definition enabling sharing of complex algorithmic recipes
bull Thin orchestration layer that instantiates an execution environment
bull Focused onbull Anomaly detection
bull Reducing alert fatigue via alarm clustering (unsupervised machine learning)
bull Example algorithms One Class SVM and LiNGAM
bull Status Under Development
bull OpenStack projectrepobull httpsgithubcomopenstackmonasca-analytics
Distributions amp Deployments
bull Charter Communicationsbull Monasca and Grafana is currently deployed in production private cloudbull Monitoring-as-a-Service Use cases supported with Grafana as the Visualization
Dashboardbull 2 datacenters 600-700 compute nodes 1000 VMs 11000 metricssec
bull FIWARE Labbull httpsuperuseropenstackorgarticlesmonitoring-a-multi-region-cloud-based-on-openstack
bull Hewlett Packard Enterprise Cloud System Helion OpenStackbull Supported and tested up to 65K metricssec injest rates
bull Fujitsubull FUJITSU Software ServerView Cloud Monitoring Manager
bull NECbull Planning to include Monasca in Cloud Solution Menus solution
bull Others
Statistics MitakaNewton Release
bull Organizations
bull Contributors
bull Commits
bull Reviews
bull Lines of code
31
97
1075
4080
215370
Ecosystem
bull Hewlett Packard Enterprise
bull Fujitsu
bull Charter Communications
bull NEC
bull Cisco
bull Cloudbase Solutions
bull SUSE
bull SolidFire
bull SAP
bull Cray Inc
bull FIWARE Lab
bull Mirantis
bull Broadcom
Containers and Kubernetes
bull New Monasca Agent Pluginsbull Docker plugin
bull cAdviser plugin
bull Kubernetes plugin Monitors both Kubernetes control plane and containers
bull Prometheus client plugin Scrapes apps
bull Mesos pugin
bull Containerization of Monasca
bull Heapster Monasca data sink
Next Steps
bull Containerizing Monasca
bull Monitoring containers and container managers such as Kubernetes
bull Grouping notifications
Alarms
GET PUT PATCH DELETE v20alarmsalarm-id
bull Alarms created by the Threshold Engine based on matching alarm definitions
bull When new nodes or components are deployed alarms are automatically created
bull Alarms are resources within Monasca They have a resource ID and lifecycle
bull By default three states OK ALARM and UNDETERMINEDbull UNDETERMINED state occurs when metrics are no longer being received
bull Deterministic alarms two states OK and ALARMbull Used for systems where metrics are sporadic Eg Creating metrics when errors in log
files occur and no metrics when there arent any errors
Alarm Counts
GET v20alarmscount
bull Query the total number of alarms in the OK ALARM or UNDETERMINED state and their severities grouped by metrics dimension such as OpenStack service state and severity
bull Used for summary dashboards
Example Helion Ops Console
Alarm History
GET v20alarmsstate-history
bull Lists the alarm state history for alarms
bull Query Parametersbull Dimensions to filter on
bull Startend timestamp
bull Offset limit
GET v20alarmsalarm-idstate-history
bull Lists the alarm state history for a specific alarm
Notification Methods
POST GET DELETE v20notification-methods
Notification methods are associated with Actions in alarm definitions
Example
POST v20notification-methods
nameName of notification method
typeEMAIL
addressjohndoehpcom
Monasca Agent
• System metrics (CPU, memory, network, filesystem, …)
• Service metrics
  • MySQL, Kafka, and many others
• Application metrics
  • Built-in StatsD daemon
  • Python monasca-statsd library: adds support for dimensions
• VM system metrics
• Open vSwitch metrics
• Active checks
  • HTTP status checks and response times
  • System up/down checks (ping and ssh)
• Runs any Nagios plugin or check_mk
• Extensible/pluggable: additional services can be easily added
Agent details
• The Agent Forwarder buffers metrics for a short time to increase the size of the HTTP request body (the number of metrics) sent to the Monasca API
• The Agent requests an auth token from the Keystone Identity service, which is supplied on all requests
• The Monasca Agent and API cache auth tokens in memory to reduce round-trip authorization requests to Keystone
• If network connectivity between the Agent and API is lost, the Agent buffers metrics and sends them when connectivity is restored
• Metrics are submitted using an "agent" role, which only allows metrics to be POSTed to the metrics endpoint
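The buffering behaviour described above can be sketched as a small class that batches metrics and keeps them when a send fails. This is not the Monasca Agent's implementation: the class `BufferingForwarder`, its thresholds, and the `send` callback (which stands in for the HTTP POST to the API) are hypothetical.

```python
import time
from typing import Callable, List


class BufferingForwarder:
    """Batch metrics and retain them while the API is unreachable (sketch)."""

    def __init__(self, send: Callable[[List[dict]], bool],
                 max_batch: int = 100, max_wait_s: float = 10.0,
                 clock: Callable[[], float] = time.monotonic):
        self._send = send              # returns True when the batch was accepted
        self._max_batch = max_batch
        self._max_wait_s = max_wait_s
        self._clock = clock
        self._buffer: List[dict] = []
        self._oldest = 0.0             # arrival time of the oldest buffered metric

    @property
    def pending(self) -> int:
        return len(self._buffer)

    def submit(self, metric: dict) -> None:
        if not self._buffer:
            self._oldest = self._clock()
        self._buffer.append(metric)
        too_many = len(self._buffer) >= self._max_batch
        too_old = self._clock() - self._oldest >= self._max_wait_s
        if too_many or too_old:
            self.flush()

    def flush(self) -> bool:
        if not self._buffer:
            return True
        if self._send(list(self._buffer)):
            self._buffer.clear()
            return True
        # Send failed (e.g. connectivity lost): keep the metrics and
        # retry on the next flush.
        return False
```

A real forwarder would also cap the buffer size so that a long outage cannot exhaust memory; that policy is left out here.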
Grafana/Monasca Integration
• Datasource: a data source that can be added to the Grafana dashboard to enable Monasca
  • https://github.com/openstack/monasca-grafana-datasource
• Keystone authentication
  • https://github.com/twc-openstack/grafana
• Support for Alerting will be added in Grafana 4
Grafana Monasca Data Source
Logging Architecture
Logging API
• POST /v3.0/logs
• Batch log messages in a single HTTP request
• Global, local, and mixed dimensions
  • Similar to dimensions in metrics
• JSON only
• Specification
  • https://github.com/openstack/monasca-log-api/blob/master/docs/monasca-log-api-spec.md
• Queries are not done via the API but via a tenantized version of Kibana
  • https://github.com/FujitsuEnablingSoftwareTechnologyGmbH/fts-keystone
Log Model
{
  "dimensions": {
    "hostname": "devstack",
    "service": "monitoring",
    "component": "monasca-api"
  },
  "logs": [
    {
      "message": "msg1",
      "dimensions": {
        "service": "compute",
        "component": "nova-api",
        "path": "/var/log/mysql.log"
      }
    },
    {
      "message": "msg2",
      "dimensions": {
        "path": "/var/log/monasca/monasca-api.log"
      }
    }
  ]
}
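One way to read the model above is that each log entry ends up with the global dimensions plus its own local ones. The sketch below shows that merge; the function `effective_dimensions` is hypothetical, and the rule that a local dimension overrides a global one with the same key is an assumption to verify against the Log API specification.

```python
def effective_dimensions(request: dict) -> list:
    """Combine global and per-entry dimensions for each log message (sketch)."""
    global_dims = request.get("dimensions", {})
    merged_entries = []
    for entry in request.get("logs", []):
        merged = dict(global_dims)
        # Assumption: local dimensions take precedence over global ones.
        merged.update(entry.get("dimensions", {}))
        merged_entries.append({"message": entry["message"],
                               "dimensions": merged})
    return merged_entries
```

Applied to the example request, "msg1" would carry service "compute" and component "nova-api" (local values), while "msg2" would inherit service "monitoring" and component "monasca-api" from the global block; both keep hostname "devstack".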
Log Agents
• Logstash
  • https://github.com/logstash-plugins/logstash-output-monasca_log_api/pull/1
• Beaver
  • https://github.com/python-beaver/python-beaver/pull/406
• Logspout: under investigation
Kibana Integration
• Keystone authentication support for Kibana
• Authentication plugin
  • https://github.com/FujitsuEnablingSoftwareTechnologyGmbH/fts-keystone
• Note: in the process of moving to the official OpenStack repo
Composability: Logging/Metrics
Transform and Analytics Engine
Monasca Transform
• A new microservice in Monasca that aggregates and transforms metrics
• Currently based on Apache Spark Streaming
• Use cases
  • Object Storage Disk Capacity
  • Object Storage Capacity
  • Compute Host Capacity
  • VM Capacity
  • More to come
• Metrics are aggregated and published every hour
• Currently deployed in HPE Helion OpenStack 4.0
• OpenStack project/repo
  • https://github.com/openstack/monasca-transform
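The idea of hourly aggregation can be shown with a plain-Python stand-in. This is not Monasca Transform or Spark Streaming code: the function `aggregate_hourly`, the tuple layout of a measurement, and the choice of a sum as the aggregate are assumptions made for illustration.

```python
from collections import defaultdict


def aggregate_hourly(measurements):
    """Sum measurement values per metric name and hour (sketch).

    measurements is an iterable of (metric_name, unix_timestamp_seconds,
    value) tuples. The result maps (metric_name, hour_start_timestamp)
    to the summed value for that hour.
    """
    sums = defaultdict(float)
    for name, timestamp, value in measurements:
        # Truncate the timestamp to the start of its hour.
        hour_start = int(timestamp) - int(timestamp) % 3600
        sums[(name, hour_start)] += value
    return dict(sums)
```

Each key of the result corresponds to one aggregated metric that would be published for that hour.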
Monasca Analytics
• A framework that adds data science tools (parsers, algorithms, etc.)
• Features include
  • Algorithmic flow definition, enabling sharing of complex algorithmic recipes
  • Thin orchestration layer that instantiates an execution environment
• Focused on
  • Anomaly detection
  • Reducing alert fatigue via alarm clustering (unsupervised machine learning)
• Example algorithms: One-Class SVM and LiNGAM
• Status: under development
• OpenStack project/repo
  • https://github.com/openstack/monasca-analytics
Distributions & Deployments
• Charter Communications
  • Monasca and Grafana are currently deployed in a production private cloud
  • Monitoring-as-a-Service use cases supported, with Grafana as the visualization dashboard
  • 2 datacenters, 600-700 compute nodes, 1000 VMs, 11000 metrics/sec
• FIWARE Lab
  • http://superuser.openstack.org/articles/monitoring-a-multi-region-cloud-based-on-openstack
• Hewlett Packard Enterprise: Cloud System, Helion OpenStack
  • Supported and tested up to 65K metrics/sec ingest rates
• Fujitsu
  • FUJITSU Software ServerView Cloud Monitoring Manager
• NEC
  • Planning to include Monasca in the Cloud Solution Menus solution
• Others
Statistics: Mitaka/Newton Release
• Organizations: 31
• Contributors: 97
• Commits: 1075
• Reviews: 4080
• Lines of code: 215370
Ecosystem
• Hewlett Packard Enterprise
• Fujitsu
• Charter Communications
• NEC
• Cisco
• Cloudbase Solutions
• SUSE
• SolidFire
• SAP
• Cray Inc.
• FIWARE Lab
• Mirantis
• Broadcom
Containers and Kubernetes
• New Monasca Agent plugins
  • Docker plugin
  • cAdvisor plugin
  • Kubernetes plugin: monitors both the Kubernetes control plane and containers
  • Prometheus client plugin: scrapes apps
  • Mesos plugin
• Containerization of Monasca
• Heapster Monasca data sink
Next Steps
• Containerizing Monasca
• Monitoring containers and container managers such as Kubernetes
• Grouping notifications
Monasca Agent
bull System metrics (cpu memory network filesystem hellip)
bull Service metricsbull MySQL Kafka and many others
bull Application metricsbull Built-in Statsd daemonbull Python monasca-statsd library Adds support for dimensions
bull VM system metrics
bull Open vSwitch metrics
bull Active checksbull HTTP status checks and response timesbull System updown checks (ping and ssh)
bull Runs any Nagios plugin or check_mk
bull ExtensiblePluggable Additional services can be easily added
Agent details
bull The Agent Forwarder buffers metrics for a short time to increase the size of the http request body (number of metrics) sent to the Monasca API
bull The Agent request an auth token from the Keystone Identity service which is supplied on all requests
bull The Monasca Agent and API caches Monasca Agent and API caches Monasca Agent and API caches auth tokens in-memory to reduce the round-trip authorization requests to Keystone
bull If network connectivity between the Agent and API occurs the Agent will buffer metrics and send when connectivity is restored
bull Metrics are submitted using a ldquoagentrdquo role which only allows metrics to be POSTrsquod to the metrics endpoint
GrafanaMonasca Integration
bull Datasource A datasource that can be added to the Grafana dashboard to enable Monasca
bull httpsgithubcomopenstackmonasca-grafana-datasource
bull Keystone authenticationbull httpsgithubcomtwc-openstackgrafana
bull Support for Alerting will be added in Grafana 4
Grafana Monasca Data Source
Logging Architecture
Logging API
bull POST v30logs
bull Batch log messages in a single http request
bull Global local mixed dimensionsbull Similar to dimensions in metrics
bull JSON only
bull Specificationbull httpsgithubcomopenstackmonasca-log-apiblobmasterdocsmonasca-
log-api-specmd
bull Queries not done via API but via Tenantized version of Kibanabull httpsgithubcomFujitsuEnablingSoftwareTechnologyGmbHfts-keystone
Log Model
bull dimensions
hostnamedevstack
servicemonitoring
componentmonasca-api
logs[
messagemsg1
dimensions
servicecompute
componentnova-api
pathvarlogmysqllog
messagemsg2
dimensions
pathvarlogmonascamonasca-apilog
]
Log Agents
bull Logstashbull httpsgithubcomlogstash-pluginslogstash-output-monasca_log_apipull1
bull Beaverbull httpsgithubcompython-beaverpython-beaverpull406
bull Logspout Under Investigation
Kibana Integration
bull Keystone authentication support for Kibana
bull Authentication pluginbull httpsgithubcomFujitsuEnablingSoftwareTechnologyGmbHfts-keystone
bull Note In progress of moving to official OpenStack repo
Composabilty LoggingMetrics
Transform and Analytics Engine
Monasca Transform
bull A new micro-service in Monasca that aggregates and transforms metrics
bull Currently based on Apache Spark Streaming
bull Use Casesbull Object Storage Disk Capacity
bull Object Storage Capacity
bull Compute Host Capacity
bull VM Capacity
bull More to come
bull Metrics are aggregated and published every hour
bull Currently in deployment in HPE Helion OpenStack 40
bull OpenStack projectrepobull httpsgithubcomopenstackmonasca-transform
Monasca Analytics
bull A framework that adds data science tools (parsers algorithms etc)
bull Features include bull Algorithmic flow definition enabling sharing of complex algorithmic recipes
bull Thin orchestration layer that instantiates an execution environment
bull Focused onbull Anomaly detection
bull Reducing alert fatigue via alarm clustering (unsupervised machine learning)
bull Example algorithms One Class SVM and LiNGAM
bull Status Under Development
bull OpenStack projectrepobull httpsgithubcomopenstackmonasca-analytics
Distributions amp Deployments
bull Charter Communicationsbull Monasca and Grafana is currently deployed in production private cloudbull Monitoring-as-a-Service Use cases supported with Grafana as the Visualization
Dashboardbull 2 datacenters 600-700 compute nodes 1000 VMs 11000 metricssec
bull FIWARE Labbull httpsuperuseropenstackorgarticlesmonitoring-a-multi-region-cloud-based-on-openstack
bull Hewlett Packard Enterprise Cloud System Helion OpenStackbull Supported and tested up to 65K metricssec injest rates
bull Fujitsubull FUJITSU Software ServerView Cloud Monitoring Manager
bull NECbull Planning to include Monasca in Cloud Solution Menus solution
bull Others
Statistics MitakaNewton Release
bull Organizations
bull Contributors
bull Commits
bull Reviews
bull Lines of code
31
97
1075
4080
215370
Ecosystem
bull Hewlett Packard Enterprise
bull Fujitsu
bull Charter Communications
bull NEC
bull Cisco
bull Cloudbase Solutions
bull SUSE
bull SolidFire
bull SAP
bull Cray Inc
bull FIWARE Lab
bull Mirantis
bull Broadcom
Containers and Kubernetes
bull New Monasca Agent Pluginsbull Docker plugin
bull cAdviser plugin
bull Kubernetes plugin Monitors both Kubernetes control plane and containers
bull Prometheus client plugin Scrapes apps
bull Mesos pugin
bull Containerization of Monasca
bull Heapster Monasca data sink
Next Steps
• Containerizing Monasca
• Monitoring containers and container managers such as Kubernetes
• Grouping notifications
Agent details
• The Agent Forwarder buffers metrics for a short time to increase the size of the HTTP request body (number of metrics) sent to the Monasca API
• The Agent requests an auth token from the Keystone Identity service, which is supplied on all requests
• The Monasca Agent and API cache auth tokens in memory to reduce round-trip authorization requests to Keystone
• If network connectivity between the Agent and API is lost, the Agent buffers metrics and sends them when connectivity is restored
• Metrics are submitted using an "agent" role, which only allows metrics to be POSTed to the metrics endpoint
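The buffering and token-caching behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the Monasca Agent's implementation: the class names `TokenCache` and `Forwarder`, the `fetch_token` and `send` callables, and the `batch_size` parameter are all hypothetical stand-ins for the real Keystone and Monasca API calls.

```python
import time
from typing import Callable, Dict, List, Optional


class TokenCache:
    """Keeps an auth token in memory and refetches it only after it expires."""

    def __init__(self, fetch_token: Callable[[], tuple], clock: Callable[[], float] = time.time):
        self._fetch_token = fetch_token  # hypothetical: returns (token, expires_at_epoch_seconds)
        self._clock = clock
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def get(self) -> str:
        # Contact the identity service only when no valid token is cached.
        if self._token is None or self._clock() >= self._expires_at:
            self._token, self._expires_at = self._fetch_token()
        return self._token


class Forwarder:
    """Buffers metrics, sends them in batches, and keeps them if sending fails."""

    def __init__(self, send: Callable[[List[Dict], str], bool], tokens: TokenCache, batch_size: int = 3):
        self._send = send  # hypothetical transport: returns True on success
        self._tokens = tokens
        self._batch_size = batch_size
        self._buffer: List[Dict] = []

    def submit(self, metric: Dict) -> None:
        self._buffer.append(metric)
        if len(self._buffer) >= self._batch_size:
            self.flush()

    def flush(self) -> bool:
        if not self._buffer:
            return True
        try:
            ok = self._send(list(self._buffer), self._tokens.get())
        except ConnectionError:
            ok = False  # connectivity lost: keep the metrics for the next attempt
        if ok:
            self._buffer.clear()
        return ok

    @property
    def pending(self) -> int:
        return len(self._buffer)
```

Because a failed send leaves the buffer untouched, the next successful flush delivers everything accumulated during the outage in a single request. A real agent would also bound the buffer size, which this sketch omits.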
Grafana/Monasca Integration
• Datasource: a datasource that can be added to the Grafana dashboard to enable Monasca
  • https://github.com/openstack/monasca-grafana-datasource
• Keystone authentication
  • https://github.com/twc-openstack/grafana
• Support for alerting will be added in Grafana 4
Grafana Monasca Data Source
Logging Architecture
Logging API
• POST /v3.0/logs
• Batch log messages in a single HTTP request
• Global, local, and mixed dimensions
  • Similar to dimensions in metrics
• JSON only
• Specification
  • https://github.com/openstack/monasca-log-api/blob/master/docs/monasca-log-api-spec.md
• Queries are not done via the API but via a tenantized version of Kibana
  • https://github.com/FujitsuEnablingSoftwareTechnologyGmbH/fts-keystone
Log Model
{
  "dimensions": {
    "hostname": "devstack",
    "service": "monitoring",
    "component": "monasca-api"
  },
  "logs": [
    {
      "message": "msg1",
      "dimensions": {
        "service": "compute",
        "component": "nova-api",
        "path": "/var/log/mysql.log"
      }
    },
    {
      "message": "msg2",
      "dimensions": {
        "path": "/var/log/monasca/monasca-api.log"
      }
    }
  ]
}
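A request body in the shape of the log model above could be assembled along these lines. This is a sketch under stated assumptions: the helper names `build_log_batch` and `effective_dimensions` are invented for illustration, and the merge rule (per-message dimensions override global ones on conflict) is an assumption that should be checked against the Log API specification.

```python
import json
from typing import Dict, List


def build_log_batch(global_dimensions: Dict[str, str], logs: List[Dict]) -> str:
    """Serialize several log entries into one JSON request body."""
    body = {"dimensions": dict(global_dimensions), "logs": []}
    for entry in logs:
        item = {"message": entry["message"]}
        # Local dimensions are optional; include them only when present.
        if entry.get("dimensions"):
            item["dimensions"] = dict(entry["dimensions"])
        body["logs"].append(item)
    return json.dumps(body, sort_keys=True)


def effective_dimensions(global_dimensions: Dict[str, str], entry: Dict) -> Dict[str, str]:
    """Assumed merge rule: local (per-message) dimensions override global ones."""
    merged = dict(global_dimensions)
    merged.update(entry.get("dimensions", {}))
    return merged
```

Sending the shared dimensions once at the top level keeps the batch small, while each message can still add or override values such as the log file path.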
Log Agents
• Logstash
  • https://github.com/logstash-plugins/logstash-output-monasca_log_api/pull/1
• Beaver
  • https://github.com/python-beaver/python-beaver/pull/406
• Logspout: under investigation
Kibana Integration
• Keystone authentication support for Kibana
• Authentication plugin
  • https://github.com/FujitsuEnablingSoftwareTechnologyGmbH/fts-keystone
• Note: in the process of moving to the official OpenStack repo
Composability: Logging/Metrics
Transform and Analytics Engine
Monasca Transform
• A new micro-service in Monasca that aggregates and transforms metrics
• Currently based on Apache Spark Streaming
• Use cases
  • Object Storage Disk Capacity
  • Object Storage Capacity
  • Compute Host Capacity
  • VM Capacity
  • More to come
• Metrics are aggregated and published every hour
• Currently deployed in HPE Helion OpenStack 4.0
• OpenStack project/repo
  • https://github.com/openstack/monasca-transform
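The hourly aggregation idea can be illustrated with a small batch sketch in plain Python. It does not use Spark Streaming and is not the monasca-transform code: the function name `aggregate_hourly`, the choice of averaging, and the `_agg` name suffix are assumptions made for this example.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

HOUR_MS = 3600 * 1000


def aggregate_hourly(metrics: Iterable[Dict]) -> List[Dict]:
    """Average metric values per (name, hour) bucket; timestamps are in milliseconds."""
    buckets: Dict[Tuple[str, int], List[float]] = defaultdict(list)
    for m in metrics:
        # Round the timestamp down to the start of its hour.
        hour_start = (m["timestamp"] // HOUR_MS) * HOUR_MS
        buckets[(m["name"], hour_start)].append(float(m["value"]))
    # Emit one aggregated metric per bucket, ordered by name and hour.
    return [
        {"name": f"{name}_agg", "timestamp": hour_start, "value": sum(vals) / len(vals)}
        for (name, hour_start), vals in sorted(buckets.items())
    ]
```

A streaming implementation would apply the same bucketing to a windowed stream rather than to a finished list, and capacity-style use cases might sum or take the latest value instead of averaging.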
Transform and Analytics Engine
Monasca Transform
bull A new micro-service in Monasca that aggregates and transforms metrics
bull Currently based on Apache Spark Streaming
bull Use Casesbull Object Storage Disk Capacity
bull Object Storage Capacity
bull Compute Host Capacity
bull VM Capacity
bull More to come
bull Metrics are aggregated and published every hour
bull Currently in deployment in HPE Helion OpenStack 40
bull OpenStack projectrepobull httpsgithubcomopenstackmonasca-transform
Monasca Analytics
bull A framework that adds data science tools (parsers algorithms etc)
bull Features include bull Algorithmic flow definition enabling sharing of complex algorithmic recipes
bull Thin orchestration layer that instantiates an execution environment
bull Focused onbull Anomaly detection
bull Reducing alert fatigue via alarm clustering (unsupervised machine learning)
bull Example algorithms One Class SVM and LiNGAM
bull Status Under Development
bull OpenStack projectrepobull httpsgithubcomopenstackmonasca-analytics
Distributions & Deployments
• Charter Communications
  • Monasca and Grafana are currently deployed in a production private cloud
  • Monitoring-as-a-Service use cases supported, with Grafana as the visualization dashboard
  • 2 datacenters, 600-700 compute nodes, 1000 VMs, 11000 metrics/sec
• FIWARE Lab
  • http://superuser.openstack.org/articles/monitoring-a-multi-region-cloud-based-on-openstack
• Hewlett Packard Enterprise Cloud System Helion OpenStack
  • Supported and tested up to 65K metrics/sec ingest rates
• Fujitsu
  • FUJITSU Software ServerView Cloud Monitoring Manager
• NEC
  • Planning to include Monasca in Cloud Solution Menus solution
• Others
Statistics Mitaka/Newton Release
• Organizations: 31
• Contributors: 97
• Commits: 1075
• Reviews: 4080
• Lines of code: 215370