Containers Infrastructure for Advanced Management Federico Simoncelli Associate Manager, Red Hat October 2016



About Me

Kubernetes
● Decoupling problems to hand out to different teams
  ○ Developers do operations for their application
  ○ Cluster Admins do operations for cluster software
  ○ Kernel and Operating System do operations for nodes
  ○ Hardware operations for clouds
● Layer of abstraction for Application definition
● Machines don’t have an identity or a specific function
  ○ “All ...machines are created equal”
● Developers do not know about Operators’ issues
● Operators do not know about Applications’ issues

OpenShift
● 100% based on and compatible with Kubernetes
● Kubernetes influencer for new features
  ○ Projects and Namespaces
  ○ Templates
  ○ Routes and Ingress
● Additional features related to image life-cycle and rolling updates
● Integrated experience in many areas
  ○ Opinionated metrics and logging solutions
  ○ Developer Web Console

Application Components Distribution
Traditional and Kubernetes distribution of application components

[Figure: stages plotted against SCALE and COMPLEXITY]
● One developer. How do I containerize?
● Dev team. How can we move faster?
● Dev meets Ops. How do we run at scale?
● DevOps. Can we turn it into a platform?
● Production Ops. How do we manage at scale?

New Set Of (Old) Problems for Operators

Deployment Requirements
● Standardized and easy to reproduce
  ○ Pick a platform: Atomic vs. Traditional
● Automatic and composable
● Deploy-and-forget is not enough
● Maintainable
  ○ Definition of desired state and reconciliation
● Allow the infrastructure to be modified reliably
  ○ Scaling (add and remove nodes)
  ○ Change configurations, etc.
● Somewhat similar to Kubernetes principles
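The "definition of desired state and reconciliation" requirement can be illustrated with a minimal Python sketch. It assumes a hypothetical model in which the cluster is a mapping from node name to configuration; `reconcile` and `apply` are illustrative names, not part of any deployment tool, and a real installer would drive Ansible (or similar) instead of the simulated `apply`.

```python
from typing import Dict, List, Tuple

# Hypothetical cluster model: node name -> configuration (e.g. {"role": "compute"}).
State = Dict[str, Dict[str, str]]


def reconcile(desired: State, actual: State) -> List[Tuple[str, str]]:
    """Return the (verb, node) actions that turn `actual` into `desired`."""
    actions = []
    # Nodes in the desired state but missing from the cluster: scale up.
    for node in sorted(desired.keys() - actual.keys()):
        actions.append(("add", node))
    # Nodes running but no longer desired: scale down.
    for node in sorted(actual.keys() - desired.keys()):
        actions.append(("remove", node))
    # Nodes whose configuration drifted from the desired one.
    for node in sorted(desired.keys() & actual.keys()):
        if desired[node] != actual[node]:
            actions.append(("reconfigure", node))
    return actions


def apply(actions: List[Tuple[str, str]], desired: State, actual: State) -> State:
    """Simulate applying the actions; a real tool would change the infrastructure."""
    result = dict(actual)
    for verb, node in actions:
        if verb == "remove":
            del result[node]
        else:  # "add" and "reconfigure" both converge the node to its desired config
            result[node] = dict(desired[node])
    return result


desired = {"node1": {"role": "compute"}, "node2": {"role": "compute"}, "node3": {"role": "infra"}}
actual = {"node1": {"role": "compute"}, "node2": {"role": "infra"}, "node4": {"role": "compute"}}

actions = reconcile(desired, actual)
print(actions)  # [('add', 'node3'), ('remove', 'node4'), ('reconfigure', 'node2')]
actual = apply(actions, desired, actual)
print(reconcile(desired, actual))  # []
```

Because `reconcile` only compares states, running it again after the changes are applied yields no actions, which is what makes repeated runs safe instead of deploy-and-forget.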

Deployment Status
● Kubernetes
  ○ kube-up based on SaltStack (turning into kube-deploy)
    ■ Mostly for GCE (and Vagrant for development)
  ○ Kargo based on Ansible
  ○ GKE (possible future)
● OpenShift
  ○ https://github.com/openshift/openshift-ansible
  ○ Supports AWS, GCE, libvirt, OpenStack, Vagrant
● Containers on OpenStack
  ○ Kubernetes and OpenShift Heat templates
  ○ Magnum: container orchestration as first-class resources
  ○ https://github.com/redhat-openstack/openshift-on-openstack

OpenShift-Ansible
● Actively maintained and feature-rich
● Based on a healthy Open Source automation project
  ○ Large ecosystem
  ○ Composable with other automations
● Describe your infrastructure as an “inventory”
  ○ Inventory can be versioned and updated
● Simple interactive installation
  ○ atomic-openshift-installer
● Advanced installation supporting many advanced features
  ○ Possibly hard to master
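Keeping the inventory under version control turns scaling the cluster into a reviewable diff. The Python sketch below compares two versions of a simplified INI-style Ansible inventory and reports which hosts were added to or removed from each group. It assumes bare host names only (no inline host variables) and skips `[group:vars]` and `[group:children]` sections; the host names are hypothetical, and this is not how openshift-ansible itself processes inventories.

```python
import configparser
from typing import Dict, Set


def hosts_by_group(inventory_text: str) -> Dict[str, Set[str]]:
    """Parse a simplified Ansible INI inventory containing bare host names only."""
    # allow_no_value: host lines carry no "= value"; only "=" delimits, so "host:port" survives.
    parser = configparser.ConfigParser(allow_no_value=True, delimiters=("=",))
    parser.optionxform = str  # keep host names exactly as written (no lowercasing)
    parser.read_string(inventory_text)
    groups = {}
    for section in parser.sections():
        if ":" in section:  # skip [group:vars] and [group:children]
            continue
        groups[section] = set(parser[section].keys())
    return groups


def diff_inventories(old: str, new: str) -> Dict[str, Dict[str, Set[str]]]:
    """Report, per group, the hosts added and removed between two inventory versions."""
    before, after = hosts_by_group(old), hosts_by_group(new)
    changes = {}
    for group in sorted(before.keys() | after.keys()):
        added = after.get(group, set()) - before.get(group, set())
        removed = before.get(group, set()) - after.get(group, set())
        if added or removed:
            changes[group] = {"added": added, "removed": removed}
    return changes


OLD = """
[masters]
master1.example.com

[nodes]
node1.example.com
node2.example.com
"""

NEW = """
[masters]
master1.example.com

[nodes]
node1.example.com
node3.example.com
"""

print(diff_inventories(OLD, NEW))
# {'nodes': {'added': {'node3.example.com'}, 'removed': {'node2.example.com'}}}
```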

Monitoring Objectives
● Notification of incidents
  ○ Grace period
  ○ Notifications
● Debug new or unknown issues
  ○ Quickly have at hand the overall status of the cluster
  ○ Easy access to metrics and logging
    ■ Metrics and logging at all levels (infrastructure, etc.)
● Analyze trends and proactively avoid future incidents
  ○ Scheduled maintenance
  ○ Datacenter hardware upgrades
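Two of these objectives can be made concrete with a small Python sketch: a grace period, so that a brief spike does not trigger a notification, and a trend projection that estimates when a resource will be exhausted, so maintenance can be scheduled ahead of time. The sample format (timestamp in seconds, value), the thresholds, and the use of a plain least-squares line are assumptions for illustration; real alerting systems offer richer dampening and forecasting options.

```python
from typing import List, Optional, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, value), oldest first


def should_notify(samples: List[Sample], threshold: float, grace_period: float) -> bool:
    """Notify only if the value has stayed above `threshold` for at least `grace_period` seconds."""
    breach_start = None
    for ts, value in samples:
        if value > threshold:
            if breach_start is None:
                breach_start = ts  # start of the current continuous breach
        else:
            breach_start = None  # breach interrupted: the grace period starts over
    if breach_start is None:
        return False
    return samples[-1][0] - breach_start >= grace_period


def seconds_until(samples: List[Sample], limit: float) -> Optional[float]:
    """Fit value = intercept + slope * t by least squares and estimate how long
    after the last sample the fitted line reaches `limit` (None if not growing)."""
    n = len(samples)
    if n < 2:
        return None
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        return None
    slope = sum((t - mean_t) * (v - mean_v) for t, v in samples) / var_t
    if slope <= 0:
        return None
    intercept = mean_v - slope * mean_t
    t_limit = (limit - intercept) / slope
    return max(0.0, t_limit - samples[-1][0])


cpu = [(0, 50), (60, 95), (120, 96), (180, 97)]
print(should_notify(cpu, threshold=90, grace_period=120))  # True: above 90 since t=60
disk = [(0, 50.0), (3600, 55.0), (7200, 60.0)]  # percent used, growing 5 points per hour
print(seconds_until(disk, limit=100.0))  # about 28800 seconds (8 hours)
```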

Common Monitoring Architecture

Monitor Kubernetes-Based Clusters with Heapster
● Leverage the infrastructure to monitor the same infrastructure
  ○ What if monitoring is failing continuously?
● Heapster
  ○ Enables Container Cluster Monitoring and Performance Analysis
  ○ Different sinks
● Autoscaling
  ○ Collected data are then used to autoscale Pods (when configured)
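The Kubernetes documentation describes the Horizontal Pod Autoscaler's core rule as desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue), leaving the replica count alone when the ratio is within a tolerance (0.1 by default). The Python sketch below applies that rule to a CPU utilization figure such as the ones Heapster collects. The replica bounds are hypothetical parameters, and the real controller adds safeguards (for example around missing metrics and the rate of scaling decisions) that are omitted here.

```python
import math


def desired_replicas(current_replicas: int, current_utilization: float,
                     target_utilization: float, tolerance: float = 0.1,
                     min_replicas: int = 1, max_replicas: int = 10) -> int:
    """HPA-style rule: scale proportionally to how far usage is from the target."""
    ratio = current_utilization / target_utilization
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas  # close enough to the target: leave replicas unchanged
    desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))  # clamp to the configured bounds


print(desired_replicas(4, current_utilization=90, target_utilization=60))  # 6
print(desired_replicas(4, current_utilization=63, target_utilization=60))  # 4 (within tolerance)
```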

Agile Monitoring
● Running a data center continuously, 24/7, demands more than metrics collection
● Contribution to Heapster and cAdvisor is “slow”
● Integrate additional solutions and technologies
● Agile addition of new metrics
  ○ No development involved
● Monitoring for known issues
  ○ Nodes can self-heal
● Statistics on most recurring issues
  ○ Identify fragile components or architecture
  ○ Focus development on reliability
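Monitoring for known issues and keeping statistics on recurring ones can both start from the same incident records. In this hypothetical Python sketch, known issues map to a remediation a node can apply by itself, anything else is escalated to a human, and a simple count per component points at the most fragile parts of the stack. The issue names, components, and remediation actions are invented for illustration.

```python
from collections import Counter
from typing import List, Tuple

# Hypothetical incident record: (node, component, issue)
Incident = Tuple[str, str, str]

# Hypothetical remediations for known issues.
REMEDIATIONS = {
    "docker-unresponsive": "restart the docker service",
    "disk-pressure": "remove unused images",
}


def handle(incident: Incident) -> Tuple[str, str]:
    """Self-heal known issues, escalate unknown ones."""
    _node, _component, issue = incident
    action = REMEDIATIONS.get(issue)
    if action is not None:
        return ("self-heal", action)
    return ("escalate", issue)


def most_fragile(incidents: List[Incident], top: int = 3) -> List[Tuple[str, int]]:
    """Rank components by how often they are involved in incidents."""
    return Counter(component for _node, component, _issue in incidents).most_common(top)


incidents = [
    ("node1", "docker", "docker-unresponsive"),
    ("node2", "docker", "docker-unresponsive"),
    ("node2", "kubelet", "disk-pressure"),
    ("node3", "docker", "kernel-oops"),
]
print(handle(incidents[0]))     # ('self-heal', 'restart the docker service')
print(handle(incidents[3]))     # ('escalate', 'kernel-oops')
print(most_fragile(incidents))  # [('docker', 3), ('kubelet', 1)]
```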

Application and Infrastructure Monitoring
● Roles and duties separation (once again)
  ○ Developers should be interested only in metrics and logs of applications
    ■ Developers must see only data of objects they own
  ○ Operators are mostly interested in metrics and logs of the infrastructure (e.g. nodes)
● Metrics, logging and alerts belong to objects
  ○ Heapster collects metrics per object (node, container, etc.)
● Security considerations
  ○ Applications and infrastructure in the same data store?
  ○ Is tenancy in the data store enough for you?
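Because metrics belong to objects, role separation can be enforced by filtering on the object each sample is attached to. The Python sketch below assumes a hypothetical `Metric` record in which namespaced objects (pods, containers) carry their project namespace and cluster-scoped objects such as nodes carry none; developers see only the namespaces they own, while operators also see infrastructure metrics. Metric names are illustrative, and the sketch shows the policy only, not how any particular data store implements tenancy.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set


@dataclass(frozen=True)
class Metric:
    object_kind: str           # e.g. "node", "pod", "container"
    object_name: str
    namespace: Optional[str]   # None for cluster-scoped objects such as nodes
    name: str
    value: float


def visible_metrics(metrics: Iterable[Metric], owned_namespaces: Set[str],
                    is_operator: bool) -> List[Metric]:
    """Developers see only objects in namespaces they own; operators also see infrastructure."""
    visible = []
    for m in metrics:
        if m.namespace is None:
            if is_operator:
                visible.append(m)
        elif m.namespace in owned_namespaces:
            visible.append(m)
    return visible


metrics = [
    Metric("node", "node1", None, "cpu/usage_rate", 1200.0),
    Metric("pod", "web-1", "shop", "cpu/usage_rate", 150.0),
    Metric("pod", "api-1", "blog", "cpu/usage_rate", 80.0),
]
developer_view = visible_metrics(metrics, owned_namespaces={"shop"}, is_operator=False)
operator_view = visible_metrics(metrics, owned_namespaces=set(), is_operator=True)
print([m.object_name for m in developer_view])  # ['web-1']
print([m.object_name for m in operator_view])   # ['node1']
```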

Monitoring Architecture Considerations
● Reliability and disruption isolation
● Scalability of each subsystem
● Data locality
● Reuse of existing solutions
● Security (and isolation of data)
● Monitoring life-cycle (upgrade and rollback)
● Cross-correlation of multiple clusters and solutions
● Single technology for metrics and logging?

Direct Monitoring

Metrics and Logging Federation

Hawkular and ElasticSearch
● Open Source solutions for metrics and logging
  ○ Hawkular based on Cassandra
  ○ ElasticSearch based on Lucene
● Data stores used by many existing projects
● Technologies of choice for OpenShift
  ○ Work out of the box in OpenShift
● Hawkular trigger definitions for Alerts
● Kibana visualization tool for ElasticSearch

Image and Security
Security assessment
● How to trust underlying images?
● How to keep the images safe?
● How to enforce security policies?

Technologies
● Signed images
● OpenSCAP assessment tools
● Atomic Scan and Blackduck
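Part of "how to enforce security policies?" can be expressed as an admission check on image references. The Python sketch below is not image signing: it only accepts images from an allowlist of registries, requires them to be pinned by a sha256 digest, and checks that some content hashes to that digest (for a real registry the digest is computed over the image manifest). The registry names are hypothetical, and the reference parsing is deliberately simplified; for example, it does not handle Docker Hub short names or registries with ports.

```python
import hashlib
import re
from typing import List

TRUSTED_REGISTRIES = {"registry.example.com"}   # hypothetical allowlist
PINNED = re.compile(r"@sha256:[0-9a-f]{64}$")   # reference ends in a sha256 digest


def check_image(reference: str) -> List[str]:
    """Return the policy violations for an image reference (empty list = admitted)."""
    violations = []
    registry = reference.split("/", 1)[0] if "/" in reference else ""
    if registry not in TRUSTED_REGISTRIES:
        violations.append("untrusted registry")
    if not PINNED.search(reference):
        violations.append("not pinned by digest")  # tags such as :latest are mutable
    return violations


def digest_matches(reference: str, content: bytes) -> bool:
    """Recompute the sha256 digest of `content` and compare it with the pinned one."""
    expected = reference.rsplit("@sha256:", 1)[-1]
    return hashlib.sha256(content).hexdigest() == expected


manifest = b'{"example": "manifest bytes"}'
ref = "registry.example.com/shop/web@sha256:" + hashlib.sha256(manifest).hexdigest()
print(check_image(ref))                               # []
print(check_image("docker.io/library/nginx:latest"))  # ['untrusted registry', 'not pinned by digest']
print(digest_matches(ref, manifest))                  # True
```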

Putting It All Together
● Maintainable deployment solution
  ○ Support cluster re-shaping
  ○ Versionable
● Monitoring unexpected events and alerts
● Planning data center evolution over time
● Ability to monitor and cross-link with the underlying infrastructure
● Out-Of-The-Box experience
  ○ Knowledge gathered from a community of Operators

ManageIQ Comprehensive Cloud Management
● Single Pane of Glass
  ○ Monitoring
  ○ Management
● Private and Public All-Around
  ○ VMs, Instances, Containers, Storage, Network
● Management Framework
  ○ Infrastructure applications
● Policies and Alerts
● Reports and Chargeback Reports
● Automation
● Capacity Planning
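Chargeback reports come down to multiplying measured usage by a rate and summing per project. The Python sketch below shows that arithmetic with hypothetical rates and usage figures; it is not ManageIQ's chargeback model, which has its own rate definitions and allocation options. `Decimal` is used so that currency amounts round predictably.

```python
from decimal import Decimal, ROUND_HALF_UP
from typing import Dict

# Hypothetical rates, in currency units per unit of usage.
RATES = {
    "cpu_core_hours": Decimal("0.05"),
    "memory_gib_hours": Decimal("0.01"),
}


def chargeback(usage_by_project: Dict[str, Dict[str, Decimal]],
               rates: Dict[str, Decimal] = RATES) -> Dict[str, Decimal]:
    """Cost per project = sum over metrics of (usage * rate), rounded to cents."""
    report = {}
    for project, usage in usage_by_project.items():
        cost = sum((amount * rates[metric] for metric, amount in usage.items()), Decimal("0"))
        report[project] = cost.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
    return report


usage = {
    "shop": {"cpu_core_hours": Decimal("100"), "memory_gib_hours": Decimal("400")},
    "blog": {"cpu_core_hours": Decimal("12.5"), "memory_gib_hours": Decimal("30")},
}
print(chargeback(usage))  # {'shop': Decimal('9.00'), 'blog': Decimal('0.93')}
```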

ManageIQ Project and History
● Virtualization Management since 2006
● Acquired by Red Hat in December 2012
● Open-Sourced in June 2014
● 7 Technical Leaders
● ~50 Core Engineers
● ~100 Contributors (and counting)
● 3 Companies Involved
● 3 Monthly Stable Builds
● Nightly Builds
● 3-Week Sprints
● 200 PRs on Average (per Sprint)

Introducing Containers to ManageIQ (2015 - 2016)
● Inventory collection of major objects
  ○ Nodes, Pods, Services, Replicators, etc.
● Cross-linking of nodes with known instances
● Dashboard and Topology
● Metrics collection from Hawkular
  ○ Utilization aggregation (Project, Service, etc.)
● Smart-State Analysis
  ○ Collection of image packages
● OpenSCAP for container images
● Policies for container objects
● Chargeback
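Utilization aggregation rolls per-pod metrics up to higher-level objects. In the Python sketch below, pod samples (hypothetical CPU figures in millicores, assumed to be already collected, for example from Hawkular) are summed per project and per service. A service is taken to cover the pods in its namespace whose labels match every key/value pair of its selector, which is how equality-based Kubernetes selectors behave; the dictionaries used as pods and services are illustrative, not ManageIQ's data model.

```python
from collections import defaultdict
from typing import Dict, List

Pod = dict      # hypothetical: {"name", "namespace", "labels", "cpu_millicores"}
Service = dict  # hypothetical: {"name", "namespace", "selector"}


def by_project(pods: List[Pod]) -> Dict[str, float]:
    """Sum pod CPU usage per project (namespace)."""
    totals: Dict[str, float] = defaultdict(float)
    for pod in pods:
        totals[pod["namespace"]] += pod["cpu_millicores"]
    return dict(totals)


def selects(service: Service, pod: Pod) -> bool:
    """Equality-based selector: every selector label must match the pod's labels."""
    if not service["selector"]:
        return False  # a Service without a selector does not select pods
    return pod["namespace"] == service["namespace"] and all(
        pod["labels"].get(key) == value for key, value in service["selector"].items())


def by_service(pods: List[Pod], services: List[Service]) -> Dict[str, float]:
    """Sum CPU usage of the pods selected by each service."""
    return {svc["name"]: sum(p["cpu_millicores"] for p in pods if selects(svc, p))
            for svc in services}


pods = [
    {"name": "web-1", "namespace": "shop", "labels": {"app": "web"}, "cpu_millicores": 200},
    {"name": "web-2", "namespace": "shop", "labels": {"app": "web"}, "cpu_millicores": 300},
    {"name": "db-1", "namespace": "shop", "labels": {"app": "db"}, "cpu_millicores": 500},
    {"name": "api-1", "namespace": "blog", "labels": {"app": "web"}, "cpu_millicores": 100},
]
services = [
    {"name": "web", "namespace": "shop", "selector": {"app": "web"}},
    {"name": "db", "namespace": "shop", "selector": {"app": "db"}},
]
print(by_project(pods))            # {'shop': 1000.0, 'blog': 100.0}
print(by_service(pods, services))  # {'web': 500, 'db': 500}
```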

ManageIQ Inventory and Relationships
[Diagram: relationships among Service, Container, Pod, Image, Node, Cluster and Instance]
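The diagram's exact edges are not recoverable from the transcript, but the object types it names can be related the way Kubernetes usually relates them. The Python sketch below is a hypothetical model (class and field names are illustrative, not ManageIQ's schema): a cluster has nodes, a node may be cross-linked to the provider instance it runs on, pods run on nodes, containers run images, and a service fronts a set of pods. Walking those links answers operator questions such as which instances a service depends on.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Instance:  # VM or cloud instance known to an infrastructure provider
    name: str


@dataclass
class Node:
    name: str
    instance: Optional[Instance] = None  # cross-link, only when the instance is known


@dataclass
class Cluster:
    name: str
    nodes: List[Node] = field(default_factory=list)


@dataclass
class Image:
    name: str


@dataclass
class Container:
    name: str
    image: Image


@dataclass
class Pod:
    name: str
    node: Node
    containers: List[Container] = field(default_factory=list)


@dataclass
class Service:
    name: str
    pods: List[Pod] = field(default_factory=list)


def instances_behind(service: Service) -> Set[str]:
    """Follow Service -> Pod -> Node -> Instance, skipping nodes without a cross-link."""
    return {p.node.instance.name for p in service.pods if p.node.instance is not None}


def images_used(service: Service) -> Set[str]:
    """Follow Service -> Pod -> Container -> Image."""
    return {c.image.name for p in service.pods for c in p.containers}


vm1 = Instance("vm-1")
node1, node2 = Node("node1", vm1), Node("node2")  # node2 has no known instance
cluster = Cluster("prod", [node1, node2])
web = Image("shop/web:1.0")
svc = Service("web", [Pod("web-1", node1, [Container("web", web)]),
                      Pod("web-2", node2, [Container("web", web)])])
print(instances_behind(svc))  # {'vm-1'}
print(images_used(svc))       # {'shop/web:1.0'}
```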

Containers Management in ManageIQ in 2017
Current ongoing efforts for 2017
● Alerts dashboard and life-cycle
● Live Metrics and Alerts
  ○ Metrics served by Hawkular to ManageIQ
  ○ Support native Hawkular triggers for Alerts
● Dynamic Metrics and Alerts
  ○ Custom metrics and alerts on-demand
● Automation
  ○ Manage and re-provision ManageIQ using Ansible
● Integration with Logging and ELK stack

Get Involved!
● Community: http://talk.manageiq.org
● Code: https://github.com/ManageIQ/manageiq providers/containers
● Documentation: http://manageiq.org/documentation
● Social:
  ○ Twitter @manageiq #manageiq

Federico Simoncelli
[email protected]
https://twitter.com/simon3z