improving devops through cloud automation and management - real-world rocket science with chef,...

Real-World Rocket Science with Chef and Ostrato

Nicolas Rycar, Automation Engineer

Computing Resources Have Been Increasing Exponentially

[Chart: millions of servers (axis 20 to 120) by year, 1980 to 2015, split into Physical Hardware and Virtual Nodes. Eras marked: 1980 Mainframe, 1990 Client/Server, 2000 Datacenter, 2010+ Web-Scale]

Increased Size Leads to Operational Complexity

WEB SERVERS

APPLICATION SERVERS

DATABASE

ADD 1 SERVER

20+ Changes

12+ New dependencies

How does Chef work?

• Ensures desired state by continually testing and repairing individual resources in the system

• You compose policies using a series of simple declarations

• The Chef client fetches those policies from a central server and applies them to the local machine

• The state of the machine is recorded and sent back to a database, where it is indexed for search, reporting, and audit.
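The test-and-repair loop described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Chef's implementation: `FileResource`, `converge`, and the dict standing in for a machine's file system are all hypothetical names made up for this sketch.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class FileResource:
    """Declares the desired content of one (hypothetical) config file."""
    path: str
    content: str

    def is_converged(self, system: Dict[str, str]) -> bool:
        # Test: does the current state already match the declaration?
        return system.get(self.path) == self.content

    def repair(self, system: Dict[str, str]) -> None:
        # Repair: bring the current state to the desired state.
        system[self.path] = self.content


def converge(policy: List[FileResource], system: Dict[str, str]) -> List[str]:
    """Apply every declaration and return the paths that had to be repaired."""
    repaired = []
    for resource in policy:
        if not resource.is_converged(system):
            resource.repair(system)
            repaired.append(resource.path)
    return repaired


# A dict stands in for the machine's file system in this sketch.
system = {"/etc/ntp.conf": "server old.example.com"}
policy = [
    FileResource("/etc/ntp.conf", "server time.example.com"),
    FileResource("/etc/motd", "managed by policy"),
]

first_run = converge(policy, system)   # both resources need repair
second_run = converge(policy, system)  # already in the desired state
```

Running the same policy twice changes nothing the second time, which is what makes it safe to apply the policy continually.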

Chef Client Checks In

Node

Chef Server

Policy is Stored on a Central Server

Node

Chef Server

"recipe[ntp::client]"

"recipe[users]"

"role[webserver]"

Chef Client Pulls New Policy and Applies It

Chef Server

"recipe[ntp::client]"

"recipe[users]"

"role[webserver]"

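The policy entries shown on these slides follow a `type[name]` pattern. As a rough sketch, assuming only that pattern and the Chef convention that a bare cookbook name refers to its `default` recipe, the entries could be parsed like this in Python (`RunListItem`, `parse_run_list`, and `split_recipe` are hypothetical helpers, not part of Chef):

```python
import re
from typing import List, NamedTuple, Tuple


class RunListItem(NamedTuple):
    kind: str  # "recipe" or "role"
    name: str  # e.g. "ntp::client" or "webserver"


# Matches entries such as recipe[ntp::client] or role[webserver].
_ITEM = re.compile(r"^(recipe|role)\[([^\[\]]+)\]$")


def parse_run_list(items: List[str]) -> List[RunListItem]:
    """Split each entry into its type and name, preserving order."""
    parsed = []
    for item in items:
        match = _ITEM.match(item)
        if match is None:
            raise ValueError(f"not a run-list item: {item!r}")
        parsed.append(RunListItem(match.group(1), match.group(2)))
    return parsed


def split_recipe(name: str) -> Tuple[str, str]:
    """Return (cookbook, recipe); a bare cookbook name means its default recipe."""
    cookbook, _, recipe = name.partition("::")
    return cookbook, recipe or "default"


run_list = parse_run_list(
    ["recipe[ntp::client]", "recipe[users]", "role[webserver]"]
)
```

Order matters here: the entries are applied in the sequence in which they are listed.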
The Chef Software Platform

Chef Development Kit

• Cookbook and Policy Authoring

• Test-Driven Infrastructure

Chef Server

• Management Console

• Analytics Platform

• High Availability and Replication

Chef Client

Nodes

• Data Center

• The Cloud

Dale Wickizer, Chief Technology Officer

Private Cloud

Business Group A

Business Group B

IT

Top Challenges:

- Governance

- Orchestration

- Controlling Costs

- Continuous Delivery

Market Problem

Our Design Philosophy

• Build a powerful cloud service management platform:

• Seamless operations across public & private clouds

• Simple-to-use

• Open Source

• Deliver immediate business value

– Strong, global policies

– Rich product features

– Role-based Access Controls (RBAC)

• Great user experience

• User-specific marketplaces (multi-tenant)

• Same intuitive actions and workflows, regardless of CSP

Self-Service Portals

Governance Engine

Cloud Operations

Our Solution

Ostrato cloudSM

API Abstraction Layer

Self-Service Portals

Governance Engine

Cloud Operations

Our Differentiation

Ostrato cloudSM

API Abstraction Layer

What is Ostrato cloudSM?

With GUI

With API

    GET /parking_calendars

    200 OK
    [
      {
        "name": "Schedule A",
        "id": <id>,
        "calendar_url": <url>,
        "times": {

CONTROL

One Pane to Govern Cloud Services
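As a rough illustration of the API path, the Python sketch below builds a `GET /parking_calendars` request and reads the fields shown on the slide. Only the endpoint and the field names `name`, `id`, and `calendar_url` come from the slide; the host, URL prefix, bearer-token header, and sample values are assumptions, and the truncated `times` object is left out.

```python
import json
from typing import List
from urllib.parse import urljoin
from urllib.request import Request

# Hypothetical host and prefix; the real base URL is not given on the slide.
BASE_URL = "https://cloudsm.example.com/api/"


def build_request(base_url: str, token: str) -> Request:
    """Build (but do not send) a GET request for the parking calendars."""
    url = urljoin(base_url, "parking_calendars")
    headers = {
        "Accept": "application/json",
        # Assumed authentication scheme, for illustration only.
        "Authorization": f"Bearer {token}",
    }
    return Request(url, headers=headers, method="GET")


def calendar_names(body: str) -> List[str]:
    """Extract the calendar names from a JSON response body."""
    return [calendar["name"] for calendar in json.loads(body)]


# Placeholder response using only the fields visible on the slide.
SAMPLE_BODY = json.dumps([
    {
        "name": "Schedule A",
        "id": 1,
        "calendar_url": "https://cloudsm.example.com/calendars/1",
    }
])

request = build_request(BASE_URL, token="example-token")
names = calendar_names(SAMPLE_BODY)
```

Sending the request would be a call to `urllib.request.urlopen(request)`, which the sketch omits because the host is a placeholder.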

Automation & Governance in DevOps

• Organizations struggle to combine dev & QA processes with IT operations (a.k.a. “DevOps”)

• Business problem: Move application changes to production faster, without sacrificing:

• Quality

• Governance

• Reporting & Visibility

• Cost Controls

• Security

Customer Expectation

Continuous Delivery

DevOps: The Promise…

Real DevOps Is a Lot More Complicated…

Key: Policy-Driven Automation

Ostrato

Self-Service Marketplaces

• No Scripting

• Template-driven

Strong Governance

• RBAC-driven Global Policies

• Workflow Approval

CSP-independent Provisioning

• Fast & Repeatable

• API-Driven

Chef

Configuration Management:

• Powerful

• Scalable

Andrew Heifetz, Chief Cloud Officer

Who is Andrew?

• Chief Cloud Officer for OpenWhere

• 20+ years serving Fortune 500, public sector, and high-growth new ventures across multiple sectors, including telecommunications, media and entertainment, remote sensing, defense, and intelligence.

• Held senior management consulting roles at Ernst & Young's Center for Technology Enablement and leadership roles at various start-up companies.

• Aerospace Engineer

24x7 video streaming from the International Space Station:

Ten months from whiteboard to Initial Operations

[Diagram: Ground Terminal 1, Ground Terminal 2, ... Ground Terminal n; Customer Support Center; Mission Operation Center; Network Operations Center; Cloud Data Center 1; Cloud Data Center 2]

A typical ground system environment has a high degree of operational complexity

A typical ground system has a high degree of operational complexity

• 100s to 1000s of servers

• 4 types of database clusters

• 2 HPC clusters

• 17 VLANs

• 7 internal firewalls

• Hardened Windows and Linux images

• 9 major COTS packages

• 15 custom applications in five languages

• 3 NAS devices – petabyte-level storage

• Multiple locations, including public and private clouds

Systems Engineering and Program Management require multiple environments throughout the mission...

• Need multiple, concurrent environments – Large systems require multiple copies of the environment to support concurrent activities. The different environments can include functional testing, pre-integration (component-to-component testing), system integration, training, performance, user acceptance, and production simulation.

• Support out-of-cycle, ad hoc testing needs – Emergency production fixes, critical security patches, and other mission events can trigger activities that require on-demand test environments.

• Mimic the production environment – Test environments should be as close to the production configuration as possible in order to validate nonfunctional requirements such as high availability, recovery, and performance.

...but have limited time & budget for support.

And then the mission can change at a moment's notice

AWS, Chef, and Ostrato made it possible to accelerate the development life cycle.

Traditional Approach

• Too expensive to maintain multiple environments

• Over 50% difference in configurations between environments

• Resources are focused on production; alternative environments are secondary

OpenWhere Approach

1. Use AWS for low-cost, on-demand infrastructure

2. Use Chef to capture environment configurations as software

3. Use Ostrato to provide self-service and cost management

Our approach required all three capabilities to meet the requirements while reducing costs & schedule

[Diagram: Infrastructure Configuration; Infrastructure Provisioning; Orchestration & Governance; Version Control; Continuous Integration; Chef Server; Ostrato; Virtual Private Cloud]

1. Create initial system from marketplace using Heat/CloudFormation templates

2. Virtual servers include Chef client, which checks in

3. Developer checks in recipe

4. Continuous integration and unit tests

5. Deploy to Chef Server

The three components working together

Use Purpose Built Environments

Fixed, Static Environments

– Average 50% variance between production & lower environments

– Supporting environments have variable demand, low overall utilization, & minimal support

Dynamic, on-demand environments

– Built and scaled for specific purposes

– Chef ensures no infrastructure variance between environments

– Ostrato was used to orchestrate the lifecycle of the environments

Fixed Environments

Development, Integration, Test/QA, Production, Staging, Demonstration, Training, etc.

Dynamic Environments

Development for Sprint 11 (3 weeks)

User Test for User Story US 217 (14 hours)

Regression Test for Defect 42 (1 hour)

Performance Test for release 2.3.1 (4 hours)

Training for release 2.3.2 (8 hours/day)

Production for release 2.3.0 (1 month)
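One way to picture dynamic environments is as records that carry their own purpose and lifetime, so that anything past its lifetime can be torn down. The Python sketch below uses the purposes and durations from the list above; the `Environment` class, the `expired` helper, and the start date are hypothetical, and the sketch does not represent how Ostrato implements lifecycle orchestration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class Environment:
    """A purpose-built environment with a fixed lifetime."""
    purpose: str
    created_at: datetime
    lifetime: timedelta

    @property
    def expires_at(self) -> datetime:
        return self.created_at + self.lifetime


def expired(environments: List[Environment], now: datetime) -> List[Environment]:
    """Return the environments whose lifetime has run out at time `now`."""
    return [env for env in environments if now >= env.expires_at]


# Arbitrary start time chosen for the example.
start = datetime(2015, 1, 5, 9, 0)
environments = [
    Environment("Regression Test for Defect 42", start, timedelta(hours=1)),
    Environment("Performance Test for release 2.3.1", start, timedelta(hours=4)),
    Environment("Development for Sprint 11", start, timedelta(weeks=3)),
]

# Two hours in, only the one-hour regression environment is due for teardown.
to_tear_down = expired(environments, start + timedelta(hours=2))
```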

Create Programmatic Bill of Materials

The entire system is captured as software code. This allows the infrastructure to be version controlled and replicated like any other software asset.
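A minimal Python sketch of the idea, assuming the bill of materials is plain data: serializing it deterministically gives a text form that can be committed and diffed, and a copy with a new name gives a second, identical environment. The `SYSTEM` layout and the helper names are hypothetical and are not the format used in the project.

```python
import copy
import hashlib
import json

# Hypothetical bill of materials for a small system.
SYSTEM = {
    "name": "ground-system",
    "subnets": ["public", "private"],
    "servers": [
        {"role": "webserver", "count": 2,
         "run_list": ["recipe[ntp::client]", "role[webserver]"]},
        {"role": "database", "count": 1,
         "run_list": ["recipe[ntp::client]", "recipe[users]"]},
    ],
}


def serialize(system: dict) -> str:
    """Deterministic text form, suitable for committing to version control."""
    return json.dumps(system, sort_keys=True, indent=2)


def fingerprint(system: dict) -> str:
    """Short identifier that changes whenever the definition changes."""
    return hashlib.sha256(serialize(system).encode("utf-8")).hexdigest()


def replicate(system: dict, new_name: str) -> dict:
    """Copy the whole definition under a new name; the original is untouched."""
    clone = copy.deepcopy(system)
    clone["name"] = new_name
    return clone


test_system = replicate(SYSTEM, "ground-system-test")
```

Because the copy differs only in its name, any later drift between the two definitions shows up as an ordinary text diff.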

Provide Self-Service for the Entire System (not just servers)

A single server is not a viable unit of work

“In today’s distributed compute environment, developers can’t develop on local workstations.”

Teams want self-service for full systems, not servers

“I want an entire system, not 12 servers, 2 subnets, database, NAT, load balancer, etc.”

Teams aren’t good about cleanup, so they need guard rails (governance)
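Guard rails of this kind can be thought of as a policy check that runs before a self-service request is fulfilled. The Python sketch below is a hedged illustration only: `SystemRequest`, `Policy`, the limits, and the messages are hypothetical and do not describe Ostrato's governance engine.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SystemRequest:
    """A self-service request for a whole system, not individual servers."""
    team: str
    servers: int
    lifetime_hours: int


@dataclass
class Policy:
    """Hypothetical limits applied to every request."""
    max_servers: int
    max_lifetime_hours: int


def violations(request: SystemRequest, policy: Policy) -> List[str]:
    """Return the reasons a request breaks policy; empty means it may proceed."""
    problems = []
    if request.servers > policy.max_servers:
        problems.append(
            f"{request.servers} servers exceeds limit of {policy.max_servers}"
        )
    if request.lifetime_hours > policy.max_lifetime_hours:
        problems.append(
            f"{request.lifetime_hours} hours exceeds limit of "
            f"{policy.max_lifetime_hours}"
        )
    return problems


policy = Policy(max_servers=12, max_lifetime_hours=24 * 21)
ok_request = SystemRequest("dev", servers=12, lifetime_hours=14)
big_request = SystemRequest("perf", servers=40, lifetime_hours=4)

ok_result = violations(ok_request, policy)
big_result = violations(big_request, policy)
```

A request with no violations can be provisioned immediately, while one that exceeds a limit could be routed to an approval workflow instead of being rejected outright.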

Summary

• 1st program where infrastructure wasn’t a bottleneck

• Create parallel environments

– De-conflicts development activities

– Reduces schedule pressure

– Increases agility

• Need all three capabilities to be successful

– AWS: Cloud Infrastructure

– Chef: Infrastructure Configuration Management

– Ostrato: Orchestration and Governance

Q & A

Dale Wickizer

CTO, Ostrato

[email protected]

@dalewickizer

Nicolas Rycar

Automation Engineer, Chef

[email protected]

@rycar

Andrew Heifetz

Chief Cloud Officer, OpenWhere

[email protected]

@andyheifetz

Contact Information
