continuouslifecycle.london/wp-content/uploads/2018/...

May 16-17 2018 Mike Fowler, Senior Site Reliability Engineer Leveraging Automation for a Disposable Infrastructure


Page 1: Leveraging Automation for a Disposable Infrastructure

Mike Fowler, Senior Site Reliability Engineer

May 16-17 2018

Page 2: About Me

● Senior Site Reliability Engineer in the Public Cloud Practice
● Background in Software & Systems Engineering, System & Database Administration
● Contributed to PostgreSQL, Terraform & YAWL
● PostgreSQL evangelist

Page 3

So I like to think I know Data...

Page 4: Our Hero's Epic Quest

● The story, all names, characters, and incidents portrayed in this production are fictitious. No identification with actual persons (living or deceased), places, buildings, and products is intended or should be inferred.
● Franchise coffee shops
● Our hero, a lowly Head of Systems Engineering, is faced with the epic quest of moving to the cloud

Page 5: Approaching Cloud Migration

● Use cloud as spare/batch capacity
● Duplicate existing estate in the cloud
● Brave New World
  - Greenfield development
  - "Version 2.0"

Page 6: The Appeal of a Lift & Shift

● Direct mapping of existing infrastructure to the cloud
  - Load balancers become Elastic Load Balancers
  - SANs become Buckets or Elastic File Systems
● Minimal operational change required
  - Everything is the same, just in a new location
● Perceived as a "quick win" for cloud adoption
  - Little AWS/GCP/Azure-specific knowledge required

Page 7: The Penalty of a Lift & Shift

● We're changing only where our hardware is
  - Operationally no different than the past
  - Instance size based on current hardware size
  - No change to deployment process
● Under-utilisation of resources
  - Still paying for excess capacity
● Stunted scalability
  - We can throw more virtual hardware at it
  - Add more nodes behind load balancers

Page 8: Brave New World

● Our hero has a new CTO
● Recognises that we're just moving our problems
● "We're under-investing in the future"

Page 9: The Appeal of a Brave New World

● No "legacy" baggage
● Free rein for experimentation
● Perceived as a "low risk" path to cloud adoption
  - If it doesn't work, switch it off
  - "No risk" to existing production environment

Page 10: The Penalty of a Brave New World

● Organisationally isolated
  - Limited impact on existing practices
  - Leads to an "Us vs. Them" mentality
● Focus is usually on application functionality, with infrastructure seen as a necessity
● Project has a high risk of failure
  - Carefree scoping leads to an unfocused project
  - Significant time can be lost to integrating with the old world

Page 11: Are we really "doing cloud"?

● Are we just building a traditional but virtual data centre?
  - Lift & Shift is operationally the same
  - Brave New World isn't part of the Real World
● How are we leveraging the power of a dynamic infrastructure?
● Our infrastructure is scalable, but is the application?

Page 12: Breaking the Mould

● This is not a new problem
● How do we move on from our comfortable past?

Page 13: Breaking the Mould

Conway's law states you're doomed to design systems that mirror your organisation's communication structure

● Conway's Law:

"Organisations which design systems … are constrained to produce designs which are copies of the communication structures of these organisations"

- Melvin Conway, 1967

Page 14: Breaking the Mould

Scaling software isn't just making the same elements bigger; it's an increase in the number of different elements, which interact in a nonlinear fashion. The complexity of the whole increases much more than linearly.

● No Silver Bullet:

"A scaling-up of a software entity is not merely a repetition of the same elements in larger size; it is necessarily an increase in the number of different elements. In most cases, the elements interact with each other in some nonlinear fashion, and the complexity of the whole increases much more than linearly."

- Fred Brooks Jr., 1986

Page 15: Breaking the Mould

Applying existing patterns will, at best, miss opportunities that newer technology offers and, at worst, add more complexity.

● Infrastructure as Code:

"In many cases, applying existing patterns will, at best, miss out on opportunities to leverage newer technology to simplify and improve the architecture. At worst, replicating existing patterns with the newer platforms will involve adding even more complexity."

- Kief Morris, 2016

Page 16: Breaking the Mould

Systems should work correctly even in the face of adversity

● Designing Data-Intensive Applications:

"The system should continue to work correctly (performing the correct function at the desired level of performance) even in the face of adversity (hardware or software faults, and even human error)."

- Martin Kleppmann, 2017

Page 17: A Different Approach

Our hero needs a different approach

Page 18: Attitude

The attitude you have to your environment will determine the limits of your scalability

● The more you care about individual things, the more they will hold your attention
● In a truly scalable environment you should only care about the combination of many individual things

Page 19: Attitude: Living in the Iron Age

● You treat your servers like pets
  - You give them names (igloo, husky, snowshoe)
  - You give them homes (racks on site or co-located)
  - If they fail, you do everything you can to save them
● Every server is an investment
  - Often the best hardware that can be afforded
  - Amortised over years
  - Excess capacity to allow for growth
● Provisioning new servers takes weeks

Page 20: Attitude: Living in the Cloud Age

● You treat your servers like cattle
  - They have identifiers
  - You care only about where they are geographically
  - If they fail, you put them down and get a new one
● Your architecture is your investment
  - Configuration is chosen for your current load
  - Pay for what you use
  - Capacity can be added when required
● Provisioning new servers takes seconds

Page 21: Attitude: Is Pets v Cattle enough?

● Are we simply herding our pets?
  - In a Lift & Shift this is almost certainly so
  - Scaling groups are a start, but they are not the end
● How are we managing our virtual servers?
  - Complex cloud-init scripts?
  - Traditional configuration management?

Page 22: Attitude: The Disposable Infrastructure

● Everything is a package and can be discarded
● You treat your servers like single-use products
  - They're pre-packaged for a particular purpose
  - If one fails, you toss it away and grab another
● You automate everything
● Never make a manual change

Page 23: Be Continuous (slide 1 of 2)

Continuous integration and delivery is a must

● Repeatability brings reliability and predictability
● Defining a build pipeline:
  - Ensures the same process is followed for every change
  - Provides an audit trail for every change
  - Gives visibility of your value stream
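A build pipeline can be pictured as a fixed, ordered list of stages that every change passes through, with each result recorded. The Python sketch below is a toy model, not any CI server's API; the `Pipeline` and `AuditEntry` classes and the stage names are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable, List, Tuple

# Hypothetical stage signature: receives a change ID, returns True on success.
Stage = Callable[[str], bool]


@dataclass
class AuditEntry:
    change_id: str
    stage: str
    passed: bool
    at: datetime


@dataclass
class Pipeline:
    # (name, stage) pairs, always run in this fixed order
    stages: List[Tuple[str, Stage]]
    audit_log: List[AuditEntry] = field(default_factory=list)

    def run(self, change_id: str) -> bool:
        """Run every stage in order for one change, stopping at the first failure."""
        for name, stage in self.stages:
            passed = stage(change_id)
            self.audit_log.append(
                AuditEntry(change_id, name, passed, datetime.now(timezone.utc))
            )
            if not passed:
                return False
        return True


# Usage: every change follows the same route and leaves a trail behind it.
pipeline = Pipeline(stages=[
    ("build", lambda change: True),
    ("unit-test", lambda change: change != "bad-change"),
    ("package", lambda change: True),
])
ok = pipeline.run("abc123")
failed = pipeline.run("bad-change")
trail = [(e.change_id, e.stage, e.passed) for e in pipeline.audit_log]
```

The audit log answers "what happened to this change, and when?" without anyone having to remember, because no change can reach the package stage by any other route.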

Page 24: Be Continuous (slide 2 of 2)

Continuous integration and delivery is a must

● Your developers probably already practise CI
  - It is the standard for code development
  - The output of CI can be the start of CD
● Continuous delivery doesn't have to mean continuous deployment
  - Build pipelines can have approval stages
  - Every change should be deployable

Page 25: Refactoring to the Cloud

Your applications need to be (re)built to fit a dynamic infrastructure

● Many applications expect a static infrastructure
  - Hard-coded assumptions that an IP address won't change once an application is started
● Many applications are cluster-unaware
  - Sticky sessions on load balancers can help
  - Some protocols don't load balance well
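One way to remove the hard-coded IP assumption is to resolve the service's address each time a connection is made rather than once at start-up. A minimal Python sketch, assuming the service is reachable by a DNS name; `ServiceClient`, `orders.internal` and the IP addresses are hypothetical, and the resolver is injectable so the example runs without a network.

```python
import socket
from typing import Callable, List, Tuple

# Maps (host, port) to the addresses currently serving it.
Resolver = Callable[[str, int], List[Tuple[str, int]]]


def system_resolver(host: str, port: int) -> List[Tuple[str, int]]:
    """Ask the operating system's resolver (DNS, /etc/hosts) afresh on every call."""
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    return [(info[4][0], info[4][1]) for info in infos]


class ServiceClient:
    """Resolves the service address per connection instead of once at start-up."""

    def __init__(self, host: str, port: int, resolve: Resolver = system_resolver) -> None:
        self.host = host
        self.port = port
        self.resolve = resolve

    def current_endpoint(self) -> Tuple[str, int]:
        addresses = self.resolve(self.host, self.port)
        if not addresses:
            raise LookupError(f"no addresses for {self.host}")
        return addresses[0]


# Usage with a fake resolver standing in for DNS (hypothetical name and IPs).
records = {"orders.internal": "10.0.1.15"}


def fake_resolver(host: str, port: int) -> List[Tuple[str, int]]:
    return [(records[host], port)]


client = ServiceClient("orders.internal", 5432, resolve=fake_resolver)
before = client.current_endpoint()
records["orders.internal"] = "10.0.2.27"  # the instance was replaced
after = client.current_endpoint()
```

Resolving per connection only helps if the DNS records use short TTLs and nothing else in the stack caches the address indefinitely.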

Page 26: Adopting Contemporary Approaches

● Refactor to contemporary architectural approaches
  - Service Oriented Architectures & Microservices
  - Transition from stateful services to stateless
● Package everything using distribution packagers
  - The output of your build pipeline is an RPM/DEB
  - Your $CM_TOOL already supports this
● Choose a deployment strategy
  - Machine images vs. containers
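Moving from stateful to stateless services usually means taking session state out of the instance and putting it in a shared store, so any instance can serve any request and sticky sessions are no longer needed. A sketch under that assumption; `SessionStore`, `InMemoryStore` and `WebInstance` are hypothetical, with the in-memory store standing in for an external one.

```python
from typing import Dict, Protocol

Basket = Dict[str, int]


class SessionStore(Protocol):
    def get(self, session_id: str) -> Basket: ...
    def put(self, session_id: str, basket: Basket) -> None: ...


class InMemoryStore:
    """Stand-in for a shared external store such as a managed cache or database."""

    def __init__(self) -> None:
        self._sessions: Dict[str, Basket] = {}

    def get(self, session_id: str) -> Basket:
        return dict(self._sessions.get(session_id, {}))

    def put(self, session_id: str, basket: Basket) -> None:
        self._sessions[session_id] = dict(basket)


class WebInstance:
    """Keeps no per-user state, so it can be terminated and replaced at any time."""

    def __init__(self, name: str, store: SessionStore) -> None:
        self.name = name
        self.store = store

    def add_to_basket(self, session_id: str, item: str) -> Basket:
        basket = self.store.get(session_id)
        basket[item] = basket.get(item, 0) + 1
        self.store.put(session_id, basket)
        return basket


# Usage: two requests from one customer land on different instances.
store = InMemoryStore()
web_a = WebInstance("web-a", store)
web_b = WebInstance("web-b", store)
web_a.add_to_basket("session-1", "latte")
basket = web_b.add_to_basket("session-1", "latte")
```

The read-modify-write in `add_to_basket` is not safe under concurrent requests; a real store would need an atomic update or optimistic locking.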

Page 27: Fear not Vendor Lock-In

● Fear not vendor lock-in; savings are to be reaped by leveraging commodity services
● Use SQS instead of automating the installation and configuration of a message broker and accepting the operational burden of maintaining it
● Careful abstraction of the API will allow porting to a different platform if absolutely necessary
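Careful abstraction can be as simple as a small interface that the application codes against, with one implementation per platform. A sketch, assuming a hypothetical `OrderQueue` interface; `send_message`, `receive_message` and `delete_message` are real boto3 SQS client methods, and the client is passed in so the example runs without AWS. Messages are acknowledged only after processing, so if an instance is terminated mid-order SQS redelivers the message once its visibility timeout expires (the in-memory version does not model redelivery).

```python
import itertools
from typing import Any, Dict, List, NamedTuple, Optional, Protocol


class Message(NamedTuple):
    body: str
    handle: str  # opaque token used to acknowledge (delete) the message


class OrderQueue(Protocol):
    """The only queue operations the application code is allowed to use."""
    def send(self, body: str) -> None: ...
    def receive(self) -> Optional[Message]: ...
    def ack(self, message: Message) -> None: ...


class InMemoryOrderQueue:
    """Local implementation for unit tests and development."""

    def __init__(self) -> None:
        self._ready: Dict[str, str] = {}
        self._in_flight: Dict[str, str] = {}
        self._ids = itertools.count()

    def send(self, body: str) -> None:
        self._ready[str(next(self._ids))] = body

    def receive(self) -> Optional[Message]:
        if not self._ready:
            return None
        handle = next(iter(self._ready))  # oldest message first
        body = self._ready.pop(handle)
        self._in_flight[handle] = body
        return Message(body, handle)

    def ack(self, message: Message) -> None:
        self._in_flight.pop(message.handle, None)


class SqsOrderQueue:
    """Adapter over a boto3 SQS client supplied by the caller."""

    def __init__(self, client: Any, queue_url: str) -> None:
        self.client = client
        self.queue_url = queue_url

    def send(self, body: str) -> None:
        self.client.send_message(QueueUrl=self.queue_url, MessageBody=body)

    def receive(self) -> Optional[Message]:
        resp = self.client.receive_message(
            QueueUrl=self.queue_url, MaxNumberOfMessages=1, WaitTimeSeconds=10)
        messages = resp.get("Messages", [])
        if not messages:
            return None
        return Message(messages[0]["Body"], messages[0]["ReceiptHandle"])

    def ack(self, message: Message) -> None:
        self.client.delete_message(QueueUrl=self.queue_url, ReceiptHandle=message.handle)


def fulfil_next(queue: OrderQueue, fulfilled: List[str]) -> bool:
    """Process one order, acknowledging it only after it has been handled."""
    message = queue.receive()
    if message is None:
        return False
    fulfilled.append(message.body)
    queue.ack(message)
    return True


# Usage against the in-memory implementation.
orders = InMemoryOrderQueue()
orders.send('{"order": 1, "item": "flat white"}')
orders.send('{"order": 2, "item": "mocha"}')
fulfilled: List[str] = []
while fulfil_next(orders, fulfilled):
    pass
```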

Page 28: Infrastructure is Code (slide 1/2)

Dynamic infrastructure must be treated as a first-class citizen in any cloud project

● Design the infrastructure in parallel with the cloud-aware application changes
● Mandate that every instance is part of a scaling group to enforce cluster awareness
● Use the same principles for infrastructure development as you use for applications
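A mandate is easier to keep when the pipeline checks it. A sketch of such a check in Python, assuming it runs over the JSON produced by `terraform show -json` for a saved plan, where resources appear under `planned_values.root_module` and nested `child_modules`. Treating any `aws_instance` resource as an instance outside a scaling group is an assumption of this example, as are the function names and resource addresses.

```python
import json
from typing import Any, Dict, Iterator, List


def iter_resources(module: Dict[str, Any]) -> Iterator[Dict[str, Any]]:
    """Yield resources from a module and, recursively, from its child modules."""
    yield from module.get("resources", [])
    for child in module.get("child_modules", []):
        yield from iter_resources(child)


def standalone_instances(plan: Dict[str, Any]) -> List[str]:
    """Addresses of EC2 instances declared directly rather than via a scaling group."""
    root = plan.get("planned_values", {}).get("root_module", {})
    return [r["address"] for r in iter_resources(root) if r.get("type") == "aws_instance"]


def enforce_scaling_groups(plan: Dict[str, Any]) -> None:
    """Fail the pipeline stage if any instance sits outside a scaling group."""
    violations = standalone_instances(plan)
    if violations:
        raise ValueError("instances outside a scaling group: " + ", ".join(violations))


# Usage with a hand-written fragment shaped like a plan (hypothetical resources).
plan = json.loads("""
{"planned_values": {"root_module": {
  "resources": [
    {"address": "aws_autoscaling_group.web", "type": "aws_autoscaling_group"},
    {"address": "aws_launch_template.web", "type": "aws_launch_template"}
  ],
  "child_modules": [{"resources": [
    {"address": "module.ops.aws_instance.bastion", "type": "aws_instance"}
  ]}]
}}}
""")
violations = standalone_instances(plan)
```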

Page 29: Infrastructure is Code (slide 2/2)

Dynamic infrastructure must be treated as a first-class citizen in any cloud project

● Script/encode everything unless there is no API/tooling support
● Deploy the same infrastructure in development, test and production environments
  - Sizing can be parameterised
● Your deployment pipeline becomes the assembly of application packages and infrastructure configuration
● High cohesion and loose coupling apply to infrastructure as much as they do to applications
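Parameterised sizing means the environments share one definition and differ only in a handful of values. In Terraform this would normally be input variables with per-environment values; the Python sketch below only models the idea, and the `Sizing` values, instance types and `web_tier` function are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Sizing:
    instance_type: str
    min_size: int
    max_size: int


# Hypothetical per-environment sizing: the only values that differ.
SIZING: Dict[str, Sizing] = {
    "development": Sizing("t2.micro", 1, 2),
    "test": Sizing("t2.small", 2, 4),
    "production": Sizing("m5.large", 3, 12),
}


def web_tier(environment: str, ami_id: str) -> Dict[str, object]:
    """Same topology in every environment; sizing comes from its parameters."""
    size = SIZING[environment]
    return {
        "name": f"web-{environment}",
        "ami": ami_id,
        "instance_type": size.instance_type,
        "min_size": size.min_size,
        "max_size": size.max_size,
        "spread_across_zones": True,
    }


# Usage: the AMI tested in development is the one that reaches production.
dev = web_tier("development", "ami-0123456789abcdef0")
prod = web_tier("production", "ami-0123456789abcdef0")
```

Keeping the structure identical and varying only the numbers is what makes a test in one environment meaningful for the next.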

Page 30: Planning to fail

Planning to fail will lead to success

● If it can go wrong, it will go wrong, so think in terms of when, not if
● Treating our infrastructure and its hosted applications as disposable, in conjunction with CD, eliminates a number of failure scenarios

Page 31: Planning to fail (slide 1/3)

● Regularly test your disposability
  - Terminate instances at random to ensure resiliency
  - Block all network access to an instance
  - Chaos Monkey & the Simian Army
  - Trigger failovers for less disposable services
● Constantly churning disposable instances helps prevent configuration drift
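A toy simulation of terminating instances at random and checking that the group heals. This is not Chaos Monkey's API; `ScalingGroup` and `chaos_round` are hypothetical, and a real test would terminate live instances and watch the scaling group replace them.

```python
import random
from dataclasses import dataclass, field
from itertools import count
from typing import Iterator, List


@dataclass
class ScalingGroup:
    """Toy model of a scaling group that replaces terminated instances."""
    desired: int
    instances: List[str] = field(default_factory=list)
    next_id: Iterator[int] = field(default_factory=lambda: count(1), repr=False)

    def reconcile(self) -> None:
        """Launch replacements until the group is back at its desired capacity."""
        while len(self.instances) < self.desired:
            self.instances.append(f"i-{next(self.next_id):04d}")

    def terminate(self, instance_id: str) -> None:
        self.instances.remove(instance_id)


def chaos_round(group: ScalingGroup, rng: random.Random) -> str:
    """Terminate a random instance, let the group heal, then verify capacity."""
    victim = rng.choice(group.instances)
    group.terminate(victim)
    group.reconcile()
    if len(group.instances) != group.desired or victim in group.instances:
        raise RuntimeError(f"group did not recover from losing {victim}")
    return victim


group = ScalingGroup(desired=3)
group.reconcile()
rng = random.Random(42)  # fixed seed so a failing run can be replayed
victims = [chaos_round(group, rng) for _ in range(10)]
```

Because every round replaces an instance with a fresh one, the surviving IDs keep changing; that churn is what helps keep configuration drift out.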

Page 32: Planning to fail (slide 2/3)

● Availability and durability come at a cost
● Identify points of failure and assess:
  - How often will this failure occur?
  - How do I mitigate this failure?
  - How do I test this failure to ensure mitigation?
  - Is the cost of mitigation worth the customer impact during failure?
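The last question can be turned into back-of-the-envelope arithmetic: expected annual impact = failures per year × hours down per failure × impact per hour, compared with the yearly cost of mitigation. A sketch with entirely hypothetical figures; `FailureMode` is an illustrative name, not an established model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureMode:
    name: str
    failures_per_year: float
    hours_down_per_failure: float
    impact_per_hour: float           # estimated customer/revenue impact, £ per hour
    mitigation_cost_per_year: float  # £ per year

    def expected_annual_impact(self) -> float:
        """Frequency x duration x hourly impact."""
        return self.failures_per_year * self.hours_down_per_failure * self.impact_per_hour

    def worth_mitigating(self) -> bool:
        return self.mitigation_cost_per_year < self.expected_annual_impact()


# Hypothetical figures for illustration only.
zone_outage = FailureMode("zone outage", 1.0, 4.0, 2_000.0, 3_000.0)
region_outage = FailureMode("region outage", 0.25, 6.0, 2_000.0, 60_000.0)
```

With these made-up numbers the zone outage (£8,000 expected against £3,000 to mitigate) is worth mitigating and the region outage (£3,000 against £60,000) is not; the model is only as good as its estimates.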

Page 33: Planning to fail (slide 3/3)

● Be honest in assessing the worth of your business
  - Do you really need to double your costs to run in multiple regions?
  - Trello, Slack & many other high-profile companies (including Amazon) were affected by the S3 outage

Page 34: Data is not Disposable

Data is not disposable and is probably more important than your availability

● Test the durability of your data
  - User error is your biggest risk
    - "I forgot the WHERE clause"
    - "I thought I was in the test environment"
● Regularly exercise data loss & recovery scenarios in development and test environments
● Make back-ups and regularly test that they restore
  - Consider storing backups in both S3 & Google
  - Store backups in multiple regions
● If you don't want a full ELK stack, at least ship log files to CloudWatch or Stackdriver
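A restore test should prove a backup can be brought back and used, not merely that the file exists. A minimal sketch using Python's built-in `sqlite3` module as a stand-in for a production database; the `orders` table, file name and helper functions are hypothetical. `Connection.backup` is SQLite's online backup API as exposed by Python 3.7 and later.

```python
import os
import sqlite3
import tempfile


def take_backup(live: sqlite3.Connection, path: str) -> None:
    """Copy the live database into a backup file using SQLite's online backup API."""
    target = sqlite3.connect(path)
    try:
        live.backup(target)
    finally:
        target.close()


def restore_and_verify(path: str, expected_orders: int) -> bool:
    """Restore a backup into a scratch database and check it is actually usable."""
    backup_file = sqlite3.connect(path)
    scratch = sqlite3.connect(":memory:")
    try:
        backup_file.backup(scratch)
        (integrity,) = scratch.execute("PRAGMA integrity_check").fetchone()
        (orders,) = scratch.execute("SELECT COUNT(*) FROM orders").fetchone()
        return integrity == "ok" and orders == expected_orders
    finally:
        backup_file.close()
        scratch.close()


live = sqlite3.connect(":memory:")
live.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT)")
live.executemany("INSERT INTO orders (item) VALUES (?)", [("latte",), ("mocha",)])
live.commit()

backup_path = os.path.join(tempfile.mkdtemp(), "orders-backup.db")
take_backup(live, backup_path)
restored_before = restore_and_verify(backup_path, expected_orders=2)

# Exercise a data-loss scenario: "I forgot the WHERE clause".
live.execute("DELETE FROM orders")
live.commit()
restored_after = restore_and_verify(backup_path, expected_orders=2)
```

The same shape (restore somewhere disposable, then query it) applies to real database backups; only the backup and restore commands change.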

Page 35: A Lesson to Learn From

● Multiple backup strategies, all failed
● Multiple failures, same engineers, too much pressure, too tired, mistakes made

https://about.gitlab.com/2017/02/10/postmortem-of-database-outage-of-january-31/

Page 36: Tooling is Not The Answer

Tooling is not the answer, but it is part of an automated solution

● Jenkins solves all our problems!
● AWS solves all our problems!
● Docker solves all our problems!
● Kubernetes solves all our problems!

Page 37: Remember Our Hero?

● Let us assume we have a front-end web application which places orders in a queue for subsequent asynchronous fulfilment by a separate application backed by a database
● We've already refactored our applications for the cloud
● We will have a CI pipeline for the applications, the output being AMIs
● A separate CD pipeline executes infrastructure code and rolls out the new AMIs
● The goal is to promote infrastructure and AMIs between environments

Page 38: Packer

● Can create many different machine images
● Consider creating a base image to control OS updates
● Use normal configuration management tools
  - Support for Ansible, Chef & Puppet
  - Can just write shell scripts if you must
● Use placeholders for configuration to be filled by launch scripts

https://packer.io
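A sketch of the placeholder approach: the image carries a configuration template, and a launch script fills in environment-specific values when the instance boots. Python's `string.Template` syntax, the file layout and the setting names are assumptions for illustration; where the values come from at launch (for example user data or instance tags) is left open.

```python
import string
from typing import Mapping

# Baked into the image by Packer: the file ships with placeholders, not values.
CONFIG_TEMPLATE = """\
[database]
host = ${db_host}
port = ${db_port}

[queue]
url = ${queue_url}
"""


def render_config(template: str, settings: Mapping[str, str]) -> str:
    """Fill placeholders at launch; substitute() raises KeyError if any value is missing."""
    return string.Template(template).substitute(settings)


# Usage in a launch script (hypothetical values).
settings = {
    "db_host": "orders-db.internal",
    "db_port": "5432",
    "queue_url": "https://sqs.eu-west-2.amazonaws.com/123456789012/orders",
}
rendered = render_config(CONFIG_TEMPLATE, settings)
```

Failing loudly on a missing value means a misconfigured instance dies at boot and is replaced, rather than serving traffic with half a configuration.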

Page 39: Application Pipeline

● Source our code from a repo, build and test
● Package our application as a DEB or RPM
● Place our artifact into an S3 repository
● Run Packer to generate a new AMI

Page 40: Terraform

● Declarative language for the construction of infrastructure
● Supports all major vendors
● State can be stored in buckets to facilitate sharing
● Separate out infrastructure layers
  - Minimises blast radius of changes
  - Keep persistent apart from disposable

https://terraform.io
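Separating layers limits what a change can touch. A sketch, assuming a hypothetical layout where each layer has its own state and lists the layers whose outputs it consumes; the layer names and `blast_radius` are illustrative, and the function returns the changed layer plus every layer downstream of it.

```python
from typing import Dict, List, Set

# Hypothetical layering: each layer lists the layers whose outputs it consumes.
LAYERS: Dict[str, List[str]] = {
    "network": [],
    "data": ["network"],                          # persistent: databases, buckets
    "queue": ["network"],
    "web": ["network", "queue"],                  # disposable: scaling groups
    "fulfilment": ["network", "queue", "data"],   # disposable: scaling groups
}


def blast_radius(changed: str, layers: Dict[str, List[str]] = LAYERS) -> Set[str]:
    """Layers that may be affected when `changed` changes: itself and all dependents."""
    affected = {changed}
    frontier = [changed]
    while frontier:
        current = frontier.pop()
        for layer, deps in layers.items():
            if current in deps and layer not in affected:
                affected.add(layer)
                frontier.append(layer)
    return affected
```

Here a change to the disposable `web` layer touches only `web`, while the persistent `data` layer can be affected only by changes to itself or to `network`.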

Page 41: Infrastructure Pipeline

● Triggered by new AMIs or Terraform code changes
● Apply Terraform to update the infrastructure
● Run integration tests to verify application build
● Wait for approval before promotion to next environment
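The promotion flow on this slide can be sketched as a loop over environments with an approval gate before each promotion. The environment names, the release identifier and the `apply`, `integration_tests` and `approved` callbacks are hypothetical; in practice `apply` would run Terraform against that environment's configuration.

```python
from typing import Callable, List, Tuple

ENVIRONMENTS = ["development", "test", "production"]


def promote(
    release: str,
    apply: Callable[[str, str], None],
    integration_tests: Callable[[str, str], bool],
    approved: Callable[[str, str], bool],
) -> List[str]:
    """Roll a release through each environment in order.

    Development is updated automatically; every later environment needs
    approval first. Returns the environments where the release was applied
    and passed its integration tests.
    """
    verified: List[str] = []
    for index, env in enumerate(ENVIRONMENTS):
        if index > 0 and not approved(release, env):
            break  # wait for approval before promoting further
        apply(release, env)
        if not integration_tests(release, env):
            break  # never promote a build that failed verification
        verified.append(env)
    return verified


# Usage with stand-in callbacks (hypothetical release identifier).
RELEASE = "ami-0abc123 + infra@v42"
applied: List[Tuple[str, str]] = []


def fake_apply(release: str, env: str) -> None:
    applied.append((release, env))


def all_tests_pass(release: str, env: str) -> bool:
    return True


def production_not_yet_approved(release: str, env: str) -> bool:
    return env != "production"


verified = promote(RELEASE, fake_apply, all_tests_pass, production_not_yet_approved)
```

Every release remains deployable to production; it simply waits at the gate until someone approves it, which is continuous delivery without continuous deployment.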

Page 42: Deployed Infrastructure

● Any instance can be terminated
● Resilient to zone failure
● Cross-region read replica allows DR for region failure
  - Just need to run Terraform in the region to add the instances when required and update Route 53
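For the region-failure case, updating Route 53 amounts to a record change. A sketch that builds the `ChangeBatch` structure accepted by the boto3 Route 53 client's `change_resource_record_sets`; the record name, DR endpoint and TTL are hypothetical, and promoting the cross-region replica is not shown.

```python
from typing import Any, Dict


def failover_change_batch(record_name: str, dr_endpoint: str, ttl: int = 60) -> Dict[str, Any]:
    """Build a Route 53 ChangeBatch pointing a CNAME at the DR region's endpoint."""
    return {
        "Comment": f"Fail over {record_name} to DR region",
        "Changes": [
            {
                "Action": "UPSERT",  # create the record, or overwrite it if it exists
                "ResourceRecordSet": {
                    "Name": record_name,
                    "Type": "CNAME",
                    "TTL": ttl,
                    "ResourceRecords": [{"Value": dr_endpoint}],
                },
            }
        ],
    }


batch = failover_change_batch("shop.example.com.", "dr-lb-123.eu-west-1.elb.amazonaws.com")
# Applied with boto3 once the DR instances are up, e.g.:
#   boto3.client("route53").change_resource_record_sets(
#       HostedZoneId=zone_id, ChangeBatch=batch)
```

A short TTL limits how long clients keep resolving the failed region after the change.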

Page 43: Summary

● Have attitude
● Be continuous
● Refactor to the Cloud
● Infrastructure is code
● Plan to fail
● Data is King
● Tooling is not The Answer

Page 44: Questions?

Mike Fowler

gh-mlfowler

mlfowler

mike dot fowler at claranet dot uk
