test driven infrastructure development

Post on 25-May-2015

Category:

Technology

DESCRIPTION

"Test driven Infrastructure development" by Tomas Doran at Puppet Camp Barcelona 2013. Learn about upcoming Puppet Camps at http://puppetlabs.com/community/puppet-camp/

TRANSCRIPT

Test driven Infrastructure development

Tomas (t0m) Doran <tomas.doran@timgroup.com>

@bobtfish

https://github.com/bobtfish

https://github.com/youdevise

Thursday, 14 March 13

‘Real men’ develop in production!

• Edit / Commit / Push

• Update puppetmaster

• puppet agent -t

• Repeat

Repeat again and again. Development cycle SLOOOW.

This is insane!

• Try it on an 8 person team.

• ‘LOL - I broke puppet’

CHAOS and FAIL result when you break each other. Or, MORE likely (this happens twice a day!)

OI!!!

OI t0m!!!!

You broke puppet!

AARRRGGH!!!

Let’s fix this!

• First, a glossary:

• mco - mcollective

• ENC - External node classifier

We can do better

• Branch == environment

• Branch / Commit / Push

• mco puppetupdate

• puppet agent -t --environment xxx

This at least lets you develop things independently. Everyone can do dev in their own branch and merge once they have something that doesn’t break _everything_. You can also rebase -i (squash) all the ARGH PUPPET SYNTAX commits.
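
The ‘branch == environment’ idea needs some rule for turning a git branch name into an environment name. Here is a minimal Python sketch of one such rule. It is an illustration only, not how puppetupdate does it: the function name is made up, and it assumes environment names should be limited to lowercase letters, digits and underscores.

```python
import re


def branch_to_environment(branch):
    """Turn a git branch name into a string usable as an environment name.

    Assumption: only lowercase letters, digits and underscores are allowed.
    """
    # Collapse every run of other characters into a single underscore.
    name = re.sub(r"[^a-z0-9_]+", "_", branch.lower()).strip("_")
    if not name:
        raise ValueError("branch %r has no usable characters" % branch)
    return name


print(branch_to_environment("feature/lb-refactor"))
```

The result is the value you would pass to puppet agent -t --environment.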

Sounds good?

• Then you’ll be wanting:

• https://github.com/youdevise/puppetupdate

It’s a bit basic, but then I ripped it out of work internal code at 8am ;)

So we fixed it?

Refactoring

• We change things to be consistent across codebase:

• Why did puppet just delete all the firewall rules on the production database?

• We don’t refactor:

• Add bugs all the time due to inconsistency

Sorry Chris, but when you say ‘refactoring’ - it’s not refactoring unless you have tests. The problem is that you can’t always remember to run the right branch on all the right nodes. Or rather, how do you even know what all the right nodes are? And if you’re hacking on custom functions, or anything using exported resources - WOE

Unfortunate reality:

• Hard coded IPs in 10 places

• role::oy_lb

• hiera data split by domain (colo)

• mco puppet

• 4 weeks per app per environment

So, despite our best efforts, our puppet code was SHIIIIT. Exported resources ARE NOT a good fit for non-trivial things (like generating load balancer configs). Ergo lots of hard coded IPs in multiple places. Ergo puppet code per site.

The state of the art

• It’s certainly in a state

• Automatic runs dangerous

• cron --noop runs

• puppet becomes an auditing system

• This isn’t what I signed up for!

Nobody does automatic runs. Puppet becomes an auditing tool (automatic noop runs + reports).

Business says no!

• Launching new products has a long lead time

• This is unhelpful if your company is trying to branch out into new markets

• CI / stage environments unlike prod

• Issues when new functionality goes live

• Developers think you’re incompetent

What is wrong with this picture?

• Did you run it everywhere?

• Does it affect anything you’re not expecting?

• Can you rebuild cleanly?

• Does the code even make things reflect current state?

You just don’t know the answer to any of these questions in any reliable way... But, generally, the answers are NO, YES, NO, NO.

‘We use puppet’

• Means nothing

• State of your system is the sum of all changes

• How do you know your code can rebuild things?

Hint - you don’t!

It’s all mierda

• Development communities are 10 years ahead

• We don’t integration test (repeatably)

• We can’t build / rebuild (reliably)

We need to grow up, and raise the level of the conversation.

Infra is hard

• Infrastructure is inherently more complex

• Less control

• More moving parts

• ‘End to end’ testing

• Persistent data

Sure - it’s much much harder to get a standalone testable system in infra than it is in development.

No excuses: Scientific method

I do not consider this an excuse to abandon sanity.

The solution?

• Re-provision everything in tests

• N.B. Not perfect (but better!)

• Proper software engineering

• Unit and integration tests

• Build pipeline + promotion

Openstack

• Our tests spinning up 12 machines => VMs

• OpenStack is going to be awesome, but right now:

• Networking sucks

• Load balancing is a shambles

• lvs / vlans / metal / bonding - nope

So, we should use openstack, right? As of December, when we looked - 2 networks max, inflexible. lvs not possible.

My desires:

• Reuse as much code as possible! (e.g. load balancers)

• No per colo/environment puppet code

• No IPs anywhere

• ‘DRY’

• CI pipeline to promote to production

• 1 puppet run from provisioned to working

• Repeatable and testable!

Orc

• Continuous (zero downtime) deployment

• Development / infrastructure application contract

• Model driven

• https://github.com/youdevise/orc/

Puppetroll

• Rolls out a consistent sha1 from the puppetmaster to an entire environment

• Fails if any puppet run fails

• https://github.com/youdevise/puppetroll
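
The ‘fails if any puppet run fails’ rule can be sketched in a few lines of Python. This is not puppetroll’s code. It assumes each node was run with puppet agent --detailed-exitcodes and that the exit codes have already been collected into a dictionary. The function names and host names are hypothetical.

```python
# Exit codes from `puppet agent --detailed-exitcodes`:
#   0 = no changes, 2 = changes applied,
#   1 = run failed, 4 = some resources failed, 6 = changes and failures.
SUCCESS_CODES = (0, 2)


def failed_nodes(exit_codes):
    """Return the sorted names of nodes whose puppet run did not succeed."""
    return sorted(node for node, code in exit_codes.items()
                  if code not in SUCCESS_CODES)


def roll_succeeded(exit_codes):
    """A roll-out only counts as good if every single node succeeded."""
    return not failed_nodes(exit_codes)


# Hypothetical results gathered from one environment.
results = {"lb-001": 2, "app-001": 0, "db-001": 4}
print(roll_succeeded(results), failed_nodes(results))
```

Treating the whole roll-out as failed when one node fails is what makes it usable as a gate in a build pipeline.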

Provisioning tools

• debootstrap custom gold images

• mcollective ‘computenode’ agent for kvm

• ‘provision me a machine called X, on networks Y and Z’

• Dynamic IP allocation (dnsmasq locally, DDNS for real)

stacks

• Model driven deployment

• DSL for describing groups of systems + dependencies

• rake tasks to provision / test / clean up stack + deps

• Can provision a full environment, run E2E tests, tear it down - in CI.
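
To make ‘groups of systems + dependencies’ concrete, here is a small Python sketch of a dependency model and the provisioning order it implies. It is not the stacks DSL (which drives rake tasks). The stack names and the dictionary layout are made up for illustration, and graphlib needs Python 3.9 or later.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical model: each stack maps to the stacks it depends on.
STACKS = {
    "db": [],
    "app": ["db"],
    "proxy": ["app"],
    "loadbalancer": ["proxy"],
}


def provision_order(stacks):
    """Return stack names so that dependencies come before dependants."""
    return list(TopologicalSorter(stacks).static_order())


def teardown_order(stacks):
    """Clean up in the opposite order, dependants first."""
    return list(reversed(provision_order(stacks)))


print(provision_order(STACKS))
print(teardown_order(STACKS))
```

With an explicit model like this, asking for the load balancer stack is enough to know everything else that has to exist before the end-to-end tests can run.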

I want to hack on load balancers

= 4 new, independent machines

How it works?

• DSL creates model of systems

• rake task ‘launch’:

• mco provisions boxes on compute nodes

• each box runs puppet --waitforcert

• mco signs cert

• puppet runs for each box

mco computenode

Puppetmaster

• Uses the same model

• Generates an ENC for each node

• Puppet code:

• Just installs things / starts services

• i.e. what it’s good at!

External node classifier
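
An ENC is a program the puppetmaster calls with a node name. It prints a YAML document with classes, environment and parameters keys, and a non-zero exit means classification failed. The Python sketch below shows that output being generated from a model. It is not the classifier from the talk: the MODEL dictionary, host names, role names and the stack_role parameter are all hypothetical.

```python
# Hypothetical model: node name -> role and environment.
MODEL = {
    "dev-lb-001.example.com": {"role": "loadbalancer", "environment": "dev"},
    "dev-app-001.example.com": {"role": "appserver", "environment": "dev"},
}


def classify(node):
    """Return the ENC document for a node as a YAML string.

    Raises KeyError for unknown nodes; a real ENC script would turn that
    into a non-zero exit status.
    """
    entry = MODEL[node]
    lines = [
        "---",
        "classes:",
        "  - role::%s" % entry["role"],
        "environment: %s" % entry["environment"],
        "parameters:",
        "  stack_role: %s" % entry["role"],
    ]
    return "\n".join(lines) + "\n"


print(classify("dev-lb-001.example.com"), end="")
```

A real script would read the node name from its first command-line argument. Because the class list comes from the same model that provisioned the machine, nothing site-specific has to live in the puppet code.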

Putting it together

• Still ongoing - live production apps ETA two weeks.

• Still haven’t solved re-provisioning problem for live environments!

• Do have repeatable and testable / tested infrastructure building in CI!

So, what do we have? Well - everything I showed you already... Building proxy server layer (by refactoring puppet code) right now. Databases to follow!

The top table is our test overview. We have two types of tests: those which are for a specific machine (i.e. a VM) and those which are for a virtual service (backed by multiple machines). ‘behaves like’ is an rspec thing we haven’t overridden. For each machine, we test that it’s pingable, then run every nrpe (nagios) check on it: if all the things nagios monitors are OK, the machine is OK.
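
The per-machine rule (pingable, then every nagios check OK) can be sketched in Python as below. The real tests are rspec and are not shown in the slides. This sketch assumes the ping result and the check exit codes have already been collected. The function names and check names are hypothetical, and treating WARNING as a failure is my assumption.

```python
# Standard nagios plugin exit codes.
STATES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def failing_checks(check_exit_codes):
    """Map each check that did not return OK to its state name."""
    return {check: STATES.get(code, "UNKNOWN")
            for check, code in check_exit_codes.items() if code != 0}


def machine_ok(pingable, check_exit_codes):
    """A machine passes only if it answers ping and every check is OK."""
    return pingable and not failing_checks(check_exit_codes)


# Hypothetical results for one freshly provisioned VM.
checks = {"check_disk": 0, "check_load": 0, "check_haproxy": 2}
print(machine_ok(True, checks), failing_checks(checks))
```

Reusing the monitoring checks as test assertions means a machine that passes in CI is judged by the same criteria that will page someone in production.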

In the (near) future?

• Live application stack in production

• Automated ‘promotion’ of good changes to production

• Integrated environment support for dev stacks on dev branches/environments

• Open source all the things!

Thanks!

• puppet is an awesome tool.

• It doesn’t solve higher level system modeling problems

• It shouldn’t try to!

• sysadmins need to level up

• It’s not done till you can test it still works

Photo Credits

• Escher's "Relativity" in LEGO - Andrew Lipson (http://www.andrewlipson.com)

• Manure - Flickr - chesbayprogram

• Provisions - Flickr - quinn.anya

• Stacked - Flickr - andrewrennie

• Dilbert - Flickr - osde-info

• Stacking wood - Flickr - arthuserea

• Square wheels - Flickr - vrogy

• Puppets - Flickr - SkipSteuart

• Light bulb - Flickr - bazik

• This-is-not-art - Wikimedia commons - Loran Davis

• Danger of death - Flickr - zigazou76

• Bob the Builder - Flickr - jamesclay

• Swiss roll - Flickr - add1sun

• Orc - Flickr - photo_munki

• Danger! Danger - Flickr - donsolo

• Cow of the future - Flickr - thewamphyri

• SCIENCE - Flickr - chasblackman

Links!

• http://github.com/youdevise

• http://github.com/bobtfish

• https://devblog.timgroup.com/

• (Yes, we are hiring)
