Clearly, I Have Made Some Bad Decisions

Upload: jonathan-hitchcock

Post on 12-Jul-2015


Page 1: Clearly, I Have Made Some Bad Decisions

“I don’t have scaling problems”

Scaling is about change, not about quantity

Problems don’t occur when things are normal

If things change, you will have scaling problems

Work takes time to do

Email needs to be read

Code runs on a server

Mistake!

“I don’t have scaling problems”

Not a mistake we’re making if we’re here?

Mistakes will be made

Problems will happen

But there are things we can do to be prepared

#1 Measure Everything

How do you know if something is wrong?

How do you know if something is not wrong?

# uptime
 17:27:18 up 405 days, 2:36, 1 user, load average: 26.93, 10.46, 6.16

!?!?
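A load average only says something once it is compared with the number of cores and with its own recent history. The sketch below is a minimal illustration, assuming a Unix-like host; the function names and the limit of 1.0 runnable process per core are assumptions for the example, not something the talk prescribes.

```python
import os


def parse_load_averages(uptime_output):
    """Return the 1, 5 and 15 minute load averages found in `uptime` output."""
    for label in ("load average:", "load averages:"):
        _, found, tail = uptime_output.partition(label)
        if found:
            # Linux separates the values with ", ", some BSDs with spaces.
            return tuple(float(v) for v in tail.replace(",", " ").split()[:3])
    raise ValueError("no load average in: %r" % uptime_output)


def is_overloaded(load_1min, cpu_count, per_core_limit=1.0):
    """True when the 1 minute load exceeds the assumed per-core limit."""
    return load_1min > cpu_count * per_core_limit


def is_climbing(one, five, fifteen):
    """True when each shorter window is higher than the longer one."""
    return one > five > fifteen


def local_load_report():
    """Run the same checks against the machine this runs on (Unix only)."""
    one, five, fifteen = os.getloadavg()
    cpus = os.cpu_count() or 1
    return {
        "overloaded": is_overloaded(one, cpus),
        "climbing": is_climbing(one, five, fifteen),
    }
```

Fed the output shown on the slide, `parse_load_averages` yields 26.93, 10.46 and 6.16: a load that is both high and still climbing.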

Read your log files

(Exceptions aren’t always exceptional)
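One way to find out which exceptions are routine is to count them. The following is a rough sketch, assuming log lines that contain a Python-style exception name ending in `Error` or `Exception`; the regular expression and the function name are illustrative assumptions, and real log formats will need their own pattern.

```python
import re
from collections import Counter

# Assumed shape: an optionally dotted name ending in "Error" or "Exception",
# for example "KeyError" or "django.db.utils.OperationalError".
EXCEPTION_NAME = re.compile(r"\b((?:\w+\.)*\w*(?:Error|Exception))\b")


def count_exceptions(lines):
    """Count how often each exception name appears, one hit per log line."""
    counts = Counter()
    for line in lines:
        match = EXCEPTION_NAME.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Calling `count_exceptions(open(path))` on a log file and looking at `most_common(10)` gives a first picture of what is normal noise and what is new.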

Measure in production (hat tip: Coda Hale, “Metrics, Metrics Everywhere”)

That’s the only place where things are really happening

But don’t let your metrics cause performance problems
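One common way to keep measurement cheap is to time only a fraction of calls. This is a minimal sketch of that idea, not the approach from the referenced talk; `timed`, `TIMINGS` and the default sample rate are hypothetical names and values chosen for the example.

```python
import random
import time
from collections import defaultdict
from functools import wraps

# In-memory store: metric name -> list of observed durations in seconds.
TIMINGS = defaultdict(list)


def timed(name, sample_rate=0.1, rng=random.random):
    """Record the wall-clock duration of roughly `sample_rate` of all calls."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            if rng() >= sample_rate:
                # Unsampled call: run it without any timing work.
                return func(*args, **kwargs)
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                TIMINGS[name].append(time.perf_counter() - start)
        return wrapper
    return decorator
```

A real system would ship these numbers to a metrics service instead of keeping an unbounded list in memory; the point here is only that the unsampled path costs one random number.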

PING web (192.168.19.1): 56 data bytes
Request timeout for icmp_seq 0
Request timeout for icmp_seq 1
Request timeout for icmp_seq 2
Request timeout for icmp_seq 3

Sometimes you can just tell things are wrong

#2 Infrastructure as code (and config management)

Don’t do this.

Chef or Puppet (or cfengine or bcfg2)

Server config is code

Revision control

Feature branches

Commenting and authorship

Centralized (not in someone’s head)
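What makes config-as-code workable is that each rule describes a desired state and can be applied repeatedly without harm. The sketch below shows that idea in plain Python rather than in Chef or Puppet syntax; `ensure_line` is a hypothetical helper, not part of either tool.

```python
import os


def ensure_line(path, line):
    """Converge the file at `path` so it contains `line`.

    Returns True if the file had to change, False if it was already correct.
    """
    lines = []
    if os.path.exists(path):
        with open(path) as handle:
            lines = handle.read().splitlines()
    if line in lines:
        return False  # already in the desired state, so do nothing
    lines.append(line)
    with open(path, "w") as handle:
        handle.write("\n".join(lines) + "\n")
    return True
```

Running it a second time changes nothing, which is the property that lets the same definition be applied to a fresh machine and to one that has been up for a year.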

Should I choose Chef or Puppet?

Yes

(Seriously, this is non-negotiable.)

How do I switch my servers to start using config management?

My advice: build new ones, throw the old ones away.

Clean, known state

test clusters: Build → Destroy

live machines (!): Build → Use → Destroy

One-button servers

What about your code?

#3a Real deployment

Don’t do this.

$ svn up
U    www/index.php
U    www/payments.php
U    www/settings-live.php
U    www/settings-dev.php
A    www/specials.php
 U   .
Updated to revision 9703.

Deployment is more than just putting code in place.

reproducible idempotent rollouts

tied to a known build number

with separately-versioned known configuration

triggered non-manually across any number of servers

with full dependency management

and automated regression testing.
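Two of the properties above, idempotent rollouts and a known build number, can be sketched with a directory per build and a `current` symlink that is switched atomically. This is an illustration under assumptions: a POSIX filesystem, a `releases/` plus `current` layout borrowed from common practice, and a hypothetical `deploy` function. It leaves out configuration, dependencies, and tests.

```python
import os
import shutil


def deploy(build_dir, deploy_root, build_number):
    """Install `build_dir` as release `build_number` and point `current` at it.

    Returns True if anything changed, False if that build was already live.
    """
    releases = os.path.join(deploy_root, "releases")
    release = os.path.join(releases, str(build_number))
    current = os.path.join(deploy_root, "current")

    os.makedirs(releases, exist_ok=True)
    changed = False
    if not os.path.isdir(release):
        # Copy under a scratch name first, so a half-copied release is
        # never visible under its final name.
        staging = release + ".partial"
        shutil.rmtree(staging, ignore_errors=True)
        shutil.copytree(build_dir, staging)
        os.rename(staging, release)
        changed = True

    if os.path.islink(current) and os.readlink(current) == release:
        return changed

    # Create the new link beside the old one, then rename over it.
    # On POSIX the rename replaces the old link in a single step.
    tmp_link = current + ".tmp"
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)
    os.symlink(release, tmp_link)
    os.replace(tmp_link, current)
    return True
```

Deploying the same build twice is a no-op, and a rollback is just a deploy of an earlier build number that is still on disk.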

Etsy’s Deployinator

Vlad the Deployer

Fabric

Capistrano

OS Packages

Roll your own

#3b Continuous deployment

Holy Grail

trunk = live

tests block commits

feature flags?

dark launches?
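Feature flags and dark launches are what make "trunk = live" survivable: code ships switched off, or runs without its result being used. The sketch below is one possible shape for both, with hypothetical function names; it is not taken from any particular flag library.

```python
import hashlib


def bucket(flag_name, user_id):
    """Map a (flag, user) pair to a stable integer from 0 to 99."""
    key = "{}:{}".format(flag_name, user_id).encode("utf-8")
    return int(hashlib.sha256(key).hexdigest(), 16) % 100


def is_enabled(flag_name, user_id, rollout_percent):
    """True for roughly `rollout_percent` percent of users, always the same ones."""
    return bucket(flag_name, user_id) < rollout_percent


def dark_launch(current_impl, candidate_impl, *args, on_error=None):
    """Serve the result of `current_impl` while exercising `candidate_impl`."""
    result = current_impl(*args)
    try:
        candidate_impl(*args)  # result discarded: users never see it
    except Exception as exc:
        # The candidate must never break the live path.
        if on_error is not None:
            on_error(exc)
    return result
```

Because the bucket is derived from a hash rather than a random draw, raising the percentage only ever adds users; nobody who already has the feature loses it.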

Cowboy vs Perfectionist

Fast iteration = fast test results

One huge feature tested... and rejected

Ten new tiny features tested

Two accepted

Failure is comfortable

Blame out, responsibility in

Consequences immediately visible

Okay, fine: Continuous Integration

After all that, things still go wrong

#4 Plan for failure

Take backups

Test backups
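A backup is only tested once it has been restored somewhere and checked. The sketch below assumes the thing being backed up is a directory of files and that the archive is one we created ourselves; `take_backup` and `backup_restores_cleanly` are hypothetical names for the example.

```python
import hashlib
import os
import tarfile
import tempfile


def file_digests(root):
    """Map each file path relative to `root` to its SHA-256 hex digest."""
    digests = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            with open(path, "rb") as handle:
                digest = hashlib.sha256(handle.read()).hexdigest()
            digests[os.path.relpath(path, root)] = digest
    return digests


def take_backup(source_dir, archive_path):
    """Write a gzip-compressed tar archive of `source_dir`."""
    with tarfile.open(archive_path, "w:gz") as archive:
        archive.add(source_dir, arcname=".")


def backup_restores_cleanly(source_dir, archive_path):
    """Restore into a scratch directory and compare it with `source_dir`."""
    with tempfile.TemporaryDirectory() as scratch:
        # Trusted archive: it was produced by take_backup above.
        with tarfile.open(archive_path, "r:gz") as archive:
            archive.extractall(scratch)
        return file_digests(scratch) == file_digests(source_dir)
```

For a database the same pattern applies with different tools: restore the dump into a throwaway instance and run a query whose answer is known.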

Automate servers

Test server crashes

Netflix’s Chaos Monkey

And cousins: the Simian Army
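The core idea is small enough to sketch: every so often, pick a server at random and kill it, so that recovery gets exercised before a real failure. This is not Netflix's implementation; `unleash_monkey` and its `terminate` callback are hypothetical, and the callback stands in for whatever actually shuts an instance down.

```python
import random


def unleash_monkey(instances, terminate, probability=0.5, rng=None):
    """Maybe terminate one randomly chosen instance.

    Returns the victim, or None if nothing was terminated this round.
    """
    rng = rng or random.Random()
    if not instances or rng.random() >= probability:
        return None
    victim = rng.choice(list(instances))
    terminate(victim)
    return victim
```

Passing a seeded `random.Random` makes a run repeatable, which helps when rehearsing in a test cluster before letting it near live machines.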

Server failures predicted and foiled

What about code? New features?

#5 Future Compatibility

ALTER TABLE `user` ADD COLUMN `twootr` VARCHAR(16);
CREATE INDEX `twootr_idx` ON `user` (`twootr`);

Don’t do this. (on live)

“Future compatible” schemas

“Future compatible” code

Normalized tables are performance heavy

Don’t assume any columns?
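One reading of "don't assume any columns" is that code should keep working both before and after a column is added. The sketch below shows that with SQLite from the Python standard library; the `user` table and `twootr` column come from the slide, while `fetch_user` and the choice of `None` as the default are assumptions for the example.

```python
import sqlite3


def fetch_user(conn, user_id):
    """Load a user as a dict, tolerating a schema with or without `twootr`."""
    cursor = conn.execute("SELECT * FROM user WHERE id = ?", (user_id,))
    row = cursor.fetchone()
    if row is None:
        return None
    # Read by column name, never by position, so new columns cannot shift
    # the values this code depends on.
    columns = [description[0] for description in cursor.description]
    record = dict(zip(columns, row))
    return {
        "id": record["id"],
        "name": record["name"],
        # Optional column: absent on the old schema, present on the new one.
        "twootr": record.get("twootr"),
    }
```

With code like this already deployed, the schema change and the code change no longer have to land at the same instant.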

Shiny new, yucky old: ?

Read

Write

Migrate
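One possible reading of the slide is a lazy migration: write only to the new store, read from the new store first, and move a record across the first time it is found in the old one. The sketch below uses plain dicts as stand-ins for the two stores; the function names and the `migrate` callback are hypothetical.

```python
def read_record(key, new_store, old_store, migrate):
    """Read from the new store, falling back to (and migrating from) the old."""
    if key in new_store:
        return new_store[key]
    if key in old_store:
        record = migrate(old_store[key])
        new_store[key] = record  # migrated lazily, on first read
        return record
    return None


def write_record(key, record, new_store):
    """All writes go to the new store only."""
    new_store[key] = record
```

Hot records move themselves; a background job can then sweep whatever is left in the old store at a pace the live system can afford.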

What about other bad decisions?

#6 Wing It

spof.yola.com: Django, MySQL (scheduled for reboot)

spof.yola.com: Django, MySQL; second server: MySQL (slave replication)

Both servers: Django, MySQL (slave replication)

LB in front of both servers

Drop DNS TTL

But it’s okay

Jonathan Hitchcock

@vhata

github.com/vhata