clearly, i have made some bad decisions
“I don’t have scaling problems”
Scaling is about change
not about quantity
Problems don’t occur when things are normal
If things change, you will have scaling problems
Work takes time to do
Email needs to be read
Code runs on a server
Mistake! “I don’t have scaling problems”
Not a mistake we’re making if we’re here?
Mistakes will be made
Problems will happen
But there are things we can do to be prepared
#1 Measure Everything
How do you know if something is wrong?
How do you know if something is not wrong?
# uptime
17:27:18 up 405 days, 2:36, 1 user, load average: 26.93, 10.46, 6.16
!?!?
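One way to turn a reading like the one above into a check is to compare the load average with the number of cores. This is a minimal sketch for a Unix-like host; the function names and the threshold of 2.0 per core are illustrative assumptions, not something from the talk.

```python
import os

def load_is_suspicious(load_1min, cpu_count, per_core_threshold=2.0):
    """Return True when the 1-minute load per core exceeds the threshold.

    The default threshold is an assumed rule of thumb; tune it per workload.
    """
    return load_1min / max(cpu_count, 1) > per_core_threshold

def current_load_report():
    """Read the live load average (Unix only) and describe it in one line."""
    # os.getloadavg() returns the 1, 5 and 15 minute load averages.
    one, five, fifteen = os.getloadavg()
    cores = os.cpu_count() or 1
    verdict = "suspicious" if load_is_suspicious(one, cores) else "fine"
    return f"load {one:.2f}/{five:.2f}/{fifteen:.2f} on {cores} cores: {verdict}"
```

The point is less the threshold than having a number recorded over time, so that 26.93 stands out against whatever "normal" looks like on that machine.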
Read your log files
(Exceptions aren’t always exceptional)
Measure in production (hat tip: Coda, “metrics, metrics everywhere”)
That’s the only place where things are really happening
But don’t let your metrics cause performance problems
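A rough sketch of what cheap production metrics might look like: in-process counters plus sampled timers, so that measuring adds very little work per request. The `Metrics` class and its `sample_rate` parameter are hypothetical names for illustration, not the API of any particular metrics library.

```python
import random
import time
from collections import defaultdict

class Metrics:
    """In-process counters and timers, meant to be cheap enough for production."""

    def __init__(self, sample_rate=1.0):
        self.sample_rate = sample_rate  # fraction of timings actually recorded
        self.counters = defaultdict(int)
        # name -> [sample count, total seconds]; constant memory per metric
        self.timings = defaultdict(lambda: [0, 0.0])

    def incr(self, name, amount=1):
        self.counters[name] += amount

    def timed(self, name):
        return _Timer(self, name)

class _Timer:
    def __init__(self, metrics, name):
        self.metrics = metrics
        self.name = name
        self.start = None

    def __enter__(self):
        # Sampling keeps the overhead bounded on hot code paths.
        if random.random() < self.metrics.sample_rate:
            self.start = time.perf_counter()
        return self

    def __exit__(self, exc_type, exc, tb):
        if self.start is not None:
            slot = self.metrics.timings[self.name]
            slot[0] += 1
            slot[1] += time.perf_counter() - self.start
        if exc_type is not None:
            self.metrics.incr(self.name + ".errors")
        return False  # never swallow the exception
```

Usage would be along the lines of `with metrics.timed("render"): ...` around the code being measured, with `sample_rate` lowered if the timers themselves start to show up in profiles.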
PING web (192.168.19.1): 56 data bytes
Request timeout for icmp_seq 0
Request timeout for icmp_seq 1
Request timeout for icmp_seq 2
Request timeout for icmp_seq 3
Sometimes you can just tell things are wrong
#2 Infrastructure as code (and config management)
Don’t do this.
Chef or Puppet (or cfengine or bcfg2)
Server config is code
Revision control
Feature branches
Commenting and authorship
Centralized (not in someone’s head)
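To make "server config is code" concrete, here is a minimal sketch of the idea those tools share: the desired state is declared as data, and applying it is idempotent. This is plain Python written for illustration, not Chef's or Puppet's actual API; `ensure_file` and `converge` are hypothetical names.

```python
import os
import tempfile

def ensure_file(path, content):
    """Make `path` contain exactly `content`; return True if a change was made."""
    try:
        with open(path, "r", encoding="utf-8") as fh:
            if fh.read() == content:
                return False  # already in the desired state, do nothing
    except FileNotFoundError:
        pass
    # Write to a temporary file and rename, so readers never see a partial file.
    directory = os.path.dirname(path) or "."
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        fh.write(content)
    os.replace(tmp, path)
    return True

def converge(desired):
    """Apply a {path: content} description; return the paths that changed."""
    return [path for path, content in desired.items() if ensure_file(path, content)]
```

Because the description is just data in a file, it can live in revision control, be reviewed on a branch, and be run a second time without doing anything.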
Should I choose Chef or Puppet?
Yes
(Seriously, this is non-negotiable.)
How do I switch my servers to start using config management?
My advice: build new ones, throw the old ones away.
Clean, known state
Test clusters: build, destroy, build
Live machines! Build, use, destroy
One-button servers
What about your code?
#3a Real deployment
Don’t do this.
$ svn up
U www/index.php
U www/payments.php
U www/settings-live.php
U www/settings-dev.php
A www/specials.php
 U .
Updated to revision 9703.
Deployment is more than just putting code in place.
reproducible idempotent rollouts
tied to a known build number
with separately-versioned known configuration
triggered non-manually across any number of servers
with full dependency management
and automated regression testing.
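A few of the properties listed above (reproducible, idempotent, tied to a build number) can be sketched with a releases directory and an atomically swapped symlink. This is an assumed layout for illustration, not the design of any tool named in this talk; `deploy`, `releases/` and `current` are hypothetical names, and the sketch assumes a POSIX filesystem.

```python
import os
import shutil

def deploy(root, build_number, files):
    """Install a build under releases/<build_number> and point `current` at it.

    Deploying a build that is already installed only moves the symlink,
    so a rollback is just a deploy of an earlier build number.
    """
    release = os.path.join(root, "releases", str(build_number))
    if not os.path.isdir(release):
        # Stage into a scratch directory so a half-written build is never live.
        staging = release + ".partial"
        shutil.rmtree(staging, ignore_errors=True)
        os.makedirs(staging)
        for name, content in files.items():
            with open(os.path.join(staging, name), "w", encoding="utf-8") as fh:
                fh.write(content)
        os.rename(staging, release)

    # Swap the symlink via rename, which is atomic on POSIX filesystems.
    current = os.path.join(root, "current")
    tmp_link = current + ".tmp"
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)
    os.symlink(release, tmp_link)
    os.replace(tmp_link, current)
    return release
```

Configuration, dependency management and regression tests are deliberately left out; the sketch only shows why a known build number makes rollouts repeatable.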
Etsy’s Deployinator
Vlad the Deployer
Fabric
Capistrano
OS Packages
Roll your own
#3b Continuous deployment
Holy Grail
trunk = live
tests block commits
feature flags?
dark launches?
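Feature flags are what let trunk be live while a feature is still unfinished. A minimal sketch, assuming flags are stored as a name-to-percentage mapping; the function names and the flag names in the usage note are hypothetical.

```python
import hashlib

def bucket(flag, user_id):
    """Map a (flag, user) pair to a stable bucket from 0 to 99."""
    digest = hashlib.sha256(f"{flag}:{user_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % 100

def is_enabled(flags, flag, user_id):
    """Return True if `flag` is on for this user.

    `flags` maps flag names to a rollout percentage (0 to 100).
    Unknown flags are treated as off.
    """
    return bucket(flag, user_id) < flags.get(flag, 0)
```

With something like `flags = {"new_header": 10}`, the code ships to everyone but runs for roughly a tenth of users, and the same user gets the same answer on every request. A dark launch is the same switch applied to code whose results are not shown to the user yet.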
Cowboy
vs
Perfectionist
Fast iteration = fast test results
One huge feature tested... and rejected
Ten new tiny features tested
Two accepted
Failure is comfortable
Blame out, responsibility in
Consequences immediately visible
Okay, fine: Continuous Integration
Things still go wrong
After all that
#4 Plan for failure
Take backups
Test backups
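"Test backups" means restoring them and checking the result, not just confirming that a file was written. A small sketch of that loop, using SQLite from the Python standard library as a stand-in for whatever database is actually in use; `backup_and_verify` is a hypothetical helper, and the table name is assumed to come from trusted code rather than user input.

```python
import sqlite3

def backup_and_verify(source, backup_path, table):
    """Back up `source` to `backup_path`, reopen the copy, and sanity-check it."""
    dest = sqlite3.connect(backup_path)
    try:
        source.backup(dest)  # online copy of the whole database
    finally:
        dest.close()

    # Reopen the backup from disk, the way a real restore would.
    restored = sqlite3.connect(backup_path)
    try:
        intact = restored.execute("PRAGMA integrity_check").fetchone()[0] == "ok"
        query = f'SELECT COUNT(*) FROM "{table}"'
        same_rows = (source.execute(query).fetchone()[0]
                     == restored.execute(query).fetchone()[0])
    finally:
        restored.close()
    return intact and same_rows
```

A row count is a weak check; the design point is only that the verification step reads the restored copy, so a backup that cannot be opened fails loudly before it is needed.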
Automate servers
Test server crashes
Netflix’s Chaos Monkey
And cousins: the Simian Army
Server failures predicted and foiled
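The core of the idea is small enough to sketch: pick a server at random and kill it, on purpose, while people are around to watch what happens. This is not Netflix's implementation; `unleash`, the `terminate` callback and the `probability` parameter are hypothetical names for illustration.

```python
import random

def unleash(instances, terminate, rng=random, probability=1.0):
    """Maybe terminate one randomly chosen instance; return its name or None.

    `terminate` is whatever callable actually stops a server in your
    environment; it is passed in so the choice logic can be tested safely.
    """
    if not instances or rng.random() >= probability:
        return None
    victim = rng.choice(sorted(instances))
    terminate(victim)
    return victim
```

Passing a seeded `random.Random` as `rng` makes a run repeatable, which helps when a deliberately caused failure needs to be reproduced.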
What about code? New features?
#5 Future Compatibility
ALTER TABLE `user` ADD COLUMN `twootr` VARCHAR(16);
CREATE INDEX `twootr_idx` ON `user` (`twootr`);
Don’t do this. (on live)
“Future compatible” schemas
“Future compatible” code
Normalized tables are performance heavy
Don’t assume any columns?
Shiny new / Yucky old
?
Read / Write
Migrate
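One reading of the read, write and migrate steps above is: read from the new store and fall back to the old one, write only to the new one, and copy the remainder across in the background. A minimal sketch under that assumption, with plain dictionaries standing in for the two data stores; `MigratingStore` is a hypothetical name.

```python
class MigratingStore:
    """Front two stores during a migration so callers never care which is live."""

    def __init__(self, old, new):
        self.old = old
        self.new = new

    def read(self, key):
        if key in self.new:
            return self.new[key]
        if key in self.old:
            value = self.old[key]
            self.new[key] = value  # migrate lazily on first read
            return value
        raise KeyError(key)

    def write(self, key, value):
        # New data only ever lands in the new store.
        self.new[key] = value

    def migrate_all(self):
        """Copy everything not yet migrated, without clobbering newer writes."""
        for key, value in self.old.items():
            self.new.setdefault(key, value)
```

Once `migrate_all` has completed and reads no longer fall through, the old store can be retired without a big-bang schema change on live.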
What about other bad decisions?
#6 Wing It
spof.yola.com: Django and MySQL on one server
Scheduled for reboot
Add a second MySQL server (slave replication)
Add a second Django server
Separate Django from MySQL: two Django servers, two MySQL servers
Put a load balancer (LB) in front
Drop DNS TTL
But it’s okay
Jonathan Hitchcock
@vhata
github.com/vhata