Architecting for the Cloud: Hoping for the best, prepared for the worst

AWS Loft: Behind the scenes with Cotap. Architecting for the Cloud: Hoping for the best, prepared for the worst.

Uploaded by: cotap-engineering

Posted on 06-Aug-2015


Category: Engineering

TRANSCRIPT

AWS Loft: Behind the scenes with Cotap

Architecting for the Cloud: Hoping for the best, prepared for the worst.

Infrastructure as Code

● Current state

● Past decisions

● Tracking the evolution

● CloudFormation

● Design -> JSON

● Version Control!
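As a sketch of "Design -> JSON" under version control: the snippet below builds a minimal CloudFormation template as a Python dict and serializes it with sorted keys and fixed indentation, so the same design always produces byte-identical JSON and every change shows up as a clean diff. The resource names (`WebLaunchConfig`, `WebGroup`), the instance type, and the sizes are illustrative assumptions, not Cotap's actual stack.

```python
import json


def build_template(instance_type: str, min_size: int, max_size: int) -> dict:
    """Return a minimal CloudFormation template (hypothetical resource names)."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": "Hypothetical web tier: launch configuration + Auto Scaling group",
        "Parameters": {
            # The AMI is supplied at deploy time instead of being hard-coded.
            "ImageId": {"Type": "AWS::EC2::Image::Id"},
        },
        "Resources": {
            "WebLaunchConfig": {
                "Type": "AWS::AutoScaling::LaunchConfiguration",
                "Properties": {
                    "ImageId": {"Ref": "ImageId"},
                    "InstanceType": instance_type,
                },
            },
            "WebGroup": {
                "Type": "AWS::AutoScaling::AutoScalingGroup",
                "Properties": {
                    "LaunchConfigurationName": {"Ref": "WebLaunchConfig"},
                    # CloudFormation takes the group sizes as strings.
                    "MinSize": str(min_size),
                    "MaxSize": str(max_size),
                    "AvailabilityZones": {"Fn::GetAZs": ""},
                },
            },
        },
    }


def render(template: dict) -> str:
    """Serialize deterministically so version-control diffs stay minimal."""
    return json.dumps(template, indent=2, sort_keys=True) + "\n"


if __name__ == "__main__":
    print(render(build_template("m3.medium", min_size=2, max_size=6)))
```

The rendered file is what gets committed; reviewing a template change then means reviewing a diff, which is exactly what Rule #1 asks for.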

Rule #1: All changes have to be under version control.

Design for automation

● AutoScalingGroups

● Hardware: CloudFormation

● Software: Configuration management

● Cattle not Cats
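"Cattle not cats" is what an Auto Scaling group gives you: it keeps a desired number of interchangeable instances alive and replaces unhealthy ones instead of repairing them. The toy simulation below illustrates that reconciliation loop in plain Python; `launch`, `reconcile`, and the instance dictionaries are hypothetical stand-ins, not the AWS API, and configuration management (e.g. Chef) is assumed to provision each new instance on boot.

```python
import itertools

_ids = itertools.count(1)


def launch() -> dict:
    """Pretend to launch a fresh, identical instance (hypothetical)."""
    return {"id": f"i-{next(_ids):04d}", "healthy": True}


def reconcile(instances: list[dict], desired: int) -> list[dict]:
    """Drop unhealthy instances, then launch or trim to reach `desired`."""
    # Cattle: an unhealthy instance is discarded, never nursed back to health.
    alive = [i for i in instances if i["healthy"]]
    while len(alive) < desired:
        alive.append(launch())
    # Scale in by dropping surplus instances from the end of the list
    # (an arbitrary policy for this sketch).
    return alive[:desired]


fleet = reconcile([], desired=3)
fleet[0]["healthy"] = False
fleet = reconcile(fleet, desired=3)  # the sick instance is replaced, not fixed
```

Because no instance is special, the loop never needs to know *why* an instance failed; that is what makes Rule #2 practical.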

Rule #2: No instances should be launched manually.

Monitoring & Alerting

● Cost of
  ○ Interruptions
  ○ Waking somebody up

● Channels
● Self-healing infrastructure
● External monitoring
● Page only when critical
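One way to keep interruptions cheap is to alert only after a check has failed several times in a row; the "Redis not running for > 3 x 5s" rule in the table that follows can be read as three consecutive failed checks at a 5-second interval. The class below is a minimal sketch of that debouncing under that reading; the `ConsecutiveFailureCheck` name and its default threshold are illustrative assumptions.

```python
class ConsecutiveFailureCheck:
    """Fire only after `threshold` consecutive failed probes (hypothetical helper)."""

    def __init__(self, threshold: int = 3) -> None:
        self.threshold = threshold
        self.failures = 0

    def record(self, ok: bool) -> bool:
        """Record one probe result; return True when an alert should fire."""
        if ok:
            # Any success resets the streak, so a single blip never alerts.
            self.failures = 0
            return False
        self.failures += 1
        # Fire once, exactly when the streak reaches the threshold.
        return self.failures == self.threshold


check = ConsecutiveFailureCheck(threshold=3)
for ok in [False, False, True, False, False, False]:  # one probe every 5 s
    if check.record(ok):
        print("page: Redis down for 3 consecutive checks")
```

Firing only at the moment the streak reaches the threshold also avoids re-paging on every subsequent failed probe.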

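Situation                            | Channel                  | Page
------------------------------------ | ------------------------ | ----
Disk 60% full                        | Chat, Email              | ✗
Disk 90% full                        | Chat, Email, PagerDuty   | ✓
Chef not running for > 30 min        | Chat, Email              | ✗
Redis not running for > 3 × 5 s      | Chat, Email, PagerDuty   | ✓
Elasticsearch cluster at N-1 nodes   | Chat, Email              | ✗
Elasticsearch cluster at N-2 nodes   | Chat, Email, PagerDuty   | ✓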
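A routing table like this can live as code next to the monitoring configuration. The sketch below encodes the disk and Elasticsearch rows as threshold rules that return the channels to notify. The function names are hypothetical, and the boundary semantics (90% itself pages; losing two *or more* nodes pages) are assumptions, since the slide gives only the thresholds.

```python
def route_disk(used_pct: float) -> list[str]:
    """Disk usage: notify from 60%, page from 90% (thresholds from the table)."""
    if used_pct >= 90:
        return ["chat", "email", "pagerduty"]
    if used_pct >= 60:
        return ["chat", "email"]
    return []


def route_elasticsearch(total_nodes: int, healthy_nodes: int) -> list[str]:
    """Notify when one node is lost (N-1), page when two or more are lost (N-2)."""
    missing = total_nodes - healthy_nodes
    if missing >= 2:
        return ["chat", "email", "pagerduty"]
    if missing == 1:
        return ["chat", "email"]
    return []


def should_page(channels: list[str]) -> bool:
    # Only the PagerDuty channel wakes somebody up.
    return "pagerduty" in channels
```

Keeping "page" as a property of the channel list, rather than a separate flag, makes it hard for a rule to page without also leaving a trail in chat and email.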

Platform to fail

● Easy creation of temporary “Stacks”
● Branches can get their own hardware
● Clients can talk to a branch
● QA happens on Sandbox
● Exact copy of Production
● Scale up/down based on needs
● Different Region (us-east-1)
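Giving every branch its own hardware largely comes down to a naming convention that maps each branch to its own CloudFormation stack. The helper below is a hypothetical sketch that turns a Git branch name into a valid stack name: CloudFormation stack names may contain only alphanumeric characters and hyphens, must start with a letter, and are limited to 128 characters. The `sandbox` prefix and the lowercasing are assumptions.

```python
import re

MAX_STACK_NAME = 128  # CloudFormation limit on stack-name length


def stack_name_for_branch(branch: str, prefix: str = "sandbox") -> str:
    """Map a branch such as 'feature/login_v2' to 'sandbox-feature-login-v2'."""
    # Replace every run of characters CloudFormation rejects with one hyphen.
    slug = re.sub(r"[^A-Za-z0-9]+", "-", branch).strip("-").lower()
    # The prefix starts with a letter, so the result does too.
    name = f"{prefix}-{slug}" if slug else prefix
    # Truncate, then drop any trailing hyphen the cut may have left behind.
    return name[:MAX_STACK_NAME].rstrip("-")
```

Because the mapping is deterministic, creating and tearing down a branch stack can be fully scripted, and a client can be pointed at a branch by name.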

Rule #3: All changes have to go through Sandbox.

Rule #4: Production is just a more powerful Sandbox.
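Rule #4 can be made concrete by deploying one template with per-environment parameters, so Sandbox and Production differ only in size. In the sketch below the parameter names, the AMI placeholder, and the values are hypothetical; `check_parity` raises if Production diverges from Sandbox in anything other than the sizing keys.

```python
# Hypothetical parameter set shared by every environment.
BASE = {"ImageId": "ami-PLACEHOLDER", "InstanceType": "m3.medium", "MinSize": 1, "MaxSize": 2}

# Only these keys are allowed to differ between environments.
SIZING_KEYS = {"InstanceType", "MinSize", "MaxSize"}

ENVIRONMENTS = {
    "sandbox": dict(BASE),
    "production": {**BASE, "InstanceType": "c4.xlarge", "MinSize": 3, "MaxSize": 12},
}


def differing_keys(a: dict, b: dict) -> set[str]:
    """Return the keys whose values differ between two parameter sets."""
    return {k for k in a.keys() | b.keys() if a.get(k) != b.get(k)}


def check_parity(envs: dict[str, dict]) -> None:
    """Raise if production differs from sandbox in anything other than sizing."""
    extra = differing_keys(envs["sandbox"], envs["production"]) - SIZING_KEYS
    if extra:
        raise ValueError(f"production diverges from sandbox in: {sorted(extra)}")


check_parity(ENVIRONMENTS)  # passes: only instance type and group sizes differ
```

Running such a check in CI keeps "exact copy of Production" true over time instead of only on the day the Sandbox was built.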

Disaster Recovery

● Multi-AZs
● Traffic routing
● Multi-Regions (S3 too)
● AutoScalingGroups Min:1 Max:1
● Off-site backups (VPN + Disks)
● RPO + RTO (Recovery Point Objective, Recovery Time Objective)
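The RPO bounds how much data you can afford to lose; the RTO bounds how long recovery may take. A minimal sketch of an RPO check, assuming backups are known only by their completion timestamps: the worst-case data loss is the largest gap between consecutive backups, including the gap from the last backup to now. The function names and the one-hour target in the usage line are illustrative.

```python
from datetime import datetime, timedelta


def worst_case_data_loss(backups: list[datetime], now: datetime) -> timedelta:
    """Largest window of writes a restore could have lost, over the whole history."""
    if not backups:
        raise ValueError("no backups recorded")
    points = sorted(backups) + [now]
    # A write made just after one backup is unprotected until the next one completes.
    return max(later - earlier for earlier, later in zip(points, points[1:]))


def meets_rpo(backups: list[datetime], now: datetime, rpo: timedelta) -> bool:
    """True if no gap between backups (or since the last one) exceeds the RPO."""
    return bool(backups) and worst_case_data_loss(backups, now) <= rpo


t0 = datetime(2015, 8, 6)
history = [t0, t0 + timedelta(minutes=30), t0 + timedelta(minutes=90)]
meets_rpo(history, t0 + timedelta(minutes=100), rpo=timedelta(hours=1))  # largest gap is 60 min
```

Checking RTO needs the other half of the exercise: actually restoring from those backups and timing it.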

Security

● MFA
● Public key distribution
● Root key rotation
● Private/Public Subnets
● ACLs/Security Groups
● Update AMIs
● Trusted Advisor!
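Security groups are easy to loosen while debugging and then forget. The sketch below audits ingress rules for anything open to the whole Internet on ports other than HTTP/HTTPS. The rule dictionaries use a simplified, hypothetical shape (not the exact EC2 API response), and the set of intentionally public ports is an assumption.

```python
import ipaddress

# Ports assumed to be intentionally public (HTTP/HTTPS).
PUBLIC_PORTS = {80, 443}


def world_open_violations(rules: list[dict]) -> list[dict]:
    """Return ingress rules that expose non-public ports to the whole Internet."""
    violations = []
    for rule in rules:
        net = ipaddress.ip_network(rule["cidr"])
        # A /0 prefix (0.0.0.0/0 or ::/0) matches every address.
        if net.prefixlen != 0:
            continue
        ports = set(range(rule["from_port"], rule["to_port"] + 1))
        if ports - PUBLIC_PORTS:
            violations.append(rule)
    return violations
```

An SSH rule open to `0.0.0.0/0` is flagged, while the same rule restricted to a private subnet CIDR is not, which matches the private/public subnet split above.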

Scaling

● Preemptive

● Automatic

● Vertically

● Horizontally

● Bottlenecks
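For automatic horizontal scaling, the core arithmetic is proportional: if a metric such as average CPU sits above its target, grow the group by the same ratio, clamped to the group's minimum and maximum. The function below is a simplified sketch of that calculation, similar in spirit to target-tracking scaling policies but not AWS's exact algorithm; the name and parameters are hypothetical.

```python
import math


def desired_capacity(current: int, metric: float, target: float,
                     min_size: int, max_size: int) -> int:
    """Scale the instance count proportionally to metric/target, within bounds."""
    if current <= 0 or target <= 0:
        raise ValueError("current and target must be positive")
    # Round up so the projected per-instance load ends at or below the target.
    wanted = math.ceil(current * metric / target)
    return max(min_size, min(max_size, wanted))


desired_capacity(4, metric=90.0, target=60.0, min_size=2, max_size=10)  # 4 * 90 / 60 = 6
```

Preemptive scaling fits the same model: raise `min_size` ahead of a known peak instead of waiting for the metric to climb. Vertical scaling changes the instance type instead, and the `max_size` cap is where a bottleneck elsewhere (e.g. a database) usually shows up first.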

Cost Control

● Tags
  ○ Role
  ○ Environment

● Cost Explorer
● Threshold alerting

● Share monthly
● Export to CSV
● Right-Scale (ASG)
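With Role and Environment tags in place, an exported cost CSV can be summarized per tag pair and compared against a threshold before the monthly share-out. The column names (`Role`, `Environment`, `Cost`) and the budget figures are hypothetical; a real billing or Cost Explorer export has its own column layout.

```python
import csv
import io
from collections import defaultdict


def cost_by_tags(csv_text: str) -> dict[tuple[str, str], float]:
    """Sum the Cost column per (Role, Environment) tag pair."""
    totals: dict[tuple[str, str], float] = defaultdict(float)
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Untagged spend stays visible instead of being silently dropped.
        key = (row["Role"] or "untagged", row["Environment"] or "untagged")
        totals[key] += float(row["Cost"])
    return dict(totals)


def over_budget(totals: dict[tuple[str, str], float],
                budgets: dict[tuple[str, str], float]) -> list[tuple[str, str]]:
    """Return tag pairs whose spend exceeds their (hypothetical) monthly budget."""
    return sorted(k for k, v in totals.items() if k in budgets and v > budgets[k])
```

A growing "untagged" bucket is itself a useful alert: it usually means something was launched outside the templates.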

4 rules of 5 nines.

● All changes have to be under version control

● No instance should be launched manually

● All changes are deployed to Sandbox first

● Production is just a more powerful Sandbox

Questions?

t: @martincozzi

e: [email protected]