the good parts / the hard parts
╔══════════════════════════════════════════╗
║ The Good Parts / The Hard Parts          ║
║                                          ║
║ Noah Zoschke                             ║
║ [email protected]                        ║
║ @nzoschke                                ║
║                                          ║
║ 03/01/2016                               ║
╚══════════════════════════════════════════╝
CONVOX
Open Source PaaS https://github.com/convox/rack
• Provision new infrastructure
• Update base operating system
• Add capacity with horizontal and vertical scaling
• Monitor health
• Handle failures automatically
• Create new apps
• Deploy new code
• Add capacity with horizontal and vertical scaling
• Configure secrets and services
• Debug problems and tune performance
• Monitor health
• Handle failures automatically
MAKE DEVOPS BORING
CONVOX OPEN SOURCE TOOLKIT ⟷ IAAS
Racks ⟷ ASG, CF, Dynamo, EC2, ECS, IAM, VPC
Apps ⟷ CF, ECS, ELB
Scale ⟷ ASG, CF, ECS
Environments ⟷ KMS, S3
Builds ⟷ ECR, S3
Logs ⟷ CloudWatch, Kinesis, Lambda
Metrics ⟷ CloudWatch Metrics
Notifications ⟷ SNS
$ convox install
Installing Convox (20160301181624-ps-docker)...
Created CloudWatch Log Group: convox-629-LogGroup-15GUSB6EN2K2X
Created ECS Cluster: convox-629-Cluster-MEMQU17FHAI
Created VPC Internet Gateway: igw-f976db9d
Created VPC: vpc-b97c50dd
Created DynamoDB Table: convox-629-builds
Created Kinesis Stream: convox-629-Kinesis-1W4W11098ATSZ
Created DynamoDB Table: convox-629-releases
Created Security Group: sg-a48528dc
Created Security Group: sg-a58528dd
Created Routing Table: rtb-d7fb0db0
Created Lambda Function: convox-629-CustomTopic-V5MWTXYOE3WK
Created KMS Key: EncryptionKey
Created VPC Subnet: subnet-5c2f8004
Created Elastic Load Balancer: convox-629
Created ECS TaskDefinition: ApiWebTasks
Created ECS TaskDefinition: ApiMonitorTasks
Created ECS Service: ApiMonitor
Created ECS Service: ApiWeb
Created AutoScalingGroup: convox-629-Instances-90LARL67DSMD
Created CloudFormation Stack: convox-629
Waiting for load balancer...
Logging in...
Success, try `convox apps`
CLI PROVISION NEW INFRASTRUCTURE
CLI CREATE + DEPLOY APPS
$ convox apps create httpd
Creating app httpd... CREATING
$ convox deploy
Deploying httpd
Creating tarball... OK
Uploading... OK
RUNNING: docker pull httpd
...
RUNNING: docker tag -f httpd httpd/web
RUNNING: docker tag -f httpd/web 568149725493.dkr.ecr.us-east-1.amazonaws.com/httpd-lokxbjnlam:web.BDDAIVOGDRV
RUNNING: docker push 568149725493.dkr.ecr.us-east-1.amazonaws.com/httpd-lokxbjnlam:web.BDDAIVOGDRV
...
Promoting RLDKBXUUMLV... UPDATING
$ convox apps
APP    STATUS
myapp  running
$ convox apps info
Name      myapp
Status    running
Release   REXIQURVKXE
Processes admin web
Hostname  myapp-1749418666.us-east-1.elb.amazonaws.com
Ports     web:80 web:443 admin:9322
$ convox ps
ID           NAME   RELEASE      CPU    MEM     STARTED       COMMAND
13254981d20  admin  REXIQURVKXE  0.47%  2.21%   17 hours ago  bin/admin
92d4a822c13  web    REXIQURVKXE  3.29%  20.68%  17 hours ago  bin/web
$ convox env
PASSWORD=xyzzy
$ convox logs
web: [01/Jan/2015:00:00:00] "GET / HTTP/1.1" 200 554 0.0027
web: [01/Jan/2015:00:00:00] "POST /users HTTP/1.1" 303 - 0.0049
$ convox rack update
Updating to 20160220003627
CLI MANAGE EVERYTHING
$ convox api get /apps/myapp/processes
[
  {
    "app": "myapp",
    "command": "bin/web",
    "cpu": 0.0329,
    "host": "10.0.3.135",
    "id": "13254981d20",
    "image": "registry.internal:5000/myapp-web:BHLRYHSMXNM",
    "memory": 0.2068,
    "name": "web",
    "ports": [
      "80:3000",
      "443:3001"
    ],
    "release": "REXIQURVKXE",
    "started": "2015-01-01T00:00:00Z"
  }
]
API WE DESERVE A REST FROM AWS APIS
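Because the rack API answers with plain JSON, a few lines of client code can rebuild a `convox ps`-style view from it. A minimal Python sketch, assuming the response shape shown above and assuming `cpu` and `memory` are fractions of 1.0 (0.0329 and 0.2068 line up with the 3.29% and 20.68% that `convox ps` reports for `web`). `ps_rows` is a hypothetical helper, not part of the Convox CLI.

```python
import json
from typing import List

# Response body copied from the `convox api get /apps/myapp/processes` output above.
RESPONSE = """
[
  {"app": "myapp", "command": "bin/web", "cpu": 0.0329, "host": "10.0.3.135",
   "id": "13254981d20", "image": "registry.internal:5000/myapp-web:BHLRYHSMXNM",
   "memory": 0.2068, "name": "web", "ports": ["80:3000", "443:3001"],
   "release": "REXIQURVKXE", "started": "2015-01-01T00:00:00Z"}
]
"""


def ps_rows(body: str) -> List[str]:
    """Render a processes JSON array as `convox ps`-style text rows.

    Assumes `cpu` and `memory` are fractions of 1.0 and scales them to percent.
    """
    rows = []
    for p in json.loads(body):
        rows.append(
            f"{p['id']:<12} {p['name']:<6} {p['release']:<12} "
            f"{p['cpu'] * 100:.2f}% {p['memory'] * 100:.2f}% {p['command']}"
        )
    return rows


if __name__ == "__main__":
    for row in ps_rows(RESPONSE):
        print(row)
```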
CONVOX OPEN SOURCE TOOLKIT ⟷ IAAS
Manage ⟷ CloudFormation
Schedule ⟷ EC2 Container Service
Glue ⟷ Lambda
INFRASTRUCTURE AUTOMATION with CloudFormation
PARAMETERIZED INFRASTRUCTURE
→ Ami            ami-c5fa5aae
→ InstanceCount  3
→ InstanceType   t2.small
→ Password       PuDpyqGTmxBN8ziGJ9UiMfrfGZfHDG
→ Tenancy        default
→ Version        20151204013151
→ VolumeSize     30
→ VPCCIDR        10.0.0.0/16
↑ Balancer           convox                                AWS::ElasticLoadBalancing::LoadBalancer
↑ Cluster            convox-Cluster-1JI343QBLSMYJ          AWS::ECS::Cluster
↑ DynamoBuilds       convox-builds                         AWS::DynamoDB::Table
↑ DynamoReleases     convox-releases                       AWS::DynamoDB::Table
↑ EncryptionKey      arn:aws:kms:...:key/d40c0153...       Custom::KMSKey
↑ IamRole            convox-IamRole-M1YZSNXNS1F7           AWS::IAM::Role
↑ Instances          convox-Instances-PCWRQ6OWDWTT         AWS::AutoScaling::AutoScalingGroup
↑ Kinesis            convox-Kinesis-C09RDWFR8NOE           AWS::Kinesis::Stream
↑ NotificationTopic  arn:aws:sns:...:convox-notifications  AWS::SNS::Topic
↑ Settings           convox-settings-13c91daqrj90z         AWS::S3::Bucket
↑ Vpc                vpc-b27ff8d6                          AWS::EC2::VPC
← Dashboard  convox-820546104.us-east-1.elb.amazonaws.com
← Kinesis    convox-Kinesis-C09RDWFR8NOE
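Treating the rack as a parameterized stack means an operation like `convox rack scale --type c3.xlarge --count 10` can plausibly boil down to a stack update that changes one or two parameters and carries every other one forward untouched. A minimal Python sketch of building that parameter list, assuming the parameter names shown above; the entries use the `ParameterKey` / `ParameterValue` / `UsePreviousValue` shape that CloudFormation's UpdateStack API accepts. `stack_update_params` is a hypothetical helper, and whether Convox constructs its updates this way is an assumption.

```python
from typing import Dict, List

# A subset of the rack parameters shown above, with their current values.
CURRENT: Dict[str, str] = {
    "Ami": "ami-c5fa5aae",
    "InstanceCount": "3",
    "InstanceType": "t2.small",
    "Tenancy": "default",
    "Version": "20151204013151",
    "VolumeSize": "30",
    "VPCCIDR": "10.0.0.0/16",
}


def stack_update_params(current: Dict[str, str],
                        changes: Dict[str, str]) -> List[Dict[str, object]]:
    """Build the `Parameters` list for a CloudFormation UpdateStack call.

    Changed keys get an explicit ParameterValue; every other existing key is
    carried forward with UsePreviousValue so the update leaves it alone.
    """
    unknown = set(changes) - set(current)
    if unknown:
        raise KeyError(f"not a stack parameter: {sorted(unknown)}")
    params: List[Dict[str, object]] = []
    for key in sorted(current):
        if key in changes and changes[key] != current[key]:
            params.append({"ParameterKey": key, "ParameterValue": changes[key]})
        else:
            params.append({"ParameterKey": key, "UsePreviousValue": True})
    return params


if __name__ == "__main__":
    # Roughly what scaling to 10 c3.xlarge instances would change.
    for p in stack_update_params(CURRENT, {"InstanceCount": "10", "InstanceType": "c3.xlarge"}):
        print(p)
```

The resulting list would be passed as `Parameters=` to boto3's `cloudformation.update_stack` together with `UsePreviousTemplate=True`, so only the instance parameters move and CloudFormation works out which resources that touches.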
PARAMETERIZED CONTAINERS
→ Cluster               convox-Cluster-1JI343QBLSMYJ
→ Cpu                   200
→ Environment           https://httpd-settings-1e3ej4u01z4bv.s3.amazonaws.com/releases/RSAQCOYHGPV/env
→ Key                   arn:aws:kms:us-east-1:901416387788:key/d40c0153-4a57-4d50-9ca0-99a974daca11
→ Release               RSAQCOYHGPV
→ VPC                   vpc-b27ff8d6
→ WebCommand
→ WebDesiredCount       1
→ WebImage              convox-820546104.us-east-1.elb.amazonaws.com:5000/httpd-web:BQIWNCMIYZG
→ WebMemory             256
→ WebPort80Balancer     80
→ WebPort80Certificate
→ WebPort80Host         42563
→ WebPort80Secure       No
↑ Balancer              httpd                                          AWS::ElasticLoadBalancing::LoadBalancer
↑ Kinesis               httpd-Kinesis-FO32SUUFLX24                     AWS::Kinesis::Stream
↑ LogsAccess            AKIAIFI65IDSEURPK62Q                           AWS::IAM::AccessKey
↑ LogsUser              httpd-LogsUser-96BAE2EL9TNL                    AWS::IAM::User
↑ ServiceRole           httpd-ServiceRole-19LN8R18BIVRW                AWS::IAM::Role
↑ Settings              httpd-settings-1e3ej4u01z4bv                   AWS::S3::Bucket
↑ WebECSService         arn:aws:ecs:...:service/httpd-web-SATOEEBOQNF  Custom::ECSService
↑ WebECSTaskDefinition  arn:aws:ecs:...:task-definition/httpd-web:6    Custom::ECSTaskDefinition
← BalancerWebHost    httpd-908645489.us-east-1.elb.amazonaws.com
← Kinesis            httpd-Kinesis-FO32SUUFLX24
← Settings           httpd-settings-1e3ej4u01z4bv
← WebPort80Balancer  80
APP MANIFEST ⟷ IAAS
web:                        Task Definition httpd-web:6
  command: bin/web          Service httpd-web-SATOEEBOQNF
  build: .                  Docker Image httpd-web:BQIWNCMIYZG
  ports:
    - 80:80                 ELB 80 : 52452 : 80
    - 443:80                ELB (SSL) 443 : 52452 : 80

worker:                     Task Definition httpd-worker:6
  command: bin/worker       Service httpd-worker-SHAOPEQONEF
  build: .                  Docker Image httpd-worker:BQIWNCMIYZG (same image, new tag)
  links:
    - redis                 REDIS_URL=rer45wxl0uj8jn6.1qae5u.ng.0001.usw2.cache.amazonaws.com:6379
    - rabbit                RABBIT_URL=httpd-1222973998.us-west-2.elb.amazonaws.com:5672

rabbit:                     Task Definition httpd-rabbit:6
  command: rabbitmq-server  Service httpd-rabbit-SPNFHGMWNUU
  image: rabbitmq           Docker Image httpd-rabbit:BQUWNCMIYZG
  ports:
    - 5672                  ELB (Internal) 5672 : 24324 : 5672

redis:
  image: convox/redis       AWS::ElastiCache::CacheCluster
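The manifest-to-stack mapping above is mostly mechanical naming: each process name becomes a prefix and each balancer port becomes a `Port<N>` group, as in `WebPort80Balancer` and `WebPort80Host` in the container parameters. A minimal Python sketch of deriving such names from a parsed manifest. The naming rules are inferred from the parameter list above and are assumptions, as are the `DesiredCount` 1 and `Memory` 256 defaults (copied from the sample stack); `upper_camel` and `app_parameters` are hypothetical helpers.

```python
from typing import Dict

# The web and worker entries from the manifest above, already parsed from YAML.
MANIFEST: Dict[str, dict] = {
    "web": {"command": "bin/web", "build": ".", "ports": ["80:80", "443:80"]},
    "worker": {"command": "bin/worker", "build": ".", "links": ["redis", "rabbit"]},
}


def upper_camel(name: str) -> str:
    """'web' -> 'Web', 'my-worker' -> 'MyWorker' (assumed naming rule)."""
    return "".join(part.capitalize() for part in name.replace("_", "-").split("-"))


def app_parameters(manifest: Dict[str, dict]) -> Dict[str, str]:
    """Derive per-process stack parameters named like the ones listed above.

    DesiredCount=1 and Memory=256 copy the sample stack's values as assumed
    defaults; host ports and certificates are left empty because they are
    assigned outside the manifest.
    """
    params: Dict[str, str] = {}
    for name, proc in manifest.items():
        prefix = upper_camel(name)
        params[f"{prefix}Command"] = proc.get("command", "")
        params[f"{prefix}DesiredCount"] = "1"
        params[f"{prefix}Memory"] = "256"
        for mapping in proc.get("ports", []):
            balancer = str(mapping).split(":")[0]   # "80:80" -> "80", 5672 -> "5672"
            port = f"{prefix}Port{balancer}"
            params[f"{port}Balancer"] = balancer
            params[f"{port}Certificate"] = ""
            params[f"{port}Host"] = ""
            params[f"{port}Secure"] = "No"
    return params


if __name__ == "__main__":
    for key, value in sorted(app_parameters(MANIFEST).items()):
        print(key, value)
```

Processes without `ports`, like `worker`, get no balancer parameters at all, which matches the manifest above giving only `web` and `rabbit` an ELB.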
GLUE Lambda
CLOUDFORMATION LAMBDA CUSTOM RESOURCES
┌────────────────┐  CREATE_IN_PROGRESS   ┌──────────┐             ┌─────────────┐
│ CloudFormation │──────────────────────▶│  Lambda  │────────────▶│ AWS KMS API │
└────────────────┘                       └──────────┘◀────────────└─────────────┘
        ▲                                     │
        │ CREATE_COMPLETE                     │
        │ OR                                  ▼
        │ CREATE_FAILED                  ┌──────────┐
        └────────────────────────────────│    S3    │
                                         └──────────┘
CloudFormation ──▶ Lambda:
  POST arn:aws:lambda:...
  {
    ResourceProperties: {
      Description: "Master Encryption",
      KeyUsage: "ENCRYPT_DECRYPT"
    }
  }
Lambda ──▶ AWS KMS API:
  aws kms create-key \
    --description "Master Encryption" \
    --key-usage ENCRYPT_DECRYPT
AWS KMS API ──▶ Lambda:
  200: OK
  400: LimitExceededException
  500: KMSInternalException
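In the diagram above, the Lambda function never answers CloudFormation directly: it writes a JSON result to a pre-signed S3 URL (the `ResponseURL` in the request), and CloudFormation turns that into CREATE_COMPLETE or CREATE_FAILED. A minimal Python sketch of a handler for a `Custom::KMSKey`-like resource, assuming the standard custom resource request and response fields (`RequestType`, `ResponseURL`, `StackId`, `RequestId`, `LogicalResourceId`, `Status`, `Reason`, `PhysicalResourceId`, `Data`). `handle_kms_key` and its injected `create_key` callable are hypothetical; this is not Convox's actual handler, and it only implements Create.

```python
import json
import urllib.request
from typing import Callable, Optional


def build_response(event: dict, status: str, physical_id: str,
                   reason: str = "", data: Optional[dict] = None) -> bytes:
    """JSON body CloudFormation expects at the pre-signed ResponseURL."""
    body = {
        "Status": status,                          # "SUCCESS" or "FAILED"
        "Reason": reason or "See CloudWatch Logs",
        "PhysicalResourceId": physical_id,
        "StackId": event["StackId"],
        "RequestId": event["RequestId"],
        "LogicalResourceId": event["LogicalResourceId"],
        "Data": data or {},
    }
    return json.dumps(body).encode("utf-8")


def put_response(url: str, body: bytes) -> None:
    """PUT the result to S3; CloudFormation picks it up from there."""
    req = urllib.request.Request(url, data=body, method="PUT",
                                 headers={"Content-Type": ""})
    urllib.request.urlopen(req)


def handle_kms_key(event: dict, create_key: Callable[..., dict],
                   put: Callable[[str, bytes], None] = put_response) -> None:
    """Hypothetical Custom::KMSKey handler. Update and Delete are only
    acknowledged here; a real resource must handle them properly."""
    props = event.get("ResourceProperties", {})
    physical_id = event.get("PhysicalResourceId", event["LogicalResourceId"])
    try:
        if event["RequestType"] == "Create":
            meta = create_key(Description=props.get("Description", ""),
                              KeyUsage=props.get("KeyUsage", "ENCRYPT_DECRYPT"))["KeyMetadata"]
            physical_id = meta["Arn"]
        body = build_response(event, "SUCCESS", physical_id, data={"Arn": physical_id})
    except Exception as exc:  # e.g. LimitExceededException, KMSInternalException
        body = build_response(event, "FAILED", physical_id, reason=str(exc))
    put(event["ResponseURL"], body)
```

In Lambda, the entry point would pass `boto3.client("kms").create_key` as `create_key`. If the function crashes before `put` runs, CloudFormation receives nothing and waits for its timeout, which is one source of the "sitting helpless" items below.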
• Writing templates
• DependsOn
• Transient internal errors
• UPDATE_ROLLBACK_FAILED and DELETE_FAILED
• Migrating custom resources to native resources
• Debugging Lambda
• Sitting helpless during a Lambda outage
• Waiting for things to provision
THE HARD PARTS CLOUDFORMATION + LAMBDA
THE HARD PARTS 100% CORRECTNESS
2800+ test clusters across 3 regions...
THE GREAT PARTS
$ convox rack update
$ convox rack scale --type c3.xlarge --count 10
$ convox rack update <previous release>
• Update convox API quickly
• Update cluster AMIs one at a time and with zero downtime
• Resize instances one at a time and with zero downtime
• Roll out new subsystems like ECR, CloudWatch Logs and NAT Gateways
• Fail toward leaving working infrastructure untouched
• Roll back to previous good state if something truly unexpected happens
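One common way to get CloudFormation to replace cluster instances one at a time is an `UpdatePolicy` with `AutoScalingRollingUpdate` on the Auto Scaling group (the `Instances` resource in the rack stack above). A sketch that emits such a fragment from Python; the specific values (batch of 1, one instance below the group size kept in service, wait for signals) are illustrative assumptions, not taken from Convox's template.

```python
import json


def rolling_update_policy(instance_count: int, pause: str = "PT15M") -> dict:
    """UpdatePolicy that replaces one instance per batch.

    MinInstancesInService is one below the group size, so in-service capacity
    never drops by more than one instance during an AMI or instance type change.
    """
    if instance_count < 1:
        raise ValueError("instance_count must be >= 1")
    return {
        "AutoScalingRollingUpdate": {
            "MaxBatchSize": 1,
            "MinInstancesInService": instance_count - 1,
            "PauseTime": pause,             # ISO 8601 duration; also the signal timeout
            "WaitOnResourceSignals": True,  # each new instance must signal success
        }
    }


def instances_resource(instance_count: int) -> dict:
    """`Instances` Auto Scaling group fragment; other required properties omitted."""
    return {
        "Type": "AWS::AutoScaling::AutoScalingGroup",
        "Properties": {"DesiredCapacity": str(instance_count)},
        "UpdatePolicy": rolling_update_policy(instance_count),
    }


if __name__ == "__main__":
    print(json.dumps({"Instances": instances_resource(3)}, indent=2))
```

With `WaitOnResourceSignals`, each new instance has to report success (for example with the `cfn-signal` helper) before the next one is replaced; if no signal arrives within `PauseTime`, the update fails and CloudFormation rolls the stack back to its previous state.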
CONTAINER AUTOMATION ECS
BATTERIES NOT INCLUDED
API
• Clusters
• TaskDefinitions
• Tasks
• Services
Bring Your Own
• Instances
• ecs-agent
• Load Balancers
• Logging
• Builds / Images
• Tools...
SCALING ONE APP ⟶ MANY SERVICES
Service Name                 Task Definition       Desired  Running
═══════════════════════════════════════════════════════════════════
myapp-clock-SVQQEUPGZPS      myapp-clock:106       1        1
myapp-scheduler-SSMOCJRAGOM  myapp-scheduler:183   1        1
myapp-web-SLHARAVBAWZ        myapp-web:119         2        2
myapp-runner-SEGBMHLWREH     myapp-runner:163      4        4
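Since every process type is its own ECS service, an app-level view means asking ECS about several services at once and lining them up. A minimal Python sketch that formats rows like the table above from a `DescribeServices` response; `serviceName`, `taskDefinition`, `desiredCount` and `runningCount` are fields of that API's service objects, while the account ID in the sample ARNs is a placeholder and `service_table` is a hypothetical helper.

```python
from typing import Dict, List

# Trimmed `describe_services(...)["services"]` entries; the account ID is a placeholder.
SERVICES: List[Dict] = [
    {"serviceName": "myapp-web-SLHARAVBAWZ",
     "taskDefinition": "arn:aws:ecs:us-east-1:123456789012:task-definition/myapp-web:119",
     "desiredCount": 2, "runningCount": 2},
    {"serviceName": "myapp-runner-SEGBMHLWREH",
     "taskDefinition": "arn:aws:ecs:us-east-1:123456789012:task-definition/myapp-runner:163",
     "desiredCount": 4, "runningCount": 4},
]


def service_table(services: List[Dict]) -> List[str]:
    """One row per service: name, family:revision, desired, running."""
    rows = []
    for svc in sorted(services, key=lambda s: s["serviceName"]):
        family_rev = svc["taskDefinition"].rsplit("/", 1)[-1]   # "myapp-web:119"
        lagging = svc["runningCount"] != svc["desiredCount"]
        rows.append(
            f"{svc['serviceName']:<28} {family_rev:<20} "
            f"{svc['desiredCount']:>7} {svc['runningCount']:>7}"
            + ("  <- converging" if lagging else "")
        )
    return rows


if __name__ == "__main__":
    print("\n".join(service_table(SERVICES)))
```

`DescribeServices` accepts at most 10 services per call, so an app with more process types than that would need to batch its lookups.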
DEBUGGING RUN, EXEC, SSH OVER WEB SOCKETS
$ convox run web bash
root@3e4160f0c4d0:/app#
$ convox ps
ID            NAME    RELEASE      CPU    MEM     STARTED      COMMAND
551967b75abd  web     RHQZEJZFCSD  0.39%  21.04%  2 hours ago  rails server -b 0.0.0.0
f5ec95c38f58  worker  RHQZEJZFCSD  0.00%  30.35%  2 hours ago  sidekiq
$ convox exec 551967b75abd bash
root@281d0a9c33a:/app#
$ convox exec 551967b75abd ps ax
  PID USER     TIME  COMMAND
    1 root     0:00  sh -c bin/web
    6 root     0:00  {web} /bin/sh bin/web
    9 root     0:00  unicorn master -c unicorn.rb
   11 root     0:00  unicorn worker[0] -c
GLUE Lambda
APP LOGS AGENT, DOCKER APIS, KINESIS, LAMBDA
EC2 Instance in ECS Cluster
  containers: app1 web.1, app2 worker.1
  ecs-agent ◀──▶ dockerd
  convox/agent ──▶ dockerd
    GET docker /events (create)
    GET ENV "Kinesis", "Process"
    GET Docker /logs?follow=1
  convox/agent ──▶ Kinesis
    PUT Kinesis /records
app1 web.1 logs    ──▶ app1 Kinesis: shard 1
app2 worker.1 logs ──▶ app2 Kinesis: shard 1, shard 2, ..., shard N
Lambda w/ EventSourceMapping (one handler per shard) ──▶ Syslog Server
  function(event, context) {
    event.Records.forEach(function(r) {
      // Kinesis record data arrives base64-encoded
      winston.info(Buffer.from(r.kinesis.data, 'base64').toString('utf8'))
    })
    context.done()
  }
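On the producer side of the diagram, the agent follows each container's logs and writes lines into that app's Kinesis stream. Kinesis `PutRecords` takes at most 500 records per request (and 5 MiB per request), each with `Data` and a `PartitionKey`. A minimal Python sketch of that batching, assuming lines are prefixed with the process name (as in the `web: ...` lines `convox logs` prints) and assuming the container ID serves as the partition key; both choices, and the `batches` and `ship` helpers, are hypothetical and not convox/agent's actual code.

```python
from typing import Dict, Iterable, Iterator, List

MAX_RECORDS = 500                # PutRecords per-request record limit
MAX_BYTES = 5 * 1024 * 1024      # PutRecords per-request payload limit


def batches(lines: Iterable[str], process: str, container_id: str,
            max_records: int = MAX_RECORDS,
            max_bytes: int = MAX_BYTES) -> Iterator[List[Dict]]:
    """Group log lines into PutRecords-sized lists of {Data, PartitionKey}."""
    batch: List[Dict] = []
    size = 0
    for line in lines:
        data = f"{process}: {line}".encode("utf-8")   # e.g. b"web: GET / ..."
        rec_size = len(data) + len(container_id.encode("utf-8"))
        if batch and (len(batch) >= max_records or size + rec_size > max_bytes):
            yield batch
            batch, size = [], 0
        batch.append({"Data": data, "PartitionKey": container_id})
        size += rec_size
    if batch:
        yield batch


def ship(client, stream: str, lines: Iterable[str], process: str, container_id: str) -> int:
    """Send lines with a boto3-style Kinesis client; returns the failed record count.

    A production agent would retry the entries whose result carries an ErrorCode.
    """
    failed = 0
    for batch in batches(lines, process, container_id):
        resp = client.put_records(StreamName=stream, Records=batch)
        failed += resp.get("FailedRecordCount", 0)
    return failed
```

Using the container ID as the partition key keeps each container's lines on one shard, so the Lambda consumer sees them in order; the trade-off is that one very chatty container can saturate a single shard.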
• Setting it all up: VPC, ASG, ELBs, health checks
• Managing instances
• Understanding its distributed state machine
• Rolling deploys
• Container scheduling and re-scheduling
• Capacity problems
• Collecting and making sense of logs and events
THE HARD PARTS ECS
• CloudFormation updates
• ECS Task Definition and Service updates
• On-instance observations
  • ecs-agent
  • dockerd
  • convox/agent
• App failures
  • crashes
  • port unresponsive
• Instance failures
  • filesystem lockups
  • kernel panics
• General EC2 / ASG health
THE HARD PARTS COMPLEX INTERACTIONS AND FEEDBACK LOOPS
[Diagram: an ECS cluster of three ASG instances, each running ecs-agent and dockerd, behind an api ELB and a rails ELB. Containers and memory reservations:]
api              128 MB
registry         256 MB
rails web.2     1024 MB
data worker.1    512 MB
rails web.3     1024 MB
data worker.2    512 MB
rails worker.2   256 MB
rails worker.3   256 MB
rails web.1     1024 MB
rails worker.1   256 MB
rails worker.4   256 MB
THE HARD PARTS CONTAINERS EXERCISE NEW KERNEL, NETWORK, FILESYSTEM PATHS
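The memory figures in the diagram are reservations the scheduler has to fit onto instances, and that fit is where capacity problems surface. A simplified first-fit-decreasing sketch in Python, assuming three instances with 2048 MB each available for containers (a made-up capacity) and ignoring CPU and ports. It illustrates the packing problem only; ECS's own placement logic is different and is not reproduced here.

```python
from typing import List, Tuple

# Memory reservations (MB) from the diagram above.
CONTAINERS: List[Tuple[str, int]] = [
    ("api", 128), ("registry", 256), ("rails web.2", 1024), ("data worker.1", 512),
    ("rails web.3", 1024), ("data worker.2", 512), ("rails worker.2", 256),
    ("rails worker.3", 256), ("rails web.1", 1024), ("rails worker.1", 256),
    ("rails worker.4", 256),
]


def first_fit_decreasing(containers: List[Tuple[str, int]], instances: int,
                         capacity_mb: int) -> Tuple[List[List[str]], List[str]]:
    """Place containers on instances by memory only; return (placement, unplaced)."""
    placed: List[List[str]] = [[] for _ in range(instances)]
    free = [capacity_mb] * instances
    unplaced: List[str] = []
    # Largest first; Python's sort is stable, so ties keep their listed order.
    for name, mem in sorted(containers, key=lambda c: c[1], reverse=True):
        for i in range(instances):
            if free[i] >= mem:
                placed[i].append(name)
                free[i] -= mem
                break
        else:
            unplaced.append(name)   # no single instance has room: a capacity problem
    return placed, unplaced


if __name__ == "__main__":
    placement, leftover = first_fit_decreasing(CONTAINERS, 3, 2048)
    for i, names in enumerate(placement):
        print(f"instance {i}: {names}")
    print("unplaced:", leftover)
```

The 5504 MB above fits into 6144 MB, but adding one more 1024 MB web container (6528 MB total) leaves `rails worker.4` and `api` unplaced: the big container takes the room and the failure lands on whatever is placed last.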
THE GREAT PARTS
$ convox deploy
• Configure desired container formation with one API call
• Watch extremely sophisticated automation execute it
• Assure new containers start and are healthy
• Drain old containers
• Trust automation will try its hardest to keep it running
• Re-schedule on observed failures
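The start-new, drain-old sequence above runs inside bounds set by the ECS service's deployment configuration: `minimumHealthyPercent` and `maximumPercent` of the desired count. A minimal Python sketch of that arithmetic (the ECS docs round the floor up and the ceiling down). `first_step` is a simplification that only shows what the scheduler is allowed to do first, not the ECS scheduler's actual sequencing, and which values Convox sets is not stated here.

```python
from typing import Tuple


def deployment_bounds(desired: int, minimum_healthy_percent: int,
                      maximum_percent: int) -> Tuple[int, int]:
    """(floor, ceiling) on running tasks while an ECS service deploys.

    Floor rounds up, ceiling rounds down; integer arithmetic avoids
    float rounding surprises.
    """
    floor = -(-desired * minimum_healthy_percent // 100)   # ceiling division
    ceiling = desired * maximum_percent // 100
    return floor, ceiling


def first_step(desired: int, minimum_healthy_percent: int,
               maximum_percent: int) -> Tuple[int, int]:
    """(new tasks that may start, old tasks that may stop) right away."""
    floor, ceiling = deployment_bounds(desired, minimum_healthy_percent, maximum_percent)
    return max(ceiling - desired, 0), max(desired - floor, 0)


if __name__ == "__main__":
    print(deployment_bounds(2, 100, 200), first_step(2, 100, 200))
    print(deployment_bounds(2, 50, 100), first_step(2, 50, 100))
```

For a service with desired count 2, like `myapp-web` earlier, 100/200 lets ECS start two new tasks before stopping either old one, while 50/100 forces it to stop one old task first and briefly run at half capacity.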
CONVOX MAKE DEVOPS BORING
[email protected] @nzoschke
Discuss these techniques and get involved
GitHub https://github.com/convox
Slack http://invite.convox.com/
thanks!
(we are hiring)