© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Development Workflows with Docker and Amazon ECS
Jon Todd, Chief Architect, Okta
Tim Secor, Manager of Developer Productivity, Okta
Danielle Greshock, Manager, Solutions Architecture, AWS
CON302
December 1, 2016
What to Expect from the Session
• Review the CI/CD Pipeline
• How would you use containers with CI/CD?
• Okta Engineering: How they work and ship code
• CI with Docker and ECS
The Continuous Everything… Nirvana
Goal, Design, Develop, Deploy, Test, Run and monitor
Continuous integration
Continuous delivery
Continuous deployment
Continuous feedback
Virtual machine Container
Why Use Containers for Continuous Delivery?
• Roll out features as quickly as possible
• Predictable and reproducible environment
• They are immutable! They will run the same in every
environment
• Fast feedback
The Lifecycle:
Stage 1 – Source
Docker and Docker Toolbox
• Docker (Linux kernel 3.10 or later)
• Docker Toolbox or Docker Beta (OS X, Windows)
• Define app environment with Dockerfile
Dockerfile
FROM ruby:2.2.2
RUN apt-get update -qq && apt-get install -y build-essential libpq-dev
RUN mkdir -p /opt/web
WORKDIR /tmp
ADD Gemfile /tmp/
ADD Gemfile.lock /tmp/
RUN bundle install
ADD . /opt/web
WORKDIR /opt/web
Docker Compose
Define and run multi-container applications:
1. Define app environment with Dockerfile
2. Define services that make up your app in docker-compose.yml
3. Run docker-compose up to start and run entire app
The Lifecycle:
Stage 2 – Build
Containers as Build Execution Environment
Containers as Build Artifacts
Amazon EC2 Container Registry
• Security
• IAM resource-based policies
• CloudTrail audit logs
• Images encrypted in transit and at rest
• Easily manage & deploy images
• Tight integration with ECS
• Integration with Docker toolset
• AWS Management Console & AWS CLI
• Reliability & performance
• S3-backed
The Lifecycle:
Stage 3 – Test
Running Tests Inside a Container
Usual Docker commands available within your test
environment
Run the container with the commands necessary to
execute your tests, e.g.:
docker run web bundle exec rake test
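A minimal sketch of how a build script might wrap the command above, assuming the image is tagged `web` as on the slide. The helper names `docker_test_command` and `run_tests` are hypothetical; `--rm` and `-e` are standard `docker run` flags.

```python
import shlex
import subprocess


def docker_test_command(image, test_cmd, env=None):
    """Build the argv that runs a test command in a throwaway container."""
    argv = ["docker", "run", "--rm"]  # --rm removes the container afterwards
    for key, value in sorted((env or {}).items()):
        argv += ["-e", f"{key}={value}"]  # pass configuration at run time
    argv.append(image)
    argv += shlex.split(test_cmd)
    return argv


def run_tests(image, test_cmd, env=None):
    """Run the tests; the container's exit code becomes the build result."""
    return subprocess.run(docker_test_command(image, test_cmd, env)).returncode
```

Calling `run_tests("web", "bundle exec rake test")` would require a local Docker daemon and a built `web` image; a non-zero return value should fail the build.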
Running Tests Against a Container
Start a container running in detached mode with an
exposed port serving your app
Run browser tests or other black box tests against the
container, e.g., headless browser tests
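Black box tests usually need to wait until the detached container is actually accepting connections. A minimal sketch of that wait, assuming the container was started with something like `docker run -d -p 8080:80 web` (the port numbers are hypothetical) and using only the Python standard library:

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=0.5):
    """Poll until a TCP connection to host:port succeeds or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means the app inside the container is listening.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)  # not up yet; retry until the deadline
    return False
```

A test runner could call `wait_for_port("localhost", 8080)` and only then start the headless browser suite.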
The Lifecycle:
Stage 4 – Deploy
Amazon EC2 Container Service
• Highly scalable container management service
• Easily manage clusters for any scale
• Flexible container placement
• Integrated with other AWS services
• Extensible
• ECS concepts
• Cluster and container instances
• Task definition and task
AWS Elastic Beanstalk
• Deploy and manage applications without worrying about
the infrastructure
• Elastic Beanstalk manages your database, Elastic Load
Balancing, ECS cluster, monitoring, and logging
• Docker support
• Single container (on EC2)
• Multi container (on ECS)
Amazon ECS CLI
• Easily create ECS clusters & supporting resources
such as EC2 instances
• Run Docker Compose configuration files on ECS
• Available today – http://amzn.to/1jBf45a
Continuous Delivery
Workflows
Continuous Delivery To ECS with Jenkins
1. Code push triggers build
2. Build image from sources
3. Run test on image
4. Push image to Docker registry
5. Update service
6. Pull image
Continuous Delivery To ECS with Jenkins
Easy deployment
Developers – Merge into master, done!
Jenkins build steps
Trigger via webhooks, monitoring, Lambda
Build Docker image via Build and Publish plugin
Push Docker image into registry
Register updated job with ECS API
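The final build step, registering the update with the ECS API, might look roughly like the sketch below. It assumes `ecs` is a boto3 ECS client (`boto3.client("ecs")`) and that `task_def` contains only parameters accepted by `register_task_definition` (for example `family`, `containerDefinitions`, `volumes`). The function names and any cluster, service, or container names are hypothetical.

```python
import copy


def with_new_image(task_def, container_name, image):
    """Return a copy of a task definition with one container's image replaced."""
    new_def = copy.deepcopy(task_def)
    matched = False
    for container in new_def["containerDefinitions"]:
        if container.get("name") == container_name:
            container["image"] = image
            matched = True
    if not matched:
        raise KeyError(f"no container named {container_name!r}")
    return new_def


def deploy(ecs, cluster, service, task_def):
    """Register a new task definition revision and point the service at it."""
    response = ecs.register_task_definition(**task_def)
    arn = response["taskDefinition"]["taskDefinitionArn"]
    ecs.update_service(cluster=cluster, service=service, taskDefinition=arn)
    return arn
```

Keeping `with_new_image` free of API calls makes it easy to unit test without AWS credentials.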
Continuous Delivery To ECS with CodePipeline
1. Code push
triggers pipeline
2. Lambda function
creates EC2 instance
3. Image is built and
pushed to ECR
4. Lambda function
terminates EC2 instance
5. Lambda function deploys new task revision to ECS
Continuous Delivery To ECS with CodePipeline
• Lambda custom actions
• Create and terminate EC2 instance
• Update ECS service
• EC2 instance uses user data to build an image and push
it to ECR
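A Lambda custom action has to report its result back to CodePipeline. The sketch below shows the "Update ECS service" action under stated assumptions: `ecs` and `codepipeline` are boto3 clients, the job ID is read from the `CodePipeline.job` key of the invocation event, and `make_handler` plus the cluster, service, and task definition values are hypothetical. It is not the handler used in the talk.

```python
def make_handler(ecs, codepipeline, cluster, service, task_definition):
    """Build a Lambda handler that updates an ECS service for a pipeline job."""

    def handler(event, context):
        job_id = event["CodePipeline.job"]["id"]
        try:
            ecs.update_service(
                cluster=cluster, service=service, taskDefinition=task_definition
            )
        except Exception as exc:
            # Tell CodePipeline the action failed so the pipeline stops here.
            codepipeline.put_job_failure_result(
                jobId=job_id,
                failureDetails={"type": "JobFailed", "message": str(exc)},
            )
            return "failed"
        codepipeline.put_job_success_result(jobId=job_id)
        return "succeeded"

    return handler
```

Injecting the clients keeps the handler testable; in a real function they would be created once at module load.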
Continuous Delivery To ECS with Shippable
About Okta
Millions of People Use Okta Every Day
An identity platform for developers
1. Connect to any data source
© Okta and/or its affiliates. All rights reserved.
An identity platform for developers
2. Customizable login w/ MFA
An identity platform for developers
3. Support all application types w/
modern identity standards
An identity platform for developers
Learn more at: developer.okta.com
The case for ECS & Docker
The problem
Inspired by: http://dev2ops.org/2010/02/what-is-devops/
[Diagram: Dev ("I want change") and Ops ("I want stability") separated by a wall of turmoil]
[Diagram: Dev and Ops with a domain boundary; container frameworks, cluster scheduler, and continuous integration]
Options
Container frameworks: LXC
Cluster schedulers: Amazon ECS
Okta’s CI with ECS
Okta Engineering
Okta Engineering—How Do We Work, How Do
We Ship Our Code?
• 200 engineers, split into teams with embedded
specialists
• 1 week sprints, and deploy to production weekly
• Capability to do more than one hotfix per day at
customers’ request or for bugs found in CI or pre-prod
• Every merge to master is a potential release candidate
Okta Engineering—How Do We Test Our
Code?
• Every topic branch goes through the same amount of
rigor in testing as release candidates.
• Passing automated tests is enforced at commit time.
• Largest repo: 33K tests, takes 60 minutes (22 parallel
runs)
• Smallest repo: 100 tests, 5 minutes
• The Developer Productivity team is responsible for
supporting engineering.
Challenge of Developer Productivity Team
• Developer experience: Developers expect fast turnaround time and reliable results
• Quality: We need to run all the tests required to guarantee quality
• Cost: We need to run an infrastructure which is as cost-effective as possible
• Cloud first: We aim to use cloud services first, wherever possible
Problems
CI Using Open Source, Monolithic Applications
Vision
• Clean testing environments
  • Isolate test environments from others, parallel and serial runs
• Dynamic worker scaling
  • Workers should survive the loss of their build server
  • Worker pool should scale quickly
  • Number of workers should not affect memory footprint of build server
• Spot Instances for cost
  • Run our services for cheaper rates, as we have many short lived tasks, and could certainly handle a few failures
• Versioned testing
  • Enable testing of infrastructure changes in topic branches
• Improved queuing system
  • Should survive build server reboots
  • Shouldn’t be tied to specific workers or build servers
  • Centralized
  • Should have good visibility
  • Re-queuing of lost tasks
• Less infrastructure flakiness
  • Push testing and creation of test machines to developers
• The correct privileges, to maintain security
  • Launch tasks in secure environments
Solutions
Custom Reporting
ECS and Docker
• AWS + Java app tailored to Okta process
• Immutable and disposable build workers—created for
one-time use, destroyed when job is done
• Near ZERO cost on weekends, scales with load
• ECS allows us to maximize usage of EC2 instances
• Same containers for multiple types and numbers of
builds
• Same AMI can run multiple Docker images
Amazon ECS
IAM separation per service
• Either run one service per cluster or use the new IAM roles for ECS tasks functionality
Sharing the docker daemon to allow running Docker within
Docker
Pre-fetching large data blobs and making them available
on the hosts is an option
Multiple containers: MySQL, Redis, kinesalite
Docker Update
• Update Dockerfile and our CI system builds the new image,
uploading it to our repository
• Update task definition for cluster updates
Docker Conventions
• Dockerfiles live with project code, versioned together
• docker-compose used for development, so a clone plus
build will have a full service running locally
• Single repo for library and third-party service definitions
• Secrets or any form of config NEVER baked in
containers
• Start from minimal, audited base OS
• Strict rules around “FROM” clause
• Build owns creating immutable version and publishing
Docker Build Process
Task Definitions
{
  "taskDefinitionArn": "arn:aws:ecs:us-east-1:262205085595:task-definition/base-container-box-task:1",
  "containerDefinitions": [
    {
      "memory": 15000,
      "essential": true,
      "mountPoints": [
        {
          "containerPath": "/usr/bin/docker",
          "sourceVolume": "docker_daemon",
          "readOnly": null
        },
        {
          "containerPath": "/var/run/docker.sock",
          "sourceVolume": "docker_socket",
          "readOnly": null
        }
      ]
    }
  ],
  "volumes": [
    {
      "host": {
        "sourcePath": "/var/run/docker.sock"
      },
      "name": "docker_socket"
    },
    {
      "host": {
        "sourcePath": "/usr/bin/docker"
      },
      "name": "docker_daemon"
    }
  ],
  "family": "base-container-box-task"
}
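Each `sourceVolume` in a container's `mountPoints` must name an entry in the task-level `volumes` list; that link is what shares the host's Docker binary and socket with the container. A small sketch of a pre-registration check, where `undeclared_volumes` is a hypothetical helper and `TASK_DEF` mirrors only the fields shown on the slide:

```python
TASK_DEF = {
    "family": "base-container-box-task",
    "containerDefinitions": [
        {
            "memory": 15000,
            "essential": True,
            "mountPoints": [
                {"containerPath": "/usr/bin/docker", "sourceVolume": "docker_daemon"},
                {"containerPath": "/var/run/docker.sock", "sourceVolume": "docker_socket"},
            ],
        }
    ],
    "volumes": [
        {"name": "docker_socket", "host": {"sourcePath": "/var/run/docker.sock"}},
        {"name": "docker_daemon", "host": {"sourcePath": "/usr/bin/docker"}},
    ],
}


def undeclared_volumes(task_def):
    """Return sourceVolume names used by mount points but missing from volumes."""
    declared = {volume["name"] for volume in task_def.get("volumes", [])}
    missing = []
    for container in task_def.get("containerDefinitions", []):
        for mount in container.get("mountPoints", []):
            if mount["sourceVolume"] not in declared:
                missing.append(mount["sourceVolume"])
    return missing
```

An empty result means every mount point resolves to a declared host volume.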
Clean Testing Environments
• Docker images
• Nearly instant machine refresh
• Easy for users to create and upload images that have
been tested to work locally
• Efficient machine use
• ECS with ECR and private repository back end
Dynamic Worker Scaling
[Diagram: SQS, SNS, and Lambda; Lambda functions handle scaling and bin packing of jobs onto ECS]
Dynamic Worker Scaling
Lambda allocates jobs using bin packing
This is one of the changes we had to make in order to use
ECS for long-running tasks, rather than services spread
across many stateless instances
Lambda disconnects unneeded nodes from the cluster, allowing
them to self-terminate when they are idle
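The talk does not say which bin packing algorithm the Lambda function uses. As an illustration only, the sketch below uses first-fit decreasing on memory: jobs are sorted largest first and each goes onto the first instance with enough free memory, so lightly used instances drain and can be removed. The job names and sizes are hypothetical.

```python
def pack_jobs(jobs, instance_memory):
    """First-fit decreasing: map job names to instances by memory requirement.

    jobs is a dict of job name -> memory needed; returns a list of instances,
    each a list of the job names placed on it.
    """
    instances = []  # each entry tracks remaining memory and assigned jobs
    for name, memory in sorted(jobs.items(), key=lambda kv: (-kv[1], kv[0])):
        if memory > instance_memory:
            raise ValueError(f"job {name!r} does not fit on any instance")
        for instance in instances:
            if instance["free"] >= memory:
                instance["free"] -= memory
                instance["jobs"].append(name)
                break
        else:
            # No existing instance has room, so a new one is needed.
            instances.append({"free": instance_memory - memory, "jobs": [name]})
    return [instance["jobs"] for instance in instances]
```

With four jobs of 15000, 15000, 8000, and 7000 on instances of 30000, this fills two instances, where spreading one job per instance would occupy four.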
Spot Instances
• We use Spot Instances across all Availability Zones
• Manually switch between On-Demand and Spot
Instances 3 times per week during Spot price spikes
• We are planning on moving to Spot Fleet soon
• Bid price is set to the On-Demand price, so we lose build workers
whenever the Spot price rises above it
• 4000-6000 instance hours per day, about 1500 Spot
losses per week
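To put the last bullet in perspective, the figures can be turned into a loss rate per instance hour. This is plain arithmetic on the numbers above, assuming the daily instance hours hold for all seven days of the week:

```python
def spot_loss_rate(losses_per_week, instance_hours_per_day):
    """Fraction of instance hours that end in a Spot loss."""
    return losses_per_week / (instance_hours_per_day * 7)


low = spot_loss_rate(1500, 6000)   # 1500 / 42000
high = spot_loss_rate(1500, 4000)  # 1500 / 28000
print(f"{low:.1%} to {high:.1%} of instance hours end in a Spot loss")
```

That is roughly one loss for every 19 to 28 instance hours, which short-lived build tasks that can be re-queued should be able to absorb.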
Versioned Jobs
Scripts checked into repositories make the transition to Docker jobs
easy
Versioned Jobs with ECS
• Versioned build and test scripts can now be run in
versioned Docker containers, using versioned task
definitions
• Creates extreme flexibility
• CloudFormation allows us to stand up whole new
clusters with all different versions in a matter of minutes
for long term testing
ECS + Docker Problems
• Docker containers not launching
• ECS agent failing
• Docker containers stopping
• Incompatibility with certain services
• Docker OS availability
• Cleanup - AWS has made this configurable
• Image size
Amazon Web Services
• EC2
• SQS
• Lambda
• ECS
• S3
• RDS
• Amazon Kinesis
• Spot Instances
• ECR
• CloudFormation
• SNS
• CloudWatch
• CloudTrail
Building CI with Amazon Web Services
Future
Expand Use
• Use ECS for more services
• Allow developers to control their test suites and Docker
images more directly
• Developer environments
• Use Docker for local long running services
• Use a VM running the same version OS
• Remote updates to keep it in line with CD system
• Aim to enable running CD containers right out of the box
ECS Services In Production
Requirements
• Support for our multi-AZ & multi-region architecture
• Compliance – SOC2 type 2, HIPAA, ISO 27001, FedRAMP
• Least-privilege principle - independent IAM roles per service
• Host to host encryption
• Deployment support for:
• Rollback
• Canary
• Blue-green
• 0-downtime deployments
0-Downtime Testing
https://github.com/jontodd/aries
Test Assumptions
• ECS config
  • Agent version 1.11.0
  • Docker version 1.11.2
• Cluster config
  • 8 instances backed by ASG
• ASG config
  • 8 instances across 3 AZs
  • Default termination policy
  • 5 min health check grace period
• ELB
  • Timeout 4s
  • Interval 5s
  • Unhealthy threshold 2
  • Healthy threshold 10
  • Enable connection draining 300s timeout
• Load generation
  • 16 threads
  • Throughput
    • Interactive ➔ 490 r/s
    • 10s long poll ➔ 1.5 r/s
Operation                                          Interactive errors          Long-poll errors
                                                   (~70ms latency, 490rps)     (~10s latency, 1.5rps)
Upsize ECS service 4 → 8                           0                           0
Downsize ECS service 8 → 4                         0                           0
Deploy ECS service – 50% min healthy               0                           0
Stop task*                                         0                           0
Downsize Auto Scaling group                        0                           0
Terminate EC2 instance                             0                           0
Stop Docker daemon (service docker stop)*          0                           0
Stop EC2 instance**                                0                           0
Kill Docker container (docker kill <containerId>)* 2                           2
Fail health check                                  450                         5

* No intention of running operation in practice
** Caused inconsistent state
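A rough back-of-envelope for the "Fail health check" row, not an analysis from the talk. It assumes load is spread evenly over 8 targets and that every request routed to the failing target errors until the ELB marks it unhealthy, which takes about interval × unhealthy threshold. Both function names are hypothetical.

```python
def detection_window_s(interval_s, unhealthy_threshold):
    """Approximate seconds before the ELB marks a failing target unhealthy."""
    return interval_s * unhealthy_threshold


def requests_at_risk(total_rps, targets, window_s):
    """Requests sent to one target during the detection window."""
    return total_rps / targets * window_s


window = detection_window_s(5, 2)               # 5s interval, threshold 2
interactive = requests_at_risk(490, 8, window)  # interactive traffic
long_poll = requests_at_risk(1.5, 8, window)    # long-poll traffic
print(window, interactive, long_poll)
```

Under these assumptions the estimate for interactive traffic lands in the same order of magnitude as the 450 errors measured, which suggests the ELB health check interval and threshold are the settings to tune.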
Workflow
[Diagram components: Git repo (Dockerfile, docker_compose.yml), CI pipeline, promoted artifacts, Docker registry (Artifactory), application YAML, Conductor (bastion ECS controller), ECS cluster with ECS service and ECS canary service, EC2 instances in an Auto Scaling group with launch config, ELB]
[Diagram environments: Dev, then Test / Preview / Production]
[Diagram labels: "Deploy new version", "Images pulled when tasks start"]
Application definition
• Developers define YAML for
their application
• Deploy time configuration is
supplied to the ECS task
definition
• Secrets are pulled by the
application at startup
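One way to picture the split between deploy-time configuration and secrets: the deploy tool merges plain settings into the container's `environment` list (the name/value format ECS container definitions use) and refuses anything that looks like a secret, since those are pulled by the application at startup. A sketch only; the function name, the key-name heuristic, and the example settings are assumptions, not Okta's tooling.

```python
SECRET_HINTS = ("SECRET", "PASSWORD", "TOKEN", "KEY")  # hypothetical heuristic


def container_environment(app_defaults, deploy_config):
    """Merge app defaults with deploy-time config into an ECS environment list."""
    merged = {**app_defaults, **deploy_config}  # deploy-time values win
    for name in merged:
        if any(hint in name.upper() for hint in SECRET_HINTS):
            raise ValueError(f"{name} looks like a secret; fetch it at startup instead")
    return [{"name": name, "value": str(value)} for name, value in sorted(merged.items())]
```

For example, `container_environment({"LOG_LEVEL": "info"}, {"LOG_LEVEL": "debug", "REGION": "us-east-1"})` yields entries for `LOG_LEVEL=debug` and `REGION=us-east-1`.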
Demo
Feature requests
• Dynamic port mapping (Application load balancing)
• Service autoscaling
• Per container IAM roles
• Per-container security groups
• Bin-packing scheduler
Lessons learned
• /etc/ecs/ecs.config
  • ECS_ENGINE_TASK_CLEANUP_WAIT_DURATION for forensics (default 1hr)
  • ECS_LOGLEVEL=debug
• Tune ELB health check
• Docker 1.10 for security enhancements
• Canary and blue/green: run a separate service attached to the same ELB
• ECS is incredibly easy to get up and running
• The ecosystem is changing quickly
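The agent settings from the first lesson live in `/etc/ecs/ecs.config` as `KEY=value` lines. A minimal sketch of rendering that file from instance bootstrap code; `render_ecs_config`, the cluster name, and the `6h` retention value are hypothetical choices for illustration.

```python
def render_ecs_config(settings):
    """Render agent settings as the KEY=value lines used in /etc/ecs/ecs.config."""
    return "".join(f"{key}={value}\n" for key, value in settings.items())


config_text = render_ecs_config(
    {
        "ECS_CLUSTER": "ci-workers",  # hypothetical cluster name
        "ECS_LOGLEVEL": "debug",
        # Keep stopped task containers around longer for forensics.
        "ECS_ENGINE_TASK_CLEANUP_WAIT_DURATION": "6h",
    }
)
print(config_text)
```

The agent typically needs a restart to pick up changes to this file.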
Remember to complete
your evaluations!