AWS Webinar 201: Designing Scalable, Available & Resilient Cloud Applications
DESCRIPTION
Applications have become a vital part of everyday life in nearly every part of the world. Wherever we are, we interact with applications, whether by using a mobile phone, withdrawing money from an automated bank machine, or shopping online. Because applications are so integral to our daily lives, a great deal of work goes into ensuring that they remain scalable, operational and available. Cloud-native applications are designed for failure, automation, horizontal scalability, anti-fragility, security, cost optimization and resilience. Join this session to learn best practices for designing and implementing cloud-ready or cloud-native applications and workloads.
Reasons to attend:
• Learn practical design patterns and anti-patterns, do's and don'ts.
• Learn techniques to improve your operational efficiency, cost control, security posture, availability and scalability.
Who should attend:
• Architects
• Developers
• System administrators
• DevOps practitioners
• CTOs
TRANSCRIPT
AWS 201
Designing Scalable, Available & Resilient Cloud Applications
Markku Lepistö, Technology Evangelist, @markkulepisto
Housekeeping
• Presentation: ~45 mins
• Q&A: use the questions panel during the presentation
• Reminder: fill in the survey!
AWS Global Presence
10 Regions
26 Availability Zones
52 Edge Locations
SCALABLE, AVAILABLE, RESILIENT CLOUD APPLICATIONS
What your users want…
Fast, performant experience
Lots of new features all of the time
Always on, accessible anywhere
Personalized and rich application
Powerful cloud applications
How?
Building powerful cloud applications
Rule 1: Service all requests
Rule 2: Service requests as fast as possible
Rule 3: Handle requests at any scale
Rule 4: Simplify architecture with services
Rule 5: Automate operational management
Rule 6: Design for failure
Rule 1: Service all requests
a) Make sure requests get to your ‘front door’
[Diagram: Request → DNS → Application → Data]
Clients can’t resolve you? Then everything behind DNS (application and data) is irrelevant.
Amazon Route 53: “100% Available” SLA
Global: served from AWS global edge locations for fast and reliable domain name resolution
Scalable: automatically scales based upon query volumes
Latency-based routing: supports resolution of endpoints based upon latency, enabling multi-region application delivery
Integrated: integrates with other AWS services, allowing Route 53 to front load balancers, S3 and EC2
Secure: integrates with IAM, giving fine-grained control over DNS record access
http://aws.amazon.com/route53/sla
b) Make sure you open the door when they arrive
[Diagram: Route 53 directs requests to Elastic Load Balancers in multiple Regions, each spreading traffic across instances in several Availability Zones]
Elastic Load Balancing, Multi-Availability Zone, Multi-Region
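To make “open the door” concrete, here is a simplified Python model of what a load balancer does conceptually. It spreads requests across instances in several Availability Zones and skips any instance that has failed its health check. Round-robin is only one possible policy. The `Instance` class, the AZ names and the health flags are hypothetical. Elastic Load Balancing is a managed service and does not expose code like this.

```python
from dataclasses import dataclass
from itertools import cycle


@dataclass
class Instance:
    # Hypothetical instance record: an ID, the AZ it runs in, and its last health-check result.
    instance_id: str
    availability_zone: str
    healthy: bool = True


class RoundRobinBalancer:
    """Toy round-robin balancer that only routes to healthy instances."""

    def __init__(self, instances):
        self.instances = list(instances)
        self._ring = cycle(self.instances)

    def route(self):
        # Walk the ring at most once; skip instances that failed their health check.
        for _ in range(len(self.instances)):
            candidate = next(self._ring)
            if candidate.healthy:
                return candidate
        raise RuntimeError("no healthy instances in any Availability Zone")


pool = [
    Instance("i-a1", "ap-southeast-1a"),
    Instance("i-b1", "ap-southeast-1b"),
    Instance("i-a2", "ap-southeast-1a"),
]
balancer = RoundRobinBalancer(pool)
pool[1].healthy = False  # simulate a failed instance in ap-southeast-1b
print([balancer.route().instance_id for _ in range(4)])  # ['i-a1', 'i-a2', 'i-a1', 'i-a2']
```

Because instances in more than one AZ stay in the pool, losing one instance (or a whole AZ) reduces capacity but does not close the door.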
c) Have the data to form a response
[Diagram: Elastic Load Balancers and Availability Zones in two Regions, backed by replicated databases]
Multi-AZ RDS: synchronous, intra-region master/slave replication
Asynchronous, cross-region read replicas
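The difference between the two replication modes comes down to when a write is acknowledged. The sketch below is a simplified, hypothetical model, with in-memory dictionaries standing in for databases. It is not how Amazon RDS is implemented. A synchronous standby is updated before the write returns. An asynchronous read replica only catches up when replication runs, so reads from it can be stale.

```python
class Database:
    """Hypothetical in-memory stand-in for a database node."""

    def __init__(self):
        self.rows = {}


class ReplicatedPrimary:
    def __init__(self):
        self.primary = Database()
        self.sync_standby = Database()   # e.g. the Multi-AZ standby in another AZ
        self.async_replica = Database()  # e.g. a cross-region read replica
        self._pending = []               # changes not yet shipped to the replica

    def write(self, key, value):
        # Synchronous replication: the standby is updated before the write is acknowledged.
        self.primary.rows[key] = value
        self.sync_standby.rows[key] = value
        # Asynchronous replication: the change is only queued for later shipping.
        self._pending.append((key, value))
        return "ack"

    def replicate(self):
        # Ship queued changes to the read replica (a background process in real systems).
        for key, value in self._pending:
            self.async_replica.rows[key] = value
        self._pending.clear()

    def failover(self):
        # Promote the synchronous standby; it already holds every acknowledged write.
        # (A real system would then provision and re-seed a new standby.)
        self.primary, self.sync_standby = self.sync_standby, Database()


db = ReplicatedPrimary()
db.write("order:1", "paid")
print(db.async_replica.rows.get("order:1"))  # None: the replica lags until replicate() runs
db.replicate()
print(db.async_replica.rows.get("order:1"))  # paid
```

The trade-off: synchronous replication costs write latency but loses nothing on failover. Asynchronous replication keeps writes fast and works across regions, at the price of replica lag.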
Rule 2: Service requests as fast as possible
a) Choose the fastest route
[Diagram: a request reaches Route 53, which sees 16 ms latency to Region A and 92 ms to Region B and answers with the Region A DNS entry]
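Latency-based routing boils down to answering each DNS query with the endpoint that has the lowest latency for that requester. Here is a minimal Python sketch that uses the 16 ms / 92 ms figures from the slide. The latency table, region keys and hostnames are hypothetical, and Route 53 performs this selection internally from its own latency data. The optional `healthy` filter reflects that latency records can be combined with health checks.

```python
def pick_endpoint(latency_ms_by_region, endpoints, healthy=None):
    """Return the DNS answer for the lowest-latency region that is healthy."""
    healthy = set(latency_ms_by_region) if healthy is None else set(healthy)
    candidates = [r for r in latency_ms_by_region if r in healthy]
    if not candidates:
        raise LookupError("no healthy region to answer with")
    best = min(candidates, key=latency_ms_by_region.__getitem__)
    return endpoints[best]


latencies = {"region-a": 16, "region-b": 92}  # as measured for this requester
endpoints = {
    "region-a": "app.region-a.example.com",
    "region-b": "app.region-b.example.com",
}

print(pick_endpoint(latencies, endpoints))                        # app.region-a.example.com
print(pick_endpoint(latencies, endpoints, healthy={"region-b"}))  # app.region-b.example.com
```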
b) Offload your application servers
Amazon CloudFront: worldwide content delivery network. Easily distribute content to end users with low latency, high data transfer speeds, and no commitments.
[Diagram: edge locations in Singapore, Tokyo and Sydney]
1. A single CNAME: www.mysite.com
2. Dynamic content (*.php) is served from EC2
3. Static content (/images/*) is served from S3
Without CloudFront: the EC2 web/app servers carry the full load of user requests.
With CloudFront: the load of user requests is pushed into CloudFront, so the EC2 cluster can scale down.
Offload, then scale down.
[Charts: response time and server load with no CDN, with a CDN for static content, and with a CDN for static & dynamic content; offloading to the CDN lets the origin scale down]
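The single-CNAME setup relies on CloudFront cache behaviors. Each request path is matched against ordered path patterns, the first match decides which origin serves it, and a default behavior is the fallback. This Python sketch mimics that matching with `fnmatch.fnmatchcase`. The origin names and the choice of default are hypothetical, and CloudFront's own pattern rules differ in some details, so treat it as an illustration of the routing idea only.

```python
from fnmatch import fnmatchcase

# Ordered (path pattern, origin) pairs, evaluated top to bottom like cache behaviors.
# Origin names are hypothetical placeholders.
BEHAVIORS = [
    ("/images/*", "s3-static-origin"),
    ("*.php", "ec2-app-origin"),
]
DEFAULT_ORIGIN = "ec2-app-origin"


def choose_origin(path):
    """Return the origin that should serve `path` (first matching pattern wins)."""
    for pattern, origin in BEHAVIORS:
        if fnmatchcase(path, pattern):
            return origin
    return DEFAULT_ORIGIN


for p in ["/images/logo.png", "/cart/checkout.php", "/"]:
    print(p, "->", choose_origin(p))
```

Order matters: `/images/thumb.php` matches `/images/*` first and is therefore served from S3.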
c) Cache it if you can
Amazon ElastiCache: Memcached- and Redis-compatible caching layer
Serve frequently requested, slow-changing data from scalable cache clusters
Reduce load on databases and other servers
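A common way to put a cache such as ElastiCache in front of a database is the cache-aside pattern. Read from the cache first and fall back to the database on a miss. Then populate the cache with a time-to-live, so slow-changing data eventually expires. In this Python sketch a dictionary stands in for the Memcached/Redis cluster, and `load_from_db` is a hypothetical slow query.

```python
import time


class TTLCache:
    """Dictionary-backed stand-in for a Memcached/Redis cache with per-key expiry."""

    def __init__(self, clock=time.monotonic):
        self._data = {}
        self._clock = clock

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._data[key]  # expired: behave like a miss
            return None
        return value

    def set(self, key, value, ttl_seconds):
        self._data[key] = (value, self._clock() + ttl_seconds)


def get_product(product_id, cache, load_from_db, ttl_seconds=300):
    """Cache-aside read: try the cache, fall back to the database, then fill the cache."""
    key = f"product:{product_id}"
    cached = cache.get(key)
    if cached is not None:
        return cached
    value = load_from_db(product_id)  # the expensive call we want to avoid repeating
    cache.set(key, value, ttl_seconds)
    return value


db_calls = []


def load_from_db(pid):
    db_calls.append(pid)
    return {"id": pid, "name": "widget"}


cache = TTLCache()
get_product(42, cache, load_from_db)
get_product(42, cache, load_from_db)
print(len(db_calls))  # 1: the second read was served from the cache
```

The TTL bounds how stale a cached value can get. Pick it per data type, based on how slowly that data actually changes.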
d) Single-digit latencies where it matters
[Chart: database query performance vs. scale. The desired curve is consistent and predictable; the actual relational database curve degrades with scale, while DynamoDB query performance stays consistent]
Management problems when scaling a database yourself: data sharding, data caching, provisioning, cluster management, fault management
Amazon DynamoDB: low latency, large scale, zero admin
Predictable performance: average single-digit millisecond server-side latencies
Runs on solid state drives, and is built to maintain consistent, fast latencies at any scale
Rule 3: Handle requests at any scale
a) Scale up
Scale up with Amazon Elastic Compute Cloud (EC2)
Vertical scaling, from $0.013/hr
Basic unit of compute capacity: several families of instance types are available, from micro to compute-, storage-, memory- and GPU-optimized
Measure instance resource utilization under load and select the optimal instance size per application tier / service
b) Scale out
Trigger an auto-scaling policy
as-create-auto-scaling-group MyGroup --launch-configuration MyConfig --availability-zones ap-southeast-1a --min-size 4 --max-size 200
Auto Scaling: automatic resizing of compute clusters based upon demand
Manually
Send an API call or use the CLI to launch/terminate instances; you only need to specify the capacity change (+/-)
Preemptive manual scaling of capacity, e.g. add 10 more instances before a marketing event
By Schedule
Scale up/down based on date and time
Regular scaling up and down of instances, e.g. scale from 0 to 2 instances every night to process SQS messages, or double capacity on a Friday night
By Policy
Scale in response to changing conditions, based on user-configured real-time monitoring and alerts
Scale dynamically based upon custom metrics, e.g. SQS queue depth, average CPU load, ELB latency
Auto-Rebalance
Instances are automatically launched/terminated to ensure the application is balanced across multiple AZs
Maintain capacity across Availability Zones, e.g. instance availability is maintained in the event of an AZ becoming unavailable
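A scaling policy driven by a custom metric comes down to a small calculation. Derive the desired number of instances from the metric, then clamp it to the group's minimum and maximum size. This Python sketch uses SQS queue depth and the `--min-size 4 --max-size 200` bounds from the CLI example. The target of 100 messages per instance is a hypothetical tuning value, not an AWS default.

```python
import math


def desired_capacity(queue_depth, messages_per_instance=100, min_size=4, max_size=200):
    """Instances needed to keep the backlog per instance at or below the target."""
    if messages_per_instance <= 0:
        raise ValueError("messages_per_instance must be positive")
    needed = math.ceil(queue_depth / messages_per_instance)
    # Never drop below the minimum pool (capacity is kept even when idle)
    # and never exceed the configured maximum.
    return max(min_size, min(max_size, needed))


for depth in (0, 950, 1_000_000):
    print(depth, "->", desired_capacity(depth))  # 0 -> 4, 950 -> 10, 1000000 -> 200
```

Real scaling policies also apply cooldown or warm-up periods, so the group does not oscillate while new instances come into service.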
c) Dial it up
Elastic Block Store: Provisioned IOPS, up to 4,000 per volume and up to 48,000 per instance
Predictable performance for demanding workloads such as databases
DynamoDB: provisioned read/write performance per table
Predictable high performance, scaled via the console, the API, or Dynamic DynamoDB (http://dynamic-dynamodb.readthedocs.org)
[Screenshot: Dynamic DynamoDB]
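“Dialing up” DynamoDB means choosing provisioned read and write capacity per table. The arithmetic below follows DynamoDB's documented capacity-unit definitions:

- One read capacity unit is one strongly consistent read per second, or two eventually consistent reads per second, of an item up to 4 KB.
- One write capacity unit is one write per second of an item up to 1 KB.
- Item sizes are rounded up to the next unit.

The workload numbers are hypothetical.

```python
import math


def read_capacity_units(reads_per_second, item_size_kb, strongly_consistent=True):
    """Provisioned RCUs: each read consumes one unit per 4 KB (rounded up)."""
    units_per_read = math.ceil(item_size_kb / 4)
    total = reads_per_second * units_per_read
    # Eventually consistent reads cost half as much.
    return total if strongly_consistent else math.ceil(total / 2)


def write_capacity_units(writes_per_second, item_size_kb):
    """Provisioned WCUs: each write consumes one unit per 1 KB (rounded up)."""
    return writes_per_second * math.ceil(item_size_kb)


# Hypothetical workload: 500 reads/s and 100 writes/s of 6 KB items.
print(read_capacity_units(500, 6))                             # 1000
print(read_capacity_units(500, 6, strongly_consistent=False))  # 500
print(write_capacity_units(100, 6))                            # 600
```

A tool such as Dynamic DynamoDB then adjusts these provisioned values up or down as actual consumption changes.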
Rule 4: Simplify architecture with services
[Chart: with on-premises infrastructure, about 70% of effort goes into managing all of the “undifferentiated heavy lifting” and 30% into your business. With AWS cloud-based infrastructure the split reverses: 30% goes into configuring your cloud assets and 70% into your business, leaving more time to focus on it]
Enterprise Applications: Virtual Desktops, Collaboration and Sharing
Platform Services:
• Databases: Caching, Relational, NoSQL
• Analytics: Hadoop, Real-time, Data Workflows, Data Warehouse
• App Services: Queuing, Orchestration, App Streaming, Transcoding, Search
• Deployment & Management: Containers, DevOps Tools, Resource Templates, Usage Tracking, Monitoring and Logs
• Mobile Services: Identity, Sync, Mobile Analytics, Notifications
Foundation Services: Compute (VMs, Auto Scaling and Load Balancing), Storage (Object, Block and Archive), Security & Access Control, Networking
Infrastructure: Regions, CDN and Points of Presence, Availability Zones
Rule 5: Automate operational management
a) Everything is programmable
Compute, storage, security, scaling, database, networking, monitoring, messaging, workflow, DNS, load balancing, backup, CDN
Access everything via the CLI, API or console
Achieve the highest levels of automation sophistication with ease
b) Think disposable, one-click deployments
Ranging from control (DIY) to convenience (Elastic Beanstalk):
DIY / On Demand: DIY, on-demand resources: EC2, S3, custom AMIs, etc.
AWS CloudFormation: templates to deploy & update infrastructure as code
AWS OpsWorks: DevOps framework for application lifecycle management and automation
AWS Elastic Beanstalk: automated resource management, web apps made easy
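Infrastructure as code means the environment is described in a template that can be deployed, updated and thrown away repeatedly. As a minimal sketch, the Python below builds a CloudFormation template as a dictionary and serializes it to JSON. The `AWSTemplateFormatVersion`, `Resources`, `Type`, `Properties`, `Outputs` and `Ref` keys and the `AWS::SQS::Queue` resource type are standard CloudFormation. The logical name `WorkQueue`, the timeout value and the `build_template` helper are hypothetical.

```python
import json


def build_template(environment):
    """Return a minimal CloudFormation template (as a dict) for one SQS work queue."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": f"Work queue for the {environment} environment",
        "Resources": {
            "WorkQueue": {
                "Type": "AWS::SQS::Queue",
                "Properties": {"VisibilityTimeout": 60},
            }
        },
        "Outputs": {
            # Ref on an SQS queue resource returns the queue URL.
            "WorkQueueUrl": {"Value": {"Ref": "WorkQueue"}},
        },
    }


template_body = json.dumps(build_template("staging"), indent=2)
print(template_body)
```

Saved to a file, this JSON could be deployed with `aws cloudformation create-stack --stack-name <name> --template-body file://template.json`. Because the whole environment comes from the template, a broken stack can be deleted and recreated rather than repaired by hand.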
c) Design for failure, implement self-healing
Bootstrapping: customize instance startup. Get instances to ask the ‘who am I?’ question on startup, and configure them dynamically based on the answer.
Auto Scaling: maintain capacity of instances. Using a minimum pool size maintains capacity in the event of instance failures.
CloudWatch: know what’s going on and take automated actions. Use CloudWatch standard and custom metrics to create alarms, and respond with automated administration actions.
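Self-healing with a minimum pool size is essentially a reconciliation loop:

1. Compare the healthy instances you have with the capacity you want.
2. Terminate what is unhealthy.
3. Launch replacements.

Auto Scaling runs this loop for you. The Python sketch below only illustrates the idea, with hypothetical `launch` and `terminate` callables in place of real EC2 API calls.

```python
def reconcile(instances, min_size, launch, terminate):
    """One pass of a self-healing loop.

    `instances` maps instance ID -> healthy flag. Unhealthy instances are
    terminated, and new ones are launched until `min_size` instances exist.
    Returns the updated mapping.
    """
    current = dict(instances)
    for instance_id, healthy in instances.items():
        if not healthy:
            terminate(instance_id)
            del current[instance_id]
    while len(current) < min_size:
        new_id = launch()
        current[new_id] = True  # assume the replacement passes its health check
    return current


counter = iter(range(1000))
terminated = []


def fake_launch():
    # Hypothetical launcher that just hands out new IDs.
    return f"i-new{next(counter)}"


fleet = {"i-1": True, "i-2": False, "i-3": True}
fleet = reconcile(fleet, min_size=4, launch=fake_launch, terminate=terminated.append)
print(sorted(fleet), terminated)  # ['i-1', 'i-3', 'i-new0', 'i-new1'] ['i-2']
```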
YOUR GOAL
Applications should continue to function even if the underlying hardware or software unit fails, or is removed or replaced.
Avoid single points of failure. Assume everything fails, and design backwards.
MULTI-AZ DEPLOYMENT
AWS BUILDING BLOCKS
Inherently fault-tolerant services:
• Amazon S3
• Amazon DynamoDB
• Amazon CloudFront
• Amazon SWF
• Amazon SQS
• Amazon SNS
• Amazon SES
• Amazon Route 53
• Elastic Load Balancing
• AWS IAM
• AWS Elastic Beanstalk
• Amazon ElastiCache
• Amazon EMR
• Amazon CloudSearch
• Amazon Redshift
• etc.
Fault-tolerant with the right architecture:
• Amazon EC2
• Amazon EBS
• Amazon RDS
• Amazon VPC
BUILD LOOSELY COUPLED SYSTEMS
The more loosely components are coupled, the bigger they scale
Create independent components
Design everything as a black box
Think in terms of (micro)services
Services are black boxes exposed via APIs
[Diagram: “My Cool Feature” behind its API]
Internal implementation: iterate, even rewrite it
API: stable, with few changes, potentially versioned
Loose coupling enables scale-out and resiliency
Use message queues: Amazon Simple Queue Service (SQS)
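The point of a queue between components is that the producer never calls the consumer directly. It drops a message and moves on, and any number of workers can drain the queue at their own pace. The Python sketch below uses the standard library's thread-safe `queue.Queue` as a local stand-in for SQS. With SQS, the same roles map to sending, receiving and deleting messages, and workers would typically run on separate instances. The job payloads and the squaring step are hypothetical.

```python
import queue
import threading

jobs = queue.Queue()  # local stand-in for an SQS queue
results = []
results_lock = threading.Lock()


def worker():
    # Each worker independently pulls jobs until it receives the shutdown sentinel.
    while True:
        job = jobs.get()
        if job is None:
            jobs.task_done()
            break
        with results_lock:
            results.append(job * job)  # hypothetical "processing" step
        jobs.task_done()  # analogous to deleting the message after success


# The producer only knows about the queue, not about the workers.
for n in range(10):
    jobs.put(n)

workers = [threading.Thread(target=worker) for _ in range(3)]
for w in workers:
    w.start()
for _ in workers:
    jobs.put(None)  # one shutdown sentinel per worker
jobs.join()
for w in workers:
    w.join()

print(sorted(results))
```

Scaling out is a matter of starting more workers. The producer does not change, and a slow or crashed worker does not block it.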
Use idempotent interfaces
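Standard SQS queues deliver messages at least once, and clients retry after timeouts. The receiving interface should therefore be idempotent: repeating the same request must not repeat its effect. A common approach, sketched here in Python, is for the caller to attach a unique request ID and for the service to remember the result for each ID. The `PaymentService` class and its in-memory ID store are hypothetical. A real service would persist the IDs durably and guard against concurrent duplicates, for example with a conditional write.

```python
import uuid


class PaymentService:
    """Hypothetical service whose charge() is safe to retry."""

    def __init__(self):
        self.balance = 0
        self._processed = {}  # request ID -> result of the first execution

    def charge(self, request_id, amount):
        if request_id in self._processed:
            # Duplicate delivery or client retry: return the original result, change nothing.
            return self._processed[request_id]
        self.balance += amount
        result = {"request_id": request_id, "charged": amount, "balance": self.balance}
        self._processed[request_id] = result
        return result


service = PaymentService()
rid = str(uuid.uuid4())          # generated once by the caller, reused on every retry
first = service.charge(rid, 25)
retry = service.charge(rid, 25)  # e.g. the message was delivered twice
print(service.balance, first == retry)  # 25 True
```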
Use circuit breakers
Temporarily bypass an unresponsive service and switch to degraded-mode transactions
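A circuit breaker wraps calls to a dependency and counts failures. After a threshold it “opens” and serves a degraded fallback for a cool-down period, instead of waiting on an unresponsive service. When the cool-down ends it lets one trial call through. The Python sketch below is one minimal interpretation of the pattern. The thresholds, the `fallback` argument and the injectable clock are illustrative choices, not a specific library's API.

```python
import time


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self._clock = clock
        self._failures = 0
        self._opened_at = None  # None means the circuit is closed

    def call(self, func, fallback):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_timeout:
                return fallback()  # open: skip the unresponsive service entirely
            # Half-open: the timeout has elapsed, so let one trial call through.
        try:
            result = func()
        except Exception:
            self._failures += 1
            if self._opened_at is not None or self._failures >= self.failure_threshold:
                self._opened_at = self._clock()  # (re)open the circuit
            return fallback()
        # A successful call closes the circuit and resets the failure count.
        self._failures = 0
        self._opened_at = None
        return result


breaker = CircuitBreaker(failure_threshold=3, reset_timeout=30.0)


def fetch_recommendations():
    raise TimeoutError("recommendation service did not respond")  # simulated outage


for _ in range(5):
    print(breaker.call(fetch_recommendations, fallback=lambda: ["top sellers"]))
```

After the third failure, the remaining calls return the degraded answer immediately, without touching the failing service.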
Auto-scale, load-balance, monitor and ensure high availability for each service separately
Statelessness enables scale-out
Separate state and data from compute instances
[Diagram: a load-balanced, auto-scaling pool of EC2 workers backed by scalable services for state and data: ElastiCache, DynamoDB, S3]
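Statelessness means any worker can handle any request, because nothing the next request needs is kept on the instance itself. In this Python sketch, session state lives in a shared store passed in from outside, so two different “instances” can serve the same user's consecutive requests. The store is a dictionary standing in for ElastiCache or DynamoDB. The handler, the store interface and the shopping-cart example are hypothetical.

```python
class SessionStore:
    """Stand-in for an external state service such as ElastiCache or DynamoDB."""

    def __init__(self):
        self._items = {}

    def load(self, session_id):
        return dict(self._items.get(session_id, {"cart": []}))

    def save(self, session_id, state):
        self._items[session_id] = state


class StatelessWorker:
    """Holds no per-user data; everything comes from and goes back to the store."""

    def __init__(self, name, store):
        self.name = name
        self.store = store

    def add_to_cart(self, session_id, item):
        state = self.store.load(session_id)
        state["cart"] = state["cart"] + [item]
        self.store.save(session_id, state)
        return self.name, state["cart"]


shared_store = SessionStore()
worker_a = StatelessWorker("worker-a", shared_store)
worker_b = StatelessWorker("worker-b", shared_store)

print(worker_a.add_to_cart("session-1", "book"))  # ('worker-a', ['book'])
print(worker_b.add_to_cart("session-1", "pen"))   # ('worker-b', ['book', 'pen'])
```

Because the workers keep nothing locally, Auto Scaling can add or terminate them freely without losing user sessions.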
TEST IT
Verify your design by generating failure modes
Chaos Monkey: introduce failures
Latency Monkey: slow down dependent service responses
Conformity Monkey: detect system entropy & drift
GAME DAY!
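The idea behind Chaos Monkey and Latency Monkey is to inject failures and delays on purpose and check that the system still behaves. A minimal Python sketch of that idea is a wrapper that, with configurable probabilities, raises an error or adds latency before calling a dependency. The `inject_faults` helper, its parameters and the `get_price` dependency are hypothetical, not the actual Monkey tools. Passing a seeded random generator makes a game-day experiment repeatable.

```python
import random
import time


def inject_faults(func, failure_rate=0.0, extra_latency_s=0.0, latency_rate=0.0, rng=None):
    """Wrap `func` so calls sometimes fail or slow down, for resilience testing."""
    rng = rng or random.Random()

    def wrapper(*args, **kwargs):
        if rng.random() < failure_rate:
            raise ConnectionError("injected failure")  # Chaos Monkey-style fault
        if rng.random() < latency_rate:
            time.sleep(extra_latency_s)                # Latency Monkey-style delay
        return func(*args, **kwargs)

    return wrapper


def get_price(item):
    return {"book": 12}.get(item, 0)  # hypothetical dependency


chaotic_get_price = inject_faults(get_price, failure_rate=0.3, rng=random.Random(7))
outcomes = []
for _ in range(20):
    try:
        outcomes.append(chaotic_get_price("book"))
    except ConnectionError:
        outcomes.append("failed")
print(outcomes.count("failed"), "of 20 calls failed")
```

In a real experiment, the check is not the failure count itself. It is whether the callers' retries, fallbacks and circuit breakers kept the user-facing behavior acceptable.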
What your users want…
Fast, performant experience
Always on, accessible anywhere
Lots of new features all of the time
Personalized and rich application
With AWS
Fast, performant experience: ✔ Elastic utility capacity
Always on, accessible anywhere: ✔ Highly available global coverage
Lots of new features all of the time: ✔ Agility & automated operations
Personalized and rich application: ✔ Cost-effective storage, big data & analytics
Get started with the free tier: aws.amazon.com
Thank you
Markku Lepistö, Technology Evangelist, @markkulepisto
Your feedback is important
Let’s have a poll! Let us know what you want to see next.
Please complete the survey!
• What’s good, what’s not
• What you want to see at these events
• What you want AWS to deliver for you