aws webinar 201: designing scalable, available & resilient cloud applications

AWS 201

Designing Scalable, Available & Resilient Cloud Applications

Markku Lepistö - Technology Evangelist @markkulepisto

Housekeeping

•  Presentation ~45 mins
•  Q&A using the questions panel during the presentation
•  Reminder: fill in the survey!

AWS Global Presence

10 Regions

26 Availability Zones

52 Edge Locations

SCALABLE, AVAILABLE, RESILIENT CLOUD APPLICATIONS

What your users want…


Fast, performant experience



Always on, accessible anywhere




Personalized and rich application





Lots of new features all of the time



Powerful cloud applications

Building powerful cloud applications

Rule 1: Service all requests

Rule 2: Service requests as fast as possible

Rule 3: Handle requests at any scale

Rule 4: Simplify architecture with services

Rule 5: Automate operational management

Rule 6: Design for failure

Rule 1: Service all requests
a) Make sure requests get to your ‘front door’

[Diagram: Request, DNS, Application, Data]


Clients can’t resolve you?

…then this is irrelevant



Route 53

“100% Available” SLA

Feature: Details

Global: Supported from AWS global edge locations for fast and reliable domain name resolution

Scalable: Automatically scales based upon query volumes

Latency-based routing: Supports resolution of endpoints based upon latency, enabling multi-region application delivery

Integrated: Integrates with other AWS services, allowing Route 53 to front load balancers, S3 and EC2

Secure: Integrates with IAM, giving fine-grained control over DNS record access

http://aws.amazon.com/route53/sla

Rule 1: Service all requests
a) Make sure requests get to your ‘front door’
b) Make sure you open the door when they arrive

[Diagram: Route53, Elastic Load Balancer, Regions, Availability Zones]

Elastic load balancing
Multi-availability zone
Multi-region

a) Make sure requests get to your ‘front door’
b) Make sure you open the door when they arrive
c) Have the data to form a response

[Diagram: Route53, Elastic Load Balancer, Regions, Availability Zones]

Multi-AZ RDS
Synchronous intra-region master/slave
Asynchronous cross-region read replicas

Rule 2: Service requests as fast as possible
a) Choose the fastest route

[Diagram: Request to Route53; latency 16ms to Region A, 92ms to Region B; Route53 answers with the Region A DNS entry]
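The routing decision in the diagram can be sketched in a few lines. This is only an illustration of the idea of choosing the lowest-latency region; it does not reproduce how Route 53 measures latency or resolves records, and the function name `pick_lowest_latency_region` is hypothetical. The two latency figures are the ones shown on the slide.

```python
from typing import Mapping


def pick_lowest_latency_region(latencies_ms: Mapping[str, float]) -> str:
    """Return the region name with the smallest measured latency."""
    if not latencies_ms:
        raise ValueError("at least one region is required")
    # min() with a key function compares regions by their latency value.
    return min(latencies_ms, key=lambda region: latencies_ms[region])


# Figures from the slide: 16 ms to Region A, 92 ms to Region B.
measured = {"Region A": 16.0, "Region B": 92.0}
print(pick_lowest_latency_region(measured))
```

With the slide's numbers the sketch selects Region A, matching the DNS entry returned in the diagram.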

Rule 2: Service requests as fast as possible
a) Choose the fastest route
b) Offload your application servers

Singapore, Tokyo, Sydney

1. Single CNAME www.mysite.com
2. Served from EC2: *.php
3. Served from S3: /images/*

CloudFront
World-wide content distribution network
Easily distribute content to end users with low latency, high data transfer speeds, and no commitments.

Without CloudFront: EC2 web servers/app servers loaded by user requests

With CloudFront: load of user requests pushed into CloudFront, EC2 cluster can scale down

Offload, Scale Down



[Chart: Response Time and Server Load compared for No CDN, CDN for Static Content, and CDN for Static & Dynamic Content]

Rule 2: Service requests as fast as possible
a) Choose the fastest route
b) Offload your application servers
c) Cache it if you can

ElastiCache
Memcached and Redis compatible caching layer

Serve frequently requested & slow changing data from scalable cache clusters

Reduce load on database and other servers
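A minimal cache-aside sketch shows how serving slow-changing data from a cache reduces database load. It uses an in-process dictionary as a stand-in for a cache cluster; the class `TTLCache`, the loader `slow_database_lookup`, and the 60-second expiry are all assumptions made for illustration and are not an ElastiCache client API.

```python
import time
from typing import Any, Callable, Dict, List, Tuple


class TTLCache:
    """Tiny in-process stand-in for a cache cluster (hypothetical)."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._store: Dict[str, Tuple[float, Any]] = {}

    def get_or_load(self, key: str, loader: Callable[[str], Any]) -> Any:
        now = self._clock()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # cache hit: the backing store is not touched
        value = loader(key)  # miss or expired entry: go to the slow source
        self._store[key] = (now + self._ttl, value)
        return value


db_calls: List[str] = []


def slow_database_lookup(key: str) -> str:
    db_calls.append(key)  # record every trip to the backing store
    return f"row-for-{key}"


cache = TTLCache(ttl_seconds=60)
first = cache.get_or_load("user:1", slow_database_lookup)
second = cache.get_or_load("user:1", slow_database_lookup)
print(first == second, len(db_calls))
```

The second read is answered from the cache, so the loader runs only once for the two requests.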

Rule 2: Service requests as fast as possible
a) Choose the fastest route
b) Offload your application servers
c) Cache it if you can
d) Single digit latencies where it matters

[Chart: Database Query Performance versus Scale]

Desired: consistency, predictability

Actual: degraded performance with scale

Management problems: data sharding, data caching, provisioning, cluster management, fault management

[Chart: DynamoDB Query Performance and Relational Database Query Performance versus Scale]

DynamoDB
Low latency
Large scale
Zero admin
Predictable performance

Average single-digit millisecond server-side latencies

Runs on solid state drives, and is built to maintain consistent, fast latencies at any scale

Rule 3: Handle requests at any scale
a) Scale up

Vertical Scaling

Scale up with Elastic Compute Cloud (EC2)
From $0.013/hr
Basic unit of compute capacity
Several families of instance types available, from micro to compute, storage, memory and GPU optimized

Measure instance resource utilization under load and select the optimal instance size per application tier / service

Rule 3: Handle requests at any scale
a) Scale up
b) Scale out

Trigger auto-scaling policy

as-create-auto-scaling-group MyGroup --launch-configuration MyConfig --availability-zones ap-southeast-1a --min-size 4 --max-size 200

Auto-scaling
Automatic re-sizing of compute clusters based upon demand

Manually
Send an API call or use CLI to launch/terminate instances. Only need to specify capacity change (+/-)
Preemptive manual scaling of capacity, e.g. before a marketing event add 10 more instances

By Schedule
Scale up/down based on date and time
Regular scaling up and down of instances, e.g. scale from 0 to 2 to process SQS messages every night or double capacity on a Friday night

By Policy
Scale in response to changing conditions, based on user configured real-time monitoring and alerts
Dynamic scale based upon custom metrics, e.g. SQS queue depth, average CPU load, ELB latency

Auto-Rebalance
Instances are automatically launched/terminated to ensure the application is balanced across multiple AZs
Maintain capacity across availability zones, e.g. instance availability maintained in event of an AZ becoming unavailable
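Scaling by policy boils down to a rule that maps a monitored metric to a capacity change within fixed bounds. The sketch below is an assumption-laden illustration, not the Auto Scaling service's algorithm: the function `desired_capacity`, the CPU thresholds (70% and 30%), and the step of 2 instances are hypothetical, while the bounds of 4 and 200 come from the `--min-size` and `--max-size` values in the command shown earlier.

```python
def desired_capacity(current: int, avg_cpu_percent: float,
                     min_size: int = 4, max_size: int = 200,
                     scale_out_above: float = 70.0,
                     scale_in_below: float = 30.0,
                     step: int = 2) -> int:
    """Return the instance count a simple threshold policy would ask for."""
    if avg_cpu_percent > scale_out_above:
        target = current + step   # under pressure: add capacity
    elif avg_cpu_percent < scale_in_below:
        target = current - step   # mostly idle: remove capacity
    else:
        target = current          # inside the comfort band: no change
    # Never leave the group's configured bounds.
    return max(min_size, min(max_size, target))


print(desired_capacity(current=10, avg_cpu_percent=85.0))
print(desired_capacity(current=4, avg_cpu_percent=10.0))
```

The second call shows the minimum pool size acting as a floor: the policy asks to scale in, but capacity stays at 4.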

Rule 3: Handle requests at any scale
a) Scale up
b) Scale out
c) Dial it up

Elastic Block Store
Provisioned IOPS up to 4000 per volume, up to 48 000 per instance
Predictable performance for demanding workloads such as databases

DynamoDB
Provisioned read/write performance per table
Predictable high performance scaled via console, API or Dynamic DynamoDB, at http://dynamic-dynamodb.readthedocs.org

[Diagram: On-Premise Infrastructure versus AWS Cloud-Based Infrastructure. Each bar is split 70% / 30% between Your Business and infrastructure work: Managing All of the “Undifferentiated Heavy Lifting” on premise, Configuring Your Cloud Assets on AWS, leaving More Time to Focus on Your Business]


Enterprise Applications: Virtual Desktops, Collaboration and Sharing

Platform Services
Databases: Caching, Relational, NoSQL
Analytics: Hadoop, Real-time, Data Workflows, Data Warehouse
App Services: Queuing, Orchestration, App Streaming, Transcoding, Email, Search
Deployment & Management: Containers, Dev/ops Tools, Resource Templates, Usage Tracking, Monitoring and Logs
Mobile Services: Identity, Sync, Mobile Analytics, Notifications

Foundation Services: Compute (VMs, Auto-scaling and Load Balancing), Storage (Object, Block and Archive), Security & Access Control, Networking

Infrastructure: Regions, CDN and Points of Presence, Availability Zones

Compute, Storage, Security, Scaling, Database, Networking, Monitoring, Messaging, Workflow, DNS, Load Balancing, Backup, CDN

Rule 5: Automate operational management
a) Everything is programmable

Access everything via CLI, API or Console

Achieve the highest levels of automation sophistication with ease

Rule 5: Automate operational management
a) Everything is programmable
b) Think disposable, one click deployments

AWS Elastic Beanstalk: Automated resource management, web apps made easy

AWS OpsWorks: DevOps framework for application lifecycle management and automation

AWS CloudFormation: Templates to deploy & update infrastructure as code

DIY / On Demand: DIY, on demand resources: EC2, S3, custom AMIs, etc.

[Axis labels: Control, Convenience]


Rule 1: Service all web requests





Rule 5: Automate operational management
a) Everything is programmable
b) Think disposable, one click deployments
c) Design for failure, implement self healing

Bootstrapping
Customize instance startup
Get instances to ask the ‘who am I?’ question on startup and be configured dynamically upon being answered

Auto-scaling
Maintain capacity of instances
Using a minimum pool size will maintain capacity in the event of instance failures

CloudWatch
Know what’s going on, take automated actions
Use CloudWatch standard and custom metrics to create alarms. Respond with automated administration actions

YOUR GOAL

Applications should continue to function even if the underlying HW or SW unit fails or is removed or replaced

Avoid single points of failure. Assume everything fails, and design backwards.

MULTI-AZ DEPLOYMENT

AWS BUILDING BLOCKS

Inherently Fault-Tolerant Services
•  Amazon S3
•  Amazon DynamoDB
•  Amazon CloudFront
•  Amazon SWF
•  Amazon SQS
•  Amazon SNS
•  Amazon SES
•  Amazon Route53
•  Elastic Load Balancing
•  AWS IAM
•  AWS Elastic Beanstalk
•  Amazon ElastiCache
•  Amazon EMR
•  Amazon CloudSearch
•  Amazon Redshift
•  etc

Fault-Tolerant with the right architecture
•  Amazon EC2
•  Amazon EBS
•  Amazon RDS
•  Amazon VPC

BUILD LOOSELY COUPLED SYSTEMS

The looser they are coupled, the bigger they scale

Create independent Components


Design everything as a Black Box



Think in terms of (Micro) Services

Services are Black Boxes Exposed via APIs

My Cool Feature
Iterate, even re-write internal implementation

API
API is stable, with few changes, potentially versioning

Loose Coupling Enables Scale-out and Resiliency
Use Message Queues

Simple Queue Service (SQS)
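The decoupling that a message queue provides can be illustrated with Python's standard library `queue.Queue` standing in for SQS. This is a sketch of the producer/worker pattern only; it does not use the SQS API, and `run_pipeline` and the squaring "job" are hypothetical placeholders for real work.

```python
import queue
import threading
from typing import Iterable, List, Optional


def run_pipeline(jobs: Iterable[int], worker_count: int = 2) -> List[int]:
    """Producers enqueue jobs; independent workers drain the queue."""
    work: "queue.Queue[Optional[int]]" = queue.Queue()
    results: List[int] = []
    lock = threading.Lock()

    def worker() -> None:
        while True:
            item = work.get()
            if item is None:        # sentinel: no more work for this worker
                work.task_done()
                return
            with lock:
                results.append(item * item)  # stand-in for real processing
            work.task_done()

    threads = [threading.Thread(target=worker) for _ in range(worker_count)]
    for t in threads:
        t.start()
    for job in jobs:
        work.put(job)               # the producer never calls a worker directly
    for _ in threads:
        work.put(None)              # one sentinel per worker
    work.join()
    for t in threads:
        t.join()
    return sorted(results)          # sorted, since completion order varies


print(run_pipeline([1, 2, 3, 4]))
```

Because the producer only talks to the queue, `worker_count` can change without touching the producing side, which is the scale-out property the slide points at.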

Loose Coupling Enables Scale-out and Resiliency
Use Idempotent Interfaces
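An idempotent interface returns the same result when a request is retried, so a client or queue can redeliver safely. One common way to get there, sketched below, is to have callers pass a request identifier and to remember the outcome per identifier. The `AccountService` class and its `credit` method are hypothetical, and the in-memory dictionary stands in for durable storage.

```python
from typing import Dict


class AccountService:
    """Hypothetical service whose credit() is safe to retry."""

    def __init__(self) -> None:
        self.balance = 0
        self._processed: Dict[str, int] = {}  # request_id -> balance after it

    def credit(self, request_id: str, amount: int) -> int:
        if request_id in self._processed:
            # Duplicate delivery: return the recorded result, change nothing.
            return self._processed[request_id]
        self.balance += amount
        self._processed[request_id] = self.balance
        return self.balance


service = AccountService()
service.credit("req-1", 50)
service.credit("req-1", 50)  # retry of the same request
print(service.balance)
```

The retried request leaves the balance at 50 instead of 100.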

Loose Coupling Enables Scale-out and Resiliency
Use Circuit Breakers

Temporarily bypass unresponsive service. Switch to degraded mode transactions
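A circuit breaker can be sketched as a small wrapper that counts failures and, once a threshold is reached, stops calling the unresponsive service for a while and serves a degraded response instead. The class below is a minimal illustration under assumed parameters (a threshold of 2 failures and a 60-second reset window in the usage); the names `CircuitBreaker`, `flaky`, and `degraded` are hypothetical.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 3,
                 reset_after_seconds: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._threshold = failure_threshold
        self._reset_after = reset_after_seconds
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None  # None means the circuit is closed

    def call(self, primary: Callable[[], T], fallback: Callable[[], T]) -> T:
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self._reset_after:
                return fallback()       # open: bypass the primary entirely
            self._opened_at = None      # window elapsed: allow one trial call
        try:
            result = primary()
        except Exception:
            self._failures += 1
            if self._failures >= self._threshold:
                self._opened_at = self._clock()  # trip (or re-trip) the breaker
            return fallback()           # degraded mode for this request
        self._failures = 0              # success closes the circuit fully
        return result


calls = {"primary": 0}


def flaky() -> str:
    calls["primary"] += 1
    raise TimeoutError("service unresponsive")


def degraded() -> str:
    return "cached response"


breaker = CircuitBreaker(failure_threshold=2, reset_after_seconds=60.0)
responses = [breaker.call(flaky, degraded) for _ in range(5)]
print(responses.count("cached response"), calls["primary"])
```

All five requests get the degraded response, but the failing service is only called twice before the breaker opens and shields it from further traffic.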

Auto Scale, Load Balance, Monitor, HA Assure Each Service Separately

Statelessness Enables Scale-out
Separate State and Data from Compute Instances

Load Balanced, Auto Scaling pool of EC2 Workers

Scalable Services for State and Data: ElastiCache, DynamoDB, S3

TEST IT

Verify your design by generating failure modes

GAME DAY!

Chaos Monkey: Introduce failures

Latency Monkey: Slow down dependent service responses

Conformity Monkey: Detect system entropy & drift




With AWS

✔ Elastic utility capacity
✔ Highly available global coverage
✔ Agility & automated operations
✔ Cost effective storage, big data & analytics

aws.amazon.com

Get started with the free tier

Thank you

Markku Lepistö - Technology Evangelist @markkulepisto

Your feedback is important

Let’s have a Poll! Let us know what you want to see next

Your feedback is important

Please complete the Survey! What’s good, what’s not

What you want to see at these events

What you want AWS to deliver for you
