AWS Summit Tel Aviv - Startup Track - Architecting for High Availability
AWS Summit 2013 Tel Aviv Oct 16 – Tel Aviv, Israel
Alex Sinner
Solutions Architect, Amazon Web Services
ARCHITECTING FOR HIGH AVAILABILITY
“LET’S BUILD A ________ WEB APPLICATION”
“LET’S BUILD A HIGHLY AVAILABLE ________ WEB APPLICATION”
“LET’S BUILD A HIGHLY AVAILABLE AND SCALABLE ________ WEB APPLICATION”
“LET’S BUILD A HIGHLY AVAILABLE, SCALABLE, AND RESILIENT ________ WEB APPLICATION”
AWS BUILDING BLOCKS
Inherently Fault-Tolerant Services:
Amazon S3
Amazon DynamoDB
Amazon CloudFront
Amazon SWF
Amazon SQS
Amazon SNS
Amazon SES
Amazon Route53
Elastic Load Balancing
AWS IAM
AWS Elastic Beanstalk
Amazon ElastiCache
Amazon EMR
Amazon Redshift
Amazon CloudSearch
Fault-Tolerant with the right architecture:
Amazon EC2
Amazon EBS
Amazon RDS
Amazon VPC
1. DESIGN FOR FAILURE
2. USE MULTIPLE AZs
3. BUILD FOR SCALE
4. DECOUPLE COMPONENTS
« Everything fails all the time »
Werner Vogels
CTO of Amazon
YOUR GOAL
APPLICATIONS SHOULD CONTINUE TO FUNCTION
EVEN IF THE UNDERLYING PHYSICAL HARDWARE
FAILS OR IS REMOVED OR REPLACED
#1 DESIGN FOR FAILURE
AVOID SINGLE POINTS OF
FAILURE
ASSUME EVERYTHING FAILS,
AND WORK BACKWARDS
HEALTH CHECKS
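A health check usually marks a target unhealthy only after several failed probes in a row, and healthy again only after several successes, so that one slow response does not pull an instance out of rotation. The sketch below illustrates that idea in plain Python. The class name and the threshold defaults are illustrative assumptions, not settings taken from the talk or from any AWS service.

```python
from dataclasses import dataclass


@dataclass
class HealthTracker:
    """Tracks one target's health from a stream of probe results.

    Threshold defaults are hypothetical values chosen for illustration.
    """
    unhealthy_threshold: int = 2   # failed probes in a row before marking unhealthy
    healthy_threshold: int = 3     # successful probes in a row before marking healthy
    healthy: bool = True
    _streak: int = 0               # consecutive probes that contradict the current state

    def record(self, probe_ok: bool) -> bool:
        """Record one probe result and return the resulting health state."""
        if probe_ok == self.healthy:
            self._streak = 0       # probe agrees with the current state
            return self.healthy
        self._streak += 1
        limit = self.healthy_threshold if probe_ok else self.unhealthy_threshold
        if self._streak >= limit:
            self.healthy = probe_ok  # enough contradicting probes: flip the state
            self._streak = 0
        return self.healthy
```

A single failed probe leaves the target in service; the second consecutive failure takes it out, and it needs three consecutive successes to come back.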
#2 USE MULTIPLE
AVAILABILITY ZONES
US-WEST (N. California)
EU-WEST (Ireland)
ASIA PAC (Tokyo)
ASIA PAC (Singapore)
US-WEST (Oregon)
SOUTH AMERICA (Sao Paulo)
US-EAST (Virginia)
GOV CLOUD
ASIA PAC (Sydney)
AMAZON RDS
MULTI-AZ
#3 BUILD FOR SCALE
AMAZON CLOUDWATCH: MONITORING FOR AWS RESOURCES
AUTO SCALING: SCALE UP/DOWN EC2 CAPACITY
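The combination of a monitored metric and a scaling rule can be sketched as a small pure function: given the current instance count and an average CPU reading, decide the next capacity. This is a simplified illustration. The function name, the CPU thresholds, and the size limits are all assumptions made for the example, not values from the talk.

```python
def desired_capacity(current: int, avg_cpu: float,
                     min_size: int = 2, max_size: int = 10,
                     scale_out_above: float = 70.0,
                     scale_in_below: float = 30.0) -> int:
    """Return the new instance count after one evaluation of the metric.

    All thresholds and limits are hypothetical defaults.
    """
    if avg_cpu > scale_out_above:
        target = current + 1      # under pressure: add one instance
    elif avg_cpu < scale_in_below:
        target = current - 1      # mostly idle: remove one instance
    else:
        target = current          # inside the comfortable band: no change
    # never leave the configured group size limits
    return max(min_size, min(max_size, target))
```

Keeping a gap between the scale-out and scale-in thresholds avoids adding and removing capacity on every small fluctuation.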
HEALTH CHECKS + AUTO SCALING = SELF-HEALING
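Self-healing can be pictured as a reconciliation loop: drop every instance that failed its health check, then launch replacements until the desired count is met again, spreading them across Availability Zones. The following is an in-memory sketch only. The instance records, the zone names, and the helper functions are hypothetical and do not call any AWS API.

```python
import itertools

_ids = itertools.count(1)  # source of fake instance ids for the sketch


def launch_instance(az: str) -> dict:
    """Create a fake, healthy instance record in the given zone."""
    return {"id": f"i-{next(_ids):04d}", "az": az, "healthy": True}


def reconcile(fleet: list, desired: int, azs: list) -> list:
    """Drop unhealthy instances and launch replacements up to `desired`."""
    survivors = [inst for inst in fleet if inst["healthy"]]
    while len(survivors) < desired:
        # place each replacement in the zone that currently has the fewest instances
        counts = {az: sum(1 for inst in survivors if inst["az"] == az) for az in azs}
        target_az = min(azs, key=lambda az: counts[az])
        survivors.append(launch_instance(target_az))
    return survivors
```

If both instances in one zone fail, a single call to `reconcile` restores the fleet to the desired size with the zones balanced again.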
WalkMe Architecture for High Availability
© Copyright 2013 WalkMe Inc. - Confidential
The WalkMe Platform
A one-of-a-kind platform to guide and engage prospects, customers, employees, or partners through any Web experience.
WalkMe reduces complexity to empower advanced selling, support, training, and improved user experience.
Using WalkMe increases conversion rates, reduces support costs, accelerates training, and improves customer experience.
No integration or changes to the underlying website are required.
Introducing the Holistic Approach to Automated Engagement
Surveys: Pinpointed feedback, right on time
Search: Pinpointed to site and any other relevant resource (such as help desk)
Promotion: Personalized (“happy b-day”, “top up”, “bag for your camera?”)
Announcements: All or groups (“scheduled maintenance”, “sale on shirts”, “happy 4th of July”)
Launchers & Permalinks: Boost the effectiveness of your existing FAQ, chat and social support
Task List: Onboard new users, introduce new version
Analytics & Goals: Straightforward measurement and improvement of critical paths
Segmented Display: Right people, right time
Use cases: Online Support, Employee Training, Advanced Online Selling, Improved User Experience, Onboarding
Selected Customers
And many more…
The Basics
i. WalkMe customer creates WalkThrus using the WalkMe Editor.
ii. WalkMe customer adds the WalkMe JavaScript code to their website.
iii. WalkMe customer publishes the WalkThrus to their users.
iv. Our customers’ users get WalkMe when they browse the website.
v. Our customer can access WalkMe dashboards to view usage analytics.
Challenges
• Maximum availability for the client-side experience (100%)
• Low latency for fetching the WalkMe files
• Very high traffic volume from our customers’ users (over 1B requests a month)
• Analyzing billions of records for WalkMe analytics
Evolution – Phase 1
Problems:
• Low availability
• High latency
• Hard to scale
• Database availability
Evolution – Phase 2
Solution:
• Using Amazon CloudFront to host the static files
Problems:
• High volume of analytics causes scaling and availability issues
• Database availability
Evolution – Phase 3
Solution:
• Adding Amazon RDS Multi-AZ
• Adding AWS Elastic Beanstalk
New Challenge:
• Collection of billions of records for big data analytics (RDS is a bottleneck)
Evolution – Phase 4
Solution:
• Analytics big data requests are sent to CloudFront
• Analyzing CloudFront logs using Hadoop
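The Phase 4 idea is that each analytics event becomes a request to CloudFront, and the data is later recovered from the access logs. The sketch below shows the counting step in plain Python rather than Hadoop, to keep it self-contained. It assumes tab-separated log lines preceded by a `#Fields:` header that names the columns, including `cs-uri-query`. The `event` query parameter and the sample paths are hypothetical, since the talk does not describe WalkMe's actual request format.

```python
from collections import Counter
from urllib.parse import parse_qs


def count_events(lines, param="event"):
    """Count the values of one query-string parameter across access-log lines."""
    fields = []
    counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("#Fields:"):
            # column names come from the header, so the column order is not hard-coded
            fields = line[len("#Fields:"):].split()
            continue
        if not line or line.startswith("#"):
            continue                      # skip blank lines and other comment lines
        row = dict(zip(fields, line.split("\t")))
        query = row.get("cs-uri-query", "-")
        if query == "-":
            continue                      # request carried no query string
        for value in parse_qs(query).get(param, []):
            counts[value] += 1            # sum occurrences per parameter value
    return counts
```

In a Hadoop job the loop body would be the map step (emit one key per event) and the `Counter` update would be the reduce step (sum per key).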
Solution
Thank You
#4 DECOUPLE COMPONENTS
BUILD LOOSELY
COUPLED SYSTEMS
The more loosely they are coupled,
the bigger they scale,
the more fault tolerant they get…
[Diagram: UPLOAD, ANALYZE, and REPORT & NOTIFY components]
AMAZON SQS: SIMPLE QUEUE SERVICE
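The upload, analyze, and report stages can be decoupled by letting each stage talk only to a queue, never directly to the next stage. The sketch below uses Python's in-process `queue.Queue` as a local stand-in for Amazon SQS. The stage functions, the "analysis" (measuring the item's length), and the `None` end-of-stream marker are assumptions made to keep the example small and runnable.

```python
import queue
import threading


def upload(items, analyze_q):
    """Producer: hand each uploaded item to the analyze queue."""
    for item in items:
        analyze_q.put(item)
    analyze_q.put(None)                  # end-of-stream marker (in-process convenience)


def analyze(analyze_q, report_q):
    """Worker: read from one queue, write results to the next."""
    while (item := analyze_q.get()) is not None:
        report_q.put((item, len(item)))  # placeholder analysis: length of the item
    report_q.put(None)


def report(report_q, results):
    """Consumer: collect results for reporting and notification."""
    while (msg := report_q.get()) is not None:
        results.append(msg)


def run_pipeline(items):
    analyze_q, report_q, results = queue.Queue(), queue.Queue(), []
    stages = [
        threading.Thread(target=upload, args=(items, analyze_q)),
        threading.Thread(target=analyze, args=(analyze_q, report_q)),
        threading.Thread(target=report, args=(report_q, results)),
    ]
    for stage in stages:
        stage.start()
    for stage in stages:
        stage.join()
    return results
```

Because each stage only knows its queues, a slow or restarted analyze worker delays results without making the upload stage fail.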
ARCHITECTURE
DESIGN PATTERN
SQS VISIBILITY TIMEOUT
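With a visibility timeout, a received message is hidden from other consumers for a period instead of being removed. If the worker deletes it in time, it is gone; if the worker crashes, the message reappears and another worker picks it up. The class below is an in-memory sketch of that behavior, not the SQS API. The class and method names are hypothetical, and time is passed in explicitly so the example stays deterministic.

```python
class VisibilityQueue:
    """In-memory sketch of receive/delete with a visibility timeout."""

    def __init__(self, visibility_timeout: float):
        self.visibility_timeout = visibility_timeout
        self._messages = {}          # message id -> body
        self._invisible_until = {}   # message id -> time it becomes visible again
        self._next_id = 0

    def send(self, body):
        """Store a message that is visible immediately; return its id."""
        self._next_id += 1
        self._messages[self._next_id] = body
        self._invisible_until[self._next_id] = 0.0
        return self._next_id

    def receive(self, now: float):
        """Return (id, body) of one visible message and hide it, or None."""
        for mid, body in self._messages.items():
            if self._invisible_until[mid] <= now:
                self._invisible_until[mid] = now + self.visibility_timeout
                return mid, body
        return None

    def delete(self, mid):
        """Remove a message once it has been processed successfully."""
        self._messages.pop(mid, None)
        self._invisible_until.pop(mid, None)
```

A worker that never calls `delete` (for example because it crashed) causes the same message to be delivered again once the timeout has passed, so processing should be safe to repeat.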
BUFFERING
CLOUDWATCH METRICS FOR AMAZON SQS + AUTO SCALING
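Queue depth is a natural scaling signal for workers: the longer the backlog, the more consumers are needed. The sketch below sizes a worker fleet from the number of visible messages, the quantity that the SQS metric `ApproximateNumberOfMessagesVisible` reports in CloudWatch. The function name, the per-worker backlog target, and the fleet limits are illustrative assumptions.

```python
import math


def workers_for_backlog(visible_messages: int,
                        msgs_per_worker: int = 100,
                        min_workers: int = 1,
                        max_workers: int = 20) -> int:
    """Return how many workers to run for the current queue backlog.

    `msgs_per_worker` is a hypothetical target backlog that one worker can absorb.
    """
    needed = math.ceil(visible_messages / msgs_per_worker)
    # keep at least a minimum fleet and never exceed the configured ceiling
    return max(min_workers, min(max_workers, needed))
```

The queue also buffers bursts while new workers start, so the fleet can follow demand without dropping work.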
1. DESIGN FOR FAILURE
2. USE MULTIPLE AZs
3. BUILD FOR SCALE
4. DECOUPLE COMPONENTS
YOUR GOAL
APPLICATIONS SHOULD CONTINUE TO FUNCTION
EVEN IF THE UNDERLYING PHYSICAL HARDWARE
FAILS OR IS REMOVED OR REPLACED
AWS ARCHITECTURE CENTER http://aws.amazon.com/architecture
AWS TECHNICAL ARTICLES http://aws.amazon.com/articles
AWS BLOG http://aws.typepad.com
AWS PODCAST http://aws.amazon.com/podcast
THANK YOU!