Scaling a Mobile Web App to 100 Million Clients and Beyond (MBL302) | AWS re:Invent 2013
DESCRIPTION
Mobile apps have different service requirements from their desktop and web-based analogs. Bandwidth, client processing, and other considerations can impose significant extra demands on a scalable service. This session is a technical discussion of the challenges Flipboard met while scaling a data-intensive mobile app from 0 to 100 million clients and how they are working on scaling 10x using AWS. At each major step, Flipboard has encountered many challenges. Learn about how they handled those challenges and the evolution of their systems architecture, design choices, and software selection.
TRANSCRIPT
© 2013 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc.
Joey Parsons @joeyparsons
November 14th, 2013
Scaling a Mobile Web App to 100 Million Clients and Beyond
Friday, November 15, 13
YOUR PERSONAL MAGAZINE
The ultimate way to discover, consume & share content on the mobile, social web
How are mobile apps different?
• WiFi vs slow connectivity
• Variances in bandwidth and global carriers
• Taking advantage of the local cache
• Control your behavior during latency
• Fast devices: significant opportunity for client computation
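The local-cache and latency bullets above can be illustrated with a small sketch. It is not Flipboard's client code; the `LocalCache` class, its `max_age_seconds` parameter, and the `fetch` callback are all hypothetical names chosen for illustration. The idea shown is: serve fresh cached data without touching the network, and fall back to stale data when the network is slow or unavailable.

```python
import time
from typing import Callable, Dict, Tuple


class LocalCache:
    """Cache-first lookup with a stale fallback (illustrative only)."""

    def __init__(self, max_age_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self._max_age = max_age_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, str]] = {}

    def get(self, key: str, fetch: Callable[[str], str]) -> str:
        now = self._clock()
        cached = self._entries.get(key)
        if cached is not None and now - cached[0] <= self._max_age:
            return cached[1]  # fresh entry: skip the network entirely
        try:
            value = fetch(key)  # hypothetical network call
        except (TimeoutError, ConnectionError):
            if cached is not None:
                return cached[1]  # slow or dead link: show stale content
            raise  # nothing cached, so the caller must handle the failure
        self._entries[key] = (now, value)
        return value
```

The clock is injectable so the freshness logic can be exercised without waiting for real time to pass.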
Prototype Phase: From 0 to 1M users
- Amazon EC2
- Amazon S3
- Amazon RDS
The Initial Launch Night
Things we should have done…
• Make sure to prepare for Amazon limits if you need to scale quickly
• Make sure your external partners understand the volume at which you’ll be accessing them
Challenges
• Understanding the scale of our services
• Little to no insight into performance
• Beginning to build out tooling for Amazon EC2, but still in its infancy
• No centralized logging or way of detecting errors
Getting Started: From 1M to 10M Users
- Amazon EC2
- Amazon RDS
- Amazon S3
- Amazon CloudFront
Architecture Changes
• Different services have different scale profiles: began the shift towards microservices
• Image content moved to CloudFront
• Moved primary data store to MySQL via Amazon RDS
• Home-grown bash scripts for deploys
• Focus on instrumentation
• Logging, Metrics, Monitoring followed suit
Host[i-76e33611] - Amazon Instance ID
name[tsd01] - Name of the instance
owner[ops] - Who owns the instance?
service[OOS] - IS for in service, OOS for out of service
ami[ami-3f622f56] - What AMI was used
type[m1.xlarge] - Type of EC2 instance
loc[us-east-1a] - Region and Availability Zone
role[flops] - Type of role
subclass[opentsdb] - Subclass of role
group[0] - Group of node
pool[production] - Production, staging, dev
public[50.16.58.220] - Public IP address
private[10.60.43.18] - Private IP address
SimpleDB for CMDB
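The `key[value]` layout of the instance record lends itself to a tiny parser. The following is a minimal sketch, assuming the records arrive as text in the form shown on the slide; the function name `parse_record` is hypothetical and nothing here reflects how Flipboard's own tools read SimpleDB.

```python
import re

# One attribute in the slide's notation, e.g. `role[flops]`.
ATTRIBUTE = re.compile(r"(\w+)\[([^\]]*)\]")


def parse_record(line: str) -> dict:
    """Turn a `Host[...] name[...] ...` line into a dict with lowercase keys."""
    return {key.lower(): value for key, value in ATTRIBUTE.findall(line)}


record = parse_record(
    "Host[i-76e33611] name[tsd01] owner[ops] service[OOS] "
    "ami[ami-3f622f56] type[m1.xlarge] loc[us-east-1a] role[flops] "
    "subclass[opentsdb] group[0] pool[production]"
)
print(record["host"], record["role"], record["service"])
```

All values stay strings, including `group`, which matches how a schemaless attribute store such as SimpleDB holds them.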
# fl-inst-describe -r flip -p production -g 0 -s IS -o ops
Domain[flipboard.prod.instances] has count[1] hosts meeting criteria
=======================================
Servers of role flip
=======================================
Host[i-5b8ae323]: name[flip05] owner[ops] service[IS] public[54.226.44.212] private[10.78.167.211] role[flip] group[0] pool[production] subclass[standard] type[c1.xlarge]
Querying our CMDB
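The query shown on this slide amounts to filtering instance records on several attributes at once. Here is a hedged sketch of that filter over in-memory records. The mapping of flags to attributes (`-r` role, `-p` pool, `-g` group, `-s` service, `-o` owner) is inferred from the sample output, not documented in the talk, and `describe` is a hypothetical stand-in rather than the real `fl-inst-describe` implementation.

```python
from typing import Dict, Iterable, List

Record = Dict[str, str]


def describe(records: Iterable[Record], **criteria: str) -> List[Record]:
    """Return the records whose attributes equal every given criterion."""
    return [
        record for record in records
        if all(record.get(key) == value for key, value in criteria.items())
    ]


# Two records taken from the slides, reduced to the attributes queried here.
INSTANCES = [
    {"host": "i-5b8ae323", "name": "flip05", "owner": "ops", "service": "IS",
     "role": "flip", "group": "0", "pool": "production"},
    {"host": "i-76e33611", "name": "tsd01", "owner": "ops", "service": "OOS",
     "role": "flops", "group": "0", "pool": "production"},
]

hits = describe(INSTANCES, role="flip", pool="production",
                group="0", service="IS", owner="ops")
print(f"count[{len(hits)}] hosts meeting criteria")
```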
The iPhone Launch Night
Scaling Fast: 10M to 100M Users
Storm, Kafka, Graphite, Kibana
Architecture Changes
• Heavy focus on instrumentation of all services
• Pipeline of batch processing using Hadoop
• Pipeline of real-time processing using Storm + Kafka
• Keen focus on using appropriately sized EC2 instances
• Moving off of bash scripts, moving to Puppet
Mobile application instrumentation
All at once?
fl-inst-upgrade -r flip -p production -q
… or …
By group?
fl-inst-upgrade -r flip -p production -g 0 -q
Deploy by groups
Using CloudWatch metrics for errors
fl-inst-upgrade -r flip -p production -g 1 -q
Continue your deploy
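The sequence on these slides (upgrade group 0, check error metrics, then continue with group 1) can be sketched as a gated loop. This is an illustration under stated assumptions: `rolling_deploy`, `upgrade`, `error_rate`, and `max_error_rate` are hypothetical names, the upgrade and metric lookups are passed in as plain callables, and the talk does not say whether Flipboard automated the check or performed it by eye.

```python
from typing import Callable, Iterable, List


def rolling_deploy(groups: Iterable[int],
                   upgrade: Callable[[int], None],
                   error_rate: Callable[[], float],
                   max_error_rate: float) -> List[int]:
    """Upgrade one group at a time and stop once errors exceed the limit.

    Returns the groups that were upgraded, including the one whose
    error rate tripped the gate.
    """
    upgraded: List[int] = []
    for group in groups:
        upgrade(group)  # e.g. shell out to the deploy tool for this group
        upgraded.append(group)
        if error_rate() > max_error_rate:
            break  # leave the remaining groups on the old version
    return upgraded
```

Because only the upgraded groups take traffic on the new build, a bad release is contained to a fraction of the pool while the rest keeps serving.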
Graphite for all metrics
Millions of metrics with Graphite
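For readers unfamiliar with Graphite, its plaintext protocol accepts one metric per line in the form `path value timestamp`, conventionally on TCP port 2003. The sketch below formats and sends such lines. The helper names and the metric path in the usage note are hypothetical, and the talk does not describe how Flipboard actually emits its metrics.

```python
import socket
import time
from typing import Iterable, Optional


def graphite_line(path: str, value: float,
                  timestamp: Optional[int] = None) -> str:
    """Format one metric for Graphite's plaintext protocol."""
    if timestamp is None:
        timestamp = int(time.time())  # seconds since the Unix epoch
    return f"{path} {value} {timestamp}\n"


def send_metrics(lines: Iterable[str], host: str, port: int = 2003) -> None:
    """Send pre-formatted lines to a Carbon listener (host is an assumption)."""
    payload = "".join(lines).encode("ascii")
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(payload)
```

A call such as `graphite_line("flip.production.requests", 42, 1384500000)` yields a single newline-terminated line; batching many lines into one `send_metrics` call keeps connection overhead low when metric counts run into the millions.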
d3.js + cubism.js
Monitoring via CloudWatch
Alarm in PagerDuty
Details available in PagerDuty
Lessons Learned
• Use Amazon services when possible (Amazon RDS, Amazon Redshift, Amazon Route 53)
• Use SSDs where applicable
• Understand your scale and your needs going forward and invest in Reserved Instances (3 years!)
• But, allow flexibility for changing needs and instance types
Amazon Technologies Used
• Amazon CloudFront
• Amazon Route 53
• Amazon EC2
• Amazon S3
• Amazon Redshift
• Amazon RDS
• Amazon SimpleDB
• Amazon SQS
• ElastiCache
• Amazon CloudWatch
Beyond: From 100M Users to 1B!
What’s next?
• Better use of Auto Scaling groups
• Predictive analytics: lots of signals
• Automated remediation
• Heavy focus on using the right instance types for each service
• Take advantage of new AWS products
The unknown is exciting …
Questions?
AWS re:Invent 2013 Magazine
http://flip.it/NSNEi
MBL302 Thank You