Scaling Systems: Architectures that Grow
Fundamental Patterns for scaling you can implement incrementally
Kendall Miller
Who Am I?
• Kendall Miller
• One of the Founders of Gibraltar Software
• Small Independent Software Vendor Founded in 2008
• Developers of VistaDB and Loupe
• Engineers, not Salespeople
• Enterprise Systems Architect & Developer since 1995
• BSE in Computer Engineering, University of Illinois Urbana-Champaign (UIUC)
What Do We Do?
Loupe: Advanced logging and analysis of errors, performance, and usage patterns for .NET web apps, desktop apps, and services
VistaDB: The easy-to-deploy, SQL Server-compatible, pure .NET embedded database
Fair Warning
What is Scale?
Scaling is the ability to cope and perform under an increasing workload.
What is Scale?
Scaling to a load = staying available while sustaining that load
What is Scale?
Being available is really about a request being completed within an acceptable period of time.
What is Scale?
• Requests per Unit Time
• Maximum Request Latency
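Both measures can be computed from request timings. A minimal sketch, assuming each request is recorded as a hypothetical (start, end) pair in seconds and the length of the measurement window is known; `Request`, `scale_metrics`, and `within_target` are illustrative names, not from the talk.

```python
from dataclasses import dataclass


@dataclass
class Request:
    # Hypothetical timing record: seconds when the request arrived
    # and when its response was completed.
    start: float
    end: float


def scale_metrics(requests, window_seconds):
    """Return (requests per second, maximum latency in seconds)
    for the requests completed during a window of the given length."""
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    throughput = len(requests) / window_seconds
    # Latency is the time from arrival to completion of one request.
    max_latency = max((r.end - r.start for r in requests), default=0.0)
    return throughput, max_latency


def within_target(requests, window_seconds, min_rps, max_latency_s):
    """True if the system sustained at least min_rps while keeping
    every request under max_latency_s."""
    rps, worst = scale_metrics(requests, window_seconds)
    return rps >= min_rps and worst <= max_latency_s
```

Checking both numbers together matters: a system that hits its request rate only by letting some requests wait too long is, by the definition above, not available at that load.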
What’s your Target?
[Chart: Average daily traffic in visitors/day, log scale from 10^3 to 10^8, for Microsoft.com, Twitter.com, Amazon.com, Target.com, Slashdot.org, DevExpress.com, Hanselman.com, and Gibraltar Software]
What’s your Target?
25,000 Visitors/Day × 5 Pages/Visit = 125,000 Pages/Day
125,000 Pages/Day ÷ 11 High-Traffic Hours/Day ≈ 11,364 Pages/Hour (round up to 12,000)
12,000 Pages/Hour ÷ 3,600 Seconds/Hour ≈ 3.3 Pages/Second
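The same arithmetic can be parameterized so you can plug in your own traffic numbers. A sketch assuming, as the slide's figures imply, 5 pages per visit (125,000 ÷ 25,000) and traffic spread evenly across the busy hours; `peak_pages_per_second` is an illustrative name.

```python
def peak_pages_per_second(visitors_per_day, pages_per_visit, busy_hours_per_day):
    """Estimate the page rate during busy hours, assuming the day's
    traffic is spread evenly across those hours."""
    pages_per_day = visitors_per_day * pages_per_visit
    pages_per_hour = pages_per_day / busy_hours_per_day
    return pages_per_hour / 3600  # 3,600 seconds per hour


exact = peak_pages_per_second(25_000, 5, 11)  # about 3.16 pages/second
# Rounding 11,364 pages/hour up to 12,000 before dividing gives the
# more conservative figure of about 3.3 pages/second.
rounded_up = 12_000 / 3600
```

Real traffic is rarely spread evenly, even within busy hours, so the short-term peak you plan for is typically somewhat higher than this average.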
Specific Architectures
• Gossip
• Map Reduce
• Tree of Responsibility
• Stream Processing
• Scalable Storage
• Publish/Subscribe
• Distributed Queues
• Load Balancers + Shared Nothing Units
• Load Balancers + Stateless Nodes + Scalable Storage
• Content Addressable Networks
• General Peer-to-Peer
ACD/C
• Async – Do the work whenever
• Caching – Don’t do any work you don’t have to
• Distribution – Get as many people to do the work as you can
• Consistency – We all agree on these key things
Async
• Decouple operations so you do the minimum amount of work in performance critical paths
• Queue work that can be completed later to smooth out load
• Speculative Execution
• Scheduled Requests (Nightly Processes)
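Queuing work for later can be sketched with Python's standard `queue` and `threading` modules: the request handler only enqueues and acknowledges, and a background worker does the slow part at its own pace. This assumes a single in-process worker; in a real system the queue would typically be a durable, out-of-process service. `run_demo`, `handle_request`, and the order IDs are hypothetical.

```python
import queue
import threading


def run_demo(order_ids):
    work = queue.Queue()
    processed = []

    def handle_request(order_id):
        # Performance-critical path: enqueue and acknowledge right away.
        work.put(order_id)
        return f"accepted {order_id}"

    def worker():
        # Background consumer drains the queue at its own pace.
        while True:
            item = work.get()
            if item is None:  # sentinel tells the worker to stop
                break
            processed.append(f"processed {item}")  # stand-in for slow work

    consumer = threading.Thread(target=worker)
    consumer.start()
    responses = [handle_request(order_id) for order_id in order_ids]
    work.put(None)
    consumer.join()  # wait for the backlog to be worked off
    return responses, processed
```

A burst of requests only grows the queue; the worker keeps processing at a steady rate, which is what smooths out the load.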
Caching
• Save results of earlier work nearby where they are handy to use again later
• Apply in front of anything that’s time consuming
• Easiest to apply from left to right (starting at the client and working back toward storage)
• Simple strategies can be really effective (EF: dump the entire cache on any update)
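The "dump all on update" strategy can be sketched as a read-through cache that discards every entry whenever any data changes. This is an illustrative sketch, not the EF-specific mechanism from the talk; `DumpAllCache` and its `loader` callable are assumptions.

```python
class DumpAllCache:
    """Read-through cache that throws away every entry on any update."""

    def __init__(self, loader):
        self._loader = loader  # the expensive call being cached, e.g. a query
        self._entries = {}
        self.misses = 0  # how many times the loader actually ran

    def get(self, key):
        if key not in self._entries:
            self.misses += 1
            self._entries[key] = self._loader(key)
        return self._entries[key]

    def invalidate_all(self):
        # Coarse but simple: after any write, nothing cached can be stale.
        self._entries.clear()


# Usage: a dict stands in for the database.
database = {"price:widget": 10}
cache = DumpAllCache(database.__getitem__)
cache.get("price:widget")  # miss: loads from the "database"
cache.get("price:widget")  # hit: served from the cache
database["price:widget"] = 12  # a write...
cache.invalidate_all()  # ...dumps the whole cache
```

The trade-off is that every write costs a round of cache misses, so this works best when answers don't change often.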
Why Caching?
• Loading the world is impractical
• Apps ask a lot of repeated questions; stateless applications even more so
• Answers don’t change often
• Authoritative information is expensive
Distribution
• Distribute requests across multiple systems
• Classic web “Scale Out” approach
• The less state held, the easier it is to distribute work
• Distributed database = hard
• Distributed static content server = easy
• Request routing for distribution can serve other availability purposes too (routing around failures, cutting traffic over during upgrades)
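Spreading requests across stateless servers can be as simple as round-robin assignment. A minimal sketch, assuming each node is a hypothetical callable that can serve any request because it holds no per-user state; `RoundRobinBalancer` and `make_node` are illustrative names.

```python
import itertools


def make_node(name):
    # Hypothetical stateless web server: the response depends only on
    # the request, so any node can handle any request.
    return lambda request: f"{name}:{request}"


class RoundRobinBalancer:
    def __init__(self, nodes):
        if not nodes:
            raise ValueError("need at least one node")
        self._next_node = itertools.cycle(nodes)

    def route(self, request):
        # Hand each request to the next node in turn.
        return next(self._next_node)(request)


balancer = RoundRobinBalancer([make_node("web1"), make_node("web2")])
responses = [balancer.route(r) for r in ("a", "b", "c")]
```

Round-robin only works because the nodes are interchangeable; once a node holds session state, requests have to be routed to the node that has it.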
Consistency
• The degree to which all parties observe the same state of the system at the same time
• Scaling inevitably requires compromise
• Absolute consistency forces one source of the truth and requires extensive locking to ensure all parties agree
• The real world doesn’t require the consistency we tend to demand of our systems
Consistency Challenges
• Singleton Data Structures (e.g., Order Numbers)
• State held between the endpoints of a process
• Consistent results of queries across partitioned datasets
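One common way to relieve a singleton such as an order-number counter (a swapped-in technique, not one the talk describes) is to have each server reserve a block of numbers from the central sequence and hand them out locally. The sketch assumes one `BlockAllocator` per server and accepts that numbers are unique but no longer issued in strict creation order; all names are hypothetical.

```python
import threading


class CentralSequence:
    """The singleton: one authoritative counter every server must lock."""

    def __init__(self):
        self._next = 1
        self._lock = threading.Lock()
        self.calls = 0  # how often the contended lock was taken

    def reserve_block(self, size):
        with self._lock:
            self.calls += 1
            start = self._next
            self._next += size
            return start, start + size  # half-open range [start, end)


class BlockAllocator:
    """Per-server allocator; not thread-safe, so use one per server
    (or guard it with its own local lock)."""

    def __init__(self, sequence, block_size=100):
        self._seq = sequence
        self._size = block_size
        self._next = self._end = 0

    def next_order_number(self):
        if self._next >= self._end:
            # Block used up: go back to the singleton once per block.
            self._next, self._end = self._seq.reserve_block(self._size)
        number = self._next
        self._next += 1
        return number
```

With a block size of 10, server A issues 1, 2, 3 while server B issues 11, and A's next order is 4. The singleton is touched only twice, and the price is that order numbers no longer reflect the sequence in which orders were created.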
Typical Application
[Diagram: Client (Web Browser) → Server (Web Server) → Storage (Database), annotated with: session state, SSL session, log contention, memory allocation/GC, network sockets, request queue, transaction isolation, reader/writer locks, singleton data structures]
Caching
[Diagram: Client (Web Browser) → Server (Web Server) → Storage (Database), with a browser cache, output cache, content cache, and query cache along the path, labeled 100%, 50%, 10%, 1%]
Distribution
[Diagram: several clients (web browsers) → reverse proxy → multiple web servers → storage (database)]
• Session state and identity need to be factored out
• Partition (sticky sessions) first, then move to stateless nodes
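A sticky-session partition can be sketched by hashing the session ID to pick a server, so the same session always lands on the same node. This assumes a fixed server list; `sticky_route` is hypothetical. It uses `hashlib` rather than the built-in `hash()`, because Python randomizes `str` hashes per process, which would send one session to different servers from different proxy processes.

```python
import hashlib


def sticky_route(session_id, servers):
    """Pick a server deterministically from the session ID."""
    if not servers:
        raise ValueError("need at least one server")
    digest = hashlib.sha256(session_id.encode("utf-8")).digest()
    # Interpret the first 8 bytes of the digest as an integer and
    # map it onto the server list.
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]
```

Because the modulo changes whenever a server is added or removed, most sessions move to a different node at that point, which is one practical reason to treat sticky sessions as a first step before factoring session state out entirely.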
Partitioned Storage Zones
[Diagram: clients routed to separate zones, each with its own web servers and storage (database): Customer A, Customer B]
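Routing by storage zone comes down to a lookup from customer to zone before any other work happens. A sketch with hypothetical zone names, server names, and a static mapping; a real deployment would likely keep this directory in its own highly available store.

```python
# Hypothetical zone layout: each zone has its own web servers and database.
ZONES = {
    "zone-a": {"web": ["web-a1", "web-a2"], "db": "db-a"},
    "zone-b": {"web": ["web-b1", "web-b2"], "db": "db-b"},
}

# Hypothetical directory assigning each customer to exactly one zone.
CUSTOMER_ZONE = {"customer-a": "zone-a", "customer-b": "zone-b"}


def zone_for(customer_id):
    """Return the web servers and database that own this customer's data."""
    try:
        return ZONES[CUSTOMER_ZONE[customer_id]]
    except KeyError:
        raise LookupError(f"no zone assigned to {customer_id!r}") from None
```

As long as no request needs data from two zones, each zone can grow, fail, or be upgraded without affecting the others.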
Partitioned Storage Intra-Zone
[Diagram: within the Customer A and Customer B zones, clients → web servers → storage split by function into Orders, Products, and Inventory]
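Within a zone, storage is split by function (orders, products, inventory). A sketch assuming in-memory dictionaries stand in for separate databases; `PartitionedStorage` and its methods are hypothetical. It shows the cost: a query that used to be a single join becomes separate lookups, which is where consistent results across partitioned datasets get hard.

```python
class PartitionedStorage:
    """Each kind of data lives in its own store (in practice, its own database)."""

    def __init__(self):
        self.stores = {"orders": {}, "products": {}, "inventory": {}}

    def put(self, kind, key, value):
        self.stores[kind][key] = value

    def get(self, kind, key):
        return self.stores[kind][key]

    def order_with_product(self, order_id):
        # A "join" across partitions becomes two separate lookups; nothing
        # here guarantees both stores reflect the same moment in time.
        order = self.get("orders", order_id)
        product = self.get("products", order["product_id"])
        return {**order, "product": product}
```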
Asynchronous Processing
[Diagram: web servers → Orders, Products, and Inventory storage; orders are placed on an Order Queue consumed by an Order Processing Server]
Fresh Problems
Fallacies of Distributed Computing
• The network is reliable
• Latency is zero
• Bandwidth is infinite
• The network is secure
• Topology doesn’t change
• There is one administrator
• Transport cost is zero
• The network is homogeneous
Fresh Problems: Partial Failures
[Diagram: several clients (web browsers) → multiple web servers → storage (database)]
Fresh Problems: Partial Failures
• Break system into individual failure zones
• Monitor each instance of each zone for problems
• Route around bad instances
Without monitoring, redundancy is worthless
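Monitoring and routing around bad instances can be combined in the balancer itself. A minimal sketch: `is_healthy` stands in for whatever monitoring probe you use (a hypothetical callable here), and unhealthy instances are simply skipped in the round-robin rotation.

```python
import itertools


class HealthAwareBalancer:
    def __init__(self, instances, is_healthy):
        self.instances = list(instances)
        self.is_healthy = is_healthy  # hypothetical monitoring probe
        self._turn = itertools.count()

    def pick(self):
        # Only instances that currently pass the health check are eligible.
        healthy = [i for i in self.instances if self.is_healthy(i)]
        if not healthy:
            raise RuntimeError("no healthy instances in this zone")
        return healthy[next(self._turn) % len(healthy)]
```

Without the probe, the balancer would keep sending every third request to a dead server, which is the sense in which redundancy without monitoring buys nothing.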
Fresh Problems: Upgrades
[Diagram: clients → web servers and storage (database) split into Customer A and Customer B zones]
Fresh Problems: Upgrades
• Break system into individual upgrade zones
• Upgrade each zone – Drain & Stop, Upgrade, Verify.
• Cut traffic over to updated zones
• Design for software updates from the start
• Don’t forget data schemas
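The drain, upgrade, verify, cut-over cycle can be sketched as a loop over zones that stops at the first zone failing verification. A sketch with a hypothetical `Zone` record and `verify` callback; draining is reduced to a flag here, whereas a real drain would also wait for in-flight requests to finish.

```python
from dataclasses import dataclass


@dataclass
class Zone:
    name: str
    version: str
    in_rotation: bool = True  # whether the load balancer sends it traffic


def rolling_upgrade(zones, new_version, verify):
    """Upgrade one zone at a time; return a log of the steps taken."""
    log = []
    for zone in zones:
        zone.in_rotation = False  # drain & stop: no new traffic
        log.append(f"drained {zone.name}")
        zone.version = new_version  # upgrade
        if not verify(zone):  # verify before taking traffic again
            log.append(f"{zone.name} failed verification; left out of rotation")
            return log
        zone.in_rotation = True  # cut traffic over to the updated zone
        log.append(f"{zone.name} serving {new_version}")
    return log
```

During the rollout, old and new versions serve traffic side by side, so any data schema they share has to work for both.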
Bring It All Home
Don’t worry, we’ve got this.
Bringing Home the Bacon
Testing, Testing, Testing
Critical Lessons Learned
• ACD/C
• Clear Consistency Strategy
• Build in monitoring and management