Scalability Rules for Web Sites
This presentation describes some interesting rules for scaling web sites.
Minh Tran – 12/2011
Distribute Your Work
• Design to Clone Things (X axis)
• Design to Split Different Things (Y axis)
• Design to Split Similar Things (Z axis)
Design to Clone Things (X axis)
• When:
– Databases with a very high read-to-write ratio (5:1 or greater; the higher the better).
– Any system where transaction growth exceeds data growth.
• How to use:
– Simply clone services and implement a load balancer.
– For databases, ensure the accessing code understands the difference between a read and a write.
Example
• A reservation system has 400 searches (reads) for every 1 booking (write).
• How to scale?
– Create read-only copies (replicas) of the database.
– Put a caching tier in front of the database (highly recommended as a first step).
– Most RDBMSs support replication "out of the box".
– Master: the primary transactional database (writes).
– Slave: a read-only copy of the master.
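The read/write distinction above can be sketched in a few lines. This is a minimal illustration, not a production router: the class name `ReadWriteRouter`, the server names, and the rule that a statement starting with `SELECT` or `SHOW` is a read are all assumptions made for the example.

```python
import itertools


class ReadWriteRouter:
    """Send writes to the master and spread reads across read-only replicas."""

    # Assumed rule of thumb: statements starting with these verbs are reads.
    READ_VERBS = ("SELECT", "SHOW")

    def __init__(self, master, replicas):
        self.master = master
        # Round-robin over replicas; fall back to the master if there are none.
        self._replicas = itertools.cycle(replicas or [master])

    def is_read(self, sql):
        return sql.lstrip().upper().startswith(self.READ_VERBS)

    def route(self, sql):
        """Return the name of the server that should run this statement."""
        return next(self._replicas) if self.is_read(sql) else self.master


router = ReadWriteRouter("master-db", ["replica-1", "replica-2"])
print(router.route("SELECT * FROM flights WHERE dest = 'SGN'"))
print(router.route("INSERT INTO bookings (flight_id) VALUES (42)"))
```

Replicas usually lag slightly behind the master, so a read that must see a write made a moment ago may still need to go to the master.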
Design to Split Different Things (Y Axis)
• When:
– Very large data sets where relations between data are not necessary.
– Large, complex systems where scaling engineering resources requires specialization.
• How to use:
– Split up actions by using verbs, or resources by using nouns, or use a mix.
– Split both the services and the data along the lines defined by the verb/noun approach.
Example – Split up by verbs
E-commerce system:
• Login
• Search
• Browse
• View
• Add-to-cart
• Signup
• Purchase / Buy
Example – Split up by nouns
E-commerce system:
• Product
• SKU
• Catalog
• Inventory
• User Information
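One way to picture a Y-axis split by verbs is a routing table that sends each kind of request to its own service pool. The sketch below is only illustrative: the pool names, the `DEFAULT_POOL` fallback, and the convention that the first URL path segment names the verb are assumptions, not part of the original slides.

```python
# Hypothetical mapping from the first URL path segment (a verb) to a service pool.
SERVICE_POOLS = {
    "login": "login-pool",
    "search": "search-pool",
    "browse": "browse-pool",
    "cart": "cart-pool",
    "purchase": "purchase-pool",
}
DEFAULT_POOL = "web-pool"  # assumed catch-all for everything else


def pool_for(path):
    """Pick the service pool that owns the first segment of the request path."""
    segments = path.strip("/").split("/")
    return SERVICE_POOLS.get(segments[0].lower(), DEFAULT_POOL)


print(pool_for("/search/flights"))
print(pool_for("/about"))
```

Each pool can then be given its own data store and scaled on its own, which is the point of splitting both services and data along the same lines.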
Design to Split Similar Things (Z Axis)
• When:
– Very large, similar data sets such as large and rapidly growing customer bases.
• How to use:
– Identify something about the customer:
  • Customer ID
  • Last name
  • Geography
  • Device
– Split or partition both data and services based on that attribute.
• Often referred to as sharding or horizontal partitioning.
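A Z-axis split on customer ID can be sketched as a function that maps each ID to a shard number. This is a minimal example under stated assumptions: the shard count of 4, the function name `shard_for`, and the hash-then-modulo scheme are chosen for illustration only.

```python
import hashlib

NUM_SHARDS = 4  # assumed shard count for illustration


def shard_for(customer_id, num_shards=NUM_SHARDS):
    """Map a customer ID to a shard number in a stable, repeatable way."""
    # Use a real hash rather than Python's built-in hash(), which is
    # randomized per process for strings and would break routing.
    digest = hashlib.sha256(str(customer_id).encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


for cid in (1001, 1002, "alice@example.com"):
    print(cid, "-> shard", shard_for(cid))
```

Plain modulo is easy to reason about, but changing the shard count moves most customers to a different shard; a lookup table or consistent hashing are common alternatives when shards are added often.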
Use the Right Tool
• Use Databases Appropriately
• Actively Use Log Files
Use Databases Appropriately
RDBMS vs. file system
• Example
– RDBMS: Oracle, MySQL…
– File system: GFS, MogileFS, Ceph
• Storage structure
– RDBMS: transactional integrity (ACID); relational structure within tables
– File system: no transactions; no relationships
• Advantages
– RDBMS: minimizes data redundancy; improves transaction processing
– File system: handles very large amounts of files and data
• Limitations
– RDBMS: scalability (because of ACID); sharding or partitioning (because of the relational structure)
– File system: conflicting reads and writes over time
Use Databases Appropriately (cont.)
NoSQL
• Example
– Key-value stores: Memcached, Tokyo Tyrant, Voldemort
– Extensible record stores: Google Bigtable, Cassandra
– Document stores: CouchDB, Amazon's SimpleDB, Yahoo's PNUTS, MongoDB, …
• Storage structure
– Key-value stores: single key-value index for data; stored in memory
– Extensible record stores: row and column data model
– Document stores: multi-indexed object model; documents can be aggregated into collections of documents
• Advantages
– Key-value stores: significant scaling and performance
– Extensible record stores: rows are sharded on primary keys (automatic); columns are broken into groups (user-defined)
– Document stores: can be queried on many different attributes
• Limitations
– Key-value stores: the kinds of data that can be stored; synchronous replication
– Extensible record stores: asynchronous replication
– Document stores: asynchronous replication; ACID
Actively Use Log Files
• When:
– Put a process in place that monitors log files.
– Force people to take action on the issues identified.
• How to use:
– Use any number of monitoring tools, from custom scripts to Splunk, to watch your application logs for errors.
– Export these errors and assign resources to identify and solve the issues.
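A "custom script" for watching logs can be very small. The sketch below counts error messages so the most frequent ones can be assigned first. The log line format (`<date> <time> <LEVEL> <message>`), the level names, and the function name `summarize_errors` are assumptions for the example; real application logs will need their own pattern.

```python
import re
from collections import Counter

# Assumed log format: "<date> <time> <LEVEL> <message>"; adjust to your logs.
LINE_RE = re.compile(r"^\S+ \S+ (?P<level>[A-Z]+) (?P<message>.*)$")


def summarize_errors(lines, levels=("ERROR", "FATAL")):
    """Count distinct error messages, most frequent first."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match and match.group("level") in levels:
            counts[match.group("message")] += 1
    return counts.most_common()


sample = [
    "2011-12-01 10:00:01 INFO user 17 logged in",
    "2011-12-01 10:00:02 ERROR db timeout on search",
    "2011-12-01 10:00:05 ERROR db timeout on search",
    "2011-12-01 10:00:09 FATAL out of memory",
]
for message, count in summarize_errors(sample):
    print(count, message)
```

The returned list is the part worth exporting: each entry is a candidate issue to hand to an owner.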
Use Caching Aggressively
• Leverage Content Delivery Networks
• Use Expires Headers
• Leverage Page Caches
• Utilize Application Caches
• Make Use of Object Caches
• Put Object Caches on Their Own "Tier"
Leverage Content Delivery Networks
• When:
– Ensure it is cost justified, and then choose which content is most suitable.
• How to use:
– Most CDNs leverage DNS (Domain Name System) to serve content on your site's behalf.
Use Expires Headers
• When:
– All object types need to be considered.
• How to use:
– Headers can be set on Web servers or through application code.
HTTP Status Code: HTTP/1.1 200 OK
Date: Thu, 21 Oct 2010 20:03:38 GMT
Server: Apache/2.2.9 (Fedora)
X-Powered-By: PHP/5.2.6
Expires: Mon, 26 Jul 2011 05:00:00 GMT
Last-Modified: Thu, 21 Oct 2010 20:03:38 GMT
Cache-Control: no-cache
Vary: Accept-Encoding, User-Agent
Transfer-Encoding: chunked
Content-Type: text/html; charset=UTF-8
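Setting these headers through application code might look like the following sketch. It builds an `Expires` value in the HTTP date format shown above, plus a matching `Cache-Control: max-age`. The function name `caching_headers` and the one-day lifetime are assumptions for illustration; how the headers are attached to a response depends on the web framework in use.

```python
import time
from email.utils import formatdate


def caching_headers(max_age_seconds, now=None):
    """Build Expires and Cache-Control headers for a cacheable response."""
    now = time.time() if now is None else now
    return {
        # formatdate(..., usegmt=True) yields an HTTP date such as
        # "Thu, 21 Oct 2010 20:03:38 GMT".
        "Expires": formatdate(now + max_age_seconds, usegmt=True),
        "Cache-Control": "max-age=%d" % max_age_seconds,
    }


# Example: let browsers and proxies keep an object for one day.
print(caching_headers(86400))
```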
Leverage Page Caches
• When:
– Always.
• How to use:
– Choose a caching system and deploy it.
Make Use of Object Caches
• When:
– Any time you have repetitive queries or computations.
• How to use:
– Select any one of the many open source or vendor-supported solutions.
– Implement the calls in your application code.
• Some popular caches: Memcached, Ehcache, Apache OJB, NCache
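"Implement the calls in your application code" usually means checking the cache before doing the expensive work and storing the result afterwards. The sketch below shows that pattern with a tiny in-process stand-in for a real object cache. The `ObjectCache` class, the `expensive_query` function, and the 60-second lifetime are all hypothetical; a real deployment would call the client library of the chosen cache instead.

```python
import time


class ObjectCache:
    """Tiny in-process stand-in for an object cache such as Memcached."""

    def __init__(self):
        self._store = {}  # key -> (expiry timestamp, value)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None or entry[0] < time.time():
            self._store.pop(key, None)  # drop missing or expired entries
            return None
        return entry[1]

    def set(self, key, value, ttl_seconds=60):
        self._store[key] = (time.time() + ttl_seconds, value)


def expensive_query(user_id):
    """Hypothetical slow database call."""
    return {"id": user_id, "name": "user-%d" % user_id}


def get_user(cache, user_id):
    key = "user:%d" % user_id
    user = cache.get(key)
    if user is None:  # cache miss: do the work once, then store it
        user = expensive_query(user_id)
        cache.set(key, user)
    return user


cache = ObjectCache()
print(get_user(cache, 7))  # first call runs the query
print(get_user(cache, 7))  # second call is served from the cache
```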
Put Object Caches on Their Own “Tier”
Learn Aggressively
• Take every opportunity to learn.
• Be constantly learning from your mistakes as well as successes.
• Watch your customers or use A/B testing to determine what works.
• Use postmortems to learn from incidents and problems in production.
Reference
• Scalability Rules: 50 Principles for Scaling Web Sites
• The Art of Scalability: Scalable Web Architecture, Processes, and Organizations for the Modern Enterprise
• http://www.codefutures.com/database-sharding/
• http://highscalability.com/unorthodox-approach-database-design-coming-shard