wix architecture at scale - qcon london 2014
DESCRIPTION
In this talk I will go over Wix's architecture, how we evolved our system to be highly available even at the worst case scenarios when everything can break, how we built a self-healing eventual consistency system for website data distribution and will show some of the patterns we use that helps us render 45M websites while maintaining a relatively low number of servers.TRANSCRIPT
Aviran MordoHead of Back-End Engineering @ Wix
@aviranm
linkedin.com/in/aviran
aviransplace.com
Wix Architecture at Scale
Wix in Numbers
Over 45,000,000 users1M new users/month
Static storage is >800TB of data1.5TB new files/day
3 data centers + 2 clouds (Google, Amazon)300 servers
700M HTTP requests/day
600 people work at Wix, of which ~ 200 in R&D
Initial Architecture
Built for fast development
Stateful login (Tomcat session), Ehcache, file
uploads
No consideration for performance, scalability and
testing
Intended for short-term use
Tomcat, Hibernate, custom web framework
Lighttpd(file
serving)MySQL
DB
Wix(Tomcat)
The Monolithic Giant
One monolithic server that handled everything
Dependency between features
Changes in unrelated areas of the system caused deployment of the whole system
Failure in unrelated areas will cause system wide downtime
Breaking the System Apart
Concerns and SLA
Data Validation
Security / Authentication
Data consistency
Lots of data
Edit websites
High availability
High performance
Lots of static files
Very high traffic volume
Viewport optimization
Cacheable data
Serving Media
High availability
High performance
High traffic volume
Long tail
View sites, created by Wix editor
Wix Segmentation
1. Editor Segment 3. Public Segment2. Media Segment
Networking
Making SOA Guidelines
Each service has its own database (if one is needed)
Only one service can write to a specific DB
There may be additional read-only services that directly accesses the DB (for performance reasons)
Services are stateless
No DB transactions
Cache is not a building block, but an optimization
1. Editor Segment
Editor Server
Immutable JSON pages (~2.5M / day)
Site revisions
Active – standby MySQL cross datacenters
Editor Server
MySQL Active Sites
MySQL Archiv
e
Find Your Critical Path
Protect The Data
Protect against DB outage with fast recovery = replication
Protect against data poisoning/corruption = revisions / backup
Make the data available at all times = data distribution to multiple locations / providers
Browser
Editor Server
Static Grid
Notify
Google Cloud Storag
e
MySQL Active Sites
MySQL Archiv
e
Notify
Saving Editor Data
Archive (Amazon)
Archive (Google)
Save Page(s)
200 OK
Upload
Save Page
DC replication
Download Page
MySQL Archiv
e
MySQL Active Sites
Browser
Editor Server
Static Grid
Save Page(s)
Save Page
Upload
NotifyDownload Page
Google Cloud Storag
e
MySQL Archiv
e
MySQL Active Sites
MySQL Archiv
e
DC replication
Notify
Self Healing Process
Archive (Amazon)
Archive (Google)
MySQL Active Sites
200 OK
No DB Transactions
Save each page (JSON) as an atomic operation
Page ID is a content based hash (immutable/idempotent)
Finalize transaction by sending site header (list of pages)
Can generate orphaned pages, not a problem in practice
2. Media Segment
Prospero – Wix Media Storage
800TB user media files
3M files uploaded daily
500M metadata records
Dynamic media processing• Picture resize, crop and sharpen “on the fly”• Watermark• Audio format conversion
Prospero
Eventual consistent distributed file system
Multi datacenter aware
Automatic fallback cross DC
Run on commodity servers & cloud
x36Tx36Tx32
Austin
Prospero – Wix Media Manager
get image.jpg
First fallback
Second
fallback
If not in
CDN
Google Cloud
x36Tx36Tx32
Tampa
CDN
3. Public Segment
Public Segment Roles
Routing (resolve URLs)
Dispatching (to a renderer)
Rendering (HTML,XML,TXT) Public
Server
HTML Rendere
r
HTML SEO
Renderer
Flash Renderer
Sitemap Rendere
r
Robots.txt
Renderer
www.example.com
Flash SEO
Renderer
Public SLA
Response time <100ms at peak traffic
Publish A Site
Publish site header (a map of pages for a site)
Publish routing table
Publish site header / routes
Editor Segment Public Segment
Built For Speed
Minimize out-of-service hops (2 DB, 1 RPC)
Lookup tables are cached in memory, updated every 5 minutes
Denormalized data – optimize for read by primary key (MySQL)
Minimize business logic
How a Page Gets Rendered
Bootstrap HTML template that contains only data
Only JavaScript imports
JSON data (site-header + dynamic data)
No “real” HTML view
Offload rendering work to the browser
The average Intel Core i750 can push up to 7 GFLOPS without overclocking
Why JSON?
Easy to parse in JavaScript and Java/Scala
Fairly compact text format
Highly compressible (5:1 even for small payloads)
Easy to fix rendering bugs (just deploy a new client code)
Minimum Number of Public Servers Needed to Serve 45M Sites
4
Public SLABe Available 99.99999%
Serving a Site – Sunny Day
Archive
CDN Statics
Browserhttp://
example.wix.com
Store HTML to cache
HTTP Request
Notify site view
LB
Public
Renderer
HTML
Resources / Media
HTTP Request
Serving a Site – DC Lost
Archive
CDN Statics
Browserhttp://
example.wix.com
LB
Public
Renderer
LB
Public
RendererChange DNS
HTTP Request
Serving a Site – Public Lost
Archive
CDN Statics
Browserhttp://
example.wix.com
LB
Public
Renderer
Get Cached HTML Version
HTMLHTTP Request
Living in the Browser
Archive
CDN Statics
Browserhttp://
example.wix.com
LB
Public
Renderer
Editor
Fallback
JSON / Media
HTMLHTTP Request
Fallback
Summary
Identify your critical path and concerns
Build redundancy in critical path (for availability)
De-normalize data (for performance)
Minimize out-of-process hops (for performance)
Take advantage of client’s CPU power
Aviran MordoHead of Back-End Engineering @ Wix
@aviranm
linkedin.com/in/aviran
aviransplace.com
Q&A
http://goo.gl/Oo3lGr