scaling

76
SCALING Òscar Vilaplana @grimborg http://oscarvilaplana.cat

Upload: oscar-vilaplana

Post on 01-Nov-2014

187 views

Category:

Technology


4 download

DESCRIPTION

Scaling: a naïve approach A look at how to scale an existing monolythic system, and how companies such as Disqus and Eventbrite have done it.

TRANSCRIPT

Page 1: Scaling

SCALING

Òscar Vilaplana @grimborg http://oscarvilaplana.cat

Page 2: Scaling

WHAT’S THIS ABOUT?

People

Technology

Tools

Page 3: Scaling

PEOPLECare

Focus

Automate & Test.

Shared brain

Finish & DRY.

Page 4: Scaling

TECHDesign to clone

Separate pieces

API

Offload everything

Measure

Page 5: Scaling
Page 6: Scaling

VIRTUAL QUEUE

Queue Instance

Queue Instance

Queue Instance

Queue Instance

Page 7: Scaling

VIRTUAL QUEUE

Queue Instance

Queue Instance

Queue Instance

Page 8: Scaling

VIRTUAL QUEUE

Queue Instance

Queue Instance

Queue Instance

Page 9: Scaling

VIRTUAL QUEUE

Queue Instance

Queue Instance

Queue Instance

Queue Instance

Page 10: Scaling

TECH• Design to clone

• Separate pieces

• API

• Offload everything

• Measure

Page 11: Scaling

TYPES OF TASKS

• Realtime

• ASAP

• When you have time } Async!

Page 12: Scaling

INSTAGRAM’S FEED

• Redis queue per follower.

• New media: push to queues

• Small chained tasks

Page 13: Scaling

INSTAGRAM’S FEED

harro wouter orestis siebejan oscar

Schedulenext

batch

Page 14: Scaling

SMALL TASKS• 10k followers per task

• < 2s

• Finer-grained load balancing

• Lower penalty of failure/reload

Page 15: Scaling

CELERY: REDIS• Good: Fast

• Bad:

• Polling for task distribution

• Messy non-synchronous replication

• Memory limits task capacity

Page 16: Scaling

CELERY: BEANSTALK• Good:

• Fast

• Push to consumers

• Writes to disk

• Bad:

• No replication

• Only useful for Celery

Page 17: Scaling

CELERY: RABBITMQ• Fast

• Writes to disk

• Low-maintenance synchronous replication

• Excellent Celery compatibility

• Supports other use cases

Page 18: Scaling
Page 19: Scaling
Page 20: Scaling
Page 21: Scaling

RESERVATIONS• UI

• Room locking

• Room availability

• Registration manager

• Email, PDF invoice

• Payment

• Login

• …

Page 22: Scaling

WE DON’T DO THISdef do_everything(request): hotel_id = request.GET.hotel_id room_number = request.GET.room_number with room_mutex(hotel_id, room_number): room = (session.query(Room) .filter(Room.hotel_id == hotel_id) .filter(Room.room_number == room_number).one()) if not room.available: return Response("Room not available”, template=room_template) reservation = Reservation(client=request.client, room=room) session.add(reservation) room.available = False price = # price_calculation payment = Payment(reservation=reservation, price=price) session.add(payment) session.commit() url = payment.get_psp_url() return Redirect(url)

Page 23: Scaling

BUT WE DO THIS• Frontend UI

• Locking rooms

• Calculating room availability

• Temporarily locking rooms

• Payment processing

• Mail

• PDF invoice generation

Page 24: Scaling

BUT WE CAN SCALE!

Page 25: Scaling

SCALE DB: HARD• Slaves

• Master-Master?

• Sharding?

Page 26: Scaling

SCALING

Page 27: Scaling

MINOR SCALE

Page 28: Scaling

MAJOR SCALE

Page 29: Scaling
Page 30: Scaling

FRONTEND

Everything Frontend

Externalpaymentproviders

User

Everything Frontend

Master

Read slaves

Page 31: Scaling

SPLIT

• Responsibility

• Stateful/stateless

• Type of system

Page 32: Scaling

TYPES OF SYSTEMS

• Unique (mutex, datastore)

• Multiple

Page 33: Scaling

TYPES OF TASKS

• Realtime

• ASAP

• When you have time } Async!

Page 34: Scaling

SPLIT THIS

Everything Frontend

Externalpaymentproviders

User

Everything Frontend

Master

Read slaves

Page 35: Scaling

AUTONOMOUS SYSTEMS

Payment

Externalpaymentproviders

Locking

InvoicePDF

Mailer

UI Reservations ManagerUser

SessionStorage

DatawarehouseReporting

Configuration

Payout

Page 36: Scaling

CLONABILITY

Page 37: Scaling

CLONABILITY

Page 38: Scaling

CLONABILITY

Frontend

Page 39: Scaling

CLONABILITY

Everything Frontend

Externalpaymentproviders

User

Everything Frontend

Master

Read slaves

Page 40: Scaling

WHAT’S IN AN EASY STEPAs little change as possible.

Reuse.

Unintrusive.

Measure.

Go on the right direction.

Page 41: Scaling

SMALL STEPS

PROBLEMS? !

Oversells Configuration Reporting Payout

Everything FrontendEverything Frontend

Everything FrontendEverything Frontend

Everything FrontendEverything Frontend

Page 42: Scaling

SMALL STEPSPROBLEMS? !

Oversells Configuration Reporting Payout SessionsRoom

Availability

Lock

ReadEverything FrontendEverything Frontend

Everything FrontendEverything Frontend

Everything FrontendEverything Frontend

Page 43: Scaling

ISOLATED SYSTEM Best technology

Decoupled

API

Testable

Page 44: Scaling

SMALL STEPSPROBLEMS? !

Oversells Configuration Reporting Payout Sessions

Everything FrontendConfig Backend

Settings

Everything FrontendEverything Frontend

Everything FrontendEverything Frontend

Everything FrontendEverything Frontend

Page 45: Scaling

INITIAL SYSTEM

Everything Frontend

Page 46: Scaling

INITIAL SYSTEM (MODIFIED)

Everything Frontend Sales

Sync

Page 47: Scaling

INITIAL SYSTEM (MODIFIED)

Sales Backend

Page 48: Scaling

SMALL STEPSPROBLEMS? !

Oversells Configuration Reporting Payout Sessions

Everything FrontendSales Backend

Sales

Main DB

Everything FrontendEverything Frontend

Everything FrontendEverything Frontend

Everything FrontendEverything Frontend

Page 49: Scaling

SMALL STEPSPROBLEMS? !

Oversells Configuration Reporting Payout SessionsSession

Storage Everything FrontendEverything Frontend

Everything FrontendEverything Frontend

Everything FrontendEverything Frontend

Page 50: Scaling

WHEN?• Difficult.

• Measure everything.

• Find patterns.

• Define thresholds.

• Design: address as risk.

• Don’t overenigneer — Don’t ignore.

Page 51: Scaling

EVENTBRITE

• 2012: $600M ticket sales

• Accumulated: $1B

Page 52: Scaling

TECHNOLOGY• Monitoring: nagios, ganglia, pingdom

• Email: offloaded to StrongMail

• Load-balanced read slave pool

• Feature flags

• Automated server configuration and release with Puppet and Jenkins

Page 53: Scaling

TECHNOLOGY• Feature flags

• Develop on Vagrant

• Celery + RabbitMQ

• Virtual customer queue

• Big data for reporting, fraud, spam, event recommendations

Page 54: Scaling

TECHNOLOGY

• Hadoop

• Cassandra

• HBase

• Hive

• Separated into independent services

Page 55: Scaling

TIPS

• Instrument and monitor everything

• Lean

Page 56: Scaling

HOW BIG?

• 2Gb/day database transactions

• 3.5Tb/day social data analyzed

• 15Gb/day logs

Page 57: Scaling

ORDER PROCESSOR

• Pub/sub queue with Cassandra and Zookeeper

Page 58: Scaling

PUBLISHING

Publisher

Get queue lock+last batch id

Create new batch“process orders 10, 11, 12”

Store batch id, release lock

Page 59: Scaling

SUBSCRIBING

Subscriber

Get my latest processed batch id

Store result

Update my latest processed batch id

Page 60: Scaling

SCALING STORAGE• Move to NoSQL

• Aggressively move queries to slaves

• Different indexes per slave

• Better hardware

• Most optimal tables for large and highly-utilized datasets

Page 61: Scaling

EMAIL ADDRESSES

• Users have many email addresses.

• Lookup by email, join to users table

Page 62: Scaling

FIRST ATTEMPTCREATE TABLE `user_emails` (

`id` int NOT NULL AUTO_INCREMENT,

`email_address` varchar(255) NOT NULL,

... --other columns about the user

`user_id` int, --foreign key to users

KEY (`email_address`)

) ENGINE=InnoDB DEFAULT CHARSET=utf8;

Page 63: Scaling

FIRST ATTEMPT

Page 64: Scaling

LOOKUP

Page 65: Scaling

CAN IT BE IMPROVED?

Page 66: Scaling

INDEX VS PK• InnoDB: B+trees, O(log n)

• Known user id: index on email not needed.

• Small win on lookup: O(1)

• Big win on not storing the index.

Page 67: Scaling

INNODB INDEXES

Page 68: Scaling

HASH TABLE

Page 69: Scaling

DISQUS• >165K messages per second

• <10ms latency

• 1.3B unique visitors

• 10B page views

• 500M users in discussions

• 3M communitios

• 25M comments

Page 70: Scaling

ORIGINAL REALTIME BACKEND

• Python + gevent

• NginxPushStream

• Network IO: great

• CPU: choking at peaks

• <15ms latency

Page 71: Scaling

CURRENT REALTIME BACKEND

• Go

• Handles all users

• Normal load:3200 connections/machine/sec

• <10ms latency

• Only 10%-20% CPU

Page 72: Scaling

Workers

CURRENT REALTIME BACKEND

Subscribed to results

Push result to userNginxPushStream

Page 73: Scaling

TESTING

• Test with real traffic

• Measure everything

Page 74: Scaling

LESSONS• Do work once, distribute results.

• Most likely to fail: your code. Don’t reinvent. Keep team small.

• End-to-end ACKs are expensive. Avoid.

• Understand use cases when load testing.

• Tune architecture to scale.

Page 75: Scaling

LEARN MORE• Instagram

• Braintree

• highscalability.com

• VelocityConf (youtube, nov 2014 @ bcn?)

Page 76: Scaling

QUESTIONS? ANSWERS?

THANKS!

Òscar Vilaplana @grimborg http://oscarvilaplana.cat