engineering scalable social games

59

Upload: amity

Post on 23-Feb-2016

23 views

Category:

Documents


1 download

DESCRIPTION

Engineering Scalable Social Games. Dr . Robert Zubek. Top social game developer. Growth Example Roller Coaster Kingdom . 1M DAUs over a weekend. source: developeranalytics.com. Growth Example FishVille. 6 M DAUs in its first week. source: developeranalytics.com. - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Engineering Scalable Social Games
Page 2: Engineering Scalable Social Games

Engineering ScalableSocial Games

Dr. Robert Zubek

Page 3: Engineering Scalable Social Games

Top social game developer

Page 4: Engineering Scalable Social Games

Growth ExampleRoller Coaster Kingdom

1M DAUs over a weekend

source: developeranalytics.com

Page 5: Engineering Scalable Social Games

Growth ExampleFishVille

6M DAUs in its first week

source: developeranalytics.com

Page 6: Engineering Scalable Social Games

Growth ExampleFarmVille

25M DAUs over five months

source: developeranalytics.com

Page 7: Engineering Scalable Social Games

Talk Overview

Part I. Game Architectures

Part II. Scaling Solutions

Introduce game developers to bestpractices for large-scale web development

Page 8: Engineering Scalable Social Games

• server approaches• client approaches

Part I. Game Architectures

Page 10: Engineering Scalable Social Games

• Game logic in MMO server (eg. Java)

• Web stack for everything else

Server Side

• Usually based on a LAMP stack

• Game logic in PHP• HTTP communication

Mixed stackWeb stack

Page 11: Engineering Scalable Social Games

Example 1FishVille

Web Servers + CDNCache and

Queue Servers

DB Servers

Page 12: Engineering Scalable Social Games

Example 2 YoVille

Web Servers + CDN

MMO Servers

Cache and Queue Servers

DB Servers

Page 13: Engineering Scalable Social Games

• HTTP scales very well• Stateless request/response• Easy to load-balance across servers• Easy to add more servers

Some limitations• Server-initiated actions• Storing state between requests

Why web stack?

Page 14: Engineering Scalable Social Games

caches

exampleWeb stack and state

Web #1

DB Servers

Web #2Web #3 Client

Web #4

xxx

Page 15: Engineering Scalable Social Games

MMO servers

• Minimally “MMO”:• Persistent socket connection per client• Live game support: chat, live events• Supports data push• Keeps game state in memory

• Very different from web!

Page 16: Engineering Scalable Social Games

Why MMO servers?

• Harder to scale out!• But on the plus side:

1. Supports live communication and events2. Game logic can run on the server,

independent of client’s actions3. Game state can be entirely in RAM

Page 17: Engineering Scalable Social Games

Example Web + MMO stack

Web Servers

MMO Servers

Client

caches

DB Servers

Page 18: Engineering Scalable Social Games

• The game is “just” a web page

• Minimal sys reqs• Limited graphics and

animation

Client Side

• High production quality games

• Game logic on client• Can keep open

socket

HTML + AJAXFlash

Page 19: Engineering Scalable Social Games

Social Network Integration

• “Easy” because not related to scaling

• Calling the host network to get a list of friends, post something, etc.

• Networks provide REST APIs, and sometimes client libraries

Page 20: Engineering Scalable Social Games

Architectures

Data

Server

Client

database, cache, etc.

Web server stack

HTML Flash

Web + MMO

Flash

Page 21: Engineering Scalable Social Games

Part I. Game Architectures

Part II. Scaling Solutions

Introduce game developers to bestpractices for large-scale web development

not blowing upas you grow

Page 22: Engineering Scalable Social Games

ScalingScale up vs. scale out

I need more boxes

I need a bigger box

Page 23: Engineering Scalable Social Games

ScalingScale up vs. scale out

Scale out has been a clear win• At some point you can’t

get a box big enough, quickly enough

• Much easier to add more boxes

• But: requires architectural support from the app!

Page 24: Engineering Scalable Social Games

Example:Scaling up vs. out

• 500,000 daily players in first week• Started with just one DB, which quickly

became a bottleneck• Short term: scaled up, optimized DB accesses,

fixed slow queries, caching, etc.• Long term: must scale out with the size

of the player base!

Page 25: Engineering Scalable Social Games

ScalingWe’ll look at four elements:

Databases

Game servers (web + MMO)

Caching

Capacity planning

Page 26: Engineering Scalable Social Games

II A. Databases

App Servers

App Servers

Client

caches

queues

DBDB

DBs are the first to fall over

Page 27: Engineering Scalable Social Games

App

W R

Page 28: Engineering Scalable Social Games

App

W R

M S S

App

Page 29: Engineering Scalable Social Games

M S S

App

M

App

M

Page 30: Engineering Scalable Social Games

M

App

MM0

App

M1

Page 31: Engineering Scalable Social Games

How to partition data?

• Two most common ways:• Vertical – by table

• Easy but doesn’t scale with DAUs• Horizontal – by row

• Harder to do, but gives best results!• Different rows live on different DBs• Need a good mapping from row # to DB

Page 32: Engineering Scalable Social Games

id Data

100 foo

101 bar

102 baz

103 xyzzy

Row stripingM0 M1• Row-to-shard mapping:

• Primary key modulo # of DBs• Like a “logical RAID 0”• More clever schemes exist, of course

Page 33: Engineering Scalable Social Games

S0 S1M2 M3

x xScaling out your shards

M0

ServersServersServers

M1

Page 34: Engineering Scalable Social Games

Sharding in a nutshell

• It’s just data striping across databases (“logical RAID 0”)• There’s no automatic replication, etc.• No magic about how it works• That’s also what makes it robust,

easy to set up and maintain!

Page 35: Engineering Scalable Social Games

Example: Sharding

• Biggest challenge: designing how to partition huge existing dataset

• Partitioned both horizontally and vertically• Lots of joins had to be broken • Data patterns needed redesigned

• With sharding, you need to know the shard ID for each query!

• Data replication had difficulty catching up with high-volume usage

Page 36: Engineering Scalable Social Games

Sharding surprises

• Can’t do joins across shards• Instead, do multiple selects,

or denormalize your data

Be careful with joins

Page 37: Engineering Scalable Social Games

Sharding surprises

• CPU-expensive; easier to pay this cost in the application layer (just be careful)

• The more you keep in RAM, the less you’ll need these

Skip transactions and foreign key constraints

Page 38: Engineering Scalable Social Games

II B. Caching

• DBs can be too slow• Spend memory to buy speed!

caches

queues

caches

queues

App Servers

App Servers

ClientDB

Page 39: Engineering Scalable Social Games

Caching

• Two main uses:• Speed up DB access to commonly

used data• Shared game state storage for stateless

web game servers

• Memcached: network-attached RAM cache

Page 40: Engineering Scalable Social Games

Memcache

• Stores simple key-value pairs• eg. “uid_123” => {name:”Robert”, …}

• What to put there?• Structured game data• Mutexes for actions across servers

• Caveat• It’s an LRU cache, not a DB!

No persistence!

Page 41: Engineering Scalable Social Games

Memcache scaling out

DBs MCs App

Shard it just like the DB!

Page 42: Engineering Scalable Social Games

II C. Game Servers

Web Servers

MMO Servers

Client

caches

queues

DBWeb

Servers

MMO Servers

Page 43: Engineering Scalable Social Games

LB

LB

DNSLB

Scaling web servers

LB

Page 44: Engineering Scalable Social Games

Scaling content delivery

CDN

game dataonly

mediafiles

mediafiles

Web Servers Client

Page 45: Engineering Scalable Social Games

Scaling MMO servers

• Scaling out is easiest when servers don’t need to talk to each other• Databases: sharding• Memcache: sharding• Web: load-balanced farms• MMO: ???

• Our MMO server approach: shard just like other servers

Page 46: Engineering Scalable Social Games

Scaling MMO servers

• Keep all servers separate from each other• Minimize inter-server communication• All participants in a live event should be on

the same server• Scaling out means no direct sharing

• Sharing through third parties is okay

Page 47: Engineering Scalable Social Games

Problem #1:Load balancing • What if a server gets overloaded?

• Can’t move a live player to a different server

• Poor man’s load-balancing:• Players don’t pick servers; the LB does• If a server gets hot, remove it from the

LB pool• If enough servers get hot, add more server

instances and send new connections there

Page 48: Engineering Scalable Social Games

Problem #2: Deployment

Downtime = lost revenues• You can measure cost of

downtime in $/minute!

Page 49: Engineering Scalable Social Games

Problem #2: Deployment

Deployment comparison:• Web: just copy over PHP files• Socket servers – much harder

• How to deploy with zero downtime?

• Ideally: set up shadow new server and slowly transition players over

Page 50: Engineering Scalable Social Games

II D. Capacity planning

Page 51: Engineering Scalable Social Games

Capacity

We believein scaling out

• Demand could change rapidly• How to provision enough servers?

Different logistics thanscaling up

Page 52: Engineering Scalable Social Games

Capacity:colo vs. cloud

• Low or zero fixed cost• Higher variable cost• Little or no lead time on

additional capacity• Less control over ops• Limited choice of VMs• Virtualized storage• Can’t scale up! Only out.

CloudColo

• Higher fixed cost• Lower variable cost• Significant lead time on

additional capacity• More control over ops• Any chips you want• Any disks you want• Scale up or out

Page 53: Engineering Scalable Social Games

Operations

• Scaling out: a legion of servers

• You’ll need tools• App health – custom dashboard• Server monitoring – Munin• Automatic alerts – Nagios

Page 54: Engineering Scalable Social Games

Ops: Dashboard

Page 55: Engineering Scalable Social Games

Ops: Munin

Server stats from the entire farm

Page 56: Engineering Scalable Social Games

Ops: Nagios

• SMS alerts for your server stats• Connections drop to zero• Or CPU load exceeds 4• Or test account can’t connect, etc.

• Put alerts on business stats, too!• DAUs dropping below daily avg, etc.

Page 57: Engineering Scalable Social Games

Server failures

• Common in the cloud• Dropping off the net• Sometimes even shutting down

• Be defensive• Ops should be prepared• Devs should program defensively

Page 58: Engineering Scalable Social Games

The EndMany thanks to those who taught me a lot:

Virtual Worlds: Scott Dale, Justin WaldronCoaster: Biren Gandhi, Nathan Brown, Tatung Mei

http://robert.zubek.net/gdc2010

Page 59: Engineering Scalable Social Games