Page 1: Spil Storage Platform (Erlang) @ EUG-NL

SSP: Spil Storage Platform
Thijs Terlouw, Senior Backend Engineer

12th July 2012

Schedule

1. Background
   • Problems
   • Wish list
2. Solution
3. Challenges
4. Performance
5. Lessons learned

Background

Mission of Spil Games: “unite the world in play”
• localized social-gaming platforms
• focus on: teens, girls and family
• many portals:
  – girlsgogames.com
  – agame.com


Background

• Over 200 countries, 15+ different languages
• On average 85 minutes per month per user
• Over 4000 online games
• 200 million unique users per month

Background

• Traditional LAMP stack
• Tweaked over time to keep up with growth
• Reaching the limits of the current system
• One of the largest problems is the database

Problems: the database

• Not all developers are DB experts
  – security
  – performance
  – caching
• Changing requirements
• Difficult to shard the databases

Wish list

1. Transparent scalability
   • Sharding data
   • Scalable applications on top of sharded data
2. Multi-database transactions
   • Atomic operations across machines
3. Fast enough (low-ish latency, high throughput)
4. Highly available (central system)
5. Can handle a large dataset
6. Offer flexibility (for instance, trade consistency for speed)
7. Use MySQL (the in-house DB team has experience with it)
8. Don’t expose SQL to devs; offer a business-specific model
   • Storage-specific security measures (character escaping)
9. Allow changes to the storage layer without affecting the business (versioning)
10. Centralize ownership of caching


Solution

• No matching Open Source projects
• So we want a massively scalable, soft real-time, highly available system
• Implement it ourselves: Erlang is the obvious candidate

Not the first to think of this:
• Amazon SimpleDB
• Riak

• Use Open Source where possible

Solution: mindset

1. Our system should be always on
2. No global locks
3. Inconsistencies are the norm
   • Hardware breaks down (power failures etc.)
   • Version mismatches (upgrading the system is not atomic)
   • State mismatches (adding a new machine)

SSP: Spil Storage Platform

[Figure labels: Buckets, Erlang, Bucket]

SSP: Overview

• A bucket is a list of records of a specific type. Structured data! A bucket can map to one or several MySQL database tables and offers a CRUD-like interface (with filters)
• All data is identified by a unique GID (64-bit integer)
• All requests for a particular GID are handled by one Pipeline process (sequentially)
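A minimal sketch of the "one process handles all requests for a GID, sequentially" idea. It is not SSP code: the module name, message shapes and the 5 second timeout are assumptions made for illustration. It only shows that a single Erlang process drains its mailbox one message at a time, so jobs sent to it never run concurrently.

```erlang
-module(ssp_pipeline_sketch).
-export([start/0, call/2, stop/1]).

%% Start one worker process; in SSP terms this would play the role of
%% the Pipeline for a single GID (how Pipelines are really created is
%% not shown in the talk).
start() ->
    spawn(fun loop/0).

%% Send a job and wait for its result. Jobs from all callers queue up
%% in the worker's mailbox and run one at a time, in arrival order.
call(Pid, Job) when is_function(Job, 0) ->
    Ref = make_ref(),
    Pid ! {run, self(), Ref, Job},
    receive
        {Ref, Result} -> Result
    after 5000 ->
        exit(timeout)
    end.

stop(Pid) ->
    Pid ! stop,
    ok.

loop() ->
    receive
        {run, From, Ref, Job} ->
            From ! {Ref, Job()},
            loop();
        stop ->
            ok
    end.
```

If a job crashes, the worker dies and the caller times out; a real system would presumably use a supervised gen_server instead.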

[Figure: SSP overview]

SSP: Pipeline

• Why do we need Pipelines?
• Sequential = bottleneck!?
• Don’t you guys know Erlang is about PARALLELIZING work?

SSP: Pipeline

• Drawbacks:
  – For hotspots (a game with a gazillion …) sequential (read) access is bad indeed
  – Optimization: allow dirty reads (try the local cache first, outside the pipeline); other solutions are possible
• Advantages:
  – Facilitates scalability (no global locks, but per bucket/GID sync)
  – Pipelines make multi-database consistency easier

Requests to most GIDs (users) are evenly distributed

SSP: Finding the Pipeline

{bucket, phash2(Gid, Ringsize)}
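The lookup key above can be sketched as a small function. erlang:phash2/2 is a real BIF that returns an integer in 0..Range-1; the module and function names are hypothetical, and how the resulting ring slot is mapped to a Pipeline Factory on a node is not shown here.

```erlang
-module(ssp_ring_sketch).
-export([pipeline_key/3]).

%% Map a GID onto a ring slot for a bucket.
%% The same GID always lands on the same slot as long as Ringsize
%% stays fixed, which is what lets one process own a GID.
-spec pipeline_key(atom(), non_neg_integer(), pos_integer()) ->
          {atom(), non_neg_integer()}.
pipeline_key(Bucket, Gid, Ringsize)
  when is_atom(Bucket), is_integer(Ringsize), Ringsize > 0 ->
    {Bucket, erlang:phash2(Gid, Ringsize)}.
```

For example, ssp_ring_sketch:pipeline_key(gidlog, 123456, 64) returns {gidlog, Slot} with Slot between 0 and 63.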

SSP: Bucket

• Each bucket is an OTP application
• Buckets are largely generated
• XML -> SQL + PIQI -> Erlang
  – Using XSLT
  – Piqic

Piqi?

• PIQI is
  – a data definition language
  – a cross-language data serialization system compatible with Protocol Buffers
  – Piqi-RPC: an RPC-over-HTTP system for Erlang
    (would be better if the transport was pluggable)
• http://piqi.org/

[Figure: SSP example bucket XML definition]

gidlog.piqi

Mostly templated via XSLT

gidlog_accessors.hrl

• Parse the Piqi-generated hrl: epp:parse_file/3
• Mostly template
• Added as dep

SSP: bucket implementation

• bucketX.erl
  – include_lib(“…/bucketX_accessors.hrl”)
  – verify_record(R)
  – start/0 and start_link/0
  – init/1
  – get_fun(Version), del_fun(V), insert_fun(V), …
• bucketX_v1.erl
  – del, insert, … (Gid, Shard, Filters)
  – get mysql pool
  – build some SQL
  – emysql:execute(Poolname, Sql)

SSP: Versions

1. A bucket is versioned. The interface of a bucket is stable, but the implementation can vary
2. We can go up or down a version; migration is automatic
   • Mirror mode is introduced so we can write to multiple versions (but read from only one version)
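A rough sketch of mirror mode as described: every write goes to all configured versions, reads come from exactly one. The config shape {ReadVersion, WriteVersions} and the map from version to {InsertFun, GetFun} are assumptions for illustration; the talk does not show how SSP represents either.

```erlang
-module(ssp_mirror_sketch).
-export([write/3, read/2]).

%% Config is {ReadVersion, WriteVersions} (assumed shape).
%% Impl maps a version number to {InsertFun, GetFun} (assumed shape).

%% Apply the write to every mirrored version and report each result.
write({_ReadV, WriteVs}, Impl, Record) ->
    [begin
         {InsertFun, _GetFun} = maps:get(V, Impl),
         {V, InsertFun(Record)}
     end || V <- WriteVs].

%% Reads are served by the single designated version only.
read({ReadV, _WriteVs}, Impl) ->
    {_InsertFun, GetFun} = maps:get(ReadV, Impl),
    GetFun().
```

Switching the read version after the new version has caught up is then a config change, which fits the "go up or down a version" point above.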

SSP: Shards (storage level)

1. GIDs (e.g. users) are sharded automatically
   • Each version might have multiple shards
2. Redundancy (of data) is handled by MySQL

{bucket, GID} -> {Version, Shard} mapping
• Version default: config
• Shard default: default rule GID % shards
• Actual version/shard per GID stored in DB (cached)
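The {bucket, GID} -> {Version, Shard} mapping can be sketched as a lookup with a fallback. This is an illustration under assumptions: Overrides is a plain map standing in for the per-GID mapping that is stored in the DB and cached, and the defaults are passed in as a tuple rather than read from config.

```erlang
-module(ssp_shard_sketch).
-export([locate/4]).

%% Resolve {Bucket, Gid} to {Version, Shard}.
%% An explicit per-GID entry wins; otherwise fall back to the
%% configured default version and the default rule "GID rem shards".
locate(Bucket, Gid, Overrides, {DefaultVersion, NumShards})
  when is_integer(Gid), Gid >= 0, is_integer(NumShards), NumShards > 0 ->
    case maps:find({Bucket, Gid}, Overrides) of
        {ok, {Version, Shard}} -> {Version, Shard};
        error -> {DefaultVersion, Gid rem NumShards}
    end.
```

With 4 shards and no override, GID 10 resolves to shard 2 (10 rem 4); an override entry can pin the same GID to any other version and shard.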

SSP: Cache

• Each node has a private Memcached instance
• We store all data for a GID/bucket in this cache
  – Filters are applied after retrieving data from the cache
• Don’t change data in storage outside of the SSP!
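A small sketch of "store everything for a GID/bucket, filter after retrieval". The talk mentions elsewhere that the cache stores Erlang terms via term_to_binary; the rest (module name, records as tuples, the filter being a fun) is assumed, and the Memcached call itself is left out.

```erlang
-module(ssp_cache_sketch).
-export([encode/1, get_filtered/2]).

%% Serialize all records of one GID/bucket into a single cache value.
encode(Records) when is_list(Records) ->
    term_to_binary(Records).

%% Decode the cached blob, then apply the request's filter.
%% The [safe] option makes binary_to_term/2 refuse data that would
%% create new atoms or external funs.
get_filtered(CachedBin, FilterFun)
  when is_binary(CachedBin), is_function(FilterFun, 1) ->
    Records = binary_to_term(CachedBin, [safe]),
    lists:filter(FilterFun, Records).
```

Because the cache key does not depend on the filter, one cached entry serves every filtered read for that GID, at the cost of decoding the full record list each time.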



Challenge: controlled shutdown node

How do we shut down a node without losing jobs?
• Shut down the bucketX application on a node
• Stop the pipeline factories (PFs) on this node (for bucketX)
• Hand over work to other PFs (on other nodes)
  – a couple of Mnesia ring reads
  – move ETS table contents to the new PF
  – remember which PF took over (so we can forward)
• If we go to another node, clone the Pipeline (gen2 pri)
• Remove this node from the lookup ring
• All PFs fix their hash range based on the ring
  – because there is a race condition when handing over from many PFs to one PF (non-contiguous blocks)
• Sleep a while (actually: wait for pipeline handovers)

Note: shutdown application

• If you terminate an application, all processes that were started by it (even if not linked) are terminated!
• This is a bit hidden in the documentation of application:start/2 and stop/1
• So we need to explicitly set the group_leader to something that never shuts down:

    init(#state{} = S) ->
        group_leader(whereis(init), self()),
        {ok, S}.

Challenge: shutdown pipeline

• The Pipeline process that we spawn per GID needs to shut down when done (less memory)
• When is it actually done?
  – Work might be assigned to the Pipeline just when the Pipeline decides it is done: race conditions!

Challenge: shutdown pipeline (2)

• All requests for a GID are handled by a single Pipeline Factory (PF)
• The Pipeline will issue a ‘work done’ command to the PF with a ‘CommandCounter’
• The PF maintains an ETS table
  – Look up whether the registered CommandCounter for that GID is the same as the reported number
  – If so: tell the Pipeline to die
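The CommandCounter check can be sketched with an ETS table of {Gid, Counter} rows. This is an illustration, not the SSP implementation: function names, the table layout and the return atoms are assumptions. It relies on both functions being called from the single PF process, so assigning work and deciding on shutdown can never interleave for one GID.

```erlang
-module(ssp_pf_sketch).
-export([new/0, assign/2, work_done/3]).

%% Table of {Gid, CommandCounter}: how many commands the PF has
%% handed to the Pipeline for that GID so far.
new() ->
    ets:new(pf_counters, [set, public]).

%% Called by the PF each time it forwards a command for Gid.
%% Creates the row on first use and returns the new counter value.
assign(Tab, Gid) ->
    ets:update_counter(Tab, Gid, {2, 1}, {Gid, 0}).

%% Called when a Pipeline reports 'work done' together with the
%% counter of the last command it processed. The Pipeline may die
%% only if no further command was assigned in the meantime.
work_done(Tab, Gid, Reported) ->
    case ets:lookup(Tab, Gid) of
        [{Gid, Reported}] ->
            true = ets:delete(Tab, Gid),
            die;
        _ ->
            keep_running
    end.
```

If a new command arrives between the Pipeline sending 'work done' and the PF handling it, the counters differ and the Pipeline is told to keep running, which closes the race described on the previous slide.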

Challenge: high uptime

• We want continuous usage of SSP
  – Even while upgrading bucket versions
  – So there can be multiple versions running simultaneously
• Take care when creating closures
• Atomic behavior per GID

Challenge: quite complex system


Performance

• Currently we run SSP in shadow mode, so no real data yet. Making realistic benchmarks is quite a lot of work.
• Latency (local machine):
  – 6-26 ms to do a GET request on a primary key (cache miss)
  – 0.6 ms with a cache hit
  – The cache currently stores Erlang terms (term_to_binary)
• Always read from cache
  – Does not detect changes in storage made outside SSP

Performance

• Requests (local):
  – Getting from cache at about 13.5K req/sec
    elibs_benchmark:test_fun(gidlog_get, fun() -> gidlog:get(123456) end, 10, 10000).
  – Getting from MySQL at about 615 req/sec, incl. cache miss
    elibs_benchmark:test_fun(gidlog_get, fun() -> {_,_,C} = os:timestamp(), gidlog:get(C) end, 10, 100).
  – ~2 SSP machines can saturate a MySQL machine
  – 8K writes/sec for 2 MySQL + 4 SSP machines (old hardware)
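elibs_benchmark is not public, so as a stand-in here is a minimal single-process throughput measurement built on timer:tc/1 from the standard library. It is a sketch only: it runs the fun sequentially in the calling process, whereas the two trailing arguments in the calls above presumably control concurrency and request count, which this does not reproduce.

```erlang
-module(ssp_bench_sketch).
-export([requests_per_sec/2]).

%% Run Fun N times in the calling process and return the measured
%% rate as a float (calls per second of wall-clock time).
requests_per_sec(Fun, N)
  when is_function(Fun, 0), is_integer(N), N > 0 ->
    {Micros, ok} = timer:tc(fun() -> repeat(Fun, N) end),
    %% Guard against a zero reading on very fast runs.
    N * 1000000 / max(Micros, 1).

repeat(_Fun, 0) ->
    ok;
repeat(Fun, N) ->
    _ = Fun(),
    repeat(Fun, N - 1).
```

A call such as ssp_bench_sketch:requests_per_sec(fun() -> gidlog:get(123456) end, 10000) would mirror the cache-hit measurement, assuming a gidlog module like the one on the slide is loaded.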


Lessons learned (1)

• There are many good Open Source libraries
  – Emysql: we have added transaction support
  – Eep0018: fast JSON encoder/decoder (yajl c++)
  – Estatsd: Graphite-capable monitoring
  – Poolboy: Erlang worker pool factory (for memcached)
  – Twig/Lager: logging (syslog)

Lessons learned (2)

• Mnesia is great to replicate state across machines
  – Faster local lookups
  – Less error prone
• Encapsulate all Mnesia usage in a module
  – Adding nodes to Mnesia
  – Use ram_copies
  – Transactions are great
• We deploy an Erlang cluster (with Mnesia replication) only inside a single data center
  – Not across unreliable connections!

Lessons learned (3)

• XML + XSD + XSLT are great to define an API
  – They might have a bad name, but they work great
  – Can be transformed into any other format
  – Used to generate documentation

Todo:
• generate more code (Buckets)
• write gen_bucket behaviour
• don’t start with generating code

Lessons learned (4)

• Rebar is great
  – Compilation is pretty convenient, but the best part is the “dependencies”
  – Also the worst part
• We have proposed two improvements:
  – Allow different projects to share dependencies (major speedup for compiling)
  – Smarter version conflict resolution (semantic versioning: [ “>= 1.3.1”, “< 2.0.0” ])
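To make the version-range idea concrete, here is a small sketch that checks a version string against a list of constraints like the one above. It is not Rebar code and not the proposed patch; the module name and constraint syntax handling are assumptions, and it only understands plain dotted integers (no pre-release or build tags).

```erlang
-module(semver_range_sketch).
-export([satisfies/2]).

%% True if Version meets every constraint in the list,
%% e.g. satisfies("1.4.0", [">= 1.3.1", "< 2.0.0"]).
satisfies(Version, Constraints) ->
    V = parse(Version),
    lists:all(fun(C) -> check(V, C) end, Constraints).

%% A constraint is an operator and a bound separated by a space.
check(V, Constraint) ->
    [Op, Bound] = string:lexemes(Constraint, " "),
    compare(Op, V, parse(Bound)).

%% Lists of integers compare element by element in Erlang term order,
%% so [1,10,0] > [1,9,0] as expected.
compare(">=", V, B) -> V >= B;
compare(">",  V, B) -> V > B;
compare("<=", V, B) -> V =< B;
compare("<",  V, B) -> V < B;
compare("==", V, B) -> V == B.

%% "1.3.1" -> [1,3,1]
parse(Str) ->
    [list_to_integer(P) || P <- string:lexemes(Str, ".")].
```

A resolver could intersect such ranges from several projects and pick one version that satisfies all of them, instead of failing on the first mismatch.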

Lessons learned (5)

• We use #records{} for all APIs
  – Piqi input/output
  – Stable and well-defined
  – Will move to Protocol Buffers
• Use OTP applications everywhere
  – Start/stop stuff
  – See started apps: application:which_applications()
• Terminate on fatal errors
  – Memcached down: terminate all buckets, don’t try to recover (prevents overloading the DB)

Lessons learned (6)

• You need to add an admin/monitoring interface

Open Source

We will not open-source SSP, but we do actively contribute to libraries used in SSP (so far Emysql, Rebar, Piqi)

THANKS!

Questions?

