the art of database sharding maxym kharchenko amazon.com

52
The Art of Database Sharding Maxym Kharchenko Amazon.com

Upload: felicity-barker

Post on 13-Dec-2015

233 views

Category:

Documents


2 download

TRANSCRIPT

The Artof Database Sharding

Maxym Kharchenko

Amazon.com

Whoami

• Started as a database kernel developer– Network database: db_VISTA

• ORACLE DBA for ~ 10-12 years– Starting with ORACLE 8

• Last 3 years: Sr. Persistence Engineer @Amazon.com

• OCM, ORACLE Ace Associate

• Blog: http://intermediatesql.com• Twitter: @maxymkh

Agenda

• The “big data” scaling problem

• Solving scaling with “sharding”

• Practical sharding

• Your sharding experience: Good and bad

How to scale a database

Old System

New SystemProblem

2013 2014 2015 2016 2017

The Big Data problem

Vertical Scaling

Scaling Up …

Scaling Up …

Scaled!

“Scaling up” math:System capabilities

2+2=3

“Scaling up” math:System cost

2+2=7

Scale out, not up

Use lots of cheap machines

Not bigger machines

Commodity hardware

=

$$$$$ $$

Distributed System

Distributed System

Distributed System

Distributed computing is hard

Shared Nothing (“Sharded”) System

Sharding is (relatively) easy

Split your datainto small independent chunks

And run each chunkon cheap commodity hardware

How to split your data

Data

How to split your data

How to split your data

How to split your data

How to split your data

Vertical Partitioning

Vertical Partitioning

Vertical Partitioning

Horizontal Partitioning

Sharding

Sharding

CREATE TABLE books ( id number PRIMARY KEY, title varchar2(200), author varchar2(200));

CREATE TABLE books ( id number PRIMARY KEY, title varchar2(200), author varchar2(200)

) SHARD BY <method> (<shard_key>) ( SPLIT SIZE evenly SPLIT LOAD evenly DISCOURAGE CROSS SHARD ACCESS DISCOURAGE DATA MOVE USING 4 DATABASES);

Sharding

Split size evenly

SHARD BY LIST ( first_letter(author) ) ( SPLIT SIZE evenly);

A-G H-M N-TU-Z

Split load evenly

SHARD BY RANGE (id) ( SPLIT SIZE evenly SPLIT LOAD evenly);

1-100 101-200 201-300 301-400

Split load evenly

SHARD BY HASH (id) ( SPLIT SIZE evenly SPLIT LOAD evenly);

0 1 2 3

Discourage cross shard access

SHARD BY HASH (id) ( DISCOURAGE CROSS SHARD ACCESS);

SELECT title FROM booksWHERE id = 34567876;

Discourage cross shard access

SHARD BY HASH (id) ( DISCOURAGE CROSS SHARD ACCESS);

SELECT title FROM booksWHERE author = 'Isaac Asimov'ORDER BY title;

Discourage cross shard access

SHARD BY HASH (author) ( DISCOURAGE CROSS SHARD ACCESS);

0 1 2 3

SELECT title FROM booksWHERE author = 'Isaac Asimov'ORDER BY title;

Discourage data move

SHARD BY mod(hash(author), 4) ( DISCOURAGE DATA MOVE);

0 1 2 3

Discourage data move

SHARD BY mod(hash_function(author), 6) ( DISCOURAGE DATA MOVE);

0 1 2 3

4 5

ReshardingHash Mod/4

1 12 23 34 05 16 27 38 09 110 211 312 0

Hash Mod/4 Mod/61 1 12 2 23 3 34 0 45 1 56 2 07 3 18 0 29 1 310 2 411 3 512 0 0

Physical and Logical shards

SHARD BY mod(hash(author), 1200) ( DISCOURAGE DATA MOVE);

DB 1 DB 2 DB 3 DB 4

Executing queriesdef shard_query(sql, binds, shard_key): """ Execute query in the correct db """

shard_hash = hash(shard_key) logical_bucket = mod(shard_hash, TOTAL_BUCKETS) physical_db = memcached_get_db(logical_bucket) execute_query(physical_db, sql, binds)

SELECT title FROM booksWHERE author = 'Isaac Asimov'ORDER BY title;

Implementing Shards: Standbys

Unsharded StandbyShard 1 Shard 2

Apps

Read Only

Drop non-qualifying data Drop non-qualifying data

Implementing Shards: Tables

Shard1

Apps

TabA

Shard 2

MVA

TabA

Create materialized view … as select …from a@shard1

Dropmaterialized view … preserve table

Read Only

Implementing Shards:Moving “data head”

Shard 1

Apps

Shard 2

Logical Shard

Physical Shard

(1,2,3,4) 1(5,6,7,8) 2

Time Logical Shard

Physical Shard

2011(1,2,3,4) 12011(5,6,7,8) 2

Time Logical Shard

Physical Shard

2011(1,2,3,4) 12011(5,6,7,8) 22012(1,2) 12012(3,4) 32012(5,6) 22012(7,8) 4

Shard 3 Shard 4

Data protection

Shard 1 Shard 2 Shard 4Shard 3

Stb 1 Stb 2 Stb 4Stb 3

App App

Why shards are awesome

• (potentially) Unlimited scaling

• Local ACID + relational

• Better maintenance

• Eggs not in one basket

• “Apples to apples comparison” with other shards

Why shards are NOT so great

• More systems– Power, rack space etc– Needs automation … bad– More likely to fail overall

• Some operations become difficult:– Transactions across shards– Foreign keys across shards

• More work:– Applications, developers, DBAs– High skill, DIY everything

Takeaways

More > Bigger

ORACLE is still cool

Thank you!

[email protected]: maxymkh@Blog: http://intermediatesql.com