deep dive on amazon dynamodb

65
© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Rick Houlihan, Principal Solutions Architect, AWS April 19, 2016 Deep Dive on DynamoDB

Upload: amazon-web-services

Post on 13-Apr-2017

770 views

Category:

Technology


0 download

TRANSCRIPT

Page 1: Deep Dive on Amazon DynamoDB

© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

Rick Houlihan, Principal Solutions Architect, AWS

April 19, 2016

Deep Dive on DynamoDB

Page 2: Deep Dive on Amazon DynamoDB

Agenda

• Why NoSQL?

• Scaling DynamoDB

• Design patterns and best practices

• Event driven applications and DynamoDB

Streams

Page 3: Deep Dive on Amazon DynamoDB

Timeline of database technologyD

ata

Pre

ssu

re

Page 4: Deep Dive on Amazon DynamoDB

Data volume since 2010

Data

Vo

lum

e

Historical Current

90% of stored data generated

in last 2 years

1 terabyte of data in 2010

equals 6.5 petabytes today

Linear correlation between

data pressure and technical

innovation

No reason these trends will

not continue over time

Page 5: Deep Dive on Amazon DynamoDB

Technology adoption and the hype curve

Page 6: Deep Dive on Amazon DynamoDB

Why NoSQL?

Optimized for storage Optimized for compute

Normalized/relational Denormalized/hierarchical

Ad hoc queries Instantiated views

Scale vertically Scale horizontally

Good for OLAP Built for OLTP at scale

SQL NoSQL

Page 7: Deep Dive on Amazon DynamoDB

Scaling DynamoDB

Page 8: Deep Dive on Amazon DynamoDB

DynamoDB scaling dimensions

Throughput

• Provision any amount of throughput to a table

Size

• Add any number of items to a table

• Maximum item size is 400 KB

• LSIs limit the number of range keys due to 10 GB limit

Scaling is achieved through partitioning

Page 9: Deep Dive on Amazon DynamoDB

Throughput

Provisioned at the table level

• Write capacity units (WCUs) are measured in 1 KB per second

• Read capacity units (RCUs) are measured in 4 KB per second

• RCUs measure strictly consistent reads

• Eventually consistent reads cost 1/2 of consistent reads

Read and write throughput limits are independent

WCURCU

Page 10: Deep Dive on Amazon DynamoDB

Partitioning math

In the future, these details might change…

Number of partitions

By capacity (Total RCU / 3000) + (Total WCU / 1000)

By size Total Size / 10 GB

Total partitions CEILING(MAX (Capacity, Size))

Page 11: Deep Dive on Amazon DynamoDB

Partitioning example Table size = 8 GB, RCUs = 5000, WCUs = 500

RCUs per partition = 5000/3 = 1666.67WCUs per partition = 500/3 = 166.67Data/partition = 10/3 = 3.33 GB

RCUs and WCUs are uniformly

spread across partitions

Number of partitions

By capacity (5000 / 3000) + (500 / 1000) = 2.17

By size 8 / 10 = 0.8

Total partitions CEILING(MAX (2.17, 0.8)) = 3

Page 12: Deep Dive on Amazon DynamoDB

What causes throttling?

If sustained throughput goes beyond provisioned throughput per partition

Nonuniform workloads• Hot keys/hot partitions

• Very large bursts

Mixing hot data with cold data• Use a table per time period

From the example before:• Table created with 5000 RCUs, 500 WCUs

• RCUs per partition = 1666.67

• WCUs per partition = 166.67

• If sustained throughput > (1666 RCUs or 166 WCUs) per key or partition, DynamoDB might throttle requests

• Solution: Increase provisioned throughput

Page 13: Deep Dive on Amazon DynamoDB

What bad NoSQL looks like…

Part

itio

n

Time

Heat

Page 14: Deep Dive on Amazon DynamoDB

Getting the most out of DynamoDB throughput

“To get the most out of DynamoDB

throughput, create tables where

the hash key element has a large

number of distinct values, and

values are requested fairly

uniformly, as randomly as

possible.”

—DynamoDB Developer Guide

Space: Access is evenly

spread over the key-space

Time: Requests arrive evenly

spaced in time

Page 15: Deep Dive on Amazon DynamoDB

Much better picture…

Page 16: Deep Dive on Amazon DynamoDB

Design patterns and best practices

Page 17: Deep Dive on Amazon DynamoDB

Event logging

Storing time series data

Page 18: Deep Dive on Amazon DynamoDB

Time series tables

Events_table_2015_April

Event_id

(Partition)

Timestamp

(Sort)

Attribute1 …. Attribute N

Events_table_2015_March

Event_id

(Partition)

Timestamp

(Sort)

Attribute1 …. Attribute N

Events_table_2015_Feburary

Event_id

(Partition)

Timestamp

(Sort)

Attribute1 …. Attribute N

Events_table_2015_January

Event_id

(Partition)

Timestamp

(Sort)

Attribute1 …. Attribute N

RCUs = 1000

WCUs = 1

RCUs = 10000

WCUs = 10000

RCUs = 100

WCUs = 1

RCUs = 10

WCUs = 1

Current table

Older tables

Hot

data

Cold

data

Don’t mix hot and cold data; archive cold data to Amazon S3

Page 19: Deep Dive on Amazon DynamoDB

Use a table per time period

Pre-create daily, weekly, and monthly tables

Provision required throughput for current table

Writes go to the current table

Turn off (or reduce) throughput for older tables

Dealing with time series data

Page 20: Deep Dive on Amazon DynamoDB

Product catalog

Popular items (read)

Page 21: Deep Dive on Amazon DynamoDB

Partition 1

2000 RCUs

Partition K

2000 RCUs

Partition M

2000 RCUs

Partition 50

2000 RCU

Scaling bottlenecks

Product A Product B

Shoppers

ProductCatalog Table

SELECT Id, Description, ...

FROM ProductCatalog

WHERE Id="POPULAR_PRODUCT"

Page 22: Deep Dive on Amazon DynamoDB
Page 23: Deep Dive on Amazon DynamoDB

Partition 1 Partition 2

ProductCatalog Table

User

DynamoDB

User

Cache

popular items

SELECT Id, Description, ...

FROM ProductCatalog

WHERE Id="POPULAR_PRODUCT"

Page 24: Deep Dive on Amazon DynamoDB
Page 25: Deep Dive on Amazon DynamoDB

Messaging app

Large items

Filters vs. indexes

M:N Modeling—inbox and outbox

Page 26: Deep Dive on Amazon DynamoDB

Messages

table

Messages App

David

SELECT *

FROM Messages

WHERE Recipient='David'

LIMIT 50

ORDER BY Date DESC

Inbox

SELECT *

FROM Messages

WHERE Sender ='David'

LIMIT 50

ORDER BY Date DESC

Outbox

Page 27: Deep Dive on Amazon DynamoDB

Recipient Date Sender Message

David 2014-10-02 Bob …

… 48 more messages for David …

David 2014-10-03 Alice …

Alice 2014-09-28 Bob …

Alice 2014-10-01 Carol …

Large and small attributes mixed

(Many more messages)

David

Messages Table

50 items × 256 KB each

Partition key Sort key

Large message bodies

Attachments

SELECT *

FROM Messages

WHERE Recipient='David'

LIMIT 50

ORDER BY Date DESC

Inbox

Page 28: Deep Dive on Amazon DynamoDB

Computing inbox query cost

Items evaluated by query

Average item size

Conversion ratio

Eventually consistent reads

50 * 256 KB * (1 RCU / 4 KB) * (1 / 2) = 1600 RCU

Page 29: Deep Dive on Amazon DynamoDB

Recipient Date Sender Subject MsgId

David 2014-10-02 Bob Hi!… afed

David 2014-10-03 Alice RE: The… 3kf8

Alice 2014-09-28 Bob FW: Ok… 9d2b

Alice 2014-10-01 Carol Hi!... ct7r

Separate the bulk data

Inbox-GSI Messages table

MsgId Body

9d2b …

3kf8 …

ct7r …

afed …

David1. Query Inbox-GSI: 1 RCU

2. BatchGetItem Messages: 1600 RCU

(50 separate items at 256 KB)

(50 sequential items at 128 bytes)

Uniformly distributes large item reads

Page 30: Deep Dive on Amazon DynamoDB

Inbox GSI

Define which attributes to copy into the index

Page 31: Deep Dive on Amazon DynamoDB

Outbox Sender

Outbox GSI

SELECT *

FROM Messages

WHERE Sender ='David'

LIMIT 50

ORDER BY Date DESC

Page 32: Deep Dive on Amazon DynamoDB

Messaging app

Messages

table

David

Inbox

global secondary

index

Inbox

Outbox

global secondary

index

Outbox

Page 33: Deep Dive on Amazon DynamoDB

Reduce one-to-many item sizes

Configure secondary index projections

Use GSIs to model M:N relationship

between sender and recipient

Distribute large items

Querying many large items at once

InboxMessagesOutbox

Page 34: Deep Dive on Amazon DynamoDB

Multiplayer online gaming

Query filters vs.

composite key indexes

Page 35: Deep Dive on Amazon DynamoDB

GameId Date Host Opponent Status

d9bl3 2014-10-02 David Alice DONE

72f49 2014-09-30 Alice Bob PENDING

o2pnb 2014-10-08 Bob Carol IN_PROGRESS

b932s 2014-10-03 Carol Bob PENDING

ef9ca 2014-10-03 David Bob IN_PROGRESS

Games Table

Hierarchical data structures

Partition key

Page 36: Deep Dive on Amazon DynamoDB

Query for incoming game requests

DynamoDB indexes provide partition and sort

What about queries for two equalities and a sort?

SELECT * FROM Game

WHERE Opponent='Bob‘

AND Status=‘PENDING'

ORDER BY Date DESC

(hash)

(range)

(?)

Page 37: Deep Dive on Amazon DynamoDB

Secondary index

Opponent Date GameId Status Host

Alice 2014-10-02 d9bl3 DONE David

Carol 2014-10-08 o2pnb IN_PROGRESS Bob

Bob 2014-09-30 72f49 PENDING Alice

Bob 2014-10-03 b932s PENDING Carol

Bob 2014-10-03 ef9ca IN_PROGRESS David

Approach 1: Query filter

BobPartition key Sort key

Page 38: Deep Dive on Amazon DynamoDB

Secondary index

Approach 1: Query filter

Bob

Opponent Date GameId Status Host

Alice 2014-10-02 d9bl3 DONE David

Carol 2014-10-08 o2pnb IN_PROGRESS Bob

Bob 2014-09-30 72f49 PENDING Alice

Bob 2014-10-03 b932s PENDING Carol

Bob 2014-10-03 ef9ca IN_PROGRESS David

SELECT * FROM Game

WHERE Opponent='Bob'

ORDER BY Date DESC

FILTER ON Status='PENDING'

(filtered out)

Page 39: Deep Dive on Amazon DynamoDB

Needle in a haystack

Bob

Page 40: Deep Dive on Amazon DynamoDB

Send back less data “on the wire”

Simplify application code

Simple SQL-like expressions

• AND, OR, NOT, ()

Use query filter

Your index isn’t entirely selective

Page 41: Deep Dive on Amazon DynamoDB

Approach 2: Composite key

StatusDate

DONE_2014-10-02

IN_PROGRESS_2014-10-08

IN_PROGRESS_2014-10-03

PENDING_2014-09-30

PENDING_2014-10-03

Status

DONE

IN_PROGRESS

IN_PROGRESS

PENDING

PENDING

Date

2014-10-02

2014-10-08

2014-10-03

2014-10-03

2014-09-30

+ =

Page 42: Deep Dive on Amazon DynamoDB

Secondary index

Approach 2: Composite key

Opponent StatusDate GameId Host

Alice DONE_2014-10-02 d9bl3 David

Carol IN_PROGRESS_2014-10-08 o2pnb Bob

Bob IN_PROGRESS_2014-10-03 ef9ca David

Bob PENDING_2014-09-30 72f49 Alice

Bob PENDING_2014-10-03 b932s Carol

Partition key Sort key

Page 43: Deep Dive on Amazon DynamoDB

Opponent StatusDate GameId Host

Alice DONE_2014-10-02 d9bl3 David

Carol IN_PROGRESS_2014-10-08 o2pnb Bob

Bob IN_PROGRESS_2014-10-03 ef9ca David

Bob PENDING_2014-09-30 72f49 Alice

Bob PENDING_2014-10-03 b932s Carol

Secondary index

Approach 2: Composite key

Bob

SELECT * FROM Game

WHERE Opponent='Bob'

AND StatusDate BEGINS_WITH 'PENDING'

Page 44: Deep Dive on Amazon DynamoDB

Needle in a sorted haystack

Bob

Page 45: Deep Dive on Amazon DynamoDB

Sparse indexes

Id (Partition)

User Game Score Date Award

1 Bob G1 1300 2012-12-23

2 Bob G1 1450 2012-12-233 Jay G1 1600 2012-12-24

4 Mary G1 2000 2012-10-24 Champ5 Ryan G2 123 2012-03-10

6 Jones G2 345 2012-03-20

Game-scores-table

Award (Partition)

Id User Score

Champ 4 Mary 2000

Award-GSI

Scan sparse GSIs

Page 46: Deep Dive on Amazon DynamoDB

Real-time voting

Write-heavy items

Page 47: Deep Dive on Amazon DynamoDB

Partition 1

1000 WCUs

Partition K

1000 WCUs

Partition M

1000 WCUs

Partition N

1000 WCUs

Votes Table

Candidate A Candidate B

Scaling bottlenecks

Voters

Provision 200,000 WCUs

Page 48: Deep Dive on Amazon DynamoDB

Write sharding

Candidate A_2

Candidate B_1

Candidate B_2

Candidate B_3

Candidate B_5

Candidate B_4

Candidate B_7

Candidate B_6

Candidate A_1

Candidate A_3

Candidate A_4Candidate A_7 Candidate B_8

Candidate A_6 Candidate A_8

Candidate A_5

Voter

Votes Table

Page 49: Deep Dive on Amazon DynamoDB

Write sharding

Candidate A_2

Candidate B_1

Candidate B_2

Candidate B_3

Candidate B_5

Candidate B_4

Candidate B_7

Candidate B_6

Candidate A_1

Candidate A_3

Candidate A_4Candidate A_7 Candidate B_8

UpdateItem: “CandidateA_” + rand(0, 10)

ADD 1 to Votes

Candidate A_6 Candidate A_8

Candidate A_5

Voter

Votes Table

Page 50: Deep Dive on Amazon DynamoDB

Votes Table

Shard aggregation

Candidate A_2

Candidate B_1

Candidate B_2

Candidate B_3

Candidate B_5

Candidate B_4

Candidate B_7

Candidate B_6

Candidate A_1

Candidate A_3

Candidate A_4

Candidate A_5

Candidate A_6 Candidate A_8

Candidate A_7 Candidate B_8

Periodic

Process

Candidate A

Total: 2.5M

1. Sum

2. Store Voter

Page 51: Deep Dive on Amazon DynamoDB

Trade off read cost for write scalability

Consider throughput per partition key

Shard write-heavy partition keys

Your write workload is not horizontally

scalable

Page 52: Deep Dive on Amazon DynamoDB

Concatenate attributes to form useful

secondary index keys

Take advantage of sparse indexes

Replace filter with indexes

You want to optimize a query as much

as possible

Status + Date

Page 53: Deep Dive on Amazon DynamoDB

Event driven applications and

DynamoDB Streams

Page 54: Deep Dive on Amazon DynamoDB

Stream of updates to a table

Asynchronous

Exactly once

Strictly ordered

• Per item

Highly durable

• Scale with table

24-hour lifetime

Subsecond latency

DynamoDB Streams

Page 55: Deep Dive on Amazon DynamoDB

View Type Destination

Old image—before update Name = John, Destination = Mars

New image—after update Name = John, Destination = Pluto

Old and new images Name = John, Destination = Mars

Name = John, Destination = Pluto

Keys only Name = John

View typesUpdateItem (Name = John, Destination = Pluto)

Page 56: Deep Dive on Amazon DynamoDB

Stream

Table

Partition 1

Partition 2

Partition 3

Partition 4

Partition 5

Table

Shard 1

Shard 2

Shard 3

Shard 4

KCL

worker

KCL

worker

KCL

worker

KCL

worker

Amazon Kinesis Client

Library application

DynamoDB

client application

Updates

DynamoDB Streams and

Amazon Kinesis Client Library

Page 57: Deep Dive on Amazon DynamoDB

DynamoDB Streams

Open-source cross-region

replication library

Asia Pacific (Sydney) EU (Ireland) replica

US East (N. Virginia)

Cross-region replication

Page 58: Deep Dive on Amazon DynamoDB

DynamoDB Streams and AWS Lambda

Page 59: Deep Dive on Amazon DynamoDB

Triggers

Lambda functionNotify change

Derivative tables

Amazon

CloudSearch

Amazon ElastiCache

Page 60: Deep Dive on Amazon DynamoDB

Analytics with

DynamoDB Streams

Collect and de-dupe data in DynamoDB

Aggregate data in-memory and flush periodically

Performing real-time aggregation and

analytics

Page 61: Deep Dive on Amazon DynamoDB

Architecture

Page 62: Deep Dive on Amazon DynamoDB

Reference architecture

Page 63: Deep Dive on Amazon DynamoDB

Elastic event driven applications

Page 64: Deep Dive on Amazon DynamoDB

Remember to complete

your evaluations!

Page 65: Deep Dive on Amazon DynamoDB