choosing the right database

109
CHOOSING THE RIGHT DATABASE David Simons @SwamWithTurtles

Upload: david-simons

Post on 22-Jan-2018

445 views

Category:

Technology


0 download

TRANSCRIPT

Page 1: Choosing the right database

C H O O S I N G T H E R I G H T D ATA B A S E

D a v i d S i m o n s @ S w a m W i t h Tu r t l e s

Page 2: Choosing the right database

C H O O S I N G T H E R I G H T D ATA B A S E

D a v i d S i m o n s @ S w a m W i t h Tu r t l e s

Page 3: Choosing the right database

W H O A M I ?

• Tech Lead/Consultant at Softwire

• Consultancy/Implementation

• Design & UX

• Training

Page 4: Choosing the right database

W H O A M I ?

• David Simons

• @SwamWithTurtles

• swamwithturtles.com

• Background in Statistics/Simulation

• Enjoys Databases and JavaScript

Page 5: Choosing the right database

M Y E X P E R I E N C E W I T H A P I S

• Focused on web development

• Utilise data from our own, or third party APIs to affect rendered data

Page 6: Choosing the right database

W H Y A R E W E H E R E ?B U T F I R S T … .

Page 7: Choosing the right database

W E T H I N K A P I S A R E I M P O R TA N T

W E ’ R E H E R E B E C A U S E

Page 8: Choosing the right database

W E W A N T T O K N O W H O W T O M A K E G O O D A P I S

W E ’ R E H E R E B E C A U S E

Page 9: Choosing the right database

W E W A N T T O B U C K T R E N D S ( S A F E LY )

W E ’ R E H E R E B E C A U S E

Page 10: Choosing the right database

A P I A R C H I T E C T U R E

D ATA B A S E C O D E Data

Page 11: Choosing the right database

A P I A R C H I T E C T U R E

D ATA B A S E C O D E Data

Page 12: Choosing the right database

[ O U R W AY ] W O R K S F O R U S !

B U T …

Page 13: Choosing the right database

N O S Q L S O L U T I O N S C A N ’ T [ V E R B ]

B U T …

Page 14: Choosing the right database

A ( P O L I T I C A L ) A S I D E

Page 15: Choosing the right database

T H E AV R E F E R E N D U M

Page 16: Choosing the right database

“ N O T O AV ”

Page 17: Choosing the right database

W E H AV E T O A S K O U R S E LV E S T H E R I G H T Q U E S T I O N S

W H Y A M I T E L L I N G Y O U T H I S ?

Page 18: Choosing the right database

C A N A N O S Q L A P P R O A C H D O E V E R Y T H I N G S Q L C A N ?

S O W E S H O U L D N ’ T A S K

Page 19: Choosing the right database

W H AT W O U L D W E W A N T O U R I D E A L D ATA B A S E T O L O O K L I K E ?

I N S T E A D W E S H O U L D A S K

Page 20: Choosing the right database

I W A N T T O D I S C U S S

• What sorts of questions should be asking about your data when architecting a system?

• What do the results of those questions mean for your database?

Page 21: Choosing the right database

H O W B I G D O E S I T N E E D T O B E ?

A S K Y O U R S E L F

Page 22: Choosing the right database

W I T H E N O U G H D ATA , Y O U H AV E T O D I S T R I B U T E

T H E H A R D T R U T H

Page 23: Choosing the right database

W H AT H A P P E N S W H E N O N E I N S TA N C E G O E S D O W N ?

B U T …

Page 24: Choosing the right database

– D Y N A M O : A M A Z O N ’ S H I G H LY AVA I L A B L E K E Y- VA L U E S T O R E

“Reliability at massive scale is one of the biggest challenges we face at Amazon.com. Even the

slightest outage has significant financial consequences and impacts customer trust.”

Page 25: Choosing the right database

– D Y N A M O : A M A Z O N ’ S H I G H LY AVA I L A B L E K E Y- VA L U E S T O R E

“Dynamo targets applications that operate with weaker consistency if this results in high

availability.”

Page 26: Choosing the right database

CONS I STENCY

A

B C

Page 27: Choosing the right database

CONS I STENCY

A

B C

Page 28: Choosing the right database

D Y N A M O I M P L E M E N TAT I O N S

Page 29: Choosing the right database

N O T G U A R A N T E E D C O N S I S T E N C Y

T H E C O S T ?

Page 30: Choosing the right database
Page 31: Choosing the right database
Page 32: Choosing the right database
Page 33: Choosing the right database
Page 34: Choosing the right database
Page 35: Choosing the right database
Page 36: Choosing the right database

“open source software project that enables distributed processing of large data sets across clusters of

commodity servers”

Page 37: Choosing the right database

N O PA R T I T I O N T O L E R A N C E

T H E C O S T ?

Page 38: Choosing the right database

C A P T H E O R E M

• Choose Two:

• Consistency

• Availability

• Partition Tolerance

Page 39: Choosing the right database

W H AT I F W E D O N ’ T N E E D T O D I S T R I B U T E ?

B U T …

Page 40: Choosing the right database

W H AT S H A P E I S Y O U R D ATA ?

A S K Y O U R S E L F

Page 41: Choosing the right database

R D B M S ( R E L AT I O N A L D ATA B A S E M A N A G E M E N T S Y S T E M )

Page 42: Choosing the right database

E V E R Y R O W I S A “ T H I N G ”

Name Species

1 Puss

2 Dinah

3 Einstein

4 Jess

Page 43: Choosing the right database

S E T- B A S E D O P E R AT I O N

R E A D D A TA O U T W I T H

Page 44: Choosing the right database

“ W H E R E ” ( I N T E R S E C T I O N )

Name Species

1 Puss

2 Dinah

3 Einstein

4 Jess

Page 45: Choosing the right database

U N I O N S

Name Species

1 Puss

2 Dinah

3 Einstein

4 Jess

5 Nemo

6 Moby Dick

7 Wanda

Page 46: Choosing the right database

J O I N S

Name SpeciesSpecies Coolness

Rating

1 Puss 0

2 Dinah 0

3 Einstein 10

4 Jess 0

Page 47: Choosing the right database

C A R T E S I A N P R O D U C T S

0 10

0 10

0 10

Page 48: Choosing the right database

C A R T E S I A N P R O D U C T S

0 10

0 10

0 10

Page 49: Choosing the right database

W H AT S Q L D O E S W E L L

• Modelling objects:

• With a fixed structure and shape

• With a limited number of relations

• With no opinion or opinion of any deeper underlying domain

R D B M S ( R E L AT I O N A L D ATA B A S E M A N A G E M E N T S Y S T E M )

Page 50: Choosing the right database

T H E R E A R E P R O B L E M S T H I S I S B A D F O R

B U T …

Page 51: Choosing the right database

K E V I N B A C O NS I X D E G R E E S O F …

Page 52: Choosing the right database
Page 53: Choosing the right database
Page 54: Choosing the right database
Page 55: Choosing the right database
Page 56: Choosing the right database

T H E R E I S N O O P E N E L E C T I O N D ATA

T H E P R O B L E M

Page 57: Choosing the right database

E L E C T I O N D ATA

Page 58: Choosing the right database

E L E C T I O N D ATA

Page 59: Choosing the right database

E L E C T I O N D ATA

E = (e.g.) member of, held in,

stood in…

V = elections, constituencies,

years, politicians and parties

Page 60: Choosing the right database

W O R L D ’ S L E A D I N G G R A P H D B :

Page 61: Choosing the right database

"embedded, disk-based, fully transactional Java persistence engine that stores data structured in

graphs rather than in tables"

Page 62: Choosing the right database

D ATA S T O R A G E

Page 63: Choosing the right database

D ATA S T O R A G E

Page 64: Choosing the right database

D ATA S T O R A G E

• Nodes and edges are all:

• Stored as first-class objects on the file system

• “typed”

• Key-value stores

Page 65: Choosing the right database

D ATA I N T H E R E L AT I O N S

• “Joins” are first class objects in the database that can be queried at no additional cost

• Certain queries become trivial (e.g. Joins)

Page 66: Choosing the right database

P R O T O T Y P I N G

• Easy to see and work with data

• Schemaless

• Active community with a lot of libraries

Page 67: Choosing the right database

N E O 4 J U S E R S

Page 68: Choosing the right database

T I M E S E R I E S D ATA B A S E

Page 69: Choosing the right database
Page 70: Choosing the right database

Timestamp Value

2014-06-10T12:00:00+0100 17

2014-06-10T12:15:00+0100 17

2014-06-10T12:30:00+0100 20

2014-06-10T12:45:00+0100 22

2014-06-10T13:00:00+0100 24

2014-06-10T13:15:00+0100 28

2014-06-10T13:30:00+0100 32

Page 71: Choosing the right database
Page 72: Choosing the right database

SCHEMALESS

Page 73: Choosing the right database

S O C I A L M E D I A S I T E

Page 74: Choosing the right database

S O C I A L M E D I A S I T E

Page 75: Choosing the right database

E V E R Y R O W I S A “ T H I N G ”

N A M E = P U S S C O O L N E S S = 0

!

N A M E = J E S S C O O L N E S S = 0

!

N A M E = D I N A H C O O L N E S S = 0

!

N A M E = E I N S T E I N C O O L N E S S = 1 0

!

DOCUMENT

Page 76: Choosing the right database

Y O U C A N N O T D E N O R M A L I S E D ATA *

WA R N I N G

Page 77: Choosing the right database

H O W M U C H A R E Y O U W I L L I N G T O S P E N D ?

A S K Y O U R S E L F

Page 78: Choosing the right database

T H I S M AY S E E M L I K E A T R I V I A L P O I N T…

Page 79: Choosing the right database

C O S T S …

• Oracle: $50,000+

• SQL Server: $10,000+

Page 80: Choosing the right database

O P E N - S O U R C E F R A M E W O R K S

• MySQL

• PostgreSQL

• Riak

• Voldemort

• MariaDB

• Cassandra

• MongoDB

Page 81: Choosing the right database

H O W A R E Y O U R E T R I E V I N G T H E D ATA ?

A S K Y O U R S E L F

Page 82: Choosing the right database

T H I N G S H AV E T O U S E O U R D ATA …

R E M E M B E R T H A T

Page 83: Choosing the right database

A P I A R C H I T E C T U R E

D ATA B A S E C O D E Data

Page 84: Choosing the right database

A P I A R C H I T E C T U R E

D ATA B A S E C O D E Data E N D U S E R

Page 85: Choosing the right database

A P I A R C H I T E C T U R E

E N D U S E R What is the average age of …?

Page 86: Choosing the right database

A P I A R C H I T E C T U R E

E N D U S E REr….

I think it was something like “Campbell”?

Page 87: Choosing the right database

O U R C H O I C E I S I N F O R M E D B Y O U R P L A N S F O R T H E A P I

R E M E M B E R T H A T

Page 88: Choosing the right database
Page 89: Choosing the right database

G E O S PAT I A L I N D E X E S

Page 90: Choosing the right database
Page 91: Choosing the right database

L I N K E D M E D I A F R A M E W O R K

Page 92: Choosing the right database

A PA C H E M A R M O T TA

O U T O F T H E B O X …

Page 93: Choosing the right database

T R I P L E VA L U E S T O R E

• Video A contains Alice McSkaterton

• Alice McSkaterton is married to Brock Windsurferling

• Article B contains Brock Windsurferling

Page 94: Choosing the right database
Page 95: Choosing the right database

D O C U M E N T S T O R E

S E A R C H A P I S

Page 96: Choosing the right database

D O C U M E N T S T O R E

Page 97: Choosing the right database

E V E R Y R O W I S A “ T H I N G ”

N A M E = P U S S C O O L N E S S = 0

!

N A M E = J E S S C O O L N E S S = 0

!

N A M E = D I N A H C O O L N E S S = 0

!

N A M E = E I N S T E I N C O O L N E S S = 1 0

!

DOCUMENT

Page 98: Choosing the right database

A PA C H E L U C E N E

Page 99: Choosing the right database

“Apache Lucene is a high-performance, full-featured text search engine library … It is a

technology suitable for nearly any application that requires full-text search”

Page 100: Choosing the right database

F O C U S E D A R O U N D T E X T S E A R C H I N G Q U E R I E S

Page 101: Choosing the right database

{ "query": { "match": {"hobbies": "skateboard"} } }

Page 102: Choosing the right database

{ "query": { {"fuzzy": {"hobbies": “skateboarig"}} } }

Page 103: Choosing the right database

{ "query": { {"match": {"hobbies": {"query": "writing reddit comments", "type": "phrase"}}} } }

Page 104: Choosing the right database

T O S U M U P…

Page 105: Choosing the right database

I S Y O U R D ATA B I G ?

• You’ll need to distribute it - which means you have to choose one of CAP to sacrifice

• Hadoop will provide consistent results, but isn’t built to fail

• Cassandra and Riak will be highly available but slow to propagate new data

Page 106: Choosing the right database

I F I T ’ S N O T B I G …

• You can store the data in a high-fidelity, way:

• Denormalised - SQL

• As a graph with Neo4j

• Choosing an unsuitable method is costly!

Page 107: Choosing the right database

H O W D O Y O U W A N T I T B A C K ?

• Some databases have custom indexing which we are optimised for some queries

• Geospatial Indexes

• In-text Indexing

• Time Series

Page 108: Choosing the right database

S O …

• Your requirements will be unique, so don’t lump with the industry standards

• There are a range of modern DB solutions that are no longer bleeding-edge

• Consider them on their own merits!

Page 109: Choosing the right database

A N Y Q U E S T I O N S ?D a v i d S i m o n s

@ S w a m W i t h Tu r t l e s