when to nosql and when to know sql

55
Simon Elliston Ball Head of Big Data @sireb When to NoSQL and When to Know SQL #noSQLknowSQL http://nosqlknowsql.io

Upload: simon-elliston-ball

Post on 05-Dec-2014

1.546 views

Category:

Technology


0 download

DESCRIPTION

A quick guide and overview for a range of NoSQL technologies, as delivered at Code PaLOUsa 2014

TRANSCRIPT

Page 1: When to NoSQL and when to know SQL

Simon Elliston Ball Head of Big Data !

@sireb !

!

!

!

When to NoSQL and When !to Know SQL

#noSQLknowSQL

http://nosqlknowsql.io

Page 2: When to NoSQL and when to know SQL

Not only SQL

SQL

what is NoSQL?

Many many things

NoSQL

No, SQL

Page 3: When to NoSQL and when to know SQL

files

before SQL

multi-value

ur… hash maps?

Page 4: When to NoSQL and when to know SQL

everything is relational

after SQL

ORMs fill in the other data structures

scale up rules

data first design

Page 5: When to NoSQL and when to know SQL

datastores that suit applications

and now NoSQL

polyglot persistence: the right tools

scale out rules

APIs not EDWs

Page 6: When to NoSQL and when to know SQL

data growth

why should you care?

machine learning

social

rapid development

fewer migration headaches… maybe

Page 7: When to NoSQL and when to know SQL

big bucks.

9. Salary figures are for US respondents only.

number of tools. Median base salary is constant at $100k for thoseusing up to 10 tools, but increases with new tools after that.9

Given the two patterns we have just examined—the relationships be‐tween cluster tools and respondents’ overall tool counts, and betweentool counts and salary—it should not be surprising that there is a sig‐nificant difference in how each cluster correlates with salary. Usingmore tools from the Hadoop cluster correlates positively with salary,while using more tools from the SQL/Excel cluster correlates (slightly)negatively with salary.

12 | 2013 Data Science Salary Survey

O’Reilly 2013 Data Science Salary Survey

Page 8: When to NoSQL and when to know SQL

So many NoSQLs…

Page 9: When to NoSQL and when to know SQL
Page 10: When to NoSQL and when to know SQL

document databases

Page 11: When to NoSQL and when to know SQL

document databases

known access pattern

JSON docs

complex, variable models

rapid development

Page 12: When to NoSQL and when to know SQL

document databases

JUST DON’Tjoins?

learn a very new query language

denormalize

http://www.sarahmei.com/blog/2013/11/11/why-you-should-never-use-mongodb/

document form

Page 13: When to NoSQL and when to know SQL

document vs SQLwhat can SQL do?

query all the angles

sure, you can use blobs…

… but you can’t get into them

Page 14: When to NoSQL and when to know SQL

documents in SQLSQL xml fields

mapping xquery paths is painful

native JSON

Page 15: When to NoSQL and when to know SQL

class of database database

query everything: search

full-text indexing

Page 16: When to NoSQL and when to know SQL

range query

you know google, right…

span query

keyword query

Page 17: When to NoSQL and when to know SQL

scores

you know the score"query": { "function_score": { "query": { "match": { "title": "NoSQL"} }, "functions": [ "boost": 1, "gauss": { "timestamp": { "scale": "4w" } }, "script_score" : { "script" : "_score * doc['important_document'].value ? 2 : 1" } ], "score_mode": "sum" } }

Page 18: When to NoSQL and when to know SQL

scores

SQL knows the score toodeclare @origin float = 0; declare @delay_weeks float = 4; !select top 10 * from ( select title, score * case when p.important = 1 then 2.0 when p.important = 0 then 1.0 end * exp(-power(timestamp-@origin,2)/(2*@delay*7*24*3600)) + 1 as score from posts p where title like '%NoSQL%' ) as found order by score

Page 19: When to NoSQL and when to know SQL

more like this: instant tf-idf

you know google, right…

{ "more_like_this" : { "fields" : ["name.first", "name.last"], "like_text" : "text like this one", "min_term_freq" : 1, "max_query_terms" : 12 } }

Page 20: When to NoSQL and when to know SQL

Facets

Page 21: When to NoSQL and when to know SQL

Facets

Page 22: When to NoSQL and when to know SQL

FacetsSQL:

x lots

SELECT a.name, count(p.id) FROM people p JOIN industry a on a.id = p.industry_id JOIN people_keywords pk on pk.person_id = p.id JOIN keywords k on k.id = pk.keyword_id WHERE CONTAINS(p.description, 'NoSQL') OR k.name = 'NoSQL' ... GROUP BY a.name

SELECT a.name, count(p.id) FROM people p JOIN area a on a.id = p.area_id JOIN people_keywords pk on pk.person_id = p.id JOIN keywords k on k.id = pk.keyword_id WHERE CONTAINS(p.description, 'NoSQL') OR k.name = 'NoSQL' ... GROUP BY a.name

Page 23: When to NoSQL and when to know SQL

FacetsElastic search:{ "query": { “query_string": { “default_field”: “content”, “query”: “keywords” } }, "facets": { “myTerms": { "terms": { "field" : "lang", "all_terms" : true } } } }

Page 24: When to NoSQL and when to know SQL

untyped free-text documents

logs

timestamped

semi-structured

discovery

aggregation and statistics

Page 25: When to NoSQL and when to know SQL

close to your programming model

key: value

distributed map | list | set

keys can be objects

Page 26: When to NoSQL and when to know SQL

hash types

SQL extensions

hstore

Page 27: When to NoSQL and when to know SQL

inheritance

SQL and polymorphism

ORMs hide the horror

Page 28: When to NoSQL and when to know SQL

columnar databases

turning round the rows

physical layout matters

Page 29: When to NoSQL and when to know SQL

turning round the rowskey! value type

1 A Home2 B Work3 C Work4 D Work

00001 1 A Home 00002 2 B Work 00003 3 C Work …

Row storage

A B C D Home Work Work Work …

Column storage

Page 30: When to NoSQL and when to know SQL

teaching an old SQL new tricksMySQL InfoBright

SQL Server Columnar IndexesCREATE NONCLUSTERED COLUMNSTORE INDEX idx_col ON Orders (OrderDate, DueDate, ShipDate)

Great for your data warehouse, but no use for OLTP

Page 31: When to NoSQL and when to know SQL

millions of columns

eventually consistent

CQL

http://cassandra.apache.org/

http://www.datastax.com/

set | list | map types

Page 32: When to NoSQL and when to know SQL

Parquet

ORC files

column for hadoop and other animalshttp://parquet.io

Page 33: When to NoSQL and when to know SQL

SQL: so many views, so much confusion

cell level security

accumulo https://accumulo.apache.org/

Page 34: When to NoSQL and when to know SQL

Tim

e se

ries

Page 35: When to NoSQL and when to know SQL

timeretrieving time series and graphs

window functionsSELECT business_date, ticker, close, close / LAG(close,1) OVER (PARTITION BY ticker ORDER BY business_date ASC) - 1 AS ret FROM sp500

Page 36: When to NoSQL and when to know SQL

Queues

Page 37: When to NoSQL and when to know SQL

CREATE procedure [dbo].[Dequeue] AS !set nocount on !declare @BatchSize int set @BatchSize = 10 !declare @Batch table (QueueID int, QueueDateTime datetime, Title nvarchar(255)) !begin tran !insert into @Batch select Top (@BatchSize) QueueID, QueueDateTime, Title from QueueMeta WITH (UPDLOCK, HOLDLOCK) where Status = 0 order by QueueDateTime ASC !declare @ItemsToUpdate int set @ItemsToUpdate = @@ROWCOUNT !update QueueMeta SET Status = 1 WHERE QueueID IN (select QueueID from @Batch) AND Status = 0 !if @@ROWCOUNT = @ItemsToUpdate begin commit tran select b.*, q.TextData from @Batch b inner join QueueData q on q.QueueID = b.QueueID print 'SUCCESS' end else begin rollback tran print 'FAILED' end

queues in SQL

Page 38: When to NoSQL and when to know SQL

index fragmentation is a problem

queues in SQL

but built in logs of a sort

Page 39: When to NoSQL and when to know SQL

specialised apis

message queues

capabilities like fan-out

routing

acknowledgement

Page 40: When to NoSQL and when to know SQL

relationships countGraph databases

Page 41: When to NoSQL and when to know SQL

relationships counttrees and hierarchies

overloaded relationships

fancy algorithms

Page 42: When to NoSQL and when to know SQL

hierarchies with SQLadjacency lists

nested sets (MPTT)

materialised path

CONSTRAIN fk_parent_id_id FOREIGN KEY parent_id REFERENCES some_table.id

path = 1.2.23.55.786.33425

Node Left Right DepthA 1 22 1B 2 9 2C 10 21 2D 3 4 3E 5 6 3F 7 8 3G 11 18 3H 19 20 3I 12 13 4J 14 15 4K 16 17 4

Page 43: When to NoSQL and when to know SQL

graph demo

Page 44: When to NoSQL and when to know SQL

Velocity

https://www.flickr.com/photos/jiteshjagadish

Page 45: When to NoSQL and when to know SQL

Don’t get ACID on the cuts

when locks attack…

Page 46: When to NoSQL and when to know SQL

Big Data

Page 47: When to NoSQL and when to know SQL

SQL on Hadoop

Page 48: When to NoSQL and when to know SQL

System issues, !Speed issues,!Soft issues

Page 49: When to NoSQL and when to know SQL

Atomic

the ACID, BASE litmus

Consistent

Isolated

Durable

Basically Available

Soft-state

Eventually consistent

what matters to you?

Page 50: When to NoSQL and when to know SQL

CAP it all

Consistency

AvailabilityPartition

Page 51: When to NoSQL and when to know SQL

most noSQL scales well

is it web scale?

but clusters still need management

are you facebook? one machine is easier than n

Page 52: When to NoSQL and when to know SQL

SQL writes cost a lot

write fast, ask questions later

mainly write workload: NoSQL

low latency write workload: NoSQL

Page 53: When to NoSQL and when to know SQL

analysts: they want SQL

who is going to use it?

developers: they want applications

data scientists: the want access

Page 54: When to NoSQL and when to know SQL

choose the !right tool

Photo: http://www.homespothq.com/

Page 55: When to NoSQL and when to know SQL

QuestionsSimon Elliston Ball [email protected] !

@sireb

http://bit.ly/knowSQLnoSQL

#noSQLknowSQL