introduction to nosql - chewbii.comchewbii.com/wp-content/uploads/2017/04/nosql_intro_en.pdf ·...

19
Introduction to NoSQL N. Travers CEDRIC Lab - Vertigo Introduction to NoSQL Nicolas Travers CNAM – France Introduction to NoSQL N. Travers CEDRIC Lab - Vertigo Schedule & Organization Introduction to NoSQL databases 3V, ACID vs BASE, families, CAP theorem, JSon Presentation of MongoDB Language, distribution, replication, application Practice Works on MongoDB Queries : find + aggregate

Upload: others

Post on 20-Mar-2020

10 views

Category:

Documents


0 download

TRANSCRIPT

Introduction to NoSQL

N. TraversCEDRIC Lab - Vertigo

Introduction to NoSQL

Nicolas TraversCNAM – France

Introduction to NoSQL

N. TraversCEDRIC Lab - Vertigo

Schedule&Organization• IntroductiontoNoSQL databases▫ 3V,ACIDvsBASE,families,CAPtheorem,JSon• PresentationofMongoDB▫ Language,distribution,replication,application• PracticeWorks onMongoDB▫ Queries:find+aggregate

Introduction to NoSQL

N. TraversCEDRIC Lab - Vertigo

DBMSvsNoSQL

Introduction to NoSQL

N. TraversCEDRIC Lab - Vertigo

Context• ApplicationsandWebplatforms▫ ExponentialgrowthoftheamountofData(x2/2years)▫ Unprecedent managementofthisvolume

� Needtodistributebothcomputationanddata� Hugenumberofservers� Heterogeneousdata,maybecomplexandoftenlinked

• Ex:▫ Google,Amazon,Facebook▫ GoogleDataCenter :

� 5000servers/datacenter,~1Mdeservers▫ Facebook:

� 1PetaBytes ofdata

Introduction to NoSQL

N. TraversCEDRIC Lab - Vertigo 8

Introduction to NoSQL

N. TraversCEDRIC Lab - Vertigo

BI:Traditional methods

9

Introduction to NoSQL

N. TraversCEDRIC Lab - Vertigo

Decisional vs3VIncompatibleclassical approach with the3V ofBigData :

• Volume:Designed tostoreGB/TBofdata,butneedsPB(maybe EB).

• Variety:Heterogeneous andvariabletypesofdata,text,semi-structured

• Velocity:Dataareproduced moreandmorequickly

10

Introduction to NoSQL

N. TraversCEDRIC Lab - Vertigo

DBMS:Limitations• Standarddatabases▫ Functionalities� Joinsbetweentables� Complexqueries� Strongcoherencyofdata

ØRequirementsinadistributedcontext▫ Linksbetweenentities=>sameserver▫ ++links=>difficultiesfordataorganization

Introduction to NoSQL

N. TraversCEDRIC Lab - Vertigo

DBMS:Limitations(2)• ACID propertiesfortransactions▫ Setofoperations▫ Atomicity(integralcompletionornone)▫ Consistency(consistentatstartandend)▫ Isolation(nocommunicationbetweenthem)▫ Durability(anoperationcannotbereversed)▫ Pessimisticviewonconsistency

ØRequirementsinadistributedcontext▫ Difficultiesininsuringthoseproperties▫ Conflictwithefficiency/performances

Introduction to NoSQL

N. TraversCEDRIC Lab - Vertigo

ACIDvs BASE• ModernsystemsusetheBASEmodel▫ Optimisticviewonconsistency▫ BasicallyAvailable:� Anyrequest=>Ananswer� Eveninachangingstate▫ SoftState:� OppositetoDurability.� System’sstate(serversordata)couldchangeovertime(withoutanyupdate)

▫ Eventuallyconsistent:� Withtime,datacanbeconsistent� Updateshavetobepropagated

Introduction to NoSQL

N. TraversCEDRIC Lab - Vertigo

Solution:NoSQL• NoSQL :NotOnly SQL▫ Newdatastorage/managementapproach▫ Scalesupthesystem(throughdistribution)▫ Complexmetadatamanagement� Noschema

Ø DonotsubstituteDBMS,dedicatedto:▫ Veryhugevolumeofdata(PetaBytes)▫ Veryshortresponsetime▫ Consistencyisnotmandatory

Introduction to NoSQL

N. TraversCEDRIC Lab - Vertigo

DatabasesandNoSQL

Introduction to NoSQL

N. TraversCEDRIC Lab - Vertigo

NoSQL DB:Characteristics• Norelations=>Collections▫ Nofixstructures(naynone)• Complexdata(e.g.documents)▫ Objects,nesting,arrays• Datadistribution▫ Highparallelization(Map/Reduce)• Datareplication▫ Disponibility vsConsistency(notransactions)▫ Fewwrites,manyreads

Introduction to NoSQL

N. TraversCEDRIC Lab - Vertigo

Sharding :Scalability• Datablocks aredistributedinaclusterofservers• Horizontalpartitioning• 3typesoftechnics:

1. Resourceallocationbased:HDFS2. Tree-basedstructure:Clusteredindex(sort)3. Hash-basedstructure:ConsistentHashing

17

Introduction to NoSQL

N. TraversCEDRIC Lab - Vertigo

NoSQL Systems

Introduction to NoSQL

N. TraversCEDRIC Lab - Vertigo

SeveralNoSQL systems• Key-ValueStore▫ Dataareidentifiedbyauniquekey(usedforquerying)▫ DynamoDB,Voldemort,Redis,Riak,MemcacheDB• Columndata▫ Relation1-n“one-to-many”(messages,posts)▫ HBase,Hypertable,Spark,Elasticsearch• Documents▫ Complexesdata,attributes/values▫ MongoDB ,Cassandra,CouchDB,Terrastore• Graphs▫ Highlyconnectedentities,SocialNetworking▫ Neo4j,OrientDB,FlockDB

Introduction to NoSQL

N. TraversCEDRIC Lab - Vertigo

I- NoSQL&Key-Valuestore• Similartoadistributed“HashMap”• Key+Value▫ Nofixedschemaonvalues(strings,object,integer,binaries…)

• Drawbacks:▫ Nostructuresnortyping▫ Nostructured-basedqueries

Ø DynamoDB (Amazon),Redis (VMWare),Voldemort(LinkedIn)

Introduction to NoSQL

N. TraversCEDRIC Lab - Vertigo

I- NoSQL &Key-Valuestore(2)• CRUDOperations(HTTP)▫ Create(key,value)▫ Read(key)▫ Update(key,value)▫ Delete(key)• Horizontalscaling(partitionning/distribution)• Noverticaldistribution(datasegmentation)

Introduction to NoSQL

N. TraversCEDRIC Lab - Vertigo

I- NoSQL&Key-Valuestore(3)

22

Introduction to NoSQL

N. TraversCEDRIC Lab - Vertigo

II– NoSQL &Columns• Column-basedstorage▫ DBMS:tuples(lines)• Easytoinsertanewcolumn▫ Dynamicschema

ØBigTable/Hbase (Google),Cassandra(Facebook&Apache),SimpleDB (Amazon)

Introduction to NoSQL

N. TraversCEDRIC Lab - Vertigo

II– NoSQL &Columns(2)• Advantages:▫ XML/JSon support▫ Columnindexing▫ Horizontalscaling• Drawbacks:▫ Hardtoquerycomplexdata▫ Difficultforlinkeddata(distances,paths,time)▫ Pre-definedqueries(notonthefly)

Introduction to NoSQL

N. TraversCEDRIC Lab - Vertigo

II– NoSQL&Columns(3)

25

Introduction to NoSQL

N. TraversCEDRIC Lab - Vertigo

III– NoSQL &Documents• Basedonthekey-valuestore▫ Addsemi-structureddata(JSon/XML)• HTTPAPI▫ MorecomplexthanCRUD

ØMongoDB,CouchDB (Apache),RavenDB,Terrastore

Introduction to NoSQL

N. TraversCEDRIC Lab - Vertigo

III– NoSQL &Documents(2)• Documentmanagement▫ Simpletypes(Int,String,Date)▫ Nofixschema(docsmayvary)▫ Nesteddata• Advantages:▫ Richnessforqueries▫ Indexingseveralattributs▫ Easytoscaleup• Drawbacks:▫ Difficultiesfordatainterconnexions▫ Dedicatedtokey-value(id)

Introduction to NoSQL

N. TraversCEDRIC Lab - Vertigo

III– NoSQL&Documents(3)

Introduction to NoSQL

N. TraversCEDRIC Lab - Vertigo

IV– NoSQL &Graph• Storage:nodes,relationsandproperties▫ GraphTheory▫ Pathqueryingonthegraph▫ Dataareloadedondemand▫ Difficultiesformodeling

ØNeo4j,OrientDB (Apache),FlockDB (Twitter)

Introduction to NoSQL

N. TraversCEDRIC Lab - Vertigo

IV– NoSQL &Graph(2)• Availablestorage▫ Object(cf.documents)▫ Edges(withproperties)• DifficultforSharding

Introduction to NoSQL

N. TraversCEDRIC Lab - Vertigo

IV– NoSQL&Graph(3)

31

Introduction to NoSQL

N. TraversCEDRIC Lab - Vertigo

Introduction to NoSQL

N. TraversCEDRIC Lab - Vertigo

Brewer’s CAP Theorem (2000)• 3mainpropertiesfordistributedmanagement1. Consistency:

� Adatahavethesamevalueatthesametime(coherency)

2. Availability:� Evenifaserverisdown,dataisavailable

3. PartitionTolerance:� Evenifthesystemispartitioned,aquerymusthaveananswer(unlessforglobalfailures)

Ø Theorem:Adistributed,networkedsystemcanhaveonlytwo ofthesethreeproperties.

Introduction to NoSQL

N. TraversCEDRIC Lab - Vertigo

Introduction to NoSQL

N. TraversCEDRIC Lab - Vertigo

Introduction to NoSQL

N. TraversCEDRIC Lab - Vertigo

• InitiallyXMLusedforcomplexinternetcommunications(WebServices)▫ Tooverbose• JSON (JavaScript Object Notation)▫ Lightweight, text-oriented, language independent▫ Used for several Web services (Google API, Twitter

API)

Introduction to NoSQL

N. TraversCEDRIC Lab - Vertigo

JSon : Structures

• Key + Value

▫ “lastname” : “Travers”

▫ Keys with quotations

• Objects/documents

▫ Collection of key/values

▫ { “lastname” : “Travers”,

“firstname” : “Nicolas”,

“kind” : 1}

Introduction to NoSQL

N. TraversCEDRIC Lab - Vertigo

Data types

• Scalar : String, Integer, float, boolean, null…

• List : arrays [ … ]

• Documents : objetcs {…}

Introduction to NoSQL

N. TraversCEDRIC Lab - Vertigo

Arrays

• No typing inside arrays

▫ “lessons” : [“SQL”, 1, 4.2, null, “NoSQL”]

• Can nest documents

▫ “doc” : [ {“test” : 1},

{“test” : {“nesting” : 1.0}},

{“key” : “text”, “value” : null}]

Introduction to NoSQL

N. TraversCEDRIC Lab - Vertigo

JSon : Identifiers

• Key « _id » commonly used to identify documents

▫ Overwrite already stored ids

▫ Can be automaticaly generated

� Ex MongoDB : "_id" : ObjectId(1234567890)

Introduction to NoSQL

N. TraversCEDRIC Lab - Vertigo

JSon : complete example

{

“_id” : 1234,

“lastname” : “Travers”, “firstname” : “Nicolas”,

“work” : {

“company” : “Cnam”,

“location” : {

“street” : “2 rue conté”,

“city” : “Paris”,

“zip” : 75141

},

},

“fields” : [ “DB” , “DB tuning”, “XML”, “NoSQL”, “IR” ]

}