
Lena Wiese
Advanced Data Management
De Gruyter

Lena Wiese

Advanced Data Management

for SQL, NoSQL, Cloud and Distributed Databases

Author
Dr. Lena Wiese
Georg-August-Universität Göttingen
Fakultät für Mathematik und Informatik
Institut für Informatik
Goldschmidtstraße 7
37077 Göttingen

ISBN 978-3-11-044140-6
e-ISBN (PDF) 978-3-11-044141-3

Library of Congress Cataloging-in-Publication Data
A CIP catalog record for this book has been applied for at the Library of Congress.

Bibliographic information published by the Deutsche Nationalbibliothek
The Deutsche Nationalbibliothek lists this publication in the Deutsche Nationalbibliografie; detailed bibliographic data are available on the Internet at http://dnb.dnb.de.

© 2015 Walter de Gruyter GmbH, Berlin/Munich/Boston
♾ Printed on acid-free paper
Printed in Germany

www.degruyter.com

To my family

Preface

During the last two decades, the landscape of database management systems has changed immensely. Based on the fact that data are nowadays stored and managed in networks of distributed servers (“clusters”) and these servers consist of cheap hardware (“commodity hardware”), data of previously unthinkable magnitude (“big data”) are produced, transferred, stored, modified, transformed, and in the end possibly deleted. This form of continuous change calls for flexible data structures and efficient distributed storage systems with both a high read and write throughput. In many novel applications, the conventional table-like (“relational”) data format may not be the data structure of choice – for example, when easy exchange of data or fast retrieval become vital requirements. For historical reasons, conventional database management systems are not explicitly geared toward distribution and continuous change, as most implementations of database management systems date back to a time when distributed storage was not a major requirement. These deficiencies might as well be attributed to the fact that conventional database management systems try to incorporate several database standards as well as have high safety guarantees (for example, regarding concurrent user accesses or correctness and consistency of data).

Several kinds of database systems have emerged and evolved over the last years that depart from the established tracks of data management and data formats in different ways. Development of these emergent systems started from scratch and gave rise to new data models, new query engines and languages, and new storage organizations. Two things are particularly remarkable features of these systems: on the one hand, a wide range of open source products are available (though some systems are supported by or even originated from large international companies) and development can be observed or even be influenced by the public; on the other hand, several results and approaches achieved by long-standing database research (having its roots at least as early as the 1960s) have been put into practice in these database systems and these research results now show their merits for novel applications in modern data management. On the downside, there are basically no standards (with respect to data formats or query languages) in this novel area and hence portability of application code or long-term support can usually not be guaranteed. Moreover, these emerging systems are not as mature (and probably not as reliable) as conventional established systems.

The term NOSQL has been used as an umbrella term for several emerging database systems without an exact formal definition. Starting with the notion of NoSQL (which can be interpreted as saying no to SQL as a query language) it has evolved to mean “not only SQL” (and hence written as NOSQL with a capital O). The actual origin of the term is ascribed to the 2009 “NOSQL meetup”: a meeting with presentations of six database systems (Voldemort, Cassandra, Dynomite, HBase, Hypertable, and CouchDB). Still, the question of what exactly a NOSQL database system is cannot be answered unanimously; nevertheless, some structure slowly becomes visible in the NOSQL field and has led to a broad categorization of NOSQL database systems. Main categories of NOSQL systems are key-value stores, document stores, extensible record stores (also known as column family stores) and graph databases. Yet, other creatures live out there in the database jungle: object databases and XML databases espouse neither the relational data model nor SQL as a query language – but they typically would not be considered NOSQL database systems (probably because they predate the NOSQL systems). Moreover, column stores are an interesting variant of relational database systems.

This book is meant as a textbook for computer science lectures. It is based on Master-level database lectures and seminars held at the universities of Hildesheim and Göttingen. As such it provides a formal analysis of alternative, non-relational data models and storage mechanisms and gives a decent overview of non-SQL query languages. However, it does not put much focus on installing or setting up database systems and hence complements other books that concentrate on more technical aspects. This book also surveys storage internals and implementation details from an abstract point of view and describes common notions as well as possible design choices (rather than singling out one particular database system and specializing on its technical features).

This book intends to give students a perspective beyond SQL and relational database management systems and thus covers the theoretical background of modern data management. Nevertheless, this book is also aimed at database practitioners: it wants to help developers or database administrators come to an informed decision about which database systems are most beneficial for their data management requirements.

Overview

This book consists of four parts. Part I Introduction commences the book with a general introduction to the basics of data management and data modeling.

Chapter 1 Background (page 3) provides a justification why we need databases in modern society. Desired properties of modern database systems like scalability and reliability are defined. Technical internals of database management systems (DBMSs) are explained with a focus on memory management. Central components of a DBMS (like buffer manager or recovery manager) are explored. Next, database design is discussed; a brief review of Entity-Relationship Models (ERM) and the Unified Modeling Language (UML) rounds this chapter off.

Chapter 2 Relational Database Management Systems (page 17) contains a review of the relational data model by defining relation schemas, database schemas and database constraints. It continues with an example of how to transform an ERM into a relational database schema. Next, it illustrates the core concepts of relational database theory like normalization to avoid anomalies, referential integrity, relational query languages (relational calculus, relational algebra and SQL), concurrency management and transactions (including the ACID properties, concurrency control and scheduling).

Part II NOSQL And Non-Relational Databases comprises the main part of this book. In its seven chapters (Chapters 3 to 9) it gives an in-depth discussion of data models and database systems that depart from the conventional relational data model.

Chapter 3 New Requirements, “Not only SQL” and the Cloud (page 33) admits that relational database management systems (RDBMSs) have their strengths and merits but then contrasts them with cases where the relational data model might be inadequate and touches on weaknesses that current implementations of relational DBMSs might have. The chapter concludes with a description of current challenges in data management and a definition of NOSQL databases.

Chapter 4 Graph Databases (page 41) begins by explaining some basics of graph theory. Having presented several choices for graph data structures (from adjacency matrix to incidence list), it describes the predominant data model for graph databases: the property graph model. After a brief digression on how to map graphs to an RDBMS, two advanced types of graphs are introduced: hypergraphs and nested graphs.

Chapter 5 XML Databases (page 69) expounds the basics of XML (like XML documents and schemas, and numbering schemes) and surveys XML query languages. Then, the chapter shifts to the issue of storing XML in an RDBMS. Finally, the chapter describes the core concepts of native XML storage (like indexing, storage management and concurrency control).


Chapter 6 Key-value Stores and Document Databases (page 105) puts forward the simple data structure of key-value pairs and introduces the map-reduce concept as a pattern for parallelized processing of key-value pairs. Next, as a form of nested key-value pairs, the JavaScript Object Notation (JSON) is introduced. JSON Schema and Representational State Transfer are further topics of this chapter.

Chapter 7 Column Stores (page 143) outlines the column-wise storage of tabular data (in contrast to row-wise storage). Next, the chapter delineates several ways for compressed storage of data to achieve a more compact representation based on the fact that data in a column is usually more uniform than data in a row. Lastly, column striping is introduced as a recent methodology to convert nested records into a columnar representation.

Chapter 8 Extensible Record Stores (page 161) describes a flexible multidimensional data model based on column families. The surveyed database technologies also include ordered storage and versioning. After defining the logical model, the chapter explains the core concepts of the storage structures used on disk and the ways to handle writes, reads and deletes with immutable data files. This also includes optimizations like indexing, compaction and Bloom filters.

Chapter 9 Object Databases (page 193) starts with a review of object-oriented notions and concepts; this review gives particular focus to object identifiers, object normalization and referential integrity. Next, several options for object-relational mapping (ORM) – that is, how to store objects in an RDBMS – are discussed; the ORM approach is exemplified with the Java Persistence API (JPA). The chapter moves on to object-relational databases that offer object-oriented extensions in addition to their basic RDBMS functionalities. Lastly, several issues of storing objects natively with an Object Database Management System (ODBMS) – for example, object persistence and reference management – are attended to.
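
The observation behind Chapter 7, that data in a column is usually more uniform than data in a row, can be illustrated with a minimal Python sketch. It is not taken from the book: the sample table and the helper names run_length_encode and run_length_decode are assumptions made up for illustration, and run-length encoding merely stands in for the family of column compression schemes the chapter surveys.

```python
from itertools import groupby

# Hypothetical sample table (an assumption for illustration): one tuple per row
# with the attributes (id, country, status).
rows = [
    (1, "Germany", "active"),
    (2, "Germany", "active"),
    (3, "Germany", "inactive"),
    (4, "France", "inactive"),
    (5, "France", "inactive"),
]


def run_length_encode(values):
    """Collapse each run of equal adjacent values into a (value, run_length) pair."""
    return [(value, len(list(run))) for value, run in groupby(values)]


def run_length_decode(pairs):
    """Expand (value, run_length) pairs back into the original value sequence."""
    return [value for value, count in pairs for _ in range(count)]


# Row-wise layout: the attribute values of each row are stored next to each other.
row_layout = [value for row in rows for value in row]

# Column-wise layout: all values of one attribute are stored next to each other.
columns = [list(column) for column in zip(*rows)]

row_runs = run_length_encode(row_layout)
column_runs = [run_length_encode(column) for column in columns]

print("runs in row-wise layout:", len(row_runs))
for name, runs in zip(("id", "country", "status"), column_runs):
    print(f"runs in column {name!r}:", runs)
```

In this made-up table the row-wise sequence never repeats a value in adjacent positions, so run-length encoding gains nothing, whereas the country and status columns each collapse into two runs. Columns with many distinct values, such as id here, do not benefit from this particular scheme.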

Part III Distributed Data Management treats the core concepts of data management when data are scaled out – that is, data are distributed in a network of database servers.

Chapter 10 Distributed Database Systems (page 235) looks at the basics of data distribution. Failures in distributed systems and requirements for distributed database management systems are addressed.

Chapter 11 Data Fragmentation (page 245) targets ways to split data across a set of servers, which are also known under the terms partitioning or sharding. Several fragmentation strategies for each of the different data models are discussed. Special focus is given to consistent hashing.

Chapter 12 Replication And Synchronization (page 261) elucidates the background on replication for the sake of increased availability and reliability of the database systems. Afterwards, replication-related issues like distributed concurrency control and consensus protocols as well as hinted handoff and Merkle trees are discussed.


Chapter 13 Consistency (page 295) touches upon the topic of relaxing strong consistency requirements known from RDBMSs into weaker forms of consistency.
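
As a rough illustration of consistent hashing, which Chapter 11 singles out among the fragmentation techniques, the following Python sketch places servers and keys on a hash ring and assigns each key to the next server on the ring. It is a sketch under stated assumptions rather than the book's presentation: the class name ConsistentHashRing, the server and key names, the choice of MD5 and the "next position at or after the key, wrapping around" rule are all chosen here for illustration, and refinements such as virtual nodes or replication are left out.

```python
import bisect
import hashlib


def ring_position(name: str) -> int:
    """Map a string to a position on the ring (MD5 is an illustrative choice only)."""
    return int(hashlib.md5(name.encode("utf-8")).hexdigest(), 16)


class ConsistentHashRing:
    """Hypothetical minimal hash ring: each key belongs to the next server on the ring."""

    def __init__(self, servers=()):
        self._positions = []  # sorted ring positions of all servers
        self._owner = {}      # ring position -> server name
        for server in servers:
            self.add_server(server)

    def add_server(self, server: str) -> None:
        position = ring_position(server)
        bisect.insort(self._positions, position)
        self._owner[position] = server

    def remove_server(self, server: str) -> None:
        position = ring_position(server)
        self._positions.remove(position)
        del self._owner[position]

    def lookup(self, key: str) -> str:
        """Return the server at the first ring position at or after the key's position."""
        if not self._positions:
            raise LookupError("ring has no servers")
        index = bisect.bisect_left(self._positions, ring_position(key))
        # Wrap around to the first server when the key lies behind the last one.
        return self._owner[self._positions[index % len(self._positions)]]


ring = ConsistentHashRing(["server-a", "server-b", "server-c"])
keys = [f"record-{i}" for i in range(100)]

before = {key: ring.lookup(key) for key in keys}
ring.remove_server("server-b")
after_removal = {key: ring.lookup(key) for key in keys}
ring.add_server("server-d")
after_addition = {key: ring.lookup(key) for key in keys}

moved_on_removal = [k for k in keys if before[k] != after_removal[k]]
moved_on_addition = [k for k in keys if after_removal[k] != after_addition[k]]
print("keys moved by removing server-b:", len(moved_on_removal))
print("keys moved by adding server-d:", len(moved_on_addition))
```

The point of the design is that membership changes stay local: removing a server only reassigns the keys that server held, and adding a server only takes over keys from its successor on the ring, in contrast to a plain "hash modulo number of servers" scheme, where most keys would change their server.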

Part IV Conclusion is the final part of this book.

Chapter 14 Further Database Technologies (page 311) gives a cursory overview of related database topics that are out of the scope of this book. Among other topics, it glances at data stream processing, in-memory databases and NewSQL databases.

Chapter 15 Concluding Remarks (page 317) summarizes the main points of this book and discusses approaches for database reengineering and data migration. Lastly, it advocates the idea of polyglot architectures: for each of the different data storage and processing tasks in an enterprise, users are free to choose a database system that is most appropriate for one task while using different database systems for other tasks and lastly integrating these systems into a common storage and processing architecture.

List of Figures

Write-ahead log on disk | 172
Compaction on disk | 173
Leveled compaction | 175
Bloom filter for a data file | 176
A Bloom filter of length m with three hash functions | 178
A partitioned Bloom filter with parameter k and partition length m′ | 181

Generalization (left) versus abstraction (right) | 195
Unnormalized objects | 197
First object normal form | 198
Second object normal form | 198
Third object normal form | 199
Fourth object normal form | 200
Simple class hierarchy | 205
Resident Object Table (grey: resident, white: non-resident) | 227
Edge Marking (grey: resident, white: non-resident) | 228
Node Marking (grey: resident, white: non-resident) | 228

A hash tree for four messages | 242

XML fragmentation with shadow nodes | 252
Graph partitioning with shadow nodes and shadow edges | 253
Data allocation with consistent hashing | 257
Server removal with consistent hashing | 258
Server addition with consistent hashing | 259

Master-slave replication | 262
Master-slave replication with multiple records | 263
Multi-master replication | 263
Failure and recovery of a server | 264
Failure and recovery of two servers | 264
Two-phase commit: commit case | 267
Two-phase commit: abort case | 268
A basic Paxos run without failures | 270
A basic Paxos run with a failing leader | 272
A basic Paxos run with dueling proposers | 273
A basic Paxos run with a minority of failing acceptors | 274
A basic Paxos run with a majority of failing acceptors | 275
Lamport clock with two processes | 279
Lamport clock with three processes | 279
Lamport clock totally ordered by process identifiers | 280
Lamport clock with independent events | 281

Vector clock | 283
Vector clock with independent events | 284
Version vector synchronization with union merge | 287
Version vector synchronization with siblings | 288
Version vector with replica IDs and stale context | 291
Version vector with replica IDs and concurrent write | 292

Interfering operations at three replicas | 296
Serial execution at three replicas | 297
Read-one write-all quorum (left) and majority quorum (right) | 298

Polyglot persistence with integration layer | 321
Lambda architecture | 323
A multi-model database | 324

Index

2PC see two-phase commit
2PL see two-phase locking

acceptor 269, 272
ACID properties 26, 39, 295, 316
adjacency 45, 60
adjacency list 50–51
adjacency matrix 46–48
Aerospike 315
affinity 249
agent 266
all-or-nothing principle 3, 26
AllegroGraph 311
allocation see data allocation
Ambari 120
anomaly 19, 20, 33, 161, 196, 301
anti-entropy 240
ArangoDB 330
array database 313
association 13, 194, 197, 198, 204
association class 13, 198, 204
atomicity 26
attribute 9, 11–14, 17, 19, 22, 34, 53–56, 71, 72, 77, 85, 100, 103, 193, 213, 217–219
  – composite 9, 11, 18, 19, 217
  – key 18, 19, 207
  – multi-valued 9, 11, 13, 18, 203, 204, 218
attribute table 56
Avro 120, 325
axis 81

B-tree 92, 94, 96, 171
backward traversal 52
big data 38
bit-vector encoding 145
Bloom filter 175–181
breadth-first search 44
bucket 108
Byzantine failure 239

candidate key 21
CAP principle 306
causal consistency 304
causality 277, 280
  – effective 304
class 12, 193
client-centric consistency 305–306
clock 276–292
cloud database 39
clustering 249
collision 176
column family 161, 163
column name 163
column qualifier 163, 167–169, 176
column store 143
column striping 151
combine 108
commission failure 238
compaction 173–175, 187, 189, 191, 243
compatibility matrix 99
complete graph 42, 43
composite attribute 9, 11, 18, 19, 217
compression 144
concurrency 24, 25, 131, 237, 261, 263, 280, 282, 283, 288, 290, 319
concurrency control 26–28, 97–100, 139, 266–276, 308
concurrent events 280
consensus problem 266
consistency 4, 26, 237, 261, 263, 271, 295–307, 320, 321, 323
  – eventual see eventual consistency
  – trade-offs 306
  – weak see weak consistency
consistent hashing 257
convergent replicated data types 130
coordinator 266
Couchbase 139
CouchDB 136
counter column 170
crash failure 238

dangling references 200
DAO see Data Access Object
Data Access Object 202
data allocation 255–259
data distribution problem 256
data locality 108, 144, 163
data replication problem 265
data stream 312


data-centric consistency 303
database-as-a-service 39
DataNucleus 229
decision phase 266
definition level 152, 156
depth-first search 44
derived fragmentation 250
DeweyID 80, 96
dictionary encoding 146
difference 22
differential encoding 148
directed graph 43
directed hyperedge 58
directed multigraph 43
distribution transparency 236
Document Object Model 76
document order 76
Document Type Definition 71–73
dotted version vectors 131, 291
Dremel 151
Drill 324
Druid 326
DTD see Document Type Definition
durability 26
Dynamo 257

edge 41, 42
edge cut 248, 254
edge label 53, 54
edge list 46
edge marking 227
edge table 56
end tag 69
entity 8, 17–19, 33, 71, 161, 163, 254
entity lifecycle 211, 215
entity-relationship model 8–11
epidemic protocol 239–241, 265
ERM see entity-relationship model
Eulerian Cycle 45
Eulerian Path 45
eventual consistency 304
eXistDB 100
Extensible Markup Language 69–71
extensible record store 161

fail-recover 239
fail-stop 239
failure 171, 237–239, 262, 264, 265, 267, 268, 271, 272
finite state machine 156
Flink 312
FLWOR expression 82
Flume 120
foreign key 18–20, 33, 84, 86, 87, 89, 111, 163, 203–205, 212, 213
forward traversal 52
fragmentation 245–254
frame of reference encoding 146

generalized hyperedge 59
Geode 315
geographic information system 314
GeoJSON 314
GeoServer 314
GIS see geographic information system
gossip 239–241
graph 41–45
  – complete 42, 43
  – directed 43
  – multi-relational 53
  – oriented 43
  – simple 42, 43
  – single-relational 53
  – undirected 42
  – weighted 44
graph partitioning 252
graph problems 45
graph traversal 44
GRASS GIS 314

Hadoop 118
Hamilton Cycle 45
Hamilton Path 45
happened before 277
happened-before relation 277
hash function 176, 252, 255, 257
hash tree 241–243
Hazelcast 315
HDFS 118
head set 59
hinted handoff 265
history 300
Hive 127
homogeneity 35, 143
horizontal fragmentation 249
hybrid fragmentation 250
hyperedge 58
  – directed 58


  – generalized 59
  – oriented 59
  – undirected 58
hypergraph 58
HyperGraphDB 66
hypernode 61

idempotent 117
identifier overflow 96
identifier stability 96
immutable data files 166
in-memory database 315
incidence 45
  – negative 53
  – positive 53
incidence list 51–53, 61
incidence matrix 48–49, 61
inconsistency window 303, 306
index 90, 100, 103, 171
inlining 86
integrity 4, 24, 25, 118
interface 194, 201
interrelational constraints 18
intersection 22
intrarelational constraints 18
inverse attributes 200
isolation 26, 97, 306

Java Data Objects 215–217
Java Persistence API 209–214
Java Persistence Query Language 209
JavaScript Object Notation 101, 110–112, 116, 229, 314, 319, 325, 326, 328
JDO see Java Data Objects
Jena 311
join 23, 34, 35, 89, 91, 92, 124, 162, 203, 206, 207, 209, 214, 247, 249, 250, 253, 327, 331
JPQL see Java Persistence Query Language
JSON see JavaScript Object Notation
JSON object 110
JSON Schema 112–116

key attribute 18, 19, 207
key-value pair 63, 101, 103, 105, 106, 109, 110, 113, 119, 155, 162, 169, 173

labeling scheme see numbering scheme
lambda architecture 322
Lamport clock see scalar clock
leader 269, 271
lean and mean 317
learner 269, 271
linked data 311
lock escalation 99
locking 7, 27, 98, 319
Log-Structured Merge Tree 171
logical clock 277
lost update 282, 290, 291, 295, 298

main memory 4–7, 83, 92, 143, 162, 163, 170, 171, 208, 209, 223–228, 315
main memory address 6, 226, 227
main memory table 167
map 106, 107
map-reduce 106–109, 118
master-slave replication 262
memtable 167, 172
Merkle tree see hash tree
method 12, 193
method chaining 64
migration 237, 317, 318
MonetDB 158
MongoDB 133
multi-level index 171
multi-master replication 263
multi-model database 322, 324, 327, 330
multi-relational graph 53
multi-valued attribute 9, 11, 13, 18, 203, 204, 218
multiedge 43
multigraph 42, 43
  – directed 43
  – undirected 43
multiplicities 13, 151
multiversion concurrency control 276
MVCC see multiversion concurrency control

natural join 22
negative incidence 53
Neo4J 65
nested graph 61
network partition 238
NewSQL 315
node 41
node label 53, 54, 56
node marking 228
node table 56
node test 81


non-blocking reads 276
non-redundancy 4, 246, 318
non-resident 225
normalization 19, 20, 33, 161, 196–199, 218
Not only SQL 38
null suppression 149
nullipotent 117
numbering scheme 78–81

Object Data Management Group 201
object identifier 194–196
Object Management Group 201
object normal form 196–199
object-relational databases 217–222
object-relational impedance mismatch 194
object-relational mapping 202–217
omission failure 238
one-copy serializability 296, 297, 301
operator tree 23
optimistic concurrency control 26, 98
OrdPath 80, 97
OrientDB 327
oriented graph 43
oriented hyperedge 59

page buffer 5, 167, 208, 209, 225, 226
page split 95
Parquet 158, 325
partial quorum 299
partition tolerance 302, 306, 307
partitioning 108, 245
Paxos 268–274
peer-to-peer replication 263
persistence 4, 200, 202, 213, 215, 216, 223, 224
pessimistic concurrency control 26, 98
Pig 121
point query 168
pointer swizzling 226–228
polyglot persistence 320
position bit-string 150
position list 149
position range 150
positive incidence 53
PostGIS 314
postorder numbering 78
Pre/Dist/Size encoding 102
pre/post diagram 79
pre/post numbering 78
predicate 81
prefix numbering 80
preorder numbering 78
primary key 19–21, 86, 88, 182, 196, 211, 215
projection 22, 214, 249
property 54
property graph 53–55
proposer 269, 271

QGIS 314
quorum 265, 298–299

range query 168
Rasdaman 313
RDF see resource description framework
reachability 213, 215, 224
read phase 269, 276
read repair 265
Read-one write-all 298
recovery 172, 264
recursion 34, 35
Redis 132
redo logging 171
reduce 106, 107
redundancy 4, 8, 19, 33, 203, 206, 218, 262
reengineering 317
referential integrity 20, 200
relation schema 17, 18, 34, 37
relational algebra 22
relational calculus 22, 23
relational query language 22
relationship 9, 19, 34, 41, 58, 65, 194, 200, 213, 224
reliability 4, 236, 261, 264, 291, 323
renaming 22
renumbering 80
repetition level 152, 156
replication 4, 237, 261–266, 301–303, 315
replication factor 261, 263, 298
Representational State Transfer 116–117
resident 225
resident object table 226
resilient distributed datasets 121
resource description framework 311
REST see Representational State Transfer
Riak 129
round-tripping 90
row key 162, 163, 167–170, 176
ROWA see Read-one write-all
rumor spreading 240


run-length encoding 144

Samza 312
scalability 3, 38, 235, 236, 302, 320, 324
scalar clock 277–281
Scalaris 315
schedule 7, 27
schema evolution 8, 37–39, 174, 223, 319
schema independence 37
schema-based mapping 84, 86
schemaless 37, 38, 105, 253, 318, 319, 325
schemaless mapping 84, 89
SciDB 313
selection 22, 248, 250, 251, 253
semantic overloading 34, 41
semi-structured 3, 69, 109, 320
sequential consistency 295, 296
serializability 27, 296, 297, 299, 301
service level agreement 39
Sesame 311
session guarantees 305
shadow node 251, 253
sharding 245, 253
shared-nothing architecture 235
shuffle 106, 107
sibling version 287
Simple API for XML 76
simple graph 42, 43
single-level storage 224
single-relational graph 53
sliding window 312
snapshot 26, 118, 120, 223, 315
snapshot isolation 300–301
  – non-monotonic 304
  – parallel 304
source node 43, 46, 54
source set 58
spanning tree 45
Spark 120
SPARQL 311
specialization 14, 194, 196, 202, 204
split 106, 107
SQL see Structured Query Language
SQL object 217
SQL/XML 84
Sqoop 128
start tag 69
Storm 312
strong clock property 281
Structured Query Language 22, 23
subclass 12, 14, 194, 199, 204–208, 212
superclass 14, 194, 199, 204, 205, 207, 208, 211, 212
synchronization 263, 285, 286, 288, 290

tail set 58
target node 43, 46, 54
target set 59
Tez 120
three-phase commit 268
time-to-live value 164, 166, 167, 169, 173, 187, 244, 291
timestamp 166
timestamp scheduler 27
TinkerPop 62, 327
TokuDB 316
tombstone 167, 172
trailer 171
transaction 24–28, 33, 36, 65, 97, 99, 133, 172, 211, 216, 246, 250, 252, 266, 276, 296, 297, 300–303, 319, 322, 328, 329
transitive closure 34
transparency 236
traversal 44
  – backward 52
  – forward 52
triple store 311
TTL see time-to-live value
tuple reconstruction 144, 148
two-level storage 208
two-phase commit 266
two-phase locking 27
typed table 217, 221, 222

UML see Unified Modeling Language
undirected graph 42
undirected hyperedge 58
undirected multigraph 43
Unified Modeling Language 11, 201
union 22, 34, 35, 207, 247, 250
upsert 117, 166, 172

validation phase 276
vector clock 281–284, 289–292
vector clock bounding 289
vector clock comparison 283
version vector 284–289
versioning 37, 166, 174, 223, 319


vertex 41, 42
vertical fragmentation 249
virtual heap 225, 254
Virtuoso 311
visibility 12, 191, 193, 306
VoltDB 315
voting phase 266

weak clock property 281
weak consistency 39, 299, 302–303
weighted graph 44
wide column store 161
write phase 269, 276
write-ahead logging 172

XML see Extensible Markup Language
XML Parser 75
XML Schema 73–75
XML tree 76
XPath 81–82
XQuery 82–83
XSD see XML Schema
XSLT 83–84

YARN 120

ZooDB 230
ZooKeeper 120
