NoSQL – rise of the clusters

September 19, 2013 Speaker: David Wolfe

Upload: responseteam

Post on 12-May-2015


Category:

Technology



DESCRIPTION

What is NoSQL? NoSQL describes a family of approaches to managing data at an enterprise level that share key similarities but are, at the same time, very different from classic SQL-based relational databases. NoSQL has emerged as a 'movement' over the last 5 years, and many specific NoSQL datastores (MongoDB, Redis, HBase, Cassandra, Neo4j) are being used for mission-critical systems by many organizations, including Facebook, LinkedIn, Dropbox, American Express, the NSA, and the CIA. Does NoSQL spell the end of SQL-based relational datastores like Oracle, MySQL, SQL Server, and Sybase? Definitely not, but the world is moving in the direction of "Polyglot Persistence" and away from the "Relational Persistence" hegemony. In my presentation I will explain why this shift is occurring and will speculate about what the future will hold.

TRANSCRIPT

Page 1: No sql – rise of the clusters

September 19, 2013

Speaker: David Wolfe

Page 2: No sql – rise of the clusters

Topics

What is SQL? What is NoSQL?

Why have relational databases been successful?

Why did NoSQL databases emerge?

How are their data models different?

Page 3: No sql – rise of the clusters
Page 4: No sql – rise of the clusters

SQL & relational databases

Relational databases are software applications that store data

Data is stored in tables that have rows & columns: think Excel spreadsheets

FirstName  LastName   Age  Zipcode  Gender
Bob        Smith      45   38444    M
Jane       Happy      23   15122    F
Fred       Jones      55   92102    M
Johnny     Appleseed  26   90025    M

Page 5: No sql – rise of the clusters

SQL & relational databases

Relational databases typically have many tables that are “related” to one another

Page 6: No sql – rise of the clusters

SQL & relational databases

Relational databases support access to data in tables through a language called “SQL” – Structured Query Language

SQL supports “set” based operations on tables – selection, projection, joining

SQL is based on relational algebra
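
As a rough illustration of the three set operations named above, here is a minimal sketch using Python's built-in sqlite3 module. The people rows are the ones from the earlier slide; the zip_regions table and its region labels are made up purely so there is something to join against.

```python
import sqlite3

# In-memory database. "people" mirrors the table on the earlier slide;
# "zip_regions" is a hypothetical second table added only for the join.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE people (
        first_name TEXT, last_name TEXT, age INTEGER, zipcode TEXT, gender TEXT
    );
    CREATE TABLE zip_regions (zipcode TEXT, region TEXT);
""")
conn.executemany("INSERT INTO people VALUES (?, ?, ?, ?, ?)", [
    ("Bob", "Smith", 45, "38444", "M"),
    ("Jane", "Happy", 23, "15122", "F"),
    ("Fred", "Jones", 55, "92102", "M"),
    ("Johnny", "Appleseed", 26, "90025", "M"),
])
conn.executemany("INSERT INTO zip_regions VALUES (?, ?)", [
    ("92102", "West"), ("90025", "West"), ("15122", "East"),
])

# Selection: keep only the rows that satisfy a predicate.
selection = conn.execute(
    "SELECT * FROM people WHERE age > 40 ORDER BY age"
).fetchall()

# Projection: keep only some of the columns.
projection = conn.execute(
    "SELECT first_name, age FROM people ORDER BY age"
).fetchall()

# Join: combine rows from two tables that agree on zipcode.
joined = conn.execute("""
    SELECT p.first_name, r.region
    FROM people p JOIN zip_regions r ON p.zipcode = r.zipcode
    ORDER BY p.first_name
""").fetchall()

print(selection)
print(projection)
print(joined)
```

Each query takes whole tables in and gives a table back, which is what “set based” means here. Bob does not appear in the join result because no zip_regions row matches his zipcode.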

Page 7: No sql – rise of the clusters

SQL & relational databases

The relational model was proposed at IBM in 1970, and IBM built early relational database prototypes during the 1970s

They remained the dominant approach to data management in the enterprise through the early 2000s

Examples include

Oracle

Sybase

MySQL

PostgreSQL

Page 8: No sql – rise of the clusters
Page 9: No sql – rise of the clusters

NoSQL databases

NoSQL databases are software applications that store data

They, not surprisingly, do not use SQL or the relational model (interrelated tables)

They are “less strict” about data definition

They were developed in a “big-data” world for applications needing massive scalability (clustering)

Page 10: No sql – rise of the clusters

NoSQL databases

There are many types of NoSQL databases

We will review the differences later

Page 11: No sql – rise of the clusters
Page 12: No sql – rise of the clusters

RDBMS value - persistence

During the 90s and 2000s, as PCs became ubiquitous, distributed computing took off.

In the 1990s, client-server and n-tier architectures dominated enterprise development

The late 90s and 2000s saw the dominance of the web and distributed applications that broke out of the enterprise

Page 13: No sql – rise of the clusters

RDBMS value - persistence

In this distributed world, applications needed to keep data around for

Many users

Extended periods

RDBMS emerged as the de facto choice for persisting data.

Page 14: No sql – rise of the clusters

RDBMS value - concurrency

Another challenge that distributed applications presented was concurrency:

many users viewing and potentially updating the same data at the same time

Concurrency is notoriously difficult to get right, even for the best engineers.

Relational databases “helped” by controlling data access with transactions
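
To make the role of transactions concrete, here is a minimal sketch using Python's built-in sqlite3 module. The accounts table, the names, and the transfer function are hypothetical. The example uses a single connection, so it only shows the all-or-nothing side of a transaction, not isolation between many simultaneous users.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The CHECK constraint makes an overdraft fail inside the database.
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                 [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move money between two rows; both updates commit or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst))
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src))
        return True
    except sqlite3.IntegrityError:
        # The debit violated the CHECK constraint, so the credit made
        # just before it was rolled back as well.
        return False


def balances(conn):
    """Return the current balances as a {name: balance} dict."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))


print(transfer(conn, "alice", "bob", 30), balances(conn))
print(transfer(conn, "alice", "bob", 500), balances(conn))
```

The second transfer fails halfway through, yet no reader ever sees bob credited without alice being debited. That guarantee is what the application developer would otherwise have to build by hand.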

Page 15: No sql – rise of the clusters

RDBMS value - integration

Enterprise application eco-systems necessitate multiple integrated software applications. Examples:

Customer Service app

Biz Intel app

E-Commerce app

Inventory management apps

A common approach was to integrate these applications through a shared RDBMS database.

Page 16: No sql – rise of the clusters

RDBMS value – SQL

RDBMS providers all supported a core SQL standard

In theory this would allow developers to switch between RDBMS providers without problems

In fact, different providers (Oracle, Sybase, Microsoft) developed different “dialects” or SQL extensions (PL/SQL vs. T-SQL)

Page 17: No sql – rise of the clusters
Page 18: No sql – rise of the clusters

Crack #1– impedance mismatch

Impedance mismatch is the difference between the relational model and in-memory data structures

Page 19: No sql – rise of the clusters

Crack #1– impedance mismatch

In the late 1990s people believed that impedance mismatch would lead to RDBMS being replaced by databases that replicated in-memory structures to disk (OODBMS)

While the 1990s saw the rise of OO programming languages, OODBMS never gained real traction

Page 20: No sql – rise of the clusters

Crack #1– impedance mismatch

OODBMS didn’t gain traction because

Impedance mismatch had been made easier to deal with by Object-Relational (OR) mapping frameworks like Hibernate, iBatis, & Cocoon

There was a growing professional divide between application developers and database administrators

The value of RDBMS as an app integration mechanism was large

Page 21: No sql – rise of the clusters

Crack #2– SOA

The 2000s saw a shift in how enterprise applications interacted

Historically, many applications interacted through a shared RDBMS.

This approach – shared integration RDBMS – has serious problems

Overly complex schema

Can’t change tables or add indices easily

Database has to preserve integrity

Page 22: No sql – rise of the clusters

Crack #2– SOA

Interactions between applications shifted to web-services

Web-services constituted protocols for moving documents (XML, JSON) over HTTP using SOAP or REST based approaches

SOA allowed applications to encapsulate data and expose it through services

Page 23: No sql – rise of the clusters

The Final Crack #3– Clusters

The internet saw several large web properties dramatically increase in scale

Websites started tracking activity and structure in a very detailed way

Social gestures

Social links

Log data

Purchase gestures

Increasing numbers of users appeared using more devices

Page 24: No sql – rise of the clusters

The Final Crack #3– Clusters

The problem with scaling out (clustering) is that RDBMS are not designed to run on clusters.

Oracle RAC & MS SQL Server both use the concept of a shared disk sub-system

This is still a single point of failure and a scaling limitation

The final crack – mismatch between RDBMS & clusters

Page 25: No sql – rise of the clusters

NoSQL Emergence

The emergence of NoSQL was really about needing databases that run on clusters

One exception is graph databases

Though problems with shared database integration and impedance mismatch existed, it was the need for scale that drove the emergence of NoSQL databases

Page 26: No sql – rise of the clusters
Page 27: No sql – rise of the clusters

Aggregate Data Models

A key characteristic of NoSQL databases is that they do not use the Relational data metamodel (relations & tuples)

There are four types of data metamodels in the NoSQL eco-system

Key-value

Document

Column-family

Graph

Page 28: No sql – rise of the clusters

Aggregate Data Models

Key-value, document, and column-family NoSQL databases share a common characteristic of their data models called “aggregate orientation”

We will not cover graph-based data metamodels in this presentation

Page 29: No sql – rise of the clusters

Aggregates

The relational model takes information you want to store and divides it into rows.

Rows are lists of simple data values.

Rows are the unit of data operation

Aggregate orientation recognizes that data units can often be more complex and can have nested lists and record structures

Page 30: No sql – rise of the clusters

Aggregates

The relational model takes information you want to store and divides it into rows.

In RDBMS rows are lists of simple data values.

In RDBMS rows are the unit of data operation

Aggregate orientation recognizes that data units can often be more complex and can have nested lists and record structures

With aggregate orientation, the aggregate is the unit of data operation

Page 31: No sql – rise of the clusters

Relational Data Example

Page 32: No sql – rise of the clusters

Aggregate Data Example
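
The figures for these two example slides are not part of the transcript. As a stand-in, here is a minimal sketch with made-up data: the same order held first as flat rows in three hypothetical tables, then assembled into a single nested aggregate. The table layouts, ids, and product names are all assumptions for illustration.

```python
import json

# Relational form: flat rows of simple values, linked by ids.
customers = {1: "Ann"}                      # customer_id -> name
orders = [(99, 1)]                          # (order_id, customer_id)
order_items = [(99, "keyboard", 1),         # (order_id, product, quantity)
               (99, "monitor", 2)]


def order_aggregate(order_id):
    """Assemble one nested order aggregate from the flat rows."""
    customer_id = next(cid for oid, cid in orders if oid == order_id)
    return {
        "order_id": order_id,
        "customer": {"id": customer_id, "name": customers[customer_id]},
        "items": [
            {"product": product, "quantity": quantity}
            for oid, product, quantity in order_items
            if oid == order_id
        ],
    }


# Aggregate form: one unit with a nested record and a nested list.
aggregate = order_aggregate(99)
print(json.dumps(aggregate, indent=2))
```

In the relational form the order only exists once the three row sets are joined. In the aggregate form the order is a single chunk that can be stored, fetched, and moved as one unit.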

Page 33: No sql – rise of the clusters

Consequences of Aggregate Orientation

Relations capture data elements and relations, but not aggregates.

Aggregates are really “chunks” of data that are typically retrieved and operated on as an interaction unit.

Aggregates are about how the data is being used.

RDBMS do not have knowledge of aggregate structure and can’t use it to store and distribute data

Page 34: No sql – rise of the clusters

Consequences of Aggregate Orientation

So, RDBMS are aggregate-ignorant. Is that a bad or good thing? It’s both

It’s good if you need to access and use the data in many different ways – if you don’t have a primary structure for manipulating your data

It’s bad if you want to run on a cluster.

Aggregates are great on clusters because you can distribute them across nodes
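
One simple way to picture this distribution is to hash each aggregate's key and use the result to pick a node, so every aggregate lands whole on exactly one machine. The sketch below assumes a fixed list of hypothetical node names. Real datastores use more elaborate schemes that cope with nodes joining and leaving, which this does not attempt.

```python
import hashlib

NODES = ["node-a", "node-b", "node-c"]  # hypothetical cluster members


def node_for(key, nodes=NODES):
    """Pick a node from a stable hash of the aggregate's key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return nodes[int.from_bytes(digest[:8], "big") % len(nodes)]


def place(aggregates, nodes=NODES):
    """Group aggregate keys by the node that owns them."""
    placement = {node: [] for node in nodes}
    for key in aggregates:
        placement[node_for(key, nodes)].append(key)
    return placement


# Six made-up order aggregates, each one a self-contained chunk of data.
aggregates = {f"order:{i}": {"items": []} for i in range(6)}
for node, keys in place(aggregates).items():
    print(node, keys)
```

Because an aggregate is never split, reading or writing one order touches a single node. Data that is split into related rows has no such natural unit to place on a node, which is the problem aggregate-ignorant databases face on a cluster.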

Page 35: No sql – rise of the clusters

Consequences of Aggregate Orientation

Aggregate orientation allows you to operate on many logical data items (in the aggregate) by updating the aggregate atomically

Aggregate oriented NoSQL databases can be said to support transactions on single aggregates, but not across aggregates
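
The idea of a transaction scoped to one aggregate can be sketched with a toy in-memory store. The AggregateStore class below is hypothetical and not modelled on any particular product. It applies a change to a private copy of the aggregate and swaps the copy in only if the whole change succeeds.

```python
import copy
import threading


class AggregateStore:
    """Toy store: each update to a single aggregate is all-or-nothing."""

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def put(self, key, aggregate):
        with self._lock:
            self._data[key] = copy.deepcopy(aggregate)

    def get(self, key):
        with self._lock:
            return copy.deepcopy(self._data[key])

    def update(self, key, change):
        """Apply change() to a copy; swap it in only if change() succeeds."""
        with self._lock:
            draft = copy.deepcopy(self._data[key])
            change(draft)            # may touch many fields of the aggregate
            self._data[key] = draft  # reached only if no exception was raised


def add_keyboard(order):
    # Two logical data items change together: the item list and the total.
    order["items"].append("keyboard")
    order["total"] += 50


store = AggregateStore()
store.put("order:99", {"items": [], "total": 0})
store.update("order:99", add_keyboard)
print(store.get("order:99"))
```

The store offers no way to update two different keys in one step, which mirrors the limitation on the slide: atomic within an aggregate, not across aggregates.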

Page 36: No sql – rise of the clusters

Key-Value & Document Data Models

Both types of databases have a key or Id that is mapped to an aggregate data structure in a virtual table

With key-value NoSQL dbs, we can only access the aggregate by looking up its key

With document databases we can also look up aggregates by fields in the aggregate
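
The difference between the two lookup styles can be shown with two toy in-memory classes. Both classes, their method names, and the sample customers are hypothetical and do not correspond to any real product's API.

```python
class KeyValueStore:
    """The value is opaque to the store: lookup is by key only."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data[key]


class DocumentStore(KeyValueStore):
    """The store can see inside the aggregate, so it can query by field."""

    def find(self, **criteria):
        return [
            doc for doc in self._data.values()
            if all(doc.get(field) == value
                   for field, value in criteria.items())
        ]


docs = DocumentStore()
docs.put("cust:1", {"name": "Ann", "city": "Chicago"})
docs.put("cust:2", {"name": "Raj", "city": "Boston"})
docs.put("cust:3", {"name": "Lee", "city": "Chicago"})

print(docs.get("cust:2"))                            # lookup by key
print([d["name"] for d in docs.find(city="Chicago")])  # lookup by field
```

A key-value store could hold the same three customers, but finding everyone in Chicago would mean fetching every value and inspecting it in the application.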

Page 37: No sql – rise of the clusters

Key-Value & Document Data Models

Examples of Key-Value NoSQL dbs are

Redis

Examples of Document NoSQL dbs are

MongoDB

Couchbase

SimpleDB

Page 38: No sql – rise of the clusters

Column-Family Data Models

These NoSQL databases were influenced by Google’s BigTable

The column-family model is a two-level aggregate structure

There is a key (row identifier) that maps to the aggregate of interest

The aggregate is a map of more detailed values – these are referred to as columns

Page 39: No sql – rise of the clusters

Column-Family Data Models

Page 40: No sql – rise of the clusters

Column-Family Data Models

Column-family dbs organize columns into families

The data is row-oriented

Each row is an aggregate (e.g. Customer with id 1234)

The data is column-oriented

Each column family defines a record type (customer profile)

But, columns can also be dynamic and unique (to model lists)
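
The two-level structure can be pictured as nested maps: a row key leads to column families, and each family is a map from column name to value. The sketch below uses plain Python dictionaries. The family names, column names, and values are invented for illustration, and the put and get_family helpers are hypothetical.

```python
from collections import defaultdict

# row key -> column family -> column name -> value
table = defaultdict(lambda: defaultdict(dict))


def put(row_key, family, column, value):
    table[row_key][family][column] = value


def get_family(row_key, family):
    """Return all columns of one family for one row as a plain dict."""
    return dict(table[row_key][family])


# "profile" behaves like a record type with a fixed set of columns.
put("1234", "profile", "name", "Ann")
put("1234", "profile", "billing_zip", "60601")

# "orders" uses dynamic column names, one per order, to model a list.
put("1234", "orders", "order:99", "keyboard x1")
put("1234", "orders", "order:104", "monitor x2")

print(get_family("1234", "profile"))
print(sorted(get_family("1234", "orders")))
```

Reading row "1234" gives the whole customer aggregate (the row-oriented view), while reading just one family pulls back a group of columns that tend to be used together (the column-oriented view).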

Page 41: No sql – rise of the clusters

Column-Family Data Models

Examples of Column-Family NoSQL dbs are

HBase

Cassandra

Page 42: No sql – rise of the clusters

Polyglot Persistence

The future?

Only NoSQL?

Only SQL?

Probably both – Polyglot Persistence