Webinar: From Relational Databases to MongoDB - What You Need to Know

Post on 01-Nov-2014


DESCRIPTION

Relational databases weren't designed to cope with the scale and agility challenges that face modern applications. MongoDB can offer scalability, performance and ease of use, but proper design is critical to that success. We'll take a dive into how MongoDB works to better understand what non-relational design is, why we might use it and what advantages it gives us. We'll develop schema designs by example, and consider strategies for scale out.

TRANSCRIPT

Engineer

Bryan Reinero

@blimpyacht

Relational to MongoDB

Unhelpful Terms

• NoSQL

• Big Data

• Distributed

What’s the data model?

MongoDB

• Non-relational

• Scalable

• Highly available

• Full featured

• Document database

Terminology

RDBMS ➜ MongoDB

• Table, View ➜ Collection

• Row ➜ Document

• Index ➜ Index

• Join ➜ Embedded Document

• Foreign Key ➜ Reference

• Partition ➜ Shard

Sample Document

{
  maker : "M.V. Agusta",
  type : "sportbike",
  rake : 7,
  trail : 3.93,
  engine : {
    type : "internal combustion",
    layout : "inline",
    cylinders : 4,
    displacement : 750
  },
  transmission : {
    type : "cassette",
    speeds : 6,
    pattern : "sequential",
    ratios : [ 2.7, 1.94, 1.34, 1, 0.83, 0.64 ]
  }
}

Relational DBs

• Attribute columns are valid for every row

• Duplicate rows are not allowed

• Every column has the same type and same meaning

As a document store, MongoDB supports a flexible schema

1st Normal Form: No repeating groups

• Can't use equality to match elements

• Must use regular expressions to find data

• Aggregate functions are difficult

• Updating a specific element is difficult

Product_id   Name     Maker     Categories
1234         Lumia    Nokia     "electronics,hand held, smart phone"
5678         iPad     Apple     "PDA,tablet"
91011        Galaxy   Samsung   "smart phone,tablet"
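The problems listed above can be made concrete with a minimal sketch in plain JavaScript (no database involved). It models rows whose categories are packed into one comma-separated string. The `rows` array and the `hasCategory` helper are hypothetical names invented for illustration.

```javascript
// Hypothetical rows mirroring the table: categories packed into one string.
const rows = [
  { name: "Lumia", categories: "electronics,hand held, smart phone" },
  { name: "iPad", categories: "PDA,tablet" },
  { name: "Galaxy", categories: "smart phone,tablet" },
];

// Equality cannot match a single element: the column holds the whole list.
const byEquality = rows.filter((r) => r.categories === "tablet");

// A pattern match is needed instead, anchored on the comma separators.
function hasCategory(row, category) {
  // Escape regex metacharacters so the category is matched literally.
  const escaped = category.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  return new RegExp("(^|,)\\s*" + escaped + "\\s*(,|$)").test(row.categories);
}
const byPattern = rows.filter((r) => hasCategory(r, "tablet")).map((r) => r.name);

console.log(byEquality.length); // 0
console.log(byPattern);         // [ 'iPad', 'Galaxy' ]
```

Counting products per category, or renaming one category inside the string, would need the same kind of string surgery on every row.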

The Tao of MongoDB

{
  _id : ObjectId(),
  maker : "Nokia",
  name : "Lumia",
  categories : [ "electronics", "handheld", "smart phone" ]
}

// querying is easy
db.products.find( { "categories" : "handheld" } );

// can be indexed
db.products.ensureIndex( { "categories" : 1 } );
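Why an index on an array field helps can be sketched in plain JavaScript. The simplified idea assumed here is that the index holds one entry per array element, so a lookup by element does not have to scan every document. `buildMultikeyIndex` and the sample `products` are hypothetical, and this toy map is not a description of how MongoDB actually stores its indexes.

```javascript
// Hypothetical sample documents with an array-valued field.
const products = [
  { _id: 1, name: "Lumia", categories: ["electronics", "handheld", "smart phone"] },
  { _id: 2, name: "iPad", categories: ["electronics", "tablet"] },
  { _id: 3, name: "Galaxy", categories: ["smart phone", "tablet"] },
];

// Build a toy index: one entry per array element, pointing back at document ids.
function buildMultikeyIndex(docs, field) {
  const index = new Map();
  for (const doc of docs) {
    for (const value of doc[field]) {
      if (!index.has(value)) index.set(value, []);
      index.get(value).push(doc._id);
    }
  }
  return index;
}

const categoryIndex = buildMultikeyIndex(products, "categories");
console.log(categoryIndex.get("handheld"));    // [ 1 ]
console.log(categoryIndex.get("smart phone")); // [ 1, 3 ]
```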

// updates are easy
db.products.update(
  { "categories" : "electronics" },
  { $set : { "categories.$" : "consumer electronics" } }
);

// aggregation: unwind the array, then tally the occurrences
db.products.aggregate( [
  { $unwind : "$categories" },
  { $group : { "_id" : "$categories", "counts" : { "$sum" : 1 } } }
] );

"result" : [
  { "_id" : "smart phone", "counts" : 1589 },
  { "_id" : "handheld", "counts" : 2403 },
  { "_id" : "electronics", "counts" : 4767 }
]
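The two pipeline steps, unwinding the array and tallying the occurrences, can be imitated in plain JavaScript to show what each one does. This is a sketch of the semantics only, not of the server's implementation. The `unwind` and `groupCount` helpers and the three sample documents are hypothetical.

```javascript
// Hypothetical sample documents.
const products = [
  { _id: 1, name: "Lumia", categories: ["electronics", "handheld", "smart phone"] },
  { _id: 2, name: "iPad", categories: ["electronics", "tablet"] },
  { _id: 3, name: "Galaxy", categories: ["smart phone", "tablet"] },
];

// Step 1, in the spirit of $unwind: emit one copy of the document per array element.
function unwind(docs, field) {
  return docs.flatMap((doc) => doc[field].map((value) => ({ ...doc, [field]: value })));
}

// Step 2, in the spirit of $group with { $sum : 1 }: count documents per key.
function groupCount(docs, field) {
  const counts = new Map();
  for (const doc of docs) {
    counts.set(doc[field], (counts.get(doc[field]) || 0) + 1);
  }
  return [...counts].map(([key, n]) => ({ _id: key, counts: n }));
}

const unwound = unwind(products, "categories");
const result = groupCount(unwound, "categories");

console.log(unwound.length); // 7
console.log(result);
```

With these three documents the unwind step yields seven single-category documents, and the tally reports two each for "electronics", "smart phone" and "tablet", and one for "handheld".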

Meh, big deal…. Right?

Aren’t nested structures just a pre-joined schema?

• I could use an adjacency list

• I could use an intersection table

Goals of Normalization

• Model data in an understandable form

• Reduce fact redundancy and data inconsistency

• Enforce integrity constraints

Performance is not a primary goal

Normalize or Denormalize

Commonly held that denormalization is faster

• Normalization can be fast, right? Requires proper indexing, and indexing affects write performance

• Does denormalization commit me to a join strategy? Indexing overhead is a commitment too

• Does denormalization improve a finite set of queries at the cost of several others? MongoDB works best in service to an application
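The trade-off in these bullets can be sketched in plain JavaScript. A referenced (normalized) model needs a second lookup to resolve the reference, while an embedded (denormalized) model returns everything in one read but copies the maker data into each document. All names here (`makers`, `vehiclesReferenced`, `vehiclesEmbedded`, `findWithMaker`) are hypothetical, and in-memory structures stand in for collections.

```javascript
// Referenced model: vehicles point at makers by id, like a foreign key.
const makers = new Map([[1, { _id: 1, name: "M.V. Agusta", country: "Italy" }]]);
const vehiclesReferenced = [{ _id: 10, name: "F4", maker_id: 1 }];

// Embedded model: the maker data is stored inside each vehicle document.
const vehiclesEmbedded = [
  { _id: 10, name: "F4", maker: { name: "M.V. Agusta", country: "Italy" } },
];

// With references, the application performs the "join" itself: two lookups.
function findWithMaker(vehicleId) {
  const vehicle = vehiclesReferenced.find((v) => v._id === vehicleId);
  if (!vehicle) return null;
  const maker = makers.get(vehicle.maker_id);
  return {
    _id: vehicle._id,
    name: vehicle.name,
    maker: { name: maker.name, country: maker.country },
  };
}

// With embedding, a single lookup returns the same shape.
const embedded = vehiclesEmbedded.find((v) => v._id === 10);
const referenced = findWithMaker(10);

console.log(JSON.stringify(embedded) === JSON.stringify(referenced)); // true
```

The embedded form serves this one read pattern well. A query that starts from makers, or an update to a maker's details, is where the referenced form becomes easier.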

Object–Relational Impedance Mismatch

• Inheritance hierarchies

• Polymorphic associations

Table Per Subclass

• Vehicles: vin, registration, maker

• Motorcycle: engine, rake, trail

• Racebike: racing number, class, team, rider

Table Per Subclass

Vehicles
- electric
  - car
  - bus
  - motorcycle
- internal combustion
  - motorcycle
  - aircraft
- human powered
  - bicycle
  - skateboard
- horsedrawn

Table Per Concrete Class

• Each class is mapped to a separate table

• Inherited fields are present in each class' table

• Can't support polymorphic relationships

SELECT maker FROM Motorcycles WHERE Motorcycles.country = 'Italy'
UNION
SELECT maker FROM Automobiles WHERE Automobiles.country = 'Italy'

Table Per Class Family

• Classes mapped to a single table

• Discriminator column to identify class

• Many empty columns, nullability issues

Vehicle_id   Name        Maker         Type (discriminator)
1234         F4          M.V. Agusta   sportbike
5678         A104        M.V. Agusta   helicopter
91011        Triton 95   Triton        submarine

maker = "M.V. Agusta", type = "sportbike", num_doors = 0, wing_area = 0, maximum_depth = 0 ???

The Tao of MongoDB

{
  maker : "M.V. Agusta",
  type : "sportbike",
  engine : {
    type : "internal combustion",
    cylinders : 4,
    displacement : 750
  },
  rake : 7,
  trail : 3.93
}

{
  maker : "M.V. Agusta",
  type : "helicopter",
  engine : {
    type : "turboshaft",
    layout : "axial",
    massflow : 1318
  },
  blades : 4,
  undercarriage : "fixed"
}

• Discriminator column

• Shared indexing strategy

• Polymorphic attributes
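A minimal sketch, in plain JavaScript, of how one collection can hold both document shapes shown above. A shared field (`maker`) supports a common query across every subclass, while the `type` field acts as the discriminator for shape-specific handling. The `vehicles` array stands in for a collection, and `describe` is a hypothetical helper.

```javascript
// Two differently shaped documents living side by side.
const vehicles = [
  { maker: "M.V. Agusta", type: "sportbike",
    engine: { type: "internal combustion", cylinders: 4, displacement: 750 },
    rake: 7, trail: 3.93 },
  { maker: "M.V. Agusta", type: "helicopter",
    engine: { type: "turboshaft", layout: "axial", massflow: 1318 },
    blades: 4, undercarriage: "fixed" },
];

// One query over a shared attribute covers every subclass; no UNION is needed.
const byMaker = vehicles.filter((v) => v.maker === "M.V. Agusta");

// The discriminator selects shape-specific logic.
function describe(v) {
  switch (v.type) {
    case "sportbike":
      return `${v.engine.cylinders}-cylinder bike, trail ${v.trail}`;
    case "helicopter":
      return `${v.blades}-blade helicopter, ${v.undercarriage} undercarriage`;
    default:
      return "unknown vehicle type";
  }
}

const descriptions = byMaker.map(describe);
console.log(descriptions);
// [ '4-cylinder bike, trail 3.93', '4-blade helicopter, fixed undercarriage' ]

// Attributes that do not apply are simply absent, not zero or NULL.
console.log("blades" in vehicles[0]); // false
```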

Relaxed ACID

• Atomic operations at the Document level

• Consistency - strong / eventual

• Isolation - read lock, write lock / logical database

• Durability - write ahead journal, replication

The Tao of MongoDB

• Document database

• Flexible schema

• Relaxed ACID

This favors denormalization. What’s the consequence?

Scaling MongoDB

[Diagram: Client Application connected to MongoDB, deployed as a Single Instance or Replica Set, or as a Sharded cluster]

Partitioning

• User defines shard key

• Shard key defines range of data

• Key space is like points on a line

• Range is a segment of that line
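The "points on a line" picture can be sketched in plain JavaScript: each range is a segment of the key space, and locating a document means finding the segment that contains its shard key. The boundaries in `chunks` and the `shardFor` helper are hypothetical values chosen for illustration, not anything MongoDB would pick.

```javascript
// Hypothetical range table: each entry owns the half-open key range [min, max).
const chunks = [
  { min: -Infinity, max: 2000, shard: "Shard 1" },
  { min: 2000, max: 4000, shard: "Shard 2" },
  { min: 4000, max: 5000, shard: "Shard 3" },
  { min: 5000, max: Infinity, shard: "Shard 4" },
];

// Find the range whose segment contains the shard key value.
function shardFor(key) {
  const chunk = chunks.find((c) => key >= c.min && key < c.max);
  return chunk.shard;
}

for (const vehicleId of [1234, 2345, 3456, 4567, 5678]) {
  console.log(vehicleId, "->", shardFor(vehicleId));
}
// 1234 -> Shard 1
// 2345 -> Shard 2
// 3456 -> Shard 2
// 4567 -> Shard 3
// 5678 -> Shard 4
```

Because the ranges cover the whole line without overlapping, every key value lands in exactly one segment.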

The Mechanism of Sharding

[Diagram sequence: the complete data set, with the shard key defined on vehicle id (sample keys 1234, 2345, 3456, 4567, 5678), is divided into chunks; the chunks are then distributed across Shard 1, Shard 2, Shard 3 and Shard 4]

Targeted Operations

Client ➜ mongos ➜ Shard 1, Shard 2, Shard 3, Shard 4
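What makes an operation targeted can be sketched in plain JavaScript. When the query includes the shard key, the router can send it to the one shard that owns that key; otherwise it has to ask every shard. The `route` function, the range boundaries and the `vehicle_id` field name are assumptions made for illustration, not mongos internals, and only equality on the shard key is handled.

```javascript
// Hypothetical range table: each entry owns the half-open key range [min, max).
const chunks = [
  { min: -Infinity, max: 2000, shard: "Shard 1" },
  { min: 2000, max: 4000, shard: "Shard 2" },
  { min: 4000, max: 5000, shard: "Shard 3" },
  { min: 5000, max: Infinity, shard: "Shard 4" },
];

// Return the list of shards a query must visit.
function route(query, shardKey) {
  if (shardKey in query) {
    // Targeted: the shard key value identifies a single owning shard.
    const key = query[shardKey];
    const chunk = chunks.find((c) => key >= c.min && key < c.max);
    return [chunk.shard];
  }
  // No shard key in the query: every shard has to be consulted.
  return [...new Set(chunks.map((c) => c.shard))];
}

const targeted = route({ vehicle_id: 3456 }, "vehicle_id");
const broadcast = route({ maker: "M.V. Agusta" }, "vehicle_id");

console.log(targeted);  // [ 'Shard 2' ]
console.log(broadcast); // [ 'Shard 1', 'Shard 2', 'Shard 3', 'Shard 4' ]
```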

Data Growth

Shard 1 Shard 2 Shard 3 Shard 4

Load Balancing

Relational if you need to

• Enforce data constraints

• Service a broad set of queries

• Minimize redundancy

The Tao of MongoDB

• Avoid ad-hoc queries

• Model data for use, not storage

• Index effectively, index efficiently

Engineer, 10gen

Bryan Reinero

@blimpyacht

Thank You
