java tech & tools | beyond the data grid: coherence, normalisation, joins and linear scalability...

107
ODC Beyond The Data Grid: Coherence, Normalisation, Joins and Linear Scalability Ben Stopford : RBS

Upload: jax-london

Post on 15-May-2015

1.479 views

Category:

Technology


1 download

DESCRIPTION

2011-11-02 | 02:25 PM - 03:15 PMIn 2009 RBS set out to build a single store of trade and risk data that all applications in the bank could access simultaniously. This talk discusses a number of novel techniques that were developed as part of this work. Based on Oracle Coherence the ODC departs from the trend set by most caching solutions by holding its data in a normalised form making it both memory efficient and easy to change. However it does this in a novel way that supports most arbitrary queries without the usual problems associated with distributed joins. We'll be discussing these patterns as well as others that allow linear scalability, fault tolerance and millisecond latencies.

TRANSCRIPT

Page 1: Java Tech & Tools | Beyond the Data Grid: Coherence, Normalisation, Joins and Linear Scalability | Ben Stopford

ODCBeyond The Data Grid: Coherence, Normalisation, Joins and Linear Scalability

Ben Stopford : RBS

Page 2: Java Tech & Tools | Beyond the Data Grid: Coherence, Normalisation, Joins and Linear Scalability | Ben Stopford
Page 3: Java Tech & Tools | Beyond the Data Grid: Coherence, Normalisation, Joins and Linear Scalability | Ben Stopford

How do you support low latency and high throughput?

Page 4: Java Tech & Tools | Beyond the Data Grid: Coherence, Normalisation, Joins and Linear Scalability | Ben Stopford

Database Architecture is Aging

Most modern databases still follow a 1970s architecture (for example IBM’s System R)

Page 5: Java Tech & Tools | Beyond the Data Grid: Coherence, Normalisation, Joins and Linear Scalability | Ben Stopford

“Because RDBMSs can be beaten by more than an order of magnitude on the standard OLTP benchmark, then there is no market where they are competitive. As such, they should be considered as legacy technology more than a quarter of a century in age, for which a complete redesign and re-architecting is the appropriate next step.”

Michael Stonebraker (Creator of Ingres and Postgres)

Page 6: Java Tech & Tools | Beyond the Data Grid: Coherence, Normalisation, Joins and Linear Scalability | Ben Stopford

The lay of the land: The main architectural

constructs in the database industry

Page 7: Java Tech & Tools | Beyond the Data Grid: Coherence, Normalisation, Joins and Linear Scalability | Ben Stopford

The Traditional Architecture

• Data is on disk• Users have an allocated user

space, where intermediary results are calculated.

• The database brings data, normally via indexes, into memory and performs filters, joins, reordering and aggregation operations.

• The result is sent to the user.

Page 8: Java Tech & Tools | Beyond the Data Grid: Coherence, Normalisation, Joins and Linear Scalability | Ben Stopford

Improving Database Performance

Shared Disk Architecture

SharedDisk

• More ‘grunt’• Suffers from issues

arising from disk and lock contention.

Page 9: Java Tech & Tools | Beyond the Data Grid: Coherence, Normalisation, Joins and Linear Scalability | Ben Stopford

Improving Database Performance

Shared Nothing Architecture

• Massive storage potential

• Massive scalability of processing

• Commodity hardware• Limited by cross

partition joins

Page 10: Java Tech & Tools | Beyond the Data Grid: Coherence, Normalisation, Joins and Linear Scalability | Ben Stopford

Improving Database Performance (3)

In Memory Databases

Page 11: Java Tech & Tools | Beyond the Data Grid: Coherence, Normalisation, Joins and Linear Scalability | Ben Stopford

Memory is at least 100x faster than disk

0.000,000,000,000

μs ns psms

L1 Cache Ref

L2 Cache Ref

Main MemoryRef

1MB Main Memory

Cross Network Round Trip

Cross Continental Round Trip

1MB Disk/Network

* L1 ref is about 2 clock cycles or 0.7ns. This is the time it takes light to travel 20cm

Page 12: Java Tech & Tools | Beyond the Data Grid: Coherence, Normalisation, Joins and Linear Scalability | Ben Stopford

The architecture of an in memory database

• All data is at your fingertips.

• Query plans become less important as there is no IO

• Intermediary results are just pointers.

Page 13: Java Tech & Tools | Beyond the Data Grid: Coherence, Normalisation, Joins and Linear Scalability | Ben Stopford

This makes them very fast!!

Page 14: Java Tech & Tools | Beyond the Data Grid: Coherence, Normalisation, Joins and Linear Scalability | Ben Stopford

The proof is in the stats. TPC-H Benchmarks on a

1TB data set• Exasol: 4,253,937 QphH (In-Memory DB)• Oracle Database 11g (RAC): 1,166,976 QphH• SQL Server: 173,961QphH

Page 15: Java Tech & Tools | Beyond the Data Grid: Coherence, Normalisation, Joins and Linear Scalability | Ben Stopford

So why haven’t in memory databases

taken off?

Page 16: Java Tech & Tools | Beyond the Data Grid: Coherence, Normalisation, Joins and Linear Scalability | Ben Stopford

Address-Spaces are relatively small and of a

finite, fixed size

• Is there enough memory for your all your data?

• What happens when your data grows: The ‘One more bit problem’?

Page 17: Java Tech & Tools | Beyond the Data Grid: Coherence, Normalisation, Joins and Linear Scalability | Ben Stopford

Durability

What happens when you pull the plug?

Page 18: Java Tech & Tools | Beyond the Data Grid: Coherence, Normalisation, Joins and Linear Scalability | Ben Stopford

Improving Database Performance (4)

Distributed In Memory (Shared Nothing)

Page 19: Java Tech & Tools | Beyond the Data Grid: Coherence, Normalisation, Joins and Linear Scalability | Ben Stopford

Distribution Solves These Three Problems

• Solve the ‘one more bit’ problem by adding more hardware.

• Solve the durability problem with backups on another machine.

Page 20: Java Tech & Tools | Beyond the Data Grid: Coherence, Normalisation, Joins and Linear Scalability | Ben Stopford

Improving Database Performance (5) No SQL / Distributed

Caching

Page 21: Java Tech & Tools | Beyond the Data Grid: Coherence, Normalisation, Joins and Linear Scalability | Ben Stopford

Simplifying the Contract

Do you really need all of ACID?• Atomic• Consistent• Isolated• Durable

Page 22: Java Tech & Tools | Beyond the Data Grid: Coherence, Normalisation, Joins and Linear Scalability | Ben Stopford

Databases have huge operational overheads

Research with Shore DB indicates only 6.8% of

instructions contribute to ‘useful work’

Taken from “OLTP Through the Looking Glass, and What We Found There” Harizopoulos et al

Page 23: Java Tech & Tools | Beyond the Data Grid: Coherence, Normalisation, Joins and Linear Scalability | Ben Stopford

Avoid all that overhead

RAM means:• No IO• Single Threaded

No locking / latching

Page 24: Java Tech & Tools | Beyond the Data Grid: Coherence, Normalisation, Joins and Linear Scalability | Ben Stopford

Regular DatabaseOracle, Sybase,

MySql

Scale out

Distributed

In-Memory

Exasol, VoltDB, Hana

NoSQL Mongo, Cassandra

Shared Nothing

Vertica, Greenplum

b

Scale out & drop

ACID

Data Grid Coherence,

Teracotta

Scale out & drop

Disk & ACID

In-Memory Databas

e

TimesTen, HSQL,

KDBSingle

Address

Space

Page 25: Java Tech & Tools | Beyond the Data Grid: Coherence, Normalisation, Joins and Linear Scalability | Ben Stopford

So what are the themes that underpin this

progression?Distributed Architecture

Simplify the Contract

Stick to RAM

Page 26: Java Tech & Tools | Beyond the Data Grid: Coherence, Normalisation, Joins and Linear Scalability | Ben Stopford

ODC – Distributed, Shared Nothing, In Memory, Semi-

Normalised, Graph DB450 processes

Messaging (Topic Based) as a system of record

(persistence)

2TB of RAMOracle

Coherence

Page 27: Java Tech & Tools | Beyond the Data Grid: Coherence, Normalisation, Joins and Linear Scalability | Ben Stopford

The LayersD

ata

Layer Transactio

ns

Cashflows

Query

Layer

Mtms

Acc

ess

La

yer

Java client

API

Java client

API

Pers

iste

nce

Layer

Page 28: Java Tech & Tools | Beyond the Data Grid: Coherence, Normalisation, Joins and Linear Scalability | Ben Stopford

Three Tools of Distributed Data Architecture

Indexing

Replication

Partitioning

Page 29: Java Tech & Tools | Beyond the Data Grid: Coherence, Normalisation, Joins and Linear Scalability | Ben Stopford

How should we use these tools?

Page 30: Java Tech & Tools | Beyond the Data Grid: Coherence, Normalisation, Joins and Linear Scalability | Ben Stopford

Replication puts data everywhere

Wherever you go the data will be there

But your storage is limited by the memory on a node

Page 31: Java Tech & Tools | Beyond the Data Grid: Coherence, Normalisation, Joins and Linear Scalability | Ben Stopford

Partitioning scalesKeys Aa-Ap

Scalable storage, bandwidth and processing

Associating data in different partitions implies moving it.

Page 32: Java Tech & Tools | Beyond the Data Grid: Coherence, Normalisation, Joins and Linear Scalability | Ben Stopford

What about Denormalisation?

Page 33: Java Tech & Tools | Beyond the Data Grid: Coherence, Normalisation, Joins and Linear Scalability | Ben Stopford

Denormalisation implies the duplication of some

sub-entities

Page 34: Java Tech & Tools | Beyond the Data Grid: Coherence, Normalisation, Joins and Linear Scalability | Ben Stopford

…and that means managing consistency over

lots of copies

Page 35: Java Tech & Tools | Beyond the Data Grid: Coherence, Normalisation, Joins and Linear Scalability | Ben Stopford

…and all the duplication means you run out of space really quickly

Page 36: Java Tech & Tools | Beyond the Data Grid: Coherence, Normalisation, Joins and Linear Scalability | Ben Stopford

Spaces issues are exaggerated further when

data is versioned

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

…and you need versioning to do MVCC

Page 37: Java Tech & Tools | Beyond the Data Grid: Coherence, Normalisation, Joins and Linear Scalability | Ben Stopford

And reconstituting a previous time slice

becomes very diffi cult.Trad

ePart

yTrade

r

Trade

Trade

Party

Party

Party

Trader

Trader

Page 38: Java Tech & Tools | Beyond the Data Grid: Coherence, Normalisation, Joins and Linear Scalability | Ben Stopford

So we want to hold entities separately

(normalised) to alleviate concerns around consistency and space usage

Page 39: Java Tech & Tools | Beyond the Data Grid: Coherence, Normalisation, Joins and Linear Scalability | Ben Stopford

This means the object graph will be split across

multiple machines.

Trade

Party

Trader

Trade

Party

Trader

Independently Versioned

Data is Singleton

Page 40: Java Tech & Tools | Beyond the Data Grid: Coherence, Normalisation, Joins and Linear Scalability | Ben Stopford

Binding them back together involves a “distributed join” =>

Lots of network hops

Trade

Party

Trader

Trade

Party

Trader

Page 41: Java Tech & Tools | Beyond the Data Grid: Coherence, Normalisation, Joins and Linear Scalability | Ben Stopford

All these network hops make it slow

Page 42: Java Tech & Tools | Beyond the Data Grid: Coherence, Normalisation, Joins and Linear Scalability | Ben Stopford

Whereas the denormalised model the join is already

done

Page 43: Java Tech & Tools | Beyond the Data Grid: Coherence, Normalisation, Joins and Linear Scalability | Ben Stopford

Hence denormalisation is FAST!

(for reads)

Page 44: Java Tech & Tools | Beyond the Data Grid: Coherence, Normalisation, Joins and Linear Scalability | Ben Stopford

So what we want is the advantages of a normalised store at the speed of a denormalised one!

This is what using Snowflake Schemas and the Connected Replication pattern

is all about!

Page 45: Java Tech & Tools | Beyond the Data Grid: Coherence, Normalisation, Joins and Linear Scalability | Ben Stopford

Looking more closely: Why does normalisation mean we have to spread data around the cluster. Why

can’t we hold it all together?

Page 46: Java Tech & Tools | Beyond the Data Grid: Coherence, Normalisation, Joins and Linear Scalability | Ben Stopford

It’s all about the keys

Page 47: Java Tech & Tools | Beyond the Data Grid: Coherence, Normalisation, Joins and Linear Scalability | Ben Stopford

We can collocate data with common keys but if they crosscut the only way to

collocate is to replicate

Common Keys

Crosscutting

Keys

Page 48: Java Tech & Tools | Beyond the Data Grid: Coherence, Normalisation, Joins and Linear Scalability | Ben Stopford

We tackle this problem with a hybrid model:

Trade

PartyTrader

Partitioned

Replicated

Page 49: Java Tech & Tools | Beyond the Data Grid: Coherence, Normalisation, Joins and Linear Scalability | Ben Stopford

We adapt the concept of a Snowflake Schema.

Page 50: Java Tech & Tools | Beyond the Data Grid: Coherence, Normalisation, Joins and Linear Scalability | Ben Stopford

Taking the concept of Facts and Dimensions

Page 51: Java Tech & Tools | Beyond the Data Grid: Coherence, Normalisation, Joins and Linear Scalability | Ben Stopford

Everything starts from a Core Fact (Trades for us)

Page 52: Java Tech & Tools | Beyond the Data Grid: Coherence, Normalisation, Joins and Linear Scalability | Ben Stopford

Facts are Big, dimensions are small

Page 53: Java Tech & Tools | Beyond the Data Grid: Coherence, Normalisation, Joins and Linear Scalability | Ben Stopford

Facts have one key that relates them all (used to

partition)

Page 54: Java Tech & Tools | Beyond the Data Grid: Coherence, Normalisation, Joins and Linear Scalability | Ben Stopford

Dimensions have many keys

(which crosscut the partitioning key)

Page 55: Java Tech & Tools | Beyond the Data Grid: Coherence, Normalisation, Joins and Linear Scalability | Ben Stopford

Looking at the data:

Facts:=>Big, common keys

Dimensions=>Small,crosscutting Keys

Page 56: Java Tech & Tools | Beyond the Data Grid: Coherence, Normalisation, Joins and Linear Scalability | Ben Stopford

We remember we are a grid. We should avoid the

distributed join.

Page 57: Java Tech & Tools | Beyond the Data Grid: Coherence, Normalisation, Joins and Linear Scalability | Ben Stopford

… so we only want to ‘join’ data that is in the same

process

Trades

MTMs

Common Key

Use a Key Assignment

Policy (e.g.

KeyAssociation in Coherence)

Page 58: Java Tech & Tools | Beyond the Data Grid: Coherence, Normalisation, Joins and Linear Scalability | Ben Stopford

So we prescribe different physical storage for Facts

and Dimensions

Trade

PartyTrader

Partitioned

Replicated

Page 59: Java Tech & Tools | Beyond the Data Grid: Coherence, Normalisation, Joins and Linear Scalability | Ben Stopford

Facts are partitioned, dimensions are replicated

Data

La

yer

Transactions

Cashflows

Query

Layer

Mtms

Fact Storage(Partitioned)

Trade

PartyTrader

Page 60: Java Tech & Tools | Beyond the Data Grid: Coherence, Normalisation, Joins and Linear Scalability | Ben Stopford

Facts are partitioned, dimensions are replicated

Transactions

Cashflows

Dimensions(repliacte)

Mtms

Fact Storage(Partitioned)

Facts(distribute/ partition)

Page 61: Java Tech & Tools | Beyond the Data Grid: Coherence, Normalisation, Joins and Linear Scalability | Ben Stopford

The data volumes back this up as a sensible hypothesis

Facts:=>Big=>Distrib

ute

Dimensions=>Small => Replicate

Page 62: Java Tech & Tools | Beyond the Data Grid: Coherence, Normalisation, Joins and Linear Scalability | Ben Stopford

Key Point

We use a variant on a Snowflake Schema to

partition big entities that can be related via a partitioning key and

replicate small stuff who’s keys can’t map to our

partitioning key.

Page 63: Java Tech & Tools | Beyond the Data Grid: Coherence, Normalisation, Joins and Linear Scalability | Ben Stopford

Replicate

Distribute

Page 64: Java Tech & Tools | Beyond the Data Grid: Coherence, Normalisation, Joins and Linear Scalability | Ben Stopford

So how does they help us to run queries without

distributed joins?

This query involves:• Joins between Dimensions• Joins between Facts

Select Transaction, MTM, RefrenceData From MTM, Transaction, Ref Where Cost Centre = ‘CC1’

Page 65: Java Tech & Tools | Beyond the Data Grid: Coherence, Normalisation, Joins and Linear Scalability | Ben Stopford

What would this look like without this pattern?

Get Cost

Centers

Get LedgerBooks

Get SourceBooks

Get Transac-tions

Get MTMs

Get Legs

Get Cost

Centers

Network

Time

Page 66: Java Tech & Tools | Beyond the Data Grid: Coherence, Normalisation, Joins and Linear Scalability | Ben Stopford

But by balancing Replication and Partitioning we don’t need all

those hops

Page 67: Java Tech & Tools | Beyond the Data Grid: Coherence, Normalisation, Joins and Linear Scalability | Ben Stopford

Stage 1: Focus on the where clause:

Where Cost Centre = ‘CC1’

Page 68: Java Tech & Tools | Beyond the Data Grid: Coherence, Normalisation, Joins and Linear Scalability | Ben Stopford

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 1: Get the right keys to query the Facts

Join Dimensions in Query Layer

Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = ‘CC1’

Page 69: Java Tech & Tools | Beyond the Data Grid: Coherence, Normalisation, Joins and Linear Scalability | Ben Stopford

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 2: Cluster Join to get Facts

Join Dimensions in Query Layer

Join Facts across cluster

Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = ‘CC1’

Page 70: Java Tech & Tools | Beyond the Data Grid: Coherence, Normalisation, Joins and Linear Scalability | Ben Stopford

Stage 2: Join the facts together effi ciently as we know they are

collocated

Page 71: Java Tech & Tools | Beyond the Data Grid: Coherence, Normalisation, Joins and Linear Scalability | Ben Stopford

Transactions

Cashflows

Mtms

Partitioned Storage

Stage 3: Augment raw Facts with relevant

Dimensions

Join Dimensions in Query Layer

Join Facts across cluster

Join Dimensions in Query Layer

Select Transaction, MTM, ReferenceData From MTM, Transaction, Ref Where Cost Centre = ‘CC1’

Page 72: Java Tech & Tools | Beyond the Data Grid: Coherence, Normalisation, Joins and Linear Scalability | Ben Stopford

Stage 3: Bind relevant dimensions to the result

Page 73: Java Tech & Tools | Beyond the Data Grid: Coherence, Normalisation, Joins and Linear Scalability | Ben Stopford

Bringing it together:

Java client

API

Replicated Dimensions

Partitioned Facts

We never have to do a distributed join!

Page 74: Java Tech & Tools | Beyond the Data Grid: Coherence, Normalisation, Joins and Linear Scalability | Ben Stopford

So all the big stuff is held partitioned

And we can join without shipping keys around and

having intermediate

results

Page 75: Java Tech & Tools | Beyond the Data Grid: Coherence, Normalisation, Joins and Linear Scalability | Ben Stopford

We get to do this…

Trade

Party

Trader

Trade

Party

Trader

Page 76: Java Tech & Tools | Beyond the Data Grid: Coherence, Normalisation, Joins and Linear Scalability | Ben Stopford

…and this…

Trade

Party

Trader Version 1

Trade

Party

Trader Version 2

Trade

Party

Trader Version 3

Trade

Party

Trader Version 4

Page 77: Java Tech & Tools | Beyond the Data Grid: Coherence, Normalisation, Joins and Linear Scalability | Ben Stopford

..and this..

Trade

Party

Trader

Trade

Trade

Party

Party

Party

Trader

Trader

Page 78: Java Tech & Tools | Beyond the Data Grid: Coherence, Normalisation, Joins and Linear Scalability | Ben Stopford

…without the problems of this…

Page 79: Java Tech & Tools | Beyond the Data Grid: Coherence, Normalisation, Joins and Linear Scalability | Ben Stopford

…or this..

Page 80: Java Tech & Tools | Beyond the Data Grid: Coherence, Normalisation, Joins and Linear Scalability | Ben Stopford

..all at the speed of this… well almost!

Page 81: Java Tech & Tools | Beyond the Data Grid: Coherence, Normalisation, Joins and Linear Scalability | Ben Stopford
Page 82: Java Tech & Tools | Beyond the Data Grid: Coherence, Normalisation, Joins and Linear Scalability | Ben Stopford

But there is a fly in the ointment…

Page 83: Java Tech & Tools | Beyond the Data Grid: Coherence, Normalisation, Joins and Linear Scalability | Ben Stopford

I lied earlier. These aren’t all Facts.

Facts

Dimensions

This is a dimension• It has a different

key to the Facts.• And it’s BIG

Page 84: Java Tech & Tools | Beyond the Data Grid: Coherence, Normalisation, Joins and Linear Scalability | Ben Stopford

We can’t replicate really big stuff… we’ll run out of space => Big Dimensions are a problem.

Page 85: Java Tech & Tools | Beyond the Data Grid: Coherence, Normalisation, Joins and Linear Scalability | Ben Stopford

Fortunately there is a simple solution!

The Connected Replication

Pattern

Page 86: Java Tech & Tools | Beyond the Data Grid: Coherence, Normalisation, Joins and Linear Scalability | Ben Stopford

Whilst there are lots of these big dimensions, a large majority are never used. They are not all “connected”.

Page 87: Java Tech & Tools | Beyond the Data Grid: Coherence, Normalisation, Joins and Linear Scalability | Ben Stopford

If there are no Trades for Goldmans in the data store then a Trade Query will never need the Goldmans Counterparty

Page 88: Java Tech & Tools | Beyond the Data Grid: Coherence, Normalisation, Joins and Linear Scalability | Ben Stopford

Looking at the Dimension data some are quite large

Page 89: Java Tech & Tools | Beyond the Data Grid: Coherence, Normalisation, Joins and Linear Scalability | Ben Stopford

But Connected Dimension Data is tiny by comparison

Page 90: Java Tech & Tools | Beyond the Data Grid: Coherence, Normalisation, Joins and Linear Scalability | Ben Stopford

One recent independent study from the database community showed that 80% of data remains unused

Page 91: Java Tech & Tools | Beyond the Data Grid: Coherence, Normalisation, Joins and Linear Scalability | Ben Stopford

So we only replicate

‘Connected’ or ‘Used’ dimensions

Page 92: Java Tech & Tools | Beyond the Data Grid: Coherence, Normalisation, Joins and Linear Scalability | Ben Stopford

As data is written to the data store we keep our ‘Connected Caches’ up

to dateD

ata

Layer

Dimension Caches

(Replicated)

Transactions

Cashflows

Pro

cessin

g

Layer

Mtms

Fact Storage(Partitioned)

As new Facts are added relevant Dimensions that they reference are moved to processing layer caches

Page 93: Java Tech & Tools | Beyond the Data Grid: Coherence, Normalisation, Joins and Linear Scalability | Ben Stopford

The Replicated Layer is updated by recursing through the arcs on the domain model when facts change

Page 94: Java Tech & Tools | Beyond the Data Grid: Coherence, Normalisation, Joins and Linear Scalability | Ben Stopford

Saving a trade causes all it’s 1st level references to be triggered

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

Save Trade

Partitioned Cache

Cache Store

Trigger

Page 95: Java Tech & Tools | Beyond the Data Grid: Coherence, Normalisation, Joins and Linear Scalability | Ben Stopford

This updates the connected caches

Trade

Party

Alias

Source

Book

Ccy

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

Page 96: Java Tech & Tools | Beyond the Data Grid: Coherence, Normalisation, Joins and Linear Scalability | Ben Stopford

The process recurses through the object graph

Trade

Party

Alias

Source

Book

Ccy

Party

LedgerBook

Data Layer(All Normalised)

Query Layer(With connected dimension Caches)

Page 97: Java Tech & Tools | Beyond the Data Grid: Coherence, Normalisation, Joins and Linear Scalability | Ben Stopford

‘Connected Replication’A simple pattern which recurses through the foreign keys in the

domain model, ensuring only ‘Connected’ dimensions are

replicated

Page 98: Java Tech & Tools | Beyond the Data Grid: Coherence, Normalisation, Joins and Linear Scalability | Ben Stopford

With ‘Connected Replication’ only 1/10th of the data

needs to be replicated (on

average).

Page 99: Java Tech & Tools | Beyond the Data Grid: Coherence, Normalisation, Joins and Linear Scalability | Ben Stopford

Limitations of this approach

•Data set size. Size of connected dimensions limits scalability.

• Joins are only supported between “Facts” that can share a partitioning key (But any dimension join can be supported)

Page 100: Java Tech & Tools | Beyond the Data Grid: Coherence, Normalisation, Joins and Linear Scalability | Ben Stopford

Everything is Java

Java client

APIJava schema Java ‘Stored

Procedures’ and

‘Triggers’

Page 101: Java Tech & Tools | Beyond the Data Grid: Coherence, Normalisation, Joins and Linear Scalability | Ben Stopford

API – Queries utilise a fluent interface

Page 102: Java Tech & Tools | Beyond the Data Grid: Coherence, Normalisation, Joins and Linear Scalability | Ben Stopford

Performance

Query with more than

twenty joins conditions:

2GB per min / 250Mb/s

(per client)3ms latency

Page 103: Java Tech & Tools | Beyond the Data Grid: Coherence, Normalisation, Joins and Linear Scalability | Ben Stopford

Conclusion

Data warehousing, OLTP and Distributed caching fields are all converging on in-memory architectures to get away from disk induced latencies.

Page 104: Java Tech & Tools | Beyond the Data Grid: Coherence, Normalisation, Joins and Linear Scalability | Ben Stopford

Conclusion

Shared-Nothing architectures are always subject to the distributed join problem if they are to retain a degree of normalisation.

Page 105: Java Tech & Tools | Beyond the Data Grid: Coherence, Normalisation, Joins and Linear Scalability | Ben Stopford

Conclusion

We present a novel mechanism for avoiding the distributed join problem by using a Star Schema to define whether data should be replicated or partitioned.

Partitioned Storage

Page 106: Java Tech & Tools | Beyond the Data Grid: Coherence, Normalisation, Joins and Linear Scalability | Ben Stopford

Conclusion

We make the pattern applicable to ‘real’ in-memory data models by only replicating objects that are actually used: the Connected Replication pattern.

Page 107: Java Tech & Tools | Beyond the Data Grid: Coherence, Normalisation, Joins and Linear Scalability | Ben Stopford

The End• Further details online

http://www.benstopford.com (there are textual and slideshare versions of this talk)

• A big thanks to the team in both India and the UK who built this thing.

• Questions?