data is a heart of scalability

Grid Dynamics – Scaling Mission-Critical Systems, December 2009. Data is a heart of scalable system (base of “Philosophy of transaction processing”)

Upload: eugene-steinberg

Post on 14-Dec-2014


DESCRIPTION

General discussion of scalability problems with respect to data access patterns and systems

TRANSCRIPT

Page 1: Data Is A Heart Of Scalability

Grid Dynamics – Scaling Mission-Critical Systems

December 2009

Data is a heart of scalable system (base of “Philosophy of transaction processing”)

Page 2: Data Is A Heart Of Scalability

Read vs. write dialectic

Page 3: Data Is A Heart Of Scalability


Read vs write

Processing a business transaction involves both read and write operations. These operations impose contradictory requirements on data structures and architecture.

Page 4: Data Is A Heart Of Scalability


Data schema

      | Normalized                                    | Denormalized
Read  | Bad. Complex queries, joins are slow          | Good. Fast, simple queries, no joins
Write | Good. Non-contradicting, fewer rows to update | Bad. Potential inconsistency, more rows to update, complex update procedures
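The read/write trade-off between the two schema styles can be illustrated with a small sketch. The customer and order data, and every function name below, are hypothetical and only serve the illustration.

```python
# Hypothetical data; all names are illustrative assumptions, not from the slides.

# Normalized: each fact is stored once, so a read has to join two structures.
customers = {1: {"name": "Alice", "city": "Berlin"}}
orders = {
    100: {"customer_id": 1, "total": 25},
    101: {"customer_id": 1, "total": 40},
}


def read_order_normalized(order_id):
    """Read needs a join: one lookup for the order, one for the customer."""
    order = orders[order_id]
    customer = customers[order["customer_id"]]
    return {"order_id": order_id, "total": order["total"], "city": customer["city"]}


def move_customer_normalized(customer_id, city):
    """Write touches a single row. Returns the number of rows updated."""
    customers[customer_id]["city"] = city
    return 1


# Denormalized: the customer's city is copied into every order row.
orders_denorm = {
    100: {"customer_id": 1, "total": 25, "city": "Berlin"},
    101: {"customer_id": 1, "total": 40, "city": "Berlin"},
}


def read_order_denormalized(order_id):
    """Read is a single lookup, no join."""
    row = orders_denorm[order_id]
    return {"order_id": order_id, "total": row["total"], "city": row["city"]}


def move_customer_denormalized(customer_id, city):
    """Write must find and update every copy. Returns the number of rows updated."""
    updated = 0
    for row in orders_denorm.values():
        if row["customer_id"] == customer_id:
            row["city"] = city
            updated += 1
    return updated
```

If the denormalized update misses a row, the copies disagree, which is the "potential inconsistency" cell of the table.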

Page 5: Data Is A Heart Of Scalability


Redundancy

      | Single copy                                            | Multiple copies
Read  | Bad. Bottleneck                                        | Good. Load is balanced between copies
Write | Good. No consistency problems, single place to update  | Bad. Multiple places to update, synchronization and consistency problems
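A toy sketch of the same trade-off for redundancy: reads are spread over the copies, while every write has to reach all of them. The `ReplicatedStore` class is a hypothetical illustration that ignores failures and concurrent writers.

```python
import itertools


class ReplicatedStore:
    """Toy store keeping several full copies of the data (hypothetical class)."""

    def __init__(self, copies):
        self.replicas = [dict() for _ in range(copies)]
        self._next = itertools.cycle(range(copies))

    def write(self, key, value):
        """Every copy must be updated. Returns the number of places touched."""
        for replica in self.replicas:
            replica[key] = value
        return len(self.replicas)

    def read(self, key):
        """Round-robin over the copies. Returns (replica index, value)."""
        index = next(self._next)
        return index, self.replicas[index][key]
```

With three copies, six consecutive reads are served by replicas 0, 1, 2, 0, 1, 2, but each write costs three updates instead of one.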

Page 6: Data Is A Heart Of Scalability


Storage

[Diagram: storage types placed on two axes, "Good for read" (vertical) and "Good for write" (horizontal): RDBMS, Key/value, Doc oriented, File system, MQ]

Page 7: Data Is A Heart Of Scalability


Message queue as a Storage

Sending message = write operation
Consuming message = read operation (with very limited semantics)
Durable subscriptions
Transaction support

An MQ does not need to maintain indexes, so it can write transactions to disk extremely fast
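The idea of a queue as index-free storage can be sketched as an append-only log with one read position per subscriber. This is an in-memory toy under stated assumptions: the `AppendOnlyQueue` class and its method names are hypothetical, and a real MQ would also persist the log and the positions.

```python
class AppendOnlyQueue:
    """Toy queue: no index on message content, only a position per subscriber."""

    def __init__(self):
        self._log = []       # messages in arrival order
        self._offsets = {}   # subscriber name -> next position to read

    def send(self, message):
        """Write = append to the end of the log. Returns the message position."""
        self._log.append(message)
        return len(self._log) - 1

    def subscribe(self, name):
        """Remember the subscriber; it will see messages sent from now on."""
        self._offsets.setdefault(name, len(self._log))

    def consume(self, name):
        """Read = next unread message in order, or None. No lookup by key."""
        position = self._offsets[name]
        if position >= len(self._log):
            return None
        self._offsets[name] = position + 1
        return self._log[position]
```

The only read the structure supports is "give me the next message", which is the very limited read semantics mentioned above; in exchange, a write is a plain append.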

Page 8: Data Is A Heart Of Scalability


Storage media

Magnetic disks: slow, persistent
Dynamic memory: very fast, volatile
Flash memory (starting to change the IT landscape): fast, persistent

Page 9: Data Is A Heart Of Scalability


RDBMS are not sleeping

MQ gets integrated into the core of RDBMS

MySQL, Postgres, Berkeley DB are starting to move

From anemic storage – to processing facility

In-memory operations

Oracle TimesTen

Materialized views

Page 10: Data Is A Heart Of Scalability

Distribution

Page 11: Data Is A Heart Of Scalability


You have to go distributed

You cannot avoid building a distributed system.

Fault tolerance: the system should survive a server failure
Scaling: the resources of a single server are limited
Globalization: modern business is distributed

Page 12: Data Is A Heart Of Scalability


Network

Network is your enemy, never forget it

Network is unreliable
Network is slow
Network has limited bandwidth

Also, network interactions require complex data format transformations: HTTP + SSL + XML may kill performance in the blink of an eye

Page 13: Data Is A Heart Of Scalability


Network vs. disk access to data

              | Network                                                             | Magnetic disk
Latency       | Less (~1 ms)                                                        | Seek time (~10 ms) if not cached
Random access | Good, unless a large number of separate small interactions is used | Bad, high seek time
Bandwidth     | Limited, network infrastructure is shared                           | Higher; if no seek is required, very high throughput
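A back-of-the-envelope calculation with the rough figures quoted above (~1 ms per network round trip, ~10 ms per uncached disk seek) shows why many small interactions hurt. The function names are hypothetical, and the model deliberately ignores payload transfer time and caching.

```python
NETWORK_ROUND_TRIP_MS = 1.0   # rough figure from the comparison above
DISK_SEEK_MS = 10.0           # rough figure from the comparison above (uncached)


def network_time_ms(items, items_per_request):
    """Latency cost only: one round trip per request, transfer time ignored."""
    requests = -(-items // items_per_request)  # ceiling division
    return requests * NETWORK_ROUND_TRIP_MS


def disk_random_read_ms(items):
    """One seek per item when nothing is cached and reads are scattered."""
    return items * DISK_SEEK_MS
```

Under these assumptions, fetching 1000 items one by one over the network costs about 1000 ms of latency, batching them 100 per request costs about 10 ms, and 1000 scattered disk reads cost about 10000 ms of seek time.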

Page 14: Data Is A Heart Of Scalability

Data as a state of the system

Page 15: Data Is A Heart Of Scalability


Validity of state

Valid = has an interpretation that makes sense from a business or operational perspective

It should look meaningful; it doesn’t matter what happens inside.

A contradiction is not a contradiction as long as we know how to resolve it

Page 16: Data Is A Heart Of Scalability


Time is just another axis

The speed of light is limited, but that does not make stars less beautiful, even if their light is thousands of years old.

There is no need to force all changes instantly. Offloading operations to asynchronous and batch processing is a powerful method to increase performance and deal with peak loads.
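As a minimal sketch of offloading, changes can be accepted immediately and applied later in batches. The `BatchingWriter` class below is a hypothetical illustration; in a real system the pending queue would itself have to be durable.

```python
from collections import deque


class BatchingWriter:
    """Accepts changes at once and applies them later in batches (toy sketch)."""

    def __init__(self, batch_size):
        self.batch_size = batch_size
        self.pending = deque()   # changes accepted but not yet applied
        self.store = {}          # the slower target storage
        self.flushes = 0         # number of non-empty batches applied

    def submit(self, key, value):
        """Cheap operation on the request path: just remember the change."""
        self.pending.append((key, value))

    def flush(self):
        """Apply up to batch_size pending changes. Returns how many were applied."""
        applied = 0
        while self.pending and applied < self.batch_size:
            key, value = self.pending.popleft()
            self.store[key] = value
            applied += 1
        if applied:
            self.flushes += 1
        return applied
```

During a load peak only `submit` runs on the request path; `flush` can be scheduled when the system is less busy.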

Page 17: Data Is A Heart Of Scalability


What data do we usually have?

Transactional data
- Data changing dynamically (either by us or by a 3rd party)

Static data
- Data that does not change often
- Can be treated as immutable for most operations

Page 18: Data Is A Heart Of Scalability


Operations on data

Operational data
- May include transactional and static data
- Complex read operations
- Low latency requirements

Transactional data
- Intensive write operations
- Low latency requirements

Secondary operations (e.g. daily reporting)
- Read access to the entire scope of information
- Operations over large datasets

Page 19: Data Is A Heart Of Scalability


Data structures

Data structures should be tailored to the access pattern.

How to deal with the read/write contradiction?

Use two storages
Synchronize data between the storages
- Synchronously
- Asynchronously
- Periodically
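The two-storage approach can be sketched as a write-optimized log plus a read-optimized view, with the synchronization step called at different moments. The `TwoStores` class and the account/amount data are hypothetical assumptions made for the illustration.

```python
class TwoStores:
    """Write-optimized log plus read-optimized view (hypothetical names)."""

    def __init__(self, synchronous):
        self.synchronous = synchronous
        self.write_log = []    # append-only, cheap to write
        self.read_view = {}    # account -> balance, cheap to query
        self._applied = 0      # number of log entries already in the view

    def write(self, account, amount):
        self.write_log.append((account, amount))
        if self.synchronous:
            self.synchronize()  # the view is updated before write() returns

    def synchronize(self):
        """Bring the view up to date. Returns the number of entries applied.

        Call it on every write (synchronous), from a background worker
        (asynchronous), or on a timer (periodic).
        """
        pending = self.write_log[self._applied:]
        for account, amount in pending:
            self.read_view[account] = self.read_view.get(account, 0) + amount
        self._applied += len(pending)
        return len(pending)

    def balance(self, account):
        return self.read_view.get(account, 0)
```

In the non-synchronous modes the view lags behind the log until `synchronize` runs, which is the price paid for cheaper writes.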

Page 20: Data Is A Heart Of Scalability


Storages

Operational log storage
- Transactional data only
- Fast writes, limited reads (e.g. in case of recovery only)

Operational view
- Transactional + static data
- Tailored for business logic queries
- Write intensive

Long term storage
- All data in the system
- Required for migration, backup/restore of information
- Suitable for analytics and ad hoc queries

Page 21: Data Is A Heart Of Scalability


Sweet spot of data grids

An IMDG (in-memory data grid) is very fast at simple queries
An IMDG has great write throughput

This makes an IMDG an ideal solution as a “view” of operational data.

We can tailor data structures for queries

No need for persistence

Page 22: Data Is A Heart Of Scalability


Long term storage

RDBMS is unbeatable in this field

Complex analytic queries

Ad hoc queries

Trust in big vendors

Asynchronous synchronization works best here

Page 23: Data Is A Heart Of Scalability

Inventing a bicycle

General transaction processing system revisited

Page 24: Data Is A Heart Of Scalability


Transaction processing style

Synchronous processing
- We can return a transaction acknowledgement to the client only when we can guarantee that the transaction is successful and durable

Asynchronous processing
- We acknowledge only the fact of starting the business transaction

In both cases we have to fixate (durably record) the request before acknowledging it

Page 25: Data Is A Heart Of Scalability


A bicycle

1. Fixate incoming request (optional)
2. Acknowledge (async processing)
3. Operational processing
4. Fixate operation result
5. Response (sync processing)
6. Update operational view
7. Backend processing
- updating warehouse
- working with downstream systems

[Diagram: steps 1 to 7 drawn over the components labeled Operation log, Operational view, Data warehouse, Long term storage and Downstream systems]
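The seven steps can be traced in a small sketch. Everything here is an assumption made for illustration: the function name, the (account, amount) request shape, and the use of plain lists and dicts for the operation log, operational view and warehouse. For simplicity the steps run inline; in a real asynchronous system the steps after the acknowledgement would run in the background.

```python
def process_transaction(request, operation_log, operational_view, warehouse,
                        asynchronous):
    """Run one transaction and return the ordered trace of what happened."""
    account, amount = request
    trace = []

    operation_log.append(("request", account, amount))    # step 1
    trace.append("fixate request")
    if asynchronous:
        trace.append("acknowledge")                        # step 2

    balance = operational_view.get(account, 0) + amount    # step 3
    trace.append("process")
    operation_log.append(("result", account, balance))     # step 4
    trace.append("fixate result")
    if not asynchronous:
        trace.append("respond")                            # step 5

    operational_view[account] = balance                    # step 6
    trace.append("update view")
    warehouse.append((account, amount, balance))           # step 7
    trace.append("backend")
    return trace
```

In both styles the request is recorded in the log before the client hears anything back; the styles differ only in whether the client is answered before or after the result is fixated.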

Page 26: Data Is A Heart Of Scalability


Operational log

Files
- Backup solution required
- Local access only

DBMS
- Low transaction throughput
- Index management overhead

In-memory/IMDG
- Not durable

MQ
- Best fit?

Page 27: Data Is A Heart Of Scalability


Operational view

RDBMS – possible, with some tuning
- Normalized data model is efficient
- Disk is slow, but there are in-memory options

Key/Value DBMS – possible
- Not so high write throughput

In-memory/IMDG – best fit
- Limited capacity

Page 28: Data Is A Heart Of Scalability


Long term storage

RDBMS – monopoly

Key/Value – possible

But long term storage presumes a schema and strong consistency

Page 29: Data Is A Heart Of Scalability


Different approach

Traditional design – one size fits all
- We need to design storage good for both read and write operations.
- We are working against physics here

Multiple layer storage
- One storage optimized for write and reliability *
- One storage optimized for read operations
- Synchronization between storages

We replaced one impossible problem with 3 hard problems
But it is clear how to solve each of them, and such solutions can be reused.

* Only for write intensive applications

Page 30: Data Is A Heart Of Scalability

IMDG

Page 31: Data Is A Heart Of Scalability


Weak

No SQL

“In-memory” is not so fast
- Flash memory technology
- Strong belief in hardware
- In-memory RDBMS (though limited to a single server)

Network is slow
- Multiple network round trips per request may ruin performance
- Bandwidth is limited

RDBMS is better with complex queries

Page 32: Data Is A Heart Of Scalability


Strong - data

Schema should be adapted
- Denormalized: single lookup per operation
- Data affinity is your friend

… but if you cook the right schema:
- True horizontal scale out
- Fast operations
- Great write throughput

And it scales!
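A minimal sketch of data affinity with a denormalized record: everything an operation needs lives under one key, and the key alone decides which node holds it. The `PartitionedGrid` class is hypothetical and is not the API of any particular IMDG product; `zlib.crc32` is used only because it gives a hash that is stable between runs.

```python
import zlib


class PartitionedGrid:
    """Toy partitioned key/value grid (hypothetical, no replication)."""

    def __init__(self, node_count):
        self.nodes = [dict() for _ in range(node_count)]

    def node_for(self, affinity_key):
        """Stable hash of the key picks the owning node."""
        return zlib.crc32(affinity_key.encode("utf-8")) % len(self.nodes)

    def put(self, affinity_key, record):
        self.nodes[self.node_for(affinity_key)][affinity_key] = record

    def get(self, affinity_key):
        """Single lookup on a single node, no joins and no cross-node calls."""
        return self.nodes[self.node_for(affinity_key)].get(affinity_key)


grid = PartitionedGrid(node_count=4)
# Denormalized record: the customer's orders are embedded in the customer entry.
grid.put("customer-1", {"name": "Alice", "orders": [25, 40]})
```

Because no operation needs more than one node, adding nodes adds capacity for both reads and writes, which is what horizontal scale out relies on.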

Page 33: Data Is A Heart Of Scalability


Strong - distribution

Addresses headaches of distributed systems

Coordination of work
- Keeping the cluster together
- Node communications
- Failover (IMDG facilitates availability)

Dealing with state (data)
- Data bottleneck problems
- Data consistency (IMDG provides consistency)

+ Recovery from failure

Page 34: Data Is A Heart Of Scalability


The CAP theorem

* Introduced in 2000 by Eric Brewer, formally proven by Seth Gilbert and Nancy Lynch in 2002

[Diagram: CAP triangle with vertices C (Consistency), A (Availability) and P (Partition tolerance); labels placed on the diagram: IMDG, Quorum based systems, Eventual consistency with conflict resolution]
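For the quorum based systems named on the diagram, a commonly used rule (stated here as background, not taken from the slides) is that with N replicas, a write quorum W and a read quorum R, every read overlaps the latest write when R + W > N. The sketch below checks that rule by brute force; both function names are hypothetical.

```python
from itertools import combinations


def quorums_overlap(replicas, write_quorum, read_quorum):
    """The usual rule of thumb: R + W > N."""
    return write_quorum + read_quorum > replicas


def every_read_sees_latest_write(replicas, write_quorum, read_quorum):
    """Brute force: does every possible read set share a node with every write set?"""
    nodes = range(replicas)
    return all(
        set(w) & set(r)
        for w in combinations(nodes, write_quorum)
        for r in combinations(nodes, read_quorum)
    )
```

With 3 replicas, W = 2 and R = 2 always overlap, while W = 1 and R = 2 can miss the only node that holds the latest write.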

Page 35: Data Is A Heart Of Scalability


Q&A