distributing your data

Post on 12-May-2015

1.093 Views

Category:

Technology

0 Downloads

Preview:

Click to see full reader

DESCRIPTION

A presentation on a few of the fundamental concepts behind distributed data stores. Presented at Devnation Portland 2010 http://devnation.us/events/10.

TRANSCRIPT

HELLO DEVDAY

PDX 2010

Justin Marney

HELLO DEVDAY

PDX 2010

Justin Marney

HELLO DEVDAY

Viget Labs

PDX 2010

Justin Marney

HELLO DEVDAY

Viget Labs

GMU BCS ‘06

PDX 2010

Justin Marney

Viget Labs

GMU BCS ‘06

Ruby ‘07

PDX 2010

Viget ‘08

Viget Labs

GMU BCS ‘06

Ruby ‘07

PDX 2010

PDX 2010

PDX 2010

DISTRIBUTING YOUR DATA

PDX 2010

DISTRIBUTING YOUR DATA

WHY?

PDX 2010

DISTRIBUTING YOUR DATA

WHY?

web applications are judged by their level of availability

PDX 2010

web applications are judged by their level of availability

WHY?

ability to continue operating during failure scenarios

DISTRIBUTING YOUR DATA

PDX 2010

web applications are judged by their level of availability

WHY?

ability to continue operating during failure scenarios

ability to manage availability during failure scenarios

PDX 2010

web applications are judged by their level of availability

WHY?

ability to continue operating during failure scenarios

ability to manage availability during node failure

increase throughput

PDX 2010

web applications are judged by their level of availability

ability to continue operating during failure scenarios

ability to manage availability during node failure

increase throughput

increase durability

PDX 2010

ability to continue operating during failure scenarios

ability to manage availability during node failure

increase throughput

increase durability

increase scalability

PDX 2010

SCALABILITY"I can add twice as much X and get twice as much Y."

X = processor, RAM, disks, servers, bandwidth

Y = throughput, storage space, uptime

PDX 2010

SCALABILITYscalability is a ratio.

2:2 = linear scalability ratio

scalability ratio allows you to predict how much it will cost you to grow.

PDX 2010

SCALABILITYUP/DOWN/VERTICAL/HORIZONTAL/L/R/L/R/A/B/START

PDX 2010

SCALABILITYUPgrow your infrastructuremultiple data centershigher bandwidthfaster machines

PDX 2010

SCALABILITYDOWNshrink your infrastructuremobileset-toplaptop

PDX 2010

SCALABILITYVERTICALadd to a single nodeCPURAMRAID

PDX 2010

SCALABILITYHORIZONTALadd more nodesdistribute the loadcommodity costlimited only by capital

PDX 2010

@gary_hustwit: Dear Twitter: when a World Cup match is at the 90th minute, you might want to turn on a few more servers.

PDX 2010

ASYNCHRONOUSA distributed transaction is bound by availability of all nodes.

PDX 2010

ASYNCHRONOUSA distributed transaction is bound by availability of all nodes.

(.99^1) = .99

(.99^2) = .98

(.99^3) = .97

PDX 2010

ASYNCHRONOUSAsynchronous systems operate without the concept of global state.

The concurrency model more accurately reflects the real world.

PDX 2010

ASYNCHRONOUSAsynchronous systems operate without the concept of global state.

The concurrency model more accurately reflects the real world.

What about my ACID!?

PDX 2010

Consistent

ACIDAtomic

Isolated

Durable

Series of database operations either all occur, or nothing occurs.

Transaction does not violate any integrity constraints during execution.

Cannot access data that is modified during an incomplete transaction.

Transactions that have committed will survive permanently.

PDX 2010

ACIDDefines a set of characteristics that aim to ensure consistency.

What happens when we realize that in order scale we need to distribute our data and handle asynchronous operations?

PDX 2010

ACIDWithout global state, no Atomicity.

Without a linear timeline, no transactions and no Isolation.

The canonical location of data might not exist, therefore no D.

PDX 2010

Without A, I, or D, Consistency in terms of entity integrity is no longer guaranteed.

PDX 2010

CAP TheoremEric Brewer @ 2000 Principles of Distributed Computing (PODC).

Seth Gilbert and Nancy Lynch published a formal proof in 2002.

PDX 2010

CAP AcronymConsistency: Multiple values for the same piece of data are not allowed.

Availability: If a non-failing node can be reached the system functions.

Partition-Tolerance: Regardless of packet loss, if a non-failing node is reached the system functions.

PDX 2010

CAP Theorem

Consistency, Availability, Partition-Tolerance: Choose One...

PDX 2010

Single node systems bound by CAP.

CAP Theorem

100% Partition-tolerant100% ConsistentNo Availability Guarantee

PDX 2010

Multi-node systems bound by CAP.

AP : Dynamo, no ACID

CP : Quorum, distributed databases

CA : DT, 2PC, ACID

CAP Theorem

PDX 2010

CAP doesn't say AP systems are the solution to your problem.

CAP Theorem

Most systems are a hybrid of CA, CP, & AP.

Not an absolute decision.

PDX 2010

Understand the trade-offs and use that understanding to build a system that fails predictably.

CAP Theorem

Enables you to build a system that degrades gracefully during a failure.

PDX 2010

BASEDan Pritchett

Associate for Computing Machinery Queue, 2008

BASE: An ACID Alternative

PDX 2010

Basically AvailableSoft StateEventually Consistent

BASE: An ACID Alternative

BASE

PDX 2010

Basically AvailableSoft StateEventually Consistent

BASE: An ACID Alternative

BASE

PDX 2010

Eventually ConsistentRename to Managed Consistency.

Does not mean probable or hopeful or indefinite time in the future.

Describes what happens during a failure.

PDX 2010

Eventually ConsistentDuring certain scenarios a decision must be made to either return inconsistent data or deny a request.

EC allows you control the level of consistency vs. availability in your application.

PDX 2010

Eventually ConsistentIn order to achieve availability in an asynchronous system, accept that failures are going to happen.

Understand failure points and know what you are willing to give up in order to achieve availability.

PDX 2010

How can we model the operations we perform on our data to be asynchronous & EC?

PDX 2010

Model system as a network of independent components.

Partition components along functional boundaries.

Don't interact with your data as one big global state.

PDX 2010

This doesn't meant every part of your system must operate this way!

Use ACID 2.0 to help identify and architect components than can.

PDX 2010

ACID 2.0

Commutative

Associative

Idempotent

Distributed

Order of operations does not change the result.

Operations can be aggregated in any order.

Operation can be applied multiple times without changing the result.

Operations are distributed and processed asynchronously.

PDX 2010

OPS BROSIncremental scalability

Homogeneous node responsibilities

Heterogeneous node capabilities

PDX 2010

top related