nosql revolution: under the covers of distributed systems at scale (spot401) | aws re:invent 2013

Post on 01-Nov-2014

1.918 Views

Category:

Technology

1 Downloads

Preview:

Click to see full reader

DESCRIPTION

The Dynamo paper started a revolution in distributed systems. The contributions from this paper are still impacting the design and practices of some of the world's largest distributed systems, including those at Amazon.com and beyond. Building distributed systems is hard, but our goal in this session is to simplify the complexity of this topic to empower the hacker in you! Have you been bitten by the eventual consistency bug lately? We show you how to tame eventual consistency and make it a great scaling asset. As you scale up, you must be ready to deal with node, rack, and data center failure. We share insights on how to limit the blast radius of the individual components of your system, battle tested techniques for simulating failures (network partitions, data center failure), and how we used core distributed systems fundamentals to build highly scalable, performance, durable, and resilient systems. Come watch us uncover the secret sauce behind Amazon DynamoDB, Amazon SQS, Amazon SNS, and the fundamental tenents that define them as Internet scale services. To turn this session into a hacker's dream, we go over design and implementation practices you can follow to build an application with virtually limitless scalability on AWS within an hour. We even share insights and secret tips on how to make the most out of one of the services released during the morning keynote.

TRANSCRIPT

@ksshams @swami_79

SPOT 401 - Leading the NoSQL Revolution:

under the covers of Distributed Systems @ scale

@ksshams @swami_79

what are we covering?

The evolution of large scale

distributed systems @ Amazon from

the 90’s to today

The lessons we learned and insights you can employ in your own distributed systems

@ksshams @swami_79

let’s start with a story about a little

company called amazon.com

@ksshams @swami_79

once upon a time...

(in 2000)

episode 1

@ksshams @swami_79

a thousand miles

away...

(seattle)

@ksshams @swami_79

amazon.com - a rapidly growing Internet based retail

business relied on relational databases

@ksshams @swami_79

we had 1000s of independent services

@ksshams @swami_79

each service managed its own state in RDBMs

@ksshams @swami_79

RDBMs are actually kind of cool

@ksshams @swami_79

first of all... SQL!!

@ksshams @swami_79

so it is easier to query..

@ksshams @swami_79

easier to learn

@ksshams @swami_79

they are as versatile as a swiss army knife

complex queries

key-value access

transactions analytics

@ksshams @swami_79

RDBMs are *very* similar to

Swiss Army Knives

@ksshams @swami_79

but sometimes.. swiss army knifes..

can be more than what you bargained for

@ksshams @swami_79

partitioning

easy re-

partitioning HARD..

@ksshams @swami_79

so we bought

bigger boxes...

@ksshams @swami_79

Q4 was hard-work at Amazon

benchmark new

hardware

migrate to new

hardware

repartition databases

pray ...

@ksshams @swami_79

RDBMs availability challenges..

@ksshams @swami_79

then.. (in 2005)

episode 2

@ksshams @swami_79

amazon dynamo predecessor to

dynamoDB

specialist tool :

•limited querying capabilities

•simpler consistency

replicated DHT with consistent hashing

optimistic replication

“sloppy quorum”

anti-entropy mechanism

object versioning

@ksshams @swami_79

dynamo had many benefits • higher availability

• we traded it off for eventual consistency

• incremental scalability

• no more repartitioning

• no need to architect apps for peak

• just add boxes

• simpler querying model ==>> predictable performance

@ksshams @swami_79

but dynamo was not perfect...

lacked strong consistency

@ksshams @swami_79

but dynamo was not perfect...

scaling was easier, but...

@ksshams @swami_79

but dynamo was not perfect...

steep learning curve

@ksshams @swami_79

but dynamo was not perfect...

dynamo was a product ... ==>> not a service...

@ksshams @swami_79

then.. (in 2012)

episode 3

@ksshams @swami_79

ADMIN

“Even though we have years of experience with large, complex

NoSQL architectures, we are happy to be finally out of the

business of managing it ourselves.” - Don MacAskill, CEO

• NoSQL database

• fast & predictable performance

• seamless scalability

• easy administration

DynamoDB

@ksshams @swami_79

services, services, services

@ksshams @swami_79

amazon.com’s experience with services

@ksshams @swami_79

how do you create a successful service?

@ksshams @swami_79

with great services, comes great responsibility

@ksshams @swami_79

Architect

Customer

@ksshams @swami_79

DynamoDB Goals and Philosophies

never compromise on durability scale is our

problem easy to use

scale in rps consistent and low

latencies

@ksshams @swami_79

how to build these large scale services?

@ksshams @swami_79

don’t compromise on durability…

@ksshams @swami_79

don’t compromise on… availability

@ksshams @swami_79

plan for success, plan for scalability

@ksshams @swami_79

@ksshams @swami_79

Fault tolerant design is key.. • Everything fails all the time

• Planning for failures is not easy

• How do you ensure your recovery strategies work correctly?

@ksshams @swami_79

Byzantine General Problem

@ksshams @swami_79

A simple 2-way replication system of a traditional database…

Primary Standby

Writes

@ksshams @swami_79 @ksshams @swami_79

P S

S is dead, need to trigger new

replica

P is dead, need to promote myself

P’

@ksshams @swami_79 @ksshams @swami_79

Improved Replication: Quorum

Replica

Replica

Writes Replica

Quorum: Successful write on a majority

Not so easy..

Replica B

Replica C

Writes from

client A Replica A

Replica D

New member in the

group

Should I continue to serve reads?

Should I start a new quorum?

Replica E Replica F

Reads and

Writes from

client B

Classic Split Brain Issue in Replicated systems leading to lost writes!

@ksshams @swami_79

Building correct distributed systems is not straight forward..

• How do you handle replica failures?

• How do you ensure there is not a parallel

quorum?

• How do you handle partial failures of replicas?

• How do you handle concurrent failures?

correctness is hard, but necessary

Formal Methods

Formal Methods

to minimize bugs, we must have a precise description of the design

Formal Methods

code is too detailed

how would you express partial failures or concurrency?

design documents and diagrams are vague & imprecise

Formal Methods

law of large numbers is your friend,

so design for scale

until you hit large numbers

@ksshams @swami_79

TLA+ to the rescue?

@ksshams @swami_79

PlusCal

@ksshams @swami_79

formal methods are necessary

but not sufficient..

@ksshams @swami_79

customer

@ksshams @swami_79

forget to test - no, serious don’t ly

embrace failure and don’t be surprised

simulate failures at unit

test level

fault injection testing

datacenter testing

network brown out testing

scale testing

testing is a lifelong journey

@ksshams @swami_79

testing is necessary but not sufficient..

@ksshams @swami_79

Architect

Customer

@ksshams @swami_79

release cycle

gamma simulate real

world

one box does it work?

phased deployment

treading lightly

monitor does it still

work?

@ksshams @swami_79

Monitor customer behavior

@ksshams @swami_79

measuring customer experience is key

don’t be satisfied by average - look at

99 percentile

@ksshams @swami_79

understand the scaling dimensions

@ksshams @swami_79

understand how your service will be abused

@ksshams @swami_79

let’s see these rules in action through a true story

@ksshams @swami_79

we were building distributed systems all over

amazon.com

@ksshams @swami_79

we needed a uniform and correct way to do

consensus..

@ksshams @swami_79

so we built a paxos lock library service

@ksshams @swami_79

such a service is so much more useful than just leader election.. it became a distributed

state store

@ksshams @swami_79

such a service is so much more useful than just leader election.. or a distributed state

store

wait wait.. you’re telling me if I poll,

I can detect node failure?

@ksshams @swami_79

we acted quickly - and scaled up our entire fleet

with more nodes

doh!!!!

we slowed consensus...

@ksshams @swami_79

understand the scaling dimensions

& scale them independently...

@ksshams @swami_79

State Store

a lock service has 3 components..

@ksshams @swami_79

State Store

they must be scaled independently..

@ksshams @swami_79

State Store

they must be scaled independently..

@ksshams @swami_79

State Store

they must be scaled independently..

Let’s Go Over The demo from this morning

stream ingestion

stream ingestion

stream ingestion

Real-time tweet analytics using DynamoDB

• Stream from Kinesis to DynamoDB

• What data do want in real-time? • (per-second, top words)

• How does DynamoDB help? • Atomic counters (per-word counts in that second)

• Indexed queries (top N word-counts in that second

Time Word Count

2013-10-13T12:00 Earth 9

2013-10-13T12:00 Mars 10

2013-10-13T12:00 Pluto 5

2013-10-13T12:03 Earth 8

Time Count Word

2013-10-13T12:00 5 Pluto

2013-10-13T12:00 9 Earth

2013-10-13T12:00 10 Mars

2013-10-13T12:03 8 Earth

WordCount Table Local Secondary Index

DynamoDB cost: $0.25 / hr

stream ingestion

Aggregate queries using Redshift

• Simple Redshift connector (buffer files, store in s3, call copy command)

• Manifest copy connector • 2 streams

• transaction table for deduplication

• manifest copy

Right tool for right job…

• Canal -> DynamoDB -> Redshift -> Glacier…

You are not done yet..

• Listen to customer feedback

• Iterate..

Example: DynamoDB

• Start with immediate needs of reliable, super scalable, low latency datastore

• Iterate • Developers wanted flexible query: Local Secondary Indexes

• Developers wanted parallel loads: Parallel Scans

• Mobile developers wanted direct access to their datastore: Fine-grained Access Control

• Mobile developers wanted geo-awareness: Geospatial library

• Developers wanted DynamoDB on their laptop: DynamoDB Local

• Developers wanted richer query: Global Secondary Indexes

• We will continue to innovate..

@ksshams @swami_79 @ksshams @swami_79

Sacred Tenets in Distributed Systems

plan for success – plan for scalability

don’t compromise durability for performance

plan for failures - fault -tolerance is key consistent performance

is important release - think of blast radius

insist on correctness

@ksshams @swami_79

understand scaling

dimensions

observe how service is

used

scalability

over features

strive for

correctness

relentlessly test

monitor like a hawk

@ksshams @swami_79

Please give us your feedback on this

presentation

Don’t miss SPOT 201!!!

SPOT 401

@ksshams @swami_79

top related