nosql revolution: under the covers of distributed systems at scale (spot401) | aws re:invent 2013
DESCRIPTION
The Dynamo paper started a revolution in distributed systems. The contributions from this paper are still impacting the design and practices of some of the world's largest distributed systems, including those at Amazon.com and beyond. Building distributed systems is hard, but our goal in this session is to simplify the complexity of this topic to empower the hacker in you! Have you been bitten by the eventual consistency bug lately? We show you how to tame eventual consistency and make it a great scaling asset. As you scale up, you must be ready to deal with node, rack, and data center failure. We share insights on how to limit the blast radius of the individual components of your system, battle-tested techniques for simulating failures (network partitions, data center failure), and how we used core distributed systems fundamentals to build highly scalable, performant, durable, and resilient systems. Come watch us uncover the secret sauce behind Amazon DynamoDB, Amazon SQS, Amazon SNS, and the fundamental tenets that define them as Internet scale services. To turn this session into a hacker's dream, we go over design and implementation practices you can follow to build an application with virtually limitless scalability on AWS within an hour. We even share insights and secret tips on how to make the most out of one of the services released during the morning keynote.
TRANSCRIPT
@ksshams @swami_79
SPOT 401 - Leading the NoSQL Revolution:
under the covers of Distributed Systems @ scale
what are we covering?
• The evolution of large scale distributed systems @ Amazon from the 90’s to today
• The lessons we learned and insights you can employ in your own distributed systems
let’s start with a story about a little company called amazon.com
once upon a time...
(in 2000)
episode 1
a thousand miles away...
(seattle)
amazon.com - a rapidly growing Internet-based retail business - relied on relational databases
we had 1000s of independent services
each service managed its own state in RDBMSs
RDBMSs are actually kind of cool
first of all... SQL!!
so it is easier to query..
easier to learn
they are as versatile as a swiss army knife
• complex queries
• key-value access
• transactions
• analytics
RDBMSs are *very* similar to Swiss Army Knives
but sometimes.. swiss army knives.. can be more than what you bargained for
partitioning: easy
re-partitioning: HARD..
so we bought bigger boxes...
Q4 was hard work at Amazon
• benchmark new hardware
• migrate to new hardware
• repartition databases
• pray ...
RDBMS availability challenges..
then.. (in 2005)
episode 2
amazon dynamo: predecessor to dynamoDB
specialist tool:
•limited querying capabilities
•simpler consistency
replicated DHT with consistent hashing
optimistic replication
“sloppy quorum”
anti-entropy mechanism
object versioning
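The "replicated DHT with consistent hashing" item can be illustrated with a minimal sketch. It is not Dynamo's implementation: the class name `HashRing`, the use of MD5, the 64 virtual nodes per host, and the replication factor of 3 are all assumptions chosen for illustration.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a position on the ring (MD5 is used only for spread)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Hypothetical consistent-hash ring with virtual nodes."""

    def __init__(self, nodes, vnodes=64):
        # Each physical node owns several points on the ring.
        self._ring = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    def preference_list(self, key, n=3):
        """Walk clockwise from the key's position, collecting n distinct nodes."""
        start = bisect.bisect_right(self._points, _hash(key))
        owners = []
        for step in range(len(self._ring)):
            node = self._ring[(start + step) % len(self._ring)][1]
            if node not in owners:
                owners.append(node)
                if len(owners) == n:
                    break
        return owners
```

Adding a node only claims keys for the new node; keys never move between existing nodes, which is what makes "just add boxes" possible without a full repartition.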
dynamo had many benefits
• higher availability
• we traded it off for eventual consistency
• incremental scalability
• no more repartitioning
• no need to architect apps for peak
• just add boxes
• simpler querying model ==>> predictable performance
but dynamo was not perfect...
• lacked strong consistency
• scaling was easier, but...
• steep learning curve
• dynamo was a product ... ==>> not a service...
then.. (in 2012)
episode 3
ADMIN
“Even though we have years of experience with large, complex NoSQL architectures, we are happy to be finally out of the business of managing it ourselves.” - Don MacAskill, CEO
• NoSQL database
• fast & predictable performance
• seamless scalability
• easy administration
DynamoDB
services, services, services
amazon.com’s experience with services
how do you create a successful service?
with great services, comes great responsibility
Architect
Customer
DynamoDB Goals and Philosophies
• never compromise on durability
• scale is our problem
• easy to use
• scale in rps
• consistent and low latencies
how to build these large scale services?
don’t compromise on durability…
don’t compromise on… availability
plan for success, plan for scalability
Fault tolerant design is key..
• Everything fails all the time
• Planning for failures is not easy
• How do you ensure your recovery strategies work correctly?
Byzantine Generals Problem
A simple 2-way replication system of a traditional database…
Primary Standby
Writes
P: S is dead, need to trigger new replica
S: P is dead, need to promote myself (P’)
Improved Replication: Quorum
Writes: Replica, Replica, Replica
Quorum: Successful write on a majority
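The majority rule on this slide can be sketched as follows. The `Replica` class and `quorum_write` function are hypothetical stand-ins, not part of any Amazon service; real replicas would also need versioning and recovery, which this sketch leaves out.

```python
class Replica:
    """Hypothetical in-memory replica that can be marked unreachable."""

    def __init__(self, up=True):
        self.up = up
        self.data = {}

    def put(self, key, value):
        if not self.up:
            raise ConnectionError("replica unreachable")
        self.data[key] = value


def quorum_write(replicas, key, value):
    """Return True only if a strict majority of replicas acknowledge the write."""
    needed = len(replicas) // 2 + 1
    acks = 0
    for replica in replicas:
        try:
            replica.put(key, value)
            acks += 1
        except ConnectionError:
            continue  # a failed replica simply does not count toward the quorum
    return acks >= needed
```

With three replicas, one failure is tolerated; with two failures the write is rejected rather than silently accepted by a minority.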
Not so easy..
Replica A, Replica B, Replica C, Replica D, Replica E, Replica F
Writes from client A
New member in the group
Should I continue to serve reads?
Should I start a new quorum?
Reads and writes from client B
Classic Split Brain Issue in Replicated systems leading to lost writes!
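One way to see why split brain appears: majorities drawn from a single membership view always share a node, but that guarantee disappears once two sides of a partition hold different views. The sketch below uses an invented scenario (node names and the second view are assumptions) to show both cases.

```python
from itertools import combinations


def has_majority(reachable, membership):
    """True if the reachable nodes form a strict majority of the membership view."""
    return len(set(reachable) & set(membership)) > len(membership) // 2


def majorities_always_overlap(membership):
    """Check that every pair of majority subsets of one view shares a node."""
    size = len(membership) // 2 + 1
    quorums = [set(c) for c in combinations(sorted(membership), size)]
    return all(q1 & q2 for q1 in quorums for q2 in quorums)


# Hypothetical partition. B and C still use the original view.
original_view = {"A", "B", "C"}
side_one = {"B", "C"}

# A wrongly concludes B and C are dead and recruits D and E on its own.
new_view = {"A", "D", "E"}
side_two = {"A", "D", "E"}

both_accept_writes = has_majority(side_one, original_view) and has_majority(
    side_two, new_view
)
```

Here `both_accept_writes` is true while the two quorums share no node, so each side can acknowledge writes the other never sees. Membership changes therefore have to go through the quorum itself.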
Building correct distributed systems is not straightforward..
• How do you handle replica failures?
• How do you ensure there is not a parallel quorum?
• How do you handle partial failures of replicas?
• How do you handle concurrent failures?
correctness is hard, but necessary
Formal Methods
to minimize bugs, we must have a precise description of the design
code is too detailed
how would you express partial failures or concurrency?
design documents and diagrams are vague & imprecise
law of large numbers is your friend, until you hit large numbers
so design for scale
TLA+ to the rescue?
PlusCal
formal methods are necessary but not sufficient..
customer
don’t forget to test - no, seriously
embrace failure and don’t be surprised
• simulate failures at unit test level
• fault injection testing
• datacenter testing
• network brown out testing
• scale testing
testing is a lifelong journey
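Simulating failures at unit test level can be as small as a test double that fails on demand. The names `FlakyStore` and `put_with_retry` below are hypothetical, and the retry policy (three attempts, no backoff) is an assumption made to keep the sketch short.

```python
class FlakyStore:
    """Test double that fails the first `failures` calls, then succeeds."""

    def __init__(self, failures):
        self.failures = failures
        self.calls = 0
        self.data = {}

    def put(self, key, value):
        self.calls += 1
        if self.calls <= self.failures:
            raise TimeoutError("injected fault")
        self.data[key] = value


def put_with_retry(store, key, value, attempts=3):
    """Retry a write up to `attempts` times; report whether it finally succeeded."""
    for _ in range(attempts):
        try:
            store.put(key, value)
            return True
        except TimeoutError:
            continue
    return False
```

A test can then assert both the recovery path (two injected faults, third call succeeds) and the give-up path (more faults than attempts, nothing written).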
testing is necessary but not sufficient..
Architect
Customer
release cycle
• gamma: simulate real world
• one box: does it work?
• phased deployment: treading lightly
• monitor: does it still work?
Monitor customer behavior
measuring customer experience is key
don’t be satisfied by average - look at 99th percentile
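A small worked example shows why the average hides what the slowest customers experience. The latency samples are invented for illustration, and the nearest-rank percentile definition is one of several in common use.

```python
import math
import statistics


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample with at least p% at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


# Invented data: 98 fast requests and 2 very slow ones, in milliseconds.
latencies_ms = [10] * 98 + [2000] * 2

mean_ms = statistics.mean(latencies_ms)
p50_ms = percentile(latencies_ms, 50)
p99_ms = percentile(latencies_ms, 99)
```

The mean lands just under 50 ms and the median at 10 ms, while the 99th percentile is 2000 ms, which is the number the unluckiest customers actually see.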
understand the scaling dimensions
understand how your service will be abused
let’s see these rules in action through a true story
we were building distributed systems all over amazon.com
we needed a uniform and correct way to do consensus..
so we built a paxos lock library service
such a service is so much more useful than just leader election.. it became a distributed state store
such a service is so much more useful than just leader election.. or a distributed state store
wait wait.. you’re telling me if I poll, I can detect node failure?
we acted quickly - and scaled up our entire fleet with more nodes
doh!!!!
we slowed consensus...
understand the scaling dimensions & scale them independently...
State Store
a lock service has 3 components..
they must be scaled independently..
Let’s go over the demo from this morning
stream ingestion
Real-time tweet analytics using DynamoDB
• Stream from Kinesis to DynamoDB
• What data do we want in real-time?
• (per-second, top words)
• How does DynamoDB help?
• Atomic counters (per-word counts in that second)
• Indexed queries (top N word-counts in that second)
WordCount Table
Time              Word   Count
2013-10-13T12:00  Earth  9
2013-10-13T12:00  Mars   10
2013-10-13T12:00  Pluto  5
2013-10-13T12:03  Earth  8
Local Secondary Index
Time              Count  Word
2013-10-13T12:00  5      Pluto
2013-10-13T12:00  9      Earth
2013-10-13T12:00  10     Mars
2013-10-13T12:03  8      Earth
DynamoDB cost: $0.25 / hr
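The counter-and-index pattern from the demo can be mimicked in memory to show the data flow. This is a simulation under stated assumptions, not DynamoDB client code: `WordCountTable`, `increment`, and `top_words` are hypothetical names. In DynamoDB itself, the increment would correspond to an UpdateItem call with an ADD action, and the top-N read to a Query against the index in descending order.

```python
from collections import defaultdict


class WordCountTable:
    """In-memory stand-in for the WordCount table and its index."""

    def __init__(self):
        self._counts = defaultdict(int)  # (time, word) -> count

    def increment(self, time, word, by=1):
        """Analogue of an atomic ADD on the Count attribute."""
        self._counts[(time, word)] += by
        return self._counts[(time, word)]

    def top_words(self, time, n):
        """Analogue of querying the index by Time, sorted by Count descending."""
        rows = [
            (count, word)
            for (t, word), count in self._counts.items()
            if t == time
        ]
        rows.sort(reverse=True)
        return [(word, count) for count, word in rows[:n]]


table = WordCountTable()
for word, count in [("Earth", 9), ("Mars", 10), ("Pluto", 5)]:
    table.increment("2013-10-13T12:00", word, by=count)
table.increment("2013-10-13T12:03", "Earth", by=8)
```

Calling `table.top_words("2013-10-13T12:00", 2)` returns Mars and Earth, matching the order the index rows would be read in from the high-count end.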
stream ingestion
Aggregate queries using Redshift
• Simple Redshift connector (buffer files, store in s3, call copy command)
• Manifest copy connector
• 2 streams
• transaction table for deduplication
• manifest copy
Right tool for right job…
• Canal -> DynamoDB -> Redshift -> Glacier…
You are not done yet..
• Listen to customer feedback
• Iterate..
Example: DynamoDB
• Start with immediate needs of reliable, super scalable, low latency datastore
• Iterate
• Developers wanted flexible query: Local Secondary Indexes
• Developers wanted parallel loads: Parallel Scans
• Mobile developers wanted direct access to their datastore: Fine-grained Access Control
• Mobile developers wanted geo-awareness: Geospatial library
• Developers wanted DynamoDB on their laptop: DynamoDB Local
• Developers wanted richer query: Global Secondary Indexes
• We will continue to innovate..
Sacred Tenets in Distributed Systems
• plan for success – plan for scalability
• don’t compromise durability for performance
• plan for failures - fault-tolerance is key
• consistent performance is important
• release - think of blast radius
• insist on correctness
• understand scaling dimensions
• observe how service is used
• scalability over features
• strive for correctness
• relentlessly test
• monitor like a hawk
Please give us your feedback on this presentation
Don’t miss SPOT 201!!!
SPOT 401