o'reilly webinar: simplicity scales - big data
TRANSCRIPT
SIMPLICITY SCALES
Big data application management & operations
SQL
SQL → NoSQL
SQL → NoSQL → NewSQL
CHANGE IN ARCHITECTURAL DESIGN
Virtualization: many apps on one server
Aggregation: one app across many servers
SMALL APPS / BIG SERVERS / ONE LOCATION
BIG APPS / COMMODITY SERVERS / MANY LOCATIONS
In 2014, 20% of enterprise data
projects add distributed processes
into production
THE BENEFITS OF RIAK
Riak is an operationally friendly database that is:
• Fault-tolerant
• Highly-available
• Scalable
• Self-healing
THE PROPERTIES OF A DISTRIBUTED DB
Riak is a multi-model database that is:
• Open Source & Commercial
• Distributed
• Masterless
• Eventually Consistent
Why THOSE properties?
This is NOT about Riak.
This is about design decisions in distributed systems.
This IS about Riak.
And learning from Basho’s architectural decisions.
DISTRIBUTED SYSTEMS – A DEFINITION
“A distributed system is a software system in which components located on network computers communicate and coordinate their actions by passing messages. The components interact with each other in order to achieve a common goal.”
--Wikipedia
DISTRIBUTED SYSTEMS – A DEFINITION
“A distributed system is one in which the failure of a computer you didn't even know existed can render your own computer unusable”
--Leslie Lamport
DISTRIBUTED SYSTEMS – A DEFINITION
“Everything works at small scale. Understand failure modalities to understand your realities.”
--Tyler Hannan
THINKING DISTRIBUTED
What we consider, when we think distributed:
• Availability
• Fault Tolerance
• Latency
UPTIME IS A POOR METRIC…
Availability
AVAILABILITY
“…widespread underestimation of the specific difficulties of size seems one of the major underlying causes of current software failure.”
--Edsger W. Dijkstra
THE CAP THEOREM
HARVEST AND YIELD
• Harvest: a fraction (data available / complete data)
• Yield: a probability (queries completed / queries requested)
Failure causes a known, linear reduction in one of these.
UNDERSTANDING “CONSISTENCY”
HARVEST = Data Available / Total Dataset
YIELD = Queries Completed / Queries Offered
HARVEST AND YIELD
Traditional design demands 100% HARVEST…
but success of modern applications is often measured in YIELD.
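The two definitions above can be sketched as a pair of helpers; the shard counts below are illustrative numbers, not figures from the webinar:

```python
# Fox & Brewer's harvest/yield metrics, as defined on the slides above.
def yield_fraction(queries_completed: int, queries_offered: int) -> float:
    return queries_completed / queries_offered

def harvest(data_available: int, total_dataset: int) -> float:
    return data_available / total_dataset

# Illustrative scenario: 1 of 4 equal shards is down. The system can either
# fail the queries touching that shard (yield drops) or answer every query
# with partial data (harvest drops).
print(yield_fraction(75, 100))  # 0.75: fail affected queries, answers stay complete
print(harvest(3, 4))            # 0.75: answer all queries with 3/4 of the data
```

Either way the failure produces the linear reduction the slide describes; the design decision is which metric absorbs it.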
HARVEST AND YIELD: USE CASE
RELATIONAL AVAILABILITY
[Diagram: a primary coordinates writes/reads across three replicas]
RELATIONAL AVAILABILITY
[Diagram: the primary fails (X); writes/reads are unavailable while coordination recovers]
RIAK AVAILABILITY
Riak has a masterless architecture in which every node in a cluster is capable of serving read and write requests.
Each node = 1/Nth the data = 1/Nth the performance
RIAK AVAILABILITY
Availability Requires Scalability
RELATIONAL SCALABILITY
Sharding: A–K | L–P | Q–Z
• Designed to scale vertically
• Cost of vertical scaling
ADD CAPACITY AS NEEDED
[Diagram: a ring of six nodes, Node 0 through Node 5]
Designed for horizontal scale
Deployed on commodity hardware
Consistent hashing
PERFECTION IS UNATTAINABLE…
FAILURES WILL HAPPEN
Fault Tolerance
FAULT TOLERANCE
How many hosts/replicas do you need to survive “F” failures?
• F + 1 – the fundamental minimum
• 2F + 1 – a majority remains alive
• 3F + 1 – Byzantine fault tolerance
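The three thresholds above can be captured in a tiny helper; the model names are my own labels, not terminology from the talk:

```python
def replicas_needed(f: int, model: str) -> int:
    """Minimum hosts/replicas to survive F failures under each model."""
    return {
        "crash": f + 1,          # at least one copy survives F crashes
        "majority": 2 * f + 1,   # a majority remains alive after F failures
        "byzantine": 3 * f + 1,  # tolerates F arbitrary (Byzantine) faults
    }[model]

print(replicas_needed(1, "majority"))   # 3
print(replicas_needed(2, "byzantine"))  # 7
```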
NAÏVE HASHING
NH(Ka) = 0
NH(Kb) = 1
NH(Kc) = 2
NH(Kd) = 0
…
Node # = HASH(KEY) % NUM_NODES
NAÏVE HASHING
Node 0: Ka Kd Kg Kj Km Kp
Node 1: Kb Ke Kh Kk Kn Kq
Node 2: Kc Kf Ki Kl Ko Kr
NAÏVE HASHING
Node 0: Ka Ke Ki Km Kq
Node 1: Kb Kf Kj Kn Kr
Node 2: Kc Kg Kk Ko
Node 3: Kd Kh Kl Kp
NAÏVE HASHING
• K = # of keys
• NN = # of nodes
As NN grows, the factor (NN – 1)/NN approaches 1, so essentially ALL keys move:
K * (NN – 1) / NN => K
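A quick simulation of the formula above; md5 stands in for the slide's HASH function so runs are reproducible, and the key names are made up:

```python
import hashlib

def nh(key: str, num_nodes: int) -> int:
    # Node # = HASH(KEY) % NUM_NODES
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return h % num_nodes

keys = [f"K{i}" for i in range(10_000)]
before = {k: nh(k, 3) for k in keys}   # 3-node cluster
after = {k: nh(k, 4) for k in keys}    # grow to 4 nodes

moved = sum(1 for k in keys if before[k] != after[k])
print(f"{moved / len(keys):.0%} of keys moved")  # ~75%, i.e. (NN - 1)/NN
```

A key stays put only when its hash gives the same remainder mod 3 and mod 4, which happens for just 3 of every 12 hash values, hence roughly 75% of keys relocate.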
CONSISTENT HASHING
• # of partitions remains CONSTANT
• Key always maps to the SAME partition
• Nodes own partitions
• Partitions contain keys
• An extra level of indirection
Partition # = HASH(KEY) % Partitions
CONSISTENT HASHING
Node 0 (P1 P4 P7): Ka Kd Kg Kj Km Kp
Node 1 (P2 P5 P8): Kb Ke Kh Kk Kn Kq
Node 2 (P3 P6 P9): Kc Kf Ki Kl Ko Kr
CONSISTENT HASHING
[Diagram: a fourth node joins; whole partitions are handed off to it, so only the keys inside those partitions move]
CONSISTENT HASHING
• K = # of keys
• NN = # of nodes
• Q = # of partitions (constant)
Each partition holds ~K/Q keys, so a handoff moves only those K/Q keys; a new node claims ~Q/NN partitions, so roughly K/NN keys move in total, instead of nearly all K:
(Q/NN) * (K/Q) => K/NN
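The same simulation with a fixed partition ring; the partition count and the "new node claims every 4th partition" rule below are simplifications of Riak's actual ring-claim algorithm, for illustration only:

```python
import hashlib

NUM_PARTITIONS = 64  # fixed ring size; a key always maps to the same partition

def partition(key: str) -> int:
    # Partition # = HASH(KEY) % Partitions
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return h % NUM_PARTITIONS

# 3-node cluster: partitions dealt out round-robin.
assignment = {p: p % 3 for p in range(NUM_PARTITIONS)}

# Add a 4th node: it claims every 4th partition; no other partition moves.
new_assignment = dict(assignment)
for p in range(0, NUM_PARTITIONS, 4):
    new_assignment[p] = 3

keys = [f"K{i}" for i in range(10_000)]
moved = sum(1 for k in keys
            if assignment[partition(k)] != new_assignment[partition(k)])
print(f"{moved / len(keys):.0%} of keys moved")  # ~25% (~K/NN), vs ~75% naive
```

Because keys map to partitions, not nodes, growing the cluster relocates whole partitions and leaves every other key untouched.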
RIAK AVAILABILITY
[Diagram: the Riak ring with Node 0, Node 1, and Node 2, each serving reads and writes]
BRIEF MILLISECONDS. PHOTONS FLYING THROUGH GLASS. TIME STOPS FOR NO ONE.
Latency
UNDERSTANDING LATENCY
299,792,458 meters/second in a vacuum
LATENCY: GOOGLE’S BIGTABLE
95th percentile: 24 ms
99.9th percentile: 994 ms
REDUNDANCY REDUCES LATENCY
IF response time > 10 ms, THEN send a 2nd request
• ~5% increase in total requests
• 99.9th percentile latency falls to 50 ms
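The IF/THEN rule above can be sketched with asyncio; the replica count, timings, and the 5% slow-reply rate are illustrative assumptions, not measurements from the talk:

```python
import asyncio
import random

# Hypothetical backend: most replies are fast, a few are very slow.
async def query_replica(replica_id: int) -> str:
    delay = 0.5 if random.random() < 0.05 else 0.01
    await asyncio.sleep(delay)
    return f"reply from replica {replica_id}"

async def hedged_request(hedge_after: float = 0.1) -> str:
    # Send the first request; if no reply within `hedge_after` seconds,
    # send a second ("hedged") request and take whichever finishes first.
    first = asyncio.create_task(query_replica(0))
    try:
        # shield() keeps the first request running if the timeout fires.
        return await asyncio.wait_for(asyncio.shield(first), hedge_after)
    except asyncio.TimeoutError:
        second = asyncio.create_task(query_replica(1))
        done, pending = await asyncio.wait(
            {first, second}, return_when=asyncio.FIRST_COMPLETED
        )
        for task in pending:
            task.cancel()
        return done.pop().result()

result = asyncio.run(hedged_request())
print(result)
```

The extra request is only issued on the slow tail, which is why total request volume rises only a few percent while tail latency collapses.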
UNDERSTANDING LATENCY
Overall latency is determined by latency of the SLOWEST machine.
Get data close to your users.
MULTI-CLUSTER REPLICATION
Replicate data across datacenters or across the world
UNDERSTANDING PERFORMANCE
You get fast read and write performance
What does this mean?
THE PLAN, THE ENVIRONMENT
• How do we measure performance?
• What do we measure when we measure performance?
• basho_bench
• Google Cloud
CLUSTER EXPANSION
NODE FAILURE
THINKING DISTRIBUTED
What we consider, when we think distributed:
• Availability
• Fault Tolerance
• Latency
DISTRIBUTED SYSTEMS – A DEFINITION
“Everything works at small scale. Understand failure modalities to understand your realities.”
--Tyler Hannan
RELATIONAL & NOSQL: What’s the difference?
Relational Database:
• Structured data
• Defined schema
• Tables with rows/columns
• Indexed w/ table joins
• SQL
NoSQL Database:
• Unstructured data
• No pre-defined schema
• Small and large data sets on commodity HW
• Many models: K/V, document store, graph
• Variety of query methods
THE EVOLUTION OF NOSQL
Unstructured Data Platforms
Multi-Model Solutions
Point Solutions
“42% of database decision makers admit they struggle to manage the NoSQL solutions deployed in their environments”
Riak
Spark
COMPLEX TECHNOLOGY STACK
Simplify the Complexity
Ensure High Availability
Scale Horizontally
Enter Basho Data Platform
BASHO DATA PLATFORM?
[Architecture diagram: client applications connect through the Riak KV client, the Basho Redis Proxy, and the Spark client/connector to Riak KV, Redis, and Spark services; within the Basho Data Platform, a Service Manager and the Riak KV Ensemble coordinate leader election along with the read, write, query, and Read/Solr paths]
MANAGING COMPLEXITY AT SCALE
THOUSANDS OF USERS
XfinityTV
MILLIONS OF RECORDS
• Information requested and amended more than 2.6 BILLION times a year
• 42 MILLION Summary Care Records
• 1.3 BILLION prescription messages
BILLIONS OF MOBILE DEVICES
10 BILLION data transactions a day – 150,000 a second
Forecasting 2.8 BILLION locations around the world
Generates 4GB OF DATA every second
Questions? Tyler Hannan (@tylerhannan)
learn more at ricon.io