flash economics and lessons learned from operating low latency platforms at high throughput with...

© 2014 Aerospike, Inc. All rights reserved. Confidential. | GITPRO – April 12, 2014 | 1 Aerospike aer . o . spike [air-oh- spahyk] noun, 1. tip of a rocket that enhances speed and stability FLASH ECONOMICS AND LESSONS LEARNED FROM OPERATING LOW LATENCY AND HIGH TPS PLATFORMS IN-MEMORY NOSQL DR. V. SRINIVASAN FOUNDER, VP ENGINEERING & OPERATIONS GITPRO APRIL 12, 2014

Upload: aerospike-inc

Post on 15-Jan-2015

TRANSCRIPT

Page 1: Flash Economics and Lessons learned from operating low latency platforms at high throughput with Aerospike NoSQL

© 2014 Aerospike, Inc. All rights reserved. Confidential. | GITPRO – April 12, 2014 | 1

Aerospike
aer·o·spike [air-oh-spahyk] noun, 1. tip of a rocket that enhances speed and stability

FLASH ECONOMICS AND LESSONS LEARNED FROM OPERATING LOW LATENCY AND HIGH TPS PLATFORMS

IN-MEMORY NOSQL

DR. V. SRINIVASAN
FOUNDER, VP ENGINEERING & OPERATIONS

GITPRO
APRIL 12, 2014

Page 2: Flash Economics and Lessons learned from operating low latency platforms at high throughput with Aerospike NoSQL

REQUIREMENTS FOR INTERNET ENTERPRISES

1. Know who the Interaction is with
■ Monitor 200+ Million US Consumers, 5+ Billion mobile devices and sensors

2. Determine intent based on current context
■ Page views, search terms, game state, last purchase, friends list, ads served, location

3. Respond now, use big data for more accurate decisions
■ Display the most relevant Ad
■ Recommend the best product
■ Deliver the richest gaming experience
■ Eliminate fraud…

4. Service can NEVER go down!

Page 3: Flash Economics and Lessons learned from operating low latency platforms at high throughput with Aerospike NoSQL

INTERNET ENTERPRISES

RETAIL

E-COMMERCE

MOBILE

OMNICHANNEL

GAMING

WEB

VIDEO

SOCIAL

SEARCH

EMAIL

Page 4: Flash Economics and Lessons learned from operating low latency platforms at high throughput with Aerospike NoSQL

DATABASE LANDSCAPE

STRUCTURED DATA

TRANSACTIONS (OLTP)
Response time: Seconds
Gigabytes of data
Balanced Reads/Writes

ANALYTICS (OLAP)
Response time: Seconds
Terabytes of data
Read Intensive

UNSTRUCTURED DATA

BIG DATA ANALYTICS
Response time: Hours, Weeks
TB to PB
Read Intensive

REAL-TIME BIG DATA
Real-time Transactions
Response time: < 10 ms
1-20 TB
Balanced Reads/Writes
24x7x365 Availability

Page 5: Flash Economics and Lessons learned from operating low latency platforms at high throughput with Aerospike NoSQL

Aerospike recognized as the only company in the Visionaries Quadrant in Gartner's Magic Quadrant for Operational Database Management Systems

Gartner, Magic Quadrant for Operational Database Management Systems, Donald Feinberg et al., October 23, 2013

This graphic was published by Gartner, Inc. as part of a larger research document and should be evaluated in the context of the entire document. The Gartner document is available at www.aerospike.com. Gartner does not endorse any vendor, product or service depicted in its research publications, and does not advise technology users to select only those vendors with the highest ratings. Gartner research publications consist of the opinions of Gartner's research organization and should not be construed as statements of fact. Gartner disclaims all warranties, expressed or implied, with respect to this research, including any warranties of merchantability or fitness for a particular purpose.

Page 6: Flash Economics and Lessons learned from operating low latency platforms at high throughput with Aerospike NoSQL

TYPICAL REAL-TIME DATABASE DEPLOYMENT

MILLIONS OF CONSUMERS, BILLIONS OF DEVICES

Components: APP SERVERS, AEROSPIKE CLUSTER, RDBMS, DATA WAREHOUSE

Data flows: WRITE REAL-TIME CONTEXT, READ RECENT CONTENT, TRANSACTIONS, WRITE CONTEXT, SEGMENTS

PROFILE STORE: Cookies, email, deviceID, IP address, location, segments, clicks, likes, tweets, search terms...

REAL-TIME ANALYTICS: Best sellers, top scores, trending tweets

BATCH ANALYTICS: Discover patterns, segment data: location patterns, audience affinity

Page 7: Flash Economics and Lessons learned from operating low latency platforms at high throughput with Aerospike NoSQL

KEY CHALLENGES

1. Handle extremely high rates of read/write transactions over persistent data

2. Avoid hot spots to maintain tight latency SLAs

3. Provide immediate consistency with replication

4. Ensure long running tasks do not slow down transactions

5. Scale linearly as data sizes and workloads increase

6. Add capacity with no service interruption

Page 8: Flash Economics and Lessons learned from operating low latency platforms at high throughput with Aerospike NoSQL

SYSTEM ARCHITECTURE FOR 100% UPTIME

Page 9: Flash Economics and Lessons learned from operating low latency platforms at high throughput with Aerospike NoSQL

SHARED-NOTHING SYSTEM: 100% DATA AVAILABILITY

■ Every node in a cluster is identical, handles both transactions and long running tasks

■ Data is replicated synchronously with immediate consistency within the cluster

■ Data is replicated asynchronously across data centers

[Diagram label: OHIO Data Center]

Page 10: Flash Economics and Lessons learned from operating low latency platforms at high throughput with Aerospike NoSQL

ROBUST DHT TO ELIMINATE HOT SPOTS
How Data Is Distributed (Replication Factor 2)

■ Every key is hashed into a 20 byte (fixed length) string using the RIPEMD160 hash function

■ This hash + additional data (fixed 64 bytes) are stored in RAM in the index

■ Some bits from this hash value are used to compute the partition id

■ There are 4096 partitions

■ Partition id maps to node id based on cluster membership

Example key:  cookie-abcdefg-12345678
Example hash: 182023kh15hh3kahdjsh

Partition ID   Master node   Replica node
…              1             4
1820           2             3
1821           3             2
4096           4             1
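
The key-to-node mapping described on this slide can be sketched roughly as follows. This is an illustration under stated assumptions, not Aerospike's actual implementation: the slide only says "some bits" of the hash select the partition, so the choice of bits in `partition_id` is a guess, and the placement rule in `replica_list` (ranking nodes by a hash of node name and partition) is a hypothetical stand-in for whatever the cluster derives from its membership list. The SHA-1 fallback exists only so the sketch runs on Python builds whose OpenSSL does not ship RIPEMD-160.

```python
import hashlib

N_PARTITIONS = 4096  # from the slide; 4096 == 2**12


def key_digest(key: bytes) -> bytes:
    """Return a 20-byte digest of the key.

    Uses RIPEMD-160 when the local OpenSSL build provides it; otherwise
    falls back to SHA-1 (also 20 bytes) purely for portability.
    """
    try:
        return hashlib.new("ripemd160", key).digest()
    except ValueError:  # algorithm not available in this build
        return hashlib.sha1(key).digest()


def partition_id(digest: bytes) -> int:
    # Assumption: use 12 bits taken from the first two digest bytes.
    return int.from_bytes(digest[:2], "little") % N_PARTITIONS


def replica_list(pid: int, nodes: list[str], replication_factor: int = 2) -> list[str]:
    """Hypothetical placement rule: rank nodes by a hash of (node, partition).

    Anyone holding the same membership list computes the same ranking;
    the first entry acts as master, the next as replica.
    """
    def rank(node: str) -> bytes:
        return hashlib.sha256(f"{node}:{pid}".encode()).digest()

    return sorted(nodes, key=rank)[:replication_factor]


if __name__ == "__main__":
    nodes = ["node1", "node2", "node3", "node4"]
    pid = partition_id(key_digest(b"cookie-abcdefg-12345678"))
    master, replica = replica_list(pid, nodes)
    print(pid, master, replica)
```

Because the hash spreads keys uniformly over the 4096 partitions and each partition is placed independently, no single node ends up owning a disproportionate share of keys, which is the "no hot spots" property the slide refers to.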

Page 11: Flash Economics and Lessons learned from operating low latency platforms at high throughput with Aerospike NoSQL

REAL-TIME PRIORITIZATION TO MEET SLA

Writing with Immediate Consistency

1. Write sent to row master

2. Latch against simultaneous writes

3. Apply write to master memory and replica memory synchronously

4. Queue operations to disk

5. Signal completed transaction (optional storage commit wait)

6. Master applies conflict resolution policy (rollback/rollforward)

[Diagram: master, replica]

Adding a Node

1. Cluster discovers new node via gossip protocol

2. Paxos vote determines new data organization

3. Partition migrations scheduled

4. When a partition migration starts, write journal starts on destination

5. Partition moves atomically

6. Journal is applied and source data deleted

[Diagram: transactions continue]
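
Steps 2 to 5 of the write path above can be sketched as a toy in-process model. Everything here is an assumption made for illustration: `StoreNode`, `write_as_master`, and the per-key lock are hypothetical names, two Python objects stand in for two servers, and step 6 (conflict resolution) is left out entirely.

```python
import queue
import threading
from collections import defaultdict


class StoreNode:
    """Toy stand-in for one cluster node (hypothetical, in-process only)."""

    def __init__(self, name: str):
        self.name = name
        self.memory = {}                  # in-memory copy of the records
        self.disk_queue = queue.Queue()   # writes waiting to be flushed to storage
        self._latches = defaultdict(threading.Lock)  # one latch per key
        self._latches_guard = threading.Lock()

    def _latch(self, key: str) -> threading.Lock:
        # Guard the dict itself so two threads never create two latches for one key.
        with self._latches_guard:
            return self._latches[key]

    def apply(self, key: str, value: bytes) -> None:
        self.memory[key] = value             # step 3: apply to memory
        self.disk_queue.put((key, value))    # step 4: queue the operation to disk

    def write_as_master(self, key: str, value: bytes, replica: "StoreNode") -> bool:
        with self._latch(key):               # step 2: latch against simultaneous writes
            self.apply(key, value)           # master memory
            replica.apply(key, value)        # replica memory, before acknowledging
        return True                          # step 5: signal completed transaction


if __name__ == "__main__":
    master, replica = StoreNode("master"), StoreNode("replica")
    master.write_as_master("cookie-abcdefg-12345678", b"profile-v1", replica)
    print(master.memory == replica.memory)
```

The point of the sketch is the ordering: the acknowledgement is returned only after both memory copies are updated, while the flush to storage is decoupled through a queue.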

Page 12: Flash Economics and Lessons learned from operating low latency platforms at high throughput with Aerospike NoSQL

INTELLIGENT CLIENT TO MAKE APPS SIMPLER
Shield Applications from the Complexity of the Cluster

■ Implements Aerospike API
  ■ Optimistic row locking
  ■ Optimized binary protocol

■ Cluster tracking
  ■ Learns about cluster changes, partition map
  ■ Gossip protocol

■ Transaction semantics
  ■ Global transaction ID
  ■ Retransmit and timeout

■ Linear scale
  ■ No extra hop
  ■ No load balancers

Page 13: Flash Economics and Lessons learned from operating low latency platforms at high throughput with Aerospike NoSQL

FLASH OPTIMIZED HIGH PERFORMANCE

OTHER DATABASE
Storage path: OS FILE SYSTEM, PAGE CACHE, BLOCK INTERFACE, SSD / HDD
“Ask me. I’ll look up the answer and then tell it to you.”

AEROSPIKE FLASH OPTIMIZED IN-MEMORY DATABASE
Storage path: BLOCK INTERFACE or OPEN NVM, SSD
“Ask me and I’ll tell you the answer.”

AEROSPIKE HYBRID MEMORY SYSTEM™

• Direct device access
• Large Block Writes
• Indexes in DRAM
• Highly Parallelized
• Log-structured FS “copy-on-write”
• Fast restart with shared memory

Page 14: Flash Economics and Lessons learned from operating low latency platforms at high throughput with Aerospike NoSQL

FLASH PROVIDES DRAM-LIKE PERFORMANCE WITH MUCH LOWER COMPLEXITY & TCO

Actual customer analysis. Customer requires 500K TPS, 10 TB of storage, with 2x replication factor.

Storage type                DRAM & NoSQL             SSD & DRAM
                            (OTHER DATABASES)
Storage per server          180 GB (196 GB server)   2.4 TB (4 x 700 GB)
TPS per server              500,000                  500,000
Cost per server             $8,000                   $11,000
Servers required            186                      ONLY 14
Server costs                $1,488,000               $154,000
Power per server            0.9 kW                   1.1 kW
Power (2 years,             $352,000                 $32,400
  $0.12 per kWh ave. US)
Maintenance (2 years,       $670,000                 $50,400
  $3,600 per server)
Total                       $2,510,000               $236,800
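
The totals on this slide follow from simple arithmetic on the per-server figures. The sketch below recomputes them, assuming two years means 2 x 365 x 24 hours of continuous operation and that maintenance is the flat $3,600 per server for the whole period; the function and variable names are illustrative only. The slide's figures are these results rounded.

```python
HOURS_2_YEARS = 2 * 365 * 24      # 17,520 hours of continuous operation (assumption)
PRICE_PER_KWH = 0.12              # average US price quoted on the slide
MAINTENANCE_PER_SERVER = 3_600    # dollars per server over the two years


def two_year_tco(servers: int, cost_per_server: float, kw_per_server: float) -> dict:
    """Recompute the slide's cost rows for one configuration."""
    hardware = servers * cost_per_server
    power = servers * kw_per_server * HOURS_2_YEARS * PRICE_PER_KWH
    maintenance = servers * MAINTENANCE_PER_SERVER
    return {
        "hardware": hardware,
        "power": power,
        "maintenance": maintenance,
        "total": hardware + power + maintenance,
    }


dram_only = two_year_tco(servers=186, cost_per_server=8_000, kw_per_server=0.9)
ssd_dram = two_year_tco(servers=14, cost_per_server=11_000, kw_per_server=1.1)

if __name__ == "__main__":
    for label, row in (("DRAM & NoSQL", dram_only), ("SSD & DRAM", ssd_dram)):
        print(label, {name: round(value) for name, value in row.items()})
    print("cost ratio:", round(dram_only["total"] / ssd_dram["total"], 1))
```

Although each flash server costs more and draws more power, the difference in server count dominates every row of the table.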

Page 15: Flash Economics and Lessons learned from operating low latency platforms at high throughput with Aerospike NoSQL

HOT ANALYTICS BY ROW

Page 16: Flash Economics and Lessons learned from operating low latency platforms at high throughput with Aerospike NoSQL

SECONDARY INDEXES IN MEMORY

■ Fast
  ■ Indexes in DRAM, Data on Flash
  ■ No hotspots, Index-Data balanced across the cluster
  ■ Parallel processing across nodes, cores & SSDs

■ Reliable
  ■ Index and Data co-located to manage data migrations and guarantee ACID
  ■ Lock-free MVCC

[Diagram: Client; Server with Secondary Index and Primary Index in DRAM, Record Values on SSD]

Page 17: Flash Economics and Lessons learned from operating low latency platforms at high throughput with Aerospike NoSQL

LOW SELECTIVITY INDEX QUERIES

1. Query sent to ALL nodes in parallel (“SCATTER”)

2. Secondary Index keys in DRAM
■ Map to Primary keys in DRAM
■ Co-located with Record on SSD

3. Records read in parallel from ALL SSDs

4. Parallel read results aggregated on node

5. Results from ALL nodes aggregated client-side (“GATHER”)

[Diagram: Client; Servers with Secondary Keys and Primary Keys in DRAM, Records R1 to R5 on SSD, values V1 to V6]
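
The scatter/gather steps above can be sketched with a thread pool standing in for the network. This is a simplified model under assumptions: `IndexNode` and `scatter_gather` are hypothetical names, a dictionary plays the role of the DRAM secondary index, another plays the role of records on SSD, and only equality lookups are shown.

```python
from concurrent.futures import ThreadPoolExecutor


class IndexNode:
    """Toy node: a secondary index (DRAM) pointing at local records (SSD)."""

    def __init__(self, records: dict):
        self.records = records   # primary key -> record; stands in for data on SSD
        self.sindex = {}         # (bin name, value) -> list of primary keys
        for pk, rec in records.items():
            for bin_name, value in rec.items():
                self.sindex.setdefault((bin_name, value), []).append(pk)

    def query(self, bin_name: str, value) -> list:
        # Steps 2 to 4: index lookup, record reads, node-local result list.
        pks = self.sindex.get((bin_name, value), [])
        return [self.records[pk] for pk in pks]


def scatter_gather(nodes: list, bin_name: str, value) -> list:
    # Step 1: send the query to every node in parallel ("scatter").
    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        partials = pool.map(lambda node: node.query(bin_name, value), nodes)
        # Step 5: merge the per-node results on the client ("gather").
        return [rec for part in partials for rec in part]


if __name__ == "__main__":
    cluster = [
        IndexNode({"R1": {"group_id": 1234}, "R2": {"group_id": 99}}),
        IndexNode({"R3": {"group_id": 1234}, "R4": {"group_id": 1234}}),
        IndexNode({"R5": {"group_id": 7}}),
    ]
    print(len(scatter_gather(cluster, "group_id", 1234)))
```

Because each node indexes only its own records, a query never needs a cross-node lookup between index and data; the cost is that every node must be asked.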

Page 18: Flash Economics and Lessons learned from operating low latency platforms at high throughput with Aerospike NoSQL

SQL & NoSQL

➤ Secondary index
  Equality, Range, IN (,,,), Compound
  e.g. WHERE group_id = 1234,
       WHERE last_activity > 1349293398,
       WHERE branch_id IN (5,6,7,8)

➤ Filters
  SQL: WHERE clause with non-indexed “AND”s (e.g. “AND gender=‘M’ ”)
  NOSQL: Map step

➤ Aggregation
  SQL: GROUP BY, ORDER BY, LIMIT, OFFSET
  NOSQL: Reduce step

[Diagram labels: Query, Secondary Key, Primary Key, Record, Filter, Map, Reduce, Aggregate; DRAM, SSD; Client, Server]
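
The correspondence between the SQL clauses and the map/reduce steps can be made concrete with a small sketch. It assumes, for illustration only, that each node produces a partial aggregate and the caller merges them; the record layout and the function names `node_side` and `client_side` are hypothetical. The pipeline mirrors a query such as SELECT branch_id, COUNT(*) ... WHERE group_id = 1234 AND gender = 'M' GROUP BY branch_id.

```python
from collections import Counter
from functools import reduce


def node_side(records: list, group_id: int) -> Counter:
    # Secondary index equality: WHERE group_id = ...
    indexed = (r for r in records if r["group_id"] == group_id)
    # Filter on a non-indexed bin: AND gender = 'M'
    filtered = (r for r in indexed if r["gender"] == "M")
    # Map step: keep only the value needed for grouping
    mapped = (r["branch_id"] for r in filtered)
    # Node-local aggregate: count per branch_id
    return Counter(mapped)


def client_side(partials: list) -> Counter:
    # Reduce step: merge the partial counts from all nodes (GROUP BY result)
    return reduce(lambda left, right: left + right, partials, Counter())


if __name__ == "__main__":
    node_a = [
        {"group_id": 1234, "gender": "M", "branch_id": 5},
        {"group_id": 1234, "gender": "F", "branch_id": 5},
    ]
    node_b = [
        {"group_id": 1234, "gender": "M", "branch_id": 5},
        {"group_id": 1234, "gender": "M", "branch_id": 6},
        {"group_id": 42, "gender": "M", "branch_id": 6},
    ]
    print(client_side([node_side(node_a, 1234), node_side(node_b, 1234)]))
```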

Page 19: Flash Economics and Lessons learned from operating low latency platforms at high throughput with Aerospike NoSQL

ROW BASED SCHEDULING

■ Due to caching and blocks, most system resource consumption is per row (Flash is in-memory)

■ Rows are fine grained

■ Scheduler is “local” only

■ Deadline scheduling

■ Per-query priority (per transaction timeout)

[Diagram: Client; Server with Secondary Index and Primary Index, Record Values on SSD; a Priority Q shared by Hot analytics and Operational work]
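
Deadline scheduling with a per-transaction timeout can be sketched as a priority queue ordered by deadline. This is a minimal illustration, not the scheduler described in the talk: `DeadlineQueue` is a hypothetical class, time is passed in explicitly so the behaviour is deterministic, and work whose deadline has already passed is simply dropped.

```python
import heapq
import itertools


class DeadlineQueue:
    """Local, per-node queue that always serves the earliest deadline first."""

    def __init__(self):
        self._heap = []
        self._seq = itertools.count()  # tie-breaker so equal deadlines stay FIFO

    def submit(self, name: str, now: float, timeout: float) -> None:
        # The deadline is derived from the caller's per-transaction timeout.
        heapq.heappush(self._heap, (now + timeout, next(self._seq), name))

    def next_task(self, now: float):
        # Skip work whose deadline has already passed: the caller has timed out.
        while self._heap:
            deadline, _, name = heapq.heappop(self._heap)
            if deadline >= now:
                return name
        return None


if __name__ == "__main__":
    q = DeadlineQueue()
    q.submit("analytics-row", now=0.0, timeout=1.0)   # hot analytics, loose deadline
    q.submit("transaction", now=0.0, timeout=0.010)   # operational, tight deadline
    print(q.next_task(now=0.005))
```

Scheduling one row of work at a time keeps each queue entry small, so a tight-deadline transaction never waits behind more than one unit of analytics work.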

Page 20: Flash Economics and Lessons learned from operating low latency platforms at high throughput with Aerospike NoSQL

LESSONS LEARNED

Page 21: Flash Economics and Lessons learned from operating low latency platforms at high throughput with Aerospike NoSQL

Native Flash Performance

HIGH THROUGHPUT

[Figure: bar chart of throughput (TPS, 0 to 350,000) for Balanced and Read-Heavy workloads; series: Aerospike, Cassandra, MongoDB, Couchbase 2.0*]

*“We were forced to exclude Couchbase...since when run with either disk or replica durability on it was unable to complete the test.” – Thumbtack Technology

LOW LATENCY

[Figure: Balanced Workload Read Latency; average latency (ms, 0 to 10) vs. throughput (ops/sec, 0 to 200,000); series: Aerospike, Cassandra, MongoDB]

[Figure: Balanced Workload Update Latency; average latency (ms, 0 to 16) vs. throughput (ops/sec, 0 to 200,000); series: Aerospike, Cassandra, MongoDB]

Page 22: Flash Economics and Lessons learned from operating low latency platforms at high throughput with Aerospike NoSQL

High Availability Through Clustering & Replication

[Figure: cluster throughput over time, annotated with phases 1 to 5]

Phases
1) 100K TPS, 4 nodes
2) Clients at Max
3) 400K TPS, 4 nodes
4) 400K TPS, 3 nodes
5) 400K TPS, 4 nodes

Aerospike Node Specs: CentOS 6.3, Intel i5-2400 @ 3.1 GHz (Quad core), 16 GB RAM @ 1333 MHz

Page 23: Flash Economics and Lessons learned from operating low latency platforms at high throughput with Aerospike NoSQL

LESSONS

1. Keep architecture simple
■ No hot spots (e.g., robust DHT)
■ Scales up easily (e.g., easy to size)
■ Avoids points of failure (e.g., single node type)

2. Avoid manual operation – automate, automate!
■ Self-managed cluster responds to node failures
■ Data rebalancing requires no intervention
■ Real-time prioritization allows unattended system operation

3. Keep system asynchronous
■ Shared nothing – nodes are autonomous
■ Async writes across data centers
■ Independent tuning parameters for different classes of tasks

Page 24: Flash Economics and Lessons learned from operating low latency platforms at high throughput with Aerospike NoSQL

LESSONS (CONT’D)

4. Monitor the health of the system extensively
■ Growth in load sneaks up on you over weeks
■ Early detection means better service
■ Most failures can be predicted (e.g., capacity, load, …)

5. Size clusters properly
■ Have enough capacity ALWAYS!
■ Upgrade SSDs every couple of years
■ Reduce cluster sizes to make operations simple

6. Have geographically distributed data centers
■ Size the distributed data centers properly
■ Use active-active configurations if possible
■ Size bandwidth requirements accurately

Page 25: Flash Economics and Lessons learned from operating low latency platforms at high throughput with Aerospike NoSQL

LESSONS (CONT’D)

7. Have a plan for unforeseen situations
■ Devise scenarios and practice during normal work time
■ Ensure you can do rolling upgrades during high load time
■ Make sure that your nodes can restart fast (< 1 minute)

8. Constantly test and monitor app end-to-end
■ Application level metrics are more important than DB metrics
■ Most issues in a service are due to a combination of application, network, database, storage, etc.

9. Separate online and offline workloads
■ Reserve real-time edge database for transactions and hot analytics queries (where newest data is important)
■ Avoid ad-hoc queries on the online system
■ Perform deep analysis in offline system (Hadoop)

10. Use the right data management system for the job
■ Fast NoSQL DB for real-time transactions and hot analytics on rapidly changing data
■ Hadoop or other comparable systems for exhaustive analytics on mostly read-only data

Page 26: Flash Economics and Lessons learned from operating low latency platforms at high throughput with Aerospike NoSQL

MODERN REAL-TIME DATA PLATFORM

1. Scaling the Internet of Everything
2. Pushing the limits of modern hardware
3. No data loss (ACID) and No downtime

APP SERVER (APP/WEB SERVER)

REAL-TIME BIG DATA APPLICATION

AEROSPIKE SMART CLIENT™
• APIs (C, C#, Java, PHP, Python, Ruby, Erlang…)
• Transactions, Cluster awareness

AEROSPIKE SERVER (AEROSPIKE CLUSTER)

EXTENSIBLE DATA MODEL
• Str, Int, Lists, Maps
• Lookups, Queries, Scans
• Aerospike Alchemy Framework™ with User Defined Functions and Distributed Aggregations

MONITORING & MANAGEMENT
• Aerospike Monitoring Console™
• Command Line Tools
• Plugins: Nagios, Graphite, Zabbix

REAL-TIME ENGINE
AEROSPIKE SMART CLUSTER™
AEROSPIKE HYBRID MEMORY SYSTEM™

PROXIMITY & REDUNDANCY
Cross Data Center Replication™ (XDR)

Written in ‘C’, Patents pending

Page 27: Flash Economics and Lessons learned from operating low latency platforms at high throughput with Aerospike NoSQL

SUPPORT FOR REAL-TIME BIG DATA APPS

Rapid Development

➤ Support for popular languages and tools
  ASQL and Aerospike Client in Java, C#, Ruby, Python..

➤ Complex data types
  Nested documents (map, list, string, integer)
  Large (Stack, Set, List) Objects

➤ Queries
  Single record
  Batch multi-record lookups
  Equality and range
  Aggregations and MapReduce

Complete Customizability

➤ User Defined Functions (UDFs)
  In-DB processing

➤ Aggregation Framework
  UDF Pipeline
  MapReduce ++

➤ Time Series Queries
  Just 2 IOPs for most r/w, independent of object size

Page 28: Flash Economics and Lessons learned from operating low latency platforms at high throughput with Aerospike NoSQL

QUESTIONS?

[email protected]

www.aerospike.com