Adam Fuchs' Accumulo Talk at NoSQL Now! 2013


Upload: sqrrl

Post on 27-Jan-2015


DESCRIPTION

Adam Fuchs provides an overview of Accumulo and Sqrrl Enterprise at the 2013 NoSQL Now! conference.

TRANSCRIPT

Page 1: Adam Fuchs' Accumulo Talk at NoSQL Now! 2013

Securely explore your data

SQRRL ENTERPRISE +

APACHE ACCUMULO:

A secure, scalable, real-time analysis framework

Adam Fuchs, CTO

Sqrrl Data, Inc.

August 21, 2013

Page 2: Adam Fuchs' Accumulo Talk at NoSQL Now! 2013

© 2013 Sqrrl | All Rights Reserved | Proprietary and Confidential

OUTLINE

Two Halves of “Real-Time”

Accumulo and Sqrrl Technology

Data-Centric Security

Table Designs

Performance Benchmarks

Page 3: Adam Fuchs' Accumulo Talk at NoSQL Now! 2013

TWO HALVES OF REAL-TIME

Data-Driven Real-Time: reduce event-to-reaction time

Query-Driven Real-Time: reduce ingest-to-query latency

Page 4: Adam Fuchs' Accumulo Talk at NoSQL Now! 2013


1. SPE queries NoSQL to enrich streaming data

2. SPE persists results in NoSQL for future query

3. SPE takes action automatically

4. SPE issues data-driven alerts

5. Sqrrl provides context for dashboards

6. Analysis tools use Sqrrl to search and manipulate historical data

Data-Driven + Query-Driven Real-Time Ecosystem

[Diagram: Data, SPE, NoSQL+, Dashboards, Actions, and Interactive Analysis Tools (Discovery + Forensics), connected by arrows numbered 1-6 that correspond to the steps listed above.]

Page 5: Adam Fuchs' Accumulo Talk at NoSQL Now! 2013


This talk focuses on the database.

[Diagram: the ecosystem diagram from the previous page, repeated with the same numbered steps 1-6.]

Page 6: Adam Fuchs' Accumulo Talk at NoSQL Now! 2013


OUTLINE

Two Halves of “Real-Time”

Accumulo and Sqrrl Technology

Data-Centric Security

Table Designs

Performance Benchmarks

Page 7: Adam Fuchs' Accumulo Talk at NoSQL Now! 2013

ACCUMULO DATA FORMAT

Accumulo Key/Value Example

An Accumulo key is a 5-tuple, consisting of:

- Row: Controls Atomicity
- Column Family: Controls Locality
- Column Qualifier: Controls Uniqueness
- Visibility Label: Controls Access
- Timestamp: Controls Versioning
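The five-part key can be illustrated with a small Python sketch. The class and field names below are illustrative only and are not Accumulo's Java API. The sort order shown (the first four fields ascending, then timestamp descending so the newest version comes first) follows Accumulo's documented key ordering. Python string comparison stands in for byte-wise comparison here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Key:
    """Illustrative stand-in for an Accumulo key (not the real Java class)."""
    row: str
    family: str
    qualifier: str
    visibility: str
    timestamp: int

    def sort_key(self):
        # Ascending on the first four fields; newest timestamp first.
        return (self.row, self.family, self.qualifier,
                self.visibility, -self.timestamp)


# Hypothetical sample entries: two versions of one cell plus another cell.
entries = {
    Key("bill", "attribute", "age", "public", 1): "48",
    Key("bill", "attribute", "age", "public", 2): "49",
    Key("bill", "purchases", "sneakers", "private", 1): "$100",
}

ordered = sorted(entries, key=Key.sort_key)
for key in ordered:
    print(key, "->", entries[key])
```

Because the newest version of a cell sorts first, a reader that wants only the current value can stop at the first entry for each row and column.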

Page 8: Adam Fuchs' Accumulo Talk at NoSQL Now! 2013


ACCUMULO TABLETS

Collections of KV pairs form Tables

Tables are partitioned into Tablets

Metadata tablets hold info about other tablets, forming a 3-level hierarchy

A Tablet is a unit of work for a Tablet Server

Tablet hierarchy (from the slide diagram):

Well-Known Location (zookeeper)
  Root Tablet: -∞ to ∞
    Metadata Tablet 1: -∞ to “Encyclopedia:Ocelot”
    Metadata Tablet 2: “Encyclopedia:Ocelot” to ∞

Table: Adam’s Table
  Data Tablet: -∞ : thing
  Data Tablet: thing : ∞

Table: Encyclopedia
  Data Tablet: -∞ : Ocelot
  Data Tablet: Ocelot : Yak
  Data Tablet: Yak : ∞

Table: Foo
  Data Tablet: -∞ to ∞
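As a minimal sketch of how a row maps to a tablet, the following Python uses the Encyclopedia split points from the slide. The function name and return shape are hypothetical. It assumes a tablet covers the range after the previous split point up to and including its own end row, which is how Accumulo defines tablet extents. A real client resolves the location through the root and metadata tablets rather than a local list.

```python
from bisect import bisect_left

# Split points of the Encyclopedia table as shown on the slide.
SPLITS = ["Ocelot", "Yak"]


def tablet_for(row, splits=SPLITS):
    """Return (index, (lower, upper)) for the tablet containing `row`.

    `lower` is exclusive and `upper` is inclusive; None stands for
    negative or positive infinity.
    """
    i = bisect_left(splits, row)
    lower = splits[i - 1] if i > 0 else None
    upper = splits[i] if i < len(splits) else None
    return i, (lower, upper)


for row in ["Aardvark", "Ocelot", "Panda", "Zebra"]:
    print(row, tablet_for(row))
```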

Page 9: Adam Fuchs' Accumulo Talk at NoSQL Now! 2013

ACCUMULO PROCESSES

[Diagram: Applications, Tablet Servers (each hosting Tablets), a Master, Zookeeper nodes, and HDFS. Arrow labels: Read/Write, Store/Replicate, Assign/Balance, Delegate Authority.]

Page 10: Adam Fuchs' Accumulo Talk at NoSQL Now! 2013

TABLET DATA FLOW

[Diagram: within a Tablet, Writes go to the In-Memory Map and the Write Ahead Log (For Recovery). Minor Compaction produces Sorted, Indexed Files; Merging / Major Compaction combines them. Reads (Scan) draw on the In-Memory Map and the files. An Iterator Tree appears three times in the diagram.]
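The data flow on this slide can be sketched in a few lines of Python. This is a toy model under stated assumptions, not Accumulo's implementation: the class and method names are hypothetical, files are plain sorted lists, the write-ahead log is simply cleared at minor compaction, and the iterator trees are left out entirely.

```python
import heapq


class TabletSketch:
    """Toy model of a tablet: in-memory map, write-ahead log, sorted files."""

    def __init__(self):
        self.wal = []        # write-ahead log, kept only for recovery
        self.memtable = {}   # in-memory map of recent writes
        self.files = []      # immutable sorted files, oldest first

    def write(self, key, value):
        self.wal.append((key, value))
        self.memtable[key] = value

    def minor_compact(self):
        """Flush the in-memory map to a new sorted file."""
        if self.memtable:
            self.files.append(sorted(self.memtable.items()))
            self.memtable = {}
            self.wal = []    # flushed entries no longer need the log

    def major_compact(self):
        """Merge all sorted files into one."""
        if len(self.files) > 1:
            self.files = [list(self._merged(self.files))]

    def scan(self):
        """Merged view over the files and the in-memory map."""
        sources = self.files + [sorted(self.memtable.items())]
        return list(self._merged(sources))

    @staticmethod
    def _merged(sources):
        # Tag entries so that, for equal keys, the newest source sorts first.
        n = len(sources)
        tagged = [[(k, n - i, v) for k, v in src]
                  for i, src in enumerate(sources)]
        last = object()
        for key, _, value in heapq.merge(*tagged):
            if key != last:
                yield key, value
                last = key


t = TabletSketch()
t.write("a", "1")
t.write("b", "2")
t.minor_compact()
t.write("a", "3")
print(t.scan())
```

The scan never rewrites data; it merges already-sorted sources, which is why reads stay cheap while writes remain append-only.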

Page 11: Adam Fuchs' Accumulo Talk at NoSQL Now! 2013

WORD COUNT:

Summing Aggregating Iterator

Input Corpus
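A rough Python sketch of the word-count idea: the ingester writes one entry per word occurrence without reading existing counts, and a summing step collapses entries that share a key when the sorted data is read or compacted. The function names and the sample corpus are illustrative assumptions, not Accumulo's combiner API.

```python
def ingest(corpus):
    """Emit one (word, 1) entry per token; existing counts are never read."""
    for line in corpus:
        for word in line.lower().split():
            yield word, 1


def summing_combiner(sorted_entries):
    """Collapse consecutive entries that share a key into their sum."""
    current, total = None, 0
    for key, value in sorted_entries:
        if current is not None and key != current:
            yield current, total
            total = 0
        current = key
        total += value
    if current is not None:
        yield current, total


# Hypothetical input corpus.
corpus = ["the quick brown fox", "the lazy dog", "the fox"]
counts = dict(summing_combiner(sorted(ingest(corpus))))
print(counts)
```

Since addition is associative and commutative, partial sums can be produced at any compaction and combined again later without changing the result.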

Page 12: Adam Fuchs' Accumulo Talk at NoSQL Now! 2013

ITERATOR FRAMEWORK

Iterator Operations:

- File Reads
- Block Caching
- Merging
- Deletion
- Isolation
- Locality Groups
- Range Selection
- Column Selection
- Cell-level Security
- Versioning
- Filtering
- Aggregation
- Partitioned Joins
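Several of the operations listed above compose as a stack, each layer consuming the sorted stream produced by the one below it. The sketch below imitates three of them (range selection, column selection, versioning) with Python generators. The function names and data are hypothetical; in Accumulo these layers are Java iterators running inside the tablet server.

```python
def row_range(entries, start, stop):
    """Range selection: keep rows with start <= row < stop."""
    return (e for e in entries if start <= e[0] < stop)


def column_filter(entries, wanted):
    """Column selection: keep only the requested columns."""
    return (e for e in entries if e[1] in wanted)


def versioning(entries, max_versions=1):
    """Versioning: keep the newest `max_versions` entries per cell.

    Assumes entries arrive sorted by (row, column), newest timestamp first.
    """
    last, seen = None, 0
    for row, col, ts, value in entries:
        cell = (row, col)
        seen = seen + 1 if cell == last else 1
        last = cell
        if seen <= max_versions:
            yield row, col, ts, value


# Hypothetical sorted entries: (row, column, timestamp, value).
data = [
    ("bill", "age", 2, "49"),
    ("bill", "age", 1, "48"),
    ("bill", "discount", 1, "40%"),
    ("george", "age", 1, "27"),
]

stack = versioning(column_filter(row_range(data, "a", "c"), {"age"}))
result = list(stack)
print(result)
```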

Page 13: Adam Fuchs' Accumulo Talk at NoSQL Now! 2013

ACCUMULO LATENCIES

[Diagram: Input passes from Ingesters (BatchWriter) to Tablet Servers (In-Memory Map, RFiles, Compaction Iterators, Scan Iterators) and on to Queriers (Scanner/BatchScanner), producing Output. Latency labels: ~ms, ~ms, ~ms, and ms - min.]

Page 14: Adam Fuchs' Accumulo Talk at NoSQL Now! 2013

ACCUMULO THROUGHPUT

[Diagram: the ingest and query pipeline from the previous page, with the same latency labels.]

Read-Modify-Write Latency: ~ms

>1K entries/s challenging with R-M-W

Ingest: up to 500K entries/s per node

Scan: up to 1M entries/s per node

Page 15: Adam Fuchs' Accumulo Talk at NoSQL Now! 2013

SQRRL ENTERPRISE

Built on Apache Accumulo

Sqrrl API over Apache Thrift RPC (JSON, Graph, Aggregation, Search, etc.)

Sqrrl Server
• Sqrrl proprietary
• Automated indexing
• Custom iterators
• Lucene integration
• Security extensions

Accumulo RPC (Sorted Key/Value I/O)
• Open source (including Sqrrl contributions)

Hadoop RPC (File I/O)
• Open source or commercial distributions

Graph + Document I/O

Exploratory / Operational Apps

Bulk Processing Integration

Page 16: Adam Fuchs' Accumulo Talk at NoSQL Now! 2013


OUTLINE

Two Halves of “Real-Time”

Accumulo and Sqrrl Technology

Data-Centric Security

Table Designs

Performance Benchmarks


Page 17: Adam Fuchs' Accumulo Talk at NoSQL Now! 2013

DATA-CENTRIC SECURITY

Definition: Data carries with it information that is required to make policy decisions on its releasability.

[Diagram: User 1, User 2, and Sqrrl/Accumulo.]

Page 18: Adam Fuchs' Accumulo Talk at NoSQL Now! 2013

SECURITY

Example Accumulo Key/Value Pairs

Accumulo is the only NoSQL database with cell-level access controls
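To illustrate what a cell-level check involves, here is a minimal Python sketch that evaluates a visibility label against a user's set of authorizations. The function name, label strings, and authorization names are hypothetical. The label syntax (terms joined by & and |, with parentheses for grouping) follows Accumulo's visibility expressions, but this sketch does not validate labels; Accumulo itself rejects expressions that mix & and | without parentheses.

```python
import re

TOKEN = re.compile(r"[A-Za-z0-9_\-:./]+|[&|()]")


def is_visible(expression, authorizations):
    """Return True if `authorizations` satisfy the visibility `expression`."""
    if not expression:
        return True  # unlabeled data is visible to every user
    tokens = TOKEN.findall(expression)
    pos = 0

    def term():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            result = expr()
            pos += 1  # skip the closing ")"
            return result
        return tok in authorizations

    def expr():
        nonlocal pos
        result = term()
        while pos < len(tokens) and tokens[pos] in ("&", "|"):
            op = tokens[pos]
            pos += 1
            right = term()  # always parse the right side before combining
            result = (result and right) if op == "&" else (result or right)
        return result

    return expr()


# Hypothetical cells: (row, column, visibility label, value).
cells = [
    ("george", "age", "", "27"),
    ("george", "purchases", "admin&(audit|ops)", "$83"),
]
visible = [c for c in cells if is_visible(c[2], {"ops"})]
print(visible)
```

The filter runs per cell, so two users scanning the same row can receive different subsets of its columns.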

Page 19: Adam Fuchs' Accumulo Talk at NoSQL Now! 2013

DATA-CENTRIC SECURITY ECOSYSTEM

[Diagram labels: Data Labeler, Sqrrl Enterprise, Apps, End Users, User Attributes, Policies, Audits, Auth. Service, Policy Engine, Key Mgmt.]

Page 20: Adam Fuchs' Accumulo Talk at NoSQL Now! 2013


OUTLINE

Two Halves of “Real-Time”

Accumulo and Sqrrl Technology

Data-Centric Security

Table Designs

Performance Benchmarks


Page 21: Adam Fuchs' Accumulo Talk at NoSQL Now! 2013

HIERARCHICAL DECOMPOSITION

Row: <person>
  Column Family: attribute
    Column Qualifier: age, Value: <age>
    Column Qualifier: discount, Value: <rate>
  Column Family: purchases
    Column Qualifier: sneakers, Value: <cost>
  Column Family: returns
    Column Qualifier: hat, Value: <cost>

Page 22: Adam Fuchs' Accumulo Talk at NoSQL Now! 2013

MATERIALIZED TABLE

Each line below is one Key/Value Pair.

Row      Column Family   Column Qualifier   Value
george   attribute       age                27
george   purchases       sneakers           $83
bill     attribute       age                49
bill     attribute       discount           40%
bill     purchases       sneakers           $100
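The step from a hierarchical record to materialized key/value pairs can be sketched as a simple flattening. The Python below uses the george and bill values from the slide; the function name and the nested-dictionary input format are assumptions made for illustration.

```python
def flatten(records):
    """Return ((row, family, qualifier), value) pairs in sorted key order."""
    pairs = []
    for row, families in records.items():
        for family, columns in families.items():
            for qualifier, value in columns.items():
                pairs.append(((row, family, qualifier), value))
    return sorted(pairs)


records = {
    "george": {
        "attribute": {"age": "27"},
        "purchases": {"sneakers": "$83"},
    },
    "bill": {
        "attribute": {"age": "49", "discount": "40%"},
        "purchases": {"sneakers": "$100"},
    },
}

table = flatten(records)
for key, value in table:
    print(key, value)
```

Sorting places all of a person's entries together, and within a person all entries of one column family together, so a single range scan can read either a whole row or one family of it.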

Page 23: Adam Fuchs' Accumulo Talk at NoSQL Now! 2013

FORWARD AND INVERTED INDEX

[Diagram: key layouts for two tables, with the labels Table, Row, Column Family, Column Qualifier, Value.
Forward Index: <UUID>, <Type>, <Field>, <Term>
Inverted Index: <Term>, <UUID>, <Type+Field>, <Digest of Event>]
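A hedged Python sketch of writing an event to both indexes follows. The exact placement of each component within the key is an assumption: here the forward index uses row = UUID, family = type, qualifier = field, value = term, and the inverted index uses row = term, family = type+field, qualifier = UUID, value = digest. The function names and sample events are hypothetical.

```python
def index_event(uuid, event_type, fields, digest):
    """Return (forward_entries, inverted_entries) for one event.

    Each entry is a (row, family, qualifier, value) tuple.
    """
    forward, inverted = [], []
    for field, term in fields.items():
        forward.append((uuid, event_type, field, term))
        # Assumed layout: the term is the row, so a term lookup is one scan.
        inverted.append((term, f"{event_type}+{field}", uuid, digest))
    return forward, inverted


def lookup(inverted_table, term):
    """Return the UUIDs of events containing `term`, in sorted order."""
    return sorted({entry[2] for entry in inverted_table if entry[0] == term})


fwd1, inv1 = index_event("u1", "tweet", {"user": "alice", "word": "accumulo"}, "d1")
fwd2, inv2 = index_event("u2", "tweet", {"user": "bob", "word": "accumulo"}, "d2")
inverted_table = sorted(inv1 + inv2)
print(lookup(inverted_table, "accumulo"))
```

The forward index answers "what does event X contain", while the inverted index answers "which events contain term Y"; storing a digest in the inverted entry lets a query show a summary without a second lookup.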

Page 24: Adam Fuchs' Accumulo Talk at NoSQL Now! 2013

FORWARD AND INVERTED INDEX

Page 25: Adam Fuchs' Accumulo Talk at NoSQL Now! 2013

CUSTOM INDEXING

Table: Geo Index
Row: <GeoHash>
Column Family: <Event Type>
Column Qualifier: <UUID>
Value: <Digest of Event>

GeoHash bit interleaving:
Latitude:    10110101001
Longitude:   00111010010
Depth:       11010110110
Interleaved: 101001110111010101011100001011100
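The interleaving on this slide takes one bit from each dimension in turn. A minimal Python sketch, using the slide's own bit strings (the function name is hypothetical):

```python
def interleave(*bitstrings):
    """Interleave equal-length bit strings, one bit from each in turn."""
    if len({len(b) for b in bitstrings}) != 1:
        raise ValueError("all inputs must have the same number of bits")
    return "".join("".join(bits) for bits in zip(*bitstrings))


latitude = "10110101001"
longitude = "00111010010"
depth = "11010110110"

geohash = interleave(latitude, longitude, depth)
print(geohash)
```

Because the most significant bits of every dimension come first, points that are close in all three dimensions tend to share a long prefix, so a region query can be turned into a small number of row-range scans.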

Page 26: Adam Fuchs' Accumulo Talk at NoSQL Now! 2013

D4M 2.0 SCHEMA FOR TWITTER DATA

Table: Tedge, Row: <UUID>

Column Family   Column Qualifier   Value
“stat”          <stat>             “1”
“time”          <time>             “1”
“user”          <user>             “1”
“word”          <word>             “1”

Table: TedgeT, Row: <value>

Column Family   Column Qualifier   Value
“stat”          <UUID>             “1”
“time”          <UUID>             “1”
“user”          <UUID>             “1”
“word”          <UUID>             “1”

Page 27: Adam Fuchs' Accumulo Talk at NoSQL Now! 2013

D4M 2.0 SCHEMA FOR TWITTER DATA

Table: TedgeDegT, Row: <value>

Column Family   Column Qualifier   Value
“stat”          “degree”           <count>
“time”          “degree”           <count>
“user”          “degree”           <count>
“word”          “degree”           <count>

Table: Ttext, Row: <UUID>

Column Family   Column Qualifier   Value
“text”          -                  <text>
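The four tables can be illustrated by exploding one tweet into entries for each. This Python sketch follows the row, column family, and column qualifier layout shown on the slides; the function name, input format, and sample tweet are assumptions. The degree entries here are per-tweet counts, on the assumption that a summing iterator would add them up across tweets on the server side.

```python
from collections import Counter


def explode(uuid, record, text):
    """Return entries per table as {table: [(row, family, qualifier, value)]}."""
    tedge, tedge_t, degrees = [], [], Counter()
    for field, values in record.items():
        for v in values:
            tedge.append((uuid, field, v, "1"))    # tweet -> value
            tedge_t.append((v, field, uuid, "1"))  # value -> tweet (transpose)
            degrees[(v, field)] += 1
    tedge_deg_t = [(v, field, "degree", str(n))
                   for (v, field), n in sorted(degrees.items())]
    return {
        "Tedge": tedge,
        "TedgeT": tedge_t,
        "TedgeDegT": tedge_deg_t,
        "Ttext": [(uuid, "text", "", text)],
    }


# Hypothetical tweet.
tables = explode(
    "t1",
    {"stat": ["200"], "time": ["2013-08-21"],
     "user": ["alice"], "word": ["hello", "world"]},
    "hello world",
)
for name, entries in tables.items():
    print(name, entries)
```

Keeping a degree table alongside the transpose lets a query planner check how common a value is before scanning for it.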

Page 28: Adam Fuchs' Accumulo Talk at NoSQL Now! 2013

D4M 2.0 SCHEMA FOR TWITTER DATA

Source: D4M 2.0 Schema: A General Purpose High Performance Schema for the Accumulo Database, Kepner et al., HPEC 2013

Page 29: Adam Fuchs' Accumulo Talk at NoSQL Now! 2013


OUTLINE

Two Halves of “Real-Time”

Accumulo and Sqrrl Technology

Data-Centric Security

Table Designs

Performance Benchmarks


Page 30: Adam Fuchs' Accumulo Talk at NoSQL Now! 2013

ACCUMULO WITH D4M 2.0 SCHEMA PERFORMANCE

Source: D4M 2.0 Schema: A General Purpose High Performance Schema for the Accumulo Database, Kepner et al., HPEC 2013

Maximizing throughput on an 8-node, 192-core cluster:

Page 31: Adam Fuchs' Accumulo Talk at NoSQL Now! 2013

ACCUMULO SCALABILITY: GRAPH500 BENCHMARK

source: http://www.pdl.cmu.edu/SDI/2013/slides/big_graph_nsa_rd_2013_56002v1.pdf

Page 32: Adam Fuchs' Accumulo Talk at NoSQL Now! 2013

ATOMIC INCREMENT PERFORMANCE COMPARISON

Read/Modify/Write (HBase) vs. Iterators/Combiners (Accumulo)

Page 33: Adam Fuchs' Accumulo Talk at NoSQL Now! 2013

QUESTIONS?

Adam Fuchs, CTO

Sqrrl Data, Inc.