snappydata - cidrcidrdb.org/cidr2017/slides/p28-mozafari-cidr17-slides.pdf · lambda architecture...

45
SnappyData A Unified Cluster for Streaming, Transactions, & Interactive Analytics Version 0.7 | © Snappydata Inc 2017 www.Snappydata.io Barzan Mozafari | Jags Ramnarayan | Sudhir Menon Yogesh Mahajan | Soubhik Chakraborty | Hemant Bhanawat | Kishor Bachhav

Upload: others

Post on 25-Apr-2020

8 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: SnappyData - CIDRcidrdb.org/cidr2017/slides/p28-mozafari-cidr17-slides.pdf · Lambda Architecture is Complex Scalable Store HDFS, MPP DB Data-in-moAon AnalyAcs ApplicaAon STREAMS

SnappyDataA Unified Cluster for Streaming, Transactions, & Interactive Analytics Version 0.7 | © Snappydata Inc 2017

www.Snappydata.io

Barzan Mozafari | Jags Ramnarayan | Sudhir Menon

Yogesh Mahajan | Soubhik Chakraborty | Hemant Bhanawat | Kishor Bachhav

Page 2: SnappyData - CIDRcidrdb.org/cidr2017/slides/p28-mozafari-cidr17-slides.pdf · Lambda Architecture is Complex Scalable Store HDFS, MPP DB Data-in-moAon AnalyAcs ApplicaAon STREAMS

Our Pedigree

GTD VenturesPune (25+)

Portland (5)

Ann Arbor (1)

Page 3: SnappyData - CIDRcidrdb.org/cidr2017/slides/p28-mozafari-cidr17-slides.pdf · Lambda Architecture is Complex Scalable Store HDFS, MPP DB Data-in-moAon AnalyAcs ApplicaAon STREAMS

Mixed Workloads Are Everywhere

Stream Processing

TransacAon InteracAve AnalyAcs

Analytics on mutating data

Correlating and joining streams with large histories

Maintaining state or counters while

ingesting streams

Page 4: SnappyData - CIDRcidrdb.org/cidr2017/slides/p28-mozafari-cidr17-slides.pdf · Lambda Architecture is Complex Scalable Store HDFS, MPP DB Data-in-moAon AnalyAcs ApplicaAon STREAMS

Mixed Workloads Are Everywhere

Enrich – requires reference data join

InteracAve OLAP queries

State – manage windows with mutaMng state AnalyAcs – Joins to history, trend paOerns Model maintenance

TransacAon writesIOT Devices

Scalable Store HDFS, MPP DB

Data-in-moAon AnalyAcs

ApplicaAon

STREAMS

Alerts

Enrich

Reference DB

Models

InteracMve Queries

Updates

Page 5: SnappyData - CIDRcidrdb.org/cidr2017/slides/p28-mozafari-cidr17-slides.pdf · Lambda Architecture is Complex Scalable Store HDFS, MPP DB Data-in-moAon AnalyAcs ApplicaAon STREAMS

Why SupporAng Mixed Workloads is Difficult?

Data Structures Query Processing

Paradigm Scheduling &

Provisioning

Columnar Batch

Processing Long-running

Row stores Point

Lookups Short-lived

Sketches Delta /

Incremental Bursty

Page 6: SnappyData - CIDRcidrdb.org/cidr2017/slides/p28-mozafari-cidr17-slides.pdf · Lambda Architecture is Complex Scalable Store HDFS, MPP DB Data-in-moAon AnalyAcs ApplicaAon STREAMS

Lambda Architecture

Query

New Data

Batch layer

Master Datasheet

2

Serving layer

Batch view 3

Batch view

Speed layer

4 Real-Ame View Real-Ame View

1

Query

5

Page 7: SnappyData - CIDRcidrdb.org/cidr2017/slides/p28-mozafari-cidr17-slides.pdf · Lambda Architecture is Complex Scalable Store HDFS, MPP DB Data-in-moAon AnalyAcs ApplicaAon STREAMS

Lambda Architecture is Complex

Scalable Store HDFS, MPP DB

Data-in-moAon AnalyAcs

ApplicaAon

STREAMS

Alerts

Enrich

Reference DB

Models

InteracMve Queries

Updates Storm, Spark Streaming, Samza…

Enterprise DB – Oracle, Postgres..

NoSQL – Cassandra, Redis, Hbase ….

Teradata, Greenplum, VerMca, ...

IOT Devices

Page 8: SnappyData - CIDRcidrdb.org/cidr2017/slides/p28-mozafari-cidr17-slides.pdf · Lambda Architecture is Complex Scalable Store HDFS, MPP DB Data-in-moAon AnalyAcs ApplicaAon STREAMS

Lambda Architecture is Complex

• Complexity

- Learn and master mulMple products, data models, disparate APIs & configs

• Wasted resources

• Slower

- Excessive copying, serializaMon, shuffles - Impossible to achieve interacMve-speed analyMcs on large or mutaMng data

Page 9: SnappyData - CIDRcidrdb.org/cidr2017/slides/p28-mozafari-cidr17-slides.pdf · Lambda Architecture is Complex Scalable Store HDFS, MPP DB Data-in-moAon AnalyAcs ApplicaAon STREAMS

9

CanWeSimplify&Op0mize?

Page 10: SnappyData - CIDRcidrdb.org/cidr2017/slides/p28-mozafari-cidr17-slides.pdf · Lambda Architecture is Complex Scalable Store HDFS, MPP DB Data-in-moAon AnalyAcs ApplicaAon STREAMS

Our SoluAon: SnappyData

Deep Scale, High Volume

MPP DB

Real-Ame design Low latency, HA, concurrency

Batch design, high throughput

Rapidly Maturing Matured over 13 years

Single Unified HA Cluster OLTP + OLAP + Streaming for Real-time Analytics

Page 11: SnappyData - CIDRcidrdb.org/cidr2017/slides/p28-mozafari-cidr17-slides.pdf · Lambda Architecture is Complex Scalable Store HDFS, MPP DB Data-in-moAon AnalyAcs ApplicaAon STREAMS

•  Cannot update •  Repeated for each

User/App

USER 1 / APP 1

Spark Master

Spark Executor (Worker)

Framework for streaming SQL, ML…

Immutable CACHE

USER 2 / APP 2

Spark Master

Spark Executor (Worker)

Framework for streaming SQL, ML…

Immutable CACHE

HDFS SQL

NoSQL

BoOleneck

We Transform Spark from This …

Page 12: SnappyData - CIDRcidrdb.org/cidr2017/slides/p28-mozafari-cidr17-slides.pdf · Lambda Architecture is Complex Scalable Store HDFS, MPP DB Data-in-moAon AnalyAcs ApplicaAon STREAMS

… Into an “Always-On” Hybrid Database !

Deep Scale, High Volume

MPP DB

HDFS SQL

NoSQL

HISTORY

Spark Executor (Worker) JVM - Long-lived

Framework for streaming SQL, ML…

Spark Driver

In-Memory ROW + COLUMN

Start with Indexing

Store

-  Mutable, -  Transactional Spark

Cluster

JDBC

ODBC

Spark Job

Shared Nothing Persistence

Page 13: SnappyData - CIDRcidrdb.org/cidr2017/slides/p28-mozafari-cidr17-slides.pdf · Lambda Architecture is Complex Scalable Store HDFS, MPP DB Data-in-moAon AnalyAcs ApplicaAon STREAMS

Unified Data Model & API

Cluster Manager & Scheduler

Snappy Data Server (Spark Executor + Store)

Parser

OLAP

TRX

Data Synopsis Engine

Distributed Membership Service

HA

Stream Processing

Tables ODBC/JDBC Data Frame

RDD

Low Latency

High Latency

HYBRID Store

ProbabilisMc Rows Columns

Index

Query OpMmizer

Add / Remove Server

•  Mutability (DML+Trx)

•  Indexing

•  SQL-based streaming

Page 14: SnappyData - CIDRcidrdb.org/cidr2017/slides/p28-mozafari-cidr17-slides.pdf · Lambda Architecture is Complex Scalable Store HDFS, MPP DB Data-in-moAon AnalyAcs ApplicaAon STREAMS

Overview

Cluster Manager & Scheduler

Snappy Data Server (Spark Executor + Store)

Parser

OLAP

TRX

Data Synopsis Engine

Distributed Membership Service

HA

Stream Processing

Data Frame

RDD

Low Latency

High Latency

HYBRID Store

ProbabilisMc Rows Columns

Index

Query OpMmizer

Add / Remove Server

Tables ODBC/JDBC

Page 15: SnappyData - CIDRcidrdb.org/cidr2017/slides/p28-mozafari-cidr17-slides.pdf · Lambda Architecture is Complex Scalable Store HDFS, MPP DB Data-in-moAon AnalyAcs ApplicaAon STREAMS

Hybrid Store

Unbounded Streams IngesMon

Real Mme Sampling

TransacMonal State Update

ProbabilisMc

Index Rows

Row Buffer Columns

Random Writes ( Reference data )

OLAP

Stream AnalyMcs

Page 16: SnappyData - CIDRcidrdb.org/cidr2017/slides/p28-mozafari-cidr17-slides.pdf · Lambda Architecture is Complex Scalable Store HDFS, MPP DB Data-in-moAon AnalyAcs ApplicaAon STREAMS

Updates & Deletes on Column Tables

Column Segment ( t1-t2)

Column Segment ( t2-t3)

0 1 0 0 0 0 1 1 0

K11 K12

.

.

.

.

.

C11 C12

.

.

.

.

.

C21 C22

.

.

.

.

.

Summary Metadata

Periodic Compaction

One ParAAon

Time

WRITE

Row Buffer

MVCC

New Segment

Replicate for HA

Page 17: SnappyData - CIDRcidrdb.org/cidr2017/slides/p28-mozafari-cidr17-slides.pdf · Lambda Architecture is Complex Scalable Store HDFS, MPP DB Data-in-moAon AnalyAcs ApplicaAon STREAMS

`

ProbabilisAc Store: Synopses + Uniform & StraAfied Samples

Higher resoluMon for more recent Mme ranges

1. Streaming CMS

(Count-Min-Sketch)

[t1, t2) [t2, t3) [t3, t4) [t4, now) Time

4T 2T T ≤T

....

Page 18: SnappyData - CIDRcidrdb.org/cidr2017/slides/p28-mozafari-cidr17-slides.pdf · Lambda Architecture is Complex Scalable Store HDFS, MPP DB Data-in-moAon AnalyAcs ApplicaAon STREAMS

`

ProbabilisAc Store: Synopses + Uniform & StraAfied Samples

Higher resoluMon for more recent Mme ranges

1. Streaming CMS

(Count-Min-Sketch)

[t1, t2) [t2, t3) [t3, t4) [t4, now) Time

4T 2T T ≤T

....

Maintain a small sample at each CMS cell

2. Top-K Queries w/ Arbitrary Filters

Tradi>onal CMS CMS+Samples

Page 19: SnappyData - CIDRcidrdb.org/cidr2017/slides/p28-mozafari-cidr17-slides.pdf · Lambda Architecture is Complex Scalable Store HDFS, MPP DB Data-in-moAon AnalyAcs ApplicaAon STREAMS

`

ProbabilisAc Store: Synopses + Uniform & StraAfied Samples

Higher resoluMon for more recent Mme ranges

1. Streaming CMS

(Count-Min-Sketch)

[t1, t2) [t2, t3) [t3, t4) [t4, now) Time

4T 2T T ≤T

....

Maintain a small sample at each CMS cell

2. Top-K Queries w/ Arbitrary Filters

Tradi>onal CMS CMS+Samples

3. Fully Distributed Stratified Samples

Always include Mmestamp as a straMfied column for streams

Streams Aging Row Store (In-memory) Column Store (Disk)

timestamp

Page 20: SnappyData - CIDRcidrdb.org/cidr2017/slides/p28-mozafari-cidr17-slides.pdf · Lambda Architecture is Complex Scalable Store HDFS, MPP DB Data-in-moAon AnalyAcs ApplicaAon STREAMS

Overview

Cluster Manager & Scheduler

Snappy Data Server (Spark Executor + Store)

Parser

OLAP

TRX

Data Synopsis Engine

Distributed Membership Service

HA

Stream Processing

Data Frame

RDD

Low Latency

High Latency

HYBRID Store

ProbabilisMc Rows Columns

Index

Query OpMmizer

Add / Remove Server

Tables ODBC/JDBC

Page 21: SnappyData - CIDRcidrdb.org/cidr2017/slides/p28-mozafari-cidr17-slides.pdf · Lambda Architecture is Complex Scalable Store HDFS, MPP DB Data-in-moAon AnalyAcs ApplicaAon STREAMS

SupporAng Real-Ame & HA

Locator

Lead Node Executor JVM (Server)

Shared Nothing Persistence

JDBC/ODBC

Catalogue Service

Managed Driver

SPARK Contacts

SPARK Context

SNAPPY Cluster

Manager

REST

SPARK JOBS

SPARK Program

Memory Mgmt

BLOCKS SNAPPY STORE

Stream SNAPPY

Tables

Tables

DataFrame

•  Spark Executors are long running. Driver failure doesn’t shutdown Executors

•  Driver HA – Drivers are “Managed” by SnappyData with standby secondary

•  Data HA – Consensus based clustering integrated for eager replicaMon

DataFrame

Page 22: SnappyData - CIDRcidrdb.org/cidr2017/slides/p28-mozafari-cidr17-slides.pdf · Lambda Architecture is Complex Scalable Store HDFS, MPP DB Data-in-moAon AnalyAcs ApplicaAon STREAMS

Overview

Cluster Manager & Scheduler

Snappy Data Server (Spark Executor + Store)

Parser

OLAP

TRX

Data Synopsis Engine

Distributed Membership Service

HA

Stream Processing

Data Frame

RDD

Low Latency

High Latency

HYBRID Store

ProbabilisMc Rows Columns

Index

Query OpMmizer

Add / Remove Server

Tables ODBC/JDBC

Page 23: SnappyData - CIDRcidrdb.org/cidr2017/slides/p28-mozafari-cidr17-slides.pdf · Lambda Architecture is Complex Scalable Store HDFS, MPP DB Data-in-moAon AnalyAcs ApplicaAon STREAMS

Overview

Cluster Manager & Scheduler

Snappy Data Server (Spark Executor + Store)

Parser

OLAP

TRX

Data Synopsis Engine

Distributed Membership Service

HA

Stream Processing

Data Frame

RDD

Low Latency

High Latency

HYBRID Store

ProbabilisMc Rows Columns

Index

Query OpMmizer

Add / Remove Server

Tables ODBC/JDBC

Page 24: SnappyData - CIDRcidrdb.org/cidr2017/slides/p28-mozafari-cidr17-slides.pdf · Lambda Architecture is Complex Scalable Store HDFS, MPP DB Data-in-moAon AnalyAcs ApplicaAon STREAMS

Query OpAmizaAon

24

•  Bypass the scheduler for transacAons and low-latency jobs

• Minimize shuffles aggressively

- Dynamic replicaMon for reference data - Retain ‘join indexes’ whenever possible - Collocate and co-parMMon related tables and streams

• OpAmized ‘Hash Join’, ‘Scan’, ‘GroupBy’ compared to Spark - Use more variables (eliminate virtual funcs) to generate beOer code - Use vectorized structures - Avoid Spark’s single-node boOlenecks in broadcast joins

•  Column segment pruning through metadata

Page 25: SnappyData - CIDRcidrdb.org/cidr2017/slides/p28-mozafari-cidr17-slides.pdf · Lambda Architecture is Complex Scalable Store HDFS, MPP DB Data-in-moAon AnalyAcs ApplicaAon STREAMS

Co-parAAoning & Co-locaAon

Spark Executor Subscriber A-M Ref data

Spark Executor Subscriber N-Z Ref data

Linearly scale with partition pruning

Subscriber A-M

Subscriber N-Z

KAFKA Queue

KAFKA Queue

Page 26: SnappyData - CIDRcidrdb.org/cidr2017/slides/p28-mozafari-cidr17-slides.pdf · Lambda Architecture is Complex Scalable Store HDFS, MPP DB Data-in-moAon AnalyAcs ApplicaAon STREAMS

Overview

Cluster Manager & Scheduler

Snappy Data Server (Spark Executor + Store)

Parser

OLAP

TRX

Data Synopsis Engine

Distributed Membership Service

HA

Stream Processing

Data Frame

RDD

Low Latency

High Latency

HYBRID Store

ProbabilisMc Rows Columns

Index

Query OpMmizer

Add / Remove Server

Tables ODBC/JDBC

Page 27: SnappyData - CIDRcidrdb.org/cidr2017/slides/p28-mozafari-cidr17-slides.pdf · Lambda Architecture is Complex Scalable Store HDFS, MPP DB Data-in-moAon AnalyAcs ApplicaAon STREAMS

Approximate Query Processing (AQP): Academia vs. Industry

25+ yrs of successful research in academia

User-facing AQP almost non-existent in commercial world!

Some approximate features in Infobright, Yahoo’s Druid, Facebook’s Presto, Oracle 12C, ...

AQUA, Online AggregaMon, MapReduce Online, STRAT, ABS, BlinkDB / G-OLA, ...

WHY ?

BUT:

Page 28: SnappyData - CIDRcidrdb.org/cidr2017/slides/p28-mozafari-cidr17-slides.pdf · Lambda Architecture is Complex Scalable Store HDFS, MPP DB Data-in-moAon AnalyAcs ApplicaAon STREAMS

Approximate Query Processing (AQP): Academia vs. Industry

25+ yrs of successful research in academia

User-facing AQP almost non-existent in commercial world!

Some approximate features in Infobright, Yahoo’s Druid, Facebook’s Presto, Oracle 12C, ...

AQUA, Online AggregaMon, MapReduce Online, STRAT, ABS, BlinkDB / G-OLA, ...

WHY ?

BUT:

select geo, avg(bid) from adImpressions group by geo having avg(bid)>10 with error 0.05 at confidence 95

geo avg(bid) error prob_existence

MI 21.5 ± 0.4 0.99

CA 18.3 ± 5.1 0.80

MA 15.6 ± 2.4 0.81

... ... ... ....

Page 29: SnappyData - CIDRcidrdb.org/cidr2017/slides/p28-mozafari-cidr17-slides.pdf · Lambda Architecture is Complex Scalable Store HDFS, MPP DB Data-in-moAon AnalyAcs ApplicaAon STREAMS

Approximate Query Processing (AQP): Academia vs. Industry

25+ yrs of successful research in academia

User-facing AQP almost non-existent in commercial world!

Some approximate features in Infobright, Yahoo’s Druid, Facebook’s Presto, Oracle 12C, ...

AQUA, Online AggregaMon, MapReduce Online, STRAT, ABS, BlinkDB / G-OLA, ...

WHY ?

BUT:

select geo, avg(bid) from adImpressions group by geo having avg(bid)>10 with error 0.05 at confidence 95

geo avg(bid) error prob_existence

MI 21.5 ± 0.4 0.99

CA 18.3 ± 5.1 0.80

MA 15.6 ± 2.4 0.81

... ... ... .... 1.  Incompatible w/ BI tools

2.  Complex semantics

3.  Bad sales pitch!

Page 30: SnappyData - CIDRcidrdb.org/cidr2017/slides/p28-mozafari-cidr17-slides.pdf · Lambda Architecture is Complex Scalable Store HDFS, MPP DB Data-in-moAon AnalyAcs ApplicaAon STREAMS

A First Industrial-Grade AQP Engine

1. Highlevel Accuracy Contract (HAC)

•  User picks a single number p, where 0≤p≤1 (by default p=0.95)

•  Snappy guarantees that s/he only sees things that are at least p% accurate

•  Snappy handles (and hides) everything else!

geo avg(bid)

MI 21.5

WI 42.3

NY 65.6

... ...

Page 31: SnappyData - CIDRcidrdb.org/cidr2017/slides/p28-mozafari-cidr17-slides.pdf · Lambda Architecture is Complex Scalable Store HDFS, MPP DB Data-in-moAon AnalyAcs ApplicaAon STREAMS

A First Industrial-Grade AQP Engine

1. Highlevel Accuracy Contract (HAC)

•  User picks a single number p, where 0≤p≤1 (by default p=0.95)

•  Snappy guarantees that s/he only sees things that are at least p% accurate

•  Snappy handles (and hides) everything else!

2. Fully compatible w/ BI tools

•  Set HAC behavior at JDBC/ODBC connecMon level

geo avg(bid)

MI 21.5

WI 42.3

NY 65.6

... ...

Page 32: SnappyData - CIDRcidrdb.org/cidr2017/slides/p28-mozafari-cidr17-slides.pdf · Lambda Architecture is Complex Scalable Store HDFS, MPP DB Data-in-moAon AnalyAcs ApplicaAon STREAMS

A First Industrial-Grade AQP Engine

1. Highlevel Accuracy Contract (HAC)

•  User picks a single number p, where 0≤p≤1 (by default p=0.95)

•  Snappy guarantees that s/he only sees things that are at least p% accurate

•  Snappy handles (and hides) everything else!

2. Fully compatible w/ BI tools

•  Set HAC behavior at JDBC/ODBC connecMon level

•  Concurrency: 10’s of queries in shared clusters •  Resource usage: everyone hates their AWS bill •  Network shuffles •  Immediate results while waiMng for final results

3. Better marketing!

iSight (Immediate inSight)

geo avg(bid)

MI 21.5

WI 42.3

NY 65.6

... ...

Page 33: SnappyData - CIDRcidrdb.org/cidr2017/slides/p28-mozafari-cidr17-slides.pdf · Lambda Architecture is Complex Scalable Store HDFS, MPP DB Data-in-moAon AnalyAcs ApplicaAon STREAMS

33

Benchmarks

Page 34: SnappyData - CIDRcidrdb.org/cidr2017/slides/p28-mozafari-cidr17-slides.pdf · Lambda Architecture is Complex Scalable Store HDFS, MPP DB Data-in-moAon AnalyAcs ApplicaAon STREAMS

Benchmarks

• Mixed Benchmark: Ad Analytics

- Ad impressions arrive on a message bus - Aggregate by publisher and geo - Report avg bid, # of impressions, and # of uniques every few

secs - Write to a parMMoned store - TransacMonally update the profiles during ingesMon - Q1: Top-20 ads receiving most impressions per region - Q2: Top-20 ads receiving largest bids per geo - Q3: Top-20 publishers receiving largest sum of bids overall

• TPC - H

Stream

ing

OLA

PTrx

Page 35: SnappyData - CIDRcidrdb.org/cidr2017/slides/p28-mozafari-cidr17-slides.pdf · Lambda Architecture is Complex Scalable Store HDFS, MPP DB Data-in-moAon AnalyAcs ApplicaAon STREAMS

Setup

35

• HW: 5 c4.2xlarge EC2 instances

- 1 coordinator + 4 workers

• Software: Latest GA versions avalable:

- Kaxa 2.10_0.8.2.2 - Spark 2.0.0 - Cassandra 3.9

w/ Spark-Cassandra connector 2.0.0_M3

- MemSQL Ops-5.5.10 Community EdiMon w/ Spark-MemSQL Connector 2.10_1.3.3

- SnappyData 0.6.1

Page 36: SnappyData - CIDRcidrdb.org/cidr2017/slides/p28-mozafari-cidr17-slides.pdf · Lambda Architecture is Complex Scalable Store HDFS, MPP DB Data-in-moAon AnalyAcs ApplicaAon STREAMS

Ad AnalyAcs

1.5-2x faster ingestion 7-142× faster analytics (at 300M records)

Page 37: SnappyData - CIDRcidrdb.org/cidr2017/slides/p28-mozafari-cidr17-slides.pdf · Lambda Architecture is Complex Scalable Store HDFS, MPP DB Data-in-moAon AnalyAcs ApplicaAon STREAMS

Data Synopsis Engine

Page 38: SnappyData - CIDRcidrdb.org/cidr2017/slides/p28-mozafari-cidr17-slides.pdf · Lambda Architecture is Complex Scalable Store HDFS, MPP DB Data-in-moAon AnalyAcs ApplicaAon STREAMS

TPC-H

SnappyData MemSQL Spark

5.7s

100 GB

12.0s

66.9s

Avg Median

2.5s

6.4s

55.1s

SnappyData was 2.6x faster than MemSQL & 22.4x faster than Spark 2.0

Page 39: SnappyData - CIDRcidrdb.org/cidr2017/slides/p28-mozafari-cidr17-slides.pdf · Lambda Architecture is Complex Scalable Store HDFS, MPP DB Data-in-moAon AnalyAcs ApplicaAon STREAMS

Conclusion

Page 40: SnappyData - CIDRcidrdb.org/cidr2017/slides/p28-mozafari-cidr17-slides.pdf · Lambda Architecture is Complex Scalable Store HDFS, MPP DB Data-in-moAon AnalyAcs ApplicaAon STREAMS

Where Are We Today ?

•  Current customers

•  Current release

•  Next funding round

•  Upcoming features

- Investment banking, Industrial IoT, Telco, Ad AnalyMcs, & healthcare

- 0.7 (GA 1.0 in Q1-2017)

- Q2-2017

-  IntegraMon of Spark ML w/ our Data Synopsis Engine -  Cost-based query opMmizer -  Physical designer & workload miner (hOp://CliffGuard.org)

Page 41: SnappyData - CIDRcidrdb.org/cidr2017/slides/p28-mozafari-cidr17-slides.pdf · Lambda Architecture is Complex Scalable Store HDFS, MPP DB Data-in-moAon AnalyAcs ApplicaAon STREAMS

Lessons Learned

1. A unique experience marryings two different breeds of distributed systems lineage-based for high-throughput vs. (consensus-) replicaMon-based for low-latency

Page 42: SnappyData - CIDRcidrdb.org/cidr2017/slides/p28-mozafari-cidr17-slides.pdf · Lambda Architecture is Complex Scalable Store HDFS, MPP DB Data-in-moAon AnalyAcs ApplicaAon STREAMS

Lessons Learned

2. A unified cluster is simpler, cheaper, and faster

- By sharing state across apps, we decouple apps from data servers and provide HA - Save memory, data copying, serializaMon, and shuffles - Co-parMMoning and co-locaMon for faster joins and stream analyMcs

1. A unique experience marryings two different breeds of distributed systems lineage-based for high-throughput vs. (consensus-) replicaMon-based for low-latency

Page 43: SnappyData - CIDRcidrdb.org/cidr2017/slides/p28-mozafari-cidr17-slides.pdf · Lambda Architecture is Complex Scalable Store HDFS, MPP DB Data-in-moAon AnalyAcs ApplicaAon STREAMS

Lessons Learned

2. A unified cluster is simpler, cheaper, and faster

- By sharing state across apps, we decouple apps from data servers and provide HA - Save memory, data copying, serializaMon, and shuffles - Co-parMMoning and co-locaMon for faster joins and stream analyMcs

1. A unique experience marryings two different breeds of distributed systems lineage-based for high-throughput vs. (consensus-) replicaMon-based for low-latency

3. Advantages over HTAP engines: Deep stream integration + AQP

- Stream processing ≠ stream analyMcs - Top-k w/ almost arbitrary predicates + 1-pass straMfied sampling over streams

Page 44: SnappyData - CIDRcidrdb.org/cidr2017/slides/p28-mozafari-cidr17-slides.pdf · Lambda Architecture is Complex Scalable Store HDFS, MPP DB Data-in-moAon AnalyAcs ApplicaAon STREAMS

Lessons Learned

2. A unified cluster is simpler, cheaper, and faster

- By sharing state across apps, we decouple apps from data servers and provide HA - Save memory, data copying, serializaMon, and shuffles - Co-parMMoning and co-locaMon for faster joins and stream analyMcs

1. A unique experience marryings two different breeds of distributed systems lineage-based for high-throughput vs. (consensus-) replicaMon-based for low-latency

3. Advantages over HTAP engines: Deep stream integration + AQP

- Stream processing ≠ stream analyMcs - Top-k w/ almost arbitrary predicates + 1-pass straMfied sampling over streams

4. Commercializing academic work is lots of work but also lots of fun

Page 45: SnappyData - CIDRcidrdb.org/cidr2017/slides/p28-mozafari-cidr17-slides.pdf · Lambda Architecture is Complex Scalable Store HDFS, MPP DB Data-in-moAon AnalyAcs ApplicaAon STREAMS

THANK YOU !

Try our iSight cloud for free: htp://snappydata.io/iSight