IMC Summit 2016 Innovation - Derek Nelson - PipelineDB: The Streaming SQL Database

PipelineDB: The Streaming SQL Database

Derek Nelson

What is PipelineDB?

● Relational database

● Runs SQL queries continuously on streams, incrementally storing results in tables

● Seamlessly integrates streaming computation and relational storage

PipelineDB primitives

● Continuous view: stores incrementally updating continuous query results



● Continuous transform: applies a transformation to an event and writes the result to another stream

● Continuous trigger: fires whenever some condition is true within a continuous view

Why did we build PipelineDB?

● Data-processing demands are outpacing hardware innovation (disks)



● Storing critical data in main memory is an obvious workaround for the disk bottleneck

● For a vast set of use cases, we can actually do better

Critical observations:

● If fast query results are required, then the query itself is often already known

○ Especially if consumers are other applications




● If the query is known in advance, we can efficiently compute the result continuously as new data arrives

● No need to store granular data after results are incrementally updated

Traditional databases

Store THEN Query

SELECT COUNT(*) FROM table

= 16 ✓

The Continuous Query

Query THEN Store

SELECT COUNT(*) FROM stream

= 1 ✓, = 2 ✓, = 3 ✓, and so on through = 16 ✓ (the result is updated as each event arrives)
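The contrast between the two slides can be modeled in a few lines of Python. This is an illustrative sketch only, not PipelineDB's implementation: `StoredTable` and `ContinuousCount` are hypothetical names, and a plain list stands in for table storage.

```python
class StoredTable:
    """Store THEN query: keep every row and scan them when asked."""

    def __init__(self):
        self.rows = []

    def insert(self, row):
        self.rows.append(row)

    def count(self):
        # Work is proportional to the number of stored rows.
        return sum(1 for _ in self.rows)


class ContinuousCount:
    """Query THEN store: update the result as each event arrives."""

    def __init__(self):
        self.result = 0

    def insert(self, event):
        # The event is folded into the result and then discarded.
        self.result += 1


table = StoredTable()
view = ContinuousCount()
for event in range(16):
    table.insert(event)
    view.insert(event)

print(table.count(), view.result)  # prints "16 16"
```

Both approaches report the same count, but the continuous version holds a single integer at every point, while the stored table holds all 16 rows.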

Example Topology

Kafka

Transform

SELECT * FROM kafka_topic JOIN table t USING (x) THEN INSERT INTO stream

Continuous View

SELECT x, AVG(value) FROM stream GROUP BY x

Continuous Trigger

WHEN OLD.avg < 10 AND NEW.avg > 10 THEN EXECUTE PROCEDURE post_alarm('pipelinedb.com/alert')

SQL clients SELECT from continuous views for realtime results

x    AVG
a    1.442
b    7.55
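The flow through this topology can be approximated in plain Python. The sketch below is a hedged model of the three stages, not PipelineDB code: `LOOKUP`, `transform`, `AvgView`, and `trigger` are hypothetical names, the `region` column is invented for illustration, and appending to a list stands in for the `post_alarm` call.

```python
from collections import defaultdict

# Hypothetical lookup table joined against incoming events on key x.
LOOKUP = {"a": {"region": "east"}, "b": {"region": "west"}}


def transform(event):
    """Continuous transform: inner-join an event with LOOKUP, or drop it."""
    extra = LOOKUP.get(event["x"])
    if extra is None:
        return None
    return {**event, **extra}


class AvgView:
    """Continuous view: running AVG(value) per x, kept as [sum, count]."""

    def __init__(self, on_change):
        self.state = defaultdict(lambda: [0.0, 0])
        self.on_change = on_change

    def avg(self, x):
        total, n = self.state[x]
        return total / n if n else None

    def insert(self, row):
        x = row["x"]
        old = self.avg(x)
        self.state[x][0] += row["value"]
        self.state[x][1] += 1
        self.on_change(x, old, self.avg(x))


alarms = []


def trigger(x, old_avg, new_avg):
    """Continuous trigger: fire when the average crosses 10 from below."""
    if old_avg is not None and old_avg < 10 and new_avg > 10:
        alarms.append((x, new_avg))  # stand-in for post_alarm(...)


view = AvgView(on_change=trigger)
events = [
    {"x": "a", "value": 4.0},
    {"x": "a", "value": 8.0},
    {"x": "zzz", "value": 99.0},  # no match in LOOKUP, dropped by the join
    {"x": "a", "value": 30.0},
    {"x": "b", "value": 7.5},
]
for event in events:
    row = transform(event)
    if row is not None:
        view.insert(row)
```

After the loop, the average for `a` has moved from 6.0 to 14.0, so the trigger has fired once for `a`; `b` sits at 7.5 and has not fired. A client reading `view.avg("a")` at any time sees the current result without rescanning events.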

Benefits of continuous SQL

● Streaming analytics with pure SQL

○ No application code

○ Very low engineering overhead

○ Add new continuous queries with no downtime


● Sustainable infrastructure cost

○ Consumed memory / disk independent of ingested data volume

[Chart: database size versus total data ingested]

CREATE CONTINUOUS VIEW v AS SELECT COUNT(*) FROM stream
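The storage claim can be checked with a small Python model. This is a sketch under stated assumptions rather than a measurement of PipelineDB: `ingest` is a hypothetical helper, the three keys are made up, and a `Counter` stands in for a grouped continuous view. For an ungrouped `COUNT(*)` the stored state would be a single number; with grouping it is one row per key.

```python
from collections import Counter
import random


def ingest(num_events, keys=("a", "b", "c"), seed=0):
    """Fold num_events random events into per-key counts; raw events are not kept."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(num_events):
        counts[rng.choice(keys)] += 1
    return counts


small = ingest(1_000)
large = ingest(100_000)

# Number of stored rows after ingesting 1,000 and 100,000 events.
print(len(small), len(large))
```

The number of stored rows tracks the number of distinct keys, not the number of events ingested.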




● Realtime push becomes possible (no polling)

○ Incremental updates mean we can trigger any functionality the moment something interesting happens


CREATE TRIGGER trig ON cont_view WHEN some_condition(new.value) THEN http_post('pipelinedb.com/alarm')


● Works with all existing standard SQL clients

Thank you!

PipelineDB
