add real-time streaming sql to your mysql skill set julian hyde - chief architect steve herskovitz...

24
Add Real-time Streaming SQL to Your MySQL Skill Set Julian Hyde - Chief Architect Steve Herskovitz – Director of Professional Services

Upload: zoe-obrien

Post on 17-Dec-2015

217 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Add Real-time Streaming SQL to Your MySQL Skill Set Julian Hyde - Chief Architect Steve Herskovitz – Director of Professional Services

Add Real-time Streaming SQL to Your MySQL Skill Set

Julian Hyde - Chief Architect

Steve Herskovitz – Director of Professional Services

Page 2: Add Real-time Streaming SQL to Your MySQL Skill Set Julian Hyde - Chief Architect Steve Herskovitz – Director of Professional Services

The Data Crunch

» Data volumes rising fast

» Human-originated data (e.g. e-commerce purchases) rising fast

» Machine-generated data (e.g. e-commerce events and network

packets) rising even faster

» Every business needs answers with lower latency

» Every significant problem is distributed:

» Geographically distributed organizations

» Multiple boxes for scale

» Exploit multiple cores

Page 3: Add Real-time Streaming SQL to Your MySQL Skill Set Julian Hyde - Chief Architect Steve Herskovitz – Director of Professional Services

Data management is hard

» If you make a mistake, the system won’t be fast enough

» Can’t afford to lose data

» New technologies are very difficult to use

» MapReduce

» NoSQL

» Multi-threaded programming in Java, C++, Erlang, Scala, …

» Collaborate, interoperate, evolve

Page 4: Add Real-time Streaming SQL to Your MySQL Skill Set Julian Hyde - Chief Architect Steve Herskovitz – Director of Professional Services

Today’s Computing Model

4

Database$$$$

TransactionProcessingApplication

InfrastructureApplication

Log file

Real-TimeApplication

BusinessEvents

BusinessTransactions

Batch LoadProcess

Polling

Page 5: Add Real-time Streaming SQL to Your MySQL Skill Set Julian Hyde - Chief Architect Steve Herskovitz – Director of Professional Services

Stream Computing Model

5

Database$$

TransactionProcessingApplication

InfrastructureApplication

Real-TimeApplication

StreamProcesso

r

Business

Events

BusinessTransactions

Real-TimeAnswers

Traditional BI Application

Page 6: Add Real-time Streaming SQL to Your MySQL Skill Set Julian Hyde - Chief Architect Steve Herskovitz – Director of Professional Services

Case study: Mozilla

Page 7: Add Real-time Streaming SQL to Your MySQL Skill Set Julian Hyde - Chief Architect Steve Herskovitz – Director of Professional Services

Demo: Mozilla downloads

Page 8: Add Real-time Streaming SQL to Your MySQL Skill Set Julian Hyde - Chief Architect Steve Herskovitz – Director of Professional Services

SQL – life in the old dinosaur yet

» Widely spoken

» Rich

» Orthogonal

» Declarative

» Tune your system without

changing your logical schema

» Apps don’t interfere with each

other

» Adaptive

» Route around failure

» Exploit available resources

» Make tradeoffs to meet QoS goals

Page 9: Add Real-time Streaming SQL to Your MySQL Skill Set Julian Hyde - Chief Architect Steve Herskovitz – Director of Professional Services

Streaming SQL: example #1

Tweets about this conference:

» SELECT STREAM ROWTIME, author, text

FROM Tweets

WHERE text LIKE ‘%#MySQL%'

Page 10: Add Real-time Streaming SQL to Your MySQL Skill Set Julian Hyde - Chief Architect Steve Herskovitz – Director of Professional Services

Demo: studio & simple query

Page 11: Add Real-time Streaming SQL to Your MySQL Skill Set Julian Hyde - Chief Architect Steve Herskovitz – Director of Professional Services

Streaming SQL basics

» Streams:

» CREATE STREAM Tweets (

author VARCHAR(20),

text VARCHAR(140));

» Relational operators have streaming counterparts:

» Project (SELECT)

» Filter (WHERE)

» Union

» Join

» Aggregation (GROUP BY)

» Windowed aggregation (e.g. SUM(x) OVER window)

» Sort (ORDER BY)

Page 12: Add Real-time Streaming SQL to Your MySQL Skill Set Julian Hyde - Chief Architect Steve Herskovitz – Director of Professional Services

Streaming SQL: example #2

» Each minute, return the number of clicks on each web

page:

» SELECT STREAM ROWTIME, uri, COUNT(*)

FROM PageRequests

GROUP BY FLOOR(ROWTIME TO MINUTE), uri

Page 13: Add Real-time Streaming SQL to Your MySQL Skill Set Julian Hyde - Chief Architect Steve Herskovitz – Director of Professional Services

Streaming SQL: Time

» ROWTIME pseudo-column

» Provided by source application or generated by system

» WINDOW

» Present in regular SQL (e.g. SQL:2003) but more important in

streaming SQL

» Defines a ‘working set’ for streaming JOIN, GROUP BY, windowed

aggregation

» Monotonicity (“sortedness”)

» Prerequisite for certain streaming operations

Page 14: Add Real-time Streaming SQL to Your MySQL Skill Set Julian Hyde - Chief Architect Steve Herskovitz – Director of Professional Services

Streaming SQL: example #3

Find all orders from New York that shipped within an hour:

» CREATE VIEW compliant_orders AS

SELECT STREAM *

FROM orders OVER sla

JOIN shipments

ON orders.id = shipments.orderid

WHERE city = 'New York'

WINDOW sla AS (RANGE INTERVAL '1' HOUR PRECEDING)

Page 15: Add Real-time Streaming SQL to Your MySQL Skill Set Julian Hyde - Chief Architect Steve Herskovitz – Director of Professional Services

Streaming SQL: other stuff

» Schemas, views, tables

» Ability to nest queries

» User-defined functions and transforms

» Adapters make external systems look like read/write streams

Page 16: Add Real-time Streaming SQL to Your MySQL Skill Set Julian Hyde - Chief Architect Steve Herskovitz – Director of Professional Services

Streaming SQL: example #4

Find all stock trades where the average price over the last ten trades is

two standard deviations higher than the average over the last hour:» SELECT STREAM *

FROM (

SELECT STREAM

ticker,

price,

volume,

AVG(price) OVER lastHour AS avgHr,

STDDEV(price) OVER lastHour AS stddevHr,

AVG(price) OVER lastTenTrades AS avg10

FROM Trades

WINDOW

lastTenTrades AS (PARTITION BY ticker ROWS 10 PRECEDING)),

lastHour AS (PARTITION BY ticker RANGE INTERVAL ’1’ HOUR PRECEDING)

WHERE avg10 > avgHr + 2 * stddevHr

Page 17: Add Real-time Streaming SQL to Your MySQL Skill Set Julian Hyde - Chief Architect Steve Herskovitz – Director of Professional Services

Streaming SQL for business intelligence

Conventional BI:

» Star schema:

» Fact table

» Dimension tables

» Aggregate tables

» Data warehouse populated using an ETL process

» OLAP servers (e.g. Mondrian) provide a top-down view of data

Challenge:

» Keep all of these systems up to date in real time

» Alert when key metrics are outside acceptable range

Page 18: Add Real-time Streaming SQL to Your MySQL Skill Set Julian Hyde - Chief Architect Steve Herskovitz – Director of Professional Services

ETL Process for OLAP

OLAP

Operationaldatabase

Datawarehouse

Conventional ETL

Aggregate tables populated from DW

OLAP cache flushed after load

Page 19: Add Real-time Streaming SQL to Your MySQL Skill Set Julian Hyde - Chief Architect Steve Herskovitz – Director of Professional Services

Real-time OLAP: Challenges

OLAP imperatives

Highly aggregated data – e.g.

one number computed from

10M rows

Therefore:

1. Use a cache

2. Materialize results as

aggregates

Real-time imperatives

View latest version of the data

Maintaining N aggregates

requires ~N blocks of I/O per

incoming row

Therefore:

1. Don’t use a cache

2. Don’t maintain aggregates

Page 20: Add Real-time Streaming SQL to Your MySQL Skill Set Julian Hyde - Chief Architect Steve Herskovitz – Director of Professional Services

Real-time OLAP: Solutions

1. Notify cache when underlying data has changed

1. Populate cache from data warehouse

2. Continuous ETL process

2. Build aggregates in memory

1. Flush to disk intermittently

2. OLAP engine looks for aggregates in memory first

Page 21: Add Real-time Streaming SQL to Your MySQL Skill Set Julian Hyde - Chief Architect Steve Herskovitz – Director of Professional Services

Continuous ETL for Real-time OLAP

OLAP

Operationaldatabase

Datawarehouse

SQLstream

Aggregate tables populated incrementally

OLAP cache flushed

proactively

Page 22: Add Real-time Streaming SQL to Your MySQL Skill Set Julian Hyde - Chief Architect Steve Herskovitz – Director of Professional Services

Summary

1. Data problems are getting harder

2. People are trying – and failing – to solve these problems

with SQL databases

3. Stream computing is a powerful new kind of platform

4. Streaming SQL is pragmatic and powerful

Page 23: Add Real-time Streaming SQL to Your MySQL Skill Set Julian Hyde - Chief Architect Steve Herskovitz – Director of Professional Services

Any questions?

Page 24: Add Real-time Streaming SQL to Your MySQL Skill Set Julian Hyde - Chief Architect Steve Herskovitz – Director of Professional Services

Thank you for attending!

Further reading:

» “Data in Flight” by Julian Hyde (Communications of the

ACM, Vol. 53 No. 1, Pages 48-52)

Blog: http://julianhyde.blogspot.com/

Twitter: @julianhyde