StreaQuel Overview Mike Franklin UC Berkeley Language Panel 1st Octennial SWiM Meeting January 9, 2003


Page 1

StreaQuel Overview

Mike Franklin

UC Berkeley

Language Panel

1st Octennial SWiM Meeting

January 9, 2003

Page 2

Semantics of data streams (our view)

Streams are a mapping from time to sets of tuples:

Time    Tuple sets
1       {t1, t2, t3}
2       {t2, t3, t4}
3       {t3, t4, t5}
4       {t4, t5, t6}
5       {t5, t6, t7}

Since data streams are unbounded, windows are vital for restricting the data accessed by a query.

A stream can be transformed by:
– Moving a window across it
– A window can be moved by
    Shifting its extremities
    Changing its size
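A minimal Python sketch of this view, not StreaQuel itself. It assumes tuple t_i carries position i, and that a window is an inclusive (older, newer) pair; the names `window_at`, `shift` and `resize` are illustrative only.

```python
from typing import Dict, FrozenSet, Tuple

# Assumption: tuple t_i is tagged with position i on the time axis.
arrivals: Dict[int, str] = {i: f"t{i}" for i in range(1, 8)}

def window_at(older: int, newer: int) -> FrozenSet[str]:
    """Tuples whose position lies in the inclusive range [older, newer]."""
    return frozenset(name for t, name in arrivals.items() if older <= t <= newer)

def shift(window: Tuple[int, int], k: int) -> Tuple[int, int]:
    """Move both extremities by k, keeping the size."""
    return (window[0] + k, window[1] + k)

def resize(window: Tuple[int, int], k: int) -> Tuple[int, int]:
    """Move only the newer extremity by k, changing the size."""
    return (window[0], window[1] + k)

# A stream as a mapping from time to a set of tuples: here each time step
# maps to a three-tuple window, which matches the slide's table.
stream = {}
window = (1, 3)
for time in range(1, 6):
    stream[time] = window_at(*window)
    window = shift(window, 1)

for time, tuples in stream.items():
    print(time, sorted(tuples))
```

Shifting moves both ends together, while resizing moves only one end; the window types classified later in the talk are combinations of these two moves.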

Page 3

An example

Time    Tuple Entry    Base Data Stream        Sliding Window Transformation
1       t1             {t1}                    {t1}
2       t2             {t1, t2}                {t1, t2}
3       t3             {t1, t2, t3}            {t2, t3}
4       t4             {t1, t2, t3, t4}        {t3, t4}
5       t5             {t1, t2, t3, t4, t5}    {t4, t5}
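The example can be reproduced with a short Python sketch. The window size of 2 is an inference from the slide's output column, and `base_stream` and `sliding` are hypothetical helper names.

```python
def base_stream(n):
    """Cumulative view: at time i the base stream holds t1..ti."""
    return {i: [f"t{j}" for j in range(1, i + 1)] for i in range(1, n + 1)}

def sliding(stream, size):
    """Keep only the `size` most recent tuples at each time step."""
    return {time: tuples[-size:] for time, tuples in stream.items()}

base = base_stream(5)
windowed = sliding(base, 2)  # size 2 is assumed from the slide's figure

for time in sorted(base):
    print(time, base[time], windowed[time])
```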

Page 4

Classification of windowed queries

INPUT

Notion of time:
– System Clock
– Tuple Sequence Number

What is the “hop-size” for the input set?
– Periodic, with periodicity (k): k = 1, k = other, k = windowSize
– Aperiodic: on demand
– Never

Which direction do the (older, newer) ends of the input set move?
– Snapshot - (fxd, fxd)
– Sliding - (fwd, fwd)
– Landmark - (fxd, fwd)
– Reverse Landmark - (bwd, fxd)
– Reverse Sliding - (bwd, bwd)
– Unnamed: (fxd, bwd), (fwd, bwd), (bwd, fwd), (fwd, fxd)
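The direction-based naming can be sketched as a small lookup in Python. The names come from the slide; the function names and the integer representation of window bounds are assumptions made for illustration.

```python
from typing import Optional, Tuple

# (direction of older end, direction of newer end) -> name used on the slide
WINDOW_KINDS = {
    ("fxd", "fxd"): "Snapshot",
    ("fwd", "fwd"): "Sliding",
    ("fxd", "fwd"): "Landmark",
    ("bwd", "fxd"): "Reverse Landmark",
    ("bwd", "bwd"): "Reverse Sliding",
}

def direction(before: int, after: int) -> str:
    """Classify how one window end moved between two consecutive steps."""
    if after > before:
        return "fwd"
    if after < before:
        return "bwd"
    return "fxd"

def classify(w0: Tuple[int, int], w1: Tuple[int, int]) -> Optional[str]:
    """Name the window kind from two consecutive (older, newer) bounds.

    Returns None for the combinations the slide leaves unnamed.
    """
    key = (direction(w0[0], w1[0]), direction(w0[1], w1[1]))
    return WINDOW_KINDS.get(key)

print(classify((0, 40), (0, 41)))  # older end fixed, newer end moves forward
print(classify((1, 3), (2, 4)))    # both ends move forward
```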

Page 5

The StreaQuel Language

An extension of SQL
Operates exclusively on streams
Is closed under streams
Supports different ways to “create” streams
– Infinite time-stamped tuple sequence
– Traditional stable relations
Flexible windows: sliding, landmark and more
Supports logical and physical time
When used with a cursor mechanism, allows clients to do their own window-based processing.
Eventually the target language for TelegraphCQ

Page 6

General Form of a StreaQuel Query

SELECT projection_list

FROM from_list

WHERE selection_and_join_predicates

ORDEREDBY

TRANSFORM…TO

WINDOW…BY

Windows can be applied to individual streams
Window movement is expressed using a “for loop”-type construct in the “transform” clause
We’re not completely psyched about our syntax at this point.

Page 7

StreaQuel Keywords

NOW = “current time”
– E.g., wall-clock or latest sequence number

ST = “start time” of query

On_demand = interrupt from user

BoS = 0 = beginning of stream

Page 8

Example – Landmark query

[Figure: four timelines, each marked from 0 to 60 in steps of 5 and labeled with ST and the window, drawn for NOW = 40 = t, NOW = 41 = t, ..., NOW = 45 = t, ..., NOW = 50 = t.]
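A hedged Python sketch of how such a landmark window might evolve. It assumes the older end stays fixed at ST while the newer end follows t = NOW, as in the (fxd, fwd) classification; the value of ST is not given on the slide, so `ST = 0` below is a placeholder.

```python
def landmark_windows(st, first_now, last_now):
    """Yield (older, newer) bounds: older end fixed at st, newer end follows NOW."""
    for t in range(first_now, last_now + 1):
        yield (st, t)

ST = 0  # hypothetical start time; the slide does not state its value
windows = list(landmark_windows(ST, 40, 50))

for older, newer in windows:
    print(f"NOW = {newer}: window = ({older}, {newer})")
```

The loop over t plays the role of the “for loop”-type construct mentioned for window movement; the actual StreaQuel syntax for it is not shown in the slides.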

Page 9

Challenge Queries: TPC-Squirrel

Our Opinion…

(thanks to Sam Madden for finding this wonderful picture)

Page 10

Challenge Queries – Query 1

Stream: Packets(pID, length, time)
Query: Generate the stream of packets whose length is greater than twice the average packet length over the last 1 hour.

AVGPACKETS:
Select AVG(length) as avlen
From Packets
Window Packets By (NOW - 1hr, NOW)

GREATERTHAN2AVG:
Select *
From Packets p, AVGPACKETS a
Where p.length > 2 * avlen;
Window p By (NOW, NOW)

STREAM: Open a delta-output cursor on GREATERTHAN2AVG.
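As an illustration of the intended semantics, here is a minimal Python sketch of Query 1. It assumes packets arrive in non-decreasing time order, that `time` is in seconds, and that the one-hour window is inclusive of the current packet; the `Packet` type and function name are hypothetical.

```python
from collections import deque
from typing import Iterable, Iterator, NamedTuple

class Packet(NamedTuple):
    pid: int
    length: int
    time: float  # seconds (assumed unit)

def greater_than_2avg(packets: Iterable[Packet], span: float = 3600.0) -> Iterator[Packet]:
    """Yield packets longer than twice the average length over the last `span` seconds."""
    window = deque()  # packets with time in [NOW - span, NOW]
    total = 0         # running sum of lengths in the window
    for p in packets:  # assumes non-decreasing p.time
        window.append(p)
        total += p.length
        # Evict packets that fell out of the window; p itself is never evicted.
        while window[0].time < p.time - span:
            total -= window.popleft().length
        avlen = total / len(window)
        if p.length > 2 * avlen:
            yield p

packets = [
    Packet(1, 100, 0), Packet(2, 100, 10), Packet(3, 1000, 20),
    Packet(4, 100, 5000), Packet(5, 250, 5010),
]
flagged = [p.pid for p in greater_than_2avg(packets)]
print(flagged)
```

Yielding each qualifying packet as it arrives corresponds roughly to reading from a delta-output cursor: only new results are emitted, not the whole result set.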

Page 11

Challenge Queries – Query 2

SquirrelSensors(sID, region, time)
SquirrelType(sID, type)
Query: Create an alert when more than 20 type 'A' squirrels are in Jennifer's backyard.

Option 1: (Twenty Distinct Type 'A' squirrels have been to J's backyard since the beginning of time)

Select ALERT()
From SquirrelSensors ss, SquirrelType st
Where ss.region = JWBackyard AND ss.id = st.id AND st.type = 'A'
Having COUNT (DISTINCT(ss.id)) > 20
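A small Python sketch of the Option 1 reading, where no window is given and the count therefore accumulates over the whole stream. The reading format `(sid, region, time)`, the type lookup as a dict, and the function name are assumptions for illustration.

```python
def first_alert_since_beginning(readings, squirrel_type, region="JWBackyard", threshold=20):
    """Return the time at which the distinct type-'A' count first exceeds threshold.

    Returns None if the threshold is never exceeded.
    """
    seen = set()  # distinct type-'A' squirrels ever sensed in the region
    for sid, reg, time in readings:
        if reg == region and squirrel_type.get(sid) == "A":
            seen.add(sid)
            if len(seen) > threshold:
                return time
    return None

types = {1: "A", 2: "A", 3: "A", 4: "A"}
readings = [
    (1, "JWBackyard", 1), (2, "JWBackyard", 2), (1, "JWBackyard", 3),
    (3, "Park", 4), (4, "JWBackyard", 5),
]
# Threshold lowered to 2 to keep the example small.
print(first_alert_since_beginning(readings, types, threshold=2))
```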

Page 12

Challenge Queries – Query 2…contd.

Option 2: (Twenty Distinct Type 'A' squirrels are being sensed in J's backyard this instant)

Select ALERT()
From SquirrelSensors ss, SquirrelType st
Where ss.region = JWBackyard AND ss.id = st.id AND st.type = 'A'
Having COUNT (DISTINCT(ss.id)) > 20
Window ss by (NOW, NOW)
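With the (NOW, NOW) window, only readings carrying the current timestamp count. A minimal Python sketch under that reading, assuming readings are `(sid, region, time)` triples and that "this instant" means an identical timestamp; the function name is hypothetical.

```python
from collections import defaultdict

def alerts_this_instant(readings, squirrel_type, region="JWBackyard", threshold=20):
    """Return the sorted instants at which more than `threshold` distinct
    type-'A' squirrels are sensed in the region at the same timestamp."""
    by_time = defaultdict(set)  # time -> distinct qualifying squirrel ids
    for sid, reg, time in readings:
        if reg == region and squirrel_type.get(sid) == "A":
            by_time[time].add(sid)
    return sorted(t for t, sids in by_time.items() if len(sids) > threshold)

types = {1: "A", 2: "A", 3: "A"}
readings = [
    (1, "JWBackyard", 1), (2, "JWBackyard", 1), (3, "JWBackyard", 1),
    (1, "JWBackyard", 2), (2, "JWBackyard", 2),
]
# Threshold lowered to 2 to keep the example small.
print(alerts_this_instant(readings, types, threshold=2))
```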

Page 13

Challenge Queries – Query 2…contd.

Option 3: (Individual Squirrel Tracking: Twenty Distinct Type 'A' squirrels have last been sensed in J's backyard)

LASTREADING:
Select sid, MAX(time)
From SquirrelSensors ss
GroupBy sid;

Select ALERT()
From SquirrelSensors ss, SquirrelType st, LASTREADING lr
Where ss.region = JWBackyard AND ss.id = st.id AND st.type = 'A' AND ss.id = lr.id AND ss.time = lr.time
Having COUNT (DISTINCT(ss.id)) > 20
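The Option 3 logic, sketched in Python: keep each squirrel's most recent reading (the role of LASTREADING), then count those whose latest reading is in the backyard. The reading format and function name are assumptions, and ties on time are resolved in favor of the later-listed reading.

```python
def alert_last_seen(readings, squirrel_type, region="JWBackyard", threshold=20):
    """True if more than `threshold` type-'A' squirrels were last sensed in `region`."""
    last = {}  # sid -> (time, region) of that squirrel's latest reading
    for sid, reg, time in readings:
        if sid not in last or time >= last[sid][0]:
            last[sid] = (time, reg)
    in_region = {
        sid for sid, (_, reg) in last.items()
        if reg == region and squirrel_type.get(sid) == "A"
    }
    return len(in_region) > threshold

types = {1: "A", 2: "A", 3: "A"}
readings = [(1, "JWBackyard", 1), (2, "JWBackyard", 2), (1, "Park", 3)]
# Threshold lowered to 1 to keep the example small.
print(alert_last_seen(readings, types, threshold=1))
print(alert_last_seen(readings + [(3, "JWBackyard", 4)], types, threshold=1))
```

Unlike Option 1, a squirrel that has since been sensed elsewhere stops counting, which is why squirrel 1 is excluded once its Park reading arrives.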

Page 14

Challenge Queries – Query 3

SquirrelChirps(sID, loc, time)
Query: Stream an event each time 3 different squirrels within a pairwise distance of 5 meters from each other chirp within 10 seconds of each other.

Select time
From SquirrelChirps sc1, SquirrelChirps sc2, SquirrelChirps sc3
Where sc1.id <> sc2.id AND sc1.id <> sc3.id AND sc2.id <> sc3.id AND
    distance(sc1.loc, sc2.loc) < 5 meters AND
    distance(sc1.loc, sc3.loc) < 5 meters AND
    distance(sc2.loc, sc3.loc) < 5 meters

Window sc1 By (NOW, NOW)
Window sc2 By (NOW - 10, NOW)
Window sc3 By (NOW - 10, NOW)
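A rough Python sketch of the three-way self-join, processed chirp by chirp. It assumes `loc` is an `(x, y)` pair in meters, times are in seconds and arrive in order, and it emits one event per triggering chirp rather than one row per matching triple as the join would; all names are hypothetical.

```python
from itertools import combinations
from math import dist

def chirp_events(chirps, radius=5.0, span=10.0):
    """chirps: iterable of (sid, (x, y), time) in non-decreasing time order.

    Returns the times at which the newest chirp (sc1) completes a group of
    three distinct squirrels, pairwise closer than `radius`, within `span`.
    """
    recent = []  # chirps seen in the last `span` seconds (sc2/sc3 candidates)
    events = []
    for sid, loc, time in chirps:
        recent = [c for c in recent if c[2] >= time - span]
        # Candidates near the current chirp and from a different squirrel.
        near = [c for c in recent if c[0] != sid and dist(c[1], loc) < radius]
        if any(a[0] != b[0] and dist(a[1], b[1]) < radius
               for a, b in combinations(near, 2)):
            events.append(time)
        recent.append((sid, loc, time))
    return events

chirps = [(1, (0.0, 0.0), 0), (2, (1.0, 0.0), 3), (3, (0.0, 1.0), 8)]
print(chirp_events(chirps))
```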

Page 15

StreaQuel Summary

Streams are our primitives
– Capture both tuple sequences and relations
– All operations are closed under streams
Flexible window support
Other aspects:
– support for multiple notions of time
– support for events (tuples) that are not totally ordered (distributed systems)
– great name
A work in progress

Page 16

Not just windows… Other Issues

TinyDB: “Run this CQ (with min acceptable sample rates) and make sure I don’t have to change any batteries for 3 months”

Dealing with dirty/missing/late data
Correlating across different time domains
Adjusting sample rates to topology of network.