StreaQuel Overview
Mike Franklin
UC Berkeley
Language Panel
1st Octennial SWiM Meeting
January 9, 2003
Michael J. Franklin2
Semantics of data streams (our view)
- Streams are a mapping from time to sets of tuples.
- Since data streams are unbounded, windows are vital for restricting the data a query accesses.
- A stream can be transformed by moving a window across it.
- A window can be moved by shifting its extremities or by changing its size.

Time   Tuple set
1      {t1, t2, t3}
2      {t2, t3, t4}
3      {t3, t4, t5}
4      {t4, t5, t6}
5      {t5, t6, t7}
An example
Time   Tuple entry   Base data stream        Sliding window transformation
1      t1            {t1}                    {t1}
2      t2            {t1, t2}                {t1, t2}
3      t3            {t1, t2, t3}            {t2, t3}
4      t4            {t1, t2, t3, t4}        {t3, t4}
5      t5            {t1, t2, t3, t4, t5}    {t4, t5}
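The example above can be reproduced with a small Python sketch. It is an illustration only, not StreaQuel or TelegraphCQ code: the function names and the window width of 2 are assumptions chosen to match the sets shown on the slide.

```python
def base_stream(entries):
    """Map each time step (1-based) to the set of all tuples seen so far."""
    return {t: set(entries[:t]) for t in range(1, len(entries) + 1)}


def sliding_window(entries, width):
    """Map each time step to the `width` most recent tuples (fewer at the start)."""
    return {
        t: set(entries[max(0, t - width):t])
        for t in range(1, len(entries) + 1)
    }


entries = ["t1", "t2", "t3", "t4", "t5"]  # one tuple arrives per time step
base = base_stream(entries)
slide = sliding_window(entries, width=2)  # width 2 is assumed from the slide

for t in sorted(base):
    print(t, sorted(base[t]), sorted(slide[t]))
```

Both results are again mappings from time to sets of tuples, which is what lets a windowed stream be treated as a stream.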
Classification of windowed queries
Periodicity (k): what is the “hop size” for the input set?
- Periodic: k = 1, k = windowSize, or k = other
- Aperiodic: on demand
- Never

Notion of time:
- System clock
- Tuple sequence number

Input: in which direction do the (older, newer) ends of the input set move?
- Snapshot: (fxd, fxd)
- Sliding: (fwd, fwd)
- Landmark: (fxd, fwd)
- Reverse landmark: (bwd, fxd)
- Reverse sliding: (bwd, bwd)
- Unnamed combinations: (fxd, bwd), (fwd, bwd), (bwd, fwd), (fwd, fxd)
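The direction-based part of this classification can be sketched in Python. This is an illustration only: the function names and the representation of a window as an (older_end, newer_end) pair of numbers are assumptions, and the four combinations the slide leaves unnamed map to None.

```python
# Names taken from the slide, keyed by (older end, newer end) direction.
WINDOW_KINDS = {
    ("fxd", "fxd"): "snapshot",
    ("fwd", "fwd"): "sliding",
    ("fxd", "fwd"): "landmark",
    ("bwd", "fxd"): "reverse landmark",
    ("bwd", "bwd"): "reverse sliding",
}


def direction(previous, current):
    """Return 'fwd', 'bwd', or 'fxd' for one end of the window."""
    if current > previous:
        return "fwd"
    if current < previous:
        return "bwd"
    return "fxd"


def classify(window_before, window_after):
    """Classify a window move; each window is an (older_end, newer_end) pair."""
    key = (
        direction(window_before[0], window_after[0]),
        direction(window_before[1], window_after[1]),
    )
    return WINDOW_KINDS.get(key)  # None for the unnamed combinations


print(classify((0, 40), (0, 41)))    # older end fixed, newer end forward
print(classify((30, 40), (31, 41)))  # both ends forward
```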
The StreaQuel Language
- An extension of SQL
- Operates exclusively on streams
- Is closed under streams
- Supports different ways to “create” streams
  - Infinite time-stamped tuple sequence
  - Traditional stable relations
- Flexible windows: sliding, landmark and more
- Supports logical and physical time
- When used with a cursor mechanism, allows clients to do their own window-based processing
- Eventually the target language for TelegraphCQ
General Form of a StreaQuel Query
SELECT projection_list
FROM from_list
WHERE selection_and_join_predicates
ORDEREDBY
TRANSFORM…TO
WINDOW…BY
- Windows can be applied to individual streams
- Window movement is expressed using a “for loop”-type construct in the “transform” clause
- We’re not completely psyched about our syntax at this point.
StreaQuel Keywords
NOW = “current time”, e.g., wall-clock time or the latest sequence number
ST = “start time” of query
On_demand = interrupt from user
BoS = 0 = beginning of stream
Example – Landmark query
[Figure: a timeline from 0 to 60 in steps of 5, marking ST and the window, drawn for NOW = t at 40, 41, ..., 45, ..., 50.]
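A landmark window keeps its older end fixed while its newer end follows NOW. The Python sketch below enumerates such windows for NOW running from 40 to 50, as on the timeline. It is an illustration only: the function name, the hop parameter, and the landmark value of 0 are assumptions, since the slide text does not give a numeric value for the fixed end.

```python
def landmark_windows(landmark, first_now, last_now, hop=1):
    """Yield (older_end, newer_end) pairs; only the newer end moves."""
    for now in range(first_now, last_now + 1, hop):
        yield (landmark, now)


# Hypothetical landmark at 0; NOW advances from 40 to 50 one step at a time.
windows = list(landmark_windows(landmark=0, first_now=40, last_now=50))
for older_end, newer_end in windows:
    print(f"NOW = {newer_end}: window covers [{older_end}, {newer_end}]")
```

Setting hop to 5 would produce only the windows at NOW = 40, 45, and 50, which corresponds to a periodic query with a larger hop size.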
Challenge Queries: TPC-Squirrel
Our Opinion…
(thanks to Sam Madden for finding this wonderful picture)
Challenge Queries – Query 1
Stream: Packets(pID, length, time)
Query: Generate the stream of packets whose length is greater than twice the average packet length over the last 1 hour.

AVGPACKETS:
Select AVG(length) as avlen
From Packets
Window Packets By (NOW - 1hr, NOW)

GREATERTHAN2AVG:
Select *
From Packets p, AVGPACKETS a
Where p.length > 2 * avlen;
Window p By (NOW, NOW)

STREAM: Open a delta-output cursor on GREATERTHAN2AVG.
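The intent of Query 1 can be approximated in Python as follows. This is a sketch of the semantics, not of how TelegraphCQ evaluates the query. It assumes that packets arrive in nondecreasing time order, that time is measured in seconds, and that the current packet is itself part of the (NOW - 1hr, NOW) window; the function name and the horizon parameter are hypothetical.

```python
from collections import deque


def packets_over_twice_avg(packets, horizon=3600):
    """Yield packets longer than twice the average length over the last hour.

    packets: iterable of (pID, length, time) tuples in nondecreasing time order.
    """
    window = deque()  # (time, length) pairs inside [now - horizon, now]
    total = 0         # running sum of the lengths held in `window`
    for pid, length, time in packets:
        window.append((time, length))
        total += length
        # Expire packets that fell out of the one-hour window.
        while window[0][0] < time - horizon:
            _, old_length = window.popleft()
            total -= old_length
        average = total / len(window)  # never empty: current packet is in it
        if length > 2 * average:
            yield (pid, length, time)


sample = [(1, 10, 0), (2, 10, 10), (3, 100, 20), (4, 10, 4000)]
print(list(packets_over_twice_avg(sample)))
```

Each yielded packet plays the role of one delta emitted by the cursor on GREATERTHAN2AVG.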
Challenge Queries – Query 2
SquirrelSensors(sID, region, time)
SquirrelType(sID, type)
Query: Create an alert when more than 20 type 'A' squirrels are in Jennifer's backyard.

Option 1: (Twenty distinct type 'A' squirrels have been to J's backyard since the beginning of time)

Select ALERT()
From SquirrelSensors ss, SquirrelType st
Where ss.region = JWBackyard AND
      ss.id = st.id AND st.type = 'A'
Having COUNT (DISTINCT(ss.id)) > 20
Challenge Queries – Query 2…contd.
Option 2: (Twenty Distinct Type 'A' squirrels are being sensed in J's backyard this instant)
Select ALERT()
From SquirrelSensors ss, SquirrelType st
Where ss.region = JWBackyard AND
      ss.id = st.id AND st.type = 'A'
Having COUNT (DISTINCT(ss.id)) > 20
Window ss by (NOW, NOW)
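The difference between Options 1 and 2 is only the window: Option 1 counts over the whole stream so far, while Option 2 counts over a single instant. The Python sketch below contrasts the two. It is an illustration under assumptions: readings are (sID, region, time) tuples in time order, `types` is a dictionary standing in for SquirrelType, an "instant" means readings that share the same time value, and the function names are hypothetical.

```python
from collections import defaultdict


def first_alert_since_start(readings, types, region, threshold=20):
    """Option 1: time at which the distinct type-A count first exceeds threshold."""
    seen = set()
    for sid, reading_region, time in readings:
        if reading_region == region and types.get(sid) == "A":
            seen.add(sid)
            if len(seen) > threshold:
                return time
    return None


def alert_instants(readings, types, region, threshold=20):
    """Option 2: instants at which more than threshold type-A squirrels are sensed."""
    by_time = defaultdict(set)
    for sid, reading_region, time in readings:
        if reading_region == region and types.get(sid) == "A":
            by_time[time].add(sid)
    return sorted(t for t, sids in by_time.items() if len(sids) > threshold)


types = {1: "A", 2: "A", 3: "A", 4: "B"}
readings = [
    (1, "JWBackyard", 1), (2, "JWBackyard", 2), (4, "JWBackyard", 2),
    (3, "JWBackyard", 3),
    (1, "JWBackyard", 5), (2, "JWBackyard", 5), (3, "JWBackyard", 5),
]
# A threshold of 2 keeps the sample small; the query on the slide uses 20.
print(first_alert_since_start(readings, types, "JWBackyard", threshold=2))
print(alert_instants(readings, types, "JWBackyard", threshold=2))
```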
Challenge Queries – Query 2…contd.
Option 3: (Individual Squirrel Tracking: Twenty Distinct Type 'A' squirrels have last been sensed in J's backyard)
LASTREADING:
Select sid, MAX(time)
From SquirrelSensors ss
GroupBy sid;

Select ALERT()
From SquirrelSensors ss, SquirrelType st, LASTREADING lr
Where ss.region = JWBackyard AND ss.id = st.id AND
      st.type = 'A' AND ss.id = lr.id AND ss.time = lr.time
Having COUNT (DISTINCT(ss.id)) > 20
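Option 3 counts a squirrel only if its most recent reading places it in the backyard, which is what the join against LASTREADING expresses. A minimal Python sketch of that idea follows. It is an illustration only: the function names are hypothetical, readings are assumed to be (sID, region, time) tuples, and `types` is a dictionary standing in for SquirrelType.

```python
def last_seen_in(readings, types, region):
    """Return the type-A squirrels whose latest reading is in `region`."""
    latest = {}  # sID -> (time, region) of the most recent reading
    for sid, reading_region, time in readings:
        if sid not in latest or time >= latest[sid][0]:
            latest[sid] = (time, reading_region)
    return {
        sid
        for sid, (_, reading_region) in latest.items()
        if reading_region == region and types.get(sid) == "A"
    }


def should_alert(readings, types, region, threshold=20):
    """True when more than `threshold` type-A squirrels were last sensed in region."""
    return len(last_seen_in(readings, types, region)) > threshold


types = {1: "A", 2: "A", 3: "A"}
readings = [
    (1, "JWBackyard", 1), (2, "JWBackyard", 2),
    (1, "Street", 3),  # squirrel 1 has left the backyard
    (3, "JWBackyard", 4),
]
print(sorted(last_seen_in(readings, types, "JWBackyard")))
```

Unlike Option 1, a squirrel that moves elsewhere stops counting toward the alert.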
Challenge Queries – Query 3
SquirrelChirps(sID, loc, time)
Query: Stream an event each time 3 different squirrels within a pairwise distance of 5 meters from each other chirp within 10 seconds of each other.

Select time
From SquirrelChirps sc1, SquirrelChirps sc2, SquirrelChirps sc3
Where sc1.id <> sc2.id AND
      sc1.id <> sc3.id AND
      sc2.id <> sc3.id AND
      distance(sc1.loc, sc2.loc) < 5 meters AND
      distance(sc1.loc, sc3.loc) < 5 meters AND
      distance(sc2.loc, sc3.loc) < 5 meters
Window sc1 By (NOW, NOW)
Window sc2 By (NOW - 10, NOW)
Window sc3 By (NOW - 10, NOW)
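The three windows in Query 3 pair the current chirp (sc1) with any two chirps from the preceding 10 seconds (sc2, sc3). A Python sketch of that evaluation follows. It is an illustration under assumptions: chirps arrive in time order as (sID, (x, y), time) tuples, locations are planar coordinates in meters, time is in seconds, at most one event is emitted per chirp, and the function name is hypothetical.

```python
from itertools import combinations
from math import dist


def chirp_events(chirps, radius=5.0, span=10.0):
    """Yield the time of each chirp that completes a qualifying triple."""
    recent = []  # earlier chirps within `span` seconds of the current one
    for sid, loc, time in chirps:
        recent = [c for c in recent if time - c[2] <= span]
        for a, b in combinations(recent, 2):
            three_squirrels = len({sid, a[0], b[0]}) == 3
            close = (
                dist(loc, a[1]) < radius
                and dist(loc, b[1]) < radius
                and dist(a[1], b[1]) < radius
            )
            if three_squirrels and close:
                yield time
                break  # one event per chirp is an assumption of this sketch
        recent.append((sid, loc, time))


chirps = [
    (1, (0.0, 0.0), 0), (2, (1.0, 0.0), 3), (3, (0.0, 1.0), 8),
    (4, (100.0, 100.0), 9),  # too far from the others
    (1, (0.0, 0.0), 30),     # too late to join the earlier chirps
]
print(list(chirp_events(chirps)))
```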
StreaQuel Summary
- Streams are our primitives
  - Capture both tuple sequences and relations
  - All operations are closed under streams
- Flexible window support
- Other aspects:
  - support for multiple notions of time
  - support for events (tuples) that are not totally ordered (distributed systems)
  - great name
- A work in progress
Not just windows… Other Issues
- TinyDB: “Run this CQ (with min acceptable sample rates) and make sure I don’t have to change any batteries for 3 months”
- Dealing with dirty/missing/late data
- Correlating across different time domains
- Adjusting sample rates to topology of network