Windows in Niagara

Jin (Jenny) Li, David Maier, Vassilis Papadimos, Peter Tucker, Kristin Tufte



Page 1: Windows in Niagara

Windows in Niagara

Jin (Jenny) Li, David Maier, Vassilis Papadimos, Peter Tucker, Kristin Tufte

Page 2: Windows in Niagara

Overview

- Make windows explicit
  - Tag tuples with a window ID
  - Standard operators don't know about different kinds of windows; they work with the window ID attribute
- Use punctuation infrastructure
  - Punctuation signals end of window
  - No need for specialized window operators; just use punctuate-aware operators
- Flexible
  - Window on system time, external time, or tuple-based
  - Data can arrive and be processed out of order
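As a rough illustration of tagging tuples with a window ID and signalling the end of a window with a punctuation, here is a minimal Python sketch for the tuple-based case. The record layout, the `PUNCT` marker, and the function name are assumptions made for this example, not Niagara's actual data structures.

```python
PUNCT = "punct"  # hypothetical marker for a punctuation record


def tag_tuple_windows(stream, size):
    """Tag each item with a tumbling, tuple-based window id.

    After every `size` items, emit a punctuation saying that no more
    tuples will arrive for that window id.
    """
    for index, item in enumerate(stream):
        wid = index // size
        yield ("tuple", wid, item)
        if (index + 1) % size == 0:
            yield (PUNCT, wid, None)


if __name__ == "__main__":
    for record in tag_tuple_windows("abcde", 2):
        print(record)
```

Downstream operators in this sketch only need to look at the window id and the punctuation records; they do not need to know that the windows were defined by tuple count.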

Page 3: Windows in Niagara

Niagara Control Structure

- Push-based (pipelined) system.
- Each operator is a thread.
- Operators are connected by queues of tuples.
- Operators wait on their input queue; when a tuple is ready, it is processed and the result is inserted in the output queue.

Figure: operator pipeline with streamscan, unnest (path expr), and select.
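The thread-per-operator, queue-connected structure can be sketched in Python as follows. This is only an illustration of the control structure described on the slide; the `DONE` sentinel, the function names, and the dictionary tuples are assumptions for the example and say nothing about Niagara's real implementation.

```python
import queue
import threading

DONE = object()  # hypothetical end-of-stream sentinel


def start_operator(fn, inq, outq):
    """Run one operator in its own thread.

    The thread waits on its input queue, applies fn to each tuple, and
    inserts every result into its output queue.
    """
    def run():
        while True:
            item = inq.get()  # blocks until a tuple is ready
            if item is DONE:
                outq.put(DONE)
                return
            for result in fn(item):
                outq.put(result)

    thread = threading.Thread(target=run, daemon=True)
    thread.start()
    return thread


def run_pipeline(source, stages):
    """Connect the stages with queues, push the source through, collect output."""
    first = queue.Queue()
    q = first
    for fn in stages:
        nxt = queue.Queue()
        start_operator(fn, q, nxt)
        q = nxt
    for item in source:
        first.put(item)
    first.put(DONE)
    out = []
    while True:
        item = q.get()
        if item is DONE:
            return out
        out.append(item)


if __name__ == "__main__":
    unnest = lambda bid: [bid["bidderid"]]
    select = lambda bidder: [bidder] if bidder == 501 else []
    bids = [{"bidderid": 501}, {"bidderid": 7}, {"bidderid": 501}]
    print(run_pipeline(bids, [unnest, select]))
```

Each stage returns a list so that an operator may produce zero, one, or several output tuples per input tuple.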

Page 4: Windows in Niagara

Niagara Query Execution

Query: Find all bids that bidder with id = 501 has made.

Operators (figure):

- streamscan: Reads and parses data from a stream.
- unnest (bid.bidderid): Uses a path expression to extract matching elements from input tuples.
- select (bidderid = 501)

Input:

<bid> <bidderid> 501 </bidderid> <itemid> 42 </itemid> <price> $10.00 </price> </bid>

Tuples (figure, shown twice): bid; bidderid: 501; price: $5.00
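To make the streamscan, unnest, and select steps concrete, the following Python sketch parses bid documents, extracts `bidderid` with a path expression, and keeps the bids from bidder 501. It uses the standard `xml.etree.ElementTree` module; the function names and the second sample bid are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET


def unnest(elem, path):
    """Yield the stripped text of every element matching the path expression."""
    for match in elem.findall(path):
        yield (match.text or "").strip()


def select_bids(xml_docs, bidder_id):
    """Parse each document (streamscan), unnest bidderid, keep matching bids."""
    result = []
    for doc in xml_docs:
        bid = ET.fromstring(doc)
        if any(value == bidder_id for value in unnest(bid, "bidderid")):
            result.append(bid)
    return result


if __name__ == "__main__":
    docs = [
        "<bid> <bidderid> 501 </bidderid> <itemid> 42 </itemid> <price> $10.00 </price> </bid>",
        "<bid> <bidderid> 7 </bidderid> <itemid> 42 </itemid> <price> $11.00 </price> </bid>",
    ]
    for bid in select_bids(docs, "501"):
        print(bid.findtext("itemid").strip(), bid.findtext("price").strip())
```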

Page 5: Windows in Niagara

slide_number speaker

* Kristin

Page 6: Windows in Niagara

NEXMark Schema

Streams:

bid: auctionid, bidderid, price, datetime, auctionsite

auction: id, description, reserve, expires, auctionsite, itemname, seller, category

Note: bid.datetime and auction.expires are times generated at the source sites.
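For readers who prefer to see the two streams as record types, here is a minimal Python sketch. Only the field names come from the slide; every field type below is an assumption chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bid:
    auctionid: int      # assumed integer id
    bidderid: int       # assumed integer id
    price: float        # assumed numeric price
    datetime: int       # assumed: seconds, generated at the source site
    auctionsite: str    # assumed site name


@dataclass(frozen=True)
class Auction:
    id: int
    description: str
    reserve: float
    expires: int        # assumed: seconds, generated at the source site
    auctionsite: str
    itemname: str
    seller: int
    category: str


if __name__ == "__main__":
    print(Bid(auctionid=10, bidderid=501, price=5.0, datetime=300, auctionsite="A"))
```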

Page 7: Windows in Niagara

Three Example Queries

All three queries are window aggregates, specifically, time-based window counts.

- Query 1: use internal system time and internal punctuations
- Query 2: use external timestamp and internal punctuations
- Query 3: use external timestamp and external punctuations

Page 8: Windows in Niagara

Query 1: Select the number of bids on each item in the past five minutes. Update the results every minute.

SELECT B1.auctionid, count(*) FROM Bid [RANGE 5 MINUTES SLIDE 1 MINUTE] B1 GROUP BY B1.auctionid

Plan with a specialized window operator (figure): streamscan (Bid) B1; unnest (auctionid); WindowGroupby(B1.auctionid, count(*))

Plan with punctuate-aware operators (figure): streamscan (Bid) B1; unnest (auctionid); Punctuator/Timestamper; Bucketizer; Groupby(B1.auctionid, B1.wid, count(*))

- Timer: Timestamp = CURRENT_TIME
- Punctuator/Timestamper: add timestamp field to tuple; punctuate at end of minute
- Bucketizer: add window ranges to tuples; punctuate end of window
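The bucketizer's job of adding a window range to each tuple can be sketched in Python for a RANGE/SLIDE window. The numbering convention here is an assumption, not something the slides define: timestamps are non-negative integers (for example seconds), and window `w` covers the half-open interval `[w * slide, w * slide + range_)`.

```python
def window_ids(ts, range_, slide):
    """Return the ids of all windows that contain timestamp ts.

    Assumes window w covers [w * slide, w * slide + range_) and that ts,
    range_, and slide are non-negative integers in the same unit.
    """
    first = max(0, (ts - range_) // slide + 1)  # first window still open at ts
    last = ts // slide                          # last window already started at ts
    return list(range(first, last + 1))


if __name__ == "__main__":
    # RANGE 5 MINUTES SLIDE 1 MINUTE, expressed in seconds.
    print(window_ids(450, 300, 60))
```

With a five-minute range and a one-minute slide, every tuple after the first few minutes falls into five consecutive windows, which is why a tuple carries a range of window ids and not a single id.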

Page 9: Windows in Niagara

Query 1 Details

SELECT B1.auctionid, count(*) FROM Bid [RANGE 5 MINUTES SLIDE 1 MINUTE] B1 GROUP BY B1.auctionid

Operators (figure): streamscan (Bid) B1; unnest (auctionid); punctuator/timestamper; timer (Timestamp = CURRENT_TIME); bucketizer; groupby(B1.auctionid, B1.winId, count(*))

Figure: tuples T1, T2, T3 flow through the plan.

Tuple  Timestamp  auctionid  Window range
T1     5:00       10         1-5
T2     5:00       15         1-5
T3     5:00       15         1-5

Timer labels (figure): 5:01; TS = 5:00

Punctuations (figure): * (,5:00] * and 1-1 * * *

Result:

auctionid  count
10         1
15         2
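The result shown for this example (auction 10 with count 1, auction 15 with count 2) can be reproduced with an ordinary group-by that treats the window id as one more grouping attribute and reacts to punctuations. The Python sketch below is an illustration under assumed names (`PunctuatedCount`, `on_tuple`, `on_punctuation`); it is not Niagara's groupby operator.

```python
from collections import defaultdict


class PunctuatedCount:
    """Group-by count keyed on (wid, key).

    A punctuation on a window id means no more tuples will arrive for
    that window, so its results are output and its state is purged.
    """

    def __init__(self):
        self.state = defaultdict(int)

    def on_tuple(self, wid, key):
        self.state[(wid, key)] += 1

    def on_punctuation(self, wid):
        closed = sorted(k for k in self.state if k[0] == wid)
        return [(key, self.state.pop((w, key))) for (w, key) in closed]


if __name__ == "__main__":
    groupby = PunctuatedCount()
    # T1, T2, T3 carry auctionid 10, 15, 15 and window range 1-5.
    for auctionid in (10, 15, 15):
        for wid in range(1, 6):
            groupby.on_tuple(wid, auctionid)
    print(groupby.on_punctuation(1))
```

After the punctuation for window 1, the state for windows 2 through 5 is still held, since those windows may receive more tuples.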

Page 10: Windows in Niagara

Query 1 vs. Query 2

Query 1: Select the number of bids on each item in the past five minutes. Update the results every minute.

SELECT B1.auctionid, count(*) FROM Bid [RANGE 5 MINUTES SLIDE 1 MINUTE] B1 GROUP BY B1.auctionid

Query 2: Select the number of bids made at each auction site in the past five minutes. Update the results every minute.

SELECT B1.auctionsite, count(*) FROM Bid [RANGE 5 MINUTES SLIDE 1 MINUTE ATTR datetime SLACK 5 MINUTES] B1 GROUP BY B1.auctionsite

“CQL2004”

Page 11: Windows in Niagara

Query 2: Select the number of bids made at each auction site in the past five minutes. Update the results every minute.

SELECT B1.auctionsite, count(*) FROM Bid [RANGE 5 MINUTES SLIDE 1 MINUTE ATTR datetime SLACK 5 MINUTES] B1 GROUP BY B1.auctionsite

Operators (figure): streamscan (Bid) B1; unnest (auctionsite, datetime); punctuator/enforcer; bucketizer; groupby(B1.auctionsite, B1.winId, count(*))

- Timer: Timestamp = CURRENT_TIME - 5 MINUTES
- Punctuator/enforcer: enforce datetime > current timestamp; punctuate at end of minute
- Bucketizer: add window ranges to tuples; punctuate end of window
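One way to read the punctuator/enforcer with SLACK is sketched below in Python: at each timer tick it punctuates at the current time minus the slack, and afterwards only tuples whose `datetime` is greater than that timestamp are accepted. The class and method names are hypothetical, and what happens to rejected tuples is not stated on the slide, so the sketch only reports whether a tuple passes.

```python
class Enforcer:
    """Punctuate on external time with a slack, then enforce the punctuation."""

    def __init__(self, slack):
        self.slack = slack
        self.bound = None  # last punctuated timestamp; None before the first tick

    def on_timer(self, current_time):
        """Called at the end of each minute: punctuate at current_time - slack."""
        self.bound = current_time - self.slack
        return ("punct", self.bound)

    def accepts(self, datetime):
        """A tuple passes only if its datetime is greater than the last bound."""
        return self.bound is None or datetime > self.bound


if __name__ == "__main__":
    enforcer = Enforcer(slack=300)      # SLACK 5 MINUTES, in seconds
    print(enforcer.on_timer(600))       # punctuation at timestamp 300
    print(enforcer.accepts(300), enforcer.accepts(301))
```

The slack gives out-of-order tuples from the source sites up to five minutes to arrive before the corresponding windows are closed.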


Page 13: Windows in Niagara

Query 2 vs. Query 3

Query 2: Select the number of bids made at each auction site in the past five minutes. Update the results every minute.

SELECT B1.auctionsite, count(*) FROM Bid [RANGE 5 MINUTES SLIDE 1 MINUTE ATTR datetime SLACK 5 MINUTES] B1 GROUP BY B1.auctionsite

Query 3: Select the number of bids made at each auction site in the past five minutes. Update the results every five minutes.

SELECT B1.auctionsite, count(*), B1.wid FROM Bid [RANGE 5 MINUTES SLIDE 5 MINUTES ATTR datetime] B1 GROUP BY B1.auctionsite

Page 14: Windows in Niagara

Query 3: Select the number of bids made at each auction site in the past five minutes. Update the results every five minutes.

SELECT B1.auctionsite, count(*) FROM Bid [RANGE 5 MINUTES SLIDE 5 MINUTES ATTR datetime] B1 GROUP BY B1.auctionsite

Operators (figure): streamscan (Bid) B1; unnest (auctionsite, datetime); bucketizer; groupby(B1.auctionsite, B1.winId, count(*))

- Bucketizer: add window ranges to tuples; punctuate end of window

Figure labels: Site A, Site B, Site C; T1, T2

Page 15: Windows in Niagara

Query 3 Details

SELECT B1.auctionsite, count(*), B1.wid FROM Bid [RANGE 5 MINUTES SLIDE 5 MINUTES ATTR datetime] B1 GROUP BY B1.auctionsite

Operators (figure): streamscan (Bid) B1; unnest (auctionsite, datetime); bucketizer (add window ranges to tuples, punctuate end of window); groupby(B1.auctionsite, B1.winId, count(*))

Window 1: 5:00 – 5:05
Window 2: 5:05 – 5:10

Figure: tuples from Site A, Site B, and Site C flow through the plan.

Tuple  Site  datetime  Window range
T1A    A     5:01      1-1
T2A    A     5:07      2-2
T1B    B     5:02      1-1
T2B    B     5:04      1-1

Punctuations (figure): A (,5:05] -> 1-1 A *; A (,5:10] -> 2-2 A *

Groupby state (figure):

window 1, site A: T1A
window 1, site B: T1B, T2B
window 2, site A: T2A

Output (auctionsite, count, wid):

A, 1, 1
A, 1, 2
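The Query 3 example, with external punctuations arriving per site, can be replayed with a short Python sketch. Times are given as whole minutes past 5:00, window 1 is taken to be 5:00 to 5:05 and window 2 to be 5:05 to 5:10, and a punctuation `(site, m)` is read as "this site will send no more bids with datetime up to minute m". The event encoding and function names are assumptions for illustration.

```python
from collections import defaultdict

WINDOW = 5  # minutes; RANGE 5 MINUTES SLIDE 5 MINUTES gives tumbling windows


def wid_of(minute):
    """Map minutes past 5:00 to a window id (1 for 5:00-5:05, 2 for 5:05-5:10)."""
    return minute // WINDOW + 1


def count_per_site(events):
    """Count bids per (site, wid); output and purge a group when punctuated."""
    counts = defaultdict(int)
    out = []
    for kind, site, minute in events:
        if kind == "bid":
            counts[(site, wid_of(minute))] += 1
        else:
            # Punctuation on time becomes a punctuation on wid for this site.
            # Conservative choice: only windows ending at or before `minute` close.
            last_closed = minute // WINDOW
            done = sorted(k for k in counts if k[0] == site and k[1] <= last_closed)
            for key in done:
                out.append((site, counts.pop(key), key[1]))
    return out


if __name__ == "__main__":
    events = [
        ("bid", "A", 1), ("bid", "B", 2), ("bid", "B", 4), ("bid", "A", 7),
        ("punct", "A", 5), ("punct", "A", 10),
    ]
    print(count_per_site(events))  # rows are (auctionsite, count, wid)
```

Site B's group for window 1 stays in state in this run because no punctuation from site B has arrived, which is why only site A rows appear in the output.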

Page 16: Windows in Niagara

Discussion

- Bucketizer
  - Apply a function to the stream
  - Encapsulate window semantics
  - Punctuate-aware, e.g. punctuation on time -> punctuation on wid
  - Wid is used as a grouping/join attribute
- Punctuator
  - Adds timestamp as an attribute (optional)
  - Enforces punctuations (optional)
  - Converts stream semantics to punctuations
  - Outputs punctuations
  - Punctuations signal the end of windows; results are output and state is purged
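The step "punctuation on time -> punctuation on wid" can be sketched for sliding windows as follows. The conventions are assumptions made for this Python example: timestamps are non-negative integers, window `w` covers `[w * slide, w * slide + range_)`, and a time punctuation at `punct_ts` promises that no more tuples with a timestamp up to and including `punct_ts` will arrive.

```python
def last_closed_window(punct_ts, range_, slide):
    """Translate a punctuation on time into a punctuation on window id.

    Window w is complete once its last timestamp, w * slide + range_ - 1,
    is covered by the punctuation. Returns the largest such w, or None
    if no window is complete yet.
    """
    w = (punct_ts - range_ + 1) // slide
    return w if w >= 0 else None


if __name__ == "__main__":
    # RANGE 5 MINUTES SLIDE 1 MINUTE, expressed in seconds.
    for ts in (100, 299, 300, 359):
        print(ts, last_closed_window(ts, 300, 60))
```

A grouping operator that receives the translated punctuation can output and purge every group whose wid is at most the returned value, without knowing how the windows were defined.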

Page 17: Windows in Niagara

Conclusions

- Process window queries without specialized window operators
- Flexible window semantics
- Use punctuate-aware operators; introduce a minimum number of new operators

Page 18: Windows in Niagara

Future Work

- Semantics of window operators
- Performance of different implementations
- Study effect of disorder
- Groupby ? Window

Page 19: Windows in Niagara

Questions?