windows in niagara
DESCRIPTION
Windows in Niagara. Jin (Jenny) Li, David Maier, Vassilis Papadimos, Peter Tucker, Kristin Tufte. Overview. Make Windows Explicit Tag tuples with a window id Standard operators don’t know about different kinds of windows - work with window ID attribute Use Punctuation Infrastructure - PowerPoint PPT PresentationTRANSCRIPT
Windows in NiagaraWindows in Niagara
Jin (Jenny) Li, David Maier, Jin (Jenny) Li, David Maier,
Vassilis Papadimos, Vassilis Papadimos,
Peter Tucker, Kristin TuftePeter Tucker, Kristin Tufte
OverviewOverview Make Windows ExplicitMake Windows Explicit
Tag tuples with a window idTag tuples with a window id Standard operators don’t know about different kinds of Standard operators don’t know about different kinds of
windows - work with window ID attributewindows - work with window ID attribute Use Punctuation InfrastructureUse Punctuation Infrastructure
Punctuation signals end of window Punctuation signals end of window No need for specialized window operators – just use No need for specialized window operators – just use
punctuate-aware operatorspunctuate-aware operators Flexible Flexible
Window on system time, external time or tuple-basedWindow on system time, external time or tuple-based Data can arrive and be processed out of orderData can arrive and be processed out of order
Niagara Control StructureNiagara Control Structure
Push-based (pipelined) Push-based (pipelined) system.system.
Each operator is a thread. Each operator is a thread. Operators are connected by Operators are connected by
queues of tuples. queues of tuples. Operators wait on input Operators wait on input
queue, when tuple is ready, queue, when tuple is ready, it is processed and result is it is processed and result is inserted in output queueinserted in output queue
streamscan
unnest (path expr)
select
streamscan
Niagara Query ExecutionNiagara Query Execution
unnest (bid.bidderid)
select (bidderid = 501)
Reads and parses data from a stream.
Uses a path expression to extract matching elements from input tuples.
<bid> <bidderid> 501 </bidderid> <itemid> 42 </itemid> <price> $10.00 </price> </bid>
Query:
Find all bids that bidder with id = 501 has made.
bid
bidderid:501 price: $5.00
bid
bidderid:501 price: $5.00
slide_number speaker
* Kristin
NEXMark Schema
Streams:bid
auctionid bidderid price datetime auctionsite
auction
id description reserve expires auctionsiteitemname seller category
Note: bid.datetime and auction.expires are time generated at the source sites.
Three Example QueriesThree Example Queries
All three queries are window aggregates, All three queries are window aggregates, specifically, time-based window countspecifically, time-based window count
Query 1: use internal system time and Query 1: use internal system time and internal punctuationsinternal punctuations
Query 2: use external timestamp and Query 2: use external timestamp and internal punctuationsinternal punctuations
Query 3: use external timestamp and Query 3: use external timestamp and external punctuationsexternal punctuations
Query 1:Select the number of bids on each item in the past five minutes. Update the results every minute.
SELECT B1.auctionid, count(*) FROM Bid [RANGE 5 MINUTES SLIDE 1 MINUTE] B1 GROUP BY B1.auctionid
streamscan (Bid) B1
unnest (auctionid)
WindowGroupby(B1.auctionid, count(*))
Punctuator/TimestamperAdd timestamp field to tuplePunctuate at end of minute
Groupby(B1.auctionid, B.wid, count(*))
TimerTimestamp = CURRENT_TIME
BucketizerAdd window ranges to tuples
Punctuate end of window
streamscan (Bid) B1
unnest (auctionid)
5:01TS = 5:00
Query 1 Details
streamscan (Bid) B1
unnest (auctionid)
groupby(B1.auctionid, B1.winId, count(*))
punctuator/timestamper
timerTimestamp = CURRENT_TIME
bucketizer
T1 T2 T3
T1
T15:00
T15:00 10
T11-55:00 101-1* * *
* (,5:00] *
5:00
T2
T25:00
T25:00 15
T21-55:00 15
T3
T35:00
T35:00 15
T31-55:00 15
5:01
SELECT B1.auctionid, count(*) FROM Bid [RANGE 5 MINUTES SLIDE 1 MINUTE] B1 GROUP BY B1.auctionid
10 1
15 2
auctionid count
* ( ,5:00]
Query 1 vs. Query 2Query 1 vs. Query 2
Query 1:
SELECT B1.auctionid, count(*) FROM Bid [RANGE 5 MINUTES SLIDE 1 MINUTE] B1 GROUP BY B1.auctionid
Query 2:
SELECT B1.auctionsite, count(*) FROM Bid [RANGE 5 MINUTES SLIDE 1 MINUTE ATTR datetime SLACK 5 MINUTES] B1 GROUP BY B1.auctionsite
Select the number of bids on each item in the past five minutes. Update the results every minute.
Select the number of bids made at each auction site in the past five minutes. Update the results every minute.
“CQL2004”
Query 2:
Select the number of bids made at each auction site in the past five minutes. Update the results every minute.
SELECT B1.auctionsite, count(*) FROM Bid [RANGE 5 MINUTES SLIDE 1 MINUTE ATTR datetime SLACK 5 MINUTES] B1 GROUP BY B1.auctionsite
streamscan (Bid) B1
unnest (auctionsite, datetime)
groupby(B1.auctionsite, B.winId, count(*))
punctuator/enforcer Enforce datetime > current timestamp
Punctuate at end of minute
timerTimestamp = CURRENT_TIME – 5 MINUTES
bucketizer Add window ranges to tuples
Punctuate end of window
Query 2:
Select the number of bids made at each auction site in the past five minutes. Update the results every minute.
SELECT B1.auctionsite, count(*) FROM Bid [RANGE 5 MINUTES SLIDE 1 MINUTE ATTR datetime SLACK 5 MINUTES] B1 GROUP BY B1.auctionsite
streamscan (Bid) B1
unnest (auctionsite, datetime)
groupby(B1.auctionsite, B.winId, count(*))
punctuator/enforcer Enforce datetime > current timestamp
Punctuate at end of minute
timerTimestamp = CURRENT_TIME – 5 MINUTES
bucketizer Add window ranges to tuples
Punctuate end of window
Query 2 vs. Query 3Query 2 vs. Query 3Query 2:
SELECT B1.auctionsite, count(*) FROM Bid [RANGE 5 MINUTES SLIDE 1 MINUTE ATTR datetime SLACK 5 MINUTES] B1 GROUP BY B1.auctionsite
Query 3:
SELECT B1.auctionsite, count(*), B1.wid FROM Bid [RANGE 5 MINUTES SLIDE 5 MINUTES ATTR datetime] B1 GROUP BY B1.auctionsite
Select the number of bids made at each auction site in the past five minutes. Update the results every minute.
Select the number of bids made at each auction site in the past five minutes. Update the results every five minutes.
Query 3:
Select the number of bids made at each auction site in the past five minutes. Update the results every five minutes.
SELECT B1.auctionsite, count(*) FROM Bid [RANGE 5 MINUTES SLIDE 5 MINUTES ATTR datetime] B1 GROUP BY B1.auctionsite
streamscan (Bid) B1
unnest (auctionsite, datetime)
groupby(B1.auctionsite, B.winId, count(*))
bucketizer Add window ranges to tuples
Punctuate end of window
… … … …T1
…T2
Site A
Site B Site C
T1A1-1A 5:01
Query 3 Details
streamscan (Bid) B1
unnest (auctionsite, datetime)
groupby(B1.auctionsite, B.winId, count(*))
bucketizer Add window ranges to tuples
Punctuate end of window
… … … …T1
…
T2
T1AA 5:01 T2AA 5:07A (,5:05]
T2A2-2A 5:071-1A *
window 1, site A: T1A
T1BB 5:02A (,5:10]
2-2A * T1B1-1B 5:02
window 1, site B: T1B
window 2, site A: T2A
window 1, site B: T2B
A, 1, 1 A, 1, 2
T2B1-1B 5:04
Site A
Site B Site C
SELECT B1.auctionsite, count(*), B1.wid FROM Bid [RANGE 5 MINUTES SLIDE 5 MINUTES ATTR datetime] B1 GROUP BY B1.auctionsite
Auctionsite, count, wid
T2BB 5:04
Window 1: 5:00 – 5:05Window 2: 5:05 – 5:10
Legend:
DiscussionDiscussion BucketizerBucketizer
Apply a function to the streamApply a function to the stream Encapsulate window semanticsEncapsulate window semantics Punctuate-Aware Punctuate-Aware
e.g. punctuation on time -> punctuation on wide.g. punctuation on time -> punctuation on wid Wid is used as a grouping/join attributeWid is used as a grouping/join attribute
PunctuatorPunctuator Adds timestamp as an attributeAdds timestamp as an attribute - optional- optional Enforce punctuationsEnforce punctuations - optional- optional Converts stream semantics to punctuationsConverts stream semantics to punctuations Outputs punctuationsOutputs punctuations Punctuations signal the end of windows, results are Punctuations signal the end of windows, results are
output and state is purgedoutput and state is purged
ConclusionsConclusions
Process window queries without Process window queries without specialized window operatorsspecialized window operators
Flexible window semanticsFlexible window semantics Use punctuate-aware operators, introduce Use punctuate-aware operators, introduce
minimum number of new operatorsminimum number of new operators
Future WorkFuture Work
Semantics of window operatorsSemantics of window operators Performance of different implementationsPerformance of different implementations Study affect of disorderStudy affect of disorder Groupby ? WindowGroupby ? Window
Questions?Questions?
… …… …