semantics and evaluation techniques for window aggregates in data streams jin li, david maier,...

Post on 28-Mar-2015


Semantics and Evaluation Techniques for Window Aggregates in Data Streams

Jin Li, David Maier, Kristin Tufte, Vassilis Papadimos, Peter A. Tucker

SIGMOD 2005

Introduction

Window aggregation is an important query capability.

Evaluating window aggregate queries over streams is non-trivial: overlapping windows, confusion of the window definition with the physical stream, out-of-order data arrival, …

Techniques

Window-ID (WID): addresses overlapping windows and confusion of the window definition with the physical stream.

Punctuation: addresses out-of-order data arrival.

Example 1

    Q1: SELECT seg-id, max(speed), min(speed)
        FROM Traffic [RANGE 300 seconds
                      SLIDE 60 seconds
                      WATTR ts]
        GROUP BY seg-id

Example 1

[Figure: tuple]

Window Semantics

Window semantics has often been described operationally.

Example: some window query operators process window extents sequentially, but data does not necessarily arrive in window-extent order.

Window Specification

Window specification: a window type and a set of parameters that define a window to be used by a query. Ex: RANGE, SLIDE and WATTR in Q1.

Different window aggregate queries have different window specifications. Ex: sliding window aggregate query.

Stream query: data-driven; domain-driven.

Window Specification

Similar to CQL (Continuous Query Language).

Different: user-specified WATTR and SLIDE parameters.

Sliding Window Aggregate

Time-based: Q1

Row-based:

RANGE and SLIDE are specified on different attributes:

Sliding Window Aggregate

Partitioned Window Aggregate:

Using function: a variation of Q3

Window Semantic Framework

Three functions for mapping between window-ids and tuples in both directions: windows, extent and wids.

T: a set of tuples.

S: window specification.

windows(T, S): the set of window-ids that identify window extents to which tuples in T may belong.

extent(w, T, S): the set of tuples in T belonging to the window extent identified by w, where $w \in windows(T, S)$.
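As a minimal sketch of windows and extent for a time-based specification such as Q1, the Python below assumes that window-id w covers the half-open interval [(w+1)·SLIDE − RANGE, (w+1)·SLIDE) on WATTR and that WATTR values are non-negative integers. The interval convention, the dict-based tuple layout, and the parameter names are assumptions made for illustration; they are not taken from the slides.

```python
def extent(w, T, range_, slide, wattr="ts"):
    """Tuples of T that fall in the extent of window-id w.

    Assumed convention: window w covers [(w+1)*slide - range_, (w+1)*slide).
    """
    hi = (w + 1) * slide
    lo = hi - range_
    return [t for t in T if lo <= t[wattr] < hi]


def windows(T, range_, slide, wattr="ts"):
    """Window-ids whose extents contain at least one tuple of T."""
    ids = set()
    for t in T:
        v = t[wattr]
        # Under the assumed convention, v lies in windows v//slide .. (v+range_)//slide - 1.
        ids.update(range(v // slide, (v + range_) // slide))
    return ids


# Hypothetical Traffic tuples for Q1 (RANGE 300 seconds, SLIDE 60 seconds, WATTR ts).
T = [
    {"seg_id": 1, "speed": 50, "ts": 130},
    {"seg_id": 1, "speed": 60, "ts": 400},
]
```

With these two tuples, `windows(T, 300, 60)` yields window-ids 2 through 10, and `extent(6, T, 300, 60)` contains both tuples because window 6 covers timestamps 120 up to (but not including) 420.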

windows, extent

queries in which RANGE and SLIDE are specified on the WATTR attribute:

slide-by-tuple:

slide-by-n_tuples:

slide-by-n_tuples over logical order:

partitioned tuple-based:

Mapping Tuples to Window-ids

wids: function for identifying the window extents to which a tuple t belongs.

$$wids(t, T, S) = \{\, w \in W \mid t \in extent(w, T, S) \,\}$$

queries in which RANGE and SLIDE are specified on the WATTR attribute:

slide-by-tuple (and variations):

$$wids(t, T, S[\mathrm{RANGE}, \mathrm{RATTR}, 1, \text{row-num}]) = \{\, w \in W \mid (\ldots) \,\}$$

where the condition (…) relates t.WATTR, w, t.RATTR and RANGE.
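For a time-based window in which RANGE and SLIDE are both specified on WATTR, the window-ids of a tuple can be computed from the tuple alone. The sketch below assumes a convention in which window-id w covers the half-open interval [(w+1)·SLIDE − RANGE, (w+1)·SLIDE) and WATTR is a non-negative integer; that convention and the function signature are assumptions for illustration, not the formula from the slides.

```python
def wids(t_wattr, range_, slide):
    """Window-ids whose extent contains a tuple with the given WATTR value.

    Assumed convention: window w covers [(w+1)*slide - range_, (w+1)*slide).
    Solving (w+1)*slide - range_ <= t_wattr < (w+1)*slide for integer w gives
    t_wattr // slide <= w <= (t_wattr + range_) // slide - 1.
    """
    first = t_wattr // slide
    last = (t_wattr + range_) // slide - 1
    return list(range(first, last + 1))
```

Under this convention, with RANGE 300 and SLIDE 60 as in Q1, every tuple maps to RANGE/SLIDE = 5 window-ids, and no other tuple has to be inspected to find them.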

Partitioned tuple-based:

r = rank(t, row-num, PATTR, T)
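The slides give only the signature of rank. One plausible reading, assumed here, is that rank returns the 1-based position of t among the tuples of T that share t's PATTR value, ordered by row-num. The sketch below implements that reading with tuples as dicts; the attribute names in the sample data are hypothetical.

```python
def rank(t, order_attr, pattr, T):
    """1-based position of t within its partition of T.

    The partition is the set of tuples sharing t's value of `pattr`;
    positions follow `order_attr`, which is assumed to be unique per tuple.
    """
    return sum(
        1
        for u in T
        if u[pattr] == t[pattr] and u[order_attr] <= t[order_attr]
    )


# Hypothetical stream partitioned by seg_id.
T = [
    {"row_num": 1, "seg_id": "a"},
    {"row_num": 2, "seg_id": "b"},
    {"row_num": 3, "seg_id": "a"},
    {"row_num": 4, "seg_id": "a"},
]
```

Under this reading, the rank of a tuple depends only on tuples with a smaller or equal row-num, so `rank(T[2], "row_num", "seg_id", T)` and `rank(T[2], "row_num", "seg_id", T[:3])` agree.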

Towards Window Query Evaluation

Backward-context: given a tuple t, its backward-context is information about tuples that have arrived before t. Ex: partitioned tuple-based window.

Forward-context: given a tuple t, its forward-context is information about tuples that arrive after t. Ex: slide-by-tuple.

FCF (forward-context free)

FCA (forward-context aware)
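To illustrate why a slide-by-tuple window needs forward-context, the sketch below assumes a hypothetical semantics: window w ends at the tuple whose row number is w and contains every tuple at or before that row whose RATTR value is less than RANGE behind the ending tuple's. The semantics, function name, and attribute names are assumptions for illustration only.

```python
def wids_slide_by_tuple(t, T, range_, rattr="ts", wattr="row_num"):
    """Window-ids containing t under the assumed slide-by-tuple semantics.

    Each tuple u at or after t closes one window (id = u's row number);
    t belongs to it if u's RATTR is less than range_ ahead of t's RATTR.
    """
    return [
        u[wattr]
        for u in T
        if u[wattr] >= t[wattr] and u[rattr] - t[rattr] < range_
    ]


# Hypothetical stream: row numbers with timestamps.
T = [
    {"row_num": 1, "ts": 0},
    {"row_num": 2, "ts": 10},
    {"row_num": 3, "ts": 25},
    {"row_num": 4, "ts": 40},
]
```

With RANGE 30, the first tuple maps to window-ids 1 and 2 when only `T[:2]` has been seen, and to 1, 2 and 3 once the full `T` is available. The answer for t changes as later tuples arrive, which is the dependence on forward-context that the slide describes.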

Disorder

Causes: merging unsynchronized streams, network delays. Ex: network flows sometimes use start time as timestamp.

Methods: slack, BSort, heartbeats.
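Buffer-based methods of this kind reorder a stream by holding back a bounded number of tuples. The sketch below is a generic bounded-buffer sort written for illustration; it is not the actual slack or BSort operator of any system, and the function name and parameters are assumptions.

```python
import heapq


def bounded_sort(stream, n, key):
    """Reorder a stream using a buffer of at most n tuples.

    Whenever the buffer holds more than n tuples, the one with the smallest
    key is released. A tuple that arrives more than n positions late is
    still emitted out of order.
    """
    heap = []
    for i, t in enumerate(stream):
        # The arrival index i breaks ties so tuples themselves are never compared.
        heapq.heappush(heap, (key(t), i, t))
        if len(heap) > n:
            yield heapq.heappop(heap)[2]
    while heap:
        yield heapq.heappop(heap)[2]
```

For example, `list(bounded_sort([3, 1, 2, 5, 4], 2, key=lambda x: x))` restores full order, while `list(bounded_sort([2, 3, 1], 1, key=lambda x: x))` does not, because the value 1 arrives two positions late and the buffer holds only one tuple. The cost of this approach is that every tuple is delayed by the buffer, whether or not the input is disordered.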

FCF Window with WID Approach

Punctuation: a message embedded in a data stream indicating that a certain subset of data is complete.

WID uses punctuations to signal the end of window extents.

[Figure: wids function; punctuation]
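The idea can be sketched for the max(speed) part of Q1. The code below is a simplified illustration, not the authors' implementation: it assumes window-id w covers [(w+1)·SLIDE − RANGE, (w+1)·SLIDE) on ts, integer timestamps, and a punctuation that carries a bound meaning "all tuples with ts below this value have arrived". The class and method names are hypothetical.

```python
RANGE, SLIDE = 300, 60  # Q1: RANGE 300 seconds, SLIDE 60 seconds


def wids(ts):
    """Window-ids of a tuple, under the assumed interval convention."""
    return range(ts // SLIDE, (ts + RANGE) // SLIDE)


class WidMax:
    """Keeps one partial max per (window-id, seg-id); tuples are never buffered."""

    def __init__(self):
        self.state = {}  # (wid, seg_id) -> running max of speed

    def on_tuple(self, seg_id, speed, ts):
        # Arrival order does not matter: each tuple updates its own windows.
        for w in wids(ts):
            key = (w, seg_id)
            self.state[key] = max(speed, self.state.get(key, speed))

    def on_punctuation(self, ts_bound):
        """Emit and discard every window that ends at or before ts_bound."""
        done = sorted(k for k in self.state if (k[0] + 1) * SLIDE <= ts_bound)
        return [(w, seg, self.state.pop((w, seg))) for w, seg in done]
```

Feeding the tuples (seg "a", speed 50, ts 130), (seg "a", speed 70, ts 100) and (seg "a", speed 60, ts 190) and then calling `on_punctuation(180)` emits windows 1 and 2, each with max 70, even though the ts 100 tuple arrived late. State for later windows stays in place until a punctuation covers them.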

FCA Windows with WID Approach

FCB (forward-context bounded)

FCU (forward-context unbounded)

Performance

Environment:

Data generator: XMark data generator, and network analysis tool.

1. data in generated order.

2. data in bounded disorder.

3. data in block-sorted disorder.

Comparison: buffering mechanism.

Parameters

R: RANGE

S: SLIDE

Result

WID vs. Buffering

Result

Conclusion
