
Post on 07-Jan-2016


Page 1

Online Aggregation

Joseph M. Hellerstein, Peter J. Haas, Helen J. Wang

Presented by Archana Vijayalakshmanan

Page 2

Contents

• Introduction
• Example
• Advantages
• Requirements
• Approaches to building a system
• System issues
• Conclusion

Page 3

Online Aggregation: Motivation

SELECT AVG(grade) FROM ENROLL;

A “fancy” interface:

Query Results

AVG 3.262574342

Page 4

A Better Approach

Don’t process in batch! Online aggregation:
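As a rough illustration of the idea on this slide, the sketch below reports a running average while the scan is still in progress instead of returning one number at the end. It is a minimal sketch, not the authors' implementation: the function name `online_avg`, the `report_every` parameter, and the synthetic `grades` list standing in for ENROLL.grade are all assumptions made for the example.

```python
import random

def online_avg(values, report_every=1000, seed=0):
    """Yield (rows_seen, running_average) while scanning values in random order."""
    rows = list(values)
    # Random order, so that every prefix of the scan is a random sample.
    random.Random(seed).shuffle(rows)
    total = 0.0
    for n, v in enumerate(rows, start=1):
        total += v
        if n % report_every == 0 or n == len(rows):
            yield n, total / n

# Hypothetical data standing in for ENROLL.grade.
rng = random.Random(1)
grades = [rng.uniform(0.0, 4.0) for _ in range(10_000)]

for rows_seen, estimate in online_avg(grades):
    print(f"{rows_seen:6d} rows: AVG ~ {estimate:.4f}")
```

The last reported value equals the batch answer; the earlier ones are the intermediate estimates a user could act on before the scan finishes.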

Page 5

Example

SELECT AVG(grade) FROM ENROLL
GROUP BY major;

Page 6

Advantages

• stopping condition set on the fly!
• statistical techniques are more sophisticated
• can handle GROUP BY w/o a priori knowledge

Page 7

Requirements

• Usability
  - continuous output (non-blocking query plans)
  - time/precision control
  - fairness/partiality
• Performance
  - time to accuracy
  - time to completion
  - pacing

Page 8

A Naive Approach

SELECT running_avg(final_grade),
       running_confidence(final_grade),
       running_interval(final_grade)
FROM grades;

• No grouping
• Can’t meet performance & usability needs:
  - no guarantee of continuous output
  - no guarantee of fairness (or control over partiality)
  - no control over pacing
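To make the three running functions in the query above concrete, here is a minimal sketch of a running aggregate that tracks an average together with a confidence interval half-width. The class name `RunningAvg`, its `confidence` parameter, and the choice of a large-sample normal (CLT) approximation are assumptions for illustration; the slides do not say which estimator the authors use, and this sketch ignores any finite-population correction.

```python
import math
from statistics import NormalDist

class RunningAvg:
    """Running mean plus a CLT-based interval half-width (illustrative only)."""

    def __init__(self, confidence=0.95):
        self.confidence = confidence
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the current mean

    def add(self, x):
        # Welford's single-pass update of mean and squared deviations.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def interval(self):
        """Half-width of the interval; infinite until two rows are seen."""
        if self.n < 2:
            return math.inf
        variance = self.m2 / (self.n - 1)
        z = NormalDist().inv_cdf(0.5 + self.confidence / 2)
        return z * math.sqrt(variance / self.n)

agg = RunningAvg(confidence=0.95)
for grade in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
    agg.add(grade)
    print(f"avg={agg.mean:.3f} +/- {agg.interval():.3f} at {agg.confidence:.0%}")
```

Even with such functions available, the shortcomings listed on the slide remain: nothing here controls the order in which rows arrive, so the interval is only meaningful if the input happens to be in random order.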

Page 9

Random Access to Data

• Heap scan: OK if clustering is uncorrelated with agg & grouping attrs
• Index scan: can scan an index on attrs uncorrelated with agg or grouping attrs
• Sampling from indices: could introduce new sampling access methods (e.g. Olken’s work)
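The sketch below illustrates why the scan order must be uncorrelated with the aggregated attribute. It compares an estimate taken from the first 500 rows of a table clustered on grade with one taken from the first 500 rows in random order. The table contents, the prefix size, and the helper `prefix_avg` are hypothetical and chosen only for the demonstration.

```python
import random

def prefix_avg(rows, k):
    """Average of the first k rows, i.e. the estimate after scanning k rows."""
    return sum(rows[:k]) / k

rng = random.Random(42)
# Hypothetical table, physically clustered (sorted) on the aggregated attribute.
table = sorted(rng.uniform(0.0, 4.0) for _ in range(10_000))
true_avg = sum(table) / len(table)

# Heap scan over the clustered table: early rows are all low grades.
clustered_estimate = prefix_avg(table, 500)

# Same rows in random order: the prefix is a random sample.
shuffled = table[:]
rng.shuffle(shuffled)
random_estimate = prefix_avg(shuffled, 500)

print(f"true={true_avg:.3f} clustered={clustered_estimate:.3f} random={random_estimate:.3f}")
```

The clustered prefix is badly biased, while the random prefix is already close to the true average after 5% of the rows.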

Page 10

Group By & Distinct

• Can’t sort!
  - sorting blocks
  - sorting is unfair
• Must use hash-based techniques
  - non-blocking approach, but they do not scale gracefully
• Hybrid Hashing.
• “Hybrid Cache” even better.
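A minimal sketch of the hash-based, non-blocking grouping mentioned above: every input row updates its group's entry in a hash table and an updated estimate is emitted immediately, with no sort phase. The function name `hash_group_avg` and the `(group, value)` row shape are assumptions; the table is held entirely in memory, so the scaling problem noted on the slide and the hybrid variants are not modeled.

```python
from collections import defaultdict

def hash_group_avg(rows):
    """Yield (group, running_avg, count) after every input row; no sort needed."""
    state = defaultdict(lambda: [0, 0.0])  # group -> [count, sum]
    for group, value in rows:
        entry = state[group]
        entry[0] += 1
        entry[1] += value
        yield group, entry[1] / entry[0], entry[0]

rows = [("CS", 4.0), ("EE", 3.0), ("CS", 2.0)]
for group, avg, count in hash_group_avg(rows):
    print(group, avg, count)
```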

Page 11

Index Striding

For fair Group By:

• read tuples in round-robin fashion (want random tuple from Group 1, random tuple from Group 2, ...)
• each group is updated at an appropriate rate
• gives info/speed match!
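The round-robin access pattern can be sketched as follows. A dictionary of per-group lists stands in for an index on the grouping attribute, which is an assumption made to keep the example self-contained; the function name `index_stride` is likewise hypothetical.

```python
import random

def index_stride(groups, seed=0):
    """Yield (group, value) round-robin across groups, each group in random order."""
    rng = random.Random(seed)
    cursors = {}
    for group, values in groups.items():
        shuffled = list(values)
        rng.shuffle(shuffled)          # random tuple order within each group
        cursors[group] = iter(shuffled)
    while cursors:
        for group in list(cursors):    # one tuple per live group per round
            try:
                yield group, next(cursors[group])
            except StopIteration:
                del cursors[group]     # group exhausted, drop it from the rotation

# A large and a small group: the small one is not starved by the large one.
groups = {"CS": [3.0] * 1000, "Art": [3.5, 2.5, 4.0]}
for group, value in list(index_stride(groups))[:6]:
    print(group, value)
```

With a plain scan, the small group would receive roughly 3 of every 1003 tuples; here both groups advance at the same rate until the small one is exhausted.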

Page 12

Join Algorithms

Non-Blocking Joins

• no sorting!
• merge join OK, but watch for the sorted output
• hybrid hash not great
• symmetric pipeline hash
• nested loops always good, can be too slow
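As an illustration of a symmetric pipelined hash join, the sketch below keeps one hash table per input and probes the opposite table as each row arrives, so matches are emitted without waiting for either input to finish. The tagged-stream input format, the function name, and the key-extractor parameters are assumptions for this example, and both tables are assumed to fit in memory.

```python
from collections import defaultdict

def symmetric_hash_join(stream, key_left, key_right):
    """stream yields ("L", row) or ("R", row) in arrival order.

    Each arriving row is inserted into its own side's hash table and then
    probed against the other side's table, so results appear incrementally.
    """
    left_table = defaultdict(list)
    right_table = defaultdict(list)
    for side, row in stream:
        if side == "L":
            k = key_left(row)
            left_table[k].append(row)
            for match in right_table.get(k, ()):
                yield row, match
        else:
            k = key_right(row)
            right_table[k].append(row)
            for match in left_table.get(k, ()):
                yield match, row

stream = [("L", (1, "a")), ("R", (1, "x")), ("R", (2, "y")),
          ("L", (2, "b")), ("L", (1, "c"))]
first_column = lambda row: row[0]
for left, right in symmetric_hash_join(stream, first_column, first_column):
    print(left, right)
```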

Page 13

Query Optimization

• Avoid sorting
• Blocking sub-operations
• 2 components in cost function:
  - dead time (td): time spent doing “invisible” work; tax this at a high rate!
  - output time (to): time spent producing output
• Preference to plans that maximize user control, e.g., index striding
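A small sketch of how taxing dead time can change which plan wins. The slide does not give the cost formula, so the linear form `dead_tax * td + to`, the tax value, and the two plans with their timings are all assumptions chosen only to show the effect.

```python
def plan_cost(dead_time, output_time, dead_tax=4.0):
    """Assumed cost model: dead time is weighted more heavily than output time."""
    return dead_tax * dead_time + output_time

# Hypothetical plans as (td, to) in seconds.
plans = {
    "sort then group": (90.0, 10.0),   # long blocking phase, fast output
    "hash group":      (5.0, 110.0),   # output starts almost immediately
}

by_elapsed = min(plans, key=lambda p: sum(plans[p]))
by_taxed = min(plans, key=lambda p: plan_cost(*plans[p]))
print("cheapest by total elapsed time:", by_elapsed)
print("cheapest with dead time taxed: ", by_taxed)
```

Under total elapsed time the blocking plan looks better (100 s vs. 115 s); once dead time is taxed, the non-blocking plan is preferred (130 vs. 370).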

Page 14

Extended Aggregate Functions

Basically, aggregate functions must provide running estimates

• SUM, COUNT: straightforward
• VAR, STD DEV: algorithms return confidence intervals
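One way to read "straightforward" for SUM and COUNT is to scale the values seen so far by the fraction of the table scanned. The sketch below does that, assuming the rows arrive in random order and the table size is known in advance; the function name, the `predicate` parameter, and the sample data are hypothetical.

```python
def running_sum_count(sample, table_size, predicate=lambda v: True):
    """Yield (rows_seen, estimated COUNT, estimated SUM) over qualifying rows."""
    seen = matched = 0
    total = 0.0
    for v in sample:
        seen += 1
        if predicate(v):
            matched += 1
            total += v
        scale = table_size / seen   # extrapolate from the scanned fraction
        yield seen, matched * scale, total * scale

# Four rows scanned so far out of an assumed 8-row table.
for row in running_sum_count([1.0, 3.0, 2.0, 4.0], table_size=8,
                             predicate=lambda v: v >= 2.0):
    print(row)
```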

Page 15

API

Current API uses built-in methods, e.g.,

StopGroup(cursor, groupval)
speedUpGroup(cursor, groupval)
slowDownGroup(cursor, groupval)
setSkipFactor(cursor name, integer)
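The slide lists only the method names, so the following is a hypothetical client-side stand-in that shows one plausible reading of them, not the actual API. It assumes that speeding up or slowing down a group changes how many tuples are fetched for it per round, that stopping sets that rate to zero, and that the skip factor means only every k-th fetched tuple is delivered to the caller. The class `OnlineCursor` and its snake_case method names are invented for the example.

```python
class OnlineCursor:
    """Hypothetical stand-in for the cursor controls named on the slide."""

    def __init__(self, groups):
        self.sources = {g: iter(rows) for g, rows in groups.items()}
        self.rate = {g: 1 for g in groups}  # tuples fetched per group per round
        self.skip_factor = 1
        self.fetched = 0

    def stop_group(self, group):
        self.rate[group] = 0

    def speed_up_group(self, group):
        self.rate[group] += 1

    def slow_down_group(self, group):
        if self.rate[group] > 1:
            self.rate[group] -= 1

    def set_skip_factor(self, k):
        self.skip_factor = max(1, k)

    def next_round(self):
        """Fetch one round; return only every skip_factor-th fetched tuple."""
        delivered = []
        for group, source in self.sources.items():
            for _ in range(self.rate[group]):
                row = next(source, None)  # assumes rows are never None
                if row is None:
                    break
                self.fetched += 1
                if self.fetched % self.skip_factor == 0:
                    delivered.append((group, row))
        return delivered

cur = OnlineCursor({"CS": [1, 2, 3, 4, 5, 6], "EE": [10, 20, 30]})
cur.speed_up_group("CS")
print(cur.next_round())
cur.stop_group("EE")
print(cur.next_round())
```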

Page 16

Future Work

• Better UI
  - online data visualization (Tioga DataSplash)
  - data viz = “graphical” aggregate
  - “drill down” and “roll up” facilities
• Nested Queries
• Control w/o Indices
• Checkpointing/continuation
• Tracking online queries
• Extensions of statistical results

Page 17

References

control.cs.berkeley.edu/online/olamd/olamd.PPT