Transcript
  • Slide 1

Kaushik Chakrabarti (Univ. of Illinois), Minos Garofalakis (Bell Labs), Rajeev Rastogi (Bell Labs), Kyuseok Shim (KAIST and AITrc)
Presented at the 26th VLDB Conference, Cairo, Egypt
Presented by Supriya Sudheendra

Slide 2: Outline

Slide 3: Introduction
o Approximate query processing is a viable solution when facing:
    - huge amounts of data
    - high query complexity
    - stringent response-time requirements
o Decision Support Systems (DSS)
    - support business and organizational decision-making activities
    - help decision makers compile useful information from raw data, solve problems, and make decisions

Slide 4: Introduction
o DSS users pose very complex queries to the DBMS
    - these require complex operations over gigabytes or terabytes of disk-resident data
    - computing exact answers takes a very long time
    - in a number of scenarios, users prefer fast, approximate answers

Slide 5: Prior Work
o Previous approximate query processing techniques
    - focused on specific forms of aggregate queries
    - data reduction mechanism: how the synopses of the data are obtained
o Sampling-based techniques
    - a join operator applied to two uniform random samples yields a non-uniform sample of the join result that contains very few tuples
    - for non-aggregate queries, sampling produces a small subset of the exact answer, which may even be empty when joins are involved
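The join-sampling problem on Slide 5 can be seen in a small simulation. This is a sketch under assumed data, not the paper's experiment: the relations R and S, the 10% sampling fraction, and the helpers sample and equi_join are all hypothetical. Each of the 1,000 R tuples joins with 10 S tuples, so the exact join has 10,000 tuples; a join tuple survives the join of the two samples only if both of its constituent tuples were sampled, which happens with probability 0.1 × 0.1 = 1%.

```python
import random


def sample(rows, fraction, rng):
    # Hypothetical helper: uniform random sample without replacement.
    return rng.sample(rows, int(len(rows) * fraction))


def equi_join(r, s):
    # Hash join on the first attribute (the join key).
    index = {}
    for row in r:
        index.setdefault(row[0], []).append(row)
    return [(a, b) for b in s for a in index.get(b[0], [])]


rng = random.Random(42)
R = [(k, f"r{k}") for k in range(1000)]          # hypothetical: 1,000 tuples, one per key
S = [(i % 1000, f"s{i}") for i in range(10000)]  # hypothetical: 10,000 tuples, 10 per key

exact = equi_join(R, S)                                     # 10,000 tuples
approx = equi_join(sample(R, 0.1, rng), sample(S, 0.1, rng))
print(len(exact), len(approx))
```

A 10% sample of the join itself would hold about 1,000 tuples, whereas the join of the two samples holds roughly 100. The surviving tuples are also correlated (if an R tuple is not sampled, none of its ten join partners survive), so they do not form a uniform random sample of the join.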
Slide 6: Prior Work
o Histogram-based techniques
    - problematic for high-dimensional data
    - high storage overhead
    - high construction cost
o Wavelet-based techniques
    - wavelets are a mathematical tool for the hierarchical decomposition of functions
    - applying wavelet decomposition to the input data collection yields a compact data synopsis
    - avoids the high construction cost and storage overhead of histograms

Slide 7: Contributions of the Paper
o Viability and effectiveness of wavelets as a generic tool for approximate query processing in high-dimensional DSS applications
o A new, I/O-efficient wavelet decomposition algorithm for relational tables
o A novel query processing algebra for wavelet-coefficient data synopses
o An extensive experimental study

Slide 8: Background
o Wavelets are a mathematical tool for hierarchically decomposing functions
o The decomposition consists of a coarse overall approximation together with detail coefficients that influence the function at various scales
o Haar wavelets are conceptually simple and fast to compute
o Wavelets have a variety of applications, such as image editing and querying

Slide 9: One-Dimensional Haar Wavelets
o Computing the decomposition of a data array:
    - average the values together pairwise to get a lower-resolution representation of the data
    - record the detail coefficients: the difference between each pairwise average and the second value of the pair, i.e., $(a - b)/2$ for the pair $(a, b)$
    - repeat recursively on the averages until a single overall average remains
o Why detail coefficients? The averages alone lose information; together with the detail coefficients, the original data array can be reconstructed exactly.

Slide 10: One-Dimensional Haar Wavelets
o Wavelet transform: the overall average followed by the detail coefficients in increasing order of resolution
    - each entry is called a wavelet coefficient
o $W_A = [4, -2, 0, -1]$
o For vectors containing similar values, most detail coefficients are small and can be eliminated (set to zero), introducing only small errors in the reconstruction

Slide 11: One-Dimensional Haar Wavelets
o The overall average is more important than any detail coefficient, since it affects the reconstruction of every entry of the array
o To equalize the importance of all coefficients, each wavelet coefficient is divided by $\sqrt{2^l}$, where $l$ is its level of resolution ($l = 0$ for the coarsest level)
o Normalized transform: $W_A = [4, -2, 0, -1/\sqrt{2}]$

Slide 12: Multi-dimensional Haar Wavelets
o Haar wavelets can be extended to multi-dimensional arrays in two ways
o Standard decomposition
    - fix an ordering of the data dimensions (1, 2, ..., d)
    - for each dimension k in turn, apply the complete 1-D wavelet transform to each 1-D row of array cells along dimension k
o Nonstandard decomposition
    - alternates between dimensions during successive steps of pairwise averaging and differencing: one step is applied to each 1-D row of array cells along dimension k, for each k
    - the process is repeated recursively on the quadrant containing the averages across all dimensions

Slide 13: Nonstandard Decomposition
o Pairwise averaging and differencing for one positioning of the 2 × 2 box with root $[2i_1, 2i_2]$
o Distribution of the results in the wavelet transform array
o The process is recursed on the lower-left quadrant of $W_A$

Slide 14: Example Decomposition of a 4 × 4 Array

Slide 15: Multi-dimensional Haar Coefficients: Semantics and Representation
o The d-dimensional Haar basis function corresponding to a coefficient W is defined by:
    - a d-dimensional rectangular support region
    - quadrant sign information

Slide 16: Support Regions for the 16 Nonstandard 2-D Haar Basis Functions
o Blank areas are regions of A whose reconstruction is independent of the coefficient
o $W_A[0,0]$ is the overall average
o $W_A[3,3]$ contributes only to the upper-right quadrant

Slide 17: Haar Coefficients: Semantics and Representation
o W.R: the d-dimensional support hyper-rectangle of W, which encloses all cells in A to which W contributes
o The hyper-rectangle is represented by its low and high boundaries along each dimension $j$, $1 \le j \le d$
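The one-dimensional procedure on Slides 9 to 11 can be sketched in Python as follows. The input A = [2, 2, 5, 7] is an assumption (the slides show only $W_A$, and this array reproduces it); the function names are hypothetical, and keep_largest illustrates one simple way of eliminating small coefficients, ranking them by normalized magnitude.

```python
import math


def haar_decompose(data):
    """Unnormalized 1-D Haar transform: overall average first, then detail
    coefficients from coarsest to finest resolution."""
    n = len(data)
    if n == 0 or n & (n - 1):
        raise ValueError("array length must be a power of 2")
    averages = [float(x) for x in data]
    details = []
    while len(averages) > 1:
        pairs = list(zip(averages[0::2], averages[1::2]))
        # Detail coefficient = pairwise average minus the second value = (a - b) / 2.
        level = [(a - b) / 2 for a, b in pairs]
        averages = [(a + b) / 2 for a, b in pairs]
        details = level + details  # coarser levels go in front of finer ones
    return averages + details


def haar_reconstruct(coeffs):
    """Invert haar_decompose: each (average, detail) pair yields
    (average + detail, average - detail) at the next finer level."""
    averages = [coeffs[0]]
    pos = 1
    while pos < len(coeffs):
        level = coeffs[pos:pos + len(averages)]
        averages = [v for avg, d in zip(averages, level) for v in (avg + d, avg - d)]
        pos += len(level)
    return averages


def coefficient_level(i):
    """Resolution level l of the i-th coefficient (l = 0 is the coarsest)."""
    return 0 if i == 0 else i.bit_length() - 1


def normalize(coeffs):
    """Divide each coefficient by sqrt(2**l), l being its resolution level."""
    return [c / math.sqrt(2 ** coefficient_level(i)) for i, c in enumerate(coeffs)]


def keep_largest(coeffs, b):
    """Keep the b coefficients with the largest normalized magnitude; zero the rest."""
    norm = normalize(coeffs)
    ranked = sorted(range(len(coeffs)), key=lambda i: abs(norm[i]), reverse=True)
    keep = set(ranked[:b])
    return [c if i in keep else 0.0 for i, c in enumerate(coeffs)]


A = [2, 2, 5, 7]          # assumed input; reproduces the W_A on Slide 10
W_A = haar_decompose(A)   # [4.0, -2.0, 0.0, -1.0]
print(W_A, normalize(W_A), haar_reconstruct(keep_largest(W_A, 2)))
```

Dropping the two smallest normalized coefficients reconstructs [2, 2, 6, 6] instead of [2, 2, 5, 7]: a small error in exchange for storing half as many coefficients.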

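A sketch of the nonstandard 2-D decomposition from Slides 12 and 13, for an n × n array with n a power of 2. Rows index dimension 1 and columns dimension 2, and the quadrant containing index [0][0] plays the role of the slides' lower-left quadrant. The sign convention of the three detail coefficients, the function names, and the 4 × 4 input are assumptions.

```python
def nonstandard_decompose(a):
    """Nonstandard 2-D Haar decomposition of an n x n array (n a power of 2).

    Each step averages and differences every 2x2 box; the average goes to the
    quadrant holding index [0][0], the three detail coefficients go to the
    other quadrants, and the process recurses on the averages quadrant."""
    n = len(a)
    w = [[float(x) for x in row] for row in a]
    size = n
    while size > 1:
        half = size // 2
        cur = [row[:size] for row in w[:size]]  # snapshot of the current averages
        for i in range(half):
            for j in range(half):
                p, q = cur[2 * i][2 * j], cur[2 * i + 1][2 * j]
                r, s = cur[2 * i][2 * j + 1], cur[2 * i + 1][2 * j + 1]
                w[i][j] = (p + q + r + s) / 4                # average
                w[i + half][j] = (p - q + r - s) / 4         # detail along dimension 1
                w[i][j + half] = (p + q - r - s) / 4         # detail along dimension 2
                w[i + half][j + half] = (p - q - r + s) / 4  # detail along both
        size = half
    return w


def nonstandard_reconstruct(w):
    """Invert nonstandard_decompose, expanding the averages quadrant level by level."""
    n = len(w)
    a = [row[:] for row in w]
    size = 1
    while size < n:
        cur = [row[:2 * size] for row in a[:2 * size]]
        for i in range(size):
            for j in range(size):
                avg, d1 = cur[i][j], cur[i + size][j]
                d2, d12 = cur[i][j + size], cur[i + size][j + size]
                a[2 * i][2 * j] = avg + d1 + d2 + d12
                a[2 * i + 1][2 * j] = avg - d1 + d2 - d12
                a[2 * i][2 * j + 1] = avg + d1 - d2 - d12
                a[2 * i + 1][2 * j + 1] = avg - d1 - d2 + d12
        size *= 2
    return a


A = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]  # hypothetical data
W = nonstandard_decompose(A)
print(W[0][0])                          # 8.5, the overall average
print(nonstandard_reconstruct(W) == A)  # True
```

By contrast, the standard decomposition would run the complete 1-D transform over every row and then over every column of the result, rather than alternating one averaging step per dimension.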

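The representation on Slides 15 to 17 can be sketched as a small record per coefficient. The names (HaarCoefficient, lo, hi, signs, value, sign_map, reconstruct_cell) are hypothetical, and so is the sign convention: +1 in the lower half and -1 in the upper half of every dimension along which the coefficient differences. Splitting the support at its midpoint assumes every side of the hyper-rectangle has even length, as Haar supports do. The example uses d = 1 and the coefficients $W_A = [4, -2, 0, -1]$, taking A = [2, 2, 5, 7] as an assumed input.

```python
from dataclasses import dataclass
from itertools import product
from math import prod


@dataclass
class HaarCoefficient:
    """Hypothetical record for one Haar coefficient W: support hyper-rectangle
    (W.R, as lo/hi boundaries per dimension), quadrant signs, and value."""
    lo: tuple    # lo[j]: low boundary of the support along dimension j
    hi: tuple    # hi[j]: high boundary (inclusive) along dimension j
    signs: dict  # quadrant (tuple of 0/1, one entry per dimension) -> +1 or -1
    value: float

    def contains(self, cell):
        return all(l <= x <= h for l, x, h in zip(self.lo, cell, self.hi))

    def quadrant(self, cell):
        # 0 = cell lies in the lower half of the support along dimension j, 1 = upper half.
        return tuple(0 if x <= (l + h) // 2 else 1 for l, x, h in zip(self.lo, cell, self.hi))

    def contribution(self, cell):
        # Cells outside W.R are independent of this coefficient.
        if not self.contains(cell):
            return 0.0
        return self.signs[self.quadrant(cell)] * self.value


def sign_map(d, diff_dims):
    """Quadrant signs for a coefficient that differences along diff_dims:
    +1 in the lower half and -1 in the upper half of each differenced dimension."""
    return {q: prod(-1 if q[j] else 1 for j in diff_dims) for q in product((0, 1), repeat=d)}


def reconstruct_cell(coeffs, cell):
    """A cell's value is the sum of the signed values of all coefficients whose support contains it."""
    return sum(c.contribution(cell) for c in coeffs)


# 1-D example (d = 1): W_A = [4, -2, 0, -1] for the assumed array A = [2, 2, 5, 7].
coeffs = [
    HaarCoefficient((0,), (3,), sign_map(1, ()), 4),  # overall average
    HaarCoefficient((0,), (3,), sign_map(1, (0,)), -2),
    HaarCoefficient((0,), (1,), sign_map(1, (0,)), 0),
    HaarCoefficient((2,), (3,), sign_map(1, (0,)), -1),
]
print([reconstruct_cell(coeffs, (x,)) for x in range(4)])  # [2.0, 2.0, 5.0, 7.0]
```

The same record works for any d: sign_map(2, (0, 1)), for example, gives the checkerboard signs of a 2-D coefficient that differences along both dimensions.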