![Page 1: Parallel Multi-Dimensional ROLAP Indexing Andrew Rau-Chaplin Faculty of Computer Science Dalhousie University Joint work with Frank Dehne, Carleton Univ](https://reader035.vdocuments.us/reader035/viewer/2022062417/5517aa9f5503463e368b5dc0/html5/thumbnails/1.jpg)
Parallel Multi-Dimensional ROLAP Indexing
Andrew Rau-Chaplin
Faculty of Computer Science
Dalhousie University
Joint work with
Frank Dehne, Carleton Univ.
Todd Eavis, Dalhousie Univ.
![Page 2: Parallel Multi-Dimensional ROLAP Indexing Andrew Rau-Chaplin Faculty of Computer Science Dalhousie University Joint work with Frank Dehne, Carleton Univ](https://reader035.vdocuments.us/reader035/viewer/2022062417/5517aa9f5503463e368b5dc0/html5/thumbnails/2.jpg)
Data Warehousing for Decision Support
Operational data collected into DW
DW used to support multi-dimensional views
Views form the basis of OLAP processing
Our focus: the OLAP server
[Figure: data warehouse architecture. Data Cleaning and Integration: External Sources and Operational Databases are fed in via Extract, Clean, Transform, Load, Refresh. Data Storage: Data Warehouse, Data Marts, Meta Data Repository, Monitoring, Administration. OLAP Engines: OLAP Servers. Front-End Tools (Output): Query/Reports, Analysis, Data Mining.]
![Page 3: Parallel Multi-Dimensional ROLAP Indexing Andrew Rau-Chaplin Faculty of Computer Science Dalhousie University Joint work with Frank Dehne, Carleton Univ](https://reader035.vdocuments.us/reader035/viewer/2022062417/5517aa9f5503463e368b5dc0/html5/thumbnails/3.jpg)
Multi-dimensional views
Collection of feature attributes
Aggregate along one or more measure attributes
Reduce the granularity by “collapsing” dimensions
Points generated by:
distributive functions (e.g., sum)
algebraic functions (e.g., average)
holistic functions (e.g., median)
[Figure: data cube over Make (Chevy, Ford), Colour (Red, White, Blue) and Year (1990, 1991, 1992, 1993), showing the group-bys By Make & Colour, By Make & Year, By Colour & Year, By Make, By Colour, By Year.]
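The three classes of aggregate function named on this slide differ in how partial results combine. The sketch below uses made-up measure values split over two partitions (`part1`, `part2`, `summary` and `merge` are illustrative names, not from the slides): a sum combines directly, an average combines through a fixed-size (sum, count) summary, and a median cannot be recovered from per-partition medians.

```python
from statistics import median

# Hypothetical measure values for one group, held in two partitions.
part1 = [1, 2, 3]
part2 = [4, 5, 60, 70, 80]

# Distributive: the sum of the partition sums is the overall sum.
total = sum(part1) + sum(part2)

# Algebraic: the average follows from a fixed-size summary (sum, count).
def summary(values):
    return (sum(values), len(values))

def merge(a, b):
    return (a[0] + b[0], a[1] + b[1])

s, c = merge(summary(part1), summary(part2))
average = s / c

# Holistic: the median needs the values themselves; combining
# per-partition medians gives a different (wrong) answer here.
true_median = median(part1 + part2)
median_of_medians = median([median(part1), median(part2)])

print(total, average, true_median, median_of_medians)
```

This is why distributive and algebraic measures are the natural fit for computing one view from another, while holistic measures generally need access to the underlying records.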
![Page 4: Parallel Multi-Dimensional ROLAP Indexing Andrew Rau-Chaplin Faculty of Computer Science Dalhousie University Joint work with Frank Dehne, Carleton Univ](https://reader035.vdocuments.us/reader035/viewer/2022062417/5517aa9f5503463e368b5dc0/html5/thumbnails/4.jpg)
Data Cube Generation
Proposed by Gray et al. in 1995
Can be generated “manually” from a relational DB but this is very inefficient
Exploit the relationship between cuboids to compute all 2^d cuboids
In OLAP environments, we typically pre-compute these views to improve query response time
ABC
AB AC BC
A C B
ALL
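The lattice above can be enumerated mechanically. As a minimal sketch (the function names `cuboids` and `parents` are illustrative, not from the slides), the code below lists all 2^d group-bys for a set of dimensions and, for any view, the views with one extra dimension from which it could be computed.

```python
from itertools import combinations

def cuboids(dims):
    """Return all 2^d group-bys, from the base cuboid down to ALL (empty tuple)."""
    return [combo
            for k in range(len(dims), -1, -1)
            for combo in combinations(dims, k)]

def parents(view, dims):
    """Views with exactly one extra dimension; `view` can be derived from any of them."""
    return [tuple(d for d in dims if d in view or d == extra)
            for extra in dims if extra not in view]

lattice = cuboids("ABC")
for view in lattice:
    print("".join(view) or "ALL")
```

For three dimensions this prints the eight views of the lattice level by level, starting at ABC and ending at ALL.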
![Page 5: Parallel Multi-Dimensional ROLAP Indexing Andrew Rau-Chaplin Faculty of Computer Science Dalhousie University Joint work with Frank Dehne, Carleton Univ](https://reader035.vdocuments.us/reader035/viewer/2022062417/5517aa9f5503463e368b5dc0/html5/thumbnails/5.jpg)
Existing Parallel Results
Goil & Choudhary
MOLAP solution
in-memory structures
global partition + d communication rounds
distributed views
Limitations
Memory for multi-dimensional arrays
expensive communication for larger d
J. of Data Mining & Knowledge Discovery 1(4), 1997
![Page 6: Parallel Multi-Dimensional ROLAP Indexing Andrew Rau-Chaplin Faculty of Computer Science Dalhousie University Joint work with Frank Dehne, Carleton Univ](https://reader035.vdocuments.us/reader035/viewer/2022062417/5517aa9f5503463e368b5dc0/html5/thumbnails/6.jpg)
Our Approach
ROLAP solution
Construct and cost the data cube lattice
Find a “least cost” spanning tree
Partition the spanning tree over the processors equally, construct views and distribute
Can handle partial cubes
Limitations
What about indexing?
ABCD
ABC ABD ACD BCD
AB AC AD BC BD CD
A B C D
All
CCGrid’01 + J. Dist. & Parallel Databases 11(2), 2001
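The slides do not give the cost model or the tree-construction algorithm, so the following is only a sketch of the idea of costing the lattice and picking a cheap spanning tree. It assumes a simple size estimate (the product of dimension cardinalities, capped at the row count) and a smallest-parent rule: each view is computed from its cheapest parent. The names `estimate_size` and `smallest_parent_tree` and the cardinalities are hypothetical; partitioning the tree over processors is not shown.

```python
from itertools import combinations
from math import prod

def estimate_size(view, cardinality, n_rows):
    """Assumed cost model: at most min(n_rows, product of cardinalities) rows."""
    return min(n_rows, prod(cardinality[d] for d in view))

def smallest_parent_tree(dims, cardinality, n_rows):
    """Map each non-base view to its cheapest parent (one extra dimension)."""
    tree = {}
    for k in range(len(dims) - 1, -1, -1):
        for view in combinations(dims, k):
            candidates = [tuple(d for d in dims if d in view or d == extra)
                          for extra in dims if extra not in view]
            tree[view] = min(candidates,
                             key=lambda p: estimate_size(p, cardinality, n_rows))
    return tree

# Hypothetical cardinalities for a 4-dimensional cube.
card = {"A": 100, "B": 50, "C": 10, "D": 2}
tree = smallest_parent_tree("ABCD", card, n_rows=10_000)
for view, parent in tree.items():
    print("".join(view) or "All", "<-", "".join(parent))
```

Every view except the base cuboid ABCD gets exactly one parent, so the result is a spanning tree of the lattice rooted at ABCD.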
![Page 7: Parallel Multi-Dimensional ROLAP Indexing Andrew Rau-Chaplin Faculty of Computer Science Dalhousie University Joint work with Frank Dehne, Carleton Univ](https://reader035.vdocuments.us/reader035/viewer/2022062417/5517aa9f5503463e368b5dc0/html5/thumbnails/7.jpg)
Parallel Multi-dimensional Indexing
Query specifies a range on multiple dimensions
Forms a hypercube in the point space
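A range query of this kind is an axis-aligned box: one [low, high] interval per dimension. The sketch below is a brute-force filter with no index, only to make the query semantics concrete; the points and their integer encoding are made up for illustration.

```python
def in_range(point, lo, hi):
    """True if the point lies inside the box [lo, hi] on every dimension."""
    return all(l <= x <= h for x, l, h in zip(point, lo, hi))

def range_query(points, lo, hi):
    """Linear scan; an index exists to avoid touching every point like this."""
    return [p for p in points if in_range(p, lo, hi)]

# Hypothetical 3-dimensional points (e.g., encoded Make, Colour, Year).
points = [(0, 1, 1990), (1, 2, 1991), (1, 0, 1993), (0, 2, 1992)]
hits = range_query(points, lo=(0, 1, 1990), hi=(1, 2, 1992))
print(hits)
```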
![Page 8: Parallel Multi-Dimensional ROLAP Indexing Andrew Rau-Chaplin Faculty of Computer Science Dalhousie University Joint work with Frank Dehne, Carleton Univ](https://reader035.vdocuments.us/reader035/viewer/2022062417/5517aa9f5503463e368b5dc0/html5/thumbnails/8.jpg)
General Approach
No multidimensional index is universally successful
Exploit domain specific information and the features of a particular index
OLAP
Data is provided up front
Updates are batch oriented
![Page 9: Parallel Multi-Dimensional ROLAP Indexing Andrew Rau-Chaplin Faculty of Computer Science Dalhousie University Joint work with Frank Dehne, Carleton Univ](https://reader035.vdocuments.us/reader035/viewer/2022062417/5517aa9f5503463e368b5dc0/html5/thumbnails/9.jpg)
Design Goals
A framework for distributed high-performance indexing of ROLAP cubes
Practical to implement
Low communication volume
Fully adapted to external memory (disks)
No shared disk required
Incrementally maintainable
Efficient for high-D spatial searches
Scalable in terms of data size, dimensions, processors
![Page 10: Parallel Multi-Dimensional ROLAP Indexing Andrew Rau-Chaplin Faculty of Computer Science Dalhousie University Joint work with Frank Dehne, Carleton Univ](https://reader035.vdocuments.us/reader035/viewer/2022062417/5517aa9f5503463e368b5dc0/html5/thumbnails/10.jpg)
Challenge
How to order and partition data such that
Number of records retrieved per node is as balanced as possible
Minimize the number of disk seeks required in answering a query
ABC
P1 P2 P3 P4
![Page 11: Parallel Multi-Dimensional ROLAP Indexing Andrew Rau-Chaplin Faculty of Computer Science Dalhousie University Joint work with Frank Dehne, Carleton Univ](https://reader035.vdocuments.us/reader035/viewer/2022062417/5517aa9f5503463e368b5dc0/html5/thumbnails/11.jpg)
Indexing the Data Cube
Combine the strengths of a space-filling curve and an r-tree index
Use Hilbert curve to load buckets
Index buckets with r-tree
Update indexes with merge/sort
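The first two steps can be sketched in two dimensions as follows. This is not the authors' implementation: it uses the standard bitwise Hilbert index for an n x n grid (n a power of two), sorts points along the curve, cuts the sorted list into fixed-size buckets, and computes one bounding box per bucket, which is what the leaf level of a packed r-tree would store. The bucket capacity and the helper names are assumptions; merge-based updates are not shown.

```python
def hilbert_index(n, x, y):
    """Position of cell (x, y) along the Hilbert curve on an n x n grid."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve has the right orientation.
        if ry == 0:
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s //= 2
    return d

def pack_buckets(points, n, capacity):
    """Sort points in Hilbert order and cut into buckets of `capacity` points."""
    ordered = sorted(points, key=lambda p: hilbert_index(n, p[0], p[1]))
    return [ordered[i:i + capacity] for i in range(0, len(ordered), capacity)]

def bounding_box(bucket):
    """Minimum bounding rectangle (xmin, ymin, xmax, ymax) of a bucket."""
    xs = [p[0] for p in bucket]
    ys = [p[1] for p in bucket]
    return (min(xs), min(ys), max(xs), max(ys))

points = [(x, y) for x in range(4) for y in range(4)]
buckets = pack_buckets(points, n=4, capacity=4)
boxes = [bounding_box(b) for b in buckets]
print(boxes)
```

Because consecutive curve positions are spatially adjacent, each bucket covers a compact region, which keeps the bounding boxes small and limits overlap between them.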
![Page 12: Parallel Multi-Dimensional ROLAP Indexing Andrew Rau-Chaplin Faculty of Computer Science Dalhousie University Joint work with Frank Dehne, Carleton Univ](https://reader035.vdocuments.us/reader035/viewer/2022062417/5517aa9f5503463e368b5dc0/html5/thumbnails/12.jpg)
Space Filling Curves & Striping
![Page 13: Parallel Multi-Dimensional ROLAP Indexing Andrew Rau-Chaplin Faculty of Computer Science Dalhousie University Joint work with Frank Dehne, Carleton Univ](https://reader035.vdocuments.us/reader035/viewer/2022062417/5517aa9f5503463e368b5dc0/html5/thumbnails/13.jpg)
Query Retrieval
P1 P2 P3 P4
ABC ABC ABC ABC
![Page 14: Parallel Multi-Dimensional ROLAP Indexing Andrew Rau-Chaplin Faculty of Computer Science Dalhousie University Joint work with Frank Dehne, Carleton Univ](https://reader035.vdocuments.us/reader035/viewer/2022062417/5517aa9f5503463e368b5dc0/html5/thumbnails/14.jpg)
Example
Original space: 8 points to be reported
Processor 1 reports: 2 consecutive blocks & 4 points
Processor 2 reports: 2 consecutive blocks & 4 points
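The numbers in this example can be reproduced with a small simulation, assuming round-robin striping of curve-ordered blocks (the slides do not spell out the striping rule, so that is an assumption, as are the block contents). Eight blocks of two points each are striped over two processors; a query covering four consecutive blocks of the global order then touches two consecutive local blocks and four points on each processor.

```python
def stripe(blocks, p):
    """Round-robin striping: block i (in curve order) goes to processor i mod p."""
    return [blocks[r::p] for r in range(p)]

# Hypothetical data: 8 blocks in curve order, 2 points each, labelled (block_id, k).
blocks = [[(b, 0), (b, 1)] for b in range(8)]
local = stripe(blocks, p=2)

# Suppose a query covers blocks 2..5 of the global order (8 points in total).
wanted = set(range(2, 6))
report = []
for proc, disk in enumerate(local):
    # Local positions of the blocks this processor must read.
    positions = [i for i, blk in enumerate(disk) if blk[0][0] in wanted]
    n_points = sum(len(disk[i]) for i in positions)
    contiguous = positions == list(range(positions[0], positions[-1] + 1))
    report.append((proc, positions, n_points, contiguous))

for line in report:
    print(line)
```

Both processors read the same amount of data, and each read is one contiguous run on the local disk, which is the balance and seek behaviour the Challenge slide asks for.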
![Page 15: Parallel Multi-Dimensional ROLAP Indexing Andrew Rau-Chaplin Faculty of Computer Science Dalhousie University Joint work with Frank Dehne, Carleton Univ](https://reader035.vdocuments.us/reader035/viewer/2022062417/5517aa9f5503463e368b5dc0/html5/thumbnails/15.jpg)
The Parallel Framework
A single view is partitioned across p processors
Partial Hilbert/r-tree indexes are computed locally
Queries are answered concurrently
Queries answered individually or “piggy-backed”
![Page 16: Parallel Multi-Dimensional ROLAP Indexing Andrew Rau-Chaplin Faculty of Computer Science Dalhousie University Joint work with Frank Dehne, Carleton Univ](https://reader035.vdocuments.us/reader035/viewer/2022062417/5517aa9f5503463e368b5dc0/html5/thumbnails/16.jpg)
The Virtual Data Cube
Problem: Full cube often too large to materialize
Solution: Use surrogate views
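The slides do not define surrogate views further. One plausible reading, sketched below under that assumption, is that a query on a view that was not materialized is answered from a materialized view containing all the queried dimensions, by aggregating away the extra ones. The selection rule (fewest rows), the function names and the data are all hypothetical.

```python
from collections import defaultdict

def pick_surrogate(query_dims, materialized):
    """Smallest materialized view whose dimensions include all queried ones."""
    candidates = [v for v in materialized if set(query_dims) <= set(v)]
    return min(candidates, key=lambda v: len(materialized[v]))

def answer_from(view_dims, rows, query_dims):
    """Aggregate (sum) the surrogate's rows down to the queried dimensions."""
    keep = [view_dims.index(d) for d in query_dims]
    out = defaultdict(int)
    for *key, measure in rows:
        out[tuple(key[i] for i in keep)] += measure
    return dict(out)

# Hypothetical materialized views: dims -> rows of (dim values..., sum).
materialized = {
    ("A", "B", "C"): [(0, 0, 0, 5), (0, 0, 1, 7), (0, 1, 0, 2), (1, 1, 1, 4)],
    ("A", "B"): [(0, 0, 12), (0, 1, 2), (1, 1, 4)],
}

# View A was never materialized, so answer it from a surrogate.
surrogate = pick_surrogate(("A",), materialized)
result = answer_from(surrogate, materialized[surrogate], ("A",))
print(surrogate, result)
```

This only works directly for measures that can be re-aggregated from a finer view, such as sums.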
![Page 17: Parallel Multi-Dimensional ROLAP Indexing Andrew Rau-Chaplin Faculty of Computer Science Dalhousie University Joint work with Frank Dehne, Carleton Univ](https://reader035.vdocuments.us/reader035/viewer/2022062417/5517aa9f5503463e368b5dc0/html5/thumbnails/17.jpg)
Surrogate Processing
![Page 18: Parallel Multi-Dimensional ROLAP Indexing Andrew Rau-Chaplin Faculty of Computer Science Dalhousie University Joint work with Frank Dehne, Carleton Univ](https://reader035.vdocuments.us/reader035/viewer/2022062417/5517aa9f5503463e368b5dc0/html5/thumbnails/18.jpg)
Other issues…
Dimension ordering
Query piggybacking
Batch updating
Managing hierarchies of views
![Page 19: Parallel Multi-Dimensional ROLAP Indexing Andrew Rau-Chaplin Faculty of Computer Science Dalhousie University Joint work with Frank Dehne, Carleton Univ](https://reader035.vdocuments.us/reader035/viewer/2022062417/5517aa9f5503463e368b5dc0/html5/thumbnails/19.jpg)
Experimental Results
Machine
17 node cluster
Node = 1.8 GHz Xeon, 1 GB RAM, 2 × 40 GB IDE drives, running Linux
Interconnect = Intel Fast Ethernet switch
Test Data
10 dimensions and 1,000,000 records
![Page 20: Parallel Multi-Dimensional ROLAP Indexing Andrew Rau-Chaplin Faculty of Computer Science Dalhousie University Joint work with Frank Dehne, Carleton Univ](https://reader035.vdocuments.us/reader035/viewer/2022062417/5517aa9f5503463e368b5dc0/html5/thumbnails/20.jpg)
RCUBE index Construction
Output: ~640 million rows, 16 Gigabytes
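As a rough sanity check on these figures, the arithmetic below derives two ratios from the stated output size and the 10-dimensional test data. It assumes decimal gigabytes and that all 2^10 views were materialized; the slides state neither, so the results are indicative only.

```python
rows = 640e6               # ~640 million output rows (from the slide)
size_bytes = 16e9          # 16 gigabytes, assuming decimal units
dims = 10                  # dimensions in the test data

views = 2 ** dims          # assumes the full cube was materialized
bytes_per_row = size_bytes / rows
avg_rows_per_view = rows / views

print(views, bytes_per_row, avg_rows_per_view)
```

Under these assumptions the output averages about 25 bytes per row and about 625,000 rows per view, compared with 1,000,000 input records.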
![Page 21: Parallel Multi-Dimensional ROLAP Indexing Andrew Rau-Chaplin Faculty of Computer Science Dalhousie University Joint work with Frank Dehne, Carleton Univ](https://reader035.vdocuments.us/reader035/viewer/2022062417/5517aa9f5503463e368b5dc0/html5/thumbnails/21.jpg)
Distributed Query Resolution
Test: Random queries returning ~15% of points (10 experiments per point)
![Page 22: Parallel Multi-Dimensional ROLAP Indexing Andrew Rau-Chaplin Faculty of Computer Science Dalhousie University Joint work with Frank Dehne, Carleton Univ](https://reader035.vdocuments.us/reader035/viewer/2022062417/5517aa9f5503463e368b5dc0/html5/thumbnails/22.jpg)
Disk blocks retrieved vs. Disk Seeks
Test: Random queries returning 5-15% of points (15 experiments per point)
![Page 23: Parallel Multi-Dimensional ROLAP Indexing Andrew Rau-Chaplin Faculty of Computer Science Dalhousie University Joint work with Frank Dehne, Carleton Univ](https://reader035.vdocuments.us/reader035/viewer/2022062417/5517aa9f5503463e368b5dc0/html5/thumbnails/23.jpg)
Distributed Query Resolution in Surrogate Group-bys
![Page 24: Parallel Multi-Dimensional ROLAP Indexing Andrew Rau-Chaplin Faculty of Computer Science Dalhousie University Joint work with Frank Dehne, Carleton Univ](https://reader035.vdocuments.us/reader035/viewer/2022062417/5517aa9f5503463e368b5dc0/html5/thumbnails/24.jpg)
Thank You
Questions?