computing the cube

1

Computing the cube

Abhinandan Das CS 632

Mar 8 2001

2

On the Computation of Multidimensional Aggregates

Sameet Agarwal, Rakesh Agrawal, Prasad Deshpande, Ashish Gupta, Jeffrey Naughton, Raghu Ramakrishnan & Sunita Sarawagi

-- VLDB 1996

3

Motivation OLAP / Multidimensional data analysis Eg: Transactions(Prod,Date,StoreId,Cust,Sales)

Sum of sales by: (P,SId) ; (P) ; (P,D,SId) Computing multidimensional

aggregates is a performance bottleneck

Efficient computation of several related group-bys

4

What is a CUBE? n-dimensional generalization of the

group by operator Group-bys corresponding to all

possible subsets of a given set of attributes

Eg: SELECT P, D, C, Sum(Sales) FROM Transactions CUBE-BY P, D, C

ALL, P, D, C, PD, PC, DC, PDC

5

Approaches Basic group-by methods:

Sort based Hash based

Naïve approach

6

Possible optimizations1.1. Smallest parentSmallest parent2.2. Cache resultsCache results3.3. Amortize scansAmortize scans4.4. Share sortsShare sorts5.5. Share partitionsShare partitions Often contradictory Assumption: Distributive aggregate

function sum, count, min, max ; average-- Non distributive: median

7

Sort based methods Algorithm PipeSort Share-sorts Vs Smallest parent

Optimize to get minimum total cost Cache-results & amortize-scans

Pipelining: ABCD ABC AB A

8

PipeSort Assumption: Have an estimate of

the number of distinct values for each group-by

Input: Search lattice Graph where each vertex represents

a group-by of the cube Edge i j if |i|=|j|+1, ji

9

Search lattice: CUBE-BY ABCD

10

Search lattice (contd...) Each edge eij associated with two

costs: S(eij): Cost of computing j from i when i is

pre-sorted U(eij): Cost of computing j from i when i is

not already sorted Idea: If attribute order of a group-by j is a

prefix of parent i, compute j without sorting (Cost S(eij)) else first sort (Cost U(eij))

11

PipeSort (contd...) Proceed level by level, k=0 to k=N-1 Find best way of computing level k

from level k+1 Weighted bipartite matching:

Make k additional replicas of each node at level k+1

Cost S(eij) on original node, U(eij) on replicas Find minimum cost maximal matching

12

Min cost matching

13

Algorithm PipeSort For level k = 0 to N-1

Generate_plan(k+1 k) Fix sort order of level k+1 nodes

Generate_plan(k+1 k): Make k additional replicas of level

k+1 nodes, assign appropriate edge costs

Find min-cost matching

14

Example plan

15

Tweaks Aggregate and remove duplicates

whilst sorting Use partial sorting order to reduce

sort costs Eg: ABC AC

16

Hash based methods Algorithm PipeHash Can include all stated

optimizations: (If memory available)

For k=N downto 0 For each group-by g at level k+1

Compute in 1 scan of g all children for which g is smallest parent

Save g and deallocate hash table of g

17

PipeHash Limited memory Use

partitioning Optimization share-partitions:

When data is partitioned on attribute A, all group-bys containing A can be computed independently on each partition

No recombination required

18

PipeHash: Overview First choose smallest parent for

each group-by (gives MST) Optimize for memory constraints:

Decide what group-bys to compute together

Choice of attribute for data partitioning

Minimizing overall disk scan cost: NP-Hard!

19

PipeHash

20

Heuristics Optimizations cache-results and

amortize-scans favoured by choosing large subtrees of MST: Can compute multiple group-bys together

However, partitioning attribute limits subtree

Hence choose the partitioning attribute that allows choice of largest subtree

21

Algorithm: Worklist w=MST While w not empty

Choose any tree T from w T’ = select_subtree(T) Compute_subtree(T’)

22

Select_subtree(T) If mem reqd by T < M, return T Else: For any get subtree Ta

Let Pa=max # of partitions of root(T) possible if a used for partitioning

Choose a s.t. (mem reqd Ta)/Pa<M and

Ta is largest subtree over all a Add forest T-Ta to w, return Ta

Aa

23

Compute_subtree(T’) numParts = (mem reqd T’)*

fudge_factor/M Partition root(T’) into numParts For each partition of root(T’)

For each node n in T’ (breadth first) Compute all children of n in 1 scan Release memory occupied by hash table of n

24

Notes on PipeHash PipeHash biased towards smallest-

parent optimization Eg: Compute BC from BCD (fig) In practice, saving on sequential

disk scans less important than reducing CPU cost of aggregation by choosing smallest parent!

25

Overlap method Sort based Minimize disk accesses by

overlapping computation of “cuboids”

Focus: Exploit partially matching sort orders to reduce sorting costs

Uses smallest parent optimization

26

Sorted runs B = (A1,A2,...Aj) ; S=(A1,...Al-1,Al+1,...Aj) A sorted runsorted run of S in B is a maximal run

of tuples in B whose ordering is consistent with the sort order in S Eg:B=[(a,1,2),(a,1,3),(a,2,2),(b,1,3), (b,3,2),(c,3,1)] S=[(a,2),(a,3),(b,3),(b,2),(c,1)] (1st & 3rd)Sorted runs for S: [(a,2),(a,3)],[(a,2)],[(b,3)],

[(b,2)] and [(c,1)]

27

Partitions B, S have common prefix A1,...,Al-1

A partition partition of a cuboid S in B is the union of sorted runs s.t. the first (l-1) columns (ie common prefix) have the same value

Previous eg: Partitions for S in B are: [(a,2),(a,3)], [(b,3),(b,2)] & [(c,1)] Tuples from different partitions need not

be merged for aggregation Partition is independent unit of

computation

28

Overview Begin by sorting base cuboid All other cuboids computed w/o re-sorting Sort order of base cuboid determines sort

order of all other cuboids To maximize overlap across cuboid

computations, reduce memory requirements of individual cuboids

Since partition is unit of computation, while computing one sorted cuboid from another, just need mem sufficient to hold a partition

29

Overview (contd...) When partition is computed, tuples

can be pipelined to descendants; same memory used by next partition

Significant saving: PartSize << CuboidSize

Eg: Computing ABC and ABD from ABCDPartSize(ABC) = 1 PartSize(ABD)=|D|

30

Choosing parent cuboids Goal: Choose tree that minimizes

size of partitions Eg: Better to compute AC from

ACD than ABC Heuristic: Maximize size of

common prefix

31

Example cuboid tree

32

Choosing overlapping cuboids To compute a cuboid in memory,

need memory = PartSize If required memory is available,

cuboid is in PartitionPartition state Else allocate 1 memory page for the

cuboid, and mark as SortRun SortRun state Only tuples of a PartitionPartition cuboid can

be pipelined to descendants

33

Heuristics Which cuboids to compute and in

what state: Opt allocation NP-hard! Heuristic: Traverse tree in BFS

order Intuition:

Cuboids to the left have smaller partition sizes

So require less memory

34

Cuboid computation For each tuple t of B (parent)

If (state==partition) process_partition(t) Else process_sorted_run(t)

Process_partition(t): 3 cases:

Tuple starts new partition Tuple matches existing tuple in partition New tuple: Insert at appropriate place

35

Cuboid computation (contd...) Process_sorted_run(t):

3 cases Tuple starts new sorted run Tuple matches last tuple in current run New tuple: Append tuple to end of current run

Cuboid in Partition state fully computed in 1 pass

Cuboid in SortRun state: Combine merge step with computation of descendant cuboids

36

Example CUBE computation

(M=25 ; 9 sorted runs of BCD, CD to merge)

37

An array based algorithm forsimultaneous multidimensional aggregates

Yihong Zhao, Prasad Deshpande, Jeffrey Naughton

-- SIGMOD ‘97

38

ROLAP Vs MOLAP CUBE central to OLAP operations ROLAP: Relational OLAP systems

PipeSort, PipeHash, Overlap MOLAP: Multidimensional OLAP

systems Store data in sparse arrays instead of

relational tables

39

MOLAP systems Relational tuple: (jacket, K-mart, 1996, $40) MOLAP representation:

Stores only ‘$40’ in a sparse array Position in array encodes (jacket,K-mart,1996)

Arrays are “chunked” and compressed for efficient storage

40

Problem No concept of “reordering” to bring

together related tuples Order cell visits to simultaneously

compute several aggregates whilst minimizing memory requirements and # of cell visits

41

Efficient array storage: Issues Array too large to fit in memory: Split

into “chunks” that fit in memory Even with chunking, array may be

sparse: Compression needed Standard PL technique of storing arrays

in row/column major form inefficient Creates asymmetry amongst dimensions,

favoring one over the other

42

Chunking Divide n-D array into small n-D

chunks and store each chunk as independent object on disk

Keep size of chunk = disk block size

We shall use chunks having same size along each dimension

43

Compressing sparse arrays Store “dense” chunks as is (>40% occ.) Already a significant compression over a

relational table Sparse arrays: Use chunk-offset

compression – (offsetInChunk,data) Better than LZW etc. because:

Uses domain knowledge LZW data requires decompression before

use

44

Loading arrays from tables Input: Table, dim sizes, chunksize If M < array size, partition sets of chunks

into groups which fit in memory eg: Suppose 16 chunks and 2 partitions, group chunks 0-7 & 8-16

Scan table. For each tuple, calculate & store (chunk#,offset,data) into buffer page for corresponding partition

2nd pass: For each partition, read tuples and assign to chunks in memory. Compress.

45

Basic algo (No overlapping) Eg: 3-D array ABC; To compute AB If array fits in memory, sweep plane of size

|A|*|B| along dim C, aggregating as you go If array chunked: Sweep plane of size |Ac|*|Bc| through upper left chunks. Store

result, move to chunks on right Each chunk read in only once Mem: 1 chunk + |Ac|*|Bc| plane

46

Generalization Sweep k-1 dimensional subarrays

through a k-dimensional array Multiple group-bys: Use smallest parent

optimization in cube lattice Advantage over ROLAP: Since

dimension & chunk sizes known, exact node sizes can be computed

Min Size Spanning Tree (MSST): Parent of node n is parent node n’ in lattice of min size

47

Basic array cubing algorithm: First construct MSST of the group-bys Compute a group-by Di1Di2...Dik from

parent Di1...Di.k+1 of min size: Read in each chunk of Di1...Di.k+1 along

dimension Di.k+1 and aggregate each chunk to a chunk of Di1...Dik. Once a chunk of

Di1...Dik is complete, flush and reuse mem

48

Example ABC – 16x16x16 array Chunk size: 4x4x4 Dimension order: ABC Eg: Computing BC: Read in order

1..64 After every 4, flush chunk to disk

and reuse memory

49

3-D array (Dim order ABC)

50

Multi-Way algorithm Single pass version: Assume enough

memory to compute all group-bys in 1 scan

Reduce memory requirements using a special order to scan input array, called dimension orderdimension order

A dimension orderdimension order of the array chunks is a row major order of the chunks with n dimensions D1...Dn in some order

O = (Di1,Di2,...Din)

51

Memory requirements For a given dimension order, can determine

which chunks of each group-by need to stay in memory to avoid rescanning input array

Eg: Suppose D.O. is ABC ie 1..64 BC: Sweep 1 chunk, deallocate & reuse AC: Sweep 4 chunks for entire AC AB: Sweep 16 chunks for entire AB Note: Each BC chunk is generated in DO.

Before writing a BC chunk to disk, use it to compute B,C chunks as if read in D.O.

52

Memory requirements Generalizing: (Xc=chunk size, Xd=dim size) Computing BC requires |Bc|*|Cc| memory Computing AC, AB requires |Ad|*|Cc| and

|Ad|*|Bd| memory RULE1: For a gp-by (Dj1,...,Djn-1) of array

(D1,...Dn) with DO=(D1,...Dn), if (Dj1,...,Djn-1) contains a prefix of (D1,...Dn) of length p, then mem requirement for computing (Dj1,...,Djn-1) is:

p

i

n

piii CD

1

1

1

||*||

53

Minimum Memory Spanning Tree MMST: Got from lattice by

choosing parents for each node N as per RULE1

Choose the parent that minimizes memory requirements: The prefix of the parent node contained in node N must have minimum length

Break ties by choosing node with minimum size as parent

54

MMST

55

Optimal dimension order Different dimension orders may generate

different MMSTs which have vastly different memory requirements

Eg: 4D array ABCD Dimension sizes: 10,100,1000,10000 resp. Chunk size =10x10x10x10

Figure shows MMSTs for dim orders ABCD and DBCA

Optimal dimension order is (D1,...,Dn) where

||...|||| 21 nDDD

57

Multi-pass Multi-way Array Algorithm

Single pass algo assumed we had memory MT required by MMST T of optimal dimension order

If M <= MT, we cannot allocate memory for some of the subtrees of MMST (incomplete subtrees)

Optimal memory allocation to different subtrees likely to be NP-Hard

58

Heuristic Allocate memory to subtrees of

root from the right to left order Intuition: Rightmost node will be

the largest array and we want to avoid computing it in multiple passes

59

Algorithm Create MMST T for opt D.O. O Add T to ToBeComputed list For each tree T’ in ToBeComputed list:

Create working subtree W and incomplete subtrees Is (allocate mem=node chunksize)

Scan chunks of root of T’ in order O Compute group-bys in W and write to disk Write intermediate result of aggregation of Dj1...Djn-1

group by for each chunk of D1,...,Dn

For each R=root(I) Generate chunks of Dj1...Djn-1 from the partitions of R & write

to disk (Merge several intermediate results into 1 chunk) Add I to ToBeComputed

60

ROLAP vs MOLAP Proposed algorithm more efficient than

existing ROLAP techniques (even indirectly): Scan relational table and load into array Compute the cube on resulting array Dump resulting CUBEd array into tables

Why better? ROLAP table sizes much bigger than compressed

array sizes Multiple sorts reqd. MOLAP does not require sorts

since the multidimensional array captures relationships amongst all the different dimensions

computing the cube

Documents

cost of computing j

pdcapproachesbasic group

compute j

sorting cost seij

sort cost ueijpipesort

sid p p

related groupbyswhat

additional replicas