olap operations - ulisboa · olap operations helena galhardas ... references • a. vaisman and e....
TRANSCRIPT
11/9/16
1
OLAP Operations
Helena Galhardas DEI/IST
References
• A. Vaisman and E. Zimányi, Data Warehouse Systems: Design and Implementation, Springer, 2014 (chpt 3)
• J. Han and M. Kamber, Data Mining: Concepts and Techniques, Morgan Kaufmann, 2001 (chpt. 2)
• C. Ciferri, R. Ciferri, L.I. Gómez, M. Schneider, A.A. Vaisman, E. Zimányi, Cube algebra: a generic user-centric model and query language for OLAP cubes. Int. J. Data Warehousing Mining 9(2), 39–65, 2013
• A. Wichert, H. Galhardas, SAD slides, MEIC/IST
2
11/9/16
2
Recap. the multidimensional model
• Multidimensional model enables one to view data from multiple perspectives and at several levels of detail
3
Example Dimensions: perspectives used to analyze the data
– Example: A 3-dimensional cube for sales data with dimensions Product, Time, and Customer, and a measure Quantity
• Attributes describe dimensions – Product dimension may have attributes ProductNumber and UnitPrice (not shown)
• Cells or facts have associated numeric values called measures – Each cell of the data cube represents Quantity of units sold by category, quarter, and
customer’s city
4
Q4
ParisLyon
Köln
Product (Category)
Tim
e (Q
uart
er)
Beverages
Q3
Q2
Q1
Berlin
Condiments
SeafoodProduce
Custo
mer(C
ity)
measure values
dimensions
11/9/16
3
Hierarchies • Allow viewing data at several granularities
– Define a sequence of mappings relating lower-level, detailed concepts to higher-level ones
– The lower level is called the child and the higher level is called the parent
– The hierarchical structure of a dimension is called the dimension schema
– A dimension instance comprises all members at all levels in a dimension
• Example – Hierarchies of the Product, Time, and Customer dimensions
5
All
Category
Product
ProductAll
Year
Semester
Quarter
Month
Day
TimeAll
Continent
Country
State
City
Customer
Customer
Outline
• OLAP operations
6
11/9/16
4
OLAP Operations: definition
• Allows these perspectives and several levels of detail to be materialized by exploiting dimensions and their hierarchies
• Provide an interactive data analysis environment
• Supported by OLAP modules – Ex: Saiku (Pentaho)
7
OLAP Operations (1)
8
Q4
FranceGermany
Product (Category)
Tim
e (Q
uarte
r)
Beverages
Q3
Q2
Q1
Condiments
SeafoodProduce
Cust
omer
(Cou
ntry
)
Q4
ParisLyon
Köln
Product (Category)
Tim
e (Q
uart
er)
Beverages
Q3
Q2
Q1
Berlin
Condiments
SeafoodProduce
Custo
mer(C
ity)
...
ParisLyon
Köln
Product (Category)
Tim
e (M
onth
)
Beverages
Mar
Feb
Jan
Berlin
Condiments
SeafoodProduce
Custo
mer(C
ity)
Dec
Q4
ParisLyon
Köln
Product (Category)
Tim
e (Q
uart
er)
Beverages
Q3
Q2
Q1
Berlin
Condiments SeafoodProduce
Custo
mer(C
ity)
11/9/16
5
9
Q4
Köln
Berlin
Paris
Produ
ct
(Cate
gory
)
Time (Quarter)
Beverages
Q3Q2Q1
Lyon
CondimentsSeafood
Produce
Cus
tom
er (C
ity)
Q4
Product (Category)
Tim
e (Q
uart
er)
Beverages
Q3
Q2
Q1
Condiments
SeafoodProduce
ParisLyon
Product (Category)Ti
me
(Qua
rter
)Beverages
Q2
Q1
Condiments
SeafoodProduceCus
tomer
(City
)
Q4
ParisLyon
Köln
Product (Category)
Tim
e (Q
uart
er)
Beverages
Q3
Q2
Q1
Berlin
Condiments
SeafoodProduce
Custo
mer(C
ity)
OLAP Operations (2)
OLAP Operations (3)
10
Q4
ParisLyon
Köln
Product (Category)
Tim
e (Q
uart
er)
Beverages
Q3
Q2
Q1
Berlin
Condiments
SeafoodProduce
Custo
mer(C
ity)
Q4
ParisLyon
Köln
Product (Category)
Tim
e (Q
uart
er)
Beverages
Q3
Q2
Q1
Berlin
Condiments
SeafoodProduce
Custo
mer(C
ity)
Q4
ParisLyon
Köln
Product (Category)
Tim
e (Q
uart
er)
Beverages
Q3
Q2
Q1
Berlin
Condiments
SeafoodProduce
Custo
mer(C
ity)
Q4
ParisLyon
Köln
Product (Category)
Tim
e (Q
uart
er)
Beverages
Q3
Q2
Q1
Berlin
Condiments
SeafoodProduce
Custo
mer(C
ity)
11/9/16
6
OLAP Operations (4)
11
SUM BY Time, Customer
84
72
93
84
Q4
Customer (City)
Tim
e (Q
uart
er)
Paris
96
Q3
Q2
Q1
Berlin
Lyon
89 106
79
8865105
82 77
61112 102
Köln
max() by quarter and city
Q4
ParisLyon
Köln
Product (Category)
Tim
e (Q
uart
er)
Beverages
Q3
Q2
Q1
Berlin
Condiments
SeafoodProduce
Custo
mer(C
ity)
Q4
ParisLyon
Köln
Product (Category)
Tim
e (Q
uart
er)
Beverages
Q3
Q2
Q1
Berlin
Condiments
SeafoodProduce
Custo
mer(C
ity)
• According to the authors of the book, agg. functions can be classified as:
– cumulative: compute the measure value of a cell from several other cells(e.g., SUM; COUNT, AVG)
– filtering: filter the members of a dimension that appears in the result (MIN, MAX); must compute not only the aggregated value but also detemine the dimension members that belong to the result
OLAP Operations (5)
12
...
ParisLyon
Köln
Product (Category)
Tim
e (M
onth
)
Beverages
Mar
Feb
Jan
Berlin
Condiments
SeafoodProduce
Custo
mer(C
ity)
Dec
...
ParisLyon
Köln
Product (Category)
Tim
e (M
onth
)
Beverages
Mar
Feb
Jan
Berlin
Condiments
SeafoodProduce
Custo
mer(C
ity)
Dec
Q4
ParisLyon
Köln
Product (Category)
Tim
e (Q
uart
er)
Beverages
Q3
Q2
Q1
Berlin
Condiments
SeafoodProduce
Custo
mer(C
ity)
Q4
ParisLyon
Köln
Product (Category)
Tim
e (Q
uart
er)
Beverages
Q3
Q2
Q1
Berlin
Condiments
SeafoodProduce
Custo
mer(C
ity) Bilbao
Madrid
11/9/16
7
Algebra of OLAP Operations
• There is not yet a standard definition of OLAP operations in a similar way to the relational algebra
• Many proposals of OLAP algebra in the literature • We adopt the one proposed in [Ciferri et al 2013]
13
Algebra of OLAP Operations - rollup
• Roll-up: aggregates measures along a dimension hierarchy (using an aggregate function) to obtain measures at a coarser granularity ROLLUP(CubeName, (Dimension Level), AggFunction(Measure))
14
Q4
FranceGermany
Product (Category)
Tim
e (Q
uarte
r)
Beverages
Q3
Q2
Q1
Condiments
SeafoodProduce
Cust
omer
(Cou
ntry
)
Q4
ParisLyon
Köln
Product (Category)
Tim
e (Q
uart
er)
Beverages
Q3
Q2
Q1
Berlin
Condiments
SeafoodProduce
Custo
mer(C
ity)
ROLLUP(Sales2012, Customer Country, SUM(Quan;ty))
11/9/16
8
Algebra of OLAP Operations – drill-down
• Drill-down moves from a more general level to a more detailed level in a hierarchy – DRILLDOWN(CubeName, (Dimension Level))
15
Q4
ParisLyon
Köln
Product (Category)
Tim
e (Q
uart
er)
Beverages
Q3
Q2
Q1
Berlin
Condiments
SeafoodProduce
Custo
mer(C
ity)
...
ParisLyon
Köln
Product (Category)
Tim
e (M
onth
)
Beverages
Mar
Feb
Jan
Berlin
Condiments
SeafoodProduce
Custo
mer(C
ity)
Dec
DRILLDOWN(Sales2012, Time Month)
Algebra of OLAP Operations – sort
• Sort returns a cube where the members of a dimension have been sorted – SORT(CubeName, Dimension, Expression [ASC | DESC]) – where the members of Dimension are sorted according to the value of
Expression
16
Q4
ParisLyon
Köln
Product (Category)
Tim
e (Q
uart
er)
Beverages
Q3
Q2
Q1
Berlin
Condiments SeafoodProduce
Custo
mer(C
ity)
Q4
ParisLyon
Köln
Product (Category)
Tim
e (Q
uart
er)
Beverages
Q3
Q2
Q1
Berlin
Condiments
SeafoodProduce
Custo
mer(C
ity)
SORT(Sales2012, Product, Category)
11/9/16
9
• Pivot (or rotate): rotates the axes of a cube to provide an alternative presentation of data – PIVOT(CubeName, (Dimension Axis)*) – where the axes are specified as {X; Y; Z; X1; Y1; Z1; : : :}.
17
Algebra of OLAP Operations – pivot
Q4
Köln
Berlin
Paris
Produ
ct
(Cate
gory
)
Time (Quarter)
Beverages
Q3Q2Q1
Lyon
CondimentsSeafood
Produce
Cus
tom
er (C
ity)
Q4
ParisLyon
Köln
Product (Category)
Tim
e (Q
uart
er)
Beverages
Q3
Q2
Q1
Berlin
Condiments
SeafoodProduce
Custo
mer(C
ity)
PIVOT(Sales2012, Time X, Customer Y, Product Z)
• Slice: removes a dimension in a cube so a cube of n-1 dimensions is obtained from a cube of n dimensions
– SLICE(CubeName, Dimension, Level = Value)
• Dimension will be dropped by fixing a single Value in the Level; other dimensions unchanged
• Slice supposes that the granularity of the cube is at the specified level of the dimension
18
Algebra of OLAP Operations – slice
Q4
Product (Category)
Tim
e (Q
uart
er)
Beverages
Q3
Q2
Q1
Condiments
SeafoodProduceQ4
ParisLyon
Köln
Product (Category)
Tim
e (Q
uart
er)
Beverages
Q3
Q2
Q1
Berlin
Condiments
SeafoodProduce
Custo
mer(C
ity)
SLICE(Sales2012, Customer, City = ’Paris’)
11/9/16
10
• Dice: keeps the cells of a cube that satisfy a Boolean condition Φ – DICE(CubeName, Φ)
• Φ is a Boolean condition over dimension levels, attributes, and measures.
19
Algebra of OLAP Operations – dice
ParisLyon
Product (Category)
Tim
e(Q
uart
er)
Beverages
Q2
Q1
Condiments
SeafoodProduce
Custo
mer(C
ity)
Q4
ParisLyon
Köln
Product (Category)
Tim
e (Q
uart
er)
Beverages
Q3
Q2
Q1
Berlin
Condiments
SeafoodProduce
Custo
mer(C
ity)
DICE(Sales2012, (Customer.City = ’Paris’ OR Customer.City = ’Lyon’) AND (Time.Quarter = ’Q1’ OR Time.Quarter = ’Q2’))
• Drill-across: combines cells from two data cubes that have the same schema – DRILLACROSS(CubeName1, CubeName2, [Condition])
20
Algebra of OLAP Operations – drill-across
Q4
ParisLyon
Köln
Product (Category)
Tim
e (Q
uart
er)
Beverages
Q3
Q2
Q1
Berlin
Condiments
SeafoodProduce
Custo
mer(C
ity)
Q4
ParisLyon
Köln
Product (Category)
Tim
e (Q
uart
er)
Beverages
Q3
Q2
Q1
Berlin
Condiments
SeafoodProduce
Custo
mer(C
ity)
Q4
ParisLyon
Köln
Product (Category)
Tim
e (Q
uart
er)
Beverages
Q3
Q2
Q1
Berlin
Condiments
SeafoodProduce
Custo
mer(C
ity)
Sales2011-‐2012 DRILLACROSS(Sales2011, Sales2012)
11/9/16
11
• Add Measure: adds new measures to a cube – ADDMEASURE(CubeName, (NewMeasure = Expression)* )
• Drop measure: Deletes a measure from a cube schema – DROPMEASURE(CubeName, Measure*)
21
Algebra of OLAP Operations – ADD-MEASURE
Q4
ParisLyon
Köln
Product (Category)
Tim
e (Q
uart
er)
Beverages
Q3
Q2
Q1
Berlin
Condiments
SeafoodProduce
Custo
mer(C
ity)
Q4
ParisLyon
Köln
Product (Category)
Tim
e (Q
uart
er)
Beverages
Q3
Q2
Q1
Berlin
Condiments
SeafoodProduce
Custo
mer(C
ity)
ADDMEASURE(Sales2011-‐2012, PercChange = (Quan;ty2011-‐Quan;ty2012)/Quan;ty2011)
• Another ex: – Computes the value of a cell by aggregating the measures of several
nearby cells
22
Algebra of OLAP Operations – ADD-MEASURE
...
ParisLyon
Köln
Product (Category)
Tim
e (M
onth
)
Beverages
Mar
Feb
Jan
Berlin
Condiments
SeafoodProduce
Custo
mer(C
ity)
Dec
...
ParisLyon
Köln
Product (Category)
Tim
e (M
onth
)
Beverages
Mar
Feb
Jan
Berlin
Condiments
SeafoodProduce
Custo
mer(C
ity)
Dec
ADDMEASURE(Sales2012, MovAvg = AVG(Quan;ty) OVER Time 2 CELLS PRECEDING)
11/9/16
12
• Aggregation functions in OLAP are also needed at the current granularity, that is without performing roll-up.
– AggFunction(CubeName, Measure) [BY Dimension*] – Cumulative: compute the measure value of a cell from several other cells; examples are
SUM, COUNT, and AVG – Filtering: Filters the members of a dimension that appear in the result; examples are
MIN and MAX. Filtering functions compute not only the aggregated value, but also the members of the dimension that belong to the result
23
Algebra of OLAP Operations – aggregate functions
SUM BY Time, Customer
84
72
93
84
Q4
Customer (City)
Tim
e (Q
uart
er)
Paris
96
Q3
Q2
Q1
Berlin
Lyon
89 106
79
8865105
82 77
61112 102
KölnQ4
ParisLyon
Köln
Product (Category)
Tim
e (Q
uart
er)
Beverages
Q3
Q2
Q1
Berlin
Condiments
SeafoodProduce
Custo
mer(C
ity)
SUM(Sales2012, Quan;ty) BY Time, Customer
• Another example: max sales by quarter and city
24
Algebra of OLAP Operations – aggregate functions
Q4
ParisLyon
Köln
Product (Category)
Tim
e (Q
uart
er)
Beverages
Q3
Q2
Q1
Berlin
Condiments
SeafoodProduce
Custo
mer(C
ity)
Q4
ParisLyon
Köln
Product (Category)
Tim
e (Q
uarte
r)
Beverages
Q3
Q2
Q1
Berlin
Condiments
SeafoodProduce
Custo
mer(C
ity)
MAX(Sales2012, Quan;ty) BY Time, Customer
11/9/16
13
• Union merges two cubes having the same schema but disjoint instances • Ex: If CubeSpain is a cube having the same schema as the original cube but containing only
the sales to Spanish customers, we can perform: • Difference removes the cells in a cube that belong to another one; the two cubes must have
the same schema • Drill-through allows to move from data at the bottom level in a cube to data in the
operational systems from which the cube was derived; Could be used when trying to determine the reason for outlier values in a data cube 25
Algebra of OLAP Operations – union, difference, drill-through
Q4
ParisLyon
Köln
Product (Category)
Tim
e (Q
uart
er)
Beverages
Q3
Q2
Q1
Berlin
Condiments
SeafoodProduce
Custo
mer(C
ity)
Q4
ParisLyon
Köln
Product (Category)
Tim
e (Q
uart
er)
Beverages
Q3
Q2
Q1
Berlin
Condiments
SeafoodProduce
Custo
mer(C
ity) Bilbao
Madrid
UNION(Sales2012, SalesSpain)
Next Lecture
• Conceptual Data Warehouse Design
26