master of science in engineering in computer science (mse...

22
Master of Science in Engineering in Computer Science (MSE-CS) DIPARTIMENTO DI INGEGNERIA INFORMATICA AUTOMATICA E GESTIONALE ANTONIO RUBERTI Seminars in Software and Services for the Information Society Umberto Nanni 1 Seminars of Software and Services for the Information Society Umberto Nanni Data Warehousing – Introduction to the Multidimensional Model

Upload: vuongbao

Post on 16-Feb-2019

215 views

Category:

Documents


0 download

TRANSCRIPT

Master of Science in Engineering in Computer Science (MSE-CS)

DIPARTIMENTO DI INGEGNERIA INFORMATICA AUTOMATICA E GESTIONALE

ANTONIO RUBERTI

(MSE-CS)

Seminars in Software and Services for the Information Society

Umberto Nanni

1Seminars of Software and Services for the Information SocietyUmberto Nanni

Data Warehousing – Introduction to the Multidimensional Model

Multidimensional model

event in the real worldU. Nanni, di anni 28,

acquista

uno scopettone

Pippo, di anni 32,

acquista

brillantante

Pluto, di anni 77,

acquista

dentifricio

Paperino, di anni 16,

acquista

lucido da scarpe

U. Nanni, aged 28,

buysDB design:

datum in the operational DB(rappresenting the event)

vendita di

uno scopettoneone box of “DS”

dish soap is sold

uno scopettonebrillantantedentifriciolucido da scarpebuys

a box of dish soapDB design:

selection of all and only features

having operational relevance

DW design:

..., definition of aggregations (with a given granularity)

of analytic interest to some users

2Seminars of Software and Services for the Information SocietyUmberto Nanni 2

fact in the enterpise DW

(measurement

in an n-dimensional

space)

sold N units of article X

at point od sale Y

on day Z

of analytic interest to some users

Events and Facts

• EVENT (in the real world) and FACT (of

interest) are terms whose meaning is interest) are terms whose meaning is

determined by the granularity (level of detail)

of the multidimensional representation of the

Data Warehouse

Example of FACT:

3Seminars of Software and Services for the Information SocietyUmberto Nanni

Example of FACT:

on 02/05/2012 in “Rome 23” PoS (Point of Sale) 278

boxes of soap have been sold, cashing 745 Euro

Multidimensional Cube

13

/11

/20

12

06

/11

/20

12

07

/11

/20

12

08

/11

/20

12

11

/11

/20

12

12

/11

/20

12

05

/11

/20

12

05

/05

/20

04

brusher

Roma

toothpaste

brusher

dish soap

broom

13

/11

/20

12

06

/11

/20

12

07

/11

/20

12

08

/11

/20

12

11

/

12

/11

/20

12

05

/11

/20

12

Firenze

L’Aquila

Roma

Torino

Venezia

4Seminars of Software and Services for the Information SocietyUmberto Nanni

units: 278

sales: € 745

Foggia

Palermo

Dimensions

13

/11

/20

12

06

/11

/20

12

07

/11

/20

12

08

/11

/20

12

11

/11

/20

12

12

/11

/20

12

05

/11

/20

12

← Ame →

Firenze

L’Aquila

Roma

Torino

Venezia

13

/11

/20

12

06

/11

/20

12

07

/11

/20

12

08

/11

/20

12

11

/

12

/11

/20

12

05

/11

/20

12

← p

lace

toothpaste

brusher

dish soap

broom

5Seminars of Software and Services for the Information SocietyUmberto Nanni

Foggia

Venezia

Palermo

A relational schema

for the multidimensional cube

TABLE(dimens-1, …, dimens-k, measur-1, …, measur-h)

dimensions(classification attributes)

measurements(variables, metrics, indicators, …)

Example:

SALES(product, city, date, numUnitsSold, sales)

6Seminars of Software and Services for the Information SocietyUmberto Nanni

IDENTIFICATIVE KEY

Functional dependence: dimensions → measurements

In the example: product, city, date → numUnitsSold, sales

Dimensional hierarchy

Each dimension is associated with a hierarchy that groups the values at different levels of aggregation (also orthogonally).(also orthogonally).

Each node of the hierarchy is called DIMENSIONAL ATTRIBUTE

Example (time dimension):

day

Functional Dependencies:

7Seminars of Software and Services for the Information SocietyUmberto Nanni

weekmonth

four-month

period

quarter

year

Functional Dependencies:

day→ week

day → month

month → quarter

month → four-month-period

quarter → year

four-month-period → year

ALL

Access to Data Warehouse

Reporting

• predefined needs (poss., parametrized)

• automated extraction of information

OLAP

• analysis needs that are not identifiable in advance (browsing on dimensions and indicators)

• interactive data exploration, searching for information of interest

8Seminars of Software and Services for the Information SocietyUmberto Nanni

information of interest

Data Mining

• analysis needs that are not identifiable in advance

• (semi)automatic data exploration

Reporting

Report with predefined structure and format

• query– typically based on restrictions and / or aggregations– typically based on restrictions and / or aggregations

• presentation– interleaved with text data

– in tabular and / or graphic form

– customized with environment variables (date, user, ...)

• generation– on user demand

9Seminars of Software and Services for the Information SocietyUmberto Nanni

– on user demand

– periodic

– under specified conditions

• distribution– preview/print, e-mail, web, …

OLAP

• implementation:

– analysis session, divided into a series of steps, – analysis session, divided into a series of steps,

each dependent on the results obtained

previously

• tipical users:

10Seminars of Software and Services for the Information SocietyUmberto Nanni

– domain experts, not necessarily computer

experts

OLAP: example

06

/11

/20

12

07

/11

/20

12

08

/11

/20

12

/20

12

12

/11

/20

12

05

/11

/20

12

Firenze

L’Aquila

Roma

Torino0

6/1

1/2

01

2

07

/11

/20

12

08

/11

/20

12

11

/11

/20

12

12

/11

/20

12

05

/11

/20

12

toothpaste

brusher

dish soap

11Seminars of Software and Services for the Information SocietyUmberto Nanni

Palermo

OLAP – Basic operations in the multidimensional

model

RESTRICTION (filtering based on values)

• selection on specific values of dimensional attributes• selection on specific values of dimensional attributes

• in case of equality with a single value, the operator is called

SLICING (one dimension disappears)

AGGREGATION (level of abstrazion increases)

• considered one dimension, group one dimensional attribute

12Seminars of Software and Services for the Information SocietyUmberto Nanni

• considered one dimension, group one dimensional attribute

to a higher level of granularity

• joining up to the highest level (the most generic, consisting of

a single value), a dimension disappears

OLAP: Roll-Up Operator

Aggregation along one dimension

→ use of an aggregate operator on a measurement attribute

Examples:

a. sales of each product by city and month

b. daily sales per city (total on all products)

c. daily sales by product (North / Central / South)

← day →← month →

13Seminars of Software and Services for the Information SocietyUmberto Nanni

← city

← city

→(a)

OLAP: Aggregative Operators

DISTRIBUTIVE: computable from:

1. partial aggregates1. partial aggregates

– Examples: SUM, MAX, MIN, …

ALGEBRAIC: computable from:

1. partial aggregates

2. finite set of support information

– Examples : AVG, standard deviation, …

14Seminars of Software and Services for the Information SocietyUmberto Nanni

– Examples : AVG, standard deviation, …

OLISTIC: computable from:

1. original data

– Example: rank of an array

OLAP: Drill-Down Operator

Disaggregation over one dimension

→ disaggregated data are assumed to be available

Examples:Examples:

a. from “sales of each product by city and month”

to “daily sales”

b. from “monthly sales by city (total on all products)”

to “daily sales by city and product”← date →← month →

15Seminars of Software and Services for the Information SocietyUmberto Nanni

← city

← city

→ (a)

OLAP: Cuboids (dimensions)

• given a data cube, you can perform different roll-up

operations up to the disappearance of a dimension

• given a k-dimensional data cube, there exist 2k• given a k-dimensional data cube, there exist 2k

possible cuboidsall

product date city

roll-up

16Seminars of Software and Services for the Information SocietyUmberto Nanni

product, date, city

product, data product, city date, citydrill-down

roll-up

OLAP: Cuboids (aggregation levels)

• given a data cube, you can perform different roll-up operations,

increasing the aggregation level (up to the disappearance of a

dimension)dimension)

• given a k-dimensional data cube, resp. with n1, n2, …, nk

agregation levels, there exist n1×n2 ×…×nk possible cuboids

{ }

brand month

roll-up

17Seminars of Software and Services for the Information SocietyUmberto Nanni

article, date

articlebrand, month

date

article, month brand, date

drill-down

roll-up

OLAP: Slice-and-Dice Operators

Selection (up to “slicing”)

Examples:

rinsing detergents

Examples:

from “monthly sales of products by city” to:

a. monthly sales of products by city in central Italy

b. monthly sales by city of dishwasher detergents

Se

pt-

12

Oct

-12

Au

g-1

2

Sp

tt-1

2

Oct

-12

Au

g-1

218Seminars of Software and Services for the Information SocietyUmberto Nanni

brooms

rinsing detergentsdishwasher detergents

Firenze

L’Aquila

Roma

Torino

Palermo

(b)

dishwasher detergents

Firenze

L’Aquila

Roma

Torino

Palermo

Sp

tt

Oct

OLAP: Pivoting Operators

Rotating dimensions on a bidimensional table

Example

from: sales of (products by month) by (city)from: sales of (products by month) by (city)

to: sales of (products by city) by (month)

RM FI AQ TO PArinsing det aug-12 34 23 12 56 65

sep-12 56 45 23 44 67oct-12 76 34 34 55 45

dishw. det. aug-12 57 46 35 79 88

aug-12 sep-12 oct-12rinsing det. RM 34 56 76

FI 23 45 34AQ 12 23 34TO 56 44 55PA 65 67 45

dishw. det. RM 57 79 99FI 46 68 57

AQ 35 46 57two

19Seminars of Software and Services for the Information SocietyUmberto Nanni

sep-12 79 68 46 67 90oct-12 99 57 57 78 68

brooms aug-12 46 35 24 68 77sep-12 68 57 35 56 79oct-12 88 46 46 67 57

AQ 35 46 57TO 79 67 78PA 88 90 68

brooms RM 46 68 88FI 35 57 46

AQ 24 35 46TO 68 56 67PA 77 79 57

twooperations

OLAP: Drill-Through Operator

• access data at a level of aggregation lower

than the Data Warehouse

• access to reconciled data (three-level

architecture) or to the operational DB

20Seminars of Software and Services for the Information SocietyUmberto Nanni

• a sort of extreme Drill-Down

OLAP: Drill-Across Operators

• Correlate data from two or more cubes, based on

the dimensional values, returning the value in a new the dimensional values, returning the value in a new

cube

Example:

Starting from

sales by product, date, city

employers by city

21Seminars of Software and Services for the Information SocietyUmberto Nanni

employers by city

calculate:

average sales by product, employee

A Book on Datawarehouse

Matteo Golfarelli, Stefano Rizzi

Data Warehouse DesignData Warehouse Design

Modern Principles and Methodologies

McGraw-Hill, 2009.

Matteo Golfarelli, Stefano Rizzi

22Seminars of Software and Services for the Information SocietyUmberto Nanni

Matteo Golfarelli, Stefano Rizzi

Data Warehouse

Teoria e Pratica della Progettazione (2a ed.)

McGraw-Hill, 2006.