g. papastefanatos 1, p. vassiliadis 2, a. simitsis 3, y. vassiliou 1 (1) national technical...

31
G. Papastefanatos 1 , P. Vassiliadis 2 , A. Simitsis 3 , Y. Vassiliou 1 (1) National Technical University of Athens, Athens, Hellas (Greece) {gpapas,yv}@dbnet.ece.ntua.gr (2) University of Ioannina, Ioannina, Hellas (Greece) [email protected] (3) HP Labs, Palo Alto, California, USA [email protected] Design Metrics for Data Warehouse Evolution

Post on 22-Dec-2015

213 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: G. Papastefanatos 1, P. Vassiliadis 2, A. Simitsis 3, Y. Vassiliou 1 (1) National Technical University of Athens, Athens, Hellas (Greece) {gpapas,yv}@dbnet.ece.ntua.gr

G. Papastefanatos1, P. Vassiliadis2, A. Simitsis3, Y. Vassiliou1

(1) National Technical University of Athens, Athens, Hellas (Greece) {gpapas,yv}@dbnet.ece.ntua.gr

(2) University of Ioannina, Ioannina, Hellas (Greece) [email protected]

(3) HP Labs, Palo Alto, California, USA [email protected]

Design Metrics for Data Warehouse Evolution

Page 2: G. Papastefanatos 1, P. Vassiliadis 2, A. Simitsis 3, Y. Vassiliou 1 (1) National Technical University of Athens, Athens, Hellas (Greece) {gpapas,yv}@dbnet.ece.ntua.gr

ER'08, Barcelona, October 2008 2

Outline

• Motivation• Graph-based modeling & DW Evolution• Metrics for data warehouse evolution• Evaluation• Conclusions

Page 3: G. Papastefanatos 1, P. Vassiliadis 2, A. Simitsis 3, Y. Vassiliou 1 (1) National Technical University of Athens, Athens, Hellas (Greece) {gpapas,yv}@dbnet.ece.ntua.gr

ER'08, Barcelona, October 2008 3

Outline

• Motivation• Graph-based modeling & DW Evolution• Metrics for data warehouse evolution• Evaluation• Conclusions

Page 4: G. Papastefanatos 1, P. Vassiliadis 2, A. Simitsis 3, Y. Vassiliou 1 (1) National Technical University of Athens, Athens, Hellas (Greece) {gpapas,yv}@dbnet.ece.ntua.gr

ER'08, Barcelona, October 2008 4

Motivation

WWW Act1

Act2

Act3

Act4

Act5

Data warehouses are evolving environments, e.g.:–A dimension is removed or renamed–The structure of a dimension table is updated–A fact table is completely decoupled from a dimension–The measures of a fact table change–An ETL source is modified, etc.

Page 5: G. Papastefanatos 1, P. Vassiliadis 2, A. Simitsis 3, Y. Vassiliou 1 (1) National Technical University of Athens, Athens, Hellas (Greece) {gpapas,yv}@dbnet.ece.ntua.gr

ER'08, Barcelona, October 2008 5

Evolution Effects

• SW and data artifacts around the warehouse (e.g., ETL activities, materialized views, reports) are affected:– Syntactically – i.e., become invalid– Semantically – i.e., must conform to the new source database

semantics• Adaptation to new semantics

– time-consuming task– treated in most of the cases manually by the

administrators/developers• Evolution-driven design is missing

Page 6: G. Papastefanatos 1, P. Vassiliadis 2, A. Simitsis 3, Y. Vassiliou 1 (1) National Technical University of Athens, Athens, Hellas (Greece) {gpapas,yv}@dbnet.ece.ntua.gr

ER'08, Barcelona, October 2008 6

We would like to know…

• Can we measure and quantify in a principled way the vulnerability of certain parts of a data warehouse environment and find these constructs that are most sensitive to evolution?

• Can we predict and quantify the impact of a change towards the rest system?

• What are the “right” measures for evaluating the quality of the design of a data warehouse, with respect to its evolution capabilities?

Page 7: G. Papastefanatos 1, P. Vassiliadis 2, A. Simitsis 3, Y. Vassiliou 1 (1) National Technical University of Athens, Athens, Hellas (Greece) {gpapas,yv}@dbnet.ece.ntua.gr

ER'08, Barcelona, October 2008 7

Outline

• Motivation• Graph-based modeling & DW Evolution• Metrics for data warehouse evolution• Evaluation• Conclusions

Page 8: G. Papastefanatos 1, P. Vassiliadis 2, A. Simitsis 3, Y. Vassiliou 1 (1) National Technical University of Athens, Athens, Hellas (Greece) {gpapas,yv}@dbnet.ece.ntua.gr

ER'08, Barcelona, October 2008 8

Data Warehouse Schema Evolution Our approach

Mechanism for performing what-if analysis for potential changes of database configurations

Graph based representation of database constructs (i.e., relations, views, constraints, queries)

Annotation of graph with rules for adapting queries to database schema evolution

Evolving databases

QueriesDatabase Schema

Graph-based modeling

for uniform representation

Metrics for Evaluating Evolution

Design

Evolving applications

Rules for Handling Evolution

Page 9: G. Papastefanatos 1, P. Vassiliadis 2, A. Simitsis 3, Y. Vassiliou 1 (1) National Technical University of Athens, Athens, Hellas (Greece) {gpapas,yv}@dbnet.ece.ntua.gr

ER'08, Barcelona, October 2008 9

Graph based representation

map-select

map-select

S

Sgroup by

from

=whereop2

op1

GB group by

W.EMP#.FK

op

op

SS

S SS

S

EMP.PK

op

V

WORKS

EMP

SalNameEmp#

Emp# Hours Proj#

HOURS

Emp#

Module

Module

Module

Module

Qfrom

map-select map-selectSUMT_HOURS

op2

op1>=

50K

AND

wherewhere

S

Emp#

S

from

map-select

SELECT Emp#, SUM(Hours) as T_HOURSFROM VGROUP BY Emp#

CREATE VIEW V AS SELECT Emp#, HoursFROM EMP E, WORKS WWHERE E.Emp# = W.Emp#AND E.Sal >= 50K

WORKS (Emp#, Proj#,Hours)

EMP(Emp#, Name, Sal)

Page 10: G. Papastefanatos 1, P. Vassiliadis 2, A. Simitsis 3, Y. Vassiliou 1 (1) National Technical University of Athens, Athens, Hellas (Greece) {gpapas,yv}@dbnet.ece.ntua.gr

ER'08, Barcelona, October 2008 10

Graph Annotation with rules

We annotate

For reacting toW

ith rule

Set of graph elements· Query Node: Q1· Attribute Node: EMP.E_TITLE· View Node: Emps_Prjs, etc.

Set of rules· Propagate· Block· Prompt

Set of evolution events· Add Attribute· Delete Attribute· Rename View, etc.

1

3

2

Page 11: G. Papastefanatos 1, P. Vassiliadis 2, A. Simitsis 3, Y. Vassiliou 1 (1) National Technical University of Athens, Athens, Hellas (Greece) {gpapas,yv}@dbnet.ece.ntua.gr

ER'08, Barcelona, October 2008 11

Graph Adaptation

Annotated Query GraphEvent

Add attribute Phone to relation EMP

Transformed Query Graph

Q

Name

EID

Name

EID

S

S

EMPS

S

map-select

map-select …

ON attribute addition TO EMP THEN propagate

Q: SELECT EID, Name FROM EMP

Q: SELECT EID, Name, Phone FROM EMP

Q

Name

EID

Name

EID

S

S

EMPS

S

map-select

map-select …

ON attribute addition TO EMP THEN propagate

Phone

S

Phone

S

map-select

Page 12: G. Papastefanatos 1, P. Vassiliadis 2, A. Simitsis 3, Y. Vassiliou 1 (1) National Technical University of Athens, Athens, Hellas (Greece) {gpapas,yv}@dbnet.ece.ntua.gr

ER'08, Barcelona, October 2008 12

Outline

• Motivation• Graph-based modeling & DW Evolution• Metrics for data warehouse evolution• Evaluation• Conclusions

Page 13: G. Papastefanatos 1, P. Vassiliadis 2, A. Simitsis 3, Y. Vassiliou 1 (1) National Technical University of Athens, Athens, Hellas (Greece) {gpapas,yv}@dbnet.ece.ntua.gr

ER'08, Barcelona, October 2008 13

Simple Metrics

Simple: in-degree, out-degree, degreeEMP.Emp# is more

“important” than EMP.SAL, w.r.t. how many nodes depend directly on it

map-select

map-select

S

S

from

=whereop2

op1

W.EMP#.FK

op

op

SS

S SS

S

EMP.PK

op

V

WORKS

EMP

SalNameEmp#

Emp# Hours Proj#

HOURS

Emp#

Module

Module

Module

map-select

op2

op1>=

50K

AND

wherewhere

from

Page 14: G. Papastefanatos 1, P. Vassiliadis 2, A. Simitsis 3, Y. Vassiliou 1 (1) National Technical University of Athens, Athens, Hellas (Greece) {gpapas,yv}@dbnet.ece.ntua.gr

ER'08, Barcelona, October 2008 14

Transitive Metrics

Transitive: in-degree, out-degree, degree

Variant with a view + query is more “complicated” wrt how many nodes are involved in the propagation of EMP.Emp# towards the end

map-select

map-select

S

Sgroup by

from

=whereop2

op1

GB group by

W.EMP#.FK

op

op

SS

S SS

S

EMP.PK

op

V

WORKS

EMP

SalNameEmp#

Emp# Hours Proj#

HOURS

Emp#

Module

Module

Module

Module

Qfrom

map-select map-selectSUMT_HOURS

op2

op1>=

50K

AND

wherewhere

S

Emp#

S

from

map-select

Page 15: G. Papastefanatos 1, P. Vassiliadis 2, A. Simitsis 3, Y. Vassiliou 1 (1) National Technical University of Athens, Athens, Hellas (Greece) {gpapas,yv}@dbnet.ece.ntua.gr

ER'08, Barcelona, October 2008 15

Zoomed-out degrees

41V

WORKS

EMP

Q

3

3

• Only top-level nodes are retained• Only one edge between modules is retained weighted

with the number of edges suppressed

Simple degreesTransitive degrees

Page 16: G. Papastefanatos 1, P. Vassiliadis 2, A. Simitsis 3, Y. Vassiliou 1 (1) National Technical University of Athens, Athens, Hellas (Greece) {gpapas,yv}@dbnet.ece.ntua.gr

ER'08, Barcelona, October 2008 16

Entropy-based metrics

P(v|yk) =

Vy

i

k

i

yvpaths

yvpaths

),(

),(, for all nodes yi V.

V

WORKS

EMP

Q

Probability that a node v is affected by an event occurring on another node yi :

ExamplesP(Q|V) = 1/3, P(Q|EMP) = 1/3, P(V|WORKS) = 1/2

Page 17: G. Papastefanatos 1, P. Vassiliadis 2, A. Simitsis 3, Y. Vassiliou 1 (1) National Technical University of Athens, Athens, Hellas (Greece) {gpapas,yv}@dbnet.ece.ntua.gr

Entropy-based metrics - continued

Entropy of a node v: The “sensitivity” that a node v is affected by a random event on the graph.

ER'08, Barcelona, October 2008 17

Vy

ii

i

yvPyvPvH )|(log)|( 2 , for all nodes yi V.

Page 18: G. Papastefanatos 1, P. Vassiliadis 2, A. Simitsis 3, Y. Vassiliou 1 (1) National Technical University of Athens, Athens, Hellas (Greece) {gpapas,yv}@dbnet.ece.ntua.gr

ER'08, Barcelona, October 2008 18

Outline

• Motivation• Graph-based modeling & DW Evolution• Metrics for data warehouse evolution• Evaluation• Conclusions

Page 19: G. Papastefanatos 1, P. Vassiliadis 2, A. Simitsis 3, Y. Vassiliou 1 (1) National Technical University of Athens, Athens, Hellas (Greece) {gpapas,yv}@dbnet.ece.ntua.gr

Testbed Configuration

ER'08, Barcelona, October 2008 19

TPC-DS benchmark: Web Sales schema with 3 variants–Original (1 fact – 13 dimensions)–Surrounded with views–Customer dimensions merged

Page 20: G. Papastefanatos 1, P. Vassiliadis 2, A. Simitsis 3, Y. Vassiliou 1 (1) National Technical University of Athens, Athens, Hellas (Greece) {gpapas,yv}@dbnet.ece.ntua.gr

ER'08, Barcelona, October 2008 20

Distribution of Evolution Events

Operation Distribution 1 Distribution 2 Rename Measure 29% (15) 0% (0)Add Measure 25% (13) 0% (0)Rename Dimension Attribute 21% (11) 0% (0)Add Dimension Attribute 15% (8) 37% (25)Delete Measure 6% (3) 0% (0)Delete Dimension Attribute 4% (2) 44% (30)Delete FKs 0% 13% (9)Delete Dimension Table 0% 6% (4)

Distr 1: Recorded from the Greek Public sector

Distr 2: Migration to a pure star schema

Page 21: G. Papastefanatos 1, P. Vassiliadis 2, A. Simitsis 3, Y. Vassiliou 1 (1) National Technical University of Athens, Athens, Hellas (Greece) {gpapas,yv}@dbnet.ece.ntua.gr

ER'08, Barcelona, October 2008 21

Evaluating effectiveness

• Effectiveness – how well our metrics can “forecast” the impact of

events over the different constructs of the schema• Configuration

– we used mainly the Distr. 1 of events (real data) – we tested nine configurations based on

• variations of the schema– Web Sales (WS), Web Sales extended with views (WS-views),

star variant of Web Sales (WS-star)• variations of the policy

– Block-All, Propagate-All, Mixture

Page 22: G. Papastefanatos 1, P. Vassiliadis 2, A. Simitsis 3, Y. Vassiliou 1 (1) National Technical University of Athens, Athens, Hellas (Greece) {gpapas,yv}@dbnet.ece.ntua.gr

ER'08, Barcelona, October 2008 22

Events affecting dimensions

(a) WS schema (b) WS-star schema

Page 23: G. Papastefanatos 1, P. Vassiliadis 2, A. Simitsis 3, Y. Vassiliou 1 (1) National Technical University of Athens, Athens, Hellas (Greece) {gpapas,yv}@dbnet.ece.ntua.gr

ER'08, Barcelona, October 2008 23

Total no. events on Views (schema=WS-views, policy=block)

0

10

20

30

40

50

60

70

80

90

100

ALL_SALES

PROMOTIO

NAL_SALE

CUSTOMER_C

UST_ADDRESS

DAYS30

DAYS30_6

0

DAYS60_9

0

DAYS90_1

20

DAYS120

WEBSA

LES_IT

EM_D

ATEDIM AT PT

NumberAffected

StrengthOut

StrengthTotal

WS-views schema

Events affecting views

Page 24: G. Papastefanatos 1, P. Vassiliadis 2, A. Simitsis 3, Y. Vassiliou 1 (1) National Technical University of Athens, Athens, Hellas (Greece) {gpapas,yv}@dbnet.ece.ntua.gr

ER'08, Barcelona, October 2008 24

Events affecting queries

(a) WS schema (b) WS-star schema

Page 25: G. Papastefanatos 1, P. Vassiliadis 2, A. Simitsis 3, Y. Vassiliou 1 (1) National Technical University of Athens, Athens, Hellas (Greece) {gpapas,yv}@dbnet.ece.ntua.gr

ER'08, Barcelona, October 2008 25

Comparison of design configurations

(a) only affected queries (b) all affected nodes

for Distr. 1

Page 26: G. Papastefanatos 1, P. Vassiliadis 2, A. Simitsis 3, Y. Vassiliou 1 (1) National Technical University of Athens, Athens, Hellas (Greece) {gpapas,yv}@dbnet.ece.ntua.gr

ER'08, Barcelona, October 2008 26

Comparison of design configurations

(a) only affected queries (b) all affected nodes

for Distr. 2

Page 27: G. Papastefanatos 1, P. Vassiliadis 2, A. Simitsis 3, Y. Vassiliou 1 (1) National Technical University of Athens, Athens, Hellas (Greece) {gpapas,yv}@dbnet.ece.ntua.gr

ER'08, Barcelona, October 2008 27

Outline

• Motivation• Graph-based modeling & DW Evolution• Metrics for data warehouse evolution• Evaluation• Conclusions

Page 28: G. Papastefanatos 1, P. Vassiliadis 2, A. Simitsis 3, Y. Vassiliou 1 (1) National Technical University of Athens, Athens, Hellas (Greece) {gpapas,yv}@dbnet.ece.ntua.gr

Conclusions

• A framework for handling the impact of changes in a DW environment

• A set of metrics for DW evolution– simple– transitive– entropy-based

• An extensive experimental evaluation based on both, real and synthetic dataset

• Platform: Hecataeus– A tool for visualizing and performing what-if analysis

for evolution scenariosER'08, Barcelona, October 2008 28

Page 29: G. Papastefanatos 1, P. Vassiliadis 2, A. Simitsis 3, Y. Vassiliou 1 (1) National Technical University of Athens, Athens, Hellas (Greece) {gpapas,yv}@dbnet.ece.ntua.gr

ER'08, Barcelona, October 2008 29

Gracias!

Hecataeus: A tool for visualizing and performing what-if analysis

for evolution scenarios

http://www.cs.uoi.gr/~pvassil/projects/hecataeus/index.html

Page 30: G. Papastefanatos 1, P. Vassiliadis 2, A. Simitsis 3, Y. Vassiliou 1 (1) National Technical University of Athens, Athens, Hellas (Greece) {gpapas,yv}@dbnet.ece.ntua.gr

ER'08, Barcelona, October 2008 30

http://www.cs.uoi.gr/~pvassil/projects/architecture_graph/

Questions?

Page 31: G. Papastefanatos 1, P. Vassiliadis 2, A. Simitsis 3, Y. Vassiliou 1 (1) National Technical University of Athens, Athens, Hellas (Greece) {gpapas,yv}@dbnet.ece.ntua.gr

ER'08, Barcelona, October 2008 31

Gracias!

Sources: http://en.wikipedia.org/wiki/Image:Barcelona_-_planol_ciutat_vella_1860.jpg

http://maps.google.com