data warehousing theory and modelling techniques graduate course on dimensional modelling

22
Data warehousing Data warehousing theory and modelling theory and modelling techniques techniques Graduate course on Graduate course on dimensional modelling dimensional modelling

Upload: adrian-blankenship

Post on 21-Jan-2016

215 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Data warehousing theory and modelling techniques Graduate course on dimensional modelling

Data warehousing theory Data warehousing theory and modelling techniquesand modelling techniques

Graduate course on Graduate course on dimensional modellingdimensional modelling

Page 2: Data warehousing theory and modelling techniques Graduate course on dimensional modelling

1. EXTENDED DIMENSION 1. EXTENDED DIMENSION TABLE DESIGNSTABLE DESIGNS

1.1.1.1. Many-to-many dimensions Many-to-many dimensions

1.2.1.2. Many-to-many traps Many-to-many traps

1.3.1.3. Role-playing dimensions Role-playing dimensions

1.4.1.4. Organisation and parts hierarchies Organisation and parts hierarchies

1.5.1.5. Time stamping the changes Time stamping the changes

1.6.1.6. Building an audit dimension Building an audit dimension

1.7. conclusion: too few or too many 1.7. conclusion: too few or too many dimensionsdimensions

Page 3: Data warehousing theory and modelling techniques Graduate course on dimensional modelling

2. EXTENDED FACT TABLE 2. EXTENDED FACT TABLE DESIGNSDESIGNS

2.1. Facts of differing granularity and 2.1. Facts of differing granularity and allocatingallocating

2.2. Time of day2.2. Time of day

2.3. Multiple units of measurement2.3. Multiple units of measurement

2.4. Value band reporting2.4. Value band reporting

Page 4: Data warehousing theory and modelling techniques Graduate course on dimensional modelling

1.1. Many-to-many dimensions:1.1. Many-to-many dimensions:if one of the dimensions has many valuesif one of the dimensions has many values

Page 5: Data warehousing theory and modelling techniques Graduate course on dimensional modelling

1.1. Many-to-many 1.1. Many-to-many dimensions:dimensions:

if one of the dimensions has many valuesif one of the dimensions has many values

Solution: Solution: – A A bridge tablebridge table between the fact between the fact

table and the many-to-many table and the many-to-many dimensiondimension

– we must generalise the original we must generalise the original diagnoses key in the fadiagnoses key in the facct table to be a t table to be a special diagnoses key and we use a special diagnoses key and we use a group keygroup key and and a weighting factora weighting factor which sums up to 1.00which sums up to 1.00

Page 6: Data warehousing theory and modelling techniques Graduate course on dimensional modelling

1.1.1.1. Many-to-many dimensions: Many-to-many dimensions:if one of the dimensions has many valuesif one of the dimensions has many values

Page 7: Data warehousing theory and modelling techniques Graduate course on dimensional modelling

1.2.1.2. Many-to-many traps: Many-to-many traps:be aware of different cardinality when a be aware of different cardinality when a

dimension is attached to several fact tablesdimension is attached to several fact tables

Page 8: Data warehousing theory and modelling techniques Graduate course on dimensional modelling

1.3.1.3. Role-playing dimensions Role-playing dimensions

A role in a data warehouse is a situation where a single dimension A role in a data warehouse is a situation where a single dimension appears several times in the same fact table. Then the underlying appears several times in the same fact table. Then the underlying dimension may exist as a single physical table, but each of the dimension may exist as a single physical table, but each of the roles must be presented in a separately labelled view.roles must be presented in a separately labelled view.

E.g. the telecommunications industryE.g. the telecommunications industryof a single call we might register:of a single call we might register:source system providersource system providerlocal switch providerlocal switch providerlong distance providerlong distance provideradded value service provideradded value service provider

Page 9: Data warehousing theory and modelling techniques Graduate course on dimensional modelling

1.4. Organisation and parts 1.4. Organisation and parts hierarchieshierarchies

Page 10: Data warehousing theory and modelling techniques Graduate course on dimensional modelling

1.4. Organisation and parts 1.4. Organisation and parts hierarchieshierarchies

Page 11: Data warehousing theory and modelling techniques Graduate course on dimensional modelling

1.4. Organisation and parts 1.4. Organisation and parts hierarchieshierarchies

Page 12: Data warehousing theory and modelling techniques Graduate course on dimensional modelling

1.5. Time stamping1.5. Time stamping

E.g. human resources environment of a E.g. human resources environment of a large enterprise with 100.000 employeeslarge enterprise with 100.000 employees

3 kinds of queries against the HR data:3 kinds of queries against the HR data:1. summary statistics of the entire employee data base (monthly)1. summary statistics of the entire employee data base (monthly)2.2. to profile the employee population at any moment in time to profile the employee population at any moment in time

(month end or not)(month end or not)3.3. we demand that every employee transaction can be we demand that every employee transaction can be

represented distinctly: we thus want to see represented distinctly: we thus want to see eevery transaction on very transaction on a given employee, with the correct transaction sequence and a given employee, with the correct transaction sequence and the correct timing of each transactionthe correct timing of each transaction

3 = 3 = detailed transaction theorydetailed transaction theory or or fundamental truthfundamental truth

Page 13: Data warehousing theory and modelling techniques Graduate course on dimensional modelling

1.5. Time stamping1.5. Time stamping e.g. an employee transaction dimension time stamped e.g. an employee transaction dimension time stamped

with current and next transaction dates and timeswith current and next transaction dates and times

Page 14: Data warehousing theory and modelling techniques Graduate course on dimensional modelling

1.6. Building an audit dimension1.6. Building an audit dimension during the extract process in the data staging phaseduring the extract process in the data staging phase

Page 15: Data warehousing theory and modelling techniques Graduate course on dimensional modelling

2.1. Facts of differing granularity 2.1. Facts of differing granularity and allocatingand allocating E.g. shipment invoiceE.g. shipment invoice

Individual fact records should be on the lowest atomic levelIndividual fact records should be on the lowest atomic level When faced with facts of differing granularity, try to force all When faced with facts of differing granularity, try to force all

the facts to the lowest levelthe facts to the lowest level When allocating facts to the lowest level is impossible, the When allocating facts to the lowest level is impossible, the

higher level facts have to be presented in separate tableshigher level facts have to be presented in separate tables

High-level plans or forecastsHigh-level plans or forecasts: make sure that the actual : make sure that the actual aggregates exist at the same level of the plansaggregates exist at the same level of the plans– 1. Aggregate table and plan table share exactly the same dimensions1. Aggregate table and plan table share exactly the same dimensions

=> combine them in 1 single physical table=> combine them in 1 single physical table– 2. Aggregate table and plan table have 2. Aggregate table and plan table have different different dimensionsdimensions

=> combination in 1 single physical table is impossible=> combination in 1 single physical table is impossible

Page 16: Data warehousing theory and modelling techniques Graduate course on dimensional modelling

2.1. Facts of differing granularity 2.1. Facts of differing granularity and allocatingand allocating

Page 17: Data warehousing theory and modelling techniques Graduate course on dimensional modelling

2.1. Facts of differing granularity 2.1. Facts of differing granularity and allocating:and allocating: high level plans or high level plans or

forecastsforecasts

Page 18: Data warehousing theory and modelling techniques Graduate course on dimensional modelling

2.1. Facts of differing granularity 2.1. Facts of differing granularity and allocating:and allocating: high level plans or high level plans or

forecastsforecasts

The dimension plan version makes it impossible to combine with an aggregate table

Page 19: Data warehousing theory and modelling techniques Graduate course on dimensional modelling

2.2. Time of day2.2. Time of day

Page 20: Data warehousing theory and modelling techniques Graduate course on dimensional modelling

2.3. Multiple units of measure2.3. Multiple units of measure

The wrong design when fact table quantities need to be expressed in several units of measure

Page 21: Data warehousing theory and modelling techniques Graduate course on dimensional modelling

2.3. Multiple units of measure2.3. Multiple units of measure

The recommended design for multiple units of measure

Page 22: Data warehousing theory and modelling techniques Graduate course on dimensional modelling

2.4. Value band reporting2.4. Value band reporting

Report of form:Report of form:

balancebalance number of number of total of total of

rangerange accounts accounts balancesbalances

0-10000-1000 45678 45678 $10222543$10222543

1001-2000 36154 1001-2000 36154 $45455789$45455789

2001-5000 11485 2001-5000 11485 $30851455$30851455