(kajal maam(principles of dimensional modeling

Post on 29-Mar-2015

196 Views

Category:

Documents

2 Downloads

Preview:

Click to see full reader

DESCRIPTION

Uploaded from Google Docs

TRANSCRIPT

Principles Of Dimensional Modeling

Design Requirements

Design of the DW must directly reflect the way the managers look at the business

Should capture the measurements of importance along with parameters by which these parameters are viewed

It must facilitate data analysis, i.e., answering business questions

3

What is Dimensional Modeling (DM)?

DM is a logical design technique that seeks to present the data in a standard, intuitive framework that allows for high-performance access.

Can be implemented using a relational or a multidimensional DBMS, with some restrictions.

It is different from ER modeling Every dimensional model is composed of one table with a

multipart key, called the fact table, and a set of smaller tables called dimension tables.

Each dimension table has a single-part primary key that corresponds exactly to one of the components of the multipart key in the fact table.

This characteristic "star-like" structure is often called a star schema.

ER Modeling

A logical design technique that seeks to eliminate data redundancy

Illuminates the microscopic relationships among data elements

Perfect for OLTP systems Responsible for success of transaction processing in

Relational Databases

Problems with ER Model

ER models are NOT suitable for DW? End user cannot understand or remember an ER

Model Many DWs have failed because of overly complex ER

designs Not optimized for complex, ad-hoc queries Data retrieval becomes difficult due to normalization Browsing becomes difficult

ER vs Dimensional Modeling

ER models are constituted to Remove redundant data (normalization) Facilitate retrieval of individual records having

certain critical identifiers Thereby optimizing OLTP performance

Dimensional model supports the reporting and analytical needs of a data warehouse system.

7

Comparison between the ER & Dimensional Model

Dimensional Modeling ER Model

Support adhoc querying for business analyses and complex analyses

Support for OLTP

The data model is multidimensional

The data model has two dimensions

It is asymmetric It is symmetric

Permit redundancy Removes redundancy

It is extensible, application is not changed

If the model is modified ,applications are modified

It can be done independent of expected query patterns

It is variable in structure and very vulnerable to changes in the user’s querying habits

Easy and understandable Hard for people to visualize

Models the business practically Models the micro relationships among data elements

8

Dimension Modeling Concepts

Design goals :user understanability,Query performance,resilience to change

Components of DM: Fact Tables Dimension Tables

9

Inside Dimension table

Dimensional table key Large no. of attributes Textual attributes Attributes not directly related Flattened table,not normalized Ability to drill down/roll up Multiple hierarchies Less number of records

10

Inside Fact Table

Concatenated fact table key Grain/level of data identified Fully-additive-all dimensions Semi-additive-some dimensions Large no. of records Few attributes Sparsity of data Degenerate dimensions

11

Factless Fact Table

Some fact tables have no measured facts Useful to describe events and

coverage ,tables contain information that something has/has not happened

Often used to represent many-to-many relationships

The only thing they contain is concatenated key

12

Star Schema keys

Primary keys Surrogate keys Foreign keys

Modeling Design Process

1. Identify the Business Process Source of “measurements”

2. Identify the Grain What does 1 row in the fact table represent

or mean?

3. Identify the Dimensions Descriptive context, true to the grain

4. Identify the Facts Numeric additive measurements, true to the

grain

Step 1 - Identify the Business Process

This is a business activity typically tied to a source system.

Not to be confused with a business department or function. An Orders dimensional model should support the activities of both Sales and Marketing.

“If we establish departmentally bound dimensional models, we’ll inevitably duplicate data with different labels and terminology.”

Step 2 - Identify the Grain

The level of detail associated with the fact table measurements.

A critical step necessary before steps 3 and 4. Preferably it should be at the most atomic

level possible. “How do you describe a single row in the fact

table?”

Step 3 - Identify the Dimensions

The list of all the discrete, text-like attributes that emanate from the fact table.

They are the “by” words used to describe the requirements.

Each dimension could be though of as an analytical “entry point” to the facts.

“How do business people describe the data that results from the business process?”

Step 4 - Identify the Facts

Must be true to the grain defined in step 2. Typical facts are numeric additive figures. Facts that belong to a different grain belong in

a separate fact table. Facts are determined by answering the

question, “What are we measuring?” Percentages and ratios, such as gross margin,

are non-additive. The numerator and denominator should be stored in the fact table.

18

Advantages of star schema

Easy for users to understand Optimizes navigation Most suitable for query processing

19

DM:Advanced Topics

Slowly Changing dimensionsType 1 changes: Correction of errorsIs used when

the old value of the attribute has no significance or can be discarded.

Easy and Fast

Type 2 changes: preservation history Partitions history so that fact tables properly

reflect original values. Requires use of Surrogate Keys Causes table growth due to additional history rows Users must be aware of the added complexity Effective Dates used secondary to cleaner fact joins

20

Type 3 changes: tentative soft revisions Additional attribute used to capture changes.

Used less frequently then Type 1 or 2. Relate to tentative changes in the source systems. Used to compare performances. Ability to track forward and backward

21

Large Dimensions Rapidly changing dimensions

22

Snowflake Schema

Snowflaking is a method of normalizing the dimension tables in STAR schema

Advantages: Small savings in storage space Normalized structures are easier to update and maintainDisadvantage: Schema less intuitive and end users are put off by

complexity Ability to browse through the contents difficult Degraded query performance because of additional joins

23

Star Schema

24

Flattened Star

25CSE 5331/7331

F'07

Normalized Star

26CSE 5331/7331

F'07

Snowflake Schema

27

Snowflake Schema

Star Schema

Joins: Higher number of Joins Fewer Joins

Ease of Use:

More complex queries and hence less easy to understand

Less no. of foreign keys and hence lesser query execution time

Query Performance:More foreign keys-and hence more query execution time

Less no. of foreign keys and hence lesser query execution time

Ease of maintenance/change:

No redundancy and hence more easy to maintain and change

Has redundant data and hence less easy to maintain/change

Type of Data warehouse:

Good to use for small data warehouses/datamarts

Good for large data warehouses

Dimension table:

It may have more than one dimension table for each dimension

Contains only single dimension table for each dimension

DimTable Normalization:

3 Normal Form2 Normal Denormalized Form

28

Fact Constellation schema

It is shaped like constellation of stars For each star schema or snowflake schema it is possible to

construct a fact constellation schema This schema is more complex than star or snowflake architecture,

which is because it contains multiple fact tables allows dimension tables to be shared amongst many fact tables. solution is very flexible, however it may be hard to manage and

support. The main disadvantage of the fact constellation schema is a more

complicated design because many variants of aggregation must be considered

Different fact tables are explicitly assigned to the dimensions, which are for given facts relevant. This may be useful in cases when some facts are associated with a given dimension level and other facts with a deeper dimension level.

29

Dimensional Model Star Schema

30

Snow-Flake Schema in Dimensional Modeling

31

Fact Constellation Schema

top related