unit-2 · what is dimensional modeling (dm)? dm is the process of dw design is done for particular...
TRANSCRIPT
Unit-2Dimensional Modeling & DW Design
1
Prepared By: Prof. V. K. Wani
What is Dimensional Modeling (DM)?
DM is the Process of DW Design
Is Done for Particular Business Process Units.
DM consist of different types of facts, Dimensions & attributes.
The Logical design of DW is called DM
Purpose of DM
2
Prepared By: Prof. V. K. Wani
Terms Used in DM
1. Dimensions: it has all information about measurableevents/ entity.
2. Dimension Table: Stores Attributes or parametersrelated to the dimensions.
3. Fact Tables: Contains the measures for particularevents. It contains measurement in numeric values andforeign key to dimensional tables.
Dimensional Modeling Cont…3
Prepared By: Prof. V. K. Wani
Dimensional Modeling Cont…4
Prepared By: Prof. V. K. Wani
Dimensional Modeling Process
Prepared By: Prof. V. K. Wani
5
Capture Requirement as per Business Process
Indentify the Dimensional Model Grain
Indentify the Dimensions
Indentify the Facts
Benefits of DM
Prepared By: Prof. V. K. Wani
6
Understandable
Better Query Performance
Extensible
Schema
Prepared By: Prof. V. K. Wani
7
For Database the Schema represent its logicaldesign.
The organization of DM element is also calledschema.
Types of Schema
1. Star Schema
2. Snowflake Schema
3. Fact Constellation/ Galaxy Schema
Star Schema
Prepared By: Prof. V. K. Wani
8
Looks like star with Points of stars as dimension table and center as Fact table. Dimension
Table 1
Dimension Table 4 Dimension
Table 3
Dimension Table 2
Dimension Table 5
Fact Table
Star Schema Cont…
Prepared By: Prof. V. K. Wani
9
Star Schema Cont…
Prepared By: Prof. V. K. Wani
10
Star Schema Cont…
Prepared By: Prof. V. K. Wani
11
Advantages of Star Schema
•Efficient Query Execution.•Provide Enhanced performance•Provides Better analytics•It is a De-Normalized model•It is easy to understand•Star schema is mostly used for BI tools.•Dimensional Attributes can be added or deleted easily.•It is most suitable for ad hoc request•Because of its flexibility and efficiency it is a most appropriatechoice.
12
Prepared By: Prof. V. K. Wani
Disadvantages of Star Schema
•Star Schema has slim scope to represent the facts & dimensionsrelatively.•Adding Historical data is major issue in analysis.•It is not appropriate to store detail data.
13
Prepared By: Prof. V. K. Wani
Snow Flake Schema
Prepared By: Prof. V. K. Wani
14
It is variant to star schema.
Some of Dimensional tables are Normalized.
Data is divided in to additional table because ofnormalization.
Snow flaking is the process of normalizingdimensional table in star schema.
It is more Complex than star schema for DW Design
Normalization of dimensional table reduces theredundancy of data
Snow Flake Schema Cont…
Prepared By: Prof. V. K. Wani
15
Snow Flake Schema Cont…
Prepared By: Prof. V. K. Wani
16
Advantages & Limitations
Advantages
1. Easier to update & maintain Normalized tables
2. Reduces Data Redundancy
3. Suitable for table having large numbers of attributes.
4. Handle Large amount of data than star schema
Disadvantages
1. It is a Complex Structure
2. Navigation through DW is Difficult
3. Query performance is slow .
17
Prepared By: Prof. V. K. Wani
Fact Constellation Schema
Having shape like Constellations of Star
Multiple Fact tables are connected
Most complicated schema
Dimensional tables are connected to multiple fact tables
18
Prepared By: Prof. V. K. Wani
Fact Constellation Schema Cont…19
Prepared By: Prof. V. K. Wani
Fact FactDimension
Fact
Dimension
DimensionDimension
Dimension
Dimension
Dimension
Fact Constellation Schema Cont…
Prepared By: Prof. V. K. Wani
20
Limitations
Prepared By: Prof. V. K. Wani
21
Comparatively Most Complex
Difficult to manage & Support
Complicated Design due to multiple joints
Star Vs Snow Flake Schema
.
22
Prepared By: Prof. V. K. Wani
Fact Tables
1. Recurring Snapshots: Contains a Data with a grain of pertransaction.
Ex.
1. Customer Bank Transaction
2. Sales Transaction
3. Election Vote Counting
23
Prepared By: Prof. V. K. Wani
Customer Transaction Type Amount Date
Cust 1 Debit 1000 10/07/16
Cust 1 Debit 2000 11/07/16
Cust 1 Credit 5000 12/07/16
Fact Tables Cont…
2. Snapshots Fact Tables: Keeps the snapshot of data taken atspecified time.
Ex.
1. Bank Balance
2. Total Sale at end of day.
3. Voting Result
24
Prepared By: Prof. V. K. Wani
Customer Amount Date
Cust 1 1000 10/07/16
Fact Tables Cont…
3. Accumulating Fact Tables: It will list down all themeasurable incident between start and end of process.
Ex.
25
Prepared By: Prof. V. K. Wani
Date Order Status
11/07/16 Customer Put Order
12/07/16 Product Dispatched
13/07/16 Handed Over to Courier Company
14/07/16 Delivered to Courier Company
Dimensions26
Prepared By: Prof. V. K. Wani
Slowly Changing
Role Playing
Junk Dimensions
Degenerate Dimension
Conformed Dimension
BI
Slowly Changing Dimensions (SCD)
Prepared By: Prof. V. K. Wani
27
Is related to scenario where attributes records changed over time.
Types of SCD
1. Type-1 SCD (Overwrite)
2. Type-2 SCD (Partitioning)
3. Type-3 SCD (Alternate Realities)
Ex.
Customer ID Name City
100 ABC Pune
Table 1: Original Data
Slowly Changing Dimensions Cont…
Prepared By: Prof. V. K. Wani
28
1. Type-1 SCD (Overwrite)
The Original Record is replaced by new record
The old Record dose not exist any more.
Customer ID Name City
100 ABC Nashik
Table 2: New Data
Slowly Changing Dimensions Cont…
Prepared By: Prof. V. K. Wani
29
1. Type-2 SCD (Partitioning)
The new record is added in to customer dimension table.
Both the record will kept
Customer ID Name City
100 ABC Pune
101 ABC Nashik
Table 1: New Data
Slowly Changing Dimensions Cont…
Prepared By: Prof. V. K. Wani
30
1. Type-3 SCD (Alternate Realities)
The original record is modified to reflect the change.
Customer ID Name Original City
CurrentCity
Effective Date
100 ABC Pune Nashik 10/10/2015
Table 1: New Data
Conformed Dimension
A Dimension having same meaning for every facttable.
It may be related to many fact table withoutchanging the meaning.
ETL Process Can be more efficient.
It also allows you to expand the dimensions.
31
Prepared By: Prof. V. K. Wani
Degenerate Dimension
A Dimension stored in a fact table & don’t have anyrelated dimension table.
32
Prepared By: Prof. V. K. Wani
Bill Number Product Id Quantity Amount
100 111 3 1000
101 112 1 2000
102 113 2 4000
Junk Dimension
Is a Dimension that offers suitable place to store the junkattributes.
Junk attributes such as random transaction code, flags etc.
33
Prepared By: Prof. V. K. Wani
Junk_ID TxnSuccess
TxnFailed
No Cash Invalid Pin
No Balance
1 1 0 0 0 0
2 0 1 1 0 0
3 0 1 0 1 0
4 0 1 0 0 1
Role Playing Dimension
A Dimension having same Multiple valid relationwith a fact table.
34
Prepared By: Prof. V. K. Wani
Order Date
Shipment Date
Closure Date
Date Key
Click stream Source Data
It helps you to informed you more informed contents by enabling you customer visit.
Primary source for Click Stream is to monitor users browsers window and what urls are visited by customers.
35
Prepared By: Prof. V. K. Wani
Google Analytics as Click stream Source Data
It is a service offered by a Google that tracks and reports website traffic.
Data provided by GA can be used for further analysis.
It is also use full for decision making.
36
Prepared By: Prof. V. K. Wani
Google Analytics Cont…
Prepared By: Prof. V. K. Wani
37
Google Analytics Cont…
Prepared By: Prof. V. K. Wani
38
Google Analytics Cont…
Prepared By: Prof. V. K. Wani
39
Funnel Visualization in Google Analytics
Funnel enables us to visualize a process by visual representation of data. It allows us
1. Identify the steps that are creating confusion for customer
2. To understand customer behavior or status during checkout.
3. Identify the technical issues such as bugs, browsers issues & other technical constraints if any.
4. Determine Poor Performing Pages.
40
Prepared By: Prof. V. K. Wani
Funnel Visualization Cont…
41
Prepared By: Prof. V. K. Wani
Funnel Visualization Cont…
42
Prepared By: Prof. V. K. Wani
Funnel Visualization Cont…
43
Prepared By: Prof. V. K. Wani
Funnel Visualization Cont…
Prepared By: Prof. V. K. Wani
44
First WebpageSearch for Music CD/DVDs
Add the Music CD/ DVDs to Cart
Purchase CD/ DVDs By Making Payment
Final Check Out
Customer can Drop offAt any point/Stage
Funnel Visualization Cont…
Prepared By: Prof. V. K. Wani
45
1000 Visitors Search the CD/DVDs
750 Visitors Added CD/DVDs to Cart
500 Visitors Purchased CD/DVDs
200 Visitors Final Checked out
Facts46
Prepared By: Prof. V. K. Wani
Fact is a measurement of Business Activity.
Facts are numeric in nature.
Facts can be combined or calculated.
Fact table consist of two columns: Key & measures.
Types of Facts
1. Additive Facts
2. Semi Additives Facts
3. Non Additives Facts
Additive Facts
47
Prepared By: Prof. V. K. Wani
Additive Facts are measures of the fact table that can be added through out all dimensions..
Ex
Product Sale of Big Bazar
Date Branch No of Items Sold
1st January 2015 Nashik 1000
1st January 2015 Pune 1500
2nd January 2015 Nashik 2000
2nd January 2015 Pune 2500
Additive Facts
48
Prepared By: Prof. V. K. Wani
The Number of Products sales Across Dates.
The Number of Products sales Across Dates.
Product Sale of Big Bazar
Date No of Items Sold
1st January 2015 1000
1st January 2015 1500
Total Sale 2500
Product Sale of Big Bazar
Branch No of Items Sold
Nashik 1000
Nashik 2000
Total Sale 3000
Semi Additive Facts
49
Prepared By: Prof. V. K. Wani
Semi Additive Facts are measures of the fact table that can be added across some dimensions but can not be in rest of dimensions.
Ex. Account Balance, Student Attendance.Banking System
Date Account No. Current Balance
1st January 2015 1234 1000
1st January 2015 5678 1500
2nd January 2015 1234 2000
2nd January 2015 5678 2500
Semi Additive Facts
50
Prepared By: Prof. V. K. Wani
The Current Balance of all Account on 1st January.
The Current Balance of Account 1234 Across date.
.
Banking System
Date Account No. Current Balance
1st January 2015 1234 1000
1st January 2015 5678 1500
Total Balance Across all Accounts on 1st January 2500
Banking System
Date Account No. Current Balance
1st January 2015 1234 1000
2nd January 2015 1234 2000
Total Balance Across all Accounts on 1st January 3000
Non Additive Facts
51
Prepared By: Prof. V. K. Wani
Non Additive Facts are measures of the fact tablethat can’t be added through out the dimensions.
Ex: Unit Price, Ration Temperature etc…
Product Sale of Big Bazar
Date Branch Profit Margin %
1st January 2015 Nashik 30
1st January 2015 Pune 40
2nd January 2015 Nashik 60
2nd January 2015 Pune 50
Non Additive Facts
52
Prepared By: Prof. V. K. Wani
Adding Profit Margin Keeping Branch Constant
Adding Profit Margin Keeping Date Constant
Product Sale of Big Bazaar
Date Branch Profit Margin %
1st January 2015 Nashik 30
2nd January 2015 Nashik 60
Total Profit Margin 90
Product Sale of Big Bazar
Date Branch Profit Margin %
1st January 2015 Nashik 30
1st January 2015 Pune 40
Total Profit Margin 70
Hierarchical Dimensions
Prepared By: Prof. V. K. Wani
53
The Dimensions which are able to group things together in a such a waythat an organization is able to measure its performance.
The Dimensions which can be further classified.
Ex. Geographic location, Time Etc
Geographic_Key
Street
Locality
City
State
Country
Time_Key
Hours
Days
Weeks
Months
Year
Multi valued Dimensions
Prepared By: Prof. V. K. Wani
54
A Dimensions having multiple values associated with itwith different cases is called as multi valued Dimensions.
Ex.
1. Treatment of Patient on various diagnoses.
2. Account hold by customer of Various Types.
Methods to Handle Scenario
1. Extend The Attribute
2. Bridge Table
Multi valued Dimensions Cont…
Prepared By: Prof. V. K. Wani
55
Bridge Table: Bridge table will combine the variousattributes of dimensions.
Gap Between Fact and dimension table can be bridge bythis method.
Bank Account
Accnt_Id
Cust_ID
Emp_ID
Account
Accnt_Id
Cust_ID
Accnt_Type
Account Bridge
Accnt_Id
Multi valued Dimensions Cont…
Prepared By: Prof. V. K. Wani
56
Thank You
Prepared By: Prof. V. K. Wani
57