l16 l17 datawarehouse
DESCRIPTION
TRANSCRIPT
Rushdi Shams, Dept of CSE, KUET
Database SystemsDatabase Systems
Data WarehousingData WarehousingVersion 1.0Version 1.0
1
Rushdi Shams, Dept of CSE, KUET
The Advent of Data WarehousingThe Advent of Data Warehousing
The existing database models were not The existing database models were not suitable to meet the requirements.suitable to meet the requirements.
The requirements can be categorized into The requirements can be categorized into two-two-
1.1. Operational Use Operational Use 2.2. Decision Support UseDecision Support Use
2
Rushdi Shams, Dept of CSE, KUET
Operational UseOperational Use Requires precise, accurate, and Requires precise, accurate, and instantinstant picture of picture of
databasedatabase Day to day basis business-Day to day basis business-1.1. Customer comesCustomer comes2.2. Orders partsOrders parts
1.1. Search the partsSearch the parts2.2. Book/purchase the partsBook/purchase the parts3.3. Add datesAdd dates
1.1. Bank transactions on the purchase/ bookingBank transactions on the purchase/ booking2.2. InvoiceInvoice
3
Rushdi Shams, Dept of CSE, KUET
Operational UseOperational Use Customer-company direct interactionCustomer-company direct interaction All the information are processed All the information are processed
instantaneously (or almost instantaneously)instantaneously (or almost instantaneously)
4
Rushdi Shams, Dept of CSE, KUET
Decision Support UseDecision Support Use
Operational use magnifies the scope-Operational use magnifies the scope-Which customer, where he lives, what is his Which customer, where he lives, what is his phone number, which part he bought, how phone number, which part he bought, how much he paid, what was the date, bla bla bla…much he paid, what was the date, bla bla bla…
Decision support use narrows the scope-Decision support use narrows the scope-I need only the business related issues- which I need only the business related issues- which customer, which part he bought, how much customer, which part he bought, how much he paid and what was the datehe paid and what was the date
5
Rushdi Shams, Dept of CSE, KUET
Decision Support UseDecision Support Use
… … & the benefits are-& the benefits are- In december, the company may need to stock In december, the company may need to stock
DDR RAM more than HDDDDR RAM more than HDD SATA HDDs are more sold than PATA HDDsSATA HDDs are more sold than PATA HDDs Mr. X is our honourable customer who bought Mr. X is our honourable customer who bought
most of the RAMs and Mr. Y is our honourable most of the RAMs and Mr. Y is our honourable customer who bought most of the SATA HDDscustomer who bought most of the SATA HDDs
6
Rushdi Shams, Dept of CSE, KUET
And The War Begins…And The War Begins…
So, the conflict between lightspeed So, the conflict between lightspeed applications (OLTP) and slog future predictions applications (OLTP) and slog future predictions led an advent of data warehousing.led an advent of data warehousing.
7
Rushdi Shams, Dept of CSE, KUET
Relational DatabasesRelational Databases
Too granular, too many little piecesToo granular, too many little pieces Processing takes longer time for larger Processing takes longer time for larger
transactions by joining those little piecestransactions by joining those little pieces Very effective for Front End applications that Very effective for Front End applications that
are accessed by too many people too are accessed by too many people too frequentlyfrequently
Requires less hardware specificationRequires less hardware specification
8
Rushdi Shams, Dept of CSE, KUET
Data warehousingData warehousing
Processes large amount of informationProcesses large amount of information Too less users (basically the owners)Too less users (basically the owners) Mainly for reporting and analysisMainly for reporting and analysis Hardware requirements are hugeHardware requirements are huge
9
Rushdi Shams, Dept of CSE, KUET
The relation between themThe relation between them
Data Warehousing is simplest form of Data Warehousing is simplest form of relational databaserelational database
Try to only add data and remove data… Try to only add data and remove data… because most often changing requires huge because most often changing requires huge data processingdata processing
And you often do mistake in Keys for just two And you often do mistake in Keys for just two records, in this case you are dealing with records, in this case you are dealing with millions of records- so, think about data millions of records- so, think about data modificationsmodifications
10
Rushdi Shams, Dept of CSE, KUET
The relation between themThe relation between them
The one-many / many-many / many-one The one-many / many-many / many-one relations and key constraints of relational relations and key constraints of relational model is still present in data warehousingmodel is still present in data warehousing
11
Rushdi Shams, Dept of CSE, KUET
The Dimensional Data ModelThe Dimensional Data Model
So, if data warehouse needs a different data So, if data warehouse needs a different data model rather than relational model, what that model rather than relational model, what that would be?would be?
The answer is dimensional data modelThe answer is dimensional data model
12
Rushdi Shams, Dept of CSE, KUET
The Dimensional Data ModelThe Dimensional Data Model
Contains-Contains-1.1. FactsFacts2.2. DimensionsDimensions Fact table contains transactions. For Fact table contains transactions. For
example, invoices of all the customers for example, invoices of all the customers for the last 5 years.the last 5 years.
The dimension tables describe the fact table.The dimension tables describe the fact table.
13
Rushdi Shams, Dept of CSE, KUET
The Dimensional Data ModelThe Dimensional Data ModelStatic Data
Dynamic Data
14
Rushdi Shams, Dept of CSE, KUET
The Star SchemaThe Star Schema
The most effective approach to model data The most effective approach to model data using dimensional data model is the using dimensional data model is the Star Star SchemaSchema
15
Rushdi Shams, Dept of CSE, KUET
The Star SchemaThe Star Schema
16
Rushdi Shams, Dept of CSE, KUET
The Star Schema: Equivalent DiagramThe Star Schema: Equivalent Diagram
17
Rushdi Shams, Dept of CSE, KUET
The Star Schema: PropertiesThe Star Schema: Properties
So, a star schema contains a fact table- which So, a star schema contains a fact table- which is robust as the time goes by, very dynamic, is robust as the time goes by, very dynamic, changes all the timechanges all the time
A star schema contains dimension tables- A star schema contains dimension tables- which are static, changes very little as the which are static, changes very little as the time goes bytime goes by
Star schema aids queries to join a bulky fact Star schema aids queries to join a bulky fact table with dimension tables to be simple and table with dimension tables to be simple and not time complexnot time complex
18
Rushdi Shams, Dept of CSE, KUET
The Snowflake SchemaThe Snowflake Schema
Normalized star schemaNormalized star schema Only the dimensions are normalizedOnly the dimensions are normalized The result is a fact table connected directly The result is a fact table connected directly
with some dimension tables and some with some dimension tables and some dimension tables connected to other dimension tables connected to other dimension tablesdimension tables
19
Rushdi Shams, Dept of CSE, KUET
The Snowflake SchemaThe Snowflake Schema
Fact Table
NormalizedDimension
Dimension
20
Rushdi Shams, Dept of CSE, KUET
The Snowflake Schema: Equivalent The Snowflake Schema: Equivalent ViewView
21
Rushdi Shams, Dept of CSE, KUET
The ProblemThe Problem Not too many tables but too many layersNot too many tables but too many layers The most used Relational algebra in The most used Relational algebra in
dimensional database is dimensional database is JoinJoin Too many tables in joins, too many overheads.Too many tables in joins, too many overheads. There are not many tables here There are not many tables here But too many layers, joining one table But too many layers, joining one table
requires joining other related tables requires joining other related tables And if one of those tables (Fact) have trillions And if one of those tables (Fact) have trillions
of data, you are dead!of data, you are dead!
22
Rushdi Shams, Dept of CSE, KUET
The ProblemThe Problem
If the SALE fact table has 1 million records, and all dimensions contain 10 records each, a Cartesian product would return 106 multiplied by 109 records. That makes for 1015 records
23
Rushdi Shams, Dept of CSE, KUET
The SolutionThe Solution
Convert the snowflake schema into star Convert the snowflake schema into star schema.schema.
24
Rushdi Shams, Dept of CSE, KUET
The SolutionThe Solution
25
Rushdi Shams, Dept of CSE, KUET
The SolutionThe Solution
a join occurs between one fact table and six dimensional tables. That is a Cartesian product of 106 multiple by 106, resulting in 1012 records returned.
26
Rushdi Shams, Dept of CSE, KUET
The DifferenceThe Difference
The difference between 1012 and 1015 is three decimals.
Three decimals is not just three zeroes and thus 1,000 records. The difference is actually 1,000,000,000,000,000 – 1,000,000,000,000 = 999,000,000,000,000.
27
Rushdi Shams, Dept of CSE, KUET
Types of Dimension TablesTypes of Dimension Tables
Dimension tables showed so far are inadequateDimension tables showed so far are inadequate Typically, there are some conventions for Typically, there are some conventions for
dimension tables.dimension tables. Such as dates and locations are two common Such as dates and locations are two common
dimension tables in data warehouses.dimension tables in data warehouses. Why?? Most businesses have two common Why?? Most businesses have two common
issues- date of a transaction, place of shipment/ issues- date of a transaction, place of shipment/ deliverydelivery
28
Rushdi Shams, Dept of CSE, KUET
Types of Dimension Tables: DatesTypes of Dimension Tables: Dates
29
Rushdi Shams, Dept of CSE, KUET
Types of Dimension Tables: DatesTypes of Dimension Tables: Dates
30
Rushdi Shams, Dept of CSE, KUET
Types of Dimension Tables: LocationsTypes of Dimension Tables: Locations
Locations, states, country, continent, etcLocations, states, country, continent, etc
31
Rushdi Shams, Dept of CSE, KUET
Let’s Create a Data Let’s Create a Data Warehouse ModelWarehouse Model
32
Rushdi Shams, Dept of CSE, KUET
The Relational ModelThe Relational Model
33
Rushdi Shams, Dept of CSE, KUET
Step 1Step 1
Identify the Fact tableIdentify the Fact table The Fact table contains (mostly) transactions The Fact table contains (mostly) transactions
that occur day-to-day basis/ that are related that occur day-to-day basis/ that are related with money/ anything that is the main with money/ anything that is the main purpose of a businesspurpose of a business
34
Rushdi Shams, Dept of CSE, KUET
Step 1: Finding the Fact TableStep 1: Finding the Fact Table
35
Rushdi Shams, Dept of CSE, KUET
Step 1Step 1
So, our fact table would be (in this case) So, our fact table would be (in this case) RoyaltyRoyalty
36
Rushdi Shams, Dept of CSE, KUET
Step 2: Find Dimension TablesStep 2: Find Dimension Tables
Find the tables that are static, not dynamic… Find the tables that are static, not dynamic… dynamic one is the Fact table.dynamic one is the Fact table.
We will take a look at both the static We will take a look at both the static (dimension) tables and dynamic (fact) tables (dimension) tables and dynamic (fact) tables when we will finish step 3when we will finish step 3
37
Rushdi Shams, Dept of CSE, KUET
Step 3Step 3
Develop a snowflake schema with the fact and Develop a snowflake schema with the fact and dimension tablesdimension tables
38
Rushdi Shams, Dept of CSE, KUET
Step 3: Snowflake SchemaStep 3: Snowflake Schema
39
Rushdi Shams, Dept of CSE, KUET
Step 3: Snowflake SchemaStep 3: Snowflake Schema
40
Rushdi Shams, Dept of CSE, KUET
Step 4Step 4
Develop a star schema by denormalizing the Develop a star schema by denormalizing the snowflake schemasnowflake schema
41
Rushdi Shams, Dept of CSE, KUET
Step 4: Star SchemaStep 4: Star Schema
42
Rushdi Shams, Dept of CSE, KUET
Step 4: Star SchemaStep 4: Star Schema
43
Rushdi Shams, Dept of CSE, KUET
Surrogate Key: Important Key in Data Surrogate Key: Important Key in Data WarehouseWarehouse
A customer is recognized in table 1 by customer nameA customer is recognized in table 1 by customer name The same person in table 2 is recognized by telephone The same person in table 2 is recognized by telephone
numbernumber The same person in table 3 is recognized by SSN numberThe same person in table 3 is recognized by SSN number If you have to make table 1, 2, 3 as dimension tables, If you have to make table 1, 2, 3 as dimension tables,
then the fact table will not be able to recognize the then the fact table will not be able to recognize the same person having 3 foreign keys from those tablessame person having 3 foreign keys from those tables
44
Rushdi Shams, Dept of CSE, KUET
Surrogate Key: Important Key in Data Surrogate Key: Important Key in Data WarehouseWarehouse
45
Rushdi Shams, Dept of CSE, KUET
Understanding the Fact TableUnderstanding the Fact Table
Facts are numeric valuesFacts are numeric values Facts are not the foreign key fieldsFacts are not the foreign key fields The foreign keys are used to provide more The foreign keys are used to provide more
detail with a fact- which are the main focus of detail with a fact- which are the main focus of the businessthe business
46
Rushdi Shams, Dept of CSE, KUET
ReferenceReference
Beginning Database Design by Gavin Beginning Database Design by Gavin Powell, Wrox Publications, 2005Powell, Wrox Publications, 2005
47