dw concepts ii
TRANSCRIPT
-
7/27/2019 DW Concepts II
1/52
Data Warehousing.
Data Warehouse basic concepts
Data Warehouse Approach
Data Modeling concepts
OLAP (Online Analytical Processing)
Data Warehouse Implementation
Next steps in Data Warehousing
March 2005
-
7/27/2019 DW Concepts II
2/52
Data Warehouse- Concepts
Module 1
Data Warehouse basic concepts
-
7/27/2019 DW Concepts II
3/52
What is DSS?
Decision Support System
Mainly used by business to take some strategic decisions basedon the trends (comparing current fiscal to previous) and projectthe numbers based on history and some parameters
Not to run the business, OLTP systems takes care of the day today activities of a business. Example SAP Order Managementtakes care of the orders which an organization gets. In the DSSwe collect all the data to do the analysis.
-
7/27/2019 DW Concepts II
4/52
Advantages of DSS
A grocery store chain in US gives various information from DSSdirectly to store manager. Example, the system can predict the aparticular stock outage in the store. Based on the history systemknows for every 3 hours there should be sale on one particularitem, if the DSS system did not see a transaction from last 3hours it sends an SMS to current shifts manager mobile. Thatsthe level you can go with the DSS. It takes time to get there.(Now this DW is called Active DW)
One of the retail major does the customer profiling, store salesanalysis etc on the data warehouse, its implemented on
Teradata.
-
7/27/2019 DW Concepts II
5/52
OLTP
Online Transaction processing system
Examples of OLTP systems are order management,payroll etc
Always follows 3
rd
normal form, while designing thedatabase
All the Data Manipulation Language (DMLInsert,Update, Delete, Select) types are active
Deal with specific data (customer x, product z etc)
-
7/27/2019 DW Concepts II
6/52
OLTP vs DSS
More DML operations(Update, Delete,Inserts)
Point Queries
Very specific whileissuing queries
Less history(approximately 6months to 1 year)
Used for day todayactivities (must to runthe business)
No change in the data (Noupdates and deletes)
Queries based on time
period, set of products, set
of customers etc
Maintains the history.
Used mainly for analytics
(trend analysis, customer
behavior etc)
-
7/27/2019 DW Concepts II
7/52
General DSS Architecture
Source Data
OLTP 1
OLTP 2
Market Place
Web clicks
Data
Warehouse
Database
Database
Pre
Defined
Reports
Ad hocReporting
OLAP
Cubes
ODS
Staging
DB
ETL
(Tool or
TSQL)
Close the loop (write back to OLTP about the findings in DSS
Data
Mining
-
7/27/2019 DW Concepts II
8/52
-
7/27/2019 DW Concepts II
9/52
Example for a DSS
OLTP 1OLTP 2
OLTP 3 OLTP 4
Data
WarehouseOLAP
Reporting
Analytics
-
7/27/2019 DW Concepts II
10/52
Data Warehouse- Concepts
Module 2
Data Warehouse Approach
-
7/27/2019 DW Concepts II
11/52
Distributed Approach
Various departments can start creating
different data marts. Each can start working
independently and see the ROI in a short
span. In the long run integrating these data adds
the complexity and Cost will be higher as
there are more systems to maintain.
-
7/27/2019 DW Concepts II
12/52
Gives only part ofthe answer
Requires time andeffort to put thepieces together
No guarantee its
the right answer
Distributed Approach
How We Are Different
-
7/27/2019 DW Concepts II
13/52
Centralized Approach
Centralized data warehouse contains the data in one place, easyto answer any business question. In the long run this has thecost advantage over the non-centralized data warehouse. Notvery easy to implement as it needs more time and resources.ROI wont be seen until the implementation is completed. So
recommended approach is to implement the centralized datawarehouse is, start with one subject area and keep adding onesubject area at a time, this way organization will get the see theROI at various stages.
-
7/27/2019 DW Concepts II
14/52
Delivers oneversion of
the truthfor
increasedconfidenceand speed indecision-making
Centralized Approach to DSS
How We Are Different
-
7/27/2019 DW Concepts II
15/52
Data Warehouse- Concepts
Module 3
Data Modeling concepts
-
7/27/2019 DW Concepts II
16/52
Data Modeling
OLTP3rdNormal Form
DSSDimensional Modeling
(Star Schema
Snow flake schema)
Why Dimensional Modeling?
-
7/27/2019 DW Concepts II
17/52
Data Modeling for OLTP
Usually 3rdnormal form.
Advantages : Flexibility to modify for the
changes. No redundancy of the data in the
model.
Disadvantages : Complex queries to
generate the reports as the number of tables
to join are usually high.
-
7/27/2019 DW Concepts II
18/52
Dimensional Modeling for DSS
Star Schema, Snowflake schema
Based on RDBMS we have to choose what type of model suitsbetter.
Example: Teradata is an RDBMS which can give the results inreasonable time as its a parallel processing database engine in
the market. So we can design the Enterprise data model in the3rdnormal form. But we cant have the same approach for SQLserver or Oracle, we should think of denormalizing the datamodel.
Star Schema makes queries run faster as the number of tables tojoin is less.
In star schema all the hierarchies defined per dimension will bestored in single table. So the data redundancy is high. In snowflake we can have one more table for the hierarchy. Thats thedifference between the star schema and snow flake schema.
-
7/27/2019 DW Concepts II
19/52
Star Schema
Star schema is optimized for queries. You will
have some level of redundant data available
in star schema based data model.
-
7/27/2019 DW Concepts II
20/52
Star Schema (in RDBMS)
-
7/27/2019 DW Concepts II
21/52
Star Schema Example
-
7/27/2019 DW Concepts II
22/52
Star Schema with Sample Data
-
7/27/2019 DW Concepts II
23/52
Snow flake
Snow flake wont have much of redundantdata as most of the dimensions will have alook table. This way the number of joins
between the tables will become more. Both have advantages and dis advantages,
so analyze the end users requirements andspace constraints to pick the best.
-
7/27/2019 DW Concepts II
24/52
The Snowflake Schema
STORE KEYStore Dimension
Store Description
City
District ID
State
District_ID
District Desc.
Region_ID
Region_ID
Region Desc.
Regional Mgr.
STORE KEY
PRODUCT KEY
PERIOD KEY
Dollars
Units
Price
Store Fact Table
-
7/27/2019 DW Concepts II
25/52
The Snowflake Schema No LEVEL in dimension tables
Dimension tables are normalized by decomposing at the attribute level
Each dimension table has one key for each level of the dimensions
hierarchy
The lowest level key joins the dimension table to both the fact table and
the lower level attribute table
-
7/27/2019 DW Concepts II
26/52
The Snowflake Schema
Additional features: The original Store Dimension table, completely de-
normalized, is kept intact, since certain queries can benefit by its all-
encompassing content.
In practice, start with a Star Schema and create the snowflakes withqueries. This eliminates the need to create separate extracts for each
table, and referential integrity is inherited from the dimension table.
Disadvantage: Complicated maintenance and metadata, explosion in the number
of tables in the database
-
7/27/2019 DW Concepts II
27/52
ETL (EExtract)
ExtractGetting data out of the source systems.This may be just a DTS package which pulls thedata, or exporting a table to a flat file in the sourcesystem.
In Teradata we have Fast Export utility where wecan export the data to a flat file.
In Oracle we have SQL*Loader to export the data toa flat file.
In SQL Server we can use a DTS package to do the
same job
-
7/27/2019 DW Concepts II
28/52
ETL (TTransform)
TransformIts not necessary to have the samedata model in source and destination. When thedata model is different from source obviously wehave to modify the source data to destinations datamodel. This process is called transformation.
Example : When we receive data from variousdistis about the reseller information we wont get thegeo information. So in the transformation logic wewill have some code which assigns the respective
geo based on the country from which you aregetting the data.
This is the simple example on transformation.
-
7/27/2019 DW Concepts II
29/52
ETL (LLoad)
LoadLoding the transformed data into thedestination datamoel (data warehouse).
As there are export functionality available in eachRDBMS there is an utility to import the data into the
database. TeradataFast Import
OracleSQL*Loader
Sybase - bcp
-
7/27/2019 DW Concepts II
30/52
Data Refresh in DSS
We have to refresh the data in DSS from varioussource systems in timely manner.
While doing so, either we should do a full refresh ofa particular table or capture only the changed data
(this process is called delta) Usually for fact tables we go for delta refresh and for
dimension tables we go for full refresh. As theenvironment is getting bigger and bigger almost allthe tables will become delta loads.
-
7/27/2019 DW Concepts II
31/52
Data Warehouse- Concepts
Module 4
OLAP (Online Analytical Processing)
-
7/27/2019 DW Concepts II
32/52
What is OLAP?
What is OLAP? Online Analytical Processing. Viewing data in a
multi dimensional way.
Why OLAP? Slice and dice for data warehouse.
RDBMS is a 2 dimensional way of storing /
viewing the data OLAP is a multi dimensional way of storing /
viewing the data
-
7/27/2019 DW Concepts II
33/52
Types in OLAP?
Three types of OLAP in the industry.
1. MOLAPMulti dimensional OLAP (Ex
MSOLAP, Essbase, Cognos).
2. ROLAPRelational OLAP ( Ex Business
Objects, Microstrategy).
3. HOLAPHybrid OLAP
-
7/27/2019 DW Concepts II
34/52
Aggregates
sale prodId storeId date amt
p1 s1 1 12p2 s1 1 11p1 s3 1 50p2 s2 1 8p1 s1 2 44
p1 s2 2 4
Add up amounts for day 1
In SQL: SELECT sum(amt) FROM SALE
WHERE date = 1
81
-
7/27/2019 DW Concepts II
35/52
Aggregates
Add up amounts by day
In SQL: SELECT date, sum(amt) FROM SALE
GROUP BY date
ans date sum
1 81
2 48
sale prodId storeId date amt
p1 s1 1 12p2 s1 1 11p1 s3 1 50p2 s2 1 8p1 s1 2 44
p1 s2 2 4
-
7/27/2019 DW Concepts II
36/52
Another Example
Add up amounts by day, product
In SQL: SELECT date, sum(amt) FROM SALE
GROUP BY date, prodId
sale prodId date amt
p1 1 62
p2 1 19
p1 2 48
drill-down
rollup
sale prodId storeId date amtp1 s1 1 12p2 s1 1 11p1 s3 1 50p2 s2 1 8p1 s1 2 44p1 s2 2 4
-
7/27/2019 DW Concepts II
37/52
Example
Produc
t
TimeM T W Th F S S
Juice
Milk
Coke
Cream
Soap
Bread
NYSF
LA
10
34
56
32
12
56
56 units of bread sold in LA on M
Dimensions:
Time, Product, Store
Attributes:
Product (upc, price, )
Store
Hierarchies:
ProductBrand
DayWeekQuarter
StoreRegionCountryroll-up to week
roll-up to brand
roll-up to region
-
7/27/2019 DW Concepts II
38/52
Summary of Operations Aggregation (roll-up)
aggregate (summarize) data to the next higher dimension element e.g., total sales by city, year total sales by region, year
Navigation to detailed data (drill-down)
Selection (slice) defines a subcube
e.g., sales where city =Gainesville and date = 1/15/90
Calculation and ranking
e.g., top 3% of cities by average income
Visualization operations (e.g., Pivot)
Time functions
e.g., time average
-
7/27/2019 DW Concepts II
39/52
Data Warehouse- Concepts
Module 3
Data Warehouse Implementation
Steps
-
7/27/2019 DW Concepts II
40/52
Typical Approach
Data Modeling is a cyclic process involving thefollowing steps
Requirement Gathering
Requirement Analysis
Requirement Validation Logical Modeling
Physical Design
Implementation
ValidationThe above cycle repeats for any upgrades or
enhancements
-
7/27/2019 DW Concepts II
41/52
Requirement Gathering
Identify the Business objectives
Identify the reporting requirements
Identify the frequency of report generation
Granularity of Information
Business rules
-
7/27/2019 DW Concepts II
42/52
Requirement Analysis
Study the requirements captured
Identify the subject areas
Identify the Measures and criteria fields
Identify the granularity of information
required
-
7/27/2019 DW Concepts II
43/52
Requirement Validation
Validate the analysis with the customer
Document Sign off.
-
7/27/2019 DW Concepts II
44/52
Logical Modeling
Identify facts and dimensions
Create Logical Model
-
7/27/2019 DW Concepts II
45/52
Physical Design
Analyze Source Systems with respect to Logical Model
Data Quality Analysis
Physical Design
Data type
Indexes
Partitioning
Database creation etc.,
Source to target mapping
Capture Transformation rules
Capture Derivation rules for derived fields
-
7/27/2019 DW Concepts II
46/52
Implementation
Database Creation
Staging Design (Design Extraction Jobs)
Develop ETL Jobs
Unit testing of ETL Jobs
Schedule Jobs Test Load
Data Validation
Performance monitoring
ETL Job tuning
Test Database performance tuning Final loading of data from source to target
-
7/27/2019 DW Concepts II
47/52
Data Warehouse- Concepts
Module 5
Next steps in Data Warehousing
-
7/27/2019 DW Concepts II
48/52
Data Mining
Literally, the purpose is to exploit raw data clusters as if they were gold mines,that is, to look for treasures buried inside. Actually, companies are trying tocomputerize the discovery proceeding of the various existing trends within largeamounts of data.
allowing the forecast of future behaviours.
Mining tools provides the sophisticated algorithms to find the specific trends with
the data available.
Example : MS Analysis Server provides the following algorithms. (Decision Treeand Clustering etc)
-
7/27/2019 DW Concepts II
49/52
Difference between OLAP and Data
Mining
OLAP
Who were my 10 best
customers last year
What was the response
rate to our mailing?
Data Mining
Which 10 customers offer
me the greatest profit
potential
What is the profile ofpeople who are likely to
respond to future
mailings?
-
7/27/2019 DW Concepts II
50/52
Business Activity Monitoring (BAM)
BAM is the technology which is used to monitor the DW or OLTPactively for certain value.
The system can run the set of process when it finds theexception and sends the information to relevant owners to takethe action.
Based on the findings immediately update the relevant OLTPsystem (conceptually its called closing the loop with DSS andOLTP)
Example - INFORAY is a BAM tool which you can use on theDW.
-
7/27/2019 DW Concepts II
51/52
Q & A
-
7/27/2019 DW Concepts II
52/52
Thank You