dwh spring2012 lectureslides week1&2
TRANSCRIPT
Dr. Abdul Basit Siddiqui Assistant Professor
FUIEMS (Lecture Slides Week # 1)
Approach of the Course Develop an understanding of the underlying RDBMS
concepts.
Apply these concepts to VLDB / DSS environments and understand where and why they break down?
Expose the differences between RDBMS and Data Warehouse in the context of VLDB.
Provide the basics of DSS tools such as OLAP, Data Mining and demonstrate their applications.
Demonstrate the application of DSS concepts and limitations of the OLTP concepts through lab exercises.
FUIEMS Data Warehoue & Mining - Spring 2012 2
Summary of the Course Introduction & Background De-Normalization Online Analytical Processing (OLAP) Dimensional Modeling Extract-Transform-Load (ETL) Data Quality Management (DQM) Parallelism, Join and Indexing Techniques Data Mining Concepts Data Cleansing Association Rule Mining Clustering Classification
FUIEMS Data Warehoue & Mining - Spring 2012 3
Books Reference Books
W. H. Inmon, Building the Data Warehouse, John Wiley & Sons Inc., NY
R. Kimball, The Data Warehouse Toolkit, John Wiley & Sons Inc., NY
Paulraj Ponniah, Data Warehousing Fundamentals, John Wiley & Sons Inc., NY
FUIEMS Data Warehoue & Mining - Spring 2012 4
Why this Course? The World is changing / (in fact changed) Either change or Be left behind.
Missing the opportunities or going in the wrong direction has prevented us from growing.
What is the right direction? harnessing the data, in the knowledge driven economy.
Doing what can’t be or difficult to automate.
FUIEMS Data Warehoue & Mining - Spring 2012 6
Historical Overview
1960: Master Files and Reports
1965: Lots of Master Files
1970: Direct Memory Access and DBMS
1975: Online High Performance Transaction Processing
1980: PCs and 4GL Technology (MIS/DSS)
1985: Extract Programs, Extract Processing
1990: The Legacy System’s Web
FUIEMS Data Warehoue & Mining - Spring 2012 7
The Need of the Time
drowning in data AND/BUT starving for information.
Knowledge is power BUT Intelligence is absolute/super power.
FUIEMS Data Warehoue & Mining - Spring 2012 8
The Need of the Time
FUIEMS Data Warehoue & Mining - Spring 2012 9
Data
Information
Knowledge
Intelligence
POWER
($/£)
Scenario 1
FUIEMS Data Warehoue & Mining - Spring 2012 10
ABC Pvt Ltd is a company with branches at
Karachi, Quetta, Peshawar and Lahore. The Sales
Manager wants quarterly sales report. Each
branch has a separate operational system.
Scenario 1 : ABC Pvt Ltd.
FUIEMS Data Warehoue & Mining - Spring 2012 11
Karachi
Quetta
Peshawar
Lahore
Sales
Manager Sales per item type per branch
for first quarter.
Solution 1:ABC Pvt Ltd.
Extract sales information from each database.
Store the information in a common repository at a single site.
FUIEMS Data Warehoue & Mining - Spring 2012 12
Solution 1:ABC Pvt Ltd.
FUIEMS Data Warehoue & Mining - Spring 2012 13
Karachi
Quetta
Peshawar
Lahore
Data
Warehouse
Sales
Manager
Query &
Analysis tools
Report
Scenario 2
FUIEMS Data Warehoue & Mining - Spring 2012 14
One Stop Shopping Super Market has huge
operational database. Whenever Executives wants
some report, the OLTP system becomes slow and
data entry operators have to wait for some time.
Scenario 2 : One Stop Shopping
FUIEMS Data Warehoue & Mining - Spring 2012 15
Operational
Database
Data Entry Operator
Data Entry Operator
Management Wait
Report
Solution 2
Extract data needed for analysis from operational database.
Store it in warehouse.
Refresh warehouse at regular interval so that it contains up to date information for analysis.
Warehouse will contain data with historical perspective.
FUIEMS Data Warehoue & Mining - Spring 2012 16
Solution 2
FUIEMS Data Warehoue & Mining - Spring 2012 17
Operational
database
Data
Warehouse
Extract
data
Data Entry
Operator
Data Entry
Operator
Manager
Report
Transaction
Scenario 3
FUIEMS Data Warehoue & Mining - Spring 2012 18
Cakes & Cookies is a small, new company. President
of the company wants his company should grow. He
needs information so that he can make correct
decisions.
Solution 3 Improve the quality of data before loading it into the
warehouse.
Perform data cleaning and transformation before loading the data.
Use query analysis tools to support adhoc queries.
FUIEMS Data Warehoue & Mining - Spring 2012 19
Solution 3
FUIEMS Data Warehoue & Mining - Spring 2012 20
Query and Analysis
tool President
Expansion
Improvement
sales
time
Data
Warehouse
Case Study
AFCO Foods & Beverages is a new company which produces dairy, bread and meat products with production unit located at Gujranwala.
There products are sold in all the region of Pakistan.
They have sales units at provincial Head Quarters.
The President of the company wants sales information.
FUIEMS Data Warehoue & Mining - Spring 2012 21
Sales Information
January February March April
14 41 33 25
FUIEMS Data Warehoue & Mining - Spring 2012 22
Report: The number of units sold.
113
Report: The number of units sold over time
Sales Information
Jan Feb Mar Apr
Wheat Bread 6 17
Cheese 6 16 6 8
Swiss Rolls 8 25 21
FUIEMS Data Warehoue & Mining - Spring 2012 23
Report : The number of items sold for each product with
time
Product
Sales Information
Jan Feb Mar Apr
Karachi Wheat
Bread
3 10
Cheese 3 16 6
Swiss Rolls 4 16 6
Lahore Wheat
Bread
3 7
Cheese 3 8
Swiss Rolls 4 9 15
FUIEMS Data Warehoue & Mining - Spring 2012 24
Report: The number of items sold in each City for each
product with time
Product
Tim
e
Sales Information
FUIEMS Data Warehoue & Mining - Spring 2012 25
Report: The number of items sold and income in each region for
each product with time.
Jan Feb Mar Apr
Rs U Rs U Rs U Rs U
Karachi Wheat Bread 7.44 3 24.80 10
Cheese 7.95 3 42.40 16 15.90 6
Swiss Rolls 7.32 4 29.98 16 10.98 6
Lahore Wheat Bread 7.44 3 17.36 7
Cheese 7.95 3 21.20 8
Swiss Rolls 7.32 4 16.47 9 27.45 15
Data Warehousing includes
Build Data Warehouse
Online Analysis/Analytical Processing (OLAP).
Presentation.
FUIEMS Data Warehoue & Mining - Spring 2012 26
RDBMS
Flat File
Presentation
Cleaning ,Selection &
Integration
Warehouse & OLAP server Client