introduction to data warehousingcsci253/presentations s12/data warehousin… · a data warehouse is...
TRANSCRIPT
Introduction to Data Warehousing
What is Data Warehouse ?
What is Data Warehouse ?
Data warehouse (DW) is a database mainly used to facilitate
reporting and analysis in businesses.
What is Data Warehouse ? The definition of the data warehouse focuses on data storage. The main
source of the data extracted is cleaned, transformed, cataloged and
made available for use by managers and other business professionals
for data mining, online analytical processing, market research and
decision support.
What is Data Warehouse ?
In other words many large organizations are data rich but information
poor. So Data Warehousing reverses the situation by turning the
organization to be more information rich.
What is Data Warehouse ?
It combines data from multiple and varied sources into one
comprehensive and easily manipulated database.
What is Data Warehouse ? The data stored in the warehouse are uploaded from several operational DB
systems (such as marketplace, sales etc) and integrated into one large DB.
Then it can be subdivided into small group units of data called Data Marts
allowing the user to choose the source and type of data depending on current
needs.
Why Data Warehouse ?
Scenario: ABC Pvt Ltd.
ABC Pvt Ltd is a company with several branches in
different cities. The Sales Manager wants a quarterly sales
report for the whole company. Each branch has it’s own
separate operational system.
Scenario: ABC Pvt Ltd.
City A
City B
City C
City D
Sales
Manager Report on sales per item
type for first quarter.
Solution: Date Warehousing
Extract sales information from each database.
Store the information in a common database at a
single warehouse.
Refresh and update warehouse at regular intervals so that it contains up to date information for analysis.
Retrieve and analyze information from warehouse as it contains all data with a historical perspective.
City A
City B
City C
City D
Data
Warehouse
Sales
Manager
Query &
Analysis tools
Report
Why is Data Warehousing special ?
Operational vs. Warehouse System
A data warehouse is a
- Subject-oriented,
- Integrated,
- Time-variant,
- Nonvolatile
collection of data in support of the management’s decision making
process.
Subject-oriented
Data warehouse can organized around subjects such as sales, products
and customers.
For example, to learn more about your company's sales data, you can
build a warehouse that concentrates on sales. Using this warehouse,
you can answer questions like “Who was our best customer for this
item last year? ” or “Which employee made the most sales last
month?”
It excludes data not useful in decision support process.
Integration
Data Warehouse is constructed by integrating multiple heterogeneous
sources with different data types, such as relational databases, OLTP
files, Flat files, and refined into one common coding scheme.
RDBMS
OLTP
Data
Warehouse
Flat File Data Processing
Data Transformation
Time-variant
Provides information from historical perspective
Every data key structure contains either implicitly or explicitly an
element of time.
A data warehouse generally stores data that is 5-10 years old, to be
used for comparisons, trends, and forecasting.
Nonvolatile
A Data Warehouse is always a physically separate store of data.
Due to this separation, data warehouses do not require transaction processing, recovery, or backup. The data are not updated or changed in any way once they enter the data warehouse, but are only loaded, refreshed and accessed for queries.
Operational vs. Warehouse System
Features Operational Warehouse
Characteristics Operational processing Informational processing
Orientation Transaction Analysis
User Clerk,DBA,database
professional
Knowledge workers
Function Day to day operation Decision support
Data Current Historical
View Detailed,flat relational Summarized,
multidimensional
DB design Application oriented Subject oriented
Unit of work Short ,simple transaction Complex query
Access Read/write Mostly read
Operational vs. Warehouse System
Features Operational Warehouse
Focus Data in Information out
Number of records
accessed
tens millions
Number of users thousands hundreds
DB size 100MB to GB 100 GB to TB
Priority High performance,high
availability
High flexibility, end-
user autonomy
The component of the data
warehouse
Two-Tier Architecture
For the dimensional analysis the client software may
require the data to be well structured as a star schema that
simplifies both the software operations and users view for
the data
Three-Tier Architecture
Adding application server between the client and the
warehouse to improve performance and reduce networking
traffic
It will manage the interaction with the warehouse , perform
the calculations and send the results to the client
Three-Tier (MDD) Architecture
The Multi Dimensional Database can store the data in
special structure designed to facilitate the dimensional
analysis
The application for Data
Warehousing
Data warehousing tools / Extract ,
Transform , Load
Summary of the processes the applications do
Data
Extraction and Transformation
Normalized Changed into
Relational Data &
Data
Cleaning
AT&T
ATT
AT and T
are all different
spelling of the
same name
ATT
Stored in
Data warehouse
SQL
End User
Middle
ware
http://www.youtube.com/watch?v=ripMMhMOL0s
Thank You