03 data warehousing
TRANSCRIPT
-
8/8/2019 03 Data Warehousing
1/17
Data Warehousing
Chapter # 3
Carlo Vercellis
-
8/8/2019 03 Data Warehousing
2/17
Data Warehouse data warehouse is the foremost
repository for the data available fordeveloping business intelligencearchitectures and decision supportsystems.
Data Warehousing indicates the wholeset of interrelated activities involvedin designing, implementing and usinga data warehouse.
-
8/8/2019 03 Data Warehousing
3/17
Data Sources of Data Warehouse Internal Data
External Data
Personal Data
-
8/8/2019 03 Data Warehousing
4/17
Internal Data
Transactional or Operational Systemscapturing day to day businessprocesses back-ofce systems, that collect basic
transactional records such as orders, invoices,inventories, production and logistics data;
front-ofce systems, that contain dataoriginating from call-center activities,customer assistance, execution of marketingcampaigns;
web-based systems, that gather salestransactions on e-commerce web sites, visitsto websites, data available on forms lled out
by existing and prospective customers.
-
8/8/2019 03 Data Warehousing
5/17
External Data Data present externally to an organization
For example:
Some agencies gather and make available
data relative to sales, market share andfuture trend predictions for specicbusiness industries, as well as economicand nancial indicators.
Other agencies provide data market
surveys and consumer opinions collectedthrough questionnaires.
Geographic Information Systems (GIS),which represent a set of applications foracquiring, organizing, storing and
presenting territorial data.
-
8/8/2019 03 Data Warehousing
6/17
Personal Data Information and Personal
Assessments stored inside
worksheets or local databases locatedin their computers.
-
8/8/2019 03 Data Warehousing
7/17
OLTP versus OLAP Software applications that are at the heart
of operational systems are referred to ason-line transaction processing (OLTP).
The whole set of tools aimed at performingbusiness intelligence analyses andsupporting decision-making processes goby the name of on-line analytical
processing (OLAP).
We can therefore assume that the function of a data warehouseis to provide input data to OLAP applications.
-
8/8/2019 03 Data Warehousing
8/17
-
8/8/2019 03 Data Warehousing
9/17
Data MartsFunctional or departmental data
warehouse of a smaller size and a
more specic type than the overallcompany data warehouse.
-
8/8/2019 03 Data Warehousing
10/17
-
8/8/2019 03 Data Warehousing
11/17
Architecture & Functions of a DW
-
8/8/2019 03 Data Warehousing
12/17
Data Warehouse: Implementation
Approach
Top-down. The top-down methodology is based on the overalldesign of the data warehouse, and is therefore more
systematic. However, it implies longer development times andhigher risks of not being completed within schedule since thewhole data warehouse is actually being developed.
Bottom-up. The bottom-up method is based on the use ofprototypes and therefore system extensions are madeaccording to a step-by-step scheme. This approach is usuallyquicker, provides more tangible results but lacks an overallvision of the entire system to be developed.
Mixed. The mixed methodology is based on the overall design ofthe data warehouse, but then proceeds with a prototypingapproach, by sequentially implementing different parts of the
entire system. This approach is highly practical and usuallypreferable, since it allows small and controlled steps to be
-
8/8/2019 03 Data Warehousing
13/17
ETL Tools Extraction. During the rst phase, data are extracted
from the available internal and external sources.
Transformation. The goal of the cleaning andtransformation phase is to improve the quality of the
data extracted from the different sources, through thecorrection of inconsistencies, inaccuracies and missingvalues.
Loading. Finally, after being extracted andtransformed, data are loaded into the tables of the
data warehouse to make them available to analystsand decision support applications.
-
8/8/2019 03 Data Warehousing
14/17
Dimensional Model: Star Schema
-
8/8/2019 03 Data Warehousing
15/17
Three-Dimensional Cube
-
8/8/2019 03 Data Warehousing
16/17
Hierarchies of concepts and OLAP
operations Drill-Down
Drill Up
Slice & Dice Pivot
-
8/8/2019 03 Data Warehousing
17/17
Materialization of cubes of data OLAP analyses developed by knowledge
workers may need to access theinformation associated with severalcuboids, based on the specic queries andanalyses being carried out. In order toguarantee adequate response time, it mightbe useful to design a data warehousewhere all (or at least a large portion of)values of the measures of interest
associated with all possible cuboids arepre-calculated. This approach is termedfull materialization of the informationrelative to the data cubes.