chapter 5 data warehousing study sections 5.2, 5.3, 5.5, pages:231-233 & snowflake schema

16
Chapter 5 DATA WAREHOUSING Study Sections 5.2, 5.3, 5.5, Pages:231-233 & Snowflake s

Upload: augusta-newton

Post on 14-Dec-2015

218 views

Category:

Documents


1 download

TRANSCRIPT

Chapter 5

DATA WAREHOUSING

Study Sections 5.2, 5.3, 5.5, Pages:231-233 & Snowflake schema

Business Intelligence• Companies collect a large amount of data from their business

operations.

• To keep track of that information, a business uses disparate software applications , such as Excel, Access, etc.

• Using multiple software makes it difficult to retrieve information in a timely manner and to perform analysis of the data.

• Business Intelligence (BI) represents the tools and systems that play a key role in integrating and analyzing all corporate data.

• • Generally illustrates intelligence in the areas of customer profiling,

market research, product profitability (by product, region, year), etc.

BI Architecture

Consists of 3 system components

– Data warehouse – Business analytics– Performance management (BPM)

Data warehouse

– A repository of cleaned and integrated historical /stable data for the entire business– Extracted from independent databases (internal & external)– Transformed (ie. cleaned and reformatted)

- A subset of a warehouse limited to a business function is called a Data Mart (eg. Sales).

Business analytics

– Reporting and queries • Multi-dimensional reports, eg. Pivot tables [see Exercise 8];• SQL Queries [Exercise 9]• Cube analysis [Chapter 6]

– Data, text and Web mining and other sophisticated mathematical and statistical tools for searching relationships [Chapter 7]

These are tools that help analyze the data towards finding solutions:

• Dashboard reports

• Production reports

Business Performance Management (BPM)

BPM supports monitoring, measuring, and comparing of sales, profit, cost, profitability, and other performance indicators

Tra

nsac

tion

Dat

a S

yste

ms

Dat

a S

ourc

e =

DW

Dat

a S

ourc

e V

iew

s

Ana

lytic

Too

lsS

QL,

Cub

es

Extraction, Transformation, and Load (ETL) Process

A data warehousing process consists of :

• Extraction (i.e., reading data from a database),

• Transformation (i.e., converting the extracted data from its previous form into the form in which it needs to be so that it can be placed into a data warehouse), and

• Load (i.e., storing the data into the data warehouse)

Data Integration and the Extraction, Transformation, and Load (ETL) Process

ETL

Data from multiple Sources

Newly integrated schema for the Data Warehouse

DW Schema Structures: Star

Note that data is un-normalized

DW Schema Structures: Snowflake

Note that data is normalized

Designing Fact Tables: Normalization• Normalization is the process of gathering attributes into tables to

eliminate redundant data (the redundancy here is EquipID EquipType)

Normalization Exercise

• FIRST (Supplier#, City, CityCode, Part#, Qty)

• Split the table into 3 different tables:– (Supplier#, City)– (City, CityCode)– (Supplier#, Part#,Qty)

Although normalized databases have less data redundancies, they are less efficient in quickly processing the data.

Hence, many DWs use Star schema.

Data Marts provide ‘views’ of the data in the Data Warehouse(we will be working with this in our SQL exercises)

Summary