data warehousing tyler helmle. introduction definition history processes and terms management...

54
Data Warehousing Tyler Helmle

Upload: august-hubbard

Post on 02-Jan-2016

220 views

Category:

Documents


0 download

TRANSCRIPT

Data Warehousing

Tyler Helmle

Introduction

Definition History Processes and Terms Management Techniques

Definition

Collection of Data Subject Oriented Integrated Time-Variant Non-Volatile

To Support Managements Decision Making Process

Subject Oriented

Particular Subject Instead of Companies Ongoing Operations

Integrated

Gathered from variety of Sources Merged into a Whole

Time-Variant

Specific Time Period Monthly Weekly Daily Etc.

Non-Volatile

Stable Never Remove Data Consistent Picture of Business

Definition

Bill Inmon 1990

Single Subject Data Mart

Possible to be Volatile Size Contraints Rolling Lengths

Definition

Ralph Kimball “a copy of transaction data specifically structured

for query and analysis.” Less insight Less depth Not Less Accurate

History

Concept Originated (Late 1980’s) IBM Researchers Barry Delvin and Paul Murphy “Business Data Warehouse” Operational Systems to Decision Support

History

Trends Corporations had multiple Decision Support

Systems Expensive Redundant

Get to one efficient warehouse Data Marts

History

Bill Inmon “Father of the Data Warehouse” Building of the Data Warehouse

1991 Taught

Data Warehouse

Design

Designing and Rolling Out Warehouse Surajit Chaudhuri and Umeshwar Dayal

Design

Define the architecture, do capacity planning, and select the storage servers, database and OLAP servers, and tools

Integrate the servers, storage, and client tools

Design the warehouse schema and views Define the physical warehouse organization,

data placement, partitioning, and access methods

Design (Cont.)

Connect the sources using gateways, ODBC drivers, or other wrappers

Design and implement scripts for data extraction, cleaning, transformation, load, and refresh

Populate the repository with the schema and view definitions, scripts, and other metadata

Design (Cont.)

Design and implement end-user applications Roll out the warehouse and applications

Design (Cont.)

Database Designs Multidimensional Approach Normalized Approach

Data Warehouse

Raw Data

Business will Decide Department? Market Data? Customer Data? Data Granularity

Data Warehouse

Data Extracting

Extract from Source Systems Database Flat Files

Converts into format for transformation

Data Transforming

Data Validation Difficult Common Problems Occur

Data Transforming

Selection of Columns Translation of Data

Make all data consistent Derive New Calculated Value Filtering Sorting Joining of Data

Data Transforming

Transposing Splitting Columns

Data Transforming Tools

Data Migration Transformation of Data

Data Scrubbing Domain Specific Knowledge Fixing or Eliminating Data

Data Auditing Patterns Inconsistencies

Data Load

Batch Processing Visibility and Management Administrator

Start, Cancel, Suspend, Resume Load Large Quick and Efficient

Data Load

Parallel Processing Splitting Data Files to provide Parallel Access Pipeline

Simultaneous running of several components Component

Simultaneous running of multiple processes

All three types usually operate at same time

Data Refresh

When? How?

Data Warehouse

Data Mart

Subset of Data Major Data Subject Increase Performance Separate Security Prove ROI before adding to Data Warehouse

Data Mart

Ease of Creation Lower Cost than Data Warehouse

Data Mart

Cons Limited Scalability Duplication of Data Data Inconsistency Low ability to leverage enterprise sources of data

Data Warehouse

Metadata

“Data about Data” Clarifies Data Explains Data Helps Navigate Through Data Facilitates (Both Human and Computers)

Understanding Usage Management

Metadata

Different from Data Sometimes can be both Point of view

Types Descriptive Administrative Structural Technical Use

Metadata

Content Describe Resource Describe Content

Mutability Immutable Mutable

Logical Function Sub-symbolic Symbolic Logical

Metadata

Back End Extract, Transform, Load

Front End Label Screens Create Reports

Data Warehouse

End Product

Analysis Querying Reporting Data Mining

Data Mining

Requires Large Amount of Data Determine Patterns in Data Determine Relationships in Data Analyze Results and Conclude

Standard Reporting

Weekly/Monthly/Yearly Same Constant Data Recent Information Arrives at Expected Time on Expected

Medium

Configurable Reports

Select Parameters to Show on Report Ex.

Time Period Region

Support of Data Remains the Same

Ad Hoc Reporting

End Users Pick and Choose Data Requires

Training Knowledge of Data

Do Not Want Users Miss Using the Data

Dashboards / Scorecards

Like Standard Reports Summarized Easy to Read Graphical

Data Warehouse

Data Warehouse Management

Data Warehouse Administrator Visibility Knowledge of what is happening

Loading, Extracting, Etc.

Data Warehouse Management

Responsibilities What is data being used for? Who is using the data? What is the response time? What kinds of activities are being submitted?

Data Warehouse Management

Data Management (Data Monitor) Counts of Data Analyzes Profiles of Data Determines Threshold of Data

Makes DWA’s job easier

Data Warehouse Management

Security Confidential? Access Restriction Encryption Decryption

Pros

One Common Data Source Ease of Reporting and Analysis

Inconsistencies Identified Right Away Retrieve Data Without Slowing Down

Operational Systems Show Actual Performance

Cons

Data Is Not Always Most Current High Costs

Installation Maintenance Support

Hard To Stay Ahead

Summary

Data Warehousing can prove to be a success Common Knowledge of a database

technology As technology and research continues, the

possibilities are endless

Questions?