overview of data warehousing
TRANSCRIPT
-
8/14/2019 Overview Of Data Warehousing
1/62
SUSHILSUSHIL
KULKARNIKULKARNI
OVERVIEW OFOVERVIEW OF
DATA WAREHOUSINGDATA WAREHOUSING
-
8/14/2019 Overview Of Data Warehousing
2/62
-
8/14/2019 Overview Of Data Warehousing
3/62
SUSHIL KULKARNI
DATABASEDATABASE
VSVSDATA WAREHOUSEDATA WAREHOUSE
-
8/14/2019 Overview Of Data Warehousing
4/62
DATABASEDATABASE
VSVS
DATA WAREHOUSEDATA WAREHOUSE
SUSHIL KULKARNI
To accelerate decision making:
1. Right information,
2. Right time,
3.Easily accessible
Problems with database
1. Fragments
2. Operational / Information
Processing
-
8/14/2019 Overview Of Data Warehousing
5/62
SUSHIL KULKARNI
PROBLEMS !PROBLEMS !
-
8/14/2019 Overview Of Data Warehousing
6/62
PROBLEMSPROBLEMS
SUSHIL KULKARNI
-
8/14/2019 Overview Of Data Warehousing
7/62
PROBLEMS !PROBLEMS !
SUSHIL KULKARNI
IT business requires:
1. integrated
2. company-wide view of high quality
3. Fixed network with changing users
Informational Processing systemsdepartment must be separated fromoperational systems to improveperformance
-
8/14/2019 Overview Of Data Warehousing
8/62
PROBLEMS !PROBLEMS !
SUSHIL KULKARNI
No single system of data.
View of databases as a whole is difficult
Organization wants to analyze theactivities in a balance way
Customer relationship with management
-
8/14/2019 Overview Of Data Warehousing
9/62
SUSHIL KULKARNI
DEFINITIONDEFINITION
OFOFDATA WAREHOUSEDATA WAREHOUSE
-
8/14/2019 Overview Of Data Warehousing
10/62
SOSO WHAT ISWHAT IS
DATA WAREHOUSE ?DATA WAREHOUSE ?
SUSHIL KULKARNI
Subject-oriented:
customers, patients, students,products, time.
Integrated: Gathered CENTRALLY from
1.several internal systems of records
2. sources external to the organization
-
8/14/2019 Overview Of Data Warehousing
11/62
WHAT ISWHAT IS
DATA WAREHOUSE ?DATA WAREHOUSE ?
SUSHIL KULKARNI
Time - variant:
Use to study trends and changes.
Non - updatable:
cannot updated by end users.
-
8/14/2019 Overview Of Data Warehousing
12/62
SUSHIL KULKARNI
DATA SYSTEMSDATA SYSTEMS
-
8/14/2019 Overview Of Data Warehousing
13/62
SUSHIL KULKARNI
DATA SYSTEMS AVAILABLE
INFORMATION SYSTEMSOPERATIONAL SYSTEMS
-
8/14/2019 Overview Of Data Warehousing
14/62
OPERATIONAL SYSTEMSOPERATIONAL SYSTEMS
SUSHIL KULKARNI
Used to run a business in real time basedon current data and process large volumesof relatively simple read/write transactions,while providing fast response.
Examples
1. Sales order processing2. Reservation systems
3. Patient registration
-
8/14/2019 Overview Of Data Warehousing
15/62
INFORMATION SYSTEMSINFORMATION SYSTEMS
SUSHIL KULKARNI
Designed to support decision-making based on
1. Historical data
2. Prediction data.
Designed for complex queries or data-miningapplications.
Examples:
1. Sales trend analysis,
2. Customer segmentation
3. Human resources planning
-
8/14/2019 Overview Of Data Warehousing
16/62
DIFFERENCEDIFFERENCE
SUSHIL KULKARNI
Periodical batchupdates and queriesrequiring many or allrows
Many, constantupdates and querieson one or a few tablerows
Volume
Ease of flexible accessand use
Performancethroughput,
availability
Design goal
Broad, ad hoc,complex queries andanalysis
Narrow, planned, andsimple updates andqueries
Scope of usage
Managers, businessanalysts, customers
Clerks, sales-persons,administrations
Primary users
Real and analyzehistorical data.
Real time data entryPurpose
Informational
Systems
Operational
Systems
Characteristics
-
8/14/2019 Overview Of Data Warehousing
17/62
SUSHIL KULKARNI
DATA WAREHOUSEDATA WAREHOUSESCOPESCOPE
-
8/14/2019 Overview Of Data Warehousing
18/62
DATA WAREHOUSE SCOPEDATA WAREHOUSE SCOPE
SUSHIL KULKARNI
Broad :
Required for companies, Very costly, Maybe divided according to Depts.
Narrow:
Required for Personal information
-
8/14/2019 Overview Of Data Warehousing
19/62
SUSHIL KULKARNI
TYPESTYPES
OFOFDATA WAREHOUSEDATA WAREHOUSE
-
8/14/2019 Overview Of Data Warehousing
20/62
TYPES OF DATA WAREHOUSETYPES OF DATA WAREHOUSE
SUSHIL KULKARNI
Point Point
End-users allowed to get operational
databases directly using any tools
-
8/14/2019 Overview Of Data Warehousing
21/62
TYPES OF DATA WAREHOUSETYPES OF DATA WAREHOUSE
SUSHIL KULKARNI
Central Data Warehouses
-
8/14/2019 Overview Of Data Warehousing
22/62
TYPES OF DATA WAREHOUSETYPES OF DATA WAREHOUSE
SUSHIL KULKARNI
1. EIS : Executive Information System 2. DSS: Decision Support System
3. Reporting
Distributed Data Warehouse:
Certain Components of DW are distributedacross a number of different physicaldatabases
-
8/14/2019 Overview Of Data Warehousing
23/62
SUSHIL KULKARNI
BIG PICTUREBIG PICTURE
-
8/14/2019 Overview Of Data Warehousing
24/62
BIG PICTUREBIG PICTURE
SUSHIL KULKARNI
-
8/14/2019 Overview Of Data Warehousing
25/62
SUSHIL KULKARNI
END USERSEND USERS
-
8/14/2019 Overview Of Data Warehousing
26/62
END USERSEND USERS
SUSHIL KULKARNI
Executives and managers
"Power" users (business and financialanalysts, engineers, etc.)
Support users (clerical, administrative,
etc.)
-
8/14/2019 Overview Of Data Warehousing
27/62
SUSHIL KULKARNI
DATA MARTDATA MART
-
8/14/2019 Overview Of Data Warehousing
28/62
DATA MARTSDATA MARTS
SUSHIL KULKARNI
-
8/14/2019 Overview Of Data Warehousing
29/62
DATA MARTSDATA MARTS
SUSHIL KULKARNI
Create many DMs
Limited scope Independent ETL process or derived from
DW
Examples:
1.Financial DM
2. Marketing DM
3. Supply chain DM
-
8/14/2019 Overview Of Data Warehousing
30/62
D.M. PICTURED.M. PICTURE
SUSHIL KULKARNI
-
8/14/2019 Overview Of Data Warehousing
31/62
SUSHIL KULKARNI
DATADATA
ININDATA WAREHOUSEDATA WAREHOUSE
DATA INDATA IN
-
8/14/2019 Overview Of Data Warehousing
32/62
DATA INDATA IN
DATA WARE HOUSEDATA WARE HOUSE
SUSHIL KULKARNI
one version of the truth across theenterprisewith meaning full recordesFor IT staff : clean, consistent, and
documented formatted data.
For engineer or analyst: convenient, in acommon formatted data, exportable toother common formats
DATA INDATA IN
-
8/14/2019 Overview Of Data Warehousing
33/62
Production Data: Data from different
Operational systems with heterogeneous
platforms
Internal Data: Private data of organization
like spread sheets, documents, customerprofiles
DATA INDATA IN
DATA WARE HOUSEDATA WARE HOUSE
SUSHIL KULKARNI
DATA INDATA IN
-
8/14/2019 Overview Of Data Warehousing
34/62
External Data: Data from external sources.Statistics relating to their industry
produced by external agencies
Example: DW of car rental company
contains data on the current productionschedules of the leading automobilemanufactures
DATA INDATA IN
DATA WARE HOUSEDATA WARE HOUSE
SUSHIL KULKARNI
DATA INDATA IN
-
8/14/2019 Overview Of Data Warehousing
35/62
Archived Data: Data from current business
and old data store in archive files
DATA INDATA IN
DATA WARE HOUSEDATA WARE HOUSE
SUSHIL KULKARNI
DATA INDATA IN
-
8/14/2019 Overview Of Data Warehousing
36/62
Methods of archiving data:
1. Recent data is archived to separate
archival database that may be online 2. Old data is archived to flat files on disk
storage
3. Oldest data is archived to tapecartridges or microfilms or kept off line
DATA INDATA IN
DATA WARE HOUSEDATA WARE HOUSE
SUSHIL KULKARNI
-
8/14/2019 Overview Of Data Warehousing
37/62
SUSHIL KULKARNI
DATA BASEDATA BASE
STRUCTURESTRUCTURE
-
8/14/2019 Overview Of Data Warehousing
38/62
DATA BASE STRUCTUREDATA BASE STRUCTURE
SUSHIL KULKARNI
DW made up of three separate databases:
1. Interim data store
2. Meta data repository
3. Production DW
-
8/14/2019 Overview Of Data Warehousing
39/62
SUSHIL KULKARNI
OLTPOLTP
ANDANDOLAPOLAP
-
8/14/2019 Overview Of Data Warehousing
40/62
OLTPOLTP
SUSHIL KULKARNI
On line transaction processing
Standard Normalized Structure
Designed for transactions: Insert, Updates,
Delete
-
8/14/2019 Overview Of Data Warehousing
41/62
OLAPOLAP
SUSHIL KULKARNI
On line analytical processing , StarSchema [See Table]
Read Only
Historical data
Aggregated data
-
8/14/2019 Overview Of Data Warehousing
42/62
SUSHIL KULKARNI
ARCHETECTUREARCHETECTURE
ARCHITECTUREARCHITECTURE
-
8/14/2019 Overview Of Data Warehousing
43/62
ARCHITECTUREARCHITECTURE
ANDAND
END-TO - PROCESSEND-TO - PROCESS
SUSHIL KULKARNI
BACK END TOOLSBACK END TOOLS
-
8/14/2019 Overview Of Data Warehousing
44/62
BACK END TOOLSBACK END TOOLS
ANDAND
UTILITIESUTILITIES
SUSHIL KULKARNI
Tools are used to extract & loading data
Data extraction from foreign sources bygateways & interfaces
Examples: EDA/SQL, ODBC, Oracle OpenConnect, Sybase Enterprise ConnectInformix Enterprise gate way
-
8/14/2019 Overview Of Data Warehousing
45/62
PROCESS OF BRINGING DATAPROCESS OF BRINGING DATA
TO DATA WAREHOUSETO DATA WAREHOUSE
ETL PROCESSETL PROCESS
SUSHIL KULKARNI
-
8/14/2019 Overview Of Data Warehousing
46/62
Large volumes of data from multiple
sources are involvedHigh probability of errors and anomalies in
the data
Tools that help to detect data anomaliesand correct them can have a high payoff
CLEANINGCLEANING
SUSHIL KULKARNI
CLEANINGCLEANING
-
8/14/2019 Overview Of Data Warehousing
47/62
Examples where data cleaning becomesnecessary are:
1. Inconsistent field lengths,
2. Inconsistent descriptions,
3. Inconsistent value assignments,
4. Missing entries and violation of
integrity constraints.Different, classes of data cleaning tools used to
extract & loading data
1. Data Migration
2. Data scrubbing
3. Data Auditing tools
CLEANINGCLEANING
SUSHIL KULKARNI
-
8/14/2019 Overview Of Data Warehousing
48/62
Data migrationtools allow simple
transformation rules to be specified
Examples: replace the string genderby
sex.
Warehouse Manager from Prism is anexample of a popular tool of this kind.
DATA MIGRATIONDATA MIGRATION
SUSHIL KULKARNI
-
8/14/2019 Overview Of Data Warehousing
49/62
Data scrubbingtools use domain-specific
knowledge
Example: Postal addresses, to do the
scrubbing of data.
Use parsing and fuzzy matchingtechniques to accomplish cleaning from
multiple sources.
Tools: Integrity and Trillum
DATA SCRUBBINGDATA SCRUBBING
SUSHIL KULKARNI
DATA AUDITINGDATA AUDITING
-
8/14/2019 Overview Of Data Warehousing
50/62
Data auditingtools make it possible todiscover rules and relationships by
scanning data.
Example: Tool may discover a suspicious
pattern (based on statistical analysis) that
a certain car dealer has never receivedany complaints.
DATA AUDITINGDATA AUDITING
SUSHIL KULKARNI
LOADINGLOADING
-
8/14/2019 Overview Of Data Warehousing
51/62
Additional preprocessing required:
1.Checking integrity constraints
2. Sorting; summarization, aggregation
3.Other computation to build the derived tables
stored in the warehouse
Batch load utilities are used for this purpose. Inaddition to populating the warehouse, a load utility
must allow the system administrator to monitorstatus, to cancel, suspend and resume a load, andto restart after failure with no loss of data integrity.
LOADINGLOADING
SUSHIL KULKARNI
REFRESHREFRESH
-
8/14/2019 Overview Of Data Warehousing
52/62
Refreshing a warehouse consists inpropagating updates on source data tocorrespondingly update the base data andderived data stored in the warehouse.
Two sets of issues: whento refresh, andhowto refresh.
REFRESHREFRESH
SUSHIL KULKARNI
SUMMARIZATIONSUMMARIZATION
-
8/14/2019 Overview Of Data Warehousing
53/62
Required lot of space to store and requirecomputer time as well as resources.Some of the summaries may containfigures that explain the summary.
Advantage is that the data warehouse isnot calculating the summaries.
SUMMARIZATIONSUMMARIZATION
SUSHIL KULKARNI
METADATAMETADATA
-
8/14/2019 Overview Of Data Warehousing
54/62
Administrative metadata
Business metadata includes business
terms and definitions,
Operational metadata includesinformation that is collected during theoperation of the warehouse:
METADATAMETADATA
SUSHIL KULKARNI
-
8/14/2019 Overview Of Data Warehousing
55/62
The ETL Process
Capture
Scrub or data cleansing
Transform
Load and Index
ETL = Extract, transform, and load
SSt i d t ili ti
-
8/14/2019 Overview Of Data Warehousing
56/62
Steps in data reconciliationSteps in data reconciliation
Static extract=
capturing a snapshot of
the source data at a
point in time
Incremental extractIncremental extract=
capturing changes that
have occurred since the
last static extract
Capture = extractobtaining a snapshot of a
chosen subset of the source data for loading
into the data warehouse
Steps in data reconciliationSteps in data reconciliation
-
8/14/2019 Overview Of Data Warehousing
57/62
Scrub = cleanseuses pattern
recognition and AI techniques to
upgrade data quality
Fixing errors:misspellings,
erroneous dates, incorrect field
usage, mismatched addresses,
missing data, duplicate data,
inconsistencies
Also:decoding, reformatting,
time stamping, conversion, key
generation, merging, error
detection/logging, locating
missing data
Steps in data reconciliationSteps in data reconciliation
Steps in data reconciliationSteps in data reconciliation
-
8/14/2019 Overview Of Data Warehousing
58/62
Transform = convert data from
format of operational system to
format of data warehouse
Record-level:Selection data partitioning
Joining data combining
Aggregation data
summarization
Field-level:single-field from one field to
one field
multi-field from many fields to
one, or one field to many
Steps in data reconciliationSteps in data reconciliation
Steps in data reconciliationSteps in data reconciliation
-
8/14/2019 Overview Of Data Warehousing
59/62
Load/Index = place transformed
data into the warehouse and
create indexes
Refresh mode:bulkrewriting of target data at
periodic intervals
Update mode: onlychanges in source data are
written to data warehouse
Steps in data reconciliationSteps in data reconciliation
Si l fi ld t f tiSi l fi ld t f ti
-
8/14/2019 Overview Of Data Warehousing
60/62
Single-field transformationSingle-field transformation
In general some transformation function
translates data from old form to new form
Algorithmic transformation uses a formula
or logical expression
Tablelookup another approach
Multi field transformationMulti field transformation
-
8/14/2019 Overview Of Data Warehousing
61/62
Multi field transformationMulti field transformation
M:1 from many source
fields to one target field
1:M from one
source field to
many target fields
-
8/14/2019 Overview Of Data Warehousing
62/62
T H A N K S !T H A N K S !