david m. kroenke’s database processing, 10th edition © 2006 pearson prentice hall 15-1 david m....
TRANSCRIPT
DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall
15-1
David M. Kroenke
Database Processing Chapter 15
Business Intelligence &
Data Warehousing
DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall
15-2
Business Intelligence (BI) Systems
• Business Intelligence (BI) systems are information systems that assist managers and other professionals:– To analyze current and past activities, and– To predict future events.
• Two broad categories:– Reporting– Data mining
DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall
15-3
The Relationship ofOperational and BI Applications
DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall
15-4
Reporting Applications
• Reporting system applications:– Filter, Sort, Group, Simple Calculations using
SQL– Classify entities (customers, products,
employees, etc.)• RFM Analysis
– Deal with critical report delivery
DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall
15-5
Data Mining Applications
• Data mining applications are used to:– Perform what-if analysis– Make predictions– Facilitate decision making
• Data mining applications use sophisticated statistical and mathematical techniques.
• Report delivery is not as critical.
DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall
15-6
Need for Data Warehousing• Integrated, company-wide view of high-quality
information (from disparate databases)• Separation of operational and informational systems
and data (for improved performance)
Comparison of Operational and Informational Systems
DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall
15-7
Definitions• Data Warehouse:
– A subject-oriented, integrated, time-variant, non-updatable collection of data used in support of management decision-making processes
– Subject-oriented: e.g. customers, patients, students, products– Integrated: Consistent naming conventions, formats, encoding
structures; from multiple data sources– Time-variant: Can study trends and changes– Non-updatable: Read-only, periodically refreshed
• Data Mart:– A data warehouse that is limited in scope
DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall
15-8
Data Warehouse vs. Data Mart
DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall
15-9
Components of a Data Warehouse
DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall
15-10
Data Warehouse and Data Marts
DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall
15-11
Data Warehouses and Data Marts:Problems of Using Transaction Data for BI
DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall
15-12
The ETL Process
• Extract/Capture– Static vs. Incremental
• Transform– Scrub or data cleansing– Data selection, joining, aggregation
• Load and Index– Refresh vs. Update
DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall
15-13
Data Warehouse ETL Sequence
DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall
15-14
Components of a Star Schema
DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall
15-15
Star Schema Example
DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall
15-16
Issues Regarding Star Schema
• Dimension table keys should be surrogate:– Keys may change over time– Length/format consistency
• Granularity of Fact Table – what level of detail? – Transactional grain – finest level– Aggregated grain – more summarized– Finer grain: better market basket analysis capability, but much more
data (more dimension tables, more rows in fact table)
• Duration of the database – how much history should be kept?– Natural duration – 13 months or 5 quarters– Financial institutions may need longer duration– Older data is more difficult to source and cleanse
DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall
15-17
Data Warehousing at Wal-Mart
• As of 2000 (Foote and Krishnamurthi, 2001)
– Held 7.5 TB, with plans to reach 24 TB (1TB = 250M pages of text)
– Kept 65 weeks of data– Had invested $4 Billion– Power users generated $12,000/query
• As of 2005 (Wall Street Journal, December 3-4, 2005)
– Held 570 TB (more than Internet’s fixed pages)– Predicted Hurricane Ivan would spur demand for easy
breakfasts• Stocked Florida stores with Pop-Tarts
DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall
15-18
Reporting Systems:RFM Analysis
• RFM Analysis analyzes and ranks customers according to purchasing patterns:– R = Recent (most recent order)– F = Frequent (how often an order is made)– M = Money (dollar amount of orders)
• Customers are sorted into five groups, each containing 20% of the customers.
• Each group is given a numerical value:– 1 = Top 20%– 2, 3, 4 = Each 20% in between top and bottom 20%– 5 = Bottom 20%
DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall
15-19
Reporting Systems:RFM Analysis (Continued)
Ajax ordered recently (1), orders often (1) but does not order the most expensive items (3) – Try to sell Ajax more expensive goods!
Bloominghams has not ordered recently (5), but has ordered often (1) and purchased the most expensive items (1).This customer may be looking for a different vendor – better call!
DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall
15-20
Reporting Systems:OnLine Analytical Processing [OLAP]
• An OLAP report has measures and dimensions:– Measure — A data item of interest.– Dimension — A characteristic of a measure.
• OLAP cube — A presentation of a measure with associated dimensions.– An OLAP cube can have any number of axes.– The terms OLAP cube and OLAP report are
synonymous.
• OLAP allows drill-down — a further division of the data into more detail.
DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall
15-21
Reporting Systems:OLAP Drill Down:
Product Family by Store Type
DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall
15-22
Reporting Systems:OLAP Drill Down:
Product Family and Store Location by Store Type
DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall
15-23
Reporting Systems:OLAP Drill Down:
Store Location and Product Family by Store Type
DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall
15-24
Data Mining Applications
• Data mining applications use sophisticated statistical and mathematical techniques to find patterns and relationships that can be used to classify and predict.– Unsupervised data mining — Statistical techniques
are used to identify groups of entities with similar characteristics.
• Cluster Analysis– Supervised data mining:
• A model is developed.• Statistical techniques are used to estimate parameter values
of the model.– Regression analysis
DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall
15-25
Data Mining Applications:The Convergence of the Disciplines
DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall
15-26
Data Mining Applications:Three Popular Data Mining Techniques
• Decision tree analysis — Classifies entities into groups based on past history.
• Logistic regression — Produces equations that offer probabilities that certain events will occur.
• Neural Networks — Complex statistical prediction techniques
DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall
15-27
Data Mining Applications:Market Basket Analysis
• Market Basket Analysis — Determines patterns of associated buying behavior.– Support — The probability that two items will be
purchased together.– Confidence — The probability that an item will be
purchased given the fact that the customer has already purchased another particular item.
– Lift — the ratio of confidence to the basic probability that a particular item will be purchased.
DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall
15-28
Data Mining Applications:Market Basket Analysis Example