university of manitoba asper school of business 3500 dbms bob travica business analytics and...
TRANSCRIPT
University of ManitobaAsper School of Business
3500 DBMSBob Travica
Business Analytics and Decision Making
OLTP, OLAP & SAP
Chapter 9 & SAP Materials
Updated 2015
DDBB
SSYYSSTTEEMMSS
2 of 20
OLTP vs. OLAP
Online Transaction Processing (OLTP) = relational database systems
Online Analytical Processing (OLAP)
DDBB
SSYYSSTTEEMMSS
3 of 20
OLAP via Data Warehousing
Online TransactionProcessing (OLTP):Querying Databaseswith 3NF tables
Operations’data
Predefinedreports
Online Analytical Processing (OLAP);Data warehousing;Data Mining.Usually de-normalized data.
Periodicaltransfers
Interactivedata analysis
Flat files
MIS 3500
DDBB
SSYYSSTTEEMMSS
OLTP & OLAP in Enterprise Systems
Enterprise Systems (Enterprise Resource Planning Systems) support both.
Example: An SAP-based system can be a TPS, MIS and DSS for the entire organization. DSS capability draws on data warehousing & cubing.
Process approach to organization with data flowing smoothly end-to-end. Process link up horizontally (department-to-department) and vertically (process-sub-process).
Business process* is for the most part the system process.
More...
4 of 20
DDBB
SSYYSSTTEEMMSS
5 of 20
Date Warehousing Goals
Data warehouse (DW) yields multidimensional view of data (cubing):
Integrate data from different sources to get a larger picture of
business Data aggregations (summaries on different dimensions)
DW expands scope of data analysis: Statistical analysis (test hypotheses on relationships between
pieces of data) Discover new relationships by querying cubes or applying data
mining software.
DDBB
SSYYSSTTEEMMSS
6 of 20
Extraction, Transformation, and Loading
Data warehouse:All data must be consistent.
Customers
Convert “Client” to “Customer”
Apply standard product numbers
Convert currencies
Fix region codes
Transaction data from diverse systems.
• Preparations performed on data – ETL process
Extract
Transform (Transport)
Load
DDBB
SSYYSSTTEEMMSS
7 of 20
Three-Dimensional View of Data: Cube
Product
Sales atLocation
Sales D
ate
Logic similar tocrosstab query And pivot table.
• Created in a datawarehouse
P5
P1 P2P4P3
DDBB
SSYYSSTTEEMMSS
8 of 20
Data Hierarchy
Year
Quarter
Month
Week
Day
LevelsRoll-upTo get higher-level totals
Drill-downTo get lower-level details
DDBB
SSYYSSTTEEMMSS
LocationLocationID
Detail
9 of 20
Datawarehouse Tables: Star Design
Revenue=Price*Quantity
Fact Table *
SaleSaleDateQuantityDiscount
Dimension
Inputted from Product and Sale
Revenue broken down by product, sales location, and desired time period (time column/s – day of year, or evensmaller; basis for rollup).
ProductProductID
Price
Dimension
Dimension
Design is:- Hierarchical (dimension tables have no direct association)- De-normalized (fact table): Price & Quantity inputted to Fact table;
New keys usually used in the fact table (e.g., SaleTbl#-Row#).
SaleSaleDateQuantityDiscountStoreID
Calculatedfact
DDBB
SSYYSSTTEEMMSS
10 of 20
Datawarehouse Tables: Snowflake Design
MerchTblRowSaleTblRowPriceQuantity
OLAPItems
ItemIDDescriptionPriceCategory
Product
SaleIDSaleDateCustomerID DiscountSalesTax
Sale
CustomerIDPhoneFirstNameLastNameAddressZipCodeCityID
Customer
CityIDZipCodeCityState
City
Design is:-Network-like (dimension tables can connect directly)-Still partly normalized (Sale-Customer-City)
Fact Table Dimension Tables
• Advantage: Design simplified in the Fact table part.
DDBB
SSYYSSTTEEMMSS
SAP Datawarehouse
11 of 20More on SAP Datawarehouse
Datawarehouse Cube Details
Can also be Dimensions
DDBB
SSYYSSTTEEMMSS
12 of 20
Multidimensional View of Data – Precursors to DW: Excel Pivot Table
Can place data in rows or columns.By grouping months, can instantly get quarterly or monthly totals.
Quarter MonthQuarter 1 Quarter 2 Quarter 3 Quarter 4 Grand Total
LastName EmployeeIDDataCarpenter 8 Sum of Animal 1,668.91 606.97 426.39 7.20 2,709.47
Sum of Merchandise 324.90 78.30 99.00 128.70 630.90Eaton 6 Sum of Animal 522.37 341.85 562.50 1,426.72
Sum of Merchandise 30.60 54.90 107.10 192.60Farris 7 Sum of Animal 5,043.36 1,059.70 796.47 6,899.53
Sum of Merchandise 826.92 188.10 306.00 1,321.02Gibson 2 Sum of Animal 4,983.51 1,549.83 2,556.10 9,089.44
Sum of Merchandise 668.25 238.50 450.90 1,357.65Hopkins 4 Sum of Animal 3,747.96 1,194.88 372.65 128.41 5,443.90
Sum of Merchandise 476.91 252.90 121.50 7.20 858.51James 5 Sum of Animal 3,282.77 2,373.08 437.88 150.11 6,243.84
Sum of Merchandise 505.89 693.45 99.00 99.00 1,397.34O'Connor 9 Sum of Animal 2,643.69 180.91 510.12 3,334.72
Sum of Merchandise 263.70 83.70 55.80 403.20Reasoner 3 Sum of Animal 4,577.43 625.74 589.68 2,500.24 8,293.09
Sum of Merchandise 762.30 89.10 116.80 396.90 1,365.10Reeves 1 Sum of Animal 1,120.93 1,120.93
Sum of Merchandise 263.88 263.88Shields 10 Sum of Animal 1,008.76 162.15 1,170.91
Sum of Merchandise 62.10 22.50 84.60Total Sum of Animal 28,599.69 7,591.11 2,840.72 6,701.03 45,732.55Total Sum of Merchandise 4,185.45 1,624.05 569.50 1,495.80 7,874.80
Time hierarchy“Dimensions”
“Facts”(Measures)
DDBB
SSYYSSTTEEMMSS
13 of 20
Multidimensional View of Data: CUBE Option in SQL 99
Bird 1 135.00 0 0Bird 2 45.00 0 0…Bird (null) 32.00 0 0Bird (null) 607.50 1 0Cat 1 396.00 0 0Cat 2 113.85 0 0…Cat (null) 1293.30 1 0(null) 1 1358.8 0 1(null) 2 1508.94 0 1(null) 3 2362.68 0 1…(null) (null) 8451.79 1 1
Category Month Amount Gc Gm
SELECT Category, Month, Sum, GROUPING (Category) AS Gc, GROUPING (Month) AS Gm
FROM …GROUP BY CUBE (Category, Month...)
DDBB
SSYYSSTTEEMMSS
14 of 20
GROUPING SETS: Hiding Details
Bird (null) 607.50Cat (null) 1293.30…(null) 1 1358.8(null) 2 1508.94(null) 3 2362.68…(null) (null) 8451.79
Category Month Amount
SELECT Category, Month, SumFROM …GROUP BY GROUPING SETS ( ROLLUP (Category),
ROLLUP (Month),( ) )
DDBB
SSYYSSTTEEMMSS
15 of 20
SQL RANK FunctionsSELECT Employee, SalesValue RANK() OVER (ORDER BY SalesValue DESC) AS rankDENSE_RANK() OVER (ORDER BY SalesValue DESC) AS denseFROM SalesORDER BY SalesValue DESC, Employee;
Employee SalesValue rank dense
Jones 18,000 1 1
Smith 16,000 2 2
Black 16,000 2 2
White 14,000 4 3
DENSE_RANK does not skip numbers
• Therefore, advances in SQL motivate DBMS vendors to support OLAP and data warehousing.
DDBB
SSYYSSTTEEMMSS
16 of 20
Broader Data Analysis withData Mining
Goal: To discover unknown relationships in the data that can be used to make better decisions. Exploratory analysis. A bottom-up approach that scans the data to find relationships Some statistical routines, but they are not sufficient
Statistics relies on averages
Sometimes the important data lies in more detailed pairs
Supervised by developer vs. unsupervised (self-organizing artificial neural networks)
DDBB
SSYYSSTTEEMMSS
17 of 20
Common Techniques
1. Classification/Prediction
2. Association Rules/Market Basket Analysis
3. Clustering
DDBB
SSYYSSTTEEMMSS
18 of 20
1. Classification(Prediction)
Purpose: “Classify” things that are causes and those that are effects.
Examples Which borrowers/loans are most likely to be successful?
Which customers are most likely to want a new item?
Which companies are likely to file bankruptcy?
Which workers are likely to quit in the next six months?
Which startup companies are likely to succeed?
Which tax returns are fraudulent?
DDBB
SSYYSSTTEEMMSS
19 of 20
Classification Process
Clearly identify the outcome/dependent variable. Identify potential variables that might affect the outcome. Use sample data to test and validate the model. Regression/correlation analysis, decision tables and trees,
etc.
Income Credit History Job Stability Credit Success
50000 Good Good Yes
75000 Mixed Bad No
DDBB
SSYYSSTTEEMMSS
20 of 20
2. Association/Market Basket
Purpose: Determine what events or items go together/co-occur.
Examples: What items are customers likely to buy together?
(Business use: Consider putting the two together to
increase cross-selling.)
DDBB
SSYYSSTTEEMMSS
21 of 20
Association Challenges
If an item is rarely purchased, any other item bought with it seems important. So combine items into categories.
Some relationships are obvious. Burger and fries.
Some relationships are puzzling/meaningless. Hardware store found that toilet rings sell well only when a new
store first opened. But what does it mean?
DDBB
SSYYSSTTEEMMSS
22 of 20
3. Cluster Analysis Purpose: Determine groups of people or some entities. Examples
Are there groups of customers? (If so, we could target them; market segmentation)
Do the locations for our stores have elements in common? (If so, we can search for similar clusters for new locations.)
Do employees have common characteristics? (If so, we can hire similar, or dissimilar, people.)
Small intra-cluster distance
Large inter-cluster distance