data warehousing isys 650. what is a data warehouse? a data warehouse is a subject-oriented,...
Post on 22-Dec-2015
225 views
TRANSCRIPT
![Page 1: Data Warehousing ISYS 650. What is a data warehouse? A data warehouse is a subject-oriented, integrated, nonvolatile, time-variant collection of data](https://reader033.vdocuments.us/reader033/viewer/2022042702/56649d7d5503460f94a5fb68/html5/thumbnails/1.jpg)
Data Warehousing
ISYS 650
![Page 2: Data Warehousing ISYS 650. What is a data warehouse? A data warehouse is a subject-oriented, integrated, nonvolatile, time-variant collection of data](https://reader033.vdocuments.us/reader033/viewer/2022042702/56649d7d5503460f94a5fb68/html5/thumbnails/2.jpg)
What is a data warehouse?• A data warehouse is a subject-oriented, integrated,
nonvolatile, time-variant collection of data in support of management’s decision.
– Subject-oriented: data is organized around major subjects of the enterprise, such as sales, rather than individual transactions, and is oriented to decision making.
– Integrated: the same piece of information collected from various systems is referred to in only one way.
• Example: Gender: M, F; Male, Female; Sex: 0, 1– Nonvolatile: Data is loaded into a data warehouse on a
scheduled basis.– Time-variant: Historical data to support time-series and
trend analysis.
![Page 3: Data Warehousing ISYS 650. What is a data warehouse? A data warehouse is a subject-oriented, integrated, nonvolatile, time-variant collection of data](https://reader033.vdocuments.us/reader033/viewer/2022042702/56649d7d5503460f94a5fb68/html5/thumbnails/3.jpg)
What is a Data Warehouse?• A physical repository where relational data are
specially organized to provide enterprise-wide, cleansed data in a standardized format
• “The data warehouse is a collection of integrated, subject-oriented databases designed to support DSS functions, where each unit of data is non-volatile and relevant to some moment in time”
![Page 4: Data Warehousing ISYS 650. What is a data warehouse? A data warehouse is a subject-oriented, integrated, nonvolatile, time-variant collection of data](https://reader033.vdocuments.us/reader033/viewer/2022042702/56649d7d5503460f94a5fb68/html5/thumbnails/4.jpg)
Need for Data Warehousing• Separation of operational and informational systems and
data for improved performance.
![Page 5: Data Warehousing ISYS 650. What is a data warehouse? A data warehouse is a subject-oriented, integrated, nonvolatile, time-variant collection of data](https://reader033.vdocuments.us/reader033/viewer/2022042702/56649d7d5503460f94a5fb68/html5/thumbnails/5.jpg)
Types of Data in a DW
• Current detailed data: consistent at the time the data is extracted from the transaction system.
• Old detailed data: need to be archived.• Summarized data• Metadata:
– A directory of what is in the warehouse.– A guide to mapping data from transaction
database to data warehouse
![Page 6: Data Warehousing ISYS 650. What is a data warehouse? A data warehouse is a subject-oriented, integrated, nonvolatile, time-variant collection of data](https://reader033.vdocuments.us/reader033/viewer/2022042702/56649d7d5503460f94a5fb68/html5/thumbnails/6.jpg)
Data MartA departmental data warehouse that stores only relevant data
– Dependent data mart A subset that is created directly from a data warehouse
– Independent data martA small data warehouse designed for a strategic business unit or a department
![Page 7: Data Warehousing ISYS 650. What is a data warehouse? A data warehouse is a subject-oriented, integrated, nonvolatile, time-variant collection of data](https://reader033.vdocuments.us/reader033/viewer/2022042702/56649d7d5503460f94a5fb68/html5/thumbnails/7.jpg)
DW Framework
DataSources
ERP
Legacy
POS
OtherOLTP/wEB
External data
Select
Transform
Extract
Integrate
Load
ETL Process
EnterpriseData warehouse
Metadata
Replication
A P
I
/ M
iddl
ewar
e Data/text mining
Custom builtapplications
OLAP,Dashboard,Web
RoutineBusinessReporting
Applications(Visualization)
Data mart(Engineering)
Data mart(Marketing)
Data mart(Finance)
Data mart(...)
Access
No data marts option
![Page 8: Data Warehousing ISYS 650. What is a data warehouse? A data warehouse is a subject-oriented, integrated, nonvolatile, time-variant collection of data](https://reader033.vdocuments.us/reader033/viewer/2022042702/56649d7d5503460f94a5fb68/html5/thumbnails/8.jpg)
Extraction, transformation, and load (ETL)
Data Integration and the Extraction, Transformation, and Load (ETL) Process
Packaged application
Legacy system
Other internal applications
Transient data source
Extract Transform Cleanse Load
Datawarehouse
Data mart
![Page 9: Data Warehousing ISYS 650. What is a data warehouse? A data warehouse is a subject-oriented, integrated, nonvolatile, time-variant collection of data](https://reader033.vdocuments.us/reader033/viewer/2022042702/56649d7d5503460f94a5fb68/html5/thumbnails/9.jpg)
Representation of Data in DW
• Dimensional Modeling – a retrieval-based system that supports high-volume query access
• Star schema – the most commonly used and the simplest style of dimensional modeling– Contain a fact table surrounded by and connected to several
dimension tables– Fact table contains the descriptive attributes (numerical values)
needed to perform decision analysis and query reporting– Dimension tables contain classification and aggregation information
about the values in the fact table
• Snowflakes schema – an extension of star schema where the diagram resembles a snowflake in shape
![Page 10: Data Warehousing ISYS 650. What is a data warehouse? A data warehouse is a subject-oriented, integrated, nonvolatile, time-variant collection of data](https://reader033.vdocuments.us/reader033/viewer/2022042702/56649d7d5503460f94a5fb68/html5/thumbnails/10.jpg)
Multidimensionality
• MultidimensionalityThe ability to organize, present, and analyze data by several dimensions, such as sales by region, by product, by salesperson, and by time (four dimensions)
• Multidimensional presentation – Dimensions: products, salespeople, market segments, business units,
geographical locations, distribution channels, country, or industry– Measures: money, sales volume, head count, inventory profit, actual
versus forecast– Time: daily, weekly, monthly, quarterly, or yearly
![Page 11: Data Warehousing ISYS 650. What is a data warehouse? A data warehouse is a subject-oriented, integrated, nonvolatile, time-variant collection of data](https://reader033.vdocuments.us/reader033/viewer/2022042702/56649d7d5503460f94a5fb68/html5/thumbnails/11.jpg)
Example: Northwind Database
![Page 12: Data Warehousing ISYS 650. What is a data warehouse? A data warehouse is a subject-oriented, integrated, nonvolatile, time-variant collection of data](https://reader033.vdocuments.us/reader033/viewer/2022042702/56649d7d5503460f94a5fb68/html5/thumbnails/12.jpg)
Examples of Sales Analysis
• Total sales by Product• Sales related to Customer:
– Location: Sales by City, Country
• Sales related to Time:– Quarterly, monthly, yearly Sales
• Sales related to Employee:
![Page 13: Data Warehousing ISYS 650. What is a data warehouse? A data warehouse is a subject-oriented, integrated, nonvolatile, time-variant collection of data](https://reader033.vdocuments.us/reader033/viewer/2022042702/56649d7d5503460f94a5fb68/html5/thumbnails/13.jpg)
Analyze Sales DataDetailed Business Data
• Total sales:• Amount of each detail line:
Quantity*UnitPrice*Discount• Sum (Quantity*UnitPrice*Discount)
• Total quantity sold:• Sum(Quantity)
• Detailed business data: – Quantity*UnitPrice*Discount– Quantity
![Page 14: Data Warehousing ISYS 650. What is a data warehouse? A data warehouse is a subject-oriented, integrated, nonvolatile, time-variant collection of data](https://reader033.vdocuments.us/reader033/viewer/2022042702/56649d7d5503460f94a5fb68/html5/thumbnails/14.jpg)
Dimensions for Data Analysis:Factors relevant to the detailed business data
• Analyze sales by:– Product, product category– Location: City, State, Country– Time:
• Quarterly, yearly sales
– Employee:– And combinations of these dimensions:
• Ex: Product and Location, Product and Time
![Page 15: Data Warehousing ISYS 650. What is a data warehouse? A data warehouse is a subject-oriented, integrated, nonvolatile, time-variant collection of data](https://reader033.vdocuments.us/reader033/viewer/2022042702/56649d7d5503460f94a5fb68/html5/thumbnails/15.jpg)
Data Warehouse Design- Star Schema -
• Dimension tables– contain descriptions about the subjects of the
business such as customers, employees, locations, products, time periods, etc.
• Fact table– contain detailed business data with links to
dimension tables.
![Page 16: Data Warehousing ISYS 650. What is a data warehouse? A data warehouse is a subject-oriented, integrated, nonvolatile, time-variant collection of data](https://reader033.vdocuments.us/reader033/viewer/2022042702/56649d7d5503460f94a5fb68/html5/thumbnails/16.jpg)
Define Product Dimension
• Product Table:– ProductID, ProductName, SupplierID, CategoryID,
QuantityPerUnit, UnitPrice, UnitsInStock, UnitsOnOrder, ReorderLevel, Discontinued
• Product dimension table:– ProductID, ProductName, CategoryID
![Page 17: Data Warehousing ISYS 650. What is a data warehouse? A data warehouse is a subject-oriented, integrated, nonvolatile, time-variant collection of data](https://reader033.vdocuments.us/reader033/viewer/2022042702/56649d7d5503460f94a5fb68/html5/thumbnails/17.jpg)
Define Employee Dimenstion
• Employees Table: EmployeeID, LastName, FirstName, Title, TitleOfCourtesy, BirthDate, HireDate, Address, City, Region, PostalCode, Country, HomePhone, etc.
• Employee Dimension:– EmployeeID, FullName, Title, EmpCity
![Page 18: Data Warehousing ISYS 650. What is a data warehouse? A data warehouse is a subject-oriented, integrated, nonvolatile, time-variant collection of data](https://reader033.vdocuments.us/reader033/viewer/2022042702/56649d7d5503460f94a5fb68/html5/thumbnails/18.jpg)
Define Location Dimension• Customers table:
– CustomerID, CompanyName, ContactName, ContactTitle, Address, City, Region, PostalCode, Country, Phone, Fax
• Location dimension:– LocationCode, City, Country– Define Location Code: This is an artificial code created
to link detailed business data with the city and country.– In the Northwind database, I used the Make Table
query to create a Location table from the Customers table with City and Country fields. Then I used the Customers table’s design view to add a LocationCode field with the Auto Number data type.
![Page 19: Data Warehousing ISYS 650. What is a data warehouse? A data warehouse is a subject-oriented, integrated, nonvolatile, time-variant collection of data](https://reader033.vdocuments.us/reader033/viewer/2022042702/56649d7d5503460f94a5fb68/html5/thumbnails/19.jpg)
Define Period Dimension • Period:
– In the Orders table: OrderDate– In the data warehouse we define Period to be:
PeriodCode, Year, Quarter• OrderDate: 04-Jul-1996 -> 1996, 3, 7• OrderDate: 20-Dec-1996 -> 1996, 4, 12
– In Access: Create view based on Orders table• Year:Year(OrderDate); Month:Month(OrderDate)• Quarter:
– Quarter: IIf([month]<=3,1,IIf([month]<=6,2,IIf([month]<=9,3,4)))
– Define Period Code: • PeriodCode:Cstr(Year) + Cstr(Quarter)• 1996, 3, 7 -> 19963• 1996, 4, 12 -> 19964
![Page 20: Data Warehousing ISYS 650. What is a data warehouse? A data warehouse is a subject-oriented, integrated, nonvolatile, time-variant collection of data](https://reader033.vdocuments.us/reader033/viewer/2022042702/56649d7d5503460f94a5fb68/html5/thumbnails/20.jpg)
Star Schema
FactTableLocationCodePeriodCodeEmployeeIDProductID
QtyAmount
LocationDimension
LocationCodeCity
Country
EmployeeDimensionEmployeeID
FullNameTitle
EmpCity
ProductDimensionProductID
ProductNameCategoryID
PeriodDimensionPeriodCode
YearQuarter
![Page 21: Data Warehousing ISYS 650. What is a data warehouse? A data warehouse is a subject-oriented, integrated, nonvolatile, time-variant collection of data](https://reader033.vdocuments.us/reader033/viewer/2022042702/56649d7d5503460f94a5fb68/html5/thumbnails/21.jpg)
A Query to retrieve data for Fact Table
![Page 22: Data Warehousing ISYS 650. What is a data warehouse? A data warehouse is a subject-oriented, integrated, nonvolatile, time-variant collection of data](https://reader033.vdocuments.us/reader033/viewer/2022042702/56649d7d5503460f94a5fb68/html5/thumbnails/22.jpg)
Transfer Data Between Access Databases
• Create/Query/Design View– 1. Create the query with the data to transfer– 2. Click Make Table button
• Make table in the same database• Make table in other database
– 3. Click Run