data warehouses
Post on 04-Feb-2016
61 Views
Preview:
DESCRIPTION
TRANSCRIPT
1Data WarehousesBUAD/American University
Data Warehouses
2Data WarehousesBUAD/American University
Definition
• Data Warehouse: An integrated and consistent store of subject-oriented data that is obtained from a variety of sources and formatted into a meaningful context to support decision-making in an organization.
3Data WarehousesBUAD/American University
Need forData Warehousing
• Integrated, company-wide view of high-quality information.
• Separation of operational and informational systems and data– operational system: a system that is used to run a
business in real time, based on current data– informational system: systems designed to support
decision making based on stable point-in-time or historical data
4Data WarehousesBUAD/American University
Factors AllowingData Warehousing
• Relational DBMS.
• Advances in hardware: speed and storage capacity.
• End-user computing interfaces and tools.
5Data WarehousesBUAD/American University
Data Warehouse Architectures
• Two-level
– source system files containing operational data
– transformed and integrated data warehouse
• Three-level
– Operational data.
– Enterprise data warehouse (EDW)- single source of data for decision making.
– Data marts - limited scope; data selected from EDW; customized decision-support for individual user groups
6Data WarehousesBUAD/American University
Generic data warehouse architecture
7Data WarehousesBUAD/American University
Three-layer architecture
8Data WarehousesBUAD/American University
Reasons for theThree-Level Architecture
• EDW and data marts have different purposes and data architectures.
• Data transformation is complex and is best performed in two steps.
• Data marts customized decision support for different groups
• Architecture– Operational data, reconciled data, Derived data.
9Data WarehousesBUAD/American University
Three-layer data architecture
10Data WarehousesBUAD/American University
Data Characteristics
• Status vs. Event data.
– A transaction is a business activity that triggers one or more business events: event data captures them
• Transient vs. Periodic data.
– Transient: data in which changes to existing records are written over previous records, thus destroying previous data content
– periodic data: data that are never physically altered or deleted once added
11Data WarehousesBUAD/American University
Example of DBMS log entry
12Data WarehousesBUAD/American University
Transient operational data
13Data WarehousesBUAD/American University
Reconciled DataCharacteristics
• Detailed
• Historical
• Normalized
• Enterprise-wide
• Quality controlled
14Data WarehousesBUAD/American University
The Data Reconciliation Process
• Capture: capture the relevant data from source files to fill EDW– Static - initial load.– Incremental - ongoing update.
• Scrub or data cleansing– missing data, name reconciliation– Pattern recognition and other artificial
intelligence techniques.
15Data WarehousesBUAD/American University
Steps in data reconciliation
16Data WarehousesBUAD/American University
The Data Reconciliation Process
• Transform
– Convert the data format from the source to the target system.
– Record-Level Functions
• Selection.
• Joining.
• Aggregation (for data marts).
– Field-Level Functions
• Single-field transformation
• Multi-field transformation
17Data WarehousesBUAD/American University
The Data Reconciliation Process
• Load and Index– Refresh Mode
• When the warehouse is first created.
• Static data capture.
– Update Mode• Ongoing update of the warehouse.
• Incremental data capture.
18Data WarehousesBUAD/American University
Derived DataCharacteristics
• Type of data– Detailed, possibly periodic.– Aggregated.
• Distributed to departmental servers.
• Implemented in star schema.
19Data WarehousesBUAD/American University
Star Schema
• Also called the dimensional model.
• Fact and dimension tables.– Fact table: consists of factual or quantitative
data about the business– Dimension table: hold descriptive data
• Grain of a fact table - time period for each record.
20Data WarehousesBUAD/American University
Components of a star schema
21Data WarehousesBUAD/American University
Star schema example
22Data WarehousesBUAD/American University
Star schema with sample data
23Data WarehousesBUAD/American University
Example of snowflake sample
24Data WarehousesBUAD/American University
Size of the fact table
• Total number of stores: 1,000
• Total number of products: 10,000
• Total number of periods: 24
• Total rows: 1000 * 10,000 * 24 = 240,000,000
• On average 50% items record sales,– no of rows = 120,000,000
25Data WarehousesBUAD/American University
Types of Data Marts
• Dependent - Populated from the EDW.
• Independent - Data taken directly from the operational databases.
26Data WarehousesBUAD/American University
The User Interface
• The role of metadata.• Traditional query and reporting tools.• On-line analytical processing (OLAP)• The use of a set of graphical tools that provides
users with multidimensional views of their data and allows them to analyze the data using simple windowing techniques.
27Data WarehousesBUAD/American University
The User Interface
– Slicing a cube.– Pivot
• Rotate the view for a particular data point to obtain another perspective.
• E.g. take a value from the units column and obtain by-store values.
– Drill-down
28Data WarehousesBUAD/American University
Slicing a data cube
top related