database management system prepared by dr. ahmed el-ragal reviewed & presented by mr. mahmoud...
TRANSCRIPT
Database Management System
Prepared by Dr. Ahmed El-Ragal
Reviewed & Presented By Mr. Mahmoud Rafeek Alfarra
College Of Science & Technology- Khan younis Information Technology & Computer Science Dep.
Part 8Data Warehousing
Info sources for Info sources for organizationsorganizations
April 20, 2023Data Warehouse2
HR
Financial/ Accounting
ERP
CRM and
eCRM
Internet
Procurement CallCenter
Inventory
Islands of Islands of informationinformation
Applications reporting Applications reporting systems?systems?
April 20, 2023 Data Warehouse 3
Ad-Ad-hochoc
Standard ReportsStandard Reports
ParameterizedParameterizedReportsReports
Data WarehousingData Warehousing
April 20, 2023Data Warehouse4
EnterpriseEnterpriseData Data
WarehouseWarehouse
HR
Financial/ Accounting
ERP
CRM and
eCRM
Internet
Procurement CallCenter
Inventory
April 20, 2023Data Warehouse5
A Simple Definition A data warehouse is a collection of data created to support decision- making applications.
April 20, 2023Data Warehouse6
Another Definition
Data warehousing is the entire process of data extraction, transformation, and loading of data to the warehouse and the access of the data by end users and applications.
April 20, 2023Data Warehouse7
Data Warehouse-- DefinedData Warehouse-- DefinedCollection of Data in Support of Management Reporting Needs and Decision-Making Processes.
Organized by subject areas (known as data marts) and structured for query. Integrated across subject areas. Associated with a point in time, such as pay period, fiscal year, semester. Query only, that is, the data does not change.
April 20, 2023Data Warehouse8
A Data Warehouse A Data Warehouse Designer’sDesigner’s
View of the BusinessView of the Business
MARKET
P R O D U C T
TIM
E
The threeThe threekey wordskey wordsbecome the become the dimensionsdimensionsof the cube.of the cube.
The points The points inside the inside the cube store the cube store the measurementmeasurementssof the of the business –business –a combination a combination of Product, of Product, Market,Market,and Time.and Time.These points These points arearethe business the business facts.facts.
April 20, 2023Data Warehouse9
Goals of the Data WarehouseGoals of the Data Warehouse
Provide Access to various scattered Data.
Include Business Descriptions of Data.
Promote Consistency Across Subject Areas.
Assure Data Quality.
Support User Directed Analysis.
April 20, 2023Data Warehouse10
Runseverynight
ExternalExternalDataData
OperationalOperationalDataData
ExtractExtractTransformTransform
LoadLoad
The Data Loading ProcessThe Data Loading Process
April 20, 2023Data Warehouse11
ExternalExternalDataData
ExtractExtractTransformTransform
LoadLoad
Ready for AccessReady for Accessand Queryand Query
OperationalOperationalDataData
DataDataWarehouseWarehouse
BusinessBusinessDescriptionsDescriptions
The Data Loading ProcessThe Data Loading Process
Characteristics of a Data Characteristics of a Data WarehouseWarehouse
April 20, 2023Data Warehouse12
Subject oriented – organized based on use (on subject not on application).
Integrated – inconsistencies removed.Nonvolatile – stored in read-only format
(historical ).Time variant – data are normally time series.Summarized – in decision-usable format.Large volume – data sets are quite large.Non normalized – often redundant.Metadata – data about data are stored.Data sources – comes from nonintegrated
sources.
Data Warehousing Data Warehousing ArchitectureArchitecture
April 20, 2023Data Warehouse13
Data MartsData Marts
ExtractExtractTransformTransformLoadLoad
External Sources
Operational db systems
ServeServe
OLAP serversOLAP servers
Analysis
Query/ Reporting
Data Mining
April 20, 2023Data Warehouse14
Data Warehouse vs. Data MartData Warehouse vs. Data Mart
Source: adapted from Strange (1997).
Data ExtractionData Extraction
April 20, 2023Data Warehouse15
Often performed by COBOL routines (not recommended because of high program maintenance and no automatically generated meta data).
Sometimes source data is copied to the target database using the replication capabilities of standard RDMS (not recommended because of “dirty data” in the source systems).
Increasing performed by specialized ETL software.
Reasons for “Dirty” DataReasons for “Dirty” Data
April 20, 2023Data Warehouse16
Dummy Values. Absence of Data. Multipurpose Fields. Cryptic Data. Contradicting Data. Inappropriate Use of Address Lines. Violation of Business Rules. Reused Primary Keys. Non-Unique Identifiers. Data Integration Problems.
Data CleansingData Cleansing
April 20, 2023Data Warehouse17
Source systems contain “dirty data” that must be cleansed.
ETL software contains rudimentary data cleansing capabilities.
Specialized data cleansing software is often used.
Leading data cleansing vendors include Vality (Integrity), Harte-Hanks (Trillium), and Firstlogic (i.d.Centric)
Steps in Data CleansingSteps in Data Cleansing
April 20, 2023Data Warehouse18
Parsing
Correcting
Standardizing
Matching
Consolidating
ParsingParsing
April 20, 2023Data Warehouse19
Parsing locates and identifies individual data elements in the source files and then isolates these data elements in the target files.
Examples include parsing the first, middle, and last name; street number and street name; and city and state.
CorrectingCorrecting
April 20, 2023Data Warehouse20
Corrects parsed individual data components using sophisticated data algorithms and secondary data sources.
Example include replacing an incomplete address and adding a zip code.
StandardizingStandardizing
April 20, 2023Data Warehouse21
Standardizing applies conversion routines to transform data into its preferred (and consistent) format using both standard and custom business rules.
Examples include adding a pre name, replacing a nickname, and using a preferred telephone.
MatchingMatching
April 20, 2023Data Warehouse22
Searching and matching records within and across the parsed, corrected and standardized data based on predefined business rules to eliminate duplications.
Examples include identifying similar names and addresses.
ConsolidatingConsolidating
April 20, 2023Data Warehouse23
Analyzing and identifying relationships between matched records and consolidating/merging them into ONE representation.
Data StagingData Staging
April 20, 2023Data Warehouse24
Often used as an interim step between data extraction and later steps.
Accumulates data from asynchronous sources using flat files, FTP sessions, or other processes.
At a predefined cutoff time, data in the staging file is transformed and loaded to the warehouse.
There is usually no end user access to the staging file.
An operational data store may be used for data staging.
Data TransformationData Transformation
April 20, 2023Data Warehouse25
Transforms the data in accordance with the business rules and standards that have been established.
Example include: format changes, deduplication, splitting up fields, replacement of codes, derived values, and aggregates.
Building The WarehouseBuilding The WarehouseTransforming Data
April 20, 2023Data Warehouse26
TransformTransformChangeChange
CombineCombine
CalculateCalculate
buyer_namebuyer_nameBarr, AdamChai, SeanO’Melia, Erin...
reg_idreg_id246...
total_salestotal_sales17.6052.808.82
...
buyer_namebuyer_nameBarr, AdamChai, SeanO’Melia, Erin...
reg_idreg_id246...
total_salestotal_sales17.6052.808.82
...
buyer_namebuyer_nameBarr, AdamChai, SeanO’Melia, Erin...
priceprice.55
1.10.99...
qtyqty32489...
buyer_namebuyer_nameBarr, AdamChai, SeanO’Melia, Erin...
reg_idreg_idIIIVVI...
total_salestotal_sales17.6052.808.82
...
buyer_namebuyer_nameBarr, AdamChai, SeanO’Melia, Erin...
priceprice.55
1.10.99...
qtyqty32489...
total_salestotal_sales17.6052.808.82
...
buyer_firstbuyer_firstAdamSeanErin
...
buyer_lastbuyer_lastBarrChai
O’Melia...
reg_idreg_id246...
total_salestotal_sales17.6052.808.82
...
Data LoadingData Loading
April 20, 2023Data Warehouse27
Data are physically moved to the data warehouse.
The loading takes place within a “load window”.
The trend is to near real time updates of the data warehouse as the warehouse is increasingly used for decisional activities that affect operations.
Meta DataMeta Data
April 20, 2023Data Warehouse28
Data about data.Needed by both information technology
personnel and users.IT personnel need to know data sources
and targets; database, table and column names; refresh schedules; data usage measures; etc.
Users need to know entity/attribute definitions; reports/query tools available; report distribution information; help desk contact information, etc.
Reviewed By Mr. Mahmoud Rafeek Alfarra