kushal data warehousing ppt
DESCRIPTION
In computing, a data warehouse (DW, DWH), or an enterprise data warehouse (EDW), is a database used for reporting (1) and data analysis (2). Integrating data from one or more disparate sources creates a central repository of data, a data warehouse (DW). Data warehouses store current and historical data and are used for creating trending reports for senior management reporting such as annual and quarterly comparisons.TRANSCRIPT
BY BY Kushal SinghKushal Singh Acute Informatics Pvt Acute Informatics Pvt
What is Business Intell igence?
BI is an abbreviation of the two words Business Intelligence, bringing the right information at the right time to the right
people in the right format.
What is Data What is Data Warehousing?Warehousing?
Data Warehouse is a Data Warehouse is a subject-oriented, subject-oriented, integrated, nonvolati leintegrated, nonvolati le and and time-time-variantvariant collection of data in support of collection of data in support of management’s decisions.management’s decisions.
What is Business Intelligence?
Operational
data source1
The architecture
Query Manage
Warehouse Manager
DBMS
Operational
data source 2
Meta-dataHigh
summarized data
Detailed data
Lightly
summarized
data
Operational
data store (ods)
Operational
data source n
Archive/backup
data
Load Manager
Data mining
OLAP(online analytical processing) tools
Reporting, query,
application development,
and EIS(executive information system) tools
End-user
access toolsTypical architecture of a data warehouse
Operational data store (ODS)
The benefits of data The benefits of data warehousingwarehousing
• The potential benefits of data warehousing The potential benefits of data warehousing are high returns on investment..are high returns on investment..
• substantial competitive advantage..substantial competitive advantage..• increased productivity of corporate increased productivity of corporate
decision-makers..decision-makers..
8
Data Warehouse Data Warehouse CharacteristicsCharacteristics
Key Characteristics of a Data WarehouseKey Characteristics of a Data Warehouse
Subject-orientedSubject-oriented IntegratedIntegratedTime-variantTime-variantNon-volatileNon-volatile
9
Subject OrientedSubject Oriented• Example for an insurance company :
PolicyPolicyCustomerCustomer
Data
LossesLosses PremiumPremium
Commercial and Life
Insurance Systems
Commercial and Life
Insurance Systems
Auto and Fire Policy
Processing Systems
Auto and Fire Policy
Processing Systems
Data
Accounting System
Accounting System
Claims Processing
System
Claims Processing
SystemBilling System
Billing System
Applications Area Data Warehouse
10
IntegratedIntegrated• Data is stored once in a single integrated location
(e.g. insurance company)
Data WarehouseDatabase
Subject = Customer
Auto PolicyProcessing
System
Auto PolicyProcessing
System
Customer data stored in severaldatabases
Fire PolicyProcessing
System
Fire PolicyProcessing
System
FACTS, LIFECommercial, Accounting
Applications
FACTS, LIFECommercial, Accounting
Applications
11
Time - VariantTime - Variant
Data is tagged with some element of time - creation date, as of date, etc.
Data is available on-line for long periods of time for trend analysis and forecasting. For example, five or more years
Data Warehouse Data
Time Data
{Key
• Data is stored as a series of snapshots or views which record how it is collected across time.
12
Non-Volatile
• Existing data in the warehouse is not overwritten or
updated. External Sources
• Read-Only
DataWarehouseDatabaseData
WarehouseEnvironment
Data Warehouse
Environment
ProductionDatabases
ProductionApplications
ProductionApplications
• Update• Insert• Delete
• Load
Comparision of OLTP systems and data Comparision of OLTP systems and data warehousing systemwarehousing system
OLTP systemsOLTP systems Data warehousing systemsData warehousing systems
Hold current dataHold current dataStores detailed dataStores detailed data
Data is dynamicData is dynamicRepetitive processingRepetitive processing
High level of transaction throughputHigh level of transaction throughputPredictable pattern of usagePredictable pattern of usage
Transaction-drivenTransaction-drivenApplication-orentedApplication-orented
Supports day-to-day decisionsSupports day-to-day decisionsServes large number of clerical/operation usersServes large number of clerical/operation users
Holds historical dataHolds historical dataStores detailed, lightly, and highly summarized Stores detailed, lightly, and highly summarized datadataData is largely staticData is largely staticAd hoc, unstructured, and heuristic processing Ad hoc, unstructured, and heuristic processing Medium to how level of transaction throughputMedium to how level of transaction throughputUnpredictable pattern of usageUnpredictable pattern of usageAnalysis drivenAnalysis drivenSubject-orientedSubject-orientedsupports strategic decisionssupports strategic decisionsServes relatively how number of managerial Serves relatively how number of managerial users users
OLTPOLTP
Online Transaction Online Transaction ProcessingProcessing
OOn n LLine ine TTransaction ransaction PProcessingrocessing
• What is a Transaction ? What is a Transaction ?
– AA Logical unit of workLogical unit of work
– Examples:Examples:– Drawing Money from a bank accountDrawing Money from a bank account– Booking a seat on an airlineBooking a seat on an airline
TransactionsTransactions• It is a unit of program execution that It is a unit of program execution that
accesses & possibly updates various data accesses & possibly updates various data items.items.
• A transaction is a A transaction is a logical unit of worklogical unit of work that that performs some useful function for a user. performs some useful function for a user.
• In end of the transaction the system must In end of the transaction the system must be: be: – in the prior state (if the transaction fails) or in the prior state (if the transaction fails) or – the status of the system should reflect the the status of the system should reflect the
successful completion (if the transaction successful completion (if the transaction succeeded). succeeded).
• May take a database from one consistent May take a database from one consistent state to another.state to another.
Characteristics of TransactionsCharacteristics of Transactions
AA tomicitytomicityCC onsistencyonsistencyII solationsolationDD urabilityurability
OLAPOLAP
Online Analytical ProcessingOnline Analytical Processing
Types of OLAP
• ROLAP (Relational Online Analytical ROLAP (Relational Online Analytical Processing)Processing)
• MOLAP (Multidimensional Online MOLAP (Multidimensional Online Analytical Processing)Analytical Processing)
• HOLAP (Hybrid Online Analytical HOLAP (Hybrid Online Analytical Processing)Processing)
ROLAPROLAP
• ROLAP (Relational online analytical Processing)
• Used for reporting • Tools: Report studio
MOLAPMOLAP
• MOLAP (Multidimensional online Analytical processing)
• Used to build cubes• Tools: Powerplay, Transformer
HOLAPHOLAP
• HOLAP (Hybrid online analytical Processing)
• Used for Data modeling• This will support both MOLAP and ROLAP • Tools: Framework manager, Query Studio.
DimensionsDimensions
• It’s descriptive information about a It’s descriptive information about a measures like product, location, customer measures like product, location, customer etc.etc.
Types of DimensionsTypes of Dimensions
• Confirmed DimensionsConfirmed Dimensions• Degenerated DimensionsDegenerated Dimensions• Junk DimensionsJunk Dimensions
FactsFacts
• Fact is containing measures and IDs.Fact is containing measures and IDs.• Ex; Revenue, Cost, Amount etcEx; Revenue, Cost, Amount etc
Measure TypesMeasure Types
• Additive Measures:Additive Measures: Which can be added Which can be added across all the dimensionsacross all the dimensions
• Non Additive Measures:Non Additive Measures: Which can not be Which can not be added across all the dimensionsadded across all the dimensions
• Semi Additive Measures:Semi Additive Measures: Which can be Which can be added across some dimensions and which added across some dimensions and which can not be added across some other can not be added across some other dimensionsdimensions
Schema’s In Data warehousingSchema’s In Data warehousing
• STAR SHEMASTAR SHEMA• SNOW-FLAKE SCHEMASNOW-FLAKE SCHEMA• STAR-FLAKE SCHEMASTAR-FLAKE SCHEMA
28
Star SchemaStar Schema
Region_Dimension_Tableregion _id
NENWSESW
region _doc
NortheastNorthwestSoutheastSouthwest
account _id
100000110000120000130000140000
account _doc
ABC ElectronicsMidway ElectricVictor ComponentsWashburn, Inc.Zerox
Account_Dimension_Table
Product_Dimension_Tableprod_grp_id
102030
prod_id
100140220
prod_grp_desc
Fewer devicesCircuit boardsComponents
prod_desc
Power supplyMotherboardCo-processor
month
01-199602-199603-1996
mo_in_fiscal_yr
456
month_name
JanuaryFebruaryMarch
prod_id
100140220
region_id
SWNESW
account_id
100000110000100000
vend_id
100200300
net-sales
30,00023,00032,000
gross_sales
50,00042,00049,000
Monthly_Sales_Summary_Table
Time_Dimension_Table
Fact Table
Dimension Tables
Vendor_Dimension_Tablevend_id
100200300
vendor_desc
PowerAge, Inc.Advanced Micro DevicesFarad Incorporated
month
01-199602-199603-1996
SNOW-FLAKE SCHEMASNOW-FLAKE SCHEMA
Factless Fact TableFactless Fact Table• It’s just a bridge between table where we used to join It’s just a bridge between table where we used to join
tables.tables.
• In this scenario we can only track the eventIn this scenario we can only track the event..
SCDSCD(Slowly Changing Dimensions)(Slowly Changing Dimensions)
• TYPE 0TYPE 0• TYPE 1TYPE 1• TYPE 2TYPE 2• TYPE 3TYPE 3
ETLETL (Extract, Transform and Loading) (Extract, Transform and Loading)
INFORMATICAINFORMATICA
DesigningDesigning
FRAMEWORK FRAMEWORK MANAGERMANAGER
Relational DatabaseRelational Database & &
DMRDMR
REPORTINGREPORTING
IBM COGNOSIBM COGNOS