gic2011 aula3-ingles
TRANSCRIPT
Information & Knowledge Management - Class 3
Marielba ZacariasProf. Auxiliar DEEI
FCT I, Gab 2.69, Ext. 7749Data-warehousing
[email protected]://w3.ualg.pt/~mzacaria
Data Warehousing
Data collection for analysis and reporting taks
Historical data
Stored in a distinct environment from operational data
Structure different from data-bases
Why
Operational and analitical data have different requirements in terms of
usage (frequency, response time)
hardware
software
structure
7
operationalAtomic
dwDept.dw
individualdw
Detaileddaily
current valueHigh access prob.
Application oriented
More granularTemporalIntegrated
Subject orientedSumarized
derived,Some primitive
Typical of MarketingEngineeringProductionAccounting
temporalAd-hoc
HeuristicNão-repetitive
Oriented to PC orworkstations
The “arquitected” environment”
“data-marts”
8
J. Jones123 Main St.Credit - AA
1986-87J. Jones
456 High St.Credit - B
1987-89J. Jones
456 High St.Credit - A
1989 – pte.J. Jones
123 Main St.Credit - AA
Jan – 4101Fev – 4209Mar- 4175Apr - 4215
Clientes Desde 1982Com saldos
> 5,000 e crédito
>= B
operacional Atomicodw Dept. individual
MonthlySales?
Jones Credit?
Jones Credit
History?
Client typesin analysis?
Type of questions
Data-warehouse designRequirement Gatherings
Physical Environment Setup
Data Modeling
ETL
OLAP Cube Design
Front End Development
Report Development
Performance Tuning
Query Optimization
Quality Assurance
Rolling out to Production
Production Maintenance
Incremental Enhancements
Requirements Gathering
Take into account users
Executive with little time and knowledge about technical terms
Interviews, JAD sessions
User Reporting/Analysis Requirements
Hardware, training requirements
Data source identification
Concrete project plan
Physical Environment Setup
Setup Servers, DBMS and databases, ETL, OLAP Cubes and reporting services
Create three environments
development, testing, production
Data-modeling
Depends on initial data source identification
Conceptual, logical and physical data modeling
Should be related to the information
architecture!!!!
Data ModelingDimensional Approach
Transactional data is partitioned in facts
Numeric transaction data
products ordered, price
Dimensions
provide context for facts
order date, customer name, product number, location info, salesperson
Dimensional Approaches
Star
Fact table (typically a transaction)
Dimensions (context of the transaction)
Snowflake
Dimensions indirectly linked to fact tables
OLAP Cube DesignSpecification of detailed reporting needs in terms of the multi-dimensional structure previously defined (star or snowflake), but regarded as a n-dimensional cube
star/snowflake and cubes are pretty much the same thing
cubes are more appropriate for not IT users
SQL ServerIntegration Examples II
Qualitative data
Description term ActionId
team meeting 18
hr distribution 19
project list 19
team meeting 19
hr distribution 26
project list 26
claims application 27
claims application 28
cards application maintenance 29
claims application integration 30
hr distribution 31
project list 31
claims application 34
claims application 35
hr distribution 36
project list 36
Front-end developmentFront-ends range from
in-house development with scripting languages php, asp, or perl
to off-the-shelf products such as Crystal Reports or higher-end products such as Actuate
OLAP vendors also offer front-ends of their own
Report DevelopmentDerived from requirements
Main point of contact between the data-warehouse and users
User customization
Report Delivery (web, e-mail, sms, file formats)
Access privileges
Performance Tuning
ETL
Query Processing
Users loose interest after 30 sec!
Query optimization
Report Delivery
Query Optimization
Understand how your DBMS executes queries
Store intermediate results in temporary tables
Query Optimization tips
Use indexes
Partition tables (vertically and horizontally)
De-normalize (less joins)
Server Tuning
Quality Assurance
Test plan with quality criteria for data
Critical success factor
Often overlooked
Performed by people with knowledge of the business data not data-warehouses
Resistance
Rolling to production
Seems easy but..
Putting everyone online may take a full week in some cases
Online access can be as simple as sending a link by e-mail
Production MaintenanceBackup and recovery processes
Crisis Management
Monitoring end-user usage
Capture runaways queries before whole system is slowed down
To measure usage for ROI calculations and future enhancements
Incremental enhancements
Accomplish small changes such as changing original geographical designations
A company may add new sales regions
No matter how simple, never do them directly in production environment