introduction to data warehouse by sathish yellanki
DESCRIPTION
Introduction Notes To Data WarehousingTRANSCRIPT
Sunday, August 31, 2014 Data Warehouse Concepts By Sathish Yellanki Slide No : 1
Welcome To The World of Data Analysis And Business Intelligence
Sunday, August 31, 2014 Data Warehouse Concepts By Sathish Yellanki Slide No : 2
What is A Data Warehouse?
Sunday, August 31, 2014 Data Warehouse Concepts By Sathish Yellanki Slide No : 3
• A Data Warehouse is A Relational OR Multidimensional Database That is Designed For Query And Analysis.
• Data Warehouse is Not Optimized For Transaction Processing Instead Optimized For Query And Data Analysis.
• Data Warehouses Are Designed To Consolidate Historical And Transactional Data Derived From Multiple Data Sources.
• Data Warehouses Separate Analysis Workload From Transaction Workload And Enable An Organization To Consolidate Data From Several Data Sources.
• Data in A Data Warehouse Usually Stores Many Months OR Years of Data To Support Historical Analysis.
• The Data in A Data Warehouse is Typically Loaded Through An Extraction, Transformation, And Loading (ETL) Process From One OR More Data Sources.
• The Data Sources That Feed The Data For The Data Warehouse Can Be OLTP Applications, Mainframe Applications, OR External Data Providers.
• The End Users of The Data Warehouse Perform Data Analyses That is Often Time-Related.
• Sophisticated Analyses Include Trend Analyses And Data Mining, Which Use Existing Data To Forecast Trends OR Predict Futures.
• The Data Warehouse Typically Provides The Foundation For A Business Intelligence Environment.
Sunday, August 31, 2014 Data Warehouse Concepts By Sathish Yellanki Slide No : 4
What Are The Key Characteristics of A Data Warehouse? • Within The Data Warehouse Some Data is De-Normalized For
Query Simplification And To Improve Analysis Performance. • Large Amounts of Historical Data is Used. • Queries Often Retrieve Large Amounts of Data. • Both Planned And Ad-Hoc Queries Are Common. • The Data Load is Controlled By Quantum And By Time.
What is The Key Point For Successful Data Warehouse? • High Data Throughput is The Key Point For A Successful Data
Warehouse.
What Are The Common Data Warehousing Tasks? • Configure A Database For Use As A Data Warehouse. • Take The Initial Steps in Consolidating The Data. • Begin To Define The Target Objects in The Warehouse. • Define Strategies For Extracting, Transforming, And Loading Data into
The Target. • Consider Performance Issues And Advanced ETL Concepts. • Deploy To Target Schemas And Execute ETL Logic. • Analyze Metadata Dependencies. • Write Efficient SQL. • Refresh The Data Warehouse. • Optimize Operations For Query Analysis. • Eliminate Performance Bottlenecks. • Review Some Basics of Data Warehouse Backup And Recovery. • Review Some Basics of Data Warehouse Security.
Sunday, August 31, 2014 Data Warehouse Concepts By Sathish Yellanki Slide No : 5
What Does An Data Warehouse Environment Includes? • An Extraction, Transportation, Transformation, And Loading
(ETL) Solution. • An Online Analytical Processing (OLAP) Engine. • Client Analysis Tools. • Applications That Manage The Process of Gathering Data And
Delivering it To Business Users.
What is Data Warehouse By Definition? • A Data Warehouse As Per Bill Inmon is Collection of Data in
Support of Management's Decision Making Process Associated With Properties Like • Subject-Oriented • Integrated • Time-Variant • Non-Volatile
Subject Oriented • Data Warehouses Are Designed To Help The End User To
Analyze Data Specific To An Area of The Business.
Integrated • Data Warehouses Must Put Data From Disparate Sources into A
Consistent Format. • Data Warehouses Must Resolve Problems Such As
• Naming Conflicts
• Inconsistencies Among Units of Measure
Sunday, August 31, 2014 Data Warehouse Concepts By Sathish Yellanki Slide No : 6
Nonvolatile • Nonvolatile Means That, Once Entered into The Warehouse,
Data Should Not Change.
• Nonvolatile is Logical Because The Purpose of A Warehouse is To Enable The End User To Analyze What Has Occurred.
• The Final Rule is Historical Data in A Data Warehouse Should Never Be Altered.
Time Variant • In Order To Discover Trends in Business, Analysts Need Large
Amounts of Data.
• This is Very Much in Contrast To Online Transaction Processing (OLTP) Systems, Where Performance Requirements Demand That Historical Data Be Moved To An Archive.
• Historical Data is Kept in A Data Warehouse With A Time Span of 3 Months, 6 Months, 12 Months, OR Even Older Data From A Data Warehouse.
What is Ralph Kimball View in Building Data Warehouse? • A Data Warehouse is A Copy of Transaction Data Specifically
Structured For Query And Analysis As Per The Subject of Analysis in Business Trends, Which is A Functional View of A Data Warehouse.
• Kimball Did Not Address How The Data Warehouse is Built, Rather He Focused on The Functionality of A Data Warehouse.
Sunday, August 31, 2014 Data Warehouse Concepts By Sathish Yellanki Slide No : 7
Contrasting Environments of OLTP And Data Warehousing
OLTP
Complex Data Structures (3NF Databases)
Few
Many
Normalized DBMS
Rare
Data Warehouse
Multidimensional Data Structures
Many
Some
Denormalized DBMS
Common
Data Structures
Indexes
Joins
Duplicated Data
Derived Data And Aggregates
Sunday, August 31, 2014 Data Warehouse Concepts By Sathish Yellanki Slide No : 8
Difference Between Data Warehouses And OLTP Systems Workload • Data Warehouses Are Designed To Accommodate Ad-Hoc Queries. • We Might Not Know The Workload of The Data Warehouse in Advance,
Hence A Data Warehouse Should Be Optimized To Perform Well For A Wide Variety of Possible Query Operations.
• OLTP Systems Support Only Predefined Operations. The Applications Might Be Specifically Tuned OR Designed To Support Only Specified Operations.
Data Modifications • A Data Warehouse is Updated on A Regular Basis By The ETL Process
Which is Run Nightly OR Weekly Using Bulk Data Modification Techniques.
• The End Users of A Data Warehouse Do Not Directly Update The Data Warehouse.
• In OLTP Systems, End Users Routinely Issue Individual Data Modification Statements To The Database.
• The OLTP Database is Always Up To Date, And Reflects The Current State of Each Business Transaction.
Schema Design • Data Warehouses Often Use Denormalized OR Partially Denormalized
Schemas Which is Generally A Star Schema To Optimize Query Performance.
• OLTP Systems Often Use Fully Normalized Schemas To Optimize Update OR Insert OR Delete Performance, And Should Guarantee Data Consistency.
Sunday, August 31, 2014 Data Warehouse Concepts By Sathish Yellanki Slide No : 9
Typical Operations • A Typical Data Warehouse Query Scans Thousands OR Millions of Rows. • A Typical OLTP Operation Accesses Only A Handful of Records.
Historical Data • Data Warehouses Usually Store Many Months OR Years of Data, To
Support Historical Analysis. • OLTP Systems Usually Store Data From Only A Few Weeks OR Months,
Which is Only Historical Data As Needed To Successfully Meet The Requirements of The Current Transaction.
Can We Visualize The Data Warehouse Architecture? • Different Data Warehousing Systems Have Different Types of
Structural Requirements. • Some Data Warehouses May Have An Operational Data
Store(ODS), While Some May Have Multiple Data Marts. • Some Data Warehouses May Have A Small Number of Data
Sources, While Some May Have Dozens of Data Sources. • Because of These Dynamic Differences That Arise A Data
Warehouse is Considered As A Collection of Different Layers Rather Than Specifics of Any One System.
What Are The Different Layers of A Data Warehouse? • Data Source Layer • Data Extraction Layer • Staging Area • ETL Layer • Data Storage Layer
• Data Logic Layer • Data Presentation Layer • Metadata Layer • System Operations Layer
Sunday, August 31, 2014 Data Warehouse Concepts By Sathish Yellanki Slide No : 10
Components of The Data Warehouse Architecture
Metadata Layer
Data Source Layer
Data Extraction
Layer
Staging Area
ETL Layer
Data Storage Layer
Data Logic Layer
Data Presentation
Layer
System Operations Layer
Data Source Layer • Data Source Layer Represents The Different Data Sources That
Feed Data into The Data Warehouse. • The Data Source Can Be of Any Format Which Can Be Plain
Text File, Relational Database, Other Types of Database, Excel File.
Sunday, August 31, 2014 Data Warehouse Concepts By Sathish Yellanki Slide No : 11
• Many Different Types of Data Can Be A Data Source • Operations Data, Such As Sales Data, HR Data, Product Data,
Inventory Data, Marketing Data, Systems Data. • Web Server Logs With User Browsing Data. • Internal Market Research Data. • Third-Party Data, Such As Census Data, Demographics Data, OR
Survey Data.
Data Extraction Layer • In Data Extraction Layer The Data Gets Pulled From The Data
Source into The Data Warehouse System. • In The Data Extraction Layer There is Likely Some Minimal Data
Cleansing, But There is Unlikely Any Major Data Transformation.
Staging Area • Staging Area is The Area Where Data Sits Prior To Being
Scrubbed And Transformed into A Data Warehouse OR Data Mart.
• Having One Common Area Makes it Easier For Subsequent Data Processing OR Integration in Data Mart OR Data Warehouse.
ETL Layer • ETL Layer is The Layer Where Data Gains its "Intelligence", As
Logic is Applied To Transform The Data From A Transactional Nature To An Analytical Nature.
• ETL Layer is Also The Layer Where Data Cleansing is Implemented.
Sunday, August 31, 2014 Data Warehouse Concepts By Sathish Yellanki Slide No : 12
• The ETL Design Phase is Often The Most Time-Consuming Phase in A Data Warehousing Project, To Make The Process More Quick And Productive An ETL Tool is Often Used in This Layer.
Data Storage Layer • Data Storage Layer is The Layer Where The Transformed And
Cleansed Data Actually Resides OR Sits. • Based on Scope And Functionality of The Project, Three Types
of Entities Can Be Found Here • Data Warehouse • Data Mart • Operational Data Store (ODS)
• Depending on The Requirement of Analysis Conducted Upon The System, We May Have Just One of The Three, Two of The Three, OR All of The Three Types.
Data Logic Layer • Data Logic Layer is The Layer Where Business Rules Are Stored. • Business Rules Stored in Data Logic Layer Do Not Affect The
Underlying Data Transformation Rules, But The Business Rules Do Affect The Way The Report Looks Like.
Data Presentation Layer • Data Presentation Layer Refers To The Information That
Reaches The End Users. • Data Presentation Layer Can Be in A Form of A Tabular OR
Graphical Report in A Browser.
Sunday, August 31, 2014 Data Warehouse Concepts By Sathish Yellanki Slide No : 13
• The Report Can Be An Emailed Report That Gets Automatically Generated And Sent Everyday, OR An Alert That Warns Users of Exceptions, Among Others.
• Usually An OLAP Tool AND/OR A Reporting Tool is Used in The Data Presentation Layer.
Metadata Layer • Metadata Layer is The Layer Where Information About The Data
Stored in The Data Warehouse System is Stored. • A Logical Data Model Would Be An Example of Something That's
in The Metadata Layer. • A Metadata Tool is Often Used To Manage Metadata.
System Operations Layer • System Operations Layer Includes Information on How The
Data Warehouse System Operates, Such As • ETL Job Status • System Performance • User Access History
What Are The Different Data Warehouse Architectures? • Data Warehouses And Their Architectures Vary Depending
Upon The Specifics of An Organization's Situation. • Before Deciding The Type of Data Warehouse We Have To Study
The Constructive Requirement of The Client And His Need in Data Analysis.
• The Requirement Phase of The Data Warehouse Decides This.
Sunday, August 31, 2014 Data Warehouse Concepts By Sathish Yellanki Slide No : 14
Three Common Architectures • Basic Data Warehouse Architecture • Staging Area Based Data Warehouse Architecture • Staging Area And Data Marts Integrated Data Warehouse Architecture
Basic Data Warehouse Architecture
Data Source
Operational System
Operational System
Flat Files
Metadata
Raw Data Summary
Data
Warehouse
Analysis
Reporting
Mining
Users
Sunday, August 31, 2014 Data Warehouse Concepts By Sathish Yellanki Slide No : 15
Staging Area Based Data Warehouse Architecture
Data Source
Operational System
Operational System
Flat Files
Metadata
Raw Data Summary
Data
Warehouse
Analysis
Reporting
Mining
Users
Staging Area
Sunday, August 31, 2014 Data Warehouse Concepts By Sathish Yellanki Slide No : 16
Staging Area And Data Marts Integrated Data Warehouse Architecture
Data Source
Operational System
Operational System
Flat Files
Metadata
Raw Data
Summary Data
Warehouse
Analysis
Reporting
Mining
Users
Staging Area
Sales
Purchasing
Inventory
Data Marts
Sunday, August 31, 2014 Data Warehouse Concepts By Sathish Yellanki Slide No : 17
Let Us Understand The Data Warehouse Design
Sunday, August 31, 2014 Data Warehouse Concepts By Sathish Yellanki Slide No : 18
What Are The Steps in Data Warehouse Project Cycle? • Requirement Gathering • Physical Environment Setup • Data Modeling • Extraction, Transformation And Loading(ETL) • OLAP Cube Design • Front End Development • Report Development • Performance Tuning • Query Optimization • Quality Assurance • Rolling Out To Production • Production Maintenance • Incremental Enhancements
What Are Components Set For Building A Data Warehouse? Task Description • Typically Needs To Be Accomplished During This Particular Data
Warehouse Design Phase. Time Requirement • A Rough Estimate of Time This Particular Data Warehouse Task Takes. Deliverables • Documents Fully Describing The Steps And Results of Particular Task,
Important For Consultants To Communicate Results To Clients. Possible Pitfalls • Things To Watch Out For Failure in Implementing Successful Data
Warehouse.
Sunday, August 31, 2014 Data Warehouse Concepts By Sathish Yellanki Slide No : 19
Requirements Gathering
Sunday, August 31, 2014 Data Warehouse Concepts By Sathish Yellanki Slide No : 20
Task Description • Requirements Gathering is The First Step in Which The Project Team
Will Engage on Highest Priority. • End Users May Not Be Familiar With The Data Warehousing Process OR
Concept, Hence The Help of The Business Sponsor is Essential. • Requirement Gathering Can Be Conducted As One-To-One Meetings OR
As Joint Application Development (JAD) Sessions, Where Multiple People Are Talking About The Project Scope in The Same Meeting.
• Requirements Gathering Phase May Identify The End User Reporting OR Analysis Requirements, And The Project Team Should Spend The Remaining Period of Time Trying To Satisfy These Requirements.
• Based on The Information Gathered Above, A Disaster Recovery Plan Needs To Be Developed So That The Data Warehousing System Can Recover From Accidents That Disable The System.
Time Requirement • 2 To 8 weeks.
Deliverables • A List of Reports OR Cubes To Be Delivered To The End Users By The
End of This Current Phase. • An Updated Project Plan That Clearly Identifies Resource Loads And
Milestone Delivery Dates.
Possible Pitfalls • Data From Multiple Sources Can Be Critical For Management. • Political Battles That Center on The Willingness of Information Sharing. • Unwillingness of Certain Groups To Release OR Participate in DWH. • High Level Influence To Make Sure Everyone Cooperates.
Sunday, August 31, 2014 Data Warehouse Concepts By Sathish Yellanki Slide No : 21
Physical Environment Setup
Sunday, August 31, 2014 Data Warehouse Concepts By Sathish Yellanki Slide No : 22
Task Description • Set Up The Physical Servers And Databases And Configure The Servers
And Databases. • It is Necessary To Set Up A Development And A Production
Environment At The Minimum OR Development, Testing, And Production Environments Separately.
• The Different Physical Environments Should Be Set Up And Configured For The Different Processes That Are Initiated And Operated in Data Warehouse, Each Environment Individually For ETL, OLAP Cube, And Reporting.
• It is Best For The Different Environments Are Configured With Their Own Distinct Application And Database Servers.
Time Requirement • The Servers And Database Configurations Should Take Less Than 1
Week.
Deliverables • Hardware OR Software Setup Document For All of The Environments,
Including Hardware Specifications, And Scripts OR Settings For The Software.
Possible Pitfalls • To Save on Capital, Often Data Warehousing Teams Will Decide To Use
Only A Single Database And A Single Server For The Different Environments.
• Instead of Having An Environment Physically Separated The Teams May Go For Logical Separation, Which Can Effect The Production And Development Parallel.
Sunday, August 31, 2014 Data Warehouse Concepts By Sathish Yellanki Slide No : 23
Data Modeling
Sunday, August 31, 2014 Data Warehouse Concepts By Sathish Yellanki Slide No : 24
Task Description • Data Modeling is The Most Important And Crucial Step in The Data
Warehousing Project. The Foundation of The Data Warehousing System Totally Depends on The Data Model.
• A Good Data Model Will Allow The Data Warehousing System To Grow Easily, As Well As Allowing For Good Performance.
• In Data Warehousing Project, The Logical Data Model is Built Based on User Requirements, And Then it is Translated into The Physical Data Model.
• Part of The Data Modeling Process is Mostly Dedicated in Identification of Data Sources in The Actual Relational System.
Time Requirement • 2 To 6 Weeks.
Deliverables • Identification of Data Sources. • Logical Data Model. • Physical Data Model.
Possible Pitfalls • It is Essential To Have A Subject-Matter Expert As Part of The Data
Modeling Team, Which Can Be Costly Resource Some Times. • Subject Matter Expert Can Be A Person Who Can Be An Outside
Consultant OR Someone In-House Who Has Extensive Experience in The Industry, Which Could Be Scarcity.
• The Absence of The Subject Matter Expert Makes it Difficult To Get A Definitive Answer on Many of The Questions That Arise in Reporting Requirement, And The Entire Project Gets Dragged Out.
Sunday, August 31, 2014 Data Warehouse Concepts By Sathish Yellanki Slide No : 25
Extraction, Transformation And Loading(ETL)
Sunday, August 31, 2014 Data Warehouse Concepts By Sathish Yellanki Slide No : 26
Task Description • The ETL (Extraction, Transformation, Loading) Process Typically Takes
The Longest Time To Develop, ETL Process Can Easily Take Up To 50% of The Data Warehouse Implementation Cycle OR Longer.
• The Main Reasons Are • Time To Get The Source Data • Understand The Necessary Columns • Understand The Business Rules • Understand The Logical And Physical Data Models
Time Requirement • 1 To 6 Weeks.
Deliverables • Data Mapping Document • ETL Script / ETL Package In The ETL Tool
Possible Pitfalls • Major Time Because of Lack of Experience Too Little Development Time
is Allocated For The Development Team To Develop The ETL. • End Users Can Excuse The Formatting Mistakes OR Pattern Combination
Mistakes But Cannot Tolerate The Data Which is Wrong by Values. • Lack of Experienced Personnel in The ETL Team Can Make The Process
More Complicated Than Necessary Creating Difference of Opinions. • In ETL Design, The Primary Goal Should Be To Optimize Load Speed
Without Sacrificing on Quality, Which is Many Times Not At All Applied. • Design Goal is To Cover All Possible Future Uses, Including Practical OR
Just A Figment of Someone's Imagination, Costing The Performance of The Entire Data Warehousing System.
Sunday, August 31, 2014 Data Warehouse Concepts By Sathish Yellanki Slide No : 27
OLAP Cube Design
Sunday, August 31, 2014 Data Warehouse Concepts By Sathish Yellanki Slide No : 28
Task Description • The Design of The OLAP Cube Can Be Derived From The Requirement
Gathering Phase. • End Users Have Some Idea on What They Want, But it is Difficult For
Them To Specify The Exact Report OR Analysis They Want To See. • It is Usually A Good Idea To Include Enough Information So That End
User Can Feel That The Data Warehouse is Containing At-Least Some Combination of Data As Expected.
• OLAP Cube is A Data Structure That Keeps A Pattern Based Data Combinations Very Specific To The Requirement of The Reports of The End User.
• The Data Warehousing is An Iterative Process, Hence No One Can Ever Meet All The Requirements All At Once in One Single Data Warehouse.
Time Requirement • 1 To 2 Weeks.
Deliverables • Documentation Specifying The OLAP Cube Dimensions And Measures. • Actual OLAP Cube OR Report.
Possible Pitfalls • We Should Make Sure That The OLAP Cube-Building Process is
Optimized And it Should Be Tested Thoroughly For its Performance And Optimization Before Execution.
• It is Common For The Data Warehouse To Be on The Bottom of The Nightly Batch Load, And After The Loading of The Data Warehouse, There Usually Isn't Much Time Remaining For The OLAP Cube To Be Refreshed.
Sunday, August 31, 2014 Data Warehouse Concepts By Sathish Yellanki Slide No : 29
Front End Development
Sunday, August 31, 2014 Data Warehouse Concepts By Sathish Yellanki Slide No : 30
Task Description • Data Warehouse Has No Recognition And Identification if The Users
Cannot Visualize The Reports, Regardless of its Strength of The OLAP Engine And The Integrity of The Data.
• The Front End Development is An Important Task And Challenging Environment of A Data Warehouse.
• The Reports Should Be Delivered Over The Web, Hence The Front-End Developers Should Conform To The Interne And Standardization To Standard Browsers.
• The Front-End Technology For Development Can Range From An Internal Front-End Development Team Using Scripting Languages Such As ASP, PHP, OR PERL, To Off-The-Shelf Products Such As Seagate Crystal Reports, To The More Higher-Level Products Such As Actuate. OR Business Objects OR COGNOS OR BI Publisher.
Time Requirement • 1 To 4 Weeks.
Deliverables • Front End Deployment Documentation
Possible Pitfalls • End Users Does Not Care The Complexity And Advancement of The
Front-End Technology And Least Cares The Infrastructure. • He Expects That The Information He Needs For Analysis Should Be
Presented in Timely Manner, With Higher Degree of Accuracy And Consistency.
• The Clients Rigidness Can Be A Major Concern For The Development Teams.
Sunday, August 31, 2014 Data Warehouse Concepts By Sathish Yellanki Slide No : 31
Report Development
Sunday, August 31, 2014 Data Warehouse Concepts By Sathish Yellanki Slide No : 32
Task Description • Report Specification Typically Comes Directly From The Requirements
Phase Itself.
• Report Development Plays A Very Important Role in Determining The Success of The Data Warehousing Project As End User Final Delivery Point is Reports.
• The Criticality of Report Design Depends on The Kind of Information And Arrangement The Client Expects in Presentation, The Dynamism Also Plays A Very Important Role Here.
• The Major Tasks of Accomplishments in report Design Are
User Customization
The Report Development Process Need To Consider The Metrics And Filter Standards Upon The Data.
Report Delivery
What Report Delivery Methods Are Needed? in Addition To Delivering The Report To The Web Front End, Other Possibilities That May Include Are Delivery Via Email, Via Text Messaging, OR in Some Form of Spreadsheet.
Access Privileges
Special Attention Needs To Be Paid To Who Should Have What Access To Which Information.
• Report Development Does Not Happen Only During The Implementation Phase of The Data Warehouse, It Can Even Continue After The System Goes into Production State As There May Be Requests For Additional Reports As To The Strategy of The Business Requirements of The Customer.
Sunday, August 31, 2014 Data Warehouse Concepts By Sathish Yellanki Slide No : 33
• The Categories of Report Requests Raised By The Client Can Include Two Different Situations Data is Already Available in The Data Warehouse • In This Case, it is Straightforward To Develop The New Report into
The Front End. • There is No Need To Wait For A Major Production Push in This Case
Before Making New Reports Available. Data is Not Yet Available in The Data Warehouse • This Means That The Request Needs To Be Prioritized And Put into A
Future Data Warehousing Development Cycle.
Time Requirement • 1 To 2 Weeks.
Deliverables • Report Specification Documentation. • Reports Set Up in The Front End OR Reports Delivered To User's
Preferred Channel. • Confirmation Document on The Tasks That Are Completed As To The
Standards of The Client.
Possible Pitfalls • Make Sure The Exact Definitions of The Report Are Communicated To
The Users, Else The Interpretation of The Report Can Be Erroneous. • There Can Be Some Difference of Opinion From The Developers To
Subject Experts in Understanding The State of The Terminology And Jargon of The Business Domains.
• The Type of The Reporting Tool Determines The Complexity of Development.
Sunday, August 31, 2014 Data Warehouse Concepts By Sathish Yellanki Slide No : 34
Performance Tuning
Sunday, August 31, 2014 Data Warehouse Concepts By Sathish Yellanki Slide No : 35
Task Description • Three Major Areas Where A Data Warehousing System Uses
Performance Tuning Extraction, Transformation And Loading • The Data Load is Usually Very High Time-Consuming Process, Data
Warehousing Related Batch Jobs Are Typically of Lower Priority. Query Processing • In A ROLAP Environment OR in A System Where The Reports Are
Run Directly Against The Relationship Database, Query Performance Can Be An Issue.
• It is Ideal For The Data Warehousing Team To Invest Some Time To Tune The Query, Especially The Most Popular Ones.
Report Delivery • End Users May Experience Significant Delay in Receiving Reports
Due To Factors Other Than The Query Performance, Which Include Network Traffic, Server Setup, And The Way Front-End is Built.
Time Requirement • 3 To 5 Days.
Deliverables • Performance Tuning Document Along With Goal And Result
Possible Pitfalls • Make Sure The Development Environment Mimics The Production
Environment As Much As Possible. • Performance Enhancements Seen on Less Powerful Machines
Sometimes Do Not Materialize on The Larger, Production-Level Machines.
Sunday, August 31, 2014 Data Warehouse Concepts By Sathish Yellanki Slide No : 36
Query Optimization
Sunday, August 31, 2014 Data Warehouse Concepts By Sathish Yellanki Slide No : 37
Task Description • For Any Production Database, SQL Query Performance Becomes An
Issue Sooner or Later in The Process of The Project. • Having Long-Running Queries Consumes System Resources Making The
Server And Application Run Slowly, May Lead To Table Locking And Data Corruption Issues.
Principles For Query Optimization • We Should Understand How Our Database is Executing Our Query. • Retrieve As Little Data As Possible Through The Query. • Store Intermediate Results That Are Produced By The Query. • Sometimes Logic For A Query Can Be Quite Complex. Due To The Use of
Sub-Queries, Inline Views, And Union-Type Statements, in Which The Results Are Not Stored in The Database, But Are Immediately Used Within The Query, Leading To Performance Issues.
• To Increase Query Performance Store The Intermediate Results in A Temporary Table, And Break Up The Initial SQL Statement into Several SQL Statements.
• Build An Index on The Temporary Table To Speed Up The Query Performance.
Query Optimization Strategies. • Use Indexing Standards. • Aggregate Table With The Aggregated Data. • Vertical Partitioning • Horizontal Partitioning • De-Normalization • Server Tuning
Sunday, August 31, 2014 Data Warehouse Concepts By Sathish Yellanki Slide No : 38
Quality Assurance
Sunday, August 31, 2014 Data Warehouse Concepts By Sathish Yellanki Slide No : 39
Task Description • Once The Development Team Declares That Everything is Ready For
Further Testing, The QA Team Takes Over The Responsibility. • The QA Team is Always Proposed From The Client. • Usually The QA Team Members Will Know Little About Data
Warehousing, Hence May Resend The Need To Have To Learn Another Tool OR Tools.
• Sometimes The QA Process is Overlooked, Because of Which The Target Time of The Project May Increase.
Time Requirement • 1 To 4 Weeks.
Deliverables • QA Test Plan • QA Verification That The Data Warehousing System is Ready To Go into
Production
Possible Pitfalls • Usually The QA Team Members Know Little About Data Warehousing,
Because of Which The Process Can Become A Vicious Cyle. • Make Sure The QA Team Members Get Enough Education So That They
Can Complete The Testing Themselves. • The Client May Not Have Metrics OR Parameters That Are Definite For
The Quality He Expects. • The Data Warehousing Team Should Know The Standards of The QA
That Are Defined By The Client, Else The Directionality of The Process Can Reach Great Deviations.
• QA Parameters May Be Logical Terms OR Physical By Nature.
Sunday, August 31, 2014 Data Warehouse Concepts By Sathish Yellanki Slide No : 40
Rollout To Production
Sunday, August 31, 2014 Data Warehouse Concepts By Sathish Yellanki Slide No : 41
Task Description • Once The QA Team Gives Assurance on The Standards And Quality, it is
Time For The Data Warehouse System To Go Live. • Depending on The Number of End Users, Roll Out To Production Can
Take Up To A Full Week To Bring Everyone Online. • It is Easy When The End Users Access The Data Warehouse Over The
Web, Making Going Production Easier By Sending An URL Via Email.
Time Requirement • 1 To 3 Days.
Deliverables • Delivery of The Data Warehousing System To The End Users.
Possible Pitfalls • We Have To Take Care of The User Education Needs For Addressing The
System.
Production Maintenance Task Description • Once The Data Warehouse Goes into Production, it Needs To Be
Maintained By Change Requests That May Arise. • Tasks Such As Regular Backup And Crisis Management Become
Important And Should Be Planned Very Properly. • We Have To Consistently Monitor End User Usage of The System. • The Purpose of Monitoring The Data Warehouse
• To Capture Any Runaway Requests So That They Can Be Fixed Before Slowing The Entire System Down.
• To Understand How Much Users Are Utilizing The Data Warehouse For Return-on-investment Calculations And Future Enhancement Considerations.
Sunday, August 31, 2014 Data Warehouse Concepts By Sathish Yellanki Slide No : 42
Time Requirement • It is Always Ongoing
Deliverables • Consistent Availability of The Data Warehousing System To The End
Users.
Possible Pitfalls • Usually By This Time Most of The Developers Could Have Left The
Project, So it is Essential That Proper Documentation Be Left For Those Who Are Handling Production Maintenance.
Incremental Enhancements Task Description • Once The Data Warehousing System Goes Live, There Are Often Needs
For Incremental Enhancements, Which Are Simple Small Changes That Follow The Business Itself.
Deliverables • Change Management Documentation • Actual Change To The Data Warehousing System
Possible Pitfalls • Many Times The Changes Are Simple To Make, it is Very Tempting To
Just Go Ahead And Make The Change in Production. • Many Unexpected Problems Can Pop Up if Changes Are Taken Suddenly. • It is Strongly Recommend That The Typical Life Cycle is Followed
• Development • QA • Production