Download - Agile Data Warehousing
Agile Data WarehousingFrom Start to Finish
Presenter: Davide Mauri, Architect & Mentor, SolidQModerator: Alex Whittles
Technical Assistance
2
If you require assistance during the session, type your inquiry into the question pane on the right side.
Maximize your screen with the zoom button on the top of the presentation window
Type your questions in the question pane on the right side
Thank You Sponsors
Welcome to the Azure family!Try DocumentDB today!
http://documentdb.com
Solutions from Dell help you monitor, manage, protect and improve your SQL Server environment.
http://software.dell.com/sql-pass-vc-dell-sql-server-solutions
Planning on attending PASS Summit 2014? Start saving today!
• The world’s largest gathering of SQL Server & BI professionals
• Take your SQL Server skills to the next level by learning from the world’s SQL Server experts, in 190+ technical sessions
• Over 5000 attendees, representing 2000 companies, from 52 countries, ready to network & learn
Use discount code 24HOP14to save $200!
$1,895UNTIL SEPTEMBER
26, 2014
www.PASSSummit.com
Davide Mauri
SolidQ Mentor Board of Directors, SolidQ Italy Microsoft SQL Server MVP Works with managers to build
effective, tailor-made BI solutions for customers
@mauridb
Agile Data WarehousingFrom Start to Finish
Davide Mauri, Architect & Mentor, SolidQ
Agenda
What is a DWH, really? Agile: the only way to succeedEngineering the DWHETL Design PatternsETL AutomationTesting
What is a DWH, really?
The Data-Driven Age
Isn’t the DWH and “old” thing?
Big Data, In Memory and all the new stuff, can’t just replace the Data Warehouse?
The answer would be “yes”, if a DWH would be a simple “container” of data.
But it’s much more than this.
What is a DWH, really?
In this new era, data is like water.
Who will ever drink from untested, untrusted, uncertified data?
What is a DWH, really?
Would a manager or a decision maker, take a decision based on data of which he doesn’t know the source, the integrity and the correctness?
What is a DWH, really?
The Data Warehouse is the place where managers and decision makers will look for• Correct• Trusted• UpdatedData in order to make a conscious decision
What is a DWH, really?
The answer is now easy:
What is DWH, really?
A place to store consolidated data coming from the whole companyA place where cleanse, verify and certify dataA place where historic data is storedA place that holds the single version of truth (if there is one!)Forms the core of a BI solutionUser friendly Data models, designed to make data analysis easier
Modern Data Environment
MasterData
EDWData Mart
Big Data
UnstructuredData
BI Environment
Analytics Environment
StructuredData Data Scientist
Decision Maker
Agility: the only way to succeed
EDW: Reality Check
EDW is the trusted container of all company data
It cannot be created in “one day”
It has to grow and evolve with business needs.
It will never be 100% complete
The story so far
Adapt to Survive
“50% of requirements change in the first year of a BI project”
Andreas Bitterer, Research VP, Gartner
Agile Principles
Small design upfront. Prototype.
Delivery quickly, Deliver frequently.
Users are part of the development team!Feedback is a key part of the successThey’ll grow with the solution and the solution will grow with them
Embrace Changes!
http://agilemanifesto.org/principles.html
Agile Challenges
Delivery Quickly and Fast Challenge: keep high quality, no matter who’s doing the
work
Embrace Changes Challenge: don’t introduce bugs. Change the smallest part
possible. Use automatic Testing to preserve and assure data quality.
Engineering the DWH
Engineering the solution
To be Agile, some engineering practices needs to be included in our work model
Agility != Anarchy
Engineering: Apply well-known models Define, Apply & Enforce rules Automate and/or Check rules application Measure Test
24
Engineering the solution
Favor Kimball Approach (for user-facing models) Dimensional Modeling Fact & Measures Dimensions
Use views to introduce abstraction layers Reduce the “friction” between layers (source / stage / dwh
/ dm) Apply the “Information Hiding Principle”
Engineering the solution
Define & Force the application of well-known ETL patterns
SCD1 / SCD2 Incremental / Partition Load
Divide Et Impera At least two SSIS solutions many small SSIS Packages 5 Databases (STG, CFG, LOG, MD, DWH)
Design Pattern
“A general reusable solution to a commonly occurring problem within a given context”
Design Pattern
Generic ETL Pattern Partition Load Incremental/Differential Load
Generic DWH/BI Design Pattern Slowly Changing Dimension
SCD1, SCD2, ecc. Fact Table
Transactional, Snapshot, Temporal Snapshot
Design Pattern
Specific SQL Server Patterns Change Data Capture Change Tracking Partition Load SSIS Parallelism
ETL Automation
No Monkey Work!
Let the people think and let the machines do the «monkey» work.
Invest on Automation?
Faster development Reduce Costs Embrace Changes
Less bugs
Increase solution quality and make it consistent throughout the whole product
Hi-Level Vision
STGETLETL
OLTP DWH
ETL
Technical Process
Business Process
Technical Process
ETL Phases
«E» and «L» must be Simple, Easy and Straightforward Completely Automated Completely Reusable
«E» and «L» have ZERO value in a DWH Solution
Should be done in the most economic way
Automation Tools
PowerShell / .NET Supported by SMO & SSIS API Microsoft creates platforms not only products!
BIML – BI Markup Language From Varigence Free with BIDS Helper Full support with MIST
Metadata
Metadata is needed in order to make automation a repeatable process
Source to Staging Info Staging to DWH info
Dimension Keys Dimension & Fact Table relationship
Extended Properties + SQL Server DMVs help to maintain metadata coherent
Unit Testing
Unit Testing
Data MUST be tested.
It’s like water, remember?
If trust is lost, DWH is an#epicfail
Unit Testing
Before releasing anything data in the DW must be tested.
User has to validate a sample of data (e.g.:total invoice amount of January 2012)
That validated value will become the reference value
Before release, the same query will be executed again. If the data is the expected reference data then test is green otherwise the test fails
Unit Testing
Of course test MUST be automated when possible Visual Studio NUnit extensions
NBI BI.Quality
What to test? Aggregated results Specific values of some «special» rule Fixed bugs/tickets
40
The perfect BI process & architecture
AGILE BI
Iterative!
Questions?
Like What You Heard?
Davide will be presenting at PASS Summit 2014!
PreConference: Agile Data Warehousing: Start to Finish
General Session: Agile BI: Unit Testing and Continuos
Integration
Use discount code 24HOP14to save $200!
@mauridb
Thank You for Attending