etl with ssis 101 - belle's sql · pdf filedata warehousing etl understanding the data...
TRANSCRIPT
ETL with SSIS 101
read my blog sqlmusings.com follow me on twitter @sqlbelle contact me [email protected]
whoami? aka shameless plugs
� Donabel Santos
� SQL Geek (I love SQL Server!), PowerShell enthusiast
� Principal, QueryWorks Solutions� Training, Consulting
� Instructor, BC Institute of Technology
� Blogger – sqlmusings.com
� Tweep – @sqlbelle
� Author – some articles + upcoming PowerShell/SQL Server book (summer 2012)
read my blog sqlmusings.com follow me on twitter @sqlbelle contact me [email protected]
Agenda
� What ETL is
� SSIS Refresher
� Demo, demo, demo
� SSIS Tips
read my blog sqlmusings.com follow me on twitter @sqlbelle contact me [email protected]
Materials
� Presentation materials will be at my blog
� http://www.sqlmusings.com
� Demo files
� Slides
� Questions? Comments? Email me
read my blog sqlmusings.com follow me on twitter @sqlbelle contact me [email protected]
Extract Transform Load (ETL)
� A way to bridge “source” systems and
“destination” systems
� Mostly used for cleaning data and loading
data warehouse
read my blog sqlmusings.com follow me on twitter @sqlbelle contact me [email protected]
Lifecycle: Extract, Transform, Load
read my blog sqlmusings.com follow me on twitter @sqlbelle contact me [email protected]
Data Warehousing ETL
� Understanding the data – what needs to be captured and reported
� Also understanding your own Data Quality
� Naming conventions
� Schemas would also be helpful (stg, etl, tmp, dwh)
� Auditing
� When packages last run, when records are last updated etc
� NULL Values
� Not acceptable in a DW
� Will cause loss of data when joins are done
� Maps to an “Unknown” default value in the dimensions
read my blog sqlmusings.com follow me on twitter @sqlbelle contact me [email protected]
Data Warehousing ETL
� Staging
� Good for auditing, recoverability, releasing sources (tables, files etc)
� Checksum
� Allows you to easily figure out if any records have changed
� Helpful for Slowly Changing Dimensions (SCD)
� Surrogate Key (SK) vs Business Key/Alternate Key (BK or AK)
� SK is a “meaningless” number that is the PK in the DW
� Faster; no dependencies in source keys; avoids collision
� BK or AK is the PK in the source
read my blog sqlmusings.com follow me on twitter @sqlbelle contact me [email protected]
Data Warehousing ETL – Dealing with Changes
� Kimball – Slowly Changing Dimensions (SCD)
� SCD Type 1
� Only keep last value
� How: UPDATE
� When to use: errors, don’t want to keep history, don’t have a lot of storage
read my blog sqlmusings.com follow me on twitter @sqlbelle contact me [email protected]
SK FName City
1 Jamie Vancouver
SK FName City
1 Jamie Richmond
Data Warehousing ETL – Dealing with Changes
� Kimball – Slowly Changing Dimensions (SCD)
� SCD Type 2
� Keep all historical values
� HOW: Insert new records, update start and end dates
� When to use: you want to keep history
read my blog sqlmusings.com follow me on twitter @sqlbelle contact me [email protected]
SK FName City StartDate EndDate
1 Jamie Vancouver 2012-01-01 2012-03-10
2 Jamie Richmond 2012-03-01 NULL
Data Warehousing ETL – Dealing with Changes
� Kimball – Slowly Changing Dimensions (SCD)
� SCD Type 3
� Keep only current, plus last value before current
� How: one more column that has previous value
read my blog sqlmusings.com follow me on twitter @sqlbelle contact me [email protected]
SK FName PreviousCity EffectiveDate CurrentCity
1 Jamie Vancouver 2012-03-10 Richmond
SSIS REFRESHER
read my blog sqlmusings.com follow me on twitter @sqlbelle contact me [email protected]
SSIS Components – Data Flow
read my blog sqlmusings.com follow me on twitter @sqlbelle contact me [email protected]
Source Transformation Destination
Common Transformations
read my blog sqlmusings.com follow me on twitter @sqlbelle contact me [email protected]
Expressions
� Allow you to have more dynamic (and powerful) SSIS packages
� Allow you to set properties at runtime
� “Configurable” properties depend on which component you’re working with
read my blog sqlmusings.com follow me on twitter @sqlbelle contact me [email protected]
Expressions
read my blog sqlmusings.com follow me on twitter @sqlbelle contact me [email protected]
� Check if field is null� ISNULL(RegistrationDate)
� Checks if SellDate is June or later� DATEPART("mm",SellStartDate) > 6
� Calculates markup� ROUND(ListPrice / StandardCost,2) * 100
� Checks if product is a manufactured finished good and size is not a specified bike size� MakeFlag == TRUE && FinishedGoodsFlag == TRUE && Size !=
(DT_WSTR,10)@BikeSize
� Ternary Operator� UPPER(SUBSTRING(Color,1,1)) + LOWER(SUBSTRING(Color,2,15))
� File with today’s date� "C:\\SSIS\\File_" + (DT_WSTR, 30)(DT_DBDATE)GETDATE() + ".txt"
MISSION POSSIBLE
Our mission for today, should we choose to accept it …
read my blog sqlmusings.com follow me on twitter @sqlbelle contact me [email protected]
Sources
read my blog sqlmusings.com follow me on twitter @sqlbelle contact me [email protected]
Excel 2007
Access 2007
MySQL
Text File
XML
Mission Accomplished
read my blog sqlmusings.com follow me on twitter @sqlbelle contact me [email protected]
GENERAL SSIS TIPS
read my blog sqlmusings.com follow me on twitter @sqlbelle contact me [email protected]
Tip: Package Configuration Files
read my blog sqlmusings.com follow me on twitter @sqlbelle contact me [email protected]
Tip: EncryptSensitiveWithUserKey
read my blog sqlmusings.com follow me on twitter @sqlbelle contact me [email protected]
Tip: Credentials and Proxies
read my blog sqlmusings.com follow me on twitter @sqlbelle contact me [email protected]
What we covered
� What ETL is
� SSIS Refresher
� Demo, demo, demo
� SSIS Tips
read my blog sqlmusings.com follow me on twitter @sqlbelle contact me [email protected]
ETL/SSIS Resources
� BI Developer Network
� http://www.bidn.com/
� Jamie Thomson – SSIS Junkie
� http://sqlblog.com/blogs/jamie_thomson/
� Pragmatic Works
� http://pragmaticworks.com/
� Matt Masson – SSIS Team Blog
� http://blogs.msdn.com/b/mattm/
� Andy Leonard
� http://sqlblog.com/blogs/andy_leonard/
read my blog sqlmusings.com follow me on twitter @sqlbelle contact me [email protected]
read my blog sqlmusings.com follow me on twitter @sqlbelle contact me [email protected]
THANK YOU!
Presentation materials will be at my blog
http://www.sqlmusings.com
Demo files
Slides
Cheatsheets
Questions? Comments? Email me