etl with ssis 101 - belle's sql · pdf filedata warehousing etl understanding the data...

31
ETL with SSIS 101 read my blog sqlmusings.com follow me on twitter @sqlbelle contact me [email protected] 1

Upload: dinhdieu

Post on 07-Mar-2018

234 views

Category:

Documents


4 download

TRANSCRIPT

ETL with SSIS 101

read my blog sqlmusings.com follow me on twitter @sqlbelle contact me [email protected]

whoami? aka shameless plugs

� Donabel Santos

� SQL Geek (I love SQL Server!), PowerShell enthusiast

� Principal, QueryWorks Solutions� Training, Consulting

� Instructor, BC Institute of Technology

� Blogger – sqlmusings.com

� Tweep – @sqlbelle

� Author – some articles + upcoming PowerShell/SQL Server book (summer 2012)

read my blog sqlmusings.com follow me on twitter @sqlbelle contact me [email protected]

Agenda

� What ETL is

� SSIS Refresher

� Demo, demo, demo

� SSIS Tips

read my blog sqlmusings.com follow me on twitter @sqlbelle contact me [email protected]

Materials

� Presentation materials will be at my blog

� http://www.sqlmusings.com

� Demo files

� Slides

� Questions? Comments? Email me

[email protected]

read my blog sqlmusings.com follow me on twitter @sqlbelle contact me [email protected]

ETL

read my blog sqlmusings.com follow me on twitter @sqlbelle contact me [email protected]

Extract Transform Load (ETL)

� A way to bridge “source” systems and

“destination” systems

� Mostly used for cleaning data and loading

data warehouse

read my blog sqlmusings.com follow me on twitter @sqlbelle contact me [email protected]

Lifecycle: Extract, Transform, Load

read my blog sqlmusings.com follow me on twitter @sqlbelle contact me [email protected]

Data Warehousing ETL

� Understanding the data – what needs to be captured and reported

� Also understanding your own Data Quality

� Naming conventions

� Schemas would also be helpful (stg, etl, tmp, dwh)

� Auditing

� When packages last run, when records are last updated etc

� NULL Values

� Not acceptable in a DW

� Will cause loss of data when joins are done

� Maps to an “Unknown” default value in the dimensions

read my blog sqlmusings.com follow me on twitter @sqlbelle contact me [email protected]

Data Warehousing ETL

� Staging

� Good for auditing, recoverability, releasing sources (tables, files etc)

� Checksum

� Allows you to easily figure out if any records have changed

� Helpful for Slowly Changing Dimensions (SCD)

� Surrogate Key (SK) vs Business Key/Alternate Key (BK or AK)

� SK is a “meaningless” number that is the PK in the DW

� Faster; no dependencies in source keys; avoids collision

� BK or AK is the PK in the source

read my blog sqlmusings.com follow me on twitter @sqlbelle contact me [email protected]

Data Warehousing ETL – Dealing with Changes

� Kimball – Slowly Changing Dimensions (SCD)

� SCD Type 1

� Only keep last value

� How: UPDATE

� When to use: errors, don’t want to keep history, don’t have a lot of storage

read my blog sqlmusings.com follow me on twitter @sqlbelle contact me [email protected]

SK FName City

1 Jamie Vancouver

SK FName City

1 Jamie Richmond

Data Warehousing ETL – Dealing with Changes

� Kimball – Slowly Changing Dimensions (SCD)

� SCD Type 2

� Keep all historical values

� HOW: Insert new records, update start and end dates

� When to use: you want to keep history

read my blog sqlmusings.com follow me on twitter @sqlbelle contact me [email protected]

SK FName City StartDate EndDate

1 Jamie Vancouver 2012-01-01 2012-03-10

2 Jamie Richmond 2012-03-01 NULL

Data Warehousing ETL – Dealing with Changes

� Kimball – Slowly Changing Dimensions (SCD)

� SCD Type 3

� Keep only current, plus last value before current

� How: one more column that has previous value

read my blog sqlmusings.com follow me on twitter @sqlbelle contact me [email protected]

SK FName PreviousCity EffectiveDate CurrentCity

1 Jamie Vancouver 2012-03-10 Richmond

SSIS REFRESHER

read my blog sqlmusings.com follow me on twitter @sqlbelle contact me [email protected]

BIDS Tour

read my blog sqlmusings.com follow me on twitter @sqlbelle contact me [email protected]

SSIS Components – Data Flow

read my blog sqlmusings.com follow me on twitter @sqlbelle contact me [email protected]

Source Transformation Destination

Common Transformations

read my blog sqlmusings.com follow me on twitter @sqlbelle contact me [email protected]

Variables

read my blog sqlmusings.com follow me on twitter @sqlbelle contact me [email protected]

Expressions

� Allow you to have more dynamic (and powerful) SSIS packages

� Allow you to set properties at runtime

� “Configurable” properties depend on which component you’re working with

read my blog sqlmusings.com follow me on twitter @sqlbelle contact me [email protected]

Expressions

read my blog sqlmusings.com follow me on twitter @sqlbelle contact me [email protected]

Expressions

read my blog sqlmusings.com follow me on twitter @sqlbelle contact me [email protected]

� Check if field is null� ISNULL(RegistrationDate)

� Checks if SellDate is June or later� DATEPART("mm",SellStartDate) > 6

� Calculates markup� ROUND(ListPrice / StandardCost,2) * 100

� Checks if product is a manufactured finished good and size is not a specified bike size� MakeFlag == TRUE && FinishedGoodsFlag == TRUE && Size !=

(DT_WSTR,10)@BikeSize

� Ternary Operator� UPPER(SUBSTRING(Color,1,1)) + LOWER(SUBSTRING(Color,2,15))

� File with today’s date� "C:\\SSIS\\File_" + (DT_WSTR, 30)(DT_DBDATE)GETDATE() + ".txt"

MISSION POSSIBLE

Our mission for today, should we choose to accept it …

read my blog sqlmusings.com follow me on twitter @sqlbelle contact me [email protected]

Sources

read my blog sqlmusings.com follow me on twitter @sqlbelle contact me [email protected]

Excel 2007

Access 2007

MySQL

Text File

XML

Mission Accomplished

read my blog sqlmusings.com follow me on twitter @sqlbelle contact me [email protected]

GENERAL SSIS TIPS

read my blog sqlmusings.com follow me on twitter @sqlbelle contact me [email protected]

Tip: Package Configuration Files

read my blog sqlmusings.com follow me on twitter @sqlbelle contact me [email protected]

Tip: EncryptSensitiveWithUserKey

read my blog sqlmusings.com follow me on twitter @sqlbelle contact me [email protected]

Tip: Credentials and Proxies

read my blog sqlmusings.com follow me on twitter @sqlbelle contact me [email protected]

What we covered

� What ETL is

� SSIS Refresher

� Demo, demo, demo

� SSIS Tips

read my blog sqlmusings.com follow me on twitter @sqlbelle contact me [email protected]

ETL/SSIS Resources

� BI Developer Network

� http://www.bidn.com/

� Jamie Thomson – SSIS Junkie

� http://sqlblog.com/blogs/jamie_thomson/

� Pragmatic Works

� http://pragmaticworks.com/

� Matt Masson – SSIS Team Blog

� http://blogs.msdn.com/b/mattm/

� Andy Leonard

� http://sqlblog.com/blogs/andy_leonard/

read my blog sqlmusings.com follow me on twitter @sqlbelle contact me [email protected]

read my blog sqlmusings.com follow me on twitter @sqlbelle contact me [email protected]

THANK YOU!

Presentation materials will be at my blog

http://www.sqlmusings.com

Demo files

Slides

Cheatsheets

Questions? Comments? Email me

[email protected]