integration services in sql server 2008

36
Integration Services in SQL Server 2008 Allan Mitchell SQL Server MVP

Upload: wilbur

Post on 04-Jan-2016

55 views

Category:

Documents


4 download

DESCRIPTION

Integration Services in SQL Server 2008. Allan Mitchell SQL Server MVP. Who am I. SQL Server MVP SQL Server Consultant Joint author on Wrox Professional SSIS book Worked with SQL Server since version 6.5 www.SQLDTS.com and www.SQLIS.com. Today’s Schedule. SSIS in SQL Server 2008 - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Integration Services in  SQL Server 2008

Integration Services in SQL Server 2008

Allan MitchellSQL Server MVP

Page 2: Integration Services in  SQL Server 2008

Who am I

• SQL Server MVP• SQL Server Consultant• Joint author on Wrox Professional SSIS book• Worked with SQL Server since version 6.5• www.SQLDTS.com and www.SQLIS.com

Page 3: Integration Services in  SQL Server 2008

Today’s Schedule

• SSIS in SQL Server 2008• Behind the scenes– Threading– Pipeline Limiter

• Front of shop– Lookup Component. Caching

and other good stuff.– Data Profiling– Change Data Capture

Page 4: Integration Services in  SQL Server 2008

Behind the Scenes

Threading

Page 5: Integration Services in  SQL Server 2008

Threading

• Loading speed will benefit from getting as many threads spinning as possible.

• In SSIS 2005 this was not always optimised• Blocking Transforms really wrecked your day.• Quaint workarounds

Page 6: Integration Services in  SQL Server 2008

Threading - Multicast Transform

• SSIS 2005• Incredibly quick to generate n copies of the input• All outputs are Synchronous• All outputs on same execution tree• All outputs on same thread.• Round robin

Page 7: Integration Services in  SQL Server 2008

Threading - Multicast Transform

• SSIS 2008• Incredibly quick to generate n copies of the input• All outputs are still Synchronous• All outputs on same execution Path• All outputs can get their own thread

Page 8: Integration Services in  SQL Server 2008

Execution “Tree” 2005

begin execution tree 0 output "Generated" (8) input "Multicast Input 1" (20) output "Multicast Output 1" (21) input "TrashInput" (25) output "Multicast Output 2" (32) input "TrashInput" (29) output "Multicast Output 3" (44)end execution tree 0

Page 9: Integration Services in  SQL Server 2008

Execution “Tree” 2008Begin Path 0 output "Generated" (7); component "Konesans Data Generator Source" (1) input "Multicast Input 1" (36); component "Multicast" (35) Begin Subpath 0 output "Multicast Output 1" (37); component "Multicast" (35) input "TrashInput" (41); component "Trash Destination" (39) End Subpath 0 Begin Subpath 1 output "Multicast Output 2" (48); component "Multicast" (35) input "TrashInput" (45); component "Trash Destination 1" (43) End Subpath 1End Path 0

Page 10: Integration Services in  SQL Server 2008

Demo

Multicast Transform in 2005 and 2008

Page 11: Integration Services in  SQL Server 2008

Behind the Scenes

Pipeline Limiter

Page 12: Integration Services in  SQL Server 2008

Pipeline What?

• That’s right – Limiter• Doesn’t that mean things will go slower in 2008?– No. It happens now in 2005 you just don’t know it

• Why would I want to restrict the pipeline?– Data = buffers = memory– Memory is reused when the buffer terminates– Push back from a component = no reuse of memory =

run out of memory !!!!!

Page 13: Integration Services in  SQL Server 2008

Pipeline Limiter (Nov CTP)

Information: 0x400492E0 at Data Flow Task, SSIS.Pipeline: During last execution the pipeline suspended output "Union All Output 1" (487) of component "Union All" (485) for 10445 milliseconds to limit the number of in-memory buffers.

Page 14: Integration Services in  SQL Server 2008

Pipeline Limiter (Feb CTP/RTM)

• No “Information” event is now raised externally• Everything sent to the log messages• User:PipelineComponentTime• Filter out “Message” attribute starting with “The

Component%”

Page 15: Integration Services in  SQL Server 2008

Demo

Pipeline Limiter

Page 16: Integration Services in  SQL Server 2008

Front of Shop

Lookup Component Caching Options

Page 17: Integration Services in  SQL Server 2008

The only way it works in 2005

• Caching options• Full – Before the Data Flow task really gets going (Initialize) the lookup

components cache the entirety of their reference datasets• Partial– The lookup component will try for a match in cache first then go

to disk. No pre-caching• None– Misnomer really. The last query result is cached but that’s it.

Every other query is to disk.

Page 18: Integration Services in  SQL Server 2008

The only way it works in 2005

• Full Cache is fastest but– Might take longest to cache– Could take large memory amount to cache

• The reference datasets are not passed around– In a loop on the same task you read the reference

dataset n times– Not transferrable across Data Flow tasks

Page 19: Integration Services in  SQL Server 2008

The way it can work in 2008

• The Cache Transform– No transformation happens!– Can be used as a destination

• Cache Connection Manager– File. Can be read by the Raw File adapter.– In Memory– Allow the reference dataset to be passed around like

luggage

Page 20: Integration Services in  SQL Server 2008

The way it can work in 2008

• Just because you can cache the dataset like this does not mean you have to

• Default option is still to use the 2005 way.

Page 21: Integration Services in  SQL Server 2008

Row Redirection

• One of my main dislikes about this component in 2005

• If a lookup component gets no match then by default it fails

• (Row yielded no match during lookup)• You have to configure the error output to handle

this.– Redirect down the error output– Ignore (pass the looked up value as a NULL)

Page 22: Integration Services in  SQL Server 2008

Row redirection

• In 2008 we say hello to the “No Match Output”

Page 23: Integration Services in  SQL Server 2008

Demo

Caching options and redirection

Page 24: Integration Services in  SQL Server 2008

Points to note

• Lookup transform is still case sensitive when done in cache

• When done on disk via OLE DB Connection Manager it is governed by their rules.

• Watch out for different behaviour – Be careful out there.

Page 25: Integration Services in  SQL Server 2008

Data Profiling Task

• Control Flow Task• Profiles data in your database• Quick Setup or verbose• Loads of different metrics you can extract• Uses an ADO.Net Connection Manager• Only SQL2K and above can be profiled

Page 26: Integration Services in  SQL Server 2008

Data Profiling Task

• Fantastic for seeing how rubbish the data is• Useful for identifying distribution of values• Candidate keys in tables• Lengths of columns• Can be read externally or through a variable• XML whichever way you look at it

Page 27: Integration Services in  SQL Server 2008

Demo

Data Profiling Task

Page 28: Integration Services in  SQL Server 2008

Change Data Capture (CDC)

• Billed as an ETL feature• We’ll use it if it’s there• What is it?• How do I use it?

Page 29: Integration Services in  SQL Server 2008

30

Traditional ETL vs. CDC-ETL

Operational Data Sources

DWETL Engine

loadextract transform

• Moves Entire Data Set

• Requires ‘Window of Operation’ (hours to days)

• Frequency/Latency – monthly, weekly, sometimes daily

Traditional BULK ETL

BULKBULKChange Stream Real-time Updates

CDC-based ETL

• Moves Only Changes to the Data

• No ‘Window of Operation’

• Frequency/Latency – multiple times a day, real-time

• Periodic or Continuous Change Flow

Page 30: Integration Services in  SQL Server 2008

Why CDC is Cool

• Increased efficiency of ETL Process• Incremental Extractions• Identify changed rows and columns• Identify operation on data, all changes or net

changes• Log Based (Transactional Replication LR)• Lightweight

Page 31: Integration Services in  SQL Server 2008

Why CDC is Cool

• Viewable as net changes or complete journal• Key point is about the “Change” what you view

about those changes is up to you• Differs from Change Capture in that you see the

data• No excuses for not considering incremental

extracts now.

Page 32: Integration Services in  SQL Server 2008

Terminology around CDC

• Capture Instance – Base Object (Max 2 per object)• Capture Process – Reading of the log and placing

the rows into the change tables• Retention Periods

Page 33: Integration Services in  SQL Server 2008

Demo

Using Change Data Capture

Page 34: Integration Services in  SQL Server 2008

Some of the things I did not cover

• C# is now a scripting option

Page 35: Integration Services in  SQL Server 2008

Questions?

Page 36: Integration Services in  SQL Server 2008

Contact Details/Resources

[email protected] / www.SQLDTS.com

www.SQLServerFAQ.com