ssis best pratice

47
Bob Duffy Irish SQL Academy 2008. Level 300

Upload: devender-singh-rautela

Post on 16-Apr-2015

221 views

Category:

Documents


30 download

DESCRIPTION

SSIS Best practice, SSIS Best Practice

TRANSCRIPT

Page 1: SSIS Best Pratice

Bob Duffy

Irish SQL Academy 2008. Level 300

Page 2: SSIS Best Pratice
Page 3: SSIS Best Pratice

DTS 2000

1.75 Developers

SSIS 2005

*Figures are only approximations and should not be referenced or quoted

Page 4: SSIS Best Pratice
Page 5: SSIS Best Pratice

• Minimize staging (else use RawFiles if possible)

• Hardware Infrastructure: Disks, RAM, CPU, Network

• SQL Infrastructure: File Groups, Indexing, Partitioning

Optimize and Stabilize the basics

• Replace destinations with RowCount

• Source->RowCount throughput

• Source->Destination throughputMeasure

• OVAL performance tuning strategy

• The Three S‟s

• Data Flow Bag of TricksTune

• Lookup patterns

• Script vs custom transformParallelize

Page 6: SSIS Best Pratice
Page 7: SSIS Best Pratice
Page 8: SSIS Best Pratice
Page 9: SSIS Best Pratice
Page 10: SSIS Best Pratice

• Increase the efficiency of every aspect

Sharpen

• Parallelize, partition, pipelineShare

• Buy faster, bigger, better hardwareSpend

Page 11: SSIS Best Pratice
Page 12: SSIS Best Pratice
Page 13: SSIS Best Pratice
Page 14: SSIS Best Pratice
Page 15: SSIS Best Pratice
Page 16: SSIS Best Pratice

But be aware of limitations

Page 17: SSIS Best Pratice
Page 18: SSIS Best Pratice
Page 19: SSIS Best Pratice

RowBased

(synchronous)

Partially Blocking

(asynchronous)

Blocking(asynchronous)

Page 20: SSIS Best Pratice
Page 21: SSIS Best Pratice
Page 23: SSIS Best Pratice
Page 24: SSIS Best Pratice
Page 25: SSIS Best Pratice
Page 26: SSIS Best Pratice
Page 27: SSIS Best Pratice

4 Gb Fiber Channel

Destination server runs SQL

Server

DatabaseEMC CX3-80

Destination server:Unisys ES7000/One32 sockets each with dual core Intel 3.4 GHz CPUs256 GB RAMWindows Server 2008SQL Server 2008

1 Gb Ethernet connections

2 Gb Fiber Channel

Source servers run SSIS

Source data EMC CX600

Source servers:Unisys ES3220L2 sockets each with 4 core Intel 2 GHz CPUs4 GB RAMWindows Server 2008SQL Server 2008

Page 28: SSIS Best Pratice

Make: Unisys

Model: ES7000/one Enterprise Server

OS: Microsoft Windows Server 2008 x64 Datacenter Edition

CPU: 32 socket dual core Intel® Xeon 3.4 GHz (7140M)

RAM: 256 GB

HBA: 8 dual port 4Gbit FC

NIC: Intel® PRO/1000 MT Server Adapter

Database: Pre-release build of SQL Server 2008 Enterprise Edition (V10.0.1300.4)

Storage: EMC Clariion CX3-80 (Qty 1)

11 trays of 15 disks; 165 spindles x 146 GB 15Krpm; 4Gbit FC

Quantity: 4

Make: Unisys

Model: ES3220L

OS: Windows2008 x64 Enterprise Edition

CPU: 2 socket quad core Intel® Xeon processors @ 2.0GHz

RAM: 4 GB

HBA: 1 dual port 4Gbit Emulex FC

NIC: Intel PRO1000/PT dual port

Database: Pre-release build of SQL Server 2008 Integration Services (V10.0.1300.4)

Storage: 2x EMC CLARiiON CX600 (ea: 45 spindles, 4 2Gbit FC)

Page 29: SSIS Best Pratice
Page 30: SSIS Best Pratice

C1

C1

C1

C1

Page 31: SSIS Best Pratice
Page 32: SSIS Best Pratice

Orders TablePartition

1Partition

2Partition

3Partition

4Partition

5Partition

6Partition

55Partition

56

Orders_1 Orders_2 Orders_3 Orders_4 Orders_5 Orders_6 Orders_55 Orders_56. . .

SSIS

SSIS

SSIS

SSIS

SSIS

SSIS

SSIS

SSIS. . .

orders.tbl.1 orders.tbl.2 orders.tbl.3 orders.tbl.4 orders.tbl.5 orders.tbl.6 orders.tbl.55 orders.tbl.56

Page 33: SSIS Best Pratice
Page 34: SSIS Best Pratice
Page 35: SSIS Best Pratice

(Package details removed to protect the innocent)

Page 36: SSIS Best Pratice
Page 37: SSIS Best Pratice

• Iterative design, development & testingFollow Microsoft

Development Guidelines

• People & Processes

• Kimball‟s ETL and SSIS books are an excellent referenceUnderstand the Business

• Resource contention, processing windows, …

• SSIS does not forgive bad database design

• Old principles still apply – e.g. load with/without indexes?

Get the big picture

• Will this run on IA64 / X64?

• No BIDS on IA64 – how will I debug?

• Is OLE-DB driver XXX available on IA64?

• Memory and resource usage on different platforms

Platform considerations

Page 38: SSIS Best Pratice

• Break complex ETL into logically distinct packages (vsmonolithic design)

• Improves development & debug experience

Process Modularity

• Separate sub-processes within package into separate Containers

• More elegant, easier to develop

• Simple to disable whole Containers when debugging

Package Modularity

• Use Script Task/Transform for one-off problems

• Build custom components for maximum re-use

Component Modularity

Page 39: SSIS Best Pratice

Concise naming conventions

Conformed “blueprint” design patterns

Presentable layout

Annotations

Error Logging

Configurations

Page 40: SSIS Best Pratice

Get as close to the data as possible

• Limit number of columns

• Filter number of rows

Don‟t be afraid to leverage TSQL

• Type conversions, null coercing, coalescing, data type sharpening

• select nullif(name, „‟) from contacts order by 1

• select convert(tinyint, code) from sales

Performance Testing & Tuning

• Connect Output to RowCount transform

• See Performance Best Practices

„FastParse‟ for text files

Page 41: SSIS Best Pratice

BEFORE:

selectdbo.Tbl_Dim_Store.SK_Store_ID

, Tbl_Dim_Store.Store_Num,isnull(dbo.Tbl_Dim_Merchant_Division.SK_Merch_Di

v_ID, 0) as SK_Merch_Div_IDfrom dbo.Tbl_Dim_Storeleft outer join dbo.Tbl_Dim_Merchant_Division

on dbo.Tbl_Dim_Store.Merch_Div_Num = dbo.Tbl_Dim_Merchant_Division.Merch_Div_Num

where Current_Row = 1

AFTER:

select * from etl.uf_FactStoreSales(@Date)

Page 42: SSIS Best Pratice

Use the power of TSQL to clean the data 'on the fly'

Page 43: SSIS Best Pratice

• Too many moving parts is inelegant and likely slow

• But don‟t be afraid to experiment – there are many ways to solve a problem

Avoid over-design

• Allocate enough threads

• EngineThreads property on DataFlow Task

• See Performance Talk

Maximize Parallelism

• Synchronous vs. Asynchronous components

• Memcopy is expensive

Minimize blocking

• For example, minimize data retrieved by LookupTxMinimize

ancillary data

Page 44: SSIS Best Pratice

• Full Cache – for small lookup datasets

• No Cache – for volatile lookup datasets

• Partial Cache – for large lookup datasets

Three Modes of Operation

• Full Cache is optimal, but uses the most memory, also takes time to load

• Partial Cache can be expensive since it populates on the fly using singleton SELECTs

• No Cache uses no memory, but takes longer

Tradeoff memory vs.

performance

• Catch is that it requires Sorted inputs

• See SSIS Performance white paper for more details

Can use Merge Join component

instead

Page 45: SSIS Best Pratice

• Can written in any .Net language

• Must be signed, registered and installed – but can be widely re-used

• Quite fiddly for single task

Custom components

• Can be written in VisualBasic.Net or C#

• Are persisted within a package – and have limited reuse

• Have template methods already created for you

Scripts

Page 47: SSIS Best Pratice