2 sql server 2008 etl drilldown shane bartle principal consultant bin 309 pat martin anz sql premier...

Post on 24-Dec-2015

220 Views

Category:

Documents

4 Downloads

Preview:

Click to see full reader

TRANSCRIPT

2

SQL Server 2008 ETL drilldownShane BartlePrincipal Consultant

BIN 309

Pat MartinANZ SQL Premier Field Engineer

Microsoft New Zealand

3

What We Will Cover

Background to SSISSource Data Extraction – New ApproachesMonitoring EnhancementsDeveloper AdditionsData Profiling

4

END USER TOOLS & PERFORMANCE MANAGEMENT APPS

ExcelPerformancePoint

Server

BI PLATFORM

SQL Server Reporting Services

SQL Server Analysis Services

SQL Server DBMS

SQL Server Integration Services

SharePoint Server

DELIVERY

Reports Dashboards Excel Workbooks

AnalyticViews Scorecards Plans

Integrated End-To-End BI Offering

5

Integration todayIncreasing data volumesIncreasingly diverse sources

Requirements reached the tipping pointLow-impact source extractionEfficient transformationBulk loading techniques

SQL Server 2008 SSISBackground to SSIS

6

GeoSpatial Data:Semi structured

Legacy data: binary files

Application database

Integration is a seamless, manageable operationSource, prepare, & load data in single, auditable processScale to handle heavy and complex data requirements

SQL Server Integration Services

GeoSpatialComponents

Customsource

Standardsources

Data-cleansingcomponents

Merges

Data miningcomponents

Warehouse

Reports

Mobiledata

Integration Services In Action

Cube

7

Current SSIS Thread SchedulerThreads affinitised to dataflow subtreesThread starvation on highly-parallel designsSingle thread for each synchronous pathNon-linear scale-up (plateau)

SSIS Pipeline ParallelismRewrote the thread schedulerImproved performance and scaleThread pool shared across multiple components

BenefitsBetter performance (50%) in highly-parallel designsLess manual tuning during development (lower TCO)Better hardware utilisation (higher ROI)It just works!

Performance Improvements

Faster!

8

Extracting data from the source is expensiveEfficient extraction is key to improving ETL performanceInvolves bulk loading data into staging areas or warehouse

Time consuming and resource intensiveTriggers (synchronous IO penalty)Timestamp columns (Schema changes)Complex queries (delayed IO penalty)Custom (ISV, mirror, snapshot, …)

Incremental data load is key to efficient extractionNeed to know what changed at source since a point in time

Expensive lookups to determine changed columnsProviding information up front about which columns changed

SQL Server 2008 SSISSource Data Extraction – New Approaches

9

Change Data CaptureInformation about what changed at the source

Operation (Insert, Update, Delete)Update mask (which columns changed)

Changes captured from the log asynchronouslyMinimal impact on source systemLog reader can be scheduled to run during idle time

Enabled per tableHidden change tables store captured changesOne change table per source table that is trackedRetention-based cleanup jobs

CDC APIs provide access to change dataTable valued functions and scalar functions provide access to change data and CDC metadataTVF allows the changes to be gathered for specific intervals enabling incremental population of DW

Change Tables

OLTP

Data Warehouse

10

Change Data Capture

Pat MartinANZ SQL Premier Field EngineerMicrosoft N.Z.

demo

11

Merge StatementSingle statement can deal with Inserts, Updates, and Deletes

Microsoft extension to ANSI definition for DELETE semanticsPerformance goals:

20% fasterMinimal logging on inserts (2×)

Typical solution:Clean the source data, load it into Tbl_StagingIndex Tbl_StagingUPDATE Warehouse INNER JOIN Tbl_Staging ON…INSERT Warehouse LEFT JOIN Tbl_Staging ON…MERGE Warehouse FROM Tbl_Staging ON…

12

Merge ExampleMERGE dbo.Departments AS dUSING dbo.Departments_delta AS ddON (d.DeptID = dd.DeptID)WHEN MATCHED AND d.Manager <>

dd.Manager OR d.DeptName <> dd.DeptName

THEN UPDATE SET d.Manager = dd.Manager, d.DeptName = dd.DeptName

WHEN NOT MATCHED THEN INSERT (DeptID, DeptName, Manager) VALUES (dd.DeptID, dd.DeptName,

dd.Manager)WHEN NOT MATCHED BY SOURCE THEN DELETE

13

Merge Statement

Pat MartinANZ SQL Premier Field EngineerMicrosoft N.Z.

demo

14

Lookup Transform EnhancementsScalable Cache Implementation

Cache-load is a separate operation to LookupCan be hydrated and dehydrated securely to the file systemCaches can be explicitly shared

Adaptable CachesCan be loaded from any source (SQL, Text, Mainframe…)Track cache hits and misses

Multiple ModesNo Cache (pass-through to DB)JIT – Just In Time (on miss, query database, and store result)Full-Cache (pre-load all rows)

15

Lookup Transform 2008

Shane BartlePrincipal ConsultantMicrosoft N.Z.

demo

16

Logging events to watch pipeline internalsPipelineExecutionPlan, PipelineExecutionTree, BufferSizeTuning

System Monitor to track I/O issuesBuffers In Use tracks how many buffers are presently being usedBuffers Spooled tracks how many 10Mb buffers have been spooled to disk

Superdump to resolve error scenariosNew support capability for reactive fix-up

SQL Server 2008 SSISMonitoring Enhancements

17

Superdump

Provides visibility into the activity of a running packageCan be triggered without stopping a packageCan be scheduled (via registry key) to run on a

CrashSpecific Error Condition

18

Superdump

Pat MartinPremier Field EngineerMicrosoft N.Z.

demo

19

C# SupportADO.Net SupportImproved Import Export Wizard

SQL Server 2008 SSISDeveloper Additions

20

2005 used VSA for Script Design and ExecutionLegacy component with Visual Basic onlyLimited set of “reference-able” assemblies

In 2008 SSIS uses Visual Studio Tools for ApplicationsVisual Studio Designer ShellC# (or VB.NET) as a languageCan reference all .net assembliesCan reference Web Services

C# Support

21

ADO.Net Support

SSIS 2005 had a ADO.NET ‘DataReader’Limited to supplying a SqlCommand

SSIS 2008 has a full ADO.NET Data SourceMuch Enhanced User InterfaceODBC Support

22

ADO.Net SupportData Type Conversions

New page shows mappings and possible issuesMay insert data convert transforms into dataflowDefault mappings are customisable (via Notepad)

New System for Scaling Up Number of TablesMake as sequence of dataflow tasks Each with 5 source/transform/destination chains

Import Export Wizard Additions

23

Import/Export Wizard

Shane BartlePrincipal ConsultantMicrosoft N.Z.

demo

24

Creates a Profile of Your SQL TablesExplore or maintain data qualityRun as a task in SSISProduces XML file outputHas a nice visual tool for working with profiles

Analyze a Set of Columns / Tables Looking ForCandidate keysColumn length distributionNull RatioPattern detectionValue distributions and statsFunctional dependenciesValue inclusion

SQL Server 2008 SSISData Profiling

25

Data Profiling

Pat MartinANZ SQL Premier Field EngineerMicrosoft N.Z.

demo

26

Session Summary

Performance Improvements for throughputNew delta extraction alternative approachesMonitoring and support featuresC# and ADO.Net extend coverageData Profiling for data quality considerations

28

Q & A

29

Resources

www.microsoft.com/teched Tech·Talks Tech·Ed BloggersLive Simulcasts Virtual Labs

http://microsoft.com/technet

Evaluation licenses, pre-released products, and MORE!

http://microsoft.com/msdn

Developer’s Kit, Licenses, and MORE!

Related ContentDAT361 SQL Server 2008 Security Deep Dive

BIN309 SQL Server 2008 ETL drill down

BIN310 SQL Server 2008 Analysis Server (SSAS) enhancements

DAT362 SQL Server Spatial in the Spotlight

BIN352 Microsoft SQL Server 2008 Reporting Services: Architecture Overview

BIN311 Advanced Dashboard Creation with MOSS 2007

DAT364 End-to-End Troubleshooting for Microsoft SQL Server 2005/2008

BIN401 Optimising Query Performance in SQL Server 2008 Analysis Services

DAT355 Upgrading to Microsoft SQL Server 2008: Notes from Early Adopters

BIN402 Building and Deploying Advanced MOSS 2007 Planning Applications

32

Please complete anevaluation

33

© 2008 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries.The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED

OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.

top related