Pro SQL Server 2012 Integration Services
Francis Rodrigues Michael Coles David Dye
Apress
Pro SQL Server 2012 Integration Services
Copyright © 2012 by Francis Rodrigues, Michael Coles, and David Dye
This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. Exempted from this legal reservation are brief excerpts in connection with reviews or scholarly analysis or material supplied specifically for the purpose of being entered and executed on a computer system, for exclusive use by the purchaser of the work. Duplication of this publication or parts thereof is permitted only under the provisions of the Copyright Law of the Publisher's location, in its current version, and permission for use must always be obtained from Springer. Permissions for use may be obtained through RightsLink at the Copyright Clearance Center. Violations are liable to prosecution under the respective Copyright Law.
ISBN 978-1-4302-3692-4
ISBN 978-1-4302-3693-1 (eBook)
Trademarked names, logos, and images may appear in this book. Rather than use a trademark symbol with every occurrence of a trademarked name, logo, or image we use the names, logos, and images only in an editorial fashion and to the benefit of the trademark owner, with no intention of infringement of the trademark.
The use in this publication of trade names, trademarks, service marks, and similar terms, even if they are not identified as such, is not to be taken as an expression of opinion as to whether or not they are subject to proprietary rights.
While the advice and information in this book are believed to be true and accurate at the date of publication, neither the authors nor the editors nor the publisher can accept any legal responsibility for any errors or omissions that may be made. The publisher makes no warranty, express or implied, with respect to the material contained herein.
President and Publisher: Paul Manning
Lead Editor: Jonathan Gennick
Technical Reviewer: Rodney Landrum
Editorial Board: Steve Anglin, Ewan Buckingham, Gary Cornell, Louise Corrigan, Morgan Ertel, Jonathan Gennick, Jonathan Hassell, Robert Hutchinson, Michelle Lowman, James Markham, Matthew Moodie, Jeff Olson, Jeffrey Pepper, Douglas Pundick, Ben Renow-Clarke, Dominic Shakeshaft, Gwenan Spearing, Matt Wade, Tom Welsh
Coordinating Editor: Anita Castro
Copy Editor: Sharon Wilkey
Compositor: Bytheway Publishing Services
Indexer: Dhaneesh Kumar
Cover Designer: Anna Ishchenko
Distributed to the book trade worldwide by Springer Science+Business Media New York, 233 Spring Street, 6th Floor, New York, NY 10013. Phone 1-800-SPRINGER, fax (201) 348-4505, e-mail [email protected], or visit www.springeronline.com.
For information on translations, please e-mail [email protected], or visit www.apress.com.
Apress and friends of ED books may be purchased in bulk for academic, corporate, or promotional use. eBook versions and licenses are also available for most titles. For more information, reference our Special Bulk Sales-eBook Licensing web page at www.apress.com/bulk-sales.
Any source code or other supplementary materials referenced by the author in this text is available to readers at http://www.apress.com/9781430236924. For detailed information about how to locate your book's source code, go to www.apress.com/source-code.
This book is dedicated to my family and friends without whose support and encouragement the writing would not have been possible.
- Francis Rodrigues
Contents at a Glance
About the Authors
About the Technical Reviewer
Chapter 1: Introducing Integration Services
Chapter 2: BIDS and SSMS
Chapter 3: Hello World-Your First SSIS 2012 Package
Chapter 4: Connection Managers
Chapter 5: Control Flow Basics
Chapter 6: Advanced Control Flow Tasks
Chapter 7: Source and Destination Adapters
Chapter 8: Data Flow Transformations
Chapter 9: Variables, Parameters, and Expressions
Chapter 10: Scripting
Chapter 11: Events and Error Handling
Chapter 12: Data Profiling and Scrubbing
Chapter 13: Logging and Auditing
Chapter 14: Heterogeneous Sources and Destinations
Chapter 15: Data Flow Tuning and Optimization
Chapter 16: Parent-Child Design Pattern
Chapter 17: Dimensional Data ETL
Chapter 18: Building Robust Solutions
Chapter 19: Deployment Model
Index
Contents
About the Authors
About the Technical Reviewer
Chapter 1: Introducing Integration Services
  A Brief History of Microsoft ETL
  What Can SSIS Do for You?
  What Is Enterprise ETL?
  SSIS Architecture
  New SSIS Features
  Our Favorite People and Places
  Summary
Chapter 2: BIDS and SSMS
  SQL Server Business Intelligence Development Studio
    Analysis Services Project
    Integration Services Project
    Report Server Project Wizard
    Report Server Project
    Import Analysis Services Database
    Integration Services Project Wizard
    Report Model Project
  Integration Services
    Project Files
    Tool Windows
    Designer Window
  SQL Server Management Studio
    Tool Windows
    SQL Server Management Studio Project
    Templates
    Code Snippets
    Queries for SSIS
  Summary
Chapter 3: Hello World-Your First SSIS 2012 Package
  Integration Services Project
    Key Package Properties
    Package Annotations
    Package Property Categories
  Hello World
    Flat File Source Connection
    OLE DB Destination Connection
    Data Flow Task
  Real World
    Control Flow
    Execute SQL Task
    Data Flow Task
  Summary
Chapter 4: Connection Managers
  Commonly Used Connection Managers
    OLE DB Connection Managers
    File Connection Managers
    ADO.NET Connection Manager
    Cache Connection Manager
  Other Connection Managers
    FTP Connection Manager
    HTTP Connection Manager
    MSOLAP100 Connection Manager
    DQS Connection Manager
    MSMQ Connection Manager
    SMO Connection Manager
    SMTP Connection Manager
    SQLMOBILE Connection Manager
    WMI Connection Manager
  Summary
Chapter 5: Control Flow Basics
  What Is a Control Flow?
  SSIS Toolbox for Control Flow
  Favorite Tasks
    Data Flow Task
    Execute SQL Task
  Common Tasks
    Analysis Services Processing Task
    Bulk Insert Task
    Data Profiling Task
    Execute Package Task
    Execute Process Task
    File System Task
    FTP Task
    Script Task
    Send Mail Task
    Web Service Task
    XML Task
  Precedence Constraints
  Basic Containers
    Containers
    Groups
  Breakpoints
  Summary
Chapter 6: Advanced Control Flow Tasks
  Advanced Tasks
    Analysis Services Execute DDL Task
    Data Mining Query Task
    Message Queue Task
    Transfer Database Task
    Transfer Error Messages Task
    Transfer Jobs Task
    Transfer Logins Task
    Transfer Master Stored Procedures Task
    Transfer SQL Server Objects Task
    WMI Data Reader Task
    WMI Event Watcher Task
  Advanced Containers
    For Loop Container
    Foreach Loop Container
    Task Host Controller
  Summary
Chapter 7: Source and Destination Adapters
  The Data Flow
  Sources and Destinations
  Source Assistant
  Destination Assistant
  Database Sources and Destinations
    OLE DB
    ADO.NET
    SQL Server Destination
    SQL Server Compact
  Files
    Flat Files
    Excel Files
    Raw Files
    XML Files
  Special-Purpose Adapters
  Analysis Services
  Summary
Chapter 8: Data Flow Transformations
  High-Level Data Flow
  Types of Transformations
    Synchronous Transformations
    Asynchronous Transformations
    Blocking Transformations
  Row Transformations
    Data Conversion
    Character Map
    Copy Column
    Derived Column
    Import Column
    OLE DB Command
    Export Column
    Script Component
  Rowset Transformations
    Aggregate
    Sort
    Pivot
    Percentage Sampling
    Row Sampling
    Unpivot
  Splits and Joins
    Lookup
    Cache Transformation
    Conditional Split
    Multicast
    Union All
    Merge
    Merge Join
  Auditing
    Row Count
    Audit
  Business Intelligence Transformations
  Summary
Chapter 9: Variables, Parameters, and Expressions
  What Are Variables and Expressions?
  What Are Parameters?
  SSIS Data Types
  Variable Scope, Default Values, and Namespaces
    Scope
    Default Values
    Namespaces
  System Variables
    Package System Variables
    Container System Variable
    Task System Variables
    Event Handler System Variables
  Accessing Variables
    Parameterized Queries
    Derived Column Transformations
    Conditional Splits
    Recordset Destinations
    Foreach Loop Containers
    Script Tasks
    Execute SQL Task Result Sets
    Source Types
  Dynamic SQL
  Passing Variables
  SSIS Expression Language
    Functions
    Operators
  Summary
• Chapter 10: Scripting .......................................................................................... 361
Script Task ..................................................................................................................... 361
Advanced Functionality .................................................................................................. 366
Script Component Source .............................................................................................. 375
Synchronous Script Component Transformation ........................................................... 383
Asynchronous Script Component Transformation ......................................................... 388
Script Component Destination ....................................................................................... 396
Summary ....................................................................................................................... 403
• Chapter 11: Events and Error Handling ............................................................... 405
SSIS Events .................................................................................................................... 405
Logging Events .............................................................................................................. 407
Script Events .................................................................................................................. 418
Script Task Events ................................................................................................................................ 418
Script Component Events ..................................................................................................................... 421
Event Handlers ............................................................................................................... 423
Summary ....................................................................................................................... 425
• Chapter 12: Data Profiling and Scrubbing .......................................................... 427
Data Profiling ................................................................................................................. 427
Data Profiling Task ............................................................................................................................... 428
Data Profile Viewer ............................................................................................................................... 433
Column Length Distribution Profile ....................................................................................................... 436
Column Null Ratio Profile ...................................................................................................................... 438
Column Pattern Profile .......................................................................................................................... 440
Column Statistics Profile ...................................................................................................................... 443
Column Value Distribution Profile ......................................................................................................... 445
Candidate Key Profile ........................................................................................................................... 447
Functional Dependency Profile ............................................................................................................. 450
Fuzzy Searching ............................................................................................................ 452
Fuzzy Lookup ........................................................................................................................................ 452
Fuzzy Grouping ..................................................................................................................................... 458
Data Previews ................................................................................................................ 460
Data Viewer .......................................................................................................................................... 460
Data Sampling ...................................................................................................................................... 462
Summary ....................................................................................................................... 464
• Chapter 13: Logging and Auditing ...................................................................... 465
Logging .......................................................................................................................... 465
Enabling Logging .................................................................................................................................. 466
Choosing Log Events ............................................................................................................................ 470
On SQL Logging .................................................................................................................................... 471
Summary Auditing ......................................................................................................... 472
Batch-Level Auditing ............................................................................................................................ 473
Package-Level Auditing ........................................................................................................................ 478
Adding Auditing to Packages ................................................................................................................ 480
Simple Data Lineage ...................................................................................................... 481
Summary ....................................................................................................................... 486
• Chapter 14: Heterogeneous Sources and Destinations ....................................... 487
SQL Server Sources and Destinations ........................................................................... 487
Other RDBMS Sources and Destinations ....................................................................... 494
Flat File Sources and Destinations ................................................................................ 495
Excel Sources and Destinations .................................................................................... 498
XML Sources .................................................................................................................. 502
Raw File Sources and Destinations ............................................................................... 504
SQL Server Analysis Services Sources .......................................................................... 506
Recordset Destination .................................................................................................... 508
Summary ....................................................................................................................... 509
• Chapter 15: Data Flow Tuning and Optimization ................................................ 511
Limiting Rows at the Database ...................................................................................... 511
Performing Joins in the Database ................................................................................. 515
Sorting in the Database ................................................................................................. 516
Performing Complex Preprocessing at the Database .................................................... 516
Ensuring Security and "Read Auditing" ......................................................................... 517
Pulling Too Many Columns ............................................................................................ 517
Using Execution Trees ................................................................................................... 518
Implementing Parallelism .............................................................................................. 522
Summary ....................................................................................................................... 523
• Chapter 16: Parent-Child Design Pattern ............................................................ 525
Understanding the Parent-Child Design Pattern ............................................................ 525
Using Parameters to Pass Values .................................................................................. 527
Working with Shared Configuration Information ........................................................... 530
Overriding Properties ..................................................................................................... 530
Logging .......................................................................................................................... 531
Implementing Data-Driven ETL ...................................................................................... 531
Summary ....................................................................................................................... 542
• Chapter 17: Dimensional Data ETL ...................................................................... 543
Introducing Dimensional Data ....................................................................................... 543
Creating Quick Wins ...................................................................................................... 546
Run in Optimized Mode ........................................................................................................................ 546
Remove "Dead-End" Components ....................................................................................................... 547
Keep Package Size Small ..................................................................................................................... 548
Optimize Lookups ................................................................................................................................. 549
Keep Your Data Moving ........................................................................................................................ 549
Minimize Logging ................................................................................................................................. 549
Use the Fast Load Option ...................................................................................................................... 550
Understanding Slowly Changing Dimensions ................................................................ 550
Type 0 Dimensions ............................................................................................................................... 550
Type 1 Dimensions ............................................................................................................................... 550
Type 2 Dimensions ............................................................................................................................... 556
Type 3 Dimensions ............................................................................................................................... 558
Summary ....................................................................................................................... 559
• Chapter 18: Building Robust Solutions ............................................................... 561
What Makes a Solution Robust ...................................................................................... 561
Resilience ...................................................................................................................... 562
Data Flow Task ..................................................................................................................................... 562
Event Handlers ..................................................................................................................................... 572
Dynamism ...................................................................................................................... 573
Accountability ................................................................................................................ 574
Log Providers ........................................................................................................................................ 574
Custom Logging .................................................................................................................................... 575
Summary ....................................................................................................................... 577
• Chapter 19: Deployment Model ........................................................................... 579
The Build Process .......................................................................................................... 579
The Deployment Process ............................................................................................... 581
Environments ................................................................................................................. 588
Execution ....................................................................................................................... 594
The Import Process ........................................................................................................ 601
The Migration Process ................................................................................................... 601
Summary ....................................................................................................................... 604
• Index ................................................................................................................. 605
About the Authors
• Francis Rodrigues is a computer science graduate of Loyola University in Maryland. He is also an alumnus of Regis High School, a Jesuit school in New York City. He currently works as a business intelligence consultant based out of New York City. He is an expert developer of enterprise business intelligence projects. His specialties include extract, transform, and load (ETL) solutions based on SQL Server and SQL Server Integration Services (SSIS). In his spare time, he can be found mountain biking in various locations in the New York area.
• Michael Coles has more than a decade's experience designing and administering SQL Server databases. A prolific writer of articles on all aspects of SQL Server, particularly on the expert use of T-SQL, he holds MCDBA and MCP certifications. He graduated magna cum laude with a bachelor of science degree in information technology from American Intercontinental University in Georgia. A member of the United States Army Reserve, he was activated for two years following 9/11.
• David Dye is a Microsoft SQL Server MVP, instructor, and author specializing in relational database management systems, business intelligence systems, reporting solutions, and Microsoft SharePoint. For the past 9 years David's expertise has been focused on Microsoft SQL Server development and administration. His work has earned him recognition as a Microsoft MVP in 2009 and 2010, a moderator for the Microsoft Developer Network for SQL Server forums, Innovator of the Year runner-up in 2009 by SQL Server Magazine, and a place in the Training Associates Technical Trainer Spotlight in April 2011. David currently serves as a technical reviewer and co-author with Apress in the SQL Server 2012 series, and as an author with Packt Publishing.
About the Technical Reviewer
• Rodney Landrum has worked with SQL Server longer than he can remember. He writes regularly about technologies, including Integration Services, Analysis Services, and Reporting Services. He has authored SQL Server Tacklebox and three Reporting Services books. He contributes regularly to SQLServerCentral, SQL Server Magazine, and Simple-Talk. His day job involves overseeing a large SQL Server infrastructure in Orlando. He swears he owns the phrase "Working with Databases on a Day to Day Basis". Anyone who disagrees is itching to lose an arm wrestling match.