sql server mvp deep dives - gbvcontents simil: analgorithmto lookforsimilarstrings 200...
TRANSCRIPT
SQL Server MVP
Deep Dives
Edited by Paul Nielsen Kalen Delaney Greg Low
Adam Machanic Paul S. Randal Kimberly L. Tripp
ii
MANNING
Greenwich
(74° w. long.)
contents
preface xxxv
acknowledgments xxxviii
about War Child xxxix
about this book xli
about SQL Server MVPs xliii
:;W,\i % Database design and architecture 1Eihtkd by Paul Nielsen
Louis and Paul's 10 key relational database design ideas 3
Paul Nielsen and Louis Davidson
1. Denormalization is for wimps 3
2. Keys are key 4
3. Generalize, man! 5
4. Class <> table 5
5. Data drives design 6
6. Sets good, cursors bad 6
7. Properly type data 7
8. Extensibility through encapsulation 7
9. Spaghetti is food, not code 7
10. NOLOCK = no consistency 8
Summary 9
xiii
xiv CONTENTS
SQL Server toolsfor maintaining data integrity 11
Louis Davidson
Protection tools 12
Data types 13 • NULL specification 13 * Uniquenessconstraints 14 * Filtered unique indexes 15 « Foreign keyconstraints 16 Check constraints 17 * Triggers 20
When and why to use what tool 24
Summary 26
Findingfunctional dependencies 28
Hugo Kornelis
Interview method 28
Modeling the sales order 29
Firststep:findingsingle-attribute dependencies 31 * Second step:findingtwo-attribute dependencies 36 * Further steps: three-and-more-attribute
dependencies 39* Wtiat ifI have some independent attributes left? 39
Summary 40
KiaBBoeiatsaBBSiaaniaiaiJraBranHia m a m n n nart 2 Database Development
Edited by Adam Machanic
Set-based iteration, the third alternative 43
Hugo Kornelis
The common methods and their shortcomings 43
Declarative (set-based) code 44 * Iterative (cursor-based) code 45
Set-based iteration 46
The most basicform 46* Running totals 47* Bin packing 51
Summary 58
Gaps and islands 59
Itzik Ben-Gan
Description of gaps and islands problems 59
Sample data and desired results 60
Solutions to gaps problem 61
Gaps—solution 1 using subqueries 62 Gaps—solution 2 usingsubqueries 63 * Gaps—solution 3 using rankingfunctions 64
Gaps—solution 4 using cursors 64 » Performance summary forgaps solutions 65
CONTENTS xv
Solutions to islands problem 66
Islands—solution 1 using subqueries and ranking calculations 66
Islands—solution 2 using group identifier based on subqueries 67
Islands—solution 3 usinggroup identifier based on rankingcalculations 68* Islands—solution 4 using cursors 69
Variation on the islands problem 69 * Performance summary
for islands solutions 71
Summary 71
Error handling in SQL Server and applications 73
Bill Graziano
Handling errors inside SQL Server 73
Returning information about the error 74 * Generate your own
errors usingRAISERROR 76 « Nesting TRY...CATCHblocks 77 • TRY... CATCH and transactions 79
Handling SQL Server errors on the client 80
Handling SQJ. Serve)- messages on the client 82
Summaiy 84
|> Pulling apart the FROM clause 86f
Rob Farley
JOIN basics 86
The INNERfOIN 86 * The OUTERJOIN 87 * The CROSS
JOIN 88
Formatting your FROM clause 89
A sample query 89 The appearance ofmost queries 90
When the pattern doesn't apply 90 How to read a FROM
clause 91 When the pattern can't apply 91 Writing the. FROM
clause clearly the first time 92
Filtering with the ON clause 92
The different filters ofthe SELECTstatement 93* Filtering outthe matches 93
JOIN uses and simplification 94
The four uses ofJOINs 94
Simplification using views 96
HowJOIN uses affect you 100
Summary 101
CONTENTS
Whatmakes a bulk insert a minimally logged operation? 102
Denis Gobo
Recovery and locking 102
Creating the file to import 104
Creating the tables to store the data 105
Importing the data 106
Summary 109
Avoiding three common query mistakes 111
Kathi Kellenberger
NULL comparisons 111
Multiple OUTERJOINS 113
Incorrect GROUP BY clauses 116
Summary 117
Introduction to XQuery on SQL Server 119
Michael Coles
What is XQuery? 119
How XQuery sees your XML 122
Querying XML 124
FLWOR expressions 127
XQuery comparison operators 129
XML indexes and XQuery performance 132
Summary 132
SQL Server XMLfrequently asked questions 133
Michael Coles
XML basics 133
What's XML? 133 * What's "well-formed"XML? 135
Wliat's the prolog? 136 What's an entity? 137
Wlial's a DTD? 137
The xml data type 138
Why does SQL Saver remove the DTDfrom my XML data? 138
How do Ipreserve whitespace in my XML? 139 Why am Igettingstrange characters in my XML? 139 How do I query XML
data? 141 « How do I query a single valuefrom my XMLdata? 141 How do I shred XML data ? 142
CONTENTSxvii
Advanced query topics 144
How do I specify an XML namespace in my XQuery queries'? 145
How do I get all element names and values from my XMLdocument? 146' How do I load XML documents from the
filesystem? 147
Summary 149
Understanding before coding 150
The concept 150* The logical model 152* The physicalmodel 154 * The database 155 * The XML Schema 155
Enabling and maintaining the data flow 160
Preparing the inbound data flow 160* Importing the data 163
Exporting the data 171
Preparing the sample data 173
Homework 174
Summary 175
Foundations of full-text searching 176
Creating and maintaining catalogs 177
Creating and maintaining full-text indexes 178
Creating thefull-text index 178 * Maintainingfull-text indexes 180
Querying full-text indexes 181
Basic searches 181 - FORMSOF 182 Phrases, NEAR, OR, and
prefixed terms 183 • Ranking 184
Custom thesaurus and stopwords 186
Custom thesaurus 186 * Stopwords and stofAists 189
Useful system queries 192
Basic queries to discover what catalogs, indexes, and columns
exist 192 * Advanced queries 194 • The keywords 196
Summary 199
JJP Using XML to transport relational data 150i»j , -
,_
Matija Lah
Full-text searching 176
Robert C. Cain
CONTENTS
Simil: an algorithm to lookfor similar strings 200
Tom van Stiphout
Equals (=) and LIKE 201
SOUNDEX and DIFFERENCE 201
CONTAINS and FREETEXT 202
Simil 204
Algorithm 205 Implementation in .NET 206
Installation 206 » Usage 207 Testing 208
Summary 209
LINQ to SQL andADO.NET Entity Framework 210
Bob Beauchemin
LINQ to SQL and performance 211
Generating SQL that uses projection 212
Updating in the middle tier 214
Optimizing the number of database round trips 215
LINQ to SQL and stored procedures 217
Tuning and LINQ to SQL queries 218
Summary 219
Table-valued parameters 221
DON Kiely
What's the problem? 221
Table-valued parameters to the rescue! 222
Another TVP example 225
Using TVPs from client applications 226
Using a DataTable 227 Using a DbDataReader 228
Using TVPs to enter orders 229
Summary 232
Build your own index 234
Erland Sommarskog
The database and the table 234
Plain search and introducing tester_sp 235
Using the LIKE operator—an important observation 236
Using a binary collation 237
CONTENTS
Fragments and persons 237
Thefragments^persons table 237 Writing the search
procedure 238 Keeping the index and the statistics
updated 241' Wiat is the overhead? 242
Fragments and lists 243
Building the lists 244 « Unwrapping the lists 244
Thefragments_personlists table 245 « Loading the table 245
A search procedure 248 » Keeping the lists -updated 249
Using bitmasks 250
The, initial setup 250 Searching with the bitmask 251
Adapting the bitmask to the data 251 « Performance and
overhead 252 » The big bitmask 253
Summary 253
Getting and staying connected—or not 255
William Vaughn
What is SQL Server? 255
Understanding the SQL Server Browser service 256
Diagnosing a connectivity problem 257
Testingfor network availability 257 * Managing the SQL Server
instance state 258 • Finding visible SQL Server instances 259
What is a connection? 261
To connect or not to connect... 263
Connection management 264
Connection strategies 265
Establishing a connection 266
The server key 268
Trusted or untrusted security? 269
Using trusted or integrated security 270 ' ASP.NET
considerations 270 ' Using SQL Server authentication 271
Accepting user login credentials—or not 272
Accessing user instances 272
Connection pooling 273
Closing the connection 275
Summary 275
CONTENTS
Extending yourproductivity in SSMS and Query Analyzer 277
Pawel Potasinski
Custom keyboard shortcuts 277
Creating your custom utility to use with
keyboard shortcuts 279
Some ideas for utilities to implement 282
Summary 282
Why every SQL developer needs a tools database 283
Denis Gobo
What belongs in the tools database? 283
Creating the tools database 284
Using an auxiliary table of numbers 284
Generating a calendar on thefly 284 * Splitting strings with a numbers
table 286
Placing common code in the tools database 288
Formatting 289
Calling code from a different database 290
Summary 290
Deprecationfeature 291
Cristian Lefter
A simple usage example 292
Methods of tracking deprecated features 293
Summary 296
Placing SQL Server in yourpocket 297Christopher Fairbairn
Design goals 297
Architecture 298
Deployment 299
Deploying on a desktop 299 Deploying on a device 300
XCOPY deployment 300
Tool support 300
Programming support 302
Summary 303
CONTENTS xxi
Mobile data strategies 305^teW^^fM^' -r.
John Baird
Microsoft Sync Framework (MSF) 306
Client synchronization provider 306 •> Sewer synchronization
provider 306 » Synchronization adapter 307
Synchronization agent 308 ' Using MSF 310
Comparison 316
Summary 316
rt 3 Database Administration..
3'
Edited by Paul S. Randal
AND KlMBKKLY L. I lUI'P
- < What does it mean to be a DBA ? 321,v
brad M. McGehee
Typical DBA tasks: A to Z 321
DBA specialties 327
Summary 328
Working with maintenance plans 330
Tibor Karaszi
What is a maintenance plan, and how do I create one? 330
Versions and service packs 331 One or several schedules per
plan ? 331 • Wizard dialogs 332 Task, types 332
Check database integrity task 333 * Shrink database task 333
Reorganize index task 334 Rebuild index task 335
Update statistics task 335 • History cleanup task 336
Execute SQL Sewer Agentjob task 338 * Back up database
task 338 Maintenance cleanup task 340 * Select reportingoptions 341 Execute T-SQL statement task 341
Executing and monitoring the plan 342
Summary 343
PowerShell in SQL Server 344
Richard Siddaway
PowerShell overview 344
PowerShellfeatures 344 PowerShell issues and solutions 345
PowerShell with SQL Server 2000 and 2005 346
Data access 346 » Administration 347
yVv,,J
!SP
xxii CONTENTS
PowerShell with SQL Server 2008 348
SQLPS 348 Provider 349 « Cmdlets 350
Summary 351
Automating SQL Server Management using SMO 353
Allen White
Loading required libraries 354
Backup 355
Restore 357
Creating a database 359
Scripting 362
Summary 364
Practical auditing in SQL Server 2008 365
Jasper Smith
Overview of audit infrastructure 365
Server audit objects 365 Sewer audit specification objects 366
Database audit specification objects 366
Server audits 366
Configuring the, Windows Security Log target 367 * Creating a server
audit using the Windows Security Log target 368
Creating a security audit using the Windoivs Application Logtarget 370 ' Configuring a server audit using the File target 371
Server audit specifications 372
Creating server audit specifications 372 Viewingaudit events 374
Database audit specifications 376
Creating database audit specifications 376
Summary 380
Aaron Bertrand
What is so great about DMVs, anyway? 381
A brief list of my favorite DMVs and DMFs 383
sys.dm_os_sys_info 383 » sys.dm_exec_sessions,
sys.dm_exec_requests, and sys.dm_exec_connections 383
sys.dm_exec_sql__texl 384 sys.dm_exec_query_stals 384
Myfavorite DMVs, and why 381
CONTENTS xxiii
sys. dm_exec_procedure_stats 384
sys.dm_db_index_usage_stats 385
sys.dm_db_missing_index_delails, sys.dm_db_missing_index_groups,and sys. dm_db_missing_index_group_stats 385
Honorable mentions 385
sys. dm_os_perjoimance._coun.ters 385
sys.d.m_db_partilion_stats 386
sys. dm_db_index_physical_stats 386
sys.dm_sql_referenced_entities 38 7
Setting up a utility database 387
Some interesting applications of my favorite DMVs 390
A mare refined sp_who or sp_who2 390 Getting statisticsfor stored
procedures (SQL Server 2008 only) 393 Finding unused stored
procedures (SQL Server 2008 only) 394 Finding inefficient andunused indexes 396 Finding inefficient queries 398 Finding
missing indexes 399
DMV categories in SQL Server 401
Summary 402
Reusing space in a table 403
Joe Webb
Understanding how SQL Server automatically reuses
table space 403
Recognizing when SQL Server does not reclaim space 407
Using DBCC CLEANTABLE to reclaim unused table
space 410
Summary 411
Some practical issues in table partitioning 413
Ron Talmage
Table partitioning dependencies 413
Manipulating partitioned data 414
How the partition function works 415
Drilling down: using SPLIT and MERGE 417 Drilling down: usingSWITCH 418
The key: avoiding data movement 419
Sources for more informadon 419
Summary 419
CONTENTS
Partitioningfor manageability (and maybe performance) 421
Dan Guzman
Overview 421
Flow to partition 423
Planning and design considerations 424
Gotchas and tips 429
Boundary time values 429 SPLIT and MERGE
performance 430 « Update statistics after SWITCH 430
Sharedpartitionfunctions and schemes 430
Summary 430
Efficient backups without indexes 432
Greg Linwood
It's OK to not back up nonclustered indexes! 432
A simple example 433
Default table and index storage, behavior 433 * Adding a dedicated
filegroupfor nonclustered indexes 435 * Moving nonclustered indexes
into the neivfilegroup 435
Backing up only the PRIMARY filegroup 436
Restoring the PRIMARY filegroup backup 436
Restoringfor extraction only 437 ' Restoringfor productionuse 437 Restoringforproduction use—step) by step 439
usp_Manage_NCIX_Filegroup 442
Planning before moving NCIXs into a dedicated
filegroup 445
Moving NCIXs temporarily requires additional disk space 445
Moving NCIXs creates empty space in PRIMARYfilegroup 445
Log shipping 446
Summary 44V
Using database mirroring to become a superhero! 449
Glenn Berry
Why should I use database mirroring? 449
How does database mirroring work? 449
How do you set up database mirroring? 450
How do you prepare the mirror? 453
Using database mirroring for routine maintenance 456
CONTENTSXXV
Using database mirroring to upgrade to SQL Server 2008 458
Using database mirroring to move data seamlessly 459
Case study of moving data with database, m irroring 459
Lessons learned from case study 460
Sum maty 461
The poor man's SQL Server log shipping 463Edwin Sarmiento
Creating the T-SQL script 465
Creating a cleanup script 465
Creating a batch file 466
Improving the log shipping process 473
Summary 473
Understated changes in SQL Server 2005 replication 475Paul Ibison
Undocumented or partially documented changesin behavior 475
Reading the text of hidden replication stored procedures 476
Creating snapshots without any data—only the schema 476Some changed replication defaults 478
More efficient methodologies 481
Remove redundant pre-snapshot and post-snapshot scripts 481
Replace merge -KXCHANGETYPE parameters 482
Summary 483
High-performance transactional replication 484Hilary Cotter
Performance kiss of death factors in transactional
replication 485
Batch updates 485 Replicating text 488 Logging 489
Network latency 490 • Subscriber hardware 491 Subscriber indexesand triggers 491 Distributor hardware 492
Large numbers ofpush subscriptions 492
Optimal settings for replication 492
CommilBalchSize and Comm.ilBatchThreshold 493 * Updateproc 494 SubscriptionStreams 495
Summary 495
xxvi CONTENTS
Successfully implementing Kerberos delegation 496
Scott Stautfer
Understanding the issues that Kerberos delegationresolves 497
The double hop 497 • A generic infrastructure—our business
challenge 497
Understanding Kerberos delegation 498
Service principle names 498 * Constrained delegation 501
Implementing Kerberos delegation step by step 503
Configuring the Active Directory 503 « Configuring the client
tier 505 » Configuring the web tier 508 * Configuring the
data tier 510
Validating delegation from end to end 512
Downloading and installing DelegConfig 512 > Running DelegConfigfrom the web tier 512* RunningDelegConfigfrom the client tier 512
Resources to assist in more complex infrastructures 512
Summary 517
i Running SQL Server on Hyper-V 518ru~
jOHN PAUL cOOK
Visualization architecture 519
Benefits of isolation 520
Configuring virtual machines 521
Configuring disks 521 CPUconfiguration 523
Configuring networking 524 * Memory configuration 525
Addressing clock drift issues 525
Backup considerations 525
Advantages of physical to virtual migration 526
Test environments and visualization 526
Summary 527
it 4 Performance Tuning and Optimization 529Edited by Kalen Delaney
When is an unused index not an unused index? 531Rob Farley
Overview of indexing 531
Unused indexes 532
CONTENTSxxvii
jj§,
,
Unused indexes that are actually used 535
How is the unused index being used? 538
How does this affect me? 539
Summary 539
Speeding up your queries with index covering 541Alex Kuznetsov
Index covering speeds up selects 541
Some rules of thumb about indexes aren't true for coveringindexes 545
Covering indexes usually slow down modifications 545
One index should cover many queries 546
One index can both cover queries and implementuniqueness 547
Summary 547
Tracing the deadlock 549Gail Shaw
What's a deadlock? 549
Causes of deadlocks 550
Deadlock graph 550
Traceflag 1204 550 - Traceflag 1222 551 « SQLProfiler 551 Reading the deadlock graph 551 • Theprocesslist 552 ' The resource list 555
The big picture 556
Summary 556
How to optimize tempdb performance 558
Brad M. McGehee
What is tempdb used for? 558
Tempdb internals 559
How DBAs can help optimize tempdb 559
Minimizing the use of tempdb 560
Preallocating tempdb space and avoiding use of autogrowth 561Don't shrink tempdb ifyou don 'l need to 562 Dividing tempdb amongmultiplephysicalfiles 562 Moving tempdb to a disk separatefrom yourother databases 563 Locating tempdb on afast I/Osubsystem 563 Adding RAM to your SQL server
CONTENTS
instance 564 Using SQL Server 2008 transparent data
encryption 564 Leaving auto create statistics and auto
update statistics on 564 Verifying CHECKSUMfor
SQL Server2008 564
Summaiy 565
Does the order ofcolumns in an index matter? 566
Joe Webb
Understanding the basics of composite indexes 566
Finding a specific row 567
Finding a last, name 569
Finding a first name 570
Summaiy 574
Correlating SQL Profiler with PerfMon 575
Kevin Kline
What vexes you? 576
Getting started with PerfMon and Profiler 576
Best practices using PerfMon 576 * Best practices usingProfiler 578 • A correlated view ofperformance 580
Summary 581
Using correlation to improve query performance 583
Gert-Jan Strik
The purpose of the optimizer 583
Correlation with the clustered index 584
Low correlation 584
When the optimizer does it right 585
When the optimizer does it right again 586
When the optimizer gets it wrong 587
Correcting the optimizer 587
When to expect correlation 588
Determining correlation 588
Summary 589
CONTENTS xxix
f How to use Dynamic Management Views 590Glenn Berry
Why should I use DMV queries? 590
Setting up security to run DMV queries 590
Looking at top waits at the instance level 591
Looking for CPU pressure and what's causing it 592
Finding I/O pressure in SQL Server 594
SQL Server memory pressure 597
SQL Server index usage 600
Detecting blocking in SQL Server 603
Summary 604
Query performance and disk I/O counters 606
Expensive 1/Os and very expensive 1/Os 606
Disk performance counters 607
Random or sequential I/Os and disk performancecounters 608
SQL Server operations and I/O sizes 610
How expensive are small random I/Os, anyway? 611
Performance scenarios 613
Scenario 1: constant checkpoints 613 Scenario 2: NOLOCK and
faster query processing 614 Scenario 3: read-ahead reads 615
Scenario 4: index fragmentation 616
Summary 618
ff2| XEVENT: the next event infrastructure 619
Cristian Lefter
Extended Events infrastructure characteristics 619
XEVENT architecture 622
SQL Server Extended Events engine 622 « Packages 623
Events 624 Targets 624 • Actions 625
Predicates 625 Types and maps 625 Sessions 626
XXX CONTENTS
XEVENT in action 626
Usage scenarios 626 System health session 628
Performance considerations 629
Summary 630
Part 5 Business intelligence .631Edited by Grko Low-
Business intelligence overview 633
Terminology 634
Really, what is so different? 635
Approach 636
Dimensional modeling 637
Cubes, anyone? 638
Microsoft BI stack 640
Flow do I get started? 640
Summary 641
Unlocking the secrets ofSQL Server 2008Reporting Services 642-x-*' '
William Vaughn
Why should developers care about Reporting Services? 642
What is Reporting Services? 643
Using Visual Studio to create an RDL report 645
Using the Visual Studio 2008 Report Designer 649
Managing report parameters 650 Deploying your report 651
Using the Report Manager 652
Using the Visual Studio MicrosoftReportViewer control 653
What's in Reporting Services 2008 for developers? 654
Virtual directory changes 655 Using SQL Server Reporting Services-
Configuration Manager 655 * Exporting reports 656
Enabling My Reports 656
Working with the Report Designer 658
Summary 659
CONTENTS xxxi
Reporting Services tips and tricks 660Bruce Loehle-Conger
Performance tips 660
Filters versus query parameters 660 • Linked servers 660
Drillthrough instead ofdrill-down 661 Data expo rt, 662
Connection pooling 662
Design dps 663
Stored procedures and temp tables 663 * Excel merged cell
solution 663 0 Excel web queries and reports 664 « HTML or
Word documents 667 Serverportability 667 • Embedding T-SQLin a report 668 * UserfUserlD 669
Summary 669
SQL Server Audit, change tracking, and changedata capture 670
Aaron Bertrand
What are these solutions used for? 670
What do people do now? 671
How does SQL Server 2008 solve these problems? 672
SQL Server Audit 673 " Change tracking 676
Change data capture 681
Comparison of features 685
Summary 686
Introduction to SSAS 2008 data mining 687
Dejan Sarka
Data mining basics 688
Data mining projects 689 " Data overview and preparation 690
SSAS 2008 data mining algorithms 690
Creating mining models 691
Harvesting the results 693
Viewing the models 694 Evaluating the models 695
Creating prediction queries 697
Sources for more information 698
Summary 698
CONTENTS
To aggregate or not to aggregate—is there really a question ? 700
Erin Welker
What are aggregations? 700
Designing aggregations 701
Influencing aggregations 702
Attribute relationships 704
Usage-based optimization 705
High-level approach 707
Other considerations 707
Summary 708
Incorporating data profiling in the ETL process 709
John Welch
Why profile data? 709
Introduction to the Data Profiling task 711
Types ofprofiles 711 * Input to the task 712' Output from the
task 713* Constraints ofthe Data Profiling task 715
Making the Data Profiling task dynamic 716
Changing the database 716* Altering the profile requests 716
Setting the ProfiklnpiitXml properly 717
Making data-quality decisions in the ETL 719
Excluding data based on quality 719* Adjusting rules
dynamically 719
Consuming the task output 720
Capturing the output 720 Using SSIS XMLfunctionality 721
Using scripts 723 * Incorporating the values in the package 723
Summary 724
Expressions in SQL Server Integration Services 726Matthew Roche
SSIS packages: a brief review 726
Expressions: a quick tour 727
Expressions in the control flow 729
Expressions and variables 732
Expressions in the data flow 734
Expressions and connection managers 737
Summary 741
CONTENTS xxxiii
SSIS performance tips 743
Phil Brammer
SSIS overview 743
Control flow performance 743
Data flow performance 744
Source acquisition performance 744 * Data transformationperformance 745 Destination performance 746 * Lookuptransformation performance 747 General dataflow performance 748
Summary 749
Some definitions 750
A T-SQL incremental load 751
Incremental loads in SSIS 754
Creating the new BIDS project 754 Defining the lookuptransformation 755 * Setting the lookup transformation behavior 757
Summary 760
index 763
Incremental loads using T-SQL and SSIS 750
Andy Leonard