sql server for developers - sql explore 2012mscom.co.il/downloads/sqlpresentations/11063.pdf · sql...
TRANSCRIPT
SQL Server
for
Developers
2012
About Me
• Name: Guy Glantser
• Role: CEO, Madeira
• Email: [email protected]
• Phone: 054-2346606
• Blog: www.madeira.co.il/author/guyglantser
• Website: www.madeira.co.il
1
Agenda
• Working with SQL Server Data Types
– Handling Date & Time Values
– Managing Hierarchies
– Storing Files in SQL Server 2012 Using FileTables
– Working with Spatial Data
– Using Table Types
Agenda
• Advanced Programming Techniques
– Using Ranking Functions
– Using SQL Server 2012 New Window Functions
– Temporary Tables vs. Table Variables
– The OUTPUT Clause
– The MERGE Statement
– Working with Grouping Sets
– Row Identifier Design Best Practices
2
Agenda
• Understanding Execution Plans
– The Procedure Cache
– Compilation-Execution Sequence
– Execution Plan Representations
– Logical and Physical Operators
– Cost and Cardinality
– Estimated vs. Actual Execution Plan
– Common Pitfalls
Agenda
• Parameterization in SQL Server
– Introduction to Parameterization
– 7 Ways to Execute Your Query
– Parameter Sniffing
– Non-Uniform Data Distribution
– Changing Parameter Values
– Simple vs. Forced Parameterization
3
Working with
SQL Server Data Types
Handling Date & Time Values
• Data Types Before SQL Server 2008
Data Type Storage Date Range Accuracy
DATETIME 8 Bytes 01/01/1753 – 31/12/9999 3 Milliseconds
SMALLDATETIME 4 Bytes 01/01/1900 – 06/06/2079 1 Minute
4
Handling Date & Time Values
• New Data Types Since SQL Server 2008
Data Type Storage Date Range Accuracy
DATE 3 01/01/0001 – 31/12/9999 1 Day
TIME(n) 3 to 5 100 Nanoseconds
DATETIME2(n) 6 to 8 01/01/0001 – 31/12/9999 100 Nanoseconds
DATETIMEOFFSET(n) 8 to 10 01/01/0001 – 31/12/9999 100 Nanoseconds
Handling Date & Time Values
2012-06-01 09:43:28
2012-06-01
5
Managing Hierarchies
Michael
BenElton
BarbaraDanny
Dina
InaNika
John
RitaRod
Ron
Managing Hierarchies
• What do We Need?
– Easily Modify the Hierarchy
– Efficiently Retrieve the Hierarchy
in the Correct Order
– Retrieve Only a Sub-Tree
– Retrieve All the Nodes
from the Root to a Specific Node
– Move a Sub-Tree to Another Location in the Hierarchy
– And More…
9
65
43
872
1
6
Managing Hierarchies
• What are the Options?
– Self Referencing Column for the Parent Node
– Self Referencing Column + “HierarchyPath” Column
– The HIERARCHYID Data Type
9
65
43
872
1
Managing Hierarchies
• The Concept of the Hierarchy Path
9
65
43
872
1
/1/2/4/5/
7
Storing Files in SQL Server 2012
Using FileTables
Storing Files in SQL Server 2012
Using FileTables
• What do We Need?
– Good Read & Write Performance
– Backup Synchronization
– Transactional Consistency
– Human Interface for Accessing Files
– Support for Hierarchy Management
of Files and Folders
– Full-Text Indexing of Files
8
Storing Files in SQL Server 2012
Using FileTables
• What are the Options?
– Manage Structured and Unstructured Data Separately
– Use BLOBS in the Database
– Use FileStream (SQL Server 2008)
– Use FileTables (SQL Server 2012)
Storing Files in SQL Server 2012
Using FileTables
Traditional BLOBs FileStream FileTables
Performance + - + +Backup - + + +
Transaction - + + +Human + - - +
Hierarchy + - - +Full-Text - + + +
9
Working with Spatial Data
GEOMETRY GEOGRAPHY
Working with Spatial Data
• Declaration and Instantiation
– STGeomFromText
– STPointFromText
– STGeomFromWKB
– Parse
– Explicit Conversion
– Implicit Conversion
10
Working with Spatial Data
• Spatial Methods
– Methods on a Single Object
– Methods on Multiple Objects
Working with Spatial Data
• Spatial Indexes
11
Using Table Types
• User-Defined Type
• Can be Used with Table Variables
• Can be Passed as a Parameter
• Read-Only within Stored Procedures and Functions
Advanced
Programming
Techniques
12
Using Ranking Functions
• ROW_NUMBER ()
• RANK ()
• DENSE_RANK ()
• NTILE (n)
OVER (PARTITION BY x ORDER BY y)
Using Ranking Functions
• COUNT (…)
• SUM (…)
• MIN (…)
• MAX (…)
• Any Other Aggregate Function
OVER (PARTITION BY x ORDER BY y)
13
Using Ranking Functions
• Use Cases
– Numbering & Ranking Rows
– Paging Mechanism
– Combining Aggregated Data
with Raw Data
– Running Aggregations
– Solving Complex Problems
Using SQL Server 2012
New Window Functions
• New Functions
– LEAD
– LAG
– FIRST_VALUE
– LAST_VALUE
14
Using SQL Server 2012
New Window Functions
• Window Definition
ROWS|RANGE
BETWEEN
UNBOUNDED|X PRECEDING
CURRENT ROW
AND
UNBOUNDED|Y FOLLOWING
CURRENT ROW
Temporary Tables
vs. Table Variables
• Scope
Temporary tables are visible to
the current session including
nested stored procedures.
Table variables are only visible to
the current batch, and they are
not visible to nested stored
procedures.
15
Temporary Tables
vs. Table Variables
• Storage
Both types of tables are created
in tempdb.
Both types of tables will be
processed in memory, if enough
memory is available.
Temporary Tables
vs. Table Variables
• Indexes
Temporary tables can have
indexes, just like regular tables.
Table variables can’t have indexes
except for the unique indexes
associated with the primary key
and with unique constraints.
16
Temporary Tables
vs. Table Variables
• Statistics
SQL Server creates and updates
statistics for temporary tables,
similar to regular tables.
Table variables don’t have
statistics at all, and the query
optimizer always assumes they
contain a single row.
Temporary Tables
vs. Table Variables
• Recompilations
Temporary tables can cause
recompilations due to statistics
updates.
Table variables don’t have
statistics, and thus can’t cause
recompilations.
17
Temporary Tables
vs. Table Variables
• Transactional Control
Modifications to temporary
tables are transactional. They are
written to the transaction log of
tempdb, and they are committed
or rolled back as part of the
transaction.
Table variables are not
transactional.
Temporary Tables
vs. Table Variables
• Collation
Temporary tables use the
collation of the tempdb database.
Table variables use the collation
of the current database.
18
Temporary Tables
vs. Table Variables
• What Should We Use?
– For a small number of rows,
use a table variable, because it
has less overhead.
– If you need to join the table to
other tables, use a temporary
table in order to leverage its
indexes and statistics.
– Only table variables can be
passed as parameters.
The OUTPUT Clause
• Can be Used with Any DML Statement
• Use the INSERTED and DELETED Logical Tables
• Can Reference Joined Tables As Well
• Output to the Client or to a Table
19
The OUTPUT Clause
• Use Cases
– Compare Before and After Values in an UPDATE Statement
– Obtain a Range of IDENTITY Values
– Archive Data in a Single Statement
The MERGE Statement
• Atomic Operation
• Supports the Following Conditions:
– WHEN MATCHED
– WHEN NOT MATCHED BY TARGET
– WHEN NOT MATCHED BY SOURCE
– Custom Logic
• Supports OUTPUT and $ACTION
• Restrictions on Triggers
20
Working with Grouping Sets
• Multiple Groupings in a Single Query
• ANSI-SQL Compliant Syntax
• Improves Performance
• Support for ROLLUP and CUBE
• Support for Multiple Grouping Sets
• The GROUPING_ID Function
Row Identifier Design
Best Practices
• Natural Keys vs. Surrogate Keys
• Surrogate Key Options:
– IDENTITY
– UNIQUEIDENTIFIER
– Custom Solution
– Sequence
21
Understanding
Execution Plans
The Procedure Cache
• Cached Object Types
– Object Plans
– SQL Plans
– Bound Trees
– Extended Stored Procedures
22
Compilation-Execution
Sequence
Compilation-Execution
Sequence
• For each batch, a hash value is
calculated based on the batch text,
and this hash value is searched in
cache.
• If a plan is not found for that batch,
then it’s compiled and placed in
cache.
• If a plan is found for that batch,
then it is tested for validity based
on the objects it depends on.
23
Compilation-Execution
Sequence
• If the plan is not valid, then it’s
recompiled and placed in cache
again.
• If the plan is valid, then it’s
immediately executed.
Execution Plan Representations
• Formats
– Graphical
– XML
– Text
• Sources
– SSMS
– DMVs
– Profiler
– Extended Events
24
Logical and Physical Operators
• Access Methods
– Table Scan
– Clustered Index Scan
– Clustered Index Seek
– Index Seek
– Index Scan
Logical and Physical Operators
• Join Types
– Nested Loops
– Hash Match
– Merge
25
Logical and Physical Operators
• Lookups
– RID Lookup
– Key Lookup
• Aggregates
– Stream Aggregate
– Hash Match
Cost and Cardinality
• Look for the operator with the
highest relative cost
• Use the statement relative cost in
order to compare several query
alternatives
• Look for major differences between
the estimated number of rows and
the actual number of rows
26
Common Pitfalls
• Nested Loops are Lying!
– The estimated number of rows in the
inner operator related to a single
iteration
– The actual number of rows in the inner
operator related to the sum of all
iterations
– Use the estimated number of rows
multiplied by the estimated number of
executions
Common Pitfalls
• Cost is Always Estimated!
– Even when you look at the actual
execution plan
– If there is a big difference between the
estimated and actual number of rows,
then don’t rely on the relative costs
27
Common Pitfalls
• Different Plans for Different Session
Settings
– SSMS uses SET ARITHABORT ON by
default
– SQL Server .NET native client uses SET
ARITHABORT OFF by default
– So when you try to troubleshoot a
performance problem by looking at
the plan in SSMS, you might not be
looking at the correct plan…
Parameterization in
SQL Server
28
Introduction to
Parameterization
What is Parameterization?
Introduction to
Parameterization
Why is it So Important?
29
Introduction to
Parameterization
When does it Become
Problematic?
Introduction to
Parameterization
How can we Deal with it?
30
Introduction to
Parameterization
SELECT
ProductName ,
ProductCategory
FROM
Production.Product
WHERE
ProductID = 17;
Introduction to
Parameterization
SELECT
ProductName ,
ProductCategory
FROM
Production.Product
WHERE
ProductID = 17;
31
Introduction to
Parameterization
SELECT
ProductName ,
ProductCategory
FROM
Production.Product
WHERE
ProductID = @ProductId;
Introduction to
Parameterization
• The “Customers by Country” Case Study
32
Introduction to
Parameterization
• The “Customers by Country” Case Study
SELECT
Id ,
Name ,
LastPurchaseDate
FROM
Marketing.Customers
WHERE
Country = N'IL';
7 Ways to Execute Your Query
33
Parameter Sniffing
• The query optimizer “sniffs”
the parameter values in the
first execution, and optimizes
the query based on these
values.
Parameter Sniffing
• Local variables are not
parameters, so the query
optimizer has nothing to sniff.
• The values of local variables
are only assigned at run-time.
34
Parameter Sniffing
Is Parameter Sniffing
Good or Bad?
Non-Uniform Data Distribution
35
Changing Parameter Values
• The optimizer sniffs the
parameter values at compile-
time, so if you change the
parameter values at run-time,
the plan will not be suitable for
the new values.
Simple vs. Forced
Parameterization
• In simple parameterization,
the optimizer automatically
parameterizes only trivial
plans, in which the parameter
values have absolutely no
effect on the plan chosen.
36
Simple vs. Forced
Parameterization
• In forced parameterization, the
optimizer automatically
parameterizes most plans.
Simple vs. Forced
Parameterization
• Parameterization can be Applied at the:
Database
Level
Query Template
Level
37
• Use the new date & time data
types and functions
• Convert date & time values to
date only values efficiently
• Use the HIERARCHYID data type to
manage hierarchies
38
• Manage files as part of the
database with FileTables
• Seamlessly access your files
through Windows API, Windows
share or T-SQL
• Leverage the GEOGRAPHY and
GEOMETRY data types for GIS
systems and other purposes
• Use spatial indexes to achieve fast
response times for spatial queries
39
• Pass tables as parameters to
stored procedures and functions
using table types
• Table parameters are always read-
only inside the stored procedure
or function
• Ranking functions can facilitate
many complex queries and
improve performance significantly
• Aggregate functions can also be
used as window functions
40
• Use the new window functions to
perform complex self-joins with a
single access to the underlying
table
• Pay attention to the window
definition
• Use a table variable for a small
number of rows
• Use a temporary table if you need
to join the table to other tables
41
• Use the OUTPUT clause to retrieve
multiple identity values or to
archive rows in a single atomic
operation
• Use the OUTPUT clause with the
MERGE statement for auditing
• Use the MERGE statement in
order to merge between a source
table and a target table in a single
atomic statement
• A classic use case for using the
MERGE statement is loading data
into the data warehouse
42
• Use grouping sets in order to
generate ad-hoc ROLAP cubes
• Use the GROUPING_ID column in
order to distinguish between a
real NULL value and a NULL value
that represents grouping status
• Use the new sequence object in
order to manage row identifiers
• Beware of indexes on
UNIQUEIDENTIFIER columns
43
• Pay attention to the estimated
number of rows vs. actual number
of rows of the inner operators of
nested-loop joins
• Remember that plan cost is always
estimated, even when you look at
the actual execution plan
• If there is a big difference
between the estimated and actual
number of rows, then don’t rely
on the relative costs
44
• Different session settings and
different users produce different
plans
• When you try to troubleshoot a
performance problem by looking
at the plan in SSMS, you might not
be looking at the correct plan
• Usually, data distribution is more
or less uniform, so, in general,
prefer parameterized queries over
non-parameterized queries
45
• When data is not uniformly
distributed, rewrite your code
using one of the methods shown
in this presentation
• Avoid the use of local variables as
an alternative for parameters
• Use parameters instead
46
• Don’t change parameter values
inside your stored procedures
• Pass the calculated values to inner
stored procedures to use
parameter sniffing
• Force parameterization only when
you have no other choice
• Prefer forced parameterization at
the query template level
47
One Last Picture…
48
49