blazing fast queries - when indexes are not enough

38
April 21 – 23. Neuss, Germany www.sqlpass.org/eu2010 Blazing Fast Queries When Indexes Are Not Enough Davide Mauri, Solid Quality Mentors, [email protected]

Upload: davide-mauri

Post on 27-Jun-2015

351 views

Category:

Technology


0 download

DESCRIPTION

If you want optimum performance in SQL Server, you have to use indexes. But what if you already implemented indexes and your solution is still slow or it doesn’t scale like you want? You may have to rethink the way you write your queries. How you write your queries is directly related to how you approach and solve your business problems, and you might think that the easiest way to solve them is with a procedural, row-by-row approach. But although cursors, loops, and user-defined functions (UDFs) might seem the easiest method, this "easier" way leads to slower performance, poor scalability, and more complex code that’s bug-prone and difficult to maintain. In this session, you’ll see how to use a set-based approach to solving problems, even those that most developers think are solvable only by using a procedural approach. We’ll then compare the two approaches to see the differences in terms of scalability, performance, and complexity. You'll be amazed at the boost in performance and maintainability you can give your applications!

TRANSCRIPT

Page 1: Blazing Fast Queries - When Indexes Are Not Enough

April 21 – 23. Neuss, Germany

w w w . s q l p a s s . o r g / e u 2 0 1 0

Blazing Fast Queries

When Indexes Are Not Enough

Davide Mauri, Solid Quality Mentors,

[email protected]

Page 2: Blazing Fast Queries - When Indexes Are Not Enough

Blazing Fast Queries

About Me

MCDBA, MCAD, MCT

Microsoft SQL Server MVP

Works with SQL Server from 6.5

Specialized in BI, Database Tuning

President of UGISS (Italian SQL Server UG)

Mentor @ Solid Quality MentorsItalian Subsidiary

Page 3: Blazing Fast Queries - When Indexes Are Not Enough

Blazing Fast Queries

A developer story

I’ve started as a developer, I hated SQL….

…at some point my applications started to show performance and scalability problems...

(more successful they was more problems I had)

…I started to optimized the application finding clever ideas to gain performances

…the application became more and more complex to maintain

(and thus much less cost effective and profitable)

Page 4: Blazing Fast Queries - When Indexes Are Not Enough

Blazing Fast Queries

…and the story ends

At some point I decided that maybe I should check if something can be done also on the database….

…I started learning database and SQL….

…and I discovered that using correct indexes and writing correct queries I could gain more performance and scalability than what I can ever dream of!

So, now everyone told you about indexes….I’ll tell you about queries and the fabulous secret technique of….

Set Based Thinking

Page 5: Blazing Fast Queries - When Indexes Are Not Enough

Blazing Fast Queries

Set Based Thinking

Thinking in Sets allows you to leverage all the power of database engine

Optimizer is clever then what you can imagine!

Thinking in Sets means that you have to look at the properties that a set has and that you can use to solve your problem

Thinking in Sets means that we can find the solution from a purely logical perspective and then translate it into SQL commands

After all SQL is Logic (cit. Itzik Ben-Gan)

Page 6: Blazing Fast Queries - When Indexes Are Not Enough

Blazing Fast Queries

Set Based Thinking

Thinking in Sets means that you can use a declarative approach instead of a procedural one

Ask to the database what you want, not how to get it

Let the optimizer do his work!

We have a problem:

Row-by-Rows solutions are usually what comes to our mind as a first solution

Since we’re used to think at how we can solve the problem with the programming language we will use to implement it

Page 7: Blazing Fast Queries - When Indexes Are Not Enough

Blazing Fast Queries

Set Based Thinking

So, we start to implement this first-on-mind solution immeditaly...

...and then fix all the problems that arises from it

It does not scale

It does not offer performance

Solution:

Stop doing this brute-force-approach and try to find the best algorithm (independent from programming language) that can solve the problem

Unlearn to think row-by-row!

Page 8: Blazing Fast Queries - When Indexes Are Not Enough

Blazing Fast Queries

Set Based Thinking

At the first time it’s difficult…

…but everything is difficult at the first time!This not:This is easy: But it should be

quite clear who will

win on a competition

All we need is

Exercising!

Page 9: Blazing Fast Queries - When Indexes Are Not Enough

Blazing Fast Queries

Let’s start exercising

I’ll propose some typical real world scenario

All problems derives from situation I faced during consultancy

Then we’ll see how we can solve them using a purely set-based approach

And then we’ll compare some of them to the “easier” row-by-row approach

Page 10: Blazing Fast Queries - When Indexes Are Not Enough

Blazing Fast Queries

What is NOT a set-based approach?

• Using cursors to iterate over all rows, of course

• Calling a scalar UDF that reads rows from a table, for each row in a SELECT statement

• Calling a rowset UDF using CROSS/OUTER APPLY

(except for INLINE UDFs)

• Using a SELECT to define the value of a column for each row in a SELECT statement

• WHILE Loops

Page 11: Blazing Fast Queries - When Indexes Are Not Enough

Blazing Fast Queries

Let’s start! 1st problem: Loans

Knowing the value of the requested loan, interests and number for rates, generate a row for each rate, from the beginning up until the end of the loan

Loan Value # Rates Payment Frequency % Interest € 10.000,00 12 1 5,00% € 20.000,00 12 6 6,00% € 30.000,00 12 6 5,50%

Rate Date € 875,00 January-10 € 875,00 February-10 € 875,00 March-10 € 875,00 April-10 € 875,00 May-10 € 875,00 June-10 € 875,00 July-10 € 875,00 August-10 € 875,00 September-10 € 875,00 October-10 € 875,00 November-10 € 875,00 December-10

Page 12: Blazing Fast Queries - When Indexes Are Not Enough

Blazing Fast Queries

Loans

Demo

Page 13: Blazing Fast Queries - When Indexes Are Not Enough

Blazing Fast Queries

Groups and Correlated Values

Given a list of machines statuses, tell the latest reported status of each machine

ControllerABC

Controller Status TimeStampA StandBy 1001B StandBy 1001C StandBy 1003D Working 1004A Working 1004B StandBy 1006D StandBy 1009D Working 1010D Working 1011

Page 14: Blazing Fast Queries - When Indexes Are Not Enough

Blazing Fast Queries

Groups and Correlated Values

Demo

Page 15: Blazing Fast Queries - When Indexes Are Not Enough

Blazing Fast Queries

Stocks

Given a table with stock transactions get the High, Low, Open, Close & Volume values for each stock, each hour, each daySymbolID TransactionDateTime Price Volume

1 2008-01-01 09:15:21.000 75.800 25891 2008-01-01 09:25:44.000 68.200 43861 2008-01-01 09:29:31.000 74.300 28371 2008-01-01 09:34:42.000 68.900 29371 2008-01-01 09:39:13.000 72.300 45131 2008-01-01 09:43:35.000 67.300 8381 2008-01-01 09:51:57.000 73.800 13801 2008-01-01 09:56:42.000 68.700 4190

tran_hour from_datetime to_datetime symbol_id high low volume open close

92008-01-01 09:15:21.000

2008-01-01 09:56:42.000 1 75.800 67.300 23670 75.800 68.700

Page 16: Blazing Fast Queries - When Indexes Are Not Enough

Blazing Fast Queries

Stocks

Demo

Page 17: Blazing Fast Queries - When Indexes Are Not Enough

Blazing Fast Queries

Performance Comparison

10000 100000 10000000

1000000

2000000

3000000

4000000

5000000

6000000

7000000

47 426 754740790510442

5980523

I/O Reads

Reads Set-BasedReads Row-By-Row

I/Os

Rows

Page 18: Blazing Fast Queries - When Indexes Are Not Enough

Blazing Fast Queries

Performance Comparison

10000 100000 10000000

5000

10000

15000

20000

25000

30000

35000

40000

134 970

12534

4043034

33355

Time (msec)

Time (msec) Set-BasedTime (msec) Row-By-Row

Time

Rows

Page 19: Blazing Fast Queries - When Indexes Are Not Enough

Blazing Fast Queries

Free Seats / Free Warehouse Spaces

Give a group of person of defined size, find in which row in a cinema (or theater, or stadium, or whatever ), there is enough free space to make the seat near each other

Page 20: Blazing Fast Queries - When Indexes Are Not Enough

Blazing Fast Queries

Free Seats / Free Warehouse Spaces

Page 21: Blazing Fast Queries - When Indexes Are Not Enough

Blazing Fast Queries

Free Seats – Row by Row

3 4 5 6 7 9A. Set the “free seat counter” (FSC) to 1B. Start from the first free seat (n), store this number as

First Free Seat (FFS)C. If next free seat is n+1 then increase the FSCD. If next free seat is not n+1, store it somehere along with FFSE. restart from point A

1 2 5 1

Page 22: Blazing Fast Queries - When Indexes Are Not Enough

Blazing Fast Queries

Free Seats

Demo

Row By Row Solution

Page 23: Blazing Fast Queries - When Indexes Are Not Enough

Blazing Fast Queries

Free Seats – Set Based

A. If SQL Server could see the picture above would be able to see thatthere are groups of free spaces

Then we could simply use “GROUP BY”B. We need turn this “for-eyes-only” this property into something

SQL Server can handle

Page 24: Blazing Fast Queries - When Indexes Are Not Enough

Blazing Fast Queries

Free Seats – Set Based

A. Let’s enumerate all the seats (free and occupied)B. And then enumerate all the free seatsC. Do a simple difference….D. And now we can use a GROUP BY

1 2 3 4 5 6 7 8 9 10 …

1 2 3 4 5 6 7 …2 2 2 2 2 3 3

Page 25: Blazing Fast Queries - When Indexes Are Not Enough

Blazing Fast Queries

Free Seats

Demo

Set Based Solution

Page 26: Blazing Fast Queries - When Indexes Are Not Enough

Blazing Fast Queries

Performance Comparison

30 300 30000

5000

10000

15000

20000

25000

30000

35000

40000

20 36 176381

3630

36098

I/O

Set BasedCursor

Page 27: Blazing Fast Queries - When Indexes Are Not Enough

Blazing Fast Queries

Performance Comparison

30 300 30000

500

1000

1500

2000

2500

3000

3500

4000

9 22 11387

469

3516

Time (ms)

Set BasedCursor

Page 28: Blazing Fast Queries - When Indexes Are Not Enough

Blazing Fast Queries

Value Packing

Now, from the same machines statuses table, we need to “pack” all the ranges

Controller Status TimeStampA StandBy 1001B StandBy 1002B StandBy 1003B StandBy 1004A Working 1005A Working 1006A Working 1007B Working 1008B Working 1009A StandBy 1010A StandBy 1011B StandBy 1012

Controller Status From ToA StandBy 1001 1001B StandBy 1002 1004A Working 1005 1007B Working 1008 1009A StandBy 1010 1011B StandBy 1012 1012

Page 29: Blazing Fast Queries - When Indexes Are Not Enough

Blazing Fast Queries

Value Packing

Demo

Page 30: Blazing Fast Queries - When Indexes Are Not Enough

Blazing Fast Queries

Temporal Problems

Set Based solution comes to the rescue also in temporal problems

Packing Problems

Get the minimum and the maximum date of validity of something

Auditing or Temporal Denormalization Problems

Given tables that hold data with different valid intervals, create a result that holds all the valid combination in time

Page 31: Blazing Fast Queries - When Indexes Are Not Enough

Blazing Fast Queries

Temporal – Auditing/Denormalization

Give tables that contains validity period for attributes that changes of over time, generate a result that shows all the combinations that existed at some point in time

Value From To

Phone 1 2007-12-19 2008-02-02

Phone 2 2008-02-03 now

Value From To

Addresses 1 2007-12-01 2007-12-31

Addresses 2 2008-01-01 now

Phone Number Changes Address Changes

2007-12-01 2008-01-01

2007-12-19 2008-02-03

Timeline

Page 32: Blazing Fast Queries - When Indexes Are Not Enough

Blazing Fast Queries

Temporal Auditing / Denormalization

Demo

Page 33: Blazing Fast Queries - When Indexes Are Not Enough

Blazing Fast Queries

Conclusions

Set Based Thinking is hard….

…but deserves our attention!

And after all row-by-row solution is not that easy!

We can spend more time in thinking instead of writing ugly code

We’ve been able to optimize our query making them up to 30 times faster!

Stop crawling, and start to run!

Big customers will be very satisfied with us!

Page 34: Blazing Fast Queries - When Indexes Are Not Enough

Blazing Fast Queries

Is all Gold what shines?

YES!!!

…in Theory

In 99.9% a set based solution is preferable

But in some very rare (one?) case is still not the best

The problem is that the ACTUAL implementation of T- SQL in not complete and doesn’t allow to easily get “previous” or “next” values

Vote on Connect!!!

Page 35: Blazing Fast Queries - When Indexes Are Not Enough

Blazing Fast Queries

Additional Resources

Itzik Ben-Gan & Friends books:• SQL Server 2008 Fundamentals• SQL Server 2008 Querying• SQL Server 2008 Programming

Thinking in Sets• Joe Celko

SQL and Relational Theory• Chris Date

Page 36: Blazing Fast Queries - When Indexes Are Not Enough

Questions

Page 37: Blazing Fast Queries - When Indexes Are Not Enough

Thank you!

Page 38: Blazing Fast Queries - When Indexes Are Not Enough

Don’t forget to fill out your session evaluation forms