parallel query execution in sql server - microsoft...why parallel query execution? •improves...

39
Parallel Query Execution in SQL Server Craig Freedman Software Design Engineer SQL Server Query Team

Upload: others

Post on 18-Mar-2020

20 views

Category:

Documents


0 download

TRANSCRIPT

Parallel Query Executionin SQL Server

Craig Freedman

Software Design Engineer

SQL Server Query Team

Outline

• Why parallel query execution?

• Basics of parallel query execution

• Degree of parallelism (DOP)

• The parallelism operator

• Parallel query plan examples

• What to look out for …

• Q&A

Why Parallel Query Execution?

• Improves response time for big queries

• Completely transparent to the end user

• Appropriate for data warehouse workloads(small numbers of big queries)

• Parallelism is not free– Adds overhead to query execution

– May reduce overall server throughput

• Inappropriate for OLTP workloads(large numbers of small queries)

Scalability Terminology

Perfect linear scalability:

• 2x CPUs = ½ Response Time

• 2x Data + 2x CPUs = Same Response Time

Data Size # CPUs Response Time

Speed Up Same Data More CPUs Reduced RT

Scale Up More Data More CPUs Same RT

Basics of Parallel Query Execution

1. Horizontally partition input data

2. Distribute partitions among CPUs

3. Simultaneously perform the same operationon each partition

“Single Instruction Multiple Data”

Advantages:

• Excellent scalability: threads operate independently

• Easy to adjust the degree of parallelism (DOP)

Parallelism Example

Thread 1

Aggregate

Thread 2

Aggregate

314159

2682 Evens

Odds5359

NOT Pipeline Parallelism

• Parallelism limited by the number of operators

• Operators are dependent on one another

• Does not scale well

Thread 2

Hash Aggregate

Thread 1

Scan

Choosing a Parallel Plan

• Must be running on a multi-processor system

• Optimizer makes a cost based decision whether to generate a parallel plan

Configuration Option Setting

affinity mask Must enable more than one processor

max degree of parallelism Must be greater than one(or overridden by MAXDOP query hint)

cost threshold for parallelism Query cost must exceed threshold (seconds)

max worker threads Must have enough threads to run query

Degree of Parallelism (DOP)

• The number of threads per operatorNOT the total number of threads

• Total number of threads may exceed DOP

• Optimizer estimates DOP for costing

• Actual DOP selected at query startup

Choosing the Degree of Parallelism

Processors

• Number of physical processors

• Limited by “affinity mask” configuration option

MAXDOP

• MAXDOP query hint

• “max degree of parallelism” configuration option

Threads

• “max worker threads” configuration option

• Reduce DOP (or run serially) if insufficient threads available

CPU Utilization

• SQL 2000:

– Any thread runs on any CPU, and …

– There may be more threads than DOP, so …

– Reducing DOP may not reduce CPU usage

• SQL 2005:

– # of schedulers per query limited to DOP, so …

– Query never uses more than DOP CPUs, and …

– Reducing DOP does reduce CPU usage

The Parallelism Operator

• aka Exchange

• Moves (packets of) rows between threads

• Really two operators: producer and consumer

• Producer and consumer execute independently:

– Producers push data to consumers

– Consumers may have to wait for data from producers

– Flow control keeps producers from getting too far ahead of consumers

Producer Thread 1

Producer Thread 2

The Parallelism Operator

Consumer Thread 1

Consumer Thread 2

Types of Parallelism Operators

Type # Producer Threads # Consumer Threads

Gather Streams > 1 (DOP) 1

Repartition Streams > 1 (DOP) > 1 (DOP)

Distribute Streams 1 > 1 (DOP)

Gather Streams Repartition Streams Distribute Streams

Types of Parallelism Operators

Partitioning Type Description

Broadcast Send all rows to all consumers.

Hash Determine where to send each row by evaluating a hash function on one or more columns in the row.

Round Robin Send each packet of rows to the next consumer in sequence.

Demand Send the next row to the next consumer that asks.Used by partitioned tables only. Pull based data flow!

Range Determine where to send each row by evaluating a range function on one column in the row. Used by index builds and stats queries.

Routing of rows between producers and consumers

Types of Parallelism Operators

Order preserving (aka merging) exchange

7531

8642

Non-order preserving (aka non-merging) exchange

87654321

86427531

7531

8642

Parallel Query Plan Examples

Parallel Scan

• Most operators are unaware of parallelism;Parallel scan is an exception

• Threads work together to scan entire table

• Server dynamically assigns pages or rows to threads

• Automatically adjusts to the number of threads

• Automatically adjusts for skew or load imbalances

Thread 1 Thread 2 Thread 3 Thread 4

Parallel Nested Loops Join

• Distribute outer rows to any inner thread:– Generally no exchange needed (e.g., parallel scan)– Occasionally need a round-robin exchange

• Inner side (almost) always runs serially andprocesses a single outer row at a time

SELECT * FROM T1 JOIN T2 ON T1.B = T2.A WHERE T1.A < 100 Parallel Scan

100 Serial Scans

100 Rows

Parallel Nested Loops Join

• Distribute outer rows to any inner thread:– Generally no exchange needed (e.g., parallel scan)– Occasionally need a round-robin exchange

• Inner side (almost) always runs serially andprocesses a single outer row at a time

SELECT * FROM T1 JOIN T2 ON T1.B = T2.A WHERE T1.A < 100 Parallel Scan

100 Serial Scans

100 Rows

Inner Side Parallelism

• Normally, the inner side of a nested loops join runs serially• Exception:

– Single outer row and no correlated parameters– May in some rare cases yield parallelism on the inner side

of a nested loops join

SELECT * FROM T1 JOIN T2 ON T1.A = T2.B WHERE T2.A = 0

Broadcast PartitionOne Row!

Parallel Scan!

Parallel Join

Inner Side Parallelism

• Normally, the inner side of a nested loops join runs serially• Exception:

– Single outer row and no correlated parameters– May in some rare cases yield parallelism on the inner side

of a nested loops join

SELECT * FROM T1 JOIN T2 ON T1.A = T2.B WHERE T2.A = 0

Broadcast PartitionOne Row!

Parallel Scan!

Parallel Join

Inner Side Parallelism

IF EXISTS (SELECT * FROM T1 WHERE T1.A = 0)PRINT 'Hello World!'

One Row!

Exchange and Parallel Scan!

Serial Join

Parallel Hash Join

• Hash build and probe rows that may jointo the same HJ instance

• Merge join uses same approach

SELECT * FROM T1 JOIN T2 ON T1.A = T2.B

Parallel ScanHash Partition

Parallel Hash Join

• Hash build and probe rows that may jointo the same HJ instance

• Merge join uses same approach

SELECT * FROM T1 JOIN T2 ON T1.A = T2.B

Parallel ScanHash Partition

Bitmap Filtering

• Useful when build input has a selective predicate1. Create a bitmap using build input2. Filter probe input using bitmap

• Also used by some parallel merge joins

SELECT * FROM T1 JOIN T2 ON T1.A = T2.B WHERE T1.B < 100000

Create Bitmap

Filter with Bitmap

Bitmap Filtering

• Useful when build input has a selective predicate1. Create a bitmap using build input2. Filter probe input using bitmap

• Also used by some parallel merge joins

SELECT * FROM T1 JOIN T2 ON T1.A = T2.B WHERE T1.B < 100000

Create Bitmap

Filter with Bitmap

Broadcast Hash Join

• If build is small, it may not hash partition welland may lead to skew problems

• Instead broadcast build rows to all hash join instancesand probe at any hash join instance

• Also eliminates an extra exchange!

SELECT * FROM T1 JOIN T2 ON T1.A = T2.B WHERE T1.B < 100

Parallel Scan

Broadcast Partition

No Exchange!

Few Rows

Merging Exchange

• Needed for parallel query with ORDER BY

• Also used by merge join and stream aggregate

• Risk of intra-query parallel deadlocks!

SELECT * FROM T1 ORDER BY T1.A

Merging Exchange

Merging Exchange

• Needed for parallel query with ORDER BY

• Also used by merge join and stream aggregate

• Risk of intra-query parallel deadlocks!

SELECT * FROM T1 ORDER BY T1.A

Merging Exchange

Parallel Aggregation

• Hash rows that belong to the same groupto the same aggregate instance

SELECT T1.A, MIN(T1.B) FROM T1 GROUP BY T1.A

Hash Partition

Local/Global (Partial) Aggregation

• If number of groups is small, data may not hash partition welland may lead to skew problems

• Instead split into two aggregates:– Local aggregate computes partial results for all groups– Global aggregate computes final results

• Also eliminates exchange below the local aggregate

Local Aggregate

SELECT MIN(T1.A) FROM T1

Many RowsNo Exchange!

Few Rows

Global Aggregate

What to look out for …

What to look out for …

• Nested loops join parallelism may be limited due to:– Too few outer rows (e.g., partitioned table with two partitions)– Poor distribution of rows to pages (e.g., skewed parallel scan)– Showplan XML shows the number of rows processed per thread

• Merging exchanges– May not scale as well as non-merging exchanges– Parallel deadlocks (rare – especially with SQL Server 2005)

• Inserts, updates, and deletes are serial (except for index build)– Application level parallelization can help, but …– Cannot manually parallelize bulk load without giving up bulk logging

• Some features, operators, and intrinsics may force serial plans or serial zones (and bottlenecks) within a parallel plan

• On SQL Server 2000, reducing DOP may not reduce CPU usage

Non-Parallelizable “Stuff”

Forces a serial zone (within a parallel plan):

• System table scans • Sequence functions • TOP

• “Backward” scans • Recursive queries • TVFs

• Global scalar aggregate • Multi-consumer spool

Forces a serial plan:

• All TSQL UDFs

• CLR UDFs with data access

• Miscellaneous built-ins such as:OBJECT_ID(), ERROR_NUMBER(), @@TRANCOUNT, …

• Dynamic cursors

Serial zones may lead to bottlenecks and reduced performance

Intra-Query Parallel Deadlocks

• Deadlock among the threads executing a parallel query• Three scenarios:

– Two merging exchanges separated by order preserving operators– A merge join with merging exchanges on both inputs– Merging exchange above index seek with multiple ranges

• Deadlock detector recognizes intra-query deadlocks and tries to resolve, but it’s SLOW!

• To diagnose, check whether all threads within the session are waiting on wait type CXPACKET– On SQL Server 2005, use sys.dm_os_waiting_tasks– On SQL Server 2000, use sysprocesses

• All but a few rare cases fixed in SQL Server 2005:– Merge join with mismatched join key types– Index seek with multiple ranges

Branch 1 Thread 1

Branch 1 Thread 2

Root Thread

Parallel Deadlock Example

Branch 2 Thread 1

Branch 2 Thread 2

Both exchanges are merging

No rows flow through here

Merging blocks without at least one row from each input

Too many rows flow through here

Blogs

• My blog:http://blogs.msdn.com/craigfr

• Query Team blog: http://blogs.msdn.com/sqlqueryprocessing

• SQL Customer Advisory Team blog: http://blogs.msdn.com/sqlcat

Q&A

© 2007 Microsoft Corporation. All rights reserved.Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries.

The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the dateof this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part

of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation.

MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION