sql server columnstore performance tuning eric n hanson principal program manager microsoft...
TRANSCRIPT
SQL Server Columnstore Performance Tuning
Eric N HansonPrincipal Program ManagerMicrosoft Corporation
DBI409
Session Objectives and Takeaways
Session Objective(s): Learn to make query performance go from good to outstanding
Batch mode is crucial to speedupsGood application design can work around limits
Why pay attention?
Your customer with well-tuned xVelocity columnstore application
Your customer with poorly-tuned xVelocity columnstore application
demo
Outer Join Performance Limit
Fundamentals of xVelocity Columnstore Performance
Each column stored separately, compressedBatch mode query executionSegment elimination
Columnstore physical structure
• Segment (a.k.a. row group) = 1 million rows
• Table partition has 1 or more segments
• Each column segment is a BLOB
• Dictionaries are BLOBs• Catalog has one row per
column segment with min/max values
segment 1
segment Ndictionaries
catalog
…
Batch vs. Row Mode
Batch mode processing
Process ~1000 rows at a timeBatch stored in vector form
Vector operators implementedFilter, hash join, hash aggregation
Greatly reduced CPU time (7 to 40X)
8
bit
map o
f qu
alif
yin
g
row
s
Column vectors
Batch object
#1 Takeaway!
Make sure most of the work of the query happens in batch mode.
Segment Elimination
column_id
segment_id
min_data_id max_data_id
1 1 20120101 20120131
1 2 20120115 20120215
1 3 20120201 20120228
• Segment (rowgroup)= 1 million row chunk• Min, Max kept for each column in a segment• Scans can skip segments based on this
select Date, count(*) from dbo.Purchase where Date >= 20120201 group by Date
skipped
DOs
Use star schemaPut columnstores on large tables onlyInclude every column of table in columnstore indexUse integer surrogate keys for joinsStructure your queries as star joins with grouping and aggregation as much as possible
DON’T
Join on string columnsFilter on string columns of columnstore tableJoin pairs of very large tables if you don’t have toUse NOT IN <subquery> on columnstore tableUse UNION ALL to combine columnstore tables with other tables
Outer Join Workaround
Outer join prevents batch processingRewrite query to do most work in batch mode
DEMO
IN and EXISTs Issue & Workaround
Using IN and EXISTS with subqueries can prevent batch mode executionIN ( <list of constants> ) typically works fine DEMO
Union All Issue & Workaround
UNION ALL often prevents batch modeWorkaround
Push GROUP BY and aggregation over UNION ALLDo final GROUP BY and aggregation of resultsCalled “local-global aggregation”
DEMO
Scalar Aggregates Issue & Workaround
Aggregate without group by doesn’t get batch processingselect count(*) from dbo.Purchase 1.2 secondsWorkaroundwith CountByDate (Date, c) 0.093 secondsas ( select Date, count(*) from dbo.Purchase group by Date ) select sum(c) from CountByDate;
Multiple DISTINCT aggregates issue & workaround
Table Spool if query has 2 or more DISTINCT aggregatesSpool takes time to writeSpool write is single threadedReads of spool in plan are single threaded, in row modeWorkaround:
form each DISTINCT agg in separate subqueryJoin results on grouping keys
DEMO
SQL Server 2012runs queries with 1 DISTINCT aggand 1 or more non-distinct aggs in batch mode without any spool!
Physical DB Design, Loading, and Index Management
Adding Data Using Partition Switching
Columnstores must be partition-aligned Partition switching fully supportedTo add data daily
Partition by dayEvery day
Split last partitionCreate staging table and columnstore index itSwitch it in
Avoids costly drop/rebuild
Trickle Loading with Columnstore Indexes
Master table (columnstore)Delta table (rowstore)Query using UNION ALL local-global aggregation workaroundAdd Delta to Master nightly DEMO
Avoid Nonclustered B-trees
Covering B-trees no longer neededExtra B-trees can cause optimizer to choose poor planSave spaceReduce ETL time
Eliminating Unsupported Data Types
Omit column, orModify column type to supported type
Reduce precision of numerics to 18 digits or lessConvert guid’s to intsReduce precision of datetimeoffset to 2 or lessConvert hierarchyid to int or string
Achieving Fast Columnstore Index Builds
One thread per segmentLow memory throttles parallelismConsider
high min server memory setting Set REQUEST_MAX_MEMORY_GRANT_PERCENT to 50 or moreAdd memoryOmit columnsNormalize fact columns to dimensionsVertically partition
sys.dm_exec_query_memory_grants shows DOP
Not enough memory to build a columnstore index?
Same ideas apply as for achieving fast buildsRun out of memory during execution, not startup?
Errors 701 or 802 indicate thisWorkaround: reduce DOP explicitly, e.g.create columnstore index <name> on <table>(<columns>) with(maxdop = 1);
Maximizing the Benefits
of Segment Elimination
Verifying Segment Elimination
You may want to check if segments are eliminatedWays to get segment elimination confirmation:
Xevent sqlserver.column_store_segment_eliminate Text output messages
Trace flag 646 outputs segment elimination messagesTrace flag 3605 directs messages to error log file
DEMO
Ensuring segment elimination by date
Use clustered B-tree on date Columnstore inherits order
Or, partition by dateOrdering by load date, ship date, order date etc. can all work
Dates are naturally correlated
Partitioning and Segment Elimination -- Examplepartition 1
Date = 20120301segment 1 min(Date) = 20120301, max(Date) = 20120301segment 2 “ segment 3 “segment 4 “segment 5 “
partition 2
Date = 20120302segment 1 min(Date) = 20120302, max(Date) = 20120302segment 2 “ segment 3 “ segment 4 “ segment 5 “
Select …From FactWhere Date = 20120301
Multi-dimensional segment eliminationPre-sort or partition on a keydrawn from >1 dimensionE.g. YYYYMMRRCan be persisted computed column not used in columnstore
select f.region_id, avg(f.duration)from fact_CDR f where f.region_id = 1 and f.date_id between 20120101 and 20120131 group by f.region_id
Additional Tuning Considerations
String performance issuesString filters don’t get pushed to storage engine
more batches to processdefeats segment elimination
Joining on string columns is slowFactor strings out to dimensions
Date LicenseNum Measure
20120301 XYZ123 100
20120302 ABC777 200
Date LicenseId Measure
20120301 1 100
20120302 2 200
LicenseId LicenseNum
1 XYZ123
2 ABC777
Forcing use or non-use of Columnstores
Query hintOPTION(IGNORE_NONCLUSTERED_COLUMNSTORE_INDEX)
Index hint… FROM F WITH(index=MyColumnStore) …… FROM F WITH(index=MyClusteredBtree) …
Summary
Key’s to fast processingBatch modeSegment elimination
Make sure most work is done in batch mode!Can work around read-only property by
PartitioningUsing Delta-file for trickle load
Future work will reduce need for tuning
Related Content
Columnstore Tuning Guide (your reading homework )
http://social.technet.microsoft.com/wiki/contents/articles/sql-server-columnstore-performance-tuning.aspx
Columnstore FAQ http://social.technet.microsoft.com/wiki/contents/articles/sql-server-columnstore-index-faq.aspx
Introductory Columnstores Talk Video TechEd 2011 session DBI312
Customer Case StudiesSee list in FAQ
Track Resources
@sqlserver@ms_teched
mvaMicrosoft Virtual Academy
SQL Server 2012 Eval Copy
Get Certified!
Hands-On Labs
Resources
Connect. Share. Discuss.
http://northamerica.msteched.com
Learning
Microsoft Certification & Training Resources
www.microsoft.com/learning
TechNet
Resources for IT Professionals
http://microsoft.com/technet
Resources for Developers
http://microsoft.com/msdn
Complete an evaluation on CommNet and enter to win!
Please Complete an Evaluation Your feedback is important!
Multipleways to Evaluate Sessions
Scan the Tagto evaluate thissession now on myTechEd Mobile
© 2012 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries.The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to
be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS
PRESENTATION.