1 2 3 the result is all sales territory country
TRANSCRIPT
Querying in DAX
Marco Russo
DBI-B315
How Many of You…
Have used MDX on a solution?Love MDX? Well… at least like it! Understand cell-by-cell versus block mode evaluation?Can write a medium complex MDX query in a textbox?Understand the FREEZE statement?Believe that MDX is utterly complicated?
Why Another Language?SQL Server 2012 Analysis Services
Multidimensional: MDXTabular: new engineRequires a new languageDAX is the answer
DAX is designed to be SimpleFastNot necessarily easy
Evolution of DAXPowerPivot 2010
Introduction of DAX as a programming languageCalculated Columns, MeasuresFilter Context / Evaluation Context
Excel is the preferred clientMDX is the query language
PowerPivot 2012 / Analysis Services 2012DAX becomes a query languageSQL Server Profiler fires Query Plan eventsOptimization is now feasible, even for normal human beings!
Agenda• Basic DAX Query Language Features• Query Functions in DAX• Handling Complex Filters• Basic Information on the DAX Query Plan• SQL Server Profiler• Optimization Examples
DAX Query Syntax
DAX as a query language
Only one statement
EVALUATEFollowed by any DAX table expressionReturns the result as a dataset
Looks simple!
EVALUATE syntax[DEFINE
{ MEASURE <table>[<col>] = <expression> }]
EVALUATE <Table Expression>
[ORDER BY {<expression> [{ASC | DESC}]} [, …]
[START AT {<value>|<parameter>} [, …]] ]
EVALUATE exampleEVALUATE
'Sales Territory'ORDER BY 'Sales Territory'[Sales Territory Group] ASC, 'Sales Territory'[Sales Territory Country] ASC, 'Sales Territory'[Sales Territory Region]START AT
"North America", "United States"
EVALUATE and FILTEREVALUATE
FILTER ( FILTER ( 'Internet Sales', [Ship Date] > [Due Date] ), 'Internet Sales'[Unit Price] > 2 )
CALCULATETABLEEVALUATE
CALCULATETABLE ( 'Sales Territory', [Sales Territory Country] = "United States"
)
ORDER BY 'Sales Territory'[Sales Territory Country]
CALCULATE evaluation orderEVALUATE
CALCULATETABLE ( CALCULATETABLE ( 'Sales Territory', ALL ( [Sales Territory Country] ) ), [Sales Territory Country] = "United States" )
1
2
3
The result is ALL Sales Territory Country
ADDCOLUMNSEVALUATE
ADDCOLUMNS ( 'Internet Sales', "Margin", ([Sales Amount] - [Total Product Cost]))
SUMMARIZEEVALUATE
SUMMARIZE( 'Internet Sales', 'Internet Sales'[Order Date], "Sales Amount", SUM ( 'Internet Sales'[Sales Amount] ) )
Source Table
GROUP BY
Expression to evaluate
SUMMARIZE with ROLLUPEVALUATE
SUMMARIZE( 'Internet Sales', ROLLUP ( 'Date'[Calendar Year] ), "Sales Amount", SUM('Internet Sales'[Sales Amount]) )
Compute sub-totals at the Year
level
CROSSJOINEVALUATE
ADDCOLUMNS ( CROSSJOIN ( DISTINCT ( 'Product'[Color] ), DISTINCT ( 'Product'[Size] ) ), "Products", COUNTROWS ( RELATEDTABLE ( Product ) ) )
Query measuresDEFINE MEASURE 'Product Category'[Subcategories Count] = COUNTROWS ( RELATEDTABLE ('Product Subcategory' ) ) MEASURE 'Product Category'[Products Count] = COUNTROWS ( RELATEDTABLE ('Product' ) )EVALUATE ADDCOLUMNS ( 'Product Category', "SubCategories", [Subcategories Count], "Products Count", 'Product Category'[Products Count] )
Query measures are fast!MultidimensionalCalculated members defined in the query can be very slowLimitations in the cache usage
TabularNo performance lossCache always active, no query vs global cacheIf Formula Engine is involved, no parallelism happens
Very useful during development
GENERATE and a complex exampleEVALUATE GENERATE ( VALUES ( 'Product Category'[Product Category Name] ), FILTER ( CALCULATETABLE ( ADDCOLUMNS ( TOPN ( 3, VALUES ( 'Product'[Product Name] ), [Internet Total Sales] ), "Sales", [Internet Total Sales] ) ), [Sales] > 0 ) )
Querying Tabular – First recap
DAX as a query languageVery simple: one single instruction makes everything: EVALUATEA query is a function call that returns a tableAny DAX function can be used in the queryQuery-defined measures work very well
At this point, we know DAX as a query language…
Optimizing DAX
Optimizing DAX is important
Data ModelTabular is a simple data modelNot many options to optimize the modelDAX is the key to gain optimal performances
Query PlanMultidimensional has no query plansTabular makes the query plan visible
DAX query architecture
Analysis Services 2012Tabular Model
In-Memory Mode VertiPaq StorageQuery
Storage Engine Query
DAX / MDX query
DirectQuery Mode External Data Sources
Query SQL Query
Process = Read Data from External Sources
Query engines
FE - Formula EngineStill single threadedHandles Complex Expressions
SE - Storage EngineWhat?Storage Engine in an in-memory database?Yes, we like to call it this way! xVelocity in-memory Analytics Engineaka VertiPaq
Optimizing means using the SE and avoiding FE
Microsoft view of Tabular: Rich & Fast DAX VertiPaq Query
Rich
Single thread per query
Designed for expressivity
Simple
One core per segment
Optimized for speed
Formula engine
Formula engine Vertipaq
28
DAX query evaluation flowBuild DAX Expression Tree
Buld DAX Logical Plan
Simplify DAX Logical Plan
Build DAX Physical Plan
Execute DAX Physical Plan
Some Vertipaq queries can be executedin order to gather additional informationabout data that is to be queried
Fire Logical Plan Event
Fire Physical Plan Event
SQL Server Profiler
Catches events from SSASQueries Events Query Processing
DAX Studio – daxstudio.codeplex.comFuture release will capture Dax Query Plan
Monitoring a queryEVALUATE
CALCULATETABLE ( SUMMARIZE ( 'Internet Sales', Geography[State Province Code], "Sales", SUM ( 'Internet Sales'[Sales Amount] ) ), FILTER ( Customer, Customer[Last Name] = "Anand" ))
1° VertiPaq querySELECT [Customer].[CustomerKey]FROM [Customer]WHERE
( PFDATAID( [Customer].[LastName] ) = 81 )
2° VertiPaq querySELECT [Geography].[StateProvinceCode] FROM [Internet Sales]
LEFT OUTER JOIN [Customer] ON [Internet Sales].[CustomerKey] = [Customer].[CustomerKey]
LEFT OUTER JOIN [Geography] ON [Customer].[GeographyKey] = [Geography].[GeographyKey]WHERE
[Customer].[CustomerKey] IN (11096, 11989, 17005, 22513, 28899, 15054, 19626, 20344, 25918, 27141... [74 total values, not all displayed]);
DAX logical query planCalculateTable AddColumns Scan_VertiPaq GroupBy_VertiPaq Scan_VertiPaq Sum_VertiPaq Scan_VertiPaq 'Internet Sales'[Sales Amount] Filter_VertiPaq Scan_VertiPaq 'Customer'[Last Name] = Anand
Query plans side by side
CalculateTable AddColumns Scan_VertiPaq GroupBy_VertiPaq Scan_VertiPaq Sum_VertiPaq Scan_VertiPaq [Sales Amount] Filter_VertiPaq Scan_VertiPaq 'Customer'[Last Name] = Anand
EVALUATECALCULATETABLE( SUMMARIZE( 'Internet Sales', Geography[State Province Code], "Sales", SUM( [Sales Amount] ) ), FILTER( Customer, Customer[Last Name] = "Anand" ))
Query running – Step 1SELECT [Geography].[StateProvinceCode], SUM([Internet Sales].[SalesAmount])FROM [Internet Sales]
LEFT OUTER JOIN [Customer] ON [Internet Sales].[CustomerKey]=[Customer].[CustomerKey]
LEFT OUTER JOIN [Geography] ON [Customer].[GeographyKey]=[Geography].[GeographyKey]WHERE
[Customer].[CustomerKey] IN (11096, 11989, ...[74 total values, not all displayed]) VAND
[Geography].[StateProvinceCode] IN ('VIC', 'BC', ...[21 total values, not all displayed]);
Query running – Step 2SELECT [Geography].[StateProvinceCode] FROM [Internet Sales]
LEFT OUTER JOIN [Customer] ON [Internet Sales].[CustomerKey] = [Customer].[CustomerKey]
LEFT OUTER JOIN [Geography] ON [Customer].[GeographyKey] = [Geography].[GeographyKey]WHERE
[Customer].[CustomerKey] IN (11096, 11989, 17005, 22513, 28899, 15054, 19626, 20344, 25918, 27141... [74 total values, not all displayed]);
Simple query plan4 VertiPaq Queries
2 before execution2 during execution
1 performed GROUPBY and SUM in VertiPaq1 hit the cache
Intermediate results are spooledFinal JOIN performed by Formula Engine
This is a very good query plan
Place extended columns in ADDCOLUMNSSlower query
EVALUATESUMMARIZE ( Product, Product[Product Category Name], "Products", COUNTROWS ( Product ))
Optimized queryEVALUATEADDCOLUMNS ( SUMMARIZE ( Product, Product[Product Category Name] ), "Products", CALCULATE( COUNTROWS( Product ) ))
Always put extended columns in ADDCOLUMNS parameters.Just use SUMMARIZE for its group by and when the evaluation context ofSUMMARIZE is required (i.e. when a TOPN has to be executed for each group)http://www.sqlbi.com/articles/best-practices-using-summarize-and-addcolumns/
Currency conversion SUMX ( DimCurrency, SUMX ( DimDate, CALCULATE ( VALUES ( CurrencyRate[AverageRate] ) * SUM ( FactInternetSales[SalesAmount] ) ) ))
Currency conversion optimizedSUMX( CROSSJOIN( DimCurrency, DimDate ), CALCULATE( VALUES( CurrencyRate[AverageRate] ) * SUM( FactInternetSales[SalesAmount] ) ))
Using one SUMX, the optimizer pushes the
calculation in VertiPaq
Slow measures and filtersDEFINE MEASURE 'Internet Sales'[Sales] = CALCULATE (ROUND (SUM ('Internet Sales'[Sales Amount] ), 0 ) ) MEASURE 'Internet Sales'[YTD Sales] = TOTALYTD ( [Sales] , 'Date'[Date] ) MEASURE 'Internet Sales'[QTD Sales] = TOTALQTD ( [Sales] , 'Date'[Date] )EVALUATE
FILTER ( ADDCOLUMNS ( CROSSJOIN ( VALUES ( 'Date'[Calendar Year] ), VALUES ( 'Date'[Month] ), VALUES ( 'Date'[Month Name] ) ), "Sales", [Sales], "YTD Sales", [YTD Sales], "QTD Sales", [QTD Sales] ), NOT ISBLANK( [Sales] ))ORDER BY 'Date'[Calendar Year], 'Date'[Month]
These two measures are slow to compute
The filter is applied after computation
Filter as soon as you canDEFINE MEASURE 'Internet Sales'[Sales] = CALCULATE (ROUND (SUM ('Internet Sales'[Sales Amount] ), 0 ) ) MEASURE 'Internet Sales'[YTD Sales] = TOTALYTD ( [Sales] , 'Date'[Date] ) MEASURE 'Internet Sales'[QTD Sales] = TOTALQTD ( [Sales] , 'Date'[Date] )EVALUATE
ADDCOLUMNS ( FILTER ( CROSSJOIN ( VALUES ( 'Date'[Calendar Year] ), VALUES ( 'Date'[Month] ), VALUES ( 'Date'[Month Name] ) ), NOT ISBLANK ( [Sales] ) ), "Sales", [Sales], "YTD Sales", [YTD Sales], "QTD Sales", [QTD Sales])ORDER BY 'Date'[Calendar Year], 'Date'[Month]44
This query runs 7 times faster, even if it computes [Sales]
twice
Joins are expensiveEach join in the data model has a cost
But… can you remove joins?From snowflakesFrom lookup tablesAnd most important: from junk dimensions!
Don’t build junk dimensions, store strings in fact table!
Cache usageDAX Query PlansCache all VertiPaq Query Results
MDX QueriesIdentical to SSAS MultidimensionalCaches FE calculated members
Using MDX to query TabularBoth caches work
Using DAX to query TabularOnly VertiPaq results are cached
If you heavily hit Formula Engine, the
cache will not be really useful.
General rulesUse the simplest formulasPrefer ADDCOLUMNS to SUMMARIZEAvoid complex calculations in measuresDon’t use error-handling functions extensivelyThey are sloooooowGood only in measures
Build a correct data modelReducing distinct count of values for columns is the main targetKeep it simple, relational, clear
Push calculations down to the VertiPaq engineNot an easy task to do looking at query plansBut, after some experience, you will learn it
ConclusionsDAX is a simple languageVery effective in expressing complex queriesLooks strange, at the beginning, but it is easier than MDX
Optimizing Tabular means optimizing DAXOptimizing DAX means pushing calculations to VertiPaqSaid in other words… trust the Rain Man, he’s your best friend!
Sometimes, you will change the data model for this
DAX is simple, it is not easy…but this is the fun part of it!
Track Resources
@sqlserver
mvaMicrosoft Virtual Academy
SQL Server Website
Get Certified!
Hands-On Labs
Download Data Explorer
Download Geoflow
Windows Azure
msdn
Resources for Developers
http://microsoft.com/msdn
Learning
Microsoft Certification & Training Resources
www.microsoft.com/learning
TechNet
Resources
Sessions on Demand
http://channel9.msdn.com/Events/TechEd
Resources for IT Professionals
http://microsoft.com/technet
Complete an evaluation on CommNet and enter to win!
Evaluate this session
Scan this QR code to evaluate this session and be automatically entered in a drawing to win a prize
© 2013 Microsoft Corporation. All rights reserved. Microsoft, Windows and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries.The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.