window functions for data science
TRANSCRIPT
Window Functions for Data ScienceMARK TABLADILLO PH.D.
MICROSOFT MVP – ATLANTA, GA
FEBRUARY 2016
AbstractWindow functions are powerful analytic functions built into SQL Server. SQL Server 2005 introduced the core window ranking functions, and SQL Server 2012 added time and statistical percentage window functions. These functions allow for advanced variable creation, and are of direct benefit to people creating features for data science. This talk will also recommend further reading on this topic.
Required for Certification
What is a “Window”?Answer: A set of rows defined by the OVER clause
Set One
Set Two
Set Three
ORDER BY
Set One
Set Two
Set Three
PARTITION BY
What is a “function”?Function Example
Ranking ROW_NUMBER, RANK, DENSE_RANK, NTILE
Aggregate MIN, MAX, AVG, SUM, COUNT, STDEV, STDEVP, VAR, VARPCHECKSUM_AGG, COUNT_BIG
Analytic LAG, LEAD, FIRST_VALUE, LAST_VALUE, PERCENT_RANK, PERCENTILE_CONT, PERCENTILE_DISC, CUME_DIST
GROUP BY not required
Demo: Ranking
What is a “Window”?Answer: A set of rows defined by the OVER clause
Set One
Set Two
Set Three
ORDER BY
Set One
Set Two
Set Three
PARTITION BY
What is a “function”?Function Example
Ranking ROW_NUMBER, RANK, DENSE_RANK, NTILE
Aggregate MIN, MAX, AVG, SUM, COUNT, STDEV, STDEVP, VAR, VARP
Analytic LAG, LEAD, FIRST_VALUE, LAST_VALUE, PERCENT_RANK, PERCENTILE_CONT, PERCENTILE_DISC, CUME_DIST
GROUP BY not required
Order please
Demo: Aggregate
What is a “Window”?Answer: A set of rows defined by the OVER clause
Set One
Set Two
Set Three
ORDER BY
Set One
Set Two
Set Three
PARTITION BY
What is a “function”?Function Example
Ranking ROW_NUMBER, RANK, DENSE_RANK, NTILE
Aggregate MIN, MAX, AVG, SUM, COUNT, STDEV, STDEVP, VAR, VARP
Analytic LAG, LEAD, FIRST_VALUE, LAST_VALUE, PERCENT_RANK, PERCENTILE_CONT, PERCENTILE_DISC, CUME_DIST
GROUP BY not required
Framing: Order MattersTerm Operators
ROWS = Physical Operator (faster) UNBOUNDED PRECEDINGUNBOUNDED FOLLOWINGN PRECEDINGN FOLLOWINGCURRENT ROW
RANGE = Logical Operator (slower) UNBOUNDED PRECEDINGUNBOUNDED FOLLOWINGCURRENT ROW (RANGE?)
Default FrameFirst row of partition to current row
1 2 3 4 5 6 7 8 9
Current Row
Unbounded FollowingUnbounded Preceding
4 Preceding2 Following
1 Preceding 1 Following
ROW
Current Row
Unbounded Following
Unbounded Preceding
RANGE
1 2 3 4 5 6 7 8 9
12 23 34 45 50 50 50 50 65
Demo: Framing
What is a “Window”?Answer: A set of rows defined by the OVER clause
Set One
Set Two
Set Three
ORDER BY
Set One
Set Two
Set Three
PARTITION BY
What is a “function”?Function Example
Ranking ROW_NUMBER, RANK, DENSE_RANK, NTILE
Aggregate MIN, MAX, AVG, SUM, COUNT, STDEV, STDEVP, VAR, VARP
Analytic LAG, LEAD, FIRST_VALUE, LAST_VALUE, PERCENT_RANK, PERCENTILE_CONT, PERCENTILE_DISC, CUME_DIST
GROUP BY not required
Demo: Analytic
Caveats ExistBut if Window Functions alone cannot do the job,
Then something else can
Logical Alternatives
Common Table Expressions (CTEs)
CROSS APPLY
Nondeterministic Functions =Not one unique way to do the jobhttps://msdn.microsoft.com/en-us/library/ms178091.aspx
CodeAvailable at https://github.com/marktab/windowfunctions2016