db2 for z/os is serious about analytics: leveraging sql to ... · pdf filequery execution...
TRANSCRIPT
Db2 for z/OS is Serious About Analytics:
Leveraging SQL to Analyze Your Data
Where it Resides
Maryela Weihrauch, IBM Distinguished Engineer,
© 2016 IBM Corporation12/7/2017World of Watson 2016 2
Notices and disclaimers
• Copyright © 2016 by International Business Machines Corporation (IBM). No part of this document may be reproduced or transmitted in any form without written permission from IBM.
• U.S. Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM.
• Information in these presentations (including information relating to products that have not yet been announced by IBM) has been reviewed for accuracy as of the date of initial publication and could include unintentional technical or typographical errors. IBM shall have no responsibility to update this information. THIS DOCUMENT IS DISTRIBUTED "AS IS" WITHOUT ANY WARRANTY, EITHER EXPRESS OR IMPLIED. IN NO EVENT SHALL IBM BE LIABLE FOR ANY DAMAGE ARISING FROM THE USE OF THIS INFORMATION, INCLUDING BUT NOT LIMITED TO, LOSS OF DATA, BUSINESS INTERRUPTION, LOSS OF PROFIT OR LOSS OF OPPORTUNITY. IBM products and services are warranted according to the terms and conditions of the agreements under which they are provided.
• IBM products are manufactured from new parts or new and used parts. In some cases, a product may not be new and may have been previously installed. Regardless, our warranty terms apply.”
• Any statements regarding IBM's future direction, intent or product plans are subject to change or withdrawal without notice.
• Performance data contained herein was generally obtained in a controlled, isolated environments. Customer examples are presented as illustrations of how those customers have used IBM products and the results they may have achieved. Actual performance, cost, savings or other results in other operating environments may vary.
• References in this document to IBM products, programs, or services does not imply that IBM intends to make such products, programs or services available in all countries in which IBM operates or does business.
• Workshops, sessions and associated materials may have been prepared by independent session speakers, and do not necessarily reflect the views of IBM. All materials and discussions are provided for informational purposes only, and are neither intended to, nor shall constitute legal or other guidance or advice to any individual participant or their specific situation.
• It is the customer’s responsibility to insure its own compliance with legal requirements and to obtain advice of competent legal counsel as to the identification and interpretation of any relevant laws and regulatory requirements that may affect the customer’s business and any actions the customer may need to take to comply with such laws. IBM does not provide legal advice or represent or warrant that its services or products will ensure that the customer is in compliance with any law.
© 2016 IBM Corporation12/7/2017World of Watson 2016 3
Notices and disclaimers continued
• Information concerning non-IBM products was obtained from the suppliers of those products, their published announcements or other publicly available sources. IBM has not tested those products in connection with this publication and cannot confirm the accuracy of performance, compatibility or any other claims related to non-IBM products. Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products. IBM does not warrant the quality of any third-party products, or the ability of any such third-party products to interoperate with IBM’s products. IBM EXPRESSLY DISCLAIMS ALL WARRANTIES, EXPRESSED OR IMPLIED, INCLUDING BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.
• The provision of the information contained herein is not intended to, and does not, grant any right or license under any IBM patents, copyrights, trademarks or other intellectual property right.
• IBM, the IBM logo, ibm.com, Aspera®, Bluemix, Blueworks Live, CICS, Clearcase, Cognos®, DOORS®, Emptoris®, Enterprise Document Management System™, FASP®, FileNet®, Global Business Services ®, Global Technology Services ®, IBM ExperienceOne™, IBM SmartCloud®, IBM Social Business®, Information on Demand, ILOG, Maximo®, MQIntegrator®, MQSeries®, Netcool®, OMEGAMON, OpenPower, PureAnalytics™, PureApplication®, pureCluster™, PureCoverage®, PureData®, PureExperience®, PureFlex®, pureQuery®, pureScale®, PureSystems®, QRadar®, Rational®, Rhapsody®, Smarter Commerce®, SoDA, SPSS, Sterling Commerce®, StoredIQ, Tealeaf®, Tivoli®, Trusteer®, Unica®, urban{code}®, Watson, WebSphere®, Worklight®, X-Force® and System z® Z/OS, are trademarks of International Business Machines Corporation, registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies. A current list of IBM trademarks is available on the Web at "Copyright and trademark information" at: www.ibm.com/legal/copytrade.shtml.
© 2016 IBM Corporation12/7/2017World of Watson 2016 4
Agenda
• Db2 and Analytics – The bigger picture
• Db2 12 Optimizer enhancements
• OLAP BIFs in DB2 and IDAA
– OLAP Specification
– Rolling Aggregations
– Grouping
– Math functions
© 2016 IBM Corporation12/7/2017World of Watson 2016 5
System z:
re-inventing
business around
an integrated data
lifecycle
▪ Reduced data movement, reduced complexity, reduced configuration resources
▪ More accurate, more secure, more available
▪ Better business response
VSAM
Db2
VSAM
IMS
DB2AA
HTAP -
Db2 with
Analytics
Accelerator
OLAP
Db
2 fo
r z/O
S
Pro
ce
ss
ing
IBM
Db
2 A
na
lytic
s A
cce
lera
tor
Db2 for z/OS CPU
savings target
• Operational (in
transaction) analytics
• (complex) OLTP
Accelerator focus
• Ad-hoc queries
• Complex queries
scanning large
amount of data
• ETL
acceleration/virtual
transformation
Complex queries (more history)
OLTP Transactions
High concurrency
Hybrid Transactional &
Analytical Processing
Standard reports
© 2016 IBM Corporation12/7/2017World of Watson 2016 7
Optimizer
Acce
lera
tor D
RD
AR
equ
esto
r
Application
Applicatio
n
Interface
Queries executed with Accelerator
Queries executed without Accelerator
Heartbeat (availability and performance indicators)
Query execution run-time for queries
that cannot be or should not be off-
loaded to Accelerator
SPU
MemorySPU
MemorySPU
MemorySPU
Memory
SM
PH
ost
Heartbeat
Db2 for z/OS
CPU FPGA
CPU FPGA
CPU FPGA
CPU FPGA
CPU FPGA
CPU FPGA
CPU FPGA
CPU FPGA
Accelerator
Query
Execution
Flow
© 2016 IBM Corporation12/7/2017World of Watson 2016 8
Db2 12 high-
level query
performance
focus
• Query focus based upon newer workloads (modern applications)
– Complex views or table UDFs
– UNION ALL
– Outer joins
– Join predicates with (stage 2) expressions
– CASE expressions, CAST functions, scalar functions
• Addressing general bottlenecks
– More sort avoidance, work file and sparse index usage optimization
– Reducing prepare cost and frequency
– List prefetch
– I/O performance
– Reduce unnecessary prefetch scheduling
• NOTE: 100% zIIP eligible for parallel child tasks
© 2016 IBM Corporation12/7/2017World of Watson 2016 9
Significant CPU
reduction in Db2
query workloads
• 30% - 90% reduction in elapsed time and CPU for SQL containing
– Complex outer joins, UNION ALL, UDFs and table UDFs
– Combinations of table expressions, views and outer joins
– VARGRAPHIC data type
– Disorganized data, poorly clustered indexes
– Nearly 100% new access paths vs. Db2 11
34%
66.00%
80%
85%
90%
0% 5% 10%15%20%25%30%35%40%45%50%55%60%65%70%75%80%85%90%
SAP CDS FIN
Cust1-Uncl
WAS Portal
SAP CDS Fiori
Cryst. Rpt.-Long
CPU time %
Complex real-time analytics - outer joins, UNION ALL, UDF/Table UDFs
Real-time analytics - large GROUP BY sorts/WF usage, sparse index on VARGRAPHIC
Reporting - poorly clustered index – list prefetch and hybrid join benefit
Complex OLTP -UNION ALL
Complex real-time analytics – outer joins, table expr, CASE, DGTTs
Db2 performanceworkloads
Db2 class 2 CPU reduction
© 2016 IBM Corporation12/7/2017World of Watson 2016 10
Significant CPU
reduction in Db2
query workloads..
15.70%
17.00%
18.00%
19.00%
23.77%
24.00%
0% 5% 10% 15% 20% 25% 30% 35% 40% 45% 50% 55% 60% 65% 70% 75% 80% 85% 90%
Cryst.Rpt-…
Cust1-Clus
TPCH-…
TPCH-Seq
Cust2
BIDAY-ShortLarge sorts for GROUP BY
Quick complex SQL
Sort intensive and unclustered (DEG=1)
Sort intensive and unclustered (DEG=ANY)
Sub-second complex analytics queries
Traditional query workload
Db2 performanceworkloads
Db2 class 2 CPU
reduction
▪ 15% - 25% reduction in elapsed time and CPU for more traditional query workloads
Sort intensive and work file processing
Lots of aggregating
Many new access paths vs. Db2 11
© 2016 IBM Corporation12/7/2017World of Watson 2016 11
Significant CPU
reduction in Db2
query workloads..
▪ 0%-15% reduction in elapsed time and CPU for
Simple query workloads
SQL where vast majority of CPU and elapsed time are due to data scanning (IDAA candidates)
As % drops - fewer new access paths vs. Db2 11
3.02%
4.52%
8.66%
11.00%
13.18%
0% 5% 10% 15% 20% 25% 30% 35% 40% 45% 50% 55% 60% 65% 70% 75% 80% 85% 90%
BIDAY-Long
SAP BW
TPCD
TPCH-SQLPL
Cust3 Traditional simple query application
Sort intensive, clustered version of TPCH
Lightweight query app.
Star Join, mostly scanning overhead (IDAA candidate)
Brute force data scanning (IDAA candidate)
Db2 performanceworkloads
Db2 class 2 CPU reduction
© 2016 IBM Corporation12/7/2017World of Watson 2016 12
• Rebind without APREUSE to see full potential
• Queries with new access paths consistently were the biggest winners
• Typical performance gains for new access path
– 10%-40% reduction in CPU and elapsed time (can be much higher)
• Typical performance gains with no access path change
– 0%-20% reduction in CPU and elapsed time
• Set DB2 zparm ACCELMODEL to enable accelerator modeling – Supports static and dynamic SQL– Explain will externalize reason if query is not eligible for acceleration– DB2 writes new fields (class 2 elapsed time and CPU time spent in
DB2 to execute SQL eligible for acceleration) in IFCID 3 and IFCID 148, similar to the SECP CPU field – needs to be formatted by a monitor
Give Db2 12
optimizer a
chance
© 2016 IBM Corporation12/7/2017World of Watson 2016 13
BIF-
Analytics through
SQL
V9 V10 V11 V12
RANK √
DENSE_RANK √
ROW_NUMBER √
Moving AVERAGE √
Moving SUM √
Grouping √
ROLLUP √
CUBE √
MEDIAN √
PERCENTILE √
LISTAGG* √
© 2016 IBM Corporation12/7/2017World of Watson 2016 14
OLAP
Specifications• OLAP specifications can be used to determine
ranking, row numbering, and aggregation as a
scalar value
• The OLAP specification is defined using a Window
– A window may specify three (3) components (each
is optional)
1. Partitioning – PARTITION BY CLAUSE
2. Ordering – ORDER BY clause
3. Aggregation Group – ROWS/RANGE
© 2016 IBM Corporation12/7/2017World of Watson 2016 15
Differences
between the
function types
Function Type
Action Examples
Scalar compute a single value for the current row
SUBSTR, CONCAT, LTRIM
Aggregate collapse a group of rows into a single row and compute a single value for this group
MAX, MIN, SUM, COUNT
Moving Aggregate
compute a single value for the current row based on some or all rows in a defined group
Aggregate function, like above, for a window
© 2016 IBM Corporation12/7/2017World of Watson 2016 16
RANK,
DENSE_RANK,
ROW_NUMBER
• RANK/DENSE_RANK
– Returns the ordinal rank of a row in the window
• RANK: may have a gap in the ranking
• DENSE_RANK: no gaps in the ranking
• ROW_NUMBER
– Returns the sequential row number
© 2016 IBM Corporation12/7/2017World of Watson 2016 17
RANK
Example
SELECT EMPNO, LASTNAME, SALARY+BONUS AS TOTAL_SALARY,
RANK() OVER(ORDER BY SALARY+BONUS DESC) AS RANK_SALARY
FROM DSN8A10.EMP WHERE SALARY+BONUS > 30000 ORDER BY RANK_SALARY;
+-----------------------------------------------------------+
| EMPNO | LASTNAME | TOTAL_SALARY | RANK_SALARY |
+-----------------------------------------------------------+
1_| 000010 | HAAS | 53750.00 | 1 |
2_| 200010 | HEMMINGER | 47500.00 | 2 |
3_| 000110 | LUCCHESI | 47400.00 | 3 |
4_| 000020 | THOMPSON | 42050.00 | 4 |
5_| 000050 | GEYER | 40975.00 | 5 |
6_| 000030 | KWAN | 39050.00 | 6 |
7_| 000070 | PULASKI | 36870.00 | 7 |
8_| 000060 | STERN | 32850.00 | 8 |
9_| 000220 | LUTZ | 30440.00 | 9 |
10_| 200220 | JOHN | 30440.00 | 9 |
11_| 000090 | HENDERSON | 30350.00 | 11 |
+-----------------------------------------------------------+
LUTZ & JOHN have same total salary therefore they get the same ranking
HENDERSON gets the rank based off of the number of rows that previously qualified the prior rank
© 2016 IBM Corporation12/7/2017World of Watson 2016 18
DENSE_RANK
– Rank based
on uniqueness
SELECT EMPNO, LASTNAME, SALARY+BONUS AS TOTAL_SALARY,
DENSE_RANK() OVER(ORDER BY SALARY+BONUS DESC) AS RANK_SALARY
FROM DSN8A10.EMP WHERE SALARY+BONUS > 30000 ORDER BY RANK_SALARY;
+-----------------------------------------------------------+
| EMPNO | LASTNAME | TOTAL_SALARY | RANK_SALARY |
+-----------------------------------------------------------+
1_| 000010 | HAAS | 53750.00 | 1 |
2_| 200010 | HEMMINGER | 47500.00 | 2 |
3_| 000110 | LUCCHESI | 47400.00 | 3 |
4_| 000020 | THOMPSON | 42050.00 | 4 |
5_| 000050 | GEYER | 40975.00 | 5 |
6_| 000030 | KWAN | 39050.00 | 6 |
7_| 000070 | PULASKI | 36870.00 | 7 |
8_| 000060 | STERN | 32850.00 | 8 |
9_| 000220 | LUTZ | 30440.00 | 9 |
10_| 200220 | JOHN | 30440.00 | 9 |
11_| 000090 | HENDERSON | 30350.00 | 10 |
+-----------------------------------------------------------+
HENDERSON gets a ranking based on the # of unique total salaries that precede it
© 2016 IBM Corporation12/7/2017World of Watson 2016 19
ROW_NUMBER
- Sequential
row number
SELECT EMPNO, LASTNAME, SALARY+BONUS AS TOTAL_SALARY,
ROW_NUMBER() OVER(ORDER BY SALARY+BONUS DESC) AS RANK_SALARY
FROM DSN8A10.EMP WHERE SALARY+BONUS > 30000 ORDER BY RANK_SALARY;
+-----------------------------------------------------------+
| EMPNO | LASTNAME | TOTAL_SALARY | RANK_SALARY |
+-----------------------------------------------------------+
1_| 000010 | HAAS | 53750.00 | 1 |
2_| 200010 | HEMMINGER | 47500.00 | 2 |
3_| 000110 | LUCCHESI | 47400.00 | 3 |
4_| 000020 | THOMPSON | 42050.00 | 4 |
5_| 000050 | GEYER | 40975.00 | 5 |
6_| 000030 | KWAN | 39050.00 | 6 |
7_| 000070 | PULASKI | 36870.00 | 7 |
8_| 000060 | STERN | 32850.00 | 8 |
9_| 000220 | LUTZ | 30440.00 | 9 |
10_| 200220 | JOHN | 30440.00 | 10 |
11_| 000090 | HENDERSON | 30350.00 | 11 |
+-----------------------------------------------------------+
© 2016 IBM Corporation12/7/2017World of Watson 2016 20
Moving AVG
Example
SELECT Territory, Month, Sales,
AVG(Sales) OVER (PARTITION BY Territory
ORDER BY Month
ROWS 2 PRECEDING) AS Moving_Avg
FROM Sales_History;
Territory Month Sales
East 199810 10
East 199811 4
East 199812 10
East 199901 7
East 199902 10
West 199810 8
West 199811 12
West 199812 11
West 199901 7
West 199902 6
Aggregation group
Compute the average sales over the current month and the
preceding two months, for each territory, and month
© 2016 IBM Corporation12/7/2017World of Watson 2016 21
Territory Month Sales Moving_Average
East 199810 10 10
East 199811 4 7
East 199812 10 8
East 199901 7 7
East 199902 10 9
West 199810 8 8
West 199811 12 10
West 199812 7 9
West 199901 11 10
West 199902 6 8
Moving average
in action
The results of the query illustrate the “moving average”.In the first row 10 is the average of only that row.The second row is the average of the first 2 rows 10 + 4.The third row and all of the following rows through to the end of the table are the average of the current row and the two preceding 10 + 4 + 10 divided by 3.
Note that the moving average resets on a change in territory starting at row 6 (change EAST to WEST).
© 2016 IBM Corporation12/7/2017World of Watson 2016 22
Moving SUM Example
SELECT Territory, Month, Sales,
SUM(Sales) OVER (PARTITION BY Territory
ORDER BY Month ASC
ROWS UNBOUNDED PRECEDING) AS Cumulative_Sum
FROM Sales_History;
Aggregation group
© 2016 IBM Corporation12/7/2017World of Watson 2016 23
Territory Month Sales Cumulative_Sum
East 199810 10 10
East 199811 4 14
East 199812 10 24
East 199901 7 31
East 199902 10 41
West 199810 8 8
West 199811 12 20
West 199812 7 27
West 199901 11 38
West 199902 6 44
Cumulative Sum
in action
© 2016 IBM Corporation12/7/2017World of Watson 2016 24
Grouping Sets –
GROUPING,
ROLLUP, CUBE
• Grouping sets allow multiple groups to be specified
• Querying and reporting are made easier and faster by
producing a single result from essentially performing a
UNION ALL of two or more groups of rows
• The group-by-clause has been enhanced:– grouping-sets
• Allows multiple grouping clauses to be specified
in a single statement
• Allows the groups to be computed with a single
pass over the data
• Can be used to determine subtotals and grand
totals
– super-groups
• Pre-defined grouping-sets : ROLLUP, CUBE
© 2016 IBM Corporation12/7/2017World of Watson 2016 25
ROLLUP• Creates subtotals that ‘roll up’ from the most detailed level to a
grand total
• N elements translate to N+1 grouping sets:
The order specified is significant to the result:
GROUP BY ROLLUP (C1,C2,C3)
GROUP BY GROUPING SETS ((C1,C2,C3),
(C1,C2),
(C1),
());
equivalent
GROUP BY ROLLUP (C1,C2)GROUP BY GROUPING SETS ((C1,C2),
(C1),
());
equivalent
GROUP BY ROLLUP (C2,C1)
GROUP BY GROUPING SETS ((C2,C1),
(C2),
());
equivalent
versus
© 2016 IBM Corporation12/7/2017World of Watson 2016 26
CUBE • Creates subtotals for all permutations
• N elements translate to 2n grouping sets:
• Unlike ROLLUP, the order specified is NOT significant to the
result therefore use ORDER BY to guarantee the order
• grand-total
– the overall aggregation of previous subtotals
– Both CUBE and ROLLUP return this row as the last row
GROUP BY CUBE (C1,C2,C3)GROUP BY GROUPING SETS ((C1,C2,C3),
(C1,C2), (C1,C3), (C2,C3),
(C1), (C2), (C3),
());
equivalent
8 groups
3 elements
grand-total
© 2016 IBM Corporation12/7/2017World of Watson 2016 29
Grouping Sets on
DSN8B10.PROJ
DEPTNO RESPEMP PRSTAFF
B01 000020 1.00
C01 000030 2.00
C01 000030 1.00
D01 000010 12.00
D01 000010 6.50
D11 000060 9.00
D11 000220 2.00
D11 000150 3.00
D11 000160 3.00
D21 000070 6.00
D21 000230 2.00
D21 000250 1.00
D21 000270 2.00
E01 000050 6.00
E01 000050 5.00
E11 000090 5.00
E21 000100 4.00
E21 000320 1.00
E21 000330 1.00
E21 000340 1.00
DEPTNO RESPEMP PRSTAFF
B01 000020 1.00
C01 000030 3.00
D01 000010 18.50
D11 000060 9.00
D11 000150 3.00
D11 000160 3.00
D11 000220 2.00
D21 000070 6.00
D21 000230 2.00
D21 000250 1.00
D21 000270 2.00
E01 000050 11.00
E11 000090 5.00
E21 000100 4.00
E21 000320 1.00
E21 000330 1.00
E21 000340 1.00
SELECT DEPTNO, RESPEMP,
SUM(PRSTAFF)
FROM DSN8B10.PROJ
GROUP BY DEPTNO,RESPEMP;
SELECT DEPTNO, RESPEMP,
SUM(PRSTAFF)
FROM DSN8B10.PROJ
GROUP BY
GROUPING SETS ((DEPTNO,RESPEMP));
17 rows after grouping20 rows originally
equivalent
© 2016 IBM Corporation12/7/2017World of Watson 2016 30
ROLLUP on DSN8B10.PROJ
B01 ? 1.00
C01 ? 3.00
D01 ? 18.50
D11 ? 17.00
D21 ? 11.00
E01 ? 11.00
E11 ? 5.00
E21 ? 7.00
? ? 73.50
DEPTNO RESPEMP PRSTAFF
B01 000020 1.00
C01 000030 3.00
D01 000010 18.50
D11 000060 9.00
D11 000150 3.00
D11 000160 3.00
D11 000220 2.00
D21 000070 6.00
D21 000230 2.00
D21 000250 1.00
D21 000270 2.00
E01 000050 11.00
E11 000090 5.00
E21 000100 4.00
E21 000320 1.00
E21 000330 1.00
E21 000340 1.00
SELECT DEPTNO, RESPEMP,
SUM(PRSTAFF)
FROM DSN8B10.PROJ
GROUP BY
ROLLUP (DEPTNO,RESPEMP);
SELECT DEPTNO, RESPEMP,
SUM(PRSTAFF)
FROM DSN8B10.PROJ
GROUP BY
GROUPING SETS ((DEPTNO,RESPEMP),
(DEPTNO),
());
26 rows after ROLLUP
Grand total of ALL groups
Subtotals for each
DEPTNO group
equivalent
Subtotals for each
DEPTNO, RESPEMP group
SELECT DEPTNO, RESPEMP, SUM(PRSTAFF)
FROM DSN8B10.PROJ
GROUP BY
GROUPING SETS ((DEPTNO,RESPEMP),
(DEPTNO),
(RESPEMP),
());
CUBE on
DSN8B10.PROJ
? 000010 18.50
? 000020 1.00
? 000030 3.00
? 000050 11.00
? 000060 9.00
? 000070 6.00
? 000090 5.00
? 000100 4.00
? 000150 3.00
? 000160 3.00
? 000220 2.00
? 000230 2.00
? 000250 1.00
? 000270 2.00
? 000320 1.00
? 000330
? 000340 1.00
? ? 73.50
DEPT.. RESP.. PR...
B01 000020 1.00
C01 000030 3.00
D01 000010 18.50
D11 000060 9.00
D11 000150 3.00
D11 000160 3.00
D11 000220 2.00
D21 000070 6.00
D21 000230 2.00
D21 000250 1.00
D21 000270 2.00
E01 000050 11.00
E11 000090 5.00
E21 000100 4.00
E21 000320 1.00
E21 000330 1.00
E21 000340 1.00
SELECT DEPTNO, RESPEMP, SUM(PRSTAFF)
FROM DSN8B10.PROJ
GROUP BY
CUBE (DEPTNO,RESPEMP);
equivalent
Grand total of ALL groups
B01 ? 1.00
C01 ? 3.00
D01 ? 18.50
D11 ? 17.00
D21 ? 11.00
E01 ? 11.00
E11 ? 5.00
E21 ? 7.00
Subtotals for each
DEPTNO group
Subtotals for
each RESPEMP
group
© 2016 IBM Corporation12/7/2017World of Watson 2016 32
PERCENTILE_CONT and PERCENTILE_DISC - Overview
• PERCENTILE_CONT and PERCENTILE_DISC are categorized as "inverse distribution
functions" in the SQL Standard that compute a percentile value within a given group.
• MEDIAN, which we know intuitively as producing the "middle" value can be equivalently
expressed as PERCENTILE_CONT(0.5), where the argument 0.5 indicates that we
want the function to return the "middle" percentage value.
© 2016 IBM Corporation12/7/2017World of Watson 2016 33
PERCENTILE_CONT vs PERCENTILE_DISC vs MEDIAN
• PERCENTILE_CONT treats the group as a "continuous distribution",
meaning that if the percentage falls between two values, the result is
interpolated between those two values.
– For example, the median (PERCENTILE_CONT(0.5)) of a group
of 6 numbers would be average of the 3rd and 4th number,
because we find the interpolated middle value between those
two.
• PERCENTILE_DISC treats the group as a set of discrete values. The
result is not interpolated and is always one of the values in the group.
PERCENTILE_DISC(0.5) over a group of 6 numbers would return the
3rd number.
• MEDIAN is equivalent with PERCENTILE_CONT(0.5).
Salary
2450
2975
1375
Salary
2450
2975
1250
PERCENTILE_CONT
PERCENTILE_DISC
VS.
PERCENTILE_CONT
and MEDIAN
• Suppose we would like to determine the salary, within each
department, which is at the 50th percentile
Salary
2450
2975
1375
Result:
SELECT PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY SALARY)
FROM EMP
GROUP BY DEPT_NO;
SELECT MEDIAN(SALARY)
FROM EMP
GROUP BY DEPT_NO;
NAME SALARY DEPT_NO
Miller 1300 10
Clark 2450 10
King 5002 10
Smith 800 20
Adams 1100 20
Jones 2975 20
Scott 3000 20
Ford 3000 20
James 950 30
Martin 1250 30
Ward 1250 30
Turner 1500 30
Allen 1600 30
Blake 2850 30
MEDIAN == PERCENTILE(0.5)
PERCENTILE_DISC
Name Salary DEPT_NO
Miller 1300 10
Clark 2450 10
King 5002 10
Smith 800 20
Adams 1100 20
Jones 2975 20
Scott 3000 20
Ford 3000 20
James 950 30
Martin 1250 30
Ward 1250 30
Turner 1500 30
Allen 1600 30
Blake 2850 30
Salary
2450
2975
1250
Result:
Suppose we would like to determine the
actual salary value, within each department,
which is at the 50th percentile
SELECT PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY SALARY)
FROM EMP
GROUP BY DEPT_NO;
© 2016 IBM Corporation12/7/2017World of Watson 2016 37
LISTAGG
Overview• The LISTAGG function is used to aggregate a set of
string values into a single string
• The values are appended based on the order specified by
the WITHIN GROUP clause
• The values may be separated by a separator
LISTAGG ALL
DID EName EID EBYear
10 Jack 0012 1983
10 Lily 0015 1990
20 Tom 0019 1983
20 Bob 0022 1976
20 Frank 0004 1983
20 Tom 0014 1985
30 Jerry 0028 1991
30 Chris 0021 1981
30 Jill 0002 1984
30 Jerry 0031 1984
30 Allan 0006 1995
DID Name_List
10 Jack; Lily
20 Bob; Frank; Tom; Tom
30 Chris; Jill; Jerry; Jerry; Allan
Result:
Output employee names, in ascending order, under
the same department according to their employee ID
and birth year from the EMP table
SELECT Dept_ID,
ListAgg(ALL E_Name, ‘; ’) WITHIN GROUP (ORDR BY Birth, Emp_ID)
AS Name_List
FROM EMP
GROUP BY Dept_ID;
*To be delivered in the service stream
LISTAGG DISTINCT
DID EName EID EBYear
10 Jack 0012 1983
10 Lily 0015 1990
20 Tom 0019 1983
20 Bob 0022 1976
20 Frank 0004 1983
20 Tom 0014 1985
30 Jerry 0028 1991
30 Chris 0021 1981
30 Jill 0002 1984
30 Jerry 0031 1984
30 Allan 0006 1995
DID Name_List
10 Jack; Lily
20 Bob; Frank; Tom
30 Chris; Jill; Jerry; Allan
Result:
Suppose we would like to output employee names, in ascending order, under the same department
according to their employee ID and birth year from the EMP table, eliminating any duplicate names
SELECT Dept_ID,
ListAgg(DISTINCT E_Name, ‘; ’) WITHIN GROUP (ORDR BY Birth, Emp_ID)
AS Name_List
FROM EMP
GROUP BY Dept_ID;
© 2016 IBM Corporation12/7/2017World of Watson 2016 40
Summary • Bringing analytics to the data is generally more
efficient than bringing data to analytics
– Especially true with ever growing amount of data
– Improve data governance, data quality, e2e
operational reliance
• Db2 12 Optimizer enhancements focus on emerging
SQL pattern
• OLAP BIFs in DB2 and IDAA
– Simple and efficient to find complex insight into data
Maryela Weihrauch IBM
Title
Db2 for z/OS is Serious About Analytics:
Leveraging SQL to Analyze Your Data
Where it ResidesPlease fill out your
session evaluation before
leaving!