sql 101: an introduction to writing queries with sql at the university of minnesota (day 2)

45
SQL 101: An Introduction to writing queries with SQL at the University of Minnesota (Day 2) Peter M. Radcliffe Senior Analyst College of Liberal Arts February 15 th , 2007

Upload: brian-cabrera

Post on 04-Jan-2016

25 views

Category:

Documents


2 download

DESCRIPTION

SQL 101: An Introduction to writing queries with SQL at the University of Minnesota (Day 2). Peter M. Radcliffe Senior Analyst College of Liberal Arts February 15 th , 2007. Grouping. Another method for reducing information is to calculate aggregate or summary functions on the data - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: SQL 101: An Introduction to writing queries with SQL at the University of Minnesota (Day 2)

SQL 101:An Introduction to writing queries

with SQL at the University of Minnesota (Day 2)

Peter M. RadcliffeSenior Analyst

College of Liberal ArtsFebruary 15th, 2007

Page 2: SQL 101: An Introduction to writing queries with SQL at the University of Minnesota (Day 2)

Grouping

• Another method for reducing information is to calculate aggregate or summary functions on the data

• Remembering our SQL skeleton, this adds new clauses: SELECT FROM WHERE GROUP BY HAVING ORDER BY

• We will return to HAVING later – first we will discuss adding aggregations to SELECT and using GROUP BY

Page 3: SQL 101: An Introduction to writing queries with SQL at the University of Minnesota (Day 2)

SELECT: Aggregate Functions

• Some aggregate functions are:1. SUM = Total of all values in the numeric expression2. AVG = Average of values in numeric expression3. COUNT = Number of non-null values in expression (can be

used with all data-types, not just numeric)4. COUNT(*) = The number of selected rows…always returns one

and only one value…the count of the rows in the column or expression)

5. MAX(expression) = The highest value (can be used with all data-types, not just numeric)

6. MIN(expression) = The lowest value (can be used with all data-types, not just numeric)

Page 4: SQL 101: An Introduction to writing queries with SQL at the University of Minnesota (Day 2)

SELECT: Aggregate Functions

• Aggregate Examples: SELECT AVG(tuition * 2)

• Returns the average tuition paid if tuition was doubled

SELECT AVG(DISTINCT tuition * 2)• If you include DISTINCT in your expression, it will eliminate

duplicate values before calculating the sum, average, or count of an expression.

SELECT MIN(descr)• The description value that is first alphabetically (e.g. “Accounting”)

SELECT MAX(descr)• The description value that is last alphabetically (e.g. “Zoology”)

Page 5: SQL 101: An Introduction to writing queries with SQL at the University of Minnesota (Day 2)

Aggregates and WHERE clause

• You cannot use aggregates in the WHERE clause, however, you can use WHERE to restrict the rows that are used in your aggregate calculation to specific conditions

• The HAVING clause is used to restrict rows based on aggregates

• EXAMPLE:

SELECT AVG(tuition_net_instr)

FROM sysadm.ps_dwta_stdnt_crse

WHERE institution = ‘umntc’

Page 6: SQL 101: An Introduction to writing queries with SQL at the University of Minnesota (Day 2)

SELECT: Aggregate Functions

• Aggregate Function Example (Question): What will be the difference between results for the following

statements (assuming data is being retrieved from same table source)?

1. SELECT COUNT(emplid)2. SELECT COUNT(DISTINCT emplid)3. SELECT DISTINCT COUNT(emplid)

1. Number of rows with valid values for emplid2. Number of unique values of emplid3. Same as #1, since count function is producing only one

value, therefore select distinct does not change results

Page 7: SQL 101: An Introduction to writing queries with SQL at the University of Minnesota (Day 2)

SELECT: Aggregates & Details

• You cannot simultaneously produce both individual results and aggregates.

• Solutions: You can run separate queries to get row-by-row data or

aggregate results. Join tables (addressed soon) where one table holds the

aggregate data and the other the detail data This can often be produced using a “subquery”, which we will

also cover later

Page 8: SQL 101: An Introduction to writing queries with SQL at the University of Minnesota (Day 2)

GROUP BY: Overview

• GROUP BY Optional Use to organize results by groups represented in column(s)

• GROUP BY works like DISTINCT, as it divides the table into groups and returns one row for each group

• Whenever you use GROUP BY, each item in the select list has to produce a single value per set.

Page 9: SQL 101: An Introduction to writing queries with SQL at the University of Minnesota (Day 2)

SELECT & GROUP BY

• The SELECT and GROUP BY clauses must work together All non-aggregate columns in SELECT must be in GROUP BY The reverse is not true, but generally a good idea That is, you can group by a column that is not in the SELECT

statement, but since it does not appear in the output, you will not know what aggregate corresponds to what grouping

Page 10: SQL 101: An Introduction to writing queries with SQL at the University of Minnesota (Day 2)

SELECT & GROUP BY Example

• Compare these queries, modifying our earlier examples:

1) SELECT count(*)FROM sysadm.ps_dwhr_jobWHERE jobcode = ‘9370’GROUP BY name

2) SELECT name, count(*)FROM sysadm.ps_dwhr_jobWHERE jobcode = ‘9370’GROUP BY name

Page 11: SQL 101: An Introduction to writing queries with SQL at the University of Minnesota (Day 2)

GROUP BY & SELECT DISTINCT

• GROUP BY creates a list of distinct values much like SELECT DISTINCT

• EXAMPLESELECT acad_planFROM ps_acad_plan_tblGROUP BY acad_plan

…produces the same results as…SELECT DISTINCT acad_planFROM ps_acad_plan_tbl

…because no aggregate function was specified

Page 12: SQL 101: An Introduction to writing queries with SQL at the University of Minnesota (Day 2)

GROUP BY: Multi-Column Groups

• Frequently, we want to define a group by multiple characteristics, for example combinations of Area, Fund, and Org in CUFS financial data.

• Groups can be defined by as many variables as desired, so long as all of the columns that form the group (the non-aggregated columns) are specified in the GROUP BY clause

• To display results sorted by the same columns in the same order, repeat the columns from the GROUP BY clause in the ORDER BY clause

• If a different sort order is desired, the GROUP BY and ORDER BY clauses can contain different columns in a different order

Page 13: SQL 101: An Introduction to writing queries with SQL at the University of Minnesota (Day 2)

GROUP BY: Multi-Column Groups

• Example: Instructional (75%) Tuition by Course Section

SELECT acad_org, acad_org_descr, subject, catalog_nbr, class_section, sum(tuition_net_instr) as instructional_tuition

FROM sysadm.ps_dwta_stdnt_crseWHERE institution = ‘UMNTC’ AND area_class = ‘TPUB’ AND term = ‘1069’GROUP BY acad_org, acad_org_descr, subject,

catalog_nbr, class_sectionORDER BY acad_org_descr, acad_org, subject,

catalog_nbr, class_section

Page 14: SQL 101: An Introduction to writing queries with SQL at the University of Minnesota (Day 2)

GROUP BY: Multi-Aggregate Groups

• Within any group, as many aggregates as desired may be calculated, including any combination of available functions (for example sums, averages, and counts)

• However, all the aggregations will take place at the same level (defined by the same group)

• It is therefore not possible to calculate, for example, collegiate and departmental totals in the same query

• We will address how to produce aggregates at different levels simultaneously when we discuss the CASE statement

Page 15: SQL 101: An Introduction to writing queries with SQL at the University of Minnesota (Day 2)

GROUP BY: Multi-Aggregate Groups

• Example: Registration Tuition Summary

SELECT acad_career, acad_level_bot,SUM(tuition_net_acad_group) as reg_tuition,SUM(unt_taken) as sch,COUNT(distinct emplid) as headcount

FROM sysadm.ps_dwta_stdnt_crseWHERE institution = ‘UMNTC’ AND acad_group_distr = ‘TPHR’ AND term = ‘1069’GROUP BY acad_career, acad_level_botORDER BY acad_career desc, acad_level_bot

Page 16: SQL 101: An Introduction to writing queries with SQL at the University of Minnesota (Day 2)

GROUP BY: Null Values

Null values in the grouping column(s)• Nulls are grouped together as a separate group.• A COUNT function will show the number of rows in that

group, and therefore the number of null values in the grouping column(s).

Null values in the aggregate function(s)• Most aggregate functions ignore nulls• A COUNT function, for example, will count only the non-null

rows.

Page 17: SQL 101: An Introduction to writing queries with SQL at the University of Minnesota (Day 2)

Grouping and Null Values

• You can categorize NULL values in a named group within a column with the following:

SELECT NVL(ethnic_group, “Unknown”), count(*)FROM PS_DWSA_STIX_1069GROUP BY ethnic_group

• Results: You will have coded all NULLS as Unknown and then counted the number of rows in each known ethnic category as well as the number of rows in the “Unknown” group containing NULLS.

Page 18: SQL 101: An Introduction to writing queries with SQL at the University of Minnesota (Day 2)

SELECT: Null Values

• To count the number of null values in a column:

SELECT ethnic_group, count(*)FROM ps_dwsa_demo_addrGROUP BY ethnic_group

OR

SELECT count(*)FROM ps_dwsa_demo_addrWHERE ethnic_group is null

Page 19: SQL 101: An Introduction to writing queries with SQL at the University of Minnesota (Day 2)

Including Null Values in Aggregates

• You can also include NULLS in your calculations by adding an OR comparison to the WHERE clause:

SELECT COUNT(*)

FROM sysadm.ps_dwsa_stix_1069

WHERE ethnic_cd = ‘Unknown’

OR ethnic_cd is NULL

• Note that the aggregate function must not itself involve calculations on null values (e.g. calculating an average)

Page 20: SQL 101: An Introduction to writing queries with SQL at the University of Minnesota (Day 2)

Connecting Multiple Queries

• UNION and UNION ALL

• Assembles data from any number of queries into a single result set

• Each query must have the same columns in their select statement, but the rest of the queries can vary

• UNION eliminates duplicate rows produced by the various queries

• UNION ALL leaves duplicate rows in the result set

Page 21: SQL 101: An Introduction to writing queries with SQL at the University of Minnesota (Day 2)

UNION Example

SELECT “Registration” as tuition_type, acad_career SUM(tuition_net_acad_prog) as tuition, SUM(ufees_acad_prog) as ufees, SUM(unt_taken) as sch, COUNT(DISTINCT emplid) as headcountFROM sysadm.ps_dwta_stdnt_crseWHERE term = ‘1069’ AND acad_group_distr = ‘TNUR’UNIONSELECT “Instruction” as tuition_type, acad_career, SUM(tuition_net_instr) as tuition, SUM(ufees_instr) as ufees, SUM(unt_taken) as sch, COUNT(DISTINCT emplid) as headcountFROM sysadm.ps_dwta_stdnt_crseWHERE term = ‘1069’ AND acad_group_distr = ‘TNUR’

Page 22: SQL 101: An Introduction to writing queries with SQL at the University of Minnesota (Day 2)

Working with Multiple Tables

• The information needed to answer some questions may not be contained in a single table

• To connect tables, a JOIN statement is used in the FROM clause

• Linking between tables Common column(s) Concordant product (all-to-all)

• Structure of joins One-to-One One-to-Many Many-to-Many One or Many to None

Page 23: SQL 101: An Introduction to writing queries with SQL at the University of Minnesota (Day 2)

JOIN: Overview

• Optional • Part of FROM clause used if you want to retrieve and

manipulate data from more than one table• JOIN works by connecting two or more tables by ‘joining’

them, generally on one or more common field, or JOIN column(s)

• If there are no common columns identified, a concordant product will be created, linking each row in each table to each row in the other table(s), creating massive tables that consume system resources

• On rare occasions, concordant product joins are created purposely

Page 24: SQL 101: An Introduction to writing queries with SQL at the University of Minnesota (Day 2)

JOIN: How it Works

• JOIN prompts the system to

1. Examine all possible combinations of rows from the two tables2. Eliminate the rows that are not specified in WHERE clause

• JOIN outlines the specifications that data must meet in order to be included in the query output

• If there are NULLS in the JOIN column, the JOIN will not include NULL values

• This means NULLS in one table will not be matched to NULLS in the other table

Page 25: SQL 101: An Introduction to writing queries with SQL at the University of Minnesota (Day 2)

JOIN: Types of Joins

• Types of joins1. Inner Join – Only rows that match in both tables2. Outer Join – All rows from one table and those that match in the

other table3. Full Join – All rows from both tables, connected where common

columns match

Columns from outer or full joins where no matching row was found will contain NULL values

We will also discuss “self joins”, where a table is joined to itself. These “self joins” can take on any of the above forms.

Page 26: SQL 101: An Introduction to writing queries with SQL at the University of Minnesota (Day 2)

JOIN Clause: Aliases

• When tables are joined, it is important to identify which table contains each of the columns in the SELECT clause

• Where a column appears in only one of the joined tables, the system will be able to find it without additional information

• However, where each table contains columns with the same name (such as the join conditions), the system needs to be told which table’s column to use

Page 27: SQL 101: An Introduction to writing queries with SQL at the University of Minnesota (Day 2)

JOIN Clause: Aliases

• The format for the columns in the SELECT statement is:

TABLENAME.COLUMN

• The standard convention for simplifying this process is to give each table a short, often single-letter alias, so the user does not have to type the entire table name before each column

• This alias is assigned by typing the desired alias immediately after the name of the table in the FROM and JOIN clauses

Page 28: SQL 101: An Introduction to writing queries with SQL at the University of Minnesota (Day 2)

JOIN Clause: Aliases

• Example:

FROM sysadm.ps_dwta_stdnt_crse a

INNER JOIN sysadm.ps_dwta_stdnt_tuition b

• The SELECT clause therefore looks like:

SELECT a.emplid, b.tuition_res

Instead of…

SELECT sysadm.ps_dwta_stdnt_crse.emplid, sysadm.ps_dwta_stdnt_tuition.tuition_res

Page 29: SQL 101: An Introduction to writing queries with SQL at the University of Minnesota (Day 2)

JOIN Clause: ON statement

• To identify the columns to use when matching rows from the tables to be joined, they can be listed immediately after the JOIN statement beginning with the keyword ON, and using the same logical operators as WHERE

FROM sysadm.ps_dwta_stdnt_crse aINNER JOIN sysadm.ps_dwta_stdnt_tuition b

ON a.emplid = b.emplidAND a.institution = b.institutionAND a.term = b.termAND a.acad_career = b.acad_career

WHERE a.institution = ‘UMNTC’

Page 30: SQL 101: An Introduction to writing queries with SQL at the University of Minnesota (Day 2)

JOIN Clause: ON or WHERE

• While the ON statement provides cleaner organization, the JOIN conditions can also be listed in the WHERE clause along with any additional selection criteria:

FROM sysadm.ps_dwta_stdnt_crse aINNER JOIN sysadm.ps_dwta_stdnt_tuition b

WHERE a.emplid = b.emplidAND a.institution = b.institutionAND a.term = b.termAND a.acad_career = b.acad_careerAND a.institution = ‘UMNTC’

Page 31: SQL 101: An Introduction to writing queries with SQL at the University of Minnesota (Day 2)

JOINS: Inner Join

• INNER JOIN clause (uses = or other logical operator): You want to view only data from two tables when the values on

the join column(s) are equal or satisfy the logical operator Any logical operator can be used to match records, for example:

‘=’, ‘>’, “>=’, ‘<’, ‘<=’, ‘!=’, or <>

Note that while many logical operators can be used, relationships other than = will produce many matches between the files, returning large numbers of rows

Given the size of the tables in the data warehouse, these joins may produce overwhelming results – proceed with caution in using logical operators other than =

Page 32: SQL 101: An Introduction to writing queries with SQL at the University of Minnesota (Day 2)

JOINS: Inner Join

• Join conditions (matching columns) can appear either in the WHERE clause or in a separate section of the JOIN clause beginning with the keyword “ON”

SELECT a.emplid, a.acad_prog_primary, b.tuition_res

FROM sysadm.ps_dwta_stdnt_crse a

INNER JOIN sysadm.ps_dwta_stdnt_tuition b

ON a.emplid = b.emplid

Page 33: SQL 101: An Introduction to writing queries with SQL at the University of Minnesota (Day 2)

JOINS: Inner Join Mechanics

• Joining two tables on the column “emplid”

Table 1 Table 21234567

3002847

2874502

7654321

4654785 7654321

8473621

9657513

1234567

7586424

Page 34: SQL 101: An Introduction to writing queries with SQL at the University of Minnesota (Day 2)

JOINS: Inner Join Example• INNER JOIN EXAMPLE: Tuition by residency of student

select 'Instruction' as tuit_type, a.acad_career, b.tuition_res,

sum(a.tuition_net_instr) as tuition, sum(a.unt_taken) as sch,

count(distinct a.emplid) as head

from sysadm.ps_dwta_stdnt_crse_pr a

inner join sysadm.ps_dwta_stdnt_tuition_pr b

on a.emplid = b.emplid

and a.institution = b.institution

and a.term = b.term

and a.acad_career = b.acad_career

where a.term = '1069'

and a.area_class = 'TMED'

and a.acad_career = 'UGRD'

group by a.acad_career, b.tuition_res

order by b.tuition_res, a.acad_career desc

Page 35: SQL 101: An Introduction to writing queries with SQL at the University of Minnesota (Day 2)

JOINS: Inner Join Results

Type Career Residency Tuition SCH Students

Instruction UGRD INST $255,642.54 14,080 1,608

Instruction UGRD MB $3,071.50 20 2

Instruction UGRD ND $58,014.14 334 35

Instruction UGRD OUTST $453,831.42 1,416 148

Instruction UGRD SD $66,725.31 381 40

Instruction UGRD WISC $578,741.71 3,825 403

Instructional Tuition by Academic Career and Tuition Residency

Page 36: SQL 101: An Introduction to writing queries with SQL at the University of Minnesota (Day 2)

JOINS: Inner Join Recap

• Inner Joins select only those rows in each table that have the same values on the columns specified in the join condition

• Null values in the join conditions are not matched to other records because NULL is not a specific value and therefore does not match to another NULL value

• Any rows in either table that are not matched on all of the join conditions will be dropped, even though the row may have perfectly valid and relevant data in one table

Page 37: SQL 101: An Introduction to writing queries with SQL at the University of Minnesota (Day 2)

JOINS: Outer Join

• OUTER JOINS keep all rows from one table and connect data from those rows of other table that match the base table on the join condition columns

• Critical to decide which table provides the base, therefore maintaining all its rows, and which table provides only selected rows

• It is often useful to use the NVL(column,value) command to replace NULL values on the secondary table with default values, e.g. “Missing”, “None”, or 0

Page 38: SQL 101: An Introduction to writing queries with SQL at the University of Minnesota (Day 2)

JOIN: Types of Joins

• TWO TYPES OF OUTER JOINS:

LEFT OUTER JOIN clause: • Include all rows from the first-named table and only the records in

the second-named table that meet the join condition (some systems represent this by using the symbol */ , the * representing the table that you want all of the data from).

RIGHT OUTER JOIN clause: • Include all rows from the second-named table and only the records

in the first-named table that meet the join condition (some systems represent this by using the symbol /* , the * representing the table that you want all of the data from).

Page 39: SQL 101: An Introduction to writing queries with SQL at the University of Minnesota (Day 2)

Outer Join Mechanics

• Left outer join on the column “emplid”

Table 1 Table 21234567

3002847

2874502

7654321

4654785 7654321

8473621

9657513

1234567

7586424

NULL

NULL

NULL

Page 40: SQL 101: An Introduction to writing queries with SQL at the University of Minnesota (Day 2)

JOIN: Outer Join Example

• OUTER JOIN EXAMPLE: SCH by Summer Term Grad Assistants

SELECT a.AREA, SUM(a.EXP_AMT) as PAYROLL, SUM(b.UNT_TAKEN) as SCH FROM PS_DWPY_MONTHLY_SUM_HIST_VW a LEFT OUTER JOIN PS_DWTA_STDNT_CRSE b ON a.EMPLID = b.EMPLID AND CONCAT('1',CONCAT((SUBSTR(a.FISC_YR,3,2)),'5')) = b.TERM WHERE a.AREA_CLASS = 'TCLA' AND a.FISC_YR = '2006' AND a.OBJ IN ('7000','7005','7010','7015') AND a.JOBCODE IN ('9571','9572','9573','9574','9575')GROUP BY a.AREAORDER BY a.AREA

Page 41: SQL 101: An Introduction to writing queries with SQL at the University of Minnesota (Day 2)

JOINS: Outer Join Results

AREA PAYROLL SCH

448 $3,227.74 NULL

449 $116,302.30 4

451 $305.00 NULL

452 $5,230.00 NULL

453 $42,567.40 12

454 $831.70 NULL

455 $34,179.20 NULL

456 $23,967.75 NULL

458 $14,554.94 120

SCH and Payroll Totals for Summer Term Grad Assistants

Page 42: SQL 101: An Introduction to writing queries with SQL at the University of Minnesota (Day 2)

JOINS: Full Outer Join

• FULL OUTER JOINS retain all rows from both tables, connecting those rows with matching data in the columns used for the join conditions in both tables

• Good for reconciliation where it is important to see what records appear in one table but not the other

• In general for the data warehouse, usefulness is limited because with the large tables, it is generally not desirable to maintain all the rows from both tables

Page 43: SQL 101: An Introduction to writing queries with SQL at the University of Minnesota (Day 2)

JOINS: Outer Join Recap

• OUTER JOINS are used to maintain some or all of the non-matching records

• OUTER JOINS can be LEFT, RIGHT, or FULL, depending on which table has the rows that are to be maintained

• It is critical to make sure there are no WHERE clause criteria that rely on the secondary table, as only those rows that can be matched will have valid values on the criteria columns, and therefore the OUTER JOIN will reduce to an equivalent INNER JOIN

• In general, OUTER JOINS are of limited usefulness with the data warehouse due to the extreme size of the tables

Page 44: SQL 101: An Introduction to writing queries with SQL at the University of Minnesota (Day 2)

JOINS: Multiple Joins Example

• Example from CLAIR application called IRIS:

FROM t_iris_reporting_crse_sctn r

LEFT OUTER JOIN t_iris_core_sir s

ON r.cla_xlist_sctn_id = s.cla_xlist_sctn_id

INNER JOIN t_iris_core_dict_term t

ON r.term = t.terM

INNER JOIN web_admin..t_cla_security_dict_cla_dept d

ON isnull(s.cla_acad_org_funding,r.cla_acad_org) = d.cla_acad_org

Page 45: SQL 101: An Introduction to writing queries with SQL at the University of Minnesota (Day 2)

JOINS: Self Joins

• SELF JOINS involve connecting a table to another copy of itself

• SELF JOINS can take the form of INNER, OUTER, or FULL JOINS

• Can be done for many reasons, including creating a full list of units with details for some (OUTER JOIN) or connecting data from multiple rows that share common linking values (join conditions)

• The primary reason to use SELF JOINS with data warehouse tables, however, is for constructing SUBQUERIES