comp 430 intro. to database systems grouping & aggregation slides use ideas from chris ré and...

32
COMP 430 Intro. to Database Systems Grouping & Aggregation Slides use ideas from Chris Ré and Chris Je Get clickers today!

Upload: alexandra-merritt

Post on 19-Jan-2018

215 views

Category:

Documents


0 download

DESCRIPTION

Other aggregations Count Avg Max Min … What else depends on SQL version.

TRANSCRIPT

Page 1: COMP 430 Intro. to Database Systems Grouping & Aggregation Slides use ideas from Chris Ré and Chris Jermaine. Get clickers today!

COMP 430Intro. to Database

SystemsGrouping & Aggregation

Slides use ideas from Chris Ré and Chris Jermaine.

Get clickers today!

Page 2: COMP 430 Intro. to Database Systems Grouping & Aggregation Slides use ideas from Chris Ré and Chris Jermaine. Get clickers today!

One form of aggregation – Column totals

Candycandy_name price

Reese’s Cup 0.50

5th Avenue 1.20

Almond Joy 0.75

Jelly Babies 2.39

SELECT Sum(price)FROM Candy;

4.84

Page 3: COMP 430 Intro. to Database Systems Grouping & Aggregation Slides use ideas from Chris Ré and Chris Jermaine. Get clickers today!

Other aggregations• Count• Avg• Max• Min• … What else depends on SQL version.

Page 4: COMP 430 Intro. to Database Systems Grouping & Aggregation Slides use ideas from Chris Ré and Chris Jermaine. Get clickers today!

Aggregation + DISTINCT

Candycandy_name price

Reese’s Cup 0.50

5th Avenue 1.20

Almond Joy 0.75

Jelly Babies 2.39

Chocolate Orange 2.39

Jelly Nougats 0.50

SELECT Count(price), Count (DISTINCT price)FROM Candy;

6 4

Page 5: COMP 430 Intro. to Database Systems Grouping & Aggregation Slides use ideas from Chris Ré and Chris Jermaine. Get clickers today!

Aggregation & NULL

Candycandy_name price

Reese’s Cup 0.50

5th Avenue NULL

Almond Joy 0.75

Jelly Babies 2.39

Chocolate Orange 2.39

Jelly Nougats NULL

SELECT Count(price)FROM Candy;

4

No price since no longer made in original form.

Page 6: COMP 430 Intro. to Database Systems Grouping & Aggregation Slides use ideas from Chris Ré and Chris Jermaine. Get clickers today!

Counting rows

Candycandy_name price

Reese’s Cup 0.50

5th Avenue NULL

Almond Joy 0.75

Jelly Babies 2.39

Chocolate Orange 2.39

Jelly Nougats NULL

SELECT Count(*)FROM Candy;

6

Page 7: COMP 430 Intro. to Database Systems Grouping & Aggregation Slides use ideas from Chris Ré and Chris Jermaine. Get clickers today!

Aggregation + other previous features

Purchaseproduct date price quantity

bagel 10/21 1 20

banana 10/03 0.50 10

banana 10/10 1 10

bagel 10/25 1.50 20

SELECT Sum(price * quantity)FROM PurchaseWHERE product = ‘bagel’;

50

Page 8: COMP 430 Intro. to Database Systems Grouping & Aggregation Slides use ideas from Chris Ré and Chris Jermaine. Get clickers today!

Grouping + Aggregation – Column subtotals

Purchaseproduct date price quantity

bagel 10/21 1 20

banana 10/03 0.50 10

banana 10/10 1 10

bagel 10/25 1.50 20

SELECT product, Sum(price * quantity) AS subtotalFROM PurchaseGROUP BY product;

product date price quantity

bagel10/21 1 20

10/25 1.50 20

banana10/03 0.50 10

10/10 1 10

product subtotal

bagel 50

banana 15

Page 9: COMP 430 Intro. to Database Systems Grouping & Aggregation Slides use ideas from Chris Ré and Chris Jermaine. Get clickers today!

Grouping + Aggregation

Purchaseproduct date price quantity

bagel 10/21 1 20

banana 10/03 0.50 10

banana 10/10 1 10

bagel 10/25 1.50 20

SELECT product, Count(*) AS purchases, Sum(quantity) AS items, Sum(price * quantity) AS subtotalFROM PurchaseGROUP BY product;

Can only SELECT fields that you

GROUP BY.

product purchases items subtotal

bagel 2 40 50

banana 2 20 15

Page 10: COMP 430 Intro. to Database Systems Grouping & Aggregation Slides use ideas from Chris Ré and Chris Jermaine. Get clickers today!

Conditions on aggregatesSELECT product, Sum(price * quantity) AS subtotalFROM PurchaseWHERE date > ‘10/05’GROUP BY productHAVING SUM(quantity) >= 10;

Condition on individual rows.

Condition on aggregates.

Purchaseproduct date price quantity

bagel 10/21 1 20

banana 10/03 0.50 10

banana 10/10 1 10

bagel 10/25 1.50 20

apple 10/10 1 5

product date price quantity

bagel10/21 1 20

10/25 1.50 20

banana 10/10 1 10

apple 10/10 1 5

product subtotal

bagel 50

banana 10

Page 11: COMP 430 Intro. to Database Systems Grouping & Aggregation Slides use ideas from Chris Ré and Chris Jermaine. Get clickers today!

Query: Products with at least two purchases.A. Count(product)B. Count(*)C. Sum(product)D. product

60%

0%3%

38%

SELECT productFROM PurchaseGROUP BY productHAVING ??? >= 2 ;

Purchaseproduct date price quantity

bagel 10/21 1 20

banana 10/03 0.50 10

banana 10/10 1 10

bagel 10/25 1.50 20

apple 10/10 1 5

Page 12: COMP 430 Intro. to Database Systems Grouping & Aggregation Slides use ideas from Chris Ré and Chris Jermaine. Get clickers today!

Sorting on aggregates

Purchaseproduct date price quantity

bagel 10/21 1 20

banana 10/03 0.50 10

banana 10/10 1 10

bagel 10/25 1.50 20

SELECT product, Sum(price * quantity)FROM PurchaseGROUP BY productORDER BY Sum(price * quantity);

product date price quantity

bagel10/21 1 20

10/25 1.50 20

banana10/03 0.50 10

10/10 1 10

product subtotal

banana 15

bagel 50

Page 13: COMP 430 Intro. to Database Systems Grouping & Aggregation Slides use ideas from Chris Ré and Chris Jermaine. Get clickers today!

Sorting on aggregates

Purchaseproduct date price quantity

bagel 10/21 1 20

banana 10/03 0.50 10

banana 10/10 1 10

bagel 10/25 1.50 20

SELECT product, Sum(price * quantity) AS subtotalFROM PurchaseGROUP BY productORDER BY subtotal;

product date price quantity

bagel10/21 1 20

10/25 1.50 20

banana10/03 0.50 10

10/10 1 10

product subtotal

banana 15

bagel 50

Page 14: COMP 430 Intro. to Database Systems Grouping & Aggregation Slides use ideas from Chris Ré and Chris Jermaine. Get clickers today!

Syntax summary

SELECT [DISTINCT] ComputationsFROM TablesWHERE Condition1

GROUP BY a1, …, ak

HAVING Condition2

ORDER BY attributesLIMIT n;

Uses attributes a1, …, ak or aggregates.

Condition on attributes.

Condition on aggregates.

Some SQLs: SELECT TOP(n) Computations

Can also use computation results.

Page 15: COMP 430 Intro. to Database Systems Grouping & Aggregation Slides use ideas from Chris Ré and Chris Jermaine. Get clickers today!

Semantics summary

SELECT [DISTINCT] ComputationsFROM TablesWHERE Condition1

GROUP BY a1, …, ak

HAVING Condition2

ORDER BY attributesLIMIT n;

5. Computations, [eliminate duplicates]8. Projections9. Cross-product10. Apply Condition1

11. Group by attributes12. Apply Condition2 to each group6. Sort7. Use first n rows

What we’re they thinking?!?

Page 16: COMP 430 Intro. to Database Systems Grouping & Aggregation Slides use ideas from Chris Ré and Chris Jermaine. Get clickers today!

(Potentially confusing) details

Page 17: COMP 430 Intro. to Database Systems Grouping & Aggregation Slides use ideas from Chris Ré and Chris Jermaine. Get clickers today!

DISTINCT Sum() vs. Sum(DISTINCT)

SELECT DISTINCT Sum(DISTINCT value)FROM DataGROUP BY tag;

Dataitem tag value

1 a 2

2 a 2

3 b 1

4 b 3

5 c 4

?

Page 18: COMP 430 Intro. to Database Systems Grouping & Aggregation Slides use ideas from Chris Ré and Chris Jermaine. Get clickers today!

HAVING without grouping

SELECT value, COUNT(*) AS countFROM DataHAVING count > 1;

Dataitem tag value

1 a 2

2 a 2

3 b 1

4 b 3

5 c 4

?

Fails in SQLite.Without GROUP BY,

the entire results form one group.

Page 19: COMP 430 Intro. to Database Systems Grouping & Aggregation Slides use ideas from Chris Ré and Chris Jermaine. Get clickers today!

Grouping without aggregation

Purchaseproduct date price quantity

bagel 10/21 1 20

banana 10/03 0.5 10

banana 10/10 1 10

bagel 10/25 1.50 20

SELECT productFROM PurchaseGROUP BY product;

product

bagel

banana

Page 20: COMP 430 Intro. to Database Systems Grouping & Aggregation Slides use ideas from Chris Ré and Chris Jermaine. Get clickers today!

GROUP BY vs. DISTINCTSELECT productFROM PurchaseGROUP BY product;

SELECT DISTINCT productFROM Purchase;

But only GROUP BY works well with aggregation.

Purchaseproduct date price quantity

bagel 10/21 1 20

banana 10/03 0.5 10

banana 10/10 1 10

bagel 10/25 1.50 20

product

bagel

banana

Page 21: COMP 430 Intro. to Database Systems Grouping & Aggregation Slides use ideas from Chris Ré and Chris Jermaine. Get clickers today!

More complicated examples

Page 22: COMP 430 Intro. to Database Systems Grouping & Aggregation Slides use ideas from Chris Ré and Chris Jermaine. Get clickers today!

Activity – will break activity into segments04a-aggregation.ipynb

• Students (id and name) taking more than 5 courses

• Courses and their average ratings• Highest average rating of a course• Course with highest average rating

Page 23: COMP 430 Intro. to Database Systems Grouping & Aggregation Slides use ideas from Chris Ré and Chris Jermaine. Get clickers today!

Students with >5 courses – Two solutions

SELECT s_id, first_name, last_nameFROM StudentWHERE (SELECT Count(*) FROM Enrollment WHERE Student.s_id = Enrollment.s_id ) > 5;

SELECT s.s_id, s.first_name, s.last_nameFROM Student s, Enrollment eWHERE s.s_id = e.s_idGROUP BY Student.s_idHAVING COUNT(crn) > 5;

Page 24: COMP 430 Intro. to Database Systems Grouping & Aggregation Slides use ideas from Chris Ré and Chris Jermaine. Get clickers today!

Efficiency comparisonSELECT s_id, first_name, last_nameFROM StudentWHERE (SELECT Count(*) FROM Enrollment WHERE Student.s_id = Enrollment.s_id ) > 5;

SELECT s.s_id, s.first_name, s.last_nameFROM Student s, Enrollment eWHERE s.s_id = e.s_idGROUP BY Student.s_idHAVING COUNT(crn) > 5;

Assuming no major optimizations:

1. How many times do we SELECT … FROM Enrollment?

2. How many joins?

Page 25: COMP 430 Intro. to Database Systems Grouping & Aggregation Slides use ideas from Chris Ré and Chris Jermaine. Get clickers today!

Courses and their average ratings

SELECT crn, Avg(rating)FROM EnrollmentGROUP BY crn;

Page 26: COMP 430 Intro. to Database Systems Grouping & Aggregation Slides use ideas from Chris Ré and Chris Jermaine. Get clickers today!

Highest average rating of a course

SELECT Max(avg_rating)FROM (SELECT Avg(rating) as avg_rating FROM Enrollment GROUP BY crn );

SELECT crn, Max(Avg(rating))FROM EnrollmentGROUP BY crn;

Nesting aggregate functions is not

syntactically allowed.

Page 27: COMP 430 Intro. to Database Systems Grouping & Aggregation Slides use ideas from Chris Ré and Chris Jermaine. Get clickers today!

Highest avg rating – Solution 1SELECT crnFROM (SELECT crn, Avg(rating) AS avg_rating1 FROM Enrollment GROUP BY crn )WHERE avg_rating1 = (SELECT Max(avg_rating2) FROM (SELECT Avg(rating) as avg_rating2 FROM Enrollment GROUP BY crn ) );

1. Calculate all average ratings.

2. Calculate the maximum.

3. Find which courses have this value as their average.

Page 28: COMP 430 Intro. to Database Systems Grouping & Aggregation Slides use ideas from Chris Ré and Chris Jermaine. Get clickers today!

What about repeated subquery?

SELECT crnFROM (SELECT crn, Avg(rating) AS avg_rating1 FROM Enrollment GROUP BY crn )WHERE avg_rating1 = (SELECT Max(avg_rating2) FROM (SELECT Avg(rating) AS avg_rating2 FROM Enrollment GROUP BY crn ) );

CREATE VIEW Average ASSELECT crn, Avg(rating) AS avg_ratingFROM EnrollmentGROUP BY crn;

SELECT crnFROM AverageWHERE avg_rating = (SELECT Max(avg_rating) FROM Average );

Page 29: COMP 430 Intro. to Database Systems Grouping & Aggregation Slides use ideas from Chris Ré and Chris Jermaine. Get clickers today!

Highest avg rating – Solution 2

CREATE VIEW Average ASSELECT crn, Avg(rating) AS avg_ratingFROM EnrollmentGROUP BY crn;

SELECT crnFROM AverageORDER BY avg_rating DESCLIMIT 1; Doesn’t work for ties.

Page 30: COMP 430 Intro. to Database Systems Grouping & Aggregation Slides use ideas from Chris Ré and Chris Jermaine. Get clickers today!

Efficiency comparisonCREATE VIEW Averages ASSELECT crn, Avg(rating) AS avg_ratingFROM EnrollmentGROUP BY crn;

SELECT crnFROM AveragesORDER BY avg_ratingLIMIT 1;

SELECT crnFROM AveragesWHERE avg_rating = (SELECT Max(avg_rating) FROM Averages );Assuming no major optimizations:

Cost estimate of each?

Page 31: COMP 430 Intro. to Database Systems Grouping & Aggregation Slides use ideas from Chris Ré and Chris Jermaine. Get clickers today!

Aside: Code Style & IndentationReview previous examples.

Page 32: COMP 430 Intro. to Database Systems Grouping & Aggregation Slides use ideas from Chris Ré and Chris Jermaine. Get clickers today!

Where we’re headed• So far:• Have seen a powerful subset of SQL queries.• Raised questions about how example schemas are structured.

• Next:• Switch to focusing on database design.

• Later:• Return to more SQL.