principles of database systems cse 544p lecture #1 september 28, 2011 1dan suciu -- p544 fall 2011

99
Principles of Database Systems CSE 544p Lecture #1 September 28, 2011 1 Dan Suciu -- p544 Fall 2011

Upload: edith-hardy

Post on 03-Jan-2016

222 views

Category:

Documents


3 download

TRANSCRIPT

Dan Suciu -- p544 Fall 2011 1

Principles of Database SystemsCSE 544p

Lecture #1September 28, 2011

Dan Suciu -- p544 Fall 2011 2

Staff

• Instructor: Dan Suciu– CSE 662, [email protected]– Office hours: Wednesdays, 5:30-6:20

• TAs: – Sandra Fan, [email protected]

Dan Suciu -- p544 Fall 2011 3

Communications

• Web page: http://www.cs.washington.edu/p544 – Lectures will be available here– Homework will be posted here– Announcements may be posted here

• Mailing list:– Announcements, group discussions– If you registered, you are automatically subscribed

Dan Suciu -- p544 Fall 2011 4

Textbook(s)

Main textbook:• Database Management Systems,

Ramakrishnan and Gehrke

Second textbook:• Database Systems: The Complete Book,

Garcia-Molina, Ullman, Widom

Dan Suciu -- p544 Fall 2011 5

Course Format

• Lectures Wednesdays, 6:30-9:20

• 7 Homework Assignments

• Take-home Final

Dan Suciu -- p544 Fall 2011 6

Grading

• Homework: 70 %

• Take-home Final: 30%

Dan Suciu -- p544 Fall 2011 7

Homework Assignments

1. SQL2. Conceptual design3. JAVA/SQL4. Transactions5. Database tuning6. XML/XPath/XQuery7. Pig Latin, on AWS

Due: Mondays’, by 11:59pm. Three late days per person

Dan Suciu -- p544 Fall 2011 8

Take-home Final

• Posted on December 8, at 11:59pm

• Due on December 10, by 10:00pm

• No late days/hours/minutes/seconds

Dan Suciu -- p544 Fall 2011 9

Software Tools• Postgres:

– Preferred usage: download from download http://www.postgresql.org/download/– Other option: use postgres on lab machines

• SQL Server 2008– Download client from http://msdnaa.cs.washington.edu – Username is your full @cs.washington.edu email address– Doesn’t work ? Email [email protected] – Connect to IPROJSRV (may need tunneling)– OK to use you own server, just import IMDB

• Xquery: download one interpreter from– Preferred: Saxon: http://saxon.sourceforge.net/ (from apache; very popular)– Others:

• Zorba: http://www.zorba-xquery.com/ (I used this one: ½ day installation)• Galax: http://galax.sourceforge.net/ (great in the past, seems less well maintained)

• Pig Latin: – We will run it on Amazon Web Services– You may download from http://hadoop.apache.org/pig/, but you won’t need it

Dan Suciu -- p544 Fall 2011 10

Accessing SQL Server

• SQL Server Management Studio• Server Type = Database Engine• Server Name = IPROJSRV• Authentication = SQL Server Authentication

– Login = your UW email address (not the CSE email)– Password = [in class]

• Must connect from within CSE, or must use tunneling• Alternatively: install your own, get it from MSDNAA

(see earlier slide)• Then play with IMDB, start working on HW 1

Dan Suciu -- p544 Fall 2011 11

Rest of Today’s Lecture

• Overview of DBMS

• Overview of the course content

• SQL

Dan Suciu -- p544 Fall 2011 12

Database

What is a database ?

Give examples of databases

Dan Suciu -- p544 Fall 2011 13

Database

What is a database ?• A collection of files storing related data

Give examples of databases• Accounts database; payroll database; UW’s

students database; Amazon’s products database; airline reservation database

Dan Suciu -- p544 Fall 2011 14

Database Management System

What is a DBMS ?

Give examples of DBMS

Dan Suciu -- p544 Fall 2011 15

Database Management System

What is a DBMS ?• A big C program written by someone else that allows

us to manage efficiently a large database and allows it to persist over long periods of time

Give examples of DBMS• DB2 (IBM), SQL Server (MS), Oracle, Sybase• MySQL, Postgres, …

SQL for Nerds, Greenspun, http://philip.greenspun.com/sql/ (Chap 1)

Dan Suciu -- p544 Fall 2011 16

Market Shares

From 2006 Gartner report:

• IBM: 21% market with $3.2BN in sales

• Oracle: 47% market with $7.1BN in sales

• Microsoft: 17% market with $2.6BN in sales

Dan Suciu -- p544 Fall 2011 17

An Example

The Internet Movie Databasehttp://www.imdb.com

• Entities: Actors (800k), Movies (400k), Directors, …

• Relationships:who played where, who directed what, …

Dan Suciu -- p544 Fall 2011 18

Key concept 1: Relational Data Model

Actor: Cast:

Movie:

id fName lName gender

195428 Tom Hanks M

645947 Amy Hanks F

. . .

id Name year

337166 Toy Story 1995

. . . . . . . ..

pid mid

195428 337166

. . .

Dan Suciu -- p544 Fall 2011 19

Key concept 2: Declarative Query LanguageSELECT *

FROM Actor

SELECT count(*)

FROM Actor

SELECT *

FROM Actor

WHERE lName = ‘Hanks’SQL

We write what we want,not how we want it.

20

Key concept 3: Data Independence

SELECT *

FROM Actor, Casts, Movie

WHERE lname='Hanks' and Actor.id = Casts.pid

and Casts.mid=Movie.id and Movie.year=1995

817k actors, 3.5M casts, 380k movies;

How can it be so fast ?

Physical data independence:query is independent of physical storage

Dan Suciu -- p544 Fall 2011 21

How Can We Evaluate the Query ?

Actor: Cast: Movie:

id fName lName gender

. . . Hanks

. . .

id Name year

. . . 1995

. . .

pid mid

. . .

. . .

Plan 1: . . . . [ in class ]

Plan 2: . . . . [ in class ]

Dan Suciu -- p544 Fall 2011 22

Actor Cast Movie

lName=‘Hanks’ year=1995

Actor Cast Movie

lName=‘Hanks’ year=1995

Indexes: on Actor.lName, on Movie.year

Alternative query plans:

Query optimizationDatabase Statistics histograms, synopses, etc

Dan Suciu -- p544 Fall 2011 23

Key concept 4: Transactions

X = Read(Account_1);X.amount = X.amount - 100;Write(Account_1, X);

Y = Read(Account_2);Y.amount = Y.amount + 100;Write(Account_2, Y);

CRASH !

What is the problem ?

Recovery from systems failures:Transfer $100 from account 1 to account 2:

Dan Suciu -- p544 Fall 2011 24

X = Read(Account);if (X.amount >= 100) { dispense_money( ); X.amount = X.amount – 100; }else error(“Insufficient funds”);

X = Read(Account);if (X.amount >= 100) { dispense_money( ); X.amount = X.amount – 100; }else error(“Insufficient funds”);

What can go wrong ?

Concurrency Control

Overdrafting an account:

User 1: User 2:

Dan Suciu -- p544 Fall 2011 25

Transactions

ACID =• Atomicity ( = recovery)• Consistency• Isolation ( = concurrency control)• Durability

Dan Suciu -- p544 Fall 2011 26

Client/Server Database Architecture

• Single server that stores the database• Many clients running apps and connecting to DBMS• Performance bottlenecks:

– Client/server communication– Transactional semantics

• Other architectures: – main memory database– replicated databases

Dan Suciu -- p544 Fall 2011 27

Two Types of Database Usage

• OLTP (online-transaction-processing)– Many updates– Many simple “point queries”– Few (or no) complex aggregate queries

• Decision-Support– Many aggregate/group-by queries.– Few (or no) updates

Dan Suciu -- p544 Fall 2011 28

Trends in Data Management

• Large scale data analytics: Map/Reduce, Pig, …• Cloud based database service: AWS, Azure, …• NoSQL: sacrifice ACID for performance• Data privacy• Data provenance• Complex data analytics: probabilistic databases

Dan Suciu -- p544 Fall 2011 29

Outline of Course Content

1. SQL2. Relational Calculus, Database Design3. Constraints, Views4. Transactions: recovery5. Transactions: concurrency control6. XML, XPath, XQuery7. Data storage, indexes, physical tuning8. Query execution9. Query optimization10. Big Data: Parallel databases, Map/Reduce, Pig Latin11. Advanced topics: privacy, provenance, probabilistic dbs

Dan Suciu -- p544 Fall 2011 30

Announcement: Homework 1

• Homework 1 is posted; • Due on Monday, Oct. 10• Tools:

– Postgres: install on your computer (PREFERRED) or use the installation in the lab

– SQL Server, for testing only; connect to IPROJSRV:login: your UW email address; password: ……..

• Tasks: create db, import data, create indices, write 11 SQL queries

Dan Suciu -- p544 Fall 2011 31

Outline for rest of today

• Basics SQL (Chapters 5.2, 5.3)• Aggregates (Chapter 5.5.)• Nulls, Outer joins (Chapter 5.6)• Subqueries (Chapters 5.4)

– This is tough ! Next lecture we will discuss Relational Calculus (a.k.a. Tuple Calculus, Chapter 4.3). See supplementary text Three Query Language Formalisms

Dan Suciu -- p544 Fall 2011 32

SQL

• Data Definition Language (DDL)– Create/alter/delete tables and their attributes– Read from the book

• Data Manipulation Language (DML)– Query tables, Insert/delete/modify – Discussed in class

Dan Suciu -- p544 Fall 2011 33

Tables in SQL

PName Price Category Manufacturer

Gizmo $19.99 Gadgets GizmoWorks

Powergizmo $29.99 Gadgets GizmoWorks

SingleTouch $149.99 Photography Canon

MultiTouch $203.99 Household Hitachi

Product

Attribute namesTable name

Tuples or rows

Key

Dan Suciu -- p544 Fall 2011 34

The Relational Data ModelData is stored in tables , a.k.a. relations

Each relation has:1. A schema = name+attributes

– Product(PName, Price, Category, Manufacturer)– Each relation has a key, which we underline

2. An instance = set of rows

SQL departs from the pure relational model in that it allows duplicate tuples• Set semantics bag semantics

{1, 2, 3} {1, 1, 2, 3, 3, 3}

Dan Suciu -- p544 Fall 2011 35

Data Types in SQL

• Atomic types:– Characters: CHAR(20), VARCHAR(50)– Numbers: INT, BIGINT, SMALLINT, FLOAT– Others: MONEY, DATETIME, …

• Record (aka tuple)– Has atomic attributes

• Table (relation)– A set of tuples

Dan Suciu -- p544 Fall 2011 36

Simple SQL Query

PName Price Category Manufacturer

Gizmo $19.99 Gadgets GizmoWorks

Powergizmo $29.99 Gadgets GizmoWorks

SingleTouch $149.99 Photography Canon

MultiTouch $203.99 Household Hitachi

SELECT *FROM ProductWHERE category=‘Gadgets’

Product

PName Price Category Manufacturer

Gizmo $19.99 Gadgets GizmoWorks

Powergizmo $29.99 Gadgets GizmoWorks“selection”

Dan Suciu -- p544 Fall 2011 37

Simple SQL QueryPName Price Category Manufacturer

Gizmo $19.99 Gadgets GizmoWorks

Powergizmo $29.99 Gadgets GizmoWorks

SingleTouch $149.99 Photography Canon

MultiTouch $203.99 Household Hitachi

SELECT PName, Price, ManufacturerFROM ProductWHERE Price > ‘$100’

Product

PName Price Manufacturer

SingleTouch $149.99 Canon

MultiTouch $203.99 Hitachi

“selection” and

“projection”

Dan Suciu -- p544 Fall 2011 38

Details• Case insensitive:

SELECT = Select = select

Product = product

BUT: ‘Seattle’ ≠ ‘seattle’

• Constants:

‘abc’ - yes

“abc” - no

Dan Suciu -- p544 Fall 2011 39

Eliminating Duplicates

SELECT DISTINCT category

FROM Product

Compare to:

SELECT category

FROM Product

Category

Gadgets

Gadgets

Photography

Household

Category

Gadgets

Photography

Household

Dan Suciu -- p544 Fall 2011 40

Ordering the Results

SELECT pname, price, manufacturer

FROM Product

WHERE category=‘Gadgets’ AND price > ‘$10’

ORDER BY price, pname

Ties are broken by the second attribute on the ORDER BY list.

Ordering is ascending, unless you specify the DESC keyword.

Dan Suciu -- p544 Fall 2011 41

SELECT Category

FROM Product

ORDER BY PName

PName Price Category Manufacturer

Gizmo $19.99 Gadgets GizmoWorks

Powergizmo $29.99 Gadgets GizmoWorks

SingleTouch $149.99 Photography Canon

MultiTouch $203.99 Household Hitachi

?SELECT DISTINCT category

FROM Product

ORDER BY category

SELECT DISTINCT category

FROM Product

ORDER BY PName

?

?

Dan Suciu -- p544 Fall 2011 42

Keys and Foreign Keys

PName Price Category Manufacturer

Gizmo $19.99 Gadgets GizmoWorks

Powergizmo $29.99 Gadgets GizmoWorks

SingleTouch $149.99 Photography Canon

MultiTouch $203.99 Household Hitachi

Product

Company

CName Country

GizmoWorks USA

Canon Japan

Hitachi Japan

Key

Foreignkey

Dan Suciu -- p544 Fall 2011 43

Joins

Product (PName, Price, Category, Manufacturer)

Company (CName,, Country)

Find all products under $200 manufactured in Japan;return their names and prices.

SELECT PName, PriceFROM Product, CompanyWHERE Manufacturer=CName AND Country=‘Japan’ AND Price <= ‘$200’

Joinbetween Product

and Company

Dan Suciu -- p544 Fall 2011 44

Joins

PName Price Category Manufacturer

Gizmo $19.99 Gadgets GizmoWorks

Powergizmo $29.99 Gadgets GizmoWorks

SingleTouch $149.99 Photography Canon

MultiTouch $203.99 Household Hitachi

Product Company

Cname Country

GizmoWorks USA

Canon Japan

Hitachi Japan

PName Price

SingleTouch $149.99

SELECT PName, PriceFROM Product, CompanyWHERE Manufacturer=CName AND Country=‘Japan’ AND Price <= ‘$200’

Dan Suciu -- p544 Fall 2011 45

Tuple Variables

SELECT DISTINCT name, countryFROM Person, CompanyWHERE worksfor = cname

Whichcountry ?

Product (PName, Price, Category, Manufacturer)

Company (CName,, Country)

Person(name, Country, Worksfor)

SELECT DISTINCT Person.name, Company.countryFROM Person, CompanyWHERE Person.worksfor = Company.cname

SELECT DISTINCT x.name, y.countryFROM Person AS x, Company AS yWHERE x.worksfor = y.cname

Dan Suciu -- p544 Fall 2011 46

In Class

Product (pname, price, category, manufacturer)

Company (cname, country)

Find all Chinese companies that manufacture products in the ‘toy’ category

SELECT cname

FROM

WHERE

Dan Suciu -- p544 Fall 2011 47

In Class

Product (pname, price, category, manufacturer)

Company (cname, country)

Find all Chinese companies that manufacture products both in the ‘electronic’ and ‘toy’ categories

SELECT cname

FROM

WHERE

Dan Suciu -- p544 Fall 2011 48

The Nested Loop Semantics of SQL Queries

SELECT a1, a2, …, ak

FROM R1 AS x1, R2 AS x2, …, Rn AS xnWHERE Conditions

Answer = {}

for x1 in R1 do

for x2 in R2 do …..

for xn in Rn do if Conditions

then Answer = Answer {(a1,…,ak)}return Answer

Dan Suciu -- p544 Fall 2011 49

SELECT DISTINCT R.A

FROM R, S, T

WHERE R.A=S.A OR R.A=T.A

Using the Formal Semantics

If S ≠ and T ≠ ∅ ∅then returns R Ç (S È T)else returns ∅

What do these queries compute ?

SELECT DISTINCT R.A

FROM R, S

WHERE R.A=S.A

Returns R Ç S

Dan Suciu -- p544 Fall 2011 50

Aggregation

SELECT count(*)FROM Product

Except count, all aggregations apply to a single attribute

SELECT sum(price)FROM ProductWHERE manufacturer=‘GizmoWorks’

SQL supports several aggregation operations:

sum, count, min, max, avg

Product (pname, price, category, manufacturer)

Company (cname, country)

Dan Suciu -- p544 Fall 2011 51

COUNT applies to duplicates, unless otherwise stated:

SELECT count(category) FROM ProductWHERE price > ‘$20’

If category has no nulls,then count(category)=count(*)

We probably want:

SELECT count(DISTINCT category)FROM ProductWHERE price > ‘$20’

Aggregation: CountProduct (pname, price, category, manufacturer)

Company (cname, country)

Dan Suciu -- p544 Fall 2011 52

Grouping and Aggregation

SELECT manufacturer, count(*) AS totalFROM ProductWHERE price < ‘$200’GROUP BY manufacturer

Let’s see what this means…

For each manufacturer, find total number of its products under $200.

Product (pname, price, category, manufacturer)

Company (cname, country)

Dan Suciu -- p544 Fall 2011 53

Grouping and Aggregation

1. Compute the FROM and WHERE clauses.

2. Group by the attributes in the GROUPBY

3. Compute the SELECT clause, including aggregates.

Dan Suciu -- p544 Fall 2011 54

1&2. FROM-WHERE-GROUPBY

PName Price Category Manufacturer

Gizmo $19.99 Gadgets GizmoWorks

Powergizmo $29.99 Gadgets GizmoWorks

SingleTouch $149.99 Photography Canon

MultiTouch $203.99 Household Hitachi

SELECT manufacturer, count(*) AS totalFROM ProductWHERE price < ‘$200’GROUP BY manufacturer

Dan Suciu -- p544 Fall 2011 55

3. SELECTSELECT manufacturer, count(*) AS totalFROM ProductWHERE price < ‘$200’GROUP BY manufacturer

PName Price Category Manufacturer

Gizmo $19.99 Gadgets GizmoWorks

Powergizmo $29.99 Gadgets GizmoWorks

SingleTouch $149.99 Photography Canon

MultiTouch $203.99 Household Hitachi

count(*) Manufacturer

2 GizmoWorks

1 Canon

Dan Suciu -- p544 Fall 2011 56

HAVING Clause

SELECT manufacturer, count(*) AS totalFROM ProductWHERE price < ‘$200’GROUP BY manufacturerHAVING min(price) >’$20’

Same query, except that we return only those manufacturersthat make only products with price > $20

HAVING clause contains conditions on aggregates.

Product (pname, price, category, manufacturer)

Company (cname, country)

Dan Suciu -- p544 Fall 2011 57

General form of Grouping and Aggregation

SELECT SFROM R1,…,Rn

WHERE C1GROUP BY a1,…,ak

HAVING C2

S = may contain attributes a1,…,ak and/or any aggregates but NO OTHER ATTRIBUTES

C1 = is any condition on the attributes in R1,…,Rn

C2 = is any condition on aggregate expressions

Why ?

Dan Suciu -- p544 Fall 2011 58

General form of Grouping and Aggregation

Evaluation steps:1. Evaluate FROM-WHERE, apply condition C1

2. Group by the attributes a1,…,ak

3. Apply condition C2 to each group (may have aggregates)4. Compute aggregates in S and return the result

SELECT S

FROM R1,…,Rn

WHERE C1

GROUP BY a1,…,ak

HAVING C2

Dan Suciu -- p544 Fall 2011 59

NULLS in SQL

• Whenever we don’t have a value, we can put a NULL• Can mean many things:

– Value does not exists– Value exists but is unknown– Value not applicable– Etc.

• The schema specifies for each attribute if can be null (nullable attribute) or not

• How does SQL cope with tables that have NULLs ?

Dan Suciu -- p544 Fall 2011 60

Null Values

• If x= NULL then 4*(3-x)/7 is still NULL

• If x= NULL then x=‘Joe’ is UNKNOWN• In SQL there are three boolean values:

FALSE = 0UNKNOWN = 0.5TRUE = 1

Dan Suciu -- p544 Fall 2011 61

Null Values

• C1 AND C2 = min(C1, C2)• C1 OR C2 = max(C1, C2)• NOT C1 = 1 – C1

Rule in SQL: include only tuples that yield TRUE

SELECT *FROM PersonWHERE (age < 25) AND (height > 6 OR weight > 190)

E.g.age=20heigth=NULLweight=200

Dan Suciu -- p544 Fall 2011 62

Null Values

Unexpected behavior:

Some Persons are not included !

SELECT *FROM PersonWHERE age < 25 OR age >= 25

Dan Suciu -- p544 Fall 2011 63

Null Values

Can test for NULL explicitly:– x IS NULL– x IS NOT NULL

Now it includes all Persons

SELECT *FROM PersonWHERE age < 25 OR age >= 25 OR age IS NULL

Dan Suciu -- p544 Fall 2011

Outerjoins

64

SELECT x.country, y.pnameFROM Company x JOIN Product y ON x.cname = y.manufacturer

SELECT x.country, y.pnameFROM Company x, Product y WHERE x.cname = y.manufacturer

Same as:

But countries that don’t manufacture will not be listed !

Product (pname, price, category, manufacturer)

Company (cname, country)Normally, joins are “inner joins”:

Dan Suciu -- p544 Fall 2011

Outerjoins

65

SELECT x.country, y.pnameFROM Company x LEFT OUTER JOIN Product y ON x.cname = y.manufacturer

If we want to see the companies that don’t produce anything,then we use an outer join:

Product (pname, price, category, manufacturer)

Company (cname, country)

Dan Suciu -- p544 Fall 2011 66

ProductCompanyPName Price Category

Manufacturer

Gizmo $19.99 Gadgets GizmoWorks

Powergizmo $29.99 Gadgets GizmoWorks

SingleTouch $149.99 Photography Canon

MultiTouch $203.99 Household Hitachi

Cname Country

GizmoWorks USA

Canon Japan

Hitachi Japan

MuseumPass Vatican

Cname PName

USA GizmoWorks

USA GizmoWorks

Japan Canon

Japan Hitachi

Vatican NULL

Dan Suciu -- p544 Fall 2011

Application

67

SELECT x.country, count(*)FROM Company x, Product y WHERE x.cname = y.manufacturerGROUP BY x.country

What’s wrong ?

Product (pname, price, category, manufacturer)

Company (cname, country)

Compute the total number of products made by each country

Dan Suciu -- p544 Fall 2011

Application

68

SELECT x.country, count(y.pname)FROM Company x LEFT OUTER JOIN Product y ON x.cname = y.manufacturerGROUP BY x.country

Now we also get the products who sold in 0 quantity

Product (pname, price, category, manufacturer)

Company (cname, country)

Compute the total number of products made by each country

Note: we don’tuse count(*)

WHY ?

Dan Suciu -- p544 Fall 2011 69

Outer Joins

• Left outer join:– Include the left tuple even if there’s no match

• Right outer join:– Include the right tuple even if there’s no match

• Full outer join:– Include the both left and right tuples even if there’s no

match

Dan Suciu -- p544 Fall 2011

Subqueries

• A subquery is another SQL query nested inside a larger query

• Such inner-outer queries are called nested queries• A subquery may occur in:

1. A SELECT clause2. A FROM clause3. A WHERE clause

70

Rule of thumb: avoid writing nested queries when possible; sometimes it’s impossible

Dan Suciu -- p544 Fall 2011 71

1. Subqueries in SELECT

Product (pname, price, category, manufacturer)

Company (cname, country)

For each product return the country that manufactures it

SELECT X.pname, (SELECT Y.country FROM Company Y WHERE Y.cname=X.manufacturer)FROM Product X

What happens if a subquery returns more than one country ?

Dan Suciu -- p544 Fall 2011 72

1. Subqueries in SELECT

Whenever possible, don’t use a nested queries:

= We have “unnested”the query

SELECT X.pname, (SELECT Y.country FROM Company Y WHERE Y.cname=X.manufacturer)FROM Product X

SELECT pname, countryFROM Product, CompanyWHERE cname=manufacturer

Product (pname, price, category, manufacturer)

Company (cname, country)

Dan Suciu -- p544 Fall 2011 73

1. Subqueries in SELECT

Compute the number of products made by each country

SELECT DISTINCT x.country, (SELECT count(*) FROM Company y, Product WHERE y.cname=manufacturer and y.country = x.country)FROM Company x

Better: we can unnest by using a GROUP BY

Product (pname, price, category, manufacturer)

Company (cname, country)

SELECT x.country, count(*)FROM Company x, Product zWHERE x.cname = z.manufacturerGROUP BY x.country

74

GROUP BY v.s. Nested Quereis

SELECT manufacturer, count(*) AS totalFROM ProductWHERE price < '$200’GROUP BY manufacturer

SELECT DISTINCT x.manufacturer, (SELECT count(*) FROM Product y WHERE x.manufacturer = y.manufacturer AND price < '$200’) AS totalFROM Product xWHERE price < '$200’

Why twice ?Dan Suciu -- p544 Fall 2011

Dan Suciu -- p544 Fall 2011 75

2. Subqueries in FROM

Find all products whose prices is > 20 and < 30

SELECT *FROM (SELECT * FROM Product AS Y WHERE Y.price > ‘$20’) AS xWHERE x.price < ‘$30’

Unnest this query !

Product (pname, price, category, manufacturer)

Company (cname, country)

Dan Suciu -- p544 Fall 2011 76

3. Subqueries in WHERE

Find all countries that make some products with price < 100

SELECT DISTINCT x.countryFROM Company xWHERE EXISTS (SELECT * FROM Product y WHERE y.manufacturer = x.cname and y.price < ‘$100’)

Existential quantifiers

Using EXISTS:

Product (pname, price, category, manufacturer)

Company (cname, country)

Correlated subqery:uses x from outer query

Dan Suciu -- p544 Fall 2011 77

3. Subqueries in WHERE

Find all countries that make some products with price < 100

Predicate Calculus (a.k.a. First Order Logic)

{ y | x.Company(x,y)∃ ∧( z. p. c.Product(z,p,c,x)∃ ∃ ∃ ∧p<100)}

Existential quantifiersProduct (pname, price, category, manufacturer)

Company (cname, country)

Dan Suciu -- p544 Fall 2011 78

3. Subqueries in WHERE

Find all countries that make some products with price < 100

SELECT DISTINCT countryFROM CompanyWHERE cname IN (SELECT Product.manufacturer FROM Product WHERE Product.price < ‘$100’)

Using IN

Existential quantifiersProduct (pname, price, category, manufacturer)

Company (cname, country)

De-correlated subqery

Dan Suciu -- p544 Fall 2011 79

3. Subqueries in WHERE

Find all countries that make some products with price < 100

SELECT DISTINCT Company.countryFROM CompanyWHERE ‘$100’ > ANY (SELECT price FROM Product WHERE manufacturer = cname)

Using ANY:

Existential quantifiersProduct (pname, price, category, manufacturer)

Company (cname, country)

Dan Suciu -- p544 Fall 2011 80

3. Subqueries in WHERE

Find all countries that make some products with price < 100

SELECT DISTINCT x.countryFROM Company x, Product yWHERE x.cname = y.manufacturer and y.price < ‘$100’

Existential quantifiers are easy !

Now let’s unnest it:

Existential quantifiersProduct (pname, price, category, manufacturer)

Company (cname, country)

Dan Suciu -- p544 Fall 2011 81

3. Subqueries in WHERE

Universal quantifiers are hard !

Find the countries of all companies that make only products with price < 100

Universal quantifiersProduct (pname, price, category, manufacturer)

Company (cname, country)

Dan Suciu -- p544 Fall 2011 82

3. Subqueries in WHERE

Predicate Calculus (a.k.a. First Order Logic)

{ y | x.Company(x,y)∃ ∧( z. p. c.Product(z,p,c,x)∀ ∀ ∀ p<100) }

Find the countries of all companies that make only products with price < 100

Universal quantifiersProduct (pname, price, category, manufacturer)

Company (cname, country)

Dan Suciu -- p544 Fall 2011 83

3. Subqueries in WHERE

{ y | x. Company(x,y)∃ ∧( z. p. c.Product(z,p,c,x)∀ ∀ ∀ p<100) }

De Morgan’s Laws:¬(A B) = ¬A ¬B∧ ∨¬(A B) = ¬A ¬B∨ ∧¬ x. P(x) = x. ¬ P(x)∀ ∃¬ x. P(x) = x. ¬ P(x)∃ ∀

{ y| x.Company(x,y)∃ ∧¬( z p. p.Product(z,p,c,x)∃ ∃ ∃ ∧p≥100) }

{ y | x. Company(x,y)) } −∃{ y | x. Company(x,y) ∃ ∧( z p. c.Product(z,p,c,x)∃ ∃ ∃ ∧p≥100) }

¬(A B) = A ¬B ∧

=

=

Dan Suciu -- p544 Fall 2011 84

3. Subqueries in WHERE

2. Find all companies s.t. all their products have price < 100

1. Find the other companies: i.e. s.t. some product 100

SELECT DISTINCT countryFROM CompanyWHERE cname IN (SELECT manufacturer FROM Product WHERE price >= ‘$100’)

SELECT DISTINCT countryFROM CompanyWHERE cname NOT IN (SELECT manufacturer FROM Product WHERE price >= ‘$100’)

Dan Suciu -- p544 Fall 2011 85

3. Subqueries in WHERE

Find the countries of all companies that make only products with price < 100

Universal quantifiers

Using EXISTS:

SELECT DISTINCT x.countryFROM Company xWHERE NOT EXISTS (SELECT * FROM Product y WHERE y.manufacturer = x.cname and y.price >= ‘$100’)

Product (pname, price, category, manufacturer)

Company (cname, country)

Dan Suciu -- p544 Fall 2011 86

3. Subqueries in WHERE

SELECT DISTINCT Company.countryFROM CompanyWHERE ‘$100’ > ALL (SELECT price FROM Product WHERE manufacturer = cname)

Using ALL:

Find the countries of all companies that make only products with price < 100

Universal quantifiers

Product (pname, price, category, manufacturer)

Company (cname, country)

Dan Suciu -- p544 Fall 2011 87

Question for Database Fansand their Friends

• Can we unnest this query ?

Find the countries of all companies that make only products with price < 100

Dan Suciu -- p544 Fall 2011 88

Monotone Queries• A query Q is monotone if:

– Whenever we add tuples to one or more of the tables…– … the answer to the query cannot contain fewer tuples

• Fact: all unnested queries are monotone – Proof: using the “nested for loops” semantics

• Fact: A query a universal quantifier is not monotone

• Consequence: we cannot unnest a query with a universal quantifier

Dan Suciu -- p544 Fall 2011

Queries that must be nested

89

Rule of Thumb:Non-monotone queries cannot be unnested. In particular, queries with a universal quantifier cannot be unnested

Dan Suciu -- p544 Fall 2011 90

More SQL

Read the following commands in the book

• CREATE TABLE• INSERT• DELETE• UPDATE

They are easy; but we need/use them all the time in class, and in the homework assignments

Dan Suciu -- p544 Fall 2011 91

Advanced SQLizing

1. Unnesting Aggregates

2. Finding witnesses

Unnesting Aggregates

For each category, find the maximum price

SELECT DISTINCT X.category, (SELECT max(Y.price) FROM Product Y WHERE X.category = Y.category)FROM Product X

SELECT category, max(price)FROM ProductGROUP BY category

Equivalent queries

Note: no need for DISTINCT(DISTINCT is the same as GROUP BY)

Product (pname, price, category, manufacturer)

Company (cname, country)

Unnesting Aggregates

Find the number of products made in each country

SELECT DISTINCT X.country, (SELECT count(*) FROM Company Y, Product Z WHERE Y.cname=Z.manufacturer AND Y.country = X.country)FROM Company X

SELECT X.country, count(*)FROM Company X, Product YWHERE X.cname=Y.manufacturer GROUP BY X.country

They are NOTequivalent !

(WHY?)

Product (pname, price, category, manufacturer)

Company (cname, country)

Dan Suciu -- p544 Fall 2011 94

More Unnesting

• Find authors who wrote ³ 10 documents:• Attempt 1: with nested queries

SELECT DISTINCT Author.nameFROM AuthorWHERE count(SELECT Wrote.url FROM Wrote WHERE Author.login=Wrote.login) > 10

This isSQL bya novice

Author(login,name)

Wrote(login,url)

Dan Suciu -- p544 Fall 2011 95

More Unnesting

• Find all authors who wrote at least 10 documents:

• Attempt 2: SQL style (with GROUP BY)

SELECT DISTINCT Author.nameFROM Author, WroteWHERE Author.login=Wrote.loginGROUP BY Author.nameHAVING count(wrote.url) > 10

This isSQL by

an expert

Dan Suciu -- p544 Fall 2011 96

Finding Witnesses

For each country, find its most expensive products

Product (pname, price, category, manufacturer)

Company (cname, country)

Finding Witnesses

SELECT x.country, max(y.price)FROM Company x, Product yWHERE x.cname = y.manufacturerGROUP BY x.country

Finding the maximum price is easy…

But we need the witnesses, i.e. the products with max price

For each country, find its most expensive products

Product (pname, price, category, manufacturer)

Company (cname, country)

Dan Suciu -- p544 Fall 2011 98

Finding Witnesses

SELECT u.country, v.pname, v.priceFROM Company u, Product v, (SELECT x.country, max(y.price) as mprice FROM Company x, Product y WHERE x.cname = y.manufacturer GROUP BY x.country) AS pWHERE u.country = p.country and v.price = p.mprice

To find the witnesses, compute the maximum pricein a subquery

Dan Suciu -- p544 Fall 2011 99

Finding Witnesses

There is a more concise solution here:

SELECT x.country, y.pname, y.priceFROM Company x, Product yWHERE x.cname = y.manufacturer and y.price >= ALL (SELECT z.price FROM Product z WHERE x.cname = z.manufacturer)

Product (pname, price, category, manufacturer)

Company (cname, country)