lecture #4 october 19, 2000 sql. administration exam date officially moved to december 7 th, 6:30pm,...

74
Lecture #4 October 19, 2000 SQL

Post on 21-Dec-2015

214 views

Category:

Documents


1 download

TRANSCRIPT

Lecture #4October 19, 2000

SQL

Administration

• Exam date officially moved to December 7th, 6:30pm, here.

• Homework #3 – will be on the web site tomorrow. Due in two weeks. SQL.

• Accounts are being set up.

• Projects, projects, projects.

SQL IntroductionStandard language for querying and manipulating data

Structured Query Language

Many standards out there: SQL92, SQL2, SQL3, SQL99Vendors support various subsets of these, but all of what we’llbe talking about.

Basic form: (many many more bells and whistles in addition)

Select attributes From relations (possibly multiple, joined) Where conditions (selections)

Selections

SELECT * FROM Company WHERE country=“USA” AND stockPrice > 50

You can use: attribute names of the relation(s) used in the FROM. comparison operators: =, <>, <, >, <=, >= apply arithmetic operations: stockprice*2 operations on strings (e.g., “||” for concatenation). Lexicographic order on strings. Pattern matching: s LIKE p Special stuff for comparing dates and times.

Projections

SELECT name AS company, stockprice AS price FROM Company WHERE country=“USA” AND stockPrice > 50

SELECT name, stock price FROM Company WHERE country=“USA” AND stockPrice > 50

Select only a subset of the attributes

Rename the attributes in the resulting table

Ordering the Results

SELECT name, stock price FROM Company WHERE country=“USA” AND stockPrice > 50 ORDERBY country, name

Ordering is ascending, unless you specify the DESC keyword.

Ties are broken by the second attribute on the ORDERBY list, etc.

Joins

SELECT name, store FROM Person, Purchase WHERE per-name=buyer AND city=“Seattle” AND product=“gizmo”

Product ( pname, price, category, maker)Purchase (buyer, seller, store, product)Company (cname, stock price, country)Person( per-name, phone number, city)

Disambiguating Attributes

SELECT Person.name FROM Person, Purchase, Product WHERE Person.name=buyer AND product=Product.name AND Product.category=“telephony”

Product ( name, price, category, maker)Purchase (buyer, seller, store, product)Person( name, phone number, city)

Find names of people buying telephony products:

Tuple Variables

SELECT product1.maker, product2.maker FROM Product AS product1, Product AS product2 WHERE product1.category=product2.category AND product1.maker <> product2.maker

Product ( name, price, category, maker)

Find pairs of companies making products in the same category

First Unintuitive SQLismSELECT R.AFROM R,S,TWHERE R.A=S.A OR R.A=T.A

Looking for R (S T)

But what happens if T is empty?

Union, Intersection, Difference(SELECT name FROM Person WHERE City=“Seattle”)

UNION

(SELECT name FROM Person, Purchase WHERE buyer=name AND store=“The Bon”)

Similarly, you can use INTERSECT and EXCEPT.You must have the same attribute names (otherwise: rename).

Exercises

Product ( pname, price, category, maker)Purchase (buyer, seller, store, product)Company (cname, stock price, country)Person( per-name, phone number, city)

Ex #1: Find people who bought telephony products.Ex #2: Find names of people who bought American productsEx #3: Find names of people who bought American products and did not buy French productsEx #4: Find names of people who bought American products and they live in Seattle.Ex #5: Find people who bought stuff from Joe or bought products from a company whose stock prices is more than $50.

SubqueriesSELECT Purchase.productFROM PurchaseWHERE buyer = (SELECT name FROM Person WHERE social-security-number = “123 - 45 - 6789”);

In this case, the subquery returns one value.

If it returns more, it’s a run-time error.

Subqueries Returning Relations

SELECT Company.name FROM Company, Product WHERE Company.name=maker AND Product.name IN (SELECT product FROM Purchase WHERE buyer = “Joe Blow”);

Find companies who manufacture products bought by Joe Blow.

You can also use: s > ALL R s > ANY R EXISTS R

Conditions on Tuples

SELECT Company.name FROM Company, Product WHERE Company.name=maker AND (Product.name,price) IN (SELECT product, price) FROM Purchase WHERE buyer = “Joe Blow”);

Correlated Queries

SELECT title FROM Movie AS Old WHERE year < ANY (SELECT year FROM Movie WHERE title = Old.title);

Movie (title, year, director, length) Movie titles are not unique (titles may reappear in a later year).

Find movies whose title appears more than once.

Note scope of variables

Removing Duplicates

SELECT DISTINCT Company.name FROM Company, Product WHERE Company.name=maker AND (Product.name,price) IN (SELECT product, price) FROM Purchase WHERE buyer = “Joe Blow”);

Conserving Duplicates

(SELECT name FROM Person WHERE City=“Seattle”)

UNION ALL

(SELECT name FROM Person, Purchase WHERE buyer=name AND store=“The Bon”)

The UNION, INTERSECTION and EXCEPT operators operate as sets, not bags.

Grouping and Aggregation

Product Date Price Quantity

Bagel 10/21 0.85 15

Banana 10/22 0.52 7

Banana 10/19 0.52 17

Bagel 10/20 0.85 20

Example 1: find total sales for the entire database

Simple Aggregation

SELECT Sum(price * quantity)FROM Purchase

SELECT Sum(price * quantity)FROM PurchaseWHERE product = ‘bagel’

SQL supports several aggregation operations:

SUM, MIN, MAX, AVG, COUNT

Except COUNT, all aggregations apply to a single attribute

Grouping and Aggregation

Product Date Price Quantity

Bagel 10/21 0.85 15

Banana 10/22 0.52 7

Banana 10/19 0.52 17

Bagel 10/20 0.85 20

Example 2: find total sales per product.

Solution: Two Steps

Product Date Price Quantity

Banana 10/19 0.52 17

Banana 10/22 0.52 7

Bagel 10/20 0.85 20

Bagel 10/21 0.85 15

Example 2: find total sales per product.

First: group the entries by product.

Then, aggregate

Product TotalSales

Bagel $29.75

Banana $12.48

SELECT product, Sum(price * quantity) AS TotalSalesFROM PurchaseGROUP BY product

Another Example

SELECT product, Sum(price * quantity) AS SumSales Max(quantity) AS MaxQuantityFROM PurchaseGROUP BY product

For every product, what is the total sales and max quantity sold?

Product SumSales MaxQuantity

Banana $12.48 17

Bagel $29.75 20

Grouping and Aggregation: Summary

SELECT product, Sum(price)FROM Product, PurchaseWHERE Product.name = Purchase.productGROUP BY Product.name

1. Compute the relation (I.e., the FROM and WHERE).2. Group by the attributes in the GROUP BY3. Select one tuple for every group (and apply aggregation)

SELECT can have (1) grouped attributes or (2) aggregates.

HAVING Clause

SELECT product, Sum(price * quantity)FROM PurchaseGROUP BY productHAVING Sum(quantity) > 30

Same query, except that we consider only products that hadat least 100 buyers.

HAVING clause contains conditions on aggregates.

Exercises

Product ( pname, price, category, maker)Purchase (buyer, seller, store, product)Company (cname, stock price, country)Person( per-name, phone number, city)

Ex #6: What is the total sales of products by country of manufacture?

Ex #7: Find the total revenue from products of which 100,000 units or more have been sold.

Modifying the Database

We have 3 kinds of modifications: insertion, deletion, update.

Insertion: general form -- INSERT INTO R(A1,…., An) VALUES (v1,…., vn)

Insert a new purchase to the database:

INSERT INTO Purchase(buyer, seller, product, store) VALUES (‘Joe’, ‘Fred’, ‘wakeup-clock-espresso-machine’, ‘The Sharper Image’)

If we don’t provide all the attributes of R, they will be filled with NULL.

We can drop the attribute names if we’re providing all of them in order.

More Interesting Insertions

INSERT INTO PRODUCT(name)

SELECT DISTINCT product FROM Purchase WHERE product NOT IN (SELECT name FROM Product)

The query replaces the VALUES keyword.

Note the order of querying and inserting.

Deletions

DELETE FROM PURCHASE

WHERE seller = ‘Joe’ AND product = ‘Brooklyn Bridge’

Factoid about SQL: there is no way to delete only a single occurrence of a tuple that appears twice in a relation.

Updates

UPDATE PRODUCTSET price = price/2WHERE Product.name IN (SELECT product FROM Purchase WHERE Date =‘Oct, 25, 2000’);

Data Definition in SQL

So far, SQL operations on the data.

Data definition: defining the schema.

• Create tables• Delete tables• Modify table schema

But first: Define data types.

Finally: define indexes.

Data Types in SQL

• Character strings (fixed of varying length)• Bit strings (fixed or varying length)• Integer (SHORTINT)• Floating point• Dates and times

Domains will be used in table declarations.

To reuse domains:CREATE DOMAIN address AS VARCHAR(55)

You wish you had richer types, no? (hang on…)

Creating Tables

CREATE TABLE Person(

name VARCHAR(30), social-security-number INTEGER, age SHORTINT, city VARCHAR(30), gender BIT(1), Birthdate DATE

);

Deleting or Modifying a Table

Deleting: DROP Person;

Altering: (adding or removing an attribute).

ALTER TABLE Person ADD phone CHAR(16);

ALTER TABLE Person DROP age;

What happens when you make changes to the schema?

Default Values

The default of defaults: NULL

Specifying default values:

CREATE TABLE Person(

name VARCHAR(30), social-security-number INTEGER, age SHORTINT DEFAULT 100, city VARCHAR(30) DEFAULT ‘Seattle’, gender CHAR(1) DEFAULT ‘?’, Birthdate DATE

Indexes

REALLY important to speed up query processing time.

Suppose we have a relation

Person (name, SSN, age, city)

An index on SSN enables us to fetch a tuple for a given SSN very efficiently (not have to scan the whole relation).

The problem of deciding which indexes to put on the relations isvery hard! (it’s called: physical database design).

Creating Indexes

CREATE INDEX ssnIndex ON Person(SSN)

Indexes can be created on more than one attribute:

CREATE INDEX doubleindex ON Person (age, city)

Why not create indexes on all the attributes?

Defining Views (!!)Views are relations, except that they are not physically stored.They are used to:• simplify complex queries, and• define distinct conceptual interfaces for different classes of users.

Example view: purchases of telephony products.

CREATE VIEW telephony-purchases AS SELECT product, buyer, seller, store FROM Purchase, Product WHERE Purchase.product = Product.pname AND Product.category = ‘telephony’ The view is materialized when its results are stored in the DBMS.

A Different ViewCREATE VIEW Seattle-view AS

SELECT buyer, seller, product, store FROM Person, Purchase WHERE Person.city = ‘Seattle’ AND Person.per-name = Purchase.buyer

We can later use the views: SELECT name, store FROM Seattle-view, Product WHERE Seattle-view.product = Product.name AND Product.category = ‘shoes’

What’s really happening when we query a view?? It’s unfolded.

Updating ViewsHow can I insert a tuple into a table that doesn’t exist?

CREATE VIEW bon-purchase AS SELECT store, seller, product (note: buyer is not selected) FROM Purchase WHERE store = ‘The Bon Marche’

If we make the following insertion:

INSERT INTO bon-purchase VALUES (‘the Bon Marche’, ‘Joe’, ‘Denby Mug’)

We can simply add a tuple (‘the Bon Marche’, ‘Joe’, NULL, ‘Denby Mug’) to relation Purchase.

Non-Updatable ViewsCREATE VIEW Seattle-view AS

SELECT seller, product, store FROM Person, Purchase WHERE Person.city = ‘Seattle’ AND Person.name = Purchase.buyer

How can we add the following tuple to the view? (‘Joe’, ‘Shoe Model 12345’, ‘Nine West’)In principle, two tuples should be added to the database:

Person: (foo, NullPhoneNumber, ‘Seattle’)Purchase: (foo, ‘Joe’, ‘Nine West’, ‘Shoe Model 12345’)

But it’s very hard to manage the foo’s later, so this update is not legal.

Reusing a Materialized View• Suppose I have only the result of SeattleView:SELECT buyer, seller, product, store FROM Person, Purchase WHERE Person.city = ‘Seattle’ AND Person.per-name = Purchase.buyer

• and I want to answer the query SELECT buyer, seller FROM Person, Purchase WHERE Person.city = ‘Seattle’ AND Person.per-name = Purchase.buyer AND Purchase.product=‘gizmo’.

Then, I can rewrite the query using the view.

Query Rewriting Using Views

Rewritten query: SELECT buyer, seller FROM SeattleView WHERE product= ‘gizmo’

Original query:SELECT buyer, seller FROM Person, Purchase WHERE Person.city = ‘Seattle’ AND Person.per-name = Purchase.buyer AND Purchase.product=‘gizmo’.

Another Example• I still have only the result of SeattleView:SELECT buyer, seller, product, store FROM Person, Purchase WHERE Person.city = ‘Seattle’ AND Person.per-name = Purchase.buyer

• but I want to answer the query SELECT buyer, seller FROM Person, Purchase WHERE Person.city = ‘Seattle’ AND Person.per-name = Purchase.buyer AND Person.Phone LIKE ‘206 543 %’.

The General Problem

• Given a set of views V1,…,Vn, and a query Q, can we answer Q using only the answers to V1,…,Vn?

• Why do we care?– We can answer queries more efficiently. – We can query data sources on the WWW in a

principled manner.

• Many, many papers on this problem.• The best performing algorithm: The MiniCon

Algorithm, (Pottinger & Levy, 2000).• Great survey on the topic: (Halevy, 2000).

Querying the WWW• Assume a virtual schema of the WWW, e.g.,

– Course(number, university, title, prof, quarter)

• Every data source on the web contains the answer to a view over the virtual schema:

UW database: SELECT number, title, prof FROM Course WHERE univ=‘UW’ AND quarter=‘4/99’Stanford database: SELECT number, title, prof, quarter FROM Course WHERE univ=‘Stanford’User query: find all professors who teach “database systems”

SQL: Constraints and Triggers

• Chapter 6 Ullman and Widom• Certain properties we’d like our database to

hold• Modification of the database may break

these properties• Build handlers into the database definition• Key constraints• Referential integrity constraints.

Keys: Fundamental Constraint

• In the CREATE TABLE statement, use:– PRIMARY KEY, UNIQUE

CREATE TABLE MovieStar (

name CHAR(30) PRIMARY KEY,

address VARCHAR(255),

gender CHAR(1));

• Or, list at end of CREATE TABLEPRIMARY KEY (name)

Keys...

• Can use the UNIQUE keyword in same way– …but for any number of attributes

– foreign key only reference PRIMARY KEY

• Indexing KeysCREATE UNIQUE INDEX YearIndex ON Movie(year)

• Makes insertions easier to check for key constraints

• Subtle differences between PRIMARY KEY and UNIQUE

Foreign Key ConstraintsActedIn(ActorName, MovieName)Movies(MovieName, year)If MovieName in ActedIn is a foreign key for

Movies, then: 1) Foreign Key must be a reference to a valid

value in the referenced table.2) … must be a PRIMARY KEY in the

referenced table.Yes, this is a referential integrity constraint.

Declaring Foreign Key Constraints

• REFERENCES keyword...CREATE TABLE ActedIn (

Name CHAR(30) PRIMARY KEY,

MovieName CHAR(30)

REFERENCES Movies(MovieName));

• Or, summarize at end of CREATE TABLEFOREIGN KEY MovieName REFERENCES

Movies(MovieName)

• MovieName must be a PRIMARY KEY

How do we Maintain them?

• Given a change to DB, there are several possible violations:– Insert new tuple with bogus foreign key value– Update a tuple to a bogus foreign key value– Delete a tuple in the referenced table with the

referenced foreign key value– Update a tuple in the referenced table that

changes the referenced foreign key value

How to Maintain?• Recall, ActedIn has FK MovieName...

Movies(MovieName, year)

(Fatal Attraction, 1987)

ActedIn(ActorName, MovieName)

(Michael Douglas, Fatal Attraction)

insert: (Rick Moranis, Strange Brew)

How to Maintain?• Policies for handling the change…

– Reject the update (default)– Cascade (example: cascading deletes)– Set NULL

• Can set update and delete actions independently in CREATE TABLE

MovieName CHAR(30)

REFERENCES Movies(MovieName))

ON DELETE SET NULL

ON UPDATE CASCADE

Constraining Attribute Values

• Constrain invalid values– NOT NULL– gender CHAR(1)

CHECK (gender IN (‘F’, ‘M’))– MovieName CHAR(30)

CHECK (MovieName IN (SELECT MovieName FROM Movies))

• The constraint is checked whenever the value of the attribute is changed or added.

• Isn’t this the same as REFERENCE?

Constraining Values with User Defined ‘Types’

• Can define new domains to use as the attribute type...CREATE DOMAIN GenderDomain CHAR(1)

CHECK (VALUE IN (‘F’, ‘M’));

• Then update our attribute definition...gender GenderDomain

• Note use of VALUE to refer to the attribute value.• Deferred constraints: deal with circular

referencial integrity constraints.

More Complex Constraints...

• …Among several attributes in one table– Specify at the end of CREATE TABLE

CHECK (gender = ‘F’ OR name NOT LIKE ‘Ms.%’)

• Checked whenever a tuple of the relation is added or updated.

• Note, changes in other places in the database may cause the constraint to be violated. If this is important, use assertions.

Declaring Assertions

• CREATE ASSERTION <name> CHECK (<condition>)

CREATE ASSERTION RichPres CHECK

(NOT EXISTS

(SELECT *

FROM Studio, MovieExec

WHERE presC# = cert#

AND netWorth < 10000000))

Checked whenever any change occurs to the database.

Another Example

CREATE ASSERTION SumLength

CHECK (10000 < ALL (SELECT SUM(length) FROM Movie GROUP BY studioName)).

Can we also write this as a tuple constraint?

Different Constraint TypesType Where Declared When activated Guaranteed to hold?

Attribute with attribute on insertion not if CHECK or update subquery

Tuple relation schema insertion or not if CHECK update to subquery relation

Assertion database schema on change to Yes any relation mentioned

TriggersEnable the database programmer to specify:• when to check a constraint,• what exactly to do.

A trigger has 3 parts:

• An event (e.g., update to an attribute)• A condition (e.g., a query to check)• An action (deletion, update, insertion)

When the event happens, the system will check the constraint, and if satisfied, will perform the action.

NOTE: triggers may cause cascading effects. Database vendors did not wait for standards with triggers!

Elements of Triggers (in SQL3)• Timing of action execution: before, after or instead of triggering event

• The action can refer to both the old and new state of the database.

• Update events may specify a particular column or set of columns.

• A condition is specified with a WHEN clause.

• The action can be performed either for• once for every tuple, or• once for all the tuples that are changed by the database operation.

Example: Row Level Trigger

CREATE TRIGGER NoLowerPrices

AFTER UPDATE OF price ON ProductREFERENCING OLD AS OldTuple NEW AS NewTupleWHEN (OldTuple.price > NewTuple.price) UPDATE Product SET price = OldTuple.price WHERE name = NewTuple.name

FOR EACH ROW

Statement Level Trigger CREATE TRIGGER average-price-preserveINSTEAD OF UPDATE OF price ON Product

REFERENCING OLD_TABLE AS OldStuff NEW_TABLE AS NewStuffWHEN (1000 < (SELECT AVG (price) FROM ((Product EXCEPT OldStuff) UNION NewStuff))DELETE FROM Product WHERE (name, price, company) IN OldStuff;INSERT INTO Product (SELECT * FROM NewStuff)

Bad Things Can HappenCREATE TRIGGER Bad-trigger

AFTER UPDATE OF price IN ProductREFERENCING OLD AS OldTuple NEW AS NewTuple

WHEN (NewTuple.price > 50)

UPDATE Product SET price = NewTuple.price * 2 WHERE name = NewTuple.name

FOR EACH ROW

Embedded SQL

• Direct SQL is rarely used: usually, SQL is embedded in some application code.

• We need some method to reference SQL statements.

• But: there is an impedance mismatch problem.• So: we use cursors.• Many things can be explained with the impedance

mismatch.

Programs with SQL

Host language + Embedded SQL

Preprocessor

Host Language + function calls

Host language compiler

Host language program

Preprocessor

Host language compiler

The Impedance Mismatch Problem

The host language manipulates variables, values, pointers

SQL manipulates relations.

There is no construct in the host language for manipulating relations.

Why not use only one language?• Forgetting SQL: “we can quickly dispense with this idea” [Ullman & Widom, pg. 363].• SQL cannot do everything that the host language can do.

Interface: SQL / Host Language

Values get passed through shared variables.

Colons precede shared variables when they occur within the SQL statements.

EXEC SQL: precedes every SQL statement in the host language.

The variable SQLSTATE provides error messages and status reports (e.g., 00000 says that the operation completed with noproblem).

EXEC SQL BEGIN DECLARE SECTION; char productName[30];EXEC SQL END DECLARE SECTION;

Using Shared VariablesVoid simpleInsert() {

EXEC SQL BEGIN DECLARE SECTION; char productName[20], company[30]; char SQLSTATE[6]; EXEC SQL END DECLARE SECTION;

/* get values for productName and company somehow */

EXEC SQL INSERT INTO Product(name, company) VALUES (:productName, :company); }

Single-Row Select Statements

Void getPrice() {

EXEC SQL BEGIN DECLARE SECTION; char productName[20], company[30]; integer price; char SQLSTATE[6]; EXEC SQL END DECLARE SECTION;/* read value of product name */ EXEC SQL SELECT price INTO :price FROM Product WHERE Product.name = :productName;

/* print out value of price */ }

CursorsEXEC SQL DECLARE cursorName CURSOR FOR SELECT …. FROM …. WHERE …. ;

EXEC SQL OPEN cursorName;

while (true) {

EXEC SQL FETCH FROM cursorName INTO :variables;

if (NO_MORE_TUPLES) break;

/* do something with values */ } EXEC SQL CLOSE cursorName;

More on Cursors

• cursors can modify a relation as well as read it.

• We can determine the order in which the cursor will get tuples by the ORDER BY keyword in the SQL query.

• Cursors can be protected against changes to the underlying relations.

• The cursor can be a scrolling one: can go forward, backward +n, -n, Abs(n), Abs(-n).