cmsc424, spring 2005 cmsc424: database design lecture 4

33
CMSC424, Spring 2005 CMSC424: Database Design Lecture 4

Post on 21-Dec-2015

221 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: CMSC424, Spring 2005 CMSC424: Database Design Lecture 4

CMSC424, Spring 2005

CMSC424: Database Design

Lecture 4

Page 2: CMSC424, Spring 2005 CMSC424: Database Design Lecture 4

CMSC424, Spring 2005

Review: Relational Data Model Key Abstraction: Relation

Given sets: R = {1, 2, 3}, S = {3, 4}

• R S = { (1, 3), (1, 4), (2, 3), (2, 4), (3, 3), (3, 4) }

• A relation on R, S is any subset () of R S (e.g: { (1, 4), (3, 4)})

Mathematical relations

Account Branches Accounts Balances{ (Downtown, A-101, 500),

(Brighton, A-201, 900),(Brighton, A-217, 500) }

Database relations

Given attribute domainsBranches = { Downtown, Brighton, … }Accounts = { A-101, A-201, A-217, … }Balances = R

bname acct_no balance

Downtown

Brighton

Brighton

A-101

A-201

A-217

500

900

500

Page 3: CMSC424, Spring 2005 CMSC424: Database Design Lecture 4

CMSC424, Spring 2005

Review: Terms and Definitions

1. Tables = Relations

2. Columns = Attributes

3. Rows = Tuples

4. Relation Schema (or Schema) A list of attributes and their domains

We will require the domains to be atomic

E.g. account(account-number, branch-name, balance)

5. Relation Instance A particular instantiation of a relation with actual values

Will change with time

Page 4: CMSC424, Spring 2005 CMSC424: Database Design Lecture 4

CMSC424, Spring 2005

Bank Database: SchemaAccount

bname acct_no balance

Depositor

cname acct_no

Customer

cname cstreet ccity

Branch

bname bcity assets

Borrower

cname lno

Loan

bname lno amt

Page 5: CMSC424, Spring 2005 CMSC424: Database Design Lecture 4

CMSC424, Spring 2005

Bank Database: An InstanceAccount

bname acct_no balance

DowntownMianusPerryR.H.

BrightonRedwoodBrighton

A-101A-215A-102A-305A-201A-222A-217

500700400350900700750

Depositor

cname acct_no

JohnsonSmithHayesTurner

JohnsonJones

Lindsay

A-101A-215A-102A-305A-201A-217A-222

Customer

cname cstreet ccity

JonesSmithHayesCurry

LindsayTurner

WilliamsAdamsJohnsonGlennBrooksGreen

MainNorthMainNorthPark

PutnamNassauSpringAlma

Sand HillSenatorWalnut

HarrisonRye

HarrisonRye

PittsfieldStanfordPrincetonPittsfieldPalo AltoWoodsideBrooklynStanford

Branch

bname bcity assets

DowntownRedwood

PerryMianus

R.H.Pownel

N. TownBrighton

BrooklynPalo AltoHorseneckHorseneckHorseneckBennington

RyeBrooklyn

9M2.1M1.7M0.4M8M

0.3M3.7M7.1M

Borrower

cname lno

JonesSmithHayes

JacksonCurrySmith

WilliamsAdams

L-17L-23L-15L-14L-93L-11L-17L-16

Loan

bname lno amt

DowntownRedwood

PerryDowntown

MianusR.H.Perry

L-17L-23L-15L-14L-93L-11L-16

1000200015001500500900

1300

Page 6: CMSC424, Spring 2005 CMSC424: Database Design Lecture 4

CMSC424, Spring 2005

Review: Keys and Relations

1. Superkeys• set of attributes of table for which every row has distinct set of values

2. Candidate keys•“minimal” superkeys

3. Primary keys•DBA-chosen candidate keys

As in the E/R Model:

Act as Integrity Constraintsi.e., guard against illegal/invalid instance of given schema

e.g., Branch = (bname, bcity, assets)

bname bcity assets

Brighton

Brighton

Brooklyn

Boston

5M

3M

Invalid!!

Page 7: CMSC424, Spring 2005 CMSC424: Database Design Lecture 4

CMSC424, Spring 2005

More on Keys

Determining Primary KeysIf relation schema derived from E-R diagrams, we can

determine the primary keys using the original entity and relationship sets

Otherwise, same way we do it for E-R diagrams

• Find candidate keys (minimal sets of attributes that can uniquely identify a tuple)

• Designate one of them to be primary key

Foreign KeysIf a relation schema includes the primary key of another

relation schema, that attribute is called the foreign key

Page 8: CMSC424, Spring 2005 CMSC424: Database Design Lecture 4

CMSC424, Spring 2005

Schema Diagram for the Banking Enterprise

Page 9: CMSC424, Spring 2005 CMSC424: Database Design Lecture 4

CMSC424, Spring 2005

Relational Query Languages

Theoretical QL’s give semantics to Practical QL’s

Recall: Query = “Retrieval Program”

Theoretical:1. Relational Algebra2. Relational Calculus

a. Tuple Relational Calculus (TRC)b. Domain Relational Calculus (DRC)

Practical:1. SQL (originally: SEQUEL from System R)2. Quel (used in Ingres)3. Datalog (Prolog-like – used in research lab systems)

Language Examples:

Page 10: CMSC424, Spring 2005 CMSC424: Database Design Lecture 4

CMSC424, Spring 2005

Relational Algebra

Basic Operators1. select ( σ )2. project ( )3. union ( )4. set difference ( – )5. cartesian product ( )6. rename ( ρ )

Relational Operator

Relation

Relation

Relation

Page 11: CMSC424, Spring 2005 CMSC424: Database Design Lecture 4

CMSC424, Spring 2005

Select ( σ )

Notation: σpredicate (Relation)

Relation: Can be name of table, or another query

Predicate:

1. Simple• attribute1 = attribute2

• attribute = constant value (also: ≠, <, >, ≤, ≥)

2. Complex • predicate AND predicate• predicate OR predicate• NOT predicate

Page 12: CMSC424, Spring 2005 CMSC424: Database Design Lecture 4

CMSC424, Spring 2005

Select ( σ )

Examples:

bname bcity assets

Downtown

Brighton

Brooklyn

Brooklyn

9M

7.1M

bname bcity assets

Downtown Brooklyn 9M

σ bcity = “Brooklyn” (branch) =

σ assets > 8M (σ bcity = “Brooklyn” (branch)) =

Notation: σpredicate (Relation)

Page 13: CMSC424, Spring 2005 CMSC424: Database Design Lecture 4

CMSC424, Spring 2005

Project ( )

Notation: A1, …, An (Relation)• Each Ai an attribute

• Idea: selects columns (vs. σ which selects rows)

cstreet, ccity (customer) = cstreet ccityMain

North

Park

Putnam

Nassau

Spring

Alma

Sand Hill

Senator

Walnut

Harrison

Rye

Pittsfield

Stanford

Princeton

Pittsfield

Palo Alto

Woodside

Brooklyn

Stanford

Examples:

Page 14: CMSC424, Spring 2005 CMSC424: Database Design Lecture 4

CMSC424, Spring 2005

Project ( )

Examples:

Notation: A1, …, An (Relation)• Each Ai an attribute

• Idea: selects columns (vs. σ which selects rows)

bcity (σassets > 5M (branch)) = bcity

Brooklyn

Horseneck

Page 15: CMSC424, Spring 2005 CMSC424: Database Design Lecture 4

CMSC424, Spring 2005

Union ( )

Notation: Relation1 Relation2

Example:

(cname (depositor)) (cname (borrower)) = cname

Johnson

Smith

Hayes

Turner

Jones

Lindsay

Jackson

Curry

Williams

Adams

R S valid only if:

1. R, S have same number of columns (arity)

2. R, S corresponding columns have same domain (compatibility)

Page 16: CMSC424, Spring 2005 CMSC424: Database Design Lecture 4

CMSC424, Spring 2005

Set Difference ( – )

bname lno amount

Downtown

Redwood

Perry

Downtown

Perry

L-17

L-23

L-15

L-14

L-16

1000

2000

1500

1500

1300

Notation: Relation1 - Relation2

R - S valid only if:

1. R, S have same number of columns (arity)

2. R, S corresponding columns have same domain (compatibility)

Example:

( bname (σamount ≥ 1000 (loan))) – (bname (σ balance < 800 (account))) =

bname acct_no balance

Mianus

Brighton

Redwood

Brighton

A-215

A-201

A-222

A-217

700

900

700

750

Page 17: CMSC424, Spring 2005 CMSC424: Database Design Lecture 4

CMSC424, Spring 2005

Set Difference ( – )

bname lno amount

Downtown

Redwood

Perry

Downtown

Perry

L-17

L-23

L-15

L-14

L-16

1000

2000

1500

1500

1300

Notation: Relation1 - Relation2

R - S valid only if:

1. R, S have same number of columns (arity)

2. R, S corresponding columns have same domain (compatibility)

Example:

( bname (σamount ≥ 1000 (loan))) – (bname (σ balance < 800 (account))) =

bname acct_no balance

Mianus

Brighton

Redwood

Brighton

A-215

A-201

A-222

A-217

700

900

700

750

–=

bname

Downtown

Perry

Page 18: CMSC424, Spring 2005 CMSC424: Database Design Lecture 4

CMSC424, Spring 2005

Cartesian Product ( )

Notation: Relation1 Relation2

Example:

R S like cross product for mathematical relations: • every tuple of R appended to every tuple of S

depositor borrower = depositor.

cname

acct_no borrower.

cname

lno

Johnson

Johnson

Johnson

Johnson

Johnson

Johnson

Johnson

Johnson

Smith

A-101

A-101

A-101

A-101

A-101

A-101

A-101

A-101

A-215

Jones

Smith

Hayes

Jackson

Curry

Smith

Williams

Adams

Jones

L-17

L-23

L-15

L-14

L-93

L-11

L-17

L-16

L-17

How many tuples in the result?

A: 56

Page 19: CMSC424, Spring 2005 CMSC424: Database Design Lecture 4

CMSC424, Spring 2005

Rename ( ρ )

Notation: identifier (Relation)

renames a relation, or

Notation: identifier0 (identifier1, …, identifiern) (Relation)

renames relation and columns of n-column relation

Use: massage relations to make , – valid, or more readable

Page 20: CMSC424, Spring 2005 CMSC424: Database Design Lecture 4

CMSC424, Spring 2005

Rename ( ρ )

Notation: identifier0 (identifier1, …, identifiern) (Relation)

renames relation and columns of n-column relation

Example:

res (dcname, acctno, bcname, lno) (depositor borrower) =

depositor.

cname

acct_no borrower.

cname

lno

Johnson

Johnson

Johnson

Johnson

Johnson

Johnson

Johnson

Johnson

Smith

A-101

A-101

A-101

A-101

A-101

A-101

A-101

A-101

A-215

Jones

Smith

Hayes

Jackson

Curry

Smith

Williams

Adams

Jones

L-17

L-23

L-15

L-14

L-93

L-11

L-17

L-16

L-17

Page 21: CMSC424, Spring 2005 CMSC424: Database Design Lecture 4

CMSC424, Spring 2005

Rename ( ρ )

Notation: identifier0 (identifier1, …, identifiern) (Relation)

renames relation and columns of n-column relation

Example:

res (dcname, acctno, bcname, lno) (depositor borrower) =

dcname acctno bcname lno

Johnson

Johnson

Johnson

Johnson

Johnson

Johnson

Johnson

Johnson

Smith

A-101

A-101

A-101

A-101

A-101

A-101

A-101

A-101

A-215

Jones

Smith

Hayes

Jackson

Curry

Smith

Williams

Adams

Jones

L-17

L-23

L-15

L-14

L-93

L-11

L-17

L-16

L-17

res =

Page 22: CMSC424, Spring 2005 CMSC424: Database Design Lecture 4

CMSC424, Spring 2005

Example Query in RA

Determine lno’s for loans that are for an amount that is larger than the amt of some other loan. (i.e. lno’s for all non-minimal loans)

Temp1 …

Temp2 … Temp1 …

Can do in steps:

Page 23: CMSC424, Spring 2005 CMSC424: Database Design Lecture 4

CMSC424, Spring 2005

Bank Database: An InstanceAccount

bname acct_no balance

DowntownMianusPerryR.H.

BrightonRedwoodBrighton

A-101A-215A-102A-305A-201A-222A-217

500700400350900700750

Depositor

cname acct_no

JohnsonSmithHayesTurner

JohnsonJones

Lindsay

A-101A-215A-102A-305A-201A-217A-222

Customer

cname cstreet ccity

JonesSmithHayesCurry

LindsayTurner

WilliamsAdamsJohnsonGlennBrooksGreen

MainNorthMainNorthPark

PutnamNassauSpringAlma

Sand HillSenatorWalnut

HarrisonRye

HarrisonRye

PittsfieldStanfordPrincetonPittsfieldPalo AltoWoodsideBrooklynStanford

Branch

bname bcity assets

DowntownRedwood

PerryMianus

R.H.Pownel

N. TownBrighton

BrooklynPalo AltoHorseneckHorseneckHorseneckBennington

RyeBrooklyn

9M2.1M1.7M0.4M8M

0.3M3.7M7.1M

Borrower

cname lno

JonesSmithHayes

JacksonCurrySmith

WilliamsAdams

L-17L-23L-15L-14L-93L-11L-17L-16

Loan

bname lno amt

DowntownRedwood

PerryDowntown

MianusR.H.Perry

L-17L-23L-15L-14L-93L-11L-16

1000200015001500500900

1300

Page 24: CMSC424, Spring 2005 CMSC424: Database Design Lecture 4

CMSC424, Spring 2005

lno amt

L-17

L-23

L-15

L-14

L-93

L-11

L-16

1000

2000

1500

1500

500

900

1300

Example Query in RA

1. Find the base data we need

Temp1 lno,amt (loan)

2. Make a copy of (1)

Temp2 ρ Temp2 (lno2,amt2) (Temp1) lno2 amt2

L-17

L-23

L-15

L-14

L-93

L-11

L-16

1000

2000

1500

1500

500

900

1300

Page 25: CMSC424, Spring 2005 CMSC424: Database Design Lecture 4

CMSC424, Spring 2005

lno amt lno2 amt2

L-17

L-17

L-17

L-23

L-23

L-23

1000

1000

1000

2000

2000

2000

L-17

L-23

L-16

L-17

L-23

L-16

1000

2000

1300

1000

2000

1300

3. Take the cartesian product of 1 and 2

Temp3 Temp1 Temp2

Example Query in RA

Page 26: CMSC424, Spring 2005 CMSC424: Database Design Lecture 4

CMSC424, Spring 2005

lno (

σamt > amt2 (lno,amt (loan) (ρTemp2 (lno2,amt2) (lno,amt (loan)))))

4. Select non-minimal loans

Temp4 σamt > amt2 (Temp3)

5. Project on lno

Result lno (Temp4)

Example Query in RA

… or, if you prefer…

Page 27: CMSC424, Spring 2005 CMSC424: Database Design Lecture 4

CMSC424, Spring 2005

What we learned so far…

Relational Algebra Operators

1. Select

2. Project

3. Set Union

4. Set Difference

5. Cartesian Product

6. Rename

These are called fundamental operations

Page 28: CMSC424, Spring 2005 CMSC424: Database Design Lecture 4

CMSC424, Spring 2005

Formal Definition

Basic expressionA relation in the databaseA constant relation

e.g. {(A-101, Downtown, 500), (A-215, Mianus, 700)…}

Let E1 and E2 be two relational-algebra expressions, then the following are also:

1. σP(E1), where P is a predicate on attributes in E1

2. pS(E1)where S is a list containing some attributes in E1

3. E1 E2,

4. E1 – E2

5. E1 E2

6. ρx(E1), where x is the new name for the result of E1

Page 29: CMSC424, Spring 2005 CMSC424: Database Design Lecture 4

CMSC424, Spring 2005

Relational AlgebraRedundant Operators

4. Update ( ) (we’ve already been using)

2. Division ( )

1. Natural Join ( )

3. Outer Joins ( )

• Redundant: Above can be expressed in terms of minimal RA e.g. depositor borrower =

π …(σ…(depositor ρ…(borrower)))

• Added as convenience

Page 30: CMSC424, Spring 2005 CMSC424: Database Design Lecture 4

CMSC424, Spring 2005

Natural Join

Idea: combines ρ, , σ

A B C D

1

2

2

3

α

α

α

β

+

-

-

+

10

10

20

10

E B D

‘a’

‘a’

‘b’

‘c’

α

α

β

β

10

20

10

10

r s

A B C D E

1

2

2

3

3

α

α

α

β

β

+

-

-

+

+

10

10

20

10

10

‘a’

‘a’

‘a’

‘b’

‘c’

=

Relation1 Relation2Notation:

πcname,acct_no,lno (σcname=cname2 (depositor ρt(cname2,lno) (borrower)))

depositor borrower

Page 31: CMSC424, Spring 2005 CMSC424: Database Design Lecture 4

CMSC424, Spring 2005

Division

A B

α

α

α

β

γ

γ

γ

γ

δ

δ

1

23

1

1

3

4

6

1

2

B

1

2

r

s

A

α

δ

=

Query: Find values for A in r which have corresponding B values for all B values in s

Relation1 Relation2Notation:

Idea: expresses “for all” queries

Page 32: CMSC424, Spring 2005 CMSC424: Database Design Lecture 4

CMSC424, Spring 2005

Division

A B

α

α

α

β

γ

γ

γ

γ

δ

δ

1

23

1

1

3

4

6

1

2

B

1

2

r

s

A

α

δ=

17 3 = 5

The largest value of i such

that: i 3 ≤ 17

t

Relational Division

The largest value of t such that:( t s r )

Another way to look at it: and

Page 33: CMSC424, Spring 2005 CMSC424: Database Design Lecture 4

CMSC424, Spring 2005

A B C D E

α

α

α

β

β

γ

γ

γ

a

a

a

a

a

a

a

a

α

γ

γ

γ

γ

γ

γ

β

a

a

b

a

b

a

b

b

1

1

1

1

3

1

1

1

D E

a

b

1

1

r

s

A B C

α

γ

a

a

γγ

=

t

?

Division

A More Complex Example