cmsc424, spring 2005 cmsc424: database design lecture 4
Post on 21-Dec-2015
221 views
TRANSCRIPT
CMSC424, Spring 2005
CMSC424: Database Design
Lecture 4
CMSC424, Spring 2005
Review: Relational Data Model Key Abstraction: Relation
Given sets: R = {1, 2, 3}, S = {3, 4}
• R S = { (1, 3), (1, 4), (2, 3), (2, 4), (3, 3), (3, 4) }
• A relation on R, S is any subset () of R S (e.g: { (1, 4), (3, 4)})
Mathematical relations
Account Branches Accounts Balances{ (Downtown, A-101, 500),
(Brighton, A-201, 900),(Brighton, A-217, 500) }
Database relations
Given attribute domainsBranches = { Downtown, Brighton, … }Accounts = { A-101, A-201, A-217, … }Balances = R
bname acct_no balance
Downtown
Brighton
Brighton
A-101
A-201
A-217
500
900
500
CMSC424, Spring 2005
Review: Terms and Definitions
1. Tables = Relations
2. Columns = Attributes
3. Rows = Tuples
4. Relation Schema (or Schema) A list of attributes and their domains
We will require the domains to be atomic
E.g. account(account-number, branch-name, balance)
5. Relation Instance A particular instantiation of a relation with actual values
Will change with time
CMSC424, Spring 2005
Bank Database: SchemaAccount
bname acct_no balance
Depositor
cname acct_no
Customer
cname cstreet ccity
Branch
bname bcity assets
Borrower
cname lno
Loan
bname lno amt
CMSC424, Spring 2005
Bank Database: An InstanceAccount
bname acct_no balance
DowntownMianusPerryR.H.
BrightonRedwoodBrighton
A-101A-215A-102A-305A-201A-222A-217
500700400350900700750
Depositor
cname acct_no
JohnsonSmithHayesTurner
JohnsonJones
Lindsay
A-101A-215A-102A-305A-201A-217A-222
Customer
cname cstreet ccity
JonesSmithHayesCurry
LindsayTurner
WilliamsAdamsJohnsonGlennBrooksGreen
MainNorthMainNorthPark
PutnamNassauSpringAlma
Sand HillSenatorWalnut
HarrisonRye
HarrisonRye
PittsfieldStanfordPrincetonPittsfieldPalo AltoWoodsideBrooklynStanford
Branch
bname bcity assets
DowntownRedwood
PerryMianus
R.H.Pownel
N. TownBrighton
BrooklynPalo AltoHorseneckHorseneckHorseneckBennington
RyeBrooklyn
9M2.1M1.7M0.4M8M
0.3M3.7M7.1M
Borrower
cname lno
JonesSmithHayes
JacksonCurrySmith
WilliamsAdams
L-17L-23L-15L-14L-93L-11L-17L-16
Loan
bname lno amt
DowntownRedwood
PerryDowntown
MianusR.H.Perry
L-17L-23L-15L-14L-93L-11L-16
1000200015001500500900
1300
CMSC424, Spring 2005
Review: Keys and Relations
1. Superkeys• set of attributes of table for which every row has distinct set of values
2. Candidate keys•“minimal” superkeys
3. Primary keys•DBA-chosen candidate keys
As in the E/R Model:
Act as Integrity Constraintsi.e., guard against illegal/invalid instance of given schema
e.g., Branch = (bname, bcity, assets)
bname bcity assets
Brighton
Brighton
Brooklyn
Boston
5M
3M
Invalid!!
CMSC424, Spring 2005
More on Keys
Determining Primary KeysIf relation schema derived from E-R diagrams, we can
determine the primary keys using the original entity and relationship sets
Otherwise, same way we do it for E-R diagrams
• Find candidate keys (minimal sets of attributes that can uniquely identify a tuple)
• Designate one of them to be primary key
Foreign KeysIf a relation schema includes the primary key of another
relation schema, that attribute is called the foreign key
CMSC424, Spring 2005
Schema Diagram for the Banking Enterprise
CMSC424, Spring 2005
Relational Query Languages
Theoretical QL’s give semantics to Practical QL’s
Recall: Query = “Retrieval Program”
Theoretical:1. Relational Algebra2. Relational Calculus
a. Tuple Relational Calculus (TRC)b. Domain Relational Calculus (DRC)
Practical:1. SQL (originally: SEQUEL from System R)2. Quel (used in Ingres)3. Datalog (Prolog-like – used in research lab systems)
Language Examples:
CMSC424, Spring 2005
Relational Algebra
Basic Operators1. select ( σ )2. project ( )3. union ( )4. set difference ( – )5. cartesian product ( )6. rename ( ρ )
Relational Operator
Relation
Relation
Relation
CMSC424, Spring 2005
Select ( σ )
Notation: σpredicate (Relation)
Relation: Can be name of table, or another query
Predicate:
1. Simple• attribute1 = attribute2
• attribute = constant value (also: ≠, <, >, ≤, ≥)
2. Complex • predicate AND predicate• predicate OR predicate• NOT predicate
CMSC424, Spring 2005
Select ( σ )
Examples:
bname bcity assets
Downtown
Brighton
Brooklyn
Brooklyn
9M
7.1M
bname bcity assets
Downtown Brooklyn 9M
σ bcity = “Brooklyn” (branch) =
σ assets > 8M (σ bcity = “Brooklyn” (branch)) =
Notation: σpredicate (Relation)
CMSC424, Spring 2005
Project ( )
Notation: A1, …, An (Relation)• Each Ai an attribute
• Idea: selects columns (vs. σ which selects rows)
cstreet, ccity (customer) = cstreet ccityMain
North
Park
Putnam
Nassau
Spring
Alma
Sand Hill
Senator
Walnut
Harrison
Rye
Pittsfield
Stanford
Princeton
Pittsfield
Palo Alto
Woodside
Brooklyn
Stanford
Examples:
CMSC424, Spring 2005
Project ( )
Examples:
Notation: A1, …, An (Relation)• Each Ai an attribute
• Idea: selects columns (vs. σ which selects rows)
bcity (σassets > 5M (branch)) = bcity
Brooklyn
Horseneck
CMSC424, Spring 2005
Union ( )
Notation: Relation1 Relation2
Example:
(cname (depositor)) (cname (borrower)) = cname
Johnson
Smith
Hayes
Turner
Jones
Lindsay
Jackson
Curry
Williams
Adams
R S valid only if:
1. R, S have same number of columns (arity)
2. R, S corresponding columns have same domain (compatibility)
CMSC424, Spring 2005
Set Difference ( – )
bname lno amount
Downtown
Redwood
Perry
Downtown
Perry
L-17
L-23
L-15
L-14
L-16
1000
2000
1500
1500
1300
Notation: Relation1 - Relation2
R - S valid only if:
1. R, S have same number of columns (arity)
2. R, S corresponding columns have same domain (compatibility)
Example:
( bname (σamount ≥ 1000 (loan))) – (bname (σ balance < 800 (account))) =
bname acct_no balance
Mianus
Brighton
Redwood
Brighton
A-215
A-201
A-222
A-217
700
900
700
750
CMSC424, Spring 2005
Set Difference ( – )
bname lno amount
Downtown
Redwood
Perry
Downtown
Perry
L-17
L-23
L-15
L-14
L-16
1000
2000
1500
1500
1300
Notation: Relation1 - Relation2
R - S valid only if:
1. R, S have same number of columns (arity)
2. R, S corresponding columns have same domain (compatibility)
Example:
( bname (σamount ≥ 1000 (loan))) – (bname (σ balance < 800 (account))) =
bname acct_no balance
Mianus
Brighton
Redwood
Brighton
A-215
A-201
A-222
A-217
700
900
700
750
–=
bname
Downtown
Perry
CMSC424, Spring 2005
Cartesian Product ( )
Notation: Relation1 Relation2
Example:
R S like cross product for mathematical relations: • every tuple of R appended to every tuple of S
depositor borrower = depositor.
cname
acct_no borrower.
cname
lno
Johnson
Johnson
Johnson
Johnson
Johnson
Johnson
Johnson
Johnson
Smith
…
A-101
A-101
A-101
A-101
A-101
A-101
A-101
A-101
A-215
…
Jones
Smith
Hayes
Jackson
Curry
Smith
Williams
Adams
Jones
…
L-17
L-23
L-15
L-14
L-93
L-11
L-17
L-16
L-17
…
How many tuples in the result?
A: 56
CMSC424, Spring 2005
Rename ( ρ )
Notation: identifier (Relation)
renames a relation, or
Notation: identifier0 (identifier1, …, identifiern) (Relation)
renames relation and columns of n-column relation
Use: massage relations to make , – valid, or more readable
CMSC424, Spring 2005
Rename ( ρ )
Notation: identifier0 (identifier1, …, identifiern) (Relation)
renames relation and columns of n-column relation
Example:
res (dcname, acctno, bcname, lno) (depositor borrower) =
depositor.
cname
acct_no borrower.
cname
lno
Johnson
Johnson
Johnson
Johnson
Johnson
Johnson
Johnson
Johnson
Smith
…
A-101
A-101
A-101
A-101
A-101
A-101
A-101
A-101
A-215
…
Jones
Smith
Hayes
Jackson
Curry
Smith
Williams
Adams
Jones
…
L-17
L-23
L-15
L-14
L-93
L-11
L-17
L-16
L-17
…
CMSC424, Spring 2005
Rename ( ρ )
Notation: identifier0 (identifier1, …, identifiern) (Relation)
renames relation and columns of n-column relation
Example:
res (dcname, acctno, bcname, lno) (depositor borrower) =
dcname acctno bcname lno
Johnson
Johnson
Johnson
Johnson
Johnson
Johnson
Johnson
Johnson
Smith
…
A-101
A-101
A-101
A-101
A-101
A-101
A-101
A-101
A-215
…
Jones
Smith
Hayes
Jackson
Curry
Smith
Williams
Adams
Jones
…
L-17
L-23
L-15
L-14
L-93
L-11
L-17
L-16
L-17
…
res =
CMSC424, Spring 2005
Example Query in RA
Determine lno’s for loans that are for an amount that is larger than the amt of some other loan. (i.e. lno’s for all non-minimal loans)
Temp1 …
Temp2 … Temp1 …
…
Can do in steps:
CMSC424, Spring 2005
Bank Database: An InstanceAccount
bname acct_no balance
DowntownMianusPerryR.H.
BrightonRedwoodBrighton
A-101A-215A-102A-305A-201A-222A-217
500700400350900700750
Depositor
cname acct_no
JohnsonSmithHayesTurner
JohnsonJones
Lindsay
A-101A-215A-102A-305A-201A-217A-222
Customer
cname cstreet ccity
JonesSmithHayesCurry
LindsayTurner
WilliamsAdamsJohnsonGlennBrooksGreen
MainNorthMainNorthPark
PutnamNassauSpringAlma
Sand HillSenatorWalnut
HarrisonRye
HarrisonRye
PittsfieldStanfordPrincetonPittsfieldPalo AltoWoodsideBrooklynStanford
Branch
bname bcity assets
DowntownRedwood
PerryMianus
R.H.Pownel
N. TownBrighton
BrooklynPalo AltoHorseneckHorseneckHorseneckBennington
RyeBrooklyn
9M2.1M1.7M0.4M8M
0.3M3.7M7.1M
Borrower
cname lno
JonesSmithHayes
JacksonCurrySmith
WilliamsAdams
L-17L-23L-15L-14L-93L-11L-17L-16
Loan
bname lno amt
DowntownRedwood
PerryDowntown
MianusR.H.Perry
L-17L-23L-15L-14L-93L-11L-16
1000200015001500500900
1300
CMSC424, Spring 2005
lno amt
L-17
L-23
L-15
L-14
L-93
L-11
L-16
1000
2000
1500
1500
500
900
1300
Example Query in RA
1. Find the base data we need
Temp1 lno,amt (loan)
2. Make a copy of (1)
Temp2 ρ Temp2 (lno2,amt2) (Temp1) lno2 amt2
L-17
L-23
L-15
L-14
L-93
L-11
L-16
1000
2000
1500
1500
500
900
1300
CMSC424, Spring 2005
lno amt lno2 amt2
L-17
L-17
…
L-17
L-23
L-23
…
L-23
…
1000
1000
…
1000
2000
2000
…
2000
…
L-17
L-23
…
L-16
L-17
L-23
…
L-16
…
1000
2000
…
1300
1000
2000
…
1300
…
3. Take the cartesian product of 1 and 2
Temp3 Temp1 Temp2
Example Query in RA
CMSC424, Spring 2005
lno (
σamt > amt2 (lno,amt (loan) (ρTemp2 (lno2,amt2) (lno,amt (loan)))))
4. Select non-minimal loans
Temp4 σamt > amt2 (Temp3)
5. Project on lno
Result lno (Temp4)
Example Query in RA
… or, if you prefer…
CMSC424, Spring 2005
What we learned so far…
Relational Algebra Operators
1. Select
2. Project
3. Set Union
4. Set Difference
5. Cartesian Product
6. Rename
These are called fundamental operations
CMSC424, Spring 2005
Formal Definition
Basic expressionA relation in the databaseA constant relation
e.g. {(A-101, Downtown, 500), (A-215, Mianus, 700)…}
Let E1 and E2 be two relational-algebra expressions, then the following are also:
1. σP(E1), where P is a predicate on attributes in E1
2. pS(E1)where S is a list containing some attributes in E1
3. E1 E2,
4. E1 – E2
5. E1 E2
6. ρx(E1), where x is the new name for the result of E1
CMSC424, Spring 2005
Relational AlgebraRedundant Operators
4. Update ( ) (we’ve already been using)
2. Division ( )
1. Natural Join ( )
3. Outer Joins ( )
• Redundant: Above can be expressed in terms of minimal RA e.g. depositor borrower =
π …(σ…(depositor ρ…(borrower)))
• Added as convenience
CMSC424, Spring 2005
Natural Join
Idea: combines ρ, , σ
A B C D
1
2
2
3
α
α
α
β
+
-
-
+
10
10
20
10
E B D
‘a’
‘a’
‘b’
‘c’
α
α
β
β
10
20
10
10
r s
A B C D E
1
2
2
3
3
α
α
α
β
β
+
-
-
+
+
10
10
20
10
10
‘a’
‘a’
‘a’
‘b’
‘c’
=
Relation1 Relation2Notation:
πcname,acct_no,lno (σcname=cname2 (depositor ρt(cname2,lno) (borrower)))
≡
depositor borrower
CMSC424, Spring 2005
Division
A B
α
α
α
β
γ
γ
γ
γ
δ
δ
1
23
1
1
3
4
6
1
2
B
1
2
r
s
A
α
δ
=
Query: Find values for A in r which have corresponding B values for all B values in s
Relation1 Relation2Notation:
Idea: expresses “for all” queries
CMSC424, Spring 2005
Division
A B
α
α
α
β
γ
γ
γ
γ
δ
δ
1
23
1
1
3
4
6
1
2
B
1
2
r
s
A
α
δ=
17 3 = 5
The largest value of i such
that: i 3 ≤ 17
t
Relational Division
The largest value of t such that:( t s r )
Another way to look at it: and
CMSC424, Spring 2005
A B C D E
α
α
α
β
β
γ
γ
γ
a
a
a
a
a
a
a
a
α
γ
γ
γ
γ
γ
γ
β
a
a
b
a
b
a
b
b
1
1
1
1
3
1
1
1
D E
a
b
1
1
r
s
A B C
α
γ
a
a
γγ
=
t
?
Division
A More Complex Example