cs304: database systemsdjmvfb/courses/cs2300/static...displaying a relational database schema and...
TRANSCRIPT
CS2300: File Structures and
Introduction to Database Systems
Lecture 9: Relational Model &
Relational Algebra
Doug McGeehan
1
222
Brief Review
• Relational model conceptsInformal Terms Formal Terms
Table Relation
Column Attribute
Row Tuple
All possible values in a column Domain
Table Definition Schema of Relation
Populated Table Extension/State
333
Brief Review
• Relational model concepts
• Keys and Superkeys
– e.g. Vehicle(VIN, Reg#, State, MPG, Odometer)
– Keys: {VIN} and {Reg#, State}
– Superkeys: {VIN, Odometer}
or {Reg#, State, MPG}
• They both include a key; one attribute can be removed
– Candidate keys: {VIN}, {Reg#, State}
– Primary key: {VIN}
444
Brief Review
• Relational model concepts
• Keys and Superkeys
• Relational model constraints
– Domain / NOT NULL constraints (on attributes)
– Key constraints (on a single relation)
– Entity integrity constraint (on a single relation)
– Referential integrity constraint (on two relations)
Referential Integrity
A referential integrity constraint can be
displayed in a relational database schema as
a directed arc from R1.FK to R2.
SID Name Address SID Course Grade
R1 R2
5
Valid and Invalid State
• Valid state: A database state satisfies all
integrity constraints.
• Invalid state: A database state that does not
obey some integrity constraint(s).
SID Name Address
101 Alice Rolla
SID Course Grade
111 CS238 A
101 CS304 A
R1 R2Invalid state
6
Valid and Invalid State
• Valid state: A database state satisfies all
integrity constraints.
• Invalid state: A database state that does not
obey some integrity constraint(s).
SID Name Address
101 Alice Rolla
SID Course Grade
101 CS238 A
101 CS304 A
R1 R2Valid state
7
8
Displaying a relational database
schema and its constraints
• Relation schema: displayed as a row of attribute names
• Name of the relation: above the attribute names
• Primary key attribute(s): underlined
• Foreign key (referential integrity) constraint:– A directed arc (arrow) from the foreign key attributes to the
referenced relation
– For clarity, can also point to the primary key of the referenced relation.
How to identify a foreign key?
• Start from a Candidate Key
• Check if it is referenced by other
relations or even in the same relation
9
SID Name Address SID Course Grade
R1 R2
10
Referencing and
referenced relations
can be the same
11
ExerciseConsider the following relations for a database that keeps track of student
enrollment in courses and the books adopted for each course:
STUDENT(SSN, Name, Major, Bdate)
COURSE(Course#, Cname, Dept)
ENROLL(SSN, Course#, Quarter, Grade)
BOOK_ADOPTION(Course#, Quarter, Book_ISBN)
TEXT(Book_ISBN, Book_Title, Publisher, Author)
Draw a relational schema diagram specifying the foreign keys for
this schema.
11
Question
• Car(State, Reg#, SerialNo, Make, Model, Year)
– Car has two candidate keys: {state, reg#}, {SerialNo}
• Accident(SerialNo, date)
Is SerialNo a foreign key in Car or Accident relation?
Answer: SerialNo is a foreign key in Accident relation
because it is a candidate key in Car relation.
12
Operations of the Relational Model
• Operations can be categorized into
– Retrieval
• Query a database
– Updates
• Change the database
• Basic update operations for changing the database:
– INSERT a new tuple in a relation
– DELETE an existing tuple from a relation
– MODIFY an attribute of an existing tuple
13
Update Operations on Relations
• Integrity constraints should not be
violated by the update operations.
• Updates may propagate to cause
other updates automatically.
– This may be necessary to maintain
integrity constraints.
14
Example Relations
coursesidtid
os20220
calculus45010
db 20220
db24020 depttnametid
mathcohen10
cslevy20
yearsnamesid
3white240
3jones202
1adams450
S
T
R
15
Example Relations
coursesidtid
os20220
calculus45010
db 20220
db24020 depttnametid
mathcohen10
cslevy20
yearsnamesid
3white240
3jones200
1adams450
S
T
R
16
Example Relations
coursesidtid
os20020
calculus45010
db 20020
db24020 depttnametid
mathcohen10
cslevy20
yearsnamesid
3white240
3jones200
1adams450
S
T
R
17
Possible Violations for Insert Operation
• INSERT may violate any of the constraints:
– Domain constraint / NOT NULL constraint:
• A new attribute in new tuple is not in attribute’s domain
– Key constraint:
• Key attribute in new tuple already exists in its relation
– Entity integrity:
• Primary key value is NULL in the new tuple
– Referential integrity:
• Foreign key in new tuple references non-existent primary key
18
Possible Violations for Delete Operation
• DELETE may violate only referential integrity:
– If the primary key value of the tuple being deleted is
referenced from other tuples in the database
– Can be remedied by several actions
• Reject the deletion
• Propagate the change to the referencing tuples
• Set the foreign keys of the referencing tuples to NULL
when the referencing attributes are not part of the
primary key
– One of the above options must be specified during
database design for each foreign key constraint
19
Possible Violations for Modify Operation
• MODIFY may violate any of the constraints when
– Updating the primary key (PK)
• Similar to a DELETE followed by an INSERT
• Need to specify similar remedies to DELETE
– Updating a foreign key (FK)
• May violate referential integrity
– Updating an ordinary attribute (neither PK nor FK):
• Can only violate domain constraints
or NOT NULL constraint
20
How to Deal With Violations?
• In case of integrity violation caused by any operation,
several actions can be taken:
– Reject the operation that causes the violation
– Correct the violation by triggering additional
updates
– Perform the operation but inform the user of the
violation
– Execute a user-specified error-correction routine
21
Relational Algebra
Chapter 8
22
What is an “Algebra”?
• Mathematical system consisting of:
– Operands : variables or values from which
new values can be constructed.
– Operators : symbols denoting procedures
that construct new values from given
values.
23
What is Relational Algebra?
• The relational model is an abstract
(mathematical) modeling of a table
• Relational algebra (RA): an algebra whose
operands are relations or variables that
represent relations.
– A (mathematical) query language for relations
– Query language = languages for writing questions
about the data
24
Why do we need to understand RA?
• Real queries are written in SQL, but are
translated by the query processor into
relational algebra
• Why?
– SQL is declarative, RA provides operations for
execution
– Optimization is easier in RA, since we can take
advantage of (provable) expression equivalences
25
What you should know from this course
1. How to write queries in relational algebra
2. How to calculate the result of a relational
algebra expression over a set of relations
3. How to determine whether two relational
algebra expressions are equivalent
26
Relational Algebra
• Relational algebra has a collection of operators
on relations
• Operators may be unary or binary
• The output of an operator is a relation
– Another way to say this is that the algebra is “closed”
– Therefore, operators can be composed one on
another
27
Basic Operators
• Relational Algebra has 5 basic operators:
– Project (π)
– Select ()
– Union (U)
– Set difference (-)
– Cartesian product (X)
• Other operators can be defined using the basic ones:
Intersection, Join, Division …
• A useful syntactic operator: Rename
28
Example Relations
coursesidtid
os20220
calculus45010
db 20220
db24020 depttnametid
mathcohen10
cslevy20
yearsnamesid
3white240
3jones202
1adams450
S = Students
T = Teachers
R = Studies
S
T
R
29
The Project Operation
• The Project operation is unary (i.e., it is
applied to a single relation)
• Denoted as: A1,…,An (R)
– A1,…,An are attributes
• Returns a new relation of only A1,…,An
from the original relation
• Output relation does not have a name
30
Example: Find the teacher’s ID (tid) for each course
coursetid
os20
calculus10
db20
coursesidtid
os20220
calculus45010
db20220
db24020
R
Less tuples in result. Why?
tid,course R
31
The Project Operation
• Duplicate elimination
– The Project operation removes any duplicate tuples,
so the output is a valid relation
• When computing a projection on a relation with
n tuples:
– What is the minimum cardinality of the result?
– What is the maximum cardinality?
32
The Project Operation
• list1 (list2 R) = list1 R
• Note:
– list2 must contain attributes in list1
– i.e. Intersection must not be empty
– Otherwise, the left-hand side is incorrect.
33
The Select Operation
• Unary operator, written as C(R)
– C is a Boolean condition over each single
tuple in R
– e.g. name = ‘Doug’(Instructors)
• Returns the tuples that satisfy C
34
Example
• Return the courses taught by teacher number 20
35
coursesidtid
os20220
db20220
db24020
coursesidtid
os20220
calculus45010
db20220
db24020
Rtid = 20 R
What types of Conditions can be used?
• The condition is made up of comparisons that
are connected using logical operators (and, or)
• Comparisons are between 2 attributes or
between an attribute and a constant
– attribute1 op attribute2 (e.g. balance<credit_limit)
– attribute1 op constant (e.g. tid=20)
Important! Conditions are evaluated a
single tuple at a time. 36
Types of Comparisons
• We can use any of the operators:
• When comparing an attribute with a
string, the string is written in single
quotes
= ,,,,,
)('' Tcohentname=37
The Select Operation
• Examples
– tid = 20 and course = `os’ R
38
coursesidtid
os20220
calculus45010
db20220
db24020
R
What are the results
of the above queries?
The Select Operation
• Examples
– tid = 20 and course = `os’ R
– tid = 10 and course = `db’ R
39
coursesidtid
os20220
calculus45010
db20220
db24020
R
What are the results
of the above queries?
The Select Operation
• Examples
– tid = 20 and course = `os’ R
– tid = 10 and course = `db’ R
– tid = 20 or sid=202 or sid=450 R
40
coursesidtid
os20220
calculus45010
db20220
db24020
R
What are the results
of the above queries?
The Select Operation
• c1 (c2 (R)) = c2 (c1 (R))
= c1 and c2 (R)
• When computing a selection on a relation
with n tuples,
– Minimum cardinality of result?
– Maximum cardinality of result?
41
Combining Selection and Projection
• What does this compute?
42
coursesidtid
os20220
calculus45010
db20220
db24020
Can we change
the order of
these two
operations?
course(tid = 20 R)
42
Example
• How would you find the names of the
third year students?
43
S yearsnamesid
3white240
3jones202
1adams450
sname
white
Jones
?
43
Example
• Find id of students named jones who are in
their first year or third year of studies
44
))(( )31('' Syearyearjonessnamesid ===
S yearsnamesid
3white240
3jones202
1adams450
sid
202
44
Once Again
45
• What is in the result of the query when
we have this instance of S?
S yearsnamesid
3white240
3jones202
1jones450
sid
202
45045
))(( )31('' Syearyearjonessnamesid ===
What’s Next
• More relational algebra operators …
• Project Phase 1 report due Monday
11:59pm
46