lect#6 - relational algebra.ppt
TRANSCRIPT
Relational Algebra
Lesson Outcomes2
By the end of the lesson students should be able to:
Relate the use of relational algebra in database manipulation
Identify the basic symbols used in relational algebra
Express database queries in the form of relational algebra expressions
What is Relational Algebra (RA) ?3
A query language (QL) is a language that allows users to manipulate and retrieve data from a database.
The relational model supports simple, powerful QLs (having strong formal foundation based on logics, allow for much optimization)
Two (mathematical) query languages form the basis for “real" languages (e.g., SQL) and for implementation. One of them is relational algebra
Relational Algebra is useful for representing query execution plans, and query optimization techniques.
Relational Algebra comprises a set of basic operations. An operation is the application of an operator to one or more
source (or input) relations to produce a new relation as a result. Relational algebra has the property of closure, that is the
output from each of the operations is another relation.
5 Basic RA Operations• Projection ( ) Retains only wanted columns from
relation (vertical).• Selection ( ) Selects a subset of rows from
relation (horizontal).• Cross-product (x) Allows us to combine two
relations.• Set-difference (–) Tuples in R1, but not in R2.• Union ( ) Tuples in R1 and/or in R2.
NOTE: R1 = relation/table 1 R2 = relation/table 2
s
p
Example InstancesS2Boats
bid bname color 101 Interlake blue 102 Interlake red 103 Clipper green 104 Marine red
sid bid day 22 101 10/10/96 58 103 11/12/96
sid sname rating age 28 yuppy 9 35.0 31 lubber 8 55.5 44 guppy 5 35.0 58 rusty 10 35.0
Rents
Projection
page S( )2
Retains only attributes that are in the “projection list”.
Examples:
The 1st query will produce a relation with only one column which is age. While the 2nd query produces a relation with two columns: sname and rating. The number of rows depends on the number of unique content. e.g. 1st query, 2 rows, 2nd query, 4 rows
Projection operator eliminates duplicates rows
p sname rating S, ( )2
Projection
)2(, Sratingsnamep
page S( )2
sname rating yuppy 9 lubber 8 guppy 5 rusty 10
age 35.0 55.5
Eliminates redundant
35.0
Selection (s)
s rating S>8 2( )
Selects rows that satisfy selection condition. Can also have multiple conditions
Result is a relation with same number of columns BUT probably lesser number of rows.
Example:
sid sname rating age 28 yuppy 9 35.0 31 lubber 8 55.5 44 guppy 5 35.0 58 rusty 10 35.0
Union and Set-Difference All of these operations take two input
relations, which must be union-compatible: Same number of fields. `Corresponding’ fields have the same
type.R1 U R2 R1 - R2 R2 - R1
Union
S1
S2
sid sname rating age 22 dustin 7 45.0 31 lubber 8 55.5 58 rusty 10 35.0
sid sname rating age 28 yuppy 9 35.0 31 lubber 8 55.5 44 guppy 5 35.0 58 rusty 10 35.0
sid sname rating age 22 dustin 7 45.0 31 lubber 8 55.5 58 rusty 10 35.0 44 guppy 5 35.0 28 yuppy 9 35.0
S1 U S2
Eliminates duplicates
Set Difference
S1
S2
S S1 2-
S2 – S1
sid sname rating age 22 dustin 7 45.0 31 lubber 8 55.5 58 rusty 10 35.0
sid sname rating age 28 yuppy 9 35.0 31 lubber 8 55.5 44 guppy 5 35.0 58 rusty 10 35.0
sid sname rating age 22 dustin 7 45.0
sid sname rating age 28 yuppy 9 35.0 44 guppy 5 35.0
Cross-Product R1 x R2: Each row of R1 (m rows) paired with each row of
R2 (n rows). Q: How many rows in the result? m x n• Result schema has one field per field of R1 and R2, with
field names `inherited’ if possible. May have a naming conflict: Both R1 and R2 have a field
with the same name. In this case, can use the renaming operator (OPTIONAL):
X (E) - rename expression E as x (A1, A2, …, An) (E) = rename attributes in E as A1, A2,..,An
Cross Product Example
R1R2
R1 X R2 =
sid sname rating age 22 dustin 7 45.0 31 lubber 8 55.5 58 rusty 10 35.0
sid bid day 22 101 10/10/96 58 103 11/12/96
(sid) sname rating age (sid) bid day 22 dustin 7 45.0 22 101 10/ 10/96 22 dustin 7 45.0 58 103 11/ 12/96 31 lubber 8 55.5 22 101 10/ 10/96 31 lubber 8 55.5 58 103 11/ 12/96 58 rusty 10 35.0 22 101 10/ 10/96 58 rusty 10 35.0 58 103 11/ 12/96
Compound Operator: Intersection
In addition to the 5 basic operators, there are several additional “Compound Operators” These add no computational power to the
language, but are useful shorthands. Can be expressed solely with the basic
operations.
Intersection takes two input relations, which must be union-compatible.
Q: How to express it using basic operators?R S = R - (R - S)
Intersection
S1
S2
sid sname rating age 31 lubber 8 55.5 58 rusty 10 35.0
sid sname rating age 22 dustin 7 45.0 31 lubber 8 55.5 58 rusty 10 35.0
sid sname rating age 28 yuppy 9 35.0 31 lubber 8 55.5 44 guppy 5 35.0 58 rusty 10 35.0
S1 n S2
Compound Operator: Join Joins are compound operators involving cross
product, selection, and (sometimes) projection. Most common type of join is a “natural join” (often
just called “join”). R S conceptually is: Compute R X S Select rows where attributes that appear in both relations have equal
values and domains Project all unique atttributes and one copy of each of the common
ones. Note: Usually done much more efficiently than this. Useful for putting “normalized” relations back
together.
Natural Join Example
R1S1
S1 R1 =
sid sname rating age bid day22 dustin 7 45.0 101 10/10/9658 rusty 10 35.0 103 11/12/96
sid sname rating age 22 dustin 7 45.0 31 lubber 8 55.5 58 rusty 10 35.0
sid bid day 22 101 10/10/96 58 103 11/12/96
“Theta” Join Examplesid sname rating age22 dustin 7 45.031 lubber 8 55.558 rusty 10 35.0
sid bid day22 101 10/10/9658 103 11/12/96
R1 S1
S1>< S1.sid < R1.sid R1 =
(sid) sname rating age (sid) bid day22 dustin 7 45.0 58 103 11/12/9631 lubber 8 55.5 58 103 11/12/96
Outer Join
• An extension of the join operation that avoids loss of information.
• Computes the join and then adds tuples form one relation that does not match tuples in the other relation to the result of the join.
• Uses null values:null signifies that the value is unknown or does
not exist
Outer Join – Example
Relation loan
customer-name loan-number
JonesSmithHayes
L-170L-230L-155
300040001700
loan-number amount
L-170L-230L-260
branch-name
DowntownRedwoodPerryridge
Relation borrower
Left Outer JoinJoin
loan Borrower
loan-number amount
L-170L-230
30004000
customer-name
JonesSmith
branch-name
DowntownRedwood
JonesSmithnull
loan-number amount
L-170L-230L-260
300040001700
customer-namebranch-name
DowntownRedwoodPerryridge
Left Outer Join
loan Borrower
Right Outer Join, Full Outer JoinRight Outer Join loan borrower
loan borrowerFull Outer Join
loan-number amount
L-170L-230L-155
30004000null
customer-name
JonesSmithHayes
branch-name
DowntownRedwood
null
loan-number amount
L-170L-230L-260L-155
300040001700null
customer-name
JonesSmithnull
Hayes
branch-name
DowntownRedwoodPerryridge
null
Compound Operator: Division
Division is useful for expressing “for all” queries like: Find sids of sailors who have reserved all boats.
For A/B attributes of B are subset of attributes of A. May need to “project” to make this happen.
E.g., let A have 2 fields, x and y; B have only field y:
A/B contains all x tuples such that for every y tuple in B, there is an xy tuple in A.
Examples of Division A/B (AB)sno pno s1 p1 s1 p2 s1 p3 s1 p4 s2 p1 s2 p2 s3 p2 s4 p2 s4 p4
pnop2
pnop2p4
pnop1p2p4
snos1s2s3s4
snos1s4 sno
s1A
B1B2
B3
A/B1 A/B2 A/B3
Another Division Example
A B
aaaaaaaa
C D
aabababb
E11113111
Relations r, s:
r s:
Dab
E11
A B
aa
C
s
Aggregate Functions and Operations
• Aggregation function takes a collection of values and returns a single value as a result.avg: average valuemin: minimum valuemax: maximum valuesum: sum of valuescount: number of values
• Aggregate operation in relational algebra G1, G2, …, Gn ℱ F1( A1), F2( A2),…, Fn( An) (E)– E is any relational-algebra expression– G1, G2 …, Gn is a list of attributes on which to group (can be
empty)– Each Fi is an aggregate function– Each Ai is an attribute name
Aggregate Operation – Example
Relation r:A B
C
77310
ℱ sum(c) (r)sum-C
27ℱ
Aggregate Operation – Example
Relation account grouped by branch-name:
branch-name ℱ sum(balance) (account)
branch-name account-number balance
PerryridgePerryridgeBrightonBrightonRedwood
A-102A-201A-217A-215A-222
400900750750700
branch-name balance
PerryridgeBrightonRedwood
13001500700
ℱ
Assignment Operation
• The assignment operation () provides a convenient way to express complex queries. – Write query as a sequential program consisting
of a series of assignments followed by an expression whose value is displayed as a result of the query.
– Assignment must always be made to a temporary relation variable.
• Example: Write r s as temp1 R-S (r) temp2 R-S ((temp1 x s) – R-S,S (r))result = temp1 – temp2
References32
www.cs.kent.edu/~yuri/cs43005.s2008/lectures/lecture2.ppt
inst.eecs.berkeley.edu/~cs186/sp06/lecs/lecture8Alg.ppt
and others
Exercises33
Given the following relational schema: STUDENT (sid, sname) COURSE (ccode,cname, ccredithr, lno) LECTURER (lno, lname) ENROLMENT(sid,ccode) Write the relational algebra expression to extract
the following: The name of all courses offered The name of 3 credit hours’ courses
Exercises34
The name of courses along with the name of the lecturers who teach them
The name of courses and the name of lecturers
The name of lecturers who teach 3 cr hr’s courses as well as 4 cr hr’s courses
The name of lecturers who teach 3 cr hr’s courses but not 4 cr hr’s courses
Exercises35
The name of students who take SCB1033 and SCB1043.
The name of students who take all courses taught by Dr Samuel.
The maximum number of credit hours offered The number of students enrolled in each
course The number of credit hrs taken by a student
with id 1121.