tallahassee, florida, 2014 cop4710 database systems relational algebra fall 2014

28
Tallahassee, Florida, 2014 COP4710 Database Systems Relational Algebra Fall 2014

Upload: antonia-bronson

Post on 14-Dec-2015

220 views

Category:

Documents


5 download

TRANSCRIPT

Page 1: Tallahassee, Florida, 2014 COP4710 Database Systems Relational Algebra Fall 2014

Tallahassee, Florida, 2014

COP4710

Database Systems

Relational Algebra

Fall 2014

Page 2: Tallahassee, Florida, 2014 COP4710 Database Systems Relational Algebra Fall 2014

Why Do We Learn This?

• Querying the database: specify what we want from our database– Find all the people who earn more than

$1,000,000 and pay taxes in Tallahassee• Could write in C++/Java, but a bad idea

• Instead use high-level query languages:

– Theoretical: Relational Algebra, Datalog

– Practical: SQL

– Relational algebra: a basic set of operations on relations that provide the basic principles

2

Page 3: Tallahassee, Florida, 2014 COP4710 Database Systems Relational Algebra Fall 2014

What is an “Algebra”?

• Mathematical system consisting of:– Operands --- variables or values from which new

values can be constructed

– Operators --- symbols denoting procedures that construct new values from given values

• Examples– Arithmetic(Elementary) algebra, linear algebra,

Boolean algebra ……• What are operands?

• What are operators?

3

Page 4: Tallahassee, Florida, 2014 COP4710 Database Systems Relational Algebra Fall 2014

What is Relational Algebra?

• An algebra – Whose operands are relations or variables that

represent relations

– Whose operators are designed to do common things that we need to do with relations in a database

• relations as input, new relation as output

– Can be used as a query language for relations

4

Page 5: Tallahassee, Florida, 2014 COP4710 Database Systems Relational Algebra Fall 2014

Relational Operators at a Glance

• Five basic RA operations:– Basic Set Operations

• union, difference (no intersection, no complement)

– Selection: s– Projection: p – Cartesian Product: X

• When our relations have attribute names:– Renaming: r

• Derived operations:– Intersection, complement– Joins (natural join, equi-join, theta join, semi-join,

……)

5

Page 6: Tallahassee, Florida, 2014 COP4710 Database Systems Relational Algebra Fall 2014

Set Operations

• Union: all tuples in R1 or R2, denoted as R1 U R2 – R1, R2 must have the same schema– R1 U R2 has the same schema as R1, R2– Example:

• Active-Employees U Retired-Employees

– If any, is duplicate elimination required?

• Difference: all tuples in R1 and not in R2, denoted as R1 – R2– R1, R2 must have the same schema– R1 - R2 has the same schema as R1, R2 – Example

• All-Employees - Retired-Employees

6

Page 7: Tallahassee, Florida, 2014 COP4710 Database Systems Relational Algebra Fall 2014

Selection

• Returns all tuples which satisfy a condition, denoted as sc(R)– c is a condition: =, <, >, AND, OR, NOT– Output schema: same as input schema– Find all employees with salary more than

$40,000:• sSalary > 40000 (Employee)

7

SSN Name Dept-ID Salary

111060000 Alex 1 30K

754320032 Bob 1 32K

983210129 Chris 2 45K

SSN Name Dept-ID Salary

983210129 Chris 2 45K

Page 8: Tallahassee, Florida, 2014 COP4710 Database Systems Relational Algebra Fall 2014

Projection

• Unary operation: returns certain columns, denoted as PA1,…,An (R)– Eliminates duplicate tuples !– Input schema R(B1, …, Bm)– Condition: {A1, …, An} {B1, …, Bm}– Output schema S(A1, …, An)

• Example: project social-security number and names:– P SSN, Name (Employee)

8

SSN Name Dept-ID Salary

111060000 Alex 1 30K

754320032 Bob 1 32K

983210129 Chris 2 45K

SSN Name

111060000 Alex

754320032 Bob

983210129 Chris

Page 9: Tallahassee, Florida, 2014 COP4710 Database Systems Relational Algebra Fall 2014

Selection vs. Projection

• Think of relation as a table

– How are they similar?

– How are they different?

• Horizontal vs. vertical?

• Duplicate elimination for both? What about in real systems?

– Why do you need both?

9

Page 10: Tallahassee, Florida, 2014 COP4710 Database Systems Relational Algebra Fall 2014

Cartesian Product

• Each tuple in R1 with each tuple in R2, denoted as R1 x R2– Input schemas R1(A1,…,An), R2(B1,…,Bm)

– Output schema is S(A1, …, An, B1, …, Bm)• Two relations are combined!

– Very rare in practice; but joins are very common

– Example: Employee x Dependent

10

Page 11: Tallahassee, Florida, 2014 COP4710 Database Systems Relational Algebra Fall 2014

Example

11

SSN Name

111060000 Alex

754320032 Brandy

Employee-SSN Dependent-Name

111060000 Chris

754320032 David

Employee Dependent

SSN Name Employee-SSN Dependent-Name

111060000 Alex 111060000 Chris

111060000 Alex 754320032 David

754320032 Brandy 111060000 Chris

754320032 Brandy 754320032 David

Employee x Dependent

Page 12: Tallahassee, Florida, 2014 COP4710 Database Systems Relational Algebra Fall 2014

Renaming

• Does not change the relational instance, denoted as Notation: r S(B1,…,Bn) (R)

• Changes the relational schema only– Input schema: R(A1, …, An)– Output schema: S(B1, …, Bn)

• Example:r

Soc-sec-num, firstname(Employee)

12

SSN Name

111060000 Alex

754320032 Bob

983210129 Chris

Soc-sec-num firstname

111060000 Alex

754320032 Bob

983210129 Chris

Page 13: Tallahassee, Florida, 2014 COP4710 Database Systems Relational Algebra Fall 2014

Set Operations: Intersection

• Intersection: all tuples both in R1 and in R2, denoted as R1 R2– R1, R2 must have the same schema

– R1 R2 has the same schema as R1, R2

– Example• UnionizedEmployees RetiredEmployees

• Intersection is derived:– R1 R2 = R1 – (R1 – R2) why ?

13

Page 14: Tallahassee, Florida, 2014 COP4710 Database Systems Relational Algebra Fall 2014

Theta Join

• A join that involves a predicate , q denoted as R1 q

R2 – Input schemas: R1(A1,…,An), R2(B1,…,Bm)

– Output schema: S(A1,…,An,B1,…,Bm)

– Derived operator:

R1 q R2 = s q (R1 x R2)

1. Take the product R1 x R2

2. Then apply SELECTC to the result

– As for SELECT, C can be any Boolean-valued condition

14

Page 15: Tallahassee, Florida, 2014 COP4710 Database Systems Relational Algebra Fall 2014

Theta Join: Example

15

Name AddressAJ's 1800 Tennessee

Michael's Pub 513 Gaines

Bar Beer Price

AJ’s Bud 2.5

AJ’s Miller 2.75

Michael’s Pub Bud 2.5

Michael’s Pub Corona 3.0

Bar Sells

BarInfo := Sells Sells.Bar=Bar.Name Bar

Bar Beer Price Name Address

AJ’s Bud 2.5 AJ's 1800 Tennessee

AJ’s Miller 2.75 AJ's 1800 Tennessee

Michael’s Pub Bud 2.5 Michael's Pub 513 Gaines

Michael’s Pub Corona 3.0 Michael's Pub 513 Gaines

Page 16: Tallahassee, Florida, 2014 COP4710 Database Systems Relational Algebra Fall 2014

Natural Join

• Notation: R1 R2• Input Schema: R1(A1, …, An), R2(B1, …, Bm)• Output Schema: S(C1,…,Cp)

– Where {C1, …, Cp} = {A1, …, An} U{B1, …, Bm}

• Meaning: combine all pairs of tuples in R1 and R2 that agree on the attributes:– {A1,…,An} {B1,…, Bm} (called the join

attributes)

16

Page 17: Tallahassee, Florida, 2014 COP4710 Database Systems Relational Algebra Fall 2014

Natural Join: Examples

17

SSN Name

111060000 Alex

754320032 Brandy

SSN Dependent-Name

111060000 Chris

754320032 David

Employee Dependent

SSN Name Dependent-Name

111060000 Alex Chris

754320032 Brandy David

Employee Dependent =P SSN, Name, Dependent-Name(sEmployee.SSN=Dependent.SSN(Employee x Dependent)

Page 18: Tallahassee, Florida, 2014 COP4710 Database Systems Relational Algebra Fall 2014

Natural Join: Examples

18

A BX YX ZY ZZ V

B C

Z U

V W

Z V

A B C

X Z U

X Z V

Y Z U

Y Z V

Z V W

R S

R S

Page 19: Tallahassee, Florida, 2014 COP4710 Database Systems Relational Algebra Fall 2014

Equi-join

• Special case of theta join: condition c contains only conjunction of equalities– Result schema is the same as that of Cartesian

product

– May have fewer tuples than Cartesian product

– Most frequently used in practice:

R1 =A B R2

– Natural join is a particular case of equi-join

– A lot of research on how to do it efficiently

19

Page 20: Tallahassee, Florida, 2014 COP4710 Database Systems Relational Algebra Fall 2014

A Joke About Join

A join query walks up to two tables

in a restaurant and asks :“Mind if I join you?”

20

Page 21: Tallahassee, Florida, 2014 COP4710 Database Systems Relational Algebra Fall 2014

Division

• A/B for A(x,y) and B(y)– Contains all tuples (x) such that for every y tuple

in B, there is an xy tuple in A

– Useful for expressing “for all” queries– For A/B, compute all x values that are not

‘disqualified’ by some y value in B• x value is disqualified if by attaching y value from B, we obtain an

xy tuple that is not in A

21

),( AyxByxBA

1. Disqualified x values: )))((( ABAxx

2. A/B: )(Ax Disqualified x values

Page 22: Tallahassee, Florida, 2014 COP4710 Database Systems Relational Algebra Fall 2014

Division: Example

22

sno pno s1 p1 s1 p2 s1 p3 s1 p4 s2 p1 s2 p2 s3 p2 s4 p2 s4 p4

pnop2

pnop2p4

pnop1p2p4

snos1s2s3s4

snos1s4

snos1

A

B1B2

B3

A/B1 A/B2 A/B3

Page 23: Tallahassee, Florida, 2014 COP4710 Database Systems Relational Algebra Fall 2014

Building Complex Expressions

• Algebras allow us to express sequences of operations in a natural way– Example

• In arithmetic algebra: (x + 4)*(y - 3)

– Relational algebra allows the same

• Three notations, just as in arithmetic:1. Sequences of assignment statements2. Expressions with several operators3. Expression trees

23

Page 24: Tallahassee, Florida, 2014 COP4710 Database Systems Relational Algebra Fall 2014

Sequences of Assignments

• Create temporary relation names• Renaming can be implied by giving relations a list of

attributes• Example: R3 := R1 JOINC R2 can be written:

R4 := R1 x R2R3 := SELECTC (R4)

24

Page 25: Tallahassee, Florida, 2014 COP4710 Database Systems Relational Algebra Fall 2014

Expressions with Several Operators

• Example: the theta-join R3 := R1 JOINC R2 can be

written: R3 := SELECTC (R1 x R2)

• Precedence of relational operators:1. Unary operators --- select, project, rename --- have highest

precedence, bind first

2. Then come products and joins

3. Then intersection

4. Finally, union and set difference bind last

• But you can always insert parentheses to force the order you desire

25

Page 26: Tallahassee, Florida, 2014 COP4710 Database Systems Relational Algebra Fall 2014

Expression Trees

• Leaves are operands – either variables standing for relations or

particular constant relations

• Interior nodes are operators, applied to their child or children

26

Page 27: Tallahassee, Florida, 2014 COP4710 Database Systems Relational Algebra Fall 2014

Expression Tree: Examples

Given Bars(name, addr), Sells(bar, beer, price), find the names of all the bars that are either on Tennessee St. or sell Bud for less than $3

27

Bars Sells

SELECTaddr = “Tennessee St.” SELECT price<3 AND beer=“Bud”

PROJECTname

RENAMER(name)

PROJECTbar

UNION

Page 28: Tallahassee, Florida, 2014 COP4710 Database Systems Relational Algebra Fall 2014

Summary of Relational Algebra

• Why bother ? Can write any RA expression directly in C++/Java, seems easy– Two reasons:

• Each operator admits sophisticated implementations (think of and s C)

• Expressions in relational algebra can be rewritten: optimized

s(age >= 30 AND age <= 35)(Employees)– Method 1: scan the file, test each employee– Method 2: use an index on age

Employees Relatives– Iterate over Employees, then over Relatives? Or iterate over

Relatives, then over Employees?– Sort Employees, Relatives, do “merge-join”– “hash-join”– etc.

28