relational algebra “the beginning…” what is a query? why use a theoretical basis for designing...
TRANSCRIPT
Relational Algebra“The beginning…”
What is a query?
Why use a theoretical basis for designing ‘queries’?
What is a relation?
What is relational algebra (RA)?
Start with some unary operators in RA
Then the basic binary operators
A queryWhat do we mean by a query to a database?
Given one or more relations, apply operations that ultimately return a new relation.
Usually we assume the new relation will have different information than the original input relations and will combine the original relations in meaningful ways.
We can write a query accurately in English
We can accurately name the resulting relation in English
We can draw queries as a way of planning them out
SQL is the language used for talking to a
relational database management system Allows
changing the schema (“data definition”), changing the data (“data modification”), and querying
But the querying part of SQL isn't very good Confusing syntax that is hard to read Major differences from one DBMS to the next
We teach you relational algebra (RA) for querying We use a graphical representation of RA It's easy to convert RA into SQL
A relational database is a set of relations A relation is a set of rows and a set of columns A relation has a name A relation has an identifier which is a set of
columns We consistently underline columns in the identifier
Rules: A row has a value for each column in the identifier
(A row may or may not have values for columns that are not in the identifier)
No two rows have the same values in all the identifying columns
Is this a relation?
First name Last name Award Movie
Jeff Bridges Best Actor
Crazy Heart
Sandra Bullock Best Actress
The Blind Side
Kathryn Bigelow Director The Hurt Locker
Is this a relation?
Oscar
First name
Last name
Award Movie
Jeff Bridges Best Actor
Crazy Heart
Sandra Bullock Best Actress
The Blind Side
Kathryn Bigelow Director The Hurt Locker
Is this a relation?
Oscar
First name Last name Award Movie
Jeff Bridges Best Actor Crazy HeartSandra Bullock Best
ActressThe Blind Side
Kathryn Bigelow Director The Hurt LockerPaul Ottosson Sound
EditingThe Hurt Locker
Paul Ottosson Sound Mixing
The Hurt Locker
Is this a relation?
Oscar
First name Last name Award Movie
Jeff Bridges Best Actor Crazy Heart
Sandra Bullock Best Actress The Blind Side
Kathryn Bigelow Director The Hurt Locker
Paul Ottosson Sound Mixing
The Hurt Locker
Paul Ottosson Sound Mixing
Spider Man
Are these two relations the same (both named
Oscar)?
Last name Award Movie First name
Bigelow Director The Hurt Locker
Kathryn
Bridges Best Actor Crazy Heart Jeff
Bullock Best Actress The Blind Side
Sandra
First name Last name
Award Movie
Jeff Bridges Best Actor Crazy Heart
Sandra Bullock Best Actress The Blind Side
Kathryn Bigelow Director The Hurt Locker
More about relations
No two rows of data are duplicated
implied by our rule about identifying columns
No two columns have the same name
No two relations in a given database have the same name
Relational Algebra
Inputs:
one or two relations
Operators:
work on those relations (or more explicitly columns or rows)
Output:
one relation
RA OperatorsWe'll look at unary operators first
These take a relation and return a relation
projectfilter
reduceGroup
ProjectProject removes and adds non-
identifying columnsAdds computed columns –
arithmetic, string manipulation, etc.
result:as wide as number of columns named,
which must include the identifying columnsexactly as tall as original relationidentifier is same
SQL:Select columns from relation;
Project example
First name Last name Award Movie BirthYearJeff Bridges Best Actor Crazy Heart 1949Sandra Bullock Best Actress The Blind Side 1964Kathryn Bigelow Director The Hurt Locker 1951
oscar
ProjectId: ???Carry: MovieCompute: age = currentYear - BirthYear
?????
Project example
First name Last name Award Movie BirthYearJeff Bridges Best Actor Crazy Heart 1949Sandra Bullock Best Actress The Blind Side 1964Kathryn Bigelow Director The Hurt Locker 1951
oscar
ProjectId: First_name, Last_name, AwardCarry: MovieCompute: age = currentYear - BirthYear
First_name, Last_name, Award, Movie, Actor_ageof Oscar
Filter
Filter removes rows based on a conditionresult:
as wide as the original relationshorter or exactly as tall as the original
relationidentifier is same
SQL:Select * from relation where condition;
Filter example: Find each oscar award for the movie “The Blind
Side”
First name Last name Award Movie Birth year
Jeff Bridges Best Actor Crazy Heart 1949
Sandra Bullock Best Actress
The Blind Side
1964
Kathryn Bigelow Director The Hurt Locker
1951
We often combine project and filter
An activity you could try on your own Go to sqlzoo.net
Try tutorial 0 Reverse engineer the example queries into relational
algebra
Reduce changes the identifier, result:
narrower or the same width as the original relationpossibly shorter than the original relationunsafe!
SQL:Select distinct new_identifier from original relation
Let’s try it with the tiny database:
Practice Query #3
Reduce
Reduce example
First name Last name Award Movie
Jeff Bridges Best Actor Crazy Heart
Sandra Bullock Best Actress The Blind Side
Kathryn Bigelow Director The Hurt Locker
Paul Ottosson Sound Editing The Hurt Locker
Paul Ottosson Sound Mixing The Hurt Locker
GroupGroup changes the identifier, combining duplicate
rows using an aggregate functionresult:
narrower, the same width, or wider than the original relationpossibly shorter than the original relationunsafe!
SQL (2 basic types):Select distinct new_identifier, aggregate_function from relation group by new_identifier
Select aggregate_function from relation
Group example
First name Last name Award Movie Birth year
Jeff Bridges Best Actor Crazy Heart 1949
Sandra Bullock Best Actress The Blind Side 1964
Kathryn Bigelow Director The Hurt Locker 1951
Group example 2
First name Last name Award Birth year
Jeff Bridges Best Actor 1949
Sean Penn Best Actor 1960
Sandra Bullock Best Actress 1964
Kate Winslet Best Actress 1975