relational algebra “the beginning…” what is a query? why use a theoretical basis for designing...

27
Relational Algebra “The beginning…” What is a query? Why use a theoretical basis for designing ‘queries’? What is a relation? What is relational algebra (RA)? Start with some unary operators in RA Then the basic binary operators

Upload: brice-hampton

Post on 17-Dec-2015

222 views

Category:

Documents


0 download

TRANSCRIPT

Relational Algebra“The beginning…”

What is a query?

Why use a theoretical basis for designing ‘queries’?

What is a relation?

What is relational algebra (RA)?

Start with some unary operators in RA

Then the basic binary operators

A query

What do we mean by a query to a database?

A queryWhat do we mean by a query to a database?

Given one or more relations, apply operations that ultimately return a new relation.

Usually we assume the new relation will have different information than the original input relations and will combine the original relations in meaningful ways.

We can write a query accurately in English

We can accurately name the resulting relation in English

We can draw queries as a way of planning them out

SQL is the language used for talking to a

relational database management system Allows

changing the schema (“data definition”), changing the data (“data modification”), and querying

But the querying part of SQL isn't very good Confusing syntax that is hard to read Major differences from one DBMS to the next

We teach you relational algebra (RA) for querying We use a graphical representation of RA It's easy to convert RA into SQL

A relational database is a set of relations A relation is a set of rows and a set of columns A relation has a name A relation has an identifier which is a set of

columns We consistently underline columns in the identifier

Rules: A row has a value for each column in the identifier

(A row may or may not have values for columns that are not in the identifier)

No two rows have the same values in all the identifying columns

Is this a relation?

First name Last name Award Movie

Jeff Bridges Best Actor

Crazy Heart

Sandra Bullock Best Actress

The Blind Side

Kathryn Bigelow Director The Hurt Locker

Is this a relation?

Oscar

First name

Last name

Award Movie

Jeff Bridges Best Actor

Crazy Heart

Sandra Bullock Best Actress

The Blind Side

Kathryn Bigelow Director The Hurt Locker

Is this a relation?

Oscar

First name Last name Award Movie

Jeff Bridges Best Actor Crazy HeartSandra Bullock Best

ActressThe Blind Side

Kathryn Bigelow Director The Hurt LockerPaul Ottosson Sound

EditingThe Hurt Locker

Paul Ottosson Sound Mixing

The Hurt Locker

Is this a relation?

Oscar

First name Last name Award Movie

Jeff Bridges Best Actor Crazy Heart

Sandra Bullock Best Actress The Blind Side

Kathryn Bigelow Director The Hurt Locker

Paul Ottosson Sound Mixing

The Hurt Locker

Paul Ottosson Sound Mixing

Spider Man

Are these two relations the same (both named

Oscar)?

Last name Award Movie First name

Bigelow Director The Hurt Locker

Kathryn

Bridges Best Actor Crazy Heart Jeff

Bullock Best Actress The Blind Side

Sandra

First name Last name

Award Movie

Jeff Bridges Best Actor Crazy Heart

Sandra Bullock Best Actress The Blind Side

Kathryn Bigelow Director The Hurt Locker

More about relations

No two rows of data are duplicated

implied by our rule about identifying columns

No two columns have the same name

No two relations in a given database have the same name

Relational Algebra

Inputs:

one or two relations

Operators:

work on those relations (or more explicitly columns or rows)

Output:

one relation

RA OperatorsWe'll look at unary operators first

These take a relation and return a relation

projectfilter

reduceGroup

ProjectProject removes and adds non-

identifying columnsAdds computed columns –

arithmetic, string manipulation, etc.

result:as wide as number of columns named,

which must include the identifying columnsexactly as tall as original relationidentifier is same

SQL:Select columns from relation;

Project example

First name Last name Award Movie BirthYearJeff Bridges Best Actor Crazy Heart 1949Sandra Bullock Best Actress The Blind Side 1964Kathryn Bigelow Director The Hurt Locker 1951

oscar

ProjectId: ???Carry: MovieCompute: age = currentYear - BirthYear

?????

Project example

First name Last name Award Movie BirthYearJeff Bridges Best Actor Crazy Heart 1949Sandra Bullock Best Actress The Blind Side 1964Kathryn Bigelow Director The Hurt Locker 1951

oscar

ProjectId: First_name, Last_name, AwardCarry: MovieCompute: age = currentYear - BirthYear

First_name, Last_name, Award, Movie, Actor_ageof Oscar

Filter

Filter removes rows based on a conditionresult:

as wide as the original relationshorter or exactly as tall as the original

relationidentifier is same

SQL:Select * from relation where condition;

Filter example: Find each oscar award for the movie “The Blind

Side”

First name Last name Award Movie Birth year

Jeff Bridges Best Actor Crazy Heart 1949

Sandra Bullock Best Actress

The Blind Side

1964

Kathryn Bigelow Director The Hurt Locker

1951

What algorithm do you think Filter uses?

What algorithm do you think Filter uses? Trick question: depends on the DBMS and the

configuration!

We often combine project and filter

An activity you could try on your own Go to sqlzoo.net

Try tutorial 0 Reverse engineer the example queries into relational

algebra

Reduce changes the identifier, result:

narrower or the same width as the original relationpossibly shorter than the original relationunsafe!

SQL:Select distinct new_identifier from original relation

Let’s try it with the tiny database:

Practice Query #3

Reduce

Reduce example

First name Last name Award Movie

Jeff Bridges Best Actor Crazy Heart

Sandra Bullock Best Actress The Blind Side

Kathryn Bigelow Director The Hurt Locker

Paul Ottosson Sound Editing The Hurt Locker

Paul Ottosson Sound Mixing The Hurt Locker

GroupGroup changes the identifier, combining duplicate

rows using an aggregate functionresult:

narrower, the same width, or wider than the original relationpossibly shorter than the original relationunsafe!

SQL (2 basic types):Select distinct new_identifier, aggregate_function from relation group by new_identifier

Select aggregate_function from relation

Group example

First name Last name Award Movie Birth year

Jeff Bridges Best Actor Crazy Heart 1949

Sandra Bullock Best Actress The Blind Side 1964

Kathryn Bigelow Director The Hurt Locker 1951

Group example 2

First name Last name Award Birth year

Jeff Bridges Best Actor 1949

Sean Penn Best Actor 1960

Sandra Bullock Best Actress 1964

Kate Winslet Best Actress 1975

Tiny database

SQL (2 basic types):

Select distinct new_identifier, aggregate_function

from relation

group by new_identifier

Select aggregate_function

from relation

Practice queries 12, 16