lecture 8 alg

Upload: janina-pancca

Post on 03-Apr-2018

220 views

Category:

Documents


0 download

TRANSCRIPT

  • 7/28/2019 Lecture 8 Alg

    1/30

    By relieving the brain of all unnecessary

    work, a good notation sets it free to

    concentrate on more advanced problems,

    and, in effect, increases the mental power of

    the race.

    -- Alfred North Whitehead (1861 - 1947)

    Relational Algebra

    CS 186 Spring 2006, Lecture 8

    R & G, Chapter 4

    p

  • 7/28/2019 Lecture 8 Alg

    2/30

    Administrivia

    Midterm 1 Tues 2/21, in class Covers all material up to and including Th 2/16

    Closed bookcan bring 1 8.5x11 sheet of notes

    No calculators no cell phones

    Midterm 2 Th 3/21, in class

    Focuses on material after MT 1

    Same rules as above.

    Homework 1 2/14 Homework 2 Out after MT 1, Due 3/14

  • 7/28/2019 Lecture 8 Alg

    3/30

    Relational Query Languages

    Query languages:Allow manipulation and retrieval ofdata from a database.

    Relational model supports simple, powerful QLs:

    Strong formal foundation based on logic.Allows for much optimization.

    Query Languages != programming languages!

    QLs not expected to be Turing complete.

    QLs not intended to be used for complex calculations.

    QLs support easy, efficient access to large data sets.

  • 7/28/2019 Lecture 8 Alg

    4/30

    Formal Relational Query Languages

    Two mathematical Query Languages form thebasis for real languages (e.g. SQL), and forimplementation:

    Relational Algebra: More operational, veryuseful for representing execution plans.

    Relational Calculus: Lets users describe what

    they want, rather than how to compute it.(Non-procedural, declarative.)

    Understanding Algebra & Calculus is key to

    understanding SQL, query processing!

  • 7/28/2019 Lecture 8 Alg

    5/30

    Preliminaries

    A query is applied to relation instances, and theresult of a query is also a relation instance.

    Schemasof input relations for a query are fixed (butquery will run over any legal instance)

    The schema for the resultof a given query is alsofixed. It is determined by the definitions of the querylanguage constructs.

    Positional vs. named-field notation:

    Positional notation easier for formal definitions,named-field notation more readable.

    Both used in SQL

  • 7/28/2019 Lecture 8 Alg

    6/30

    Relational Algebra: 5 Basic Operations

    Selection ( ) Selects a subset ofrowsfromrelation (horizontal).

    Projection ( ) Retains only wanted columnsfrom relation (vertical).

    Cross-product (x) Allows us to combine tworelations.

    Set-difference () Tuples in r1, but not in r2.

    Union ( ) Tuples in r1 and/or in r2.

    Since each operation returns a relation, operationscan

    be composed! (Algebra is closed.)

    p

  • 7/28/2019 Lecture 8 Alg

    7/30

    Example Instances R1

    S1

    S2

    Boats

    bid bname color

    101 Interlake blue102 Interlake red

    103 Clipper green

    104 Marine red

    sid bid day

    22 101 10/10/96

    58 103 11/12/96

    sid sname rating age

    22 dustin 7 45.0

    31 lubber 8 55.558 rusty 10 35.0

    sid sname rating age28 yuppy 9 35.0

    31 lubber 8 55.5

    44 guppy 5 35.0

    58 rusty 10 35.0

  • 7/28/2019 Lecture 8 Alg

    8/30

    Projection

    page S( )2 Examples: ;

    Retains only attributes that are in the projectionlist.

    Schemaof result:

    exactly the fields in the projection list, with thesame names that they had in the input relation.

    Projection operator has to eliminate duplicates(How do they arise? Why remove them?)

    Note: real systems typically dont do duplicateelimination unless the user explicitly asks for it.(Why not?)

    p

    sname rating

    S

    ,

    ( )2

  • 7/28/2019 Lecture 8 Alg

    9/30

    Projection

    )2(, Sratingsnamep

    p

    ageS( )2

    S2

    sid sname rating age

    28 yuppy 9 35.0

    31 lubber 8 55.5

    44 guppy 5 35.0

    58 rusty 10 35.0

    sname rating

    yuppy 9

    lubber 8guppy 5rusty 10

    age

    35.055.5

  • 7/28/2019 Lecture 8 Alg

    10/30

    Selection ()

    rating S8 2( ) p sname rating rating S, ( ( )) 8 2

    Selects rows that satisfy selection condition. Result is a relation.

    Schemaof result is same as that of the input relation.

    Do we need to do duplicate elimination?

    sid sname rating age

    28 yuppy 9 35.0

    31 lubber 8 55.5

    44 guppy 5 35.0

    58 rusty 10 35.0

    sname ratingyuppy 9

    rusty 10

  • 7/28/2019 Lecture 8 Alg

    11/30

    Union and Set-Difference

    All of these operations take two input relations,which must beunion-compatible:

    Same number of fields.

    `Corresponding fields have the same type.

    For which, if any, is duplicate elimination

    required?

  • 7/28/2019 Lecture 8 Alg

    12/30

    Union

    S S1 2

    S1

    S2

    sid sname rating age

    22 dustin 7 45.0

    31 lubber 8 55.5

    58 rusty 10 35.0

    sid sname rating age

    28 yuppy 9 35.031 lubber 8 55.5

    44 guppy 5 35.0

    58 rusty 10 35.0

    sid sname rating age22 dustin 7 45.031 lubber 8 55.558 rusty 10 35.0

    44 guppy 5 35.028 yuppy 9 35.0

  • 7/28/2019 Lecture 8 Alg

    13/30

    Set Difference

    S1

    S2

    S S1 2

    S2S1

    sid sname rating age

    22 dustin 7 45.0

    31 lubber 8 55.5

    58 rusty 10 35.0

    sid sname rating age

    28 yuppy 9 35.031 lubber 8 55.5

    44 guppy 5 35.0

    58 rusty 10 35.0

    sid sname rating age22 dustin 7 45.0

    sid sname rating age

    28 yuppy 9 35.044 guppy 5 35.0

  • 7/28/2019 Lecture 8 Alg

    14/30

    Cross-Product

    S1 x R1: Each row of S1 paired with each row of R1.

    Q: How many rows in the result?

    Result schemahas one field per field of S1 and R1,with field names `inherited if possible.

    May have a naming conflict: Both S1 and R1 havea field with the same name.

    In this case, can use the renaming operator:

    ( ( , ), )C sid sid S R1 1 5 2 1 1

  • 7/28/2019 Lecture 8 Alg

    15/30

    Cross Product Example

    R1 S1

    R1 X S1 =

    sid sname rating age

    22 dustin 7 45.0

    31 lubber 8 55.5

    58 rusty 10 35.0

    sid bid day22 101 10/10/96

    58 103 11/12/96

    (sid) sname rating age (sid) bid day

    22 dustin 7 45.0 22 101 10/10/

    22 dustin 7 45.0 58 103 11/12/31 lubber 8 55.5 22 101 10/10/

    31 lubber 8 55.5 58 103 11/12/

    58 rusty 10 35.0 22 101 10/10/

    58 rusty 10 35.0 58 103 11/12/

  • 7/28/2019 Lecture 8 Alg

    16/30

    Compound Operator: Intersection

    In addition to the 5 basic operators, there areseveral additional Compound Operators

    These add no computational power to thelanguage, but are useful shorthands.

    Can be expressed solely with the basic ops.

    Intersection takes two input relations, whichmust beunion-compatible.

    Q: How to express it using basic operators?

    R S = R (R S)

  • 7/28/2019 Lecture 8 Alg

    17/30

    Intersection

    S1

    S2

    S S1 2

    sid sname rating age

    31 lubber 8 55.558 rusty 10 35.0

    sid sname rating age

    22 dustin 7 45.0

    31 lubber 8 55.5

    58 rusty 10 35.0

    sid sname rating age

    28 yuppy 9 35.031 lubber 8 55.5

    44 guppy 5 35.0

    58 rusty 10 35.0

  • 7/28/2019 Lecture 8 Alg

    18/30

    Compound Operator: Join

    Joins are compound operators involving cross product,

    selection, and (sometimes) projection. Most common type of join is a natural join (often just called

    join). R S conceptually is: Compute R X S

    Select rows where attributes that appear in both relations have equal

    values Project all unique atttributes and one copy of each of the common

    ones.

    Note: Usually done much more efficiently than this.

    Useful for putting normalized relations back together.

  • 7/28/2019 Lecture 8 Alg

    19/30

    Natural Join Example

    R1 S1

    R1 S1 =

    sid sname rating age bid day22 dustin 7 45.0 101 10/10/9658 rusty 10 35.0 103 11/12/96

    sid sname rating age

    22 dustin 7 45.0

    31 lubber 8 55.5

    58 rusty 10 35.0

    sid bid day22 101 10/10/96

    58 103 11/12/96

  • 7/28/2019 Lecture 8 Alg

    20/30

    Other Types of Joins

    Condition Join (or theta-join):

    Result schemasame as that of cross-product.

    May have fewer tuples than cross-product.

    Equi-Join: Special case: condition ccontains

    only conjunction ofequalities.

    R c S c R S ( )

  • 7/28/2019 Lecture 8 Alg

    21/30

    Theta Join Examplesid sname rating age

    22 dustin 7 45.0

    31 lubber 8 55.5

    58 rusty 10 35.0

    sid bid day

    22 101 10/10/96

    58 103 11/12/96

    R1 S1

    S1S1.sidR1.sid

    R1 =

    (sid) sname rating age (sid) bid day22 dustin 7 45.0 58 103 11/12/9631 lubber 8 55.5 58 103 11/12/96

  • 7/28/2019 Lecture 8 Alg

    22/30

    Compound Operator: Division

    Useful for expressing for all queries like:Find sids of sailors who have reservedallboats.

    For A/B attributes of B are subset of attrs of A.

    May need to project to make this happen.

    E.g., letAhave 2 fields, xand y; Bhave onlyfield y:

    A/Bcontains allxtuples such that for everyytuple in B, there is anxytuple inA.

    A B x y B( x,y A)

  • 7/28/2019 Lecture 8 Alg

    23/30

  • 7/28/2019 Lecture 8 Alg

    24/30

    Expressing A/B Using Basic Operators

    Division is not essential op; just a useful shorthand.

    (Also true of joins, but joins are so common that systemsimplement joins specially.)

    Idea: ForA/B, compute allxvalues that are not`disqualified by some yvalue in B.

    xvalue is disqualifiedif by attaching yvalue from B, weobtain an xytuple that is not inA.

    Disqualified x values: p px x A B A(( ( ) ) )

    A/B: px A( ) Disqualified x values

  • 7/28/2019 Lecture 8 Alg

    25/30

    ExamplesReserves

    Sailors

    Boats

    sid bid day

    22 101 10/10/96

    58 103 11/12/96sid sname rating age

    22 dustin 7 45.0

    31 lubber 8 55.558 rusty 10 35.0

    bid bname color101 Interlake Blue

    102 Interlake Red

    103 Clipper Green

    104 Marine Red

  • 7/28/2019 Lecture 8 Alg

    26/30

    Find names of sailors whove reserved boat #103

    Solution 1: p sname bidserves Sailors(( Re ) )

    103

    Solution 2: p sname bid

    serves Sailors( (Re ))103

  • 7/28/2019 Lecture 8 Alg

    27/30

    Find names of sailors whove reserved a red boat

    Information about boat color only available inBoats; so need an extra join:

    p sname color red Boats serves Sailors(( ' ' ) Re )

    A more efficient (???) solution:

    psname

    (psid

    ((pbid

    (color'red'

    Boats)) Res) Sailors)

    A query optimizer can find this given the first solution!

  • 7/28/2019 Lecture 8 Alg

    28/30

    Find sailors whove reserved a red or a green boat

    Can identify all red or green boats, then findsailors whove reserved one of these boats:

    ( , (

    ' ' ' '

    ))Tempboats

    color red color green

    Boats

    psname Tempboats serves Sailors( Re )

  • 7/28/2019 Lecture 8 Alg

    29/30

    Find sailors whove reserved a red and a green boat

    Previous approach wont work! Must identifysailors whove reserved red boats, sailors whovereserved green boats, then find the intersection(note that sidis a key for Sailors):

    p ( , ((' '

    ) Re ))Tempredsid color red

    Boats serves

    psname Tempred Tempgreen Sailors(( ) )

    p ( , (( ' ' ) Re ))Tempgreen sid color green Boats serves

  • 7/28/2019 Lecture 8 Alg

    30/30

    Find the names of sailors whove reserved all boats

    Uses division; schemas of the input relations to/ must be carefully chosen:

    p p( , (,

    Re ) / ( ))Tempsidssid bid

    servesbid

    Boats

    psname Tempsids Sailors( )

    To find sailors whove reserved all Interlake boats:

    / (' '

    )p bid bname Interlake

    Boats

    .....