boyce-codd nf & lossless decomposition professor sin-min lee

64
Boyce-Codd NF & Los sless Decomposition Professor Sin-Min Lee

Post on 22-Dec-2015

236 views

Category:

Documents


5 download

TRANSCRIPT

Page 1: Boyce-Codd NF & Lossless Decomposition Professor Sin-Min Lee

Boyce-Codd NF & Lossless Decomposition

Professor Sin-Min Lee

Page 2: Boyce-Codd NF & Lossless Decomposition Professor Sin-Min Lee

Armstrong’s Axioms

For computing the set of FDs that follow a given FD, the following rules called Armstrong’s axioms are useful:

1. Reflexivity: If B A, then A B

2. Augmentation: If A B, then A C B C Note also that if A B, then A C B for any set of attributes C.

3. Transitivity: If A B and B C then A C

Page 3: Boyce-Codd NF & Lossless Decomposition Professor Sin-Min Lee

Example of 3NF but not BCNF

R(A B C D) 1 2 1 2 1 3 2 1 2 1 1 1 1 3 2 2 2 1 3 2 Is this table BCNF?

Page 4: Boyce-Codd NF & Lossless Decomposition Professor Sin-Min Lee

FD graph of R is A B C D B A AC B *BD A, C *CD A, B Prime Attribute = key attributes are

{B,C,D} B A and B is not a key, So R is not 3NF R is not BCNF

Page 5: Boyce-Codd NF & Lossless Decomposition Professor Sin-Min Lee

Projecting FDs

Given a relation R (A,B,C,D) and F(R) = {AB, BC, CD}.

Suppose S is projected from R as S(A,C,D). What is F(S). To compute F(S), start by computing the closures of all attributes in S.

In R, A+ = {AB, AC, AD}In S, A+ = {AC, AD} C+ = {CD} and D+ = {D}

Since A+ contains all attributes of S, it is not required to compute (AC)+, (AD)+ or (ACD)+.

Page 6: Boyce-Codd NF & Lossless Decomposition Professor Sin-Min Lee

Inference Rules for FD’s

Is equivalent to

Splitting rule and Combining rule

A1 ... Am B1 ... Bm

A1, A2, …, An B1, B2, …, BmA1, A2, …, An B1, B2, …, Bm

A1, A2, …, An B1

A1, A2, …, An B2

. . . . .A1, A2, …, An Bm

A1, A2, …, An B1

A1, A2, …, An B2

. . . . .A1, A2, …, An Bm

Page 7: Boyce-Codd NF & Lossless Decomposition Professor Sin-Min Lee

Inference Rules for FD’s(continued)

Trivial Rule

Why ?

A1 … Am

where i = 1, 2, ..., n

A1, A2, …, An AiA1, A2, …, An Ai

Page 8: Boyce-Codd NF & Lossless Decomposition Professor Sin-Min Lee

Inference Rules for FD’s(continued)

Transitive Closure Rule

If

and

then

Why ?

A1, A2, …, An B1, B2, …, BmA1, A2, …, An B1, B2, …, Bm

B1, B2, …, Bm C1, C2, …, CpB1, B2, …, Bm C1, C2, …, Cp

A1, A2, …, An C1, C2, …, CpA1, A2, …, An C1, C2, …, Cp

Page 9: Boyce-Codd NF & Lossless Decomposition Professor Sin-Min Lee

A1 … Am B1 … Bm C1 ... Cp

Page 10: Boyce-Codd NF & Lossless Decomposition Professor Sin-Min Lee

Example (continued)

Start from the following FDs:

Infer the following FDs:

1. name color2. category department3. color, category price

1. name color2. category department3. color, category price

Inferred FDWhich Ruledid we apply ?

4. name, category name

5. name, category color

6. name, category category

7. name, category color, category

8. name, category price

Page 11: Boyce-Codd NF & Lossless Decomposition Professor Sin-Min Lee

Another Rule

If

then

Augmentation follows from trivial rules and transitivityHow ?

A1, A2, …, An BA1, A2, …, An B

A1, A2, …, An , C1, C2, …, Cp BA1, A2, …, An , C1, C2, …, Cp B

Augmentation

Page 12: Boyce-Codd NF & Lossless Decomposition Professor Sin-Min Lee

Problem: infer ALL FDs

Given a set of FDs, infer all possible FDs

How to proceed ? Try all possible FDs, apply all 3 rules

E.g. R(A, B, C, D): how many FDs are possible ?

Drop trivial FDs, drop augmented FDs Still way too many

Better: use the Closure Algorithm (next)

Page 13: Boyce-Codd NF & Lossless Decomposition Professor Sin-Min Lee

Closure of a set of Attributes

Given a set of attributes A1, …, An

The closure, {A1, …, An}+ , is the set of attributes Bs.t. A1, …, An B

Given a set of attributes A1, …, An

The closure, {A1, …, An}+ , is the set of attributes Bs.t. A1, …, An B

name colorcategory departmentcolor, category price

name colorcategory departmentcolor, category price

Example:

Closures: name+ = {name, color} {name, category}+ = {name, category, color, department, price} color+ = {color}

Page 14: Boyce-Codd NF & Lossless Decomposition Professor Sin-Min Lee

Closure Algorithm

Start with X={A1, …, An}.

Repeat until X doesn’t change do:

if B1, …, Bn C is a FD and B1, …, Bn are all in X then add C to X.

Start with X={A1, …, An}.

Repeat until X doesn’t change do:

if B1, …, Bn C is a FD and B1, …, Bn are all in X then add C to X.

{name, category}+ = {name, category, color, department, price}

name colorcategory departmentcolor, category price

name colorcategory departmentcolor, category price

Example:

Page 15: Boyce-Codd NF & Lossless Decomposition Professor Sin-Min Lee

Example

Compute {A,B}+ X = {A, B, }

Compute {A, F}+ X = {A, F, }

R(A,B,C,D,E,F) A, B CA, D EB DA, F B

A, B CA, D EB DA, F B

Page 16: Boyce-Codd NF & Lossless Decomposition Professor Sin-Min Lee

Using Closure to Infer ALL FDs

A, B CA, D BB D

A, B CA, D BB D

Example:

Step 1: Compute X+, for every X:

A+ = A, B+ = BD, C+ = C, D+ = DAB+ = ABCD, AC+ = AC, AD+ = ABCDABC+ = ABD+ = ACD+ = ABCD (no need to compute– why ?)BCD+ = BCD, ABCD+ = ABCD

A+ = A, B+ = BD, C+ = C, D+ = DAB+ = ABCD, AC+ = AC, AD+ = ABCDABC+ = ABD+ = ACD+ = ABCD (no need to compute– why ?)BCD+ = BCD, ABCD+ = ABCD

Step 2: Enumerate all FD’s X Y, s.t. Y X+ and XY = :

AB CD, ADBC, ABC D, ABD C, ACD BAB CD, ADBC, ABC D, ABD C, ACD B

Page 17: Boyce-Codd NF & Lossless Decomposition Professor Sin-Min Lee

Problem: Finding FDs

Approach 1: During Database Design Designer derives them from real-world

knowledge of users Problem: knowledge might not be available

Approach 2: From a Database Instance Analyze given database instance and find all

FD’s satisfied by that instance Useful if designers don’t get enough

information from users Problem: FDs might be artifical for the given

instance

Page 18: Boyce-Codd NF & Lossless Decomposition Professor Sin-Min Lee

Find All FDs

Student Dept Course Room

Alice CSE C++ 020

Bob CSE C++ 020

Alice EE HW 040

Carol CSE DB 045

Dan CSE Java 050

Elsa CSE DB 045

Frank EE Circuits 020

Do all FDsmake sensein practice ?

Page 19: Boyce-Codd NF & Lossless Decomposition Professor Sin-Min Lee

Answer

Course Dept, RoomDept, Room CourseStudent, Dept Course, RoomStudent, Course Dept, RoomStudent, Room Dept, Course

Course Dept, RoomDept, Room CourseStudent, Dept Course, RoomStudent, Course Dept, RoomStudent, Room Dept, Course

Do all FDsmake sensein practice ?

Page 20: Boyce-Codd NF & Lossless Decomposition Professor Sin-Min Lee

Keys

A key is a set of attributes A1, ..., An s.t. for any other attribute B, we have A1, ..., An B

A minimal key is a set of attributes which is a key and for which no subset is a key

Note: book calls them superkey and key

Page 21: Boyce-Codd NF & Lossless Decomposition Professor Sin-Min Lee

Computing Keys

Compute X+ for all sets X If X+ = all attributes, then X is a key List only the minimal keys

Note: there can be many minimal keys !

Example: R(A,B,C), ABC, BCAMinimal keys: AB and BC

Page 22: Boyce-Codd NF & Lossless Decomposition Professor Sin-Min Lee

Examples of Keys

Product(name, price, category, color)name, category pricecategory color

Keys are: {name, category} and all supersets

Enrollment(student, address, course, room, time)student addressroom, time coursestudent, course room, time

Keys are:

Page 23: Boyce-Codd NF & Lossless Decomposition Professor Sin-Min Lee

Relational Schema Design(or Logical Schema Design)

Main idea: Start with some relational schema Find out its FD’s Use them to design a better

relational schema

Page 24: Boyce-Codd NF & Lossless Decomposition Professor Sin-Min Lee

Data Anomalies

When a database is poorly designed we get anomalies:

Redundancy: data is repeated

Update anomalies: need to change in several places

Delete anomalies: may lose data when we don’t want

Page 25: Boyce-Codd NF & Lossless Decomposition Professor Sin-Min Lee

Relational Schema Design

Anomalies:• Redundancy = repeat data• Update anomalies = Fred moves to “Bellevue”• Deletion anomalies = Joe deletes his phone number:

what is his city ?

Example: Persons with several phones

SSN Name, CitySSN Name, City

Name SSN PhoneNumber

City

Fred 123-45-6789 206-555-1234

Seattle

Fred 123-45-6789 206-555-6543

Seattle

Joe 987-65-4321 908-555-2121

Westfieldbut not SSN PhoneNumber

Page 26: Boyce-Codd NF & Lossless Decomposition Professor Sin-Min Lee

Relation DecompositionBreak the relation into two:

Name SSN City

Fred 123-45-6789 Seattle

Joe 987-65-4321 Westfield

SSN PhoneNumber

123-45-6789 206-555-1234

123-45-6789 206-555-6543

987-65-4321 908-555-2121

Anomalies have gone:• No more repeated data• Easy to move Fred to “Bellevue” (how ?)• Easy to delete all Joe’s phone number (how ?)

Name SSN PhoneNumber City

Fred 123-45-6789 206-555-1234 Seattle

Fred 123-45-6789 206-555-6543 Seattle

Joe 987-65-4321 908-555-2121 Westfield

Page 27: Boyce-Codd NF & Lossless Decomposition Professor Sin-Min Lee

Relational Schema Design

PersonbuysProduct

name

price name ssn

Conceptual Model:

Relational Model:plus FD’s

Normalization:Eliminates anomalies

Page 28: Boyce-Codd NF & Lossless Decomposition Professor Sin-Min Lee

Decompositions in General

R1 = projection of R on A1, ..., An, B1, ..., Bm

R2 = projection of R on A1, ..., An, C1, ..., Cp

R(A1, ..., An, B1, ..., Bm, C1, ..., Cp) R(A1, ..., An, B1, ..., Bm, C1, ..., Cp)

R1(A1, ..., An, B1, ..., Bm)R1(A1, ..., An, B1, ..., Bm) R2(A1, ..., An, C1, ..., Cp)R2(A1, ..., An, C1, ..., Cp)

Page 29: Boyce-Codd NF & Lossless Decomposition Professor Sin-Min Lee

Decomposition

Sometimes it is correct:

Name PriceCategory

Gizmo 19.99 Gadget

OneClick 24.99 Camera

Gizmo 19.99 Camera

Name Price

Gizmo 19.99

OneClick 24.99

Gizmo 19.99

NameCategory

Gizmo Gadget

OneClick Camera

Gizmo Camera

Lossless decomposition

Page 30: Boyce-Codd NF & Lossless Decomposition Professor Sin-Min Lee

Incorrect Decomposition

Sometimes it is not:

Name PriceCategory

Gizmo 19.99 Gadget

OneClick 24.99 Camera

Gizmo 19.99 Camera

NameCategory

Gizmo Gadget

OneClick Camera

Gizmo Camera

PriceCategory

19.99 Gadget

24.99 Camera

19.99 Camera

What’sincorrect ??

Lossy decomposition

Page 31: Boyce-Codd NF & Lossless Decomposition Professor Sin-Min Lee

Decompositions in General

R(A1, ..., An, B1, ..., Bm, C1, ..., Cp) R(A1, ..., An, B1, ..., Bm, C1, ..., Cp)

If A1, ..., An B1, ..., Bm

Then the decomposition is lossless

R1(A1, ..., An, B1, ..., Bm)R1(A1, ..., An, B1, ..., Bm) R2(A1, ..., An, C1, ..., Cp)R2(A1, ..., An, C1, ..., Cp)

Example: name price, hence the first decomposition is lossless

Note: don’t need necessarily A1, ..., An C1, ..., Cp

Page 32: Boyce-Codd NF & Lossless Decomposition Professor Sin-Min Lee

Normal Forms

First Normal Form = all attributes are atomic

Second Normal Form (2NF) = old and obsolete

Third Normal Form (3NF) = this lecture

Boyce Codd Normal Form (BCNF) = this lecture

Others...

Page 33: Boyce-Codd NF & Lossless Decomposition Professor Sin-Min Lee
Page 34: Boyce-Codd NF & Lossless Decomposition Professor Sin-Min Lee

R (J, K, L)

F = (JK L, L K)

Two candidate keys: JK and JL

R is in 3NFJK L JK is a superkey

L K K is prime

BCNF decomposition yields:R1 (L,K), R2 (L,J)

testing for JK L requires a join

There is some redundancy in R

Page 35: Boyce-Codd NF & Lossless Decomposition Professor Sin-Min Lee

Boyce-Codd Normal Form

A simple condition for removing anomalies from relations:

In English (though a bit vague):

Whenever a set of attributes of R is determining another attribute, it should determine all the attributes of R.

A relation R is in BCNF if:

If A1, ..., An B is a non-trivial dependency

in R , then {A1, ..., An} is a key for R

A relation R is in BCNF if:

If A1, ..., An B is a non-trivial dependency

in R , then {A1, ..., An} is a key for R

Page 36: Boyce-Codd NF & Lossless Decomposition Professor Sin-Min Lee
Page 37: Boyce-Codd NF & Lossless Decomposition Professor Sin-Min Lee
Page 38: Boyce-Codd NF & Lossless Decomposition Professor Sin-Min Lee

BCNF Decomposition Algorithm

A’s OthersB’s

R1

Is there a 2-attribute relation that isnot in BCNF ?

Repeat choose A1, …, Am B1, …, Bn that violates the BNCF condition split R into R1(A1, …, Am, B1, …, Bn) and R2(A1, …, Am, [others]) continue with both R1 and R2

Until no more violations

Repeat choose A1, …, Am B1, …, Bn that violates the BNCF condition split R into R1(A1, …, Am, B1, …, Bn) and R2(A1, …, Am, [others]) continue with both R1 and R2

Until no more violations

R2

Page 39: Boyce-Codd NF & Lossless Decomposition Professor Sin-Min Lee

Example

What are the dependencies?SSN Name, City

What are the keys?{SSN, PhoneNumber}

Is it in BCNF?

Name SSN PhoneNumber

City

Fred 123-45-6789 206-555-1234

Seattle

Fred 123-45-6789 206-555-6543

Seattle

Joe 987-65-4321 908-555-2121

Westfield

Joe 987-65-4321 908-555-1234

Westfield

Page 40: Boyce-Codd NF & Lossless Decomposition Professor Sin-Min Lee

Decompose it into BCNF

Name SSN City

Fred 123-45-6789 Seattle

Joe 987-65-4321 Westfield

SSN PhoneNumber

123-45-6789 206-555-1234

123-45-6789 206-555-6543

987-65-4321 908-555-2121

987-65-4321 908-555-1234

SSN Name, City

Let’s check anomalies:• Redundancy ?• Update ?• Delete ?

Page 41: Boyce-Codd NF & Lossless Decomposition Professor Sin-Min Lee

Summary of BCNF Decomposition

Find a dependency that violates the BCNF condition:

A’sOthers B’s

R1 R2

Heuristics: choose B , B , … B “as large as possible”1 2 m

Decompose:

2-attribute relations are BCNF

Continue untilthere are noBCNF violationsleft.

A1, A2, …, An B1, B2, …, Bm

Page 42: Boyce-Codd NF & Lossless Decomposition Professor Sin-Min Lee

Example Decomposition Person(name, SSN, age, hairColor, phoneNumber)

SSN name, ageage hairColor

Decompose in BCNF (in class):

Step 1: find all keys (How ? Compute S+, for various sets S)

Step 2: now decompose

Page 43: Boyce-Codd NF & Lossless Decomposition Professor Sin-Min Lee

Other Example

R(A,B,C,D) A B, B C

Key: AD Violations of BCNF: A B, A C, ABC Pick A BC: split into R1(A,BC)

R2(A,D) What happens if we pick A B first ?

Page 44: Boyce-Codd NF & Lossless Decomposition Professor Sin-Min Lee

Lossless Decompositions A decomposition is lossless if we can recover: R(A,B,C)

R1(A,B) R2(A,C)

R’(A,B,C) should be the same as R(A,B,C)

R’ is in general larger than R. Must ensure R’ = R

Decompose

Recover

Page 45: Boyce-Codd NF & Lossless Decomposition Professor Sin-Min Lee

Lossless Decompositions

Given R(A,B,C) s.t. AB, the decomposition into R1(A,B), R2(A,C) is lossless

Page 46: Boyce-Codd NF & Lossless Decomposition Professor Sin-Min Lee

3NF: A Problem with BCNF

Unit Company Product

Unit Company

Unit Product

FD’s: Unit Company; Company, Product UnitSo, there is a BCNF violation, and we decompose.

Unit Company

No FDs

Notice: we loose the FD: Company, Product Unit

Page 47: Boyce-Codd NF & Lossless Decomposition Professor Sin-Min Lee

So What’s the Problem?

Unit Company Product

Unit Company Unit Product

Galaga99 UW Galaga99 databasesBingo UW Bingo databases

No problem so far. All local FD’s are satisfied.Let’s put all the data back into a single table again (anomalies?):

Galaga99 UW databasesBingo UW databases

Violates the dependency: company, product -> unit!

Page 48: Boyce-Codd NF & Lossless Decomposition Professor Sin-Min Lee

Solution: 3rd Normal Form (3NF)

A simple condition for removing anomalies from relations:

A relation R is in 3rd normal form if :

Whenever there is a nontrivial dependency A1, A2, ..., An Bfor R , then {A1, A2, ..., An } is a key for R, or B is part of a key.

A relation R is in 3rd normal form if :

Whenever there is a nontrivial dependency A1, A2, ..., An Bfor R , then {A1, A2, ..., An } is a key for R, or B is part of a key.

Tradeoff:BCNF = no anomalies, but may lose some FDs3NF = keeps all FDs, but may have some anomalies

Page 49: Boyce-Codd NF & Lossless Decomposition Professor Sin-Min Lee

Purpose of Normalization

To reduce the chances for anomalies to occur in a database.

normalization prevents the possible corruption of databases stemming from what are called “insertion   anomalies," "deletion anomalies," and "update anomalies."

Page 50: Boyce-Codd NF & Lossless Decomposition Professor Sin-Min Lee

Insertion Anomaly

A failure to place a new database entry into all the places in the database where that new entry needs to be stored.

In a properly normalized database, a new entry needs to be inserted into only one place in the database

Page 51: Boyce-Codd NF & Lossless Decomposition Professor Sin-Min Lee

Deletion Anomaly

A failure to remove an existing database entry when it is time to remove that entry.

In a properly normalized database, an old, to-be-gotten-rid-of entry needs to be deleted from only one place in the database

Page 52: Boyce-Codd NF & Lossless Decomposition Professor Sin-Min Lee

Update anomaly

An update of a database involves modifications that may be additions, deletions, or both. Thus "update anomalies" can be either of the kinds of anomalies discussed above.

Page 53: Boyce-Codd NF & Lossless Decomposition Professor Sin-Min Lee
Page 54: Boyce-Codd NF & Lossless Decomposition Professor Sin-Min Lee
Page 55: Boyce-Codd NF & Lossless Decomposition Professor Sin-Min Lee
Page 56: Boyce-Codd NF & Lossless Decomposition Professor Sin-Min Lee
Page 57: Boyce-Codd NF & Lossless Decomposition Professor Sin-Min Lee
Page 58: Boyce-Codd NF & Lossless Decomposition Professor Sin-Min Lee
Page 59: Boyce-Codd NF & Lossless Decomposition Professor Sin-Min Lee
Page 60: Boyce-Codd NF & Lossless Decomposition Professor Sin-Min Lee
Page 61: Boyce-Codd NF & Lossless Decomposition Professor Sin-Min Lee
Page 62: Boyce-Codd NF & Lossless Decomposition Professor Sin-Min Lee
Page 63: Boyce-Codd NF & Lossless Decomposition Professor Sin-Min Lee
Page 64: Boyce-Codd NF & Lossless Decomposition Professor Sin-Min Lee