marcus klang ([email protected]) normalization...

47
Normalization Anomalies Functional Dependencies Closures Key Computation Projecting Relations BCNF Reconstructing Information Other Normal Forms Marcus Klang ([email protected]) Normalization 1 / 47

Upload: others

Post on 20-Feb-2020

4 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Marcus Klang (marcus.klang@cs.lth.se) Normalization 1/47fileadmin.cs.lth.se/cs/Education/EDAF20/2020/lectures/f6/... · 2020-02-05 · Normalization Anomalies Functional Dependencies

NormalizationAnomalies

Functional Dependencies

Closures

Key Computation

Projecting Relations

BCNF

Reconstructing Information

Other Normal Forms

Marcus Klang ([email protected]) Normalization 1 / 47

Page 2: Marcus Klang (marcus.klang@cs.lth.se) Normalization 1/47fileadmin.cs.lth.se/cs/Education/EDAF20/2020/lectures/f6/... · 2020-02-05 · Normalization Anomalies Functional Dependencies

Normalization, Background

This “feels right”:titleyearlengthfilmType

Movie

nameStar0..*0..* stars-in

This doesn’t:

titleyearlengthfilmTypestarName

Movie

What happens if we convert the second model into a relation with theschema Movies(title, year, length, filmType, starName)?Anomalies occur — see next slide.

Marcus Klang ([email protected]) Normalization 2 / 47

Page 3: Marcus Klang (marcus.klang@cs.lth.se) Normalization 1/47fileadmin.cs.lth.se/cs/Education/EDAF20/2020/lectures/f6/... · 2020-02-05 · Normalization Anomalies Functional Dependencies

Anomalies

starName

Dana CarverEmilio EstevezHarrison FordMark HamillCarrie Fisher

Mike Myerscolor

color

colorcolor

color

filmType

color95Wayne's World 1992

Star Wars 1977 1241977Star Wars 124

95Wayne's World 19921991Mighty Ducks 104

Star Wars 1241977lengthyeartitle

Redundancy — information is repeated in several tuples (length,filmType).

Update anomalies — you must be careful to change every occurrenceof a value (the length of Star Wars must be changed in three places).

Deletion anomalies — if a set of values becomes empty we may loseother information (delete Estevez ⇒ all information about MightyDucks is lost).

Marcus Klang ([email protected]) Normalization 3 / 47

Page 4: Marcus Klang (marcus.klang@cs.lth.se) Normalization 1/47fileadmin.cs.lth.se/cs/Education/EDAF20/2020/lectures/f6/... · 2020-02-05 · Normalization Anomalies Functional Dependencies

Normalization

If we have a relation, e.g., Movies on the previous slide, there exists aformal procedure to:

1 discover anomalies, and

2 decompose (“split”) the relation into two (or more) relations withoutanomalies.

This procedure is called normalization. Normalization builds on the theoryof functional dependencies.

(In the example on the previous slides, common sense would have led usright — it seems “unnatural” to put the starName attribute in theMovies entity set. However, there are other examples where commonsense may not suffice.)

Marcus Klang ([email protected]) Normalization 4 / 47

Page 5: Marcus Klang (marcus.klang@cs.lth.se) Normalization 1/47fileadmin.cs.lth.se/cs/Education/EDAF20/2020/lectures/f6/... · 2020-02-05 · Normalization Anomalies Functional Dependencies

Functional Dependencies

A functional dependency, FD, on a relation R is a statement of the form:

Definition

If two tuples of R agree on the attributes A1,A2, . . . ,An, then they mustalso agree on another attribute, B. Notation:

A1A2 . . .An → B

A functional dependency is a constraint on a schema and must hold for allinstances of a schema. Example:

Persons(persNo, name, address)

persNo → namepersNo → address

or, shorter:

persNo → name address

Marcus Klang ([email protected]) Normalization 5 / 47

Page 6: Marcus Klang (marcus.klang@cs.lth.se) Normalization 1/47fileadmin.cs.lth.se/cs/Education/EDAF20/2020/lectures/f6/... · 2020-02-05 · Normalization Anomalies Functional Dependencies

FD Example

Relation Movies(title, year, length, filmType, studioName,

starName). The following FD’s hold:

title year → lengthtitle year → filmTypetitle year → studioName

or:

title year → length filmType studioName

but the following FD does not hold:

title year → starName // wrong!

This is because there may be several stars in a movie.

Marcus Klang ([email protected]) Normalization 6 / 47

Page 7: Marcus Klang (marcus.klang@cs.lth.se) Normalization 1/47fileadmin.cs.lth.se/cs/Education/EDAF20/2020/lectures/f6/... · 2020-02-05 · Normalization Anomalies Functional Dependencies

FD’s Are About the Schema

Notice again that you cannot find FD’s by looking at one specific instanceof a relation. FD’s are semantic properties that concern the meaning ofthe attributes.

Example: by looking at the instance of the Movies schema below, youmight be led to believe that the FD title → filmType holds. This is nottrue (e.g., there are three versions of King Kong, two in color and one inblack-and-white).

starName

Dana CarverEmilio EstevezHarrison FordMark HamillCarrie Fisher

Mike Myerscolor

color

colorcolor

color

filmType

color95Wayne's World 1992

Star Wars 1977 1241977Star Wars 124

95Wayne's World 19921991Mighty Ducks 104

Star Wars 1241977lengthyeartitle

Marcus Klang ([email protected]) Normalization 7 / 47

Page 8: Marcus Klang (marcus.klang@cs.lth.se) Normalization 1/47fileadmin.cs.lth.se/cs/Education/EDAF20/2020/lectures/f6/... · 2020-02-05 · Normalization Anomalies Functional Dependencies

Keys of Relations

We have already defined the concept of a key for a relation informally. Aformal definition:

Definition

A set of one or more attributes A1,A2, . . . ,An is a key for a relation R if:

1 Those attributes functionally determine all other attributes of R.

2 No proper subset of A1,A2, . . . ,An functionally determines all otherattributes of R.

The last point means that a key must be minimal.

Example: {title, year, starName} is a key for the relation Movies.

A relation may have more than one key. In that case, one of the keys ischosen as the primary key. A set of attributes that contains a key is calleda superkey (“superset of key”).

Marcus Klang ([email protected]) Normalization 8 / 47

Page 9: Marcus Klang (marcus.klang@cs.lth.se) Normalization 1/47fileadmin.cs.lth.se/cs/Education/EDAF20/2020/lectures/f6/... · 2020-02-05 · Normalization Anomalies Functional Dependencies

Simple Rules About FD’s

1 Transitivity (the most important rule):

A→ B ∧ B → C ⇒ A→ C

2 Splitting/combining (just a convenient notation):

A→ B ∧ A→ C ⇔ A→ BC

3 Trivial dependencies that always hold, where the right side of an FDis a subset of the left side:

AB → A

Marcus Klang ([email protected]) Normalization 9 / 47

Page 10: Marcus Klang (marcus.klang@cs.lth.se) Normalization 1/47fileadmin.cs.lth.se/cs/Education/EDAF20/2020/lectures/f6/... · 2020-02-05 · Normalization Anomalies Functional Dependencies

Closure of a Set of Attributes

Definition

The closure of a set of attributes, {A1,A2, . . . ,An}, under a set S of FD’sis the set of attributes B such that A1A2 . . .An → B. That is,A1A2 . . .An → B follows from the FD’s in S .

The closure of {A1,A2, . . . ,An} is denoted {A1,A2, . . . ,An}+.

Closures are used for finding keys (slide 12). The algorithm for computingclosures works by adding attributes on the right side of FD’s to an initialset.

Example on next slide.

Marcus Klang ([email protected]) Normalization 10 / 47

Page 11: Marcus Klang (marcus.klang@cs.lth.se) Normalization 1/47fileadmin.cs.lth.se/cs/Education/EDAF20/2020/lectures/f6/... · 2020-02-05 · Normalization Anomalies Functional Dependencies

Closure Computation, Example

Relation R(A,B,C ,D,E ,F ).

FD’s: AB → C , BC → AD, D → E , CF → B.

Compute X = {A,B}+:

1 Start with X = {A,B}2 AB → C , so C can be added, X = {A,B,C}3 BC → AD, so AD can be added, X = {A,B,C ,D}4 D → E , so E can be added, X = {A,B,C ,D,E}5 Nothing more can be added, so {A,B}+ = {A,B,C ,D,E}

From this we can infer that AB → D and AB → E follows from the initialset of FD’s. (But not AB → F , so {A,B} is not a key for R.)

Marcus Klang ([email protected]) Normalization 11 / 47

Page 12: Marcus Klang (marcus.klang@cs.lth.se) Normalization 1/47fileadmin.cs.lth.se/cs/Education/EDAF20/2020/lectures/f6/... · 2020-02-05 · Normalization Anomalies Functional Dependencies

Key Computation I

The same relation and FD’s as in the previous example:

R(A,B,C ,D,E ,F )

FD1. AB → CFD2. BC → ADFD3. D → EFD4. CF → B

What are the keys in this relation? To answer this you have to computethe closures of all subsets of attributes. The keys are the subsets whoseclosure contains all five attributes.

Initial observation: F is not on the right-hand side of any FD. This meansthat all keys must contain F .

One-attribute subsets containing F :

{F}+ = {F}

Marcus Klang ([email protected]) Normalization 12 / 47

Page 13: Marcus Klang (marcus.klang@cs.lth.se) Normalization 1/47fileadmin.cs.lth.se/cs/Education/EDAF20/2020/lectures/f6/... · 2020-02-05 · Normalization Anomalies Functional Dependencies

Key Computation II

Two-attribute subsets containing F :

{AF}+ ⇒ {AF}{BF}+ ⇒ {BF}{CF}+ ⇒ {CF} FD4⇒ {CFB} FD2⇒ {CFBAD} FD3⇒ {CFBADE}{DF}+ ⇒ {DF} FD3⇒ {DFE}{EF}+ ⇒ {EF}

This shows that {CF} is a key. So, when we examine three-attributesubsets we don’t have to consider subsets that contain {CF} (such subsetswould be superkeys).

Marcus Klang ([email protected]) Normalization 13 / 47

Page 14: Marcus Klang (marcus.klang@cs.lth.se) Normalization 1/47fileadmin.cs.lth.se/cs/Education/EDAF20/2020/lectures/f6/... · 2020-02-05 · Normalization Anomalies Functional Dependencies

Key Computation III

Three-attribute subsets containing F but not C :

{ABF}+ ⇒ {ABFCDE}{ADF}+ ⇒ {ADFE}{AEF}+ ⇒ {AEF}{BDF}+ ⇒ {BDFE}{BEF}+ ⇒ {BEF}{DEF}+ ⇒ {DEF}

{ABF} is a key. The four-attribute subsets {ADEF} and {BDEF} are notkeys, so the keys are {CF} and {ABF}.

Marcus Klang ([email protected]) Normalization 14 / 47

Page 15: Marcus Klang (marcus.klang@cs.lth.se) Normalization 1/47fileadmin.cs.lth.se/cs/Education/EDAF20/2020/lectures/f6/... · 2020-02-05 · Normalization Anomalies Functional Dependencies

Normalization again

From slide 4. There exists a formal procedure to:

1 discover anomalies, and

2 decompose (“split”) the relation into two (or more) relations withoutanomalies.

This procedure is called normalization.

There are several “normal forms” (1st, 2nd, 3rd, Boyce-Codd, 4th, . . . ).The most important is Boyce-Codd normal form (BCNF).

Most books start by first defining the first normal form, then thesecond, . . . Our book defines BCNF directly. Advantages with this:

BCNF removes most anomalies, and

it is easy to check if a relation is in BCNF.

Marcus Klang ([email protected]) Normalization 15 / 47

Page 16: Marcus Klang (marcus.klang@cs.lth.se) Normalization 1/47fileadmin.cs.lth.se/cs/Education/EDAF20/2020/lectures/f6/... · 2020-02-05 · Normalization Anomalies Functional Dependencies

Anomalies again

From slide 3:

starName

Dana CarverEmilio EstevezHarrison FordMark HamillCarrie Fisher

Mike Myerscolor

color

colorcolor

color

filmType

color95Wayne's World 1992

Star Wars 1977 1241977Star Wars 124

95Wayne's World 19921991Mighty Ducks 104

Star Wars 1241977lengthyeartitle

Redundancy — information is repeated in several tuples (length,filmType).

Update anomalies — you must be careful to change every occurrenceof a value (the length of Star Wars must be changed in three places).

Deletion anomalies — if a set of values becomes empty we may loseother information (delete Estevez ⇒ all information about MightyDucks is lost).

Marcus Klang ([email protected]) Normalization 16 / 47

Page 17: Marcus Klang (marcus.klang@cs.lth.se) Normalization 1/47fileadmin.cs.lth.se/cs/Education/EDAF20/2020/lectures/f6/... · 2020-02-05 · Normalization Anomalies Functional Dependencies

BCNF, Boyce-Codd Normal Form

Definition

A relation R is in BCNF if and only if: whenever there is a nontrivial FDA1A2 . . .An → B for R, it is the case that A1,A2, . . . ,An is a superkey forR.

Example. The relation Movies(title, year, length, filmType,

starName) has the FD’s:

title year starName → length filmTypetitle year → length filmType

{title, year, starName} is the key.

{title, year} is not a superkey, i.e., it is not a superset of {title,year, starName}.

Movies is not in BCNF.

Marcus Klang ([email protected]) Normalization 17 / 47

Page 18: Marcus Klang (marcus.klang@cs.lth.se) Normalization 1/47fileadmin.cs.lth.se/cs/Education/EDAF20/2020/lectures/f6/... · 2020-02-05 · Normalization Anomalies Functional Dependencies

A Relation in BCNF

Consider the relation Movies1(title, year, length, filmType), i.e.,Movies without the starName. This relation has the only FD:

title year → length filmType

{title, year} is the key. There are no other non-trivial FD’s, so Movies1

is in BCNF.

Marcus Klang ([email protected]) Normalization 18 / 47

Page 19: Marcus Klang (marcus.klang@cs.lth.se) Normalization 1/47fileadmin.cs.lth.se/cs/Education/EDAF20/2020/lectures/f6/... · 2020-02-05 · Normalization Anomalies Functional Dependencies

Decomposition into BCNF

By repeatedly applying suitable decompositions, we can split any relationinto smaller relations that are in BCNF.

Important: it must be possible to reconstruct the original relation instanceexactly by joining the decomposed relation instances. The reconstructionwill be shown later.

The following decomposition algorithm meets this goal:

BCNF decomposition

1 Start with a BCNF-violating FD, A1A2 . . .An → B1B2 . . .Bm.Optionally expand the right-hand side as much as possible (closure).

2 Create a new relation with all the attributes of the FD, i.e., all theA’s and all the B’s.

3 Create a new relation with the left-hand side of the FD, i.e., all theA’s, plus all the attributes not involved in the FD.

4 Repeat if necessary.

Marcus Klang ([email protected]) Normalization 19 / 47

Page 20: Marcus Klang (marcus.klang@cs.lth.se) Normalization 1/47fileadmin.cs.lth.se/cs/Education/EDAF20/2020/lectures/f6/... · 2020-02-05 · Normalization Anomalies Functional Dependencies

BCNF Decomposition, Example

Movies(title, year, length, filmType, starName).

BCNF-violating FD: title year → length filmType

New relation: Movies1(title, year, length, filmType)

New relation: Movies2(title, year, starName)

We have already checked that Movies1 is in BCNF. Movies2 is also inBCNF — it contains no FD’s at all, so the key is all three attributes.

Marcus Klang ([email protected]) Normalization 20 / 47

Page 21: Marcus Klang (marcus.klang@cs.lth.se) Normalization 1/47fileadmin.cs.lth.se/cs/Education/EDAF20/2020/lectures/f6/... · 2020-02-05 · Normalization Anomalies Functional Dependencies

Another BCNF Decomposition

Consider the relation

MovieStudios(title, year, length, filmType, studioName,studioAddr)

The only key is {title, year}.

An obvious FD is studioName → studioAddr, and studioName is not asuperkey. Decomposition:

BCNF-violating FD: studioName → studioAddr

New relation: MovieStudios1(studioName, studioAddr)

New relation: MovieStudios2(title, year, length,

filmType, studioName)

Marcus Klang ([email protected]) Normalization 21 / 47

Page 22: Marcus Klang (marcus.klang@cs.lth.se) Normalization 1/47fileadmin.cs.lth.se/cs/Education/EDAF20/2020/lectures/f6/... · 2020-02-05 · Normalization Anomalies Functional Dependencies

What Causes non-BCNF?

The causes of BCNF-violations:

Putting a many-many relationship in one relation (the first example,with Movies containing star names).

Transitive dependencies (the second example, where studioName →studioAddr).

Marcus Klang ([email protected]) Normalization 22 / 47

Page 23: Marcus Klang (marcus.klang@cs.lth.se) Normalization 1/47fileadmin.cs.lth.se/cs/Education/EDAF20/2020/lectures/f6/... · 2020-02-05 · Normalization Anomalies Functional Dependencies

Projecting Relations (1)

An unnormalized relation is normalized by splitting the relation in two (ormore) relations. This is done by eliminating certain attributes from therelation schema. It is called projection.

Question: what FD’s hold in the projected relation? Example:

R(A,B,C ,D) with FD’s A→ B, B → C , C → D.

B is removed from R. What FD’s hold in the new relation S(A,C ,D)?

We compute S ’s FDs by taking the closure of all subsets of S ’s attributesover R’s FD’s. For each closure of a set X , we add the FD X → E that isin X+ and in the schema of S , but not in X . However, the procedure canoften be simplified.

Marcus Klang ([email protected]) Normalization 23 / 47

Page 24: Marcus Klang (marcus.klang@cs.lth.se) Normalization 1/47fileadmin.cs.lth.se/cs/Education/EDAF20/2020/lectures/f6/... · 2020-02-05 · Normalization Anomalies Functional Dependencies

Projecting Relations (2)

We start with all singleton sets:

{A}+ = {A,B,C ,D} gives A→ C , A→ D{C}+ = {C ,D} gives C → D{D}+ = {D} gives nothing

Since {A}+ includes all attributes of S , any superset of {A} will not yieldany more information. Thus, we don’t need to test any more set containingA. (If we have A→ D, then it’s not interesting with AC → D as well)

This gives us only one doubleton set:

{C ,D}+ = {C ,D} gives nothing

Hence, the FD’s for S are: A→ C , A→ D, C → D.

Observe that A→ C and A→ D hold in the projected relation.

Marcus Klang ([email protected]) Normalization 24 / 47

Page 25: Marcus Klang (marcus.klang@cs.lth.se) Normalization 1/47fileadmin.cs.lth.se/cs/Education/EDAF20/2020/lectures/f6/... · 2020-02-05 · Normalization Anomalies Functional Dependencies

An Important Note on Projecting

Remember that FD’s are semantic statements about the data in theschema. FD’s hold regardless of the decomposition of data into relationsand cannot disappear just because data items are split over severalrelations.

The previous example again:

R(A,B,C ,D) with FD’s A→ B, B → C , C → D.

B is removed from R. What FD’s hold in the new relation S(A,C ,D)?

Wrong reasoning

“A→ B cannot hold since B is not in S , B → C cannot hold since B isnot in S , so C → D is the only FD that holds in S .”

You must take dependencies that have been derived from the transitiverule into account!

Marcus Klang ([email protected]) Normalization 25 / 47

Page 26: Marcus Klang (marcus.klang@cs.lth.se) Normalization 1/47fileadmin.cs.lth.se/cs/Education/EDAF20/2020/lectures/f6/... · 2020-02-05 · Normalization Anomalies Functional Dependencies

Another Projection Example

Assume that we have the following relation and FDs:

MS(title, year, length, filmType, studioName, studioAddr)title year → length filmType studioNamestudioName → studioAddr

If we project this relation into a new relation M2 by removing the attributestudioName, we get:

M2(title, year, length, filmType, studioAddr)title year → length filmTypetitle year → studioAddr

Note the second FD for M2, which we get from the transitive dependencyin MS. This makes sense, since if we know the title and year of a movie,then we also know the address of the studio who owns it.

Marcus Klang ([email protected]) Normalization 26 / 47

Page 27: Marcus Klang (marcus.klang@cs.lth.se) Normalization 1/47fileadmin.cs.lth.se/cs/Education/EDAF20/2020/lectures/f6/... · 2020-02-05 · Normalization Anomalies Functional Dependencies

Common Sense I

We started our discussion of normalization by saying that the followingE/R diagram “feels right”, but then we did not take it as a basis for therelations:

titleyearlengthfilmType

Movie

nameStar0..*0..* stars-in

If we had followed the rules for translating an E/R model into relations wewould have ended up with the following relations:

Movies(title, year, length, filmType)Stars(starName) [superfluous if every star is in a movie]StarsIn(title, year, starName)

i.e., exactly the same relations as Movies1 and Movies2 on slide 20.

Marcus Klang ([email protected]) Normalization 27 / 47

Page 28: Marcus Klang (marcus.klang@cs.lth.se) Normalization 1/47fileadmin.cs.lth.se/cs/Education/EDAF20/2020/lectures/f6/... · 2020-02-05 · Normalization Anomalies Functional Dependencies

Common Sense II

The example with movies and their owning studios should be modeled likethis:

titleyearlengthfilmType

Movie

studioNamestudioAddr

Studio10..* owns

By the rules, this would have resulted in the following relations:

Movies(title, year, length, filmType, studioName)Studios(studioName, studioAddr)

i.e., exactly the same relations as MovieStudios2 and MovieStudios1 onslide 21.

Marcus Klang ([email protected]) Normalization 28 / 47

Page 29: Marcus Klang (marcus.klang@cs.lth.se) Normalization 1/47fileadmin.cs.lth.se/cs/Education/EDAF20/2020/lectures/f6/... · 2020-02-05 · Normalization Anomalies Functional Dependencies

Common Sense III

Conclusions:

Careful E/R modeling is essential for proper understanding of aproblem.

But careful E/R modeling also results in relations that are betternormalized than if you start with relations directly.

This is not to say that careful E/R modeling always results in normalizedrelations. Consider the following model, which “feels right” but gives arelation that isn’t in BCNF:

pNbrnamestreetAddrpostCodecity

Person

Marcus Klang ([email protected]) Normalization 29 / 47

Page 30: Marcus Klang (marcus.klang@cs.lth.se) Normalization 1/47fileadmin.cs.lth.se/cs/Education/EDAF20/2020/lectures/f6/... · 2020-02-05 · Normalization Anomalies Functional Dependencies

Reconstructing Information

We earlier said that “we must be able to reconstruct the original relationinstance exactly from the decomposed relation instances”. Suchdecompositions are called “lossless”.

The reconstruction is performed by joining two relations. Two tuplescan be joined if they agree on the values of an attribute (or values of setsof attributes).

The decomposition algorithm presented earlier, which is based on FD’s,yields relations that may be reconstructed by joining on the attributes onthe left-hand sides of the FD.

Other “ad hoc” algorithms may yield relations that, when joined,contain spurious (false) tuples (“lossy” decompositions).

Marcus Klang ([email protected]) Normalization 30 / 47

Page 31: Marcus Klang (marcus.klang@cs.lth.se) Normalization 1/47fileadmin.cs.lth.se/cs/Education/EDAF20/2020/lectures/f6/... · 2020-02-05 · Normalization Anomalies Functional Dependencies

Testing for Lossless Join

In the book the chase algorithm, which tests for lossless join, is described.Not part of the course.

A simpler test, which can be used only for testing decomposition intotwo relations, is the following:

Lossless Test

A decomposition of a relation in two relations R1 and R2 is lossless if:

R1 ∩ R2 is a superkey of R1, or

R1 ∩ R2 is a superkey of R2.

Marcus Klang ([email protected]) Normalization 31 / 47

Page 32: Marcus Klang (marcus.klang@cs.lth.se) Normalization 1/47fileadmin.cs.lth.se/cs/Education/EDAF20/2020/lectures/f6/... · 2020-02-05 · Normalization Anomalies Functional Dependencies

A Reconstruction Example

A relation R(A,B,C ) with only one FD:

FD1. A→ B

R is not in BCNF: {A,C} is the only key, FD1 violates the BCNFcondition. A decomposition of R that is not done according to the BCNFrules is R1(A,B) and R2(B,C ). Both R1 and R2 are in BCNF.

Example projection and reconstruction:

3C

4221 2

BA

221 2

BA3C

422B

422322

3C

4211 2

BAproject⇒

reconstruct(join on B)⇒

The tuples (1, 2, 4) and (2, 2, 3) were not in the original relation and arespurious. Exercise: decompose according to the BCNF rules and thencheck the same example.

Marcus Klang ([email protected]) Normalization 32 / 47

Page 33: Marcus Klang (marcus.klang@cs.lth.se) Normalization 1/47fileadmin.cs.lth.se/cs/Education/EDAF20/2020/lectures/f6/... · 2020-02-05 · Normalization Anomalies Functional Dependencies

More Reconstruction

The same example as on the previous slide, with better names for theattributes. People have names and own cars. A person may own manycars, and a car may be owned by many persons. This is described by therelation R(pNo, name, carNo), with the FD:

FD1. pNo → name

{pNo, carNo} is the only key, FD1 violates the BCNF condition. Adecomposition of R that is not done according to the BCNF rules isR1(pNo, name) and R2(name, carNo). Both R1 and R2 are in BCNF,since they have only two attributes each.

With these attributes, it is obvious that the decomposition in R1 and R2

is stupid. If we instead decompose according to the BCNF rules, we getthe relations S1(pNo, name) and S2(pNo, carNo), which can be joinedon pNo.

Marcus Klang ([email protected]) Normalization 33 / 47

Page 34: Marcus Klang (marcus.klang@cs.lth.se) Normalization 1/47fileadmin.cs.lth.se/cs/Education/EDAF20/2020/lectures/f6/... · 2020-02-05 · Normalization Anomalies Functional Dependencies

Other Normal Forms, Motivation

Sometimes, BCNF is “too strong”, in the sense that we may loseimportant FD’s if we decompose into BCNF.

Example: the relation Bookings(title, theater, city) describesthat a movie plays in a theater in a city. FD’s:

FD1. theater → cityFD2. title city → theater

FD1 says (unrealistically) that all theaters have different names. FD2 says(unrealistically) that two theaters in the same city never show the samemovie.

The keys are {title, city} and {theater, title}. So, FD1 is aBCNF violation. Attempt at decomposition on next slide.

Marcus Klang ([email protected]) Normalization 34 / 47

Page 35: Marcus Klang (marcus.klang@cs.lth.se) Normalization 1/47fileadmin.cs.lth.se/cs/Education/EDAF20/2020/lectures/f6/... · 2020-02-05 · Normalization Anomalies Functional Dependencies

Other Normal Forms, cont’d

If we decompose the relation Bookings into BCNF we get the relations:

R1(theater, city)R2(theater, title)

These relations can be updated independently of each other. But when wedo that, we cannot check that FD2, title city → theater, holds. I.e.,when we join the relations (on theater) we may get tuples for which FD2does not hold.

Menlo ParkParkGuild Menlo Park

citytheater

The NetParkGuild The Net

titletheater

The Nettitle

The NetMenlo ParkParkGuild Menlo Park

citytheater

join (on theater)

Marcus Klang ([email protected]) Normalization 35 / 47

Page 36: Marcus Klang (marcus.klang@cs.lth.se) Normalization 1/47fileadmin.cs.lth.se/cs/Education/EDAF20/2020/lectures/f6/... · 2020-02-05 · Normalization Anomalies Functional Dependencies

3NF, Third Normal Form

Third normal form is a relaxation of BCNF. Definition:

Definition

A relation R is in third normal form if and only if: whenever there is anontrivial FD A1A2 . . .An → B for R, it is the case that A1,A2, . . . ,An isa superkey for R, or B is a member of some key.

The definition is the same as for BCNF with the addition “or B is amember of some key”.

In the relation Bookings(title, theater, city) this allows theBCNF-violating FD:

theater → city

Since city is a member of the key {title, city}, Bookings is in 3NF.

Marcus Klang ([email protected]) Normalization 36 / 47

Page 37: Marcus Klang (marcus.klang@cs.lth.se) Normalization 1/47fileadmin.cs.lth.se/cs/Education/EDAF20/2020/lectures/f6/... · 2020-02-05 · Normalization Anomalies Functional Dependencies

Another Example

The following relation is not in BCNF:

Persons(persNo, name, street, postCode, city)

The problem is with the postal code. FD’s:

FD1. persNo → name street postCode cityFD2. postCode → cityFD3. street city → postCode

FD2 and FD3 are BCNF violations. Notice that the relation is not even in3NF, since neither city nor postCode is a member of a key.

Marcus Klang ([email protected]) Normalization 37 / 47

Page 38: Marcus Klang (marcus.klang@cs.lth.se) Normalization 1/47fileadmin.cs.lth.se/cs/Education/EDAF20/2020/lectures/f6/... · 2020-02-05 · Normalization Anomalies Functional Dependencies

Decomposition 1

We can decompose the Persons relation into BCNF relations. If we startwith FD2, we get:

R1(postCode, city)R2(persNo, name, street, postCode)

Both R1 and R2 are in BCNF.

This decomposition eliminates the redundancy that the city has to bementioned several times for each postal code. And if we change the cityfor a postal code, we only have to perform the change in one place.

In the decomposition we have lost the possibility to check FD3, streetcity → postCode.

Marcus Klang ([email protected]) Normalization 38 / 47

Page 39: Marcus Klang (marcus.klang@cs.lth.se) Normalization 1/47fileadmin.cs.lth.se/cs/Education/EDAF20/2020/lectures/f6/... · 2020-02-05 · Normalization Anomalies Functional Dependencies

Decomposition 1 Comments

In practice, you probably wouldn’t bother to decompose the Persons

relation:

How often are postal codes changed? Probably never or very seldom.

How costly is the extra storage requirement? Probably not very costly.

How often do you wish to check FD3? Probably never.

There is also an advantage to keeping the original relation. If youdecompose, you have to perform a join on two tables every time you wishto access a person’s full address.

Marcus Klang ([email protected]) Normalization 39 / 47

Page 40: Marcus Klang (marcus.klang@cs.lth.se) Normalization 1/47fileadmin.cs.lth.se/cs/Education/EDAF20/2020/lectures/f6/... · 2020-02-05 · Normalization Anomalies Functional Dependencies

Decomposition 2

If we instead start decomposing with FD3 we get:

R1(street, city, postCode)R2(persNo, name, street, city)

R1 is in 3NF, R2 is in BCNF. We do not wish to decompose R1 further —we would need to join three tables in order to access a person’s fulladdress.

Conclusions:

Sometimes, unnormalized relations and the resulting redundancy areacceptable.

But you must be able to motivate why you choose to useunnormalized relations (and to be able to realize that you are usingthem . . . ).

Marcus Klang ([email protected]) Normalization 40 / 47

Page 41: Marcus Klang (marcus.klang@cs.lth.se) Normalization 1/47fileadmin.cs.lth.se/cs/Education/EDAF20/2020/lectures/f6/... · 2020-02-05 · Normalization Anomalies Functional Dependencies

More Redundancy

Some redundancy is not detected by the BCNF condition.Consider the relation StarMovie(name, street, city, title,

year): a star has a name, may have several addresses, and may star inseveral movies. Example instance:

1983C. Fisher Return of the Jedi5 Locust Ln. Malibu123 Maple St. 1983Return of the JediHollywoodC. Fisher

Empire Strikes BackC. Fisher 5 Locust Ln. 1980MalibuC. Fisher Empire Strikes BackHollywood123 Maple St. 1980

city

MalibuHollywood

street

5 Locust Ln.123 Maple St.

Star WarsC. Fisher 1977C. Fisher 1977Star Wars

yeartitlename

There is obvious redundancy in this relation, but the relation is still inBCNF (it has no FD’s at all).

Marcus Klang ([email protected]) Normalization 41 / 47

Page 42: Marcus Klang (marcus.klang@cs.lth.se) Normalization 1/47fileadmin.cs.lth.se/cs/Education/EDAF20/2020/lectures/f6/... · 2020-02-05 · Normalization Anomalies Functional Dependencies

Common Sense Again

In the relation StarMovie we have put two many–many relationships inone relation:

streetcity

AddressnameStar0..*0..* title

year

Movie0..* 0..*

Naturally, this is not a reasonable thing to do, but it illustrates the needfor yet another normal form.

Marcus Klang ([email protected]) Normalization 42 / 47

Page 43: Marcus Klang (marcus.klang@cs.lth.se) Normalization 1/47fileadmin.cs.lth.se/cs/Education/EDAF20/2020/lectures/f6/... · 2020-02-05 · Normalization Anomalies Functional Dependencies

Multivalued Dependencies

In the relation StarMovie there are no FD’s at all. For example, name →street city does not hold, since a star may have several addresses.However, for each star there is a well-defined set of addresses. Also, foreach star there is a well-defined set of movies. Furthermore, these sets areindependent of each other.

This is called a multivalued dependency (MVD) and is denoted (notethe two-headed arrows):

name � street cityname � title year

MVD’s always come in pairs.

Marcus Klang ([email protected]) Normalization 43 / 47

Page 44: Marcus Klang (marcus.klang@cs.lth.se) Normalization 1/47fileadmin.cs.lth.se/cs/Education/EDAF20/2020/lectures/f6/... · 2020-02-05 · Normalization Anomalies Functional Dependencies

A Formal Definition of MVD

Definition

In a relation R, the MVD

A1A2 . . .An � B1B2 . . .Bm

holds if: for each pair of tuples t and u of relation R that agree on all theA’s, we can find in R some tuple v that agrees:

1 With both t and u on the A’s,

2 With t on the B’s,

3 With u on all attributes of R that are not among the A’s or the B’s.

Note that v may be = t or = u.

a1 b1 c1

a1 b1 c2

a1 b2 c2

t

v

uMarcus Klang ([email protected]) Normalization 44 / 47

Page 45: Marcus Klang (marcus.klang@cs.lth.se) Normalization 1/47fileadmin.cs.lth.se/cs/Education/EDAF20/2020/lectures/f6/... · 2020-02-05 · Normalization Anomalies Functional Dependencies

Reasoning About MVD’s

Some rules about MVD’s are similar to the FD rules.

Trivial dependencies have the same definition.

The transitive rule is the same.

The splitting rule for FD’s does not hold for MVD’s.

Every FD is a MVD.

Marcus Klang ([email protected]) Normalization 45 / 47

Page 46: Marcus Klang (marcus.klang@cs.lth.se) Normalization 1/47fileadmin.cs.lth.se/cs/Education/EDAF20/2020/lectures/f6/... · 2020-02-05 · Normalization Anomalies Functional Dependencies

4NF, Fourth Normal Form

Definition

A relation R is in fourth normal form if and only if: whenever there is anontrivial MVD A1A2 . . .An � B for R, it is the case that A1,A2, . . . ,An

is a superkey for R.

I.e., the same definition as BCNF but FD is changed to MVD.

The decomposition into 4NF is analogous to BCNF decomposition. Whenthe StarMovie relation is decomposed into 4NF relations, the relationsbecome:

R1(name, street, city)R2(name, title, year)

Marcus Klang ([email protected]) Normalization 46 / 47

Page 47: Marcus Klang (marcus.klang@cs.lth.se) Normalization 1/47fileadmin.cs.lth.se/cs/Education/EDAF20/2020/lectures/f6/... · 2020-02-05 · Normalization Anomalies Functional Dependencies

Normal Forms Hierarchy

Relations in 4NF ⊂ BCNF ⊂ 3NF (⊂ 2NF ⊂ 1NF).

Property 3NF BCNF 4NF

Eliminates redundancy due to FD’s Most Yes YesEliminates redundancy due to MVD’s No No YesPreserves FD’s Yes Maybe MaybePreserves MVD’s Maybe Maybe Maybe

You should aim for at least BCNF for all relations. In some cases, 3NF isacceptable.

Again: you have to know what normal form your relations are in, and toknow why if you choose a lower form than BCNF.

Marcus Klang ([email protected]) Normalization 47 / 47