schema refinement and normal formsefreidoc.fr/l3/bdd/cours/2011-12 : cours complet en...•...

Post on 22-Apr-2020

7 Views

Category:

Documents

0 Downloads

Preview:

Click to see full reader

TRANSCRIPT

Schema Refinement and Normal Forms

V2.0 2011/09/09 Jeffrey D. Ullman – Jean-Michel Busca

Outline

1. Problem statment

• Redundancy, Anomalies

• Refinement Process, Properties

2. Functional Dependencies

3. Normal Forms

4. Decomposition Algorithm

1

Why Refine Schemas?

Problems:

• Sally's address is recorded multiple times

• the same for Miller's manufacturer

• Bob does not like beer: NULL values

2

name addr beer manf Sally 123 Maple Bud A.B. Sally 123 Maple Miller Pete's Tom 45 Bank Miller Pete's Bob 10 River NULL NULL

Why Refine Schemas? (2)

Consequences:

1. redundancy and NULL values waste space

in the database

2. redundancy can lead to several anomalies

when updating data

3

name addr beer manf Sally 10 Bank Bud A.B. Sally 123 Maple Miller Pete's Tom 45 Bank Miller Pete's Bob 10 River NULL NULL

Update Anomalies

If Sally moves:

• we need to change all her tuples

• otherwise an inconsistency is created

4

If Sally does not like Bud anymore:

• we loose track of Bud's manufacturer

(if Sally was the only one to like Bud)

name addr beer manf Sally 10 Bank Bud A.B. Sally 123 Maple Miller Pete's Tom 45 Bank Miller Pete's Bob 10 River NULL NULL

Delete Anomalies

5

If we now state that Bob likes Bud Lite:

• we need to know Bud Lite's manufacturer

(if Bob is the first to like Bud Lite)

name addr beer manf Sally 10 Bank Bud A.B. Sally 123 Maple Miller Pete's Tom 45 Bank Miller Pete's Bob 10 River Bud Lite ???

Insert Anomalies

6

Schema Refinement

The goal of schema refinement is to redesign the database schema so as to eliminate:

• redundancy and NULL values

• update, delete, insert anomalies

Schema refinement relies on the theory of Functional Dependencies.

7

Example: Functional Dependencies

Two functional dependencies here:

• name determines addr (name -> addr)

• beer determines manf (beer -> manf)

• ??? can be deduced from name/beer

• this is why data is redundant

8

name addr beer manf Sally 123 Maple Bud A.B. Sally ??? Miller Pete's Tom 45 Bank Miller ???

Refinement Process

How do we know refinement is needed?

• Redundancy arises when a relation schema forces an association between attributes that is not natural.

• Functional dependencies help identify such situations and suggest one or more refinements.

9

Refinement Process (2)

Redundant relations are then decomposed into smaller non-redundant ones.

• Non-redundant relations are projections of the original relation.

• Example: BadDesign(name, addr, beer, manuf) is decomposed into:

- Drinkers(name, addr)

- Beers(beer, manf)

- Likes(name, beer)

10

Decomposition Properties

There are two important properties of a decomposition:

1. Lossless Join : it should be possible to project the original relations onto the decomposed schema, and then reconstruct the original.

2. Dependency Preservation : it should be possible to check in the projected relations whether all the given FD’s are satisfied. 11

Outline

1. Problem statment

2. Functional Dependencies

• Definition, Axiomes, Closure Set

• Super Keys, Keys

3. Normal Forms

4. Decomposition Algorithm

12

13

Functional Dependencies

X ->Y reads "Y functionally depends on X" or "X determines Y"

X ->Y is an assertion about a relation R that whenever two tuples of R agree on all the attributes of X, then they also agree on all attributes of Y:

• For any tuples t1, t2 of R

• If t1[X] = t2[X] then t1[Y] = t2[Y]

• Say "X -> Y holds in R"

14

Functional Dependencies (2)

Conventions:

• X, Y, Z,… represent sets of attributes.

• A, B, C,… represent single attributes.

• no set formers in sets of attributes:

- just ABC, rather than {A,B,C }.

Armstrong's Axioms

Reflexivity

If Y ⊆ X then X -> Y

Augmentation

If X -> Y then XZ -> YZ

Transitivity

If X -> Y and Y -> Z then X -> Z

15

Armstrong's Axioms (2)

Derived properties:

Union

If X -> Y and X -> Z then X -> YZ

Decomposition

If X -> YZ then X -> Y and X -> Z

Pseudo-transitivity

If X -> Y and V,Y -> Z then VX -> Z

16

Elementary FD

X -> A is an elementary FD iif:

1. A is a single attribute

2. A does not belong to X

3. There is no X' X such that X' -> A

When looking for FDs, we are only concerned by elementary FDs.

17

18

Example: FD’s

Drinkers(name, addr, beersLiked, manf, favBeer)

Reasonable FD’s to assert:

1. name -> addr favBeer

Note this FD is the same as

name -> addr

name -> favBeer

2. beersLiked -> manf

19

Example: Possible Data

name addr beersLiked manf favBeer Janeway Voyager Bud A.B. WickedAle Janeway Voyager WickedAle Pete’s WickedAle Spock Enterprise Bud A.B. Bud

Because name -> addr Because name -> favBeer

Because beersLiked -> manf

20

Keys of Relations

FDs generalize the notion of key.

K is a superkey for relation R if

K determines all attributes of R.

K is a key for R if K is a superkey,

but no proper subset of K is a

superkey: K is a minimal superkey.

21

Example: Superkey

Drinkers(name, addr, beersLiked, manf, favBeer)

name, beersLiked is a superkey because together these attributes determine all the other attributes.

• name -> addr favBeer

• beersLiked -> manf

22

Example: Key

name, beersLiked is a key because neither name nor beersLiked is a superkey.

• name doesn’t -> manf

• beersLiked doesn’t -> addr

There are no other keys, but lots of superkeys.

• Any superset of name, beersLiked.

23

How to find relevant FDs?

1. Look for IDs:

• SSn is an ID for People.

• Thus SSn->name, SSn->firstName, etc.

2. Look for properties:

• "No two courses can meet in the same room at the same time."

• Thus: hour room -> course

24

How to find relevant FDs? (2)

3. Infer other elementary FDs using

Armstrong's axioms and properties.

• If A -> B and B -> C,

• Then A -> C also holds.

Outline

1. Problem statment

2. Functional Dependencies

3. Normal Forms

• Definition

• 1NF, 2NF, 3NF

4. Decomposition Algorithm

25

Normal Forms

A Normal Form defines the quality of a database schema with respect to redundancy.

Several Normal Forms have been proposed, based on FDs:

• 1NF, 2NF, 3NF

• Boyce-Codd NF, 4NF, 5NF, …

26

Normal Forms (2)

Note: Other NFs exist that are based on more sophisticated integrity constraints, such as multivalued dependencies and join dependencies.

NFs have increasingly restrictive requirements, i.e.

• a schema in 2NF is also in 1NF

• a schema in 3NF is also in 2NF

27

First Normal Form

A schema is 1FN if all attributes are atomic, i.e. neither compound or multi-valued.

Every relational schema is 1NF by definition.

28

Second Normal Form

A schema is in 2NF iif:

1. It is in 1NF

2. For every non-key attribute A, for every candidate key K, A depends on the whole key K rather than just a part of it.

Counter example:

R(A, B, C, D) with A -> C

R is not in 2NF

29

30

Example: 2NF?

Drinkers(name, addr, beersLiked, manf, favBeer)

FD’s: name->addr, name->favBeer, beersLiked->manf

The only key is name, beersLiked.

adr depends on a subset of the key.

Conclusion: Drinkers is not in 2NF and thus not in 3NF either.

Third Normal Form

A schema is in 3NF iif:

1. It is in 2NF

2. For every non-key attribute A, A does not depend on another non-key attribute.

Counter example:

R(A, B, C) with B -> C

R is not in 3NF

31

32

Example: 3NF?

Beers(name, manf, manfAddr) FD’s: name->manf, manf->manfAddr, name->manfAddr

Only key is name.

The key has only one attribute: Beers is in 2NF.

manf->manfAddr: there is a FD between two non-key attributes.

Conclusion: Beers is not in 3NF.

Outline

1. Problem statment

2. Functional Dependencies

3. Normal Forms

4. Decomposition Algorithm

33

2FN Decomposition Algorithm

Start: a relation R in 1NF

For every key subset X that determines non-key attributes A, B, C, …:

• create a new relation Rx with primary key X and A, B, C, … as non-key attributes

• remove attributes A, B, C, … from original relation R

34

3FN Decomposition Algorithm

Start: a relation R in 2NF

For every non-key attribute set X that determines non-key attributes A, B, C, …:

• create a new relation Rx with primary key X and A, B, C, … as non-key attributes

• remove attributes A, B, C, … from original relation R

35

3FN Decomposition Algorithm (2)

Note that Rx is not necessarily in 3NF.

If so, repeat the process on Rx.

36

top related