chapter 8: top-down relational database design: normalization · database system concepts, 6th ed....

89
Database System Concepts, 6 th Ed. ©Silberschatz, Korth and Sudarshan See www.db-book.com for conditions on re-use Chapter 8: Top-down Relational Database Design: NORMALIZATION

Upload: others

Post on 29-Jul-2020

11 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Chapter 8: Top-down Relational Database Design: NORMALIZATION · Database System Concepts, 6th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-use Chapter 8: Top-down

Database System Concepts, 6th Ed.

©Silberschatz, Korth and Sudarshan

See www.db-book.com for conditions on re-use

Chapter 8: Top-down Relational

Database Design: NORMALIZATION

Page 2: Chapter 8: Top-down Relational Database Design: NORMALIZATION · Database System Concepts, 6th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-use Chapter 8: Top-down

©Silberschatz, Korth and Sudarshan 8.2 Database System Concepts - 6th Edition

What can happen when we combine

relations/tables?

Suppose we combine the tables

instructor and department

This creates redundancy (repetition of

information):

Page 3: Chapter 8: Top-down Relational Database Design: NORMALIZATION · Database System Concepts, 6th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-use Chapter 8: Top-down

©Silberschatz, Korth and Sudarshan 8.3 Database System Concepts - 6th Edition

What can happen when we combine

relations/tables?

Another problem: UPDATE

When any of the redundant info is changed, the changes

hav eto be applied to multiple tuples!

Page 4: Chapter 8: Top-down Relational Database Design: NORMALIZATION · Database System Concepts, 6th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-use Chapter 8: Top-down

©Silberschatz, Korth and Sudarshan 8.4 Database System Concepts - 6th Edition

What can happen when we combine

relations/tables?

Another problem: INSERTION

When a new department is created, there are no

instructors associated with it yet → need to use NULL

values!

Page 5: Chapter 8: Top-down Relational Database Design: NORMALIZATION · Database System Concepts, 6th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-use Chapter 8: Top-down

©Silberschatz, Korth and Sudarshan 8.5 Database System Concepts - 6th Edition

Is there any reason to combine

relations/tables?

Any query that involves a natural join

between department and instructor will

execute faster on the combined table!

This is generally preferred in data mining.

Page 6: Chapter 8: Top-down Relational Database Design: NORMALIZATION · Database System Concepts, 6th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-use Chapter 8: Top-down

©Silberschatz, Korth and Sudarshan 8.6 Database System Concepts - 6th Edition

The “top-down” approach

In this chapter, we look at the problem in the

opposite direction:

Starting with a large table that contains many

columns and much redundant information, how can

we split (decompose) it into tables with fewer

columns and less redundancy?

Page 7: Chapter 8: Top-down Relational Database Design: NORMALIZATION · Database System Concepts, 6th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-use Chapter 8: Top-down

©Silberschatz, Korth and Sudarshan 8.7 Database System Concepts - 6th Edition

Suppose we start with the table inst_dept. How would we

get the idea to decompose it into instructor and

department?

Naïve approach: spot redundancies in data … but it

doesn’t work for two reasons!

Top-down: Decomposition

Page 8: Chapter 8: Top-down Relational Database Design: NORMALIZATION · Database System Concepts, 6th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-use Chapter 8: Top-down

©Silberschatz, Korth and Sudarshan 8.8 Database System Concepts - 6th Edition

Problem 1: It’s costly

Real-life DBs can have large amount of data (hundreds

of columns, hundreds of millions of rows)

Spotting redundancies requires consideration of

combinations of elements from a set that is already

large →

→ Combinatorial explosion (N-squared and worse)

Spotting redundancies in data

Page 9: Chapter 8: Top-down Relational Database Design: NORMALIZATION · Database System Concepts, 6th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-use Chapter 8: Top-down

©Silberschatz, Korth and Sudarshan 8.9 Database System Concepts - 6th Edition

Problem 2: From data alone, it’s impossible to decide

whether a pattern discovered is coincidence or not

Is it the case that departments always reside in one

building and have a unique budget?

Spotting redundancies in data

Page 10: Chapter 8: Top-down Relational Database Design: NORMALIZATION · Database System Concepts, 6th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-use Chapter 8: Top-down

©Silberschatz, Korth and Sudarshan 8.10 Database System Concepts - 6th Edition

Solution:

Examine not the data itself (a.k.a. syntax), but the meaning

of the data, a.k.a. the semantics!

The designer must be allowed to specify rules of the

enterprise, a.k.a. functional dependencies, e.g.

dept_name → building, budget

Page 11: Chapter 8: Top-down Relational Database Design: NORMALIZATION · Database System Concepts, 6th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-use Chapter 8: Top-down

©Silberschatz, Korth and Sudarshan 8.11 Database System Concepts - 6th Edition

dept_name building, budget

What does it mean?

“If several rows have the same value for dept_name,

then they also have the same values for building and

budget.”

or

“If there were a schema (dept_name, building, budget),

then dept_name would be a candidate key.”

Page 12: Chapter 8: Top-down Relational Database Design: NORMALIZATION · Database System Concepts, 6th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-use Chapter 8: Top-down

©Silberschatz, Korth and Sudarshan 8.12 Database System Concepts - 6th Edition

Since in our table inst_dept dept_name is not a candidate key,

the building and budget of a department may have to be repeated

along with dept_name.

This indicates the need to decompose inst_dept.

dept_name building, budget

“If there were a schema (dept_name, building, budget),

then dept_name would be a candidate key.”

Page 13: Chapter 8: Top-down Relational Database Design: NORMALIZATION · Database System Concepts, 6th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-use Chapter 8: Top-down

©Silberschatz, Korth and Sudarshan 8.13 Database System Concepts - 6th Edition

This example also shows how functional dependencies (FD) are

different from keys: a FD captures a rule that is in general more

granular than a key.

A key is a FD, but a FD is not always a key!

dept_name building, budget

“If there were a schema (dept_name, building, budget),

then dept_name would be a candidate key.”

Page 14: Chapter 8: Top-down Relational Database Design: NORMALIZATION · Database System Concepts, 6th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-use Chapter 8: Top-down

©Silberschatz, Korth and Sudarshan 8.14 Database System Concepts - 6th Edition

Not all decompositions are good!

Suppose we decompose

employee(ID, name, street, city, salary)

into

employee1 (ID, name)

employee2 (name, street, city, salary)

Problem: we cannot reconstruct the original employee relation!

Page 15: Chapter 8: Top-down Relational Database Design: NORMALIZATION · Database System Concepts, 6th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-use Chapter 8: Top-down

©Silberschatz, Korth and Sudarshan 8.15 Database System Concepts - 6th Edition

A lossy decomposition

Page 16: Chapter 8: Top-down Relational Database Design: NORMALIZATION · Database System Concepts, 6th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-use Chapter 8: Top-down

©Silberschatz, Korth and Sudarshan 8.16 Database System Concepts - 6th Edition

But there are also lossless decompositions!

Technically it’s called a lossless-join decomposition

Decomposition of R = (A, B, C)

R1 = (A, B) R2 = (B, C)

A B

1

2

A

B

1

2

r B,C(r)

A (r) B (r) A B

1

2

C

A

B

B

1

2

C

A

B

C

A

B

A,B(r)

Page 17: Chapter 8: Top-down Relational Database Design: NORMALIZATION · Database System Concepts, 6th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-use Chapter 8: Top-down

©Silberschatz, Korth and Sudarshan 8.17 Database System Concepts - 6th Edition

How to avoid lossy decompositions?

Page 18: Chapter 8: Top-down Relational Database Design: NORMALIZATION · Database System Concepts, 6th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-use Chapter 8: Top-down

©Silberschatz, Korth and Sudarshan 8.18 Database System Concepts - 6th Edition

Goal — Devise a theory for the following

Decide whether a particular relation R is in “good” form.

When the relation R is not in “good” form, decompose

it into a set of relations {R1, R2, ..., Rn} such that

each relation is in good form

the decomposition is a lossless-join decomposition

Our theory is based on dependencies:

functional dependencies

multivalued dependencies

The process outlined above is called NORMALIZATION

Page 19: Chapter 8: Top-down Relational Database Design: NORMALIZATION · Database System Concepts, 6th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-use Chapter 8: Top-down

©Silberschatz, Korth and Sudarshan 8.19 Database System Concepts - 6th Edition

8.2 First Normal Form (1NF)

A domain is atomic if its elements are treated by the

DBMS as indivisible units.

Examples of non-atomic domains:

Names with first +middle + last

IDs that can be broken up into parts (e.g. CS401)

Phone numbers

Any composite attributes!

A relational schema R is in first normal form (1NF) if

the domains of all attributes of R are atomic.

For now, we assume all relations to be in 1NF (but see

Ch.22: Object-Based Databases)

Page 20: Chapter 8: Top-down Relational Database Design: NORMALIZATION · Database System Concepts, 6th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-use Chapter 8: Top-down

©Silberschatz, Korth and Sudarshan 8.20 Database System Concepts - 6th Edition

1NF

Atomicity is actually a property of how the elements of the

domain are used!

Example: Students are given roll numbers which are

strings of the form CS0012 or EE1127

Strings would normally be considered indivisible …

… but if the first two characters are extracted to find

the dept., the domain of roll numbers is not atomic.

Doing so is a bad idea: leads to encoding of

information in the app. program rather than in the DB.

Why is this bad?

What should the DB designer do in this case?

Page 21: Chapter 8: Top-down Relational Database Design: NORMALIZATION · Database System Concepts, 6th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-use Chapter 8: Top-down

©Silberschatz, Korth and Sudarshan 8.21 Database System Concepts - 6th Edition

8.3 Functional Dependencies (FD)

FDs are constraints on the set of legal relations.

Require that the value for a certain set of attributes

determine uniquely the value for another set of

attributes.

A FD is a generalization of the concept of key: A key

requires that the value for a certain set of attributes

determine uniquely the value for all remaining

attributes.

Page 22: Chapter 8: Top-down Relational Database Design: NORMALIZATION · Database System Concepts, 6th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-use Chapter 8: Top-down

©Silberschatz, Korth and Sudarshan 8.22 Database System Concepts - 6th Edition

Functional Dependencies

Let R be a relation schema, and a, b two sets of attributes

R and R

The functional dependency

holds on R if and only if for any legal relations r(R), whenever any two tuples t1 and t2 of r agree on the attributes , they also agree on the attributes . That is,

t1[] = t2 [] t1[ ] = t2 [ ]

Example: Consider r(A,B ) with the following instance of r.

On this instance, A B does NOT hold, but B A does hold.

1 4

1 5

3 7

Page 23: Chapter 8: Top-down Relational Database Design: NORMALIZATION · Database System Concepts, 6th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-use Chapter 8: Top-down

©Silberschatz, Korth and Sudarshan 8.23 Database System Concepts - 6th Edition

QUIZ: Functional Dependencies

Decide if the following FDs hold or not:

A B

B A

{A, C} D

{A, B, C} D

Page 24: Chapter 8: Top-down Relational Database Design: NORMALIZATION · Database System Concepts, 6th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-use Chapter 8: Top-down

©Silberschatz, Korth and Sudarshan 8.24 Database System Concepts - 6th Edition

FD vs. key

K is a superkey for relation schema R if and only if K R

K is a candidate key for R if and only if

K R, and

for no K, R

FDs allow us to express constraints that cannot be expressed

using (super)keys. Consider the schema:

inst_dept (ID, name, salary, dept_name, building, budget )

We expect these FDs to hold:

dept_name building

ID building

but would not expect the following FD to hold:

dept_name salary

Page 25: Chapter 8: Top-down Relational Database Design: NORMALIZATION · Database System Concepts, 6th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-use Chapter 8: Top-down

©Silberschatz, Korth and Sudarshan 8.25 Database System Concepts - 6th Edition

QUIZ: FD vs. key

Decide if the following are candidate keys of not:

A

B

{A, C}

{A, B, C}

{A, B, C, D}

D

Page 26: Chapter 8: Top-down Relational Database Design: NORMALIZATION · Database System Concepts, 6th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-use Chapter 8: Top-down

©Silberschatz, Korth and Sudarshan 8.26 Database System Concepts - 6th Edition

Uses for FDs

Test relations to see if they are legal under a given set of

FDs.

If a relation r is legal under a set F of FDs, we say

that r satisfies F.

Specify constraints on the set of legal relations

We say that F holds on R if all legal relations on R

satisfy the set of FDs F.

Note: A specific instance of a relation schema may

satisfy a FD even if the FD does not hold on all legal

instances.

Example: a specific instance of instructor may, by

chance, satisfy

name ID.

Page 27: Chapter 8: Top-down Relational Database Design: NORMALIZATION · Database System Concepts, 6th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-use Chapter 8: Top-down

©Silberschatz, Korth and Sudarshan 8.27 Database System Concepts - 6th Edition

Trivial FD

A functional dependency is trivial if it is satisfied by all

instances of a relation

Example:

ID, name ID

name name

In general, is trivial if

Page 28: Chapter 8: Top-down Relational Database Design: NORMALIZATION · Database System Concepts, 6th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-use Chapter 8: Top-down

©Silberschatz, Korth and Sudarshan 8.28 Database System Concepts - 6th Edition

QUIZ: Trivial FDs

Give 4 examples of trivial FDs in this relation.

Page 29: Chapter 8: Top-down Relational Database Design: NORMALIZATION · Database System Concepts, 6th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-use Chapter 8: Top-down

©Silberschatz, Korth and Sudarshan 8.29 Database System Concepts - 6th Edition

The Holy Grail:

Closure of a set of FDs

Given a set F of FDs, there are certain other

FDs that are logically implied by F.

For example: If A B and B C, then we can

infer that A C

The set of all FDs logically implied by F is the

closure of F.

We denote the closure of F by F+.

F+ is a superset of F.

Page 30: Chapter 8: Top-down Relational Database Design: NORMALIZATION · Database System Concepts, 6th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-use Chapter 8: Top-down

©Silberschatz, Korth and Sudarshan 8.30 Database System Concepts - 6th Edition

8.3.2 Boyce-Codd Normal Form

is trivial (i.e., )

is a superkey for R

A relation schema R is in BCNF with respect to a set F of

FDs if for all FDs in F+ of the form

(where R and R), at least one of the following is

true:

Page 31: Chapter 8: Top-down Relational Database Design: NORMALIZATION · Database System Concepts, 6th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-use Chapter 8: Top-down

©Silberschatz, Korth and Sudarshan 8.31 Database System Concepts - 6th Edition

QUIZ: BCNF

is trivial (i.e., )

is a superkey for R

at least one of the following holds:

Is this schema in BCNF?

instr_dept (ID, name, salary, dept_name, building, budget )

Page 32: Chapter 8: Top-down Relational Database Design: NORMALIZATION · Database System Concepts, 6th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-use Chapter 8: Top-down

©Silberschatz, Korth and Sudarshan 8.32 Database System Concepts - 6th Edition

QUIZ: BCNF

is trivial (i.e., )

is a superkey for R

at least one of the following holds:

Is this schema in BCNF?

instr_dept (ID, name, salary, dept_name, building, budget )

No, because

dept_name building, budget

holds, but dept_name is not a superkey (Why?)

Page 33: Chapter 8: Top-down Relational Database Design: NORMALIZATION · Database System Concepts, 6th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-use Chapter 8: Top-down

©Silberschatz, Korth and Sudarshan 8.33 Database System Concepts - 6th Edition

Extra-credit QUIZ: BCNF

EOL1/3

Page 34: Chapter 8: Top-down Relational Database Design: NORMALIZATION · Database System Concepts, 6th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-use Chapter 8: Top-down

©Silberschatz, Korth and Sudarshan 8.34 Database System Concepts - 6th Edition

Quiz: What is the difference between keys and

functional dependencies (FD)?

Page 35: Chapter 8: Top-down Relational Database Design: NORMALIZATION · Database System Concepts, 6th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-use Chapter 8: Top-down

©Silberschatz, Korth and Sudarshan 8.35 Database System Concepts - 6th Edition

A key is a FD, but a FD is not always a key!

Quiz: What is the difference between keys and

functional dependencies (FD)?

Page 36: Chapter 8: Top-down Relational Database Design: NORMALIZATION · Database System Concepts, 6th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-use Chapter 8: Top-down

©Silberschatz, Korth and Sudarshan 8.36 Database System Concepts - 6th Edition

BCNF Decomposition

Suppose we have a schema R and a non-trivial dependency causes a violation of BCNF.

We decompose R into:

• (U )

• ( R - ( - ) )

Page 37: Chapter 8: Top-down Relational Database Design: NORMALIZATION · Database System Concepts, 6th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-use Chapter 8: Top-down

©Silberschatz, Korth and Sudarshan 8.37 Database System Concepts - 6th Edition

Decomposing a Schema into BCNF

We decompose R into:

• (U )

• ( R - ( - ) )

In our example:

= dept_name

= building, budget

and inst_dept is replaced by

(U ) = ( dept_name, building, budget )

( R - ( - ) ) = ( ID, name, salary, dept_name )

Page 38: Chapter 8: Top-down Relational Database Design: NORMALIZATION · Database System Concepts, 6th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-use Chapter 8: Top-down

©Silberschatz, Korth and Sudarshan 8.38 Database System Concepts - 6th Edition

QUIZ 1: BCNF

We decompose R into:

• (U )

• ( R - ( - ) )

Take = {A, B, C, D} = {C, D, E, F}, and the entire relation is R = {A,B,C,D,E,F,G,H}

What is the decomposition?

Page 39: Chapter 8: Top-down Relational Database Design: NORMALIZATION · Database System Concepts, 6th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-use Chapter 8: Top-down

©Silberschatz, Korth and Sudarshan 8.39 Database System Concepts - 6th Edition

QUIZ 2: BCNF

We decompose R into:

• (U )

• ( R - ( - ) )

Take = {A, B} = {E, F}, and the entire relation is R = {A,B,C,D,E,F,G,H}

What is the decomposition?

Page 40: Chapter 8: Top-down Relational Database Design: NORMALIZATION · Database System Concepts, 6th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-use Chapter 8: Top-down

©Silberschatz, Korth and Sudarshan 8.40 Database System Concepts - 6th Edition

QUIZ 3: BCNF

Is this relation in BCNF?

Hint: Rename the attributes A, B, C, ….

Page 41: Chapter 8: Top-down Relational Database Design: NORMALIZATION · Database System Concepts, 6th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-use Chapter 8: Top-down

©Silberschatz, Korth and Sudarshan 8.41 Database System Concepts - 6th Edition

QUIZ 3: BCNF

A: Not BCNF, b/c both FDs are violations!

Decompose it to BCNF!

Page 42: Chapter 8: Top-down Relational Database Design: NORMALIZATION · Database System Concepts, 6th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-use Chapter 8: Top-down

©Silberschatz, Korth and Sudarshan 8.42 Database System Concepts - 6th Edition

QUIZ 3: BCNF

Solution:

Page 43: Chapter 8: Top-down Relational Database Design: NORMALIZATION · Database System Concepts, 6th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-use Chapter 8: Top-down

©Silberschatz, Korth and Sudarshan 8.43 Database System Concepts - 6th Edition

Dependency Preservation

Constraints, including FDs, are costly to check in

practice unless they pertain to only one relation.

If it is sufficient to test only those dependencies on each

individual relation of a decomposition in order to

ensure that all functional dependencies hold, then

that decomposition is dependency preserving.

Page 44: Chapter 8: Top-down Relational Database Design: NORMALIZATION · Database System Concepts, 6th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-use Chapter 8: Top-down

©Silberschatz, Korth and Sudarshan 8.44 Database System Concepts - 6th Edition

BCNF and Dependency Preservation

ER model of a bank: A

customer can have

more than 1 personal

banker, but at most

one at any given

branch.

A ternary relationship-

set is needed:

Page 45: Chapter 8: Top-down Relational Database Design: NORMALIZATION · Database System Concepts, 6th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-use Chapter 8: Top-down

©Silberschatz, Korth and Sudarshan 8.45 Database System Concepts - 6th Edition

BCNF and Dependency Preservation

Implementation:

R = cust_banker_branch = (customer_id, employee_id,

branch_name, type)

FDs: FD1: employee_id branch_name

FD2: (customer_id, branch_name) (employee_id, type)

Is cust_banker_branch in BCNF?

Page 46: Chapter 8: Top-down Relational Database Design: NORMALIZATION · Database System Concepts, 6th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-use Chapter 8: Top-down

©Silberschatz, Korth and Sudarshan 8.46 Database System Concepts - 6th Edition

BCNF and Dependency Preservation

Implementation:

R = cust_banker_branch = (customer_id, employee_id,

branch_name, type)

FDs: FD1: employee_id branch_name

FD2: (customer_id, branch_name) (employee_id, type)

Apply the decomposition algorithm!

Page 47: Chapter 8: Top-down Relational Database Design: NORMALIZATION · Database System Concepts, 6th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-use Chapter 8: Top-down

©Silberschatz, Korth and Sudarshan 8.47 Database System Concepts - 6th Edition

BCNF and Dependency Preservation

Implementation:

R = cust_banker_branch = (customer_id, employee_id,

branch_name, type)

FDs: FD1: employee_id branch_name

FD2: (customer_id, branch_name) (employee_id, type)

Decomposition:

R1 = (employee_id, branch_name)

R2 = (customer_id, employee_id, type)

Problem: FD2 is now “spread” across

two relations!

Page 48: Chapter 8: Top-down Relational Database Design: NORMALIZATION · Database System Concepts, 6th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-use Chapter 8: Top-down

©Silberschatz, Korth and Sudarshan 8.48 Database System Concepts - 6th Edition

BCNF and Dependency Preservation

Conclusion:

BCNF is not dependency preserving (in

general)

Because it is not always possible to achieve both

BCNF and dependency preservation, we consider a

weaker normal form …

Page 49: Chapter 8: Top-down Relational Database Design: NORMALIZATION · Database System Concepts, 6th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-use Chapter 8: Top-down

©Silberschatz, Korth and Sudarshan 8.49 Database System Concepts - 6th Edition

Third Normal Form = 3NF

A relation schema R is in third normal form (3NF) if for all:

in F+

at least one of the following holds:

is trivial (i.e., )

is a superkey for R

Each attribute A in – is contained in a candidate key for R.

(NOTE: each attribute may be in a different candidate key)

If a relation is in BCNF it is in 3NF (since in BCNF one of the first two

conditions above must hold).

Third condition is a minimal relaxation of BCNF to ensure dependency

preservation.

Page 50: Chapter 8: Top-down Relational Database Design: NORMALIZATION · Database System Concepts, 6th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-use Chapter 8: Top-down

©Silberschatz, Korth and Sudarshan 8.50 Database System Concepts - 6th Edition

SKIP all other 3NF theory!

The only facts about 3NF we cover are

those on the previous slide!

Page 51: Chapter 8: Top-down Relational Database Design: NORMALIZATION · Database System Concepts, 6th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-use Chapter 8: Top-down

©Silberschatz, Korth and Sudarshan 8.51 Database System Concepts - 6th Edition

Whatever happened with 2NF?

In a nutshell, it forbids attributes to depend on parts of keys.

It is not of practical use anymore.

See Second normal form - Wikipedia, the free encyclopedia

for more details.

Page 52: Chapter 8: Top-down Relational Database Design: NORMALIZATION · Database System Concepts, 6th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-use Chapter 8: Top-down

©Silberschatz, Korth and Sudarshan 8.52 Database System Concepts - 6th Edition

Review of Normal Forms

Page 53: Chapter 8: Top-down Relational Database Design: NORMALIZATION · Database System Concepts, 6th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-use Chapter 8: Top-down

©Silberschatz, Korth and Sudarshan 8.53 Database System Concepts - 6th Edition

Updated list of Normalization Goals

Let R be a relation scheme with a set F of FDs:

Decide whether R is in “good” form.

If R is not in “good” form, decompose it into a set of relation schemes

{R1, R2, ..., Rn} such that :

each relation scheme is in good form

the decomposition is a lossless-join decomposition

Preferably, the decomposition should be dependency preserving.

Page 54: Chapter 8: Top-down Relational Database Design: NORMALIZATION · Database System Concepts, 6th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-use Chapter 8: Top-down

©Silberschatz, Korth and Sudarshan 8.54 Database System Concepts - 6th Edition

8.4 Functional-Dependency Theory

This is the formal theory that tells us which

functional dependencies are implied logically

by a given set of functional dependencies.

Page 55: Chapter 8: Top-down Relational Database Design: NORMALIZATION · Database System Concepts, 6th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-use Chapter 8: Top-down

©Silberschatz, Korth and Sudarshan 8.55 Database System Concepts - 6th Edition

Remember: Closure of a Set of FDs

Given a set F set of FDs, there are certain other

FDs that are logically implied by F.

E.g. transitivity: If A B and B C, then

also A C

The set of all functional dependencies logically

implied by F is the closure of F.

We denote the closure of F by F+.

Page 56: Chapter 8: Top-down Relational Database Design: NORMALIZATION · Database System Concepts, 6th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-use Chapter 8: Top-down

©Silberschatz, Korth and Sudarshan 8.56 Database System Concepts - 6th Edition

Armstrong’s Axioms

We can find F+, the closure of F, by repeatedly applying

Armstrong’s Axioms:

if , then (reflexivity)

if , then (augmentation)

if , and , then (transitivity)

These rules are

sound (They generate only FDs that actually hold)

complete (They generate all FDs that hold).

Page 57: Chapter 8: Top-down Relational Database Design: NORMALIZATION · Database System Concepts, 6th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-use Chapter 8: Top-down

©Silberschatz, Korth and Sudarshan 8.57 Database System Concepts - 6th Edition

QUIZ: Armstrong’s Axioms

Write Armstrong’s Axioms:

(reflexivity)

(augmentation)

(transitivity)

Page 58: Chapter 8: Top-down Relational Database Design: NORMALIZATION · Database System Concepts, 6th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-use Chapter 8: Top-down

©Silberschatz, Korth and Sudarshan 8.58 Database System Concepts - 6th Edition

QUIZ: Armstrong’s Axioms

Write Armstrong’s Axioms:

if , then (reflexivity)

if , then (augmentation)

if , and , then (transitivity)

Page 59: Chapter 8: Top-down Relational Database Design: NORMALIZATION · Database System Concepts, 6th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-use Chapter 8: Top-down

©Silberschatz, Korth and Sudarshan 8.59 Database System Concepts - 6th Edition

Examples of use of A’s Axioms

Given the following relation: R = (A, B, C, G, H, I)

and the set of FDs F = { A B

A C

CG H

CG I

B H}

Some other members of the closure F+ are:

A H

by transitivity from A B and B H

AG I

by augmenting A C with G, to get AG CG

and then transitivity with CG I

CG HI

by augmenting CG I to infer CG CGI,

and augmenting of CG H to infer CGI HI,

and then transitivity

Page 60: Chapter 8: Top-down Relational Database Design: NORMALIZATION · Database System Concepts, 6th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-use Chapter 8: Top-down

©Silberschatz, Korth and Sudarshan 8.60 Database System Concepts - 6th Edition

Your turn!

if , then (reflexivity)

if , then (augmentation)

if , and , then (transitivity)

Prove that

if and only if

Double implication:

L-to-R and R-to-L!

EOL 2/3

Page 61: Chapter 8: Top-down Relational Database Design: NORMALIZATION · Database System Concepts, 6th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-use Chapter 8: Top-down

©Silberschatz, Korth and Sudarshan 8.61 Database System Concepts - 6th Edition

QUIZ: BCNF and 3NF

Consider the following relation:

What non-trivial FDs exist?

Source: http://en.wikipedia.org/wiki/Boyce-Codd_normal_form

Page 62: Chapter 8: Top-down Relational Database Design: NORMALIZATION · Database System Concepts, 6th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-use Chapter 8: Top-down

©Silberschatz, Korth and Sudarshan 8.62 Database System Concepts - 6th Edition

QUIZ: BCNF and 3NF

F1: Person, Shop Type → Nearest Shop

F2: Nearest Shop → Shop Type

Source: http://en.wikipedia.org/wiki/Boyce-Codd_normal_form

Is the relation in BCNF?

Let’s simplify the notation:

AB → C and C → B.

Page 63: Chapter 8: Top-down Relational Database Design: NORMALIZATION · Database System Concepts, 6th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-use Chapter 8: Top-down

©Silberschatz, Korth and Sudarshan 8.63 Database System Concepts - 6th Edition

QUIZ: BCNF and 3NF

No, b/c C → B is a violation: C is not superkey.

Source: http://en.wikipedia.org/wiki/Boyce-Codd_normal_form

Is the relation in 3NF?

Let’s simplify the notation:

AB → C and C → B.

Page 64: Chapter 8: Top-down Relational Database Design: NORMALIZATION · Database System Concepts, 6th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-use Chapter 8: Top-down

©Silberschatz, Korth and Sudarshan 8.64 Database System Concepts - 6th Edition

QUIZ: BCNF and 3NF

No, b/c C → B is a violation: C is not superkey.

Source: http://en.wikipedia.org/wiki/Boyce-Codd_normal_form

Remember: 3NF has the following condition in addition to BCNF:

Each attribute A in – is contained in a candidate key for R. (NOTE: each attribute may be in a different candidate key)

Let’s simplify the notation:

AB → C and C → B.

Page 65: Chapter 8: Top-down Relational Database Design: NORMALIZATION · Database System Concepts, 6th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-use Chapter 8: Top-down

©Silberschatz, Korth and Sudarshan 8.65 Database System Concepts - 6th Edition

QUIZ: BCNF and 3NF

Do the BCNF decomposition

Source: http://en.wikipedia.org/wiki/Boyce-Codd_normal_form

B is part of the candidate key AB.

This shows that C → B is not a 3NF violation, so the relation is

in 3NF!

Let’s simplify the notation:

AB → C and C → B.

Page 66: Chapter 8: Top-down Relational Database Design: NORMALIZATION · Database System Concepts, 6th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-use Chapter 8: Top-down

©Silberschatz, Korth and Sudarshan 8.66 Database System Concepts - 6th Edition

QUIZ: BCNF and 3NF

Is the decomposition above dependency-preserving?

Source: http://en.wikipedia.org/wiki/Boyce-Codd_normal_form

R1 = {B, C} R2 = {A, C}

Let’s simplify the notation:

AB → C and C → B.

Page 67: Chapter 8: Top-down Relational Database Design: NORMALIZATION · Database System Concepts, 6th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-use Chapter 8: Top-down

©Silberschatz, Korth and Sudarshan 8.67 Database System Concepts - 6th Edition

More FD theorems, a.k.a. rules or

results

Exercise 8.26

Exercise 8.4

Exercise 8.5

Page 68: Chapter 8: Top-down Relational Database Design: NORMALIZATION · Database System Concepts, 6th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-use Chapter 8: Top-down

©Silberschatz, Korth and Sudarshan 8.68 Database System Concepts - 6th Edition

Naïve Algorithm for Computing F+

Apply as many axioms and theorems to derive new FDs!

Can you find 3 more FDs in this manner?

Do you see a problem with this approach?

Page 69: Chapter 8: Top-down Relational Database Design: NORMALIZATION · Database System Concepts, 6th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-use Chapter 8: Top-down

©Silberschatz, Korth and Sudarshan 8.69 Database System Concepts - 6th Edition

Extra-credit QUIZ

Page 70: Chapter 8: Top-down Relational Database Design: NORMALIZATION · Database System Concepts, 6th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-use Chapter 8: Top-down

©Silberschatz, Korth and Sudarshan 8.70 Database System Concepts - 6th Edition

Algorithm for Computing F+

To compute the closure F+ of a set of FDs F:

Assign F+ = F

repeat

for each FD f in F+

apply reflexivity and augmentation rules on f

add the resulting FDs to F +

for each pair of FDs f1and f2 in F +

if f1 and f2 can be combined using transitivity,

add the resulting FD to F +

until F + does not change any further

Page 71: Chapter 8: Top-down Relational Database Design: NORMALIZATION · Database System Concepts, 6th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-use Chapter 8: Top-down

©Silberschatz, Korth and Sudarshan 8.71 Database System Concepts - 6th Edition

QUIZ: Algorithm for Computing F+

To compute the closure F+ of a set of FDs F:

Assign F+ = F

repeat

for each FD f in F+

apply reflexivity and augmentation rules on f

add the resulting FDs to F +

for each pair of FDs f1and f2 in F +

if f1 and f2 can be combined using transitivity,

add the resulting FD to F +

until F + does not change any further

Apply the algorithm to R = {A, B, C}, with AB → C

Page 72: Chapter 8: Top-down Relational Database Design: NORMALIZATION · Database System Concepts, 6th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-use Chapter 8: Top-down

©Silberschatz, Korth and Sudarshan 8.72 Database System Concepts - 6th Edition

Closure of Attribute Sets

Since computing the entire closure F+ is in general a formidable

task, we set ourselves first a more modest goal:

Given a set of attributes , define the closure of under F

(denoted by +) as the set of attributes that are functionally

determined by under F

Algorithm to compute +, the closure of under F

result := ;

while (changes to result) do

for each in F do

begin

if result then result := result

end

Page 73: Chapter 8: Top-down Relational Database Design: NORMALIZATION · Database System Concepts, 6th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-use Chapter 8: Top-down

©Silberschatz, Korth and Sudarshan 8.73 Database System Concepts - 6th Edition

Example of Attribute Set Closure

R = (A, B, C, G, H, I)

F = {A B A C CG H CG I B H}

(AG)+

1. result = AG

2. result = ABCG (A C and A B)

3. result = ABCGH (CG H and CG AGBC)

4. result = ABCGHI (CG I and CG AGBCH)

Is AG a candidate key?

1. Is AG a super key?

1. Does AG R? Yes, b/c (AG)+ = R.

2. Is any subset of AG a superkey?

1. Does A R? No, b/c (A)+ ≠ R

2. Does G R? No, b/c (G)+ ≠ R

Stop (Why?)

Page 74: Chapter 8: Top-down Relational Database Design: NORMALIZATION · Database System Concepts, 6th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-use Chapter 8: Top-down

©Silberschatz, Korth and Sudarshan 8.74 Database System Concepts - 6th Edition

Uses of Attribute Closure Algorithm

Testing for superkey:

To test if is a superkey, we compute +, and

check if + contains all attributes of R

Testing if certain FDs hold:

To check if holds (is in F+), just check if

+

Another algorithm for computing closure F+ of F:

For each R, find the closure +

for each S +, we output the FD S

Still very expensive, but at least

we have a more systematic way

of doing it!

Page 75: Chapter 8: Top-down Relational Database Design: NORMALIZATION · Database System Concepts, 6th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-use Chapter 8: Top-down

©Silberschatz, Korth and Sudarshan 8.75 Database System Concepts - 6th Edition

Uses of Attribute Closure Algorithm

Testing for superkey:

To test if is a superkey, we compute +, and

check if + contains all attributes of R

Testing if a certain FDs holds:

To check if holds (is in F+), just check if

+

Another algorithm for computing closure F+ of F:

For each R, find the closure +

for each S +, we output the FD S

Page 76: Chapter 8: Top-down Relational Database Design: NORMALIZATION · Database System Concepts, 6th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-use Chapter 8: Top-down

©Silberschatz, Korth and Sudarshan 8.76 Database System Concepts - 6th Edition

QUIZ: Uses of Attribute Closure Alg.

Testing for superkey:

To test if is a superkey, we compute +, and

check if + contains all attributes of R

R = (A, B, C, D)

F = {A BC, B C, A B, AB C, BC → D}

Is AD a superkey?

Is AD a candidate key?

Page 77: Chapter 8: Top-down Relational Database Design: NORMALIZATION · Database System Concepts, 6th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-use Chapter 8: Top-down

©Silberschatz, Korth and Sudarshan 8.77 Database System Concepts - 6th Edition

QUIZ: Uses of Attribute Closure Alg.

Testing if a certain FDs holds:

To check if holds (is in F+), just check if

+

R = (A, B, C, D)

F = {A BC, B C, A B, AB C, BC → D}

Does AC → D hold ?

Page 78: Chapter 8: Top-down Relational Database Design: NORMALIZATION · Database System Concepts, 6th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-use Chapter 8: Top-down

©Silberschatz, Korth and Sudarshan 8.78 Database System Concepts - 6th Edition

Why is closure important for BCNF?

To check if a non-trivial dependency causes a violation of BCNF

1. compute + (the attribute closure of ), and

2. verify that it includes all attributes of R, that is, it is a superkey of R.

Simplified test: To check if a relation schema R is in BCNF, it suffices to check only the dependencies in the given set F for violation of BCNF, rather than checking all dependencies in F+.

If none of the dependencies in F causes a violation of BCNF, then none of the dependencies in F+ will cause a violation of BCNF either.

However, simplified test using only F is incorrect when testing a relation in a decomposition of R

Consider R = (A, B, C, D, E), with F = { A B, BC D}

Decompose R into R1 = (A, B) and R2 = (A, C, D, E)

Neither of the dependencies in F contain only attributes from (A,C,D,E) so we might be mislead into thinking R2 satisfies BCNF.

In fact, dependency AC D in F+ shows R2 is not in BCNF.

Page 79: Chapter 8: Top-down Relational Database Design: NORMALIZATION · Database System Concepts, 6th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-use Chapter 8: Top-down

©Silberschatz, Korth and Sudarshan 8.79 Database System Concepts - 6th Edition

SKIP: Canonical Cover

and everything until …

Page 80: Chapter 8: Top-down Relational Database Design: NORMALIZATION · Database System Concepts, 6th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-use Chapter 8: Top-down

©Silberschatz, Korth and Sudarshan 8.80 Database System Concepts - 6th Edition

8.8 Overall DB Design Process

We have assumed that the schema R is given, but how

does R appear in practice?

R can be generated when converting E-R diagram to a

set of tables.

R can be a single relation containing all attributes that are

of interest (called universal relation).

Normalization then breaks R into smaller relations.

R can be the result of some ad hoc design of relations,

which we then test/convert to normal form.

Page 81: Chapter 8: Top-down Relational Database Design: NORMALIZATION · Database System Concepts, 6th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-use Chapter 8: Top-down

©Silberschatz, Korth and Sudarshan 8.81 Database System Concepts - 6th Edition

ER Model and Normalization

When an E-R diagram is carefully designed, identifying all entities

correctly, the tables generated from the E-R diagram should not need

further normalization.

However, in a real (imperfect) design, there can be:

FDs from non-key attributes of an entity set to other attributes of the

same entity set, e.g.:

employee entity with attributes including department_name and

building, and the FD department_name building

Good design would have made department a separate entity

FDs from non-key attributes of a relationship set to other …

It’s possible, but rare, since most relationships are binary.

Page 82: Chapter 8: Top-down Relational Database Design: NORMALIZATION · Database System Concepts, 6th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-use Chapter 8: Top-down

©Silberschatz, Korth and Sudarshan 8.82 Database System Concepts - 6th Edition

Denormalization for Performance

May want to use non-normalized schema for performance

For example, displaying prereqs along with course_id, and title requires

join of course with prereq

Alternative 1: Use denormalized relation containing attributes of course

as well as prereq with all above attributes

faster lookup

extra space and extra execution time for updates

extra coding work for programmer and possibility of error in extra code

Alternative 2: use a materialized view defined as

course prereq

Benefits and drawbacks same as above, except no extra coding work

for programmer and avoids possible errors

Page 83: Chapter 8: Top-down Relational Database Design: NORMALIZATION · Database System Concepts, 6th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-use Chapter 8: Top-down

©Silberschatz, Korth and Sudarshan 8.83 Database System Concepts - 6th Edition

Other Design Issues

Some aspects of DB design are not caught by normalization.

Examples of bad DB design, to be avoided: Instead of

earnings (company_id, year, amount ), use

Separate tables: earnings_2004, earnings_2005, earnings_2006, etc. All

these tables are in BCNF, but:

querying across years is difficult

a new table needs to be created each year

One table, but with a separate column for each year:

company_year (company_id, earnings_2004, earnings_2005, earnings_2006)

It’s also in BCNF, but also makes querying across years difficult and requires new

attribute each year.

Is an example of a crosstab, where values for one attribute become

column names

Used in spreadsheets, and other data analysis tools

Page 84: Chapter 8: Top-down Relational Database Design: NORMALIZATION · Database System Concepts, 6th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-use Chapter 8: Top-down

©Silberschatz, Korth and Sudarshan 8.84 Database System Concepts - 6th Edition

SKIP

8.9 Modeling Temporal Data

Page 85: Chapter 8: Top-down Relational Database Design: NORMALIZATION · Database System Concepts, 6th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-use Chapter 8: Top-down

©Silberschatz, Korth and Sudarshan 8.85 Database System Concepts - 6th Edition

Homework for Ch.8

8.4, 8.5

8.6 (Derive only 6 new FDs, using the

closure algorithm in Fig. 8.7)

8.26

8.29 except (d) – 3NF decomposition

Page 86: Chapter 8: Top-down Relational Database Design: NORMALIZATION · Database System Concepts, 6th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-use Chapter 8: Top-down

©Silberschatz, Korth and Sudarshan 8.86 Database System Concepts - 6th Edition

The next slides are a collection of the

algorithms we need to know from this chapter

Page 87: Chapter 8: Top-down Relational Database Design: NORMALIZATION · Database System Concepts, 6th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-use Chapter 8: Top-down

©Silberschatz, Korth and Sudarshan 8.87 Database System Concepts - 6th Edition

Algorithm for F+

Assign F+ = F

repeat

for each FD f in F+

apply reflexivity and augmentation rules on f

add the resulting FDs to F +

for each pair of FDs f1and f2 in F +

if f1 and f2 can be combined using transitivity,

add the resulting FD to F +

until F + does not change any further

Page 88: Chapter 8: Top-down Relational Database Design: NORMALIZATION · Database System Concepts, 6th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-use Chapter 8: Top-down

©Silberschatz, Korth and Sudarshan 8.88 Database System Concepts - 6th Edition

Algorithm for +

result := ;

while (changes to result) do

for each in F do

begin

if result then result := result

end

Page 89: Chapter 8: Top-down Relational Database Design: NORMALIZATION · Database System Concepts, 6th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-use Chapter 8: Top-down

©Silberschatz, Korth and Sudarshan 8.89 Database System Concepts - 6th Edition

Decomposing a Schema into BCNF

Suppose we have a schema R and a non-trivial dependency causes a violation of BCNF.

We decompose R into:

• (U )

• ( R - ( - ) )