chapter 8: top-down relational database design: normalization · database system concepts, 6th ed....
TRANSCRIPT
Database System Concepts, 6th Ed.
©Silberschatz, Korth and Sudarshan
See www.db-book.com for conditions on re-use
Chapter 8: Top-down Relational
Database Design: NORMALIZATION
©Silberschatz, Korth and Sudarshan 8.2 Database System Concepts - 6th Edition
What can happen when we combine
relations/tables?
Suppose we combine the tables
instructor and department
This creates redundancy (repetition of
information):
©Silberschatz, Korth and Sudarshan 8.3 Database System Concepts - 6th Edition
What can happen when we combine
relations/tables?
Another problem: UPDATE
When any of the redundant info is changed, the changes
hav eto be applied to multiple tuples!
©Silberschatz, Korth and Sudarshan 8.4 Database System Concepts - 6th Edition
What can happen when we combine
relations/tables?
Another problem: INSERTION
When a new department is created, there are no
instructors associated with it yet → need to use NULL
values!
©Silberschatz, Korth and Sudarshan 8.5 Database System Concepts - 6th Edition
Is there any reason to combine
relations/tables?
Any query that involves a natural join
between department and instructor will
execute faster on the combined table!
This is generally preferred in data mining.
©Silberschatz, Korth and Sudarshan 8.6 Database System Concepts - 6th Edition
The “top-down” approach
In this chapter, we look at the problem in the
opposite direction:
Starting with a large table that contains many
columns and much redundant information, how can
we split (decompose) it into tables with fewer
columns and less redundancy?
©Silberschatz, Korth and Sudarshan 8.7 Database System Concepts - 6th Edition
Suppose we start with the table inst_dept. How would we
get the idea to decompose it into instructor and
department?
Naïve approach: spot redundancies in data … but it
doesn’t work for two reasons!
Top-down: Decomposition
©Silberschatz, Korth and Sudarshan 8.8 Database System Concepts - 6th Edition
Problem 1: It’s costly
Real-life DBs can have large amount of data (hundreds
of columns, hundreds of millions of rows)
Spotting redundancies requires consideration of
combinations of elements from a set that is already
large →
→ Combinatorial explosion (N-squared and worse)
Spotting redundancies in data
©Silberschatz, Korth and Sudarshan 8.9 Database System Concepts - 6th Edition
Problem 2: From data alone, it’s impossible to decide
whether a pattern discovered is coincidence or not
Is it the case that departments always reside in one
building and have a unique budget?
Spotting redundancies in data
©Silberschatz, Korth and Sudarshan 8.10 Database System Concepts - 6th Edition
Solution:
Examine not the data itself (a.k.a. syntax), but the meaning
of the data, a.k.a. the semantics!
The designer must be allowed to specify rules of the
enterprise, a.k.a. functional dependencies, e.g.
dept_name → building, budget
©Silberschatz, Korth and Sudarshan 8.11 Database System Concepts - 6th Edition
dept_name building, budget
What does it mean?
“If several rows have the same value for dept_name,
then they also have the same values for building and
budget.”
or
“If there were a schema (dept_name, building, budget),
then dept_name would be a candidate key.”
©Silberschatz, Korth and Sudarshan 8.12 Database System Concepts - 6th Edition
Since in our table inst_dept dept_name is not a candidate key,
the building and budget of a department may have to be repeated
along with dept_name.
This indicates the need to decompose inst_dept.
dept_name building, budget
“If there were a schema (dept_name, building, budget),
then dept_name would be a candidate key.”
©Silberschatz, Korth and Sudarshan 8.13 Database System Concepts - 6th Edition
This example also shows how functional dependencies (FD) are
different from keys: a FD captures a rule that is in general more
granular than a key.
A key is a FD, but a FD is not always a key!
dept_name building, budget
“If there were a schema (dept_name, building, budget),
then dept_name would be a candidate key.”
©Silberschatz, Korth and Sudarshan 8.14 Database System Concepts - 6th Edition
Not all decompositions are good!
Suppose we decompose
employee(ID, name, street, city, salary)
into
employee1 (ID, name)
employee2 (name, street, city, salary)
Problem: we cannot reconstruct the original employee relation!
©Silberschatz, Korth and Sudarshan 8.15 Database System Concepts - 6th Edition
A lossy decomposition
©Silberschatz, Korth and Sudarshan 8.16 Database System Concepts - 6th Edition
But there are also lossless decompositions!
Technically it’s called a lossless-join decomposition
Decomposition of R = (A, B, C)
R1 = (A, B) R2 = (B, C)
A B
1
2
A
B
1
2
r B,C(r)
A (r) B (r) A B
1
2
C
A
B
B
1
2
C
A
B
C
A
B
A,B(r)
©Silberschatz, Korth and Sudarshan 8.17 Database System Concepts - 6th Edition
How to avoid lossy decompositions?
©Silberschatz, Korth and Sudarshan 8.18 Database System Concepts - 6th Edition
Goal — Devise a theory for the following
Decide whether a particular relation R is in “good” form.
When the relation R is not in “good” form, decompose
it into a set of relations {R1, R2, ..., Rn} such that
each relation is in good form
the decomposition is a lossless-join decomposition
Our theory is based on dependencies:
functional dependencies
multivalued dependencies
The process outlined above is called NORMALIZATION
©Silberschatz, Korth and Sudarshan 8.19 Database System Concepts - 6th Edition
8.2 First Normal Form (1NF)
A domain is atomic if its elements are treated by the
DBMS as indivisible units.
Examples of non-atomic domains:
Names with first +middle + last
IDs that can be broken up into parts (e.g. CS401)
Phone numbers
Any composite attributes!
A relational schema R is in first normal form (1NF) if
the domains of all attributes of R are atomic.
For now, we assume all relations to be in 1NF (but see
Ch.22: Object-Based Databases)
©Silberschatz, Korth and Sudarshan 8.20 Database System Concepts - 6th Edition
1NF
Atomicity is actually a property of how the elements of the
domain are used!
Example: Students are given roll numbers which are
strings of the form CS0012 or EE1127
Strings would normally be considered indivisible …
… but if the first two characters are extracted to find
the dept., the domain of roll numbers is not atomic.
Doing so is a bad idea: leads to encoding of
information in the app. program rather than in the DB.
Why is this bad?
What should the DB designer do in this case?
©Silberschatz, Korth and Sudarshan 8.21 Database System Concepts - 6th Edition
8.3 Functional Dependencies (FD)
FDs are constraints on the set of legal relations.
Require that the value for a certain set of attributes
determine uniquely the value for another set of
attributes.
A FD is a generalization of the concept of key: A key
requires that the value for a certain set of attributes
determine uniquely the value for all remaining
attributes.
©Silberschatz, Korth and Sudarshan 8.22 Database System Concepts - 6th Edition
Functional Dependencies
Let R be a relation schema, and a, b two sets of attributes
R and R
The functional dependency
holds on R if and only if for any legal relations r(R), whenever any two tuples t1 and t2 of r agree on the attributes , they also agree on the attributes . That is,
t1[] = t2 [] t1[ ] = t2 [ ]
Example: Consider r(A,B ) with the following instance of r.
On this instance, A B does NOT hold, but B A does hold.
1 4
1 5
3 7
©Silberschatz, Korth and Sudarshan 8.23 Database System Concepts - 6th Edition
QUIZ: Functional Dependencies
Decide if the following FDs hold or not:
A B
B A
{A, C} D
{A, B, C} D
©Silberschatz, Korth and Sudarshan 8.24 Database System Concepts - 6th Edition
FD vs. key
K is a superkey for relation schema R if and only if K R
K is a candidate key for R if and only if
K R, and
for no K, R
FDs allow us to express constraints that cannot be expressed
using (super)keys. Consider the schema:
inst_dept (ID, name, salary, dept_name, building, budget )
We expect these FDs to hold:
dept_name building
ID building
but would not expect the following FD to hold:
dept_name salary
©Silberschatz, Korth and Sudarshan 8.25 Database System Concepts - 6th Edition
QUIZ: FD vs. key
Decide if the following are candidate keys of not:
A
B
{A, C}
{A, B, C}
{A, B, C, D}
D
©Silberschatz, Korth and Sudarshan 8.26 Database System Concepts - 6th Edition
Uses for FDs
Test relations to see if they are legal under a given set of
FDs.
If a relation r is legal under a set F of FDs, we say
that r satisfies F.
Specify constraints on the set of legal relations
We say that F holds on R if all legal relations on R
satisfy the set of FDs F.
Note: A specific instance of a relation schema may
satisfy a FD even if the FD does not hold on all legal
instances.
Example: a specific instance of instructor may, by
chance, satisfy
name ID.
©Silberschatz, Korth and Sudarshan 8.27 Database System Concepts - 6th Edition
Trivial FD
A functional dependency is trivial if it is satisfied by all
instances of a relation
Example:
ID, name ID
name name
In general, is trivial if
©Silberschatz, Korth and Sudarshan 8.28 Database System Concepts - 6th Edition
QUIZ: Trivial FDs
Give 4 examples of trivial FDs in this relation.
©Silberschatz, Korth and Sudarshan 8.29 Database System Concepts - 6th Edition
The Holy Grail:
Closure of a set of FDs
Given a set F of FDs, there are certain other
FDs that are logically implied by F.
For example: If A B and B C, then we can
infer that A C
The set of all FDs logically implied by F is the
closure of F.
We denote the closure of F by F+.
F+ is a superset of F.
©Silberschatz, Korth and Sudarshan 8.30 Database System Concepts - 6th Edition
8.3.2 Boyce-Codd Normal Form
is trivial (i.e., )
is a superkey for R
A relation schema R is in BCNF with respect to a set F of
FDs if for all FDs in F+ of the form
(where R and R), at least one of the following is
true:
©Silberschatz, Korth and Sudarshan 8.31 Database System Concepts - 6th Edition
QUIZ: BCNF
is trivial (i.e., )
is a superkey for R
at least one of the following holds:
Is this schema in BCNF?
instr_dept (ID, name, salary, dept_name, building, budget )
©Silberschatz, Korth and Sudarshan 8.32 Database System Concepts - 6th Edition
QUIZ: BCNF
is trivial (i.e., )
is a superkey for R
at least one of the following holds:
Is this schema in BCNF?
instr_dept (ID, name, salary, dept_name, building, budget )
No, because
dept_name building, budget
holds, but dept_name is not a superkey (Why?)
©Silberschatz, Korth and Sudarshan 8.33 Database System Concepts - 6th Edition
Extra-credit QUIZ: BCNF
EOL1/3
©Silberschatz, Korth and Sudarshan 8.34 Database System Concepts - 6th Edition
Quiz: What is the difference between keys and
functional dependencies (FD)?
©Silberschatz, Korth and Sudarshan 8.35 Database System Concepts - 6th Edition
A key is a FD, but a FD is not always a key!
Quiz: What is the difference between keys and
functional dependencies (FD)?
©Silberschatz, Korth and Sudarshan 8.36 Database System Concepts - 6th Edition
BCNF Decomposition
Suppose we have a schema R and a non-trivial dependency causes a violation of BCNF.
We decompose R into:
• (U )
• ( R - ( - ) )
©Silberschatz, Korth and Sudarshan 8.37 Database System Concepts - 6th Edition
Decomposing a Schema into BCNF
We decompose R into:
• (U )
• ( R - ( - ) )
In our example:
= dept_name
= building, budget
and inst_dept is replaced by
(U ) = ( dept_name, building, budget )
( R - ( - ) ) = ( ID, name, salary, dept_name )
©Silberschatz, Korth and Sudarshan 8.38 Database System Concepts - 6th Edition
QUIZ 1: BCNF
We decompose R into:
• (U )
• ( R - ( - ) )
Take = {A, B, C, D} = {C, D, E, F}, and the entire relation is R = {A,B,C,D,E,F,G,H}
What is the decomposition?
©Silberschatz, Korth and Sudarshan 8.39 Database System Concepts - 6th Edition
QUIZ 2: BCNF
We decompose R into:
• (U )
• ( R - ( - ) )
Take = {A, B} = {E, F}, and the entire relation is R = {A,B,C,D,E,F,G,H}
What is the decomposition?
©Silberschatz, Korth and Sudarshan 8.40 Database System Concepts - 6th Edition
QUIZ 3: BCNF
Is this relation in BCNF?
Hint: Rename the attributes A, B, C, ….
©Silberschatz, Korth and Sudarshan 8.41 Database System Concepts - 6th Edition
QUIZ 3: BCNF
A: Not BCNF, b/c both FDs are violations!
Decompose it to BCNF!
©Silberschatz, Korth and Sudarshan 8.42 Database System Concepts - 6th Edition
QUIZ 3: BCNF
Solution:
©Silberschatz, Korth and Sudarshan 8.43 Database System Concepts - 6th Edition
Dependency Preservation
Constraints, including FDs, are costly to check in
practice unless they pertain to only one relation.
If it is sufficient to test only those dependencies on each
individual relation of a decomposition in order to
ensure that all functional dependencies hold, then
that decomposition is dependency preserving.
©Silberschatz, Korth and Sudarshan 8.44 Database System Concepts - 6th Edition
BCNF and Dependency Preservation
ER model of a bank: A
customer can have
more than 1 personal
banker, but at most
one at any given
branch.
A ternary relationship-
set is needed:
©Silberschatz, Korth and Sudarshan 8.45 Database System Concepts - 6th Edition
BCNF and Dependency Preservation
Implementation:
R = cust_banker_branch = (customer_id, employee_id,
branch_name, type)
FDs: FD1: employee_id branch_name
FD2: (customer_id, branch_name) (employee_id, type)
Is cust_banker_branch in BCNF?
©Silberschatz, Korth and Sudarshan 8.46 Database System Concepts - 6th Edition
BCNF and Dependency Preservation
Implementation:
R = cust_banker_branch = (customer_id, employee_id,
branch_name, type)
FDs: FD1: employee_id branch_name
FD2: (customer_id, branch_name) (employee_id, type)
Apply the decomposition algorithm!
©Silberschatz, Korth and Sudarshan 8.47 Database System Concepts - 6th Edition
BCNF and Dependency Preservation
Implementation:
R = cust_banker_branch = (customer_id, employee_id,
branch_name, type)
FDs: FD1: employee_id branch_name
FD2: (customer_id, branch_name) (employee_id, type)
Decomposition:
R1 = (employee_id, branch_name)
R2 = (customer_id, employee_id, type)
Problem: FD2 is now “spread” across
two relations!
©Silberschatz, Korth and Sudarshan 8.48 Database System Concepts - 6th Edition
BCNF and Dependency Preservation
Conclusion:
BCNF is not dependency preserving (in
general)
Because it is not always possible to achieve both
BCNF and dependency preservation, we consider a
weaker normal form …
©Silberschatz, Korth and Sudarshan 8.49 Database System Concepts - 6th Edition
Third Normal Form = 3NF
A relation schema R is in third normal form (3NF) if for all:
in F+
at least one of the following holds:
is trivial (i.e., )
is a superkey for R
Each attribute A in – is contained in a candidate key for R.
(NOTE: each attribute may be in a different candidate key)
If a relation is in BCNF it is in 3NF (since in BCNF one of the first two
conditions above must hold).
Third condition is a minimal relaxation of BCNF to ensure dependency
preservation.
©Silberschatz, Korth and Sudarshan 8.50 Database System Concepts - 6th Edition
SKIP all other 3NF theory!
The only facts about 3NF we cover are
those on the previous slide!
©Silberschatz, Korth and Sudarshan 8.51 Database System Concepts - 6th Edition
Whatever happened with 2NF?
In a nutshell, it forbids attributes to depend on parts of keys.
It is not of practical use anymore.
See Second normal form - Wikipedia, the free encyclopedia
for more details.
©Silberschatz, Korth and Sudarshan 8.52 Database System Concepts - 6th Edition
Review of Normal Forms
©Silberschatz, Korth and Sudarshan 8.53 Database System Concepts - 6th Edition
Updated list of Normalization Goals
Let R be a relation scheme with a set F of FDs:
Decide whether R is in “good” form.
If R is not in “good” form, decompose it into a set of relation schemes
{R1, R2, ..., Rn} such that :
each relation scheme is in good form
the decomposition is a lossless-join decomposition
Preferably, the decomposition should be dependency preserving.
©Silberschatz, Korth and Sudarshan 8.54 Database System Concepts - 6th Edition
8.4 Functional-Dependency Theory
This is the formal theory that tells us which
functional dependencies are implied logically
by a given set of functional dependencies.
©Silberschatz, Korth and Sudarshan 8.55 Database System Concepts - 6th Edition
Remember: Closure of a Set of FDs
Given a set F set of FDs, there are certain other
FDs that are logically implied by F.
E.g. transitivity: If A B and B C, then
also A C
The set of all functional dependencies logically
implied by F is the closure of F.
We denote the closure of F by F+.
©Silberschatz, Korth and Sudarshan 8.56 Database System Concepts - 6th Edition
Armstrong’s Axioms
We can find F+, the closure of F, by repeatedly applying
Armstrong’s Axioms:
if , then (reflexivity)
if , then (augmentation)
if , and , then (transitivity)
These rules are
sound (They generate only FDs that actually hold)
complete (They generate all FDs that hold).
©Silberschatz, Korth and Sudarshan 8.57 Database System Concepts - 6th Edition
QUIZ: Armstrong’s Axioms
Write Armstrong’s Axioms:
(reflexivity)
(augmentation)
(transitivity)
©Silberschatz, Korth and Sudarshan 8.58 Database System Concepts - 6th Edition
QUIZ: Armstrong’s Axioms
Write Armstrong’s Axioms:
if , then (reflexivity)
if , then (augmentation)
if , and , then (transitivity)
©Silberschatz, Korth and Sudarshan 8.59 Database System Concepts - 6th Edition
Examples of use of A’s Axioms
Given the following relation: R = (A, B, C, G, H, I)
and the set of FDs F = { A B
A C
CG H
CG I
B H}
Some other members of the closure F+ are:
A H
by transitivity from A B and B H
AG I
by augmenting A C with G, to get AG CG
and then transitivity with CG I
CG HI
by augmenting CG I to infer CG CGI,
and augmenting of CG H to infer CGI HI,
and then transitivity
©Silberschatz, Korth and Sudarshan 8.60 Database System Concepts - 6th Edition
Your turn!
if , then (reflexivity)
if , then (augmentation)
if , and , then (transitivity)
Prove that
if and only if
Double implication:
L-to-R and R-to-L!
EOL 2/3
©Silberschatz, Korth and Sudarshan 8.61 Database System Concepts - 6th Edition
QUIZ: BCNF and 3NF
Consider the following relation:
What non-trivial FDs exist?
Source: http://en.wikipedia.org/wiki/Boyce-Codd_normal_form
©Silberschatz, Korth and Sudarshan 8.62 Database System Concepts - 6th Edition
QUIZ: BCNF and 3NF
F1: Person, Shop Type → Nearest Shop
F2: Nearest Shop → Shop Type
Source: http://en.wikipedia.org/wiki/Boyce-Codd_normal_form
Is the relation in BCNF?
Let’s simplify the notation:
AB → C and C → B.
©Silberschatz, Korth and Sudarshan 8.63 Database System Concepts - 6th Edition
QUIZ: BCNF and 3NF
No, b/c C → B is a violation: C is not superkey.
Source: http://en.wikipedia.org/wiki/Boyce-Codd_normal_form
Is the relation in 3NF?
Let’s simplify the notation:
AB → C and C → B.
©Silberschatz, Korth and Sudarshan 8.64 Database System Concepts - 6th Edition
QUIZ: BCNF and 3NF
No, b/c C → B is a violation: C is not superkey.
Source: http://en.wikipedia.org/wiki/Boyce-Codd_normal_form
Remember: 3NF has the following condition in addition to BCNF:
Each attribute A in – is contained in a candidate key for R. (NOTE: each attribute may be in a different candidate key)
Let’s simplify the notation:
AB → C and C → B.
©Silberschatz, Korth and Sudarshan 8.65 Database System Concepts - 6th Edition
QUIZ: BCNF and 3NF
Do the BCNF decomposition
Source: http://en.wikipedia.org/wiki/Boyce-Codd_normal_form
B is part of the candidate key AB.
This shows that C → B is not a 3NF violation, so the relation is
in 3NF!
Let’s simplify the notation:
AB → C and C → B.
©Silberschatz, Korth and Sudarshan 8.66 Database System Concepts - 6th Edition
QUIZ: BCNF and 3NF
Is the decomposition above dependency-preserving?
Source: http://en.wikipedia.org/wiki/Boyce-Codd_normal_form
R1 = {B, C} R2 = {A, C}
Let’s simplify the notation:
AB → C and C → B.
©Silberschatz, Korth and Sudarshan 8.67 Database System Concepts - 6th Edition
More FD theorems, a.k.a. rules or
results
Exercise 8.26
Exercise 8.4
Exercise 8.5
©Silberschatz, Korth and Sudarshan 8.68 Database System Concepts - 6th Edition
Naïve Algorithm for Computing F+
Apply as many axioms and theorems to derive new FDs!
Can you find 3 more FDs in this manner?
Do you see a problem with this approach?
©Silberschatz, Korth and Sudarshan 8.69 Database System Concepts - 6th Edition
Extra-credit QUIZ
©Silberschatz, Korth and Sudarshan 8.70 Database System Concepts - 6th Edition
Algorithm for Computing F+
To compute the closure F+ of a set of FDs F:
Assign F+ = F
repeat
for each FD f in F+
apply reflexivity and augmentation rules on f
add the resulting FDs to F +
for each pair of FDs f1and f2 in F +
if f1 and f2 can be combined using transitivity,
add the resulting FD to F +
until F + does not change any further
©Silberschatz, Korth and Sudarshan 8.71 Database System Concepts - 6th Edition
QUIZ: Algorithm for Computing F+
To compute the closure F+ of a set of FDs F:
Assign F+ = F
repeat
for each FD f in F+
apply reflexivity and augmentation rules on f
add the resulting FDs to F +
for each pair of FDs f1and f2 in F +
if f1 and f2 can be combined using transitivity,
add the resulting FD to F +
until F + does not change any further
Apply the algorithm to R = {A, B, C}, with AB → C
©Silberschatz, Korth and Sudarshan 8.72 Database System Concepts - 6th Edition
Closure of Attribute Sets
Since computing the entire closure F+ is in general a formidable
task, we set ourselves first a more modest goal:
Given a set of attributes , define the closure of under F
(denoted by +) as the set of attributes that are functionally
determined by under F
Algorithm to compute +, the closure of under F
result := ;
while (changes to result) do
for each in F do
begin
if result then result := result
end
©Silberschatz, Korth and Sudarshan 8.73 Database System Concepts - 6th Edition
Example of Attribute Set Closure
R = (A, B, C, G, H, I)
F = {A B A C CG H CG I B H}
(AG)+
1. result = AG
2. result = ABCG (A C and A B)
3. result = ABCGH (CG H and CG AGBC)
4. result = ABCGHI (CG I and CG AGBCH)
Is AG a candidate key?
1. Is AG a super key?
1. Does AG R? Yes, b/c (AG)+ = R.
2. Is any subset of AG a superkey?
1. Does A R? No, b/c (A)+ ≠ R
2. Does G R? No, b/c (G)+ ≠ R
Stop (Why?)
©Silberschatz, Korth and Sudarshan 8.74 Database System Concepts - 6th Edition
Uses of Attribute Closure Algorithm
Testing for superkey:
To test if is a superkey, we compute +, and
check if + contains all attributes of R
Testing if certain FDs hold:
To check if holds (is in F+), just check if
+
Another algorithm for computing closure F+ of F:
For each R, find the closure +
for each S +, we output the FD S
Still very expensive, but at least
we have a more systematic way
of doing it!
©Silberschatz, Korth and Sudarshan 8.75 Database System Concepts - 6th Edition
Uses of Attribute Closure Algorithm
Testing for superkey:
To test if is a superkey, we compute +, and
check if + contains all attributes of R
Testing if a certain FDs holds:
To check if holds (is in F+), just check if
+
Another algorithm for computing closure F+ of F:
For each R, find the closure +
for each S +, we output the FD S
©Silberschatz, Korth and Sudarshan 8.76 Database System Concepts - 6th Edition
QUIZ: Uses of Attribute Closure Alg.
Testing for superkey:
To test if is a superkey, we compute +, and
check if + contains all attributes of R
R = (A, B, C, D)
F = {A BC, B C, A B, AB C, BC → D}
Is AD a superkey?
Is AD a candidate key?
©Silberschatz, Korth and Sudarshan 8.77 Database System Concepts - 6th Edition
QUIZ: Uses of Attribute Closure Alg.
Testing if a certain FDs holds:
To check if holds (is in F+), just check if
+
R = (A, B, C, D)
F = {A BC, B C, A B, AB C, BC → D}
Does AC → D hold ?
©Silberschatz, Korth and Sudarshan 8.78 Database System Concepts - 6th Edition
Why is closure important for BCNF?
To check if a non-trivial dependency causes a violation of BCNF
1. compute + (the attribute closure of ), and
2. verify that it includes all attributes of R, that is, it is a superkey of R.
Simplified test: To check if a relation schema R is in BCNF, it suffices to check only the dependencies in the given set F for violation of BCNF, rather than checking all dependencies in F+.
If none of the dependencies in F causes a violation of BCNF, then none of the dependencies in F+ will cause a violation of BCNF either.
However, simplified test using only F is incorrect when testing a relation in a decomposition of R
Consider R = (A, B, C, D, E), with F = { A B, BC D}
Decompose R into R1 = (A, B) and R2 = (A, C, D, E)
Neither of the dependencies in F contain only attributes from (A,C,D,E) so we might be mislead into thinking R2 satisfies BCNF.
In fact, dependency AC D in F+ shows R2 is not in BCNF.
©Silberschatz, Korth and Sudarshan 8.79 Database System Concepts - 6th Edition
SKIP: Canonical Cover
and everything until …
©Silberschatz, Korth and Sudarshan 8.80 Database System Concepts - 6th Edition
8.8 Overall DB Design Process
We have assumed that the schema R is given, but how
does R appear in practice?
R can be generated when converting E-R diagram to a
set of tables.
R can be a single relation containing all attributes that are
of interest (called universal relation).
Normalization then breaks R into smaller relations.
R can be the result of some ad hoc design of relations,
which we then test/convert to normal form.
©Silberschatz, Korth and Sudarshan 8.81 Database System Concepts - 6th Edition
ER Model and Normalization
When an E-R diagram is carefully designed, identifying all entities
correctly, the tables generated from the E-R diagram should not need
further normalization.
However, in a real (imperfect) design, there can be:
FDs from non-key attributes of an entity set to other attributes of the
same entity set, e.g.:
employee entity with attributes including department_name and
building, and the FD department_name building
Good design would have made department a separate entity
FDs from non-key attributes of a relationship set to other …
It’s possible, but rare, since most relationships are binary.
©Silberschatz, Korth and Sudarshan 8.82 Database System Concepts - 6th Edition
Denormalization for Performance
May want to use non-normalized schema for performance
For example, displaying prereqs along with course_id, and title requires
join of course with prereq
Alternative 1: Use denormalized relation containing attributes of course
as well as prereq with all above attributes
faster lookup
extra space and extra execution time for updates
extra coding work for programmer and possibility of error in extra code
Alternative 2: use a materialized view defined as
course prereq
Benefits and drawbacks same as above, except no extra coding work
for programmer and avoids possible errors
©Silberschatz, Korth and Sudarshan 8.83 Database System Concepts - 6th Edition
Other Design Issues
Some aspects of DB design are not caught by normalization.
Examples of bad DB design, to be avoided: Instead of
earnings (company_id, year, amount ), use
Separate tables: earnings_2004, earnings_2005, earnings_2006, etc. All
these tables are in BCNF, but:
querying across years is difficult
a new table needs to be created each year
One table, but with a separate column for each year:
company_year (company_id, earnings_2004, earnings_2005, earnings_2006)
It’s also in BCNF, but also makes querying across years difficult and requires new
attribute each year.
Is an example of a crosstab, where values for one attribute become
column names
Used in spreadsheets, and other data analysis tools
©Silberschatz, Korth and Sudarshan 8.84 Database System Concepts - 6th Edition
SKIP
8.9 Modeling Temporal Data
©Silberschatz, Korth and Sudarshan 8.85 Database System Concepts - 6th Edition
Homework for Ch.8
8.4, 8.5
8.6 (Derive only 6 new FDs, using the
closure algorithm in Fig. 8.7)
8.26
8.29 except (d) – 3NF decomposition
©Silberschatz, Korth and Sudarshan 8.86 Database System Concepts - 6th Edition
The next slides are a collection of the
algorithms we need to know from this chapter
©Silberschatz, Korth and Sudarshan 8.87 Database System Concepts - 6th Edition
Algorithm for F+
Assign F+ = F
repeat
for each FD f in F+
apply reflexivity and augmentation rules on f
add the resulting FDs to F +
for each pair of FDs f1and f2 in F +
if f1 and f2 can be combined using transitivity,
add the resulting FD to F +
until F + does not change any further
©Silberschatz, Korth and Sudarshan 8.88 Database System Concepts - 6th Edition
Algorithm for +
result := ;
while (changes to result) do
for each in F do
begin
if result then result := result
end
©Silberschatz, Korth and Sudarshan 8.89 Database System Concepts - 6th Edition
Decomposing a Schema into BCNF
Suppose we have a schema R and a non-trivial dependency causes a violation of BCNF.
We decompose R into:
• (U )
• ( R - ( - ) )