lecture no 14 functional dependencies & normalization ( iii ) mar 04 th 2011 database systems
TRANSCRIPT
Lecture No 14Functional Dependencies &
Normalization ( III )Mar 04th 2011
Database SystemsDatabase Systems
Database Development Requirements Analysis
Collect and Analyze the requirements of the users. Conceptual Design
Design a conceptual model, e.g., ER model. Logical Design
Translate the ER model into the relational model. Normalization.
Database Building Build the database and write application programs.
Operation, Maintenance, & Tuning Use, maintain, and “tune” the database.
Normalization
Decides which attributes should be grouped together in a relation
Validates and improves the logical design to the point before proceeding to physical design.
Based on the concept of Functional dependency.
Good Database Design
no redundancy of FACT (!)
no inconsistency
no insertion, deletion or update anomalies
no information loss
no dependency loss
Normalization and Normal Forms
Normalization: Decompose a “bad” table into several “good” ones. Move from a lower normal form to a higher (better) one.
Normal Forms: First Normal Form (1NF) Second Normal Form (2NF) Third Normal Form (3NF) *Higher Normal Forms (BCNF, 4NF, 5NF)
In practice, 3NF is often good enough.
First Normal Form (1NF)
A table is in 1NF, if every row contains exactly one value for each attribute.
In other words, a table is in 1NF if it contains no multi-valued attribute.
By definition, any table must be in 1NF.
Second Normal Form (2NF)
A table is in 2NF, if it is in 1NF and every non-key attribute is irreducibly (fully) dependent
on the primary key.
2NF prohibits partial dependencies. If {A, B} is the primary key of a table, A -> C is a
partial dependency (i.e., a non-key attribute is dependent on part of the primary key.)
Example
Insert - cannot insert the fact that a supplier is located in a particular city until supplier supplies at least one part
Delete - delete the value P3, and we also lose the information that S3 is located in London
Update - S1 appears in the table more than once, and we must change all of them -- Possibility of producing an inconsistent result.
Status City P# Qty
20 London P1 1600
20 London P2 2000
S#
S1
S1
S2
S3
10
20
Paris
London
P2
P3
500
3000
Primary Key (Supplier#, Part#)
Example Note: City is a non-key field and only Note: City is a non-key field and only
dependent on part of the primary key(i.e. dependent on part of the primary key(i.e. supplier#). So this relation is not in 2NF.supplier#). So this relation is not in 2NF.
By analyzing a given relation, we can identify some dependencies between its attributes.
In principle, we want to make attributes independent of each other as far as possible, so that updating one attribute will have no impact on the others.
Supplier#
Part# Status
City
Quantity
Example
2NF Example
FIRST {S#, STATUS, CITY, P#, QTY} Primary key: {S#, P#} {S#} -> {CITY}, {CITY} -> {STATUS}
Test of 2NF {S#} -> {STATUS, CITY}: partial dependencies. FIRST is in 1NF, but not in 2NF.
Decomposition: SECOND {S#, STATUS, CITY} SP {S# (FK: references SECOND), P#, QTY}
2NF-Solution to Problems
The solution is to replace the relation by two new relations (called One and Two)
Status City
20 London
10 Paris
Supplier#
S1
S2
S3 20 London
One: Primary Key (Supplier#)
Part# Quantity
P1 1600
P2 2000
Supplier#
S1
S1
S2
S3
P2
P3
500
3000
Two: Primary Key (Supplier#, Part#)
This revised structure overcomes all the problems sketched earlier, but:This revised structure overcomes all the problems sketched earlier, but:• Insert Insert - cannot insert the fact that a particular city has a status until a - cannot insert the fact that a particular city has a status until a
supplier is actually located in that city.supplier is actually located in that city.• DeleteDelete - delete the value S2, and we also lose the information that - delete the value S2, and we also lose the information that
Paris has the status value 10.Paris has the status value 10.• UpdateUpdate - Status value 20 appears in the table more than once. - Status value 20 appears in the table more than once.
Third Normal Form (3NF)
A table is in 3NF if and only if it is in 2NF and every non-key attribute is non-transitively dependent
on the primary key.
3NF prohibits transitive dependencies.
If A -> B and B -> C, C is transitively dependent on A.
3NF Example
SECOND {S#, STATUS, CITY} Primary key: {S#} {S#} -> {CITY}, {CITY} -> {STATUS}
Test of 3NF: {S#} -> {CITY} -> {STATUS}: transitive dependency SECOND is in 2NF, but not 3NF.
Decomposition: CS {CITY, STATUS} SC {S#, CITY (FK: references CS)}
Third Normal Form (3NF)
A relation is in 3NF if and only if it is 2NF and all non-key fields are mutually independent.
Status City
20 London
10 Paris
Supplier#
S1
S2
S3 20 London
One: Primary Key (Supplier#)
Part# Quantity
P1 1600
P2 2000
Supplier#
S1
S1
S2
S3
P2
P3
500
3000
Two: Primary Key (Supplier#, Part#)
Note: Note: Relation Two is in 3NF (with only one non-key field), Relation Two is in 3NF (with only one non-key field), but relation One is not in 3NF because Status is dependent but relation One is not in 3NF because Status is dependent on City (both are non-key fields)on City (both are non-key fields)
3NF - Converting 2NF into 3NF
We can now normalize relation One into two new relations (called three and four), which both are in 3NF.
City
London
Paris
Supplier#
S1
S2
S3 London
Three: Primary Key (Supplier#)
Status
20
10
City
London
Paris
Four: Primary Key (City)
Higher Normal Forms (e.g. 4NF and 5NF) do exist, Higher Normal Forms (e.g. 4NF and 5NF) do exist, but they are mainly of interest in academic societies but they are mainly of interest in academic societies rather than in the practical applications of database rather than in the practical applications of database design.design.
Relationships of Normal Forms
1NF
2NF
3NF/BCNF
4NF
5NF
DKNF
*Higher Normal Forms
Boyce/Codd Normal Form (BCNF) A more restrictive 3NF.
Fourth normal forms Concerned with Multi-valued dependencies
Fifth normal forms Concerned with Join dependencies.
Other normal forms Domain-key normal form “Restriction-union” normal form
In practice, 3NF is often good enough.
How to Decompose a Table
Let R{A, B, C} be a table, where A, B, and C are sets of attributes. If R satisfies the FD A -> B, then We can decompose R on {A, B} and {A, C} Heath’s Theorem
Why do we keep A in both tables?
Strategy: Move a problematic (partial or transitive) FD into a separate table. Keep common attributes so we can join the tables back.
Decomposition Example FIRST {S#, STATUS, CITY, P#, QTY}
Primary key: {S#, P#} {S#} -> {CITY}, {CITY} -> {STATUS}
Decompose FIRST into: SECOND {S#, STATUS, CITY} SP {S# (FK: references SECOND), P#, QTY}
Decompose SECOND into: CS {CITY, STATUS} SC {S#, CITY (FK: references CS)}
C. J. Date 6C. J. Date 6thth Edition Ch.10 page (292 Edition Ch.10 page (292305)305)
Exercise: Rescue a "Bad" design
Original Design: SPDB {S#, Sname, City, Status, P#, Pname, Weight,
QTY} S# -> Sname, City, Status City -> Status P# -> Pname, Weight S#, P# -> QTY
Problems: data redundancies data anomalies
Exercise: Check 2NF
SPDB is in 1NF. There are partial dependencies:
S# -> Sname, City, Status P# -> Pname, Weight
So SPDB is not in 2NF.
Decomposition: S {S#, Sname, City, Status} P {P#, Pname, Weight} SP {S# (references S), P# (references P), QTY}
Exercise: Check 3NF
In S {S#, Sname, City, Status}: S# -> City -> Status: transitive dependency. So S is not in 3NF.
Decomposition: C {City, Status} S {S#, Sname, City (references C)}
Now all the tables, C, S, P, and SP, are in 3NF.
Example 2: University Database Table
StdSSN StdClass StdCity OfferNo OffYear Offterm EnrGrade CourseNo CrsDesc
S1 JUN ISB O1 2003 1 3.5 C1 DB S1 JUN ISB O2 2003 2 3.3 C2 VB S2 JUN ISB O3 2003 1 3.1 C3 OO S2 JUN ISB O2 2003 1 3.4 C2 VB
StdSSN StdCity, StdClass
OfferNo OffTerm, OffYear, CourseNo, CrsDesc
CourseNo CrsDesc
StdSSN, OfferNo EnrGrade
Problems
Anomalies: - PK: combination of StdSSN and OfferNo - Insert: cannot insert a new student without enrolling in an
offering (OfferNo part of PK) - Update: change a course description; change every
enrollment of the course - Delete: remove third row; lose information about course C3
Table has obvious redundancies Easier to query: no joins More difficult to change: can work around problems
(dummy PK) but tedious to do
FD Diagrams and Lists
StdSSN StdCity StdClass OfferNo OffTerm OffYear EnrGradeCourseNo CrsDesc
StdSSN StdCity, StdClass
OfferNo OffTerm, OffYear, CourseNo, CrsDesc
CourseNo CrsDesc
StdSSN, OfferNo EnrGrade
Normalization
Process of removing unwanted redundancies
Apply normal forms
Identify FDs Determine whether FDs meet normal form Split the table to meet the normal form if there is a violation
1NF
No repeating groups: flat rows
StdSSN StdClass OfferNo OffYear EnrGrade CourseNo CrsDesc
S1 JUN O1 2003 3.5 C1 DB O2 2003 3.3 C2 VB S2 JUN O3 2003 3.1 C3 OO O2 2003 3.4 C2 VB
Combined Definition of 2NF/3NF
Key column: candidate key or part of candidate key
Analogy to the traditional justice oath Every non key depends on a key, the whole
key, and nothing but the key Usually taught as separate definitions
2NF
Every nonkey column depends on a whole key, not part of a key
Violations Part of key nonkey Violations only for combined keys
2NF Example
Many violations for the big university database table StdSSN StdCity, StdClass OfferNo OffTerm, OffYear, CourseNo,
CrsDesc Splitting the table
UnivTable1 (StdSSN, StdCity, StdClass) UnivTable2 (OfferNo, OffTerm, OffYear,
CourseNo, CrsDesc)
3NF
Every nonkey column depends only on a key not on non key columns
Violations: Nonkey Nonkey Alterative formulation
No transitive FDs A B, B C then A C OfferNo CourseNo, CourseNo CrsDesc then
OfferNo CrsDesc
3NF Example
One violation in UnivTable2 CourseNo CrsDesc
Splitting the table UnivTable2-1 (OfferNo, OffTerm, OffYear,
CourseNo) UnivTable2-2 (CourseNo, CrsDesc)
Example 2