lecture 3 functional dependency and normal forms prof. sin-min lee department of computer science
TRANSCRIPT
Lecture 3 Functional Dependency and Lecture 3 Functional Dependency and Normal FormsNormal Forms
Prof. Sin-Min Lee
Department of Computer Science
©Silberschatz, Korth and Sudarshan3.2Database System Concepts
Database Design ProcessDatabase Design Process
ConceptualModel
LogicalModel
External Model
Conceptual requirements
Conceptual requirements
Conceptual requirements
Conceptual requirements
Application 1
Application 1
Application 2 Application 3 Application 4
Application 2
Application 3
Application 4
External Model
External Model
External Model
Internal Model
©Silberschatz, Korth and Sudarshan3.3Database System Concepts
Relational Database ModelRelational Database Model
Relations
Source: ESRI Advanced ArcInfo
©Silberschatz, Korth and Sudarshan3.4Database System Concepts
Source: ESRI Advanced ArcInfo
©Silberschatz, Korth and Sudarshan3.5Database System Concepts
Source: ESRI Advanced ArcInfo
©Silberschatz, Korth and Sudarshan3.6Database System Concepts
Source: ESRI Advanced ArcInfo
©Silberschatz, Korth and Sudarshan3.7Database System Concepts
Georelational Database ModelGeorelational Database Model
©Silberschatz, Korth and Sudarshan3.8Database System Concepts
Attribute Relationships
Functional Dependency:refers to the relationships between attributes within a relation.
If the value of attribute A determines the value ofattribute B, then attribute B is functionally dependentupon attribute A.
©Silberschatz, Korth and Sudarshan3.9Database System Concepts
Source: ESRI Advanced ArcInfo
©Silberschatz, Korth and Sudarshan3.10Database System Concepts
Functional DependenciesFunctional Dependencies
X -> Y means:
X functionally determines Y
Y depends on X
Values of Y component depend on, determined by values of X component
©Silberschatz, Korth and Sudarshan3.11Database System Concepts
Functional DependenciesFunctional Dependencies
Given t1 and t2:
if t1[X] = t2 [X] then t1[Y] = t2 [Y] (1)
In other words if the values of X are equal, then Y value are equal
Values of X component uniquely (functionally) determine values of Y component iff (1)
©Silberschatz, Korth and Sudarshan3.12Database System Concepts
Data NormalizationData Normalization
Primarily a tool to validate and improve a logical design so that it satisfies certain constraints that avoid unnecessary duplication of data.
The process of decomposing relations with anomalies to produce smaller, well-structured relations.
Primary Objective: Reduce Redundancy,Reduce nulls,
Improve “modify” activities: insert,
update,
delete,
but not read
Price: degraded query, display, reporting
©Silberschatz, Korth and Sudarshan3.13Database System Concepts
Normal FormsNormal Forms
First Normal Form (1NF)
Second Normal Form (2NF)
Third Normal Form (3NF)
Boyce-Codd Normal Form (BCNF)
Fourth Normal Form (4NF)
Fifth Normal Form (5NF)
©Silberschatz, Korth and Sudarshan3.14Database System Concepts
NormalizationNormalization
Boyce-Codd and
Higher
Functional dependencyof nonkey attributes on the primary key - Atomic values only
Full Functional dependencyof nonkey attributes on the primary key
No transitive dependency between nonkey attributes
All determinants are candidate keys - Single multivalued dependency
©Silberschatz, Korth and Sudarshan3.15Database System Concepts
Unnormalized RelationsUnnormalized Relations
First step in normalization is to convert the data into a two-dimensional table
In unnormalized relations data can repeat within a column
©Silberschatz, Korth and Sudarshan3.16Database System Concepts
Unnormalized RelationUnnormalized Relation
Patient # Surgeon # Surg. date Patient Name Patient Addr Surgeon Surgery Postop drugDrug side effects
1111145 311
Jan 1, 1995; June 12, 1995 John White
15 New St. New York, NY
Beth Little Michael Diamond
Gallstones removal; Kidney stones removal
Penicillin, none-
rash none
1234243 467
Apr 5, 1994 May 10, 1995 Mary Jones
10 Main St. Rye, NY
Charles Field Patricia Gold
Eye Cataract removal Thrombosis removal
Tetracycline none
Fever none
2345 189Jan 8, 1996 Charles Brown
Dogwood Lane Harrison, NY
David Rosen
Open Heart Surgery
Cephalosporin none
4876 145Nov 5, 1995 Hal Kane
55 Boston Post Road, Chester, CN Beth Little
Cholecystectomy Demicillin none
5123 145May 10, 1995 Paul Kosher
Blind Brook Mamaroneck, NY Beth Little
Gallstones Removal none none
6845 243
Apr 5, 1994 Dec 15, 1984 Ann Hood
Hilton Road Larchmont, NY
Charles Field
Eye Cornea Replacement Eye cataract removal
Tetracycline Fever
©Silberschatz, Korth and Sudarshan3.17Database System Concepts
First Normal FormFirst Normal Form
To move to First Normal Form a relation must contain only atomic values at each row and column. No repeating groups
A column or set of columns is called a Candidate Key when its values can uniquely identify the row in the relation.
©Silberschatz, Korth and Sudarshan3.18Database System Concepts
First Normal FormFirst Normal Form
Patient # Surgeon #Surgery DatePatient NamePatient AddrSurgeon Name Surgery Drug adminSide Effects
1111 145 01-Jan-95 John White
15 New St. New York, NY Beth Little
Gallstones removal Penicillin rash
1111 311 12-Jun-95 John White
15 New St. New York, NY
Michael Diamond
Kidney stones removal none none
1234 243 05-Apr-94 Mary Jones10 Main St. Rye, NY Charles Field
Eye Cataract removal
Tetracycline Fever
1234 467 10-May-95 Mary Jones10 Main St. Rye, NY Patricia Gold
Thrombosis removal none none
2345 189 08-Jan-96Charles Brown
Dogwood Lane Harrison, NY David Rosen
Open Heart Surgery
Cephalosporin none
4876 145 05-Nov-95 Hal Kane
55 Boston Post Road, Chester, CN Beth Little
Cholecystectomy Demicillin none
5123 145 10-May-95 Paul Kosher
Blind Brook Mamaroneck, NY Beth Little
Gallstones Removal none none
6845 243 05-Apr-94 Ann Hood
Hilton Road Larchmont, NY Charles Field
Eye Cornea Replacement
Tetracycline Fever
6845 243 15-Dec-84 Ann Hood
Hilton Road Larchmont, NY Charles Field
Eye cataract removal none none
©Silberschatz, Korth and Sudarshan3.19Database System Concepts
Second Normal FormSecond Normal Form
A relation is said to be in Second Normal Form when every nonkey attribute is fully functionally dependent on the primary key. That is, every nonkey attribute needs the full primary key for unique
identification
©Silberschatz, Korth and Sudarshan3.20Database System Concepts
Second Normal FormSecond Normal Form
Patient # Patient Name Patient Address
1111 John White15 New St. New York, NY
1234 Mary Jones10 Main St. Rye, NY
2345Charles Brown
Dogwood Lane Harrison, NY
4876 Hal Kane55 Boston Post Road, Chester,
5123 Paul KosherBlind Brook Mamaroneck, NY
6845 Ann HoodHilton Road Larchmont, NY
©Silberschatz, Korth and Sudarshan3.21Database System Concepts
Second Normal FormSecond Normal Form
Surgeon # Surgeon Name
145 Beth Little
189 David Rosen
243 Charles Field
311 Michael Diamond
467 Patricia Gold
©Silberschatz, Korth and Sudarshan3.22Database System Concepts
Second Normal FormSecond Normal Form
Patient # Surgeon # Surgery Date Surgery Drug Admin Side Effects
1111 145 01-Jan-95Gallstones removal Penicillin rash
1111 311 12-Jun-95
Kidney stones removal none none
1234 243 05-Apr-94Eye Cataract removal Tetracycline Fever
1234 467 10-May-95Thrombosis removal none none
2345 189 08-Jan-96Open Heart Surgery
Cephalosporin none
4876 145 05-Nov-95Cholecystectomy Demicillin none
5123 145 10-May-95Gallstones Removal none none
6845 243 15-Dec-84Eye cataract removal none none
6845 243 05-Apr-94Eye Cornea Replacement Tetracycline Fever
©Silberschatz, Korth and Sudarshan3.23Database System Concepts
Third Normal FormThird Normal Form
A relation is said to be in Third Normal Form if there is no transitive functional dependency between nonkey attributes When one nonkey attribute can be determined with one or more
nonkey attributes there is said to be a transitive functional dependency.
The side effect column in the Surgery table is determined by the drug administered Side effect is transitively functionally dependent on drug so Surgery
is not 3NF
©Silberschatz, Korth and Sudarshan3.24Database System Concepts
Third Normal FormThird Normal Form
Patient # Surgeon # Surgery Date Surgery Drug Admin
1111 145 01-Jan-95 Gallstones removal Penicillin
1111 311 12-Jun-95Kidney stones removal none
1234 243 05-Apr-94 Eye Cataract removal Tetracycline
1234 467 10-May-95 Thrombosis removal none
2345 189 08-Jan-96 Open Heart Surgery Cephalosporin
4876 145 05-Nov-95 Cholecystectomy Demicillin
5123 145 10-May-95 Gallstones Removal none
6845 243 15-Dec-84 Eye cataract removal none
6845 243 05-Apr-94Eye Cornea Replacement Tetracycline
©Silberschatz, Korth and Sudarshan3.25Database System Concepts
Third Normal FormThird Normal Form
Drug Admin Side Effects
Cephalosporin none
Demicillin none
none none
Penicillin rash
Tetracycline Fever
©Silberschatz, Korth and Sudarshan3.26Database System Concepts
Functional Dependency and KeysFunctional Dependency and Keys
Functional Dependency: The value of one attribute (the determinant) determines the value of another attribute.
Candidate Key: Each non-key field is functionally dependent on every candidate key.
©Silberschatz, Korth and Sudarshan3.27Database System Concepts
Steps in NormalizationSteps in Normalization
©Silberschatz, Korth and Sudarshan3.28Database System Concepts
Normalization – most usedNormalization – most used
Four most commonly used normal forms are first (1NF), second (2NF) and third (3NF) normal forms, and Boyce–Codd normal form (BCNF).
Based on functional dependencies among the attributes of a relation.
A relation can be normalized to a specific form to prevent possible occurrence of update anomalies.
©Silberschatz, Korth and Sudarshan3.29Database System Concepts
First Normal FormFirst Normal Form
No multi-valued attributes.
Every attribute value is atomic.
Why are the following tables not in 1NF
Employee (ssn, Name, Salary, Address, ListOfSkills)
Department (Did, Dname, ssn)
©Silberschatz, Korth and Sudarshan3.30Database System Concepts
Second Normal FormSecond Normal Form
1NF and every non-key attribute is fully functionally dependent on the primary key.
Every non-key attribute must be defined by the entire key, not by only part of the key.
No partial functional dependencies.
Assuming that we have a composite PK (LicensePlate, OwnerSSN) for the Vechicle
Table below, why is the table not in 2NF ?
Vehicle (LicensePlate, Brand, Model, PurchasePrice, Year, OwnerSSN, OwnerName)
©Silberschatz, Korth and Sudarshan3.31Database System Concepts
Third Normal Form & Third Normal Form & BCNFBCNF
2NF and no transitive dependencies (functional dependency between non-key attributes = BCNF)
Why are the following tables not in 3NF or BCNF ? Why is Employee [ssn, name, salary, did, dname]
Customer
©Silberschatz, Korth and Sudarshan3.32Database System Concepts
3NF & BCNF3NF & BCNF
It is very rare for a Table to be in 3NF and not be in BCNF (violation of BCNF).
Given a Relation R with attributes A, B and C where A and B are together the composite PK,
IF A, B -> C and C -> B
THEN R is in 3NF and is not in BCNF
Example: Student, course -> Instructor
Instructor -> Course
©Silberschatz, Korth and Sudarshan3.33Database System Concepts
Steps in NormalizationSteps in Normalization
1NF: a table, without multivalued attributes if not, then decompose
2NF: 1NF and every non-key attribute is fully functionally dependent on the primary key if not, then decompose
3NF: 2NF and no transitive dependencies if not, then decompose
GENERAL: Each table should describe a single theme
Modification anomalies are minimized
Hint: THE KEY, THE WHOLE KEY AND NOTHING BUT THE KEY
©Silberschatz, Korth and Sudarshan3.34Database System Concepts
EXAMPLE - OBTAIN CANDIDATE KEYSConsider the following scheme from an airline database system:( P (pilot) , F (flight# ), D (date), T (scheduled time to depart) )We have the following FD's :
F ----> T PDT ----> F FD ----> PProvide some superkeys:
PDT is a superkey, and FD is a superkey. Is PDT a candidate key?
PD is not a superkey, nor is DT, nor is PT. So, PDT is a candidate key.
FD is also a candidate key, since neither F or D are superkeys.
©Silberschatz, Korth and Sudarshan3.35Database System Concepts
CLOSURE OF A SET OF FD'S If F is a set of functional dependencies for a relation R, the set of all functional dependencies that can be derived from F, denoted by F+, is called the CLOSURE of F.
We can use Armstrong's axioms, and the 3 derived rules, to compute the closure of F, F+.
©Silberschatz, Korth and Sudarshan3.36Database System Concepts
WORKING TO GET THE CLOSURE F+
GIVEN: scheme (A, B, C, G, H, I)GIVEN: FD set (A--->B, A--->C, CG--->H,
CG--->I, B--->H) Some members of F+ are
A--->H {Transitivity Rule applied to A--->B and B--->H)
CG--->HI {Union Rule applied to CG--->H and CG--->I}
AG--->I {By Augmentation Rule, AG--->CG; then Transitivity}
©Silberschatz, Korth and Sudarshan3.37Database System Concepts
THE CLOSURE OF A SET OF ATTRIBUTES
GIVEN: FD set F and a given attribute A (or set of attributes A)
FIND : The set of attributes functionally dependent on A, called the closure of A, and denoted by A+
IMPORTANT USE FOR THIS: To determine if A is a superkey, we compute A+, the set of attributes functionally dependent on A. If A+ consists of ALL the attributes in the relation, then A is a superkey
HOW DO WE FIND A+? The following algorithm does the trick!
©Silberschatz, Korth and Sudarshan3.38Database System Concepts
ALGORITHM TO FIND THE CLOSURE OF ATTRIBUTE A, DENOTED BY A+
result := A;while { result changes }
for each functional dependency B--->C begin
if B is contained in result, then result := result U C ' end
endwhile A+ := result
©Silberschatz, Korth and Sudarshan3.39Database System Concepts
EXAMPLE TO FIND THE CLOSURE A+ OF AN ATTRIBUTE A
GIVEN: Relation R with attributes W, X, Y, Z and FD's W ---> Z YZ ---> X WZ ---> Y
FIND : WZ+PSEUDO TRACE OF THE ALGORITHM:
result := WZfrom first 2 FD's, no change to "result"from WZ ---> Y, since WZ is contained in result, we
get result := WZYsince YZ is contained in result, we get result :=
WZYXThus, every attribute in R is in WZ+, so WZ is a
superkey!
©Silberschatz, Korth and Sudarshan3.40Database System Concepts
NormalizationNormalization
Normalization of data - method for analyzing schemas
Unsatisfactory schemas decomposed into smaller ones with desirable properties
Objectives of normalization good relation schemas disallowing update anomalies
©Silberschatz, Korth and Sudarshan3.41Database System Concepts
Formal frameworkFormal framework
database normalized to any degree (1, 2, 3, 4, 5, etc.) normalization is not done in isolation need:
lossless join dependency preservation additional normal forms meet other desirable criteria
©Silberschatz, Korth and Sudarshan3.42Database System Concepts
Normal FormsNormal Forms
1st, 2nd, 3rd, BCNF consider only FD and key constraints
constraints must not be hard to understand or detect
need not normalize to highest form (e.g. for performance reasons)
©Silberschatz, Korth and Sudarshan3.43Database System Concepts
1NF - 1st normal form1NF - 1st normal form
part of the formal definition of a relation
disallow multivalued attributes, composite attributes and their combination
In 1NF single (atomic, indivisible) values
©Silberschatz, Korth and Sudarshan3.44Database System Concepts
Normalize into 1NF?Normalize into 1NF?
How to normalize nested relations into 1NF? Remove nested relation attributes into new relation
propagate PK
combine PK and partial PK
recursively unnest - multilevel nesting
useful in converting hierarchical schemes into 1NF
©Silberschatz, Korth and Sudarshan3.45Database System Concepts
Difficulties with 1NFDifficulties with 1NF
insert, delete, update
Determine if describe entity identified by PK?
If not, called non-full FDs
we need full FDs for good inserts, deletes, updates
©Silberschatz, Korth and Sudarshan3.46Database System Concepts
Second Normal Form - 2NFSecond Normal Form - 2NF
Uses the concepts of FDs, PKs and this definition: An FD is a Full functional dependency if:
given Y -> Z
Removal of any attribute from Y means the FD does not hold any more
©Silberschatz, Korth and Sudarshan3.47Database System Concepts
2NF2NF
A relation schema R is in 2NF if: Relation is in 1NF Every non-prime attribute A in R is fully functionally dependent on the
primary key
Prime attribute - attribute that is a member of the primary key K
R can be decomposed into 2NF relations via the process of 2NF normalization Remove partial dependenciesRemove partial dependencies create new relations where partials are fullcreate new relations where partials are full
©Silberschatz, Korth and Sudarshan3.48Database System Concepts
Simplifying Functional Dependencies through Normalization
Normalization:the identification of functional dependenciesand the modifications required to structurally change the database to remove undesirable dependencies
©Silberschatz, Korth and Sudarshan3.49Database System Concepts
Source: ESRI Advanced ArcInfo
©Silberschatz, Korth and Sudarshan3.50Database System Concepts
Source: ESRI Advanced ArcInfo
©Silberschatz, Korth and Sudarshan3.51Database System Concepts
Source: ESRI Advanced ArcInfo
©Silberschatz, Korth and Sudarshan3.52Database System Concepts
Source: ESRI Advanced ArcInfo
©Silberschatz, Korth and Sudarshan3.53Database System Concepts
Source: ESRI Advanced ArcInfo
©Silberschatz, Korth and Sudarshan3.54Database System Concepts
Source: ESRI Advanced ArcInfo
©Silberschatz, Korth and Sudarshan3.55Database System Concepts
September 2 ,2004
Read the following article:IBM's early relational database scientists:http://www.mcjones.org/System_R/SQL_Reunion_95/sqlr95.html
Chapter 3 3.1. And Chapter 7,7.1-7.3.2
Work on problems:
7.12.7.13,7.14,7.15