copyright, harris corporation & ophir frieder, 19981 normal forms “why be normal?” - author...
TRANSCRIPT
Copyright, Harris Corporation &
Ophir Frieder, 19981
Normal Forms
“Why be normal?”
- Author unknown
Normal
Copyright, Harris Corporation &
Ophir Frieder, 19982
Objectives
• To define first, second, third, and Boyce-Codd normal forms.
• To discuss the motivation for normal forms, and the implications for database design.
Copyright, Harris Corporation &
Ophir Frieder, 19983
Normal Forms
• A relational scheme is said to be in first normal form (1NF) if and only if each of it’s domains contains only scalar values.
– No “repeating groups” or unbounded lists.
– A field cannot itself be a table, as in Oracle8!
Question: Do CITY and STATE violate 1NF in relation scheme R=(CITY,STATE,POPULATION)?
Copyright, Harris Corporation &
Ophir Frieder, 19984
Motivation For 1NF
• Representation - with a repeating group, some method must be devised for specifying the end or length of the list.
• Space Allocation - how is space allocated on a per record basis if repeating groups are allowed?
SS# NAME HOBBIES
032446254 Mary Smith jogging, wrestling
045242453 Bob Jones cooking, cycling,gardening
932415223 Sue Clark quilting,hiking,
...
Copyright, Harris Corporation &
Ophir Frieder, 19985
Motivation For 1NF
• Operations - without 1NF, all operations become more complex, and this propagates throughout the database management system.
• Theory - 1NF simplifies the theoretical basis of the relational model (e.g., proof of algorithmic correctness).
SS# NAME HOBBIES
032446254 Mary Smith jogging, wrestling
045242453 Bob Jones cooking, cycling,gardening
932415223 Sue Clark quilting,hiking,
...
Copyright, Harris Corporation &
Ophir Frieder, 19986
Converting A RelationalScheme To 1NF
• A repeating group is typically eliminated by “flattening” the table. This makes most things simpler.
SS# NAME HOBBIES
032446254 Mary Smith jogging
032446254 Mary Smith wrestling
045242453 Bob Jones cooking
045242453 Bob Jones cycling
045242453 Bob Jones gardening
:
Copyright, Harris Corporation &
Ophir Frieder, 19987
Single v.s. Multiple Keys
• For the sake of simplicity we will assume initially that each relational scheme has exactly one key.
• Multiple keys do occur, but less frequently.
• In what follows, the formal definitions do not change in the case of multiple keys.
• Given a relational scheme R, it will be helpful throughout the following to divide the attributes up into two sets, those that are part of the key, and those that are not.
Copyright, Harris Corporation &
Ophir Frieder, 19988
Example #1:Key And Non-Key Attributes
• Consider the following relational scheme for a department store chain (e.g., Walmart):
– Attributes:STORE_ID# - A store identification number.
CITY - The city in which the store is located.
STATE - The state in which the store is located.
ITEM - An item sold by the store.
PRICE - The price of the item.
– Functional Dependencies:STORE_ID# => CITY
STORE_ID# => STATE
STORE_ID#, ITEM => PRICE
Copyright, Harris Corporation &
Ophir Frieder, 19989
Example #1, Cont.
STORE_ID# CITY STATE ITEM PRICE
W001 Orlando FL Duck Tape 3.95
W001 Orlando FL Rope 3.75
W002 Savannah GA Paint 8.75
W002 Savannah GA Plywood 12.50
W002 Savannah GA Insulation 9.95
Copyright, Harris Corporation &
Ophir Frieder, 199810
Example #1, Cont.
• The only key is: STORE_ID#,ITEM
Key Attributes Non-Key Attributes
STORE_ID# CITY
ITEM STATE
PRICE
Copyright, Harris Corporation &
Ophir Frieder, 199811
Partial Dependency
• A functional dependency X=>A is a partial dependency if:
– X is a proper subset of the key attributes, and
– A is a non-key attribute.
Copyright, Harris Corporation &
Ophir Frieder, 199812
Example #1, Cont.
Key Attributes Non-Key Attributes
STORE_ID# CITY
ITEM STATE
PRICE
• STORE_ID# => CITY is a partial dependency.
• Similarly, STORE_ID# => STATE is a partial dependency.
Copyright, Harris Corporation &
Ophir Frieder, 199813
Normal Forms - 2NF
• A relational scheme is said to be in second normal form (2NF) if and only if it is in 1NF and contains no partial dependencies.
Question: Why eliminate partial dependencies?
Copyright, Harris Corporation &
Ophir Frieder, 199814
Example #1, Cont.
• Note the redundancy in the following legal relation:
STORE_ID# CITY STATE ITEM PRICE
W001 Orlando FL Duck Tape 3.95
W002 Charleston SC Duck Tape 3.75
W003 Savannah GA Paint 8.75
W003 Savannah GA Plywood 12.50
W003 Savannah GA Paint 9.95
Copyright, Harris Corporation &
Ophir Frieder, 199815
Motivation for 2NFExample #2
• Consider the following relational scheme for maintaining information associated with students at a university:
– Attributes:STUDENT_ID# - The social security number of a student.
NAME - The students last name.
COURSE_ID# - The ID # of a course the student is registered in.
DEPT_ID# - The ID # of the department that offers the course.
– Functional Dependencies:STUDENT_ID# => NAME
COURSE_ID# => DEPT_ID#
Copyright, Harris Corporation &
Ophir Frieder, 199816
Example #2, Cont.
STUDENT_ID# NAME COURSE_ID# DEPT_ID#
032-34-3455 Smith CS1001 8353
032-34-3455 Smith MS4025 9255
892-64-8372 Jones MS4025 9255
892-64-8372 Jones MS4145 9255
892-64-8372 Jones MS5023 9255
Copyright, Harris Corporation &
Ophir Frieder, 199817
Example #2, Cont.
• The only key is: STUDENT_ID#, COURSE_ID#
Key Attributes Non-Key Attributes
STUDENT_ID# NAME
COURSE_ID# DEPT_ID#
• STUDENT_ID# => NAME is a partial dependency, so it is not in 2NF.
• Similarly, COURSE_ID# => DEPT_ID# is a partial dependency.
Copyright, Harris Corporation &
Ophir Frieder, 199818
Example #2, Cont.
• Note the redundancy in the following legal relation:
STUDENT_ID# NAME COURSE_ID# DEPT_ID#
032-34-3455 Smith CS1001 8353
032-34-3455 Smith MS4025 9255
892-64-8372 Jones MS4025 9255
892-64-8372 Jones MS4145 9255
892-64-8372 Jones MS5023 9255
Copyright, Harris Corporation &
Ophir Frieder, 199819
Anomalies Resulting FromPartial Dependencies
• Insertion Anomalies - A new student cannot be added unless they are currently registered for at least one course.
• Deletion Anomalies - If a student drops their last, or only course, then there is no record left of the student.
• Update Anomalies - Changing a students’ name requires all their records to be updated. Similarly, for changing a course ID #, or for assigning a course to a different department.
Note that the first two assume null values are not desirable, which is of considerable debate in the database community.
Copyright, Harris Corporation &
Ophir Frieder, 199820
Transitive Dependency
• A functional dependency X=>A is a transitive dependency if:
– X is a proper subset of the non-key attributes, and
– A is a non-key attribute.
Copyright, Harris Corporation &
Ophir Frieder, 199821
Motivation For 3NFExample #1
• Consider the following relational scheme for National Football League (NFL) athletes:
– Attributes:PLAYER_ID# - The social security number for an NFL athlete.
TEAM - The name of the team the athlete plays for.
STATE - The state in which the team is located.
– Functional Dependencies:PLAYER_ID# => TEAM
TEAM => STATE
Copyright, Harris Corporation &
Ophir Frieder, 199822
Example #1, Cont.
PLAYER_ID# TEAM STATE
024-34-3455 Bills NY
195-63-6775 Dolphins FL
892-64-8372 Vikings MN
934-84-7271 Vikings MN
783-83-9879 Vikings MN
Copyright, Harris Corporation &
Ophir Frieder, 199823
Example #1, Cont.
Key Attributes Non-Key Attributes
PLAYER_ID# TEAM STATE
• TEAM => STATE is a transitive dependency.
Copyright, Harris Corporation &
Ophir Frieder, 199824
Normal Forms - 3NF
• A relational scheme is said to be in third normal form (3NF) if and only if it is in 2NF and contains no transitive dependencies.
Every non-key attribute depends on the key, the whole key and nothing but the key.
Question: Why eliminate transitive dependencies?
Copyright, Harris Corporation &
Ophir Frieder, 199825
Motivation For 3NF, Cont.
Key Attributes Non-Key Attributes
PLAYER_ID# TEAM
STATE
• There are no partial dependencies, so it is in 2NF.
• However, it is not in 3NF because of the transitive dependency TEAM=>STATE.
Copyright, Harris Corporation &
Ophir Frieder, 199826
Example #1, Cont.
• Note the redundancy in the following legal relation:
PLAYER_ID# TEAM STATE
024-34-3455 Bills NY
195-63-6775 Dolphins FL
892-64-8372 Vikings MN
934-84-7271 Vikings MN
783-83-9879 Vikings MN
Copyright, Harris Corporation &
Ophir Frieder, 199827
Motivation for 3NFExample #2
• Consider the following relational scheme for a university student database:
– Attributes:STUDENT_ID# - A students’ social security number.
CITY - The city of the students’ home address.
STATE - The state of the students’ home address.
ZIP - The zip code of the students’ home address.
– Functional Dependencies:STUDENT_ID# => CITY
STUDENT_ID# => STATE
STUDENT_ID# => ZIP
ZIP => STATE
ZIP => CITY
Copyright, Harris Corporation &
Ophir Frieder, 199828
Example #2, Cont.
STUDENT_ID# CITY STATE ZIP
024-34-3455 Melbourne FL 32934
195-63-6775 Melbourne FL 32934
892-64-8372 Orlando FL 32816
934-84-7271 Orlando FL 32816
783-83-9879 Oswego NY 13126
Copyright, Harris Corporation &
Ophir Frieder, 199829
Example #2, Cont.
Key Attributes Non-Key Attributes
STUDENT_ID CITY
STATE
ZIP
• There are no partial dependencies, so it is in 2NF.
• ZIP => STATE and ZIP=>CITY are transitive dependencies, so it is not in 3NF.
Copyright, Harris Corporation &
Ophir Frieder, 199830
Example #2, Cont.
• Note the redundancy in the following legal relation:
STUDENT_ID# CITY STATE ZIP
024-34-3455 Melbourne FL 32934
195-63-6775 Melbourne FL 32934
892-64-8372 Orlando FL 32816
934-84-7271 Orlando FL 32816
783-83-9879 Oswego NY 13126
Copyright, Harris Corporation &
Ophir Frieder, 199831
Motivation for 3NFExample #3
• Consider a relational scheme for tracking software licenses:
– Attributes:LICENSE_ID# - The license ID number for a piece of software.
MACHINE_ID# - The ID number of the machine on which the software is installed.
EMPLOYEE_ID# - The social security number of the employee to which the machine is assigned.
LOCATION - The location of the employee’s office.
– Functional Dependencies:LICENSE_ID# => MACHINE_ID#
MACHINE_ID# => EMPLOYEE_ID#
EMPLOYEE_ID# => LOCATION
Copyright, Harris Corporation &
Ophir Frieder, 199832
Example #3, Cont.
LICENSE_ID# MACHINE_ID# EMPLOYEE_ID# LOCATION
3243452 3357 023-45-7886 B1-234
4432451 5342 094-34-0033 B3-9867
4452312 4256 143-95-7453 B2-6774
5343231 4256 143-95-7453 B2-6774
5534242 1124 143-95-7453 B2-6774
Copyright, Harris Corporation &
Ophir Frieder, 199833
Example #3, Cont.
Key Attributes Non-Key Attributes
LICENSE_ID# MACHINE_ID#
EMPLOYEE_ID#
LOCATION
• There are no partial dependencies, so the relational scheme is in 2NF.
• MACHINE_ID# => EMPLOYEE_ID# and EMPLOYEE_ID# => LOCATION are both transitive dependencies, so it is not in 3NF.
Copyright, Harris Corporation &
Ophir Frieder, 199834
Example #3, Cont.
• Note the redundancy in the following legal relation:
LICENSE_ID# MACHINE_ID# EMPLOYEE_ID# LOCATION
3243452 3357 023-45-7886 B1-234
4432451 5342 094-34-0033 B3-9867
4452312 4256 143-95-7453 B2-6774
5343231 4256 143-95-7453 B2-6774
6634324 1124 143-95-7453 B2-6774
7534242 1124 143-95-7453 B2-6774
Copyright, Harris Corporation &
Ophir Frieder, 199835
Anomalies Resulting FromTransitive Dependencies
• Insertion Anomalies - A license cannot be added until it is installed on a machine, and until that machine is assigned to an employee.
• Deletion Anomalies - Deleting all of the records for a particular employee would delete any record of the machines or licenses assigned to that employee.
• Update Anomalies - Changing the employee assigned to a particular machine requires multiple record updates. Similarly for changing an employees’ location.
As with partial dependencies, the first two assume null values are not desirable.
Copyright, Harris Corporation &
Ophir Frieder, 199836
Normal Forms - BCNF
• A relation is said to be in Boyce/Codd normal form (BCNF) if every attribute depends on the key, the whole key, and nothing but the key.
Copyright, Harris Corporation &
Ophir Frieder, 199837
BCNFExample #1
• Consider a relational scheme for tracking employee salary adjustments:
– Attributes:EMPLOYEE_ID# - An employee identification number
DATE - A date on which the employee’s salary was adjusted
AMOUNT - The amount of the salary adjustment
EXPLANATION - An explanation for the adjustment
– Functional Dependencies:EMPLOYEE_ID#,DATE => AMOUNT
EMPLOYEE_ID#,DATE => EXPLANATION
Copyright, Harris Corporation &
Ophir Frieder, 199838
Example #3, Cont.
Key Attributes Non-Key Attributes
EMPLOYEE_ID# AMOUNT
DATE EXPLANATION
• There are no partial dependencies, so the relational scheme is in 2NF.
• There are no transitive dependencies, so the relational scheme is in 3NF.
• Each of the non-key attributes depends on both of the key attributes (the key, the whole key, and nothing but the key), so the relational scheme is in BCNF.
Copyright, Harris Corporation &
Ophir Frieder, 199839
Normal Forms - BCNF, Cont.
• Note that the definition of BCNF does not reference that for 3NF.
• This raises a couple of questions:– If a relational scheme is in BCNF, is it also in 3NF?
– If a relational scheme is in 3NF, is it also in BCNF?
• The answer to the first is yes (proof is left as an exercise).
• The answer to the second question depends...
Copyright, Harris Corporation &
Ophir Frieder, 199840
Normal Forms - BCNF, Cont.
• Relational Scheme Has Only One Key:– A relational scheme is in 3NF if and only if it is in BCNF.
• Relational Scheme Has Multiple Keys:– If the relational scheme is in BCNF, then it is in 3NF (already stated).
– If the relational scheme is in 3NF, however, it is not necessarily in BCNF.
• If the case of multiple keys, 1NF, 2NF, and 3NF definitions are still the same.
Copyright, Harris Corporation &
Ophir Frieder, 199841
Motivation For BCNFExample #1
• Consider the following relational scheme:
– Attributes:STUDENT_ID# - A student ID number.
COURSE_ID# - The ID# of a course being taken by the student.
FACULTY_ID# - The ID# of the faculty member who teaches the course taken by the
student.
– Functional Dependencies:STUDENT_ID#,COURSE_ID# => FACULTY_ID#
FACULTY_ID# => COURSE_ID#
Copyright, Harris Corporation &
Ophir Frieder, 199842
Example #1, Cont.
STUDENT_ID# COURSE_ID# FACULTY_ID#
023-45-7886 CSE1001 998-32-4124
094-34-0033 CSE4257 087-32-8797
143-95-7453 CSE2130 774-92-3889
034-42-8009 CSE2130 774-92-3889
Copyright, Harris Corporation &
Ophir Frieder, 199843
Example #1, Cont.
• There are two keys:STUDENT_ID#,COURSE_ID#
STUDENT_ID#,FACULTY_ID#
• The relation is in 1NF, 2NF, and 3NF (why?)
• The relation is not in BCNF because of the dependency:FACULTY_ID# => COURSE_ID#
Copyright, Harris Corporation &
Ophir Frieder, 199844
Example #1, Cont.
• Note the redundancy in the following legal relation:
STUDENT_ID# COURSE_ID# FACULTY_ID#
023-45-7886 CSE1001 998-32-4124
094-34-0033 CSE4257 087-32-8797
143-95-7453 CSE2130 774-92-3889
034-42-8009 CSE2130 774-92-3889
Copyright, Harris Corporation &
Ophir Frieder, 199845
Motivation For BCNFExample #2
• Consider the following relational scheme:
– Attributes:LICENSE_ID# - A Florida state driver’s license number.
SS# - The social security number of the person holding the license.
CODE - A traffic violation code.
QTY - The number of times the person has been issued the violation within the past year.
– Functional Dependencies:LICENSE_ID# => SS# LICENSE_ID#,CODE => QTY
SS# => LICENSE_ID# SS#,CODE => QTY
Copyright, Harris Corporation &
Ophir Frieder, 199846
Example #2, Cont.
LICENSE_ID# SS# CODE QTY
B424-341-52-415 023-45-7886 Speeding 3
C893-402-52-435 094-34-0033 Tail-Gating 4
M345-310-61-875 143-95-7453 Speeding 1
M345-310-61-875 143-95-7453 Double-Parking 6
Copyright, Harris Corporation &
Ophir Frieder, 199847
Example #2, Cont.
• There are two keys:LICENSE_ID#,CODE
SS#,CODE
• The relation is in 1NF, 2NF, and 3NF (why?)
• The relation is not in BCNF because of the dependencies:LICENSE_ID# => SS#
SS# => LICENSE_ID#
Copyright, Harris Corporation &
Ophir Frieder, 199848
Example #2, Cont.
• Note the redundancy in the following legal relation:
LICENSE_ID# SS# CODE QTY
B424-341-52-415 023-45-7886 Speeding 3
C893-402-52-435 094-34-0033 Tail-Gating 4
M345-310-61-875 143-95-7453 Speeding 1
M345-310-61-875 143-95-7453 Double-Parking 6
Copyright, Harris Corporation &
Ophir Frieder, 199849
Update Anomolies
• Insertion Anomalies - The fact that a license ID# has been assigned to a particular person cannot be recorded unless they have at least one violation.
• Deletion Anomalies - Deleting all of the violations for a particular driver would delete any record of the license ID# for that person.
• Update Anomalies - Changing a driver’s name requires changing all the records for each type of violation the driver has committed.
As with partial dependencies, the first two assume null values are not desirable.
Copyright, Harris Corporation &
Ophir Frieder, 199850
Normal Forms Summary
• A relational scheme is said to be in first normal form (1NF) if and only if each of it’s domains contains only scalar values.
• A relational scheme is said to be in second normal form (2NF) if and only if it is in 1NF and contains no partial dependencies.
• A relational scheme is said to be in third normal form (3NF) if and only if it is in 2NF and contains no transitive dependencies.
• A relational scheme is said to be in Boyce/Codd normal form (BCNF) if and only if the only nontrivial dependencies for the relational scheme are those in which a key functionally determines one or more attributes (“every attribute depends on the key, the whole key, and nothing but the key”).
Copyright, Harris Corporation &
Ophir Frieder, 199851
Normal Forms - 4NF & 5NF
Currently Beyond The Scope Of This Course:
• A relational scheme R is said to be in fourth normal form (4NF) if and only if whenever there is a multivalued dependency X=>>Y, where Y is not empty or a subset of X, and XY does not include all the attributes of R, then X is a superkey of R.
• A relational scheme R is said to be in fifth normal form (5NF) - also called projection-join normal form (PJ/NF) - if and only if every join dependency in R is implied by the candidate keys of R.