r. j. daigle normalization concepts cis 507 database programming

29
R. J. Daigle Normalization Concepts CIS 507 Database Programming

Upload: katrina-todd

Post on 13-Dec-2015

218 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: R. J. Daigle Normalization Concepts CIS 507 Database Programming

R. J. Daigle

Normalization Concepts

CIS 507Database Programming

Page 2: R. J. Daigle Normalization Concepts CIS 507 Database Programming

R. J. Daigle

• Two types of Anomalies associated with databases.

• Modification Anomaly

– an unexpected consequence of changing the actual data in a database

• Design Anomaly

– a flaw in the logical design of the database itself

• Basic Principles:

– For each modification anomaly there is a design anomaly

– For each design anomaly there are associated modification anomalies

Introduction

EmployeeID

Last Name

First Name

111222 Jones Samual

111333 Smith Sandra

222444 Edwards Edwin

EmployeeID

EmployeeID

EmployeeID

111222 111222

Page 3: R. J. Daigle Normalization Concepts CIS 507 Database Programming

R. J. DaigleTypes of modification anomalies:

• Insertion: – add a new student—since the key is STUDENT-ID + COURSE, a student can

only be added when the course has been completed.• Deletion:

– of student 112233 results in the loss of information about course CIS 503.• Update.

– Student 112244 to Tonya Marshall requires the change to take place in several places.

STUDENT-ID STUDENT-NAME COURSE COURSE NAME GRADE

112211 John Smith CIS 501 Accelerated Programming B

112244 Tonya Tucker CIS 501 Accelerated Programming B

112244 Tonya Tucker CIS 502 Architecture and OS A

112233 Michael Boyd CIS 503 Data and File Structure A

112255 George Jones CIS 502 Architecture and OS B

112244 Tonya Tucker CIS 504 Networks & Communications C

Page 4: R. J. Daigle Normalization Concepts CIS 507 Database Programming

R. J. DaigleNormalization

• Design anomalies have been classified and criteria for removal of the anomalies have been developed.

• The process of removing design anomalies is called Normalization.

• A Normal Form is associated with the removal of a specific type of anomaly.

• The known normal forms from lowest to highest are: 1NF, 2NF, 3NF, BCNF, 4NF, 5NF, DKNF

• Any design which is evaluated as a higher form automatically satisfies the lower forms.

Page 5: R. J. Daigle Normalization Concepts CIS 507 Database Programming

R. J. DaigleNormalization

• Theory of Normalization Contributors– Dr. E. F. Codd introduced the first three normal forms

in the same paper in which the Relational Model was introduced ( A Relational Model of Data for Large Shared Databanks, CACM, Vol 13, No 6, June, 1970.)

– Dr. R. F. Boyce extended Codd's original three forms.

– Dr. R. Fagin extended the theory as proposed by Codd and introduced another way of evaluating a design.

– Dr. D. M. Kroenke has been instrumental in clarifying the theory of normal forms in his role as educator.

Page 6: R. J. Daigle Normalization Concepts CIS 507 Database Programming

R. J. DaigleKnown Normal Forms.

• The normal forms in order from lowest to highest are– First Normal Form (1NF). Elimination of repeating field types– Second Normal Form (2NF). Elimination of partial key

dependencies.– Third Normal Form (3NF). Elimination of transitive key

dependencies among non-key attributes.– Boyce-Codd Normal Form (BCNF). Elimination of partial key

dependencies upon non-key attributes.– Fourth Normal Form (4NF). Elimination of multi-valued

dependencies.– Fifth Normal Form (5NF). Elimination of join anomalies.– Domain Key Normal Form. Elimination of all modification

anomalies.

Page 7: R. J. Daigle Normalization Concepts CIS 507 Database Programming

R. J. DaigleBasic Definitions.

• Assumptions:

– e is an entity type

– ε is the set of attributes for e

– A, B, C, ... are non-empty subsets of ε

Page 8: R. J. Daigle Normalization Concepts CIS 507 Database Programming

R. J. DaigleBasic Definitions.

• Functional Dependence: B is Functionally Dependent on A if for each value of A there is exactly one value of B.– A is said to Functionally Determine B– A is called a Determinant.– The relationship between A and B is represented as A --> B– If A --> ε, A is said to be an Identifier for the entity type e

• Examplee = STUDENT-DORM-FEE

ε = {STUD-ID, STUD-NAME, DORM, DORM-FEE}A = {STUD-ID}B = {DORM}C = {DORM-FEE} A --> ε, B --> C

Page 9: R. J. Daigle Normalization Concepts CIS 507 Database Programming

R. J. DaigleKey

• K is a Key for the entity e if and only if

1. K --> ε and

2. no non-empty subset of K determines ε.• Example (From last example)

– A = {STUD-ID} is a key for e = STUDENT-DORM-FEE

– {STUD-ID, DORM-FEE} is not a key for e.

• An attribute which belongs to the selected key A is called a Key Attribute; all other attributes are called Non-Key Attributes.

Page 10: R. J. Daigle Normalization Concepts CIS 507 Database Programming

R. J. Daigle

First Normal Form (1NF) (Repeating Attribute Types)

• A is a repeating attribute type or repeating field type if for each occurrence of e there may be 0, 1, or more occurrences of values for A.

• The data structure used for a repeating attribute types gives rise to maintenance difficulties. – Static approach: embedding repeating field type is within the entity

type (implemented as an array)• allocation for maximal perceived use results in unused space or• insufficient storage for some entities if the maximal perceived

use is underestimated– Dynamic approach: data structure which requires more complex

functions for management.• Example.

– e = {STUD-ID, STUD-NAME, COURSE, COURSE-GRADE}COURSE and COURSE-GRADE form a repeating type pair of attributes since any student will have completed 0, 1, or more courses.

Page 11: R. J. Daigle Normalization Concepts CIS 507 Database Programming

R. J. DaigleFirst Normal Form (1NF) (Atomic Attributes)

• An attribute is Atomic if the attribute defines the lowest level of usage of data collected for the attribute.

• Example.e = {STUD-ID, STUD-NAME, COURSE, COURSE-GRADE}

– No guarantee that LAST-NAME will available since the design does not define it as an attribute.

– Examples for Johnson Albert Gilbert:JOHNSON GILBERT, GILBERT JOHNSON, J. A. GILBERT, J. GILBERT, GILBERT J., GILBERT J. A., JOHNSON A. GILBERT, J. ALBERT GILBERT, GILBERT JOHNSON A., GILBERT JOHNSON ALBERT,JOHNSON ALBERT GILBERT, GILBERT J. ALBERT, etc.

– The forms shown above might all be present for different entities in the data collection.

– Any algorithm designed to extract the desired information would have to consider all possibilities.

– Avoided by properly identifying the lowest level of use during the design phase rather than relying on an application to obtain the desired data.

Page 12: R. J. Daigle Normalization Concepts CIS 507 Database Programming

R. J. DaigleFirst Normal Form (1NF) (Definition)

• An entity type e is in FIRST NORMAL FORM (1NF) if and only if

1. e has no repeating attribute types AND

2. all attribute types of e are atomic.

Page 13: R. J. Daigle Normalization Concepts CIS 507 Database Programming

R. J. Daigle

Token Diagram (Abstraction).

A

B

C

D1 D2

E

Page 14: R. J. Daigle Normalization Concepts CIS 507 Database Programming

R. J. Daigle

Symbolic Table.

Insertion Anomaly. b. Deletion Anomaly. c. Update Anomaly. 

A B C D GRADE

112211 John Smith CIS 501 Accelerated Programming B

112244 Tonya Tucker CIS 501 Accelerated Programming B

112244 Tonya Tucker CIS 502 Architecture and OS A

112233 Michael Boyd CIS 503 Data and File Structure A

112255 George Jones CIS 502 Architecture and OS B

112244 Tonya Tucker CIS 504 Networks & Communications C

Page 15: R. J. Daigle Normalization Concepts CIS 507 Database Programming

R. J. Daigle

Second Normal Form (Partial Key Dependency)

An entity type e with key, K, has a Partial Key Dependency if and only if a collection of non-key attributes is determined by (or functionally dependent on) a non-empty proper subset of K.

Page 16: R. J. Daigle Normalization Concepts CIS 507 Database Programming

R. J. Daigle

Second Normal Form (Definition)

An entity type e is in SECOND NORMAL FORM (2NF) if and only if a. e is in 1NF ANDb. e has no partial key dependencies.

Page 17: R. J. Daigle Normalization Concepts CIS 507 Database Programming

R. J. Daigle

Example.

Difficulties. 

A

B

C

D

Page 18: R. J. Daigle Normalization Concepts CIS 507 Database Programming

R. J. Daigle

Token Diagram (Abstraction).

A

B

C

D

A

B

C

D

A

Page 19: R. J. Daigle Normalization Concepts CIS 507 Database Programming

R. J. Daigle

Symbolic Table.

a. Insertion Anomaly.

b. Deletion Anomaly.

c. Update Anomaly.

Page 20: R. J. Daigle Normalization Concepts CIS 507 Database Programming

R. J. Daigle

Third Normal Form (3NF).

For A and C, attribute collections for an entity type e, there is a Transitive Dependency of C upon A if there is an attribute collection, B, of e for whicha. A --> B andb. B --> C.

Page 21: R. J. Daigle Normalization Concepts CIS 507 Database Programming

R. J. Daigle

Third Normal Form (Definition)

An entity type e is in THIRD NORMAL FORM (3NF) if and only if a. e is in 2NF ANDb. e has no transitive dependencies of one non-key attribute collection upon another non-key attribute collection

Page 22: R. J. Daigle Normalization Concepts CIS 507 Database Programming

R. J. Daigle

Example

Difficulties.

Page 23: R. J. Daigle Normalization Concepts CIS 507 Database Programming

R. J. Daigle

Token Diagram (Abstraction)

Page 24: R. J. Daigle Normalization Concepts CIS 507 Database Programming

R. J. Daigle

Symbolic Table.

Insertion Anomaly.Deletion Anomaly.Update Anomaly.

Page 25: R. J. Daigle Normalization Concepts CIS 507 Database Programming

R. J. Daigle

Boyce-Codd Normal Form (BCNF).

Attribute collections A and B of an entity type e are Candidate Keys for e if and only ifa. A is a key for e andb. B is a key for e andc. A is not equal to B.

Page 26: R. J. Daigle Normalization Concepts CIS 507 Database Programming

R. J. Daigle

Boyce-Codd Normal Form (BCNF) (Definition)

An entity type e is in BOYCE-CODD NORMAL FORM (BCNF) if and only if a. e is in 3NF ANDb. all determinants of e are candidate keys.

Page 27: R. J. Daigle Normalization Concepts CIS 507 Database Programming

R. J. Daigle

Example.

Difficulties.

Page 28: R. J. Daigle Normalization Concepts CIS 507 Database Programming

R. J. Daigle

Token Diagram (Abstraction) Second Normal Form (Partial Key Dependency)

Page 29: R. J. Daigle Normalization Concepts CIS 507 Database Programming

R. J. Daigle

Symbolic Table.

a. Insertion Anomaly. b. Deletion Anomaly. c. Update Anomaly.