
Post on 15-Dec-2015


Normalization: What is it?

Normalization is the process of assigning attributes to entities. It reduces data redundancies and, by extension, helps eliminate the data anomalies that result from those redundancies.

Goal of Normalization

Organize data elements in such a way that each is stored in one place and one place only (with the exception of foreign keys, which are shared).

Unnormalized Data

Puppy Number, Puppy Name, Kennel Code, Kennel Name, Kennel Location, Breeder, Breed, Trick ID 1…n, Trick Name 1…n, Trick Where Learned 1…n, Skill Level 1…n, Costume 1…n

No Normalization

Trick ID, Trick Name, Trick Where Learned, Skill Level, and Costume all repeat multiple times.

First Normal Form

A relation R is in 1NF if and only if all underlying domains contain atomic values only.

First Normal Form

• Eliminate repeating groups
• Make a separate table for each set of related attributes, and give each table a primary key

First Normal Form Applied

• Trick (along with skill and costume, assuming that skill and costume relate to trick) is a repeating group
• Form a new table to hold trick information

Puppy / 1
Puppy Number, Puppy Name, Kennel Code, Kennel Name, Kennel Location, Breeder, Breed

Trick / 2
Trick ID, Puppy Number (FK), Trick Name, Trick Where Learned, Skill Level, Costume
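The split into Puppy and Trick can be sketched in a few lines of Python. This is only an illustration: the record layout, the field names, and the sample values (puppy 52 named Rex, the trick names, and so on) are invented here and do not come from the slides.

```python
# Hypothetical unnormalized record: the trick attributes repeat 1..n times.
unnormalized = {
    "puppy_number": 52,
    "puppy_name": "Rex",
    "kennel_code": 5,
    "kennel_name": "Puppy Farm",
    "kennel_location": "Springfield",
    "breeder": "Acme",
    "breed": "Spaniel",
    "tricks": [
        {"trick_id": 27, "trick_name": "Roll Over",
         "where_learned": "Kennel", "skill_level": 9, "costume": "Cape"},
        {"trick_id": 16, "trick_name": "Sit",
         "where_learned": "Home", "skill_level": 7, "costume": "Hat"},
    ],
}


def to_first_normal_form(record):
    """Split one record with a repeating group into a Puppy row and Trick rows."""
    # Everything except the repeating group stays in the Puppy row.
    puppy_row = {k: v for k, v in record.items() if k != "tricks"}
    # Each repetition becomes its own row, carrying the puppy number as a foreign key.
    trick_rows = [
        {"puppy_number": record["puppy_number"], **trick}
        for trick in record["tricks"]
    ]
    return puppy_row, trick_rows


puppy_row, trick_rows = to_first_normal_form(unnormalized)
```

Every value in the resulting rows is atomic, and each trick row is identified by the pair (puppy number, trick ID).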

Second Normal Form

A relation R is in 2NF if it is in 1NF and every non-key attribute is fully dependent on the primary key.

Second Normal Form

• Eliminate redundant data
• If an attribute depends on only part of a composite (multi-attribute) key, remove it to a separate table

Second Normal Form Applied

• Trick Name is only partially dependent on the composite key (Puppy Number, Trick ID)
• Trick Name is fully dependent on Trick ID
• Change the Trick table so it holds only information dependent on Trick ID
• Form a new table to hold information about the Puppy and Trick together

Trick / 2
Trick ID, Trick Name

Puppy / 1
Puppy Number, Puppy Name, Kennel Code, Kennel Name, Kennel Location, Breeder, Breed

PuppyTrick / 3
Puppy Number (FK), Trick ID (FK), Trick Where Learned, Skill Level, Costume
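One way to see the partial dependency is to test it against sample rows. The helper below is a sketch with invented data; `determines` is a hypothetical name, and a check on sample rows can only refute a dependency, never prove that it holds for all data.

```python
def determines(rows, lhs, rhs):
    """Return True if, in these rows, equal lhs values always imply equal rhs values."""
    seen = {}
    for row in rows:
        key = tuple(row[c] for c in lhs)
        val = tuple(row[c] for c in rhs)
        # setdefault stores the first value seen for this key and returns it afterwards.
        if seen.setdefault(key, val) != val:
            return False
    return True


# Hypothetical rows of the 1NF Trick table, keyed by (puppy_number, trick_id).
trick_rows = [
    {"puppy_number": 52, "trick_id": 27, "trick_name": "Roll Over", "skill_level": 9},
    {"puppy_number": 53, "trick_id": 27, "trick_name": "Roll Over", "skill_level": 4},
    {"puppy_number": 52, "trick_id": 16, "trick_name": "Sit", "skill_level": 7},
]

# Trick Name depends on only part of the key, so it moves to its own table.
name_needs_only_trick_id = determines(trick_rows, ["trick_id"], ["trick_name"])
# Skill Level needs the whole key, so it stays with the puppy-trick pairing.
skill_needs_only_trick_id = determines(trick_rows, ["trick_id"], ["skill_level"])
skill_needs_whole_key = determines(
    trick_rows, ["puppy_number", "trick_id"], ["skill_level"]
)

# The 2NF decomposition: one row per trick, one row per puppy-trick pairing.
trick = {(r["trick_id"], r["trick_name"]) for r in trick_rows}
puppy_trick = [(r["puppy_number"], r["trick_id"], r["skill_level"]) for r in trick_rows]
```

After the split, "Roll Over" is stored once in `trick` instead of once per puppy that knows it.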

Third Normal Form

A relation R is in 3NF if it is in 2NF and every non-key attribute is non-transitively dependent on the primary key.

A stricter form, Boyce-Codd normal form (BCNF), requires that every determinant be a candidate key.

Third Normal Form

• Eliminate columns not dependent on the primary key
• If attributes do not contribute to a description of the key, remove them to a separate table

Third Normal Form

Kennel / 4
Kennel Code, Kennel Name, Kennel Location, Breeder

Trick / 2
Trick ID, Trick Name

PuppyTrick / 3
Puppy Number (FK), Trick ID (FK), Trick Where Learned, Skill Level, Costume

Puppy / 1
Puppy Number, Puppy Name, Breed, Kennel Code (FK)


• Kennel information is not directly dependent on the Puppy Number; it depends on it only through Kennel Code
• Kennel Name, Kennel Location, and Breeder are dependent on Kennel Code
• Form a Kennel table, with Kennel Code as its key
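The payoff of the Kennel table can be sketched with Python's built-in sqlite3 module. The column names follow the slides, but the snake_case spelling, the column types, and the sample rows are assumptions made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE kennel (
    kennel_code     INTEGER PRIMARY KEY,
    kennel_name     TEXT,
    kennel_location TEXT,
    breeder         TEXT
);
CREATE TABLE puppy (
    puppy_number INTEGER PRIMARY KEY,
    puppy_name   TEXT,
    breed        TEXT,
    kennel_code  INTEGER REFERENCES kennel(kennel_code)
);
INSERT INTO kennel VALUES (5, 'Puppy Farm', 'Springfield', 'Acme');
INSERT INTO puppy VALUES (52, 'Rex', 'Spaniel', 5), (53, 'Fido', 'Dachshund', 5);
""")

# Renaming the kennel touches exactly one row, however many puppies live there.
cur = conn.execute(
    "UPDATE kennel SET kennel_name = 'Happy Tails' WHERE kennel_code = 5"
)
rows_updated = cur.rowcount

# The kennel name is recovered for each puppy with a join on the foreign key.
kennel_names = conn.execute("""
    SELECT p.puppy_name, k.kennel_name
    FROM puppy AS p
    JOIN kennel AS k ON k.kennel_code = p.kennel_code
    ORDER BY p.puppy_number
""").fetchall()
```

Had the kennel name stayed in the Puppy table, the same rename would have needed one update per puppy, with the risk of missing some.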

Fourth Normal Form

A relation R is in 4NF if and only if all multi-valued dependencies are functional dependencies.

Fourth Normal Form

• Isolate independent multiple relationships
• No table may contain two or more 1:n or n:n relationships that are not directly related

Fourth Normal Form Applied

• Trick and Costume are currently in the same table
• Are Trick and Costume directly related?
• Does the Costume dictate the Trick the puppy does?
• Does the Trick dictate the Costume the puppy wears?
• If not, separate them

Fourth Normal Form

Costumes / 5
Costume ID, Costume Name

Kennel / 4
Kennel Code, Kennel Name, Kennel Location, Breeder

Trick / 2
Trick ID, Trick Name

PuppyTrick / 3
Puppy Number (FK), Trick ID (FK), Trick Where Learned, Skill Level

Puppy / 1
Puppy Number, Puppy Name, Breed, Kennel Code (FK)

PuppyCostumes / 6
Puppy Number (FK), Costume ID (FK)


• Trick and Costume are two different 1:n relations that are not directly related to each other. Separate them into two tables.
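A small Python sketch shows why two independent facts should not share a table. It assumes, for illustration only, that a puppy's tricks and costumes really are unrelated; the puppy number and the trick and costume names are invented.

```python
from itertools import product

# Hypothetical facts about puppy 52: two tricks and three costumes, unrelated to each other.
tricks = {52: ["Roll Over", "Sit"]}
costumes = {52: ["Cape", "Hat", "Tutu"]}

# Kept in one table, every trick has to be paired with every costume.
combined = [
    (puppy, trick, costume)
    for puppy in tricks
    for trick, costume in product(tricks[puppy], costumes[puppy])
]

# Kept in separate tables, each fact is stored exactly once.
puppy_trick = [(puppy, t) for puppy, ts in tricks.items() for t in ts]
puppy_costume = [(puppy, c) for puppy, cs in costumes.items() for c in cs]

# Joining the two tables on puppy number rebuilds the combined rows, so nothing is lost.
rejoined = {
    (p1, t, c)
    for (p1, t) in puppy_trick
    for (p2, c) in puppy_costume
    if p1 == p2
}
```

Here the combined table needs six rows where the two separate tables need five, and the gap widens with every additional trick or costume.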

Fifth Normal Form

A relation R is in 5NF if and only if every join dependency in R is implied by the candidate keys.

Fifth Normal Form

• Isolate semantically related multiple relationships
• There may be practical constraints on information that justify separating logically related many-to-many relationships

Why Fifth Normal Form?

• Suppose the database must record which breeds are available at each kennel and which breeders supply those breeds
• We could satisfy this with a Kennel-Breeder-Breed table

Kennel-Breeder-Breed
Kennel Number, Breeder, Breed

Kennel-Breeder-Breeds

Kennel Number   Breeder         Breed
5               Acme            Spaniel
5               Acme            Dachshund
5               Acme            Banana-Biter
5               Puppy Factory   Spaniel
5               Puppy Factory   Dachshund
5               Puppy Factory   Banana-Biter
5               Whatapuppy      Spaniel
5               Whatapuppy      Dachshund
5               Whatapuppy      Banana-Biter

What’s the Problem?

Now suppose a kennel selling any breed must offer that breed from all breeders it deals with. In other words, if Khabul Khennels sells Afghans and wants to sell any Daisy Hill puppies, it must sell Daisy Hill Afghans.

The need for fifth normal form becomes clear when we consider inserts and deletes. Suppose that a kennel (whose number in the database happens to be 5) decides to offer three new breeds: Spaniels, Dachshunds, and West Indian Banana-Biters. Suppose further that this kennel already deals with three breeders that can supply those breeds. This will require nine new rows in the database, one for each breeder-and-breed combination.

Breaking up the table reduces the number of inserts to six. Here are the tables necessary for fifth normal form, shown with the six newly inserted rows.

Fifth Normal Form

Costumes / 5
Costume ID, Costume Name

Kennel / 4
Kennel Code, Kennel Name, Kennel Location

Trick / 2
Trick ID, Trick Name

PuppyTrick / 3
Puppy Number (FK), Trick ID (FK), Trick Where Learned, Skill Level

Puppy / 1
Puppy Number, Puppy Name, Puppy Breed, Kennel Code (FK)

KennelBreeder / 8
Kennel Code (FK), Breeder

KennelBreed / 7
Kennel Code (FK), Breed

PuppyCostumes / 6
Puppy Number (FK), Costume ID (FK)
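The nine-versus-six row count can be checked with a short Python sketch that uses the kennel, breeder, and breed values from the example. Modelling the tables as sets of tuples is an assumption of this sketch, not something the slides prescribe.

```python
# The two smaller tables for kennel number 5.
kennel_breeder = {(5, "Acme"), (5, "Puppy Factory"), (5, "Whatapuppy")}
kennel_breed = {(5, "Spaniel"), (5, "Dachshund"), (5, "Banana-Biter")}

# Joining them on kennel number rebuilds every breeder-and-breed combination.
kennel_breeder_breed = {
    (k1, breeder, breed)
    for (k1, breeder) in kennel_breeder
    for (k2, breed) in kennel_breed
    if k1 == k2
}

rows_in_single_table = len(kennel_breeder_breed)                   # 9
rows_in_split_tables = len(kennel_breeder) + len(kennel_breed)     # 6
```

The join reproduces the original nine rows only because of the business rule that a kennel offers every breed from every breeder it deals with. If that rule did not hold, the join would report combinations that do not exist, which is one way the split can misrepresent the data.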

Fifth Normal Form

• If significant update activity is involved, Fifth Normal Form can mean significant savings
• It is possible to lose information with Fifth Normal Form

Normalization (summary)

• Take projections of the original 1NF relation to eliminate non-full functional dependencies
• Take projections of these 2NF relations to eliminate transitive functional dependencies
• Take projections of these 3NF relations to eliminate any remaining functional dependencies that do not arise from candidate keys (this yields Boyce-Codd normal form, BCNF)

Normalization (summary, continued)

• Take projections of these BCNF relations to eliminate multi-valued dependencies that are not also functional dependencies
• Take projections of these 4NF relations to eliminate any remaining join dependencies that are not implied by the candidate keys

Normalization Guide

• Single membership of an instance in a set is recognized by a stable, unique identifier (key)
• All the attributes in an entity depend on all the key attributes of that entity
• None of the attributes depend on any attributes other than the keys
• Any attributes which can be recognized as a separate set have their own entity and key

Normalization (simplified)

The key, the whole key, and nothing but the key, so help me Codd.

Denormalization

• Derived columns
• Deliberate duplication
• Removal or disabling of constraints

Derived Columns

• Calculated fields, such as Total Amount (Qty x Unit Price)
• While useful, they are not part of a fully normalized model
• They may be added back into the physical database
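As a rough sketch, a derived value such as Total Amount can be computed when it is read instead of being stored. The example uses Python's sqlite3 module; the table name `order_line`, the view name, and the sample quantities and prices are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- The normalized table stores only the base facts.
CREATE TABLE order_line (
    order_line_id INTEGER PRIMARY KEY,
    qty           INTEGER,
    unit_price    REAL
);
INSERT INTO order_line VALUES (1, 3, 2.50), (2, 10, 1.25);

-- The derived value is calculated on read, so it can never disagree with its inputs.
CREATE VIEW order_line_total AS
    SELECT order_line_id, qty * unit_price AS total_amount
    FROM order_line;
""")

totals = conn.execute(
    "SELECT order_line_id, total_amount FROM order_line_total ORDER BY order_line_id"
).fetchall()
```

Storing `total_amount` as a real column in the physical database is the denormalized alternative: reads get cheaper, but every change to quantity or unit price must also update the stored total.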

Deliberate Duplication

• Duplicating the same column in two or more tables
• It might seem desirable to duplicate a column (or columns) to avoid joins, such as duplicating an employee name where the employee number is a foreign key
• This would require the update of multiple tables if that employee changed their name
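The cost of a duplicated employee name can be sketched as follows. The `employee` and `assignment` tables, the helper `rename_employee`, and the sample names are all hypothetical; the point is only that a rename now has to touch every table that carries a copy.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (
    employee_number INTEGER PRIMARY KEY,
    employee_name   TEXT
);
CREATE TABLE assignment (
    assignment_id   INTEGER PRIMARY KEY,
    employee_number INTEGER REFERENCES employee(employee_number),
    employee_name   TEXT  -- deliberately duplicated to avoid a join
);
INSERT INTO employee VALUES (7, 'Pat Smith');
INSERT INTO assignment VALUES (1, 7, 'Pat Smith'), (2, 7, 'Pat Smith');
""")


def rename_employee(conn, employee_number, new_name):
    """Update every copy of the name in one transaction so the copies cannot drift apart."""
    with conn:  # commits on success, rolls back if either statement fails
        conn.execute(
            "UPDATE employee SET employee_name = ? WHERE employee_number = ?",
            (new_name, employee_number),
        )
        conn.execute(
            "UPDATE assignment SET employee_name = ? WHERE employee_number = ?",
            (new_name, employee_number),
        )


rename_employee(conn, 7, "Pat Jones")

# Collect every stored spelling of the name; a single value means the copies agree.
names = {
    row[0]
    for row in conn.execute(
        "SELECT employee_name FROM employee "
        "UNION SELECT employee_name FROM assignment"
    )
}
```

Forgetting the second UPDATE would leave two different names for the same employee, which is exactly the kind of anomaly normalization is meant to prevent.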

Removal of Constraints

• The removal of referential integrity (relationship) constraints to speed up update processes
• The goal of the logical data model (LDM) is to translate the business model (the conceptual data model, CDM) into a fully normalized database design; the relationships are part of that design
• Constraints may be removed from the physical database, but not from the LDM
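What disabling a constraint means in practice can be sketched with SQLite, where foreign key enforcement is switched per connection with `PRAGMA foreign_keys`. Other database systems use different mechanisms, and the tables, the helper `try_insert_orphan`, and the values below are invented for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default
conn.executescript("""
CREATE TABLE kennel (kennel_code INTEGER PRIMARY KEY);
CREATE TABLE puppy (
    puppy_number INTEGER PRIMARY KEY,
    kennel_code  INTEGER REFERENCES kennel(kennel_code)
);
""")


def try_insert_orphan(conn, puppy_number):
    """Try to insert a puppy whose kennel (code 99) does not exist; report whether it was accepted."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO puppy VALUES (?, 99)", (puppy_number,))
        return True
    except sqlite3.IntegrityError:
        return False


accepted_with_constraint = try_insert_orphan(conn, 1)

# The pragma only takes effect outside a transaction; the helper has already rolled back.
conn.execute("PRAGMA foreign_keys = OFF")
accepted_without_constraint = try_insert_orphan(conn, 2)
```

With enforcement off the insert is faster because nothing is checked, but the database now holds a puppy that points at a kennel that does not exist. The relationship still belongs in the logical model even when the physical database no longer enforces it.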

Denormalization

• Denormalization may be done to the physical database design
• Any denormalization is deliberate and for rational and supportable reasons

DBA’s Dirty Little Secret

• Normalization is over-valued by those that do it
• Normalization is under-valued by those that don’t
