part 6 normalization - university of st....

32
Part 6 Normalization

Upload: vudieu

Post on 08-Jun-2018

221 views

Category:

Documents


0 download

TRANSCRIPT

Part 6

Normalization

Copyright 1971-2002 Thomas P. Sturm Normalization Part 6, Page 2

Normal Form Overview

Universe of All DataRelations (normalized / unnormalized

1st Normal Form

2nd Normal Form

3rd Normal FormBoyce-Codd Normal Form (BCNF)

4th Normal Form5th Normal Form (PJ/NF)

Domain/KeyNormal Form

(DK/NF)

Copyright 1971-2002 Thomas P. Sturm Normalization Part 6, Page 3

Universe of Relations

Any sequential file is a relation Not all relations are “well formed” Normalization provides a set of criteria to evaluate the

“well formedness” of a relation Normal form is only one criterion for determining a

“good” model In general, a sequential file may have repeating groups Example 1 - suppliers:

part suppliers diode (GE, TRW, Mot) bulb (GE, Syl)

Implemented as: part supplier1 supplier2 supplier3 diode GE TRW Mot bulb GE Syl

Copyright 1971-2002 Thomas P. Sturm Normalization Part 6, Page 4

Problem with a Relation Not in First Normal Form

Retrieval:

• which supplier of a part should be retrieved?

• in which order should suppliers be retrieved?

Storage: • how many suppliers do you allow for? • which spaces are kept blank, and how?

Insert: • to add a supplier, need to retrieve all

suppliers, add the new supplier to an empty slot, and replace the record

Delete: • to delete a supplier, need to “adjust” the

vector (read, move around, erase, re-write) Update:

• to update a supplier name, need to retrieve all suppliers, find the one to alter, and re-write entire set of suppliers

Copyright 1971-2002 Thomas P. Sturm Normalization Part 6, Page 5

Solution to Repeating Group Problem Eliminate repeating groups by repeating the key

Example 1 - suppliers:

part supplier diode GE diode TRW diode Mot bulb GE bulb Syl

This new table has a different key than the

old one. It is part plus supplier.

Copyright 1971-2002 Thomas P. Sturm Normalization Part 6, Page 6

First Normal Form

All underlying domains contain atomic values only (no vectors / repeating groups)

Example 2 - inventory:

part # warehouse # wh_address quantity 100 05 Mpls 200 100 08 StPaul 300 200 05 Mpls 250 200 10 Madison 400 300 08 StPaul 350

Update Anomalies:

UPDATE • address of warehouse stored in many rows • if address changes, must change all rows

DELETE • if the last row for a warehouse is deleted, the address is

lost INSERT

• to insert a new row, warehouse address must be known

The problem occurs because this table is not focused on one primary key - it is “about” two things - warehouses and parts in warehouses.

Copyright 1971-2002 Thomas P. Sturm Normalization Part 6, Page 7

Solution to Multiple Focus Problems A relation that is in 1NF but not in a higher normal

form has a composite key (more than one attribute in the key)

Establish 2 relations via projection

Example 2 - inventory:

One table about warehouses: warehouse# wh_address 05 Mpls 08 StPaul 10 Madison One table about inventory with a composite key: part# warehouse# quantity 100 05 200 100 08 300 200 05 250 200 10 400 300 08 350

The original table in 1NF can be reconstructed by a join

Copyright 1971-2002 Thomas P. Sturm Normalization Part 6, Page 8

Second Normal Form 1NF + every non-key attribute is fully functionally

dependent on the primary key

Example 3 - departments: name dept dept_loc smith 402 100 jones 401 200 king 402 100 turner 400 200 olson 401 200

Problem: Functional dependency is transitive

• The primary key is name • dept is functionally dependent on name • dept_loc is also functionally dependent on name, but it is

transitive because dept functionally determines dept_loc

Copyright 1971-2002 Thomas P. Sturm Normalization Part 6, Page 9

Problems with 2NF Relations Update Anomalies:

UPDATE - location appears many times - if location of

a department changes, must fetch and change all rows containing that location

DELETE - if the last row for a department is deleted,

the department location information is lost INSERT - to insert a new row, department location

must be known

Solution: Establish 2 relations via projection Example 3 - departments:

name dept and dept dept_loc smith 402 400 200 jones 401 401 200 king 402 402 100 turner 400 olson 401

Copyright 1971-2002 Thomas P. Sturm Normalization Part 6, Page 10

Third Normal Form 2NF + every non-key attribute is non-transitively

functionally dependent on the primary key

OR Every non-key attribute is

• mutually independent (none is functionally dependent

on any of the others) • fully functionally dependent on the primary key

OR

(Kent) Each attribute in the relation is

functionally dependent on the key, the whole key, and nothing but the key

A relation that is 2NF but not 3NF

• can be split into a collection of 3NF relations by

projection • can be reconstructed by join

Copyright 1971-2002 Thomas P. Sturm Normalization Part 6, Page 11

3NF Examples Example 4 - locations:

dept# dept_name dept_loc 400 programming 200 401 financial 200 402 academic 100 403 support 300

• dept# and dept_name are candidate keys • dept_loc is the only non-key attribute, and is, by

default, non-transitively functionally dependent on the primary key

• This table is fine - it is only about departments

Example 5 - stock: s# sname p# qty 10 GE 102 1000 10 GE 103 625 10 GE 104 2000 20 TRW 102 500 20 TRW 105 1200 30 Syl 103 1300

• technically in 3NF

• qty is the only non-key attribute (like example 1) • candidate keys are (s#, p#) and (sname, p#) • didn't require components of an alternate key to be fully

functionally dependent on the primary key

Copyright 1971-2002 Thomas P. Sturm Normalization Part 6, Page 12

Problems with 3NF Relations The problems associated with alternate key

components were not recognized in the early formulations of the relational model.

Have the same update anomalies as second normal

form Solution: Establish 2 relations via projection

Example 5 - stock: s# sname and s# p# qty 10 GE 10 102 1000 20 TRW 10 103 625 30 Syl 10 104 2000 20 102 500 20 105 1200 30 103 1300

or [s#, sname] and [sname, p#, qty] Because of this problem, 3NF (as we have

described it) is sometimes referred to as “early 3rd Normal Form”

Copyright 1971-2002 Thomas P. Sturm Normalization Part 6, Page 13

BCNF Boyce-Codd Normal Form

3NF + every determinant is a candidate key

(Determinant: any attribute on which some other

attribute is fully functionally dependent) • In example 4, dept# determined dept_name; in

example 5, s# determined sname • In example 4, dept# was a candidate key • In example 5, s# (by itself) was not a candidate key

A relation that is 3NF but not BCNF • Can be split into a collection of BCNF relations by

projection • Can be reconstructed by join

BCNF is sometimes referred to as late 3rd Normal Form, or even just as 3rd Normal Form

Copyright 1971-2002 Thomas P. Sturm Normalization Part 6, Page 14

3NF to BCNF Example Example 6 - enrollment

Rules: 1. For each subject, each student is taught by 1 teacher 2. Each teacher teaches only 1 subject (don't I wish) 3. Each subject is taught by several teachers

Student Subject Teacher Smith Math Dr. White Smith English Dr. Brown Jones Math Dr. White Jones English Dr. Brown Doe Math Dr. Green a. Teacher dependent on Student + Subject b. Subject dependent on Teacher c. Teacher not dependent on Subject d. (Student, Subject) is a candidate key e. (Student, Teacher) is also a candidate key

Update anomalies, e.g., Dr. White changes name Relation in 3NF, but not in BCNF Teacher is a determinant (b.), but not a candidate key (d.

and e.)

Copyright 1971-2002 Thomas P. Sturm Normalization Part 6, Page 15

Solution to Example 6 Solution: Form two relations:

Student Teacher and Teacher Subject Smith Dr. White Dr. White Math Smith Dr. Brown Dr. Brown English Jones Dr. White Dr. Green Math Jones Dr. Brown Doe Dr. Green

Question: How did we know to break it up this way? Answer: The rules help us make this decision. In this

case, rule 2 gives us the crucial information - once you know the teacher, you know the subject. Therefore, we need two tables to enforce the rule.

• The [Teacher, Subject] table tells us which one subject each teacher teaches.

• Students, in general, need both a subject and a teacher • If we specify only subject, we don't know the teacher • If we specify teacher, however, we do know the subject

because of the rule and the first table

Copyright 1971-2002 Thomas P. Sturm Normalization Part 6, Page 16

Fourth Normal Form 4NF and 5NF are relevant only when all attributes

in the relation are parts of the key • if in BCNF and have a non-key attribute,

also in 5NF

Example 7 - skills: Suppose we wish to store employee job skills and

language skills. (An employee may have many of each.) employee skill language Jones electrical French Jones electrical German Jones mechanical French Jones mechanical German Smith plumbing Spanish

In general: if Jones x A and Jones y B then Jones x B and Jones y A

The relation is in BCNF - because it is all key ...

but there is redundancy

Copyright 1971-2002 Thomas P. Sturm Normalization Part 6, Page 17

Converting to 4NF Ask the following questions:

• Could the relation have non-key attributes? • Could any combination be missing?

If either answer is NO, need to break up relation to achieve 4NF

Example 7 - skills:

employee skill language should be broken up into two relations:

employee skill and employee language Jones electrical Jones French Jones mechanical Jones German Smith plumbing Smith Spanish

if job skill and language are independent

Copyright 1971-2002 Thomas P. Sturm Normalization Part 6, Page 18

Problems without 4NF Problem occurs when dealing with multiple,

independent facts How do we represent them in a single relation?

Disjoint: Jones electrical Jones mechanical Jones French Jones German Smith plumbing Smith Spanish Random mix: Jones electrical French Jones mechanical German Smith plumbing Spanish (do extras - repeat, - blank, - anything?) Cross product: Jones electrical French Jones mechanical French Jones electrical German Jones mechanical German Smith plumbing Spanish

Check for independence!

Copyright 1971-2002 Thomas P. Sturm Normalization Part 6, Page 19

Fifth Normal Form PJ/NF or Projection-Join Normal Form

(Kent) - Deals with cases where information can be reconstructed from smaller pieces of information which can be maintained with less redundancy

Example 8 - dealerships: 1. Agents represent Companies 2. Companies make Products 3. Agents sell Products

Which Agent sells which Product for which Company? Agent Company Product smith ford car smith gm truck jones ford car

• this form is necessary in the general case

BUT if we put a rule into effect that reads: 4. if an agent sells a product, and an agent represents a

company, then the agent must sell the product made by the company

So, to obey the rule, we must add smith ford truck smith gm car

NOW, with the rule and the new rows, we have REDUNDANCY

Copyright 1971-2002 Thomas P. Sturm Normalization Part 6, Page 20

Converting to 5NF This time, we must break the relation into three

parts (will not break in two) Example 8 - dealerships:

Agent Company Product smith ford car smith gm truck jones ford car smith ford truck smith gm car

BREAK INTO 3

Agent Company Agent Product Company Product smith ford smith car ford car smith gm smith truck ford truck jones ford jones car gm car gm truck

A relation is already in 5NF if its information content cannot be reconstructed from several smaller record types (having different keys)

• Only have 5NF problems if there are symmetry constraints (a pair of rows requires the existence of one or more additional rows)

Copyright 1971-2002 Thomas P. Sturm Normalization Part 6, Page 21

Domain/Key Normal Form No insertion/deletion anomalies Impossible to make an insertion/deletion that

violates a constraint Constraint types:

• domain constraints • key constraints Example 9 - customers

cust# branch 1234 west 1325 south 1421 east 1511 south

where valid branches are west, east, and south

Copyright 1971-2002 Thomas P. Sturm Normalization Part 6, Page 22

Enforcing Domain Integrity in DK/NF

Example 9 - customers: cust# branch 1234 west 1325 south 1421 east 1511 south 1600 north

If this update is possible, not in DK/NF

One possibility for prohibiting this update is to maintain a

table of legal branches and write code to prohibit the entry of a branch not in the table legal branch west south east

Problem: What's to stop someone from placing north in

the legal branch table? Possible partial solution: Restrict access to the legal

branch table

Copyright 1971-2002 Thomas P. Sturm Normalization Part 6, Page 23

Normalization Example: AutoCAD Database

Manufacturing plant electrical wiring

specifications Blueprints contain:

• parts at locations • wired connections • attributes for each wire and location

AutoCAD transmits variable-length records

• only transmits data for “smart parts” • one record per part • all data must be related to one or more parts

Objectives:

• number of wires from any source to any destination • sub-classified by voltage, shielding, and intrinsic

safety characteristics • obtain conduit count from wire count (by hand)

Copyright 1971-2002 Thomas P. Sturm Normalization Part 6, Page 24

Wiring Problem Statement

Problem:

Count the wires going from panel (3) to panel (4)

Part: AS 303

Part: AS 404

Part: AS 405

Panel (3)

Panel (4)

Wire 52

Wire 53 Wire

Copyright 1971-2002 Thomas P. Sturm Normalization Part 6, Page 25

Wiring Database Normalization

0th Normal Form Part# Loc Loc_Desc Wire1 Volt IS Wire2 Volt IS Wire3 ... AS303 (3) Panel 3 52 240 53 24 IS 54 AS404 (4) Panel 4 52 240 55 120 AS405 (4) Panel 4 53 24 IS

1st Normal Form (no repeating groups) – but the 2-part key creates partial

dependencies Part# Loc Loc_Desc Wire# Volt IS AS303 (3) Panel 3 52 240 AS303 (3) Panel 3 53 24 IS AS303 (3) Panel 3 54 240 AS404 (4) Panel 4 52 240 AS404 (4) Panel 4 55 120 AS405 (4) Panel 4 53 24 IS

2nd Normal Form (no partial dependencies) Part# Loc Loc_Desc Wire# AS303 (3) Panel 3 52 AS303 (3) Panel 3 53 AS303 (3) Panel 3 54 AS404 (4) Panel 4 52 AS404 (4) Panel 4 55 AS405 (4) Panel 4 53

Wire# Volt IS 52 240 53 24 IS 54 240 55 120

Copyright 1971-2002 Thomas P. Sturm Normalization Part 6, Page 26

Wiring Database Normalization (Continued)

2nd Normal Form (apply rule again) – with the step-by-step approach, you only

eliminate one partial dependency at a time Part# Wire# AS303 52 AS303 53 AS303 54 AS404 52 AS404 55 AS405 53

Wire# Volt IS 52 240 53 24 IS 54 240 55 120

Part# Loc Loc_Desc AS303 (3) Panel 3 AS404 (4) Panel 4 AS405 (4) Panel 4

Copyright 1971-2002 Thomas P. Sturm Normalization Part 6, Page 27

Wiring Database Normalization (Concluded)

3rd Normal Form (no transitive dependencies) Part# Wire# AS303 52 AS303 53 AS303 54 AS404 52 AS404 55 AS405 53

Wire# Volt IS 52 240 53 24 IS 54 240 55 120

Part# Loc AS303 (3) AS404 (4) AS405 (4)

Loc Loc_Desc (3) Panel 3 (4) Panel 4

Copyright 1971-2002 Thomas P. Sturm Normalization Part 6, Page 28

A Case Study (March) Population:

• MBA students • MIS majors • last quarter in the program

Language: • Nomad 2 4GL

Case: • Customer, with attributes • Dealer, with attributes • Manufacturer, with attributes • Contracts - customer, dealer, manufacturer, with

symmetry constraints

Task: • Given case description fully analyzed • Use existing Nomad database • Perform 8 queries • Perform 11 updates

Copyright 1971-2002 Thomas P. Sturm Normalization Part 6, Page 29

Relations Given in Case Study (March) Design:

• 3 groups of 14 students • same case, same queries, same updates • different schemas (1NF, 3NF, and 5NF)

1NF Schema: (d#, m#, c#, mfgr_attr, cust_attr, dealer_attr)

3NF Schema: (c#, cust_attr) (d#, dealer_attr) (m#, mfgr_attr) (d#, c#, m#)

5NF Schema: (c#, cust_attr) (d#, dealer_attr) (m#, mfgr_attr) (d#, c#) (d#, m#) (c#, m#)

Copyright 1971-2002 Thomas P. Sturm Normalization Part 6, Page 30

Preliminary Case Results (March) Tasks Correctly Performed

Normal Form: Queries (8) Updates (11) First 7.21 5.07 (90%) (46%) Third 4.50 3.64 (56%) (33%) Fifth 4.42 3.21 (55%) (29%)

Copyright 1971-2002 Thomas P. Sturm Normalization Part 6, Page 31

Database Design Issues Revisited Ease of query formulation Ease of enforcing referential integrity constraints Ease of avoiding update anomalies Normalization focuses only on avoiding update

anomalies “Being normal is not enough” Possible solutions:

1. Don't normalize 2. Don't normalize beyond BCNF 3. Normalize to 5NF, but back off Problems with 1-3: update anomalies, bad data,

knowledge of database storage needed 4. Don't let users at base tables 5. Create views that are in low normal forms 6. Pre-define joins that give users the data they need Solutions 4-6 are more work, but generally worth the

effort

Copyright 1971-2002 Thomas P. Sturm Normalization Part 6, Page 32