3nf and boyce-codd normal form prof. sin-min lee department of computer science san jose state...

63
3NF and Boyce-Codd Normal Form Prof. Sin-Min Lee Department of Computer Science San Jose State University

Post on 19-Dec-2015

222 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: 3NF and Boyce-Codd Normal Form Prof. Sin-Min Lee Department of Computer Science San Jose State University

3NF and Boyce-Codd Normal Form

Prof. Sin-Min Lee

Department of Computer Science

San Jose State University

Page 2: 3NF and Boyce-Codd Normal Form Prof. Sin-Min Lee Department of Computer Science San Jose State University

What it’s all about• Given a relation, R, and a set of functional

dependencies, F, on R. • Assume that R is not in a desirable form for enforcing

F.

• Decompose relation R into relations, R1,..., Rk, with associated functional dependencies, F1,..., Fk, such that R1,..., Rk are in a more desirable form, 3NF or BCNF.

• While decomposing R, make sure to preserve the dependencies, and make sure not to lose information.

Page 3: 3NF and Boyce-Codd Normal Form Prof. Sin-Min Lee Department of Computer Science San Jose State University
Page 4: 3NF and Boyce-Codd Normal Form Prof. Sin-Min Lee Department of Computer Science San Jose State University

Primitive DomainsFLT-SCHEDULE

flt# weekday airline dtime from atime to

DL242 MO WE FR DELTA 10:40 ATL 12:30 BOSSK912 SA SU SAS 12:00 CPH 15:30 JFKAA242 MO FR AA 08:00 CHI 10:10 ATL

Attributes must be defined over domains with atomic values

FLT-SCHEDULE

flt# weekday airline dtime from atime to

DL242 MO DELTA 10:40 ATL 12:30 BOS

SK912 SA SAS 12:00 CPH 15:30 JFK

AA242 MO AA 08:00 CHI 10:10 ATL

DL242 WE DELTA 10:40 ATL 12:30 BOSDL242 FR DELTA 10:40 ATL 12:30 BOS

SK912 SU SAS 12:00 CPH 15:30 JFK

AA242 FR AA 08:00 CHI 10:10 ATL

Page 5: 3NF and Boyce-Codd Normal Form Prof. Sin-Min Lee Department of Computer Science San Jose State University

Bad Database Design- redundancy of fact

FLIGHTS

flt# date airline plane#DL242 10/23/00 Delta k-yo-33297

DL242 10/24/00 Delta t-up-73356

DL242 10/25/00 Delta o-ge-98722

AA121 10/24/00 American p-rw-84663

AA121 10/25/00 American q-yg-98237

AA411 10/22/00 American h-fe-65748

• redundancy: airline name repeated for same flight• inconsistency: when airline name for a flight

changes, it must be changed many places

Page 6: 3NF and Boyce-Codd Normal Form Prof. Sin-Min Lee Department of Computer Science San Jose State University

Bad Database Design- fact clutter

• insertion anomalies: how do we represent that SK912 is flown by Scandinavian without there being a date and a plane assigned?

• deletion anomalies: cancelling AA411 on 10/22/00 makes us lose that it is flown by American.

• update anomalies: if DL242 is flown by Sabena, we must change it everywhere.

FLIGHTS

flt# date airline plane#DL242 10/23/00 Delta k-yo-33297

DL242 10/24/00 Delta t-up-73356

DL242 10/25/00 Delta o-ge-98722

AA121 10/24/00 American p-rw-84663

AA121 10/25/00 American q-yg-98237

AA411 10/22/00 American h-fe-65748

Page 7: 3NF and Boyce-Codd Normal Form Prof. Sin-Min Lee Department of Computer Science San Jose State University

Bad Database Design- information loss

FLIGHTS

flt# date airline plane#

DL242 10/23/00 Delta k-yo-33297

DL242 10/24/00 Delta t-up-73356

DL242 10/25/00 Delta o-ge-98722

AA121 10/24/00 American p-rw-84663

AA121 10/25/00 American q-yg-98237

AA411 10/22/00 American h-fe-65748FLIGHTS-AIRLINE

flt# airlineDL242 Delta

AA121 American

AA411 American

DATE-AIRLINE-PLANE

date airline plane#10/23/00 Delta k-yo-33297

10/24/00 Delta t-up-73356

10/25/00 Delta o-ge-98722

10/24/00 American p-rw-84663

10/25/00 American q-yg-98237

10/22/00 American h-fe-65748

Page 8: 3NF and Boyce-Codd Normal Form Prof. Sin-Min Lee Department of Computer Science San Jose State University

Bad Database Design- information loss

FLIGHTS

flt# date airline plane#DL242 10/23/00 Delta k-yo-33297

DL242 10/24/00 Delta t-up-73356

DL242 10/25/00 Delta o-ge-98722

AA121 10/24/00 American p-rw-84663

AA121 10/25/00 American q-yg-98237

AA211 10/22/00 American h-fe-65748

AA411 10/24/00 American p-rw-84663

AA411 10/25/00 American q-yg-98237

AA411 10/22/00 American h-fe-65748

DATE-AIRLINE-PLANE

date airline plane#10/23/00 Delta k-yo-33297

10/24/00 Delta t-up-73356

10/25/00 Delta o-ge-98722

10/24/00 American p-rw-84663

10/25/00 American q-yg-98237

10/22/00 American h-fe-65748

FLIGHTS-AIRLINE

flt# airlineDL242 Delta

AA121 American

AA411 American

• information loss: we polluted the database with false facts; we can’t find the true facts.

Page 9: 3NF and Boyce-Codd Normal Form Prof. Sin-Min Lee Department of Computer Science San Jose State University

Bad Database Design- dependency loss

DATE-AIRLINE-PLANE

date airline plane#10/23/00 Delta k-yo-33297

10/24/00 Delta t-up-73356

10/25/00 Delta o-ge-98722

10/24/00 American p-rw-84663

10/25/00 American q-yg-98237

10/22/00 American h-fe-65748

FLIGHTS-AIRLINE

flt# airlineDL242 Delta

AA121 American

AA411 American

• dependency loss: we lost the fact that (flt#, date) plane#

Page 10: 3NF and Boyce-Codd Normal Form Prof. Sin-Min Lee Department of Computer Science San Jose State University

Good Database Design

• no redundancy of FACT (!)• no inconsistency• no insertion, deletion or update anomalies• no information loss• no dependency loss

FLIGHTS-DATE-PLANE

flt# date plane#DL242 10/23/00 k-yo-33297

DL242 10/24/00 t-up-73356

DL242 10/25/00 o-ge-98722

AA121 10/24/00 p-rw-84663

AA121 10/25/00 q-yg-98237

AA411 10/22/00 h-fe-65748

FLIGHTS-AIRLINE

flt# airlineDL242 Delta

AA121 American

AA411 American

Page 11: 3NF and Boyce-Codd Normal Form Prof. Sin-Min Lee Department of Computer Science San Jose State University

Let X and Y be sets of attributes in R

• Y is functionally dependent on X in R iff for each x R.X there is precisely one y R.Y

• Y is fully functional dependent on X in R if Y is functional dependent on X and Y is not functional dependent on any proper subset of X

• We use keys to enforce functional dependencies in relations:

X Y

X Y

Functional Dependencies and Keys

Page 12: 3NF and Boyce-Codd Normal Form Prof. Sin-Min Lee Department of Computer Science San Jose State University

FLIGHTS

flt# date airline plane#

FLIGHTS

flt# date airline plane#

FLIGHTS

flt# date airline plane#

Functional Dependencies and Keys

plane# is not determined by flt# alone

airline is not determined by flt# and date

the FLIGHT relation will not allow the FDs to be enforced by keys

Page 13: 3NF and Boyce-Codd Normal Form Prof. Sin-Min Lee Department of Computer Science San Jose State University

Functional Dependencies and Keys

real world database

name

address

cust# name address

cust# name address

Consider the meaning

cust# name address

cust# name address

cust# name address combined

separate

Page 14: 3NF and Boyce-Codd Normal Form Prof. Sin-Min Lee Department of Computer Science San Jose State University

How to Compute Meaning- Armstrong’s inference rules

Rules of the computation:– reflexivity: if YX, then XY– Augmentation: if XY, then WXWY– Transitivity: if XY and YZ, then XZ

Derived rules:– Union: if XY and XZ, the XYZ– Decomposition: if XYZ, then XY and XZ– Pseudotransitivity: if XY and WYZ, then XWZ

Armstrong’s Axioms:– sound– complete

Page 15: 3NF and Boyce-Codd Normal Form Prof. Sin-Min Lee Department of Computer Science San Jose State University

Overview of NFsNF2

1NF

2NF

3NF

BCNF

Page 16: 3NF and Boyce-Codd Normal Form Prof. Sin-Min Lee Department of Computer Science San Jose State University

Normal Forms- definitions

• NF: non-first normal form• 1NF: R is in 1NF. iff all domain values are

atomic2• 2NF: R is in 2. NF. iff R is in 1NF and every

nonkey attribute is fully dependent on the key• 3NF: R is in 3NF iff R is 2NF and every nonkey

attribute is non-transitively dependent on the key• BCNF: R is in BCNF iff every determinant is a

candidate key

• Determinant: an attribute on which some other attribute is fully functionally dependent.

Page 17: 3NF and Boyce-Codd Normal Form Prof. Sin-Min Lee Department of Computer Science San Jose State University

Example of Normalization

flt# date plane# airline from to milesFLT-INSTANCE

flt#date

plane#

airlinefrom

to

miles

Page 18: 3NF and Boyce-Codd Normal Form Prof. Sin-Min Lee Department of Computer Science San Jose State University

flt#date

plane#

airlinefrom

to

miles

flt#date

plane# flt#airlinefrom

to

miles

fromto

miles

flt#airlinefrom

toflt#

dateplane#

Example of Normalization1NF:

3NF & BCNF:

2NF:

Page 19: 3NF and Boyce-Codd Normal Form Prof. Sin-Min Lee Department of Computer Science San Jose State University

3NF that is not BCNF

A

B C

Candidate keys: {A,B} and {A,C} Determinants: {A,B} and {C}

A decomposition:

Lossless, but not dependency preserving!

A B CR

C BR1

A CR2

Page 20: 3NF and Boyce-Codd Normal Form Prof. Sin-Min Lee Department of Computer Science San Jose State University

• When a relation has more than one candidate key, anomalies may result even though the relation is in 3NF.

• 3NF does not deal satisfactorily with the case of a relation with overlapping candidate keys– i.e. composite candidate keys with at least one attribute in

common.

• BCNF is based on the concept of a determinant.– A determinant is any attribute (simple or composite) on which

some other attribute is fully functionally dependent.

• A relation is in BCNF is, and only if, every determinant is a candidate key.

Page 21: 3NF and Boyce-Codd Normal Form Prof. Sin-Min Lee Department of Computer Science San Jose State University

The theory• Consider the following relation and determinants.

Example 1. Given R(a,b,c,d)a,c -> b,da,d -> b

• To be in BCNF, all valid determinants must be a candidate key. In the relation R, a,c->b,d is the determinate used, so the first determinate is fine.

• Example 2. If {a, b} is not a key, a,d->b suggests that a,d can be the primary key, which would determine b. However this would not determine c. This is not a candidate key, and thus R is not in BCNF.

Page 22: 3NF and Boyce-Codd Normal Form Prof. Sin-Min Lee Department of Computer Science San Jose State University

Example 1

Patient No

Patient Name Appointment Id Time Doctor

1 John 0 09:00 Zorro

2 Kerr 0 09:00 Killer

3 Adam 1 10:00 Zorro

4 Robert 0 13:00 Killer

5 Zane 1 14:00 Zorro

Page 23: 3NF and Boyce-Codd Normal Form Prof. Sin-Min Lee Department of Computer Science San Jose State University

Two possible keys

• DB(Patno,PatName,appNo,time,doctor)• Determinants:

– Patno -> PatName– Patno,appNo -> Time,doctor– Time -> appNo

• Two options for 1NF primary key selection:– DB(Patno,PatName,appNo,time,doctor) (example 1a)– DB(Patno,PatName,appNo,time,doctor) (example 1b)

Page 24: 3NF and Boyce-Codd Normal Form Prof. Sin-Min Lee Department of Computer Science San Jose State University

Example 1a

• DB(Patno,PatName,appNo,time,doctor)

• No repeating groups, so in 1NF

• 2NF – eliminate partial key dependencies:– DB(Patno,appNo,time,doctor)– R1(Patno,PatName)

• 3NF – no transient dependences so in 3NF

• Now try BCNF.

Page 25: 3NF and Boyce-Codd Normal Form Prof. Sin-Min Lee Department of Computer Science San Jose State University

BCNF Every determinant is a candidate key

DB(Patno,appNo,time,doctor)R1(Patno,PatName)

• Is determinant a candidate key?– Patno -> PatName

Patno is present in DB, but not PatName, so irrelevant.

Page 26: 3NF and Boyce-Codd Normal Form Prof. Sin-Min Lee Department of Computer Science San Jose State University

Continued…

DB(Patno,appNo,time,doctor)R1(Patno,PatName)

– Patno,appNo -> Time,doctorAll LHS and RHS present so relevant. Is this a candidate key? Patno,appNo IS the key, so this is a candidate key.

– Time -> appNoTime is present, and so is appNo, so relevant. Is this a candidate key? If it was then we could rewrite DB as: DB(Patno,appNo,time,doctor)This will not work, so not BCNF.

Page 27: 3NF and Boyce-Codd Normal Form Prof. Sin-Min Lee Department of Computer Science San Jose State University

Rewrite to BCNF

• DB(Patno,appNo,time,doctor)R1(Patno,PatName)

• BCNF: rewrite to DB(Patno,time,doctor) R1(Patno,PatName) R2(time,appNo)

• time is enough to work out the appointment number of a patient. Now BCNF is satisfied, and the final relations shown are in BCNF

Page 28: 3NF and Boyce-Codd Normal Form Prof. Sin-Min Lee Department of Computer Science San Jose State University

Example 1b

• DB(Patno,PatName,appNo,time,doctor) • No repeating groups, so in 1NF• 2NF – eliminate partial key dependencies:

– DB(Patno,time,doctor)– R1(Patno,PatName) – R2(time,appNo)

• 3NF – no transient dependences so in 3NF• Now try BCNF.

Page 29: 3NF and Boyce-Codd Normal Form Prof. Sin-Min Lee Department of Computer Science San Jose State University

BCNF Every determinant is a candidate key

DB(Patno,time,doctor)R1(Patno,PatName) R2(time,appNo)

• Is determinant a candidate key?– Patno -> PatName

Patno is present in DB, but not PatName, irrelevant.– Patno,appNo -> Time,doctor

Not all LHS present so not relevant– Time -> appNo

Time is present, but not appNo, so not relevant. – Relations are in BCNF.

Page 30: 3NF and Boyce-Codd Normal Form Prof. Sin-Min Lee Department of Computer Science San Jose State University

Summary - Example 1

This example has demonstrated three things:

• BCNF is stronger than 3NF, relations that are in 3NF are not necessarily inBCNF

• BCNF is needed in certain situations to obtain full understanding of the data model

• there are several routes to take to arrive at the same set of relations in BCNF.– Unfortunately there are no rules as to which route will

be the easiest one to take.

Page 31: 3NF and Boyce-Codd Normal Form Prof. Sin-Min Lee Department of Computer Science San Jose State University

Example 2

Grade_report(StudNo,StudName,(Major,Adviser,(CourseNo,Ctitle,InstrucName,InstructLocn,Grade)))

• Functional dependencies– StudNo -> StudName– CourseNo -> Ctitle,InstrucName– InstrucName -> InstrucLocn– StudNo,CourseNo,Major -> Grade– StudNo,Major -> Advisor– Advisor -> Major

Page 32: 3NF and Boyce-Codd Normal Form Prof. Sin-Min Lee Department of Computer Science San Jose State University

Example 2 cont...

• UnnormalisedGrade_report(StudNo,StudName,(Major,Advisor, (CourseNo,Ctitle,InstrucName,InstructLocn,Grade)))

• 1NF Remove repeating groups– Student(StudNo,StudName)– StudMajor(StudNo,Major,Advisor)– StudCourse(StudNo,Major,CourseNo,

Ctitle,InstrucName,InstructLocn,Grade)

Page 33: 3NF and Boyce-Codd Normal Form Prof. Sin-Min Lee Department of Computer Science San Jose State University

Example 2 cont...

• 1NFStudent(StudNo,StudName)StudMajor(StudNo,Major,Advisor)StudCourse(StudNo,Major,CourseNo, Ctitle,InstrucName,InstructLocn,Grade)

• 2NF Remove partial key dependenciesStudent(StudNo,StudName)StudMajor(StudNo,Major,Advisor)StudCourse(StudNo,Major,CourseNo,Grade)Course(CourseNo,Ctitle,InstrucName,InstructLocn)

Page 34: 3NF and Boyce-Codd Normal Form Prof. Sin-Min Lee Department of Computer Science San Jose State University

Example 2 cont...• 2NF

Student(StudNo,StudName)StudMajor(StudNo,Major,Advisor)StudCourse(StudNo,Major,CourseNo,Grade)Course(CourseNo,Ctitle,InstrucName,InstructLocn)

• 3NF Remove transitive dependenciesStudent(StudNo,StudName)StudMajor(StudNo,Major,Advisor)StudCourse(StudNo,Major,CourseNo,Grade)Course(CourseNo,Ctitle,InstrucName)Instructor(InstructName,InstructLocn)

Page 35: 3NF and Boyce-Codd Normal Form Prof. Sin-Min Lee Department of Computer Science San Jose State University

Example 2 cont...

• BCNF Every determinant is a candidate key– Student : only determinant is StudNo– StudCourse: only determinant is StudNo,Major– Course: only determinant is CourseNo– Instructor: only determinant is InstrucName– StudMajor: the determinants are

• StudNo,Major, or• AdvisorOnly StudNo,Major is a candidate key.

Page 36: 3NF and Boyce-Codd Normal Form Prof. Sin-Min Lee Department of Computer Science San Jose State University

Example 2: BCNF

• BCNF

Student(StudNo,StudName)StudCourse(StudNo,Major,CourseNo,Grade)Course(CourseNo,Ctitle,InstrucName)Instructor(InstructName,InstructLocn)StudMajor(StudNo,Advisor)Adviser(Adviser,Major)

Page 37: 3NF and Boyce-Codd Normal Form Prof. Sin-Min Lee Department of Computer Science San Jose State University

Problems BCNF overcomes

• If the record for student 456 is deleted we lose not only information on student 456 but also the fact that DARWIN advises in BIOLOGY

• we cannot record the fact that WATSON can advise on COMPUTING until we have a student majoring in COMPUTING to whom we can assign WATSON as an advisor.

STUDENT MAJOR ADVISOR

123 PHYSICS EINSTEIN

123 MUSIC MOZART

456 BIOLOGY DARWIN

789 PHYSICS BOHR

999 PHYSICS EINSTEIN

Page 38: 3NF and Boyce-Codd Normal Form Prof. Sin-Min Lee Department of Computer Science San Jose State University

Split into two tables

In BCNF we have two tables

STUDENT ADVISOR

123 EINSTEIN

123 MOZART

456 DARWIN

789 BOHR

999 EINSTEIN

ADVISOR MAJOR

EINSTEIN PHYSICS

MOZART MUSIC

DARWIN BIOLOGY

BOHR PHYSICS

Page 39: 3NF and Boyce-Codd Normal Form Prof. Sin-Min Lee Department of Computer Science San Jose State University

Returning to the ER Model

• Now that we have reached the end of the normalisation process, you must go back and compare the resulting relations with the original ER model

– You may need to alter it to take account of the changes that have occurred during the normalisation process Your ER diagram should always be a prefect reflection of the model you are going to implement in the database, so keep it up to date!

– The changes required depends on how good the ER model was at first!

Page 40: 3NF and Boyce-Codd Normal Form Prof. Sin-Min Lee Department of Computer Science San Jose State University

Video Library Example

• A video library allows customers to borrow videos.• Assume that there is only 1 of each video.• We are told that:

video(title,director,serial)customer(name,addr,memberno)hire(memberno,serial,date)

title->director,serialserial->titleserial->directorname,addr -> membernomemberno -> name,addrserial,date -> memberno

Page 41: 3NF and Boyce-Codd Normal Form Prof. Sin-Min Lee Department of Computer Science San Jose State University

What NF is this?

• No repeating groups therefore at least 1NF

• 2NF – A Composite key exists:hire(memberno,serial,date)

– Can memberno be found with just serial or date?

– NO, therefore the relations are already in 2NF.

• 3NF?

Page 42: 3NF and Boyce-Codd Normal Form Prof. Sin-Min Lee Department of Computer Science San Jose State University

Test for 3NF

• video(title,director,serial)– title->director,serial– serial->director

• Director can be derived using serial, and serial and director are both non keys, so therefore this is a transitive or non-key dependency.

• Rewrite video…

Page 43: 3NF and Boyce-Codd Normal Form Prof. Sin-Min Lee Department of Computer Science San Jose State University

Rewrite for 3NF

• video(title,director,serial)– title->director,serial– serial->director

• Becomes:• video(title,serial)• serial(serial,director)

Page 44: 3NF and Boyce-Codd Normal Form Prof. Sin-Min Lee Department of Computer Science San Jose State University

Check BCNF

• Is every determinant a candidate key?• video(title,serial) - Determinants are:

– title->director,serial Candidate key– serial->title Candidate key – video in BCNF

• serial(serial,director) Determinants are: – serial->director Candidate key– serial in BCNF

Page 45: 3NF and Boyce-Codd Normal Form Prof. Sin-Min Lee Department of Computer Science San Jose State University

• customer(name,addr,memberno) Determinants are: – name,addr -> memberno Candidate key– memberno -> name,addr Candidate key – customer in BCNF

• hire(memberno,serial,date) Determinants are:– serial,date -> memberno Candidate key– hire in BCNF

• Therefore the relations are also now in BCNF.

Page 46: 3NF and Boyce-Codd Normal Form Prof. Sin-Min Lee Department of Computer Science San Jose State University

R( A B C D) 1 2 3 4 1 2 4 3 1 3 4 1 2 3 2 4 1 2 3 5

A B C D

*

*

*

Q1. For which keys R is 2NF? key {AD} R is 2NF key {BD} R is 2NF key {CD} R is not 2NF

Q2. For which keys R is 3NF? Since prime-attributes are {A, B, C, D} R with key {AD} is 3NF R with key {BD} is 3NF

Page 47: 3NF and Boyce-Codd Normal Form Prof. Sin-Min Lee Department of Computer Science San Jose State University
Page 48: 3NF and Boyce-Codd Normal Form Prof. Sin-Min Lee Department of Computer Science San Jose State University
Page 49: 3NF and Boyce-Codd Normal Form Prof. Sin-Min Lee Department of Computer Science San Jose State University
Page 50: 3NF and Boyce-Codd Normal Form Prof. Sin-Min Lee Department of Computer Science San Jose State University
Page 51: 3NF and Boyce-Codd Normal Form Prof. Sin-Min Lee Department of Computer Science San Jose State University
Page 52: 3NF and Boyce-Codd Normal Form Prof. Sin-Min Lee Department of Computer Science San Jose State University
Page 53: 3NF and Boyce-Codd Normal Form Prof. Sin-Min Lee Department of Computer Science San Jose State University
Page 54: 3NF and Boyce-Codd Normal Form Prof. Sin-Min Lee Department of Computer Science San Jose State University
Page 55: 3NF and Boyce-Codd Normal Form Prof. Sin-Min Lee Department of Computer Science San Jose State University
Page 56: 3NF and Boyce-Codd Normal Form Prof. Sin-Min Lee Department of Computer Science San Jose State University
Page 57: 3NF and Boyce-Codd Normal Form Prof. Sin-Min Lee Department of Computer Science San Jose State University
Page 58: 3NF and Boyce-Codd Normal Form Prof. Sin-Min Lee Department of Computer Science San Jose State University
Page 59: 3NF and Boyce-Codd Normal Form Prof. Sin-Min Lee Department of Computer Science San Jose State University
Page 60: 3NF and Boyce-Codd Normal Form Prof. Sin-Min Lee Department of Computer Science San Jose State University
Page 61: 3NF and Boyce-Codd Normal Form Prof. Sin-Min Lee Department of Computer Science San Jose State University
Page 62: 3NF and Boyce-Codd Normal Form Prof. Sin-Min Lee Department of Computer Science San Jose State University
Page 63: 3NF and Boyce-Codd Normal Form Prof. Sin-Min Lee Department of Computer Science San Jose State University