department of computer science and engineering, hkust slide 1 7. relational database design

28
Department of Computer Science and Engineering, HKUST Slide 1 7. Relational Database Design

Post on 22-Dec-2015

225 views

Category:

Documents


3 download

TRANSCRIPT

Department of Computer Science and Engineering, HKUST Slide 1

7. Relational Database Design

Department of Computer Science and Engineering, HKUST Slide 2

Pitfalls in Relational Database DesignPitfalls in Relational Database Design

• Relational database design requires that we find a “good” collection of relation schemas. A bad design may lead to– Repetition of information.– Inability to represent certain information without

resorting to the use of lots of NULL values

• Design Goals:– Avoid redundant data– Ensure that relationships among attributes are

represented – Facilitate the checking of updates for violation of

database integrity constraints

Department of Computer Science and Engineering, HKUST Slide 3

ExampleExample

• Null values– cannot store information about a branch if no loans

exist– Can use null values, but they are difficult to handle

• Redundancy:– Data for branch-name, branch-city, assets are

repeated for each loan that a branch makes– Waste space and complicates updating

• Consider the relation schema:Lending-schema

branch-name, branch-city, assets, customer-name, loan-number, amount

Department of Computer Science and Engineering, HKUST Slide 4

DecompositionDecomposition

• Decompose the relation schema Lending-schema into:

Branch-customer-schemabranch-name, branch-city, assets, customer-name

Customer-loan-schemacustomer-name, loan-number, amount

• All attributes of an original schema (R) must appear in the decomposition (R1,R2):

R = R1 R2

• Lossless-join decomposition:For all possible relations r on schema R

r = R1 (r ) R2 (r )

Department of Computer Science and Engineering, HKUST Slide 5

Example of a Lossy-Join DecompositionExample of a Lossy-Join Decomposition

• Decompose R = (A,B,C) into R1 = (A,B) and R2 = (B,C)

,B(r) B,C(r)It is a lossy decomposition:An extraneous tuple is obtained.You get more, not less!!

,B(r) B,C(r)A B 1 2 1

B C1 m2 n1 p

r A B C 1 m 2 n 1 p

A B C 1 m

2 n

1 p

1 p

1 m

Department of Computer Science and Engineering, HKUST Slide 6

An Example of Lossy-Join DecompositionAn Example of Lossy-Join Decomposition

STUDENT_COURSESID Name Cnum Sem1 Jim comp231 fall951 Jim comp111 spring942 Alice comp111 spring942 Alice comp211 fall95

COURSECnum Semcomp231 fall95comp111 spring94comp111 spring94comp211 fall95

STUDENTSID Name Sem1 Jim fall951 Jim spring942 Alice spring942 Alice fall95

STUDENT_COURSE_SPURIOUSSID Name Cnum_ Sem1 Jim comp231 fall951 Jim comp211 fall951 Jim comp111 spring942 Alice comp231 fall95

2 Alice comp211 fall95

2 Alice comp111 spring94

It is clearly a bad decompositionsince Sem is not a foreign key of any table. How would you decompose it???

Department of Computer Science and Engineering, HKUST Slide 7

Goal - Devise a Theory for the Following:Goal - Devise a Theory for the Following:

• Decide whether a particular relation R is in “good” form.

• In the case that a relation R is not in “good” form, decompose it into a set of relations {R1, R2, …,Rn} such that– each relation is in good form– the decomposition is a lossless-join decomposition

• Our theory is based on:– functional dependencies– multivalued dependencies

Department of Computer Science and Engineering, HKUST Slide 8

Why are FDs involved?Why are FDs involved?

• We can’t tell if a relation scheme is good or not without first knowing the functional dependencies.

Lending-schema

(branch-name, branch-city, assets, customer-name, loan-number, amount)

How do you know this scheme is not good?

Because you know the functional dependencies. Try to name a few of them.

Department of Computer Science and Engineering, HKUST Slide 9

Normalization using Functional DependenciesNormalization using Functional Dependencies

When we decompose a relation schema R with a set of functional dependencies F into R1 and R2 we want:

• Lossless-join decomposition: At least one of the following dependencies is in F+:– R1 R2 R1

– R1 R2 R2

• No redundancy: The relations R1 and R2 preferably should be in either Boyce-Codd Normal Form or Third Normal Form.

• Dependency preservation: Let Fi be the set of dependencies in F+ that include only attributes in Ri. Test to see if:– (F1 F2 )+ = F+

otherwise, checking updates for violation of functional dependencies is expensive.

The attributes with which you can joinR1 and R2 is either a key of R1 or R2

Department of Computer Science and Engineering, HKUST Slide 10

• R = (A, B, C) F= {A B, B C}

• R1 = (A, B), R2 = (B, C)– Lossless-join decomposition:

R1 R2 = {B} and B R2(BC)

– Dependency preserving:F1 = A B; F2 = B C (F1 F2 )+ = F+

• R1 = (A, B), R2 = (A, C)– Lossless-join decomposition:

R1 R2 = {A} and A R1(AB)

– Not dependency preserving: F1 = A B; F2 = B C and A C are lost (cannot check B C without computing R1 R2)

ExampleExample

Department of Computer Science and Engineering, HKUST Slide 11

A relation schema R is in BCNF with respect to a set of F of functional dependencies if for all functional dependencies in F+ of the form , where R and R, at least one of the following holds:

is trivial (i.e., )

is a superkey for R

Boyce-Codd Normal FormBoyce-Codd Normal Form

i.e. contains a candidate key of R

Department of Computer Science and Engineering, HKUST Slide 12

• R = (A, B, C)F = {A B

B C}Key = {A}

• R is not in BCNF. Why?

• Decomposition R1 = (A,B), R2 = (B,C)– R1 and R2 in BCNF

– Lossless-join decomposition– Dependency preserving

B C where B is not a superkey

ExampleExample

Department of Computer Science and Engineering, HKUST Slide 13

I.e, violates

the BCNF definition

BCNF Decomposition AlgorithmBCNF Decomposition Algorithm

Each Ri is in BCNF, and decomposition is lossless-join.

I.e. isnot trivial

result := {R};done:= false;compute F+;while (not done) do if (there is a schema Ri in result that is not in BCNF)

then beginlet be a nontrivial functional dependency that holds on Ri

such that Ri is not in F+, and = ;

result := (result - Ri) (Ri - ) (, ) endelse done:=true;

Remove from the originalscheme and include a newscheme R’()

Department of Computer Science and Engineering, HKUST Slide 14

Example of BCNF Decomposition

• R = (branch-name, branch-city, assets,customer-name, loan-number, amount)

F = { branch-name assets, branch-cityloan-number amount, branch-name}

Key = { loan-number, customer-name }

• Decomposition– First FD violates BCNF

• R1 = (branch-name, branch-city, assets)• R2 = (branch-name, customer-name, loan-number, amount)

– Second FD violates BCNF in R2 • R3 = (branch-name, loan-number, amount)• R4 = (customer-name, loan-number)

• Final decomposition R1, R3, R4

Department of Computer Science and Engineering, HKUST Slide 15

BCNF and Dependency PreservationBCNF and Dependency Preservation

It is not always possible to get a BCNF decomposition that is dependency preserving.

• R = (J, K, L)F = {JK L

L K}Two candidate keys = JK and JL

• R is not in BCNF; decompose into R1(J, L) and R2(L,K)

• Any decomposition of R will fail to preserve JK L

So, sometimes we need to step back to a weaker requirement

Department of Computer Science and Engineering, HKUST Slide 16

This additional alternatives makes3NF weaker than BCNF

• A relation schema R is in third normal form (3NF) if for all: in F+

at least one of the following holds: is trivial (i.e., ) is superkey for R– Each attribute A in is contained in a candidate key of

R.

• If a relation is in BCNF it is in 3NF (since in BCNF one of the first two conditions above must hold).

Third Normal FormThird Normal Form

Department of Computer Science and Engineering, HKUST Slide 17

Third Normal FormThird Normal Form

• Same example as in BCNF– R = (J, K, L)

F = {JK L, L K}– Two candidate keys: JK and JL– R is in 3NF

JK L JK is a superkeyL K K is contained in a candidate key

• Algorithm to decompose a relation schema R into a set of relation schemas {R1, R2,…, Rn} such that:– each relation schema Ri is in 3NF– lossless-join decomposition – dependency preserving

Department of Computer Science and Engineering, HKUST Slide 18

3NF Decomposition Algorithm3NF Decomposition Algorithm

• Let Fc be a canonical cover for F;i := 0;for each functional dependency in Fc do

if none of the schemas Rj, 1<= j <= i contains then begin

i:=i+1;Rj:= ;

endif none of the schemas Rj, 1<= j <= i contains a candidate key for R

then begini:=i+1;Ri:= any candidate key for R;

endreturn (R1, R2, …, Ri)

Department of Computer Science and Engineering, HKUST Slide 19

• Relation schema:Banker-info-schemabranch-name, customer-name, banker-name, office-number

• The functional dependencies for this relation schema are:

banker-name branch-name, office-number

customer-name, branch-name banker-name

• The key is:{customer-name, branch-name}

Example

Department of Computer Science and Engineering, HKUST Slide 20

Applying 3NF to banker - info - schemaApplying 3NF to banker - info - schema

• Go through the for loop in the algorithm:

banker-name branch-name, office-numberis not in any decomposed relation (no decomposed relation so far) Create a new relation:Banker-office-schema ( banker-name, branch-name, office-number )

customer-name, branch-name banker-nameis not in any decomposed relation (one decomposed relation so far) Create a new relation:Banker-schema ( customer-name, branch-name, banker-name )

• Since Banker-schema contains a candidate key for Banker-info-schema, we are done with the decomposition process.

Department of Computer Science and Engineering, HKUST Slide 21

Comparison of BCNF and 3NFComparison of BCNF and 3NF

• It is always possible to decompose a relation into relations in 3NF and – the decomposition is lossless– dependencies are preserved

• It is always possible to decompose a relation into relations in BCNF and – the decomposition is lossless– it may not be possible to preserve dependencies

Department of Computer Science and Engineering, HKUST Slide 22

Comparison of BCNF and 3NFComparison of BCNF and 3NF

Prof-office ( Department, Prof-name , Room)

F = {Department, Prof-name Room Room Prof-name }

Dept Prof-name Room

CS D. Lee 3534

EE D. Lee 3534

BIOL D. Lee 1234

null K. Chan 1231

• Key = Department, Prof-name• The relation is in 3NF but not in BCNF

– repetition of Prof/room information– need to use null values if a Prof has a

room but no department assigned

A professor may be affiliated with more than one department and assigned more than one office.

Prof-name -- RoomProf-name -- Dept

Nearby departments may give him the same office, whereas department far away may give him a different one.

Room -- Dept

Office won’t be sharedRoom Prof-name

Each department will assign only one office to a professorDepartment, Prof-name Room

Department of Computer Science and Engineering, HKUST Slide 23

Problems with DecompositionProblems with Decomposition

• Algorithms need to identify all candidate keys and the canonical cover; it is a rather difficult process

• Decomposition algorithms are not deterministic. E.g., if there are several functional dependencies violating the normal form, the order of selecting the problem FD for decomposition may give different relation schemes

• Algorithms may result in relation schemes which are not intuitive

Department of Computer Science and Engineering, HKUST Slide 24

Relational Database Design ReviewRelational Database Design Review

• Two approaches to DB design:1) Design ER model, then translate to relation schemes2) Put every attribute together in one relation, identify all the funct

ional dependencies, and then decompose into 3NF at least.• The first approach is more popular, but relational t

heory helps formalizing some concepts such as key (what does it mean by “A key uniquely identifies the tuples?”)

• Identifying the FDs is part of the DB design process; it helps you understand the requirements better.

Department of Computer Science and Engineering, HKUST Slide 25

An Example of Bad Relation SchemeAn Example of Bad Relation Scheme

• Project ( Emp-no, Proj-no, Emp-name, Hours )• From ER point of view, it is bad since it embodies a

n entity type and an N:M relationship type– But your (ignorant) manager may ask: “Why can’t I?”

• From a relational theory point of view, you know:– Emp-no Emp-name– Emp-no, Proj-no Hours Emp-no, Proj-no is the only key in Project

• Project is not in 3NF, why? Decompose:– Employee ( Emp-no, Emp-name )– Works-on ( Emp-no, Proj-no, Hours )

Department of Computer Science and Engineering, HKUST Slide 26

A Difficult Example for ER ApproachA Difficult Example for ER Approach

• Cars_all ( Make, Engine-size, Origin, Fee )• Difficult to see how many entities or relationships

are there• From a relational theory point of view, you know:

– Make, Engine-size Origin– Engine-size Fee Make, Engine-size is the only key in Cars_all

• Cars_all is not in 3NF, why? Decompose:– Cars ( Make, Engine-size, Origin )– License ( Engine-size, Fee )

Toyota Camry, 2.2, Japan, 5600

Department of Computer Science and Engineering, HKUST Slide 27

Another Difficult ExampleAnother Difficult Example

• Cars2 ( Make, Engine-size, Plant)• From a relational theory point of view, you know:

– Make, Engine-size Plant– Plant Make

(A plant makes the same engine of a given model) Make, Engine-size is the only key in Cars2

• Cars2 is in 3NF but not in BCNF, why? Decompose:– Car_plant ( Make, Plant )– Car_engine( Make, Engine-size)

Question: but what have we lost in BCNF?

Department of Computer Science and Engineering, HKUST Slide 28

First Normal FormFirst Normal Form

• A relation R is in First Normal Form if every value in R is atomic

• Atomicity is actually a property of how the elements of the domain are used.– E.g. Strings would normally be considered indivisible – Suppose that students are given enrollment numbers

which are strings of the form CS0012 or EE1127– If the first two characters are extracted to find the

department, the domain of enrollment numbers is not atomic.

– Doing so is a bad idea: leads to encoding of information in application program rather than in the database.