ch 4: relational database design -...

Ch 4: Relational Database

Design

4.1 Features of Good Relational

designs

Four Informal measures

•Semantics of the relation attributes

•Reducing the redundant values in tuples.

•Reducing the null values

•Disallowing the possibility of generating spurious(wrong) tuples

1. Semantics of the relation attributes

• Design a reln schema so that it is easy to explain its

meaning .Do not combine attributes from multiple entity

types & relationship into single relation.

• In general ,the easier it is to explain the semantics of

relation, the better the relation schema design will be.

3

2. Redundant Information in Tuples & Update Anomalies

• Goal of dbase is to reduce storage space used by

relationship.

• Grouping attributes into relation schemas has a

significant effect on storage space.

• For e.g. if we combine Employee with department &

project , works_on will result into EMP_DEPT &

EMP_PRJ.

• Resultant relation shows repetition of several values

leading to higher storage.

• Other serious problems is of Update anomalies which is

classified as insert, delete & modification anomalies.

5

Insertion Anomalies

• Insert a new emp , we must include value for dept or need to

place NULL (if emp doesnot work for dept yet.) & need to enter

correctly so consistency problem donot occur.

• It is difficult to enter new dept that has no employee as we

cannot insert Null in ENO as its is primary key.

• Consider the relation:

– EMP_PROJ(Emp#, Proj#, Ename, Pname, No_hours)

• Insert Anomaly:

– Cannot insert a project unless an employee is assigned to it.

• Conversely

– Cannot insert an employee unless a he/she is assigned to a

project.

6

Delation Anomalies

• If we delete from EMP_DEPT an employee that happens

to represent the last employee in that dept , the info abt

dept is also lost from dbase.

Consider the relation:

EMP_PROJ(Emp#, Proj#, Ename, Pname, No_hours)

Delete Anomaly:

When a project is deleted, it will result in deleting all the

employees who work on that project.

Alternately, if an employee is the sole employee on a

project, deleting that employee would result in deleting

the corresponding project.

9

Modification Anomalies

• If we change the value of one of attributes say that of

manager of dept 5 , we need to change in each tuple

where dept no is 5, else it will lead to inconsistent.

• Design Dbase so that no insertion, deletion &

modification anomalies are present . 11

3. Null values in tuples

• NULL have multiple interpretations such as:

• Attributes does not apply to this tuple.

• Attributes value for this tuple is unknown.

• Value is known but absent, i.e. it has not been

recorded yet.

• Problem may occur in JOIN , n aggregrate operations.

14

4. Generation of Spurious Tuples

• Design relation schemas so that they can be joined with

equality condition on attributes that are primary or

foreign key.

• E.g. cross join leads to spurious tuples

15

4.2 Functional Dependencies (FD)

• A functional dependency, denoted by X Y (Read

X functionally determines Y), between two sets of

attributes X and Y that are subsets of R specifies a

constraint on possible tuples that can be form a

relation state r of R.

• The constraint is that for for all pairs of tuples t1

and t2 in r such that

t1 [X ] = t2 [X ] , they must also have,

t1 [Y] = t2 [Y].

In other words

Whenever two tuples of r agree on their X value, they

also agree on their Y value.

18

Functional Dependency

• Main concept associated with normalization.

• Functional Dependency

– Describes relationship between attributes in a

relation.

– If A and B are attributes of relation R, B is

functionally dependent on A (denoted A B), if

each value of A in R is associated with exactly one

value of B in R.

Example: Consider r(A,B ) with the following instance of r.

On this instance, A B does NOT hold, but B A does hold.

1 4

1 5

3 7

19

Functional Dependency

• Property of the meaning (or semantics) of the

attributes in a relation.

• Diagrammatic representation:

Determinant of a functional dependency refers

to attribute or group of attributes on left-hand

side of the arrow.

20

Example - Functional Dependency

21

Functional Dependencies (Cont.)

• K is a superkey for relation schema R if and only if K R

• K is a candidate key for R if and only if

– K R, and

– for no K, R

• Functional dependencies allow us to express constraints that

cannot be expressed using superkeys. Consider the schema:

bor_loan = (customer_id, loan_number, amount ).

We expect this functional dependency to hold:

loan_number amount

but would not expect the following to hold:

amount customer_name

22


• Main use of FD is to describe further relation

schema R by specifying constraints on its

attributes that must hold all times.

• Certain FD can be specified without referring to

specific relation.

• {state, driving_licence} ENO

• {pincode) area

• {telephone code}city

23


ENOEname

Pnumber{pname,plocation}

{eno,pnumber}hours

• Eno uniquely determines emp name

• Pnumber uniquely determines project name & location

• Combination of eno , pnumber uniquely determines

numbers of hours that employee had worked on that

project.

24

EXAMPLE :: TEACH

TEACHER COURSE TEXT

GUPTA CHEMISTRY SAHANI

GUPTA MATHS NAVATHE

KUMAR BIOLOGY HOFFMAN

GOYAL CHEMISTRY KAHATE

Possible FD’s

TEXTCOURSE hold

But,

TEACHERCOURSE is ruled out.

25

Trivial & Non trivial FD

• Trivial FD

– A FD XY is trivial if Y , right hand side of FD is subset of X.

• Non Trivial FD

– A FD XY is non trivial if Y , right hand side of FD is not subset

of X.

• F= ENO{ ENAME,DOB,ADDRESS,DNUM}

DNUMBER{DNAME,MGRNO}

ENOENO trivial

DUMBERDNAME trivial

ENO Ename trivial

DUMBER mgrno trivial

ENO DNAME,MGRNO NON Trivial

26

• In practice, trivial dependencies are not really

interesting, we are interested in practice non

trivial FD only. And goal should be to reduce

trivial FD from given set of FD.

• FIND FD

27

A B C

a1 b1 c1

a1 b1 c2

a2 b1 c1

a2 b1 c3

Closure (of set of dependencies)

• The set of all FDs that include F as well as all

dependencies that are implied by a given set F of

FDs is called the closure of F, denoted by F+.

• F= ENO{ ENAME,DOB,ADDRESS,DNUM}

DNUMBER{DNAME,MGRNO}

SOME ADDITIONAL FD’S ARE

ENO DNAME,MGRNO

ENOENO

DUMBERDNAME

28

Inference rules for FD Or AXIOMS

1. Reflexive: if B is a subset of A, then A B.

2. Augmentation: if A B then AC BC

3. Transitivity: it A B and B C then A C.

4. Self – determination: A A.

5. Decomposition: If A BC, then AB, AC.

6. Union: it A B and A C, then A BC

7. Composition: if A B, C D then AC BD.

8. Pseudo transitive : if A B and rBC then

Ar C.

• To find closure need to apply this rules

repeatedly until we stop producing new FD

29

ARMSTRONG AXIOMS

• First three axioms

1. Reflexive: if B is a subset of A, then A B.

2. Augmentation: if A B then AC BC

3. Transitivity: it A B and B C then A C.

are sound & complete

By sound, we mean that given a set of FD on relation R,

any dependency that can infer from F holds in

every reln satisfies the dependencies. They do not

generate incorrect FD.

By complete, we mean that using 3 FD repeatedly to a

complete set of all possible dependencies that can

be inferred from F.

30

Example: find closure of F

• R = (A, B, C, G, H, I)

F = { A B

A C

CG H

CG I

B H}

• some extra members of F+

– A H

• by transitivity from A B and B H

– AG I

• by augmenting A C with G, to get AG CG

and then transitivity with CG I

– CG HI

• by union CG I & CG H

• OR

• by augmenting CG I to infer CG CGI,

and augmenting of CG H to infer CGI HI,

and then transitivity 31

Example: find closure of F

• R = (A, B, C, D,E,F)

F = { A BC

B E

CD EF

}

• some extra members of F+

– A BC GIVEN

– A C DECOMPOSITION

– AD CD AUGMENTATION

– CD EF GIVEN

– AD EF TRANSITIVITY

– AD F DECOMPOSITION

32

Example of Attribute Set Closure

• R = (A, B, C, G, H, I)

• F = {A B A C CG H CG I B H}

• (AG)+

1. result = AG

2. result = ABCG (A C and A B)

3. result = ABCGH (CG H)

4. result = ABCGHI (CG I)

33

result := ;

while (changes to result) do

for each in F do

begin

if result then result :=result

end

Example of Attribute Set Closure

• R = (A, B, C, D ,E ,F)

• F = {A BC E CF B E

CD EF}

COMPUTE (AB)+

1. result = AB

2. result = ABC (A BC)

3. result = ABC (E CF ) LHS RESULT

4. result = ABCE (B E)

5. result = ABCE (CD EF) LHS RESULT

6. result = ABCEF (E CF ) AGAIN LOOP

(AB)+ = {A,B,C,E,F}

34

result := ;

while (changes to result) do

for each in F do

begin

if result then result :=result

end

Example

• F={A,B,C,D,E,F}

FD={ ABC , BE, CDEF}

PROVE THAT ADF

** AB AC FROM ABC

CDE CDF FROM CDEF

AB , BE == AE

AE CDF

A BC

ADBCD AUGMENTATION ADDING C

ADEEF FROM CDEF & BE

ADEF ADE

ADF 35

EXAMPLE

• F={A,B,C,D}

FD={ AB , AC, BCD}

PROVE THAT AD

• R={A,B,C,D,E,F}

FD={ AB , AC, CDE, CDF, BE}

FIND CLOSURE OF FD

• R={A,B,C,D,E}

FD={ ABC , BD, EA, CDE}

FIND CLOSURE OF FD

• R={A,B,C,D,E,F}

FD={ ABC , ECF, BE, CDEF}

FIND CLOSURE OF {A,B}+

36

Canonical Cover

• Sets of functional dependencies may have redundant

dependencies that can be inferred from the others

– For example: A C is redundant in: {A B, B C}

– Parts of a functional dependency may be redundant

• E.g.: on RHS: {A B, B C, A CD} can be

simplified to

{A B, B C, A D}

• E.g.: on LHS: {A B, B C, AC D} can be

simplified to

{A B, B C, A D}

• Intuitively, a canonical cover of F is a ―minimal‖ set of

functional dependencies equivalent to F, having no

redundant dependencies or redundant parts of

dependencies

37

Extraneous Attributes

• Consider a set F of functional dependencies and the functional

dependency in F.

– Attribute A is extraneous in if A

and F logically implies (F – { }) {( – A) }.

– Attribute A is extraneous in if A

and the set of functional dependencies

(F – { }) { ( – A)} logically implies F.

• Note: implication in the opposite direction is trivial in each of the

cases above, since a ―stronger‖ functional dependency always

implies a weaker one

• Example: Given F = {A C, AB C }

– B is extraneous in AB C because {A C, AB C} logically

implies A C (I.e. the result of dropping B from AB C).

• Example: Given F = {A C, AB CD}

– C is extraneous in AB CD since AB C can be inferred even

after deleting C

38

Testing if an Attribute is Extraneous

• Consider a set F of functional dependencies and the functional

dependency in F.

• To test if attribute A is extraneous in

1. compute ({} – A)+ using the dependencies in F

2. check that ({} – A)+ contains ; if it does, A is extraneous in

• To test if attribute A is extraneous in

1. compute + using only the dependencies in

F’ = (F – { }) { ( – A)},

2. check that + contains A; if it does, A is extraneous in

39

Computing a Canonical Cover

• R = (A, B, C)

F = {A BC

B C

A B

AB C}

• Combine A BC and A B into A BC

– Set is now {A BC, B C, AB C}

• A is extraneous in AB C

– Check if the result of deleting A from AB C is implied by the other

dependencies

• Yes: in fact, B C is already present!

– Set is now {A BC, B C}

• C is extraneous in A BC

– Check if A C is logically implied by A B and the other

dependencies

• Yes: using transitivity on A B and B C.

– Can use attribute closure of A in more complex cases

• The canonical cover is: A B

B C 40

e.g.

• Now we define a set of FD to be irreducible as minimal; if and only if it satisfies the following two properties.

(1) The right hand side of every FD in S involve just one attribute (i.e., it is a singleton set)

(2) The left hand side of every FD in S is irreducible in turn meaning that no attribute can be discarded from the determinant without changing the CLOSURE S+.

41

Minimal Sets of FDs

• A set of FDs is minimal if it satisfies the following conditions:

1. Every dependency in F has a single attribute for its RHS.

2. We cannot remove any dependency from F and have a set of dependencies that is equivalent to F.

3. We cannot replace any dependency X -> A in F with a dependency Y -> A, where Y proper-subset-of X ( Y subset-of X) and still have a set of dependencies that is equivalent to F.

42

Example

• A BC,

• B C

• A B

• AB C

• AC D

Compute an irreducible set of FD that is equivalent

to this given set.

Bring answer in reducible

form

43

Solution

(1) The step is to rewrite the FD such that each has a singleton right hand side.

• A B

• A C

• B C

• A B

• AB C

• AC D

We observe that the FD A B occurs twice. So one occurrence will be eliminated.

A BC,

B C

A B

AB C

AC D

44

Solution

2. Next, attributed C can be eliminated from the

left hand side of the FD AC D

• Because we have A C,

• By augmentation AA AC

• A AC

(Augmentation: if X Y then XZ YZ)

• And we are given AC D,

• A AC AC D

• So A D by transitivity;

Thus C on the left hand side is redundant.

A C B C A B

AB C AC D

45

Solution

3. Next, we observe that the FD AB C can be eliminated, because again we have

A C

By augmentation AB CB

By decomposition AB C AB B

4. Finally, the FD A C is implied by the FD A B and B C, by transitivity so it can be eliminated.

Now we have A B

B C

A D

This set is irreducible.

A C B C A B

AB C A D

46

Computing the Minimal Sets of FDs

Let the given set of FDs be

E : {B → A, D → A, AB → D}.

We have to find the minimum cover of E.

47

E : {B → A, D → A, AB → D }

■ All above dependencies are in canonical form; (RHS has single value)so we have completed step 1 and can proceed to step 2.

In step 2 we need to determine

if AB → D has any redundant attribute on the left-hand side; that is, can it be

replaced by B → D or A → D?

■ Since B → A, by augmenting with B on both sides (IR2), we have BB → AB, or

B → AB (i). However, AB → D as given (ii).

■ Hence by the transitive rule (IR3), we get from (i) and (ii), B → D. Hence

AB → D may be replaced by B → D.

48

Old E : {B → A, D → A, AB → D }

NEW:

We now have a set equivalent to original E , say E′ : {B → A, D → A, B → D}.

No further reduction is possible in step 2 since all FDs have a single attribute

on the left-hand side.

■ In step 3 we look for a redundant FD in E′. By using the transitive rule on

B → D and D → A, we derive B → A. Hence B → A is redundant in E’ and can

be eliminated.

■ Hence the minimum cover of E is

{B → D, D → A}.

49

Normalization of Relations

• Normalization:

– The process of decomposing unsatisfactory "bad"

relations by breaking up their attributes into smaller

relations

• Normal form:

– Condition using keys and FDs of a relation to certify

whether a relation schema is in a particular normal form

50

Normalization of Relations

• 2NF, 3NF, BCNF

– based on keys and FDs of a relation schema

• 4NF

– based on keys, multi-valued dependencies : MVDs; 5NF

based on keys, join dependencies : JDs

• Additional properties may be needed to ensure a

good relational design (lossless join, dependency

preservation)

Practical Use of Normal Forms

• Normalization is carried out in practice so that the resulting designs are of high quality and meet the desirable properties

• The practical utility of these normal forms becomes questionable when the constraints on which they are based are hard to understand or to detect

• The database designers need not normalize to the highest possible normal form – (usually up to 3NF, BCNF or 4NF)

• Denormalization: – The process of storing the join of higher normal

form relations as a base relation—which is in a lower normal form

52

Definitions of Keys and Attributes

Participating in Keys (1)

• A superkey of a relation schema R = {A1, A2, ....,

An} is a set of attributes S subset-of R with the

property that no two tuples t1 and t2 in any legal

relation state r of R will have t1[S] = t2[S]

• A key K is a superkey with the additional property

that removal of any attribute from K will cause K

not to be a superkey any more.

Definitions of Keys and Attributes

Participating in Keys (2)

• If a relation schema has more than one key, each

is called a candidate key.

– One of the candidate keys is arbitrarily designated to be

the primary key, and the others are called secondary

keys.

• A Prime attribute must be a member of some

candidate key

• A Nonprime attribute is not a prime attribute—

that is, it is not a member of any candidate key.

First Normal Form

• Disallows

– composite attributes

– multivalued attributes

– nested relations; attributes whose values for an

individual tuple are non-atomic

• Considered to be part of the definition of relation

Normalization into 1NF

57

Normalization of nested relations into 1NF

3.3 Second Normal Form (1)

• Uses the concepts of FDs, primary key

• Definitions

– Prime attribute: An attribute that is member of the primary key K

– Full functional dependency: a FD Y -> Z where removal of any attribute from Y means the FD does not hold any more

• Examples:

– {SSN, PNUMBER} -> HOURS is a full FD since neither SSN -> HOURS nor PNUMBER -> HOURS hold

– {SSN, PNUMBER} -> ENAME is not a full FD (it is called a partial dependency ) since SSN -> ENAME also holds

58

Second Normal Form (2)

• A relation schema R is in second normal form

(2NF) if every non-prime attribute A in R is fully

functionally dependent on the primary key

• R can be decomposed into 2NF relations via the

process of 2NF normalization

Normalizing into 2NF

4 General Normal Form Definitions (For

Multiple Keys) (1)

• The above definitions consider the primary key

only

• The following more general definitions take into

account relations with multiple candidate keys

• A relation schema R is in second normal form

(2NF) if every non-prime attribute A in R is fully

functionally dependent on every key of R

General Normal Form Definitions (3NF)

• Definition:

– Superkey of relation schema R - a set of attributes S of

R that contains a key of R

– A relation schema R is in third normal form (3NF) if

whenever a FD X -> A holds in R, then either:

• (a) X is a superkey of R, or

• (b) A is a prime attribute of R

i.e. either LHS or RHS should have some key

attributes.

• NOTE: Boyce-Codd normal form disallows

condition (b) above

3.4 Third Normal Form (1)

• Definition: – Transitive functional dependency: a FD X -> Z that can

be derived from two FDs X -> Y and Y -> Z

• Examples: – SSN -> DMGRSSN is a transitive FD

• Since SSN -> DNUMBER and DNUMBER -> DMGRSSN hold

– SSN -> ENAME is non-transitive

• Since there is no set of attributes X where SSN -> X and X -> ENAME

Third Normal Form (2)

• A relation schema R is in third normal form (3NF) if it is in 2NF and no non-prime attribute A in R is transitively dependent on the primary key

• R can be decomposed into 3NF relations via the process of 3NF normalization

• NOTE:

– In X -> Y and Y -> Z, with X as the primary key, we consider this a problem only if Y is not a candidate key.

– When Y is a candidate key, there is no problem with the transitive dependency .

– E.g., Consider EMP (SSN, Emp#, Salary ).

• Here, SSN -> Emp# -> Salary and Emp# is a candidate key.

Normal Forms Defined Informally

• 1st normal form

– All attributes depend on the key

• 2nd normal form

– All attributes depend on the whole key

• 3rd normal form

– All attributes depend on nothing but the key

Successive Normalization of LOTS into 2NF and 3NF

SUMMARY OF NORMAL FORMS

based on Primary Keys

5 BCNF (Boyce-Codd Normal Form)

• A relation schema R is in Boyce-Codd Normal

Form (BCNF) if whenever an FD X -> A holds in R,

then X is a superkey of R

• Each normal form is strictly stronger than the

previous one

– Every 2NF relation is in 1NF

– Every 3NF relation is in 2NF

– Every BCNF relation is in 3NF

• There exist relations that are in 3NF but not in

BCNF

• The goal is to have each relation in BCNF (or 3NF)

Boyce-Codd Normal Form

Area is not

superkey

A relation TEACH that is in 3NF but not in

BCNF

Achieving the BCNF by Decomposition

(1)

• Two FDs exist in the relation TEACH:

– fd1: { student, course} -> instructor

– fd2: instructor -> course

• {student, course} is a candidate key for this relation and that the dependencies shown follow the pattern in Figure 10.12 (b).

– So this relation is in 3NF but not in BCNF

• A relation NOT in BCNF should be decomposed so as to meet this property, while possibly forgoing the preservation of all functional dependencies in the decomposed relations.

Achieving the BCNF by Decomposition

(2)

• Three possible decompositions for relation TEACH

– {student, instructor} and {student, course}

– {course, instructor } and {course, student}

– {instructor, course } and {instructor, student}

• All three decompositions will lose fd1.

– We have to settle for sacrificing the functional dependency preservation. But we cannot sacrifice the non-additivity property after decomposition.

• Out of the above three, only the 3rd decomposition will not generate spurious tuples after join.(and hence has the non-additivity property).

74

3. Multivalued Dependencies and Fourth

Normal Form (1)

(a) The EMP relation with two MVDs: ENAME —>>

PNAME and ENAME —>> DNAME.

(b) Decomposing the EMP relation into two 4NF relations

EMP_PROJECTS and EMP_DEPENDENTS.

75

3. Multivalued Dependencies and Fourth

Normal Form (1)

(c) The relation SUPPLY with no MVDs is in 4NF but not in 5NF if

it has the JD(R1, R2, R3). (d) Decomposing the relation SUPPLY

into the 5NF relations R1, R2, and R3.

Fourth Normal Fourth:

definition

• Fourth normal form (or 4NF) requires that

there are no non-trivial multi-valued

dependencies of attribute sets on

something other than a superset of a

candidate key. A table is said to be in 4NF

if and only if it is in the BCNF and multi-

valued dependencies are functional

dependencies. The 4NF removes

unwanted data structures ie.. multi-valued

dependencies

77

Multivalued Dependencies and Fourth

Normal Form (4)

Definition:

• A relation schema R is in 4NF with respect to a set of dependencies F (that includes functional dependencies and multivalued dependencies) if, for every nontrivial multivalued dependency X —>> Y in F+, X is a superkey for R.

– Note: F+ is the (complete) set of all dependencies (functional or multivalued) that will hold in every relation state r of R that satisfies F. It is also called the closure of F.

78

Multivalued Dependencies and Fourth Normal

Form (5)

Decomposing a relation state of EMP that is not in 4NF:

(a) EMP relation with additional tuples.

(b) Two corresponding 4NF relations EMP_PROJECTS and

EMP_DEPENDENTS.

Fourth Normal Form (4NF)

• MVD between attributes A, B, and C in a relation

using the following notation:

A B

A C

Example

• The table has no non-key attributes because its

only key is {Restaurant, Pizza Variety, Delivery

Area}. Therefore it meets all normal forms up to

BCNF. If we assume, however, that pizza varieties

offered by a restaurant are not affected by

delivery area, then it does not meet 4NF. The

problem is that the table features two non-trivial

multivalued dependencies on the {Restaurant}

attribute (which is not a superkey). The

dependencies are:

• {Restaurant} →→ {Pizza Variety}

• {Restaurant} →→ {Delivery Area}

Data is given

Restaurant Pizza Variety Delivery Area

A1 Pizza Thick Crust Springfield

A1 Pizza Thick Crust Shelbyville

A1 Pizza Thick Crust Capital City

A1 Pizza Stuffed Crust Springfield

A1 Pizza Stuffed Crust Shelbyville

A1 Pizza Stuffed Crust Capital City

Elite Pizza Thin Crust Capital City

Elite Pizza Stuffed Crust Capital City

Vincenzo's Pizza Thick Crust Springfield

Vincenzo's Pizza Thick Crust Shelbyville

Vincenzo's Pizza Thin Crust Springfield

Vincenzo's Pizza Thin Crust Shelbyville

Restaurant Pizza Variety Restaurant Delivery Area

A1 Pizza Thick Crust A1 Pizza Springfield

A1 Pizza Stuffed Crust A1 Pizza Shelbyville

Elite Pizza Thin Crust A1 Pizza Capital City

Elite Pizza Stuffed Crust Elite Pizza Capital City

Vincenzo's

Pizza

Thick Crust

Vincenzo's

Pizza

Springfield

Vincenzo's

Pizza

Thin Crust

Vincenzo's

Pizza

Shelbyville

Varieties By Restaurant Delivery Areas By

Restaurant

4NF - Example

84

4. Join Dependencies and Fifth Normal

Form (1)

Definition:

• A join dependency (JD), denoted by JD(R1, R2, ...,

Rn), specified on relation schema R, specifies a

constraint on the states r of R.

– The constraint states that every legal state r of R

should have a non-additive join decomposition into

R1, R2, ..., Rn; that is, for every such r we have

– * (R1(r), R2(r), ..., Rn(r)) = r

Note: an MVD is a special case of a JD where n = 2.

• A join dependency JD(R1, R2, ..., Rn), specified on

relation schema R, is a trivial JD if one of the

relation schemas Ri in JD(R1, R2, ..., Rn) is equal to

R.

85

Join Dependencies and Fifth Normal Form

(2)

Definition:

• A relation schema R is in fifth normal form (5NF) (or

Project-Join Normal Form (PJNF)) with respect to a

set F of functional, multivalued, and join

dependencies if,

– for every nontrivial join dependency JD(R1, R2, ..., Rn) in F+

(that is, implied by F),

• every Ri is a superkey of R.

86

Relation SUPPLY with Join Dependency

and conversion to Fifth Normal Form

• Whenever a supplier supply part p, & a project j

uses part p, the supplier s supplies at least one

part to project j, then supplier s will be supplying

part p to project j.

87

Join dependency and 5th

normal form

• There are some relations ,which cannot be

decomposed into two projections. But this

relation can be decomposed into 3 projection,

this is in 5NF. Suppose there is a relation

Dealers,parts,customer which is in 4NF.

Teacher Class Subject

100 4th English

100 5th hindi

• Problem is although it is in 4th NF , it involves join dependency(which is neither FD or MVD), thus it is required to decompose such a relation into smaller ones. All projections are in 5th NF

Teacher Class

100 4th

100 5th

Class Subject

4th English

5th hindi

Subject Teacher

English 100

Hindi 100

DESIGNING A SET OF RELATIONS (2)

• Goals:

– Lossless join property (a must)

– Dependency preservation property

– Additional normal forms

• 4NF (based on multi-valued dependencies)

• 5NF (based on join dependencies)

Why do we preserve the dependency?

• We would like to check easily that updates to the

database do not result in illegal relations being created.

• It would be nice if our design allowed us to check

updates without having to compute natural joins.

• We want to preserve dependencies bcoz each

dependencies in F represents a constraints on

dbase.

•

Dependency Preservation

Decomposition

• Definition: Each FD specified in F either

appears directly in one of the relations in the

decomposition, or be inferred from FDs that

appear in some relation.

For example…….

• R=(A,B,C,D) and F =(AB,AC,CD)

Here R is decomposed into

R1=(A,B,C)with the FDs

F1=(AB,AC)

R2=(C,D) with the FDs

F2=(CD)

F’=F1UF2…U Fn = (AB,AC,CD)

Hence we can say decomposition is DP

Properties of Relational

Decompositions (3)

• Dependency Preservation Property of a Decomposition:

– Definition: Given a set of dependencies F on R, the projection of F on Ri, denoted by pRi(F) where Ri is a subset of R, is the set of dependencies X Y in F+ such that the attributes in X υ Y are all contained in Ri.

– Hence, the projection of F on each relation schema Ri in the decomposition D is the set of functional dependencies in F+, the closure of F, such that all their left- and right-hand-side attributes are in Ri.


Decompositions (4)

• Dependency Preservation Property of a Decomposition (cont.): – Dependency Preservation Property:

• A decomposition D = {R1, R2, ..., Rm} of R is dependency-preserving with respect to F if the union of the projections of F on each Ri in D is equivalent to F; that is ((R1(F)) υ . . . υ (Rm(F)))+ = F+

• (See examples in Fig 10.12a and Fig 10.11)

• Claim 1: – It is always possible to find a dependency-preserving

decomposition D with respect to F such that each relation Ri in D is in 3nf.


Decompositions (5)

• Lossless (Non-additive) Join Property of a Decomposition: – Definition: Lossless join property: a decomposition D

= {R1, R2, ..., Rm} of R has the lossless (nonadditive) join property with respect to the set of dependencies F on R if, for every relation state r of R that satisfies F, the following holds, where * is the natural join of all the relations in D:

* ( R1(r), ..., Rm(r)) = r

– : The word loss in lossless refers to loss of information, not to loss of tuples. In fact, for ―loss of information‖ a better term is ―addition of spurious information‖

97


Decompositions (9)

• Testing Binary Decompositions for Lossless Join Property – Binary Decomposition: Decomposition of a relation R

into two relations.

– PROPERTY LJ1 (lossless join test for binary decompositions): A decomposition D = {R1, R2} of R has the lossless join property with respect to a set of functional dependencies F on R if and only if either

• The f.d. ((R1 ∩ R2) (R1- R2)) is in F+, or

• The f.d. ((R1 ∩ R2) (R2 - R1)) is in F+.

OR

• The f.d. ((R1 ∩ R2) (R1)

• The f.d. ((R1 ∩ R2) (R2 )

NONLOSS DECOMPOSITION AND

FUNCTIONAL DEPENDENCIES

Supplier _no. Status City

S3 30 Chennai

S5 30 delhi

SUPPLIERS:- TABLE(RELATION)

Supplier_ no. Status

S3 30

S5 30

SST:TABLE(A)

Supplier _no. City

S3 Chennai

S5 delhi

SC:TABLE

No information is lost. This first decomposition is indeed

nonloss

Lossy decomposition or lossy-join

decomposition

Supplier_ no. Status

S3 30

S5 30

Status City

30 Chennai

30 delhi

SST: TABLE

(B)

STC:TABL

E

Here we cannot tell which supplier has which city

This decomposition is nonloss but lossy

Example

• Leading_schema= { branchname, city,assets, customer,

loan, amount}

• R1.branch={branchname, city,assets}

• R2. loan={branchname, customer,loan,amount}

• R1 ∩ R2

• =======branch name

• R2===R21 & R22

• R21={loan, branchname, amount}

• R22={customer, loan}

• R21 ∩ R22=====loan

• It’s a lossless join dependencies.

100

Find its lossy or lossless??

101

103


Decompositions (8)

Lossless (nonadditive) join test for n-ary decompositions.

(a) Case 1: Decomposition of EMP_PROJ into

EMP_PROJ1 and EMP_LOCS fails test.

(b) A decomposition of EMP_PROJ that has the lossless

join property.

104

Properties of Relational Decompositions (8)

Lossless (nonadditive)

join test for n-ary

decompositions.

(c) Case 2:

Decomposition of

EMP_PROJ into EMP,

PROJECT, and

WORKS_ON satisfies

test.

ch 4: relational database design -...

Documents