pmit-6102 advanced database systems
DESCRIPTION
PMIT-6102 Advanced Database Systems. By- Jesmin Akhter Assistant Professor, IIT, Jahangirnagar University. Lecture 04 Relational Database Design Normalization. Outline. Overview of Relational DBMS Normalization. Normalization. - PowerPoint PPT PresentationTRANSCRIPT
PMIT-6102Advanced Database Systems
By-Jesmin Akhter
Assistant Professor, IIT, Jahangirnagar University
The aim of normalization is to eliminate various anomalies (or undesirable aspects) of a relation in order to obtain “better” relations.
The following four problems might exist in a relation scheme: Repetition anomaly Update anomaly Insertion anomaly Deletion anomaly
Slide 4
Normalization
Repetition Anomaly
The NAME,TITLE, SAL attribute values are repeated for each project that the employee is involved in. Waste of space Complicates updates Contrary to the spirit of databases
ENO
EMP
ENAME TITLE SAL
J. Doe Elect. Eng. 40000M. Smith 34000M. Smith
AnalystAnalyst 34000
A. Lee Mech. Eng. 27000A. Lee Mech. Eng. 27000J. Miller Programmer 24000B. Casey Syst. Anal. 34000L. Chu Elect. Eng. 40000R. Davis Mech. Eng. 27000
E1E2E2E3E3E4E5E6E7E8 J. Jones Syst. Anal. 34000
24
PNO RESP DUR
P1 Manager 12P1 AnalystP2 Analyst 6P3 Consultant 10P4 Engineer 48P2 Programmer 18P2 Manager 24P4 Manager 48P3 Engineer 36P3 Manager 40
Update Anomaly
If any attribute of project (say SAL of an employee) is updated, multiple tuples have to be updated to reflect the change.
ENO
EMP
ENAME TITLE SAL
J. Doe Elect. Eng. 40000M. Smith 34000M. Smith
AnalystAnalyst 34000
A. Lee Mech. Eng. 27000A. Lee Mech. Eng. 27000J. Miller Programmer 24000B. Casey Syst. Anal. 34000
L. Chu Elect. Eng. 40000
R. Davis Mech. Eng. 27000
E1E2E2E3E3E4E5
E6
E7E8 J. Jones Syst. Anal. 34000
24
PNO RESP DUR
P1 Manager 12P1 AnalystP2 Analyst 6P3 Consultant 10P4 Engineer 48P2 Programmer 18P2 Manager 24P4 Manager 48
P3 Engineer 36
P3 Manager 40
Insertion Anomaly
It may not be possible to store information about a new project until an employee is assigned to it.
ENO
EMP
ENAME TITLE SAL
J. Doe Elect. Eng. 40000M. Smith 34000
M. Smith
Analyst
Analyst 34000A. Lee Mech. Eng. 27000
A. Lee Mech. Eng. 27000J. Miller Programmer 24000
B. Casey Syst. Anal. 34000
L. Chu Elect. Eng. 40000
R. Davis Mech. Eng. 27000
E1E2
E2E3
E3E4
E5
E6
E7
E8 J. Jones Syst. Anal. 34000
24
PNO RESP DUR
P1 Manager 12P1 Analyst
P2 Analyst 6
P3 Consultant 10
P4 Engineer 48P2 Programmer 18
P2 Manager 24
P4 Manager 48
P3 Engineer 36
P3 Manager 40
Deletion Anomaly If an engineer, who is the only employee on a project,
leaves the company, his personal information cannot be deleted, or the information about that project is lost.
May have to delete many tuples.
ENO
EMP
ENAME TITLE SAL
J. Doe Elect. Eng. 40000M. Smith 34000M. Smith
AnalystAnalyst 34000
A. Lee Mech. Eng. 27000A. Lee Mech. Eng. 27000J. Miller Programmer 24000B. Casey Syst. Anal. 34000L. Chu Elect. Eng. 40000
R. Davis Mech. Eng. 27000
E1E2E2E3E3E4E5E6
E7E8 J. Jones Syst. Anal. 34000
24
PNO RESP DUR
P1 Manager 12P1 AnalystP2 Analyst 6P3 Consultant 10P4 Engineer 48P2 Programmer 18P2 Manager 24P4 Manager 48P3 Engineer 36
P3 Manager 40
What to do? Take each relation individually and “improve” it in terms
of the desired characteristics Normal forms
o Atomic values (1NF)o Can be defined according to keys and dependencies.o Functional Dependencies ( 2NF, 3NF, BCNF)o Multivalued dependencies (4NF) o Projection-join dependencies (5NF)
Normalizationo Normalization is a process of concept separation which applies a
top-down methodology for producing a schema by subsequent refinements and decompositions.
o Do not combine unrelated sets of facts in one table; each relation should contain an independent set of facts.
o Universal relation assumption
Normalization Issues How do we decompose a schema into a desirable
normal form? What criteria should the decomposed schemas follow
in order to preserve the semantics of the original schema? Reconstructability: recover the original relation no spurious
joins Lossless decomposition: no information loss Dependency preservation: the constraints (i.e., dependencies)
that hold on the original relation should be enforceable by means of the constraints (i.e., dependencies) defined on the decomposed relations.
Example of Lossless-Join Decomposition
Lossless join decomposition Decomposition of R = (A, B, C)
R1 = (A, B) R2 = (B, C)
A B
1
2
A
B
1
2
r B,C(r)
A (r) B (r)A B
1
2
C
A
B
B
1
2
C
A
B
C
A
B
A,B(r)
Unnormalized
(UDF)
First normal form
(1NF)
Remove repeating groups
Second normal form
(2NF)
Remove partial dependencies
Third normal form
(3NF)
Remove transitive dependencies
Boyce-Codd normal
form (BCNF)
Remove remaining functional
dependency anomalies
Fourth normal form
(4NF)
Remove multivalued dependencies
Fifth normal form
(5NF)
Remove remaining anomalies
Stages of Normalization
Repeating Groups
A repeating group is an attribute (or set of attributes) that can have more than one value for a primary key value.
staffNo job dept dname city contact Number
SL10 Salesman 10 Sales Stratford 018111777, 018111888, 079311122
SA51 Manager 20 Accounts Barking 017111777
DS40 Clerk 20 Accounts Barking Null
OS45 Clerk 30 Operations Barking 079311555
Example We have the following relation that contains staff and department details and a list of telephone contact numbers for each member of staff.
Repeating Groups are not allowed in a relational design, since all attributes have to be ‘atomic’ - i.e., there can only be one value per cell in a table!
Multivalued Attributes (or repeating groups): non-key attributes or groups of non-key attributes the values of which are not uniquely identified by (directly or indirectly) (not functionally dependent on) the value of the Primary Key (or its part).
STUDENT
Stud_ID Name Course_ID Units
101 Lennon MSI 250 3.00
101 Lennon MSI 415 3.00
125 Johnson MSI 331 3.00
Stud_ID Name Course_ID Units101 Lennon MSI 250, MSI 415 3.00
125 Johnson MSI 331 3.00
Repeating Groups
STUDENT
Functional Dependency
Formal Definition: Attribute B is functionally dependant upon attribute A (or a collection of attributes) if a value of A determines a single value of attribute B at any one time.
Formal Notation: A B This should be read as ‘A determines B’ or ‘B is functionally dependant on A’. A is called the determinant and B is called the object of the determinant.
staffNo job dept dname
SL10 Salesman 10 Sales
SA51 Manager 20 Accounts
DS40 Clerk 20 Accounts
OS45 Clerk 30 Operations
Example:
staffNo job
staffNo dept
staffNo dname
dept dname
Functional Dependencies
Functional Dependency
Full Functional Dependency: Only of relevance with composite determinants. This is the situation when it is necessary to use all the attributes of the composite determinant to identify its object uniquely.
order# line# qty price
A001 001 10 200
A002 001 20 400
A002 002 20 800
A004 001 15 300
Example:
(Order#, line#) qty
(Order#, line#) price
Full Functional Dependencies
Compound Determinants: If more than one attribute is necessary to determine another attribute in an entity, then such a determinant is termed a composite determinant.
Functional Dependency
Partial Functional Dependency: This is the situation that exists if it is necessary to only use a subset of the attributes of the composite determinant to identify its object uniquely.
(student#, unit#) grade
Full Functional Dependencies
unit# room
Partial Functional Dependencies
Repetition of data!
student# unit# room grade
9900100 A01 TH224 2
9900010 A01 TH224 14
9901011 A02 JS075 3
9900001 A01 TH224 16
Partial Dependency – when an non-key attribute is determined by a part, but not the whole, of a COMPOSITE primary key.
CUSTOMER
Cust_ID Name Order_ID
101 AT&T 1234
101 AT&T 156
125 Cisco 1250
Partial Dependency
Functional Dependency
Transitive DependencyDefinition: A transitive dependency exists when there is an intermediate functional dependency.
Formal Notation: If A B and B C, then it can be stated that the following transitive dependency exists: A B C
staffNo dept
dept dname
staffNo dept dname
Transitive Dependencies
Repetition of data!
staffNo job dept dname
SL10 Salesman 10 Sales
SA51 Manager 20 Accounts
DS40 Clerk 20 Accounts
OS45 Clerk 30 Operations
Example:
Transitive Dependency – when a non-key attribute determines another non-key attribute.
EMPLOYEE
Emp_ID F_Name L_Name Dept_ID Dept_Name
111 Mary Jones 1 Acct
122 Sarah Smith 2 Mktg
Transitive Dependency
Transitive Dependency