Monash University 2004 1
Summer Semester 2004/2005
Lecture 7
Data Modelling and Normalisation
IMS1907 Database Systems
Monash University 2004 2
Transforming an ER diagram into normalised relations, and then merging all the relations into one final, consolidated set of relations can be accomplished in four steps
1. Represent entities as relations
2. Represent relationships as relations
3. Normalise each relation
4. Merge the relations with a common PK
Transforming ER Models to Relations
Monash University 2004 3
Customer (Customer-no, Name, Address, City, State,
Postcode, Discount)
Representing Entities as Relations
Each entity in the ER diagram is transformed into a relation
CUSTOMER
Monash University 2004 4
The multi-valued attribute becomes a relation with a composite key consisting of the primary key of the entity and the multi-valued attribute’s partial identifier
EMPLOYEE
Customer (Emp_ID, Name, Address)
Employee Skill (Emp_ID, Skill)
SkillEmp_ID
Multi-valued Attributes as Relations
Monash University 2004 5
The weak entity becomes a relation with a composite key consisting of the primary key of the base entity and the weak entity’s partial identifier
Movie (Movie-ID, Title, Release-date)
Videotape Copy (Movie-ID, Copy-No, Condition)
Representing Weak Entities as Relations
MOVIEVIDEOTAPE
COPY
Monash University 2004 6
Binary Relationships (1:N, 1:1)
For 1:N, add the primary key of the entity on the ‘one’ side of the relationship as a foreign key in the relation that is on the ‘many’ side
For 1:1 relationship involving entities A and B, choose from- add the primary key of A as a foreign key of B- add the primary key of B as a foreign key of A- both of the above – at a cost!- use business rules and good sense to guide choice
Representing Relationships as Relations
Monash University 2004 7
Binary Relationships (1:N)
CUSTOMER ORDER
Customer (Customer-no, Name, Address, City, State, Postcode, Discount)
Order (Order-no, Order-date, Promised-date, Customer-no)
Representing Relationships as Relations
places
Monash University 2004 8
CARECENTRE
Binary Relationships (1:1)
NURSE
Nurse (Nurse-no, Name, Address, Qualification)
Care Centre (Care-Centre-No, Location, Type, Nurse-no)
Representing Relationships as Relations
manages
Monash University 2004 9
Binary and Higher Degree Relationships (M:N)
If we wish to know the quantity of a product on an order, this attribute is an attribute of the relationship ‘ordered on’
Representing Relationships as Relations
PRODUCT ORDEROrdered on
Qty_ord
Monash University 2004 10
Order (Order-no, Order-date, Promised-date)
Order Line (Order-no, Product-no, Quantity-ordered)
Product (Product-no, description, (other attributes))
For M:N, first create a relation for each for each of the entity types, then create a relation for the relationship, with a composite primary key formed from the primary keys of the participating entity types
PRODUCT ORDERORDER-LINE
Representing Relationships as Relations
Monash University 2004 11
Unary or Recursive Relationships (M:N)
Item (Item-no, Name, Cost)
Item-Bill (Item-no, Component-no, Quantity)
Representing Relationships as Relations
ITEM is part of
Monash University 2004 12
Unary or Recursive Relationships (1:N)
Employee (Emp-id, Name, Birthdate, Manager-id)
Representing Relationships as Relations
EMPLOYEE supervises
Monash University 2004 13
Patient-Treatment (Patient-id, Doctor-id, Treatment-code, Date, Time, Result)
Ternary or Higher Relationships
Patient (Patient-id, Patient-name)
Doctor (Doctor-id, Doctor-name)
Treatment (Treatment-code, Description)
Representing Relationships as Relations
DOCTOR
treats
TREATMENT
PATIENT
Monash University 2004 14
Supertype-subtype relationships
PROPERTY
BEACH PROPERTY
MOUNTAINPROPERTY
Property (Property-ID, Address, City, Postcode, No-rooms, Rent)
Beach (Property-ID, Nearest-beach, Distance-to-beach)
Mountain (Property-ID, Skiing)
Representing Relationships as Relations
Monash University 2004 15
Normalisation is a process for converting complex data structures into simple, stable data structures in the form of relations
Data models consisting of normalised relations- are robust, stable flexible and have minimum
redundancy- simplify enforcement of referential integrity constraints- make data maintenance easier- are an improved representation of real world objects
Normalisation
Monash University 2004 16
Normalisation ensures that each attribute is attached to the appropriate relation
Each attribute is contained in the relation which represents the real world system object or concept that the attribute describes or is a property of- the attribute Student-name should be in the relation
STUDENT which represents the real world object “student” of interest to a student records system
Normalisation
Monash University 2004 17
Originally developed as part of relational database theory
Accomplished in stages, each of which corresponds to a “normal form”
Codd defined first, second and third normal forms– third normal form is adequate for most business
applications
Later extensions include Boyce-Codd, 4th, 5th and domain-key normal forms
Normalisation
Monash University 2004 18
Central to normalisation and well-structured relations
A functional dependency is a particular relationship between attributes in a relation
For any relation R, with attributes ‘a’ and ‘b’, if for each value of ‘a’ there is only ever one value of attribute ‘b’ associated with it, attribute ‘b’ is said to be functionally dependent on attribute ‘a’
Functional Dependency
Monash University 2004 19
If for every valid instance of A, that value of A uniquely determines the value of B, B is functionally dependent on A
A identifies B
A B
Emp# Emp-name
Emp# Salary
Functional Dependency
Monash University 2004 20
Normalisation to third normal form is accomplished in three steps each corresponding to a basic normal form
A normal form is a state of a relation that can be determined by applying simple rules concerning dependencies within that relation
Each step of the normalisation process is applied to a single relation in sequence so that the relation is converted to third normal form
All identified relations are normalised
Normalisation
Monash University 2004 21
First Normal Form 1NF
Second Normal Form
2NF
Third Normal Form 3NF
Unnormalised table
Remove repeating groups
Remove partial dependencies
Remove transitive dependencies
Steps in Normalisation
Monash University 2004 22
A relation is in first normal form if it contains no repeating data – the value of the data at the intersection of each row and
column must be single-valued
Remove any repeating groups of attributes to convert a relation to 1NF – key of the removed group will be a composite key
First Normal Form 1NF
Monash University 2004 23
Identify primary key
Identify repeating groups
Remove repeating groups
Order (Order#, Date, Customer#, Name, (Item#, Desc, Qty))
Order (Order#, Date, Customer#, Name)
Order-Item (Order#, Item#, Desc)
First Normal Form 1NF
Monash University 2004 24
A relation is in 2NF - if it is in 1NF- no part of the primary key is dependent on any other
part of the primary key- no non-key attribute is dependent on only part of the
primary key
Second Normal Form 2NF
Monash University 2004 25
Converting a relation to 2NF involves removing partial dependencies
A partial dependency exists - if one part of a composite primary key is dependent on
another part of the key- if one or more non-key attributes are dependent on only
part of a composite primary key
Second Normal Form 2NF
Monash University 2004 26
Remove any partial dependencies to convert a 1NF relation to 2NF
If the primary key consists of only one attribute then a 1NF relation is automatically in 2NF
If a 1NF relation has no non-key attributes, as long as there are no dependencies between the parts of the primary key, it is automatically in 2NF
Second Normal Form 2NF
Monash University 2004 27
Remove partial dependencies
A non-key attribute cannot be identified by part of a composite key
Order (Order#, Item#, Desc, Qty-ordered)
Order-Item (Order#, Item#, Qty-ordered)
Item (Item#, Desc)
Second Normal Form 2NF
Monash University 2004 28
Part of a composite primary key cannot be identified by another part of the primary key
DEPT (Dept#, Dept-name, (Emp#, Emp-name))
DEPT (Dept#, Dept-name)
DEPT-EMP (Dept#, Emp#, Emp-name)
But
DEPT-EMP (Dept#, Emp#, Emp-name)
Remove Dept# from the key:
EMP (Emp# ,Emp-name, Dept#)
Dependencies Within the Primary Key
Monash University 2004 29
Order# Item# Item-desc Qty
27
28
28
873
402
873
nut
bolt
nut
2
1
10
Order-Item
30 495 washer 50
- UPDATE - change item-desc in many places- DELETE - data for last item lost when last order for that
item is deleted- CREATE - cannot add new item until it is ordered
Partial Dependency Anomalies
Monash University 2004 30
Order# Item# Qty
27
28
28
873
402
873
2
1
10
Order
30 495 50
Item# Desc
873
402
nut
bolt
495 washer
delete last order for item, but item remains
add new item at any time
change item description in one place only
Item
Solution to Anomalies – 2NF
Monash University 2004 31
A relation is in 3NF if it is in 2NF and no transitive dependencies exist
A transitive dependency is a functional dependency between two or more non-key attributes– if a relation has no or only one non-key attribute then by
definition, a transitive dependency cannot exist
Remove any transitive dependencies to convert a 2NF relation to 3NF
Third Normal Form 3NF
Monash University 2004 32
Employee (Emp#, Ename, Dept#, Dname)
Employee (Emp#, Ename, Dept#)
Department(Dept#, Dname)
(look for foreign keys and their attributes)
Remove Transitive Dependencies
A non-key attribute cannot be identified by another non-key attribute
Third Normal Form 3NF
Monash University 2004 33
Emp# Emp-name Dept# Dname
10
20
25
Smith
Jones
Smith
D5
D7
D7
MIS
Finance
Finance
Employee
30 Black D8 Sales
UPDATE - change dept name in many places
DELETE - data for dept lost when last employee for that dept is deleted
CREATE - cannot add new dept until an employee is allocated to it
Transitive Dependency Anomalies
Monash University 2004 34
Emp# Ename Dept#
10
20
25
Smith
Jones
Smith
D5
D7
D7
Employee
30 Black D8
Dept# Dname
D5
D7
MIS
Finance
D8 Sales
delete last emp in dept, but dept remains
add new dept at any time
change dept name in one place
Item
Solution to Anomalies – 3NF
Monash University 2004 35
A relation is normalised to 3NF if all attributes are fully functionally dependent on the primary key- remove repeating groups- remove partial dependencies- remove transitive dependencies
Normalisation to 3NF
Monash University 2004 36
During the normalisation process two or more relations with the same primary key may appear
The set of 3NF relations must not contain any duplicate data
Relations with the same primary key should be merged
Normalisation - Merging Relations
Monash University 2004 37
Synonyms
Two or more attributes may have different names but the same meaning
Either adopt one of the names as a standard or choose a third name
STUDENT1 (Student-id, Name, Phone-no)
STUDENT2 (VCE-no, Name, Address)
STUDENT (Student-id, Name, Address, Phone-no)
Normalisation of Relations
Monash University 2004 38
Homonyms
Two or more attributes may have the same name but different meanings
To resolve the conflict, new attribute names need to be created
STUDENT1 (Student-id, Name, Address)
STUDENT2 (Student-id , Name, Phone, Address)
STUDENT (Student-id, Name, Phone, Campus-address, Permanent-
address )
Normalisation of Relations
Monash University 2004 39
When two 3NF relations are merged, transitive dependencies may result
STUDENT1 (Student-id, Major)STUDENT2 (Student-id , Advisor)
STUDENT (Student-id, Major, Advisor)
But MAJOR ADVISOR (dependency!)
STUDENT (Student-id, Major)MAJOR (Major , Advisor)
Normalisation of Relations
Monash University 2004 40
A set of 3NF relations may be converted to a simple diagrammatic form to begin physical database design
The conversion is simple
1. Draw a named rectangle for each relation
2. Draw a relationship line between rectangles linked by foreign keys with a “many” cardinality at the foreign key end of the relationship
Data Structure Diagrams (DSD)
Monash University 2004 41
CUSTOMER
SALESORDER
ITEM
SALES ORDERLINE
CUSTOMER (Cust#, Cname,Phone number)
SALES ORDER (Sord#, Sord-date, Cust#)
SALES ORDER-ITEM (Sord#, Item#, Qty)
ITEM (Item#, Item-desc)
Data Structure Diagrams (DSD)
Monash University 2004 42
(Tourcode, ...)
(Tourcode, depdate, ...)
(Booking#, ... , Tourcode, depdate)
redundant
TOUR
DEPARTURE
BOOKING
Eliminate redundant relationships
Data Structure Diagrams (DSD)
Monash University 2004 43
Detailed data modelling involves– collecting detailed attributes for each entity and
relationship identified– converting ER models to relations– normalising the relations– merging relations from each user viewpoint– converting the normalised and merged relations to create
a data structure diagram
Detailed Data Modelling - Summary
Monash University 2004 44
Hoffer, J.A., George, J.F. & Valacich, J.S., (2002), 3rd ed., Modern Systems Analysis and Design, Prentice-Hall, New Jersey, Ch 10,12.
Whitten, J.L., Bentley, L.D. & Dittman, K.C., (2001), 5th ed., Systems Analysis and Design Methods, Irwin McGraw-Hill, New York, NY, Ch 7
References