improving the quality of database designs (adapted from david kroenke, dabase processing)
TRANSCRIPT
Improving the Quality of Database Designs
(Adapted from David Kroenke, Dabase Processing)
Improving the Quality of Database Designs
• Minimizing Redundancy in Database• Avoiding Anomalies • Function Dependency• Normal Forms
o First Normal Formo Second Normal Formo Third Normal Form
• Exercise Problems
Minimizing Redundancy in DB
• Redundancyo Wastes spaceo Wastes timeo Causes Anomalies (incorrect data)
Avoiding Anomalies
• Causeso Update Anomalyo Insertion Anomalyo Deletion Anomaly
DVD Table
dvdID acquired title genre length studio country
120 1/25/03 The 39 Steps Mystery 120 ABC USA
150 2/5/03 Elizabeth Drama 105 XYZ England
172 12/31/03 Lady & Tramp Animation 93 DEF Poland
157 3/25/03 Elizabeth Drama 105 XYZ England
110 5/12/02 Annie Hall Comedy 120 ABC USA
125 3/8/03 Elizabeth Drama 105 XYZ England
Back to UA Back to IA Back to DA
Update Anomaly
• Situation in whichUpdate in one record requires update in another record.
• E.g.Suppose for dvdID #150 (Elizabeth),length is changed to 100.If length values in devID #157 and #125 are not changed also, we have anomalies.
To DVD
Insertion Anomaly
• Situation in whichAdding a record results in an inconsistency
• Suppose another copy of The 39 Steps is added to the table. If its values of genre, length, and rating are not the same as those dvdID #120, we have an anomaly.
To DVD
Deletion Anomaly
• Situation in whichDeleting one record results in unintended loss of data
• Suppose dvdID #172 is removed. Then all data items regarding studio DEF and its country (Poland) —will be lost.
To DVD
Functional Dependency
• Definition• Given: A and B are attributes of relation (table) R
Then B is functionally dependent on A if and only if each value in A has associated with it exactly one value of B in R.
• A B ( A determines B)• I.e., any 2 rows with same value for A will have
the same value for B
Functional Dependence (1)
• DVD (title, publisher, length, director, pubAddress)
o publisher pubAddress (yes)o title length (no)o title, publisher length (yes)
Back to 2NF
Functional Dependence (2)
• Books (bkID, ISBN, title, author, pubAddress)o ISBN title (yes)o ISBN author (yes)o bkID title (yes)o bkID author (yes)o bkID pubAddress (yes)o title, publisher length (yes)
• A primary key determine each nonkey attribute
First Normal Form (1NF)
• A relation (table) is in 1NF ifo Each row is unique (with primary key)o All attributes are atomic
Second Normal Form (2NF)
• A relation (table) is in second normal form if
o All nonkey attributes are dependent on all of the key. (This means that the relation is not in 2NF if any nonkey attribute is dependent on only part of the key.)
• E.g., in DVD, length is dependent only on title, but not on publisher. To FD1
2NF? (No)
stdID activities fee
100 Skiing 200
100 Golf 65
150 Swimming 50
175 Squash 50
175 Swimming 50
200 Swimming 50
200 Golf 65
StudentdActivities
Back to Problems
Problems
• Noteo Key: stdID + activitieso Attribute fee is dependent only on activities (partial key).
• Problemso There are obvious redundancies.o If student 175 is removed, fee($50) for Squash is
deleted.o A new activity—say Surfing—cannot be entered until a
student is entered
To 2NF
Solution
• Remove the attribute that is dependent only on part of the key and form a new table
• Create a link between the new and the original tables using a foreign key
• Note: if a relation (table) is 1NF and the primary key consists of a single attribute, the relation is automatically 2NF.
Solution
stdID activities
100 Skiing
100 Golf
150 Swimming
175 Squash
175 Swimming
200 Swimming
200 Golf
Activities fee
Skiing 200
Golf 65
Swimming 50
Squash 50
Activities Fees
Third Normal Form (3NF)
• A relation is in 3NF ifo It is in 2NF ando There are no transitive dependencies. (I.e.,
every nonkey attribute is dependent only on the primary key.)
• Table satisfying 3NF (in common terms)o Should have a field that uniquely identifies
each recordo Each field in the table should describe the
subject that the table represents
3NF? (No)
stdID building fee
100 Randolf 1200
150 Ingersoll 1100
200 Randolf 1200
250 Pitkin 1100
300 Randolf 1200
StudentHousing
Back to Problems
Transitive Dependence
• stdID building (I.e., building is dependent on stdID)
• building fee (I.e., fee is dependent on building)
• Thus,stdID building fee
Problems
• StdHousing is in 2NF, buto Redundant data will introduce modification
anomalyo Removing stdID 150 deletes fee value for
Ingersollo Fee for a new building—say Barrett—cannot
be recorded until a new stdID is entered
To 3NF
Solution
• Remove data that is not dependent on primary key and form new relation
• Create a relationship between the new and the original tables using foreign key
Solution
stdID Building
100 Randolf
150 Ingersol
200 Randolf
250 Pitkins
300 Randolf
Building Fee
Randolf 1200
Ingersoll 1100
Pitkins 1100
ResidenceFeeStudentResidence
Try This (Customers Table)
Back to Problem
Problem
• Note thato custNum ZIP ZIP city, state
I.e., custNum ZIP city, stateo Transitive dependence results in redundancy
and modification, insertion, & deletion anomalies.
To Customers
Solution
Summary
• Examine the attributes of an entity and ask the following questions. If the answer is any “Yes,” an attribute probably belong to another entity.
o Does an attribute or attributes describe an entity other than the current one?
o Does an attribute of the entity depend (functionally dependent) on only part of the primary key?
o Does an attribute depend on something other than the primary key?
empIdempLastNameempFirstNameempMiddleNameempAddressempCityempStateempZipempPhoneempPagerempPositionempPositionDescripempDateHireempPayRateempDateLastRaise
custIdcustNamecustAddresscustCitycustStatecustZipcustPhonecustFaxorderNumorderQuantityorderDate
prodIdprodDescripprodCost
Employees
empIdempLastNameempFirstNameempMiddleNameempAddressempCityempStateempZipempPhoneempPagerempPositionempPositionDescripempDateHireempPayRateempDateLastRaise
Customers
custIdcustNamecustAddresscustCitycustStatecustZipcustPhonecustFaxorderNumorderQuantityorderDate
Products
prodIdprodDescripprodCost
Company Database
Company Database (2)Employees
empIdempLastNameempFirstNameempMiddleNameempAddressempCityempStateempZipempPhoneempPagerempPositionempDateHireempPayRateempDateLastRaise
Employees
empIdempLastNameempFirstNameempMiddleNameempAddressempCityempStateempZipempPhoneempPager
EmployeePaysempIdempPositionempPositionDescripempDateHireempPayRateempDateLastRaise
Company Database
Customers
custIdcustNamecustAddresscustCitycustStatecustZipcustPhonecustFaxorderNumorderQuantityorderDate
Customers
custIdcustNamecustAddresscustCitycustStatecustZipcustPhonecustFax
Orders
custIdorderNumorderQuantityorderDate
QuizNormalization is the process of grouping data into logically related data into tables to reduce redundancy. (T/F)
Having no duplicate or redundant data in a database, and having everything in the database normalized, is always the best way to go. (T/F)
If data is in the third normal form, it is automatically in the first and second normal forms. (T/F)
What is the major advantage of denormalized database versus a normalized database?
What are some major disadvantages of unnormalized database?
Exercise: What Type of Relationships Do the Tables Have?
Positions
os_idpositionposition_descrip
EmployeePays
empPayIdempDateHireempPayRateempDateLastRaise
Orders
orderNumorderQuantityorderDate
Customers
custIdcustNamecustAddresscustcitycustStatecustZipcustPhonecustFax
Employees
empIdempLastNameempFirstNameempMiddleNameempAddressempCityempStateempZipempPhoneempPager
Exercise: Normalize the following data.
Take the following data and normalize it. Keep in mind that, in a real DB, there would be many more items than what is given here.
Employees:
Angela Smith, secretary, RR 1 Box 73, Greensburg, IN, 47890, $9.50/hour, started Jan. 22, 1996, SSN is 323149669
Jack Lee Nelson, salesman, 3334 N. Main St., Brownsburg, IN, 45687, 317-852-9901, $35,000.00/year, data started 10/28/95, SSN is 312567342
Customers:
Robert’s Games & Things, 5612 Lafayette Rd., Indianapolis, IN, 46224, 317-291-7888, customer ID is 432A
Reed’s Dairy Bar, 4556 W 10th St., Indianapolis, IN, 46245, 317-271-9823, customer ID is 117A
CustomerOrders:Customer ID is 117A, date of last order is 2/20/1997, product ordered was napkins, and product ID is 661
Tables
Employees Customers OrdersSsnlastNamefirstNamestreetcitystatezipphoneNumsalaryhourlyRatestartDateposition
customerIDnamestreetcitystatezipphoneNum
orderIDcustomerIDproductIDproductDescripdateOrdered
Solutions