normalizing your database and why you want to do it!

39
06/23/ 22 1 Normalizing Your Database and Why you WANT to do it! INFYS540 Lesson 7 Chapter 5 Appendix

Upload: yael

Post on 09-Jan-2016

33 views

Category:

Documents


0 download

DESCRIPTION

INFYS540. Normalizing Your Database and Why you WANT to do it!. Lesson 7 Chapter 5 Appendix. Data Redundancy Problems. Redundancy breeds errors Same data defined in multiple places is BAD Spelling/typographical error prone Lack of data integrity Inability to perform simple queries - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Normalizing Your Database  and  Why you  WANT  to do it!

04/21/23 1

Normalizing Your Database and

Why you WANT to do it!

INFYS540

Lesson 7

Chapter 5 Appendix

Page 2: Normalizing Your Database  and  Why you  WANT  to do it!

04/21/23 3

Data Redundancy Problems

• Redundancy breeds errors– Same data defined in multiple places is BAD– Spelling/typographical error prone– Lack of data integrity

• Inability to perform simple queries

• Inflexibility and inscalability

• Impossible to MAINTAIN!

Page 3: Normalizing Your Database  and  Why you  WANT  to do it!

04/21/23 5

Relational DatabasePROJECTCHIEFProject Project ChiefComputing 333-22-1111Intranet 987-65-4321Contracting 123-45-6789CAT 333-22-1111

DEPARTMENTSDept Dept. Director RoomMLD 181-94-5676 B115C2G 987-65-4321 123M&B 123-45-6789 147

EMPLOYEESLName FName SSN DeptJones Mike 123-45-6789 M&BSmith Tony 987-65-4321 C2GLee Bruce 567-89-1234 MLDDoodle Yankee 333-22-1111 M&B

1

1

What is a candidate key? What is a primary key?What is a foreign key?

Page 4: Normalizing Your Database  and  Why you  WANT  to do it!

04/21/23 11

Purpose of Normalization• Take advantage of the powerful tools

available in a DBMS• There are five levels of Normalization

– The higher the Normal Form the “better” and more efficient the database

– But, increasing the levels of Normal Form takes time and effort

– For most applications, 3rd Normal Form will solve most potential problems with a DB

Page 5: Normalizing Your Database  and  Why you  WANT  to do it!

04/21/23 12

Normalizing Database

• Process of creating well-structured tables.

• Improve performance, integrity of data

• 5-step process (w/ 2 rules) to achieve Third Normal Form (3NF)

• First two steps put DB into a form so you can normalize it

Page 6: Normalizing Your Database  and  Why you  WANT  to do it!

04/21/23 13

Rule #1 in Databases

Never design redundant data into a Databaseduplicate data is not consistent

duplicate data wastes space

Page 7: Normalizing Your Database  and  Why you  WANT  to do it!

04/21/23 14

Step 1. Primary Keys• A primary key is one or more data fields (columns)

that uniquely identify each record in the table• What would the primary key be below?

– “table of employees, assigned to a department.”

EMPLOYEESLName FName SSN DeptJones Mike 123-45-6789 MathSmith Tony 987-65-4321 M&BLee Bruce 567-89-1234 Science

Page 8: Normalizing Your Database  and  Why you  WANT  to do it!

04/21/23 15

Step 1. Primary Keys

• Answer: The SSN

• It is the only “guaranteed” unique column in the table. Names are easily repeated.

EMPLOYEESLName FName SSNDeptJones Mike 123-45-6789 MathSmith Tony 987-65-4321 M&BLee Bruce 567-89-1234 Science

Page 9: Normalizing Your Database  and  Why you  WANT  to do it!

04/21/23 16

Step 1. Primary Keys

• Now try the following example:• “A table of projects assigned to employees, listing

the project name and the employee’s function on the project.”

EmpProjCounter SSN Project Function 1 123-45-6789 Dining Designer 2 123-45-6789 Computing Designer 3 987-65-4321 Contracting Designer 4 444-55-6666 Intranet Webmaster 5 222-99-7777 Dining Overwatch

A Counter --The MS Access Default Key

Page 10: Normalizing Your Database  and  Why you  WANT  to do it!

04/21/23 17

Step 1. Primary Keys

• It is the combination of the SSN and the Project fields. Why?

EMPLOYEES’ PROJECTSCounter SSN Project Function 1 123-45-6789 Dining Designer 2 123-45-6789 Computing Designer 3 987-65-4321 Contracting Designer 4 444-55-6666 Intranet Webmaster 5 222-99-7777 Dining Overwatch

Page 11: Normalizing Your Database  and  Why you  WANT  to do it!

04/21/23 18

Step 1. Primary Keys

• Because, you can have the following:

EMPLOYEES’ PROJECTSCounter SSN Project Function1 123-45-6789 Dining Designer2 123-45-6789 Dining Designer3 987-65-4321 Intranet Designer4 444-55-6666 Intranet Webmaster5 222-99-7777 Dining Overwatch

• Redundant records! (Redundancy = BAD)

Page 12: Normalizing Your Database  and  Why you  WANT  to do it!

04/21/23 19

Rule #2 about Databases

NEVER Use a

Counter as a

Primary Key

Page 13: Normalizing Your Database  and  Why you  WANT  to do it!

04/21/23 20

Step 2: Eliminate Many-to-Many Relationships

• What is wrong with the following table? • “a table of personnel authorized access to a project”

PROJECTS QUERY ACCESSProject Access_1 Access_2 Access_3Dining 222-99-7777 181-94-5676Computing 222-99-7777 181-94-5676Intranet 987-65-4321 818-49-6765 123-45-6789

Page 14: Normalizing Your Database  and  Why you  WANT  to do it!

04/21/23 21

Step 2: Eliminate Many-to-Many Relationships

• Here’s essentially what this table looks like within the Access relationships diagram:

Projects:Project

Project ChiefDepartmentAccess_1Access_2Access_3

Employees:SSN

Last NameFirst Name

....

has access to info about

Page 15: Normalizing Your Database  and  Why you  WANT  to do it!

04/21/23 22

Step 2: Eliminate Many-to-Many Relationships

• Here’s how you model it in a database:– Break it up into two one-to-many relationships

Projects:Project

Project ChiefDepartment

....

Employees:SSN

Last NameFirst Name

....Access to

Project Info:ProjectSSN

1 1

Page 16: Normalizing Your Database  and  Why you  WANT  to do it!

04/21/23 23

Step 2: Eliminate Many-to-Many Relationships

• How to do it:– The primary key of the new table is the

composite of the primary keys of the existing tables.

• Primary key of Projects = Project Name

• Primary key of Employees = SSN

• New table primary key of Project Name and SSN

Page 17: Normalizing Your Database  and  Why you  WANT  to do it!

04/21/23 24

Step 2: Eliminate Many-to-Many Relationships

– No artificial restrictions on number of people with access

– You can add attributes about the types of access granted

– You can easily query who has access to information about each project

EMPLOYEELName FName SSNJones Mike 123-45-6789 Smith Tony 987-65-4321 Lee Bruce 567-89-1234 Doodle Yankee 333-22-1111

PROJ QUERY ACCESSProject SSNDining 222-99-7777Dining 181-94-5676Computing 222-99-7777Computing 181-94-5676Intranet 987-65-4321Intranet 818-49-6765Intranet 123-45-6789

PROJECTProject ProjectChief DeptComputing 333-22-1111 MATHIntranet 987-65-4321 M&BContracting 123-45-6789 M&BCAT 333-22-1111 Admin

Page 18: Normalizing Your Database  and  Why you  WANT  to do it!

04/21/23 25

What is wrong with the following?

“A table of PCs, which are loaded with many different applications, and assigned to a user.”

PCSerial# LoadedSoftware Assigned 10291 Word, Powerpoint, ccMail Jones 10301 Word, Powerpoint, Lotus Notes Smith 10311 Word, LotusNotes, Borland C++ Hacker

Page 19: Normalizing Your Database  and  Why you  WANT  to do it!

04/21/23 26

• “Atomic” - the data occupying a field cannot be further broken down.– i.e., no multi-data entries– i.e., “No attributes can have more than one value for a single

instance of an entity”

PCSerial# LoadedSoftware Assigned 10291 Word, Powerpoint, ccMail Jones

• If not atomic, updating is complex and error prone• If not atomic, can not easily query the database

Step 3: Achieving 1NF:All Data must be Atomic

Page 20: Normalizing Your Database  and  Why you  WANT  to do it!

04/21/23 27

Step 3 Answer

PCSerial# LoadedSoftware Assigned 10291 Word Jones 10291 Powerpoint Jones 10291 ccMail Jones 10301 Word Smith 10301 Powerpoint Smith 10301 LotusNotes Smith 10311 Word Hacker 10311 LotusNotes Hacker 10311 Borland C++ Hacker

Page 21: Normalizing Your Database  and  Why you  WANT  to do it!

04/21/23 28

Step 3. Achieving 1NF:All Data must be Atomic

Another source of redundancy: calculated fieldsTotalYTDAgeDaysRemaining

Solution: Use a Query!

Remove all calculated fields from table and create a query

...then use the query whenever you need up-to-date data

Page 22: Normalizing Your Database  and  Why you  WANT  to do it!

04/21/23 29

Step 4. Achieving 2NF:Eliminate Partial Dependencies

• What is a partial dependency?

– Look at the table. What’s redundant?– “A table of functions an employee is assigned to for a project, and the project chief.”

EMPLOYEES’ PROJECTSSSN Project Function Project Chief123-45-6789 Dining Designer 222-99-7777123-45-6789 Computing Designer 333-88-5656123-45-6789 Intranet Member 987-65-4321987-65-4321 Intranet Designer 987-65-4321444-55-6666 Intranet Webmaster 987-65-4321222-99-7777 Dining Overwatch 222-99-7777

Page 23: Normalizing Your Database  and  Why you  WANT  to do it!

04/21/23 30

Step 4. Achieving 2NF:Eliminate Partial Dependencies

• Function depends on the entire primary key: SSN and Project.• ProjectChief is dependent on just a portion of the primary key

EMPLOYEES’ PROJECTSSSN Project Function ProjectChief123-45-6789 Dining Designer 222-99-7777123-45-6789 Computing Designer 333-88-5656123-45-6789 Intranet Member 987-65-4321987-65-4321 Intranet Designer 987-65-4321444-55-6666 Intranet Webmaster 987-65-4321222-99-7777 Dining Overwatch 222-99-7777

Page 24: Normalizing Your Database  and  Why you  WANT  to do it!

04/21/23 31

Step 4. Achieving 2NF:Eliminate Partial Dependencies

• Why is this bad?– Well, what’s wrong with the following?

EMPLOYEES’ PROJECTSSSN Project Function Project Chief123-45-6789 Dining Designer 222-99-7777123-45-6789 Computing Designer 333-88-5656123-45-6789 Intranet Member 987-65-4321987-65-4321 Intranet Designer 987-65-4321444-55-6666 Intranet Webmaster 222-99-7777222-99-7777 Dining Overwatch 222-99-7777

Page 25: Normalizing Your Database  and  Why you  WANT  to do it!

04/21/23 32

Step 4. Achieving 2NF:Eliminate Partial Dependencies

• A partial dependency (PD) occurs when a non-key field depends on only a part of the primary key, and not the whole primary key.

• PDs are a relation. So, we need a new table.....EMPLOYEES’ PROJECTSSSN Project Function Project Chief123-45-6789 Dining Designer 222-99-7777123-45-6789 Computing Designer 333-88-5656123-45-6789 Intranet Member 987-65-4321987-65-4321 Intranet Designer 987-65-4321444-55-6666 Intranet Webmaster 987-65-4321222-99-7777 Dining Overwatch 222-99-7777

Page 26: Normalizing Your Database  and  Why you  WANT  to do it!

04/21/23 33

Step 4. Achieving 2NF:Eliminate Partial Dependencies

• Here’s how it should look......EMPLOYEES’ PROJECTSSSN Project Function123-45-6789 Dining Designer123-45-6789 Computing Designer123-45-6789 Intranet Member987-65-4321 Intranet Designer444-55-6666 Intranet Webmaster222-99-7777 Dining OverwatchPROJECTSProject Project ChiefDining 222-99-7777Computing 333-88-5656Intranet 987-65-4321

Page 27: Normalizing Your Database  and  Why you  WANT  to do it!

04/21/23 34

Step 5: Achieving 3NF:Eliminate Transitive Dependencies

• What is wrong with the following table?

PROJECTSProject Project Chief Dept. Dept. Director RoomDining 222-99-7777 Admin 181-94-5676 B115Computing 333-88-5656 Admin 181-94-5676 B115Intranet 987-65-4321 M&B 818-49-6765 123Contracting 187-87-8787 M&B 818-49-6765 123CAT 333-22-1111 Grounds 123-45-6789 147

Page 28: Normalizing Your Database  and  Why you  WANT  to do it!

04/21/23 35

Step 5: Achieving 3NF:Eliminate Transitive Dependencies

• We have fields dependent on a non-key field:– The Director and Room fields clearly relate to the Dept., and have nothing

to do with the project. (Dept is a “determinant” that is not a candidate key)

PROJECTSProject Project Chief Dept. Dept. Director RoomDining 222-99-7777 Admin 181-94-5676 B115Computing 333-88-5656 Admin 181-94-5676 B115Intranet 987-65-4321 M&B 818-49-6765 123Contracting 187-87-8787 M&B 818-49-6765 123CAT 333-22-1111 GRND 123-45-6789 147

Page 29: Normalizing Your Database  and  Why you  WANT  to do it!

04/21/23 36

Step 5: Achieving 3NF:Eliminate Transitive Dependencies

• A transitive dependency occurs when a non-key field depends on another non-key field.

• Why is this bad?.

– A typo appeared in the Contracting line. A database without the transitive dependency would not have allowed this to happen.

PROJECTSProject Project Chief Dept. Dept. Director RoomDining 222-99-7777 Admin 181-94-5676 B115Computing 333-88-5656 Admin 181-94-5676 B115Intranet 987-65-4321 M&B 818-49-6765 123Contracting 187-87-8787 M&B 818-49-6765 124CAT 333-22-1111 GRND 123-45-6789 147

Page 30: Normalizing Your Database  and  Why you  WANT  to do it!

04/21/23 37

Step 5: Achieving 3NF:Eliminate Transitive Dependencies

• How to do it:a. Which fields are dependent on a non-key field

in the table? (Director, Room)

b. Which fields are these dependent on? (Dept)

c. Create a new table with (b) as the primary key.

d. Put (a) in the new table.

e. Remove (a) from the old table.

Page 31: Normalizing Your Database  and  Why you  WANT  to do it!

04/21/23 38

Step 5: Achieving 3NF:Eliminate Transitive Dependencies

• Here are the new tables.

PROJECTSProject Project Chief Dept.Dining 222-99-7777 AdminComputing 333-88-5656 AdminIntranet 987-65-4321 M&BContracting 187-87-8787 M&BCAT 333-22-1111 GRNDDEPARTMENTSDept. Name Dept. Director RoomAdmin 181-94-5676 B115M&B 818-49-6765 123GRND 123-45-6789 147

Page 32: Normalizing Your Database  and  Why you  WANT  to do it!

04/21/23 39

Data Analysis: Normalization• An entity is in first normal form (1NF) if there are no

attributes that can have more than one value for a single instance of the entity.

• An entity is in second normal form (2NF) if it is already in 1NF, and if the values of all non-primary key attributes are dependent on the full primary key – not just part of it.

• An entity is in third normal form (3NF) if it is already in 2NF, and if the values of its non-primary key attributes are not dependent on any other non-primary key attributes.

Page 33: Normalizing Your Database  and  Why you  WANT  to do it!

04/21/23 41

Conclusion

• Rule1: Never design redundant data into a database • Rule2: Never use a counter as Primary Key• Identify proper primary keys (1NF)• Break up many-to-many relationships (1NF)• 1NF: Break all data into atomic components• 2NF: Identify/eliminate partial dependencies• 3NF: Eliminate transitive dependencies• Common sense test

Page 34: Normalizing Your Database  and  Why you  WANT  to do it!

04/21/23 42

What is a Good Data Model?– A good data model is simple.

• As a general rule, the data attributes that describe an entity should

describe only that entity.

– A good data model is essentially non-redundant. • This means that each data attribute, other than foreign keys, describes

at most one entity.

– A good data model should be flexible and adaptable to future needs.

• We should make the data models as application-independent as possible to encourage database structures that can be extended or modified without impact to current programs.

Page 35: Normalizing Your Database  and  Why you  WANT  to do it!

04/21/23 49

• Data and Referential Integrity– There are at least three types of data integrity that

must be designed into any database - key integrity, domain integrity and referential integrity.

– Key Integrity:• Every table should have a primary key (which may be

concatenated). – The primary key must be controlled such that no two records

in the table have the same primary key value.

– The primary key for a record must never be allowed to have a NULL value.

Page 36: Normalizing Your Database  and  Why you  WANT  to do it!

04/21/23 50

• Data and Referential Integrity– Domain Integrity:

• Appropriate controls must be designed to ensure that no field takes on a value that is outside of the range of legal values.

– Referential Integrity:• A referential integrity error exists when a foreign

key value in one table has no matching primary key value in the related table.

Page 37: Normalizing Your Database  and  Why you  WANT  to do it!

04/21/23 51

• Referential Integrity:• Referential integrity is specified in the form of deletion rules

as follows:– No restriction.

• Any record in the table may be deleted without regard to any records in any other tables.

– Delete:Cascade.

• A deletion of a record in the table must be automatically followed by the deletion of matching records in a related table.

– Delete:Restrict.

• A deletion of a record in the table must be disallowed until any matching records are deleted from a related table.

– Delete:Set Null.

• A deletion of a record in the table must be automatically followed by setting any matching keys in a related table to the value NULL.

Page 38: Normalizing Your Database  and  Why you  WANT  to do it!

04/21/23 54

• Database Capacity Planning– A database is stored on disk.

• The database administrator will want an estimate of disk capacity for the new database to ensure that sufficient disk space is available.

– Database capacity planning can be calculated with simple arithmetic as follows.

1 For each table, sum the field sizes. – This is the record size for the table.

2 For each table, multiply the record size times the number of entity instances to be included in the table.

– This is the table size.

Page 39: Normalizing Your Database  and  Why you  WANT  to do it!

04/21/23 55

• Database Capacity Planning– Database capacity planning can be calculated

with simple arithmetic as follows. (continued)3 Sum the table sizes.

– This is the database size.

4 Optionally, add a slack capacity buffer (e.g., 10%) to account for unanticipated factors or inaccurate estimates above.

– This is the anticipated database capacity.