normalization 1st to 5th nf with example
TRANSCRIPT
Presented By:-
Deepshikha Jaiswal (2011MCA022)
Sourabh Kumar Kushawha (2011MCA020)
Normalization
Database Management Systems
File SystemFile: A collection of records or documents dealing with one organization, person, area or subject.Manual (paper) filesComputer files
Drawbacks of File SystemsData Redundancy
Data Redundancy means same information is duplicated in several files.
Data InconsistencyThat means different copies of same basic data are existing. This occurs as the result of update operations that are not updating the same data stored at different places.
Difficulty in Accessing DataIt is not easy to retrieve information using a conventional file processing system.
Data IsolationData are scattered in various files, and the files may be in different format, writing new application program to retrieve data is difficult.
Continue…
Continued…
Integrity ProblemsThe data values may need to satisfy some integrity constraints. For example the balance field Value must be grater than 5000. We have to handle this through program code in file processing systems. But in database we can declare the integrity constraints along with definition itself.
Concurrent Access anomaliesIf multiple users are updating the same data simultaneously it will result in inconsistent data state. In file processing system it is very difficult to handle this using program code. This results in concurrent access anomalies.
Security ProblemsEnforcing Security Constraints in file processing system is very difficult as the application programs are added to the system in an ad-hoc manner.
Database SystemTo overcome the drawbacks of ‘File System’ we introduced the concept of ‘Database Systems’.
Definition:-
A Database is a collection of data that represents the part of real world or A collection of similar records with relationships between the records. It is usually managed by a program system called a ‘Database Management Systems or DBMS’.
E.g.:- Bibliographic, statistical, business data, images, etc.
Database AnomaliesDatabase anomalies are the problems in relations that occur due to redundancy in the relations. These anomalies affect the process of inserting, deleting and modifying data in the relations. Some important data may be lost if a relations is updated that contains database anomalies. It is important to remove these anomalies in order to perform different processing on the relations without any problem.
When an attempt is made to modify (update, insert into, or delete from) a table, undesired side-effects may follow. Not all tables can suffer from these side-effects; rather, the side-effects can only arise in tables that have not been sufficiently normalized.
Types of Anomalies:
There are three types of Anomalies, which are: -UPDATION ANOMALY: - Any change made to your
data will require you to scan all records to make the changes multiple time.
INSERTION ANOMALY: - When you insert a record without having it stored on the related record.
DELETION ANOMALY: - When you delete some information and lose valuable related information at the same time.
Example: - Update Anomaly
Each record in an "Employees' Skills" table might contain an Employee ID, Employee Address, and Skill; thus a change of address for a particular employee will potentially need to be applied to multiple records (one for each of his skills). If the update is not carried through successfully—if, that is, the employee's address is updated on some records but not others—then the table is left in an inconsistent state. Specifically, the table provides conflicting answers to the question of what this particular employee's address is. This phenomenon is known as an update anomaly.
Example: - Insertion Anomaly
Each record in a "Faculty and Their Courses" table might contain a Faculty ID, Faculty Name, Faculty Hire Date, and Course Code—thus we can record the details of any faculty member who teaches at least one course, but we cannot record the details of a newly hired faculty member who has not yet been assigned to teach any courses except by setting the Course Code to null. This phenomenon is known as an insertion anomaly.
Example: - Deletion Anomaly
Under certain circumstances, deletion of data representing certain facts necessitates deletion of data representing completely different facts. The "Faculty and Their Courses" table described in the previous example suffers from this type of anomaly, for if a faculty member temporarily ceases to be assigned to any courses, we must delete the last of the records on which that faculty member appears, effectively also deleting the faculty member. This phenomenon is known as a deletion anomaly.
WHAT IS NORMALISATION??Normalization process has proposed by “E. F. Codd” in (1972).
Normalisation is a technique used for designing relational database tables to minimise duplication of information. Data is normalised in order to redundancy and inconsistency, and to make it easier to maintain.
Normalization generally involves splitting existing tables into multiple ones, which must be re-joined or linked each time a query is issued.
NORMALIZATION
Normalisation provides rules that help: Organise data efficiently Ensure that only related data are stored in the
table.
There are two goals of the normalization process: Eliminate redundant data (for example, storing
the same data in more than one table) and Ensure data dependencies make sense.
Both of these are worthy goals as they reduce the amount of space a database consumes and ensure that data is logically stored.
The entire normalization process is based upon:-
The analysis of tables,
Their keys (Primary, foreign, candidate, etc.),
Their schemas,
Their functional dependency.
Normal form is a state of a relation that result by decomposing that relation for a design to avoid redundancy.
The data base community has developed a series of guidelines for ensuring that database are normalised. These are referred to as normal forms and numbered from one to five.
In practical application we will often see 1NF, 2NF, 3NF and BCNF. Along with the occasionally 4NF and 5NF is very rarely seen.
NORMAL
FORMS
Normal Forms
1st NF
2nd NF
3rd NF
BCNF
4th NF
5th NF
1st Normal Form
A relation scheme is said to be in 1NF if only one value is associated with each attribute and the value of that attribute is not a set of values or a list of values.Atomic data values; eliminates duplicate columns from the same table.
Each row is uniquely identified; Needs a primary key, so each row can be unique.
Each field name is also unique.
Example: -
COURSE STUDENT
DATABASE BOBJOESUE
MATH TIMMARY
COURSE STUDENT
DATABASE BOB
DATABASE JOE
DATABASE SUE
MATH TIM
MATH MARY
In order to be in Second Normal Form, a relation must first fulfill the requirements to be in First Normal Form. Additionally, each non-key attribute in the relation must be functionally dependent upon the primary key.
The rules for second normal form are:The table must already be in first normal form.Non key attributes must depend on every part of the primary key.
2nd Normal Form
Example: -
The relation is in First Normal Form, but not Second Normal Form:
Remove subsets of data that apply to multiple rows of a table and place them in separate tables
Order # Customer Contact Person Total
1 Acme Widgets John Doe $134.23
2 ABC Corporation Fred Flintstone $521.24
3 Acme Widgets John Doe $1042.42
4 Acme Widgets John Doe $928.53
To Tables To Satisfy 2NF
Customer Contact PersonAcme Widgets John Doe
ABC Corporation Fred Flintstone
Order # Customer Total
1 Acme Widgets $134.23
2 ABC Corporation $521.24
3 Acme Widgets $1042.42
4 Acme Widgets $928.53
Comment
The creation of two separate tables eliminates the dependency problem experienced in the previous case.
In the first table, contact person is dependent upon the primary key -- customer name. The second table only includes the information unique to each order.
Someone interested in the contact person for each order could obtain this information by performing a JOIN Operation.
Another Example of 2NF Normalization: -
Normalizing EMP_PROJ into 2NF relations.
EMP_PROJ
SSN PNUMBER HOURS ENAME PNAME PLOCATIONFD1
FD2
FD3
2NF NORMALIZATION
EP1
SSN PNUMBER HOURS
FD1
EP2
FD2
SSN ENAME PNUMBER PNAME PLOCATION
EP3
FD3
Any transitive dependencies have been removed in 3NF.
Meet all the requirements of the second normal form.
Remove columns that are not dependent upon the primary key.
3rd Normal Form
Example: -Normalizing EMP_DEPT into 3NF relations.
ENAME SSN BDATE ADDRESS DNUMBER DNAME DMGRSSN
3NF NORMALIZATION
EMP_DEPT
ENAME SSN BDATE ADDRESS DNUMBER DNUMBER DNAME DMGRSSN
ED2ED1
A relation is in BCNF, if it is in 3NF and All of its determinants (i.e. The attributes upon which other attributes depends) are Candidate keys.
OR
To convert a 3NF into BCNF, decompose such, that every determinant becomes a candidate key.
Boyce-Codd Normal
Form
Example: -
Here, above ‘CANDIDATE_ID’, ‘ROOM_NO’ and ‘INTVR_ID’ all have the property of being ‘Primary Key’ or can say all are ‘Candidate Key’ into the “INTERVIEW TABLE”.
So, as definition of ‘BCNF’, we will break it into two different tables, “INTERVIEW TABLE” and “ROOM TABLE”.
CANDIDATE_ID INT_DATE INT_TIME INTVR_ID ROOM_NO
INTERVIEW TABLE
CANDIDATE_ID INT_DATE INT_TIME INTVR_ID
INTERVIEW TABLE
INTVR_ID INT_DATE ROOM_NO
ROOM TABLE
4th Normal Form
Multi-valued dependencies are removed, i.e. isolation of independent multiple relationships.
Meet all the requirements of the third normal form.
A relation is in 4NF if it has no multi-valued dependencies.
Example: -
EMPLOYEENAME PROJECT HOBBY
Alexis Microsoft Reading
Alexis Oracle Music
Alexis Microsoft Music
Alexis Oracle Reading
Mathews Intel Movies
Mathews Sybase Riding
Mathews Intel Riding
Mathews Sybase Movies
EMPLOYEE TABLE
Project and Hobby Table
PROJECTNAME PROJECT
Alexis Microsoft
Alexis Oracle
Mathews Intel
Mathews Sybase
HOBBYNAME HOBBY
Alexis Reading
Alexis Music
Mathews Movies
Mathews Riding
Any remaining anomalies are removed. In this normal form we isolate semantically related multiple relationships.
The criteria of 5thNF is also known as PJNF( Project join normal form) and JPNF(Join Projection Normal Form).
The table must be in 4NFTheir must be no non trivial joint dependency that do not follow
from key constraints .The 4NF table is said to be 5NF if and only if every joint
dependency in it is implied by the candidate keys.
5th Normal Form
Example:- Dealers sell Product which can be manufactured by various
companies. Dealers in order to sell the Product should be registered with the Company. So these three entities have a mutual relationship within them.
The above table shows some sample data. If you observe closely, a single record is created using lot of small information. For instance: JM Associate can sell sweets under the following two conditions:
JM Associate should be an authorized dealer of Cadbury. Sweets should be manufactured by Cadbury company.
Continue…
DEALERS PRODUCT COMPANIES
JM Associate Sweets Cadbury
Shiv networks Shoes Nike
Star Sellers Magazine Times
Hari Publishers Books KM Publication
Continued…
These two smaller bits of information form one record of the above given table. So in order for the above information to be “Fifth Normal Form” all the smaller information should be in three different places. Below is the complete fifth normal form of the database.
DEALERS COMPANIES
JM Associates Cadbury
Shiv Networks Nike
Star Sellers Times
Hari Publishers KM Publications
DEALERS PRODUCT
JM Associate Sweets
Shiv Networks Shoes
Star Sellers Magazine
Hari Publishers Books
PRODUCT COMPANIES
Sweets Cadbury
Shoes Nike
Magazine Times
Books KM Publications