
UNIVERSIDAD DE BELGRANO

The Relational Model
(The Relational Database Model)

Author: Prof. PAULA M. ANGELERI

Course: Diseño de Sistemas (Systems Design)

- 2001 -


    Executive Summary

This paper is an introduction to relational database theory. It explains the main relational concepts in an easy-to-follow manner and is structured in five main sections. The first explains the title of the paper itself and defines each of its terms. The second is an Introduction that explains the benefits of using a database approach. The third explains, through an explanatory example, why a relational database needs to be normalized. The next section explains how to normalize a relational database and the rules involved in the normalization process; again, the theory is accompanied by an explanatory example. After these examples, another mechanism that can help readers understand the way the database is structured is provided: a set of guidelines for mapping the Entity Relationship data model into the relational model. This guideline is intended for readers who are familiar with data modeling but not yet familiar with normalization theory; the idea is to take advantage of their data modeling knowledge and give them a mechanism they can use to check whether the normalization process was applied correctly. The last section describes the main problem associated with normalization (slow query performance) and then presents a mechanism to solve it. After the body of the paper, an Appendixes section explains the origins of the relational model, and a Glossary describes the relational terms not explained in the paper.

  • UNIVERSIDAD DE BELGRANO

    3

    Table of Contents

Do Relational Databases Need Normalization?
    What Is a Database?
    What Does Relational Mean?
    The Relational Database Model
    What Is Normalization?
Introduction
Why Should We Apply Normalization?
    Data Redundancy
    Inconsistency Among Data
    Advantages of Further Normalization
    Explanatory Example
    Major New Concepts
How to Apply Normalization in a Relational Database?
    The Normalization Theory
    Mapping the ER Model into the Relational Model
Normalization Implications
Query Optimization
    The Optimization Process
Conclusion
Appendixes
    The Origins of the Relational Model
    Glossary
References


Do Relational Databases Need Normalization?

Whether to apply normalization in relational databases is a controversial question in the database field of study. Some people believe that relational databases must be normalized because normalization ensures that the data stored in the database are accurate. This paper will provide enough evidence to show that a relational database needs normalization in order to maintain consistent data. However, there is still an important problem that normalization theory does not resolve: poor query performance.

Those who do not believe in normalization support their position by pointing out two disadvantages. The first is that a normalized database needs more time to answer a query. The second is that the programmer must write longer code in the query programs.

This paper will discuss both positions in detail. In addition, the terms database, relational, and normalization will be defined for those who are not familiar with them.

    What Is a Database?

A database is a collection of computerized data. More precisely, a database is a collection of logically related files. These files contain data that are stored permanently so that they can be used whenever necessary. Access to these data is provided by a database system. A database system is a computerized system composed of data, hardware, software, and users (Date 4). The main goal of a database system is to provide the company centralized control of its data (9).


    As an example of a database, we can consider the student database of a university. It has a

    file that contains the personal data of the students (name, address, telephone, etc.). There is

    another file that contains the information about the courses that each student is taking, and there

    is a file with the information about the courses that the students have taken in the past.

What Does Relational Mean?

In this context, the term relational implies that two or more files have logical links: one file has a reference to another file. For instance, one file contains the student name, address, and zip code, and another file contains the zip code and the name of the city. In the first file we have a logical link to the second file: the zip code, because it can be used as a connector between both files.
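To make this idea concrete, here is a minimal SQL sketch (the table and column names are my own illustration, not part of the original example) showing how the zip code acts as the connector between the two files:

-- One file with students, one file with cities; zip_code is the logical link.
SELECT s.name, s.address, c.city_name
FROM student s, city c
WHERE s.zip_code = c.zip_code;

The join condition s.zip_code = c.zip_code is exactly the logical link described above: neither file points physically at the other; they are connected only through the shared value.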

The Relational Database Model

The relational model was developed by Codd in the early 1970s. The power and completeness of this model derive from the richness of the relational calculus and the algebraic operators on which the model is based. According to Haderle, D.J., "Many users say: 'It's intuitive.' The simplicity is derived from Codd's development of the model on a sound theoretical base."

The strengths of the relational model rest on the fact that data records are not physically dependent on one another. They are linked by meaning: records reference other records through their data values. The relational model has logical bonds between data.

According to Atre, one of the major advantages of the relational approach is its simplicity. [...] The end users do not have to be concerned about the physical storage structure;


they can be oriented toward the information content of their data and need not worry about the physical representation details. The ability to use the database without knowing the representation details is called data independence. (Atre, 93)

In the relational model, a data structure is represented as a table or relation, which consists of a set of tuples (often called "rows" or "records") intersected by a set of columns (often called "attributes"). Summarizing, a table is divided horizontally into rows and vertically into columns.

The intersection of a row and a column is a "field". In this model each field contains a single data value, which may be null. For a given column all data values are of the same type or "domain." Consequently, for a given table all rows are of the same type. Another important concept is that in the relational model a "primary key" is selected for each table. A primary key is the attribute or shortest group of attributes that identifies each row (Nijssen and Halpin 247-256).

To clarify all these new terms, I will present an example of a table:

ID    Name              Telephone
1576  Angeleri, Paula   788-3395
1598  Paulon, Daniela   773-6889     => rows or tuples
1783  Rabossi, Marcelo  797-0345

Value of a field: "1576"
Domain: numbers
Primary key: the attribute ID, because it is the only column that identifies a tuple. (We can have two persons with the same name, or the same telephone, but each person has his/her own identification number.) In the example above, the primary key value is 1576, and there is no other record with this value.

What Is Normalization?

Normalization is the process of converting the files of a database into a more desirable form (Date 238). Normalization indicates whether a file needs to be split into two or more files. The following example shows how a file can be split:

Not Normalized Approach:

ID    Name              Address              Zip Code  City
1576  Angeleri, Paula   1111 Dartmouth Ave.  91711     Claremont
1598  Paulon, Daniela   124 Foothill Bld.    91711     Claremont
1783  Rabossi, Marcelo  7666 Arrow Hwy.      91888     Upland

Normalized Approach:

ID    Name              Address              Zip Code
1576  Angeleri, Paula   1111 Dartmouth Ave.  91711
1598  Paulon, Daniela   124 Foothill Bld.    91711
1783  Rabossi, Marcelo  7666 Arrow Hwy.      91888

Zip Code  City
91711     Claremont
91888     Upland

This example shows how a file can be split into two files by storing the name of the city in another file.
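In SQL terms, the normalized approach above corresponds to two tables, with the zip code of the person table acting as a foreign key into the city table. The CREATE TABLE statements below are a minimal sketch I am adding for illustration (the column types are assumptions):

CREATE TABLE CITY (
    ZipCode  CHAR(5)      PRIMARY KEY,
    City     VARCHAR(40)
);

CREATE TABLE PERSON (
    ID       INTEGER      PRIMARY KEY,
    Name     VARCHAR(60),
    Address  VARCHAR(80),
    ZipCode  CHAR(5)      REFERENCES CITY (ZipCode)  -- city name stored once, in CITY
);

Because the city name is stored only once per zip code, renaming a city requires updating a single row in CITY.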


    Introduction

Today all companies need databases to store their data, because they handle a large volume of information. The purpose of this stored information is to be recovered when it is needed, so it is important to have the data organized in order to have easier and quicker access to it. This organization of data is expressed through "data models." A model is an abstraction of reality, often represented in a diagram. Data is represented in a model in order to become clearer for the database designer.

The purpose of this research is to provide enough information to convince the reader that relational databases must be normalized in order to obtain the full benefits of having a database.

    Some of these benefits are:

    - Providing quick access to data.

    - Sharing data.

    - Having consistent information.

    - Reducing redundancy.

    - Giving on-line access.

    - Having up-to-date information.

    - Providing standards. (Date, 1990, 13-22)

    - Applying security restrictions.

    - Preventing failure condition errors.

    Quick access to data


The database system is equipped with a Data Query Language (DQL) that provides easy and quick access to the data.

    Data shared

    Sharing data means that the data stored in the database can be accessed by different systems and

    users.

    Consistent information

The information is consistent when the data are accurate. To understand this, suppose that the database has duplicated information. There could be occasions on which this information is updated only once, leaving the other duplicated data unchanged. This is called an update anomaly. (Data inconsistency will be explained later in more detail.)

    Redundancy

In a non-database approach, each computer program has its own private files. That implies redundant information, because data used in two different programs must be stored twice. In a database, data is shared by many users and many programs, reducing the redundancy that can lead to inconsistencies. (Data redundancy is also explained later on.)

    On-line access

    Information is available on demand at any time, in an interactive way.


    Up-to-date information

As the information is centralized, data are updated from different system programs operated by users of different departments. Therefore the data are current.

    Standards

Applicable standards are used to make data and documentation uniform. The database approach also provides standard reports and forms that help system programmers.

    Security restrictions

The database system provides effective control of data access. Each user of the database has a limited view of the data and restricted access authorization. Therefore the database system provides security (preventing data damage) and confidentiality (data are seen only by authorized users) (Davis, 42).

    Failure condition errors

The database system provides a mechanism to recover data in the event of either hardware or software failure (Davis, 43).


    Why Should we Apply Normalization?

Databases help to resolve problems, but it is necessary to follow certain patterns in order to avoid other problems. For example, normalization eliminates data redundancy.

Another advantage of applying normalization patterns is that normalization reduces problems concerning data manipulation: inserting, deleting, and updating data (Dutka and Hanson 9).

Finally, normalization is applied in order to avoid a problem called inconsistency among data. When data are not consistent, they are not reliable.

Summarizing, the normalization process is applied in order to achieve the following objectives:

1. It ensures that the database does not present insert, update, or delete anomalies.

2. It allows new kinds of data to be introduced into the database without restructuring it. This is important because it extends the lifetime of the applications.

3. It stresses the benefits of having a database by making it more informative for the user.

4. It ensures that the database is free of any design bias that favors certain queries at the expense of others.

5. It allows any relation to be represented as a table, by eliminating repeating groups.

6. It simplifies the data access operations.

However, further normalization can bring a couple of possible disadvantages (Codd):

1. It incurs extra relation names.

2. It makes it necessary for some queries to use more join operations than in a denormalized database.¹ According to Date, Codd explained that this inconvenience could be solved by providing predefined views for heavily used queries (Date, March 1999).

Data Redundancy

Data redundancy means that the same data is stored in the database more than once. This is not recommended, first because it means duplicated effort (the same update operation must be performed as many times as duplicates of the data exist). Secondly, data redundancy means a waste of storage space, because the data is stored repeatedly. Finally, it is possible for an inconsistency among data to occur, as the data may not be updated everywhere it appears. (Elmasri and Navathe, 12)

    Inconsistency among data

As I have mentioned before, a database can have multiple values of the same data, and these values may not agree with each other. This phenomenon is called inconsistency. It is clear that an inconsistent database has contradictions; therefore it will supply incorrect information to its users (Date 10).

Advantages of Further Normalization

Normalizing a relational database certainly gives many advantages, as I explained before. Now I want to clarify that these advantages are accrued when adopting a higher level of normalization, that is, the Second and Third Normal Forms (NF), rather than merely the First NF. This explanation is necessary because a database is already considered normalized when its tables are in First NF.

¹ The term denormalized is not used by Date in this article, but it is used in at least one of his books.

    Explanatory example

A comparison will further prove the need for normalization. I will present an example of a relational database that is not normalized, point out the problems that this database presents, and then show the same situation in a normalized relational database.

Suppose we have information about students and courses, and we use an approach that is not normalized to store them:

Student_name    Address             Course          Mark  Year
Fini, Mercedes  343 6th Street      Mathematics I   A     1997
Fini, Mercedes  343 6th Street      Physics I       A-    1997
Smith, John     23 Sunset Avenue    Mathematics I   A-    1997
Roland, Peter   86432 Two Nods Rd.  Mathematics I   B     1997
Fini, Mercedes  343 6th Street      Mathematics II  B-    1998

Now, suppose that Mercedes Fini has changed her address. We must update her address in the database. So the user searches for Mercedes Fini's record and finds:


Student name: Fini, Mercedes
Address: 343 6th Street
Course: Mathematics I
Mark: A
Year: 1997

Then, he/she will change Mercedes Fini's address:

Student name: Fini, Mercedes
Address: 567 Main Street
Course: Mathematics I
Mark: A
Year: 1997

And now we have:

Student_name    Address             Course          Mark  Year
Fini, Mercedes  567 Main Street     Mathematics I   A     1997
Fini, Mercedes  343 6th Street      Physics I       A-    1997
Smith, John     23 Sunset Avenue    Mathematics I   A-    1997
Roland, Peter   86432 Two Nods Rd.  Mathematics I   B     1997
Fini, Mercedes  343 6th Street      Mathematics II  B-    1998


At this moment, I ask: what is Mercedes Fini's real address?

This database does not provide a good answer to my question. It will tell me that Mercedes Fini has two different addresses; or it will tell me that Mercedes Fini lives at 343 6th Street (which is not true anymore); or it will tell me that Mercedes Fini lives at 567 Main Street in a first query, while subsequent queries could give me a different answer. The answer will depend on the way the user expresses the query. At this moment you may ask: why can we not use an approach like the one below?

Student_name    Address             Course          Mark  Year
Fini, Mercedes  343 6th Street      Mathematics I   A     1997
                                    Physics I       A-    1997
                                    Mathematics II  B-    1998
Smith, John     23 Sunset Avenue    Mathematics I   A-    1997
Roland, Peter   86432 Two Nods Rd.  Mathematics I   B     1997

In this approach, Mercedes Fini has one address, and the user will have no problem updating it. However, the user will be in trouble if he/she tries to update, add, or delete a course, because all of Mercedes Fini's courses are together in a single cell, and we cannot use them separately.

At this point, the question "Why is it necessary to apply normalization?" is answered: it is necessary to apply normalization in order to avoid the update, delete, and insert anomalies that would introduce inconsistency into the database.

Now we face another problem: how do we apply normalization? Before answering this question, I will introduce some database concepts, so that the vocabulary used in the next pages can be easily understood.


    Major New Concepts

Functional Dependence: given R.A and R.B, where A and B are attributes or sets of attributes of the relation R, the notation R.A → R.B means that B is functionally dependent on A, or that A functionally determines B in R. This functional dependence implies that, knowing the value of A, we can obtain the value of B (Kroenke, 112).

We can simplify the notation if R is understood: A → B.

Transitive Dependence: A → C is a transitive dependence if A → B and B → C. According to Codd, C is strictly transitively dependent on A.
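As a small worked illustration (my own, reusing the attributes of the earlier address example): if student_name → zip_code and zip_code → city, then student_name → city is a transitive dependence, and city is strictly transitively dependent on student_name.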

Candidate key: a candidate key K is the attribute or set of attributes that identifies a unique row in a relation (Kroenke, 502). A candidate key is required to be unique and nonredundant. This latter term is defined by Date as: no attribute can be discarded from K without destroying the uniqueness property. Date states that the terms unique and irreducible would better suit the definition of a candidate key. Additionally, Date remarks on the importance of Codd's observations regarding the candidate key concept: (a) each attribute of relation R is functionally dependent on each candidate key of R, and (b) within any given candidate key, no proper subset of the attributes is functionally dependent on any other.

According to Date, these two statements imply that each attribute of R is functionally dependent on each superkey of R, a superkey being a superset of a candidate key (which implies that a superkey is obtained by discarding the nonredundancy requirement of a candidate key).


Primary key: one of the candidate keys, arbitrarily chosen by the database designer. The operational difference between a primary key and a candidate key is that a tuple cannot have an undefined value for any of the attributes that compose the primary key, whereas any of the components of a candidate key can have an undefined value.

Insert/Update/Delete Anomaly: according to Date, none of these terms was ever formally defined, so he did not define them. For this reason I will take the definitions from Kroenke (501): an anomaly is an undesired consequence of a data modification.

An insertion anomaly occurs when it is necessary to add facts about two or more different things to a single row of a relation. This means that, in order to add a row to a table, values must be given for attributes that are not really needed for this particular event. For example, to incorporate a particular department in a company, data about an employee is required.

A deletion anomaly occurs when some particular information is lost when deleting a single row. This happens when, for example, it is required to delete an employee and, in the process, the database loses the information regarding his department (which is not desired).

An update anomaly occurs when there is more than one entry for a particular piece of data, and the value of one of these entries changes while the rest remain unaltered. For example, I keep the address of an employee twice, and I change its value only once.

According to Date, Codd stated that the anomalies are due to redundant information. Date attributes this undesired situation to a low level of normalization in the database. (Date, March 1999)


    How to Apply Normalization in a Relational Database?

Normalization implies decomposition; that means transforming tables that are too large into smaller ones that are free of update anomalies. The normalization technique provides a complete theory for decomposing a table (Nijssen and Halpin 273). To explain how to apply this theory I will use the same student-courses example:

TABLE: STUDIES

Student_name    Address             Course          Mark  Year
Fini, Mercedes  343 6th Street      Mathematics I   A     1997
                                    Physics I       A-    1997
                                    Mathematics II  B-    1998
Smith, John     23 Sunset Avenue    Mathematics I   A-    1997
Roland, Peter   86432 Two Nods Rd.  Mathematics I   B     1997

The Normalization Theory

The normalization theory involves six general rules, called Normal Forms. The following picture shows the relation among them:

[Figure: nested circles showing that 5NF is contained in 4NF, 4NF in BCNF, BCNF in 3NF, 3NF in 2NF, and 2NF in 1NF]


The circles imply that a relational database is in Fifth Normal Form (5NF) only if it is in 4NF (and, of course, satisfies additional conditions).

A relational database is in 4NF only if it is in BCNF (Boyce-Codd Normal Form).
A relational database is in BCNF only if it is in 3NF.
A relational database is in 3NF only if it is in 2NF.
A relational database is in 2NF only if it is in 1NF.

For the purpose of this paper I will only explain the first three Normal Forms: 1NF, 2NF, and 3NF.

    First Normal Form: to satisfy the 1NF, a table or relation must have a primary key (unique

    identifier). In addition, it cannot have null values for the primary key attributes, and each row-

    column intersection must have single values (no repeating groups are allowed). Finally, all non-

    key attributes must be functionally dependent on the primary key. The term functionally

    dependent means that knowing the value of the attribute A, you can know the value of the

    attribute B. Therefore the attribute B is functionally dependent on the attribute A, and the

    attribute A determines the value of the attribute B.

Following this rule, the example above must be modified in order to avoid a "repeating group" (when a single field is used to store more than one value). In the example, Mercedes Fini has 3 course values: Mathematics I, Physics I, and Mathematics II. The only way to store these 3 values is to have 3 different tuples or records with Mercedes Fini's courses:


    TABLE: STUDIES

student_name    Address             Course          Mark  Year
Fini, Mercedes  343 6th Street      Mathematics I   A     1997
Fini, Mercedes  343 6th Street      Physics I       A-    1997
Fini, Mercedes  343 6th Street      Mathematics II  B-    1998
Smith, John     23 Sunset Avenue    Mathematics I   A-    1997
Roland, Peter   86432 Two Nods Rd.  Mathematics I   B     1997

    Second Normal Form: in a 2NF table or relation, each non-key attribute (neither a key nor a part

    of a composite key) must be functionally dependent on the whole of the key (not just a part of it).

    In other words: all the attributes that are not a part of the primary key must depend on all the

    attributes that compose the primary key. And, of course, the relation must be in 1NF.

Consider the table STUDIES: the primary key of the table is the set of attributes (student_name + course), because knowing the values of both of them we can determine the whole record. Notice that, knowing just the student_name (e.g., Mercedes Fini), we cannot know that Mercedes Fini's mark in Physics I in 1997 was A-; this is because Mercedes Fini has more than one record. Likewise, knowing only a value of the attribute course (e.g., Mathematics I), we cannot predict the values of the other attributes of the record, because there are many students who studied Mathematics I.

We have selected the primary key for the table STUDIES. Now we have to apply the 2NF: do the student_name and the course together determine the non-key attribute address?

No: we can know the address of the student by knowing only the student_name. Therefore this example does not follow the 2NF rule.

To solve this problem we must divide the table STUDIES into two tables:


TABLE: STUDENT

student_name    Address
Fini, Mercedes  343 6th Street
Smith, John     23 Sunset Avenue
Roland, Peter   86432 Two Nods Rd.

TABLE: STUDIES

Student_name    Course          Mark  Year
Fini, Mercedes  Mathematics I   A     1997
Fini, Mercedes  Physics I       A-    1997
Fini, Mercedes  Mathematics II  B-    1998
Smith, John     Mathematics I   A-    1997
Roland, Peter   Mathematics I   B     1997

Now Mercedes Fini has only one address, because we have eliminated the redundancy. The user can update her address without any problem.
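As a minimal SQL sketch of this 2NF decomposition (the statements and column types below are my illustration; the paper itself shows only the resulting tables):

CREATE TABLE STUDENT (
    student_name  VARCHAR(60) PRIMARY KEY,
    address       VARCHAR(80)
);

CREATE TABLE STUDIES (
    student_name  VARCHAR(60) REFERENCES STUDENT (student_name),
    course        VARCHAR(40),
    mark          CHAR(2),
    year          INTEGER,
    PRIMARY KEY (student_name, course)  -- mark and year depend on the whole key
);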

    Third Normal Form: to be in 3NF, a relation must be in 2NF, and all non-key attributes must

    depend on the primary key directly (no transitive dependencies are allowed).

By observing the next table, we can see that the mark of the student depends on the primary key (student_name + Course + year). The problem here is that the attribute concept depends on the attribute mark, which is not the primary key of the relation. This implies that this table is not in 3NF.

TABLE: STUDIES (primary key: student_name + Course + year)

student_name    Course          Mark  Year  Concept
Fini, Mercedes  Mathematics I   A     1997  Outstanding
Fini, Mercedes  Physics I       A-    1997  Excellent
Fini, Mercedes  Mathematics II  B-    1998  Good
Smith, John     Mathematics I   A-    1997  Excellent
Roland, Peter   Mathematics I   B     1997  Very Good


The following two tables show how the previous table was decomposed to achieve 3NF. We can see how the attribute concept was moved into another table, so we no longer have a transitive dependency.

TABLE: STUDIES (primary key: student_name + course + year)

Student_name    Course          Mark  Year
Fini, Mercedes  Mathematics I   A     1997
Fini, Mercedes  Physics I       A-    1997
Fini, Mercedes  Mathematics II  B-    1998
Smith, John     Mathematics I   A-    1997
Roland, Peter   Mathematics I   B     1997

TABLE: MARKS (primary key: mark)

mark  concept
A     Outstanding
A-    Excellent
B-    Good
B     Very Good
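A corresponding SQL sketch for the 3NF step (again my own illustration, with assumed column types): the concept column moves to its own table, and the mark column of STUDIES becomes a foreign key into it, so each mark-to-concept association is stored exactly once:

CREATE TABLE MARKS (
    mark     CHAR(2)     PRIMARY KEY,
    concept  VARCHAR(20)
);

-- In STUDIES, the mark column would now be declared as:
--     mark CHAR(2) REFERENCES MARKS (mark)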

Now we have seen how to apply normalization. Before we move to the next section, I want to explain that it is strongly recommended to use a numeric (or letter) identifier for the primary key, rather than the name, the description, or other fields like them. This is because you can easily have more than one person/customer/student/etc. with the same name. Have you never seen a boy and his father sharing the same name?

Another problem you can have when using a name or description to identify the tuples in a given relation is that sometimes a person can type them differently. When this happens, if someone wants to compute a statistic, the identifiers will look different to the processor, as it will not understand that different spellings can express the same thing. Think of the


example: "Engineering", "Enginering", or "engineerng". For the processor, these are basically three different names.

For those who are not familiar with normalization theory but are familiar with data modeling techniques, I add the following section, which explains how to convert an Entity Relationship model into a relational model. After following this process, it is recommended to check each table, one by one, to see whether a mistake was made during the conversion and the database is not well normalized.

    Mapping the ER Model into the Relational Model:

To map the ER model into the relational model (RM) you can follow these guidelines, taken from Elmasri and Navathe (172-178):

Step 1: each entity type is mapped into a relation or table in the RM, including the simple attributes of the entity as columns (attributes) of the table, and selecting one or more attributes as the primary key (unique identifier).

Step 2: each weak entity (or dependent entity) is mapped into a relation. The primary key of this relation will be formed by a combination of the primary key of the owner entity (the strong entity) and the partial identifier of the weak entity.

Step 3: each binary 1:1 relationship is mapped by duplicating the primary key of one relation as a foreign key in the other relation. It can also be mapped by migrating both primary keys as foreign keys, but I do not recommend this approach, as it can lead to inconsistencies. Finally, it can be mapped by merging both entities and the relationship into one single entity (this approach can be


used when neither entity participates in any other relationship and both entities are total). I still recommend the first approach.

Step 4: each strong binary relationship with cardinality 1:N is mapped by migrating the attributes of the primary key at the 1-side of the relationship as a foreign key into the table at the N-side of the relationship. It is also possible to build a table to represent the relationship. This second approach is particularly selected when:

- the relationship could become M:N in the future;

- many records would have a null value for this foreign key.

Step 5: each binary M:N relationship is mapped as a new table whose primary key is composed of the primary key of one entity's table and the primary key of the second entity's table. Additionally, if the relationship has its own attributes, they are transformed into columns of the new table.
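As an illustrative SQL sketch of Step 5 (the relationship name ENROLLS is hypothetical; the STUDENT and COURSES tables are the ones used in the examples of this paper), an M:N relationship with Year as an attribute of the relationship itself becomes:

CREATE TABLE ENROLLS (
    StudentID  INTEGER REFERENCES STUDENT (StudentID),
    CourseID   INTEGER REFERENCES COURSES (CourseID),
    Year       INTEGER,  -- attribute belonging to the relationship itself
    PRIMARY KEY (StudentID, CourseID)  -- both entities' keys compose the new key
);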

Step 6: each multivalued attribute M is mapped into a new table whose primary key is composed of the primary key of the entity or relationship that has the multivalued attribute M, plus an additional attribute of M that makes this primary key unique. If M is a composite attribute, all its simple components are included in this new table.

Step 7: each n-ary relationship is mapped into a new table. In this new table, all the attributes that compose the primary key of each of the n entities (usually three entities) become columns. In addition, each attribute belonging to the relationship itself is mapped as a new column of the new table. To identify the primary key, we need to take into consideration the cardinality of the relationship.

To determine the cardinality, you must ask yourself:


[ER diagram: ternary relationship among Supplier, Part, and Project]

- How many PARTS of the same (one) supplier are needed for one project? The answer will depend on the business rules. For instance, MANY (0..M) parts are provided by supplier S1 for the project P1.

Then ask yourself:

- In how many PROJECTS does the same (one) supplier provide the same (one) part? Again, the answer will depend on the business rules. For instance, the same part (P1) is provided by one supplier (S1) for MANY (1..M) projects.


Finally, you must ask about the third entity:

- How many SUPPLIERS provide the same (one) part for the same (one) project?

The answer will also depend on the business rules. For instance, MANY (1..M) suppliers can provide the same part (P1) for the same project (PJ1).

Following my example (which is similar to the one proposed by Elmasri and Navathe, and the same ternary relationship described in many data modeling books), the cardinality of this relationship is M:N:N. Notice that we have one maximum cardinality for each entity that participates in the relationship. In this example we have three entities and three maximum cardinalities: M, N, N.

In a ternary relationship, when the cardinality is M:N:N, the primary key of the table that represents this relationship in the relational model will be composed of the concatenation of the primary keys of each entity.

If the business rules establish that the supplier that provides a particular part (P1) for a particular project (PJ1) is always the same (which means that no other supplier can provide the


part P1 for the project PJ1), it is easy to know who the supplier is by knowing the part identifier (Part#) and the project identifier (Proj#). In functional dependence terms, the part number and the project number determine the supplier number:

(Part#, Proj#) → Supplier#

In this particular case, the primary key of the relationship table SUPPLIERS_FOR_PROJECTS can be composed of the primary key of the PART table and the primary key of the PROJECT table: (Part#, Proj#). However, I still recommend using all the foreign keys as the composite primary key (instead of only two fields), because the business rules can change, and it would then be very hard to modify all the existing applications.
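A SQL sketch of the recommended mapping (my illustration; the SUPPLIER, PART, and PROJECT tables are assumed, and I write Part#, Proj#, and Supplier# as PartNo, ProjNo, and SupplierNo, since "#" is not a legal character in SQL identifiers):

CREATE TABLE SUPPLIERS_FOR_PROJECTS (
    SupplierNo  INTEGER REFERENCES SUPPLIER (SupplierNo),
    PartNo      INTEGER REFERENCES PART (PartNo),
    ProjNo      INTEGER REFERENCES PROJECT (ProjNo),
    -- All three foreign keys compose the primary key, so the table survives
    -- a later change of the business rules to M:N:N:
    PRIMARY KEY (SupplierNo, PartNo, ProjNo)
);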

    Normalization Implications

We have seen how important normalization is, because it helps us avoid inconsistency among data. However, it has other effects that are not desired. One of the problems of normalizing a database is that you lose query performance. As Nijssen and Halpin say, reducing the number of tables reduces query time; conversely, increasing the number of tables during the normalization process will increase query execution time (254). In other words, you spend more time on each query because you need to link (join) more tables.

Some people believe that a database must be denormalized in order to have better performance during query processing. According to Elmasri and Navathe, denormalization means that the attributes that are frequently required together in a query operation must be stored in the same table or relation. However, it is then necessary to ensure that each update transaction


modifies all the values of a duplicated attribute, in order to keep the database consistent (469). In the non-normalized example presented before, we saw that Mercedes Fini's address is stored three times in the database. This implies that every program that modifies the address attribute must check that the value of this attribute is updated as many times as it appears in the database. As you can understand, this denormalized approach is risky. For example, a new programmer may forget to include in his/her program the controlling routine that ensures all values of the same data are updated simultaneously. By doing so, he/she will introduce undesirable inconsistency into the database.

The second problem is clearly stated by Yao, when he says:

The fragmentation of data into normalized relations unfortunately makes it necessary to write tedious and lengthy queries to obtain useful information. (107)

At this point I cannot disagree with this statement. It is true that the query will be longer in a normalized approach. This is due to the fact that the query language requires the programmer to state clearly which tables he/she will use, and which attribute or group of attributes links them.

For example, in the query language known as SQL (Structured Query Language), you will have the following instruction to obtain data from only one table:

SELECT student_name, address, course
FROM STUDIES
WHERE student_name = 'Fini, Mercedes';

This query will give me a report with Mercedes Fini's address and courses.

    In a normalized approach the same report will be obtained through the following instruction:


SELECT STUDENT.student_name, address, course
FROM STUDENT, STUDIES
WHERE STUDENT.student_name = 'Fini, Mercedes'
  AND STUDENT.student_name = STUDIES.student_name;

The last condition shows the relation between the two tables (the primary key attribute or attributes that link them). You will have one of these conditions for each primary key attribute that links each pair of tables containing the attributes you want to list. As a consequence, the code of the query is longer. However, it takes only a few minutes more to write these few additional lines of code.

In summary, I strongly recommend normalizing your relational database, even if the query code becomes longer or the query performance is slower.

    The following section explains the Query Optimization process that is provided by almost

    every RDBMS.

Query Optimization:

Have you ever tried a query joining two tables with a large number of fields and a large number of records? If you have, you will probably agree that the performance of the query (its response time) is not very good. Have you tried the same query using an RDBMS that provides a query optimization capability? If so, you must have noticed that the response time improves considerably. I will try to explain the way the query optimization capability works through an example. Consider the following tables:

TABLE: STUDENT

StudentID  Student_Name     Address             ZipCode
10         Fini, Mercedes   343 6th Street      91711
20         Smith, John      23 Sunset Avenue    91711
30         Roland, Peter    86432 Two Nods Rd.  91716
40         Funson, Larry    345 8th Street      91711
50         Neuman, Paul     1010 Foothill Ave.  91711
60         Gibson, Mel      863 Two Birds Rd.   91716
70         Tapia, Roland    343 6th Street      91711
80         Banderas, Peter  273 La Luna St.     91714
90         Sony, Benson     842 Main Boulevard  91716

TABLE: STUDIES

StudentID  CourseID  Mark  Year
10         101       A     1997
10         104       A-    1997
10         102       B-    1998
20         103       A-    1997
30         101       B     1997
40         103       A     1997
40         104       A-    1997
50         105       B-    1998
60         101       A-    1998
60         105       C     1999
70         104       A+    1999
70         106       A     1999
80         104       B     1999
90         103       A+    1999
90         105       B     1999

TABLE: COURSES

CourseID  Course_Name
101       Mathematics I
102       Mathematics II
103       Mathematics III
104       Physics I
105       Physics II
106       Physics III

    Suppose we want to perform the following query:

SELECT Student_Name
FROM STUDENT, STUDIES
WHERE STUDENT.StudentID = STUDIES.StudentID
  AND STUDIES.CourseID = 101;


    The corresponding Algebraic expression for this query is:

    (( STUDIES JOIN STUDENT ) WHERE CourseID=101 ) [Student_Name]

In this example we have 9 students, 15 courses taken by the students (studies), and only 3 students who have taken course 101. Now we will see how this query is performed without any optimization at all:

First: JOIN relations STUDIES and STUDENT (over StudentID).

During this step, the 15 tuples from the STUDIES table are read. Then each of the 9 student tuples (records) is read 15 times, once per STUDIES tuple. This operation involves reading 135 tuples. The intermediate result consists of 15 joined tuples (that is, tuples that contain the fields of both tables and satisfy the condition STUDENT.StudentID = STUDIES.StudentID).

Second: RESTRICT the result of the first step to just the tuples that satisfy the condition (tuples where CourseID = 101).

During this step, the 15 joined tuples are read in order to find the tuples that have CourseID = 101. The result is a relation that contains only 3 tuples.

Third: PROJECT the result of the second step over Student_Name.

During this step, the 3 tuples of the relation obtained are read in order to select the Student_Name of each one. The output is a new relation containing only the Student_Name attribute, with duplicate tuples removed (in this example there are no duplicate tuples, so the new relation will contain 3 tuples).

Now, if we count the processed tuples, we find that this query, performed in this way, involves 189 (15 + 135 + 15 + 15 + 3 + 3 + 3) processed tuples.


Resolving this query following the optimization process would involve these steps:

First: RESTRICT relation STUDIES to just the tuples for course 101.

During this step, the 15 tuples of the STUDIES relation are read in order to produce the 3-tuple intermediate relation that satisfies the condition CourseID = 101.

Second: JOIN the result obtained during the first step to the STUDENT relation (over StudentID).

During this step, the 9 tuples from the STUDENT relation are read. Then each of the 9 student tuples (records) is read 3 times, once per course-101 tuple. This operation involves reading 27 tuples. The intermediate result consists of 3 joined tuples (that is, tuples that contain the fields of both tables and satisfy the condition STUDENT.StudentID = STUDIES.StudentID).

Third: PROJECT the result obtained during the second step over Student_Name.

During this step, the 3 tuples of the obtained relation are read in order to select the Student_Name of each one. The output is a new relation containing only the Student_Name attribute, with duplicate tuples removed (in this example there are no duplicate tuples, so the new relation will contain 3 tuples).

Now, if we count the processed tuples, we find that this query, performed in this second way, involves 63 (15 + 3 + 9 + 27 + 3 + 3 + 3) processed tuples.
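A programmer can also make this restrict-first order explicit in the SQL itself, for example with a subquery (a sketch I am adding for illustration; a modern optimizer will usually produce this order on its own):

SELECT Student_Name
FROM STUDENT
WHERE StudentID IN
      (SELECT StudentID
       FROM STUDIES
       WHERE CourseID = 101);  -- restriction applied before matching against STUDENT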

In this example, the tables are small enough to be kept in main memory (only the 135-tuple relation would probably be kept on disk, depending on the hardware capabilities). Still, it is easy to see that the second approach is better, as the total number of tuples processed is smaller. In addition, we can improve the performance even more by indexing the STUDIES


relation on CourseID, so that the total number of tuples read from STUDIES will be only 3 (in this example). Likewise, an index or hash on StudentID can be used in the STUDENT relation to improve the second step, by reading only the three needed tuples instead of 9.
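In SQL, such indexes could be created as follows (a sketch with hypothetical index names; in most systems the primary key of STUDENT is already indexed automatically):

CREATE INDEX idx_studies_course ON STUDIES (CourseID);   -- speeds up the restriction
CREATE INDEX idx_student_id     ON STUDENT (StudentID);  -- speeds up the join lookup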

In real life this optimization is very important, as we will surely have a larger number of records. For instance, suppose our STUDENT table contains 9,000 records, our STUDIES table has 34,000 records, and only 800 records match the condition "students who have taken course 101." Now, the maximum number of processed tuples can be 306,104,400 for a non-optimized query, and 7,246,200 for a query that follows the optimization process. This is because processing this query without optimizing it involves the following steps:

First, read the 34,000 STUDIES records; then read the 9,000 STUDENT records 34,000 times (that is, 306,000,000 tuples in total) and construct the intermediate table of 34,000 joined tuples. Due to the large number of records, we cannot keep all these tuples in main memory, so we need to write them out to disk and then read them back from disk (which takes longer than keeping them in memory). Once we have generated the intermediate table of 34,000 joined tuples, we read this table back into memory and produce an 800-tuple table (by performing the restriction operation). We assume this intermediate relation cannot be kept in memory, given its size, so it will need to be saved out to disk. Finally, we need to read these 800 tuples again in order to perform the projection, resulting in a new 800-tuple relation. In total we have: 34,000 + 306,000,000 + 34,000 + 34,000 + 800 + 800 + 800 = 306,104,400 processed tuples (this is the maximum number, when no duplicate tuples are found).

Following the optimization process we have:


First, read the 34,000 tuples of the STUDIES relation and produce the 800-tuple output relation. Then read the 9,000 tuples from the STUDENT relation and read each of those tuples 800 times, once per course-101 tuple (7,200,000 tuples processed in total), producing a new relation of 800 joined tuples. Take these 800 tuples and perform the projection, in order to obtain the desired relation (which has at most 800 tuples, if no duplicate tuples are found). In total we have: 34,000 + 800 + 9,000 + 7,200,000 + 800 + 800 + 800 = 7,246,200 processed tuples. You can easily see how the optimization mechanism has considerably improved this query's performance. When the tables involved in the query have a larger number of records and attributes, the advantage of using the query optimization mechanism is even more important (and more easily noticed). Sometimes query performance can be improved still further, as the smaller number of tuples read makes it possible to keep the intermediate results in main memory instead of on disk (so no I/O accesses to disk storage are needed). Additionally, the use of index and hash mechanisms improves this performance even more, as the number of tuples read can be reduced still further.

    The Optimization Process

The optimization process can be described as a mechanism by which a query is broken down into smaller queries that involve a lower number of processed tuples and a lower number of processed fields. In other words, the optimization process tries to avoid the Cartesian product operation (which involves a considerably large number of tuples and fields to be processed), and tries to reduce to a minimum the total number of tuples to be scanned at each stage (by applying a restriction operation first). This mechanism considerably improves the performance in resolving the query.

The query optimization process is sometimes called the query decomposition process. During this process, a query that involves multiple tuple variables is transformed into smaller queries that involve one or two variables each, by using the detachment and tuple substitution mechanisms. During the detachment process, the query is decomposed into smaller components that have one variable in common with the rest of the query and that cannot be decomposed any further (this is often called reduction). These components are separated from the query and then substituted by a new subquery in which a chosen variable is assigned a set of possible calculated values through the tuple substitution mechanism (generally called nested loops). It is important to notice that detachment generates two queries from one; for this reason it is applied before the tuple substitution mechanism. The second mechanism is applied once it is no longer possible to apply detachment. These two mechanisms are applied repeatedly in order to decompose the query into irreducible subqueries (components), gaining the flexibility needed to improve the query performance. The idea is to apply first the operations that reduce the absolute sizes of the intermediate results, considering both the size of their attributes and their number. As a rule, the one-variable queries are resolved first, and then the two-variable queries. The following example explains this process:

Consider again the STUDENT, STUDIES, and COURSES tables presented in the previous section, together with the following CITY table:


TABLE: CITY

ZipCode  City_Name
91711    Claremont
91712    Rancho Cucamonga
91713    Upland
91714    Montclair
91715    San Dimas
91716    Pomona

Suppose we want to perform the following query Q0: get the names of the students who live in Pomona and have taken Mathematics I during 1997.

SELECT Student_Name
FROM STUDENT S, STUDIES E, COURSES C, CITY L
WHERE L.City_Name = 'Pomona'
  AND L.ZipCode = S.ZipCode
  AND S.StudentID = E.StudentID
  AND E.Year = 1997
  AND E.CourseID = C.CourseID
  AND C.Course_Name = 'Mathematics I';

The tuple variables in query Q0 are S, E, C, and L, where S ranges over the relation STUDENT, E ranges over the relation STUDIES, C ranges over COURSES, and L ranges over the relation CITY.

The first step involves the detachment of the one-variable queries L, C, and E, as follows:

D1: L = SELECT ZipCode FROM CITY WHERE City_Name = 'Pomona';


D2: C = SELECT CourseID FROM COURSES WHERE Course_Name = 'Mathematics I';

D3: E = SELECT CourseID, StudentID FROM STUDIES WHERE Year = 1997;

Now the remaining query will be:

Q2: SELECT Student_Name
    FROM STUDENT S, L, E, C
    WHERE S.ZipCode = L.ZipCode
      AND S.StudentID = E.StudentID
      AND E.CourseID = C.CourseID;

Now we process the two-variable detachments:

D4: S = SELECT StudentID, Student_Name FROM S WHERE S.ZipCode = L.ZipCode;

D5: E = SELECT StudentID FROM E WHERE CourseID = C.CourseID;

And the remaining query will be:

Q3: SELECT Student_Name FROM S WHERE S.StudentID = E.StudentID;

Another way to do it is to process only:

D4: E = SELECT StudentID FROM E WHERE CourseID = C.CourseID;

And the remaining query will be:

Q3: SELECT Student_Name FROM S
    WHERE S.ZipCode = L.ZipCode
      AND S.StudentID = E.StudentID;


The following picture shows the decomposition tree for query Q0:

[Figure: decomposition tree for Q0 — the one-variable detachments D1 (over CITY, yielding L), D2 (over COURSES, yielding C), and D3 (over STUDIES, yielding E) feed the two-variable detachments D4 (yielding S) and D5 (yielding E), which combine into Q3 to produce the overall result]

As a general rule, first break down any RESTRICTION operation with more than one condition into a cascade of RESTRICTION operations, to obtain the flexibility needed for moving the RESTRICTION operations down the tree. Then reorder the tree nodes so that the most restrictive RESTRICTIONs (those that produce the relation with the smallest number of tuples) are processed first. Next, break down the PROJECTION attributes and push them as far down the tree as possible by creating new PROJECTION operations; this reduces the overall number of processed attributes. Finally, identify the groups of operations that can be processed together.


    Conclusion

    A relational database needs to be consistent in order to make its data reliable and useful. Consistency can be pursued under two different approaches: (1) applying normalization, or (2) not applying normalization (denormalization). With the first approach, the database will always be consistent. If we take the second position, we need to include control routines in every program that updates the database (one such control is sketched below), which means that the developer of the database is taking a risk: as I have mentioned, a new programmer can forget to add the controlling routine to a program, allowing the same data to be updated in one tuple but not in others. The second approach therefore DOES NOT ALWAYS ensure consistency among data. For that reason, it is recommended to select the first approach, applying normalization, since it guarantees the data consistency. If you still have some doubts about query performance, remember that it can be improved through the Query Optimization mechanisms: this process not only improves the query response time, it also allows the subqueries to be processed simultaneously in a multiprocessor environment.
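    As a minimal sketch of what such a control routine might look like when centralized in the DBMS itself (the denormalized table STUDIES_D, which redundantly stores Student_Name, is hypothetical, and trigger syntax varies across products; SQLite-style syntax is shown):

    -- Keep the redundant copy of Student_Name in the hypothetical table
    -- STUDIES_D in sync whenever the master copy in STUDENTS changes.
    -- Without a control like this in every update path, the copies diverge.
    CREATE TRIGGER sync_student_name
    AFTER UPDATE OF Student_Name ON STUDENTS
    FOR EACH ROW
    BEGIN
        UPDATE STUDIES_D
           SET Student_Name = NEW.Student_Name
         WHERE StudentID = NEW.StudentID;
    END;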


    Appendixes

    In this section of the paper I will give an overview of the origins of the Relational Model. I did not include this material in the paper body because I wanted to focus only on the central issues of the Relational Model: basic concepts, normalization, and query optimization.

    The first thing you need to know about the Relational Model is that the most prestigious researchers in this field of study are E.F. Codd and C.J. Date (which is why I mention them so many times).

    The Origins of the Relational Model

    According to Date, database management has evolved into a science thanks to Codd's work, which provided a theoretical framework (the relational model) within which a variety of important problems could be attacked in a scientific manner. He also observes that the main importance of Codd's work is that he introduced not only the Relational Model but also the very concept of a data model, needed to deal with the problems associated with the storage of large amounts of data.

    E.F. Codd was the first researcher to introduce the logic of predicates into the database management field, and he enlarged this contribution by introducing the relational algebra and the relational calculus into the relational data model. Furthermore, Codd developed the first relational data language (ALPHA), which allowed data to be manipulated independently of its physical representation (this independence is nowadays called physical data independence, and it is mentioned as one of the advantages that the relational model has over the network model). In addition, Codd's papers introduced the term functional dependence and the first three normal forms of the Normalization Theory.

    Codd's second relational paper, "A Relational Model of Data for Large Shared Data Banks", published in 1970 (Communications of the ACM 13, No. 6), is probably the most famous paper in the history of database management (Date, Dec. 1998).

    Codd's first paper focused on data redundancy and related issues. The second paper reads like a revision of the first, but it stresses the importance of providing physical data independence within a data model. In addition, this second paper includes some new concepts; for instance, it introduces the idea that data must always be normalized. As a result, this paper has had a great impact on the database field of study.

    Codd's third paper, in 1972, provides a formal definition of both the relational algebra and the relational calculus. It then presents an algorithm for mapping a calculus expression into an algebra expression; through this, Codd proved that the algebra is as powerful as the calculus.

    The algebra includes both objects and operators that follow particular principles. The relational algebra has relations as its objects, and it has operations such as restriction, projection, union, difference, join, etc. These operators satisfy certain laws, such as associativity, commutativity, and closure. Closure is very important for the relational algebra: it implies that the result of an operation between two relations is always another relation (Date, Jan. 1999).
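    Closure is also what makes nested queries possible. As a small illustration (a sketch in SQL, which here merely stands in for the algebra; the ZipCode value is hypothetical), the result of one operation can serve directly as the operand of another:

    -- The inner restriction yields a relation, which the outer
    -- projection consumes exactly as if it were a base table.
    SELECT Student_Name
    FROM (SELECT * FROM STUDENTS WHERE ZipCode = '91768') AS T;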


    Glossary

    Alternate key: this term was apparently introduced by Date. An alternate key is any of the

    candidate keys that were not chosen as the primary key of the relation (Date, March 1999).

    Cartesian product: the relational Cartesian product produces a set of tuples instead of simple pairs. Each tuple in the result contains all the components of a tuple of the first relation together with all the components of a tuple of the second relation. This algebraic operator is commutative and associative (Date, Jan. 1999).
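    In SQL (a sketch using the paper's STUDENTS and COURSES tables), a comma-separated FROM list with no join condition computes exactly this product:

    -- Every student paired with every course:
    SELECT S.StudentID, C.CourseID
    FROM STUDENTS S, COURSES C;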

    Composition: the composition of the relation A{X,Y} with the relation B{Y,Z} is the projection on X and Z of a join of A with B, always considering that X,Y and Y,Z are attributes of the corresponding relations A and B (Date, Nov. 1998).

    Consistency: at this point, Date states that Codd's definition of consistency doesn't capture all aspects of integrity. My interpretation of Codd's definition of consistency is as follows: a set of relations is consistent when the redundancies match; by this I mean that the redundant data do not have different values. According to Codd, to ensure consistency the system should be informed about the redundancies in order to control them (Date, Nov. 1998).

    Derivability: a relation is said to be derivable if and only if it is expressible (Date, Nov. 1998).

    Difference: the difference of two relations is a third relation formed by the tuples of the first

    relation that do not appear in the second relation (Kroenke).
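    In SQL this corresponds to the EXCEPT operator (spelled MINUS in some dialects); a sketch, where GRADUATES is a hypothetical table union-compatible with the projection of STUDENTS:

    -- Student IDs present in STUDENTS but absent from GRADUATES:
    SELECT StudentID FROM STUDENTS
    EXCEPT
    SELECT StudentID FROM GRADUATES;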

    Division: the division is the inverse of the Cartesian product, as you can see in the following formula quoted from Date's article: ( R TIMES S ) DIVIDEBY S = R (Date, Jan. 1999).
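    Division has no direct SQL operator; a common sketch uses a double NOT EXISTS. For example, to find the students who study every course (an illustrative use of the paper's STUDIES and COURSES tables):

    -- STUDIES DIVIDEBY COURSES: students with no course they do not study.
    SELECT DISTINCT E.StudentID
    FROM STUDIES E
    WHERE NOT EXISTS
          (SELECT * FROM COURSES C
           WHERE NOT EXISTS
                 (SELECT * FROM STUDIES E2
                  WHERE E2.StudentID = E.StudentID
                    AND E2.CourseID = C.CourseID));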


    Factoring: this operation is also called nesting. The factoring operator transforms a normalized

    relation into an unnormalized relation (Date, Jan.1999)

    First Normal Form (1NF): the definition of 1NF is not provided in the article; however, I will add it here. According to Kroenke, a table is said to be in 1NF when its cells have single values (neither repeating groups nor arrays are allowed as values) (118).

    Full Dependence: the functional dependence A → B is a full dependence when B is not functionally dependent on any proper subset of A (Date, March 1999).

    Intersection: the intersection of two relations is a third relation formed by the tuples contained in

    both relations (Kroenke).
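    In SQL this is the INTERSECT operator (a sketch, reusing the hypothetical GRADUATES table introduced under Difference):

    -- Student IDs present in both tables:
    SELECT StudentID FROM STUDENTS
    INTERSECT
    SELECT StudentID FROM GRADUATES;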

    Join: the join of a relation A{X,Y} with a relation B{Y,Z} is a relation C{X,Y,Z} where C{X,Y}=A and C{Y,Z}=B. The natural join is a special case of the join operation; it is the only join possible when the attribute X is functionally dependent on the attribute Y (in relation A) or the attribute Z is functionally dependent on the attribute Y (in relation B) (Date, Nov. 1998). In other words, the join of two relations is a third relation formed by applying the product operator and then selecting only the tuples that satisfy a specified criterion (Kroenke). In the natural join operation, redundant attributes must be eliminated.
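    A sketch in SQL, joining the paper's STUDENTS and CITY tables over their common ZipCode attribute:

    -- Natural join of STUDENTS and CITY; the common column is
    -- projected only once, eliminating the redundant copy.
    SELECT S.StudentID, S.Student_Name, S.ZipCode, L.City_Name
    FROM STUDENTS S, CITY L
    WHERE S.ZipCode = L.ZipCode;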

    Key: the unique identifier of a tuple in the relation. The key is said to be nonredundant when each attribute that constitutes the key is necessary for unique identification. This definition implies that if you remove an attribute from the key, then you no longer have a unique identifier (Date, Oct. 1998).


    Normalization: Date defines the normalization process as a process of taking projections, noting further that we can recover the original relation by taking the natural join of those projections, which implies that the process is reversible: the decomposition is nonloss (Date, March 1999).
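    A sketch of this in SQL, assuming a hypothetical unnormalized relation STUDENTS_CITY{StudentID, Student_Name, ZipCode, City_Name} (CREATE TABLE ... AS is a common, though not universal, SQL form):

    -- Decomposition by taking projections:
    CREATE TABLE STUDENTS AS
        SELECT DISTINCT StudentID, Student_Name, ZipCode FROM STUDENTS_CITY;
    CREATE TABLE CITY AS
        SELECT DISTINCT ZipCode, City_Name FROM STUDENTS_CITY;
    -- The natural join of these projections recovers the original
    -- relation: the decomposition is nonloss.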

    Permutation: this operation allows the reordering of the attributes of a given relation. One year after his first paper, Codd stated that the order of the attributes of a given relation is irrelevant, so this operation would remain for the internal use of the language (Date, Nov. 1998).

    Projection: it implies that the points in n-dimensional space corresponding to a relation of degree n can be projected over m of its attributes, onto the corresponding m axes (where m is always less than or equal to n).


    Second Normal Form (2NF): Codd's definition of 2NF is that a relation must be in 1NF and every nonprime attribute must be fully dependent on each candidate key. A prime attribute is an attribute that forms part of at least one candidate key; a nonprime attribute is an attribute that is not prime (Date, March 1999).

    Third Normal Form (3NF): Codd's definition of 3NF is that a relation R is in 3NF if it is in 2NF and every nonprime attribute is nontransitively dependent on each candidate key of R (Date, March 1999).

    Tie: it implies that given a relation with n attributes (X1 to Xn), the result of this operation is the

    restriction of the relation to just the particular rows in which Xn=X1 (Date, Nov.1998).

    Trivial Dependence: A → B is a trivial functional dependence if B is a subset of A.

    Union: the union of two relations is a third relation formed by adding the tuples of both relations

    and eliminating duplicate tuples (Kroenke).
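    In SQL this is the UNION operator (a sketch, again with the hypothetical GRADUATES table):

    -- All student IDs occurring in either table, duplicates eliminated:
    SELECT StudentID FROM STUDENTS
    UNION
    SELECT StudentID FROM GRADUATES;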


    References

    Atre, S., Data Base: Structured Techniques for Design, Performance, and Management. John Wiley & Sons, Inc., 1980.

    Date C.J., An Introduction to Database Systems. Reading, Massachusetts: Addison-Wesley

    Publishing Company, 1982.

    Davis, B., Data Base Management Systems: User Experience in the U.S.A. NCC Publications, 1975.

    Dutka, Alan F. and Hanson, Howard H., Fundamentals of Data Normalization. Reading,

    Massachusetts: Addison-Wesley Publishing Company, 1989.

    Elmasri, R. and Navathe, S.B., Fundamentals of Database Systems. Redwood City, California: The Benjamin/Cummings Publishing Company, Inc., 1994.

    Kroenke, D.M., Database Processing: Fundamentals, Design, and Implementation (6th edition).

    Upper Saddle River, N.J.: Prentice Hall, 1998.

    Haderle, Donald J., "Database Role in Information Systems: The Evolution of Database Technology and Its Impact on Enterprise Information Systems". In Goos, G. and Hartmanis, J. (eds.), Lecture Notes in Computer Science. Springer-Verlag, 1990.

    Hawryszkiewycz I.T., Database Analysis and Design, Macmillan Publishing Company, 1991.

    Nijssen G.M. and Halpin T.A., Conceptual Schema and Relational Database Design: A Fact

    Oriented Approach. New York: Prentice Hall, 1989.


    Turk, Thomas A., Planning and Designing the Data Base Environment. Van Nostrand Reinhold Company Inc., 1985.

    Yao, S. Bing, Principles of Database Design, Volume I: Logical Organizations, New Jersey:

    Prentice Hall, 1985.

    Articles:

    Date, C.J., "The Birth of the Relational Model" (part 1 of 3), Intelligent Enterprise, October 1998. In this article Date explains Codd's first relational paper: "Derivability, Redundancy, and Consistency of Relations Stored in Large Data Banks", IBM Research Report RJ599, 1969.

    Date, C.J., "The Birth of the Relational Model" (part 2 of 3), Intelligent Enterprise, November 1998. This article also examines Codd's first relational paper: "Derivability, Redundancy, and Consistency of Relations Stored in Large Data Banks", IBM Research Report RJ599, 1969.

    Date, C.J., "The Birth of the Relational Model" (part 3 of 3), Intelligent Enterprise, December 1998. This article discusses Codd's papers regarding the Relational Model; in particular, it refers to the following paper: Codd, E.F., "A Relational Model of Data for Large Shared Data Banks", Communications of the ACM 13, No. 6, 1970.

    Date, C.J., "Thirty Years of Relational: Codd's Relational Algebra", Intelligent Enterprise, January 1999. In this article, Date analyzes Codd's third famous paper: "Relational Completeness of Data Base Sublanguages", published in 1972 (IBM Research Report RJ987).

    Date, C.J., "Thirty Years of Relational: The First Three Normal Forms", Intelligent Enterprise, March 1999. In this article, Date analyzes three of Codd's relational papers: "The Second and Third Normal Forms for the Relational Model", IBM technical memo (October 6, 1970); "Further Normalization of the Data Base Relational Model", IBM Research Report RJ909 (August 31, 1971); and "Normalized Data Base Structure: A Brief Tutorial", ACM SIGFIDET Workshop on Data Description, Access, and Control, San Diego (November 11-12, 1971).