sql unit 1 an introduction to relational databases

1

SQL Unit 1An Introduction to Relational Databases

Kirk Scott

2

• 1.1 Entities, Tables, and Primary Keys• 1.2 One-to-Many Relationships and Foreign

Keys• 1.3 One-to-One and Many-to-Many

Relationships• 1.4 An Introduction to Data Types• 1.5 Nulls and Integrity

3

1.1 Entities

• 1.1.1 Definition of an Entity• The term "entity" refers to any individual

item that can have information stored about it in a database.

• This may be a person, a thing, or some sort of abstraction that doesn't have a physical existence.

4

1.1.2 Describing Entities by Means of their Characteristics

• In order to have information about an entity stored in a database, it has to be possible to describe it using values which may be words or phrases or numeric quantities.

5

1.1.3 Storing Information about Entities in Tables

• Information about entities is stored in rectangular tables.

• A table may contain information about more than one entity of the same kind.

• If there are multiple rows in a table, then you are storing information on that many different entities.

• The information about each individual entity is contained in a single row in the table.

• The values which describe that entity are contained in the columns of that row.

6

• Suppose you decide that people are entities that you want to store information about.

• This is an illustration of the general idea:

SSN name dob123-45-6789 Bob 1/1/01…

7

1.1.4 Table Schema Notation

• Table schema notation can be used to specify the name of a table and its fields without providing sample data.

• Sometimes people are tempted to give tables plural names.

• It turns out to be less confusing to talk about tables and their contents if the table names are singular.

• The name of the table should describe a single instance of the kind of entity stored in a row of the table.

8

• For example, if you want to store information about people, you would give the table the name "Person".

• Here is the table specification using this naming convention and schema notation:

• • Person(SSN, name, dob)

9

1.1.5 Parallel Sets of Terminology for Relational Tables

• Table, Row, Column• File, Record, Field• Relation, Tuple, Attribute

10

• There are three parallel sets of terminology when referring to the structure of data mentioned above.

• One set has already been used above. • Each row in the table contains information about

one entity. • Each column contains a value describing a

particular characteristic of the entity. • The characteristic described by a given column is

the same for all of the entities in the table.

11

• An older set of terminology, which is still used, refers to files, records, and fields instead of tables, rows, and columns.

• Sometimes the term flat file is used. • This means that each record in the file has the

same number of fields.

12

• A more theoretical set of terminology refers to relations, tuples, and attributes, respectively.

• The terms relation and tuple may be somewhat obscure, but the term attribute is very descriptive.

• Each column in the table contains a value which describes an attribute of the entity in question.

13

• In these notes all of these sets of terminology may be used at one time or another, and they may be mixed up, referring to attributes of records, or rows of files, for example.

• You may also use the terminology indiscriminately when answering questions.

• It's simply important that you know what the different terms mean.

14

1.1.6 Deciding on Entities in a Database

• The designer of a database has to determine what entities will have information stored about them.

• Entities are conceptual in nature. • For any given situation, there is not necessarily

just one correct set of entities which will describe the situation.

• Suppose you want to record information about mothers and children, for example.

15

• You could regard both mothers and children as instances of persons, and decide that the underlying entity is a person.

• You could also decide that mothers and children are distinct entities in your view of the world.

• This second approach would have to be taken if the attributes you store for mothers and children are different.

16

1.1.7 Incorrect Design 1: Mixing Types of Entities in Tables

• Suppose that you have decided to record different attributes for mothers and children and that they are conceptually different entities.

• Suppose also that you show the relationship between mothers and children by putting the records for children after their mother's record.

• It may be convenient or customary to show information in this way, but this is not a correct table design:

17

motherA …child1 …child2 …motherB …child3 …… …

18

• This is not allowed under the relational model.

• The theory of relational databases is similar to set theory in math.

• The rows in the tables are like elements in a set.

• All of the elements have to be of the same kind.

19

• If two kinds of entities are different, then they belong in different tables, even if they have the same number of attributes.

• Each table has to contain information about one kind of entity only.

• One table can't contain records for two different kinds of entities.

20

• It is also true that the order of the elements of a set does not have any meaning.

• Likewise, the order of the rows of a table has no meaning.

• It may be convenient for users to see the rows sorted in some particular order, and it will be possible to display information in this way.

• But intrinsically the order is immaterial.

21

• If you enter data in a particular order, that can imply no relationship between rows;

• also, if you happen to see the data displayed in a particular order, you can infer no relationship between the rows.

• It is not permissible to store relationships between entities by means of their relative positions in tables.

• Relationships can only be captured by means of attribute values.

22

1.1.8 Incorrect Design 2: Repeating Fields or Multi-valued Fields

• It may also be convenient to show information as outlined below, but this is also incorrect:

motherA child1 child2 motherB child3 …

23

• This is not allowed under the relational model. • Different mothers may have different numbers of

children. • This means that the number of attributes in a

row for a mother could vary. • You may think that it would be possible to set a

maximum number of children per mother and use that to set a fixed number of columns per row.

24

• However, for any number you choose there are two undesirable results:

• You may encounter a mother who has more children than the maximum;

• and for all mothers who have less than the maximum you have lots of wasted space for information about children.

25

• This example emphasizes the idea of a flat file.

• The rows of a table can't be jagged. • They all have to contain the same set of

attributes.

26

1.1.9 Incorrect Design 3: Concatenating Related Records

• There is at least one more alternative design, which is incorrect.

• It is worth taking a look at because it illustrates a different set of problems.

• Suppose for the purposes of illustration that each mother had 4 attributes and each child had 2 attributes.

• Then let the table be designed so that each row contained all of the information about one mother and one of her children.

27

For example:

motherA … … … child1 …motherA … … … child2 …motherB … … … child3 …

28

• The fundamental problem remains that information about more than one different kind of entity is being stored in one table.

• The practical problem with this design is redundancy.

• If a mother has more than one child, then the mother's data is repeated for as many children as she has.

29

• This is bad for at least three reasons. • 1. It wastes space. • 2. If the mother's information ever has to be

updated, more than one record may have to be updated, rather than just one.

30

• 3. And the most insidious problem of all is this: • By recording the same information more than

one time, you open up the possibility that different values will be recorded for the same attribute in different places.

• If this happens, it is clear that at least one of the entries is wrong, but it's impossible to tell which one.

• This is known as a data integrity problem.

31

1.1.10 No Duplicate Records; Primary Keys

• Relations are like sets in another way. • Duplicate elements are not allowed in a set,

and duplicate rows are not allowed in tables. • This makes perfect sense. • What purpose would it serve to store the

information about a given entity more than one time?

32

• Another way of saying that there can be no duplicate records is that all of the records in a file are unique.

• In other words, when taking the values in all fields of each record into account, no two records in the table contain exactly the same set of values.

33

• It is also customary to have a single field which uniquely identifies each record.

• That is to say, there are no duplicate values for that field in the whole table.

• When recording information about people, their social security number is a good example of this.

• No two people are supposed to have the same social security number.

• If you know their number, you have positive identification and you can look them up and find out other information about them.

34

• It is customary, but not required, to have the unique identifier be the first field in the table.

• This field is called the primary key of the table. This picture, given previously, illustrates the general idea:

SSN name dob123-45-6789 Bob 1/1/01…

35

• The schema notation for a table can be expanded to show which field is the primary key. One alternative is to underline the primary key field:

• • Person(SSN, name, dob)• • Another alternative is to explicitly mark the primary key

field by following it with the abbreviation p.k.:• • Person(SSN p.k., name, dob)

36

1.2 One-to-Many Relationships and Foreign Keys

37

1.2.1 All Tables in a Database are Related

• In theory, you could have a database consisting of one table, containing information about just one type of entity.

• In practice, a database will store information about more than one type of entity and will consist of more than one table.

• Each table in a database has to be related in some way to at least one of the other tables in the database.

38

• Collectively, all of the tables have to be related to each other.

• Informally, the first step in database design is determining the entities and attributes involved.

• The second step is determining the relationships among the entities.

39

1.2.2 The Three Kinds of Relationships

• There are three kinds of relationships that can exist between entities:

• one-to-one (1-1), • one-to-many (1-m), • and many-to-many (m-n). • Each of these kinds of relationships can be

captured in a relational database design.

40

1.2.3 The One-to-Many Relationship; ER Notation

• It turns out that the 1-m relationship is the most basic one. • The mother-child relationship is of this type, and it can be

illustrated using an entity-relationship (ER) diagram, shown below.

• In this kind of diagram the tables are represented as rectangles and the relationship between the tables is represented by a line between them.

• One end of the line is forked. • This is known as a crow's foot, and it is this end of the

relationship which is "many":

41

Mother Child

42

• It is also possible to include field names using this notation.

• In this example mid stands for mother id and kid stands for child id.

• These fields are the primary keys and they are indicated with the notation p.k.

43

mid p.k.name…

kid p.k.name…

Mother Child

44

1.2.4 Foreign Keys

• Capturing the relationship between two tables depends on the use of what is called a foreign key.

• A foreign key is a field in one table which happens to be the primary key field of another table.

• Foreign key can be abbreviated f.k. • The way to capture a 1-m relationship is to embed

the primary key of the "one" table as a foreign key in the "many" table.

45

• Continuing to use ER notation, the example above can be expanded to show the 1-m, or primary key to foreign key relationship.

• The mid, the primary key of the mother table, is embedded as a foreign key (with the same name) in the child table:

46

mid p.k.name…

kid p.k.namemid f.k.…

Mother Child

47

• In some books the list of fields in the Child table wouldn't explicitly show the mid field.

• They are relying on the crow's foot notation to indicate that the p.k. of the Mother table would be a f.k. in the Child table.

• It is redundant, but probably clearer, to show all of the fields in each table explicitly.

48

• The foreign key field doesn't have to have the same name as the corresponding primary key field.

• As an illustration, in this example it would be possible to have mid p.k. in the Mother table and motherid f.k. in the Child table.

• It is the notations p.k. and f.k. which represent the roles of the fields in the tables, not the names of the fields themselves.

49

• Another way of illustrating this relationship is as follows:

Mother Child

mid kid mid

50

• In this representation the arrow graphically shows the primary key field being embedded as a foreign key field.

• In some books the arrow might be shown going in the opposite direction.

• When done in that way it represents the fact that values in the "many" table refer back to values in the "one" table.

• The idea remains the same. • To capture a 1-m relationship, the primary key of the

"one" table is embedded as a foreign key in the "many" table.

51

1.2.5 A Concrete Example• The presentation so far has basically been about notation. • The underlying idea may become clearer if a concrete

example is shown. • Suppose that the Mother and Child tables consist of these

simple designs. • Notice the use of the abbreviation f.k. again to identify the

foreign key in the design:• • Mother(mid, name)• Child(kid, name, mid f.k.)

52

• The underlying assumption is that the mid fields in both tables are of the same type and have the same meaning, namely, they identify one of the mothers.

• Suppose that the tables contain this information:

53

Mothermid name1 Lily2 Matilda

Childkid name mida Ned 2b Ann 1c June 2

54

• The important observation is that the values in the mid field in the Child table tell you that Ned and June's mother is Matilda and Ann's mother is Lily.

• It is the values of the mid fields in the two tables that capture the relationship between them.

55

1.2.6 A Relationship between a Table and Itself

• Here is one last iteration on the idea of entities and capturing relationships between them.

• You could decide that the basic entity you're dealing with is a person, but you would also like to record who is whose mother among person entities.

• Here is an example of a design that accomplishes this:

56

Personpid name mid1 Ned 22 Ann3 June 2

57

• In this example, pid is the primary key and mid is the foreign key.

• mid refers to pid. • In other words, the primary key of the table is

embedded as a foreign key (necessarily with a different name) in the same table.

• What you gather from the data shown is that Ned and June are Ann's children.

• Ann's mother doesn't appear in the table.

58

• There are not literally two copies of the Person table, but the relationship between the table and itself could be diagrammed in this way:

pid p.k.namemid f.k.

pid p.k.namemid f.k.…

Person as Mother

Person as Child

59

• Do not mistake this design for one of the incorrect designs given earlier.

• The basic entity is person. • Some people are mothers and some are children

and you are recording information about the same set of attributes for both.

• The important point is that you can't tell who a mother's children are or vice-versa by whether or not their records are located next to each other.

60

• The relationship between records is captured by the values stored in the fields.

• The one-to-many relationship is captured by the fact that the primary key value of the entity which plays the "one" role is embedded as a foreign key value in the record of the entity which plays the "many" role.

• In this case, both entities happen to exist in the same table.

61

• Here is another diagram for this situation.

pid p.k.namemid f.k.

Person

Is the mother of

62

• This diagram is better than the previous one because it doesn't suggest that there are two copies of the Person table.

• On the other hand it is a bit more cryptic. • This diagram illustrates the fact that when the

relationship is not obvious, then the line indicating the relationship has to be given an explanatory label.

• Either diagram is acceptable.

63

1.3 One-to-One and Many-to-Many Relationships

64

1.3.1 Background

• The correct approach to implementing a 1-m relationship was given in the previous section.

• This section will show how to implement 1-1 and m-n relationships.

• The idea behind the examples will be biological mating patterns.

65

• The 1-m pattern will be repeated using cows as an example.

• In a herd social structure, one bull will acquire a harem of cows.

• Geese will be used to illustrate the 1-1 structure. • Canada geese, for example, typically mate for life. • They will only find a new partner if their old

partner dies.

66

• Finally, chimpanzees will be used to illustrate the m-n structure.

• Throughout their lives, chimpanzees may change partners and have children with different partners.

• Their social structure is not based on what humans call a nuclear family.

67

1.3.2 The One-to-Many Relationship Again

• There is just the one way of implementing a 1-m relationship.

• The primary key of the "one" table is embedded as a foreign key in the "many" table.

• The tables below illustrate this idea again.

68

Bullbid name1 Ferdinand2 Durham

Cowcid name bida Elsie 2b Bossy 1c Daisy 2

69

1.3.3 The One-to-One Relationship as a Single Table

• With the 1-1 model, there is a choice to be made.

• Suppose that the match-up of two entities is truly permanent.

• In this case, it would be possible to re-analyze the situation and determine that the match-up itself was a base entity, and make a single table for that.

• For example:

70

Goose Pairspair id goose name gander namea Gaye Garyb Gabriela Gusc Gladys Gil

71

1.3.4 The One-to-One Relationship as Two Tables

• Most of the time it is impractical to assume that the match-up is a base entity.

• The design alternative is to put the two kinds of entities into two different tables and link the tables with a primary key, foreign key pair.

• This is one alternative for handling the situation:

72

Goosegoose id goose name gander ida Gaye 1b Gabriela 2c Gladys 3

Gandergander id gander name1 Gary2 Gus3 Gil

73

• There are several things happening with this example that bear some explanation.

• This technique for capturing a 1-1 relationship allows for data to be included that would make it 1-m.

• Every database design has underlying assumptions.

• The assumption here is that the relationship is 1-1.

74

• It is up to the user to make sure that no data is entered into the tables which would imply a 1-m relationship.

• There is a choice in modeling a 1-1 relationship in this way:

• Which primary key should be embedded in the other table as a foreign key?

• In this example, the primary key of the Gander table is embedded as a foreign key in the Goose table.

• It would also be possible to embed the primary key of the Goose table as a foreign key in the Gander table.

75

• It doesn't make a difference in this example, but in real life, if there is any chance that there might be exceptions or that in the future the relationship could become 1-m, it is important to embed the primary key of what would be the "one" table as a foreign key in what would become the "many" table.

• Another concern which will be explained further in the future is minimizing the number of null values in tables.

76

1.3.5 An Incorrect Design for the One-to-One Relationship

• Beginning database designers sometimes make this mistake:

• They think that if one embedded primary key is a good thing, then two embedded primary keys are a better thing.

• Going back to the notation for embedding which uses arrows, this illustrates what they try to do, which is wrong:

77

Gander Goose

Ganderid, p.k. Gooseid, p.k. Ganderid, f.k.Gooseid, f.k.

78

• The problem with this design is that it's redundant.

• There is only one relationship between the tables, but it is captured twice.

• First of all, this is wasteful.

79

• The second flaw in this arrangement is that it opens the possibility of mistakenly storing conflicting data.

• What if Gus's primary key is the foreign key value in the record for Gabriela, but Gabriela's primary key value is not the foreign key value in the record for Gus?

80

• Another concern which will be explained in greater detail later on is that it may become impossible to enter data into the tables.

• This can result because additional constraints can be added which specify that fields may not be null and values may not be entered into foreign key fields unless they exist in primary key fields.

81

1.3.6 The Many-to-Many Relationship with a "Table in the Middle"

• If two kinds of entities are in a m-n relationship, in addition to the tables for the two kinds of entities, a third table is needed.

• The m-n relationship is captured by two 1-m relationships.

• Each of the base entity tables is in a 1-m relationship with the "table in the middle".

• Using the ER modeling notation, this is how it works:

82

Entity1 Table in the middle Entity2

83

Here is a simple example using data:

Female chimpfcid name1 Carol2 Alice3 Sue4 Jill5 Ann6 Barb

Pairingfcid mcid1 11 22 12 25 36 33 4

Male chimpmcid name1 Bob2 Ted3 John4 Lou

84

1.3.7 Cardinalities, or the Number of Entities on each End of a Relationship

• This example raises several points. • Notice that some of the chimps do not participate

in a pairing at all, namely female 4. • Some of the chimps participate in a 1-1

relationship, namely female 3 and male 4. • Some of the chimps are in a 1-m relationship,

namely male 3 and females 5 and 6. • Finally, males 1 and 2 and females 1 and 2 exhibit

an m-n relationship.

85

• When referring to relationships like m-n or 1-m, unless other assumptions are explicitly stated, it's customary to assume that the relationship may involve 0 or more match-ups on the "many" side of the relationship.

• It is also customary to assume that the relationship may involve 0 or 1 match-ups on the "one" side of the relationship.

86

1.3.8 The Primary Key of the Table in the Middle, Concatenated Keys

• The second point raised by the foregoing example has to do with the primary key of the table in the middle.

• As given, the pair of embedded foreign keys in the table in the middle is unique for each row.

• It is not necessary to invent a new primary key for the table.

• When more than one field together make a unique identifier and this is used as the primary key, this combination is known as a concatenated key field.

87

• Using the notation for table design given earlier, this m-n design can be represented as follows.

• Notice that underlining is used to show that something is part of the primary key, and at the same time the abbreviation f.k. is used to show that something is a foreign key.

• fcid and mcid together are the primary key field of the table in the middle.

• At the same time, each separately is a foreign key field:

88

• Female chimp(fcid, name)• Pairing(fcid f.k., mcid f.k.)• Male chimp(mcid, name)

• Or

• Female chimp(fcid p.k., name)• Pairing(fcid p.k. f.k., mcid p.k. f.k.)• Male chimp(mcid p.k., name)

89

1.3.9 A Separate Primary Key Field for the Table in the Middle

• The third point raised by this example has to do with adding a separate primary key field to the table.

• This may be handy, and there is nothing wrong with it.

• For example, the table in the middle might now take this form:

90

Pairingpid fcid mcid1 1 12 1 23 2 14 2 25 5 36 6 37 3 4

91

• The design could then look like this:• • Pairing(pid, fcid f.k., mcid f.k.)

• Or

• Pairing(pid p.k., fcid f.k., mcid f.k.)

92

1.3.10 Recording Historical Data

• This leads to the fourth, and conceptually most interesting point.

• In m-n relationships of this kind, the different pairings may occur at different times, and this may be important information to record.

• There may be different pairings of the same partners at different times.

• It is also possible that the times of pairings of different partners could overlap.

93

• This kind of information can be captured by adding a date or time field to the table in the middle, or possibly a beginning date or time and ending date or time field.

• For example:Pairing

fcid mcid beginning date end date1 1 1/1/01 1/6/011 2 2/2/02 2/7/021 1 3/3/03 3/8/03...

94

• As shown above, with time or date fields included, it is still reasonable to assume that each row would be unique and that no separate primary key field would be needed.

• Every field in the table is part of the key, and the design would look like this:

• • Pairing(fcid f.k., mcid f.k., beginning date, end

date)

95

• On the other hand, it might still be useful to have a simple primary key, in which case the design would look like this:

• • Pairing(pid, fcid f.k., mcid f.k., beginning date,

end date)

96

• It is always possible that the table in the middle would also have other fields containing useful information concerning the match-up.

• For example, there might be a field which contains yes or no values depending on whether the pairing resulted in offspring.

• In that case, the table design would expand to this:• • Pairing(pid, fcid f.k., mcid f.k., beginning date, end

date, offspring)

97

1.3.11 An Incorrect Design for the Many-to-Many Relationship

• The last remark on implementation choices for m-n relationships is the following:

• Beginning designers sometimes make the same mistake here as with 1-m relationships.

• They try to capture the relationship without the table in the middle and they embed the primary key of each of the base tables as a foreign key in the other.

98

• Not only does this suffer from the same general problems as when applied to 1-m relationships;

• it is also a deadly mistake when trying to represent m-n relationships.

• The incorrect design is shown below with the embedded foreign key fields.

99

Female chimpfcid name mcid1 Carol2 Alice3 Sue

Male chimpmcid name fcid1 Bob2 Ted3 John

100

• Suppose that the relationships among the chimps are completely promiscuous.

• Each female is paired with each male and vice-versa.

• There are 3 of each which means there are 9 possible different pairings.

101

• There are only 6 blank foreign key values where these relationships could be recorded.

• It is simply impossible to record all of the relationships with this faulty design.

• It is also undesirable that one concept, a pairing between a male and a female, may be recorded in two different ways.

102

1.3.12 A Table in a Many-to-Many Relationship with Itself

• Recall that a one-to-many relationship between mothers and children could be re-analyzed as a relationship among people, which led to a one-to-many relationship between the Person table and itself.

• Similarly, information about male and female chimps can be stored in a single table.

• All that's required to keep track of gender is the addition of a new field to the table.

103

• Then the relationships among chimps can be captured as a table in the middle, which matches one chimp record with another.

• The diagram below represents the idea as if there were two copies of the Chimp table, labeled (1) and (2).

• As before, keep in mind that there are not actually two copies of the table.

• Incidentally, this design allows for same sex relationships and relationships "with yourself".

104

cid p.k.namegender

cid p.k.namegender

Chimp (1)

Chimp (2)

cid1 p.k., f.k.cid2 p.k., f.k.beginning dateend date

105

• Whether or not such relationships are allowed are questions which have to be documented as part of the overall design separately from the designs of the individual tables and their primary key to foreign key relationships.

• If some kinds of relationships are not allowed, such restrictions can only be enforced by data integrity rules which are applied when data is entered into the tables.

106

This is also a correct diagram for this design:

cid p.k.namegender

Chimp

cid1 p.k., f.k.cid2 p.k., f.k.beginning dateend date

107

1.4 An Introduction to Data Types

108

1.4.1 What is a Data Type? Illustrating with Text vs. Numeric

• When creating a table using SQL, it's necessary to specify what kinds of data are stored in the fields of the table.

• Each field can only contain one kind of data, or data type.

• The idea of a data type can be explained using two broad categories:

• Text data and numeric data.

109

• If a field contains non-numeric data, or if it contains numeric digits, but it makes no sense to apply mathematical operations to them, then this field should be given a text data type.

• This kind of data is sometimes called alphanumeric because the field may contain letters or digits.

110

• If a field in a table contains numeric digits, and it would make sense to apply mathematical operations to the stored values, like adding or subtracting them, then this field should be given a numeric data type.

111

• This section is not an exhaustive treatment of all of the different types supported by SQL or all of their characteristics.

• In particular, it does not cover the specific details of any particular implementation of SQL.

• It is intended to give a general idea about the types and expand on the information given in the first note file.

112

• General information on what a complete implementation of SQL should include can be found by turning to the documentation of the SQL standard.

• Specific information on the data types supported by a specific implementation can be found by turning to the documentation for that implementation.

113

1.4.2 Other Data Types: Monetary and Date

• In addition to numeric fields and text fields, there are two other kinds of data type which are useful even for simple databases.

• There is a monetary data type. • Its values can be treated numerically, but it

has built-in formatting characteristics associated with money.

114

• The monetary or currency data type is a feature of MS Access.

• For example, OpenOffice Base and MySQL do not have a currency data type.

• There is also a date data type. • This has both numeric and textual

characteristics which make it useful for showing dates and times.

115

1.4.3 Other Data Types and Synonyms

• There are a fair number of different specific data types defined in SQL.

• Not all of them will be covered here. • In addition, there are a fair number of

synonyms for the various data types. • The text, numeric, monetary, and date data

types are briefly introduced below, along with their synonyms.

116

1.4.4 Text Data Types

• The text data type can be designated with the keyword TEXT or the keyword CHAR.

• CHAR is the standard name of the type.• TEXT is a synonym used in MS Access.

117

• When specifying a text field, it is necessary to specify its length (also known as its width) by putting a number in parentheses behind the keyword.

• For example, if you want to allow up to 12 characters for a name field, the type designation would be either one of the following choices:

• TEXT(12)• CHAR(12)

118

• There is an additional keyword, VARCHAR, which works in a similar way.

• It is worth knowing about because it is commonly used in place of CHAR.

• If a field is defined as TEXT(12) or CHAR(12), then exactly 12 spaces are reserved in storage to hold any value that might be entered.

119

• If a field is defined as• VARCHAR(12)• then a value up to 12 characters long could be

stored in it. • However, if a shorter value is stored, only that

amount of space is used. • This difference is not apparent to the user, but

some database designers make use of VARCHAR because of the space saving.

120

• In large databases, these considerations can be important.

• It is also the case that although TEXT is in general a synonym for CHAR, TEXT fields can be declared without lengths.

• Depending on the system, they may be able to store thousands or millions or more characters.

• The length of such a TEXT field will vary according to the amount of data stored in it.

121

1.4.5 When is a Number Field Not a Number?

• It is important to keep in mind what kinds of fields might be declared as text fields.

• A person's name typically only includes letters of the alphabet.

• A person's address might include a house number as well as a street name—a mixture of letters and numbers.

• Both of these fields should be text fields.

122

• An even more important example would be things like id numbers, serial numbers, etc.

• The common names for these attributes include the word "number", but they are not numbers.

• If there is never any purpose in applying arithmetic operations to them, they should not be defined as numeric fields.

• Depending on the source, serial numbers may include letters, digits, symbols such as dashes and so on.

123

• Social security numbers are the classic example. • In their pure form, they are simply defined as

sequences of 9 digits (the dashes separating the 3 groups of digits are optional).

• Even though they always consist entirely of digits, they are not numbers, and should not be defined as numeric fields.

• Telephone numbers and zip codes also fall into the same category.

124

1.4.6 Text Field Values Should Be Enclosed in Quotation Marks

• There is one last consideration concerning text fields.

• When writing SQL statements that refer to specific values, those values have to be in quotation marks.

• So, for example, if I have a name field, a value in that field would be represented as 'Fred'.

• If I had a social security number field, a value in that field would be represented as '473608932'.

125

• Depending on the implementation of SQL, single quotes or double quotes may be used to enclose the values of text fields.

• It turns out that even in some systems that support double quotes, there are situations where only single quotes will do.

• In these notes, single quotes will be used because they are the most likely to be correct in all situations.

126

• It is also worth noting that you may run into trouble when copying and pasting answers into an SQL editor.

• The editor may not recognize curly quotes: “, ”, ‘, ’. • If so, they should be replaced with straight quotes: ",

'. • There are other uses for quotation marks in SQL

besides enclosing the values of fields. • As you will find out later, one of those other uses

requires double quotes.

127

1.4.7 Numeric Data Types• The general kind of data that has the most different kinds

of types in SQL is numeric data. • There is a broad division between numbers that have

decimal places and those that don't, and within each category there are several different types.

• For the time being the integral data types will be ignored. • It is always possible to use a data type that allows decimal

places and simply enter whole number values into it. • Doing that makes life simpler because you don't have to

remember the different integral types.

128

• Among the numeric types that allow decimal places there is a distinction based on how many bytes are used internally to store the values.

• To the beginning user, these distinctions don't make a difference.

• There are a large number of synonyms covering all of the possibilities, but for practical purposes, you can simply choose between the synonyms NUMBER and NUMERIC.

129

• For those more familiar with the terminology of programming languages, one of these synonyms may be more memorable or meaningful:

• REAL, FLOAT, SINGLE, DOUBLE. • All of these types store values in binary form.

130

• There is also a numeric type named DECIMAL.

• This stores values in base 10. • This kind of representation takes up more

space than the other numeric types.• For those situations where genuine base 10

numbers are desired, this type is the correct option.

131

• A field is declared decimal(m, n), where m represents the total number of decimal digits and n represents the number of digits to the right of the decimal point.

• This data type is useful in systems like OpenOffice Base and MySQL, which don’t have a monetary data type.

• A declaration of decimal(m, 2), for example, would be suitable for holding monetary values.

132

• In MS Access, on output, currency fields will be formatted to show things like $ signs.

• Decimal fields will just be displayed as numbers.

133

• When representing numeric values, you never enclose them in quotes. If the type allows a decimal point, then one can be used.

• If the number is a whole number, it is not necessary to append .0 at the end.

• Groups of digits should never be separated by commas or any other punctuation mark.

134

• No other symbol, like a $ sign, is allowed. • Only the single decimal point is allowed. • So, for example, a valid representation of a

numeric value might look like this: 1894364.562

135

• This is a list of some of the numeric types and the number of bytes used to store them:

• • TINYINT 1 byte• SMALLINT 2 bytes• INTEGER 4 bytes (range: -231 to 231-1)• REAL 4 bytes, floating point• FLOAT 8 bytes, floating point• DECIMAL 17 bytes, genuine decimal representation without loss of

precision due to binary conversion. When declaring a field of this type, it is possible to specify how the digits are used. For example: fieldnameDECIMAL(total number of digits, digits to the right of the decimal point).

136

1.4.8 Monetary Data Types

• The monetary data types are specialized numeric types.

• In other words, it's possible to do arithmetic with these kinds of fields because they really do represent numbers, but these types also have other characteristics.

137

• The keyword for such a field is MONEY or CURRENCY.

• Such fields always carry 4 decimal places internally for computation, but they always display 2 decimal places by default.

• They also display the currency symbol, $ for example, along with the numeric value.

138

• On the other hand, the same rules apply to monetary types as numeric types:

• When representing or entering such data values, the user cannot use quotation marks, commas, or currency symbols.

• A valid value consists only of digits and up to one decimal point.

139

• This is the specification for how many bytes are used to hold a monetary value:

• • MONEY 8 bytes, up to 15 digits to the left of

the decimal point, up to 4 digits to the right, computations shown rounded to 2 digits to the right.

140

1.4.9 Date and Time Data Types

• Both DATE and TIME data types exist.• For the purposes of this introduction, only the

date data type will be considered.• Date fields are hybrids between text and

numeric fields. • They contain information that may be

represented in textual form, such as the name of a month, reflecting their textual nature.

141

• Finding the differences between dates is also meaningful, reflecting their numeric nature.

• The date types support special functions for separating out components such as the names of days or months.

• They also support many different kinds of representations, ranging from digits and slashes to fully spelled out names of days and months.

• Details will be covered later.

142

• Depending on the system, the synonyms for the date types may include: DATE, TIME, and DATETIME.

• The default representation of dates is of this form: mm/dd/yy, where the letters m, d, and y represent digits.

• Dashes can also be used instead of slashes.

143

• When the user is representing a date, it may be enclosed in # signs.

• For example: #10/11/09#. • If you are using a system other than MS Access, like

OpenOffice Base or MySQL, it will be necessary to use single quotes instead of # signs.

• On systems that support the # sign, using it is not a bad idea.

• This special punctuation serves as a reminder of the two aspects of dates, both textual and numeric.

144

1.4.10 Other Types

• In addition to the simpler data types, over time SQL has expanded to cover kinds of data which can't be classified as text or numbers.

• Some additional types include:• • BIT or BOOLEAN• These may take up 1 byte, but they are designed

to store 1 of only 2 values, true or false.

145

• BINARY, BLOB (binary large object), IMAGE, OLE (object linking and embedding)

• These are essentially unlimited in size. • They are designed to store files which are not

structured as database files. • For example, a binary file could contain

program code, it could contain a spreadsheet, it could contain a JPEG, etc.

146

1.4.11 Notation for Table Designs Including Data Types

• Every field in a table has to have a well-defined type.

• The notation used previously to represent table designs can be expanded to include the types of the fields.

• Here is the Person table example.

147

• It has been expanded to include fields of various types.

• It is in a mother-child relationship with itself. • The primary key of the table, SSN, is

embedded as the foreign key field motherSSN in the same table:

148

• Person• (SSN TEXT(9) primary key,• lastname TEXT(12),• firstname TEXT(12),• middle TEXT(1),• motherSSN TEXT(9) foreign key,• dob DATE,• heightininches NUMBER,• weightinpoundsNUMBER,• eyecolor TEXT(6)• wagerate MONEY,• streetaddress TEXT(24),• city TEXT(18),• state TEXT(2),• zip TEXT(5))

149

1.4.12 Primary and Foreign Key Field Data Types

• Logically, corresponding primary and foreign key fields represent the same thing, or mean the same thing, and as a result, they should be designed to have the same data type.

• In the case of text fields, you also expect them to have the same width.

• If they appear in different tables, they may have the same names.

150

• In this example they appear in the same table. • No two fields in the same table can have the

same name, so the foreign key field has to be given a different name.

• The name it has been given is descriptive of the role it is playing.

• That field contains the SSN of the mother of the person in whose record it appears.

151

1.5 Nulls and Integrity

152

1.5.1 Definition of Null

• The term "null" refers to the idea that a particular field in a particular record may not have data in it.

• A relational database management system allows this.

• Cases often arise in practice where the information doesn't exist or isn't known.

153

1.5.2 What Null is Not• When a database management system supports null values

in fields, it's important to understand what this does not mean.

• It does not mean that the fields contain the sequence of characters "null".

• It also does not mean that the field contains invisible blanks.

• Blank spaces themselves are a form of character. • What it means is that there is absolutely nothing in the

field, and the database management system is able to recognize fields that are in that state.

154

1.5.3 The NOT NULL Constraint

• Although null values are supported in general, it is also possible to specify that certain fields can never be null.

• It is not possible for primary key fields to be null, and it is also possible that for a certain database, the designer requires that if a user enters records into a given table, all records have to have valid values for certain specified fields.

155

1.5.4 Data Integrity

• Data integrity is the requirement that all data in all fields of a database be valid and correct.

• If databases didn't allow nulls and required that every field always contained a value, users would fill unknown fields with bogus data.

• This would violate data integrity.

156

• One solution to the ambiguous meaning of null values and the tendency of users to use bogus data when nulls are not allowed is to specify values like "not applicable" or "not known" which are valid entries for given fields.

• This solution works for text fields. • For numeric fields, null values are the only

option.

157

1.5.5 Entity Integrity

• The term integrity in database management systems refers to the validity and consistency of data entered into a database.

• The phrase "entity integrity" is the formal expression of a requirement that was stated informally earlier.

• Entity integrity puts the following requirement on a correctly implemented database:

158

• Every table has to have a primary key field, and no part of the primary key field can be null for any record in the table.

• If all or part of a key were allowed to be null, that would defeat the purpose that the primary key field be the unique identifier for every record in the table.

• To state the idea again, every key field value has to be complete and unique.

159

1.5.6 Referential Integrity

• It is the primary key to foreign key relationships that capture the relationships between entities that have been separated into different tables by the database design.

• It is critically important that the data maintaining the relationships be valid and consistent.

160

• The phrase "referential integrity" has the following meaning:

• Every value that appears in a foreign key field also has to appear as a value in the corresponding primary key field.

• This can also be stated negatively: • There can be no foreign key value that does

not have a corresponding primary key value.

161

• The meaning and importance of referential integrity can be most easily explained with a small example showing a violation of it.

• Consider these two tables:

162

Mothermid name1 Lily2 Matilda

Childkid name mida Ned 2b Ann 3c June

163

• Child b, Ann, is shown as having a mother with mid equal to 3.

• There is no such mother in the Mother table. • This is literally nonsense. • There is no sense in which this can be correct

and this is what referential integrity forbids.

164

• The example also illustrates two other things, which are related to "non-existent" values.

• These things are brought out in the following two points.

165

1.5.7 Nulls in Foreign Key Fields

• In the example above, child c, June, does not have a mother listed.

• In other words, the foreign key field is null. • This does not violate referential integrity. • As with null in any situation, it may mean

that the mother is not known.

166

• Nobody literally doesn't have a mother, but if the woman table only records information on living women, for example, then for an orphan, the mother "wouldn't exist".

• It is unlikely that you would rename the table "Child or Orphan"—but the idea is that the null value is allowed and this in some sense affects the meaning of what kinds of entities are entered into the table.

167

1.5.8 Primary Key Entities without Foreign Key Matches

• Also observe in the example that mother 1, Lily, does not have any matching records in the Child table.

• This does not violate referential integrity. • It only suggests that the Mother table might

be better named the Woman table, because it is reasonable to be recording information about women and children where some women do not have children.

168

1.5.9 Referential Integrity and Nulls on Insertion

• Referential integrity leads to another consideration.

• A database design with two tables in a primary key to foreign key relationship introduces interrelationship constraints.

• A fully-featured database management system will enforce referential integrity constraints.

169

• Once turned on, this means that at data entry time the primary key record has to go in first.

• Alternatively, if a foreign key record is entered first, null has to be allowed for the value of the foreign key field.

• In a table in a relationship with itself, the enforcement of referential integrity and NOT NULL on the foreign key field at the same time will prevent data entry into the table.

170

1.5.10 Referential Integrity on Updates and Deletions

• If a referential integrity constraint has been created, it will also be active when the contents of a table are to be changed by updating values or deleting records.

• The constraint puts a restriction on what can be entered into a foreign key field:

• The value has to exist in the primary key field.

171

• The constraint also puts restrictions on the primary key field:

• It can't be changed if there are foreign key records that depend on it.

• What can be done if primary key values are to be deleted or updated?

172

• The default settings for this part of a referential integrity constraint are summarized in these two phrases:

• On delete, restrict; on update, cascade. • Following this is a fuller explanation of what

these phrases mean in terms of the concrete mother and child example:

173

• On delete, restrict: • No mother record can be deleted if she has

corresponding child records in the other table. • To allow the deletion of the mother would lead to

a referential integrity violation. • To spell it out again: • Deletion in the primary key table is restricted if

there are corresponding records in the foreign key table.

174

• On update, cascade: • If the primary key value of a mother record is

updated, if she has corresponding child records in the other table, the foreign key values in those records are automatically updated to reflect the change.

• This problem arises less frequently because once a primary key value is assigned to an entity, it is rarely changed.

175

• Other referential integrity settings are also syntactically valid.

• For example: On delete, cascade (if the mother goes, death to the children);

• On update, restrict (no changes to the mother once she has children).

176

The End

sql unit 1 an introduction to relational databases

Documents

table names

table specification

individual entity

kind of entity

entitythe term entity

different entities

single row

parallel sets of terminology