relational database design unit ii 1. 2 advantages of using database systems centralized control of...

33
Relational Database Design UNIT II 1

Upload: dulce-marlowe

Post on 14-Dec-2015

214 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: Relational Database Design UNIT II 1. 2 Advantages of Using Database Systems Centralized control of a firm’s data Redundancy can be reduced (avoid keeping

Relational Database Design UNIT II

1

Page 2: Relational Database Design UNIT II 1. 2 Advantages of Using Database Systems Centralized control of a firm’s data Redundancy can be reduced (avoid keeping

2

Advantages of Using Database Systems

• Centralized control of a firm’s data

• Redundancy can be reduced (avoid keeping copies of the same data in many places)• Inconsistency can be avoided to some extent (updating occurs at one place instead of many places)

Page 3: Relational Database Design UNIT II 1. 2 Advantages of Using Database Systems Centralized control of a firm’s data Redundancy can be reduced (avoid keeping

3

• The data can be shared

• Security restrictions can be applied (passwords)

• Integrity of data can be maintained (e.g., an instructor has to exist before being assigned to teach a class)

Advantages of Using Database Systems (cont’d)

Page 4: Relational Database Design UNIT II 1. 2 Advantages of Using Database Systems Centralized control of a firm’s data Redundancy can be reduced (avoid keeping

Relational Data Model

4

Page 5: Relational Database Design UNIT II 1. 2 Advantages of Using Database Systems Centralized control of a firm’s data Redundancy can be reduced (avoid keeping

5

RelationRelation• Definition: A relation is a named, two-dimensional table of data Definition: A relation is a named, two-dimensional table of data • Table consists of rows (records) and columns (attribute or field)Table consists of rows (records) and columns (attribute or field)• Requirements for a table to qualify as a relation:Requirements for a table to qualify as a relation:

– It must have a unique nameIt must have a unique name– Every attribute value must be atomic (not multivalued, not Every attribute value must be atomic (not multivalued, not

composite)composite)– Every row must be unique (can’t have two rows with exactly Every row must be unique (can’t have two rows with exactly

the same values for all their fields)the same values for all their fields)– Attributes (columns) in tables must have unique namesAttributes (columns) in tables must have unique names– The order of the columns must be irrelevantThe order of the columns must be irrelevant– The order of the rows must be irrelevantThe order of the rows must be irrelevant

Page 6: Relational Database Design UNIT II 1. 2 Advantages of Using Database Systems Centralized control of a firm’s data Redundancy can be reduced (avoid keeping

6

The table representation of an entity set

An entity set is usually represented as a table in the database. The name of the table is the same name as the entity set. For example:

SSN Name Street City

586-41-4125 Brown 2404 Maile Honolulu

332-18-6752 Yang 809 Dole St. Honolulu

121-898-5569 Gregson 211 Main St. Chicago

Customer

Page 7: Relational Database Design UNIT II 1. 2 Advantages of Using Database Systems Centralized control of a firm’s data Redundancy can be reduced (avoid keeping

7

The table representation of an entity set

In the table, columns are called attributes, rows are called tuples, observations, or records. Each row is an entity, and the table is an entity set.

CustomerSSN Name Street City

586-41-4125 Brown 2404 Maile Honolulu

332-18-6752 Yang 809 Dole St. Honolulu

121-898-5569 Gregson 211 Main St. Chicago

Page 8: Relational Database Design UNIT II 1. 2 Advantages of Using Database Systems Centralized control of a firm’s data Redundancy can be reduced (avoid keeping

8

RelationRelation• Definition: A relation is a named, two-dimensional table of data Definition: A relation is a named, two-dimensional table of data • Table consists of rows (records) and columns (attribute or field)Table consists of rows (records) and columns (attribute or field)• Requirements for a table to qualify as a relation:Requirements for a table to qualify as a relation:

– It must have a unique nameIt must have a unique name– Every attribute value must be atomic (not multivalued, not composite)Every attribute value must be atomic (not multivalued, not composite)– Every row must be unique (can’t have two rows with exactly the same Every row must be unique (can’t have two rows with exactly the same

values for all their fields)values for all their fields)– Attributes (columns) in tables must have unique namesAttributes (columns) in tables must have unique names– The order of the columns must be irrelevantThe order of the columns must be irrelevant– The order of the rows must be irrelevantThe order of the rows must be irrelevant

Page 9: Relational Database Design UNIT II 1. 2 Advantages of Using Database Systems Centralized control of a firm’s data Redundancy can be reduced (avoid keeping

9

KeysIt is important that any entity in an entity set be

uniquely identifiable. Practically, we use the values of certain attributes to uniquely identify an entity.

Keys can be Keys can be simplesimple (a single field) or (a single field) or compositecomposite (more than one field)(more than one field)

Page 10: Relational Database Design UNIT II 1. 2 Advantages of Using Database Systems Centralized control of a firm’s data Redundancy can be reduced (avoid keeping

10

Customer-name

Customer-street

customer

SSN

Customer-city

In this particular case, the value of the combination of any attribute(s) with SSN can uniquely identify a particular customer.

Page 11: Relational Database Design UNIT II 1. 2 Advantages of Using Database Systems Centralized control of a firm’s data Redundancy can be reduced (avoid keeping

11

KeysMeanwhile, it is also obvious that some of

these combinations may have redundant attributes.

Therefore, in theory, we systematically classify these combinations of attributes using three types of Keys: superkey, candidate key, and primary key.

Page 12: Relational Database Design UNIT II 1. 2 Advantages of Using Database Systems Centralized control of a firm’s data Redundancy can be reduced (avoid keeping

12

Keys(1) Superkey: is a set of attributes whose

value can uniquely identify an entity in the entity set.

A superkey contains one or more attributes. Example:

The combination of “SSN” and “Name” is a superkey of the following entity set customer.

Page 13: Relational Database Design UNIT II 1. 2 Advantages of Using Database Systems Centralized control of a firm’s data Redundancy can be reduced (avoid keeping

13

Because: The value of attributes “SSN” and “Name”, such as 558-36-1234 and Susan, can uniquely identify that particular customer in customer entity set, which is the pool of all customers.

Customer-name

Customer-street

customer

SSN

Customer-city

Page 14: Relational Database Design UNIT II 1. 2 Advantages of Using Database Systems Centralized control of a firm’s data Redundancy can be reduced (avoid keeping

14

Notationally, we write, Superkey: (SSN, name)

Other superkeys for the entity set customer: Superkey: (SSN)Superkey: (SSN, Street)Superkey: (SSN, City)Superkey: (SSN, Name, Street)Superkey: (SSN, Name, City)Superkey: (SSN, Street, City)Superkey: (SSN, Name, Street, City)

Page 15: Relational Database Design UNIT II 1. 2 Advantages of Using Database Systems Centralized control of a firm’s data Redundancy can be reduced (avoid keeping

15

Superkey is the broadest definition of unique identifiers of an entity in an entity set.

We are unsurprisingly very interested in the most economical combination(s) of attributes that can uniquely identify any particular entity.

Therefore, we introduce Candidate Key next.

Page 16: Relational Database Design UNIT II 1. 2 Advantages of Using Database Systems Centralized control of a firm’s data Redundancy can be reduced (avoid keeping

16

(2) Candidate key: is a set of one or more attributes whose value can uniquely identify an entity in the entity set, and any attribute in the candidate key cannot be omitted without destroying the uniqueness property of the candidate key. (It is minimal superkey).

Example:(SSN, Name) is NOT a candidate key, because taking out “name” still leaves “SSN” which can uniquely identify an entity. “SSN” is a candidate key of customer.Candidate key could have more than one attributes.

Page 17: Relational Database Design UNIT II 1. 2 Advantages of Using Database Systems Centralized control of a firm’s data Redundancy can be reduced (avoid keeping

17

Also, while most entity sets have only one candidate key, some entity sets could have more than one candidate key.

Example: Both “SSN” and “License #” are candidate keys of Driver entity set.

In building a database in a database software, the software will only allow to use one candidate key to be the unique identifier of an entity for an entity set.

Page 18: Relational Database Design UNIT II 1. 2 Advantages of Using Database Systems Centralized control of a firm’s data Redundancy can be reduced (avoid keeping

18

(3) Primary Key: is the candidate key that is chosen by the database designer as the unique identifier of an entity. The database designer chooses only one candidate key as the primary key in building the system.

Example: “SSN” and “License #” are both candidate keys of Driver entity set. The database designer can choose either one as the primary key.

Page 19: Relational Database Design UNIT II 1. 2 Advantages of Using Database Systems Centralized control of a firm’s data Redundancy can be reduced (avoid keeping

Foreign Key· A foreign key is an attribute or attribute combination

of one relation R2 whose values are required to match those of the primary key of relation R1 where R1 and R2 are not necessarily distinct. Note that a foreign key and the corresponding primary key should be defined on the same domain(s).

Emp#e1e2e3

enameredblue

brown

Deptd1d2d3

Worksfordeptd1

d2

DnamePayTaxArt

Employee Dept

Foreign key

Page 20: Relational Database Design UNIT II 1. 2 Advantages of Using Database Systems Centralized control of a firm’s data Redundancy can be reduced (avoid keeping

20

Primary Key

Foreign Key (implements 1:N relationship between customer and order)

Combined, these are a composite primary key (uniquely identifies the order line)…individually they are foreign keys (implement M:N relationship between order and product)

Figure 5-3 Schema for four relations (Pine Valley Furniture Company)

Page 21: Relational Database Design UNIT II 1. 2 Advantages of Using Database Systems Centralized control of a firm’s data Redundancy can be reduced (avoid keeping

Domain

• For each attribute, a set of permitted values, is called a the domain

21

Page 22: Relational Database Design UNIT II 1. 2 Advantages of Using Database Systems Centralized control of a firm’s data Redundancy can be reduced (avoid keeping

Relational Data Structure

EMPLOYEEEmp# Name Emp Sex Mgr Emp#

E1 Jones Male E65

E6 Smith Male E28

E28 Jones Female -

Relation Attribute

Heading

Body

FemaleMale

Gender

Domain

Page 23: Relational Database Design UNIT II 1. 2 Advantages of Using Database Systems Centralized control of a firm’s data Redundancy can be reduced (avoid keeping

The Domain

Emp#E1E2E3

NameRed

BrownBlack

Mgr#-

E1E1

E1, E2, E3,E4Red, BrownBlack, Blue

Attributes

Domains

Employee

Person Name

Emp#

Page 24: Relational Database Design UNIT II 1. 2 Advantages of Using Database Systems Centralized control of a firm’s data Redundancy can be reduced (avoid keeping
Page 25: Relational Database Design UNIT II 1. 2 Advantages of Using Database Systems Centralized control of a firm’s data Redundancy can be reduced (avoid keeping

25

Overall, superkey is the broadest unique identifier; candidate key is a subset of superkey; and primary key is a subset of candidate key.

In practice, we would first look for superkeys. Then we look for candidate keys based on experience and common sense. If there is only one candidate key, it naturally will be designated as the primary key. If we find more than one candidate key, then we can designate any one of them as primary key.

Page 26: Relational Database Design UNIT II 1. 2 Advantages of Using Database Systems Centralized control of a firm’s data Redundancy can be reduced (avoid keeping

Integrity Constraints

• It ensures that changes made to the database by authorized users do not result in a loss of data consistency.

• It guards against accidental damage to the database

26

Page 27: Relational Database Design UNIT II 1. 2 Advantages of Using Database Systems Centralized control of a firm’s data Redundancy can be reduced (avoid keeping

Domain Constraints

• They define valid values for attributes

• They are the most elementary form of integrity constraint.

• They test values inserted in the database, and test queries to ensure that the comparisons make sense.

Page 28: Relational Database Design UNIT II 1. 2 Advantages of Using Database Systems Centralized control of a firm’s data Redundancy can be reduced (avoid keeping

28

Integrity ConstraintsIntegrity Constraints

– NOT NULL: specification prohibits the insertion NOT NULL: specification prohibits the insertion of a null value for the attribute. Any data base of a null value for the attribute. Any data base modification that would cause a null to be modification that would cause a null to be inserted in an attribute declared to be not null inserted in an attribute declared to be not null generates an error diagnosticgenerates an error diagnostic

– E.g. balance numeric(12,2) not nullE.g. balance numeric(12,2) not null

Page 29: Relational Database Design UNIT II 1. 2 Advantages of Using Database Systems Centralized control of a firm’s data Redundancy can be reduced (avoid keeping

29

Integrity ConstraintsIntegrity Constraints

– Unique Constraints: specification says that Unique Constraints: specification says that attributes A1,A2… Aj, form a candidate key; attributes A1,A2… Aj, form a candidate key;

– no two tuples in the relation can be equal on all no two tuples in the relation can be equal on all the primary key attributes.the primary key attributes.

Page 30: Relational Database Design UNIT II 1. 2 Advantages of Using Database Systems Centralized control of a firm’s data Redundancy can be reduced (avoid keeping

30

Integrity ConstraintsIntegrity Constraints

– The Check Clause:ensures that attribute values The Check Clause:ensures that attribute values satisfy specified conditions.satisfy specified conditions.

– E.g.E.g.– Create table studentCreate table student(name char(15) not null,(name char(15) not null,Stud_id char(10),Stud_id char(10),Degree_level char(15),Degree_level char(15),Primary key(stud_id),check (degree_level in(‘Bachelors’, ‘Masters’)))Primary key(stud_id),check (degree_level in(‘Bachelors’, ‘Masters’)))The check clause ensures that the degree_level must be one of The check clause ensures that the degree_level must be one of

‘Bachelors’ or’Masters’‘Bachelors’ or’Masters’

Page 31: Relational Database Design UNIT II 1. 2 Advantages of Using Database Systems Centralized control of a firm’s data Redundancy can be reduced (avoid keeping

Referential Integrity• Ensures that a value that appears in one relation for a given set of

attributes also appears for a certain set of attribute in another relation.– If an account exists in the database with branch name “Perryridge”, then

the branch “Perryridge” must actually exist in the database.

Primary keys ofrespective relations

Foreign key

branch (branch-name, branch-city, asset ) Perryridge Brooklyn 500,000

account ( account-no, branch-name, balance ) A-123 Perryridge 5000

A set of attributes X in R is a foreign key if it is not a primary key of R butit is a primary key of some relation S.

Page 32: Relational Database Design UNIT II 1. 2 Advantages of Using Database Systems Centralized control of a firm’s data Redundancy can be reduced (avoid keeping

Assertions

• An assertion is a predicate expressing a condition that we wish the database to always satisfy.

• Domain constraints and referential integrity constraints are special forms of assertions.

• But there are many constraints we cannot express by using only these special forms.

• EX1: Every loan has atleast one customer who maintains an account with a minimum balance of Rs 1000.00

Page 33: Relational Database Design UNIT II 1. 2 Advantages of Using Database Systems Centralized control of a firm’s data Redundancy can be reduced (avoid keeping

Creating an Assertion

• When an assertion is created, the system will test it for validity.

• If the assertion is valid, then any future modification to the database is allowed only if it does not cause the assertion to be violated.

• But assertions can create a considerable amount of overhead, especially if complex assertions have been made.