mis 335 - database systemsmisprivate.boun.edu.tr/durahim/lectures/mis335-w3-relationalmodel.pdf ·...

52
MIS 335 - Database Systems Relational Model Ahmet Onur Durahim http://www.mis.boun.edu.tr/durahim/

Upload: nguyenxuyen

Post on 29-Jul-2018

220 views

Category:

Documents


0 download

TRANSCRIPT

Learning Objectives

• Relational Model Definitions

• SQL (DDL) – Create, Drop

• SQL (DML) – Add, Delete, Update

• Key Constraints and Referential Integrity

• Views and Security

• Convert ER models to Relational models

Relational Model • Introduced by Edgar F. “Ted” Codd in 1970

– Implemented by IBM (system R)

• Most widely used model. – Vendors: IBM, Informix, Microsoft, Oracle, Sybase, etc.

• “Legacy systems” in older models – IBM’s IMS / CODASYL (Hierarchical / Network data model)

• Major advantages – Simple data representation – The ease with which complex queries can be expressed

• Recent competitor: object-oriented model – ObjectStore, Versant, Ontos – A synthesis emerging: object-relational model

• Informix Universal Server, UniSQL, O2, Oracle, DB2

Relational DB Definitions

• Relational database: a set of relations

• Relation: made up of 2 parts: – Instance: a table, with rows and columns.

• #Rows = cardinality, #fields = degree / arity

– Schema: specifies name of relation, plus name and type and domain of each column/field. • e.g. Students (sid: string, name: string, login: string,

age: integer, gpa: real)

• Can think of a relation as a set of rows (tuples) (i.e., all rows are distinct)

Example Instance of Students Relation

• Cardinality = 3, degree = 5, all rows distinct

• Do all columns in a relation instance have to be distinct?

sid name login age gpa

53666 Jones jones@cs 18 3.4

53688 Smith smith@eecs 18 3.2

53650 Smith smith@math 19 3.8

Relational Query Languages

• A major strength of the relational model

– supports simple, powerful querying of data

• Queries can be written intuitively, and the DBMS is responsible for efficient evaluation

– The key: precise semantics for relational queries

– Allows the optimizer to extensively re-order operations, and still ensure that the answer does not change

The SQL Query Language

• Developed by IBM (system R) in the 1970s

• Need for a standard since it is used by many vendors

• Standards: – SQL-86

– SQL-89 (minor revision)

– SQL-92 (major revision)

– SQL-99 (major extensions, current standard)

– SQL-2003/2006/2008/2011

SQL Standards Year Name Alias Comments

1986 SQL-86 SQL-86 First formalized by ANSI

1989 SQL-89 FIPS 127-1 Minor revision, in which the major addition were integrity constraints

1992 SQL-92 SQL2, FIPS 127-2

Major revision (ISO 9075), Entry Level SQL-92 adopted as FIPS 127-2

1999 SQL:1999 SQL3

Added regular expression matching, recursive queries (e.g. transitive closure), triggers, support for procedural and control-of-flow statements, non-scalar types, and some object-oriented features (e.g. structured types). Support for embedding SQL in Java (SQL/OLB) and vice-versa (SQL/JRT)

2003 SQL:2003 SQL 2003

Introduced XML-related features (SQL/XML), window functions, standardized sequences, and columns with auto-generated values (including identity-columns)

2006 SQL:2006 SQL 2006

ISO/IEC 9075-14:2006 defines ways in which SQL can be used in conjunction with XML. It defines ways of importing and storing XML data in an SQL database, manipulating it within the database and publishing both XML and conventional SQL-data in XML form. In addition, it enables applications to integrate into their SQL code the use of XQuery, the XML Query Language published by the World Wide Web Consortium (W3C), to concurrently access ordinary SQL-data and XML documents.

2008 SQL:2008 SQL 2008 Legalizes ORDER BY outside cursor definitions. Adds INSTEAD OF triggers. Adds the TRUNCATE statement

2011 SQL:2011

source: Wikipedia

sid name login age gpa

53666 Jones jones@cs 18 3.4

53688 Shero shero@cs 18 3.2

53650 Shero shero@math 19 3.8

The SQL Query Language

• Find all 18 year old students

SELECT *

FROM Students S

WHERE S.age=18

sid name login age gpa

53666 Jones jones@cs 18 3.4

53688 Smith smith@ee 18 3.2

The SQL Query Language

• Find just names and logins

SELECT S.name, S.login

FROM Students S

WHERE S.age=18

name login

Jones jones@cs

Shero shero@cs

sid name login age gpa

53666 Jones jones@cs 18 3.4

53688 Shero shero@cs 18 3.2

53650 Shero shero@math 19 3.8

Querying Multiple Relations

sid cid grade

53831 Carnatic101 C

53831 Reggae203 B

53650 Topology112 A

53666 History105 B

sid name login age gpa

53666 Jones jones@cs 18 3.4

53650 Shero shero@cs 18 3.2

Students Enrolled

SELECT S.name, E.cid FROM Students S, Enrolled E WHERE S.sid=E.sid AND E.grade=“A”

S.name E.cid

Shero Topology112

Names and cids of Students who had an A from the cid they are enrolled

Creating Relations in SQL

CREATE TABLE Students (sid: CHAR(20), name: CHAR(20), login: CHAR(10), age: INTEGER, gpa: REAL)

sid name login age gpa

53666 Jones jones@cs 18 3.4

53688 Shero shero@cs 18 3.2

• Creates the Students relation • Observe that the type (domain) of each field is

specified, and enforced by the DBMS whenever tuples are added or modified

Creating Relations in SQL

CREATE TABLE Enrolled (sid: CHAR(20), cid: CHAR(20), grade: CHAR(2))

sid cid grade

53666 MS 413 BA

53688 MIS 335 BB

• Creates the Enrolled relation

• The Enrolled table holds information about courses that students take

Destroying and Altering Relations

• Destroys the relation Enrolled

• The schema information and the tuples are deleted

DROP TABLE Enrolled

• The schema of Students is altered by adding a new field

• Every tuple in the current instance is extended with a null value in the new field

ALTER TABLE Students ADD COLUMN firstYear:integer

sid name login age firstYear gpa

53666 Jones jones@cs 18 null 3.4

53688 Shero shero@cs 18 null 3.2

ALTER TABLE Students DROP COLUMN age

Adding and Deleting Tuples

• Can insert a single value

INSERT INTO Students(sid, name, login, age, gpa) VALUES (53688, ‘Smith’, ‘smith@ee’, 18, 3.2)

sid name login age gpa

53666 Jones jones@cs 18 3.4

53688 Smith smith@ee 18 3.2

Adding and Deleting Tuples

• Can delete all tuples satisfying some condition

– name = ‘Ahmet’

DELETE FROM Students S WHERE S.name = ‘Ahmet’

• Powerful variants of these commands are available – more later

Updating Tuples

• Can modify/update some attributes of a tuple satisfying some condition – name = ‘Ahmet’, update age to 20 and increase gpa by 0.3

UPDATE Students S SET S.age = 20, S.gpa = S.gpa + 0.3 WHERE S.name = ‘Ahmet’

sid name login age firstYear gpa

53666 Jones jones@cs 18 null 3.4

53688 Ahmet ahmet@cs 18 null 3.2

sid name login age firstYear gpa

53666 Jones jones@cs 18 null 3.4

53688 Ahmet ahmet@cs 20 null 3.5

Integrity Constraints (ICs) • IC: condition specified on a DB schema that must be

true for any instance of the database – e.g., domain constraints – ICs are specified when schema is defined. – ICs are checked when relations are modified.

• A legal instance of a relation is one that satisfies all specified ICs. – DBMS should not allow illegal instances.

• DBMS checks for violations of ICs and disallows changes to the data that violate the specified ICs – Stored data is more faithful to real-world meaning – Avoids data entry errors, too!

Primary Key Constraints • A set of fields is a key for a relation if:

1. No two distinct tuples can have same values in all key fields, and

2. This is not true for any subset of the key

• If Part 2 false? => A superkey • If there’s > 1 key for a relation, one of the keys is

chosen (by DBA) to be the primary key – e.g., sid is a key for Students. (What about name?) – The set {sid, gpa} is a superkey

sid name login age gpa

53666 Jones jones@cs 18 3.4

53688 Shero shero@cs 18 3.2

Primary and Candidate Key in SQL

• For a given student and course, there is a single grade

CREATE TABLE Enrolled (sid: CHAR(20), cid: CHAR(20), grade: CHAR(2) PRIMARY KEY (sid, cid))

Primary and Candidate Key in SQL

• Possibly many candidate keys (specified using UNIQUE), one of which is chosen as the primary key

• Used carelessly, an IC can prevent the storage of

database instances that arise in practice!

CREATE TABLE Enrolled (sid: CHAR(20), cid: CHAR(20), grade: CHAR(2), PRIMARY KEY (sid), UNIQUE (cid, grade))

• Students can take only one course, and receive a single grade for that course

• Further, no two students in a course receive the same grade

Foreign Keys, Referential Integrity

• Foreign key: Set of fields in one relation that is used to `refer’ to a tuple in another relation. – Must correspond to primary key of the second relation

– Like a `logical pointer’

• E.g. sid is a foreign key referring to Students:

sid cid grade

53831 Carnatic101 C

53831 Reggae203 B

53650 Topology112 A

53666 History105 B

sid name login age gpa

53666 Jones jones@cs 18 3.4

53688 Shero shero@cs 18 3.2

Students

Enrolled

Foreign Keys, Referential Integrity • If all foreign key constraints are enforced, referential

integrity is achieved, – i.e., no dangling references.

• The foreign key in the referencing (Enrolled) relation must match the primary key of the referenced relation (Students) – Have the same number of columns and compatible data

types

stdid cid grade

53831 Carnatic101 C

53831 Reggae203 B

53650 Topology112 A

53666 History105 B

sid name login age gpa

53666 Jones jones@cs 18 3.4

53688 Shero shero@cs 18 3.2

Students

Enrolled

Foreign key Primary key

Foreign Keys in SQL • Only students listed in the Students relation should

be allowed to enroll for courses

CREATE TABLE Enrolled (sid: CHAR(20), cid: CHAR(20), grade: CHAR(2), PRIMARY KEY (sid, cid), FOREIGN KEY (sid) REFERENCES STUDENTS)

sid cid grade

53650 Carnatic101 C

53666 Reggae203 B

53650 Topology112 A

53666 History105 B

sid name login age gpa

53666 Jones jones@cs 18 3.4

53650 Shero shero@cs 18 3.2

Students Enrolled

Enforcing Referential Integrity

• Consider Students and Enrolled

– sid in Enrolled is a foreign key that references Students

Students

Enrolled

• What should be done if an Enrolled tuple with a non-existent student id is inserted?

• Reject it! sid cid grade

53650 Carnatic101 C

53666 Reggae203 B

53650 Topology112 A

53666 History105 B

sid name login age gpa

53666 Jones jones@cs 18 3.4

53650 Shero shero@cs 18 3.2

Enforcing Referential Integrity • What should be done if a

Students tuple (with sid=53650) is deleted?

– Also delete all Enrolled tuples that refer to it

– Disallow deletion of a Students tuple that is referred to

– Set sid in Enrolled tuples that refer to it to a default sid (=99999)

– In SQL, also: Set sid in Enrolled tuples that refer to it to a special value null, denoting `unknown’ or `inapplicable’

sid cid grade

53650 Carnatic101 C

53666 Reggae203 B

53650 Topology112 A

53666 History105 B

sid name login age gpa

53666 Jones jones@cs 18 3.4

53650 Shero shero@cs 18 3.2

sid cid grade

99999 Carnatic101 C

53666 Reggae203 B

99999 Topology112 A

53666 History105 B

sid cid grade

null Carnatic101 C

53666 Reggae203 B

null Topology112 A

53666 History105 B

Enforcing Referential Integrity

• What should be done if the primary key of Students tuple is updated

• We have options similar to the previous case

– Also update all Enrolled tuples that refer to it

– Disallow update of a Students tuple that is referred to

– Set sid in Enrolled tuples that refer to it to a default sid

– Set sid in Enrolled tuples that refer to it to a special value null, denoting `unknown’ or `inapplicable’

Referential Integrity in SQL

• SQL/92 and SQL:1999 support all 4 options on deletes and updates

– Default is NO ACTION (delete/update is rejected)

– CASCADE (also delete all tuples that refer to deleted tuple)

– SET NULL / SET DEFAULT (sets foreign key value of referencing tuple)

CREATE TABLE Enrolled (sid: CHAR(20), cid: CHAR(20), grade: CHAR(2), PRIMARY KEY (sid, cid), FOREIGN KEY (sid) REFERENCES STUDENTS ON DELETE CASCADE ON UPDATE SET DEFAULT)

Where do ICs Come From? • ICs are based upon the semantics of the real-world

enterprise that is being described in the database relations.

• Domain, Primary key and Foreign key ICs are the most common; but it may be necessary to specify more general ICs; – Student ages be within a certain range of values

• All students must be at least 16 years old • The DBMS rejects inserts and updates that violate the constraint

– Table Constraints: Associated with a single table and checked whenever that table is modified

– Assertions: Involve several tables and are checked whenever any of these tables is modified

Views • A view is just a relation, but we store a definition,

rather than a set of tuples

• Views can be dropped using the DROP VIEW command – How to handle DROP TABLE if there’s a view on the table?

• DROP TABLE command has options to let the user specify this

CREATE VIEW YoungActiveStudents (name, grade) AS SELECT S.name, E.grade FROM Students S, Enrolled E WHERE S.sid = E.sid and S.age < 21

Views and Security

• Used to restrict data access and/or simplify data access

– Views can be used to present necessary information (or a summary), while hiding details in underlying relation(s)

– Given YoungActiveStudents, but not Students or Enrolled, we can find students who have enrolled, but not the cid’s of the courses they are enrolled in

Logical DB Design: ER to Relational

Entity sets to tables

Employees

ssn

name

lot

CREATE TABLE Employees (ssn: CHAR(11), name: CHAR(20), lot: INTEGER, PRIMARY KEY (ssn))

Relationship Sets to Tables

In translating a relationship set to a relation, attributes of the relation must include:

– Keys for each participating entity set (as foreign keys) • This set of attributes forms a

superkey for the relation

– All descriptive attributes

Works_In Departments

did dname since

budget

Employees

ssn name

lot

CREATE TABLE Works_In (ssn: CHAR(11), did: INTEGER, since: DATE, PRIMARY KEY (ssn, did), FOREIGN KEY (ssn) REFERENCES Employees, FOREIGN KEY (did) REFERENCES Departments)

Relationship Sets to Tables

Unary relationships

CREATE TABLE Reports_To (supervisor_ssn: CHAR(11), subordinate_ssn: CHAR(11), PRIMARY KEY (supervisor_ssn, subordinate_ssn) FOREIGN KEY (supervisor_ssn) REFERENCES Employees(ssn), FOREIGN KEY (subordinate_ssn) REFERENCES Employees(ssn))

Reports_To

Employees name ssn

supervisor subordinate

Review: Key Constraints

• Each dept has at most one manager, according to the key constraint on Manages

Manages Departments

did dname since

budget

Employees

ssn name

lot

Many-to-Many 1-to-1 1-to-Many Many-to-1

Translation to relational model?

CREATE TABLE Manages (ssn CHAR(11), did INTEGER, since DATE, PRIMARY KEY (did), FOREIGN KEY (ssn) REFERENCES Employees, FOREIGN KEY (did) REFERENCES Departments)

Review: Key Constraints

• Map relationship to a table: – Because each department

has at most one manager, no two tuples can have the same did values but differ on the ssn value

– Note that did is the key now – Separate tables for

Employees and Departments

Manages Departments

did dname since

budget

Employees

ssn name

lot

CREATE TABLE Dept_Mgr (did INTEGER, dname CHAR(20), budget REAL, ssn CHAR(11), since DATE, PRIMARY KEY (did), FOREIGN KEY (ssn) REFERENCES Employees)

Review: Key Constraints

• Since each dept has a unique manager, we could instead combine Manages and Departments

Manages Departments

did dname since

budget

Employees

ssn name

lot

Review: Participation Constraints

• Does every department have a manager? – If so, this is a participation constraint – The participation of Departments in Manages is said to be

total (vs. partial) • Every did value in Departments table must appear in a row of the

Manages table (with a non-null ssn value!)

Works_In

since

Manages Departments

name ssn since

Employees

dname did

since lot

Participation Constraints in SQL

• We can capture participation constraints involving one entity set in a binary relationship, but little else (without resorting to CHECK constraints)

CREATE TABLE Dept_Mgr ( did INTEGER, dname CHAR(20), budget REAL, ssn CHAR(11) NOT NULL, since DATE, PRIMARY KEY (did), FOREIGN KEY (ssn) REFERENCES Employees ON DELETE NO ACTION)

Participation Constraints in SQL • We cannot capture participation constraints using the

translation approach that creates a distinct table for the relation • The NOT NULL constraint would prevent the firing of a manger,

but does not ensure that a manager is initially appointed for each department CREATE TABLE Manages

(ssn CHAR(11) NOT NULL, did INTEGER NOT NULL, since DATE, PRIMARY KEY (did), FOREIGN KEY (ssn) REFERENCES Employees, FOREIGN KEY (did) REFERENCES Departments)

CREATE TABLE Department (did INTEGER, ….

Assertion to capture Works_In total participation

• If we want to enforce total participation constraint such as “each employee must work at at least one department”, we can use assertions as follows;

CREATE TABLE Product (id INT NOT NULL, category INT NOT NULL, price DECIMAL, PRIMARY KEY (id, category) )

CREATE TABLE Product_Order (no INT NOT NULL, prd_cat INT NOT NULL, prdid INT NOT NULL, custid INT NOT NULL, PRIMARY KEY (no), FOREIGN KEY (prd_cat, prdid) REFERENCES Product(category, id) ON UPDATE CASCADE FOREIGN KEY (custid) REFERENCES Customer(id)) )

CREATE TABLE Customer (id INT NOT NULL, PRIMARY KEY (id) )

Review: Weak Entities

• A weak entity can be identified uniquely only by considering the primary key of another (owner) entity. – Owner entity set and weak entity set must participate in a

one-to-many relationship set (1 owner, many weak entities)

– Weak entity set must have total participation in this identifying relationship set

– It has both a key and total participation constraints

Policy Dependents

name ssn cost

Employees

pname age

Dependents Policy

name

Translating Weak Entity Sets

• Weak entity set and identifying relationship set are translated into a single table – When the owner entity is deleted, all owned weak

entities must also be deleted

CREATE TABLE Depd_Policy ( pname CHAR(20), age INTEGER, cost REAL, ssn CHAR(11), PRIMARY KEY (pname, ssn), FOREIGN KEY (ssn) REFERENCES Employees ON DELETE CASCADE)

ssn cannot be null since it is part of the PK

Review: ISA Hierarchies

• As in C++, or other PLs, attributes are inherited

• If we declare A ISA B, every A entity is also considered to be a B entity

Contract_Emps

name ssn

Employees

contractid

Hourly_Emps

hours_worked hourly_wages

ISA

• Overlap constraints: Can Joe be an Hourly_Emps as well as a Contract_Emps entity? (Allowed/disallowed)

• Covering constraints: Does every Employees entity also have to be an Hourly_Emps or a Contract_Emps entity? (Yes/no)

name

Translating ISA Hierarchies to Relations

• General approach: 3 relations: map each entity sets Employees, Hourly_Emps and Contract_Emps to a distinct relation – Hourly_Emps: Every employee is recorded in Employees.

• For hourly emps, extra info recorded in Hourly_Emps (hourly_wages, hours_worked, ssn);

• must delete Hourly_Emps tuple if referenced Employees tuple is deleted

– Queries involving all employees are easy, but those involving just Hourly_Emps require a join to get some attributes

• Alternative: Just Hourly_Emps and Contract_Emps – Hourly_Emps: ssn, name, lot, hourly_wages, hours_worked

– Each employee must be in one of these two subclasses

Aggregation • Employees, Projects, and Departments entity

sets and the Sponsors relationship set are map as described previously

• For the monitors relationship set, create a relation with the following attributes; – The key attributes of Employees => ssn – The key attributes of Sponsors => (did, pid) – The descriptive attributes of Monitors => until

Departments

name

ssn

Employees

Projects

pbudget

started_on

Monitors

Sponsors

since

pid

until

budget

did dname

Review: Binary vs. Ternary Relationships

Covers

Policies

Employees Dependents

name

ssn

policyid cost

pname age

Purchaser

Policies

Employees Dependents

name

ssn

policyid cost

pname age

Beneficiary

Better design

Bad design

• If we have additional requirements; – A policy cannot be owned jointly by

two or more employees – Every policy must be owned by

some employee – Dependents is a weak entity, and

uniquely identified by taking pname in conjunction with policyid of a policy entity

• ER diagram is inaccurate

• What are the additional constraints in the 2nd diagram?

Binary vs. Ternary Relationships

• The key constraints allow us to combine Purchaser with Policies and Beneficiary with Dependents

• Participation constraints lead to NOT NULL constraints

• What if Policies is a weak entity set?

CREATE TABLE Policies ( policyid INTEGER, cost REAL, ssn CHAR(11) NOT NULL, PRIMARY KEY (policyid), FOREIGN KEY (ssn) REFERENCES Employees ON DELETE CASCADE)

CREATE TABLE Dependents ( pname CHAR(20), age INTEGER, policyid INTEGER, PRIMARY KEY (pname, policyid), FOREIGN KEY (policyid) REFERENCES Policies ON DELETE CASCADE)

Relational Model: Summary • A tabular representation of data

• Simple and intuitive, currently the most widely used

• Integrity constraints can be specified by the DBA, based on application semantics. DBMS checks for violations – Two important ICs: primary and foreign keys

– In addition, we always have domain constraints

• Powerful and natural query languages exist

• Rules to translate ER to relational model

Exercise - 1 • Answer following questions that are based on the following

relational schema:

• Give an example of a foreign key constraint that involves the Dept relation. What are the options for enforcing this constraint when a user attempts to delete a Dept tuple?

• Write the SQL statements required to create the preceding relations, including appropriate versions of all primary and foreign key integrity constraints

• Define the Dept relation in SQL so that every department is guaranteed to have a manager

Exercise - 1 • Answer following questions that are based on the following

relational schema:

• Write an SQL statement to add John Doe as an employee with eid = 101, age = 32 and salary = 15,000

• Write an SQL statement to give every employee a 10 percent raise

• Write an SQL statement to delete the Toy department. Given the referential integrity constraints you chose for this schema, explain what happens when this statement is executed