1 databases ii (fall 2009) professor: iluju kiringa [email protected] [email protected]...

28
Databases II (Fall 2009) Professor: Iluju Kiringa [email protected] http://www.site.uottawa.ca/ ~kiringa SITE 5072

Upload: randell-tyler

Post on 19-Jan-2016

213 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: 1 Databases II (Fall 2009) Professor: Iluju Kiringa kiringa@site.uottawa.ca kiringa@site.uottawa.ca kiringa SITE 5072

1

Databases II(Fall 2009)

Professor: Iluju [email protected]

http://www.site.uottawa.ca/~kiringaSITE 5072

Page 2: 1 Databases II (Fall 2009) Professor: Iluju Kiringa kiringa@site.uottawa.ca kiringa@site.uottawa.ca kiringa SITE 5072

2

Review of Databases I

Chapters 2-5

Page 3: 1 Databases II (Fall 2009) Professor: Iluju Kiringa kiringa@site.uottawa.ca kiringa@site.uottawa.ca kiringa SITE 5072

3

What is a DBMS? A centralized database is a large

collection of integrated data. Organizations face large quantities of

data that must be efficiently managed. Many store GBs, even TBs of data Some scientific applications store PBs of

data ! A Database Management System

(DBMS) is a software package designed to store and manage databases.

Page 4: 1 Databases II (Fall 2009) Professor: Iluju Kiringa kiringa@site.uottawa.ca kiringa@site.uottawa.ca kiringa SITE 5072

4

Content of « Databases I » Foundations of DBMSs (Chap. 2-5, 19): :

Conceptual design - Chap. 2: • Input: requirements of an application • Output: ER diagrams (i.e., ER model)

Logical design – Chap. 3: • Input: ER diagrammes • Output: Relational model

Normalization – Chap. 19:• Input: Relational model • Output: Relational model in normal forms

Relational algebra and calculus – Chap. 4 SQL – Chap. 5

Application development (Chap. 6-7) Classic applications: Embeded SQL, JDBC, SQLJ, stored

procedures Internet applications : HTTP, HTML, XML, 3-tier architecture

Page 5: 1 Databases II (Fall 2009) Professor: Iluju Kiringa kiringa@site.uottawa.ca kiringa@site.uottawa.ca kiringa SITE 5072

5

Content of « Databases I » (Cont’d) Storage and Indexing (Chap. 8-11):

Disks and files – Chap. 9:• Memory hierarchy• Disk and file management

Tree index – Chap. 10:• ISAM tree• B+ tree

Hasch index – Chap. 11:• static • extendible• linear

Page 6: 1 Databases II (Fall 2009) Professor: Iluju Kiringa kiringa@site.uottawa.ca kiringa@site.uottawa.ca kiringa SITE 5072

6

Content of « Databases II » Query evaluation (Chap. 12-15)

External sort - Chap. 13 Evaluation of relational operators -- Chap. 14 System R: a sample evaluation algorithm –

Chap. 15 Transactions (Chap. 16-18)

Concurrency control – Chap. 17 Recovery – Chap. 18

Advanced topics (Chap. 22,25, 26, 23 ou 27) Distributed databases – Chap. 22 Data Warehousing – Chap. 25 Data Mining Chap. 26

Page 7: 1 Databases II (Fall 2009) Professor: Iluju Kiringa kiringa@site.uottawa.ca kiringa@site.uottawa.ca kiringa SITE 5072

7

Overview of Database Design Requirements analysis: Find out what users want to do

with the database Conceptual design: Use the output of RA to develop a

high-level description of the data to be stored, along with their constraints. Output of CD usually is an ER-diagram.

Logical design: Choose a DBMS and map the conceptual schema (ER-diagram) into the data model of the DBMS. Output of this step is the logical schema of the data.

Schema refinement: Analyze the logical schema, identify potential problems in it, and fix these by refining the logical schema using known normal forms.

Physical design: Consider expected workloads that the database will support to further refine the design in order to meet desired performance criteria. Output here is the physical schema.

Page 8: 1 Databases II (Fall 2009) Professor: Iluju Kiringa kiringa@site.uottawa.ca kiringa@site.uottawa.ca kiringa SITE 5072

8

ER Model The model

Basic elements: • Entity: real-world object• Relationship: association between entities• Attributes: description of an entity or a relationship

Advanced elements: • Constraints: qualification of a relationship• Weak entity : Identifiable only via another entity • Hierarchy : similar to OO-languages• Agregation: sort of macros

Conceptual design using the ER model Should a given concept be modelled as an entity or an attribute? Should a given concept be modelled as an entity or a

relationship? Should a given concept be modelled as a binary or as a ternary

relationship? etc

Page 9: 1 Databases II (Fall 2009) Professor: Iluju Kiringa kiringa@site.uottawa.ca kiringa@site.uottawa.ca kiringa SITE 5072

9

Relational Database Concepts Relation: made up of 2 parts: Instance : a table, with rows and columns.

#Rows = cardinality, #fields = degree / arity. Schema : specifies name of relation, plus name and domain

(type) of each column (attribute). Can think of a relation as a set of rows or tuples (i.e., all

rows are distinct), where each tuple has the same arity as the relation schema.

Relational database: a set of relations, each with distinct name.

Relational DB schema: set of schemas of relations in the DB.

Relational DB instance: set of relation instances in the DB.

Page 10: 1 Databases II (Fall 2009) Professor: Iluju Kiringa kiringa@site.uottawa.ca kiringa@site.uottawa.ca kiringa SITE 5072

10

Sample Relation

sid name login age gpa

53666 Jones jones@cs 18 3.4

53688 Smith smith@eecs 18 3.2 53650 Smith smith@math 19 3.8

Cardinality = 3, arity = 5, all rows distinct. Commercial systems allow duplicates. Order of attributes may or may not matter! Do all columns in a relation instance have to be distinct? Depends on whether they are ordered or not.

Schema : Students(sid: string, name: string, login: string,

age: integer, gpa: real).Instance :

Page 11: 1 Databases II (Fall 2009) Professor: Iluju Kiringa kiringa@site.uottawa.ca kiringa@site.uottawa.ca kiringa SITE 5072

11

Creation and Alteration of Relations

CREATE TABLE Students CREATE TABLE Enrolled(sid: CHAR(20), (sid: CHAR(20), name: CHAR(20), cid: CHAR(20), login: CHAR(10), grade: CHAR(2)) age: INTEGER, gpa: REAL)

DROP TABLE Students ALTER TABLE Students ADD COLUMN firstYear: integer

INSERT INTO Students (sid, name, login, age, gpa) VALUES (53688, ‘Smith’, ‘smith@ee’, 18, 3.2)

DELETE FROM Students S WHERE S.name = ‘Smith’

Page 12: 1 Databases II (Fall 2009) Professor: Iluju Kiringa kiringa@site.uottawa.ca kiringa@site.uottawa.ca kiringa SITE 5072

12

Primary Key and Foreign Key in SQL

CREATE TABLE Enrolled (sid CHAR(20) cid CHAR(20), grade CHAR(2), PRIMARY KEY (sid,cid) )

CREATE TABLE Enrolled (sid CHAR(20) cid CHAR(20), grade CHAR(2), PRIMARY KEY (sid), UNIQUE (cid, grade) )

CREATE TABLE Enrolled (sid CHAR(20), cid CHAR(20), grade CHAR(2), PRIMARY KEY (sid,cid), FOREIGN KEY (sid) REFERENCES Students )

Page 13: 1 Databases II (Fall 2009) Professor: Iluju Kiringa kiringa@site.uottawa.ca kiringa@site.uottawa.ca kiringa SITE 5072

13

Logical DB Design: ER to Relational

The ER model represent the initial, high-level database design.

The task is to generate a relational database schema that is as close as possible to the ER model.

The mapping is approximate since it is hard to translate all the constraints of the ER model into an efficient logical (relational) model.

Page 14: 1 Databases II (Fall 2009) Professor: Iluju Kiringa kiringa@site.uottawa.ca kiringa@site.uottawa.ca kiringa SITE 5072

14

Formal Relational Query Languages Two mathematical Query Languages

form the basis for “real” languages (e.g. SQL), and for implementation:

Relational Algebra: More operational, very useful for representing execution plans.

Relational Calculus: Lets users describe what they want, rather than how to compute it (Non-operational, declarative).

Page 15: 1 Databases II (Fall 2009) Professor: Iluju Kiringa kiringa@site.uottawa.ca kiringa@site.uottawa.ca kiringa SITE 5072

15

Relational Algebra Basic operations:

Selection ( ) Selects a subset of rows from relation. Projection ( ) Deletes unwanted columns from relation. Cross-product ( ) Allows us to combine two relations. Set-difference ( ) Gives tuples in 1st rel., but not in 2nd rel. Union ( ) Gives tuples in rel. 1 and in rel. 2.

Additional operations: Intersection, join, division, renaming: Not (theoretically)

essential, but (practically) very useful. Since each operation returns a relation, operations can

be composed! (Algebra is “closed”.)

Page 16: 1 Databases II (Fall 2009) Professor: Iluju Kiringa kiringa@site.uottawa.ca kiringa@site.uottawa.ca kiringa SITE 5072

16

Cross-Product Each row of S1 is paired with each row of

R1. Result schema has one field per field of S1

and R1, with field names `inherited’ if possible. Conflict: Both S1 and R1 have a field

called sid.

( ( , ), )C sid sid S R1 1 5 2 1 1

(sid) sname rating age (sid) bid day

22 dustin 7 45.0 22 101 10/ 10/ 96

22 dustin 7 45.0 58 103 11/ 12/ 96

31 lubber 8 55.5 22 101 10/ 10/ 96

31 lubber 8 55.5 58 103 11/ 12/ 96

58 rusty 10 35.0 22 101 10/ 10/ 96

58 rusty 10 35.0 58 103 11/ 12/ 96

Renaming operator:

Page 17: 1 Databases II (Fall 2009) Professor: Iluju Kiringa kiringa@site.uottawa.ca kiringa@site.uottawa.ca kiringa SITE 5072

17

Joins Condition Join:

Result schema same as that of cross-product. Fewer tuples than cross-product, might be able

to compute more efficiently Sometimes called a theta-join.

R c S c R S ( )

(sid) sname rating age (sid) bid day

22 dustin 7 45.0 58 103 11/ 12/ 9631 lubber 8 55.5 58 103 11/ 12/ 96

S RS sid R sid

1 11 1

. .

Page 18: 1 Databases II (Fall 2009) Professor: Iluju Kiringa kiringa@site.uottawa.ca kiringa@site.uottawa.ca kiringa SITE 5072

18

Joins (Cont’d)

Equi-Join: A special case of condition join where the condition c contains only equalities.

Result schema similar to cross-product, but only one copy of fields for which equality is specified.

Natural Join: Equijoin on all fields having the same name in both relations.

sid sname rating age bid day

22 dustin 7 45.0 101 10/ 10/ 9658 rusty 10 35.0 103 11/ 12/ 96

S Rsid

1 1

Page 19: 1 Databases II (Fall 2009) Professor: Iluju Kiringa kiringa@site.uottawa.ca kiringa@site.uottawa.ca kiringa SITE 5072

19

Relational Calculus Has two flavors:

Tuple relational calculus (TRC) Domain relational calculus (DRC).

Has variables, constants, comparison ops, logical connectives, and quantifiers. TRC: Variables range over (i.e., get bound to) tuples. DRC: Variables range over domain elements (= field

values). Both TRC and DRC are simple subsets of first-order logic.

Expressions in the calculus are called formulas. An answer tuple is essentially an assignment of constants to variables that make the formula evaluate to true.

Page 20: 1 Databases II (Fall 2009) Professor: Iluju Kiringa kiringa@site.uottawa.ca kiringa@site.uottawa.ca kiringa SITE 5072

20

Tuple Relational Calculus

Query has the form:

)(| tpt

Answer includes all tuples t that make the formula p(t) be true.

Formula is recursively defined, starting with simple atomic formulas (getting tuples from relations or making comparisons of values), and building bigger formulas using the logical connectives.

Page 21: 1 Databases II (Fall 2009) Professor: Iluju Kiringa kiringa@site.uottawa.ca kiringa@site.uottawa.ca kiringa SITE 5072

21

TRC Formulas Atomic formula:

, or R.a op S.b, or R.a op constant op is one of

Formula: an atomic formula, or , where p and q are formulas, or , where variable R is free in p(R), or , where variable R is free in p(R)

The use of quantifiers and is said to bind R. A variable that is not bound is free.

RnameR , , , , ,

p p q p q, ,))(( RpR))(( RpR

R R

Page 22: 1 Databases II (Fall 2009) Professor: Iluju Kiringa kiringa@site.uottawa.ca kiringa@site.uottawa.ca kiringa SITE 5072

22

Overview: Features of SQL

Data definition language: used to create, destroy, and modify tables and views.

Data manipulation language: used to pose queries, and to insert, delete, and modify rows.

Triggers and advanced integrity constraints: used to specify actions that the DBMS will execute automatically.

Embeddded SQL: allows SQL to be called from a host language, or

Dynamic SQL: allows run-time creation and execution of queries .

Page 23: 1 Databases II (Fall 2009) Professor: Iluju Kiringa kiringa@site.uottawa.ca kiringa@site.uottawa.ca kiringa SITE 5072

23

Syntax of Basic SQL Query

relation-list A list of relation names (possibly with a range-variable after each name).

target-list A list of attributes of relations in relation-list

qualification Comparisons (Attr op const or Attr1 op Attr2, where op is one of ) combined using AND, OR and NOT.

DISTINCT is an optional keyword indicating that the answer should not contain duplicates.

SELECT [DISTINCT] target-listFROM relation-listWHERE qualification

, , , , ,

Page 24: 1 Databases II (Fall 2009) Professor: Iluju Kiringa kiringa@site.uottawa.ca kiringa@site.uottawa.ca kiringa SITE 5072

24

Sample Query and Conceptual Evaluation

SELECT S.snameFROM Sailors S, Reserves RWHERE S.sid=R.sid AND R.bid=103

(sid) sname rating age (sid) bid day

22 dustin 7 45.0 22 101 10/ 10/ 96

22 dustin 7 45.0 58 103 11/ 12/ 96

31 lubber 8 55.5 22 101 10/ 10/ 96

31 lubber 8 55.5 58 103 11/ 12/ 96

58 rusty 10 35.0 22 101 10/ 10/ 96

58 rusty 10 35.0 58 103 11/ 12/ 96

Page 25: 1 Databases II (Fall 2009) Professor: Iluju Kiringa kiringa@site.uottawa.ca kiringa@site.uottawa.ca kiringa SITE 5072

25

Aggregate Operators Significant extension of

relational algebra

COUNT (*)COUNT ( [DISTINCT] A)SUM ( [DISTINCT] A)AVG ( [DISTINCT] A)MAX (A)MIN (A)

SELECT AVG (S.age)FROM Sailors SWHERE S.rating=10

SELECT COUNT (*)FROM Sailors S

SELECT AVG ( DISTINCT S.age)FROM Sailors SWHERE S.rating=10

SELECT S.snameFROM Sailors SWHERE S.rating= (SELECT MAX(S2.rating) FROM Sailors S2)

single column

SELECT COUNT (DISTINCT S.rating)FROM Sailors SWHERE S.sname=‘Bob’

Page 26: 1 Databases II (Fall 2009) Professor: Iluju Kiringa kiringa@site.uottawa.ca kiringa@site.uottawa.ca kiringa SITE 5072

26

Queries With GROUP BY and HAVING

The target-list contains (i) attribute names (ii) terms with aggregate operations (e.g., MIN (S.age)). The attribute list (i) must be a subset of grouping-list.

Intuitively, each answer tuple corresponds to a group, and these attributes must have a single value per group. (A group is a set of tuples that have the same value for all attributes in grouping-list.)

SELECT [DISTINCT] target-listFROM relation-listWHERE qualificationGROUP BY grouping-listHAVING group-qualification

Page 27: 1 Databases II (Fall 2009) Professor: Iluju Kiringa kiringa@site.uottawa.ca kiringa@site.uottawa.ca kiringa SITE 5072

27

Find the age of the youngest sailor with age 18, for each rating with at least 2 such sailors

Only S.rating and S.age are mentioned in the SELECT, GROUP BY or HAVING clauses; other attributes `unnecessary’.

2nd column of result is unnamed. (Use AS to name it.)

SELECT S.rating, MIN (S.age)FROM Sailors SWHERE S.age >= 18GROUP BY S.ratingHAVING COUNT (*) > 1

sid sname rating age22 dustin 7 45.031 lubber 8 55.571 zorba 10 16.064 horatio 7 35.029 brutus 1 33.058 rusty 10 35.0

rating age1 33.07 45.07 35.08 55.510 35.0

rating7 35.0

Answer relation

Page 28: 1 Databases II (Fall 2009) Professor: Iluju Kiringa kiringa@site.uottawa.ca kiringa@site.uottawa.ca kiringa SITE 5072

28

SummarySQL is more declarative than earlier, procedural query

languages. is relationally complete; in fact, significantly more

expressive power than relational algebra. In practice, users need to be aware of how queries are

optimized and evaluated for best results. has any alternative ways to write a query; optimizer

should look for most efficient evaluation plan. allows specification of rich integrity constraints. Provides triggers to respond to changes in the

database.