fall 2004 ece569 lecture 02.1 ece 569 database system engineering fall 2004 yanyong zhang...

41
Fall 2004 ECE569 Lecture 02.1 ECE 569 Database System Engineering Fall 2004 Yanyong Zhang www.ece.rutgers.edu/~ yyzhang Course URL www.ece.rutgers.edu/~yyzhang/fall04

Post on 21-Dec-2015

217 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: Fall 2004 ECE569 Lecture 02.1 ECE 569 Database System Engineering Fall 2004 Yanyong Zhang yyzhangyyzhang Course

Fall 2004ECE569 Lecture 02.1

ECE 569 Database System Engineering

Fall 2004

Yanyong Zhang www.ece.rutgers.edu/~yyzhang

Course URL www.ece.rutgers.edu/~yyzhang/fall04

Page 2: Fall 2004 ECE569 Lecture 02.1 ECE 569 Database System Engineering Fall 2004 Yanyong Zhang yyzhangyyzhang Course

Fall 2004ECE569 Lecture 02.2

Today’s topic

Relational data model Different data models exist: relational, network,

hierarchical, object-oriented

Powerful, simple, declarative

Relational algebra

Page 3: Fall 2004 ECE569 Lecture 02.1 ECE 569 Database System Engineering Fall 2004 Yanyong Zhang yyzhangyyzhang Course

Fall 2004ECE569 Lecture 02.3

Overview of Database Design

Conceptual design (ER model is used at this stage) What are the entities and relationships in the enterprise?

What information about these entities and relationships should we store in the database?

What are the integrity constraints or business rules that hold

A database ‘schema’ in the ER model can be represented pictorially

Can map an ER diagram into a relational schema

Page 4: Fall 2004 ECE569 Lecture 02.1 ECE 569 Database System Engineering Fall 2004 Yanyong Zhang yyzhangyyzhang Course

Fall 2004ECE569 Lecture 02.4

Entity-Relationship model

To describe the conceptual scheme.

An entity is a thing that exists and is distinguishable.

Entity sets consist of a group of all “similar” entities.

Each entity has certain attributes.

A relationship among entity sets is an ordered list of entity sets.

Relationship set: collection of similar relationships An n-ary relationship set R relates n entity sets E1, …, En; each

relationship in R involves entities e1 E1, …, en En

- Same entity set could participate in different relationship sets, or in different “roles” in same set

Page 5: Fall 2004 ECE569 Lecture 02.1 ECE 569 Database System Engineering Fall 2004 Yanyong Zhang yyzhangyyzhang Course

Fall 2004ECE569 Lecture 02.5

Example of entity-relationship model

Billed

Account Made To

Room#

Name

Address

Balance

Patient

Payment Amount

Date

Diagnosed

Disease NameTreatment

Vital Sign

Pulse

Blood

TimeFrom

Page 6: Fall 2004 ECE569 Lecture 02.1 ECE 569 Database System Engineering Fall 2004 Yanyong Zhang yyzhangyyzhang Course

Fall 2004ECE569 Lecture 02.6

Data Model

A data model is a mathematical formalism with two parts:

A notation for describing data

A set of operations used to manipulate that data

Page 7: Fall 2004 ECE569 Lecture 02.1 ECE 569 Database System Engineering Fall 2004 Yanyong Zhang yyzhangyyzhang Course

Fall 2004ECE569 Lecture 02.7

Why Study the Relational Model?

Most widely used model Vendors: IBM, informix, Microsoft,Oracle,Sybase,etc.

“legacy systems” in older models e.g., IBM’s IMS

Recent competitor: object-oriented model ObjectStore, Versant, Ontos

A synthesis emerging: object-relational model

- Informix Universal server, UniSQL, O2, Oracle, DB2

Page 8: Fall 2004 ECE569 Lecture 02.1 ECE 569 Database System Engineering Fall 2004 Yanyong Zhang yyzhangyyzhang Course

Fall 2004ECE569 Lecture 02.8

Relational Database: Definitions

Relational database: a set of relations

Relation: made up of 2 parts Instance: a table, with rows and columns

- # Rows = cardinality, # columns = degree

Schema: specifies name of relation, plus name and type of each column

- e.g., students(sid:string, name:string, login:string,age:integer, gpa:real)

Can think of a relation as a set of rows or tuples (i.e., all rows are distinct)

Page 9: Fall 2004 ECE569 Lecture 02.1 ECE 569 Database System Engineering Fall 2004 Yanyong Zhang yyzhangyyzhang Course

Fall 2004ECE569 Lecture 02.9

Set-theoretic notion of a relation

A domain D is a set of values colors = {red, blue, green}

age = set of positive integers less than 20

last_name = {a-zA-Z}+

A relation over domains D1,D2,…,Dn is a subset of D1D2D3…Dn.

Example R(colors, age) {(red,1), (blue,1),(green,1),

(red,2), (blue,2),(green,2), …}

Page 10: Fall 2004 ECE569 Lecture 02.1 ECE 569 Database System Engineering Fall 2004 Yanyong Zhang yyzhangyyzhang Course

Fall 2004ECE569 Lecture 02.10

Set-theoretic notion of a relation (cont)

We are only interested in finite relations.

The members of a relation are called tuples.

Each relation that is a subset of the product of k domains is k-degree.

A relation can be represented as a table, where each row is a tuple and each column corresponds to one domain.

PERSON (NAME, AGE)

Name Age

Zhang 27

Merril 2

Stewart 40

Page 11: Fall 2004 ECE569 Lecture 02.1 ECE 569 Database System Engineering Fall 2004 Yanyong Zhang yyzhangyyzhang Course

Fall 2004ECE569 Lecture 02.11

Set-theoretic notion of a relation (cont)

Because a relation is a set of tuples, the order of tuples in the table is insignificant (but the order of domains does matter).

Name Age

Zhang 27

Merril 2

Stewart 40

Name Age

Zhang 27

Stewart 40

Merril 2

Age Name

27 Zhang

2 Merill

40 Stewart

Page 12: Fall 2004 ECE569 Lecture 02.1 ECE 569 Database System Engineering Fall 2004 Yanyong Zhang yyzhangyyzhang Course

Fall 2004ECE569 Lecture 02.12

An alternative definition – set of mappings

A relation schema R = {A1,A2,…,An} (no order imposed on attributes)

A relation instance of R is a finite set of mappings {1, 2, …, m} where

2: R D

where D = null dom(A1) dom(A2)… dom(An).

Each tuple i maps each attribute in tuple i to a value.

Note that attributes must be given names that are unique in a relation schema.

Under this definition, all three instances above are equivalent.

Page 13: Fall 2004 ECE569 Lecture 02.1 ECE 569 Database System Engineering Fall 2004 Yanyong Zhang yyzhangyyzhang Course

Fall 2004ECE569 Lecture 02.13

Keys

A set S of attributes is a candidate key for R if No semantically correct instance of R can include two

tuples such that [s] = [s], and

No proper subset of S satisfies (a)

A set of attributes satisfying (a) but not necessarily (b) is a superkey.

A relation may have multiple candidate keys. One can be designated as the primary key.

Page 14: Fall 2004 ECE569 Lecture 02.1 ECE 569 Database System Engineering Fall 2004 Yanyong Zhang yyzhangyyzhang Course

Fall 2004ECE569 Lecture 02.14

Example

Schema for a relation of degree four patient(name,address,balance_due,room#)

An instance of patient with three tuples that satisfies the key constraint (name is a candidate key)

patient Name Address Balance_due Room #

Zhang 1964 highland dr $5,230 518

Stewart 2342 K St. $25 203

Brinkley 2342 K St. $10,200 203

No other candidate key is possible (assuming that this is a semantically correct instance of patient).

Page 15: Fall 2004 ECE569 Lecture 02.1 ECE 569 Database System Engineering Fall 2004 Yanyong Zhang yyzhangyyzhang Course

Fall 2004ECE569 Lecture 02.15

Example – Healthcare DB

patient(name, address, balance_due, room#)

payments(name, amount, date)

vital_signs(name, pulse, blood, date, time)

diagnosis(patient_name, disease_name)

disease(disease_name, treatment)

treats(patient, doctor)

Page 16: Fall 2004 ECE569 Lecture 02.1 ECE 569 Database System Engineering Fall 2004 Yanyong Zhang yyzhangyyzhang Course

Fall 2004ECE569 Lecture 02.16

Example instance of students relation

Sid Name Login Age

Gpa

53666 Jones jones@cse 18 3.4

53688 Smith smith@eecs 18 3.2

53650 Smith smith@math 19 3.8

Page 17: Fall 2004 ECE569 Lecture 02.1 ECE 569 Database System Engineering Fall 2004 Yanyong Zhang yyzhangyyzhang Course

Fall 2004ECE569 Lecture 02.17

Relational Query Language

A major strength of the relational model: supports simple, powerful querying of data

Queries can be written intuitively, and the DBMS is responsible for efficient evaluation

Key: precise semantics for relational queries

Allows the optimizer to extensively re-order operations, and still ensure that the answer does not change

Page 18: Fall 2004 ECE569 Lecture 02.1 ECE 569 Database System Engineering Fall 2004 Yanyong Zhang yyzhangyyzhang Course

Fall 2004ECE569 Lecture 02.18

The SQL Query Language

Developed by IBM (system R) in the 1970s

Need for a standard since it is used by many vendors

Standards SQL-86

SQL-89 (minor revision)

SQL-92 (major revision)

SQL-99 (major extensions, current standard)

Page 19: Fall 2004 ECE569 Lecture 02.1 ECE 569 Database System Engineering Fall 2004 Yanyong Zhang yyzhangyyzhang Course

Fall 2004ECE569 Lecture 02.19

The SQL Query Language

To find all 18 year old students, we can write

To find just names and logins, replace the first line:

Sid Name Login Age Gpa

53666 Jones jones@cse 18 3.4

53688 Smith smith@eecs 18 3.2

select *from student S

where s.age=18

select s.name, s.login

Page 20: Fall 2004 ECE569 Lecture 02.1 ECE 569 Database System Engineering Fall 2004 Yanyong Zhang yyzhangyyzhang Course

Fall 2004ECE569 Lecture 02.20

Logical DB design

Representing entity-relationship diagrams An entity set E can be represented by a relation whose

relation scheme consists of all the attributes of the entity set.

Each tuple of the relation represents one entity in the current instance of E.

A relationship R among E1, E2, …, Ek is represented by a relation whose relation scheme consists of the attributes in the keys for each of E1, E2, …, Ek.

Page 21: Fall 2004 ECE569 Lecture 02.1 ECE 569 Database System Engineering Fall 2004 Yanyong Zhang yyzhangyyzhang Course

Fall 2004ECE569 Lecture 02.21

Relational Query language

Query language: Allow manipulation and retrieval of data from a database

Relational model supports simple, powerful QLs: Strong formal foundation based on logic

Allows for much optimization

Query language != programming language

Page 22: Fall 2004 ECE569 Lecture 02.1 ECE 569 Database System Engineering Fall 2004 Yanyong Zhang yyzhangyyzhang Course

Fall 2004ECE569 Lecture 02.22

Formal Relational Query Languages

Two mathematical query languages form the basis for “real” languages (e.g., SQL) and for implementation:

Relational algebra: more operational, very useful for representing execution plans

Relational calculus: let users describe what they want, rather than how to compute it (Non-operational, declarative)

Page 23: Fall 2004 ECE569 Lecture 02.1 ECE 569 Database System Engineering Fall 2004 Yanyong Zhang yyzhangyyzhang Course

Fall 2004ECE569 Lecture 02.23

Relational Algebra

A set of five basic operations that map one or more relations to a new relation.

Union: R S t tR or tS The set of tuples in R or S or both

The degrees of R and S must match

A B C

a b c

a d c

e a b

D E

d e

b c

F G H

d e b

a b c

a b b

d b bRS

T

a b c

a d c

e a b

d e b

a b b

d b b

R T

Page 24: Fall 2004 ECE569 Lecture 02.1 ECE 569 Database System Engineering Fall 2004 Yanyong Zhang yyzhangyyzhang Course

Fall 2004ECE569 Lecture 02.24

Relational Algebra (cont)

Difference: R - S t tR and tS The set of tuples in R, but not in S

The degrees of R and S must match

A B C

a b c

a d c

e a b

D E

d e

b c

F G H

d e b

a b c

a b b

d b bRS

T

a d c

e a b

R - T

Page 25: Fall 2004 ECE569 Lecture 02.1 ECE 569 Database System Engineering Fall 2004 Yanyong Zhang yyzhangyyzhang Course

Fall 2004ECE569 Lecture 02.25

Relational Algebra (cont)

Cartesian Product: R X S

(a1, a2, …, ar+s) (a1, a2, …, ar) R and (ar+1, ar+2, …, ar+s) S

R x S is of degree r+s where r=degree(R) and s=degree(s)

a tuple t is in R x S if its first r components match those of a tuple in R and its last s components from a tuple in S

A B C

a b c

a d c

e a b

D E

d e

b c

F G H

d e b

a b c

a b b

d b bRS

T

A B C D E

a b c d e

a b c b c

a d c d e

a d c b c

e a b d e

e a b b c

R x S

R.A R.B R.C S.D S.E

Page 26: Fall 2004 ECE569 Lecture 02.1 ECE 569 Database System Engineering Fall 2004 Yanyong Zhang yyzhangyyzhang Course

Fall 2004ECE569 Lecture 02.26

Relational Algebra (cont)

Projection: i1, i2, …, im (R) (ai1, ai2, …, aim) (a1, a2, …, an) R

i1, i2, …, im (R) is of degree m

A tuple t in the result is obtained by removing and/or rearranging the attributes of a tuple of R

A B C

a b c

a d c

e a b

D E

d e

b c

F G H

d e b

a b c

a b b

d b bRS

T

G F

e d

b a

b d

2,1 (T)

Page 27: Fall 2004 ECE569 Lecture 02.1 ECE 569 Database System Engineering Fall 2004 Yanyong Zhang yyzhangyyzhang Course

Fall 2004ECE569 Lecture 02.27

Relational Algebra (cont)

Selection: F(R) t t R and F(t)

F(R) is of the same degree as R

All tuples in R that satisfy predicate F are included in result

F is a formula involving

- Constants or components of the tuple – component i is denoted $I

- Comparison operators, <, =, >, , ,

- Logical operators and, or, not

F G H

d e b

a b c

a b b

d b b

T

F G H

d e b

a b b

d b b

$1=d or $2=$3 (T)

Page 28: Fall 2004 ECE569 Lecture 02.1 ECE 569 Database System Engineering Fall 2004 Yanyong Zhang yyzhangyyzhang Course

Fall 2004ECE569 Lecture 02.28

Examples

Find the names of all patients with a balance of more than $10,000.

Find the names of all patients that have a pulse less than 50 or have been diagnosed with hypertension.

Page 29: Fall 2004 ECE569 Lecture 02.1 ECE 569 Database System Engineering Fall 2004 Yanyong Zhang yyzhangyyzhang Course

Fall 2004ECE569 Lecture 02.29

Examples (cont)

Find all patients whose treatment includes the “application of leeches”

Print the names of all pairs of patients that occupy the same hospital room and have been diagnosed with the same disease.

Page 30: Fall 2004 ECE569 Lecture 02.1 ECE 569 Database System Engineering Fall 2004 Yanyong Zhang yyzhangyyzhang Course

Fall 2004ECE569 Lecture 02.30

Examples (cont)

Print the name of the patient with the highest pulse recorded today

Find all patients that had pulse measured today

Find the set of all ‘losers’, i.e., those whose pulse is less than or equal to that of some other patient

Subtract (b) from (a) and project out patient name

Page 31: Fall 2004 ECE569 Lecture 02.1 ECE 569 Database System Engineering Fall 2004 Yanyong Zhang yyzhangyyzhang Course

Fall 2004ECE569 Lecture 02.31

Examples (cont)

List all patients that suffer from at least one illness that no other patient suffers from

Page 32: Fall 2004 ECE569 Lecture 02.1 ECE 569 Database System Engineering Fall 2004 Yanyong Zhang yyzhangyyzhang Course

Fall 2004ECE569 Lecture 02.32

Additional Operations

Intersection: R S t tR and tS The set of tuples in both R and S

The degrees of R and S must match

R S = R – (R – S) SR

R – ( R – S)

R – S

Page 33: Fall 2004 ECE569 Lecture 02.1 ECE 569 Database System Engineering Fall 2004 Yanyong Zhang yyzhangyyzhang Course

Fall 2004ECE569 Lecture 02.33

Additional Operations (cont)

Theta-join: R ij S $i $r+j(R S) where r = degree(R) and <, =, >, , ,

When is ‘=‘ operation is called equijoin.

Find patient with highest pulse

Page 34: Fall 2004 ECE569 Lecture 02.1 ECE 569 Database System Engineering Fall 2004 Yanyong Zhang yyzhangyyzhang Course

Fall 2004ECE569 Lecture 02.34

Additional Operations (cont)

Natural Join: R S An equijoin for all attributes that R and S have in

common.

The shared columns originating in S are projected out

R S i1, i2, …, im R.A1 = S. A1 and … and R.Ak = S.Ak (R x S) where i1, i2, … im is the list of all attributes of R x S, in order, except the attributes S.A1, …, S.Ak.

Example: Names of patients being treated with leeches.

Page 35: Fall 2004 ECE569 Lecture 02.1 ECE 569 Database System Engineering Fall 2004 Yanyong Zhang yyzhangyyzhang Course

Fall 2004ECE569 Lecture 02.35

Sample Queries

Consider the following set of relations

frequents (drinker, bar)

serves (bar, beer)

likes (drinker, beer)

List the drinkers that frequent at least one bar that serves a beer they like

Construct the relation should_visit(drinker,bar) consisting of all tuples <d, b> where bar b serves a beer that drinker d likes.

Page 36: Fall 2004 ECE569 Lecture 02.1 ECE 569 Database System Engineering Fall 2004 Yanyong Zhang yyzhangyyzhang Course

Fall 2004ECE569 Lecture 02.36

Sample Queries (cont)

List the drinkers that frequent only bars that serve some beer they like. (Assume every drinker frequents at least one bar.)

Print the drinkers that frequent no bar that serves a beer that they like

Page 37: Fall 2004 ECE569 Lecture 02.1 ECE 569 Database System Engineering Fall 2004 Yanyong Zhang yyzhangyyzhang Course

Fall 2004ECE569 Lecture 02.37

Questions on Healthcare DB

Retrieve all pairs of patients that shared a room at the same time

List all patients that finished paying their bill before one of their roomates began paying theirs.

Print the names of all pairs of patients that occupy the same hospital room and have been diagnosed with the same disease.

Page 38: Fall 2004 ECE569 Lecture 02.1 ECE 569 Database System Engineering Fall 2004 Yanyong Zhang yyzhangyyzhang Course

Fall 2004ECE569 Lecture 02.38

Components of Data-Intensive Systems

Three separate types of functionalities Data management

Application logic

Presentation

The system architecture determines whether these three components reside on a single system (“tier”) or are distributed across several tiers

Page 39: Fall 2004 ECE569 Lecture 02.1 ECE 569 Database System Engineering Fall 2004 Yanyong Zhang yyzhangyyzhang Course

Fall 2004ECE569 Lecture 02.39

Single-tier architecture

All functionality combined into a single tier, usually on a mainframe

Users access through dumb terminals

Advantages: Easy maintenance and administration

Disadvantages: Today, users expect graphical user interface

Centralized computation of all of them is too much for a central system

Page 40: Fall 2004 ECE569 Lecture 02.1 ECE 569 Database System Engineering Fall 2004 Yanyong Zhang yyzhangyyzhang Course

Fall 2004ECE569 Lecture 02.40

Client-Server Architecture

Thin client: work division Client implements only the graphical user interface

Server implements business logic and data management

Thick client: work division

- Client implements both the graphical user interface and the business logic

- Server implements data management

Disadvantages:

- No central place to update the business logic

- Security issues: server needs to trust clients

- Scalability: does not scale more than several 100s of clients

Page 41: Fall 2004 ECE569 Lecture 02.1 ECE 569 Database System Engineering Fall 2004 Yanyong Zhang yyzhangyyzhang Course

Fall 2004ECE569 Lecture 02.41

The three-tier architecture

Client program (web browser)

Application Server(Tomcat, Apache)

Database System(DB2)