1.1 cas cs 460/660 introduction to database systems relational model and more…

Post on 26-Dec-2015

221 Views

Category:

Documents

0 Downloads

Preview:

Click to see full reader

TRANSCRIPT

1.1

CAS CS 460/660CAS CS 460/660Introduction to Database SystemsIntroduction to Database Systems

Relational ModelRelational Modeland more…and more…

1.2

The Structure SpectrumThe Structure Spectrum

Structured (schema-

first)

Relational DatabaseFormatted Messages

Semi-Structured

(schema-later)

DocumentsXML

Tagged Text/Media

Unstructured (schema-

never)

Plain Text

Media

1.3

The Relational ModelThe Relational Model

The Relational Model is UbiquitousMySQL, PostgreSQL, Oracle, DB2, SQLServer, …l Foundational work done at

IBM Santa Teresa Labs (now IBM Almaden in SJ) – “System R” UC Berkeley CS – the “Ingres” System

Note: some Legacy systems use older models e.g., IBM’s IMS

Object-oriented concepts have been merged in Early work: POSTGRES research project at Berkeley Informix, IBM DB2, Oracle 8i

As has support for XML (semi-structured data)

1.4

Relational ModelRelational Model

The relational model for database management is a database model based on first-order predicate logic, first formulated and proposed in 1969 by Edgar F. Codd.

Codd, E.F. (1970). "A Relational Model of Data for Large Shared Data Banks". Communications of the ACM 13 (6): 377–387.

1.5

Relational Database: DefinitionsRelational Database: Definitions

Relational database: a set of relations

Relation: made up of 2 parts:

Schema : specifies name of relation, plus name and type of each column

Students(sid: string, name: string, login: string, age: integer, gpa: real)

Instance : the actual data at a given time

#rows = cardinality #fields = degree / arity

1.7

Ex: Instance of Students RelationEx: Instance of Students Relation

sid name login age gpa

53666 Jones jones@cs 18 3.4

53688 Smith smith@eecs 18 3.2

53650 Smith smith@math 19 3.8

• Cardinality = 3, arity = 5 , all rows distinct

• Do all values in each column of a relation instance have to be unique?

1.8

SQL - A language for Relational DBsSQL - A language for Relational DBs

Say: “ess-cue-ell” or “sequel” But spelled “SQL”

Data Definition Language (DDL) create, modify, delete relations specify constraints administer users, security, etc.

Data Manipulation Language (DML) Specify queries to find tuples that satisfy criteria add, modify, remove tuples

The DBMS is responsible for efficient evaluation.

1.9

The SQL Query LanguageThe SQL Query Language

The most widely used relational query language.

Originally IBM, then ANSI in 1986

Current standard is SQL-2011 2008 added x-query stuff, new triggers,…

2003 was last major update: XML, window functions, sequences, auto-generated IDs. Not fully supported yet

SQL-1999 Introduced “Object-Relational” concepts. Also not fully supported yet.

SQL92 is a basic subset Most systems support at least this

PostgreSQL has some “unique” aspects (as do most systems).

SQL is not synonymous with Microsoft’s “SQL Server”

1.10

Creating Relations in SQLCreating Relations in SQL

Creates the Students relation. Note: the type (domain) of each field is specified, and enforced by

the DBMS whenever tuples are added or modified.

CREATE TABLE Students(sid CHAR(20), name CHAR(20), login CHAR(10), age INTEGER, gpa FLOAT)

1.11

Table Creation (continued)Table Creation (continued) Another example: the Enrolled table holds information about

courses students take.

CREATE TABLE Enrolled(sid CHAR(20), cid CHAR(20), grade CHAR(2))

1.12

Adding and Deleting TuplesAdding and Deleting Tuples Can insert a single tuple using:

INSERT INTO Students (sid, name, login, age, gpa) VALUES ('53688', 'Smith', 'smith@ee', 18, 3.2)

• Can delete all tuples satisfying some condition (e.g., name = Smith):

DELETE FROM Students S WHERE S.name = 'Smith'

Powerful variants of these commands are available; more later!

1.13

KeysKeys Keys are a way to associate tuples in different relations Keys are one form of integrity constraint (IC)

sid name login age gpa

53666 Jones jones@cs 18 3.4

53688 Smith smith@eecs 18 3.253650 Smith smith@math 19 3.8

sid cid grade53666 Carnatic101 C53666 Reggae203 B53650 Topology112 A53666 History105 B

Enrolled Students

PRIMARY KeyFOREIGN Key

1.14

Primary KeysPrimary Keys

A set of fields is a superkey if: No two distinct tuples can have same values in all key fields

A set of fields is a key for a relation if : It is a superkey

No subset of the fields is a superkey

what if >1 key for a relation? One of the keys is chosen (by DBA) to be the primary key.

Other keys are called candidate keys.

E.g. sid is a key for Students.

What about name?

The set {sid, gpa} is a superkey.

1.15

Primary and Candidate Keys in SQLPrimary and Candidate Keys in SQL Possibly many candidate keys (specified using UNIQUE),

one of which is chosen as the primary key.

• Keys must be used carefully!• “For a given student and course, there is a single

grade.”

“Students can take only one course, and no two students in a course receive the same grade.”

CREATE TABLE Enrolled (sid CHAR(20) cid CHAR(20), grade CHAR(2), PRIMARY KEY (sid,cid))

CREATE TABLE Enrolled (sid CHAR(20) cid CHAR(20), grade CHAR(2), PRIMARY KEY (sid), UNIQUE (cid, grade))

vs.

1.16

Foreign Keys, Referential IntegrityForeign Keys, Referential Integrity

Foreign key: a “logical pointer” Set of fields in a tuple in one relation

that `refer’ to a tuple in another relation. Reference to primary key of the other relation.

All foreign key constraints enforced? referential integrity! i.e., no dangling references.

1.17

Foreign Keys in SQLForeign Keys in SQL E.g. Only students listed in the Students relation should be allowed

to enroll for courses. sid is a foreign key referring to Students:

CREATE TABLE Enrolled (sid CHAR(20),cid CHAR(20),grade CHAR(2), PRIMARY KEY (sid,cid), FOREIGN KEY (sid) REFERENCES Students);

sid cid grade53666 Carnatic101 C53666 Reggae203 B53650 Topology112 A53666 History105 B

Enrolled

sid name login age gpa

53666 Jones jones@cs 18 3.4

53688 Smith smith@eecs 18 3.253650 Smith smith@math 19 3.8

Students

11111 English102 A

1.18

Next UpNext Up

We’ll talk a bit about the SQL DML

Then we’ll start describing the DBMS from storage on up

1.19

The SQL DMLThe SQL DML

Single-table queries are straightforward.

To find records for all 18 year old students with gpa’s above 2.0, we can write:

SELECT * FROM Students S WHERE S.age=18

AND S.gpa > 2.0

To get just names and logins, replace the first line:SELECT S.name, S.login

1.20

Basic SQL QueriesBasic SQL Queries

SELECT [DISTINCT] target-listFROM relation-listWHERE qualification• relation-list : A list of relation names

– possibly with a range-variable after each name

• target-list : A list of attributes of tables in relation-list

• qualification : Comparisons combined using AND, OR and NOT.– Comparisons are Attr op const or Attr1 op

Attr2, where op is one of =≠<>≤≥• DISTINCT: (optional) indicates that the answer should

have no duplicates.

– In SQL SELECT, the default is that duplicates are not eliminated! (Result is called a “multiset”)

1.21

Querying Multiple RelationsQuerying Multiple Relations

Can specify a join over two tables as follows:SELECT S.name, E.cid FROM Students S, Enrolled E WHERE S.sid=E.sid AND E.grade=‘B'

result = S.name E.cid Jones History105

sid name login age gpa

53666 Jones jones@cs 18 3.4

53688 Smith smith@ee 18 3.2

sid cid grade53831 Carnatic101 C53831 Reggae203 B53650 Topology112 A53666 History105 B

Note: obviously no referential integrity constraints have been used here.

1.22

Basic Query SemanticsBasic Query Semantics

The Semantics of a SQL query are defined in terms of the following conceptual evaluation strategy:

1. do FROM clause: compute cross-product of tables (e.g., Students and Enrolled).

2. do WHERE clause: Check conditions, discard tuples that fail.

3. do SELECT clause: Delete unwanted fields.

4. If DISTINCT specified, eliminate duplicate rows.

Probably the least efficient way to compute a query! • A query optimizer will find more efficient strategies to get the same answer.

1.23

Step 1 – Cross ProductStep 1 – Cross Product

SELECT S.name, E.cid FROM Students S, Enrolled E WHERE S.sid=E.sid AND E.grade=‘B'

sid name login age gpa

53666 Jones jones@cs 18 3.4

53688 Smith smith@ee 18 3.2

sid cid grade53831 Carnatic101 C53831 Reggae203 B53650 Topology112 A53666 History105 B

S.sid S.name S.login S.age S.gpa E.sid E.cid E.grade 53666 Jones jones@cs 18 3.4 53831 Carnatic101 C 53666 Jones jones@cs 18 3.4 53832 Reggae203 B 53666 Jones jones@cs 18 3.4 53650 Topology112 A 53666 Jones jones@cs 18 3.4 53666 History105 B 53688 Smith smith@ee 18 3.2 53831 Carnatic101 C 53688 Smith smith@ee 18 3.2 53831 Reggae203 B 53688 Smith smith@ee 18 3.2 53650 Topology112 A 53688 Smith smith@ee 18 3.2 53666 History105 B sid name login age gpa

53666 Jones jones@cs 18 3.4

53688 Smith smith@ee 18 3.2

sid cid grade53831 Carnatic101 C53831 Reggae203 B53650 Topology112 A53666 History105 B

1.24

Step 2) Discard tuples that fail predicateStep 2) Discard tuples that fail predicate

S.sid S.name S.login S.age S.gpa E.sid E.cid E.grade 53666 Jones jones@cs 18 3.4 53831 Carnatic101 C 53666 Jones jones@cs 18 3.4 53832 Reggae203 B 53666 Jones jones@cs 18 3.4 53650 Topology112 A 53666 Jones jones@cs 18 3.4 53666 History105 B 53688 Smith smith@ee 18 3.2 53831 Carnatic101 C 53688 Smith smith@ee 18 3.2 53831 Reggae203 B 53688 Smith smith@ee 18 3.2 53650 Topology112 A 53688 Smith smith@ee 18 3.2 53666 History105 B

SELECT S.name, E.cid FROM Students S, Enrolled E WHERE S.sid=E.sid AND E.grade=‘B'

1.25

Step 3) Discard Unwanted ColumnsStep 3) Discard Unwanted Columns

S.sid S.name S.login S.age S.gpa E.sid E.cid E.grade 53666 Jones jones@cs 18 3.4 53831 Carnatic101 C 53666 Jones jones@cs 18 3.4 53832 Reggae203 B 53666 Jones jones@cs 18 3.4 53650 Topology112 A 53666 Jones jones@cs 18 3.4 53666 History105 B 53688 Smith smith@ee 18 3.2 53831 Carnatic101 C 53688 Smith smith@ee 18 3.2 53831 Reggae203 B 53688 Smith smith@ee 18 3.2 53650 Topology112 A 53688 Smith smith@ee 18 3.2 53666 History105 B

SELECT S.name, E.cid FROM Students S, Enrolled E WHERE S.sid=E.sid AND E.grade=‘B'

1.26

Aggregate OperatorsAggregate OperatorsFor calculation and analytics COUNT (*)

COUNT ([DISTINCT] A)SUM ([DISTINCT] A)AVG (A)MAX (A)MIN (A)

SELECT AVG (S.gpa)FROM Students SWHERE S.age=18;

SELECT COUNT (*)FROM Students;

SELECT COUNT (DISTINCT S.age)FROM Students SWHERE S.name=‘Bob’;

1.27

GROUP BY and HAVINGGROUP BY and HAVING

Sometimes, we want to apply aggs to each of several groups of tuples.

This query computes the average gpa per major (assume students have a “major” attribute)

If you want to exclude “small” majors, use Having:

SELECT S.major, AVG (S.gpa) as AvgGPAFROM Students SGROUP BY S.major ;

SELECT S.major, AVG (S.gpa) as AvgGPAFROM Students SGROUP BY S.majorHAVING COUNT (*) > 10 ;

1.28

(Slightly) Less Basic SQL Queries(Slightly) Less Basic SQL Queries

SELECT [DISTINCT] target-listFROM relation-listWHERE qualificationGROUP BY grouping-listHAVING group-qualification

1.29

Conceptual EvaluationConceptual Evaluation

The cross-product of relation-list is computed, tuples that fail qualification are discarded, `unnecessary’ fields are deleted, and the remaining tuples are partitioned into groups by the value of attributes in grouping-list.

One answer tuple is generated per qualifying group.

1.30

Conceptual Evaluation (cont.)Conceptual Evaluation (cont.)

The group-qualification is then applied to eliminate some groups.

Expressions in group-qualification must have a single value per group!

That is, attributes in group-qualification must be arguments of an aggregate op or must also appear in the grouping-list.

One answer tuple is generated per qualifying group.

1.31

Okay: Let’s start from the Okay: Let’s start from the bottom up…bottom up…

Query Optimizationand Execution

Relational Operators

Access Methods

Buffer Management

Disk Space Management

Student Records stored on disk

Database app

These layersmust considerconcurrencycontrol andrecovery

top related