1 csci485 – file & database management systems bahram zartoshty office: sal 346 phone: tba...

52
1 CSCI485 – File & Database Management Systems Bahram Zartoshty Office: SAL 346 Phone: TBA Office Hours: TTH 1:15-2:50pm Note: Parts of this lecture were developed by Professor Ghandeharizadeh

Upload: owen-quinn

Post on 27-Dec-2015

214 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: 1 CSCI485 – File & Database Management Systems Bahram Zartoshty Office: SAL 346 Phone: TBA Office Hours: TTH 1:15-2:50pm Note: Parts of this lecture were

1

CSCI485 – File & Database Management Systems

Bahram ZartoshtyOffice: SAL 346

Phone: TBAOffice Hours: TTH 1:15-2:50pm

Note: Parts of this lecture were developed by Professor Ghandeharizadeh

Page 2: 1 CSCI485 – File & Database Management Systems Bahram Zartoshty Office: SAL 346 Phone: TBA Office Hours: TTH 1:15-2:50pm Note: Parts of this lecture were

Logistics

Required text book: Database System Concepts, Silberschatz,

Korth & Sudarshan, Fifth edition.

Pre-req for the course: CS201: Data Structures Knowledge of an object-oriented

programming language such as C++, Java, C#

Page 3: 1 CSCI485 – File & Database Management Systems Bahram Zartoshty Office: SAL 346 Phone: TBA Office Hours: TTH 1:15-2:50pm Note: Parts of this lecture were

3

Teaching Assistant

Shahin ShayandeOffice: (Microsoft Lab)SAL 200C

Office Hours: TBA

Page 4: 1 CSCI485 – File & Database Management Systems Bahram Zartoshty Office: SAL 346 Phone: TBA Office Hours: TTH 1:15-2:50pm Note: Parts of this lecture were

4

Grading

Midterm 1: 35% Midterm 2: 35% Project & Assignments: 30%

Page 5: 1 CSCI485 – File & Database Management Systems Bahram Zartoshty Office: SAL 346 Phone: TBA Office Hours: TTH 1:15-2:50pm Note: Parts of this lecture were

5

What to do immediately?

Register with the web sitehttp://dblab.usc.edu/csci485

Page 6: 1 CSCI485 – File & Database Management Systems Bahram Zartoshty Office: SAL 346 Phone: TBA Office Hours: TTH 1:15-2:50pm Note: Parts of this lecture were

6

Database Management System (DBMS)

Database: An integrated collection of data, usually stored on secondary storage, typically describing the activities of one or more related organizations.

Database management system (DBMS): A collection of software/programs designed to assist in maintaining and utilizing large collections of data.

DBMS contains information about a particular enterprise

Used almost on a daily basis for either individual or business use.

Relational database vendors were one of the fastest growing sectors during the .COM boom!

Page 7: 1 CSCI485 – File & Database Management Systems Bahram Zartoshty Office: SAL 346 Phone: TBA Office Hours: TTH 1:15-2:50pm Note: Parts of this lecture were

7

BEFORE DBMSIn the early days, database applications were built on top of file systems

Data

DataUser

1

User 2

Application programs

Application programs

Page 8: 1 CSCI485 – File & Database Management Systems Bahram Zartoshty Office: SAL 346 Phone: TBA Office Hours: TTH 1:15-2:50pm Note: Parts of this lecture were

8

Data managed by DBMS

AFTER DBMS

User 1

User 2

DBMS

Application programs

Application programs

Page 9: 1 CSCI485 – File & Database Management Systems Bahram Zartoshty Office: SAL 346 Phone: TBA Office Hours: TTH 1:15-2:50pm Note: Parts of this lecture were

9

WHY A DBMS?1. Reduced application development time2. Data independence: Application programs not dependent

on data representation and storage details3. Data sharing: data is better utilized (discovered and

reused), redundancy of data is minimized4. Data integrity and consistency: one may enforce

consistency constraints on data, e.g., number of seats sold ≤ number of seats on the plane × 1.1

5. Centralized control: DBA tunes the database to balance user's needs

6. Security: mechanisms to prevent unauthorized access. These mechanisms are based on content instead of file-oriented approach.

7. Concurrency control: avoids undesirable race conditions that arise with simultaneous access/updates to data

8. Crash recovery: ensures the integrity of data in the presence of failures

Page 10: 1 CSCI485 – File & Database Management Systems Bahram Zartoshty Office: SAL 346 Phone: TBA Office Hours: TTH 1:15-2:50pm Note: Parts of this lecture were

10

DATABASE MANAGEMENT SYSTEMS ARCHITECTURE

User 1

User n

Conceptual schema

Conceptual schema

Physical data

DBDBMS

Page 11: 1 CSCI485 – File & Database Management Systems Bahram Zartoshty Office: SAL 346 Phone: TBA Office Hours: TTH 1:15-2:50pm Note: Parts of this lecture were

11

Data Models

A collection of tools for describing Data Data relationships Data semantics Data constraints

Relational model Entity-Relationship data model (mainly for database

design) Object-based data models (Object-oriented and Object-

relational) Semistructured data model (XML) Other older models:

Network model Hierarchical model

Page 12: 1 CSCI485 – File & Database Management Systems Bahram Zartoshty Office: SAL 346 Phone: TBA Office Hours: TTH 1:15-2:50pm Note: Parts of this lecture were

ChallengesConceptual

Logical

Physical

Abstraction, Inheritance, Encapsulation

Reduction to tables with minimal: data duplication, potential for data loss and update anomalies

Effective use of a DBMS, management of mismatch between tables and OO constructs, Index structures, CC & Crash recovery, Optimization techniques

Page 13: 1 CSCI485 – File & Database Management Systems Bahram Zartoshty Office: SAL 346 Phone: TBA Office Hours: TTH 1:15-2:50pm Note: Parts of this lecture were

Conceptual Data Models

Entity-Relationship (ER) data model Entities, Attributes, Relationships

Emp

SS#

name

address

Page 14: 1 CSCI485 – File & Database Management Systems Bahram Zartoshty Office: SAL 346 Phone: TBA Office Hours: TTH 1:15-2:50pm Note: Parts of this lecture were

Conceptual Data Models

Entity-Relationship (ER) data model Entities, Attributes, Relationships

Enrolledin

Emp

SS#

name

address

Health Plan name

Co-Pay

Page 15: 1 CSCI485 – File & Database Management Systems Bahram Zartoshty Office: SAL 346 Phone: TBA Office Hours: TTH 1:15-2:50pm Note: Parts of this lecture were

Conceptual Data Models

Entity-Relationship (ER) data model Entities, Attributes, Relationships Recursive relationships

Marriedto

Emp

SS#

name

address

Page 16: 1 CSCI485 – File & Database Management Systems Bahram Zartoshty Office: SAL 346 Phone: TBA Office Hours: TTH 1:15-2:50pm Note: Parts of this lecture were

Conceptual Data Models

Entity-Relationship (ER) data model Entities, Attributes, Relationships Recursive relationships

Worksfor

Emp

SS#

name

address

Page 17: 1 CSCI485 – File & Database Management Systems Bahram Zartoshty Office: SAL 346 Phone: TBA Office Hours: TTH 1:15-2:50pm Note: Parts of this lecture were

Conceptual Data Models

Entity-Relationship (ER) data model Entities, Attributes, Relationships Recursive relationships

Worksfor

Emp

SS#

name

address

date

Page 18: 1 CSCI485 – File & Database Management Systems Bahram Zartoshty Office: SAL 346 Phone: TBA Office Hours: TTH 1:15-2:50pm Note: Parts of this lecture were

Conceptual Data Models

Entity-Relationship (ER) data model Entities, Attributes, Relationships Recursive relationships Inheritance

sidstudent

name

ISA

graduateUndergrad

SpecializationGeneralization

Page 19: 1 CSCI485 – File & Database Management Systems Bahram Zartoshty Office: SAL 346 Phone: TBA Office Hours: TTH 1:15-2:50pm Note: Parts of this lecture were

Conceptual Data Models

Abstraction, Inheritance, Encapsulation

Exercise these concepts using in-class examples and homework assignments A library database contains a listing of authors who have written

books on various subjects (one author per book). It also contains information about libraries that carry books on various subjects.

Page 20: 1 CSCI485 – File & Database Management Systems Bahram Zartoshty Office: SAL 346 Phone: TBA Office Hours: TTH 1:15-2:50pm Note: Parts of this lecture were

Conceptual Data Models

Abstraction, Inheritance, Encapsulation

Exercise these concepts using in-class examples and homework assignments A library database contains a listing of authors who have written

books on various subjects (one author per book). It also contains information about libraries that carry books on various subjects.

Entity sets: authors, subjects, books, libraries Relationship sets: wrote, carry, indexed

Page 21: 1 CSCI485 – File & Database Management Systems Bahram Zartoshty Office: SAL 346 Phone: TBA Office Hours: TTH 1:15-2:50pm Note: Parts of this lecture were

Conceptual Data Models

Abstraction, Inheritance, Encapsulation

Exercise these concepts using in-class examples and homework assignments A library database contains a listing of authors who have written

books on various subjects (one author per book). It also contains information about libraries that carry books on various subjects.

carry

books indexwrote subjectauthorsSS#

name

title isbn

Subjectmatter

librariesaddress

Page 22: 1 CSCI485 – File & Database Management Systems Bahram Zartoshty Office: SAL 346 Phone: TBA Office Hours: TTH 1:15-2:50pm Note: Parts of this lecture were

Data Models

Logical

Physical

WorksforEmp

SS#nameaddress

Page 23: 1 CSCI485 – File & Database Management Systems Bahram Zartoshty Office: SAL 346 Phone: TBA Office Hours: TTH 1:15-2:50pm Note: Parts of this lecture were

Relational Data Model

Prevalent in today’s market place. Why? Performance!

Everything is a table!

Logical data design is the process of reducing an ER diagram to a collection of tables.

Page 24: 1 CSCI485 – File & Database Management Systems Bahram Zartoshty Office: SAL 346 Phone: TBA Office Hours: TTH 1:15-2:50pm Note: Parts of this lecture were

Logical Data Design

Trivial reduction: An entity set = a table A relationship set = a table

Pitfalls: Duplication of data Unintentional loss of data Data ambiguity that impacts software

design, resulting in update anomalies

Page 25: 1 CSCI485 – File & Database Management Systems Bahram Zartoshty Office: SAL 346 Phone: TBA Office Hours: TTH 1:15-2:50pm Note: Parts of this lecture were

Data Duplication

Worksfor

Emp

SS#

name

address

396 Shahram Seattle

400 Asoke Chicago

200 Joe New York

396 400

200 400

120 400

SS# Name Address SS# MGRSS#

Page 26: 1 CSCI485 – File & Database Management Systems Bahram Zartoshty Office: SAL 346 Phone: TBA Office Hours: TTH 1:15-2:50pm Note: Parts of this lecture were

Data Duplication

The SS# column is duplicated!

Worksfor

Emp

SS#

name

address

396 Shahram Seattle

400 Asoke Chicago

200 Joe New York

396 400

200 400

120 400

SS# Name Address SS# MGRSS#

Page 27: 1 CSCI485 – File & Database Management Systems Bahram Zartoshty Office: SAL 346 Phone: TBA Office Hours: TTH 1:15-2:50pm Note: Parts of this lecture were

Data Duplication: Solution Merge the two tables into one:

396 Shahram Seattle 400

400 Asoke Chicago NULL

200 Joe New York 400

SS# Name Address MGRSS#

Worksfor

Emp

SS#

name

address

Page 28: 1 CSCI485 – File & Database Management Systems Bahram Zartoshty Office: SAL 346 Phone: TBA Office Hours: TTH 1:15-2:50pm Note: Parts of this lecture were

Data Loss

Ford maintains warehouses containing different automobile parts

Records are inserted and deleted based on availability of a part at a warehouse

123 Piston Tijuana

203 Cylinder Michigan

877 Bumper Michigan

389 Seats Arizona

Part# Description Location

Page 29: 1 CSCI485 – File & Database Management Systems Bahram Zartoshty Office: SAL 346 Phone: TBA Office Hours: TTH 1:15-2:50pm Note: Parts of this lecture were

Data Loss (Cont…)

When a warehouse becomes empty, it is lost from the database:

Solution: utilize two different tables

123 Piston Tijuana

389 Seats Arizona

Part# Description Location

123 Piston 12

389 Seats 45

Part# Description WHID

12 Tijuana

45 Arizona

WHID Location

Page 30: 1 CSCI485 – File & Database Management Systems Bahram Zartoshty Office: SAL 346 Phone: TBA Office Hours: TTH 1:15-2:50pm Note: Parts of this lecture were

Data Ambiguity

Represent faculty of a department as:

A change of address for a faculty might be for the entire department. This cannot be differentiated with this table design!

Ghandeharizadeh Comp Sci SAL

Zartoshty Comp Sci SAL

Bohem Comp Sci SAL

Faculty Department Location

Page 31: 1 CSCI485 – File & Database Management Systems Bahram Zartoshty Office: SAL 346 Phone: TBA Office Hours: TTH 1:15-2:50pm Note: Parts of this lecture were

Data Ambiguity

Utilize two tables:

Ghandeharizadeh Comp Sci

Zartoshty Comp Sci

Jenkins Bio Medical

Bohem Comp Sci

Faculty Department

Comp Sci SAL

Sex Ed BOVARD

Bio Medical HEDCO

Department Location

Page 32: 1 CSCI485 – File & Database Management Systems Bahram Zartoshty Office: SAL 346 Phone: TBA Office Hours: TTH 1:15-2:50pm Note: Parts of this lecture were

Data Ambiguity (Cont…)

Employees of a bi-lingual company having different skills.

Update anomalies!

Asoke Teach Hindi

Asoke Cook French

Asoke Null German

Asoke Program English

Employee Skill Language

Page 33: 1 CSCI485 – File & Database Management Systems Bahram Zartoshty Office: SAL 346 Phone: TBA Office Hours: TTH 1:15-2:50pm Note: Parts of this lecture were

Data Ambiguity: Solution

Utilize two tables:

Asoke Teach

Asoke Cook

Asoke Program

Employee SkillAsoke Hindi

Asoke French

Asoke German

Asoke English

Employee Language

Page 34: 1 CSCI485 – File & Database Management Systems Bahram Zartoshty Office: SAL 346 Phone: TBA Office Hours: TTH 1:15-2:50pm Note: Parts of this lecture were

Logical Data Design

A quest to flatten objects with minimal data duplication, loss of data, and update anomalies!

William Kent, “A Simple Guide to Five Normal Forms in Relational Database Theory”, Communications of the ACM 26(2), Feb 1983, 120-125.

Page 35: 1 CSCI485 – File & Database Management Systems Bahram Zartoshty Office: SAL 346 Phone: TBA Office Hours: TTH 1:15-2:50pm Note: Parts of this lecture were

Data Models

Physical

WorksforEmp

SS#nameaddress

Logical Data Design

396 Shahram Seattle 400

400 Asoke Chicago Null

SS# Name Address MGR SS#

Page 36: 1 CSCI485 – File & Database Management Systems Bahram Zartoshty Office: SAL 346 Phone: TBA Office Hours: TTH 1:15-2:50pm Note: Parts of this lecture were

Physical Implementation

Reconstruct main memory objects for manipulation and presentation: Specify class definitions

Typically correspond to entity-sets Populate an instance of a class by issuing

SQL queries to a DBMS Update instances in memory Flush dirty instances back to DBMS

Potential use of transactions

Page 37: 1 CSCI485 – File & Database Management Systems Bahram Zartoshty Office: SAL 346 Phone: TBA Office Hours: TTH 1:15-2:50pm Note: Parts of this lecture were

Type Mismatch

A column of a row must be a primitive such as an integer, real, etc. It may NOT be an array of integers or

object pointers A property (attribute) of a class might

be of a multi-valued type, e.g., an array, a vector, etc.

Changes in software may impact the design of tables. (Management of type mismatch by the system designer.)

Page 38: 1 CSCI485 – File & Database Management Systems Bahram Zartoshty Office: SAL 346 Phone: TBA Office Hours: TTH 1:15-2:50pm Note: Parts of this lecture were

Implementation

Set operators in the DBMS Does set A contain set B? Does value v1 appear in set A?

Aggregates in the DBMS Compute average employee salary Count the number of employees Find the oldest employee

Page 39: 1 CSCI485 – File & Database Management Systems Bahram Zartoshty Office: SAL 346 Phone: TBA Office Hours: TTH 1:15-2:50pm Note: Parts of this lecture were

ChallengesConceptual

Logical

Physical

Abstraction, Inheritance, Encapsulation

Reduction to tables with minimal: data duplication, potential for data loss and update anomalies

Effective use of a DBMS, management of mismatch between tables and OO constructs, Index structures, CC & Crash recovery, Optimization techniques

Page 40: 1 CSCI485 – File & Database Management Systems Bahram Zartoshty Office: SAL 346 Phone: TBA Office Hours: TTH 1:15-2:50pm Note: Parts of this lecture were

40

Entity-Relationship Model

Example of schema in the entity-relationship model

Page 41: 1 CSCI485 – File & Database Management Systems Bahram Zartoshty Office: SAL 346 Phone: TBA Office Hours: TTH 1:15-2:50pm Note: Parts of this lecture were

41

Entity Relationship Model (Cont.)

E-R model of real world Entities (objects)

E.g. customers, accounts, bank branch Relationships between entities

E.g. Account A-101 is held by customer Johnson

Relationship set depositor associates customers with accounts

Widely used for database design Database design in E-R model usually converted

to design in the relational model (coming up next) which is used for storage and processing

Page 42: 1 CSCI485 – File & Database Management Systems Bahram Zartoshty Office: SAL 346 Phone: TBA Office Hours: TTH 1:15-2:50pm Note: Parts of this lecture were

42

Relational Model

Example of tabular data in the relational model

Attributes

Page 43: 1 CSCI485 – File & Database Management Systems Bahram Zartoshty Office: SAL 346 Phone: TBA Office Hours: TTH 1:15-2:50pm Note: Parts of this lecture were

43

A Sample Relational Database

Page 44: 1 CSCI485 – File & Database Management Systems Bahram Zartoshty Office: SAL 346 Phone: TBA Office Hours: TTH 1:15-2:50pm Note: Parts of this lecture were

44

DATA INDEPENDENCE1. Physical data independence: modify the

physical scheme (data structures, e.g., B-tree or hash index) without causing application programs to be rewritten. These modifications are necessary to enhance performance and new software releases. Most relational vendors support this kind of data independence.

2. Logical data independence: Modify the conceptual scheme (e.g., add a new attribute to a table, rename an attribute) without causing application programs to be rewritten. This kind of data independence is harder to achieve.

Page 45: 1 CSCI485 – File & Database Management Systems Bahram Zartoshty Office: SAL 346 Phone: TBA Office Hours: TTH 1:15-2:50pm Note: Parts of this lecture were

45

DATABASE LANGUAGES

There are several languages associated with a database:1. Data Definition Language (DDL): The database scheme is

specified by a set of definitions that are expressed by a special language named DDL. The result of compiling DDL statements is a set of tables stored in a file called data dictionary. This file contains meta-data (data about the data stored in the database).

2. Data Manipulation Language (DML): a language that enables users to access or manipulate data (retrieve, insert, replace, delete) as organized by a certain data model. We will look at a commercial DML named SQL. In general, there are two types of DML:

• Procedural: Describes what data is needed and how to get it: e.g., relational algebra

• Non-procedural: Describes what data is needed without specifying how to get it: e.g., tuple relational calculus

Page 46: 1 CSCI485 – File & Database Management Systems Bahram Zartoshty Office: SAL 346 Phone: TBA Office Hours: TTH 1:15-2:50pm Note: Parts of this lecture were

46

SQL SQL: widely used non-procedural language

E.g. find the name of the customer with customer-id 192-83-7465select customer.customer-namefrom customerwhere customer.customer-id = ‘192-83-7465’

E.g. find the balances of all accounts held by the customer with customer-id 192-83-7465

select account.balancefrom depositor, accountwhere depositor.customer-id = ‘192-83-7465’ and depositor.account-number = account.account-

number

Application programs generally access databases through one of Language extensions to allow embedded SQL Application program interface (e.g. ODBC/JDBC) which allow SQL

queries to be sent to a database

Page 47: 1 CSCI485 – File & Database Management Systems Bahram Zartoshty Office: SAL 346 Phone: TBA Office Hours: TTH 1:15-2:50pm Note: Parts of this lecture were

47

Storage Management

Storage manager is a program module that provides the interface between the low-level data stored in the database and the application programs and queries submitted to the system.

The storage manager is responsible to the following tasks: Interaction with the file manager Efficient storing, retrieving and updating of data

Issues: Storage access File organization Indexing and hashing

Page 48: 1 CSCI485 – File & Database Management Systems Bahram Zartoshty Office: SAL 346 Phone: TBA Office Hours: TTH 1:15-2:50pm Note: Parts of this lecture were

48

Query Processing

1. Parsing and translation2. Optimization3. Evaluation

Page 49: 1 CSCI485 – File & Database Management Systems Bahram Zartoshty Office: SAL 346 Phone: TBA Office Hours: TTH 1:15-2:50pm Note: Parts of this lecture were

49

Query Processing (Cont.) Alternative ways of evaluating a given query

Equivalent expressions Different algorithms for each operation

Cost difference between a good and a bad way of evaluating a query can be enormous

Need to estimate the cost of operations Depends critically on statistical information

about relations which the database must maintain

Need to estimate statistics for intermediate results to compute cost of complex expressions

Page 50: 1 CSCI485 – File & Database Management Systems Bahram Zartoshty Office: SAL 346 Phone: TBA Office Hours: TTH 1:15-2:50pm Note: Parts of this lecture were

50

Transaction Management A transaction is a collection of operations that

performs a single logical function in a database application

Transaction-management component ensures that the database remains in a consistent (correct) state despite system failures (e.g., power failures and operating system crashes) and transaction failures.

Concurrency-control manager controls the interaction among the concurrent transactions, to ensure the consistency of the database.

Page 51: 1 CSCI485 – File & Database Management Systems Bahram Zartoshty Office: SAL 346 Phone: TBA Office Hours: TTH 1:15-2:50pm Note: Parts of this lecture were

51

SYSTEM USERS

There are several kind of users associated with a system:

Database administrator: defines schemas, storage structures and access method definitions, physical organization, authorization, integrity constraints.

Application programmers: they write a program and make it available to the end-users

Sophisticated users: they use a query language (SQL) to access the database interactively

Naive (end) users: they invoke the application programs

Page 52: 1 CSCI485 – File & Database Management Systems Bahram Zartoshty Office: SAL 346 Phone: TBA Office Hours: TTH 1:15-2:50pm Note: Parts of this lecture were

52

Overall System Structure