database - columbia universityxie/comsw1001/slides/slides_week7.pdf · database systems vs file...

36
DATABASE COMS W1001 Introduction to Information Science Boyi Xie 1

Upload: trandung

Post on 10-Apr-2018

216 views

Category:

Documents


3 download

TRANSCRIPT

Page 1: DATABASE - Columbia Universityxie/comsw1001/slides/slides_week7.pdf · Database Systems vs File Systems • Using Files for data management • Data redundancy and inconsistency •

DATABASE COMS W1001 Introduction to Information Science Boyi Xie

1

Page 2: DATABASE - Columbia Universityxie/comsw1001/slides/slides_week7.pdf · Database Systems vs File Systems • Using Files for data management • Data redundancy and inconsistency •

Today’s Topics • Database System Applications • Database Systems versus File Systems • View of Data • Database Architecture • E-R Model • Relational Model •  Fundamental Operations • A Sample Relational Database

2

Page 3: DATABASE - Columbia Universityxie/comsw1001/slides/slides_week7.pdf · Database Systems vs File Systems • Using Files for data management • Data redundancy and inconsistency •

Database System Applications •  Banking

•  customer information, accounts, transactions

•  Airlines •  reservation, schedule

•  Universities •  student information, course registration

•  Telecommunication •  records of calls made, monthly bills

•  Finance •  purchases of financial instruments such as stocks

•  Sales •  product, customer, purchase information

•  Manufacturing •  inventories, orders, supply chain management

•  Human resources •  information about employees, salaries, payroll

3

Page 4: DATABASE - Columbia Universityxie/comsw1001/slides/slides_week7.pdf · Database Systems vs File Systems • Using Files for data management • Data redundancy and inconsistency •

Database Systems vs File Systems • Using Files for data management • Data redundancy and inconsistency • Difficulty in accessing data • Data isolation

4

Page 5: DATABASE - Columbia Universityxie/comsw1001/slides/slides_week7.pdf · Database Systems vs File Systems • Using Files for data management • Data redundancy and inconsistency •

View of Data • Data Abstraction

•  Physical level – the lowest level of abstraction describes how the data are actually stored.

•  Logical level – the next-higher level of abstraction describes what data are stored in the database, and what relationships exist among those data.

•  View level – the highest level of abstraction describes only part of the entire database.

•  Instances and Schemas •  Instance – the collection of information stored in the database at a

particular moment is called an instance of the database. •  Schema – the overall design of the database is called the database

schema. Schemas are changed infrequently.

5

Page 6: DATABASE - Columbia Universityxie/comsw1001/slides/slides_week7.pdf · Database Systems vs File Systems • Using Files for data management • Data redundancy and inconsistency •

Database Languages • Data-Definition Language

•  Specify the database schema •  Table definition and creation •  Metadata – data about data •  E.g. CREATE TABLE

• Data-Manipulation Language •  Retrieval of information stored in the database •  Insertion of new information into database •  Deletion of information from the database •  Modification of information stored in the database •  E.g. SELECT

6

Page 7: DATABASE - Columbia Universityxie/comsw1001/slides/slides_week7.pdf · Database Systems vs File Systems • Using Files for data management • Data redundancy and inconsistency •

Database Users and Administrators • People work with a database can be categorized as

database users or database administrators

7

Page 8: DATABASE - Columbia Universityxie/comsw1001/slides/slides_week7.pdf · Database Systems vs File Systems • Using Files for data management • Data redundancy and inconsistency •

Transaction Management •  Often, several operations on the database form a single logical unit of

work. •  E.g. a funds transfer, one account is debited and another account is

credited. Either both the credit and debit occur, or that neither occur.

•  A transaction is a unit of program execution that accesses and possibly updates various data items •  Atomicity – either all operations of the transaction are reflected properly in

the database, or none are •  Consistency – the correctness of the transaction and the preservation of

the consistency of the database •  Isolation – even though multiple transactions may execute concurrently,

each transaction is unaware of other transactions •  Durability – after a transaction completes successfully, the changes it has

made to the database persist, even if there are system failures

8

Page 9: DATABASE - Columbia Universityxie/comsw1001/slides/slides_week7.pdf · Database Systems vs File Systems • Using Files for data management • Data redundancy and inconsistency •

Database System Structure • Storage Manager

•  Authorization and integrity manager •  Tests for the satisfaction of integrity constraints and checks the authority

of users to access data •  Transaction manager

•  Ensures that the database remains in a consistent (correct) state despite system failures, and the concurrent transaction executions proceed without conflicting

•  File manager •  Manages the allocation of space on disk storage and the data structures

used to represent information stored on disk •  Buffer manager

•  Responsible for fetching data from disk storage into main memory, and deciding what data to cache in main memory

9

Page 10: DATABASE - Columbia Universityxie/comsw1001/slides/slides_week7.pdf · Database Systems vs File Systems • Using Files for data management • Data redundancy and inconsistency •

Database System Structure •  The Query Processor

•  DDL interpreter •  Interpret DDL statements and records the definitions in the data

dictionary •  DML compiler

•  Translates DML statements in a query language into an evaluation plan consisting of low-level instructions that the query evaluation engine understands

•  Query evaluation engine •  Executes low-level instructions generated by the DML compiler

10

Page 11: DATABASE - Columbia Universityxie/comsw1001/slides/slides_week7.pdf · Database Systems vs File Systems • Using Files for data management • Data redundancy and inconsistency •

11 naive users

(tellers, agents, web users)

query processor

storage manager

disk storageindices

statistical datadata

data dictionary

applicationprogrammers

applicationinterfaces

applicationprogram

object code

compiler andlinker

buffer manager file manager authorizationand integrity

manager

transactionmanager

DML compiler and organizer

query evaluationengine

DML queries DDL interpreter

applicationprograms

querytools

administrationtools

sophisticatedusers

(analysts)database

administrators

use write use use

Page 12: DATABASE - Columbia Universityxie/comsw1001/slides/slides_week7.pdf · Database Systems vs File Systems • Using Files for data management • Data redundancy and inconsistency •

Application Architectures •  Two-tier architecture •  Three-tier architecture

12

Page 13: DATABASE - Columbia Universityxie/comsw1001/slides/slides_week7.pdf · Database Systems vs File Systems • Using Files for data management • Data redundancy and inconsistency •

Data Model • Entity-Relationship (E-R) model

•  The E-R model perceives the real world as consisting of basic objects, called entities, and relationships among these objects.

•  It is very useful in mapping the meanings and interactions of real-world enterprises onto a conceptual schema.

•  Many database design tools draw on concepts from the E-R model.

• Relational Model •  The relational model is today the primary data model for

commercial data-processing applications. •  A relational database consists of a collection of tables •  A row in a table represents a relationship among a set of values

13

Database designers often formulate database schema design by first modeling data at a high level, using the E-R model, and then translating it into the relational model.

Page 14: DATABASE - Columbia Universityxie/comsw1001/slides/slides_week7.pdf · Database Systems vs File Systems • Using Files for data management • Data redundancy and inconsistency •

Entity-Relationship Model •  The E-R model employs three basic notions

•  Entity sets •  An entity is a “thing” or “object” in the real world that is distinguishable from

all other objects •  An entity set is a set of entities of the same type that share the same

properties, or attributes •  Relationship sets

•  A relationship is an association among several entities •  The function that an entity plays in a relationship is called that entity’s role •  A relationship may also have attributes called descriptive attributes

•  Attributes •  Simple and composite attributes

•  E.g. person name is a composite attribute consisting of first-name, middle-name, and last-name

•  Single-valued and multivalued attributes •  E.g. phone number can be a multivalued attribute because a person may have

zero, one, or several phone numbers •  Derived attributes

•  E.g. age is a derived attribute, which can be computed from date-of-birth and the current date. In this case, date-of-birth may be referred to as a base attribute

14

Page 15: DATABASE - Columbia Universityxie/comsw1001/slides/slides_week7.pdf · Database Systems vs File Systems • Using Files for data management • Data redundancy and inconsistency •

Symbols used in the E-R notation

15

E

E E

R

R R A

A

A ---

A A

A

ISA

entity set

weak entity set

relationship set

relationship set for weak entity set

Is-A (specialization or generalization)

attribute

multivalued attribute

derived attribute

primary key

discriminating attribute of weak entity set

Page 16: DATABASE - Columbia Universityxie/comsw1001/slides/slides_week7.pdf · Database Systems vs File Systems • Using Files for data management • Data redundancy and inconsistency •

E-R diagram with composite, multivalued, and derived attributes

16

customer

customer_id

phone_number

date_of_birth

age

name first_name

middel_name

last_name

address

street

city

state

zip_code

Page 17: DATABASE - Columbia Universityxie/comsw1001/slides/slides_week7.pdf · Database Systems vs File Systems • Using Files for data management • Data redundancy and inconsistency •

E-R diagram with attributes attached to a relationship set

17

deposit customer

age

name address

phone_number

customer_id

date_of_birth

account

date

amount

account_number

balance

Page 18: DATABASE - Columbia Universityxie/comsw1001/slides/slides_week7.pdf · Database Systems vs File Systems • Using Files for data management • Data redundancy and inconsistency •

E-R diagram with a weak entity set

18

loan

loan_number

amount

loan_payment payment payment

payment_number ------------------------

amount date

Page 19: DATABASE - Columbia Universityxie/comsw1001/slides/slides_week7.pdf · Database Systems vs File Systems • Using Files for data management • Data redundancy and inconsistency •

E-R diagram with a ternary relationship

19

employee

name

address

phone_number

employee_id

branch

title

branch_name

branch_city

works_on

job

level

Page 20: DATABASE - Columbia Universityxie/comsw1001/slides/slides_week7.pdf · Database Systems vs File Systems • Using Files for data management • Data redundancy and inconsistency •

E-R diagram with specialization and generalization

20

ISA

ISA

name

person

address

salary

employee customer

credit_rating

office_number

officer

station_number

teller

hours_worked

secretary

Page 21: DATABASE - Columbia Universityxie/comsw1001/slides/slides_week7.pdf · Database Systems vs File Systems • Using Files for data management • Data redundancy and inconsistency •

E-R diagram for a banking enterprise

21

loan

loan_number

amount

loan_ payment payment payment

payment_number ------------------------

amount date

customer

phone

customer_id

amount

account_number balance

account

ISA

name address

borrow

deposit

saving checking

date

interest_rate overdraft_amount

employee

address

phone

employee_id

customer_ banker

works_for manager

worker

Page 22: DATABASE - Columbia Universityxie/comsw1001/slides/slides_week7.pdf · Database Systems vs File Systems • Using Files for data management • Data redundancy and inconsistency •

E-R diagram for an e-commerce enterprise

22

author

author_id name

publisher

publisher_id

name

address

written_by published_by

book

year

title

price ISBN

contains

number

stocks

number

warehouse

shopping_ basket

basket_id

basket_of

customer

name

email

address

code

phone address

Page 23: DATABASE - Columbia Universityxie/comsw1001/slides/slides_week7.pdf · Database Systems vs File Systems • Using Files for data management • Data redundancy and inconsistency •

Relational Model •  The relational mode is today the primary data model for

commercial data-processing applications • A relational database consists of a collection of tables,

each of which is assigned a unique name •  The headers (or columns) of a table are attributes • A row in a table is a tuple of the attributes, and represents

a record • We can represent Entity-Relationship diagrams by tables

in relational model

23

Page 24: DATABASE - Columbia Universityxie/comsw1001/slides/slides_week7.pdf · Database Systems vs File Systems • Using Files for data management • Data redundancy and inconsistency •

Fundamental Operations •  The Project Operation (pi π)

•  πloan-number,amount(loan)

•  The Select Operation (sigma σ) •  σamount>1200(loan) •  σbranch-name=“Perryridge”(loan) •  σbranch-name=“Perryridge”∧amount>1200(loan)

•  Composition of Relational Operation •  πcustomer-name(σcustomer-city=“Harrison”(customer))

•  The Rename Operation (rho ρ) •  ρx(A1,A2,…,An)(E) •  returns the result of expression E under the name x, with the attributes

renamed to A1,A2,…,An.

•  Join Operation ( ) •  Combine rows from two or more tables, based on a common field between

them.

24

Page 25: DATABASE - Columbia Universityxie/comsw1001/slides/slides_week7.pdf · Database Systems vs File Systems • Using Files for data management • Data redundancy and inconsistency •

A Sample Relational Database •  This sample database will address the following topics

•  E-R diagram to schema diagrams for relational database •  Create and delete a database (schema) •  Create and delete a table •  Data types •  Insert, update and delete records (rows in a table) •  Queries

•  Project operation •  Select operation •  Rename operation •  Join operation •  Others

25

Page 26: DATABASE - Columbia Universityxie/comsw1001/slides/slides_week7.pdf · Database Systems vs File Systems • Using Files for data management • Data redundancy and inconsistency •

E-R Diagram

26

Courses

Houses

Students Professors

head_of

teaches enrolls_in

house_ assignment

last_name

first_name

student_id

class_year

exam

grade

classroom

course_id

name description

required_ textbook

required_ equipment

last_name

first_name

professor_id

house_name

mascot motto

house_ghost

Hogwarts E-R Diagram

Page 27: DATABASE - Columbia Universityxie/comsw1001/slides/slides_week7.pdf · Database Systems vs File Systems • Using Files for data management • Data redundancy and inconsistency •

Schema Diagram

27

student_id

first_name last_name house_name class_year

students

professor_id

first_name last_name

professors house_name

mascot motto house_ghost

houses

course_id

name description required_textbook required_equipment

courses

professor_id course_id classroom

teaches

student_id course_id exam grade

enrolls_in

professor_id house_name

head_of

Hogwarts Schema Diagram

Page 28: DATABASE - Columbia Universityxie/comsw1001/slides/slides_week7.pdf · Database Systems vs File Systems • Using Files for data management • Data redundancy and inconsistency •

Creation and Deletion of a Database

• Create a database (schema)

• Delete a database (schema)

• Show Databases

• Show Tables

28

SHOW TABLES;

SHOW DATABASES;

CREATE DATABASE hogwarts;

DROP DATABASE hogwarts;

Page 29: DATABASE - Columbia Universityxie/comsw1001/slides/slides_week7.pdf · Database Systems vs File Systems • Using Files for data management • Data redundancy and inconsistency •

Create and Delete a Table, Data Types • Create a table

• Delete all records in a table

• Delete a table

29

CREATE TABLE students ( student_id INT NOT NULL

AUTO_INCREMENT, first_name VARCHAR(255), last_name VARCHAR(255), house_name VARCHAR(100), class_year INT, PRIMARY KEY (student_id)

);

DROP TABLE students;

TRUNCATE TABLE students;

More about data types: http://dev.mysql.com/doc/refman/5.6/en/data-types.html

Page 30: DATABASE - Columbia Universityxie/comsw1001/slides/slides_week7.pdf · Database Systems vs File Systems • Using Files for data management • Data redundancy and inconsistency •

Insert, Update, and Delete a Record •  Insert a record

• Update a record

• Delete a record

30

INSERT INTO enrolls_in (student_id, course_id, exam, grade) VALUES ("1", "1", "86", "B");

UPDATE enrolls_in SET grade = 'A' WHERE student_id = '1' AND course_id = '5';

DELETE FROM enrolls_in WHERE student_id = '9' AND course_id = '5';

Page 31: DATABASE - Columbia Universityxie/comsw1001/slides/slides_week7.pdf · Database Systems vs File Systems • Using Files for data management • Data redundancy and inconsistency •

Basic Queries • Select all records in a table

• Project partial attributes

• Create View

31

SELECT * FROM students;

SELECT first_name, last_name, house_name FROM students;

CREATE VIEW house_assignments AS SELECT first_name, last_name, house_name FROM students;

Page 32: DATABASE - Columbia Universityxie/comsw1001/slides/slides_week7.pdf · Database Systems vs File Systems • Using Files for data management • Data redundancy and inconsistency •

Rename Tables and Attributes • Rename tables

• Rename attributes

32

SELECT * FROM head_of AS h, professors AS p WHERE h.professor_id = p.professor_id;

SELECT p.first_name AS "First Name", p.last_name AS "Last Name", h.house_name AS "House" FROM head_of AS h, professors AS p WHERE h.professor_id = p.professor_id;

Page 33: DATABASE - Columbia Universityxie/comsw1001/slides/slides_week7.pdf · Database Systems vs File Systems • Using Files for data management • Data redundancy and inconsistency •

Join, Natural Join, Outer Join •  Join

•  Join based on conditions

• Natural join • Outer join

•  Left join

•  Right join

33

SELECT * FROM head_of JOIN professors;

SELECT * FROM head_of JOIN professors ON head_of.professor_id = professors.professor_id;

SELECT * FROM head_of NATURAL JOIN professors;

SELECT * FROM professors LEFT JOIN head_of ON head_of.professor_id = professors.professor_id;

SELECT * FROM head_of RIGHT JOIN professors ON head_of.professor_id = professors.professor_id;

Page 34: DATABASE - Columbia Universityxie/comsw1001/slides/slides_week7.pdf · Database Systems vs File Systems • Using Files for data management • Data redundancy and inconsistency •

Group By, Order By • Group By

• Order By

34

SELECT e.student_id, s.first_name, s.last_name, COUNT(*) FROM enrolls_in AS e, students AS s WHERE e.student_id = s.student_id GROUP BY e.student_id;

SELECT s.first_name, s.last_name, AVG(e.exam) FROM enrolls_in AS e, students AS s, courses AS c WHERE e.course_id = c.course_id AND e.student_id = s.student_id GROUP BY s.first_name, s.last_name ORDER BY AVG(e.exam) DESC;

Page 35: DATABASE - Columbia Universityxie/comsw1001/slides/slides_week7.pdf · Database Systems vs File Systems • Using Files for data management • Data redundancy and inconsistency •

Other Useful Commands • DISTINCT

• COUNT

• AVG

35

SELECT DISTINCT e.student_id, s.first_name, s.last_name FROM enrolls_in AS e, students AS s WHERE e.student_id = s.student_id;

SELECT e.student_id, s.first_name, s.last_name, COUNT(*) FROM enrolls_in AS e, students AS s WHERE e.student_id = s.student_id GROUP BY e.student_id;

SELECT s.first_name, s.last_name, AVG(e.exam) FROM enrolls_in AS e, students AS s, courses AS c WHERE e.course_id = c.course_id AND e.student_id = s.student_id GROUP BY s.first_name, s.last_name ORDER BY AVG(e.exam) DESC;

Page 36: DATABASE - Columbia Universityxie/comsw1001/slides/slides_week7.pdf · Database Systems vs File Systems • Using Files for data management • Data redundancy and inconsistency •

References & Photo Credits • Abraham Silberschatz, Henry F. Korth, S. Sudarshan.

Database System Concepts. McGraw-Hill. • Brookshear, J. Glenn (2011-04-13). Computer Science:

An Overview (11th Edition). Prentice Hall. • Harry Potter Wiki. http://harrypotter.wikia.com

36