introduction to data management ed green senior lecturer - ist © ed green penn state university...

52
Introduction to Introduction to Data Management Data Management Ed Green Senior Lecturer - IST © Ed Green © Ed Green Penn State University Penn State University All Rights Reserved All Rights Reserved

Upload: baldwin-perkins

Post on 02-Jan-2016

215 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: Introduction to Data Management Ed Green Senior Lecturer - IST © Ed Green Penn State University Penn State University All Rights Reserved All Rights Reserved

Introduction to Data Introduction to Data ManagementManagement

Ed Green

Senior Lecturer - IST

© Ed Green© Ed Green Penn State UniversityPenn State University All Rights ReservedAll Rights Reserved

Page 2: Introduction to Data Management Ed Green Senior Lecturer - IST © Ed Green Penn State University Penn State University All Rights Reserved All Rights Reserved

04/20/23Introduction to Data

management 2

Topics

History Data structures File structures Definitions Database types Why database? Processing overview

Page 3: Introduction to Data Management Ed Green Senior Lecturer - IST © Ed Green Penn State University Penn State University All Rights Reserved All Rights Reserved

04/20/23Introduction to Data

management 3

History

Earliest computing No files All data on cards

First files Introduction of magnetic tape storage Sequential

Disk files Sequential Random access

Databases Form of random access disk files Invented to solve a specific business problem Many types and many companies

Page 4: Introduction to Data Management Ed Green Senior Lecturer - IST © Ed Green Penn State University Penn State University All Rights Reserved All Rights Reserved

04/20/23Introduction to Data

management 4

General Electric Very first database management system

1963 Switchgear Division

Philadelphia 3-person team

Charlie Bachman John Lyon Ted Seiter

Critical business success Identified by Computer Products Department as potential commercial

opportunity Initial marketing – 1965 as IDS

Integrated Data Store Second marketing – 1974 as IDS/II Currently supported by Bull Computers

Successor to original developer

Page 5: Introduction to Data Management Ed Green Senior Lecturer - IST © Ed Green Penn State University Penn State University All Rights Reserved All Rights Reserved

04/20/23Introduction to Data

management 5

IBM (1)

Parallel work to GE Santa Theresa Labs Similar business problem Recognized business opportunity

Followed GE to market Circa 1966

Initial product – IMS Information Management System

Currently supported by IBM Most widely used database management system – ever!

Page 6: Introduction to Data Management Ed Green Senior Lecturer - IST © Ed Green Penn State University Penn State University All Rights Reserved All Rights Reserved

04/20/23Introduction to Data

management 6

IBM (2) Investigative work at Santa Theresa – Dr. Ted Codd

IMS Sufficient for database storing and updating Insufficient for database retrieval Data redundancy significant inaccuracies and less than optimal

decision making New theoretical approach to database architecture

Elimination of redundancy Establishment of logical relationships (versus physical relationships Two-dimensional tabular structure Critical requirements to define database

Relational database management product Initially marketed as SQL/DS – circa 1972 Emerged as DB2 – early 1980’s Acquired and integrated Informix – late 1990’s

DB2 currently in 9th major release (Version 9) Pre-eminent DBMS currently on market

Page 7: Introduction to Data Management Ed Green Senior Lecturer - IST © Ed Green Penn State University Penn State University All Rights Reserved All Rights Reserved

04/20/23Introduction to Data

management 7

Ingres First independent relational database

management company First query language (QUEL)

Independent; out of sync with ANSI and FIPS standards groups

Eventually adopted standard query language (SQL)

Initial implementation – late 1970’s Remains supported by Computer Associates

Page 8: Introduction to Data Management Ed Green Senior Lecturer - IST © Ed Green Penn State University Penn State University All Rights Reserved All Rights Reserved

04/20/23Introduction to Data

management 8

Oracle

Emerged in early days of relational database management products Slightly lagged Ingres and IBM

Aggressive management and marketing stance Customer-oriented Visionary leadership (Larry Ellison)

11th major release (Oracle 11g) to general availability – 2007 Current version Oracle 11g Version 2

Market leader – installed licenses

Page 9: Introduction to Data Management Ed Green Senior Lecturer - IST © Ed Green Penn State University Penn State University All Rights Reserved All Rights Reserved

04/20/23Introduction to Data

management 9

Sybase

Introduced in mid-1970’s Alternative architecture to DB2, Ingres, and Oracle Intended to support real-time processing needs

Visionary leaders (Bob Epstein and Mark Hoffman) Database integration Distributed and replicated databases

DBMS products in 8th major release (Version 15) 3rd among relational database management

installations

Page 10: Introduction to Data Management Ed Green Senior Lecturer - IST © Ed Green Penn State University Penn State University All Rights Reserved All Rights Reserved

04/20/23Introduction to Data

management 10

Microsoft DBMS products

Access – personal database management product SQL Server – for shared database management

Not an original developer Contracted (from Sybase) SQL Server based on Sybase DBMS

SQL Server Current release – SQL Server 2008 Strategic product in Microsoft business strategy

Windows 2000 and beyond

Page 11: Introduction to Data Management Ed Green Senior Lecturer - IST © Ed Green Penn State University Penn State University All Rights Reserved All Rights Reserved

04/20/23Introduction to Data

management 11

CCA (Computer Corporation of America)

Initially, small research lab at MIT Federal government contract in early 1960’s

Manage intelligence data – satellite, signal, electronic Very large volumes of data Significant data correlation Rapid retrieval

Non-standard queries Productized

In mid-1960’s as Model 104 Later version in late-1970’s as Model 204 Obsolete as relational database management emerged

Page 12: Introduction to Data Management Ed Green Senior Lecturer - IST © Ed Green Penn State University Penn State University All Rights Reserved All Rights Reserved

04/20/23Introduction to Data

management 12

Cullinet (Cullinane Database Systems) First independent database management system software

company Founded mid-1970’s Compete with IBM for DBMS software marketplace

Product – IDMS Integrated Database Management System Based on IDS

Modified to run on IBM OS/360 architecture Leading DBMS for IBM mainframe environment

Failed Inability to recognize emergence of relational database

management Currently supported by Computer Associates in a

“maintenance mode”

Page 13: Introduction to Data Management Ed Green Senior Lecturer - IST © Ed Green Penn State University Penn State University All Rights Reserved All Rights Reserved

04/20/23Introduction to Data

management 13

And, yes, there are others

Access Adabas Progress Empress mySQL

Page 14: Introduction to Data Management Ed Green Senior Lecturer - IST © Ed Green Penn State University Penn State University All Rights Reserved All Rights Reserved

04/20/23Introduction to Data

management 14

Data Structures

Page 15: Introduction to Data Management Ed Green Senior Lecturer - IST © Ed Green Penn State University Penn State University All Rights Reserved All Rights Reserved

04/20/23Introduction to Data

management 15

Basic Data Structures

Array – two-dimensional construct consisting of rows and columns (a/k/a table) Homogeneous – all entries of same type Heterogeneous – entries of different types

List – a collection of entries arranged sequentially Queue – a list arranged in such a way that the first item

placed in the list is the first item to be removed (FIFO algorithm)

Stack – a list arranged in such a way that the last item placed on the list is the first item to be removed (LIFO algorithm)

Tree – a collection of entries arranged in such a way as to have a clearly defined hierarchy

Ring – a collection of entries arranged in such a way as to continuously point to the next entry

Page 16: Introduction to Data Management Ed Green Senior Lecturer - IST © Ed Green Penn State University Penn State University All Rights Reserved All Rights Reserved

04/20/23Introduction to Data

management 16

Data Structures Representation

Array Array

1.1. 2.2. 3.3. 4.4. 5.5.

List List

1.1. 2.2. 3.3. 4.4. 5.5.

1.1. 2.2. 3.3. 4.4. 5.5.

Queue Queue Stack Stack

Hierarchy Hierarchy Ring Ring

Page 17: Introduction to Data Management Ed Green Senior Lecturer - IST © Ed Green Penn State University Penn State University All Rights Reserved All Rights Reserved

04/20/23Introduction to Data

management 17

Storage based on data structures - arrays

yy

(x,y)(x,y)Each cell has an (Each cell has an (x,yx,y) address) address

A one-dimensional array. A one-dimensional array. Each cell has an (Each cell has an (x,1x,1) address) address

A two-dimensional array. A two-dimensional array.

Expansion in the y-direction asExpansion in the y-direction asnew rows are added.new rows are added.

xx

Page 18: Introduction to Data Management Ed Green Senior Lecturer - IST © Ed Green Penn State University Penn State University All Rights Reserved All Rights Reserved

04/20/23Introduction to Data

management 18

Arrays in memory

yy

(x,y)(x,y)

xx

. . .. . .

Row 1Row 1 Row 2Row 2 Row 3Row 3 etcetc

An An arrayarray data structure data structure

exists in memory asexists in memory as

Page 19: Introduction to Data Management Ed Green Senior Lecturer - IST © Ed Green Penn State University Penn State University All Rights Reserved All Rights Reserved

04/20/23Introduction to Data

management 19

Storage based on data structures – queues and stacks Queue – list structure

based on a “first in, first out” algorithm

Stack – list structure based on a “last in, first out algorithm

1.1. 2.2. 3.3. 4.4. 5.5.

New item is added after the last entryNew item is added after the last entry

Item is removed from the topItem is removed from the top

Ag

ed i

tem

s m

ove

to

to

pA

ged

ite

ms

mo

ve t

o t

op

of

list

fo

r se

rvic

eo

f li

st f

or

serv

ice

1.1. 2.2. 3.3. 4.4. 5.5.

Ag

ed i

tem

s m

ove

to

A

ged

ite

ms

mo

ve t

o

bo

tto

m o

f li

st f

or

serv

ice

bo

tto

m o

f li

st f

or

serv

ice

New item is added above

New item is added above

the first entry

the first entry

Item is removed from the topItem is removed from the top

Ag

ed i

tem

s m

ove

to

to

pA

ged

ite

ms

mo

ve t

o t

op

of

list

fo

r se

rvic

eo

f li

st f

or

serv

ice

Page 20: Introduction to Data Management Ed Green Senior Lecturer - IST © Ed Green Penn State University Penn State University All Rights Reserved All Rights Reserved

04/20/23Introduction to Data

management 20

Linked lists as a data structure concept Generalized name applicable to

Hierarchies Rings

Utilizes a pointer to establish the order of the list

Pointer – address Identifies next in sequence

Page 21: Introduction to Data Management Ed Green Senior Lecturer - IST © Ed Green Penn State University Penn State University All Rights Reserved All Rights Reserved

04/20/23Introduction to Data

management 21

Storage based on data structures - hierarchies

Hierarchy – unidirectional data structure based on higher-order precedence

Characterized by the concept of persistence Attributes of higher level

elements are retained by lower level dependents

Explicit dependency

RootRoot

Lea

ves

Lea

ves

datadataaddraddrsubordinate subordinate addressesaddresses

Page 22: Introduction to Data Management Ed Green Senior Lecturer - IST © Ed Green Penn State University Penn State University All Rights Reserved All Rights Reserved

04/20/23Introduction to Data

management 22

Storage based on data structures - rings Ring – elliptical data

structure representation May or may not include

concepts of precedence and persistence

May be either unidirectional or bidirectional

datadataaddraddr nextnextaddraddr

priorprioraddraddr

headheadaddraddr

Head Head (if headed)(if headed)

always presentalways present

pre

sent

onl

y

pres

ent o

nly

if b

idire

ctio

nal

if b

idire

ctio

nal

pre

sent

onl

y

pres

ent o

nly

if h

eade

d

if h

eade

d

Page 23: Introduction to Data Management Ed Green Senior Lecturer - IST © Ed Green Penn State University Penn State University All Rights Reserved All Rights Reserved

04/20/23Introduction to Data

management 23

Storage with linked lists

CASE 1: CASE 1: Insert a newInsert a newrecord betweenrecord betweenthe head and the head and first recordfirst record

CASE 2: CASE 2: Insert a newInsert a newrecord betweenrecord betweenthe head and the head and first record with first record with back pointersback pointers

Page 24: Introduction to Data Management Ed Green Senior Lecturer - IST © Ed Green Penn State University Penn State University All Rights Reserved All Rights Reserved

04/20/23Introduction to Data

management 24

Storage with linked lists

datadataaddraddrsubordinate subordinate addressesaddresses

CASE 1: CASE 1: Insert a newInsert a newrecord betweenrecord betweenthe root and the root and first recordfirst record

Page 25: Introduction to Data Management Ed Green Senior Lecturer - IST © Ed Green Penn State University Penn State University All Rights Reserved All Rights Reserved

04/20/23Introduction to Data

management 25

Custom data types

Specific items of data Support problem solution User-defined Represents a processing variable

Programmatic structures aligned with data structure types

Page 26: Introduction to Data Management Ed Green Senior Lecturer - IST © Ed Green Penn State University Penn State University All Rights Reserved All Rights Reserved

04/20/23Introduction to Data

management 26

File Structures

Page 27: Introduction to Data Management Ed Green Senior Lecturer - IST © Ed Green Penn State University Penn State University All Rights Reserved All Rights Reserved

04/20/23Introduction to Data

management 27

Files and File Structures

Sequential ISAM – Indexed Sequential Access

Method IDAM – Indexed Direct Access Method HISAM – Hierarchical Indexed Sequential

Access Method HIDAM – Hierarchical Indexed Direct

Access Method

Page 28: Introduction to Data Management Ed Green Senior Lecturer - IST © Ed Green Penn State University Penn State University All Rights Reserved All Rights Reserved

04/20/23Introduction to Data

management 28

File

Files and Databases

Database – provides the “template” thatdescribes the organization within the file

Databases exist within files

Page 29: Introduction to Data Management Ed Green Senior Lecturer - IST © Ed Green Penn State University Penn State University All Rights Reserved All Rights Reserved

04/20/23Introduction to Data

management 29

Definitions Data – a set of facts

Structured or not Ordered or not Organized or not Valuable or not

Information - organized presentation of facts that support analysis and/or decision making

Database – structured collection of (enterprise) data used for analysis and/or decision making that is required to be maintained

Database Management System – the (executive) software that manages the interface between a user’s application and the database

File Management System – that component of the operating system that manages the programmatic interface to a computer’s file system

Page 30: Introduction to Data Management Ed Green Senior Lecturer - IST © Ed Green Penn State University Penn State University All Rights Reserved All Rights Reserved

04/20/23Introduction to Data

management 30

Components and Architectures

Page 31: Introduction to Data Management Ed Green Senior Lecturer - IST © Ed Green Penn State University Penn State University All Rights Reserved All Rights Reserved

04/20/23Introduction to Data

management 31

Database System Components

DatabaseApplication

DatabaseManagement

System Database

Page 32: Introduction to Data Management Ed Green Senior Lecturer - IST © Ed Green Penn State University Penn State University All Rights Reserved All Rights Reserved

04/20/23Introduction to Data

management 32

Database System Functions

DatabaseApplication

DatabaseManagement

System Database

SQL

DDLDMLDCL

•Create/process forms•Create/transmit queries•Create/process reports•Execute application logic•Control application

•Create database•Create tables•Create supporting structures•Read data (stored in database)•Update data (stored in database)•Maintain database structure•Enforce rules•Control concurrency•Provide security•Perform backup and recovery

Page 33: Introduction to Data Management Ed Green Senior Lecturer - IST © Ed Green Penn State University Penn State University All Rights Reserved All Rights Reserved

04/20/23Introduction to Data

management 33

ApplicationMetadata

IndexesOverhead data

Database Components

Metadata

User data

Page 34: Introduction to Data Management Ed Green Senior Lecturer - IST © Ed Green Penn State University Penn State University All Rights Reserved All Rights Reserved

04/20/23Introduction to Data

management 34

Data Dictionary

Metadata – or, data about the data Comprehensive information source about enterprise

data Documents data characteristics and employment

Data type Data size Allowable values Where used Authorizations Relationships

Ancillary structure May or may not be DBMS provided

Page 35: Introduction to Data Management Ed Green Senior Lecturer - IST © Ed Green Penn State University Penn State University All Rights Reserved All Rights Reserved

04/20/23Introduction to Data

management 35

Database Models

Network Hierarchical Inverted List Relational Object Object Relational

Page 36: Introduction to Data Management Ed Green Senior Lecturer - IST © Ed Green Penn State University Penn State University All Rights Reserved All Rights Reserved

04/20/23Introduction to Data

management 36

Network Model Characterized by establish a network of data components

Master Detail

Physical addresses and pointers Next Prior Master

Linkages “Forward” “Backward”

Retrieval Random based on “primary” key Direct based on a known value

Page 37: Introduction to Data Management Ed Green Senior Lecturer - IST © Ed Green Penn State University Penn State University All Rights Reserved All Rights Reserved

04/20/23Introduction to Data

management 37

Network Model – basic structure

Current Address Data

Next Address

Prior Address

Master Address

Page 38: Introduction to Data Management Ed Green Senior Lecturer - IST © Ed Green Penn State University Penn State University All Rights Reserved All Rights Reserved

04/20/23Introduction to Data

management 38

Hierarchical Model

Characterized by establish a network of data components Root Leaf

Physical addresses and pointers Linkages

Downward from root Retrieval

Sequential search from root Random based on “primary” key value Direct based on a known value

Page 39: Introduction to Data Management Ed Green Senior Lecturer - IST © Ed Green Penn State University Penn State University All Rights Reserved All Rights Reserved

04/20/23Introduction to Data

management 39

Hierarchical Model

Current Address Data

Next Address(es)

Root

Leaves

Page 40: Introduction to Data Management Ed Green Senior Lecturer - IST © Ed Green Penn State University Penn State University All Rights Reserved All Rights Reserved

04/20/23Introduction to Data

management 40

Inverted List Model

RuntimeRuntimeDataData

DictionaryDictionary

DataData• sets of filessets of files containing containing records records

FineFineIndex Index

CoarseCoarseIndex Index

•Describes data itemsDescribes data items•Identifies locationIdentifies location - Implied relationships - Implied relationships

•Identifies records with Identifies records with specified key values specified key values•Few records with a givenFew records with a given value value

•Identifies records with Identifies records with specified key values specified key values•Many records with a givenMany records with a given value value - Bit map - Bit map

Page 41: Introduction to Data Management Ed Green Senior Lecturer - IST © Ed Green Penn State University Penn State University All Rights Reserved All Rights Reserved

04/20/23Introduction to Data

management 41

Relational Model (1)

Tabular format Columns Rows

Cell Intersection of a row and

column Relationships based on

data values Points to specific row(s)

Page 42: Introduction to Data Management Ed Green Senior Lecturer - IST © Ed Green Penn State University Penn State University All Rights Reserved All Rights Reserved

04/20/23Introduction to Data

management 42

Relational Model (2)

Table 1

Table 2

Index Index

Relationship

Identifies row with specific key value(s) Identifies row with

specific key value(s)

Page 43: Introduction to Data Management Ed Green Senior Lecturer - IST © Ed Green Penn State University Penn State University All Rights Reserved All Rights Reserved

04/20/23Introduction to Data

management 43

Relational Model

Presentation of data

Preparation and processing of data

Enforcement of business and integrity rules

Management of access to data

Databases/DBAPrograms

Page 44: Introduction to Data Management Ed Green Senior Lecturer - IST © Ed Green Penn State University Penn State University All Rights Reserved All Rights Reserved

04/20/23Introduction to Data

management 44

Object Model

Presentation of data

Preparation and processing of data

Enforcement of business and integrity rules

Management of access to data

Databases/DBAPrograms

Page 45: Introduction to Data Management Ed Green Senior Lecturer - IST © Ed Green Penn State University Penn State University All Rights Reserved All Rights Reserved

04/20/23Introduction to Data

management 45

Relational versus Object

Purchase Order

Employee

MaterialsContract

In a relational database, the datais stored in several structures andrelated to the business object viasoftware

Purchase Order

In an object database, the data isstored as the business object itself,encapsulated from the software so thataccess is via an interface language

Page 46: Introduction to Data Management Ed Green Senior Lecturer - IST © Ed Green Penn State University Penn State University All Rights Reserved All Rights Reserved

04/20/23Introduction to Data

management 46

Object-Relational

Hybrid Relational model Object model

Evolved from competing camps Technology state-of-the-practice Marketplace realities

Page 47: Introduction to Data Management Ed Green Senior Lecturer - IST © Ed Green Penn State University Penn State University All Rights Reserved All Rights Reserved

04/20/23Introduction to Data

management 47

Why database?

Storage management and efficiency – cost containment

Retrieval speed Information accuracy Redundancy reduction

Page 48: Introduction to Data Management Ed Green Senior Lecturer - IST © Ed Green Penn State University Penn State University All Rights Reserved All Rights Reserved

04/20/23Introduction to Data

management 48

Database Processing Overview

Page 49: Introduction to Data Management Ed Green Senior Lecturer - IST © Ed Green Penn State University Penn State University All Rights Reserved All Rights Reserved

04/20/23Introduction to Data

management 49

DBMS Interfacing

Data

base S

ch

em

a

I/O

Database

Physical I/O performed

by OS

Schema provides view of

enterprise data Application Program Work Area

Buffer Pool DBMS

Work Area

DBMS Executive

Operating System

Page 50: Introduction to Data Management Ed Green Senior Lecturer - IST © Ed Green Penn State University Penn State University All Rights Reserved All Rights Reserved

04/20/23Introduction to Data

management 50

Retrieving a Logical Record from Disk Storage

Application requires data

stored on database

DBMS is re-engagedby OS

Application issues a data retrieval request

to DBMS

DBMS determines data exists; identifies

location

DBMS issues I/O interrupt to OS

OS determines physicallocation of data

DB buffer within DBMSis assigned

OS issues “get I/O”instruction

Data retrieved fromdisk & transferred to

DB buffer via OS

Application programnotified that data is

available

Retrieval

Page 51: Introduction to Data Management Ed Green Senior Lecturer - IST © Ed Green Penn State University Penn State University All Rights Reserved All Rights Reserved

04/20/23Introduction to Data

management 51

Storing a New Logical Record

Application needs to store data

database

Application issues data storage request to

D BMS

Application issues a data retrieval request

to DBMS

DBMS determines data does not exist

DBMS creates “template” of logical

record

DBMS returns controlto application program

DB buffer within DBMSis assigned

Application programbuilds new logical

record

Application indicatesnew logical record

can be written

DBMS determines appropriate storage

location

Data is transferredfrom buffer; physically

written on disk

DBMS issues I/O interrupt to OS

OS determines physicallocation of data

OS issues “put I/O”instruction

DBMS is re-engagedby OS

Application programnotified that data has

been stored

Page 52: Introduction to Data Management Ed Green Senior Lecturer - IST © Ed Green Penn State University Penn State University All Rights Reserved All Rights Reserved

04/20/23Introduction to Data

management 52

Changing Data in an Existing Logical Record

Changes are madeusing image in bufferRetrieval

Application determines thatdata in a record alreadyexisting on the database isto have some of its data changed . . . Executes the retrieval process

Application determinesreadiness to store this

data

Application issues data storage request to

D BMS

DBMS re-validates appropriate storage

location

Data is transferredfrom buffer; physically

written on disk

DBMS issues I/O interrupt to OS

OS determines physicallocation of data

OS issues “put I/O”instruction

DBMS is re-engagedby OS

Application programnotified that data has

been stored