introduction to data management ed green senior lecturer - ist © ed green penn state university...
TRANSCRIPT
Introduction to Data Introduction to Data ManagementManagement
Ed Green
Senior Lecturer - IST
© Ed Green© Ed Green Penn State UniversityPenn State University All Rights ReservedAll Rights Reserved
04/20/23Introduction to Data
management 2
Topics
History Data structures File structures Definitions Database types Why database? Processing overview
04/20/23Introduction to Data
management 3
History
Earliest computing No files All data on cards
First files Introduction of magnetic tape storage Sequential
Disk files Sequential Random access
Databases Form of random access disk files Invented to solve a specific business problem Many types and many companies
04/20/23Introduction to Data
management 4
General Electric Very first database management system
1963 Switchgear Division
Philadelphia 3-person team
Charlie Bachman John Lyon Ted Seiter
Critical business success Identified by Computer Products Department as potential commercial
opportunity Initial marketing – 1965 as IDS
Integrated Data Store Second marketing – 1974 as IDS/II Currently supported by Bull Computers
Successor to original developer
04/20/23Introduction to Data
management 5
IBM (1)
Parallel work to GE Santa Theresa Labs Similar business problem Recognized business opportunity
Followed GE to market Circa 1966
Initial product – IMS Information Management System
Currently supported by IBM Most widely used database management system – ever!
04/20/23Introduction to Data
management 6
IBM (2) Investigative work at Santa Theresa – Dr. Ted Codd
IMS Sufficient for database storing and updating Insufficient for database retrieval Data redundancy significant inaccuracies and less than optimal
decision making New theoretical approach to database architecture
Elimination of redundancy Establishment of logical relationships (versus physical relationships Two-dimensional tabular structure Critical requirements to define database
Relational database management product Initially marketed as SQL/DS – circa 1972 Emerged as DB2 – early 1980’s Acquired and integrated Informix – late 1990’s
DB2 currently in 9th major release (Version 9) Pre-eminent DBMS currently on market
04/20/23Introduction to Data
management 7
Ingres First independent relational database
management company First query language (QUEL)
Independent; out of sync with ANSI and FIPS standards groups
Eventually adopted standard query language (SQL)
Initial implementation – late 1970’s Remains supported by Computer Associates
04/20/23Introduction to Data
management 8
Oracle
Emerged in early days of relational database management products Slightly lagged Ingres and IBM
Aggressive management and marketing stance Customer-oriented Visionary leadership (Larry Ellison)
11th major release (Oracle 11g) to general availability – 2007 Current version Oracle 11g Version 2
Market leader – installed licenses
04/20/23Introduction to Data
management 9
Sybase
Introduced in mid-1970’s Alternative architecture to DB2, Ingres, and Oracle Intended to support real-time processing needs
Visionary leaders (Bob Epstein and Mark Hoffman) Database integration Distributed and replicated databases
DBMS products in 8th major release (Version 15) 3rd among relational database management
installations
04/20/23Introduction to Data
management 10
Microsoft DBMS products
Access – personal database management product SQL Server – for shared database management
Not an original developer Contracted (from Sybase) SQL Server based on Sybase DBMS
SQL Server Current release – SQL Server 2008 Strategic product in Microsoft business strategy
Windows 2000 and beyond
04/20/23Introduction to Data
management 11
CCA (Computer Corporation of America)
Initially, small research lab at MIT Federal government contract in early 1960’s
Manage intelligence data – satellite, signal, electronic Very large volumes of data Significant data correlation Rapid retrieval
Non-standard queries Productized
In mid-1960’s as Model 104 Later version in late-1970’s as Model 204 Obsolete as relational database management emerged
04/20/23Introduction to Data
management 12
Cullinet (Cullinane Database Systems) First independent database management system software
company Founded mid-1970’s Compete with IBM for DBMS software marketplace
Product – IDMS Integrated Database Management System Based on IDS
Modified to run on IBM OS/360 architecture Leading DBMS for IBM mainframe environment
Failed Inability to recognize emergence of relational database
management Currently supported by Computer Associates in a
“maintenance mode”
04/20/23Introduction to Data
management 13
And, yes, there are others
Access Adabas Progress Empress mySQL
04/20/23Introduction to Data
management 14
Data Structures
04/20/23Introduction to Data
management 15
Basic Data Structures
Array – two-dimensional construct consisting of rows and columns (a/k/a table) Homogeneous – all entries of same type Heterogeneous – entries of different types
List – a collection of entries arranged sequentially Queue – a list arranged in such a way that the first item
placed in the list is the first item to be removed (FIFO algorithm)
Stack – a list arranged in such a way that the last item placed on the list is the first item to be removed (LIFO algorithm)
Tree – a collection of entries arranged in such a way as to have a clearly defined hierarchy
Ring – a collection of entries arranged in such a way as to continuously point to the next entry
04/20/23Introduction to Data
management 16
Data Structures Representation
Array Array
1.1. 2.2. 3.3. 4.4. 5.5.
List List
1.1. 2.2. 3.3. 4.4. 5.5.
1.1. 2.2. 3.3. 4.4. 5.5.
Queue Queue Stack Stack
Hierarchy Hierarchy Ring Ring
04/20/23Introduction to Data
management 17
Storage based on data structures - arrays
yy
(x,y)(x,y)Each cell has an (Each cell has an (x,yx,y) address) address
A one-dimensional array. A one-dimensional array. Each cell has an (Each cell has an (x,1x,1) address) address
A two-dimensional array. A two-dimensional array.
Expansion in the y-direction asExpansion in the y-direction asnew rows are added.new rows are added.
xx
04/20/23Introduction to Data
management 18
Arrays in memory
yy
(x,y)(x,y)
xx
. . .. . .
Row 1Row 1 Row 2Row 2 Row 3Row 3 etcetc
An An arrayarray data structure data structure
exists in memory asexists in memory as
04/20/23Introduction to Data
management 19
Storage based on data structures – queues and stacks Queue – list structure
based on a “first in, first out” algorithm
Stack – list structure based on a “last in, first out algorithm
1.1. 2.2. 3.3. 4.4. 5.5.
New item is added after the last entryNew item is added after the last entry
Item is removed from the topItem is removed from the top
Ag
ed i
tem
s m
ove
to
to
pA
ged
ite
ms
mo
ve t
o t
op
of
list
fo
r se
rvic
eo
f li
st f
or
serv
ice
1.1. 2.2. 3.3. 4.4. 5.5.
Ag
ed i
tem
s m
ove
to
A
ged
ite
ms
mo
ve t
o
bo
tto
m o
f li
st f
or
serv
ice
bo
tto
m o
f li
st f
or
serv
ice
New item is added above
New item is added above
the first entry
the first entry
Item is removed from the topItem is removed from the top
Ag
ed i
tem
s m
ove
to
to
pA
ged
ite
ms
mo
ve t
o t
op
of
list
fo
r se
rvic
eo
f li
st f
or
serv
ice
04/20/23Introduction to Data
management 20
Linked lists as a data structure concept Generalized name applicable to
Hierarchies Rings
Utilizes a pointer to establish the order of the list
Pointer – address Identifies next in sequence
04/20/23Introduction to Data
management 21
Storage based on data structures - hierarchies
Hierarchy – unidirectional data structure based on higher-order precedence
Characterized by the concept of persistence Attributes of higher level
elements are retained by lower level dependents
Explicit dependency
RootRoot
Lea
ves
Lea
ves
datadataaddraddrsubordinate subordinate addressesaddresses
04/20/23Introduction to Data
management 22
Storage based on data structures - rings Ring – elliptical data
structure representation May or may not include
concepts of precedence and persistence
May be either unidirectional or bidirectional
datadataaddraddr nextnextaddraddr
priorprioraddraddr
headheadaddraddr
Head Head (if headed)(if headed)
always presentalways present
pre
sent
onl
y
pres
ent o
nly
if b
idire
ctio
nal
if b
idire
ctio
nal
pre
sent
onl
y
pres
ent o
nly
if h
eade
d
if h
eade
d
04/20/23Introduction to Data
management 23
Storage with linked lists
CASE 1: CASE 1: Insert a newInsert a newrecord betweenrecord betweenthe head and the head and first recordfirst record
CASE 2: CASE 2: Insert a newInsert a newrecord betweenrecord betweenthe head and the head and first record with first record with back pointersback pointers
04/20/23Introduction to Data
management 24
Storage with linked lists
datadataaddraddrsubordinate subordinate addressesaddresses
CASE 1: CASE 1: Insert a newInsert a newrecord betweenrecord betweenthe root and the root and first recordfirst record
04/20/23Introduction to Data
management 25
Custom data types
Specific items of data Support problem solution User-defined Represents a processing variable
Programmatic structures aligned with data structure types
04/20/23Introduction to Data
management 26
File Structures
04/20/23Introduction to Data
management 27
Files and File Structures
Sequential ISAM – Indexed Sequential Access
Method IDAM – Indexed Direct Access Method HISAM – Hierarchical Indexed Sequential
Access Method HIDAM – Hierarchical Indexed Direct
Access Method
04/20/23Introduction to Data
management 28
File
Files and Databases
Database – provides the “template” thatdescribes the organization within the file
Databases exist within files
04/20/23Introduction to Data
management 29
Definitions Data – a set of facts
Structured or not Ordered or not Organized or not Valuable or not
Information - organized presentation of facts that support analysis and/or decision making
Database – structured collection of (enterprise) data used for analysis and/or decision making that is required to be maintained
Database Management System – the (executive) software that manages the interface between a user’s application and the database
File Management System – that component of the operating system that manages the programmatic interface to a computer’s file system
04/20/23Introduction to Data
management 30
Components and Architectures
04/20/23Introduction to Data
management 31
Database System Components
DatabaseApplication
DatabaseManagement
System Database
04/20/23Introduction to Data
management 32
Database System Functions
DatabaseApplication
DatabaseManagement
System Database
SQL
DDLDMLDCL
•Create/process forms•Create/transmit queries•Create/process reports•Execute application logic•Control application
•Create database•Create tables•Create supporting structures•Read data (stored in database)•Update data (stored in database)•Maintain database structure•Enforce rules•Control concurrency•Provide security•Perform backup and recovery
04/20/23Introduction to Data
management 33
ApplicationMetadata
IndexesOverhead data
Database Components
Metadata
User data
04/20/23Introduction to Data
management 34
Data Dictionary
Metadata – or, data about the data Comprehensive information source about enterprise
data Documents data characteristics and employment
Data type Data size Allowable values Where used Authorizations Relationships
Ancillary structure May or may not be DBMS provided
04/20/23Introduction to Data
management 35
Database Models
Network Hierarchical Inverted List Relational Object Object Relational
04/20/23Introduction to Data
management 36
Network Model Characterized by establish a network of data components
Master Detail
Physical addresses and pointers Next Prior Master
Linkages “Forward” “Backward”
Retrieval Random based on “primary” key Direct based on a known value
04/20/23Introduction to Data
management 37
Network Model – basic structure
Current Address Data
Next Address
Prior Address
Master Address
04/20/23Introduction to Data
management 38
Hierarchical Model
Characterized by establish a network of data components Root Leaf
Physical addresses and pointers Linkages
Downward from root Retrieval
Sequential search from root Random based on “primary” key value Direct based on a known value
04/20/23Introduction to Data
management 39
Hierarchical Model
Current Address Data
Next Address(es)
Root
Leaves
04/20/23Introduction to Data
management 40
Inverted List Model
RuntimeRuntimeDataData
DictionaryDictionary
DataData• sets of filessets of files containing containing records records
FineFineIndex Index
CoarseCoarseIndex Index
•Describes data itemsDescribes data items•Identifies locationIdentifies location - Implied relationships - Implied relationships
•Identifies records with Identifies records with specified key values specified key values•Few records with a givenFew records with a given value value
•Identifies records with Identifies records with specified key values specified key values•Many records with a givenMany records with a given value value - Bit map - Bit map
04/20/23Introduction to Data
management 41
Relational Model (1)
Tabular format Columns Rows
Cell Intersection of a row and
column Relationships based on
data values Points to specific row(s)
04/20/23Introduction to Data
management 42
Relational Model (2)
Table 1
Table 2
Index Index
Relationship
Identifies row with specific key value(s) Identifies row with
specific key value(s)
04/20/23Introduction to Data
management 43
Relational Model
Presentation of data
Preparation and processing of data
Enforcement of business and integrity rules
Management of access to data
Databases/DBAPrograms
04/20/23Introduction to Data
management 44
Object Model
Presentation of data
Preparation and processing of data
Enforcement of business and integrity rules
Management of access to data
Databases/DBAPrograms
04/20/23Introduction to Data
management 45
Relational versus Object
Purchase Order
Employee
MaterialsContract
In a relational database, the datais stored in several structures andrelated to the business object viasoftware
Purchase Order
In an object database, the data isstored as the business object itself,encapsulated from the software so thataccess is via an interface language
04/20/23Introduction to Data
management 46
Object-Relational
Hybrid Relational model Object model
Evolved from competing camps Technology state-of-the-practice Marketplace realities
04/20/23Introduction to Data
management 47
Why database?
Storage management and efficiency – cost containment
Retrieval speed Information accuracy Redundancy reduction
04/20/23Introduction to Data
management 48
Database Processing Overview
04/20/23Introduction to Data
management 49
DBMS Interfacing
Data
base S
ch
em
a
I/O
Database
Physical I/O performed
by OS
Schema provides view of
enterprise data Application Program Work Area
Buffer Pool DBMS
Work Area
DBMS Executive
Operating System
04/20/23Introduction to Data
management 50
Retrieving a Logical Record from Disk Storage
Application requires data
stored on database
DBMS is re-engagedby OS
Application issues a data retrieval request
to DBMS
DBMS determines data exists; identifies
location
DBMS issues I/O interrupt to OS
OS determines physicallocation of data
DB buffer within DBMSis assigned
OS issues “get I/O”instruction
Data retrieved fromdisk & transferred to
DB buffer via OS
Application programnotified that data is
available
Retrieval
04/20/23Introduction to Data
management 51
Storing a New Logical Record
Application needs to store data
database
Application issues data storage request to
D BMS
Application issues a data retrieval request
to DBMS
DBMS determines data does not exist
DBMS creates “template” of logical
record
DBMS returns controlto application program
DB buffer within DBMSis assigned
Application programbuilds new logical
record
Application indicatesnew logical record
can be written
DBMS determines appropriate storage
location
Data is transferredfrom buffer; physically
written on disk
DBMS issues I/O interrupt to OS
OS determines physicallocation of data
OS issues “put I/O”instruction
DBMS is re-engagedby OS
Application programnotified that data has
been stored
04/20/23Introduction to Data
management 52
Changing Data in an Existing Logical Record
Changes are madeusing image in bufferRetrieval
Application determines thatdata in a record alreadyexisting on the database isto have some of its data changed . . . Executes the retrieval process
Application determinesreadiness to store this
data
Application issues data storage request to
D BMS
DBMS re-validates appropriate storage
location
Data is transferredfrom buffer; physically
written on disk
DBMS issues I/O interrupt to OS
OS determines physicallocation of data
OS issues “put I/O”instruction
DBMS is re-engagedby OS
Application programnotified that data has
been stored