Post on 29-Dec-2015
Chapter 9
Database Management Systems
Objectives for Chapter 9
Problems in the flat-file approach to data management
Why these gave rise to the database concept
Relationships among the elements of the database environment
Characteristics of the relational database model
Three stages in database design: conceptual design, logical design, and physical design
Anomalies caused by unnormalized databases and the need for data normalization
Features of distributed databases
Considerations in deciding on a particular database configuration
Flat-File Versus Database Environments
Computer processing involves two components: data and instructions (programs).
Conceptually, there are two methods for designing the interface between program instructions and data:
File-oriented processing - a specific data file is created for each application.
Data-oriented processing - a single data repository is created to support numerous applications.
Disadvantages of file-oriented processing include redundant data and programs, and varying formats for storing the redundant data.
The format of similar fields may vary because programmers used inconsistent field formats.
Flat-File Environment
[Figure: Users 1, 2, and 3 each send transactions to their own program (Programs 1, 2, and 3). Each program has its own data file (A,B,C; X,B,Y; L,B,M), so data element B is stored three times.]
Data Redundancy & Flat-File Problems
Data Storage - redundant data creates excessive storage costs for paper documents and/or magnetic media
Data Updating - any changes or additions must be performed multiple times
Currency of Information - the risk of failing to update all affected files, leaving some with obsolete values
Task-Data Dependency - the user’s inability to obtain additional information as his or her needs change
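The flat-file problems above can be illustrated with a minimal Python sketch. It assumes three hypothetical program-owned files (plain dicts) that each store the same customer address, standing in for the shared data element B; the field names and values are illustrative only.

```python
# Hypothetical flat files: each program owns a private copy of the data it needs.
# The shared element "B" is modeled here as a customer address.
billing_file = {"cust_id": 101, "address": "12 Elm St", "balance": 250.0}
shipping_file = {"cust_id": 101, "address": "12 Elm St", "carrier": "UPS"}
marketing_file = {"cust_id": 101, "address": "12 Elm St", "segment": "retail"}
all_files = [billing_file, shipping_file, marketing_file]

# Data storage: the same fact is stored once per file.
copies = sum(1 for f in all_files if "address" in f)
print(copies)  # → 3

# Data updating / currency: the customer moves, but only billing records it.
billing_file["address"] = "98 Oak Ave"
stale = [f for f in all_files if f["address"] != "98 Oak Ave"]
print(len(stale))  # → 2 files still hold the obsolete address
```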
Database Approach
[Figure: Users 1, 2, and 3 send transactions to Programs 1, 2, and 3, which all access a single database (A,B,C,X,Y,L,M) through the DBMS.]
Advantages of the Database Approach
Data sharing through a centralized database resolves the flat-file problems:
No data redundancy - Data is stored only once, eliminating data redundancy and reducing storage costs.
Single update - Because data is in only one place, it requires only a single update procedure, reducing the time and cost of keeping the database current.
Current values - A change to the database made by any user yields current data values for all other users.
Task-data independence - As users’ information needs expand beyond their immediate domain, the new needs can be more easily satisfied than under the flat-file approach.
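A minimal sketch of the single-update property, assuming Python's built-in sqlite3 module as a stand-in DBMS and a hypothetical customer table: the address is stored once, updated once, and every application reads the same current value.

```python
import sqlite3

# The shared repository: customer 101's address is stored exactly once.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, address TEXT)")
conn.execute("INSERT INTO customer VALUES (101, '12 Elm St')")

# A single update procedure, run once by whichever user records the change.
conn.execute("UPDATE customer SET address = ? WHERE cust_id = ?", ("98 Oak Ave", 101))
conn.commit()

# Every application reads the same row, so all of them see the current value.
seen = {}
for program in ("billing", "shipping", "marketing"):
    row = conn.execute("SELECT address FROM customer WHERE cust_id = ?", (101,)).fetchone()
    seen[program] = row[0]
print(seen)  # → {'billing': '98 Oak Ave', 'shipping': '98 Oak Ave', 'marketing': '98 Oak Ave'}
```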
Disadvantages of the Database Approach
Can be costly to implement - additional hardware, software, storage, and network resources are required.
Can only run in certain operating environments - this may make it unsuitable for some system configurations.
Requires training - because it is so different from the file-oriented approach, the database approach requires training users, and there may be inertia or resistance.
Elements of the Database Approach
[Figure: Elements of the database approach: users and their transactions, user programs (applications), user queries, the DBMS (data definition language, data manipulation language, query language), the host operating system, the physical database, the database administrator, the system development process, and system requests.]
DBMS Features
User Programs - make the presence of the DBMS transparent to the user
Direct Query - allows authorized users to access data without programming
Application Development - lets users create their own applications
Backup and Recovery - periodically copies the database so it can be restored after a failure
Database Usage Reporting - captures statistics on database usage (who, when, etc.)
Database Access - authorizes access to sections of the database
Internal Controls and DBMS
The purpose of the DBMS is to provide controlled access to the database.
The DBMS is a special software system programmed to know which data elements each user is authorized to access and to deny unauthorized requests for data.
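The authorization check described above might look like the following sketch. The AUTHORIZATION table, the user names, and the request_data function are hypothetical; a real DBMS keeps such rules in its own catalog and enforces them internally (in SQL, typically through GRANT and REVOKE).

```python
# Hypothetical authorization table: the data elements each user may access.
AUTHORIZATION = {
    "clerk": {"cust_id", "name", "address"},
    "credit_manager": {"cust_id", "name", "balance", "credit_limit"},
}

CUSTOMER = {"cust_id": 101, "name": "Acme Co.", "address": "12 Elm St",
            "balance": 250.0, "credit_limit": 1000.0}

def request_data(user, fields):
    """Return the requested fields only if the user is authorized for every one."""
    allowed = AUTHORIZATION.get(user, set())  # unknown users get no access
    denied = [f for f in fields if f not in allowed]
    if denied:
        raise PermissionError(f"{user} is not authorized to access {denied}")
    return {f: CUSTOMER[f] for f in fields}

print(request_data("credit_manager", ["name", "balance"]))  # → {'name': 'Acme Co.', 'balance': 250.0}
try:
    request_data("clerk", ["balance"])
    outcome = "granted"
except PermissionError as err:
    outcome = f"denied: {err}"
print(outcome)  # → denied: clerk is not authorized to access ['balance']
```

Denying the whole request when any field is unauthorized keeps the rule simple; a real system might instead return only the permitted fields.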
Data Definition Language (DDL)
DDL is a programming language used to define the database to the DBMS.
The DDL identifies the names and the relationship of all data elements, records, and files that constitute the database.
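As a hedged illustration of DDL, the sketch below defines two hypothetical tables with SQL CREATE TABLE statements, run through Python's sqlite3 module, and then reads the definitions back from SQLite's catalog (sqlite_master). Table and column names are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: name the files (tables), the data elements (columns), and the
# relationship between them (cust_id in sales_order refers to customer).
conn.executescript("""
CREATE TABLE customer (
    cust_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL,
    address TEXT
);
CREATE TABLE sales_order (
    order_no   INTEGER PRIMARY KEY,
    cust_id    INTEGER NOT NULL REFERENCES customer(cust_id),
    order_date TEXT
);
""")

# The DBMS records these definitions in its catalog.
tables = [name for (name,) in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
columns = [row[1] for row in conn.execute("PRAGMA table_info(customer)")]
print(tables)   # → ['customer', 'sales_order']
print(columns)  # → ['cust_id', 'name', 'address']
```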
Viewing levels:
Internal view - the physical arrangement of records (one)
Conceptual view - the representation of the database (one)
User view - the portion of the database each user views (many)
[Figure: ANSI Model. External user views, conceptual model, internal model, and physical database, shown as successive levels.]
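In SQL systems a user view is commonly implemented as a view defined over the conceptual schema. The sketch below assumes SQLite via Python's sqlite3 module and a hypothetical customer table; it gives a shipping clerk a view that omits balance and credit data, while the internal view (physical storage) is left to the DBMS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, name TEXT, "
             "address TEXT, balance REAL, credit_limit REAL)")
conn.execute("INSERT INTO customer VALUES (101, 'Acme Co.', '12 Elm St', 250.0, 1000.0)")

# User view for a hypothetical shipping clerk: only the columns the task needs.
conn.execute("CREATE VIEW shipping_view AS SELECT cust_id, name, address FROM customer")

cursor = conn.execute("SELECT * FROM shipping_view")
columns = [d[0] for d in cursor.description]  # column names visible to this user
rows = cursor.fetchall()
print(columns)  # → ['cust_id', 'name', 'address']
print(rows)     # → [(101, 'Acme Co.', '12 Elm St')]
```

Many such views can be defined over the one conceptual model, matching the "(many)" user views in the list above.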
Data Manipulation Language (DML)
DML is the proprietary programming language that a particular DBMS uses to retrieve, process, and store data.
Entire user programs may be written in the DML, or selected DML commands can be inserted into programs written in general-purpose languages, such as COBOL and FORTRAN.
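The sketch below shows DML commands embedded in a host-language program. It assumes SQL as the DML (as in SQLite) and Python as the host language in place of COBOL or FORTRAN; the inventory table and its values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE inventory (item_no INTEGER PRIMARY KEY, "
             "description TEXT, qty_on_hand INTEGER)")

# Store: DML INSERT statements issued from the host program.
conn.executemany("INSERT INTO inventory VALUES (?, ?, ?)",
                 [(1, "Bolt", 500), (2, "Nut", 800)])
# Process: a DML UPDATE records a shipment of 120 bolts.
conn.execute("UPDATE inventory SET qty_on_hand = qty_on_hand - ? WHERE item_no = ?", (120, 1))
conn.commit()
# Retrieve: a DML SELECT returns the result to the host program's variables.
qty = conn.execute("SELECT qty_on_hand FROM inventory WHERE item_no = ?", (1,)).fetchone()[0]
print(qty)  # → 380
```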
Query Language
The query capability permits end users and professional programmers to access data in the database without the need for conventional programs.
IBM’s Structured Query Language (SQL) is a fourth-generation language that has emerged as the standard query language.
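A minimal sketch of an ad hoc query, assuming SQLite via Python's sqlite3 module and a hypothetical customer table: a single declarative SQL statement answers the question without a purpose-written retrieval program.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, name TEXT, balance REAL)")
conn.executemany("INSERT INTO customer VALUES (?, ?, ?)",
                 [(101, "Acme Co.", 250.0), (102, "Birch Ltd.", 1200.0), (103, "Cole Inc.", 0.0)])

# Ad hoc question: which customers owe more than $100, largest balance first?
query = "SELECT name, balance FROM customer WHERE balance > 100 ORDER BY balance DESC"
results = conn.execute(query).fetchall()
print(results)  # → [('Birch Ltd.', 1200.0), ('Acme Co.', 250.0)]
```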
Three Steps in Designing a Database
Prepare the conceptual model - identify the entities, identify the relationships between the entities, and prepare the ER diagram.
Specify the logical design - select the logical database model (relational) and transform the conceptual data model using the logical database model.
Implement the physical design - physical structures and access methods.
Phase 1
Prepare the Conceptual Model
Draw an ERD to capture the process.
ER-Diagram Symbols
[Figure: Symbols for an entity, a relationship, an attribute, and a primary key.]
ER-Diagram Symbols
[Figure: Example of a relationship linking two entities. CUSTOMER (attributes Name, Number) places ORDER (attributes Order Number, Item #); one customer places many orders (1:M).]
An Entity
...is an individual object, concept, or event.
...may be a specific tangible object or an intangible object.
An entity class is a collection of entities with similar attributes.
Attributes
A property of an entity that we choose to record (of interest to an organization).
CUSTOMER (entity): customer #, name, address, telephone no., balance
PRODUCT (entity): product #, description, finish, price, qty. on hand
Cardinalities
[Figure: Entity-relationship-entity examples: Salesperson Assigned Car, Customer Places Order, and Vendor Supply Inventory, illustrating 1:1, 1:M, and M:M cardinalities respectively.]
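ER-Diagram Using the REA Model
[Figure: REA diagram with resources (Inventory, Cash), events (Sales, Cash Collections), and agents (Salesperson, Customer, Cashier), linked by relationships such as Line items, Party to, Made to, Pays for, Increases, Received from, and Received by, with 1 and M cardinalities.]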
Phase 2
Specify the Logical Design
Create relational tables.
Logical Data Structures
A particular method used to organize records in a database is called the database’s structure.
The objective is to develop this structure efficiently so that data can be accessed quickly and easily.
Four types of structures are hierarchical (also known as the tree structure), network, relational, and object-oriented.
The Relational Model
The relational model portrays data in the form of two-dimensional tables:
relation - the database table
attributes (data elements) - form columns
tuples (records) - form rows
data - the intersection of rows and columns
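RESTRICT - extracts specified rows from a table, filtering out the rest
PROJECT - extracts specified columns from a table, filtering out the rest
JOIN - builds a new table from two tables that share a common attribute
[Figure: RESTRICT, PROJECT, and JOIN illustrated on sample tables with attribute values X1-X3, Y1-Y3, and Z1-Z3.]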
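The three operations can be sketched in plain Python, treating each relation as a list of rows (dicts). The customers and orders data and the function names are hypothetical; a DBMS would express the same operations in SQL as a WHERE clause, a SELECT column list, and a JOIN.

```python
# Two small hypothetical relations, each a list of rows (dicts).
customers = [
    {"cust_id": 1, "name": "Acme", "city": "Boston"},
    {"cust_id": 2, "name": "Birch", "city": "Denver"},
    {"cust_id": 3, "name": "Cole", "city": "Boston"},
]
orders = [
    {"order_no": 10, "cust_id": 1, "amount": 500},
    {"order_no": 11, "cust_id": 3, "amount": 75},
    {"order_no": 12, "cust_id": 1, "amount": 40},
]

def restrict(rows, predicate):
    """RESTRICT: keep only the rows that satisfy the predicate."""
    return [row for row in rows if predicate(row)]

def project(rows, columns):
    """PROJECT: keep only the named columns, dropping duplicate result rows."""
    seen, result = set(), []
    for row in rows:
        values = tuple(row[c] for c in columns)
        if values not in seen:
            seen.add(values)
            result.append(dict(zip(columns, values)))
    return result

def join(left, right, attribute):
    """JOIN: combine every pair of rows that agree on the common attribute."""
    return [{**lrow, **rrow} for lrow in left for rrow in right
            if lrow[attribute] == rrow[attribute]]

boston_names = project(restrict(customers, lambda r: r["city"] == "Boston"), ["name"])
order_names = project(join(customers, orders, "cust_id"), ["name", "order_no"])
print(boston_names)  # → [{'name': 'Acme'}, {'name': 'Cole'}]
print(order_names)
```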
Properly Designed Relational Tables
No repeating values - All occurrences at the intersection of a row and column are a single value.
The attribute values in any column must all be of the same class.
Each column in a given table must be uniquely named.
Each row in the table must be unique in at least one attribute, which is the primary key.
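The four rules above can be expressed as checks on a table. This is a sketch under stated assumptions: the check_table function is hypothetical, a table is given as a list of column names plus row tuples, and "same class" is approximated by the Python type of each value.

```python
def check_table(columns, rows, primary_key):
    """Return a list of violations of the four design rules (empty if none)."""
    problems = []
    # Rule 3: each column must be uniquely named.
    if len(set(columns)) != len(columns):
        problems.append("column names are not unique")
    for i, col in enumerate(columns):
        values = [row[i] for row in rows]
        # Rule 1: a single value at each row/column intersection.
        if any(isinstance(v, (list, tuple, set)) for v in values):
            problems.append(f"{col}: repeating values in a cell")
        # Rule 2: all values in a column belong to the same class (approximated by type).
        if len({type(v) for v in values}) > 1:
            problems.append(f"{col}: mixed value classes")
    # Rule 4: every row has a unique primary key value.
    keys = [row[columns.index(primary_key)] for row in rows]
    if len(set(keys)) != len(keys):
        problems.append(f"{primary_key}: primary key values are not unique")
    return problems

good = check_table(["cust_id", "name"], [(101, "Acme"), (102, "Birch")], "cust_id")
bad = check_table(["order_no", "items"], [(10, ["bolt", "nut"]), (10, "washer")], "order_no")
print(good)  # → []
print(bad)
```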
Relational Model Data Linkages (>1 table)
No explicit pointers are present. The data are viewed as a collection of independent tables.
Relations are formed by an attribute that is common to both tables in the relation.
Assignment of foreign keys:
If 1:1 association, either table’s primary key may be embedded as the foreign key in the other table.
If 1:M association, the primary key on the "one" side is embedded as the foreign key on the "many" side.
If M:M association, embedding foreign keys directly would create repeating values, so a separate linking table containing the primary keys of both tables is created.
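The foreign-key rules can be sketched with SQLite through Python's sqlite3 module. The tables are hypothetical, modeled on the CUSTOMER places ORDER (1:M) and vendor/inventory (M:M) examples; note that SQLite enforces foreign keys only after PRAGMA foreign_keys = ON.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
-- 1:M  CUSTOMER places ORDER: the customer's primary key is embedded in the order.
CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE sales_order (
    order_no INTEGER PRIMARY KEY,
    cust_id  INTEGER NOT NULL REFERENCES customer(cust_id));
-- M:M  VENDOR supplies INVENTORY: a separate linking table holds both keys.
CREATE TABLE vendor (vendor_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE inventory (item_no INTEGER PRIMARY KEY, description TEXT);
CREATE TABLE vendor_inventory (
    vendor_id INTEGER REFERENCES vendor(vendor_id),
    item_no   INTEGER REFERENCES inventory(item_no),
    PRIMARY KEY (vendor_id, item_no));
""")

conn.execute("INSERT INTO customer VALUES (1, 'Acme')")
conn.execute("INSERT INTO sales_order VALUES (10, 1)")  # accepted: customer 1 exists
try:
    conn.execute("INSERT INTO sales_order VALUES (11, 99)")  # no customer 99
    rejected = False
except sqlite3.IntegrityError:
    rejected = True  # FOREIGN KEY constraint failed
print(rejected)  # → True

conn.executemany("INSERT INTO vendor VALUES (?, ?)", [(1, "Delta Supply"), (2, "Echo Parts")])
conn.execute("INSERT INTO inventory VALUES (7, 'Bolt')")
conn.executemany("INSERT INTO vendor_inventory VALUES (?, ?)", [(1, 7), (2, 7)])
suppliers = [name for (name,) in conn.execute(
    "SELECT v.name FROM vendor v JOIN vendor_inventory vi ON v.vendor_id = vi.vendor_id "
    "WHERE vi.item_no = 7 ORDER BY v.name")]
print(suppliers)  # → ['Delta Supply', 'Echo Parts']
```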
Three Types of Anomalies
Insertion Anomaly: A new item cannot be added to the table until at least one entity uses a particular attribute item.
Deletion Anomaly: If an attribute item used by only one entity is deleted, all information about that attribute item is lost.
Update Anomaly: A modification on an attribute must be made in each of the rows in which the attribute appears.
Anomalies can be corrected by splitting the data into properly designed (normalized) relational tables.
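A minimal Python sketch of the update and deletion anomalies, assuming a hypothetical unnormalized SALES table in which customer data is repeated on every invoice row; the insertion anomaly is noted in a comment.

```python
# Hypothetical unnormalized SALES table: customer data repeats on every invoice row.
sales = [
    {"invoice": 1, "cust_id": 101, "cust_address": "12 Elm St", "amount": 500},
    {"invoice": 2, "cust_id": 101, "cust_address": "12 Elm St", "amount": 40},
    {"invoice": 3, "cust_id": 102, "cust_address": "7 Pine Rd", "amount": 75},
]

# Update anomaly: the new address must be written into every row for customer 101;
# changing only one row leaves conflicting values.
sales[0]["cust_address"] = "98 Oak Ave"
addresses_101 = {r["cust_address"] for r in sales if r["cust_id"] == 101}
print(len(addresses_101))  # → 2

# Deletion anomaly: deleting customer 102's only invoice also erases the
# only record of that customer's address.
sales = [r for r in sales if r["invoice"] != 3]
customer_102_known = any(r["cust_id"] == 102 for r in sales)
print(customer_102_known)  # → False

# Insertion anomaly: a new customer with no sales cannot be added, because
# every row needs an invoice number (the primary key).
```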
Advantages of Relational Tables
Removes all three anomalies
Various items of interest (customers, inventory, sales) are stored in separate tables.
Space is used efficiently.
Very flexible: users can form ad hoc relationships.
The Normalization Process
A process that systematically splits unnormalized complex tables into smaller tables that meet two conditions:
all nonkey (secondary) attributes in the table are dependent on the primary key
all nonkey attributes are independent of the other nonkey attributes
When unnormalized tables are split and reduced to third normal form, they must then be linked together by foreign keys.
Steps in Normalization
Table with repeating groups - remove repeating groups to reach first normal form (1NF)
1NF - remove partial dependencies to reach second normal form (2NF)
2NF - remove transitive dependencies to reach third normal form (3NF)
3NF - remove remaining anomalies to reach higher normal forms
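The normalization steps can be sketched on a small hypothetical invoice table. The functional dependencies listed in the comments are assumptions made for the example; the sketch splits the table into 2NF and then 3NF tables and checks that joining them on their foreign keys rebuilds the original rows.

```python
# Hypothetical invoice data, already in 1NF (one value per cell).
# Primary key: (invoice, item_no).  Assumed dependencies:
#   item_no -> description   (partial: depends on only part of the key)
#   invoice -> cust_id       (partial)
#   cust_id -> cust_name     (transitive: invoice -> cust_id -> cust_name)
rows = [
    # (invoice, cust_id, cust_name, item_no, description, qty)
    (1, 101, "Acme", 7, "Bolt", 100),
    (1, 101, "Acme", 8, "Nut", 200),
    (2, 102, "Birch", 7, "Bolt", 50),
]

# 2NF: remove partial dependencies by moving attributes into tables keyed
# by the part of the key they depend on.
invoice_lines = {(inv, item_no): qty for inv, _, _, item_no, _, qty in rows}
items = {item_no: desc for _, _, _, item_no, desc, _ in rows}
invoices_2nf = {inv: (cust, name) for inv, cust, name, _, _, _ in rows}

# 3NF: cust_name depends on cust_id, a nonkey attribute of the invoice table,
# so it moves to its own table; cust_id stays behind as a foreign key.
customers = {cust: name for cust, name in invoices_2nf.values()}
invoices = {inv: cust for inv, (cust, _) in invoices_2nf.items()}

print(customers)  # → {101: 'Acme', 102: 'Birch'}
print(invoices)   # → {1: 101, 2: 102}
print(items)      # → {7: 'Bolt', 8: 'Nut'}

# Joining the tables on their foreign keys rebuilds the original rows.
rebuilt = sorted(
    (inv, invoices[inv], customers[invoices[inv]], item_no, items[item_no], qty)
    for (inv, item_no), qty in invoice_lines.items()
)
print(rebuilt == sorted(rows))  # → True
```

Each customer name and item description is now stored once, which is what removes the update, insertion, and deletion anomalies.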
Accountants and Data Normalization
The update anomaly can generate conflicting and obsolete database values.
The insertion anomaly can result in unrecorded transactions and incomplete audit trails.
The deletion anomaly can cause the loss of accounting records and the destruction of audit trails.
Accountants should have an understanding of the data normalization process and be able to determine whether a database is properly normalized.
Phase 3
Implement the Physical Design
Decide about software and hardware.
Physical Database Design
Transition from the theoretical to the physical aspects of the database (IS/IT)
Decisions about software and hardware
Implementation - populate the database with data and produce physical user views (multiple)
Data Structures
Data structures allow records to be located, stored, and retrieved, and allow movement through the database. They have two components:
The organization of a file is the physical arrangement of records.
The access method is the technique used to locate records and to navigate through the database.
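The two components can be contrasted in a small sketch. It assumes a hypothetical in-memory "file" whose records are stored in arrival order (the file organization) and compares two access methods: a sequential search and a lookup through a key-to-position index. The read counts only model record accesses, not the cost of consulting the index.

```python
# Hypothetical physical file: records stored in arrival order (the file organization).
records = [
    {"cust_id": 205, "name": "Cole"},
    {"cust_id": 101, "name": "Acme"},
    {"cust_id": 150, "name": "Birch"},
]

def sequential_search(file, key):
    """Access method 1: read records one after another until the key is found."""
    reads = 0
    for rec in file:
        reads += 1
        if rec["cust_id"] == key:
            return rec, reads
    return None, reads

def build_index(file):
    """Map each key to the record's position in the file."""
    return {rec["cust_id"]: pos for pos, rec in enumerate(file)}

def indexed_lookup(file, index, key):
    """Access method 2: consult the index, then read exactly one record."""
    pos = index.get(key)
    if pos is None:
        return None, 0
    return file[pos], 1

index = build_index(records)
print(sequential_search(records, 150))      # → ({'cust_id': 150, 'name': 'Birch'}, 3)
print(indexed_lookup(records, index, 150))  # → ({'cust_id': 150, 'name': 'Birch'}, 1)
```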
Distributed Data Processing
[Figure: Sites A, B, and C connected to a central site that holds the centralized database.]
Distributed Data Processing
DDP is organized around several information processing units (IPUs) distributed throughout the organization and placed under the control of the end users.
DDP does NOT mean decentralization! IPUs are connected to one another and coordinated.
Potential Advantages of DDP
Cost reductions in hardware and data entry tasks
Improved cost control responsibility
Improved user satisfaction, since control is closer to the user level
Improved data backup through the use of multiple data storage sites
Potential Disadvantages of DDP
Loss of control
Mismanagement of organization-wide resources
Hardware and software incompatibility
Redundant tasks and data
Consolidating incompatible tasks
Difficulty attracting qualified personnel
Lack of standards
Centralized Databases in DDP Environment
The data is retained in a central location.
Remote IPUs send requests for data.
The central site services the needs of the remote IPUs.
The actual processing of the data is performed at the remote IPU.
Data Currency
Data currency problems occur in DDP with a centralized database.
During transaction processing, the data will temporarily be inconsistent as a record is being read and updated.
Database lockout procedures are necessary to keep IPUs from reading inconsistent data and from writing over a transaction being written by another IPU.
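A lockout procedure can be sketched with Python threads standing in for remote IPUs and a threading.Lock standing in for the DBMS's record lock. The account record and amounts are hypothetical; a real DBMS coordinates locks across the network rather than within one process.

```python
import threading

# Hypothetical shared record in the central database.
account = {"balance": 100_000}
record_lock = threading.Lock()  # the lockout mechanism for this record

def remote_ipu(amount, postings):
    """Simulated remote IPU posting payments against the central record."""
    for _ in range(postings):
        with record_lock:  # no other IPU can read or write the record meanwhile
            current = account["balance"]           # read
            account["balance"] = current - amount  # update and write back

ipus = [threading.Thread(target=remote_ipu, args=(1, 10_000)) for _ in range(4)]
for ipu in ipus:
    ipu.start()
for ipu in ipus:
    ipu.join()
print(account["balance"])  # → 60000
```

Because the read and the write happen inside the same locked section, no IPU can read the temporarily inconsistent value or overwrite another IPU's update.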
Distributed Databases: Partitioning
Splits the central database into segments that are distributed to their primary users
Advantages:
users’ control is increased by having data stored at local sites
transaction processing response time is improved
the volume of transmitted data between IPUs is reduced
the potential data loss from a disaster is reduced
The Deadlock Phenomenon
Especially a problem with partitioned databases
Occurs when multiple sites lock each other out of data that they are currently using: one site needs data locked by another site.
Special software is needed to analyze and resolve conflicts; transactions may be terminated and have to be restarted.
The Deadlock Phenomenon
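[Figure: Three sites hold data A,B; C,D; and E,F. One site has locked A and is waiting for C; another has locked C and is waiting for E; the third has locked E and is waiting for A. No site can proceed.]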
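Deadlock-detection software typically looks for a cycle in a "wait-for" graph, in which each site points to the site holding the data it needs. The sketch below is a simplified version that assumes each site waits on at most one other site; the site names are hypothetical labels for the three sites in the figure.

```python
def find_deadlock(waits_for):
    """Return the sites forming a cycle in the wait-for graph, or None.

    waits_for maps each waiting site to the site that holds the data it needs.
    Simplification: each site waits on at most one other site.
    """
    for start in waits_for:
        path, site = [], start
        # Follow the chain of waits until it ends or revisits a site.
        while site in waits_for and site not in path:
            path.append(site)
            site = waits_for[site]
        if site in path:  # revisited a site on this path: a cycle, i.e. a deadlock
            return path[path.index(site):]
    return None

# The situation in the figure (site labels are hypothetical).
holder = {"A": "Site 1", "C": "Site 2", "E": "Site 3"}   # who has locked what
waiting = {"Site 1": "C", "Site 2": "E", "Site 3": "A"}  # what each site is waiting for
waits_for = {site: holder[item] for site, item in waiting.items()}

cycle = find_deadlock(waits_for)
print(cycle)  # → ['Site 1', 'Site 2', 'Site 3']
# A resolver could now terminate one transaction in the cycle and restart it later.
print(find_deadlock({"Site 1": "Site 2"}))  # → None (Site 2 is not waiting)
```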
Distributed Databases: Replication
The duplication of the entire database for multiple IPUs
This method is effective where there is a high degree of data sharing but no primary user, and it supports read-only queries.
The data traffic between sites is reduced considerably.
Concurrency Problems and Control Issues
Database concurrency is the presence of complete and accurate data at all IPU sites. With replicated databases, maintaining current data at all locations is a difficult task.
Time stamping may be used to serialize transactions and to prevent and resolve any potential conflicts created by updating data at various IPUs.
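Time stamping can be sketched with basic timestamp ordering, one common scheme (the slide does not say which variant is meant). Each hypothetical data item remembers the newest timestamps that read and wrote it, and an older transaction that arrives too late is rolled back so it can restart with a new timestamp.

```python
class Rollback(Exception):
    """Raised when a transaction's timestamp shows it arrived too late."""

class TimestampedItem:
    """One data item under basic timestamp ordering (simplified sketch)."""

    def __init__(self, value):
        self.value = value
        self.read_ts = 0   # largest timestamp of any transaction that has read the item
        self.write_ts = 0  # largest timestamp of any transaction that has written it

    def read(self, ts):
        if ts < self.write_ts:  # a younger transaction already overwrote the value
            raise Rollback(f"T{ts} rolled back: read too late")
        self.read_ts = max(self.read_ts, ts)
        return self.value

    def write(self, ts, value):
        # Reject the write if a younger transaction has already read or written the item.
        if ts < self.read_ts or ts < self.write_ts:
            raise Rollback(f"T{ts} rolled back: write too late")
        self.value, self.write_ts = value, ts

qty_on_hand = TimestampedItem(100)
t2_view = qty_on_hand.read(ts=2)        # T2 (at IPU B) reads first
try:
    qty_on_hand.write(ts=1, value=90)   # T1 (at IPU A) is older than that read
    outcome = "T1 committed"
except Rollback as err:
    outcome = str(err)                  # T1 must restart with a new timestamp
qty_on_hand.write(ts=2, value=t2_view + 50)
print(outcome)            # → T1 rolled back: write too late
print(qty_on_hand.value)  # → 150
```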