Post on 20-Dec-2015
ORGANIZING DATA IN A TRADITIONAL FILE ENVIRONMENT
File Organization Terms and Concepts
Bit: Smallest unit of data; binary digit (0, 1)
Byte: Group of bits that represents a single character
Field: Group of characters forming a word, a group of words, or a complete number
Record: Group of related fields
File: Group of records of the same type
Database: Group of related files
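The hierarchy above can be sketched in a few lines of Python. This is a minimal illustration only: the student record, its field names and the file names are hypothetical. It shows one character stored as a byte of eight bits, fields grouped into a record, records of one type grouped into a file, and related files grouped into a database.

```python
# Illustrative only: the record layout, field names and file names are hypothetical.

# Bit and byte: the character "A" is stored as one byte made up of 8 bits.
char = "A"
byte_value = char.encode("ascii")[0]   # integer value of that single byte
bits = format(byte_value, "08b")       # the same byte written as 8 binary digits

# Field: characters forming a word or a complete number.
# Record: a group of related fields describing one student.
record = {"student_id": "39044", "name": "Alice", "course": "IS101", "grade": "A"}

# File: a group of records of the same type.
course_file = [
    record,
    {"student_id": "59432", "name": "Bob", "course": "IS101", "grade": "B"},
]

# Database: a group of related files, keyed here by a file name.
database = {"course": course_file, "finance": [], "personnel": []}

if __name__ == "__main__":
    print(char, byte_value, bits)   # A 65 01000001
    print(len(record), "fields,", len(course_file), "records,", len(database), "files")
```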
Data Hierarchy in a Computer System
Figure 7-1
Entity: Person, place, thing, or event about which information is maintained
Attribute: Characteristic or quality describing a particular entity
Key field: Field that uniquely identifies a record so that it can be retrieved, updated, or sorted
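A small Python sketch of these three terms, assuming a hypothetical STUDENT entity whose attributes are an ID, a name and a major. The ID plays the role of the key field: it is what the records are indexed, retrieved, updated and sorted by.

```python
from dataclasses import dataclass


# Entity: a student. Attributes: the characteristics stored for each student.
# The attribute names and values are hypothetical examples.
@dataclass
class Student:
    student_id: str  # key field: uniquely identifies each record
    name: str
    major: str


students = [
    Student("59432", "Bob", "Finance"),
    Student("39044", "Alice", "Accounting"),
    Student("44523", "Carol", "Marketing"),
]

# Index the records by key field so any record can be found directly.
by_key = {s.student_id: s for s in students}

# Retrieve a record through its key.
alice = by_key["39044"]

# Update a record located through its key (the same object is in `students`).
by_key["44523"].major = "Economics"

# Sort the records by the key field.
sorted_ids = [s.student_id for s in sorted(students, key=lambda s: s.student_id)]

if __name__ == "__main__":
    print(alice.name)   # Alice
    print(sorted_ids)   # ['39044', '44523', '59432']
```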
Entities and Attributes
Figure 7-2
Traditional File Processing
Figure 7-3
Problems with the Traditional File Environment
Data redundancy
Program-data dependence
Lack of flexibility
Poor security
Lack of data sharing and availability
DATA REDUNDANCY
The presence of duplicate data in multiple data files
Different functions collect the same information independently
The same data item may have different meanings in different parts of the organisation
The Staff_Branch relation contains redundant data: the details of a branch are repeated for every member of staff.
In contrast, the Branch relation stores the branch information only once per branch, and only the branch number (Branch_No) is repeated in the Staff relation to record where each member of staff is located.
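The contrast between Staff_Branch and the separate Staff and Branch relations can be sketched in Python. Only the relation names and Branch_No come from the slide; the other columns (Staff_No, Name, Branch_Address) and all rows are hypothetical.

```python
# Hypothetical rows. Each tuple is (Staff_No, Name, Branch_No, Branch_Address).
staff_branch = [
    ("S1", "Ann", "B1", "12 High St"),
    ("S2", "Raj", "B2", "163 Main St"),
    ("S3", "Lee", "B2", "163 Main St"),
    ("S4", "Mary", "B2", "163 Main St"),
]

# Split into two relations: branch details are stored once per branch,
# and each staff row keeps only the Branch_No.
branch = {}  # Branch_No -> Branch_Address
staff = []   # (Staff_No, Name, Branch_No)
for staff_no, name, branch_no, address in staff_branch:
    branch[branch_no] = address
    staff.append((staff_no, name, branch_no))

# How many times is the address of branch B2 stored in each design?
addr = "163 Main St"
copies_before = [row[3] for row in staff_branch].count(addr)
copies_after = list(branch.values()).count(addr)

# The combined view can still be rebuilt by matching on Branch_No.
rebuilt = [(s, n, b, branch[b]) for s, n, b in staff]

if __name__ == "__main__":
    print(copies_before, copies_after)  # 3 1
    print(rebuilt == staff_branch)      # True
```

With Staff_Branch, moving branch B2 to a new address means updating three rows and risking inconsistency if one is missed; with the split design it is a single update to the Branch relation.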
Program Data Dependence
The tight relationship between data stored in files and the specific programs required to update and maintain those files
Every program must describe the nature of the data it works with
In a traditional file environment, any change to the data requires a change in all programs that access the data
Example: a change in tax rates!
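A sketch of why a change in tax rates can ripple through every program. It assumes a hypothetical fixed-width payroll file whose field positions are hard-coded in each program; when the tax-rate field is widened to hold a second decimal place, a program that was not rewritten silently misreads the data.

```python
# Hypothetical fixed-width payroll file. Every program that reads it
# hard-codes the character positions of each field.

def parse_payroll_v1(line: str) -> dict:
    """Original layout: id (5 chars), name (10 chars), tax rate (4 chars, e.g. '20.0')."""
    return {
        "emp_id": line[0:5],
        "name": line[5:15].strip(),
        "tax_rate": float(line[15:19]),
    }


def parse_payroll_v2(line: str) -> dict:
    """Revised layout: the tax rate field is widened to 5 chars (e.g. '20.25')."""
    return {
        "emp_id": line[0:5],
        "name": line[5:15].strip(),
        "tax_rate": float(line[15:20]),
    }


# A record written in the revised layout.
new_record = f"{'00042':<5}{'Smith':<10}{'20.25':<5}"

if __name__ == "__main__":
    # The unchanged program silently truncates the new rate.
    print(parse_payroll_v1(new_record)["tax_rate"])  # 20.2
    # Only a program rewritten for the new layout reads it correctly.
    print(parse_payroll_v2(new_record)["tax_rate"])  # 20.25
```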
Lack of Flexibility
A traditional file system can deliver routine scheduled reports only after significant programming effort
An ad hoc (unanticipated) request for information can take a long time to satisfy
The information is somewhere in the system but is too expensive to locate and retrieve
Compiling the data could take weeks
Poor Security
There is little or no control over, or management of, the data
Data can be disseminated throughout the organisation without control
Who is accessing the data and making changes?
Lack of Data-sharing
Lack of control over access
Hard to get hold of information
Different pieces of information are held in different files and in different physical locations
Since files in different locations cannot be related, the data are hard to share or access in a timely manner
Impossible for information to flow freely
Database Technology
DATABASE: A collection of data organised to serve many applications efficiently by centralising the data and minimising redundant data.
Historical context
Why develop a DBMS at all?
To manage the flood of data from transaction processing systems
To integrate data across the organisation
“Data glare”
DBMS
A Database Management System (DBMS) is a general-purpose software and hardware facility to:
Create, delete, reorganize, and manipulate data in a database
Store, retrieve, share, and maintain data in a database
Maintain relationships between the database components
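The core DBMS functions listed above can be tried with SQLite, which ships with Python as the `sqlite3` module. This is a minimal sketch: SQLite is an embedded engine rather than a full multi-user DBMS, and the `employee` table and its columns are hypothetical.

```python
import sqlite3

# In-memory SQLite database; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Create: the data definition lives in the database, not in each program.
cur.execute("CREATE TABLE employee (emp_id TEXT PRIMARY KEY, name TEXT, dept TEXT)")

# Store: insert rows.
cur.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [("E1", "Ann", "Sales"), ("E2", "Raj", "Finance"), ("E3", "Lee", "Sales")],
)

# Manipulate: update one row and delete another.
cur.execute("UPDATE employee SET dept = ? WHERE emp_id = ?", ("Finance", "E3"))
cur.execute("DELETE FROM employee WHERE emp_id = ?", ("E1",))
conn.commit()

# Retrieve: query the stored data.
rows = cur.execute("SELECT emp_id, name, dept FROM employee ORDER BY emp_id").fetchall()

# Reorganize: add a column to the existing table without rebuilding it.
cur.execute("ALTER TABLE employee ADD COLUMN phone TEXT")
columns = [info[1] for info in cur.execute("PRAGMA table_info(employee)")]

conn.close()

if __name__ == "__main__":
    print(rows)     # [('E2', 'Raj', 'Finance'), ('E3', 'Lee', 'Finance')]
    print(columns)  # ['emp_id', 'name', 'dept', 'phone']
```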
THE DATABASE APPROACH TO DATA MANAGEMENT
Database Management System (DBMS)
• Creates and maintains databases
• Eliminates the need for data definition statements in each application program
• Acts as an interface between application programs and physical data files
DBMS Cont’d
Provides security and procedures relating to privilege and access
Verifies the integrity of all the updates and transactions that are carried out
Provides an interface for accessing, deleting and adding data and for redefining the relationships within the database
A DBMS is a collection of programs that manages the database structure and controls access to the data stored in the database.
DBMS
Relieves the programmer or end user from the task of understanding where and how data are actually stored
Separates the logical view from the physical view
Logical view: How data are perceived by end users or business specialists
Physical view: How data are actually organised and structured on physical storage media
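One way to see the separation of views is an SQL view in SQLite. In this simplified sketch the stored table stands in for the physical level (a real DBMS also hides files, pages and indexes below it), and the view is the logical picture a business user queries. All table, column and view names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Storage-oriented table (hypothetical): split names, salary kept in cents,
# department stored as a code.
conn.execute(
    "CREATE TABLE emp_store (id INTEGER PRIMARY KEY, fname TEXT, lname TEXT, "
    "salary_cents INTEGER, dept_code TEXT)"
)
conn.executemany(
    "INSERT INTO emp_store (fname, lname, salary_cents, dept_code) VALUES (?, ?, ?, ?)",
    [("Ann", "Lee", 5200000, "FIN"), ("Raj", "Shah", 4800000, "MKT")],
)

# Logical view: full names and salaries in currency units,
# with no reference to how the rows are actually stored.
conn.execute(
    "CREATE VIEW employee_view AS "
    "SELECT fname || ' ' || lname AS name, salary_cents / 100.0 AS salary "
    "FROM emp_store"
)

# End users query the view as if it were an ordinary table.
rows = conn.execute("SELECT name, salary FROM employee_view ORDER BY name").fetchall()
conn.close()

if __name__ == "__main__":
    print(rows)  # [('Ann Lee', 52000.0), ('Raj Shah', 48000.0)]
```

If the storage table is later restructured, only the view definition has to change; queries written against `employee_view` keep working.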
The Contemporary Database Environment
Figure 7-4
Types of Databases
• Relational DBMS
• Hierarchical and Network DBMS
• Object-Oriented Databases
Relational DBMS
• The most popular type of DBMS today for PCs as well as for larger companies and mainframes
• Represents all data in the database as two-dimensional tables called relations
• Similar to flat files, but information in more than one file can easily be extracted and combined
• Relates data across tables based on a common data element
• Examples: DB2, Oracle, MS SQL Server
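Relating data across tables through a common data element can be sketched with SQLite from Python. The `supplier` and `part` tables, their columns and their rows are hypothetical; `supplier_no` is the shared column that links them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Two hypothetical tables that share the data element supplier_no.
conn.execute("CREATE TABLE supplier (supplier_no INTEGER PRIMARY KEY, supplier_name TEXT)")
conn.execute(
    "CREATE TABLE part (part_no INTEGER PRIMARY KEY, part_name TEXT, "
    "supplier_no INTEGER REFERENCES supplier(supplier_no))"
)
conn.executemany("INSERT INTO supplier VALUES (?, ?)", [(1, "Acme Ltd"), (2, "Brightco")])
conn.executemany(
    "INSERT INTO part VALUES (?, ?, ?)",
    [(10, "Door latch", 1), (11, "Door seal", 2), (12, "Compressor", 1)],
)

# Combine information from both tables by matching on the common column.
rows = conn.execute(
    "SELECT p.part_name, s.supplier_name "
    "FROM part AS p JOIN supplier AS s ON p.supplier_no = s.supplier_no "
    "ORDER BY p.part_no"
).fetchall()
conn.close()

if __name__ == "__main__":
    print(rows)
```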
Figure 7-6
Relational Data Model
Three Basic Operations in a Relational Database
• Select: Creates a subset of rows that meet specific criteria
• Join: Combines relational tables to provide users with more information than is available in individual tables
• Project: Creates a subset of columns, enabling users to create new tables containing only relevant information
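The three operations can be written as small functions over tables held as lists of dictionaries. This is a teaching sketch of the relational-algebra operations, not how a production DBMS implements them; the function names and the sample PART and SUPPLIER rows are hypothetical.

```python
from __future__ import annotations

from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


def select(rows: List[Row], predicate: Callable[[Row], bool]) -> List[Row]:
    """Select: keep only the rows that meet the criterion."""
    return [r for r in rows if predicate(r)]


def project(rows: List[Row], columns: List[str]) -> List[Row]:
    """Project: keep only the named columns, dropping duplicate result rows."""
    result: List[Row] = []
    for r in rows:
        t = {c: r[c] for c in columns}
        if t not in result:
            result.append(t)
    return result


def join(left: List[Row], right: List[Row], on: str) -> List[Row]:
    """Join: combine every pair of rows whose values in the shared column match."""
    return [{**l, **r} for l in left for r in right if l[on] == r[on]]


parts = [
    {"part_no": 137, "part_name": "Door latch", "supplier_no": 1},
    {"part_no": 150, "part_name": "Door seal", "supplier_no": 2},
    {"part_no": 152, "part_name": "Compressor", "supplier_no": 1},
]
suppliers = [
    {"supplier_no": 1, "supplier_name": "Acme Ltd"},
    {"supplier_no": 2, "supplier_name": "Brightco"},
]

# Which parts come from supplier 1, and what is that supplier called?
result = project(
    join(select(parts, lambda r: r["supplier_no"] == 1), suppliers, "supplier_no"),
    ["part_name", "supplier_name"],
)

if __name__ == "__main__":
    print(result)
```

Chaining the three calls mirrors how a single SQL query combines a WHERE clause (select), a JOIN, and a column list (project).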
Figure 7-7
Three Basic Operations in a Relational Database
Hierarchical and Network DBMS
Hierarchical DBMS
• Organizes data in a tree-like structure
• Supports one-to-many parent-child relationships
• Prevalent in large legacy systems
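A tree of segments in Python illustrates the hierarchical model: every child has exactly one parent, and a record is reached by navigating a path down from the root. The segment names and data are hypothetical, and `find_path` is an illustrative helper rather than the access method of any real hierarchical product.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Segment:
    """A node in a hierarchical database: one parent, any number of children."""
    name: str
    data: dict
    children: list[Segment] = field(default_factory=list)

    def add(self, child: Segment) -> Segment:
        self.children.append(child)
        return child


def find_path(node: Segment, path: list[str]) -> Segment | None:
    """Navigate from the root along an exact sequence of segment names."""
    if not path or node.name != path[0]:
        return None
    if len(path) == 1:
        return node
    for child in node.children:
        found = find_path(child, path[1:])
        if found is not None:
            return found
    return None


# Hypothetical tree: one employee root with two one-to-many branches.
root = Segment("Employee", {"id": "E1", "name": "Ann"})
comp = root.add(Segment("Compensation", {}))
comp.add(Segment("SalaryHistory", {"2014": 48000, "2015": 52000}))
benefits = root.add(Segment("Benefits", {}))
benefits.add(Segment("Health", {"plan": "Basic"}))

# The program must know the full path; it cannot jump straight to "Health".
salary = find_path(root, ["Employee", "Compensation", "SalaryHistory"])
shortcut = find_path(root, ["Employee", "Health"])

if __name__ == "__main__":
    print(salary.data if salary else None)
    print(shortcut)  # None
```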
Hierarchical DBMS
Figure 7-8
Disadvantages
Knowledge of the physical level is required
Does not support logical data independence, and does not support all physical data independence operations
Not all problems are of the one-to-many type
Problems with implementing multiple parents
Anomalies when a parent record is deleted
Application development in 3GLs is time-consuming
Support programs are not part of the DBMS
“A system created by programmers for programmers!”
Network DBMS
• Depicts data logically as many-to-many relationships
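In contrast with a tree, where each child has exactly one parent, a many-to-many relationship lets a record be linked to many records on both sides. This Python sketch uses hypothetical student and course enrolments and keeps links in both directions, so either side can be navigated.

```python
from collections import defaultdict

# Hypothetical many-to-many data: students enrolled in courses.
enrolments = [
    ("Ann", "IS101"), ("Ann", "MKT200"),
    ("Raj", "IS101"), ("Raj", "FIN300"),
    ("Lee", "MKT200"),
]

# Links are recorded in both directions: a student has many courses,
# and a course has many students.
courses_of = defaultdict(set)
students_in = defaultdict(set)
for student, course in enrolments:
    courses_of[student].add(course)
    students_in[course].add(student)

if __name__ == "__main__":
    print(sorted(courses_of["Ann"]))    # ['IS101', 'MKT200']
    print(sorted(students_in["IS101"]))  # ['Ann', 'Raj']
```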
Network DBMS
Disadvantages
Outdated
Less flexible compared to RDBMS
Lacks support for ad hoc and English-like queries
Object-Oriented Databases
Object-oriented DBMS: Stores data and procedures as objects that can be retrieved and shared automatically
Object-relational DBMS: Provides capabilities of both object-oriented and relational DBMS
DBMS Disadvantages
DBMSs are complex
Need for explicit backup and control
Development and operating costs can be substantial
Consolidating an entire business’s information resources can create a high level of vulnerability
CREATING A DATABASE ENVIRONMENT
Designing Databases
Conceptual design: Abstract model of the database from a business perspective
Physical design: How data are actually arranged on direct access storage devices
Entity-relationship diagram: Methodology for documenting databases that illustrates the relationships between database entities
Normalization: Process of creating small, stable data structures from complex groups of data
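Normalization can be sketched in Python on a hypothetical ORDER relation that holds a repeating group of parts. The attribute names and rows are assumptions and are not taken from Figures 7-11 and 7-12; the sketch only shows the general idea of splitting the repeating group and the part details into smaller relations.

```python
# Hypothetical unnormalized ORDER rows: each order holds a repeating group of
# parts, and part details are repeated in every order that uses the part.
unnormalized_orders = [
    {"order_no": 1001, "order_date": "2015-12-01",
     "lines": [{"part_no": 137, "part_name": "Door latch", "unit_price": 22.00, "qty": 10},
               {"part_no": 152, "part_name": "Compressor", "unit_price": 70.00, "qty": 2}]},
    {"order_no": 1002, "order_date": "2015-12-03",
     "lines": [{"part_no": 137, "part_name": "Door latch", "unit_price": 22.00, "qty": 5}]},
]


def normalize(orders):
    """Split the repeating group and the part details into separate relations."""
    order_rel, line_item_rel, part_rel = [], [], {}
    for o in orders:
        # ORDER: facts that depend only on order_no.
        order_rel.append((o["order_no"], o["order_date"]))
        for line in o["lines"]:
            # LINE_ITEM: one row per (order_no, part_no) with the quantity ordered.
            line_item_rel.append((o["order_no"], line["part_no"], line["qty"]))
            # PART: facts that depend only on part_no, stored once per part.
            part_rel[line["part_no"]] = (line["part_name"], line["unit_price"])
    return order_rel, line_item_rel, part_rel


orders, line_items, parts = normalize(unnormalized_orders)

if __name__ == "__main__":
    print(orders)
    print(line_items)
    print(parts)
```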
An Entity-Relationship Diagram
Figure 7-10
An Unnormalized Relation of ORDER
Figure 7-11
A Normalized Relation of ORDER
Figure 7-12
Distributing Databases
Centralized database: Used by a single central processor or by multiple processors in a client/server network
Distributed database: Stored in more than one physical location
• Partitioned database
• Duplicated database
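The two ways of distributing a database can be sketched in Python, using the common textbook reading: a partitioned database stores different parts of the data at different locations, while a duplicated database keeps a full copy of the central database at every location. The sites and customer records are hypothetical.

```python
# Hypothetical customer records and sites.
customers = [
    {"id": 1, "name": "Ann", "region": "North"},
    {"id": 2, "name": "Raj", "region": "South"},
    {"id": 3, "name": "Lee", "region": "North"},
]
sites = ["North", "South"]


def partition(records, sites):
    """Partitioned: each site stores only the records it is responsible for."""
    return {s: [r for r in records if r["region"] == s] for s in sites}


def duplicate(records, sites):
    """Duplicated: every site stores a full copy of the central data."""
    return {s: [dict(r) for r in records] for s in sites}


partitioned = partition(customers, sites)
duplicated = duplicate(customers, sites)

if __name__ == "__main__":
    print({s: len(rows) for s, rows in partitioned.items()})  # {'North': 2, 'South': 1}
    print({s: len(rows) for s, rows in duplicated.items()})   # {'North': 3, 'South': 3}
```

Partitioning keeps each site's storage small but makes cross-site queries slower; duplication speeds up local reads at the cost of keeping every copy up to date.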
Distributed Databases
Figure 7-13
DATABASE TRENDS
Data Warehousing and Datamining
Data warehouse
Supports reporting and query tools
Stores current and historical data
Consolidates data for management analysis and decision making
Components of a Data Warehouse
Figure 7-16
Datamining
Tools for analyzing large pools of data
Find hidden patterns and infer rules to predict trends
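As one example of finding patterns and inferring rules, here is a minimal association-rule sketch in Python. The slides do not name a specific datamining technique; association rules are used here as a common illustration. The shopping baskets, the function name and the support and confidence thresholds are hypothetical, and only rules between single items are considered.

```python
from collections import Counter
from itertools import combinations

# Hypothetical point-of-sale baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"beer", "chips"},
    {"bread", "butter"},
    {"bread", "butter", "milk"},
]


def association_rules(baskets, min_support=0.4, min_confidence=0.7):
    """Find rules 'lhs -> rhs' between single items that meet both thresholds.

    support    = share of baskets containing both items
    confidence = share of baskets containing lhs that also contain rhs
    """
    n = len(baskets)
    item_counts = Counter(item for b in baskets for item in b)
    pair_counts = Counter(pair for b in baskets for pair in combinations(sorted(b), 2))
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, round(support, 2), round(confidence, 2)))
    return sorted(rules)


rules = association_rules(baskets)

if __name__ == "__main__":
    for lhs, rhs, support, confidence in rules:
        print(f"{lhs} -> {rhs}: support={support}, confidence={confidence}")
```

With these baskets the sketch reports that customers who buy milk or butter almost always buy bread, while the butter and milk pair falls below the confidence threshold.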
Benefits of Data Warehouses
Improved and easier access to information
Ability to model and remodel the data
Databases and the Web
Database server: A computer in a client/server environment that runs a DBMS to process SQL statements and perform database management tasks
Application server: Software that handles all application operations
Linking Internal Databases to the Web
Figure 7-18