2.7 Use of ICT in data management

By H’MM

What is a database?

• A database or database system is a collection of related data. In its simplest form a database consists of a collection of records and fields. Each record contains the same set of fields, each of which contains one piece of information.

Database Management System (DBMS)

Definition: A database management system (DBMS) is, as its name suggests, the software used to manage a database system.

• It manages:

– the structure of the individual data files
– the relationships between data items and between data files
– how the data is interrogated (i.e. how you get information from the database)
– the properties of the database, i.e. ensuring that all queries, updating and amendments to structure are processed reliably.

Sequential Files

• In a sequential file, records are stored one after the other, in the order in which they were added to the storage medium, usually magnetic tape. To read data from or write data to tape, sequential files must be used.

• There are two ways that records can be arranged in a sequential file. One way is to have the records in some sort of order using a key field. A key field is one which is unique to every record, i.e. every record has a different value in that field. This is called ordered sequential.

• Alternatively, the records might be arranged with no thought given to their order so they appear to be unordered. Whether the file is ordered or unordered affects the way in which the data is processed as well as the type of processing that can be used.

• An unordered sequential file is often referred to as a serial file, as the only method for retrieving information is to go through each record one by one.

• In an ordered file, the records are put in order of a key field such as customer ID, as shown above. In an unordered file, the records are not in any particular order.
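The difference between searching a serial file and an ordered sequential file can be sketched in a few lines of Python. This is only an illustration: the record layout (a dictionary with a customer_id key) and the function names are assumptions made for the example, and the lists stand in for files on tape.

```python
def serial_search(records, key):
    """Unordered (serial) file: read record by record until the key turns up.

    Returns (record or None, number of records read).
    """
    reads = 0
    for record in records:
        reads += 1
        if record["customer_id"] == key:
            return record, reads
    return None, reads  # a missing key is only known after reading everything


def ordered_search(records, key):
    """Ordered sequential file: still read in sequence, but stop early
    once a record with a larger key field has been read."""
    reads = 0
    for record in records:
        reads += 1
        if record["customer_id"] == key:
            return record, reads
        if record["customer_id"] > key:
            break  # the key cannot appear later in an ordered file
    return None, reads
```

In both cases the records before the wanted one must still be read; ordering only helps the search give up sooner when the record is not there.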

Disadvantages to using sequential files

There are a number of disadvantages to using sequential files:

• The only way to add new records to a sequential file is to store them at the end of the file.

• A record can only be replaced if the new record is exactly the same length as the original.

• Records can only be updated if the data item used to replace the existing data is exactly the same length.

• The processing of records in a sequential file is slower than with other types of file.

• In order to process a particular record all the records before the one you want have to be read in sequence until you get to the one you want.

• The use of sequential files is recommended only for those types of application where most or all the records have to be processed at one time.

• Adding records to the end of the file is fairly straightforward. However, amending or deleting records is not so easy.

• If the file is an unordered sequential file, amending or deleting records cannot easily be done.

• If it is an ordered sequential file, the changes can be made relatively easily, provided the transaction file (which contains the actions to be carried out on the records) has been sorted into the same order as the master file, using the key field.

The letter in the Trans. column is the type of transaction. D is a deletion of, C is a change to and A is an addition of a record.

The computer reads the first record in the transaction file and the first record in the old master file. If the ID doesn't match, the computer writes the master file record to the new master file. The next record of the old master file is read and if it matches, as it does in this example, the computer carries out the transaction.

• In this case the record has to be deleted, so instead of writing this old master file record to the new master file the computer ignores it and reads the next old master file record and the next transaction record.

• We are now on the second record of the transaction file and the third record of the old master file. If they don't match, the old master file record is written to the new master file and the next record (the fourth) of the old master file is read. This carries on until the next old master file record is found which matches the transaction file record.

• In this case, the ID of the fifth old master file record matches the second transaction record. This requires a change, so the data in the transaction file (not the old master file record) is written to the new master file. This whole procedure carries on until the transaction type ‘A’ is met. After this, all the remaining records of the old master file are written unchanged to the new master file and then the remaining records of the transaction file are added to the new master file.
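The update procedure described above can be sketched in Python as follows. This is a minimal sketch, not a definitive implementation: the tuple layouts, the function name update_master and the sample data are assumptions made for illustration, and lists stand in for the tape files. It also slots an addition in by key if its key falls between existing records, which goes slightly beyond the description above.

```python
def update_master(old_master, transactions):
    """Build a new master file from an old master and a transaction file.

    old_master:   list of (key, data), sorted by key
    transactions: list of (key, type, data), sorted by key,
                  where type is "D" (delete), "C" (change) or "A" (add)
    """
    new_master = []
    t = 0  # position of the next unread transaction record
    for key, data in old_master:
        # Transactions with a smaller key than this master record:
        # only additions can be applied; anything else has no match.
        while t < len(transactions) and transactions[t][0] < key:
            t_key, t_type, t_data = transactions[t]
            if t_type == "A":
                new_master.append((t_key, t_data))
            t += 1
        if t < len(transactions) and transactions[t][0] == key:
            _, t_type, t_data = transactions[t]
            t += 1
            if t_type == "D":
                continue  # deletion: the old record is simply not copied
            if t_type == "C":
                new_master.append((key, t_data))  # change: write new data
                continue
            # An "A" for a key that already exists is ignored here.
        new_master.append((key, data))  # no match: copy unchanged
    # Old master exhausted: remaining additions go at the end.
    for t_key, t_type, t_data in transactions[t:]:
        if t_type == "A":
            new_master.append((t_key, t_data))
    return new_master
```

With an old master holding keys 1 to 6 and the transactions (2, "D"), (5, "C") and (7, "A"), the second record is dropped, the fifth is rewritten from the transaction data and record 7 is added at the end, which mirrors the walkthrough above. Both files are read once, front to back, which is why the transaction file must be sorted first.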

Indexed sequential files

• Indexed sequential files are stored in order. Ordinary sequential or serial files can be stored on tape.

• An indexed sequential file is stored on disk to enable some form of direct access.

• Each record consists of fixed length fields.

• This is a leftover from the use of magnetic tapes where records had to be stored in the order they were written to the file.

• The use of ordering facilitated a greater speed of access.

• With an indexed sequential system the records are in some form of order, for example by surname for a file of employee records.

• The index is a pointer to whereabouts on the disk the record is stored.

• In simple terms, the index might be a table with entries numbered 1 to 26 (A to Z), storing whereabouts on the disk all the As can be found, all the Bs, and so on.

• This means that when a name beginning with S is required the part of the file containing all the As to Rs can be ignored and the disk is accessed where the Ss begin. All the records beginning with S still have to be read one by one until the appropriate record is found, but it does mean that not every record from A onwards has to be read.
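The A to Z index described above can be sketched in Python. This is an illustrative sketch under stated assumptions: the records are (surname, data) tuples already sorted by surname, a list position stands in for a disk address, and the function names build_index and find are invented for the example.

```python
def build_index(records):
    """Map each initial letter to the position of the first record
    whose surname starts with that letter."""
    index = {}
    for position, (surname, _) in enumerate(records):
        index.setdefault(surname[0].upper(), position)
    return index


def find(records, index, surname):
    """Jump to the start of the right letter, then read sequentially.

    Returns (data or None, number of records read).
    """
    letter = surname[0].upper()
    start = index.get(letter)
    if start is None:
        return None, 0  # no surnames begin with this letter
    reads = 0
    for position in range(start, len(records)):
        candidate, data = records[position]
        reads += 1
        if candidate == surname:
            return data, reads
        if candidate[0].upper() != letter:
            break  # we have run past the block for this letter
    return None, reads
```

Looking up "Smith" in a file that holds Adams, Baker, Reed, Shaw, Smith and Young skips straight to Shaw and finds the record on the second read, rather than the fifth.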

Applications of indexed sequential files

• Banks use sequential access systems for batch processing cheques.

• This system would have to be at least indexed sequential for faster access to records for online banking.

• Indexed sequential files are used with hybrid batch-processing systems, such as employee records. The index will allow for direct access when individual records are required for human resource/personnel use.

• The records will be held sequentially to allow for serial access when producing a payroll, since all records will be processed one after the other.

Random Access files

• Random access is the quickest form of access.

• It does not matter whereabouts in the file the desired record is; it will take the same amount of time to access any particular record.

• Each record is fixed length and each has a key. The computer looks up the key and goes to the appropriate place on the disk to access it.
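Because every record has the same length, the position of a record can be calculated rather than searched for. The Python sketch below illustrates this under a simplifying assumption: the key is taken to be the record number itself, so the record starts at key × record length. Real systems often derive the position from the key in other ways, for example by hashing. The record layout (a 4-byte key and a 20-byte name) and the function names are invented for the example, and an in-memory buffer stands in for the disk file.

```python
import io
import struct

# Hypothetical fixed-length layout: 4-byte unsigned key + 20-byte name.
RECORD = struct.Struct("<I20s")  # RECORD.size is 24 bytes


def write_records(f, names):
    """Write one fixed-length record per name; the key is the record number."""
    for key, name in enumerate(names):
        # struct pads names shorter than 20 bytes with null bytes.
        f.write(RECORD.pack(key, name.encode("ascii")))


def read_record(f, key):
    """Go straight to the record: no earlier records are read."""
    f.seek(key * RECORD.size)
    stored_key, raw_name = RECORD.unpack(f.read(RECORD.size))
    return stored_key, raw_name.rstrip(b"\x00").decode("ascii")


disk = io.BytesIO()  # stands in for a file on disk
write_records(disk, ["Ann", "Bob", "Cy"])
```

Calling read_record(disk, 2) performs one seek and one read, exactly as read_record(disk, 0) does, which is why access time does not depend on where the record sits in the file.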

Random vs. Sequential

Hierarchical database management systems

• Hierarchical DBMS are no longer used as a form of file management to any extent, as they suffer from the problem of one-way relationships.

• Hierarchical DBMS use a tree-like structure similar to a family tree system.

• Their main use is in file organization within computer directory structures.

• The structure does enable fast access to data, however, because large amounts of data are bypassed as you go down the levels.
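The way a tree structure bypasses data can be sketched with nested Python dictionaries. The tree contents and the function name lookup are invented purely for illustration; this is not how any particular hierarchical DBMS stores its data.

```python
# Hypothetical hierarchy: company -> department -> country -> employees.
tree = {
    "company": {
        "sales": {"uk": ["Ann", "Bob"], "fr": ["Cy"]},
        "hr": {"uk": ["Di"]},
    }
}


def lookup(node, path):
    """Follow one branch per level; every sibling branch is never visited."""
    for name in path:
        node = node[name]
    return node
```

Calling lookup(tree, ["company", "sales", "fr"]) takes three steps and never touches the hr branch or the uk sales records. The one-way nature of the relationships also shows: there is no direct way to go from an employee back up to a department without searching the whole tree.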

History

• The hierarchical structure was used in early mainframe DBMS. Records' relationships form a treelike model. This structure is simple but inflexible because the relationship is confined to a one-to-many relationship. The IBM Information Management System (IMS) and the RDM Mobile are examples of a hierarchical database system with multiple hierarchies over the same data. RDM Mobile is a newly designed embedded database for a mobile computer system.

• The hierarchical data model lost traction as Codd's relational model became the de facto standard used by virtually all mainstream database management systems. A relational-database implementation of a hierarchical model was first discussed in published form in 1992. Hierarchical data organization schemes resurfaced with the advent of XML in the late 1990s. The hierarchical structure is used primarily today for storing geographic information and file systems. Currently the most widely used hierarchical databases are IMS and Windows Registry by Microsoft.

http://en.wikipedia.org/wiki/Edgar_F._Codd

Network database management systems

• Network DBMS were developed to overcome a lot of the faults of the hierarchical type. Although the technology is outdated, many existing databases still rely on this form of DBMS.

• Many are distributed database systems. Parts of the database are usually stored on a number of computers that are linked through a WAN or LANs.

• Many of the parts of the database are duplicated so that it is unlikely that any data is lost.

• Despite this, it appears to each user to be a single system. The duplication also enables faster processing.

• The system caters for very complex searches or filters but does not necessarily carry out the processing at the site where the user is.

• Another type of network database is stored on one device but can be accessed from a number of network locations through either a LAN or a WAN.

• Users of the database can access the system simultaneously without affecting the speed of accessing data. Examples of this type are the Police National Computer (PNC) and the Driver and Vehicle Licensing Agency (DVLA) in the UK. Both of these can be accessed by police officers from their cars.

Relational database systems

• The term "relational database" was invented by E. F. Codd at IBM in 1970. Codd introduced the term in his seminal paper "A Relational Model of Data for Large Shared Data Banks".

• In this paper and later papers, he defined what he meant by "relational". One well-known definition of what constitutes a relational database system is composed of Codd's 12 rules.

• However, many of the early implementations of the relational model did not conform to all of Codd's rules, so the term gradually came to describe a broader class of database systems, which at a minimum:

– Present the data to the user as relations (a presentation in tabular form, i.e. as a collection of tables with each table consisting of a set of rows and columns);

– Provide relational operators to manipulate the data in tabular form.

• A relational database consists of a number of separate tables that are related in some way.

• Each table has a key field that is a field in at least one other table. Data from one table can then be combined with data from another table when producing reports.

• It is possible to select different fields from each table for output, using the key field as a reference point. For example, relational tables could be used to represent data from a payroll application and from a human resources application.

• The key field could be the works number. Fields of personal data from the human resources table could be combined with fields from the payroll in a report.

• The standard programming language in large applications to deal with relational tables is the structured query language (SQL), which is used for queries and producing reports.

• An advantage of relational databases is that data is not repeated and therefore doesn't waste valuable storage capacity.

• In contrast, the problem with flat file databases is that they repeat data. A payroll file may have the name and contact details of a worker and this would be duplicated in a human resources file.

• In a relational database, these would be in separate tables connected by the key field - worker number.

• Data retrieval is quicker.

• Duplicated data can mean that hackers have easier access to personal data that might be repeated across different files, so relational databases reduce this risk.

• Allows room for expansion.
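The payroll and human resources example can be sketched with Python's built-in sqlite3 module. The table names, column names and sample rows are assumptions made for illustration; the point is that the worker's name is stored once, in the human resources table, and is combined with payroll data through the key field (the works number) using SQL.

```python
import sqlite3


def payroll_report():
    """Combine fields from two related tables using the shared key field."""
    conn = sqlite3.connect(":memory:")  # temporary in-memory database
    conn.executescript("""
        CREATE TABLE hr (
            works_number INTEGER PRIMARY KEY,
            name         TEXT,
            phone        TEXT
        );
        CREATE TABLE payroll (
            works_number INTEGER PRIMARY KEY REFERENCES hr(works_number),
            salary       INTEGER
        );
        INSERT INTO hr VALUES (1, 'Ann', '555-0101'), (2, 'Bob', '555-0102');
        INSERT INTO payroll VALUES (1, 30000), (2, 28000);
    """)
    # The join matches rows whose works_number is the same in both tables.
    rows = conn.execute("""
        SELECT hr.name, payroll.salary
        FROM hr
        JOIN payroll ON hr.works_number = payroll.works_number
        ORDER BY hr.works_number
    """).fetchall()
    conn.close()
    return rows
```

Calling payroll_report() returns one (name, salary) row per worker. If a worker's name changes, only the single row in the hr table needs updating, whereas a pair of flat files would need the same change made twice.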
