14. Files - Data Structures Using C++ by Varsha Patil
TRANSCRIPT
Oxford University Press © 2012
Data Structures Using C++ by Dr Varsha Patil
14 Files
OBJECTIVES
The purpose of standard data organization methods
Various file organizations and their application-specific suitability such as sequential file, indexed sequential file, and direct access file
The advantages and disadvantages of file organizations
INTRODUCTION
Records that hold information about similar items of data are usually grouped together into a ‘file’
A file is a collection of records where each record consists of one or more fields
External Storage Device
External storage devices are mainly used for the following:
Overlay or backup of programs during execution
Storage of programs for future use
Storage of information in files
EXTERNAL STORAGE DEVICES
The most common external storage devices, in order of their use and study, are magnetic tapes, drums, and disk drives
When we organize data in a ‘file’ data structure, the data is non-volatile, which means that it remains on storage after data processing is over
Magnetic Tape
A tape is made up of a plastic material coated with a ferrite substance that is easily magnetized
The physical appearance of the tape is similar to the tape used for sound recording
A limitation of magnetic tape devices is that records must be processed in the order in which they reside on the tape
Magnetic Drums
A magnetic drum is a metal cylinder, from 10 to 36 inches in diameter, which has an outside surface coated with a magnetic recording material
The cylindrical surface of the drum is divided into a number of parallel bands called tracks
The tracks are further divided into either sectors or blocks, depending on the nature of the drum
The sector or block is the smallest addressable unit
Magnetic Disks
The magnetic disk is a direct access storage device, which has become more widely used than the magnetic drum, mainly because of its lower cost
FILE ORGANIZATION
The proper arrangement of records within a file is called file organization
Different Schemes of File Organizations
Sequential file
Direct or random access file
Indexed sequential file
Multi-Indexed file
Sequential file
In a sequential file, records are stored in the sequential order of their entry
This is the simplest kind of data organization
Direct or random access file
Although we search for records using a key, we still need to know the address of a record to retrieve it directly
The file organization that supports such access is called direct or random file organization
Indexed sequential file
Records are stored sequentially, but an index file is prepared for accessing a record directly
An index file contains records ordered by a record key
The record key uniquely identifies the record and determines the sequence in which it is accessed with respect to other records
Multi-Indexed file
In a multi-indexed file, the data file is associated with one or more logically separated index files
Inverted files and multilist files are examples of multi-indexed files
Factors Affecting File Organization
The factors that affect file organization are mainly the following:
Storage device
Type of query
Number of keys
Mode of retrieval/update of record
Factors Involved in Selecting File Organization
Speed Operations Capacity Size Security
FILES USING C++
File I/O Classes
Primitive Functions
Binary and Text File
File I/O Classes
The I/O system of C++ contains a set of classes that define the file handling methods
They are ifstream, ofstream, and fstream
These classes are declared in the <fstream> header file (‘fstream.h’ on older, pre-standard compilers)
ifstream: This class provides input operations
ofstream: This class provides output operations
fstream: This class provides both input and output operations
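The three stream classes can be tried out with a minimal sketch like the one below. The function names `roundTrip` and `writeThenRead` and any file paths passed to them are illustrative assumptions, not part of the original slides.

```cpp
#include <cassert>
#include <fstream>
#include <string>

// Writes one line with ofstream, then reads it back with ifstream.
std::string roundTrip(const std::string& path, const std::string& text) {
    {
        std::ofstream out(path);   // output-only stream; creates or truncates the file
        assert(out.is_open());
        out << text << '\n';
    }                              // the stream is closed when it goes out of scope
    std::ifstream in(path);        // input-only stream
    std::string line;
    std::getline(in, line);
    return line;
}

// Uses a single fstream object for both output and input.
std::string writeThenRead(const std::string& path, const std::string& text) {
    std::fstream io(path, std::ios::in | std::ios::out | std::ios::trunc);
    assert(io.is_open());
    io << text << '\n';
    io.seekg(0, std::ios::beg);    // move back to the start before reading
    std::string line;
    std::getline(io, line);
    return line;
}
```

Calling `roundTrip("demo.txt", "hello")` should hand back the same text that was written, which is a quick way to check that both directions work.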
Primitive Functions
There are several ways of reading (or writing) text from (or to) a file
All of them follow a common approach:
Open the file
Read (or write) the data
Close the file
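The open, read or write, close pattern might look as follows in code. This is only a sketch: `writeLines` and `countLines` are hypothetical helper names, and returning `false` or `-1` on failure is an assumption made for the example.

```cpp
#include <fstream>
#include <string>
#include <vector>

// Writes each string as one line of a text file.
bool writeLines(const std::string& path, const std::vector<std::string>& lines) {
    std::ofstream out;
    out.open(path);                          // 1. open the file
    if (!out.is_open()) return false;
    for (const std::string& l : lines)
        out << l << '\n';                    // 2. write the data
    out.close();                             // 3. close the file
    return true;
}

// Counts the lines of a text file; returns -1 if it cannot be opened.
int countLines(const std::string& path) {
    std::ifstream in;
    in.open(path);                           // 1. open the file
    if (!in.is_open()) return -1;
    int n = 0;
    std::string line;
    while (std::getline(in, line)) ++n;      // 2. read the data
    in.close();                              // 3. close the file
    return n;
}
```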
Binary and Text File
Binary File
A binary file consists of binary data
It can store text, graphics, and sound data in binary format
Binary files cannot be read directly as text
Text File
A text file contains plain ASCII characters
It contains text data in which each record is marked by an ‘end of line’ at its end
These end-of-record marks make operations such as read and write easy to perform
A text file cannot store graphical data
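To make the difference concrete, the following sketch stores the same integer once as text and once in binary form. The helper names are assumptions for illustration; the binary file holds the raw bytes of the `int`, so its size is `sizeof(int)` rather than the number of digits.

```cpp
#include <fstream>
#include <string>

// Stores the value as decimal characters, e.g. "123456".
void writeAsText(const std::string& path, int value) {
    std::ofstream out(path);
    out << value;
}

// Stores the in-memory bytes of the value unchanged.
void writeAsBinary(const std::string& path, int value) {
    std::ofstream out(path, std::ios::binary);
    out.write(reinterpret_cast<const char*>(&value), sizeof value);
}

// Reads back a value written by writeAsBinary.
int readBinary(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    int value = 0;
    in.read(reinterpret_cast<char*>(&value), sizeof value);
    return value;
}

// Returns the file size in bytes by opening the file positioned at its end.
long fileSize(const std::string& path) {
    std::ifstream in(path, std::ios::binary | std::ios::ate);
    return static_cast<long>(in.tellg());
}
```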
Sequential File Organization
The order of the records is fixed
Within each block, the records are in sequence
A sequential file stores records in the order they are entered
New records always appear at the end of the file
Primitive Operations
Features of Sequential File Organization
Drawbacks of Sequential File Organization
Primitive Operations
Open: This opens the file and sets the file pointer to immediately before the first record
Read-next: This returns the next record to the user. If no record is present, then the EOF condition is set
Close: This closes the file and terminates access to the file
Write-next: This sets the file pointer to just after the last record and writes the record to the file
EOF: This returns true if the EOF condition has occurred; otherwise it returns false
Search: This searches for the record with a given key
Update: The current record is written at the same position with updated values
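The primitive operations listed above could be wrapped in a small class, sketched here over fixed-length binary records. The `Record` layout, the class name `SequentialFile`, and the decision to start with an empty file on open are assumptions made for this example only.

```cpp
#include <fstream>
#include <string>

struct Record {            // hypothetical fixed-length record
    int  key;
    char name[20];
};

class SequentialFile {
    std::fstream f_;
public:
    // Open: starts with an empty file, positioned before the first record.
    explicit SequentialFile(const std::string& path)
        : f_(path, std::ios::in | std::ios::out | std::ios::binary | std::ios::trunc) {}

    // Moves back to the first record and clears any EOF condition.
    void rewind() { f_.clear(); f_.seekg(0, std::ios::beg); }

    // Read-next: returns false (and the EOF condition is set) when no record is left.
    bool readNext(Record& r) {
        f_.read(reinterpret_cast<char*>(&r), sizeof r);
        return static_cast<bool>(f_);
    }

    // EOF: true once a read has run past the last record.
    bool eof() const { return f_.eof(); }

    // Write-next: positions just after the last record and appends.
    void writeNext(const Record& r) {
        f_.clear();
        f_.seekp(0, std::ios::end);
        f_.write(reinterpret_cast<const char*>(&r), sizeof r);
        f_.flush();
    }

    // Search: linear scan from the first record for the given key.
    bool search(int key, Record& out) {
        rewind();
        while (readNext(out))
            if (out.key == key) return true;
        return false;
    }

    // Update: overwrites the record with the same key at the same position.
    bool update(const Record& r) {
        Record cur;
        if (!search(r.key, cur)) return false;
        f_.seekp(-static_cast<std::streamoff>(sizeof cur), std::ios::cur);
        f_.write(reinterpret_cast<const char*>(&r), sizeof r);
        f_.flush();
        return true;
    }

    // Close: ends access to the file.
    void close() { f_.close(); }
};
```

Note that `search` has to pass over every preceding record, which is the cost that motivates the other file organizations.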
Drawbacks of Sequential File Organization
Insertion and deletion of records at in-between positions cause huge data movement
Accessing any record requires a pass through all the preceding records, which is time consuming. Therefore, searching a record also takes more time.
Needs reorganization of file from time to time. If too many records are deleted logically, then the file must be reorganized to free the space occupied by unwanted records
Direct Access File Organization
Files that have been designed to make direct record retrieval as easy and efficient as possible are known as directly organized files
Direct access files are of great use for immediate access to large amounts of information
They are often used in accessing large databases
File position indicators

| Origin | Value | Macro name |
|---|---|---|
| Beginning of file | 0 | SEEK_SET |
| Current position | 1 | SEEK_CUR |
| End of file | 2 | SEEK_END |
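The macros in the table are the origins used by C's `fseek`; with C++ file streams the corresponding origins are `std::ios::beg`, `std::ios::cur`, and `std::ios::end`. The sketch below, using a hypothetical fixed-length `Slot` record and made-up helper names, shows one seek from each origin.

```cpp
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

struct Slot { int key; int value; };   // hypothetical fixed-length record

void writeSlots(const std::string& path, const std::vector<Slot>& slots) {
    std::ofstream out(path, std::ios::binary);
    for (const Slot& s : slots)
        out.write(reinterpret_cast<const char*>(&s), sizeof s);
}

Slot readSlot(std::ifstream& in) {
    Slot s{};
    in.read(reinterpret_cast<char*>(&s), sizeof s);
    return s;
}

// Origin "beginning of file": jump straight to record n (0-based).
Slot readAt(const std::string& path, std::size_t n) {
    std::ifstream in(path, std::ios::binary);
    in.seekg(static_cast<std::streamoff>(n * sizeof(Slot)), std::ios::beg);
    return readSlot(in);
}

// Origin "end of file": step back one record from the end.
Slot readLast(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    in.seekg(-static_cast<std::streamoff>(sizeof(Slot)), std::ios::end);
    return readSlot(in);
}

// Origin "current position": read record 0, skip record 1, read record 2.
Slot readThirdBySkipping(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    readSlot(in);
    in.seekg(static_cast<std::streamoff>(sizeof(Slot)), std::ios::cur);
    return readSlot(in);
}
```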
INDEXED SEQUENTIAL FILE ORGANIZATION
A file that is loaded in key sequence but can be accessed directly by use of one or more indices is known as an indexed sequential file
A sequential data file that is indexed is called an indexed sequential file
An indexed sequential file is one solution for improving the speed of retrieving a target record
An indexed file contains records ordered by a record key
Each record contains a field that contains the record key
Advantages
Accessing any record is more efficient than in sequential file organization
A large amount of data can be stored using this type of file organization
Disadvantages
Often more than one index is needed, which occupies a large storage area
Types of Indexes
Primary index
It is an index ordered in the same way as the data file, which is sequentially ordered according to a key
The indexing field is equal to this key
Secondary index
It is an index that is defined on a non-ordering field of the data file
In this case, the indexing field need not contain unique values
Clustering indexes
A data file can be associated with at most one primary index and several secondary indexes
The single-level indexing structure is the simplest one: the index is a file whose records are pairs, each containing a key and a pointer
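A single-level index of (key, pointer) pairs can be sketched as below, assuming a data file of fixed-length records already stored in key order. Here the pointer is taken to be a byte offset into the data file, and `Rec`, `IndexEntry`, `buildIndex`, and `lookup` are hypothetical names chosen for the example.

```cpp
#include <algorithm>
#include <fstream>
#include <string>
#include <vector>

struct Rec        { int key; int salary; };            // hypothetical data record
struct IndexEntry { int key; std::streamoff offset; }; // one (key, pointer) pair

// Writes records that are assumed to be sorted by key already.
void writeSorted(const std::string& path, const std::vector<Rec>& recs) {
    std::ofstream out(path, std::ios::binary);
    for (const Rec& r : recs)
        out.write(reinterpret_cast<const char*>(&r), sizeof r);
}

// Scans the data file once and records where each key is stored.
std::vector<IndexEntry> buildIndex(const std::string& path) {
    std::vector<IndexEntry> index;
    std::ifstream in(path, std::ios::binary);
    Rec r;
    std::streamoff pos = 0;
    while (in.read(reinterpret_cast<char*>(&r), sizeof r)) {
        index.push_back({r.key, pos});
        pos += static_cast<std::streamoff>(sizeof r);
    }
    return index;   // sorted by key because the data file is sorted by key
}

// Binary-searches the index, then reads the record directly at its offset.
bool lookup(const std::string& path, const std::vector<IndexEntry>& index,
            int key, Rec& out) {
    auto it = std::lower_bound(index.begin(), index.end(), key,
        [](const IndexEntry& e, int k) { return e.key < k; });
    if (it == index.end() || it->key != key) return false;
    std::ifstream in(path, std::ios::binary);
    in.seekg(it->offset, std::ios::beg);
    return static_cast<bool>(in.read(reinterpret_cast<char*>(&out), sizeof out));
}
```

Only the small index is searched; the data file itself is touched once, at the offset the index supplies.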
Structure of Index Sequential File
An index sequential file consists of three areas:
A primary storage area: This includes some unused space to allow for additions made to the data
A separate index or indexes: Each query references this index first; the index redirects the query to the part of the data file in which the target record is saved
An overflow area: This is an optional, separate overflow area
Characteristics of Indexed Sequential File
Records are stored sequentially, but an index file is prepared for accessing a record directly
Records can be accessed randomly
The file has records and also the index
Magnetic tape is not suitable for index sequential storage
The index is the address of the physical storage of a record
When only a few records are required or accessed at random, index sequential organization is better
It is a faster access method
An additional overhead is maintaining the index
Index sequential files are popularly used in many applications, such as digital libraries
LINKED ORGANIZATION
In linked organization, the physical sequence of records is different from the logical sequence of records
The next logical record is obtained by following a link value from the present record
Multilist Files
Coral Rings
Inverted Files
Cellular Partitions
Multilist Files
To make searching easy, several indexes are maintained for the primary key and the secondary keys, one index per key
A record may be present in different lists, one per key
Consider the following file of office staff

| Record | Staff ID | Occupation | Salary |
|---|---|---|---|
| A | 106 | Clerk | 5000 |
| B | 150 | Account | 4000 |
| C | 360 | Clerk | 3000 |
| D | 400 | Account | 3500 |
| E | 700 | Clerk | 2000 |
We can maintain an index on the staff ID
We can group staff IDs into the ranges 101–300, 301–600, 601–900, etc.
All the records with a staff ID in the same range are then linked together, as shown in the figure
Sample multilist file

| Staff range | Link |
|---|---|
| 101-300 | rec A → rec B |
| 301-600 | rec C → rec D |
| 601-900 | rec E |
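The range-based linking can be sketched in memory as follows. The staff records and the three ranges follow the sample file; storing the link as a position within a vector, and the names `StaffRecord`, `rangeStart`, `buildMultilist`, and `traverse`, are assumptions made for this illustration.

```cpp
#include <map>
#include <string>
#include <vector>

struct StaffRecord {
    char label;     // record name such as 'A'
    int  staffId;
    int  next;      // position of the next record in the same range, -1 at the end
};

// Lower bound of the range a staff ID falls in (101-300, 301-600, 601-900).
int rangeStart(int staffId) {
    if (staffId <= 300) return 101;
    if (staffId <= 600) return 301;
    return 601;
}

// Links records of the same range; returns the index: range start -> first record.
std::map<int, int> buildMultilist(std::vector<StaffRecord>& file) {
    std::map<int, int> head, tail;
    for (int i = 0; i < static_cast<int>(file.size()); ++i) {
        int r = rangeStart(file[i].staffId);
        file[i].next = -1;
        if (head.count(r) == 0) head[r] = i;   // first record of this range
        else file[tail[r]].next = i;           // chain after the previous one
        tail[r] = i;
    }
    return head;
}

// Follows the links of one range and collects the record labels in order.
std::string traverse(const std::vector<StaffRecord>& file,
                     const std::map<int, int>& head, int range) {
    std::string out;
    auto it = head.find(range);
    for (int i = (it == head.end() ? -1 : it->second); i != -1; i = file[i].next)
        out += file[i].label;
    return out;
}
```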
Coral Rings
In this organization, a doubly linked multilist structure is used
Each list is a circular list with a head node
Figure: sample doubly linked list for the occupation ‘Clerk’, with records A, C, and E connected through Alink and Blink fields
The ‘Alink’ field is used to link all records with the same key value
For some records, ‘Blink’ is a back pointer; for others, it is a pointer to the head node
Owing to these back pointers, deletion is easy without going back to the start of the list
Indexes are maintained as in multilists
Inverted Files
In multilist files, records with the same key value are linked together and the links are kept in each record
But in the inverted files, the link information is kept in the index itself
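The indices for a fully inverted file are shown in the figure

Staff ID index

| Staff ID | Records |
|---|---|
| 106 | A |
| 150 | B |
| 360 | C |
| 400 | D |
| 700 | E |

Occupation index

| Occupation | Records |
|---|---|
| Accountant | B, D |
| Clerk | A, C, E |

Salary index

| Salary | Records |
|---|---|
| 2000 | E |
| 4000 | B, C, D |
| 6000 | A |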
The inversion process is associated with the information of the inverted list
In inverted files, a record is accessed in two steps. First, the indices are searched to obtain a list of required records; second, the records are retrieved using these lists
The number of disk accesses required is equal to the number of records being retrieved plus the number needed to process the indices
In inverted files, only the index structures are important
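The first of the two steps, searching the indices to obtain the list of required records, can be sketched as below. The indices are held in memory as maps from a key value to a sorted list of record labels, which is an assumption of this example, as are the names `RecordList` and `lookupBoth`; the second step (fetching the records themselves) is left out.

```cpp
#include <algorithm>
#include <iterator>
#include <map>
#include <string>
#include <vector>

using RecordList = std::vector<char>;   // record labels, kept in sorted order

// Returns the records listed under both the given occupation and the given
// salary key. Only the indices are consulted; no data record is read.
RecordList lookupBoth(const std::map<std::string, RecordList>& occupationIndex,
                      const std::map<int, RecordList>& salaryIndex,
                      const std::string& occupation, int salaryKey) {
    RecordList result;
    auto o = occupationIndex.find(occupation);
    auto s = salaryIndex.find(salaryKey);
    if (o == occupationIndex.end() || s == salaryIndex.end()) return result;
    std::set_intersection(o->second.begin(), o->second.end(),
                          s->second.begin(), s->second.end(),
                          std::back_inserter(result));
    return result;
}
```

With an occupation index of Accountant → B, D and Clerk → A, C, E, and a salary index of 2000 → E, 4000 → B, C, D, 6000 → A, asking for Clerk together with salary key 4000 leaves only record C to be fetched from the data file.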
Cellular Partitions
To decrease file search time, the storage media may be divided into cells
A cell may be an entire disk or a cylinder
Lists are localized to lie within a cell
If a cylinder is used as a cell, then all records on the same cylinder may be accessed without moving the read/write heads
We divide multilists organized on several different cylinders into several small lists, each of which is stored on the same cylinder
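Multilist structure with cellular partitioning
Student ID is the primary key and T. ID is the secondary key

| Position | Student ID | T. ID | Link |
|---|---|---|---|
| 1 | 100 | A | |
| 2 | 200 | B | Null |
| 3 | 300 | C | Null |
| 4 | 400 | A | Null |
| 1 | 500 | D | |
| 2 | 600 | B | Null |
| 3 | 700 | A | Null |
| 4 | 800 | D | Null |
| 1 | 900 | E | Null |
| 2 | 1000 | C | Null |
| 3 | 1100 | D | Null |
| 4 | 1200 | A | Null |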
End of Chapter 14 …!