CS215 - Lecture 9: Indexing and Reclaiming Space in Files
- Maintaining indexes.
- Adding a data record with indexing.
- Deleting a data record with indexing.
- Reclaiming space.
- Multilevel index.
Dr. Hussien M. Sharaf
Structure of Indexes
- Indexes must be sorted in ascending or descending order with respect to one or more fields.
CompanyName   Offset
Google        211
IBM           0
ITE           643
Microsoft     462
Apple Mac     985   (new record)

Data file (figure): Record1 to Record4, each terminated by \n, followed by the new record.
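Keeping the index sorted is what makes lookups cheap: a key can be found by binary search instead of a scan of the data file. A minimal Python sketch, assuming the index is held in memory as a list of (key, offset) tuples; the entries reuse the example values, and `find_offset` is a hypothetical helper name:

```python
import bisect

# In-memory index of (key, byte_offset) pairs, kept sorted by key.
index = sorted([("Google", 211), ("IBM", 0), ("ITE", 643), ("Microsoft", 462)])

def find_offset(index, key):
    """Binary-search the sorted index; return the record's offset, or None."""
    # (key,) sorts just before any (key, offset) tuple with the same key.
    i = bisect.bisect_left(index, (key,))
    if i < len(index) and index[i][0] == key:
        return index[i][1]
    return None

# The new record (Apple Mac at offset 985) is inserted at its sorted position,
# which here is the front of the list, not the end.
bisect.insort(index, ("Apple Mac", 985))

print(find_offset(index, "ITE"))   # 643
print(index[0])                    # ('Apple Mac', 985)
```

Python compares strings by code point, so mixed-case keys may need normalizing (for example with `str.lower`) before they go into the index.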
Operations needed for an index:
1. Create the index in memory by looping over all records of the original data file.
2. If an index file already exists, load it into memory before using it.
3. Write the index to its file when the program closes.
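The three operations above can be sketched in Python as follows. This is an illustration under stated assumptions, not the course's reference code: each data record ends with `\n`, its fields are separated by `|` with the key first, the index file stores one `key|offset` line per entry, and `io.BytesIO` stands in for real files opened in binary mode. The function names are hypothetical.

```python
import io

def build_index(data_file):
    """1. Create the index in memory by looping over every data record."""
    index = []
    data_file.seek(0)
    while True:
        offset = data_file.tell()          # byte offset where this record starts
        record = data_file.readline()
        if not record:                     # end of file
            break
        key = record.rstrip(b"\n").split(b"|", 1)[0].decode()
        index.append((key, offset))
    index.sort()                           # keep the index sorted by key
    return index

def load_index(index_file):
    """2. Load an existing index file (one 'key|offset' line per entry)."""
    index_file.seek(0)
    index = []
    for line in index_file:
        key, offset = line.decode().rstrip("\n").rsplit("|", 1)
        index.append((key, int(offset)))
    return index

def save_index(index, index_file):
    """3. Overwrite the index file with the in-memory index at program close."""
    index_file.seek(0)
    index_file.truncate()
    for key, offset in index:
        index_file.write(f"{key}|{offset}\n".encode())

data = io.BytesIO(b"IBM|NY\nGoogle|CA\nITE|EG\n")
index = build_index(data)
print(index)   # [('Google', 7), ('IBM', 0), ('ITE', 17)]
```

Each offset is taken with `tell()` just before the record is read, so seeking to it later lands exactly on the start of that record.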
Once the index is loaded into memory, the following operations are needed:
1. Add: append the data record to the data file and insert an index entry at its correct position.
2. Delete: mark the record in the data file as deleted and remove the related entry from the index.
3. Deleting and updating data records can require updating the offsets stored in the index records. Is the same true when adding a data record?
[Figure: data records R1-R5 stored in file order; the index on Name orders them R4, R3, R2, R5, R1, and the index on Phone orders them R2, R3, R1, R4, R5.]
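Because an index holds only keys and offsets, several indexes can point into the same unsorted data file, each ordered on a different field. A small Python sketch with made-up records (the names, phone numbers, and offsets are all hypothetical):

```python
# Hypothetical data file contents: byte offset -> (name, phone).
# The records stay in file order; only the indexes are sorted.
records = {
    0: ("Sara", "555-0100"),
    20: ("Adam", "555-0300"),
    40: ("Omar", "555-0200"),
}

# Two independent indexes over the same records, each sorted on its own key.
name_index = sorted((name, offset) for offset, (name, _) in records.items())
phone_index = sorted((phone, offset) for offset, (_, phone) in records.items())

print([offset for _, offset in name_index])    # [20, 40, 0]
print([offset for _, offset in phone_index])   # [0, 40, 20]
```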
[Figure: adding record R6. Data records R1-R5 are on disk and R6 is appended at the end; the Name index (R4, R3, R2, R5, R1) and the Phone index (R2, R3, R1, R4, R5) in RAM each receive an entry for R6.]
Adding a Data Record with Indexing
1. Go to the end of the data file and get the current offset.
2. Append the data record to the end of the data file.
3. Build an index entry (offset, key) from the offset and key of the new data record.
4. Insert the new index entry at its correct position in the sorted index list.
5. When the program ends, save the index list to disk.
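Steps 1 to 4 can be sketched in Python as below. Assumptions, all hypothetical: records end with `\n` and their fields are separated by `|` with the key first, the index is a list of (key, offset) tuples (key first so the tuples sort by key, whereas the slide writes the entry as (offset, key)), and `io.BytesIO` stands in for the data file. Step 5 is a plain write of the index list at exit and is left out; `add_record` is an illustrative name.

```python
import bisect
import io

def add_record(data_file, index, key, fields):
    """Append a record to the data file and insert its entry into the sorted index."""
    data_file.seek(0, io.SEEK_END)           # 1. go to the end of the data file
    offset = data_file.tell()                #    and get the current offset
    record = "|".join([key, *fields]) + "\n"
    data_file.write(record.encode())         # 2. append the data record
    entry = (key, offset)                    # 3. build the index entry
    bisect.insort(index, entry)              # 4. insert it at its sorted position
    return offset

data = io.BytesIO(b"IBM|NY\nGoogle|CA\n")
index = [("Google", 7), ("IBM", 0)]
add_record(data, index, "Apple Mac", ["WA"])
print(index)   # [('Apple Mac', 17), ('Google', 7), ('IBM', 0)]
```

Appending never moves the existing records, so the offsets already stored in the index stay valid; only the new entry has to be placed.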
Deleting a Data Record with Indexing
1. Search the index for the entry whose key field matches the target value.
2. Mark the index entry as deleted.
3. Get the offset of the target data record.
4. Seek to the target offset and mark the data record as deleted.
NOTE: The data record is not physically removed immediately; a space-reclaiming function must be run to recover its space.
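A minimal Python sketch of these steps. Assumptions: records end with `\n`, the key is the first `|`-separated field, the index is a list of (key, offset) tuples sorted by key, and a hypothetical deletion marker `*` is written over the first byte of the record. Here the index entry is removed from the in-memory list rather than flagged, matching "remove the related entry from the index" in the operations list; a flag field would work as well. `delete_record` is an illustrative name, and `io.BytesIO` stands in for the data file.

```python
import bisect
import io

DELETED = b"*"   # assumed marker; overwriting one byte keeps the record length

def delete_record(data_file, index, key):
    """Remove key's entry from the index and mark its data record as deleted."""
    i = bisect.bisect_left(index, (key,))    # 1. binary-search the sorted index
    if i == len(index) or index[i][0] != key:
        return False                          # key not found
    _, offset = index.pop(i)                  # 2-3. drop the entry, keep its offset
    data_file.seek(offset)                    # 4. seek to the data record
    data_file.write(DELETED)                  #    and mark it deleted in place
    return True

data = io.BytesIO(b"IBM|NY\nGoogle|CA\n")
index = [("Google", 7), ("IBM", 0)]
delete_record(data, index, "Google")
print(data.getvalue())   # b'IBM|NY\n*oogle|CA\n'
```

The record's bytes stay in the file and no other record moves, which is why the space is only recovered later by the reclaiming step.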
[Figure: deleting record R3. The data file on disk keeps R3 but marks it as deleted, and the Name and Phone indexes in RAM no longer list R3.]
Reclaiming Space
A. Create a new file stream.
B. While not at the end of the records:
   1. Read a collection of records into a buffer.
   2. For each record in the buffer:
      - If the record is marked deleted, go to the next record.
      - Otherwise, copy the record to the new file stream.
C. End While
D. Rebuild all indexes based on the new data file.
NOTE: Buffering is used while copying data to the new stream.
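The reclaiming procedure can be sketched in Python as below. Assumptions: deleted records start with a `*` marker, records end with `\n`, the key is the first `|`-separated field, and `io.BytesIO` objects stand in for the old file and the new file stream (step A is the caller creating `new_file`). Input is read `buffer_size` records at a time and each buffer's surviving records are written in one call; the default of 4 is an arbitrary illustrative value, and `compact` is a hypothetical name. For brevity the sketch rebuilds a single key index during the copy pass instead of in a separate step D; with several indexes, each would be rebuilt from the new file's offsets.

```python
import io

def compact(old_file, new_file, buffer_size=4):
    """Copy live records into new_file and return the rebuilt sorted index."""
    old_file.seek(0)
    offset = new_file.tell()              # where the next copied record will start
    index = []
    while True:                           # B. while not at the end of the records
        buffer = []
        for _ in range(buffer_size):      # B.1 read a collection of records
            record = old_file.readline()
            if not record:
                break
            buffer.append(record)
        if not buffer:
            break
        out = bytearray()                 # output buffer, written once per pass
        for record in buffer:             # B.2 examine each buffered record
            if record.startswith(b"*"):   # marked deleted: go to the next record
                continue
            key = record.rstrip(b"\n").split(b"|", 1)[0].decode()
            index.append((key, offset + len(out)))
            out += record                 # live record: copy it
        new_file.write(out)
        offset += len(out)
    index.sort()                          # D. rebuilt index, sorted by key
    return index

old = io.BytesIO(b"IBM|NY\n*oogle|CA\nITE|EG\n")
new = io.BytesIO()
print(compact(old, new, buffer_size=2))   # [('IBM', 0), ('ITE', 7)]
print(new.getvalue())                     # b'IBM|NY\nITE|EG\n'
```

Compaction moves records, so every offset in the old indexes is stale afterwards; that is why step D rebuilds them from the new file.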
Multilevel Index
- When an index gets very big, it cannot be stored in RAM.
- It must then be stored in a file, so another level of index, small enough to be loaded into memory, is required.
- Hence we need multiple levels of indexing.
- The Level #4 index can be loaded into memory.
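A two-level version of the idea, sketched in Python: the full sorted (key, offset) index is cut into fixed-size blocks (which would live in the index file), and only the first key of each block is kept in memory. The block size of 3 and the function names are hypothetical, and `blocks[b]` stands in for reading block `b` from disk. Adding more levels, as in the four-level example, applies the same construction to the in-memory list until it fits in RAM.

```python
import bisect

BLOCK_SIZE = 3   # entries per lower-level block (hypothetical value)

def build_top_level(index):
    """Cut the sorted (key, offset) index into blocks and keep each block's
    first key; only this short list has to stay in memory."""
    blocks = [index[i:i + BLOCK_SIZE] for i in range(0, len(index), BLOCK_SIZE)]
    top = [block[0][0] for block in blocks]
    return top, blocks

def lookup(top, blocks, key):
    """Search the in-memory level, then the one block that can hold key."""
    b = bisect.bisect_right(top, key) - 1     # last block whose first key <= key
    if b < 0:
        return None                           # key is smaller than every key
    block = blocks[b]                         # stands in for one disk read
    i = bisect.bisect_left(block, (key,))
    if i < len(block) and block[i][0] == key:
        return block[i][1]
    return None

index = [(key, 10 * n) for n, key in enumerate("abcdefgh")]
top, blocks = build_top_level(index)
print(top)                        # ['a', 'd', 'g']
print(lookup(top, blocks, "e"))   # 40
```

Each lookup touches the small in-memory list plus exactly one lower-level block, instead of requiring the whole index in RAM.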