Oxford University Press © 2012
Data Structures Using C++ by Dr Varsha Patil
13. Indexing and Multiway Trees
Objectives
Indexing techniques
B-trees, which prove invaluable for problems of external information retrieval
A class of trees called tries, which share some properties of table lookup
Important uses of trees in many search techniques
Introduction
A file is a collection of records, each record having one or more fields
The fields used to distinguish among the records are known as keys
File organization describes the way the records are stored in a file
File organization is concerned with representing data records on an external storage medium
File organization breaks down into two further aspects:
Directory: the collection of indices
File organization: the physical organization of records
File organization is the way records are organized on physical storage
One such organization is sequential (ordered or unordered)
In this general framework, processing a query or an update request proceeds in two steps:
1. The indices are interrogated to determine which parts of the physical file must be searched
2. Those parts of the physical file are searched
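The two steps can be illustrated with a minimal C++ sketch. It assumes a file modelled in memory as a list of blocks kept in key order, with one index entry per block holding that block's highest key; the names `Record`, `Block`, `IndexEntry`, `buildIndex` and `lookup` are hypothetical and chosen only for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// In-memory stand-in for a sequentially ordered file: a list of blocks,
// each holding (key, record) pairs in ascending key order.
using Record = std::pair<int, std::string>;
using Block = std::vector<Record>;

// One index entry per block: the highest key stored in that block.
struct IndexEntry {
    int highestKey;
    std::size_t block;
};

// Build the index (assumes every block is non-empty and blocks are in key order).
std::vector<IndexEntry> buildIndex(const std::vector<Block>& file) {
    std::vector<IndexEntry> index;
    for (std::size_t b = 0; b < file.size(); ++b)
        index.push_back({file[b].back().first, b});
    return index;
}

// Step 1: interrogate the index for the only block that can contain `key`.
// Step 2: search just that block of the file.
std::optional<std::string> lookup(const std::vector<Block>& file,
                                  const std::vector<IndexEntry>& index, int key) {
    auto it = std::lower_bound(index.begin(), index.end(), key,
                               [](const IndexEntry& e, int k) { return e.highestKey < k; });
    if (it == index.end()) return std::nullopt;   // key is larger than every stored key
    for (const Record& r : file[it->block])
        if (r.first == key) return r.second;
    return std::nullopt;                          // right block, but key not present
}
```

Only one block is scanned per lookup, so the cost is one index search plus one block read.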
Indexing
An index, whether the index of a book or a data file index (in computer memory), is based on the basic concepts of keys and reference fields
The index to a book provides a way to find a topic quickly; similarly, a data file index pairs each key with a reference (address) to the record that holds it
Cylinder-Surface Indexing
This is the simplest type of index organization
It is useful only for the primary key index of a sequentially ordered file
In a sequentially ordered file, the physical sequence of records is ordered by the key, called the primary key
The cylinder-surface index consists of a cylinder index and several surface indexes
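For each cylinder, there is a surface index. If the disk has S usable surfaces, each surface index has S entries, so with C cylinders the total number of surface index entries is C · S

| Emp. No. | Emp. Name | Cylinder | Surface |
|---|---|---|---|
| 1 | Abolee | 1 | 1 |
| 2 | Anand | 1 | 1 |
| 3 | Amit | 1 | 2 |
| 4 | Amol | 1 | 2 |
| 5 | Rohit | 2 | 1 |
| 6 | Santosh | 2 | 1 |
| 7 | Saurabh | 2 | 2 |
| 8 | Shila | 2 | 2 |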
Let there be two surfaces and two records stored per track. The file is organized sequentially on the field ‘Emp. Name’
The cylinder index is shown in the following table:

| Cylinder | Highest Key Value |
|---|---|
| 1 | Amol |
| 2 | Shila |
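The two-level lookup through a cylinder index and then a surface index can be sketched in C++ as follows. This is an illustrative model, not the book's code: each index is a plain sorted vector of highest key values, the names `Track`, `firstNotLess` and `locate` are hypothetical, and the usage below uses its own alphabetically sorted sample names so that the binary searches with `std::lower_bound` are valid.

```cpp
#include <algorithm>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Position of a track on disk: (cylinder, surface), both numbered from 1.
using Track = std::pair<int, int>;

// Position of the first entry whose highest key is >= key, or -1 if there is none.
int firstNotLess(const std::vector<std::string>& highestKeys, const std::string& key) {
    auto it = std::lower_bound(highestKeys.begin(), highestKeys.end(), key);
    return it == highestKeys.end() ? -1 : static_cast<int>(it - highestKeys.begin());
}

// cylinderIndex[c]   = highest key on cylinder c + 1
// surfaceIndex[c][s] = highest key on surface s + 1 of cylinder c + 1
std::optional<Track> locate(const std::vector<std::string>& cylinderIndex,
                            const std::vector<std::vector<std::string>>& surfaceIndex,
                            const std::string& key) {
    int c = firstNotLess(cylinderIndex, key);     // first access: the cylinder index
    if (c < 0) return std::nullopt;
    int s = firstNotLess(surfaceIndex[c], key);   // second access: that cylinder's surface index
    if (s < 0) return std::nullopt;
    return Track{c + 1, s + 1};                   // the record, if present, is on this track
}
```

With a cylinder index {"Anand", "Shila"} and surface indexes {"Amit", "Anand"} and {"Santosh", "Shila"}, `locate` returns cylinder 1, surface 2 for "Amol", so only that one track has to be read.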
This method of maintaining a file and its index is referred to as ISAM (indexed sequential access method)
It is the simplest file organization for single-key files, but it is not suitable for multiple-key files
Hashed Indexes
The operations on hashed indexes (insertion, search, and deletion) are the same as those for hash tables
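As a sketch, a hashed index can be modelled with the standard `std::unordered_map`, mapping each key to the address of its record. Here the address is a hypothetical byte offset into the data file, and the class name `HashedIndex` is illustrative only.

```cpp
#include <optional>
#include <string>
#include <unordered_map>

// Hashed index: key -> address of the record in the data file (expected O(1) per operation).
class HashedIndex {
public:
    // Insert, or overwrite, the entry for `key`.
    void insert(const std::string& key, long address) { table_[key] = address; }

    // Return the record address for `key`, if the key is indexed.
    std::optional<long> search(const std::string& key) const {
        auto it = table_.find(key);
        if (it == table_.end()) return std::nullopt;
        return it->second;
    }

    // Remove the entry; returns true if a key was removed.
    bool remove(const std::string& key) { return table_.erase(key) > 0; }

private:
    std::unordered_map<std::string, long> table_;
};
```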
Multiway Search Trees
A multiway search tree is a tree of order m, where each node has at most m children and at most m − 1 keys, kept in sorted order
[Figure: a multiway search tree]
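A minimal C++ sketch of searching a multiway search tree, assuming integer keys; `MNode` and `searchMway` are hypothetical names. At each node the search finds the first key not smaller than the target, and either stops there or descends into the subtree that lies between the neighbouring keys.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Node of an m-way search tree: k sorted keys and k + 1 child pointers
// (null for an empty subtree). Child i holds keys between keys[i-1] and keys[i].
struct MNode {
    std::vector<int> keys;
    std::vector<std::unique_ptr<MNode>> children;   // size keys.size() + 1
};

// Returns true if `key` occurs in the tree rooted at `node`.
bool searchMway(const MNode* node, int key) {
    while (node != nullptr) {
        std::size_t i = 0;
        while (i < node->keys.size() && key > node->keys[i]) ++i;   // first key >= target
        if (i < node->keys.size() && node->keys[i] == key) return true;
        node = node->children[i].get();   // descend between keys[i-1] and keys[i]
    }
    return false;
}
```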
B-trees
A B-tree is a balanced m-way search tree. A node of the tree contains several records (or keys of records) and pointers to its children
To reduce disk accesses, the following rules apply:
The height is kept to a minimum
All leaves are kept at the same level
All nodes other than the leaves (and the root) must have at least a minimum number of children
B-tree Definition
A B-tree of order m is an m-way search tree with the following properties:
The number of keys in each internal node is one less than the number of its non-empty children, and these keys partition the keys in the children in the fashion of a search tree
All leaves are on the same level
All internal nodes except the root have at most m non-empty children and at least ⌈m/2⌉ non-empty children
The root is either a leaf node or has from two to m children
A leaf node contains no more than m − 1 keys
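Node structure
Ptr1 | Key1 | Ptr2 | Key2 | … | Ptri | Keyi | … | Keyn−1 | Ptrn
A node with n pointers holds n − 1 keys in increasing order. Every key X in the subtree reached through each pointer satisfies:

$$\mathrm{Ptr}_1:\; X < \mathrm{Key}_1, \qquad \mathrm{Ptr}_i:\; \mathrm{Key}_{i-1} < X < \mathrm{Key}_i, \qquad \mathrm{Ptr}_n:\; X > \mathrm{Key}_{n-1}$$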
Operations on a B-tree
Searching for a key
Inserting a key into a B-tree
Deleting a key from a B-tree
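Insertion can be sketched as below. This follows the widely taught top-down variant that splits full nodes on the way down; it is not necessarily the book's algorithm. The class is parameterised by a minimum degree `T`, so each node holds at most 2T − 1 keys (a B-tree of order m = 2T) and every non-root node holds at least T − 1 keys. `BTree`, `sortedKeys` and `height` are hypothetical names, and deletion is omitted.

```cpp
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// B-tree with minimum degree T: at most 2T - 1 keys per node (order m = 2T),
// at least T - 1 keys in every node except the root.
template <int T>
class BTree {
    static_assert(T >= 2, "minimum degree must be at least 2");
    struct Node {
        bool leaf = true;
        std::vector<int> keys;                     // sorted keys
        std::vector<std::unique_ptr<Node>> kids;   // keys.size() + 1 children if internal
    };
    std::unique_ptr<Node> root_;

    static bool full(const Node* n) { return n->keys.size() == static_cast<std::size_t>(2 * T - 1); }

    // Split the full child parent->kids[i]; its median key moves up into the parent.
    static void splitChild(Node* parent, std::size_t i) {
        Node* left = parent->kids[i].get();
        auto right = std::make_unique<Node>();
        right->leaf = left->leaf;
        right->keys.assign(left->keys.begin() + T, left->keys.end());   // upper T - 1 keys
        if (!left->leaf) {
            for (int j = T; j < 2 * T; ++j) right->kids.push_back(std::move(left->kids[j]));
            left->kids.resize(T);
        }
        int median = left->keys[T - 1];
        left->keys.resize(T - 1);                                        // lower T - 1 keys
        parent->keys.insert(parent->keys.begin() + i, median);
        parent->kids.insert(parent->kids.begin() + i + 1, std::move(right));
    }

    // Insert into a node known not to be full, splitting full children before descending.
    static void insertNonFull(Node* node, int key) {
        std::size_t i = 0;
        while (i < node->keys.size() && node->keys[i] < key) ++i;       // first key >= key
        if (node->leaf) {
            node->keys.insert(node->keys.begin() + i, key);
            return;
        }
        if (full(node->kids[i].get())) {
            splitChild(node, i);
            if (node->keys[i] < key) ++i;      // key belongs in the new right half
        }
        insertNonFull(node->kids[i].get(), key);
    }

    static void inorder(const Node* n, std::vector<int>& out) {
        for (std::size_t i = 0; i < n->keys.size(); ++i) {
            if (!n->leaf) inorder(n->kids[i].get(), out);
            out.push_back(n->keys[i]);
        }
        if (!n->leaf) inorder(n->kids.back().get(), out);
    }

public:
    BTree() : root_(std::make_unique<Node>()) {}

    void insert(int key) {
        if (full(root_.get())) {               // splitting a full root adds one level
            auto newRoot = std::make_unique<Node>();
            newRoot->leaf = false;
            newRoot->kids.push_back(std::move(root_));
            root_ = std::move(newRoot);
            splitChild(root_.get(), 0);
        }
        insertNonFull(root_.get(), key);
    }

    bool contains(int key) const {
        for (const Node* n = root_.get(); n != nullptr;) {
            std::size_t i = 0;
            while (i < n->keys.size() && n->keys[i] < key) ++i;
            if (i < n->keys.size() && n->keys[i] == key) return true;
            n = n->leaf ? nullptr : n->kids[i].get();
        }
        return false;
    }

    std::vector<int> sortedKeys() const {
        std::vector<int> out;
        inorder(root_.get(), out);
        return out;
    }

    // Number of levels; every root-to-leaf path has this length.
    int height() const {
        int h = 1;
        for (const Node* n = root_.get(); !n->leaf; n = n->kids[0].get()) ++h;
        return h;
    }
};
```

Splitting a full root is the only way the tree grows in height, which is why all leaves stay on the same level.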
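B+ Tree
In a B-tree, the nodes contain whatever information is associated with a key as well as the key values. A B+ tree instead stores only keys and tree pointers in its internal nodes; the records, or pointers to them, are kept in the leaf nodes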
B+ Tree Structure
The structure of a B+ tree can be understood from the following points:
A B+ tree is a balanced tree in which every path from the root to a leaf has the same length
Each non-leaf (internal) node other than the root has between ⌈n/2⌉ and n children, where n is fixed for a particular tree
In a leaf node, each pointer (Ptr) can point either to a file record or to a bucket of pointers, each of which points to a file record
Searching takes less time in B+ trees, at the cost of some wasted space in partly filled nodes
Nodes of a B+ Tree
Internal node of a B+ tree with q − 1 search values (and q tree pointers)
Leaf node of a B+ tree with q − 1 search values and q − 1 data pointers (plus a pointer to the next leaf)
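Node structure (internal node)
Ptr1 | Key1 | Ptr2 | Key2 | … | Ptri | Keyi | … | Keyn−1 | Ptrn
Every pointer in an internal node is a tree pointer. Since a key value in an internal node can also occur in a leaf, the bound on one side is non-strict; every key X in the subtree reached through each pointer satisfies:

$$\mathrm{Ptr}_1:\; X \le \mathrm{Key}_1, \qquad \mathrm{Ptr}_i:\; \mathrm{Key}_{i-1} < X \le \mathrm{Key}_i, \qquad \mathrm{Ptr}_n:\; X > \mathrm{Key}_{n-1}$$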
Advantages of B+ Trees over the Indexed Sequential Access Method (ISAM)
It is a dynamic index structure that adjusts gracefully to insertions and deletions
It is a balanced tree
Leaf pages need not be allocated sequentially; they are linked together through pointers (a doubly linked list)
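A sketch of why the linked leaf level helps: once a search has reached the leaf that holds the lower end of a range, the remaining keys are read by following `next` pointers instead of searching the tree again. Only the leaf level is modelled; the starting leaf is passed in directly rather than found through internal nodes, and `Leaf`, `linkLeaves` and `rangeScan` are hypothetical names.

```cpp
#include <cstddef>
#include <vector>

// Leaf level of a B+ tree (internal nodes omitted in this sketch).
struct Leaf {
    std::vector<int> keys;   // sorted; the whole chain is sorted left to right
    Leaf* prev = nullptr;
    Leaf* next = nullptr;
};

// Link consecutive leaves into a doubly linked list.
void linkLeaves(std::vector<Leaf>& leaves) {
    for (std::size_t i = 0; i + 1 < leaves.size(); ++i) {
        leaves[i].next = &leaves[i + 1];
        leaves[i + 1].prev = &leaves[i];
    }
}

// Collect all keys in [lo, hi], starting from the leaf where the search for `lo` ended.
std::vector<int> rangeScan(const Leaf* start, int lo, int hi) {
    std::vector<int> out;
    for (const Leaf* leaf = start; leaf != nullptr; leaf = leaf->next) {
        for (int k : leaf->keys) {
            if (k > hi) return out;          // chain is sorted, so stop early
            if (k >= lo) out.push_back(k);
        }
    }
    return out;
}
```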
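Trie Tree
Consider a multiway tree that branches on successive characters of the key, with one branch for every possible character. Most of these branches lead nowhere, so one solution is to prune from the tree all the branches that do not lead to any key
The resulting tree is called a trie (short for reTRIEvaL and pronounced ‘try’)
The number of steps needed to search a trie is proportional to the number of characters in a key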
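A minimal trie sketch in C++, assuming keys made only of the lowercase letters a to z. A child for a letter is created only when some key needs it, which is the pruning described above, and a search takes one step per character. `Trie`, `insert` and `contains` are illustrative names.

```cpp
#include <array>
#include <memory>
#include <string>

// Trie over lowercase words ('a' to 'z' only, an assumption of this sketch).
class Trie {
    struct Node {
        std::array<std::unique_ptr<Node>, 26> next;   // one slot per letter, null if unused
        bool isKey = false;                           // true if a key ends here
    };
    Node root_;

public:
    void insert(const std::string& word) {
        Node* node = &root_;
        for (char c : word) {
            auto& child = node->next[c - 'a'];
            if (!child) child = std::make_unique<Node>();   // create only branches that lead to a key
            node = child.get();
        }
        node->isKey = true;
    }

    // One step per character: cost depends on key length, not on the number of keys.
    bool contains(const std::string& word) const {
        const Node* node = &root_;
        for (char c : word) {
            node = node->next[c - 'a'].get();
            if (!node) return false;
        }
        return node->isKey;
    }
};
```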
Splay Trees
Splay trees are a form of BST. A splay tree maintains balance without any explicit balance condition such as color
Instead, ‘splay operations’, which involve rotations, are performed within the tree every time an access is made; each splay moves the accessed node up to the root
Splay Trees
Suppose a hospital keeps its patient records in a tree. If we use a BST or even an AVL tree, the records of a newly admitted patient go to a leaf position, far from the root, and access to them will be slower
Instead, we want to keep the records that are newly inserted or frequently accessed very near the root, while inactive records stay far off, in the leaf positions
However, we do not want to rebuild the tree into the desired shape. Instead, we make the tree a self-adjusting data structure that automatically changes its shape, bringing records closer to the root as they are used frequently and allowing inactive records to drift slowly down towards the leaves. Such trees are called splay trees
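The splay operation can be sketched with the common recursive formulation below. It brings the accessed key (or, if the key is absent, the last node on its search path) to the root through zig, zig-zig and zig-zag rotations. This is a textbook-style illustration rather than the book's code; `SNode`, `splayInsert`, `inorder` and `freeTree` are hypothetical names.

```cpp
#include <vector>

struct SNode {
    int key;
    SNode* left = nullptr;
    SNode* right = nullptr;
    explicit SNode(int k) : key(k) {}
};

SNode* rotateRight(SNode* x) { SNode* y = x->left; x->left = y->right; y->right = x; return y; }
SNode* rotateLeft(SNode* x) { SNode* y = x->right; x->right = y->left; y->left = x; return y; }

// Bring `key` (or the last node on its search path) to the root.
SNode* splay(SNode* root, int key) {
    if (root == nullptr || root->key == key) return root;
    if (key < root->key) {
        if (root->left == nullptr) return root;
        if (key < root->left->key) {                          // zig-zig (left-left)
            root->left->left = splay(root->left->left, key);
            root = rotateRight(root);
        } else if (key > root->left->key) {                   // zig-zag (left-right)
            root->left->right = splay(root->left->right, key);
            if (root->left->right != nullptr) root->left = rotateLeft(root->left);
        }
        return root->left == nullptr ? root : rotateRight(root);   // final zig
    }
    if (root->right == nullptr) return root;
    if (key > root->right->key) {                             // zag-zag (right-right)
        root->right->right = splay(root->right->right, key);
        root = rotateLeft(root);
    } else if (key < root->right->key) {                      // zag-zig (right-left)
        root->right->left = splay(root->right->left, key);
        if (root->right->left != nullptr) root->right = rotateRight(root->right);
    }
    return root->right == nullptr ? root : rotateLeft(root);       // final zag
}

// Insert by splaying, then making the new key the root; the new record ends up at the root.
SNode* splayInsert(SNode* root, int key) {
    if (root == nullptr) return new SNode(key);
    root = splay(root, key);
    if (root->key == key) return root;                        // already present
    SNode* node = new SNode(key);
    if (key < root->key) { node->right = root; node->left = root->left; root->left = nullptr; }
    else { node->left = root; node->right = root->right; root->right = nullptr; }
    return node;
}

void inorder(const SNode* t, std::vector<int>& out) {
    if (t == nullptr) return;
    inorder(t->left, out);
    out.push_back(t->key);
    inorder(t->right, out);
}

void freeTree(SNode* t) {
    if (t == nullptr) return;
    freeTree(t->left);
    freeTree(t->right);
    delete t;
}
```

After every access the touched node sits at the root, so recently admitted or frequently used records are reached in few steps.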
Red-black Trees
A red-black tree is a BST with one extra bit of storage per node: its colour, which can be either red or black
Properties of red-black trees
Every node is either red or black
All the external nodes (leaf nodes) are black
The rank in a tree goes from zero up to the maximum rank, which occurs at the root. The ranks of a node and its parent differ by at most 1. Each leaf node has rank 0
If a node is red, then both its children are black. In other words, consecutive red nodes are disallowed: every red node is followed by a black node, whereas a black node may be followed by either a black or a red node
This implies that at most 50% of the nodes on any path from an external node to the root are red
Properties of red-black trees
The number of black nodes on any path from a node x down to a leaf, not counting x itself, is called the black height of x, denoted bh(x)
Every simple path from the root to a leaf contains the same number of black nodes
More generally, every simple path from a node to a descendant leaf contains the same number of black nodes
If a black node has rank r, then its parent has rank r + 1
If a red node has rank r, then its parent also has rank r
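A sketch of checking these properties in C++, assuming null child pointers stand for the black external nodes. `blackHeight` returns bh(x) as defined above, or −1 if a red node has a red child or the paths below some node contain different numbers of black nodes. `Color`, `RBNode`, `isBlack` and `blackHeight` are hypothetical names.

```cpp
#include <memory>

enum class Color { Red, Black };

struct RBNode {
    int key = 0;
    Color color = Color::Black;
    std::unique_ptr<RBNode> left, right;   // nullptr stands for a black external node
};

bool isBlack(const RBNode* n) { return n == nullptr || n->color == Color::Black; }

// bh(x): black nodes on a path from x (excluded) down to an external node (included),
// or -1 if the subtree breaks the red-child rule or the equal-black-count rule.
int blackHeight(const RBNode* x) {
    if (x == nullptr) return 0;                    // external node: nothing below it
    int l = blackHeight(x->left.get());
    int r = blackHeight(x->right.get());
    if (l < 0 || r < 0) return -1;
    int viaLeft = l + (isBlack(x->left.get()) ? 1 : 0);
    int viaRight = r + (isBlack(x->right.get()) ? 1 : 0);
    if (viaLeft != viaRight) return -1;            // paths disagree on black count
    if (x->color == Color::Red && (!isBlack(x->left.get()) || !isBlack(x->right.get())))
        return -1;                                 // red node with a red child
    return viaLeft;
}
```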
KD-Trees
A KD-tree (k-dimensional tree) is a data structure used for orthogonal range searching, for instance, to find the set of points that fall inside a given rectangle. Each level of the tree splits the points on one coordinate, cycling through the k coordinates
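A minimal two-dimensional KD-tree sketch, assuming integer points: insertion alternates between the x and y coordinates by depth, and the range search skips any subtree that cannot intersect the query rectangle. `Point`, `KDNode`, `kdInsert` and `rangeSearch` are illustrative names.

```cpp
#include <array>
#include <memory>
#include <vector>

using Point = std::array<int, 2>;   // {x, y}

struct KDNode {
    Point p;
    std::unique_ptr<KDNode> left, right;
};

// Insert a point; the splitting coordinate alternates x, y, x, ... with depth.
void kdInsert(std::unique_ptr<KDNode>& node, const Point& p, int depth = 0) {
    if (!node) { node = std::make_unique<KDNode>(); node->p = p; return; }
    int axis = depth % 2;
    if (p[axis] < node->p[axis]) kdInsert(node->left, p, depth + 1);
    else kdInsert(node->right, p, depth + 1);
}

// Report every point with lo[0] <= x <= hi[0] and lo[1] <= y <= hi[1].
void rangeSearch(const KDNode* node, const Point& lo, const Point& hi,
                 std::vector<Point>& out, int depth = 0) {
    if (!node) return;
    const Point& p = node->p;
    if (lo[0] <= p[0] && p[0] <= hi[0] && lo[1] <= p[1] && p[1] <= hi[1]) out.push_back(p);
    int axis = depth % 2;
    // Left subtree: coordinate < p[axis]. Visit only if the rectangle extends below p[axis].
    if (lo[axis] < p[axis]) rangeSearch(node->left.get(), lo, hi, out, depth + 1);
    // Right subtree: coordinate >= p[axis]. Visit only if the rectangle reaches p[axis].
    if (hi[axis] >= p[axis]) rangeSearch(node->right.get(), lo, hi, out, depth + 1);
}
```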
AA Tree
An AA tree is a balanced BST with the following properties:
Every node is colored either red or black
The root is black
If a node is red, both of its children are black
Every path from a node to a null reference has the same number of black nodes
Left children may not be red
Advantages of AA Trees
They eliminate half of the restructuring cases
They simplify deletion by removing an annoying case: if an internal node has only one child, that child must be a red right child. We can therefore always replace a node with the smallest node in its right subtree, which will either be a leaf node or have a red child
An AA tree is a balanced BST and supports efficient operations, since most operations only have to traverse one or two root-to-leaf paths
Representing Balance Information in an AA Tree
In each node of an AA tree, we store a level instead of a colour. The level is defined by the following rules:
If a node is a leaf, its level is one
If a node is red, its level is the level of its parent
If a node is black, its level is one less than the level of its parent
Equivalently, the level is the number of left links on the path from the node to a null reference
Links in an AA Tree
A horizontal link is a connection between a node and a child with the same level. Horizontal links have the following properties:
Horizontal links are right references
There cannot be two consecutive horizontal links
Nodes at level two or higher must have two children
If a node has no right horizontal link, its two children are at the same level
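AA-tree insertion is usually written with two rotations, commonly called skew and split: skew removes a left horizontal link with a right rotation, and split removes two consecutive right horizontal links with a left rotation that promotes the middle node one level. The C++ sketch below assumes integer keys and ignores duplicates; `isValidAA` checks the level rules above and, like the other names here, is hypothetical.

```cpp
#include <memory>
#include <vector>

// AA-tree node: a level replaces the colour bit; a new node (a leaf) has level 1.
struct AANode {
    int key;
    int level = 1;
    std::unique_ptr<AANode> left, right;
    explicit AANode(int k) : key(k) {}
};
using Link = std::unique_ptr<AANode>;

// skew: a left child on the same level is a left horizontal link; rotate right.
void skew(Link& t) {
    if (t && t->left && t->left->level == t->level) {
        Link l = std::move(t->left);
        t->left = std::move(l->right);
        l->right = std::move(t);
        t = std::move(l);
    }
}

// split: two consecutive right horizontal links; rotate left and promote the middle node.
void split(Link& t) {
    if (t && t->right && t->right->right && t->right->right->level == t->level) {
        Link r = std::move(t->right);
        t->right = std::move(r->left);
        r->left = std::move(t);
        t = std::move(r);
        ++t->level;
    }
}

void aaInsert(Link& t, int key) {
    if (!t) { t = std::make_unique<AANode>(key); return; }
    if (key < t->key) aaInsert(t->left, key);
    else if (key > t->key) aaInsert(t->right, key);
    else return;                                   // duplicate keys are ignored
    skew(t);                                       // repair on the way back up
    split(t);
}

// Check the level and horizontal-link rules for every node.
bool isValidAA(const AANode* t) {
    if (!t) return true;
    if (!t->left && !t->right && t->level != 1) return false;               // leaves are level 1
    if (t->left && t->left->level != t->level - 1) return false;            // no left horizontal links
    if (t->right && t->right->level != t->level && t->right->level != t->level - 1) return false;
    if (t->right && t->right->right && t->right->right->level == t->level) return false;  // no two in a row
    if (t->level > 1 && (!t->left || !t->right)) return false;              // level >= 2: two children
    return isValidAA(t->left.get()) && isValidAA(t->right.get());
}

void inorder(const AANode* t, std::vector<int>& out) {
    if (!t) return;
    inorder(t->left.get(), out);
    out.push_back(t->key);
    inorder(t->right.get(), out);
}
```

Inserting keys in increasing order, the worst case for a plain BST, still leaves a tree that satisfies every rule.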
Summary
A node of a BST stores only one key value. A node of a multiway tree stores many key values and may therefore have multiple subtrees
Indexing techniques are used to locate a record quickly: a hashed index gives expected O(1) search time, and a balanced tree index such as a B-tree needs a number of node accesses that grows logarithmically with the number of records. An index entry is a pair of a key value and an address; indexing is a form of indirect addressing that imposes an order on a file without rearranging the file
Indexing techniques include hashed indexing and tree indexing (B-trees, B+ trees, and tries)
Splay trees are self-adjusting trees