database tuning prerequisite cluster index b+tree indexing hash indexing isam (indexed sequential...
TRANSCRIPT
Database Tuning
PrerequisiteCluster Index
B+Tree IndexingHash Indexing
ISAM (indexed Sequential access)
Physical Data base Design and Tuning
• The creation of physical schema after the creation of conceptual schema and integrity constrains is physical data base design
• Refining the conceptual schema and hence modifying physical schema and indexing schemes to improve the performance of the underlying database is called tuning
• Work load. An estimate of the frequency and selectivity of a list of expected queries and update operations, with user performance preferences if any is called work load
Work load• Work load
– List of queries and their frequencies– List of updates and their frequencies– Performance goals (as expected by users) on the listed queries and updates
• Analysis on the work load– For each query Identify
• the relations which are accessed by the operation• Attributes which are retained in the (SELECT CLAUSE)• Attributes which have selection or join conditions on them in (WHERE CLAUSE)• Selectivity of the conditions
– For each update operation Identify • the relations which are accessed by the operation• Kind of update (DELETE, INSERT or UPDATE)• The fields that are modified by the UPDATE
Important decisions made during tuning
• Creation of Indices– What type of index to use– Should the index be clustered or not
• Alteration to the conceptual schema– Choosing alternative normalization schemes– Denormalization– Vertical Partitioning– Views
• Rewriting frequently run queries and transactions effectively
Guidelines with example
• Decide whether to index a particular query attribute or notIssues– When there is no index the time to search a tuple is O(n) – b is
the total number of tuples– Indexing an attribute does increase the query performance,
since the searched tuple can easily be traced without a sequential scan less time
• Caution– When there are more updates on the attribute than query
then adding an index may increase the overhead of update operations
Guideline 2
• Selection conditions decide Indexing techniques– An equality selection would be faster with a hash
indexing schemeSELECT E.DnoFROM Employee EWHERE E.ename = ‘Roy’– A range selection would be faster with a B+Tree indexingSELECT E.DnoFROM Employee EWHERE E.Age < 40
Guideline 3
• Join conditions decide the indexing Scheme– When there is a join condition on the attributes to
be indexed– Selecting a hash indexing scheme on the inner
relation attribute would be better than B+TreeSELECT E.ename, D.mgrFROM Employee E, Department DWHERE E.dno = D.dno A hash indexing on D.dno may be faster than B+Tree
Guideline 4• A very important decision regarding which attribute should be
chosen for clustering– At most one attribute can be used for clustering (i.e., physical ordering
of the records in the disk)– Only sequential scan on the attribute benefits from clustering since
there will be fewer disk accesses on the tuple– If a selectivity of a query is low or if the frequency of a particular query
is low, clustering the corresponding attributes may not be beneficial– If there is no sequencial access but only random accesses on the
attribute, then a hash indexing without clustering may be enough.– To choose the best attribute for clustering, consider all the queries,
updates, their frequencies and selectivity and choose the attributes which in frequent queries with range conditions with high selectivity.
Guideline 5
• Balancing the cost of index maintenance– Indexes may need to be dropped if they slow
down any of the frequent update operation– Indexes may also speed up update operations
when there is a WHERE condition in the update operation• UPDATE SET E.Income = 30000 WHERE E.eid = 123
Additonal Guideline 1Analyse the query evaluation plan of the underlying
database engine • E.g.,• SELECT E.ename, D.mgr FROM Employee E, Department D WHERE D.dname = ‘Toy’ AND E.dno = D.dno
Here both D.dname and D.dno may be indexed
Query Evaluation Plan guides index choices
Department D
• Department
σ Dname =‘Toy’
D.dno = E.dno
∏ E.ename
Employee E
the join is done on the resulting tuple and not on the database. So an
index on D.dno is unnecessary
All tuples with Dname = ‘Toy’ gets selected from the
database
Always the attributes in the lowest point of the plan are to be indexed. Others need not be indexed
Clustering IndicesClustered B+Tree UnClustered B+Trees
22 53
20 22 23 53 55 60
20 22 23 53 55 60
22 53
20 22 23 53 55 60
53 55 20 22 60 23
Physical Storage in Secondary Storage devices
If three records fit in a pageA range operation < 60 will make 2 access to the disk
If three records fit in a pageA range operation < 60 will make6 access to the disk once for every tuple
Selectivity decides clustering and non clustering index schemes
• When there are more than one query plan available, the selectivity of a condition decides the best plan and also the attribute to be clustered
• SELECT E.ename , D.Mgr FROM Employee E, Department D WHERE E.hobby = ‘stamps’ E.salary Between 10000 AND 30000 E. Dno = D.Dno
Query Plan 1• Is almost all are collecting stamps and are salaried in the given
range, then an index need not be present on the two attributes (High selectivity).
Department D
E.dno = D.dno
∏ E.ename
Employee E
Selection is done on the resulting tuple. So an index on E.hobby and
E.salary are unnecessaryσ E.Hobby =‘Stamps’
σ Salary Between(10000,30000)
A join is done using E.dno as external tuple. So a
clustered index on E.dno is necessary A hash index is desirable for
internal tuple attribute D.dno. It need not be clustered
because it is not accessed sequentially
Query Plan 2• If only a few are collecting stamps (Low selectivity)
Employee E
D.dno = E.dno
∏ E.ename
Department D
Selection is done on the resulting tuple. So an index on E.Dno and
E.salary are not necessary
σ E.Hobby =‘Stamps’
σ Salary Between(10000,30000)
Selection is done on the resulting tuple. So an index on E.Hobby is necessary and also a
clustered index
Impact of clustering on Cost of operation
Cost
Percentage of tuples retrieved (selectivity)
Unclustered index scheme
Percentage of tuples retrieved (selectivity)
Clustered index scheme
Percentage of tuples retrieved (selectivity)
Sequential scan
Cost
Cost
Percentage of tuples retrieved (selectivity)
All schemes
Cost
The range for which unclustered index is
better than sequential scan
Co-clustered indexing• Co clustered index on manager income in department tuple and employee income in Employee
tuple• Each Department record is followed by its employees’ records in the secondary storage device.
Clustered B+Tree
22 53
20 22 23 53 55 60
20 22 23 53 55 60
Clustered Index on manager income (22 => 22000)
12 15 18 20 21 21 30 35 38 44 33 48
SELECT ALL E.ename FROM Employee E , Department D WHERE D.MgrIncome > 53 AND D.Dno = E.Dno
• Selecting all employees who work under managers with income > 53• Only looks through the index on manager’s income an directly accesses his employees records on the disk
Indexes on multiple attribute search keysThe following query returns all employees with age
between 20 and 30 and having a salary from 3000 to 5000
SELECT E.eidFROM Employee EWHERE E.age BETWEEN 20 and 30AND E.sal BETWEEN 3000 and 5000
A composite clustered B+ tree index on (age ,salary) would be a better choice than one on (salary,age) since more employees would be
having same age than the same salary
A composite clustered index on (age,sal) first sorts tuples according to age and then within each age group sorts them according to salary
Index only plans• If only index files are used to answer the query wthout
accessing the actual data from the table then it is called index only plan
• If a composite index happens to have all relevant attributes of a query (including the one after the SELECT clause) as a search key then no database access is needed.
SELECT E.eidFROM Employee EWHERE E.age BETWEEN 20 and 30AND E.sal BETWEEN 3000 and 5000
A composite index on (age, sal, eid) is defined then no database access is needed. All eids in the index file satisfying the condition can be
displayed as result
In an index only access like this one clustering of physical records is not necessary.
Index only plans
SELECT E.dno, COUNT(*)FROM Employee EGROUP BY E.dno
SELECT E.dno, MIN(E.sal)FROM Employee EGROUP BY E.dno
In the example if there is an index on E.dno we can just count the number of entries for each dno and give that as an answer for count(*). We need not access the secondary storage for tuples. This is also an example of index only plans
Since index only schemes are faster, composite indices may be defined for attributes which are just projected.
In the example salary is not used in conditions but having a composite (dno,sal) will save us from accessing the secondary device and just get away with the index filesSee for more examples from pages 472 an 473
in Gehrke
Data base Tuning Query Rewriting
SELECT MIN (E.age)FROM Employee EGROUP BY E.DnoHAVING E.Dno = 102
SELECT MIN (E.age)FROM Employee EWHERE E.Dno = 102
SELECT * INTO TEMPFROM Employee E, Department DWHERE E.Dno = D.dno AND D.mgrname = ‘robby’
SELECT T.Dno, Avg(T.Sal)FROM TEMP TGROUP BY T.Dno
SELECT E.Dno, Avg(E.Sal)FROM Employee E, Department DWHERE E.Dno = D.dno AND D.mgrname = ‘robby’GROUP BY E.Dno
Eliminating GROUP operation avoids expensive sorting
Combining steps to form a single query avoids creating an unnecessary table
Temp
Impact of concurrency
• The duration for which transactions hold a lock can affect performance significantly. Tuning transactions by writing to local variables and deferring database access can improve performance
• Replacing a transaction with several smaller transactions improve performance
• A careful partitioning of tuples in a relation and its associated indexes across a collection of disks can improve concurrent access
• If DBMS uses specialized locking protocols for tree indexes and sets fine – granularity locks concurrencu improves
Performance benchmarks• To assist users in choosing a DBMS that well suits their needs,
several performance benchmarks have been developed.• They should be portable, easy to understand, and scale naturally
to larger problem instances• They should measure peak performance as well as
price/performance ratios• Transaction processing Council was created to define
benchmarks for transaction processing and database systems• E.g. TCP-A TCP- B bench marks (Online transaction processing
benchmarks)• Wisconsin benchmark (Query benchmark)• 001 and 007 benchmarks (Object Database benchmarks)Read Section 16.8 in Gerhke for Tuning Conceptual Schema