database tuning prerequisite cluster index b+tree indexing hash indexing isam (indexed sequential...

Database Tuning

PrerequisiteCluster Index

B+Tree IndexingHash Indexing

ISAM (indexed Sequential access)

Physical Data base Design and Tuning

• The creation of physical schema after the creation of conceptual schema and integrity constrains is physical data base design

• Refining the conceptual schema and hence modifying physical schema and indexing schemes to improve the performance of the underlying database is called tuning

• Work load. An estimate of the frequency and selectivity of a list of expected queries and update operations, with user performance preferences if any is called work load

Work load• Work load

– List of queries and their frequencies– List of updates and their frequencies– Performance goals (as expected by users) on the listed queries and updates

• Analysis on the work load– For each query Identify

• the relations which are accessed by the operation• Attributes which are retained in the (SELECT CLAUSE)• Attributes which have selection or join conditions on them in (WHERE CLAUSE)• Selectivity of the conditions

– For each update operation Identify • the relations which are accessed by the operation• Kind of update (DELETE, INSERT or UPDATE)• The fields that are modified by the UPDATE

Important decisions made during tuning

• Creation of Indices– What type of index to use– Should the index be clustered or not

• Alteration to the conceptual schema– Choosing alternative normalization schemes– Denormalization– Vertical Partitioning– Views

• Rewriting frequently run queries and transactions effectively

Guidelines with example

• Decide whether to index a particular query attribute or notIssues– When there is no index the time to search a tuple is O(n) – b is

the total number of tuples– Indexing an attribute does increase the query performance,

since the searched tuple can easily be traced without a sequential scan less time

• Caution– When there are more updates on the attribute than query

then adding an index may increase the overhead of update operations

Guideline 2

• Selection conditions decide Indexing techniques– An equality selection would be faster with a hash

indexing schemeSELECT E.DnoFROM Employee EWHERE E.ename = ‘Roy’– A range selection would be faster with a B+Tree indexingSELECT E.DnoFROM Employee EWHERE E.Age < 40

Guideline 3

• Join conditions decide the indexing Scheme– When there is a join condition on the attributes to

be indexed– Selecting a hash indexing scheme on the inner

relation attribute would be better than B+TreeSELECT E.ename, D.mgrFROM Employee E, Department DWHERE E.dno = D.dno A hash indexing on D.dno may be faster than B+Tree

Guideline 4• A very important decision regarding which attribute should be

chosen for clustering– At most one attribute can be used for clustering (i.e., physical ordering

of the records in the disk)– Only sequential scan on the attribute benefits from clustering since

there will be fewer disk accesses on the tuple– If a selectivity of a query is low or if the frequency of a particular query

is low, clustering the corresponding attributes may not be beneficial– If there is no sequencial access but only random accesses on the

attribute, then a hash indexing without clustering may be enough.– To choose the best attribute for clustering, consider all the queries,

updates, their frequencies and selectivity and choose the attributes which in frequent queries with range conditions with high selectivity.

Guideline 5

• Balancing the cost of index maintenance– Indexes may need to be dropped if they slow

down any of the frequent update operation– Indexes may also speed up update operations

when there is a WHERE condition in the update operation• UPDATE SET E.Income = 30000 WHERE E.eid = 123

Additonal Guideline 1Analyse the query evaluation plan of the underlying

database engine • E.g.,• SELECT E.ename, D.mgr FROM Employee E, Department D WHERE D.dname = ‘Toy’ AND E.dno = D.dno

Here both D.dname and D.dno may be indexed

Query Evaluation Plan guides index choices

Department D

• Department

σ Dname =‘Toy’

D.dno = E.dno

∏ E.ename

Employee E

the join is done on the resulting tuple and not on the database. So an

index on D.dno is unnecessary

All tuples with Dname = ‘Toy’ gets selected from the

database

Always the attributes in the lowest point of the plan are to be indexed. Others need not be indexed

Clustering IndicesClustered B+Tree UnClustered B+Trees

22 53

20 22 23 53 55 60

20 22 23 53 55 60

22 53

20 22 23 53 55 60

53 55 20 22 60 23

Physical Storage in Secondary Storage devices

If three records fit in a pageA range operation < 60 will make 2 access to the disk

If three records fit in a pageA range operation < 60 will make6 access to the disk once for every tuple

Selectivity decides clustering and non clustering index schemes

• When there are more than one query plan available, the selectivity of a condition decides the best plan and also the attribute to be clustered

• SELECT E.ename , D.Mgr FROM Employee E, Department D WHERE E.hobby = ‘stamps’ E.salary Between 10000 AND 30000 E. Dno = D.Dno

Query Plan 1• Is almost all are collecting stamps and are salaried in the given

range, then an index need not be present on the two attributes (High selectivity).

Department D

E.dno = D.dno

∏ E.ename

Employee E

Selection is done on the resulting tuple. So an index on E.hobby and

E.salary are unnecessaryσ E.Hobby =‘Stamps’

σ Salary Between(10000,30000)

A join is done using E.dno as external tuple. So a

clustered index on E.dno is necessary A hash index is desirable for

internal tuple attribute D.dno. It need not be clustered

because it is not accessed sequentially

Query Plan 2• If only a few are collecting stamps (Low selectivity)

Employee E

D.dno = E.dno

∏ E.ename

Department D

Selection is done on the resulting tuple. So an index on E.Dno and

E.salary are not necessary

σ E.Hobby =‘Stamps’

σ Salary Between(10000,30000)

Selection is done on the resulting tuple. So an index on E.Hobby is necessary and also a

clustered index

Impact of clustering on Cost of operation

Cost

Percentage of tuples retrieved (selectivity)

Unclustered index scheme


Clustered index scheme


Sequential scan

Cost

Cost


All schemes

Cost

The range for which unclustered index is

better than sequential scan

Co-clustered indexing• Co clustered index on manager income in department tuple and employee income in Employee

tuple• Each Department record is followed by its employees’ records in the secondary storage device.

Clustered B+Tree

22 53

20 22 23 53 55 60

20 22 23 53 55 60

Clustered Index on manager income (22 => 22000)

12 15 18 20 21 21 30 35 38 44 33 48

SELECT ALL E.ename FROM Employee E , Department D WHERE D.MgrIncome > 53 AND D.Dno = E.Dno

• Selecting all employees who work under managers with income > 53• Only looks through the index on manager’s income an directly accesses his employees records on the disk

Indexes on multiple attribute search keysThe following query returns all employees with age

between 20 and 30 and having a salary from 3000 to 5000

SELECT E.eidFROM Employee EWHERE E.age BETWEEN 20 and 30AND E.sal BETWEEN 3000 and 5000

A composite clustered B+ tree index on (age ,salary) would be a better choice than one on (salary,age) since more employees would be

having same age than the same salary

A composite clustered index on (age,sal) first sorts tuples according to age and then within each age group sorts them according to salary

Index only plans• If only index files are used to answer the query wthout

accessing the actual data from the table then it is called index only plan

• If a composite index happens to have all relevant attributes of a query (including the one after the SELECT clause) as a search key then no database access is needed.

SELECT E.eidFROM Employee EWHERE E.age BETWEEN 20 and 30AND E.sal BETWEEN 3000 and 5000

A composite index on (age, sal, eid) is defined then no database access is needed. All eids in the index file satisfying the condition can be

displayed as result

In an index only access like this one clustering of physical records is not necessary.

Index only plans

SELECT E.dno, COUNT(*)FROM Employee EGROUP BY E.dno

SELECT E.dno, MIN(E.sal)FROM Employee EGROUP BY E.dno

In the example if there is an index on E.dno we can just count the number of entries for each dno and give that as an answer for count(*). We need not access the secondary storage for tuples. This is also an example of index only plans

Since index only schemes are faster, composite indices may be defined for attributes which are just projected.

In the example salary is not used in conditions but having a composite (dno,sal) will save us from accessing the secondary device and just get away with the index filesSee for more examples from pages 472 an 473

in Gehrke

Data base Tuning Query Rewriting

SELECT MIN (E.age)FROM Employee EGROUP BY E.DnoHAVING E.Dno = 102

SELECT MIN (E.age)FROM Employee EWHERE E.Dno = 102

SELECT * INTO TEMPFROM Employee E, Department DWHERE E.Dno = D.dno AND D.mgrname = ‘robby’

SELECT T.Dno, Avg(T.Sal)FROM TEMP TGROUP BY T.Dno

SELECT E.Dno, Avg(E.Sal)FROM Employee E, Department DWHERE E.Dno = D.dno AND D.mgrname = ‘robby’GROUP BY E.Dno

Eliminating GROUP operation avoids expensive sorting

Combining steps to form a single query avoids creating an unnecessary table

Temp

Impact of concurrency

• The duration for which transactions hold a lock can affect performance significantly. Tuning transactions by writing to local variables and deferring database access can improve performance

• Replacing a transaction with several smaller transactions improve performance

• A careful partitioning of tuples in a relation and its associated indexes across a collection of disks can improve concurrent access

• If DBMS uses specialized locking protocols for tree indexes and sets fine – granularity locks concurrencu improves

Performance benchmarks• To assist users in choosing a DBMS that well suits their needs,

several performance benchmarks have been developed.• They should be portable, easy to understand, and scale naturally

to larger problem instances• They should measure peak performance as well as

price/performance ratios• Transaction processing Council was created to define

benchmarks for transaction processing and database systems• E.g. TCP-A TCP- B bench marks (Online transaction processing

benchmarks)• Wisconsin benchmark (Query benchmark)• 001 and 007 benchmarks (Object Database benchmarks)Read Section 16.8 in Gerhke for Tuning Conceptual Schema

database tuning prerequisite cluster index b+tree indexing hash indexing isam (indexed sequential...

Documents