oracle perf tune - intro

8/12/2019 Oracle Perf Tune - Intro

1/16

Oracle Data Architecture

Introduction to Performance Tuning

Performance tuning focuses primarily on writing efficient SQL, allocating appropriate computing

Resources, and analyzing wait events and contention in the system. Unlike several other

features of Oracle database management, performance tuning isnt a cut and

dried subject with clear prescriptions and rules for every type of problem you may face. This is

one area where your technical knowledge must be used together with constant experimentation

and observation

A Systematic Approach to Performance Tuning

Its important to follow a systematic approach to tuning database performance. Performance

problems commonly come to the fore only after a large number of users start working on a new


2/16

production database. The system seems fine during development and rigorous testing, but it

slows down to a crawl when it goes to production. This could be because the application isnt

easily scalable for a number of reasons.

The seeds of the future performance potential of your database are planted when you design

your database. You need to know the nature of the applications the database is going to

support.

The more you understand your application, the better you can prepare for it by creating your

database

with the right configuration parameters. If major mistakes were made during the design stage

and the database is already built, youre left with tuning application code on one hand and the

database resources such as memory, CPU, and I/O on the other. Oracle suggests a specific

design approach with the following steps:

1. Design the application correctly.

2. Tune the application SQL code.

3. Tune memory.

4. Tune I/O.

5. Tune contention and other issues.

Reactive & Reactive Performance Tuning

Although the preceding performance tuning steps suggests that you can follow the sequence in

an orderly fashion, the reality is completely different. Performance tuning is an iterative process,

not a sequential one where you start at the top and end up with a fully tuned database as the

product.

As a DBA, you may be involved in a new project from the outset, when you have just the

functional requirements. In this case, you have an opportunity to be involved in the tuning effort

from the beginning stages of the application, a phase that is somewhat misleadingly dubbed

proactive tuning by some. Alternatively, you may come in after the application has already been

designed and implemented, and is in production. In this case, your performance efforts are

categorized as reactive performance tuning. What you can do to improve the performance of the


3/16

database depends on the stage at which you can have input, and on the nature of the

application itself.

Optimizing Oracle Query Processing

When a user starts a data-retrieval operation, the users SQL statement goes through several

sequential steps that together constitute query processing. One of the great benefits of using

the SQL language is that it isnt a procedural language in which you have to specify the steps to

be followed to achieve the statements goal. In other words, you dont have to state how to do

something; rather, you just state what you need from the database.

Query processing is the transformation of your SQL statement into an efficient execution plan

to return the requested data from the database. Query optimization is the process of choosing

the most efficient execution plan. The goal is to achieve the result with the least cost in terms of

resource usage. Resources include the I/O and CPU usage on the server where your database

is running. This also means that the goal is to reduce the total execution time of the query,

which is simply the sum of the execution times of all the component operations of the query.

This optimization of throughput may not be the same as minimizing response time. If you want

to minimize the time it takes to get the first n rows of a query instead of the entire output of the

query, the Optimizer may choose a different plan. If you choose to minimize the response time

for all the query data, you may also choose to parallelize the operation.

A users SQL statement goes through theparsing, optimizing, and execution stages. If the

SQL statement is a query, data has to be retrieved, so theres an additional fetch stage before

the SQL statement processing is complete.

Parsing

Parsing primarily consists of checking the syntax and semantics of the SQL statements. The

end product of the parse stage of query compilation is the creation of theparse tree, which

represents the querys structure.

The SQL statement is decomposed into a relational algebra query thats analyzed to see

Whether its syntactically correct. The query then undergoes semantic checking. Thedata

dictionary is consulted to ensure that the tables and the individual columns that are referenced


4/16

in the query do exist, as well as all the object privileges. In addition, the column types are

checked to ensure that the data matches the column definitions. The statement is normalized so

it can be processed more efficiently. The query is rejected if it is incorrectly formulated. Once

the parse tree passes all the syntactic and semantic checks, its considered a valid parse tree,

and its sent to the logical query plan generation stage. All these operations take place in the

library cache portion of the SGA.

Optimization

During the optimization phase, Oracle uses its optimizerwhich is a cost-based optimizer

(CBO)to choose the best access method for retrieving data for the tables and indexes

referred to in the query. Using statistics that you provide and any hints specified in the SQL

queries, the CBO produces an optimal execution plan for the SQL statement.

Execution Plan Generation Phase

During this phase, Oracle transforms the logical query plan into a physical query plan. The

Optimizer may be faced with a choice of several algorithms to resolve a query. It needs to

choose the most efficient algorithm to answer a query, and it needs to determine the most

efficient way to implement the operations. In addition to deciding on the best operational steps,

the Optimizer determines the order in which it will perform these steps. For example, the

Optimizer may decide that a join between table A and table B is called for. It then needs to

decide on the type of join and the order in which it will perform the table join.

The physical query or execution plan takes into account the following factors:

The various operations (for example, joins) to be performed during the query

The order in which the operations are performed

The algorithm to be used for performing each operation

The best way to retrieve data from disk or memory

The Optimizer may generate several valid physical query plans, all of which are potential

execution plans. The Optimizer then chooses among them by estimating the cost of each

possible physical plan based on the table and index statistics available to it, and selecting the

plan with the lowest estimated cost. This evaluation of the possible physical query plans is

called cost-based query optimization. The cost of executing a plan is directly proportional to the

amount of resources such as I/O, memory, and CPU necessary to execute the proposed plan.

The Optimizer passes this lowcost physical query plan to Oracles query execution engine.


5/16

An Example of Cost Optimization

Lets say you want to run the following query, which seeks to find all the supervisors who work

in Dallas. The query looks like this:

Now, you have several ways to arrive at the list of the supervisors. Lets consider three ways to

arrive at this list, and compute the cost of accessing the results in each of the three ways.

Make the following simplifying assumptions for the cost computations:

You can only read and write data one row at a time (in the real world, you do I/O at the block

level, not the row level).

The database writes each intermediate step to disk (again, this may not be the case in the

real world).

No indexes are on the tables.

The employee table has 2,000 rows.

The dept table has 40 rows. The number of supervisors is also 40 (one for each department).

Ten departments are in the city of Dallas.

In the following sections, youll see three different queries that retrieve the same data, but tha t

use different access methods. For each query, a crude cost is calculated, so you can compare

how the three queries stack up in terms of resource cost. The first query uses a Cartesian join.

Query 2: A Join of Two Tables

The second query uses a join of the employee and dept tables. First, join the employee and

dept tables on the dept_no column. From this join, select all rows where e.job=supervisor and

city=Dallas.

The following would be the total cost of performing the query:

Joining the employee and dept tables first requires a read of all the rows in both tables:

2,000 + 40 = 2,040


6/16

Creating the join of the employee and dept tables: 2,000 writes

Reading the join results costs: 2,000 reads

Total I/O cost: 2,040 + 2,000 + 2,000 = 6,040

Query 3: A Join of Reduced Relations

The third query also uses a join of the employee and dept tables, but not all the rows in the two

tablesonly selected rows from the two tables are joined. Heres how this query would proceed

to retrieve the needed data. First, read the employee table to get all supervisor rows. Next, read

the dept table to get all Dallas departments. Finally, join the rows you derived from the

employee and the dept tables.

The following would be the total cost of performing the query:

Reading the employee table to get the supervisor rows: 2,000 reads

Writing the supervisor rows derived in the previous step: 40 writes

Reading the dept table to get all Dallas departments: 40 reads

Writing the Dallas department rows derived from the previous step: 10 writes

Joining the supervisor rows and department rows derived in the previous steps of this query

execution: A total of 40 + 10 = 50 writes

Reading the join result from the previous step: 50 reads

Total I/O cost: 2,000 + 2(40) + 10 + 2(50) = 2,190

This example, simplified as it may be, shows you that Cartesian products are more expensive

than more restrictive joins. Even a selective join operation, the results show, is more expensive

than a selection operation. Although a join operation is in query 3, itsa join of two reduced

relations; the size of the join is much smaller than the join in query 2. Query optimization often

involves early selection (picking only some rows) and projection (picking only some columns)

operations to reduce the size of the resulting outputs or row sources.

Query Execution

During the final stage of query processing, the optimized query (the physical query plan that has

been selected) is executed. If its a SELECT statement, the rows are returned to the user. If its

an INSERT, UPDATE, or DELETE statement, the rows are modified. The SQL execution engine

takes the execution plan provided by the optimization phase and executes it.

Of the three steps involved in SQL statement processing, the optimization process is the


7/16

crucial one because it determines the all-important question of how fast your data will be

retrieved.

Understanding how the Optimizer works is at the heart of query optimization. Its important to

know what the common access methods, join methods, and join orders are in order to write

efficient SQL. The next section presents a detailed discussion of the all-powerful Oracle CBO.

Query Optimization and the Oracle

Cost-Based Optimizer

In most cases, you have multiple ways to execute a SQL query. You can get the same results

from doing a full table scan or using an index. You can also retrieve the same data by accessing

the tables and indexes in a different order. The job of the Optimizer is to find the optimal or best

plan to execute your DML statements such as SELECT, INSERT, UPDATE, and DELETE.

Oracle uses the CBO to help determine efficient methods to execute queries.

The CBO uses statistics on tables and indexes, the order of tables and columns in the SQL

statements, available indexes, and any user-supplied access hints to pick the most efficient way

to access them. The most efficient way, according to the CBO, is the least costly access

method, cost being defined in terms of the I/O and the CPU expended in retrieving the rows.

Accessing the necessary rows means Oracle reads the database blocks on the file system into

the buffer pool. The resulting I/O cost is the most expensive part of SQL statement execution

because it involves reading from the disk. You can examine these access paths by using tools

such as the EXPLAIN PLAN.

Choosing Your Optimization Mode

In older versions of Oracle, you had a choice between a rule-based and a cost-based optimizer.

In a rule-based approach. All the access paths were assigned a rank, and the path with the

lowest rank was chosen. The operations with a lower rank usually executed faster than those

with a higher rank. For example, a query that uses the ROWID to search for a row has a cost of

1. This is expected because identifying a row with the help of the ROWID, an Oracle pointer-like

mechanism, is the fastest way to locate a row. On the other hand, a query that uses a full table

scan has a cost of 19, the highest possible cost under rule-based optimization. The CBO

method almost always performs better than the older rule-based approach because, among

other things, it takes into account the latest statistics about the database objects.

Providing Statistics to the Optimizer


8/16

In Oracle Database 10g, the database itself automatically collects the necessary Optimizer

statistics. Every night, the database schedules a statistics collection job during the maintenance

window of the Oracle Scheduler. The maintenance window, by default, extends from 10 PM to 6

AM on weekdays and all weekend as well. The job is named GATHER_STATS_JOB, and runs

by default in every Oracle Database 10g database. You have the ability to disable the

GATHER_STATS_JOB if you wish.

You can get details about this default GATHER_STATS_JOB by querying the

DBA_SCHEDULER_JOBS view.

The GATHER_STATS_JOB collects statistics for all tables that either dont have Optimizer

statistics, or have stale (outdated) statistics. Oracle considers an objects statistics stale if more

than 10 percent of its data has changed since the last time it collected statistics for that object.

By default, Oracle monitors all DML changes such as inserts, updates, and deletes made to all

database objects. You can also view the information about these changes in the

DBA_TAB_MODIFICATIONS view. Based on this default object monitoring, Oracle decides

whether to collect new statistics for an object.

To check that the GATHER_STATS_JOB is indeed collecting statistics on a regular basis, use

the following:

Note the following points about the preceding output:


9/16

The job collects statistics during the maintenance window of the database, which is, by

default, scheduled between 10 PM and 6 AM during weekdays and all weekend.

The statistics are collected by the nightly GATHER_STATS_JOB run by the Scheduler.

If a table is created that day, the job uses all the rows of the table the first time it collects

statistics for the table.

The sampling percentage varies from less than 1 percent to 100 percent.

The size of the table and the percentage of the sample arent correlated.

The job doesnt collect statistics for all the tables each day.

If a tables data doesnt change after its created, the job never collects a second time.

Oracle determines the sample size for each object based on its internal algorithms; there is no

Standard sample size for all objects.

What exactly the statistics does

The number of rows in a table

The number of rows per database block

The average row length

The total number of database blocks in a table

The number of levels in each index

The number of leaf blocks in each index

The number of distinct values in each column of a table

Data distribution histograms

The number of distinct index keys

Cardinality (the number of columns with similar values for each column)

The minimum and maximum values for each column

System statistics, which include I/O characteristics of your system; and CPU statistics, which

include CPU speed and other related statistics

The key to the CBOs capability to pick the best possible query plan is its capability to correctly

estimate the cost of the individual operations of the query plan. These cost estimates are

derived from the knowledge about the I/O, CPU, and memory resources needed to execute

each operation based on the table and index statistics.

The database stores the Optimizer statistics that it collects in its data dictionary. The DBA_


10/16

TAB_STATISTICS table shows Optimizer statistics for all the tables in your database. You can

also see column statistics by querying the DBA_TAB_COL_STATISTICS view

As you can see, more than 22 million PERSON_ID numbers are in the PERSONNEL table.

However, there are only 7,281 distinct last names and 1,729 distinct first names. Of course, the

GENDER column has only two distinct values. The Optimizer takes into account these types of

information regarding your table data, before deciding on the best plan of execution for a SQL

statement that involves the tables columns.

Tip Optimizer statistics include both object (table and index) statistics, as well as system

statistics. Without

accurate system statistics, the Optimizer cant come up with valid cost estimates to evaluate

alternate execution plans.

What Does the Optimizer Do?

The CBO performs several intricate steps to arrive at the optimal execution plan for a users

query. The original SQL statement is most likely transformed, and the CBO evaluates

alternative access paths (for example, full-table or index-based scans). If table joins are

necessary, the Optimizer evaluates all possible join methods and join orders. The Optimizer

evaluates all the possibilities and arrives at the execution plan it deems the cheapest in terms of

total cost, which includes both I/O and CPU resource usage cost.

SQL Transformation

Oracle hardly ever executes your query in its original form. If the CBO determines that a

different SQL formulation will achieve the same results more efficiently, it transforms the

statement before executing it. A good example is where you submit a query with an OR


11/16

condition, and the CBO transforms it into a statement using UNION or UNION ALL. Or your

statement may include an index, but the CBO might transform the statement so it can do a full

table scan, which can be more efficient under some circumstances. In any case, its good to

remember that the query a user wishes to be executed may not be executed in the same form

by Oracle, but the querys results are still the same. Here are some common transformations

performed by the Oracle CBO:

Transform IN into OR statements.

Transform OR into UNION or UNION ALL statements.

Transform noncorrelated nested selects into more efficient joins.

Transform outer joins into more efficient inner joins.

Transform complex subqueries into joins, semi joins, and anti joins.

Star transformation for data warehouse tables based on thestar schema.

Transform BETWEEN to greater than or equal to and less than or equal to statements.

Choosing the Access Path

Oracle can often access the same data through different paths. For each query, the Optimizer

evaluates all the available paths and picks the least expensive one in terms of resource usage.

If joins are involved, then the join order and the join method are evaluated to finally arrive at the

best execution plan. Youll take a brief look at the steps the Optimizer goes through before

deciding on its choice of execution path.

Full Table Scans

Oracle scans the entire table during a full table scan. Oracle reads each block in the table

sequentially, so the full table scan can be efficient if the MULTI_BLOCK_READ_COUNT


12/16

initialization parameter is set high enough. However, for large tables, full table scans are

inefficient in general.

Table Access by ROWID

Accessing a table by ROWID retrieves rows using unique ROWIDs. ROWIDs in Oracle specify

the exact location in the data file and the data block where the row resides, so ROWID access is

the fastest way to retrieve a row in Oracle. Often, Oracle obtains the ROWID through an index

scan of the tables indexes. Using these ROWIDs, Oracle swiftly fetches the rows.

Index Scans

An index stores two things: the column value of the column on which the index is based and the

ROWID of the rows in the table that contain that column value. An index scan retrieves data

from an index using the values of the index columns. If the query requests only the indexed

column values, Oracle will return those values. If the query requests other columns outside the

indexed column, Oracle will use the ROWIDs to get the rows of the table.

Choosing the Join Method

When you need to access data thats in two or more tables, Oracle joins the tables based on a

common column. However, there are several ways to join the row sets returned from the

execution plan steps. For each statement, Oracle evaluates the best join method based on the

statistics and the type of unique or primary keys on the tables. After Oracle has evaluated the

join methods, the CBO picks the join method with the least cost.

The following are the common join methods used by the CBO:

Nested-loop join: A nested-loop join involves the designation of one table as the driving table

(also called the outer table) in the join loop. The other table in the join is called the inner

table. Oracle fetches all the rows of the inner table for every row in the driving table.

Hash join: When you join two tables, Oracle uses the smaller table to build a hash table on

the join key. Oracle then searches the larger table and returns the joined rows from the hash

table.

Sort-merge join: If both the tables in a join are sorted on the join key, the sorted lists are

merged together.


13/16

Choosing the Join Order

Once the Optimizer chooses the join method, it determines the order in which the tables are

joined. The goal of the Optimizer is always to join tables in such a way that the driving table

eliminates the largest number of rows. A query with four tables has a maximum of 4 factorial, or

24, possible ways in which the tables can be joined. Each such join order would lead to a

number of different execution plans, based on the available indexes and the access methods.

The search for an optimal join strategy could take a long time in a query with a large number of

tables, so Oracle depends on an adaptive search strategy to limit the time it takes to find the

best execution plan. An adaptive search strategy means that the time taken for optimization is

always a small percentage of the total time that is taken for execution of the query itself.

Drawbacks of the CBO

The CBO is systematic, but the Optimizer is not guaranteed to follow the same plan in similar

cases. However, the CBO isnt always perfect, and you need to watch out for the following:

Application developers may know more than the CBO when it comes to choosing the best

access path. Application developers know the needs of the users, of which the CBO is

completely unaware. This could lead to a situation where the CBO may be optimizing

throughput, when the users would rather have a quick set of results on their screen. By

using hints such as FIRST_ROWS_n, you can overcome this drawback in the CBO.

The CBO depends enormously on correct statistics gathering. If the statistics are absent or

outdated, the Optimizer can make poor decisions.

Providing Statistics to the CBO

The CBO can follow optimal execution paths only if it has detailed knowledge of the database

objects. In Oracle Database 10g, the recommended way to provide these statistics is by letting

the database automatically collect statistics for you. This is known as the Automatic Optimizer

Statistics Collection feature. You can also manually provide statistics to the


14/16

Optimizer with the DBMS_STATS package.

Using DBMS_STATS to Collect Statistics

Although letting the database automatically collect Optimizer statistics is the recommended

approach under Oracle Database 10g, you can still manually collect Optimizer statistics using

the DBMS_STATS package.

Collecting the Statistics

The DBMS_STATS package has several procedures that let you collect data at different levels.

The main data collection procedures for database table and index data are as follows:


15/16

Frequency of Statistics Collection

Theoretically, if your data is static, you may only need to collect statistics once. If your database

performs only a small amount of DML activities, you may collect statistics at relatively longer

intervals, say weekly or monthly. However, if your database objects go through constant changeon a daily basis, you need to schedule the statistics collection jobs much more frequently, say

daily or even more often. One of the best Oracle Database 10g new features is that you can let

the database decide the frequency of statistics collection. Remember that the database bases

its statistics collection on whether the statistics are fresh or stale. Thus, you can relax and let

the database be the arbiter of how often to collect statistics.

What Happens When You Dont Have Statistics

Youve seen how the Oracle database can automatically collect Optimizer statistics for you.

Youvealso learned how to use the DBMS_STATS package to collect the statistics manually

yourself. But what happens if you disable the automatic statistics collection process, or if you

dont collect statisticsin a timely fashion? Even with automatic statistics collection, under which

necessary statistics are collected on a nightly basis, you may have a situation where table data

is altered after the statistics collection process is over. In situations such as this, Oracle usesdata, such as the number of blocks taken up by the table data and other ancillary information, to

figure out the Optimizer execution plan. You can also use the initialization parameter

OPTIMIZER_DYNAMIC_SAMPLING to let Oracle estimate Optimizer statistics on the fly, when

no statistics exist for a table, or when the statistics exist but are too old or otherwise unreliable.

Of course, sampling statistics dynamically would mean that the compile time for the SQL


16/16

statement involved would be longer. Oracle smartly figures out if the increased compile time is

worth it when it encounters objects without statistics. If its worth it,Oracle will sample a portion

of the objects data blocks to estimate statistics. You need to set thevalue of the

OPTIMIZER_DYNAMIC_SAMPLING initialization parameter to 2 or higher to enable dynamic

sampling of all unanalyzed tables. Because the default for this parameter is 2, dynamic

sampling is turned on by default in your database. Thus, you need not spend sleepless nights

worrying about objects with missing or outdated statistics. In any case, if you adhere to Oracles

recommendation and use the Automatic Optimizer Statistics Collection feature, the

GATHER_STATS_JOB will automatically collect your databases statistics. The

GATHER_STATS_JOB is created at database creation time and is managed by the Oracle

Scheduler, which runs the job when the maintenance window is opened. By default, the

maintenance window opens every night from 10 PM to 6 AM, and all day on weekends. Oracle

will collect statistics for all objects that need them if you adopt the Automatic Optimizer Statistics

Collection feature. The feature is turned on by default in a new Oracle 10g database

or when you upgrade to the 10g release from an older version.

oracle perf tune - intro

Documents