dba mysteries
Post on 08-Apr-2018
221 Views
Preview:
TRANSCRIPT
8/7/2019 DBA Mysteries
http://slidepdf.com/reader/full/dba-mysteries 1/36
Ten Database Mysteries
Chris LawsonDatabase Specialists, Inc.
www.dbspecialists.com
clawson@dbspecialists.com
8/7/2019 DBA Mysteries
http://slidepdf.com/reader/full/dba-mysteries 2/36
Focus of Presentation
� Explore some ³strange´ database problems that have
baffled some DBAs
� Most of the mysteries occurred on critical productionsystems, although some were on development systems
� ALL of the mysteries were eventually explained
� Depending on your personal experience, some of these
³mysteries´ will seem trivial or commonplace; otherswill indeed seem mysterious
� Most mysteries have a simple explanation
� Most mysteries have a simple fix
8/7/2019 DBA Mysteries
http://slidepdf.com/reader/full/dba-mysteries 3/36
Why Spend Time on These
Database Mysteries?� Each DBA has a unique set of experiences and biases.
What one DBA thinks is obvious, another will not.
� An Oracle ³detective´ is part scientist, part artist.Many solutions require creativity, not just logic.
� A superior DBA will look for ways to ³stretch´ and
learn ways to handle difficult problems.
� Without working out difficult problems, you will notadvance as a DBA.
� You will be the ³hero´ if you encounter a mystery and
solve it; remember the solution--you may see it again!
8/7/2019 DBA Mysteries
http://slidepdf.com/reader/full/dba-mysteries 4/36
AW
ord About Oracle Versions� This presentation was originally written in
1998
� Most of these mysteries involve Oracle 7
databases
� Although some of the mysteries might not
apply directly to Oracle 8i, they still offer insight into the problem-solving process
8/7/2019 DBA Mysteries
http://slidepdf.com/reader/full/dba-mysteries 5/36
The Case of the
Berserk ApplicationClue #1: ³Big Phone Company´ 1997
� Using HPUX, Oracle 7.3.2.3
� Help desk application (Vantive) that connects to Oracledatabase suddenly goes berserk, creating thousands of
connections
� Program had worked normally for many months
� DBAs watch helplessly as CPU load driven from 1 to 50� As DBAs kill extra processes, more take their place
� Alert log and recent trace files show nothing unusual
� DBAs are united in accusing the application as the culprit
1
8/7/2019 DBA Mysteries
http://slidepdf.com/reader/full/dba-mysteries 6/36
Berserk Application
(continued )Clue #2: ³Big Publisher Ltd.´ 1998
� Running Sun Solaris, Oracle 7.3.2.3
� Users complain that performance has degraded in recentmonths
� Manager states that ³something must be wrong with thenetwork´
� Application is CORIS , a document management/printing
application� DBA investigates. Discovers that time to connect in
SQL*Plus is 30-45 seconds, even though server load is low
� Connect time is bad whether remote (PC) or directly on server
� Server load (file I/O and CPU) is generally low
8/7/2019 DBA Mysteries
http://slidepdf.com/reader/full/dba-mysteries 7/36
Berserk Application: Solution
� OTRACE is the culprit. It is active by default on
many 7.3 Oracle versions
� Excerpt from Oracle Corporation Alert:
³Pr oblems descr ibed her e can occur when Or acle
Tr ace is not configur ed and is widely enabled.´
8/7/2019 DBA Mysteries
http://slidepdf.com/reader/full/dba-mysteries 8/36
Berserk Application: Solution
(continued )To Detect:
� Check directory ORACLE_HOME/rdbms/otrace: As
size of files process.dat and regid.dat approach 10mb, problems arise
-rw-r--r-- 1 oracle dba 3161424 Jun 05 09:43 process.dat
-rw-r--r-- 1 oracle dba 263808 Jun 05 09:43 regid.dat
� To correct: simply remove these two files, then issuecommand otrccref
8/7/2019 DBA Mysteries
http://slidepdf.com/reader/full/dba-mysteries 9/36
Berserk Application: Solution
(continued )To Prevent:
� Add line to listener.ora for each database (after
ORACLE_HOME):
( ENVS=¶EPC_ DISABLED=TRUE¶)
� Set and export environment variable
EPC_ DISABLED=TRUE for all users.Put standard profile in /etc directory
� Restart all databases and restart listener
8/7/2019 DBA Mysteries
http://slidepdf.com/reader/full/dba-mysteries 10/36
The Case of the Reluctant PatchBackground:
� To correct several bugs, decision is made to
upgrade from 7.3.2.2 to 7.3.2.3 (HPUX)
� Patch is obtained from Oracle and applied to test
server. DBA notes that patch ran very quickly and
runs again ³just to be sure´� Bug is now gone on test server
2
8/7/2019 DBA Mysteries
http://slidepdf.com/reader/full/dba-mysteries 11/36
The Case of the Reluctant Patch
(continued )Problem:
� Patch is similarly applied to production server--
same operating system and version.
� Production application is tested, but bug is still
there!
� Another DBA reviews patch file, location, etc.All seem correct.
8/7/2019 DBA Mysteries
http://slidepdf.com/reader/full/dba-mysteries 12/36
The Case of the Reluctant Patch:
Solution
� DBA happens to notice that upon SQL*Plus
startup, database is 7.3.2.2!
� The patch was really only applied on the second
run. This is apparently a quirk in the patch r eadme
file.
� The command what or acle (then grep for patch#)can be used to determine which patches are
applied
8/7/2019 DBA Mysteries
http://slidepdf.com/reader/full/dba-mysteries 13/36
The Case of the Sleazy SQL� ³Big Publisher Ltd.´ runs an MRP system called
³AVALON,´ similar to Oracle Manufacturing.
Database stores inventory, part information,vendors, etc.
� Server is ATT3555, running NCR UNIX.
Database is Oracle 7.1.6
� Issue: Users report that certain common operations
are very slow
3
8/7/2019 DBA Mysteries
http://slidepdf.com/reader/full/dba-mysteries 14/36
The Case of the Sleazy SQL
(continued )� DBA investigates and queries v$sqlarea using:
SELECT sql_text FROM v$sqlarea
WHERE disk_reads/executions > 1000;
� Query yields troublesome SQL statement, with
these stats:
DISK _ READS PER EXECUTION =5,000BUFFER _GETS PER EXECUTION =5,100
8/7/2019 DBA Mysteries
http://slidepdf.com/reader/full/dba-mysteries 15/36
The Case of the Sleazy SQL
(continued )� Statement has been accidentally designed to ensure wor st
possible per for mance by making index usage impossible:SELECT * FROM ABC WHERE
NVL (COL _W) = NVL (:1) AND
NVL (COL _ X) = NVL (:2) AND
NVL (COL _Y) = NVL (:3) AND
NVL (COL _Z) = NVL (:4);
� TABLE ABC SIZE = 3mb, about 25,000 rows
� DBA requests developers to alter statement to eliminate NVL (COL_N) functions
� DBA advised that no resources available to make change
� Problem: If code can¶t be changed, what can be done to
improve performance?
8/7/2019 DBA Mysteries
http://slidepdf.com/reader/full/dba-mysteries 16/36
The Case of the Sleazy SQL:
Solution� CACHE the table! For example:
alter table xyz cache;
� Normally, blocks from full-table scans are designated for
rapid age-out; otherwise, they would ³wipe-out´ the db
cache. Cache of table causes blocks to be treated ³normally.´
� Caching table disables rapid age-out of this table
� Logical reads will not be reduced, but disk reads approachzero!
� Note: DB _ BLOCK _ BUFFER S was slightly increased to
compensate for the cached table that now consumes a few
megabytes of database cache
8/7/2019 DBA Mysteries
http://slidepdf.com/reader/full/dba-mysteries 17/36
The Case of the
Non-Optimal Optimizer
� A large software company based in ³Cedar Shores´ has
designed a large financials application. Program has been
tuned for Rule Based O ptimizer .
� The application runs very well, is a mature product, which
is used in thousands of companies around the world.
� Some users clamor for new features: more horns and
whistles
4
8/7/2019 DBA Mysteries
http://slidepdf.com/reader/full/dba-mysteries 18/36
The Case of the
Non-Optimal Optimizer (continued )� The new development team, afraid to become obsolete,
wants to convert to Cost-Based Optimizer (CBO). They
also wisely consider that Oracle recommends using CBOon new projects.
� The older developers, now nearing peaceful retirement,
predict disaster if the database is switched to CBO,
because the execution plans will change.� Issue: How can Optimizer be selectively switched to CBO
without changing the code?
8/7/2019 DBA Mysteries
http://slidepdf.com/reader/full/dba-mysteries 19/36
The Case of the Non-Optimal
Optimizer: Solution� Simply substitute a view having a ³hint´ for the table
needing CBO
For example:
rename DEPT to DEPT_ORIG
create view DEPT as select / * + ALL_ROWS */
* from DEPT_ORIG;
� Now, application will use the VIEW when it looks for
DEPT
� All queries using DEPT will use CBO
� Note: Upon renaming a table, the indexes and constraintswill ³move´ with the table; however, synonyms and grants
may need to be reset.
8/7/2019 DBA Mysteries
http://slidepdf.com/reader/full/dba-mysteries 20/36
The Case of the
Forgetful Memory� A new internet-transaction application, ECX per t , and its
database have been installed on a Sun Ultra Enterprise Server
� Sun Solaris 2.5.1, Oracle 7.2.3� Application appears to run smoothly for several months,
although it occasionally creates large dump files
� Trace files appear occasionally with ORA-4030 ³Out of Process
Memory´ and recommends ³increase process memory quota´
� Server seems to hang occasionally. Server reboot fixes
� SysAdmin checks kernel parameters related to memory. All
correct and match other servers. Not using any large stored
procedures
� Problem: What is causing memory/hang problems?
5
8/7/2019 DBA Mysteries
http://slidepdf.com/reader/full/dba-mysteries 21/36
The Case of the
Forgetful Memory: Solution� DBA checks /tmp (swap area on server) and notes 99%
consumed
� Investigation reveals that application occasionally goes berserk and consumes ENTIRE SWAP area with log
files
� Deletion of log files does not return disk space, since
application is still ³holding´ the files� Reboot of server cleaned up /tmp area, thereby
correcting problem
� Suggestion: If memory-related error messages exist,
check swap area first
8/7/2019 DBA Mysteries
http://slidepdf.com/reader/full/dba-mysteries 22/36
The Reluctant Index Affair Background:
� DBA asked to analyze and tune Australia manufacturing
database. Database is running CBO. One particularly bothersome SQL statement is identified
� The WHERE condition is perfect for a new index,
because of its excellent selectivity
� Index is quickly created. Table is also analyzedProblem:
� Even though index is a ³perfect´ solution to the query, a
full table scan is used instead
6
8/7/2019 DBA Mysteries
http://slidepdf.com/reader/full/dba-mysteries 23/36
8/7/2019 DBA Mysteries
http://slidepdf.com/reader/full/dba-mysteries 24/36
Mystery of the
Hanging Database� At random intervals, a 7.3.2.3 database hangs. No tracefiles, and nothing unusual in the alert log.
� When problem occurs, no response to new connectionsrequests; over 1200 existing connections ³hang.´
� Oracle Support is alerted to priority 1 problem; expertsacross the world investigate for weeks
� DBA is using OEM Lock Manager tool and notices user who is blocking about 25 other users. The hang occurs
soon after.� Oracle Australia recommends checking indexes. This
suggestion led to the solution.
Problem:
� How did index problems hang database?
7
8/7/2019 DBA Mysteries
http://slidepdf.com/reader/full/dba-mysteries 25/36
Mystery of the
Hanging Database: Solution� Application design flaw
� There are hundreds of foreign keys in the database; 99%
had indexes. A few did not, violating good design practice.When batch program began updates, locking increased
rapidly.
� Without FK index, updates on par ent table completely
block updates on child (vice versa for 7.1.6)
� Reference: Server Application Developers Guide
� Although not admitted as database ³bug,´ database was
overwhelmed by the locks
� Once indexes on all FK ¶s created, problems disappeared
8/7/2019 DBA Mysteries
http://slidepdf.com/reader/full/dba-mysteries 26/36
The Case of the
Mysterious Package� Manufacturing application was installed on a Sun Ultra
3000 server. A small database was created for testing
purposes. Oracle version 7.2.3. Shared pool size about 60mb.
� At first, all went well. Then, seemingly randomly, when
the users began to try new features, they would receive a
³funny´ error message and the application failed.
� A trace file recommended increasing shared pool
Problem:
� How can application fail with such a sizable shared pool?
� Aside from massive increase in shared pool, what can be
done?
8
8/7/2019 DBA Mysteries
http://slidepdf.com/reader/full/dba-mysteries 27/36
The Case of the
Mysterious Package: Solution� The application uses about 20 massive PL/SQL packages.
Some are 5x the SYS.STANDARD package. When a
package load is attempted, it will not fit in the shared pool.
� Memory-intensive packages should be ³pinned´ or ³kept´
in shared pool after database startup
EXECUTE SYS.dbms_shared_pool.keep ('OBJ ECT_ NA ME');
� But first, must find the ³big´ packages (will also listSYS.STANDARD):
SELECT owner, name, sharable_mem FROM v$db_object_cache
WHERE sharable_mem > 100000;
8/7/2019 DBA Mysteries
http://slidepdf.com/reader/full/dba-mysteries 28/36
The Case of the
Mysterious Package: Solution(continued )� Example script to find ³big´ packages and generate
SQL script to ³pin´ them in memorySELECT 'EXECUTE SYS.dbms_shared_pool.keep('''||
owner||'.'||name||''');'
FROM v$db_object_cache
WHERE sharable_mem > 100000
AND type NOT IN ('VIEW', 'SYNONYM', 'TABLE')AND name NOT LIKE '%SHARED_POOL%'
AND owner IS NOT NULL;
8/7/2019 DBA Mysteries
http://slidepdf.com/reader/full/dba-mysteries 29/36
The Case of the
Uncooperative Rollback � In mid-afternoon, DBA (running ³OEM Top Sessions´)
notices many users ³ACTIVE´ but showing 0 file I/O. Lock
Manager reveals one user performing big update blocking all.
� Culprit tracked down--agrees to be terminated. DBA
disconnects session.
� Locks are not released, but user is ³marked for kill.´
� Very little file I/O activity. Alert log shows very slow
switching of redo logs.
� DBA performs shutdown abort then startup. Database starts
up after 2 minutes. All is well.
Problem:
� Why did user not rollback and release locks?
9
8/7/2019 DBA Mysteries
http://slidepdf.com/reader/full/dba-mysteries 30/36
The Case of the Uncooperative
Rollback: Solution� If session is terminated, speed of rollback is proportional
to init.ora parameter
CLEANUP _ ROLLBACK _ ENTRIES� If default value (20) is used, rollback of killed session
can take 50x time of original update. Alternatively,
shutdown abort/startup cleans up database much faster.
� Rationale: Parameter prevents rollback of one user fromhogging all the resources on a busy system
� Solution: Increase parameter to reduce rollback time
(since shutdown abort is usually not an option)
8/7/2019 DBA Mysteries
http://slidepdf.com/reader/full/dba-mysteries 31/36
The Singular Case of the
Phantom Users� A manufacturing database in Sydney, Australia, needed
performance tuning. SQL tuning on US databases had
yielded good results.
� The table v$sqlarea was queried to find resource-intensive
SQL statements. Several commonly-run statements were
isolated. Performance was improved through index additions.
� Statistics were re-examined over the next 4 hours, in order to
confirm improvements.
� However, repeated looks at execution statistics showed no
change.
� DBA puzzles over enigma for several hours, then realizes
that NOTHING is WRONG! What did he finally realize?
10
8/7/2019 DBA Mysteries
http://slidepdf.com/reader/full/dba-mysteries 32/36
The Singular Case of the
Phantom Users: Solution� Nothing is wrong because the users were
still asleep. It was only 5:00am in Sydney!
8/7/2019 DBA Mysteries
http://slidepdf.com/reader/full/dba-mysteries 33/36
The Case of the Slow Physician
(Bonus Mystery)� Health application is experiencing slow run times. Analysis shows
following SQL statement causing 3000 disk reads
� COSIGN_VIEW is a join of 2 tables (DOCS + COSIGN),
joined on patient_id (indexed)
� Search criteria µDR. MCK ENZIE¶ is ver y selective; thus,
nested loop IS expected choice for optimizer, with DOCSas Driving table.
11
SELECT * from COSIG N _VIEW WHERE doctor_id = 'DR. MCKE NZIE'
8/7/2019 DBA Mysteries
http://slidepdf.com/reader/full/dba-mysteries 34/36
The Case of the Slow Physician
(continued )� Even with index on DOCS(doctor _ id), optimizer (CBO)
insists on using hash-join, and r efuses to ever use index on
doctor_ id !
� Repeated analyze table commands do not correct
� Substituting query not using a view yields expected NL
result
Problem:
� Why does using the view cause optimizer to make the
³wrong´ choice?
8/7/2019 DBA Mysteries
http://slidepdf.com/reader/full/dba-mysteries 35/36
The Case of the Slow Physician:
Solution� Everything seemed to point to a problem with the view,
because all worked normally as long as the view was
excluded
� Finally, DBA compared view definition (in OEM Schema
Manager) to definition seen using ³describe table´ syntax.
The columns did NOT match!
� Examining the object-create script revealed that the view
switched column names, so that column DOC_ID in theview did NOT match DOC_ID in the table!
� Once the correct column was indexed, a Nested-Loop Join
was selected by the optimizer
8/7/2019 DBA Mysteries
http://slidepdf.com/reader/full/dba-mysteries 36/36
Contact InformationChris Lawson
clawson@dbspecialists.com
http://www.dbspecialists.com
Database Specialists, Inc.388 Market Street, Suite 400
San Francisco, CA 94111
top related