OOW 2012 - Oracle Real Application Testing
DESCRIPTION
Oracle Database OOW 2012 - Oracle Real Application Testing (Transcript)
-
Oracle Real Application Testing Tips from Bank of America
Presenter : Keerti Prasad, Vice President, Bank of America
-
Disclaimer
All the views expressed in this presentation are the opinion of the author and do not necessarily reflect the views of Bank of America
-
Speaker Info
Keerti Prasad: Working at Merrill Lynch/Bank of America for 6+ years on varied applications; 18+ years of experience with Oracle Database and its related technologies
-
Agenda
Overview
Role of Real Application Testing in Migrations
Database Replay details
Issues encountered during Database Replay
Summary
Q&A
-
Overview - What is Real Application Testing?
Real Application Testing
Oracle tool to capture the actual workload being performed on a database and then replay it.
Modules: Capture, Preprocess, Instantiate, Replay
Replay components: DB Replay and SPA (SQL Performance Analyzer)
-
Overview - Challenges
Business often needs new features which are available only in the latest software
Infrastructure needs to be on a supported version, or a cheaper alternative, triggering migration
Application or infrastructure changes need rapid deployment for break-fixes or version changes
Performance of the newer systems needs to be on par or better
Any change needs to be tested; testing methods need to be fast, reliable and consistent
We need to simulate actual production data/volumes on the testing environment
Real Application Testing can achieve all of the above
-
Overview - Factors Driving Changes
Application : Break-fixes, increasing data volumes, new requirements
OS Platforms : Server patching/upgrades, third party upgrades
Database : Upgrades, patching, parameter changes
Storage/Network : Datacenter Moves
-
Overview - Application/Database Info
Database:
Database size is ~5 TB
Workload can roughly be classified into 2 parts:
OLTP transactions are done during business hours
Batch processing is done after business hours, with some overlap
SLA is 3 ms, and the rate of redo generated varies between 3-5 GB/hour
Application:
Some are vendor-supplied and some developed in house
-
Overview - Current Configuration
[Diagram: Online Database (RAC: Online) replicated via OGG to a Reporting Database (RAC: Reporting); accessed by Production Support, End Users, and ad-hoc queries; a "60+" figure appears on the slide]
-
Role of Real Application Testing in Migration - High Level
Three major migrations done within the span of 3 years:

No. | Date     | RAT Testing | Change Details                                                            | Issues
1   | Sep 2010 | Not used    | Hardware, OS, Database, Application Migration/Upgrade/Release             | Many
2   | May 2011 | Not used    | OS, Database, Application Upgrade/Release                                 | Many
3   | Aug 2012 | Extensive   | OS upgrade, Database migration to 3-node RAC, Application Upgrade/Release | None
-
Role of Real Application Testing in Migration 1
Migration 1 (Oracle DB migration: 10.2.0.4 on Solaris to 11.2.0.1 on Linux)
Types of changes:
Hardware changes
OS migration from Solaris to Linux
Oracle upgrade from 10.2.0.4 to 11.2.0.1
Application upgrade/release
Some problems we encountered after migration:
Optimizer-related issues
Library cache and mutex issues
High version counts
One-off patches were applied and a few parameters were changed
-
Role of Real Application Testing in Migration 2
Migration 2 (Oracle upgrade 11.2.0.1 to 11.2.0.2)
Types of changes:
OS upgrade
Oracle upgrade from 11.2.0.1 to 11.2.0.2
Application upgrade/release
Some problems we encountered after migration:
Memory resizing operations
Issues caused by cursor_sharing set to FORCE
High version count issues
One-off patches were applied and a few parameters were changed
Real Application Testing was in a rollout phase
-
Role of Real Application Testing in Migration 3
Migration 3 (Oracle migration from standalone to 3-node RAC)
Types of changes:
Hardware
OS upgrade
Oracle GI upgrade from 11.2.0.2 to 11.2.0.3
Database migration from standalone to 3-node RAC, without a version change
Application upgrade/release
Real Application Testing was used extensively. The migration was accomplished with only minor issues, and the performance of the system was satisfactory.
-
Role of Real Application Testing in Migration - Problems Exposed by DB Replay
Problems that occurred during migration which were exposed by DB Replay:
11g child cursor issue: the replay was hanging; high version counts from child cursors were the cause
Storage issue: the replay showed high I/O in some reports; storage/hardware changes were made, which fixed the issue
Authentication issue: the Oracle database owner on the OS is authenticated by a third-party tool; during replay, clients encountered errors, which on investigation turned out to be caused by a bug in the authentication tool
-
DB Replay details - Typical Cycle
Typical DB Replay cycle:
Workload is captured on the source (production) for one week, daily between 9am and 5pm
The source database is restored to the test system
DB Replay is done for the first day
Batches are run by the application team at night
This is repeated for a week
At the end, DB Replay reports, AWR comparison reports and batch load timings are analyzed
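The cycle above maps onto a handful of DBMS_WORKLOAD_CAPTURE / DBMS_WORKLOAD_REPLAY calls. A minimal sketch, assuming an illustrative directory object named RAT_DIR that points at the capture file location (the name, the 8-hour duration, and the capture name are assumptions, not from the slides):

```sql
-- On the source (production): capture the daily OLTP window.
BEGIN
  DBMS_WORKLOAD_CAPTURE.START_CAPTURE(
    name     => 'OLTP_DAY1',   -- illustrative capture name
    dir      => 'RAT_DIR',     -- directory object for capture files
    duration => 8 * 3600);     -- 9am-5pm window, in seconds
END;
/
-- ... business hours pass ...
BEGIN
  DBMS_WORKLOAD_CAPTURE.FINISH_CAPTURE;
END;
/

-- On the test system, after restoring the source database and
-- copying the capture files: preprocess the capture once.
BEGIN
  DBMS_WORKLOAD_REPLAY.PROCESS_CAPTURE(capture_dir => 'RAT_DIR');
END;
/
```

Preprocessing should be done on the same database version as the replay target, which is why it runs on the test system rather than on production.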
-
DB Replay details - Impact on Production System During Capture
Impact during capture, and steps taken to circumvent it:
Load on the server is minimal (~5% CPU)
Only OLTP traffic is captured during business hours, since batches can be easily reproduced
Captures were done for a typical workload cycle (Mon-Fri business hours) and were retaken ONLY when the database went through a release
-
DB Replay details - Capture Tips
We took a capture for a whole business day, but pre-processing ran into an issue: it creates objects in the SYSAUX tablespace, which grew a lot. So the capture interval was set to create files of ~2 GB. The increase in divergence due to the split capture was minimal.
To facilitate the above, a script was created to end a capture and start a new one.
We needed to capture only the users performing OLTP, which is not easily reproducible, hence user filters were used.
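The user filters mentioned above are set up before the capture starts. A sketch using the filter and user names shown in the capture report (the capture name and directory object are illustrative); note that with default_action => 'EXCLUDE', the INCLUDE filters cause only matching sessions to be captured:

```sql
-- Capture only the OLTP schemas; all other users (e.g. batch users)
-- are excluded by default.
BEGIN
  DBMS_WORKLOAD_CAPTURE.ADD_FILTER(
    fname      => 'APPSCHEM_USER_162',
    fattribute => 'USER',
    fvalue     => 'APPSCHEM_USER');
  DBMS_WORKLOAD_CAPTURE.ADD_FILTER(
    fname      => 'APPSCHEMAA2_USER_162',
    fattribute => 'USER',
    fvalue     => 'APPSCHEMAA2_USER');

  -- default_action => 'EXCLUDE' makes the filters above act as
  -- INCLUDE filters: only sessions matching them are captured.
  DBMS_WORKLOAD_CAPTURE.START_CAPTURE(
    name           => 'OLTP_FILTERED',  -- illustrative name
    dir            => 'RAT_DIR',        -- illustrative directory object
    default_action => 'EXCLUDE');
END;
/
```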
-
DB Replay Details - Capture Report
Statistic Name                   Value     % Total
-------------------------------  --------  -------
DB time (secs)                   48407.45  78.86
Average Active Sessions          13.44
User calls captured              13141577  89.62
User calls captured with Errors  43961
Session logins                   726       9.45
Transactions                     271724    90.02

Top Service/Module Captured
Service         Module                    % Activity  Action              % Action
--------------  ------------------------  ----------  ------------------  --------
APPCONN1.US.ML  UNNAMED                   55.22       UNNAMED             55.22
APPCONN2.US.ML  UNNAMED                   6.30        UNNAMED             6.30
APPCONN1.US.ML  APP_SCHE.PKG_APPS_PFL.sp  5.62        Identify the Maste  5.62
                PKG_APPS_PFL.SP_CALCULAT  1.62        Calculate EPI Alig  1.62
                SQL*Plus                  1.52        UNNAMED             1.52

Workload Filters  DB: SOURCE  Snaps: 3008-3011
#  Filter Name           Type     Attribute  Value
-  --------------------  -------  ---------  ----------------
1  APPSCHEM_USER_162     INCLUDE  USER       APPSCHEM_USER
2  APPSCHEMAA2_USER_162  INCLUDE  USER       APPSCHEMAA2_USER
Major DB Time/ User Calls/ Transactions Captured
Filtered Users Captured
Services that were captured
-
DB Replay Details - Replay Details/Tips
SCN was used as the synchronization parameter
The calibration tool was used to calculate the number of clients; typically 15-25 clients were used for a replay
Remapping was achieved by defining a database service name for the clients
Think time auto correct was used
A script was used to run the replays
The scale-up multiplier was set to 1, as Real Application Testing was used as a tool for performance testing, not stress testing
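The replay options above can be sketched as follows. This is a minimal outline, not the presenter's actual script; the replay name and directory object are illustrative, and the synchronization/think-time values mirror the replay report later in the deck:

```sql
-- Initialize and prepare the replay: SCN-based synchronization,
-- think time auto correct on, and no scale-up (performance test,
-- not stress test).
BEGIN
  DBMS_WORKLOAD_REPLAY.INITIALIZE_REPLAY(
    replay_name => 'OLTP_REPLAY_1',  -- illustrative name
    replay_dir  => 'RAT_DIR');       -- illustrative directory object

  DBMS_WORKLOAD_REPLAY.PREPARE_REPLAY(
    synchronization         => 'SCN',
    connect_time_scale      => 0,
    think_time_scale        => 0,
    think_time_auto_correct => TRUE,
    scale_up_multiplier     => 1);
END;
/
-- From the OS, size and start the WRC clients (15-25 in our runs):
--   wrc mode=calibrate replaydir=/u01/rat          (suggests a client count)
--   wrc system/<pwd> mode=replay replaydir=/u01/rat (one invocation per client)
BEGIN
  DBMS_WORKLOAD_REPLAY.START_REPLAY;
END;
/
```

START_REPLAY only fires once all WRC clients have connected, which is why the calibration step matters: too few clients throttles the replay, too many wastes resources.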
-
DB Replay Details - Replay Report
Replay Options
Option Name              Value
-----------------------  ----------------------------
Synchronization          SCN
Connect Time             0%
Think Time               0%
Think Time Auto Correct  TRUE
Number of WRC Clients    15 (15 Completed, 0 Running)

Replay Statistics
Statistic            Replay         Capture
-------------------  -------------  -----------
DB Time              32811.942 s    37622.648 s
Avg Active Sessions  7.43           10.45
User calls           12824968       12824968
Network Time         2925862.226 s  .
Think Time           766.466 s      .

Replay Divergence Summary
Divergence Type                                Count   % Total
---------------------------------------------  ------  -------
Session Failures During Replay                 0       0.00
Errors No Longer Seen During Replay            79      0.00
New Errors Seen During Replay                  532     0.00
Errors Mutated During Replay                   0       0.00
DMLs with Different Number of Rows Modified    3382    0.03
SELECTs with Different Number of Rows Fetched  514558  4.01
Replay errors were minor
-
DB Replay Details - Capture and Replay Compare
Sections: Information About AWR and Time Periods, Main Performance Statistics, Interconnect Usage, ADDM Report, I/O to Data and Temp Files

Main Performance Statistics
                    Capture total time(s)  Replay total time(s)
Database Time       44009.17               34604
CPU Time            10567.09               10623.24
User I/O Wait Time  30913.41               23118.46
Cluster Wait Time   N/A                    0

                        impact (sec)  average active sessions  percent active sessions
"User I/O" wait class
  Capture               30608         8.39                     69.31
  Replay                22439         5.07                     63.49
Global Cache Messaging
  Capture               N/A           N/A                      N/A
  Replay                849           0.19                     2.4
PL/SQL Execution
  Capture               1615          0.44                     3.66
  Replay                2453          0.55                     6.94

Interconnect Usage
                                       Capture  Replay
average 8Kb ping time (ms)             Not RAC  667.56
average fast grant time (ms)           Not RAC  0.24
average fast block transfer time (ms)  Not RAC  0.58
total GC blocks transferred            Not RAC  21228672
total interconnect usage (Gbits)       Not RAC  4001.231
interconnect throughput (Mb/s)         Not RAC  926.56

I/O to Data and Temp Files
                                        Capture      Replay
average time of single block read (ms)  5.95         3.8
total number of single block reads      4537537      5144026
total physical read                     59.515 G     72.303 G
total physical write                    7.201 G      9.861 G
read rate                               16.71 M/sec  16.743 M/s
write rate                              2.022 M/sec  2.284 M/s

Information About AWR and Time Periods
            Capture              Replay
Start Time  2012-07-16 11:03:08  2012-07-16 17:45:34
End Time    2012-07-16 12:03:51  2012-07-16 18:57:42
Duration    1.01 hours           1.2 hours
-
DB Replay Details - Performance Comparison
AWR Performance Comparison for a Replay

                                     1st (CAPTURE)                2nd (REPLAY)
Event                    Wait Class  Waits      Time(s)  Avg(ms)  Waits      Time(s)  Avg(ms)  Performance
db file sequential read  User I/O    4,263,441  37,343   8.76     2,868,731  11,541   4.02     Better
direct path read         User I/O    78,285     3,551    45.41    32,702     613      18.75    Better
log file sync            Commit      205,154    3,090    15.06    75,012     213      2.84     Better
log file parallel write  System I/O  291,138    1,547    5.32     357,429    498      1.39     Better
db file parallel write   System I/O  257,180    1,081    4.2      809,049    688      0.85     Better
-
DB Replay Details - Performance Comparison
AWR Performance Comparison for a Replay with I/O issues

                                     1st (CAPTURE)                2nd (REPLAY)
Event                    Wait Class  Waits      Time(s)  Avg(ms)  Waits       Time(s)  Avg(ms)  Performance
db file sequential read  User I/O    5,243,296  34,811   6.64     16,932,235  26,133   1.54     Avg. better, but total wait high
log file sync            Commit      275,891    3,554    12.88    78,157      152      1.94     Better
log file parallel write  System I/O  368,717    1,698    4.6      593         1        0.4      Better
read by other session    User I/O    702,920    1,616    2.3      204,975     428      2.09     Better
db file scattered read   User I/O    2,204      50       23.11    19,227,535  12,628   0.66     Avg. better, but total wait high
-
DB Replay Details - Test Cycles
Multiple cycles of OLTP captures and replays were done.
Captures taken were minimal and were retaken only when the source had gone through a release cycle.
Sometimes, after the restore, a guaranteed restore point was set on the target system, and the database was flashed back to run multiple cycles with different settings, to save restore time.
One-off patches applied were first tested with a DB Replay cycle.
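The restore-once, replay-many trick above can be sketched as follows. The restore point name is illustrative; flashing back requires the database in MOUNT mode and the flashback/guaranteed-restore-point prerequisites (e.g. a configured fast recovery area):

```sql
-- After the restore, before the first replay:
CREATE RESTORE POINT before_replay GUARANTEE FLASHBACK DATABASE;

-- After each replay cycle, rewind instead of re-restoring:
SHUTDOWN IMMEDIATE
STARTUP MOUNT
FLASHBACK DATABASE TO RESTORE POINT before_replay;
ALTER DATABASE OPEN RESETLOGS;
```

This turns an hours-long restore into a minutes-long rewind, which is what makes trying several parameter/patch combinations against the same capture practical.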
-
Issues encountered during DB Replay
Issue 1: Pre-processing caused the SYSTEM tablespace to become full
Fix: Run the capture in batches
Issue 2: ORA-15590: encountered an incomplete workload capture file
Fix: As per Oracle, this could be ignored
Issue 3: ORA-600 [17183] when calling dbms_workload_replay.process_capture
Fix: Applied a DB Replay patch
-
Issues encountered during DB Replay (Contd.)
Issue 4: The replay seemed to be hanging
Fix: There were some minor changes in the workload; the replay ran fine after stats were gathered on the database
Issue 5: Batches after the replay complained of constraint violations
Fix: The violations were for sequences; once these were bumped up, the replay ran successfully
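A sketch of the sequence fix mentioned above: temporarily widen the increment, advance the sequence past the values consumed by the replay, then restore the increment. The sequence name and the gap size are hypothetical, for illustration only:

```sql
-- Jump the sequence forward by ~10000 in a single step.
ALTER SEQUENCE app_schema.order_seq INCREMENT BY 10000;  -- hypothetical sequence
SELECT app_schema.order_seq.NEXTVAL FROM dual;           -- consumes the jump
ALTER SEQUENCE app_schema.order_seq INCREMENT BY 1;      -- back to normal
```

This avoids dropping and recreating the sequence, which would invalidate dependent grants and synonyms.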
-
Issues encountered during DB Replay - Limitations
Only SELECTs can be ramped up; DMLs can only be replayed once
Some parts of our application use XA connections, which are not yet supported in Real Application Testing
CTAS (CREATE TABLE AS SELECT) operations are not captured
-
Oracle ACS for Bank of America
Service Highlights
Comprehensive support offering with a Solution Support Center
Focused engineering team
Fast-track solutions based on intimate knowledge of the customer environment and business needs
Proactive advice and knowledge transfer
Performance and patch reviews
Single point of contact for all technical issues - Technical Lead
Single point of contact for all escalations - Service Delivery Manager (SDM)
Onsite assistance
Proof of Concept and Test Case assistance
Value
Maximize availability
Mitigate risks
Optimize performance
Reduce costs
-
Summary - Lessons Learned
Traditional testing methods have limitations:
They have to be set up for each testing cycle
They require a lot of resources, and coordination is needed amongst all teams
The baselines set can be questionable, making it difficult to compare results, remediate or tune the system
Last month our application at BofA migrated successfully to the RAC platform without any issues, unlike the previous migrations. Considering past migrations, this was undoubtedly due to a combination of traditional testing and Real Application Testing.
Reports from DB Replay gave conclusive evidence of system performance, which boosted the DBA and application teams' confidence in the migration.
-
Q & A
Questions?
-
For general information on Advanced Customer Support Services, go to oracle.com/acs
or e-mail [email protected]
Acknowledgements / Email
My email: Keerti Prasad ([email protected])
Acknowledgements:
Sridhar Kakumanu ([email protected])
Mohammad Awwal, Ph.D. ([email protected])
Jeremy Schirer ([email protected])
Bryant Bisnar ([email protected])
Stephen Ahn ([email protected])