copyright © 2009 rolta international, inc., all rights reserved oracle high availability - a case...
TRANSCRIPT
Copyright © 2009 Rolta International, Inc., All Rights Reserved
Oracle High Availability - A Case Study
Rama Balaji
Senior Oracle Consultant
• This presentation is a case study of a customer whose storage array failed during heavy transaction processing period.
• Methods followed to quickly restore the database.
• How the best practice configuration was arrived while maintaining at least two copies of production databases on-line.
• How Oracle’s HA components were effectively explored to achieve the final configuration with only few minutes of production downtime.
Overview
Agenda• Background
• Approach
• Initial Configuration
• Disaster to Full recovery
• High Availability Features
• Conclusion and best practices
• Question and Answer
• RMAN compressed backups
• Physical Standby using Data Guard
• ASM to non-ASM and vice versa
• Cascaded Destinations Standby
• 2 node RAC
• Failover and Switchover
• Flash Recovery Area and Flash Back Logs
Oracle Features Used
Background• E-commerce client
• Nature of business
• Initial cost effective business approach
• Business growth
• IT infrastructure
Approach
1. Storage array failure
2. Restored database from tape using RMAN
3. Set up the Physical Standby
4. Second Physical Standby using cascaded destination feature
5. Failover from production to standby
EventDatabase Availability
No Database
One
Two
Three
Two
Approach
6. 2-Node RAC with ASM on the new storage array as standby
7.Switched over from single node Db instance to 2 Node RAC as primary
8.Disconnected the second standby
9. 2 node RAC standby to 2 node RAC primary
10. Flashback Logs cleanup from FRA
EventDatabase Availability
Three
Three
Two
Two
No Downtime
Storage Array
2 node RAC + ASM
Initial configuration• Server - Linux Red Hat
x86_64• Database – Oracle 10.2.0.3• ASM – two diskgroups
DATA_DG – Database
FLASH_DG – flash_recovery_area
• Daily full database backup to disk- RMAN, and to tape using third party software
• Hourly archivelog backups to disk – RMAN, and to tape using third party software
• No RMAN catalog
RMAN• RMAN> show controlfile autobackup;
RMAN configuration parameters are:
CONFIGURE CONTROLFILE AUTOBACKUP ON;
• RMAN> show controlfile autobackup format;
RMAN configuration parameters are:
CONFIGURE CONTROLFILE AUTOBACKUP FORMAT FOR DEVICE TYPE DISK TO
'/opt/oracle/admin/PROD/backup/cntl/%F';
• RUN
{
CONFIGURE RETENTION POLICY TO REDUNDANCY 1;
ALLOCATE CHANNEL ch1 DEVICE TYPE DISK MAXPIECESIZE 2G FORMAT '/opt/oracle/admin/PROD/backup/db/RCOMPRESSED_%U';
BACKUP as compressed backupset DATABASE plus archivelog channel ch1;
}
RMAN Backup script
• RUN
{
ALLOCATE CHANNEL ch1 DEVICE TYPE DISK FORMAT '/opt/oracle/admin/PROD/backup/archive/ARCH_%U';
BACKUP as compressed backupset archivelog all;
}
RMAN Archivelog backup script
Storage Array Failure
Storage Array
2 node RAC + ASM
• Storage admin determined that the SAN failure has caused loss of entire data including the database and backups.
• Validated that the RMAN backup on tape from previous night as well as archivelog files from previous hour were intact.
Restore using RMAN
Storage Array
2 node RAC + ASM Single DB Non- ASM
• Restored the tape backup
and all archivelog file backups from tape to internal disks on one of the RAC node
Restore from ASM to non-ASM• Restored the backup from tape to the original
backup location.
• Restored spfile from autobackup. RMAN> restore spfile from autobackup;
• Created pfile from spfile;
• Edited the following parameters in pfile control_files - Changed +DATA_DG
and +FLASH_DG to file system db_file_name_convert - Changed
+DATA_DG to file system log_file_name_convert - Changed
+DATA_DG and +FLASH_DG to file system
• startup nomount using pfile;
• Using RMAN nocatalog, restore controlfile from autobackup.
RMAN> restore controlfile from ‘/opt/oracle/admin/PROD/backup/cntl/c-662196802-20081230-0e’
Restore from ASM to non-ASM (Continued…)
• alter database mount;
• restore database;
• recover database;
• Created temporary tablespaces
• alter database open;
Restore from ASM to non-ASM (Continued…)
Physical Standby
Standby Database
Primary Database
PROD PRODS
Physical Standby Steps
• On the primary database (PROD) generated pfile.
Create pfile=’/tmp/initPROD.ora’ from spfile; and copy that file to the standby server.
• On the primary (PROD) backed up the current controlfile using RMAN.
RMAN> backup current controlfile for standby;
• RMAN> copy current controlfile for standby to '/tmp/sby_control01.ctl';
• Run a full RMAN backup on the primary (PROD).
RMAN> RUN
{
ALLOCATE CHANNEL ch1 DEVICE TYPE DISK MAXPIECESIZE 2G FORMAT '/home/oracle/backup_standby/RCOMPRESSED_%U';
BACKUP as compressed backupset DATABASE plus archivelog channel ch1;
}• Copied all backup files as well as controlfile
backup to the standby (PRODS). Backup files needs to be in the same location as the primary backup location or create a symbolic link.
Physical Standby Steps (Continued…)
• Edited the parameter file on the standby (PRODS).
Changed control_files parameter to point to the controlfile on the standby server.
Changed db_file_name_convert to use the new location on the standby server.
Changed log_file_name_convert to use the new location on the standby server.
• Start up the standby (PRODS)instance in nomount
Physical Standby Steps (Continued…)
• rman nocatalog RMAN> connect target
username/password@PROD RMAN> connect auxiliary / RMAN> RUN { ALLOCATE auxiliary CHANNEL ch1 DEVICE
TYPE DISK FORMAT '/home/oracle/backup_standby/RCOMPRESSED_%U';
duplicate target database for standby; } • Changed the following parameter on the
primary (PROD). SQL> ALTER SYSTEM SET
log_archive_dest_2= 'SERVICE=PRODS_XPT REOPEN=300'
Physical Standby Steps (Continued…)
• Changed the standby (PRODS) to Managed Recovery
alter database recover managed standby database disconnect from session;
• Issued the following command to make sure the archive log files are applied on the standby (PRODS)
Select sequence#, applied from v$archived_log order by sequence#;
Physical Standby Steps (Continued…)
Data Guard Cascaded Destinations
LOG_ARCHIVE_DEST_2='SERVICE=PRODS_XPT REOPEN=300';
Primary Database
Standby Database 1Data Guard
Data Guard Cascaded Destination
Standby Database 2
LOG_ARCHIVE_DEST_2='SERVICE=PRODC_XPT REOPEN=300';
PROD PRODS PRODC
Data Guard Failover
Primary Database
Data Guard Failover
Primary Database Data
Guard
Standby Database
PROD PRODS PRODC
Data Guard Failover Steps• On the primary (PROD)I did the
following. After users were logged out,
created a table called TEST. Create table test as select * from
dba_users; This was a last operation on the
primary database.
• On the primary (PROD) issued the following command couple of times.
Alter system switch logfile;
• On the primary (PROD), noted down the sequence#
SQL> archive log list
Database log mode Archive Mode
Automatic archival Enabled
Archive destination /d01/oracle/flashback
Oldest online log sequence 6503
Next log sequence to archive 6504
Current log sequence 6504• On the standby (PRODS) issued the following
command to make sure standby is in maximum performance mode.
SQL> alter database set standby database to maximize performance;
Data Guard Failover Steps (Continued…)
• On the standby(PRODS), issued the following command
SQL> select thread#, low_sequence#, high_sequence# from v$archive_gap;
no rows selected
• On the standby (PRODS), check the last sequence#,
SQL> select max(sequence#) from v$archived_log;
MAX(SEQUENCE#)
-------------
6504
Data Guard Failover Steps (Continued…)
• On the standby (PRODS),
SQL> alter database recover managed standby database finish force;
Database altered
On the standby (PRODS),
SQL> alter database commit to switchover to primary; Database altered
• On the standby instance (PRODS),
SQL> shutdown immediate
ORA-01507: database not mounted
ORACLE instance shut down.
Data Guard Failover Steps (Continued…)
• SQL> startup
ORACLE instance started.
Total System Global Area 2147483648 bytes
Fixed Size 2074152 bytes
Variable Size 1056967128 bytes
Database Buffers 1073741824 bytes
Redo Buffers 14700544 bytes
Database mounted.
Database opened.
At this point standby is opened as primary.
Data Guard Failover Steps (Continued…)
• On the current primary (PRODS), issued the following query to make sure the table was brought over.
Select * from test; then drop the table.
• On the second standby (PRODC) instance verified the cascaded standby destination by querying v$archived_log.
• I had to bounce the cascaded standby instance (PRODC), and put it back in a recovery mode.
Data Guard Failover Steps (Continued…)
2-Node RAC as Standby
PROD1
Storage Array
2 node RAC + ASMStandby Database 2
Primary Database Data Guard
Standby Database 1
PRODS PRODC
PROD2
Data Guard Switchover Steps• Make sure only one instance on the RAC
system is mounted(PROD1).
• Make sure the FAL_SERVER and FAL_CLIENT parameters are set correctly on the primary as well as standby.
On the primary FAL_SERVER=standby instance FAL_CLIENT=primary instance
On the Physical Standby FAL_SERVER=primary instance FAL_CLIENT=standby instance
• On the primary (PRODS) issued the following
SQL> select switchover_status from v$database;
SWITCHOVER_STATUS
--------------------------------
TO STANDBY
• On the standby (PROD1) instance issued the following command
SQL> select switchover_status from v$database;
SWITCHOVER_STATUS
--------------------------------
TO PRIMARY
Data Guard Switchover Steps (Continued…)
• On the primary (PRODS) issued the following command
SQL> ALTER DATABASE COMMIT TO SWITCHOVER TO PHYSICAL STANDBY;
• SQL> ALTER SYSTEM SET log_archive_dest_state_2='DEFER';
• Shutdown the primary instance (PRODS)
SQL> shutdown immediate;
SQL> Startup nomount;
SQL> alter database mount standby database;
Data Guard Switchover Steps (Continued…)
• On the standby (PROD1) issued the following command
SQL> ALTER DATABASE COMMIT TO SWITCHOVER TO PRIMARY;
• On the standby (PROD1)
SQL>shutdown immediate;
SQL> startup open;
• SQL>alter system switch logfile;
• Make sure other instance(PROD2) comes up as well.
Data Guard Switchover Steps (Continued…)
Data Guard Switchover
Storage Array
2 node RAC + ASMPrimary Database 1
Standby Database 1 Data
Guard
PROD PRODS
Standby Database 2
PRODC
Storage Array
2 node RAC + ASMPrimary Database
Storage Array
2 node RAC + ASMStandby Database
Final Configuration
Real Application Clusters• RAC misconception.• Always register the database and instances to the cluster.
• You have to use “netca” to register the listener to crs in 10g.
• Srvctl is “case sensitive”. • Always use single parameter file for multiple RAC nodes.
• Client side load balancing.• Server side load balancing.
ASM Configuration
LUN 1 LUN 2
LUN 3 LUN 4
LUN 5 LUN 6
LUN 7 LUN 8
Storage Group 1 Storage Group 2
DATA FLASH
ASM Disk Groups
Data FilesControl FilesOnline log files
Archive log filesRMAN backupsMirrored cntl and logFlashback Logs
ASM disks
Flash Recovery Area• Show parameter db_recovery NAME TYPE VALUE------------------------------------ -----------
--------------------------- db_recovery_file_dest string +FLASH_DG db_recovery_file_dest_size big integer 190000M
• SQL> select space_used/(1024*1024*1024),space_limit/(1024*1024*1024) from v$recovery_file_dest;
SPACE_USED (in GB) SPACE_LIMIT (in GB) ---------------------------
---------------------------- 55.6249833 185.546875
• SQL> select * from v$flash_recovery_area_usage;
CONTROLFILE 0 0 0
ONLINELOG 0 0 0
ARCHIVELOG 18.17 6.03 4691
BACKUPPIECE 0 0 0
IMAGECOPY 0 0 0
FLASHBACKLOG 11.81 11.73 1077
• ORA-19815: WARNING: db_recovery_file_dest_size of 53687091200 bytes is 87.63% used, and has 6642196992 remaining bytes available
Flash Recovery Area
• SQL> alter system set db_recovery_file_dest_size=55G scope=memory;
• SQL> select * from v$flash_recovery_area_usage;
CONTROLFILE 0 0 0ONLINELOG 0 0 0ARCHIVELOG 61.3 20.34 4692BACKUPPIECE 0 0 0IMAGECOPY 0 0 0FLASHBACKLOG 38.7 38.44 1046
Flash Recovery Area
• Issue the following RMAN commands
RMAN> crosscheck archivelog all;
RMAN> delete expired archivelog all;
• After “delete expired” command
CONTROLFILE 0 0 0
ONLINELOG 0 0 0
ARCHIVELOG 3.83 0 1274
BACKUPPIECE 0 0 0
IMAGECOPY 0 0 0
FLASHBACKLOG 51.23 50.93 1007
Cleanup of Archivelog files from Flash Recovery Area
Flashback Logs• SQL> show parameter db_flashback NAME TYPE
VALUE ------------------------------------
------------------------------------------ ------------------ db_flashback_retention_target integer
1440
• SQL> show parameter log_archive_min_succeed_dest;
NAME TYPE VALUE ------------------------------------ -----------
------------------------------ log_archive_min_succeed_dest integer
1
How to cleanup Flashback Logs from Flash Recovery Area• SQL> alter system set
log_archive_dest_1='LOCATION=/home/oracle/temp_archivelog' scope=memory;
• SQL> alter system set log_archive_dest_state_10=defer scope=memory;
• SQL> alter system set db_recovery_file_dest_size=55G scope=memory;
• Check the alert log for “deleted Oracle managed files” messages.
How to cleanup Flashback Logs from Flash Recovery Area• SQL> select * from v$flash_recovery_area_usage;
CONTROLFILE 0 0 0ONLINELOG 0 0 0ARCHIVELOG 92.19 0 1274BACKUPPIECE 0 0 0IMAGECOPY 0 0 0FLASHBACKLOG 7.3 0 6
• SQL> alter system set db_recovery_file_dest_size=190G scope=memory;
How to cleanup Flashback Logs from Flash Recovery Area
• SQL> alter system set log_archive_dest_1='' scope=memory;
• SQL> alter system set log_archive_dest_state_10=enable scope=memory
Conclusion and Best Practices• All configuration changes were performed
without any significant downtime outages to the production or standby databases.
• Various Oracle technologies, including RMAN, RAC, ASM, and Data Guard, were successfully utilized together to achieve the final high-availability solution.
• An awareness and understanding of the technologies available from Oracle, together with an innovative approach to implementation, were critical in building the environment while complying with the customer’s business needs.
World-wide Team of IT Professionals
Contact Information
Rama Balaji
(303) 985-2213
www.tusc.com
• Core Services– Oracle E-Business Suite
– Enterprise Performance Management (EPM)
– Oracle DBA, Database, and Infrastructure
– Business Intelligence and Data Warehousing
– Managed Services – remote support & hosting
– Oracle Fusion Middleware
– Oracle Hyperion
– Software
– Training