oracle data guard 11g release 2 high availability to protect your business

50
1

Upload: sieger74

Post on 12-Nov-2015

15 views

Category:

Documents


1 download

DESCRIPTION

Oracle Data Guard 11g

TRANSCRIPT

  • *

  • Oracle Data Guard 11g Release 2:High Availability to Protect Your BusinessJoseph MeeksDirector, Product ManagementOracle USAMichael T. SmithPrincipal Member of Technical StaffOracle USAAris PrassinosDistinguished Member of Technical StaffMorphoTrak, SAFRAN Group

  • *

    Program

    Traditional approach to HAThe ultimate HA solution Active Data Guard 11.2ImplementationResources

  • *Buy Components That Never Fail

  • *Deploy HA Clusters That Never Fail

    (to compensate for components that fail)

  • *Hire People That Never Make Mistakes

    (to manage HA clusters that never fail)

  • *

  • *Three Production Examples(that never said never)

  • *Oracle - 90,000 UsersBeehive Office ApplicationsBeehive Oracles unified collaboration solutionEmail, instant messaging, conferencing, collaboration, calendarOracle Database 11.1.0.716 node RAC clusters98 Exadata storage cells / siteData GuardLocal standby for HAOffload read-only workloadOffload backupsRemote standby for DRDual purpose as test system

  • *Major Credit Card IssuerWebsite Authentication and AuthorizationSingle-Sign-On ApplicationInternal and external website authentication and authorization, including web access to personal accountsPrimary DatabaseOracle 10g - RACLocal standby database for HARemote Mirror Disaster RecoveryData GuardSYNCSAN mirroring - ASYNC

  • *MorphoTrakAris Prassinos - Distinguished Member of Technical StaffUS subsidiary of Sagem Scurit, SAFRAN Group

    Innovators in multi-modal Biometric Identification and VerificationFingerprint, palmprint, iris, facialPrintrak Biometrics Identification Solution

    Government and Commercial customers Law enforcement, border management, civil identificationSecure travel documents, e-passports, drivers licenses, smart cardsFacility / IT access control

    Recently chosen by the FBI as Biometric Provider for their Next Generation Identification Program http://www.sagem-securite.com/eng/site.php?spage=04010847

  • *MorphoTrakPrintrak Biometrics Identification SolutionGoal high availability and disaster recovery at minimal costOracle 11.1.0.7Oracle RAC, XML DB, SecureFiles, ASM15TB, 2MB/sec redo rateMixed OLTP read intensiveMorphTrak - Open World 2009 Session 307560

  • *

    Program

    Traditional approach to HAThe ultimate HA solution Active Data Guard 11.2ImplementationResources

  • *High Availability Attributes

    AttributeWhy Important1. Redundancy with isolation No single point of failure, failures stay put2. Zero data lossComplete protection, no recovery concerns3. Extreme performanceDeploy for any application4. Automatic failoverFast, predictable5. Full systems utilizationFast recovery, high return on investment6. Management simplicityReliable, reduced administrative costs

  • *ClusterProductionDatabase

    Redundancy with isolation Automatic failover Zero data loss Full systems utilization Extreme performance Management simplicity

  • *Cluster with Remote DR SitePrimaryDatabasePrimary SiteRemote SiteDisaster RecoveryASYNCSAN Mirroring

    Redundancy with isolation Automatic failover Zero data loss Full systems utilization Extreme performance Management simplicity

  • *Cluster with Remote DR SitePrimaryDatabasePrimary SiteRemote SiteDisaster RecoveryASYNCData GuardRemote StandbyDatabase

    Redundancy with isolation Automatic failover Zero data loss Full systems utilization Extreme performance Management simplicity

  • *Cluster with Data Guard Local and Remote StandbyPrimaryDatabasePrimary SiteRemote SiteDisaster RecoveryASYNCData GuardLocalStandbyDatabaseSYNCRemote StandbyDatabase

    Redundancy with isolation Automatic failover Zero data loss Full systems utilization Extreme performance Management simplicity

  • *Cluster with Data Guard Local and Remote StandbyPrimary SiteRemote SiteDisaster RecoveryASYNCData GuardPrimaryDatabaseRemote StandbyDatabase

    Redundancy with isolation Automatic failover Zero data loss Full systems utilization Extreme performance Management simplicity

  • *

    Program

    Traditional approach to HAThe ultimate HA solutionActive Data Guard 11.2ImplementationResources

  • *What is Active Data Guard?Data availability and data protection for the Oracle DatabaseUp to thirty standby databases in a single configurationPhysical standby used for queries, reports, test, or backupsPhysical StandbyDatabaseOpen Read-OnlyActive Standby SitePrimaryDatabasePrimary Site

  • *High Availability AttributesHow Does Active Data Guard Stack Up?

    AttributeWhy Important1. Redundancy with isolation No single point of failure, failures stay put2. Zero data lossComplete protection, no recovery concerns3. Extreme performanceDeploy for any application4. Automatic failoverFast, predictable5. Full systems utilizationFast recovery, high return on investment6. Management simplicityReliable, reduced administrative costs

  • *HA Attribute: Redundancy with IsolationData Guard Transport and ApplyOracle Data files

    Oracle InstancePrimary DatabaseOracle Data files

    Recovery dataOracle InstanceStandby DatabaseRecovery data

  • *HA Attribute: Redundancy with IsolationData IntegrityPrimary changes transmitted directly from SGAIsolates standby from I/O corruptionsSoftware code path on standby different than primaryIsolates standby from firmware and software errorsMultiple Oracle corruption detection checksData applied to the standby is logically and physically consistentStandby detects silent corruptions that occur at primaryHardware errors and data transfer faults that occur after Oracle receives acknowledgment of write-completeKnown-state of standby databaseOracle is open, ready for failover if needed

  • *StandbyRedo LogsRFSPrimary Online Redo LogsPrimaryDatabaseHA Attribute: Zero Data LossSynchronous redo transport

    SGA

    Redo Buffer

    ActiveStandbyDatabaseQueries, ReportsTesting & BackupsMRPUser TransactionsQueries, Updates, DDLMaximum Availability Protection Mode - Controlled by NET_TIMEOUT parameter of LOG_ARCHIVE_DEST_n - Default value 30 seconds in Data Guard 11g

  • *Automatic failoverDatabase downDesignated health-check conditionsOr at request of an applicationFailed primary automatically reinstated as standby databaseAll other standbys automatically synchronize with the new primaryHA Attribute: Automatic FailoverDatabaseObserverData Guard Fast-Start Failover

  • *HA Attribute: Automatic FailoverApplications Standby DatabaseDatabase Tier- Oracle Real Application ClustersApplication Tier - Oracle Application Server ClustersDatabase ServicesPrimary DatabasePrimary DatabaseStandby Database

  • *HA Attribute: Extreme PerformancePrimary DatabaseData Guard 11.2 SYNCRedo shipped in parallel with LGWR write to local online log fileLittle to no impact on response time when using SYNC in low latency network40% improvement over 11.1 on low latency LANnetwork latency

    Chart1

    7148

    71710

    72417

    No Sync

    11.1 Sync

    11.2 Sync

    Application Response Time in Milliseconds

    Sheet1

  • *HA Attribute: Extreme PerformanceStandby DatabaseData Guard 11.2 Redo ApplyAcross the board increase in apply ratesHigh query load on active standby does not impact applyRedo Apply is optimized to utilize Exadata I/O bandwidthImproved Apply Lag stat allows for finer grained monitoring of standby progress

    Chart1

    3080

    200615

    OLTP

    Batch

    Redo Apply Rates in MB/sec

    Sheet1

    OLTPBatch

    Trad. Hardware3080

    Exadata V2200615

    To resize chart data range, drag lower right corner of range.

  • *HA Attribute: Full Systems UtilizationActive Data Guard

    Real-time QueriesProductionDatabaseContinuous redo shipping, validation & apply Active Standby Database

  • *Standby is used as Production SystemMore scalableBetter performanceEliminate contention between read-wite and read-only workloadSimplify performance tuning2901,5302,610630Transactions / secAll servicesrun on primary databaseRead-only offloaded to standby

  • *Standby is used to Reduce Planned DowntimeDatabase rolling upgradesTransient Logical StandbyMigrations to ASM and/or RACTechnology refresh servers and storageWindows/Linux migrations *32bit/64bit migrations*Implement major database changes in rolling fashione.g. ASSM, initrans, blocksizeImplement new database features in rolling fashione.g. Advanced Compression, SecureFiles, Exadata Storage * see Metalink Note 413484.1

  • *Standby is used to Eliminate RiskData Guard Snapshot Standby Ideal for TestingReplay workload using Real Application Testing

  • *HA Attribute: Simple to ManageActive Data GuardAll data typesAll storage attributesAll DDLFewest moving partsBased on media recovery mature technologyHighest performanceGuaranteed EXACT replica of production

  • *HA Attribute: Simple to Manage

  • *

    Program

    Traditional approach to HAThe ultimate HA solutionActive Data Guard 11.2ImplementationResources

  • *Adding a Local Data Guard Standby Database

    PrimaryDatabasePrimary SiteRemote SiteDisaster RecoveryASYNCData GuardLocalStandbyDatabaseSYNCRemote StandbyDatabase

  • **Key ComponentsLocal physical standby Maximum AvailabilityActive Data GuardData Guard BrokerData Guard Observer and Fast-Start FailoverFlashback DatabaseFast Application Failover

  • **Implementation ConsiderationsData Guard Transport Tuning and ConfigurationLocal StandbyLow latency network (ideally less than 5ms)Maximum Availability Mode with SYNC transportSet NET_TIMEOUT to 10 seconds from default of 30Standby redo logs on fast storageRemote StandbyHigh network latencyASYNC transportPotentially increase log_buffer to ensure LNS reads from memory instead of disk (MetaLink Note 951152.1)Tune TCP socket buffer sizes and device queuesValue is a function of bandwidth and latencySee HA Best Practices

  • *Implementation ConsiderationsBasic ConfigurationFlashback DatabaseConfigure on all databases in the configurationAppropriately size Flash Recovery AreaFLASHBACK_RETENTION_PERIOD minimum of 60 minutesSee MetaLink Note 565535.1 for performance best practicesData Guard BrokerRequired for Fast-Start FailoverRequired for auto-restart of role specific database services (11.2)Required for Fast Application NotificationClose integration with RAC (ie apply instance failover)Simplified role transitions when using multiple standbysCheck MetaLink for Data Guard Broker bundled patchE.g. 10.2.0.4 bundle has backports of several Broker 11.1 features

  • *Implementation ConsiderationsFast-Start FailoverData Guard ObserverLocal standby is the Fast-Start Failover TargetDeploy Observer on 3rd host, independent of primary/standbySet FastStartFailoverThreshold10 seconds for single instance databases20 seconds plus time for node eviction for Oracle RACUse Oracle Enterprise Manager for Observer HAAuto restart of Observer on new host

  • **Implementation ConsiderationsConfiguring Client FailoverRole based services (11.2)Application service only runs on primary databaseAll primary and standby hostnames in ADDRESS_LIST / URLOutbound connect timeoutLimits amount of time spent waiting for connection to failed resourcesApplication notificationBreak clients out of TCP with Fast Application Notification eventsPre Data Guard 11.2 please refer to Client Failover Best Practiceshttp://www.oracle.com/technology/deploy/availability/pdf/MAA_WP_10gR2_ClientFailoverBestPractices.pdf

  • *The ResultAn HA architecture built on the assumption that eventually something will fail

  • *Ultimate High Availability

    PrimaryDatabasePrimary SiteRemote SiteDisaster RecoveryASYNCData GuardLocalStandbyDatabaseSYNCRemote StandbyDatabase

  • *Ultimate High AvailabilityPrimary SiteRemote SiteDisaster RecoveryASYNCData GuardPrimaryDatabaseRemote StandbyDatabase

    Redundancy with isolation Automatic failover Zero data loss Full systems utilization Extreme performance Management simplicity

  • *Start HerePrimary SiteRemote SiteDisaster RecoveryStandbyDatabaseRemote StandbyDatabaseASYNCData GuardPrimaryDatabaseSYNC

    Redundancy with isolation Automatic failover Zero data loss Full systems utilization Extreme performance Management simplicity

  • *Key Best Practices DocumentationHA Best Practices http://www.oracle.com/pls/db111/portal.portal_db?selected=14&frame=Active Data Guard and Redo Apply http://www.oracle.com/technology/deploy/availability/pdf/maa_wp_11gr1_activedataguard.pdfData Guard Redo Transport http://www.oracle.com/technology/deploy/availability/pdf/MAA_WP_10gR2_DataGuardNetworkBestPractices.pdfData Guard Fast-Start Failover http://www.oracle.com/technology/deploy/availability/pdf/MAA_WP_10gR2_FastStartFailoverBestPractices.pdfAutomating Client Failover (Data Guard 10g and 11gR1) http://www.oracle.com/technology/deploy/availability/pdf/MAA_WP_10gR2_ClientFailoverBestPractices.pdfManaging Data Guard Configurations with Multiple Standby Databases http://www.oracle.com/technology/deploy/availability/pdf/maa10gr2multiplestandbybp.pdfUsing your Data Guard Standby for Real Application Testing http://www.oracle.com/technology/deploy/availability/pdf/oracle-openworld-2008/298770.pdfS307560 Active / Active Configurations with Oracle Active Data Guard http://www.oracle.com/technology/deploy/availability/pdf/oracle-openworld-2009/307560.pdf

  • *HA Sessions, Labs, & Demos by Oracle DevelopmentSunday, 11 October Hilton Hotel Imperial Ballroom B 3:45p Online Application UpgradeMonday, 12 October Marriott Hotel Golden Gate B111:30a Introducing Oracle GoldenGate ProductsMonday, 12 October Moscone South1:00p Oracles HA Vision: Whats New in 11.2, Room 1034:00p Database 11g: Performance Innovations, Room 1032:30p Oracle Streams: What's New in 11.2, Room 3015:30p Comparing Data Protection Solutions, Room 102Tuesday, 13 October Moscone South11:30a Oracle Streams: Replication Made Easy, Room 30811:30a Backup & Recovery on the Database Machine, Room 30711:30a Next-Generation Database Grid Overview, Room 103 1:00p Oracle Data Guard: Whats New in 11.2, Room 104 2:30p GoldenGate and Streams - The Future, Room 270 2:30p Backup & Recovery Best Practices, Room 104 2:30p Single-Instance RAC, Room 300 4:00p Enterprise Manager HA Best Practices, Room 303Tuesday, 13 October Marriott Hotel Golden Gate B111:30a GoldenGate Zero-Downtime Application Upgrades 1:00p GoldenGate Deep Dive: Architecture for Real-TimeWednesday, 14 October Moscone South10:15a Announcing OSB 10.3, Room 30011:45a Active Data Guard, Room 103 5:00p Exadata Storage & Database Machine, Room 104Thursday, 15 October Moscone South 9:00a Empowering Availability for Apps, Room 30012:00p Exadata Technical Deep Dive, Room 307 1:30p Zero-Risk DB Maintenance, Room 103Hands-on Labs Marriott Hotel Golden Gate B2Monday 11:30a-2:00p Oracle Active Data Guard, Parts I & IIThursday 9:00a-11:30a Oracle Active Data Guard, Parts I & IIDemos Moscone West DEMOGroundsMon & Tue 10:30a - 6:30p; Wed 9:15a - 5:15pMaximum Availability Architecture (MAA), W-045Oracle Streams: Replication & Advanced Queuing, W-043Oracle Active Data Guard, W-048Oracle Secure Backup, W-044Oracle Recovery Manager & Flashback, W-046Oracle GoldenGate, 3709

  • *For More Informationsearch.oracle.comororacle.com/hadata guard

  • *

  • *

    ****HA is simple, right.

    Just purchase high end servers and storage arrays with redundant internal components.

    But its very expensive, and its still not a guarantee.

    For example, its not unusual to be approached by customers who have experienced SAN failures that have caused extended outages and data loss for their mission critical databases.*So what next deploy HA clusters that never fail. That will do it. Well ..

    How many folks in here are running HA clusters of one kind or another either Oracle RAC or third party cold failover clusters.

    Keep your hands up. Now, if that cluster has NEVER failed, bring your hand down.

    One of the HA challenges with a cluster for Oracle is that at its foundation, an HA cluster runs on a single Oracle Database. Failures that can impact the database can impact the entire cluster. Data corruptions caused by faulty system components for example. Or more frequently, the challenge posed by this next approach to achieving high availability .*You have invested in redundant components, and in HA clusters, that never fail of course, and now you simply make sure you hire people that never make mistakes. This reminds me of one of our customers who operated a large Oracle RAC cluster and was very happy with their high availability until a storage admin deleted the files containing the online logs for one of the RAC instances. On a Unix/Linux platform, LGWR continues writing to the deleted log file because it maintains an open file descriptor which keeps a reference count in the node. When LGWR tried to switch to the next log group, it crashed the instance and eventually all RAC instances died attempting instance recovery.

    The customer thought they were ok, after all they had a DR site maintained by array based remote-mirroring. But the mirroring software had deleted the same file at the remote mirror, replicating the human error to the remote site.

    The poor customer had an extended outage and data loss while they rebuilt their database from an earlier backup. Oh, and then there was the case where admin tripped over the power cords for the array and brought their cluster down.*When it comes to achieving high availability for your most critical systems please do me a favor, never say never again. You cant achieve high availability by assuming you can buy components and train people such that a failure will never happen. In spite of best efforts, failures are guaranteed, its only a mater of time. *Lets look at 3 production examples that never say never.*Ravi Sharma wrote:

    Hi Joe,It was American Express and the application was their in-house developed Single-Sign-On application that serves their internal and external website authentication and authorization, including the My Card Account application that we are all familiar with. Was about 5 to 6 years ago.

    I forgot to mention that a few years ago, I had a similar situation at a customer and the stated needs were active-active real-time replication across 2000 miles for an important/mission critical system. After some analysis we found that what they were looking for was really high availability with ability to transparently / quickly failover in case of a problem. When the requirements were "dissected" the customer realized that Oracle RAC with local (using Data Guard) and remote replication (using EMC SRDF - customer used this as a standard otherwise they could have used Data Guard for this as well) was an ideal solution for them than replicating massive amounts of data across two geographically different sites. This addressed all the scenarios that they were looking to protect against.*Each of the previous examples share the same HA attributes lets identify them and then drill down into the Oracle has a solution that can get you there.**No restart of mid tier**Data Guard and Real Application Clusters are complementary and are ideally used together - Maximum Availability Architecture

    Real Application Clusters provides high availabilityRapid and automatic recovery from node failures or an instance crashLow cost, application transparent scale-out using commodity hardware

    Data Guard provides data availability, data protection and disaster protection, preventing downtime and data lossMaintains synchronized copies of primary databaseProtects against disasters, data corruption and user errorsDoes not require expensive HW/SW mirroring

    **

    Active Data Guard is a loosely coupled architecture where standby databases are isolated from failures that can occur at the primary database. Primary database changes are transmitted directly from memory, isolating the standby from I/O corruptions that occur at the primary. The software code-path executed on an active standby is different from that of the primary isolating it from firmware and software errors that impact the primary. Oracle corruption-detection checks insure that data is logically and physically consistent before it is applied to a standby database. An active standby also acts as a canary in a coal mine by detecting silent corruptions that can occur at the primary due to hardware errors (memory, cpu, disk, NIC) and data transfer faults, preventing them from impacting the standby database.

    *7LNS Network round tripCommit acknowledgementNet_timeout

    *High query load on active standby does not impact apply 11.2. Bug 7212222 fixed in 11.2.

    *Open read-write while maintaining data protectionIdeal QA systemIdeal for use with Real Application Testing

    Refer audience to Larrys talk for more details****[wh] To learn more about Active Data Guard with Oracle Database 11g and how it fits into Oracles Maximum Availability Architecture framework go to oracle.com/ha, or simple type active data guard into search.oracle.com

    Thank you.

    **