
  • Copyright 2012, Oracle and/or its affiliates. All rights reserved. 1

    Oracle Active Data Guard Performance Geovanni Vega Velasquez

    Database Brand Manager Oracle Mexico

  • Copyright 2012, Oracle and/or its affiliates. All rights reserved. 2

    Note to viewer

    These slides provide various aspects of performance data for Data Guard and Active Data Guard that we are in the process of updating for Oracle Database 12c.

    The deck can be shared with customers, but is not intended to be a canned presentation ready to go in its entirety.

    It provides SCs (sales consultants) with data that can be used to substantiate Data Guard performance or to provide focused answers to particular concerns that may be expressed by customers.

  • Copyright 2012, Oracle and/or its affiliates. All rights reserved. 3

    Note to viewer

    See this FAQ for more customer and sales collateral

    http://database.us.oracle.com/pls/htmldb/f?p=301:75:101451461043366::::P75_ID,P75_AREAID:21704,2

  • Copyright 2012, Oracle and/or its affiliates. All rights reserved. 4

    Agenda Data Guard Performance

    Failover and Switchover Timings

    SYNC Transport Performance

    ASYNC Transport Performance

    Primary Performance with Multiple Standby Databases

    Redo Transport Compression

    Standby Apply Performance

  • Copyright 2012, Oracle and/or its affiliates. All rights reserved. 5

    Data Guard 12.1 Example - Faster Failover

    [Chart: failover time vs. number of database sessions on primary and standby - 43 seconds and 48 seconds with 2,000 sessions on both primary and standby]

  • Copyright 2012, Oracle and/or its affiliates. All rights reserved. 6

    Data Guard 12.1 Example - Faster Switchover

    [Chart: switchover time vs. number of database sessions on primary and standby - 83 seconds with 500 sessions and 72 seconds with 1,000 sessions on both primary and standby]
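
    For context on the operations being timed above, role transitions in Oracle Database 12c can be driven either through the Data Guard broker (DGMGRL SWITCHOVER / FAILOVER commands) or directly from SQL*Plus. A minimal sketch of the SQL*Plus path, where 'boston' is a hypothetical standby DB_UNIQUE_NAME (not taken from these tests):

      -- Run as SYSDBA on the primary: pre-check only, no role change is performed
      ALTER DATABASE SWITCHOVER TO boston VERIFY;
      -- Planned role transition: primary and standby swap roles
      ALTER DATABASE SWITCHOVER TO boston;
      -- Unplanned outage of the primary: failover is issued on the standby instead
      ALTER DATABASE FAILOVER TO boston;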

  • Copyright 2012, Oracle and/or its affiliates. All rights reserved. 7

    Agenda Data Guard Performance

    Failover and Switchover Timings

    SYNC Transport Performance

    ASYNC Transport Performance

    Primary Performance with Multiple Standby Databases

    Redo Transport Compression

    Standby Apply Performance

  • Copyright 2012, Oracle and/or its affiliates. All rights reserved. 8

    Synchronous Redo Transport

    Primary database performance is impacted by the total round-trip time for

    acknowledgement to be received from the standby database

    Data Guard NSS process transmits Redo to the standby directly from log buffer, in

    parallel with local log file write

    Standby receives redo, writes to a standby redo log file (SRL), then returns ACK

    Primary receives standby ACK, then acknowledges commit success to app

    The following performance tests show the impact of SYNC transport on the primary database using various workloads and network latencies.

    In all cases, transport was able to keep pace with redo generation (no transport lag).

    We are working on test data for Fast Sync (SYNC NOAFFIRM) in Oracle Database 12c (same process as above, but the standby acknowledges the primary as soon as redo is received in memory; it does not wait for the SRL write).

    Zero Data Loss
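
    As an illustration of the transport mode being measured, a SYNC destination is declared on the primary through LOG_ARCHIVE_DEST_n. A minimal sketch, assuming a hypothetical Oracle Net alias 'standby_tns' and DB_UNIQUE_NAME 'standby_db':

      -- Zero data loss: commit waits for the standby ACK issued after the SRL write (AFFIRM)
      ALTER SYSTEM SET LOG_ARCHIVE_DEST_2 =
        'SERVICE=standby_tns SYNC AFFIRM NET_TIMEOUT=30 DB_UNIQUE_NAME=standby_db' SCOPE=BOTH;

      -- Fast Sync (12c): standby ACKs once redo is received in memory, before the SRL write
      ALTER SYSTEM SET LOG_ARCHIVE_DEST_2 =
        'SERVICE=standby_tns SYNC NOAFFIRM NET_TIMEOUT=30 DB_UNIQUE_NAME=standby_db' SCOPE=BOTH;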

  • Copyright 2012, Oracle and/or its affiliates. All rights reserved. 9

    Test 1) Synchronous Redo Transport

    Workload:

    Random small inserts (OLTP) to 9 tables with 787 commits per second

    132 K redo size, 1368 logical reads, 692 block changes per transaction

    Sun Fire X4800 M2 (Exadata X2-8)

    1 TB RAM, 64 Cores, Oracle Database 11.2.0.3, Oracle Linux

    InfiniBand, seven Exadata cells, Exadata Software 11.2.3.2

    Exadata Smart Flash Cache, Smart Flash Logging, and Write-Back Flash Cache were enabled and provided significant gains

    OLTP with Random Small Insert < 1ms RTT Network Latency

  • Copyright 2012, Oracle and/or its affiliates. All rights reserved. 10

    Test 1) Synchronous Redo Transport

    [Chart: Test 1 results with a local standby]

  • Copyright 2012, Oracle and/or its affiliates. All rights reserved. 11

    Test 2) Synchronous Redo Transport

    Exadata X2-8, 2-node RAC database

    smart flash logging, smart write back flash

    Swingbench OLTP workload

    Random DMLs, 1ms think time, 400 users, 6,000+ transactions per second, 30MB/s peak redo rate (a different workload from Test 1)

    Transaction profile

    5K redo size, 120 logical reads, 30 block changes per transaction

    1 and 5ms RTT network latency

    Swingbench OLTP Workload with Metro-Area Network Latency

  • Copyright 2012, Oracle and/or its affiliates. All rights reserved. 12

    Test 2) Synchronous Redo Transport

    Swingbench OLTP Workload with Metro-Area Network Latency - 30 MB/s redo

    3% impact at 1ms RTT, 5% impact at 5ms RTT

    [Chart: Swingbench OLTP transactions per second]
    Baseline (no Data Guard):                  6,363 tps
    Data Guard SYNC, 1ms RTT network latency:  6,151 tps
    Data Guard SYNC, 5ms RTT network latency:  6,077 tps

  • Copyright 2012, Oracle and/or its affiliates. All rights reserved. 13

    Test 3) Synchronous Redo Transport

    Exadata X2-8, 2-node RAC database

    smart flash logging, smart write back flash

    Large insert OLTP workload

    180+ transactions per second, 83MB/s peak redo rate, random tables

    Transaction profile

    440K redo size, 6000 logical reads, 2100 block changes per transaction

    1, 2 and 5ms RTT network latency

    Large Insert OLTP Workload with Metro-Area Network Latency

  • Copyright 2012, Oracle and/or its affiliates. All rights reserved. 14

    Test 3) Synchronous Redo Transport

    83 MB/s redo

    [Chart: transaction rate at 1, 2 and 5ms RTT network latency]

  • Copyright 2012, Oracle and/or its affiliates. All rights reserved. 15

    Test 4) Synchronous Redo Transport

    Exadata X2-8, 2-node RAC database

    smart flash logging, smart write back flash

    Mixed workload with high TPS

    Swingbench plus large insert workloads

    26000+ txn per second and 112 MB/sec peak redo rate

    Transaction profile

    4K redo size, 51 logical reads, 22 block changes per transaction

    1, 2 and 5ms RTT network latency

    Mixed OLTP workload with Metro-Area Network Latency

  • Copyright 2012, Oracle and/or its affiliates. All rights reserved. 16

    Test 4) Synchronous Redo Transport - Mixed OLTP Workload with Metro-Area Network Latency

    Swingbench plus large insert - 112 MB/s redo

    3% impact at < 1ms RTT, 5% impact at 2ms RTT, 6% impact at 5ms RTT

                          No SYNC   0ms      2ms      5ms      10ms     20ms
    Txns/s                29,496    28,751   27,995   27,581   26,860   26,206
    Redo Rate (MB/sec)    116       112      109      107      104      102
    % Workload            100%      97%      95%      94%      91%      89%

    Note: 0ms latency on the graph represents values falling in the sub-1ms range.

  • Copyright 2012, Oracle and/or its affiliates. All rights reserved. 17

    Additional SYNC Configuration Details

    No system bottlenecks (CPU, IO or memory) were encountered during

    any of the test runs

    Primary and standby databases had 4GB online redo logs

    Log buffer was set to the maximum of 256MB

    OS max TCP socket buffer size set to 128MB on both primary and standby

    Oracle Net configured on both sides to send and receive 128MB, with an SDU of 32K

    Redo is being shipped over a 10GigE network between the two systems.

    Approximately 8-12 checkpoints/log switches are occurring per run

    For the Previous Series of Synchronous Transport Tests
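
    A hedged sketch of how settings like those above are usually expressed, assuming a Linux host and a hypothetical Oracle Net alias 'standby_tns' (the host name, service name, and alias are illustrative, not taken from the tests); values mirror the test configuration of 32K SDU and 128MB buffers:

      # /etc/sysctl.conf on primary and standby: raise the OS TCP socket buffer ceiling to 128MB
      net.core.rmem_max = 134217728
      net.core.wmem_max = 134217728

      # sqlnet.ora on both sides: 32K session data unit
      DEFAULT_SDU_SIZE = 32767

      # tnsnames.ora entry used for redo transport: request 128MB send/receive socket buffers
      standby_tns =
        (DESCRIPTION =
          (SEND_BUF_SIZE = 134217728)
          (RECV_BUF_SIZE = 134217728)
          (ADDRESS = (PROTOCOL = TCP)(HOST = standby-host)(PORT = 1521))
          (CONNECT_DATA = (SERVICE_NAME = standby_db)))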

  • Copyright 2012, Oracle and/or its affiliates. All rights reserved. 18

    Customer References for SYNC Transport

    Fannie Mae Case Study that includes performance data

    Other SYNC references

    Amazon

    Intel

    MorphoTrak (formerly the biometrics division of Motorola) - case study, podcast, presentation

    Enterprise Holdings

    Discover Financial Services, podcast, presentation

    Paychex

    VocaLink

  • Copyright 2012, Oracle and/or its affiliates. All rights reserved. 19

    Synchronous Redo Transport

    Redo rates achieved are influenced by network latency, redo-write

    size, and commit concurrency in a dynamic relationship with each other that will vary for every environment and application

    Test results illustrate how an example workload can scale with minimal

    impact to primary database performance

    Actual mileage will vary with each application and environment.

    Oracle recommends customers conduct their own tests, using their

    workload and environment. Oracle tests are not a substitute.

    Caveat that Applies to ALL SYNC Performance Comparisons

  • Copyright 2012, Oracle and/or its affiliates. All rights reserved. 20

    Agenda

    Failover and Switchover Timings

    SYNC Transport Performance

    ASYNC Transport Performance

    Primary Performance with Multiple Standby Databases

    Redo Transport Compression

    Standby Apply Performance

  • Copyright 2012, Oracle and/or its affiliates. All rights reserved. 21

    Asynchronous Redo Transport

    With ASYNC, the primary does not wait for acknowledgement from the standby

    A Data Guard NSA process transmits directly from log buffer in parallel with

    local log file write

    NSA reads from disk (online redo log file) if log buffer is recycled before redo

    transmission is completed

    ASYNC has minimal impact on primary database performance

    Network latency has little, if any, impact on transport throughput

    Uses Data Guard 11g streaming protocol & correctly sized TCP send/receive buffers

    Performance tests are useful to characterize max redo volume that ASYNC is

    able to support without transport lag

    Goal is to ship redo as fast as generated without impacting primary performance

    Near Zero Data Loss
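
    For reference, an ASYNC destination looks like the SYNC example shown earlier minus the wait semantics, and transport lag can be monitored from the standby. A minimal sketch using the same hypothetical names as before:

      -- Primary: ship redo asynchronously to the standby
      ALTER SYSTEM SET LOG_ARCHIVE_DEST_2 =
        'SERVICE=standby_tns ASYNC NOAFFIRM DB_UNIQUE_NAME=standby_db' SCOPE=BOTH;

      -- Standby: confirm there is no transport or apply lag
      SELECT name, value, time_computed
        FROM v$dataguard_stats
       WHERE name IN ('transport lag', 'apply lag');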

  • Copyright 2012, Oracle and/or its affiliates. All rights reserved. 22

    Asynchronous Test Configuration

    100GB online redo logs

    Log buffer set to the maximum of 256MB

    OS max TCP socket buffer size set to 128MB on primary and standby

    Oracle Net configured on both sides to send and receive 128MB

    Read buffer size set to 256 (_log_read_buffer_size=256) and archive buffers

    set to 256 (_log_archive_buffers=256) on primary and standby

    Redo is shipped over the InfiniBand network between primary and standby nodes (ensures that transport is not bandwidth constrained)

    Near-zero network latency, approximate throughput of 1200MB/sec.

    Details
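
    The two underscore parameters called out above are hidden parameters; a sketch of how such settings are typically applied (normally only under guidance from Oracle Support), assuming an spfile is in use:

      -- Larger read and archive buffers so transport keeps reading redo efficiently
      ALTER SYSTEM SET "_log_read_buffer_size" = 256 SCOPE=SPFILE;
      ALTER SYSTEM SET "_log_archive_buffers"  = 256 SCOPE=SPFILE;
      -- Both settings take effect after an instance restart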

  • Copyright 2012, Oracle and/or its affiliates. All rights reserved. 23

    ASYNC Redo Transport Performance Test

    Data Guard ASYNC transport can sustain very high rates: 484 MB/sec on a single node (Oracle Database 11.2), with zero transport lag

    Add RAC nodes to scale transport performance

    Each node generates its own redo thread and has a dedicated Data Guard transport process

    Performance will scale as nodes are added assuming adequate CPU, I/O, and network resources

    A 10GigE NIC on the standby receives data at a maximum of 1.2 GB/second

    The standby can be configured to receive redo across two or more instances

    [Chart: redo transport rate, single instance - 484 MB/sec]

  • Copyright 2012, Oracle and/or its affiliates. All rights reserved. 24

    Data Guard 11g Streaming Network Protocol

    High Network Latency has Negligible Impact on Network Throughput

    Streaming protocol is new with Data Guard 11g

    Test measured throughput with 0 to 100ms RTT

    ASYNC tuning best practices (see the worked example below)

    Set correct TCP send/receive buffer size = 3 x BDP (bandwidth delay product)

    BDP = bandwidth x round-trip network latency

    Increase log buffer size if needed to keep the NSA process reading from memory

    See support note 951152.1

    Query X$LOGBUF_READHIST to determine the buffer hit rate

    [Chart: ASYNC redo transport rate (MB/sec) at 0ms, 25ms, 50ms and 100ms network latency]
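
    To make the buffer-sizing rule concrete, here is a worked example under assumed numbers (a 1 Gbit/sec link with 25ms RTT, not one of the tested configurations), followed by the buffer-hit query mentioned above; the columns of X$LOGBUF_READHIST vary by release, so it is selected in full:

      -- Assumed link: 1 Gbit/sec = 125 MB/sec bandwidth, 25ms round-trip latency
      --   BDP         = 125 MB/sec * 0.025 sec  ~= 3.1 MB
      --   buffer size = 3 x BDP                 ~= 9.4 MB (TCP send/receive buffer size)
      -- Check how often redo is read from the log buffer rather than disk (run as SYS):
      SELECT * FROM x$logbuf_readhist;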

  • Copyright 2012, Oracle and/or its affiliates. All rights reserved. 25

    Agenda

    Failover and Switchover Timings

    SYNC Transport Performance

    ASYNC Transport Performance

    Primary Performance with Multiple Standby Databases

    Redo Transport Compression

    Standby Apply Performance

  • Copyright 2012, Oracle and/or its affiliates. All rights reserved. 26

    Multi-Standby Configuration

    A growing number of customers use multi-standby Data

    Guard configurations.

    Additional standbys are used for:

    Local zero data loss HA failover with remote DR

    Rolling maintenance to reduce planned downtime

    Offloading backups, reporting, and recovery from primary

    Reader farms that scale read-only performance

    This leads to the question: How is primary database

    performance affected as the number of remote transport

    destinations increases?

    [Diagram: Primary A with SYNC transport to local Standby B and ASYNC transport to remote Standby C]
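
    A minimal sketch of how the topology in the diagram is typically declared on the primary, using hypothetical service names standby_b and standby_c:

      -- Local standby B: synchronous, zero data loss
      ALTER SYSTEM SET LOG_ARCHIVE_DEST_2 =
        'SERVICE=standby_b SYNC AFFIRM NET_TIMEOUT=30 DB_UNIQUE_NAME=standby_b' SCOPE=BOTH;
      -- Remote standby C: asynchronous, for disaster recovery
      ALTER SYSTEM SET LOG_ARCHIVE_DEST_3 =
        'SERVICE=standby_c ASYNC NOAFFIRM DB_UNIQUE_NAME=standby_c' SCOPE=BOTH;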

  • Copyright 2012, Oracle and/or its affiliates. All rights reserved. 27

    Redo Transport in Multi-Standby Configuration

    Primary Performance Impact: 14 Asynchronous Transport Destinations

    [Charts: increase in CPU and change in redo volume compared to baseline, for 0 to 14 ASYNC destinations]

  • Copyright 2012, Oracle and/or its affiliates. All rights reserved. 28

    Redo Transport in Multi-Standby Configuration

    Primary Performance Impact: 1 SYNC and Multiple ASYNC Destinations

    [Charts: increase in CPU and change in redo volume compared to baseline, for zero, 1/0, 1/1, and 1/14 SYNC/ASYNC destinations]

  • Copyright 2012, Oracle and/or its affiliates. All rights reserved. 29

    Redo Transport for Gap Resolution

    Standby databases can be configured to request log files needed to

    resolve gaps from other standbys in a multi-standby configuration

    A standby database that is local to the primary database is normally

    the preferred location to service gap requests

    Local standby databases are least likely to be impacted by network outages

    Other standbys are listed next

    The primary database services gap requests only as a last resort

    Utilizing a standby for gap resolution avoids any overhead on the primary

    database
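
    The preference described above is commonly expressed through the FAL_SERVER parameter on each standby. A sketch, again with hypothetical names, where remote standby C asks local standby B first and the primary last:

      -- On remote standby C: gap (fetch archive log) requests try standby B before the primary
      ALTER SYSTEM SET FAL_SERVER = 'standby_b, primary_a' SCOPE=BOTH;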

  • Copyright 2012, Oracle and/or its affiliates. All rights reserved. 30

    Agenda

    Failover and Switchover Timings

    SYNC Transport Performance

    ASYNC Transport Performance

    Primary Performance with Multiple Standby Databases

    Redo Transport Compression

    Standby Apply Performance

  • Copyright 2012, Oracle and/or its affiliates. All rights reserved. 31

    Redo Transport Compression

    Conserve Bandwidth and Improve RPO when Bandwidth Constrained

    Test configuration: 12.5 MB/second bandwidth, 22 MB/second redo volume

    Uncompressed volume exceeds available bandwidth: the Recovery Point Objective (RPO) is impossible to achieve, with a perpetual increase in transport lag

    A 50% compression ratio results in volume < bandwidth, so the RPO can be achieved (the ratio will vary across workloads)

    Requires the Advanced Compression option

    [Chart: transport lag (MB) vs. elapsed time (minutes) - 22 MB/sec uncompressed vs. 12 MB/sec compressed]
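
    Compression is enabled per destination through the COMPRESSION attribute (which, as noted above, requires the Advanced Compression option). A minimal sketch, reusing the hypothetical remote destination from earlier:

      -- Compress redo sent to the bandwidth-constrained remote standby
      ALTER SYSTEM SET LOG_ARCHIVE_DEST_3 =
        'SERVICE=standby_c ASYNC NOAFFIRM COMPRESSION=ENABLE DB_UNIQUE_NAME=standby_c' SCOPE=BOTH;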

  • Copyright 2012, Oracle and/or its affiliates. All rights reserved. 32

    Agenda

    Failover and Switchover Timings

    SYNC Transport Performance

    ASYNC Transport Performance

    Primary Performance with Multiple Standby Databases

    Redo Transport Compression

    Standby Apply Performance

  • Copyright 2012, Oracle and/or its affiliates. All rights reserved. 33

    Standby Apply Performance Test

    Redo apply was first disabled to accumulate a large number of log files

    at the standby database. Redo apply was then restarted to evaluate

    max apply rate for this workload.

    All standby log files were written to disk in Fast Recovery Area

    Exadata Write Back Flash Cache increased the redo apply rate from

    72MB/second to 174MB/second using test workload (Oracle 11.2.0.3)

    Apply rates will vary based upon platform and workload

    Achieved volumes do not represent physical limits; they only represent the particular test case configuration and workload. Higher apply rates have been achieved in practice by production customers.
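
    The disable/restart sequence described above maps to two standard commands issued on the standby; a sketch using 11g real-time apply syntax:

      -- Stop redo apply so log files accumulate at the standby
      ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL;
      -- Restart real-time apply and measure the catch-up rate
      ALTER DATABASE RECOVER MANAGED STANDBY DATABASE
        USING CURRENT LOGFILE DISCONNECT FROM SESSION;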

  • Copyright 2012, Oracle and/or its affiliates. All rights reserved. 34

    Apply Performance at Standby Database

    Test 1: no write-back flash

    cache

    On Exadata x2-2 quarter rack

    Swingbench OLTP workload

    72 MB/second apply rate

    I/O bound during checkpoints

    1,762ms for checkpoint complete

    110ms DB File Parallel Write

  • Copyright 2012, Oracle and/or its affiliates. All rights reserved. 35

    Apply Performance at Standby Database

    Test 2: a repeat of the previous

    test but with write-back flash

    cache enabled

    On Exadata x2-2 quarter rack

    Swingbench OLTP workload

    174 MB/second apply rate

    Checkpoint completes in

    633ms vs 1,762ms

    DB File Parallel Write is

    21ms vs 110ms
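
    Apply rates like the 72 vs. 174 MB/second figures above can be observed on the standby while managed recovery runs; a small query sketch (item names can vary slightly by release):

      -- Apply-rate items reported by managed recovery; the UNITS column gives the unit (typically KB/sec)
      SELECT item, sofar, units
        FROM v$recovery_progress
       WHERE item IN ('Active Apply Rate', 'Average Apply Rate');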

  • Copyright 2012, Oracle and/or its affiliates. All rights reserved. 36

    Two Production Customer Examples

    Thomson-Reuters

    Data Warehouse on Exadata, prior to write-back flash cache

    While resolving an archive log gap, an average apply rate of 580MB/second was observed

    Allstate Insurance

    Data Warehouse ETL processing resulted in average apply rate over a 3

    hour period of 668MB/second, with peaks hitting 900MB/second

    Data Guard Redo Apply Performance

  • Copyright 2012, Oracle and/or its affiliates. All rights reserved. 37

    Redo Apply Performance for Different Releases

    Range of Observed Apply Rates for Batch and OLTP

    [Chart: high-end standby apply rates (MB/sec) for batch and OLTP workloads across Oracle Database 9i, Oracle Database 10g, Oracle Database 11g (non-Exadata), and Oracle Database 11g (Exadata)]
