best practices for high availability - oracle cloud...•not connecting when services resume...

35

Upload: others

Post on 13-Jun-2020

5 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Best Practices for High Availability - Oracle Cloud...•Not connecting when services resume •Receiving errors during planned maintenance •Attempting work at slow, hung, or dead
Page 2: Best Practices for High Availability - Oracle Cloud...•Not connecting when services resume •Receiving errors during planned maintenance •Attempting work at slow, hung, or dead

Copyright © 2016, Oracle and/or its affiliates. All rights reserved. |

Best Practices for High Availability.NET and Oracle Database

Alex KehSenior Principal Product ManagerServer TechnologiesSeptember 22, 2016

Page 3: Best Practices for High Availability - Oracle Cloud...•Not connecting when services resume •Receiving errors during planned maintenance •Attempting work at slow, hung, or dead

Copyright © 2016, Oracle and/or its affiliates. All rights reserved. |

Program Agenda

Application High Availability

Fast Application Notification

Planned Maintenance Outages

Unplanned Outages

1

2

3

4

3

Page 4: Best Practices for High Availability - Oracle Cloud...•Not connecting when services resume •Receiving errors during planned maintenance •Attempting work at slow, hung, or dead

Copyright © 2016, Oracle and/or its affiliates. All rights reserved. |

Application High Availability

4

Page 5: Best Practices for High Availability - Oracle Cloud...•Not connecting when services resume •Receiving errors during planned maintenance •Attempting work at slow, hung, or dead

Copyright © 2016, Oracle and/or its affiliates. All rights reserved. |

Application High Availability

•Applications hang on TCP/IP timeouts

•Connection requests hang when services are down

•Not connecting when services resume

•Receiving errors during planned maintenance

•Attempting work at slow, hung, or dead nodes

What if my application does not address high availability?

Performance issues not

reported in your favorite tools.

5

Page 6: Best Practices for High Availability - Oracle Cloud...•Not connecting when services resume •Receiving errors during planned maintenance •Attempting work at slow, hung, or dead

Copyright © 2016, Oracle and/or its affiliates. All rights reserved. |

Application High Availability

• Goal: Make Oracle .NET applications highly available

– Not just Oracle DB servers

• What is application HA?

– Applications handle DB and network failures transparently

– Developers add code to handle failures gracefully specific for their use case

• Benefits

– Improved end user service levels

– Less administrative time spent handling failover

6

.NET

Page 7: Best Practices for High Availability - Oracle Cloud...•Not connecting when services resume •Receiving errors during planned maintenance •Attempting work at slow, hung, or dead

Copyright © 2016, Oracle and/or its affiliates. All rights reserved. |

Application High Availability

• Database has numerous HA solutions

– RAC, Data Guard, GoldenGate, Global Data Services, RAC One Node, etc.

• What about for the middle-tier for client HA?

– ODP.NET works with each DB HA solution

– Main scenarios: planned maintenance and unplanned outages

– Errors and interruptions automatically masked to end users during outages

.NET

7

Page 8: Best Practices for High Availability - Oracle Cloud...•Not connecting when services resume •Receiving errors during planned maintenance •Attempting work at slow, hung, or dead

Copyright © 2016, Oracle and/or its affiliates. All rights reserved. |

Fast Application Notification(FAN)

8

Page 9: Best Practices for High Availability - Oracle Cloud...•Not connecting when services resume •Receiving errors during planned maintenance •Attempting work at slow, hung, or dead

Copyright © 2016, Oracle and/or its affiliates. All rights reserved. |

FAN in a Nut Shell

• FAN events specify what changed, where, when

– Server initiated messages

– ODP.NET apps receive these messages

– Adjusts connection behavior automatically

• ODP.NET subscribers receive FAN over either

– Oracle Notification Service (ONS): ODP.NET 12c and higher• Except unmanaged ODP.NET 12c with Oracle DB 11.2 or earlier

– AQ: ODP.NET 11g and lower

• ONS is faster, more scalable, eliminates firewall issue, supports Active Data Guard, and consolidates publish/subscribe service

Page 10: Best Practices for High Availability - Oracle Cloud...•Not connecting when services resume •Receiving errors during planned maintenance •Attempting work at slow, hung, or dead

Copyright © 2016, Oracle and/or its affiliates. All rights reserved. |

FAN• Rapid notification about DB and service state changes

• Event types

– Down• Received quickly to invoke failover

– Planned Down• Drains sessions for planned maintenance with no user interruption

– Up• Re-allocates sessions when services resume

– Load % • Advice to balance sessions for RAC locally and GDS globally

– Affinity• Advice when to keep conversation locality

Page 11: Best Practices for High Availability - Oracle Cloud...•Not connecting when services resume •Receiving errors during planned maintenance •Attempting work at slow, hung, or dead

Copyright © 2016, Oracle and/or its affiliates. All rights reserved. |

Global Data Services

• Extends FAN services to a global basis

– Access to FAN capabilities across clusters and across geographies

– RAC, Active Data Guard, and GoldenGate can participate

• Benefit– Optimizes utilization, HA, and performance

• ODP.NET connection pool enhanced for GDS– No code changes required

• Requires Oracle Database 12c and ODP.NET 12c

Page 12: Best Practices for High Availability - Oracle Cloud...•Not connecting when services resume •Receiving errors during planned maintenance •Attempting work at slow, hung, or dead

Copyright © 2016, Oracle and/or its affiliates. All rights reserved. |

Without GDS

Sales Service Sales Service

GoldenGate GoldenGate

Sales Global Service

With GDS

Global Data Services

Page 13: Best Practices for High Availability - Oracle Cloud...•Not connecting when services resume •Receiving errors during planned maintenance •Attempting work at slow, hung, or dead

Copyright © 2016, Oracle and/or its affiliates. All rights reserved. |

Planned Maintenance Outages

13

Page 14: Best Practices for High Availability - Oracle Cloud...•Not connecting when services resume •Receiving errors during planned maintenance •Attempting work at slow, hung, or dead

Copyright © 2016, Oracle and/or its affiliates. All rights reserved. |

• FAN alerts ODP.NET of impending downtime or service relocation

– ODP.NET stops allocating new connections to that DB node

– Idle connections closed

– Connections returned to the pool are closed

• Pool draining feature– Part of Fast Connection Failover (FCF)

• Shutdown commences when all connections are closed– Optionally set a maximum timeout upon which shutdown occurs

• End user experience: Little to no disruption

Planned Maintenance OutageConnection Pool Scenarios

Page 15: Best Practices for High Availability - Oracle Cloud...•Not connecting when services resume •Receiving errors during planned maintenance •Attempting work at slow, hung, or dead

Copyright © 2016, Oracle and/or its affiliates. All rights reserved. |

Pool Draining

• Set “Pooling=true” (default)

• Set ODP.NET connection string attribute “HA Events = true”

– In ODP.NET 12.2, set to true by default

• Recommend using Oracle Database 11.2.0.4+ and ODP.NET 11.2.0.4+

15

ODP.NET settings

Page 16: Best Practices for High Availability - Oracle Cloud...•Not connecting when services resume •Receiving errors during planned maintenance •Attempting work at slow, hung, or dead

Copyright © 2016, Oracle and/or its affiliates. All rights reserved. |

Pool Draining

• Data Guard 12.2 adds support for pool draining

– Ex. SWITCHOVER TO <database name> [WAIT <timeout in seconds> ];

• Switchovers require primary to shut down before standby comes up

– End users may request a connection during that time

– With no available DB to serve the request, users receive timeout errors

– But this can be prevented by “pausing” ODP.NET requests

Data Guard Switchovers

16

Page 17: Best Practices for High Availability - Oracle Cloud...•Not connecting when services resume •Receiving errors during planned maintenance •Attempting work at slow, hung, or dead

Copyright © 2016, Oracle and/or its affiliates. All rights reserved. |

Pool Draining

• ODP.NET ServiceRelocationConnectionTimeout setting blocks incoming connection requests until standby is up (or timeout is hit)

– Eliminates connection timeout errors

– Set in the settings section of the .NET config file

– Default = 90 seconds

• Oracle TNSNAMES settings complementary to blocking connection requests

– RETRY_COUNT • Number of times ADDRESS list traversed before connection attempt terminates

– RETRY_DELAY• Delay in seconds between subsequent retries for a connection

Data Guard Switchovers

17

Page 18: Best Practices for High Availability - Oracle Cloud...•Not connecting when services resume •Receiving errors during planned maintenance •Attempting work at slow, hung, or dead

Copyright © 2016, Oracle and/or its affiliates. All rights reserved. |

RETRY_COUNT and RETRY_DELAYTNSNAMES.ORA Sample File

alias =(DESCRIPTION_LIST =

(DESCRIPTION=

(RETRY_COUNT=10)(RETRY_DELAY=5)

(ADDRESS_LIST=(ADDRESS = . . .)(ADDRESS= . . .))

(CONNECT_DATA=(SERVICE_NAME=hr_svc))) .

(DESCRIPTION=

(RETRY_COUNT=10)(RETRY_DELAY=5)

(ADDRESS_LIST=(ADDRESS = . . .)(ADDRESS=. . .))

(CONNECT_DATA=(SERVICE_NAME=hr_svc2))))

Retry while service is unavailable

18

Page 19: Best Practices for High Availability - Oracle Cloud...•Not connecting when services resume •Receiving errors during planned maintenance •Attempting work at slow, hung, or dead

Copyright © 2016, Oracle and/or its affiliates. All rights reserved. |

Planned Maintenance Outage

• Use Transparent Application Failover (TAF)

– Upon planned outage, automatically reconnect to an available DB instance

– Restore session state – new in 12.2• i.e. NLS, timezone, action , module, etc.

– Reposition cursors to position they were on prior to failure

– Replay initial transaction statement – new in 12.2• Any DML or stored procedure

• TAF is an OCI level feature

– Unmanaged ODP.NET

Non-Connection Pool Scenarios

19

Page 20: Best Practices for High Availability - Oracle Cloud...•Not connecting when services resume •Receiving errors during planned maintenance •Attempting work at slow, hung, or dead

Copyright © 2016, Oracle and/or its affiliates. All rights reserved. |

Unplanned Outages

20

Page 21: Best Practices for High Availability - Oracle Cloud...•Not connecting when services resume •Receiving errors during planned maintenance •Attempting work at slow, hung, or dead

Copyright © 2016, Oracle and/or its affiliates. All rights reserved. |

Fast Connection Failover

• FAN alerts ODP.NET of immediate DB node, service, etc. failure

– ODP.NET stops allocating new connections to failed entity

– Idle connections closed

– Active connections are closed

• Benefits (assuming another DB instance is available)

– New users do not pick up invalid connections

– No need to clear pools explicitly

Unplanned Outage

21

Page 22: Best Practices for High Availability - Oracle Cloud...•Not connecting when services resume •Receiving errors during planned maintenance •Attempting work at slow, hung, or dead

Copyright © 2016, Oracle and/or its affiliates. All rights reserved. |

Transparent Application Failover

• Upon failure, automatically reconnect to an available DB instance (if available)

• Reposition cursors to position they were on prior to failure

• Restore session state – new in 12.2

– i.e. NLS, timezone, action , module, etc.

• Replay initial transaction statement – new in 12.2

– Any DML or stored procedure

• Benefit– Extends TAF use beyond read-only apps

Unplanned Outage

22

Page 23: Best Practices for High Availability - Oracle Cloud...•Not connecting when services resume •Receiving errors during planned maintenance •Attempting work at slow, hung, or dead

Copyright © 2016, Oracle and/or its affiliates. All rights reserved. |

Application Continuity (AC)

• Recover active queries and transactions seamlessly after unplanned DB outages from a recoverable error

• Masks hardware, software, network, storage errors, and timeouts

– Available with RAC, RAC One Node, and Active Data Guard

• Benefits

– AC replays in-flight work on recoverable errors

– No additional ODP.NET coding required

– Transparent to end users (except for a slight delay)

Unplanned Outage

Page 24: Best Practices for High Availability - Oracle Cloud...•Not connecting when services resume •Receiving errors during planned maintenance •Attempting work at slow, hung, or dead

Copyright © 2016, Oracle and/or its affiliates. All rights reserved. |

AC

• Set “Pooling=true” (default)

• “Application Continuity=true” (default) in the connection string

– New ODP.NET 12.2 attribute

• Requires ODP.NET 12.2 and Oracle DB 12.2

ODP.NET Settings

Page 25: Best Practices for High Availability - Oracle Cloud...•Not connecting when services resume •Receiving errors during planned maintenance •Attempting work at slow, hung, or dead

Copyright © 2016, Oracle and/or its affiliates. All rights reserved. |

Application Continuity Phases

1 – Normal Operation

• Pool marks database requests

• Oracle Client holds original calls, their inputs, and validation data

• Server decides which calls can & cannot be replayed

2 – Outage Phase 1: Reconnect

• ODP.NET checks replay is enabled

• Verifies re-connect timeliness

• Creates a new connection

• Checks target DB is valid

• Uses TG to force last commit outcome

3 – Outage Phase 2: Replay

• Client replays captured calls

• Ensures results returned to application match original

• On success, returns control to application

25

Page 26: Best Practices for High Availability - Oracle Cloud...•Not connecting when services resume •Receiving errors during planned maintenance •Attempting work at slow, hung, or dead

Copyright © 2016, Oracle and/or its affiliates. All rights reserved. |

AC

• Replay is disabled per request after

– Successful commit

– Some ALTER SESSION operations

– ADG with read/write DB links

• Distributed transactions

• Auto-commit calls

• AC will be enhanced to include more and more supported scenarios

Restrictions

Page 27: Best Practices for High Availability - Oracle Cloud...•Not connecting when services resume •Receiving errors during planned maintenance •Attempting work at slow, hung, or dead

Copyright © 2016, Oracle and/or its affiliates. All rights reserved. |

Transaction Guard

• ODP.NET can determine whether a transaction committed even upon a DB failure without custom coding

• Benefit

– Ensures accurate knowledge of transaction outcome

• App can query transaction outcome

– OracleConnection properties return transaction status

– OracleLogicalTransaction class

• Requires Oracle Database 12c and ODP.NET 12c

Page 28: Best Practices for High Availability - Oracle Cloud...•Not connecting when services resume •Receiving errors during planned maintenance •Attempting work at slow, hung, or dead

Copyright © 2016, Oracle and/or its affiliates. All rights reserved. |

Transaction Guard Scenario

1. ODP.NET receives FAN down event or error

2. IsRecoverable=false roll backIsRecoverable=true re-submit

3. New for 12.1.0.2 – To re-submit, retrieve OracleConnection.OracleLogicalTransaction

4. New for 12.1.0.2 – Retrieve transaction status with OracleLogicalTransaction.GetOutcome.

5. If committed and completed, done.If not committed nor completed, re-submit.

Page 29: Best Practices for High Availability - Oracle Cloud...•Not connecting when services resume •Receiving errors during planned maintenance •Attempting work at slow, hung, or dead

Copyright © 2016, Oracle and/or its affiliates. All rights reserved. |

Transaction GuardODP.NET Catch Block Code Sample

29

catch (Exception ex){

if (ex is OracleException){

// It's safe to re-submit the work if the error is // recoverable and the transaction has not been committed

if (ex.IsRecoverable && ex.OracleLogicalTransaction != null && !ex.OracleLogicalTransaction.Committed)

{// safe to re-submit work

}else{

// do not re-submit work}

}}

Page 30: Best Practices for High Availability - Oracle Cloud...•Not connecting when services resume •Receiving errors during planned maintenance •Attempting work at slow, hung, or dead

Copyright © 2016, Oracle and/or its affiliates. All rights reserved. |

Oracle Database 12.2 HA Support MatrixFast Application

Notification (FAN)Runtime Load

Balancing (RLB)Transparent Application

Failover (TAF)

Transaction Guard (TG)

ApplicationContinuity

(AC)

Real Application Clusters (RAC, RAC One)

(RAC)

Data Guard (DG) physical standby

With clusterware x x

Active Data Guard (ADG)

With clusterware x

RAC+DG (physical standby)

(within RAC)

Global Data Services (GDS)

Golden Gatex x (no DML retry) x x

30

Page 31: Best Practices for High Availability - Oracle Cloud...•Not connecting when services resume •Receiving errors during planned maintenance •Attempting work at slow, hung, or dead

Copyright © 2016, Oracle and/or its affiliates. All rights reserved. |

Additional Oracle .NET Resources

OTNotn.oracle.com/dotnet

Twittertwitter.com/OracleDOTNET

YouTubeyoutube.com/OracleDOTNETTeam

[email protected]

Page 32: Best Practices for High Availability - Oracle Cloud...•Not connecting when services resume •Receiving errors during planned maintenance •Attempting work at slow, hung, or dead

Copyright © 2016, Oracle and/or its affiliates. All rights reserved. |

Q&A

32

Page 33: Best Practices for High Availability - Oracle Cloud...•Not connecting when services resume •Receiving errors during planned maintenance •Attempting work at slow, hung, or dead

Copyright © 2016, Oracle and/or its affiliates. All rights reserved. |

Safe Harbor Statement

The preceding is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.

33

Page 34: Best Practices for High Availability - Oracle Cloud...•Not connecting when services resume •Receiving errors during planned maintenance •Attempting work at slow, hung, or dead

Copyright © 2016, Oracle and/or its affiliates. All rights reserved. | 34

Page 35: Best Practices for High Availability - Oracle Cloud...•Not connecting when services resume •Receiving errors during planned maintenance •Attempting work at slow, hung, or dead