Copyright © 2018, Oracle and/or its affiliates. All rights reserved. |
Eliminate Application Downtime with Oracle Database and .NET
Alex Keh Gairik ChakrabortySenior Principal Product Manager Senior DirectorServer Technologies Database Administration October 25, 2018 Epsilon
Copyright © 2018, Oracle and/or its affiliates. All rights reserved. |
Program Agenda
Application Continuous Availability
Fast Application Notification
Planned Maintenance Outages
Unplanned Outages
Real World Oracle HA .NET Experience: Epsilon
1
2
3
4
5
Copyright © 2018, Oracle and/or its affiliates. All rights reserved. |
Application Continuous AvailabilityOracle .NET Applications
3
Copyright © 2018, Oracle and/or its affiliates. All rights reserved. |
From High Availability to Continuous Availability
•Minimizes downtime
•In-flight work is lost
•Rolling maintenance at DB
•Predictable runtime performance
•Errors may be visible
•Design for single failure
•Basic HA building blocks
• Removes user perspective downtime
• In-flight work is preserved
• Maintenance is hidden
• Predictable outage performance
• Errors only if unrecoverable
• Handles multiple concurrent failures
• Builds on top of HA
4
High Availability Continuous Availability
Copyright © 2018, Oracle and/or its affiliates. All rights reserved. |
Application Continuous Availability
• Goal: Make Oracle .NET applications continuously available
– Not just Oracle DB servers
• What does that mean?
– Applications handle recoverable errors, such as DB and network failures, transparently
– Developers add no to small amounts of code to handle failures gracefully
• Benefits
– Improved end user service levels
– Less administrative time spent handling failover
6
.NET
Copyright © 2018, Oracle and/or its affiliates. All rights reserved. |
Fast Application NotificationOracle .NET Applications
8
Copyright © 2018, Oracle and/or its affiliates. All rights reserved. |
FAN in a Nut Shell
• FAN events specify what changed, where, when
– Server initiated messages
– ODP.NET apps receive these messages
– Adjusts connection behavior automatically
• ODP.NET subscribers receive FAN over either
– Oracle Notification Service (ONS): ODP.NET 12c and higher• Except unmanaged ODP.NET 12c with Oracle DB 11.2 or earlier
– AQ: ODP.NET 11g and lower
• ONS is faster, more scalable, eliminates firewall issue, supports Active Data Guard, and consolidates publish/subscribe service
Copyright © 2018, Oracle and/or its affiliates. All rights reserved. |
FAN• Rapid notification about DB and service state changes
• Event types
– Down• Received quickly to invoke failover
– Planned Down• Drains sessions for planned maintenance with no user interruption
– Up• Re-allocates sessions when services resume
– Load % • Advice to balance sessions for RAC locally and GDS globally
– Affinity• Advice when to keep conversation locality
Copyright © 2018, Oracle and/or its affiliates. All rights reserved. |
Planned Maintenance OutagesOracle .NET Applications
13
Copyright © 2018, Oracle and/or its affiliates. All rights reserved. |
• FAN alerts ODP.NET of impending downtime or service relocation
–ODP.NET stops allocating new connections to that DB node
– Idle connections closed
– Connections returned to the pool are closed
• Pool draining feature– Part of Fast Connection Failover (FCF)
• Shutdown commences when all connections are closed– Optionally set a maximum timeout upon which shutdown occurs
• End user experience: Little to no disruption
Planned Maintenance OutageConnection Pool Scenarios
Copyright © 2018, Oracle and/or its affiliates. All rights reserved. |
Pool Draining
• Relocate service example
srvctl relocate service –database <unique database name> –service <service name> –drain_timeout 120 –stopoption IMMEDIATE –oldinst <existing instance>
• ODP.NET settings
– “Pooling=true” (default)
– “HA Events = true” – connection pool attribute• In ODP.NET 12.2+, set to true by default
• Recommend using Oracle Database 11.2.0.4+ and ODP.NET 11.2.0.4+
15
Settings
Copyright © 2018, Oracle and/or its affiliates. All rights reserved. |
Pool Draining
• Data Guard 12.2 adds support for pool draining
– Ex. SWITCHOVER TO <database name> [WAIT <timeout in seconds> ];
• Switchovers require primary to shut down before standby comes up
– End users may request a connection during that time
–With no available DB to serve the request, users receive timeout errors
– But this can be prevented by “pausing” ODP.NET requests
Data Guard Switchovers
16
Copyright © 2018, Oracle and/or its affiliates. All rights reserved. |
Pool Draining
• ODP.NET ServiceRelocationConnectionTimeout setting blocks incoming connection requests until standby is up or timeout expires
– Eliminates/Reduces connection timeout errors
– Set in .NET config file or OracleConfiguration class
– Default = 90 seconds
• Oracle TNSNAMES settings complementary to blocking connection requests
– RETRY_COUNT• Number of times ADDRESS list traversed before connection attempt terminates
– RETRY_DELAY• Delay in seconds between subsequent retries for a connection
Data Guard Switchovers
17
Copyright © 2018, Oracle and/or its affiliates. All rights reserved. |
RETRY_COUNT and RETRY_DELAYTNSNAMES.ORA Sample File
alias =(DESCRIPTION_LIST =
(DESCRIPTION=
(RETRY_COUNT=10)(RETRY_DELAY=5)
(ADDRESS_LIST=(ADDRESS = . . .)(ADDRESS= . . .))
(CONNECT_DATA=(SERVICE_NAME=hr_svc))) .
(DESCRIPTION=
(RETRY_COUNT=10)(RETRY_DELAY=5)
(ADDRESS_LIST=(ADDRESS = . . .)(ADDRESS=. . .))
(CONNECT_DATA=(SERVICE_NAME=hr_svc2))))
Retry while service is unavailable
18
Copyright © 2018, Oracle and/or its affiliates. All rights reserved. |
Unplanned OutagesOracle .NET Applications
20
Copyright © 2018, Oracle and/or its affiliates. All rights reserved. |
Fast Connection Failover
• FAN alerts ODP.NET of immediate DB node, service, etc. failure
–ODP.NET stops allocating new connections to failed entity
– Idle connections closed
– Active connections are closed
• Benefits (assuming another DB instance is available)
– New users do not pick up invalid connections
– No need to clear pools explicitly
Unplanned Outage
21
Copyright © 2018, Oracle and/or its affiliates. All rights reserved. |
Application Continuity (AC)
• Recover active queries and transactions seamlessly after unplanned DB outages from a recoverable error
• Masks hardware, software, network, storage errors, and timeouts
– Available with RAC, RAC One Node, and Active Data Guard
• Benefits
– AC replays in-flight work on recoverable errors
– No additional ODP.NET code required
– Transparent to end users (except for a slight delay)
Unplanned Outage
Copyright © 2018, Oracle and/or its affiliates. All rights reserved. |
• Built in session state tracking
• Restores and enforces session state before replay
• Discovers and advances request boundaries
• Keeps mutables for SQL
• Disables side-effects
• Adapts as applications change: apps protected for the future
24
DatabaseRequest
Errors/Timeouts hidden
TAC
Transparent Application Continuity (TAC)
Copyright © 2018, Oracle and/or its affiliates. All rights reserved. |
TAC Explained
Normal Operation
• ODP.NET marks DB requests
• Server tracks session state, decides which calls to replay, disables side effects
• ODP.NET holds original calls, their inputs, and validation data
Failover Phase 1: Reconnect
• Checks replay is enabled
• Verifies timeliness
• Creates a new connection
• Checks target database is legal for replay
• Uses Transaction Guard to guarantee commit outcome
Failover Phase 2: Replay
• Restores and verifies the session state
• Replays held calls, restores mutablesautomatically
• Ensures results, states, messages match original
• On success, returns control to the application
25
New with TAC
Copyright © 2018, Oracle and/or its affiliates. All rights reserved. |
Application Continuity
• Set “Pooling=true” (default)
• “Application Continuity=true” (default) in the connection string
– New ODP.NET 12.2 attribute
• AC requires unmanaged ODP.NET and DB 12.2 or higher
• TAC requires unmanaged ODP.NET and DB 18c
ODP.NET Settings
Copyright © 2018, Oracle and/or its affiliates. All rights reserved. |
Transaction Guard
• ODP.NET can determine whether transaction committed even upon a DB failure without custom coding
• Used by AC
• Benefit
– Ensures accurate knowledge of transaction outcome
• App can query transaction outcome
–OracleConnection properties return transaction status
–OracleLogicalTransaction class
• Requires Oracle Database 12c and ODP.NET 12c or higher
Copyright © 2018, Oracle and/or its affiliates. All rights reserved. |
Transaction Guard Scenario
1. ODP.NET receives FAN down event or error
2. IsRecoverable=false roll backIsRecoverable=true re-submit
3. To re-submit, retrieve OracleConnection.OracleLogicalTransaction
4. Retrieve transaction status with OracleLogicalTransaction.GetOutcome
5. If committed and completed, done.If not committed nor completed, re-submit.
Copyright © 2018, Oracle and/or its affiliates. All rights reserved. |
Transaction GuardODP.NET Catch Block Code Sample
29
catch (Exception ex){
if (ex is OracleException){
// It's safe to re-submit the work if the error is // recoverable and the transaction has not been committed
if (ex.IsRecoverable && ex.OracleLogicalTransaction != null && !ex.OracleLogicalTransaction.Committed)
{// safe to re-submit work
}else{
// do not re-submit work}
}}
©2014 Epsilon. Private & Confidential
Real world experience
Eliminate application downtime using ODP.NET and Oracle Database
Gairik Chakraborty
Senior Director , Database Administration , Epsilon
30
©2014 E
psilo
n D
ata
Managem
ent, L
LC
. Priv
ate
& C
onfid
entia
l
31
• Snapshot of Epsilon
• Real world experience of application high availability using ODP.NET and Oracle database
• Planned next steps
Agenda
©2014 E
psilo
n D
ata
Managem
ent, L
LC
. Priv
ate
& C
onfid
entia
l
32
• Epsilon is all-encompassing global marketing company, we are global leader in turning data-driven marketing into personalized customer experience lasting relationships.
• More than 9000 associates and70 offices worldwide
• Largest permission-based e-mailer in the world, delivering over 75 billion emails annually
• World’s leading source of data with information covering over 1.5B individualrecords and 278M devices
• More than 2,000 global clients, including 26 of the Fortune 1009 out of 10 Top Banks8 out of 10 Top Retailers9 out of Top 10 Pharmaceutical
Companies
Epsilon at a Glance
Co
pyrig
ht ©
Ep
silo
n 2
01
7 E
psilo
n D
ata
Ma
na
ge
me
nt, L
LC
. All rig
hts
reserv
ed.
33
We deliver personalized connections, build loyalty and drive business for
brands around the world
©2014 E
psilo
n D
ata
Managem
ent, L
LC
. Priv
ate
& C
onfid
entia
l
34
Application high availability using ODP.NET
34
©2014 E
psilo
n D
ata
Managem
ent, L
LC
. Priv
ate
& C
onfid
entia
l
35
• Loyalty real time marketing solution with extreme availability , performance and scalability requirement
• Deployment should support any public cloud as well as on premises – product offering is cloud first
• Platform supports real time POS integration with average call time from 200-500ms average and each API call can have multiple ( 20 – 100 ) sql internally
• Maintenance needs to be performed while system is online
• Scalable , Reliable and highly available infrastructure along with industry standard security and auditing
High Level Business Requirements
©2014 E
psilo
n D
ata
Managem
ent, L
LC
. Priv
ate
& C
onfid
entia
l
36
Challenges Before ODP.NET HA implementation
• No draining method available using dedicated connection model
• Planned maintenance required application service restart - major pain point
• No protection for application API calls during unplanned outages e.g. node failure
• Large number of dedicated connection uses higher amount of CPU and memory in application server
©2014 E
psilo
n D
ata
Managem
ent, L
LC
. Priv
ate
& C
onfid
entia
l
37
Application Load Balancer
Web Servers – ODP.NET Connection Pool
Web Service Call
OLTP Batch Read only
OLTP
Service
Batch
Service
RAC Node 1 RAC Node 2
Site 1
Primary DB
Fast Application Notification
Application Load Balancer
Web Servers – ODP.NET Connection Pool
OLTP Batch Read only
Report
Service
RRAC Node 1 RRAC Node 2
Site 2
Standby DB
Scheduled maintenance: Application service placement ( RAC )
Fast Application Notification
Active Data Guard
Web Service Call
©2014 E
psilo
n D
ata
Managem
ent, L
LC
. Priv
ate
& C
onfid
entia
l
38
Handling Planned Maintenance
• Ensure port 6200 is open in firewall between database and application server for Fast application notification messages from ONS
• Use Connection pool with proper settings required for application
<add key="DefaultConnectionString" value="User ID=<APP_USER>;Password=<PASSWORD>;Data Source=<TNS_ALIAS>;Decr Pool Size=5;Incr Pool Size=5;Max Pool Size=200;Min Pool Size=10;HA Events=True; Load Balancing=True; Connection Timeout=40" />
• For batch workload ( which doesn’t run all the time ) , recommend to use Min Pool size 0 for complete connection draining
©2014 E
psilo
n D
ata
Managem
ent, L
LC
. Priv
ate
& C
onfid
entia
l
39
Handling Planned Maintenance
• Ensure application connections are returned back to pool once work is done for smooth draining
• Set TraceOption=1 , TraceLevel=127 ( only during debugging)
for i in $( ls -l|awk '{print $9}' ); doecho filename: $icat "$i"|grep "OracleConnection"|grep -i Open|grep -i "(ENTRY)"|wc -lcat "$i"|grep "OracleConnection"|grep -i Close|grep -i "(ENTRY)"|wc -lcat "$i"|grep "OracleConnection"|grep -i Dispose|grep -i "(ENTRY)"|wc -l
done
• Disable trace before production implementation
• Resource utilization is much less while using connection pooling compared to dedicated connection
• Using ServiceRelocationConnectionTimeout along with fast connection failover ( HA events=true ) helps to hide errors during data guard switchover brownout period.
©2014 E
psilo
n D
ata
Managem
ent, L
LC
. Priv
ate
& C
onfid
entia
l
40
Handling Planned Maintenance
srvctl relocate service -db t1fusn -service OLTP_SERVICE -oldinstt1fusn1 -newinst t1fusn2 -drain_timeout 20 -stopoption Immediate
©2014 E
psilo
n D
ata
Managem
ent, L
LC
. Priv
ate
& C
onfid
entia
l
41
Handling Unplanned Outage
• Pre 12.2 , Transaction guard along with ODP.NET implementation can be used to handle unplanned outage
• Implementation is not out of the box and requires custom code in error handling section
• Application Continuity is out of the box solution to handle unplanned outage ( requires 12.2+ RDBMS and 12.2+ ODAC ) , uses transaction guard internally.
©2014 E
psilo
n D
ata
Managem
ent, L
LC
. Priv
ate
& C
onfid
entia
l
42
Application Load Balancer
Web Servers – ODP.NET Connection Pool
Web Service Call
OLTP Batch Read only
OLTP
Service
Batch
Service
RAC Node 1 RAC Node 2
Site 1
Primary DB
Application Load Balancer
Web Servers – ODP.NET Connection Pool
OLTP Batch Read only
Report
Service
RRAC Node 1 RRAC Node 2
Site 2
Standby DB
Unplanned Outage: Application service placement
Active Data Guard
FAN
AC ( USES TG )
Web Service Call
FAN
AC ( USES TG )
©2014 E
psilo
n D
ata
Managem
ent, L
LC
. Priv
ate
& C
onfid
entia
l
43
Handling Unplanned Outage• Connection Pool Settings
<add key="DefaultConnectionString" value="User ID=<APP_USER>;Password=<PASSWORD>;Data Source=<TNS_ALIAS>;Decr Pool Size=5;Incr Pool Size=5;Max Pool Size=200;Min Pool Size=10;HA Events=True;Load Balancing=True;Application Continuity=true;Connection Timeout=40" />
• Database Settings :
1. GRANT EXECUTE ON DBMS_APP_CONT TO <app_user>;2. GRANT KEEP SYSGUID to <app_user>;3. GRANT KEEP DATE TIME to <app_user>;4. SELECT 'grant KEEP SEQUENCE on '||sequence_owner||'.'||sequence_name||' to <app_user>;'
FROM dba_sequences WHERE sequence_owner LIKE '<schema_owner>';5. SELECT 'alter SEQUENCE '||sequence_owner||'.'||sequence_name||' KEEP;' FROM
dba_sequences WHERE sequence_owner LIKE '<schema_owner>‘;
©2014 E
psilo
n D
ata
Managem
ent, L
LC
. Priv
ate
& C
onfid
entia
l
44
Handling Unplanned Outage[oracle@dc1uorclrac01 trace]$ srvctl config service -db t1fusn -service oltp_serviceService name: OLTP_SERVICE.AQ HA notifications: true.
Commit Outcome: true => Transaction Guard EnabledFailover type: TRANSACTION => Application continuity enabledFailover method:TAF failover retries: 30TAF failover delay: 5.Connection Load Balancing Goal: LONGRuntime Load Balancing Goal: SERVICE_TIME.Retention: 86400 secondsReplay Initiation Time: 300 secondsDrain timeout: 30 seconds => 12.2 and 18cStop option: immediate => 12.2 and 18c.Preferred instances: t1fusn1Available instances: t1fusn2.
©2014 E
psilo
n D
ata
Managem
ent, L
LC
. Priv
ate
& C
onfid
entia
l
45
Handling Unplanned Outage
• Coverage Analysis :
Set trace : alter system set event='10602 trace name context forever, level 28:trace[progint_appcont_rdbms]:10702 trace name context forever, level 16' scope = spfile ;
Run odp.net application workload , initiate failure
Run orachk coverage analysis
./orachk -asmhome /home/oracle/jar/asm-all-5.0.3.jar -javahome /lfs/dba/java/jdk1.8.0_144 -apptrc/u01/app/oracle/diag/rdbms/t1fusn/t1fusn1/trace 1
Analyze output of orachk
Disable trace
©2014 E
psilo
n D
ata
Managem
ent, L
LC
. Priv
ate
& C
onfid
entia
l
46
Special Considerations
• Kill session while Application Continuity is enabled at service level should use noreplay clause:
alter system kill session 'sid, serial#, @inst' noreplay;alter system disconnect session 'sid, serial#, @inst' noreplay;
• odp.net CONNECT_TIMEOUT must be > ((RETRY_COUNT+1) * RETRY_DELAY)
OLTP_SERVICE =(DESCRIPTION =(CONNECT_TIMEOUT= 60)(RETRY_COUNT=5)(RETRY_DELAY=5)(TRANSPORT_CONNECT_TIMEOUT=3)
(ADDRESS = (PROTOCOL = TCP)(HOST = dc1uorclrac-scan.res.prod.global)(PORT = 1521))(CONNECT_DATA =(SERVER = DEDICATED)(SERVICE_NAME = oltp_service.res.prod.global)
))
©2014 E
psilo
n D
ata
Managem
ent, L
LC
. Priv
ate
& C
onfid
entia
l
47
Lessons learned
• Periodically apply Bundle Patch to avoid known issues• 12.1.0.2 => Patch 28563501 • 12.2.0.1 => Patch 28574555
• Encountered memory leak and crash from ODAC libraries. Issue was fixed after applying BP and following setting in web.config or app.config
©2014 E
psilo
n D
ata
Managem
ent, L
LC
. Priv
ate
& C
onfid
entia
l
48
Lessons learned
• Observed higher resource utilization in Web Servers ( 5 – 10% , it can vary depending on workload ) after enabling application continuity
• Change in transaction guard API for 12.2c:
https://docs.oracle.com/database/122/ODPNT/release_changes.htm#GUID-2B9620E7-F8AB-47AA-935C-B6B9E4893DF6
• For Application Continuity setup, refer following :
http://www.oracle.com/technetwork/database/options/clustering/application-continuity-wp-12c-1966213.pdf
©2014 E
psilo
n D
ata
Managem
ent, L
LC
. Priv
ate
& C
onfid
entia
l
49
Benefits after ODP.NET HA implementation
• Scheduled maintenance of Oracle technology stack can be done without disrupting business user experience. ( meet security compliance as well as uptime SLA )
• No longer application server restart required for planned maintenance or unplanned outage of oracle stack – a big relief
• CPU utilization reduced in application servers after connection pool implementation
• No application call failure even in case of unplanned outage improves user experience
©2014 E
psilo
n D
ata
Managem
ent, L
LC
. Priv
ate
& C
onfid
entia
l
50
Future plans
• Plan to deploy Oracle sharding with ODP.NET driver to implement true horizontally scalable multiple cloud global solution across any region as part of Epsilons global loyalty platform deployment plan
• POC and Implementation of ODP.NET core as part of Epsilon loyalty program deployment plan using .NET core
©2014 E
psilo
n D
ata
Managem
ent, L
LC
. Priv
ate
& C
onfid
entia
l
51
Q&A?
Copyright © 2018, Oracle and/or its affiliates. All rights reserved. |
Additional Oracle .NET Resources
OTNotn.oracle.com/dotnet
GitHubgithub.com/oracle/dotnet-db-samples/
Twittertwitter.com/OracleDOTNET
YouTubeyoutube.com/OracleDOTNETTeam
Copyright © 2018, Oracle and/or its affiliates. All rights reserved. |
Upcoming .NET Sessions – ThursdayOnline schedule: https://bit.ly/2yXG8oQ
• Hands on Lab: Building .NET Applications with Oracle
– Thursday – 12:00 PM – 1:00 PM Marriott Marquis (Yerba Buena Level) – Salon 3 / 4
•
Copyright © 2018, Oracle and/or its affiliates. All rights reserved. |
Questions and Answers
64