maximizing sql server availability - stratus … · page 1 of 16 maximizing sql server availability...


Page 1 of 16

MAXIMIZING SQL SERVER AVAILABILITY

By Michael Otey

Senior Technical Director for Windows IT Pro Magazine and SQL Server Magazine

EXECUTIVE SUMMARY

Maximizing availability is one of the most important goals of the database administrator. Today most businesses have ever-increasing availability needs, and the window of acceptable downtime is rapidly disappearing. Downtime is not just inconvenient for users; it can also be very costly in terms of lost revenue and lost customer confidence. In this whitepaper you’ll learn about the different technologies that Microsoft provides for improving SQL Server availability and about their requirements. You’ll also learn which availability scenarios each technology is best suited for. Finally, you’ll learn how the Stratus Avance and ftServer technologies can be used to maximize SQL Server availability by providing uptime beyond that of the standard Microsoft availability technologies.

THE COSTS OF DOWNTIME

Ensuring the availability of the database to support mission-critical applications is the database administrator’s top priority. If the database is unavailable, the applications that use it are rendered unusable. Today’s mission-critical applications need greater windows of availability than ever before, and many organizations require availability 24 hours a day, 7 days a week, 365 days a year.

For most businesses, downtime is much more than just an inconvenience. When a business’s applications aren’t available, there is an immediate loss of end-user productivity, which can result in a loss of revenue. For organizations running ERP and other manufacturing-based applications, downtime can result in the inability to accept orders, the interruption of inventory procurement, and even the disruption of the manufacturing process. Virtualization and the cloud multiply the cost of downtime. Cloud and virtualization platforms such as Hyper-V or vSphere typically don’t support just a single application; instead, they host multiple servers and applications. Downtime for those platforms is critical because it results in a widespread loss of availability for all of the services that the cloud or virtualization host supports.

In addition, for many organizations applications are a primary source of revenue, and downtime can have a massive impact on the organization’s bottom line. A study by Forrester Research showed that a 1-hour outage at the online brokerage firm eTrade resulted in a loss of $8 million. Similarly, a 10-hour outage at DELL was estimated to cost $83 million. More recently, an outage at PayPal was estimated to result in a loss of $2,000 per second in income across all of PayPal’s customers, or about $7.2 million per hour. According to Forrester, downtime costs the average company approximately $150,000 per hour.
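The PayPal figure converts directly from a per-second rate to an hourly one, and a flat hourly rate gives a rough outage estimate. The sketch below assumes that losses accrue linearly with outage length, which is a simplification; the function names and the 4-hour outage are illustrative and are not drawn from the studies cited.

```python
def hourly_from_per_second(loss_per_second: float) -> float:
    """Convert a per-second revenue loss into a per-hour figure."""
    return loss_per_second * 60 * 60


def outage_cost(cost_per_hour: float, outage_hours: float) -> float:
    """Estimate outage cost as a flat hourly rate times duration (a simplification)."""
    return cost_per_hour * outage_hours


# PayPal: $2,000 per second
paypal_per_hour = hourly_from_per_second(2_000)
print(f"PayPal: ${paypal_per_hour:,.0f} per hour")  # PayPal: $7,200,000 per hour

# Hypothetical 4-hour outage at Forrester's $150,000-per-hour average
print(f"Average company, 4-hour outage: ${outage_cost(150_000, 4):,.0f}")  # $600,000
```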

Monetary loss is only one of the costs associated with downtime. Downtime of mission-critical and public-facing applications carries other significant costs as well. When a company’s applications and services are unavailable, the result can also be a loss of customer confidence and damage to the company’s reputation.

BUILDING A HIGHLY AVAILABLE SQL SERVER ENVIRONMENT

Attaining the levels of availability that businesses require extends far beyond the simple data protection provided by traditional backup and restore technologies. Understanding your organization’s real availability requirements is the first step in choosing and implementing the high availability solution that’s best suited to it. While some organizations require 99.999% availability, this isn’t true for all organizations. The true availability requirements depend on the nature of the business. Once those requirements are determined, you can work toward building an IT infrastructure that provides the level of availability that’s needed.
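To put availability targets like 99.999% in perspective, the following sketch converts an availability percentage into a yearly downtime budget. It assumes a 365-day year and counts planned and unplanned downtime the same way; by this arithmetic, 99.999% availability leaves only about 5.26 minutes of downtime per year.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600; ignores leap years


def downtime_budget_minutes(availability_pct: float) -> float:
    """Yearly downtime allowed by an availability percentage."""
    return MINUTES_PER_YEAR * (1 - availability_pct / 100)


for pct in (99.0, 99.9, 99.99, 99.999):
    print(f"{pct}% availability -> {downtime_budget_minutes(pct):,.2f} minutes/year")
```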

Creating a highly available infrastructure depends on more than just selecting the right technologies. Technology doesn’t operate in a vacuum; it is used by people supporting the needs of the business. Three primary components work together to create a highly available operating environment:

People - Continuous availability begins with the organization’s personnel. Hiring the best people to run and manage your IT infrastructure is important for daily business operations. Having expert staff available is critical when unforeseen events arise and problems need troubleshooting. These essential skills can help avoid potential downtime and minimize it when it does occur, so properly maintaining them will positively influence system availability. Continued training is vital to keep operations personnel’s skills current, ensuring that they can perform both routine and emergency procedures correctly and efficiently.

Processes - To maintain maximum uptime, organizations must put efficient operating procedures into practice. They need to create procedural policies that ensure their infrastructure performs at its optimum level. One of the most important steps here is to create run books that document the organization’s standard operating procedures and outline the procedures to follow when troubleshooting problems and responding to emergencies. Run books can be particularly important during periods when the senior or most knowledgeable IT staff may not be available and junior DBAs need to handle operations. It’s equally important to test these procedures periodically to ensure that they remain applicable and accurate for the current infrastructure.

Technology - The role of technology in creating a highly available environment has two primary levels: technology solutions for availability are designed to address either planned or unplanned downtime. Planned downtime includes activities like routine maintenance, software updates, and hardware upgrades, where the IT staff initiates activity that could result in application downtime. Unplanned downtime covers site, system, or application failures that are not anticipated. SQL Server provides a number of different features to address planned and unplanned downtime. In addition, Stratus offers two product lines that provide even higher levels of availability. The Stratus Avance and ftServer products address both planned and unplanned downtime and are typically much easier to implement and manage than the Microsoft technologies.

SQL SERVER HIGH AVAILABILITY OPTIONS

In the following section you’ll learn about the different high availability technologies that Microsoft provides for SQL Server. For each technology you’ll see the type of downtime it is primarily intended to address, as well as its requirements and limitations.

The primary high availability technologies for SQL Server are Windows Failover Clustering, Database Mirroring, and Log Shipping. In addition, a number of other SQL Server features can serve to minimize downtime and increase availability.

FAILOVER CLUSTERING

Windows Failover Clustering, Microsoft’s primary availability technology, is designed to address server-level and site-level unplanned downtime. With Windows Failover Clustering, each physical server that participates in the cluster is called a node. If the primary node in a cluster fails, another node in the cluster can automatically take over, providing users with the same services as the failed node. This process is called a failover. When the failed node is repaired, services can be restored to it; this process is called a failback. Failback can be a manual process. Alternatively, if there’s one particular node you want the service to run on during normal operations, you can specify that node as the Preferred Owner, and failback can then be automatic.
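The failover and failback behavior just described can be illustrated with a small state model. This is a toy sketch for illustration only: ClusteredService and its methods are hypothetical names, not part of any Windows Server clustering API, and the model ignores timing, quorum, and the resource dependencies a real cluster tracks.

```python
from dataclasses import dataclass, field


@dataclass
class ClusteredService:
    """Toy model of a clustered service's ownership (hypothetical, not a Windows API)."""
    preferred_owner: str
    nodes: list[str]
    auto_failback: bool = True
    owner: str = field(init=False)
    up: set[str] = field(init=False)

    def __post_init__(self) -> None:
        self.owner = self.preferred_owner
        self.up = set(self.nodes)

    def node_failed(self, node: str) -> None:
        self.up.discard(node)
        if node == self.owner:
            # Failover: move the service to the first surviving node.
            survivors = [n for n in self.nodes if n in self.up]
            if not survivors:
                raise RuntimeError("no surviving nodes; the service is down")
            self.owner = survivors[0]

    def node_repaired(self, node: str) -> None:
        self.up.add(node)
        # Failback: return to the Preferred Owner only when failback is automatic.
        if self.auto_failback and node == self.preferred_owner:
            self.owner = node


svc = ClusteredService(preferred_owner="NODE1", nodes=["NODE1", "NODE2"])
svc.node_failed("NODE1")    # failover: svc.owner becomes "NODE2"
svc.node_repaired("NODE1")  # automatic failback: svc.owner returns to "NODE1"
```

With auto_failback=False, the service stays on NODE2 after the repair until an administrator moves it, which corresponds to the manual failback case.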

Windows Failover Clustering is provided only in Windows Server Enterprise edition and higher. Windows Server 2008 R2 Enterprise edition and higher support clusters of up to 16 nodes. The number of nodes allowed in a cluster also depends on the edition of SQL Server in use: SQL Server 2008 R2 Standard edition supports two-node clusters, while SQL Server 2008 R2 Enterprise edition and higher support clusters of up to 16 nodes. Windows Failover Clusters can be configured in an active-active arrangement, where all of the nodes perform work and assume additional workloads in the event of a node failure. Alternatively, they can be configured in an active-passive arrangement, where some nodes perform active work and other nodes sit unused until a failure occurs.

Each node in a Windows Failover Cluster must run a licensed copy of Windows Server Enterprise edition or higher. This is true even if you operate the cluster in an active-passive configuration. For example, the retail licensing cost for a four-node cluster running Windows Server 2008 R2 Enterprise is $15,996 ($3,999 x 4). This includes 25 CALs but does not include the licensing cost for the applications you intend to run. If you add SQL Server 2008 R2, you also need to add the cost of the SQL Server licenses, which depends on whether you use an active-active or an active-passive configuration. If you opt for an active-passive configuration, where only one SQL Server instance is active at a time, you need one SQL Server Enterprise edition license. The retail cost of a server license for SQL Server 2008 R2 Enterprise edition is $8,592, which makes the total software cost of a four-node active-passive cluster $24,588. If you opt for an active-active configuration, every active node must have a SQL Server license; with four active nodes the total cost is $50,364. You can find more information about Windows Server 2008 R2 licensing costs at: http://www.microsoft.com/windowsserver2008/en/us/pricing.aspx. More information about SQL Server licensing costs can be found at: http://www.microsoft.com/sqlserver/en/us/get-sql-server/how-to-buy.aspx.
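The cluster totals above follow from a simple formula: one Windows Server Enterprise license per node, plus one SQL Server Enterprise license per active SQL Server instance. The sketch below reproduces the quoted figures using the 2008 R2 retail prices from the text, which are long out of date; the constant and function names are illustrative, and CAL requirements beyond the included 25 are not modeled.

```python
WINDOWS_ENTERPRISE_PER_NODE = 3_999  # Windows Server 2008 R2 Enterprise, retail
SQL_ENTERPRISE_PER_SERVER = 8_592    # SQL Server 2008 R2 Enterprise server license, retail


def cluster_software_cost(nodes: int, active_sql_instances: int) -> int:
    """Windows on every node, plus one SQL Server license per active instance."""
    if not 1 <= active_sql_instances <= nodes:
        raise ValueError("active instances must be between 1 and the node count")
    windows = nodes * WINDOWS_ENTERPRISE_PER_NODE
    sql = active_sql_instances * SQL_ENTERPRISE_PER_SERVER
    return windows + sql


print(cluster_software_cost(4, 1))  # four-node active-passive: 24588
print(cluster_software_cost(4, 4))  # four-node active-active: 50364
```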

You can see an overview of Windows Failover Clustering in Figure 1.

Figure 1 - Windows Failover Clustering

Figure 1 depicts a simple two-node cluster. Each cluster node requires the Windows Server operating system to be installed on a local hard drive. In addition, for Windows Server 2008 and Windows Server 2008 R2 you must install the Failover Clustering feature. The cluster requires a private TCP/IP network, which it uses to send a heartbeat between the cluster nodes; the heartbeat is used to determine whether a cluster node has failed. A public network connects networked clients to the clustered resources. A Windows Failover Cluster also requires shared storage for the clustered services and the cluster quorum.
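Heartbeat-based failure detection can be sketched as a timeout on the last heartbeat received from each node. This is a simplified illustration assuming a fixed heartbeat interval and a fixed number of tolerated misses; HeartbeatMonitor and its default values are hypothetical, not the cluster service’s actual settings or algorithm.

```python
import time
from typing import Optional


class HeartbeatMonitor:
    """Flags a node as failed when its heartbeats stop for too long."""

    def __init__(self, interval_s: float = 1.0, missed_threshold: int = 5) -> None:
        self.interval_s = interval_s              # expected gap between heartbeats
        self.missed_threshold = missed_threshold  # misses tolerated before failing
        self.last_seen: dict[str, float] = {}

    def record_heartbeat(self, node: str, now: Optional[float] = None) -> None:
        self.last_seen[node] = time.monotonic() if now is None else now

    def failed_nodes(self, now: Optional[float] = None) -> list[str]:
        now = time.monotonic() if now is None else now
        limit = self.interval_s * self.missed_threshold
        return [n for n, seen in self.last_seen.items() if now - seen > limit]


mon = HeartbeatMonitor()
mon.record_heartbeat("NODE1", now=0.0)
mon.record_heartbeat("NODE2", now=0.0)
mon.record_heartbeat("NODE1", now=6.0)  # NODE2 has been silent for 6 seconds
print(mon.failed_nodes(now=6.0))         # ['NODE2']
```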

In normal operations the protected services run on the primary node until a failure occurs. The Windows clustering services then automatically perform a failover without any operator intervention and shift the services to the standby node. However, there is some downtime while the services are started on the standby server. From an operating system standpoint, the clustered services or application must be restarted on the backup node. Next, the application or service must perform its own startup tasks. For a database application like SQL Server, the length of a failover depends largely on the level of database activity at the time of failover. SQL Server records all of its database activity in the transaction log. After a failover, all committed transactions in the transaction log that have not yet been written to the data files must be applied to the database, and all uncommitted transactions must be rolled back to ensure database integrity. Very active systems with more transactions take longer to complete the failover process than smaller systems with fewer transactions to recover. The failover process for a very active enterprise database could take as much as 10 to 20 minutes. During the failover process, any client systems connected to the failed node are disconnected. When the networked clients reconnect, they are automatically connected to the cluster resources running on the backup node.
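The two recovery steps described above, applying logged changes and then rolling back uncommitted transactions, can be sketched on a toy page store. This follows the common "redo, then undo" structure but omits checkpoints, log sequence numbers, and the analysis pass a real database engine performs; LogRecord and recover are hypothetical names. In these terms, the Fast Recovery feature described under Additional SQL Server Availability Features makes the database available once the redo phase finishes, while the undo phase is still in progress.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogRecord:
    txn: str   # transaction that made the change
    page: str  # page that was changed
    old: str   # value before the change (used for undo)
    new: str   # value after the change (used for redo)


def recover(data_pages: dict[str, str], log: list[LogRecord],
            committed: set[str]) -> dict[str, str]:
    """Toy crash recovery: redo logged changes, then undo uncommitted ones."""
    pages = dict(data_pages)
    # Redo: replay the log so pages include changes that never reached disk.
    for rec in log:
        pages[rec.page] = rec.new
    # Undo: roll back uncommitted transactions, newest change first.
    for rec in reversed(log):
        if rec.txn not in committed:
            pages[rec.page] = rec.old
    return pages


log = [
    LogRecord("T1", "A", old="a0", new="a1"),  # T1 committed before the crash
    LogRecord("T2", "B", old="b0", new="b1"),  # T2 was still open
]
print(recover({"A": "a0", "B": "b0"}, log, committed={"T1"}))
# {'A': 'a1', 'B': 'b0'}
```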

DATABASE MIRRORING

Database Mirroring is another SQL Server high availability technology that’s primarily designed to protect against unplanned downtime, although it can also be implemented as a disaster recovery technology. Unlike Windows Failover Clustering, which provides server-level protection, Database Mirroring provides database-level protection for a specific database. With Database Mirroring, all of the transactions that occur on the principal server are captured and forwarded to the mirror server, which keeps the mirror server in sync with the principal server.

Database Mirroring doesn’t depend on a particular Windows Server edition: it runs on all versions of the Windows Server operating system that support SQL Server. Database Mirroring is supported in SQL Server 2008 R2 Standard edition and above. However, SQL Server Standard edition is limited to using Database Mirroring in High Safety mode only. More information about the different Database Mirroring modes is presented later in this section.

With Database Mirroring, if the principal database fails, a standby database on a second SQL Server system becomes available in a matter of seconds. Database Mirroring can be set up for a single database or for multiple databases, but each Database Mirroring session is a one-to-one relationship between the principal server and the mirror server. Figure 2 shows an overview of SQL Server’s Database Mirroring.

Figure 2 - Database Mirroring

When Database Mirroring is implemented for high availability, three SQL Server systems are required: the principal server, the mirror server, and a witness server. The principal server initially provides the database services. The mirror server maintains a copy of the databases from the principal server. The witness server helps determine when a server becomes unavailable and is required to implement automatic failover. Both the principal server and the mirror server must have a license for the Windows Server operating system, and both must also have a SQL Server license. The witness also requires a SQL Server instance, but it can use the free SQL Server Express edition. The minimum licensing cost to implement Database Mirroring using the Windows Server 2008 R2 Standard edition operating system and SQL Server 2008 R2 Standard edition is $4,214 (($1,209 x 2) + ($898 x 2)).
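The witness’s part in automatic failover can be sketched as a quorum check made from the mirror’s point of view: the mirror takes over only when it and the witness agree that the principal is unreachable, which keeps the mirror from promoting itself when it has merely lost its own link to the principal. This is a simplified model under stated assumptions; MirrorView and should_auto_failover are hypothetical names, not SQL Server objects. The High Safety condition reflects the rule, stated below, that automatic failover is only supported in synchronous mode.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MirrorView:
    """What the mirror knows at one moment (hypothetical model)."""
    high_safety: bool             # session runs synchronously
    has_witness: bool             # a witness is configured
    synchronized: bool            # mirror database is fully caught up
    sees_principal: bool          # mirror can reach the principal
    witness_sees_principal: bool  # witness reports it can reach the principal
    sees_witness: bool            # mirror can reach the witness


def should_auto_failover(v: MirrorView) -> bool:
    """Fail over only when mirror and witness both agree the principal is gone."""
    return (v.high_safety and v.has_witness and v.synchronized
            and v.sees_witness
            and not v.sees_principal
            and not v.witness_sees_principal)


crash = MirrorView(high_safety=True, has_witness=True, synchronized=True,
                   sees_principal=False, witness_sees_principal=False,
                   sees_witness=True)
link_loss = MirrorView(high_safety=True, has_witness=True, synchronized=True,
                       sees_principal=False, witness_sees_principal=True,
                       sees_witness=True)
print(should_auto_failover(crash))      # True: the principal really is down
print(should_auto_failover(link_loss))  # False: only the mirror lost its link
```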

Database Mirroring works by capturing transaction log records from the principal server and forwarding them to the mirror server. On the mirror server, the database is in a constant state of recovery and can’t be used until a failover occurs. However, the mirror server is not restricted to providing mirroring services; it can also actively support other databases that are not participating in the mirroring operation.

Database Mirroring operates in one of two modes: High Performance mode and High Safety mode. High Performance mode is asynchronous; it provides the highest level of performance but weaker transactional consistency between the principal and the mirror. In asynchronous mode, transactions made on the principal server are committed immediately, without waiting for the mirror server to acknowledge that it has written the data to its log. High Performance mode is primarily designed for disaster recovery scenarios where the mirror server is connected via a WAN link. Data loss is possible in High Performance mode because the principal server does not wait for confirmation from the mirror server: if the principal fails, or data is corrupted in transit, transactions that have not reached the mirror can be lost. High Safety mode operates synchronously and provides the highest level of data protection. In High Safety mode, a transaction is not committed on the principal until the mirror server acknowledges that it has hardened the corresponding transaction log records, which protects against data loss. High Safety mode is designed for high availability, and automatic failover is supported only in synchronous mode. However, the wait for acknowledgement also lowers performance. To compensate, you may want to run Database Mirroring in High Safety mode on more powerful server hardware with high-performance network connections.
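The difference between the two modes comes down to when the principal treats a transaction as committed relative to the mirror hardening its log. The sketch below models that ordering in memory; Principal, Mirror, and at_risk are hypothetical names, and real mirroring ships log records over the network rather than through method calls. at_risk lists the committed transactions that would be lost if the principal failed at that moment.

```python
from dataclasses import dataclass, field


@dataclass
class Mirror:
    hardened: list[str] = field(default_factory=list)  # log records on the mirror's disk

    def harden(self, record: str) -> None:
        """Write a log record to the mirror's log; returning models the acknowledgement."""
        self.hardened.append(record)


@dataclass
class Principal:
    mirror: Mirror
    synchronous: bool  # True = High Safety, False = High Performance
    committed: list[str] = field(default_factory=list)
    send_queue: list[str] = field(default_factory=list)

    def commit(self, record: str) -> None:
        if self.synchronous:
            # High Safety: the commit does not complete until the mirror acknowledges.
            self.mirror.harden(record)
        else:
            # High Performance: commit now and ship the log record later.
            self.send_queue.append(record)
        self.committed.append(record)

    def ship_pending(self) -> None:
        """Send queued log records to the mirror (asynchronous mode)."""
        while self.send_queue:
            self.mirror.harden(self.send_queue.pop(0))


def at_risk(p: Principal) -> list[str]:
    """Committed transactions that would be lost if the principal failed now."""
    return [r for r in p.committed if r not in p.mirror.hardened]


safe = Principal(Mirror(), synchronous=True)
fast = Principal(Mirror(), synchronous=False)
for txn in ("T1", "T2"):
    safe.commit(txn)
    fast.commit(txn)
print(at_risk(safe))  # []
print(at_risk(fast))  # ['T1', 'T2'] until ship_pending() runs
```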

Database Mirroring is limited to a single mirror database for each database that it protects. In addition, because Database Mirroring is not a server-level technology, you must manually make sure that each server has the server-level configuration, logins, and system database settings required to support the application that uses the mirrored database.

LOG SHIPPING

Log Shipping is another important SQL Server availability technology. However, while Windows Failover Clustering and Database Mirroring are focused primarily on increasing availability, Log Shipping is primarily designed as a disaster recovery technology intended to provide protection from server-level and site-level failures. Like Database Mirroring, Log Shipping works by forwarding transaction log entries from the primary server, but it can forward them to one or more standby servers.

Like Database Mirroring, Log Shipping doesn’t require specialized hardware and can be implemented on any system that’s capable of running SQL Server. Log Shipping is supported in SQL Server Standard edition and higher. The licensing cost to implement Log Shipping between two servers is the same as for Database Mirroring: at a minimum, a two-node Log Shipping implementation requires two instances of the Windows Server 2008 R2 Standard edition operating system and SQL Server 2008 R2 Standard edition, for a total licensing cost of $4,214.

You can see an overview of SQL Server Log Shipping in Figure 3.

Figure 3 - Log Shipping

Log Shipping is implemented using a primary SQL Server system and one or more standby SQL Server systems. The primary server contains the production database. Log Shipping is initialized by taking a full backup of the database to be protected on the primary server. The database is then restored to the standby servers. After the standby servers have restored the database, a SQL Server Agent job on the primary server runs a stored procedure that performs transaction log backups of the production database. The transaction log backups are then forwarded across the network to the standby servers. A SQL Server Agent job on each standby server periodically runs another stored procedure that applies the transaction log backups to the database on that server.
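The backup and restore cycle can be sketched with each log backup identified by a sequence number, on the simplifying assumption that a standby must apply log backups in order with no gaps (an unbroken log chain). Primary, Standby, and restore_job are hypothetical names, and the list standing in for a network share folds the copy step into the backup; actual Log Shipping is driven by SQL Server Agent jobs, not code like this.

```python
from dataclasses import dataclass, field


@dataclass
class Primary:
    next_seq: int = 1
    share: list[int] = field(default_factory=list)  # stands in for the backup share

    def backup_log(self) -> int:
        """Backup job: take a transaction log backup and place it on the share."""
        seq = self.next_seq
        self.share.append(seq)
        self.next_seq += 1
        return seq


@dataclass
class Standby:
    # 0 means only the initial full backup has been restored.
    restored_through: int = 0

    def restore_log(self, seq: int) -> None:
        expected = self.restored_through + 1
        if seq != expected:
            raise ValueError(f"log chain broken: expected backup {expected}, got {seq}")
        self.restored_through = seq


def restore_job(primary: Primary, standby: Standby) -> None:
    """Standby job: apply, in order, every log backup not yet restored."""
    for seq in sorted(primary.share):
        if seq > standby.restored_through:
            standby.restore_log(seq)


primary = Primary()
standbys = [Standby(), Standby()]
for _ in range(3):
    primary.backup_log()
for s in standbys:
    restore_job(primary, s)
print([s.restored_through for s in standbys])  # [3, 3]
```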

Unlike Windows Failover Clustering and Database Mirroring, SQL Server Log Shipping has no automated failover or failback process; all of the steps to connect end users to the standby server are manual. Also, like Database Mirroring, Log Shipping provides protection at the database level. This means that you must manually ensure that the SQL Server system used as the standby server has the server-level configuration, logins, and system database settings required to support the application.

ALWAYS ON

The upcoming SQL Server Denali release will include a new high availability feature named AlwaysOn. SQL Server AlwaysOn is essentially the next generation of Database Mirroring: Denali’s AlwaysOn technology combines the Windows Failover Clustering and Database Mirroring availability technologies. Like Database Mirroring, AlwaysOn provides protection at the database level. AlwaysOn addresses some important limitations of Database Mirroring. Database Mirroring is limited to a single mirror partner, and the database on the mirror server is always in a recovering state and can’t be accessed while mirroring is active. AlwaysOn supports up to four secondary systems, called replicas. The data in the secondary replicas is accessible and can be both queried and backed up, which can offload some of the workload from the primary server. To address the limitation of protecting only a single database, AlwaysOn uses a new concept known as Availability Groups to protect multiple databases. An Availability Group enables multiple databases to fail over together as a unit.

The SQL Server Denali release with AlwaysOn requires Windows Server 2008 R2, and all of the AlwaysOn replica systems must be part of a Windows Failover Cluster. Because Denali AlwaysOn requires Windows Failover Clustering, all of the replica systems must run Windows Server 2008 R2 Enterprise edition or higher.

Denali enables you to combine up to four replicas for availability. Two of the replicas can be synchronous, providing high availability and automatic failover. The other two replicas can be asynchronous, providing disaster recovery capabilities. The data on the replicas can be accessed, but only as read-only. In addition, unlike Database Mirroring, AlwaysOn supports the FILESTREAM data type and logins.
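An Availability Group’s configuration can be sketched as a small validator. The limits below follow this paper’s description of the prerelease Denali build (up to four replicas, two of them synchronous) and may not match the released product; AvailabilityGroup, Replica, and the database and server names are hypothetical. The databases sit in a single group so that they fail over as a unit.

```python
from dataclasses import dataclass

# Limits as this paper describes them for the prerelease Denali build.
MAX_REPLICAS = 4
MAX_SYNC_REPLICAS = 2


@dataclass(frozen=True)
class Replica:
    name: str
    synchronous: bool  # synchronous = HA and automatic failover; async = DR


@dataclass(frozen=True)
class AvailabilityGroup:
    name: str
    databases: tuple[str, ...]  # fail over together as a unit
    replicas: tuple[Replica, ...]

    def validate(self) -> None:
        if not self.databases:
            raise ValueError("an availability group needs at least one database")
        if len(self.replicas) > MAX_REPLICAS:
            raise ValueError(f"at most {MAX_REPLICAS} replicas are allowed")
        if sum(r.synchronous for r in self.replicas) > MAX_SYNC_REPLICAS:
            raise ValueError(f"at most {MAX_SYNC_REPLICAS} replicas can be synchronous")

    def auto_failover_targets(self) -> list[str]:
        """Only synchronous replicas are candidates for automatic failover."""
        return [r.name for r in self.replicas if r.synchronous]


ag = AvailabilityGroup(
    name="SalesAG",
    databases=("Orders", "Inventory"),
    replicas=(Replica("SQL1", True), Replica("SQL2", True),
              Replica("DR1", False), Replica("DR2", False)),
)
ag.validate()
print(ag.auto_failover_targets())  # ['SQL1', 'SQL2']
```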

The requirement for Windows Failover Clustering makes setting up and managing AlwaysOn more complex than either standard Windows Failover Clustering or Database Mirroring. Also, unlike Windows Failover Clustering, AlwaysOn is a database-level technology that doesn’t provide server-level protection.

ADDITIONAL SQL SERVER AVAILABILITY FEATURES

SQL Server also includes a number of other features designed to increase database availability. These are supported only in SQL Server Enterprise edition and higher. Some of the most important features include:

Hot-Add CPU – The ability to hot-add CPUs enables SQL Server to be dynamically reconfigured to meet changing workloads. This ability is most important for SQL Server instances running in virtual machines. Most physical servers don’t support this capability, but in a virtual machine the ability to hot-add CPUs allows a SQL Server instance to be reconfigured for increased performance while it is still running.

Hot-Add RAM – Like the ability to hot-add CPUs, the ability to hot-add RAM is most applicable to SQL Server instances running in virtual machines. Some physical servers do support this ability, but in a virtual machine hot-adding RAM lets the SQL Server instance adjust dynamically to changing workloads.

Database Snapshots – Database Snapshots allow a SQL Server database to be quickly reverted to the point in time when the snapshot was created. This is most useful for recovering from end-user or administrator errors that result in corrupted data. Creating a snapshot does not require the entire database to be copied. Instead, snapshots use copy-on-write: the first time a page in the source database changes after the snapshot is created, the page’s original contents are saved to the snapshot.

Fast Recovery – Fast Recovery makes the database available sooner following a restore operation. First the data files must be restored, and then all of the saved transactions must be reapplied. Restoring the transaction log has two phases: first, all of the committed transactions in the transaction log must be reapplied; second, all of the uncommitted transactions must be rolled back. Fast Recovery makes the database available as soon as all of the committed transactions have been reapplied, with no need to wait for the uncommitted transactions to be rolled back.

Partial Database Restore – A SQL Server database can consist of multiple filegroups, which lets you place different portions of the database on different storage types and locations. The ability to perform partial database restores makes the database available immediately after the primary filegroup is restored, enabling quicker availability following a disaster. Other filegroups can be restored later.

Online Indexing – SQL Server also provides the ability to perform online index rebuilds, where a selected index can be rebuilt without interrupting end-user access to the base tables.
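The copy-on-write behavior behind Database Snapshots can be sketched with an in-memory page store. This is a toy model, not SQL Server’s implementation: SourceDatabase and its methods are hypothetical names, pages are plain strings, and only writes to existing pages are supported. It shows why creating a snapshot is cheap (nothing is copied up front) and how reverting restores only the pages that changed.

```python
class SourceDatabase:
    """Toy page store with copy-on-write snapshots (not SQL Server's internals)."""

    def __init__(self, pages: dict[int, str]) -> None:
        self.pages = dict(pages)
        self.snapshots: list[dict[int, str]] = []

    def create_snapshot(self) -> dict[int, str]:
        # Creating a snapshot copies nothing; it starts out empty.
        snap: dict[int, str] = {}
        self.snapshots.append(snap)
        return snap

    def write(self, page_id: int, data: str) -> None:
        # Copy-on-write: save a page's original contents the first time it changes.
        # Only existing pages are supported (KeyError otherwise).
        for snap in self.snapshots:
            snap.setdefault(page_id, self.pages[page_id])
        self.pages[page_id] = data

    def read_snapshot(self, snap: dict[int, str], page_id: int) -> str:
        # Pages unchanged since the snapshot are read from the source database.
        return snap.get(page_id, self.pages[page_id])

    def revert(self, snap: dict[int, str]) -> None:
        # SQL Server likewise refuses to revert while other snapshots exist.
        if len(self.snapshots) != 1 or self.snapshots[0] is not snap:
            raise RuntimeError("revert requires this to be the only snapshot")
        self.pages.update(snap)  # put the saved originals back
        snap.clear()             # source and snapshot now match again


db = SourceDatabase({1: "orders v1", 2: "customers v1"})
snap = db.create_snapshot()
db.write(1, "orders v2 (bad update)")
print(db.read_snapshot(snap, 1), len(snap))  # orders v1 1
db.revert(snap)
print(db.pages[1])                           # orders v1
```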

One notable feature that some organizations use for availability, but that isn’t included in the previous section, is database replication. While replication can increase availability because of its ability to duplicate data to multiple databases, Microsoft never designed replication as an availability technology. Instead, database replication was designed to support distributed database applications and reporting. Replication has no automated failover or failback. Likewise, replication doesn’t provide any built-in support for server-level objects such as logins.

MAXIMIZING AVAILABILITY

While Microsoft’s built-in technologies can provide high levels of uptime, you can achieve even higher levels of availability by using Stratus’s high availability products. The Stratus products are designed to provide the organization with continuous availability for its mission-critical applications.

When planning for maximum availability, it’s important to understand the difference between high availability and continuous availability. High availability doesn’t mean there is no downtime; instead, high availability is about minimizing downtime. For instance, technologies like Windows Failover Clustering increase availability, but there is still downtime during the failover process. In contrast, continuous availability is the ability to withstand hardware failures and other system failures without any service interruption. The Stratus Avance and Stratus ftServer System products are capable of providing maximum levels of SQL Server availability; they are specifically designed to withstand component and server failures with no loss of availability. Research by the Standish Group on the causes of 50,000 downtime incidents revealed that, when downtime is analyzed according to the time that the system is unavailable, hardware outages accounted for 12% of all downtime, software failures for 19%, network outages for 14%, and operator error for 17%. The remainder was due to multiple other causes, including environmental issues, planned downtime, and malware. Protecting against hardware and software failure is essential for minimizing downtime. In the following section you’ll learn more about the Stratus Avance and ftServer availability technologies.
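The Standish percentages above leave the remainder implicit; the short calculation below makes it explicit using only the figures quoted. The dictionary name is illustrative.

```python
# Standish Group breakdown of downtime (share of unavailable time), as quoted above.
DOWNTIME_SHARE = {
    "hardware": 12,
    "software": 19,
    "network": 14,
    "operator error": 17,
}

named_total = sum(DOWNTIME_SHARE.values())
remainder = 100 - named_total  # environmental issues, planned downtime, malware, etc.
hw_plus_sw = DOWNTIME_SHARE["hardware"] + DOWNTIME_SHARE["software"]

print(f"Named causes: {named_total}%")            # Named causes: 62%
print(f"Other causes (remainder): {remainder}%")  # Other causes (remainder): 38%
print(f"Hardware + software: {hw_plus_sw}%")      # Hardware + software: 31%
```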

STRATUS AVANCE

Stratus Avance is a software-based high availability solution designed for small and medium-sized businesses. Avance is an affordable high availability solution because it can be implemented using industry-standard servers and doesn’t require any external storage. You can install Avance on any standard HP, Dell or IBM x86 server. It is specifically designed to be easy to deploy and manage, and its automated management tools make it an excellent solution for organizations with limited IT personnel. Avance can be used with both Windows Server and Linux operating systems, and it doesn’t require any application changes in order to take advantage of its continuous availability capabilities.

AVANCE ARCHITECTURE

Stratus Avance is simpler to implement than other availability technologies such as hardware server clustering. You implement Stratus Avance using two industry-standard x86 servers. Avance uses built-in virtualization provided by the high-performance XenServer hypervisor. The Avance software operates beneath the application and operating systems, and it manages the two servers as a single unit through a web-based remote monitoring and management console. The Avance system architecture is shown in Figure 5.

Figure 5 – The Stratus Avance Architecture

The Stratus Avance high availability software is installed on two nodes that are connected by a dedicated Ethernet network. The Avance software uses this Ethernet link to perform real-time data synchronization between the nodes, and the industry-standard x86 servers do not need to be identical. Avance can use any edition of Windows Server and SQL Server. A high availability environment built on Avance requires the Avance software, which is licensed for $5000, plus Windows Server 2008 R2 Standard edition at $1209 per instance and SQL Server 2008 R2 Standard edition at $895 per instance, for a total cost of $9208.
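The quoted total of $9208 only works out if two instances each of Windows Server and SQL Server are counted. This is an inference from the arithmetic, not something the paper states. The Python sketch below makes the calculation explicit. The prices are the ones quoted above, and the constant and function names are hypothetical.

```python
# List prices quoted in the text (USD).
AVANCE_LICENSE = 5000
WINDOWS_SERVER_2008_R2_STD = 1209  # per instance
SQL_SERVER_2008_R2_STD = 895       # per instance


def avance_setup_cost(windows_instances: int, sql_instances: int) -> int:
    """Total license cost for an Avance setup with the given instance counts."""
    return (
        AVANCE_LICENSE
        + windows_instances * WINDOWS_SERVER_2008_R2_STD
        + sql_instances * SQL_SERVER_2008_R2_STD
    )


if __name__ == "__main__":
    print(avance_setup_cost(1, 1))  # 7104: one instance of each
    print(avance_setup_cost(2, 2))  # 9208: two instances of each, matching the quoted total
```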

During normal operation the server workload is processed on the primary node, shown as Node A in Figure 5. Node A and Node B are connected via a 1 Gb Ethernet link. If the primary node experiences a failure, the secondary node can immediately take over with no downtime. The Avance software can work with network latencies of up to 10 ms, which allows up to approximately 3 miles of separation between the two nodes. This allows Stratus Avance to be used for both high availability and disaster recovery.

Avance features built-in virtualization that allows both Windows and Linux guests to run in virtual machines. These

virtual machines are where the active server workloads are processed. You can use Avance to protect all the

applications and workloads that are running in the different active virtual machines. Both nodes used by Stratus

Avance are managed as a single unit.

Many businesses have used the Avance software to achieve 99.99% uptime, which is far superior to what clustering solutions typically provide. 99.99% availability means that many of the organizations using Avance experience less than one hour of downtime per year. In addition to automatic failover, the Avance software proactively monitors over 150 different operating conditions and notifies support personnel of any pending system issues. Proactive monitoring reduces operator error and can prevent issues that could otherwise cause downtime. Avance’s proactive monitoring is fully integrated with Stratus 24/7/365 Support Services, which provide your organization with expert diagnostics and troubleshooting assistance that can help predict and prevent downtime.
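Availability percentages are easier to compare when they are converted into downtime per year. The Python sketch below performs that conversion for 99.99% and for the other availability figures quoted in this paper (99.995%, 99.999% and 99.9998%). It assumes a 365-day year and ignores leap years. The function name is illustrative.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours; leap years are ignored


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Convert an availability percentage into expected minutes of downtime per year."""
    unavailable_fraction = 1 - availability_percent / 100
    return unavailable_fraction * HOURS_PER_YEAR * 60


if __name__ == "__main__":
    # Availability figures quoted in this paper.
    for pct in (99.99, 99.995, 99.999, 99.9998):
        print(f"{pct}% -> {downtime_minutes_per_year(pct):.1f} minutes of downtime per year")
```

At 99.99%, the sketch gives about 52.6 minutes per year, which is consistent with the "less than one hour" figure above. Five nines (99.999%) comes to roughly 5.3 minutes per year.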

CASE STUDIES

Many businesses have successfully implemented the Stratus Avance technology to provide high availability for

their critical applications. In the following section you’ll see how doeLegal and the Tai Hing Catering Group use

Stratus Avance to provide maximum availability for critical SQL Server applications.

DOELEGAL

doeLegal provides cloud-based hosted legal services designed to support corporate legal departments as well as individual law firms. Using doeLegal services enables these businesses to move away from paper-based and other legacy information systems without needing to buy, deploy and manage their own internal information systems infrastructure. Uptime was the number one objective when doeLegal moved to its new cloud-based e-billing systems. As a cloud-based provider, doeLegal wanted 24/7 uptime to deliver round-the-clock services to its customers in an environment without a single point of failure.

doeLegal had previously used server hardware clustering for high availability but found that this solution didn’t provide the seamless availability its customers needed. Even with server clustering there is some downtime during a server failover. Instead, doeLegal wanted a solution that would provide customers with continuous access to their production data. By implementing the Stratus Avance solution, doeLegal was able to provide seamless high availability to the Microsoft Windows and SQL Server infrastructure that powered its cloud services. Russ Aaronson, Director of IT for doeLegal, asserted, “We found Avance gave us the high availability approach we needed in a 24/7 web-based environment.”

doeLegal used Avance to provide high availability by enabling industry-standard x86/x64 servers to work together to automatically overcome one or more component failures. If Avance detects a server problem, it isolates the fault and automatically migrates the workload to the secondary server with no interruption of end-user services. Web-based management monitors server health and automatically notifies administrators and the Stratus 24-hour support center of server problems.

You can learn more about doeLegal’s Avance implementation at:

http://www.stratus.com/~/media/Stratus/Files/Library/CaseStudies/doeLegal.pdf

TAI HING CATERING GROUP

The Tai Hing Catering Group uses the SAP Business One enterprise resource planning (ERP) application to ensure that its main warehouse is stocked with the fresh food products required to keep its 60 different restaurants in operation. Information gathered from point-of-sale systems is used to determine which items are selling and what inventory each location will need for the next day. Ryan Chow, IT department manager, said, “Uptime is the biggest concern we have. If the server is down the buying process will stop.” To maximize server uptime, the Tai Hing Catering Group chose to implement Stratus Avance.


Ensuring continuous uptime was especially critical for the Tai Hing Catering Group. When it adopted the SAP Business One ERP system, it also consolidated a number of diverse computing systems. This centralization meant that uptime for the ERP system was vital for the entire enterprise. Avance provides fully automatic failover for the ERP workloads as well as for email and other important business systems. Because it is based on server virtualization, Avance provides a good deal of operational flexibility: different workloads such as SAP, SQL Server and email can be isolated in separate VMs, and VMs can be moved between servers for scheduled maintenance. In addition, Avance’s ability to automatically monitor and alert on over 150 different operating conditions means the Tai Hing Catering Group is immediately notified about all critical operating conditions.

You can learn more about Tai Hing Catering Group’s Avance case study at:

http://www.stratus.com/~/media/Stratus/Files/Library/CaseStudies/TaiHing.pdf

STRATUS FTSERVER SYSTEMS

Stratus ftServer Systems are specifically engineered for continuous availability, providing 99.999% uptime by using fault-tolerant hardware. Rack-mounted ftServer systems are primarily designed for medium and large businesses, where they can be implemented to protect mission-critical and virtualized application workloads. Traditional x86 and x64 servers are not designed for availability. While some have limited redundant components such as power supplies, the vast majority of system components are not protected. If a hardware failure occurs in the CPU, RAM or direct-attached storage (DAS), the system will fail and all of its workloads will be rendered inoperable. In contrast, the Stratus ftServer System line is designed to protect against downtime and data loss. All the system components, including the motherboard, CPU, RAM and power supplies, are paired. Stratus’ lockstep technology keeps all of the components completely in sync; lockstep even ensures that the redundant CPUs are executing exactly the same instructions. Because replicated components perform the same instructions at the same time, there is just one system image, zero interruption in processing, zero loss of performance, and zero loss of data integrity even if a component fails.
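The lockstep idea can be illustrated with a toy Python model. Two duplicate modules execute every instruction together and therefore present one result. When a module's own fault detection flags it, that module is isolated and its partner carries on, so the output stream is unchanged. This is a hypothetical sketch for intuition only and not a description of Stratus' actual hardware or firmware. `Module`, `LockstepPair` and the "add a number" instruction are all invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Module:
    """One half of a hypothetical duplexed pair (CPU, chipset and memory)."""
    name: str
    healthy: bool = True  # cleared by this module's own fault detection
    state: int = 0

    def step(self, instruction: int) -> int:
        # Toy "instruction": add a value to the module's register state.
        self.state += instruction
        return self.state


class LockstepPair:
    """Toy model: two modules execute the same instruction stream in lockstep."""

    def __init__(self) -> None:
        self.modules = [Module("A"), Module("B")]

    def execute(self, instruction: int) -> int:
        # Every healthy module executes the identical instruction.
        results = [m.step(instruction) for m in self.modules if m.healthy]
        if not results:
            raise RuntimeError("both modules have failed")
        if len(set(results)) != 1:
            # With only two modules a mismatch cannot tell which side is wrong,
            # so this sketch relies on per-module fault detection instead.
            raise RuntimeError("lockstep divergence without a detected fault")
        # Healthy modules agree, so the system presents a single image.
        return results[0]

    def isolate(self, name: str) -> None:
        """Fault detection flagged module `name`; remove it from execution."""
        for m in self.modules:
            if m.name == name:
                m.healthy = False


def run(instructions, fail_after=None, failed_module="A"):
    """Execute a stream, optionally isolating one module before step `fail_after`."""
    pair = LockstepPair()
    outputs = []
    for i, instr in enumerate(instructions):
        if fail_after is not None and i == fail_after:
            pair.isolate(failed_module)
        outputs.append(pair.execute(instr))
    return outputs


if __name__ == "__main__":
    stream = [1, 2, 3, 4, 5]
    print(run(stream))                # no failure
    print(run(stream, fail_after=3))  # module A isolated before the 4th instruction
```

Both runs produce the same output. Losing one module mid-stream changes nothing the application can observe, which is the property the paragraph above describes.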

The ftServer Systems are managed exactly like standard servers, making it easy to achieve high levels of availability with a minimum of complexity. The ftServer system requires only one copy of Windows Server and runs off-the-shelf versions of the Windows and Red Hat Enterprise Linux operating systems. No application changes are required. Entry-level pricing for ftServer systems is comparable to similar SQL Server HA “off the shelf” hardware configurations. Stratus makes three models of its ftServer Systems: the entry-level 2600, the mid-level 4500 and the enterprise-class 6310. The Stratus ftServer systems fully support Windows Server 2008 and Windows Server 2008 R2, including the Hyper-V virtualization role. In addition, all the Stratus ftServer System families support the Linux operating system and VMware vSphere 4 and higher.

FTSERVER SYSTEMS ARCHITECTURE

Deploying the Stratus ftServer Systems is exactly like deploying a standard server. All of the different models are delivered in a 4U form factor, and they can all be mounted in a standard 19-inch rack. The entry-level ftServer 2600 is a single-socket system that comes with a 2.00 GHz quad-core processor and supports up to 16 GB of RAM. The mid-level ftServer 4500 is a dual-socket system that is also delivered with a 2.00 GHz quad-core processor and supports a maximum of 96 GB of RAM. At the high end, the ftServer 6310 is a dual-socket system that comes with a 2.93 GHz hex-core processor and supports up to 96 GB of RAM. It’s important to remember that these are the logical specifications for each of the server models. Because the ftServer systems use redundant server components, the actual physical count for each of these specifications is doubled.
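The difference between the logical and physical specifications can be made concrete with a short Python sketch. It doubles the socket, core and RAM counts for each model. Clock speed is a rate rather than a count, so it is not doubled. The sketch assumes both sockets of the dual-socket models are populated; the paper does not state this explicitly. The class and function names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class FtServerModel:
    """Logical (OS-visible) specification of an ftServer model, as quoted above."""
    name: str
    sockets: int
    cores_per_socket: int
    max_ram_gb: int

    @property
    def logical_cores(self) -> int:
        # Assumes every socket is populated.
        return self.sockets * self.cores_per_socket


MODELS = [
    FtServerModel("ftServer 2600", sockets=1, cores_per_socket=4, max_ram_gb=16),
    FtServerModel("ftServer 4500", sockets=2, cores_per_socket=4, max_ram_gb=96),
    FtServerModel("ftServer 6310", sockets=2, cores_per_socket=6, max_ram_gb=96),
]


def physical_counts(model: FtServerModel) -> dict[str, int]:
    """Duplexed hardware: each logical socket, core and GB of RAM has a physical twin."""
    return {
        "sockets": model.sockets * 2,
        "cores": model.logical_cores * 2,
        "ram_gb": model.max_ram_gb * 2,
    }


if __name__ == "__main__":
    for m in MODELS:
        print(m.name, physical_counts(m))
```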


In Figure 6 you can see an overview of the continuous availability architecture used by the Stratus ftServer

Systems.

Figure 6 – The Stratus ftServer Systems Architecture

The ftServer system uses a dual-modular-redundancy (DMR) configuration that eliminates any single point of

failure and safeguards data integrity. Orchestrating the server’s ability to provide continuous availability are two

capabilities unique to ftServer systems: the Automated Uptime Layer and Proactive Availability Monitoring and

Management.

The Automated Uptime Layer is the first line of defense against server downtime. Think of it as embedded Level

One service, constantly monitoring hundreds of system components and sensors. The specialized software

manages system resources to preemptively protect against downtime and data loss. This software layer

automatically detects and handles system faults, and isolates them from the application and other system

resources. It may fix and return a misbehaving component to service, or determine there’s a more serious issue at

hand. Either way, the application and system user are unaffected. It’s simple and automatic, requiring no human

intervention. The Automated Uptime Layer also helps to keep planned downtime to a minimum for the inevitable

software patching and system upgrades.
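The fault-handling flow described for the Automated Uptime Layer can be sketched in Python as a small state machine. A fault is detected, the component is isolated, and then it is either returned to service or escalated for remote diagnosis. This is a hypothetical model of the behavior described in the text, not Stratus' software. The state names, the `Component` class and the `recoverable` flag are all invented for illustration.

```python
from __future__ import annotations

from enum import Enum, auto


class State(Enum):
    IN_SERVICE = auto()
    ISOLATED = auto()
    ESCALATED = auto()  # handed to remote experts for diagnosis or a part swap


class Component:
    """Hypothetical managed component, e.g. one CPU module or power supply."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.state = State.IN_SERVICE
        self.log: list[str] = []

    def handle_fault(self, recoverable: bool) -> State:
        # 1. Detect and isolate: the partner component keeps the application running.
        self.state = State.ISOLATED
        self.log.append(f"{self.name}: fault detected and isolated")
        if recoverable:
            # 2a. The fault was cleared locally: return the component to service.
            self.state = State.IN_SERVICE
            self.log.append(f"{self.name}: repaired and returned to service")
        else:
            # 2b. A more serious issue: escalate to remote availability experts.
            self.state = State.ESCALATED
            self.log.append(f"{self.name}: escalated to support")
        return self.state


if __name__ == "__main__":
    memory = Component("memory module A")
    psu = Component("power supply B")
    print(memory.handle_fault(recoverable=True))
    print(psu.handle_fault(recoverable=False))
    print(*memory.log, *psu.log, sep="\n")
```

In either branch, the duplexed partner continues processing, so the application never sees the fault. Only the repair path differs.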

Proactive Availability Management is a combination of availability experts and best practices, tightly integrated

with the uptime layer software. Stratus availability technicians monitor the system and Automated Uptime Layer

over a secure global network. These experts are ready 24/7 to remotely diagnose and remediate more complex

issues. Virtually everything a service technician can do on site, Stratus Proactive Availability Management does

remotely. There is no waiting several hours for a repair technician to show up, hopefully with the right part, to get

your business back online. The technology layer will capture the root cause of problems, and report the

information to the uptime experts to make the fix and avoid a repeat incident. Zero-downtime repair means

applications continue to run, and the business continues to operate.

Together, the Automated Uptime Layer and Proactive Availability Management deliver full protection in real time to keep critical applications online all the time.

[Figure 6 diagram labels: Modular Implementation; two modules, each with Embedded I/O, PCI, CPU, Chipset and Memory; Lockstepped CPUs; Multi-path I/O; Fault Detection & Isolation]


CASE STUDIES

The Stratus ftServer line of continuous availability servers has been used to provide five nines of availability to many companies worldwide. The installed base of servers under service contracts has a recorded average uptime of 99.9998%. In the next section you’ll read how Atlanta’s Hartsfield-Jackson International Airport, Australia’s Strategic Payment Services and the South Carolina Federal Credit Union protected their mission-critical SQL Server implementations using Stratus ftServer Systems.

ATLANTA’S HARTSFIELD-JACKSON INTERNATIONAL AIRPORT

Atlanta’s Hartsfield-Jackson International Airport serves more passengers each year than any other airport in the world. It is also the busiest airport for takeoffs and landings. To handle emergencies, the airport deployed a new Centralized Command and Control Center (C4) application. The C4 system uses a computer aided dispatch (CAD) system to communicate incident information to the local Atlanta police and fire departments. Availability of the CAD system is absolutely critical, as it’s used to intelligently dispatch the correct police or fire units to handle emergencies ranging from heart attacks to unattended bags. The CAD application is built on top of the Microsoft Windows Server operating system, and it uses Microsoft SQL Server as its database platform. To ensure maximum uptime, Hartsfield-Jackson International Airport chose to implement the new system using Stratus’ ftServer systems.

Dan Negris, Computer Aided Dispatch Administrator, reports, “We have not experienced any unplanned downtime

in five years.” Dan goes on to explain, “Having that instant access to Stratus support is really useful. And what I

really, really love is the phone home to get a call from a technician telling me that, say, one of our power supplies

is down. They’re on top of things even after hours.”

You can find more information about the Stratus implementation in the Hartsfield-Jackson International Airport at:

http://www.stratus.com/~/media/Stratus/Files/Library/CaseStudies/Hartsfield-Jackson-Intll-Airport-Atlanta.pdf

STRATEGIC PAYMENT SERVICES

Strategic Payment Services, based in Sydney, Australia, has built an electronic payment system that processes more than 350 million point-of-sale, ATM and Internet payment transactions per year. As you might guess, availability is a critical issue for Strategic Payment Services. This is what David Campbell, its CIO, had to say: “Uptime is an issue we take seriously. Every second that we have to be down is a second we’re not processing transactions for clients. Hence, we’re not making revenue.”

Strategic Payment Services uses the Microsoft Windows Server operating system with SQL Server as its relational database to support S1 Corporation’s payment application. In addition, it uses VMware vSphere as its virtualization platform. Strategic Payment Services started with six ftServer systems: two for each of the primary production, quality assurance and disaster recovery environments. One server in each environment processes card transactions; the other handles back-office reporting and settlement. SAN replication was used to duplicate data across the different sites. VMware ESX Server virtualization is used to separate functional applications from one another, and Stratus ftServers provide continuous availability for the VMware ESX Server hosts. Redundant hardware components in ftServer systems eliminate single points of failure. Stratus’ proactive management continuously monitors server health and includes a server diagnostic feature that can automatically notify Stratus support in the event of an error condition. According to Campbell, this approach has really paid


dividends in terms of availability: “When we look at the uptime we have provided our clients over the past 12

months, our rolling 12-month uptime average at the moment is 99.995%.”

You can find out more about how Strategic Payment Services uses the Stratus ftServers at:

http://www.stratus.com/~/media/Stratus/Files/Library/CaseStudies/Strategic-Payments-Services.pdf.

SOUTH CAROLINA FEDERAL CREDIT UNION

South Carolina Federal Credit Union (SCFCU) is among the 100 largest credit unions in the U.S., with more than $1.4 billion in assets and over 150,000 members. To provide more flexibility for its customers, South Carolina Federal Credit Union moved support for its 80 branch and remote ATMs in-house from a third-party outsourcing firm. It chose Windows-based LynxGate ATM management software running on the Stratus ftServer platform. The credit union considered other forms of availability, including server hardware clustering, but chose the Stratus ftServer solution because it provided higher availability with far less complexity. South Carolina Federal Credit Union also chose Stratus ftScalable storage, which is similarly designed to provide continuous uptime assurance.

The Stratus ftServer line provided the extreme availability that SCFCU was looking for, delivering an impressive 99.999% uptime. Drew Foley, LynxGate Business Director, summed up their impressions of the Stratus ftServer solution by stating, “In a world of increasing uptime expectations, Stratus makes sense for savvy financial institutions.”

You can learn more about South Carolina Federal Credit Union’s use of the Stratus ftServers for maximum

availability at: http://www.stratus.com/~/media/Stratus/Files/Library/CaseStudies/SC_Federal_CU.pdf

MICROSOFT LABS’ SQL SERVER PERFORMANCE TESTING OF THE FTSERVER

Microsoft is one of Stratus’ strategic partners, and Microsoft is very aware of the need for maximum availability for its mission-critical server applications like SQL Server. Microsoft is also aware that performance is usually the next most important attribute of an enterprise server platform. To evaluate the enterprise suitability of the Stratus ftServer line of products, Microsoft performed a series of performance tests in its labs comparing the Stratus ftServer 6300, which delivers five nines of availability, with a similarly configured OEM server that had no high availability capabilities.

Microsoft’s labs performed two basic types of tests. First, they tested a decision support (DSS) workload on a 30 GB database using a subset of the TPC-H benchmark. This benchmark test consisted of 22 read-only queries that drove the server to 100% CPU utilization. In addition, Microsoft’s labs performed an OLTP performance test using a 25 GB TPC-C test oriented for 50,000 warehouse databases. Here, the database log was placed on internal SAS drives, and the workload pushed the server to 100% CPU utilization. These performance tests were conducted over a two-month period with more than 40 test runs. No functional issues were encountered during the testing period.

Performance results of the Stratus ftServer and the OEM server were exceptionally close, with the Stratus ftServer 6300 coming within 2% of the OEM server in the DSS testing and within 3% of the OEM server in the OLTP testing. These results demonstrate that the Stratus ftServer 6300 delivers enterprise-class performance while simultaneously delivering five nines of availability. In summary, the Stratus lockstep continuous availability technologies delivered comparable performance with a significantly higher level of availability than a non-fault-tolerant server. You can find out more details about Microsoft Labs’ performance testing of the Stratus ftServer at:

http://www.stratus.com/Partners/StrategicPartners/Microsoft.aspx


SUMMARY

Maximizing availability is one of the most important goals for the database administrator. Microsoft provides several high availability technologies, such as Windows Failover Clustering, Database Mirroring and AlwaysOn, that can be successfully used to increase the availability of your database applications. Windows Failover Clustering provides automatic server-level failover, but there is still downtime as the services are restarted on the failover nodes. Database Mirroring provides database-level protection, but it is limited in the number of databases that it protects, and there can be data loss associated with asynchronous mirroring. AlwaysOn addresses the problems with Database Mirroring but at greater cost and complexity. Higher levels of availability can be achieved by using Stratus Avance or the Stratus ftServer line of continuous availability servers. Both Avance and the Stratus ftServers provide protection from server failure with no interruption of end-user services and zero data loss. Stratus’ continuous availability products can deliver higher levels of availability than the out-of-the-box Microsoft technologies with far less complexity and cost. In addition, the proactive monitoring that is part of both Stratus Avance and ftServer further boosts availability by automatically alerting you to important server health conditions and by automatically contacting Stratus support to initiate preventive and corrective actions.

Additional Resources:

Stratus Technologies

Stratus ftServer Systems

Stratus Avance Software

Microsoft SQL Server 2008 High Availability with Clustering and Database Mirroring