
Page 1: Are We Ready For “24 by 7” ?

Are We Ready For “24 by 7”?

Dennis Cromwell

Michael Egolf

CUMREC 2001


What We Will Cover

• The Challenge of “24 X 7”

• Definitions: a common ground

• The road to “high availability”

• The road to “continuous operations”

• The road to “continuous availability”

• Business process steps to get there


What we will NOT cover

• If you are in this session to learn about:

– Detailed technical architectures to install, and how to manage high availability (HA) and fault tolerance

– “Speeds and feeds” or “slots and watts”

• Then you are in the WRONG session.


The Challenge

• Global economy, internet, increased reliance on computers, server consolidation

• Increased demand and expectations to make services and applications accessible to users more hours per day and more days per week (e.g., distance education, internet registration).

• Users are not tolerant of unscheduled interruptions in service.

• Users are less tolerant of scheduled interruptions in service as well.


Have you ever been asked?

• We need “24 x 7”, so we can just buy another server, or this fault-tolerant stuff, right?

• We need “24 x 7”, so the system techies can take care of this, right?

• We need “24 x 7”, so can’t YOU just change our batch schedule around, or do some special DBA things?

• If we just bought the GizmoTech Belchfire Z-9000HA cluster, that would give us “24 X 7”, right?

• Why can’t you people in IT do “24 X 7” like everyone else on the internet? It’s an IT problem, right?


The Reality Is

• Achieving “24 x 7” requires a multi-dimensional strategy.

• Cannot be bought off the shelf.

• Requires substantial cross-organizational planning of people and processes, discipline, and control (unlike most IT projects).

• Is expensive.

• Fewer than 20% of organizations will achieve it by 2005 (GartnerGroup)¹.


Definitions

• Reliability

• Performance

• Schedule of Operations

• Availability

• High Availability

• Continuous Operations

• Continuous Availability

• Fault Tolerance


Reliability

• The extent to which the application/service provides the same results on repeated trials.

• Provides consistent, correct results.

• Not the same as Availability.


Performance

• The amount of elapsed time for the service or application to provide the information or result to the end user.

• Performance is not an absolute measure; it is relative to the end user.

• Performance is acceptable if it allows the end user to be productive in his or her work. Ask them; they’ll tell you.


Schedule of Operations (Scheduled Time)

• The negotiated, agreed-to, and published schedule (days, and hours per day) during which the application or service is to be accessible to the customer.

• Not all applications are scheduled to be accessible 24 hours per day, or 7 days per week, nor need they be.

• Thought: Should the service be accessible outside the schedule, even though it could be?


Availability

• “The percent of the time that the application or service is actually accessible by the customer, within the schedule of operations.”

• “The proportion of time that a system can be used for productive work.”

• Availability implies reliability and acceptable performance.
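The availability definition reduces to a simple ratio of accessible time to scheduled time. Below is a minimal Python sketch. It assumes outage records have already been clipped to the schedule of operations. The function name and the (start, end) record format are hypothetical.

```python
from datetime import datetime, timedelta

def availability(scheduled: timedelta,
                 outages: list[tuple[datetime, datetime]]) -> float:
    """Percent of scheduled time the service was actually accessible.

    Assumes each (start, end) outage lies entirely within the schedule of
    operations; downtime outside the schedule should not be counted.
    """
    down = sum((end - start for start, end in outages), timedelta())
    if down > scheduled:
        raise ValueError("outage time exceeds scheduled time")
    return 100.0 * ((scheduled - down) / scheduled)

# Hypothetical week: 80 scheduled hours, one 2-hour outage on a Monday.
week = timedelta(hours=80)
outages = [(datetime(2001, 3, 5, 9), datetime(2001, 3, 5, 11))]
print(f"{availability(week, outages):.1f}%")
```

This counts only hard outages. Availability also implies reliability and acceptable performance, so periods of wrong results or unacceptably slow response could be logged as outages as well.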


Availability Chart Based on a 24 X 7 X 52 Schedule

Unscheduled Down Time per Year

Availability   Minutes (approx.)   Minutes    Hours
90.0%          50,000              52,560     876
99.0%          5,000               5,256      87.6
99.9%          500                 525        8.76
99.99%         50                  52         .87
99.999%        5                   5          .08
99.9999%       .5                  .5         .008
99.99999%      .05                 .05        .0008
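Each row of the chart is the scheduled time multiplied by the unavailable fraction. The chart's exact "Minutes" and "Hours" columns correspond to a 365-day year (525,600 minutes, or 8,760 hours). A strict 52-week year is 524,160 minutes. The following short sketch regenerates those columns. The function name is illustrative.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600, matching the chart's exact column

def downtime_per_year(availability_pct: float,
                      scheduled_minutes: float = MINUTES_PER_YEAR) -> float:
    """Minutes of unscheduled downtime allowed per year at a given availability."""
    return scheduled_minutes * (1 - availability_pct / 100)

for pct in (90.0, 99.0, 99.9, 99.99, 99.999, 99.9999, 99.99999):
    minutes = downtime_per_year(pct)
    print(f"{pct:>9}%  {minutes:>10,.2f} min  {minutes / 60:>9.4f} h")
```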


Availability Chart for a Schedule of 6 am – 9 pm, 6 Days/Week

Unscheduled Down Time per Year

Availability   Minutes   Hours
90%            28,080    468
99%            2,808     46.8
99.9%          280       4.68
99.99%         28        .46
99.999%        2         .046
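The same arithmetic applies to any schedule once the scheduled hours are known. For this chart, 15 hours a day (6 am to 9 pm) × 6 days × 52 weeks = 4,680 hours, or 280,800 minutes, per year. The sketch below assumes the unscheduled day is Sunday, since the chart does not say which day it is. The constant and function names are hypothetical.

```python
# Scheduled hours of access per day, Monday through Sunday.
# Assumption: the non-scheduled day is Sunday.
SIX_AM_TO_NINE_PM_SIX_DAYS = [15, 15, 15, 15, 15, 15, 0]

def scheduled_hours_per_year(daily_hours: list[float], weeks: int = 52) -> float:
    return sum(daily_hours) * weeks

def downtime_budget_hours(availability_pct: float, daily_hours: list[float],
                          weeks: int = 52) -> float:
    """Unscheduled downtime allowed per year, counted only within the schedule."""
    return scheduled_hours_per_year(daily_hours, weeks) * (1 - availability_pct / 100)

for pct in (90, 99, 99.9, 99.99, 99.999):
    budget = downtime_budget_hours(pct, SIX_AM_TO_NINE_PM_SIX_DAYS)
    print(f"{pct}%: {budget * 60:,.1f} min ({budget:.3f} h)")
```

Because the denominator is only the scheduled time, the same percentage allows far less downtime on a shorter schedule than on a 24 x 7 one.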


High Availability

• Gartner Group: “A highly available application provides user access to applications and data a high percentage (e.g. 99 percent or greater) of scheduled time, despite unscheduled events.”

• IBM: “High Availability isn’t a specific technology but is instead a balanced solution that addresses the people, process, and technology issues for specific systems.”


High Availability: Microsoft

• “One way to understand high availability is to contrast it with fault tolerance. These terms describe two different benchmarks measuring availability. Fault tolerance is defined as 100% availability 100% of the time, regardless of the circumstances. A fault tolerant system is designed to guarantee resource availability.

• In contrast, a high-availability system is concerned with maximizing resource availability. A highly available resource is available a very high percentage of the time and may even approach 100% availability, but a small percentage of down time is acceptable and expected.

• High availability can be defined as follows: A highly available resource is almost always operational and accessible to clients.”


High Availability

• Unmanaged: 90.0%

• Managed: 99.0%

• Well Managed: 99.9%

• Fault-Tolerant: 99.99%

• High Availability: 99.999%

• Very High Availability: 99.9999%

• Ultra-Availability: 99.99999%

Source: Strategic Research Corp.


Measuring Availability

• Most organizations do not currently measure availability end to end, but the need to do so will increase.

• How do you really measure?

• What if individual workstations are down?

• What if some application components are down?

• Weight by number of users?

• Who and where are the users, anyhow?

• What’s good enough? Available to the network?

• What’s acceptable? Who says?

• User surveys can be a good source.

• Thought: Are SLAs in higher ed good, or a guarantee of failure?
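One possible answer to "Weight by number of users?" is to count lost user-minutes rather than lost minutes. The following is a minimal sketch. The `Outage` record and the function name are hypothetical, and this weighting is only one choice among several. It is not a standard measure.

```python
from dataclasses import dataclass

@dataclass
class Outage:
    minutes: float       # duration that fell within the schedule of operations
    users_affected: int  # users who could not reach the service

def user_weighted_availability(scheduled_minutes: float, total_users: int,
                               outages: list[Outage]) -> float:
    """Percent availability, weighting each outage by how many users it hit."""
    lost = sum(o.minutes * o.users_affected for o in outages)
    possible = scheduled_minutes * total_users
    return 100.0 * (1 - lost / possible)

# Hypothetical: a one-hour building network outage affecting 100 of 1,000 users
# counts far less than a one-hour outage of the central application.
partial = user_weighted_availability(10_000, 1_000, [Outage(60, 100)])
total = user_weighted_availability(10_000, 1_000, [Outage(60, 1_000)])
print(f"partial outage: {partial:.2f}%, full outage: {total:.2f}%")
```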


Continuous Operations

• Architecting an application and its process components so that the application can be scheduled for user access during expanded hours: often 24 hours per day, 7 days per week, or near 24/7.

• Continuous Operations does NOT imply high availability.

• Addresses minimizing planned downtime.


Continuous Availability

• The combination of High Availability and Continuous Operations.

• If we schedule and enable the application to allow user access 24 x 7 (or near 24 x 7), and also want the application to achieve availability of 99% or greater, then this is continuous availability.

• Addresses unplanned and planned downtime.

• Probably closest to what is meant by “we need 24 X 7”.


Availability Benchmark (Source: GartnerGroup²)

Level            Unplanned/yr   Planned/yr   Availability
Average          175+ hours     250+ hours   98%
Very Good        87 hours       200 hours    99%
Outstanding      43 hours       50 hours     99.5%
Best in Class*   9 hours        12 hours     99.9%

*Fewer than 5% achieve Best in Class.


Fault Tolerance

• The entire system and all the resources needed for an application to run must be duplicated. Used where the application cannot afford any downtime.

• Aims to eliminate all planned and unplanned downtime.

• As a result of this complete replication, fault-tolerant systems are much more expensive than highly available systems.

• E.g., air traffic control, life support systems.

• At IU, no strong business case is apparent.

• Thought: Are there any true business cases for fault tolerance in higher ed?


The Road to High Availability

• Simple: Reduce unplanned downtime!


The Road to High Availability

• 80% or more of unplanned downtime is the result of people and processes, NOT hardware or O/S failures:

– Application failures

– Software failures, errors in configurations

– Scheduling errors

– Operator errors

– Out-of-space conditions

– Batch jobs that prevented OLTP from being available on time

– Data corruption

– Unexpected or unplanned volumes


The Road to High Availability

To address the 80%, invest money and time in:

– Staffing, training

– Change management

– Problem management

– Job scheduling, restart procedures

– Intelligent event management, tuning

– Application architecture

– Function, regression, integration, and load testing

– Test and time recovery scenarios

– Production readiness reviews, standards

– Application planning, capacity planning


The Road to High Availability: Some Technology Stuff

• Minimize SPOFs (single points of failure)

• Environmental, facilities, network

• Web load balancers, redundant dispatchers

• RAID: level 5/0/1, mirroring, striping

• ECC data protection

• On-site spares, hot-swappable parts

• “HA” solutions, clustering, automatic failover

• Database replication, cloning

• Oracle Parallel Server (OPS)
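The standard series/parallel availability formulas show why single points of failure matter. Components in series multiply their availabilities. A set of redundant components fails only if every copy fails. The sketch below assumes independent failures, which shared facilities and the people/process causes listed earlier often violate. The component figures are illustrative.

```python
from math import prod

def series(*availabilities: float) -> float:
    """All components must be up; each one is a single point of failure."""
    return prod(availabilities)

def parallel(*availabilities: float) -> float:
    """Redundant components: the tier is up if at least one copy is up."""
    return 1 - prod(1 - a for a in availabilities)

# Hypothetical chain: network, web server, database, each 99% available.
single = series(0.99, 0.99, 0.99)
# Same chain with the web tier duplicated behind a load balancer
# (the load balancer itself is ignored here for simplicity).
redundant_web = series(0.99, parallel(0.99, 0.99), 0.99)
print(f"no redundancy: {single:.4%}, redundant web tier: {redundant_web:.4%}")
```

Duplicating one tier helps only as much as the remaining single points of failure allow. In this example, the unduplicated network and database still cap the chain at about 98%.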


The Road to Continuous Operations (Expanding the Scheduled Time for Accessibility)

• Understand the application architecture and constraints.

• Understand all application dependencies and interrelationships to needed components.

• Reduce batch interference.

• Confront the “backup problem”.

• Hot backup strategies, cloning, SANs.

• Manage other planned changes.


The Road to Continuous Operations

Manage the planned downtime:

• Infrastructure and facility work

• Hardware changes and upgrades

• Operating system level changes

• Database changes and releases

• Application changes and releases; “release tolerance” is a key item

• Increased need for infrastructure test environments. To some this is new.

• Common “maintenance windows”

• Expect increased coordination and staff overhead


The Road to Continuous Availability

• Application availability depends on design:

– Transaction queuing, batch processing

– Release tolerance, recovery

• Set schedule and availability expectations early.

• Have ‘some’ functions up 24 x 7, not all.

• Continuous availability costs about 3.5 times as much as a standard application (GartnerGroup)¹.
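Transaction queuing is one design technique that lets users keep submitting work while a back-end component is down for batch processing or maintenance. Below is a minimal in-memory sketch of the idea. The class is hypothetical. A real design would need durable queues, error handling, and conflict resolution.

```python
from collections import deque
from typing import Callable

class QueuedWriter:
    """Accept transactions even when the back end is unavailable; apply them later."""

    def __init__(self, apply: Callable[[str], None]):
        self.apply = apply        # commits one transaction to the back end
        self.backend_up = True
        self.pending: deque[str] = deque()

    def submit(self, txn: str) -> None:
        # Queue if the back end is down, or if earlier work is still waiting,
        # so transactions are always applied in arrival order.
        if self.backend_up and not self.pending:
            self.apply(txn)
        else:
            self.pending.append(txn)

    def drain(self) -> None:
        """Apply queued transactions once the back end is available again."""
        while self.backend_up and self.pending:
            self.apply(self.pending.popleft())

committed: list[str] = []
writer = QueuedWriter(committed.append)
writer.submit("add CS101")
writer.backend_up = False      # e.g., a nightly batch window begins
writer.submit("drop MATH200")  # the user is not turned away
writer.backend_up = True
writer.drain()
print(committed)
```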


The Common Maintenance Window

• Applications are interrelated and integrated with others more than ever.

• Shared infrastructure elements are more common.

• Managing a maintenance window for each application can be exceedingly complex.

• A common maintenance window for infrastructure activity can be beneficial:

– Saves negotiating time, sets expectations
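A common window also makes planned changes easy to screen. The sketch below assumes a hypothetical window of Sunday 12:00 a.m. to 6:00 a.m., because the presentation does not fix the hours. It also uses hypothetical change requests. Any change that does not fit the window has to be negotiated separately.

```python
from datetime import datetime, timedelta

# Hypothetical common maintenance window: Sunday, 00:00 to 06:00.
WINDOW_WEEKDAY = 6  # datetime.weekday(): Monday == 0, Sunday == 6
WINDOW_START_HOUR = 0
WINDOW_END_HOUR = 6

def fits_common_window(start: datetime, duration: timedelta) -> bool:
    """True if a planned change begins and ends inside the common window."""
    if start.weekday() != WINDOW_WEEKDAY:
        return False
    day = start.replace(hour=0, minute=0, second=0, microsecond=0)
    window_start = day + timedelta(hours=WINDOW_START_HOUR)
    window_end = day + timedelta(hours=WINDOW_END_HOUR)
    return window_start <= start and start + duration <= window_end

# Hypothetical change requests from different teams.
requests = {
    "database patch": (datetime(2001, 4, 8, 1, 0), timedelta(hours=2)),   # Sunday
    "network upgrade": (datetime(2001, 4, 8, 5, 0), timedelta(hours=2)),  # overruns
    "OS upgrade": (datetime(2001, 4, 9, 1, 0), timedelta(hours=1)),       # Monday
}
needs_negotiation = [name for name, (start, length) in requests.items()
                     if not fits_common_window(start, length)]
print(needs_negotiation)
```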


Putting it all together

• Now that you know some definitions, lots of numbers, and components to address, how do you get started on the road to “24 X 7”?

• The following business process steps represent an approach for IU (Indiana University).


Step 1 Define the Problem

• A problem well defined is a problem 80% solved.

• For each application area, determine what the problem or goal is with the correct user representative(s).

– Determine the schedule goal.

– Separately, determine the availability goal.

• Schedule and availability should be determined and designed in up front, just like any other application functional requirement. It’s more costly to retrofit.


Step 2 Categorize

• Categorize the applications into groups.

• For example:

– Business Support Systems

– Operational Support Systems

– Self Service/E-Commerce

– Management Support Systems


“Business Support” System

• Mon-Fri: 6:00 a.m. to 10:00 p.m. EST

• Sat: 6:00 a.m. to 6:00 p.m. EST

• Sun: Normal maintenance window

• Batch updates, data refreshes


“Operational Support” Systems

• Round-the-clock operations, such as physical plant, security, hospitals

• Near 24 x 7 schedule

• Occasional Sunday morning maintenance

• Monthly cold backups

• Batch and backups non-disruptive to users

• Accessible about 8,700 hours/year

• The most extended schedule


“Self Service/E-Commerce” Systems

• Near 24 by 7 schedule

• Can tolerate 1-2 hours down per night

• Accessible from 148 to 156 hours per week

• Batch and backups during 1-2 hours per day


“Management Support” Systems

• Systems used by “management” for such activities as reporting, queries.

• Same schedule as Business Support Systems


Step 3 Know the Applications

• Understand each application’s architecture, constraints, “release tolerance”, flexibility to change.

• In-house vs. purchased applications.

• Know each application’s dependencies on other applications and components.

• Architecture diagrams and data flows are key.


Sample Architecture Diagram


Step 4 Know the Baseline

• What is your current SOP (standard operating procedure) with respect to technology? Procedures? Testing?

• What is your current availability? What can you expect with the existing budget?

• If you haven’t already, at least start measuring something.

• Identify root causes of unplanned downtime.

• What are the infrastructure constraints on expanding the schedule?
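Identifying root causes can start with a simple tally of an outage log. The sketch below uses hypothetical category names modeled on the people-and-process causes listed under The Road to High Availability. It reports the share of unplanned downtime that those causes account for, which can be compared with the "80% or more" figure.

```python
from collections import Counter

# Hypothetical root-cause categories for people and process failures.
PEOPLE_AND_PROCESS = {"application", "configuration", "scheduling", "operator",
                      "out of space", "batch overrun", "data corruption", "volume"}

def downtime_by_cause(log: list[tuple[str, float]]) -> Counter:
    """Total unplanned downtime minutes per root-cause category."""
    totals: Counter = Counter()
    for cause, minutes in log:
        totals[cause] += minutes
    return totals

def people_process_share(log: list[tuple[str, float]]) -> float:
    """Fraction of unplanned downtime caused by people and processes."""
    totals = downtime_by_cause(log)
    people = sum(m for cause, m in totals.items() if cause in PEOPLE_AND_PROCESS)
    return people / sum(totals.values())

# Hypothetical outage log: (root cause, minutes of unplanned downtime).
log = [("operator", 90), ("hardware", 30), ("out of space", 45), ("batch overrun", 75)]
print(downtime_by_cause(log).most_common())
print(f"{people_process_share(log):.1%} people/process")
```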


Step 5 Know the Costs

• What improvements can you make within the existing budget? Training, testing, QA, etc.

• Invest in the right areas for you to expand schedule and availability.

• Know costs to expand schedule beyond baseline to meet goals.

• Know costs to increase availability beyond baseline to meet goals.

– Expect involvement from all areas of IT


Step 6 The Business Case

• Develop a consistent approach to weigh the business benefits vs. the cost. Maintain focus on the business problem/goal.

• The “Steering Committee” or business owner(s) of the applications need to determine the business need.

• It’s difficult to cost and plan for applications individually; categorizing may help.

• Differentiate between “like to have” and true business need. Who pays?

• May not be any “quick fix”.


Step 7 Execute the Plan

• Have commitment:

– Senior management commitment

– Front-line management commitment

• Define the resources: people, budget, etc.

• Define ownership.

• Develop and document a typical plan, with goals, activities, responsibilities, dates, etc.

– Make it part of existing project plans

• Manage and adjust.

• Measure actual vs. goal.


In Summary: What We Have Covered

• The challenge of “24 X 7”

• Definitions: standard terminology

• Elements of achieving “high availability”

• Elements of achieving “continuous operations”

• Business Steps as an approach to proceed


Questions and Discussion

This presentation can be accessed (24 x 7?) at:

www.indiana.edu/~uis/cumrec

Dennis Cromwell


Mike Egolf



References

GartnerGroup: Building Continuous Availability Into E-Applications, COM-12-1325, 29 September 2000, D. Scott, Y. Natis

GartnerGroup: Availability: How Do Your Applications Services Stack Up?, SPA-12-8280, 17 January 2001, D. Scott

GartnerGroup: High Availability: A Perspective, DPRO-90193, 29 June 2000, J. Wright, A. Katan

GartnerGroup: Measuring End-to-End Application Service Availability, DF-13-1114, 19 March 2001, D. Scott

GartnerGroup: 24 X 7 E-Commerce Availability, 25H, SYM10, October 2000, D. Scott

IBM: Helping to keep your critical systems up and running, IBM Global Services, 22 July 1999
