
08.02.07

The Business - IT Alignment Challenge

Being precise with uncertainty- a Fuzzy-Intuitionistic Approach

Roland Schütze

23.09.2014

Jean Hennebert

1. Problem Statement and Background
2. SLA Definition and Measurements
3. Multi-Layered SLA Translations
4. Practical Considerations
5. Fuzzy Mapping of Quality Aspects

“SMART”ness of requirements

Engineering education and process improvement actions always stress the importance of the “SMART”ness of requirements.

The acronym was originally used by George T. Doran in an article about management goals and objectives. Today it is mostly used to stress specificity and measurability.

The meaning of SMART:
• Specific
• Measurable
• Assignable (Achievable, Attainable, Action-oriented, Acceptable, Agreed upon, Accountable)
• Realistic (Relevant, Result-Oriented)
• Time-related (Timely, Time-bound, Tangible, Traceable)

The ”Fuzzy” needs of the User

Customer needs are often ill-defined or fuzzy, yet the need for specific and verifiable user requirements is obvious. Making fuzzy requirements SMART often significantly increases the understanding of the requirements, mostly because everything has to be articulated explicitly.

Example: Gerrit Muller, Embedded Systems Institute, Netherlands


Who is the customer?


For “Service Quality”, several measurement and delivery criteria can be defined.

Example SLA Critical Performance Indicator for a User Help Desk: x% of Urgent Incidents are found and fixed, or a workaround is established, within 4 Service Hours. This is measured from the time the incident is raised to the time of incident resolution as logged in the workflow system (closing the ticket).

SLA Measurements and Metrics

A central concept of quality-of-service management is the adaptive penalization of individual requests according to the current degree of SLA conformance.

The conformance is monitored per service class, that is, for each transaction type invoked by an individual customer and the associated SLA. We define conformance c as

c = Number of timely transaction invocations / Total number of invocations of the transaction

In practice, so-called step-wise SLAs are commonly used to specify the QoS requirements of a service class. The SLAs consist of one or more percentile constraints and an optional deadline constraint. Percentile constraints require e.g. n% of all service requests to be processed within x seconds.
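The conformance ratio and a percentile constraint can be sketched as follows. This is a minimal illustration, not the measurement method of any particular tool: the class name `PercentileConstraint`, the function names and the sample response times are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class PercentileConstraint:
    """n% of all requests must finish within x seconds (illustrative names)."""
    percent: float      # n, e.g. 95.0
    max_seconds: float  # x, e.g. 2.0


def conformance(response_times: Sequence[float], max_seconds: float) -> float:
    """c = timely invocations / total invocations for one service class."""
    if not response_times:
        raise ValueError("no invocations observed")
    timely = sum(1 for t in response_times if t <= max_seconds)
    return timely / len(response_times)


def satisfies(response_times: Sequence[float], constraint: PercentileConstraint) -> bool:
    """True if the observed conformance meets the percentile constraint."""
    return conformance(response_times, constraint.max_seconds) >= constraint.percent / 100.0


# Made-up sample: ten invocations of one transaction type, in seconds.
observed = [0.4, 0.9, 1.2, 0.7, 3.5, 0.8, 1.1, 0.6, 0.5, 1.9]
c = conformance(observed, max_seconds=2.0)
print(f"conformance c = {c:.2f}")
print("90% within 2 s:", satisfies(observed, PercentileConstraint(90.0, 2.0)))
print("95% within 2 s:", satisfies(observed, PercentileConstraint(95.0, 2.0)))
```

In the sample, nine of ten invocations finish within 2 seconds, so a 90% constraint holds while a 95% constraint is violated.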


SLA, an example

Online Services Availability: minutes of service unavailability

Period 1 definition: MON-FRI 8-18
Period 2 definition: other

Observation interval 1 YEAR:
“Inappropriate” SL: more than 523 min/year in period 1, more than 680 in period 2
“Insufficient” SL: more than 756 min/year in period 1, more than 983 in period 2
“Unsuitable” SL: more than 1,047 min/year in period 1, more than 1,361 in period 2

Observation interval 1 MONTH:
“Inappropriate” SL: n/a
“Insufficient” SL: n/a
“Unsuitable” SL: more than 209 min/month in period 1, more than 272 in period 2
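The yearly classification above can be expressed as a simple lookup. This is a sketch only: the limits are the ones listed for the one-year observation interval, while the function name and the label "within target" for values below the first limit are assumptions.

```python
# Yearly limits in minutes of unavailability, strictest class first.
YEARLY_LIMITS = {
    "period 1": [(1047, "Unsuitable"), (756, "Insufficient"), (523, "Inappropriate")],
    "period 2": [(1361, "Unsuitable"), (983, "Insufficient"), (680, "Inappropriate")],
}


def classify_yearly(period: str, minutes_unavailable: float) -> str:
    """Return the service level class for one year of measured unavailability."""
    for limit, label in YEARLY_LIMITS[period]:
        if minutes_unavailable > limit:  # the SLA says "more than"
            return label
    return "within target"  # assumed label, not part of the SLA text


print(classify_yearly("period 1", 600))    # Inappropriate
print(classify_yearly("period 2", 1000))   # Insufficient
```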


SLA, more examples

Online Services Performance:
Transactions mean response time ≤ 2.5 sec
Maximum percentage of transactions ending in more than 1 sec = 5%

DR Service

RTO (Recovery Time Objective):
Applications A, B, C, ... restart within 2 hours after the formal disaster statement
Applications X, Y, Z, ... restart within 24 hours after the formal disaster statement

RPO (Recovery Point Objective):
No data loss for applications A, B, C, ...
Maximum data loss for applications X, Y, Z, ...: updates in the last hour before the disaster

Efficient Service Level Targets based on Business Impacts

Enforcement of SLAs should be closely related to the estimated business impact caused by an SLA breach. Different outage durations lead to significantly different business costs.

Source:
KSRI: Leveraging Service Incident Analytics to Determine Cost-Optimal Service Offers
2011 SRII: Towards Service Level Engineering for IT Services - Defining IT Services from a Line of Business Perspective, A. Kieninger

IT Services deliver to Availability Contracts

Availability
Security (and Compliance)
Failure Management
Reliability
Recoverability
Scalability
Maintainability
Operability
Performance

Availability %             Downtime / yr    Downtime / mon*   Downtime / wk
90% ("one nine")           36.5 days        72 hours          16.8 hours
95%                        18.25 days       36 hours          8.4 hours
98%                        7.30 days        14.4 hours        3.36 hours
99% ("two nines")          3.65 days        7.20 hours        1.68 hours
99.50%                     1.83 days        3.60 hours        50.4 minutes
99.80%                     17.52 hours      86.4 minutes      20.16 minutes
99.9% ("three nines")      8.76 hours       43.2 minutes      10.1 minutes
99.95%                     4.38 hours       21.6 minutes      5.04 minutes
99.99% ("four nines")      52.56 minutes    4.32 minutes      1.01 minutes
99.999% ("five nines")     5.26 minutes     25.9 seconds      6.05 seconds
99.9999% ("six nines")     31.5 seconds     2.59 seconds      0.605 seconds

* based on 30 day month
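The downtime budget that corresponds to an availability percentage follows from simple arithmetic: the unavailable fraction times the length of the period. A minimal sketch, assuming a 365-day year and a 30-day month as in the table footnote (the function and constant names are illustrative):

```python
# Period lengths in minutes; the month uses 30 days, the year 365 days.
PERIOD_MINUTES = {
    "year": 365 * 24 * 60,
    "month": 30 * 24 * 60,
    "week": 7 * 24 * 60,
}


def allowed_downtime_minutes(availability_percent: float, period: str) -> float:
    """Maximum downtime in minutes that still meets the availability target."""
    if not 0.0 <= availability_percent <= 100.0:
        raise ValueError("availability must be a percentage between 0 and 100")
    unavailable_fraction = (100.0 - availability_percent) / 100.0
    return unavailable_fraction * PERIOD_MINUTES[period]


for target in (99.0, 99.9, 99.99):
    per_year_h = allowed_downtime_minutes(target, "year") / 60
    per_month_min = allowed_downtime_minutes(target, "month")
    print(f"{target}%: {per_year_h:.2f} h/year, {per_month_min:.2f} min/month")
```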

Groups of KPIs

Business Activity
1. Ticket resolution time
...

Technical
1. Throughput
...

Operational / Application
1. Response Time
2. Availability
...

Finance
Revenue

Environmental KPI

HR KPI

The quality objective is defined and measured via KPIs, which can be negotiated within an SLA

The quality objectives are defined within the KPIs (Key Performance Indicators). A KPI is derived from a number of sources, including performance metrics of the service or of underlying support services, the PIs (Performance Indicators). As a service or application is supported by a number of service elements, a number of different PIs may need to be determined to calculate a particular KQI (Key Quality Indicator).

Examples of objective service KPIs with quantitative measurements:
Availability: The service is available for use at the time required. As a KQI, this includes all aspects of the service: physical terminal availability, network, etc.
Response Time: How quickly the service responds to an internal or external stimulus.
Transaction Rate: The rate at which the system or service can service requests.
Throughput: The total amount of information that is offered to the system. Throughput includes all processed information, including retries and replications.

Examples of objective service KPIs with qualitative measurements:
Authorization: The system is only available to authorized resources (information and personnel) at the times allowed.
Confidentiality: Information can only be seen by those intended to see it.
Integrity: Information is available as required and has not been changed from the original.

Examples of subjective (perceptive) KPIs:
Terms like “80% of respondents should respond Satisfied or Higher”
Verbal/linguistic expressions such as “acceptable”, “good”, “excellent”

Being subjective, these KQI parameters can be hard to include as a contractual requirement.


User Experience: Interaction or Irritation (Component and System View)

Guaranteeing business-focused SLAs results in an optimization problem that has to be solved across multiple domains. The landscape of today's IT service providers is inherently integrated. It consists of all kinds of elements, namely networks, servers, storage, and software stacks.

Flow of Requirements

[Figure: flow of requirements; recoverable labels: Why, Business Impact]

KQI/PI Association Hierarchy Graph

The automated process of translating and correlating high-level requirements and policies of all kinds down to the infrastructure level creates a set of related PIs, which we term a Key Quality / Performance Indicator (KQI/PI) Hierarchy.

The KQI/PI Association Graph, or KQI/PI Hierarchy for short, is a directed graph representing the association relationships between sets of KQI/PIs within (or across) tiers in a multi-tier architecture as well as across multi-stakeholder domains.

The hierarchy introduces service quality parameters (KQI) rather than individual component performance (PI). This concept is described in the Open Group “SLA Management Handbook, Volume 4: Enterprise Perspective”.
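A directed association graph of this kind can be sketched with a plain adjacency map. This is only an illustration of the data structure: the class name `IndicatorHierarchy` and all indicator names are hypothetical, and the handbook does not prescribe this representation.

```python
from collections import defaultdict
from typing import Dict, Set


class IndicatorHierarchy:
    """Directed graph: an edge points from an indicator to one that supports it."""

    def __init__(self) -> None:
        self._supported_by: Dict[str, Set[str]] = defaultdict(set)

    def associate(self, indicator: str, supporting: str) -> None:
        """Record that `indicator` depends on the lower-level `supporting` one."""
        self._supported_by[indicator].add(supporting)

    def all_supporting(self, indicator: str) -> Set[str]:
        """Every indicator reachable from `indicator`, across all tiers."""
        seen: Set[str] = set()
        stack = [indicator]
        while stack:
            node = stack.pop()
            for child in self._supported_by.get(node, ()):
                if child not in seen:
                    seen.add(child)
                    stack.append(child)
        return seen


# Hypothetical three-tier example: one KQI at the top, PIs below it.
h = IndicatorHierarchy()
h.associate("KQI: business service response time", "PI: app server response time")
h.associate("PI: app server response time", "PI: DB query execution time")
h.associate("PI: DB query execution time", "PI: SAN bandwidth")
print(sorted(h.all_supporting("KQI: business service response time")))
```

Walking the graph from a KQI lists every PI whose degradation could affect it, which is the question an SLA translation has to answer.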


The concept of Key Quality and Performance Indicators

In most cases the KQI/PI relationship cannot be described mathematically. For instance, extending the response time of a DB query by 1 second may lead to an additional delay of half a second in the business service response time for the end user.

Given multiple PI parameters P1, …, Pn, a formula such as f(P1; P2; …; Pn) = F(Q1; Q2) may in theory be determined to calculate the KQI parameters Q1, Q2 [The Open Group 04].

The KQI is derived from a number of information sources, including metrics for calculating the performance of the service or metrics of underlying services (PIs). In general, a KQI is defined from a set of PIs, and each PI or KQI has lower and upper warning thresholds ("Lower Warning/Upper Warning") and lower and upper error thresholds ("Lower Error/Upper Error").
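The four thresholds attached to each PI or KQI can be sketched as a small value check. This is an assumed, minimal representation: the class names, the status labels and the CPU utilisation limits are invented for the example.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    OK = "ok"
    WARNING = "warning"
    ERROR = "error"


@dataclass(frozen=True)
class Thresholds:
    """Lower/upper warning and error limits of one indicator."""
    lower_error: float
    lower_warning: float
    upper_warning: float
    upper_error: float

    def evaluate(self, value: float) -> Status:
        """Map a measured value to OK, WARNING or ERROR."""
        if value < self.lower_error or value > self.upper_error:
            return Status.ERROR
        if value < self.lower_warning or value > self.upper_warning:
            return Status.WARNING
        return Status.OK


# Hypothetical PI: CPU utilisation in percent (all four limits are made up).
cpu = Thresholds(lower_error=1.0, lower_warning=5.0, upper_warning=80.0, upper_error=95.0)
for measured in (50.0, 85.0, 97.0):
    print(measured, cpu.evaluate(measured).value)
```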

ITSM – Service Delivery

Service Level Management - Terminology and Definitions

Service Level Requirements (SLR): a listing of the customer’s service requirements (e.g. availability, capacity, financial, criticality, service restoration, etc.).

Service Level Agreement (SLA): a written agreement with a customer defining the service targets and responsibilities of both parties.

Operational Level Agreement (OLA): a written agreement between two internal IT areas (e.g. Networks and Service Desk).

Underpinning Contract (UC): a contract with a 3rd party vendor/supplier that documents the delivery of services that support IT in its delivery of service.

SLA, OLA, UC

[Figure: Organization Management linked to the Customer (SLA), the Org’s Internal Teams (OLA) and Vendors (UC)]

ITSM – Service Delivery

Service Level Management

• The process responsible for maintaining and improving IT Service quality through a constant cycle of agreeing, monitoring, and reporting to meet customers’ objectives.

• Provides us and our customers with a clear and consistent understanding and expectation of the level of service required to provide a quality product.

• Through these methods, a better relationship between IT and the customers can be developed.


Enterprise class IT systems have many components


Approach for loosely coupled services

[Figure: loosely coupled services shared between suppliers, customers and business partners on open standards (WSDL, XML, SOAP, UDDI). Services shown: Receive PO, Check inventory, Apply for credit, Approve/reject credit, Check for outstanding A/R, Collect A/R, Process invoice, Check print history, Check order status, Fulfill order, Ship goods]

Today’s IT systems behave like “Complex Systems” [1]

“Complex systems” are systems whose behaviour is perceived [2] to be complicated. They typically consist of:
Many elements (enterprise IT will have 2-10M Configuration Items)
Many relationships between elements
Nonlinear and discontinuous relationships
Incomplete information about elements and relationships

[1] Complexity science: http://informatics.indiana.edu/rocha/complex/csm.html, http://en.wikipedia.org/wiki/Complex_system
[2] Complexity is perceived because apparent complexity can decrease with learning. Helicopters can be flown with training and practice.

[Figure: cascading impact of a change, plotted as efforts over time. Elements shown: Application Change, Hardware Upgrade, OS Upgrade, DBMS Upgrade, App Server Upgrade, Middleware Upgrade, Re-integrate, Re-test, Ops Process, Cutover Plan, Problem Fix/Recover, Back-out]

Tight interdependencies between components cause the impact of a change to cascade. This increases complexity, cost and risk, and leads to the unexpected, including failure.

Example: A large bank spends 80% of its IT budget managing its existing systems and infrastructure.

Source: Increasing Client Capacity for Change, TT Assessment, Jenny Choy, Jan 2007

Unscheduled Outages
Application 40.0%
Process 40.0%
Hardware 10.0%
Operating Systems 10.0%
Source: Gartner Group

“An average of 80 percent of mission-critical application service downtime is directly caused by people or process failures. The other 20 percent is caused by technology failure, environmental failure or a disaster.” Source: Gartner

Change (new app, workload, technology, people, procedure, fewer people) is most often the trigger.

Managing systems is costly, complex and remains labour intensive

Enterprise IT systems are complex

They require (a lot of) “managing systems” and infrastructure, which also have to be managed

Support organisations are split by discipline

Coordination is essential to make or resolve anything other than a simple change or problem

E.g. altering a tablespace to use a new container affects DBA, storage, potentially network support, and operations

Labour, dominated by the effort to fix problems and make changes, dominates IT cost

Support cost is related to the number of moving parts, i.e. things to manage: OS, instances, subsystems

This partially explains the drive to “as a Service” and Cloud delivery models

[Figure: TCO Model. Cost (client £ value) by phase and time in months (6 to 90): Design, Code, Build, followed by Run for all remaining periods]

[Figure: Spending (US$B), 1996 to 2010: New Server Spending, Server Mgt and Admin Costs, Power and Cooling Costs, plotted against the physical and the virtual + physical server installed base (millions), with the difference marked as the Virtualization Management Gap. Source: IDC, May 2006]

• Server virtualization will result in a significant increase in the number of servers (physical + virtual) to be managed

• The projected increase is not yet reflected in the forecast of server management costs

There is a lot of focus on TCA and time to value, but TCO and releasing the IT budget “lost” to supporting the “as is” are increasingly important. IT is an inhibitor to business change.

For clients: to allow IT to support the rate of business change needed

For IBM: whoever addresses the TCO of their systems and software will allow clients to release IT budget for something more valuable than supporting current systems

Complexity of multi-layered SLA translations

• M2C (Metric to Configuration) translates, in the example, the end-user objective “Response Time” to the underlying application server topology, which is needed to ensure enough capacity to handle the expected number of requests in time.

• C2C (Configuration to Configuration) is used here to translate the deployment option of a web application server to the supporting DB configuration. A clustered application server for high-availability topologies needs a corresponding database configuration to support the clustered processing of Java 2 Entity Beans.

• M2M (Metric to Metric) correlates the high-level metric with lower-level metrics, here for example the service objective for application response time with the required average database query execution time. For instance, a sub-second end-user application response time requires an average DB query execution time of at most half a second.

• C2M (Configuration to Metric) is used to translate the requested DB configuration and cluster setup to the lower-level system parameters of the Storage Area Network (SAN) infrastructure with the required bandwidth capacity.

Challenge of multi-layered SLA translations


Example of Fuzzy Logic for quality parameters with trapezoidal membership function

• Fuzzy set A

• A = {(x, µA(x)) | x ∈ X}, where µA(x) is called the membership function for the fuzzy set A. X is referred to as the universe of discourse.

• The membership function associates each element x ∈ X with a value in the interval [0,1].

[Figure: trapezoidal membership functions µ (0.0 to 1.0) over response time for the terms good, acceptable, bad]

If response time is a linguistic variable, then its term set is

T(response time) = {good, not good, very good, not very good, …, acceptable, sufficient, …, bad, not too bad, very bad, more or less bad, not very bad, …, not very good and not very bad, …}.
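A trapezoidal membership function for three terms of the response time variable can be sketched as follows. The corner points (1, 2, 4 and 6 seconds) are assumptions chosen for illustration; the slides give no numeric values for good, acceptable and bad.

```python
import math


def trapezoid(x: float, a: float, b: float, c: float, d: float) -> float:
    """Membership degree in [0, 1]: 0 outside (a, d), 1 on [b, c], linear between.

    Requires a <= b <= c <= d. Infinite a/b or c/d give open shoulders.
    """
    if b <= x <= c:
        return 1.0
    if x <= a or x >= d:
        return 0.0
    if x < b:
        return (x - a) / (b - a)   # rising edge
    return (d - x) / (d - c)       # falling edge


INF = math.inf
# Assumed corner points in seconds for the linguistic variable "response time".
TERMS = {
    "good": (-INF, -INF, 1.0, 2.0),
    "acceptable": (1.0, 2.0, 4.0, 6.0),
    "bad": (4.0, 6.0, INF, INF),
}


def memberships(response_time: float) -> dict:
    """Degree to which a response time belongs to each term."""
    return {term: trapezoid(response_time, *pts) for term, pts in TERMS.items()}


print(memberships(1.5))  # partly good, partly acceptable
print(memberships(3.0))  # fully acceptable
```

Unlike a crisp threshold, a response time of 1.5 seconds belongs to two terms at once, each with degree 0.5 under these assumed corner points.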


For this example, we fuzzify the “response time” performance metric into the fuzzy variables HIGH, MEDIUM and LOW. For the Collaboration tool service, response time performance is LOW if the response time is greater than 10 seconds, MEDIUM if it lies between 3 and 10 seconds, and HIGH if it is less than 3 seconds.

Performance metrics mapped into fuzzy variables
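The mapping of response time to HIGH, MEDIUM and LOW performance can be sketched in both a crisp and a fuzzy form. The 3-second and 10-second boundaries come from the example; the transition half-width `width` that softens the boundaries is an assumption, as are the function names.

```python
def crisp_performance(response_time: float) -> str:
    """Crisp reading of the example: exactly one label per response time."""
    if response_time < 3.0:
        return "HIGH"
    if response_time <= 10.0:
        return "MEDIUM"
    return "LOW"


def _ramp_up(x: float, lo: float, hi: float) -> float:
    """0 below lo, 1 above hi, linear in between."""
    return min(1.0, max(0.0, (x - lo) / (hi - lo)))


def fuzzy_performance(response_time: float, width: float = 1.0) -> dict:
    """Membership degrees with soft transitions centred on 3 s and 10 s.

    `width` is an assumed half-width of each transition zone in seconds;
    it must be positive and below 3.5 so that the two zones do not overlap.
    """
    if not 0.0 < width < 3.5:
        raise ValueError("width must be in (0, 3.5)")
    above_3 = _ramp_up(response_time, 3.0 - width, 3.0 + width)
    above_10 = _ramp_up(response_time, 10.0 - width, 10.0 + width)
    return {
        "HIGH": 1.0 - above_3,
        "MEDIUM": min(above_3, 1.0 - above_10),
        "LOW": above_10,
    }


print(crisp_performance(3.0), fuzzy_performance(3.0))
print(crisp_performance(10.5), fuzzy_performance(10.5))
```

At exactly 3 seconds the crisp version jumps to MEDIUM, while the fuzzy version reports HIGH and MEDIUM with equal degree, which avoids an abrupt change of class for a marginal change in the measurement.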


Fuzzification of Performance Parameters

Thresholds serve as natural boundaries for fuzzy attributes. For instance, a set of Performance Indicator values at warning level can degrade a service until it is interrupted. At that point the situation has to be treated as an error indicating a quality violation.

Quality of Service parameter: types of membership functions

Examples: the type of membership function depends on the property of the QoS parameter:

A QoS parameter should have a Gaussian membership function when missing it might cause a drastic loss of perception.

A QoS parameter can have a trapezoidal function when quality remains the same until a threshold is reached (usually referred to as the JND, Just Noticeable Difference), after which the quality starts decaying.

Psychological measures often best fit a linear triangular membership function, as they are linearly distributed based on the user.

User satisfaction again has a Gaussian membership function because of the normal distribution of human satisfaction measures.

Quality of perception can have a simple triangular membership function when linearly distributed.

Source: A. Hamam et al., A Fuzzy Logic System for Evaluating Quality of Experience of Haptic-Based Applications, 2008, Distributed & Collaborative Virtual Environments Research Laboratory, University of Ottawa, Canada
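The Gaussian and triangular shapes named above can be written down in a few lines. This is a generic sketch of the two standard function forms, not code from the cited paper; the parameter values in the example calls are arbitrary.

```python
import math


def gaussian(x: float, mean: float, sigma: float) -> float:
    """Gaussian membership: 1 at the mean, falling off smoothly on both sides."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return math.exp(-0.5 * ((x - mean) / sigma) ** 2)


def triangular(x: float, a: float, b: float, c: float) -> float:
    """Triangular membership: 0 at a and c, 1 at the peak b (a < b < c)."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


# Arbitrary example: a satisfaction score on a 0..10 scale.
print(gaussian(6.0, mean=5.0, sigma=1.0))
print(triangular(2.5, a=0.0, b=5.0, c=10.0))
```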


Coming next Friday:

Their key contribution is a concept of a new framework that enables the translation of backstage metrics to those at the frontstage.

It captures the dependency of a service on others, or on backend applications and resources. Linguistic rules are then used to define how quality measures of a service at the frontstage relate to those of its resources or the other services it calls. Fuzzy Logic is used to reason over such rules to move from the known hard metrics at the backstage to the soft metrics at the front.
