workload forecasting and reporting

Post on 03-Jan-2017


Confidential

Workload Forecasting and Reporting

Damian Ward

Nonstop Solutions Architect / BITUG Vice Chairman

2 Unclassified

• About Me

• About VocaLink

• Part 1 – Some Theory

• Part 2 – Forecasts & Models

− Part 2a – Transaction Volume Forecast

− Part 2b – Improved Transaction Volume Forecast

− Part 2c – Workload Models

− Part 2d – Combining Forecast & Workload models

• Part 3 – Case Study

• Summary

• Questions..? Please feel free to ask as we go through the presentation.

Introduction What am I going to talk about today

3 Unclassified

Introduction About your presenter

• Damian Ward

• 20 years HP NonStop and Payments experience

• Career spanning:

− Operations, Application Programming, System Management, Programme Management, Technical Specialist, Solutions Architect, Enterprise Architect, Infrastructure Architect

• Specialities:

− HP NonStop systems and architecture, Enterprise Architecture, Encryption, Availability Management, ATM Systems, Payments Processing, Capacity Planning, System modelling, Fraud, Mobile and Internet technologies, Programming, Emerging Technologies and Robotics

• BITUG Vice Chairman 2011

• BITUG Chairman 2012

4 Unclassified

Introduction VocaLink History

5 Unclassified

Introduction VocaLink History

6 Unclassified

Introduction Card processing landscape

Diagram labels:

• FIS Connex Advantage switch with resilient telecommunication connections to each customer

• Direct connection to in-house processing system

• Indirect ATM acquirer and card issuer connection (via VocaLink CSB)

• ATM and POS international acquiring and issuing connections via gateway connections to international schemes

• Connections to Mobile Operators

• Direct connection to Post Office systems

• Connections to overseas schemes and banks

• Indirect ATM connection (via third party processor)

• via TNS CSB

7 Unclassified

Introduction Transaction Processing Peaks

8 Unclassified

PART 1 – SOME THEORY

9 Unclassified

Some Theory Peak TPS vs Throughput?

• Third slide indicates 482 tps peak

• Bell curve, arrival rate, measurements and averaging periods could all account for this.

• HOWEVER – I am using fictional transaction summary data based on real world observations.

• All transaction summary data used in this presentation is made up for the sole purpose of illustrating the models within this presentation

10 Unclassified

Some Theory Maximum recommended CPU utilisation?

• System response time increases exponentially with utilisation

• Switch time measurements reflect this

• 80% maximum metric used by VocaLink

• Remember normal switch time is in the order of 0.1 second

• < 1 second is probably acceptable (ATMs time out at 30 seconds).
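The shape of that curve can be illustrated with a minimal sketch. It assumes a single-server M/M/1 queue, which is a textbook simplification and not something the slides state; the service time is a hypothetical value. In this simple model the growth is hyperbolic rather than strictly exponential, but the practical point is the same: response time climbs steeply once utilisation passes roughly 80%.

```python
def response_time(service_time_s: float, utilisation: float) -> float:
    """Mean response time of an M/M/1 queue: R = S / (1 - rho)."""
    if not 0.0 <= utilisation < 1.0:
        raise ValueError("utilisation must be in the range [0, 1)")
    return service_time_s / (1.0 - utilisation)


# Hypothetical service time, chosen for illustration only.
SERVICE_TIME_S = 0.05

if __name__ == "__main__":
    for rho in (0.50, 0.80, 0.90, 0.95, 0.99):
        print(f"{rho:4.0%} busy -> {response_time(SERVICE_TIME_S, rho):.2f} s")
```

Doubling from 0.1 s to 0.25 s between 50% and 80% busy is tolerable; the jump beyond 90% is what the 80% ceiling is there to avoid.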

11 Unclassified

The Theory Average CPU utilisation vs Actual CPU utilisation

• When performing “what-if?” type analysis CPU utilisation is generally considered uniform

• Application Support teams need to ensure a good balance

12 Unclassified

The Theory Priority based OS will save us

• Some would argue that the NonStop OS priority based scheduling makes this work redundant?

• DP2 a particular issue here.

• Our application is a collection of high priority processes

• Gets busier as a whole

• Single CPU can become saturated with high priority processes

• Negative impact on the rest of the application.

• Application function becomes unstable

• CPU imbalance means some CPUs get saturated before others

• Cross switch transaction time goes up.

• Remember normal switch time is in the order of 0.1 second

• < 1 second is probably acceptable (ATMs time out at 30 seconds).

13 Unclassified

PART 2 – FORECASTS & MODELS

14 Unclassified

Forecasts and models What information can / should we use

• Actual data from running system: NonStop Measure

• Business unit volume forecasts: monthly volumes by service

• SLA volume commitments: where appropriate (i.e. FPS)

• Application vendor data: not available / reliable

• Hardware vendor information: for what-if scenarios

• Other models: profile data, ratios

• Availability policy: scheme and processing model dependent

• Capacity policy: 80% CPU threshold, cross switch time driven

15 Unclassified

• Peak second for every hour

• Rolling 24 month planning horizon.

Forecasts and models The end result

17 Unclassified

Transaction volume forecasting Daily volume (txnsyyyy.xlsx) spreadsheet

• Transaction summary data dating back to 1998

• Actual daily volumes

• Forecast daily volumes

• Tracks actual vs forecast

• Traditionally used to predict volumes, prior to the business taking on this role

• This model can only look backwards

• Used to derive annual to peak month and month to peak day transaction ratios

• Used to derive monthly daily transaction volume distribution

• Model tuned annually

18 Unclassified

Transaction volume forecasting Friday analysis (fridays.xlsx) spreadsheet

• Transaction summary data dating back to 1998

• Analysis of Friday daily volumes

• Actual peak day, hour, minute, second data

• Used to derive peak day to hour, peak hour to minute and peak minute to second ratios

• Model tuned annually

19 Unclassified

Transaction volume forecasting Derived transaction ratios

• Peak period transaction ratios

• Derived from:

• txnsyyyy.xlsx

• fridays.xlsx

• Tuned annually

20 Unclassified

Transaction volume forecasting Business Unit volume forecast

• Business unit provides future volumes

• Business unit is responsible for these; they have sight of new business and industry trends so we don’t need to.

• Forms part of contract between IT and the business.

• Removes volume prediction responsibility from IT.

• Based on calendar month.

21 Unclassified

Transaction volume forecasting Business Unit volume inserted in txnsyyyy.xlsx

• Business forecast volumes plug into transaction (txnsyyyy.xlsx) model unchanged

• Month to peak Friday ratio used to predict peak Friday volume

22 Unclassified

Transaction volume forecasting Daily transactions worksheet takes values from business forecast

• Peak Friday volume plugged into daily volume prediction worksheet.

23 Unclassified

Transaction volume forecasting Remaining Fridays populated using ratios

• Remaining Friday volumes predicted using Friday ratios

24 Unclassified

Transaction volume forecasting Remaining weekdays populated using ratios

• Remaining daily volumes predicted using weekday ratios
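The last three slides (peak Friday, remaining Fridays, remaining weekdays) amount to a small fan-out calculation. The sketch below shows one way it could look; every ratio value and name is a hypothetical placeholder, since the real figures live in the spreadsheets and are tuned annually.

```python
# Hypothetical ratios for illustration only; the real ones come from
# txnsyyyy.xlsx and fridays.xlsx.
FRIDAY_RATIOS = [0.92, 0.95, 0.97, 1.00]  # each Friday relative to the peak Friday
WEEKDAY_RATIOS = {                        # each day relative to that week's Friday
    "Mon": 0.80, "Tue": 0.78, "Wed": 0.82, "Thu": 0.90,
    "Fri": 1.00, "Sat": 0.85, "Sun": 0.60,
}


def month_forecast(peak_friday_volume: float) -> list[dict[str, int]]:
    """Return one dict of daily volumes per week, derived from the peak Friday."""
    weeks = []
    for friday_ratio in FRIDAY_RATIOS:
        friday_volume = peak_friday_volume * friday_ratio
        weeks.append(
            {day: round(friday_volume * ratio) for day, ratio in WEEKDAY_RATIOS.items()}
        )
    return weeks


if __name__ == "__main__":
    for number, week in enumerate(month_forecast(4_000_000), start=1):
        print(f"week {number}: {week}")
```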

25 Unclassified

Transaction volume forecasting Ratios used to calculate hour, minute, second volumes

• Peak hour, minute and second calculated using ratios.

26 Unclassified

Transaction volume forecasting Ratios recap

• A brief example showing the ratios at work
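As a minimal sketch of the ratio chain, the following drills a monthly business forecast down to a peak second. The ratio values and the monthly volume are invented for illustration (the talk's own figures are not in the transcript), and the dictionary keys are hypothetical names.

```python
# All values below are hypothetical placeholders, not the presenter's figures.
RATIOS = {
    "month_to_peak_friday": 0.040,   # peak Friday as a share of the month
    "peak_day_to_hour": 0.085,       # peak hour as a share of the peak day
    "peak_hour_to_minute": 0.020,    # peak minute as a share of the peak hour
    "peak_minute_to_second": 0.020,  # peak second as a share of the peak minute
}


def drill_down(monthly_volume: float, ratios: dict[str, float]) -> dict[str, float]:
    """Apply each ratio in turn: month -> peak Friday -> hour -> minute -> second."""
    peak_friday = monthly_volume * ratios["month_to_peak_friday"]
    peak_hour = peak_friday * ratios["peak_day_to_hour"]
    peak_minute = peak_hour * ratios["peak_hour_to_minute"]
    peak_second = peak_minute * ratios["peak_minute_to_second"]
    return {
        "peak_friday": peak_friday,
        "peak_hour": peak_hour,
        "peak_minute_tpm": peak_minute,
        "peak_second_tps": peak_second,
    }


if __name__ == "__main__":
    for name, value in drill_down(100_000_000, RATIOS).items():
        print(f"{name:>16}: {value:,.0f}")
```

Note that the peak-minute and peak-second shares are higher than the uniform 1/60, which is what makes the peak second larger than a simple average would suggest.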

27 Unclassified

PART 2B – IMPROVED TRANSACTION VOLUME FORECAST

28 Unclassified

Improved transaction volume forecasting Peak day transaction distribution profile

29 Unclassified

Improved transaction volume forecasting Profile used to generate hourly volumes

• Daily volumes now distributed according to daily profile.

• Derives max tpm per hour

30 Unclassified

• Peak second per hour derived from peak minute per hour.

• The 2 models validate each other.

Improved transaction volume forecasting Ratios drill down to peak second per hour
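The profile-based approach on the last two slides can be sketched as follows: spread a daily volume across 24 hours using a distribution profile, then apply the hour-to-minute and minute-to-second ratios to each hour. The profile weights and both ratios are hypothetical values chosen only to make the example run.

```python
# Hypothetical hourly weights (00:00 to 23:00); normalised to shares below.
PROFILE_WEIGHTS = [1, 1, 1, 1, 1, 2, 3, 5, 7, 8, 9, 10,
                   11, 10, 9, 8, 8, 7, 6, 5, 4, 3, 2, 1]


def hourly_peaks(daily_volume, weights, hour_to_minute, minute_to_second):
    """Return (hour, hour_volume, peak_tpm, peak_tps) for each hour of the day."""
    total = sum(weights)
    rows = []
    for hour, weight in enumerate(weights):
        hour_volume = daily_volume * weight / total   # profile share of the day
        peak_tpm = hour_volume * hour_to_minute       # busiest minute in the hour
        peak_tps = peak_tpm * minute_to_second        # busiest second in that minute
        rows.append((hour, hour_volume, peak_tpm, peak_tps))
    return rows


if __name__ == "__main__":
    for hour, volume, tpm, tps in hourly_peaks(4_000_000, PROFILE_WEIGHTS, 0.020, 0.020):
        print(f"{hour:02d}:00  volume={volume:>9,.0f}  peak tpm={tpm:>7,.0f}  peak tps={tps:>5,.1f}")
```

Comparing the busiest hour from this table against the single peak second from the ratio-only model is one way the two models can be checked against each other.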

31 Unclassified

PART 2C – WORKLOAD MODELS

32 Unclassified

• Gives the business the ability to predict future machine utilisation.

• Allows adequate time to prepare for known volume growth:

− i.e. following new business take-on

− new product launch

• Allows the business to perform what if analysis.

• Allows for application benchmarking and comparison pre / post changes.

Workload Models Why create a workload model

33 Unclassified

Forecasts and models Raw NonStop Measure Report

?dictionary perfdict
?assign process to process
open process;
list by volume noprint, by subvol noprint, by filename noprint
by volume nohead as a8
by subvol nohead as a8
by filename nohead as a8
count (subvol over filename) nohead AS "M<ZZ9>"
sum (cpu-busy-time over filename) nohead AS "M<ZZZZZZZZZ9>"
sum (messages-sent over filename) nohead AS "M<ZZZZZZZ9>"
sum (messages-received over filename) nohead AS "M<ZZZZZZZ9>"
sum (recv-qtime over filename) nohead AS "M<ZZZZZZZZZZZ9>"
;

Report output (columns: volume, subvol, filename, count, cpu-busy-time, messages-sent, messages-received, recv-qtime):

$AOS10 ZYQ00000 Z00006BX  2 12214910 30987    0       0
$AOS11 AT67POBJ N50Q     15 12237880 44086 7968 2538348
$AOS11 AT67POBJ SETLQ     5        0     0    0       0
$AOS11 AT67POBJ TIDELQ    3   306420   770  230   16163
$AOS11 AT67POBJ TRITON1Q  1   119767   314   99    1606
$AOS11 AT67POBJ TRITONQ   7  2155984  5812 2004 2472860
$AOS11 BA67POBJ EXTRQ     3        0     0    0       0
$AOS11 BA67POBJ HISO1Q   10   194172   280  184   16003
$AOS11 BA67POBJ HISO5Q    1   113841    85  107   29178
$AOS11 BA67POBJ INSHISO   2    14847     0   30    4881
$AOS11 BA67POBJ RIP       1        0     0    0       0
$AOS11 BA67POBJ T24HISO   1   204139   414  336   68292
$AOS11 SW67POBJ LINKQ     3   412824   755  441   46470

34 Unclassified

Forecasts and models Measure report imported into Excel

• Imported Measure data can be quite large.

• Summarised by object subvol and/or object name
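The talk does this summarisation in Excel. As an alternative sketch, the same roll-up by object subvol can be done in a few lines of Python over the raw report rows. The rows below are copied from the Measure report slide; the column meanings are inferred from the query's field order, and the function and key names are hypothetical.

```python
from collections import defaultdict

# A few rows from the Measure report slide:
# volume, subvol, filename, count, cpu-busy-time, msgs-sent, msgs-received, recv-qtime
REPORT = """\
$AOS11 AT67POBJ N50Q 15 12237880 44086 7968 2538348
$AOS11 AT67POBJ TIDELQ 3 306420 770 230 16163
$AOS11 BA67POBJ HISO1Q 10 194172 280 184 16003
$AOS11 BA67POBJ HISO5Q 1 113841 85 107 29178
$AOS11 SW67POBJ LINKQ 3 412824 755 441 46470
"""


def summarise_by_subvol(report: str) -> dict[str, dict[str, int]]:
    """Total the count and cpu-busy-time columns for each object subvol."""
    totals = defaultdict(lambda: {"count": 0, "cpu_busy_time": 0})
    for line in report.splitlines():
        fields = line.split()
        if len(fields) != 8:
            continue  # skip blank or malformed rows
        _volume, subvol, _filename, count, cpu_busy_time = fields[:5]
        totals[subvol]["count"] += int(count)
        totals[subvol]["cpu_busy_time"] += int(cpu_busy_time)
    return dict(totals)


if __name__ == "__main__":
    for subvol, row in summarise_by_subvol(REPORT).items():
        print(f"{subvol}: count={row['count']}, cpu-busy-time={row['cpu_busy_time']}")
```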

35 Unclassified

Forecasts and models Measure report imported into Excel

• Measure data summary

• Measure data used to benchmark system.

• Collected each Friday.

• Collected during V&P testing

• CPU cost per transaction established.

• Default non core application “noise” established.

• Safe tps ascertained and used to feed into other models.
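A minimal sketch of how those three quantities relate, assuming a simple linear model (utilisation = noise + tps × cost per transaction ÷ number of CPUs) with work spread evenly across CPUs. The model and all the numbers are assumptions for illustration; the slides do not give the formula used.

```python
def cost_per_transaction(cpu_busy_s: float, noise_busy_s: float, transactions: int) -> float:
    """CPU seconds consumed per transaction after removing background noise."""
    return (cpu_busy_s - noise_busy_s) / transactions


def safe_tps(cost_s: float, cpus: int, threshold: float = 0.80,
             noise_fraction: float = 0.0) -> float:
    """Highest tps that keeps average CPU utilisation at or below the threshold."""
    return (threshold - noise_fraction) * cpus / cost_s


if __name__ == "__main__":
    # Hypothetical measurement: 9,000 CPU seconds busy in the interval, of which
    # 1,800 is non-core noise, while 720,000 transactions were processed.
    cost = cost_per_transaction(9_000, 1_800, 720_000)
    print(f"cost per transaction: {cost * 1000:.1f} ms CPU")
    # Hypothetical system: 4 CPUs, 80% ceiling, 5% of each CPU lost to noise.
    print(f"safe tps: {safe_tps(cost, cpus=4, noise_fraction=0.05):.0f}")
```

Because the model assumes perfect balance, a real system will hit the ceiling on its busiest CPU first, which is why CPU balance matters so much later in the talk.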

36 Unclassified

PART 2D – COMBINING FORECAST & WORKLOAD MODELS

37 Unclassified

Combined forecast and workload Excel conditional formatting used to good effect

• Max tps of 376 used with Excel “conditional formatting”

• Danger times are obvious.
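The conditional formatting boils down to comparing each hour's forecast peak tps with the capacity figure. A rough equivalent outside Excel, using the 376 tps limit from the slide and a hypothetical amber band at 90% of that limit (the slide does not give the amber threshold):

```python
def rag(forecast_tps: float, max_tps: float = 376.0, amber_fraction: float = 0.90) -> str:
    """Classify one hour's forecast peak tps as RED, AMBER or GREEN."""
    if forecast_tps >= max_tps:
        return "RED"
    if forecast_tps >= max_tps * amber_fraction:  # amber band is an assumption
        return "AMBER"
    return "GREEN"


if __name__ == "__main__":
    # Hypothetical hourly peak tps forecast for part of a busy day.
    forecast = {"10:00": 250.0, "11:00": 340.0, "12:00": 380.0, "13:00": 360.0}
    for hour, tps in forecast.items():
        print(f"{hour}  {tps:6.1f} tps  {rag(tps)}")
```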

38 Unclassified

Combined forecast and workload (n-1) Seeing into the future

• Model can be rolled forward for as far as the business can predict.

• Typically 24 months.

39 Unclassified

Combined forecast and workload (n-1) What about failure scenarios

• Simple maths can be used to ascertain n-1 system capacity.
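One version of that simple maths, as a sketch: if the workload is evenly balanced and the failed CPU's work redistributes evenly, losing one of n CPUs leaves (n − 1)/n of the capacity. The even-balance assumption and the CPU count of 12 are illustrative assumptions; the slides do not state either.

```python
def n_minus_1_capacity(capacity_tps: float, cpus: int) -> float:
    """Capacity with one CPU down, assuming perfectly even CPU balance."""
    if cpus < 2:
        raise ValueError("need at least two CPUs to survive a single failure")
    return capacity_tps * (cpus - 1) / cpus


if __name__ == "__main__":
    # 376 tps is the (n) figure from the earlier slide; 12 CPUs is hypothetical.
    print(f"n-1 capacity: {n_minus_1_capacity(376.0, 12):.1f} tps")
```

In practice the answer depends on which CPU fails and where its processes relocate, so a per-CPU calculation gives a range rather than a single figure.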

40 Unclassified

Combined forecast and workload (n-1) What about failure scenarios

• Max (n-1) tps of 345 used with Excel “conditional formatting”

• Danger times are obvious.

41 Unclassified

• Impact of process relocations modelled in Excel

• Resultant n-1 impact shown.

Combined forecast and workload (n-1) CPU down capacity, by CPU

42 Confidential

PART 3 – EXAMPLE USE CASE

43 Unclassified

• Assumptions

• High level capacity with first CPU @ 90% Utilisation (n) = 357 tps

• High level capacity @ 90% CPU Utilisation (n-1) between 317 and 352 tps

• Average capacity of 333 tps (n-1) used in following illustrations

• CPU fail to fix time 6 hours.

Example use case S Series capacity evaluation

44 Unclassified

(n) and (n-1) illustration, capacity vs workload February 2012 – April 2012 (max tps + RAG for each hour)

(n) (n-1)

45 Unclassified

(n) and (n-1) illustration, capacity vs workload May 2012 – July 2012 (max tps + RAG for each hour)

(n) (n-1)

46 Unclassified

Probability of failure? How to quantify the risk

• Can depend upon your sizing philosophy

• Size for < 80% with 1 CPU down..? Or 95% with all CPUs up..?

• Impact of incident at quiet time not same as at busy time.

• Deviation from provisioning policy

− (i.e. >80% (n-1) utilisation forecast in next 12 months)

• System is 12 months from retirement

• Thought exercise performed… presented to management, attempted to quantify risk.

• When communicating risk, I recommend you don’t use the phrase “imagine you’re in a casino..!” when talking to management...

47 Unclassified

• S Series Upgrade Options

Option 1 – Stay as Is

Option 2 – 2 x CPU upgrade

Option 3 – Add 2 x CPU

Option 4 – Migrate to NB50000

Probability of failure? Options considered

48 Unclassified

• Upgrade option comparison.

Probability of failure? CPU down capacity by failed CPU

49 Unclassified

Probability of failure? Lots of Maths (special thanks to Ian Murphy, VocaLink)

50 Unclassified

Probability of failure? Number of danger CPUs in each hour including fix time

51 Unclassified

Probability of failure? Probability calculations

52 Unclassified

Probability of failure? Number of danger CPUs in each hour

53 Unclassified

Probability of failure? Number of danger CPUs in each hour including fix time
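The maths behind these slides is not reproduced in the transcript, so the following is only a plausible sketch of the idea, not the method actually used. A CPU counts as a "danger CPU" for a given hour if losing it at that hour would leave the system short of capacity at any point during the fix window. The per-CPU hourly failure probability, the independence of failures, and all function names are assumptions.

```python
def danger_cpus_per_hour(forecast_tps, n1_capacity_by_cpu, fix_hours=6):
    """For each hour, count CPUs whose failure then would breach capacity
    at some point before the fix completes."""
    counts = []
    for hour in range(len(forecast_tps)):
        # Peak demand while the CPU would be down (window is cut short at the
        # end of the forecast).
        window_peak = max(forecast_tps[hour:hour + fix_hours])
        counts.append(sum(1 for capacity in n1_capacity_by_cpu if window_peak > capacity))
    return counts


def probability_of_impact(danger_counts, p_fail_per_cpu_hour):
    """Chance that at least one danger CPU fails in its danger hour,
    assuming independent failures at a constant hourly rate."""
    p_no_impact = 1.0
    for count in danger_counts:
        p_no_impact *= (1.0 - p_fail_per_cpu_hour) ** count
    return 1.0 - p_no_impact


if __name__ == "__main__":
    forecast = [200, 250, 330, 340, 320, 260]   # hypothetical hourly peak tps
    capacities = [317, 333, 352]                # hypothetical per-CPU (n-1) tps
    counts = danger_cpus_per_hour(forecast, capacities, fix_hours=6)
    print("danger CPUs per hour:", counts)
    print(f"probability of impact: {probability_of_impact(counts, 1e-5):.6f}")
```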

54 Unclassified

Probability of failure? Probability of service impacting failure (option 1)

55 Unclassified

Probability of failure? Probability of service impacting failure (option 2)

56 Unclassified

Probability of failure? Probability of service impacting failure (option 3)

57 Unclassified

Summary

• Transaction volume forecasting can be as simple as some ratios, or more complex with profiles.

• Workload and capacity can be modelled with Measure data

• Combine Volume and Workload to great effect

• Don't forget the failure scenarios

• Cheapest route to additional capacity is good n and n-1 CPU balance

• Use workload models in what if scenarios

• Probability of failure can be calculated but mostly academic

• Most of us are in the zero tolerance business, the service cannot fail.

• Especially true once risk identified.

• Many Thanks, Questions..?

58 Unclassified

59 Unclassified

Summary

• Thank you for your attention.

• Questions..?

damian.ward@vocalink.com
