Managing Performance with APM Acceptance Criteria
Michael Sydor
DOX08S #CAWorld
CA Technologies Service Assurance
DevOps
2 © 2014 CA. ALL RIGHTS RESERVED.
Abstract
Reliable processes for identifying appropriate metrics, and for validating those metrics in collaboration with stakeholders across the application lifecycle, remain unaddressed. APM visibility exposes these metrics and directly supports their aggregation into transaction profiles and performance baselines. This establishes a framework for reliable acceptance criteria and defines the roles and responsibilities through which stakeholders can collaborate both to validate the monitoring configuration and to align with business objectives.
Full paper in CATX 2014: Beyond Deployment Automation - Realizing Dev/Ops Metrics and Collaboration through APM Visibility
Michael Sydor
CA Technologies
Sr. Engineering Services Architect
Agenda
1. THE CHALLENGE FOR DEVOPS INITIATIVE
2. NON-FUNCTIONAL REQUIREMENTS AND KPIS
3. ACCEPTANCE CRITERIA LIFECYCLE
4. CLOSING POINTS
The Challenge for DevOps: Simply deploying faster can make things worse!
[Figure: Dev -> QA -> UAT -> Prod pipeline under quarterly, weekly, and daily release cadences; values shown: 50, 140 total, 300+ total]
We Know What Happens From Agile Sprints
Issues encountered and resolved while new functionality is introduced.
Where are Performance Problems Identified?
[Figure: where performance problems are identified (Production, Pre-production, QA) as performance testing maturity increases. Stages shown: FireFighting practice, Performance Visibility introduced, UAT established, Performance test established]
NFRs and KPIs: Non-functional requirements and key performance indicators
Putting teeth into a DevOps initiative
Who Cares About NFRs?
REQUIREMENTS: WEBSPHERE [1] | REQUIREMENTS: MASTERING REQUIREMENTS [2] | APM VISIBILITY
General business understanding and objectives | Cultural, look-and-feel, usability and humanity, legal |
Applications used in the solution | Operational, environmental, maintainability, support |
Security | Security |
Performance | Performance |
Capacity planning | |
Scalability | |
Availability | |
Testing | |
Customer (end-user) metrics | |
[1] WebSphere Commerce V5.4 Handbook: Architecture and Integration Guide
[2] Mastering the Requirements Process - Getting Requirements Right
Some Terms
Baselines
Configuration – Do we have a valid monitoring configuration?
Application – Do we have visibility into the key transactions?
Performance – Can we identify KPIs for availability, performance and capacity?
KPIs
Suspect – significant because of frequency of execution
Validated – known to correlate with performance issues
Lifecycle Visibility Achieved
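[Figure, panel 1 (lifecycle visibility): potential metrics and KPIs (suspect and validated) by stage: Unit test, Functional test, Stress test, UAT, Production, Triage. Potential-metric values shown, in order: 10,000; 5,000; 3,000; 1,500; 2,500; 2,500. KPI values shown: 20, 30, 50, 10, 30, 35, 60, 40, 55, 45, 15.]
[Figure, panel 2 (production-only visibility): same axes and stages. Potential-metric values shown: 5,500; 4,500. KPI values shown: 40, 100, 35, 75.]
Legend: Suspect KPI, Validated KPI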
KPI Management Maturity
[Figure: diagnostic value plotted against KPI maturity]
SGCM (Platform): Stalls, GC settings, Concurrency, Memory management trends
APC (Application): Availability, Performance, Capacity
EKB (Transaction): Errors, Key resource performance, Business transaction survey
KPI Evolution
PLATFORM: Coarse information ... but not really APM
Application, transactions, resources: The APM Advantage
GOOD | BETTER (ADDITIONAL) | BEST (ADDITIONAL)
Stalls | Availability – connected status | Errors
GC settings | Availability – metric count | Key resource performance
Concurrency | Suspect performance | Business transaction survey
Memory management (graph) | Suspect capacity |
Performance KPIs – Summary
High volume + significant response time
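One way to read this rule is as a filter over per-transaction metrics: a transaction becomes a suspect KPI when it is both invoked often and slow enough to matter. The sketch below is illustrative only. The `TransactionStats` type, the sample transaction names, and both thresholds are assumptions, not values from this presentation or from any APM product.

```python
from dataclasses import dataclass


@dataclass
class TransactionStats:
    name: str
    invocations: int          # calls observed during the test or baseline run
    avg_response_ms: float    # average response time over the same run


def suspect_kpis(stats, min_invocations=1000, min_response_ms=200.0):
    """Return names of transactions that are both high volume and slow.

    Both thresholds are hypothetical defaults; tune them per application.
    """
    picked = [
        s for s in stats
        if s.invocations >= min_invocations and s.avg_response_ms >= min_response_ms
    ]
    # Rank by total time consumed so the heaviest contributors come first.
    picked.sort(key=lambda s: s.invocations * s.avg_response_ms, reverse=True)
    return [s.name for s in picked]


sample = [
    TransactionStats("checkout", 12000, 450.0),
    TransactionStats("login", 30000, 40.0),    # high volume, but fast
    TransactionStats("report", 15, 9000.0),    # slow, but rarely executed
    TransactionStats("search", 8000, 300.0),
]
print(suspect_kpis(sample))
```

Only `checkout` and `search` satisfy both conditions here; `login` and `report` each meet one condition and are left out.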
Validation of KPIs
[Figure: 2 hour window around a confirmed incident, from 90 minutes before to 30 minutes after. KPI behavior within the window is labeled uncorrelated, degraded, or correlated.]
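The validation window can be sketched in a few lines: take the samples for a suspect KPI that fall between 90 minutes before and 30 minutes after the incident was confirmed, and compare them with the KPI's baseline. The slide does not define how the three labels are assigned, so the ratio cut-offs and the function names below are assumptions chosen purely for illustration.

```python
from datetime import datetime, timedelta
from statistics import mean


def incident_window(confirmed_at):
    """Two-hour window: 90 minutes before to 30 minutes after confirmation."""
    return confirmed_at - timedelta(minutes=90), confirmed_at + timedelta(minutes=30)


def classify_kpi(samples, confirmed_at, baseline,
                 degraded_ratio=1.2, correlated_ratio=2.0):
    """Label a suspect KPI against one incident.

    samples: list of (timestamp, value) pairs for the KPI.
    baseline: the KPI's normal value from an earlier baseline (must be > 0).
    The two ratios are hypothetical cut-offs, not values from the slide.
    """
    start, end = incident_window(confirmed_at)
    in_window = [value for ts, value in samples if start <= ts <= end]
    if not in_window:
        return "uncorrelated"
    ratio = mean(in_window) / baseline
    if ratio >= correlated_ratio:
        return "correlated"
    if ratio >= degraded_ratio:
        return "degraded"
    return "uncorrelated"


confirmed = datetime(2014, 11, 10, 14, 0)
# (minutes before confirmation, observed response time in ms)
readings = [(120, 100), (60, 250), (30, 300), (0, 350)]
samples = [(confirmed - timedelta(minutes=m), v) for m, v in readings]
print(classify_kpi(samples, confirmed, baseline=100))
```

A KPI that is repeatedly labeled correlated across incidents is a candidate for promotion from suspect to validated.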
Baselines
[Figure: baseline types by test maturity. Tests: No test, Smoke test, Use case test, Performance/Stress test, Load-to-failure. Baselines: None, Configuration, Application (transactions), Performance, Capacity forecast. Annotations: Ineffective; Often leads to a QA practice – Functional; Often leads to a performance practice.]
Baselines – Summary
Foundation for any significant benefit from APM
You need to establish ‘normal’ before you can consistently triage. Or you need very capable staff and a LOT of experience.
You need to report on what is significant, not simply provide hundreds of metrics and “... Just go figure it out!”
Absence of baselines will reinforce a “why bother with QA” and “test-in-production” mentality.
Danger signs
Focus on availability but no performance or capacity interest.
Lots of metrics, metric groupings and dashboards but no report templates.
You still can’t triage production incidents effectively.
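To make "establish normal, then report only what is significant" concrete, the following sketch summarizes a few repeated test runs into a baseline and then reports only the metrics that fall well outside it. The metric names, the sample values, and the three-standard-deviation rule are all assumptions for illustration; this is one simple way to define "normal", not the method prescribed by the presentation.

```python
from statistics import mean, stdev


def build_baseline(runs):
    """Summarize repeated test runs into a per-metric 'normal'.

    runs: list of dicts mapping metric name -> observed value, one per run.
    Needs at least two runs so a standard deviation can be computed.
    Returns metric name -> (mean, standard deviation).
    """
    names = runs[0].keys()
    return {n: (mean(r[n] for r in runs), stdev(r[n] for r in runs)) for n in names}


def significant_deviations(baseline, current, k=3.0):
    """Return only the metrics that sit more than k deviations above normal."""
    report = {}
    for name, (avg, sd) in baseline.items():
        value = current.get(name)
        if value is not None and value > avg + k * sd:
            report[name] = value
    return report


# Hypothetical metrics from three earlier runs of the same test.
runs = [
    {"checkout_ms": 400, "heap_mb": 510},
    {"checkout_ms": 420, "heap_mb": 500},
    {"checkout_ms": 410, "heap_mb": 505},
]
baseline = build_baseline(runs)
print(significant_deviations(baseline, {"checkout_ms": 900, "heap_mb": 507}))
```

The report contains `checkout_ms` only; `heap_mb` is within its normal range and is left out, which is the difference between a baseline report and a dashboard of hundreds of metrics.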
Acceptance Criteria – KPIs
[Figure: KPI acceptance criteria by test maturity. Tests: No test, Smoke test, Use case test, Performance/Stress test, Load-to-failure. Criteria: None, Package assembly, Stalls, Errors, Memory profile, Concurrency, Response time. Annotations: Ineffective; Often leads to a QA practice – Functional; Often leads to a performance practice.]
Acceptance Criteria – Summary
Foundation for any pre-production review
You will need to ‘phase-in’ acceptance criteria.
App server configuration tuning
Performance advisory
“We saw __X__. It is a potential concern and we will confirm in production.”
Performance exception
“We saw __Y__. It is a problem and you need signoff to continue to production.”
Performance requirement
“We saw __Z__. You cannot continue to production.”
Danger signs
Lots of criteria but no process for remediation prior to production or confirmation in production.
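The three levels above (advisory, exception, requirement) lend themselves to a small release gate. The sketch below grades each KPI against its baseline and then decides whether a release may continue. The `Finding` enum, the function names, and the 10% tolerance are assumptions for illustration; phasing in is modeled by passing the strictest level currently enforced for each KPI.

```python
from enum import Enum


class Finding(Enum):
    PASS = "pass"
    ADVISORY = "advisory"        # potential concern, confirm in production
    EXCEPTION = "exception"      # problem, signoff needed to continue
    REQUIREMENT = "requirement"  # cannot continue to production


def evaluate(observed, baseline, phase):
    """Grade one KPI against its baseline.

    phase is the level currently enforced for this KPI. A new criterion
    can start as ADVISORY and be tightened in later releases.
    The 10% tolerance is a hypothetical threshold.
    """
    if observed <= baseline * 1.10:
        return Finding.PASS
    return phase


def may_promote(findings, signed_off=False):
    """Decide whether a release may continue to production."""
    if Finding.REQUIREMENT in findings:
        return False
    if Finding.EXCEPTION in findings:
        return signed_off
    return True


findings = [
    evaluate(450, 400, Finding.ADVISORY),     # over tolerance, advisory only
    evaluate(390, 400, Finding.REQUIREMENT),  # within tolerance
    evaluate(900, 400, Finding.EXCEPTION),    # over tolerance, needs signoff
]
print(may_promote(findings), may_promote(findings, signed_off=True))
```

With these sample values the release is blocked until someone signs off on the exception, and the advisory finding is carried forward for confirmation in production.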
Pre-production Checklist
[Figure labels: Configuration baseline, Performance baseline, Application baseline, NFRs, FRs, Use cases, Compatible APM configurations, Suspect KPIs, Security, Scalability, Capacity plan, Stress test, Certification, Hierarchical dashboards, Baseline report, Management module]
Overhead absent
Excess metrics absent
Suspect KPIs identified
Availability alert defined
Acceptance criteria evaluated (performance)
Saturation alert defined (scalability)
Capacity alert defined
Failover capability assessed
Security certification
Overview
Architecture/operations
Triage view
Resource view
Visibility assessed (transaction trace completeness)
Business transaction definition
Pre-production Checklist
Overhead validated
Excess metrics absent validated
Suspect KPIs validated
Availability alert validated
Acceptance criteria validated
Saturation alert validated
Capacity alert validated
Failover capability validated
Security certification
Overview validated
Architecture/operations validated
Triage view validated
Resource view validated
Visibility validated (transaction trace completeness)
Business transaction definition validated
Pre-production review
Operational period
Incident
Triage and root-cause
Post-production review
Validation
Application audit
Resources
Community site
Cookbook: APM HealthCheck
Understanding which metrics matter (KPI discussion)
Cookbook: Application audit
More details on the baseline techniques and process
Blog entries
Redefine triage by learning the golden nuggets of APM...
What are KPIs and how can I get some quick?!
Big Data – What does it mean for APM????
Why does ABA find anomalies when there is nothing wrong in production?
APM best practices – Realizing Application Performance Management
available on Amazon.com and Apress.com
Baselines, test plans, app audits, triage, firefighting
Organizational models, service catalogs
Summary: A Few Words to Review
Key topics
Deploying faster will not, by itself, improve app quality.
APM gives you visibility into NFRs and KPIs.
Acceptance criteria are how you will harness DevOps deployment acceleration.
Findings
APM documents NFRs and KPIs.
Pre-production acceptance criteria allow for truly proactive management of the app lifecycle.
Experiences
Agile techniques show what will happen without viable acceptance criteria.
KPIs are easy to find and manage via baselines.
Baselines make reporting and triage more effective.
For More Information
To learn more about DevOps, please visit:
http://bit.ly/1wbjjqX
For Informational Purposes Only
© 2014 CA. All rights reserved. All trademarks referenced herein belong to their respective companies.
This presentation provided at CA World 2014 is intended for information purposes only and does not form any type of warranty. Some of the specific slides with customer references relate to customer's specific use and experience of CA products and solutions so actual results may vary.
Terms of this Presentation