Software QA Metrics Dashboard Benchmarking


DESCRIPTION

Software metrics best practices from a benchmarking assignment, showing how software metrics are reported to management and used to drive behavior. We learned how leading companies used dashboards to report on quality progress and improvement results. We found the best organizations focused on the vital few metrics but also had automated systems with the ability to drill down on metrics at the divisional and team levels. In addition, the best normalized their metrics by number of customers or by complexity. They systematically used root cause analysis to analyze bugs found in the field. Their software quality metrics often went beyond the strict definition of quality, in that they also measured release predictability and feature expectations. Finally, the best companies used external benchmarks to set their quality targets.

TRANSCRIPT

Page 1: Software QA Metrics Dashboard Benchmarking

Software Quality Metrics Benchmark Study

How Software Metrics and Dashboards are Applied in High Technology Companies

John Carter

TCGen, Inc.

Menlo Park, CA

www.tcgen.com

May 1, 2012

[Figure: Radar chart comparing the Best vs. the Rest across five dimensions: Automated Metrics System, Normalization, Root Cause Analysis, Total Quality (Predictability/Features), Uses External Benchmarks]

[Figure: Example dashboard chart, "Release Slip Rate Percentage", with labeled axes and a benchmark line]

Page 2: Software QA Metrics Dashboard Benchmarking

Executive Summary from 10 Public Companies

The purpose of the benchmark study was to capture best practices in the application of SW metrics dashboards.

Ten technology companies were benchmarked against these questions:

• What metrics on software quality are reported to management?

• Internal quality metrics, or external field-detected metrics?

• How are they normalized? By customers in the field, or by lines of code (LOC)?

• Which are the most important?

• Are they tabular or graphical? How many? Are target values shown?

• How frequently are they reported? How many metrics do you report on?

• What target values do you look at for key metrics?

Key Highlights:

• There is no standard for the number of metrics, the type of metrics, or the frequency of reporting

• However, there are best practices around software quality metrics; we can look at what separates the best from the rest

• The BEST have

1. Automated metrics tracking and analysis systems that allow drill down and reporting by product, release, customer

2. Normalization that ensures that the metrics remain meaningful as the number of customers or the complexity of the code increases (see the sketch after this list)

3. Root Cause Analysis system that systematically analyzes defects that escape the company and are found in the field

4. Quality metrics that go beyond product defects, and include release predictability and feature expectations

5. External benchmarks that are used to set goals (created by third parties to establish databases or perform surveys)
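A minimal sketch of the normalization idea referenced in item 2, using hypothetical helper names and figures rather than data or formulas from the study:

```python
# Illustrative normalization helpers (hypothetical names and figures, not study data).
def defects_per_kloc(defect_count: int, lines_of_code: int) -> float:
    """Internal quality: defects normalized by code size (per 1,000 lines of code)."""
    return defect_count / (lines_of_code / 1000)

def field_defects_per_unit(field_defects: int, units_in_field: int) -> float:
    """External quality: field-found defects normalized by the installed base."""
    return field_defects / units_in_field

# Example: a release with 120 defects in 400,000 LOC and 30 field defects across 1,500 units.
print(defects_per_kloc(120, 400_000))      # 0.3 defects per KLOC
print(field_defects_per_unit(30, 1_500))   # 0.02 field defects per unit
```

Dividing by code size keeps internal metrics comparable as the code base grows; dividing by units in the field keeps external metrics comparable as the customer base grows.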


List of participants: 3 highly regulated companies and 7 technology companies (networking, computer, and storage).

Page 3: Software QA Metrics Dashboard Benchmarking

How We Approached the Analysis

• The Capability Maturity Model (CMM) defines five levels of process maturity

– Level 1 (Initial, Chaotic)

– Level 2 (Repeatable)

– Level 3 (Defined)

– Level 4 (Managed, Measured)

– Level 5 (Optimizing)

• Metrics are a key part of the CMM model, and Level 4 indicates mastery of metrics

• SW metrics are well characterized, and are often divided into Product Quality Metrics, In-Process Metrics, and Metrics for SW Maintenance*

• From our survey of ten companies, we have derived a sense of metrics maturity, and have created our own rating of SW Metrics Maturity using five factors

– Automated, Root Cause Analysis, Normalized, External Benchmarks, and Total Quality (not just defects)

– The Best tend to have excellent scores on all five dimensions; the rest lag behind in one or more areas

– The best tend to have measures in the three areas defined above (Product, In-Process, and Maintenance)


* Stephen Kan, “Metrics and Models in Software Quality Engineering”, Addison-Wesley, 2003

“Best vs. Rest”

Page 4: Software QA Metrics Dashboard Benchmarking

Example SW Metrics Maturity

1. Automated metrics tracking and analysis systems that allow drill down and reporting by product, release, customer

2. Normalization that ensures that the metrics are meaningful as the number of customers or the complexity of code increases

3. Root Cause Analysis system that systematically analyzes defects that escape the company and are found in the field

4. Quality metrics that go beyond product defects, and include release predictability and feature expectations

5. External benchmarks that are used to set goals (created by third parties to establish databases or perform surveys)


[Figure: Radar chart of the five SW metrics maturity dimensions (Automated Metrics System, Normalization, Root Cause Analysis, Total Quality (Predictability/Features), Uses External Benchmarks), comparing the Best vs. the Rest]

The nature of the survey did not allow us to complete this chart for each participant, but this treatment would be very useful to evaluate where you are today and where you should focus in the future to close gaps between the best and the rest.

Hypothetical Radar Chart: a 5-point scale, where mastery is indicated by 5 (outermost) and absence by 0 (innermost)
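As a hedged illustration of how such a self-assessment could be tabulated, the dimension names below come from the study but every score is hypothetical:

```python
# Hypothetical self-assessment on the five dimensions (0 = absent, 5 = mastery).
best = {
    "Automated Metrics System": 5,
    "Normalization": 5,
    "Root Cause Analysis": 5,
    "Total Quality (Predictability/Features)": 5,
    "Uses External Benchmarks": 5,
}
ours = {
    "Automated Metrics System": 3,
    "Normalization": 2,
    "Root Cause Analysis": 4,
    "Total Quality (Predictability/Features)": 3,
    "Uses External Benchmarks": 1,
}

# The per-dimension gap suggests where to focus improvement effort first.
gaps = {dim: best[dim] - ours[dim] for dim in best}
for dim, gap in sorted(gaps.items(), key=lambda item: -item[1]):
    print(f"{dim}: gap of {gap}")
```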

Page 5: Software QA Metrics Dashboard Benchmarking

Dashboard – Drawn from Benchmarking

[Dashboard design elements highlighted on the slide: Title & Description, the "So What", Consistent Design, Labeled Axes, Target Curves, Narrative]


Guiding Principles: Each metric should be linked to your overall quality objectives, which are derived from your overall strategy. From the benchmark sample, the goals might be:

• Increasing Net Promoter Score (how highly you are recommended; a calculation sketch follows this list)

• Increasing Release Predictability

• Increasing Customer Satisfaction

• Increasing Reported Quality (Field Quality)

• Reducing time to repair

• Reducing the number of Critical Accounts
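For reference, Net Promoter Score is the percentage of promoters (scores 9-10) minus the percentage of detractors (scores 0-6); a minimal sketch with made-up survey responses:

```python
# Net Promoter Score: percent promoters (9-10) minus percent detractors (0-6) on a 0-10 scale.
def net_promoter_score(ratings: list[int]) -> float:
    promoters = sum(1 for r in ratings if r >= 9)
    detractors = sum(1 for r in ratings if r <= 6)
    return 100.0 * (promoters - detractors) / len(ratings)

print(net_promoter_score([10, 9, 8, 7, 6, 10, 9, 3]))  # hypothetical responses -> 25.0
```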

Each chart has the following graphical properties:

• The charts are composed so that the ‘so what’ is very clear, and it is repeated for each chart so that managers who only see them once a quarter know why the metric is there and, if the data has any significance, what that significance is.

• Targets should be shown on all graphs (see the plotting sketch after this list)

• Where benchmark data exists, it is also shown on the chart

• The dashboard itself should have the following properties:

• There should be between 4 and 8 metrics

• Two related metrics per screen

• Text describing and analyzing the data represented
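A minimal plotting sketch of those conventions (title with the "so what", labeled axes, a target curve, and a benchmark line); the metric values, targets, and benchmark level here are invented for illustration only:

```python
import matplotlib.pyplot as plt

# Hypothetical quarterly data for one dashboard metric.
quarters  = ["Q1", "Q2", "Q3", "Q4"]
actual    = [22, 19, 17, 14]     # e.g., release slip rate (%)
target    = [22, 17, 12, 7]      # straight-line target curve toward the goal
benchmark = 15                   # best-in-class level from external benchmarking

fig, ax = plt.subplots()
ax.plot(quarters, actual, marker="o", label="Actual")
ax.plot(quarters, target, linestyle="--", label="Target")
ax.axhline(benchmark, color="gray", linestyle=":", label="Benchmark")
ax.set_title("Release Slip Rate (%): lower is better")   # title plus the 'so what'
ax.set_xlabel("Quarter")                                  # labeled axes
ax.set_ylabel("Slip rate (%)")
ax.legend()
plt.show()
```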

Page 6: Software QA Metrics Dashboard Benchmarking

Percent of Release Slips


This chart plots the percentage slip of actual versus planned schedule for major and minor releases.

• The target is derived to get to less than 5% slip by 2014, closing the gap in a straight line from the 22% where we are today (a worked sketch of this target curve follows these notes)

• The increase shown in November 2011 is driven by the A.2a release, which had to go through two alphas

• We expect a steeper drop in July 2012 because of our new “Darken the Sky” program to provide requirements stability

• Benchmarking indicates that the best-in-class number is a slip rate of less than 15% (for 9-month release cycles).
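As a worked sketch of how such a straight-line target curve might be generated from the 22% starting point and the sub-5% goal, the number of checkpoints below is an assumption:

```python
# Straight-line target curve: close the gap from 22% today to 5% by the 2014 goal.
start_value, end_value = 22.0, 5.0
periods = 8  # e.g., quarterly checkpoints between now and the end of 2014

step = (start_value - end_value) / periods
targets = [start_value - step * i for i in range(periods + 1)]
print([f"{t:.1f}%" for t in targets])  # evenly spaced targets from 22.0% down to 5.0%
```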

Mean Time to Repair


This chart plots the average time, in weeks, that customers had to wait for resolution, measured in weekly intervals with data captured per release (a calculation sketch follows these notes).

• The target is derived to get to the fastest resolution (and reduce the number outstanding)

• The increase shown in January 2012 is driven by the A.x release

• The new methods for engineering releases should impact this in 2013

[Chart legend: Benchmark, Major Release 2, Major Release 3]
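A minimal sketch of how mean time to repair, in weeks per release, could be computed from case open and close dates; the release names echo the chart legend, but the case data are invented:

```python
from datetime import date

# Hypothetical customer cases: (release, opened, resolved).
cases = [
    ("Major Release 2", date(2012, 1, 3),  date(2012, 1, 24)),
    ("Major Release 2", date(2012, 2, 1),  date(2012, 2, 8)),
    ("Major Release 3", date(2012, 3, 5),  date(2012, 4, 2)),
]

# Mean time to repair, in weeks, captured per release.
mttr_weeks: dict[str, float] = {}
for release in {r for r, _, _ in cases}:
    waits = [(resolved - opened).days / 7 for r, opened, resolved in cases if r == release]
    mttr_weeks[release] = sum(waits) / len(waits)

for release, weeks in sorted(mttr_weeks.items()):
    print(f"{release}: {weeks:.1f} weeks")
```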

Page 7: Software QA Metrics Dashboard Benchmarking

Best Practices

1. Use of third party firms to assess where your software defect performance stacks up against the competition & use of industry standard databases for software quality

2. Test Escapes Analysis Process to perform root cause analysis on all significant escapes to the field

3. SW defects reported on the dashboard include broader measures like predictability and expectations

4. Automated, integrated system for real-time metrics analysis, so that presentation to management is simply pulling up current data and reviewing it formally

5. Normalization for complexity and/or accounts in the field to ensure that proper comparisons are made

6. Create a compound metric that pulls together several important factors for the business (see the sketch after this list)

7. Institute metrics that show statement coverage, branch coverage, and all tests passing for unit and integration testing, and requirements coverage and all tests passing for functional testing

8. Institute metrics that show defect backlog, number of test cases planned, upgrade/update failure rate, Early Return Index, and Fault Slip-Through

9. Bug tool kit that goes to the field with exhaustive and searchable data to help customers avoid reporting defects, learn about workarounds, and search with Google-like strength

10. If external benchmark targets are not known, track improvement release over release

11. Focus on what is important. One participant only tracks release predictability and customer satisfaction

12. Use parametric estimation metrics (for example, 4 days per test case) to ensure high-quality, data-driven schedule estimates; this also helps demonstrate improvements over time
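As a hedged sketch of what such a compound metric (item 6) could look like, the components, weights, and scaling below are illustrative assumptions, not a formula from any participant:

```python
# Hypothetical compound quality index combining several business-relevant factors.
def compound_quality_index(availability_pct: float,
                           mean_time_to_fix_days: float,
                           field_defects_per_unit: float,
                           weights=(0.4, 0.3, 0.3)) -> float:
    """Blend availability, responsiveness, and field quality into one 0-100 score."""
    avail_score = availability_pct                                  # already on a 0-100 scale
    fix_score = max(0.0, 100.0 - 10.0 * mean_time_to_fix_days)      # faster fixes score higher
    field_score = max(0.0, 100.0 - 1000.0 * field_defects_per_unit) # fewer field defects score higher
    w_avail, w_fix, w_field = weights
    return w_avail * avail_score + w_fix * fix_score + w_field * field_score

print(round(compound_quality_index(99.5, 3.0, 0.02), 1))  # hypothetical inputs -> 84.8
```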

In benchmarking studies like this, we often see some exemplary practices that demonstrate creative and effective ways to stay ahead.


[Slide callouts: "Top 5", "Metrics to Consider", "Other Tips"]

Page 8: Software QA Metrics Dashboard Benchmarking

Summary Statistics

Key Highlights:

• 8 report customer-found defects to management (the remaining 2 report customer satisfaction at a high level)

• 6 report on the order of 4 metrics to management; the remaining 4 report more or fewer

• 5 include time to market as a metric in their quality dashboard

• 4 report escapes or customer-found defects caused by bad fixes

• 4 companies have real-time visibility of metrics, which are automatically updated on a daily basis

• 3 companies reported on compound metrics that combine reliability, availability, and time to fix

• 3 do not use targets for metrics reported to management, but only report the improvement release to release

• 3 normalize metrics (by LOC internally, or by units in the field externally)


Page 9: Software QA Metrics Dashboard Benchmarking

Implications

• Root cause analysis should be performed on defects from the field that are either critical or from regressions – Many companies have special processes for doing this effectively

• It appears that some participants have higher levels of automation and coverage across unit, integration, and functional test – and it is measured

• Planning metrics, such as the number of days per test case, should be used for prediction and improvement

• If you are growing, some normalization should be used – it can be coarse (like judged Lines of Code, converted from Function Points)

• Walker Survey, Quest Database, and Manager-Tools.com are three recommended vendors for metrics and management

– Walker Survey can determine how you stack up against your competitors regarding quality and satisfaction

– Quest is a TL 9000 database

– Manager-Tools is helpful for developing QA managers

• Where absolute targets don’t exist, a target curve based on prior improvement should be used to answer ‘are we getting better?’ (see the sketch below)
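A minimal sketch of deriving such a target from prior release-over-release improvement; the defect counts and the use of an average improvement ratio are assumptions:

```python
# If no external target exists, extrapolate the recent rate of improvement as the target.
past_releases = [120, 100, 85]   # hypothetical field-defect counts for the last three releases

# Average release-over-release improvement ratio observed so far.
ratios = [later / earlier for earlier, later in zip(past_releases, past_releases[1:])]
avg_ratio = sum(ratios) / len(ratios)

target_next = past_releases[-1] * avg_ratio
print(f"Target for next release: no more than {target_next:.0f} field defects")
```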
