
Page 1

© 2013 Carnegie Mellon University

Empirical Study of Software Engineering Results

Software Engineering Institute, Carnegie Mellon University, Pittsburgh, PA 15213
TSP Initiative, SEAP/SDD
Sept. 18, 2013

Page 2

If you are a data lover …

… then this interactive is for you.

Page 3

If you are a data-lover, don’t be shy …

… because you are in great company.

Page 4

W. Edwards Deming

Page 5

“When it comes to the really important decisions, data trumps intuition every time.”

Jeff Bezos

Page 6

12 petabytes for data storage … that’s 12 quadrillion bytes, or 12 × 10¹⁵ bytes.

Page 7

TSP Symposium Attendee

Page 8

Topics
• Introduction
• Format of the “Interactive”
• Data Provenance
• The Data
• Next Steps

Page 9

Please ask questions!

Page 10

And … we will be asking you questions!

Page 11

Page 12

Looking for FRESH ideas …

Page 13

All comments are welcome!

Page 14

Page 15

What is the single most important question that you would want to be addressed through the analysis of TSP data?

Before

Page 16

Topics
• Introduction
• Format of the “Interactive”
• Data Provenance
• The Data
• Next Steps

Page 17

What Is Data Provenance?

Provenance, from the French provenir, "to come from," refers to the chronology of the ownership, custody or location of a historical object.

Data provenance refers to a record trail that accounts for the origin of a piece of data (in a document, repository, or database) together with an explanation of how it got to the present place.

Data provenance assists scientists with tracking their data through all transformations, analyses, and interpretations.
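As a concrete illustration of such a record trail, each step can be modeled as a small structured record. This is a minimal Python sketch with hypothetical fields, not a standard provenance format:

    from dataclasses import dataclass

    @dataclass
    class ProvenanceStep:
        source: str       # where the data item came from
        action: str       # what was done to it
        actor: str        # who performed the step
        timestamp: str    # when it happened

    # Ordered trail for one (made-up) data file:
    trail = [
        ProvenanceStep("team workbook (XLS)", "imported into repository", "SEI", "2013-02-11"),
        ProvenanceStep("repository", "converted to analysis dataset", "SEI", "2013-05-02"),
    ]
    for step in trail:
        print(step)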

Page 18

Master Jim at work … prying the data from the tool.

Page 19

The TSP Data Repository includes two archives of approximately 50,000 files.

• 20-25% of those are supporting documents for a launch or postmortem (e.g., presentations, analysis spreadsheets, lists, surveys).

• 75-80% of those are TSP team performance data files.

There are more than 50 different file formats.

• At this time, only 22 of the needed file import utilities have been developed.

• Only those that contain TSP cycle or postmortem data were processed.

• About 60% of Archive #1 fit the criteria and have been processed. Archive #2 has not been processed yet.

Tests were conducted to ensure that extracted data represented unique projects.

Page 20

Analysis Based on Teams’ Composite Data

The data is composite team data; each record represents data that has been aggregated from individual team member data.

[Diagram: individual team member data → team composite data → to analysis]
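As a rough sketch of that roll-up, here is how individual member records might be aggregated into one composite record per team; the column names are hypothetical, not the repository’s actual schema:

    import pandas as pd

    # Hypothetical individual team-member records.
    members = pd.DataFrame({
        "team_id":            ["A", "A", "A", "B", "B"],
        "task_hours":         [310.0, 275.5, 298.0, 420.0, 385.5],
        "loc_added_modified": [4200, 3100, 3900, 6100, 5400],
    })

    # One composite record per team, as described above.
    team_composite = members.groupby("team_id").agg(
        team_size=("task_hours", "size"),
        total_task_hours=("task_hours", "sum"),
        total_loc=("loc_added_modified", "sum"),
    )
    print(team_composite)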

Page 21

Statistical Analysis

[Diagram: individual team member data → team composite data → to statistical analysis]

Page 22

Outlier Analysis

Exploratory analysis identified some outliers, which were removed from some of the analyses.
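The slides do not say which outlier rule was applied, so the sketch below illustrates one common convention, Tukey’s 1.5 × IQR fences; treat it as an example, not the study’s actual method:

    import numpy as np

    def iqr_outlier_mask(values, k=1.5):
        # Flag values lying more than k * IQR beyond the quartiles.
        q1, q3 = np.percentile(values, [25, 75])
        iqr = q3 - q1
        return (values < q1 - k * iqr) | (values > q3 + k * iqr)

    durations = np.array([91.0, 88.0, 120.0, 95.0, 102.0, 640.0])  # 640 is extreme
    mask = iqr_outlier_mask(durations)
    cleaned = durations[~mask]  # analysis proceeds on the remaining values
    print(cleaned)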

Page 23

Topics
• Introduction
• Format of the “Interactive”
• Data Provenance
• The Data
  – Team and Project Characteristics
  – Product Size
  – Schedule Performance
  – Quality Indicators
• Next Steps

Page 24

Country Source

[Bar chart: projects by country. United States: 93, Mexico: 20]

Page 25

Project Start Date

[Histogram: Project Start Date, Jul-2000 through Jul-2012; n = 113]

Page 26

Project Duration (Calendar Days)

[Histogram: Project Duration (Days), 0–600; Mean = 119.4, Median = 91.0, n = 113]

Page 27

Project Duration (Weeks)

[Histogram: Duration (Weeks), 0–80; Mean = 16.9, Median = 13.0, n = 113]

Page 28

Team Size

[Histogram: Team Size, 3–18; Mean = 8.2, Median = 8.0, n = 111]

Page 29

What other types of “team and project characteristics” analyses would you find valuable?

Page 30

Topics
• Introduction
• Format of the “Interactive”
• Data Provenance
• The Data
  – Team and Project Characteristics
  – Product Size
  – Schedule Performance
  – Quality Indicators
• Next Steps

Page 31

Actual Total Size

[Histogram: Total Size (KLOC), 0–240; Mean = 24.0, Median = 3.8, n = 111]

Page 32

Actual Added and Modified Size

[Histogram: Actual Added & Modified Code (KLOC), 0–60; Mean = 11.7, Median = 6.6, n = 112]

Page 33

Another View of the Size Data

From the data, calculate the

1. log of each value
2. mean (x̄) of the values from #1
3. standard deviation (s) of the values from #1
4. exponent of the values from #2 and #3.

Values of the relative size table (each bin is one standard deviation wide, on the log scale):

Very Small: exp(x̄ − 2s)
Small: exp(x̄ − s)
Medium: exp(x̄)
Large: exp(x̄ + s)
Very Large: exp(x̄ + 2s)
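The four steps translate directly into code. A minimal sketch with made-up size values (the bins computed from the actual repository data appear on the next page):

    import numpy as np

    # Added & modified LOC per project (illustrative values only).
    sizes = np.array([350, 900, 2100, 5600, 7400, 15800, 40200, 95000])

    logs = np.log(sizes)       # 1. log of each value
    x_bar = logs.mean()        # 2. mean of the log values
    s = logs.std(ddof=1)       # 3. standard deviation of the log values

    # 4. exponent of x̄, x̄ ± s, and x̄ ± 2s gives the relative size table.
    labels = ["Very Small", "Small", "Medium", "Large", "Very Large"]
    for label, k in zip(labels, [-2, -1, 0, 1, 2]):
        print(f"{label:>10}: {np.exp(x_bar + k * s):,.0f} LOC")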

Page 34

Added & Modified Code – Relative Sizes

[Histogram: frequency of projects by relative size bin. Bin values (added and modified LOC): VS = 287, S = 1,265, M = 5,571, L = 24,532, VL = 108,022]

Page 35

What would you like to see in terms of analyses associated with product size?

Page 36

Topics
• Introduction
• Format of the “Interactive”
• Data Provenance
• The Data
  – Team and Project Characteristics
  – Product Size
  – Schedule Performance
  – Quality Indicators
• Next Steps

Page 37

What do you think is the average number of weekly task hours that teams are able to accomplish?

Page 38

Mean Team Member Weekly Task Hours

[Histogram: Mean Team Member Weekly Task Hours, 2.0–23.6; Mean = 10.3, Median = 9.0, n = 111]

Page 39

Productivity

[Histogram: Productivity (LOC/Hr), 0–90; Mean = 10.3, Median = 7.1, n = 112]

Page 40

Let’s Look At Some Scatter Plots

Using linear regression: y = mx + b, where m is the slope and b is the y-intercept.

r is a measure of the correlation between the x values and the y values. Values of r² ≥ 0.5 indicate a meaningful relationship.
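As a quick illustration of the fit and the r² threshold just described, a minimal NumPy sketch (the data points are invented):

    import numpy as np

    def fit_line(x, y):
        # Least-squares fit of y = m*x + b, plus the squared correlation.
        m, b = np.polyfit(x, y, 1)
        r = np.corrcoef(x, y)[0, 1]
        return m, b, r ** 2

    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
    m, b, r2 = fit_line(x, y)
    print(f"y = {m:.2f}x + {b:.2f}, r² = {r2:.3f}")  # r² ≥ 0.5 → meaningful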

Page 41

There Are A Few Rules …

Page 42

Plan Task Hours Vs. Actual Task Hours

[Scatter plot: Plan Task Hours vs. Actual Task Hours; R² = 0.8038, n = 113]

Page 43

Duration: Planned Weeks Vs. Actual Weeks

[Scatter plot: Duration, Planned Weeks vs. Actual Weeks; R² = 0.7401, n = 113]

Page 44

Actual Task Hours Vs. Added & Modified LOC

[Scatter plot: Actual Task Hours vs. Actual Added & Modified LOC; R² = 0.3012, n = 113]

Page 45

Team Size vs. Productivity

[Scatter plot: Team Size vs. Productivity (LOC/Hr); R² = 0.0297, n = 111]

Page 46

Actual Added & Modified LOC Per Staff Week

[Histogram: Actual A&M LOC Per Staff Week, 0–2000; Mean = 204.9, Median = 100.9, n = 109]

Page 47

Plan Vs. Actual Hours for Completed Parts

[Scatter plot: Plan vs. Actual Hours for Completed Parts; R² = 0.952, n = 113]

Page 48

Schedule Growth Beyond Baseline

[Histogram: Schedule Growth from Baseline (hours), −80 to 120; Mean = 2.8, Median = 0.0, n = 113]

Page 49

Final Earned Value

[Histogram: Final Earned Value, 15–90; Mean = 85.2, Median = 94.3, n = 110]

Page 50

Are there other types of “schedule performance” analyses that you would like to see?

Page 51

Topics
• Introduction
• Format of the “Interactive”
• Data Provenance
• The Data
  – Team and Project Characteristics
  – Product Size
  – Schedule Performance
  – Quality Indicators
• Next Steps

Page 52

Total Defects Injected Per KLOC

[Histogram: Defects Injected per KLOC, 0–480; Mean = 54.7, Median = 31.5, n = 79]
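All of the defect-density charts in this section use the same normalization, defects divided by size in KLOC; a one-function sketch (with numbers picked so the result matches the median above):

    def defects_per_kloc(defects_found, loc):
        # Normalize a raw defect count by size in thousands of lines of code.
        return defects_found / (loc / 1000.0)

    print(defects_per_kloc(63, 2000))  # 31.5 defects/KLOC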

Page 53

Defect Density – DLD Review

[Histogram: Defects Per KLOC in DLD Review, 0–28; Mean = 5.2, Median = 2.2, n = 106]

Page 54

Defect Density – Code Review

[Histogram: Defects Per KLOC in Code Review, 0.0–33.6; Mean = 8.4, Median = 5.2, n = 106]

Page 55

Defect Density – Code Inspection

[Histogram: Defects Per KLOC in Inspection, 0–120; Mean = 6.7, Median = 3.3, n = 106]

Page 56

Defect Density – Unit Test

[Histogram: Defects Per KLOC in Unit Test, 0–84; Mean = 5.0, Median = 3.8, n = 106; slide callout: < 5]

Page 57

Defect Density – Build and Integration Test

[Histogram: Defects Per KLOC in Build and Integration Test, 0.0–16.8; Mean = 1.7, Median = 0.7, n = 106; slide callout: < 0.5]

Page 58

Defect Density – System Test

[Histogram: Defects Per KLOC in System Test, 0–15; Mean = 1.5, Median = 0.15, n = 106; slide callout: < 0.2]

Page 59

Defect Density – Summary

[Panel of six histograms: Defects Per KLOC in DLD Review, Code Review, Inspection, Unit Test, Build and Integration Test, and System Test]

Page 60

Defect Density – Median of Defects Per KLOC

DLD Review: 2.2
Code Review: 5.2
Code Inspection: 3.3
Unit Test: 3.8 (< 5)
Build/Integration Test: 0.7 (< 0.5)
System Test: 0.15 (< 0.2)

(Parenthesized values are callouts from the slide.)

Page 61

Actual UT Defect Density Vs. ST Defect Density

[Scatter plot: Unit Test Defect Density vs. System Test Defect Density; R² = 0.0031, n = 111]

Page 62

The Yield Quality Measure

[Diagram: a development phase injects defects → intermediate product with defects → defect removal phase (phase yield)]

The yield of a phase is the percentage of defects removed in that phase.
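A small sketch of the yield computation. The slide gives only the one-line definition; the denominator below follows the usual TSP accounting (defects escaping into the phase plus defects injected during it), which is an assumption here:

    def phase_yield(removed, entering, injected):
        # Percentage of the defects present in a phase that the phase removes.
        available = entering + injected
        return 100.0 * removed / available if available else 0.0

    # Example: 12 defects escape design, 8 more are injected while coding,
    # and code review removes 6 of those 20, a 30% yield.
    print(phase_yield(removed=6, entering=12, injected=8))  # 30.0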

Page 63

Yield: Detailed-Level Design Review

[Histogram: Yield in Detailed-Level Design Review, 0.0–1.0; Mean = 31.6%, Median = 29.3%, n = 98]

Page 64

Yield: Detailed-Level Design Inspection

[Histogram: Yield in Detailed-Level Design Inspection, 0.0–1.0; Mean = 49.3%, Median = 50.0%, n = 83]

Page 65

Yield: Code Review

[Histogram: Yield in Code Review, 0.00–0.90; Mean = 29.2%, Median = 30.1%, n = 109]

Page 66

Yield: Code Inspection

[Histogram: Yield in Code Inspection, 0.00–0.90; Mean = 30.3%, Median = 26.1%, n = 110]

Page 67

Yield: Unit Test

[Histogram: Yield in Unit Test, 0.0–1.0; Mean = 49.7%, Median = 46.2%, n = 106]

Page 68

Summary: Median Phase Yields (%)

DLD Review: 29.3
DLD Inspection: 50.0
Code Review: 30.1
Code Inspection: 26.1
Unit Test: 46.2

Page 69

Process Yield

[Histogram: Process Yield, 0.4–1.0; Mean = 73.6%, Median = 73.7%, n = 77]

Page 70

Review of Some Definitions …

In the TSP:

Appraisal COQ = (Review & Inspection Time / Total Development Time) × 100

Failure COQ = (Test Time / Total Development Time) × 100
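A minimal sketch of the two definitions in code; the hour values are illustrative, chosen to land near the medians reported on the next two slides:

    def appraisal_coq(review_and_inspection_hours, total_development_hours):
        return 100.0 * review_and_inspection_hours / total_development_hours

    def failure_coq(test_hours, total_development_hours):
        return 100.0 * test_hours / total_development_hours

    total = 1000.0
    print(appraisal_coq(270.0, total))  # 27.0 percent
    print(failure_coq(169.0, total))    # 16.9 percent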

Page 71

Appraisal Cost of Quality

[Histogram: Appraisal Cost of Quality (Percent), 0–50; Mean = 26.6%, Median = 27.0%, n = 86]

Page 72

Failure Cost of Quality

[Histogram: Failure Cost of Quality (Percent), 0–75; Mean = 22.0%, Median = 16.9%, n = 86]

Page 73

Appraisal: COQ vs. Defects Removed

[Scatter plot: Appraisal Cost of Quality (Percent) vs. Defects Per KLOC Removed During Appraisal Phases; R² = 0.0347, n = 81]

Page 74

Time in Code Review Vs. Defects Removed

[Scatter plot: Actual Time in Code Review (Hours) vs. Defects Removed in Code Review; R² = 0.495, n = 112]

Page 75

Time in Inspection Vs. Defects Removed

[Scatter plot: Actual Time in Inspection (Hours) vs. Defects Removed During Inspection; R² = 0.6272, n = 112]

Page 76

Time in Unit Test Vs. Defects Removed

[Scatter plot: Actual Time in Unit Test (Hours) vs. Defects Found; R² = 0.3505, n = 112]

Page 77

Time in System Test Vs. Defects Removed

[Scatter plot: Actual Time in System Test (Hours) vs. Defects Found; R² = 0.1996, n = 29]

Page 78

Summary: Time in Phase Vs. Defects Removed

Appraisal phases:
• Code Review: 0.50
• Inspection: 0.63

Test phases:
• Unit Test: 0.35
• System Test: 0.20

(R² values. Slide callout: “Predictive”; the appraisal-phase values are the ones meeting the r² ≥ 0.5 threshold from Page 40.)

Page 79

Code Review to Code – Actual Time in Phase

[Histogram: ratio of actual code review time to code time, 0.00–0.90; Mean = 0.3, Median = 0.3, n = 113; slide callout: > 0.5]

Page 80

Design to Code – Actual Time in Phase

[Histogram: ratio of actual design time to code time, 0.00–4.50; Mean = 0.9, Median = 0.8, n = 113; slide callout: > 1.0]

Page 81

Design Review to Design – Actual Time in Phase

[Histogram: ratio of actual design review time to design time, 0.0–1.2; Mean = 0.28, Median = 0.23, n = 113; slide callout: > 0.5]

Page 82

Req. Inspection to Req. – Actual Time in Phase

[Histogram: ratio of actual requirements inspection time to requirements time, 0.00–3.75; Mean = 0.88, Median = 0.46, n = 19; slide callout: > 0.25]

Page 83

For “quality indicators,” what additional analyses would you like to see?

Page 84

What is the single most important question that you would want addressed through the analysis of TSP data?

After

Page 85

Topics
• Introduction
• Format of the “Interactive”
• Data Provenance
• The Data
  – Team and Project Characteristics
  – Product Size
  – Schedule Performance
  – Quality Indicators
• Next Steps

Page 86

Next Steps

• Review your feedback from today and adjust the analysis approach accordingly.

• Extract the data from files submitted by the Process Dashboard tool.

• Extract and analyze individual team member data.

• Continue with the data analysis. Publish the results.

Page 87

Contact Information

Mark Kasunic
Senior Member of Technical Staff, TSP Initiative
Telephone: +1 412-268-5863
Email: [email protected]

U.S. Mail:
Software Engineering Institute
Customer Relations
4500 Fifth Avenue
Pittsburgh, PA 15213-2612
USA

Web:
www.sei.cmu.edu
www.sei.cmu.edu/contact.cfm

Customer Relations Email: [email protected]
Customer Relations Telephone: +1 412-268-5800
SEI Phone: +1 412-268-5800
SEI Fax: +1 412-268-6257

Page 88

Copyright 2012 Carnegie Mellon University.

This material is based upon work supported by the Department of Defense under Contract No. FA8721-05-C-0003 with Carnegie Mellon University for the operation of the Software Engineering Institute, a federally funded research and development center.

Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the United States Department of Defense.

NO WARRANTY

THIS CARNEGIE MELLON UNIVERSITY AND SOFTWARE ENGINEERING INSTITUTE MATERIAL IS FURNISHED ON AN “AS-IS” BASIS. CARNEGIE MELLON UNIVERSITY MAKES NO WARRANTIES OF ANY KIND, EITHER EXPRESSED OR IMPLIED, AS TO ANY MATTER INCLUDING, BUT NOT LIMITED TO, WARRANTY OF FITNESS FOR PURPOSE OR MERCHANTABILITY, EXCLUSIVITY, OR RESULTS OBTAINED FROM USE OF THE MATERIAL. CARNEGIE MELLON UNIVERSITY DOES NOT MAKE ANY WARRANTY OF ANY KIND WITH RESPECT TO FREEDOM FROM PATENT, TRADEMARK, OR COPYRIGHT INFRINGEMENT.

This material has been approved for public release and unlimited distribution except as restricted below.

Internal use:* Permission to reproduce this material and to prepare derivative works from this material for internal use is granted, provided the copyright and “No Warranty” statements are included with all reproductions and derivative works.

External use:* This material may be reproduced in its entirety, without modification, and freely distributed in written or electronic form without requesting formal permission. Permission is required for any other external and/or commercial use. Requests for permission should be directed to the Software Engineering Institute at [email protected].

*These restrictions do not apply to U.S. government entities.