promise 2011: "empirical validation of human factors on predicting issue resolution time in...

Post on 06-May-2015


DESCRIPTION

Promise 2011: "Empirical validation of human factors on predicting issue resolution time in open source projects", Anh Nguyen Duc, Daniela Cruzes, Claudia Ayala and Reidar Conradi.

TRANSCRIPT

Empirical validation of human factors in predicting issue lead time in open source projects

Nguyen Duc Anh, Daniela S. Cruzes, Claudia Ayala and Reidar Conradi


Outline

• Introduction

• Research questions

• Research methodology

• Results

• Conclusions

• Future work

Introduction

• Software maintenance and evolution

  • Fixing bugs, implementing new feature requests, and enhancing current system features

  • The Mozilla bug tracking system receives 170 issue reports/day; Eclipse projects receive 120 reports/day (Kim & Whitehead 2006)

• Issue lead time prediction is challenging due to the:

  • Dynamics of software evolution, and

  • Lack of clear understanding of the factors influencing issue lead time.


Previous Studies on Issue Lead Time Prediction


• Main focus is on characteristics of the issue only.

  • Ex: priority, effort, number of comments.

• Little focus on the human factors aspect:

  • Developer’s experience, ability, reputation

  • Developer’s collaboration

• Developers’ capability and collaboration in developing a software module can affect how likely they are to introduce bugs in the module.

  • Are they useful for classifying/predicting issue lead time as well?

Previous Studies on Bug Lead Time Prediction


Studies compared: Giger et al. 2010; Bougie et al. 2007; Bhattacharya et al. 2011; Anbalagan et al. 2009; Hooimeijer et al. 2007

Factor | Number of the five studies using it
No. of comments | 3
Reporter | 2
Assignee | 2
Severity | 4
Priority | 2
Operating system type | 1
Open time | 2
Platform | 1
No. of attachments | 2
No. of dependencies | 1
No. of developers | 2
Daily load | 1
Submitter reputation | 1
Bug category | 1

Research questions

• RQ1. Do human factor metrics improve classification of issue lead time?

• RQ2. Which characteristics of issues increase the predictive power of a linear regression model for predicting issue lead time?

• RQ3. What is the accuracy of classification/ prediction models achieved?


Projects

Info \ Projects | Qt | Qpid | Geronimo
Main organization involved | Qt (Nokia) | Red Hat, JP Morgan | IBM
Collection time frame | 85 months | 51 months | 87 months
Number of stakeholders | 133 | 39 | 60
Number of issues | 16818 | 3016 | 5697
Number of selected issues | 9921 | 2278 | 4787


Dependent variable

• Issue lead time:

  • Duration between creation time and resolution time

  • Valid issues with stakeholder assignment

  • RESOLVED issues
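As a minimal sketch of the definition above, the following Python snippet computes issue lead time for a few made-up issue records. The field names (`created`, `resolved`, `status`, `assignee`), the date format, and the reading of "valid issue with stakeholder assignment" as "has an assignee" are assumptions for illustration, not the schema used in the study.

```python
from datetime import datetime

# Hypothetical issue records; field names and values are illustrative only.
issues = [
    {"id": "A-1", "created": "2010-01-04", "resolved": "2010-01-14",
     "status": "RESOLVED", "assignee": "dev1"},
    {"id": "A-2", "created": "2010-02-01", "resolved": None,
     "status": "OPEN", "assignee": "dev2"},
    {"id": "A-3", "created": "2010-03-10", "resolved": "2010-03-12",
     "status": "RESOLVED", "assignee": None},
]


def lead_time_days(issue):
    """Days between the creation and the resolution of one issue."""
    created = datetime.strptime(issue["created"], "%Y-%m-%d")
    resolved = datetime.strptime(issue["resolved"], "%Y-%m-%d")
    return (resolved - created).days


def select_issues(all_issues):
    """Keep RESOLVED issues that have an assignee (assumed selection rule)."""
    return [i for i in all_issues
            if i["status"] == "RESOLVED" and i["assignee"] and i["resolved"]]


selected = select_issues(issues)
lead_times = {i["id"]: lead_time_days(i) for i in selected}
print(lead_times)
```

Only the first record survives the selection here: the second is still open and the third has no assignee.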


Independent variables

[Figure: timeline for issue i with time points t0, t1, t2 (past, present, near future); the quantity to predict is ∆t up to t_resolved]

Dimension | Metrics
Past performance of reporter/assignee | No. of past reported and resolved issues, past issue resolution time
Nature of an issue | Description length, issue type, version, creation time, ...
Collaboration in resolving issue | No. of comments, no. of stakeholders


Independent variables

• Stakeholder past performance

  • Reporter experience (ExpR)

$$\mathrm{exp}_r(rep, t_1) = \mathrm{count}(iss_j) : t_{created\_iss_j} < t_1$$

  • Assignee experience (ExpA)

$$\mathrm{exp}_a(dev, t_1) = \mathrm{count}(iss_j) : t_{resolved\_iss_j} < t_1$$

  • Assignee average past issue lead time (Apit)

$$\mathrm{apit}(dev, t_1) = \frac{1}{k} \sum_{i=1}^{k} \left( t_{resolved\_i} - t_{created\_i} \right) : t_{resolved\_i} < t_1$$

where k is the number of issues resolved by dev before t_1.
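The three past-performance metrics can be sketched in a few lines of Python. This is an illustrative reading of the definitions, assuming integer day stamps, a strict "before t1" comparison, and a value of 0.0 for an assignee with no resolved issues; the sample records and names are hypothetical.

```python
# Hypothetical past issues: (reporter, assignee, created_day, resolved_day).
# Days are integers on an arbitrary timeline; resolved_day is None if unresolved.
past_issues = [
    ("alice", "bob", 0, 4),
    ("alice", "bob", 2, 10),
    ("carol", "bob", 5, None),
    ("alice", "dave", 6, 8),
]


def exp_r(reporter, t1):
    """Reporter experience: issues this reporter created before t1."""
    return sum(1 for rep, _, created, _ in past_issues
               if rep == reporter and created < t1)


def exp_a(assignee, t1):
    """Assignee experience: issues this assignee resolved before t1."""
    return sum(1 for _, dev, _, resolved in past_issues
               if dev == assignee and resolved is not None and resolved < t1)


def apit(assignee, t1):
    """Average lead time of the issues this assignee resolved before t1."""
    times = [resolved - created
             for _, dev, created, resolved in past_issues
             if dev == assignee and resolved is not None and resolved < t1]
    # Returning 0.0 when there is no history is an assumption of this sketch.
    return sum(times) / len(times) if times else 0.0


print(exp_r("alice", 12), exp_a("bob", 12), apit("bob", 12))
```

All three metrics depend only on events before t1, so they are available at the moment a new issue is created.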


Independent variables

• Post submission collaboration

  • The number of comments (NoC)

$$\mathrm{noc}(iss_i) = \mathrm{count}(c) : t_{comment\_c} \in [t_1, t_2]$$

  • The number of involved stakeholders (NoS)

$$\mathrm{nos}(iss_i) = \mathrm{count}(s_j) : \exists\, c(s_j) : t_{comment\_c} \in [t_1, t_2]$$
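The two collaboration metrics count events inside the observation window [t1, t2]. The sketch below assumes integer day stamps, an inclusive window, and made-up comment data; it illustrates the definitions rather than the extraction procedure used in the study.

```python
# Hypothetical comments on a single issue: (author, day).
comments = [("alice", 1), ("bob", 2), ("alice", 3), ("carol", 9)]


def noc(issue_comments, t1, t2):
    """Number of comments posted within [t1, t2]."""
    return sum(1 for _, day in issue_comments if t1 <= day <= t2)


def nos(issue_comments, t1, t2):
    """Number of distinct stakeholders who commented within [t1, t2]."""
    return len({author for author, day in issue_comments if t1 <= day <= t2})


print(noc(comments, 0, 7), nos(comments, 0, 7))
```

With a window of days 0 to 7, the comment on day 9 is ignored, so the issue has three comments from two stakeholders.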


Research methodology


Classification results

Accuracy of binary classification models

# | Model | Qt | Qpid | Geronimo
1 | Issue features | 84.59% | 58.52% | 59.56%
2 | Issue features + ExpR | 85.53% (+0.94%) | 60.18% (+1.66%) | 61.77% (+2.21%)
3 | Issue features + ExpA | 85.78% (+1.19%) | 60.72% (+2.2%) | 62.00% (+2.44%)
4 | Issue features + Apit | 87.46% (+2.87%) | 70.59% (+12.07%) | 62.90% (+3.34%)
5 | Issue + NoC | 86.56% (+1.97%) | 59.83% (+1.31%) | 72.72% (+13.16%)
6 | Issue + NoS | 86.77% (+2.18%) | 62.20% (+3.68%) | 66.13% (+6.57%)
9 | All | 90.58% (+5.99%) | 72.78% (+14.26%) | 73.22% (+13.66%)
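The bracketed values in the table are gains over the issue-features-only model, in percentage points. The short Python sketch below recomputes them from the reported accuracies and picks the single human-factor metric that helps most in each project; the dictionary layout and function names are choices made for this illustration.

```python
# Accuracies (%) copied from the classification results table.
BASELINE = {"Qt": 84.59, "Qpid": 58.52, "Geronimo": 59.56}
WITH_METRIC = {
    "ExpR": {"Qt": 85.53, "Qpid": 60.18, "Geronimo": 61.77},
    "ExpA": {"Qt": 85.78, "Qpid": 60.72, "Geronimo": 62.00},
    "Apit": {"Qt": 87.46, "Qpid": 70.59, "Geronimo": 62.90},
    "NoC": {"Qt": 86.56, "Qpid": 59.83, "Geronimo": 72.72},
    "NoS": {"Qt": 86.77, "Qpid": 62.20, "Geronimo": 66.13},
    "All": {"Qt": 90.58, "Qpid": 72.78, "Geronimo": 73.22},
}


def gain(metric, project):
    """Accuracy gain over the issue-features-only model, in percentage points."""
    return round(WITH_METRIC[metric][project] - BASELINE[project], 2)


def best_single_metric(project):
    """Single human-factor metric with the highest accuracy for a project."""
    singles = [m for m in WITH_METRIC if m != "All"]
    return max(singles, key=lambda m: WITH_METRIC[m][project])


for project in BASELINE:
    metric = best_single_metric(project)
    print(project, metric, gain(metric, project), gain("All", project))
```

On these numbers Apit is the strongest single addition for Qt and Qpid, and NoC for Geronimo.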


Conclusions:

1. Number of comments and average past issue lead time are effective complementary variables in classifying issue lead time.

Univariate and Multivariate analysis

Variables | Qt | Qpid | Geronimo
Description length | –0.123** | 0.065** | 0.118**
Priority | –0.157** | 0.021 | –0.021
ExpR | 0.372** | 0.222** | –0.113**
ExpA | –0.186** | –0.021 | –0.168**
NoC | 0.008* | 0.243** | 0.416**
NoS | 0.123** | 0.309** | 0.303**
Apit | 0.799** | 0.284** | 0.222**

Spearman correlation with issue resolution time
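For readers unfamiliar with the measure, Spearman correlation is the Pearson correlation of the ranks of two variables. The following standard-library sketch shows the computation on made-up values; it is not the tooling used in the study, and it assumes neither input is constant (which would make the denominator zero).

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1  # mean of the 1-based positions pos+1..end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    spread_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (spread_x * spread_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical Apit values and lead times for four issues.
apit_values = [2, 5, 9, 20]
lead_times = [3, 4, 30, 12]
rho = spearman(apit_values, lead_times)
print(round(rho, 3))
```

Because only ranks matter, the measure is insensitive to the heavy skew that lead-time data typically shows.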


Linear regression models

Variables | Qpid | Geronimo | Qt
Intercept | –17.859 | –6.478** | –47.130**
Description length | 0.004 | 0.003 | –0.001
Priority | –7.549 | –10.740** | –53.090**
ExpR | 0.110** | 0.045* | –0.892**
ExpA | –0.051* | –0.010 | –1.432**
NoC | 1.617 | 2.710** | 1.607
NoS | 43.038** | 11.38** | 20.500**
Apit | 0.386** | 0.588** | 0.837**
Model R2 | 0.2922 | 0.3226 | 0.5954
Adjusted R2 | 0.2809 | 0.3196 | 0.595
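A fitted linear model predicts lead time as the intercept plus the sum of coefficient times feature value. The sketch below applies the Qt coefficients from the regression results to one hypothetical issue. The feature values, the numeric encoding of priority, and the unit of the prediction are assumptions, since the slides do not state them.

```python
# Qt coefficients as listed in the linear regression results.
QT_MODEL = {
    "intercept": -47.130,
    "description_length": -0.001,
    "priority": -53.090,
    "exp_r": -0.892,
    "exp_a": -1.432,
    "noc": 1.607,
    "nos": 20.500,
    "apit": 0.837,
}


def predict(model, features):
    """Linear prediction: intercept plus the weighted sum of feature values."""
    return model["intercept"] + sum(
        model[name] * value for name, value in features.items()
    )


# Hypothetical issue; every value here is made up for illustration.
issue = {
    "description_length": 200,
    "priority": 0,
    "exp_r": 5,
    "exp_a": 10,
    "noc": 4,
    "nos": 3,
    "apit": 60,
}

slower_assignee = dict(issue, apit=issue["apit"] + 10)
print(predict(QT_MODEL, issue))
print(predict(QT_MODEL, slower_assignee) - predict(QT_MODEL, issue))
```

The second printed value shows how a coefficient is read: with everything else fixed, raising Apit by 10 raises the predicted lead time by about 8.37 in the model's (unstated) unit.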

Conclusions

• RQ1. Do human factor metrics improve classification of issue lead time?

  • Yes. Accuracy improvement up to 12%

• RQ2. Which human factor metrics contribute significantly to issue lead time prediction in the linear regression models?

Metric | Qpid Multi | Qpid Uni | Qt Multi | Qt Uni | Geronimo Multi | Geronimo Uni
Reporter exp. | ++ | ++ | -- | ++ | + | --
Assignee exp. | - | | -- | -- | | --
Number of comments | | ++ | | + | ++ | ++
Number of stakeholders | ++ | ++ | ++ | ++ | ++ | ++
Average past resolution time | ++ | ++ | ++ | ++ | ++ | ++


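Conclusions

• RQ3. What accuracy can the classification/prediction models achieve?

Study | Dependent var. | Dataset | R2
Bhattacharya et al. 2011 | Bug fixing time | Firefox | 0.401
Bhattacharya et al. 2011 | Bug fixing time | Thunderbird | 0.498
Bhattacharya et al. 2011 | Bug fixing time | Seamonkey | 0.366
Bhattacharya et al. 2011 | Bug fixing time | Eclipse | 0.301
Anbalagan et al. 2009 | | Ubuntu 5.10 | 0.98
Anbalagan et al. 2009 | | Ubuntu 6.04 | 0.81
This study | Issue lead time | Qpid | 0.292
This study | Issue lead time | Qt | 0.595
This study | Issue lead time | Geronimo | 0.326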


The results are consistent with other studies, but prediction models based on issue reports still fall well short of the desired predictive power.

Future work

• Investigation of other input variables: mailing list & version control system comments

• Add more projects to the analysis

• Use other prediction techniques: non-linear regression

• Compare open source vs. closed source
