a case study of bias in bug-fix datasets

SAIL, School of Computing, Queen’s University, Kingston, Canada

A Case Study of Bias in Bug-Fix Datasets

Thanh H. D. Nguyen, Bram Adams, Ahmed E. Hassan

We need bug prediction

• Problem: Quality improvement resources are limited.
• Solution: Bug prediction identifies defect-prone modules.

Our focus is data quality

What if there is sample bias?

We should consider bias in our studies

Stanford graduate student housing survey

Linkage Bias [Bird et al. 2009]

Unlinked bugs have:
• Higher severity
• Less experienced developers

Tagging Bias

About 2/3 of all bug reports are not defects [Antoniol et al. 2008].

Biases are threats to validity of software quality studies

• Because of linkage bias, our models:
  • neglect higher-severity bugs.
  • neglect less experienced developers.

• Because of tagging bias, our models:
  • inaccurately count more bugs than actually existed.
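One way to check for the linkage bias described above is to compare the severity distribution of linked and unlinked bugs. The sketch below computes a Pearson chi-square statistic for that comparison. The severity levels and all counts are hypothetical and chosen only for illustration; they are not taken from the study, and the slides do not say which statistical test the authors used.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic for a contingency table (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if severity were independent of linkage.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat

# HYPOTHETICAL counts per severity level (minor, major, critical).
linked = [60, 30, 10]
unlinked = [30, 40, 30]

stat = chi_square_statistic([linked, unlinked])

# Critical value of the chi-square distribution for 2 degrees of
# freedom at the 0.05 significance level.
CRITICAL_DF2_ALPHA_05 = 5.991

print(f"chi-square = {stat:.2f}")
print("severity differs between linked and unlinked bugs:",
      stat > CRITICAL_DF2_ALPHA_05)
```

If the statistic exceeds the critical value, severity is unlikely to be independent of whether a bug is linked, which is the pattern a linkage bias would produce.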

Do biases really exist? How do biases affect our research?

Near-ideal data:
• Linkage is enforced.
• Tagging is provided.

Severity           ✔   −
Experience         ✔   ✔
Maturity           −   ✔
Release pressure   −   −
Collaboration      −   −

Conjecture: Biases are properties of the software process, not of missing links.

Do linkage biases exist in Jazz?

Severity · Experience · Maturity · Release pressure · Collaboration

✔ ✔ − −

Question: How do tagging biases affect our research?

Do tagging biases exist in Jazz?

How do tagging biases affect our research?

Files   Defects + Tasks
A       5
B       4
C       6
D       1

Defects only: not biased, which we should use.
Defects + Tasks: biased, which we normally use.

Spearman: .94, Pearson: .97
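The comparison between the two count vectors can be sketched as follows, using only the Python standard library. The "Defects + Tasks" counts are the ones shown for files A to D; the "Defects only" counts are hypothetical placeholders, since the slide text does not preserve them. The output therefore does not reproduce the reported .94 and .97.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))

# Counts per file from the slide (biased data: defects plus tasks).
defects_plus_tasks = {"A": 5, "B": 4, "C": 6, "D": 1}
# HYPOTHETICAL counts (unbiased data: defects only), for illustration.
defects_only = {"A": 3, "B": 2, "C": 4, "D": 1}

files = sorted(defects_plus_tasks)
x = [defects_plus_tasks[f] for f in files]
y = [defects_only[f] for f in files]

print(f"Spearman: {spearman(x, y):.2f}")
print(f"Pearson:  {pearson(x, y):.2f}")
```

A high Spearman value means both data sets rank the files in nearly the same order of defect-proneness, which is what matters when a prediction model is used to prioritize modules.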

Conjecture: It might be OK to use biased data.
