so, you are saying that our software quality was screwed up by ... the release engineer?! (keynote @...

Post on 07-May-2015

1.950 Views

Category:

Technology

0 Downloads

Preview:

Click to see full reader

DESCRIPTION

Software release engineering is the discipline of integrating, building, testing, packaging and delivering qualitative software releases to the end user. Whereas software used to be released in shrink-wrapped form once per year, modern companies like Intuit, Google and Mozilla only need a couple of days or weeks in between releases, while lean start-ups like IMVU release up to 50 times per day! Shortening the release cycle of a software project requires considerable process and development changes in order to safeguard product quality, yet the scope and nature of such changes are unknown to many. For example, while migrating towards rapid releases allows faster time-to-market and user feedback, it also implies less time for testing and bug fixing. To understand these implications, we empirically studied the development process of Mozilla Firefox in 2010 and 2011, a period during which the project transitioned to a shorter release cycle. By comparing crash rates, median uptime, and the proportion of post-release bugs before and after the migration, we found that (1) with shorter release cycles, users do not experience significantly more post-release bugs and that (2) less bugs are being fixed, but those that are fixed are fixed faster. In order to validate these findings, we interviewed Mozilla QA personnel and empirically investigated the changes in software testing effort after the migration towards rapid releases. Analysis of the results of 312,502 execution runs of Mozilla Firefox from 2006 to 2012 (5 major traditional and 9 major rapid releases) showed that in rapid releases testing has a narrower scope that enables deeper investigation of the features and regressions with the highest risk, while traditional releases run the whole test suite. Furthermore, rapid releases make it more difficult to build a large testing community, forcing Mozilla to increase contractor resources in order to sustain testing for rapid releases. In other words, rapid releases are able to bring improvements in software quality, yet require changes to bug triaging, fixing and testing processes to avoid unforeseen costs. Finally, we also discuss a case study on software integration in the Linux kernel using git. We show the important factors that not only impact patch acceptance but also the time until acceptance, which is essential when optimizing the release engineering process for rapid releases.

TRANSCRIPT

MC IS

So, you are saying that our software quality was screwed up by ... the

release engineer?!

Bram AdamsMC IS

Who is Bram Adams?

Wolfgang De MeuterVrije Universiteit Brussel

Herman TrompGhent University

Ahmed E. HassanQueen's University

MC IS

(Lab on Maintenance, Construction and Intelligence

of Software)

MC ISJojo

MiniParastouYou?

Tejinder Dhaliwal Foutse Khomh

Ying Zou

Mika Mäntylä

Emelie Engström Kai Petersen

Ahmed E. Hassan

joint work with

Daniel German

MC IS

So, you are saying that our software quality was screwed up by ... the

release engineer?!

Bram AdamsMC IS

MC IS

MC IS

On average we deploy new code fifty times

a day.

Continuous Delivery

http://goo.gl/qPT6

VCS

continuousintegration

9 min.

15k tests

test

staging/production

8

Continuous Delivery

http://goo.gl/qPT6

VCS

continuousintegration

9 min.

15k tests

test

staging/production

8

Continuous Delivery

http://goo.gl/qPT6

VCS

continuousintegration

9 min.

15k tests

6 min.

test

staging/production

8

IMVU, eh?BENEVOL-ers

Work fast and don’t be afraid to break

things.

http://goo.gl/UlCW

James Whittaker

Build a little and then test it.

Build some more and test some more.

Even Desktop Apps Release More Frequently

2002 2003 2004 2005 2006 2007 2008 2009 2010 20110

8

16

24

32

40

#releases

?

12 http

://en

.wik

iped

ia.o

rg/w

iki/H

isto

ry_o

f_Fi

refo

x

http

://ha

cks.

moz

illa.

org/

wp-

cont

ent/

uplo

ads/

2012

/05/

rapi

d-re

leas

e.jp

g

... although it's Kind of an Illusion!

reduce cycle time!

Rapid Release in a Nutshell

packaging & delivery

in-house/3rd partydevelopment

integrate

buildtest

http://behrns.files.wordpress.com/2008/03/ikea-car.jpg

14

Are Rapid Releases Worth the Effort?BENEVOL-ers

=? Certificate

of High Quality

rapid release

rapid release=?OR

1.0 1.5 2.0 3.0 3.5 3.6 4.0 5.0 6.0

7.0 8.0

9.0

Traditional Release Cycle Rapid Release Cycle

Do Faster Releases Improve Software Quality? - An Empirical Case Study of Mozilla Firefox (MSR '12, Khomh et al.)

1.0 1.5 2.0 3.0 3.5 3.6 4.0 5.0 6.0

7.0 8.0

9.0

Traditional Release Cycle Rapid Release Cycle

Do Faster Releases Improve Software Quality? - An Empirical Case Study of Mozilla Firefox (MSR '12, Khomh et al.)

Socorro crash reports

Bugzilla bug reports

1.0 1.5 2.0 3.0 3.5 3.6 4.0 5.0 6.0

7.0 8.0

9.0

Traditional Release Cycle Rapid Release Cycle

release&model

vs.

Same # Post-release Bugs

13 20

1.6 1.7

0

1

2

3

4

5

Traditional Release (TR) Rapid Release (RR)

Num

ber o

f Pos

t Rel

ease

Bug

s Per

Day

Same # Post-release Bugs

We have about 30 of these project branches active at any one time. That means as a company we can have

30 or more experiments safely running in parallel....I’m guessing that most of the bugs end up being integration issues [of these branches into

the main trunk] otherwise the project branch wouldn’t have merged back in with mainstream devel.

643

370

0

120

240

360

480

600

720

840

960

1080

1200

Traditional Release (TR) Rapid Release (RR)

Med

ian

UpT

ime

in S

econ

ds

Crashes Pop Up Earlier

Firefox 4 was a crisis time for Firefox and Mozilla, since it took so long to complete and was a painful

engineering effort. We decided to make our development more agile by switching to rapid

release at the same time as we were making a bunch of other changes [...], and

starting Firefox OS development

vs.bug&fixing

1.0 1.5 2.0 3.0 3.5 3.6 4.0 5.0 6.0

7.0 8.0

9.0

Traditional Release Cycle Rapid Release Cycle

release&model

654 159

16

6

0

10

20

30

40

50

60

70

80

90

100

Traditional Release (TR) Rapid Release (RR)

Bug

Age

in D

ays

Bugs are Fixed Faster

59%

37%

67%

43%

0%

10%

20%

30%

40%

50%

60%

70%

80%

90%

100%

TraditionalRelease (TR) -

Main

Rapid Release(RR) - Main

TraditionalRelease (TR) -

Beta

Rapid Release(RR) - Beta

% o

f Bug

s Fix

ed

Proportionally Less Bugs Fixed

I’m not surprised that fewer bugs are fixed: some defects are trivial and some will take weeks of careful code inspection to find and fix. All those bugs would

have fallen in the same bucket for a 12+ month release cycle, but now the harder ones get

pushed out to later releases

can be better interpreted that we are being less effective at triaging bugs with rapid release, or that we have more beta users and so the incoming

bug rates are larger

same #bugs

... but faster crashes

bugs are fixed faster

... but less bugs are

being fixed

Shall we Blame the Testers?BENEVOL-ers

packaging & delivery

in-house/3rd partydevelopment

http://behrns.files.wordpress.com/2008/03/ikea-car.jpg

33

integrate

buildtestreduce cycle time!

Rapid Release in a Nutshell

2.0 3.0 3.5 3.6 4.0

'07 '08 '09 '10 '11 '12'06

5.0-13.0

Did the System Testing Process Change?

Rapid Release Model vs. Software Testing Effort – Study of Mozilla Firefox Project (ICSM '13, Mäntylä et al.)

Different Test Phases

integration + build

release

(selection of) unit tests

all unit tests

integration tests

user acceptance/system tests

test phase time to run (hours)

capacity tests

smoke tests

testcase id #8410 - all tab options unchecked

Testcase ID #: 8410

Summary: All Tab options unchecked

Product: Firefox

Branch: 3.6 Branch

Regression Bug ID #: None specified

Enabled?

Community Enabled?

Steps to Perform:1. Open Tool | Options (Preferences) | Tabs. Make sure all check

boxes are unchecked.

Expected Results:Confirm that all four preferences do not work, since they are disabled:

A new window is opened instead of a new tab.You will not be warned when closing multiple tabs.You will not be warned when opening multiple tabs (>15 at once,e.g. the default RSS folder in the toolbar).The tab bar is not shown when only one tab is open.Opening a link in a new tab will open it in the background.

Testgroup(s) Subgroup(s)

3.6 Full Functional Tests (FFTs) (137) FX 3.6 FFT - Tabbed Browsing (1242)

recent test results for testcase id #8410 - all tab options uncheckedsubmission date result id# testcase id#: summary product platform branch locale

2010-11-02 02:16:00 336691 8410: All Tab options unchecked Firefox 3.6 Branch en-US

2010-07-07 10:07:17 279135 8410: All Tab options unchecked Firefox 3.6 Branch en-US

2010-05-12 11:20:12 257854 8410: All Tab options unchecked Firefox 3.6 Branch en-US

2010-01-17 02:21:29 232877 8410: All Tab options unchecked Firefox 3.6 Branch en-US

2010-01-11 20:56:35 230165 8410: All Tab options unchecked Firefox 3.6 Branch en-US

2010-01-07 10:50:21 228669 8410: All Tab options unchecked Firefox 3.6 Branch en-US

2009-10-22 03:22:31 213485 8410: All Tab options unchecked Firefox 3.6 Branch en-US

search test results: recent failures | recently unclear | most common failures | testcases most frequently marked as unclear | testgroup popularity |results with comments

search testcases: testcases needed! | recently added | recently changed | by id:

welcomelog in

get testing!run testsview/search tests

reportingsearch resultsadvanced searchtestdaystest runsstatistics

helplitmus tutoriallitmus wiki

Litmus

Test Scenario & Expected Outcome

Recorded Test

Results

testcase id #8410 - all tab options unchecked

Testcase ID #: 8410

Summary: All Tab options unchecked

Product: Firefox

Branch: 3.6 Branch

Regression Bug ID #: None specified

Enabled?

Community Enabled?

Steps to Perform:1. Open Tool | Options (Preferences) | Tabs. Make sure all check

boxes are unchecked.

Expected Results:Confirm that all four preferences do not work, since they are disabled:

A new window is opened instead of a new tab.You will not be warned when closing multiple tabs.You will not be warned when opening multiple tabs (>15 at once,e.g. the default RSS folder in the toolbar).The tab bar is not shown when only one tab is open.Opening a link in a new tab will open it in the background.

Testgroup(s) Subgroup(s)

3.6 Full Functional Tests (FFTs) (137) FX 3.6 FFT - Tabbed Browsing (1242)

recent test results for testcase id #8410 - all tab options uncheckedsubmission date result id# testcase id#: summary product platform branch locale

2010-11-02 02:16:00 336691 8410: All Tab options unchecked Firefox 3.6 Branch en-US

2010-07-07 10:07:17 279135 8410: All Tab options unchecked Firefox 3.6 Branch en-US

2010-05-12 11:20:12 257854 8410: All Tab options unchecked Firefox 3.6 Branch en-US

2010-01-17 02:21:29 232877 8410: All Tab options unchecked Firefox 3.6 Branch en-US

2010-01-11 20:56:35 230165 8410: All Tab options unchecked Firefox 3.6 Branch en-US

2010-01-07 10:50:21 228669 8410: All Tab options unchecked Firefox 3.6 Branch en-US

2009-10-22 03:22:31 213485 8410: All Tab options unchecked Firefox 3.6 Branch en-US

search test results: recent failures | recently unclear | most common failures | testcases most frequently marked as unclear | testgroup popularity |results with comments

search testcases: testcases needed! | recently added | recently changed | by id:

welcomelog in

get testing!run testsview/search tests

reportingsearch resultsadvanced searchtestdaystest runsstatistics

helplitmus tutoriallitmus wiki

06/2006 - 06/2012

312,502 test executions

1,547 test cases

6,058 different testers

2,009 tested

builds

10% of test cases are

automated

147 releases

Litmus

35,000 - 55,000 executions/release

Testing More Intense

6,000 - 9,000 executions/release

... but twice as many

test executions per day

To survive under the time constraints of a rapid release we’ve had to cut the fat and

focus on those test areas which are prone to failure, less on ensuring legacy support

700 - 1,100 different test cases

100 - 170 different test cases

Smaller Scope of Testing

a fixed set of tests for areas prone to regression (Flash plugin testing for example)

anda dynamically changing set of tests to cover recent regressions we chemspilled for and high risk features

1,000 - 1,900 testers

8 - 34 testers

Smaller Testing Team

The weakest point [in rapid release] is that it’s harder to develop a large community which more

accurately represents the scale and variability of the general population. Frequently this means that we don’t hear about issues until after release, in effect turning our release early adopters into beta testers

1) the core team has remained largely unchanged since adopting rapid release 2) the contract team has nearly

doubled ... We can scale up our team much faster through contractors than through

hiring. The time afforded to us to make the switch to rapid releases left little room for failure which is why we

took that approach.

Test Priorities

less locales being tested ...

Traditional Rapid

0.0

0.5

1.0

1.5

2.0

Loca

les

per d

ay

Traditional Rapid

02

46

8

OSs

per

day

... but more operating systems

we now distribute across Betas. For example, we might test Windows 7, OSX 10.8, and Ubuntu in one Beta then

Windows XP, Mac OSX 10.7, and Ubuntu in another Beta

In Other Words ...

Testing has to become more

Focused

The greatest strength [of rapid releases] is that the scope of what needs to be tested is narrow so we can focus all of our energy on deep diving into a few areas

Release Engineering

deployment

in-house/3rd partydevelopment

http://behrns.files.wordpress.com/2008/03/ikea-car.jpg

49

integrate

buildtestreduced cycle time!

Will My Patch Make It?

(And How Fast?)

50

MSR '13, Jiang et al.

I do hold out hope that Google does come around and works to fix their codebase to get it merged upstream to stop the huge blockage that they have now caused in a large number of embedded Linux hardware companies […] But I need the help of the Google developers to make it happen, without them, nothing can change.

http://www.kroah.com/log/linux/android-kernel-problems.html51

GregKroah-Hartman

Linux 3.5

The Linux Integration Process

52

linux-usb

linux-scsi

lkml

subsystemmaintainer 1

subsystemmaintainer 2

Reviewing Integration Staging

maintainer Linus Torvalds

contributor 1

contributor 2

contributor 3

Research Questions

How manypatches are

merged?

What kind of patch is merged

more likely?

What kind of patch is accepted

faster?

subsystemmaintainer 1

subsystemmaintainer 2

Linux 3.5

Case Study Setup

54

Reviewing Integration Staging

maintainer Linus Torvalds

linux-usb

linux-scsi

lkml

contributor 1

contributor 2

contributor 3

Linux 3.5

Case Study Setup

54

Reviewing Integration Staging

Linus Torvalds

linux-usb

linux-scsi

lkml

contributor 1

contributor 2

contributor 3

Case Study Setup

54

Linus Torvalds

linux-usb

linux-scsi

lkml

Case Study Setup

54

Linus Torvalds

linux-usb

linux-scsi

lkml

Case Study Setup

54

Linus Torvalds

email1

email3

email2 email patch2

email patch1

email patch3...

Case Study Setup

54

...commit3

commit2

commit patch1

commit1

commit patch2

commit patch3

email1

email3

email2 email patch2

email patch1

email patch3...

checksum1

checksum3

checksum2

...

Case Study Setup

54

...commit3

commit2

commit patch1

commit1

commit patch2

commit patch3

email1

email3

email2 email patch2

email patch1

email patch3...

checksum1

checksum3

checksum2

...

Case Study Setup

54

...commit3

commit2

commit patch1

commit1

commit patch2

commit patch3

email1

email3

email2 email patch2

email patch1

email patch3...

Experience

Email

Review ExperiencePatch

Experience

Commit

55

5 Dimensions(29 Patch Metrics)

size: LOC > 50

Number of reviewers > 3 ?

not accepted Number of review messages > 3 ?

Is this first patch in thread?

not acceptedaccepted

Decision Tree

Building Decision Trees

56

How manypatches are

merged?

What kind of patch is merged

more likely?

What kind of patch is accepted

faster?

33% of Patches Make it ...

592005 2006 2007 2008 2009 2010 2011 2012

accepted/rejected patches

year

perc

enta

ge o

f pat

ches

0

20000

40000

60000

80000

100000

120000

28.6328.7

27.03

32.83 32.79 33.8733.55 30.74

71.37

71.3

72.97

67.1767.21 66.13

66.45

69.26

% accepted by linus% rejected by linus

# o

f p

atch

es

72.97%

67.17%

71.3%

71.73%

69.26%

66.45%

66.13%67.21%

28.27%28.7%

32.79%32.83%

27.03%

30.74%33.55%

33.87%

ACCEPT

REJECT

... and it takes 1~6 Months!

60

2005 2006 2007 2008 2009 2010 2011 2012

year

perc

enta

ge o

f acc

epte

d pa

tche

s of

eac

h ye

ar

020

4060

80

instantlywithin_hourwithin_day

within_weekwithin_monthwithin_quarter

within_half_yearwithin_yeartook_ages

Text

% acceptedpatches

2005 2006 2007 2008 2009 2010 2011 2012

year

perc

enta

ge o

f acc

epte

d pa

tche

s of

eac

h ye

ar

010

2030

40

Reviewing Speeds Up ...re

view

ing

time

(#da

ys)

2005 2006 2007 2008 2009 2010 2011 2012

year

perc

enta

ge o

f acc

epte

d pa

tche

s of

eac

h ye

ar

010

2030

4050

6070

instantlywithin_hourwithin_day

within_weekwithin_monthwithin_quarter

within_half_yearwithin_yeartook_ages

2005 2006 2007 2008 2009 2010 2011 2012

year

perc

enta

ge o

f acc

epte

d pa

tche

s of

eac

h ye

ar

010

2030

4050

6070

instantlywithin_hourwithin_day

within_weekwithin_monthwithin_quarter

within_half_yearwithin_yeartook_ages

... while Integration Slows Down

inte

grat

ion

time

(#da

ys)

2005 2006 2007 2008 2009 2010 2011 2012

year

perc

enta

ge o

f acc

epte

d pa

tche

s of

eac

h ye

ar

010

2030

4050

6070

instantlywithin_hourwithin_day

within_weekwithin_monthwithin_quarter

within_half_yearwithin_yeartook_ages

How manypatches are

merged?

What kind of patch is merged

more likely?

What kind of patch is accepted

faster?

33% patch

acceptance

integration becomes slower

How manypatches are

merged?

What kind of patch is merged

more likely?

What kind of patch is accepted

faster?

33% patch

acceptance

integration becomes slower

Experience Matters Most for Patch Acceptance

precision:73% recall:68.47%

65

How manypatches are

merged?

What kind of patch is merged

more likely?

What kind of patch is accepted

faster?

33% patch

acceptance

integration becomes slower

by experienced

developers

mature patch

specific subsystem

How manypatches are

merged?

What kind of patch is merged

more likely?

What kind of patch is accepted

faster?

33% patch

acceptance

integration becomes slower

by experienced

developers

mature patch

specific subsystem

Experience, Reviewers and Patch Spread Matter for Reviewing Time

68

How manypatches are

merged?

What kind of patch is merged

more likely?

What kind of patch is accepted

faster?

33% patch

acceptance

integration becomes slower

by experienced

developers

mature patch

specific subsystem

less spread out

by more experienced developers

Release Engineering

70

Software Quality

?

Rapid Release vs. Quality?

Certificateof

High Quality

Trunk- vs. Branch-based Development

72

vs.

branch-based integration

trunk-based integration

"The Evolution of the Linux Build System" (Adams et al.)

73

Linux 2.6.16.18

0.01 0.11 1.0

1.2 2.0 2.2

2.4 2.6.0 2.6.21.5

How to Evolve Build Systems in a Healthy Way?

How to Focus Testing?

How to Keep up with 3rd Party Dependencies?

75 3rd party branch

Where are the Release Engineering Tools?

76

RELEASE IDE

test your build!

focus your testing efforts

what did the other team break

now ;-)

keep poking those upstream guys until they

give in :-)

How to Align Release Engineers with the Rest of the Organization?

On average we deploy new code fifty times

a day.

MC IS

same #bugs

... but faster crashes

bugs are fixed faster

... but less bugs are

being fixed

Testing has to become more

FocusedHow manypatches are

merged?

What kind of patch is merged

more likely?

What kind of patch is accepted

faster?

33% patch

acceptance

integration becomes slower

by experienced

developers

mature patch

specific subsystem

less spread out

by more experienced developers

RELENG 2013

1st International Workshop on Release Engineering

http://releng.polymtl.ca May 20, 2013, San Francisco, USA

84 participants

... and 10

academic talks

2 keynotes

6 practitionertalks

On average we deploy new code fifty times

a day.

MC IS

Testing has to become more

FocusedHow manypatches are

merged?

What kind of patch is merged

more likely?

What kind of patch is accepted

faster?

33% patch

acceptance

integration becomes slower

by experienced

developers

mature patch

specific subsystem

less spread out

by more experienced developers

same #bugs

... but faster crashes

bugs are fixed faster

... but less bugs are

being fixed

top related