Ayelet Israeli and Dror G. Feitelson, “The Linux Kernel as a Case Study in Software Evolution”. Journal of Systems and Software 83(3), pp. 485-501, Mar 2010. Presented by Dror Feitelson.



Synopsis

• A study of 810 versions of the Linux kernel released over 14 years, comparing the evolution of the system to Lehman’s Laws of software evolution.

• Conclusion: several laws are supported by the data.

• Observation: average complexity is decreasing with time.


Linux Background

• First announced August 1991
• First release March 1994
• Dual release scheme till 2003

– Odd versions are development (1.1, 1.3, 2.1, 2.3, 2.5)

– Even versions are production (1.0, 1.2, 2.0, 2.2, 2.4)

• New release scheme in 2.6
– New version every 2-3 months
– Development is distributed, no separate official development releases

• Full source code of all versions available online


Linux Kernel Versions

• Paper used all 810 versions from March 1994 to August 2008 (all .h and .c files)
– 144 production
– 429 development
– 237 of 2.6

• Unprecedented scale of investigation
• Other researchers used only production versions, only .c files, or a sample of versions
• Some versions (test kernels and release candidates) were missed


Series  Location(s)
1.0     v1.0/linux-1.0
1.1     v1.1/v1.1.0, v1.1/linux-1.1.*
1.2     v1.2/linux-1.2.*
1.3     v1.3/linux-1.3.*, v1.3/linux-pre2.0.*
2.0     v2.0/linux-2.0.*
2.1     v2.1/linux-2.1.*, v2.1/linux-2.2.0-pre*
2.2     v2.2/linux-2.2.*
2.3     v2.3/linux-2.3.*, v2.3/linux-2.3.99-pre*, v2.4/old-test-kernels/linux-2.4.0-*
2.4     v2.4/linux-2.4.*
2.5     v2.5/linux-2.5.*, v2.6/pre-releases/linux-2.6.0-test*
2.6     v2.6/linux-2.6.*, v2.6/testing/v2.6.*/linux-2.6.*-rc*, v2.6/longterm/v2.6.*/linux-2.6.*

Kernel version locations on www.kernel.org


CPP Problems

• Kernel is littered with preprocessor directives
• Removed the directives themselves (keeping the code of all conditional branches) in order to analyze all the code
– This is what the developers see
• Sometimes this leads to incorrect syntax
– Files where this happened were ignored
– About 1.5% of the code
• Alternative is to perform preprocessing (used by others)
– Induces code bloat (macros and #include)
– Analyzes only one configuration of the system
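The directive-removal approach can be sketched as follows. This is a minimal illustration under our own assumptions, not the authors' tool: the function name `strip_directives` is hypothetical, and it simply drops every line whose first non-blank character is `#` (plus backslash-continued lines), keeping the code under every `#if`/`#else` branch. The sample shows how this can yield invalid C, here a duplicate definition of `cpus`.

```python
def strip_directives(source: str) -> str:
    """Remove C preprocessor directive lines, keeping all other code.

    A directive is a line whose first non-blank character is '#'; if it ends
    in a backslash, the following line belongs to it and is dropped too.
    Code under every #if/#ifdef/#else branch is kept.
    """
    kept = []
    in_directive = False
    for line in source.splitlines():
        stripped = line.strip()
        if in_directive or stripped.startswith("#"):
            # Stay in directive mode while lines end with a continuation.
            in_directive = stripped.endswith("\\")
            continue
        kept.append(line)
    return "\n".join(kept)


# Hypothetical kernel-style fragment.
SAMPLE = """#include <linux/kernel.h>
#ifdef CONFIG_SMP
int cpus = 4;
#else
int cpus = 1;
#endif
#define MAX(a, b) \\
    ((a) > (b) ? (a) : (b))
int f(void) { return cpus; }"""

print(strip_directives(SAMPLE))
```

Both branches of the `#ifdef` survive, which is exactly the "what the developers see" view, and also why some files end up syntactically invalid.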


Evolution Background
• Textbooks: software is developed in well-defined phases:
– Elicit requirements
– Create specifications
– Design the system
– Implement
– Test and correct
– Install and maintain

• Reality: software evolves:
– Start with a small useful project
– Users will introduce new requirements
– Adapt the system to do what is needed
– Needs cannot be anticipated in advance


Three Types of Programs

• S-type: derived from well defined formal specifications

• P-type: can’t derive a formal solution, so use an iterative process to find and refine a solution

• E-type: a program that becomes embedded in its environment and changes with it; mechanizing an activity changes it and induces new requirements

Continuous evolution instead of development and then maintenance


Early Data

• Data on OS/370
– Size in modules
– As a function of release serial number
• Main results
– Steady growth
– Ripple effect
– Instability in late releases

[Figure: OS/360-370 size relative to RSN 1 versus release sequence number (RSN), annotated S(i) = S(i-1) + 0.19]


Lehman's Laws

1) Continuing change (adaptation)
2) Increasing complexity (unless refactored)
3) Self regulation (of rate of change)
4) Invariant work rate (inertia)
5) Conservation of familiarity (of users and developers)
6) Continuing growth (more features)
7) Declining quality (unless maintained)
8) Feedback system (at multiple levels)


The Idea

• Lehman used little data from closed-source systems

• A lot of data is now available
• Use Linux data to see if it supports Lehman’s Laws
• In particular, try to use software metrics to quantify the laws


Law VI

Continuing Growth
The functional capability of E-type systems must be continually enhanced to maintain user satisfaction over system lifetime

• Law requires new functionality to be added
• Can also be interpreted as requiring growth in size
• Are the two interpretations equivalent?


Lehman’s Data

• Size in modules as function of release number for OS/360 and other systems

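• Grows, but growth rate often seen to decline
– Though not for OS/360
– Turski suggested an inverse square law:
$$ s_i = s_{i-1} + \frac{E}{s_{i-1}^2} $$
– Idea: effort E is spent on all possible interactions among the $s_i$ modules (about $s_i^2$ of them), so each release adds only about $E/s_{i-1}^2$ modules
– Leads to a model where
$$ s_i \propto \sqrt[3]{i} $$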

[Figure: OS/360-370 size relative to RSN 1 versus release sequence number (RSN), annotated S(i) = S(i-1) + 0.19]
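Turski's model can be checked numerically. The sketch below is our own illustration: the effort E = 1 and starting size are arbitrary assumptions, not fitted to any data. It iterates the inverse-square recurrence and compares it with the cube-root approximation obtained by treating the recurrence as ds/di = E/s², which integrates to s³ = s₀³ + 3Ei. The increments shrink as the system grows, which is the slowdown Lehman and Turski predicted.

```python
def turski_sizes(s0: float, effort: float, releases: int) -> list[float]:
    """Iterate Turski's inverse-square law s_i = s_{i-1} + E / s_{i-1}**2."""
    sizes = [s0]
    for _ in range(releases):
        prev = sizes[-1]
        sizes.append(prev + effort / prev**2)
    return sizes


def cube_root_model(s0: float, effort: float, i: int) -> float:
    """Continuous approximation: ds/di = E/s^2 integrates to s^3 = s0^3 + 3*E*i."""
    return (s0**3 + 3 * effort * i) ** (1 / 3)


sizes = turski_sizes(s0=1.0, effort=1.0, releases=1000)
increments = [b - a for a, b in zip(sizes, sizes[1:])]
print(sizes[-1], cube_root_model(1.0, 1.0, 1000))
```

For large i the two agree closely, so the discrete law indeed grows like the cube root of the release number.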


Law VI and Linux

• The dominant effect (so we deal with it first)
• The easiest to measure and analyze
– If interpreted as size
• Growth is super-linear (quadratic?)
• Explained by positive feedback with growth of developer base
• Functional growth is harder to quantify


Godfrey & Tu Linux Data

• LOC or tarball size as function of date, 1994-2000

• Focus on development versions
• Growth rate seen to increase
• Fits quadratic model
• Largely verified by others
• Also for other (but not all) open-source systems
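A quadratic growth model can be fitted with ordinary least squares. The sketch below is a self-contained illustration, not Godfrey and Tu's procedure (their fitting method is not given here): it solves the 3×3 normal equations for y = c0 + c1·x + c2·x², and the data are synthetic. A positive c2 indicates accelerating, super-linear growth.

```python
def fit_quadratic(xs: list[float], ys: list[float]) -> tuple[float, float, float]:
    """Least-squares fit of y = c0 + c1*x + c2*x**2 via the normal equations."""
    # Normal equations A c = b, with A[j][k] = sum(x^(j+k)) and b[j] = sum(y * x^j).
    powers = [sum(x**p for x in xs) for p in range(5)]
    a = [[powers[j + k] for k in range(3)] for j in range(3)]
    b = [sum(y * x**j for x, y in zip(xs, ys)) for j in range(3)]
    # Gaussian elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, 3):
            factor = a[row][col] / a[col][col]
            for k in range(col, 3):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]
    # Back substitution.
    c = [0.0, 0.0, 0.0]
    for row in range(2, -1, -1):
        rest = sum(a[row][k] * c[k] for k in range(row + 1, 3))
        c[row] = (b[row] - rest) / a[row][row]
    return c[0], c[1], c[2]


# Synthetic "size vs. years" data following 2 + 3t + 0.5t^2 exactly.
years = list(range(7))
sizes = [2 + 3 * t + 0.5 * t * t for t in years]
print(fit_quadratic(years, sizes))
```

With NumPy available, `numpy.polyfit(years, sizes, 2)` performs the same fit, returning the coefficients highest power first.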


Release Number vs. Time

• Plotting against release number instead of date doesn’t matter if releases are regular
– In Linux before 2.6 they are not

• If releases are irregular, the choice changes the shape of the growth curve
• Question of how to interleave multiple versions
– Assume version 2.3 was released after 3.0
– If sorted by number, their order is reversed
– Justified because 2.3 is related to 2.2, not to 3.0
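To make the ordering question concrete, the sketch below sorts the same releases both ways. The release dates are invented to mirror the slide's assumed scenario (2.3 released after 3.0), and `version_key` is a hypothetical helper. It parses version strings numerically, since plain string sorting would place 2.6.10 before 2.6.9.

```python
from datetime import date


def version_key(version: str) -> tuple[int, ...]:
    """Numeric sort key, so that 2.6.10 sorts after 2.6.9 (plain strings would not)."""
    return tuple(int(part) for part in version.split("."))


# Hypothetical release dates for the slide's thought experiment.
releases = {
    "2.2": date(2000, 1, 1),
    "3.0": date(2001, 6, 1),
    "2.3": date(2002, 3, 1),
}

by_number = sorted(releases, key=version_key)
by_date = sorted(releases, key=lambda v: releases[v])
print(by_number)  # ['2.2', '2.3', '3.0']
print(by_date)    # ['2.2', '3.0', '2.3']
```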


Linux Growth Data


Super-linear growth
Contradicts Lehman and Turski, who claimed growth should slow down due to increasing complexity


Functionality

• Previous results hold for all common size metrics

• Different results if we try to measure functional growth

• System calls are leveling out
– Possibly reflects maturity, as predicted by Torvalds

• Config options are growing faster
– Indicates growth is in internal mechanisms rather than user-visible services


System Calls


Config Options


Law I

Continuing Change
An E-type system must be continually adapted, else it becomes progressively less satisfactory in use

• This means that software must evolve
• “Adapt” implies keeping up with a changing environment


Law I and Linux

• Change is obviously true
– In 2.6 a new version is released every 2-3 months

• Change is achieved through growth
• Adaptation to changing hardware environment
• Hard to distinguish adaptation from growth
– Is adding support for sound cards a new feature or adaptation to a changing environment?


Adaptation to New Hardware

• Hardware is a special part of an operating system’s environment
• Confined to two subdirectories
– arch (supported architectures)
– drivers (supported peripherals)

• Together about 60% of the code
• Grow together with the rest of the system at about the same rate


arch + drivers vs. Whole Kernel


Law II

Increasing Complexity
As an E-type system is changed its complexity increases and it becomes more difficult to evolve unless work is done to maintain it and reduce the complexity

• Functionality comes at a cost in complexity
• Two-sided law: supported either way (complexity grows, or work is done to reduce it)


Law II and Linux

• Complexity not necessarily increasing
• System is largely modular (e.g. no coupling between file systems, scheduler, and drivers)
• New functions being added are short and simple
• Growing number but reduced fraction of high-MCC functions
• Active work to reduce complexity


McCabe Cyclomatic Complexity (MCC)

• Introduced by McCabe in 1976
• Essentially counts the number of linearly independent paths through the code
• Suggestion: functions with MCC > 10 may require refactoring
• Easily calculated by counting predicates (MCC = number of predicates + 1)
– All while, for, if, and case statements
• Widely used in tools and research
• Has been criticized, but there are no better alternatives
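Counting predicates is simple enough to sketch. The function below is our own approximation, not the authors' script: it blanks out comments and string/character literals, then counts the `if`, `for`, `while`, and `case` keywords listed on the slide and adds one. A `do ... while` loop is counted through its `while`. Many tools also count `&&`, `||`, and `?:`, which this sketch deliberately omits.

```python
import re

# Comments and string/char literals, matched left to right so that "//" inside
# a string, or a quote inside a comment, is handled correctly.
NOISE_RE = re.compile(
    r"""/\*.*?\*/|//[^\n]*|"(?:\\.|[^"\\\n])*"|'(?:\\.|[^'\\\n])*'""",
    re.DOTALL,
)
# Predicates listed on the slide; &&, || and ?: are not counted here.
PREDICATE_RE = re.compile(r"\b(?:if|for|while|case)\b")


def _blank(match: re.Match) -> str:
    # Replace literals with an empty literal and comments with a space.
    return '""' if match.group(0)[0] in "\"'" else " "


def mcc(function_source: str) -> int:
    """Approximate McCabe cyclomatic complexity: predicates + 1."""
    code = NOISE_RE.sub(_blank, function_source)
    return len(PREDICATE_RE.findall(code)) + 1


SAMPLE = """
int f(int x) {
    /* if this comment were counted, the result would be wrong */
    if (x > 0) {
        for (int i = 0; i < x; i++)
            printk("while");  // a keyword inside a string is ignored
    }
    switch (x) { case 1: break; case 2: break; }
    return 0;
}
"""
print(mcc(SAMPLE))  # 5: one if, one for, two cases, plus one
```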


Measuring MCC

• Use commercial static analysis tool (Klocwork)
– Requires compilation of the code
– Therefore limited to a specific configuration
– Some bugs and usability problems

• Use free tool (pmccabe)
– But not in this paper

• Write your own script
– Simple, and does exactly what we need
– Risk of bugs and of not being standard


Results
• Total MCC grows with code


Results
• Total MCC grows with code
• But average MCC per function is decreasing


Distribution of MCC


Possible Explanations

• Many new functions are being added, and they tend to be simpler than the old ones
– Indeed, new functions tend to have lower MCC

• Code is being actively improved with time


High-MCC Functions

• Distribution of MCC values is heavy-tailed
• Highest values are in the hundreds
– 369 functions with MCC ≥ 100 over the years
• Some of these functions evolve
– Massive reduction in MCC, as in sys32_ioctl
– Gradual growth of MCC
– Occasional large growth in a production version

• Very long, but actually not very complex


Tail of MCC Distribution


An Aside on Heavy Tails

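• Definition: tail decays as a power law

• CDF: $F(x) = \Pr(X \le x)$

• CCDF: $\bar{F}(x) = \Pr(X > x)$

• Heavy tail: $\bar{F}(x) \sim x^{-a}$, with $0 < a < 2$

• LLCD (log-log complementary distribution): $\log \bar{F}(x) \approx -a \log x + \text{const}$, a straight line with slope $-a$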
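An LLCD and its slope can be computed as below. This is our illustration under stated assumptions: the empirical CCDF is estimated as the fraction of samples ≥ x (a common convention that avoids log 0 at the maximum), and the tail index is read off with an ordinary least-squares fit to the log-log points. The sample is synthetic Pareto data built from evenly spaced quantiles, not kernel MCC values. With real data one would fit only the tail by raising `min_log_x`, since the body of a distribution need not follow the power law.

```python
import math


def llcd_points(samples: list[float]) -> list[tuple[float, float]]:
    """(log10 x, log10 fraction of samples >= x) for each distinct positive value x."""
    xs = sorted(samples, reverse=True)
    n = len(xs)
    points = []
    for i, x in enumerate(xs):
        if i + 1 < n and xs[i + 1] == x:
            continue  # emit each distinct value once, at its last position
        points.append((math.log10(x), math.log10((i + 1) / n)))
    return points


def tail_slope(points: list[tuple[float, float]], min_log_x: float = 0.0) -> float:
    """Least-squares slope of LLCD points with log10(x) >= min_log_x (about -a)."""
    pts = [(x, y) for x, y in points if x >= min_log_x]
    mean_x = sum(x for x, _ in pts) / len(pts)
    mean_y = sum(y for _, y in pts) / len(pts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pts)
    sxx = sum((x - mean_x) ** 2 for x, _ in pts)
    return sxy / sxx


# Deterministic Pareto sample with tail index 1.5: x = u^(-1/a) at evenly spaced u.
tail_index = 1.5
n = 2000
sample = [((k - 0.5) / n) ** (-1 / tail_index) for k in range(1, n + 1)]
print(-tail_slope(llcd_points(sample)))  # roughly 1.5
```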


Law VII

Declining Quality
Unless rigorously adapted and evolved to take into account changes in the operational environment, the quality of an E-type system will appear to be declining

• Again can be supported either way
• What is “quality”?


Law VII and Linux

• Question of how to quantify quality
• Quality is most probably not decreasing
• It may even be improving


Perceived Quality

• If quality declines, the system will fall out of use
• Linux usage is strong and growing
• Ergo, Linux quality is not declining


Measured Quality

Oman’s Maintainability Index (MI)

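• HV = Halstead’s volume ($N \log_2 n$, where N is the total and n the distinct number of operators and operands)
– Bits required to write the function

• MCC = McCabe Cyclomatic Complexity
• LoC = Lines of Code
• pCM = percent Comment lines
– Interpreted as fraction (0-1) rather than percent

$$ MI = 171 - 5.2\ln(HV) - 0.23\,MCC - 16.2\ln(LoC) + 50\sin\left(\sqrt{2.46\,pCM}\right) $$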
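The Maintainability Index formula translates directly into code. This is a minimal sketch assuming natural logarithms for the `ln` terms, pCM given as a fraction in [0, 1] as the slide specifies, and Halstead volume computed with log₂ to match its "bits" interpretation. The function names and the example inputs are hypothetical, and the sketch works per function; how values are aggregated over a release is not shown here.

```python
import math


def halstead_volume(total_tokens: int, distinct_tokens: int) -> float:
    """Halstead volume N * log2(n): bits needed to encode the function's tokens."""
    return total_tokens * math.log2(distinct_tokens)


def maintainability_index(hv: float, mcc: int, loc: int, comment_fraction: float) -> float:
    """Oman's MI; comment_fraction is the share of comment lines in [0, 1]."""
    return (
        171
        - 5.2 * math.log(hv)
        - 0.23 * mcc
        - 16.2 * math.log(loc)
        + 50 * math.sin(math.sqrt(2.46 * comment_fraction))
    )


# Hypothetical function: 120 tokens, 30 distinct, MCC 6, 40 lines, 10% comments.
hv = halstead_volume(120, 30)
print(round(maintainability_index(hv, 6, 40, 0.10), 1))
```

Higher MI means more maintainable: volume, complexity, and length pull it down, while comments push it up.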


Changes in MI


Law IV

Invariant Work Rate
The work rate of an organization evolving an E-type software system tends to be constant over the operational lifetime of that system or phases of that lifetime

• Large organizations have inertia
• What about open source communities?


Law IV and Linux

• Work on Linux is growing superlinearly
• Fraction of files handled is near constant
• Release rate is near constant
– 5-10 days per minor release till 2.5
– 2-3 months for new version in 2.6


Interpretation 1: Work Hours

• Data not available
• Ill-defined: developers typically have another daytime job
• Nevertheless, work rate is most probably not constant
– Growth in developer base
– Increased growth rate of code


Interpretation 2: Elements Handled

• Suggested by Lehman
• Use development versions (+ 1st year of 2.4)
• Includes the number of files added (reflects growth)
• Absolute number grows with time
• Fraction of existing files handled is relatively constant
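Counting elements handled at file granularity can be sketched as follows. The representation (a path-to-content-hash map per version), ignoring deleted files, and using the file count of the previous version as the denominator are all our assumptions for illustration; the slide does not specify them. The paths and hash values are hypothetical.

```python
def files_handled(prev: dict[str, str], curr: dict[str, str]) -> tuple[int, float]:
    """Count files added or modified between two versions.

    prev/curr map a file path to a content hash. Deleted files are ignored.
    Returns the absolute count and its fraction of the files in `prev`.
    """
    added = sum(1 for path in curr if path not in prev)
    modified = sum(
        1 for path, digest in curr.items() if path in prev and prev[path] != digest
    )
    handled = added + modified
    return handled, handled / len(prev)


# Hypothetical snapshots of two consecutive versions.
v1 = {"kernel/sched.c": "a1", "mm/slab.c": "b1", "fs/inode.c": "c1", "init/main.c": "d1"}
v2 = {"kernel/sched.c": "a2", "mm/slab.c": "b1", "fs/inode.c": "c1", "init/main.c": "d1",
      "fs/ext3/super.c": "e1"}
print(files_handled(v1, v2))  # (2, 0.5): one modified, one added, out of 4 files
```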


Interpretation 3: Release Rate

• Release rate of development versions 1996-2003 around 3-6/month
– Lower in 2.4


Releases per Month


Interpretation 3: Release Rate

• Release rate of development versions 1996-2003 around 3-6/month

• Production versions have high minor release rate until next development version is forked


Rate of Minor Releases

Linear slope = steady release rate


Interpretation 3: Release Rate

• Release rate of development versions 1996-2003 around 3-6/month

• Production versions have high minor release rate until next development version is forked

• Since 2003 (version 2.6) new version every 2-3 months

• Conclusion: seems to support constant rate
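The release-rate measurements can be sketched as below: count releases per calendar month, and compute an overall average that also includes months with no release. The dates are invented for illustration and the function names are hypothetical. A roughly constant rate corresponds to a linear slope in the cumulative plot of releases.

```python
from collections import Counter
from datetime import date


def releases_per_month(dates: list[date]) -> Counter:
    """Count releases in each calendar month, keyed by (year, month)."""
    return Counter((d.year, d.month) for d in dates)


def mean_monthly_rate(dates: list[date]) -> float:
    """Average releases per month over the whole span, including months with none."""
    first, last = min(dates), max(dates)
    months = (last.year - first.year) * 12 + (last.month - first.month) + 1
    return len(dates) / months


# Hypothetical release dates, for illustration only.
dates = [date(1999, 5, 3), date(1999, 5, 11), date(1999, 5, 20),
         date(1999, 7, 2), date(1999, 7, 27)]
print(sorted(releases_per_month(dates).items()))  # [((1999, 5), 3), ((1999, 7), 2)]
print(mean_monthly_rate(dates))                   # 1.6666666666666667 (June had none)
```

Averaging only the non-empty months would give 2.5 here, overstating the rate, which is why the span-based average counts empty months.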


Law V

Conservation of Familiarity
In general, the incremental growth (growth rate trend) of E-type systems is constrained by the need to maintain familiarity

• Capacity of humans to change constrains the rate of change


Law V and Linux

• Rapid development releases imply small change between versions

• Production versions branch off from development versions again with small change

• Large difference between production versions
– So user familiarity is not conserved

• Users may continue to use a production version for a long time
– Evidence of the need to conserve familiarity


Law III

Self Regulation
Global E-type system evolution is feedback regulated

• Reflects a balance between forces that demand change, and constraints on what can actually be done


Lehman’s Ripple

• Ripple indicates negative feedback control
• Or maybe alternation of major/minor releases?


Increments of Growth

• Large increment reflects desire to add more new functionality

• Small increment reflects need to stabilize
• Alternations reflect self regulation
• Also seen to some degree in Linux
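One simple way to quantify alternating increments is sketched below. This is our own heuristic, not a measure from the paper or from Lehman: compute the growth increment between consecutive releases, then the fraction of consecutive changes in increment size whose sign flips (a larger increment followed by a smaller one, or vice versa). The size series are hypothetical and at least four releases are needed.

```python
def increments(sizes: list[float]) -> list[float]:
    """Growth increment between consecutive releases."""
    return [b - a for a, b in zip(sizes, sizes[1:])]


def alternation_fraction(sizes: list[float]) -> float:
    """Fraction of consecutive increment changes whose sign flips.

    Near 1: large and small increments alternate (Lehman's ripple).
    Near 0: increments trend steadily in one direction.
    Requires at least four sizes.
    """
    inc = increments(sizes)
    changes = [b - a for a, b in zip(inc, inc[1:])]
    flips = sum(1 for a, b in zip(changes, changes[1:]) if a * b < 0)
    return flips / (len(changes) - 1)


# Hypothetical module counts per release, not Lehman's or Linux data.
sizes = [10, 15, 16, 22, 23, 30]
print(increments(sizes))            # [5, 1, 6, 1, 7]
print(alternation_fraction(sizes))  # 1.0
```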


Alternating Increments


Law VIII

Feedback System
E-type evolution processes are multi-level, multi-loop, multi-agent feedback systems

• Extension of law III?


Law VIII and Linux

• Archetypal open-source system
• Continued development based on feedback from users
– Defect reports
– Bug fixes
– Contribution of code

• Change of release scheme in 2.6 reflects need for more rapid dissemination

• Hard to quantify


Lehman’s Laws and Linux: Summary

• Some laws are two-sided
– II (complexity), VII (quality)

• Some laws are qualitative
– I (adaptation), III (self regulation), V (familiarity), VII (quality), VIII (feedback)
• Laws need to be interpreted and quantified
– II (complexity), IV (work rate), VII (quality)


Lehman’s Laws and Linux: Summary

Law                  Linux
I change             Adaptation to new hardware
II complexity        Not increasing
III self regulation  Maybe
IV work rate         Constant release rate, superlinear growth
V familiarity        Within production versions
VI growth            Superlinear
VII quality          Not decreasing
VIII feedback        Inherent in open source paradigm