Page 1: Situated learning among open source software developers

A Master Thesis Presentation

(Dartington Pottery Training Workshop, 1978)

Author:

Josef Hardi

European Master in Software Engineering

Supervisors:

Prof. Barbara Russo

Dr. Richard Torkar

Situated Learning in Open Source Software Developers:

The Case of Google Chrome Project

Thursday, August 4, 2011

Page 2: Situated learning among open source software developers

Introduction

• Situated Learning is the learning that occurs in workplaces [Brown et al., 1989].

• No separation between ‘knowing’ and ‘doing’.

• Situated learning is primarily practiced by the community of practitioners.


Page 3: Situated learning among open source software developers

Existing Findings


• Learning curve effect.

• “That the more times a task has been performed, the less time will be required on each subsequent iteration.” [T.P. Wright, 1936]

• [Huntley, 2003]: Mozilla is reported to exhibit a strong learning curve compared to Apache.

• [Au et al., 2009]: Learning is universally present in OSS projects.
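As a rough illustration of the learning curve effect, the sketch below uses the power-law form commonly attributed to Wright, T_n = T_1 · n^b with b = log2(learning rate). The function name, the 80% learning rate, and the 10-day first task are illustrative assumptions, not values from this thesis.

```python
import math


def task_time(first_time, n, learning_rate=0.8):
    """Time needed for the n-th repetition under a power-law learning curve.

    Assumption: every doubling of the number of repetitions multiplies the
    time by `learning_rate` (0.8 means a 20% reduction per doubling).
    """
    if n < 1:
        raise ValueError("n must be >= 1")
    exponent = math.log2(learning_rate)  # negative when learning_rate < 1
    return first_time * n ** exponent


if __name__ == "__main__":
    # Hypothetical first task taking 10 days.
    for repetition in (1, 2, 4, 8):
        print(repetition, round(task_time(10.0, repetition), 2))
```

A learning rate closer to 1.0 means a flatter curve, that is, weaker learning.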


Page 4: Situated learning among open source software developers

Distinctions in this Thesis

• Data are taken from each individual instead of from an aggregation of individuals.

• More insight into individual characteristics.

• e.g., knowledge depreciation and team roles as factors that affect the learning process.

Page 5: Situated learning among open source software developers

Research Question 1: Is learning present in OSS developers?

Hypothesis 1: There is a relation between the accumulated experience and the performance.

Hypothesis 2: Knowledge depreciates over time among the OSS developers.

Hypothesis 3: Core developers resolve issues faster.

Research Question 2: What are the factors that affect learning?

Page 6: Situated learning among open source software developers

Case Study

• Google Chrome Project.

• Duration: 10 months ~ 10 releases (December 2008 - October 2009).

Page 7: Situated learning among open source software developers

Research Methodology

1. Data Collection: Issue Report Data, Review Interaction Data

2. Data Exploration: Experience, Performance, Team Role

3. Construct Input Data

4. Identification of Learning Curve Models and Data Fitting

Page 8: Situated learning among open source software developers

Research Methodology: Data Collection

Issue Report Data (5,160 entries)

Issue Report = [ID, Type, Area, Status, Owner, Open date, Assigned date, Started date, Close date]

1. Unrelated project areas, 2. Invalid issue status, 3. Empty owner name.
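A minimal sketch of how an issue report record and a cleaning step could look, assuming the three numbered items on this slide are exclusion criteria. The area names, status values, and the use of Unix timestamps for dates are hypothetical; the slides do not list them.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class IssueReport:
    # Field names mirror the Issue Report tuple on the slide.
    id: int
    type: str
    area: str
    status: str
    owner: Optional[str]
    open_date: int                 # assumption: Unix timestamp in seconds
    assigned_date: Optional[int]
    started_date: Optional[int]
    close_date: Optional[int]


# Hypothetical values; the real lists are not given in the slides.
RELEVANT_AREAS = {"BrowserUI", "WebKit"}
VALID_STATUSES = {"Fixed", "Verified"}


def keep(issue):
    """True if the issue passes all three (assumed) cleaning criteria."""
    has_owner = bool(issue.owner and issue.owner.strip())
    return (issue.area in RELEVANT_AREAS
            and issue.status in VALID_STATUSES
            and has_owner)


def clean(issues):
    """Drop issues from unrelated areas, with invalid status, or without owner."""
    return [issue for issue in issues if keep(issue)]
```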

Page 9: Situated learning among open source software developers

Research Methodology: Data Collection

Review Interaction Data (12,037 entries)

Interaction = [Owner, Reviewer, Comment date]

"ben","sky",1226700214
"ben","sky",1226706864
"ben","pkasting",1226707765
"mal","tony",1226809276
"sgk","tony",1226874776
"phajdan.jr","deanm",1227808551
"phajdan.jr","deanm",1227809341
"phajdan.jr","mark",1228496086
...
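The interaction records above can be read with a few lines of Python. This is only a sketch: it assumes the third field is a Unix timestamp in seconds, and the function names are illustrative rather than taken from the thesis tooling.

```python
import csv
from collections import Counter
from datetime import datetime, timezone
from io import StringIO

# A few rows copied from the sample on the slide.
SAMPLE = '''"ben","sky",1226700214
"ben","sky",1226706864
"ben","pkasting",1226707765
"mal","tony",1226809276
'''


def parse_interactions(text):
    """Return a list of (owner, reviewer, comment_datetime_utc) tuples."""
    rows = []
    for row in csv.reader(StringIO(text)):
        if not row:  # skip blank lines
            continue
        owner, reviewer, stamp = row
        when = datetime.fromtimestamp(int(stamp), tz=timezone.utc)
        rows.append((owner, reviewer, when))
    return rows


def tie_counts(rows):
    """Count how often each (owner, reviewer) pair interacted."""
    return Counter((owner, reviewer) for owner, reviewer, _ in rows)


if __name__ == "__main__":
    for pair, count in tie_counts(parse_interactions(SAMPLE)).items():
        print(pair, count)
```

The pair counts are one possible input for building the developer network used later to estimate team roles.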

Page 10: Situated learning among open source software developers

Research Methodology: Data Exploration

Sample = 274 developers

[Diagram: Issue Report Data yields two Developers × Releases matrices, Experience and Performance.]

Measure Experience: Number of resolved issues.

Measure Performance: Average of issue resolution time.
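The two measures can be sketched as a single pass over the cleaned issues. The input layout, the mapping of each issue to a release, and the choice of close date minus open date as resolution time are assumptions made for illustration; the slides do not specify them.

```python
from collections import defaultdict

SECONDS_PER_DAY = 86400.0


def explore(issues):
    """Build the Experience and Performance tables.

    issues: iterable of (owner, release, open_ts, close_ts) tuples, with
    timestamps in seconds (assumed layout).
    Returns two dicts keyed by (owner, release):
      experience  -> number of resolved issues
      performance -> average resolution time in days
    """
    durations = defaultdict(list)
    for owner, release, open_ts, close_ts in issues:
        durations[(owner, release)].append((close_ts - open_ts) / SECONDS_PER_DAY)

    experience = {key: len(days) for key, days in durations.items()}
    performance = {key: sum(days) / len(days) for key, days in durations.items()}
    return experience, performance
```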

Page 11: Situated learning among open source software developers

Research Methodology: Data Exploration

Sample = 274 developers

[Diagram: Review Interaction Data yields a Developers × Releases matrix of Team Role.]

Estimate Team Role: Core and periphery structure model [Borgatti, 1999]

• Core entails a dense, cohesive structure and periphery entails a sparse, loose structure.

• The estimation is performed by using UCINET.
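To make the core and periphery idea concrete, the sketch below scores a candidate core by correlating the observed ties with an idealized pattern in which any pair involving a core member is tied and periphery members are not tied to each other. This is one common formulation of the discrete model, written from scratch for a toy network; it does not reproduce UCINET's search procedure, and all names are illustrative.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def fitness(nodes, ties, core):
    """Correlation between observed ties and the ideal core/periphery pattern.

    ties: set of frozenset({a, b}) for each undirected tie.
    """
    observed, ideal = [], []
    for a, b in combinations(nodes, 2):
        observed.append(1 if frozenset((a, b)) in ties else 0)
        ideal.append(1 if (a in core or b in core) else 0)
    return pearson(observed, ideal)


def best_core(nodes, ties):
    """Brute-force search over all non-trivial core sets (toy sizes only)."""
    candidates = (set(group)
                  for size in range(1, len(nodes))
                  for group in combinations(nodes, size))
    return max(candidates, key=lambda core: fitness(nodes, ties, core))
```

The exhaustive search grows exponentially with the number of developers, so it is only meant for small examples.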

Page 12: Situated learning among open source software developers

Research Methodology: Construct Input Data

274 developers: not all of them work on the project long-term.

Refine: keep those who participated in at least 8 releases.

38 long-term contributors form the new longitudinal data sets.
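The refinement step amounts to a simple filter. In this sketch, "participating" in a release is assumed to mean being active in it at least once; the input layout and the function name are hypothetical.

```python
def long_term_contributors(activity, min_releases=8):
    """Select developers active in at least `min_releases` releases.

    activity: dict mapping developer name -> set of release numbers in
    which the developer was active (assumed layout).
    """
    return sorted(dev for dev, releases in activity.items()
                  if len(releases) >= min_releases)


if __name__ == "__main__":
    # Hypothetical activity records for three developers.
    example = {
        "ben": set(range(1, 11)),   # active in all 10 releases
        "sky": set(range(1, 9)),    # active in 8 releases
        "mal": {1, 2, 3},           # short-term contributor
    }
    print(long_term_contributors(example))
```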

Page 13: Situated learning among open source software developers

Input data set: Performance

The data distribution in the group of long-term developers

[Figure: average time of resolving issues (log days) per release.]

Page 14: Situated learning among open source software developers

Input data set: Experience

The data distribution in the group of long-term developers

[Figure: amount of resolved issues (N) per release.]

Page 15: Situated learning among open source software developers

Input data set: Team Role

The team composition in the group of long-term developers

[Figure: two-way split of the team per release. R1: 46% / 54%; R2: 39% / 61%; R3: 39% / 61%; R4: 45% / 55%; R5: 53% / 47%; R6: 47% / 53%; R7: 47% / 53%; R8: 42% / 58%; R9: 42% / 58%; R10: 39% / 61%.]

Page 16: Situated learning among open source software developers

Note

Research Methodology: Identification of Learning Curve Models and Data Fitting

Model 1:

Model 2:


Page 17: Situated learning among open source software developers

Result Summary

Hypothesis  Variable        Model 1    Model 2    Supported?
H1          KnowledgeStock  -0.01***   -0.01***   Yes
H2          Lambda          0.94***    0.94***    Yes
H3          TeamRole        NA         0.18       No

*** Statistically significant, p < 0.001
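To give a feel for the Lambda estimate, the sketch below assumes a geometric depreciation form in which the knowledge stock carried into a release is Lambda times the previous stock plus the newly gained experience. The slides do not spell out the model equations, so this form, the function names, and the example inputs are assumptions for illustration only.

```python
def knowledge_stock(new_experience, lam=0.94):
    """Knowledge stock per release under assumed geometric depreciation.

    new_experience: experience gained in each release (e.g. resolved issues).
    lam: retention factor; 1.0 means no depreciation at all.
    """
    stocks, stock = [], 0.0
    for gained in new_experience:
        stock = lam * stock + gained  # old knowledge decays, new is added
        stocks.append(stock)
    return stocks


def retained_fraction(lam, periods):
    """Share of a unit of knowledge still retained after `periods` releases."""
    return lam ** periods


if __name__ == "__main__":
    # Hypothetical developer resolving 10 issues in each of 3 releases.
    print(knowledge_stock([10, 10, 10]))
    print(retained_fraction(0.94, 10))
```

Under this assumed form, a Lambda of 0.94 means that roughly half of what was learned is still retained after ten releases.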

Page 18: Situated learning among open source software developers

Threats to Validity

Internal Validity

• The improvement in issue resolution might be caused by improvements in the system design.

• Some of the issue data are incomplete.

Construct Validity

• The estimation of Core and Periphery structure might not reflect the real situation. However, the communication pattern is the best indicator.

External Validity

• Both models have very low statistical prediction power (less than 5%).

Page 19: Situated learning among open source software developers

Conclusion

• I affirm that learning is present among open source software developers.

• Knowledge does not significantly depreciate in the Google Chrome team.

• It remains inconclusive whether core developers work faster than those in the periphery.

• Methodological contribution: A method to harvest and analyze data from code review.

Page 20: Situated learning among open source software developers

Thank you!

Bolzano, 8 October 2010

