sociotechnical mining information from software …...6) change prediction 2014 marco aurélio...

41
Sociotechnical Information From Software Repositories UNIVERSITY OF SÃO PAULO, BRAZIL UFU November/2014 Marco Aurélio Gerosa

Upload: others

Post on 26-Jun-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Sociotechnical Mining Information From Software …...6) Change prediction 2014 Marco Aurélio Gerosa (gerosa@ime.usp.br) 36 Using social, process, and architectural metrics to improve

Mining Sociotechnical

Information From Software

Repositories

UNIVERSITY OF SÃO PAULO, BRAZIL

UFUNovember/2014

Marco Aurélio Gerosa

Page 2: Sociotechnical Mining Information From Software …...6) Change prediction 2014 Marco Aurélio Gerosa (gerosa@ime.usp.br) 36 Using social, process, and architectural metrics to improve

Repositories of repositories

2014 Marco Aurélio Gerosa ([email protected]) 2

https://github.com/about/presshttp://octoverse.github.com/

11.3 million repositories5.4 million usersIn 2013:• 3 million new users• 152 million pushes• 25 million comments• 14 million issues• 7 million pull requests

36K projectshttp://en.wikipedia.org/wiki/CodePlex

30K projectshttps://launchpad.net

324K projects3.4 million developers

http://sourceforge.net/apps/trac/sourceforge/wiki/What%20is%20SourceForge.net

250K projects

http://en.wikipedia.org/wiki/Comparison_of_open_source_software_hosting_facilities

93K projects1 million users

200 projectshttp://projects.apache.org/indexes/alpha.html

661K projects29 billion of lines of codes3 million users

33K projects

http://www.ohloh.net/

Page 3: Sociotechnical Mining Information From Software …...6) Change prediction 2014 Marco Aurélio Gerosa (gerosa@ime.usp.br) 36 Using social, process, and architectural metrics to improve

Mining

Data mining = “computational process of discovering patterns in large data sets”

2014 Marco Aurélio Gerosa ([email protected]) 3

http://2014.msrconf.org/

“The Mining Software Repositories (MSR) field analyzes the rich data available in software repositories to uncover interesting and actionable information about software systems and projects.”

Mining

Page 4: Sociotechnical Mining Information From Software …...6) Change prediction 2014 Marco Aurélio Gerosa (gerosa@ime.usp.br) 36 Using social, process, and architectural metrics to improve

Programmers of a given language are happier than the others?

Page 5: Sociotechnical Mining Information From Software …...6) Change prediction 2014 Marco Aurélio Gerosa (gerosa@ime.usp.br) 36 Using social, process, and architectural metrics to improve

Sentiment analysis on commits

2014 Marco Aurélio Gerosa ([email protected]) 5

http://geeksta.net/geeklog/exploring-expressions-emotions-github-commit-messages/

http://www.igvita.com/slides/2012/bigquery-github-strata.pdf

GitHub Data Challenge 2nd place

Page 6: Sociotechnical Mining Information From Software …...6) Change prediction 2014 Marco Aurélio Gerosa (gerosa@ime.usp.br) 36 Using social, process, and architectural metrics to improve

If a programmer knows a language, which others does she know?

Page 7: Sociotechnical Mining Information From Software …...6) Change prediction 2014 Marco Aurélio Gerosa (gerosa@ime.usp.br) 36 Using social, process, and architectural metrics to improve

Programming language relations

2014 Marco Aurélio Gerosa ([email protected]) 7

https://github.com/mjwillson/ProgLangVisualise

http://www.igvita.com/slides/2012/bigquery-github-strata.pdf

Page 8: Sociotechnical Mining Information From Software …...6) Change prediction 2014 Marco Aurélio Gerosa (gerosa@ime.usp.br) 36 Using social, process, and architectural metrics to improve

When are the bugs inserted?

Page 9: Sociotechnical Mining Information From Software …...6) Change prediction 2014 Marco Aurélio Gerosa (gerosa@ime.usp.br) 36 Using social, process, and architectural metrics to improve

Don’t program on Fridays

2014 Marco Aurélio Gerosa ([email protected]) 9

http://www.slideshare.net/taoxiease/software-analytics-towards-software-mining-that-matters

Page 10: Sociotechnical Mining Information From Software …...6) Change prediction 2014 Marco Aurélio Gerosa (gerosa@ime.usp.br) 36 Using social, process, and architectural metrics to improve

Which files are more buggy?

Page 11: Sociotechnical Mining Information From Software …...6) Change prediction 2014 Marco Aurélio Gerosa (gerosa@ime.usp.br) 36 Using social, process, and architectural metrics to improve

Which files are more buggy?

2014 Marco Aurélio Gerosa ([email protected]) 11

Page 12: Sociotechnical Mining Information From Software …...6) Change prediction 2014 Marco Aurélio Gerosa (gerosa@ime.usp.br) 36 Using social, process, and architectural metrics to improve

And what about social data?IS IT IMPORTANT FOR SOFTWARE ENGINEERING?

Page 13: Sociotechnical Mining Information From Software …...6) Change prediction 2014 Marco Aurélio Gerosa (gerosa@ime.usp.br) 36 Using social, process, and architectural metrics to improve

David Parnas

2014 Marco Aurélio Gerosa ([email protected]) 13

“Software Engineering = Multi-person development of multi-version programs”David Parnas

Page 14: Sociotechnical Mining Information From Software …...6) Change prediction 2014 Marco Aurélio Gerosa (gerosa@ime.usp.br) 36 Using social, process, and architectural metrics to improve

Frederick Brooks

2014 Marco Aurélio Gerosa ([email protected]) 14

Frederick Brooks

http://en.wikipedia.org/wiki/The_Mythical_Man-Month#Communication

“To avoid disaster, all the teams working on a project should remain in contact with each other in as many ways as possible”

Page 15: Sociotechnical Mining Information From Software …...6) Change prediction 2014 Marco Aurélio Gerosa (gerosa@ime.usp.br) 36 Using social, process, and architectural metrics to improve

Conway’s law

2014 Marco Aurélio Gerosa ([email protected]) 15

“Organizations which design systems are constrained to produce designs which are copies of the communication structures of these organizations”

Page 16: Sociotechnical Mining Information From Software …...6) Change prediction 2014 Marco Aurélio Gerosa (gerosa@ime.usp.br) 36 Using social, process, and architectural metrics to improve

Agile manifesto

2014 Marco Aurélio Gerosa ([email protected]) 16

“Individuals and interactions over processes and tools”

Page 17: Sociotechnical Mining Information From Software …...6) Change prediction 2014 Marco Aurélio Gerosa (gerosa@ime.usp.br) 36 Using social, process, and architectural metrics to improve

2014 Marco Aurélio Gerosa ([email protected]) 17

Social aspects matter!

Page 18: Sociotechnical Mining Information From Software …...6) Change prediction 2014 Marco Aurélio Gerosa (gerosa@ime.usp.br) 36 Using social, process, and architectural metrics to improve

Mining software repositories

2014 Marco Aurélio Gerosa ([email protected]) 18

Mining

Information about a project

Information about an ecosystem

Information about Software

Engineering

Decision making

Software understanding

Support maintenance

Empirical validation of ideas

& techniques

Collaboration and software production

Practitioner Researcher

Discussion listsComments on issuesCode commentsUser reportsQ&A sitesSocial media

Source code and artifacts

Issue trackersProject management systemsReputation systems

Applications

Tag cloud from MSR 2014 CFP

Page 19: Sociotechnical Mining Information From Software …...6) Change prediction 2014 Marco Aurélio Gerosa (gerosa@ime.usp.br) 36 Using social, process, and architectural metrics to improve

Coordination requirements

2014 Marco Aurélio Gerosa ([email protected]) 19

Cataldo, M., Dependencies in geographically distributed software development: Overcoming the limits of modularity, PhD Thesis

Santana, F. et a. “XFlow: An Extensible Tool for Empirical Analysis of Software Systems Evolution”. ESELAW 2011

Page 20: Sociotechnical Mining Information From Software …...6) Change prediction 2014 Marco Aurélio Gerosa (gerosa@ime.usp.br) 36 Using social, process, and architectural metrics to improve

Programmers who changed this function also changed …

2014 Marco Aurélio Gerosa ([email protected]) 20

Thomas Zimmermann, Peter Weissgerber, Stephan Diehl, and Andreas Zeller. 2005. Mining Version Histories to Guide Software Changes. IEEE Trans. Software Eng. 31, 6 (June 2005), 429-445. DOI=10.1109/TSE.2005.72 http://dx.doi.org/10.1109/TSE.2005.72

Page 21: Sociotechnical Mining Information From Software …...6) Change prediction 2014 Marco Aurélio Gerosa (gerosa@ime.usp.br) 36 Using social, process, and architectural metrics to improve

Some of our work

Page 22: Sociotechnical Mining Information From Software …...6) Change prediction 2014 Marco Aurélio Gerosa (gerosa@ime.usp.br) 36 Using social, process, and architectural metrics to improve

Change Coupling

Page 23: Sociotechnical Mining Information From Software …...6) Change prediction 2014 Marco Aurélio Gerosa (gerosa@ime.usp.br) 36 Using social, process, and architectural metrics to improve

Change coupling Files frequently changed together share some sort of dependency [Gall et al. 1998]

The concept has proven useful in several different studies:◦ Software quality analysis [Cataldo & Nambiar, 2010]◦ Bugs prediction [D’Ambros et al. , 2009a]◦ Change prediction and change impact analysis [Zimmermann et al., 2005]◦ Uncover cross-cutting concerns [Adams et al., 2010]◦ Uncover design flaws and opportunities for refactoring [D’Ambros et al.,

2009b]◦ Understand and evaluate software architecture [Zimmermann et al., 2003]◦ Requirements traceability [Ali et al., 2013]◦ Maintain documentation [Kagdi et al., 2006]

2014 Marco Aurélio Gerosa ([email protected]) 23

Strong change dependency from B to A (the opposite is a much weaker

dependency)

A logical dependency denotes an implicit and evolutionary relationship between software artifacts

Page 24: Sociotechnical Mining Information From Software …...6) Change prediction 2014 Marco Aurélio Gerosa (gerosa@ime.usp.br) 36 Using social, process, and architectural metrics to improve

1.1) Structural x Change coupling

2014 Marco Aurélio Gerosa ([email protected]) 24

Analysis of 150K commits of the ASF showed that:- 93% of the change dependencies did not involve structural dependencies- 95% of the structural dependencies did not imply in a change dependency

Oliva, G.A., Gerosa, M. A. (2011) “On the Interplay between Structural and Logical Dependencies in Free Software”. Brazilian Symposium on Software Engineering (SBES 2011)

Overlap between change dependencies and structural dependencies

Gustavo Oliva, PhD candidate

S t r u c t u r a lD e p e n d e n c y C o - c h a n g e s ?

Page 25: Sociotechnical Mining Information From Software …...6) Change prediction 2014 Marco Aurélio Gerosa (gerosa@ime.usp.br) 36 Using social, process, and architectural metrics to improve

1.2) Change coupling originsManual classification of commits to understand the origins of change coupling

2014 Marco Aurélio Gerosa ([email protected]) 25

CategoryJoint-

changesTotal %

Refactoring elements that belong to a same semantic class

80 19.6%

Structural dependencies on a changing semantic class

9 2.2%

Cross-cutting concerns 165 40.4%Overloaded revision 60 14.7%

Repository operations 21 5.1%Structural dependencies on

specific elements66 16.2%

Other reasons 7 1.7%Total 408  

Oliva, G.A., Santana, F., Gerosa, M. A., Souza, C. (2011) “Towards a Classification of Logical Dependencies Origins: A Case Study”. Proceedings of the 12th International Workshop on Principles of Software Evolution and the 7th annual ERCIM Workshop on Software Evolution (IWPSE-EVOL '11)

Gustavo Oliva, PhD candidate

Page 26: Sociotechnical Mining Information From Software …...6) Change prediction 2014 Marco Aurélio Gerosa (gerosa@ime.usp.br) 36 Using social, process, and architectural metrics to improve

1.3) Preprocessing commits

2014 Marco Aurélio Gerosa ([email protected]) 26

Evaluation in the Apache code repository showed that the produced grouping corresponded to 4.6% of the number of commits

What about commit habits/practices/policies? Social aspects matter!

Oliva, G. A., Santana, F., Gerosa, M. A., Souza, C. (2012), “Preprocessing Change-Sets to Improve Logical Dependencies Identification”, 6th Int. Workshop on Software Quality and Maintainability (SQM 2012)

Commit = change?Using the sliding time window approach [Zimmermann & Weißgerber, 2004] to group SVN commits

Gustavo Oliva, PhD candidate

Page 27: Sociotechnical Mining Information From Software …...6) Change prediction 2014 Marco Aurélio Gerosa (gerosa@ime.usp.br) 36 Using social, process, and architectural metrics to improve

How to identify design degradation?

Page 28: Sociotechnical Mining Information From Software …...6) Change prediction 2014 Marco Aurélio Gerosa (gerosa@ime.usp.br) 36 Using social, process, and architectural metrics to improve

2) Design degradation identification

2014 Marco Aurélio Gerosa ([email protected]) 28

Rigidity and fragility [Martin & Martin, 2006] identification based on commit metadata

Oliva, G., Steinmacher, I., Wiese, I.S., Gerosa, M.A. “What Can Commit Metadata Tell Us About Design Degradation?”, In: 13th International Workshop on Principles on Software Evolution (IWPSE 2013), Saint Petersburg.

Gustavo Oliva, PhD candidate

5.3

Rigidity => designs difficult to change due to ripple effects => commit density (number of changed files per commit) Fragility => designs break in different areas when a change is performed => commit dispersion (distance in the directory tree among file paths included in a commit)

Page 29: Sociotechnical Mining Information From Software …...6) Change prediction 2014 Marco Aurélio Gerosa (gerosa@ime.usp.br) 36 Using social, process, and architectural metrics to improve

Who are the key developers?

Page 30: Sociotechnical Mining Information From Software …...6) Change prediction 2014 Marco Aurélio Gerosa (gerosa@ime.usp.br) 36 Using social, process, and architectural metrics to improve

3) Key developers characterization

2014 Marco Aurélio Gerosa ([email protected]) 30

Key developers participation

Oliva, G., Santana, F.W., da Silva, J. T., Oliveira, K.C.M., Werner, C.M.L., Souza, C.R.B. & Gerosa, M.A., “Evolving the System’s Core: A Case Study on the Identification and Characterization of Key Developers in Apache Ant”, Computing and Informatics [to appear].

Page 31: Sociotechnical Mining Information From Software …...6) Change prediction 2014 Marco Aurélio Gerosa (gerosa@ime.usp.br) 36 Using social, process, and architectural metrics to improve

Do tests characteristics indicate the quality of the code under test?

Page 32: Sociotechnical Mining Information From Software …...6) Change prediction 2014 Marco Aurélio Gerosa (gerosa@ime.usp.br) 36 Using social, process, and architectural metrics to improve

4) Unit tests’ feedback for code quality

2014 Marco Aurélio Gerosa ([email protected]) 32

Number of asserts indicate - Cyclomatic complexity?- LOC ?- Method calls ?

22 ASF projects 3 industry projects

- “Asserted Objects” metric presents better results than “number of asserts”- Statistically difference in 20% of the projects

Mauricio Aniche, PhD candidate

Aniche, M., Oliva, G.A., Gerosa, M.A., “What Do the Asserts in a Unit Test Tell Us about Code Quality? A Study on Open Source and Industrial Projects”, 17th European Conference on Software Maintenance and Reengineering (CSMR 2013).

Page 33: Sociotechnical Mining Information From Software …...6) Change prediction 2014 Marco Aurélio Gerosa (gerosa@ime.usp.br) 36 Using social, process, and architectural metrics to improve

Does refactoring reduce cyclomatic complexity?

Page 34: Sociotechnical Mining Information From Software …...6) Change prediction 2014 Marco Aurélio Gerosa (gerosa@ime.usp.br) 36 Using social, process, and architectural metrics to improve

5) Refactoring

2014 Marco Aurélio Gerosa ([email protected]) 34

Most part of the documented refactoring does not reduce cyclomatic complexity. However, 23% of the documented refactoring reduce cyclomatic complexity while 12% of the other commits have the same effect.

Sokol, F., Aniche, M.F., Gerosa, M.A., “Does the Act of Refactoring Really Make Code Simpler? A Preliminary Study”,. In: IV Brazilian Workshop of Agile Methods (WBMA 2013).

Francisco Sokol, BSc

www.metricminer.org.br

Page 35: Sociotechnical Mining Information From Software …...6) Change prediction 2014 Marco Aurélio Gerosa (gerosa@ime.usp.br) 36 Using social, process, and architectural metrics to improve

Is it possible to predict changes, bugs, and change dependencies?

Page 36: Sociotechnical Mining Information From Software …...6) Change prediction 2014 Marco Aurélio Gerosa (gerosa@ime.usp.br) 36 Using social, process, and architectural metrics to improve

6) Change prediction

2014 Marco Aurélio Gerosa ([email protected]) 36

Using social, process, and architectural metrics to improve change proneness prediction

13 projects, 6 classifiers, 11 metrics

Social and architectural metrics improved the prediction model

Similar results when considering projects grouped by change ratio

Igor Wiese, PhD candidate

Wiese, I. S, Nassif Jr, Steinmacher I, Re Reginaldo, Gerosa, M.A “Comparing communication and development networks for predicting file change proneness: An exploratory study considering process and social metrics”,. In: International  Workshop on Software Quality and Maintainability (SQM 2014).

Page 37: Sociotechnical Mining Information From Software …...6) Change prediction 2014 Marco Aurélio Gerosa (gerosa@ime.usp.br) 36 Using social, process, and architectural metrics to improve

Why do newcomers dropout from OSS projects?

Page 38: Sociotechnical Mining Information From Software …...6) Change prediction 2014 Marco Aurélio Gerosa (gerosa@ime.usp.br) 36 Using social, process, and architectural metrics to improve

7) Supporting Newcomers to OSS projects

Analysis of newcomers dropout reasons

Use of MSR to identify newcomers

2014 Marco Aurélio Gerosa ([email protected]) 38

Steinmacher, I., Wiese, I.S., Chaves, A.P., Gerosa, M.A., Why do newcomers abandon open source software projects?, 6th Int. Workshop on Cooperative and Human Aspects of Software Engineering (CHASE 2013)

Igor Steinmacher, PhD candidate

Systematic review on awareness in DSD [JCSCW 2013]

C O M U N I C A Ç Ã O

C O O R D E N A Ç Ã OC O O P E R A Ç Ã O

g e r a c o m p r o m i s s o s g e r e n c i a d o s p e l a

o r g a n i z a a s t a r e f a s p a r a

d e m a n d aA w a r e n e s s

  

 

c o m u m + a ç ã oA ç ã o d e t o r n a r c o m u m

c o + o r d e m + a ç ã oA ç ã o d e o r g a n i z a r

e m c o n j u n t o

c o + o p e r a r + a ç ã oA ç ã o d e o p e r a r

e m c o n j u n t o

Page 39: Sociotechnical Mining Information From Software …...6) Change prediction 2014 Marco Aurélio Gerosa (gerosa@ime.usp.br) 36 Using social, process, and architectural metrics to improve

How to use MSR to support newcomers?

Page 40: Sociotechnical Mining Information From Software …...6) Change prediction 2014 Marco Aurélio Gerosa (gerosa@ime.usp.br) 36 Using social, process, and architectural metrics to improve

7) Supporting Newcomers to OSS projects◦ Systematic Literature Review on barriers for newcomers to OSS◦ Qualitative analysis of interviews

402014 Marco Aurélio Gerosa ([email protected])

Igor Steinmacher, PhD candidate

Steinmacher, I., Wiese, I., Conte, T., Gerosa, M.A., Redmiles, D.F. The Hard Life of Newcomers to OSS Projects, 7th International Workshop on Cooperative and Human Aspects of Software Engineering Steinmacher, I.; Graciotto Silva, M. A. ; GEROSA, M.A., Barriers faced by newcomers to open source projects: a systematic review. 10th International Conference on Open Source Systems (OSS 2014)

Page 41: Sociotechnical Mining Information From Software …...6) Change prediction 2014 Marco Aurélio Gerosa (gerosa@ime.usp.br) 36 Using social, process, and architectural metrics to improve

2014 Marco Aurélio Gerosa ([email protected]) 41