A DATA-DRIVEN JOURNEY THROUGH RESEARCH ON SOFTWARE ENGINEERING

Mario Sangiorgio


Posted on 19 June 2015




MOTIVATION

Getting a better picture of what is going on in the software engineering research community through a quantitative approach


RELATED WORKS

• C. Ghezzi, keynote at ICSE 2008: Reflections on 40+ years of software engineering research and beyond

• L. Briand, keynote at ICSM 2011: Useful software engineering research: leading a double agent life

• D. Rosenblum, keynote at ASE 2012: Whither software engineering research?


SUBJECTS OF OUR STUDY

Researchers

Affiliations

Geographical areas

Research topics


DATA


ACADEMIC LITERATURE


SELECTED PUBLICATIONS

REPRESENTATIVENESS

AUTHORITATIVENESS


DATA SOURCES

• Articles published and their authors (complete XML database)

• Citations, authors and affiliation details (APIs)


COLLECTED DATA

Venue | Number of papers | From | To
TSE | 3043 | 1975 | 2012
TOSEM | 295 | 1992 | 2012
ICSE | 2907 | 1976 | 2012
ASE | 1116 | 1997 | 2012
ESEC/FSE | 416 | 1987 | 2012
TOTAL | 7777 | 1975 | 2012

9865 researchers, 278794 citations


ANALYSIS


AUTHOR ANALYSIS

Who published the most?

Are there sub-communities?


MOST PROLIFIC AUTHORS (number of papers)

Software Engineering (all venues) | ICSE | ASE | ESEC/FSE | TSE | TOSEM
Basili 60 | Boehm 28 | Xie 24 | Clarke 8 | Basili 33 | Notkin 13
Notkin 56 | Basili 26 | Grundy 18 | D. Jackson 8 | Briand 26 | Rothermel 8
Kramer 49 | Osterweil 23 | Hosking 16 | Ernst 7 | Weyuker 18 | Roman 6
Harrold 46 | Kramer 21 | Egyed 16 | Notkin 7 | Knight 17 | Wolf 6
Xie 46 | Notkin 21 | Lo 16 | Uchitel 7 | Kramer 16 | Harrold 6
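The slides do not describe how these counts were obtained. A minimal sketch, assuming each author receives one full credit per co-authored paper and that the data is available as a hypothetical list of (venue, authors) records, ranks authors per venue and across all venues (the "Software Engineering (all venues)" column):

```python
from collections import Counter, defaultdict

def top_authors(papers, k=5):
    """Rank the k most prolific authors per venue and over all venues.

    papers: iterable of (venue, [author, ...]) pairs, one per paper
    (a hypothetical input format). Each author gets one credit per paper.
    """
    per_venue = defaultdict(Counter)
    overall = Counter()
    for venue, authors in papers:
        for author in set(authors):  # count each author at most once per paper
            per_venue[venue][author] += 1
            overall[author] += 1
    ranking = {venue: counts.most_common(k) for venue, counts in per_venue.items()}
    ranking["Software Engineering"] = overall.most_common(k)
    return ranking

# Made-up records, for illustration only.
papers = [
    ("ICSE", ["Alice", "Bob"]),
    ("ICSE", ["Alice"]),
    ("TSE", ["Bob", "Carol"]),
    ("TSE", ["Bob"]),
]
print(top_authors(papers, k=2))
```

Full credit per paper keeps the counts integer, as in the table; the affiliation ranking later in the deck uses non-integer counts instead.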


SUB-COMMUNITY DETECTION

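For each venue, we consider its most prolific authors

We compute the set similarity (Jaccard index) between every pair of venues:

$$J(A,B) = \frac{|A \cap B|}{|A \cup B|}$$

where A and B are the sets of most prolific authors of the two venues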
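A minimal sketch of the pairwise comparison, using as author sets the five names per venue listed in the most-prolific-authors table (the study may have used a different cut-off; the helper names are hypothetical):

```python
from itertools import combinations

def jaccard(a, b):
    """Jaccard index |A ∩ B| / |A ∪ B| of two sets (1.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def venue_similarity(top_by_venue):
    """Jaccard similarity for every unordered pair of venues (keys in sorted order)."""
    return {
        (v, w): jaccard(top_by_venue[v], top_by_venue[w])
        for v, w in combinations(sorted(top_by_venue), 2)
    }

# Top five authors per venue, taken from the table of most prolific authors.
tops = {
    "ICSE": {"Boehm", "Basili", "Osterweil", "Kramer", "Notkin"},
    "ASE": {"Xie", "Grundy", "Hosking", "Egyed", "Lo"},
    "ESEC/FSE": {"Clarke", "D. Jackson", "Ernst", "Notkin", "Uchitel"},
    "TSE": {"Basili", "Briand", "Weyuker", "Knight", "Kramer"},
    "TOSEM": {"Notkin", "Rothermel", "Roman", "Wolf", "Harrold"},
}
sims = venue_similarity(tops)
print(sims[("ICSE", "TSE")])  # shared: Basili, Kramer -> 2 / 8 = 0.25
```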


SUB-COMMUNITIES

[Figure: two-dimensional multidimensional scaling (MDS) map of the venues TSE, TOSEM, ICSE, ASE and FSE; axes mds[,1] and mds[,2]]
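A map like this can be obtained, under assumptions, with classical (Torgerson) MDS applied to the dissimilarity 1 - J. The slides do not state which MDS variant or dissimilarity was used, and the similarity values below are made up for illustration:

```python
import numpy as np

def classical_mds(dist, dims=2):
    """Classical MDS: coordinates whose Euclidean distances approximate `dist`."""
    d2 = np.asarray(dist, dtype=float) ** 2
    n = d2.shape[0]
    j = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    b = -0.5 * j @ d2 @ j                    # double-centred Gram matrix
    vals, vecs = np.linalg.eigh(b)           # eigenvalues in ascending order
    order = np.argsort(vals)[::-1][:dims]    # keep the `dims` largest
    scale = np.sqrt(np.clip(vals[order], 0, None))
    return vecs[:, order] * scale            # one row of coordinates per venue

venues = ["TSE", "TOSEM", "ICSE"]
# Hypothetical Jaccard similarities between the venues' top-author sets.
sim = np.array([
    [1.0, 0.1, 0.25],
    [0.1, 1.0, 0.2],
    [0.25, 0.2, 1.0],
])
coords = classical_mds(1.0 - sim)
for venue, (x, y) in zip(venues, coords):
    print(f"{venue}: ({x:.3f}, {y:.3f})")
```

Venues whose top authors overlap more end up closer together on the map, which is what makes sub-communities visible.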


TOPIC ANALYSIS

What is the topic of a paper?

What are the hot topics in software engineering?

How have they evolved?


CITATION NETWORK

Papers in the dataset


CITATION NETWORK

Internal citations


CITATION NETWORK

Complete citations

Citations from specific venues


EXAMPLE

What is the topic of the yellow paper?


EXAMPLE

What is the topic of the yellow paper?

Topic | Direct citations
Topic A | 2
Topic B | 0
General | 1

What is the topic of the general paper?


EXAMPLE

What is the topic of the yellow paper?

Topic | Direct citations
Topic A | 2
Topic B | 1
General | 1

Topic profile

Topic A 66.7%

Topic B 33.3%
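The example suggests deriving a paper's topic profile from the topics of the papers it cites, resolving a cited "general" paper through its own citations. A minimal sketch under that reading; the data structures, the choice of a general paper's dominant topic, and the cycle handling are assumptions:

```python
from collections import Counter

def topic_profile(paper, cites, topic, _visiting=None):
    """Topic profile {topic: fraction} of `paper`, derived from the papers it cites.

    cites: dict mapping a paper to the list of papers it cites.
    topic: dict mapping a paper to its topic, or to "General" when it has none.
    A cited "General" paper is replaced by its own dominant topic, computed
    recursively; papers already being resolved are skipped to break cycles.
    """
    visiting = set() if _visiting is None else _visiting
    visiting.add(paper)
    counts = Counter()
    for cited in cites.get(paper, []):
        label = topic.get(cited, "General")
        if label == "General":
            if cited in visiting:
                continue  # citation cycle: cannot resolve this one
            sub = topic_profile(cited, cites, topic, visiting)
            if not sub:
                continue  # general paper with no resolvable citations
            label = max(sub, key=sub.get)  # dominant topic of the general paper
        counts[label] += 1
    visiting.discard(paper)
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()} if total else {}

# The slide example, assuming the general paper turns out to be about Topic B
# because it cites a Topic B paper (all identifiers are hypothetical).
cites = {"yellow": ["a1", "a2", "general"], "general": ["b1"]}
topic = {"a1": "Topic A", "a2": "Topic A", "b1": "Topic B", "general": "General"}
profile = topic_profile("yellow", cites, topic)
print(profile)  # Topic A: 2/3, Topic B: 1/3
```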


SOFTWARE ENGINEERING TOPICS

Topic | Fraction of papers
Programming Languages | 9.34%
Formal Methods | 8.49%
Software Reliability | 6.13%
Distributed Systems | 5.96%
Software Maintenance | 5.92%
Testing | 4.64%
Software Quality | 4.53%
Models | 4.36%
Software Architectures | 4.36%
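One plausible way to turn per-paper topic profiles into a "fraction of papers" per topic is fractional counting: each paper adds its profile weights to the topics, and the totals are divided by the number of papers. This is an assumption; the slides do not say whether papers were split fractionally or assigned to their dominant topic, and the profiles below are made up:

```python
from collections import defaultdict

def topic_fractions(profiles):
    """Fraction of papers per topic, counting each paper fractionally by its profile.

    profiles: list of {topic: weight} dicts whose weights sum to 1. Papers with
    an empty profile still count in the denominator.
    """
    if not profiles:
        return {}
    totals = defaultdict(float)
    for profile in profiles:
        for topic, weight in profile.items():
            totals[topic] += weight
    return {topic: total / len(profiles) for topic, total in totals.items()}

profiles = [
    {"Testing": 1.0},
    {"Testing": 0.5, "Formal Methods": 0.5},
    {"Formal Methods": 2 / 3, "Models": 1 / 3},
    {"Models": 1.0},
]
fractions = topic_fractions(profiles)
print(fractions)  # Testing: 1.5 / 4 = 0.375
```

Restricting `profiles` to the papers of a single decade gives per-decade tables like the ones that follow.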


TOPICS IN THE ’70S

Topic | Fraction of papers
Programming Languages | 16.71%
Performance | 7.95%
Operating Systems | 7.29%
Database Systems | 6.84%
Formal Methods | 6.65%
Software Architectures | 6.14%
Knowledge Engineering | 5.69%
Distributed Systems | 4.94%
Software Maintenance | 4.18%

Programming Languages is by far the most represented topic

Several topics come from other fields


TOPICS IN THE ’80S

Topic | Fraction of papers
Programming Languages | 10.48%
Distributed Systems | 9.30%
Knowledge Engineering | 8.47%
Software Reliability | 6.68%
Formal Methods | 6.51%
Information Systems | 5.55%
Software Maintenance | 5.04%
Models | 4.35%
Artificial Intelligence | 3.74%

Significant rise

Other fields, related to distributed systems

Not only code


TOPICS IN THE ’90S

Topic | Fraction of papers
Formal Methods | 8.29%
Programming Languages | 8.13%
Distributed Systems | 6.80%
Software Maintenance | 6.55%
Software Architectures | 5.34%
Software Quality | 4.80%
Knowledge Engineering | 4.67%
Models | 4.65%
Information Systems | 4.40%

Change of the most published topic: Formal Methods overtakes Programming Languages

Focus on software quality


TOPICS IN THE 2000S

Topic | Fraction of papers
Formal Methods | 9.93%
Programming Languages | 8.37%
Testing | 6.86%
Software Maintenance | 6.58%
Software Reliability | 6.22%
Software Quality | 5.72%
Models | 4.80%
Empirical Studies | 4.76%
Software Architectures | 4.38%

Analysis of open source repositories

Still a lot of emphasis on software quality


NEED FOR A FINER ANALYSIS

Topics change continuously, not once per decade

SOLUTION: a sliding window instead of a fixed subdivision into decades
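A sketch of the sliding-window idea, assuming per-paper topic profiles and a trailing window of `width` years ending at each year; the actual window width and alignment used for the plots are not stated in the slides:

```python
def sliding_window_share(papers, topic, width=5):
    """Share of `topic` among the papers published in the `width` years ending at each year.

    papers: list of (year, profile) pairs, where profile is {topic: weight}.
    Returns {year: share}; years whose window holds no papers are skipped.
    """
    if not papers:
        return {}
    years = [year for year, _ in papers]
    series = {}
    for end in range(min(years), max(years) + 1):
        window = [p for year, p in papers if end - width < year <= end]
        if window:
            series[end] = sum(p.get(topic, 0.0) for p in window) / len(window)
    return series

# Made-up papers, for illustration only.
papers = [
    (1990, {"Testing": 1.0}),
    (1991, {"Formal Methods": 1.0}),
    (1992, {"Testing": 0.5, "Models": 0.5}),
]
trend = sliding_window_share(papers, "Testing", width=2)
print(trend)  # {1990: 1.0, 1991: 0.5, 1992: 0.25}
```

Unlike fixed decades, every year gets its own estimate, so a rise or decline shows up when it happens rather than at a decade boundary.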


TESTING

[Figure: fraction of papers on testing over time, 1975-2005; y-axis from 0 to 0.18]


EMPIRICAL STUDIES

[Figure: fraction of papers on empirical studies over time, 1975-2005; y-axis from 0 to 0.18]


SERVICES

[Figure: fraction of papers on services over time, 1975-2005; y-axis from 0 to 0.18]


DISTRIBUTED SYSTEMS

[Figure: fraction of papers on distributed systems over time, 1975-2005; y-axis from 0 to 0.18]


PROGRAMMING LANGUAGES

[Figure: fraction of papers on programming languages over time, 1975-2005; y-axis from 0 to 0.18]


PER-VENUE INSIGHTS

Venue | Peculiarities
TSE | Biased towards empirical works
TOSEM | More focused on formal aspects
ICSE | Balanced with respect to the other venues
ESEC/FSE | Formal, with interests in testing, modeling and requirements engineering
ASE | Interests in program analysis and automated reasoning


AFFILIATION ANALYSIS

Where do the most prolific authors work?

How much research is done in industry?


AFFILIATION PROFILE

Author | Affiliation
Author A | 1
Author B | 2
Author C | 2

Affiliation profile

Affiliation 1 33.3%

Affiliation 2 66.7%
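The example splits a paper equally among its authors and credits each share to the author's affiliation. A minimal sketch under that assumption (one affiliation per author; multiple affiliations are not handled). Summing these shares over all papers yields fractional paper counts, which would be consistent with the non-integer figures in the affiliation ranking, although the slides do not confirm this:

```python
from collections import Counter, defaultdict

def affiliation_profile(author_affiliations):
    """Share of one paper credited to each affiliation.

    author_affiliations: one affiliation per author (hypothetical input format);
    every author contributes an equal share, so the shares sum to 1.
    """
    if not author_affiliations:
        return {}
    counts = Counter(author_affiliations)
    n = len(author_affiliations)
    return {aff: c / n for aff, c in counts.items()}

def papers_per_affiliation(papers):
    """Sum per-paper shares into fractional paper counts, largest first."""
    totals = defaultdict(float)
    for author_affiliations in papers:
        for aff, share in affiliation_profile(author_affiliations).items():
            totals[aff] += share
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)

# The slide example: Author A -> Affiliation 1, Authors B and C -> Affiliation 2.
profile = affiliation_profile(["Affiliation 1", "Affiliation 2", "Affiliation 2"])
# A second, made-up single-author paper to show the aggregation.
ranking = papers_per_affiliation([
    ["Affiliation 1", "Affiliation 2", "Affiliation 2"],
    ["Affiliation 1"],
])
print(profile)
print(ranking)  # Affiliation 1: 1/3 + 1 = 4/3, Affiliation 2: 2/3
```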


MOST PROLIFIC AFFILIATIONS

Affiliation | Papers
IBM | 186.32
Carnegie Mellon University | 166.52
University of Texas, Austin | 122.62
University of Maryland | 106.83
Microsoft | 101.63
AT&T Bell Laboratories | 101.37
University of California, Irvine | 98.17
Georgia Institute of Technology | 94.75
Massachusetts Institute of Technology | 93.24
University of Virginia | 81.55

ALL FROM THE USA


PER-VENUE INSIGHTS

Venue | Peculiarities
TSE | The venue with the largest industrial contribution
TOSEM | European universities are among the top contributors
ICSE | A balanced mix of the contributors seen in the other venues
ESEC/FSE | Despite the European ESEC, there is no bias towards Europe
ASE | Industrial contribution is less relevant; some affiliations appear only in its top list

Is Europe more formal? (TOSEM)

Is the industrial contribution linked to the presence of empirical works? (TSE)

ICSE is representative of the community


INDUSTRY VS ACADEMIA

[Figure: share of contributions from industry and academia over time, 1970-2005; y-axis from 0 to 1]


GEOGRAPHICAL ANALYSIS

Where does the contribution come from?


GEOGRAPHICAL AREAS

North America

Europe

Asia & Oceania

Africa

South America


LOCATION OF A PAPER

Affiliation profile

Affiliation 1 | 20%
Affiliation 2 | 30%
Affiliation 3 | 50%

Locations

Affiliation 1 | North America
Affiliation 2 | Europe
Affiliation 3 | Europe

Location profile

North America | 20%
Europe | 80%
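The location profile adds up the affiliation shares that fall in the same geographical area. A minimal sketch reproducing the example; the affiliation-to-area lookup table is hypothetical, and grouping unknown affiliations under "Unknown" is an assumption:

```python
from collections import defaultdict

def location_profile(affiliation_profile, location_of):
    """Collapse a paper's affiliation profile into a geographical-area profile.

    location_of: mapping affiliation -> area (hypothetical lookup table);
    affiliations without a known area are grouped under "Unknown".
    """
    areas = defaultdict(float)
    for affiliation, share in affiliation_profile.items():
        areas[location_of.get(affiliation, "Unknown")] += share
    return dict(areas)

# The slide example.
profile = {"Affiliation 1": 0.2, "Affiliation 2": 0.3, "Affiliation 3": 0.5}
location_of = {
    "Affiliation 1": "North America",
    "Affiliation 2": "Europe",
    "Affiliation 3": "Europe",
}
areas = location_profile(profile, location_of)
print(areas)  # North America: 0.2, Europe: 0.3 + 0.5 = 0.8
```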


GEOGRAPHICAL DISTRIBUTION

[Figure: share of contributions by geographical area (Europe, North America, South America, Asia & Oceania, Africa) over time, 1970-2005; y-axis from 0 to 1]


CONCLUSION

The academic literature contains a wealth of information about a scientific community

Data mining techniques can uncover it and yield interesting insights


QUESTIONS?