co-evolving code-related and database-related changes in data intensive software system
DESCRIPTION
Current empirical studies on the evolution of software systems are primarily analyzing source code. Very few studies, however, focus on data-intensive software systems (DISS), in which a significant part of the total development effort is devoted to maintaining and evolving the database schema, typically through the use of a specific database management system and database technology (such as Hibernate). Adding new functionality to the system may affect the database structure and, conversely, modifying the database structure may impact the source code associated to it. Because of this, evolution of the DISS requires the source code and the database schema to co-evolve. Very little empirical studies have been carried out to study the co-evolution between source code changes and database schema changes in a DISS.TRANSCRIPT
Co-‐Evolving Code-‐Related and Database-‐Related Changes in aData-‐Intensive SoEware System
Mathieu Goeminne, Alexandre Decan, Tom MensService de Génie Logiciel, Université de Mons
FNRS Projet de Recherche “Data-‐Intensive SoEware System EvoluIon”in collaboraIon with A. Cleve and L. Meurice (Université de Namur)
hPp://informaIque.umons.ac.be/genlog/projects/disse
16 December 2013 -‐ BENEVOL, Mons
Context
• Focus on data-‐intensive so0ware systems (DISS)
• Expand empirical MSR research to include database-‐related acBviBes
• Study co-‐evoluBon between code and database
• Carry out empirical studies on open source DISS
2
16 December 2013 -‐ BENEVOL, Mons
Research QuesIons
• RQ1: Is there any relaIon between how source code files and database-‐related files evolve?
• RQ2: What is the effect of migraIng to new database technology?
• RQ3: How do developers divide their work and how does this evolve over Ime?
3
16 December 2013 -‐ BENEVOL, Mons
Case Study: OSCAR
• Canadian research network SCOOP– Social Collaboratory for Outcome Oriented Primary care
• hPp://scoop.leadlab.ca
• Open source tool infrastructure for Electronic Medical Records (EMR)
• hPp://github.com/scoophealth
• OSCAR: EMR system for healthcare– Support for billing, chronic disease management tools, prescripIon module, scheduling, ...• Data available on hPps://github.com/scoophealth/oscar.git
4
16 December 2013 -‐ BENEVOL, Mons
Case Study: OSCAR
5
characteris/c value
duraIon 3,939 days ( > 129 months)
dates from Nov 2002 Ill Aug 2013
number of commits 18,727
number of disInct files 20,718 (of which 54% code files)
number of file touches 93,721
number of disInct developers 100
16 December 2013 -‐ BENEVOL, Mons
EvoluIon of OSCAR
• Monthly aggregated proporIon of JSP and Java files in OSCAR
6
0%#
10%#
20%#
30%#
40%#
50%#
60%#
70%#
80%#
90%#
100%#
2003-07#
2004-01#
2004-07#
2005-01#
2005-07#
2006-01#
2006-07#
2007-01#
2007-07#
2008-01#
2008-07#
2009-01#
2009-07#
2010-01#
2010-07#
2011-01#
2011-07#
2012-01#
2012-07#
2013-01#
jsp#
java#
16 December 2013 -‐ BENEVOL, Mons
EvoluIon of OSCAR -‐ Social Dimension
• Monthly number of disInct acIve developers for OSCAR
7
0"
5"
10"
15"
20"
25"
2003'07"
2004'01"
2004'07"
2005'01"
2005'07"
2006'01"
2006'07"
2007'01"
2007'07"
2008'01"
2008'07"
2009'01"
2009'07"
2010'01"
2010'07"
2011'01"
2011'07"
2012'01"
2012'07"
2013'01"
16 December 2013 -‐ BENEVOL, Mons
EvoluIon of OSCAR
• Growth of source code files and database-‐related files
8
0"
1000"
2000"
3000"
4000"
5000"
6000"
2003)07"
2004)01"
2004)07"
2005)01"
2005)07"
2006)01"
2006)07"
2007)01"
2007)07"
2008)01"
2008)07"
2009)01"
2009)07"
2010)01"
2010)07"
2011)01"
2011)07"
2012)01"
2012)07"
2013)01"
pure"
sql"
16 December 2013 -‐ BENEVOL, Mons
EvoluIon of OSCAR -‐ Social Dimension
• How does the acIvity of developers evolve over Ime?
9
Develop
er
16 December 2013 -‐ BENEVOL, Mons
IntroducIon of Persistence Provider
• Hibernate (introduced in OSCAR since July 2006)– Java object-‐relaIonal mapping (ORM) library
• XML files map Java classes to database tables and Java data types to SQL data types
• facilitates data query and retrieval• generates SQL calls and relieves the developer from manual result set handling and object conversion
• JPA (introduced in OSCAR since July 2008)– Java Persistence API
– Uses Java annotaIons instead of XML files for ORM
10
16 December 2013 -‐ BENEVOL, Mons
IntroducIon of Persistence Provider
• SQL = code file containing embedded SQL query
• HIB = Java file targeted by Hibernate XML file
• JPA = Java file containing JPA annotaIon
• pure = code files not containing any of these
11
0%#10%#20%#30%#40%#50%#60%#70%#80%#90%#
100%#
2003-07-01#
2004-01-01#
2004-07-01#
2005-01-01#
2005-07-01#
2006-01-01#
2006-07-01#
2007-01-01#
2007-07-01#
2008-01-01#
2008-07-01#
2009-01-01#
2009-07-01#
2010-01-01#
2010-07-01#
2011-01-01#
2011-07-01#
2012-01-01#
2012-07-01#
2013-01-01#
2013-07-01#
JPA# HIB#
SQL# pure#
Monthly aggregated number of acIve files
0"
200"
400"
600"
800"
1000"
1200"
1400"
1600"
2003)07)01"
2003)11)01"
2004)03)01"
2004)07)01"
2004)11)01"
2005)03)01"
2005)07)01"
2005)11)01"
2006)03)01"
2006)07)01"
2006)11)01"
2007)03)01"
2007)07)01"
2007)11)01"
2008)03)01"
2008)07)01"
2008)11)01"
2009)03)01"
2009)07)01"
2009)11)01"
2010)03)01"
2010)07)01"
2010)11)01"
2011)03)01"
2011)07)01"
2011)11)01"
2012)03)01"
2012)07)01"
2012)11)01"
2013)03)01"
2013)07)01"
JPA" HIB" SQL"
16 December 2013 -‐ BENEVOL, Mons
IntroducIon of Persistence Provider
• Who is involved in introducing changes in database-‐related code?
12
Develop
er
16 December 2013 -‐ BENEVOL, Mons
EvoluIon of OSCAR -‐ Social Dimension
• How do developers divide their work?
13
Number of developers that introduce database-related code in some file for the first time
Java$(87)$ JSP$(86)$
OSCAR$developers$(100)$3"
24"
10" 10"
1" 0"
SQL$(53)$
HIB$
24"
JPA$9" 8" 11"
16 December 2013 -‐ BENEVOL, Mons
Preliminary Conclusions
• RQ1: Code-‐related and database-‐related files evolve together (no “phased” co-‐evoluIon)
• RQ2: MigraIon to Hibernate, then JPA, but embedded SQL sIll remains important
• RQ3: No clear separaIon of acIviIes between developers– The majority of developers changes both db-‐related and db-‐unrelated code
– No observed periods dedicated to a specific acIvity
14
16 December 2013 -‐ BENEVOL, Mons
Future Work
• ConInue studying co-‐evoluIon between code-‐related and db-‐related changes– Refine our results by analysing changes at finer granularity
• Analyse database schema changes and their impact on source code (collaboraIon with UNamur)– Detect change paPerns in code and database schema
• Study impact of introducing persistence providers– Analyse migraIon paPerns in code
– How do persistence providers reduce impact of changes in database schema?
• Study and compare with other DISS
15