Process Mining Software Repositories
Master project kickoff presentation
Wouter Poncin, [email protected]
/ Department of Mathematics and Computer Science 19-04-2023
Agenda
• Introduction
• Existing approaches
• Project goal
• Prototype
• Design
• Current work
Introduction
• Software development teams
• Software repositories
• Analysis
Existing approaches
• NavTracks [Sin05]
• eROSE [Zim05]
• DynaMine [Liv05]
• MarmoSet [Spa05]
• projectWatcher [Gut04]
• Traceability links [Kag07]
• Improve bug finding [Wil05]
• Predict change [Yin04]
Existing approaches – multiple data sources
• Hipikat: recommends relevant software artifacts based on the current context of a developer [Čub05]
Images from: http://www.cs.ubc.ca/labs/spl/projects/hipikat/
Existing approaches – multiple data sources
• Alitheia Core: a platform for software engineering research [Gou09]
Images from: http://www.sqo-oss.org/
Existing approaches – multiple data sources
• Other approaches:
  • Wolf et al. [Wol09]: Mining task-based social networks to explore collaboration in software teams.
  • Bird et al. [Bir06]: Mining email social networks
  • Robles et al. [Rob05]: Developer identification methods for integrated data from various sources
Existing approaches – problems
• Mostly single data source
• Problems with multiple data source approaches:
  • Provide artifact centered view (Hipikat)
  • Focus on metric calculation (Alitheia Core)
• No analysis on global process overview
• Example analysis questions:
  − How does the real (mined) organizational model relate to the ‘used’ organizational model?
  − How to classify developers of open source projects? [Nak02]
  − Does the project follow a given development process model? (waterfall / XP / …)
Existing approaches – problems
• Mostly single data source
• No analysis on global process overview
• Solution: process mining
Intermezzo: process mining
Image from: http://prom.win.tue.nl/research/wiki/_detail/research/processmining.gif
Intermezzo: process mining
• Input: event log
• Output: models
Case ID | Task Name       | Event Type | Originator | Timestamp
1       | File Fine       | Completed  | Anne       | 20-07-2004 14:00:00
2       | File Fine       | Completed  | Anne       | 20-07-2004 15:00:00
1       | Send Bill       | Completed  | system     | 20-07-2004 15:05:00
2       | Send Bill       | Completed  | system     | 20-07-2004 15:07:00
3       | File Fine       | Completed  | Anne       | 21-07-2004 10:00:00
3       | Send Bill       | Completed  | system     | 21-07-2004 14:00:00
4       | File Fine       | Completed  | Anne       | 22-07-2004 11:00:00
4       | Send Bill       | Completed  | system     | 22-07-2004 11:10:00
1       | Process Payment | Completed  | system     | 22-07-2004 15:05:00
1       | Close Case      | Completed  | system     | 24-07-2004 15:06:00
2       | Send Reminder   | Completed  | Mary       | 20-08-2004 10:00:00
3       | Send Reminder   | Completed  | John       | 21-08-2004 10:00:00
2       | Process Payment | Completed  | system     | 22-08-2004 09:05:00
2       | Close case      | Completed  | system     | 22-08-2004 09:06:00
4       | Send Reminder   | Completed  | John       | 22-08-2004 15:10:00
4       | Send Reminder   | Completed  | Mary       | 22-08-2004 17:10:00
4       | Process Payment | Completed  | system     | 29-08-2004 14:01:00
4       | Close Case      | Completed  | system     | 29-08-2004 17:30:00
3       | Send Reminder   | Completed  | John       | 21-09-2004 10:00:00
3       | Send Reminder   | Completed  | John       | 21-10-2004 10:00:00
3       | Process Payment | Completed  | system     | 25-10-2004 14:00:00
3       | Close Case      | Completed  | system     | 25-10-2004 14:01:00
Example from: [Med09]
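To make the step from event log to model more concrete, the sketch below groups a few rows of the example log by case ID and counts which task directly follows which within a case. Counting directly-follows pairs is a common first step in process discovery, but this is only an illustration in Python and not how ProM is implemented. The function names `build_traces` and `directly_follows` are hypothetical, and only cases 1 and 4 of the example are included.

```python
from collections import Counter, defaultdict
from datetime import datetime

# Rows for cases 1 and 4 of the example log:
# (case id, task name, originator, timestamp as DD-MM-YYYY HH:MM:SS).
EVENTS = [
    (1, "File Fine", "Anne", "20-07-2004 14:00:00"),
    (1, "Send Bill", "system", "20-07-2004 15:05:00"),
    (4, "File Fine", "Anne", "22-07-2004 11:00:00"),
    (4, "Send Bill", "system", "22-07-2004 11:10:00"),
    (1, "Process Payment", "system", "22-07-2004 15:05:00"),
    (1, "Close Case", "system", "24-07-2004 15:06:00"),
    (4, "Send Reminder", "John", "22-08-2004 15:10:00"),
    (4, "Send Reminder", "Mary", "22-08-2004 17:10:00"),
    (4, "Process Payment", "system", "29-08-2004 14:01:00"),
    (4, "Close Case", "system", "29-08-2004 17:30:00"),
]


def build_traces(events):
    """Group events by case id and order each case by timestamp."""
    by_case = defaultdict(list)
    for case_id, task, _originator, stamp in events:
        when = datetime.strptime(stamp, "%d-%m-%Y %H:%M:%S")
        by_case[case_id].append((when, task))
    return {case: [task for _, task in sorted(rows)]
            for case, rows in by_case.items()}


def directly_follows(traces):
    """Count how often task a is immediately followed by task b."""
    counts = Counter()
    for tasks in traces.values():
        counts.update(zip(tasks, tasks[1:]))
    return counts


TRACES = build_traces(EVENTS)
FOLLOWS = directly_follows(TRACES)

for (first, second), count in FOLLOWS.most_common():
    print(f"{first} -> {second}: {count}")
```

A discovery algorithm would turn such counts into a process model. Here they already show that a reminder is optional and can repeat before payment.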
Project goal
• The goal of this project is to develop an application which facilitates process analysis of data from various software repositories, in an easy manner.
• Facilitate: export data to log
• Various repositories: combine data
• Various repositories: later add new types of data
• Easy manner: add a data source by URL
• Open source & closed source projects
Prototype
• Console application
• Input: repository URLs
• Output: MXML process log
• Analysis: ProM
• Simple developer matching
• High level events
• Case: originator
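As an illustration of what "Case: originator" could mean for the exported log, the following Python sketch writes high-level events into an MXML-style tree with one process instance per developer. It is not the prototype's code. The event data and developer names are made up, the function name `to_mxml` is hypothetical, and the element names (`WorkflowLog`, `Process`, `ProcessInstance`, `AuditTrailEntry`, `WorkflowModelElement`, `EventType`, `Timestamp`, `Originator`) follow the MXML structure as commonly described and should be checked against the schema shipped with ProM.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical high-level events: (developer, event name, ISO 8601 timestamp).
# These values are invented for the example.
EVENTS = [
    ("alice", "SVN revision", "2010-01-04T10:15:00"),
    ("bob", "TRAC ticket", "2010-01-04T11:00:00"),
    ("alice", "Mail (devel)", "2010-01-05T09:30:00"),
]


def to_mxml(events, process_id="repository-log"):
    """Build an MXML-style tree with one ProcessInstance per originator."""
    by_developer = defaultdict(list)
    for developer, name, stamp in events:
        by_developer[developer].append((stamp, name))

    root = ET.Element("WorkflowLog")
    process = ET.SubElement(root, "Process", id=process_id)
    for developer in sorted(by_developer):
        # Case = originator: the developer is the process instance.
        instance = ET.SubElement(process, "ProcessInstance", id=developer)
        # ISO 8601 strings in one format sort chronologically.
        for stamp, name in sorted(by_developer[developer]):
            entry = ET.SubElement(instance, "AuditTrailEntry")
            ET.SubElement(entry, "WorkflowModelElement").text = name
            ET.SubElement(entry, "EventType").text = "complete"
            ET.SubElement(entry, "Timestamp").text = stamp
            ET.SubElement(entry, "Originator").text = developer
    return root


ROOT = to_mxml(EVENTS)
print(ET.tostring(ROOT, encoding="unicode"))
```

With the developer as the case, a mined model describes how each individual moves between repositories (ticket, commit, mail) rather than how a single artifact evolves.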
Prototype
• Project: Gallery (web based photo gallery software) http://sourceforge.net/projects/gallery/
• Used data sources:
  • SVN repository (20740 revisions)
  • TRAC tickets (1028)
  • Mailing list archives: ‘devel’ (2867 messages), ‘translate’ (108 messages), ‘announce’ (69 messages)
Prototype – analysis
Legend:
- yellow: TRAC ticket
- white: SVN revision
- red: Mail (translations)
- blue: Mail (devel)
- green: Mail (announce)
Prototype – analysis
Legend:
- yellow: TRAC ticket
- white: SVN revision
- red: Mail (translations)
- blue: Mail (devel)
- green: Mail (announce)
Design
• Application requirements:
  • Support multiple data sources (software repositories)
  • Caching of data from data sources
  • Define data filters
  • Developer matching
  • Define mapping from data elements to log elements
  • Easy addition of new plugins for data source types / export types
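One possible way to read the plugin, filter and mapping requirements together is sketched below in Python. Everything here is an assumption for illustration: the `DataSource` interface, the `LogEvent` record, the `export` function and the in-memory ticket data are hypothetical and do not describe the actual application design.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass(frozen=True)
class LogEvent:
    """A log element: what happened, in which case, by whom, and when."""
    case_id: str
    task: str
    originator: str
    timestamp: str


class DataSource(ABC):
    """Hypothetical plugin interface: one subclass per repository type."""

    @abstractmethod
    def fetch(self):
        """Return raw records (dicts) from the repository or from a cache."""


class InMemoryTickets(DataSource):
    """Stand-in for a real ticket plugin; the records are made up."""

    def fetch(self):
        return [
            {"id": 1, "reporter": "alice", "status": "new",
             "time": "2010-01-04T10:00:00"},
            {"id": 1, "reporter": "bob", "status": "closed",
             "time": "2010-01-06T16:00:00"},
            {"id": 2, "reporter": "alice", "status": "new",
             "time": "2010-01-07T09:00:00"},
        ]


def ticket_to_event(record):
    """Mapping from data elements (ticket fields) to log elements."""
    return LogEvent(
        case_id=f"ticket-{record['id']}",
        task=f"ticket {record['status']}",
        originator=record["reporter"],
        timestamp=record["time"],
    )


def export(source, mapping, keep=lambda record: True):
    """Apply a data filter, then map each remaining record to a log event."""
    return [mapping(record) for record in source.fetch() if keep(record)]


OPEN_ONLY = export(InMemoryTickets(), ticket_to_event,
                   keep=lambda record: record["status"] != "closed")
for event in OPEN_ONLY:
    print(event)
```

Keeping the filter and the mapping as plain functions passed to `export` means a new data source type only has to implement `fetch`, which is one way to keep the addition of plugins easy.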
Design
• Issues
  • How to define a case
  • Level of granularity of events
  • How to define developer matching (manual/automatic)
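For the developer matching issue, an automatic first pass could merge identities that share an email address or a normalised full name and leave the remaining doubtful cases for manual review. The Python sketch below shows that heuristic under stated assumptions. It is not the matching used in the prototype, the identities are invented, and the names `keys_for` and `match_developers` are hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical identities as (name, email) pairs from different repositories.
IDENTITIES = [
    ("Jane Doe", "jane@example.org"),   # e.g. a mailing list sender
    ("jane.doe", ""),                   # e.g. a ticket reporter, no email
    ("jdoe", "JANE@example.org"),       # e.g. a commit author
    ("John Roe", "john@example.org"),
]


def keys_for(name, email):
    """Candidate matching keys: lower-cased email and normalised full name."""
    keys = set()
    if email:
        keys.add(email.strip().lower())
    normalised = re.sub(r"[^a-z]+", " ", name.lower()).strip()
    if " " in normalised:  # single-word names are too ambiguous to match on
        keys.add(normalised)
    return keys


def match_developers(identities):
    """Merge identities that share a key; returns sets of list indices."""
    parent = list(range(len(identities)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    first_seen = {}
    for index, (name, email) in enumerate(identities):
        for key in keys_for(name, email):
            if key in first_seen:
                parent[find(index)] = find(first_seen[key])
            else:
                first_seen[key] = index

    groups = defaultdict(set)
    for index in range(len(identities)):
        groups[find(index)].add(index)
    return sorted(groups.values(), key=min)


GROUPS = match_developers(IDENTITIES)
for group in GROUPS:
    print([IDENTITIES[i] for i in sorted(group)])
```

A heuristic like this can produce false matches (two people with the same name), which is one reason to keep a manual correction step next to the automatic one.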
Design
• Data sources to support:
  • Subversion
  • CVS
  • Git (used for jQuery / MooTools for example)
  • Bugzilla
  • TRAC
  • Wiki articles (+history)
  • SourceForge mailing lists
  • SourceForge thumbs up/down
  • Twitter
Design
• Analysis tools:
  • ProM: www.processmining.org (open source)
  • Futura Reflect: www.futuratech.nl
  • Interstage Business Process Manager
  • Fluxicon: www.fluxicon.com
  • And others…
Current work
• Finish application development
  • Developer matching
  • Case definition
  • Internal cache
  • Implement data source plugins
• Analyze projects
  • (Large) open source projects
    − Like Firefox, WordPress, FileZilla for example
  • SEP / student projects
References
• [Bir06] Bird, C., Gourley, A., Devanbu, P., Gertz, M., Swaminathan, A. Mining email social networks. In MSR '06: Proceedings of the 2006 international workshop on Mining software repositories, pages 137–143, New York, NY, USA, (2006). ACM.
• [Čub05] Čubranić, D., Murphy, G.C., Singer, J., Booth, K.S. Hipikat: A project memory for software development. IEEE Trans. Softw. Eng., 31(6):446–465, (2005).
• [Gou09] Gousios, G., Spinellis, D. Alitheia core: An extensible software quality monitoring platform. Software Engineering, International Conference on, pages 579–582, (2009).
• [Gut04] Gutwin, C., Penner, R., Schneider, K. Group awareness in distributed software development. In CSCW '04: Proceedings of the 2004 ACM conference on Computer supported cooperative work, pages 72–81, New York, NY, USA, (2004).
• [Kag07] Kagdi, H., Maletic, J.I., Sharif, B. Mining software repositories for traceability links. In ICPC '07: Proceedings of the 15th IEEE International Conference on Program Comprehension, pages 145–154, Washington, DC, USA, (2007). IEEE Computer Society.
• [Liv05] Livshits, B., Zimmermann, T. DynaMine: finding common error patterns by mining software revision histories. In ESEC/FSE-13: Proceedings of the 10th European software engineering conference held jointly with 13th ACM SIGSOFT international symposium on Foundations of software engineering, pages 296–305, New York, NY, USA, (2005). ACM.
• [Med09] Medeiros, A.K.A. de, Aalst, W.M.P. van der. Process mining towards semantics. pages 35–80, (2009).
• [Moc00] Mockus, A., Fielding, R.T., Herbsleb, J. A case study of open source software development: the apache server. In ICSE '00: Proceedings of the 22nd international conference on Software engineering, pages 263–272, New York, NY, USA, (2000). ACM.
References
• [Nak02] Nakakoji, K., Yamamoto, Y., Nishinaka, Y., Kishida, K., Ye, Y. Evolution patterns of open-source software systems and communities. In IWPSE '02: Proceedings of the International Workshop on Principles of Software Evolution, pages 76–85, New York, NY, USA, (2002). ACM.
• [Rob05] Robles, G., Gonzalez-Barahona, J.M. Developer identification methods for integrated data from various sources. In MSR '05: Proceedings of the 2005 international workshop on Mining software repositories, pages 1–5, New York, NY, USA, (2005). ACM.
• [Sin05] Singer, J., Elves, R., Storey, M. Navtracks: Supporting navigation in software maintenance. In ICSM '05: Proceedings of the 21st IEEE International Conference on Software Maintenance, pages 325–334, Washington, DC, USA, (2005). IEEE Computer Society.
• [Spa05] Spacco, J., Strecker, J., Hovemeyer, D., Pugh, W. Software repository mining with marmoset: an automated programming project snapshot and testing system. SIGSOFT Softw. Eng. Notes, 30(4):1–5, (2005).
• [Wil05] Williams, C.C., Hollingsworth, J.K. Automatic mining of source code repositories to improve bug finding techniques. IEEE Transactions on Software Engineering, 31(6):466–480, (2005).
• [Wol09] Wolf, T., Schröter, A., Damian, D., Panjer, L.D., Nguyen, T.H.D. Mining task-based social networks to explore collaboration in software teams. IEEE Softw., 26(1):58–66, (2009).
• [Yin04] Ying, A.T.T., Murphy, G.C., Ng, R., Chu-Carroll, M.C. Predicting source code changes by mining change history. IEEE Transactions on Software Engineering, 30(9), (2004).
• [Zim05] Zimmermann, T., Dallmeier, V., Halachev, K., Zeller, A. eROSE: guiding programmers in eclipse. In OOPSLA '05: Companion to the 20th annual ACM SIGPLAN conference on Object-oriented programming, systems, languages, and applications, pages 186–187, New York, NY, USA, (2005). ACM.