[ieee 2013 35th international conference on software engineering (icse) - san francisco, ca, usa...



YODA: Young and newcOmer Developer Assistant

Gerardo Canfora^1, Massimiliano Di Penta^1, Stefano Giannantonio^2, Rocco Oliveto^2, Sebastiano Panichella^1

^1 University of Sannio, Via Traiano, 82100 Benevento, Italy
^2 University of Molise, Contrada Fonte Lappone, 86090 Pesche (IS), Italy

{canfora,dipenta}@unisannio.it, [email protected], [email protected], [email protected]

Abstract: Mentoring project newcomers is a crucial activity in software projects, and it requires identifying people who have good communication and teaching skills in addition to high expertise on specific technical topics. In this demo we present Yoda (Young and newcOmer Developer Assistant), an Eclipse plugin that identifies and recommends mentors for newcomers joining a software project. Yoda mines developers’ communication (e.g., mailing lists) and project versioning systems to identify mentors, using an approach inspired by what ArnetMiner does when mining advisor/student relations. Then, it recommends appropriate mentors based on the specific expertise required by the newcomer. The demo shows Yoda in action, illustrating how the tool identifies and visualizes mentoring relations in a project, and suggests appropriate mentors for a developer who is going to work on certain source code files or on a given topic.

Demo URL: http://youtu.be/4yrbYT-LAXA

Index Terms: Developer Mentoring, Developer Recommenders, Mining Software Repositories.

I. INTRODUCTION

When a new developer (let us refer to her as Alice) joins a software project, she needs to become familiar with the software components she is going to work on. In the context of small projects or small organizations, Alice will easily find a person able to instruct her about the new project and, if she is also joining the organization for the first time, about other aspects of the development process, such as code and quality standards, tools to be adopted, etc.

Let us now consider a different context. Alice is joining a large (more than 1,000 developers), worldwide distributed open source project, to work on a very specific component, say a network protocol component. First, not everybody would have the specific skills to train Alice. Second, let us assume that Alice was able to get in touch with a senior project developer, Bob, who is skilled enough in network protocols. It turns out that Bob is not very good at training others. For this reason, and also considering that most of the communication occurs through emails because of the geographically distributed nature of the project, Alice does not benefit enough from the communication with Bob. Alice should have contacted Jim instead of Bob. While possessing less technical experience than Bob, Jim has already shown himself capable of successfully training people in the past. In summary, a good mentor is not just an expert, but also a person with very good communication skills and, above all, with a good attitude towards instructing other people.

Previous studies have indicated the importance of mentoring in software projects [1]. Also, a survey we conducted over five open source projects [2] revealed that developers perceive the mentoring activity as important or very important. More importantly, they indicated that, in such an activity, communication skills can be more important than technical expertise.

In order to help newcomers find suitable mentors (or to help software project managers recommend mentors to newcomers), we have defined Yoda (Young and newcOmer Developer Assistant), an approach to identify likely mentors in software projects by mining data from software repositories, and to support the project manager in recommending possible mentors when a newcomer joins a project [2].

The Yoda approach [2] consists of two phases. In the first phase, Yoda identifies good past project mentors, using heuristics inspired by those [3] that ArnetMiner^1 uses to identify advisor-student relationships in academic collaborations. In essence, Jim would be a good mentor for Alice if (i) Jim and Alice exchange a high number of emails, (ii) Jim has a larger communication activity than Alice, (iii) Jim joined the project before Alice, and (iv) during the first period of her activity, Alice mainly collaborated with Jim. In the second phase, based on a specific request for help from the newcomer, Yoda finds and recommends mentors, among those identified in the first phase, who are skilled on the requested topic. This is done using an IR-based approach similar to what Anvik et al. previously proposed for bug triaging [4].

Results of an empirical evaluation conducted on data from five open source projects (Apache httpd, the FreeBSD kernel, PostgreSQL, Python, and Samba) indicate that Yoda is able to identify mentors with a precision greater than 80% and to recommend mentors with a precision greater than 77% [2].

In this demo, we present an Eclipse plugin that implements the Yoda approach. Based on the project source code repository, its mailing list, and either a query the newcomer explicitly makes (e.g., “I need help on the network component protocol”) or a query inferred from the files opened by the newcomer, the plugin is able to recommend appropriate mentors.

Paper structure. Section II describes the Yoda approach. Section III introduces the Yoda Eclipse plugin and shows it in action on a concrete scenario taken from the Samba project. Section IV concludes the paper.

II. YODA: THE METHOD

This section summarizes the approach to identify and recommend mentors. Further details about the approach and its validation can be found in [2].

^1 http://arnetminer.org

978-1-4673-3076-3/13/$31.00 © 2013 IEEE. ICSE 2013, San Francisco, CA, USA. Formal Demonstrations.


A. Mentor Identification

As mentioned in the introduction, we first identify who, in the past, proved to be a good mentor in a project. To this aim, we identify all project newcomers over time, and all possible pairs of a developer and a project newcomer. Then, we rank these pairs using a score inspired by what ArnetMiner uses when recognizing advisors in academic collaborations. Specifically, the score (mentorship level) between newcomer nc and project member tm_k is defined as:

\[
ML(nc, tm_k) = \sum_{i=1}^{5} w_i \cdot f_i \qquad (1)
\]

where the factors f_i are defined as follows:

• f1: captures whether, after a newcomer joins a project, she mainly collaborates with a specific person. More precisely, it is the percentage of emails exchanged in the first period of activity of a newcomer (by default set to one month) with a given project member.

• f2: captures the fact that candidate mentors are by far more prolific, in terms of exchanged emails, than newcomers. It is measured as the difference between the number of emails sent/received by a candidate mentor and those sent/received by the newcomer.

• f3: captures the “age” difference between the newcomer and the candidate mentor, who is expected to have long experience in the project. It is measured as the difference in months between the date when the newcomer exchanged her first email in the project mailing list and the date when the candidate mentor did.

• f4: captures whether the mentor is one of the first people the newcomer collaborates with. It is defined as the time difference (in months) between the first email exchanged by a newcomer and the first email the newcomer exchanged with the likely mentor.

• f5: captures the technical activity performed by the candidate mentor, measured by the number of commits she has performed.
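As an illustration of how the email-based factors could be derived, the following is a minimal sketch for f1 and f3 only. It assumes a hypothetical in-memory email log of (sender, recipient, timestamp) tuples, approximates "one month" with 30 days, and returns f1 as a fraction rather than a percentage; none of these choices come from the Yoda implementation.

```python
from datetime import datetime, timedelta

# Hypothetical email log: (sender, recipient, timestamp). Illustrative data only.
EMAILS = [
    ("jim", "list", datetime(2010, 1, 10)),
    ("alice", "jim", datetime(2012, 3, 1)),
    ("jim", "alice", datetime(2012, 3, 2)),
    ("alice", "bob", datetime(2012, 3, 5)),
    ("bob", "alice", datetime(2012, 5, 1)),
]


def first_email_date(emails, person):
    """Date of the first email sent or received by `person`."""
    return min(ts for sender, recipient, ts in emails
               if person in (sender, recipient))


def f1_early_collaboration(emails, newcomer, member, period_days=30):
    """Fraction of the newcomer's emails in her first period of activity
    that were exchanged with `member` (assumed 30-day window)."""
    start = first_email_date(emails, newcomer)
    end = start + timedelta(days=period_days)
    early = [(s, r) for s, r, ts in emails
             if newcomer in (s, r) and start <= ts < end]
    if not early:
        return 0.0
    with_member = [pair for pair in early if member in pair]
    return len(with_member) / len(early)


def f3_seniority_months(emails, newcomer, member):
    """Approximate difference in months (30-day months, an assumption)
    between the first emails of the newcomer and of the candidate mentor."""
    delta = first_email_date(emails, newcomer) - first_email_date(emails, member)
    return delta.days / 30.0


if __name__ == "__main__":
    print(f1_early_collaboration(EMAILS, "alice", "jim"))
    print(f3_seniority_months(EMAILS, "alice", "jim"))
```

In this toy log, two of Alice's three emails in her first 30 days involve Jim, so Jim scores higher on f1 than Bob does.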

To allow aggregating the five factors, we normalize them to the interval [0, 1]. The aggregation into the score of equation (1) weights the five factors using coefficients configurable by the user. A previous empirical analysis [2] showed that the setting (w1 = 0.5, w2 = 0.25, w3 = 0.25, w4 = 0, w5 = 0) provides the best performance for the projects mentioned in the introduction. Notably, the technical activity (f5) is, per se, not essential for identifying candidate mentors.
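The aggregation of equation (1) can be sketched as follows. The text states only that the factors are normalized to [0, 1]; min-max normalization across the candidates of one newcomer is an assumption made here for illustration, and the raw factor values are invented.

```python
# Weights reported as best-performing in [2]: (w1, w2, w3, w4, w5).
DEFAULT_WEIGHTS = (0.5, 0.25, 0.25, 0.0, 0.0)


def min_max_normalize(values):
    """Scale values to [0, 1]; min-max scaling is an assumed choice."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def mentorship_levels(raw_factors, weights=DEFAULT_WEIGHTS):
    """Compute ML(nc, tm_k) for every candidate tm_k of one newcomer.

    raw_factors maps a candidate name to a tuple of five raw factor values
    (f1..f5). Returns a dict mapping each candidate to its weighted score.
    """
    names = list(raw_factors)
    # Normalize each factor column independently across candidates.
    columns = [
        min_max_normalize([raw_factors[name][i] for name in names])
        for i in range(len(weights))
    ]
    return {
        name: sum(w * columns[i][j] for i, w in enumerate(weights))
        for j, name in enumerate(names)
    }


# Invented raw factor values for three hypothetical candidates.
RAW = {
    "jim": (0.6, 400, 26, 0, 120),
    "bob": (0.3, 900, 40, 1, 800),
    "carol": (0.1, 100, 5, 2, 50),
}

if __name__ == "__main__":
    for name, score in mentorship_levels(RAW).items():
        print(name, round(score, 3))
```

With these invented values, Jim outranks Bob even though Bob has far more commits, because f5 carries zero weight in the default setting.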

Then, instead of cutting the ranked list using an arbitrary threshold, we use a scaled threshold t [5] based on the values of the factors computed considering the newcomer nc and the top developer in the ranked list: t_nc = λ · TOP_nc, where TOP_nc is the value of ML computed between the newcomer nc and the top developer in the list, and λ ∈ [0, 1]. Although λ can be calibrated, we found that λ = 0.5 ensures, at least on our dataset, a precision above 60%.
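The scaled threshold can be illustrated with a short sketch. The function name and the choice to keep candidates whose score is greater than or equal to the threshold (rather than strictly greater) are assumptions; the scores are invented.

```python
def cut_by_scaled_threshold(scores, lam=0.5):
    """Keep candidates whose ML score reaches t_nc = lam * TOP_nc.

    scores maps candidate names to ML values for a single newcomer.
    Returns the surviving (name, score) pairs, best first.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    if not scores:
        return []
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    top_score = ranked[0][1]          # TOP_nc
    threshold = lam * top_score       # t_nc
    return [(name, score) for name, score in ranked if score >= threshold]


if __name__ == "__main__":
    example = {"jim": 0.74375, "bob": 0.7, "carol": 0.2}
    print(cut_by_scaled_threshold(example))
```

Because the threshold scales with the top score, the cut adapts to newcomers whose candidates all have low absolute scores.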

[Figure: diagram with the elements Versioning system, Mailing list (developers' communication), Factbase, Eclipse + Yoda, Developer (writes code, asks for help), and Project Manager (browses info about developers, recommends mentor); Yoda recommends mentors.]

Fig. 1. Yoda in Eclipse: information flows.

B. Mentor Recommendation

The second phase of the approach aims at selecting mentors, among the candidate ones, that better fit the skills/knowledge required by a specific newcomer. Yoda relies on an approach largely inspired by bug triaging approaches [4], [6], [7], which assign a bug to a developer who in the past worked on a bug having a (textually) similar bug report.

We instantiate an Information Retrieval (IR) [8] process to rank the available mentors, where each document d_i, with i = 1, ..., n, consists of the union of the text of all emails exchanged in the past by a candidate mentor tm_i, while the query q_nc is a request for help submitted by the newcomer nc. Both the query and the documents are processed by applying the various phases of an IR process (stop word removal, Porter stemming, tf-idf weighting schema), and then the query is compared with each d_i using an asymmetric Dice similarity measure [8], which is appropriate for cases like this one, where the document is by far longer than the query.
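The ranking step can be approximated with the simplified sketch below. It is not the Yoda implementation: Porter stemming is omitted, the stop word list is a tiny illustrative one, and the score (idf-weighted share of query terms found in a mentor's corpus, normalized by the query side only) is an assumed stand-in for the asymmetric Dice measure of [8], whose exact formula is not given here.

```python
import math
import re

# Tiny illustrative stop word list (an assumption, not Yoda's list).
STOP_WORDS = {"the", "a", "an", "and", "on", "i", "is", "when",
              "please", "need", "help", "of", "to", "in"}


def tokenize(text):
    """Lowercase, split on non-letters, and drop stop words (no stemming)."""
    return [t for t in re.findall(r"[a-z]+", text.lower())
            if t not in STOP_WORDS]


def inverse_document_frequencies(doc_terms):
    """idf per term over the mentor corpora: log(N / df) + 1."""
    n_docs = len(doc_terms)
    df = {}
    for terms in doc_terms.values():
        for term in terms:
            df[term] = df.get(term, 0) + 1
    return {term: math.log(n_docs / count) + 1.0 for term, count in df.items()}


def rank_mentors(corpora, query):
    """Rank mentors by the weighted share of query terms in their corpus.

    corpora maps a mentor name to the concatenated text of her emails.
    Terms unseen in every corpus get the idf of a term with df = 1.
    """
    doc_terms = {name: set(tokenize(text)) for name, text in corpora.items()}
    idf = inverse_document_frequencies(doc_terms)
    unseen_idf = math.log(len(corpora)) + 1.0
    weights = {}
    for term in tokenize(query):
        weights[term] = weights.get(term, 0.0) + idf.get(term, unseen_idf)
    total = sum(weights.values())
    scores = {}
    for name, terms in doc_terms.items():
        shared = sum(w for term, w in weights.items() if term in terms)
        scores[name] = shared / total if total else 0.0
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


CORPORA = {
    "jim": "The network protocol handshake fails when the packet is fragmented",
    "bob": "Please update the build scripts and the documentation",
}

if __name__ == "__main__":
    print(rank_mentors(CORPORA, "I need help on the network protocol component"))
```

Normalizing by the query side alone keeps a long email corpus from being penalized for its length, which is the motivation the text gives for an asymmetric measure.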

III. INTEGRATING YODA IN ECLIPSE

Fig. 1 depicts the flow of information in the Yoda Eclipse plugin. Specifically, Yoda extracts communication information by parsing mailing lists downloaded from the project mailing list archives, and retrieves information about changes performed by developers from the commit logs of the versioning system. Such information is stored in a fact database, and is used to identify a list of candidate mentors to be recommended to newcomers/project managers.

To use Yoda, the developer (or project manager) first sets some preferences (including various Yoda parameters). Then, as shown in Fig. 2, she can mine data from mailing lists and from the versioning system to identify mentoring relationships in the past project history.

In the following, we describe how Yoda can be used from two perspectives: (i) that of a project manager, who wants to help various newcomers in the project by recommending appropriate mentors and who, in general, is interested in monitoring project collaborations to understand who is assuming a leadership role; and (ii) that of a project newcomer, who is willing to contribute to a feature involving some specific source code files, and needs some support for her work. Our demonstration is based on data from the Samba project. For privacy reasons we have anonymized last names in the screenshots. A more detailed description of Yoda at work is provided in a video available online^2.

Fig. 2. Mining candidate mentors from software repositories.

Concerning the first perspective, the project manager, by selecting “Mentorship graph” from the Yoda menu, can access a tab with a list of developers (Fig. 3(a)), and select a developer whose role in the project she wants to understand. The list shown in this tab already indicates whether the developer played the role of mentor, mentoree, or both. After selecting a developer, it is possible to visualize a collaboration graph (Fig. 3(b)) that starts from the selected developer and represents mentor/mentoree relationships up to a given distance set in the Yoda preferences, e.g., only direct relations if the distance is one, mentors/mentorees of developers mentored by the selected developer if the distance is two, etc. For a given developer, green arrows are directed towards mentorees, while red arrows are directed towards mentors. In our example of Fig. 3(b), we can see a graph starting from Stefan M., and representing relations towards his mentors (Andrew B. and Jelmer V.) and towards his mentorees (Sam L. and Anatoliy A.). Note that, for the sake of simplicity, the opposite relations (e.g., the red arrow between Sam L. and Stefan M.) are not shown.
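The distance-bounded view of the mentorship graph can be sketched as a breadth-first traversal. The data structure and function name are hypothetical, following relations in both directions (towards mentors and mentorees) is an assumption, and "Hypothetical D." is an invented developer added only to show a distance-two relation.

```python
from collections import deque


def mentorship_neighbourhood(mentor_of, start, max_distance):
    """Developers reachable from `start` within `max_distance` mentoring
    relations, following edges towards both mentors and mentorees.

    mentor_of maps a mentor to the set of her mentorees.
    Returns a dict mapping each reached developer to her distance.
    """
    # Build an undirected adjacency view of the mentor -> mentoree edges.
    neighbours = {}
    for mentor, mentorees in mentor_of.items():
        for mentoree in mentorees:
            neighbours.setdefault(mentor, set()).add(mentoree)
            neighbours.setdefault(mentoree, set()).add(mentor)

    distances = {start: 0}
    queue = deque([start])
    while queue:
        developer = queue.popleft()
        if distances[developer] == max_distance:
            continue  # do not expand beyond the configured distance
        for other in sorted(neighbours.get(developer, ())):
            if other not in distances:
                distances[other] = distances[developer] + 1
                queue.append(other)
    return distances


# Relations from the Fig. 3(b) example, plus one invented mentoree.
MENTOR_OF = {
    "Andrew B.": {"Stefan M."},
    "Jelmer V.": {"Stefan M."},
    "Stefan M.": {"Sam L.", "Anatoliy A."},
    "Sam L.": {"Hypothetical D."},
}

if __name__ == "__main__":
    print(mentorship_neighbourhood(MENTOR_OF, "Stefan M.", 1))
    print(mentorship_neighbourhood(MENTOR_OF, "Stefan M.", 2))
```

With distance one, only the direct mentors and mentorees of the selected developer appear; raising the distance to two also reaches the invented mentoree of Sam L.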

Clicking on a specific developer (Stefan M. in our case) opens a new tab (Fig. 3(c)) from which it is possible to analyze detailed social and technical information about the developer. Specifically, it is possible to know when the developer entered the project, the number of commits performed over different years, and the list of her mentors/mentorees. If needed, the tool also allows sending an email to this developer.

(a) Selecting a developer

(b) Mentorship graph

(c) Detailed information about a specific developer

Fig. 3. How Yoda (a) shows mentorship relations in a project and (b) allows browsing information about a developer and getting in touch with him.

^2 http://distat.unimol.it/tools/YODA/

Let us now see how Yoda can be used from a newcomer’s perspective. Here, Yoda allows a developer to formulate a request for help in two ways: (i) an implicit query, based on the context (i.e., the source code files) the newcomer is working on or is interested in; or (ii) an explicit query, i.e., a natural language sentence expressing the need for help on a particular topic, component, etc.

As for the implicit query, let us suppose a developer is working on a specific source code file, say auth_sam_reply.c. By right-clicking (Fig. 4(a)) on the file name, and by selecting “Show mentor” from the Yoda menu, it is possible to identify developers that have enough expertise on such a file. The edge between the file and the developer is labeled with a confidence value (between 0 and 1) corresponding to the Dice similarity between the (implicit) query and the developer corpus. Among the various developers, Yoda highlights, with a different (green) edge and a different icon, the people who, besides being experts on the file, also proved to be good mentors in the past.

As for the explicit query, let us suppose a developer has a specific request for help. By selecting “Find mentor” from the Yoda menu, it is possible to submit a request for help and identify developers that have enough expertise for such a request (Fig. 5). Specifically, once the request for help has been submitted, Yoda provides the newcomer with the list of candidate mentors ranked according to the Dice similarity between the query and the developer corpus.

(a) Seeking experts about specific source code files

(b) Getting available mentors with the desired expertise

Fig. 4. Mentor recommendation using an implicit query based on the context which the developer is working on.

IV. CONCLUSIONS AND WORK-IN-PROGRESS

This demo presented Yoda, an Eclipse plugin aimed at recommending mentors in software projects. The tool is based on an approach [2] that uses information from mailing lists and versioning systems to identify who, in the past, turned out to be a good mentor, similarly to how ArnetMiner recognizes student/advisor relationships. Then, it recommends mentors with a specific expertise using an approach similar to IR-based bug triaging [4]. The tool can be used by developers to get context-sensitive help, or by project managers to understand the role played by various developers in the project, and to recommend mentors to project newcomers.

Fig. 5. Explicit (natural language) request for mentor.

Future work aims at enhancing Yoda with additional features, above all the possibility of supporting project managers in building teams. We also plan to integrate an instant messaging system into Yoda to support synchronous communication between project developers, in addition to asynchronous communication.

REFERENCES

[1] B. Dagenais, H. Ossher, R. K. E. Bellamy, M. P. Robillard, and J. de Vries, “Moving into a new software project landscape,” in Proceedings of the 32nd ACM/IEEE International Conference on Software Engineering. Cape Town, South Africa: ACM Press, 2010, pp. 275–284.

[2] G. Canfora, M. Di Penta, R. Oliveto, and S. Panichella, “Who is going to mentor newcomers in open source projects?” in 20th ACM SIGSOFT Symposium on the Foundations of Software Engineering (FSE-20), Cary, NC, USA, November 11-16, 2012. Research Triangle Park, NC, USA: ACM Press, 2012, p. 44.

[3] C. Wang, J. Han, Y. Jia, J. Tang, D. Zhang, Y. Yu, and J. Guo, “Mining advisor-advisee relationships from research publication networks,” in Proceedings of the 16th International Conference on Knowledge Discovery and Data Mining. Washington, DC, USA: ACM, 2010, pp. 203–212.

[4] J. Anvik and G. C. Murphy, “Reducing the effort of bug report triage: Recommenders for development-oriented decisions,” ACM Transactions on Software Engineering and Methodology, vol. 20, no. 3, p. 10, 2011.

[5] G. Antoniol, G. Canfora, G. Casazza, A. De Lucia, and E. Merlo, “Recovering traceability links between code and documentation,” IEEE Transactions on Software Engineering, vol. 28, no. 10, pp. 970–983, 2002.

[6] G. Canfora and L. Cerulo, “Supporting change request assignment in open source development,” in Proceedings of the 2006 ACM Symposium on Applied Computing. Dijon, France: ACM Press, 2006, pp. 1767–1772.

[7] A. Tamrawi, T. T. Nguyen, J. M. Al-Kofahi, and T. N. Nguyen, “Fuzzy set and cache-based approach for bug triaging,” in Proceedings of the 19th Symposium on the Foundations of Software Engineering and 13th European Software Engineering Conference. Szeged, Hungary: ACM Press, 2011, pp. 365–375.

[8] R. Baeza-Yates and B. Ribeiro-Neto, Modern Information Retrieval. Addison-Wesley, 1999.
