entagrec: an enhanced tag recommendation system for software information sites

27
EnTagRec: An Enhanced Tag Recommendation System for Software Information Sites Shaowei Wang, David Lo, Bogdan Vasilescu, Alexander Serebrenik @b_vasilescu @aserebrenik

Upload: alexander-serebrenik

Post on 29-May-2015

325 views

Category:

Science


7 download

DESCRIPTION

Software engineers share experiences with modern technologies by means of software information sites, such as Stack Overflow. These sites allow developers to label posted content, referred to as software objects, with short descriptions, known as tags. However, tags assigned to objects tend to be noisy and some objects are not well tagged. To improve the quality of tags in software information sites, we propose EnTagRec, an automatic tag recommender based on historical tag assignments to software objects and we evaluate its performance on four software information sites, StackOverflow, AskUbuntu, AskDifferent and FreeCode. We observe that that EnTagRec achieves Recall@5 scores of 0.805, 0.815, 0.88 and 0.64, and Recall@10 scores of 0.868, 0.876, 0.944 and 0.753, on StackOverflow, AskUbuntu, AskDifferent and FreeCode, respectively. In terms of Recall@5 and Recall@10, averaging across the 4 datasets, EnTagRec improves TagCombine, which is the state of the art approach, by 27.3\% and 12.9\% respectively.

TRANSCRIPT

Page 1: EnTagRec: An Enhanced Tag Recommendation System for Software Information Sites

EnTagRec: An Enhanced Tag Recommendation System for Software Information Sites

Shaowei Wang, David Lo,

Bogdan Vasilescu, Alexander Serebrenik

@b_vasilescu @aserebrenik

Page 2: EnTagRec: An Enhanced Tag Recommendation System for Software Information Sites

/ department of mathematics and computer science Page 212-04-2023

Page 3: EnTagRec: An Enhanced Tag Recommendation System for Software Information Sites

/ department of mathematics and computer science Page 312-04-2023

Page 4: EnTagRec: An Enhanced Tag Recommendation System for Software Information Sites

/ department of mathematics and computer science Page 412-04-2023

Page 5: EnTagRec: An Enhanced Tag Recommendation System for Software Information Sites

/ department of mathematics and computer science Page 512-04-2023

Page 6: EnTagRec: An Enhanced Tag Recommendation System for Software Information Sites

/ department of mathematics and computer science Page 612-04-2023

Page 7: EnTagRec: An Enhanced Tag Recommendation System for Software Information Sites

/ department of mathematics and computer science Page 712-04-2023

???

Page 8: EnTagRec: An Enhanced Tag Recommendation System for Software Information Sites

/ department of mathematics and computer science Page 812-04-2023

EnTagRec

Page 9: EnTagRec: An Enhanced Tag Recommendation System for Software Information Sites

/ department of mathematics and computer science Page 912-04-2023

EnTagRec TagCombine

r@5 0.805 0.595

p@5 0.346 0.221

r@5 0.815 0.568

p@5 0.358 0.251

r@5 0.88 0.675

p@5 0.369 0.278

r@5 0.64 0.639

p@5 0.382 0.381

Xia et al. MSR’13

Page 10: EnTagRec: An Enhanced Tag Recommendation System for Software Information Sites

/ department of mathematics and computer science Page 1012-04-2023

EnTagRec TagCombine

r@5 0.805 0.595

p@5 0.346 0.221

r@5 0.815 0.568

p@5 0.358 0.251

r@5 0.88 0.675

p@5 0.369 0.278

r@5 0.64 0.639

p@5 0.382 0.381

Page 11: EnTagRec: An Enhanced Tag Recommendation System for Software Information Sites

EnTagRec: How have we done it?

Page 12: EnTagRec: An Enhanced Tag Recommendation System for Software Information Sites

EnTagRec: How have we done it?

L-LDA [Ramage et al. 2009]

tokenization, identifier splitting, stop words, stemming

Page 13: EnTagRec: An Enhanced Tag Recommendation System for Software Information Sites

EnTagRec: How have we done it?

Page 14: EnTagRec: An Enhanced Tag Recommendation System for Software Information Sites

/ department of mathematics and computer science Page 1412-04-2023

I have Java daemon which I want to pass shell commands. For example…

P( | )?

Page 15: EnTagRec: An Enhanced Tag Recommendation System for Software Information Sites

/ department of mathematics and computer science Page 1512-04-2023

Java daemon want pass shell command exampl daemon load configur possibl

P( | )?Actually (preprocessing…)

Page 16: EnTagRec: An Enhanced Tag Recommendation System for Software Information Sites

/ department of mathematics and computer science Page 1612-04-2023

Java daemon want pass shell command exampl daemon load configur possibl

P( | )?Tags = nouns (phrases)

Page 17: EnTagRec: An Enhanced Tag Recommendation System for Software Information Sites

/ department of mathematics and computer science Page 1712-04-2023

JavaP( )|daemonP( )|

shellP( )|…

Estimate from the training data

Combine to get P for the entire text

Page 18: EnTagRec: An Enhanced Tag Recommendation System for Software Information Sites

/ department of mathematics and computer science Page 1812-04-2023

Java daemon want pass shell command exampl daemon load configur possibl

…P( )| …P( )|…P( )|

Page 19: EnTagRec: An Enhanced Tag Recommendation System for Software Information Sites

/ department of mathematics and computer science Page 1912-04-2023

Supercalifragilisticexpialidocious

Page 20: EnTagRec: An Enhanced Tag Recommendation System for Software Information Sites

/ department of mathematics and computer science Page 2012-04-2023

Supercalifragilisticexpialidocious

0.85

Page 21: EnTagRec: An Enhanced Tag Recommendation System for Software Information Sites

EnTagRec: How have we done it?

α*BIC + β*FICTrain α and β

Page 22: EnTagRec: An Enhanced Tag Recommendation System for Software Information Sites

EnTagRec

• is better than BIC and FIC separately

/ department of mathematics and computer science Page 2212-04-2023

EnTagRec BIC FIC

r@5 0.805 0.565 0.593p@5 0.346 0.232 0.258

r@5 0.815 0.505 0.637p@5 0.358 0.212 0.282

r@5 0.88 0.523 0.713p@5 0.369 0.212 0.298

r@5 0.64 0.391 0.545p@5 0.382 0.230 0.322

Page 23: EnTagRec: An Enhanced Tag Recommendation System for Software Information Sites

EnTagRec

• is better than BIC and FIC separately• is better than TagCombine

/ department of mathematics and computer science Page 2312-04-2023

Page 24: EnTagRec: An Enhanced Tag Recommendation System for Software Information Sites

/ department of mathematics and computer science Page 2412-04-2023

So, what was all this about?

Page 25: EnTagRec: An Enhanced Tag Recommendation System for Software Information Sites

/ department of mathematics and computer science Page 2512-04-2023

Page 26: EnTagRec: An Enhanced Tag Recommendation System for Software Information Sites

/ department of mathematics and computer science Page 2612-04-2023

Page 27: EnTagRec: An Enhanced Tag Recommendation System for Software Information Sites

/ department of mathematics and computer science Page 2712-04-2023