entagrec: an enhanced tag recommendation system for software information sites
DESCRIPTION
Software engineers share experiences with modern technologies by means of software information sites, such as Stack Overflow. These sites allow developers to label posted content, referred to as software objects, with short descriptions, known as tags. However, tags assigned to objects tend to be noisy and some objects are not well tagged. To improve the quality of tags in software information sites, we propose EnTagRec, an automatic tag recommender based on historical tag assignments to software objects and we evaluate its performance on four software information sites, StackOverflow, AskUbuntu, AskDifferent and FreeCode. We observe that that EnTagRec achieves Recall@5 scores of 0.805, 0.815, 0.88 and 0.64, and Recall@10 scores of 0.868, 0.876, 0.944 and 0.753, on StackOverflow, AskUbuntu, AskDifferent and FreeCode, respectively. In terms of Recall@5 and Recall@10, averaging across the 4 datasets, EnTagRec improves TagCombine, which is the state of the art approach, by 27.3\% and 12.9\% respectively.TRANSCRIPT
EnTagRec: An Enhanced Tag Recommendation System for Software Information Sites
Shaowei Wang, David Lo,
Bogdan Vasilescu, Alexander Serebrenik
@b_vasilescu @aserebrenik
/ department of mathematics and computer science Page 212-04-2023
/ department of mathematics and computer science Page 312-04-2023
/ department of mathematics and computer science Page 412-04-2023
/ department of mathematics and computer science Page 512-04-2023
/ department of mathematics and computer science Page 612-04-2023
/ department of mathematics and computer science Page 712-04-2023
???
/ department of mathematics and computer science Page 812-04-2023
EnTagRec
/ department of mathematics and computer science Page 912-04-2023
EnTagRec TagCombine
r@5 0.805 0.595
p@5 0.346 0.221
r@5 0.815 0.568
p@5 0.358 0.251
r@5 0.88 0.675
p@5 0.369 0.278
r@5 0.64 0.639
p@5 0.382 0.381
Xia et al. MSR’13
/ department of mathematics and computer science Page 1012-04-2023
EnTagRec TagCombine
r@5 0.805 0.595
p@5 0.346 0.221
r@5 0.815 0.568
p@5 0.358 0.251
r@5 0.88 0.675
p@5 0.369 0.278
r@5 0.64 0.639
p@5 0.382 0.381
EnTagRec: How have we done it?
EnTagRec: How have we done it?
L-LDA [Ramage et al. 2009]
tokenization, identifier splitting, stop words, stemming
EnTagRec: How have we done it?
/ department of mathematics and computer science Page 1412-04-2023
I have Java daemon which I want to pass shell commands. For example…
P( | )?
/ department of mathematics and computer science Page 1512-04-2023
Java daemon want pass shell command exampl daemon load configur possibl
P( | )?Actually (preprocessing…)
/ department of mathematics and computer science Page 1612-04-2023
Java daemon want pass shell command exampl daemon load configur possibl
P( | )?Tags = nouns (phrases)
/ department of mathematics and computer science Page 1712-04-2023
JavaP( )|daemonP( )|
shellP( )|…
Estimate from the training data
Combine to get P for the entire text
/ department of mathematics and computer science Page 1812-04-2023
Java daemon want pass shell command exampl daemon load configur possibl
…P( )| …P( )|…P( )|
/ department of mathematics and computer science Page 1912-04-2023
Supercalifragilisticexpialidocious
/ department of mathematics and computer science Page 2012-04-2023
Supercalifragilisticexpialidocious
0.85
EnTagRec: How have we done it?
α*BIC + β*FICTrain α and β
EnTagRec
• is better than BIC and FIC separately
/ department of mathematics and computer science Page 2212-04-2023
EnTagRec BIC FIC
r@5 0.805 0.565 0.593p@5 0.346 0.232 0.258
r@5 0.815 0.505 0.637p@5 0.358 0.212 0.282
r@5 0.88 0.523 0.713p@5 0.369 0.212 0.298
r@5 0.64 0.391 0.545p@5 0.382 0.230 0.322
EnTagRec
• is better than BIC and FIC separately• is better than TagCombine
/ department of mathematics and computer science Page 2312-04-2023
/ department of mathematics and computer science Page 2412-04-2023
So, what was all this about?
/ department of mathematics and computer science Page 2512-04-2023
/ department of mathematics and computer science Page 2612-04-2023
/ department of mathematics and computer science Page 2712-04-2023