EnTagRec: An Enhanced Tag Recommendation System for Software Information Sites

Post on 29-May-2015

Category:

Science

DESCRIPTION

Software engineers share experiences with modern technologies by means of software information sites, such as Stack Overflow. These sites allow developers to label posted content, referred to as software objects, with short descriptions, known as tags. However, tags assigned to objects tend to be noisy and some objects are not well tagged. To improve the quality of tags in software information sites, we propose EnTagRec, an automatic tag recommender based on historical tag assignments to software objects, and we evaluate its performance on four software information sites: StackOverflow, AskUbuntu, AskDifferent and FreeCode. We observe that EnTagRec achieves Recall@5 scores of 0.805, 0.815, 0.88 and 0.64, and Recall@10 scores of 0.868, 0.876, 0.944 and 0.753, on StackOverflow, AskUbuntu, AskDifferent and FreeCode, respectively. In terms of Recall@5 and Recall@10, averaging across the 4 datasets, EnTagRec improves on TagCombine, the state-of-the-art approach, by 27.3% and 12.9% respectively.

TRANSCRIPT

EnTagRec: An Enhanced Tag Recommendation System for Software Information Sites

Shaowei Wang, David Lo,

Bogdan Vasilescu, Alexander Serebrenik

@b_vasilescu @aserebrenik

/ department of mathematics and computer science Page 2 12-04-2023

???

EnTagRec

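Dataset        Metric  EnTagRec  TagCombine
StackOverflow  r@5     0.805     0.595
StackOverflow  p@5     0.346     0.221
AskUbuntu      r@5     0.815     0.568
AskUbuntu      p@5     0.358     0.251
AskDifferent   r@5     0.88      0.675
AskDifferent   p@5     0.369     0.278
FreeCode       r@5     0.64      0.639
FreeCode       p@5     0.382     0.381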

Xia et al. MSR’13
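The r@5 and p@5 rows above are recall and precision over the top five recommended tags. The slides do not spell out the formulas, so the sketch below assumes the standard definitions, averaged over posts; the paper may use a variant. The function names are illustrative only.

```python
def recall_at_k(recommended, actual, k):
    """Fraction of the true tags found among the top-k recommendations
    (standard definition, assumed here; not taken from the slides)."""
    hits = len(set(recommended[:k]) & set(actual))
    return hits / len(actual) if actual else 0.0


def precision_at_k(recommended, actual, k):
    """Fraction of the top-k recommendations that are true tags."""
    hits = len(set(recommended[:k]) & set(actual))
    return hits / k


def mean_metric(metric, all_recommended, all_actual, k):
    """Average a per-post metric over a collection of posts."""
    scores = [metric(r, a, k) for r, a in zip(all_recommended, all_actual)]
    return sum(scores) / len(scores)


# Hypothetical post with two true tags, both found in the top five.
recommended = ["java", "daemon", "shell", "linux", "bash"]
actual = ["java", "shell"]
print(recall_at_k(recommended, actual, 5))     # 1.0
print(precision_at_k(recommended, actual, 5))  # 0.4
```

Since a post usually has only a handful of true tags, p@5 is capped well below 1, which is consistent with the precision rows being much lower than the recall rows.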

EnTagRec: How have we done it?

L-LDA [Ramage et al. 2009]

tokenization, identifier splitting, stop words, stemming
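The four preprocessing steps can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the stop-word list is a tiny hypothetical sample, and `toy_stem` is a crude suffix stripper standing in for a real stemmer such as Porter's.

```python
import re

# Hypothetical, deliberately tiny stop-word list for illustration.
STOP_WORDS = {"i", "have", "which", "to", "for", "a", "the", "is", "it", "and", "of"}


def split_identifier(token):
    """Split camelCase and snake_case identifiers into their parts."""
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", token).replace("_", " ")
    return spaced.split()


def toy_stem(word):
    """Crude suffix stripping; a stand-in for a real stemmer (assumption)."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    """Tokenize, split identifiers, drop stop words, then stem."""
    words = []
    for token in re.findall(r"[A-Za-z_][A-Za-z0-9_]*", text):
        for part in split_identifier(token):
            word = part.lower()
            if word not in STOP_WORDS:
                words.append(toy_stem(word))
    return words


print(preprocess("I have loadConfig which parses shell commands"))
# ['load', 'config', 'parse', 'shell', 'command']
```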

I have Java daemon which I want to pass shell commands. For example…

P(tag | text)?

Java daemon want pass shell command exampl daemon load configur possibl

P(tag | text)?

Actually (preprocessing…)

P(tag | text)?

Tags = nouns (phrases)

P(tag | Java)   P(tag | daemon)   P(tag | shell)   …

Estimate from the training data

Combine to get P for the entire text
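The two steps above can be sketched as follows. The slides do not say how the per-word probabilities are estimated or combined, so this sketch assumes relative co-occurrence frequencies for P(tag | word) and a plain average over the known words of a post; both choices, the function names, and the sample data are illustrative.

```python
from collections import Counter, defaultdict


def estimate_tag_given_word(training_posts):
    """training_posts: list of (words, tags) pairs.
    Returns word -> {tag: P(tag | word)} as relative frequencies (assumed)."""
    word_count = Counter()
    word_tag_count = defaultdict(Counter)
    for words, tags in training_posts:
        for word in set(words):  # count each word once per post
            word_count[word] += 1
            for tag in tags:
                word_tag_count[word][tag] += 1
    return {
        word: {tag: n / word_count[word] for tag, n in tag_counts.items()}
        for word, tag_counts in word_tag_count.items()
    }


def score_tags(words, p_tag_given_word):
    """Average P(tag | word) over the words seen in training (assumed rule)."""
    known = [w for w in set(words) if w in p_tag_given_word]
    scores = defaultdict(float)
    for word in known:
        for tag, p in p_tag_given_word[word].items():
            scores[tag] += p / len(known)
    return dict(scores)


# Hypothetical training data: preprocessed words and their tags.
training = [
    (["java", "daemon"], ["java"]),
    (["java", "shell"], ["java", "bash"]),
    (["shell", "command"], ["bash"]),
]
model = estimate_tag_given_word(training)
print(score_tags(["java", "daemon", "shell"], model))
```

Here the tag java gets a higher score than bash, because two of the three words in the new post always co-occurred with java in the training data.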

P(tag | …)   P(tag | …)   P(tag | …)

Supercalifragilisticexpialidocious

0.85

EnTagRec: How have we done it?

α*BIC + β*FIC

Train α and β
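A minimal sketch of the weighted combination follows. The slides only say that α and β are trained; the grid search over [0, 1] that maximizes recall@k on held-out posts is an assumption, as are all function names and the sample scores.

```python
def combine(bic_scores, fic_scores, alpha, beta):
    """Per-tag score alpha*BIC + beta*FIC; a missing tag counts as 0."""
    tags = set(bic_scores) | set(fic_scores)
    return {
        t: alpha * bic_scores.get(t, 0.0) + beta * fic_scores.get(t, 0.0)
        for t in tags
    }


def top_k(scores, k):
    """Tags sorted by descending score, ties broken alphabetically."""
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [tag for tag, _ in ranked[:k]]


def recall(recommended, actual):
    return len(set(recommended) & set(actual)) / len(actual)


def train_weights(validation, k=5, steps=10):
    """Grid search (assumed procedure) over alpha and beta in [0, 1].
    validation: list of (bic_scores, fic_scores, actual_tags) triples."""
    best = (0.0, 0.0, -1.0)
    for i in range(steps + 1):
        for j in range(steps + 1):
            if i == 0 and j == 0:
                continue  # all-zero weights carry no information
            alpha, beta = i / steps, j / steps
            mean_recall = sum(
                recall(top_k(combine(b, f, alpha, beta), k), actual)
                for b, f, actual in validation
            ) / len(validation)
            if mean_recall > best[2]:
                best = (alpha, beta, mean_recall)
    return best


# Hypothetical component outputs for one post.
bic = {"java": 0.6, "bash": 0.2}
fic = {"bash": 0.9, "shell": 0.4}
print(top_k(combine(bic, fic, 0.5, 0.5), 2))  # ['bash', 'java']
```

Only the ratio of α to β affects the ranking, so a one-dimensional search would be enough; the two-dimensional grid is kept here to mirror the formula on the slide.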

EnTagRec

• is better than BIC and FIC separately

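Dataset        Metric  EnTagRec  BIC    FIC
StackOverflow  r@5     0.805     0.565  0.593
StackOverflow  p@5     0.346     0.232  0.258
AskUbuntu      r@5     0.815     0.505  0.637
AskUbuntu      p@5     0.358     0.212  0.282
AskDifferent   r@5     0.88      0.523  0.713
AskDifferent   p@5     0.369     0.212  0.298
FreeCode       r@5     0.64      0.391  0.545
FreeCode       p@5     0.382     0.230  0.322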

EnTagRec

• is better than BIC and FIC separately
• is better than TagCombine

So, what was all this about?
