Authors
University Politehnica of Bucharest
Relevance-Based Ranking of Video Comments on YouTube
Andrei Șerbănoiu, Traian Rebedea [email protected]
Overview
• Introduction
• Motivation
• System architecture
• Classification of relevant comments
• Ranking of relevant comments
• Results
• Conclusions
12.04.23 Bachelor's Thesis Session - July 2012 2
Introduction
• Text classification and ranking for comments on YouTube videos
– First: classify whether the comment is relevant or not for the given video
– Second: rank the relevant comments
• Focus on identifying relevant information
• Comments are very short: sometimes fewer than 10 words, and on the order of tens on average
• Relevance is evaluated with respect to the information collected from other online sources about the video
12.04.23 CSCS 2013 – Bucharest, Romania 3
Existing research
• We have not been able to identify any previous research on identifying relevant comments
• YouTube research
– Identify the relevant features of community acceptance (comments with many “likes”)
– Extract the sentiment orientation
– Differentiate between clean and noisy comments
• Other research
– Ranking Comments on the Social Web (uses Digg)
Motivation
• The Police – Every Breath You Take
Motivation
• Most commented video
• “10 questions that every intelligent Christian must answer”
• 1,429,425 comments on 30th May 2013 (early morning)
• How many of these comments are spam?
• Which ones would be most relevant to the video?
Solution
• Ranking of the comments according to relevance
• Steps:
1. Automatically link the video with other online sources relevant to it
2. Filter comments to remove noisy comments
3. Rank the remaining comments according to relevance computed using NLP techniques
• Our solution works for music videos
System architecture
Processing pipeline
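// Fetch the comments, then drop the noisy ones
comments = fetchYouTubeComments();
comments = filterComments(comments);
// Extract topics per comment and gather the external resources
commentTopics = createCommentTopics(comments);
resources = getResources(wikipedia, allmusic, lyrics);
// Score each comment's topics against the resources
for (int i = 0; i < commentTopics.length; i++) {
    computeRelevance(commentTopics[i], resources);
}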
Preprocessing
• Comments retrieved with YouTube Data API
– Only used last 100 comments per video
• Filter comments not written in English using JLangDetect
• Extracted the main topics for each comment using Mallet => 5 topics per comment
• Expanding the topics with synonyms and hypernyms from WordNet
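The topic expansion step can be sketched as follows. This is only an illustration: the `lexicon` map is a hypothetical in-memory stand-in for the WordNet lookup, and the class and method names are not from the original system.

```java
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class TopicExpander {
    /**
     * Returns the topics plus any related words found in the lexicon.
     * The lexicon is a hypothetical stand-in for a WordNet lookup of
     * synonyms and hypernyms.
     */
    static Set<String> expand(Collection<String> topics, Map<String, List<String>> lexicon) {
        Set<String> expanded = new LinkedHashSet<>(); // keeps insertion order, drops duplicates
        for (String topic : topics) {
            expanded.add(topic);
            expanded.addAll(lexicon.getOrDefault(topic, List.of()));
        }
        return expanded;
    }

    public static void main(String[] args) {
        // Illustrative entries only, not real WordNet output
        Map<String, List<String>> lexicon = Map.of(
                "song", List.of("tune", "music"),
                "band", List.of("group", "ensemble"));
        System.out.println(expand(List.of("song", "band", "love"), lexicon));
    }
}
```

Topics without a lexicon entry are kept unchanged, so expansion never removes information from a comment.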
Pre-classification of comments
• Objective: to reduce the number of comments considered for ranking by identifying noise
• Classification based on a neural network using a set of simple linguistic features
• Multilayer perceptron implemented in Weka
• Features
– Number of non-ASCII characters
– Number of capital letters
– Number of newlines
– Number of digits
– Number of trivial and swear-words
– Number of words in comment
– Average word size
– Number of punctuation marks
– Common text spam count
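A minimal sketch of how the features listed above could be computed from a raw comment. The word list, the spam phrase list and the punctuation set are assumptions made for illustration (the slides do not give the actual ones), and the resulting feature map would still need to be converted into whatever input format the classifier expects.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

final class CommentFeatures {
    // Hypothetical lists; the real ones are not given in the slides.
    static final Set<String> TRIVIAL_WORDS = Set.of("lol", "omg", "haha");
    static final List<String> SPAM_PHRASES = List.of("check out my channel", "thumbs up");
    static final String PUNCTUATION = ".,;:!?'\"()-";

    static Map<String, Double> extract(String comment) {
        int nonAscii = 0, capitals = 0, newlines = 0, digits = 0, punctuation = 0;
        for (int i = 0; i < comment.length(); i++) {
            char c = comment.charAt(i);
            if (c > 127) nonAscii++;
            if (Character.isUpperCase(c)) capitals++;
            if (c == '\n') newlines++;
            if (Character.isDigit(c)) digits++;
            if (PUNCTUATION.indexOf(c) >= 0) punctuation++;
        }
        String lower = comment.toLowerCase(Locale.ROOT);
        String trimmed = lower.trim();
        String[] words = trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
        int trivial = 0, letters = 0;
        for (String word : words) {
            String bare = word.replaceAll("[^\\p{L}\\p{N}]", ""); // strip punctuation
            letters += bare.length();
            if (TRIVIAL_WORDS.contains(bare)) trivial++;
        }
        int spam = 0; // number of known spam phrases present in the comment
        for (String phrase : SPAM_PHRASES) {
            if (lower.contains(phrase)) spam++;
        }
        Map<String, Double> f = new LinkedHashMap<>();
        f.put("nonAscii", (double) nonAscii);
        f.put("capitals", (double) capitals);
        f.put("newlines", (double) newlines);
        f.put("digits", (double) digits);
        f.put("trivialWords", (double) trivial);
        f.put("words", (double) words.length);
        f.put("avgWordSize", words.length == 0 ? 0.0 : (double) letters / words.length);
        f.put("punctuation", (double) punctuation);
        f.put("spamPhrases", (double) spam);
        return f;
    }

    public static void main(String[] args) {
        System.out.println(extract("PLEASE check out my channel!!! lol"));
    }
}
```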
Pre-classification of comments
• Trained on a small corpus with 100 relevant comments and 100 noisy comments
• Examples of noisy comments:
– "Step 1: Pause this video Step 2: Google 'Rainymood' Step 3: Click the first link Step 4: Unpause this video Step 5: Thumbs? up this comment, enjoy and thank me later"
– "Those 3,175 haters listen to? 'Techno'."
– "IF YOU LIKE DIRTY DIANA SONG THE SINGER '' STEFANO GIORGINI '' DID A GREAT? REMAKE STEFANO IS A VERY GOOD SINGER SONGWRITER I THINK YOU WILL LIKE HIS VERSION JUST LOOK FOR '' STEFANO GIORGINI '' DIRTY DIANA"
Pre-classification of comments
• Results of pre-classification stage
Type of Instances                  No. Instances   %
Correctly Classified Instances     174             87.46
Incorrectly Classified Instances   26              12.54
Total Number of Instances          200             -
Relevance scoring stage
• Initial approach
• Extract topics from comments as previously mentioned (Mallet + WordNet)
• Fetch Wikipedia articles for artist and song name
• Score computed based on the number of appearances of the comment topics in the articles
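The occurrence-based score could look like the sketch below. It assumes whole-word, case-insensitive matching and a simple sum over topics; the slides do not specify how appearances are counted, so both choices are assumptions.

```java
import java.util.List;
import java.util.Locale;

final class OccurrenceScorer {
    /** Sums, over all topics, the whole-word occurrences of the topic in the article. */
    static int score(List<String> topics, String article) {
        // Split on anything that is not a letter or digit
        String[] tokens = article.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+");
        int score = 0;
        for (String topic : topics) {
            String wanted = topic.toLowerCase(Locale.ROOT);
            for (String token : tokens) {
                if (token.equals(wanted)) score++;
            }
        }
        return score;
    }

    public static void main(String[] args) {
        String article = "The Beatles sang about peace. Peace and love made the Beatles famous.";
        System.out.println(score(List.of("beatles", "peace"), article));
    }
}
```

With this scheme longer articles naturally yield higher scores, so scores are only comparable between comments on the same video.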
Relevance scoring stage
• Second approach: topic-based scoring
• Similar to the previous one, but topics are also extracted from the Wikipedia articles with Mallet
• Scoring is done based on:
– Number of topics extracted from each comment
– Wikipedia topic matches for each comment
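One way to combine the two quantities above is to divide the number of matches by the number of comment topics. The slides do not state the actual formula, so this ratio is an assumption chosen only to make the idea concrete.

```java
import java.util.HashSet;
import java.util.Set;

final class TopicMatchScorer {
    /** Number of comment topics that also appear among the article topics. */
    static int matches(Set<String> commentTopics, Set<String> articleTopics) {
        Set<String> common = new HashSet<>(commentTopics);
        common.retainAll(articleTopics); // set intersection
        return common.size();
    }

    /** Assumed combination: fraction of the comment's topics found in the article. */
    static double score(Set<String> commentTopics, Set<String> articleTopics) {
        if (commentTopics.isEmpty()) return 0.0;
        return (double) matches(commentTopics, articleTopics) / commentTopics.size();
    }

    public static void main(String[] args) {
        Set<String> comment = Set.of("beatles", "peace", "mom");
        Set<String> wikipedia = Set.of("beatles", "peace", "album", "london");
        System.out.println(matches(comment, wikipedia) + " matches, score " + score(comment, wikipedia));
    }
}
```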
Relevance scoring stage
• Third approach
• Multiple-source topic-based scoring
• Additional sources added to the Wikipedia articles
– Information from the allmusic.com website on artists and songs
– Information from song lyrics
• Topics matched between comments and Wikipedia + Allmusic articles, plus exact match of lyrics
• Final relevance score is a weighted sum of the previous factors
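The weighted sum can be sketched as below. The weights are placeholders (the slides do not report the values used), and treating an "exact match of lyrics" as a lyric line that appears verbatim in the comment is an assumption.

```java
import java.util.List;
import java.util.Locale;

final class MultiSourceScorer {
    /** Counts lyric lines that appear verbatim (ignoring case) inside the comment. */
    static int lyricsMatches(String comment, List<String> lyricLines) {
        String text = comment.toLowerCase(Locale.ROOT);
        int count = 0;
        for (String line : lyricLines) {
            String wanted = line.trim().toLowerCase(Locale.ROOT);
            if (!wanted.isEmpty() && text.contains(wanted)) count++;
        }
        return count;
    }

    /** Weighted sum of the per-source match counts; the weights are hypothetical. */
    static double score(int wikipediaMatches, int allmusicMatches, int lyricsMatches,
                        double wWikipedia, double wAllmusic, double wLyrics) {
        return wWikipedia * wikipediaMatches
                + wAllmusic * allmusicMatches
                + wLyrics * lyricsMatches;
    }

    public static void main(String[] args) {
        List<String> lyrics = List.of("All you need is love", "Love is all you need");
        int lyricHits = lyricsMatches("I love 'all you need is love' so much", lyrics);
        System.out.println(score(3, 2, lyricHits, 1.0, 0.5, 2.0));
    }
}
```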
Results
Comment Relevance
maybe your friend should know that being english, have a picture in abbey road and "sing" all you need is love" won't make one direction? a group like the beatles...
662
my mom said she doesn't like the beatles and she said that john was only good to look at? not to hear. my dad said, " haha so true!." i'm an orphan now.
968
you shouldn't be listening to the beatles since these seem to turn your friends into enemies! beatles are all about peace!? you are not getting their message!
983
please read this ! hey i know u just wanna listen to the song but i still have to write this hoping someone will see it and that someone will care .i'm a? young musician from croatia so this spam is my only chance to get noticed.please check out my channel and i promise u won't be sorry.i appreciate your time because music means everything to me, thank you! ?
1309
i didn't mean fight other places. i meant focus on the hurt people in your own country first, then expand to the others. if people don't agree with peace that's an opinion. not a fact, and people often take offense to opinions. there isn't? anything to take offense to, they say something that's all it is. they said it, don't put meaning to it. world peace - i meant the whole world having peace there
1639
Results
• Difficult to assess the impact of the relevance measure
• Interpreting the comments is subjective
– Need human annotators
• The order of the comments is completely different from the one presented now on YouTube (correlation lower than 0.031 for the first 100 comments)
• Method 1 is also not correlated with the other two methods
• Methods 2 and 3 have a higher correlation: 0.124
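The slides do not say which correlation coefficient was used to compare the orderings. As an illustration only, the sketch below computes Spearman's rank correlation for two rankings of the same comments, assuming no tied ranks.

```java
final class RankCorrelation {
    /**
     * Spearman's rho for two rankings of the same n items, assuming no ties:
     * rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1)), where d is the rank difference.
     */
    static double spearman(int[] ranksA, int[] ranksB) {
        if (ranksA.length != ranksB.length || ranksA.length < 2) {
            throw new IllegalArgumentException("need two rankings of equal length >= 2");
        }
        long n = ranksA.length;
        long sumSquaredDiff = 0;
        for (int i = 0; i < ranksA.length; i++) {
            long d = ranksA[i] - ranksB[i];
            sumSquaredDiff += d * d;
        }
        return 1.0 - (6.0 * sumSquaredDiff) / (n * (n * n - 1));
    }

    public static void main(String[] args) {
        int[] youtubeOrder = {1, 2, 3, 4, 5};   // position of each comment on YouTube
        int[] relevanceOrder = {2, 1, 4, 3, 5}; // position of the same comments by relevance
        System.out.println(spearman(youtubeOrder, relevanceOrder));
    }
}
```

A value near 0, as reported above, means the two orderings are essentially unrelated; 1 means identical order and -1 means reversed order.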
Conclusions
• 2-stage method for ranking comments on YouTube
• The first stage removes noisy comments
• The second stage tries to link the comments with information from other web pages relevant for the video
• Relevance is computed based on topic modeling with Mallet
• Results are encouraging, but need to find a more rigorous method of assessing them
• Results are better than the usual results provided by YouTube; however, the processing time for each video should not be neglected
Thank you!
• Questions?
• Discussion