ELIS – Multimedia Lab
Towards Twitter Hashtag Recommendation Using Distributed Word Representations and a Deep Feed Forward Neural Network
CSSC-2014, New Delhi, 24 September 2014
Abhineshwar Tomar, Frederic Godin, Baptist Vandersmissen, Wesley De Neve, Rik Van de Walle
Multimedia Lab, Ghent University – iMinds, Belgium
Image and Video Systems Lab, KAIST, South Korea
Overview
• Introduction
• Goal
• Motivation
• Methodology
• Results
• Conclusion
• Future work
• An online social network service that enables users to send and read short 140-character text messages, called "tweets" or "microposts"
- Tweet or micropost
- Retweet (sharing)
- Favorite (like or bookmark)
- Mention (starts with @)
- Hashtag (starts with #)
Famous Tweets
Note the presence of both textual and (embedded) visual information!
Twitter Statistics
• Usage in general
- 271 million monthly active users
- 500 million Tweets are sent per day
• Hashtags
- Only 8% of the tweets contain hashtags
- 3% of the hashtags are used more than 5 times
Hashtags on Twitter
Hashtag usage:
- topic-based indexing & search
• #socialnetwork
• #Reddit
- conversational/event clustering
• #www2014
Observation: only 8% of tweets contain a hashtag
Goal
Generate hashtags that adhere to the semantic and linguistic regularity of a tweet
Why
• Hashtags
- Content categorization and discovery
- Effective search of tweets
• Our approach
- Connect similar hashtags (topics)
- Promote the use of hashtags
• By understanding the semantics of the tweet
Methodology (1/3)
• Preprocessing
- Remove non-English words
- Remove non-ASCII characters
- Remove mentions (@USER)
- Remove URLs
- Remove RT @ from retweets
• Feature vector generation
• Training of a feed forward neural network
• Evaluation
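The preprocessing steps listed on this slide could be sketched roughly as follows. This is a minimal illustration, not the authors' code: the regular expressions, the function name `preprocess`, and the use of a caller-supplied `vocabulary` set to decide what counts as an English word are all assumptions made for the example.

```python
import re

# Hypothetical patterns; the slides do not specify the exact rules used.
RT_RE = re.compile(r"^\s*RT\s+@\w+:?\s*")   # leading "RT @user:" marker
URL_RE = re.compile(r"https?://\S+")         # http/https links
MENTION_RE = re.compile(r"@\w+")             # @USER mentions
NON_ASCII_RE = re.compile(r"[^\x00-\x7F]+")  # any non-ASCII characters


def preprocess(tweet, vocabulary=None):
    """Return the cleaned tokens of a tweet.

    vocabulary is an optional set of lowercase English words (an assumption);
    tokens outside it are dropped to approximate "remove non-English words".
    """
    text = RT_RE.sub("", tweet)
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = NON_ASCII_RE.sub("", text)
    tokens = text.split()
    if vocabulary is not None:
        tokens = [t for t in tokens
                  if t.strip(".,!?:;\"'()").lower() in vocabulary]
    return tokens


example = "RT @bob: Check this http://t.co/abc @alice café now"
print(preprocess(example, vocabulary={"check", "this", "now"}))
```

Without a vocabulary, the fragment left after stripping the non-ASCII character ("caf") stays in the token list, which is one reason a word-level filter is applied afterwards in this sketch.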
Methodology (2/3)
• Training: learning the relation between tweets and hashtags
- Tweet → word2vec → 300-D tweet vector
- Hashtag → word2vec → 300-D hashtag vector
- Deep feed-forward neural network: 300-D input layer, 1000-D hidden layer, 500-D hidden layer, 400-D hidden layer, 300-D output layer
• Example
- Tweet: Elizabeth Warren Taking on Hillary as New Democratic Powerhouse
- Hashtag: #politics
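As a rough illustration of the layer sizes on this slide, the sketch below builds a 300-1000-500-400-300 feed-forward network in plain Python and counts its parameters. Only the layer sizes come from the slide; the tanh activation, the linear output layer, the uniform initialisation, and the function names are assumptions made for the example.

```python
import math
import random

# Layer sizes as listed on the slide: input, three hidden layers, output.
LAYER_SIZES = [300, 1000, 500, 400, 300]


def init_network(sizes, seed=0):
    """Create (weights, biases) per layer with small random weights (assumed scheme)."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(sizes, sizes[1:]):
        scale = 1.0 / math.sqrt(n_in)
        weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
                   for _ in range(n_out)]
        biases = [0.0] * n_out
        layers.append((weights, biases))
    return layers


def forward(layers, x):
    """Map an input vector to an output vector; tanh on hidden layers is an assumption."""
    for i, (weights, biases) in enumerate(layers):
        x = [sum(w * v for w, v in zip(row, x)) + b
             for row, b in zip(weights, biases)]
        if i < len(layers) - 1:  # keep the output layer linear
            x = [math.tanh(v) for v in x]
    return x


def count_parameters(sizes):
    """Number of weights plus biases for fully connected layers."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))


print(count_parameters(LAYER_SIZES))  # 1122200
```

During training, the 300-D tweet vector would be the input and the 300-D hashtag vector the regression target; the loss function and optimiser are not given on the slides.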
Methodology (3/3)
• Testing: recommending hashtags to tweets
- Tweet → word2vec → 300-D tweet vector
- Deep feed-forward neural network: 300-D input layer, 1000-D hidden layer, 500-D hidden layer, 400-D hidden layer, 300-D output layer
- 300-D hashtag vector → vec2word → hashtags
• Example
- Tweet: House Democrats suggest Obama impeachment is imminent to raise cash
- Hashtags: #politics, #crisis
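The slide does not explain the vec2word step. One plausible reading, assumed here, is a nearest-neighbour lookup: the predicted 300-D hashtag vector is compared against the vectors of known words, and the closest ones become the recommendations. The sketch below uses cosine similarity and a tiny made-up 3-D embedding table (`TOY_EMBEDDINGS`) purely for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def vec2word(vector, embeddings, k=5):
    """Return the k words whose vectors are most similar to the given vector."""
    ranked = sorted(embeddings,
                    key=lambda word: cosine(vector, embeddings[word]),
                    reverse=True)
    return ranked[:k]


# Made-up 3-D vectors; real word2vec vectors would be 300-D.
TOY_EMBEDDINGS = {
    "politics": [0.9, 0.1, 0.0],
    "crisis": [0.7, 0.3, 0.1],
    "music": [0.0, 0.1, 0.9],
}

predicted = [0.8, 0.2, 0.0]  # stand-in for the network output
print(vec2word(predicted, TOY_EMBEDDINGS, k=2))  # ['politics', 'crisis']
```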
word2vec
• Developed by Google Research
• Computes vector representations for words
- Through the use of neural network technology
- Capture the semantic meaning of a word
• Trained on part of the Google News dataset (about 100 billion words)
• The model contains vectors for 3 million words and phrases
• Example word vector properties
- vector('Paris') - vector('France') + vector('Italy') ≈ vector('Rome')
- vector('king') - vector('man') + vector('woman') ≈ vector('queen')
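The vector arithmetic on this slide can be reproduced on a toy scale. The 2-D vectors below are invented for illustration (one axis loosely standing for "royalty", the other for "gender"); they are not real word2vec vectors, and the helper name `analogy` is hypothetical. As is common for analogy queries, the input words are excluded from the candidates.

```python
import math

# Invented 2-D vectors: (royalty, gender). Real word2vec vectors are 300-D.
TOY_VECTORS = {
    "king": [1.0, 1.0],
    "queen": [1.0, -1.0],
    "man": [0.0, 1.0],
    "woman": [0.0, -1.0],
    "apple": [0.1, 0.0],
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def analogy(a, b, c, vectors):
    """Return the word closest to vector(a) - vector(b) + vector(c)."""
    target = [x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(target, vectors[w]))


print(analogy("king", "man", "woman", TOY_VECTORS))  # queen
```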
Results (1/3)
Tweets and their recommended hashtags
1 Tweet: Someone dm/text me bc I’m so bored
  Recommended hashtags: madd, Oh noes, rainnwilson, sooooooo, fricken
2 Tweet: The good life is one inspired by love and guided by knowledge.
  Recommended hashtags: Ahh yes, FIVE THINGS About, YANKEES TALK, Kinder gentler, Ya gotta love
3 Tweet: Method of Losing Weight http://t.co/rs64CEuo5W
  Recommended hashtags: Shape Shifting, Treat Acne, Detect Cancer, Warps, Calorie Burn
4 Tweet: I hate today cause its room cleaning day for me!!!
  Recommended hashtags: FAN ’S ATTIC, Puh leez, Mopping robot, % #F######## 3v.jsn, InterestEURO JAP
5 Tweet: SPELLS AND SPELL-CASTING:ENCYCLOPEDIA OF 5000 SPELLS ( JUDIKA ILLES ):BLACKSMITH’S WATER HEALING SPELL: A... http://t.co/k0TfrqJFQW
  Recommended hashtags: DEBUTS NEW, NOW AVAILABLE FOR, TO PUBLISH, DESIGNED TO, IS READY TO
Results (2/3)
Results (3/3)
Hit-rate
  Top-k recommendation    She et al.    Our approach
1 Top-5                   82%           83.33%
2 Top-10                  89%           86.67%
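The slides do not define hit-rate. A common definition, assumed here, counts a tweet as a hit when at least one of its top-k recommended hashtags matches one of its ground-truth hashtags; the hit-rate is the fraction of tweets that are hits. The function name and the toy data below are illustrative only.

```python
def hit_rate_at_k(recommendations, ground_truth, k):
    """Fraction of tweets with at least one correct hashtag in the top k.

    recommendations: list of ranked hashtag lists, one per tweet.
    ground_truth: list of true hashtag lists, aligned with recommendations.
    """
    if not recommendations:
        return 0.0
    hits = 0
    for recs, truth in zip(recommendations, ground_truth):
        if set(recs[:k]) & set(truth):
            hits += 1
    return hits / len(recommendations)


# Toy data, not taken from the experiments in the slides.
recs = [["politics", "crisis"], ["music", "fun"], ["news", "sport"]]
truth = [["politics"], ["travel"], ["sport"]]
print(hit_rate_at_k(recs, truth, k=1))
print(hit_rate_at_k(recs, truth, k=2))
```

With this definition the hit-rate can only stay the same or increase as k grows, which matches the Top-5 versus Top-10 trend in both columns of the table.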
Conclusion
• Introduced a novel approach for hashtag recommendation, using distributed word representations and a feed forward neural network
• Learns semantic and linguistic regularities without requiring careful feature engineering
• Can easily take advantage of temporal information
• Supports the automatic creation of new hashtags/trends
Future Work
• Use of more than four days of data
• Use word representations from different data sources
• Investigate impact of the quality of the word representations created
• Investigate impact of the use of DBpedia and Freebase