TRANSCRIPT
SEMANTIC RECOMMENDATION SYSTEMS
FOR RESEARCH 2.0
A Conceptual Prototype for a Twitter-based Recommender System for Research 2.0
by Patrick Thonhauser
Thursday, October 11, 12
OUTLINE
• Motivation
• Basics (Semantic Web, Recommender Systems, Natural Language Processing)
• Conceptual Prototype
• Test results and Discussion
• Questions
MOTIVATION
• Is Twitter useful for discovering new connections between researchers in similar subject areas (and why Twitter)?
• How much information can we extract from 140-character strings?
• Is it possible to separate useful information from noise?
• Are there any appropriate classifiers and metrics to measure the significance of Twitter users and Tweets?
SEMANTIC WEB
• Additional Layer of Information
• Linked Data (use URIs as names, use HTTP URIs, use standards to provide Information, include links to other URIs)
• RDF (based on triples -> subject, predicate, object) is like HTML for the classic web
• Nearly all semantic web standards are based on RDF (like FOAF - Friend of a Friend Project)
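The triple structure mentioned above can be sketched in plain Python. This is a minimal illustration, not the prototype's code: triples are just `(subject, predicate, object)` tuples, and the URIs and names are invented for the example; only the FOAF namespace URI is real.

```python
# Minimal sketch: RDF statements as (subject, predicate, object) triples.
# The example resources (alice, bob) are illustrative, not from the prototype.
FOAF = "http://xmlns.com/foaf/0.1/"

triples = [
    ("http://example.org/alice", FOAF + "name", "Alice"),
    ("http://example.org/alice", FOAF + "knows", "http://example.org/bob"),
    ("http://example.org/bob", FOAF + "name", "Bob"),
]

def objects(triples, subject, predicate):
    """Return all objects matching a (subject, predicate, ?) pattern."""
    return [o for s, p, o in triples if s == subject and p == predicate]

print(objects(triples, "http://example.org/alice", FOAF + "knows"))
```

A real system would use an RDF library and serialize to a standard format, but the pattern-matching query above is the essence of working with triples.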
RECOMMENDER SYSTEMS
• Collaborative Filtering (user based/ item based)
• Content Based Recommendation
• Knowledge Based Recommendation
• Hybrid Recommendations
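To make the first of these concrete, here is a toy user-based collaborative-filtering sketch: score items a user has not seen by the cosine-similarity-weighted ratings of other users. The rating data and item names are hypothetical, not results from the thesis.

```python
from math import sqrt

# Hypothetical user -> item -> rating data, for illustration only.
ratings = {
    "anna":  {"paper_a": 5, "paper_b": 3, "paper_c": 4},
    "ben":   {"paper_a": 4, "paper_b": 3, "paper_c": 5, "paper_d": 4},
    "clara": {"paper_b": 1, "paper_d": 5},
}

def cosine(u, v):
    """Cosine similarity between two sparse rating vectors (dicts)."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    num = sum(u[i] * v[i] for i in common)
    den = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return num / den

def recommend(user):
    """User-based CF: rank unseen items by similarity-weighted ratings."""
    scores = {}
    for other, their_ratings in ratings.items():
        if other == user:
            continue
        sim = cosine(ratings[user], their_ratings)
        for item, r in their_ratings.items():
            if item not in ratings[user]:
                scores[item] = scores.get(item, 0.0) + sim * r
    return sorted(scores, key=scores.get, reverse=True)

print(recommend("anna"))  # → ['paper_d']
```

Content-based and knowledge-based approaches differ in what feeds the score (item features or domain rules instead of other users' ratings); hybrids combine several of these signals.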
NATURAL LANGUAGE PROCESSING (NLP)
• Classification of Microtext Artefacts (This presentation is killer!)
• Applying NLP - Pipelines
• End of Sentence Detection
• Tokenization
• POS Tagging
• Chunking
• Extraction
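The first two pipeline stages can be approximated with nothing but regular expressions. This is a deliberately naive sketch (no abbreviation handling, no real tokenizer); the example tweet text is taken from the slides, everything else is illustrative.

```python
import re

TEXT = "The grand jury commented on a number of cases. This presentation is killer!"

# (1) End-of-sentence detection: naive split after ., ! or ?
sentences = re.split(r'(?<=[.!?])\s+', TEXT)

# (2) Tokenization: simple word tokens via a regex
tokens = [re.findall(r"[A-Za-z']+", s) for s in sentences]

print(sentences)
print(tokens[0])
```

The later stages (POS tagging, chunking, extraction) need a trained model and are typically delegated to an NLP toolkit rather than hand-written rules.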
THE CONCEPT OF THOUGHT BUBBLES
Let’s imagine every Twitter user belongs to several different topic-related Bubbles.
• A user is part of topic-related bubbles
• Twitter users within topic-related bubbles don’t necessarily know each other
• Connections of already existing connections of the service user lead to new information
• Non-bidirectional connections are preferred
LET’S SUMMARIZE
So how can we find such potentially interesting users?
PROOF OF CONCEPT SYSTEM
(1) Pre-selection of the user set that will be analyzed in depth
(2) Apply the NLP pipeline for measuring user similarity
(3) Categorize the top-n best-scoring users according to the idea of Thought Bubbles
(4) Recommend the top-n best-scoring users of a category to the user
(5) Analyze the acceptance of recommendations
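The five steps above can be sketched as a small pipeline of functions. All account data, field names, and thresholds here are hypothetical stand-ins chosen for the example, not the prototype's actual interfaces.

```python
def preselect(accounts, filters):
    """(1) Keep only accounts that pass every filter predicate."""
    return [a for a in accounts if all(f(a) for f in filters)]

def similarity_scores(accounts, score):
    """(2) The NLP pipeline, condensed here to a per-account score function."""
    return {a["name"]: score(a) for a in accounts}

def top_n(scores, n):
    """(3)/(4) The top-n best-scoring users, i.e. the recommendation list."""
    return sorted(scores, key=scores.get, reverse=True)[:n]

# Hypothetical data illustrating one iteration of steps (1)-(4).
accounts = [
    {"name": "@a", "followers": 500, "score": 0.8},
    {"name": "@b", "followers": 100, "score": 0.9},  # dropped by the filter
    {"name": "@c", "followers": 900, "score": 0.4},
]
kept = preselect(accounts, [lambda a: a["followers"] >= 300])
scores = similarity_scores(kept, lambda a: a["score"])
print(top_n(scores, 1))  # → ['@a']
```

Step (5) closes the loop: observing which recommendations are accepted feeds new predicates back into `preselect` on the next iteration.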
[Architecture diagram: the service user talks to the Thought Bubbles API via a REST API; the server performs NLP pre-filtering, categorisation, clustering, and analysis of recommendation acceptance against a DB; a user's Thought Bubbles shown as examples: iOS dev, social media, sports]
(1) PRE-SELECTION/FILTERING
Friends-of-friends Twitter accounts
→ Filter accounts that are already connected to you
→ Filter accounts where follower_count < 300 or status_count < 1000
→ Filter non-English-speaking accounts
→ Identify people by using a simple NLP pipeline
→ Set of Twitter accounts for further processing
• The set of friends-of-friends Twitter accounts changes from iteration to iteration
• Filters are added after analyzing the acceptance of recommendations
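The filter chain above maps naturally onto composable predicates. A minimal sketch, assuming Twitter-API-style account fields (`followers_count`, `statuses_count`, `lang`); the thresholds are the ones stated on the slide, the sample accounts are invented.

```python
# Filter predicates mirroring the pre-selection slide.
def not_already_connected(account, my_friends):
    return account["screen_name"] not in my_friends

def active_enough(account):
    """Drop accounts below the slide's thresholds: 300 followers, 1000 statuses."""
    return account["followers_count"] >= 300 and account["statuses_count"] >= 1000

def english_speaking(account):
    return account.get("lang") == "en"

candidates = [
    {"screen_name": "alice", "followers_count": 1200, "statuses_count": 4000, "lang": "en"},
    {"screen_name": "bob",   "followers_count": 150,  "statuses_count": 9000, "lang": "en"},
    {"screen_name": "carol", "followers_count": 800,  "statuses_count": 2500, "lang": "de"},
]
my_friends = {"dave"}

kept = [a for a in candidates
        if not_already_connected(a, my_friends)
        and active_enough(a)
        and english_speaking(a)]
print([a["screen_name"] for a in kept])  # → ['alice']
```

Because each filter is a standalone predicate, adding a new one after an acceptance analysis is a one-line change to the chain.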
(2) NLP PIPELINE
Raw Tweets (e.g. “@testuser The grand jury commented on a number of…”)
→ Tokenization and stripping of @mentions and URLs
→ POS tagging, yielding POS-tagged Tweets:
[('The', 'AT'), ('grand', 'JJ'), ('jury', 'NN'), ('commented', 'VBD'), ('on', 'IN'), ('a', 'AT'), ('number', 'NN'), ... ('.', '.')]
→ Chunking, neglecting the 200 most used English words, yielding mined nouns and phrases:
[('jury', 'NN'), ('number', 'NN'), ('social dayly', 'NP'), ...]
→ Frequency distribution, filtering the top-n words:
[('jury', 34), ('social', 23), ('test case', 16), ...]
→ Set of frequency-distributed mined nouns and phrases, stored in the DB
The 400 most recent Tweets of a potential recommendation are used for calculating the similarity measure.
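The stripping and frequency-distribution ends of this pipeline can be sketched with the standard library alone (the POS-tagging and chunking stages in the middle need a trained tagger and are omitted). The tiny `COMMON` set stands in for the 200-word stop list; the second tweet is invented for the example.

```python
import re
from collections import Counter

# Stand-in for the "200 most used English words" list from the slide.
COMMON = {"the", "on", "a", "of", "and", "to", "in", "is"}

def clean(tweet):
    """Strip @mentions and URLs before tokenization."""
    tweet = re.sub(r'@\w+', '', tweet)
    tweet = re.sub(r'https?://\S+', '', tweet)
    return tweet

def mine(tweets, top_n=3):
    """Tokenize, drop common words, return a top-n frequency distribution."""
    words = []
    for t in tweets:
        words += [w.lower() for w in re.findall(r"[A-Za-z]+", clean(t))]
    words = [w for w in words if w not in COMMON]
    return Counter(words).most_common(top_n)

tweets = [
    "@testuser The grand jury commented on a number of cases http://t.co/x",
    "The jury met again today",
]
print(mine(tweets))  # 'jury' ranks first with count 2
```

The resulting `(word, count)` pairs are exactly the shape of the frequency distribution shown above, ready to be persisted and compared between users.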
• Calculate the top-n users by applying single-linkage clustering
• Categorize whether a user belongs to user-specific bubbles
• Present recommendation lists to users
• Analyze the acceptance of recommendations (connect user accounts with FOAF) and add new filter predicates if necessary
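Single-linkage clustering merges, at each step, the two clusters whose closest members are nearest. A naive O(n³) sketch over one-dimensional similarity scores; the score values are invented, and a real system would use a library implementation:

```python
def single_linkage(points, k):
    """Naive single-linkage agglomerative clustering down to k clusters.
    Points are 1-D scores; distance is absolute difference."""
    clusters = [[p] for p in points]
    while len(clusters) > k:
        best = None
        # Find the pair of clusters with the smallest closest-pair distance.
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] += clusters.pop(j)  # merge the closest pair of clusters
    return clusters

scores = [0.05, 0.07, 0.10, 0.28, 0.30]
print(single_linkage(scores, 2))  # low scorers vs. high scorers
```

With k = 2 the high-scoring users separate cleanly from the rest, which is how a top-n recommendation set can be cut off without a fixed score threshold.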
SUPERVISED TEST RUN
[Chart: candidate Twitter accounts plotted against a similarity scale from 0 to 0.300, e.g. @gargamit100*, @selvers*, @UpsideLearning*, @SebastianThrun*, @timbuckteeth*, @cliveshepherd*, @BarackObama, @ladygaga, @charliesheen; recommendations are framed in the original slide (marked here with *)]
UNSUPERVISED TEST RESULTS
The probability that a recommended item is relevant is 64.4% (standard deviation: 31.5%).
DISCUSSION
Twitter IS useful for discovering new information in the sense of Research 2.0, but:
• Recommendations reflect the Twitter behavior of the user
• Automated tweets harm recommendation results (one sentence gains an enormous weight because it occurs very often)
• Twitter’s request rate limitation is a show stopper
• A comparison to similar systems (content-based and collaborative filtering) is still open
THANK YOU!
ANY QUESTIONS?