notube: pattern-based recommendations (part 1)
WP 3: User profiling and Recommendation (Part 1)
BBC, Pro-netics, VUA
Wednesday, March 28, 12
Contents
26-27 March 2012, NoTube 3rd Review
Overview
User profiling
• General goal & approach
• From activity streams to profile
• Issues
• Analytics
• Beancounter
Recommendations
• General goal & approach
• Semantic recommendation
• Statistical recommendation
• Hybrid recommendation
Exploitation
Conclusions
Overview
Figure (architecture diagram), components shown: EPG Metadata (BBC); TV Program Enrichment; RDF Graph TV Programs; Semantic Content Patterns for TV Programs; Semantic Pattern-based Recommendation Strategy; User Ratings & Demographics (BBC EPG Data); User Data Analysis; Similarity Clusters of Programs; Statistical Similarity-based Recommendation Strategy; Hybrid Recommendation Strategy; Recommendation Service; End Users; Beancounter
User profiling approach
users’ interests and behaviours can be inferred from their activities on the Social Web
• tweets
• liked Facebook resources
• songs listened to
• ...
interests in topics are represented using Linked Data web identifiers
• to access a wealth of open and machine-readable data
• to publish profiles in compliance with the LOD paradigm
• to leverage the graph-based model of such data sets
User profiling: Challenge
main challenge: extracting meaningful data from different sources of user activities
to produce LOD identifiers from activities:
• “follow-your-nose”, record-linkage-based approach
• semantic-annotation-based approach, using NLP techniques on raw text
interests are weighted to represent their descriptiveness
user profiles are syndicated using JSON, JSON-P and RDF
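A minimal sketch of what syndicating one weighted profile in the three formats could look like. The slides do not show the actual profile schema, so the field names, the user URI, the weights and the callback name are all hypothetical, and the use of `foaf:topic_interest` for the RDF form is an assumption rather than the vocabulary NoTube used.

```python
import json

# Assumption: FOAF's topic_interest property is used to link a user to an interest.
FOAF_TOPIC_INTEREST = "http://xmlns.com/foaf/0.1/topic_interest"


def to_json(profile):
    """Serialise the profile as plain JSON."""
    return json.dumps(profile, sort_keys=True)


def to_jsonp(profile, callback="handleProfile"):
    """JSON-P wraps the JSON payload in a call to a client-chosen function."""
    return f"{callback}({to_json(profile)});"


def to_ntriples(profile):
    """One N-Triples line per interest; weights are left out of this simple form."""
    return "\n".join(
        f"<{profile['user']}> <{FOAF_TOPIC_INTEREST}> <{interest['resource']}> ."
        for interest in profile["interests"]
    )


# Hypothetical profile: the user URI and the weights are illustrative only.
profile = {
    "user": "http://example.org/users/alice",
    "interests": [
        {"resource": "http://dbpedia.org/resource/Matt_Lucas", "weight": 0.7},
        {"resource": "http://dbpedia.org/resource/David_Walliams", "weight": 0.3},
    ],
}

print(to_json(profile))
print(to_jsonp(profile))
print(to_ntriples(profile))
```

Carrying the weights in RDF would need an intermediate node per interest, which is omitted here to keep the sketch short.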
User profiling: Follow-your-nose
facebook.com/pages/Shoeshine/ dbpedia.org/resource/
“follow-your-nose”, record-linkage based
record linkage is “the problem of recognising those records in two files which represent identical persons, objects or events (said to be matched).”
we adopted a text-retrieval version: incremental, constrained, multiple text searches
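One possible reading of "incremental, constrained, multiple text searches" is sketched below: several searches are run in turn, from the most to the least constrained, and a link is accepted only when a search returns exactly one candidate. The in-memory catalogue, its `example.org` URIs, the type labels and the order of the searches are all hypothetical; the slides do not describe the real search backend.

```python
# Hypothetical target catalogue standing in for a LOD data set.
CATALOG = [
    {"uri": "http://example.org/resource/Shoeshine_film", "label": "Shoeshine", "type": "film"},
    {"uri": "http://example.org/resource/Shoeshine_band", "label": "Shoeshine", "type": "band"},
    {"uri": "http://example.org/resource/Little_Britain", "label": "Little Britain", "type": "tv_show"},
]


def normalise(text):
    """Lower-case and collapse whitespace so labels compare reliably."""
    return " ".join(text.lower().split())


def exact(name):
    return lambda rec: normalise(rec["label"]) == normalise(name)


def shares_token(name):
    tokens = set(normalise(name).split())
    return lambda rec: bool(tokens & set(normalise(rec["label"]).split()))


def of_type(type_hint):
    return lambda rec: rec["type"] == type_hint


def link(name, type_hint, catalog=CATALOG):
    """Run searches from strictest to loosest; accept only an unambiguous hit."""
    searches = [
        [exact(name), of_type(type_hint)],         # label and type must both match
        [exact(name)],                             # label only
        [shares_token(name), of_type(type_hint)],  # partial label, same type
    ]
    for constraints in searches:
        hits = [rec for rec in catalog if all(check(rec) for check in constraints)]
        if len(hits) == 1:
            return hits[0]["uri"]
    return None  # no search produced a single candidate


print(link("Shoeshine", "band"))
print(link("Shoeshine", "book"))
print(link("Little Britain USA", "tv_show"))
```

Returning `None` for ambiguous results trades coverage for precision; the looser last search shows how such a procedure can also introduce wrong links.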
User profiling: Semantic Annotation
for some activities the “follow-your-nose” approach is not suitable
tweets and other text resources need Natural Language Processing techniques
• semantic annotation using LUpedia (WP4)
look up LOD identifiers from:
• tweet text
• #hashtag definitions
• linked Web pages
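The three lookup sources listed above can be separated from a raw tweet before any annotation takes place. The following is a small sketch using regular expressions; the function name, the returned field names and the sample URL are hypothetical, and fetching hashtag definitions or linked pages is left out.

```python
import re

HASHTAG = re.compile(r"#(\w+)")
URL = re.compile(r"https?://\S+")


def annotation_sources(tweet):
    """Split a tweet into plain text, hashtags and linked URLs."""
    urls = URL.findall(tweet)
    without_urls = URL.sub(" ", tweet)  # strip URLs first so '#fragment' is not read as a hashtag
    hashtags = HASHTAG.findall(without_urls)
    text = " ".join(HASHTAG.sub(" ", without_urls).split())
    return {"text": text, "hashtags": hashtags, "urls": urls}


# The URL below is made up for illustration.
sample = "Bubbles Devere is the best thing ever. #littlebritain http://example.org/clip#t=10"
print(annotation_sources(sample))
```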
User profiling: Semantic Annotation
Bubbles Devere is the best thing ever. #littlebritain
Brilliant british humor by Matt Lucas & David Walliams - whole range of facinating characters portraying diversity of british society
http://dbpedia.org/resource/Matt_Lucas
http://dbpedia.org/resource/David_Walliams
WP4 Enrichment
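The example on this slide can be replayed with a toy annotator. LUpedia's API is not shown in the slides, so the sketch below replaces it with a naive, case-insensitive gazetteer lookup; the gazetteer contains only the two identifiers that appear above, and the function name is hypothetical.

```python
# Stand-in for the LUpedia lookup: label -> LOD identifier.
GAZETTEER = {
    "Matt Lucas": "http://dbpedia.org/resource/Matt_Lucas",
    "David Walliams": "http://dbpedia.org/resource/David_Walliams",
}


def annotate(text, gazetteer=GAZETTEER):
    """Return the identifiers whose label occurs in the text, in order of appearance."""
    lowered = text.lower()
    found = []
    for label, uri in gazetteer.items():
        position = lowered.find(label.lower())
        if position >= 0:
            found.append((position, uri))
    return [uri for _, uri in sorted(found)]


tweet = "Bubbles Devere is the best thing ever. #littlebritain"
definition = (
    "Brilliant british humor by Matt Lucas & David Walliams - whole range of "
    "facinating characters portraying diversity of british society"
)

print(annotate(tweet))       # the tweet text alone yields nothing from this gazetteer
print(annotate(definition))  # the hashtag definition yields both identifiers
```

The tweet text on its own gives no match here, which is why the hashtag definition is used as an additional lookup source.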
User profiling: Issues
non-deterministic record linkage and semantic annotation could introduce noise
• noisy data leads to misleading profiles
• recommendations could be affected
hence, we introduced interest weights
• to minimise the effect of potential noise, eliminating poorly descriptive interests by giving them lower weights
• to represent the evolution of a single interest: interests that recur over time gain more weight
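The two roles of the weights can be illustrated with a simple frequency-based scheme. The slides do not give the actual weighting formula, so the normalised counts and the `min_weight` threshold below are assumptions, as is the `example.org` identifier standing in for a spurious annotation.

```python
from collections import Counter


def weigh_interests(observations, min_weight=0.2):
    """Weight each interest by its share of all observations, then drop weak ones.

    Assumption: relative frequency is used as the weight; recurring interests
    therefore gain weight, while one-off (possibly noisy) ones fall below the threshold.
    """
    counts = Counter(observations)
    total = sum(counts.values())
    weights = {uri: n / total for uri, n in counts.items()}
    return {uri: w for uri, w in weights.items() if w >= min_weight}


# Identifiers extracted from a user's activities over time (illustrative data).
observations = (
    ["http://dbpedia.org/resource/Matt_Lucas"] * 4
    + ["http://dbpedia.org/resource/David_Walliams"] * 3
    + ["http://example.org/resource/spurious_match"]  # a single noisy annotation
)

print(weigh_interests(observations))
```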
Analytics
“people are usually interested in information about themselves”
from Doppler annual report
NoTube Beancounter
The user profiling and analytics components have been lovingly called “Beancounter” since the early days
built on top of experience and experiments made during the 3 years of the project
a scalable, activity-streams-oriented set of processes
• filtering, slicing, fast key lookups
• many analyses are really just “counting the beans”
• analysis deserves a high-performance architecture
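"Counting the beans" over an activity stream can be sketched as filtering, slicing by date and tallying. The activity records, their field names and the `count_beans` helper are hypothetical; this in-memory version says nothing about how the real Beancounter stores or scales its data.

```python
from collections import Counter
from datetime import date

# Illustrative activity stream; field names and values are made up.
ACTIVITIES = [
    {"user": "alice", "verb": "tweet", "service": "twitter", "day": date(2012, 3, 1)},
    {"user": "alice", "verb": "like", "service": "facebook", "day": date(2012, 3, 2)},
    {"user": "alice", "verb": "tweet", "service": "twitter", "day": date(2012, 3, 10)},
    {"user": "bob", "verb": "listen", "service": "lastfm", "day": date(2012, 3, 10)},
    {"user": "bob", "verb": "tweet", "service": "twitter", "day": date(2012, 3, 12)},
]


def count_beans(activities, key, since=None, **filters):
    """Count activities per value of `key`, after filtering and date slicing."""
    selected = (
        activity
        for activity in activities
        if (since is None or activity["day"] >= since)
        and all(activity[field] == value for field, value in filters.items())
    )
    return Counter(activity[key] for activity in selected)


print(count_beans(ACTIVITIES, "verb"))                           # all activities by verb
print(count_beans(ACTIVITIES, "verb", user="alice"))             # filtered to one user
print(count_beans(ACTIVITIES, "user", since=date(2012, 3, 10)))  # sliced by date
```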
NoTube Beancounter
Figure (Beancounter architecture diagram), components shown: REST platform; crawler; profiler; analysis engine; key value (activities, analysis, profiles)
Acknowledgements