tag-based semantic website recommendation
Presentation of the paper with the same title: http://arxiv.org/abs/1302.1596
Outline
Introduction
Related Work
Problem Definition and Algorithm
Experimental Evaluation
Conclusion
Future Work
Demo
Introduction - Definitions
Tags
Non-hierarchical keywords or terms
Reasons for tagging
Categorizing, memorizing, archiving and sharing…
Introduction - Motivation
Dramatic increase in the number of websites on the internet
7.14 billion pages
Difficulty in finding and exploring new websites
Social bookmarking
Recommendation systems
Introduction – Turkish Effect
Recommendation systems search within user inputs
Users tend to use their own language on the internet
Turkey is ranked 32nd among countries in English proficiency
Turkish and English are very different languages!
Introduction – What is proposed?
A tag-based recommendation system
For the Turkish language
Based on similarity, tag weight and tag popularity
That takes the semantic properties of tags into account
Related Work
Collaborative filtering
Widely accepted
No context!
Topic and pattern extraction
Usage of WordNet
A lexical database for the English language
Two papers on a Turkish WordNet were found, but no source was available
Related Work
Similarity calculation methods
Durao & Dolog (2009): reference paper
Tag popularity, tag representativeness and tag-user affinity
Without any semantic analysis, a 60 % acceptance level was achieved
Problem Definition
Take inputs
Websites and tags -> Recommendation System
Problem Definition
Provide personal recommendations
Recommendation System (input: websites and tags)
Problem Definition
Aim -> User satisfaction
Recommend websites that the user
Will want to use in the future, or
Is already using and finds interesting
Problem Definition
Challenge -> Different tagging purposes and expectations
Website         Tag                                  Potential Purpose
zaytung.com     zaytung                              Archiving
eksisozluk.com  alışkanlık (ENG: habit)              Internet usage habit
evekitap.com    ücretsiz kargo (ENG: free shipping)  Categorizing
9gag.com        eğlenceli (ENG: funny)               Definition
Data are taken from the experiment
Algorithm
Steps of the algorithm
Spell-check -> Stemming -> Semantics Analysis -> Similarity Calculation
Algorithm – Spell-Checking
Spell check on the tags
Add a single letter,
Delete a single letter,
Replace one letter and
Transpose two letters
Each candidate tag is then checked for occurrence in the Turkish National Corpus.
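The four edit operations and the corpus lookup can be sketched roughly as follows. This is only an illustration: `corpus_counts` is a hypothetical word-frequency dictionary standing in for the Turkish National Corpus, and picking the most frequent candidate is an assumption, since the slides do not say how ties between candidates are resolved.

```python
# Turkish alphabet (29 letters) used to generate candidate corrections.
TURKISH_ALPHABET = "abcçdefgğhıijklmnoöprsştuüvyz"


def edits1(word, alphabet=TURKISH_ALPHABET):
    """Return every string that is one edit away from `word`."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    inserts = {l + c + r for l, r in splits for c in alphabet}
    deletes = {l + r[1:] for l, r in splits if r}
    replaces = {l + c + r[1:] for l, r in splits if r for c in alphabet}
    transposes = {l + r[1] + r[0] + r[2:] for l, r in splits if len(r) > 1}
    return inserts | deletes | replaces | transposes


def correct(tag, corpus_counts):
    """Return `tag` if the corpus knows it, else the most frequent
    one-edit candidate found in the corpus (assumed tie-break rule)."""
    if tag in corpus_counts:
        return tag
    candidates = [c for c in edits1(tag) if c in corpus_counts]
    if not candidates:
        return tag  # nothing better found; keep the original tag
    return max(candidates, key=lambda c: (corpus_counts[c], c))


# Hypothetical mini corpus, for illustration only.
corpus_counts = {"haber": 120, "futbol": 80, "eğlence": 60}
print(correct("habr", corpus_counts))
```

A tag with no known one-edit neighbour is returned unchanged, so unusual tags such as site names are not forced into a dictionary word.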
Algorithm – Spell-Checking
Correction on URLs
Original URL                                       Corrected URL
https://www.deviantart.com/                        deviantart.com
http://www.sahadan.com/Default.aspx                sahadan.com
http://www.yemeksepeti.com/AnonymouseDefault.aspx  yemeksepeti.com
Data are taken from the experiment
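A minimal sketch of this kind of URL correction, assuming the rule is "keep the host name and drop the scheme, a leading www. and the path". The function name `normalize_url` is hypothetical, and the slides do not spell out the exact rule that was applied.

```python
from urllib.parse import urlsplit


def normalize_url(url):
    """Reduce a URL to its bare host name (assumed correction rule)."""
    if "://" not in url:
        # Let urlsplit recognise the host of scheme-less input.
        url = "//" + url
    host = urlsplit(url).hostname or ""
    if host.startswith("www."):
        host = host[len("www."):]
    return host


print(normalize_url("http://www.sahadan.com/Default.aspx"))
```

With this rule, different pages of the same site submitted by different users collapse into one website entry.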
Algorithm – Stemming
Stems of the tags are extracted by removing suffixes.
Website         Original Tag                    Stemmed Tag
facebook.com    arkadaşlık (ENG: friendship)    arkadaş (ENG: friend)
metu.edu.tr     mühendislik (ENG: engineering)  mühendis (ENG: engineer)
deviantart.com  eğlenceli (ENG: funny)          eğlence (ENG: fun)
Data are taken from the experiment
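Suffix removal can be illustrated with a deliberately naive sketch. The suffix list below is a small hypothetical sample covering only the examples in the table; it is not the stemmer used in the experiment, and a real Turkish stemmer would need proper morphological analysis.

```python
# Hypothetical sample of derivational suffixes (-lik and -li families).
SUFFIXES = ("lık", "lik", "luk", "lük", "lı", "li", "lu", "lü")


def naive_stem(tag, min_stem_len=3):
    """Strip the longest matching suffix once, keeping a minimum stem length."""
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if tag.endswith(suffix) and len(tag) - len(suffix) >= min_stem_len:
            return tag[: -len(suffix)]
    return tag


for tag in ("arkadaşlık", "mühendislik", "eğlenceli"):
    print(tag, "->", naive_stem(tag))
```

The minimum stem length is a guard against stripping short words down to nothing; its value here is an arbitrary choice.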
Algorithm – Semantics Analysis
An open-source «Turkish Thesaurus» project
125,022 <Word, Synonym> pairs
Algorithm – Semantics Analysis
Algorithm applied:
for each "tag" in ALL-DATA do:
    for each "synonym" of "tag" in SYNONYM-LIST do:
        if "synonym" occurs in ALL-DATA then:
            add <user, site, "synonym"> to ALL-DATA
Algorithm – Semantics Analysis
Original data (ALL-DATA)
User   Website          Tag
User1  milliyet.com.tr  haber (ENG: news)
User2  sabah.com.tr     gazete (ENG: newspaper)
Synonym List (SYNONYM-LIST)
Word               Synonym
haber (ENG: news)  gazete (ENG: newspaper)
Added data to ALL-DATA
User   Website          Tag
User1  milliyet.com.tr  gazete (ENG: newspaper)
User2  sabah.com.tr     haber (ENG: news)
Data are taken from the experiment
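The expansion step can be sketched in Python using the example rows above. Two assumptions are made here that the slides do not state: synonym pairs are treated as symmetric (otherwise the second added row would not follow from a single haber/gazete pair), and only the original rows are scanned, so newly added rows do not trigger further additions. The function name `expand_with_synonyms` is hypothetical.

```python
from collections import defaultdict


def expand_with_synonyms(all_data, synonym_pairs):
    """Return the <user, site, synonym> rows to add to ALL-DATA."""
    synonyms = defaultdict(set)
    for word, synonym in synonym_pairs:
        synonyms[word].add(synonym)
        synonyms[synonym].add(word)  # assumption: pairs are symmetric

    known_tags = {tag for _, _, tag in all_data}
    existing = set(all_data)
    added = []
    for user, site, tag in all_data:
        for synonym in sorted(synonyms.get(tag, ())):
            row = (user, site, synonym)
            # Only add synonyms that some user has already used as a tag.
            if synonym in known_tags and row not in existing:
                existing.add(row)
                added.append(row)
    return added


all_data = [
    ("User1", "milliyet.com.tr", "haber"),
    ("User2", "sabah.com.tr", "gazete"),
]
synonym_list = [("haber", "gazete")]
print(expand_with_synonyms(all_data, synonym_list))
```

Restricting additions to synonyms that already occur in the data keeps the tag vocabulary from growing with words that no user has ever typed.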
Algorithm – Semantics Analysis
Result: an environment in which every user's tags are extended with their potential meanings that other people may have already used.
Algorithm – Similarity Calculation
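\[ \text{Website Rating} = \sum_{\text{over all tags}} \left( \text{Tag Popularity} \times \text{Tag Representativeness} \right) \]
Tag Popularity: how often is this tag used?
Tag Representativeness: how much can a tag represent a document? The more a tag is used for a document, the more representative it is.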
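A small sketch of the website rating as a sum of popularity times representativeness. The slides give only the informal meaning of the two factors, so the concrete definitions below are assumptions: popularity is taken as the tag's share of all tag assignments, and representativeness as the tag's share of the assignments on the given website.

```python
from collections import Counter


def website_rating(site_tags, all_tags):
    """Sum of popularity x representativeness over the website's tags.

    site_tags: tags assigned to one website (repeats allowed).
    all_tags:  every tag assignment in the whole data set.
    """
    if not site_tags or not all_tags:
        return 0.0
    overall = Counter(all_tags)
    on_site = Counter(site_tags)
    rating = 0.0
    for tag, count in on_site.items():
        popularity = overall[tag] / len(all_tags)      # assumed definition
        representativeness = count / len(site_tags)    # assumed definition
        rating += popularity * representativeness
    return rating


all_tags = ["haber", "haber", "gazete", "futbol"]
print(website_rating(["haber", "gazete"], all_tags))
```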
Algorithm – Similarity Calculation
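\[ \text{Similarity}(a,b) = \left[ \text{Document Score}(a) + \text{Document Score}(b) \right] \times \text{Cosine Similarity}(a,b) \]
Cosine Similarity: tags are treated as vectors, and the cosine similarity between the vectors is computed.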
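The combination of document scores and cosine similarity can be sketched as below. The sketch assumes that the two document scores are summed first and the sum is then multiplied by the cosine similarity, and that each website's tag vector is simply its tag counts. The scores are passed in as plain numbers, and the function names are hypothetical.

```python
import math
from collections import Counter


def cosine_similarity(tags_a, tags_b):
    """Cosine of the angle between two tag-count vectors."""
    va, vb = Counter(tags_a), Counter(tags_b)
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity(score_a, score_b, tags_a, tags_b):
    """(score_a + score_b) x cosine similarity (assumed grouping)."""
    return (score_a + score_b) * cosine_similarity(tags_a, tags_b)


radikal = ["haber", "gündem", "güncel"]
mynet = ["portal", "genel", "haber"]
print(similarity(0.2, 0.4, radikal, mynet))
```

Two websites that share no tags get a similarity of zero regardless of their scores, because the cosine term vanishes.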
Experimental Evaluation
Call for participation -> Gather websites and tags -> Find recommendations -> Ask for evaluation
Experimental Evaluation
Call for participation: www.eksiduyuru.com
Experimental Evaluation
Gather websites and tags: bit.ly/oneri-sistemi
25 users, 122 websites, 366 tags
Experimental Evaluation
Ask for evaluation: bit.ly/oneri-sistemi-degerlendirme
20 of 25 users
Experimental Evaluation – Expected Results
Recommendation acceptance
Below 50 %: not acceptable
Above 80 %: excellent (not expected)
Experimental Evaluation – Results
For top 5 recommendations
Accepted: 72 %, Rejected: 28 %
[Figure: Accepted recommendations by each user (5 recommendations); x-axis: User (1 to 20), y-axis: Accepted recommendations (0 to 5)]
Experimental Evaluation – Results
For top 3 recommendations
Accepted: 78 %, Rejected: 22 %
[Figure: Accepted recommendations by each user (3 recommendations); x-axis: User (1 to 20), y-axis: Accepted recommendations (0 to 3)]
Conclusion
What is presented?
Turkish-language tag-based recommendation system
Based on similarity, tag weight and tag popularity
Semantic properties of tags are taken into account
Conclusion
Main contribution
Combining well-known similarity measures and calculations with Turkish semantics analysis
Conclusion
Evaluation
An experiment with 25 people
Participants provide websites and tags
Then evaluate recommendations
Future Work
Pre-processing Stage
English inputs
Turkish inputs typed with English letters
Translation of, or control over, such inputs
Site            Tags
yandex.com      harita, e-mail, arama
eksiduyuru.com  duyuru, alinik, satilik
Future Work
Semantic Analysis
The synonym list is small
125,022 <word, synonym> pairs
A larger and more comprehensive thesaurus is needed
Demo
Two users from the experiment
Demo
Website Tags
http://candasdemir.com pazarlama, kişisel, blog
http://radikal.com.tr haber, gündem, güncel
http://mynet.com portal, genel, haber
http://sahibinden.com alışveriş, market, sahibinden
http://markafoni.com moda, e-ticaret, alışveriş
Added after semantics analysis
mynet.com       bilgi
mynet.com       gazete
radikal.com.tr  bilgi
radikal.com.tr  gazete
Demo
User Inputs
Website
candasdemir.com
radikal.com.tr
mynet.com
sahibinden.com
markafoni.com
Recommended Websites
Website             User Satisfaction
eksisozluk.com      Accepted
zaytung.com         Accepted
sabah.com.tr        Accepted
ntvmsnbc.com        Accepted
golfdunyasi.com.tr  Not Accepted
Demo
Website Tags
http://www.sahadan.com/Default.aspx eğlence, merak, futbol
http://www.erepublik.com/ iletişim, strateji, oyun
http://www.1907unifeb.org/forums fenerbahçe, sohbet, eğlence
http://ligtv.com.tr/ maç özetleri, haber, futbol
https://www.tuttur.com/ para, futbol, eğlence
Added after semantics analysis
ligtv.com.tr  bilgi
ligtv.com.tr  gazete
Demo
User Inputs
Website
sahadan.com
erepublik.com
1907unifeb.org/forums
ligtv.com.tr
tuttur.com
Recommended Websites
Website             User Satisfaction
mackolik.com        Accepted
zaytung.com         Accepted
9gag.com            Accepted
dizi-mag.com        Accepted
galatasaray.com.tr  Not Accepted
References
Adrian, B., Sauermann, L., & Roth-Berghofer, T. (2007). ConTag: A Semantic Tag Recommendation System. Proceedings of I-SEMANTICS '07.
Aksan, Y. et al. (2012). Construction of the Turkish National Corpus (TNC). In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC 2012). İstanbul, Türkiye. http://www.lrec-conf.org/proceedings/lrec2012/papers.html
Brill, E., & Moore, R. C. (2000). An Improved Error Model for Noisy Channel Spelling Correction. Microsoft Research.
Cattuto, C., Benz, D., Hotho, A., & Stumme, G. (2008). Semantic Grounding of Tag Relatedness in Social Bookmarking Systems. In The Semantic Web - ISWC 2008. Springer.
Durao, F., & Dolog, P. (2009). A Personalized Tag-based Recommendation in Social Web Systems. International Workshop on Adaptation and Personalization for Web 2.0.
Education First (2012). EF EPI Country Rankings.
Frankfurt International School (2001). The Differences Between English and Turkish.
ISPA (Investment Support and Promotion Agency) of Turkey (2010). Turkish Information and Communication Technologies Industry. Deloitte.
Nakamoto, R., Nakajima, S., Miyazaki, J., & Uemura, S. (2007). Tag-based Contextual Collaborative Filtering. IAENG International Journal of Computer Science.
Özbek, A. (2012). Türkçe Eşanlamlı Kelimeler Sözlüğü Projesi (Turkish Thesaurus Project). Retrieved from http://github.com/maidis/mythes-tr
Thank you!