tag-based semantic website recommendation

«Tag-based Semantic Website Recommendation for Turkish Language» Onur Yılmaz [email protected]

Upload: onur-yilmaz

Post on 22-May-2015

DESCRIPTION

Presentation of the paper with the same title: http://arxiv.org/abs/1302.1596

TRANSCRIPT

Page 1: Tag-based Semantic Website Recommendation

«Tag-based Semantic Website Recommendation for Turkish Language»

Onur Yılmaz

[email protected]

Page 2: Tag-based Semantic Website Recommendation

Outline

Introduction

Related Work

Problem Definition and Algorithm

Experimental Evaluation

Conclusion

Future Work

Demo

Page 3: Tag-based Semantic Website Recommendation

Introduction - Definitions

Tags

Non-hierarchical keywords or terms

Reasons for tagging

Categorizing, memorizing, archiving and sharing…

Page 4: Tag-based Semantic Website Recommendation

Introduction - Motivation

Dramatic increase in the number of websites on the internet

7.14 billion pages

Difficulty in finding and exploring new websites

Social bookmarking

Recommendation systems

Page 5: Tag-based Semantic Website Recommendation

Introduction – Turkish Effect

Recommendation systems search within user inputs

Users tend to use their own language on the internet

Turkey ranks 32nd among countries in English proficiency

Turkish and English are very different languages!

Page 6: Tag-based Semantic Website Recommendation

Introduction – What is proposed?

A tag-based recommendation system

For the Turkish language

Which is based on similarity, tag weight and tag popularity;

Where the semantic properties of tags are taken into account

Page 7: Tag-based Semantic Website Recommendation

Related Work

Collaborative filtering

Widely accepted

No context!

Topic and pattern extraction

Usage of WordNet

A lexical database for the English language

Two papers were found on a Turkish WordNet, but no source is available

Page 8: Tag-based Semantic Website Recommendation

Related Work

Similarity calculation methods

Durao & Dolog (2009): the reference paper

Tag popularity, tag representativeness and tag-user affinity

Without any semantic analysis, a 60% acceptance level was achieved

Page 9: Tag-based Semantic Website Recommendation

Problem Definition

Take inputs

Recommendation System

Websites and tags

Page 10: Tag-based Semantic Website Recommendation

Problem Definition

Provide personal recommendations

Recommendation System

Websites and tags

Page 11: Tag-based Semantic Website Recommendation

Problem Definition

Aim -> User satisfaction

Recommend websites that the user

wants to use in the future, or

is already using and finds interesting

Page 12: Tag-based Semantic Website Recommendation

Problem Definition

Challenge -> Different tagging purposes and expectations

Website         Tag                                  Potential Purpose
zaytung.com     zaytung                              Archiving
eksisozluk.com  alışkanlık (ENG: habit)              Internet usage habit
evekitap.com    ücretsiz kargo (ENG: free shipping)  Categorizing
9gag.com        eğlenceli (ENG: funny)               Definition

Data are taken from the experiment

Page 13: Tag-based Semantic Website Recommendation

Algorithm

Steps of the algorithm

Spell-check → Stemming → Semantics Analysis → Similarity Calculation

Page 14: Tag-based Semantic Website Recommendation

Algorithm – Spell-Checking

Spell check on the tags

Add a single letter,

Delete a single letter,

Replace one letter and

Transpose two letters

The resulting candidate tags are checked for whether they occur in the Turkish National Corpus.
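The four edit operations on this slide can be sketched in Python as follows. This is a minimal illustration, not the author's implementation: the names `edits1`, `correct` and `corpus_counts` are hypothetical, the small frequency dictionary merely stands in for the Turkish National Corpus, transpositions are limited to adjacent letters, and picking the most frequent known candidate is an assumption (the slide only says candidates are checked for occurrence in the corpus).

```python
# Lowercase Turkish alphabet (29 letters).
TURKISH_LETTERS = "abcçdefgğhıijklmnoöprsştuüvyz"


def edits1(word: str) -> set[str]:
    """Return every string that is one edit away from `word`."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    inserts = {l + c + r for l, r in splits for c in TURKISH_LETTERS}
    deletes = {l + r[1:] for l, r in splits if r}
    replaces = {l + c + r[1:] for l, r in splits if r for c in TURKISH_LETTERS}
    # Assumption: only adjacent letters are transposed.
    transposes = {l + r[1] + r[0] + r[2:] for l, r in splits if len(r) > 1}
    return inserts | deletes | replaces | transposes


def correct(tag: str, corpus_counts: dict[str, int]) -> str:
    """Keep known tags; otherwise pick the most frequent known candidate."""
    if tag in corpus_counts:
        return tag
    known = [w for w in edits1(tag) if w in corpus_counts]
    # Assumption: frequency breaks ties; unknown tags are left unchanged.
    return max(known, key=corpus_counts.get) if known else tag


# Hypothetical stand-in for Turkish National Corpus word frequencies.
corpus_counts = {"haber": 120, "eğlence": 80, "futbol": 95}

print(correct("habre", corpus_counts))  # transposition repaired
print(correct("futbl", corpus_counts))  # missing letter restored
```

A tag with no known candidate within one edit is returned unchanged, so the step never discards user input.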

Page 15: Tag-based Semantic Website Recommendation

Algorithm – Spell-Checking

Correction on URLs

Original URL                                       Corrected URL
https://www.deviantart.com/                        deviantart.com
http://www.sahadan.com/Default.aspx                sahadan.com
http://www.yemeksepeti.com/AnonymouseDefault.aspx  yemeksepeti.com

Data are taken from the experiment
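One way to reproduce the three corrections on this slide is to keep only the host name and drop a leading "www.". The sketch below uses Python's standard `urllib.parse`; the function name `normalize_url` is hypothetical, and reducing every URL to its bare host is an assumption drawn from these three examples only, so the author's actual rule may differ.

```python
from urllib.parse import urlparse


def normalize_url(url: str) -> str:
    """Reduce a URL to its host name, without scheme, 'www.' prefix or path."""
    # urlparse only recognises the host when a scheme or '//' is present.
    parsed = urlparse(url if "//" in url else "//" + url)
    host = parsed.hostname or ""
    return host[4:] if host.startswith("www.") else host


for original in (
    "https://www.deviantart.com/",
    "http://www.sahadan.com/Default.aspx",
    "http://www.yemeksepeti.com/AnonymouseDefault.aspx",
):
    print(original, "->", normalize_url(original))
```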

Page 16: Tag-based Semantic Website Recommendation

Algorithm

Steps of the algorithm

Spell-check → Stemming → Semantics Analysis → Similarity Calculation

Page 17: Tag-based Semantic Website Recommendation

Algorithm – Stemming

Stems of the tags are extracted by removing suffixes.

Website         Original Tag                    Corrected Tag
facebook.com    arkadaşlık (ENG: friendship)    arkadaş (ENG: friend)
metu.edu.tr     mühendislik (ENG: engineering)  mühendis (ENG: engineer)
deviantart.com  eğlenceli (ENG: funny)          eğlence (ENG: fun)

Data are taken from the experiment
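Suffix removal can be illustrated with a deliberately naive sketch. The slides do not say which stemmer was used, so everything here is an assumption: `naive_stem`, the short suffix list and the minimum stem length are hypothetical, and the list covers only the endings that appear in the three examples from this slide. Real Turkish morphology needs a proper morphological analyzer.

```python
# Toy suffix list: only the -lık/-lik/-luk/-lük and -lı/-li/-lu/-lü endings.
# Longer suffixes come first so that "lik" is tried before "li".
SUFFIXES = ("lık", "lik", "luk", "lük", "lı", "li", "lu", "lü")
MIN_STEM_LENGTH = 3  # assumption: avoid stripping very short words


def naive_stem(tag: str) -> str:
    """Strip at most one known suffix from the end of `tag`."""
    for suffix in SUFFIXES:
        if tag.endswith(suffix) and len(tag) - len(suffix) >= MIN_STEM_LENGTH:
            return tag[: -len(suffix)]
    return tag


for tag in ("arkadaşlık", "mühendislik", "eğlenceli", "futbol"):
    print(tag, "->", naive_stem(tag))
```

Because only one suffix is stripped and short stems are protected, this sketch errs on the side of leaving a tag unchanged rather than over-stemming it.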

Page 18: Tag-based Semantic Website Recommendation

Algorithm

Steps of the algorithm

Spell-check → Stemming → Semantics Analysis → Similarity Calculation

Page 19: Tag-based Semantic Website Recommendation

Algorithm – Semantics Analysis

An open source «Turkish Thesaurus» project

125,022 <Word, Synonym> pairs

Page 20: Tag-based Semantic Website Recommendation

Algorithm – Semantics Analysis

Algorithm applied:

for each "tag" in ALL-DATA do:
    for each "synonym" of "tag" in SYNONYM-LIST do:
        if "synonym" occurs in ALL-DATA then:
            add <user, site, "synonym"> to ALL-DATA

Page 21: Tag-based Semantic Website Recommendation

Algorithm – Semantics Analysis

Original data (ALL-DATA)

User   Website          Tag
User1  milliyet.com.tr  haber (ENG: news)
User2  sabah.com.tr     gazete (ENG: newspaper)

Synonym List (SYNONYM-LIST)

Word               Synonym
haber (ENG: news)  gazete (ENG: newspaper)

Added data to ALL-DATA

User   Website          Tag
User1  milliyet.com.tr  gazete (ENG: newspaper)
User2  sabah.com.tr     haber (ENG: news)

Data are taken from the experiment
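The synonym expansion and the example on this slide can be sketched in Python. The function name `expand_with_synonyms` and the tuple layout are hypothetical. Two assumptions are made: synonymy is treated as symmetric (the example adds "haber" for User2 although the list only contains the pair haber, gazete), and a synonym counts as "occurring in ALL-DATA" only if it was already a tag before expansion started.

```python
def expand_with_synonyms(all_data, synonym_pairs):
    """Add <user, site, synonym> triples for synonyms already used as tags."""
    # Assumption: synonymy is symmetric, so index each pair in both directions.
    synonyms = {}
    for word, synonym in synonym_pairs:
        synonyms.setdefault(word, set()).add(synonym)
        synonyms.setdefault(synonym, set()).add(word)

    # Assumption: only tags present in the original data count as "occurring".
    existing_tags = {tag for _, _, tag in all_data}

    expanded = set(all_data)
    for user, site, tag in all_data:
        for synonym in synonyms.get(tag, ()):
            if synonym in existing_tags:
                expanded.add((user, site, synonym))
    return expanded


all_data = {
    ("User1", "milliyet.com.tr", "haber"),
    ("User2", "sabah.com.tr", "gazete"),
}
synonym_pairs = [("haber", "gazete")]

added = expand_with_synonyms(all_data, synonym_pairs) - all_data
for triple in sorted(added):
    print(triple)
```

Run on the slide's data, the two added triples match the "Added data to ALL-DATA" table: gazete for User1's milliyet.com.tr and haber for User2's sabah.com.tr.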

Page 22: Tag-based Semantic Website Recommendation

Algorithm – Semantics Analysis

Result: an environment in which every user's tags are accompanied by their potential meanings, drawn from tags that other people may already have used.

Page 23: Tag-based Semantic Website Recommendation

Algorithm

Steps of the algorithm

Spell-check → Stemming → Semantics Analysis → Similarity Calculation

Page 24: Tag-based Semantic Website Recommendation

Algorithm – Similarity Calculation

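$$\text{Website Rating} = \sum_{\text{over all tags}} \text{Tag Popularity} \times \text{Tag Representativeness}$$

Tag Popularity: How often is this tag used?

Tag Representativeness: How much can a tag represent a document? The more it is used for the document, the more representative it is.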
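A small sketch of the website rating, summing popularity times representativeness over a site's tags. The slide describes both factors only qualitatively, so the concrete definitions are assumptions: popularity is taken as the number of times a tag is used across all data, and representativeness as the share of a site's tag assignments that use that tag. The function name `website_rating` and the `*.example` sites are hypothetical.

```python
from collections import Counter


def website_rating(site, triples):
    """Sum of popularity(tag) * representativeness(tag, site) over the site's tags."""
    # Assumed popularity: how often the tag is used anywhere in the data.
    popularity = Counter(tag for _, _, tag in triples)
    # Assumed representativeness: share of this site's tag assignments.
    site_tags = Counter(tag for _, s, tag in triples if s == site)
    total = sum(site_tags.values())
    if total == 0:
        return 0.0
    return sum(popularity[tag] * count / total for tag, count in site_tags.items())


# Hypothetical <user, site, tag> triples.
triples = [
    ("u1", "a.example", "haber"),
    ("u2", "a.example", "haber"),
    ("u2", "a.example", "futbol"),
    ("u3", "b.example", "haber"),
]

print(website_rating("a.example", triples))
print(website_rating("b.example", triples))
```

In this toy data "haber" is used three times overall, so a site tagged only with "haber" scores higher than one whose tags are split between "haber" and the rarer "futbol".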

Page 25: Tag-based Semantic Website Recommendation

Algorithm – Similarity Calculation

$$\text{Similarity}(a,b) = \left[\text{Document Score}(a) + \text{Document Score}(b)\right] \times \text{Cosine Similarity}(a,b)$$

Cosine Similarity: tags as vectors, cosine similarity between the vectors
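The similarity on this slide combines the two document scores with the cosine similarity of the tag vectors. The sketch below assumes the sum of the two scores is multiplied by the cosine similarity, and that each site's tags form a simple count vector. The function names and the example scores are hypothetical; the tag lists are borrowed from the demo slides.

```python
import math
from collections import Counter


def cosine_similarity(tags_a, tags_b):
    """Cosine of the angle between two tag-count vectors."""
    va, vb = Counter(tags_a), Counter(tags_b)
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similarity(score_a, score_b, tags_a, tags_b):
    """Assumed grouping: (score_a + score_b) * cosine similarity."""
    return (score_a + score_b) * cosine_similarity(tags_a, tags_b)


radikal_tags = ["haber", "gündem", "güncel"]
mynet_tags = ["portal", "genel", "haber"]

# Hypothetical document scores for the two sites.
print(similarity(2.0, 1.0, radikal_tags, mynet_tags))
```

The two sites share one of their three tags, so the cosine similarity is about one third, and two sites with no tag in common get a similarity of zero regardless of their scores.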

Page 26: Tag-based Semantic Website Recommendation

Experimental Evaluation

Call for participation

Gather websites and tags

Find recommendations

Ask for evaluation

Page 27: Tag-based Semantic Website Recommendation

Experimental Evaluation

Call for participation

Gather websites and tags

Find recommendations

Ask for evaluation

www.eksiduyuru.com

Page 28: Tag-based Semantic Website Recommendation

Experimental Evaluation

Call for participation

Gather websites and tags

Find recommendations

Ask for evaluation

bit.ly/oneri-sistemi

25 users, 122 websites, 366 tags

Page 29: Tag-based Semantic Website Recommendation

Experimental Evaluation

Call for participation

Gather websites and tags

Find recommendations

Ask for evaluation

bit.ly/oneri-sistemi-degerlendirme

20 of 25 Users

Page 30: Tag-based Semantic Website Recommendation

Experimental Evaluation – Expected Results

[Figure: Recommendation Acceptance scale with thresholds at 50% and 80%; below 50%: Not acceptable; above 80%: Excellent (not expected)]

Page 31: Tag-based Semantic Website Recommendation

Experimental Evaluation – Results

For top 5 recommendations

72% Accepted, 28% Rejected

[Figure: Accepted recommendations by each user (5 Recommendations); x-axis: User (1–20), y-axis: Accepted Recomm. (0–5)]

Page 32: Tag-based Semantic Website Recommendation

Experimental Evaluation – Results

For top 3 recommendations

78% Accepted, 22% Rejected

[Figure: Accepted recommendations by each user (3 Recommendations); x-axis: User (1–20), y-axis: Accepted Recomm. (0–3)]

Page 33: Tag-based Semantic Website Recommendation

Conclusion

What is presented?

Turkish-language tag-based recommendation system

Based on similarity, tag weight, tag popularity

Semantic properties of tags are taken into account

Page 34: Tag-based Semantic Website Recommendation

Conclusion

Main contribution

Combining

Well-known similarity measures and calculations

Turkish semantics analysis

Page 35: Tag-based Semantic Website Recommendation

Conclusion

Evaluation

An experiment with 25 people

Participants provide websites and tags

Then evaluate recommendations

Page 36: Tag-based Semantic Website Recommendation

Future Work

Pre-processing Stage

English inputs

Turkish inputs with English letters

Translation or control over them

Site            Tags
yandex.com      harita, e-mail, arama
eksiduyuru.com  duyuru, alinik, satilik

Page 37: Tag-based Semantic Website Recommendation

Future Work

Semantic Analysis

The synonym list is small

125,022 <word, synonym> pairs

A larger and more comprehensive thesaurus is needed

Page 38: Tag-based Semantic Website Recommendation

Demo

2 users from experiment

Page 39: Tag-based Semantic Website Recommendation

Demo

Website Tags

http://candasdemir.com pazarlama, kişisel, blog

http://radikal.com.tr haber, gündem, güncel

http://mynet.com portal, genel, haber

http://sahibinden.com alışveriş, market, sahibinden

http://markafoni.com moda, e-ticaret, alışveriş

[email protected]

Page 40: Tag-based Semantic Website Recommendation

Demo

Website                 Tags
http://candasdemir.com  pazarlama, kişisel, blog
http://radikal.com.tr   haber, gündem, güncel
http://mynet.com        portal, genel, haber
http://sahibinden.com   alışveriş, market, sahibinden
http://markafoni.com    moda, e-ticaret, alışveriş

Added after semantics analysis

mynet.com       bilgi
mynet.com       gazete
radikal.com.tr  bilgi
radikal.com.tr  gazete

[email protected]

Page 41: Tag-based Semantic Website Recommendation

Demo

User Inputs

Website
candasdemir.com
radikal.com.tr
mynet.com
sahibinden.com
markafoni.com

Recommended Websites

Website             User Satisfaction
eksisozluk.com      Accepted
zaytung.com         Accepted
sabah.com.tr        Accepted
ntvmsnbc.com        Accepted
golfdunyasi.com.tr  Not Accepted

[email protected]

Page 42: Tag-based Semantic Website Recommendation

Demo

Website Tags

http://www.sahadan.com/Default.aspx eğlence, merak, futbol

http://www.erepublik.com/ iletişim, strateji, oyun

http://www.1907unifeb.org/forums fenerbahçe, sohbet, eğlence

http://ligtv.com.tr/ maç özetleri, haber, futbol

https://www.tuttur.com/ para, futbol, eğlence

[email protected]

Page 43: Tag-based Semantic Website Recommendation

Demo

Website Tags

http://www.sahadan.com/Default.aspx eğlence, merak, futbol

http://www.erepublik.com/ iletişim, strateji, oyun

http://www.1907unifeb.org/forums fenerbahçe, sohbet, eğlence

http://ligtv.com.tr/ maç özetleri, haber, futbol

https://www.tuttur.com/ para, futbol, eğlence

ligtv.com.tr bilgi

ligtv.com.tr gazete

[email protected]

Added after

semantics

analysis

Page 44: Tag-based Semantic Website Recommendation

Demo

User Inputs

Website
sahadan.com
erepublik.com
1907unifeb.org/forums
ligtv.com.tr
tuttur.com

Recommended Websites

Website             User Satisfaction
mackolik.com        Accepted
zaytung.com         Accepted
9gag.com            Accepted
dizi-mag.com        Accepted
galatasaray.com.tr  Not Accepted

[email protected]

Page 45: Tag-based Semantic Website Recommendation

References

Adrian, B., Sauermann, L., & Roth-Berghofer, T. (2007). ConTag: A Semantic Tag Recommendation System. Proceedings of I-Semantics '07.

Aksan, Y. et al. (2012). Construction of the Turkish National Corpus (TNC). In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC 2012). İstanbul, Türkiye. http://www.lrec-conf.org/proceedings/lrec2012/papers.html

Brill, E., & Moore, R. C. (2000). An Improved Error Model for Noisy Channel Spelling Correction. Microsoft Research.

Cattuto, C., Benz, D., Hotho, A., & Stumme, G. (2008). Semantic Grounding of Tag Relatedness in Social Bookmarking Systems. In The Semantic Web - ISWC 2008. Springer.

Durao, F., & Dolog, P. (2009). A Personalized Tag-based Recommendation in Social Web Systems. International Workshop on Adaptation and Personalization for Web 2.0.

Page 46: Tag-based Semantic Website Recommendation

References

Education First (2012). EF EPI Country Rankings.

Frankfurt International School (2001). The Differences Between English and Turkish.

ISPA (Investment Support and Promotion Agency) of Turkey (2010). Turkish Information and Communication Technologies Industry. Deloitte.

Nakamoto, R., Nakajima, S., Miyazaki, J., & Uemura, S. (2007). Tag-based Contextual Collaborative Filtering. IAENG International Journal of Computer Science.

Özbek, A. (2012). Türkçe Eşanlamlı Kelimeler Sözlüğü Projesi (Turkish Thesaurus Project). Retrieved from http://github.com/maidis/mythes-tr

Page 47: Tag-based Semantic Website Recommendation

Thank you!