content-based recommendationpeterb/2480-012/contentbasedfiltering.pdf · content-based? • item...

40
Content-Based Recommendation

Upload: phamphuc

Post on 04-Feb-2018

217 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Content-Based Recommendationpeterb/2480-012/ContentBasedFiltering.pdf · Content-based? • Item descriptions to identify items that are of particular interest to the user

Content-Based Recommendation

Page 2: Content-Based Recommendationpeterb/2480-012/ContentBasedFiltering.pdf · Content-based? • Item descriptions to identify items that are of particular interest to the user

Content-based?

•  Item descriptions to identify items that are of particular interest to the user

Page 3: Content-Based Recommendationpeterb/2480-012/ContentBasedFiltering.pdf · Content-based? • Item descriptions to identify items that are of particular interest to the user

Example

Page 4: Content-Based Recommendationpeterb/2480-012/ContentBasedFiltering.pdf · Content-based? • Item descriptions to identify items that are of particular interest to the user

Example

Page 5: Content-Based Recommendationpeterb/2480-012/ContentBasedFiltering.pdf · Content-based? • Item descriptions to identify items that are of particular interest to the user

Comparing with Non-content based

•  User-based CF�Searches for similar users in user-item “rating” matrix

•  No rating

•  Item-feature matrix

Ratings

Items

Page 6: Content-Based Recommendationpeterb/2480-012/ContentBasedFiltering.pdf · Content-based? • Item descriptions to identify items that are of particular interest to the user

Item Representation

•  Structured

•  Unstructured

•  Semi-structured

Page 7: Content-Based Recommendationpeterb/2480-012/ContentBasedFiltering.pdf · Content-based? • Item descriptions to identify items that are of particular interest to the user

Structured •  Attribute - value

Page 8: Content-Based Recommendationpeterb/2480-012/ContentBasedFiltering.pdf · Content-based? • Item descriptions to identify items that are of particular interest to the user

Unstructured

•  Full-text

• No attributes formally defined

• Other complicated problems - such as synonymy, polysymy

Page 9: Content-Based Recommendationpeterb/2480-012/ContentBasedFiltering.pdf · Content-based? • Item descriptions to identify items that are of particular interest to the user

Semi-structured

•  Structured + unstructured

• Well defined attributes/values + free text

Page 10: Content-Based Recommendationpeterb/2480-012/ContentBasedFiltering.pdf · Content-based? • Item descriptions to identify items that are of particular interest to the user

Conversion of Unstructured Data

• Need to convert to structured form

•  IR techniques

•  VSM, TF-IDF, stemming, etc.

Page 11: Content-Based Recommendationpeterb/2480-012/ContentBasedFiltering.pdf · Content-based? • Item descriptions to identify items that are of particular interest to the user

User Profile

•  A model of user’s preference

•  A history of the user’s interactions

•  Recently viewed items

•  Already viewed items

•  Training data — machine learning

Page 12: Content-Based Recommendationpeterb/2480-012/ContentBasedFiltering.pdf · Content-based? • Item descriptions to identify items that are of particular interest to the user

User Profile Tear Rate

Reduced Normal

No Age Young Pre-presbyoptic

Yes Prescription

Presbyoptic

Yes

Hypermetrope Myope

Astigmatic Yes No

Yes No

Page 13: Content-Based Recommendationpeterb/2480-012/ContentBasedFiltering.pdf · Content-based? • Item descriptions to identify items that are of particular interest to the user

User Model

Collects information about individual user

Provides adaptation effect

Adaptive System

User Modeling side

Adaptation side

User profiling == user modeling

User Profiling

Page 14: Content-Based Recommendationpeterb/2480-012/ContentBasedFiltering.pdf · Content-based? • Item descriptions to identify items that are of particular interest to the user

User Profile

•  Customization

• Manual

•  Limitations — user efforts (interest change), no ordering

•  Rule-based recommendation — history based rules

Page 15: Content-Based Recommendationpeterb/2480-012/ContentBasedFiltering.pdf · Content-based? • Item descriptions to identify items that are of particular interest to the user

User Profile

Page 16: Content-Based Recommendationpeterb/2480-012/ContentBasedFiltering.pdf · Content-based? • Item descriptions to identify items that are of particular interest to the user

User Model Learning

•  Classification problem

•  Classifying to “Like” or “Dislike” •  Training data — feedbacks

•  Probability of classification

•  Unstructured data conversion

•  Feature selection — high/low dimensions

Page 17: Content-Based Recommendationpeterb/2480-012/ContentBasedFiltering.pdf · Content-based? • Item descriptions to identify items that are of particular interest to the user

User Model Learning

Train/Learn Classify/Recommend

Training Data Target Data

Page 18: Content-Based Recommendationpeterb/2480-012/ContentBasedFiltering.pdf · Content-based? • Item descriptions to identify items that are of particular interest to the user

UM Learning�Feedbacks

•  Implicit feedback

•  Indirect interaction

• Opened document, Reading time, etc.

•  Large data, high uncertainty

Page 19: Content-Based Recommendationpeterb/2480-012/ContentBasedFiltering.pdf · Content-based? • Item descriptions to identify items that are of particular interest to the user

UM Learning�Feedbacks

•  Explicit

•  Directly from users

•  No noise, hard to obtain

Page 20: Content-Based Recommendationpeterb/2480-012/ContentBasedFiltering.pdf · Content-based? • Item descriptions to identify items that are of particular interest to the user

User Model Learning�Feature Selection

•  Problem of high dimensional input vectors

• Overfit�(especially when a dataset is small)

•  Document frequency thresholding, Information gain, Mutual information, Chi square statistic, Term strength

Page 21: Content-Based Recommendationpeterb/2480-012/ContentBasedFiltering.pdf · Content-based? • Item descriptions to identify items that are of particular interest to the user

Overfitting

Overfit Underfit

Page 22: Content-Based Recommendationpeterb/2480-012/ContentBasedFiltering.pdf · Content-based? • Item descriptions to identify items that are of particular interest to the user

User Model Learning�Feature Selection

• Mutual Information •  A = number of times t and c co-occur�

B = number of times t occurs without c�C = number of times c occurs without t�N = number of total documents

I(t,c) = logA ! N

(A + C) ! (A + B)

Page 23: Content-Based Recommendationpeterb/2480-012/ContentBasedFiltering.pdf · Content-based? • Item descriptions to identify items that are of particular interest to the user

User Model Learning�Feature Selection

•  “Austrian train fire accident” •  After learning 5 documents

Train Fire

Alps Austria People

A 5 5 2 5 5 B 5873 8092 93 974 34501 C 0 0 3 0 0 N 96260 96260 96260 96260 96260

Page 24: Content-Based Recommendationpeterb/2480-012/ContentBasedFiltering.pdf · Content-based? • Item descriptions to identify items that are of particular interest to the user

Decision Tree

•  Partitioning dataset into trees

•  Ideal for structured, small data

•  Performance, simplicity, understandability

•  ID3 (Iterative Dichotomiser 3)

Page 25: Content-Based Recommendationpeterb/2480-012/ContentBasedFiltering.pdf · Content-based? • Item descriptions to identify items that are of particular interest to the user

Decision Tree�Example

•  Using WEKA

•  http://www.cs.waikato.ac.nz/ml/weka/

• Machine learning algorithm package

•  JAVA API, interactive UI

Page 26: Content-Based Recommendationpeterb/2480-012/ContentBasedFiltering.pdf · Content-based? • Item descriptions to identify items that are of particular interest to the user

Decision Tree�Example - dataset Training Testing

attribute to be estimated attribute as inputs

Page 27: Content-Based Recommendationpeterb/2480-012/ContentBasedFiltering.pdf · Content-based? • Item descriptions to identify items that are of particular interest to the user

Decision Tree�Example - tree

Page 28: Content-Based Recommendationpeterb/2480-012/ContentBasedFiltering.pdf · Content-based? • Item descriptions to identify items that are of particular interest to the user
Page 29: Content-Based Recommendationpeterb/2480-012/ContentBasedFiltering.pdf · Content-based? • Item descriptions to identify items that are of particular interest to the user

Decision Tree�Example - tree

Tear Rate Reduced Normal

Age Young Pre-presbyoptic

Yes Yes

Prescription Presbyoptic

Yes

Hypermetrope Myope

Astigmatic Yes No

Yes No

Page 30: Content-Based Recommendationpeterb/2480-012/ContentBasedFiltering.pdf · Content-based? • Item descriptions to identify items that are of particular interest to the user

Decision Tree�Example - evaluation

Page 31: Content-Based Recommendationpeterb/2480-012/ContentBasedFiltering.pdf · Content-based? • Item descriptions to identify items that are of particular interest to the user

k Nearest Neighbor

•  Prepare training data (classification labels)

•  Extract k most similar items

•  Decide the label of the test data by looking at kNN’s

Page 32: Content-Based Recommendationpeterb/2480-012/ContentBasedFiltering.pdf · Content-based? • Item descriptions to identify items that are of particular interest to the user

k Nearest Neighbor�Example

k=3 k=5

Page 33: Content-Based Recommendationpeterb/2480-012/ContentBasedFiltering.pdf · Content-based? • Item descriptions to identify items that are of particular interest to the user

Training data

Linear Classifier

•  Tries to find out a hyperplane that best separates classes

•  Gradient Descent - incremental vector

Page 34: Content-Based Recommendationpeterb/2480-012/ContentBasedFiltering.pdf · Content-based? • Item descriptions to identify items that are of particular interest to the user

Linear Classifier�SVN

•  Support Vector Machine

• Maximizes the distance between decision boundary & support vector (closest training instance)

•  Avoids overfitting

Page 35: Content-Based Recommendationpeterb/2480-012/ContentBasedFiltering.pdf · Content-based? • Item descriptions to identify items that are of particular interest to the user

Linear Classifier�SVM

•  http://svm.dcs.rhbnc.ac.uk/pagesnew/

Page 36: Content-Based Recommendationpeterb/2480-012/ContentBasedFiltering.pdf · Content-based? • Item descriptions to identify items that are of particular interest to the user

Naive Bayes

•  VSM — lack of theoretical justification

•  Probabilistic text classification method

• Naive = term independence

•  Probability document d is classified to category c

Page 37: Content-Based Recommendationpeterb/2480-012/ContentBasedFiltering.pdf · Content-based? • Item descriptions to identify items that are of particular interest to the user

Naive Bayes • Multivariate Bernoulli

•  Document probability = Π of term probability �→ term independence assumption �→ “naive”

Page 38: Content-Based Recommendationpeterb/2480-012/ContentBasedFiltering.pdf · Content-based? • Item descriptions to identify items that are of particular interest to the user

Naive Bayes

• Multinomial

• Non-binary

Page 39: Content-Based Recommendationpeterb/2480-012/ContentBasedFiltering.pdf · Content-based? • Item descriptions to identify items that are of particular interest to the user

Naive Bayes�Example

Page 40: Content-Based Recommendationpeterb/2480-012/ContentBasedFiltering.pdf · Content-based? • Item descriptions to identify items that are of particular interest to the user

Conclusions

•  User model learns from content (description, fulltext, etc) itself

•  Implicit, explicit method

•  Classifying — like, dislike

•  Limitations — not enough content

•  Hybrid — content + collaborative