2006-10-09 - kes2006 - dawn · 9 oct 2006 kes 2006 - sebastian stober : dawn 3 motivation goal:...

DAWN - A System for Context-based Link Recommendation in Web Navigation

Sebastian Stober and Andreas Nürnberger
Institute of Language and Knowledge Engineering
Otto-von-Guericke University Magdeburg, Germany
http://irgroup.cs.uni-magdeburg.de
e-mail: {stober,nuernb}@iws.cs.uni-magdeburg.de

KES2006, 10th International Conference on Knowledge-Based & Intelligent Information & Engineering Systems
October 9, 2006



Outline

• Motivation
• Underlying Model
  - General Assumptions
  - Problems using Markov Models
  - Generalization
  - Graph Representation
• Recommender Algorithm
• Test Results, Work in Progress, Future Work
• Summary

“DAWN” (Direction Anticipation in Web Navigation)


Motivation

• Goal: a system that supports navigating the World Wide Web
  - by recommending outgoing links of a web page
  - without confinement to a specific web site or domain
• Approaches so far include:
  - a look-ahead crawler that finds pages with content similar to the last (n) pages [Lieberman’95:Letizia]
  - recommending pages (within the web site) that users with similar interests or information needs have visited [Joachims et al.’97:WebWatcher]
• Idea: users may (unconsciously) develop navigational patterns for web browsing
  - learn these patterns and use them as a heuristic for recommendation


Modeling Navigational Patterns

• Assumption: the browsing process has the Markov property, i.e.:
  - the (conditional) probability of the next link chosen depends only on the last n visited web pages
  - web pages visited further back in the history have no impact on the choice of the next link
• Successfully applied in
  - web-cache optimization (predictive prefetching)
  - web-usage mining
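The Markov assumption above can be illustrated with a minimal sketch. It is not DAWN's implementation: the function names `fit_markov` and `next_page_distribution` and the toy sessions are hypothetical. The sketch estimates the next-page distribution by counting which page follows each window of the last n pages.

```python
from collections import Counter, defaultdict

def fit_markov(sequences, n):
    """Count which page follows each window of n consecutive pages."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for i in range(n, len(seq)):
            history = tuple(seq[i - n:i])
            counts[history][seq[i]] += 1
    return counts

def next_page_distribution(counts, history, n):
    """Relative frequencies of the next page, using only the last n pages."""
    key = tuple(history[-n:])
    followers = counts.get(key, Counter())
    total = sum(followers.values())
    return {page: c / total for page, c in followers.items()} if total else {}

# Hypothetical browsing sessions (not from the talk).
sessions = [
    ["home", "news", "sports"],
    ["home", "news", "weather"],
    ["blog", "home", "news", "sports"],
]
model = fit_markov(sessions, n=2)
# "blog" lies further back than n = 2 pages, so it has no influence.
long_history = next_page_distribution(model, ["blog", "home", "news"], n=2)
short_history = next_page_distribution(model, ["home", "news"], n=2)
```

Both calls return the same distribution, which is exactly what the Markov property states: pages further back than n steps are ignored.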


Problems using Markov Models

• Exponential space complexity
  - for each possible sequence of n pages, a probability distribution for the next page has to be stored: complexity O(|S|^(n+1)), where S is the state space of the model
• The WWW contains too many web pages and links
  - the state space would become far too big to handle
  - the training data is very sparse even if the training data set is huge (sparsity problem)
• If we don't want to reduce the order of the model, there are only two ways to reduce a model's space complexity:
  - reduce the size of the state space
  - find an efficient way to store the model (compression)
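To get a feeling for the O(|S|^(n+1)) bound, the following sketch counts the entries of a full transition table. The state-space sizes are hypothetical and chosen only to show the growth; `transition_table_entries` is an illustrative helper, not part of DAWN.

```python
def transition_table_entries(num_states: int, order: int) -> int:
    """Entries in a full order-n transition table: |S|^n histories times |S| next states."""
    return num_states ** (order + 1)

# Hypothetical sizes: one state per web page versus one state per group of pages.
per_page = transition_table_entries(1_000_000, 3)
per_group = transition_table_entries(100, 3)
```

With these assumed numbers, a 3rd-order model over a million pages would need 10^24 entries, while 100 states need 10^8, which is why shrinking the state space matters so much.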


Reducing the Size of the State Space

• ... by grouping similar pages into “contexts” by clustering (k-means on standard TF-IDF document representations)

[Figure: web pages P1 to P5 grouped into the three contexts C1, C2 and C3]

sequences of web pages:
P1 → P3 → P5
P1 → P2 → P4 → P5

sequences of contexts:
C1 → C2 → C3
C1 → C1 → C2 → C3

• reduces the size of the state space (S)
• generalizes the model (works on unseen pages as well)
• reduces the sparsity problem
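The translation from page sequences to context sequences can be sketched as a simple lookup. The page-to-context assignment below is inferred from the two example sequences on the slide (P1 and P2 in C1, P3 and P4 in C2, P5 in C3); in DAWN it would come from the k-means clustering, which this sketch does not perform. The helper name `to_context_sequence` is hypothetical.

```python
# Assignment inferred from the slide's example; normally produced by clustering.
page_to_context = {"P1": "C1", "P2": "C1", "P3": "C2", "P4": "C2", "P5": "C3"}

def to_context_sequence(pages, assignment):
    """Replace every page in a browsing sequence by the context it belongs to."""
    return [assignment[page] for page in pages]

first = to_context_sequence(["P1", "P3", "P5"], page_to_context)
second = to_context_sequence(["P1", "P2", "P4", "P5"], page_to_context)
```

An unseen page only needs to be assigned to its nearest context to be usable, which is where the generalization comes from.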


Model Representation (1st order)

• A 1st-order Markov model can be represented as a weighted directed graph:
  - contexts → vertices
  - for all c_i, c_j with P(c_j | c_i) > 0, insert a directed edge from c_i to c_j with weight P(c_j | c_i)
• Question: how can this representation be extended to higher-order models?
  - idea: context sequences → vertices

Note: This is not to be confused with a Markov network! Vertices here refer to states of a system and not to random variables!

[Figure: weighted directed graph over the contexts c1 to c5; edge weights shown: 0.38, 0.75, 1.0, 0.8, 0.5, 0.62, 0.5, 0.25, 0.2]
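A minimal sketch of this graph construction, assuming the model is learned from observed context sequences by relative frequencies. The nested-dictionary layout and the name `first_order_graph` are illustrative choices; the input reuses the two context sequences from the clustering example.

```python
from collections import Counter, defaultdict

def first_order_graph(context_sequences):
    """Return {c_i: {c_j: P(c_j | c_i)}}, keeping only edges with P > 0."""
    counts = defaultdict(Counter)
    for seq in context_sequences:
        for src, dst in zip(seq, seq[1:]):
            counts[src][dst] += 1
    graph = {}
    for src, successors in counts.items():
        total = sum(successors.values())
        graph[src] = {dst: c / total for dst, c in successors.items()}
    return graph

graph = first_order_graph([
    ["c1", "c2", "c3"],
    ["c1", "c1", "c2", "c3"],
])
```

Here c1 gets two outgoing edges (to c2 with weight 2/3 and a self-loop with weight 1/3), and c2 has a single edge to c3 with weight 1.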


Model Representation (nth order)

• idea: context sequences (n-grams) → vertices, i.e. for each sequence of contexts of length n that occurs in the data, create a vertex
• Observation: vertices can be merged if their probability distributions for the next context are (nearly) identical

[Figure: the 1st-order graph over c1 to c5, and the corresponding 2nd-order graph with one vertex per context 2-gram (c1c2, c1c3, c2c3, c2c5, c3c5, c4c1, c4c3, c5c3, c5c4); the groups of vertices that replace the old c3 and the old c5 are marked]
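The n-gram idea can be sketched by sliding a window of length n over each context sequence: every window becomes a vertex, and an edge leads to the window shifted by one position. The function name `ngram_graph` and the sample sequences are hypothetical, and the merging step from the observation above is left out.

```python
from collections import Counter, defaultdict

def ngram_graph(context_sequences, n):
    """Vertices are n-grams of contexts; edge weights are relative frequencies."""
    counts = defaultdict(Counter)
    for seq in context_sequences:
        for i in range(len(seq) - n):
            src = tuple(seq[i:i + n])
            dst = tuple(seq[i + 1:i + n + 1])
            counts[src][dst] += 1
    return {
        src: {dst: c / sum(succ.values()) for dst, c in succ.items()}
        for src, succ in counts.items()
    }

# Hypothetical context sequences (not the data behind the slide's figure).
second_order = ngram_graph(
    [["c1", "c2", "c3", "c5"], ["c4", "c1", "c2", "c3"], ["c1", "c2", "c5"]],
    n=2,
)
```

For n = 1 this reduces to the 1st-order graph with one-element tuples as vertices.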


Model Representation (compression)

• idea [Borges and Levene ‘05]:
  - construct the graph incrementally (i.e. start with the 1st-order model and construct the nth-order model from the (n-1)th-order model)
  - clone a vertex only if the nth-order probability distribution differs from the one in the (n-1)th-order model
  - if a vertex needs to be cloned, merge clones with (nearly) identical probability distributions


Overview “DAWN” (Direction Anticipation in Web Navigation)

[Figure: system overview, marking the parts covered so far and the parts covered next]


Recommendation Algorithm

• each time the history is updated (i.e. the user clicks a link)
  - map the history onto the model by
    - finding similar paths (context sequences)
    - computing the path weights (minimum of the context-page similarities)
  - compute the probability distribution for the next context by overlaying all similar paths (weighted)
  - for each candidate page (outgoing link of the current page)
    - map it onto the model (find similar contexts)
    - recommend it if the probability for at least one of the most similar contexts exceeds a threshold θ


Ex: Recommendation Algorithm

• map the history (P_{t-1}, P_t) onto the model (similarity threshold λ = 0.7):
  - find similar contexts for P_{t-1}: A, C, C'
  - find successors of A, C and C' that are similar to P_t: B, E
  - compute path weights (min of all context-page similarities):
    w_{A,B} = min(0.7, 0.9) = 0.7
    w_{C,B} = min(0.9, 0.9) = 0.9
    w_{C',E} = min(0.9, 0.8) = 0.8
    w_B = 0.9
    w_E = 0.8

[Figure: model graph with vertices A, B, C, C', D, E, F, G and H; edge labels 5 (0.5), 5 (0.5), 2 (0.4) and 3 (0.6); context-page similarities 0.7, 0.9, 0.9, 0.9 and 0.8]
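The path-weight computation in this example can be reproduced with a short sketch. The similarity values and the edges A → B, C → B and C' → E are read off the slide; the function names are hypothetical. Taking the largest path weight per end context is an assumption that merely agrees with the slide's w_B = 0.9; the talk does not state the aggregation rule.

```python
LAMBDA = 0.7  # similarity threshold from the slide

# Context-page similarities as read off the example.
sim_prev = {"A": 0.7, "C": 0.9, "C'": 0.9}   # similarity to P_{t-1}
sim_curr = {"B": 0.9, "E": 0.8}              # similarity to P_t
successors = {"A": ["B"], "C": ["B"], "C'": ["E"]}

def path_weights(sim_prev, sim_curr, successors, threshold):
    """Weight of each similar path = minimum of its context-page similarities."""
    weights = {}
    for first, s1 in sim_prev.items():
        if s1 < threshold:
            continue
        for second in successors.get(first, []):
            s2 = sim_curr.get(second, 0.0)
            if s2 >= threshold:
                weights[(first, second)] = min(s1, s2)
    return weights

def end_context_weights(weights):
    """Assumed aggregation: keep the best path weight per end context."""
    best = {}
    for (_, end), w in weights.items():
        best[end] = max(best.get(end, 0.0), w)
    return best

paths = path_weights(sim_prev, sim_curr, successors, LAMBDA)
ends = end_context_weights(paths)
```

Note that A is kept although its similarity equals λ exactly, so the threshold test has to be "greater than or equal".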


Ex: Recommendation Algorithm

• compute the probability distribution for the next state
  - weighted overlay of the probability distributions induced by the different similar paths in the model
• processing of candidate page Y:
  - identify similar contexts of Y (with similarity threshold λ = 0.7): D
  - if P(D | C_{t-1}, C_t) ≥ θ then Y is recommended (e.g. for θ = 0.4)
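One plausible reading of the weighted overlay is a weighted average of the next-state distributions of the matched contexts, normalized by the sum of the weights. This is an assumption, since the exact formula is not in the transcript. The weights w_B = 0.9 and w_E = 0.8 come from the example; the successor distributions of B and E are an assumed reading of the figure's edge labels, and the function names are hypothetical.

```python
def overlay(distributions, weights):
    """Weighted average of next-state distributions (assumed form of the overlay)."""
    total = sum(weights.values())
    combined = {}
    for context, w in weights.items():
        for nxt, p in distributions[context].items():
            combined[nxt] = combined.get(nxt, 0.0) + w * p / total
    return combined

def recommend(candidate_contexts, next_distribution, theta):
    """Recommend if any context similar to the candidate reaches the threshold."""
    return any(next_distribution.get(c, 0.0) >= theta for c in candidate_contexts)

weights = {"B": 0.9, "E": 0.8}
# Assumed reading of the figure: B -> D (0.5), B -> F (0.5), E -> D (0.4), E -> G (0.6).
distributions = {"B": {"D": 0.5, "F": 0.5}, "E": {"D": 0.4, "G": 0.6}}

next_distribution = overlay(distributions, weights)
recommend_y = recommend(["D"], next_distribution, theta=0.4)
```

Under these assumptions the probability of D is (0.9 · 0.5 + 0.8 · 0.4) / 1.7, roughly 0.45, so candidate page Y passes the threshold θ = 0.4.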


Outlook

• Test results on server logs indicate that DAWN's recommendations might be useful:
  - in about 30% of all test cases, the link actually chosen was among the 3 highest-ranked recommendations
• Work in progress:
  - development of a browser front-end
  - incorporation of further information
  - history visualization
• Future work:
  - user study (usability, helpfulness)


Summary

• DAWN uses a higher-order Markov model to capture a user's browsing behavior and to predict which page the user might want to access next
• Assumption: the probability of the next link chosen depends only on the last n visited web pages
• Model size is reduced by
  - grouping similar pages into contexts
  - using a special graph representation
• Preliminary test results on server logs indicate usefulness of the recommendations


Thank You for Your Attention

Sebastian Stober and Andreas Nürnberger http://irgroup.cs.uni-magdeburg.de

e-mail: {stober,nuernb}@iws.cs.uni-magdeburg.de


Detailed Test Results

• Number of sessions, requests and unique URLs in the data used for training and evaluation:

• Predicted ranks of the actually followed links. The number of candidates (number of different outbound links in a web page) ranged from 1 to 607 with a mean of 10.77.


Browser Integration

Recommendations are displayed in an overlay <div> element within the web page


Browser Integration

[Figure: a recommendation entry with its parts labelled: thumbnail, probability, favicon, page title, uri]

• Visual information:
  - impression of the page layout and favicon that the user might remember
• Content information:
  - page title, URI
• Ranking information:
  - the list of recommendations is sorted by the visit probability predicted by the model


Ex: Inducing 1st-order Model

sequence   #
A,B,C      3
A,B,D      2
E,B,C      1
E,B,D      1
G,B,C      2
G,B,D      1
A,B,D,H    1
E,B,D,H    3
G,B,D,H    2

[Figure: graph after adding the sequence A,B,C: S → A 3, A → B 3, B → C 3, C → F 3. Edge labels are absolute transition frequencies (in the data); S and F are artificial states for start and end.]

Example from: José Borges and Mark Levene. Generating Dynamic Higher-Order Markov Models in Web Usage Mining, PKDD, 2005.


Ex: Inducing 1st-order Model

[Figure: graph after also adding the sequence A,B,D: S → A 5, A → B 5, B → C 3, C → F 3, B → D 2, D → F 2]

Example from: José Borges and Mark Levene. Generating Dynamic Higher-Order Markov Models in Web Usage Mining, PKDD, 2005.


Ex: Inducing 1st-order Model

[Figure: graph after also adding the sequence E,B,C: S → A 5, A → B 5, S → E 1, E → B 1, B → C 4, C → F 4, B → D 2, D → F 2]

Example from: José Borges and Mark Levene. Generating Dynamic Higher-Order Markov Models in Web Usage Mining, PKDD, 2005.


Ex: Inducing 1st-order Model

[Figure: graph after also adding the sequence E,B,D: S → A 5, A → B 5, S → E 2, E → B 2, B → C 4, C → F 4, B → D 3, D → F 3]

Example from: José Borges and Mark Levene. Generating Dynamic Higher-Order Markov Models in Web Usage Mining, PKDD, 2005.


Ex: 1st-order Model

[Figure: complete 1st-order model. Each edge is labelled with its absolute transition frequency and, in parentheses, its transition probability, e.g. P(C|B): S → A 6 (0.38), S → E 5 (0.31), S → G 5 (0.31), A → B 6 (1), E → B 5 (1), G → B 5 (1), B → C 6 (0.38), B → D 10 (0.62), C → F 6 (1), D → H 6 (0.60), D → F 4 (0.40), H → F 6 (1)]

Example from: José Borges and Mark Levene. Generating Dynamic Higher-Order Markov Models in Web Usage Mining, PKDD, 2005.
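The induction of the 1st-order model from the sequence table can be reproduced with a few lines of Python. This is a sketch of the counting step only; the function name `induce_first_order` and the data layout are illustrative and not taken from Borges and Levene's paper. S and F are the artificial start and end states.

```python
from collections import Counter, defaultdict

# Sequence table from the example: sequence -> number of occurrences.
SEQUENCES = {
    ("A", "B", "C"): 3, ("A", "B", "D"): 2,
    ("E", "B", "C"): 1, ("E", "B", "D"): 1,
    ("G", "B", "C"): 2, ("G", "B", "D"): 1,
    ("A", "B", "D", "H"): 1, ("E", "B", "D", "H"): 3, ("G", "B", "D", "H"): 2,
}

def induce_first_order(sequences):
    """Return absolute transition counts and the derived transition probabilities."""
    counts = defaultdict(Counter)
    for seq, freq in sequences.items():
        path = ("S",) + seq + ("F",)  # add artificial start and end states
        for src, dst in zip(path, path[1:]):
            counts[src][dst] += freq
    probs = {
        src: {dst: c / sum(succ.values()) for dst, c in succ.items()}
        for src, succ in counts.items()
    }
    return counts, probs

counts, probs = induce_first_order(SEQUENCES)
```

For instance, B is followed by C 6 times and by D 10 times, which gives P(C|B) = 6/16 = 0.375 and P(D|B) = 0.625, shown rounded as 0.38 and 0.62 in the figure.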


Ex: Inducing 2nd-order Model

[Figure: the complete 1st-order model, with the in-paths of B highlighted]

• check the transition probability distributions induced by the different in-paths (2-grams) of B:

in-path   C     D
A,B       0.5   0.5
E,B       0.2   0.8
G,B       0.4   0.6

• the distributions for A,B and G,B are similar
• the distribution for E,B differs: B needs to be cloned!

Example from: José Borges and Mark Levene. Generating Dynamic Higher-Order Markov Models in Web Usage Mining, PKDD, 2005.
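The check for whether B has to be cloned can be sketched as follows. The per-in-path distributions are computed from the sequence table of the example. The grouping step is a deliberately simple stand-in (greedy grouping by maximum absolute difference, with a hypothetical tolerance of 0.15) and not the procedure from Borges and Levene's paper; all function names are illustrative.

```python
from collections import Counter, defaultdict

SEQUENCES = {
    ("A", "B", "C"): 3, ("A", "B", "D"): 2,
    ("E", "B", "C"): 1, ("E", "B", "D"): 1,
    ("G", "B", "C"): 2, ("G", "B", "D"): 1,
    ("A", "B", "D", "H"): 1, ("E", "B", "D", "H"): 3, ("G", "B", "D", "H"): 2,
}

def in_path_distributions(sequences, state):
    """Next-state distribution of `state`, split by the state visited just before it."""
    counts = defaultdict(Counter)
    for seq, freq in sequences.items():
        path = ("S",) + seq + ("F",)
        for prev, cur, nxt in zip(path, path[1:], path[2:]):
            if cur == state:
                counts[prev][nxt] += freq
    return {
        prev: {nxt: c / sum(succ.values()) for nxt, c in succ.items()}
        for prev, succ in counts.items()
    }

def group_similar(dists, tolerance):
    """Greedy stand-in for merging: join the first group whose first member is close enough."""
    groups = []
    for prev, dist in dists.items():
        for group in groups:
            ref = dists[group[0]]
            keys = set(ref) | set(dist)
            if max(abs(ref.get(k, 0.0) - dist.get(k, 0.0)) for k in keys) <= tolerance:
                group.append(prev)
                break
        else:
            groups.append([prev])
    return groups

dists_b = in_path_distributions(SEQUENCES, "B")
groups_b = group_similar(dists_b, tolerance=0.15)  # hypothetical tolerance
needs_clone = len(groups_b) > 1
```

With this tolerance the in-paths A,B and G,B end up in one group and E,B in another, so B is split into two vertices, which matches the cloning decision on the slide.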


Ex: 2nd-order Model

[Figure: 2nd-order model with clone B' of B: S → A 6 (0.38), S → E 5 (0.31), S → G 5 (0.31), A → B 6 (1), G → B 5 (1), E → B' 5 (1), B → C 5 (0.45), B → D 6 (0.55), B' → C 1 (0.2), B' → D 4 (0.8), C → F 6 (1), D → H 6 (0.60), D → F 4 (0.40), H → F 6 (1)]

• insert clone B' of B for the 2-gram “EB” and adjust all incoming and outgoing edges

Example from: José Borges and Mark Levene. Generating Dynamic Higher-Order Markov Models in Web Usage Mining, PKDD, 2005.