adative websites

20
TOWARDS ADAPTIVE WEBSITES: A CONCEPTUAL FRAMEWORK AND CASE STUDY Department of Computer Science, November 2005 Presented by Akash Y Shindhe

Upload: akash-shindhe

Post on 24-May-2015

47 views

Category:

Internet


0 download

TRANSCRIPT

Page 1: adative websites

TOWARDS ADAPTIVE WEBSITES: A CONCEPTUAL FRAMEWORK

AND CASE STUDY

Department of Computer Science, November 2005

Presented by

Akash Y Shindhe

Page 2: adative websites

AGENDA Introduction

Motivation What are Adaptive Websites? Approaches to Adaptation The Index Page Synthesis Use Case

The PageGather Algorithm Description of The Algorithm Experimental Method Time Complexity Comparison with Related Algorithms

Conclusions Related Work Summary Resources

Page 3: adative websites

MOTIVATION

Designing a complex web site so it readily yields its information is tricky, because:1. Different visitors have distinct goals2. Same user may seek different information at different

times3. Many sites outgrow their original design, accumulating

links and pages in unlikely places4. A site may be designed for a particular use, but may be

used in unanticipated ways in practice

Too often, web sites are fossils cast in HTML, while web navigation is dynamic, time-dependent, and idiosyncratic

Page 4: adative websites

WHAT ARE ADAPTIVE WEBSITES?Adaptive websites are sites that

automatically improve their organization and presentation by learning from visitor access patterns

They mine the data buried in web server logs to produce more easily navigable websites

To demonstrate the feasibility of adaptive websites, the index page synthesis use case is considered

Page 5: adative websites
Page 6: adative websites

APPROACHES TO ADAPTATION Aim is to make a website “better”, so we need

a clear quality measure Quality measure as a function of variables:

How often users find what they are looking for How many clicks users have to make to get to their

goal How much time users spend reading link text and

scrolling through pages Two approaches to adaptation:

Content-based : organizes and presents pages based on their content.

Access-based : uses the way past visitors have interacted with the site to guide how information is structured.

Content-based and access-based adaptations are complementary and may be used together

Page 7: adative websites

THE INDEX PAGE SYNTHESIS CASE STUDY (1)

Page synthesis is the automatic creation of web pages

An index page is a page consisting of links to a set of pages that cover a particular topic

Index page synthesis problem: given a web site and a visitor access log, create new index pages containing collections of links to related but currently unlinked pages

Page 8: adative websites

THE INDEX PAGE SYNTHESIS CASE STUDY (2)

The Index Page Synthesis Problem:1. What are the contents (i.e. hyperlinks) of the

index page?2. How are the hyperlinks on the page ordered?3. How are the hyperlinks labeled?4. What is the title of the page? Does it

correspond to a coherent concept?5. Is it appropriate to add the page to the site?

If so, where?

Page 9: adative websites

SOLUTIONS

2 Algorithms have been suggested by the authors of the paper

PageGatherIndexFinder

Page 10: adative websites

THE PAGEGATHER ALGORITHM

1. The PageGather algorithm is a statistical cluster mining algorithm

2. Clustering algorithms take a collection of objects as their input and produce a partition of the collection

3. Cluster mining is a variation on traditional clustering that may place a single object in multiple overlapping clusters

4. PageGather uses cluster mining to find collections of related pages at a website

Page 11: adative websites

DESCRIPTION OF PAGEGATHER

1. Process the access log into visits2. Compute the co-occurrence frequencies

between pages and create a similarity matrix

3. Create the graph corresponding to the matrix, and find maximal cliques (or connected components) in the graph

4. Rank the clusters found, and choose which to output

5. Eliminate overlap among the clusters6. Present it to the webmaster for evaluation

Page 12: adative websites

TIME COMPLEXITY What is the running time of PageGather? Let L be the number of page views in the log

and N the number of pages at the site Step (1) requires O(L log L) time: page views

must be sorted by origin and time Step (2) requires O(L + N2) time: must process

the log and create a matrix of size O(N2) In step (3) we may find either connected

components (linear in the size of the graph) or cliques (exponential in general, but since size of discovered clusters is bound to k, this step is a polynomial of degree k)

Page 13: adative websites

COMPARISON WITH RELATED ALGORITHMS

PageGather significantly outperforms other statistical clustering algorithms, but is not as well as human-authored clusters

Page 14: adative websites

IMPLEMENTATIONS And More:

Use both user’s path and model to guess what pages they are interested in seeing e.g., AVANTI Project [1]

Automatic user categorization Hybrid approach Footprints [2] uses the metaphor of travellers creating

footpaths in the grass over time Using meta-information e.g., XML, Apple’s Meta-

Content Format, STRUDEL [3] Client-side customization

Page 15: adative websites

CONCLUSIONS (1)

PageGather and IndexFinder outperform traditional methods including: the Apriori data mining algorithm, standard clustering algorithms and the COBWEB conceptual clustering algorithm

PageGather and IndexFinder are instances of novel, domain-independent approaches to unsupervised data mining

Extensions and applications to these approaches outside the domain of adaptive websites can be found

Page 16: adative websites

CONCLUSIONS (2)

Future work may focus on the automatic placement of new index pages at the website

Automatically suggesting names for the new pages, and deciding where in the site they should be located

Index page synthesis itself is a step towards the long-term goal of change in view: adaptive websites that automatically suggest re-organisations of their contents based on visitor access patterns

Page 17: adative websites

RELATED WORK By the authors:

Mainly updates to the original paper (most recent one in 2001)

By others:Adaplix [5] : A system that extends HTML

by introducing conditional statements and an inductive logic programming component to learn the user's browsing preferences

WebWatcher [6]: A “tour guide” of the web. It accompanies the user from page to page, highlighting hyperlinks that it believes will be of interest

Page 18: adative websites

SUMMARY

We have covered: Adaptive Websites The Index Page Synthesis Use Case The PageGather Algorithm The IndexFinder Algorithm Implementations Related Work

Page 19: adative websites

ANY QUESTIONS?

Page 20: adative websites

RESOURCES [1] J. Fink, A Kobsa, and A. Nill.

User-oriented Adaptivity and Adaptability in the AVANTI Project. In Designing for the Web: Empirical Studies, Microsoft Usability Group, Redmond (WA)., 1996.

[2] A. Wexelblat and P. Maes. Footprints: History-rich web browsing. In Proc. Conf. Computer-Assisted Information Retrieval (RIAO), pages 75-84, 1997.

• [3] M. Fernandez, D. Florescu, J. Kang, A. Levy, and D. Suciu. System Demonstration - Strudel: A Web-site Management System. In ACM SIGMOD Conference on Management of Data, 1997.

[4] D. Fisher. Knowledge Acquisition Via Incremental Conceptual Clustering. Machine Learning, 2:139-172, 1987

[5] Nico Jacobs. Adaplix: Towards Adaptive Websites. In P. De Bra and L. Hardman, editors, Proceedings van de Informatiewetenschap'99 Conferentie, pages 22--28. Eindhoven University of Technology, November 1999

[6] URL : http://www.cs.cmu.edu/~webwatcher, accessed on 22 November 2005