Learning Patterns on the World Wide Web

Andrew Hogue

Advisor: David Karger

October 17, 2003

Agenda

What is a pattern?

How do we make one?

How do we use it?

Why do you want one?

Demo

What is a pattern?

Objects in the world have certain semantic properties

A pattern is a way of recognizing the semantic properties of an object we’ve seen before

A pattern is a structure with semantic slots to be filled in

Example – Books

Define an object’s semantics (ontology):

Class: Book

Property: Author

Property: Title

Property: Price

Property: Publisher

Property: ISBN

. . .
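As a rough illustration of the ontology above, the Book class and its properties can be written as a record whose fields are the semantic slots a pattern would fill in. This is only a sketch: the name `unfilled_slots` and the choice of a Python dataclass are assumptions made for this example, not something the talk specifies, and only the five properties listed on the slide are modeled.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class Book:
    """Hypothetical record mirroring the Book class on the slide."""
    author: Optional[str] = None
    title: Optional[str] = None
    price: Optional[str] = None
    publisher: Optional[str] = None
    isbn: Optional[str] = None


def unfilled_slots(book: Book) -> List[str]:
    """List the properties that have not been filled in yet."""
    return [f.name for f in fields(book) if getattr(book, f.name) is None]


# Placeholder values, not data from the talk.
book = Book(author="Example Author", title="Example Title")
print(unfilled_slots(book))  # ['price', 'publisher', 'isbn']
```

Each field starts empty, so the same record can describe a book that has only been partly recognized on a page.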

Creating a Pattern

Choose positive examples

Find best mapping between examples

Merge mapped elements and assign semantic labels

Eliminate unmapped elements
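The four steps above can be sketched on a simplified representation. The slides do not say how the best mapping is computed, so this example substitutes a stand-in: each example is flattened to a list of (tag, text) pairs, and Python's `difflib.SequenceMatcher` aligns the tag sequences. The function name `learn_pattern`, the flattened representation, and the (tag, literal, label) triples are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def learn_pattern(example_a, example_b, labels):
    """Build a pattern from two positive examples.

    Each example is a list of (tag, text) pairs, a flattened stand-in for a
    page fragment. `labels` gives the semantic label for each slot, in order.
    Returns a list of (tag, literal_text, slot_label) triples.
    """
    tags_a = [tag for tag, _ in example_a]
    tags_b = [tag for tag, _ in example_b]
    # Find a mapping between the examples (assumed here: matching tag runs).
    matcher = SequenceMatcher(None, tags_a, tags_b, autojunk=False)
    slot_labels = iter(labels)
    pattern = []
    for block in matcher.get_matching_blocks():
        for offset in range(block.size):
            tag, text_a = example_a[block.a + offset]
            _, text_b = example_b[block.b + offset]
            if text_a == text_b:
                # Same text in both examples: keep it as fixed structure.
                pattern.append((tag, text_a, None))
            else:
                # Text differs: merge into a slot and give it a label.
                pattern.append((tag, None, next(slot_labels, "unlabeled")))
    # Elements outside any matching block were never appended,
    # which is how unmapped elements get eliminated.
    return pattern


# Placeholder examples, not data from the talk.
first = [("b", "Title:"), ("i", "Book One"), ("img", ""),
         ("b", "Price:"), ("span", "$10")]
second = [("b", "Title:"), ("i", "Book Two"),
          ("b", "Price:"), ("span", "$12")]
pattern = learn_pattern(first, second, ["title", "price"])
print(pattern)
```

Here the `img` element appears in only one example, so it has no counterpart and is left out of the pattern, while the two positions whose text differs become the `title` and `price` slots.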

Matching Patterns

Given a pattern with slots and a page to search

Look for items on page with same structure

Map pattern slots to page text
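A minimal sketch of the matching step, under simplifying assumptions: the page is a flat list of (tag, text) pairs, the pattern is a list of (tag, literal_text, slot_label) triples, and "same structure" means a contiguous run with identical tags and identical literal text. The name `match_pattern` and this representation are hypothetical, and the talk does not say how the actual system compares structure.

```python
def match_pattern(pattern, page):
    """Return one {label: text} dict for each place the pattern fits.

    `pattern` is a list of (tag, literal_text, slot_label) triples; fixed
    elements have slot_label None, slots have literal_text None.
    `page` is a list of (tag, text) pairs.
    """
    if not pattern:
        return []
    results = []
    width = len(pattern)
    for start in range(len(page) - width + 1):
        window = page[start:start + width]
        bindings = {}
        for (tag, literal, label), (page_tag, page_text) in zip(pattern, window):
            if tag != page_tag:
                break  # structure differs at this position
            if label is None:
                if literal != page_text:
                    break  # fixed text does not match
            else:
                bindings[label] = page_text  # map the slot to page text
        else:
            # No break: every element of the window fit the pattern.
            results.append(bindings)
    return results


# Placeholder pattern and page, not data from the talk.
pattern = [("b", "Title:", None), ("i", None, "title"),
           ("b", "Price:", None), ("span", None, "price")]
page = [("h1", "Results"),
        ("b", "Title:"), ("i", "Book Three"), ("b", "Price:"), ("span", "$8"),
        ("b", "Title:"), ("i", "Book Four"), ("b", "Price:"), ("span", "$15")]
for item in match_pattern(pattern, page):
    print(item)
```

On this page the sketch finds two items, one per book, each with its `title` and `price` slots bound to the text found at the corresponding position.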

Applications

Extract search engine results

Extract and email news headlines

Watch sites for updates

Reformat sites for easier reading

Monitor bank account balances

Demo

More Information

http://haystack.lcs.mit.edu

[email protected]