Learning Patterns on the World Wide Web
Andrew Hogue
Advisor: David Karger
October 17, 2003
What is a pattern?
Objects in the world have certain semantic properties
A pattern is a way of recognizing the semantic properties of an object we’ve seen before
A pattern is a structure with semantic slots to be filled in
Example – Books
Define an object’s semantics (ontology):
Class: Book
Property: Author
Property: Title
Property: Price
Property: Publisher
Property: ISBN
. . .
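One way to picture the ontology above is as a plain record type with one field per property. The sketch below is only an illustration: the `Book` dataclass, its lowercase field names, and `BOOK_SLOTS` are assumptions made here, not part of the talk, and the properties elided on the slide (". . .") are left out rather than guessed.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class Book:
    """Class: Book, with one field per property named on the slide.

    Every field is optional because a pattern may fill only some slots.
    Properties elided on the slide are intentionally not invented here.
    """
    author: Optional[str] = None
    title: Optional[str] = None
    price: Optional[str] = None
    publisher: Optional[str] = None
    isbn: Optional[str] = None


# Hypothetical helper: the slot names a pattern for this class could fill.
BOOK_SLOTS = [f.name for f in fields(Book)]
```

A pattern for books would then be responsible for filling some subset of `BOOK_SLOTS` from a page.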
Creating a Pattern
Choose positive examples
Find best mapping between examples
Merge mapped elements and assign semantic labels
Eliminate unmapped elements
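The steps above can be sketched on a deliberately simplified representation. This is not the method from the talk: it assumes each example has been flattened into a list of `(tag, text)` pairs, and it substitutes Python's `difflib.SequenceMatcher` (longest matching runs of tags) for whatever mapping the real system computes. The names `create_pattern`, `label_slots`, and `SLOT` are hypothetical.

```python
from difflib import SequenceMatcher

SLOT = None  # marks a position whose text differs between the examples


def create_pattern(example_a, example_b):
    """Merge two positive examples into a list of (tag, text_or_SLOT)."""
    tags_a = [tag for tag, _ in example_a]
    tags_b = [tag for tag, _ in example_b]
    # Find a mapping between the examples: matching runs of tags.
    matcher = SequenceMatcher(a=tags_a, b=tags_b, autojunk=False)
    pattern = []
    for block in matcher.get_matching_blocks():
        for k in range(block.size):
            tag, text_a = example_a[block.a + k]
            _, text_b = example_b[block.b + k]
            # Merge mapped elements: identical text stays fixed,
            # differing text becomes a slot.
            pattern.append((tag, text_a if text_a == text_b else SLOT))
    # Unmapped elements fall outside every matching block, so they
    # are never added to the pattern.
    return pattern


def label_slots(pattern, labels):
    """Attach semantic labels to slots, in order of appearance."""
    remaining = list(labels)
    labeled = []
    for tag, text in pattern:
        label = remaining.pop(0) if text is SLOT and remaining else None
        labeled.append((tag, text, label))
    return labeled


# Two positive examples (made-up data for illustration).
example_a = [("b", "Moby-Dick"), ("i", "Herman Melville"),
             ("span", "Price:"), ("span", "$9.99"), ("img", "")]
example_b = [("b", "Walden"), ("i", "Henry David Thoreau"),
             ("span", "Price:"), ("span", "$7.50")]

pattern = label_slots(create_pattern(example_a, example_b),
                      ["Title", "Author", "Price"])
```

Here the trailing `img` element appears in only one example, so it is dropped, and the literal "Price:" survives as fixed text because both examples agree on it.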
Matching Patterns
Given a pattern with slots and a page to search
Look for items on page with same structure
Map pattern slots to page text
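A minimal sketch of matching, under strong simplifying assumptions: the page is a flat list of `(tag, text)` pairs, the pattern is a list of `(tag, fixed_text, slot_label)` triples, and "same structure" is taken to mean an exact, contiguous run of the same tags. The function name `match_pattern` and the sample data are hypothetical; a real matcher would presumably tolerate more variation.

```python
def match_pattern(pattern, page):
    """Return one {label: text} dict for each place the pattern fits."""
    if not pattern:
        return []
    results = []
    n = len(pattern)
    for start in range(len(page) - n + 1):
        window = page[start:start + n]
        bindings = {}
        for (p_tag, p_text, label), (tag, text) in zip(pattern, window):
            if p_tag != tag:
                break  # structure differs at this position
            if label is None:
                if p_text != text:
                    break  # fixed text must match exactly
            else:
                bindings[label] = text  # map the slot to the page text
        else:
            # The loop finished without a break: the whole window matched.
            results.append(bindings)
    return results


# Hypothetical pattern: three slots plus one piece of fixed text.
pattern = [("b", None, "Title"), ("i", None, "Author"),
           ("span", "Price:", None), ("span", None, "Price")]

# Made-up page content for illustration.
page = [("h1", "Search results"),
        ("b", "Emma"), ("i", "Jane Austen"),
        ("span", "Price:"), ("span", "$6.00"),
        ("b", "Ulysses"), ("i", "James Joyce"),
        ("span", "Price:"), ("span", "$11.25")]

matches = match_pattern(pattern, page)
```

With this data, `matches` holds two dictionaries, one per book on the page, each keyed by the slot labels "Title", "Author", and "Price".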
Applications
Extract search engine results
Extract and email news headlines
Watch sites for updates
Reformat sites for easier reading
Monitor bank account balances