USING ONTOLOGICAL RELATIONSHIPS TO PROVIDE INDEXING OF PLAIN TEXT SEARCHES

Research by Fletcher Liverance (fletcher.liverance@gmail.com)

November 14th, 2011



HOW DOES A SEARCH ENGINE WORK?

Page    Rank
8       18374
7       12973
23      9977
99      7645
192     5521
15      4211
65      988

1. User submits a keyword-based query to the search engine

2. The indexer locates all relevant pages containing those keywords

3. The database returns all pages found in the index

4. Pages are ranked and returned to the user
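
The four steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the design of any real search engine: the page texts are hypothetical, the page ids and rank values are borrowed from the table on this slide, and the names build_index and search are invented for the example.

```python
from collections import defaultdict

def build_index(pages):
    """Map each keyword to the set of page ids whose text contains it."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for word in text.lower().split():
            index[word].add(page_id)
    return index

def search(query, index, rank):
    """Return ids of pages containing every query keyword, highest rank first."""
    keywords = query.lower().split()
    if not keywords:
        return []
    # Steps 2 and 3: look up each keyword and keep pages that match all of them.
    matches = set.intersection(*(index.get(word, set()) for word in keywords))
    # Step 4: order the matching pages by their rank value.
    return sorted(matches, key=lambda page_id: rank.get(page_id, 0), reverse=True)

# ASSUMPTION: hypothetical page texts; ids and ranks come from the slide's table.
pages = {
    8: "yellow bear cartoon",
    7: "bear habitat facts",
    23: "yellow paint colors",
}
rank = {8: 18374, 7: 12973, 23: 9977}

index = build_index(pages)
results = search("bear", index, rank)
```

The lookup is pure keyword matching, which is why a query that uses the wrong words finds nothing, however relevant a page may be.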

HOW DOES A SEARCH ENGINE WORK?

Benefits:
  Fast
  Machine learnable
  Straightforward

Drawbacks:
  Pattern matching
  Keyword based
  Garbage in, garbage out

GARBAGE IN, GARBAGE OUT

Scenario: You saw this television series and you’d like to find out more about it, but you don’t know the name of the series or of any of its characters.

What do you do?

http://www.dan-dare.org/FreeFun/Images/CartoonsMoviesTV/WinnieThePoohWallpaper1024.jpg

GARBAGE IN, GARBAGE OUT

POOR RESULTS!

GARBAGE IN, GARBAGE OUT

GOOD RESULTS!

SEMANTIC RELATIONSHIPS

[Diagram: semantic graph. Nodes: Winnie the Pooh, Bear, Yellow, Disney, Piglet, Shirt, Red, Pig. Link labels: isMadeBy, isA (twice), hasClothing, hasColor (twice), hasFriend.]

Ontology: “An ontology is a description (like a formal specification of a program) of the concepts and relationships that can exist for an agent or a community of agents.” (http://www-ksl.stanford.edu/kst/what-is-an-ontology.html)

Resource Description Framework (RDF): “RDF extends the linking structure of the Web to use URIs to name the relationship between things as well as the two ends of the link. Using this simple model, it allows structured and semi-structured data to be mixed, exposed, and shared across different applications.” (http://www.w3.org/RDF/)
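
As a rough illustration of the RDF idea of naming the relationship as well as its two ends, the graph on this slide can be written as (subject, relationship, object) triples in plain Python. The pairing of link labels with nodes is an assumption inferred from the node and label names, since the diagram's lines did not survive in the transcript. The helper match is hypothetical, and real RDF would use URIs rather than bare strings.

```python
# Each relationship is a (subject, relationship, object) triple, as in RDF.
# ASSUMPTION: which nodes each link connects is inferred from the names on
# the slide; the original diagram is not available to confirm it.
TRIPLES = [
    ("Winnie the Pooh", "isA", "Bear"),
    ("Winnie the Pooh", "hasColor", "Yellow"),
    ("Winnie the Pooh", "isMadeBy", "Disney"),
    ("Winnie the Pooh", "hasFriend", "Piglet"),
    ("Winnie the Pooh", "hasClothing", "Shirt"),
    ("Shirt", "hasColor", "Red"),
    ("Piglet", "isA", "Pig"),
]

def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples that agree with every field that is not None."""
    return [
        (s, p, o)
        for s, p, o in triples
        if subject in (None, s) and predicate in (None, p) and obj in (None, o)
    ]

# Every color statement in the graph, and everything said about one subject.
colors = match(TRIPLES, predicate="hasColor")
about_pooh = match(TRIPLES, subject="Winnie the Pooh")
```

A query such as "yellow bear red shirt" names none of the characters, yet every one of its words is an object reachable from the Winnie the Pooh node.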

SEMANTIC RELATIONSHIPS

How can we locate useful semantic relationships?
  Link distance
  Link direction
  Link relationship

[Diagram: extended semantic graph. Nodes: Winnie the Pooh, Bear, Yellow, Disney, Piglet, Shirt, Red, Pig, Brown, Mammal, Company, 0xFFFF00. Link labels: isMadeBy, isA (four times), hasClothing, hasColor (three times), hasFriend, hasRGB.]
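
One way to read the three criteria is as parameters of a graph search. The sketch below is an assumption-laden illustration, not the author's method: it uses a breadth-first search to measure link distance, a directed flag for link direction, and a follow set for link relationship. The triples, including which nodes each link connects, are inferred from the names on the slide, and link_distances is a hypothetical helper.

```python
from collections import deque

# ASSUMPTION: link endpoints are inferred from the node and label names.
TRIPLES = [
    ("Winnie the Pooh", "isA", "Bear"),
    ("Winnie the Pooh", "hasColor", "Yellow"),
    ("Winnie the Pooh", "isMadeBy", "Disney"),
    ("Winnie the Pooh", "hasFriend", "Piglet"),
    ("Winnie the Pooh", "hasClothing", "Shirt"),
    ("Shirt", "hasColor", "Red"),
    ("Piglet", "isA", "Pig"),
    ("Bear", "isA", "Mammal"),
    ("Bear", "hasColor", "Brown"),
    ("Disney", "isA", "Company"),
    ("Yellow", "hasRGB", "0xFFFF00"),
]

def link_distances(triples, start, max_distance, follow=None, directed=True):
    """Breadth-first search returning {node: link distance from start}.

    follow   -- optional set of relationship names allowed on a path
    directed -- if False, links may also be walked from object to subject
    """
    neighbors = {}
    for s, p, o in triples:
        if follow is not None and p not in follow:
            continue  # link relationship filter
        neighbors.setdefault(s, []).append(o)
        if not directed:
            neighbors.setdefault(o, []).append(s)

    distances = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distances[node] == max_distance:
            continue  # do not expand beyond the distance limit
        for nxt in neighbors.get(node, []):
            if nxt not in distances:
                distances[nxt] = distances[node] + 1
                queue.append(nxt)
    return distances

is_a_chain = link_distances(TRIPLES, "Winnie the Pooh", 2, follow={"isA"})
from_red = link_distances(TRIPLES, "Red", 2, directed=False)
```

Following only isA links from Winnie the Pooh reaches Bear and then Mammal. Starting from Red finds nothing unless links may be walked backwards, which shows why direction matters when looking for related terms.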

MODIFIED SEARCH INDEXING

Search  Rank
8       18374
7       12973
23      9977
99      7645
192     5521

1. User submits a keyword-based query to the search engine

2. Search analyzer creates additional searches based on ontological information

3. Search engine performs parallel searches of top search terms

4. Searches are ranked and returned to the user as additional search suggestions
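
The modified pipeline could look roughly like the following sketch. Everything concrete in it is an assumption made for illustration: the RELATED table stands in for the ontology, DOCUMENTS stands in for the search backend, the function names are invented, and ranking suggestions by hit count is a placeholder, since the slides do not say how searches are ranked.

```python
from concurrent.futures import ThreadPoolExecutor

# ASSUMPTION: hypothetical ontology fragment mapping a term to related terms.
RELATED = {"bear": ["mammal", "winnie the pooh"]}

# ASSUMPTION: hypothetical plain-text corpus standing in for a search backend.
DOCUMENTS = [
    "winnie the pooh is a yellow bear",
    "the bear is a large mammal",
    "piglet is a friend of winnie the pooh",
]

def expand_query(query, related):
    """Step 2: build additional searches by swapping in related terms."""
    words = query.lower().split()
    searches = []
    for i, word in enumerate(words):
        for alternative in related.get(word, []):
            searches.append(" ".join(words[:i] + [alternative] + words[i + 1:]))
    return searches

def count_hits(search, documents):
    """Stand-in search engine: count documents containing every word."""
    words = search.split()
    return sum(all(w in doc.split() for w in words) for doc in documents)

def suggest_searches(query, related, documents):
    """Steps 2 to 4: expand the query, search in parallel, rank the searches."""
    searches = expand_query(query, related)
    with ThreadPoolExecutor() as pool:  # step 3: parallel searches
        hits = list(pool.map(lambda s: count_hits(s, documents), searches))
    # Step 4: rank by hit count (placeholder ranking, not from the slides).
    return sorted(zip(searches, hits), key=lambda pair: pair[1], reverse=True)

suggestions = suggest_searches("yellow bear", RELATED, DOCUMENTS)
```

Here the user's query "yellow bear" yields the suggestion "yellow winnie the pooh", which is the kind of additional search the ontology is meant to supply.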

CURRENT WORK

NASA SWEET Ontologies:
  6000 concepts
  200 ontologies
  Scientific
  Loose relationships

National Oceanic and Atmospheric Administration (NOAA):
  30+ years of scientific research
  Text based
  Unsorted
  2+ gigabytes
  Domain-specific terminology

CHALLENGES & FUTURE WORK

How to rank plain text:
  No links or history
  No ‘page views’

Limited ontology coverage:
  6000 concepts in NASA SWEET ontologies
  ~170,000 words in the English language
  Many more unique names and scientific terms
  How can ontologies be automatically generated?

Graph matching:
  Identifying related terms in a large graph is difficult
  Multiple links per node, must identify appropriate links


Q & A