Using Ontological Relationships to Provide Indexing of Plain Text Searches

Research by Fletcher Liverance [email protected], November 14th, 2011


Page 1: Using Ontological Relationships to Provide Indexing of Plain Text Searches

USING ONTOLOGICAL RELATIONSHIPS TO PROVIDE INDEXING OF PLAIN TEXT SEARCHES

Research by Fletcher Liverance [email protected]

November 14th, 2011

Page 2

HOW DOES A SEARCH ENGINE WORK?

[Figure: pages labeled with Page Rank values]

1. User submits a keyword-based query to the search engine
2. The indexer locates all relevant pages containing those keywords
3. The database returns all pages found in the index
4. Pages are ranked and returned to the user
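The four steps above can be sketched with a toy inverted index. This is only an illustration under stated assumptions: the corpus, the rank scores, the names `PAGES`, `build_index`, and `search`, and the choice to require every keyword to match are all hypothetical and are not taken from the slides.

```python
from collections import defaultdict

# Hypothetical toy corpus: page id -> (text, precomputed rank score).
PAGES = {
    "page_a": ("winnie the pooh is a yellow bear", 8.0),
    "page_b": ("piglet is a small pig", 5.0),
    "page_c": ("the yellow bear wears a red shirt", 3.0),
}


def build_index(pages):
    """Map each keyword to the set of page ids that contain it."""
    index = defaultdict(set)
    for page_id, (text, _rank) in pages.items():
        for word in text.lower().split():
            index[word].add(page_id)
    return index


def search(query, index, pages):
    """Return ids of pages containing every query keyword, best rank first."""
    keywords = query.lower().split()  # Step 1: the keyword-based query.
    if not keywords:
        return []
    # Step 2: locate the pages containing the keywords (assumed AND semantics).
    matches = set.intersection(*(index.get(word, set()) for word in keywords))
    # Steps 3-4: return the matching pages ordered by their rank score.
    return sorted(matches, key=lambda page_id: pages[page_id][1], reverse=True)


INDEX = build_index(PAGES)
print(search("yellow bear", INDEX, PAGES))  # ['page_a', 'page_c']
```

A query whose keywords appear in no page returns an empty list, which is the "garbage in, garbage out" behavior of pure keyword matching.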

Page 3

HOW DOES A SEARCH ENGINE WORK?

Benefits
  Fast
  Machine learnable
  Straightforward

Drawbacks
  Pattern matching
  Keyword based
  Garbage in, garbage out

Page 4

GARBAGE IN, GARBAGE OUT

Scenario: You saw this television series and you’d like to find out more about it, but you don’t know the name of the series or of any of its characters.

What do you do?

Image: http://www.dan-dare.org/FreeFun/Images/CartoonsMoviesTV/WinnieThePoohWallpaper1024.jpg

Page 5

GARBAGE IN, GARBAGE OUT

POOR RESULTS!

Page 6

GARBAGE IN, GARBAGE OUT

GOOD RESULTS!

Page 7

SEMANTIC RELATIONSHIPS

[Figure: semantic graph. Nodes: Winnie the Pooh, Bear, Yellow, Disney, Piglet, Shirt, Red, Pig. Edge labels: isMadeBy, isA, hasClothing, hasColor, hasFriend.]

Ontology: “An ontology is a description (like a formal specification of a program) of the concepts and relationships that can exist for an agent or a community of agents.” (http://www-ksl.stanford.edu/kst/what-is-an-ontology.html)

Resource Description Framework (RDF): “RDF extends the linking structure of the Web to use URIs to name the relationship between things as well as the two ends of the link. Using this simple model, it allows structured and semi-structured data to be mixed, exposed, and shared across different applications.” (http://www.w3.org/RDF/)
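A graph like the one on this slide can be written down as subject, relationship, object triples, which is the basic shape of RDF data. The sketch below uses plain Python tuples rather than an RDF library. The pairing of nodes with edges is an assumption reconstructed from the slide's labels, and `TRIPLES` and `match` are hypothetical names.

```python
# Triples reconstructed from the slide's node and edge labels.
# Which edge joins which pair of nodes is an assumption.
TRIPLES = {
    ("Winnie the Pooh", "isA", "Bear"),
    ("Winnie the Pooh", "isMadeBy", "Disney"),
    ("Winnie the Pooh", "hasColor", "Yellow"),
    ("Winnie the Pooh", "hasFriend", "Piglet"),
    ("Winnie the Pooh", "hasClothing", "Shirt"),
    ("Shirt", "hasColor", "Red"),
    ("Piglet", "isA", "Pig"),
}


def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples matching the given fields; None is a wildcard."""
    return sorted(
        t for t in triples
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    )


# Everything that has a color, regardless of subject.
print(match(TRIPLES, predicate="hasColor"))
```

Querying by relationship rather than by keyword is what lets a description such as "yellow bear with a red shirt" lead back to a named character.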

Page 8

SEMANTIC RELATIONSHIPS

[Figure: extended semantic graph. Nodes: Winnie the Pooh, Bear, Yellow, Disney, Piglet, Shirt, Red, Pig, Brown, Mammal, Company, 0xFFFF00. Edge labels: isMadeBy, isA, hasClothing, hasColor, hasFriend, hasRGB.]

How can we locate useful semantic relationships?
  Link Distance
  Link Direction
  Link Relationship
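The three criteria named on this slide (link distance, link direction, link relationship) can be illustrated with a breadth-first walk over the graph. This is a sketch only: the triples are an assumed reconstruction of the figure, and `related_terms`, `max_distance`, and `allowed` are hypothetical names, not the author's method.

```python
from collections import deque

# Assumed reconstruction of the extended graph on this slide.
TRIPLES = [
    ("Winnie the Pooh", "isA", "Bear"),
    ("Winnie the Pooh", "isMadeBy", "Disney"),
    ("Winnie the Pooh", "hasColor", "Yellow"),
    ("Winnie the Pooh", "hasFriend", "Piglet"),
    ("Winnie the Pooh", "hasClothing", "Shirt"),
    ("Shirt", "hasColor", "Red"),
    ("Piglet", "isA", "Pig"),
    ("Bear", "isA", "Mammal"),
    ("Bear", "hasColor", "Brown"),
    ("Disney", "isA", "Company"),
    ("Yellow", "hasRGB", "0xFFFF00"),
]


def related_terms(triples, start, max_distance=2, allowed=None):
    """Breadth-first walk returning {term: (distance, path)}.

    Each path step is (relationship, direction). Direction is "out" when the
    step follows a link from subject to object, and "in" when it goes against
    the link. `allowed` optionally restricts which relationships are followed.
    """
    neighbors = {}
    for subject, relation, obj in triples:
        if allowed is not None and relation not in allowed:
            continue
        neighbors.setdefault(subject, []).append((obj, relation, "out"))
        neighbors.setdefault(obj, []).append((subject, relation, "in"))

    found = {start: (0, [])}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        distance, path = found[node]
        if distance == max_distance:
            continue  # Do not expand beyond the distance limit.
        for other, relation, direction in neighbors.get(node, []):
            if other not in found:
                found[other] = (distance + 1, path + [(relation, direction)])
                queue.append(other)
    del found[start]
    return found


for term, (distance, path) in related_terms(TRIPLES, "Red").items():
    print(term, distance, path)
```

Starting from "Red", the walk reaches "Shirt" at distance 1 and "Winnie the Pooh" at distance 2, both against the link direction. Restricting `allowed` to `{"isA"}` keeps only the type hierarchy, which is one possible way to pick the appropriate links when a node has many.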

Page 9

MODIFIED SEARCH INDEXING

[Figure: searches labeled with Search Rank values]

1. User submits a keyword-based query to the search engine
2. Search analyzer creates additional searches based on ontological information
3. Search engine performs parallel searches of top search terms
4. Searches are ranked and returned to the user as additional search suggestions
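The modified workflow could look roughly like the sketch below. Everything in it is assumed for illustration: the `RELATED` mapping stands in for the ontology, `DOCUMENTS` stands in for the plain-text corpus, and ranking by hit count is only a placeholder, since the slides leave the ranking of plain text as an open question.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical ontology lookup: term -> related terms.
RELATED = {
    "bear": ["mammal", "winnie the pooh"],
    "yellow": ["0xffff00"],
}

# Hypothetical plain-text corpus.
DOCUMENTS = [
    "winnie the pooh and piglet visit the forest",
    "a yellow bear in a red shirt",
    "the brown bear is a large mammal",
    "winnie the pooh loves honey",
]


def run_search(query):
    """Stand-in search engine: count documents containing the query phrase."""
    return sum(query.lower() in doc for doc in DOCUMENTS)


def expand(query):
    """Step 2: swap each query keyword for each of its related terms."""
    words = query.lower().split()
    suggestions = []
    for i, word in enumerate(words):
        for related in RELATED.get(word, []):
            candidate = " ".join(words[:i] + [related] + words[i + 1:])
            if candidate not in suggestions:
                suggestions.append(candidate)
    return suggestions


def suggest(query, top_n=3):
    """Steps 2-4: expand the query, search in parallel, rank the results."""
    candidates = expand(query)
    # Step 3: run the additional searches in parallel.
    with ThreadPoolExecutor() as pool:
        hit_counts = list(pool.map(run_search, candidates))
    # Step 4: rank by hit count (placeholder metric) and drop empty searches.
    ranked = sorted(zip(candidates, hit_counts), key=lambda pair: pair[1], reverse=True)
    return [(text, hits) for text, hits in ranked if hits > 0][:top_n]


print(suggest("bear"))  # [('winnie the pooh', 2), ('mammal', 1)]
```

A thread pool fits here because each additional search is independent of the others. A real deployment would replace `run_search` with calls to the actual search backend.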

Page 10

CURRENT WORK

NASA SWEET Ontologies
  6000 concepts
  200 ontologies
  Scientific
  Loose relationships

National Oceanic and Atmospheric Administration
  30+ years of scientific research
  Text based
  Unsorted
  2+ gigabytes
  Domain specific terminology

Page 11

CHALLENGES & FUTURE WORK

How to rank plain text
  No links or history
  No ‘page views’

Limited ontology coverage
  6000 concepts in NASA SWEET ontologies
  ~170,000 words in the English language
  Many more unique names and scientific terms
  How can ontologies be automatically generated?

Graph matching
  Identifying related terms in a large graph is difficult
  Multiple links per node, must identify appropriate links

Page 12

Q & A