USING ONTOLOGICAL RELATIONSHIPS TO PROVIDE INDEXING OF PLAIN TEXT SEARCHES
Research by Fletcher Liverance (fletcher.liverance@gmail.com)
November 14th, 2011
HOW DOES A SEARCH ENGINE WORK?
[Figure: example of ranked pages returned from the index]
Page   Rank
8      18374
7      12973
23     9977
99     7645
192    5521
15     4211
65     988
1. The user submits a keyword-based query to the search engine.
2. The indexer locates all relevant pages containing those keywords.
3. The database returns all pages found in the index.
4. The pages are ranked and returned to the user.
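The four steps above can be sketched as a minimal keyword search over an inverted index. This is an illustrative sketch, not how any particular engine works: the page IDs, texts, and rank scores are hypothetical, and a page is assumed to match only if it contains every query keyword (AND semantics).

```python
from collections import defaultdict

# Hypothetical toy corpus: page ID -> (page text, precomputed rank score).
# IDs, texts, and scores are illustrative only.
PAGES = {
    8: ("winnie the pooh is a yellow bear", 18374),
    7: ("pooh and piglet cartoon episodes", 12973),
    23: ("a bear in a red shirt", 9977),
    99: ("yellow paint colors", 7645),
}

def build_index(pages):
    """Map each keyword to the set of page IDs whose text contains it."""
    index = defaultdict(set)
    for page_id, (text, _rank) in pages.items():
        for word in text.lower().split():
            index[word].add(page_id)
    return index

def search(query, index, pages):
    """Return IDs of pages containing every query keyword, highest rank first."""
    words = query.lower().split()
    if not words:
        return []
    # Step 2: the indexer locates the pages containing each keyword
    # (assumed AND semantics: keep pages that contain all of them).
    hits = set.intersection(*(index.get(w, set()) for w in words))
    # Steps 3-4: take the matching pages and sort them by rank score.
    return sorted(hits, key=lambda pid: pages[pid][1], reverse=True)

index = build_index(PAGES)
print(search("yellow bear", index, PAGES))   # [8]
print(search("bear", index, PAGES))          # [8, 23]
print(search("cartoon bear", index, PAGES))  # []
```

The last query shows the keyword-based drawback: a descriptive query whose words never appear together on one page returns nothing, however relevant the pages are.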
HOW DOES A SEARCH ENGINE WORK?
Benefits: fast; machine learnable; straightforward.
Drawbacks: pattern matching; keyword based; garbage in, garbage out.
GARBAGE IN, GARBAGE OUT
Scenario: You saw this television series and would like to find out more about it, but you don’t know the name of the series or of any of its characters.
What do you do?
[Image from the series: http://www.dan-dare.org/FreeFun/Images/CartoonsMoviesTV/WinnieThePoohWallpaper1024.jpg]
GARBAGE IN, GARBAGE OUT
POOR RESULTS!
GARBAGE IN, GARBAGE OUT
GOOD RESULTS!
SEMANTIC RELATIONSHIPS
[Figure: semantic graph centered on “Winnie the Pooh”; other nodes: Bear, Yellow, Disney, Piglet, Shirt, Red, Pig; edge labels: isA, isMadeBy, hasColor, hasClothing, hasFriend]
Ontology: “An ontology is a description (like a formal specification of a program) of the concepts and relationships that can exist for an agent or a community of agents.” (http://www-ksl.stanford.edu/kst/what-is-an-ontology.html)
Resource Description Framework (RDF): “RDF extends the linking structure of the Web to use URIs to name the relationship between things as well as the two ends of the link. Using this simple model, it allows structured and semi-structured data to be mixed, exposed, and shared across different applications.” (http://www.w3.org/RDF/)
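RDF expresses each relationship as a subject-predicate-object triple. The sketch below stores the Winnie the Pooh relationships that way in plain Python, using strings where RDF would use URIs and no RDF library. Which edge connects which pair of nodes is an assumption reconstructed from the figure's labels, and the helpers `objects` and `subjects` are hypothetical names.

```python
# Hypothetical (subject, predicate, object) triples for the Winnie the Pooh
# figure. The pairing of edges to nodes is an assumption, not from the slide.
TRIPLES = [
    ("Winnie the Pooh", "isA", "Bear"),
    ("Winnie the Pooh", "isMadeBy", "Disney"),
    ("Winnie the Pooh", "hasColor", "Yellow"),
    ("Winnie the Pooh", "hasClothing", "Shirt"),
    ("Shirt", "hasColor", "Red"),
    ("Winnie the Pooh", "hasFriend", "Piglet"),
    ("Piglet", "isA", "Pig"),
]

def objects(subject, predicate, triples=TRIPLES):
    """Follow a link forward: every object reached from `subject` via `predicate`."""
    return [o for s, p, o in triples if s == subject and p == predicate]

def subjects(predicate, obj, triples=TRIPLES):
    """Follow a link backward: every subject that points at `obj` via `predicate`."""
    return [s for s, p, o in triples if p == predicate and o == obj]

print(objects("Winnie the Pooh", "hasColor"))  # ['Yellow']
print(subjects("isA", "Pig"))                  # ['Piglet']
```

Because the relationship name is part of every triple, a query can ask a specific question ("what color is it?") instead of matching keywords.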
How can we locate useful semantic relationships? Consider link distance, link direction, and link relationship.
[Figure: the same graph extended by further links; new nodes: Brown, Mammal, Company, 0xFFFF00; edge labels: isA, hasColor, hasRGB]
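The three criteria above (link distance, link direction, and link relationship) map naturally onto a bounded breadth-first search over the graph. This is a sketch under assumptions: `related_terms` is a hypothetical helper, and the edges, including the Mammal, Brown, Company, and 0xFFFF00 extensions, are an assumed reading of the figures.

```python
from collections import deque

# Hypothetical (subject, relation, object) edges; the pairings are assumed.
TRIPLES = [
    ("Winnie the Pooh", "isA", "Bear"),
    ("Winnie the Pooh", "isMadeBy", "Disney"),
    ("Winnie the Pooh", "hasColor", "Yellow"),
    ("Winnie the Pooh", "hasFriend", "Piglet"),
    ("Bear", "isA", "Mammal"),
    ("Bear", "hasColor", "Brown"),
    ("Disney", "isA", "Company"),
    ("Yellow", "hasRGB", "0xFFFF00"),
]

def related_terms(start, max_distance=2, direction="out", relations=None):
    """Breadth-first search from `start` over TRIPLES.

    max_distance: maximum number of links to follow (link distance).
    direction: "out" follows subject -> object, "in" follows object -> subject,
               "both" follows either way (link direction).
    relations: optional set of relation names to follow (link relationship).
    Returns {term: distance} for every term reached, excluding `start`.
    """
    found = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if found[node] == max_distance:
            continue  # do not expand beyond the distance limit
        for s, rel, o in TRIPLES:
            if relations is not None and rel not in relations:
                continue  # skip relationship types we were told to ignore
            neighbors = []
            if direction in ("out", "both") and s == node:
                neighbors.append(o)
            if direction in ("in", "both") and o == node:
                neighbors.append(s)
            for n in neighbors:
                if n not in found:
                    found[n] = found[node] + 1
                    queue.append(n)
    del found[start]
    return found

print(related_terms("Winnie the Pooh", max_distance=1))
print(related_terms("Winnie the Pooh", relations={"isA"}))  # {'Bear': 1, 'Mammal': 2}
print(related_terms("Mammal", direction="in"))  # {'Bear': 1, 'Winnie the Pooh': 2}
```

Restricting `relations` to `{"isA"}` keeps only taxonomic links, which is one way to avoid following every link out of a highly connected node.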
MODIFIED SEARCH INDEXING
[Figure: example of ranked additional searches]
Search   Rank
8        18374
7        12973
23       9977
99       7645
192      5521
1. The user submits a keyword-based query to the search engine.
2. A search analyzer creates additional searches based on ontological information.
3. The search engine performs parallel searches of the top search terms.
4. The searches are ranked and returned to the user as additional search suggestions.
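The modified pipeline can be sketched as follows. The ontology lookup table, the hit counts, and the `count_results` stand-in for a real search engine are all hypothetical, and ranking the candidate searches by hit count is an assumption; the slides do not say how the searches are ranked.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical search-analyzer output: query -> ontologically related searches.
RELATED = {
    "yellow bear": ["winnie the pooh", "cartoon bear", "brown bear"],
}

# Hypothetical stand-in for the real engine: query -> number of matching pages.
FAKE_HIT_COUNTS = {
    "yellow bear": 300,
    "winnie the pooh": 1200,
    "cartoon bear": 50,
    "brown bear": 800,
}

def count_results(query):
    """Pretend to run one search and return how many pages matched."""
    return FAKE_HIT_COUNTS.get(query, 0)

def suggest(query, max_suggestions=3):
    """Steps 2-4: expand the user's query (step 1) into ranked suggestions."""
    # Step 2: derive additional searches from ontological information.
    candidates = RELATED.get(query, [])
    # Step 3: run the candidate searches in parallel.
    with ThreadPoolExecutor() as pool:
        counts = list(pool.map(count_results, candidates))
    # Step 4: rank candidates (assumed: by hit count) and return the top ones.
    ranked = sorted(zip(candidates, counts), key=lambda pair: pair[1], reverse=True)
    return [q for q, n in ranked[:max_suggestions] if n > 0]

print(suggest("yellow bear"))  # ['winnie the pooh', 'brown bear', 'cartoon bear']
```

`pool.map` returns results in the same order as its inputs, so each count stays paired with the search that produced it even though the searches run concurrently.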
CURRENT WORK
NASA SWEET ontologies: 6,000 concepts across 200 ontologies; scientific; loose relationships.
National Oceanic and Atmospheric Administration (NOAA): 30+ years of scientific research; text based; unsorted; 2+ gigabytes; domain-specific terminology.
CHALLENGES & FUTURE WORK
How to rank plain text: no links or history; no ‘page views’.
Limited ontology coverage: 6,000 concepts in the NASA SWEET ontologies versus ~170,000 words in the English language, plus many more unique names and scientific terms. How can ontologies be generated automatically?
Graph matching: identifying related terms in a large graph is difficult; nodes have multiple links, so the appropriate links must be identified.
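The first challenge, ranking plain text that has no links, history, or page views, is left open in the slides. Purely as an illustration of one common option (not part of this research), the sketch below ranks documents by TF-IDF, which needs nothing but the text itself. The function name `tfidf_rank` and the sample documents are hypothetical.

```python
import math
from collections import Counter

def tfidf_rank(query, docs):
    """Rank documents by the summed TF-IDF weight of the query terms.

    docs: mapping of document ID -> plain text.
    Returns the IDs of documents with a positive score, best first.
    """
    tokenized = {doc_id: text.lower().split() for doc_id, text in docs.items()}
    n_docs = len(tokenized)
    # Document frequency: how many documents contain each term.
    df = Counter()
    for words in tokenized.values():
        df.update(set(words))
    scores = {}
    for doc_id, words in tokenized.items():
        if not words:
            continue
        tf = Counter(words)
        score = 0.0
        for term in query.lower().split():
            if df[term]:
                # Rare terms get a high IDF; a term in every document gets 0.
                idf = math.log(n_docs / df[term])
                score += (tf[term] / len(words)) * idf
        if score > 0:
            scores[doc_id] = score
    return sorted(scores, key=scores.get, reverse=True)

docs = {
    "a": "sea surface temperature rose in the gulf",
    "b": "temperature records for the station",
    "c": "salinity of the gulf stream near the coast",
}
print(tfidf_rank("gulf temperature", docs))  # ['a', 'b', 'c']
```

Document "a" ranks first because it contains both query terms; "b" beats "c" because its single matching term makes up a larger share of a shorter text.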
Q & A