Post on 21-Dec-2015
Natural Language Query Interface
Mostafa Karkache & Bryce Wenninger
Outline
Introduction
• What is a Natural Language Query Interface?
• Why do we need this type of interface?
Problems implementing this interface
• Ambiguity, i.e. semantics
• Size of the language
• Syntax and grammars
• Anaphora
• Indexicality
• Metaphor
NL domains of application
• Internet: information retrieval (search engines), information filtering (document grouping)
• Database
Conclusion
• Current status
• Future trend
Introduction
What is a Natural Language Query Interface (NLQI)? What is it for? Where would it be used?
Why do we need this interface? How much would it really help? Is it even possible?
Problems with implementation
• Ambiguity, i.e. semantics
• Size of the language
• Syntax and grammars
• Anaphora
• Indexicality
• Metaphor
Problems with implementation… Ambiguity and Semantics
The boy saw the man on the hill with the telescope
Ambiguity and Semantics
What is wrong with this sentence?
Answer: it is too ambiguous.
How many different meanings can it have?
The only way to truly understand it is to be there.
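The question "how many different meanings can it have?" can be made concrete by counting parse trees. The sketch below is only an illustration: the toy grammar and lexicon are assumptions made up for this example (they are not from the slides), and the count depends entirely on them. It uses a CYK-style chart to count the ways each prepositional phrase can attach to a noun phrase or to the verb phrase.

```python
from collections import defaultdict

# Toy lexicon and binary grammar, assumed purely for illustration.
LEXICON = {
    "the": "Det", "boy": "N", "man": "N", "hill": "N", "telescope": "N",
    "saw": "V", "on": "P", "with": "P",
}
RULES = [
    ("S", "NP", "VP"),
    ("NP", "Det", "N"),
    ("NP", "NP", "PP"),   # a prepositional phrase can modify a noun phrase
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"),   # or it can modify the verb phrase
    ("PP", "P", "NP"),
]

def count_parses(sentence):
    """Count the distinct parse trees of `sentence` under the toy grammar."""
    words = sentence.lower().split()
    n = len(words)
    # chart[i][j][X] = number of ways symbol X derives words[i:j]
    chart = [[defaultdict(int) for _ in range(n + 1)] for _ in range(n + 1)]
    for i, word in enumerate(words):
        chart[i][i + 1][LEXICON[word]] += 1
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):          # every split point
                for parent, left, right in RULES:
                    chart[i][j][parent] += chart[i][k][left] * chart[k][j][right]
    return chart[0][n]["S"]

print(count_parses("The boy saw the man on the hill with the telescope"))  # 5
```

Under this toy grammar the sentence has five structurally different readings (who is on the hill, who has the telescope), and nothing in the syntax says which one was meant.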
Ambiguity and Semantics
Semantics causes further problems.
The word “up” takes on many meanings depending on how it is used: “Look up there”, “It is up to me”, “Is he up to the task?”, “She is not up yet”, “Starting up”, “What’s up dude?”
More problems with implementation…
Size of the language
Most natural languages have an enormous vocabulary.
Example: The English language has approximately 3 million words, and counting, of which 200,000 are in common use today (and this isn’t counting the multiple meanings each word can have).
More problems with implementation…
Syntax And Grammars
Languages have alphabets and rules.
Sample alphabet: {a, b}
Sample (rewrite) rules:
S → aSb
S → ba
These rules generate words of the form ba, abab, aababb, aaababbb, ...
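Reading the two rewrite rules as S → aSb and S → ba (which is what the listed words imply), the language can be sketched in a few lines. The function names `generate` and `in_language` are made up for this illustration.

```python
def generate(depth):
    """Apply S -> aSb `depth` times, then finish with S -> ba."""
    return "a" * depth + "ba" + "b" * depth

def in_language(word):
    """Undo S -> aSb from the outside in until only the base case is left."""
    while len(word) > 2 and word.startswith("a") and word.endswith("b"):
        word = word[1:-1]
    return word == "ba"

print([generate(n) for n in range(4)])  # ['ba', 'abab', 'aababb', 'aaababbb']
print(in_language("aababb"), in_language("aabb"))  # True False
```

Two rules are enough to describe infinitely many words, which is why grammars are attractive for describing languages.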
Syntax And Grammars…
English’s main constituents: sentences, noun phrases, verb phrases, prepositional phrases
A sample English grammar:
S -> NP VP
NP -> Det NOMINAL
NOMINAL -> Noun
VP -> Verb
Det -> a
Noun -> table
Verb -> found
Any problem with that grammar?
It is a context-free grammar (CFG); it only accounts for syntactic structure.
A CFG works fine for a high-level programming language.
What about the semantics of the words?
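To see the limitation, the sample grammar can be written as a table and expanded mechanically. This is a minimal sketch: the dictionary encoding and the `expand` helper are choices made for this illustration, while the rules themselves are the ones listed on the slide.

```python
from itertools import product

# The sample grammar from the slide, one entry per left-hand side.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "NOMINAL"]],
    "NOMINAL": [["Noun"]],
    "VP": [["Verb"]],
    "Det": [["a"]],
    "Noun": [["table"]],
    "Verb": [["found"]],
}

def expand(symbol):
    """Return every word sequence derivable from `symbol`.

    This terminates only because the sample grammar has no recursion.
    """
    if symbol not in GRAMMAR:              # a terminal word
        return [[symbol]]
    results = []
    for production in GRAMMAR[symbol]:
        parts = [expand(s) for s in production]
        for combo in product(*parts):      # combine the alternatives
            results.append([word for part in combo for word in part])
    return results

sentences = [" ".join(words) for words in expand("S")]
print(sentences)  # ['a table found']
```

The only sentence the grammar produces, "a table found", is syntactically well formed, yet the grammar has no way to say whether it means anything.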
Semantic Representations
Can we create representations of the meanings of English words?
This is a very complex task.
A context-sensitive grammar is needed.
More problems with implementation…
Anaphora
What is anaphora? A pronoun (or similar expression) that refers back to a noun mentioned earlier.
Why is it a problem? The key words that each pronoun stands for have to be tracked.
How would it have to be handled?
More problems with implementation… Indexicality
A sentence whose meaning depends on the situation (place, time, or speaker) in which it is said
Example: “I am over here”
Where is “here”? Who is “I”?
More problems with implementation… Metaphor
Non-literal use of a word.
“This process was killed because it ran out of resources”
The meaning differs in manufacturing vs. computing.
Domains of Application
Internet
• Information retrieval: with search engines
• Information filtering: document grouping
• More...
Database
NLQI and the Internet
Information retrieval: an English query is issued to a search engine, and documents relevant to the query are returned.
Two methods are used:
• Exact matching
• Inexact matching
Exact Matching
Restrictive, and known for a low hit rate
Inexact Matching
Higher hit rate, but the user might have to scan a lot of returned documents!
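The slides do not define the two methods precisely. One common reading, assumed here, is that exact matching requires every query word to appear in a document, while inexact matching accepts any document that shares at least one word with the query. The documents and function names below are invented for the illustration.

```python
# Tiny made-up document collection.
DOCS = {
    "d1": "natural language interfaces for database queries",
    "d2": "database performance tuning",
    "d3": "cooking with natural ingredients",
}

def exact_match(query, docs):
    """Keep documents that contain every query word."""
    terms = set(query.lower().split())
    return [name for name, text in docs.items()
            if terms <= set(text.lower().split())]

def inexact_match(query, docs):
    """Keep documents that contain at least one query word."""
    terms = set(query.lower().split())
    return [name for name, text in docs.items()
            if terms & set(text.lower().split())]

print(exact_match("natural database", DOCS))    # ['d1']
print(inexact_match("natural database", DOCS))  # ['d1', 'd2', 'd3']
```

The trade-off from the slides shows up directly: the exact version returns few hits, and the inexact version returns more, including the cooking document that the user then has to skip.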
How does it work?
Select the ‘candidate key’ words from the query
• ‘a’, ‘the’, ‘an’, etc. would not make it
Count the key words in the documents
• “running”, “run”, “ran” and “runner” would count as one
How does it work…
Rank the documents by frequency of key words found
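The steps above (drop the small words, fold word variants together, count, rank) can be sketched as follows. The stop-word list and the `STEMS` lookup table are small hand-written stand-ins, assumed for this example; a real system would use a proper stemmer or lemmatizer.

```python
import re
from collections import Counter

# Assumed stop-word list; real lists are much longer.
STOP_WORDS = {"a", "an", "the", "of", "in", "to", "is", "for", "and"}
# Hand-written stand-in for a stemmer, so that word variants count as one.
STEMS = {"running": "run", "ran": "run", "runner": "run", "runs": "run"}

def key_words(text):
    """Lower-case, drop stop words, and map each word to its stem."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [STEMS.get(t, t) for t in tokens if t not in STOP_WORDS]

def rank(query, docs):
    """Rank documents by how often the query's key words occur in them."""
    wanted = set(key_words(query))
    scores = {}
    for name, text in docs.items():
        counts = Counter(key_words(text))
        scores[name] = sum(counts[word] for word in wanted)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

docs = {
    "a": "The runner ran while running.",
    "b": "A run in the park.",
    "c": "Tables and chairs.",
}
print(rank("the best running shoes", docs))  # [('a', 3), ('b', 1), ('c', 0)]
```

Raw frequency is the simplest possible score; it favors long documents, which is one reason the user may still have to scan many results.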
Information filtering
Documents are first prepared and then searched
Documents are ranked by topics
NLQI and Database
The database business is an industry worth billions of dollars
A more user-friendly interface between the user and the machine is needed
NLQI seems to fit well
Example of DB use
Give me the names of the employees of Bank of America who signed up for a 401(k).
Sounds easy?
To humans, yes. Not to machines!
What does “up” mean?
Give me the names of the employees of Bank of America who signed up for a 401(k).
Solution!
Create an index that has all the meanings of every word that can be used in the database domain!
Then guess what “up” would mean
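A rough sketch of what such an index and the "guess" might look like is given below. Everything in it is hypothetical: the sense labels, the cue words, and the overlap heuristic (pick the sense whose cue words share the most words with the sentence) are assumptions for illustration, not the authors' design.

```python
import re

# Hypothetical sense index: each meaning of a word lists cue words that
# tend to appear near it. A real index would cover the whole domain vocabulary.
SENSE_INDEX = {
    "up": {
        "enroll (as in 'sign up')": {"sign", "signed", "signing"},
        "direction": {"look", "go", "climb"},
        "awake": {"not", "yet", "woke"},
    },
}

def guess_sense(word, sentence):
    """Guess the sense whose cue words overlap most with the sentence."""
    context = set(re.findall(r"[a-z0-9]+", sentence.lower())) - {word}
    senses = SENSE_INDEX[word]
    # On a tie, max() keeps the first sense listed in the index.
    return max(senses, key=lambda sense: len(senses[sense] & context))

print(guess_sense("up", "employees who signed up for 401k"))  # enroll (as in 'sign up')
print(guess_sense("up", "Look up there"))                     # direction
```

It really is a guess: when no cue word appears in the sentence, every sense scores zero and the first entry wins by default.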
NLQI and Database…
Can’t use NLQI to create a database:
--- Data integrity compromised
Could use NLQI for information retrieval:
--- Performance compromised, to say the least
Current status
NLQI is used in many areas today, but it is very (very) application specific. This is to avoid a lot of the problems discussed in this presentation.
Broad use of NLQI, and its full capability, has not yet been realized.
Future trend
Where is it going?
The trend is to store more and more data per user to help determine exactly what meaning the user really intends. This is called incremental enhancement of the data retrieval process.
Will it ever get there?
Questions?
You’ve got questions, we’ve got answers (hopefully).