application areas
DESCRIPTION
Application Areas. For two lectures, we will examine a number of application areas of AI research Visual image understanding Speech recognition Natural language processing mostly we’ll look at NL understanding but we will briefly talk about NL generation and machine translation - PowerPoint PPT PresentationTRANSCRIPT
Application Areas• For two lectures, we will examine a number of
application areas of AI research– Visual image understanding– Speech recognition– Natural language processing
• mostly we’ll look at NL understanding• but we will briefly talk about NL generation and machine translation
– Search engine technology
• Next time, we will cover the following topics and possibly others (specific topics to be determined)– Homeland security– Sensor interpretation– Robotic vehicles– AI in space
Perception Problems• Vision understanding, natural language processing and
speech recognition all have several things in common– Each problem is so complex that it must be solved through a
series of mappings from subproblem to subproblem– Each problem requires a great deal of knowledge that is not
necessarily available or well-understood such that successful applications often utilize non-knowledge-based mechanisms
– Each problem contains some degree of uncertainty, often implemented using HMMs or neural networks
• Early approaches in AI were symbolic and often suffered for several reasons– Poor run-time performance (because of the sheer amount of
knowledge needed and the slow processors of the time)– Models that were based on our incomplete knowledge of
human (or animal) vision, auditory abilities and language– Lack of learning so that knowledge acquisition was essential
• Research into all three areas has progressed, but slowly
Vision Understanding Mapping
• The vision problem is shown to the left as a series of mappings– Low level
processing and filtering to convert from analog to digital
– Pixels edges/lines
– Lines regions– Regions
surfaces– Add texture,
shading, contours– Surfaces
objects– Classify objects– Analyze scene (if
necessary)
A Few Details• Computer vision has been studied for decades
– There is no single solution to the problem– Each of the mappings has its own solution
• often mathematical, such as the edge detection and mapping from edges to regions
• often applies constraint satisfaction algorithms to reduce the amount of search or computation required for low level processes
– The “intelligence” part really comes in toward the end of the process
• Object classification• Scene analysis• Surface and object disambiguation (determine which object a
particular surface belongs, dealing with optical illusions)
• Computer vision is practically an entire CS discipline and so is beyond the scope of what we can cover here (sadly)
Edge Detection• Waltz created an algorithm for edge detection by
– finding junction points (intersections of lines)– determining the orientation of the lines into the junction points– applying constraint satisfaction to select which lines belong to
which surfaces• Below, convex edges are denoted with + and concave edges with -
For trihedral junction points (intersection of3 lines), these are the 18 legal connections
Other approaches may be necessary forcurve, contour or blob detection and analysis
Often, these approaches use such mathematicalmodels as eigen-models (eigenform, eigenface), quadratics or superquadratics, distance measures,closest point computations, etc
Vision Sub-Applications• Machine produced character recognition
– Solved satisfactorily through neural networks
• Hand-written character recognition– Many solutions: neural networks, genetic algorithms, HMMs,
Bayesian probabilities, nearest neighbor matching approaches, symbolic approaches
• printed character recognition highly accurate, cursive recognition greatly varies
• Face recognition– Many solutions, often the solutions attempt to map the face’s
contours and texture into mathematical equations and Gaussian distributions
• Image stabilization and image (object) tracking– Solutions include neural networks, fuzzy logic, best fit search
• UAV input – we’ll discuss UAVs next time
Two Approaches to Handwritten Character
Recognition
Neural Networks with voting
Symbolic approach using pattern matching
Speech Recognition• In spite of the fact that research began in earnest in
1970, speech recognition is a problem far from solved– Problems:
• Multispeaker – people speak at different frequencies• Continuous speech – the speech signal for a given sound is dependent
on the preceding and succeeding sounds, and in continuous speech, it is extremely hard to find the beginning of a new word, making the search process even more computationally difficult
• Large vocabularies – not only does this complicate the search process because there might be many words that match, but it also brings in ambiguity
• SR attempts– Knowledge based approaches (particularly Hearsay)– Neural networks– Hidden Markov Models– Hybrid approaches
The Task Pictorially
The speech signal is segmentedoverlapping segments are processedto create a small window of speech
Processing typically involves FFTand Linear Predictive Coding (LPC)analysis
This provides a series of energypatterns at different frequencies
Phonetic Dependence• Below are two wave forms created by uttering the same vowel
sound, “ee” as in three (on the left) and tea (on the right)– notice how dissimilar the “ee” portion is, in fact the one on the right is even
longer– this problem is caused because of co-articulation – one sound will directly
impact the next sound
Hearsay-II• Hearsay-II attempted to
solve the problem through symbolic problem solvers– Each called a knowledge
group– They would
communicate through a global mechanism called a blackboard
– Each KG knew what part of the BB to read from and where to post partial conclusions to
– A scheduler would use a complex algorithm to decide which KG should be invoked next based on priority of KGs and what knowledge was currently available on the BB
Hearsay could recognize 1000 words ofcontinuous speech and several speakers with a limited syntax with an accuracy ofaround 90%
Sample of Hearsay-II’s Blackboard
Sentence is “Are any by Feigenbaum and Feldman?”
One Neural Network Approach• One approach is to build a separate network for every
word known in the system– For a small vocabulary isolated word system, this approach
may work • no need to worry about finding the separation between words or the
effect that a word ending might have on the next word beginning
Notice that NN have fixed sized inputs
An input here will be the processed speech signal in the form of LPCs
Continuous Speech• The preceding solution
does not work for continuous speech– Since there is no easy way to
determine where one word ends and the next begins, we cannot just rely on word models
– Instead, we need phonetic models
– The problem here is that the sound of a phoneme is influenced by the preceding and succeeding sounds
• a neural network only learns a “snapshot” of data and what we need is context dependent
• One solution is to use a recurrent NN, which remembers the output of the previous input to provide context or memory– Note that the RNN is much more
difficult to train, but can solve the speech problem more effectively than the normal NN
Another Solution: Multiple Nets
Here, there are multipleneural networks forthe various levels
The segmentation module is responsiblefor dividing the continuous signalinto segments
The unit level Generates phonetic units
The word module combinespossible phonetic units to words
HMM Approach• The HMM approach views speech recognition as finding
a path through a graph of connected phonetic and/or grammar models, each of which is an HMM– In this case, the speech signal is the observable, and the unit of
speech uttered is the hidden state– Typically, several frames of the speech signal are mapped into
a codebook (a fancy name for selecting one of a set of acoustic classifications)
– Separate HMMs are developed for every phonetic unit• For instance, we might have a /d/ HMM and an /i/ HMM, etc• There may be multiple paths through a single HMM to allow for
differences in duration caused by co-articulation and other effectsHere, the speech problemis one of working throughseveral layers of HMMs,at the lowest level are thephonetic HMMs, which combine to make up wordHMMs which combine tomake up grammar HMMs
Discrete Speech Using HMMs• Here, digits spoken one
at a time are recognized• The HMM word model
for a digit consists of 5 states any of which can repeat– each model is trained
before being used to adjust transition probabilities to the speaker
• The process is simple, giving the LPC, work through each word model and find the one which yields the greatest probability using Viterbi
Continuous Speech with HMM• Many simplifications made for discrete speech do not work for
continuous speech– HMMs will have to model smaller units, possibly phonemes– To reduce the search space, use a beam search– To ease the word-to-word transitions, use bigrams or unigrams
A trained Phoneme HMM for /d/
The process is similar to the previous slide where all phoneme HMMs are searched using Viterbi, but here, transition probabilities are included along with more codebooks to handle the phoneme-to-phoneme transitions and word-to-word transitions
A successful path through the phoneme HMMs tomatch the word “seeks”
HMM Codebook• HMMs need to have an observable state • For the speech problem, the observable is a speech signal – this
needs to be converted to a state• The LPC coefficients are mapped to a codebook using a nearest-
neighbor type of search• More
codebooks mean larger HMMs (more observable states to model) but the more accurate the nearest-neighbor matching will be HMM systems will commonly use at least 256 codebooks
HMM/NN Hybrid• The strength of the NN is in its low level recognition ability• The strength of the HMM is in its matching ability of the LPC
values to a codebook and selecting the right phoneme• Why not combine them?
Here, a neural network is trained and used to determinethe classification of the framesrather than a matching codebook-- thus, the system can learn tomatch better the acoustic information to a phoneticclassification
Phonetic classifications aregathered together into an arrayand mapped to the HMMs
Outstanding Problems• Most current solutions are stochastic or neural network and
therefore exclude potentially useful symbolic knowledge which might otherwise aid during the recognition problem (e.g., semantics, discourse, pragmatics)
• Speech segmentation – dividing the continuous speech into individual words
• Selection of the proper phonetic units – we have seen phonemes and words, but also common are diphones, demisyllables and other possibilities – speech science still has not determined which type of unit is the proper type of unit to model for speech recognition
• Handling intonation, stress, dialect, accent, etc• Dealing with very large vocabularies (currently, speech systems
recognize no more than a few thousand words, not an entire language)
• Accuracy is still too low for reliability (95-98% is common)
NLP• We mainly look at NLU here
– How can a machine “understand” natural language?– As we know, machines don’t understand, so the goal is to
transform a natural language input (speech or text) into an internal representation
• the system may be one that takes that representation and selects an action, or merely stores it
• for instance, if the NLU system is the front end of a database, then the goal is to form a DB query, submit it, and respond to the user with the DB results, or if it is the front end of an OS, the goal is to generate an OS command
• Research began in the 1940s and virtually died in the early 1950s until progress was made in the field of linguistics in the 1960s– Since then, a number of approaches have been tried:
• Symbolic• Stochastic/probabilistic• Neural network
NLPNLU on theleft – input text(or speech) andcome to a meaning bysyntactic parsing followed bysemantic analysis
NLG on the right– a planning problem – how tocreate a sentenceto convey a givenidea?
NLU Processes• Morphological analysis• Syntactic parsing
– identifying grammatical categories• top down parser• bottom up parser
• Semantic parsing– Identifying meaning
• template matching• alternatives are semantic grammars and using other forms of
representations• ambiguity handled by some form of word sense disambiguation
• Discourse analysis– Combining word meanings for a full sentence
• handling references• applying world knowledge – causality, expectations, inferences
• Pragmatic analysis– Speech acts, cognitive states, and beliefs, illocutionary acts
Parsing
Recursive Transition Network parser
Augmented Transition Network parser
An NLU Problem: Ambiguity• At the grammatical level
– words can take on multiple grammatical roles• “Our company is training workings” has 3 syntactic parses• “List the sales of the products produced in 1973 with the products
produced in 1972” has 455 parses!
• At the semantic level– words can take on multiple meanings even within one
grammatical category • consider the sentence “I made her duck”
• At the pragmatic and discourse levels– what is the meaning of “it sure is cold in here” – is this just a
statement of discomfort, or a request to make it warmer?– identifying the relationship for references “The chef made the
boy some stew. He ate it and thanked him” – who do the he and him refer to? What does it refer to?
– Some real US headlines with multiple interpretations:
Fun Headlines• Hospitals are Sued by 7 Foot Doctors• Astronaut Takes Blame for Gas in Spacecraft• New Study of Obesity Looks for Larger Test Group• Chef Throws His Heart into Helping Feed Needy• Include your Children when Baking Cookies
Stochastic Solutions Offer Promise• HMMs & bayesian probabilities are often used – how?
– in either case, build a corpora – the probability that a given word or a given word + transition to another word will have a specific meaning
– this requires obtaining statistics on word frequencies, collocation frequencies (phrases), and interpretation frequencies
• Problem: Zipf’s law – many words appear infrequently and therefore will have low probabilities
in stochastic models– rather than using the word count to determine the probability of a given
word, rank the words by frequency of occurrence and then use the position in the list to compute a probability
• frequency 1 / rank
• In addition, we need to filter collocations to remove common phrases like “and so”, “one of the”, etc, to obtain more reasonable frequency rankings – otherwise, collocation frequencies will be misleading toward very
common phrases
Other Solutions• Symbolic solutions include:
– Logic-based models for parsing and context-free grammars to generate parsers
• finite state automota such as RTNs and ATNs
– Parsing by dynamic programming– Knowledge representation approaches and case grammars (word models)
for syntactic and semantic parsing– Ad hoc knowledge-based approaches
• Neural network solutions include:– Parsing (combining
recurrent networks and self-organizing maps)
– Parsing relative clauses using recurrent networks
– Case role assignment– Word sense disambiguation
NLG, Machine Translation• NLG: given a concept to relate, translate it into a
legal statement– Like NLU, a mapping process, but this time in reverse
• much more straight forward than NLU because ambiguity is not present
• but there are many ways to say something, a good NLG will know its audience and select the proper words through register (audience context)
• a sophisticated NLG will use reference and possibly even parts of speech
• Machine Translation:– This is perhaps the hardest problem in NLP becomes it
must combine NLU and NLG– Simple word-to-word translation is insufficient– Meaning, references, idioms, etc must all be taken care of– Current MT systems are highly inaccurate
Application Areas• MS Word – spell checker/corrector, grammar checker,
thesaurus• WordNet• Search engines (more generically, information retrieval
including library searches)• Database front ends• Question-answering systems within restricted domains• Automated documentation generation• News categorization/summation• Information extraction• Machine translation
– for instance, web page translation• Language composition assistants – help non-native
speakers with the language• On-line dictionaries
Search Engine Technology• Search engines generally comprise three components
– Web crawler • simple, non-AI, traverser of web pages/sites • given web page, accumulate all URLs, add them to a queue or stack • retrieve next page given the URL from the queue (breadth-first) or stack
(depth-first/recursive)• convert material to text format when possible
– Summary extractor• take web page and summarize its content (possibly just create a bag of
words, possibly attempt some form of classification) – store summary, classification and URL in DB
– note: some engines only save summaries, others store summaries and the entire web page, but only use summaries during the search process
• create index of terms to web pages (possibly a hash table)– Search engine portal
• accept query• find all related items in the DB via hashing• sort using some form of rating scheme• display URLs, titles and possibly brief summaries
Page Categorization/Summaries• The tricky part of the search engine is to properly
categorize or summarize a web page– Information retrieval techniques are common
• Keywords from a bag of words• Statistical analysis to gauge similarities between pages• Link information such as page rank, hits, hubs, etc
– Filtering• Many web pages (e.g., stores) try to take advantage of the syntactic
nature of search engines and place meta tags in their pages that contain all English words
• Filtering is useful in eliminating pages that attempt such tricks
– Sorting• Once web pages have been found that match the given query, how are
they sorted?– using word count, giving extra credit if any of the words are found in the
page’s title or the link text, examine font size and style for importance of the words in the document, etc
Page Ranking• Based on the idea of academic citation to determine something’s
importance– PR(A) = (1 – d) + d * (PR(T1) / C(T1) + … + PR(Tn)/C(Tn))– PR(A) – page rank of page A– d – a “damping factor” between 0 and 1 (usually set to .85)– C(A) – number of links leaving page A– T1..Tn are the n pages that point at A
• The page rank corresponds to the principle eigenvector of a normalized “link matrix” (that is, a matrix of pages and their links)
• One way to view page rank is to think of an average web surfer who randomly is walking around pages by clicking on links (and never clicking the back button)– the page rank is in essence the probability that this page will be reached
randomly– the damping factor is the likelihood that the surfer will get bored at this
page and request another random page (rather than following a specific link of interest)
Google’s Architecture• There are numerous distributed
crawlers working all the time• Web pages are compressed
before being stored• Each page has a unique
document ID provided by the store server
• The indexer uncompresses files and parses them into word occurrences including the position in the document of the given word
• These word occurrences are stored in “barrels” to create an index of word-to-document mappings (using ISAM)
• The Sorter resorts the barrel information by word to create a reverse index
• The URL resolver converts relative URLs into absolute URLs
An Architecture for Improved Search• Search engines are limited in that they do not take
advantage of personalized knowledge– This is of course understandable given that search engines
have to support millions of users– Imagine that you had your own search engine DB, could you
provide user-specific knowledge to improve search?Three different techniques weretried using the architecture tothe left
A search engine provides anumber (millions?) of URLs,which are downloaded and keptlocally
User personalized knowledge isapplied to filter the items foundlocally in order to provide onlythose pages most likely to be of use
User Profiles
• The first approach merely created a bag of words annotated by their frequency
• Both the words and the frequency were derived by accumulating text in user stored files
• Word counts were summed based on which words appeared in the document – this score was used to order the retrieved pages
Partial user profile accumulated from text files:
debugging 1100 dec 1785 degree 4138 def 4938default 1349 define 2752defined 1140 department 2587 dept 1328 description 2691 design 1780 deskset 6472development 1403 different 2336 digital 1424 directory 2517disk 1907 distributed 5127
Word and frequency listed
User-Defined Knowledge Base
Concept Threshold Keywords(1)
Phrases (3) Groups OrAuthors
(2)
Expert Systems
6 Knowledge -baseLispRuleShellInference…
Knowledge based Systems”“Inference Engine”“KnowledgeEngineer”…
InferenceEngineKnowledgeBase
Intelligence 5 IntelligenceKnowledgeIQSkill…
“Gifted”“IQ Score”“IQ Test”…
BinetWechsler…
An alternative approach is toallow the user to define his/herown knowledge-base oftopics, people, phrases andkeywords that are of importance
Classification is done througha user-defined hierarchy whereeach concept has its ownmatching knowledge
Two concepts are shown to theleft – threshold means the number of matches required forthe category to establish
Learning-Driven Approach
• Learning what bag of words best represents a category by having viewers vote on which web pages are relevant for a given query and which are not
• Voting causes weights of the words in the bag to be altered• Implementations have included linear matrix forms, perceptrons and neural
networks– this approach can also be used for recommendation systems