application areas

Application Areas
• For two lectures, we will examine a number of application areas of AI research
  – Visual image understanding
  – Speech recognition
  – Natural language processing
    • mostly we’ll look at NL understanding
    • but we will briefly talk about NL generation and machine translation
  – Search engine technology
• Next time, we will cover the following topics and possibly others (specific topics to be determined)
  – Homeland security
  – Sensor interpretation
  – Robotic vehicles
  – AI in space

Perception Problems
• Vision understanding, natural language processing and speech recognition all have several things in common
  – Each problem is so complex that it must be solved through a series of mappings from subproblem to subproblem
  – Each problem requires a great deal of knowledge that is not necessarily available or well understood, so successful applications often utilize non-knowledge-based mechanisms
  – Each problem contains some degree of uncertainty, often handled using HMMs or neural networks
• Early approaches in AI were symbolic and often suffered for several reasons
  – Poor run-time performance (because of the sheer amount of knowledge needed and the slow processors of the time)
  – Models that were based on our incomplete knowledge of human (or animal) vision, auditory abilities and language
  – Lack of learning, so that knowledge acquisition was essential
• Research into all three areas has progressed, but slowly

Vision Understanding Mapping
• The vision problem is shown to the left as a series of mappings
  – Low level processing and filtering to convert from analog to digital
  – Pixels → edges/lines
  – Lines → regions
  – Regions → surfaces
  – Add texture, shading, contours
  – Surfaces → objects
  – Classify objects
  – Analyze scene (if necessary)

A Few Details
• Computer vision has been studied for decades
  – There is no single solution to the problem
  – Each of the mappings has its own solution
    • often mathematical, such as the edge detection and mapping from edges to regions
    • often applies constraint satisfaction algorithms to reduce the amount of search or computation required for low level processes
  – The “intelligence” part really comes in toward the end of the process
    • Object classification
    • Scene analysis
    • Surface and object disambiguation (determining which object a particular surface belongs to, dealing with optical illusions)
• Computer vision is practically an entire CS discipline and so is beyond the scope of what we can cover here (sadly)

Edge Detection
• Waltz created an algorithm for edge detection by
  – finding junction points (intersections of lines)
  – determining the orientation of the lines into the junction points
  – applying constraint satisfaction to select which lines belong to which surfaces
• Below, convex edges are denoted with + and concave edges with -

For trihedral junction points (intersection of 3 lines), these are the 18 legal connections

Other approaches may be necessary for curve, contour or blob detection and analysis

Often, these approaches use such mathematical models as eigen-models (eigenform, eigenface), quadratics or superquadratics, distance measures, closest point computations, etc

Vision Sub-Applications
• Machine produced character recognition
  – Solved satisfactorily through neural networks
• Hand-written character recognition
  – Many solutions: neural networks, genetic algorithms, HMMs, Bayesian probabilities, nearest neighbor matching approaches, symbolic approaches
    • printed character recognition is highly accurate; cursive recognition accuracy varies greatly
• Face recognition
  – Many solutions; often the solutions attempt to map the face’s contours and texture into mathematical equations and Gaussian distributions
• Image stabilization and image (object) tracking
  – Solutions include neural networks, fuzzy logic, best fit search
• UAV input – we’ll discuss UAVs next time

Two Approaches to Handwritten Character Recognition

Neural Networks with voting

Symbolic approach using pattern matching

Speech Recognition
• In spite of the fact that research began in earnest in 1970, speech recognition is a problem far from solved
  – Problems:
    • Multispeaker – people speak at different frequencies
    • Continuous speech – the speech signal for a given sound is dependent on the preceding and succeeding sounds, and in continuous speech, it is extremely hard to find the beginning of a new word, making the search process even more computationally difficult
    • Large vocabularies – not only does this complicate the search process because there might be many words that match, but it also brings in ambiguity
• SR attempts
  – Knowledge based approaches (particularly Hearsay)
  – Neural networks
  – Hidden Markov Models
  – Hybrid approaches

The Task Pictorially

The speech signal is segmented; overlapping segments are processed to create a small window of speech

Processing typically involves FFT and Linear Predictive Coding (LPC) analysis

This provides a series of energy patterns at different frequencies
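
The front-end steps above can be sketched as follows. This is a minimal illustration, not the pipeline of any particular recognizer: the frame length, hop size and test tone are assumptions, a direct DFT stands in for the FFT (same values, slower), and LPC analysis is omitted.

```python
import cmath
import math


def frame_signal(samples, frame_len, hop):
    """Split samples into frames; hop < frame_len makes the frames overlap."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]


def energy_spectrum(frame):
    """Energy in each frequency bin, computed with a direct DFT."""
    n = len(frame)
    energies = []
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(frame))
        energies.append(abs(coeff) ** 2)
    return energies


# Hypothetical input: a pure tone completing 4 cycles per 32-sample frame.
signal = [math.sin(2 * math.pi * 4 * t / 32) for t in range(128)]
frames = frame_signal(signal, frame_len=32, hop=16)   # 50% overlap
spectra = [energy_spectrum(f) for f in frames]

# For each frame, the frequency bin that holds the most energy.
peak_bins = [max(range(len(s)), key=s.__getitem__) for s in spectra]
```

Each entry of `spectra` is one energy pattern across frequencies; a real system would pass these (or LPC coefficients derived from the same frames) on to the recognizer.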

Phonetic Dependence
• Below are two wave forms created by uttering the same vowel sound, “ee” as in three (on the left) and tea (on the right)
  – notice how dissimilar the “ee” portion is; in fact, the one on the right is even longer
  – this problem is caused by co-articulation – one sound will directly impact the next sound

Hearsay-II
• Hearsay-II attempted to solve the problem through symbolic problem solvers
  – Each called a knowledge group (KG)
  – They would communicate through a global mechanism called a blackboard (BB)
  – Each KG knew what part of the BB to read from and where to post partial conclusions to
  – A scheduler would use a complex algorithm to decide which KG should be invoked next based on priority of KGs and what knowledge was currently available on the BB

Hearsay could recognize continuous speech from several speakers, using a 1000-word vocabulary and a limited syntax, with an accuracy of around 90%

Sample of Hearsay-II’s Blackboard

Sentence is “Are any by Feigenbaum and Feldman?”

One Neural Network Approach
• One approach is to build a separate network for every word known in the system
  – For a small vocabulary isolated word system, this approach may work
    • no need to worry about finding the separation between words or the effect that a word ending might have on the next word beginning

Notice that NNs have fixed-size inputs

An input here will be the processed speech signal in the form of LPCs

Continuous Speech
• The preceding solution does not work for continuous speech
  – Since there is no easy way to determine where one word ends and the next begins, we cannot just rely on word models
  – Instead, we need phonetic models
  – The problem here is that the sound of a phoneme is influenced by the preceding and succeeding sounds
    • a neural network only learns a “snapshot” of data, and what we need is context dependence
• One solution is to use a recurrent NN, which remembers the output of the previous input to provide context or memory
  – Note that the RNN is much more difficult to train, but can solve the speech problem more effectively than the normal NN

Another Solution: Multiple Nets

Here, there are multiple neural networks for the various levels

The segmentation module is responsible for dividing the continuous signal into segments

The unit level generates phonetic units

The word module combines possible phonetic units into words

HMM Approach
• The HMM approach views speech recognition as finding a path through a graph of connected phonetic and/or grammar models, each of which is an HMM
  – In this case, the speech signal is the observable, and the unit of speech uttered is the hidden state
  – Typically, several frames of the speech signal are mapped into a codebook (a fancy name for selecting one of a set of acoustic classifications)
  – Separate HMMs are developed for every phonetic unit
    • For instance, we might have a /d/ HMM and an /i/ HMM, etc
    • There may be multiple paths through a single HMM to allow for differences in duration caused by co-articulation and other effects

Here, the speech problem is one of working through several layers of HMMs: at the lowest level are the phonetic HMMs, which combine to make up word HMMs, which combine to make up grammar HMMs

Discrete Speech Using HMMs
• Here, digits spoken one at a time are recognized
• The HMM word model for a digit consists of 5 states, any of which can repeat
  – before being used, each model is trained to adjust its transition probabilities to the speaker
• The process is simple: given the LPC values, work through each word model using Viterbi and select the one which yields the greatest probability
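
The selection step can be sketched as below. Everything concrete here is an assumption made for illustration: the models have 3 states rather than 5, the observations are codebook indices 0 to 2 instead of real LPC-derived symbols, the probabilities are invented rather than trained, and the two word names are placeholders.

```python
import math

NEG_INF = float("-inf")


def log(p):
    """Natural log that maps probability 0 to -infinity instead of failing."""
    return math.log(p) if p > 0 else NEG_INF


def viterbi_score(obs, start, trans, emit):
    """Log-probability of the single best state path for the observations."""
    n = len(start)
    best = [log(start[s]) + log(emit[s][obs[0]]) for s in range(n)]
    for symbol in obs[1:]:
        best = [max(best[p] + log(trans[p][s]) for p in range(n))
                + log(emit[s][symbol])
                for s in range(n)]
    return max(best)


def left_to_right(n_states, stay, emit):
    """Word model whose states either repeat (probability stay) or advance."""
    trans = [[0.0] * n_states for _ in range(n_states)]
    for s in range(n_states):
        if s + 1 < n_states:
            trans[s][s] = stay
            trans[s][s + 1] = 1.0 - stay
        else:
            trans[s][s] = 1.0          # final state can only repeat
    start = [1.0] + [0.0] * (n_states - 1)
    return start, trans, emit


# Hypothetical word models: emit[state][codebook_symbol].
MODELS = {
    "one": left_to_right(3, 0.5, [[0.8, 0.1, 0.1],
                                  [0.1, 0.8, 0.1],
                                  [0.1, 0.1, 0.8]]),
    "two": left_to_right(3, 0.5, [[0.1, 0.1, 0.8],
                                  [0.1, 0.8, 0.1],
                                  [0.8, 0.1, 0.1]]),
}


def recognize(obs):
    """Score the observations against every word model and keep the best."""
    return max(MODELS, key=lambda word: viterbi_score(obs, *MODELS[word]))
```

With these toy models, `recognize([0, 0, 1, 2, 2])` selects `"one"` and `recognize([2, 2, 1, 0])` selects `"two"`; repeated symbols are absorbed by the self-loops, which is how the models tolerate differences in duration.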

Continuous Speech with HMM
• Many simplifications made for discrete speech do not work for continuous speech
  – HMMs will have to model smaller units, possibly phonemes
  – To reduce the search space, use a beam search
  – To ease the word-to-word transitions, use bigrams or unigrams

A trained Phoneme HMM for /d/

The process is similar to the previous slide where all phoneme HMMs are searched using Viterbi, but here, transition probabilities are included along with more codebooks to handle the phoneme-to-phoneme transitions and word-to-word transitions

A successful path through the phoneme HMMs to match the word “seeks”

HMM Codebook
• HMMs need to have an observable state
• For the speech problem, the observable is a speech signal – this needs to be converted to a state
• The LPC coefficients are mapped to a codebook using a nearest-neighbor type of search
• More codebooks mean larger HMMs (more observable states to model), but the nearest-neighbor matching will be more accurate
• HMM systems will commonly use at least 256 codebooks
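
The nearest-neighbor mapping can be sketched as below, assuming Euclidean distance and treating each codebook entry as one acoustic class. The 2-dimensional vectors and the 4-entry codebook are made up for readability; real LPC vectors have more coefficients and, as noted above, real systems use 256 or more entries.

```python
import math


def quantize(vector, codebook):
    """Index of the codebook entry closest to the feature vector."""
    return min(range(len(codebook)),
               key=lambda i: math.dist(vector, codebook[i]))


# Hypothetical codebook and frame features.
CODEBOOK = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
frame_features = [(0.1, 0.2), (0.9, 0.1), (0.8, 0.9)]

# The HMM observes this sequence of symbols, not the raw vectors.
symbols = [quantize(v, CODEBOOK) for v in frame_features]
```

Adding entries shrinks the distance between a vector and its nearest entry, at the cost of more observable symbols for each HMM state to model.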

HMM/NN Hybrid
• The strength of the NN is in its low level recognition ability
• The strength of the HMM is in its matching ability of the LPC values to a codebook and selecting the right phoneme
• Why not combine them?

Here, a neural network is trained and used to determine the classification of the frames rather than a matching codebook – thus, the system can learn to better match the acoustic information to a phonetic classification

Phonetic classifications are gathered together into an array and mapped to the HMMs

Outstanding Problems
• Most current solutions are stochastic or neural network and therefore exclude potentially useful symbolic knowledge which might otherwise aid during the recognition problem (e.g., semantics, discourse, pragmatics)
• Speech segmentation – dividing the continuous speech into individual words
• Selection of the proper phonetic units – we have seen phonemes and words, but also common are diphones, demisyllables and other possibilities – speech science still has not determined which type of unit is the proper one to model for speech recognition
• Handling intonation, stress, dialect, accent, etc
• Dealing with very large vocabularies (currently, speech systems recognize no more than a few thousand words, not an entire language)
• Accuracy is still too low for reliability (95-98% is common)

NLP
• We mainly look at NLU here
  – How can a machine “understand” natural language?
  – As we know, machines don’t understand, so the goal is to transform a natural language input (speech or text) into an internal representation
    • the system may be one that takes that representation and selects an action, or merely stores it
    • for instance, if the NLU system is the front end of a database, then the goal is to form a DB query, submit it, and respond to the user with the DB results, or if it is the front end of an OS, the goal is to generate an OS command
• Research began in the 1940s and virtually died in the early 1950s until progress was made in the field of linguistics in the 1960s
  – Since then, a number of approaches have been tried:
    • Symbolic
    • Stochastic/probabilistic
    • Neural network

NLP
NLU on the left – input text (or speech) and come to a meaning by syntactic parsing followed by semantic analysis

NLG on the right – a planning problem – how to create a sentence to convey a given idea?

NLU Processes
• Morphological analysis
• Syntactic parsing
  – identifying grammatical categories
    • top down parser
    • bottom up parser
• Semantic parsing
  – Identifying meaning
    • template matching
    • alternatives are semantic grammars and using other forms of representations
    • ambiguity handled by some form of word sense disambiguation
• Discourse analysis
  – Combining word meanings for a full sentence
    • handling references
    • applying world knowledge – causality, expectations, inferences
• Pragmatic analysis
  – Speech acts, cognitive states, and beliefs, illocutionary acts

Parsing

Recursive Transition Network parser

Augmented Transition Network parser

An NLU Problem: Ambiguity
• At the grammatical level
  – words can take on multiple grammatical roles
    • “Our company is training workers” has 3 syntactic parses
    • “List the sales of the products produced in 1973 with the products produced in 1972” has 455 parses!
• At the semantic level
  – words can take on multiple meanings even within one grammatical category
    • consider the sentence “I made her duck”
• At the pragmatic and discourse levels
  – what is the meaning of “it sure is cold in here” – is this just a statement of discomfort, or a request to make it warmer?
  – identifying the relationship for references: “The chef made the boy some stew. He ate it and thanked him” – who do the he and him refer to? What does it refer to?
  – Some real US headlines with multiple interpretations:

Fun Headlines
• Hospitals are Sued by 7 Foot Doctors
• Astronaut Takes Blame for Gas in Spacecraft
• New Study of Obesity Looks for Larger Test Group
• Chef Throws His Heart into Helping Feed Needy
• Include your Children when Baking Cookies

Stochastic Solutions Offer Promise
• HMMs & Bayesian probabilities are often used – how?
  – in either case, build a corpus and derive from it the probability that a given word, or a given word plus a transition to another word, will have a specific meaning
  – this requires obtaining statistics on word frequencies, collocation frequencies (phrases), and interpretation frequencies
• Problem: Zipf’s law
  – many words appear infrequently and therefore will have low probabilities in stochastic models
  – rather than using the word count to determine the probability of a given word, rank the words by frequency of occurrence and then use the position in the list to compute a probability
    • frequency ∝ 1 / rank
• In addition, we need to filter collocations to remove common phrases like “and so”, “one of the”, etc, to obtain more reasonable frequency rankings
  – otherwise, collocation frequencies will be skewed toward very common phrases
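
The rank-based estimate can be sketched as below. The slide only gives the proportionality frequency ∝ 1 / rank; dividing by the sum of the weights so that the values add up to 1 is an assumption added here, and the token list is a toy corpus.

```python
from collections import Counter


def rank_probabilities(tokens):
    """Assign each word a probability proportional to 1 / (frequency rank)."""
    counts = Counter(tokens)
    ranked = [word for word, _ in counts.most_common()]   # most frequent first
    weights = {word: 1.0 / rank for rank, word in enumerate(ranked, start=1)}
    total = sum(weights.values())                         # assumed normalization
    return {word: weight / total for word, weight in weights.items()}


# Hypothetical corpus: "the" has rank 1, "cat" rank 2, "dog" rank 3.
probs = rank_probabilities("the cat the dog the cat".split())
```

Only the ordering of the counts matters in this scheme, so a word seen once still receives a usable probability rather than a vanishing one.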

Other Solutions
• Symbolic solutions include:
  – Logic-based models for parsing and context-free grammars to generate parsers
    • finite state automata such as RTNs and ATNs
  – Parsing by dynamic programming
  – Knowledge representation approaches and case grammars (word models) for syntactic and semantic parsing
  – Ad hoc knowledge-based approaches
• Neural network solutions include:
  – Parsing (combining recurrent networks and self-organizing maps)
  – Parsing relative clauses using recurrent networks
  – Case role assignment
  – Word sense disambiguation

NLG, Machine Translation
• NLG: given a concept to relate, translate it into a legal statement
  – Like NLU, a mapping process, but this time in reverse
    • much more straightforward than NLU because ambiguity is not present
    • but there are many ways to say something; a good NLG will know its audience and select the proper words through register (audience context)
    • a sophisticated NLG will use reference and possibly even parts of speech
• Machine Translation:
  – This is perhaps the hardest problem in NLP because it must combine NLU and NLG
  – Simple word-to-word translation is insufficient
  – Meaning, references, idioms, etc must all be taken care of
  – Current MT systems are highly inaccurate

Application Areas
• MS Word – spell checker/corrector, grammar checker, thesaurus
• WordNet
• Search engines (more generically, information retrieval including library searches)
• Database front ends
• Question-answering systems within restricted domains
• Automated documentation generation
• News categorization/summation
• Information extraction
• Machine translation
  – for instance, web page translation
• Language composition assistants – help non-native speakers with the language
• On-line dictionaries

Search Engine Technology
• Search engines generally comprise three components
  – Web crawler
    • simple, non-AI, traverser of web pages/sites
    • given a web page, accumulate all URLs, add them to a queue or stack
    • retrieve next page given the URL from the queue (breadth-first) or stack (depth-first/recursive)
    • convert material to text format when possible
  – Summary extractor
    • take web page and summarize its content (possibly just create a bag of words, possibly attempt some form of classification) – store summary, classification and URL in DB
      – note: some engines only save summaries, others store summaries and the entire web page, but only use summaries during the search process
    • create index of terms to web pages (possibly a hash table)
  – Search engine portal
    • accept query
    • find all related items in the DB via hashing
    • sort using some form of rating scheme
    • display URLs, titles and possibly brief summaries
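
The three components can be sketched in miniature as below. Nothing here touches the network: `WEB` is a hypothetical in-memory stand-in for real pages, the "summary" is just a bag of words, the index is a plain dictionary (a hash table), and the rating scheme is simply the number of query terms a page contains.

```python
from collections import deque

# Hypothetical "web": URL -> (page text, outgoing links).
WEB = {
    "a.example": ("speech recognition with hidden markov models",
                  ["b.example", "c.example"]),
    "b.example": ("neural networks for speech", ["a.example"]),
    "c.example": ("search engine page rank", ["b.example", "d.example"]),
    "d.example": ("page rank and web search", []),
}


def crawl(seed):
    """Breadth-first crawler: URLs wait in a queue, each page is visited once."""
    queue, seen, order = deque([seed]), {seed}, []
    while queue:
        url = queue.popleft()
        order.append(url)
        for link in WEB[url][1]:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return order


def build_index(urls):
    """Summary extractor: map each term to the URLs whose text contains it."""
    index = {}
    for url in urls:
        for term in set(WEB[url][0].split()):
            index.setdefault(term, set()).add(url)
    return index


def search(index, query):
    """Portal: look up each query term, then sort pages by terms matched."""
    scores = {}
    for term in query.split():
        for url in index.get(term, ()):
            scores[url] = scores.get(url, 0) + 1
    return sorted(scores, key=lambda url: (-scores[url], url))


pages = crawl("a.example")
index = build_index(pages)
results = search(index, "page rank search")
```

Replacing `popleft()` with `pop()` turns the queue into a stack and the traversal into depth-first, matching the alternative mentioned above.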

Page Categorization/Summaries
• The tricky part of the search engine is to properly categorize or summarize a web page
  – Information retrieval techniques are common
    • Keywords from a bag of words
    • Statistical analysis to gauge similarities between pages
    • Link information such as page rank, hits, hubs, etc
  – Filtering
    • Many web pages (e.g., stores) try to take advantage of the syntactic nature of search engines and place meta tags in their pages that contain all English words
    • Filtering is useful in eliminating pages that attempt such tricks
  – Sorting
    • Once web pages have been found that match the given query, how are they sorted?
      – using word count, giving extra credit if any of the words are found in the page’s title or the link text, examining font size and style for importance of the words in the document, etc

Page Ranking
• Based on the idea of academic citation to determine something’s importance
  – PR(A) = (1 – d) + d * (PR(T1)/C(T1) + … + PR(Tn)/C(Tn))
  – PR(A) – page rank of page A
  – d – a “damping factor” between 0 and 1 (usually set to .85)
  – C(A) – number of links leaving page A
  – T1..Tn are the n pages that point at A
• The page rank corresponds to the principal eigenvector of a normalized “link matrix” (that is, a matrix of pages and their links)
• One way to view page rank is to think of an average web surfer who randomly walks around pages by clicking on links (and never clicking the back button)
  – the page rank is in essence the probability that this page will be reached randomly
  – the damping factor d is the likelihood that the surfer keeps following links from the current page; with probability 1 – d the surfer gets bored and requests another random page (rather than following a specific link of interest)
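
The formula can be evaluated by repeated substitution, as in the sketch below. The three-page link graph, the starting rank of 1.0 and the fixed iteration count are assumptions chosen for illustration, and every page is assumed to have at least one outgoing link.

```python
def page_rank(links, d=0.85, iterations=50):
    """Iterate PR(A) = (1 - d) + d * sum(PR(T) / C(T)) over pages T linking to A."""
    ranks = {page: 1.0 for page in links}
    for _ in range(iterations):
        new_ranks = {}
        for page in links:
            # Each page T that points at `page` passes on PR(T) / C(T).
            incoming = sum(ranks[t] / len(links[t])
                           for t in links if page in links[t])
            new_ranks[page] = (1 - d) + d * incoming
        ranks = new_ranks
    return ranks


# Hypothetical link graph: page -> pages it links to.
LINKS = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = page_rank(LINKS)
```

In this graph C collects links from both A and B and ends with the highest rank, A inherits most of C's rank through C's single link, and B receives only half of A's.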

Google’s Architecture
• There are numerous distributed crawlers working all the time
• Web pages are compressed before being stored
• Each page has a unique document ID provided by the store server
• The indexer uncompresses files and parses them into word occurrences including the position in the document of the given word
• These word occurrences are stored in “barrels” to create a forward index of document-to-word mappings (using ISAM)
• The Sorter resorts the barrel information by word to create an inverted index
• The URL resolver converts relative URLs into absolute URLs

An Architecture for Improved Search
• Search engines are limited in that they do not take advantage of personalized knowledge
  – This is of course understandable given that search engines have to support millions of users
  – Imagine that you had your own search engine DB; could you provide user-specific knowledge to improve search?

Three different techniques were tried using the architecture to the left

A search engine provides a number (millions?) of URLs, which are downloaded and kept locally

User personalized knowledge is applied to filter the items found locally in order to provide only those pages most likely to be of use

User Profiles

• The first approach merely created a bag of words annotated by their frequency

• Both the words and the frequency were derived by accumulating text in user stored files

• Word counts were summed based on which words appeared in the document – this score was used to order the retrieved pages

Partial user profile accumulated from text files:

debugging 1100      dec 1785            degree 4138
def 4938            default 1349        define 2752
defined 1140        department 2587     dept 1328
description 2691    design 1780         deskset 6472
development 1403    different 2336      digital 1424
directory 2517      disk 1907           distributed 5127

Word and frequency listed
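
The scoring rule can be sketched as below, using a few of the profile entries listed above. The page texts and URLs are hypothetical, and counting each matching word once per page (rather than once per occurrence) is an assumption about what "summed based on which words appeared" means.

```python
# A few entries from the partial user profile: word -> frequency.
PROFILE = {"design": 1780, "deskset": 6472, "digital": 1424,
           "disk": 1907, "distributed": 5127}


def profile_score(text, profile):
    """Sum the profile frequency of every distinct profile word in the text."""
    return sum(profile.get(word, 0) for word in set(text.lower().split()))


def order_pages(pages, profile):
    """Order retrieved pages from highest to lowest profile score."""
    return sorted(pages,
                  key=lambda url: profile_score(pages[url], profile),
                  reverse=True)


# Hypothetical retrieved pages: URL -> page text.
PAGES = {
    "p1.example": "Distributed disk design",
    "p2.example": "digital camera reviews",
    "p3.example": "cooking recipes",
}
ordered = order_pages(PAGES, PROFILE)
```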

User-Defined Knowledge Base

Concept: Expert Systems (threshold 6)
  – Keywords (1): Knowledge-base, Lisp, Rule, Shell, Inference, …
  – Phrases (3): “Knowledge based Systems”, “Inference Engine”, “Knowledge Engineer”, …
  – Groups or Authors (2): InferenceEngine, KnowledgeBase

Concept: Intelligence (threshold 5)
  – Keywords (1): Intelligence, Knowledge, IQ, Skill, …
  – Phrases (3): “Gifted”, “IQ Score”, “IQ Test”, …
  – Groups or Authors (2): Binet, Wechsler, …

An alternative approach is to allow the user to define his/her own knowledge-base of topics, people, phrases and keywords that are of importance

Classification is done through a user-defined hierarchy where each concept has its own matching knowledge

Two concepts are shown in the table; threshold means the number of matches required for the category to be established
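
Threshold matching against such a knowledge base can be sketched as below. Two points are assumptions rather than something the slides state: the numbers (1), (3) and (2) are read as per-match weights for keywords, phrases and authors, and matching is done by plain case-insensitive substring search. The entries are abridged from the two concepts in the table.

```python
# Assumed meaning of the (1), (3), (2) annotations: weight per match.
WEIGHTS = {"keywords": 1, "phrases": 3, "authors": 2}

CONCEPTS = {
    "Expert Systems": {
        "threshold": 6,
        "keywords": ["knowledge-base", "lisp", "rule", "shell", "inference"],
        "phrases": ["knowledge based systems", "inference engine",
                    "knowledge engineer"],
    },
    "Intelligence": {
        "threshold": 5,
        "keywords": ["intelligence", "knowledge", "iq", "skill"],
        "phrases": ["gifted", "iq score", "iq test"],
        "authors": ["binet", "wechsler"],
    },
}


def concept_score(text, concept):
    """Weighted count of the concept's terms that occur in the text."""
    text = text.lower()
    return sum(WEIGHTS[kind]
               for kind in WEIGHTS
               for term in concept.get(kind, [])
               if term in text)


def classify(text):
    """Concepts whose score reaches their user-defined threshold."""
    return [name for name, concept in CONCEPTS.items()
            if concept_score(text, concept) >= concept["threshold"]]
```

For example, `classify("The inference engine of a rule based shell written in Lisp")` establishes only Expert Systems (four keywords plus one phrase gives 7, against a threshold of 6). Substring search is crude, since a short keyword such as "iq" also matches inside longer words; a tokenizer would be the obvious refinement.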

Learning-Driven Approach

• Learning what bag of words best represents a category by having viewers vote on which web pages are relevant for a given query and which are not
• Voting causes weights of the words in the bag to be altered
• Implementations have included linear matrix forms, perceptrons and neural networks
  – this approach can also be used for recommendation systems
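
Of the implementations mentioned, the perceptron is the simplest to sketch. The version below is a generic perceptron update, not the specific system described in the slides: the votes, the word sets, the learning rate and the zero decision threshold are all assumptions.

```python
def score(weights, page_words):
    """Sum of the learned weights of the words that appear on the page."""
    return sum(weights.get(word, 0.0) for word in page_words)


def is_relevant(weights, page_words):
    """Predict relevance: a positive score means relevant."""
    return score(weights, page_words) > 0


def vote(weights, page_words, relevant, rate=1.0):
    """Adjust word weights only when the prediction disagrees with the vote."""
    if is_relevant(weights, page_words) != relevant:
        step = rate if relevant else -rate
        for word in page_words:
            weights[word] = weights.get(word, 0.0) + step
    return weights


# Hypothetical votes for the query "python": (words on page, voted relevant?).
VOTES = [
    ({"python", "snake", "reptile"}, False),
    ({"python", "code", "tutorial"}, True),
]

weights = {}
for _ in range(5):                      # a few passes over the votes
    for page_words, relevant in VOTES:
        vote(weights, page_words, relevant)
```

After training, words that appear only on pages voted relevant carry positive weight and words from rejected pages carry negative weight, so an unseen page containing "python" and "code" scores as relevant while one containing "snake" does not.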
