For Wednesday
• Read chapter 22, sections 4-6
• Homework: Chapter 18, exercise 7
Program 4
• Any questions?
Model Neuron (Linear Threshold Unit)
• Neuron modelled by a unit j connected by weights w_ji to other units i.
• Net input to a unit is defined as:
    net_j = Σ_i w_ji · o_i
• Output of a unit is a threshold function on the net input:
    o_j = 1 if net_j > T_j
    o_j = 0 otherwise
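As a concrete illustration of the unit just defined, here is a minimal Python sketch; the function and variable names are ours, not from the slides.

```python
# A minimal sketch of the linear threshold unit defined above.
def ltu_output(weights, inputs, threshold):
    """Return 1 if the weighted sum of the inputs exceeds the threshold."""
    net = sum(w * o for w, o in zip(weights, inputs))
    return 1 if net > threshold else 0

# A unit computing two-input AND: fires only when both inputs are on.
print(ltu_output([1.0, 1.0], [1, 1], 1.5))  # 1
print(ltu_output([1.0, 1.0], [1, 0], 1.5))  # 0
```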
Multi-Layer Neural Networks
• Multi-layer networks can represent arbitrary functions, but building an effective learning method for such networks was thought to be difficult.
• Generally networks are composed of an input layer, hidden layer, and output layer, and activation feeds forward from input to output.
• Patterns of activation are presented at the inputs and the resulting activation of the outputs is computed.
• The values of the weights determine the function computed.
• A network with one hidden layer with a sufficient number of units can represent any boolean function (see the forward-pass sketch below).
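A hedged sketch of that feedforward computation, using hard-threshold units as on the previous slide; the weights and thresholds are one hand-picked illustration, here computing XOR, a boolean function no single unit can represent.

```python
# Feedforward activation: input -> hidden -> output, with threshold units.
def unit(weights, inputs, threshold):
    return 1 if sum(w * o for w, o in zip(weights, inputs)) > threshold else 0

def forward(x, hidden_layer, output_layer):
    """Each layer is a list of (weights, threshold) pairs; activation
    feeds forward from the inputs through the hidden units to the outputs."""
    h = [unit(w, x, t) for w, t in hidden_layer]
    return [unit(w, h, t) for w, t in output_layer]

# Two hidden units suffice for XOR:
hidden = [([1.0, 1.0], 0.5),    # A: fires if x OR y
          ([1.0, 1.0], 1.5)]    # B: fires if x AND y
output = [([1.0, -1.0], 0.5)]   # O: fires if A AND NOT B
for x in ([0, 0], [0, 1], [1, 0], [1, 1]):
    print(x, forward(x, hidden, output))    # [0], [1], [1], [0]
```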
Basic Problem
• General approach to the learning algorithm is to apply gradient descent.
• However, for the general case, we need to be able to differentiate the function computed by a unit and the standard threshold function is not differentiable at the threshold.
Differentiable Threshold Unit
• Need some sort of nonlinear output function to allow computation of arbitrary functions by multi-layer networks (a multi-layer network of linear units can still only represent a linear function).
• Solution: use a nonlinear, differentiable output function such as the sigmoid or logistic function:
    o_j = 1 / (1 + e^(−(net_j − T_j)))
• Can also use other functions such as tanh or a Gaussian.
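A minimal sketch of this output function in Python; the name sigmoid and the folded-in threshold argument are our choices.

```python
import math

# The logistic output function above: o = 1 / (1 + e^(-(net - T))).
def sigmoid(net, threshold=0.0):
    return 1.0 / (1.0 + math.exp(-(net - threshold)))

# A smooth, differentiable stand-in for the hard threshold:
print(sigmoid(-4.0))  # ~0.018, near 0
print(sigmoid(0.0))   # 0.5 exactly at the threshold
print(sigmoid(4.0))   # ~0.982, near 1
```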
Error Measure
• Since there are multiple continuous outputs, we can define an overall error measure:
    E(W) = 1/2 · Σ_{d∈D} Σ_{k∈K} (t_kd − o_kd)²
where D is the set of training examples, K is the set of output units, t_kd is the target output for the kth unit given input d, and o_kd is the network output for the kth unit given input d.
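A small sketch of this error measure over a toy data set; the variable names are illustrative.

```python
# E(W): half the summed squared error over examples d in D and outputs k in K.
def error(targets, outputs):
    """targets[d][k] and outputs[d][k] are the target and network output
    for output unit k on training example d."""
    return 0.5 * sum((t - o) ** 2
                     for t_d, o_d in zip(targets, outputs)
                     for t, o in zip(t_d, o_d))

print(error([[1, 0]], [[0.9, 0.2]]))  # 0.5 * (0.01 + 0.04) = 0.025
```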
Gradient Descent
• The derivative of the output of a sigmoid unit with respect to its net input is:
    ∂o_j / ∂net_j = o_j (1 − o_j)
• This can be used to derive a learning rule which performs gradient descent in weight space in an attempt to minimize the error function:
    Δw_ji = −η (∂E / ∂w_ji)
Backpropagation Learning Rule
• Each weight w_ji is changed by:
    Δw_ji = η δ_j o_i
    δ_j = o_j (1 − o_j) (t_j − o_j)        if j is an output unit
    δ_j = o_j (1 − o_j) Σ_k δ_k w_kj       otherwise
where η is a constant called the learning rate, t_j is the correct output for unit j, and δ_j is an error measure for unit j.
• First determine the error for the output units, then backpropagate this error layer by layer through the network, changing weights appropriately at each layer.
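A hedged sketch of one such update on a tiny 2-2-1 network, for a single training example; the weights, inputs, and η are illustrative, and unit thresholds are omitted for brevity.

```python
import math

def sigmoid(net):
    return 1.0 / (1.0 + math.exp(-net))

eta = 0.5                                # learning rate
x = [1.0, 0.0]                           # input activations o_i
w_hidden = [[0.1, -0.2], [0.3, 0.4]]     # w_ji: hidden unit j <- input i
w_out = [0.5, -0.5]                      # w_kj: output unit <- hidden unit j
t = 1.0                                  # correct output for this example

# Forward pass: compute hidden and output activations.
h = [sigmoid(sum(w * xi for w, xi in zip(ws, x))) for ws in w_hidden]
o = sigmoid(sum(w * hj for w, hj in zip(w_out, h)))

# Output unit: delta = o(1 - o)(t - o).
delta_o = o * (1 - o) * (t - o)

# Hidden units: delta_j = o_j(1 - o_j) * sum_k delta_k w_kj (one output here).
delta_h = [hj * (1 - hj) * delta_o * w_out[j] for j, hj in enumerate(h)]

# Weight updates: delta_w = eta * delta_j * o_i, deltas computed before updating.
w_out = [w + eta * delta_o * hj for w, hj in zip(w_out, h)]
w_hidden = [[w + eta * delta_h[j] * xi for w, xi in zip(ws, x)]
            for j, ws in enumerate(w_hidden)]
print(o, w_out)   # output before the update, and the adjusted output weights
```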
Backpropagation Learning Algorithm
• Create a three-layer network with N hidden units, fully connect input units to hidden units and hidden units to output units, and initialize with small random weights.
• Until all examples produce the correct output within ε or the mean squared error ceases to decrease (or other termination criteria):
Begin epoch
For each example in training set do:
Compute the network output for this example.
Compute the error between this output and the correct output.
Backpropagate this error and adjust weights to decrease this error.
End epoch
• Since continuous outputs only approach 0 or 1 in the limit, we must allow outputs within some ε of the target to count as learning a binary function.
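A self-contained sketch of this epoch loop, shrunk to a single sigmoid unit learning OR so it stays short; the structure (epoch loop, per-example update, ε-based termination) mirrors the algorithm above, and even this toy run takes many epochs, consistent with the next slide.

```python
import math, random

# Epoch loop: train one sigmoid unit on OR; data, eta, epsilon are illustrative.
random.seed(0)
data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
w = [random.uniform(-0.1, 0.1) for _ in range(2)]
bias = random.uniform(-0.1, 0.1)
eta, epsilon = 0.5, 0.1

epochs = 0
while True:
    sq_errors = []
    for x, t in data:                                # one epoch
        net = sum(wi * xi for wi, xi in zip(w, x)) + bias
        o = 1 / (1 + math.exp(-net))                 # network output
        delta = o * (1 - o) * (t - o)                # output-unit delta
        w = [wi + eta * delta * xi for wi, xi in zip(w, x)]
        bias += eta * delta
        sq_errors.append((t - o) ** 2)
    epochs += 1
    if all(e < epsilon ** 2 for e in sq_errors):     # all within epsilon
        break
print(epochs, "epochs")   # typically thousands, as the next slide warns
```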
Comments on Training
• There is no guarantee of convergence; the network may oscillate or get stuck in a local minimum.
• However, in practice many large networks can be adequately trained on large amounts of data for realistic problems.
• Many epochs (thousands) may be needed for adequate training; large data sets may require hours or days of CPU time.
• Termination criteria can be:
  – Fixed number of epochs
  – Threshold on training set error
Representational Power
• Multi-layer sigmoidal networks are very expressive.
• Boolean functions: any Boolean function can be represented by a two-layer network by simulating a two-layer AND-OR network, but the number of required hidden units can grow exponentially in the number of inputs.
• Continuous functions: any bounded continuous function can be approximated with arbitrarily small error by a two-layer network. Sigmoid functions provide a set of basis functions from which arbitrary functions can be composed, just as any function can be represented by a sum of sine waves in Fourier analysis.
• Arbitrary functions: any function can be approximated to arbitrary accuracy by a three-layer network.
Sample Learned XOR Network
• Hidden unit A represents ¬(X ∧ Y)
• Hidden unit B represents ¬(X ∨ Y)
• Output O represents A ∧ ¬B = ¬(X ∧ Y) ∧ (X ∨ Y) = X XOR Y
[Figure: the learned network, with inputs X and Y feeding hidden units A and B and output O; the learned weights include 3.11, 6.96, −7.38, −5.24, −2.03, −5.57, −3.6, −3.58, −5.74]
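A quick Python check that this logical reading really is XOR; it verifies the boolean decomposition only, not the learned weights themselves.

```python
# A = NOT(X AND Y), B = NOT(X OR Y), O = A AND NOT B gives exactly XOR.
for X in (False, True):
    for Y in (False, True):
        A = not (X and Y)          # hidden unit A
        B = not (X or Y)           # hidden unit B
        O = A and not B            # output unit
        assert O == (X != Y)       # O computes X XOR Y
print("O = A AND NOT B computes XOR")
```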
Hidden Unit Representations
• Trained hidden units can be seen as newly constructed features that re-represent the examples so that they are linearly separable.
• On many real problems, hidden units can end up representing interesting recognizable features such as vowel detectors, edge detectors, etc.
• However, particularly with many hidden units, they become more “distributed” and are hard to interpret.
Input/Output Coding
• Appropriate coding of inputs and outputs can make the learning problem easier and improve generalization.
• It is best to encode each binary feature as a separate input unit and, for multi-valued features, to include one binary unit per value, rather than trying to encode input information in fewer units using binary coding or continuous values.
I/O Coding cont.
• Continuous inputs can be handled by a single input by scaling them between 0 and 1.
• For disjoint categorization problems, best to have one output unit per category rather than encoding n categories into log n bits. Continuous output values then represent certainty in various categories. Assign test cases to the category with the highest output.
• Continuous outputs (regression) can also be handled by scaling between 0 and 1.
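A short sketch of this coding advice (one input unit per feature value, one output unit per category with argmax at test time); the feature values and network outputs shown are invented for illustration.

```python
COLORS = ["red", "green", "blue"]                # a multi-valued feature

def one_hot(value, values):
    """One binary input unit per feature value."""
    return [1.0 if v == value else 0.0 for v in values]

print(one_hot("green", COLORS))                  # [0.0, 1.0, 0.0]

# One output unit per category: assign the test case to the category
# whose unit has the highest (most certain) output.
outputs = [0.12, 0.81, 0.35]                     # illustrative network outputs
best = max(range(len(outputs)), key=lambda k: outputs[k])
print(COLORS[best])                              # green
```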
Neural Net Conclusions
• Learned concepts can be represented by networks of linear threshold units and trained using gradient descent.
• Analogy to the brain and numerous successful applications have generated significant interest.
• Generally much slower to train than other learning methods, but explores a rich hypothesis space that seems to work well in many domains.
• Potential to model biological and cognitive phenomena and increase our understanding of real neural systems.
  – Backprop itself is not very biologically plausible
Natural Language Processing
• What’s the goal?
Communication
• Communication for the speaker:
  – Intention: Deciding why, when, and what information should be transmitted. May require planning and reasoning about agents' goals and beliefs.
  – Generation: Translating the information to be communicated into a string of words.
  – Synthesis: Output of the string in the desired modality, e.g. text on a screen or speech.
Communication (cont.)
• Communication for the hearer:
  – Perception: Mapping the input modality to a string of words, e.g. optical character recognition or speech recognition.
  – Analysis: Determining the information content of the string.
    • Syntactic interpretation (parsing): find the correct parse tree showing the phrase structure.
    • Semantic interpretation: extract the (literal) meaning of the string in some representation, e.g. FOPC.
    • Pragmatic interpretation: consider the effect of the overall context on the meaning of the sentence.
  – Incorporation: Decide whether or not to believe the content of the string and add it to the KB.
Ambiguity
• Natural language sentences are highly ambiguous and must be disambiguated:
  I saw the man on the hill with the telescope.
I saw the Grand Canyon flying to LA.
I saw a jet flying to LA.
Time flies like an arrow.
Horse flies like a sugar cube.
Time runners like a coach.
Time cars like a Porsche.
Syntax
• Syntax concerns the proper ordering of words and its effect on meaning.
The dog bit the boy.
The boy bit the dog.
* Bit boy the dog the
Colorless green ideas sleep furiously.
Semantics
• Semantics concerns the meaning of words, phrases, and sentences. Generally restricted to “literal meaning”:
  – “plant” as a photosynthetic organism
  – “plant” as a manufacturing facility
  – “plant” as the act of sowing
Pragmatics
• Pragmatics concerns the overall communicative and social context and its effect on interpretation.
  – Can you pass the salt?
  – Passerby: Does your dog bite?
    Clouseau: No.
    Passerby: (pets dog) Chomp! I thought you said your dog didn't bite!!
    Clouseau: That, sir, is not my dog!
Modular Processing
sound waves → speech recognition (acoustic/phonetic) → words → parsing (syntax) → parse trees → semantics → literal meaning → pragmatics → meaning
Examples
• Phonetics:
  “grey twine” vs. “great wine”
  “youth in Asia” vs. “euthanasia”
  “yawanna” → “do you want to”
• Syntax:
  I ate spaghetti with a fork.
  I ate spaghetti with meatballs.
More Examples
• Semantics:
  I put the plant in the window.
  Ford put the plant in Mexico.
  The dog is in the pen.
  The ink is in the pen.
• Pragmatics:
  The ham sandwich wants another beer.
  John thinks vanilla.
Formal Grammars
• A grammar is a set of production rules which generates a set of strings (a language) by rewriting the top symbol S.
• Nonterminal symbols are intermediate results that are not contained in strings of the language.
    S → NP VP
    NP → Det N
    VP → V NP
• Terminal symbols are the final symbols (words) that compose the strings in the language.
• Production rules for generating words from part of speech categories constitute the lexicon.
• N → boy
• V → eat
Context-Free Grammars
• A context-free grammar only has productions with a single symbol on the left-hand side.
• CFG:
    S → NP VP
    NP → Det N
    VP → V NP
• not CFG:
    A B → C
    B C → F G
Simplified English Grammar
S → NP VP        S → VP
NP → Det Adj* N  NP → ProN   NP → Name
VP → V           VP → V NP   VP → VP PP
PP → Prep NP
Adj* → ε         Adj* → Adj Adj*

Lexicon:
ProN → I | you | he | she
Name → John | Mary
Adj → big | little | blue | red
Det → the | a | an
N → man | telescope | hill | saw
Prep → with | for | of | in
V → hit | took | saw | likes
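One natural way to hold this grammar in code is as plain data; a hedged sketch follows (the dictionary layout is our choice, with the empty list standing for ε).

```python
# The grammar and lexicon above as plain Python data: each left-hand
# side maps to its alternative right-hand sides.
GRAMMAR = {
    "S":    [["NP", "VP"], ["VP"]],
    "NP":   [["Det", "Adj*", "N"], ["ProN"], ["Name"]],
    "VP":   [["V"], ["V", "NP"], ["VP", "PP"]],
    "PP":   [["Prep", "NP"]],
    "Adj*": [[], ["Adj", "Adj*"]],      # [] plays the role of epsilon
}
LEXICON = {
    "ProN": ["I", "you", "he", "she"],
    "Name": ["John", "Mary"],
    "Adj":  ["big", "little", "blue", "red"],
    "Det":  ["the", "a", "an"],
    "N":    ["man", "telescope", "hill", "saw"],
    "Prep": ["with", "for", "of", "in"],
    "V":    ["hit", "took", "saw", "likes"],
}
```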
Parse Trees
• A parse tree shows the derivation of a sentence in the language from the start symbol to the terminal symbols.
• If a given sentence has more than one possible derivation (parse tree), it is said to be syntactically ambiguous.
Syntactic Parsing
• Given a string of words, determine if it is grammatical, i.e. if it can be derived from a particular grammar.
• The derivation itself may also be of interest.
• Normally want to determine all possible parse trees and then use semantics and pragmatics to eliminate spurious parses and build a semantic representation.
Parsing Complexity
• Problem: Many sentences have many parses.
• An English sentence with n prepositional phrases at the end has at least 2^n parses.
I saw the man on the hill with a telescope on Tuesday in Austin...
• The actual number of parses is given by the Catalan numbers: 1, 2, 5, 14, 42, 132, 429, 1430, 4862, 16796...
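For reference, the Catalan numbers quoted above can be computed from the closed form C(n) = (2n choose n) / (n + 1).

```python
from math import comb

# Catalan numbers via the closed form; the division is always exact.
def catalan(n):
    return comb(2 * n, n) // (n + 1)

print([catalan(n) for n in range(1, 11)])
# [1, 2, 5, 14, 42, 132, 429, 1430, 4862, 16796]
```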
Parsing Algorithms
• Top-down: search the space of possible derivations of S (e.g. depth-first) for one that matches the input sentence.
  I saw the man.
  S → NP VP
    NP → Det Adj* N
      Det → the    Det → a    Det → an    (no match)
    NP → ProN
      ProN → I
    VP → V NP
      V → hit    V → took    V → saw
      NP → Det Adj* N
        Det → the
        Adj* → ε
        N → man
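A hedged sketch of naive top-down, depth-first parsing in Python. The grammar and lexicon are cut down from the slide, and the left-recursive rule VP → VP PP is deliberately omitted, since naive top-down search loops forever on left recursion; the function names are ours.

```python
TD_GRAMMAR = {
    "S":    [["NP", "VP"], ["VP"]],
    "NP":   [["Det", "Adj*", "N"], ["ProN"]],
    "VP":   [["V"], ["V", "NP"]],          # VP -> VP PP omitted (left-recursive)
    "Adj*": [[], ["Adj", "Adj*"]],
}
TD_LEXICON = {
    "ProN": ["I"], "Det": ["the", "a", "an"], "Adj": ["big", "little"],
    "N": ["man", "saw"], "V": ["hit", "took", "saw"],
}

def td_parse(goals, words):
    """Yield the number of words consumed for each way the goal symbols
    can derive a prefix of words (depth-first over the alternatives)."""
    if not goals:
        yield 0
        return
    first, rest = goals[0], goals[1:]
    if first in TD_LEXICON:                 # part of speech: match one word
        if words and words[0] in TD_LEXICON[first]:
            for n in td_parse(rest, words[1:]):
                yield 1 + n
    else:                                   # nonterminal: try each expansion
        for rhs in TD_GRAMMAR[first]:
            yield from td_parse(rhs + rest, words)

words = "I saw the man".split()
print(any(n == len(words) for n in td_parse(["S"], words)))  # True: grammatical
```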
Parsing Algorithms (cont.)
• Bottom-up: search upward from the words, finding larger and larger phrases until a sentence is found.
  I saw the man.
  ProN saw the man       (ProN → I)
  NP saw the man         (NP → ProN)
  NP N the man           (N → saw; dead end)
  NP V the man           (V → saw)
  NP V Det man           (Det → the)
  NP V Det Adj* man      (Adj* → ε)
  NP V Det Adj* N        (N → man)
  NP V NP                (NP → Det Adj* N)
  NP VP                  (VP → V NP)
  S                      (S → NP VP)
Bottom-Up Parsing Algorithm

function BOTTOM-UP-PARSE(words, grammar) returns a parse tree
  forest ← words
  loop do
    if LENGTH(forest) = 1 and CATEGORY(forest[1]) = START(grammar) then
      return forest[1]
    else
      i ← choose from {1...LENGTH(forest)}
      rule ← choose from RULES(grammar)
      n ← LENGTH(RULE-RHS(rule))
      subsequence ← SUBSEQUENCE(forest, i, i + n − 1)
      if MATCH(subsequence, RULE-RHS(rule)) then
        forest[i...i + n − 1] ← [MAKE-NODE(RULE-LHS(rule), subsequence)]
      else fail
  end
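The two "choose" steps above are nondeterministic. A hedged Python rendering below replaces them with exhaustive depth-first search and backtracking; the rule set and one-category-per-word lexicon are toys chosen for brevity (a multi-category lexicon, where "saw" is both N and V, would let the same backtracking explore and abandon the dead end shown on the previous slide).

```python
RULES = [("S", ["NP", "VP"]), ("NP", ["ProN"]), ("NP", ["Det", "N"]),
         ("VP", ["V", "NP"]), ("VP", ["V"])]
LEX = {"I": "ProN", "saw": "V", "the": "Det", "man": "N"}

def category(node):
    """Category of a built tree node, or the lexical category of a word."""
    return node[0] if isinstance(node, tuple) else LEX[node]

def bottom_up_parse(forest):
    if len(forest) == 1 and category(forest[0]) == "S":
        return forest[0]                          # a complete S parse
    for i in range(len(forest)):                  # "choose" a position i
        for lhs, rhs in RULES:                    # "choose" a rule
            n = len(rhs)
            sub = forest[i:i + n]
            if [category(x) for x in sub] == rhs:            # MATCH
                node = (lhs, sub)                            # MAKE-NODE
                result = bottom_up_parse(forest[:i] + [node] + forest[i + n:])
                if result is not None:            # backtrack on failure
                    return result
    return None

print(bottom_up_parse("I saw the man".split()))
# ('S', [('NP', ['I']), ('VP', ['saw', ('NP', ['the', 'man'])])])
```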