Neural Networks for Classification
Andrei Alexandrescu
June 19, 2007
Introduction
Neural Networks: History
■ Modeled after the human brain
■ Experimentation and marketing predated theory
■ Considered the forefront of the AI spring
■ Suffered from the AI winter
■ Theory today still not fully developed and understood
What is a Neural Network?
■ Essentially: a network of interconnected functional elements, each with several inputs and one output (a minimal sketch follows below)

$$y(x_1, \ldots, x_n) = f(w_1 x_1 + w_2 x_2 + \ldots + w_n x_n) \quad (1)$$

■ $w_i$ are parameters
■ $f$ is the activation function
■ Crucial for learning that addition is used for integrating the inputs
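A minimal sketch of one such element, assuming nothing beyond equation (1); the `Unit` class and its names are illustrative, not from the slides.

```python
# Sketch of a single functional element: y = f(w1*x1 + ... + wn*xn).
# The class name and structure are illustrative assumptions.
import math

class Unit:
    def __init__(self, weights, activation):
        self.weights = weights          # parameters w_i
        self.activation = activation    # activation function f

    def output(self, inputs):
        # Inputs are integrated by addition (the weighted sum), then passed through f.
        s = sum(w * x for w, x in zip(self.weights, inputs))
        return self.activation(s)

# Example: a unit with a logistic activation.
unit = Unit([0.5, -1.0, 2.0], lambda v: 1.0 / (1.0 + math.exp(-v)))
print(unit.output([1.0, 0.0, 0.5]))
```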
Examples of Neural Networks
■ Logical functions with 0/1 inputs and outputs
■ Fourier series:

$$F(x) = \sum_{i \ge 0} \left( a_i \cos(ix) + b_i \sin(ix) \right) \quad (2)$$

■ Taylor series:

$$F(x) = \sum_{i \ge 0} a_i (x - x_0)^i \quad (3)$$

■ Automata
Elements of a Neural Network
■ The function performed by an element
■ The topology of the network
■ The method used to train the weights
Single-Layer Perceptrons
The Perceptron
■ $n$ inputs, one output:

$$y(x_1, \ldots, x_n) = f(w_1 x_1 + \ldots + w_n x_n) \quad (4)$$

■ Oldest activation function (McCulloch/Pitts), sketched below:

$$f(v) = \mathbf{1}_{v \ge 0} = \begin{cases} 1 & \text{if } v \ge 0 \\ 0 & \text{otherwise} \end{cases} \quad (5)$$
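A quick sketch of equations (4) and (5) together, assuming the weights are already given:

```python
# Sketch of a perceptron with the McCulloch/Pitts step activation:
# the unit outputs 1 when the weighted input sum is non-negative, 0 otherwise.
def step(v):
    return 1 if v >= 0 else 0

def perceptron(weights, inputs):
    return step(sum(w * x for w, x in zip(weights, inputs)))

# Example: 1*1.0 + 0*(-2.0) = 1.0 >= 0, so the unit fires.
print(perceptron([1.0, -2.0], [1, 0]))   # -> 1
```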
Perceptron Capabilities
■ Advertised to be as extensive as the brain itself
■ Can (only) distinguish between two linearly-separable sets
■ Smallest undecidable function: XOR
■ Minsky’s proof started the AI winter
■ It was not fully understood what connected layers could do
Bias
■ Notice that the decision hyperplane must go through the origin
■ Could be achieved by preprocessing the input
■ Not always desirable or possible
■ Add a bias input:

$$y(x_1, \ldots, x_n) = f(w_0 + w_1 x_1 + \ldots + w_n x_n) \quad (6)$$

■ Same as an input connected to the constant 1 (see the sketch below)
■ We consider that ghost input implicit henceforth
Training the Perceptron
■ Switch to vector notation:

$$y(\mathbf{x}) = f(\mathbf{w}\mathbf{x}) = f_{\mathbf{w}}(\mathbf{x}) \quad (7)$$

■ Assume we need to separate sets of points $A$ and $B$.

$$E(\mathbf{w}) = \sum_{\mathbf{x} \in A} \left( 1 - f_{\mathbf{w}}(\mathbf{x}) \right) + \sum_{\mathbf{x} \in B} f_{\mathbf{w}}(\mathbf{x}) \quad (8)$$

■ Goal: $E(\mathbf{w}) = 0$
■ Start from a random $\mathbf{w}$ and improve it
Algorithm
1. Start with random $\mathbf{w}$, set $t = 0$
2. Select a vector $\mathbf{x} \in A \cup B$
3. If $\mathbf{x} \in A$ and $\mathbf{w}\mathbf{x} \le 0$, then $\mathbf{w}_{t+1} = \mathbf{w}_t + \mathbf{x}$
4. Else if $\mathbf{x} \in B$ and $\mathbf{w}\mathbf{x} \ge 0$, then $\mathbf{w}_{t+1} = \mathbf{w}_t - \mathbf{x}$
5. Conditionally go to step 2

■ Guaranteed to converge iff $A$ and $B$ are linearly separable! (A sketch of this loop follows below.)
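A minimal sketch of the loop above. The stopping rule (a full pass with no updates) and the sweep over all points per epoch are my assumptions about how "conditionally go to step 2" is resolved; the example points are made up.

```python
# Sketch of the perceptron training loop from the slide.
import random

def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def train_perceptron(A, B, max_epochs=1000, seed=0):
    rng = random.Random(seed)
    w = [rng.uniform(-1, 1) for _ in range(len(A[0]))]      # 1. start with random w
    for _ in range(max_epochs):
        updated = False
        for x, in_A in [(x, True) for x in A] + [(x, False) for x in B]:  # 2. pick x in A ∪ B
            if in_A and dot(w, x) <= 0:                       # 3. misclassified point of A
                w = [wi + xi for wi, xi in zip(w, x)]
                updated = True
            elif not in_A and dot(w, x) >= 0:                 # 4. misclassified point of B
                w = [wi - xi for wi, xi in zip(w, x)]
                updated = True
        if not updated:                                       # 5. stop once everything is classified
            return w
    return w  # may never converge if A and B are not linearly separable

# Tiny example; a ghost bias input of 1 is prepended to every point.
A = [[1.0, 2.0, 2.0], [1.0, 3.0, 1.0]]
B = [[1.0, -1.0, -1.0], [1.0, 0.0, -2.0]]
print(train_perceptron(A, B))
```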
Summary of Simple Perceptrons
■ Simple training
■ Limited capabilities
■ Reasonably efficient training
  ◆ Simplex, linear programming are better
Multi-Layer Perceptrons
■ Let’s connect the output of a perceptron to the input of another
■ What can we compute with this horizontal combination?
■ (We already take vertical combination for granted)
A Misunderstanding of Epic Proportions
■ Some say “two-layered” network
  ◆ Two cascaded layers of computational units
■ Some say “three-layered” network
  ◆ There is one extra input layer that does nothing
■ Let’s arbitrarily choose “three-layered”
  ◆ Input
  ◆ Hidden
  ◆ Output
Workings
■ The hidden layer maps inputs into a second space: “feature space,” “classification space”
■ This makes the job of the output layer easier
Capabilities
■ Each hidden unit computes a linear separation of the input space
■ Several hidden units can carve a polytope in the input space
■ Output units can distinguish polytope membership

⇓

Any union of polytopes can be decided
Training Prerequisite
■ The step function is bad for gradient descent techniques
■ Replace with a smooth step function:

$$f(v) = \frac{1}{1 + e^{-v}} \quad (9)$$

■ Notable fact: $f'(v) = f(v)(1 - f(v))$ (see the sketch below)
■ Makes the function cycles-friendly
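A quick sketch of the smooth step (9) and a numerical check of the derivative identity; the finite-difference check is mine, added only for illustration.

```python
# Sketch of the logistic activation (equation 9) and the identity f'(v) = f(v)(1 - f(v)),
# which lets training reuse the forward value instead of recomputing the exponential.
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def sigmoid_derivative(v):
    fv = sigmoid(v)
    return fv * (1.0 - fv)

# Numerical check of the identity at a sample point: the two values agree closely.
v = 0.7
numeric = (sigmoid(v + 1e-6) - sigmoid(v - 1e-6)) / 2e-6
print(sigmoid_derivative(v), numeric)
```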
Output Activation
■ Simple binary discrimination: zero-centered sigmoid:

$$f(v) = \frac{1 - e^{-v}}{1 + e^{-v}} \quad (10)$$

■ Probability distribution: softmax:

$$f(v_i) = \frac{e^{v_i}}{\sum_j e^{v_j}} \quad (11)$$
The Backpropagation Algorithm
■ Works on any differentiable activation function
■ Gradient descent in weight space
■ Metaphor: a ball rolls on the error function’s envelope
■ Condition: no flat portion
■ Ball would stop in indifferent equilibrium
■ Some add a slight pull term:

$$f(v) = \frac{1 - e^{-v}}{1 + e^{-v}} + cv \quad (12)$$
The Task
■ Minimize error function (sketched below):

$$E = \frac{1}{2} \sum_{i=1}^{p} \left\| \mathbf{o}_i - \mathbf{t}_i \right\|^2 \quad (13)$$

where:

  ◆ $\mathbf{o}_i$ actual outputs
  ◆ $\mathbf{t}_i$ desired outputs
  ◆ $p$ number of patterns
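A minimal sketch of equation (13) over $p$ patterns; the example numbers are made up.

```python
# Sketch of the sum-of-squares error E = 1/2 * sum_i ||o_i - t_i||^2 over p patterns.
def sum_of_squares_error(outputs, targets):
    # outputs, targets: lists of p vectors (actual and desired network outputs)
    total = 0.0
    for o, t in zip(outputs, targets):
        total += sum((oj - tj) ** 2 for oj, tj in zip(o, t))
    return 0.5 * total

# Example with p = 2 patterns and 2 output units each.
print(sum_of_squares_error([[0.9, 0.1], [0.2, 0.7]], [[1.0, 0.0], [0.0, 1.0]]))
```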
Training. The Delta Rule
■ Compute $\nabla E = \left( \frac{\partial E}{\partial w_1}, \ldots, \frac{\partial E}{\partial w_l} \right)$
■ Update weights:

$$\Delta w_i = -\gamma \frac{\partial E}{\partial w_i}, \quad i = 1, \ldots, l \quad (14)$$

■ Expect to find a point where $\nabla E = 0$
■ Algorithm for computing $\nabla E$: backpropagation
■ Beyond the scope of this class (a small single-unit sketch follows below)
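As a sketch of the update (14) without the full backpropagation machinery: for a single sigmoid output unit with squared error $E = \frac{1}{2}(o - t)^2$, the gradient has the closed form $\frac{\partial E}{\partial w_i} = (o - t)\,o\,(1 - o)\,x_i$. The learning rate and example numbers below are assumptions.

```python
# Sketch of the delta rule Δw_i = -γ ∂E/∂w_i for one sigmoid output unit
# with squared error; hidden layers and full backpropagation are not shown.
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def delta_rule_step(w, x, t, gamma=0.5):
    o = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))        # forward pass
    grad = [(o - t) * o * (1.0 - o) * xi for xi in x]         # ∂E/∂w_i for each weight
    return [wi - gamma * gi for wi, gi in zip(w, grad)]       # gradient descent step

# Example: a few steps pushing the output toward the target t = 1.
w = [0.1, -0.2]
for _ in range(5):
    w = delta_rule_step(w, x=[1.0, 0.5], t=1.0)
print(w)
```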
Gradient Locality
■ Only summation guarantees locality of backpropagation
■ Otherwise backpropagation would propagate errors due to one input to all inputs
■ Essential to use summation as input integration!
Regularization
■ Weights can grow uncontrollably
■ Add a regularization term that opposes weight growth (sketched below):

$$\Delta w_i = -\gamma \frac{\partial E}{\partial w_i} - \alpha w_i \quad (15)$$

■ Very important practical trick
■ Also avoids overspecialization
■ Forces a smoother output
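The regularized update of equation (15), sketched as a one-line modification of the gradient step; the $\gamma$ and $\alpha$ values are illustrative.

```python
# Sketch of the regularized update Δw_i = -γ ∂E/∂w_i - α w_i (weight decay):
# besides following the gradient, every weight is pulled slightly toward zero.
def regularized_update(w, grad, gamma=0.1, alpha=0.01):
    return [wi - gamma * gi - alpha * wi for wi, gi in zip(w, grad)]

# Example: even with a zero gradient the weights shrink a little each step.
print(regularized_update([2.0, -4.0], [0.0, 0.0]))   # -> [1.98, -3.96]
```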
Local Minima
■ The gradient surf can stop in a local minimum
■ Biggest issue with neural networks
■ Overspecialization second biggest
■ Convergence not guaranteed either, but regularization helps
Accommodating Discrete Inputs
Discrete Inputs
■ Many NLP applications foster discrete features
■ Neural nets expect real numbers
■ Smooth: similar outputs for similar inputs
■ Any two discrete inputs are “just as different”
■ Treating them as integral numbers is undemocratic
One-Hot Encoding
■ One discrete feature with $n$ values → $n$ real inputs
■ The $i$th feature value sets the $i$th input to 1 and the others to 0
■ The Hamming distance between any two distinct inputs is now constant! (see the sketch below)
■ Disadvantage: input vector size much larger
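A sketch of the encoding; the part-of-speech feature and its values are made up for the example.

```python
# Sketch of one-hot encoding: a discrete feature with n values becomes n real inputs,
# with the i-th value mapped to a vector that is 1 at position i and 0 elsewhere.
def one_hot(value, values):
    vec = [0.0] * len(values)
    vec[values.index(value)] = 1.0
    return vec

# Example with a made-up part-of-speech feature.
pos_values = ["noun", "verb", "adj", "adv"]
print(one_hot("verb", pos_values))   # -> [0.0, 1.0, 0.0, 0.0]
# Any two distinct encodings differ in exactly two positions (constant Hamming distance).
```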
Optimizing One-Hot Encoding
■ Each hidden unit has all inputs zero except the $i$th one
■ Even that one is just multiplied by 1
■ Regroup weights by discrete input, not by hidden unit!
■ Matrix $w$ of size $n \times l$
■ Input $i$ just copies row $i$ to the output (virtual multiplication by 1; see the sketch below)
■ Cheap computation
■ Delta rule applies as usual
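A sketch of the optimization: with one-hot input $i$, the full product against the $n \times l$ weight matrix reduces to copying row $i$. The matrix sizes and numbers below are illustrative.

```python
# Sketch: multiplying a one-hot vector by the n x l weight matrix equals copying the
# row selected by the active feature value, so a lookup replaces the matrix product.
def hidden_input_by_multiplication(one_hot_vec, W):
    l = len(W[0])
    return [sum(one_hot_vec[i] * W[i][j] for i in range(len(W))) for j in range(l)]

def hidden_input_by_lookup(index, W):
    return list(W[index])    # just copy row `index`

# Example: n = 3 feature values, l = 2 hidden units.
W = [[0.1, 0.2],
     [0.3, 0.4],
     [0.5, 0.6]]
print(hidden_input_by_multiplication([0.0, 1.0, 0.0], W))  # -> [0.3, 0.4]
print(hidden_input_by_lookup(1, W))                        # -> [0.3, 0.4]
```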
One-Hot Encoding: Interesting Tidbits
■ The row $w_i$ is a continuous representation of discrete feature $i$
■ Only one row trained per sample
■ The size of the continuous representation can be chosen depending on the feature’s complexity
■ Mix this continuous representation freely with “truly” continuous features, such as acoustic features
Outputs
Multi-Label Classification
■ $n$ real outputs summing to 1
■ Normalization included in the softmax function:

$$f(v_i) = \frac{e^{v_i}}{\sum_j e^{v_j}} = \frac{e^{v_i - v_{\max}}}{\sum_j e^{v_j - v_{\max}}} \quad (16)$$

■ Train with $1 - \epsilon$ for the known label and $\frac{\epsilon}{n-1}$ for all others (avoids saturation; see the sketch below)
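A sketch of the max-shifted softmax of equation (16) together with the softened training targets; the $\epsilon$ value and example numbers are illustrative.

```python
# Sketch of the softmax with the maximum subtracted for numerical stability (equation 16),
# plus the slide's training targets: 1 - eps for the known label, eps/(n-1) for the rest.
import math

def softmax(v):
    vmax = max(v)
    exps = [math.exp(vi - vmax) for vi in v]   # shifting by vmax avoids overflow
    total = sum(exps)
    return [e / total for e in exps]

def targets_for_label(label_index, n, eps=0.05):
    return [1.0 - eps if i == label_index else eps / (n - 1) for i in range(n)]

print(softmax([2.0, 1.0, 0.1]))
print(targets_for_label(0, n=3))   # -> [0.95, 0.025, 0.025]
```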
Soft Training
■ Maybe the targets are a known probability distribution
■ Or we want to reduce the number of training cycles
■ Train with the actual desired distributions as desired outputs
■ Example: for feature vector $\mathbf{x}$, labels $l_1$, $l_2$, $l_3$ are possible with equal probability
■ Train with $\frac{1 - \epsilon}{3}$ for the three, $\frac{\epsilon}{n-3}$ for all others (sketched below)
NLP Applications
Language Modeling
■ Input: n-gram context
■ May include arbitrary word features (cool!!!)
■ Output: probability distribution of the next word
■ Automatically figures out which features are important
Lexicon Learning
■ Input: word-level features (root, stem, morph)
■ Input: most frequent previous/next words
■ Output: probability distribution of the word’s possible POSs
Word Sense Disambiguation
■ Input: bag of words in context, local collocations
■ Output: probability distribution over senses
Conclusions
■ Neural nets are a respectable machine learning technique
■ Theory not fully developed
■ Local optima and overspecialization are killers
■ Yet they can learn very complex functions
■ Long training time
■ Short testing time
■ Small memory requirements