Document Classification using Deep Belief Nets
Lawrence McAfee
6/9/08, CS224N, Spring '08
Overview
• Corpus: Wikipedia XML Corpus
• Single-labeled data: each document falls under a single category
• Binary feature vectors: bag-of-words, where a '1' indicates the word occurred one or more times in the document (a sketch follows the diagram below)
[Diagram: documents Doc#1, Doc#2, Doc#3 pass through the classifier and are each assigned to a single category: Doc#1 → Food, Doc#2 → Brazil, Doc#3 → Presidents.]
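As a minimal sketch of how these binary bag-of-words vectors could be built, assuming a naive whitespace tokenizer and a placeholder vocabulary (not the project's exact preprocessing):

import numpy as np

def binary_bow(documents, vocabulary):
    # Map each document to a 0/1 vector: 1 if the vocab word occurs at least once.
    index = {word: i for i, word in enumerate(vocabulary)}
    vectors = np.zeros((len(documents), len(vocabulary)), dtype=np.int8)
    for row, doc in enumerate(documents):
        for token in doc.lower().split():       # naive whitespace tokenizer
            if token in index:
                vectors[row, index[token]] = 1  # presence only, not frequency
    return vectors

docs = ["the president of Brazil", "food and more food"]
vocab = ["president", "brazil", "food"]
print(binary_bow(docs, vocab))
# [[1 1 0]
#  [0 0 1]]

Note that repeated words ("food" twice) still map to a single '1': the representation deliberately discards frequency, which becomes relevant in the Depth discussion later.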
Background on Deep Belief Nets
[Diagram: training data feeds into a stack of RBMs (RBM 1 → RBM 2 → RBM 3). The lowest layer learns features/basis vectors for the training data; higher layers learn progressively more abstract features.]
RBM
• Trained with an unsupervised, clustering-style algorithm
Inside an RBM
[Diagram: a bipartite graph with a visible layer (units i) and a hidden layer (units j), fully connected between the layers. An accompanying plot shows energy as a function of the configuration (v, h), with low-energy wells at configurations corresponding to input/training data such as 'Golf' and 'Cycling'.]
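The slides do not write out the energy function, but the standard energy of a joint configuration (v, h) in a binary RBM, with visible biases a_i, hidden biases b_j, and weights w_ij, is:

E(v, h) = -\sum_i a_i v_i - \sum_j b_j h_j - \sum_{i,j} v_i w_{ij} h_j

Training adjusts a, b, and w so that this energy is low for configurations that match the data.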
• The goal in training an RBM is to minimize the energy of configurations corresponding to the input data
• The RBM is trained by repeatedly sampling the hidden and visible units for a given data input (sketched below)
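A minimal sketch of that sampling loop, using one step of contrastive divergence (CD-1); the learning rate and the sigmoid-based sampling are illustrative assumptions, not the project's exact settings:

import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(v0, W, a, b, lr=0.1):
    # One CD-1 step on a single binary training vector v0.
    # Up pass: sample hidden units given the data.
    ph0 = sigmoid(v0 @ W + b)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)
    # Down pass: reconstruct the visible units.
    pv1 = sigmoid(h0 @ W.T + a)
    v1 = (rng.random(pv1.shape) < pv1).astype(float)
    # Up pass again on the reconstruction.
    ph1 = sigmoid(v1 @ W + b)
    # Lower the energy of the data relative to the reconstruction.
    W += lr * (np.outer(v0, ph0) - np.outer(v1, ph1))
    a += lr * (v0 - v1)
    b += lr * (ph0 - ph1)
    return W, a, b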
Depth
• The binary representation does not capture word-frequency information
• As a result, inaccurate features are learned at each level of the DBN
[Plot: accuracy (%) vs. number of layers (0 to 8), with series labeled 'straight' and 'linear'.]
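A sketch of how depth is added by greedy layer-wise training: each RBM is trained on the hidden activations of the one below it. This reuses rng, sigmoid, and cd1_update from the CD-1 sketch above; the layer sizes and epoch count are illustrative assumptions.

def train_dbn(data, layer_sizes, epochs=10):
    # Greedily train a stack of RBMs; each layer's input is the
    # hidden probabilities of the layer below.
    rbms, layer_input = [], data
    for n_hidden in layer_sizes:
        W = 0.01 * rng.standard_normal((layer_input.shape[1], n_hidden))
        a = np.zeros(layer_input.shape[1])
        b = np.zeros(n_hidden)
        for _ in range(epochs):
            for v0 in layer_input:
                W, a, b = cd1_update(v0, W, a, b)
        rbms.append((W, a, b))
        # Propagate the data upward to train the next layer.
        layer_input = sigmoid(layer_input @ W + b)
    return rbms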
Training Iterations
• Accuracy increases with more training iterations
• Increasing the iteration count may partially compensate for poorly learned features
[Plot: accuracy (%) vs. training iterations per layer (0 to 12,000).]
[Plots: energy vs. configuration (v, h), showing energy wells forming at configurations corresponding to training inputs such as 'Lions' and 'Tigers'.]
Comparison to SVM, NB
• Binary features do not provide a good starting point for learning higher-level features
• Binary features are still somewhat useful: 22% accuracy beats random guessing
• Training time: DBN 2 h 13 m; SVM 4 s; NB 3 s
[Bar chart: accuracy (%) for DBN (100K iterations, 30 categories), SVM, and NB; y-axis runs 0 to 50%.]
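A sketch of how the SVM and Naive Bayes baselines could be reproduced on the same kind of binary features. It uses scikit-learn, which postdates this project, and toy stand-in data; the classifier settings are defaults, not the original configuration.

import numpy as np
from sklearn.svm import LinearSVC
from sklearn.naive_bayes import BernoulliNB
from sklearn.metrics import accuracy_score

# Toy stand-ins for the binary bag-of-words matrices and category labels.
rng = np.random.default_rng(0)
X = (rng.random((200, 50)) < 0.1).astype(int)
y = rng.integers(0, 3, size=200)
X_train, X_test, y_train, y_test = X[:150], X[150:], y[:150], y[150:]

for clf in (LinearSVC(), BernoulliNB()):
    clf.fit(X_train, y_train)
    print(type(clf).__name__, accuracy_score(y_test, clf.predict(X_test)))

BernoulliNB is the natural NB variant here because it models exactly the 0/1 presence features used throughout.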
Lowercasing
• Supposedly a richer vocabulary when lowercasing
• Overfitting: we don't need these extra words
• Other experiments show only the top ~500 words are relevant (a sketch follows the plot below)
[Plot: accuracy (%) vs. number of hidden neurons in the top layer (0 to 2,500), comparing lowercase and non-lowercase vocabularies.]
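A sketch of capping the vocabulary at the most frequent words, with and without lowercasing, using scikit-learn's CountVectorizer; the two-document corpus is a placeholder, and the 500-word cap mirrors the observation above.

from sklearn.feature_extraction.text import CountVectorizer

docs = ["The president of Brazil spoke.", "Food prices rose in Brazil."]

# binary=True gives 0/1 presence features; max_features keeps only the
# top-N words by corpus frequency (500 here, per the observation above).
for lower in (True, False):
    vec = CountVectorizer(lowercase=lower, binary=True, max_features=500)
    X = vec.fit_transform(docs)
    print("lowercase" if lower else "non-lowercase", X.shape)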
Suggestions for Improvement
• Use appropriate continuous-valued neurons
  • Linear or Gaussian neurons
  • Slower to train
  • Not much documentation on using continuous-valued neurons with RBMs
• Implement backpropagation to fine-tune weights and biases (sketched below)
  • Propagate error derivatives from the top-level RBM back to the inputs
  • Unsupervised training gives good initial weights; backpropagation slightly modifies the weights/biases
  • Backpropagation cannot be used alone, as it tends to get stuck in local optima
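A sketch of the fine-tuning idea: unroll the pretrained RBM weights into a feedforward net and run supervised backprop on top. It uses PyTorch for brevity (not available in 2008) and assumes the rbms list from the stacking sketch above; the classifier head and optimizer settings are illustrative.

import torch
import torch.nn as nn

def dbn_to_mlp(rbms, n_classes):
    # Initialize an MLP from greedily pretrained RBM weights/biases.
    layers = []
    for W, _, b in rbms:  # visible biases are unused in the up-direction
        linear = nn.Linear(W.shape[0], W.shape[1])
        linear.weight.data = torch.tensor(W.T, dtype=torch.float32)
        linear.bias.data = torch.tensor(b, dtype=torch.float32)
        layers += [linear, nn.Sigmoid()]
    layers.append(nn.Linear(rbms[-1][0].shape[1], n_classes))  # new output head
    return nn.Sequential(*layers)

# Fine-tuning: the pretrained weights are the starting point, and backprop
# only nudges them, per the slide above.
# model = dbn_to_mlp(rbms, n_classes=30)
# optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
# loss_fn = nn.CrossEntropyLoss()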