TRANSCRIPT
Reducing the Dimensionality of Data with Neural Networks
ANDREA CASTRO
MAY 14, 2019
The curse of dimensionality
• High-dimensional data often has more features than observations
• As more variables are added, it becomes more difficult to make accurate predictions
• Example: finding a cell in a 2-D petri dish (25 cm²) vs. a 3-D beaker (125 cm³)
https://www.statisticshowto.datasciencecentral.com/dimensionality/
Reducing dimensionality
• Principal Component Analysis (PCA)
• Finds the directions of greatest variance
• Represents each data point by its coordinates along these directions (a code sketch follows below)
http://www.nlpca.org/pca_principal_component_analysis.html
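As a concrete illustration of those bullets, here is a minimal PCA sketch using NumPy's SVD; the function name `pca` and the toy data are illustrative, not from the slides.

```python
# A minimal PCA sketch (NumPy only): find the directions of greatest
# variance and re-express each point by its coordinates along them.
import numpy as np

def pca(X, n_components):
    """Project X (n_samples x n_features) onto its top principal components."""
    X_centered = X - X.mean(axis=0)          # PCA assumes zero-mean data
    # Rows of Vt are orthonormal directions, ordered by variance explained
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]           # directions of greatest variance
    codes = X_centered @ components.T        # low-dimensional coordinates
    return codes, components

# Toy usage: 100 points in 5-D reduced to a 2-D code
X = np.random.randn(100, 5)
codes, components = pca(X, n_components=2)
print(codes.shape)                           # (100, 2)
```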
Autoencoders
• Composed of an encoder and a decoder network
• Encoder: high-dimensional data -> low-dimensional code
• Decoder: recovers the original data from the low-dimensional code
• Training minimizes the discrepancy between input and output (a minimal sketch follows below)
• Gradient descent is difficult without well-initialized weights
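To make the encoder/decoder picture concrete, here is a minimal one-hidden-layer autoencoder in NumPy with hand-written gradients; all names, sizes, and hyperparameters are illustrative assumptions, not from the paper.

```python
# A minimal autoencoder sketch: encoder maps data to a low-dimensional
# code, decoder reconstructs it, and training minimizes the input/output
# discrepancy (mean squared error).
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((256, 20))                  # toy data: 256 samples, 20 features
n_code = 3                                 # size of the low-dimensional code

sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))
W1 = rng.normal(0, 0.1, (20, n_code)); b1 = np.zeros(n_code)   # encoder
W2 = rng.normal(0, 0.1, (n_code, 20)); b2 = np.zeros(20)       # decoder

lr = 0.1
for step in range(2000):
    h = sigmoid(X @ W1 + b1)               # encode: data -> code
    X_hat = h @ W2 + b2                    # decode: code -> reconstruction
    d = 2 * (X_hat - X) / len(X)           # gradient of mean squared error
    dh = (d @ W2.T) * h * (1 - h)          # backprop through the sigmoid
    W2 -= lr * (h.T @ d);  b2 -= lr * d.sum(axis=0)
    W1 -= lr * (X.T @ dh); b1 -= lr * dh.sum(axis=0)

print(np.mean((X_hat - X) ** 2))           # reconstruction error after training
```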
Pretraining to optimize weights
• Train layer-by-layer as restricted Boltzmann machines (RBMs)
• Learned feature activations are used as input data for the next layer (see the sketch below)
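A minimal sketch of the greedy layer-by-layer procedure, assuming a hypothetical `train_rbm(data, n_hidden)` helper that returns learned weights and hidden biases (a concrete single-step RBM update appears in the contrastive-divergence sketch further below). The example layer sizes match the paper's MNIST autoencoder.

```python
# Greedy layer-by-layer pretraining: each RBM is trained on the feature
# activations produced by the layer below it. train_rbm is a hypothetical
# helper standing in for a full RBM training loop.
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def pretrain_stack(data, layer_sizes, train_rbm):
    """Train a stack of RBMs; layer_sizes might be [1000, 500, 250, 30]."""
    stack = []
    for n_hidden in layer_sizes:
        W, b_hid = train_rbm(data, n_hidden)
        stack.append((W, b_hid))
        # learned feature activations become the next layer's input data
        data = sigmoid(data @ W + b_hid)
    return stack
```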
RBMs are energy-based models

Hidden units model the distribution

$p(v) = \dfrac{\sum_h e^{-E(v,h)}}{\sum_{u,g} e^{-E(u,g)}}$

where

$E(v,h) = -\sum_i b_i v_i - \sum_j b_j h_j - \sum_{i,j} v_i h_j w_{ij}$

Energy can be raised or lowered by adjusting the biases and the weight matrix (a direct translation into code follows below).

[Figure: bipartite RBM graph, visible layer $v_1, v_2, v_3, \ldots, v_i$ with biases $b_i$, hidden layer $h_1, h_2, \ldots, h_j$ with biases $b_j$]
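The energy function above translates directly into code. This NumPy sketch (illustrative names and sizes) also shows how raising one weight lowers the energy, and thereby raises the probability, of configurations where both connected units are on:

```python
# E(v,h) = -sum_i b_i v_i - sum_j b_j h_j - sum_ij v_i h_j w_ij
# Lower energy means higher probability under the model.
import numpy as np

def energy(v, h, W, b_vis, b_hid):
    return -(b_vis @ v) - (b_hid @ h) - (v @ W @ h)

v = np.array([1.0, 0.0, 1.0]); h = np.array([1.0, 1.0])
W = np.zeros((3, 2)); b_vis = np.zeros(3); b_hid = np.zeros(2)
print(energy(v, h, W, b_vis, b_hid))       # 0.0

W[0, 0] = 2.0                              # strengthen the v_1 -- h_1 connection
print(energy(v, h, W, b_vis, b_hid))       # -2.0: this configuration is now more probable
```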
RBMs are energy-based models

The network assigns a probability to every possible image:

$p(v) = \dfrac{\sum_h e^{-E(v,h)}}{\sum_{u,g} e^{-E(u,g)}}$

This marginal is intractable (the denominator sums over every configuration), but the conditional distribution is easier to calculate:

$p(h_j = 1 \mid v) = \sigma\left(b_j + \sum_i v_i w_{ij}\right)$, and vice versa, $p(v_i = 1 \mid h) = \sigma\left(b_i + \sum_j h_j w_{ij}\right)$

[Figure: the same bipartite RBM graph as above]
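A small brute-force check of why the conditional is tractable: on a toy RBM we can still enumerate every hidden configuration and confirm that the exact conditional matches the single-sigmoid formula. All sizes and names here are illustrative.

```python
# Exact p(h_0 = 1 | v) by enumeration vs. the closed-form sigmoid.
import numpy as np
from itertools import product

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 2)); b_vis = rng.normal(size=3); b_hid = rng.normal(size=2)
v = np.array([1.0, 0.0, 1.0])

# Enumerate all 2^2 hidden configurations and weight each by e^{-E(v,h)}
e = {h: np.exp(b_vis @ v + b_hid @ np.array(h) + v @ W @ np.array(h))
     for h in product([0, 1], repeat=2)}
Z_v = sum(e.values())
p_h0_exact = sum(val for h, val in e.items() if h[0] == 1) / Z_v

# The closed form: one sigmoid per hidden unit
p_h0_sigmoid = sigmoid(b_hid[0] + v @ W[:, 0])
print(p_h0_exact, p_h0_sigmoid)            # identical up to rounding
```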
Derivation (1/2)

$p(h \mid v) = \dfrac{p(v,h)}{p(v)}$ — joint over marginal

$= \dfrac{e^{-E(v,h)}}{\sum_{h'} e^{-E(v,h')}}$ — expand; the $\sum_i b_i v_i$ terms, which do not depend on $h$, cancel

$= \dfrac{\exp\left(\sum_j b_j h_j + \sum_{i,j} v_i w_{ij} h_j\right)}{\sum_{h'} \exp\left(\sum_j b_j h'_j + \sum_{i,j} v_i w_{ij} h'_j\right)}$ — expansion

$= \dfrac{\prod_j \exp\left(h_j (b_j + W_j v)\right)}{\sum_{h'} \prod_j \exp\left(h'_j (b_j + W_j v)\right)}$ — exponential of a sum is a product of exponentials, writing $W_j v = \sum_i w_{ij} v_i$

$= \prod_j \dfrac{\exp\left(h_j (b_j + W_j v)\right)}{\sum_{h'_j \in \{0,1\}} \exp\left(h'_j (b_j + W_j v)\right)}$ — the $h_j$ are independent, so the sum and product swap; again the exponential of a sum is a product of exponentials

$= \prod_j \dfrac{\exp\left(h_j (b_j + W_j v)\right)}{1 + \exp\left(b_j + W_j v\right)}$ — expand the $h'_j = 0$ and $1$ cases and combine both $\prod_j$

so $p(h_j = 1 \mid v) = \dfrac{e^{b_j + W_j v}}{1 + e^{b_j + W_j v}}$ — note this is a proper distribution over each $h_j$
Derivation (2/2)

Multiplying the numerator and denominator by $e^{-(b_j + W_j v)}$ yields the logistic sigmoid:

$p(h_j = 1 \mid v) = \dfrac{1}{1 + e^{-(b_j + W_j v)}} = \sigma\left(b_j + W_j v\right)$
RBM training

Given an input $v$, the hidden unit states are set to 1 according to

$p(h_j = 1 \mid v) = \sigma\left(b_j + \sum_i v_i w_{ij}\right)$

Next, a "confabulation" image is produced by setting each $v_i$ to 1 according to

$p(v_i = 1 \mid h) = \sigma\left(b_i + \sum_j h_j w_{ij}\right)$

Finally, the hidden unit states are updated to represent the confabulated image's features. A one-step sketch of this procedure follows below.
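The three phases above correspond to one step of contrastive divergence (CD-1). This NumPy sketch implements them; the learning rate, array shapes, and function name are assumptions, not the paper's exact recipe.

```python
# One CD-1 update: sample hidden states from the data, "confabulate" the
# visible units, re-infer hidden features, then nudge the parameters toward
# the data statistics and away from the confabulation statistics.
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def cd1_step(v0, W, b_vis, b_hid, lr=0.1, rng=None):
    if rng is None:
        rng = np.random.default_rng()
    # 1) set hidden states to 1 with probability sigma(b_j + sum_i v_i w_ij)
    p_h0 = sigmoid(b_hid + v0 @ W)
    h0 = (rng.random(p_h0.shape) < p_h0).astype(float)
    # 2) confabulation: each v_i is 1 with probability sigma(b_i + sum_j h_j w_ij)
    p_v1 = sigmoid(b_vis + h0 @ W.T)
    v1 = (rng.random(p_v1.shape) < p_v1).astype(float)
    # 3) update hidden probabilities to represent the confabulation's features
    p_h1 = sigmoid(b_hid + v1 @ W)
    # parameter update: <v h>_data - <v h>_confabulation
    W += lr * (np.outer(v0, p_h0) - np.outer(v1, p_h1))
    b_vis += lr * (v0 - v1)
    b_hid += lr * (p_h0 - p_h1)
    return W, b_vis, b_hid
```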
Unfolding and finetuning

Each subsequent RBM is trained on the previous hidden layer of feature detectors.

The autoencoder is created by unfolding/mirroring the stacked RBMs (see the sketch below).

Finetune using standard backpropagation.
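A sketch of the unfolding step, assuming `stack` holds the (weights, hidden-bias) pairs from the pretraining sketch earlier; the decoder reuses transposed encoder weights, and decoder biases are omitted for brevity (the full model keeps the RBMs' visible biases and finetunes everything with backpropagation).

```python
# Unfold stacked RBMs into a deep autoencoder: encoder applies each RBM
# bottom-up; the mirrored decoder applies the transposed weights top-down.
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def unfold_and_encode_decode(x, stack):
    activations = x
    for W, b_hid in stack:                  # encoder: bottom-up
        activations = sigmoid(activations @ W + b_hid)
    code = activations                      # low-dimensional code
    for W, _ in reversed(stack):            # decoder: mirrored, top-down
        activations = sigmoid(activations @ W.T)
    return code, activations                # code and reconstruction
```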
Examples on images

Reconstruction errors on test data (mean squared error; lower is better):

• 6-D autoencoder: 1.44 MSE vs. 6-D logistic PCA: 7.64 MSE
• 30-D autoencoder: 3.00 MSE vs. 30-D logistic PCA: 8.01 MSE
• 30-D autoencoder: 126 MSE vs. 30-D PCA: 135 MSE
Example: 2D MNIST code visualization

[Figure: 2-D codes for MNIST digits produced by LDA vs. by the autoencoder]
Example: 2D document class visualization

[Figure: 2-D document codes produced by Latent Semantic Analysis vs. by the autoencoder]