Reducing the Dimensionality of Data with Neural Networks

ANDREA CASTRO
MAY 14, 2019




The curse of dimensionality

• High-dimensional data often has more features than observations
• As more variables are added, it becomes more difficult to make accurate predictions
• Example: finding a cell in a 2D petri dish (25 cm²) vs. a 3D beaker (125 cm³)

https://www.statisticshowto.datasciencecentral.com/dimensionality/
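The petri-dish example can be pushed further: the volume a search must cover grows exponentially with dimension, so any fixed-size neighborhood occupies a vanishing fraction of the space. A small illustration (not from the slides) using the volume of the unit ball inside its bounding hypercube:

```python
import math

def unit_ball_volume(d):
    """Volume of the unit ball in d dimensions: pi^(d/2) / Gamma(d/2 + 1)."""
    return math.pi ** (d / 2) / math.gamma(d / 2 + 1)

# The bounding hypercube [-1, 1]^d always has volume 2^d, so the
# fraction of the cube occupied by the ball collapses as d grows:
for d in (2, 3, 10, 20):
    print(d, unit_ball_volume(d) / 2 ** d)
```

By d = 20 the ball fills well under a millionth of the cube, which is one way to see why nearest-neighbor style reasoning degrades in high dimensions.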


Reducing dimensionality

• Principal Component Analysis (PCA)
• Finds directions of greatest variance
• Represents each data point by coordinates along these directions

http://www.nlpca.org/pca_principal_component_analysis.html
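Concretely, the "directions of greatest variance" are the right singular vectors of the centered data matrix. A minimal numpy sketch (the data is synthetic and the shapes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic 2-D data with most of its variance along the first axis.
X = rng.normal(size=(500, 2)) @ np.array([[3.0, 0.0], [0.0, 0.5]])

# Center, then take the SVD: the rows of Vt are the principal
# directions, ordered by how much variance they capture.
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

# Each point is represented by its coordinates along those directions.
codes = Xc @ Vt.T

# Variance captured by each component.
explained = S ** 2 / len(X)
```

Keeping only the first few columns of `codes` gives the low-dimensional representation.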

Autoencoders

• Composed of encoder and decoder networks
• Encoder: high-dimensional data -> low-dimensional code
• Decoder: recovers original data from the low-dimensional code
• Trained to minimize the discrepancy between input and output
• Gradient descent is difficult without well-initialized weights
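A minimal sketch of the encode/decode idea, assuming a single-hidden-layer linear autoencoder trained by plain gradient descent on synthetic data (sizes, learning rate, and iteration count are illustrative, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 200 points in 10-D that actually live on a 2-D subspace.
Z = rng.normal(size=(200, 2))
X = Z @ rng.normal(size=(2, 10))

# Encoder W1 (10 -> 2) and decoder W2 (2 -> 10).
W1 = rng.normal(scale=0.1, size=(10, 2))
W2 = rng.normal(scale=0.1, size=(2, 10))

def loss(X, W1, W2):
    R = X @ W1 @ W2               # encode, then decode
    return ((R - X) ** 2).mean()  # discrepancy between input and output

lr = 0.05
losses = [loss(X, W1, W2)]
for _ in range(500):
    code = X @ W1                 # low-dimensional code
    R = code @ W2                 # reconstruction
    G = 2 * (R - X) / X.size      # gradient of the MSE w.r.t. R
    gW2 = code.T @ G
    gW1 = X.T @ (G @ W2.T)
    W1 -= lr * gW1
    W2 -= lr * gW2
    losses.append(loss(X, W1, W2))
```

With deep nonlinear stacks this same objective becomes hard to optimize from random initial weights, which is the motivation for the pretraining scheme below.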


Pretraining to optimize weights

• Train layer-by-layer as restricted Boltzmann machines (RBMs)
• Learned feature activations are used as input data for the next layer

RBMs are energy-based models

Hidden units model the distribution

$$p(v) = \sum_h p(v, h) = \frac{1}{Z} \sum_h e^{-E(v,h)}$$

where

$$E(v, h) = -\sum_i b_i v_i - \sum_j b_j h_j - \sum_{i,j} v_i h_j w_{ij}$$

Energy can be raised or lowered by adjusting the biases $b_i$, $b_j$ and the weight matrix $W$.

[Figure: bipartite RBM graph: hidden layer h1, h2, …, hj with biases b_j, fully connected to visible layer v1, v2, v3, …, vi with biases b_i]
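The energy function transcribes directly into numpy (unit counts and parameter values below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# Small RBM: 4 visible units, 3 hidden units (hypothetical sizes).
b_v = rng.normal(size=4)     # visible biases b_i
b_h = rng.normal(size=3)     # hidden biases b_j
W = rng.normal(size=(4, 3))  # weights w_ij

def energy(v, h):
    """E(v, h) = -sum_i b_i v_i - sum_j b_j h_j - sum_ij v_i h_j w_ij"""
    return -(b_v @ v) - (b_h @ h) - v @ W @ h

# Lowering a configuration's energy raises its probability, since
# p(v, h) is proportional to exp(-E(v, h)).
v = np.array([1.0, 0.0, 1.0, 1.0])
h = np.array([1.0, 0.0, 1.0])
print(energy(v, h))
```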


RBMs are energy-based models

The network assigns a probability to every possible image:

$$p(v) = \frac{1}{Z} \sum_h e^{-E(v,h)}, \qquad Z = \sum_{v,h} e^{-E(v,h)}$$

The conditional distribution is easier to calculate:

$$p(h_j = 1 \mid v) = \sigma\Big(b_j + \sum_i v_i w_{ij}\Big) \quad \text{(and vice versa for } p(v_i = 1 \mid h)\text{)}$$

[Figure: bipartite RBM graph: hidden layer h1, h2, …, hj with biases b_j, visible layer v1, v2, v3, …, vi with biases b_i]
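Because units within a layer are conditionally independent given the other layer, each conditional is just a sigmoid of a weighted sum; no partition function $Z$ is needed. A sketch with illustrative sizes (zero biases for simplicity):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(1)
W = rng.normal(size=(4, 3))  # w_ij for 4 visible, 3 hidden units
b_v = np.zeros(4)            # visible biases
b_h = np.zeros(3)            # hidden biases

v = np.array([1.0, 0.0, 1.0, 0.0])

# p(h_j = 1 | v): each hidden unit needs only its bias plus a
# weighted sum of the visible units.
p_h = sigmoid(b_h + v @ W)

# And vice versa: p(v_i = 1 | h) for a hidden configuration h.
h = np.array([1.0, 0.0, 1.0])
p_v = sigmoid(b_v + W @ h)
```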

Derivation (1/2)

$$
\begin{aligned}
p(h \mid v) &= \frac{p(v,h)}{p(v)} = \frac{e^{-E(v,h)}}{\sum_{h'} e^{-E(v,h')}}
&& \text{joint over marginal}\\
&= \frac{e^{\sum_j b_j h_j + \sum_{i,j} v_i w_{ij} h_j}}{\sum_{h'} e^{\sum_j b_j h'_j + \sum_{i,j} v_i w_{ij} h'_j}}
&& \text{expansion; } e^{\sum_i b_i v_i} \text{ cancels (no dependence on } h)\\
&= \frac{\prod_j e^{(b_j + \sum_i v_i w_{ij})\,h_j}}{\sum_{h'} \prod_j e^{(b_j + \sum_i v_i w_{ij})\,h'_j}}
&& \text{exponential of sum is product of exponentials}\\
&= \prod_j \frac{e^{(b_j + \sum_i v_i w_{ij})\,h_j}}{\sum_{h'_j \in \{0,1\}} e^{(b_j + \sum_i v_i w_{ij})\,h'_j}}
&& \text{independent } h_j\\
&= \prod_j \frac{e^{(b_j + \sum_i v_i w_{ij})\,h_j}}{1 + e^{b_j + \sum_i v_i w_{ij}}}
&& \text{expand the } h'_j = 0 \text{ and } 1 \text{ cases; combine both } \textstyle\prod_j
\end{aligned}
$$

Each factor is a normalized Bernoulli distribution over $h_j$, so

$$p(h_j = 1 \mid v) = \frac{e^{b_j + \sum_i v_i w_{ij}}}{1 + e^{b_j + \sum_i v_i w_{ij}}}$$


Derivation (2/2)

Multiply numerator and denominator by $\exp(-b_j - \sum_i v_i w_{ij})$:

$$p(h_j = 1 \mid v) = \frac{1}{1 + e^{-b_j - \sum_i v_i w_{ij}}} = \sigma\Big(b_j + \sum_i v_i w_{ij}\Big)$$
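The result can be checked numerically: for a small RBM, $p(h_j = 1 \mid v)$ computed by brute-force enumeration of the joint must match the sigmoid formula. A quick verification (sizes are illustrative):

```python
import itertools
import numpy as np

rng = np.random.default_rng(2)
n_v, n_h = 3, 2
W = rng.normal(size=(n_v, n_h))
b_v = rng.normal(size=n_v)
b_h = rng.normal(size=n_h)

def energy(v, h):
    return -(b_v @ v) - (b_h @ h) - v @ W @ h

v = np.array([1.0, 0.0, 1.0])

# Brute force: p(h | v) from the joint, summing over all 2^n_h states.
hs = [np.array(h, dtype=float) for h in itertools.product([0, 1], repeat=n_h)]
weights = np.array([np.exp(-energy(v, h)) for h in hs])
probs = weights / weights.sum()

# p(h_j = 1 | v) by marginalizing the brute-force distribution.
p_brute = np.array([sum(p for p, h in zip(probs, hs) if h[j] == 1)
                    for j in range(n_h)])

# Closed form from the derivation: sigma(b_j + sum_i v_i w_ij).
p_closed = 1.0 / (1.0 + np.exp(-(b_h + v @ W)))

print(np.allclose(p_brute, p_closed))
```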

RBM training

Given an input $v$, hidden unit states are set to 1 according to

$$p(h_j = 1 \mid v) = \sigma\Big(b_j + \sum_i v_i w_{ij}\Big)$$

Next, a "confabulation" image is produced by setting each $v_i$ according to

$$p(v_i = 1 \mid h) = \sigma\Big(b_i + \sum_j h_j w_{ij}\Big)$$

Finally, the hidden unit states are updated to represent the confabulated image's features.
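These three steps are one iteration of contrastive divergence (CD-1), with the weight update proportional to $\langle v_i h_j\rangle_{\text{data}} - \langle v_i h_j\rangle_{\text{confabulation}}$ as in the original paper. A sketch of a single update (sizes, learning rate, and input are illustrative):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(3)
n_v, n_h = 6, 4
W = 0.01 * rng.normal(size=(n_v, n_h))
b_v = np.zeros(n_v)
b_h = np.zeros(n_h)
lr = 0.1

def cd1_step(v0):
    """One contrastive-divergence (CD-1) weight gradient for a binary RBM."""
    # 1. Sample hidden states from p(h_j = 1 | v0).
    p_h0 = sigmoid(b_h + v0 @ W)
    h0 = (rng.random(n_h) < p_h0).astype(float)
    # 2. Produce the "confabulation" by sampling p(v_i = 1 | h0).
    p_v1 = sigmoid(b_v + W @ h0)
    v1 = (rng.random(n_v) < p_v1).astype(float)
    # 3. Recompute hidden probabilities for the confabulation.
    p_h1 = sigmoid(b_h + v1 @ W)
    # <v_i h_j>_data minus <v_i h_j>_confabulation.
    return np.outer(v0, p_h0) - np.outer(v1, p_h1)

v0 = np.array([1.0, 1.0, 0.0, 0.0, 1.0, 0.0])
W += lr * cd1_step(v0)
```

In practice the expectations are averaged over a mini-batch rather than a single input, and the biases get analogous updates.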


Unfolding and finetuning

Each successive RBM is trained on the previous hidden layer of feature detectors.

The autoencoder is created by unfolding/mirroring the stacked RBMs.

Finetune using standard backpropagation.
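A shape-level sketch of the unfolding step: the decoder is initialized with the transposes of the pretrained RBM weight matrices, applied in reverse order, and backpropagation then finetunes both halves. The weights below are random stand-ins, not pretrained; the layer sizes follow the paper's 784-1000-500-250-30 MNIST example:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(4)

# Stand-ins for weights learned by layer-by-layer RBM pretraining.
rbm_weights = [rng.normal(scale=0.01, size=s)
               for s in [(784, 1000), (1000, 500), (500, 250), (250, 30)]]

def encode(x, weights):
    # The encoder applies each RBM's weights in order.
    for W in weights:
        x = sigmoid(x @ W)
    return x

def decode(code, weights):
    # Unfolding: the decoder reuses the same matrices, transposed,
    # in reverse order.
    for W in reversed(weights):
        code = sigmoid(code @ W.T)
    return code

x = rng.random(784)
code = encode(x, rbm_weights)
recon = decode(code, rbm_weights)
```

After finetuning, the tied initialization is relaxed: encoder and decoder weights are updated independently by backpropagation.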

Examples on images

Test data 1: 6D autoencoder, 1.44 MSE vs. 6D logistic PCA, 7.64 MSE

Test data 2: 30D autoencoder, 3.00 MSE vs. 30D logistic PCA, 8.01 MSE

Test data 3: 30D autoencoder, 126 MSE vs. 30D PCA, 135 MSE


Example: 2D MNIST code visualization

[Figure: 2D codes for MNIST digits, LDA vs. autoencoder]

Example: 2D document class visualization

[Figure: 2D codes for document classes, Latent Semantic Analysis vs. autoencoder]