
Page 1: Analysis of Single-Layer Networks

Analysis of Single-Layer Networks

Presented by Hourieh Fakourfar

Adam Coates [email protected]

Honglak Lee [email protected]

Andrew Y. Ng [email protected]

Computer Science Department, Stanford University, Stanford, CA 94305, USA

An Analysis of Single-Layer Networks in Unsupervised Feature Learning

Page 2: Analysis of Single-Layer Networks

Agenda

• Introduction

• Unsupervised Learning

• Feature Extraction

• Classification

• Experiments & Analysis

• Conclusion

Page 3: Analysis of Single-Layer Networks

Motivation

• Achieve state-of-the-art performance with simple algorithms and a single layer of features

• Avoid complexity and expense, without compromising performance

Page 5: Analysis of Single-Layer Networks

Single-Layer Network

• Maps the n-dimensional input space to the m-dimensional output space

• Widely used for linearly separable problems

(Diagram: input layer fully connected to output layer.)

http://wwwold.ece.utep.edu/
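As an illustration (not from the slides), here is a minimal NumPy sketch of this n-to-m mapping; the logistic activation and all names are assumptions, since the slide does not fix an activation function.

```python
import numpy as np

def single_layer_forward(x, W, b):
    """Map an n-dimensional input x to an m-dimensional output.

    W has shape (m, n), b has shape (m,). The logistic sigmoid is one
    common activation choice; the slide does not prescribe one.
    """
    z = W @ x + b                      # linear map: R^n -> R^m
    return 1.0 / (1.0 + np.exp(-z))    # component-wise non-linearity

# Example with n = 4 inputs and m = 3 outputs
rng = np.random.default_rng(0)
W = rng.standard_normal((3, 4))
b = np.zeros(3)
y = single_layer_forward(rng.standard_normal(4), W, b)  # shape (3,)
```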

Page 6: Analysis of Single-Layer Networks

Multi-layer Network

• One or more hidden layers

• Higher level of computation

• More complexity

• Cost efficient?

• Performance?

Page 8: Analysis of Single-Layer Networks

Unsupervised Learning

• Goal: find hidden structure in unlabeled data

▫ Learn feature representations from unlabeled data

• Unlike supervised learning, there is no error or reward signal to evaluate a potential solution

• Approaches to unsupervised learning include:

▫ Clustering: k-means, mixture models, hierarchical clustering

▫ Blind signal separation, using feature extraction techniques for dimensionality reduction: PCA, independent component analysis, non-negative matrix factorization, singular value decomposition

Page 10: Analysis of Single-Layer Networks

Benchmark Datasets

• CIFAR

• NORB

• STL-10

http://www.idsia.ch/~juergen/vision.html
http://www.stanford.edu/~acoates//stl10/

Page 11: Analysis of Single-Layer Networks

CIFAR-10

• 60,000 32x32 color images

▫ 10 classes

▫ 6,000 images per class

• 50,000 training images

• 10,000 test images

CIFAR-10: 10 classes with 10 random images from each

http://www.cs.toronto.edu/~kriz/cifar.html

Page 12: Analysis of Single-Layer Networks

NORB

• Images of 50 toys

• 5 generic categories

▫ Four-legged animals

▫ Human figures

▫ Airplanes

▫ Trucks

▫ Cars

• Training set:

▫ 5 instances of each category (instances 4, 6, 7, 8 and 9)

• Test set:

▫ 5 instances (instances 0, 1, 2, 3, and 5).

Page 13: Analysis of Single-Layer Networks

STL-10

• 10 classes

▫ airplane, bird, car, cat, deer, dog, horse, monkey, ship, truck.

• Image size

▫ 96x96 pixels

• Training images

▫ 500 per class (10 pre-defined folds)

• Test images

▫ 800 per class

• Unlabeled images

▫ 100,000 for unsupervised learning

Page 15: Analysis of Single-Layer Networks

Learning Framework

Extract patches

Extract random patches from unlabeled training images.

Pre-processing

Apply a pre-processing stage to the patches.

Feature-mapping

Learn a feature-mapping using an unsupervised learning algorithm.
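A minimal sketch of the first step above, assuming images arrive as a NumPy array; the function name and shapes are illustrative, not from the slides.

```python
import numpy as np

def extract_random_patches(images, num_patches, w, seed=0):
    """Sample w-by-w patches uniformly at random from a stack of images.

    images: array of shape (num_images, height, width, channels).
    Each patch is flattened into a vector of length w*w*channels.
    """
    rng = np.random.default_rng(seed)
    n, h, wd, c = images.shape
    patches = np.empty((num_patches, w * w * c))
    for k in range(num_patches):
        i = rng.integers(n)           # pick a random image
        r = rng.integers(h - w + 1)   # random top-left corner
        s = rng.integers(wd - w + 1)
        patches[k] = images[i, r:r + w, s:s + w, :].ravel()
    return patches
```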

Page 16: Analysis of Single-Layer Networks

Preprocessing

Normalization (Z-scores):

• Mean subtraction and scale normalization

• For images/visual data: local brightness and contrast normalization

Whitening:

• Common preprocessing technique

• Decorrelates the data

• The observed vector x is linearly transformed to a new vector
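A minimal sketch of the normalization step above (Z-scores per patch); the eps regularizer and its value are assumptions, not given on the slide.

```python
import numpy as np

def normalize_patches(patches, eps=10.0):
    """Z-score each patch: subtract its mean, divide by its std.

    patches: (num_patches, dim) array of flattened patches. eps guards
    against near-zero variance; the exact value is an assumption here.
    """
    mean = patches.mean(axis=1, keepdims=True)
    var = patches.var(axis=1, keepdims=True)
    return (patches - mean) / np.sqrt(var + eps)
```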

Page 17: Analysis of Single-Layer Networks

Whitening

• The observed vector x is linearly transformed to a new vector whose

▫ components are uncorrelated

▫ variances are equal to unity

• The new vector is then defined by x_new = E D^(-1/2) E' x, where E is the orthogonal matrix of eigenvectors of the covariance of x and D is the diagonal matrix of its eigenvalues
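A minimal sketch of this whitening transform via the eigendecomposition above; the eps added to the eigenvalues is a common regularization and its value is an assumption.

```python
import numpy as np

def whiten(X, eps=0.1):
    """Whiten rows of X: decorrelate components, bring variances to ~1.

    X: (num_samples, dim) data matrix, centered below.
    eps regularizes small eigenvalues; its value is an assumption.
    """
    Xc = X - X.mean(axis=0)
    cov = np.cov(Xc, rowvar=False)                   # covariance of the data
    d, E = np.linalg.eigh(cov)                       # cov = E diag(d) E'
    W = E @ np.diag(1.0 / np.sqrt(d + eps)) @ E.T    # E D^(-1/2) E'
    return Xc @ W                                    # W is symmetric
```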

Page 18: Analysis of Single-Layer Networks

Feature Mapping

Page 19: Analysis of Single-Layer Networks

Feature Extraction & Classification

Framework

Extract features

Extract features from equally spaced sub-patches covering the input image.

Pool features

Pool features together over regions of the image to reduce dimensionality.

Training and prediction

Train a linear classifier to predict the labels.

Page 20: Analysis of Single-Layer Networks

Feature extraction and classification

• Reduce dimensionality:

▫ Feature mapping yields an (n-w+1)-by-(n-w+1)-by-K image representation

▫ Sum over local regions of the feature values y_ij

▫ Split y_ij into 4 equal-sized quadrants

▫ Compute the sum of the y_ij in each quadrant

▫ Obtain a 4K-dimensional feature vector for each training image and its label

• Apply L2 SVM classification

▫ Regularization parameter determined using cross-validation
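A minimal sketch of the quadrant pooling described above; shapes follow the slide's (n-w+1)-by-(n-w+1)-by-K representation.

```python
import numpy as np

def quadrant_pool(Y):
    """Sum-pool a feature map over its four equal-sized quadrants.

    Y: array of shape (r, r, K) with r = n - w + 1. Returns the
    4K-dimensional feature vector described on the slide.
    """
    h = Y.shape[0] // 2
    quadrants = [Y[:h, :h], Y[:h, h:], Y[h:, :h], Y[h:, h:]]
    return np.concatenate([q.sum(axis=(0, 1)) for q in quadrants])
```

For the classifier, something like sklearn.svm.LinearSVC (whose default loss is the squared hinge, i.e., an L2-SVM) with C chosen by cross-validation would match the slide's description.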

Page 21: Analysis of Single-Layer Networks
Page 22: Analysis of Single-Layer Networks

Feature Learning Algorithms

• Sparse auto-encoder

• Sparse RBMs

• K-means clustering

• Gaussian Mixtures

Page 23: Analysis of Single-Layer Networks

Sparse Auto-encoder

• Feature mapping:

f(x) = g(Wx + b)

where g(z) = 1 / (1 + exp(-z)) is the logistic sigmoid function, applied component-wise to the vector z
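A direct sketch of this feature mapping; how W and b are trained (reconstruction loss plus a sparsity penalty) is outside this snippet.

```python
import numpy as np

def sigmoid(z):
    """Logistic sigmoid g(z) = 1 / (1 + exp(-z)), applied component-wise."""
    return 1.0 / (1.0 + np.exp(-z))

def autoencoder_features(x, W, b):
    """Feature mapping f(x) = g(Wx + b) from the slide.

    W: (K, N) weights, b: (K,) biases. Training them (via
    backpropagation with a sparsity penalty) is omitted.
    """
    return sigmoid(W @ x + b)
```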

Page 24: Analysis of Single-Layer Networks

Sparse Auto-encoder

• Nice way to do non-linear dimensionality reduction:

▫ They provide mappings both ways

▫ Learning time and memory both scale linearly with the number of training cases

• Dimensionality reduction facilitates the classification, visualization, communication, and storage of high-dimensional data.

(Diagram: an N-dimensional input image is mapped through weights W and bias b to a K-dimensional image representation.)

Page 25: Analysis of Single-Layer Networks

Sparse Restricted Boltzmann Machine (RBM)

• A particular form of log-linear Markov Random Field (MRF)

• Energy function is linear:

▫ E(v,h) = - b'v - c'h - h'Wv

▫ W: weights connecting hidden and visible units

▫ b, c: offsets of the visible and hidden layers, respectively

• Sparsity penalty as in autoencoder

(Diagram: RBM with n = 4 visible units and m = 3 hidden units.)

http://deeplearning.net/tutorial/rbm.html
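A direct transcription of the energy function above into NumPy; shapes follow the diagram's conventions.

```python
import numpy as np

def rbm_energy(v, h, W, b, c):
    """Energy E(v, h) = -b'v - c'h - h'Wv of a binary RBM.

    v: (n,) visible units, h: (m,) hidden units, W: (m, n) weights,
    b: (n,) visible offsets, c: (m,) hidden offsets.
    """
    return -(b @ v) - (c @ h) - (h @ W @ v)
```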

Page 26: Analysis of Single-Layer Networks

K-means Clustering

• Standard 1-of-K, hard-assignment coding

• Alternatively, a non-linear mapping that attempts a “softer”, distance-based encoding (see the sketch below)
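A sketch of both encodings; the soft variant is the "triangle" activation f_k = max(0, mean(z) - z_k) defined in the underlying paper, a formula that comes from the paper rather than these slides.

```python
import numpy as np

def kmeans_hard_code(x, centroids):
    """Standard 1-of-K, hard-assignment coding."""
    z = np.linalg.norm(centroids - x, axis=1)   # distances to centroids
    f = np.zeros(len(centroids))
    f[np.argmin(z)] = 1.0
    return f

def kmeans_triangle_code(x, centroids):
    """'Soft' coding f_k = max(0, mean(z) - z_k), where z_k is the
    distance from x to centroid k (the 'triangle' activation)."""
    z = np.linalg.norm(centroids - x, axis=1)
    return np.maximum(0.0, z.mean() - z)
```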


Page 28: Analysis of Single-Layer Networks

Gaussian Mixture Model (GMM)

Represents the data density as a mixture of K Gaussian distributions

f maps each input to the posterior membership probabilities (see the sketch below)
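A minimal sketch of this posterior mapping, using SciPy's multivariate normal density; the argument layout is an illustrative assumption.

```python
import numpy as np
from scipy.stats import multivariate_normal

def gmm_posteriors(x, weights, means, covs):
    """Posterior membership probabilities p(k | x) for a K-component GMM.

    weights: (K,) mixing proportions; means[k]: (d,) mean vector;
    covs[k]: (d, d) covariance matrix of component k.
    """
    joint = np.array([
        w * multivariate_normal.pdf(x, mean=m, cov=c)
        for w, m, c in zip(weights, means, covs)
    ])
    return joint / joint.sum()   # Bayes' rule: normalize over components
```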

Page 29: Analysis of Single-Layer Networks
Page 30: Analysis of Single-Layer Networks

Parameters

• Assess the effects of changing the following parameters:

▫ Whitened or raw image

▫ Number of features K

▫ Stride size s

▫ Receptive field size w

Page 31: Analysis of Single-Layer Networks

Testing Procedure

• For each unsupervised learning algorithm:

▫ Train a single layer of features

- Whitened or raw input

- Choice of the parameters K, s, and w

▫ Then train a linear classifier

- On a holdout set (main analysis)

- On the test set (for final results)

Page 32: Analysis of Single-Layer Networks

K-means

• Whitening is a crucial preprocessing step, since the clustering algorithm cannot handle correlations in the data

(Figure: centroids learned without whitening vs. with whitening.)

Page 33: Analysis of Single-Layer Networks

GMM

• Whitening is a crucial preprocessing step, since the clustering algorithm cannot handle correlations in the data

(Figure: bases learned without whitening vs. with whitening.)

Page 34: Analysis of Single-Layer Networks

Sparse Auto-encoder

The effect of whitening here is somewhat ambiguous

(Figure: bases learned without whitening vs. with whitening.)

Page 35: Analysis of Single-Layer Networks

Sparse RBM

The effect of whitening here is somewhat ambiguous

(Figure: bases learned without whitening vs. with whitening.)

Page 36: Analysis of Single-Layer Networks

Performance for Raw and Whitened Inputs

• Feature representations with K = 100, 200, 400, 800, 1200, and 1600

• As expected, all algorithms achieved higher performance when learning more features

Page 37: Analysis of Single-Layer Networks

Performance vs. Feature Stride

• The “stride” s is the spacing between patches at which feature values are extracted

• Number of features fixed at 1600

• Receptive field size of 6 pixels

• Stride varied over 1, 2, 4, and 8

Page 38: Analysis of Single-Layer Networks

Effect of Receptive Field

• Stride = 1; 1600 bases; whitening

• Tested w = 6, 8, and 12

• Overall, the 6-pixel receptive field worked best

• 12 pixels was similar to or worse than 6 or 8 pixels

• Unlike the other parameters, the receptive field size requires cross-validation to make an informed choice

Page 39: Analysis of Single-Layer Networks

Final Classification Results

Page 40: Analysis of Single-Layer Networks

Conclusions

Mean subtraction, scale normalization, and whitening

+ Large K (number of features)

+ Small s (step size or “stride”)

+ Right patch size w (receptive field size)

+ Simple feature learning algorithm (soft K-means)

=

State-of-the-art results on CIFAR-10 and NORB