Stable and Efficient Representation Learning with Nonnegativity Constraints
Tsung-Han Lin and H.T. Kung


Page 1

Stable and Efficient Representation Learning with Nonnegativity Constraints

Tsung-Han Lin and H.T. Kung

Page 2

Unsupervised Representation Learning

[Figure: a stacked pipeline producing Layer 1, Layer 2, and Layer 3 representations; each layer learns a large dictionary and encodes its input with a sparse encoder (e.g., l1-regularized sparse coding)]

Page 3

Why Sparse Representations?

• Prior knowledge is better encoded into sparse representations
  – Data is explained by only a few underlying factors
  – Representations are more linearly separable

[Figure: data points plotted against Feature A and Feature B, illustrating linear separability of sparse representations]

• Simplifies supervised classifier training: sparse representations work well even when labeled samples are few

Page 4

Computing Sparse Representations

Sparse approximation:

[Figure: an image patch approximated as 0.5 × one dictionary atom + 0.3 × another]

Page 5

Computing Sparse Representations

Sparse approximation:

• L1 relaxation approach: good classification accuracy, but computation is expensive

• Greedy approach (e.g., orthogonal matching pursuit): fast, but yields suboptimal classification accuracy
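In symbols, with z the sparse code, λ the sparsity penalty, and k the target number of atoms, the two approaches target:

```latex
% L1 relaxation (convex surrogate):
\min_z \; \lVert x - Dz \rVert_2^2 + \lambda \lVert z \rVert_1

% Greedy approach (OMP approximately solves):
\min_z \; \lVert x - Dz \rVert_2^2 \quad \text{s.t.} \quad \lVert z \rVert_0 \le k
```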

CIFAR-10 classification with a single-layer architecture [Coates 2011]:

    Encoder                       L1-regularized   OMP
    Classification accuracy (%)   78.7             76.0

Page 6

Major Findings

• Weak stability is the key cause of OMP’s suboptimal performance

• By allowing only additive features (via nonnegativity constraints), classification with OMP delivers higher accuracy by a large margin

• Classification accuracy competitive with deep neural networks

Page 7

Stability of Representations

[Figure: a data input x and its perturbed version x + n pass through the encoder; how different are the resulting representations?]

Page 8

Orthogonal Matching Pursuit (OMP)

Select k atoms from a dictionary D that minimize |x − Dz|

Step 1: Select the atom that has the largest correlation with the residual

[Figure: input x and atoms d1, d2, d3; the support set becomes {d1}]

Page 9

Orthogonal Matching Pursuit (OMP)

Select k atoms from a dictionary D that minimize |x − Dz|

Step 2: Estimate the coefficients of the selected atoms by least squares, giving the estimate Dz(1)

Step 3: Update the residual using the current estimate: r(1) = x − Dz(1)

[Figure: support set {d1}; the residual r(1) is what remains of x after subtracting Dz(1)]

Page 10

Orthogonal Matching Pursuit (OMP)

Select k atoms from a dictionary D that minimize |x − Dz|

Repeat the selection/estimation/update steps until k atoms are chosen

[Figure: the support set grows to {d1, d3}]
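To make the select/estimate/update loop concrete, here is a minimal NumPy sketch of OMP as described on these slides (the function name and array layout are illustrative assumptions, not the authors' code; dictionary atoms are the unit-norm columns of D):

```python
import numpy as np

def omp(x, D, k):
    """Greedy sparse coding: pick k atoms (columns of D) to minimize
    ||x - Dz||_2, refitting coefficients by least squares each step."""
    residual = x.astype(float).copy()
    support, coeffs = [], np.array([])
    z = np.zeros(D.shape[1])
    for _ in range(k):
        # 1. Select the atom most correlated with the current residual.
        best = int(np.argmax(np.abs(D.T @ residual)))
        if best in support:  # nothing new to add; stop early
            break
        support.append(best)
        # 2. Re-estimate coefficients of all selected atoms jointly.
        coeffs, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        # 3. Update the residual using the current estimate Dz.
        residual = x - D[:, support] @ coeffs
    z[support] = coeffs
    return z
```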

Page 11

Nonnegative OMP

Use only additive features by constraining the atoms and coefficients to be nonnegative

1. Larger region for noise tolerance
2. Terminate without overfitting

[Figure: with standard OMP, adding a small noise n to the residual can flip the selected atom from “+d1” to “−d2”; restricting selections to nonnegative combinations of d1 and d2 leaves a larger tolerance margin δ around the residual]
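A sketch of the nonnegative variant under the same assumptions: selection considers only positively correlated atoms, the coefficient fit uses nonnegative least squares (here via scipy.optimize.nnls), and the loop stops early once no atom correlates positively with the residual; the exact stopping rule in the paper may differ.

```python
import numpy as np
from scipy.optimize import nnls

def nomp(x, D, k):
    """Nonnegative OMP: only additive features, so coefficients are
    constrained to be >= 0 and selection ignores negative correlations."""
    residual = x.astype(float).copy()
    support, coeffs = [], np.array([])
    z = np.zeros(D.shape[1])
    for _ in range(k):
        correlations = D.T @ residual
        best = int(np.argmax(correlations))
        # Terminate without overfitting: stop when no atom has any
        # positive correlation left to explain.
        if correlations[best] <= 0 or best in support:
            break
        support.append(best)
        coeffs, _ = nnls(D[:, support], x)  # nonnegative least squares
        residual = x - D[:, support] @ coeffs
    z[support] = coeffs
    return z
```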

Page 12

Allowing Only Additive Features

[Figure: an input reconstructed as the sum of two signed features whose opposite-signed components cancel each other (“cancellation”)]

Page 13

Allowing Only Additive Features

Enforce nonnegativity to eliminate cancellation:

• On input: sign splitting into a “+” channel and a “−” channel, e.g., (3, −2, −1) → (3, 0, 0) and (0, 2, 1)

• On dictionary: any nonnegative sparse coding algorithm works; we use spherical K-means

• On representation: encode with nonnegative OMP (NOMP), as sketched below
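A minimal sketch of the first two ingredients (function names and the row-per-sample layout are my own; the slide's dictionary-learning details beyond "spherical K-means" are not specified here):

```python
import numpy as np

def sign_split(x):
    """Sign splitting: separate a signed vector into nonnegative
    "+" and "-" channels, e.g., (3, -2, -1) -> (3, 0, 0 | 0, 2, 1)."""
    return np.concatenate([np.maximum(x, 0), np.maximum(-x, 0)])

def spherical_kmeans(X, n_atoms, n_iter=10, seed=0):
    """Dictionary learning with spherical K-means: unit-norm atoms,
    max-dot-product (cosine) assignment, renormalized-mean updates.
    With nonnegative inputs (e.g., after sign_split), the learned
    atoms stay nonnegative. X holds one sample per row."""
    rng = np.random.default_rng(seed)
    D = X[rng.choice(len(X), size=n_atoms, replace=False)].astype(float)
    D /= np.linalg.norm(D, axis=1, keepdims=True) + 1e-12
    for _ in range(n_iter):
        assign = np.argmax(X @ D.T, axis=1)  # nearest atom by cosine
        for j in range(n_atoms):
            members = X[assign == j]
            if len(members) > 0:
                mean = members.mean(axis=0)
                D[j] = mean / (np.linalg.norm(mean) + 1e-12)
    return D

print(sign_split(np.array([3, -2, -1])))  # [3 0 0 0 2 1], as on the slide
```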

Page 14

Evaluate the Stability of Representations

    Rotation angle δ    0      0.01π   0.02π   0.03π   0.04π
    OMP                 1      0.71    0.54    0.43    0.34
    NOMP                1      0.92    0.80    0.68    0.57

[Figure: experiment pipeline. Grating A is encoded by OMP/NOMP, using a feature dictionary learned from image datasets, to produce representation A; grating B, obtained by rotating grating A by some small angle δ, is encoded the same way to produce representation B; the change is measured by the correlation between the two representations.]
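A minimal sketch of this measurement, assuming the change is scored by Pearson correlation between the two codes; `encode` stands for any encoder such as the `omp`/`nomp` sketches above, and `grating` is a hypothetical stimulus generator:

```python
import numpy as np

def stability(encode, x_a, x_b):
    """Correlation between the representations of a stimulus and a
    slightly perturbed version of it; 1.0 means perfectly stable."""
    z_a, z_b = encode(x_a), encode(x_b)
    return np.corrcoef(z_a, z_b)[0, 1]

# e.g., stability(lambda v: nomp(v, D, k=8), grating(0.0), grating(0.01 * np.pi))
```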

Page 15

Classification: NOMP vs OMP

Classification accuracy on CIFAR-10: NOMP has a ~3% improvement over OMP

Page 16

NOMP Outperforms OMP When Fewer Labeled Samples Are Available

Classification accuracy on CIFAR-10 with fewer labeled training samples

Page 17

STL-10: 10 classes, 100 labeled samples/class, 96×96 images
(airplane, bird, car, cat, deer, dog, horse, monkey, ship, truck)

    Hierarchical matching pursuit (2012)   64.5%
    This work                              67.9%

CIFAR-100: 100 classes, 500 labeled samples/class, 32×32 images
(superclasses: aquatic mammals, fish, flowers, food containers, fruit and vegetables, household electrical devices, household furniture, insects, large carnivores, large man-made outdoor things, large natural outdoor scenes, large omnivores and herbivores, medium-sized mammals, non-insect invertebrates, people, reptiles, small mammals, trees, vehicles)

    Maxout network (2013)   61.4%
    This work               60.1%

Page 18

Conclusion

• A greedy sparse encoder is useful: it gives a scalable unsupervised representation learning pipeline that attains state-of-the-art classification performance

• The proper choice of encoder is critical: the stability of the encoder is key to the quality of the representations