lecture 23 deep learning: segmentation...hi-res input image: 3 x 800 x 600 with region proposal...

Post on 15-Aug-2020

3 Views

Category:

Documents

0 Downloads

Preview:

Click to see full reader

TRANSCRIPT

COS 429: Computer Vision

Lecture 23Deep Learning: Segmentation

COS429 : 12.12.16 : Andras Ferencz

Thanks: most of these slides shamelessly adapted fromStanford CS231n: Convolutional Neural Networks for Visual Recognition

Fei-Fei Li, Andrej Karpathy, Justin Johnson http://cs231n.stanford.edu/

2 : COS429 : L23 : 12.12.16 : Andras Ferencz Slide Credit:

3 : COS429 : L23 : 12.12.16 : Andras Ferencz Slide Credit:

3

Szegedy et al, Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning, arXiv 2016

4 : COS429 : L23 : 12.12.16 : Andras Ferencz Slide Credit:

4

ClassificationClassification + Localization

Computer Vision Tasks

CAT CAT CAT, DOG, DUCK

Object DetectionInstance

Segmentation

CAT, DOG, DUCK

Single object Multiple objects

5 : COS429 : L23 : 12.12.16 : Andras Ferencz Slide Credit:

5

Simple Recipe for Classification + Localization

Step 2: Attach new fully-connected “regression head” to the network

Image

Convolution and Pooling

Final conv feature map

Fully-connected layers

Class scores

Fully-connected layers

Box coordinates

“Classification head”

“Regression head”

6 : COS429 : L23 : 12.12.16 : Andras Ferencz Slide Credit:

6

Sliding Window: Overfeat

Network input: 3 x 221 x 221

0.5 0.75

Classification scores: P(cat)

Larger image:3 x 257 x 257

7 : COS429 : L23 : 12.12.16 : Andras Ferencz Slide Credit:

7

Sliding Window: Overfeat

Network input: 3 x 221 x 221

0.5 0.75

0.6 0.8

Classification scores: P(cat)

Larger image:3 x 257 x 257

8 : COS429 : L23 : 12.12.16 : Andras Ferencz Slide Credit:

8

Sliding Window: Overfeat

Network input: 3 x 221 x 221

0.5 0.75

0.6 0.8

Classification scores: P(cat)

Larger image:3 x 257 x 257

9 : COS429 : L23 : 12.12.16 : Andras Ferencz Slide Credit:

9

Sliding Window: Overfeat

Network input: 3 x 221 x 221 Classification score:

P(cat)Larger image:3 x 257 x 257

Greedily merge boxes and scores (details in paper)

0.8

10 : COS429 : L23 : 12.12.16 : Andras Ferencz Slide Credit:

10

Sliding Window: Overfeat

In practice use many sliding window locations and multiple scales

Window positions + score maps Box regression outputs Final Predictions

Sermanet et al, “Integrated Recognition, Localization and Detection using Convolutional Networks”, ICLR 2014

11 : COS429 : L23 : 12.12.16 : Andras Ferencz Slide Credit:

11

Efficient Sliding Window: Overfeat

Image: 3 x 221 x 221

Convolution + pooling

Feature map: 1024 x 5 x 5

4096 x 1 x 1 1024 x 1 x 1

5 x 5 conv

5 x 5 conv

1 x 1 conv

4096 x 1 x 1 1024 x 1 x 1

Box coordinates:(4 x 1000) x 1 x 1

Class scores:1000 x 1 x 1

1 x 1 conv

1 x 1 conv 1 x 1 conv

Efficient sliding window by converting fully-connected layers into convolutions

12 : COS429 : L23 : 12.12.16 : Andras Ferencz Slide Credit:

12

Efficient Sliding Window: Overfeat

Training time: Small image, 1 x 1 classifier output

Test time: Larger image, 2 x 2 classifier output, only extra compute at yellow regions

Sermanet et al, “Integrated Recognition, Localization and Detection using Convolutional Networks”, ICLR 2014

13 : COS429 : L23 : 12.12.16 : Andras Ferencz Slide Credit:

13

ClassificationClassification + Localization

Computer Vision Tasks

Instance Segmentation

Object Detection

14 : COS429 : L23 : 12.12.16 : Andras Ferencz Slide Credit:

14

Region Proposals

● Find “blobby” image regions that are likely to contain objects● “Class-agnostic” object detector● Look for “blob-like” regions

15 : COS429 : L23 : 12.12.16 : Andras Ferencz Slide Credit:

15

Region Proposals: Selective Search

Bottom-up segmentation, merging regions at multiple scales

Convert regions to boxes

Uijlings et al, “Selective Search for Object Recognition”, IJCV 2013

16 : COS429 : L23 : 12.12.16 : Andras Ferencz Slide Credit:

16

R-CNN

Girschick et al, “Rich feature hierarchies for accurate object detection and semantic segmentation”, CVPR 2014

Slide credit: Ross Girschick

17 : COS429 : L23 : 12.12.16 : Andras Ferencz Slide Credit:

R-CNN Problems: Slow at test-time due to independent forward passes of the CNN

Solution: Share computation of convolutional layers between proposals for an image

R-CNN Problems: - Post-hoc training: CNN not updated in response to final classifiers and regressors- Complex training pipeline

Solution:Just train the whole system end-to-end all at once!

Fast R-CNN

18 : COS429 : L23 : 12.12.16 : Andras Ferencz Slide Credit:

18

Fast R-CNN: Region of Interest Pooling

Hi-res input image:

3 x 800 x 600with region proposal

Convolution and Pooling

Hi-res conv features:C x H x W

with region proposal

Fully-connected layers

Can back propagate similar to max pooling

RoI conv features:C x h x w

for region proposal

Fully-connected layers expect low-res conv features:

C x h x w

19 : COS429 : L23 : 12.12.16 : Andras Ferencz Slide Credit:

19

Faster R-CNN: TrainingIn the paper: Ugly pipeline- Use alternating optimization to train RPN,

then Fast R-CNN with RPN proposals, etc.- More complex than it has to be

Since publication: Joint training!One network, four losses- RPN classification (anchor good / bad)- RPN regression (anchor -> proposal)- Fast R-CNN classification (over classes)- Fast R-CNN regression (proposal -> box)

Slide credit: Ross Girschick

20 : COS429 : L23 : 12.12.16 : Andras Ferencz Slide Credit:

20

Faster R-CNN: Results

R-CNN Fast R-CNN Faster R-CNN

Test time per image(with proposals)

50 seconds 2 seconds 0.2 seconds

(Speedup) 1x 25x 250x

mAP (VOC 2007) 66.0 66.9 66.9

21 : COS429 : L23 : 12.12.16 : Andras Ferencz Slide Credit:

21

Object Detection State-of-the-art:ResNet 101 + Faster R-CNN + some extras

He et. al, “Deep Residual Learning for Image Recognition”, arXiv 2015

22 : COS429 : L23 : 12.12.16 : Andras Ferencz Slide Credit:

22

ImageNet Detection 2013 - 2015

23 : COS429 : L23 : 12.12.16 : Andras Ferencz Slide Credit:

23

YOLO: You Only Look OnceDetection as Regression

Divide image into S x S grid

Within each grid cell predict:B Boxes: 4 coordinates + confidenceClass scores: C numbers

Regression from image to 7 x 7 x (5 * B + C) tensor

Direct prediction using a CNN

Redmon et al, “You Only Look Once: Unified, Real-Time Object Detection”, arXiv 2015

24 : COS429 : L23 : 12.12.16 : Andras Ferencz Slide Credit:

24

YOLO: You Only Look OnceDetection as Regression

Faster than Faster R-CNN, but not as good

Redmon et al, “You Only Look Once: Unified, Real-Time Object Detection”, arXiv 2015

25 : COS429 : L23 : 12.12.16 : Andras Ferencz Slide Credit:

2525

ClassificationClassification + Localization

Computer Vision Tasks

CAT CAT CAT, DOG, DUCK

Object Detection Segmentation

CAT, DOG, DUCK

Multiple objectsSingle object

26 : COS429 : L23 : 12.12.16 : Andras Ferencz Slide Credit:

Classification

Today

Object DetectionClassification + Localization

Segmentation

Today

27 : COS429 : L23 : 12.12.16 : Andras Ferencz Slide Credit:

Semantic Segmentation

27

Label every pixel!

Don’t differentiate instances (cows)

Classic computer vision problem

Figure credit: Shotton et al, “TextonBoost for Image Understanding: Multi-Class Object Recognition and Segmentation by Jointly Modeling Texture, Layout, and Context”, IJCV 2007

28 : COS429 : L23 : 12.12.16 : Andras Ferencz Slide Credit:

Instance Segmentation

28

Figure credit: Dai et al, “Instance-aware Semantic Segmentation via Multi-task Network Cascades”, arXiv 2015

Detect instances, give category, label pixels

“simultaneous detection and segmentation” (SDS)

Lots of recent work (MS-COCO)

29 : COS429 : L23 : 12.12.16 : Andras Ferencz Slide Credit:

Semantic Segmentation

29

Extract patch

30 : COS429 : L23 : 12.12.16 : Andras Ferencz Slide Credit:

Semantic Segmentation

30

CNN

Extract patch

Run througha CNN

31 : COS429 : L23 : 12.12.16 : Andras Ferencz Slide Credit:

Semantic Segmentation

31

CNN COW

Extract patch

Run througha CNN

Classify center pixel

32 : COS429 : L23 : 12.12.16 : Andras Ferencz Slide Credit:

Semantic Segmentation

32

CNN COW

Extract patch

Run througha CNN

Classify center pixel

Repeat for every pixel

33 : COS429 : L23 : 12.12.16 : Andras Ferencz Slide Credit:

Semantic Segmentation

33

CNN

Run “fully convolutional” network to get all pixels at once

Smaller output due to pooling

34 : COS429 : L23 : 12.12.16 : Andras Ferencz Slide Credit:

Semantic Segmentation: Multi-Scale

34

Farabet et al, “Learning Hierarchical Features for Scene Labeling,” TPAMI 2013

35 : COS429 : L23 : 12.12.16 : Andras Ferencz Slide Credit:

Semantic Segmentation: Multi-Scale

35

Farabet et al, “Learning Hierarchical Features for Scene Labeling,” TPAMI 2013

Resize image to multiple scales

36 : COS429 : L23 : 12.12.16 : Andras Ferencz Slide Credit:

Semantic Segmentation: Multi-Scale

36

Farabet et al, “Learning Hierarchical Features for Scene Labeling,” TPAMI 2013

Resize image to multiple scales

Run one CNN per scale

37 : COS429 : L23 : 12.12.16 : Andras Ferencz Slide Credit:

Semantic Segmentation: Multi-Scale

37

Farabet et al, “Learning Hierarchical Features for Scene Labeling,” TPAMI 2013

Resize image to multiple scales

Run one CNN per scale

Upscale outputsand concatenate

38 : COS429 : L23 : 12.12.16 : Andras Ferencz Slide Credit:

Semantic Segmentation: Multi-Scale

38

Farabet et al, “Learning Hierarchical Features for Scene Labeling,” TPAMI 2013

Resize image to multiple scales

Run one CNN per scale

Upscale outputsand concatenate

External “bottom-up” segmentation

39 : COS429 : L23 : 12.12.16 : Andras Ferencz Slide Credit:

Semantic Segmentation: Multi-Scale

39

Farabet et al, “Learning Hierarchical Features for Scene Labeling,” TPAMI 2013

Resize image to multiple scales

Run one CNN per scale

Upscale outputsand concatenate

External “bottom-up” segmentation

Combine everything for final outputs

40 : COS429 : L23 : 12.12.16 : Andras Ferencz Slide Credit:

Semantic Segmentation: Refinement

40

Pinheiro and Collobert, “Recurrent Convolutional Neural Networks for Scene Labeling”, ICML 2014

Apply CNN once to get labels

41 : COS429 : L23 : 12.12.16 : Andras Ferencz Slide Credit:

Semantic Segmentation: Refinement

41

Pinheiro and Collobert, “Recurrent Convolutional Neural Networks for Scene Labeling”, ICML 2014

Apply CNN once to get labels

Apply AGAIN to refine labels

42 : COS429 : L23 : 12.12.16 : Andras Ferencz Slide Credit:

Semantic Segmentation: Refinement

42

Pinheiro and Collobert, “Recurrent Convolutional Neural Networks for Scene Labeling”, ICML 2014

Apply CNN once to get labels

Apply AGAIN to refine labels

And again!

Same CNN weights:recurrent convolutional network

More iterations improve results

43 : COS429 : L23 : 12.12.16 : Andras Ferencz Slide Credit:

43

Long, Shelhamer, and Darrell, “Fully Convolutional Networks for Semantic Segmentation”, CVPR 2015

Semantic Segmentation: Upsampling

Learnable upsampling!

44 : COS429 : L23 : 12.12.16 : Andras Ferencz Slide Credit:

44

Long, Shelhamer, and Darrell, “Fully Convolutional Networks for Semantic Segmentation”, CVPR 2015

Semantic Segmentation: Upsampling

45 : COS429 : L23 : 12.12.16 : Andras Ferencz Slide Credit:

45

Long, Shelhamer, and Darrell, “Fully Convolutional Networks for Semantic Segmentation”, CVPR 2015

Semantic Segmentation: Upsampling

“skip connections”

46 : COS429 : L23 : 12.12.16 : Andras Ferencz Slide Credit:

46

Long, Shelhamer, and Darrell, “Fully Convolutional Networks for Semantic Segmentation”, CVPR 2015

Semantic Segmentation: Upsampling

Skip connections = Better results

“skip connections”

47 : COS429 : L23 : 12.12.16 : Andras Ferencz Slide Credit:

Learnable Upsampling: “Deconvolution”

47

Typical 3 x 3 convolution, stride 1 pad 1

Input: 4 x 4 Output: 4 x 4

48 : COS429 : L23 : 12.12.16 : Andras Ferencz Slide Credit:

Learnable Upsampling: “Deconvolution”

48

Typical 3 x 3 convolution, stride 1 pad 1

Input: 4 x 4 Output: 4 x 4

Dot product between filter and input

49 : COS429 : L23 : 12.12.16 : Andras Ferencz Slide Credit:

Learnable Upsampling: “Deconvolution”

49

Typical 3 x 3 convolution, stride 1 pad 1

Input: 4 x 4 Output: 4 x 4

Dot product between filter and input

50 : COS429 : L23 : 12.12.16 : Andras Ferencz Slide Credit:

Learnable Upsampling: “Deconvolution”

50

Typical 3 x 3 convolution, stride 2 pad 1

Input: 4 x 4 Output: 2 x 2

51 : COS429 : L23 : 12.12.16 : Andras Ferencz Slide Credit:

Learnable Upsampling: “Deconvolution”

51

Typical 3 x 3 convolution, stride 2 pad 1

Input: 4 x 4 Output: 2 x 2

Dot product between filter and input

52 : COS429 : L23 : 12.12.16 : Andras Ferencz Slide Credit:

Learnable Upsampling: “Deconvolution”

52

Typical 3 x 3 convolution, stride 2 pad 1

Input: 4 x 4 Output: 2 x 2

Dot product between filter and input

53 : COS429 : L23 : 12.12.16 : Andras Ferencz Slide Credit:

Learnable Upsampling: “Deconvolution”

53

3 x 3 “deconvolution”, stride 2 pad 1

Input: 2 x 2 Output: 4 x 4

54 : COS429 : L23 : 12.12.16 : Andras Ferencz Slide Credit:

Learnable Upsampling: “Deconvolution”

54

3 x 3 “deconvolution”, stride 2 pad 1

Input: 2 x 2 Output: 4 x 4

Input gives weight for filter

55 : COS429 : L23 : 12.12.16 : Andras Ferencz Slide Credit:

Learnable Upsampling: “Deconvolution”

55

3 x 3 “deconvolution”, stride 2 pad 1

Input: 2 x 2 Output: 4 x 4

Input gives weight for filter

Sum where output overlaps

Same as backward pass for normal convolution!

“Deconvolution” is a bad name, already defined as “inverse of convolution”

Better names: convolution transpose,backward strided convolution,1/2 strided convolution, upconvolution

56 : COS429 : L23 : 12.12.16 : Andras Ferencz Slide Credit:

Semantic Segmentation: Upsampling

56

Noh et al, “Learning Deconvolution Network for Semantic Segmentation”, ICCV 2015

Normal VGG “Upside down” VGG

6 days of training on Titan X…

57 : COS429 : L23 : 12.12.16 : Andras Ferencz Slide Credit:

Instance Segmentation

57

Figure credit: Dai et al, “Instance-aware Semantic Segmentation via Multi-task Network Cascades”, arXiv 2015

Detect instances, give category, label pixels

“simultaneous detection and segmentation” (SDS)

Lots of recent work (MS-COCO)

58 : COS429 : L23 : 12.12.16 : Andras Ferencz Slide Credit:

Instance Segmentation

58

Hariharan et al, “Simultaneous Detection and Segmentation”, ECCV 2014

Similar to R-CNN, but with segments

59 : COS429 : L23 : 12.12.16 : Andras Ferencz Slide Credit:

Instance Segmentation

59

Hariharan et al, “Simultaneous Detection and Segmentation”, ECCV 2014

External Segment proposals

Similar to R-CNN, but with segments

60 : COS429 : L23 : 12.12.16 : Andras Ferencz Slide Credit:

Instance Segmentation

60

Hariharan et al, “Simultaneous Detection and Segmentation”, ECCV 2014

External Segment proposals

Similar to R-CNN

61 : COS429 : L23 : 12.12.16 : Andras Ferencz Slide Credit:

Instance Segmentation

61

Hariharan et al, “Simultaneous Detection and Segmentation”, ECCV 2014

External Segment proposals

Mask out background with mean image

Similar to R-CNN, but with segments

62 : COS429 : L23 : 12.12.16 : Andras Ferencz Slide Credit:

Instance Segmentation

62

Hariharan et al, “Simultaneous Detection and Segmentation”, ECCV 2014

External Segment proposals

Mask out background with mean image

Similar to R-CNN, but with segments

63 : COS429 : L23 : 12.12.16 : Andras Ferencz Slide Credit:

Instance Segmentation

63

Hariharan et al, “Simultaneous Detection and Segmentation”, ECCV 2014

External Segment proposals

Mask out background with mean image

Similar to R-CNN, but with segments

64 : COS429 : L23 : 12.12.16 : Andras Ferencz Slide Credit:

Instance Segmentation: Cascades

64

Dai et al, “Instance-aware Semantic Segmentation via Multi-task Network Cascades”, arXiv 2015

Similar to Faster R-CNN

Won COCO 2015 challenge (with ResNet)

65 : COS429 : L23 : 12.12.16 : Andras Ferencz Slide Credit:

Instance Segmentation: Cascades

65

Dai et al, “Instance-aware Semantic Segmentation via Multi-task Network Cascades”, arXiv 2015

Similar to Faster R-CNN

Region proposal network (RPN)

Won COCO 2015 challenge (with ResNet)

66 : COS429 : L23 : 12.12.16 : Andras Ferencz Slide Credit:

Instance Segmentation: Cascades

66

Dai et al, “Instance-aware Semantic Segmentation via Multi-task Network Cascades”, arXiv 2015

Similar to Faster R-CNN

Won COCO 2015 challenge (with ResNet)

Region proposal network (RPN)

Reshape boxes to fixed size,figure / ground logistic regression

Mask out background, predict object class

Learn entire model end-to-end!

67 : COS429 : L23 : 12.12.16 : Andras Ferencz Slide Credit:

Instance Segmentation: Cascades

67

Dai et al, “Instance-aware Semantic Segmentation via Multi-task Network Cascades”, arXiv 2015 Predictions Ground truth

68 : COS429 : L23 : 12.12.16 : Andras Ferencz Slide Credit:

Segmentation Overview

Semantic segmentationClassify all pixelsFully convolutional models, downsample then upsampleLearnable upsampling: fractionally strided convolutionSkip connections can help

Instance SegmentationDetect instance, generate maskSimilar pipelines to object detection

68

69 : COS429 : L23 : 12.12.16 : Andras Ferencz Slide Credit:

Quick overview ofOther Topics

69

70 : COS429 : L23 : 12.12.16 : Andras Ferencz Slide Credit:

70

Recurrent Neural Networks (RNN)

Vanilla Neural Networks

71 : COS429 : L23 : 12.12.16 : Andras Ferencz Slide Credit:

71

e.g. Image Captioningimage -> sequence of words

Recurrent Neural Networks (RNN)

72 : COS429 : L23 : 12.12.16 : Andras Ferencz Slide Credit:

72

e.g. Sentiment Classificationsequence of words -> sentiment

Recurrent Neural Networks (RNN)

73 : COS429 : L23 : 12.12.16 : Andras Ferencz Slide Credit:

73

e.g. Machine Translationseq of words -> seq of words

Recurrent Neural Networks (RNN)

74 : COS429 : L23 : 12.12.16 : Andras Ferencz Slide Credit:

74

e.g. Video classification on frame level

Recurrent Neural Networks (RNN)

75 : COS429 : L23 : 12.12.16 : Andras Ferencz Slide Credit:

75

x

RNN

y

76 : COS429 : L23 : 12.12.16 : Andras Ferencz Slide Credit:

train more

train more

train more

Character RNN during training

77 : COS429 : L23 : 12.12.16 : Andras Ferencz Slide Credit:

77

78 : COS429 : L23 : 12.12.16 : Andras Ferencz Slide Credit:

78

Generated C code

79 : COS429 : L23 : 12.12.16 : Andras Ferencz Slide Credit:

79

quote detection cell

Searching for interpretable cells

80 : COS429 : L23 : 12.12.16 : Andras Ferencz Slide Credit:

Multiple Object Recognition with

Visual Attention, Ba et al.

Sequential Processing of fixed inputs

81 : COS429 : L23 : 12.12.16 : Andras Ferencz Slide Credit:

DRAW: A Recurrent

Neural Network For

Image Generation,

Gregor et al.

Sequential Processing of fixed outputs

82 : COS429 : L23 : 12.12.16 : Andras Ferencz Slide Credit:

82

Explain Images with Multimodal Recurrent Neural Networks, Mao et al.Deep Visual-Semantic Alignments for Generating Image Descriptions, Karpathy and Fei-FeiShow and Tell: A Neural Image Caption Generator, Vinyals et al.Long-term Recurrent Convolutional Networks for Visual Recognition and Description, Donahue et al.Learning a Recurrent Visual Representation for Image Caption Generation, Chen and Zitnick

Image Captioning

83 : COS429 : L23 : 12.12.16 : Andras Ferencz Slide Credit:

83

Convolutional Neural Network

Recurrent Neural Network

84 : COS429 : L23 : 12.12.16 : Andras Ferencz Slide Credit:

84

CNN

Image: H x W x 3

Features: L x D

h0

a1

z1

Weighted combination of features

y1

h1

First word

Distribution over L locations

Soft Attention for Captioning

a2 d1

h2

z2 y2Weighted

features: D

Distribution over vocab

Xu et al, “Show, Attend and Tell: Neural Image Caption Generation with Visual Attention”, ICML 2015

85 : COS429 : L23 : 12.12.16 : Andras Ferencz Slide Credit:

Soft Attention for Captioning

85

Xu et al, “Show, Attend and Tell: Neural Image Caption Generation with Visual Attention”, ICML 2015

Soft attention

Hard attention

86 : COS429 : L23 : 12.12.16 : Andras Ferencz Slide Credit:

Soft Attention for Captioning

86

Xu et al, “Show, Attend and Tell: Neural Image Caption Generation with Visual Attention”, ICML 2015

87 : COS429 : L23 : 12.12.16 : Andras Ferencz Slide Credit:

Spatial Transformer Networks

Input image:H x W x 3

Box Coordinates:(xc, yc, w, h)

Cropped and rescaled image:

X x Y x 3

Jaderberg et al, “Spatial Transformer Networks”, NIPS 2015

Can we make this function differentiable?

88 : COS429 : L23 : 12.12.16 : Andras Ferencz Slide Credit:

Spatial Transformer Networks

Input image:H x W x 3

Box Coordinates:(xc, yc, w, h)

Cropped and rescaled image:

X x Y x 3

Can we make this function differentiable?

Jaderberg et al, “Spatial Transformer Networks”, NIPS 2015

Idea: Function mapping pixel coordinates (xt, yt) of output to pixel coordinates (xs, ys) of input

Repeat for all pixels in output to get a sampling grid

Then use bilinear interpolation to compute output

Network attends to input by predicting �

89 : COS429 : L23 : 12.12.16 : Andras Ferencz Slide Credit:

Spatial Transformer Networks

Input: Full image

A smallLocalization network predicts transform �

Grid generator uses to �compute sampling grid

Sampler uses bilinear interpolation to produce output

Output: Region of interest from input

90 : COS429 : L23 : 12.12.16 : Andras Ferencz Slide Credit:

Spatial Transformer Networks

90

Differentiable “attention / transformation” module

Insert spatial transformers into a classification network and it learns to attend and transform the input

top related