CS484/694: Intro to Computer Vision Pooling & Architectures 9/19/17 Agastya Kalra 1


CS484/694: Intro to Computer Vision - University of Waterloo

Pooling & Architectures

9/19/17 Agastya Kalra


Outline

• Review

• Pooling

• Architectures


Convolutional Layer


http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture5.pdf


What is a conv feature?

• Description of a region of the image


Motivation

• We care more about knowing that a feature is present than exactly where it is

• E.g., we don’t need the exact pixels of Obama’s eyes and nose to know it is Obama’s face


Solution: Pooling

• Replaces each activation with a summary statistic of its nearby activations


http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture5.pdf


Example: Max Pooling


http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture5.pdf
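As a rough illustration of max pooling, here is a minimal NumPy sketch. The function name `max_pool2d`, the 4x4 input values, and the choice of a 2x2 window with stride 2 are assumptions made for the example, not taken from the slide's figure.

```python
import numpy as np

def max_pool2d(x, size=2, stride=2):
    """Max-pool a 2D array of shape (H, W) with a square window and no padding."""
    h, w = x.shape
    out_h = (h - size) // stride + 1
    out_w = (w - size) // stride + 1
    out = np.empty((out_h, out_w), dtype=x.dtype)
    for i in range(out_h):
        for j in range(out_w):
            # Each output value summarizes one window by its maximum.
            window = x[i * stride:i * stride + size,
                       j * stride:j * stride + size]
            out[i, j] = window.max()
    return out

# Hypothetical 4x4 activation map.
x = np.array([[1, 1, 2, 4],
              [5, 6, 7, 8],
              [3, 2, 1, 0],
              [1, 2, 3, 4]])
pooled = max_pool2d(x)
print(pooled)
```

Each 2x2 block of the input collapses to a single value, so the 4x4 map becomes 2x2 while the strongest activation in each block is kept.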


Benefits Of Pooling

• Some translation invariance
• No parameters
• Easy to backprop
• Fewer computations
• Increased receptive field


Where is pooling used?

• After a series of convs
• Followed by a doubling of the depth (number of filters)


Drawbacks of Pooling

• Loss of information


Pooling Layer Summary


http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture5.pdf


Architectures for ImageNet


http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture9.pdf


AlexNet - 2012

• Input Image 227x227x3
• First Layer: 96 11x11 Conv filters, stride 4, pad 0
• What is the output size? 55x55x96
• How many parameters? 11x11x3x96 = 34,848
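The first-layer numbers can be checked with the usual output-size formula, (W - F + 2P) / S + 1. The sketch below is only a sanity check; the helper names `conv_output_size` and `conv_weight_count` are made up for this example. Note that the count of 34,848 covers the filter weights only; adding one bias per filter (an assumption about how the layer is implemented) gives 96 more.

```python
def conv_output_size(w, f, stride, pad):
    """Spatial output size of a conv layer: (W - F + 2P) / S + 1."""
    return (w - f + 2 * pad) // stride + 1

def conv_weight_count(f, in_depth, num_filters):
    """Number of filter weights, excluding biases."""
    return f * f * in_depth * num_filters

# AlexNet first layer: 227x227x3 input, 96 filters of 11x11, stride 4, pad 0.
out = conv_output_size(227, 11, stride=4, pad=0)
weights = conv_weight_count(11, in_depth=3, num_filters=96)
biases = 96  # one per filter, if biases are used

print(f"output: {out}x{out}x96")
print(f"weights: {weights}, with biases: {weights + biases}")
```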

• First Layer Output 55x55x96
• Second Layer: 2x2 Max Pooling, stride 2
• What is the output size? 27x27x96
• What is the number of params? 0
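The pooling output size follows the same formula as a conv layer with no padding, rounded down: floor((W - F) / S) + 1. A small sketch, with the hypothetical helper name `pool_output_size`, checks the 27x27x96 answer. The original AlexNet paper describes overlapping 3x3 windows with stride 2; that setting yields the same spatial size, which the last line also checks.

```python
def pool_output_size(w, size, stride):
    """Spatial output size of a pooling layer without padding."""
    return (w - size) // stride + 1

# 55x55x96 input, 2x2 window, stride 2 (as on the slide).
side = pool_output_size(55, size=2, stride=2)
# Pooling acts on each channel independently, so the depth stays at 96.
output_shape = (side, side, 96)
print(output_shape)

# A 3x3 window with stride 2 gives the same spatial size.
side_3x3 = pool_output_size(55, size=3, stride=2)
```

Pooling has no learned weights, which is why its parameter count is 0.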

• First use of dropout, ReLU
• Heavy data augmentation
• SGD + Momentum, lr=1e-2, momentum=0.9, l2=5e-4
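To make the optimizer settings concrete, here is a minimal sketch of one SGD + momentum update with an L2 penalty, using the hyperparameters listed above as defaults. The function `sgd_momentum_step` and the toy quadratic objective are assumptions for illustration; this is not AlexNet's training code.

```python
import numpy as np

def sgd_momentum_step(w, v, grad, lr=1e-2, momentum=0.9, l2=5e-4):
    """One SGD + momentum update; the L2 penalty adds l2 * w to the gradient."""
    grad = grad + l2 * w
    v = momentum * v - lr * grad
    return w + v, v

# Toy problem (hypothetical): minimize 0.5 * ||w - target||^2.
target = np.array([1.0, -2.0])
w = np.zeros(2)
v = np.zeros(2)
for _ in range(500):
    grad = w - target
    w, v = sgd_momentum_step(w, v, grad)

print(w)
```

On this toy objective the iterate settles slightly short of `target`, at `target / (1 + l2)`, because the L2 term pulls the weights toward zero.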

• GPUs were too small, so the network was split in half across two GPUs


VGGNet 2014

• First simple widely used net
• Smaller filters and deeper

What is the receptive field of a 7x7 conv layer?

7x7

What is the receptive field of three 3x3 conv layers?

Also 7x7
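The receptive-field answers can be reproduced with a short calculation: each layer adds (kernel - 1) times the accumulated stride to the receptive field. The helper `receptive_field` below is a sketch written for this example and assumes square kernels without dilation.

```python
def receptive_field(layers):
    """Receptive field (side length, in input pixels) of stacked conv layers.

    layers: list of (kernel_size, stride) tuples, first layer first.
    """
    rf, jump = 1, 1
    for kernel, stride in layers:
        rf += (kernel - 1) * jump  # each layer widens the view by (k - 1) * jump
        jump *= stride             # spacing between adjacent outputs, in input pixels
    return rf

single_7x7 = receptive_field([(7, 1)])
three_3x3 = receptive_field([(3, 1)] * 3)
print(single_7x7, three_3x3)
```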

How many params does a 7x7 layer with depth D have vs three 3x3 layers of depth D?

7*7*D*D = 49D^2

3*(3*3*D*D) = 27D^2
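A quick numeric check of the two counts, using D = 64 as an arbitrary example depth (the value is an assumption; the 49:27 ratio holds for any D). Biases are ignored, as in the formulas above.

```python
def conv_weights(kernel, depth_in, depth_out):
    """Weights in one conv layer with square kernels, excluding biases."""
    return kernel * kernel * depth_in * depth_out

D = 64  # hypothetical depth, same for input and output
one_7x7 = conv_weights(7, D, D)        # 49 * D^2
three_3x3 = 3 * conv_weights(3, D, D)  # 27 * D^2

print(one_7x7, three_3x3, three_3x3 / one_7x7)
```

The stack of three 3x3 layers covers the same 7x7 receptive field with roughly 55% of the weights.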

How many non-linearities in a 7x7 layer vs three 3x3 layers?

1 vs 3

• Similar training procedure
• Good out-of-the-box model + weights
• Downsamples input by 32 and then applies fully connected layers

ResNet 2015

• Let's go much deeper!
• Why not just add many convs?


• Training curves for plain models

Why is deeper worse???


• Training deep networks is a hard optimization problem


• Solution: Identity Mappings


• Normal Networks: y = f(g(h(x)))
• Residual Networks: y = f(g(h(x) + x) + h(x) + x) + g(h(x) + x) + h(x) + x
• There is always direct gradient flow to x


• Showed continual improvement with increased depth


• Great default
• Final layer does global average pooling
• Starts with a 7x7x64 conv followed by a 3x3 max pool, stride 2 (standard input processing)
• Uses Conv - Batch Norm - ReLU
• No dropout, no other pooling
• Uses 1x1 convs to reduce the depth, and then 3x3s
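Global average pooling reduces each channel of the final feature map to a single number by averaging over all spatial positions. A minimal NumPy sketch follows; the 7x7x4 feature map is a toy example chosen for readability, not ResNet's actual final shape.

```python
import numpy as np

def global_average_pool(features):
    """Average an (H, W, C) feature map over H and W, giving a length-C vector."""
    return features.mean(axis=(0, 1))

# Hypothetical 7x7 feature map with 4 channels.
features = np.arange(7 * 7 * 4, dtype=float).reshape(7, 7, 4)
pooled = global_average_pool(features)
print(pooled.shape, pooled)
```

Because the output length depends only on the number of channels, the classifier that follows does not depend on the spatial size of the feature map.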


Architectures Summary

• Deeper is better
• Focus on gradient flow
• Conv - Batch Norm - ReLU
• Compositions of small filters are key: 3x3, 1x1
• Pretrained ImageNet weights are very good
• Keras: https://github.com/fchollet/keras/tree/master/keras/applications
