Fully Convolutional Networks for Semantic Segmentation
Jonathan Long, Evan Shelhamer, Trevor Darrell
Presenter: Hannah Li
Convolutional Networks: fully connected layers have fixed dimensions and throw away spatial coordinates
Fully Convolutional Networks
Fully Convolutional Networks
x_{i,j} - data vector at location (i, j) for a particular layer
y_{i,j} - data vector at location (i, j) for the following layer
k - kernel (filter) size
s - stride
f_{ks} - determines the layer type:
- matrix multiplication for convolution or average pooling
- spatial max for max pooling
- elementwise nonlinearity for an activation function
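For reference, the layer form that these symbols describe is written in the FCN paper (Long, Shelhamer, Darrell) as follows. The index offsets \delta i and \delta j are the paper's notation and do not appear in the slide text:

$$
y_{ij} = f_{ks}\left(\{x_{si+\delta i,\; sj+\delta j}\}_{0 \le \delta i,\, \delta j \le k}\right)
$$

Each output location depends only on a k-by-k window of the previous layer, sampled with stride s.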
Stochastic gradient descent on the loss over a whole image is the same as stochastic gradient descent on the per-location loss, taking all of the final layer's receptive fields as a minibatch
Output dimensions are reduced by subsampling to keep filters small and computational requirements reasonable
However, we need dense, per-pixel predictions…
Solution: Deconvolution/Upsampling
Figures: stride = 1, stride = 2 (from https://github.com/vdumoulin/conv_arithmetic)
http://tutorial.caffe.berkeleyvision.org/caffe-cvpr15-pixels.pdf
Deconvolution
Bilinear interpolation
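A minimal sketch of how a deconvolution (transposed convolution) with a fixed bilinear kernel upsamples a coarse map. The function names `bilinear_kernel` and `transposed_conv2d` are hypothetical, the loop is a naive single-channel version for illustration, and in a real network the kernel could also be learned rather than fixed:

```python
import numpy as np

def bilinear_kernel(size):
    """Return a (size, size) bilinear interpolation kernel."""
    factor = (size + 1) // 2
    center = factor - 1 if size % 2 == 1 else factor - 0.5
    rows, cols = np.ogrid[:size, :size]
    return (1 - abs(rows - center) / factor) * (1 - abs(cols - center) / factor)

def transposed_conv2d(x, kernel, stride):
    """Naive single-channel transposed convolution (assumed square kernel)."""
    h, w = x.shape
    k = kernel.shape[0]
    out = np.zeros(((h - 1) * stride + k, (w - 1) * stride + k))
    for i in range(h):
        for j in range(w):
            # Each input value "paints" a scaled copy of the kernel
            # into the output, shifted by the stride.
            out[i * stride:i * stride + k, j * stride:j * stride + k] += x[i, j] * kernel
    return out

coarse = np.ones((3, 3))                       # toy coarse score map
up = transposed_conv2d(coarse, bilinear_kernel(4), stride=2)
print(up.shape)                                # (8, 8)
```

With a 4x4 bilinear kernel and stride 2, a constant input stays constant away from the border, which is the behaviour expected of 2x bilinear interpolation.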
From classifier to dense FCN
1. Discard the final classifier layer
2. Convert all fully connected layers to convolutions
3. Append a convolution with channel dimension 21 to predict scores for the PASCAL classes
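Step 2 can be sketched as follows: a fully connected layer that expects a C x K x K input is the same computation as a convolution with a K x K kernel, so reshaping its weights gives a layer that slides over larger inputs. All sizes and names here (`C`, `K`, `N`, `fc`, `conv_valid`) are assumptions for illustration, not the networks from the talk:

```python
import numpy as np

rng = np.random.default_rng(0)
C, K, N = 3, 4, 5                      # channels, expected input size, output units
W_fc = rng.standard_normal((N, C * K * K))
b = rng.standard_normal(N)

def fc(x):
    """Original fully connected layer: only accepts a (C, K, K) input."""
    return W_fc @ x.reshape(-1) + b

# Same weights, viewed as N convolution filters of shape (C, K, K).
W_conv = W_fc.reshape(N, C, K, K)

def conv_valid(x, w, bias):
    """Naive 'valid' convolution (no padding, stride 1) over a (C, H, W) input."""
    n, _, k, _ = w.shape
    out_h, out_w = x.shape[1] - k + 1, x.shape[2] - k + 1
    out = np.empty((n, out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = x[:, i:i + k, j:j + k]
            out[:, i, j] = np.tensordot(w, patch, axes=3) + bias
    return out

x_big = rng.standard_normal((C, 6, 7))       # larger than the classifier expects
score_map = conv_valid(x_big, W_conv, b)     # one score vector per location
print(score_map.shape)                       # (5, 3, 4)
```

Each location of `score_map` equals what the original fully connected layer would output on the corresponding K x K crop, computed in one pass instead of crop by crop.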
FCN for segmentation
Technique: combining the final prediction layer with lower layers that have finer strides
Final-layer output alone: not detailed enough
Combining fine layers and coarse layers lets the model make local predictions that respect global structure.
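A rough sketch of the fusion step, assuming class scores are already available from a coarse layer and from a finer layer at twice the resolution. Nearest-neighbour upsampling via `repeat` stands in for the upsampling layer to keep the example short, and the names `upsample2x` and `fuse` are hypothetical:

```python
import numpy as np

def upsample2x(scores):
    """Nearest-neighbour 2x upsampling of a (classes, H, W) score map (a stand-in)."""
    return scores.repeat(2, axis=1).repeat(2, axis=2)

def fuse(coarse_scores, fine_scores):
    """Sum upsampled coarse scores with scores predicted from a finer layer."""
    return upsample2x(coarse_scores) + fine_scores

num_classes = 21
coarse = np.arange(num_classes * 2 * 2, dtype=float).reshape(num_classes, 2, 2)
fine = np.zeros((num_classes, 4, 4))

fused = fuse(coarse, fine)
labels = fused.argmax(axis=0)          # per-location class prediction
print(fused.shape, labels.shape)       # (21, 4, 4) (4, 4)
```

Summation keeps the coarse layer's semantic evidence while letting the finer layer adjust scores at individual locations.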
Results
20% relative improvement in mean IoU over SDS*
286 times faster inference than SDS*
*SDS: Simultaneous detection and segmentation. ECCV 2014
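For readers unfamiliar with the metric, here is a minimal sketch of mean IoU computed from a confusion matrix. The function name `mean_iou` is hypothetical, and averaging only over classes that occur in the prediction or the ground truth is an assumption of this sketch, not necessarily the benchmark's exact protocol:

```python
import numpy as np

def mean_iou(pred, target, num_classes):
    """Mean intersection over union for integer label maps of equal shape."""
    # conf[t, p] counts pixels of true class t predicted as class p.
    conf = np.bincount(
        num_classes * target.ravel() + pred.ravel(),
        minlength=num_classes ** 2,
    ).reshape(num_classes, num_classes)
    intersection = np.diag(conf)
    union = conf.sum(axis=0) + conf.sum(axis=1) - intersection
    present = union > 0                 # skip classes absent from both maps
    return (intersection[present] / union[present]).mean()

target = np.array([[0, 0], [1, 1]])
pred = np.array([[0, 1], [1, 1]])
score = mean_iou(pred, target, num_classes=2)
print(score)                            # class 0: 1/2, class 1: 2/3
```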
Conclusions
• “Fully convolutional” networks take inputs of any size
• Adapt AlexNet, VGG net, GoogLeNet into fully convolutional networks
• Fine-tune them to the segmentation task
• New architecture: combines semantic info from coarse layer with info from shallow, fine layer
• 20% relative improvement to 62.2% mean IU (PASCAL VOC 2012)
Extras
Fully Convolutional Networks
x_{i,j} - data vector at location (i, j) for a particular layer
y_{i,j} - data vector at location (i, j) for the following layer
k - kernel (filter) size
s - stride
f_{ks} - determines the layer type:
- matrix multiplication for convolution or average pooling
- spatial max for max pooling
- elementwise nonlinearity for an activation function
- Kernel size and stride for this functional form obey the transformation rule under composition
- Computes a nonlinear filter (as opposed to a general nonlinear function)
FCN naturally operates on an input of any size!
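The FCN paper states the composition rule as f_{ks} ∘ g_{k's'} = (f ∘ g)_{k' + (k - 1)s', ss'}. A small sketch that checks this rule for max pooling and shows the same layer running on inputs of different sizes; the helper names `layer` and `compose` are hypothetical:

```python
import numpy as np

def layer(x, f, k, s):
    """Generic layer: y[i, j] = f(k x k window of x starting at (s*i, s*j))."""
    out_h = (x.shape[0] - k) // s + 1
    out_w = (x.shape[1] - k) // s + 1
    return np.array([[f(x[s * i:s * i + k, s * j:s * j + k])
                      for j in range(out_w)] for i in range(out_h)])

def compose(k, s, k_prime, s_prime):
    """Kernel size and stride of f_{k,s} applied after g_{k',s'}."""
    return k_prime + (k - 1) * s_prime, s * s_prime

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8))

# Two stacked 2x2, stride-2 max pools ...
stacked = layer(layer(x, np.max, 2, 2), np.max, 2, 2)
# ... behave like one max pool with the composed kernel size and stride.
k, s = compose(2, 2, 2, 2)
single = layer(x, np.max, k, s)

# The same layer definition accepts any input size.
other = layer(rng.standard_normal((12, 16)), np.max, 2, 2)
print((k, s), stacked.shape, other.shape)   # (4, 4) (2, 2) (6, 8)
```

Nothing in `layer` fixes the input size, which is the property the slide highlights.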
Loss Function
• Loss function is a sum over the spatial dimension of the final layer
• Gradient of loss function is a sum over the gradients of each of its spatial components
• Takes all of the final layer receptive fields as a minibatch
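The bullets above can be illustrated with a per-pixel loss, here assumed to be softmax cross-entropy summed over all spatial locations of the final score map. The function name and the toy shapes are assumptions of this sketch:

```python
import numpy as np

def pixelwise_cross_entropy(scores, labels):
    """Sum of softmax cross-entropy over every spatial location.

    scores: (num_classes, H, W) raw class scores from the final layer
    labels: (H, W) integer class index for each pixel
    Returns the total loss and the (H, W) map of per-location losses.
    """
    shifted = scores - scores.max(axis=0, keepdims=True)   # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=0, keepdims=True))
    rows, cols = np.indices(labels.shape)
    per_pixel = -log_probs[labels, rows, cols]
    return per_pixel.sum(), per_pixel

scores = np.zeros((21, 4, 5))            # uniform scores over 21 classes
labels = np.zeros((4, 5), dtype=int)
total, per_pixel = pixelwise_cross_entropy(scores, labels)
print(per_pixel.shape)                   # (4, 5)
```

Because the total is a plain sum of per-location terms, its gradient is the sum of the per-location gradients, so one image acts like a minibatch of all its receptive fields.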
The receptive fields overlap significantly -> more efficient!
More than 5 times faster than running AlexNet patch by patch
Spatial output maps are a natural fit for dense problems like semantic segmentation