
Post on 09-Jul-2020


Ahmed Osman

Deep End2End Voxel2Voxel Prediction

Du Tran, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, Manohar Paluri

Presented by: Ahmed Osman

• Problems

– Video Semantic Segmentation

– Optical Flow Estimation

– Video Coloring

• Related Work

• Contribution

• Method

• Experiments and Results

• Conclusion

Outline


• Semantic Segmentation

Video Semantic Segmentation

http://jamie.shotton.org/work/images/resear6.png

• Video Semantic Segmentation

Video Semantic Segmentation

http://jamie.shotton.org/work/images/resear6.png


Optical Flow Estimation

http://www.cvlibs.net/projects/objectsceneflow/showcase.jpg

A Filter Formulation for Computing Real Time Optical Flow, Adarve et al. https://www.youtube.com/watch?v=_oW1vMdBMuY


Video Coloring

http://images.mentalfloss.com/sites/default/files/styles/article_640x430/public/colorizing-movies_6.jpg


Traditional Computer Vision Pipeline

• Motivation

– “Convolutional Neural Networks (CNN) are biologically-inspired variants of MLPs.”

– “Revolutionized the traditional computer vision pipeline”

– Re-popularized by Krizhevsky et al. in 2012 by producing state-of-the-art results on the ImageNet dataset (Image Classification).

– Why was AlexNet successful?

• Large labeled datasets

• GPU Computing

Convolutional Neural Networks

ConvNets

• Convolution

ConvNets

https://developer.apple.com/library/ios/documentation/Performance/Conceptual/vImage/ConvolutionOperations/ConvolutionOperations.html
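As a sketch of the operation shown here: a 2D convolution slides a small kernel over the image and takes a weighted sum at each position (strictly, CNNs compute cross-correlation, i.e. without flipping the kernel). The image and box-blur kernel below are purely illustrative.

```python
import numpy as np

def convolve2d(image, kernel):
    """Slide a kernel over an image (valid mode, no padding, no kernel flip)."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            # Weighted sum of the patch under the kernel
            out[y, x] = np.sum(image[y:y + kh, x:x + kw] * kernel)
    return out

# A 3x3 averaging kernel applied to a 4x4 image yields a 2x2 output
image = np.arange(16, dtype=float).reshape(4, 4)
kernel = np.ones((3, 3)) / 9.0
print(convolve2d(image, kernel).shape)  # (2, 2)
```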

• Convolution Layer

ConvNets

http://cs231n.github.io/convolutional-networks/
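The spatial arithmetic of a convolution layer follows the standard formula from the CS231n notes linked above: output size = (W − F + 2P)/S + 1 for input width W, filter size F, padding P, stride S. A small sketch:

```python
def conv_output_size(w, f, p, s):
    """Spatial output size of a conv layer: (W - F + 2P) / S + 1."""
    assert (w - f + 2 * p) % s == 0, "hyperparameters must tile the input evenly"
    return (w - f + 2 * p) // s + 1

# AlexNet's first layer: 227x227 input, 11x11 filters, stride 4, no padding
print(conv_output_size(227, 11, 0, 4))  # 55
```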

• Activation function

ConvNets

• Activation function

– Rectified Linear Unit (ReLU)

• Mitigates the vanishing gradient problem

• Non-linear

ConvNets
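A minimal sketch of ReLU and its gradient: the gradient is exactly 1 on the positive side, which is why ReLU avoids the saturation that causes vanishing gradients with sigmoid/tanh units.

```python
import numpy as np

def relu(x):
    """ReLU(x) = max(0, x): zero for negative inputs, identity otherwise."""
    return np.maximum(0.0, x)

def relu_grad(x):
    """Gradient is 1 where x > 0, else 0 -- it never shrinks toward zero
    for positive activations, unlike sigmoid/tanh."""
    return (x > 0).astype(float)

x = np.array([-2.0, -0.5, 0.0, 1.5])
print(relu(x))       # only 1.5 survives
print(relu_grad(x))  # gradient passes through only at the positive entry
```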

• Pooling

ConvNets
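A minimal sketch of non-overlapping 2×2 max pooling, the most common variant; the example input is illustrative.

```python
import numpy as np

def max_pool2x2(x):
    """Non-overlapping 2x2 max pooling: halves each spatial dimension."""
    h, w = x.shape
    # Group pixels into 2x2 blocks, then take the max inside each block
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

x = np.array([[1., 2., 5., 6.],
              [3., 4., 7., 8.],
              [0., 1., 1., 0.],
              [2., 0., 0., 3.]])
print(max_pool2x2(x))
# [[4. 8.]
#  [2. 3.]]
```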

• Fully Connected Layer

ConvNets

• How to determine the weights?

– Learn them using backpropagation

ConvNets

• Loss Function

– Softmax

– Huber

– L2

ConvNets

Green: Huber, Blue: L2
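The difference between the Huber and L2 losses can also be seen numerically: L2 grows quadratically with the residual, while Huber switches to linear growth beyond a threshold delta (delta = 1 below is an assumed value).

```python
import numpy as np

def l2_loss(r):
    """Quadratic everywhere: large residuals dominate the objective."""
    return 0.5 * r ** 2

def huber_loss(r, delta=1.0):
    """Quadratic near zero, linear for |r| > delta: robust to outliers."""
    return np.where(np.abs(r) <= delta,
                    0.5 * r ** 2,
                    delta * (np.abs(r) - 0.5 * delta))

r = np.array([0.5, 1.0, 4.0])
print(l2_loss(r))     # grows quadratically
print(huber_loss(r))  # identical near zero, linear for the large residual
```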


• How to determine the weights?

– Learn them using backpropagation

– Chain Rule

ConvNets
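The chain rule can be made concrete with the worked example from the cited CS231N lecture, f(x, y, z) = (x + y)z:

```python
# Forward pass: f(x, y, z) = (x + y) * z
x, y, z = -2.0, 5.0, -4.0
q = x + y          # intermediate node q = 3
f = q * z          # f = -12

# Backward pass via the chain rule: df/dx = df/dq * dq/dx, etc.
df_dq = z          # d(q*z)/dq = z
df_dz = q          # d(q*z)/dz = q
df_dx = df_dq * 1  # dq/dx = 1
df_dy = df_dq * 1  # dq/dy = 1
print(df_dx, df_dy, df_dz)  # -4.0 -4.0 3.0
```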

Backpropagation

Slides from Stanford University Course CS231N: http://cs231n.stanford.edu/slides/winter1516_lecture4.pdf


• Fully Convolutional Network

• FlowNet

• Depth Map Prediction from a Single Image using a Multi-Scale Deep Network

Related Work

• Fully Convolutional Network (FCN)

Related Work

• FlowNet

Related Work


• Eigen et al. [2014]

Related Work


• 3D CNN end-to-end voxel-wise prediction

• Same network architecture for all three challenges.

• Introduces an approach for training with limited data.

Contribution


• Input: Channels x # of Frames x Height x Width

• Output: K x # of Frames x Height x Width

Recap: Problem

Segmentation done by http://segmentit.sourceforge.net/
http://barkpost.com/wp-content/uploads/2013/03/oie_5181838bU3HJXJp.gif
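The input/output layout above can be sketched with concrete sizes: 16-frame clips and K = 8 classes match the GATECH setup later in the talk, while the 112×112 resolution is an assumption borrowed from C3D.

```python
import numpy as np

C, F, H, W, K = 3, 16, 112, 112, 8  # channels, frames, height, width, classes

clip = np.zeros((C, F, H, W))    # network input: one voxel per (frame, row, col)
scores = np.zeros((K, F, H, W))  # voxel-wise output: K scores per voxel

# Argmax over the K axis gives a class label for every voxel in the clip
labels = scores.argmax(axis=0)
print(labels.shape)  # (16, 112, 112)
```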


• Adapted from C3D

• Main Difference: Added deconvolution layers

Method

Learning Spatiotemporal Features with 3D Convolutional Networks. Du Tran, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, Manohar Paluri

Deconvolution

Visualizing and Understanding Convolutional Networks
Matthew D Zeiler, Rob Fergus

(Feature visualizations from Layer 1 and Layer 2)


Deconvolution

Upsampling

Learnable Deconvolution
Visualization Deconvolution
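A learnable deconvolution (transposed convolution) upsamples a feature map by inverting the conv size formula. One common parameterization, sketched below with assumed kernel/stride/padding values:

```python
def deconv_output_size(n, k, s, p=0):
    """Spatial size after a transposed convolution with input size n,
    kernel k, stride s, padding p: the inverse of (N - K + 2P)/S + 1."""
    return (n - 1) * s - 2 * p + k

# Doubling a 28x28 feature map with a 4x4 kernel, stride 2, padding 1
print(deconv_output_size(28, 4, 2, 1))  # 56
```

Note the symmetry with the earlier conv formula: a transposed convolution with the same hyperparameters as a convolution maps the conv's output size back to its input size.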


• Video Semantic Segmentation

• Optical Flow Estimation

• Video Coloring

Experiments and Results

• Dataset:

– GATECH dataset

– Training set: 63 videos

– Test set: 38 sequences

– 8 Classes

Experiments: Video Semantic Segmentation

Geometric Context from Videos. Hussain Raza, Matthias Grundmann, Irfan Essa

• Experiment:

– Training:

• Split each video into all possible clips of length 16 frames (i.e. stride: 1).

– Testing:

• Performed on all non-overlapping clips (i.e. stride: 16).

Experiments: Video Semantic Segmentation
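The clip extraction described above can be sketched as follows (the 64-frame video length is an assumption for illustration):

```python
def clips(num_frames, length=16, stride=1):
    """Start indices of all length-frame clips at the given stride."""
    return list(range(0, num_frames - length + 1, stride))

# Training: every possible 16-frame clip (stride 1).
# Testing: non-overlapping clips (stride 16).
print(len(clips(64, stride=1)))   # 49
print(len(clips(64, stride=16)))  # 4
```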

• Experiment:

– Network details (V2V):

• Loss layer: Softmax

• Weights initialized from C3D. New layers are randomly initialized.

• Initial learning rate: 10^-4, divided by 10 every 30K iterations

Experiments: Video Semantic Segmentation
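The learning-rate schedule here is a standard step decay; a small sketch:

```python
def learning_rate(iteration, base_lr=1e-4, step=30_000, factor=0.1):
    """Step decay: divide the base rate by 10 every `step` iterations."""
    return base_lr * factor ** (iteration // step)

print(learning_rate(0))       # 0.0001
print(learning_rate(65_000))  # two decays applied (1e-6 up to float rounding)
```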

Results: Video Semantic Segmentation

Results: Video Semantic Segmentation

(Comparison with bilinear upsampling)


Results: Video Semantic Segmentation

(Figure labels: Smooth, Noisy, Net depth)

• Video Semantic Segmentation

• Optical Flow Estimation

• Video Coloring

Experiments

• Training:

– Problem:

• No large dataset with optical flow ground truth.

– Solution?

• Fabricate “semi-truth” from an existing optical flow method.

• Brox’s method was used.

– Dataset:

• (V2V) UCF101 (Partial: test split 1)

• (Fine-tuned V2V) MPI-Sintel

• Network:

– Loss function: Huber loss

– Initial learning rate: 10^-8, divided by 10 every 200K iterations

Experiments: Optical Flow Estimation

• Testing:

– MPI-Sintel

Results: Optical Flow Estimation

(Columns: Input, V2V, Brox, Ground truth)


• Fine-tuning from C3D does not improve results much.

• Same Architecture, Different Purpose

Results: Optical Flow Estimation


• Dataset:

– UCF101

– Convert color videos to grayscale.

• Experiment:

– Training:

• Loss function: L2

• Initial learning rate: 10^-8, divided by 10 every 200K iterations

Experiments: Video Coloring
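The grayscale-conversion step can be sketched as below; the ITU-R 601 luma weights are an assumption, since the slides do not say which conversion was used.

```python
import numpy as np

def to_grayscale(rgb):
    """Weighted sum over the RGB channel axis (assumed luma weights)."""
    return rgb @ np.array([0.299, 0.587, 0.114])

frame = np.ones((2, 2, 3))  # a tiny all-white RGB frame
gray = to_grayscale(frame)  # the weights sum to 1, so white stays ~1.0
print(gray.shape)  # (2, 2)
```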

Network   Average Distance Error (ADE)
2D-V2V    0.1495
V2V       0.1375

Results: Video Coloring

Results: Video Coloring

• V2V learns “common sense” colors

(Rows: Input, V2V, Ground Truth)



• Contributions:

– 3D CNN end-to-end voxel-wise prediction

– “Same” network architecture for all three challenges.

– Utilizes a well-established method to generate training data.

• Criticisms

– Fine-tuning improved the result in OF, noticeably in comparison with Brox’s method

– No mention of the activation function, even in C3D

Conclusion

Thank You for Listening

Questions?

• Tran et al. 2015, “Deep End2End Voxel2Voxel Prediction”

• Fischer et al. 2015, “FlowNet: Learning Optical Flow with Convolutional Networks”

• Krizhevsky et al. 2012, “ImageNet Classification with Deep Convolutional Neural Networks”

• Tran et al. 2015, “Learning Spatiotemporal Features with 3D Convolutional Networks”

• Zeiler et al. 2014, “Visualizing and Understanding Convolutional Networks”

• Long et al. 2015, “Fully Convolutional Networks for Semantic Segmentation”

• Eigen et al. 2014, “Depth Map Prediction from a Single Image using a Multi-Scale Deep Network”

• Brox et al. 2011, “Large Displacement Optical Flow: Descriptor Matching in Variational Motion Estimation”

References

Backup Slides

• A perceptron is a linear classifier that utilizes a set of weights to predict an output for a feature vector.

Multi-layer Perceptron

https://blog.dbrgn.ch/images/2013/3/26/perceptron.png
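A minimal sketch of that definition; the weights and bias below are illustrative, chosen to implement an AND-like decision on two inputs.

```python
import numpy as np

def perceptron_predict(w, b, x):
    """A perceptron: linear score w.x + b thresholded to a binary label."""
    return 1 if np.dot(w, x) + b > 0 else 0

# Hypothetical weights: fires only when both inputs are on
w, b = np.array([1.0, 1.0]), -1.5
print(perceptron_predict(w, b, np.array([1.0, 1.0])))  # 1
print(perceptron_predict(w, b, np.array([1.0, 0.0])))  # 0
```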
