CAP 6412 Advanced Computer Vision
http://www.cs.ucf.edu/~bgong/CAP6412.html
Boqing Gong, Jan 19, 2016
Today
• Administrivia
• Neural networks & backpropagation (Part II)
• Deep residual learning, by Dustin
Assignment 1 due at 3pm, 01/21 (Thursday)
• Review the following paper:
[Visualization] Zeiler, Matthew D., and Rob Fergus. “Visualizing and understanding convolutional networks.” In Computer Vision – ECCV 2014, pp. 818-833. Springer International Publishing, 2014.
Template for paper review: http://www.cs.ucf.edu/~bgong/CAP6412/Review.docx
Use the “late homework policy” wisely
- Three late days in total for all reports and projects
- Counting at the granularity of 12 hours
- No additional late days
• Some are late for the one-point assignment “Topic Preference List”
• To lose 1 point? (Default)
• OR, to earn 1 point and to trigger the late homework policy? (Send me an email)
Email --- the best way to reach me
• bgong@crcv.ucf.edu (preferred)
• DO NOT leave messages under my announcements
• Put [CAP6412] in the subject line
• Summarize the message in the subject line
• Ex: [CAP6412] Meeting request: Thursday (Jan 14) 4:30pm?
Office hours this week
• Tuesday: 4:30-5:30pm → Thursday: 4:30-5:30pm
• HEC 214
This week: CNN visualization & object recognition
Tuesday (01/19)
Dustin Morley
[ILSVRC] Russakovsky, Olga, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang et al. “ImageNet large scale visual recognition challenge.” International Journal of Computer Vision (2014): 1-42.
[152 layers] He, Kaiming, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. “Deep Residual Learning for Image Recognition.” arXiv preprint arXiv:1512.03385 (2015).
Thursday (01/21)
Jason Tiller
[Visualization] Zeiler, Matthew D., and Rob Fergus. “Visualizing and understanding convolutional networks.” In Computer Vision – ECCV 2014, pp. 818-833. Springer International Publishing, 2014.
Zhou, Bolei, Aditya Khosla, Agata Lapedriza, Aude Oliva, and Antonio Torralba. “Object detectors emerge in deep scene CNNs.” arXiv preprint arXiv:1412.6856 (2014).
Next week: CNN & object localization
Tuesday (01/26)
Samer Iskander
J. Hosang, R. Benenson, and B. Schiele. “How good are detection proposals, really?” BMVC 2014.
{Major} J. Hosang, R. Benenson, P. Dollár, and B. Schiele. “What makes for effective detection proposals?” PAMI 2015.
{Major} [Faster R-CNN] Ren, Shaoqing, Kaiming He, Ross Girshick, and Jian Sun. “Faster R-CNN: Towards real-time object detection with region proposal networks.” In Advances in Neural Information Processing Systems, pp. 91-99. 2015.
Thursday (01/28)
Syed Ahmed
{Major} [R-CNN] Girshick, Ross, Jeff Donahue, Trevor Darrell, and Jitendra Malik. “Rich feature hierarchies for accurate object detection and semantic segmentation.” In Computer Vision and Pattern Recognition (CVPR), 2014 IEEE Conference on, pp. 580-587. IEEE, 2014.
[Fast R-CNN] Girshick, Ross. “Fast R-CNN.” arXiv preprint arXiv:1504.08083 (2015).
Link has been sent to your UCF emails
Today
• Administrivia
• Neural networks & backpropagation (Part II)
• Fundamentals of Convolutional Neural Networks (CNN), by Fareeha
Review: biological neurons
• The human brain has about 10 billion neurons
• Each is connected to 10K other neurons
• A neuron fires if the sum of electrochemical inputs exceeds some threshold
Image credit: cs.stanford.edu/people/eroberts
Review: artificial neurons/perceptrons
• A neuron fires if the sum of weighted inputs exceeds some threshold
Image credit: www.hiit.fi/u/ahonkela/dippa/node41.html
$$y = \varphi\!\left(\sum_{i=1}^{n} w_i x_i + b\right) = \varphi(\mathbf{w}^\top \mathbf{x} + b)$$
$\varphi(\cdot)$: activation function
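As a concrete illustration, here is a minimal NumPy sketch of this single neuron (the function name `neuron_forward` and the sample numbers are my own, not from the slides):

```python
import numpy as np

def neuron_forward(x, w, b, activation=np.tanh):
    """Single artificial neuron: y = phi(w^T x + b)."""
    return activation(np.dot(w, x) + b)

# Example: a 3-input neuron with a tanh activation
x = np.array([0.5, -1.0, 2.0])   # inputs
w = np.array([0.1, 0.4, -0.2])   # weights
b = 0.05                         # bias
print(neuron_forward(x, w, b))   # a single scalar output
```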
Constructing neural networks from neurons
• The human brain has about 10 billion neurons
• Each is connected to 10K other neurons
• A neuron fires if the sum of electrochemical inputs exceeds some threshold
Image credit: cs.stanford.edu/people/eroberts
Basic network structures
• Feed-forward networks
• Recurrent neural networks (contrasted with feed-forward in the sketch below)
Image credit: http://mesin-belajar.blogspot.com/2016/01/a-brief-history-of-neural-nets-and-deep_84.html
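To make the structural difference concrete, here is a minimal NumPy sketch contrasting the two (function names and shapes are illustrative assumptions):

```python
import numpy as np

def feedforward(x, weights):
    """Feed-forward: the signal passes once through a fixed stack of layers."""
    h = x
    for W in weights:
        h = np.tanh(W @ h)
    return h

def recurrent(xs, W_in, W_rec):
    """Recurrent: a hidden state is fed back and updated at every time step."""
    h = np.zeros(W_rec.shape[0])
    for x_t in xs:                       # xs is a sequence of input vectors
        h = np.tanh(W_in @ x_t + W_rec @ h)
    return h
```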
Imposing desired properties
• To tune it towards desired properties
• E.g., for binary classification:
• Output between 0 and 1
• Tells the probability of the input x belonging to either class +1/-1
Image credit: Farid E Ahmed
A case study
• Binary classification
• Output between 0 and 1
• Tells the probability of the input x belonging to either class +1/-1
• Step 1: choose the network structure
• Step 2: choose the activation function
• Step 3: determine the model parameters Θ to meet the desired properties
Image credit: Farid E Ahmed
[Plots of four activation functions over x ∈ [-10, 10]: binary step, logistic, and tanh (outputs in [-1, 1]), and the Rectified Linear Unit (ReLU) (outputs in [0, 10]).]
$$\varphi(x) = \frac{1}{1 + \exp(-x)}$$
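The four plotted activations can be written in a few lines of NumPy (a sketch for illustration; the function names are mine):

```python
import numpy as np

def binary_step(x):
    return np.where(x >= 0, 1.0, 0.0)

def logistic(x):                  # the phi(x) above
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    return np.tanh(x)

def relu(x):
    return np.maximum(0.0, x)

x = np.linspace(-10, 10, 5)
for phi in (binary_step, logistic, tanh, relu):
    print(phi.__name__, phi(x))
```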
Learning the model parameters Θ (1)
• Is equivalent to: choose one hypothesis $h \in \mathcal{H}$ to approximate the concept $c$
• where:
Binary classification concept: $c : \mathcal{X} \mapsto \mathcal{Y} = \{0, 1\}$
Hypotheses: $\mathcal{H} = \{\mathrm{net}(\Theta) \mid \Theta \in \mathbb{R}^d\}$
• Questions:
$c$ is unknown
$c \in \mathcal{H}$?
Empirical Risk Minimization (ERM)
Learning the model parameters Θ (2)
• Is equivalent to: choose one hypothesis $h \in \mathcal{H}$ to approximate the concept $c$
• Can be implemented by:
$$\Theta^\star \leftarrow \arg\min_{\Theta} R(\Theta)$$
$$R(\Theta) = \Pr\big(\mathrm{net}(x;\Theta) \neq y\big) = \mathbb{E}_{(x,y)\sim P_{XY}}\big[\mathbf{1}\{\mathrm{net}(x;\Theta) \neq y\}\big]$$
where $P_{XY}$ is the underlying distribution of (x, y).
← $R(\Theta)$ is called the generalization risk
Next class:
$$\Theta^\star \leftarrow \arg\min_{\Theta} \; \mathbb{E}_{(x,y)\sim P_{XY}}\big[\mathbf{1}\{\mathrm{net}(x;\Theta) \neq y\}\big]$$
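Since $P_{XY}$ is unknown, ERM replaces the expectation with an average over a finite training sample. Here is a minimal sketch anticipating that discussion, for the single-neuron network above, where I also swap the non-differentiable 0/1 risk for the logistic loss so that gradient descent applies (all names and hyperparameters are my assumptions):

```python
import numpy as np

def empirical_risk(Theta, X, y):
    """Logistic-loss surrogate for R(Theta) on a finite sample (X, y)."""
    w, b = Theta[:-1], Theta[-1]
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))       # net(x; Theta) in (0, 1)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def erm_gradient_descent(X, y, lr=0.1, steps=500):
    """Theta* <- argmin_Theta (empirical risk), via plain gradient descent."""
    Theta = np.zeros(X.shape[1] + 1)
    for _ in range(steps):
        w, b = Theta[:-1], Theta[-1]
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        grad_w = X.T @ (p - y) / len(y)          # gradient of the loss w.r.t. w
        grad_b = np.mean(p - y)                  # gradient of the loss w.r.t. b
        Theta = Theta - lr * np.concatenate([grad_w, [grad_b]])
    return Theta
```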
Today
• Administrivia
• Neural networks & backpropagation (Part II)
• Deep residual learning, by Dustin
Deep Residual Learning for Image Recognition
Authors: Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun (Microsoft Research)
Presented by Dustin Morley
About the paper
• NOT peer-reviewed; published on arXiv (Dec. 2015)
• Well supported claim: “We provide comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth.”
• Questionable claim: “…residual nets with a depth of up to 152 layers - 8x deeper than VGG nets but still having lower complexity”
  • The claim of lower complexity is not convincingly supported
• My rating: 2
  • Great innovation with high significance, but the claims and the presentation of the experimental data are not well organized.
Main Contributions
• Proposed a novel approach to resolve the issue of performance degradation with increased depth
• Obtained excellent object recognition and localization results
  • Ensemble network on the ImageNet dataset: 3.57% top-5 classification error
  • 101-layer ResNet on the COCO validation set (object detection): 27.2% mAP@[0.5, 0.95]
• Won 1st place in several tracks of the ILSVRC and COCO 2015 competitions
  • ImageNet detection
  • ImageNet localization
  • COCO detection
  • COCO segmentation
Outline
• Background: theoretical and experimental
• Problem: NN scalability with added layers
• Solution: residual learning via identity mapping “shortcuts”
• Experimental results
• Conclusion, evaluation, and future directions
Background
• Convolutional Neural Networks
  • Layers: conv., pool, conv., pool, conv., pool, ...
  • Conv./pool results are propagated forward
  • Classification error is propagated backward
  • Each layer computes error derivatives WRT its parameters
Image Credit: Oxford Visual Geometry Group
Background – ImageNet 2012
• Dataset for image classification
• 1000 classes
• 1.28 million training images
• 50k validation images
• 100k test images (final results)
• Top-1 and top-5 error rates
Background – CIFAR-10 Testing
• Dataset for image classification
• Images are small (32x32, color)
• 10 classes
• 50k training images
• 10k test images (final result)
Background – MS COCO Testing
• Dataset for object detection
• 80 object categories
• 80k training images
• 40k test images
• Detailed manual segmentations of images
• Evaluation metrics revolve around mean average precision (mAP) and intersection over union (IoU); see the sketch below
  • Partition results into different classes of IoU ([0.5, 0.55], [0.55, 0.6], ..., [0.95, 1])
  • Compute the average precision for each class
  • Compute the mean of the average precisions over all classes
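As a rough illustration of the IoU half of this metric, here is a sketch (the box format (x1, y1, x2, y2) and the names are my assumptions, not COCO's actual API):

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

# mAP@[0.5, 0.95]: compute mAP at each IoU cutoff, then average the results
thresholds = [0.5 + 0.05 * i for i in range(10)]   # 0.5, 0.55, ..., 0.95
print(iou((0, 0, 10, 10), (5, 5, 15, 15)))         # 25 / 175 ≈ 0.143
```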
Background – PASCAL VOC Testing
• Dataset for object detection
• 16k training images from VOC 2012
• First test set: 5k test images from VOC 2007
• Second test set: 10k test images from VOC 2012
• Evaluation metric: similar to MS COCO (but not exactly the same)
Problem – about adding layers…
• In theory: the only risk is overfitting
• In practice: multiple issues
  • Convergence (mostly solved by normalization layers)
  • Accuracy degradation (even the training accuracy degrades!)
Too many layers?
• Theory: more layers should never harm training performance
  • Take the solution for m layers, then add more layers configured so that they only perform the identity operation; the performance stays the same.
  • Thus, an equivalent or better solution always exists when more layers are added
• Implication: the optimization methods cannot handle too many layers
• Need a reformulation of the extra layers that makes optimization easier
Solution: Residual Network
• Conjecture: it is difficult for the optimization to deduce “unneeded layers”
  • Equivalently: it is difficult to determine that a layer should be an identity mapping
• Recast the formulation so that the identity mapping becomes the easy-to-reach default
• Use “shortcuts” to go “around” layers in addition to going “through” them
• Mathematically: the block outputs F(x) + x, so the stacked layers only have to fit the residual F(x) = H(x) - x rather than the whole mapping H(x) (a minimal sketch follows)
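A minimal fully connected sketch of a residual block (the paper's blocks use convolutional layers; the function names here are mine):

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def residual_block(x, W1, W2):
    """Output F(x) + x: the weight layers fit only the residual F(x).

    If the optimal mapping is the identity, training merely has to push
    W1 and W2 toward zero rather than fit an identity through them.
    """
    F = W2 @ relu(W1 @ x)    # two-layer residual branch F(x)
    return relu(F + x)       # identity "shortcut" added before the final ReLU

def plain_block(x, W1, W2):
    """The same layers without the shortcut: must fit all of H(x) directly."""
    return relu(W2 @ relu(W1 @ x))
```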
Residual Network Comparison
Implementation Details – ImageNet
• Image resized with its shorter side randomly sampled (in [256, 480]) for scale augmentation
• A fixed-size crop randomly sampled from an image or its horizontal flip
• Per-pixel mean subtracted
• Color augmentation, according to: A. Krizhevsky et al., “ImageNet classification with deep convolutional neural networks,” NIPS, 2012
• Batch normalization right after each convolution, before the activation
• Learning rate divided by 10 when the error plateaus (a sketch of this schedule follows)
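One possible reading of that last point, as a sketch (the `patience` and `tol` values are my assumptions; the paper only says the rate is divided by 10 when the error plateaus):

```python
def step_lr_on_plateau(lr, val_errors, patience=3, factor=0.1, tol=1e-4):
    """Divide the learning rate by 10 once validation error stops improving."""
    if len(val_errors) > patience:
        recent = val_errors[-patience:]            # latest few epochs
        best_before = min(val_errors[:-patience])  # best error before them
        if min(recent) > best_before - tol:        # no meaningful improvement
            return lr * factor
    return lr
```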
ImageNet Results – Direct Comparison
• ImageNet
• “Plain” network top-1 error: 27.94% for 18 layers, 28.54% for 34 layers
• Residual network top-1 error: 27.88% for 18 layers, 25.03% for 34 layers
ImageNet Results – High Scalability
The configuration differences among A, B, and C concern how the “shortcuts” handle changes in dimensionality (A = zero padding; B = projection applied for increasing dimensions only; C = projection always applied).
ImageNet Results – Ensemble
• The ResNet ensemble is built from 6 models of different depths (only 2 of the models are of depth 152)
CIFAR-10 Results
Object Detection Results
• PASCAL VOC
• MS COCO
Conclusion
• Blindly increasing the depth of CNNs can lead (counterintuitively) to decreases in performance rather than increases
• The residual “shortcut” approach allows benefit to be gained from increasing the depth of CNNs
• Networks built by the authors on this principle performed very well, winning 1st place in several competitions
Evaluation
Strengths
• Novel idea
• Solves an interesting and important problem
• The approach can be “dropped in” to virtually any CNN design
• The authors obtained very good performance
Weaknesses
• Questionable statements about complexity
• Some parts of the presentation of the results are confusing
• Certain direct comparisons of results didn't seem particularly meaningful
Future Directions
• Are there other types of shortcuts, besides the identity mapping shortcut, that could further improve performance?
  • Could these be inferred by studying the nonlinear mappings output by successful small-depth networks?
• Insert the identity mapping shortcuts into other neural networks.
  • The results section pits ResNet “against” VGG and other networks. This seems like a tyranny of either/or.