![Page 1: CAP 6412 Advanced Computer Vision - UCF CRCV · 2019-03-27 · Next week: DAG-CNN & Transferability Tuesday (03/22) NiladriBasuBal [DAG-CNN]Yang, Songfan, and Deva Ramanan."Multi-scale](https://reader033.vdocuments.us/reader033/viewer/2022042309/5ed6ebafff4a11075f770c2a/html5/thumbnails/1.jpg)
CAP6412AdvancedComputerVision
http://www.cs.ucf.edu/~bgong/CAP6412.html
Boqing GongMarch 15,2016
![Page 2: CAP 6412 Advanced Computer Vision - UCF CRCV · 2019-03-27 · Next week: DAG-CNN & Transferability Tuesday (03/22) NiladriBasuBal [DAG-CNN]Yang, Songfan, and Deva Ramanan."Multi-scale](https://reader033.vdocuments.us/reader033/viewer/2022042309/5ed6ebafff4a11075f770c2a/html5/thumbnails/2.jpg)
Lastweek:Spring Break
• ECCV• OneofthetopconferencesonComputerVision• Everyotheryear(2016,2014,2012,…)• OnparwithCVPR(everyyear),ICCV(everyotheryear,2015,2013,…)• ~2000submissions,~22%acceptancerate
• Double-blindpeerreview• ConferenceattendeesvoteforProgramChairs(PCs)• PCsselectAreaChairs(ACs)anddistributepaperstoACs• ACsassigneachoftheirpaperstoatleastthreeReviewers• Reviewersreadandevaluatepapers• Authorsrespondtoreviewers• Reviewersreadauthorresponsesanddiscussthepapers• ACsmeettogetheranddecidewhethertoacceptortorejectpapers(orals,posters)
![Page 3: CAP 6412 Advanced Computer Vision - UCF CRCV · 2019-03-27 · Next week: DAG-CNN & Transferability Tuesday (03/22) NiladriBasuBal [DAG-CNN]Yang, Songfan, and Deva Ramanan."Multi-scale](https://reader033.vdocuments.us/reader033/viewer/2022042309/5ed6ebafff4a11075f770c2a/html5/thumbnails/3.jpg)
Lastweek:Spring Break
• AtypicalprogramofCVPR/ECCV/ICCV• Sunday:Tutorials• Monday—Thursday:Oralpresentations,spotlights,posterpresentations• Friday—Saturday:Workshops
![Page 4: CAP 6412 Advanced Computer Vision - UCF CRCV · 2019-03-27 · Next week: DAG-CNN & Transferability Tuesday (03/22) NiladriBasuBal [DAG-CNN]Yang, Songfan, and Deva Ramanan."Multi-scale](https://reader033.vdocuments.us/reader033/viewer/2022042309/5ed6ebafff4a11075f770c2a/html5/thumbnails/4.jpg)
Thisweek:VisionandLanguage
Tuesday(03/15)
Fareeha Irfan
[Book2Movie] Tapaswi,Makarand,MartinBauml,andRainerStiefelhagen."Book2movie:Aligningvideosceneswithbookchapters."InProceedingsoftheIEEEConferenceonComputerVisionandPatternRecognition,pp.1827-1835.2015.& Secondary papers
Thursday(03/17)&Next Tuesday(03/22)
ShreyasSomashekar
[Visual Genome] Krishna, Ranjay, Yuke Zhu, Oliver Groth, JustinJohnson, Kenji Hata, Joshua Kravitz, Stephanie Chen et al. "VisualGenome: Connecting Language and Vision Using Crowdsourced DenseImage Annotations."arXiv preprint arXiv:1602.07332 (2016).& Secondary papers
![Page 5: CAP 6412 Advanced Computer Vision - UCF CRCV · 2019-03-27 · Next week: DAG-CNN & Transferability Tuesday (03/22) NiladriBasuBal [DAG-CNN]Yang, Songfan, and Deva Ramanan."Multi-scale](https://reader033.vdocuments.us/reader033/viewer/2022042309/5ed6ebafff4a11075f770c2a/html5/thumbnails/5.jpg)
Assignment 9:Dueon03/22, 12pm
• ReviewthefollowingpaperusingthePaperReviewTemplateathttp://www.cs.ucf.edu/~bgong/CAP6412/Review.docx.
[Visual Genome] Krishna, Ranjay, Yuke Zhu, Oliver Groth, JustinJohnson, Kenji Hata, Joshua Kravitz, Stephanie Chen et al. "VisualGenome: Connecting Language and Vision Using Crowdsourced DenseImage Annotations." arXiv preprint arXiv:1602.07332 (2016).
![Page 6: CAP 6412 Advanced Computer Vision - UCF CRCV · 2019-03-27 · Next week: DAG-CNN & Transferability Tuesday (03/22) NiladriBasuBal [DAG-CNN]Yang, Songfan, and Deva Ramanan."Multi-scale](https://reader033.vdocuments.us/reader033/viewer/2022042309/5ed6ebafff4a11075f770c2a/html5/thumbnails/6.jpg)
Nextweek:DAG-CNN&Transferability
Tuesday(03/22)
Niladri Basu Bal
[DAG-CNN] Yang,Songfan,andDevaRamanan."Multi-scalerecognitionwithDAG-CNNs."InProceedingsoftheIEEEInternationalConferenceonComputerVision,pp.1215-1223.2015.& Secondary papers
Thursday(03/24)
Mert Ozerdem
[Transferability] Yosinski, Jason, Jeff Clune, Yoshua Bengio, and HodLipson. “How transferable are features in deep neural networks?.” InAdvances in Neural Information Processing Systems, pp. 3320-3328.2014.& Secondary papers
![Page 7: CAP 6412 Advanced Computer Vision - UCF CRCV · 2019-03-27 · Next week: DAG-CNN & Transferability Tuesday (03/22) NiladriBasuBal [DAG-CNN]Yang, Songfan, and Deva Ramanan."Multi-scale](https://reader033.vdocuments.us/reader033/viewer/2022042309/5ed6ebafff4a11075f770c2a/html5/thumbnails/7.jpg)
What’snext
![Page 8: CAP 6412 Advanced Computer Vision - UCF CRCV · 2019-03-27 · Next week: DAG-CNN & Transferability Tuesday (03/22) NiladriBasuBal [DAG-CNN]Yang, Songfan, and Deva Ramanan."Multi-scale](https://reader033.vdocuments.us/reader033/viewer/2022042309/5ed6ebafff4a11075f770c2a/html5/thumbnails/8.jpg)
What’snext
• Signupforvolunteerpresentationsat• https://docs.google.com/spreadsheets/d/1DxMQ_RVMx8BLmc5gij51dtXI2ZJzktR8EzJJZkVpGW4/edit#gid=0
• Suggestpapersyouwouldliketoread,share,challenge
![Page 9: CAP 6412 Advanced Computer Vision - UCF CRCV · 2019-03-27 · Next week: DAG-CNN & Transferability Tuesday (03/22) NiladriBasuBal [DAG-CNN]Yang, Songfan, and Deva Ramanan."Multi-scale](https://reader033.vdocuments.us/reader033/viewer/2022042309/5ed6ebafff4a11075f770c2a/html5/thumbnails/9.jpg)
Today
• Administrivia• RecurrentNeuralNetworks(RNNs)(II)• OCR in the wild,byAisha
![Page 10: CAP 6412 Advanced Computer Vision - UCF CRCV · 2019-03-27 · Next week: DAG-CNN & Transferability Tuesday (03/22) NiladriBasuBal [DAG-CNN]Yang, Songfan, and Deva Ramanan."Multi-scale](https://reader033.vdocuments.us/reader033/viewer/2022042309/5ed6ebafff4a11075f770c2a/html5/thumbnails/10.jpg)
(Discrete-time)RNN
• Threetimestepsandbeyond
Imagecredits:RichardSocher
![Page 11: CAP 6412 Advanced Computer Vision - UCF CRCV · 2019-03-27 · Next week: DAG-CNN & Transferability Tuesday (03/22) NiladriBasuBal [DAG-CNN]Yang, Songfan, and Deva Ramanan."Multi-scale](https://reader033.vdocuments.us/reader033/viewer/2022042309/5ed6ebafff4a11075f770c2a/html5/thumbnails/11.jpg)
(Discrete-time)RNN
• Threetimestepsandbeyond • Alayeredfeedforward net• Tiedweights fordifferenttimesteps
• Conditioning (memorizing?)onallpreviousinput
• Cheap to save memoryinRAM
Imagecredits:RichardSocher
![Page 12: CAP 6412 Advanced Computer Vision - UCF CRCV · 2019-03-27 · Next week: DAG-CNN & Transferability Tuesday (03/22) NiladriBasuBal [DAG-CNN]Yang, Songfan, and Deva Ramanan."Multi-scale](https://reader033.vdocuments.us/reader033/viewer/2022042309/5ed6ebafff4a11075f770c2a/html5/thumbnails/12.jpg)
Detour: Hidden Markov Model
• A probabilistic model of sequences
• Emission probability:• Transition probability:• Initial probability:
Imagecredits:ErikSudderth
![Page 13: CAP 6412 Advanced Computer Vision - UCF CRCV · 2019-03-27 · Next week: DAG-CNN & Transferability Tuesday (03/22) NiladriBasuBal [DAG-CNN]Yang, Songfan, and Deva Ramanan."Multi-scale](https://reader033.vdocuments.us/reader033/viewer/2022042309/5ed6ebafff4a11075f770c2a/html5/thumbnails/13.jpg)
Detour: Hidden Markov Model
• Useful for modeling sequences• Discrete hidden states, which satisfy Markov assumption
• Inference and learning (optional)• Evaluation: forward probability• Decoding: forward-backward algorithm, Viterbi decoding• Learning: EM algorithm (Baum-Welch)
Imagecredits:ErikSudderth
![Page 14: CAP 6412 Advanced Computer Vision - UCF CRCV · 2019-03-27 · Next week: DAG-CNN & Transferability Tuesday (03/22) NiladriBasuBal [DAG-CNN]Yang, Songfan, and Deva Ramanan."Multi-scale](https://reader033.vdocuments.us/reader033/viewer/2022042309/5ed6ebafff4a11075f770c2a/html5/thumbnails/14.jpg)
Begin: Slides from Geoffrey Hinton
![Page 15: CAP 6412 Advanced Computer Vision - UCF CRCV · 2019-03-27 · Next week: DAG-CNN & Transferability Tuesday (03/22) NiladriBasuBal [DAG-CNN]Yang, Songfan, and Deva Ramanan."Multi-scale](https://reader033.vdocuments.us/reader033/viewer/2022042309/5ed6ebafff4a11075f770c2a/html5/thumbnails/15.jpg)
HiddenMarkovModels(computerscientistslovethem!)• HiddenMarkovModelshaveadiscreteone-of-Nhiddenstate.Transitionsbetweenstatesarestochasticandcontrolledbyatransitionmatrix.Theoutputsproducedbyastatearestochastic.
• Wecannotbesurewhichstateproducedagivenoutput.Sothestateis“hidden”.
• ItiseasytorepresentaprobabilitydistributionacrossNstateswithNnumbers.
• Topredictthenextoutputweneedtoinfertheprobabilitydistributionoverhiddenstates.
• HMMshaveefficientalgorithmsforinferenceandlearning.
output
output
output
time à
![Page 16: CAP 6412 Advanced Computer Vision - UCF CRCV · 2019-03-27 · Next week: DAG-CNN & Transferability Tuesday (03/22) NiladriBasuBal [DAG-CNN]Yang, Songfan, and Deva Ramanan."Multi-scale](https://reader033.vdocuments.us/reader033/viewer/2022042309/5ed6ebafff4a11075f770c2a/html5/thumbnails/16.jpg)
AfundamentallimitationofHMMs• ConsiderwhathappenswhenahiddenMarkovmodelgeneratesdata.
• Ateachtimestepitmustselectoneofitshiddenstates.SowithNhiddenstatesitcanonlyrememberlog(N)bitsaboutwhatitgeneratedsofar.
• Considertheinformationthatthefirsthalfofanutterancecontainsaboutthesecondhalf:
• Thesyntaxneedstofit(e.g.numberandtenseagreement).• Thesemanticsneedstofit.Theintonationneedstofit.• Theaccent,rate,volume,andvocaltractcharacteristicsmustallfit.
• Alltheseaspectscombinedcouldbe100bitsofinformationthatthefirsthalfofanutteranceneedstoconveytothesecondhalf.2^100isbig!
![Page 17: CAP 6412 Advanced Computer Vision - UCF CRCV · 2019-03-27 · Next week: DAG-CNN & Transferability Tuesday (03/22) NiladriBasuBal [DAG-CNN]Yang, Songfan, and Deva Ramanan."Multi-scale](https://reader033.vdocuments.us/reader033/viewer/2022042309/5ed6ebafff4a11075f770c2a/html5/thumbnails/17.jpg)
Recurrentneuralnetworks• RNNsareverypowerful,becausetheycombinetwoproperties:
• Distributedhiddenstatethatallowsthemtostorealotofinformationaboutthepastefficiently.
• Non-lineardynamicsthatallowsthemtoupdatetheirhiddenstateincomplicatedways.
• Withenoughneuronsandtime,RNNscancomputeanythingthatcanbecomputedbyyourcomputer. input
input
input
hidden
hidden
hidden
output
output
outputtime à
![Page 18: CAP 6412 Advanced Computer Vision - UCF CRCV · 2019-03-27 · Next week: DAG-CNN & Transferability Tuesday (03/22) NiladriBasuBal [DAG-CNN]Yang, Songfan, and Deva Ramanan."Multi-scale](https://reader033.vdocuments.us/reader033/viewer/2022042309/5ed6ebafff4a11075f770c2a/html5/thumbnails/18.jpg)
Dogenerativemodelsneedtobestochastic?• LineardynamicalsystemsandhiddenMarkovmodelsarestochasticmodels.
• Buttheposteriorprobabilitydistributionovertheirhiddenstatesgiventheobserveddatasofarisadeterministicfunctionofthedata.
• Recurrentneuralnetworksaredeterministic.
• SothinkofthehiddenstateofanRNNastheequivalentofthedeterministicprobabilitydistributionoverhiddenstatesinalineardynamicalsystemorhiddenMarkovmodel.
![Page 19: CAP 6412 Advanced Computer Vision - UCF CRCV · 2019-03-27 · Next week: DAG-CNN & Transferability Tuesday (03/22) NiladriBasuBal [DAG-CNN]Yang, Songfan, and Deva Ramanan."Multi-scale](https://reader033.vdocuments.us/reader033/viewer/2022042309/5ed6ebafff4a11075f770c2a/html5/thumbnails/19.jpg)
Recurrentneuralnetworks• WhatkindsofbehaviourcanRNNsexhibit?
• Theycanoscillate.Goodformotorcontrol?• Theycansettletopointattractors.Goodforretrievingmemories?• Theycanbehavechaotically.Badforinformationprocessing?• RNNscouldpotentiallylearntoimplementlotsofsmallprogramsthateachcaptureanuggetofknowledgeandruninparallel,interactingtoproduceverycomplicatedeffects.
• ButthecomputationalpowerofRNNsmakesthemveryhardtotrain.• FormanyyearswecouldnotexploitthecomputationalpowerofRNNsdespitesomeheroicefforts(e.g.TonyRobinson’sspeechrecognizer).
![Page 20: CAP 6412 Advanced Computer Vision - UCF CRCV · 2019-03-27 · Next week: DAG-CNN & Transferability Tuesday (03/22) NiladriBasuBal [DAG-CNN]Yang, Songfan, and Deva Ramanan."Multi-scale](https://reader033.vdocuments.us/reader033/viewer/2022042309/5ed6ebafff4a11075f770c2a/html5/thumbnails/20.jpg)
End: Slides from Geoffrey Hinton
![Page 21: CAP 6412 Advanced Computer Vision - UCF CRCV · 2019-03-27 · Next week: DAG-CNN & Transferability Tuesday (03/22) NiladriBasuBal [DAG-CNN]Yang, Songfan, and Deva Ramanan."Multi-scale](https://reader033.vdocuments.us/reader033/viewer/2022042309/5ed6ebafff4a11075f770c2a/html5/thumbnails/21.jpg)
Today
• Administrivia• RecurrentNeuralNetworks(RNNs)(II)• OCR in the wild,byAisha
![Page 22: CAP 6412 Advanced Computer Vision - UCF CRCV · 2019-03-27 · Next week: DAG-CNN & Transferability Tuesday (03/22) NiladriBasuBal [DAG-CNN]Yang, Songfan, and Deva Ramanan."Multi-scale](https://reader033.vdocuments.us/reader033/viewer/2022042309/5ed6ebafff4a11075f770c2a/html5/thumbnails/22.jpg)
Uploadslidesbeforeorafterclass
• See“PaperPresentation”onUCFwebcourse
• Sharingyourslides• Refertotheoriginalssourcesofimages,figures,etc.inyourslides• ConvertthemtoaPDFfile• UploadthePDFfileto“PaperPresentation”afteryourpresentation
![Page 23: CAP 6412 Advanced Computer Vision - UCF CRCV · 2019-03-27 · Next week: DAG-CNN & Transferability Tuesday (03/22) NiladriBasuBal [DAG-CNN]Yang, Songfan, and Deva Ramanan."Multi-scale](https://reader033.vdocuments.us/reader033/viewer/2022042309/5ed6ebafff4a11075f770c2a/html5/thumbnails/23.jpg)
Aligning Books and Movies: Towards Story-like Visual
Explanations by Watching Movies and Reading Books
Yukun Zhu, Ryan Kiros, Richard Zemel, Ruslan Salakhutdinov, Raquel Urtasun, Antonio Torralba, Sanja Fidler
![Page 24: CAP 6412 Advanced Computer Vision - UCF CRCV · 2019-03-27 · Next week: DAG-CNN & Transferability Tuesday (03/22) NiladriBasuBal [DAG-CNN]Yang, Songfan, and Deva Ramanan."Multi-scale](https://reader033.vdocuments.us/reader033/viewer/2022042309/5ed6ebafff4a11075f770c2a/html5/thumbnails/24.jpg)
Motivation
Books provide rich fine-grained information:
● Appearance: how a character, an object or a scene looks like● High-level semantics: what someone is thinking, feeling and how these
states evolve through a story
Aim: Align books with its movie releases in order to provide rich descriptive explanations for visual content that go semantically far beyond the captions available in current datasets.
![Page 25: CAP 6412 Advanced Computer Vision - UCF CRCV · 2019-03-27 · Next week: DAG-CNN & Transferability Tuesday (03/22) NiladriBasuBal [DAG-CNN]Yang, Songfan, and Deva Ramanan."Multi-scale](https://reader033.vdocuments.us/reader033/viewer/2022042309/5ed6ebafff4a11075f770c2a/html5/thumbnails/25.jpg)
Overview of Evaluations in this work
● Evaluate similarity between sentences, trained by a corpus of Books
● Similarity between Movie clips and sentences in Books, via learnt embedding
● Weaving “context” of sentences into similarity evaluation using a 3-layer CNN
● Timeline wise alignment between books and movie using Conditional Random Field
![Page 26: CAP 6412 Advanced Computer Vision - UCF CRCV · 2019-03-27 · Next week: DAG-CNN & Transferability Tuesday (03/22) NiladriBasuBal [DAG-CNN]Yang, Songfan, and Deva Ramanan."Multi-scale](https://reader033.vdocuments.us/reader033/viewer/2022042309/5ed6ebafff4a11075f770c2a/html5/thumbnails/26.jpg)
Demonstrated Results
● Movie/Book Alignment
● Describing a particular movie shot via its corresponding explanation in Book
● Given a Movie, retrieve its corresponding Book from a corpus of Books
● Caption a Movie clip w/ a paragraph from “any” Book
● Caption MS-COCO images using paragraph from Books
![Page 27: CAP 6412 Advanced Computer Vision - UCF CRCV · 2019-03-27 · Next week: DAG-CNN & Transferability Tuesday (03/22) NiladriBasuBal [DAG-CNN]Yang, Songfan, and Deva Ramanan."Multi-scale](https://reader033.vdocuments.us/reader033/viewer/2022042309/5ed6ebafff4a11075f770c2a/html5/thumbnails/27.jpg)
Related Work
● Early work on movie-to-text alignment include dynamic time warping for aligning movies to scripts with the help of subtitles [5, 4].
● Sankar et al. [28] further developed a system which identified sets of visual and audio features to align movies and scripts without making use of the subtitles.
● Aligning plot synopses to shots in the TV series for story-based content retrieval. This work adopts a similarity function between sentences in plot synopses and shots based on person identities and keywords in subtitles.
![Page 28: CAP 6412 Advanced Computer Vision - UCF CRCV · 2019-03-27 · Next week: DAG-CNN & Transferability Tuesday (03/22) NiladriBasuBal [DAG-CNN]Yang, Songfan, and Deva Ramanan."Multi-scale](https://reader033.vdocuments.us/reader033/viewer/2022042309/5ed6ebafff4a11075f770c2a/html5/thumbnails/28.jpg)
ChallengesPlot synopsis closely follow the storyline of movies, books are more verbose and might vary in the storyline from their movie release.
● Parallel to our work: [BOOK2MOVIE]○ Tapaswi et al. aims to align scenes in movies to chapters in the book.
■ Coarse approach: Operates on chapter-level ■ Dataset evaluates on 90 scene-chapter correspondences,■ Matches the presence of characters in a scene to those in a chapter■ Uses hand-crafted similarity measures between sentences in the subtitles
and dialogs in the books
○ This paper: ■ Our approach: sentence/paragraph level.■ Dataset draws 2,070 shot-to-sentences alignments.■ We remove character names to learn semantics and context better■ CNN learnt similarity function
![Page 29: CAP 6412 Advanced Computer Vision - UCF CRCV · 2019-03-27 · Next week: DAG-CNN & Transferability Tuesday (03/22) NiladriBasuBal [DAG-CNN]Yang, Songfan, and Deva Ramanan."Multi-scale](https://reader033.vdocuments.us/reader033/viewer/2022042309/5ed6ebafff4a11075f770c2a/html5/thumbnails/29.jpg)
Dataset and Ground TruthAnnotators find correspondences:
● Mark exact time in movie to line number of beginning of the matched sentence.
○ If a shot is longer: Indicate time of ending.
○ If description in more lines: Indicate the last line.
● Tag alignment: ‘visual’, ‘dialogue’ or ‘audio’ match
Total: 11 movie/book pairs, 2070 correspondences
![Page 30: CAP 6412 Advanced Computer Vision - UCF CRCV · 2019-03-27 · Next week: DAG-CNN & Transferability Tuesday (03/22) NiladriBasuBal [DAG-CNN]Yang, Songfan, and Deva Ramanan."Multi-scale](https://reader033.vdocuments.us/reader033/viewer/2022042309/5ed6ebafff4a11075f770c2a/html5/thumbnails/30.jpg)
Training Sentence Similarity modelR. Kiros, Y. Zhu, R. Salakhutdinov, R. S. Zemel, A. Torralba, R. Urtasun, and S. Fidler. Skip-Thought Vectors. In Arxiv, 2015
Overview
Sentence i is encoded. Conditioning on this the model tries to reconstruct the sentence before and after it.
Motivation
Sentences that have similar surrounding context are likely to be both semantically and syntactically similar.
![Page 31: CAP 6412 Advanced Computer Vision - UCF CRCV · 2019-03-27 · Next week: DAG-CNN & Transferability Tuesday (03/22) NiladriBasuBal [DAG-CNN]Yang, Songfan, and Deva Ramanan."Multi-scale](https://reader033.vdocuments.us/reader033/viewer/2022042309/5ed6ebafff4a11075f770c2a/html5/thumbnails/31.jpg)
“Thought Vectors”
The term is popularized by Geoffrey Hinton (Google).
word vector:
● represents the word’s meaning, relative to others (context)● linked by grammar
thought vector:
● represents a thought, relative to others● linked by chain of reasoning
Goal: Feed enough data (thoughts) to NN, enabling it to mimic those thoughts
![Page 32: CAP 6412 Advanced Computer Vision - UCF CRCV · 2019-03-27 · Next week: DAG-CNN & Transferability Tuesday (03/22) NiladriBasuBal [DAG-CNN]Yang, Songfan, and Deva Ramanan."Multi-scale](https://reader033.vdocuments.us/reader033/viewer/2022042309/5ed6ebafff4a11075f770c2a/html5/thumbnails/32.jpg)
Sentence Embedding Model using Skip Thought
Training Data: Corpus of 11,038 books from the web
![Page 33: CAP 6412 Advanced Computer Vision - UCF CRCV · 2019-03-27 · Next week: DAG-CNN & Transferability Tuesday (03/22) NiladriBasuBal [DAG-CNN]Yang, Songfan, and Deva Ramanan."Multi-scale](https://reader033.vdocuments.us/reader033/viewer/2022042309/5ed6ebafff4a11075f770c2a/html5/thumbnails/33.jpg)
![Page 34: CAP 6412 Advanced Computer Vision - UCF CRCV · 2019-03-27 · Next week: DAG-CNN & Transferability Tuesday (03/22) NiladriBasuBal [DAG-CNN]Yang, Songfan, and Deva Ramanan."Multi-scale](https://reader033.vdocuments.us/reader033/viewer/2022042309/5ed6ebafff4a11075f770c2a/html5/thumbnails/34.jpg)
![Page 35: CAP 6412 Advanced Computer Vision - UCF CRCV · 2019-03-27 · Next week: DAG-CNN & Transferability Tuesday (03/22) NiladriBasuBal [DAG-CNN]Yang, Songfan, and Deva Ramanan."Multi-scale](https://reader033.vdocuments.us/reader033/viewer/2022042309/5ed6ebafff4a11075f770c2a/html5/thumbnails/35.jpg)
![Page 36: CAP 6412 Advanced Computer Vision - UCF CRCV · 2019-03-27 · Next week: DAG-CNN & Transferability Tuesday (03/22) NiladriBasuBal [DAG-CNN]Yang, Songfan, and Deva Ramanan."Multi-scale](https://reader033.vdocuments.us/reader033/viewer/2022042309/5ed6ebafff4a11075f770c2a/html5/thumbnails/36.jpg)
Encoder
![Page 37: CAP 6412 Advanced Computer Vision - UCF CRCV · 2019-03-27 · Next week: DAG-CNN & Transferability Tuesday (03/22) NiladriBasuBal [DAG-CNN]Yang, Songfan, and Deva Ramanan."Multi-scale](https://reader033.vdocuments.us/reader033/viewer/2022042309/5ed6ebafff4a11075f770c2a/html5/thumbnails/37.jpg)
Decoder
![Page 38: CAP 6412 Advanced Computer Vision - UCF CRCV · 2019-03-27 · Next week: DAG-CNN & Transferability Tuesday (03/22) NiladriBasuBal [DAG-CNN]Yang, Songfan, and Deva Ramanan."Multi-scale](https://reader033.vdocuments.us/reader033/viewer/2022042309/5ed6ebafff4a11075f770c2a/html5/thumbnails/38.jpg)
Encoder Decoder
![Page 39: CAP 6412 Advanced Computer Vision - UCF CRCV · 2019-03-27 · Next week: DAG-CNN & Transferability Tuesday (03/22) NiladriBasuBal [DAG-CNN]Yang, Songfan, and Deva Ramanan."Multi-scale](https://reader033.vdocuments.us/reader033/viewer/2022042309/5ed6ebafff4a11075f770c2a/html5/thumbnails/39.jpg)
![Page 40: CAP 6412 Advanced Computer Vision - UCF CRCV · 2019-03-27 · Next week: DAG-CNN & Transferability Tuesday (03/22) NiladriBasuBal [DAG-CNN]Yang, Songfan, and Deva Ramanan."Multi-scale](https://reader033.vdocuments.us/reader033/viewer/2022042309/5ed6ebafff4a11075f770c2a/html5/thumbnails/40.jpg)
Aligning Book w/ Movie
Inspired from the Multimodal Neural Model in Kiros et al.
Our approach: Embedding learnt
Training Data
● Each Clip Description:○ Descriptive Video Service dataset used to learn embedding○ 94 movies, 54000 described clips○ Pre-processing: Replace names with token ‘someone’
● Each Movie Clip vector: ○ mean-pooled features (GoogLeNet and hybrid-CNN) across each frame in the clip
![Page 41: CAP 6412 Advanced Computer Vision - UCF CRCV · 2019-03-27 · Next week: DAG-CNN & Transferability Tuesday (03/22) NiladriBasuBal [DAG-CNN]Yang, Songfan, and Deva Ramanan."Multi-scale](https://reader033.vdocuments.us/reader033/viewer/2022042309/5ed6ebafff4a11075f770c2a/html5/thumbnails/41.jpg)
Aligning Book w/ Movie
Vector Representation of description using LSTM:
The states correspond to input, forget, cell, output and memory vectors for embedding word xt of sentence at time t.
is vector representation of sentence of length N.
![Page 42: CAP 6412 Advanced Computer Vision - UCF CRCV · 2019-03-27 · Next week: DAG-CNN & Transferability Tuesday (03/22) NiladriBasuBal [DAG-CNN]Yang, Songfan, and Deva Ramanan."Multi-scale](https://reader033.vdocuments.us/reader033/viewer/2022042309/5ed6ebafff4a11075f770c2a/html5/thumbnails/42.jpg)
Aligning Book w/ Movie
Let q be a movie clip vector and its embedding
Scoring Function: s(m,v) = m . v
Optimize pairwise ranking loss:
where, mk is non-descriptive vector for embedding v and vk contrastive clip vector for sentence vector m
Model trained with Stochastic Gradient Descent w/o momentum
![Page 43: CAP 6412 Advanced Computer Vision - UCF CRCV · 2019-03-27 · Next week: DAG-CNN & Transferability Tuesday (03/22) NiladriBasuBal [DAG-CNN]Yang, Songfan, and Deva Ramanan."Multi-scale](https://reader033.vdocuments.us/reader033/viewer/2022042309/5ed6ebafff4a11075f770c2a/html5/thumbnails/43.jpg)
Context Aware SimilarityLocal-level ambiguity:
Despite being a dark novel, Gone Girl has 15 instances of “I love you”. Match not isolated from surrounding context .
To compute dialogue similarity:
● BLEU: to find near identical ● Tf-idf (term frequency–inverse document frequency): find duplicates but weighting
down less frequent words ● Skip-thought: to find semantically similar paraphrased sentences
![Page 44: CAP 6412 Advanced Computer Vision - UCF CRCV · 2019-03-27 · Next week: DAG-CNN & Transferability Tuesday (03/22) NiladriBasuBal [DAG-CNN]Yang, Songfan, and Deva Ramanan."Multi-scale](https://reader033.vdocuments.us/reader033/viewer/2022042309/5ed6ebafff4a11075f770c2a/html5/thumbnails/44.jpg)
Context Aware Similarity
To obtain Similarity, we take into account:
● Individual similarity measures● Fixed context window, in movie and book
Stack a set of M similarity measures into a tensor S(i, j, m), where
i: indices of sentences in the subtitle
j: in the book
m: individual similarity measures
![Page 45: CAP 6412 Advanced Computer Vision - UCF CRCV · 2019-03-27 · Next week: DAG-CNN & Transferability Tuesday (03/22) NiladriBasuBal [DAG-CNN]Yang, Songfan, and Deva Ramanan."Multi-scale](https://reader033.vdocuments.us/reader033/viewer/2022042309/5ed6ebafff4a11075f770c2a/html5/thumbnails/45.jpg)
Context Aware SimilarityM = 9 similarities measures used:
● Visual and sentence embedding ● BLEU1-5 ● tf-idf● A uniform prior
To predict a combined score(i, j) = f(S(I, J,M)) at each location (i, j) based on all similarity measures:
3-layer Convoluted Neural Network, with ReLU nonlinearity and dropout.
Cross-entropy optimized over training with Adam’s Algorithm
![Page 46: CAP 6412 Advanced Computer Vision - UCF CRCV · 2019-03-27 · Next week: DAG-CNN & Transferability Tuesday (03/22) NiladriBasuBal [DAG-CNN]Yang, Songfan, and Deva Ramanan."Multi-scale](https://reader033.vdocuments.us/reader033/viewer/2022042309/5ed6ebafff4a11075f770c2a/html5/thumbnails/46.jpg)
Context Aware Similarity
![Page 47: CAP 6412 Advanced Computer Vision - UCF CRCV · 2019-03-27 · Next week: DAG-CNN & Transferability Tuesday (03/22) NiladriBasuBal [DAG-CNN]Yang, Songfan, and Deva Ramanan."Multi-scale](https://reader033.vdocuments.us/reader033/viewer/2022042309/5ed6ebafff4a11075f770c2a/html5/thumbnails/47.jpg)
Global Movie/Book AlignmentMost scenes follow a timeline.
Dynamic time warping not suitable since storyline can have crossings in time
![Page 48: CAP 6412 Advanced Computer Vision - UCF CRCV · 2019-03-27 · Next week: DAG-CNN & Transferability Tuesday (03/22) NiladriBasuBal [DAG-CNN]Yang, Songfan, and Deva Ramanan."Multi-scale](https://reader033.vdocuments.us/reader033/viewer/2022042309/5ed6ebafff4a11075f770c2a/html5/thumbnails/48.jpg)
Global Movie/Book AlignmentMovie/Book alignment modelled as inference in a Conditional Random Field
● Each node yi: alignment (shot w/ subtitle, sentence in the book)● State space: set of all sentences in the book. ● CRF energy:
K: number of nodes (shots)N (i): the left and right neighbor of yiφu(·) unary potential: output of CNNψp(·) pairwise potentials measured by
ds(yi , yj ) time span between two neighbouring sentences in the subtitledb(yi , yj ) distance of their state space in the bookσ 2 is a robustness parameter to avoid punishing giant leaps too harsh
Pairwise potential to sure state consistency and incorporating long silence in the movie
![Page 49: CAP 6412 Advanced Computer Vision - UCF CRCV · 2019-03-27 · Next week: DAG-CNN & Transferability Tuesday (03/22) NiladriBasuBal [DAG-CNN]Yang, Songfan, and Deva Ramanan."Multi-scale](https://reader033.vdocuments.us/reader033/viewer/2022042309/5ed6ebafff4a11075f770c2a/html5/thumbnails/49.jpg)
Evaluation
Model: CNN + CRFDataset: 11 Books/MoviesTraining Data: 1 Book/Movie Gone GirlTest Data: Remaining 10 Movies
Recalled paragraph/shot considered Ground Truth, if:
● Paragraph at most 3 paragraphs away● Shot was at most 5 subtitles away
** Average Precision reported at multiple alignment thresholds
![Page 50: CAP 6412 Advanced Computer Vision - UCF CRCV · 2019-03-27 · Next week: DAG-CNN & Transferability Tuesday (03/22) NiladriBasuBal [DAG-CNN]Yang, Songfan, and Deva Ramanan."Multi-scale](https://reader033.vdocuments.us/reader033/viewer/2022042309/5ed6ebafff4a11075f770c2a/html5/thumbnails/50.jpg)
Evaluation: Movie/Book Alignment
![Page 51: CAP 6412 Advanced Computer Vision - UCF CRCV · 2019-03-27 · Next week: DAG-CNN & Transferability Tuesday (03/22) NiladriBasuBal [DAG-CNN]Yang, Songfan, and Deva Ramanan."Multi-scale](https://reader033.vdocuments.us/reader033/viewer/2022042309/5ed6ebafff4a11075f770c2a/html5/thumbnails/51.jpg)
Evaluation: Describing Movie via Book
![Page 52: CAP 6412 Advanced Computer Vision - UCF CRCV · 2019-03-27 · Next week: DAG-CNN & Transferability Tuesday (03/22) NiladriBasuBal [DAG-CNN]Yang, Songfan, and Deva Ramanan."Multi-scale](https://reader033.vdocuments.us/reader033/viewer/2022042309/5ed6ebafff4a11075f770c2a/html5/thumbnails/52.jpg)
Evaluation: Describing Movie via Book
![Page 53: CAP 6412 Advanced Computer Vision - UCF CRCV · 2019-03-27 · Next week: DAG-CNN & Transferability Tuesday (03/22) NiladriBasuBal [DAG-CNN]Yang, Songfan, and Deva Ramanan."Multi-scale](https://reader033.vdocuments.us/reader033/viewer/2022042309/5ed6ebafff4a11075f770c2a/html5/thumbnails/53.jpg)
Evaluation: Book Retrieval
![Page 54: CAP 6412 Advanced Computer Vision - UCF CRCV · 2019-03-27 · Next week: DAG-CNN & Transferability Tuesday (03/22) NiladriBasuBal [DAG-CNN]Yang, Songfan, and Deva Ramanan."Multi-scale](https://reader033.vdocuments.us/reader033/viewer/2022042309/5ed6ebafff4a11075f770c2a/html5/thumbnails/54.jpg)
Evaluation: Describe Shot via other Books
![Page 55: CAP 6412 Advanced Computer Vision - UCF CRCV · 2019-03-27 · Next week: DAG-CNN & Transferability Tuesday (03/22) NiladriBasuBal [DAG-CNN]Yang, Songfan, and Deva Ramanan."Multi-scale](https://reader033.vdocuments.us/reader033/viewer/2022042309/5ed6ebafff4a11075f770c2a/html5/thumbnails/55.jpg)
Evaluation: MSCOCO Image captioning
![Page 56: CAP 6412 Advanced Computer Vision - UCF CRCV · 2019-03-27 · Next week: DAG-CNN & Transferability Tuesday (03/22) NiladriBasuBal [DAG-CNN]Yang, Songfan, and Deva Ramanan."Multi-scale](https://reader033.vdocuments.us/reader033/viewer/2022042309/5ed6ebafff4a11075f770c2a/html5/thumbnails/56.jpg)
Thank you.