![Page 1: International Conference on Computer Vision (ICCV 2017) Li ...lsigal/532L/PP_ProgramsForVisualReasoning.pdf · Compositional visual reasoning Q: How many spheres are the left of the](https://reader030.vdocuments.us/reader030/viewer/2022041301/5e10d2c843e3ce5333489577/html5/thumbnails/1.jpg)
Presented by: Gursimran Singh, Borna Ghotbi
{msimar,bgotbi}@cs.ubc.ca
Inferring and Executing Programs for Visual Reasoning
Justin Johnson, Bharath Hariharan, Laurens van der Maaten, Judy Hoffman, Li Fei-Fei, C. Lawrence Zitnick, Ross Girshick
Stanford University, Facebook Research
International Conference on Computer Vision (ICCV 2017)
CPSC 532L presentation
![Page 2: International Conference on Computer Vision (ICCV 2017) Li ...lsigal/532L/PP_ProgramsForVisualReasoning.pdf · Compositional visual reasoning Q: How many spheres are the left of the](https://reader030.vdocuments.us/reader030/viewer/2022041301/5e10d2c843e3ce5333489577/html5/thumbnails/2.jpg)
Visual question answering
LSTM
CNN
MERGE Predict
Woman
Who is wearing glasses?
Antol et al. VQA: Visual Question Answering. Proceedings of the IEEE International Conference on Computer Vision, 2015
● Generalizes well to new kinds of questions
  ○ Who is wearing spectacles? How many people?
![Page 3: International Conference on Computer Vision (ICCV 2017) Li ...lsigal/532L/PP_ProgramsForVisualReasoning.pdf · Compositional visual reasoning Q: How many spheres are the left of the](https://reader030.vdocuments.us/reader030/viewer/2022041301/5e10d2c843e3ce5333489577/html5/thumbnails/3.jpg)
Compositional visual reasoning
Q: How many spheres are the left of the big sphere and the same color as the small rubber cylinder?
Johnson, Justin, et al. CLEVR: A diagnostic dataset for compositional language and elementary visual reasoning. (CVPR), 2017
![Page 4: International Conference on Computer Vision (ICCV 2017) Li ...lsigal/532L/PP_ProgramsForVisualReasoning.pdf · Compositional visual reasoning Q: How many spheres are the left of the](https://reader030.vdocuments.us/reader030/viewer/2022041301/5e10d2c843e3ce5333489577/html5/thumbnails/4.jpg)
Compositional visual reasoning
Q: How many spheres are the left of the big sphere and the same color as the small rubber cylinder?
Identify big sphere
Johnson, Justin, et al. CLEVR: A diagnostic dataset for compositional language and elementary visual reasoning. (CVPR), 2017
![Page 5: International Conference on Computer Vision (ICCV 2017) Li ...lsigal/532L/PP_ProgramsForVisualReasoning.pdf · Compositional visual reasoning Q: How many spheres are the left of the](https://reader030.vdocuments.us/reader030/viewer/2022041301/5e10d2c843e3ce5333489577/html5/thumbnails/5.jpg)
Compositional visual reasoning
Q: How many spheres are the left of the big sphere and the same color as the small rubber cylinder?
Identify big sphere
Spheres on left
Johnson, Justin, et al. CLEVR: A diagnostic dataset for compositional language and elementary visual reasoning. (CVPR), 2017
![Page 6: International Conference on Computer Vision (ICCV 2017) Li ...lsigal/532L/PP_ProgramsForVisualReasoning.pdf · Compositional visual reasoning Q: How many spheres are the left of the](https://reader030.vdocuments.us/reader030/viewer/2022041301/5e10d2c843e3ce5333489577/html5/thumbnails/6.jpg)
Compositional visual reasoning
Q: How many spheres are the left of the big sphere and the same color as the small rubber cylinder?
Identify big sphere
Spheres on left
Rubber cylinder
Johnson, Justin, et al. CLEVR: A diagnostic dataset for compositional language and elementary visual reasoning. (CVPR), 2017
![Page 7: International Conference on Computer Vision (ICCV 2017) Li ...lsigal/532L/PP_ProgramsForVisualReasoning.pdf · Compositional visual reasoning Q: How many spheres are the left of the](https://reader030.vdocuments.us/reader030/viewer/2022041301/5e10d2c843e3ce5333489577/html5/thumbnails/7.jpg)
Compositional visual reasoning
Q: How many spheres are the left of the big sphere and the same color as the small rubber cylinder?
Identify big sphere
Spheres on left
Rubber cylinder
Sphere of same color
Johnson, Justin, et al. CLEVR: A diagnostic dataset for compositional language and elementary visual reasoning. (CVPR), 2017
![Page 8: International Conference on Computer Vision (ICCV 2017) Li ...lsigal/532L/PP_ProgramsForVisualReasoning.pdf · Compositional visual reasoning Q: How many spheres are the left of the](https://reader030.vdocuments.us/reader030/viewer/2022041301/5e10d2c843e3ce5333489577/html5/thumbnails/8.jpg)
Compositional visual reasoning
Q: How many spheres are the left of the big sphere and the same color as the small rubber cylinder?
A: 1
Identify big sphere
Spheres on left
Rubber cylinder
Sphere of same color
Count
Johnson, Justin, et al. CLEVR: A diagnostic dataset for compositional language and elementary visual reasoning. (CVPR), 2017
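The reasoning steps listed on this slide can be sketched as a small symbolic program. This is only an illustration on a made-up scene: the object list, the `x` coordinate used for "left of", and every function name below are assumptions, not part of CLEVR or of the paper's code.

```python
# Toy scene (hypothetical): each object is a dict; "x" is a horizontal position.
scene = [
    {"shape": "sphere", "size": "large", "color": "gray", "material": "metal", "x": 5},
    {"shape": "sphere", "size": "small", "color": "red", "material": "rubber", "x": 2},
    {"shape": "sphere", "size": "small", "color": "blue", "material": "metal", "x": 1},
    {"shape": "cylinder", "size": "small", "color": "red", "material": "rubber", "x": 7},
    {"shape": "sphere", "size": "small", "color": "red", "material": "metal", "x": 8},
]


def filter_by(objects, **attrs):
    """Keep the objects whose attributes match every keyword given."""
    return [o for o in objects if all(o[k] == v for k, v in attrs.items())]


def unique(objects):
    """Return the single matching object, failing loudly if it is ambiguous."""
    if len(objects) != 1:
        raise ValueError(f"expected exactly one object, got {len(objects)}")
    return objects[0]


def left_of(objects, anchor):
    """Keep the objects that lie to the left of the anchor object."""
    return [o for o in objects if o["x"] < anchor["x"]]


def same_color(objects, reference):
    """Keep the objects (other than the reference) that share its color."""
    return [o for o in objects if o is not reference and o["color"] == reference["color"]]


def answer(objects):
    big_sphere = unique(filter_by(objects, shape="sphere", size="large"))      # Identify big sphere
    spheres_on_left = filter_by(left_of(objects, big_sphere), shape="sphere")  # Spheres on left
    rubber_cylinder = unique(
        filter_by(objects, shape="cylinder", size="small", material="rubber")  # Rubber cylinder
    )
    matching = same_color(spheres_on_left, rubber_cylinder)                    # Sphere of same color
    return len(matching)                                                       # Count


print(answer(scene))
```

Each step consumes the output of an earlier one, which is the kind of composition a single fixed CNN + LSTM pipeline has no explicit way to express.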
![Page 9: International Conference on Computer Vision (ICCV 2017) Li ...lsigal/532L/PP_ProgramsForVisualReasoning.pdf · Compositional visual reasoning Q: How many spheres are the left of the](https://reader030.vdocuments.us/reader030/viewer/2022041301/5e10d2c843e3ce5333489577/html5/thumbnails/9.jpg)
Standard VQA?
Q: How many spheres are the left of the big sphere and the same color as the small rubber cylinder?
LSTM
CNN
MERGE Predict
Answer?
LIMITATIONS
● Can’t model complex questions
![Page 10: International Conference on Computer Vision (ICCV 2017) Li ...lsigal/532L/PP_ProgramsForVisualReasoning.pdf · Compositional visual reasoning Q: How many spheres are the left of the](https://reader030.vdocuments.us/reader030/viewer/2022041301/5e10d2c843e3ce5333489577/html5/thumbnails/10.jpg)
Standard VQA?
Q: How many spheres are the right of the big sphere and the same color as the small rubber cylinder?
Q: How many spheres are the left of the big sphere and the same color as the small rubber cylinder?
LIMITATIONS
● Can’t model complex questions
● Lacks composition
LSTM
CNN
MERGE Predict
Answer?
![Page 11: International Conference on Computer Vision (ICCV 2017) Li ...lsigal/532L/PP_ProgramsForVisualReasoning.pdf · Compositional visual reasoning Q: How many spheres are the left of the](https://reader030.vdocuments.us/reader030/viewer/2022041301/5e10d2c843e3ce5333489577/html5/thumbnails/11.jpg)
Standard VQA?
Q: How many spheres are the right of the big sphere and the same color as the small rubber cylinder?
Q: How many spheres are the left of the big sphere and the same color as the small rubber cylinder?
Cylinder?
Sphere?
Predict
Answer?
LIMITATIONS
● Can’t model complex questions
● Lacks composition
Move Left
Spheres of same color
Decompose the network into multiple modules
![Page 12: International Conference on Computer Vision (ICCV 2017) Li ...lsigal/532L/PP_ProgramsForVisualReasoning.pdf · Compositional visual reasoning Q: How many spheres are the left of the](https://reader030.vdocuments.us/reader030/viewer/2022041301/5e10d2c843e3ce5333489577/html5/thumbnails/12.jpg)
Standard VQA?
Q: How many spheres are the right of the big sphere and the same color as the small rubber cylinder?
Q: How many spheres are the left of the big sphere and the same color as the small rubber cylinder?
Q: How many objects are either red cylinders or metal objects?
LIMITATIONS
● Can’t model complex questions
● Lacks composition
● Uses the same network structure for every question
LSTM
CNN
MERGE Predict
Answer?
![Page 13: International Conference on Computer Vision (ICCV 2017) Li ...lsigal/532L/PP_ProgramsForVisualReasoning.pdf · Compositional visual reasoning Q: How many spheres are the left of the](https://reader030.vdocuments.us/reader030/viewer/2022041301/5e10d2c843e3ce5333489577/html5/thumbnails/13.jpg)
Standard VQA?
Q: How many spheres are the right of the big sphere and the same color as the small rubber cylinder?
Q: How many spheres are the left of the big sphere and the same color as the small rubber cylinder?
Q: How many objects are either red cylinders or metal objects?
LIMITATIONS
● Can’t model complex questions
● Lacks composition
● Uses the same network structure for every question
Solution
● Use composition and structure
● Use separate networks for each question
[Figure: two example networks built from lettered modules (A, B, C, D, G, H)]
![Page 14: International Conference on Computer Vision (ICCV 2017) Li ...lsigal/532L/PP_ProgramsForVisualReasoning.pdf · Compositional visual reasoning Q: How many spheres are the left of the](https://reader030.vdocuments.us/reader030/viewer/2022041301/5e10d2c843e3ce5333489577/html5/thumbnails/14.jpg)
Instead: consider a compositional model
Q: How many spheres are the right of the big sphere and the same color as the small rubber cylinder?
Q: How many spheres are the left of the big sphere and the same color as the small rubber cylinder?
Q: Is the big sphere the same material as the thing on the right of the cube?
Network architecture corresponding to the third question
Common operations
● Attribute identification
● Counting objects
● Comparisons
● Spatial relationships
● Logical operations
![Page 15: International Conference on Computer Vision (ICCV 2017) Li ...lsigal/532L/PP_ProgramsForVisualReasoning.pdf · Compositional visual reasoning Q: How many spheres are the left of the](https://reader030.vdocuments.us/reader030/viewer/2022041301/5e10d2c843e3ce5333489577/html5/thumbnails/15.jpg)
Overview of approach
Graphics taken from https://www.youtube.com/watch?v=3pCLma2FqSk
![Page 16: International Conference on Computer Vision (ICCV 2017) Li ...lsigal/532L/PP_ProgramsForVisualReasoning.pdf · Compositional visual reasoning Q: How many spheres are the left of the](https://reader030.vdocuments.us/reader030/viewer/2022041301/5e10d2c843e3ce5333489577/html5/thumbnails/16.jpg)
Overview of approach
![Page 17: International Conference on Computer Vision (ICCV 2017) Li ...lsigal/532L/PP_ProgramsForVisualReasoning.pdf · Compositional visual reasoning Q: How many spheres are the left of the](https://reader030.vdocuments.us/reader030/viewer/2022041301/5e10d2c843e3ce5333489577/html5/thumbnails/17.jpg)
Module networks
NLP semantic parser
![Page 18: International Conference on Computer Vision (ICCV 2017) Li ...lsigal/532L/PP_ProgramsForVisualReasoning.pdf · Compositional visual reasoning Q: How many spheres are the left of the](https://reader030.vdocuments.us/reader030/viewer/2022041301/5e10d2c843e3ce5333489577/html5/thumbnails/18.jpg)
Module networks
NLP semantic parser
![Page 19: International Conference on Computer Vision (ICCV 2017) Li ...lsigal/532L/PP_ProgramsForVisualReasoning.pdf · Compositional visual reasoning Q: How many spheres are the left of the](https://reader030.vdocuments.us/reader030/viewer/2022041301/5e10d2c843e3ce5333489577/html5/thumbnails/19.jpg)
Modules recap
Andreas et al. Deep Compositional Question Answering with Neural Module Networks. arXiv, 2015
![Page 20: International Conference on Computer Vision (ICCV 2017) Li ...lsigal/532L/PP_ProgramsForVisualReasoning.pdf · Compositional visual reasoning Q: How many spheres are the left of the](https://reader030.vdocuments.us/reader030/viewer/2022041301/5e10d2c843e3ce5333489577/html5/thumbnails/20.jpg)
Module networks - limitations
NLP semantic parser
Trained separately
Uses a pre-trained parser
![Page 21: International Conference on Computer Vision (ICCV 2017) Li ...lsigal/532L/PP_ProgramsForVisualReasoning.pdf · Compositional visual reasoning Q: How many spheres are the left of the](https://reader030.vdocuments.us/reader030/viewer/2022041301/5e10d2c843e3ce5333489577/html5/thumbnails/21.jpg)
Inferring and executing programs
Trained end-to-end!
![Page 22: International Conference on Computer Vision (ICCV 2017) Li ...lsigal/532L/PP_ProgramsForVisualReasoning.pdf · Compositional visual reasoning Q: How many spheres are the left of the](https://reader030.vdocuments.us/reader030/viewer/2022041301/5e10d2c843e3ce5333489577/html5/thumbnails/22.jpg)
Inferring and executing programs
![Page 23: International Conference on Computer Vision (ICCV 2017) Li ...lsigal/532L/PP_ProgramsForVisualReasoning.pdf · Compositional visual reasoning Q: How many spheres are the left of the](https://reader030.vdocuments.us/reader030/viewer/2022041301/5e10d2c843e3ce5333489577/html5/thumbnails/23.jpg)
Execution engine
![Page 24: International Conference on Computer Vision (ICCV 2017) Li ...lsigal/532L/PP_ProgramsForVisualReasoning.pdf · Compositional visual reasoning Q: How many spheres are the left of the](https://reader030.vdocuments.us/reader030/viewer/2022041301/5e10d2c843e3ce5333489577/html5/thumbnails/24.jpg)
Module architectures
a) Visual feature extraction
b.1) Unary modules
b.2) Binary modules
d) Classifier
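A minimal sketch of how an execution engine might wire these pieces together: shared visual features go in, a program written in prefix order selects unary and binary modules, and a classifier reads the final feature map. Everything here is an assumption made for illustration: the module names, the tiny feature-map shape, and the use of simple 1x1 channel mixing in place of the paper's actual convolutional layers.

```python
import numpy as np

rng = np.random.default_rng(0)
C, H, W = 4, 3, 3  # toy feature-map shape (hypothetical)


def make_unary():
    """Unary module: one feature map in, one out (residual 1x1 channel mixing)."""
    w = rng.normal(scale=0.1, size=(C, C))

    def module(x):
        return x + np.maximum(np.einsum("oc,chw->ohw", w, x), 0.0)

    return module


def make_binary():
    """Binary module: concatenate two feature maps along channels, project back to C."""
    w = rng.normal(scale=0.1, size=(C, 2 * C))

    def module(a, b):
        stacked = np.concatenate([a, b], axis=0)
        return np.maximum(np.einsum("oc,chw->ohw", w, stacked), 0.0)

    return module


# Function name -> (arity, module). "scene" has arity 0 and returns the image features.
MODULES = {
    "scene": (0, None),
    "filter_sphere": (1, make_unary()),
    "relate_left": (1, make_unary()),
    "intersect": (2, make_binary()),
}


def execute(program, features):
    """Run a prefix-order program; return (output, tokens not yet consumed)."""
    name, rest = program[0], program[1:]
    arity, module = MODULES[name]
    if arity == 0:
        return features, rest
    inputs = []
    for _ in range(arity):
        out, rest = execute(rest, features)
        inputs.append(out)
    return module(*inputs), rest


def classify(x, w_cls):
    """Classifier: flatten the final feature map and pick the highest-scoring answer."""
    return int(np.argmax(w_cls @ x.reshape(-1)))


features = rng.normal(size=(C, H, W))  # stand-in for CNN features of the image
program = ["intersect", "filter_sphere", "scene", "relate_left", "filter_sphere", "scene"]
out, leftover = execute(program, features)
w_cls = rng.normal(size=(5, C * H * W))  # 5 hypothetical answer classes
answer_index = classify(out, w_cls)
```

Because every module maps feature maps of one shape to feature maps of the same shape, any valid program yields a network that type-checks, which is what lets a different architecture be built for each question.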
![Page 25: International Conference on Computer Vision (ICCV 2017) Li ...lsigal/532L/PP_ProgramsForVisualReasoning.pdf · Compositional visual reasoning Q: How many spheres are the left of the](https://reader030.vdocuments.us/reader030/viewer/2022041301/5e10d2c843e3ce5333489577/html5/thumbnails/25.jpg)
What do the modules learn?
![Page 26: International Conference on Computer Vision (ICCV 2017) Li ...lsigal/532L/PP_ProgramsForVisualReasoning.pdf · Compositional visual reasoning Q: How many spheres are the left of the](https://reader030.vdocuments.us/reader030/viewer/2022041301/5e10d2c843e3ce5333489577/html5/thumbnails/26.jpg)
Training
● Train Program Generator
● Freeze Program Generator, Train Execution Engine
● Finetune (REINFORCE)
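The REINFORCE step can be illustrated with a toy score-function update. This sketch is not the paper's training code: it assumes a "program generator" that is just a softmax over three candidate programs, and a stand-in reward of 1 when the sampled program would lead to the right answer. The learning rate, baseline decay, and all names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
NUM_PROGRAMS = 3  # hypothetical candidate programs for one question
CORRECT = 2       # index of the program that yields the right answer


def reward(program_index):
    """Stand-in for running the execution engine and checking the answer."""
    return 1.0 if program_index == CORRECT else 0.0


def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()


logits = np.zeros(NUM_PROGRAMS)  # parameters of the toy program generator
baseline = 0.0                   # moving-average reward, used to reduce variance

for step in range(500):
    probs = softmax(logits)
    sampled = rng.choice(NUM_PROGRAMS, p=probs)  # sampling is not differentiable
    r = reward(sampled)
    grad_log_prob = -probs                       # d log p(sampled) / d logits ...
    grad_log_prob[sampled] += 1.0                # ... equals one_hot(sampled) - probs
    logits += 0.5 * (r - baseline) * grad_log_prob
    baseline = 0.9 * baseline + 0.1 * r

final_probs = softmax(logits)
```

The point of the estimator is that the discrete choice of program blocks ordinary backpropagation, so the generator is updated from the reward alone.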
![Page 27: International Conference on Computer Vision (ICCV 2017) Li ...lsigal/532L/PP_ProgramsForVisualReasoning.pdf · Compositional visual reasoning Q: How many spheres are the left of the](https://reader030.vdocuments.us/reader030/viewer/2022041301/5e10d2c843e3ce5333489577/html5/thumbnails/27.jpg)
CLEVR dataset
● A training set of 70,000 images and 699,989 questions
● A validation set of 15,000 images and 149,991 questions
● A test set of 15,000 images and 14,988 questions
● Answers for all train and val questions
● Scene graph annotations for train and val images giving ground-truth locations, attributes, and relationships for objects
● Objects can be cubes, cylinders and spheres.
![Page 28: International Conference on Computer Vision (ICCV 2017) Li ...lsigal/532L/PP_ProgramsForVisualReasoning.pdf · Compositional visual reasoning Q: How many spheres are the left of the](https://reader030.vdocuments.us/reader030/viewer/2022041301/5e10d2c843e3ce5333489577/html5/thumbnails/28.jpg)
Experiments: Baselines
![Page 29: International Conference on Computer Vision (ICCV 2017) Li ...lsigal/532L/PP_ProgramsForVisualReasoning.pdf · Compositional visual reasoning Q: How many spheres are the left of the](https://reader030.vdocuments.us/reader030/viewer/2022041301/5e10d2c843e3ce5333489577/html5/thumbnails/29.jpg)
Experiments: Strongly and semi-supervised learning
![Page 30: International Conference on Computer Vision (ICCV 2017) Li ...lsigal/532L/PP_ProgramsForVisualReasoning.pdf · Compositional visual reasoning Q: How many spheres are the left of the](https://reader030.vdocuments.us/reader030/viewer/2022041301/5e10d2c843e3ce5333489577/html5/thumbnails/30.jpg)
Experiments
Generalizing to new attribute combinations
![Page 31: International Conference on Computer Vision (ICCV 2017) Li ...lsigal/532L/PP_ProgramsForVisualReasoning.pdf · Compositional visual reasoning Q: How many spheres are the left of the](https://reader030.vdocuments.us/reader030/viewer/2022041301/5e10d2c843e3ce5333489577/html5/thumbnails/31.jpg)
Experiments
Generalizing to new question types
Short: all questions whose question family has a mean program length less than 16
Long: otherwise
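The short/long split is a simple threshold on per-family mean program length. A small sketch, assuming program lengths are already grouped by question family; the family names and lengths below are invented for illustration.

```python
from statistics import mean

# Hypothetical data: question family -> program lengths of its questions.
program_lengths = {
    "family_a": [8, 10, 12],
    "family_b": [15, 17, 19],
    "family_c": [14, 16, 15],
}


def split_families(lengths_by_family, threshold=16):
    """Return (short, long) lists of family names, split on mean program length."""
    short, long_ = [], []
    for family, lengths in lengths_by_family.items():
        if mean(lengths) < threshold:
            short.append(family)
        else:
            long_.append(family)
    return short, long_


short_families, long_families = split_families(program_lengths)
```

Note that the split is per family, so a short family can still contain individual questions with programs of length 16 or more.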
![Page 32: International Conference on Computer Vision (ICCV 2017) Li ...lsigal/532L/PP_ProgramsForVisualReasoning.pdf · Compositional visual reasoning Q: How many spheres are the left of the](https://reader030.vdocuments.us/reader030/viewer/2022041301/5e10d2c843e3ce5333489577/html5/thumbnails/32.jpg)
Experiments
The CLEVR-Humans Dataset
● Uses questions that are hard for a “smart robot” to answer
● Questions were filtered by asking three workers to answer them and removing those that a majority of workers answered incorrectly.
● About 17,000 training questions and 7,000 validation and test questions on CLEVR images.
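The filtering rule can be sketched as a majority check. This assumes each question comes with a reference answer to compare the workers' answers against, which the slide does not spell out; the function and variable names are hypothetical.

```python
def keep_question(worker_answers, reference_answer):
    """Keep a question unless a majority of workers answered it incorrectly."""
    wrong = sum(1 for a in worker_answers if a != reference_answer)
    return wrong * 2 <= len(worker_answers)


# Hypothetical examples: (answers from three workers, reference answer).
candidates = [
    (["2", "2", "3"], "2"),           # one wrong out of three: kept
    (["red", "blue", "green"], "red"),  # two wrong out of three: removed
    (["yes", "yes", "yes"], "yes"),   # none wrong: kept
]
kept = [keep_question(answers, ref) for answers, ref in candidates]
```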
![Page 33: International Conference on Computer Vision (ICCV 2017) Li ...lsigal/532L/PP_ProgramsForVisualReasoning.pdf · Compositional visual reasoning Q: How many spheres are the left of the](https://reader030.vdocuments.us/reader030/viewer/2022041301/5e10d2c843e3ce5333489577/html5/thumbnails/33.jpg)
Experiments
Human-composed questions
![Page 34: International Conference on Computer Vision (ICCV 2017) Li ...lsigal/532L/PP_ProgramsForVisualReasoning.pdf · Compositional visual reasoning Q: How many spheres are the left of the](https://reader030.vdocuments.us/reader030/viewer/2022041301/5e10d2c843e3ce5333489577/html5/thumbnails/34.jpg)
Results
![Page 35: International Conference on Computer Vision (ICCV 2017) Li ...lsigal/532L/PP_ProgramsForVisualReasoning.pdf · Compositional visual reasoning Q: How many spheres are the left of the](https://reader030.vdocuments.us/reader030/viewer/2022041301/5e10d2c843e3ce5333489577/html5/thumbnails/35.jpg)
Other approaches
Andreas et al., ICCV 2017; Santoro et al., arXiv 2017
![Page 36: International Conference on Computer Vision (ICCV 2017) Li ...lsigal/532L/PP_ProgramsForVisualReasoning.pdf · Compositional visual reasoning Q: How many spheres are the left of the](https://reader030.vdocuments.us/reader030/viewer/2022041301/5e10d2c843e3ce5333489577/html5/thumbnails/36.jpg)
Strengths and weaknesses
Strengths
● Novel idea of using compositional reasoning to answer complex questions
● Trains the program generator on questions using LSTMs
● Trains the whole network end to end
Weaknesses
● Not enough results on real images!
● More complex questions may not work properly
![Page 37: International Conference on Computer Vision (ICCV 2017) Li ...lsigal/532L/PP_ProgramsForVisualReasoning.pdf · Compositional visual reasoning Q: How many spheres are the left of the](https://reader030.vdocuments.us/reader030/viewer/2022041301/5e10d2c843e3ce5333489577/html5/thumbnails/37.jpg)
Future work / possible improvements
Ideas taken from the paper
● Adding ternary operations (if/then/else) and loops (for, do) to answer questions like “What color is the object with a unique shape?”
● Control-flow operators could be incorporated into the framework
● Learning programs with limited supervision
Our ideas
● Using TreeRNNs to synthesize programs
● Testing the whole framework on real images
![Page 38: International Conference on Computer Vision (ICCV 2017) Li ...lsigal/532L/PP_ProgramsForVisualReasoning.pdf · Compositional visual reasoning Q: How many spheres are the left of the](https://reader030.vdocuments.us/reader030/viewer/2022041301/5e10d2c843e3ce5333489577/html5/thumbnails/38.jpg)
Conclusion
● This method outperforms previous baselines.
● Neural module networks are a more natural way to reproduce reasoning steps.
● Because modules have generic architectures, there is more flexibility in composing the neural module network.
![Page 39: International Conference on Computer Vision (ICCV 2017) Li ...lsigal/532L/PP_ProgramsForVisualReasoning.pdf · Compositional visual reasoning Q: How many spheres are the left of the](https://reader030.vdocuments.us/reader030/viewer/2022041301/5e10d2c843e3ce5333489577/html5/thumbnails/39.jpg)
References
● Inferring and Executing Programs for Visual Reasoning
● CLEVR: A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning
● Talk - https://www.youtube.com/watch?v=3pCLma2FqSk
● Learning to Reason: End-to-End Module Networks for Visual Question Answering
● https://github.com/facebookresearch/clevr-iep
● Deep Compositional Question Answering with Neural Module Networks
![Page 40: International Conference on Computer Vision (ICCV 2017) Li ...lsigal/532L/PP_ProgramsForVisualReasoning.pdf · Compositional visual reasoning Q: How many spheres are the left of the](https://reader030.vdocuments.us/reader030/viewer/2022041301/5e10d2c843e3ce5333489577/html5/thumbnails/40.jpg)
Thanks!
![Page 41: International Conference on Computer Vision (ICCV 2017) Li ...lsigal/532L/PP_ProgramsForVisualReasoning.pdf · Compositional visual reasoning Q: How many spheres are the left of the](https://reader030.vdocuments.us/reader030/viewer/2022041301/5e10d2c843e3ce5333489577/html5/thumbnails/41.jpg)
Visual question answering
LSTM
CNN
MERGE Predict
Woman
Who is wearing hat?
● But does not really understand the question; gives the same answer for
  ○ Who is wearing hat? Who is wearing? Wearing?
Antol et al. VQA: Visual Question Answering. Proceedings of the IEEE International Conference on Computer Vision, 2015
![Page 42: International Conference on Computer Vision (ICCV 2017) Li ...lsigal/532L/PP_ProgramsForVisualReasoning.pdf · Compositional visual reasoning Q: How many spheres are the left of the](https://reader030.vdocuments.us/reader030/viewer/2022041301/5e10d2c843e3ce5333489577/html5/thumbnails/42.jpg)
Standard VQA?
Q: How many spheres are the right of the big sphere and the same color as the small rubber cylinder?
Q: How many spheres are the left of the big sphere and the same color as the small rubber cylinder?
INPUT Predict
Answer?
Identify Sphere
Left / Right
Decompose the network!