Post on 17-Jan-2017


Open-ended Visual Question-Answering

[thesis][web][code]

Issey Masuda Mora, Santiago Pascual de la Puente, Xavier Giró i Nieto

Roadmap

● Introduction
● Related Work
● Methodology
● Results
● Conclusions
● Future Work


Introduction


Visual Question-Answering

Antol, S., Agrawal, A., Lu, J., Mitchell, M., Batra, D., Lawrence Zitnick, C., & Parikh, D. (2015). VQA: Visual question answering. In Proceedings of the IEEE International Conference on Computer Vision (pp. 2425-2433).

Predict the answer to a given question about an image

Visual Question-Answering: Types


Real images Abstract scenes

Multiple-choice

Open-ended

Q: Does it appear to be rainy?

A: no

Q: What is just under the tree?

A: a ball

Q: How many slices of pizza are there?

A: 1, 2, 3, 4

Q: What is for dessert?

A: cake, ice cream, cheesecake, pie

Example

Question: What is bobbing in the water other than the boats?

Answer: buoys

Motivation


New visual Turing test

Motivation: AI research

● Multidisciplinary tasks
● Models able to perform more complex activities
● Different sub-problems tackled at once

Computer Vision

Knowledge Representation and Reasoning

Natural Language Processing


Related Work


Deep Learning

Credit: Google

VQA: Common approach

● Visual representation: CNN
● Textual representation: word/sentence embedding + LSTM
● Merge both representations and predict the answer

Question: What object is flying?

Answer: Kite
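To make the data flow of this common approach concrete, here is a minimal, framework-free sketch. It assumes the CNN image feature and the LSTM question encoding have already been computed as fixed-length vectors. Merging by concatenation, the single linear scoring layer, the toy three-answer vocabulary and all function names are illustrative assumptions, not a specific published model.

```python
import math


def merge(visual: list[float], textual: list[float]) -> list[float]:
    """Merge the two modalities by concatenation (one common choice;
    element-wise product or sum are also used)."""
    return visual + textual


def softmax(scores: list[float]) -> list[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_answer(visual, textual, weights, bias, vocabulary):
    """Score every candidate answer with one linear layer over the merged
    vector and return the most probable answer and its probability."""
    joint = merge(visual, textual)
    scores = [sum(w * x for w, x in zip(row, joint)) + b
              for row, b in zip(weights, bias)]
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return vocabulary[best], probs[best]


# Toy example: 2-d "CNN" feature, 2-d "LSTM" question state, 3 answers.
vocabulary = ["kite", "bird", "plane"]
weights = [[1.0, 0.0, 1.0, 0.0],
           [0.0, 1.0, 0.0, 0.0],
           [0.0, 0.0, 0.0, 1.0]]
bias = [0.0, 0.0, 0.0]
answer, p = predict_answer([0.9, 0.1], [0.8, 0.2], weights, bias, vocabulary)
print(answer, round(p, 3))
```

Swapping concatenation for an element-wise product or sum only changes `merge`, but those variants need both vectors to have the same length, which is one reason a model may first project the image feature into the text embedding space.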

Tools: Convolutional Neural Networks (CNN)

Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems (pp. 1097-1105).

AlexNet
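The core operation of a CNN such as AlexNet is a small learned filter slid over the image. A minimal single-channel sketch, assuming stride 1 and no padding ("valid" convolution) and, as deep learning libraries do, computing cross-correlation (the kernel is not flipped). It is followed by ReLU and 2x2 max pooling, two other building blocks of AlexNet-style networks. The function names and the hand-set edge-detecting kernel are illustrative; in a real CNN the kernel values are learned.

```python
def conv2d(image, kernel):
    """Valid 2-D cross-correlation of a single-channel image with a kernel."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[i + a][j + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]


def relu(fmap):
    """Zero out negative responses."""
    return [[max(0.0, v) for v in row] for row in fmap]


def max_pool2x2(fmap):
    """Keep the maximum of each non-overlapping 2x2 window."""
    return [[max(fmap[i][j], fmap[i][j + 1], fmap[i + 1][j], fmap[i + 1][j + 1])
             for j in range(0, len(fmap[0]) - 1, 2)]
            for i in range(0, len(fmap) - 1, 2)]


image = [[0, 0, 1, 1, 1] for _ in range(5)]  # dark left half, bright right half
kernel = [[-1, 1], [-1, 1]]                  # responds to a dark-to-bright step
feature = max_pool2x2(relu(conv2d(image, kernel)))
print(feature)
```

The pooled map is strong only where the vertical edge lies, which is the kind of local pattern the first CNN layers pick up before deeper layers combine them.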


Methodology


First steps: Text-based QA


Extending text-based QA for VQA

Simonyan, K., & Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556.

Substitute VGG-16 with KCNN

Liu, Z. (2015). Kernelized Deep Convolutional Neural Network for Describing Complex Images. arXiv preprint arXiv:1509.04581.

Sentence embedding and image projection


Image

Question

Answer
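The slide names two components: a sentence embedding of the question and a projection of the image features. A minimal sketch of one plausible reading, under several assumptions that are ours and not necessarily the project's exact design: the sentence embedding is the sum of the question's word vectors, the image projection is a single linear layer mapping the (KCNN) feature vector to the embedding size, the two are merged by element-wise sum, and a one-word answer is chosen by dot product against a separate table of answer-word vectors. All names, sizes and weights are hypothetical.

```python
def sentence_embedding(words, word_vectors, dim):
    """Sum the embedding of each known word (unknown words are skipped)."""
    total = [0.0] * dim
    for w in words:
        vec = word_vectors.get(w)
        if vec is not None:
            total = [t + v for t, v in zip(total, vec)]
    return total


def project_image(features, weights, bias):
    """Linear projection W.x + b of the image features into the word space."""
    return [sum(w * x for w, x in zip(row, features)) + b
            for row, b in zip(weights, bias)]


def answer_word(question, image_features, word_vectors, answer_vectors,
                proj_w, proj_b):
    dim = len(proj_b)
    q = sentence_embedding(question.lower().rstrip("?").split(), word_vectors, dim)
    v = project_image(image_features, proj_w, proj_b)
    joint = [a + b for a, b in zip(q, v)]  # element-wise sum in the shared space
    # Pick the answer word whose vector has the largest dot product with joint.
    return max(answer_vectors,
               key=lambda w: sum(a * b for a, b in zip(answer_vectors[w], joint)))


# Toy 3-d embedding space and a 2-d image feature (all values made up).
word_vectors = {"what": [0.0, 0.0, 1.0], "is": [0.0, 0.0, 1.0],
                "flying": [1.0, 0.0, 0.0]}
answer_vectors = {"kite": [1.0, 1.0, 0.0], "dog": [0.0, 1.0, 0.0],
                  "ball": [0.0, 0.0, 0.5]}
proj_w = [[0.0, 0.0], [1.0, 0.0], [0.0, 0.0]]
proj_b = [0.0, 0.0, 0.0]
print(answer_word("What is flying?", [1.0, 0.0], word_vectors, answer_vectors,
                  proj_w, proj_b))
```

Projecting the image into the word space is what makes the element-wise sum well defined: both vectors must live in the same space and have the same dimensionality.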


Results


VQA Dataset: Real Images, Open-ended questions


Antol, S., Agrawal, A., Lu, J., Mitchell, M., Batra, D., Lawrence Zitnick, C., & Parikh, D. (2015). VQA: Visual question answering. ICCV 2015.

1 image × 3 questions × 10 answers per question

Evaluation


Metric:

Script (answer normalization):

● Convert characters to lowercase
● Remove periods (except decimal points)
● Convert number words to digits
● Remove articles
● Add apostrophes to contractions
● Replace punctuation with a space
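The VQA accuracy metric (Antol et al., 2015) counts a predicted answer as fully correct when at least 3 of the 10 human annotators gave it: Acc(ans) = min(#humans who gave ans / 3, 1). A minimal sketch of that metric together with the normalization steps listed above, assuming simplified word lists: the official evaluation script uses longer contraction and punctuation tables and treats some punctuation cases differently. The leave-one-out averaging over the 10 human answers reflects our understanding of the official code. Function and constant names are hypothetical.

```python
import re

# Illustrative subsets, not the official tables
NUMBER_WORDS = {"zero": "0", "one": "1", "two": "2", "three": "3",
                "four": "4", "five": "5", "six": "6", "seven": "7",
                "eight": "8", "nine": "9", "ten": "10"}
ARTICLES = {"a", "an", "the"}
CONTRACTIONS = {"dont": "don't", "doesnt": "doesn't", "isnt": "isn't",
                "arent": "aren't", "cant": "can't", "wont": "won't"}
# Replaced by a space; apostrophes are kept so contractions survive
PUNCTUATION = set(';/[]"{}()=+\\_-><@`,?!')


def normalize(answer: str) -> str:
    """Simplified version of the normalization steps listed on the slide."""
    text = answer.lower().strip()
    # Remove periods unless they sit between two digits (decimal points)
    text = re.sub(r"(?<!\d)\.|\.(?!\d)", "", text)
    text = "".join(" " if ch in PUNCTUATION else ch for ch in text)
    words = []
    for word in text.split():
        word = NUMBER_WORDS.get(word, word)
        if word in ARTICLES:
            continue
        words.append(CONTRACTIONS.get(word, word))
    return " ".join(words)


def vqa_accuracy(predicted: str, human_answers: list[str]) -> float:
    """min(#matching humans / 3, 1), averaged over every subset obtained by
    leaving one human answer out."""
    pred = normalize(predicted)
    gts = [normalize(a) for a in human_answers]
    scores = []
    for i in range(len(gts)):
        others = gts[:i] + gts[i + 1:]
        matches = sum(1 for a in others if a == pred)
        scores.append(min(1.0, matches / 3))
    return sum(scores) / len(scores)


print(vqa_accuracy("Two", ["2"] * 10))
print(vqa_accuracy("2", ["2", "2"] + ["3"] * 8))
```

Because of the min(·/3, 1) cap, an answer given by only two annotators still earns partial credit, which suits open-ended questions where humans themselves disagree.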

VQA Challenge

53.62% at the CVPR 2016 VQA Challenge

Real Images, Open-ended; test-standard dataset partition

Results in detail


Accuracy (%) per question type ("-" = not evaluated on the test set)

           Validation set                     Test set
Model      Yes/No  Number  Other  Overall     Yes/No  Number  Other  Overall
Model 1    71.82   23.79   27.99  43.87       71.62   28.76   29.32  46.70
Model 3    75.02   28.60   29.30  46.32       -       -       -      -
Model 2    75.62   31.81   28.11  46.36       -       -       -      -
Model 5    78.15   32.79   33.91  50.32       78.15   36.20   35.26  53.03
Model 4    78.73   32.82   35.50  51.34       78.02   35.68   36.54  53.62

Results in context

● Humans: 83.30%
● UC Berkeley & Sony: 66.47%
● Baseline LSTM & CNN: 54.06%
● Baseline Nearest neighbor: 42.85%
● Baseline Prior per question type: 37.47%
● Baseline All yes: 29.88%
● Ours: 53.62%

Comparison with the baseline

Our model
● Single-word answers
● Generates answers

Baseline
● Multi-word answers (hardcoded)
● Classifies over the 1000 most common answers
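The baseline treats VQA as classification over a fixed answer set: the 1000 most common training answers each become one class, so a multi-word answer is just one label. A minimal sketch of building that answer set and mapping answers to class indices; the function names and the choice to map out-of-set answers to `None` (so they can be dropped from training) are assumptions for illustration.

```python
from collections import Counter
from typing import Optional


def build_answer_classes(train_answers: list[str], k: int = 1000) -> dict[str, int]:
    """Map each of the k most frequent training answers to a class index."""
    return {answer: index
            for index, (answer, _) in enumerate(Counter(train_answers).most_common(k))}


def to_target(answer: str, classes: dict[str, int]) -> Optional[int]:
    """Class index of an answer, or None if it falls outside the answer set."""
    return classes.get(answer)


def coverage(train_answers: list[str], classes: dict[str, int]) -> float:
    """Fraction of training answers that the fixed answer set can express."""
    return sum(1 for a in train_answers if a in classes) / len(train_answers)


answers = ["yes", "no", "yes", "2", "yes", "no"]
classes = build_answer_classes(answers, k=2)
print(classes, to_target("2", classes), coverage(answers, classes))
```

Answers outside the top-k set can never be predicted by such a classifier; `coverage` measures how much of the training data that ceiling excludes.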

Qualitative results: I


Qualitative results: II


Deep Python Project

https://github.com/imatge-upc/vqa-2016-cvprw

Research contribution: Extended abstract

VQA workshop, CVPR 2016

Research contribution: Extended abstract - Poster


… ticket to Las Vegas

Presenting our poster and extended abstract at CVPR 2016, Las Vegas, USA

VQA Challenge statistics: Answering method



Conclusions


Conclusion

Goals accomplished

✓ Submit to the VQA Challenge, CVPR 2016
✓ First GPI project using text processing techniques
✓ Create a scalable VQA model
✓ Build a modular and reusable software package
✓ Extended abstract accepted to the VQA workshop, CVPR 2016

Conclusion: Personal overview

● Submission to the VQA Challenge
● VQA, a hot topic at CVPR 2016
● Model designed to generate answers instead of classifying them
● Question-Answer pair generation proposal


Future Work


Future work

Next steps

● Decoder for multiple-word answers
● Character embedding
● Attention mechanisms
● Question-Answer pair generation

Automatic Question-Answer Pairs Generation


Thank You!

Do you have any questions?

Project resource links

● Thesis: https://imatge.upc.edu/web/sites/default/files/pub/xMasuda-Mora_0.pdf
● Web page: http://imatge-upc.github.io/vqa-2016-cvprw/
● Source code: https://github.com/imatge-upc/vqa-2016-cvprw

Motivation: First steps towards QA Generation


AI System

Question: What is the man doing?

Answer: Surf

Experiments: Batch Normalization
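For reference, a minimal sketch of the training-mode forward pass of batch normalization (Ioffe & Szegedy, 2015), the technique these experiments evaluate. Each feature is normalized with the mean and (biased) variance of the current mini-batch, then scaled by a learned gamma and shifted by a learned beta. The function name and the eps default are assumptions; inference mode, which uses running averages of the statistics instead of batch statistics, is not shown.

```python
import math


def batch_norm(batch, gamma, beta, eps=1e-5):
    """Training-mode batch normalization.

    `batch` is a list of examples, each a list of feature values. Every
    feature column is normalized across the batch, then scaled and shifted.
    """
    n = len(batch)
    num_features = len(batch[0])
    out = [[0.0] * num_features for _ in range(n)]
    for f in range(num_features):
        column = [example[f] for example in batch]
        mean = sum(column) / n
        var = sum((x - mean) ** 2 for x in column) / n  # biased variance
        for i, x in enumerate(column):
            out[i][f] = gamma[f] * (x - mean) / math.sqrt(var + eps) + beta[f]
    return out


out = batch_norm([[1.0, 10.0], [3.0, 10.0]], gamma=[1.0, 1.0], beta=[0.0, 0.0])
print(out)
```

The first feature comes out close to [-1, 1]; the second is constant across the batch, so it maps to beta (here 0), and eps is what prevents a division by zero in that case.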


Losses I


Losses II


Losses III


VQA Challenge statistics: Image modelling


VQA Challenge statistics: Question modelling
