![Page 1: Paper presentation - Home | Computer Science at UBClsigal/532L/PP_Towards... · Paper presentation Towards diverse and natural image descriptions via a conditional GAN, B. Dai, et](https://reader034.vdocuments.us/reader034/viewer/2022042410/5f26ca3396bf1347b12ab883/html5/thumbnails/1.jpg)
Paper presentation
Towards diverse and natural image descriptions via a conditional GAN, B. Dai, et al.
![Page 3: Paper presentation - Home | Computer Science at UBClsigal/532L/PP_Towards... · Paper presentation Towards diverse and natural image descriptions via a conditional GAN, B. Dai, et](https://reader034.vdocuments.us/reader034/viewer/2022042410/5f26ca3396bf1347b12ab883/html5/thumbnails/3.jpg)
Has the problem of generating image descriptions been solved?
![Page 4: Paper presentation - Home | Computer Science at UBClsigal/532L/PP_Towards... · Paper presentation Towards diverse and natural image descriptions via a conditional GAN, B. Dai, et](https://reader034.vdocuments.us/reader034/viewer/2022042410/5f26ca3396bf1347b12ab883/html5/thumbnails/4.jpg)
Problem Statement
● Image captioning with a focus on fidelity, naturalness, and diversity
Captions generated by three different methods for the image on the left. [1]
![Page 5: Paper presentation - Home | Computer Science at UBClsigal/532L/PP_Towards... · Paper presentation Towards diverse and natural image descriptions via a conditional GAN, B. Dai, et](https://reader034.vdocuments.us/reader034/viewer/2022042410/5f26ca3396bf1347b12ab883/html5/thumbnails/5.jpg)
Motivation
● State-of-the-art methods produce rigid captions with little variability
  ○ Captions closely resemble the ground truth
● No available metrics capture variability and naturalness
  ○ Some of the current state-of-the-art metrics:
    ■ BLEU
    ■ METEOR
![Page 6: Paper presentation - Home | Computer Science at UBClsigal/532L/PP_Towards... · Paper presentation Towards diverse and natural image descriptions via a conditional GAN, B. Dai, et](https://reader034.vdocuments.us/reader034/viewer/2022042410/5f26ca3396bf1347b12ab883/html5/thumbnails/6.jpg)
Related Work
● Generation
  ○ Detection-based approaches
    ■ CRF, SVM, CNN
    ■ Retrieving sentences from existing data or using sentence templates
  ○ Encoder-decoder paradigm
    ■ Maximum likelihood
![Page 7: Paper presentation - Home | Computer Science at UBClsigal/532L/PP_Towards... · Paper presentation Towards diverse and natural image descriptions via a conditional GAN, B. Dai, et](https://reader034.vdocuments.us/reader034/viewer/2022042410/5f26ca3396bf1347b12ab883/html5/thumbnails/7.jpg)
Related Work
● Evaluation
  ○ Classical metrics
    ■ BLEU: Precision of n-grams
    ■ ROUGE: Recall of n-grams
  ○ METEOR
    ■ Combination of precision and recall
  ○ CIDEr
    ■ TF-IDF weighted n-gram statistics
  ○ SPICE
    ■ Focus on linguistic entities reflecting visual concepts
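To make the precision/recall distinction concrete, here is a minimal sketch of clipped n-gram precision (the core of BLEU) and n-gram recall (the core of ROUGE-N). It is a simplification: full BLEU also combines several n-gram orders and applies a brevity penalty. The function names and example captions are made up for illustration.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return a Counter of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def ngram_overlap(candidate, reference, n=1):
    """Clipped n-gram precision (BLEU-style) and recall (ROUGE-N-style)."""
    cand, ref = ngrams(candidate, n), ngrams(reference, n)
    # Each candidate n-gram is credited at most as often as it occurs in the reference.
    matched = sum(min(count, ref[gram]) for gram, count in cand.items())
    precision = matched / max(sum(cand.values()), 1)
    recall = matched / max(sum(ref.values()), 1)
    return precision, recall

# Illustrative captions (not taken from the paper).
candidate = "a man riding a wave".split()
reference = "a man riding a wave on a surfboard".split()
precision, recall = ngram_overlap(candidate, reference, n=1)
```

A short caption that only repeats reference words gets perfect precision but lower recall, which is one reason such metrics can favour safe, generic captions.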
![Page 8: Paper presentation - Home | Computer Science at UBClsigal/532L/PP_Towards... · Paper presentation Towards diverse and natural image descriptions via a conditional GAN, B. Dai, et](https://reader034.vdocuments.us/reader034/viewer/2022042410/5f26ca3396bf1347b12ab883/html5/thumbnails/8.jpg)
What is the state of the art model for image captioning?
![Page 9: Paper presentation - Home | Computer Science at UBClsigal/532L/PP_Towards... · Paper presentation Towards diverse and natural image descriptions via a conditional GAN, B. Dai, et](https://reader034.vdocuments.us/reader034/viewer/2022042410/5f26ca3396bf1347b12ab883/html5/thumbnails/9.jpg)
State of the art
LSTM based model for image captioning. [2]
● Combination of LSTM and CNN
● Metrics
  ○ CIDEr
  ○ METEOR
  ○ BLEU
  ○ ROUGE
![Page 10: Paper presentation - Home | Computer Science at UBClsigal/532L/PP_Towards... · Paper presentation Towards diverse and natural image descriptions via a conditional GAN, B. Dai, et](https://reader034.vdocuments.us/reader034/viewer/2022042410/5f26ca3396bf1347b12ab883/html5/thumbnails/10.jpg)
State of the art
Image description generation and evaluation for the state of the art approaches. [1]
![Page 11: Paper presentation - Home | Computer Science at UBClsigal/532L/PP_Towards... · Paper presentation Towards diverse and natural image descriptions via a conditional GAN, B. Dai, et](https://reader034.vdocuments.us/reader034/viewer/2022042410/5f26ca3396bf1347b12ab883/html5/thumbnails/11.jpg)
State of the art
Examples of images with two semantically similar descriptions. [1]
![Page 12: Paper presentation - Home | Computer Science at UBClsigal/532L/PP_Towards... · Paper presentation Towards diverse and natural image descriptions via a conditional GAN, B. Dai, et](https://reader034.vdocuments.us/reader034/viewer/2022042410/5f26ca3396bf1347b12ab883/html5/thumbnails/12.jpg)
Background
● Conditional GAN
  ○ Conditioned on some extra information y.
Conditional adversarial net. [3]
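For reference, the two-player objective from [3], with the extra information y given to both the generator G and the discriminator D (x is a real sample, z is the noise input):

$$
\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}(x)}\big[\log D(x \mid y)\big] + \mathbb{E}_{z \sim p_z(z)}\big[\log\big(1 - D(G(z \mid y))\big)\big]
$$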
![Page 13: Paper presentation - Home | Computer Science at UBClsigal/532L/PP_Towards... · Paper presentation Towards diverse and natural image descriptions via a conditional GAN, B. Dai, et](https://reader034.vdocuments.us/reader034/viewer/2022042410/5f26ca3396bf1347b12ab883/html5/thumbnails/13.jpg)
Background
● Sequential Sampling
  ○ Non-probability sampling technique
    ■ Members of the population do not all have an equal chance of being selected
  ○ Pick a single object or a group of objects in every time interval
  ○ Analyze the result, then pick another sample
● Monte Carlo tree search
  ○ A heuristic search algorithm for some kinds of decision processes, especially game playing
  ○ Steps
    ■ Selection
    ■ Expansion
    ■ Simulation
    ■ Backpropagation
![Page 14: Paper presentation - Home | Computer Science at UBClsigal/532L/PP_Towards... · Paper presentation Towards diverse and natural image descriptions via a conditional GAN, B. Dai, et](https://reader034.vdocuments.us/reader034/viewer/2022042410/5f26ca3396bf1347b12ab883/html5/thumbnails/14.jpg)
Proposed Approach
Overall structure of both generator and evaluator of a CGAN. [1]
![Page 15: Paper presentation - Home | Computer Science at UBClsigal/532L/PP_Towards... · Paper presentation Towards diverse and natural image descriptions via a conditional GAN, B. Dai, et](https://reader034.vdocuments.us/reader034/viewer/2022042410/5f26ca3396bf1347b12ab883/html5/thumbnails/15.jpg)
Overall formulation
● Evaluator
  ○ Quality of descriptions
● Learning objective
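The formulas on this slide did not survive extraction. As a hedged reconstruction in the notation of [1] (to be checked against the paper): the evaluator embeds the image I and the description S and scores the pair with a logistic function of the dot product of the two embeddings, and the generator G_θ and evaluator E_η play a minimax game over that score.

$$
r_\eta(I, S) = \sigma\big(\langle f(I, \eta_I),\, h(S, \eta_S) \rangle\big)
$$

$$
\min_\theta \max_\eta \; \mathbb{E}_{S \sim P_I}\big[\log r_\eta(I, S)\big] + \mathbb{E}_{z \sim \mathcal{N}_0}\big[\log\big(1 - r_\eta(I, G_\theta(I, z))\big)\big]
$$

Here P_I stands for the human descriptions of image I and z for the random input to the generator.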
![Page 16: Paper presentation - Home | Computer Science at UBClsigal/532L/PP_Towards... · Paper presentation Towards diverse and natural image descriptions via a conditional GAN, B. Dai, et](https://reader034.vdocuments.us/reader034/viewer/2022042410/5f26ca3396bf1347b12ab883/html5/thumbnails/16.jpg)
The production of sentences is non-differentiable. How can we backpropagate?
![Page 17: Paper presentation - Home | Computer Science at UBClsigal/532L/PP_Towards... · Paper presentation Towards diverse and natural image descriptions via a conditional GAN, B. Dai, et](https://reader034.vdocuments.us/reader034/viewer/2022042410/5f26ca3396bf1347b12ab883/html5/thumbnails/17.jpg)
Challenges
● Using policy gradient for generating linguistic description
  ○ Sequential sampling procedure
    ■ Sampling a discrete token at each time step
  ○ Non-differentiable
● Expected future reward for early feedback using Monte Carlo rollouts
  ○ Vanishing gradients
  ○ Error propagation
![Page 18: Paper presentation - Home | Computer Science at UBClsigal/532L/PP_Towards... · Paper presentation Towards diverse and natural image descriptions via a conditional GAN, B. Dai, et](https://reader034.vdocuments.us/reader034/viewer/2022042410/5f26ca3396bf1347b12ab883/html5/thumbnails/18.jpg)
Training G
● Policy gradient
  ○ Action space: words
  ○ Conditional policy
  ○ Reward given by the evaluator for a sequence of actions S
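A minimal sketch of the policy-gradient idea (REINFORCE) on a toy problem. Everything here is an assumption for illustration: the policy is a per-position softmax over a five-word vocabulary rather than an LSTM conditioned on the image and noise, and `toy_reward` is a hypothetical stand-in for the evaluator's score of a complete sentence.

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB = ["a", "dog", "cat", "runs", "sleeps"]
T = 3  # toy sentence length

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def toy_reward(tokens):
    """Hypothetical stand-in for the evaluator: score in [0, 1] for a full sentence."""
    target = [0, 1, 3]  # "a dog runs"
    return sum(int(a == b) for a, b in zip(tokens, target)) / len(target)

def reinforce_step(theta, lr=0.5, n_samples=32):
    """One REINFORCE update: grad ~ mean over samples of R(S) * sum_t grad log pi(w_t)."""
    probs = softmax(theta)                  # (T, V) word distribution per time step
    grad = np.zeros_like(theta)
    for _ in range(n_samples):
        # Actions are words: sample one discrete token per time step.
        tokens = [rng.choice(len(VOCAB), p=probs[t]) for t in range(T)]
        reward = toy_reward(tokens)         # one reward for the whole sequence
        for t, w in enumerate(tokens):
            # Gradient of log softmax w.r.t. the logits: one_hot(w) - probs.
            g = -probs[t].copy()
            g[w] += 1.0
            grad[t] += reward * g
    return theta + lr * grad / n_samples    # gradient ascent on expected reward

theta = np.zeros((T, len(VOCAB)))
for _ in range(200):
    theta = reinforce_step(theta)
greedy = [int(i) for i in softmax(theta).argmax(axis=1)]
```

Sampling is never differentiated through; only the log-probabilities of the sampled words are, weighted by the reward.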
![Page 19: Paper presentation - Home | Computer Science at UBClsigal/532L/PP_Towards... · Paper presentation Towards diverse and natural image descriptions via a conditional GAN, B. Dai, et](https://reader034.vdocuments.us/reader034/viewer/2022042410/5f26ca3396bf1347b12ab883/html5/thumbnails/19.jpg)
Training G
● Early feedback
  ○ Expected future reward
  ○ Gradient of the objective w.r.t. the generator parameters
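A minimal sketch of estimating the expected future reward of a partial sentence with Monte Carlo rollouts: the prefix is completed several times under the current policy and the evaluator's scores are averaged. The `policy` and `evaluator` functions are hypothetical placeholders (a uniform distribution and a toy score), not the models from [1].

```python
import numpy as np

rng = np.random.default_rng(0)
V, T = 5, 4  # hypothetical vocabulary size and sentence length

def policy(prefix):
    """Hypothetical generator policy: next-word distribution given the prefix.
    A uniform distribution stands in for the LSTM here."""
    return np.full(V, 1.0 / V)

def evaluator(sentence):
    """Hypothetical evaluator score in [0, 1]: fraction of tokens equal to word 0."""
    return sum(1 for w in sentence if w == 0) / len(sentence)

def expected_future_reward(prefix, n_rollouts=16):
    """Estimate the value of a partial sentence by completing it n_rollouts times
    under the current policy and averaging the evaluator's scores."""
    total = 0.0
    for _ in range(n_rollouts):
        sentence = list(prefix)
        while len(sentence) < T:
            sentence.append(int(rng.choice(V, p=policy(sentence))))
        total += evaluator(sentence)
    return total / n_rollouts

v_good = expected_future_reward([0, 0])  # prefix the toy evaluator likes
v_bad = expected_future_reward([1, 1])   # prefix the toy evaluator dislikes
```

Each word can then receive feedback as soon as it is sampled, instead of waiting for the reward of the finished sentence.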
![Page 20: Paper presentation - Home | Computer Science at UBClsigal/532L/PP_Towards... · Paper presentation Towards diverse and natural image descriptions via a conditional GAN, B. Dai, et](https://reader034.vdocuments.us/reader034/viewer/2022042410/5f26ca3396bf1347b12ab883/html5/thumbnails/20.jpg)
Training E
● Enforcing naturalness and semantic relevance
  ○ Set of descriptions provided by humans
  ○ Descriptions from the generator
  ○ Human descriptions uniformly sampled from those not associated with the given image
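One way to combine the three groups of descriptions, sketched under assumptions: the evaluator should score human descriptions of the image high, and both generated descriptions and mismatched human descriptions low. The weights `alpha` and `beta` and the function name are hypothetical; the exact form used in [1] should be checked against the paper.

```python
import numpy as np

def evaluator_objective(r_human, r_generated, r_mismatched, alpha=1.0, beta=1.0):
    """Objective the evaluator maximizes for one image.
    Each argument holds evaluator scores in (0, 1) for one group of descriptions.
    alpha and beta are hypothetical balancing weights."""
    r_human = np.asarray(r_human, dtype=float)
    r_generated = np.asarray(r_generated, dtype=float)
    r_mismatched = np.asarray(r_mismatched, dtype=float)
    return (np.mean(np.log(r_human))                        # real and relevant: push up
            + alpha * np.mean(np.log(1.0 - r_generated))    # generated: push down
            + beta * np.mean(np.log(1.0 - r_mismatched)))   # natural but irrelevant: push down

good = evaluator_objective([0.9, 0.8], [0.1, 0.2], [0.1])
poor = evaluator_objective([0.5, 0.5], [0.5, 0.5], [0.5])
```

The second term targets naturalness and the third targets semantic relevance, since mismatched human descriptions are natural but describe a different image.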
![Page 21: Paper presentation - Home | Computer Science at UBClsigal/532L/PP_Towards... · Paper presentation Towards diverse and natural image descriptions via a conditional GAN, B. Dai, et](https://reader034.vdocuments.us/reader034/viewer/2022042410/5f26ca3396bf1347b12ab883/html5/thumbnails/21.jpg)
Generating paragraphs
● Hierarchical LSTM
  ○ Sentence level
  ○ Word level
Adding extension for paragraph generation. [1]
![Page 22: Paper presentation - Home | Computer Science at UBClsigal/532L/PP_Towards... · Paper presentation Towards diverse and natural image descriptions via a conditional GAN, B. Dai, et](https://reader034.vdocuments.us/reader034/viewer/2022042410/5f26ca3396bf1347b12ab883/html5/thumbnails/22.jpg)
Experiment
● Datasets
  ○ MSCOCO
  ○ Flickr30k
● Settings
  ○ Removing non-alphabet characters
  ○ Converting to lowercase
  ○ Replacing words that occur fewer than 5 times with UNK
  ○ Max length of 16
  ○ Pretrain G for 20 epochs based on MLE
  ○ Pretrain E with supervised training for 5 epochs
  ○ Batch = 64, lr = 0.0001, n = 16 (Monte Carlo rollouts)
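The text preprocessing settings can be sketched as follows. This is an illustration, not the authors' pipeline: it assumes whitespace tokenization, keeps spaces when stripping non-alphabet characters, and treats "max length of 16" as truncation to 16 tokens. The helper names are hypothetical.

```python
import re
from collections import Counter

MIN_COUNT = 5   # words seen fewer than 5 times become UNK
MAX_LEN = 16    # maximum caption length in tokens

def tokenize(caption):
    """Lowercase and drop every character that is not a letter or a space."""
    return re.sub(r"[^a-z ]", "", caption.lower()).split()

def build_vocab(captions, min_count=MIN_COUNT):
    """Keep only the words that occur at least min_count times in the corpus."""
    counts = Counter(tok for cap in captions for tok in tokenize(cap))
    return {tok for tok, c in counts.items() if c >= min_count}

def encode(caption, vocab, max_len=MAX_LEN):
    """Tokenize, map rare words to UNK, and truncate to max_len tokens."""
    return [tok if tok in vocab else "UNK" for tok in tokenize(caption)][:max_len]

# Tiny made-up corpus: "zebra" appears only once, so it falls below the threshold.
captions = ["A dog runs!"] * 5 + ["A zebra runs."]
vocab = build_vocab(captions)
encoded = encode("A zebra runs, 2 dogs sleep.", vocab)
```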
![Page 23: Paper presentation - Home | Computer Science at UBClsigal/532L/PP_Towards... · Paper presentation Towards diverse and natural image descriptions via a conditional GAN, B. Dai, et](https://reader034.vdocuments.us/reader034/viewer/2022042410/5f26ca3396bf1347b12ab883/html5/thumbnails/23.jpg)
Experiment
● Performance
Performances of different generators on MSCOCO and Flickr30k. [1]
![Page 24: Paper presentation - Home | Computer Science at UBClsigal/532L/PP_Towards... · Paper presentation Towards diverse and natural image descriptions via a conditional GAN, B. Dai, et](https://reader034.vdocuments.us/reader034/viewer/2022042410/5f26ca3396bf1347b12ab883/html5/thumbnails/24.jpg)
Has the proposed approach been able to achieve more natural descriptions?
![Page 25: Paper presentation - Home | Computer Science at UBClsigal/532L/PP_Towards... · Paper presentation Towards diverse and natural image descriptions via a conditional GAN, B. Dai, et](https://reader034.vdocuments.us/reader034/viewer/2022042410/5f26ca3396bf1347b12ab883/html5/thumbnails/25.jpg)
Experiment
● User Study
Human comparison results between each pair of generators. [1]
![Page 26: Paper presentation - Home | Computer Science at UBClsigal/532L/PP_Towards... · Paper presentation Towards diverse and natural image descriptions via a conditional GAN, B. Dai, et](https://reader034.vdocuments.us/reader034/viewer/2022042410/5f26ca3396bf1347b12ab883/html5/thumbnails/26.jpg)
Experiment
● User Study
Corresponding G-GAN captions for images with similar descriptions in G-MLE. [1]
![Page 27: Paper presentation - Home | Computer Science at UBClsigal/532L/PP_Towards... · Paper presentation Towards diverse and natural image descriptions via a conditional GAN, B. Dai, et](https://reader034.vdocuments.us/reader034/viewer/2022042410/5f26ca3396bf1347b12ab883/html5/thumbnails/27.jpg)
Experiment
● Diverse descriptions
Generated descriptions with different z. [1]
![Page 28: Paper presentation - Home | Computer Science at UBClsigal/532L/PP_Towards... · Paper presentation Towards diverse and natural image descriptions via a conditional GAN, B. Dai, et](https://reader034.vdocuments.us/reader034/viewer/2022042410/5f26ca3396bf1347b12ab883/html5/thumbnails/28.jpg)
Experiment
● Evaluating semantic relevance by retrieval
Image rankings for different generators. [1]
S → E-GAN
P → Log-likelihood
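A minimal sketch of how a retrieval evaluation of this kind can be scored, assuming the rankings are summarised by recall@k and median rank (both common retrieval statistics; the choice of metric here is an assumption, not taken from the slide). Each generated description is scored against every image, by the evaluator (S) or by log-likelihood (P), and the rank of the image it was generated from is recorded. The score matrix is made up.

```python
import numpy as np

def retrieval_metrics(scores, ks=(1, 5)):
    """scores[i, j]: score of image j for the description generated from image i.
    Returns recall@k for each k and the median rank of the correct image."""
    scores = np.asarray(scores, dtype=float)
    correct = np.diag(scores)[:, None]
    # Rank of the correct image = 1 + number of images scored strictly higher.
    ranks = 1 + (scores > correct).sum(axis=1)
    recalls = {k: float(np.mean(ranks <= k)) for k in ks}
    return recalls, float(np.median(ranks))

# Made-up scores for three descriptions and three images.
scores = [[0.9, 0.1, 0.3],
          [0.2, 0.4, 0.8],
          [0.1, 0.2, 0.7]]
recalls, median_rank = retrieval_metrics(scores, ks=(1, 2))
```

A generator whose descriptions are specific to their image should place the correct image near the top of the ranking.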
![Page 29: Paper presentation - Home | Computer Science at UBClsigal/532L/PP_Towards... · Paper presentation Towards diverse and natural image descriptions via a conditional GAN, B. Dai, et](https://reader034.vdocuments.us/reader034/viewer/2022042410/5f26ca3396bf1347b12ab883/html5/thumbnails/29.jpg)
Experiment
● Paragraph generation with different z values
Different paragraph descriptions generated by human, G-GAN, and G-MLE with different z values. [1]
![Page 30: Paper presentation - Home | Computer Science at UBClsigal/532L/PP_Towards... · Paper presentation Towards diverse and natural image descriptions via a conditional GAN, B. Dai, et](https://reader034.vdocuments.us/reader034/viewer/2022042410/5f26ca3396bf1347b12ab883/html5/thumbnails/30.jpg)
Failure Analysis
● Incorrect details
  ○ Colors
  ○ Counts
    ■ Few samples for each specific detail
    ■ The focus on diversity increases the risk of including incorrect details
![Page 31: Paper presentation - Home | Computer Science at UBClsigal/532L/PP_Towards... · Paper presentation Towards diverse and natural image descriptions via a conditional GAN, B. Dai, et](https://reader034.vdocuments.us/reader034/viewer/2022042410/5f26ca3396bf1347b12ab883/html5/thumbnails/31.jpg)
Potential extensions?
![Page 32: Paper presentation - Home | Computer Science at UBClsigal/532L/PP_Towards... · Paper presentation Towards diverse and natural image descriptions via a conditional GAN, B. Dai, et](https://reader034.vdocuments.us/reader034/viewer/2022042410/5f26ca3396bf1347b12ab883/html5/thumbnails/32.jpg)
Future work
● Use VAE instead of GAN
● Use other similarity metrics instead of dot product in evaluator
![Page 33: Paper presentation - Home | Computer Science at UBClsigal/532L/PP_Towards... · Paper presentation Towards diverse and natural image descriptions via a conditional GAN, B. Dai, et](https://reader034.vdocuments.us/reader034/viewer/2022042410/5f26ca3396bf1347b12ab883/html5/thumbnails/33.jpg)
Presenters
● Kevin Dsouza
● Ainaz Hajimoradlou
![Page 34: Paper presentation - Home | Computer Science at UBClsigal/532L/PP_Towards... · Paper presentation Towards diverse and natural image descriptions via a conditional GAN, B. Dai, et](https://reader034.vdocuments.us/reader034/viewer/2022042410/5f26ca3396bf1347b12ab883/html5/thumbnails/34.jpg)
References
1. B. Dai, S. Fidler, R. Urtasun, and D. Lin, Towards Diverse and Natural Image Descriptions via a Conditional GAN, 2017, arXiv preprint arXiv:1703.06029
2. O. Vinyals, A. Toshev, S. Bengio, and D. Erhan, Show and Tell: Lessons learned from the 2015 MSCOCO Image Captioning Challenge, arXiv preprint arXiv:1609.06647
3. M. Mirza and S. Osindero, Conditional Generative Adversarial Nets, arXiv preprint arXiv:1411.1784
![Page 35: Paper presentation - Home | Computer Science at UBClsigal/532L/PP_Towards... · Paper presentation Towards diverse and natural image descriptions via a conditional GAN, B. Dai, et](https://reader034.vdocuments.us/reader034/viewer/2022042410/5f26ca3396bf1347b12ab883/html5/thumbnails/35.jpg)
Thank you.