entity skeletons for visual storytelling · 2020-06-21 · visual storytelling [1] [1] huang,...

Entity Skeletons for Visual Storytelling

Ruo-Ping*, Khyathi Chandu*, Alan W Black


Post on 25-Jun-2020


Overview

• Content as a Narrative Property

• Task Definition

• Dataset

• Models
  • Anchor Extraction
  • Anchor Informed Generation

• Results
  • Qualitative and Quantitative
  • Human Evaluation

History of Narratives

Recent Advancements

https://www.csgenerator.com/

https://randomwordgenerator.com/paragraph.php

https://talktotransformer.com/

What makes a narrative effective?

Bridge the gap between human and machine generation


Content - Relevance

http://picturebite.com/Views-Sample-Albums

We went to the beach. My kids had a lot of fun there. There were a lot of palm trees. We stayed in a resort.

We went to the library. I love reading books. I borrowed a lot of them.

Entities: Beach, Kids, Resort

Entities: Library, Student, Books

Anchoring Framework

Fine-grained Entity Skeleton: provides full guidance to each individual unit of narrative text

Input: $I_i$ and $E_i = \{e_i^{(1)}, e_i^{(2)}, \ldots, e_i^{(k)}\}$

Output: $N_i = \{s_i^{(1)}, s_i^{(2)}, \ldots, s_i^{(k)}\}$

Task Definition

Task: Introducing entities in visual stories

Data: $S = \{S_1, \ldots, S_n\}$, where $S_i = \{(I_i^{(1)}, x_i^{(1)}, y_i^{(1)}), \ldots, (I_i^{(5)}, x_i^{(5)}, y_i^{(5)})\}$

Input: Sequence of Images, Descriptions in Isolation (DII)

Output: Stories in Sequences (SIS)

Anchors: Entities
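A minimal sketch of the data layout in the task definition, assuming each story is a list of five (image, DII description, SIS sentence) triples. The names `StoryStep`, `image_path`, `dii`, `sis` and `make_story` are illustrative and do not come from the slides; the DII field is optional because some images have no isolated description.

```python
from dataclasses import dataclass
from typing import List, Optional

STORY_LENGTH = 5  # each story S_i pairs five images with five sentences


@dataclass
class StoryStep:
    """One triple (I, x, y): image, description in isolation, story sentence."""
    image_path: str       # hypothetical handle for the image I
    dii: Optional[str]    # x: description in isolation, None when absent
    sis: str              # y: story-in-sequence sentence (generation target)


def make_story(steps: List[StoryStep]) -> List[StoryStep]:
    """Check that a story has exactly five steps and return it unchanged."""
    if len(steps) != STORY_LENGTH:
        raise ValueError(f"expected {STORY_LENGTH} steps, got {len(steps)}")
    return steps


# Toy story in which the third image has no DII.
story = make_story([
    StoryStep(f"img_{t}.jpg", None if t == 3 else f"caption {t}", f"sentence {t}")
    for t in range(1, STORY_LENGTH + 1)
])
inputs = [(s.image_path, s.dii) for s in story]  # what the model sees
targets = [s.sis for s in story]                 # what the model should generate
```

Keeping the DII optional rather than substituting an empty string makes the missing-description case explicit for any downstream encoder.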

Dataset

Visual Storytelling [1]

[1] Huang, Ting-Hao Kenneth, et al. "Visual storytelling." Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2016.

Descriptions in Isolation (DII) absent for 25% of images

                Train      Val       Test
# Stories       40,155     4,990     5,055
# Images        200,775    24,950    25,275
# without DII   40,876     4,973     5,195
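As a quick arithmetic check on the dataset counts, the sketch below recomputes the images-per-story ratio and the share of images without a DII for each split. The dictionary layout and the helper name `missing_dii_fraction` are illustrative; the numbers are copied from the dataset table.

```python
# Counts copied from the dataset table (train, val, test).
splits = {
    "train": {"stories": 40_155, "images": 200_775, "without_dii": 40_876},
    "val": {"stories": 4_990, "images": 24_950, "without_dii": 4_973},
    "test": {"stories": 5_055, "images": 25_275, "without_dii": 5_195},
}

IMAGES_PER_STORY = 5


def missing_dii_fraction(split):
    """Share of images in a split that have no description in isolation."""
    return split["without_dii"] / split["images"]


for name, split in splits.items():
    # Every story contributes exactly five images.
    assert split["images"] == IMAGES_PER_STORY * split["stories"]
    print(f"{name}: {missing_dii_fraction(split):.1%} of images lack a DII")
```

With the counts as transcribed, each split comes out at roughly 20% of images without a DII, a little under the 25% figure quoted on the slide.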

Entity Anchors

Entity Skeleton: a linear chain of entities and referring expressions.

Coreference chains are extracted with Stanford CoreNLP.

Entity Anchors: 3 Forms

Surface Form Coreference Chains: $\{c_1, c_2, \ldots, c_5\}$

Nominalized Coreference Chains: $\{[p, h]_1, \ldots, [p, h]_5\}$, with $p, h \in \{0, 1\}$

Abstract Coreference Chains: {person, location, misc, object}
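The three skeleton forms can be sketched as three views of the same five-slot chain. This is only an illustration of the shapes involved: the `Mention` record and function names are hypothetical, the slides do not say what the flags p and h encode (so they are treated as opaque binary inputs), and encoding a sentence without a mention as (0, 0) or `None` is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

CATEGORIES = {"person", "location", "misc", "object"}


@dataclass
class Mention:
    """Hypothetical record for the chain's mention in one sentence."""
    word: str       # surface form, e.g. "kids" or "they"
    p: int          # binary flag p from the slide (meaning not given there)
    h: int          # binary flag h from the slide (meaning not given there)
    category: str   # one of CATEGORIES


Chain = List[Optional[Mention]]  # one slot per sentence, None if no mention


def surface_form(chain: Chain) -> List[Optional[str]]:
    """Surface form: the mention word per sentence, or None."""
    return [m.word if m else None for m in chain]


def nominalized_form(chain: Chain) -> List[Tuple[int, int]]:
    """Nominalized form: one [p, h] pair per sentence."""
    # Assumption: a sentence without a mention is encoded as (0, 0).
    return [(m.p, m.h) if m else (0, 0) for m in chain]


def abstract_form(chain: Chain) -> List[Optional[str]]:
    """Abstract form: the coarse category per sentence, or None."""
    for m in chain:
        if m and m.category not in CATEGORIES:
            raise ValueError(f"unknown category: {m.category}")
    return [m.category if m else None for m in chain]


# Toy chain over five sentences; the flag values are arbitrary placeholders.
chain: Chain = [
    Mention("kids", 0, 1, "person"),
    Mention("they", 1, 0, "person"),
    None,
    Mention("kids", 0, 1, "person"),
    Mention("them", 1, 0, "person"),
]
```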

Anchor Informed Generation

(1) Baseline: Glocal Context Model

$l_t = \mathrm{ResNet}(I_t)$

$g_t = \mathrm{Bi\text{-}LSTM}([l_1, l_2, \ldots, l_5]_t)$

$w_t \sim \prod_{\tau} \Pr(w_t^{\tau} \mid w_t^{<\tau}, l_t, g_t)$

[Figure: visual feature extraction with ResNet-152, followed by a BiLSTM]
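The product over word positions in the glocal context model is a chain-rule factorisation of the sentence probability. A minimal sketch of that factorisation follows, with a toy uniform step model standing in for the trained decoder; `sentence_log_prob`, `toy_step_model` and the four-word vocabulary are illustrative names, not part of the slides.

```python
import math
from typing import Dict, Sequence


def sentence_log_prob(words, local_feat, global_feat, step_model) -> float:
    """Sum of log Pr(w^tau | w^{<tau}, l_t, g_t) over word positions tau."""
    total = 0.0
    for tau, word in enumerate(words):
        # The step model sees only the prefix w^{<tau} plus both features.
        dist = step_model(words[:tau], local_feat, global_feat)
        total += math.log(dist[word])
    return total


def toy_step_model(prefix: Sequence[str], local_feat, global_feat) -> Dict[str, float]:
    """Toy stand-in: uniform over a 4-word vocabulary, ignoring its inputs."""
    vocab = ["we", "went", "to", "beach"]
    return {w: 1.0 / len(vocab) for w in vocab}


# l_t and g_t are placeholder vectors here; a real model would use
# ResNet and Bi-LSTM outputs.
lp = sentence_log_prob(["we", "went", "to", "beach"], [0.1], [0.2], toy_step_model)
```

Working in log space turns the product into a sum, which avoids numerical underflow for longer sentences.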

(2) Baseline: Skeleton Informed Local Context Model

$l_t = \mathrm{ResNet}(I_t)$

$g_t = \mathrm{Bi\text{-}LSTM}([l_1, l_2, \ldots, l_5]_t)$

$w_t \sim \prod_{\tau} \Pr(w_t^{\tau} \mid w_t^{<\tau}, l_t, g_t, k_t)$

(3) Multitasking Skeleton Prediction

$\sum_{(I_t, x_t, y_t) \in S} \alpha L_1(I_t, y_t) + (1 - \alpha) L_2(I_t, y_t, k_t)$

$L_1$: Generated Sentence; $L_2$: Skeleton Entity Prediction
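The multitask objective is a convex combination of two per-example losses, summed over the data. A minimal sketch, assuming both losses have already been computed as plain floats; `multitask_loss` and `batch_loss` are illustrative names.

```python
def multitask_loss(gen_loss: float, skeleton_loss: float, alpha: float) -> float:
    """alpha * L1 + (1 - alpha) * L2 for a single example."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * gen_loss + (1.0 - alpha) * skeleton_loss


def batch_loss(pairs, alpha: float) -> float:
    """Sum the weighted loss over all (L1, L2) pairs, mirroring the sum over S."""
    return sum(multitask_loss(l1, l2, alpha) for l1, l2 in pairs)


# Two toy examples with (L1, L2) loss values.
total = batch_loss([(2.0, 4.0), (1.0, 3.0)], alpha=0.5)
```

Setting `alpha` to 1.0 recovers pure sentence generation, while smaller values put more weight on skeleton prediction.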

(4) Hierarchical Glocal Model

[Figure, built up over three slides; recoverable labels:]

Levels: Story-Level, Sentence-Level

Entity Skeleton: $c_{1j}, c_{2j}, c_{3j}, c_{4j}, c_{5j}$, encoded by a BiLSTM

Words in Sentence 1, …, Words in Sentence 5 (from DII): $w_{1j}, w_{2j}, w_{3j}, \ldots, w_{nj}$, each sentence encoded by a BiLSTM

Attended Sentence Level Representation (at each time step): $s_{1j}, s_{2j}, s_{3j}, s_{4j}, s_{5j}$

Vis Feature Extraction: ResNet-152, followed by a BiLSTM
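One plausible reading of the hierarchical glocal figure is that, at each time step, the encoded skeleton attends over the five sentence-level representations. The slides do not give the scoring function, so the sketch below assumes plain dot-product attention with a softmax; `attend`, `skeleton_query` and the toy vectors are illustrative only.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Normalise scores into weights that sum to one."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def attend(
    query: Sequence[float], sentence_reps: Sequence[Sequence[float]]
) -> Tuple[List[float], List[float]]:
    """Return attention weights and the weighted sum of sentence vectors."""
    # Dot-product score between the query and each sentence vector.
    scores = [sum(q * k for q, k in zip(query, rep)) for rep in sentence_reps]
    weights = softmax(scores)
    dim = len(sentence_reps[0])
    context = [
        sum(w * rep[d] for w, rep in zip(weights, sentence_reps))
        for d in range(dim)
    ]
    return weights, context


# Five toy sentence-level vectors and one skeleton vector used as the query.
sentence_reps = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0], [2.0, 0.0]]
skeleton_query = [1.0, 0.0]
weights, context = attend(skeleton_query, sentence_reps)
```

In this toy case the fifth sentence vector aligns best with the query and therefore receives the largest weight.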

Evaluation: Quantitative

Quantitative Experimental Results

Models                Skeleton Form    METEOR   Distance   Avg # of Distinct Entities
Baseline              None             27.93    1.02       0.4971
Baseline + Skeleton   Surface          27.66    1.02       0.5014
MTG (α=0.5)           Surface          27.44    1.02       0.9554
MTG (α=0.4)           Surface          27.59    1.02       1.1013
MTG (α=0.2)           Surface          27.54    1.01       0.9989
MTG (α=0.5)           Nominalization   30.52    1.12       0.5545
MTG (α=0.5)           Abstract         27.67    1.01       0.5115
Glocal Attention      Surface          28.93    1.01       0.8963

Ground Truth: 0.7944
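To read the results table at a glance, the sketch below ranks the models by how close their average number of distinct entities is to the ground-truth value, and separately picks the best METEOR score. It assumes that "Ground Truth: 0.7944" refers to the distinct-entities column, which the slide does not state explicitly; the tuple layout and helper names are illustrative.

```python
# Assumption: the ground-truth figure refers to the last table column.
GROUND_TRUTH_ENTITIES = 0.7944

# (model, skeleton form, METEOR, distance, avg. number of distinct entities)
results = [
    ("Baseline", "None", 27.93, 1.02, 0.4971),
    ("Baseline + Skeleton", "Surface", 27.66, 1.02, 0.5014),
    ("MTG (alpha=0.5)", "Surface", 27.44, 1.02, 0.9554),
    ("MTG (alpha=0.4)", "Surface", 27.59, 1.02, 1.1013),
    ("MTG (alpha=0.2)", "Surface", 27.54, 1.01, 0.9989),
    ("MTG (alpha=0.5)", "Nominalization", 30.52, 1.12, 0.5545),
    ("MTG (alpha=0.5)", "Abstract", 27.67, 1.01, 0.5115),
    ("Glocal Attention", "Surface", 28.93, 1.01, 0.8963),
]


def entity_gap(row) -> float:
    """Absolute gap between a model's entity count and the ground truth."""
    return abs(row[4] - GROUND_TRUTH_ENTITIES)


closest = min(results, key=entity_gap)            # nearest to ground truth
best_meteor = max(results, key=lambda r: r[2])    # highest METEOR

print(f"closest entity count: {closest[0]} ({closest[1]}), gap {entity_gap(closest):.4f}")
print(f"best METEOR: {best_meteor[0]} ({best_meteor[1]}), {best_meteor[2]}")
```

Under that assumption, the Glocal Attention model is nearest to the ground-truth entity count, while the nominalized multitask model has the highest METEOR.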


Human Evaluation

Preference Testing for Hierarchical Glocal Model

82% over Baseline

64% over Multitasking Model

Evaluation: Qualitative

Qualitative Analysis

Takeaways

Improves Relevance component of visual storytelling

Improves Controllability in generation

Step towards interpretability with respect to intermediate representation


Thank You

Contact: [email protected]