Video-to-Text (VTT) Task

TRECVID 2018

Video to Text Description

Asad A. Butt, NIST

George Awad, NIST; Dakota Consulting, Inc

Alan Smeaton, Dublin City University

Disclaimer: Certain commercial entities, equipment, or materials may be identified in this document in order to describe an experimental procedure or concept adequately. Such identification is not intended to imply recommendation or endorsement by the National Institute of Standards and Technology, nor is it intended to imply that the entities, materials, or equipment are necessarily the best available for the purpose.


Goals and Motivations

✓ Measure how well an automatic system can describe a video in natural language.
✓ Measure how well an automatic system can match high-level textual descriptions to low-level computer vision features.
✓ Transfer successful image captioning technology to the video domain.

Real-world Applications

✓ Video summarization
✓ Supporting search and browsing
✓ Accessibility - video description to the blind
✓ Video event prediction


TASKS

• Systems are asked to submit results for two subtasks:
1. Matching & Ranking: Return for each URL a ranked list of the most likely text description from each of the five sets.
2. Description Generation: Automatically generate a text description for each URL.


Video Dataset

• Crawled 50k+ Twitter Vine video URLs.
• Maximum video duration: 6 sec.
• A subset of 2000 URLs (quasi) randomly selected, divided amongst 10 assessors.
• Significant preprocessing to remove unsuitable videos.
• Final dataset included 1903 URLs due to removal of videos from Vine.


Steps to Remove Redundancy

▪ Before selecting the dataset, we clustered videos based on visual similarity.
▪ Used a tool called SOTU [1], which used Visual Bag of Words to cluster videos with 60% similarity for at least 3 frames.
▪ This removed duplicate videos, as well as videos that were visually very similar (e.g. soccer games), giving a more diverse set of videos.
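The grouping rule on this slide (60% similarity for at least 3 frames) can be illustrated with a minimal Python sketch. This is not SOTU itself: the frame-pair similarity scores are assumed to come from some visual bag-of-words comparison that is outside the sketch, all function names are hypothetical, and union-find is only one plausible way to merge matching videos into clusters.

```python
from collections import defaultdict

SIM_THRESHOLD = 0.60  # "60% similarity" from the slide
MIN_FRAMES = 3        # "at least 3 frames" from the slide


def videos_match(frame_sims):
    """frame_sims: similarity scores (0..1) for matched frame pairs of two videos."""
    similar = sum(1 for s in frame_sims if s >= SIM_THRESHOLD)
    return similar >= MIN_FRAMES


def cluster_videos(video_ids, pair_sims):
    """Group videos into clusters with union-find.

    pair_sims maps (id_a, id_b) to a list of frame-pair similarity scores;
    how those scores are computed is an assumption left outside this sketch.
    """
    parent = {v: v for v in video_ids}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for (a, b), sims in pair_sims.items():
        if videos_match(sims):
            parent[find(a)] = find(b)

    clusters = defaultdict(list)
    for v in video_ids:
        clusters[find(v)].append(v)
    return sorted(sorted(c) for c in clusters.values())


def pick_representatives(clusters):
    """Keep one video per cluster (here simply the first id) to reduce redundancy."""
    return [c[0] for c in clusters]
```

Keeping a single representative per cluster is an illustrative choice; the slide only states that duplicates and near-duplicates were removed.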


[1] Zhao, Wan-Lei and Ngo Chong-Wah. "SOTU in Action." (2012).


Dataset Cleaning

▪ Dataset Creation Process: Manually went through large collection of videos.
▪ Used list of commonly appearing videos from last year to select a diverse set of videos.
▪ Removed videos with multiple, unrelated segments that are hard to describe.
▪ Removed any animated (or otherwise unsuitable) videos.
▪ Resulted in a much cleaner dataset.


Annotation Process

• Each video was annotated by 5 assessors.
• Annotation guidelines by NIST:
  • For each video, annotators were asked to combine 4 facets if applicable:
    • Who is the video describing (objects, persons, animals, etc.)?
    • What are the objects and beings doing (actions, states, events, etc.)?
    • Where (locale, site, place, geographic, etc.)?
    • When (time of day, season, etc.)?


Annotation Process – Observations

1. Different assessors provide varying amounts of detail when describing videos. Some assessors wrote very long sentences to incorporate all information, while others gave a brief description.
2. Assessors interpret scenes according to cultural or pop-cultural references that are not universally recognized.
3. Specifying the time of day was often not possible for indoor videos.
4. Given the removal of videos with multiple disjointed scenes, assessors were better able to provide descriptions.


Sample Captions of 5 Assessors


Example video 1:

1. Orange car #1 on gray day drives around curve in road race test.
2. Orange car drives on wet road curve with observers.
3. An orange car with black roof, is driving around a curve on the road, while a person, wearing grey is observing it.
4. The orange car is driving on the road and going around a curve.
5. Advertisement for automobile mountain race showing the orange number one car navigating a curve on the mountain during the race in the evening; an individual is observing the vehicle dressed in jeans and cold weather coat.

Example video 2:

1. A woman lets go of a brown ball attached to overhead wire that comes back and hits her in the face.
2. In a room, a bowling ball on a string swings and its a woman with a white shirt on in the face.
3. During a demonstration a white woman with black hair wearing a white top and holding a ball tether to a line from above as the demonstrator tells her to let go of the ball which returns on its tether and hits the woman in the face.
4. A man in blue holds a ball on a cord and lets it swing, and it comes back and hits a woman in white in the face.
5. A young girl, before an audience of students, allows a pendulum to swing from her face and all are surprised when it returns to strike her.


2018 Participants (12 teams finished)

Team                  Matching & Ranking (26 Runs)   Description Generation (24 Runs)
INF                   ✓                              ✓
KSLAB                 ✓                              ✓
KU_ISPL               ✓                              ✓
MMSys_CCMIP           ✓                              ✓
NTU_ROSE              ✓                              ✓
PicSOM                                               ✓
UPCer                                                ✓
UTS_CETC_D2DCRC_CAI   ✓                              ✓
EURECOM               ✓
ORAND                 ✓
RUCMM                 ✓
UCR_VCG               ✓


Sub-task 1: Matching & Ranking

Example candidate descriptions:
Person reading newspaper outdoors at daytime
Three men running in the street at daytime
Person playing golf outdoors in the field
Two men looking at laptop in an office

• Up to 4 runs per site were allowed in the Matching & Ranking subtask.
• Mean inverted rank used for evaluation.
• Five sets of descriptions used.
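As a rough illustration of the evaluation measure, mean inverted rank averages 1/rank of the correct description over all videos. The Python sketch below assumes hypothetical dictionary inputs and assumes that a video whose correct description is missing from the submitted list contributes 0; the official scoring script may handle such cases differently.

```python
def mean_inverted_rank(ranked_lists, ground_truth):
    """Average of 1/rank of the correct description over all videos.

    ranked_lists: {video_id: [description_id, ...]}, best match first.
    ground_truth: {video_id: correct description_id}.
    A missing correct description contributes 0 (an assumption of this sketch).
    """
    total = 0.0
    for video_id, correct in ground_truth.items():
        ranking = ranked_lists.get(video_id, [])
        if correct in ranking:
            total += 1.0 / (ranking.index(correct) + 1)  # ranks start at 1
    return total / len(ground_truth)
```

With five description sets, the function would be called once per set, giving one score per run and set.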


Matching & Ranking Results – Set A

[Bar chart: mean inverted rank (y-axis, 0 to 0.6), series Run 1 to Run 4.]


Matching & Ranking Results – Set B

[Bar chart: mean inverted rank (y-axis, 0 to 0.6), series Run 1 to Run 4.]


Matching & Ranking Results – Set C

[Bar chart: mean inverted rank (y-axis, 0 to 0.6), series Run 1 to Run 4.]


Matching & Ranking Results – Set D

[Bar chart: mean inverted rank (y-axis, 0 to 0.6), series Run 1 to Run 4.]


Matching & Ranking Results – Set E

[Bar chart: mean inverted rank (y-axis, 0 to 0.6), series Run 1 to Run 4.]


Systems Rankings for each Set

Rank  A                    B                    C                    D                    E
1     RUCMM                RUCMM                RUCMM                RUCMM                RUCMM
2     INF                  INF                  INF                  INF                  INF
3     EURECOM              EURECOM              EURECOM              EURECOM              EURECOM
4     UCR_VCG              UCR_VCG              UCR_VCG              UCR_VCG              UCR_VCG
5     NTU_ROSE             KU_ISPL              ORAND                KU_ISPL              KU_ISPL
6     KU_ISPL              ORAND                KU_ISPL              ORAND                ORAND
7     ORAND                NTU_ROSE             NTU_ROSE             KSLAB                KSLAB
8     KSLAB                UTS_CETC_D2DCRC_CAI  KSLAB                NTU_ROSE             UTS_CETC_D2DCRC_CAI
9     UTS_CETC_D2DCRC_CAI  KSLAB                UTS_CETC_D2DCRC_CAI  UTS_CETC_D2DCRC_CAI  NTU_ROSE
10    MMSys_CCMIP          MMSys_CCMIP          MMSys_CCMIP          MMSys_CCMIP          MMSys_CCMIP

Not much difference between these runs.


Top 3 Results

[Figure: videos #1874, #1681, and #598.]


Bottom 3 Results

[Figure: videos #1029, #958, and #1825.]


Sub-task 2: Description Generation

Given a video, generate a textual description (Who? What? Where? When?), e.g. “a dog is licking its nose”.

• Up to 4 runs in the Description Generation subtask.
• Metrics used for evaluation:
  • BLEU (BiLingual Evaluation Understudy)
  • METEOR (Metric for Evaluation of Translation with Explicit Ordering)
  • CIDEr (Consensus-based Image Description Evaluation)
  • STS (Semantic Textual Similarity)
  • DA (Direct Assessment), which is a crowdsourced rating of captions using Amazon Mechanical Turk (AMT)
• Run Types
  • V (Vine videos used for training)
  • N (Only non-Vine videos used for training)
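To give a feel for how the n-gram overlap metrics compare a generated caption with several human captions, here is a minimal Python sketch of clipped (modified) n-gram precision, one building block of BLEU. It is not the full BLEU score (no brevity penalty, no geometric mean over n-gram orders) and not the evaluation code used for the task; whitespace tokenisation and lowercasing are simplifying assumptions.

```python
from collections import Counter


def ngrams(tokens, n):
    """All contiguous n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def modified_precision(candidate, references, n=1):
    """Clipped n-gram precision of a candidate caption against reference captions.

    Each candidate n-gram is credited at most as many times as it occurs
    in the single reference where it is most frequent.
    """
    cand_counts = Counter(ngrams(candidate.lower().split(), n))
    if not cand_counts:
        return 0.0
    max_ref = Counter()
    for ref in references:
        for gram, count in Counter(ngrams(ref.lower().split(), n)).items():
            max_ref[gram] = max(max_ref[gram], count)
    clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
    return clipped / sum(cand_counts.values())
```

Because every video has five human captions, a candidate n-gram counts as matched if any one of the references contains it, which is why multiple annotations per video make these metrics more forgiving of valid paraphrases.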


CIDEr Results

[Bar chart: CIDEr score (y-axis, 0 to 0.45), series Run 1 to Run 4.]


CIDEr-D Results

[Bar chart: CIDEr-D score (y-axis, 0 to 0.18), series Run 1 to Run 4.]


METEOR Results

[Bar chart: METEOR score (y-axis, 0 to 0.25), series Run 1 to Run 4.]


BLEU Results

[Bar chart: BLEU score (y-axis, 0 to 0.03), series Run 1 to Run 4.]


STS Results

[Bar chart: STS score (y-axis, 0 to 0.5), series Run 1 to Run 4.]


CIDEr Results – Run Type

[Bar chart: CIDEr score (y-axis, 0 to 0.45) by run type, V and N.]


Direct Assessment (DA)

• Measures …
  • RAW: Average DA score [0..100] for each system (non-standardised) – micro-averaged per caption then overall average
  • Z: Average DA score per system after standardisation per individual AMT worker’s mean and std. dev. score.
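The two DA measures can be sketched in Python as follows. This is only an illustration under stated assumptions: the input tuple layout is hypothetical, the population standard deviation is used, workers with zero spread are mapped to a Z score of 0, and the Z scores are averaged per caption before the overall average in the same way as the raw scores (the slide states this explicitly only for RAW).

```python
from collections import defaultdict
from statistics import mean, pstdev


def raw_and_z_scores(ratings):
    """ratings: list of (worker_id, system_id, caption_id, score) with score in 0..100.

    Returns two dicts keyed by system: average raw score and average Z score.
    """
    # Per-worker mean and standard deviation over everything that worker rated.
    by_worker = defaultdict(list)
    for worker, _, _, score in ratings:
        by_worker[worker].append(score)
    stats = {w: (mean(s), pstdev(s)) for w, s in by_worker.items()}

    raw = defaultdict(lambda: defaultdict(list))
    z = defaultdict(lambda: defaultdict(list))
    for worker, system, caption, score in ratings:
        mu, sigma = stats[worker]
        raw[system][caption].append(score)
        # Assumption: a worker who gives every caption the same score maps to 0.
        z[system][caption].append((score - mu) / sigma if sigma > 0 else 0.0)

    def average(per_caption):
        # Average within each caption first, then across captions.
        return mean(mean(scores) for scores in per_caption.values())

    return ({s: average(c) for s, c in raw.items()},
            {s: average(c) for s, c in z.items()})
```

Standardising per worker removes differences in how harshly or generously individual AMT workers use the 0 to 100 scale, so the Z ranking depends less on which workers happened to rate which system.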


DA results - Raw

[Bar chart: raw DA score (y-axis, 0 to 100).]


DA results - Z

[Bar chart: DA Z score (y-axis, -0.8 to 1).]


What DA Results Tell Us

1. Green squares indicate a significant “win” for the row over the column.
2. No system yet reaches human performance.
3. Humans B and E statistically perform better than Human D.
4. Amongst systems, INF outperforms the rest.


Systems Rankings for each Metric

Rank  CIDEr                CIDEr-D              METEOR               BLEU                 STS                  DA
1     INF                  INF                  INF                  INF                  INF                  INF
2     UTS_CETC_D2DCRC_CAI  UTS_CETC_D2DCRC_CAI  UTS_CETC_D2DCRC_CAI  UTS_CETC_D2DCRC_CAI  UTS_CETC_D2DCRC_CAI  UTS_CETC_D2DCRC_CAI
3     NTU_ROSE             UPCer                UPCer                UPCer                PicSOM               UPCer
4     PicSOM               KSLAB                PicSOM               PicSOM               NTU_ROSE             PicSOM
5     UPCer                PicSOM               KU_ISPL              KSLAB                UPCer                KU_ISPL
6     KSLAB                NTU_ROSE             KSLAB                KU_ISPL              KU_ISPL              KSLAB
7     KU_ISPL              KU_ISPL              NTU_ROSE             NTU_ROSE             KSLAB                NTU_ROSE
8     MMSys_CCMIP          MMSys_CCMIP          MMSys_CCMIP          MMSys_CCMIP          MMSys_CCMIP          MMSys_CCMIP


Observations

• The task continues to evolve: the number of annotations per video was standardized to 5 (compare with last year’s task).
• Tried to remove redundancy and create a diverse set with little or no ambiguity for the matching sub-task.
• Steps were taken to ensure that a cleaner dataset was used for the task.


Participants

• Teams that will present today:

• RUCMM

• KU_ISPL

• INF

• Very high-level bullets on approaches by other teams follow.


UTS_CETC_D2DCRC

• Widely used LSTM-based sequence-to-sequence model.
• Focus on improving generalization ability of the model.
• Different training strategies used.
• Several combinations of spatial and temporal features are ensembled together.
• Simple model structure preferred to help generalization ability.
• Training data: MSVD, MSR-VTT 2016, TGIF, VTT 2016, VTT 2017


PicSOM

Description Generation

• LSTM recurrent neural networks used to generate descriptions using multi-modal features.
• Visual features include image and video features and trajectory features.
• Audio features also used.
• Training datasets used: MS COCO, MSR-VTT, TGIF, MSVD.
• Significant improvement by expanding MSR-VTT training dataset with MS COCO.


KSLAB

• The main idea is to extract representations from only key frames.

• Key frames are detected for different types of events.

• The method uses a CNN encoder and LSTM decoder.

• Model trained using MS COCO dataset.


NTU_ROSE

• Matching & Ranking
  • Trained 2 different models on MS COCO dataset.
  • Image based retrieval methods found suitable.
• Description Generation
  • Training dataset: MSR-VTT and MSVD.
  • CST-captioning (Consensus-based Sequence Training) used as baseline and adapted.
  • Both visual and audio features used.
  • Model trained on MSR-VTT performed better, probably because it generates longer sentences than one trained on MSVD.


MMSys

• Matching & Ranking
  • Wikipedia and Pascal Sentence datasets used for training.
  • Used pre-trained cross-modal retrieval method for matching task.
• Description Generation
  • MSR-VTT dataset used for training.
  • Extracted frames at 1 fps from each video and used a pre-trained Inception-ResNetV2 to extract features.
  • Used sen2vec for text features.
  • Model trained on frame and text features.


EURECOM

Matching & Ranking

• Improved approach of best team of 2017 (DL-61-86).
• Feature vectors derived from frames extracted at 2 fps using final layer of ResNet-152.
• Contextualized features obtained and combined through soft attention mechanism.
• Resulting vector v fed into two fully connected layers using ReLU activation.
• Vector v concatenated with vector from last layer of an RGB-I3D.
• Rather than using ResNet-152 trained only on ImageNet, the network is also fine-tuned on MS COCO.
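The slide does not give the exact attention formulation, so the following Python sketch only illustrates the general idea of combining per-frame features through soft attention: each frame gets a score, the scores are normalised with a softmax, and the frame vectors are averaged with those weights. The dot-product scoring against a `query` vector and all names are assumptions of this sketch, not EURECOM's model.

```python
import math


def soft_attention_pool(frame_features, query):
    """Combine per-frame feature vectors into one vector with soft attention.

    frame_features: list of equal-length lists, one feature vector per frame.
    query: vector of the same length; in a trained model this (or a small
    scoring network) would be learned, here it is simply an input.
    Returns (pooled_vector, attention_weights).
    """
    # Relevance score of each frame: dot product with the query.
    scores = [sum(f * q for f, q in zip(frame, query)) for frame in frame_features]
    # Softmax over frames; subtracting the max keeps exp() numerically stable.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted average of the frame vectors, dimension by dimension.
    dim = len(frame_features[0])
    pooled = [sum(w * frame[d] for w, frame in zip(weights, frame_features))
              for d in range(dim)]
    return pooled, weights
```

Compared with plain mean pooling over frames, the weights let frames that are more relevant to the description dominate the video representation.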


UCR_VCG

Matching & Ranking

• MS-COCO dataset used for training.
• Keyframes extracted from videos – representative frames
• A joint image-text embedding approach used to match videos to descriptions.


Conclusion

• Good level of participation. Task will be renewed.
• This year we had more annotations per video.
• A cleaner dataset was created.
• Direct Assessment was used for a second year running. This year we included multiple human responses. The results are interesting.
• Lots of available training sets, some overlap ... MSR-VTT, MS-COCO, ImageNet, YouTube2Text, MSVD, TRECVid2016-2017 VTT, TGIF
• Some teams used audio features in addition to visual features.


Discussion

• Is there value in the caption ranking sub-task? Should it be continued, especially with some teams participating only in this subtask?
• Is the inclusion of run type (N or V) valuable?
  • Other possible run types? Video datasets only vs. video + image captioning training datasets.
• Possibilities for a new dataset?
• Are more teams planning to use audio features? What about motion from video?
• What did individual teams learn?
