detection track visual relationship · 2018-11-05 · evaluation server is hosted by kaggle public...

31

Upload: others

Post on 22-May-2020

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Detection track Visual Relationship · 2018-11-05 · Evaluation server is hosted by Kaggle Public metric implementation is available as a part of Tensorflow Object Detection API
Page 2: Detection track Visual Relationship · 2018-11-05 · Evaluation server is hosted by Kaggle Public metric implementation is available as a part of Tensorflow Object Detection API

Visual Relationship Detection track

Page 3: Detection track Visual Relationship · 2018-11-05 · Evaluation server is hosted by Kaggle Public metric implementation is available as a part of Tensorflow Object Detection API

Outline

P 3Open Images Challenge: Visual Relationships Detection track

● Visual relationship detection track overview● Dataset: data collection and statistics● Metrics● Result analysis

Page 4: Detection track Visual Relationship · 2018-11-05 · Evaluation server is hosted by Kaggle Public metric implementation is available as a part of Tensorflow Object Detection API

Open Images Challenge: Visual relationship detection

Visual relationship detection

P 4

DOGDOG

BICYCLE BICYCLE

PERSON

PERSON

Both images have the same set of objects and layout but very different semantics

Task:

● Two objects locations and classes

● Relationship between two objects

Page 5: Detection track Visual Relationship · 2018-11-05 · Evaluation server is hosted by Kaggle Public metric implementation is available as a part of Tensorflow Object Detection API

Participation and winning requirements

● Additional annotations on top of Open Images V4● External data/pre-trained models are allowed but must be disclosed● Evaluation server is hosted by Kaggle● Full prize: 20K USD split between 3 winners● Winner obligations:

○ Detailed, minimum 2-page description of method● Winners encouraged:

○ Open-source their framework

Open Images Challenge: Visual Relationships Detection track P 5

Page 6: Detection track Visual Relationship · 2018-11-05 · Evaluation server is hosted by Kaggle Public metric implementation is available as a part of Tensorflow Object Detection API

Dataset: data collection

P 6

2. Generate label co-occurrence statistics of Open Images V4+ pick interesting relationships

1. Existing works● VRD dataset1

● Visual Genome Dataset2

3. Generate candidate triplets for annotation

Relationships:<Pets> under {Table, Chair, etc ...}<Object> on top of <Object><Object> inside of <Object><Human> holds <Object><Human> on top of <Object><Human> hits {Football,Tennis ball,...}<Human> plays {Drums, Guitar, …}<Object> is {attribute}

1Lu, C., Krishna, R., Bernstein, M, Fei-Fei, Li, “Visual Relationship Detection with Language Priors”, ECCV 20162 Krishna R., Zhu Y., Groth O., Johnson J., Hata K., Kravitz J., Chen S., Kalantidis Y., Jia-Li L., Ayman Shamma D., Bernstein M., Fei-Fei L., “Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations”, 2016

Page 7: Detection track Visual Relationship · 2018-11-05 · Evaluation server is hosted by Kaggle Public metric implementation is available as a part of Tensorflow Object Detection API

Open Images Challenge: Visual Relationships Detection track

Dataset: annotation

P 7

ManMan Man

Example triplet: Man holds Microphone

Page 8: Detection track Visual Relationship · 2018-11-05 · Evaluation server is hosted by Kaggle Public metric implementation is available as a part of Tensorflow Object Detection API

Open Images Challenge: Visual Relationships Detection track

Dataset: annotation

P 8

Microphone

Example triplet: Man holds Microphone

Page 9: Detection track Visual Relationship · 2018-11-05 · Evaluation server is hosted by Kaggle Public metric implementation is available as a part of Tensorflow Object Detection API

Open Images Challenge: Visual Relationships Detection track

Dataset: annotation

P 9

Please verify that the relation holds connects the man and the microphone on the image: man holds microphone

Microphone

Man

Page 10: Detection track Visual Relationship · 2018-11-05 · Evaluation server is hosted by Kaggle Public metric implementation is available as a part of Tensorflow Object Detection API

Open Images Challenge: Visual Relationships Detection track

Dataset: statistics

Train set:● 1,743,042 images● 374,768 relationship annotations● 3,290,070 bounding boxes● 329 distinct triplets● 100k subset for validation

Test set:● 100K images● 30% in public split● 70% in private split

P 10

Page 11: Detection track Visual Relationship · 2018-11-05 · Evaluation server is hosted by Kaggle Public metric implementation is available as a part of Tensorflow Object Detection API

Open Images Challenge: Visual Relationships Detection track

Evaluation

P 11

No standard metric for visual relationships detection evaluation.

Evaluation server is hosted by Kaggle

Public metric implementation is available as a part of Tensorflow Object Detection API

Page 12: Detection track Visual Relationship · 2018-11-05 · Evaluation server is hosted by Kaggle Public metric implementation is available as a part of Tensorflow Object Detection API

Evaluation: metrics

P 12

Three metrics used in literature1,2:● AP relationships detection (but reported values are low)● AP phrase detection● Recall@50, Recall@100 for both relationship detection and phrase

detection

Final score:0.4*mAP(relationships) + 0.4*mAP(phrase) + 0.2*Recall@50(relationships)

1Lu, C., Krishna, R., Bernstein, M, Fei-Fei, Li, “Visual Relationship Detection with Language Priors”, ECCV 20162 Krishna R., Zhu Y., Groth O., Johnson J., Hata K., Kravitz J., Chen S., Kalantidis Y., Jia-Li L., Ayman Shamma D., Bernstein M., Fei-Fei L., “Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations”, 2016

Page 13: Detection track Visual Relationship · 2018-11-05 · Evaluation server is hosted by Kaggle Public metric implementation is available as a part of Tensorflow Object Detection API

Open Images Challenge: Visual Relationships Detection track

Evaluation: metrics

AP per relationship (i.e. holds)

● mean AP(relationships)● Recall@50

True Positive:● IoU > 0.5 for each box● Object labels and

relationship labelmatch

P 13

Page 14: Detection track Visual Relationship · 2018-11-05 · Evaluation server is hosted by Kaggle Public metric implementation is available as a part of Tensorflow Object Detection API

Open Images Challenge: Visual Relationships Detection track

Evaluation: metrics

P 14

AP per relationship (i.e. holds)

mean AP(phrase)

True Positive:● IoU > 0.5 for box union● Object labels and

relationship labelmatch

Page 15: Detection track Visual Relationship · 2018-11-05 · Evaluation server is hosted by Kaggle Public metric implementation is available as a part of Tensorflow Object Detection API

Open Images Challenge: Visual Relationships Detection track

Results analysis: overview

P 15

Number of teams with at least one submission: 232 teamsEvaluation server days: 51

External datasets/pre-trained models used: ● OpenImagesV4● ImageNet● COCO● Visual Genome

Base model architectures:● ResNets, YOLO, Darknet, SENet,

Retinanet ...

Deep learning frameworks:● Tensorflow Object Detection API,

Detectron, Cadene (pyTorch), fastai library, ImageAI, ChainerCV, TensorFlow-Slim, Keras, MXNet

Page 16: Detection track Visual Relationship · 2018-11-05 · Evaluation server is hosted by Kaggle Public metric implementation is available as a part of Tensorflow Object Detection API

Open Images Challenge: Visual Relationships Detection track

Results analysis: teams

P 16

Number of teams: 232

Page 17: Detection track Visual Relationship · 2018-11-05 · Evaluation server is hosted by Kaggle Public metric implementation is available as a part of Tensorflow Object Detection API

Open Images Challenge: Visual Relationships Detection track

Results analysis: components of the final score (weighted)

P 17

● Some teams with lower overall score had higher score in mAPs

● Some teams scored well, mostly because of Recall@50

Page 18: Detection track Visual Relationship · 2018-11-05 · Evaluation server is hosted by Kaggle Public metric implementation is available as a part of Tensorflow Object Detection API

Open Images Challenge: Visual Relationships Detection track

Results analysis: number of submissions per day

P 18

Page 19: Detection track Visual Relationship · 2018-11-05 · Evaluation server is hosted by Kaggle Public metric implementation is available as a part of Tensorflow Object Detection API

Open Images Challenge: Visual Relationships Detection track

Results analysis: evolution of scores

P 19

Dots: winners entering the competition

Page 20: Detection track Visual Relationship · 2018-11-05 · Evaluation server is hosted by Kaggle Public metric implementation is available as a part of Tensorflow Object Detection API

Open Images Challenge: Visual Relationships Detection track

Results analysis: evolution of scores (winning teams)

P 20

Page 21: Detection track Visual Relationship · 2018-11-05 · Evaluation server is hosted by Kaggle Public metric implementation is available as a part of Tensorflow Object Detection API

Open Images Challenge: Visual Relationships Detection track

Results analysis: winners breakdown by score compos (unweighted)

P 21

2rd, 3rd and 4th place teams showed approximately the same performance on mAP(relationship)

Page 22: Detection track Visual Relationship · 2018-11-05 · Evaluation server is hosted by Kaggle Public metric implementation is available as a part of Tensorflow Object Detection API

Open Images Challenge: Visual Relationships Detection track

Winning models: final result

P 22

Public leaderboard score Private leaderboard score

Seiji 0.33213 0.28544

tito 0.25571 0.23709

Kyle 0.28043 0.23491

toshif 0.25621 0.22832

Undisclosed

Page 23: Detection track Visual Relationship · 2018-11-05 · Evaluation server is hosted by Kaggle Public metric implementation is available as a part of Tensorflow Object Detection API

Open Images Challenge: Visual Relationships Detection track

Winning models

P 23

Commonalities:● Different models for attributes (“is” relationship) and relationships

between two objects● The models combine a detector with a module on top for relationship

prediction

Page 24: Detection track Visual Relationship · 2018-11-05 · Evaluation server is hosted by Kaggle Public metric implementation is available as a part of Tensorflow Object Detection API

Open Images Challenge: Visual Relationships Detection track

Questions?

P 24

Next - presentations by winning teams

Page 25: Detection track Visual Relationship · 2018-11-05 · Evaluation server is hosted by Kaggle Public metric implementation is available as a part of Tensorflow Object Detection API

P 25

Page 26: Detection track Visual Relationship · 2018-11-05 · Evaluation server is hosted by Kaggle Public metric implementation is available as a part of Tensorflow Object Detection API

Materials below this slide

P 26

Page 27: Detection track Visual Relationship · 2018-11-05 · Evaluation server is hosted by Kaggle Public metric implementation is available as a part of Tensorflow Object Detection API

Candidates generation

Man ManMan

Example triplet: Man holds Microphone

Page 28: Detection track Visual Relationship · 2018-11-05 · Evaluation server is hosted by Kaggle Public metric implementation is available as a part of Tensorflow Object Detection API

Microphone

Candidates generation

Example triplet: Man holds Microphone

Page 29: Detection track Visual Relationship · 2018-11-05 · Evaluation server is hosted by Kaggle Public metric implementation is available as a part of Tensorflow Object Detection API

Microphone

Man

Please verify that the relation holds connects the man and the microphone on the image: man holds microphone

Annotation process

More about annotation process: go/oi_triplet_annotation

Page 30: Detection track Visual Relationship · 2018-11-05 · Evaluation server is hosted by Kaggle Public metric implementation is available as a part of Tensorflow Object Detection API

Microphone

Please verify that the relation holds connects the man and the microphone on the image: man holds microphone

Annotation process

Man

More about annotation process: go/oi_triplet_annotation

Page 31: Detection track Visual Relationship · 2018-11-05 · Evaluation server is hosted by Kaggle Public metric implementation is available as a part of Tensorflow Object Detection API

Microphone

Please verify that the relation holds connects the man and the microphone on the image: man holds microphone

Annotation process

Man

More about annotation process: go/oi_triplet_annotation