LINKING IMAGES ACROSS TEXT REBECKA WEEGAR | KALLE ASTROM | PIERRE NUGUES CS671A Paper Presentation by: Archit Rathore 12152


Post on 21-Jan-2016



LINKING IMAGES ACROSS TEXT

REBECKA WEEGAR | KALLE ASTROM | PIERRE NUGUES

CS671A Paper Presentation by: Archit Rathore 12152


INTRODUCTION

[Figure: annotated image]

Caption: a flat landscape with a dry meadow in the foreground, a lagoon behind it and many clouds in the sky


GOAL

o Sky
o Cloud
o Landscape
o Lagoon
o Meadow

To correctly map image entities (labelled regions) to caption entities


MOTIVATION

Improve image retrieval by complementing images with accompanying text

Mapping to translate knowledge and information across text and images

Deduce geometric/spatial relations in images using captions


DATASET

•The Segmented and Annotated IAPR TC-12 Benchmark data set (Escalante et al., 2010) consists of about 20,000 photographs covering a wide variety of themes

•Each image has a short caption that describes its content, most often consisting of one to three sentences separated by semicolons

•Each image is segmented into regions, and each region is labelled with one of 275 predefined image labels


PREPROCESSING

•The Stanford CoreNLP pipeline is applied to the captions to extract the entities.

•The pipeline consists of a part-of-speech tagger, lemmatizer, named entity recognizer (Finkel et al., 2005), dependency parser, and coreference solver.

•The cross product of all image entities with all caption entities is taken to get (IE_i, CE_j) pairs

•The test set is built by manually annotating 200 randomly selected images
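The cross-product step can be sketched in a few lines. This is only an illustration: the function name `candidate_pairs` and the entity lists are hypothetical, loosely based on the landscape caption from the introduction, and are not taken from the paper.

```python
from itertools import product


def candidate_pairs(image_entities, caption_entities):
    """Return every (image entity, caption entity) combination."""
    return list(product(image_entities, caption_entities))


# Hypothetical example data (assumed for illustration only).
image_entities = ["sky", "cloud", "lagoon"]
caption_entities = ["landscape", "meadow", "lagoon", "clouds", "sky"]

pairs = candidate_pairs(image_entities, caption_entities)
print(len(pairs))  # 3 image entities * 5 caption entities = 15 pairs
print(pairs[0])    # ('sky', 'landscape')
```

Every pair produced this way is a candidate link; the ranking steps then decide which caption entity best matches each image entity.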


RANKING ENTITY PAIRS

INITIAL RANKING

•Using semantic distance: the 8 metrics provided by WS4J (PATH, WUP, RES, JCN, HSO, LIN, LCH and LESK)

•Using Statistical Associations: Co-occurrence counts, Pointwise Mutual Information (PMI) and simplified Student’s t-score

•All the pairs are ranked using the above 8+3 scoring functions
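As a rough illustration of the statistical association scores, the sketch below computes PMI and a t-score from raw co-occurrence counts. The slides do not give the formulas, so the t-score here is one common simplified form, (observed - expected) / sqrt(observed), and should be read as an assumption. The function names and all counts are hypothetical.

```python
import math


def pmi(pair_count, x_count, y_count, total):
    """Pointwise mutual information (base 2) from raw counts.

    Assumes all counts are positive.
    """
    return math.log2(pair_count * total / (x_count * y_count))


def t_score(pair_count, x_count, y_count, total):
    """Simplified t-score: (observed - expected) / sqrt(observed).

    Assumes pair_count is positive.
    """
    expected = x_count * y_count / total
    return (pair_count - expected) / math.sqrt(pair_count)


# Hypothetical counts over a collection of 1000 image/caption pairs.
total = 1000
x_count = 100    # images containing a region with a given label
y_count = 50     # captions mentioning a given entity
pair_count = 20  # images where both occur together

pmi_value = pmi(pair_count, x_count, y_count, total)
t_value = t_score(pair_count, x_count, y_count, total)
print(pmi_value)  # log2(20 * 1000 / (100 * 50)) = log2(4) = 2.0
print(t_value)    # (20 - 5) / sqrt(20)
```

A higher score suggests that the image label and the caption entity occur together more often than chance would predict, so the pair moves up in the ranking.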

RERANKING

•The spatial features of the images (topological, horizontal and vertical) are considered both on their own and in combination with the corresponding syntactic features of the caption entities


Reranking Example


IMPROVING THE RESULTS

•Among the individual scoring functions, the highest accuracy (78.6%) was obtained with the HSO similarity metric

•Further reranking done by creating an ensemble of the classifiers based on all scoring functions using a hard voting heuristic

•The number of votes given to each classifier was picked from the range [0, 3], and all possible combinations were tested

•The highest accuracy obtained this way was 88.76%
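The vote search can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: each scoring function is assumed to propose one caption entity per image entity, ties are broken in favour of the first label seen, and the names `vote`, `accuracy` and `best_weights` as well as the toy data are hypothetical.

```python
from collections import Counter
from itertools import product


def vote(predictions, weights):
    """Return the label with the largest weighted vote total.

    Ties go to the label that appears first in `predictions`.
    """
    tally = Counter()
    for label, weight in zip(predictions, weights):
        tally[label] += weight
    return max(tally, key=tally.get)


def accuracy(all_predictions, gold, weights):
    """Fraction of items where the weighted vote matches the gold label."""
    hits = sum(
        vote(preds, weights) == answer
        for preds, answer in zip(all_predictions, gold)
    )
    return hits / len(gold)


def best_weights(all_predictions, gold, n_scorers, max_votes=3):
    """Try every vote assignment in [0, max_votes] for each scorer."""
    candidates = product(range(max_votes + 1), repeat=n_scorers)
    return max(candidates, key=lambda w: accuracy(all_predictions, gold, w))


# Hypothetical toy data: one row per image entity, one column per scorer.
all_predictions = [
    ("sky", "sky", "cloud"),
    ("meadow", "lagoon", "lagoon"),
    ("cloud", "sky", "cloud"),
    ("meadow", "landscape", "meadow"),
]
gold = ["sky", "lagoon", "cloud", "meadow"]

weights = best_weights(all_predictions, gold, n_scorers=3)
print(weights, accuracy(all_predictions, gold, weights))
```

With 8+3 scoring functions and four possible vote counts each, an exhaustive search of this kind covers 4^11 (about 4.2 million) assignments, which is still small enough to enumerate.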


RESULTS


FIN

These slides are part of a paper review for the course CS671. All the work is authored by:

Weegar, Rebecka, Kalle Åström, and Pierre Nugues. "Linking Entities Across Images and Text." CoNLL 2015 (2015): 185.

This presentation in no way claims ownership of any of the content in the slides.