
Learning from the Uncertain: Leveraging Social Communities to Generate Reliable Training Data for Visual Concept Detection Tasks

i-KNOW 2015

Christian Hentschel, Harald Sack

Hasso Plattner Institute, University of Potsdam, Germany

Agenda

● Visual Concept Detection
  ○ Problem: Insufficient Training Data
  ○ Approach: Leveraging Social Photo Communities
● Relevant Image Retrieval
  ○ Improved Dataset
  ○ Community Language Model
  ○ Visual Re-ranking
● Results
● Conclusions & Outlook

Learning from the Uncertain, Harald Sack, 10-21-2015

Visual Concept Detection

● The ability to learn visual categories and then automatically identify new, unseen images of these categories based only on their visual content


Visual Concept Detection

● Supervised Machine Learning Task:
  ○ Positive images (that depict a concept)
  ○ Negative images (that don’t)
  ○ Classification/Prediction:
    ■ Test whether a new image depicts the concept (or not)
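The supervised setup can be illustrated with a deliberately tiny stand-in classifier. This is only a sketch: it uses a nearest-centroid rule rather than the CNNs discussed in the talk, and the 2-D feature vectors and all function names are hypothetical.

```python
import math

def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def train(positives, negatives):
    """'Training' here only stores one centroid per class."""
    return centroid(positives), centroid(negatives)

def depicts_concept(model, features):
    """True if the test vector lies closer to the positive centroid."""
    positive_centroid, negative_centroid = model
    return euclidean(features, positive_centroid) < euclidean(features, negative_centroid)

# Toy feature vectors (hypothetical values, not real image features).
positives = [[0.9, 0.8], [0.8, 0.9], [1.0, 0.7]]   # images that depict the concept
negatives = [[0.1, 0.2], [0.2, 0.1], [0.0, 0.3]]   # images that do not
model = train(positives, negatives)
```

Calling `depicts_concept(model, some_vector)` then plays the role of the classification/prediction step for an unseen image.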


Visual Concept Detection - Training Data

● Approach: Convolutional Neural Networks (CNN)
  ○ outperformed all other approaches
  ○ variation of the multi-layer perceptron
  ○ deep (i.e. many hidden layers)
  ○ many parameters to train → prone to overfitting
    ■ important: (very) large training datasets

[Figure labels: Bag of Visual Words, Convolutional Neural Networks]


Visual Concept Detection - Training Data

● Training Data for Convolutional Neural Networks
  ○ ImageNet (http://www.image-net.org)
    ■ widely used
    ■ benchmarking initiative
    ■ > 14m photos
    ■ > 21k classes
    ■ goal: 40k categories, each covered by 10k photos evaluated as relevant by the majority of 10 human annotators
    ■ estimated annotation time: 63 years (2s per image)
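The annotation-time estimate can be checked with back-of-the-envelope arithmetic. The sketch below assumes that each of the 40k × 10k photos is judged by all 10 annotators working one after another; the function name and the 365.25-day year are assumptions made for illustration. Under these assumptions the total comes to about 63 years only at a rate of two images per second (0.5 s per judgement), whereas 2 s per judgement would give roughly 250 years.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600

def annotation_years(categories, photos_per_category, annotators, seconds_per_judgement):
    """Total sequential annotation time in years (hypothetical helper)."""
    judgements = categories * photos_per_category * annotators
    return judgements * seconds_per_judgement / SECONDS_PER_YEAR

# 40k categories x 10k photos x 10 annotators = 4 billion judgements.
years_at_two_per_second = annotation_years(40_000, 10_000, 10, 0.5)
years_at_two_seconds_each = annotation_years(40_000, 10_000, 10, 2.0)
```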

Visual Concept Detection - Leveraging Social Photo Communities

● Training Data for Convolutional Neural Networks (cont.)
  ○ Flickr
    ■ > 8b photos (as of 12-12-2012)
    ■ user-generated annotations
    ■ potentially unlimited visual concepts and photos per concept
    ■ problems:
      ● incomplete (not severe, as there are so many photos)
        ○ missing annotations
      ● highly subjective!
      ● often not related to visual content!
        ○ e.g. tags describe the viewpoint rather than the object


Visual Concept Detection - Leveraging Social Photo Communities

● Task: identify images relevant for a given visual concept
  ○ scene/object should be clearly visible, no major occlusions


Relevant Image Retrieval - The Dataset

● MIRFLICKR-1M
  ○ 1 million Flickr images (selection based on interestingness score)
  ○ published under Creative Commons Attribution Licence
  ○ provides EXIF data and user tags only
● our s16a extension:
  ○ authoritative metadata: title, description, geo information
  ○ social metadata: user comments, notes, album and group memberships
  ○ available for > 90% of the original data
  ○ publicly available: http://www.s16a.org/mirflickr

Relevant Image Retrieval - Community Language Model

● query extension
  ○ based on the language used by Flickr users, generate query terms similar/related to the visual concept query
  ○ example: → …
  ○ learn these annotation relationships from the metadata corpus
● language model: word2vec (https://code.google.com/p/word2vec/)
  ○ makes predictions about a word's meaning based on the contexts in which it was seen to appear
  ○ unsupervised training of a neural network on a given corpus:
    ■ predict a word given its context (Continuous Bag-of-Words)
    ■ predict the context given a word (skip-gram)
  ○ skip-gram is better suited to represent infrequent words
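The difference between the two word2vec training objectives shows up in how training examples are cut from a token sequence. The following is a minimal sketch, not the word2vec implementation itself: the helper names are hypothetical, the tag list is invented, and the window is assumed to mean "up to `window` tokens on each side".

```python
def context(tokens, i, window):
    """Tokens within `window` positions of index i, excluding tokens[i]."""
    lo = max(0, i - window)
    return tokens[lo:i] + tokens[i + 1:i + 1 + window]

def cbow_examples(tokens, window):
    """(context words, target word): predict a word given its context."""
    return [(context(tokens, i, window), tokens[i]) for i in range(len(tokens))]

def skipgram_examples(tokens, window):
    """(target word, one context word): predict the context given a word."""
    return [
        (tokens[i], neighbour)
        for i in range(len(tokens))
        for neighbour in context(tokens, i, window)
    ]

# Hypothetical tag set of a single photo, treated as one "sentence".
tags = ["sunset", "beach", "sky", "ocean"]
```

Skip-gram produces one example per (word, neighbour) pair, so even a rare word yields several training examples of its own, which is one common explanation for why it handles infrequent words better.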



Relevant Image Retrieval - Community Language Model (cont.)

● currently: model is based on user tags only
  ○ titles, descriptions and user comments are HTML-encoded full-text strings
  ○ pre-processing: lemmatization, stop word removal (English only)
● skip-gram model
  ○ context window size: 6 (average # of tags per image: 12)
  ○ ignore words with total frequency f < 5
  ○ 300-dimensional feature vector
● compute the k=20 most similar terms per query concept
  ○ extend the initial visual concept query (e.g. ‘sunset’) by additional, related terms
  ○ rank images by the number of query terms found in their metadata
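Query extension and the term-count ranking can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: similarity between terms is assumed to be cosine similarity of their embedding vectors, the 2-D embeddings and tag sets are invented, and `expand_query` and `rank_images` are hypothetical names. The talk uses 300-dimensional vectors and k=20.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def expand_query(concept, embeddings, k):
    """Return the concept followed by its k most similar terms."""
    query_vec = embeddings[concept]
    others = [term for term in embeddings if term != concept]
    others.sort(key=lambda term: cosine(query_vec, embeddings[term]), reverse=True)
    return [concept] + others[:k]

def rank_images(query_terms, image_tags):
    """Sort image ids by the number of query terms found in their tags."""
    terms = set(query_terms)
    return sorted(
        image_tags,
        key=lambda image_id: len(terms & set(image_tags[image_id])),
        reverse=True,
    )

# Toy data (hypothetical).
embeddings = {
    "sunset": [1.0, 0.1],
    "dusk": [0.9, 0.2],
    "sundown": [0.95, 0.05],
    "car": [0.0, 1.0],
}
image_tags = {
    "img_a": ["car", "street"],
    "img_b": ["sunset", "dusk", "sea"],
    "img_c": ["dusk", "city"],
}
query = expand_query("sunset", embeddings, k=2)
ranking = rank_images(query, image_tags)
```

Note that `img_c` is retrieved although it never carries the tag `sunset`; that is the effect the extended query is meant to achieve.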


Relevant Image Retrieval - Community Language Model (cont.)

● Advantages of corpus-specific language model:
  ○ retrieves synonyms:
    ■ airplane: [aircraft, aeroplane, plane]
  ○ retrieves related concepts:
    ■ beach: [sand, ocean, shore]
  ○ retrieves frequent instances:
    ■ flower: [dahlia, spiderwort, tulip], car: [ford, porsche]
  ○ multilingual:
    ■ dog: [chien, perro, hund]
  ○ captures sub- and superclasses:
    ■ boat: [fisherboat, sailboat, sailingships], tiger: [flickrbigcats]
  ○ no external knowledge base, dictionary or thesaurus required


Relevant Image Retrieval - Further Improvement by Visual Re-ranking

● Pre-trained CNN as feature extractor
  ○ trained on ImageNet Large Scale Visual Recognition Challenge 2012 data
  ○ deep feature encoding
  ○ penultimate (fully connected) layer
  ○ 4,096-dimensional feature vector
  ○ extracted for all images of the MIRFLICKR-1M collection
    ■ 3 hours on NVIDIA Tesla K20X GPU
  ○ publicly available: www.s16a.org/mirflickr
● Re-ranking: take the highest-ranked image of the extended query as reference
  ○ assumption: high relevance for the visual concept
  ○ compute cosine similarity to the reference using the deep feature representations
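The re-ranking step can be sketched in a few lines. The example assumes the features are already extracted and stored per image id; the 3-D vectors stand in for the 4,096-dimensional CNN features, and `visual_rerank` is a hypothetical name.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def visual_rerank(ranked_ids, features):
    """Re-order a text-based ranking by visual similarity to its top hit."""
    reference = features[ranked_ids[0]]  # assumed to be highly relevant
    return sorted(
        ranked_ids,
        key=lambda image_id: cosine(reference, features[image_id]),
        reverse=True,
    )

# Toy feature vectors (hypothetical).
features = {
    "img_b": [0.9, 0.1, 0.0],
    "img_c": [0.1, 0.9, 0.2],
    "img_d": [0.8, 0.2, 0.1],
}
text_ranking = ["img_b", "img_c", "img_d"]
reranked = visual_rerank(text_ranking, features)
```

Because everything hinges on a single reference image, the result is only as good as the top hit of the text-based ranking.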



Results

● baseline:
  ○ select photos based on presence of the concept term in the tag set
  ○ no query extension
  ○ large number of candidate images: randomly select n=200
● compare against
  ○ extended query using community language model
  ○ extended query + visual re-ranking
● evaluation using average precision:
  ○ AP = (1/R) · Σ_k (R_k / k) · rel(k)
  ○ R: no. of relevant photos; R_k: no. of relevant photos among the top k ranked instances; rel(k) = 1 if the photo at rank k is relevant, 0 otherwise
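Average precision as defined by these quantities sums the precision at every rank that holds a relevant photo and divides by R. A minimal sketch follows; the function names are hypothetical, and when R is not passed in, it is assumed to be the number of relevant photos within the ranked list itself.

```python
def average_precision(rel, num_relevant=None):
    """rel[k-1] is 1 if the photo at rank k is relevant, 0 otherwise."""
    if num_relevant is None:
        num_relevant = sum(rel)  # assumption: R counts relevant photos in this list
    if num_relevant == 0:
        return 0.0
    hits = 0      # R_k: relevant photos among the top k
    total = 0.0
    for k, is_relevant in enumerate(rel, start=1):
        if is_relevant:
            hits += 1
            total += hits / k  # precision at rank k
    return total / num_relevant

def mean_average_precision(rel_lists):
    """Mean of the per-concept average precision values (mAP)."""
    return sum(average_precision(rel) for rel in rel_lists) / len(rel_lists)
```

For a ranking judged as `[1, 0, 1, 1]`, the relevant photos sit at ranks 1, 3 and 4, so the sum is 1/1 + 2/3 + 3/4, divided by R = 3.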


Results (cont.)

● tested for 10 visual concepts
● manual assessment of relevance:
  ○ object/scene clearly visible
  ○ no major occlusions
● visual re-ranking of the results obtained from the language model is superior
  ○ exceptions: ‘airplane’, ‘car’


Results (cont.)

● top ranked candidate images according to community language model
  ○ airplane: passenger, vliegtuig, aéroport, jetplane, traveller, airliner, lesavions, economysection, traveler, vacation, fuselage, motor, jetliner, transportation, airplane, legroom, transport, sky, passengerjet, jet, travel, flying, avion, airport, inflight, rudder, flugzeug, tail, bin, ptvs, aerial, cabin, passengerplane, aircraftpicture, schipholairport, flap, nederland, aviao, amsterdam, landinggear, cockpit, luggagebins, aircraft, plane, aviation, seat, economyclass, airship, aisle, netherlands, nosegear, holland, luggage, aircraftcabin, engine, aeroport, ptv, aeroplane, aeroplano, wing, paysbas
  ○ car: harney, ford, illinois, myoldpostcards, leeannharney, il, owner, coupe, chromeengine, backend, taillight, automobile, route66, custom, tail, motorvehicle, vintagecar, fomoco, international, fin, collectiblecar, 2012, custombuilt, september21232012, auto, dougthompson, motherroadfestival, 1950, 9212312, convertible, fordmotorcompany, classiccar, 2door, worldcars, car, ghostflames, vonliski, rearend, sidepipes, antiquecar, springfield, frankharney, deluxe, carsonconvertibletop, oldcar


● Problem:
  ○ visual re-ranking based on the airport/rear light image fails to capture essential features of the visual concept:
    ■ airplane top-5 ranked images according to language model:
    ■ airplane top-5 ranked images after visual re-ranking:

Conclusions & Outlook

● Automatically retrieve training images for visual concept detection:
  ○ extend MIRFLICKR-1M dataset by additional authoritative and social metadata
    ■ www.s16a.org/mirflickr
  ○ community language model improves over tag-based ranking
    ■ mAP: 56% → 77%
  ○ visual re-ranking improves in most cases
● Future Work:
  ○ ground truth relevance estimation is currently based on a single evaluator
  ○ include further annotation data
    ■ title, description, Flickr group information …
  ○ improve visual re-ranking:
    ■ outlier detection (e.g. single class SVM)
    ■ on top-n results (instead of top-1)
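One possible reading of the "top-n instead of top-1" idea is to compare every candidate against the mean feature vector of the n best text-ranked images. The sketch below is speculative and not taken from the talk: averaging as the aggregation, the function name `rerank_by_top_n`, and the toy 2-D features are all assumptions.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def rerank_by_top_n(ranked_ids, features, n):
    """Re-rank by cosine similarity to the mean feature vector of the top n hits."""
    reference = centroid([features[image_id] for image_id in ranked_ids[:n]])
    return sorted(
        ranked_ids,
        key=lambda image_id: cosine(reference, features[image_id]),
        reverse=True,
    )

# Toy features (hypothetical): an off-topic photo happens to rank first.
features = {
    "outlier": [0.0, 1.0],
    "plane_1": [1.0, 0.1],
    "plane_2": [0.9, 0.0],
    "plane_3": [1.0, 0.2],
}
text_ranking = ["outlier", "plane_1", "plane_2", "plane_3"]
```

With `n=1` the off-topic photo stays on top because it is its own reference; with `n=4` the three similar photos dominate the mean and the off-topic photo drops to the end.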




Thank you for your attention!

Christian Hentschel, Harald Sack

Hasso Plattner Institute, University of Potsdam, Germany
