mediaeval 2016 - video hyperlinking: pursuing a moving target: iterative use of benchmarking of a...

Video Hyperlinking: Pursuing a Moving Target: Iterative Use of Benchmarking of a Task to Understand the Task. Maria Eskevich, Gareth J. F. Jones, Robin Aly, Roeland Ordelman, Benoit Huet

Upload: multimediaeval

Post on 16-Apr-2017


Video Hyperlinking: Pursuing a Moving Target:

Iterative Use of Benchmarking of a Task to Understand the Task

Maria Eskevich, Gareth J. F. Jones, Robin Aly, Roeland Ordelman, Benoit Huet

Task intuition

... the queen ...

Taking Aim

Video Hyperlinking (LNK)

Chasing History

Timeline: 2009-2010, 2011, 2012, 2013, 2014, 2015, 2016

Benchmark tasks: Rich Speech Retrieval; VideoCLEF 2009 Linking Task; Search and Hyperlinking (<search>, <anchoring>, <hyperlinking>); TRECVid

Datasets: Blip10000; BBC (growing data collection); Blip10000.2

Type of users generating the queries/anchors: general users (MTurk); multimedia professionals (archivists, journalists); cue-guided users

Task types and evaluation metrics:

• Known-item search: MRR
• Ad hoc hyperlinking: adapted MAP, P@10
• Ad hoc hyperlinking: MAiSP
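The two standard metrics named above, MRR for known-item search and P@10 for ad hoc hyperlinking, can be sketched as follows. This is a minimal illustration of the textbook definitions, not the benchmark's official scoring code; the function names and the input layout (ranked lists of segment identifiers, 0/1 relevance lists) are assumptions made for the example.

```python
from typing import Sequence


def reciprocal_rank(ranked_ids: Sequence[str], known_item: str) -> float:
    """Return 1/rank of the known item, or 0.0 if it was not retrieved."""
    for rank, item in enumerate(ranked_ids, start=1):
        if item == known_item:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs: Sequence[Sequence[str]],
                         known_items: Sequence[str]) -> float:
    """Average the reciprocal rank over all queries (hypothetical input layout)."""
    scores = [reciprocal_rank(run, item) for run, item in zip(runs, known_items)]
    return sum(scores) / len(scores) if scores else 0.0


def precision_at_k(relevance: Sequence[int], k: int = 10) -> float:
    """Fraction of the top-k results judged relevant.

    Ranks beyond the end of a short result list count as non-relevant.
    """
    return sum(relevance[:k]) / k


if __name__ == "__main__":
    runs = [["seg_a", "seg_b"], ["seg_c", "seg_d", "seg_e"]]
    print(mean_reciprocal_rank(runs, ["seg_b", "seg_e"]))
    print(precision_at_k([1, 0, 1], k=10))
```

Adapted MAP and MAiSP differ from these mainly in how a returned segment is decided to be relevant, which is why the plain definitions are kept separate here.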

Dataset: Video collections

Benchmark     Collection   Hours   Videos   Producers
ME'12         bliptv1000   2125    9550     Laymen
ME'13         BBC1         1260    1623     Media Professionals
ME'14/TV'15   AXES2        2686    3520     Media Professionals
TV'16         bliptv2      2492    11482    Guided Laymen

Anchor creation

Who defines the anchor?

• ME'12: Anchors are relevant segments for information needs
• ME'13: Regular internet users in a study at the BBC
• ME'14: Large-scale second user study at the BBC
• TV'15: Ask video professionals – journalists, archivists, …
• TV'16: Use cues in video to get to a specific type of anchor

ME'13 User study

• 28 users: policemen, hairdresser, bouncer, sales manager, student, self-employed

• Two-hour session:
  • Browse the archive
  • Define known-item
  • Define anchor

Session workflow (AXES PRO system):

• Formulate information need
• Text search / Visual search
• Describe target
• Describe anchor
• Select anchor using sliders

Problems

• Some nonsense anchors (Beyonce??)
• Hyperlinking concept was not understood by everyone
• Poor anchor segment definition
• Incomplete description of desired targets
• Correction by organizers methodologically problematic

Moving on

• ME'14:
  • Enlarged study
  • Better user interface guidance
  • Expensive, not much better than ME'13

• TV'15:
  • Ask media professionals
  • Interactive coaching to create anchors
  • Much effort, and availability of professionals limited
  • Multimodality important for community but not ensured

TV'16: Guided and focused chase for multimodality

Look for segments containing a combination of verbal-visual information:

• Verbal linguistic cues: 'can see', 'seeing here', 'this looks', 'looks like', 'showing', 'want to show'
• Visual cues: actions and objects crucial for this video are not explicitly named or mentioned
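The verbal side of this cue-guided search could be approximated by scanning time-aligned transcript segments for the listed phrases. The sketch below is only an illustration under assumptions: the (start, end, text) segment layout and the function name are hypothetical, and the slides do not say how candidate segments were actually located. The visual condition (objects not named in speech) still needs a human or a visual model.

```python
import re

# Verbal cue phrases taken from the slide.
VERBAL_CUES = [
    "can see", "seeing here", "this looks",
    "looks like", "showing", "want to show",
]

# One case-insensitive pattern matching any cue as a whole phrase.
CUE_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(cue) for cue in VERBAL_CUES) + r")\b",
    re.IGNORECASE,
)


def find_cue_segments(segments):
    """Return (start, end, cues) for every segment whose text contains a cue.

    `segments` is an assumed layout: an iterable of (start_sec, end_sec, text).
    Overlapping cues ("this looks like") are reported by their first match only,
    because regex matches do not overlap.
    """
    hits = []
    for start, end, text in segments:
        cues = sorted({m.group(1).lower() for m in CUE_PATTERN.finditer(text)})
        if cues:
            hits.append((start, end, cues))
    return hits


if __name__ == "__main__":
    transcript = [
        (0.0, 5.0, "As you can see, this looks great"),
        (5.0, 9.0, "Nothing to report"),
    ]
    for hit in find_cue_segments(transcript):
        print(hit)
```

Segments found this way are only candidates; whether the referenced action or object is truly left unnamed has to be checked against the video itself.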

Ground truth creation

Diagram labels (repeated across three slide builds): Anchors / Participants' results; HIT; Relevance Judgments / Workers' feedback

Iterative HIT Improvement

Finding common language with the crowd

• ME'12:
  • Watch two video segments and say whether they are related in some way.
  • Yes/No answers

• ME'13+14:
  • Yes/No answers
  • Watch two video segments and say whether the second video is related to the first one according to the given description.

• TV'15:
  • Details on what connects or disconnects given video segments

• TV'16: two-step procedure:
  • Watch the video segment and describe why someone would share it
  • Please watch the video segment and pick the description
  • Watch 2 video segments and provide relevance description details

Turning to the Crowd for Insight and Evaluation

3-stage approach

Manual Anchors Creation

1. Anchor Verification

2. Target Vetting

3. Video-to-Video Relevance Analysis

Other diagram labels: Video-to-Video retrieval systems; Multimodal Anchors; Automatic Targets Creation; Qrel; Detailed descriptions for relevance

Metric evolution, or how to measure effectiveness

Adapted MAP

Figure labels: Results; Assessments; Raw Results: Res1, Res3, Res2; Effective Results: Res1 (Rel=1), Res2 (Rel=0), Res3 (Rel=0)

Adapted MAP Undesired Configurations

Figure labels: Results; Assessments; Raw Results: Res1, Res3, Res2; Effective Results: Rel=1, Rel=1, Rel=1

Binned Relevance

Figure labels: Time; Bins: Bin 1, Bin 2, Bin 3; Assessments; Raw Results: Res1, Res3, Res3; Effective Results: Res1 (Rel=1), Res2 (Rel=0); Bin Size
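One way to read the binned-relevance figure is sketched below. The slide gives only the picture, so the rules here are assumptions: the video timeline is cut into fixed-size bins, each result is mapped to the bin containing its start time, a bin is relevant if it overlaps a segment judged relevant, and only the first result landing in a bin can earn credit. The function name and argument layout are hypothetical.

```python
import math


def binned_relevance(result_starts, relevant_segments, bin_size):
    """Return a 0/1 effective-relevance list for ranked results in one video.

    result_starts:     start times (seconds) of results, in rank order.
    relevant_segments: (start, end) pairs judged relevant, half-open [start, end).
    bin_size:          bin length in seconds (assumed fixed per evaluation).
    """
    relevant_bins = set()
    for start, end in relevant_segments:
        first = int(start // bin_size)
        # Last bin touched by the half-open interval; never before `first`.
        last = max(first, math.ceil(end / bin_size) - 1)
        relevant_bins.update(range(first, last + 1))

    seen_bins = set()
    effective = []
    for t in result_starts:
        b = int(t // bin_size)
        # Credit a bin once; repeats in the same bin count as non-relevant.
        effective.append(1 if b in relevant_bins and b not in seen_bins else 0)
        seen_bins.add(b)
    return effective


if __name__ == "__main__":
    # Two results fall into the same relevant bin; only the first is credited.
    print(binned_relevance([70, 200, 90], [(60, 110)], bin_size=60))
```

Under these assumptions, several overlapping results pointing at the same stretch of video cannot each be counted as relevant, which appears to be the undesired configuration the binning is meant to remove.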

Tolerance to Non-Relevance

Figure labels: Time; Assessments; Raw Results: Res1, Res3, Res2; Effective Results: Res1 (Rel=1), Res2 (Rel=0); Tolerance Window
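The tolerance-window idea can be sketched in a similar way. Again the rules are assumptions read off the figure rather than a stated definition: a result counts as relevant if its start time lies within a tolerance window of the start of a segment judged relevant, and each relevant segment can be credited only once. The names below are hypothetical.

```python
def tolerance_relevance(result_starts, relevant_starts, window):
    """Return a 0/1 effective-relevance list for ranked results in one video.

    result_starts:   start times (seconds) of results, in rank order.
    relevant_starts: start times (seconds) of segments judged relevant.
    window:          tolerated distance in seconds between the two start times.
    """
    credited = set()  # indices of relevant segments already matched
    effective = []
    for t in result_starts:
        match = next(
            (i for i, r in enumerate(relevant_starts)
             if i not in credited and abs(t - r) <= window),
            None,
        )
        if match is None:
            effective.append(0)
        else:
            credited.add(match)
            effective.append(1)
    return effective


if __name__ == "__main__":
    # The third result hits a segment that the first result already claimed.
    print(tolerance_relevance([100, 300, 110], [105], window=15))
```

The window models how much non-relevant material a viewer will sit through before reaching the relevant content; its size is a parameter of the evaluation, not something the slide specifies.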

MAiSP Measure

Figure labels: Assessments; Raw Results: Res1, Res3, Res2; Effort; Reward

TV'15: Comparison of MAP and MAiSP

Hands-on take-home messages

• Beginning of the yearly cycle:
  • Gather feedback on task
  • Is there a participant? Is there money?
  • Scientific question

• Task life cycle:
  • Be prepared for disaster!
  • Interact with the participants
  • Close collaboration between task organizers
  • Interaction with benchmark organizers important

• Wrap up:
  • Adjust the target
  • Task sustainability for next year