CUNI at MediaEval 2014 Search and Hyperlinking Task: Search Task Experiments
Petra Galuščáková, Martin Kruliš, Jakub Lokoč and Pavel Pecina
Faculty of Mathematics and Physics, Charles University in Prague
MediaEval, 17. 10. 2014
System Description
● The recordings are segmented into overlapping passages
● Segments are indexed using the Terrier IR Platform
  ● Language model
  ● Predefined settings, stopwords removal, Porter stemmer
System Description Cont.
● Concatenation of the segment and corresponding metadata
  ● Title, episode title, description, short episode synopsis, service name and program variant
● Results filtering
Segmentation
● Fixed-length segmentation
  ● 60-second segments with a 10-second shift
● Decision Trees (DT)
  ● For each word in the transcript/subtitles, the model decides whether the segment ends after this word or not
  ● Model trained on manually marked segments from the Similar Segments in Social Speech Task at MediaEval 2013
  ● Features: cue word n-grams and cue tag n-grams, silence between words, division given in transcripts, letter case, and the output of the TextTiling algorithm
  ● 50- and 120-second segments
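The fixed-length scheme above can be sketched as a sliding window over a time-stamped transcript. This is a minimal illustration, not the authors' code: the `(start_time, token)` input format and the function name `fixed_length_segments` are assumptions, and only the 60-second length and 10-second shift come from the slide.

```python
from typing import List, Tuple

# (start time in seconds, token); an assumed input format, not from the slides
Word = Tuple[float, str]


def fixed_length_segments(
    words: List[Word], length: float = 60.0, shift: float = 10.0
) -> List[Tuple[float, float, str]]:
    """Cut a time-stamped transcript into overlapping fixed-length passages."""
    if not words:
        return []
    last_start = max(t for t, _ in words)
    segments = []
    start = 0.0
    while start <= last_start:
        end = start + length
        # A word belongs to every window that contains its start time.
        text = " ".join(tok for t, tok in words if start <= t < end)
        if text:  # skip windows that contain no words
            segments.append((start, end, text))
        start += shift
    return segments
```

Because the shift is shorter than the window, each word lands in several passages, which is why overlapping results later need special care.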
Search Results
● Highest results achieved on the subtitles
● Generally: LIMSI > NST-Sheffield > LIUM
● Metadata improve the results
● Fixed-length segmentation outperforms the DT-based segmentation with 120-second segments
● 50-second DT segments outperform the fixed-length segments
Search Results - Overlapping
● All measures, except the MAP-tol measure, are notably higher in the experiments in which we did not remove partially overlapping segments
● These measures do not distinguish whether a user has already seen the retrieved segment
● Overlapping segments are not expected to be as beneficial for users
● The MAP-tol measure is not influenced by this behavior
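Removing partially overlapping segments from a ranked list could look like the sketch below. The slides do not give the exact filtering rule, so the greedy strategy (keep a segment only if it does not overlap a higher-scored segment already kept from the same recording) is an assumption, as are the `Hit` fields and the function name.

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    video: str    # recording identifier
    start: float  # segment start in seconds
    end: float    # segment end in seconds
    score: float  # retrieval score, higher is better


def drop_overlapping(hits: List[Hit]) -> List[Hit]:
    """Greedy filter: walk the ranking and skip hits that overlap a kept hit."""
    kept: List[Hit] = []
    for hit in sorted(hits, key=lambda h: h.score, reverse=True):
        clash = any(
            k.video == hit.video and hit.start < k.end and k.start < hit.end
            for k in kept
        )
        if not clash:
            kept.append(hit)
    return kept
```

Segments that merely touch (one ends exactly where the next starts) are treated as non-overlapping in this sketch.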
Hyperlinking
● Transformed to the Search Task: each query segment was converted into a textual query containing all subtitle words lying within the segment boundaries
● Context: we used a 200-second passage before and after each segment
● Metadata: we concatenated corresponding metadata with each segment and each query segment
● We combined textual similarity with visual and prosodic similarity
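The query construction described above (subtitle words inside the segment, optionally widened by context and extended with metadata) can be sketched as follows. The `(start_time, token)` word format, the function name `build_query`, and the plain string concatenation are assumptions for illustration; only the 200-second context comes from the slide.

```python
from typing import List, Tuple

# (start time in seconds, token); an assumed input format
Word = Tuple[float, str]


def build_query(
    words: List[Word],
    seg_start: float,
    seg_end: float,
    context: float = 200.0,
    metadata: str = "",
) -> str:
    """Turn a query segment into a textual query for the search system."""
    lo, hi = seg_start - context, seg_end + context
    # Take every subtitle word whose start time falls in the widened window.
    tokens = [tok for t, tok in words if lo <= t <= hi]
    if metadata:
        tokens.append(metadata)  # e.g. title or description of the recording
    return " ".join(tokens)
```

Setting `context=0` reproduces the plain variant that uses only the words within the segment boundaries.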
Visual Similarity
Visual Similarity Cont.
● We use Feature Signatures and Signature Quadratic Form Distance
http://siret.ms.mff.cuni.cz
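For readers unfamiliar with the Signature Quadratic Form Distance, here is a minimal pure-Python sketch. A feature signature is modeled as a list of (centroid, weight) pairs; the distance concatenates the query weights with the negated object weights and evaluates a quadratic form over a centroid similarity matrix. The Gaussian similarity function and the `alpha` parameter are assumptions, since the slides do not say which similarity function or settings were used.

```python
import math
from typing import List, Sequence, Tuple

# A feature signature: (centroid, weight) pairs
Signature = List[Tuple[Sequence[float], float]]


def gaussian_similarity(a: Sequence[float], b: Sequence[float], alpha: float) -> float:
    """Similarity of two centroids; 1.0 for identical points, decaying with distance."""
    d2 = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-alpha * d2)


def sqfd(sig_q: Signature, sig_o: Signature, alpha: float = 1.0) -> float:
    """Signature Quadratic Form Distance between two feature signatures."""
    centroids = [c for c, _ in sig_q] + [c for c, _ in sig_o]
    weights = [w for _, w in sig_q] + [-w for _, w in sig_o]
    total = 0.0
    for ci, wi in zip(centroids, weights):
        for cj, wj in zip(centroids, weights):
            total += wi * wj * gaussian_similarity(ci, cj, alpha)
    # Clamp tiny negative values caused by floating-point rounding.
    return math.sqrt(max(total, 0.0))
```

Unlike a histogram distance, this works for signatures with different numbers of centroids, because the similarity matrix is built over the union of both centroid sets.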
Audio Similarity
Hyperlinking Results
● Metadata help; transcripts and post-filtering behave similarly to the Search Task
● Unlike in the Search sub-task, the segmentation employing the Decision Trees outperforms the fixed-length segmentation for most of the measures
● A consistent improvement when visual similarity is used
● A small improvement in MAP, P20, and MAP-tol measures when prosodic similarity is used
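The slides do not state how textual scores were mixed with visual or prosodic similarity. One common choice, assumed here purely for illustration, is a weighted linear combination followed by re-ranking. The candidate layout, the function names, and the weights `w_visual` and `w_prosodic` are all hypothetical.

```python
from typing import List, Tuple

# (segment id, text score, visual similarity, prosodic similarity); assumed layout
Candidate = Tuple[str, float, float, float]


def fuse(text: float, visual: float, prosodic: float,
         w_visual: float, w_prosodic: float) -> float:
    """Weighted linear combination; the weights are hypothetical tuning knobs."""
    return text + w_visual * visual + w_prosodic * prosodic


def rerank(candidates: List[Candidate], w_visual: float = 0.0,
           w_prosodic: float = 0.0) -> List[str]:
    """Return segment ids ordered by fused score, best first."""
    scored = [(fuse(t, v, p, w_visual, w_prosodic), seg_id)
              for seg_id, t, v, p in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [seg_id for _, seg_id in scored]
```

With both weights at zero the ranking is purely textual, so the effect of each modality can be checked by raising one weight at a time.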
Results Cont.
Search Sub-task
Transcript  Segmentation  Segment length  Overlap  MAP     P5      P10     P20     MAP-bin  MAP-tol
Subtitles   DT            120s            Yes      16.349  0.8400  0.8367  0.8433  0.3172   0.0515
Subtitles   DT            50s             No       0.8028  0.7867  0.7667  0.6933  0.3199   0.2350

Hyperlinking Sub-task
Transcript  Segmentation  Weights  Overlap  MAP     P5      P10     P20     MAP-bin  MAP-tol
Subtitles   Fixed         Visual   Yes      4.1824  0.9667  0.9567  0.8967  0.3080   0.0996
Subtitles   DT            Visual   No       0.8253  0.8867  0.8567  0.7383  0.2525   0.1991
Conclusion
● Overlap among the retrieved results needs careful handling
● Fixed-length segmentation works well but can be outperformed by supervised ML-based methods
● Visual and audio modalities can improve text-based retrieval on the subtitles and transcripts
Thank you
This research is supported by the Czech Science Foundation, grant number P103/12/G084, Charles University Grant Agency GA UK, grant number 920913, and by SVV project number 260 104.