mediaeval 2013 spoken web search results slides

Spoken Web Search at Mediaeval 2013 Xavier Anguera, Florian Metze, Andi Buzo, Igor Szoke and Luis Javier Rodriguez-Fuentes

Upload: xanguera

Post on 19-Jun-2015


DESCRIPTION

These slides show the results and analysis of the 2013 SWS (Spoken Web Search) task within the Mediaeval evaluation.

TRANSCRIPT

Page 1: Mediaeval 2013 Spoken Web Search results slides

Spoken Web Search at Mediaeval 2013

Xavier Anguera, Florian Metze, Andi Buzo, Igor Szoke and Luis Javier Rodriguez-Fuentes

Page 2: Mediaeval 2013 Spoken Web Search results slides

Spoken Audio Search (or Query-by-Example Spoken-Term Detection)

Given a spoken query, we search for instances at the lexical level within spoken documents.

It is similar to Spoken Term Detection (NIST STD2006, OpenKWS 2013) but…

Queries are spoken

Different speakers

Different acoustic conditions

No prior knowledge of the language(s) might be available

Page 3: Mediaeval 2013 Spoken Web Search results slides

SWS history in Mediaeval

• SWS 2011 had 5 finishing participants and focused on 4 Indian languages

• SWS 2012 had 9 finishing participants and focused on 4 African languages

• SWS 2013 has 13 finishing (18 registered) participants and contains 9 languages

[Chart: number of teams (#teams) and database size per year, 2011 to 2013]

Page 4: Mediaeval 2013 Spoken Web Search results slides

SWS 2013 evaluation setup

• 1 single search corpus with ~20 hours of data, collected from contributions of 9 languages
– No transcription or language information is given to participants

• 500 queries for dev and 500 queries for eval
– For each query, participants need to return all instances of that query in the search corpus

Page 5: Mediaeval 2013 Spoken Web Search results slides

Mediaeval SWS 2013

• 9 languages in different acoustic contexts: 4 African languages (isixhosa, isizulu, sepedi, setswana), Albanian, Basque, Czech, non-native English, Romanian

Set | #utts | Time (h:mm:ss) | Avg. length/utt.
Search corpus | 10762 | 19:57:55 | 6.67 s
Dev Queries | 505 | 0:11:26 | 1.35 s
Extended dev* | 1046 | 0:08:42 | 0.49 s
Eval Queries | 503 | 0:11:37 | 1.38 s
Extended eval* | 1037 | 0:08:57 | 0.51 s
Total | 13853 | 20:38:37 |

*Only Basque (3x) and Czech (10x) queries have extended versions
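The totals and average lengths in the table above can be cross-checked with a little arithmetic. The following is a minimal Python sketch, assuming the times are in h:mm:ss and that the slide truncates (rather than rounds) averages to two decimals; the names `hms_to_seconds`, `rows` and `avg_centiseconds` are illustrative and do not come from the slides.

```python
def hms_to_seconds(hms: str) -> int:
    """Convert an 'h:mm:ss' string to a number of seconds."""
    h, m, s = (int(part) for part in hms.split(":"))
    return h * 3600 + m * 60 + s


# (number of utterances, total time) per row, copied from the table.
rows = {
    "Search corpus": (10762, "19:57:55"),
    "Dev Queries": (505, "0:11:26"),
    "Extended dev": (1046, "0:08:42"),
    "Eval Queries": (503, "0:11:37"),
    "Extended eval": (1037, "0:08:57"),
}

total_utts = sum(n for n, _ in rows.values())
total_seconds = sum(hms_to_seconds(t) for _, t in rows.values())

# Average length per utterance, in hundredths of a second, truncated.
avg_centiseconds = {
    name: int(hms_to_seconds(t) / n * 100) for name, (n, t) in rows.items()
}

if __name__ == "__main__":
    print(total_utts, total_seconds)
    for name, cs in avg_centiseconds.items():
        print(f"{name}: {cs / 100:.2f} s")
```

Under these assumptions the per-row sums reproduce the "Total" row, and the truncated averages match the last column.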

Page 6: Mediaeval 2013 Spoken Web Search results slides

Database distribution per language

Language | Number of utterances / total duration | Number of queries (dev / eval) | Speech quality (original sampling rate) | Recording environment
African - isixhosa | 395 / 60 min. | 25 / 25 | Telephone speech, 8 kHz | Field recordings, read speech
African - isizulu | 395 / 60 min. | 25 / 25 | Telephone speech, 8 kHz | Field recordings, read speech
African - sepedi | 395 / 60 min. | 25 / 25 | Telephone speech, 8 kHz | Field recordings, read speech
African - setswana | 395 / 60 min. | 25 / 25 | Telephone speech, 8 kHz | Field recordings, read speech
Albanian | 968 / 127 min. | 50 / 50 | PC microphone, 16 kHz | Lab environment, read speech
Basque | 1841 / 192 min. | 100 / 100 (recorded by mobile phone) | TV Broadcast news, 16 kHz | Studio, read speech
Czech | 3667 / 252 min. | 94 / 93 | Telephone speech, 8 kHz | Telephone calls into radio broadcasts, spontaneous speech
Non-native English | 434 / 141 min. | 61 / 60 | High quality mic, 44 kHz | Conference lectures, spontaneous speech
Romanian | 2272 / 244 min. | 100 / 100 | PC microphone, 16 kHz | Lab environment, read speech

Page 7: Mediaeval 2013 Spoken Web Search results slides

SWS 2013 participants

Team name | Country
Dto. Electricidad y electrónica, Universidad Pais Vasco | Spain
Speech@FIT, Brno University of Technology | Czech Republic
Telefonica Research | Spain
University Politehnica of Bucharest | Romania
School of Electrical and Computer Engineering, Georgia Institute of Technology | USA
L2F - INESC-ID | Portugal
Departament de sistemes informàtics i Computació, Universitat Politècnica de València | Spain
Audiolab, University of Zilina | Slovakia
LIA, University of Avignon | France
Technical University of Kosice | Slovakia
Universitat Pompeu Fabra | Spain
DSP-STL, Dept. of EE, The Chinese University of Hong Kong | Hong Kong
International Institute of Information Technology - Hyderabad | India
IAIS, Fraunhofer Institute | Germany
TATA Consultancy Services Ltd. | India
Indian Statistical Institute | India
Northwestern Polytechnical University of Xi’an | China
Toyota Technological Institute at Chicago | USA

[Side labels on the slide mark two groups of rows: "organizers" and "Non-finishing"]

Page 8: Mediaeval 2013 Spoken Web Search results slides

Possible approaches to QbE-STD

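[Diagram: three families of approaches (pattern based, lattice based, word-based), the corresponding techniques (Dynamic Time Warping, Acoustic Keyword Spotting, Full ASR), and the resources involved (language spoken + acoustic models + language models)]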
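To make the pattern-based family more concrete, here is a minimal sketch of subsequence Dynamic Time Warping, the kind of technique this slide names. It is an illustration under stated assumptions, not any participant's system: features are plain lists of floats, the local distance is Euclidean, the allowed steps are vertical, diagonal and horizontal, and the score is normalised by query length. The names `frame_distance` and `subsequence_dtw` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Frame = Sequence[float]


def frame_distance(a: Frame, b: Frame) -> float:
    """Euclidean distance between two feature frames."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def subsequence_dtw(query: List[Frame], doc: List[Frame]) -> Tuple[float, int, int]:
    """Find the best match of `query` anywhere inside `doc`.

    Returns (accumulated cost / query length, start frame, end frame).
    """
    n, m = len(query), len(doc)
    if n == 0 or m == 0:
        raise ValueError("query and doc must be non-empty")

    # cost[i][j]: best accumulated cost of aligning query[0..i] so that
    # query[i] is matched to doc[j]; start[i][j]: where that path began.
    cost = [[math.inf] * m for _ in range(n)]
    start = [[0] * m for _ in range(n)]

    # The match may begin at any document frame at no extra cost.
    for j in range(m):
        cost[0][j] = frame_distance(query[0], doc[j])
        start[0][j] = j

    for i in range(1, n):
        for j in range(m):
            candidates = [(cost[i - 1][j], start[i - 1][j])]  # vertical step
            if j > 0:
                candidates.append((cost[i - 1][j - 1], start[i - 1][j - 1]))  # diagonal
                candidates.append((cost[i][j - 1], start[i][j - 1]))  # horizontal
            best_cost, best_start = min(candidates)
            cost[i][j] = best_cost + frame_distance(query[i], doc[j])
            start[i][j] = best_start

    # The match may end at any document frame: pick the cheapest one.
    end = min(range(m), key=lambda j: cost[n - 1][j])
    return cost[n - 1][end] / n, start[n - 1][end], end


if __name__ == "__main__":
    query = [[0.0], [1.0], [2.0]]
    doc = [[5.0], [5.0], [0.0], [1.0], [2.0], [5.0]]
    print(subsequence_dtw(query, doc))
```

Because the start and end points are free, no segmentation or language knowledge is needed, which is why this family suits the "no prior knowledge of the language" setting described earlier in the deck.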

Page 9: Mediaeval 2013 Spoken Web Search results slides

Followed approaches

Team name | DTW-like | AKWS
[The per-team marks in the DTW-like and AKWS columns are not preserved in this transcript]

Dto. Electricidad y electrónica, Universidad Pais Vasco
Speech@FIT, Brno University of Technology
Telefonica Research
University Politehnica of Bucharest
School of Electrical and Computer Engineering, Georgia Institute of Technology
L2F - INESC-ID
Dept. de sistemes informàtics i Computació, Universitat Politècnica de València
Audiolab, University of Zilina
LIA, University of Avignon
Technical University of Kosice
Universitat Pompeu Fabra
DSP-STL, Dept. of EE, The Chinese University of Hong Kong
International Institute of Information Technology - Hyderabad

Page 10: Mediaeval 2013 Spoken Web Search results slides

Scoring metrics

• PRIMARY: Actual Term Weighted Value (ATWV) / Maximum Term Weighted Value (MTWV)

• Actual/minimum Cnxe

• Real-time factor

• Memory usage
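As a rough illustration of the primary metric, the sketch below computes a Term Weighted Value following the usual NIST STD 2006 style definition: TWV = 1 − average over terms of (P_miss + β · P_FA), with P_FA counted against the seconds of speech not occupied by true occurrences. ATWV is this value at the threshold the system chose; MTWV is the maximum over all thresholds. The slides do not state the β used in SWS 2013, so `beta` is left as a parameter; the names `Detection`, `twv` and `mtwv` are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Detection:
    term: str
    score: float
    is_correct: bool  # True if it matches a reference occurrence


def twv(detections: List[Detection], n_true: Dict[str, int],
        speech_seconds: float, beta: float, threshold: float) -> float:
    """Term Weighted Value at a given decision threshold."""
    total, n_terms = 0.0, 0
    for term, n_ref in n_true.items():
        if n_ref == 0:
            continue  # terms with no reference occurrence are skipped
        kept = [d for d in detections if d.term == term and d.score >= threshold]
        n_correct = sum(d.is_correct for d in kept)
        n_false_alarms = len(kept) - n_correct
        p_miss = 1.0 - n_correct / n_ref
        # Assumption: one non-target trial per second of speech.
        p_fa = n_false_alarms / (speech_seconds - n_ref)
        total += p_miss + beta * p_fa
        n_terms += 1
    if n_terms == 0:
        raise ValueError("no term has reference occurrences")
    return 1.0 - total / n_terms


def mtwv(detections: List[Detection], n_true: Dict[str, int],
         speech_seconds: float, beta: float) -> float:
    """Maximum TWV over all thresholds (including 'accept nothing')."""
    thresholds = sorted({d.score for d in detections}) + [math.inf]
    return max(twv(detections, n_true, speech_seconds, beta, t) for t in thresholds)


if __name__ == "__main__":
    dets = [Detection("a", 0.9, True), Detection("a", 0.8, False),
            Detection("a", 0.4, True)]
    refs = {"a": 2}
    print(twv(dets, refs, speech_seconds=1002.0, beta=10.0, threshold=0.5))
    print(mtwv(dets, refs, speech_seconds=1002.0, beta=10.0))
```

A system that returns nothing scores 0, a perfect system scores 1, and false alarms can push the value below 0, which is why the gap between ATWV and MTWV indicates how well the decision threshold was calibrated.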

Page 11: Mediaeval 2013 Spoken Web Search results slides

Primary metric (dev)

Page 12: Mediaeval 2013 Spoken Web Search results slides

Primary metric (eval)

Page 13: Mediaeval 2013 Spoken Web Search results slides

Per-language results

Average for the 10-best systems

Page 14: Mediaeval 2013 Spoken Web Search results slides

Per-language results: African (eval)

Page 15: Mediaeval 2013 Spoken Web Search results slides

Per-language results: Albanian (eval)

Page 16: Mediaeval 2013 Spoken Web Search results slides

Per-language results: Basque (eval)

Page 17: Mediaeval 2013 Spoken Web Search results slides

Per-language results: Czech (eval)

Page 18: Mediaeval 2013 Spoken Web Search results slides

Per-language results: Non-native English (eval)

Page 19: Mediaeval 2013 Spoken Web Search results slides

Per-language results: Romanian (eval)

Page 20: Mediaeval 2013 Spoken Web Search results slides

DET dev

Page 21: Mediaeval 2013 Spoken Web Search results slides

DET eval

Page 22: Mediaeval 2013 Spoken Web Search results slides

Cnxe metric

Page 23: Mediaeval 2013 Spoken Web Search results slides

Extended Queries

• 4 teams submitted 4 extended systems, making use of 3 repetitions of Basque queries and 10 repetitions of Czech queries available
– TID: computes each query individually and then puts together all results
– GTTS: DTW-aligns all queries above a minimum duration and searches with the resulting query
– GeorgiaTech: builds a graphical keyword model using more than one instance
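The simplest of these strategies, searching with each spoken instance separately and then pooling the results, could look roughly like the sketch below. This is not TID's actual implementation: the rule used here (merge detections that overlap in the same utterance and keep the highest score) is an assumption for illustration, and `Hit` and `pool_hits` are hypothetical names.

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    utterance: str
    start: float  # seconds
    end: float    # seconds
    score: float


def pool_hits(per_instance_hits: List[List[Hit]]) -> List[Hit]:
    """Pool detections obtained from several spoken instances of one query.

    Overlapping detections in the same utterance are merged into one hit
    spanning both, carrying the highest of their scores.
    """
    all_hits = sorted(
        (hit for hits in per_instance_hits for hit in hits),
        key=lambda h: (h.utterance, h.start),
    )
    pooled: List[Hit] = []
    for hit in all_hits:
        last = pooled[-1] if pooled else None
        if last is not None and last.utterance == hit.utterance and hit.start < last.end:
            pooled[-1] = Hit(last.utterance, last.start,
                             max(last.end, hit.end), max(last.score, hit.score))
        else:
            pooled.append(hit)
    return pooled


if __name__ == "__main__":
    first_instance = [Hit("u1", 1.0, 2.0, 0.6)]
    second_instance = [Hit("u1", 1.5, 2.2, 0.8), Hit("u2", 0.0, 1.0, 0.3)]
    print(pool_hits([first_instance, second_instance]))
```

Pooling after the search keeps each instance's search independent, whereas the GTTS and GeorgiaTech strategies listed above combine the instances before searching.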

Page 24: Mediaeval 2013 Spoken Web Search results slides

Extended systems

Page 25: Mediaeval 2013 Spoken Web Search results slides

Extended systems

Page 26: Mediaeval 2013 Spoken Web Search results slides

Extended systems

Page 27: Mediaeval 2013 Spoken Web Search results slides

Extended systems

Page 28: Mediaeval 2013 Spoken Web Search results slides

Real-Time Factor versus Memory usage

Page 29: Mediaeval 2013 Spoken Web Search results slides

Real-Time Factor versus Memory usage (partial)

Page 30: Mediaeval 2013 Spoken Web Search results slides

Take home messages

• The task was more complicated than in 2012
– GTTS got MTWV-13 = 0.39, MTWV-12 = 0.51 (on 2013 data)
– CUHK MTWV-12 = 0.74 (on 2012 data)

• It is possible to do QbE-STD on unknown/low-resource data

Page 31: Mediaeval 2013 Spoken Web Search results slides

New things to watch out for in the posters session

• BUT:
– Fusion of 26 systems (13 AKWS + 13 DTW)
– M-norm normalization

• IIIT:
– Articulatory Bottleneck features

• CUHK:
– Tokenizer construction using Gaussian Component clustering
– Query expansion using PSOLA

• L2F:
– DTW candidate pre-selection

• GTTS:
– Distance matrix normalization in DTW

• GeorgiaTech:
– Low-resource speech modeling using EHMM Models

• LIA:
– Use of I-vectors in SWS

• ARF:
– DTW string matching algorithm with a novel scoring

Page 32: Mediaeval 2013 Spoken Web Search results slides

Poster session

Page 33: Mediaeval 2013 Spoken Web Search results slides

System presentations

• 16:30-16:45 "GTTS Systems for the SWS Task at MediaEval 2013", Luis Javier Rodriguez-Fuentes, DEE, Universidad del País Vasco

• 16:45-17:00 "The L2F Spoken Web Search system for Mediaeval 2013", Alberto Abad, L2F, INESC-ID

• 17:00-17:15 "BUT SWS 2013 - MASSIVE PARALLEL APPROACH", Lucas Ondel, Speech@BUT, Brno University of Technology

• 17:15-17:30 "The CMTECH Spoken Web Search System for MediaEval 2013", Ciro Gracia, UPF

• 17:30-17:45 Discussion and SWS 2014 teaser, Xavier Anguera