21-23 Sep 2005 CLEF 2005
CLEF-2005 Cross-Language Speech Retrieval
Track Overview
UMD: Douglas Oard, Dagobert Soergel, Ryen White, Kim Braun, Xiaoli Huang, Craig Murray, Scott Olsson, Jianqiang Wang
DCU: Gareth Jones
IBM: Bhuvana Ramabhadran, Martin Franz
VHF: Sam Gustman
Why Cross-Language Speech?
• Good News: We can search news well
• Bad News: News is a niche market
• Words spoken/day > present Google index
– Words spoken/CLEF break > 1 CLEF paper
• Just 1 language has >10% of the speakers
– And just 10 people in this room can speak it
CLEF-2005 CL-SR Participants
• 7 teams / 4 countries
– Canada: Ottawa, Waterloo
– USA: Maryland, Pittsburgh
– Spain: Alicante, UNED
– Ireland: DCU
• 5 “official” runs per team (35 total)
– Baseline required run: ASR / English TD topics
Mean Average Precision for Required Runs
[Figure: bar chart of mean average precision (0.00 to 0.18) by site: Ottawa, UMD, UW/Toronto, UNED, Alicante, Pitt, DCU]
TD queries / English, automatic
CLEF-2005 Contrastive Conditions

Site (query/document)        | MAP (English) | ΔMAP (Czech / German / French / Spanish topics)
Ottawa (TD/ASR04,AK1,AK2)    | 0.1653        | +2%
Ottawa (TDN/ASR04,AK1,AK2)   | 0.2176        | -41%, -14%
Maryland (TD/NAMES,KW,SUM)   | 0.3129        | -21%
Waterloo (T/ASR03,ASR04)     | 0.0980        | -52%, -13%
UNED (TD/ASR04)              | 0.0934        | -60%
DCU (T/ASR03,ASR04,AK1,AK2)  | 0.1429        | +16%
Test Collection Design

[Diagram: interviews pass through content tagging and speech recognition with boundary detection; topic statements pass through query formulation into automatic search; the resulting ranked lists are evaluated against relevance judgments to produce mean average precision]
• Boundaries: manual topic boundaries
• Manual content tagging: ~5 thesaurus labels, synonyms/broader terms, person names, 3-sentence summaries
• Automatic content tagging: thesaurus labels, synonyms/broader terms
• Automatic speech recognition: 35% WER interview-tuned, 40% WER domain-tuned
• Topics: 38 training, 25 evaluation
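Since mean average precision is the track's headline measure, a minimal uninterpolated-AP sketch may help; the function names are mine, not those of the official trec_eval scorer.

```python
def average_precision(ranked, relevant):
    """Uninterpolated average precision for one topic: at each relevant
    document's rank, take precision-so-far, then average over all relevant."""
    hits = 0
    total = 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0

def mean_average_precision(topics):
    """topics: list of (ranked_list, relevant_set) pairs, one per topic."""
    return sum(average_precision(r, rel) for r, rel in topics) / len(topics)

print(round(average_precision(["a", "b", "c"], {"a", "c"}), 4))  # 0.8333
```

The same function, averaged over the 25 evaluation topics, is what the required-run chart reports per site.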
CLEF-2005 CL-SR Collection
• 8,104 topically-coherent segments
– 297 English interviews
– “Known boundary” condition (VHF segments)
– Average 503 words/segment
• 63 “topics” (information need statements)
– Title / Description / Narrative
– 38 training + 25 (blind) evaluation
– 5 topic languages (EN, SP, CZ, DE, FR)
• 48,881 relevance judgments (9.6%)
– Search-guided + post-hoc judgment pools
• Distributed to track participants by ELDA
Segment Duration
[Figure: distribution of segment durations]
<DOCNO>VHF22029-209918.033
<INTERVIEWDATA>Z… | 1922 | Esther Malke | Ettu | R… | R… | Ettuka
<SUMMARY>EZ speaks of her family's difficulty adjusting to Montréal, Canada. Tells of her work in her husband's restaurant. EZ notes the birth of her son. Speaks of her first husband's profession.
<NAME>Yehuda Z…, Helen L…, Sandra B…, Allan Z…, Moishe S…
<MANUALKEYWORD>Montréal (Quebec, Canada) | family businesses | acculturation | working life | extended family members | Canada 1945 (May 8) - 2000 (January 1) | occupations, spouse's
<AUTOKEYWORD2004A1>extended family members | friends | working life | introduction of friends and/or family members | migration to the United States | migration from Germany | schools | Jewish-gentile relations | family businesses | family life | cultural and social activities | occupations, father's | aspirations for the future | emigration and immigration policies and procedures | aid: assistance in migration | means of adaptation and survival | United States 1945 (May 8) - 1952 (December 31) | France 1918 (November 11) - 1939 (August 31) | Paris (France) | Poland 1944
<AUTOKEYWORD2004A2>extended family members | fate of loved ones | anti-Jewish measures and legislation | Budapest (Hungary) | Poland 1939 (September 1) - 1945 (May 7) | aid: assistance in hiding and/or evasion | working life | seizure of property | United States 1945 (May 8) - present | living conditions | Poland 1941 (June 21) - 1944 (July 21) | hiding | New York (New York, USA) | Poland 1918 (November 11) - 1939 (August 31) | occupations, interviewee's | Berlin (Prussia, Germany) | aid: provision of shelter | China 1939 (September 1) - 1945 (September 1) | separation of loved ones | contact with loved ones, renewed
<ASRTEXT2004A>well here the first five years we were not happen we wanted to go back to israel my husband couldn't red cross that we opened a grocery and then we lost that a lot of money and then we
had a cousin he said you know what that to make a living and best for me in a restaurant so you bastard and arrest and um and after two weeks because when they came smart about a sticky he didn't want to be there anymore so we didn't want to lose the two thousand dollar so we thought we going to sell it but the themselves so fast there was another jewish area on them near takes three percent province so to make a living my husband took me and to be cooked and we could make a living and i became pregnant with holland and my husband wanted to go to a um venezuela we made already passport and her sister from pennsylvania again there might be care of the group and he was a very angry at him we should come to pennsylvania and to settle we didn't want to go to a small town so he came back and when i was and i continued and the restaurant i myself i didn't want to let go and this time was the monthly a star they took away from him the building about the start clothes and discrete guy wanted to buy my arrested because he heard them expecting a baby and we made a lot of friends and we had the store because it was a jewish tired after i make a lot this and on nothing cash and she skate and i was very very good cook to people came to me trust the muscle and paid that double the price like somebody else and it and alan was for them discrete came in and he gave me six thousand dollars for the rest of two thousand intention four thousand dollars in notes in my name there was my first time that a canadian man and he liked us he invited us for parties on monday my husband and got a terrible terrible parkland and he wanted to start to build because uh you know this was a straight would he uh he he bark wood from the forest and sold it to the sea on our way to the trains so he knew measurements that we couldn't he wanted to build the when he told our neighbors time there that he wants to build we kissed them he said we want to be a part so he became part of for five years and my 
first time that we built a law that houses and that time of the trial one girl yeah yeah and then me that happy after the children that could i could children my husband was sick alone so i manic i was in the office i attempted he build always then we started to the small buildings like thirty units fifty units and my rent and then he uh
<ASRTEXT2003A>from here the first five years we were not have been wanted to go back to israel my husband couldn't red cross and we opened a grocery and then we must have a lot of money and
then we had a cousin he said you know what have to make a living in that for me in a restaurant so you bastard and the rest of the um and after two weeks because when they came smart about a sticky he didn't want to be there anymore to we didn't want to lose the two thousand dollar so we thought we going to sell it but the themselves so fast there was a jewish area on them near creeks because some products so to make a living my husband took me into the cook cook it and he could make a living and i became pregnant with holland and my husband wanted to go to um venezuela we made already passport and her sister from pennsylvania again there might be the the group and he was a very angry at him he should come to pennsylvania and to settle they didn't want to go to a small town so he came back and when i was and i continued and the rest of it and i myself i didn't want to let go and this time was them on the us us they took away from him in the building a contest our clothes and this great guy wanted to buy my arrested because he heard them expecting a baby and we made a lot of friends and we had the store because it was a to we started to i make a lot this and run out and kishka and she skate and i was very very good cook to people came to me that the muscle and they did double the price like somebody else and a half an hour was for them discrete came in and he gave me six thousand dollars for the rest of two thousand intention four thousand dollars in notes in my neighbors was my first time that factory in man and he liked us he invited us for parties on monday my husband and got a terrible time in parkland and he wanted to start to build because they were of this was this trade would he uh he he bark wood from the forest and sold it to the see mr and to the trains so he knew measurements that we couldn't he wanted to build the when he told our neighbors time if if he wants to build we kissed them he said we want to the park so he became part of for five years went 
and my first time that we built a law that houses in that time with um one girl yet and then me that happy after the children that could i could children my husband was it called so i managed i was in the office i attempted he build always then we started to the small buildings like thirty unit fifty units and i rented and we were
English ASR
[Figure: English word error rate (%), 0 to 100, from Jan 2002 through Jan 2005, for the ASR2003A and ASR2004A systems]
Training: 200 hours from 800 speakers
An English Topic
Number: 1148
Title: Jewish resistance in Europe
Description: Provide testimonies or describe actions of Jewish resistance in Europe before and during the war.
Narrative: The relevant material should describe actions of only- or mostly-Jewish resistance in Europe. Both individual and group-based actions are relevant. Types of actions may include survival (fleeing, hiding, saving children), testifying (alerting the outside world, writing, hiding testimonies), and fighting (partisans, uprising, political security). Information about undifferentiated resistance groups is not relevant.
Relevance Judgments
• Search-guided
– Iterate topic research / query formulation / judging
• Done by grad students in History
• Trained by expert librarians
– Essential for collection reuse with future ASR
– Done between topic release and submission
• Top-ranked (= “pooled”)
– Same assessors (usually same individual)
– 100-deep pools from 14 systems
• 2 per site, chosen in order recommended by sites
• Omit segments with search-guided judgments
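The pool construction described above can be sketched in a few lines; the names are illustrative (the real pools came from 14 submitted runs at depth 100).

```python
def build_pool(runs, depth=100, search_guided=frozenset()):
    """Form an assessment pool as the union of the top-`depth` segments
    of each run, omitting segments that already have search-guided
    judgments (so assessors never judge the same segment twice)."""
    pool = set()
    for ranked in runs:
        pool.update(ranked[:depth])
    return pool - set(search_guided)

# Two toy runs pooled at depth 2, with one segment already judged.
print(sorted(build_pool([["s1", "s2", "s3"], ["s2", "s4"]],
                        depth=2, search_guided={"s1"})))  # ['s2', 's4']
```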
5 Types of Relevance

Type       | On topic        | Relationship between overall document topic & overall user need topic¹ | Relatedness to the given topic | Specificity of information | Richness of relevant information
Direct     | Directly        | Matching              | High          | High          | High ~ medium
Indirect   | After inference | Matching              | High ~ medium | High          | High ~ medium
Context    | No              | Surrounding           | Low ~ medium  | Low           | Low ~ medium
Comparison | Partially       | Comparing/contrasting | Low ~ medium  | High ~ medium | Medium ~ low
Pointer    | No              | Apart                 | Low           | High          | Low

¹ In the accompanying diagram (not reproduced here), the document topic is black; the user need topic is white.
Figure 3. Common Dimensions or Factors Underlying Different Types of Topical Relevance
• Five levels of relevance for each (0=none, 4=highly)
• Collapsed to binary relevance using {Direct≥2} ∪ {Indirect≥2}
• Script provided that sites can use to explore other combinations
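The collapse rule above fits in a couple of lines; this is a sketch, and the parameter names are illustrative rather than the qrels format.

```python
def binary_relevance(direct, indirect, threshold=2):
    """Collapse the 5-level (0-4) Direct and Indirect judgments to one
    binary label: relevant iff Direct >= 2 or Indirect >= 2."""
    return direct >= threshold or indirect >= threshold

print(binary_relevance(direct=3, indirect=0))  # True
print(binary_relevance(direct=1, indirect=1))  # False
```

Other type/level combinations (e.g. folding in Context or Comparison) amount to changing the predicate, which is what the provided script lets sites explore.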
Total (Binary) Judgments
[Figure: per-topic counts of relevant and non-relevant judgments (0 to 1600), grouped as 25 evaluation topics, 38 training topics, and 12 others]
Contributions by Assessment Type
[Figure: per-topic judgment counts (0 to 300) by assessment type: both, search-guided only, highly-ranked only]
Binary relevance, 25 evaluation topics, 14-run 100-deep pools
Search-Guided: Stable Rankings
[Figure: scatter plot of MAP from search-guided judgments only (y) vs. search-guided + highly-ranked judgments (x), both axes 0.00 to 0.35]
Mean average precision, 35 official CLEF-2005 runs, 25 evaluation topics
Highly Ranked: Less Stable
[Figure: scatter plot of MAP from highly-ranked judgments only (y) vs. search-guided + highly-ranked judgments (x), both axes 0.00 to 0.35]
Mean average precision, 35 official CLEF-2005 runs, 25 evaluation topics
A Unified Track for Everything
• We’re an ad hoc bilingual track
– Standard distribution is a standard CLEF track
• We’re a domain-specific track
– Thesaurus, human indexing, lead-in vocabulary
• We’re a Geo track
– Every “document” is location and time tagged
• We’ll be a “new language” track in 2006
– Czech “documents”
• We could be an image retrieval track
– Thousands of images, with captions and ASR
• And, of course, we’re a speech track!
Almost
For More Information
• CLEF Cross-Language Speech Retrieval track
– Come see the posters!
– http://clef-clsr.umiacs.umd.edu/
• The MALACH project
– http://www.clsp.jhu.edu/research/malach
• NSF/DELOS Spoken Word Access Group
– http://www.dcs.shef.ac.uk/spandh/projects/swag
Backup Slides
5-level Relevance Judgments
• “Classic” relevance (to “food in Auschwitz”)
– Direct: Knew food was sometimes withheld
– Indirect: Saw undernourished people
• Additional relevance types
– Context: Intensity of manual labor
– Comparison: Food situation in a different camp
– Pointer: Mention of a study on the subject
Binary qrels
Supplementary Resources
• Thesaurus (included in the test collection)
– ~3,000 core concepts
• Plus alternate vocabulary + standard combinations
– ~30,000 location-time pairs, with lat/long
– Synonym, is-a, and part-whole relationships
• Digitized speech (provided to Waterloo)
– .mp2 or .mp3
• In-domain expansion collection (MALACH)
– 186,000 scratchpad + 3-sentence summaries
Assessor Agreement
[Figure: kappa (0.0 to 1.0) by relevance type: Overall (n=2122), Direct (1592), Indirect (184), Context (775), Comparison (283), Pointer (235)]
14 topics, 4 assessors in 6 pairings, 1806 judgments
44% topic-averaged overlap for Direct+Indirect 2/3/4 judgments
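The agreement figures are kappa statistics; a plain Cohen's kappa for binary judgments can be computed as below. This is a sketch under the assumption of simple two-assessor binary agreement; the track's figures may use a different variant.

```python
def cohens_kappa(a, b):
    """Cohen's kappa for two parallel lists of binary (0/1) judgments:
    observed agreement corrected by chance agreement from the marginals."""
    assert len(a) == len(b) and a
    n = len(a)
    p_observed = sum(x == y for x, y in zip(a, b)) / n
    # chance agreement from each assessor's marginal rate of "relevant"
    pa, pb = sum(a) / n, sum(b) / n
    p_expected = pa * pb + (1 - pa) * (1 - pb)
    return (p_observed - p_expected) / (1 - p_expected)

print(cohens_kappa([1, 1, 0, 0], [1, 0, 0, 0]))  # 0.5
```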
The CLEF CL-SR Team
USA:
• Shoah Foundation: Sam Gustman
• IBM TJ Watson: Bhuvana Ramabhadran, Martin Franz
• U. Maryland: Doug Oard, Dagobert Soergel
• Johns Hopkins: Zak Schefrin
Europe:
• U. Cambridge (UK): Bill Byrne
• Charles University (CZ): Jan Hajic, Pavel Pecina
• U. West Bohemia (CZ): Josef Psutka, Pavel Ircing
• UNED (ES): Fernando López-Ostenero
CL-SR Track: Plans for 2006
DCU: Gareth Jones
UMD: Doug Oard, Dagobert Soergel, Ryen White
CU: Jan Hajic, Pavel Pecina
UWB: Josef Psutka
JHU: Bill Byrne (Cambridge), Zak Schefrin
IBM: Bhuvana Ramabhadran
USC: Sam Gustman (Shoah Foundation)
Tentative CLEF-2006 Plans (intent is to do both)
• New Czech test collection
– ~700 hours unseen speakers (+ 700 hours seen)
• ~35% mean word error rate
– 25 evaluation topics (+ 2-3 topics for training)
– New unknown-boundary relevance assessment
• Larger English test collection
– ~900 hours unseen speakers
• ~25% mean word error rate (2006: ~35% WER)
– 25 evaluation topics (+ 63 topics for training)
– Word lattice distributed as standard data
CLEF-2007 Options (probably only 1 will be possible)
• English
– Expand to ~5,000 hours of unseen speakers
– 25 new topics
– Full ASR training data
• Czech
– 25 new topics (total 50 across 2 years)
– Full ASR training data
• Russian
– 25 topics
– Full ASR training data
Breakout Session Agenda
• Lessons learned from this year
– Evaluation design, logistics
• English lattice options for 2006
– Word lattice, phone lattice, partial decoding
• Czech test collection design
– 2-channel ASR, unknown boundary measures
• Resource sharing among participants
– Czech morphology/translation, in-domain expansion
• Additional data from VHF (special arrangement)
– ASR training, BRF expansion
• ELDA test collection release to nonparticipants
CL-SR Breakout Session
Doug Oard and Gareth Jones
Topic Construction
• English topics based on mailed requests
– 25 new topics
• Czech topics based on discussions with users
– 25 topics, some from existing (English) topic set
– Don’t tune Czech system to existing topics!
• All topics created in English
– Standard translations into CZ, DE, FR, SP
– Other languages possible if you can help
Standard Resources for Czech
• Brain-dead automatic segmentation
– 400-word passages, with 50% overlap
– Passage-to-start-time conversion for results
• Prague Dependency Treebank (LDC)
– Czech/English translation lexicon
• Precompiled morphology
– Czech morphology, for both formal and colloquial usage
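The "brain-dead" segmentation above (400-word passages, 50% overlap) might look like this in outline; the function name and the start-index bookkeeping are assumptions, not the track's tooling.

```python
def overlapping_passages(words, size=400, overlap=200):
    """Split a word list into fixed-size passages with 50% overlap,
    keeping each passage's starting word index so that ranked passages
    can later be converted back to start times."""
    step = size - overlap
    passages = []
    start = 0
    while start < len(words):
        passages.append((start, words[start:start + size]))
        if start + size >= len(words):
            break
        start += step
    return passages

# A 1,000-word document yields passages starting at words 0, 200, 400, 600.
print([s for s, _ in overlapping_passages(list(range(1000)))])  # [0, 200, 400, 600]
```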
Run Submission
• English: ranked lists of segments
• Czech: ranked lists of start times
• 5 official runs for each language?
– Additional runs can be scored locally
• One required condition for each language
– Monolingual TD queries automatic data only?
Czech Relevance Assessment
• Integrated search
– Thesaurus-based onset marks
– ASR results
• Unknown-boundary evaluation measure
– With unknown boundaries in ground truth
• Summary generation
– Descriptor history + ASR results
• Thesaurus translation into Czech
Potential Concerns
• Czech
– # of teams contributing to highly-ranked pools
– Clearance of interviews for ELDA distribution
– Measure: tuning [rank] vs. [start time error]
– How to treat speakers seen during ASR training
– Colloquial usage
• English
– Cost-benefit tradeoff for highly-ranked judging
– Partial decoding design and data format
Interview Use in English & Czech
[Figure: timeline of interview use, 0 to 180 minutes from interview start; English: 800 interviews for ASR training, 3,200 for IR evaluation; Czech: 350 for ASR training, 350 for IR evaluation; regions marked fair, seen speaker, and unfair]
Ranked Start Time Evaluation
Example 1: relevant start times matched at ranks 1, 2, and 4, each with weight 0.5:
AP = (0.5·1/1 + 0.5·2/2 + 0.5·3/4)/3 = 0.46
Example 2: relevant start times matched at ranks 2, 4, and 8, each with weight 1.0:
AP = (1.0·1/2 + 1.0·2/4 + 1.0·3/8)/3 = 0.46
Key idea: weight each contribution to average precision by f(absolute error)
Eval Measure Design Issues
• What error function to use?
– Symmetric? [yes]
– Max allowable error? [1.5 minutes]
– Shape? [step function, 1-minute increments]
• How to prevent double counting?
– Guard band w/ list compression? [4 minutes]
• What time granularity to use?
– For ground truth? [15 seconds]
– For system output? [1 second]
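The error-weighted average precision sketched in the preceding slides can be written out as follows. This is an illustrative reconstruction, not the track's official scoring script: the symmetric step function in 1-minute increments with a 1.5-minute cutoff comes from the design-issues list, but the intermediate weight of 0.5 is an assumption.

```python
def error_weight(abs_err_sec):
    """Symmetric step function in 1-minute increments, zero beyond
    1.5 minutes (the 0.5 middle step is an assumed value)."""
    if abs_err_sec <= 60:
        return 1.0
    if abs_err_sec <= 90:
        return 0.5
    return 0.0

def weighted_average_precision(matches, num_relevant):
    """matches: (rank, abs_start_time_error_sec) pairs for each matched
    relevant start time, in increasing rank order. Each match contributes
    weight * (relevant_found_so_far / rank) instead of the usual 1 * (i/rank)."""
    ap = 0.0
    for i, (rank, err) in enumerate(matches, start=1):
        ap += error_weight(err) * i / rank
    return ap / num_relevant

# Reproduces the slides' first example: matches at ranks 1, 2, 4,
# each with an error earning weight 0.5.
print(round(weighted_average_precision([(1, 75), (2, 75), (4, 75)], 3), 2))  # 0.46
```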
Known Problems
• “Synonyms” are really “lead-in vocabulary”
– Extended family members -> aunts, uncles, …
• Missing tapes cause errors
– For segments crossing a missing-tape boundary
• Incomplete judgments for added English
– Only a problem for the 63 existing topics
– But questionable judgments are marked
• Some 2006 topics are in the 75 from 2005
– Don’t look at topics beyond the 63!
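The lead-in-vocabulary point above amounts to query expansion from a thesaurus descriptor to the colloquial terms interviewees actually use. A toy sketch, where the single dictionary entry is the slide's own example and everything else is assumed:

```python
# Descriptor -> lead-in vocabulary actually spoken by interviewees
# (one example pair from the slides; a real table would be far larger).
LEAD_IN_VOCABULARY = {
    "extended family members": ["aunts", "uncles"],
}

def expand_query(terms):
    """Append lead-in vocabulary for any descriptor appearing in the query."""
    expanded = list(terms)
    for term in terms:
        expanded.extend(LEAD_IN_VOCABULARY.get(term, []))
    return expanded

print(expand_query(["extended family members"]))
# ['extended family members', 'aunts', 'uncles']
```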
Things to Think About
• Effective and efficient lattice search
• Failure analysis using judgment types– Direct, indirect, context, comparison, pointer
• Clever expansion approaches– Thesaurus, sequence, time, similarity
• Interactive experiments
CLEF-2005 at Maryland: Cross-Language Speech Retrieval
Douglas Oard, Jianqiang Wang, Ryen White
Maryland CLEF-2005 Results
[Figure: mean average precision (0.00 to 0.35) and Precision@10 for five runs: metadata+syn.en.qe, metadata+syn.fr2en.qe, autokey+asr.en.qe, asr.de.en.qe, asr.en.qe; Precision@10 values shown: 0.28, 0.27, 0.48, 0.28, 0.37]
25 evaluation topics, English TD queries
Comparing ASR with Metadata
[Figure: per-topic average precision (0 to 1) for ASR vs. metadata indexing across 32 topics (1188 through 1179), with the metadata increase highlighted]
CLEF-2005 training + test, excluding topics with metadata AP < 0.2; ASR2004A only, Title queries, Inquery 3.1p1
ASR as % of Metadata
[Figure: per-topic average precision of ASR as a percentage of metadata (0 to 80%), same 32 topics]
Somewhere in ASR     | Only in Metadata
wallenberg (3/36)*   | rescue jews
wallenberg (3/36)    | eichmann
abusive female (8/81) | personnel
minsko (21/71)       | ghetto underground
art                  | auschwitz
labor camps          | ig farben
slave labor          | telefunken aeg
holocaust            | sinti roma
sobibor (5/13)       | death camp
witness              | eichmann
jews                 | volkswagen
* (ASR/Metadata)
Error Analysis
CLEF-2005 training + test, excluding topics with metadata AP < 0.2; ASR2004A only, Title queries, Inquery 3.1p1