TQE: Transcription Quality Evaluation
A CLARIN-NL project
Radboud University Nijmegen
Institute for Dutch Lexicology
Max Planck Institute for Psycholinguistics
TQE: practical information
• Duration: 01/04/2010 – 01/07/2011
• Type: Demonstrator Project
• Project team:
  o CLST: Centre for Language and Speech Technology
    Helmer Strik (coord.), Joost van Doremalen, Eric Sanders, Catia Cucchiarini, Robin Oostrum, Ferdy Hubers
  o INL: Instituut voor Nederlandse Lexicologie
    Remco van Veenendaal, Laura van Eerten
  o MPI: Max Planck Institute for Psycholinguistics
    Daan Broeder, Tobias van Valkenhoef, Peter Withers
• CLARIN centre
  o MPI: Max Planck Institute for Psycholinguistics
    Daan Broeder
Automatic Transcription Quality Evaluation
• Input:
  o Audio signals
  o Phone(tic) transcriptions
• Output:
  o For each phone: TQE measure
• How:
  o Audio and phonetic transcriptions are aligned
  o Phone boundaries are derived
  o For each phone a TQE measure is determined: a confidence measure, e.g. ranging from 0-100%, indicating how well phone & segment ‘fit together’, i.e. what the quality of the transcription is
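The per-phone output described on this slide can be pictured as a list of scored segments. The sketch below is only an illustration of that idea, not the TQE tool's actual data format: the class name `PhoneScore`, the function `flag_suspect_phones`, the 50% threshold, and all example values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PhoneScore:
    """One aligned phone with its TQE measure (hypothetical structure)."""
    label: str    # phone symbol, e.g. in SAMPA
    start: float  # segment start in seconds (from the alignment)
    end: float    # segment end in seconds
    tqe: float    # confidence measure, assumed here to range from 0 to 100


def flag_suspect_phones(phones, threshold=50.0):
    """Return the phones whose TQE measure falls below the threshold.

    The threshold is an assumption for illustration; the slides do not
    state a cut-off value.
    """
    return [p for p in phones if p.tqe < threshold]


# Made-up example values, purely to show the shape of the data.
phones = [
    PhoneScore("h", 0.00, 0.08, 91.0),
    PhoneScore("A", 0.08, 0.21, 34.5),
    PhoneScore("l", 0.21, 0.30, 78.2),
]

for p in flag_suspect_phones(phones):
    print(f"{p.label} [{p.start:.2f}-{p.end:.2f} s]: TQE {p.tqe:.1f}%")
```

A transcriber could use such a list to revisit only the segments where phone and audio appear to fit together poorly.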
MPI version
CLST development version
Survey: 2a) File formats
Answer   Count   %
WAV      30      34.88
OGG      6       6.98
AIFF     13      15.12
MP3      16      18.60
MP4      5       5.81
FLAC     5       5.81
ALAW     4       4.65
ULAW     3       3.49
Other    4       4.65
[Bar chart: number of answers per file format]
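The percentages in the file-format table appear to be each answer's share of the total number of answers given (86), which suggests respondents could tick more than one option. A minimal sketch of that arithmetic, assuming this reading is correct; the function name `percentages` is hypothetical, and "other" stands for the Dutch answer "anders":

```python
def percentages(counts):
    """Return each count as a percentage of the total, rounded to 2 decimals."""
    total = sum(counts.values())
    return {answer: round(100 * n / total, 2) for answer, n in counts.items()}


# Counts taken from the survey table on file formats.
file_formats = {
    "WAV": 30, "OGG": 6, "AIFF": 13, "MP3": 16, "MP4": 5,
    "FLAC": 5, "ALAW": 4, "ULAW": 3, "other": 4,
}

shares = percentages(file_formats)
for answer, share in shares.items():
    print(f"{answer:6s} {share:6.2f}%")
```

The same function reproduces the percentage column of the other survey tables when given their counts.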
Survey: 2c) Recording precision
Answer   Count   %
8 bit    3       7.89
12 bit   3       7.89
16 bit   24      63.16
24 bit   8       21.05
[Bar chart: number of answers per recording precision]
Survey: 3) Formats and standards for phonetic transcriptions
Answer            Count   %
SAMPA             23      28.40
X-SAMPA           6       7.41
IPA               25      30.86
CGN phoneme set   9       11.11
YAPA              3       3.70
Celex             7       8.64
LH+               3       3.70
Other             5       6.17
[Bar chart: number of answers per transcription format]
Survey: 4) Software
Answer     Count   %
Praat      32      53.33
Audacity   10      16.67
CoolEdit   7       11.67
Audition   5       8.33
Other      6       10.00
[Bar chart: number of answers per software package]
Survey: 8) Interest in inclusion in the CLARIN infrastructure
Answer       Count   %
Yes          22      64.71
No           11      32.35
Don't know   1       2.94
[Bar chart: number of answers per option]
Survey: 9) Willing to supply metadata
Answer   Count   %
Yes      27      90
No       3       10
[Bar chart: number of answers per option]
Survey: 10) Current use of metadata format(s)
Answer        Count   %
OLAC          1       2.38
IMDI          4       9.52
CMDI          5       11.9
Dublin Core   4       9.52
TEI           4       9.52
None          21      50
Other         3       7.14
[Bar chart: number of answers per metadata format]
Epilogue
• More information: http://lands.let.ru.nl/~strik/research/TQE/
• Questions?