Phone-level pronunciation scoring and assessment for interactive language learning S.M. Witt *, S.J. Young Speech Communication 30 (2000) 95-108 Chun-Yu Chen

Page 1: Phone-level pronunciation scoring and assessment for interactive language learning

Phone-level pronunciation scoring and assessment for interactive language learning

S.M. Witt *, S.J. Young

Speech Communication 30 (2000) 95-108

Chun-Yu Chen

Page 2: Outline

• Introduction
• GOP scoring
  − Basic GOP algorithm
  − Phone dependent thresholds
  − Explicit error modelling
• Performance measures
  − The transcription of pronunciation errors
  − Performance measures
• Collection of a non-native database
• The labelling consistency of the human judges
• Experimental results
• Conclusions

Page 3: Introduction

• A computer-assisted language learning (CALL) system requires the ability to measure pronunciation accurately

• The system described here focuses on measuring the pronunciation quality of non-native speech at the phone level and on locating pronunciation errors

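Page 4: Basic GOP algorithm

• The aim of the goodness of pronunciation (GOP) measure is to provide a score for each phone of an utterance

• The individual GOP scores are calculated from two passes: a forced alignment pass, and a phone recognition pass in which each phone can follow the previous one with equal probability

• $\mathrm{GOP}_1(p) = \left|\log P(p \mid O^{(p)})\right| / NF(p) \approx \left|\log \dfrac{p(O^{(p)} \mid p)}{\max_{q \in Q} p(O^{(p)} \mid q)}\right| / NF(p)$, where $O^{(p)}$ is the acoustic segment aligned to phone $p$, $Q$ is the set of all phone models and $NF(p)$ is the number of frames in the segment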
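The two-pass description on this slide can be sketched in a few lines of Python. This is only an illustration, assuming that the score of a phone is the absolute difference between the segment's log-likelihood from the forced alignment pass and from the phone recognition pass, divided by the number of frames. The function name `gop_score` and all numbers below are hypothetical; in practice the log-likelihoods would come from a speech recogniser.

```python
def gop_score(forced_loglik, loop_loglik, num_frames):
    """Duration-normalised GOP score for one phone segment.

    forced_loglik: log-likelihood of the segment given the expected phone
                   (forced alignment pass)
    loop_loglik:   log-likelihood of the same frames from the unconstrained
                   phone recognition pass
    num_frames:    number of acoustic frames in the segment
    """
    if num_frames <= 0:
        raise ValueError("segment must contain at least one frame")
    return abs(forced_loglik - loop_loglik) / num_frames


# Hypothetical segments: (phone, forced log-lik, phone-loop log-lik, frames)
segments = [
    ("th", -310.0, -295.0, 10),
    ("ih", -120.0, -120.0, 6),
    ("s", -200.5, -198.5, 8),
]
scores = {phone: gop_score(f, l, n) for phone, f, l, n in segments}
```

A score of zero means the recogniser chose the expected phone; larger values indicate that some other phone matched the audio better.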

Page 5

• The quality of the GOP scoring procedure described above depends on the quality of the acoustic models used

Page 6: Phone dependent thresholds

• A simple phone-specific threshold can be computed from the global GOP statistics. The threshold for a phone p can be defined in terms of the mean and variance of all the GOP scores for that phone

• The other way to approximate human performance is to learn from human labelling behaviour. The phone dependent threshold can be defined by averaging the normalised rejection counts over all speakers
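A minimal sketch of the first kind of threshold follows. The slide only says that the threshold depends on the mean and variance of the GOP scores, so the linear form `mean + alpha * std + beta` and the constants `alpha` and `beta` are assumptions made for illustration, as are the function names and the sample scores.

```python
from statistics import mean, pstdev


def phone_thresholds(gop_by_phone, alpha=1.0, beta=0.0):
    """Per-phone rejection threshold from global GOP statistics.

    gop_by_phone maps a phone label to the list of GOP scores observed
    for that phone. alpha and beta are hypothetical tuning constants.
    """
    return {
        phone: mean(scores) + alpha * pstdev(scores) + beta
        for phone, scores in gop_by_phone.items()
    }


def reject(phone, score, thresholds):
    """A phone is rejected when its GOP score exceeds its threshold."""
    return score > thresholds[phone]


# Hypothetical GOP scores collected per phone
observed = {"th": [1.0, 2.0, 3.0], "s": [0.5, 0.5]}
thresholds = phone_thresholds(observed)
```

Raising `alpha` or `beta` makes the system more lenient, which is one way to match the strictness of a particular human judge.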

Page 7: Explicit error modelling

• Pronunciation errors can be grouped into two main error classes
  − Individual mispronunciations, when the speaker is not familiar with the pronunciation of a specific word
  − Substitutions of native sounds for sounds of the target language that do not exist in the native language. This type is also called systematic mispronunciation

• Knowledge of the learner's native tongue can be included in the GOP scoring to improve the detection of errors, by using phone model sets of both the target language and the speaker's native language

• The posterior probability of the target phones can be calculated by

Page 8

• Scores for systematic mispronunciations are defined as

• Combining the basic with

Page 9: Performance measures

• Performance measures are concerned only with the detection of pronunciation errors, and four different dimensions are considered
  − Strictness: how strict the judge was in marking pronunciation errors
  − Agreement: the overall agreement between the reference transcription and the automatically derived transcription
  − Cross-correlation: the overall agreement between the errors marked in the reference and the automatically detected errors
  − Overall phone correlation: how well the overall rejection statistics for each phone correlate between the reference and the automatic system

Page 10: The transcription of pronunciation errors

• All performance measures compare transcriptions on a frame-by-frame basis as follows
  1. The acoustic waveform is force-aligned with the corrected transcriptions
  2. Substituted, inserted or deleted phones are marked with "1" and all other phones with "0", which yields a vector x
  3. The vectors representing corrected transcriptions are smoothed by a Hamming window
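Steps 2 and 3 can be sketched as follows. The Hamming window formula is the standard one; the window length of 5 frames, the zero padding at the edges and the function names are assumptions for illustration, since the slide does not state them.

```python
import math


def error_vector(frame_is_error):
    """Step 2: 1.0 for frames of a substituted, inserted or deleted phone, else 0.0."""
    return [1.0 if flag else 0.0 for flag in frame_is_error]


def hamming(n):
    """Standard Hamming window of length n."""
    if n == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]


def smooth(x, window_len=5):
    """Step 3: convolve x with a centred Hamming window (zero padding at the edges)."""
    if window_len % 2 == 0:
        raise ValueError("window_len must be odd so the window can be centred")
    w = hamming(window_len)
    half = window_len // 2
    out = []
    for t in range(len(x)):
        acc = 0.0
        for k, wk in enumerate(w):
            j = t + k - half
            if 0 <= j < len(x):
                acc += wk * x[j]
        out.append(acc)
    return out


# Hypothetical utterance of 8 frames with one mispronounced phone on frames 3-4
x = error_vector([False, False, False, True, True, False, False, False])
smoothed = smooth(x)
```

After smoothing, frames next to a rejected region carry a non-zero weight, so two transcriptions that mark the same error a few frames apart still overlap.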

Page 11

• If rejected frames in one transcription are immediately followed by rejected frames in the other transcription, the rejections can be considered to have been caused by the same pronunciation error

Page 12: Performance measures

• Strictness: uses the difference between the strictness levels of the two transcriptions

• Agreement: based on the distance between the corresponding transcription vectors

• Cross-correlation: takes into account only those frames where a rejection exists in either transcription
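The formulas for these three measures are not preserved in this transcript, so the sketch below uses plausible forms that match the verbal descriptions; they are assumptions, not necessarily the definitions used on the slide. Strictness is taken as the fraction of rejected phones, agreement as one minus a normalised Euclidean distance, and cross-correlation as the normalised inner product, which ignores frames where neither transcription marks a rejection.

```python
import math


def norm(v):
    """Euclidean norm of a vector."""
    return math.sqrt(sum(a * a for a in v))


def strictness(num_rejected, num_phones):
    """Assumed definition: fraction of phones marked as mispronounced."""
    return num_rejected / num_phones


def strictness_difference(s1, s2):
    """Absolute difference between the strictness levels of two transcriptions."""
    return abs(s1 - s2)


def agreement(x1, x2):
    """Assumed form: 1 - ||x1 - x2|| / ||x1 + x2||."""
    denom = norm([a + b for a, b in zip(x1, x2)])
    if denom == 0.0:
        return 1.0  # neither transcription marks any error
    return 1.0 - norm([a - b for a, b in zip(x1, x2)]) / denom


def cross_correlation(x1, x2):
    """Assumed form: x1 . x2 / (||x1|| ||x2||)."""
    denom = norm(x1) * norm(x2)
    if denom == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(x1, x2)) / denom


# Hypothetical frame-level rejection vectors
ref = [0.0, 1.0, 1.0, 0.0]
same = [0.0, 1.0, 1.0, 0.0]
disjoint = [1.0, 0.0, 0.0, 0.0]
```

Under these assumed definitions, identical vectors give an agreement and cross-correlation of 1, while vectors that reject entirely different frames give 0 for both.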

Page 13

• Phone correlation: the overall similarity of the phone rejection statistics
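One way to quantify this similarity is a Pearson correlation between the per-phone rejection rates of two transcriptions. The slide's own formula is not preserved here, so treat the choice of Pearson correlation, the function names and the sample rates as assumptions.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0.0 or var_b == 0.0:
        return 0.0  # correlation is undefined for a constant sequence
    return cov / math.sqrt(var_a * var_b)


def phone_correlation(rates_1, rates_2):
    """Correlate per-phone rejection rates; phones missing on one side count as 0."""
    phones = sorted(set(rates_1) | set(rates_2))
    a = [rates_1.get(p, 0.0) for p in phones]
    b = [rates_2.get(p, 0.0) for p in phones]
    return pearson(a, b)


# Hypothetical rejection rates (fraction of instances rejected) per phone
human = {"th": 0.6, "s": 0.1, "ih": 0.2}
machine = {"th": 0.3, "s": 0.05, "ih": 0.1}
pc = phone_correlation(human, machine)
```

In this example the machine rejects every phone half as often as the human but ranks the phones identically, so the correlation is close to 1 even though the strictness differs.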

Page 14: Collection of a non-native database

• In order to evaluate the pronunciation scoring, a database of non-native speech from second-language learners has been recorded and annotated

• The speakers understood the prompting texts, and their competence level was low enough to produce easily detectable mispronunciations

• The database was annotated at three different levels
  1. The original transcriptions were annotated with all substitution, deletion and insertion errors made by the non-native speaker
  2. Each word was scored on a scale of 1 to 4
  3. Each sentence was scored on the same scale

Page 15: The labelling consistency of the human judges

• The four performance measures described above are used to determine these characteristics

• The results have been calculated by averaging A, CC, PC and the strictness difference between the respective judge and all other judges
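The averaging step can be sketched as below, assuming the pairwise measures between every two judges have already been computed. The data layout (a dictionary keyed by unordered judge pairs), the function name and the numbers are hypothetical.

```python
from statistics import mean


def judge_averages(pairwise, judges):
    """Average each measure between one judge and all other judges.

    pairwise maps frozenset({judge_a, judge_b}) to a dict of
    measure name -> value for that pair of judges.
    """
    result = {}
    for judge in judges:
        others = [pairwise[frozenset((judge, o))] for o in judges if o != judge]
        result[judge] = {m: mean(scores[m] for scores in others) for m in others[0]}
    return result


# Hypothetical pairwise results for three judges
judges = ["J1", "J2", "J3"]
pairwise = {
    frozenset(("J1", "J2")): {"A": 0.9, "CC": 0.5},
    frozenset(("J1", "J3")): {"A": 0.7, "CC": 0.25},
    frozenset(("J2", "J3")): {"A": 0.8, "CC": 0.75},
}
averages = judge_averages(pairwise, judges)
```

The resulting per-judge averages serve as the human-human benchmark against which the automatic scores are compared.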

Page 16

• This table shows the similarity between the human judges and the baseline GOP scoring method for each non-native speaker in that judge's group

Page 17

• This figure shows CC and PC results grouped according to each student's mother-tongue

Page 18: Experimental results

• Human and machine judgements agree on which phones to accept and which to reject, with two exceptions

Page 19

• This table shows the effects of incorporating error modelling, adaptation and judge-based individual thresholds into the GOP algorithm

Page 20: Conclusions

• Using a specially recorded database of non-native speech, the basic GOP method has been investigated and the effectiveness of the performance measures studied

• Combining the baseline method with several refinements gave results comparable to the human-human benchmark values

• A computer-based pronunciation scoring system can judge, much as a human does, which phonetic segments in an utterance can be accepted as correct