
Better Punctuation Prediction with Dynamic Conditional Random Fields

Wei Lu and Hwee Tou Ng

National University of Singapore

Talk Overview

• Background
• Related Work
• Approaches
  – Previous approach: Hidden Event Language Model
  – Previous approach: Linear-Chain CRF
  – This work: Factorial CRF
• Evaluation
• Conclusion

Punctuation Prediction

• Automatically insert punctuation symbols into transcribed speech utterances
• Widely studied in the speech processing community
• Example:

>> Original speech utterance:

you are quite welcome and by the way we may get other reservations so could you please call us as soon as you fix the date

>> Punctuated (and cased) version:

You are quite welcome . And by the way , we may get other reservations , so could you please call us as soon as you fix the date ?

Our Task

• Processing prosodic features requires access to the raw speech data, which may be unavailable
• This work tackles the problem from a text processing perspective

Task: perform punctuation prediction for conversational speech texts without relying on prosodic features

Related Work

• With prosodic features
  – Kim and Woodland (2001): a decision tree framework
  – Christensen et al. (2001): a finite state and a multi-layer perceptron approach
  – Huang and Zweig (2002): a maximum entropy-based approach
  – Liu et al. (2005): linear-chain conditional random fields

• Without prosodic features
  – Beeferman et al. (1998): comma prediction with a trigram language model
  – Gravano et al. (2009): an n-gram based approach

Related Work (continued)

• One well-known approach that does not exploit prosodic features
  – Stolcke et al. (1998) presented a hidden event language model
  – It treats boundary detection and punctuation insertion as an inter-word hidden event detection task
  – Widely used in many recent spoken language translation tasks, as either a pre-processing (Wang et al., 2008) or post-processing (Kirchhoff and Yang, 2007) step

Hidden Event Language Model

• HMM (Hidden Markov Model)-based approach
  – A joint distribution over words and inter-word events
  – Observations are the words; word/event pairs are the hidden states
• Implemented in the SRILM toolkit (Stolcke, 2002)
• A variant of this approach
  – Relocates/duplicates the ending punctuation symbol to be closer to the indicative words
  – Works well for predicting English question marks

where is the nearest bus stop ?

? where is the nearest bus stop
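The relocation trick above amounts to a small training-time preprocessing step. A minimal sketch of the idea (our own illustration, not the SRILM implementation; the function name is hypothetical):

```python
def relocate_ending_punct(tokens, movable=("?",)):
    """Move a movable sentence-final punctuation token to the front of the
    sentence, so it sits next to indicative words like "where" or "would you"."""
    if tokens and tokens[-1] in movable:
        return [tokens[-1]] + tokens[:-1]
    return tokens

relocate_ending_punct("where is the nearest bus stop ?".split())
# → ['?', 'where', 'is', 'the', 'nearest', 'bus', 'stop']
```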

Linear-Chain CRF

• Linear-chain conditional random fields (L-CRF): an undirected graphical model used for sequence learning
  – Avoids the strong assumptions about dependencies made by the hidden event language model
  – Capable of modeling dependencies with arbitrary non-independent overlapping features

[Figure: linear-chain CRF, with word-layer tags Y1 … Yn over utterance tokens X1 … Xn]

An Example L-CRF

• A linear-chain CRF assigns a single tag to each individual word at each time step
  – Tags: NONE, COMMA, PERIOD, QMARK, EMARK
  – Factorized features

• Sentence: no , please do not . would you save your questions for the end of my talk , when i ask for them ?

COMMA NONE NONE PERIOD NONE NONE … NONE COMMA NONE … QMARK

no please do not would you … my talk when … them
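The word-layer tag sequence above can be read off mechanically from a punctuated sentence: each word is tagged with the punctuation symbol that follows it, or NONE. A minimal sketch (our own helper, not the paper's code):

```python
PUNCT_TAGS = {",": "COMMA", ".": "PERIOD", "?": "QMARK", "!": "EMARK"}

def to_word_tags(punctuated):
    """Turn a punctuated sentence into (word, tag) training pairs:
    each word carries the tag of the punctuation that follows it."""
    tokens = punctuated.split()
    pairs = []
    for i, tok in enumerate(tokens):
        if tok in PUNCT_TAGS:
            continue  # punctuation tokens become tags, not observations
        nxt = tokens[i + 1] if i + 1 < len(tokens) else None
        pairs.append((tok, PUNCT_TAGS.get(nxt, "NONE")))
    return pairs

to_word_tags("no , please do not .")
# → [('no', 'COMMA'), ('please', 'NONE'), ('do', 'NONE'), ('not', 'PERIOD')]
```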


Features for L-CRF

• Feature factorization (Sutton et al., 2007)
  – Each feature is the product of a binary function on the assignment of the set of cliques at each time step and a feature function defined solely on the observation sequence
  – Feature functions: n-gram (n = 1, 2, 3) occurrences within 5 words of the current word

Example: for the word “do”:

do@0, please@-1, would_you@[2,3], no_please_do@[-2,0]

COMMA NONE NONE PERIOD NONE NONE … NONE COMMA NONE … QMARK

no please do not would you … my talk when … them
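The n-gram features above (e.g. do@0, would_you@[2,3]) can be generated with a simple window scan. A sketch of one plausible implementation (the offset notation follows the examples on this slide; details of the original feature extractor may differ):

```python
def ngram_features(words, i, max_n=3, window=5):
    """N-gram (n = 1..3) occurrence features within `window` words of
    position i, labeled with offsets relative to the current word."""
    lo, hi = max(0, i - window), min(len(words), i + window + 1)
    feats = []
    for start in range(lo, hi):
        for n in range(1, max_n + 1):
            end = start + n
            if end > hi:
                break
            gram = "_".join(words[start:end])
            if n == 1:
                feats.append(f"{gram}@{start - i}")
            else:
                feats.append(f"{gram}@[{start - i},{end - 1 - i}]")
    return feats

feats = ngram_features("no please do not would you save".split(), 2)
# includes: 'do@0', 'please@-1', 'would_you@[2,3]', 'no_please_do@[-2,0]'
```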


Problems with L-CRF

• Long-range dependencies between punctuation symbols and indicative words cannot be captured properly

• For example: no please do not would you save your questions for the end of my talk when i ask for them

It is hard for a linear-chain CRF to capture the long-range dependency between the ending question mark (?) and the initial phrase “would you”

Problems with L-CRF (continued)

• What humans might do
  – Start with the raw utterance: no please do not would you save your questions for the end of my talk when i ask for them
  – Segment it into sentences: [no please do not] [would you save your questions for the end of my talk when i ask for them]
  – Punctuate each sentence: no , please do not . would you save your questions for the end of my talk , when i ask for them ?

• Sentence-level punctuation (. ? !) is associated with the complete sentence, and should therefore be assigned at the sentence level

What Do We Want?

• A model that jointly performs the following three tasks
  – Sentence boundary detection (or sentence segmentation)
  – Sentence type identification
  – Punctuation insertion

Factorial CRF

• An instance of a dynamic CRF
  – A two-layer factorial CRF (F-CRF) jointly annotates an observation sequence with two label sequences
  – Models the conditional probability of the label sequence pair <Y, Z> given the observation sequence X

[Figure: two-layer factorial CRF, with sentence-layer tags Z1 … Zn and word-layer tags Y1 … Yn over utterance tokens X1 … Xn]

Example of F-CRF

DEBEG DEIN DEIN DEIN QNBEG QNIN … QNIN QNIN QNIN … QNIN

COMMA NONE NONE PERIOD NONE NONE … NONE COMMA NONE … QMARK

no please do not would you … my talk when … them

• We propose two sets of tags for this joint task
  – Word-layer: NONE, COMMA, PERIOD, QMARK, EMARK
  – Sentence-layer: DEBEG, DEIN (declarative), QNBEG, QNIN (question), EXBEG, EXIN (exclamatory)
  – An analogous feature factorization and the same feature functions as in the L-CRF are used
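The sentence-layer tag sequence for one sentence follows directly from its boundaries and type: the first word takes a *BEG tag, the rest take *IN, with the prefix (DE/QN/EX) given by the sentence type. A minimal sketch (our own helper):

```python
SENT_PREFIX = {".": "DE", "?": "QN", "!": "EX"}

def sentence_layer_tags(words, end_punct):
    """Sentence-layer tags for one sentence: the first word gets a *BEG
    tag, the rest *IN; the prefix encodes the sentence type."""
    p = SENT_PREFIX[end_punct]
    return [p + ("BEG" if i == 0 else "IN") for i in range(len(words))]

sentence_layer_tags("no please do not".split(), ".")
# → ['DEBEG', 'DEIN', 'DEIN', 'DEIN']
```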


Why Does it Work?

• The sentence-layer tags are used for sentence segmentation and sentence type identification

• The word-layer tags are used for punctuation insertion

• Knowledge learned from the sentence-layer can guide the word-layer tagging process

• The two layers are jointly learned, providing evidence that influences each other’s tagging process

[no please do not]declarative sent.  [would you save your questions for the end of my talk when i ask for them]question sent.

The sentence-layer tags QNBEG QNIN … mark the second sentence as a question, guiding the word layer to emit the ending ?

Evaluation Datasets

                                      |  BTEC          |  CT
                                      |  CN     EN     |  CN     EN
Number of utterance pairs             |     19,972     |     10,061
Percentage of declarative sentences   |  64%    65%    |  77%    81%
Percentage of question sentences      |  36%    35%    |  22%    19%
Multiple sentences per utterance      |  14%    17%    |  29%    39%
Average words per utterance           |  8.59   9.46   |  10.18  14.33

• IWSLT 2009 BTEC and CT datasets
• Consists of both English (EN) and Chinese (CN)
• 90% used for training, 10% for testing

Experimental Setup

• Designed extensive experiments for the hidden event language model
  – Duplication vs. no duplication
  – Single-pass vs. cascaded
  – Trigram vs. 5-gram
• Conducted the following experiments
  – Accuracy on correctly recognized (CRR) texts (F1 measure)
  – Accuracy on automatically recognized (ASR) texts (F1 measure)
  – Translation performance with punctuated ASR texts (BLEU metric)

Punctuation Prediction: Evaluation Metrics

• Precision = (# correctly predicted punctuation symbols) / (# predicted punctuation symbols)

• Recall = (# correctly predicted punctuation symbols) / (# expected punctuation symbols)

• F1 measure = 2 / (1/Precision + 1/Recall)
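The metrics above compose as follows (a straightforward sketch; F1 is the harmonic mean of precision and recall):

```python
def prf(correct, predicted, expected):
    """Precision, recall, and F1 from counts of correctly predicted,
    total predicted, and total expected (gold) punctuation symbols."""
    precision = correct / predicted
    recall = correct / expected
    f1 = 2 / (1 / precision + 1 / recall)  # harmonic mean
    return precision, recall, f1

prf(correct=80, predicted=100, expected=160)
# → precision 0.8, recall 0.5, F1 ≈ 0.615
```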


Punctuation Prediction Evaluation: Correctly Recognized Texts (I), BTEC

                 Hidden event LM (no duplication)   Hidden event LM (use duplication)
                 Single pass      Cascaded          Single pass      Cascaded
     LM order    3       5        3       5         3       5        3       5        L-CRF   F-CRF
CN   Prec.       87.40   86.44    87.72   87.13     76.74   77.58    77.89   78.50    94.82   94.83
     Rec.        83.01   83.58    82.04   83.76     72.62   73.72    73.02   75.53    87.06   87.94
     F1          85.15   84.99    84.79   85.41     74.63   75.60    75.37   76.99    90.78   91.25
EN   Prec.       64.72   62.70    62.39   58.10     85.33   85.74    84.44   81.37    88.37   92.76
     Rec.        60.76   59.49    58.57   55.28     80.42   80.98    79.43   77.52    80.28   84.73
     F1          62.68   61.06    60.42   56.66     82.80   83.29    81.86   79.40    84.13   88.56

• The “duplication” trick for the hidden event language model is language-specific

• Unlike in English, indicative words can appear anywhere in a Chinese sentence

Punctuation Prediction Evaluation: Correctly Recognized Texts (II), CT

                 Hidden event LM (no duplication)   Hidden event LM (use duplication)
                 Single pass      Cascaded          Single pass      Cascaded
     LM order    3       5        3       5         3       5        3       5        L-CRF   F-CRF
CN   Prec.       89.14   87.83    90.97   88.04     74.63   75.42    75.37   76.87    93.14   92.77
     Rec.        84.71   84.16    77.78   84.08     70.69   70.84    64.62   73.60    83.45   86.92
     F1          86.87   85.96    83.86   86.01     72.60   73.06    69.58   75.20    88.03   89.75
EN   Prec.       73.86   73.42    67.02   65.15     75.87   77.78    74.75   74.44    83.07   86.69
     Rec.        68.94   68.79    62.13   61.23     70.33   72.56    69.28   69.93    76.09   79.62
     F1          71.31   71.03    64.48   63.13     72.99   75.08    71.91   72.12    79.43   83.01

• Significant improvement over L-CRF (p < 0.01)

• Our approach is general: it requires minimal linguistic knowledge and consistently performs well across different languages

Punctuation Prediction Evaluation: Automatically Recognized Texts, BTEC

                 Hidden event LM (no duplication)   Hidden event LM (use duplication)
                 Single pass      Cascaded          Single pass      Cascaded
     LM order    3       5        3       5         3       5        3       5        L-CRF   F-CRF
CN   Prec.       85.96   84.80    86.48   85.12     66.86   68.76    68.00   68.75    92.81   93.82
     Rec.        81.87   82.78    83.15   82.78     63.92   66.12    65.38   66.48    85.16   89.01
     F1          83.86   83.78    84.78   83.94     65.36   67.41    66.67   67.60    88.83   91.35
EN   Prec.       62.38   59.29    56.86   54.22     85.23   87.29    84.49   81.32    90.67   93.72
     Rec.        64.17   60.99    58.76   56.21     88.22   89.65    87.58   84.55    88.22   92.68
     F1          63.27   60.13    57.79   55.20     86.70   88.45    86.00   82.90    89.43   93.19

• 504 Chinese utterances and 498 English utterances
• Recognition accuracy: 86% and 80%, respectively
• Significant improvement (p < 0.01)

Punctuation Prediction Evaluation: Translation Performance (BLEU), BTEC

                 Hidden event LM (no duplication)   Hidden event LM (use duplication)
                 Single pass      Cascaded          Single pass      Cascaded
     LM order    3       5        3       5         3       5        3       5        L-CRF   F-CRF
CN → EN          30.77   30.71    30.98   30.64     30.16   30.26    30.33   30.42    31.27   31.30
EN → CN          21.21   21.00    21.16   20.76     23.03   24.04    23.61   23.34    23.44   24.18

• This tells us how well the punctuated ASR outputs can be used for downstream NLP tasks

• Used the Berkeley aligner and Moses (with lexicalized reordering)

• Averaged BLEU-4 scores over 10 MERT runs with random initial parameters

Conclusion

• We propose a novel approach for punctuation prediction that does not rely on prosodic features
  – Jointly performs punctuation prediction, sentence boundary detection, and sentence type identification
  – Performs better than the hidden event language model and a linear-chain CRF model
  – A general approach that consistently works well across different languages
  – Effective when incorporated into downstream NLP tasks
