
THAI BROADCAST NEWS CORPUS CONSTRUCTION AND EVALUATION Markpong Jongtaveesataporn † Chai Wutiwiwatchai ‡ Koji Iwano † Sadaoki Furui † † Tokyo Institute of Technology, Japan ‡ NECTEC, Thailand


THAI BROADCAST NEWS CORPUS CONSTRUCTION AND EVALUATION

Markpong Jongtaveesataporn †

Chai Wutiwiwatchai ‡

Koji Iwano †

Sadaoki Furui †

† Tokyo Institute of Technology, Japan
‡ NECTEC, Thailand

Background on Thai speech recognition research

[Figure: timeline of Thai speech recognition research ordered by increasing difficulty, with year marks at 1987, 1995, 1999, 2003, 2005 and 2007. Stages shown: isolated syllable recognition, isolated word recognition, connected sub-word recognition, small-task continuous speech recognition, LVCSR, broadcast news transcription system. Callout: Thienlikit et al., 2004, newspaper read-speech recognition.]

Development of Thai Broadcast News Transcription System

• Research on broadcast news transcription systems for Thai lags behind other languages
  • English: 1995 (Stern, 1997)
  • Japanese: 1997 (Matsuoka et al., 1997)
  • Mandarin: 1998 (Guo et al., 1998)
  • Italian: 2000 (Federico et al., 2000)
• We need to speed up our research activities to catch up with others

Targets

1. Development of a Thai broadcast news corpus
  • Speech corpus: training and testing data
  • Text corpus: language modeling
2. Development of a prototype system

Speech corpus

• Structure information of the broadcast news was annotated
  • Section, speaker's turn, segments
• Property tags were annotated for each speaker's turn
  • Speaker's name, if known
  • Speaker's gender: male / female
  • Speaking mode: planned / spontaneous
  • Background noise: clean / music / noise
• Only speech from announcers speaking in the studio was transcribed
• Transcription and annotation were created by one transcriber and checked by another transcriber

Structure of broadcast news

Episode: one broadcast news session
  Section 1: one news topic
    Speaker's turn: speaker A
      Segment: one sentence or clause
      Segment: one sentence or clause
      Segment: one sentence or clause
    Speaker's turn: speaker B
    Speaker's turn: speaker A
  Section 2
  Section 3


Example of structure information

Episode: one broadcast news session
  Section 1: Sports
    Speaker's turn: Mr. A, male, planned speech, clean speech
      Segment: sentence A
      Segment: sentence B
      Segment: sentence C
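The episode / section / speaker's turn / segment hierarchy and the per-turn property tags map naturally onto nested records. The following is a minimal Python sketch; the class and field names (`Episode`, `Section`, `Turn`, `Segment`) are hypothetical and chosen for illustration only, and they do not describe the corpus's actual file format.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Segment:
    text: str  # one sentence or clause


@dataclass
class Turn:
    # Property tags attached to each speaker's turn
    speaker: Optional[str]  # speaker's name, None if unknown
    gender: str             # "male" or "female"
    mode: str               # "planned" or "spontaneous"
    background: str         # "clean", "music" or "noise"
    segments: List[Segment] = field(default_factory=list)


@dataclass
class Section:
    topic: str              # one news topic, e.g. "Sports"
    turns: List[Turn] = field(default_factory=list)


@dataclass
class Episode:
    sections: List[Section] = field(default_factory=list)

    def segment_count(self) -> int:
        """Total number of segments across all sections and turns."""
        return sum(len(t.segments) for s in self.sections for t in s.turns)


# The example from the slide, expressed with the hypothetical classes above.
example = Episode(sections=[
    Section(topic="Sports", turns=[
        Turn(speaker="Mr. A", gender="male", mode="planned", background="clean",
             segments=[Segment("sentence A"), Segment("sentence B"),
                       Segment("sentence C")]),
    ]),
])
```

Keeping the tags on the turn rather than on each segment mirrors the annotation scheme, where properties are assigned per speaker's turn.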


Text corpus

• No structure information was annotated
• Additional information
  • Speaking mode: planned / spontaneous

Problems of Thai transcription text

• No space between words
  • The definition of a word is very ambiguous
  • No good morphological analyzer
  • Difficulties in the transcription and checking process
• Manually word-segmented transcription was made
  • Instructions were created for transcribers
• Automatically segmented transcription (future target)
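To illustrate why word boundaries are ambiguous in unspaced text, here is a small sketch of greedy longest-match segmentation against a lexicon. This is a generic dictionary-based technique swapped in for illustration, not the authors' method (the corpus was segmented manually). The lexicon and the Latin-script input are hypothetical stand-ins for Thai text.

```python
def longest_match_segment(text, lexicon):
    """Greedy left-to-right segmentation.

    At each position take the longest lexicon entry that matches;
    fall back to a single character when nothing matches.
    """
    max_len = max(map(len, lexicon)) if lexicon else 1
    words, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if candidate in lexicon or size == 1:
                words.append(candidate)
                i += size
                break
    return words


# Toy unspaced input with a hypothetical lexicon.
lexicon = {"thai", "broad", "cast", "broadcast", "news"}
segments = longest_match_segment("thaibroadcastnews", lexicon)
```

With this lexicon the same character string could be split as "broad" + "cast" or kept as "broadcast". The greedy rule picks one answer, but which one counts as a word is exactly the kind of decision the transcription instructions have to settle.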

Broadcast news collection

• News programs from one public TV station in Thailand were recorded
• Total of 105 news episodes
  • Speech corpus: 35 news episodes, 17 hours
  • Text corpus: 70 news episodes

Analysis of speech corpus

[Figure: breakdown of the speech corpus by gender (female / male), speaking mode (planned / spontaneous) and background (noise / clean / music).]

Information of speech & text corpora

Attribute              Speech corpus       Text corpus
No. of sentences       13k                 32k
No. of words           224k                573k
No. of unique words    10k                 14k
No. of phonemes        899k                -
No. of speakers        8 female, 4 male    -
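A few derived figures can be read off the table with simple arithmetic. The sketch below uses the rounded counts from the table (k = thousand), so the results are approximate; the dictionary layout and function names are illustrative assumptions.

```python
# Rounded counts taken from the table; results are therefore approximate.
corpora = {
    "speech": {"sentences": 13_000, "words": 224_000, "unique_words": 10_000},
    "text": {"sentences": 32_000, "words": 573_000, "unique_words": 14_000},
}


def words_per_sentence(stats):
    """Average sentence length in words."""
    return stats["words"] / stats["sentences"]


def type_token_ratio(stats):
    """Unique words divided by running words."""
    return stats["unique_words"] / stats["words"]


for name, stats in corpora.items():
    print(f"{name}: {words_per_sentence(stats):.1f} words/sentence, "
          f"type/token ratio {type_token_ratio(stats):.3f}")
```

Both corpora come out at roughly 17 to 18 words per sentence, which suggests the sentence definition was applied consistently across the two.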

Data used in experiments

• Test set data
  • Randomly selected from the speech corpus
  • 3,000 utterances
• Acoustic model training data for the baseline system
  • Phonetically balanced sentence speech corpora
    • LOTUS (Kasuriya et al., 2003) and the corpus developed internally
  • Read speech corpora
  • 40.3 hours (68 male and 68 female)
• Acoustic model adaptation data
  • Selected from the speech corpus
  • No overlap between adaptation data and test set data
• Language model training data
  • Text corpus + transcripts from the speech corpus, excluding the test set
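The requirement that test and adaptation data do not overlap can be met by drawing both from a single shuffled pool. This is a minimal sketch under stated assumptions: the utterance IDs, the pool size and the seed are hypothetical, and the actual selection procedure used by the authors is not described in the slides.

```python
import random


def split_test_and_adaptation(utterance_ids, n_test, n_adapt, seed=0):
    """Draw disjoint test and adaptation sets from one pool of utterances."""
    if n_test + n_adapt > len(utterance_ids):
        raise ValueError("not enough utterances for the requested split")
    rng = random.Random(seed)   # fixed seed so the split is reproducible
    shuffled = list(utterance_ids)
    rng.shuffle(shuffled)
    test = shuffled[:n_test]
    adapt = shuffled[n_test:n_test + n_adapt]
    return test, adapt


# Hypothetical IDs; the pool size is a placeholder for illustration.
pool = [f"utt{i:05d}" for i in range(13_000)]
test_set, adapt_set = split_test_and_adaptation(pool, n_test=3000, n_adapt=200)
```

Slicing one shuffled list guarantees disjointness by construction, which is simpler to audit than sampling twice and removing collisions afterwards.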

Experimental condition

• Acoustic model
  • Gender-dependent acoustic model
  • 12 MFCCs, delta, and delta energy
  • Triphones, 1000 tied states, 8 Gaussian mixtures
• Language model
  • Trigrams
  • Dictionary size: about 18k words
• TITech WFST speech recognition system (Dixon et al., 2007) was used as the speech decoder
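The feature description (12 MFCCs, delta, and delta energy) can be made concrete with the usual regression formula for delta coefficients, d_t = Σ n (c_{t+n} − c_{t−n}) / (2 Σ n²). The sketch below is an assumption-laden illustration: the window size of 2, the edge handling, and the reading of the feature set as 12 + 12 + 1 = 25 values per frame are not stated in the slides.

```python
def delta(frames, window=2):
    """Regression-based delta coefficients for a list of feature vectors.

    The first and last frames are repeated at the edges (an assumption).
    """
    denom = 2 * sum(n * n for n in range(1, window + 1))
    last = len(frames) - 1
    out = []
    for t in range(len(frames)):
        vec = []
        for k in range(len(frames[t])):
            num = sum(
                n * (frames[min(t + n, last)][k] - frames[max(t - n, 0)][k])
                for n in range(1, window + 1)
            )
            vec.append(num / denom)
        out.append(vec)
    return out


def build_features(mfcc, energy, window=2):
    """Concatenate MFCCs, their deltas, and delta energy for each frame."""
    d_mfcc = delta(mfcc, window)
    d_energy = delta([[e] for e in energy], window)
    return [c + dc + de for c, dc, de in zip(mfcc, d_mfcc, d_energy)]


# Three silent frames with 12 MFCCs each, just to show the resulting dimension.
features = build_features([[0.0] * 12 for _ in range(3)], [0.0, 0.0, 0.0])
```

Note that only the delta of the energy is appended, not the energy itself, which matches the wording on the slide.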

Acoustic model adaptation

• Supervised adaptation using MLLR
• F-condition adaptation
  • F0: clean, planned
  • F1: clean, spontaneous
  • F3: music noise
  • F4: other noise
  • Adaptation data: 200 utterances, randomly selected from the speech corpus regardless of speaker
• Speaker adaptation
  • Adaptation data: 200 utterances, randomly selected from the speech corpus regardless of F-condition
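The F-condition labels follow directly from the turn-level tags for speaking mode and background. A minimal sketch of that mapping is given below; the function names and the dictionary-based turn records are hypothetical, and ignoring the speaking mode for F3 and F4 is an assumption based on the slide listing only the noise type for those conditions.

```python
from collections import defaultdict


def f_condition(mode, background):
    """Map a turn's speaking mode and background tag to an F-condition label."""
    if mode not in ("planned", "spontaneous"):
        raise ValueError(f"unknown speaking mode: {mode!r}")
    if background == "clean":
        return "F0" if mode == "planned" else "F1"
    if background == "music":
        return "F3"
    return "F4"  # any other background noise


def group_by_condition(turns):
    """Collect turn IDs per F-condition, e.g. to sample adaptation data."""
    groups = defaultdict(list)
    for turn in turns:
        groups[f_condition(turn["mode"], turn["background"])].append(turn["id"])
    return dict(groups)


# Hypothetical turn records for illustration.
turns = [
    {"id": "t1", "mode": "planned", "background": "clean"},
    {"id": "t2", "mode": "spontaneous", "background": "clean"},
    {"id": "t3", "mode": "planned", "background": "music"},
    {"id": "t4", "mode": "planned", "background": "noise"},
]
groups = group_by_condition(turns)
```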

WER results

WER (%)               F0      F1      F3      F4      Overall
No adaptation         26.0    43.6    56.4    41.5    38.4
F-condition adapt.    22.3    35.9    38.4    34.2    30.8
Speaker adapt.        21.8    36.9    38.0    31.9    29.1

• Speaker adaptation yielded better WER

F-condition    Proportion of time    #words
F0             35.3%                 17160
F1             1.0%                  629
F3             14.0%                 7882
F4             49.7%                 27542
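For reference, word error rate is the word-level edit distance between hypothesis and reference divided by the reference length. The sketch below is a standard dynamic-programming implementation, not the scoring tool used by the authors, together with a helper for relative reduction applied to the overall WERs reported on this slide (38.4, 30.8 and 29.1).

```python
def word_error_rate(reference, hypothesis):
    """WER (%) = (substitutions + deletions + insertions) / reference length."""
    ref, hyp = list(reference), list(hypothesis)
    if not ref:
        raise ValueError("reference must not be empty")
    # prev[j] holds the edit distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return 100.0 * prev[-1] / len(ref)


def relative_reduction(baseline, improved):
    """Relative WER reduction in percent."""
    return 100.0 * (baseline - improved) / baseline


# Overall WERs from the slide: no adaptation, F-condition adapt., speaker adapt.
f_cond_gain = relative_reduction(38.4, 30.8)
speaker_gain = relative_reduction(38.4, 29.1)
```

On the reported overall figures this works out to a relative reduction of about 19.8% for F-condition adaptation and about 24.2% for speaker adaptation.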

Discussion

• High WER
  • Mismatched recording conditions
    • The speech corpus was used only as testing and adaptation data
  • Small text corpus
    • Inefficient language model

Conclusion

• Construction of the first Thai broadcast news corpus and an overview of the corpus analysis were presented
• The speech corpus was annotated with structure information, which is useful for further research
• An LVCSR system was set up and tested with the corpus

Future work

• Applying our Thai language modeling technique (Jongtaveesataporn et al., 2007)
  • Compound pseudo-morpheme (CPM) unit
  • Pseudo-morpheme error rate (F0 condition)
    • Manually segmented word unit system: 20.5%
    • CPM unit system: 19.9%
• Improving the language model by using newspaper text
• Collaboration with NECTEC: additional 50 hours of speech corpus


Thank you


Background

[Figure: timeline of Thai speech recognition research ordered by increasing difficulty, with year marks at 1987, 1995, 1999, 2003, 2005 and 2007. Stages shown: isolated syllable recognition, isolated word recognition, connected sub-word recognition, small-task continuous speech recognition, LVCSR, broadcast news LVCSR. Callout: Thienlikit, 2004, newspaper read-speech recognition.]

Development of Thai Broadcast News LVCSR System

• Development of an LVCSR system requires speech and text corpora
• Existing speech corpora for Thai LVCSR research
  • NECTEC-ATR
  • LOTUS (NECTEC)
  • GlobalPhone (CMU)
  • Callout on slide: newspaper read-speech

1. Development of a Thai broadcast news corpus
  • Speech corpus: training and testing data
  • Text corpus: language modeling
2. Development of a prototype LVCSR system

Experiments & developed corpora

• Speech corpus
  • The size of the speech corpus is still rather small
  • It was used in three ways
    • Test data
    • Adaptation data
    • A part of the transcription text was used for training the LM
• Text corpus
  • It was used for training the LM

Perplexity & OOV rates

               Perplexity          OOV rate
F-condition    Male     Female     Male    Female
F0             107.5    106.9      0.9     0.8
F1             126.4    100.1      0.9     0.6
F3             145.2    100.0      0.7     0.9
F4             141.6    157.6      1.5     1.9
Overall        126.9    125.6      1.2     1.3
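Both quantities in this table have simple definitions: the OOV rate is the share of running test-set words missing from the recognition vocabulary, and perplexity is the inverse geometric mean of the per-word probabilities assigned by the language model. The sketch below illustrates these definitions only. It assumes the OOV rates are percentages and that per-word log10 probabilities are available from some language model; neither detail is stated on the slide.

```python
def oov_rate(tokens, vocabulary):
    """Percentage of running words not covered by the vocabulary."""
    if not tokens:
        raise ValueError("tokens must not be empty")
    misses = sum(1 for w in tokens if w not in vocabulary)
    return 100.0 * misses / len(tokens)


def perplexity(log10_probs):
    """Perplexity from per-word log10 probabilities."""
    if not log10_probs:
        raise ValueError("log10_probs must not be empty")
    avg = sum(log10_probs) / len(log10_probs)
    return 10.0 ** (-avg)


# Toy inputs for illustration only.
toy_oov = oov_rate(["a", "b", "c", "d"], {"a", "b", "c"})
toy_ppl = perplexity([-2.0, -2.0])
```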

Transcription process

[Figure: transcription workflow, with all steps following a common guideline]
• Speech corpus: transcribing (4 persons), checking (2 persons), lexical entries checking (1 person)
• Text corpus: transcribing (7 persons), lexical entries checking (1 person)

Speech corpus

• Transcription and annotation of about 17 hours of TV broadcast news
• Tool: “Transcriber” (Barras et al., 2001)
• Additional information
  • Speaker information: name, gender
  • Speaking mode: planned / spontaneous speech
• Speech from announcers speaking in the studio

Transcription conventions

• Guideline for the transcription process
  • Segment segmentation
  • Word segmentation
  • Repeating word
  • Thai/English abbreviation
  • Number entity
  • Special tags

Introduction

• Thai speech processing research in TokyoTech
  • Dialogue system [Whittiwiwattchai, 2003]
  • LVCSR system
    • Dictation system [Tianlikid, 2005]
    • Broadcast news recognition system

Overview

• Introduction
• Corpus description
• Recording and transcription processes
• Corpus evaluation
• Conclusion

Thai language corpora

• Large language corpora are crucial to a state-of-the-art natural language processing system
• Thai speech resources for speech processing
  • NECTEC-ATR
  • LOTUS (NECTEC)
  • GlobalPhone (CMU)
  • TSynC-1 (NECTEC)
• Callouts on slide: newspaper read-speech; unit-selection speech synthesis

WER result

                                  WER (%)
F-condition    Time proportion    Male    Female
F0             28.1%              44.4    40.8
F1             1.5%               62.4    60.2
F3             11.5%              82.2    72.4
F4             58.9%              54.9    57.5
Overall        100%               56.8    45.5

Text corpus

• Text transcribed from 35 hours of TV broadcast news
• Additional information
  • Speaking mode: planned / spontaneous

Transcription conventions (1)

• Sentence segmentation
  • No sentence marker in the Thai language
  • Ambiguous
  • Grammatically, there are 3 types of sentence
    • Simple sentence
    • Compound sentence
    • Complex sentence
    • Callout on slide: composed of several clauses or simple sentences
  • A sentence was defined as a simple sentence or clause, delimited with the help of breaths

Transcription conventions (2)

• Word segmentation
  • No word boundary marker in the Thai language
  • This leads to difficulties in the transcription and data checking processes
  • Too ambiguous to define all rules
    • A few rules for simple segmentation patterns were defined
    • Undefined patterns were left to the decision of transcribers

Transcription conventions (3)

• Repeating word
• Thai/English abbreviation
• Number entity
• Special tags
  • Disfluencies, filled pauses, exclamations
  • Foreign words
  • Some other events: uncertainly transcribed part, etc.

Recorded programs

• News programs from one public TV station in Thailand were recorded
• Total of 105 news episodes
  • Speech corpus
    • 35 news episodes
    • About 17 hours of speech data
  • Text corpus: 70 news episodes
