for low-resource languages tts and data selection: improving … · 2015-08-03 · tts and data...

TTS and Data Selection: Improving Systems for Low-Resource Languages Chevy Levitan, DREU 2015


Page 1: for Low-Resource Languages TTS and Data Selection: Improving … · 2015-08-03 · TTS and Data Selection: Improving Systems for Low-Resource Languages Chevy Levitan, DREU 2015. outline

TTS and Data Selection: Improving Systems for Low-Resource Languages

Chevy Levitan, DREU 2015


outline

I. Project

II. Approach

III. Methods

IV. Status

V. Future


I. Project

synthesize natural, intelligible voices for low resource languages using data selection


motivation

▷ bridge the gap
▷ allow for cross-language communication


why data selection?


HRLs vs. LRLs

HRLs (high-resource languages)
★ prepared data
★ abundance of training material
high quality speech systems

LRLs (low-resource languages)
★ found data
★ limited training material
low quality speech systems


A. filter out unwanted data from training set

B. supplement limited LRL data with choice data from similar HRL


II. Approach

preparing the experiment


corpus

▷ Boston Radio News Corpus
▷ pre-processed
▷ English


data selection process

1. extract features
2. sort values
3. create subsets
4. synthesize data
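
The first three steps above can be sketched as follows. This is a minimal illustration, not the project's implementation: the slides do not name the features used, so `speaking_rate`, the `Utterance` type, and `select_subset` are hypothetical, and each feature value is assumed to have been extracted already.

```python
from dataclasses import dataclass, field


@dataclass
class Utterance:
    """One training utterance with pre-extracted feature values (hypothetical type)."""
    utt_id: str
    features: dict = field(default_factory=dict)  # feature name -> numeric value


def select_subset(utterances, feature, fraction, highest=False):
    """Sort utterances by one feature and keep the leading fraction.

    feature  -- name of the feature to sort on (assumed already extracted)
    fraction -- share of the corpus to keep, in (0, 1]
    highest  -- keep the highest values instead of the lowest
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    # Step 2: sort values.
    ranked = sorted(utterances, key=lambda u: u.features[feature], reverse=highest)
    # Step 3: create the subset, keeping at least one utterance.
    keep = max(1, int(len(ranked) * fraction))
    return ranked[:keep]


# Toy corpus; the feature name and values are made up for illustration.
corpus = [
    Utterance("a", {"speaking_rate": 5.1}),
    Utterance("b", {"speaking_rate": 3.2}),
    Utterance("c", {"speaking_rate": 4.4}),
    Utterance("d", {"speaking_rate": 6.0}),
]
slowest_half = select_subset(corpus, "speaking_rate", 0.5)
fastest_half = select_subset(corpus, "speaking_rate", 0.5, highest=True)
```

Each resulting subset would then be passed to the synthesis step to train one voice per subset.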


evaluate.

compare/contrast voices


solution

1. subset data
2. complete dataset


III. Methods

testing our hypothesis


standards

★ follow standard procedures for evaluating TTS voices
★ successful voice = intelligible + natural
★ use crowdsourcing for unbiased results


mechanical turk

Intelligibility
➔ transcribe nonsense sentences
➔ accurate transcription = intelligible voice

Naturalness
➔ use Likert scale to rate voices from very unnatural to very natural
➔ identify the voices that are categorized as natural+
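
Both measures can be scored with a few lines of code. The sketch below is illustrative only and rests on assumptions the slides do not state: intelligibility is scored as word error rate (a common choice, not confirmed as the one used here), the Likert scale is assumed to have five points with 4 meaning "natural", and "natural+" is read as a rating of "natural" or higher. The function names are hypothetical.

```python
from statistics import mean


def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


NATURAL = 4  # assumed position of "natural" on an assumed 5-point scale


def naturalness_summary(ratings, threshold=NATURAL):
    """Mean Likert rating and the share of ratings at or above the threshold."""
    if not ratings:
        raise ValueError("ratings must not be empty")
    natural_plus = sum(1 for r in ratings if r >= threshold)
    return {"mean": mean(ratings), "natural_plus_share": natural_plus / len(ratings)}


# Made-up nonsense sentence and ratings, for illustration only.
wer = word_error_rate("the blue cup sang", "the blue cat sang")
summary = naturalness_summary([5, 4, 2, 3])
```

A lower word error rate indicates a more intelligible voice, so voices trained on different subsets can be compared on that number and on the share of natural+ ratings.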


IV. Status

our current state


intelligibility HIT

✓ create subsets
✓ synthesize voices with this data
✓ design and implement HIT
✓ publish on MTurk site
✓ workers complete HITs
✓ accept/reject work


naturalness HIT

✓ create subsets
✓ synthesize voices with this data
✓ design and implement HIT
- publish on MTurk site
- workers complete HITs
- accept/reject work


V. Future

further exploration of this research


evaluation
analyze mechanical turk responses

text
apply similar methods to automatically select text data

low-resource
implement data selection for LRLs


Thanks! Any questions?