a web application for automated dialect...


Post on 08-Apr-2018


Page 1: A Web Application for Automated Dialect Analysis (cs.wellesley.edu/~sravana/slides/darla_naacl_poster.pdf)

  “Semi-automated” – e.g. FAVE (fave.ling.upenn.edu)

•  Alignment: automated with dynamic programming

•  Formant extraction: automated with LPC

•  Transcription: manual

  We now have access to thousands of hours of speech – manual transcription is impossible.
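The LPC step named above can be sketched as follows. This is a textbook autocorrelation-method LPC written with NumPy, not FAVE's own code; the function names and the `order` parameter are assumptions made for illustration.

```python
import numpy as np

def lpc_coefficients(frame, order):
    """Autocorrelation-method LPC: solve the normal equations R a = r."""
    windowed = frame * np.hamming(len(frame))
    full = np.correlate(windowed, windowed, mode="full")
    n = len(windowed)
    r = full[n - 1 : n + order]  # autocorrelation at lags 0..order
    # Toeplitz matrix built from lags 0..order-1
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, r[1 : order + 1])
    # Prediction-error polynomial A(z) = 1 - sum_k a_k z^-k
    return np.concatenate(([1.0], -a))

def estimate_formants(frame, sample_rate, order):
    """Return candidate formant frequencies (Hz) from the LPC pole angles."""
    poles = np.roots(lpc_coefficients(frame, order))
    poles = poles[np.imag(poles) > 0]  # keep one pole of each conjugate pair
    freqs = np.angle(poles) * sample_rate / (2 * np.pi)
    return sorted(freqs)
```

The two lowest frequencies returned for a vowel frame would play the role of F1 and F2. A production tool would also discard poles with very wide bandwidths, which this sketch leaves out.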

A Web Application for Automated Dialect Analysis

Sravana Reddy & James Stanford, Dartmouth College

•  Socio-phoneticians study accents and social variables.

•  Quantify accent with formants (resonance frequencies), F1 & F2.

•  Accents = systematic shifts in formant space.

•  Common task: audio → formants of each vowel.

Problem: Vowel Formant Extraction

[Figure: wave, spectrogram, and phones tiers; stages labeled Transcription (“paper”), Alignment, Formant Extraction]

Northern speaker: F1 500 Hz, F2 3000 Hz

Southern speaker: F1 1000 Hz, F2 2000 Hz
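The idea of an accent as a shift in formant space can be made concrete with the two example speakers. The sketch below uses the poster's F1/F2 values; the function name and the choice of Euclidean length as a summary are assumptions for illustration.

```python
import math

# Example values from the poster's figure (Hz).
northern = {"F1": 500.0, "F2": 3000.0}
southern = {"F1": 1000.0, "F2": 2000.0}

def formant_shift(a, b):
    """Return the (dF1, dF2) vector from speaker a to speaker b and its length."""
    d1 = b["F1"] - a["F1"]
    d2 = b["F2"] - a["F2"]
    return (d1, d2), math.hypot(d1, d2)

shift, distance = formant_shift(northern, southern)
print(shift, round(distance, 1))
```

Here F1 rises by 500 Hz and F2 falls by 1000 Hz between the Northern and the Southern example.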

DARLA

darla.dartmouth.edu

Existing Tools

Automate transcription with speech recognition … but isn’t speech recognition inaccurate?

Insight: stressed vowels are usually correct

•  Filter out vowels with low acoustic confidence.

•  Result: Formants from completely automated system ≅ formants from semi-automated.
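The filtering step above might look like the sketch below. The `VowelToken` record, the confidence scale, and the `min_confidence` threshold are hypothetical; the poster does not give the actual score or cutoff. The only convention relied on is that ARPAbet vowels carry a stress digit, with 1 marking primary stress.

```python
from dataclasses import dataclass

@dataclass
class VowelToken:
    phone: str         # ARPAbet vowel with stress digit, e.g. "AE1"
    confidence: float  # hypothetical acoustic confidence in [0, 1]
    f1: float          # Hz
    f2: float          # Hz

def keep_reliable(tokens, min_confidence=0.5):
    """Keep primary-stressed vowels whose confidence meets the threshold."""
    return [
        t for t in tokens
        if t.phone.endswith("1") and t.confidence >= min_confidence
    ]
```

Only the tokens that survive this filter would be passed on to formant measurement.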

Our Idea

REF: no it’s it’s wood turning

HYP: no it it would turn it

REF: a real dog and cat and all the others

HYP: a real docking tap and on the others
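To quantify how far each hypothesis is from its reference, a word-level edit distance can be computed with dynamic programming. This is the standard Levenshtein recurrence, offered as an illustration; the poster does not say which scoring tool the authors used.

```python
def word_edit_distance(ref, hyp):
    """Minimum substitutions, insertions, and deletions turning ref into hyp."""
    r, h = ref.split(), hyp.split()
    prev = list(range(len(h) + 1))  # distances from the empty reference prefix
    for i, ref_word in enumerate(r, 1):
        cur = [i]
        for j, hyp_word in enumerate(h, 1):
            cur.append(min(
                prev[j] + 1,                          # deletion
                cur[j - 1] + 1,                       # insertion
                prev[j - 1] + (ref_word != hyp_word)  # match or substitution
            ))
        prev = cur
    return prev[-1]

def word_error_rate(ref, hyp):
    """Edit distance divided by the number of reference words."""
    return word_edit_distance(ref, hyp) / len(ref.split())

pairs = [
    ("no it's it's wood turning", "no it it would turn it"),
    ("a real dog and cat and all the others",
     "a real docking tap and on the others"),
]
for ref, hyp in pairs:
    print(word_edit_distance(ref, hyp), round(word_error_rate(ref, hyp), 2))
```

The first pair needs 5 edits against 5 reference words and the second needs 4 edits against 9, yet many of the misrecognized words keep the same stressed vowel as the reference (wood/would, turning/turn).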

[Figure: Vowel Space. Axes: F2 (ticks 2000 to 1000) and F1 (ticks 700 to 400). Series: Obama_Manual, Obama_Automated. Vowels plotted in each series: IY, AY, EH, AA, IH, UW, AO, AH, OW, EY, AE, ER, OY, AW, UH]

Where is Obama from?

Semi-Automated

Completely Automated
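One way to compare the semi-automated and completely automated vowel spaces numerically is to average each vowel's tokens and measure the distance between the two means. The sketch below is an assumption about how such a comparison could be done, not the authors' evaluation; the token format `(vowel, F1, F2)` is hypothetical.

```python
import math
from collections import defaultdict

def vowel_means(tokens):
    """Average (F1, F2) per vowel from (vowel, f1, f2) tuples."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for vowel, f1, f2 in tokens:
        entry = sums[vowel]
        entry[0] += f1
        entry[1] += f2
        entry[2] += 1
    return {v: (s[0] / s[2], s[1] / s[2]) for v, s in sums.items()}

def mean_distance(means_a, means_b):
    """Mean Euclidean distance over the vowels present in both systems."""
    shared = sorted(set(means_a) & set(means_b))
    if not shared:
        raise ValueError("no vowels in common")
    return sum(math.dist(means_a[v], means_b[v]) for v in shared) / len(shared)
```

A small mean distance relative to the spread of the vowel space would support the claim that the two systems yield approximately the same formants.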

•  Speech recognition with CMU Pocketsphinx

•  Generic English acoustic models trained on LibriSpeech (400 hours), language models on WSJ and Fisher transcripts.

•  Alignment and formant extraction with FAVE.

•  Web interface accepts files or YouTube links.

•  Processing time is about 3x the audio length.

Implementation