a web application for automated dialect...

  “Semi-automated” – e.g. FAVE (fave.ling.upenn.edu)

•  Alignment: automated with dynamic programming
•  Formant extraction: automated with LPC
•  Transcription: manual
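To make the "formant extraction with LPC" step concrete, here is a minimal sketch in pure Python. It is not FAVE's implementation. It assumes the autocorrelation method with the Levinson-Durbin recursion and simple peak picking on the LPC spectrum. The function names, the synthetic test signal, the model order of 4, and the 5 Hz search grid are all illustrative assumptions.

```python
import cmath
import math
import random


def synthetic_vowel(f1, f2, fs, n_samples, radius=0.97, seed=0):
    """Hypothetical test signal: noise passed through two cascaded
    two-pole resonators centred on f1 and f2 (in Hz)."""
    rng = random.Random(seed)
    signal = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    for freq in (f1, f2):
        theta = 2.0 * math.pi * freq / fs
        b1, b2 = 2.0 * radius * math.cos(theta), -radius * radius
        y1 = y2 = 0.0
        out = []
        for x in signal:
            y = x + b1 * y1 + b2 * y2  # y[n] = x[n] + b1*y[n-1] + b2*y[n-2]
            out.append(y)
            y1, y2 = y, y1
        signal = out
    return signal


def lpc_coefficients(signal, order):
    """Autocorrelation method, solved with the Levinson-Durbin recursion.
    Returns [1, a1, ..., ap] for A(z) = 1 + a1*z^-1 + ... + ap*z^-p."""
    n = len(signal)
    r = [sum(signal[i] * signal[i + lag] for i in range(n - lag))
         for lag in range(order + 1)]
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a


def estimate_formants(signal, fs, order=4, step_hz=5.0):
    """Return the frequencies (Hz) of local maxima of the LPC spectrum 1/|A|^2."""
    a = lpc_coefficients(signal, order)
    freqs, power = [], []
    f = 0.0
    while f <= fs / 2.0:
        w = 2.0 * math.pi * f / fs
        denom = sum(a[k] * cmath.exp(-1j * w * k) for k in range(order + 1))
        freqs.append(f)
        power.append(1.0 / abs(denom) ** 2)
        f += step_hz
    return [freqs[i] for i in range(1, len(freqs) - 1)
            if power[i - 1] < power[i] > power[i + 1]]


# Synthetic "vowel" with resonances placed at 500 Hz and 1500 Hz.
signal = synthetic_vowel(500.0, 1500.0, fs=8000, n_samples=8000)
formants = estimate_formants(signal, fs=8000)
print(formants)
```

On real speech, LPC analysis is normally run on short windowed frames with pre-emphasis and a higher model order. The sketch omits these steps to stay short.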

  We now have access to thousands of hours of speech – manual transcription is impossible.

A Web Application for Automated Dialect Analysis!!

Sravana Reddy & James Stanford, Dartmouth College

•  Socio-phoneticians study accents and social variables.
•  Quantify accent with formants (resonance frequencies), F1 & F2.

•  Accents = systematic shifts in formant space.

•  Common task: audio → formants of each vowel.
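As a small worked example of a "shift in formant space", the sketch below takes the illustrative Northern and Southern speaker values shown on this poster and computes the per-formant difference and the Euclidean distance between the two points. It assumes both value pairs describe the same vowel. The function name `formant_shift` is hypothetical, and a plain Hz distance is only one of several ways to compare vowels.

```python
import math

# Illustrative values from the poster's figure, in Hz (not measurements).
northern = {"F1": 500.0, "F2": 3000.0}
southern = {"F1": 1000.0, "F2": 2000.0}


def formant_shift(a, b):
    """Return (delta F1, delta F2, Euclidean distance) going from a to b."""
    d_f1 = b["F1"] - a["F1"]
    d_f2 = b["F2"] - a["F2"]
    return d_f1, d_f2, math.hypot(d_f1, d_f2)


d_f1, d_f2, distance = formant_shift(northern, southern)
print(f"F1 shift: {d_f1:+.0f} Hz, F2 shift: {d_f2:+.0f} Hz, distance: {distance:.0f} Hz")
```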

Problem: Vowel Formant Extraction

wave

spectrogram

phones

Northern speaker

F1: 500Hz

F2: 3000Hz

Southern speaker

F1: 1000Hz

F2: 2000Hz

Transcription “paper”

Alignment

Formant Extraction

DARLA!

darla.dartmouth.edu

Existing Tools

Automate transcription with speech recognition… but isn’t speech recognition inaccurate?

Insight: stressed vowels are usually correct

•  Filter out vowels with low acoustic confidence.

•  Result: Formants from the completely automated system ≅ formants from the semi-automated system.
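The filtering idea can be sketched as follows. This is an assumption-laden illustration, not DARLA's code. The `VowelToken` record, the confidence scale from 0 to 1, the 0.8 threshold, and the sample tokens are all hypothetical. The sketch also assumes ARPAbet labels that end in a stress digit, where `1` marks primary stress, and it combines the stress and confidence criteria, which the poster does not spell out.

```python
from dataclasses import dataclass


@dataclass
class VowelToken:
    phone: str         # ARPAbet vowel with stress digit, e.g. "AY1"
    confidence: float  # hypothetical acoustic confidence in [0, 1]
    f1: float          # Hz
    f2: float          # Hz


def keep_reliable(tokens, min_confidence=0.8):
    """Keep primary-stressed vowels whose confidence meets the threshold."""
    return [t for t in tokens
            if t.phone.endswith("1") and t.confidence >= min_confidence]


# Made-up tokens for illustration only.
tokens = [
    VowelToken("AY1", 0.93, 720.0, 1400.0),  # stressed, confident: kept
    VowelToken("AH0", 0.95, 600.0, 1300.0),  # unstressed: dropped
    VowelToken("EH1", 0.41, 580.0, 1800.0),  # low confidence: dropped
]
for token in keep_reliable(tokens):
    print(token.phone, token.f1, token.f2)
```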

Our Idea

REF: no it’s it’s wood turning
HYP: no it it would turn it
REF: a real dog and cat and all the others
HYP: a real docking tap and on the others
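One common way to quantify how far a recognizer's hypothesis (HYP) is from the reference (REF) is word error rate. The poster does not report this metric, so treat the sketch below as an illustration only. It computes word-level edit distance with dynamic programming and applies it to the two REF/HYP pairs above. The function names are hypothetical.

```python
def word_edit_distance(ref_words, hyp_words):
    """Minimum number of word substitutions, insertions, and deletions."""
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def word_error_rate(ref, hyp):
    """Edit distance divided by the number of reference words."""
    ref_words = ref.split()
    return word_edit_distance(ref_words, hyp.split()) / len(ref_words)


pairs = [
    ("no it's it's wood turning", "no it it would turn it"),
    ("a real dog and cat and all the others",
     "a real docking tap and on the others"),
]
for ref, hyp in pairs:
    print(f"WER = {word_error_rate(ref, hyp):.2f}  ({ref!r})")
```

Both sentences contain many word errors, which is why the approach relies on vowel-level filtering instead of requiring an accurate transcript.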

[Figure: "Vowel Space" plot with F2 on one axis (ticks from 1000 to 2000) and F1 on the other (ticks from 400 to 700), overlaying two series, Obama_Manual and Obama_Automated. Each series shows the vowels IY, IH, EY, EH, AE, AA, AO, AH, OW, UH, UW, ER, AY, AW, OY.]

Where is Obama from?

Semi-Automated

Completely Automated

•  Speech recognition with CMU Pocketsphinx
•  Generic English acoustic models trained on LibriSpeech (400 hours), language models on WSJ and Fisher transcripts.
•  Alignment and formant extraction with FAVE.
•  Web interface accepts files or YouTube links.
•  Processing time is about 3x the audio length.

Implementation
