detecting intoxicated speech - columbia universitydgw2109/presentation.pdf · alc – alcohol...


Detecting Intoxicated Speech

Daniel Wilkey, John Graham (CS6998)

Given speech, was the speaker intoxicated?

Interspeech 2011 Intoxication Challenge

Application for field sobriety testing, ignition-guards

Background

ALC – Alcohol Language Corpus

162 total participants: 84 male, 78 female

Participants reached a BAC of 0.28 to 1.75 per mille

Read 15 minutes of intoxicated speech

Returned 2 weeks later

Read 30 minutes of sober speech

The Corpus

5400 samples in total, 75 per person

Divided into 3 sets:

Development, Training, Test

Development & Training are labeled with 4368 features

Used cross validation to obtain results

The Corpus p2

Shrikanth Narayanan of USC

Global speaker normalization

Normalizing by the sober class

Relative improvement of 7.04% overall

Professor Hirschberg

Phonotactic and phonetic cues

Experiment tests un-weighted average recall… why?

We chose f-measure

Includes recall and precision
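To make the contrast between the two metrics concrete, here is a minimal sketch in plain Python. The label names "A" (alcoholised) and "N" (non-alcoholised) and all function names are assumptions chosen for illustration; they are not taken from the challenge tooling.

```python
def f_measure(y_true, y_pred, positive):
    """F-measure (harmonic mean of precision and recall) for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def unweighted_average_recall(y_true, y_pred):
    """Mean of the per-class recalls, each class counted equally."""
    recalls = []
    for cls in sorted(set(y_true)):
        total = sum(1 for t in y_true if t == cls)
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        recalls.append(hits / total)
    return sum(recalls) / len(recalls)


# Hypothetical labels, for illustration only.
y_true = ["A", "A", "N", "N", "N", "N"]
y_pred = ["A", "N", "N", "N", "N", "A"]
print(unweighted_average_recall(y_true, y_pred))
print(f_measure(y_true, y_pred, positive="A"))
```

Unweighted average recall ignores precision entirely, while the F-measure penalises false positives as well as misses.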

Prior Research

Remove extraneous features with WEKA

Info-gain ratio algorithm

MFCC features performed well

No F0-based features near the top
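The ranking criterion above can be sketched as follows. This is not WEKA's implementation: it assumes each feature has already been discretised into bins (the bin names "lo" and "hi" are hypothetical), and it only illustrates the gain-ratio formula, information gain divided by the entropy of the split itself.

```python
import math
from collections import Counter


def entropy(items):
    """Shannon entropy (in bits) of a list of discrete values."""
    n = len(items)
    return -sum((c / n) * math.log2(c / n) for c in Counter(items).values())


def gain_ratio(values, labels):
    """Information gain of a discretised feature divided by its split information."""
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    # Expected class entropy after splitting on the feature value.
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - conditional
    split_info = entropy(values)
    return info_gain / split_info if split_info > 0 else 0.0


# Hypothetical toy data: one informative and one uninformative feature.
labels = ["N", "N", "A", "A"]
features = {
    "informative": ["lo", "lo", "hi", "hi"],
    "uninformative": ["lo", "hi", "lo", "hi"],
}
ranking = sorted(features, key=lambda f: gain_ratio(features[f], labels), reverse=True)
print(ranking)
```

Features at the bottom of such a ranking are the candidates for removal.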

Experiment Preparation

Ignore test set

unlabeled

Down-sampling the training set

Achieved 50/50 ratio of alcoholised to non-alcoholised speech
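A minimal sketch of down-sampling to a balanced set, assuming random selection without replacement from the larger class. The slides do not say how samples were chosen, so the random choice, the seed, and the function name are assumptions.

```python
import random


def downsample_to_balance(samples, labels, seed=0):
    """Randomly drop samples so every class keeps as many as the smallest class."""
    rng = random.Random(seed)
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)
    target = min(len(items) for items in by_class.values())
    balanced = []
    for label, items in by_class.items():
        for sample in rng.sample(items, target):
            balanced.append((sample, label))
    rng.shuffle(balanced)
    return balanced


# Hypothetical toy set: 3 alcoholised ("A") and 7 non-alcoholised ("N") samples.
balanced = downsample_to_balance(list(range(10)), ["A"] * 3 + ["N"] * 7)
print(len(balanced))
```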

Experiment Preparation

Global speaker normalization (Narayanan): insignificant negative change

Sober class normalization (Narayanan): insignificant negative change

Gender class normalization: insignificant positive change

Combining global speaker with gender normalization: 10.75% relative improvement in f-measure

Poor performance potentially related to some F0 features being filtered out
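One plausible reading of these normalization schemes is per-group z-scoring of each feature, sketched below. This is an assumption, not the method confirmed by the slides: grouping by speaker stands in for global speaker normalization, grouping by gender for gender normalization, and restricting the statistics to a reference subset (the sober recordings) for sober class normalization.

```python
import statistics


def z_normalize_by_group(values, groups, reference_mask=None):
    """Z-score one feature within each group (speaker id, gender, ...).

    reference_mask optionally marks the samples used to estimate the mean
    and standard deviation, e.g. only the sober recordings. Every group
    must have at least one reference sample.
    """
    if reference_mask is None:
        reference_mask = [True] * len(values)
    reference = {}
    for value, group, use in zip(values, groups, reference_mask):
        if use:
            reference.setdefault(group, []).append(value)
    normalized = []
    for value, group in zip(values, groups):
        pool = reference[group]
        mean = statistics.mean(pool)
        std = statistics.pstdev(pool)
        normalized.append((value - mean) / std if std > 0 else 0.0)
    return normalized


# Hypothetical feature values for two speakers.
values = [1.0, 3.0, 10.0, 14.0]
speakers = ["s1", "s1", "s2", "s2"]
print(z_normalize_by_group(values, speakers))
```

Under this reading, combining two schemes amounts to applying the function twice with different group labels.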

Normalization Attempts

Tried retesting data with fringe cases omitted

Fringe case BAC between .08% and .16% proposed by Batliner

We tried .02% to .08%

Difference in data set and threshold

Relative decrease of F-measure by 3.25%
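The fringe-case filter and the relative change reported here can be sketched as below. The record layout (a dict with a "bac" key), the inclusive bounds, and the example numbers are all assumptions made for illustration; the BAC values use the same units as the thresholds on the slide.

```python
def drop_fringe(samples, low, high):
    """Keep only samples whose BAC lies outside the band [low, high]."""
    return [s for s in samples if not (low <= s["bac"] <= high)]


def relative_change(baseline, new):
    """Change relative to the baseline; negative values mean a decrease."""
    return (new - baseline) / baseline


# Hypothetical samples, filtered with the 0.02 to 0.08 band tried above.
samples = [{"bac": 0.0}, {"bac": 0.05}, {"bac": 0.12}]
kept = drop_fringe(samples, low=0.02, high=0.08)
print(len(kept))
```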

On the Fringe

Machine Learning Optimizations

Optimizing the SVM

Varied polynomial kernels

Radial basis function (RBF)

Varying number

Folds

Iterations
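For reference, the two kernel families mentioned above can be written out in plain Python. The parameter names `degree`, `gamma`, and `coef0` follow common SVM conventions and are assumptions here; the slides do not give the values that were tried.

```python
import math


def polynomial_kernel(x, y, degree=3, gamma=1.0, coef0=1.0):
    """K(x, y) = (gamma * <x, y> + coef0) ** degree."""
    dot = sum(a * b for a, b in zip(x, y))
    return (gamma * dot + coef0) ** degree


def rbf_kernel(x, y, gamma=1.0):
    """K(x, y) = exp(-gamma * ||x - y||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


# Hypothetical two-dimensional feature vectors.
print(polynomial_kernel([1.0, 2.0], [3.0, 4.0], degree=3))
print(rbf_kernel([1.0, 0.0], [0.0, 0.0]))
```

Varying the polynomial degree changes how strongly feature interactions are weighted, while the RBF kernel depends only on the distance between the two vectors.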

Optimization Techniques

Configuration

SVM kernel n=3

10-fold cross validation

Gender normalization

Sober class normalization
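The 10-fold cross validation in this configuration can be sketched as an index splitter: each sample is held out exactly once while the classifier trains on the other nine folds. The shuffling, the seed, and the function name are assumptions; the slides do not describe how folds were drawn.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# Hypothetical set of 25 samples split into 10 folds.
for train, test in k_fold_indices(25, k=10):
    print(len(train), len(test))
```

The per-fold scores would then be averaged to give the reported figure.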

Final Results: difficult to compare!

Difficult to compare results

Need better corpus

Extend with GMM super-vectors

Conclusions / Extensions
