practical hidden voice attacks against speech and speaker

Practical Hidden Voice Attacks against Speech and Speaker Recognition Systems

Hadi Abdullah, Washington Garcia, Christian Peeters, Patrick Traynor, Kevin R. B. Butler, and Joseph Wilson

Florida Institute for Cybersecurity Research

Voice as an Interface

Injecting Commands

Injecting Commands (Demonstration)

You might be thinking...

●  Benign?
●  Easy to Defend?

Is there a generic, transferable way to produce audio that:

●  sounds like noise to humans,
●  sounds like a valid command to the system,
●  works against both speech and speaker recognition systems,
●  with black-box access to the target system?

Our Work!

Structure

●  Modern Speech Recognition Systems
●  Feature Extraction
●  How the Attack Works
●  Demo
●  Takeaway

Modern Speech Recognition Systems

[Figure: pipeline diagram, Audio Sample -> Preprocessing -> Feature Extraction -> Inference]
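The four pipeline stages can be sketched in a few lines of Python. This is only a toy illustration: the moving-average filter, the frame length, and the "dominant bin" stand-in for the model are all assumptions made here, not details taken from the slides, and every function name is hypothetical.

```python
import cmath
import math

def preprocess(samples, taps=3):
    """Placeholder preprocessing: a moving-average low-pass filter (assumed)."""
    out = []
    for i in range(len(samples)):
        window = samples[max(0, i - taps + 1): i + 1]
        out.append(sum(window) / len(window))
    return out

def magnitude_dft(frame):
    """Magnitude spectrum of one frame (naive DFT; real systems use an FFT)."""
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t in range(n)))
            for k in range(n // 2 + 1)]

def extract_features(samples, frame_len=16):
    """Split into non-overlapping frames and keep only the magnitudes."""
    frames = [samples[i:i + frame_len]
              for i in range(0, len(samples) - frame_len + 1, frame_len)]
    return [magnitude_dft(f) for f in frames]

def infer(features):
    """Stand-in for the model: report the strongest frequency bin per frame."""
    return [max(range(len(spec)), key=spec.__getitem__) for spec in features]

def recognize(samples):
    """Audio Sample -> Preprocessing -> Feature Extraction -> Inference."""
    return infer(extract_features(preprocess(samples)))

# A pure tone with two cycles per 16-sample frame.
tone = [math.sin(2 * math.pi * 2 * t / 16) for t in range(64)]
print(recognize(tone))
```

Note that `infer` only ever sees the output of `extract_features`, so anything the earlier stages discard is invisible to the model.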


Most Attacks*

* Prior work:
●  N. Carlini and D. Wagner, “Audio Adversarial Examples: Targeted Attacks on Speech-to-Text,” IEEE Deep Learning and Security Workshop, 2018.
●  N. Carlini, P. Mishra, T. Vaidya, Y. Zhang, M. Sherr, C. Shields, D. Wagner, and W. Zhou, “Hidden voice commands,” in USENIX Security Symposium, 2016.
●  X. Yuan, Y. Chen, Y. Zhao, Y. Long, X. Liu, K. Chen, S. Zhang, H. Huang, X. Wang, and C. A. Gunter, “CommanderSong: A systematic approach for practical adversarial voice recognition,” in Proceedings of the USENIX Security Symposium, 2018.
●  M. Alzantot, B. Balaji, and M. Srivastava, “Did you hear that? Adversarial examples against automatic speech recognition,” NIPS 2017 Machine Deception Workshop.

[Figure: pipeline diagram contrasting Our Attack with Most Attacks*]

MODEL DOES NOT MATTER!!


Feature Extraction

●  Designed to approximate the human ear
●  Retains the most important features
●  Magnitude Fast Fourier Transform (mFFT)

Feature Extraction (mFFT)

●  Converts time domain to frequency domain
●  Multiple inputs can have the same output
●  mFFT is lossy
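A small check makes the "multiple inputs, same output" point concrete. The sketch below assumes the mFFT is simply the magnitude of the discrete Fourier transform of a frame. It reverses a real-valued frame in time and compares the two magnitude spectra. The naive DFT and the test signal are illustrative choices made here; a real system would use an FFT library.

```python
import cmath
import math

def mfft(signal):
    """Magnitude spectrum via a naive DFT (illustrative only)."""
    n = len(signal)
    return [abs(sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t in range(n)))
            for k in range(n)]

# A short, arbitrary real-valued "audio" frame.
frame = [math.sin(0.3 * t) + 0.5 * math.cos(1.1 * t) for t in range(32)]
reversed_frame = frame[::-1]

# Reversing a real signal changes only the phase of each DFT coefficient,
# and the magnitude step throws the phase away.
same_magnitudes = all(math.isclose(a, b, abs_tol=1e-9)
                      for a, b in zip(mfft(frame), mfft(reversed_frame)))
print(frame != reversed_frame, same_magnitudes)
```

The two frames differ sample by sample, yet the feature extractor cannot tell them apart. That is the sense in which the mFFT is lossy.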



How the Attack Works (Types)

●  Grouped into 4 types
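The slide does not list the four types. As an illustration only, the sketch below assumes two plausible ones: reversing the samples inside short windows (time-domain inversion) and adding a high-frequency tone. The window size, the tone's bin, and the `CUTOFF` bin are hypothetical values chosen so that the effect is exact. In particular, the windows here are assumed to line up with the recognizer's own frames.

```python
import cmath
import math

def mfft(frame):
    """Magnitude spectrum of one frame (naive DFT, illustrative only)."""
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t in range(n)))
            for k in range(n // 2 + 1)]

def windows(signal, size):
    """Non-overlapping windows; a partial tail is dropped."""
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, size)]

def time_domain_inversion(signal, size):
    """Reverse the samples inside each window of `size` samples."""
    out = []
    for w in windows(signal, size):
        out.extend(reversed(w))
    return out

def high_frequency_addition(signal, size, bin_index, amplitude=0.5):
    """Add a tone that falls exactly on DFT bin `bin_index` of each window."""
    return [s + amplitude * math.sin(2 * math.pi * bin_index * t / size)
            for t, s in enumerate(signal)]

def close(a, b):
    return all(math.isclose(x, y, abs_tol=1e-9) for x, y in zip(a, b))

SIZE = 32
signal = [math.sin(2 * math.pi * 3 * t / SIZE) * (1 + 0.01 * t)
          for t in range(4 * SIZE)]

# Per-window magnitude spectra survive the inversion unchanged.
inverted = time_domain_inversion(signal, SIZE)
tdi_preserved = all(close(mfft(a), mfft(b))
                    for a, b in zip(windows(signal, SIZE), windows(inverted, SIZE)))

# Bins below a hypothetical cutoff are untouched by the added tone.
CUTOFF = 8
noisy = high_frequency_addition(signal, SIZE, bin_index=12)
low_bins_preserved = all(close(mfft(a)[:CUTOFF], mfft(b)[:CUTOFF])
                         for a, b in zip(windows(signal, SIZE), windows(noisy, SIZE)))
print(tdi_preserved, low_bins_preserved)
```

Both transformations change how the waveform sounds while leaving the features examined here intact.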

How the Attack Works (Psychoacoustics)

●  Intelligibility hard to measure
●  Fundamentals of psychoacoustics
●  Spread energy across spectrum

How the Attack Works (Evaluation)

●  Task: Noise -> text, Noise -> user
●  Data: > 20,000 successful attack samples, 22 speakers
●  Queries: < 10 queries to model (a few seconds!)
●  Models: Speaker, Speech (x2)
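One way to picture a query budget of fewer than 10 is a short search over perturbation strengths against a black-box transcriber. The loop below is a hypothetical sketch, not the procedure from the talk. `find_attack_audio`, `toy_perturb`, and `toy_transcribe` are invented names, and the toy model only exists so the example runs.

```python
def find_attack_audio(command_audio, perturb, transcribe, target_text,
                      strengths, max_queries=10):
    """Try the strongest perturbations first and stop at the first one the
    black-box model still transcribes as `target_text`."""
    queries = 0
    for strength in sorted(strengths, reverse=True):
        if queries >= max_queries:
            break
        candidate = perturb(command_audio, strength)
        queries += 1  # each call to `transcribe` counts as one query
        if transcribe(candidate) == target_text:
            return candidate, strength, queries
    return None, None, queries


# Toy stand-ins so the sketch runs; none of this models a real system.
TEMPLATE = [0.0, 1.0, 0.0, -1.0] * 4

def toy_perturb(audio, strength):
    """Add an alternating offset whose size grows with `strength`."""
    return [a + strength * (1 if i % 2 == 0 else -1)
            for i, a in enumerate(audio)]

def toy_transcribe(audio):
    """Pretend model: recognizes the command if audio stays near TEMPLATE."""
    deviation = sum(abs(a - t) for a, t in zip(audio, TEMPLATE)) / len(TEMPLATE)
    return "open the door" if deviation < 0.5 else ""

audio, strength, queries = find_attack_audio(
    TEMPLATE, toy_perturb, toy_transcribe, "open the door",
    strengths=[0.1, 0.3, 0.6, 0.9])
print(strength, queries)
```

The search only needs the model's output text, which is what black-box access means here: no weights, gradients, or confidence scores are used.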



Demo! (1/2)

More at: https://sites.google.com/view/practicalhiddenvoice/home

Demo! (2/2)



Takeaway

●  Simple, efficient audio transformations yield “noise” that is understood as commands by speech systems
●  The model is irrelevant
●  All systems we tested are vulnerable
●  Achieve the same goals as traditional Adversarial ML

[email protected]

hadiabdullah1

Project webpage: sites.google.com/view/practicalhiddenvoice/home

Voice Biometrics

●  Easy to get around
●  Artificially generate a target’s speech
   ○  LyreBird
●  Capture and stitch together target’s speech

Defenses

●  Must be implemented at or before feature extraction
●  Adversarial Training?
●  Voice Activity Detection?
●  Environmental Noise
●  Liveness Detection
   ○  Blue et al.*

* L. Blue, L. Vargas, and P. Traynor, “Hello, is it me you’re looking for? Differentiating between human and electronic speakers for voice interface security,” in 11th ACM Conference on Security and Privacy in Wireless and Mobile Networks, 2018.
