
Page 1: Privacy Protection for Life-log Video

Privacy Protection for Life-log Video

Jayashri Chaudhari

November 27, 2007

Department of Electrical and Computer Engineering, University of Kentucky, Lexington, KY 40507

Page 2: Privacy Protection for Life-log Video

Outline

Motivation and Background

Proposed Life-Log System

Privacy Protection Methodology
• Face Detection and Blocking
• Voice Segmentation and Distortion

Experimental Results
• Segmentation Algorithm Analysis
• Audio Distortion Analysis

Conclusions

Page 3: Privacy Protection for Life-log Video

What is a Life-Log System?

“A system that records everything, at every moment and everywhere you go”

Applications include
• Law enforcement
• Police questioning
• Tourism
• Medical questioning
• Journalism

Existing systems and work

1) “MyLifeBits” project, Microsoft Research

2) “WearCam” project, Steve Mann, University of Toronto

3) “Cylon Systems” (http://cylonsystems.com), UK: a portable body-worn surveillance system

Page 4: Privacy Protection for Life-log Video

Technical Challenges

Security and Privacy
Information Management and Storage
Information Retrieval
Knowledge Discovery
Human Computer Interface


Page 6: Privacy Protection for Life-log Video

Why Privacy Protection?

Privacy is a fundamental right of every citizen
Emerging technologies threaten that right
There are no clear and uniform rules and regulations regarding video recording
People are resistant toward technologies like life-logging
Without tackling these issues, the deployment of such emerging technologies is impossible

Page 7: Privacy Protection for Life-log Video

Research Contributions

Practical audio-visual privacy protection scheme for life-log systems

Performance measurement (audio) on
• Privacy protection
• Usability

Page 8: Privacy Protection for Life-log Video

Proposed Life-log System

“A system that protects the audiovisual privacy of the persons captured by a portable video recording device”

Page 9: Privacy Protection for Life-log Video

Privacy Protection Scheme

Design Objectives

• Privacy: hide the identity of the subjects being captured

• Privacy versus usefulness: the recording should still convey sufficient information to be useful

The goal is a recording that offers both: not usefulness without privacy, and not privacy without usefulness, but usefulness and privacy together.

Page 10: Privacy Protection for Life-log Video

Design Objectives

Anonymity or Ambiguity
• The scheme should make the identity of the recorded subjects ambiguous
• Every individual will look and sound identical, reducing correlation attacks

Speed
• The protection scheme should work in real time

Interview Scenario
• The producer is speaking with a single subject in a relatively quiet room

Page 11: Privacy Protection for Life-log Video

Privacy Protection Scheme Overview

Audio path: audio → Audio Segmentation → Audio Distortion

Video path: video → Face Detection and Blocking

Both streams → Synchronization & Multiplexing → storage

S: Subject (the person who is being recorded)

P: Producer (the person who is the user of the system)

Page 12: Privacy Protection for Life-log Video

Voice Segmentation and Distortion

For each audio window k, the windowed power Pk is computed and compared against two thresholds, TS and TP. Depending on the outcome, the state is set to Statek = Subject, Statek = Producer, or left unchanged (Statek = Statek-1). Segments attributed to the subject are passed through pitch shifting before storage.
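A minimal sketch of one plausible reading of this state machine, assuming the producer wears the microphone and therefore produces the highest windowed power; the threshold values and the tie-breaking rule are illustrative assumptions, not taken from the slides.

```python
import numpy as np

def segment_audio(x, win_len=1024, t_s=0.01, t_p=0.05):
    """Energy-based two-speaker segmentation sketch.

    Assumed reading of the flow chart: producer speech has the highest
    windowed power, subject speech is quieter, and anything below both
    thresholds keeps the previous state. t_s and t_p are illustrative.
    """
    states, prev = [], "silence"
    for start in range(0, len(x) - win_len + 1, win_len):
        frame = x[start:start + win_len].astype(float)
        p_k = np.mean(frame ** 2)          # windowed power P_k
        if p_k >= t_p:
            state = "producer"
        elif p_k >= t_s:
            state = "subject"
        else:
            state = prev                   # State_k = State_{k-1}
        states.append(state)
        prev = state
    return states
```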

We use the PitchSOLA time-domain pitch shifting method.

* “DAFX: Digital Audio Effects” by Udo Zölzer et al.

Page 13: Privacy Protection for Life-log Video

Pitch Shifting Algorithm

Pitch Shifting (Synchronous Overlap and Add):

Step 1) Time stretching by a factor of α: windows of size N are taken from the input with analysis step size Sa and overlap-added at synthesis step size α·Sa. A maximum-correlation search aligns each window before mixing, reducing discontinuities in phase and pitch.

Step 2) Re-sampling by a factor of 1/α restores the original duration and scales the pitch by a factor of α.
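A rough, self-contained sketch of the two steps in Python/NumPy; the Hann window and the `seek` correlation search radius are my assumptions, and real PitchSOLA implementations (such as the DAFX one cited above) differ in detail.

```python
import numpy as np

def pitch_shift_sola(x, alpha, N=2048, Sa=256, seek=128):
    """Rough PitchSOLA sketch: time-stretch by alpha with synchronized
    overlap-add, then resample by 1/alpha to scale the pitch by alpha."""
    x = np.asarray(x, dtype=float)
    Ss = int(round(alpha * Sa))                 # synthesis step size alpha * Sa
    win = np.hanning(N)
    out = np.zeros(int(len(x) * alpha) + N)
    norm = np.zeros_like(out)                   # window-sum for normalization
    t_in = t_out = 0
    while t_in + N + seek < len(x):
        # Step 1: pick the input offset (within +/- seek) whose frame best
        # correlates with what is already written, to reduce phase/pitch
        # discontinuities, then overlap-add (mixing).
        best_k, best_c = 0, -np.inf
        target = out[t_out:t_out + N]
        if np.any(target):
            for k in range(-min(seek, t_in), seek + 1):
                c = np.dot(x[t_in + k:t_in + k + N], target)
                if c > best_c:
                    best_c, best_k = c, k
        out[t_out:t_out + N] += x[t_in + best_k:t_in + best_k + N] * win
        norm[t_out:t_out + N] += win
        t_in += Sa
        t_out += Ss
    stretched = out[:t_out] / np.maximum(norm[:t_out], 1e-8)
    # Step 2: resample by 1/alpha; duration returns to roughly the original
    # length and the pitch is scaled by alpha.
    idx = np.arange(0, len(stretched) - 1, alpha)
    return np.interp(idx, np.arange(len(stretched)), stretched)
```

With the parameters reported later for Distortion 1, this would be called as pitch_shift_sola(x, alpha=1.5, N=2048, Sa=256).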

Page 14: Privacy Protection for Life-log Video

Face Detection and Blocking

Pipeline: camera → Face Detection → Face Tracking → Subject Selection → Selective Blocking

Face detection is based on Viola & Jones 2001. The audio segmentation results (subject talking vs. producer talking) drive the subject selection step, so that blocking is applied selectively to the subject's face.
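A minimal sketch of Viola-Jones detection plus blocking using OpenCV's bundled Haar cascade; the tracking and audio-driven subject selection stages are omitted, and the camera index is a stand-in for the wearable camera.

```python
import cv2

# Viola-Jones (Haar cascade) face detection with simple blocking.
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

cap = cv2.VideoCapture(0)                       # stand-in for the wearable camera
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    for (x, y, w, h) in faces:
        frame[y:y + h, x:x + w] = 0             # block the detected face region
    cv2.imshow("protected", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):       # press 'q' to stop
        break
cap.release()
cv2.destroyAllWindows()
```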

Page 15: Privacy Protection for Life-log Video

Initial Experiments1

• Analysis of Segmentation algorithm

• Analysis of Audio distortion algorithm

1) Accuracy in hiding identity

2) Usability after distortion

1: Chaudhari J., S.-C. Cheung, and M. V. Venkatesh. Privacy protection for life-log video. In IEEE Signal Processing Society SAFE 2007: Workshop on Signal Processing Applications for Public Security and Forensics, 2007.

Page 16: Privacy Protection for Life-log Video

Segmentation Experiment

Experimental Data:

• Interview scenario in a quiet meeting room

• Three interview recordings, each about 1 minute 30 seconds long

Each recording contains a sequence of transitions between segments in which the producer is speaking, the subject is speaking, or there is silence.

S: Subject speaking

P: Producer speaking

Page 17: Privacy Protection for Life-log Video

Segmentation Results

Meeting #   Transitions (ground truth)   Correctly identified transitions   Falsely detected transitions   Precision   Recall
1           7                            6                                  10                             0.375       0.857
2           7                            7                                  5                              0.583       1
3           6                            6                                  10                             0.353       1

Recall = (# correctly identified transitions) / (# transitions in ground truth)

Precision = (# correctly identified transitions) / (# identified transitions)
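A small sketch of these two metrics, checked against the first row of the table (6 correct, 10 false detections, 7 ground-truth transitions).

```python
def precision_recall(correct, false_detections, ground_truth):
    """Precision and recall as defined on the slide."""
    identified = correct + false_detections        # all detected transitions
    return correct / identified, correct / ground_truth

# Meeting 1 from the table: precision = 6/16 = 0.375, recall = 6/7 ~ 0.857
print(precision_recall(6, 10, 7))
```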

Page 18: Privacy Protection for Life-log Video

Comparison With CMU Segmentation Algorithm

Meeting #   Our Algorithm (Precision / Recall)   CMU Algorithm (Precision / Recall)
1           0.375 / 0.857                        0.667 / 0.57
2           0.583 / 1                            1 / 0.57
3           0.353 / 1                            0.4 / 0.5

CMU audio segmentation algorithm1 used as benchmark

1:Matthew A. Seigler, Uday Jain, Bhiksha Raj, and Richard M. Stern. Automatic segmentation, classification and clustering of broadcast news audio. In Proceedings of the Ninth Spoken Language Systems Technology Workshop, Harriman, New York, 1997.

Page 19: Privacy Protection for Life-log Video

Speaker Identification Experiment

Experimental Data

• 11 Test subjects, 2 voice samples from each subject

• One voice sample is used as training and the other is used for testing

• Public domain speaker recognition software

Script 1 (used for training): “This script is used for training the speaker recognition software.”

Script 2 (used for testing): “This script is used to test the performance of audio distortion in hiding the identity.”

Page 20: Privacy Protection for Life-log Video

Speaker Identification Results

Person ID    Without Distortion    Distortion 1    Distortion 2    Distortion 3
             (ID identified)       (ID identified) (ID identified) (ID identified)
1            1                     5               8               5
2            2                     6               8               6
3            3                     5               3               5
4            4                     6               6               5
5            5                     3               10              6
6            6                     8               6               5
7            7                     5               2               5
8            8                     10              11              5
9            9                     5               8               5
10           10                    5               2               5
11           11                    4               8               5
Error Rate   0%                    100%            90.9%           100%

Distortion 1: (N=2048, Sa=256, α=1.5)

Distortion 2: (N=2048, Sa=300, α=1.1)

Distortion 3: (N=1024, Sa=128, α=1.5)

Page 21: Privacy Protection for Life-log Video

Usability Experiments

Experimental Data

• 8 subjects, 2 voice samples from each subject

• One voice sample is used without distortion and the other is distorted

• Manual transcription (5 human testers); “---” marks words the testers could not recognize

1.wav (transcription 1): “This transcription is of undistorted voice --- stored in one dot wav file.”

2.wav (transcription 2): “This transcription is of distorted voice sample --- in two dot wav ---.”

Page 22: Privacy Protection for Life-log Video

Usability after distortion

Word Error Rate (WER): the standard measure of word recognition error for a speech recognition system

WER = (S + D + I) / N

S = # substitutions

D = # deletions

I = # insertions

N = # words in the reference sample

Tool used: NIST tool SCLITE
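A minimal word-level edit-distance sketch of the same measure (SCLITE additionally reports the aligned substitution/deletion/insertion breakdown); the example sentences are made up.

```python
def word_error_rate(reference, hypothesis):
    """WER sketch: word-level edit distance divided by # reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = minimum edits to turn ref[:i] into hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i                      # i deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j                      # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j - 1] + sub,   # substitution / match
                          d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1)         # insertion
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

# Example: one substitution out of four reference words -> WER = 0.25
print(word_error_rate("the subject is speaking", "the subject was speaking"))
```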

Page 23: Privacy Protection for Life-log Video

Extended Experiments

Data set
• TIMIT (Texas Instruments / Massachusetts Institute of Technology) Speech Corpus

Experimental Setup
• Allowable range of alpha (α): 0.2-2.0
• Five alpha values (α = 0.5, 0.75, 1, 1.25, 1.40)
• Increase the scope of the experiments
• “Subjective Experiments”: use testers to assess privacy and usability

Privacy Experiments (Speaker Identification)

Page 24: Privacy Protection for Life-log Video

Experimental Setup

TIMIT Corpus: 630 speakers, 10 audio clips per speaker

Our experiments: 30 speakers, 5 audio clips per speaker

Five sets, one per alpha value: Set A (α=1), Set B (α=0.5), Set C (α=0.75), Set D (α=1.25), Set E (α=1.40)

• Total of 30 audio clips in each set
• The audio clips from each set are re-divided into five groups (1-5)
• Each group consists of 6 audio clips randomly selected from each set
• Each group was assigned to three testers, who were asked to perform 3 tasks (a sketch of this grouping appears below)
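A small sketch of how such a grouping could be produced; the clip file names are placeholders and the exact randomization used in the experiments is not described on the slides.

```python
import random

# Hypothetical re-grouping sketch: 30 clips per set (A-E) are shuffled and
# split into five groups of six clips per set; each group is then assigned
# to three testers.
random.seed(0)
groups = {g: {} for g in range(1, 6)}
for set_name in "ABCDE":
    clips = [f"set{set_name}_speaker{i:02d}.wav" for i in range(1, 31)]
    random.shuffle(clips)
    for g in range(1, 6):
        groups[g][set_name] = clips[(g - 1) * 6: g * 6]   # 6 clips from this set

# e.g. the six Set B clips heard by the three testers assigned to group 3
print(groups[3]["B"])
```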

Page 25: Privacy Protection for Life-log Video

Subjective Experiments

Task 1: Transcribe the audio clips in the assigned group.

Purpose: Determine the usability of the recording after distortion.

Results metric:
• WER for each transcription by each tester
• Average WER for each clip from the 3 testers
• WER per speaker for the given alpha (α) value

Page 26: Privacy Protection for Life-log Video

[Chart: effect of distortion on WER — average WER percentage per person (ID 1-30) for Sets A, B, C, D, and E.]

Page 27: Privacy Protection for Life-log Video

[Charts: average WER per speaker for each alpha value — Sets A, C, D, and E plotted per person (ID 1-30); per-panel y-axis ranges 0-30, 0-60, 0-35, and 0-35.]

Page 28: Privacy Protection for Life-log Video

Average WER per Set

Set        A      B      C      D      E
Avg WER    14.2   100    22.4   15.3   14.4

Page 29: Privacy Protection for Life-log Video

Statistical Analysis: Z-test Calculations

Null hypothesis: for a given value of the pitch scaling parameter (alpha), the average WER does not change from Set A (before distortion) after the distortion.

H0: p1 = p2 (null hypothesis), Ha: p1 != p2

Z-Test parameters

Parameter                  Value
Population size            12 × 30 = 360
α (significance level)     0.05
Confidence level           95%
Z critical (|Zα/2|)        1.96
Rule for rejecting H0      Z >= Zα/2 or Z <= -Zα/2

Z-Test results

Comparison            Statistic
Set A and B (0.50)    46.71 >= 1.96
Set A and C (0.75)    2.873 >= 1.96
Set A and D (1.25)    0.419 <= 1.96
Set A and E (1.40)    0.0695 <= 1.96
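An illustrative sketch of a two-proportion z-test consistent with these parameters; treating each set's average WER as an error proportion over 360 words (12 × 30) is my assumption, since the slide does not show the exact counts behind each statistic.

```python
from math import sqrt

def two_proportion_z(p1, n1, p2, n2):
    """Two-proportion z-test statistic for H0: p1 == p2."""
    p_pool = (p1 * n1 + p2 * n2) / (n1 + n2)            # pooled proportion
    se = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

# Set A (14.2%) vs. Set C (22.4%) over 360 words gives |z| ~ 2.85, close to
# the 2.873 reported above; reject H0 at the 95% level when |z| >= 1.96.
z = two_proportion_z(0.142, 360, 0.224, 360)
print(abs(z), abs(z) >= 1.96)
```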

Page 30: Privacy Protection for Life-log Video

Subjective Experiments

Task 2: Identify the number of distinct voices in each subset in the assigned group.

Purpose: Estimate the ambiguity created by pitch shifting.

Results: average number of distinct voices per subset (each subset consists of 6 audio clips)

Group     Subset of A   Subset of B   Subset of C   Subset of D   Subset of E
1         6.0           3.33          4.33          4.0           3.33
2         6.0           3.0           3.33          4.0           4.0
3         6.0           2.0           4.0           3.0           4.0
4         6.0           2.67          4.0           3.67          2.67
5         6.0           3.0           3.0           3.67          4.0
Average   6.0           2.75          3.92          3.67          3.50

Page 31: Privacy Protection for Life-log Video

Subjective Experiments

Task 3: For each clip from the subset of Set A (the original, undistorted speech set), identify a clip in the other subsets in which the same speaker may be speaking.

Purpose: Qualitatively measure the assurance of Privacy Protection achieved by distortion

Results: None of the speakers from set A was identified from other distorted sets. (100% Recognition Error Rate)

Page 32: Privacy Protection for Life-log Video

Privacy Experiments

Speaker Identification Experiments

• Speaker recognition tools (LIA_SpkDet and ALIZE)1 by the LIA lab at the University of Avignon

• Speaker verification with GMM-UBM (Gaussian Mixture Model-Universal Background Model)

• Single speaker-independent background model

• Decision: likelihood ratio

1: Bonastre, J.-F., Wild, F., Alize: a free, open tool for speaker recognition, http://www.lia.univ-avignon.fr/heberges/ALIZE/

Likelihood ratio: p(Y | H0) / p(Y | H1)

where H0 is the hypothesis that the test utterance Y comes from the claimed speaker, and H1 that it does not.

Page 33: Privacy Protection for Life-log Video

LIA_RAL Speaker-Det

Front processing:
• Feature extraction (SPRO tool, SPRO4): 32 coefficients = 16 LFCC + 16 derivative coefficients
• Silence frame removal (EnergyDetector)
• Parameter normalization / warping (NormFeat)

World modeling (TrainWorld): trains a world model of 2 GMMs (2048 components each), 1: male, 2: female

Target speaker modeling (TrainTarget): Bayesian adaptation (MAP) of the world model to each target speaker

Speaker detection (ComputeTest): scores the test feature vectors s against the target model T and the world model W using the log-likelihood ratio

LLR(s, T) = log [ l(s | T) / l(s | W) ]
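A toy GMM-UBM scoring sketch with scikit-learn and random stand-in features; the real system uses LIA_RAL with 2048-component models and MAP-adapts the target model from the world model, which this sketch only approximates by training the target model independently.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
ubm_feats = rng.normal(size=(5000, 32))            # 32-dim LFCC-like features
target_feats = rng.normal(loc=0.3, size=(500, 32))
test_feats = rng.normal(loc=0.3, size=(200, 32))

ubm = GaussianMixture(n_components=16, covariance_type="diag", random_state=0)
ubm.fit(ubm_feats)                                  # world (background) model
target = GaussianMixture(n_components=16, covariance_type="diag", random_state=0)
target.fit(target_feats)                            # stand-in for the MAP-adapted target model

# LLR(s, T) = log l(s | T) - log l(s | W), averaged over the test frames
llr = target.score(test_feats) - ubm.score(test_feats)
print("accept" if llr > 0.0 else "reject")
```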

Page 34: Privacy Protection for Life-log Video

Experimental Setup

World Model
• Number of male speakers: 325
• Number of female speakers: 135

Target Speaker Model
• Number of male test clips: 20
• Number of female test clips: 10

Two sets of experiments
• Same Model: world model and individual speaker models trained on distorted speech with the corresponding alpha
• Cross Model: world model and individual speaker models trained on undistorted speech

Page 35: Privacy Protection for Life-log Video

Privacy Results

(Numbers in the table are the average rank of the true speaker of each test clip for the corresponding alpha value.)

Set     Sex   Same Model   Cross Model
Set A   M     1.0          1.0
Set A   F     4.4          4.4
Set B   M     2.5          150.75
Set B   F     1.7          57.80
Set C   M     8.65         170.90
Set C   F     5.4          46.40
Set D   M     -            185.75
Set D   F     20.30        67.80
Set E   M     52.05        157.45
Set E   F     29.20        79.80

Conclusions

• Cross Model: distorted speech, no matter what alpha value is used, is very different from the original speech.

• Same Model: Set B and Set C do not provide adequate protection, as the rank is still very near the top.

Page 36: Privacy Protection for Life-log Video

Example Video

Page 37: Privacy Protection for Life-log Video

Conclusions

Proposed a real-time implementation of voice distortion and face blocking for privacy protection in life-log video

• Analysis of audio segmentation
• Analysis of audio distortion for usability
• Analysis of audio distortion for privacy protection

Page 38: Privacy Protection for Life-log Video

Acknowledgment

• Prof. Samson Cheung
• People at the Center for Visualization and Virtual Environments
• Prof. Donohue and Prof. Zhang

Thank you!

Page 40: Privacy Protection for Life-log Video

Voice Distortion

Voice identity
• Vocal tract (formants): filters
• Vocal cords (pitch): excitation source

Different ways to distort audio:
• Random mixture: makes the recording useless
• Voice transformation: more complex, not suitable for real-time applications
• Pitch shifting: changes the pitch of the voice and keeps the recording useful

We use the PitchSOLA time-domain pitch shifting method (“DAFX: Digital Audio Effects” by Udo Zölzer et al.): simple, with low complexity.

Page 41: Privacy Protection for Life-log Video

• Cross Model: world model and individual speaker models trained on undistorted speech

• Same Model: world model and individual speaker models trained on distorted speech