writehear

27
WriteHe ar Jesse Miller Dayton, OH ISSI

Upload: jesse-miller

Post on 22-Jan-2018

289 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: WriteHear

WriteHearJesse Miller

Dayton, OHISSI

Page 2: WriteHear

2

I. What is WriteHear?A.ConceptB.How it works

1.Software2.Hardware

II.Testing and analysisA.HardwareB.Software

Overview

Page 3: WriteHear

3

What is WriteHear?

Page 4: WriteHear

4

Audio Voice Recognition

Visual Handwriting Recognition

Audio Handwriting Recognition

Concept

Page 5: WriteHear

5

Concept

Sounds produced by handwriting

Recorded by microphone

Transcribed by software

Page 6: WriteHear

6

Motivation

Using audio alone:

• Transcribe handwriting

• Verify signatures

• Send encrypted messages

Page 7: WriteHear

7

Past ResearchSketch Recognition Lab at Texas A&M

Did Not Did• Continuous character

recognition• Noise reduction• Test various sets of

characters• Test different

hardware setups• Signature verification

• Distinguish between uppercase characterswith %86.8 accuracy

Page 8: WriteHear

8

Am

plit

ude

Time (sec)

Software

Templates

Moving Average

Input Character

Signal

Endpoint Noise

Removal

Dynamic Time

WarpingResult

System Architecture

Am

plitu

de

Time (sec)

A B C…

Is Character?

yes

No

Page 9: WriteHear

9

Moving Average

Software

Window Size=45ms

Increment=15ms

Input Signal Moving Average

Page 10: WriteHear

10

Endpoint Noise Removal

Software

𝐸 = 𝑠𝑖𝑔𝑛𝑎𝑙 𝑒𝑛𝑒𝑟𝑔𝑦 =1

𝑛

𝑖=1

𝑛

𝑆𝑖2 𝐸𝑐ℎ𝑎𝑟𝑎𝑐𝑡𝑒𝑟 > 𝑇 × 𝐸𝑛𝑜𝑖𝑠𝑒

T is a fixed threshold

Moving Average Noise Removal

Page 11: WriteHear

11

Dynamic Time Warping

Software

Method for Comparison:• Stretches signal along time

axis• Independent of signal

length

Page 12: WriteHear

12

Dynamic Time Warping

Software

2 2

4

0

1

4

1 1

1 2 3 4

Time

1 1 3 1

2 2 0 4

1 1 3 1

1 1 3 1

𝑑𝑟,𝑐=|A(r)-B(c)|

A

B

𝑆𝑐𝑜𝑟𝑒 =1

min(𝑙𝑒𝑛𝑔𝑡ℎ𝐴, 𝑙𝑒𝑛𝑔𝑡ℎ𝐵)

𝑖=1

𝑛

𝑝(𝑖)

=1

41 + 1 + 0 + 1 + 1 = 1

p(i)

start

end

p=path of minimal accumulated distance

Page 13: WriteHear

13

Software

• Sakoe-Chiba band width of 8% used to restrict warping and maximize accuracy

• Square root of the signals are used for comparison to maximize accuracy

Dynamic Time Warpingbandwidth

*see appendix for details

Page 14: WriteHear

14

SoftwareSpellcheck database contains over 58,000 words

Page 15: WriteHear

15

Contact Microphone

Thin Writing Surface

(2.5x2.5x1/8)”

Acoustically Isolating Base(1/4” thick)

Hardware

Page 16: WriteHear

16

Testing and Analysis

Page 17: WriteHear

17

Type of Mic.Dynamic ContactVs.

• Electromagnetic induction• Senses vibrations through

air and solid objects• Susceptible to noise

• Piezoelectric• Senses vibrations through

solid objects only• Significantly less effected by

noise

90 dB Noise Test 90 dB Noise Test

Page 18: WriteHear

18

Distance From Mic.1 inch 48 inchesVs.

• Amplitude = 0.01-0.03 • Unaffected by distance

• Amplitude = 0.01-0.03 • Unaffected by distance

Page 19: WriteHear

19

Surface Size15 in2 720 in2Vs.

• Time to decay to noise level = 300ms

• Less distinct features in writing• Lower amplitude• More susceptible to noise

Tap Test Tap Test

• Time to decay to noise level = 15ms

• More distinct features in writing• Higher amplitude

• Less susceptible to noise

Page 20: WriteHear

20

Surface MaterialPlastic MetalVs.

• Negligible effect on amplitude

• Negligible effect on amplitude

Page 21: WriteHear

21

Surface Thickness1/8” Thick 1/2” ThickVs.

• Amplitude = 0.005-0.01(1/8th the amplitude)

• Amplitude = 0.04-0.05

Page 22: WriteHear

22

Writing UtensilPen PencilVs.

• Amplitude = 0.04-0.05 • Amplitude = 0.04-0.05

Page 23: WriteHear

23

Accuracy

A-Z a-z 0-9 Spellcheck (A-Z) Signature

87.5%

Based on:8 Individuals832 samples

77.9%

Based on:2 Individuals208 samples

91.7%

Based on:3 Individuals120 samples

85.9%

Based on:3 Individuals

78 words

Type I Error:

4.8%Type II Error:

6.3%Based on:

3 Individuals53 samples

• Similar sound profiles for different characters lowers accuracy

• Character accuracy calculated with a template database of 3 samples per character. A similar method was used by SRL at Texas A&M

Page 24: WriteHear

24

Conclusions• WriteHear is a robust and versatile software that

can learn and understand different people’s handwriting

• A Contact microphone setup greatly reduces background noise allowing WriteHear to work fine with 90dB (loud traffic)

• WriteHear works with various materials and writing utensils

• WriteHear can catch a forged signature with over 93% accuracy

• WriteHear recognized the upper case letters (A-Z) with 87.5% accuracy, matching the 86.8% accuracy achieved by the Sketch Recognition Lab at Texas A&M

Page 25: WriteHear

25

Future Work• Continue to improve and optimize software

• Test more conditions such as how a persons handwriting varies over time

• Create a user independent system

• Create a more fluid system where the user can write naturally without having to pause between characters

Page 26: WriteHear

26

References• Li, W., & Hammond, T. A. (2011, April).

Recognizing Text Through Sound Alone. In AAAI.

Page 27: WriteHear

27

Appendix

0.83

0.84

0.85

0.86

0.87

0.88

0 2 4 6 8 10

Acc

ura

cy

Sakoe-Chiba Bandwidth (%)

Sakoe-Chiba Bandwidth Optimization

0.84

0.845

0.85

0.855

0.86

0.865

0.87

0.875

0.88

0 0.2 0.4 0.6 0.8 1A

ccu

racy

Signal Exponent

Signal Exponent Optimization

• Optimal bandwidth for dynamic time warping is 8%

• Optimal exponent on the signal is 0.5