Transcript
Page 1: Speech Recognition Using FPGA Technology

Carlos Asmat – David López Sanzò – Kanwen Wu

Speech Recognition Using FPGA Technology

By Carlos Asmat 260148251, David López Sansò 260146414, Kanwen Wu 260045745

Presentation Date: Wednesday, June 6, 2007

Project Supervisor: Prof. Miguel Marin

Project Coordinator: Prof. Kenneth L. Fraser

Page 2: Speech Recognition Using FPGA Technology

Outline

1) Introduction

2) MATLAB™ Demonstration

3) Hardware Implementation

4) Hardware Demonstration

5) Final remarks

Page 3: Speech Recognition Using FPGA Technology

What is speech recognition?

● Convert analog sound into binary digits.

● Compare the result with a pre-stored word.

● Not to be confused with speaker recognition.

Introduction ● Hardware Implementation ● Demo ● Final Remarks

Page 4: Speech Recognition Using FPGA Technology

Speech Recognition Performance

● Priority: Accuracy and Reliability.

● Consumer products.

Page 5: Speech Recognition Using FPGA Technology

Objectives

● Hardware implementation of a simple speech recognition system.

● Single word identification.

● Cost efficiency, reliability, and simplicity are the major considerations.

Page 6: Speech Recognition Using FPGA Technology

Background Theory

● Sound identification is based on the sound's frequency content.

● Two steps:

➔ Training

➔ Recognition

Page 7: Speech Recognition Using FPGA Technology

Background Theory

● A MATLAB™ implementation was devised to assess the project feasibility.

● Two files were produced:

➔ train.m

➔ recogniz.m

Page 8: Speech Recognition Using FPGA Technology

Background Theory

● Training:

➔ Input several versions of a sound.

➔ Translate them to the frequency domain by using the FFT.

➔ Average their amplitude in the frequency domain.

● This produces the sound's fingerprint.
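The training steps above can be sketched in Python. This is only an illustration, not the authors' train.m: the names `fft` and `train_fingerprint` are assumptions, the recursive radix-2 FFT stands in for whatever FFT routine is available, and two 8-sample sequences replace real recordings so the result is easy to follow.

```python
import cmath

def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def train_fingerprint(recordings):
    """Average the FFT amplitudes of several recordings of one word."""
    spectra = [[abs(c) for c in fft(r)] for r in recordings]
    count = len(spectra)
    return [sum(column) / count for column in zip(*spectra)]

# Two hypothetical 8-sample "recordings" of the same sound.
recordings = [[0, 1, 0, -1, 0, 1, 0, -1],
              [0, 2, 0, -2, 0, 2, 0, -2]]
fingerprint = train_fingerprint(recordings)
```

Both inputs are the same tone at different volumes, so the averaged fingerprint has energy only in the bins of that tone.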

Page 9: Speech Recognition Using FPGA Technology

Background Theory

● Note on the FFT:

➔ Only half of it is used.

➔ Five 1024-point FFTs are performed per sound sample.

$$X_k = \sum_{n=0}^{N-1} x_n e^{-\frac{2\pi i}{N} nk}, \qquad k = 0, \ldots, N-1$$
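A short sketch may help show why only half of the transform is needed. It evaluates the DFT sum directly for a real-valued input and checks that the upper half of the spectrum mirrors the lower half. The function name `dft` and the sample values are assumptions made for this illustration.

```python
import cmath

def dft(x):
    """Direct evaluation of X_k = sum_n x_n * exp(-2*pi*i*n*k/N)."""
    n_points = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / n_points)
                for n in range(n_points))
            for k in range(n_points)]

samples = [3, 1, -2, 5, 0, -1, 4, 2]   # hypothetical real-valued samples
spectrum = dft(samples)
n_points = len(samples)

# For real input, X[N-k] is the complex conjugate of X[k], so the
# amplitudes of the upper half repeat those of the lower half.
mirrored = all(abs(spectrum[n_points - k] - spectrum[k].conjugate()) < 1e-9
               for k in range(1, n_points))
```

Since sound samples are real numbers, the amplitudes above bin N/2 carry no extra information.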

Page 10: Speech Recognition Using FPGA Technology

Background Theory

● User inputs .wav files.

● Decimate and quantize the input sound files.

● Sound acquisition parameters:

➔ Sound samples are quantized down to 8 bits.

➔ The sampling frequency is 5 kHz.

➔ Around one second (1.024s) of sound is stored.

Page 11: Speech Recognition Using FPGA Technology

Background Theory

● Sound detection:

➔ Compute the average of a window.

➔ Compare it to the average of the next window.

➔ If the difference is significant then the sound is assumed to start at that point.
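The detection rule above could look roughly like this in Python. The slide does not say what is averaged, so this sketch assumes the mean absolute amplitude of each window. The function names, the 4-sample window and the threshold value are likewise hypothetical.

```python
def window_average(samples):
    """Mean absolute amplitude of one window (an assumed measure)."""
    return sum(abs(s) for s in samples) / len(samples)

def find_sound_start(samples, window, threshold):
    """Return the index where the next window's average jumps, or None."""
    for start in range(0, len(samples) - 2 * window + 1, window):
        current = window_average(samples[start:start + window])
        following = window_average(samples[start + window:start + 2 * window])
        if abs(following - current) > threshold:
            return start + window
    return None

quiet = [0, 1, -1, 0] * 2          # two quiet windows of 4 samples
loud = [40, -38, 42, -41]          # a window containing the word
start = find_sound_start(quiet + loud, window=4, threshold=10)
```

With these values the jump occurs between the second and third windows, so the sound is taken to start at the first sample of the third window.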

Page 12: Speech Recognition Using FPGA Technology

Background Theory

● Sound detection (cont'd):

➔ w1 = w2 = 1024 samples = 0.2048 s

➔ L = 5120 samples = 1.024 s
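As a quick check of these figures, assuming the 5 kHz sampling frequency quoted with the sound acquisition parameters and five windows per stored sound:

```latex
\begin{aligned}
w_1 = w_2 &= \frac{1024\ \text{samples}}{5000\ \text{samples/s}} = 0.2048\ \text{s} \\
L &= 5 \times 1024 = 5120\ \text{samples}, \qquad
\frac{5120\ \text{samples}}{5000\ \text{samples/s}} = 1.024\ \text{s}
\end{aligned}
```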

Page 13: Speech Recognition Using FPGA Technology

Background Theory

● Store detected sound stream into a vector.

● Apply FFT to the above vector's first 1024 points and put it in 's'.

● Store 's' as the first row in the matrix 'x' and repeat with the following 1024 points until there are five rows in 'x'.
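The framing described above might be sketched as follows. The names `dft_amplitudes` and `spectral_rows` are hypothetical, storing amplitudes rather than complex values is an assumption, and a naive DFT with 4-sample frames stands in for the 1024-point FFT to keep the example small.

```python
import cmath

def dft_amplitudes(frame):
    """Amplitude of each DFT bin of one frame (naive O(N^2) transform)."""
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * cmath.pi * t * k / n)
                    for t in range(n)))
            for k in range(n)]

def spectral_rows(sound, frame_len=1024, frames=5):
    """Build the matrix 'x': one row 's' of amplitudes per frame."""
    x = []
    for row in range(frames):
        s = dft_amplitudes(sound[row * frame_len:(row + 1) * frame_len])
        x.append(s)
    return x

# Small stand-in: 5 frames of 4 samples instead of 5 frames of 1024.
sound = [1, 0, -1, 0] * 5
x = spectral_rows(sound, frame_len=4, frames=5)
```

Each row of `x` then describes the frequency content of one fifth of the stored sound.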

Page 14: Speech Recognition Using FPGA Technology

Background Theory

● Sound recognition:

➔ Compute the fingerprint of a sound.

➔ Compute the distance between the sound's fingerprint and the reference fingerprint.

➔ If both are close enough, then the sound is assumed to match the reference sound.

Page 15: Speech Recognition Using FPGA Technology

Background Theory

● Note on the distance computation:

➔ The sound's fingerprint and the reference fingerprint are treated as 1024-dimensional vectors.

➔ The distance between them is computed using the Euclidean distance formula:

$$D = \sum_{i=0}^{1023} (a_i - b_i)^2$$
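A minimal Python sketch of the comparison follows. It sums the squared differences as in the slide's formula and leaves out the square root, which does not change the outcome as long as the threshold is chosen on the same squared scale. The function names, the 4-point fingerprints and the threshold are all hypothetical.

```python
def squared_distance(a, b):
    """Sum of squared differences between two equal-length fingerprints."""
    if len(a) != len(b):
        raise ValueError("fingerprints must have the same length")
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def matches(candidate, reference, threshold):
    """True when the candidate is close enough to the reference."""
    return squared_distance(candidate, reference) <= threshold

reference = [0.0, 6.0, 1.0, 0.5]       # hypothetical 4-point fingerprints
same_word = [0.2, 5.5, 1.1, 0.4]
other_word = [4.0, 0.5, 3.0, 2.0]

is_same = matches(same_word, reference, threshold=1.0)
is_other = matches(other_word, reference, threshold=1.0)
```

In practice the threshold would have to be tuned on real recordings.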

Page 16: Speech Recognition Using FPGA Technology

System Overview

Page 17: Speech Recognition Using FPGA Technology

Hardware Implementation

● Design approach

● A/D Conversion

● Word detector

● FFT

● Memory Management

● Distance Computation

Page 18: Speech Recognition Using FPGA Technology

Design Approach

● Quartus II

➔ VHDL process blocks

➔ Computer-Aided Design

● Datapath/Overall Controller

● Intermediate controllers

Page 19: Speech Recognition Using FPGA Technology

A/D Conversion

Source: http://www.societyofrobots.com/images/analogdigital.jpg

Page 20: Speech Recognition Using FPGA Technology

A/D – Overall Configuration

[Figure: Wolfson CODEC and FPGA connected by MCLK, BCLK, LRCLK, ADCDAT and the I2C Bus, with MASTER and SLAVE labels.]

Page 21: Speech Recognition Using FPGA Technology

A/D Conversion

● Internal signals set by bus.

➔ De-mute.

➔ Boost mic.

➔ Change path.

[Figure: codec signal path. Labels: LINEIN, MICIN, MUTE, MUTEMIC, INSEL, MUX, A/D, Digital Filters, D/A, ADCDAT, LINEOUT.]

Page 22: Speech Recognition Using FPGA Technology

I2C Bus

● RADDR → Base address = 0011010

● R/W → Read/Write = 0

● B[15-9] → Control Address = 0000100

● B[8-0] → Control Data = 000001101

Source: Wolfson WM8731 data sheets, p.43
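The fields listed above can be packed into the bytes that travel over the bus. The sketch below takes the field values from the slide. The packing itself (7-bit address followed by the R/W bit, then the 16-bit word sent high byte first) is an assumption that should be checked against the data sheet. The function name is hypothetical, and start/stop conditions and ACK bits are not modelled.

```python
def wm8731_write_frame(device_addr, reg_addr, reg_data):
    """Return the three bytes sent for one control-register write."""
    if not (0 <= device_addr < 2**7 and 0 <= reg_addr < 2**7
            and 0 <= reg_data < 2**9):
        raise ValueError("field out of range")
    rw_bit = 0                                   # R/W = 0 selects a write
    address_byte = (device_addr << 1) | rw_bit   # RADDR followed by R/W
    word = (reg_addr << 9) | reg_data            # B[15-9] and B[8-0]
    return [address_byte, word >> 8, word & 0xFF]

frame = wm8731_write_frame(0b0011010, 0b0000100, 0b000001101)
# frame == [0x34, 0x08, 0x0D]
```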

Page 23: Speech Recognition Using FPGA Technology

I2C Bus

● B[8-0] → Control Data = 000001101

[Figure labels: 'INSEL', 'MUTE MIC', 'MIC BOOST']

Source: Wolfson WM8731 data sheets, p.43

Page 24: Speech Recognition Using FPGA Technology

I2C Bus – ACK Signal

● ACK signal goes from the Wolfson to the FPGA

➔ Opposite direction from rest of data

➔ Only one data line

Page 25: Speech Recognition Using FPGA Technology

I2C Bus – ACK Signal

Solution...

Page 26: Speech Recognition Using FPGA Technology

I2C Bus – ACK Signal

Solution... Tri-state buffer!

[Figure: LPM_BUSTRI block (inst) with ports data[], enabledt, enabletr, result[] and tridata[].]
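The idea behind the tri-state buffer can be modelled in a few lines of Python. This is a behavioural sketch only, not the LPM_BUSTRI block itself: `HIGH_Z`, `tristate` and `resolve_line` are hypothetical names, and the model allows a single active driver at a time, which is a simplification of how a real I2C data line behaves.

```python
HIGH_Z = None  # marker for "not driving the line"

def tristate(data_bit, enable):
    """Drive data_bit onto the line when enabled, otherwise release it."""
    return data_bit if enable else HIGH_Z

def resolve_line(*drivers):
    """Value seen on a shared line with at most one active driver."""
    active = [d for d in drivers if d is not HIGH_Z]
    if len(active) > 1:
        raise RuntimeError("bus contention: two drivers active")
    return active[0] if active else 1   # pull-up keeps an idle line high

# FPGA sends a data bit: its buffer is enabled, the codec stays released.
sent = resolve_line(tristate(0, enable=True), HIGH_Z)

# ACK slot: the FPGA releases the line and the codec pulls it low.
ack = resolve_line(tristate(1, enable=False), tristate(0, enable=True))
```

Releasing the line during the ACK slot is what lets the single data wire carry traffic in both directions.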

Page 27: Speech Recognition Using FPGA Technology

A/D – ADCDAT Fetcher

● Clock module

● MSB available on 2nd rising BCLK edge

Source: Wolfson WM8731 data sheets, p.34
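A software analogue of the fetcher could look like the sketch below. It assumes the bits have already been captured on successive rising BCLK edges after an LRCLK transition and that they arrive MSB first. The function name and the 4-bit example are hypothetical, and the 24-bit default matches the word length mentioned on the quantization slide.

```python
def fetch_sample(bits_after_lrclk, width=24):
    """Rebuild one signed sample from bits seen on rising BCLK edges.

    bits_after_lrclk[0] is the bit seen on the first rising edge after
    the LRCLK transition; it is skipped because the MSB arrives on the
    second edge.
    """
    payload = bits_after_lrclk[1:1 + width]
    if len(payload) < width:
        raise ValueError("not enough bits for one sample")
    value = 0
    for bit in payload:               # MSB first
        value = (value << 1) | bit
    if value >= 1 << (width - 1):     # two's complement sign correction
        value -= 1 << width
    return value

# 4-bit example: one skipped bit, then 1, 1, 1, 0 -> 0b1110 -> -2
sample = fetch_sample([0, 1, 1, 1, 0], width=4)
```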

Page 28: Speech Recognition Using FPGA Technology

Quantization

● Codec output: two's complement

● Quantize 24 bits into 8.

Decimal number | Binary (2's comp.) | Quantized decimal | Quantized binary (2's comp.)
 3 | 011 |  1 | 01
 2 | 010 |  1 | 01
 1 | 001 |  0 | 00
 0 | 000 |  0 | 00
-1 | 111 | -1 | 11
-2 | 110 | -1 | 11
-3 | 101 | -2 | 10
-4 | 100 | -2 | 10
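The table is consistent with simply dropping the least significant bits, which for two's complement numbers is an arithmetic right shift. The sketch below assumes that is the method used. The function name `quantize` is hypothetical.

```python
def quantize(value, in_bits=24, out_bits=8):
    """Keep the top out_bits of a signed in_bits two's complement value."""
    low, high = -(1 << (in_bits - 1)), (1 << (in_bits - 1)) - 1
    if not low <= value <= high:
        raise ValueError("value does not fit in in_bits")
    return value >> (in_bits - out_bits)   # arithmetic shift, rounds down

# Reproduce the 3-bit to 2-bit example from the slide.
table = [(v, quantize(v, in_bits=3, out_bits=2)) for v in range(3, -5, -1)]
```

Because the shift rounds toward minus infinity, -1 maps to -1 rather than 0, as in the table.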

Page 29: Speech Recognition Using FPGA Technology

Downsampler

● Implementation

➔ Flip-flop

➔ Counters (and FSM)

[Figure: Downsampler block with DATA_IN @ 48 kHz, DATA_OUT @ 5 kHz and READY.]
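The ratio 48 kHz / 5 kHz = 9.6 is not an integer, and the slides do not say how this is handled. One possible counter-based scheme, given here purely as an assumption, adds the output rate to a counter on every input sample and keeps the sample whenever the counter passes the input rate. No anti-aliasing filter is modelled.

```python
def downsample(samples, rate_in=48000, rate_out=5000):
    """Keep rate_out samples out of every rate_in, using a counter."""
    kept = []
    counter = 0
    for sample in samples:
        counter += rate_out
        if counter >= rate_in:        # counter overflow: keep this sample
            counter -= rate_in
            kept.append(sample)       # in hardware: latch and pulse READY
    return kept

one_second = list(range(48000))       # stand-in for one second of input
output = downsample(one_second)
```

Over one second of input this keeps exactly 5000 samples, spaced 9 or 10 input samples apart.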

Page 30: Speech Recognition Using FPGA Technology

Word Detector

● Detects sharp transitions.

[Figure: word detector block diagram. Labels: DATA_IN, Average, Register 1, Register 2, Absolute Difference, Comparator, THRESHOLD, SOUND_STARTS; bus widths of 8 and 9 bits.]

Page 31: Speech Recognition Using FPGA Technology

Fast Fourier Transform

● Altera IP MegaCore® 1024-point FFT module:

➔ Natural order streaming data input.

➔ Bit-reversed streaming data output.

➔ Low latency.

➔ Time Limited Version.


FFT block (inst1) ports: clk, reset_n, inverse, sink_valid, sink_sop, sink_eop, sink_real[7..0], sink_imag[7..0], sink_error[1..0], source_ready, sink_ready, source_error[1..0], source_sop, source_eop, source_valid, source_exp[5..0], source_real[7..0], source_imag[7..0].
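Because the FFT module takes natural-order input but emits bit-reversed output, a consumer has to reorder the bins before using them as a spectrum. The sketch below shows that reordering in Python under simple assumptions: the output frame is available as a list whose length is a power of two (1024 in the slides), and the helper names `bit_reverse` and `to_natural_order` are hypothetical, not part of the Altera core.

```python
def bit_reverse(index, bits):
    """Reverse the lowest `bits` bits of `index`."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (index & 1)
        index >>= 1
    return result


def to_natural_order(bit_reversed_frame):
    """Reorder one FFT output frame from bit-reversed to natural order."""
    n = len(bit_reversed_frame)
    bits = n.bit_length() - 1
    if n == 0 or (1 << bits) != n:
        raise ValueError("frame length must be a power of two")
    natural = [None] * n
    for position, value in enumerate(bit_reversed_frame):
        # The sample arriving at `position` belongs to the bin whose
        # index is the bit-reversal of `position`.
        natural[bit_reverse(position, bits)] = value
    return natural


# For a 1024-point transform the index is 10 bits wide.
print(bit_reverse(1, 10))
```

In hardware the same effect is usually obtained by reversing the address bits when writing the output to memory, so no extra pass is needed.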

Page 32: Speech Recognition Using FPGA Technology


Memory Management

● Three memory modules:

➔ FLASH (4 MB)

➔ SDRAM (8 MB)

➔ SRAM (512 kB)


Page 33: Speech Recognition Using FPGA Technology


Memory Management

● 512 kB SRAM memory module

SRAM chip signals: Data I/O (16 bits), Address (18 bits), Chip Enable, Write Enable, Output Enable, High Byte Mask, Low Byte Mask.


Page 34: Speech Recognition Using FPGA Technology


Memory Management

● Memory structure:

2^18 blocks of 16 bits, each shown as two 8-bit halves.

16-bit blocks: 0, 1, 2, 3, ..., 262 141, 262 142, 262 143, 262 144

8-bit halves: 0 1, 2 3, 4 5, 6 7, ..., 524 280 524 281, 524 282 524 283, 524 284 524 285, 524 287 524 288
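The relationship between the 16-bit and 8-bit views can be illustrated with a small address-splitting helper. This is a sketch under stated assumptions: addresses are zero-based, and odd byte addresses are assumed to select the high half of a 16-bit block (the slides do not say which half comes first). The name `split_byte_address` is hypothetical.

```python
WORD_ADDR_BITS = 18  # 2**18 blocks of 16 bits, as on the slide


def split_byte_address(byte_addr):
    """Map a byte address to (16-bit block address, uses_high_half)."""
    if not 0 <= byte_addr < (1 << (WORD_ADDR_BITS + 1)):
        raise ValueError("byte address out of range")
    word_addr = byte_addr >> 1            # drop the lowest bit
    uses_high_half = bool(byte_addr & 1)  # assumption: odd -> high half
    return word_addr, uses_high_half


print(split_byte_address(7))
```

With this layout, 2^18 blocks of 16 bits give 2^19 individually addressable bytes, which is the full 512 kB.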


Page 35: Speech Recognition Using FPGA Technology


Memory Management

● Memory Controller:

Memory Controller block. User side: ADDR (19 bits), DATA_IN (8 bits), DATA_OUT (8 bits), MODE, ENABLE. SRAM side: Address (18 bits), Data I/O (16 bits), Chip Enable, Write Enable, Output Enable, High Byte Mask, Low Byte Mask.
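A byte-wide controller in front of a 16-bit memory has to touch only one half of a word on each write, which is what the high and low byte masks are for. The following Python model is a rough sketch of that behavior, not the presenters' design: the class name `SramModel`, the choice that odd addresses use the high byte, and the small default-free constructor are all assumptions for illustration.

```python
class SramModel:
    """Software stand-in for a 16-bit-wide SRAM accessed one byte at a time."""

    def __init__(self, word_addr_bits):
        self.words = [0] * (1 << word_addr_bits)

    def write_byte(self, addr, value):
        word_addr = addr >> 1
        old = self.words[word_addr]
        if addr & 1:
            # Assumption: odd address -> only the high byte is updated.
            self.words[word_addr] = (old & 0x00FF) | ((value & 0xFF) << 8)
        else:
            # Even address -> only the low byte is updated.
            self.words[word_addr] = (old & 0xFF00) | (value & 0xFF)

    def read_byte(self, addr):
        word = self.words[addr >> 1]
        return (word >> 8) & 0xFF if addr & 1 else word & 0xFF


sram = SramModel(word_addr_bits=4)  # tiny memory for the example
sram.write_byte(2, 0xAB)
sram.write_byte(3, 0xCD)
print(hex(sram.words[1]), hex(sram.read_byte(2)), hex(sram.read_byte(3)))
```

On the real chip the masking is done by the byte-mask pins rather than by a read-modify-write, but the visible result is the same: the untouched half of the word keeps its value.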


Page 36: Speech Recognition Using FPGA Technology


Memory Management

● Batch Operations:

Memory Batch Operator block signals: START_ADDR (19 bits), END_ADDR (19 bits), DATA_IN (8 bits), DATA_OUT (8 bits), ADDR (19 bits), MODE, ENABLE, CLK, DATA_READY, MEM_MODE, MEM_ENABLE.
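A batch operation over an address range can be pictured as a loop that steps an address counter from the start to the end address and moves one byte per step. The sketch below models this in Python with a `bytearray` standing in for the memory; treating the end address as inclusive and the function names `batch_read` and `batch_write` are assumptions, since the slide only shows the block's signals.

```python
def batch_read(memory, start_addr, end_addr):
    """Yield (address, byte) for each address from start_addr to
    end_addr inclusive, one item per step."""
    for addr in range(start_addr, end_addr + 1):
        yield addr, memory[addr]


def batch_write(memory, start_addr, end_addr, data):
    """Write the bytes in `data` to start_addr..end_addr inclusive."""
    count = end_addr - start_addr + 1
    if len(data) != count:
        raise ValueError("data length must match the address range")
    for offset, byte in enumerate(data):
        memory[start_addr + offset] = byte


memory = bytearray(16)
batch_write(memory, 4, 7, [1, 2, 3, 4])
print(list(batch_read(memory, 4, 7)))
```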


Page 37: Speech Recognition Using FPGA Technology


Distance Computation

● The distance computation module:

Distance block signals: A (8 bits), B (8 bits), RST, ENABLE, CLK, DISTANCE (8 bits).


Page 38: Speech Recognition Using FPGA Technology


Distance Computation

● The distance computation module (cont'd):

Internal blocks as labeled: A (8 bits) and B (8 bits) into Square Difference, then Accumulator (RST, CLK), then Square Root, producing DISTANCE.
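The chain of square difference, accumulator, and square root amounts to a Euclidean distance between two sequences of values. A minimal Python sketch of that computation is shown here; the integer square root and the function name `euclidean_distance` are assumptions for illustration, and the 8-bit width of the hardware DISTANCE output is not modeled.

```python
import math


def euclidean_distance(a, b):
    """Accumulate squared differences of paired values, then take the
    integer square root of the total."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    accumulator = 0  # cleared by RST in the hardware block
    for x, y in zip(a, b):
        diff = x - y
        accumulator += diff * diff  # square difference, then accumulate
    return math.isqrt(accumulator)  # assumption: result is truncated


print(euclidean_distance([0, 3], [4, 0]))
```

A smaller distance means the two inputs are more alike, so the stored reference with the smallest distance to the incoming data would be the best match.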

Page 39: Speech Recognition Using FPGA Technology


Demonstration

Board image labels: Sound Detection, I2C Done Signal, Threshold Settings, Assign Threshold, Send I2C Configuration, Current Average.

Original image source: http://users.ece.gatech.edu/~hamblen/DE2/DE2.jpg

Page 40: Speech Recognition Using FPGA Technology


Final Remarks

● Deficiencies.

● Strengths.

● Potential Improvements.


Page 41: Speech Recognition Using FPGA Technology


Deficiencies

● Lack of accuracy.

● Lack of observability.

● Requires complex hardware

➔ FFT (Nios II)


Page 42: Speech Recognition Using FPGA Technology


Strengths

● Fast.

● Trainable.

● The system is not limited to speech.


Page 43: Speech Recognition Using FPGA Technology


Potential Improvements

● Recognize several words.

● Improve accuracy.

● Support variable-length words.

● Recognize sentences.

➔ Requires a hidden Markov model (HMM) (very complex!)


