CS 621 Artificial Intelligence, Lecture 15 - 06/09/05, Prof. Pushpak Bhattacharyya

Page 1: CS 621 Artificial Intelligence Lecture 15 -  06/09/05 Prof. Pushpak Bhattacharyya

06.09.2005 Prof. Pushpak Bhattacharyya, IIT Bombay. 1

Application of Noisy Channel, Channel Entropy

CS 621 Artificial Intelligence

Lecture 15 - 06/09/05

Prof. Pushpak Bhattacharyya


Noisy Channel

S → [Noisy Channel] → R

S = {s1, s2, …, sq}, R = {t1, t2, …, tq}

SPEECH RECOGNITION (ASR: Automatic Speech Recognition)

- Signal processing (low level).

- Cognitive processing (higher-level categories).


Noisy Channel Metaphor

Due to Jelinek (IBM), 1970s.

Main field of study: speech.

Problem Definition

S = {speech signals} = {s1, s2, …, ss}

R = {words} = {w1, w2, …, wr}

Map a signal sequence to a word sequence: {s1, s2, …, sp} → {w1, w2, …, wq}


Special and Easier Case

Isolated Word Recognition (IWR)

The complexity due to word boundaries does not arise.

Example of a word-boundary ambiguity: "I got a plate" vs. "I got up late".


Homophones and Homographs

Homophones: words that have the same pronunciation.

Example: bear, bare.

Homographs: words that have the same spelling but different meanings.

Example: bank (river bank vs. financial bank).


World of Sounds

World of sounds (speech signals): Phonetics, Phonology

World of words: Orthography

Letters: consonants, vowels


The alphabet-to-sound mapping is not one-to-one.

Vowels: "tomato" → "tomaeto" or "tomaato"


Sound Variations

Lexical variations: 'because' → 'cause, because

Allophonic variations: 'because' → because, becase


Allophonic variations: a more remarkable example

Do: [δ][U]

Go: [G][o]


Socio-cultural variations: something (formal) → somethin (informal)

Dialectal variations:

very → "bheri" in Bengal

apple → "ieple" in the south, "eple" in the north, "aapel" in Bengal


Orthography to Phonology: a complex problem.

Very difficult to model using a 'rule-governed' system.


Probabilistic Approach

W* = best estimate of the word given the signal S

S → [Noisy Channel] → W*

W* = ARGMAX [ P(w|s) ], where w ranges over the set of words
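The ARGMAX step can be sketched in a few lines of Python. This is only an illustration: the table of P(w|s) values, the candidate words and the function name best_word are assumptions made up for the example, not part of the lecture.

```python
# Hypothetical posterior table P(w | s) for one observed signal s.
# The words and numbers are illustrative only.
posterior = {"plate": 0.55, "late": 0.30, "played": 0.15}


def best_word(p_w_given_s):
    """Return W* = argmax over w of P(w | s)."""
    return max(p_w_given_s, key=p_w_given_s.get)


w_star = best_word(posterior)
print(w_star)
```

The sketch assumes the posterior is already available; the later slides explain why it is not estimated directly.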


P(w|s) is called the 'parameter' of the system.

Estimation of the parameters = Training.

The probability values need to be estimated from speech corpora.

Speech corpora: recorded speech of many speakers.


Look of Speech Corpora

Annotation: unique pronunciation.

Example: [signal] annotated as "Apple"


Repository of Standard Sound Symbols

IPA: International Phonetic Alphabet (of the International Phonetic Association).

ARPABET: American English phonetic standard.


Augment the Roman alphabet with Greek symbols

e: [ɛ] 'ebb'

   [i] 'need'

t: top [t] (IPA)

   tool [θ] (IPA)


Speech corpora are annotated with IPA/ARPABET symbols.

Indian Scenario

Hindi: TIFR

Marathi: IITB

Tamil: IITM


How to Estimate P(w|s) from Speech Corpora

Direct estimate: P(w|s) = count(w, s) / count(s)

Not done this way.


Apply Bayes' Theorem

P(w|s) = P(w) . P(s|w) / P(s)

W* = ARGMAX [ P(w) . P(s|w) / P(s) ]
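Written out step by step, the denominator P(s) is the same for every candidate word w, so it does not change which w attains the maximum and can be dropped:

```latex
\begin{align*}
W^* &= \arg\max_{w} P(w \mid s) \\
    &= \arg\max_{w} \frac{P(w)\, P(s \mid w)}{P(s)} \\
    &= \arg\max_{w} P(w)\, P(s \mid w)
\end{align*}
```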


W* = ARGMAX [ P(w) . P(s|w) ], where w ranges over the set of words

P(w) = prior = language model.

P(s|w) = likelihood of w being pronounced as 's' = acoustic model.
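A minimal sketch of this decision rule, assuming both models are given as small lookup tables for one fixed signal s. The words, the probability values and the function name decode are hypothetical and chosen only for illustration.

```python
# Hypothetical language model P(w) and acoustic model P(s | w)
# for one fixed observed signal s. All numbers are illustrative.
prior = {"knee": 0.2, "need": 0.7, "neat": 0.1}
likelihood = {"knee": 0.30, "need": 0.25, "neat": 0.10}


def decode(prior, likelihood):
    """Return (W*, scores) where scores[w] = P(w) * P(s | w)."""
    scores = {w: prior[w] * likelihood.get(w, 0.0) for w in prior}
    return max(scores, key=scores.get), scores


w_star, scores = decode(prior, likelihood)
print(w_star, scores)
```

In this toy table "knee" has the higher acoustic likelihood, but the prior from the language model shifts the decision to "need".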


Acoustic Model

Pronunciation dictionary (finite state automata).

Manually built: a costly resource.

Example: [Figure: finite state automaton for "tomato", with states s, 1 to 6 and arcs labelled t, o, m, aa / ae, t, o.]
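A pronunciation automaton of this kind can be sketched as a transition table. The state names and arcs below are an assumed reading of the slide's figure (a single chain with an aa / ae branch for the two pronunciations of "tomato"), and accepts is a hypothetical helper name.

```python
# Transition table: state -> {phone: next_state}.
# States and arcs are an assumed reconstruction of the slide's figure.
TRANSITIONS = {
    "s": {"t": "1"},
    "1": {"o": "2"},
    "2": {"m": "3"},
    "3": {"aa": "4", "ae": "4"},  # the two vowel variants
    "4": {"t": "5"},
    "5": {"o": "6"},
}
START, FINAL = "s", "6"


def accepts(phones):
    """Return True if the phone sequence is a listed pronunciation."""
    state = START
    for phone in phones:
        state = TRANSITIONS.get(state, {}).get(phone)
        if state is None:
            return False
    return state == FINAL


print(accepts(["t", "o", "m", "aa", "t", "o"]))
print(accepts(["t", "o", "m", "ae", "t", "o"]))
```

An acoustic model would additionally attach a probability to each arc; this sketch only checks membership.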


W* is obtained from P(w) and P(s|w).

Language model?

Relative frequency of w in the corpora.

Relative frequency ≡ unigram model.

Example: P(knee) > P(need)

I _ _ _ _ _

knee: high probability

need: low probability
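The relative-frequency (unigram) estimate can be sketched as follows. The toy corpus and the function name unigram_model are assumptions made for the example; a real estimate would use a large text corpus.

```python
from collections import Counter


def unigram_model(tokens):
    """Estimate P(w) as count(w) / total number of tokens."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


# Toy corpus, purely illustrative.
corpus = "i need a plate i need a knee brace".split()
p = unigram_model(corpus)
print(p["need"], p["knee"])
```

Because each word is scored on its own, the estimate is the same whatever word precedes the blank.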


Language Modelling by N-grams

N = 2: bigrams.

N = 3: trigrams (best empirically for English).
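For N = 2, a maximum-likelihood bigram estimate can be sketched as count(a, b) / count(a). The toy corpus and the name bigram_model are assumptions for illustration, and no smoothing is applied, so unseen pairs get probability zero.

```python
from collections import Counter


def bigram_model(tokens):
    """Estimate P(b | a) as count(a, b) / count(a as a left context)."""
    contexts = Counter(tokens[:-1])             # words that have a successor
    bigrams = Counter(zip(tokens, tokens[1:]))  # adjacent word pairs
    return {(a, b): c / contexts[a] for (a, b), c in bigrams.items()}


# Toy corpus, purely illustrative.
tokens = "i need a plate i need a knee brace".split()
p = bigram_model(tokens)
print(p[("i", "need")], p.get(("i", "knee"), 0.0))
```

Unlike the unigram estimate, the score of a word now depends on the word before it.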