voice browser


Upload: suman-bose

Post on 20-Aug-2015


Presented by

Soumya Shuchi(14300211047)

Srirupa Das(14300211048)

Subhajit Karmakar(14300211049)

Subhendu Paul(14300211050)

Sumadhura Biswas(14300211051)

Suman Bose(14300211052)

Sumit Kr.Singh(14300211053)

IT Dept. GNIT


TABLE OF CONTENTS

What is a voice browser

Motivation

Difference between graphical browser and voice browser

Possible applications

W3C

VoiceXML

Speech Recognition

Call control

TTS

Voice style sheets

Conclusion


WHAT IS A VOICE BROWSER?

A voice browser is a software application that presents an interactive voice user interface to the user in a manner analogous to the functioning of a web browser.

It expands access to the Web.

It will allow any telephone to be used to access appropriately designed Web-based services.


WHAT IS A VOICE BROWSER?

Server-based, voice portals

Interaction via keypads, spoken commands, listening to prerecorded speech, synthetic speech and music.

An advantage to people with visual impairment.

Mobile Web

WHY A VOICE BROWSER?

Use of the hands during browsing might prove inconvenient or impossible. Voice input is a natural solution for such hands-busy situations.

Even in standard browser applications, using voice input is simply more fun than the alternatives.

The voice browser replaces the mouse in most instances to enable hands-free browsing.

Voice input provides direct "see and say" access to links, eliminating the wrist strain associated with holding the mouse, often for hours at a time.

MOTIVATION

Easy to use, even for people with no knowledge of computers or a fear of them.

Voice browsers are the next generation of call centers, which will become Voice Web portals to the company's services and related websites, whether accessed via the telephone network or via the Internet.

GRAPHICAL & VOICE BROWSING

Graphical browsing is more passive due to the persistence of the visual information.

Voice browsing is more active since the user has to issue commands.

Graphical browsers can be client-based, whereas voice browsers should be server-based.

POSSIBLE APPLICATIONS

Accessing business information:

The corporate "front desk" which asks callers who or what they want.

Automated telephone ordering service.

Airline arrival and departure information.

Home banking services.

Accessing public information:

Community information such as weather, traffic conditions, school closures, directions and events.


POSSIBLE APPLICATIONS (CONTD.)

Local, national and international news.

National and international stock market information.

Business and e-commerce transactions.

Accessing personal information:

Voice mail.

Calendars, address and telephone lists.

Personal horoscope.

Personal newsletter.

To-do lists, shopping lists, and calorie counters.


W3C

The World Wide Web Consortium (W3C) develops interoperable technologies (specifications, guidelines, software, and tools) to lead the Web to its full potential as a forum for information, commerce, communication, and collective understanding.


W3C Speech Interface Framework

VoiceXML

Speech Recognition:

1. Speech Grammars

2. Stochastic (N-Gram) Language Models

3. Semantic Interpretation

4. Pronunciation Lexicon

Call control

VOICEXML

VoiceXML is a dialog markup language designed for telephony applications, where users are restricted to voice and DTMF (touch tone) input.

There are other languages: VoXML, omniviewXML

[Figure: Web server, Internet, and browser, with the documents text.html and text.vxml]


VOICEXML – ARCHITECTURE

SPEECH RECOGNITION

[Figure: the user supplies touch-tone and speech input; the components shown are DTMF Grammars, Speech Grammars, Stochastic Language Models, and Semantic Interpretation]


DTMF GRAMMARS

Touch-tone input is often used as an alternative to speech recognition.

It is especially useful in noisy conditions or when the social context makes it awkward to speak.

The W3C DTMF grammar format allows authors to specify the expected sequence of digits and to bind them to the appropriate results.
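
The idea of binding expected digit sequences to results can be illustrated with a minimal Python sketch. It does not use the W3C DTMF grammar format itself; the menu table `DTMF_MENU`, its entries, and the function `interpret_dtmf` are hypothetical names chosen only for illustration.

```python
from typing import Optional

# Hypothetical menu: each expected key sequence is bound to a result.
DTMF_MENU = {
    "1": "account_balance",
    "2": "recent_transactions",
    "0": "operator",
    "9#": "repeat_menu",
}

# Keys available on a standard touch-tone keypad.
VALID_KEYS = set("0123456789*#")


def interpret_dtmf(keys: str) -> Optional[str]:
    """Return the result bound to a key sequence, or None if it is not expected."""
    if not keys or any(k not in VALID_KEYS for k in keys):
        return None  # empty input or not a touch-tone sequence at all
    return DTMF_MENU.get(keys)


if __name__ == "__main__":
    for pressed in ["1", "9#", "7"]:
        print(pressed, "->", interpret_dtmf(pressed))
```

A sequence that is valid touch-tone input but not listed in the menu, such as "7", yields `None`, which a dialog could treat as a "no match" event and re-prompt.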


SPEECH GRAMMARS

Speech grammars allow authors to specify rules covering the sequences of words that users are expected to say in particular contexts.

These contextual clues allow the recognition engine to focus on likely utterances, improving the chances of a correct match.
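
As a rough illustration of such a rule, the following Python sketch checks whether an utterance fits a small hand-written grammar. It is a simplification, not the W3C grammar format, and it works on text rather than audio; the rule `ORDER_RULE` and the functions `matches` and `in_grammar` are hypothetical.

```python
# Hypothetical grammar for a pizza-ordering prompt. The rule is a sequence of
# slots; each slot lists the alternative phrases allowed at that position.
# An empty string marks the slot as optional.
ORDER_RULE = [
    ["i want", "i would like", ""],
    ["a", "one", "two"],
    ["small", "medium", "large"],
    ["pizza", "pizzas"],
]


def matches(rule, words):
    """Return True if the word list can be consumed exactly by the rule."""
    if not rule:
        return not words  # success only if every word was consumed
    for phrase in rule[0]:
        tokens = phrase.split()
        if words[:len(tokens)] == tokens and matches(rule[1:], words[len(tokens):]):
            return True
    return False


def in_grammar(utterance: str) -> bool:
    """Check a transcribed utterance against the ordering rule."""
    return matches(ORDER_RULE, utterance.lower().split())


if __name__ == "__main__":
    for said in ["I want a large pizza", "two medium pizzas", "play some music"]:
        print(repr(said), "->", in_grammar(said))
```

Because the rule admits only a few dozen word sequences, a recognizer restricted to it has far fewer candidates to choose between than one listening for arbitrary speech.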


STOCHASTIC (N-GRAM) LANGUAGE MODELS

Speech grammars are of little use for open-ended prompts such as "How can I help you?".

The solution is to use a stochastic language model. Such models specify the probability that one word occurs following certain others. The probabilities are computed from a collection of utterances gathered from many users.
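
A bigram model, the simplest N-gram case, can be sketched in a few lines of Python. The utterances below are invented for illustration, the function names are hypothetical, and the estimate is a plain relative frequency with no smoothing, which a practical system would likely need.

```python
from collections import Counter, defaultdict

# Made-up utterances standing in for transcripts collected from many callers.
UTTERANCES = [
    "i want to check my balance",
    "i want to pay my bill",
    "i need to check my bill",
    "please check my balance",
]


def train_bigrams(utterances):
    """Count how often each word follows each preceding word."""
    counts = defaultdict(Counter)
    for utterance in utterances:
        # <s> and </s> mark the start and end of an utterance.
        words = ["<s>"] + utterance.split() + ["</s>"]
        for prev, word in zip(words, words[1:]):
            counts[prev][word] += 1
    return counts


def probability(counts, prev, word):
    """Relative-frequency estimate of P(word | prev); 0.0 for unseen contexts."""
    total = sum(counts[prev].values())
    return counts[prev][word] / total if total else 0.0


if __name__ == "__main__":
    model = train_bigrams(UTTERANCES)
    print(probability(model, "my", "balance"))  # "my" is followed by "balance" in 2 of 4 cases
    print(probability(model, "want", "to"))
```

In this toy corpus "my" is followed by "balance" twice and by "bill" twice, so the model assigns each continuation a probability of 0.5.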


SEMANTIC INTERPRETATION

The recognition process matches an utterance to a speech grammar, building a parse tree as a byproduct.

There are two approaches to harvesting semantic results from the parse tree:

1. Annotating grammar rules with semantic interpretation tags.

2. Representing the result in XML.
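
Both approaches can be sketched together in Python, under heavy simplification. The tagged rule below is a stand-in for annotated grammar rules and is not the W3C tag syntax; `TAGGED_RULE`, `interpret`, `to_xml`, and the `order` element name are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical grammar: each slot maps the phrases a caller may say to a
# semantic value. A slot name of None means the words carry no meaning.
TAGGED_RULE = [
    (None, {"i want": None, "": None}),
    ("quantity", {"a": 1, "one": 1, "two": 2}),
    ("size", {"small": "S", "medium": "M", "large": "L"}),
    (None, {"pizza": None, "pizzas": None}),
]


def interpret(rule, words):
    """Return a dict of semantic values if the words match the rule, else None."""
    if not rule:
        return {} if not words else None
    slot, alternatives = rule[0]
    for phrase, value in alternatives.items():
        tokens = phrase.split()
        if words[:len(tokens)] != tokens:
            continue
        rest = interpret(rule[1:], words[len(tokens):])
        if rest is not None:
            # First approach: the tag attached to the matched phrase is collected.
            return {slot: value, **rest} if slot else rest
    return None


def to_xml(result):
    """Second approach: represent the same result as an XML fragment."""
    root = ET.Element("order")
    for name, value in result.items():
        ET.SubElement(root, name).text = str(value)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    result = interpret(TAGGED_RULE, "i want two large pizzas".split())
    print(result)
    print(to_xml(result))
```

For the utterance "i want two large pizzas" the sketch yields a quantity of 2 and a size of "L", so the application receives structured values instead of the raw words.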


PRONUNCIATION LEXICON

o Application developers sometimes need the ability to tune speech engines, whether for synthesis or recognition.

o W3C is developing a markup language for an open portable specification of pronunciation information using a standard phonetic alphabet.

o The most commonly needed pronunciations are for proper nouns such as surnames or business names.


CALL CONTROL

Provides fine-grained control of speech (signal processing) resources and telephony resources in a VoiceXML telephony platform.

Will enable application developers to use markup to perform call screening, whisper call waiting, call transfer, and more.

Can be used to transfer a user from one voice browser to another on a completely different machine.


TEXT TO SPEECH SYNTHESIS:

1. Pre-processing

2. Text normalization

i) digit normalization

ii) date normalization

iii) abbreviation normalization

3. Part-of-speech annotation

4. Pronunciation lexicon

5. Letter to sound rules

6. Synthesis
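
The text normalization step can be sketched in Python as follows. This covers only step 2 of the list above and is a toy: the abbreviation table, the assumed DD-MM-YYYY date format, and all function names are assumptions made for illustration, and digits are read out one at a time rather than as whole numbers.

```python
import re

# Hypothetical lookup tables for the sketch.
ABBREVIATIONS = {"Dr.": "Doctor", "St.": "Street", "Dept.": "Department"}
DIGITS = ["zero", "one", "two", "three", "four",
          "five", "six", "seven", "eight", "nine"]
MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]


def normalize_dates(text):
    """Rewrite DD-MM-YYYY dates as 'D Month YYYY' (month must be 01 to 12)."""
    def repl(match):
        day, month, year = match.groups()
        return f"{int(day)} {MONTHS[int(month) - 1]} {year}"
    return re.sub(r"\b(\d{2})-(0[1-9]|1[0-2])-(\d{4})\b", repl, text)


def normalize_abbreviations(text):
    """Expand known abbreviations by plain substring replacement."""
    for short, full in ABBREVIATIONS.items():
        text = text.replace(short, full)
    return text


def normalize_digits(text):
    """Spell out every digit individually."""
    return re.sub(r"\d", lambda m: " " + DIGITS[int(m.group())] + " ", text)


def normalize(text):
    """Apply date, abbreviation, and digit normalization in that order."""
    text = normalize_dates(text)
    text = normalize_abbreviations(text)
    text = normalize_digits(text)
    return " ".join(text.split())  # collapse the extra whitespace


if __name__ == "__main__":
    print(normalize("Dr. Roy is in room 42"))
```

Dates are handled before digits so that the month number is turned into a month name before the remaining digits are spelled out.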


VOICE STYLE SHEETS!

Authors want control over how the document is rendered. Aural style sheets provide a basis for controlling a range of features:

Volume

Rate

Pitch

Direction

Spelling out text letter by letter

Speech fonts (male/female, adult/child, etc.)

Inserted text before and after element content

Sound effects and music


CONCLUSION

If voice browsers are meant to replace human operator dialog, they must respond quickly.

Speech recognition, interpretation, and synthesis depend on the implementation.

When a user requests a certain document, several related documents can be downloaded for easier access.


