Usability of Continuous Speech Recognition Programs

Hsin Eu

Committee: Alan Hedge, Ph.D.
Geri Gay, Ph.D.

Design and Environmental Analysis
Cornell University

Overview

Continuous speech recognition programs were brought to market at the end of 1997, with claims that they could recognize users' continuous speech and transcribe it accurately into text-processing software.

Research Goal

The research goal was to determine the critical factors that affect the usability of speech recognition programs in order to generate universal guidelines for the future design of continuous speech recognition software.

Literature Review

1. Speech Recognition Technology

• Terminology

• History of Speech Recognition

• Components of Speech Recognition

• Factors Influencing the Performance of Speech Recognition

Literature Review (Cont.)

2. Using Speech Recognition

• Strengths and Limitations

• Applications of Speech Recognition

Literature Review (Cont.)

3. Current Speech Recognition Software

• Setup, Training, and Dictation

• Features of Current Speech Recognition Programs

• Product Performance

Literature Review (Cont.)

4. Human Computer Interaction in Speech Recognition

• The Interaction between Users and Recognition Programs

• Program and Human Errors

• User Characteristics and Task Performance

Literature Review (Cont.)

4. Human Computer Interaction in Speech Recognition (cont.)

• Guidelines for the Interface Design (excerpted from McLeod, 1988)

- Procedures for developing and implementing an application to meet the needs of the users, including vocabulary design, feedback and error recovery strategies, and training techniques.

Literature Review (Cont.)

• Guidelines for the Interface Design (excerpted from McLeod, 1988)

- Procedures for identifying and controlling sources of inter- and intra-person variability.

- Consideration of the implications of the technology for the organization of working groups.

- Techniques for assessing the usability of a recognition system, including overall task performance, physical and mental workload, and users' subjective responses.

Research I: Web Survey

Research I: Web Survey (Cont.)

I-1. Methods

• Subjects: 351 respondents (including 143 CSRP-users)

Gender     CSRP-User      Non-User       Total
Female     34 (23.9%)     88 (42.9%)     122 (35.2%)
Male       108 (76.1%)    117 (57.1%)    225 (64.8%)
Total      142 (100.0%)   205 (100.0%)   347 (100.0%)

Age        CSRP-User      Non-User       Total
18-25      17 (12.0%)     59 (28.6%)     76 (21.9%)
26-50      95 (66.9%)     116 (56.3%)    211 (60.6%)
51 Plus    30 (21.1%)     31 (15.1%)     61 (17.5%)
Total      142 (100.0%)   206 (100.0%)   348 (100.0%)
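The percentages in the gender breakdown appear to be column percentages, that is, each count divided by its column total. A minimal sketch of that arithmetic follows; the function name `column_percentages` is made up for illustration, and only the gender counts reported above are used.

```python
def column_percentages(counts):
    """Return each count as a percentage of its column total, to one decimal."""
    total = sum(counts.values())
    return {label: round(100 * n / total, 1) for label, n in counts.items()}


# Gender counts as reported on the slide.
csrp_users = {"Female": 34, "Male": 108}
non_users = {"Female": 88, "Male": 117}

print(column_percentages(csrp_users))
print(column_percentages(non_users))
```

The same calculation can be applied to any other count column in the survey tables to check the reported percentages.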

Research I: Web Survey (Cont.)

• Survey Instrument

Section A: General Computer Use
13 questions / 45 items, completed by all respondents (approx. 3-5 minutes)

Section B: Usability of CSRP
31 questions / 201 items, completed by CSRP-users (approx. 15-20 minutes)

• Procedure

Research I: Web Survey (Cont.)

I-2. Results and Discussion on Findings

• General Computer Use

General Computer Use                            Subject Count (%)   Valid N

How long have you been using a computer?                            347
  Less than 1 year                              2 (0.6%)
  1-3 years                                     18 (5.1%)
  More than 3 years                             327 (93.2%)

How many days a week do you use a computer?                         346
  1-3 days                                      4 (1.2%)
  4-5 days                                      51 (14.6%)
  6-7 days                                      291 (82.9%)

Research I: Web Survey (Cont.)

• General Computer Use (Cont.)

Hours of computer use per day, by place           Mean (hours)   SD (hours)
  Office                                          3.79           2.84
  School                                          0.66           1.62
  Home                                            2.16           1.96
  Other                                           0.25           1.16

Total hours of computer use per day, on average   Users (%)      Valid N
  1-3 hours                                       48 (13.7%)     351
  4-6 hours                                       99 (28.2%)
  7-9 hours                                       165 (47%)
  More than 10 hours                              39 (11.1%)

Research I: Web Survey (Cont.)

Tasks                         CSRP-users   Non-users   Significance               All respondents
Composing documents           65.03%       36.76%      t=8.233, df=332, p=.000    48.31%
Database input                45.10%       27.79%      t=4.780, df=286, p=.000    34.84%
Computer image manipulation   18.87%       11.59%      t=2.986, df=271, p=.003    14.54%
Searching information         48.53%       30.82%      t=4.881, df=349, p=.000    38.03%
Browsing information          49.30%       30.29%      t=5.174, df=349, p=.000    38.03%

Research I: Web Survey (Cont.)

• Usability of CSRP

Dragon NaturallySpeaking    IBM ViaVoice                          L&H VoiceXpress (Kurzweil)
Personal 2.0 with Corel     Executive                             Standard
Preferred 2.0               Gold                                  Advanced
Preferred 3.0               Office                                Professional
Standard 3.0                Home                                  for Medicine, General Medicine Edition
Professional                Topic                                 for Medicine, Specialty Edition
for Teens                   IBM MedSpeak/Radiology
Legal Suite                 Professional/Specialty Vocabularies
Medical Suite
Developer Suite

Research I: Web Survey (Cont.)

• Usability of CSRP

CSRP Use                                          Users (%)      Valid N

How long have/had you used your CSRP?                            141
  1-6 months                                      64 (44.8%)
  7-11 months                                     27 (18.9%)
  1-2 years                                       38 (26.6%)
  More than 2 years                               12 (8.4%)

How many days a week do/did you use your CSRP?                   136
  1-2 days                                        42 (29.4%)
  3-5 days                                        28 (19.6%)
  6-7 days                                        66 (46.2%)

Research I: Web Survey (Cont.)

• Usability of CSRP (Cont.)

Hours of CSRP use per day, by place           Mean (hr)      SD (hr)
  Office                                      1.50           1.76
  School                                      0.14           0.64
  Home                                        0.92           1.40
  Other                                       0.20           0.91

Total hours of CSRP use per day, on average   Users (%)      Valid N
  1-3 hours                                   93 (65.1%)     143
  4-6 hours                                   28 (19.6%)
  7-9 hours                                   10 (7.0%)
  More than 10 hours                          1 (0.7%)

Research I: Web Survey (Cont.)

• Usability of CSRP (Cont.)

                                  Average Score (SD)
Dictation (speech to text)
  Emails                          2.76 (1.05)
  Letters                         2.92 (1.05)
  Notes                           2.79 (1.06)
  Reports or papers               2.82 (1.03)
  Slides                          2.08 (1.24)
Navigation (control)
  Within a specific program       2.45 (0.96)
  Between programs                2.12 (1.05)
Editing/correcting documents      2.42 (1.01)
Read back (text to speech)
  Check content of documents      2.71 (1.13)
  Review my works                 2.72 (1.03)

Research I: Web Survey (Cont.)

• Usability of CSRP (Cont.)

CSRP Characteristics Average Score (SD)

Varieties of functions 2.92 (0.74)

Product accuracy 3.81 (0.63)

Vocabulary capacity 3.39 (0.70)

Dictation speed 3.42 (0.76)

Ability to expand vocabulary 3.53 (0.73)

Easy to use 3.44 (0.77)

Price 2.48 (0.86)

Compatibility with other software 3.30 (0.81)

User (technical) support 2.81 (0.86)

Program upgrade support 2.99 (0.86)

Research I: Web Survey (Cont.)

• Usability of CSRP (Cont.)

Aspects Average Score (SD)

Varieties of functions 2.66 (0.86)

Product accuracy 2.86 (1.03)

Vocabulary capacity 3.32 (0.84)

Dictation speed 2.92 (0.95)

Ability to expand vocabulary 3.33 (0.81)

Easy to use 2.99 (0.92)

Price 2.59 (1.01)

Compatibility with other software 2.76 (0.90)

User (technical) support 2.44 (1.01)

Program upgrade support 2.72 (0.97)

Overall satisfaction 2.91 (0.99)

Research I: Web Survey (Cont.)

• Usability of CSRP (Cont.)

Tasks                              Most preferred (%)   2nd most preferred (%)      Valid N
Composing documents                Voice (51.0%)        Voice & Keyboard (25.9%)    137
Correcting mistakes in documents   Keyboard (24.5%)     Voice (23.1%)               138
Editing documents                  Voice (23.8%)        Keyboard (23.1%)            136
Database input                     Keyboard (42.7%)     Voice (23.1%)               123

Research I: Web Survey (Cont.)

• Usability of CSRP (Cont.)

Tasks                          Most preferred (%)   2nd most preferred (%)      Valid N
Computer image manipulation    Mouse (30.1%)        Keyboard & Mouse (19.6%)    113
Searching & browsing           Keyboard (25.2%)     Mouse (16.8%)               132
Navigating within a program    Voice (25.2%)        Mouse (21.0%)               135
Navigating between programs    Voice (21.7%)        Mouse (21.0%)               133

Research I: Web Survey (Cont.)

• Usability of CSRP (Cont.)

Tasks                         DNS-Preferred 3.0   DNS-Professional   Significance
Composing documents           59.35%              74.24%             t=-2.063, df=62, p<0.05
Database input                45.48%              46.67%             Not significant
Computer image manipulation   13.87%              23.44%             Not significant
Searching information         37.10%              58.18%             t=-2.874, df=62, p<0.01
Browsing information          40.00%              58.18%             t=-2.203, df=62, p<0.05

Research I: Web Survey (Cont.)

I-3. Discussion

• Limitations

- Survey distribution

• Future Research

- Survey length

- Survey format

- Qualitative information

Research II: Usability Testing

II-1. Methods

• Subjects: 10 Cornell students
- 5 females and 5 males
- 8 CSRP-novices and 2 CSRP-users
- Age ranged 21-30

• Setting and Instruments
- MVR computer lab
- Dell Pentium II MMX PC / Windows 98
- Dragon NaturallySpeaking Preferred 3.0

Research II: Usability Testing (Cont.)

II-1. Methods (cont.)

• Procedure
- Setup and training

Research II: Usability Testing (Cont.)

• Procedure (cont.)
- Research design

Each section entry reads: Method of Transcription / Editing / Readability of Document*

Subj. #   Level of Experience on CSRP   Section 1          Section 2          Section 3
1         Novice                        Dictate/Type/Ec    Type/Type/Ea       Type/Type/D
2         Novice                        Type/Type/Ec       Dictate/Type/Ea
3         Novice                        Dictate/Type/Ea    Type/Type/Ec
4         Novice                        Type/Type/Ea       Dictate/Type/D     Dictate/Type/Ec

Research II: Usability Testing (Cont.)

- Research design (cont.)

Each section entry reads: Method of Transcription / Editing / Readability of Document*

Subj. #   Level of Experience on CSRP   Section 1            Section 2             Section 3
5         Some                          Dictate/Type/Ec      Type/Type/Ea
6         Novice                        Type/Type/Ec         Dictate/Type/Ea       Dictate/Type/D
7         Novice                        Dictate/Type/Ea      Type/Type/Ec
8         Novice                        Type/Type/Ea         Dictate/Type/Ec
9         Much                          Dictate/Voicing/D    Dictate/Voicing/Ea
10        Novice                        Dictate/Type/Ec      Type/Type/D           Type/Type/Ea

Research II: Usability Testing (Cont.)

II-1. Methods (cont.)

• Procedure (cont.)
- Dependent variables
1. Transcription time
2. Number of transcription errors
3. Editing time
4. Total completion time
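The results slides report these four measures per word (sec/word or errors/word). The sketch below shows one way such per-word measures could be derived from raw trial data. The `Trial` record and `per_word_measures` function are hypothetical, and the sketch assumes that total completion time is transcription time plus editing time, which agrees with the reported means (for example, .526 + 1.058 = 1.584 sec/word) but is not stated on the slides.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    """One subject's attempt at one document section (hypothetical layout)."""
    words: int                 # number of words in the source passage
    transcription_sec: float   # time spent dictating or typing the passage
    transcription_errors: int  # errors counted before any editing
    editing_sec: float         # time spent correcting the transcript


def per_word_measures(trial: Trial) -> dict:
    """Normalize the four dependent variables by passage length."""
    total_sec = trial.transcription_sec + trial.editing_sec  # assumed definition
    return {
        "transcription_time": trial.transcription_sec / trial.words,
        "transcription_errors": trial.transcription_errors / trial.words,
        "editing_time": trial.editing_sec / trial.words,
        "total_completion_time": total_sec / trial.words,
    }


# Made-up numbers purely for illustration, not data from the study.
example = Trial(words=200, transcription_sec=100.0,
                transcription_errors=20, editing_sec=220.0)
print(per_word_measures(example))
```

Normalizing by word count lets passages of different lengths be compared on the same scale.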

Research II: Usability Testing (Cont.)

II-2. Results and Discussion on Findings

• Modality of Transcription

• Gender

                                               Dictating      Typing         Significance
Transcription Time (sec/word)                  .526 (.105)    1.069 (2.60)   t=-6.428, df=7, p=.000
Number of Transcription Errors (errors/word)   .131 (.067)    .041 (.022)    t=-3.636, df=7, p=.008
Type-Editing Time (sec/word)                   1.058 (.244)   .577 (.198)    t=-5.444, df=7, p=.001
Total Completion Time (sec/word)               1.584 (.209)   1.645 (.394)   Not significant
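The reported df=7 is what a paired-samples t-test over eight subjects would give, although the slides do not name the test that was used. Under that assumption, the statistic can be sketched as follows. The function `paired_t` is illustrative and the per-subject values are invented, since the raw data are not part of this presentation.

```python
import math
from statistics import mean, stdev


def paired_t(first, second):
    """Return (t, df) for a paired-samples t-test on two equal-length lists."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    # t = mean difference divided by its standard error; df = n - 1
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1


# Hypothetical per-subject transcription times (sec/word) for 8 subjects.
# These values are invented for illustration and are NOT the study's raw data.
dictating = [0.45, 0.50, 0.62, 0.48, 0.55, 0.70, 0.41, 0.50]
typing = [0.95, 1.10, 1.20, 0.90, 1.05, 1.30, 0.85, 1.20]

t, df = paired_t(dictating, typing)
print(f"t = {t:.3f}, df = {df}")
```

A negative t here simply reflects the order of subtraction (dictating minus typing) when dictation is the faster modality.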

Research II: Usability Testing (Cont.)

II-2. Results and Discussion on Findings (cont.)

• Modality of Editing

Modality of Editing, Easy Documents

                                               Edit by Typing (N=8)   Edit by Voicing (N=1)
Transcription Time (sec/word)                  0.526 (.105)           .438
Number of Transcription Errors (errors/word)   .131 (.067)            .128
Editing Time (sec/word)                        1.058 (.244)           1.203
Total Completion Time (sec/word)               1.584 (.209)           1.641

Research II: Usability Testing (Cont.)

II-2. Results and Discussion on Findings (cont.)

• Modality of Editing (cont.)

Modality of Editing, Difficult Documents

                                               Edit by Typing (N=2)   Edit by Voicing (N=1)
Transcription Time (sec/word)                  0.611 (.017)           .549
Number of Transcription Errors (errors/word)   .195 (.084)            .111
Editing Time (sec/word)                        1.633 (.554)           2.062
Total Completion Time (sec/word)               2.244 (.536)           2.611

Research II: Usability Testing (Cont.)

II-2. Results and Discussion on Findings (cont.)

• Experience on CSRP/DNS

Transcription by Dictating

                                               Experience of CSRP/DNS
                                               None (N=8)     Some (N=1)   Much (N=1)
Transcription Time (sec/word)                  .526 (.105)    .480         .438
Number of Transcription Errors (errors/word)   .131 (.067)    .037         .128
Type-Editing Time (sec/word)                   1.058 (.244)   .471         N.A.
Total Completion Time (sec/word)               1.584 (.209)   .951         N.A.

Research II: Usability Testing (Cont.)

II-2. Results and Discussion on Findings (cont.)

• Experience on CSRP/DNS (cont.)

Transcription by Typing

                                               Experience of CSRP/DNS
                                               None (N=8)     Some (N=1)
Transcription Time (sec/word)                  1.069 (2.60)   1.191
Number of Transcription Errors (errors/word)   .041 (.022)    .036
Type-Editing Time (sec/word)                   .577 (.198)    .414
Total Completion Time (sec/word)               1.645 (.394)   1.606

Research II: Usability Testing (Cont.)

II-2. Results and Discussion on Findings (cont.)

• Readability of Articles

Transcription by Typing

                                               Readability of Article (N=2)
                                               Easy           Difficult      Significance
Transcription Time (sec/word)                  1.211 (.383)   1.525 (.440)   Not significant
Number of Transcription Errors (errors/word)   .030 (.031)    .051 (.003)    Not significant
Type-Editing Time (sec/word)                   .510 (.332)    .943 (.272)    Not significant
Total Completion Time (sec/word)               1.721 (.716)   2.467 (.713)   t=-397.339, df=1, p=.002

Research II: Usability Testing (Cont.)

II-3. Discussion

• Compare Findings to Previous Research

Research             Tested program      Accuracy (%)   Corrected words per minute
Present study        DNS Preferred 3.0   < 95%          29.8
Karat et al., 1999   DNS Preferred 2.0   N.A.           25.1
Poor, 1998           DNS Preferred 2.0   < 95%          N.A.
Linderholm, 1998     DNS Preferred 3.0   N.A.           43.0

Research II: Usability Testing (Cont.)

II-3. Discussion (cont.)

• Limitations

- Sample size

• Future Research

- CSRP-users

- Testing time

- Article readability

- Human performance vs. program performance

Conclusion

• Critical Factors that Affect CSRP Usability

- Program accuracy
- Program reliability
- Requirement of user-dependent training
- Requirement of memorization
- Ease of error correction
- Ability to learn from mistakes
- Accommodation for people with disabilities
- Hardware compatibility
- Environmental noise level

Conclusion (Cont.)

• Guidelines for Future Design

A continuous speech recognition program should
- have high program accuracy
- have high program reliability
- eliminate the requirement of user-dependent training
- reduce the requirement of memorization
- maximize the ease of error correction
- have the ability to learn from mistakes
- accommodate the needs of people with disabilities
- provide a wide range of hardware compatibility
- minimize the sensitivity to environmental noise