Posted on 15-Jan-2016
A Prototype Personal Dictation System
Adam Janin
Final Goal – A Portable Meeting Recorder
Record impromptu meetings in a natural environment.
Detect multiple speakers.
Allow correction and annotation.
Support indexing and searching.
Self-contained (using IRAM).
Intermediate Goal – A Personal Dictation System
Record a single user dictating text.
Allow correction and editing.
Hosted system:
  ASR runs on workstation.
  GUI runs on Pilot.
  Communicate via wired network.
  Close-talking mic.
  Limited domain (Broadcast News).
Asides...
Why not Wizard of Oz?
  The structure of the correction mechanism is recognizer-specific.
  Develop infrastructure.
  Produce a working demo.
Informal user study, mostly with speech researchers.
Architecture
Palm Pilot
Correct transcripts
Edit transcripts
Create new text
Sun Workstation
Audio frontend
Speech recognizer
Correction server
Correcting and Editing
Correcting – informing the recognizer that it has made an error.
If the recognizer has a good idea of the alternatives, it may be faster to correct than to edit.
The recognizer can adapt to the user and vocabulary.
Editing – changing the output: “That’s not what I meant to say.”
Text vs. speech input.
Correction Methods: Background
A lattice contains the recognizer’s best guesses.
More compact than N-best lists.
Contains word order and timing.
1) the records …
2) a rack …
3) the wreck or …
4) a record …
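To make the lattice idea concrete, here is a minimal Python sketch of a toy lattice that encodes the four hypotheses above as word arcs carrying start/end times and log scores. The node numbering, times, and scores are hypothetical values chosen so that expanding the lattice reproduces the slide's ranking; a real recognizer's lattice format would differ. Shared prefixes already make it smaller than the list: 7 arcs here versus 9 word tokens in the 4-best list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Arc:
    src: int      # start node id
    dst: int      # end node id
    word: str
    start: float  # start time in seconds (hypothetical)
    end: float    # end time in seconds (hypothetical)
    score: float  # log-probability, higher is better (hypothetical)


# Toy lattice for "the records", "a rack", "the wreck or", "a record".
# Node 0 is the start and node 4 the end; separate nodes after "the" (1)
# and "a" (2) keep exactly these four paths.
ARCS = [
    Arc(0, 1, "the", 0.00, 0.15, -1.0),
    Arc(0, 2, "a", 0.00, 0.15, -1.2),
    Arc(1, 4, "records", 0.15, 0.70, -2.0),
    Arc(1, 3, "wreck", 0.15, 0.45, -1.5),
    Arc(3, 4, "or", 0.45, 0.70, -1.8),
    Arc(2, 4, "rack", 0.15, 0.70, -2.3),
    Arc(2, 4, "record", 0.15, 0.70, -3.3),
]
START, END = 0, 4


def paths(node=START):
    """Yield every path from `node` to END as a list of arcs (fine for toy sizes)."""
    if node == END:
        yield []
        return
    for arc in ARCS:
        if arc.src == node:
            for rest in paths(arc.dst):
                yield [arc] + rest


def n_best(n):
    """Expand the lattice into an N-best list of (sentence, total log score)."""
    scored = [(" ".join(a.word for a in p), sum(a.score for a in p)) for p in paths()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:n]


for rank, (sentence, score) in enumerate(n_best(4), start=1):
    print(f"{rank}) {sentence}  ({score:.1f})")
```

Exhaustive path enumeration is only reasonable for a toy lattice; real lattices are searched with dynamic programming instead.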
Correction Methods: Selecting Hypotheses
User corrects “records”.
1) the records …
2) a rack …
3) the wreck or …
4) a record …
System picks all words that overlap “records” in time.
Presents them in order from most to least likely.
Note: full overlap is probably not optimal.
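A minimal sketch of the selection step, using the same hypothetical toy lattice: every arc that overlaps the tapped word in time is collected, and each candidate word is ranked by the best complete-path score passing through it. Ranking by best-path score (found here by brute-force path enumeration) is an assumption; a real system might instead use word posteriors from a forward-backward pass over the lattice.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Arc:
    src: int
    dst: int
    word: str
    start: float  # seconds (hypothetical)
    end: float    # seconds (hypothetical)
    score: float  # log-probability, higher is better (hypothetical)


# Same hypothetical toy lattice: "the records", "a rack", "the wreck or", "a record".
ARCS = [
    Arc(0, 1, "the", 0.00, 0.15, -1.0),
    Arc(0, 2, "a", 0.00, 0.15, -1.2),
    Arc(1, 4, "records", 0.15, 0.70, -2.0),
    Arc(1, 3, "wreck", 0.15, 0.45, -1.5),
    Arc(3, 4, "or", 0.45, 0.70, -1.8),
    Arc(2, 4, "rack", 0.15, 0.70, -2.3),
    Arc(2, 4, "record", 0.15, 0.70, -3.3),
]
START, END = 0, 4


def paths(node=START):
    """Yield every path from `node` to END as a list of arcs."""
    if node == END:
        yield []
        return
    for arc in ARCS:
        if arc.src == node:
            for rest in paths(arc.dst):
                yield [arc] + rest


def alternatives(selected):
    """Words that overlap `selected` in time, most likely first.

    Each word is scored by the best complete path that passes through it.
    """
    best = {}
    for path in paths():
        total = sum(a.score for a in path)
        for arc in path:
            overlaps = arc.start < selected.end and selected.start < arc.end
            if overlaps and arc.word != selected.word:
                best[arc.word] = max(best.get(arc.word, float("-inf")), total)
    # Highest score first; ties broken alphabetically for a stable display.
    return sorted(best, key=lambda word: (-best[word], word))


tapped = ARCS[2]  # the user taps "records" (0.15-0.70 s)
print(alternatives(tapped))
```

With this any-overlap rule, “wreck” and “or” each appear as single-word choices even though each covers only part of “records”, one illustration of why the overlap criterion deserves more experimentation.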
Correction Methods: Rescoring
User corrects “records” to “record”.
1) the records …
2) a rack …
3) the wreck or …
4) a record …
Unexpected changes! The only hypothesis containing “record” is “a record”, so “the” also becomes “a”.
Select only paths with “record”.
Rescore lattice.
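The rescoring step can be sketched the same way, under the same toy-lattice assumptions: keep only the paths that contain the corrected word over the original word's time span, then take the best-scoring survivor. This is one plausible reading of “select paths, rescore”, not the prototype's actual code; returning `None` when no path matches is a hypothetical hook for correcting to a word that is not in the lattice.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Arc:
    src: int
    dst: int
    word: str
    start: float  # seconds (hypothetical)
    end: float    # seconds (hypothetical)
    score: float  # log-probability, higher is better (hypothetical)


# Same hypothetical toy lattice: "the records", "a rack", "the wreck or", "a record".
ARCS = [
    Arc(0, 1, "the", 0.00, 0.15, -1.0),
    Arc(0, 2, "a", 0.00, 0.15, -1.2),
    Arc(1, 4, "records", 0.15, 0.70, -2.0),
    Arc(1, 3, "wreck", 0.15, 0.45, -1.5),
    Arc(3, 4, "or", 0.45, 0.70, -1.8),
    Arc(2, 4, "rack", 0.15, 0.70, -2.3),
    Arc(2, 4, "record", 0.15, 0.70, -3.3),
]
START, END = 0, 4


def paths(node=START):
    """Yield every path from `node` to END as a list of arcs."""
    if node == END:
        yield []
        return
    for arc in ARCS:
        if arc.src == node:
            for rest in paths(arc.dst):
                yield [arc] + rest


def rescore(old, new_word):
    """Best path containing `new_word` where `old` was; None if no path has it."""
    def matches(arc):
        return arc.word == new_word and arc.start < old.end and old.start < arc.end

    survivors = [p for p in paths() if any(matches(a) for a in p)]
    if not survivors:
        return None  # hypothetical hook: fall back to plain text substitution
    best = max(survivors, key=lambda p: sum(a.score for a in p))
    return " ".join(a.word for a in best)


print(rescore(ARCS[2], "record"))  # user corrects "records" to "record"
```

Because the only surviving path is “a record”, the neighboring word “the” changes to “a” as well, the kind of unexpected change noted above.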
Editing
Allows user to add or edit text arbitrarily.
Must synchronize with correction server.
Edit vs. correct is currently implemented modally, with on-screen push buttons.
Gestural interface for correcting and editing would be preferable.
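The modal design can be modeled as a small state machine. In this hypothetical sketch (the class, method, and message names are illustrative, not taken from the prototype), the push buttons set the mode, a tap in correct mode asks the correction server for alternatives, and an edit changes the text locally and queues a message so that the server's copy of the transcript stays synchronized.

```python
from enum import Enum


class Mode(Enum):
    CORRECT = "correct"
    EDIT = "edit"


class PilotScreen:
    """Hypothetical model of the modal Pilot GUI; names are illustrative."""

    def __init__(self, words):
        self.words = list(words)
        self.mode = Mode.CORRECT  # chosen with the on-screen push buttons
        self.outbox = []          # messages queued for the correction server

    def press_button(self, mode):
        self.mode = mode

    def tap_word(self, index, typed_text=None):
        if self.mode is Mode.CORRECT:
            # Correcting: ask the server for lattice alternatives for this word.
            self.outbox.append(("alternatives", index))
        else:
            if typed_text is None:
                raise ValueError("edit mode needs replacement text")
            # Editing: change the text locally, then notify the server so its
            # copy of the transcript stays synchronized.
            self.words[index] = typed_text
            self.outbox.append(("edit", index, typed_text))


screen = PilotScreen(["the", "records"])
screen.tap_word(1)                # correct mode: request alternatives
screen.press_button(Mode.EDIT)
screen.tap_word(1, "recordings")  # edit mode: replace the text
print(screen.words, screen.outbox)
```

A gestural interface would replace `press_button` with gesture recognition, but the same split between server-side correction and locally applied, synchronized edits would remain.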
Details...
Correction allows for words not in the lattice.
Tap to correct worked better than press-and-hold.
System updates text when user pauses.
Doesn’t handle punctuation, paragraphs, etc.
Correction is fast, but dictation is slow.
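The slides do not say how pauses are detected. One common approach, shown here as an assumption rather than the prototype's method, is energy-based endpointing: a pause is declared after a run of low-energy frames that follows speech. The frame size, threshold, and pause length are hypothetical values for 16 kHz, 16-bit audio.

```python
import math

FRAME = 160          # samples per 10 ms frame at 16 kHz (assumed)
SILENCE_RMS = 500.0  # energy threshold for 16-bit samples (hypothetical)
PAUSE_FRAMES = 30    # 30 quiet frames = 300 ms counts as a pause (hypothetical)


def rms(frame):
    """Root-mean-square energy of one frame."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def pause_points(samples):
    """Yield the sample offset at which each pause after speech is confirmed."""
    quiet = 0
    spoke = False  # only a pause that follows speech triggers an update
    for i in range(0, len(samples) - FRAME + 1, FRAME):
        if rms(samples[i:i + FRAME]) < SILENCE_RMS:
            quiet += 1
            if spoke and quiet == PAUSE_FRAMES:
                spoke = False
                yield i + FRAME  # a good moment to refresh the displayed text
        else:
            quiet = 0
            spoke = True


speech = [1000, -1000] * 8000  # 1 s of loud synthetic "speech"
audio = speech + [0] * 8000 + speech[:8000] + [0] * 6400
print(list(pause_points(audio)))
```

A fixed threshold is fragile with a changing noise floor; it is tolerable here only because the hosted system uses a close-talking mic.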
Future Work
“Real” user studies.
Experiment more with correction mechanisms.
Implement editing synchronization.
Implement gestures.
Move to wireless network and mic.
Add punctuation, paragraphs, etc.