A COMPARISON OF HAND-CRAFTED SEMANTIC GRAMMARS VERSUS STATISTICAL NATURAL LANGUAGE PARSING IN DOMAIN-SPECIFIC VOICE TRANSCRIPTION
Curry Guinn
Dave Crist
Haley Werth
Outline
» Probabilistic language models
» N-grams
» The EPA project
» Experiments
Probabilistic Language Processing: What is it?
Assume a note is given to a bank teller, which the teller reads as “I have a gub.” (cf. Woody Allen’s Take the Money and Run)
NLP to the rescue …
» gub is not a word
» gun, gum, Gus, and gull are words, but gun has a higher probability in the context of a bank
Real Word Spelling Errors
» They are leaving in about fifteen minuets to go to her house.
» The study was conducted mainly be John Black.
» Hopefully, all with continue smoothly in my absence.
» Can they lave him my messages?
» I need to notified the bank of….
» He is trying to fine out.
Letter-based Language Models
Shannon’s Game
» Guess the next letter: What do you think the next letter is?
» Guess the next word: What do you think the next word is?
Word-based Language Models
A model that enables one to compute the probability, or likelihood, of a sentence S: P(S).
» Simple: every word follows every other word with equal probability (0-gram)
– Assume |V| is the size of the vocabulary V
– Likelihood of a sentence S of length n is 1/|V| × 1/|V| × … × 1/|V| = (1/|V|)^n
– If English has 100,000 words, the probability of each next word is 1/100,000 = 0.00001
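To make the arithmetic concrete, a minimal Python sketch of the 0-gram computation, assuming the 100,000-word vocabulary from above:

# 0-gram model: every word is equally likely, so P(S) depends only on sentence length.
def zerogram_likelihood(sentence, vocab_size=100_000):
    """P(S) = (1/|V|)^n for a sentence of n words."""
    return (1.0 / vocab_size) ** len(sentence.split())

print(zerogram_likelihood("I am in the kitchen"))  # (1/100000)^5 = 1e-25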
Word Prediction: Simple vs. Smart
• Smarter: the probability of each next word is proportional to its frequency (unigram)
– Likelihood of sentence S = P(w1) × P(w2) × … × P(wn)
– Assumes the probability of each word is independent of the other words.
• Even smarter: condition each word on the words before it (N-gram)
– Likelihood of sentence S = P(w1) × P(w2|w1) × … × P(wn|wn-1)
– Assumes the probability of each word depends on the word(s) that precede it.
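A minimal sketch of the unigram and bigram likelihood computations, assuming a toy tokenized corpus (the corpus below is illustrative, not from the EPA diaries). Note that unseen words or word pairs get probability zero in this unsmoothed version:

from collections import Counter

toy_corpus = [
    "i am in the kitchen".split(),
    "i am cooking in the kitchen".split(),
    "i am in the yard".split(),
]

unigram_counts = Counter(w for sent in toy_corpus for w in sent)
bigram_counts = Counter((a, b) for sent in toy_corpus for a, b in zip(sent, sent[1:]))
total_words = sum(unigram_counts.values())

def unigram_likelihood(words):
    # P(S) = P(w1) * P(w2) * ... * P(wn); each word treated as independent.
    p = 1.0
    for w in words:
        p *= unigram_counts[w] / total_words
    return p

def bigram_likelihood(words):
    # P(S) = P(w1) * P(w2|w1) * ... * P(wn|wn-1); each word conditioned on its predecessor.
    p = unigram_counts[words[0]] / total_words
    for a, b in zip(words, words[1:]):
        p *= bigram_counts[(a, b)] / unigram_counts[a]
    return p

sent = "i am in the kitchen".split()
print(unigram_likelihood(sent), bigram_likelihood(sent))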
Training and Testing
» Probabilities come from a training corpus, which is used to design the model.
– Overly narrow corpus: probabilities don't generalize
– Overly general corpus: probabilities don't reflect the task or domain
» A separate test corpus is used to evaluate the model, typically with standard metrics.
– Held-out test set
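A hedged sketch of the train/held-out split described above. The add-one smoothing and the use of perplexity as the held-out metric are my assumptions for illustration, not details from the slides:

import math
from collections import Counter

# Estimate unigram probabilities on a training corpus, evaluate on a held-out set.
train = ["i am in the kitchen".split(), "i am cooking dinner".split()]
held_out = ["i am in the yard".split()]

counts = Counter(w for sent in train for w in sent)
vocab = set(counts) | {w for sent in held_out for w in sent}
total = sum(counts.values())

def smoothed_p(w):
    # Add-one smoothing keeps unseen held-out words from zeroing the product.
    return (counts[w] + 1) / (total + len(vocab))

# Perplexity of the held-out set: one standard evaluation metric (lower is better).
log_p = sum(math.log(smoothed_p(w)) for sent in held_out for w in sent)
n_words = sum(len(sent) for sent in held_out)
print(math.exp(-log_p / n_words))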
Simple N-Grams
An N-gram model uses the previous N-1 words to predict the next one:
» P(wn | wn-N+1 wn-N+2 … wn-1)
– unigrams: P(dog)
– bigrams: P(dog | big)
– trigrams: P(dog | the big)
– quadrigrams: P(dog | chasing the big)
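These conditional probabilities can be estimated by maximum likelihood from corpus counts: P(wn | context) = count(context + wn) / count(context). A sketch, assuming a pre-tokenized corpus (the corpus here is illustrative) and N >= 2:

from collections import Counter

def ngram_probability(corpus, context, word):
    # Count all N-grams and all (N-1)-gram contexts, then take their ratio.
    n = len(context) + 1
    ngram_counts = Counter(
        tuple(sent[i:i + n]) for sent in corpus for i in range(len(sent) - n + 1)
    )
    context_counts = Counter(
        tuple(sent[i:i + n - 1]) for sent in corpus for i in range(len(sent) - n + 2)
    )
    return ngram_counts[tuple(context) + (word,)] / context_counts[tuple(context)]

corpus = [["chasing", "the", "big", "dog"], ["the", "big", "cat"]]
print(ngram_probability(corpus, ["the", "big"], "dog"))  # trigram estimate: 0.5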
The EPA task
Detailed diary of a single individual’s daily activity and location
Methods of collecting the data:
» External observer
» Camera
» Self-reporting
– Paper diary
– Handheld menu-driven diary
– Spoken diary
Spoken Diary
From an utterance like “I am in the kitchen cooking spaghetti”, map that utterance into
» Activity(cooking)
» Location(kitchen)
Text Abstraction Technique
» Build a grammar
» Example
Sample Semantic Grammar
ACTIVITY_LOCATION -> ACTIVITY' LOCATION' : CHAD(ACTIVITY',LOCATION') .
ACTIVITY_LOCATION -> LOCATION' ACTIVITY' : CHAD(ACTIVITY',LOCATION') .
ACTIVITY_LOCATION -> ACTIVITY' : CHAD(ACTIVITY', null) .
ACTIVITY_LOCATION -> LOCATION' : CHAD(null,LOCATION') .
LOCATION -> IAM LOCx' : LOCx' .
LOCATION -> LOCx' : LOCx' .
IAM -> IAM1 .
IAM -> IAM1 just .
IAM -> IAM1 going to .
IAM -> IAM1 getting ready to .
IAM -> IAM1 still .
LOC2 -> HOUSE_LOC' : HOUSE_LOC' .
LOC2 -> OUTSIDE_LOC' : OUTSIDE_LOC' .
LOC2 -> WORK_LOC' : WORK_LOC' .
LOC2 -> OTHER_LOC' : OTHER_LOC' .
HOUSE_LOC -> kitchen : kitchen_code .
HOUSE_LOC -> bedroom : bedroom_code .
HOUSE_LOC -> living room : living_room_code .
HOUSE_LOC -> house : house_code .
HOUSE_LOC -> garage : garage_code .
HOUSE_LOC -> home : house_code .
HOUSE_LOC -> bathroom : bathroom_code .
HOUSE_LOC -> den : den_code .
HOUSE_LOC -> dining room : dining_room_code .
HOUSE_LOC -> basement : basement_code .
HOUSE_LOC -> attic : attic_code .
OUTSIDE_LOC -> yard : yard_code .
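As a rough operational illustration, the sketch below does what the grammar's terminal rules do: match known phrases in the utterance and emit their codes, combined as CHAD(activity, location). It is a keyword-matching simplification, not the actual grammar-driven parser; the activity table and cooking_code are assumed for illustration (the location codes appear in the grammar above):

# Simplified phrase tables; the real grammar has many more rules.
HOUSE_LOC = {
    "kitchen": "kitchen_code", "bedroom": "bedroom_code",
    "living room": "living_room_code", "yard": "yard_code",
}
ACTIVITY = {"cooking": "cooking_code", "sleeping": "sleeping_code"}  # hypothetical codes

def parse_utterance(utterance):
    # Return the first matching location and activity codes, or None for each.
    text = utterance.lower()
    location = next((code for phrase, code in HOUSE_LOC.items() if phrase in text), None)
    activity = next((code for phrase, code in ACTIVITY.items() if phrase in text), None)
    return ("CHAD", activity, location)  # CHAD(activity, location)

print(parse_utterance("I am in the kitchen cooking spaghetti"))
# ('CHAD', 'cooking_code', 'kitchen_code')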
Statistical Natural Language Parsing
» Use unigram, bigram, and trigram probabilities.
» Use Bayes’ rule to obtain these probabilities: P(A|B) = P(B|A) × P(A) / P(B)
The probability P(“kitchen” | 30121 Kitchen) is estimated as the percentage of times the word “kitchen” appears in diary entries that have been transcribed into the category 30121 Kitchen.
P(30121 Kitchen) is the probability that a diary entry is of the semantic category 30121 Kitchen.
P(“kitchen”) is the probability that “kitchen” appears in any diary entry.
Bayes’ rule can be extended to take into account each word in the input string.
P(30121 Kitchen | “kitchen”) = P(“kitchen” | 30121 Kitchen) × P(30121 Kitchen) / P(“kitchen”)
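Putting the pieces together, a hedged sketch of this approach as a naive Bayes classifier: pick the category C that maximizes P(C) × Π P(w|C) over the words w of the entry. The training data, the add-one smoothing, and any category code other than 30121 Kitchen are illustrative assumptions, not details from the paper:

import math
from collections import Counter, defaultdict

training = [  # (category, diary entry); entries and the 30127 code are made up
    ("30121 Kitchen", "i am in the kitchen cooking"),
    ("30121 Kitchen", "making dinner in the kitchen"),
    ("30127 Yard",    "i am out in the yard"),
]

word_counts = defaultdict(Counter)
cat_counts = Counter()
for category, entry in training:
    cat_counts[category] += 1
    word_counts[category].update(entry.split())
vocab = {w for _, entry in training for w in entry.split()}

def classify(utterance):
    best, best_score = None, float("-inf")
    for category in cat_counts:
        total = sum(word_counts[category].values())
        # log P(C) + sum over words of log P(w|C), with add-one smoothing
        score = math.log(cat_counts[category] / len(training))
        for w in utterance.split():
            score += math.log((word_counts[category][w] + 1) / (total + len(vocab)))
        if score > best_score:
            best, best_score = category, score
    return best

print(classify("cooking in the kitchen"))  # -> '30121 Kitchen'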
The Experiment
» Digital voice recorder + heart rate monitor
» The heart rate monitor beeps if the rate changes by more than 15 beats per minute between measurements (taken every 2 minutes).
Subjects

ID  Sex  Occupation                  Age  Education
1   F    Manages Internet Company    52   Some College
2   F    Grocery Deli Worker         18   Some College
3   M    Construction Worker         35   High School
4   F    Database Coordinator        29   Graduate Degree
5   F    Coordinator for Non-profit  56   Some College
6   M    Unemployed                  50   High School
7   M    Retired                     76   High School
8   M    Disabled                    62   High School
9   M    Environment Technician      56   Graduate Degree
Recordings Per Day
[Figure: recordings per day (0–40) plotted against day of study (1–7).]
Heart Rate Change Indicator Tones and Subject
Compliance
Subject   Tones Per Day (Avg.)   Diary Entries Corresponding to a Tone (%)
1 22.1 45%
2 41.8 29%
3 32.5 36%
4 33.0 55%
5 33.3 36%
6 15.6 40%
7 32.5 37%
8 26.0 22%
9 22.7 31%
Per Word Speech Recognition
Subject   Per-Word Recognition Rate (%)
1 63
2 54
3 59
4 61
5 29
6 17
7 45
8 49
9 56
Semantic Grammar Location/Activity Encoding Precision and Recall
Subject   Word Rec. Rate (%)   Location Precision   Location Recall   Activity Precision   Activity Recall
1 63 93 70 84 57
2 54 91 61 81 55
3 59 94 69 92 60
4 61 86 72 95 62
5 29 66 15 75 16
6 17 55 13 51 14
7 45 70 50 70 48
8 49 71 55 79 54
9 56 85 70 84 66
Av. 48.1 79 52.7 79 48
Word Recognition Accuracy’s Effect on Semantic Grammar Precision and Recall
[Figure: Location Precision, Location Recall, Activity Precision, and Activity Recall (0–100%) plotted against Word Recognition Accuracy (0–80%).]
Statistical Processing Accuracy
                           Activity Accuracy   Location Accuracy
Hand-transcribed           86.7%               87.5%
Using speech recognition   48.3%               49.0%
Word Recognition Affects Statistical Semantic Categorization
Subject   Rec. Rate (%)   Location Accuracy (%)   Activity Accuracy (%)
1 63 77 69
2 54 43 48
3 59 56 63
4 61 61 71
5 29 22 26
6 17 23 23
7 45 42 38
8 49 43 46
9 56 68 52
Av. 48.1 48 49
Per Word Recognition Rate Versus Statistical Semantic Encoding Accuracy
[Figure: Activity Precision and Location Precision (0–100%) plotted against Per-Word Recognition Rate (0–70%).]
Time, Activity, Location, Exertion Data Gathering Platform
[Diagram: platform components]
» Voice in / speech out, with voice & sound through an earpiece
» Wireless interface: USB iBean wireless receiver
» Sensors: exertion (heart rate), motion (accelerometer), activity/location (picture), location (GPS), location (RFID)
» Forms: activity, location, diet, products
Research Topics
» Currently, guesses for the current activity and location are computed independently of each other.
– They are not independent!
» Currently, guesses are based only on the current utterance.
– However, the current activity/location is not independent of previous activities/locations (see the sketch after this list).
» How do we fuse data from other sources (GPS, beacons, heart rate monitor, etc.)?
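As a hedged illustration of the second topic, one could weight the per-utterance classifier scores by a first-order transition prior P(location_t | location_t-1) learned from the diaries. This is a sketch of the idea, not anything implemented in the study; all numbers below are made-up placeholders:

# P(next location | previous location); illustrative probabilities only.
transition = {
    "kitchen": {"kitchen": 0.6, "dining room": 0.3, "yard": 0.1},
    "yard":    {"yard": 0.5, "kitchen": 0.3, "garage": 0.2},
}

def rescore(previous_location, utterance_scores):
    """Combine the classifier's per-utterance scores with the transition prior."""
    return max(
        utterance_scores,
        key=lambda loc: utterance_scores[loc] * transition[previous_location].get(loc, 0.01),
    )

# The classifier is torn between "dining room" and "yard" after "kitchen":
print(rescore("kitchen", {"dining room": 0.45, "yard": 0.45, "kitchen": 0.10}))
# -> 'dining room' (the transition prior breaks the tie)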