Improving Upon Semantic Classification of Spoken Diary Entries Using Pragmatic
Context Information
Daniel J. Rayburn Reeves, Curry I. Guinn
University of North Carolina Wilmington
Overview
Introduction
Problem definition
Hypotheses
– Hypothesis 1: Using Context
– Hypothesis 2: Using Thresholds
Limitations and Future Work
EPA Chemical Exposure Study
Create models of exposure to various chemicals
Activity/Location/Time/Energy expenditure database
Requires data
Database
Necessary data from study:
– Date/Time
– Location
– Activity
Activity and location representation: CHAD
– Consolidated Human Activity Database
– Designed by EPA
– Single representation for location and activity
Background on Data collection
Recall Data
Real-time Paper Diaries
Direct Observation
Digital voice diaries
Sony Voice Recorder
Subject recorded daily locations/activities
1220 utterances
Transcribed and classified
Database Sample
Time Recorded | Utterance | CHAD Location | CHAD Activity
8:57 AM | in the bedroom starting housework | 30125 - Bedroom | 11200 - Indoor chores
8:59 AM | carrying clothes to the laundry room | 30128 - Utility room / Laundry room | 11410 - Wash clothes
9:00 AM | the bedroom getting more clothes | 30125 - Bedroom | 11410 - Wash clothes
9:05 AM | loading the washing machine in the laundry room | 30128 - Utility room / Laundry room | 11410 - Wash clothes
9:06 AM | sitting down going to watch twenty minutes of Regis | 30122 - Living room / family room | 17223 - Watch TV
9:23 AM | I'm going to be brushing the dog in the family room | 30122 - Living room / family room | 11800 - Care for pets/animals
Problem Definition
Difficulties in human encoding:
– Error prone
– Inefficient
– Expensive
Computer classification assistance
Possible Solution:
– Statistical language processing to perform text abstraction
Solution Strategies – Word-only system
Word n-grams at utterance level to identify the most likely semantic categories
– Probabilistic relationship between words
N-grams
Diary entry substrings
Word relationships
These relationships used in word-only n-gram model
Example: “I am walking to the store”
– Trigram: “I am walking”
– Bigram: “am walking”
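The trigram and bigram in the example above can be reproduced with a few lines of Python. This is only an illustrative sketch: the function name `word_ngrams`, the lowercasing, and the whitespace tokenization are assumptions, not details taken from the study.

```python
def word_ngrams(utterance: str, n: int) -> list[tuple[str, ...]]:
    """Return every run of n consecutive words in the utterance."""
    # Assumption: simple lowercase + whitespace tokenization
    words = utterance.lower().split()
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


entry = "I am walking to the store"
trigrams = word_ngrams(entry, 3)  # first item: ("i", "am", "walking")
bigrams = word_ngrams(entry, 2)   # includes ("am", "walking")
```

A six-word entry yields four trigrams and five bigrams; an entry shorter than n yields an empty list.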
Leave-one-out testing
Problems with single data set
– Database small size
– Single test/training set bias
– More data sets with better diversity
Leave-one-out testing
– 1 test set = 1 day of recordings from 1 subject
– 42 training/testing sets in all
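One way to build splits where each test set is one day of recordings from one subject is sketched below. The record fields (`subject`, `day`, `text`) and the toy entries are hypothetical and exist only to make the example runnable; the study's actual data layout is not described in the slides.

```python
from collections import defaultdict


def leave_one_day_out(entries):
    """Yield (held_out_key, train, test), one split per (subject, day)."""
    by_day = defaultdict(list)
    for entry in entries:
        by_day[(entry["subject"], entry["day"])].append(entry)
    for key, test in by_day.items():
        # Train on every entry that does not belong to the held-out day
        train = [e for k, group in by_day.items() if k != key for e in group]
        yield key, train, test


# Toy records; subjects, days, and groupings are illustrative only
entries = [
    {"subject": "A", "day": 1, "text": "in the bedroom starting housework"},
    {"subject": "A", "day": 1, "text": "loading the washing machine"},
    {"subject": "A", "day": 2, "text": "sitting down going to watch tv"},
    {"subject": "B", "day": 1, "text": "brushing the dog"},
]
splits = list(leave_one_day_out(entries))
```

With three distinct (subject, day) pairs in the toy data, this produces three train/test splits; the same grouping applied to the full database would give the 42 sets mentioned above.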
Word-only system results
Leave-one-out test sets
– Location: 65.5% correct
– Activity: 55.3% correct
Hypothesis 1
Word + context system
– Performing statistical NLP text abstraction using multi-diary entry contextual information will improve the disambiguation of human speech diary entries over the word-only n-gram model applied to single diary entries in the word-only study.
Reasoning for using context
Information the human used when encoding
Relationship between activities and locations
– Relationship between current location and current activity
– Relationship between current location and previous location
Previous context information
Past context information helps disambiguate
Diary Entry: “in the office at the computer”
– Correct Location: Study or Home Office
– Previous Location: Living room / family room
– Top 3 Location Word-only Choices (w/ probability)
0.904 - Office building/bank/post office
0.217 - Public building/library/museum/theater
0.053 - Public garage / parking
6 context relationships
Current location given:
– Current activity
– Previous activity
– Previous location
Current activity given:
– Current location
– Previous location
– Previous activity
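Each of the six relationships can be read as a conditional probability table, for example P(current location | previous location). The slides do not say how these were estimated; the sketch below assumes plain relative-frequency counts, and the function name and toy pairs are hypothetical.

```python
from collections import Counter, defaultdict


def conditional_table(pairs):
    """Estimate P(target | given) by relative frequency over (given, target) pairs."""
    counts = defaultdict(Counter)
    for given, target in pairs:
        counts[given][target] += 1
    return {
        given: {t: n / sum(targets.values()) for t, n in targets.items()}
        for given, targets in counts.items()
    }


# Toy (previous location, current location) pairs written as CHAD codes;
# the pairs themselves are made up for illustration
pairs = [
    ("30125", "30128"),
    ("30125", "30128"),
    ("30125", "30122"),
    ("30128", "30125"),
]
table = conditional_table(pairs)
```

Here `table["30125"]["30128"]` is 2/3, because two of the three toy transitions out of the bedroom lead to the laundry room. The same function would serve for any of the six given/target pairings.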
Context incorporation
How much do we weight the words in the utterance versus the context information?
We assumed a linear combination of weights
We applied a brute force search of coefficients to achieve the optimal results
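A linear combination with a brute-force coefficient search might look like the following sketch. The grid step, the constraint that weights sum to 1, the source names, and the toy scores are all assumptions made for illustration; the slides state only that a linear combination and a brute-force search were used.

```python
from itertools import product


def combine(scores_by_source, weights):
    """Weighted sum of per-category scores from each information source."""
    categories = set().union(*(s.keys() for s in scores_by_source.values()))
    return {
        c: sum(weights[src] * scores.get(c, 0.0)
               for src, scores in scores_by_source.items())
        for c in categories
    }


def weight_grid(sources, step=0.1):
    """Yield every weight assignment on the grid whose weights sum to 1."""
    ticks = round(1 / step)
    for combo in product(range(ticks + 1), repeat=len(sources)):
        if sum(combo) == ticks:
            yield {s: k / ticks for s, k in zip(sources, combo)}


def best_weights(examples, sources, step=0.1):
    """Brute-force search: keep the weights with the highest accuracy."""
    def accuracy(weights):
        hits = 0
        for scores_by_source, gold in examples:
            combined = combine(scores_by_source, weights)
            hits += max(combined, key=combined.get) == gold
        return hits / len(examples)
    return max(weight_grid(sources, step), key=accuracy)


# Toy examples (hypothetical scores): words alone get the first one wrong,
# previous-location context alone gets the second one wrong
examples = [
    ({"words": {"office_building": 0.9, "study": 0.3},
      "prev_location": {"office_building": 0.05, "study": 0.6}}, "study"),
    ({"words": {"kitchen": 0.8, "study": 0.1},
      "prev_location": {"kitchen": 0.3, "study": 0.5}}, "kitchen"),
]
best = best_weights(examples, ["words", "prev_location"])
```

On this toy data only a mixed weighting classifies both entries correctly, which is the motivation for searching over coefficients rather than trusting either source alone. The grid grows quickly with the number of sources, so a coarse step keeps the search tractable.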
Average activity results
Word-only
– 55.3%
Word+context
– 66.1%
% improvement
– 19.5%
Weights
– Word-only: 0.354
– Previous Location: 0.177
– Current Activity: 0.201
– Previous Activity: 0.268
[Figure: bar chart, y-axis 0 to 1; bars: word-only, word & context; extracted data labels: 0.759836, 0.654918]
Average location results
Word-only
– 65.5%
Word+context
– 76.0%
% improvement
– 16.0%
Weights
– Word-only: 0.294
– Previous Location: 0.146
– Current Activity: 0.286
– Previous Activity: 0.274
[Figure: bar chart, y-axis 0 to 1; bars: word-only, word & context; extracted data labels: 0.553279, 0.660656]
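The "% improvement" figures on the two results slides are relative gains over the word-only baseline. A short check of that arithmetic, using only the accuracies reported on the slides (the function name is illustrative):

```python
def relative_improvement(baseline: float, improved: float) -> float:
    """Percentage gain of `improved` over `baseline`."""
    return (improved - baseline) / baseline * 100


activity = relative_improvement(55.3, 66.1)  # about 19.5
location = relative_improvement(65.5, 76.0)  # about 16.0
```

The absolute gains are similar (10.8 and 10.5 percentage points); the relative gain is larger for activity because its word-only baseline is lower.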
Hypothesis 2: Thresholds
Threshold System:
– “Thresholds can be found experimentally in the data to balance trade-offs between precision and recall.”
Threshold
– A level at which the computer can classify diary entries with a certain level of precision
– Level will be computed using precision and recall
Guesses
– Computer can either classify or not classify
– If classifies, considered a guess
– Ex: SAT tests
Threshold Example
Difference of top 2 scores
– “going to lay [sic] in bed for 20 to 30 minutes”
Correct Location: 30125 – Bedroom
Top Score: 30122 - Living room / family room: 0.6448
Second Score: 30125 – Bedroom: 0.6296
Relative Difference: (0.6448 - 0.6296) / 0.6448 = 0.0235
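The relative difference in this example, and the classify-or-abstain decision it feeds, can be sketched as follows. The rule "abstain when the margin is below the threshold" and the threshold value 0.05 are assumptions for illustration; the scores are the two shown above.

```python
def relative_margin(scores):
    """Relative difference between the two highest category scores."""
    top, second = sorted(scores.values(), reverse=True)[:2]
    return (top - second) / top


def classify_or_abstain(scores, threshold):
    """Return the top category, or None when the margin is below the threshold."""
    if relative_margin(scores) < threshold:
        return None  # too close to call: make no guess
    return max(scores, key=scores.get)


# Scores from the example: 30122 = Living room / family room, 30125 = Bedroom
scores = {"30122": 0.6448, "30125": 0.6296}
margin = relative_margin(scores)                     # roughly 0.02
guess = classify_or_abstain(scores, threshold=0.05)  # None: system abstains
```

Abstaining here is the desirable outcome, since the top-scoring category (living room) is not the correct one (bedroom).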
Precision & Recall
Precision
– The accuracy of the computer system when it encoded a diary entry
Recall
– The number of diary entries the computer guessed correctly, relative to the entire data set
Relationship between precision and recall
– Generally, as precision goes up, recall goes down
Example: Precision and Recall
Student takes 10 question test
– Guesses at 7 questions
– Answers 6 questions right
Precision
– 86%, 6 out of 7 attempted answers correct
Recall
– 60%, 6 answers correct out of all questions
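The test-taking example maps directly onto the two definitions used in this talk: precision is correct answers over attempted answers, and recall is correct answers over all questions. A minimal sketch (function and parameter names are illustrative):

```python
def precision_recall(attempted: int, correct: int, total: int):
    """Precision = correct / attempted; recall = correct / total items."""
    precision = correct / attempted if attempted else 0.0
    recall = correct / total if total else 0.0
    return precision, recall


# Student attempts 7 of 10 questions and gets 6 right
precision, recall = precision_recall(attempted=7, correct=6, total=10)
```

This gives a precision of 6/7 (about 86%) and a recall of 6/10 (60%), matching the figures above.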
Appropriate threshold levels
Done experimentally
– Step size of 0.05
Attempt to determine tradeoff between precision and recall
Relationship between scores
– Difference between top 2 scores
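An experimental sweep with a step size of 0.05 could be organized as below. The input format (one margin and one correctness flag per diary entry) and the toy values are hypothetical; only the step size and the use of the top-2 score difference come from the slide.

```python
def sweep(results, step=0.05):
    """results: list of (margin, is_correct) pairs, one per diary entry.

    Returns one (threshold, precision, recall) row per threshold in [0, 1].
    """
    rows = []
    steps = round(1 / step)
    for i in range(steps + 1):
        threshold = i * step
        # The system only guesses when the top-2 margin reaches the threshold
        attempted = [ok for margin, ok in results if margin >= threshold]
        correct = sum(attempted)
        precision = correct / len(attempted) if attempted else None
        recall = correct / len(results)
        rows.append((round(threshold, 2), precision, recall))
    return rows


# Toy results: made-up margins and correctness flags
results = [(0.02, False), (0.10, True), (0.30, True),
           (0.04, True), (0.60, True), (0.08, False)]
rows = sweep(results)
```

On the toy data, moving the threshold from 0 to 0.05 raises precision from about 0.67 to 0.75 while recall drops from about 0.67 to 0.5, which is the tradeoff the sweep is meant to expose. Precision is left as `None` once no entries are attempted.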
Threshold results
[Figure: precision and recall (y-axis, 0 to 1) versus threshold (x-axis, 0 to 1); series: Activity Precision, Activity Recall, Location Precision, Location Recall]
Limitations
Optimal Classifier
– Neural Network and Markov modeling
Database
– Increased size
Context Information
– Utilize more information from context
Questions?