![Page 1: Lexical Differences in Autobiographical Narratives from Schizophrenic Patients and Healthy Controls Kai Hong, Christian G. Kohler, Mary E. March, Amber](https://reader035.vdocuments.us/reader035/viewer/2022062314/56649de35503460f94ada036/html5/thumbnails/1.jpg)
Lexical Differences in Autobiographical Narratives from Schizophrenic Patients and Healthy Controls
Kai Hong, Christian G. Kohler, Mary E. March,
Amber A. Parker, Ani Nenkova
University of Pennsylvania
Our Task
- Identify significant differences in lexical use between narratives by patients and by healthy controls
- Perform automatic classification of patients vs. controls
- Identify a small subset of highly distinguishing features
- Examine how prediction accuracy varies with emotion type
Observations on Lexical Use

| Word | # occurrences (Patients) | # occurrences (Controls) |
|---|---|---|
| dog/dogs | 28 | 1 |
| money | 41 | 4 |
| sorry | 0 | 7 |
| relationship | 0 | 9 |

Self-reference ("I"): 1291 occurrences (patients) vs. 626 (controls); after normalizing by number of words: 5.5% vs. 4.3%.
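The 5.5% vs. 4.3% figures are counts of "I" divided by the total number of words. A minimal sketch of that normalization, assuming a simple lowercase regex tokenizer (the deck does not say how text was tokenized, nor whether the ratio was pooled over all stories or averaged per story; this version pools):

```python
import re

def token_rate(texts, target="i"):
    """Pooled fraction of word tokens equal to `target` (case-insensitive)."""
    total = 0
    hits = 0
    for text in texts:
        # Assumed tokenizer: runs of letters and apostrophes, lowercased.
        tokens = re.findall(r"[a-z']+", text.lower())
        total += len(tokens)
        hits += sum(1 for t in tokens if t == target)
    return hits / total if total else 0.0
```

Called on all patient stories and then on all control stories, this gives one normalized self-reference rate per group.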
Dataset
- 201 stories from 39 subjects
  - Patients: 120 stories from 23 patients
  - Controls: 81 stories from 16 controls
- Five emotions: anger, sad, happy, disgust, fear
- Subjects talk about past experiences in their lives (mild, moderate, or extreme)
- Each story takes 30–90 seconds to tell
Length of Stories
- No significant difference between patients and controls
- Some difference between emotions

| | Average # words |
|---|---|
| Patients | 192 |
| Controls | 181 |

P-value: 0.4254
Workflow (figure): Narratives (training) -> Lexical feature extraction -> Features
Features
- Basic features: a few general features that are easy to compute
- Lexical features and repetitions: many, sparse features
- LIWC and Diction: dictionary-based, more general features
- Two-tailed t-test for significant features: 169 of 6057 features are significant (p < 0.05)
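Basic Features
- Patients have more: sentences/document, words/document
- Controls have more: letters/word, words/sentence, tokens/vocabulary

| Feature | Higher in | P-value |
|---|---|---|
| letters/word | Control | 0.003 |
| words/sentence | Control | 0.001 |
| tokens/vocabulary | Control | 0.153 |
| sentences/doc | SCH | 0.038 |
| words/doc | SCH | 0.460 |

Only letters/word, words/sentence, and sentences/doc are significant at p < 0.05.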
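A minimal sketch of the five basic features, assuming sentences are split on `.`, `!`, `?` and words are runs of letters and apostrophes; the deck does not specify its tokenizer, so absolute values will differ from the original pipeline:

```python
import re

def basic_features(text):
    """Document-level length features (tokenization rules are assumptions)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    n_words = len(words)
    vocab = {w.lower() for w in words}
    letters = sum(ch.isalpha() for w in words for ch in w)
    return {
        "sentences/doc": len(sentences),
        "words/doc": n_words,
        "letters/word": letters / n_words if n_words else 0.0,
        "words/sentence": n_words / len(sentences) if sentences else 0.0,
        "tokens/vocabulary": n_words / len(vocab) if vocab else 0.0,
    }
```

For example, "I saw a dog. The dog ran!" has 2 sentences, 7 words, and 6 distinct lowercased words.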
Repetitions
- Example: "One day um , my um , my sister had brought my , her niece , her daughter , my sister had brought her daughter uh to watch my dog right ."
- Repetition: the frequency with which a word reappears within a window of 5 tokens
- Repetition is computed for both words and punctuation
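A minimal sketch of the Rep-Word and Rep-Punc rates, under assumptions the deck leaves open: a token counts as a repetition if the same token occurs among the previous 5 tokens, punctuation tokens are those made only of punctuation characters, and both counts are divided by the total number of tokens:

```python
import string

def repetition_rates(tokens, window=5):
    """Return (rep_word, rep_punc): share of tokens that repeat one of the
    previous `window` tokens, split into word and punctuation repeats."""
    rep_word = rep_punc = 0
    for i, tok in enumerate(tokens):
        if tok in tokens[max(0, i - window):i]:
            if all(ch in string.punctuation for ch in tok):
                rep_punc += 1
            else:
                rep_word += 1
    n = len(tokens)
    return (rep_word / n, rep_punc / n) if n else (0.0, 0.0)
```

On the opening of the example, "one day um , my um , my sister", the second "um" and the second "my" count as word repeats and the second comma as a punctuation repeat.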
Repetitions: Significant?
(Figure: bar charts comparing Rep-Word and Rep-Punc rates for SCH (patients) vs. NC (controls))
- Rep-word: significant, p < 0.001
- Rep-punctuation: significant, p < 0.001
Lexical Features
- Words: frequency in the narratives
- Repetition of specific words: whether a given word is repeated (0/1)
- Example: "She was , she was a huge , she was very , very wonderful."
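The per-word repetition features are binary. A minimal sketch, reusing the same assumed 5-token look-back window as for overall repetition; the feature names (`Rep-she`, `Rep-very`, ...) mirror the slide labels, and the vocabulary passed in is hypothetical:

```python
def word_repetition_flags(tokens, vocabulary, window=5):
    """For each word in `vocabulary`, 1 if it repeats within the window, else 0."""
    repeated = set()
    for i, tok in enumerate(tokens):
        # A token is "repeated" if it already occurred in the previous `window` tokens.
        if tok in tokens[max(0, i - window):i]:
            repeated.add(tok)
    return {f"Rep-{w}": int(w in repeated) for w in vocabulary}
```

Applied to the example sentence, "she" and "very" are flagged, while "huge" and "wonderful" are not.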
More common in Schizophrenia

| P-value | Features |
|---|---|
| < 1e-3 | I, couldn't, extremely, mildly, money |
| 0.001 – 0.01 | extreme, feeling, moderately, my, took, way, ? |
| 0.01 – 0.05 | ain't, alone, at, aw, before, behind, became, care, chance, confused |

- First-person pronouns: I, my
- money
- Feelings
- Some adverbs: mildly, moderately, extremely
- The question mark (?)
More common in Schizophrenia (continued)
- Focus on family (grandfather, sister, son)
- dog/dogs

| P-value | Features |
|---|---|
| 0.01 – 0.05 | December, dog, dogs, forty, friends, god, got, grandfather, guess, guy, hand, hanging, hearing, hundred, increased, looking, loved, mental, met, mild, moderate, myself, outside, paper, passed, piece, remember, sister, son, stand, stand, stop, story, take, taken, throwing, trouble, use, wake, wanna |
More common in Controls

| P-value | Features |
|---|---|
| < 1e-3 | comma |
| 0.001 – 0.01 | really, sorry, very |
| 0.01 – 0.05 | able, actually, are, basically, be, being, get's, in, late, not, relationship, result, she's, sleep, tell, their, there's, weeks |

- Third-person plural: their
- sorry
- Some adverbs: actually, basically, really, very
Significant Rep+Lexical Features
- Patients: more repetition of and, um, I, a, was
- Controls: more repetition of comma, very

| Feature | Higher in | P-value |
|---|---|---|
| Rep-and | SCH | < 0.001 |
| Rep-um | SCH | 0.008 |
| Rep-I | SCH | < 0.001 |
| Rep-a | SCH | 0.011 |
| Rep-was | SCH | 0.018 |
| Rep-, | NC | 0.001 |
| Rep-very | NC | 0.007 |
LIWC
- Method: measures the degree to which different categories of words are used; dictionary-based approach with 69 dictionaries
- Example: "cried" belongs to sadness, negative emotion, overall affect, verb, and past-tense verb
- Previous use: writing styles, physical and emotional pain (Tausczik and Pennebaker, 2010)
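A minimal sketch of dictionary-based category scoring in the LIWC style: each category score is the percentage of words in the text that belong to that category, and one word can count toward several categories, as "cried" does above. The dictionary below is a hypothetical toy stand-in (the real LIWC dictionaries are licensed, much larger, and also support wildcard entries this sketch ignores):

```python
import re
from collections import Counter

# Hypothetical toy dictionary; category names and entries are illustrative only.
TOY_DICTIONARY = {
    "sadness": {"cried", "sad", "grief"},
    "negemo": {"cried", "sad", "hate", "grief"},
    "verb": {"cried", "think", "know"},
    "past": {"cried", "was"},
    "insight": {"think", "know", "consider"},
}

def category_percentages(text, dictionary=TOY_DICTIONARY):
    """Percentage of words in `text` that fall in each dictionary category."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for w in words:
        for category, entries in dictionary.items():
            if w in entries:
                counts[category] += 1
    n = len(words)
    return {c: (100.0 * counts[c] / n if n else 0.0) for c in dictionary}
```

For "I cried because I know" (5 words), "cried" adds 20% to sadness, negemo, verb, and past, and "know" adds another 20% to verb and insight.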
LIWC – Significant Features

| Category | # words | Examples | More common for | P-value |
|---|---|---|---|---|
| I | 12 | I, me, mine | SCH | < 0.001 |
| personal pronoun | 70 | I, them, itself, you | SCH | 0.029 |
| insight | 195 | think, know, consider | SCH | 0.026 |
| adverb | 69 | very, really, quickly | NC | 0.001 |
| exclusive words | 17 | but, without, exclusive | NC | 0.005 |
| inhibition | 111 | block, constrain, stop | NC | 0.019 |
Diction
- Method: also a dictionary-based approach, with 28 small categories and 5 master variables
- Master variables (major categories): Realism, Optimism, Certainty, Activity, Commonality
- Example: Certainty = [Tenacity + Leveling + Collectives + Insistence] - [Numerical Terms + Ambivalence + Self Reference + Variety]
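A master variable is a signed sum of category scores. A minimal sketch of the Certainty formula, assuming the component scores are already on a comparable scale (how Diction itself scales them is not shown in the slides); the snake_case score keys are hypothetical names:

```python
CERTAINTY_PLUS = ("tenacity", "leveling", "collectives", "insistence")
CERTAINTY_MINUS = ("numerical_terms", "ambivalence", "self_reference", "variety")

def master_variable(scores, plus, minus):
    """Sum of the `plus` category scores minus the sum of the `minus` ones."""
    return sum(scores[k] for k in plus) - sum(scores[k] for k in minus)

def certainty(scores):
    return master_variable(scores, CERTAINTY_PLUS, CERTAINTY_MINUS)
```

The same helper covers the other master variables given their plus and minus category lists.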
Diction – Significant Features

| Category | More common for | P-value |
|---|---|---|
| self reference | SCH | < 0.001 |
| word-mean-length | NC | < 0.001 |
| realism | NC | < 0.001 |
| diversity | NC | 0.005 |
| familiarity | NC | 0.019 |
| cognitive terms | SCH | 0.014 |
| cooperation | NC | 0.027 |
| past | SCH | 0.036 |
| insistence | SCH | 0.046 |
| satisfaction | SCH | 0.047 |
Workflow (figure): Narratives (training) -> Lexical feature extraction -> Features -> Feature selection -> Selected features
Feature Selection
- Two-tailed t-test for real-valued features
  - Thresholds: 0.0005, 0.001, 0.005, 0.01, 0.05, 0.1, 0.15
- Signal-to-noise ratio
  - Using the Challenge Learning Object Package (CLOP)
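A minimal sketch of both selection methods using NumPy and SciPy rather than CLOP. The signal-to-noise score here is assumed to be |mean_patient - mean_control| / (std_patient + std_control), a common definition; CLOP's exact formula may differ. In the leave-one-subject-out setup, selection would be run on each training fold only:

```python
import numpy as np
from scipy.stats import ttest_ind

def ttest_select(X, y, threshold):
    """Indices of features whose two-tailed t-test p-value is below `threshold`.

    X: (n_stories, n_features) feature matrix; y: 1 = patient, 0 = control.
    Constant features yield NaN p-values and are never selected.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    _, p = ttest_ind(X[y == 1], X[y == 0], axis=0)
    return np.flatnonzero(p < threshold)

def s2n_select(X, y, k):
    """Indices of the `k` features with the highest signal-to-noise score."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    pos, neg = X[y == 1], X[y == 0]
    # Small epsilon avoids division by zero for constant features.
    s2n = np.abs(pos.mean(axis=0) - neg.mean(axis=0)) / (
        pos.std(axis=0) + neg.std(axis=0) + 1e-12)
    return np.argsort(-s2n)[:k]
```

With k = 25, `s2n_select` mirrors the "25 features" setting reported below; `ttest_select` is run once per threshold in the list above.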
Experimental Setup
- Leave-one-subject-out cross-validation (39 folds, one per subject)
- Each story is labeled with its subject's status (subject status = story status)
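Leave-one-subject-out keeps all stories of the held-out subject out of training, so the classifier is never tested on a person it has already seen. A minimal pure-Python sketch of the fold generation (scikit-learn's `LeaveOneGroupOut` provides the same splitting):

```python
from collections import defaultdict

def leave_one_subject_out(subject_ids):
    """Yield (subject, train_indices, test_indices), one fold per subject."""
    by_subject = defaultdict(list)
    for idx, sid in enumerate(subject_ids):
        by_subject[sid].append(idx)
    for sid, test_idx in by_subject.items():
        held_out = set(test_idx)
        train_idx = [i for i in range(len(subject_ids)) if i not in held_out]
        yield sid, train_idx, test_idx
```

With 39 subjects this yields 39 folds.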
Experimental Setup
- Voting: story-level predictions -> subject-level prediction
- Evaluation metrics: accuracy and F-measure, computed both by story and by subject
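A minimal sketch of voting from stories to subjects, assuming a simple majority vote over each subject's story predictions. The deck does not say how ties are broken, so `tie_label` is a hypothetical parameter:

```python
from collections import Counter, defaultdict

def vote_by_subject(subject_ids, story_predictions, tie_label="patient"):
    """Majority vote of story-level labels per subject; ties go to `tie_label`."""
    votes = defaultdict(Counter)
    for sid, pred in zip(subject_ids, story_predictions):
        votes[sid][pred] += 1
    result = {}
    for sid, counter in votes.items():
        ranked = counter.most_common()
        if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
            result[sid] = tie_label
        else:
            result[sid] = ranked[0][0]
    return result
```

Subject-level accuracy and F-measure are then computed on these voted labels, and story-level metrics on the raw predictions.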
Workflow (figure): Narratives (training) -> Lexical feature extraction -> Features -> Feature selection -> Selected features -> SVM-light; Narratives (testing) -> SVM-light -> story labels (Patients / Control) -> Voting -> subject status (?)
Performance by T-test
- Much higher than random (50.0% accuracy)
- More noise when relaxing the threshold
- Better performance when tightening the threshold; best performance at threshold = 0.001

| P-value threshold | By story (%) | By subject (%) | # Features |
|---|---|---|---|
| 0.15 | 59.0 | 58.9 | 450 |
| 0.1 | 61.7 | 64.1 | 341 |
| 0.05 | 62.7 | 64.1 | 169 |
| 0.01 | 57.7 | 65.4 | 44 |
| 0.005 | 64.2 | 71.6 | 32 |
| 0.001 | 65.7 | 75.6 | 18 |
| 0.0005 | 61.7 | 66.7 | 14 |
Performance vs. Number of Features
- Signal-to-noise selection
- Best performance achieved with 25 features
Best Performance by Signal-to-Noise
- Achieved with 25 features
- Accuracy by story: 64.7%; accuracy by subject: 76.9%
- Patient recall: 91.3%

| | SCH P (%) | SCH R (%) | SCH F (%) | Control P (%) | Control R (%) | Control F (%) | Accuracy (%) | Macro-F (%) |
|---|---|---|---|---|---|---|---|---|
| By subject | 75.0 | 91.3 | 82.4 | 81.8 | 56.3 | 66.7 | 76.9 | 74.6 |
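The subject-level row can be checked from confusion counts. The slide does not state the counts; 21 of 23 patients and 9 of 16 controls classified correctly are the counts consistent with the reported percentages (e.g. 21/23 = 91.3% patient recall, 30/39 = 76.9% accuracy). A minimal sketch:

```python
def precision_recall_f(tp, fp, fn):
    """Precision, recall and F1 for one class from its confusion counts."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f

def subject_report(tp, fn, fp, tn):
    """Per-class P/R/F, accuracy and macro-F for patients (positive) vs. controls.

    tp: patients predicted patient, fn: patients predicted control,
    fp: controls predicted patient, tn: controls predicted control.
    """
    sch = precision_recall_f(tp, fp, fn)
    # For the control class, correctly predicted controls are its true positives.
    ctl = precision_recall_f(tn, fn, fp)
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    macro_f = (sch[2] + ctl[2]) / 2
    return sch, ctl, accuracy, macro_f

sch, ctl, acc, macro_f = subject_report(tp=21, fn=2, fp=7, tn=9)
```

This reproduces the row up to rounding: control recall is exactly 9/16 = 56.25%, and macro-F comes out at about 74.5%, so the reported 74.6% appears to average the already-rounded F-scores.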
Status Prediction by Emotion

| Accuracy (%) | Signal-to-noise (25) | T-test (0.05) | T-test (0.001) |
|---|---|---|---|
| Happy | 66.7 | 59.0 | 71.8 |
| Disgust | 63.4 | 61.0 | 51.2 |
| Anger | 61.0 | 70.7 | 70.7 |
| Fear | 60.0 | 55.0 | 67.5 |
| Sad | 72.5 | 60.0 | 67.5 |
| By story (all emotions) | 64.7 | 62.9 | 65.7 |
| By subject | 76.9 | 64.1 | 74.4 |

- Same training data; predictions are evaluated separately on each emotion, for different approaches and settings
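Number of Features at Different Thresholds
(Figure: number of features selected by t-test within each emotion (Anger, Sad, Happy, Disgust, Fear) at p-value thresholds 0.1, 0.05, 0.01)
- Features are selected by t-test using stories from one emotion; more significant features -> that emotion is more distinguishing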
Emotion-Related Analysis
Features with a higher value in each emotion:

| Emotion | Schizophrenia | Control |
|---|---|---|
| Happy | ambivalent | do |
| Disgust | dogs, health | communication |
| Anger | argued | praise |
| Fear | money | accident |
| Sad | satisfaction | working |
Conclusion
- Analyzed the distinguishing power of different features
  - Basic features
  - Lexical features, repetitions
  - LIWC
  - Diction
- 25 features give top performance (65% by story, 77% by subject)
  - p-value feature selection
  - Signal-to-noise feature selection
- Different emotions have different distinguishing power: anger, sad > happy > fear, disgust
Thank you!
Backup Slides
Related Work
- LMs to detect language dominance and language impairment (Gabani et al., 2009)
- Speech-related features for autism patients (Heeman et al., 2010)
- Syntactic features for mild cognitive impairment (Roark et al., 2011)
- Syntactic complexity features for autism (Prud'hommeaux et al., 2011)
- Lexical features to recognize different personalities (Gill et al., 2009; Mairesse et al., 2006)
- Predicting adherence to treatment and syndrome scale in schizophrenia through conversations (Howes et al., 2012)
Language Model
- Unigram, bigram, and trigram models
- Built over both POS tags and words (lexical)
- Simple Laplace smoothing
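The slides do not say how the language models are turned into a classifier. A common choice, assumed here, is to train one LM on patient stories and one on control stories and label a story by whichever model gives it the higher log-likelihood. A minimal bigram sketch with Laplace (add-one) smoothing; the same code works on word sequences or POS-tag sequences:

```python
import math
from collections import Counter

class LaplaceBigramLM:
    """Bigram LM with add-one smoothing: P(b | a) = (c(a, b) + 1) / (c(a) + V)."""

    def __init__(self, sequences):
        self.history = Counter()   # counts of tokens in the history position
        self.bigrams = Counter()
        vocab = set()
        for seq in sequences:
            padded = ["<s>"] + list(seq) + ["</s>"]
            vocab.update(padded)
            self.history.update(padded[:-1])
            self.bigrams.update(zip(padded[:-1], padded[1:]))
        self.V = len(vocab) + 1  # +1 reserves probability mass for unseen tokens

    def logprob(self, seq):
        padded = ["<s>"] + list(seq) + ["</s>"]
        return sum(
            math.log((self.bigrams[(a, b)] + 1) / (self.history[a] + self.V))
            for a, b in zip(padded[:-1], padded[1:]))

def classify(seq, lm_patient, lm_control):
    """Label a story by the class LM that assigns it higher log-probability."""
    return "patient" if lm_patient.logprob(seq) > lm_control.logprob(seq) else "control"
```

The training sequences in any usage are hypothetical; in the real setup each model would be trained on the training fold's stories for one class.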
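LMs Performance

By story (%):

| Model | Schizophrenia-F | Control-F | Accuracy |
|---|---|---|---|
| Random | 54.4 | 44.6 | 50.0 |
| 2-gram | 62.5 | 44.4 | 55.2 |
| 2-gram-pos | 62.2 | 53.3 | 58.2 |

By subject (%):

| Model | Schizophrenia-F | Control-F | Accuracy |
|---|---|---|---|
| Random | 54.1 | 45.0 | 50.0 |
| 2-gram | 62.5 | 50.0 | 58.9 |
| 2-gram-pos | 62.2 | 54.5 | 61.5 |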
Feature Normalization
- Approach 1: get the average from the training data
- Approach 2: get the maximum and minimum from the training data, then project into [0, 1]
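A minimal sketch of Approach 2: the minimum and maximum are fitted on training data only and then applied to any fold's data. Clipping test values that fall outside the training range back into [0, 1] is a design choice assumed here; the exact formula for Approach 1 is not given, so it is not sketched:

```python
def fit_min_max(train_rows):
    """Per-feature minimum and maximum, computed on training rows only."""
    columns = list(zip(*train_rows))
    return [min(c) for c in columns], [max(c) for c in columns]

def apply_min_max(rows, mins, maxs):
    """Project each feature into [0, 1] using training min/max, clipping outliers."""
    scaled_rows = []
    for row in rows:
        scaled = []
        for x, lo, hi in zip(row, mins, maxs):
            v = (x - lo) / (hi - lo) if hi > lo else 0.0  # constant feature -> 0
            scaled.append(min(max(v, 0.0), 1.0))
        scaled_rows.append(scaled)
    return scaled_rows
```

Fitting on the training fold alone keeps the held-out subject's stories from influencing the scaling.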
Motivating Applications
- Track patient status between visits
- Early automatic diagnosis and screening
Best Performance by Signal-to-Noise
- Achieved with 25 features
- Accuracy by story: 64.7%; accuracy by subject: 76.9%
- Patient recall: 91.3%

| Level | Method | SCH P (%) | SCH R (%) | SCH F (%) | Control P (%) | Control R (%) | Control F (%) | Accuracy (%) | Macro-F (%) |
|---|---|---|---|---|---|---|---|---|---|
| Story | Majority | 59.7 | 100 | 74.8 | 0 | 0 | 0 | 59.7 | 37.4 |
| Story | 25 features | 68.7 | 75.0 | 71.7 | 57.1 | 49.4 | 52.9 | 64.7 | 62.3 |
| Subject | Majority | 59.0 | 100 | 74.2 | 0 | 0 | 0 | 59.0 | 37.1 |
| Subject | 25 features | 75.0 | 91.3 | 82.4 | 81.8 | 56.3 | 66.7 | 76.9 | 74.6 |
| (All) | Average | 59.7 | 50 | 54.4 | 40.5 | 50 | 44.6 | 50.0 | 49.5 |
Diction Definitions
- Cognitive terms: modes of discovery, mental challenges, institutional learning practices, intellection (intuitional, rationalistic, calculative)
- Realism: [Familiarity + Spatial Awareness + Temporal Awareness + Present Concern + Human Interest + Concreteness] - [Past Concern + Complexity]
- Diversity: neutral (inconsistent, contrasting); positive (exceptional, unique); negative (extremist)
- Cooperation: work relations, interactions, associations, job-related tasks, personal involvement, etc. (sisterhood, friendship, teamwork, consolidate, relationship)
- Familiarity: a selected number of C.K. Ogden's (1968) operation words, which he calculates to be the most common words in the English language