Predicting Depression via Social Media
Munmun De Choudhury, Michael Gamon, Scott Counts and Eric Horvitz
Martin Leginus
Depression
Lifetime prevalence varies from 3% in Japan to 17% in the USA
Sometimes people are not aware that they are depressed, e.g., when the onset of depression is slow
Social networks
Can we use social networks to detect mental diseases?
Social media
• Fine-grained signals of user behavior over a longer period of time
• Symptoms of mental disorders are more observable than those of other diseases, e.g., through social engagement, emotion, language, and linguistic style
• Authors focus on Major Depressive Disorder (MDD)
• Symptoms are: low mood, low self-esteem, loss of interest, negative perception of the world
Aim: Given a user's activity on social media, predict whether he/she is likely to be depressed
How to get ground truth data?
Crowdsourcing
• Amazon’s Mechanical Turk service
• Crowd workers filled in two depression screening tests
• http://www.bcbsm.com/pdf/Depression_CES-D.pdf
• http://www.thecommunityhouse.org/wp-content/uploads/2012/01/Beck-Depression-Inventory-and-Scoring-Key1.pdf
• Self-reported questions:
• Have you been diagnosed with clinical depression?
• If yes, what was the estimated onset?
• Are you depressed or taking any antidepressants at the moment?
Participants should have a public Twitter profile
• 1583 participants (cost ~ $1,400)
• only 40% agreed to share their Twitter feeds
• further cleaning left 476 users (243 men, 233 women)
• 171 users scored positive for depression
Example posts:
• Having a job again makes me happy. Less time to be depressed and eat all day while watching sad movies.
• “Are you okay?” Yes…. I understand that I am upset and hopeless and nothing can help me… I’m okay… but I am not alright
• “empty” feelings I WAS JUST TALKING ABOUT HOW I I HAVE EMOTION OH MY GOODNESS I FEEL AWFUL
• I want someone to hold me and be there for me when I’m sad.
• Reloading twitter till I pass out. *lonely* *anxious* *butthurt* *frustrated* *dead*
Characteristic attributes
• Engagement
• Egocentric social graph
• Emotion
• Depression language
• Linguistic style
43 different attributes, each calculated daily over a period of one year.
Engagement
• Volume - # of posts per day made by the user
• Reply posts - proportion of the user's @reply posts; indicates social interaction
• Retweets - proportion of the user's retweets; indicates information sharing
• Links - # of shared URLs per day
• Question-centric - # of posts that try to seek or derive information from other Twitter users
• Insomnia index - difference between the # of posts during the "night" window and the "day" window
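The insomnia index among the engagement attributes can be sketched as follows. This is a minimal illustration, not the authors' code: the slides do not state the window boundaries, so the 21:00 to 06:00 night window (`night_start`, `night_end`) is an assumption, as is the function name.

```python
from datetime import datetime

def insomnia_index(post_times, night_start=21, night_end=6):
    """Night-window post count minus day-window post count.

    night_start / night_end are assumed hour boundaries (not from the slides).
    """
    night = sum(
        1 for t in post_times
        if t.hour >= night_start or t.hour < night_end
    )
    day = len(post_times) - night
    return night - day

# Hypothetical posts for one user on one day
posts = [
    datetime(2013, 1, 1, 23, 0),   # night
    datetime(2013, 1, 1, 2, 30),   # night
    datetime(2013, 1, 1, 14, 0),   # day
]
index = insomnia_index(posts)
```

A positive value means the user posted more at night than during the day.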
Egocentric social graph
Set of nodes from the user's two-hop neighborhood
Measuring the following:
• # of incoming or outgoing posts
• Reciprocity - how often the user responds to communication started by another user
• Prestige ratio - ratio of the # of messages sent to user u to the # of messages targeted to user v
• Graph density, clustering coefficient, size of the graph
• Embeddedness, # of ego components
[Figure: example egocentric graph with nodes user u, v, w, x, y, xx and yy]
An edge between two users indicates that they communicated via @replies during a given day
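Two of the graph measures listed for the egocentric network, graph density and clustering coefficient, have standard definitions that can be sketched in a few lines. This is an illustration only: it treats @reply edges as undirected, and the edge list and function names are hypothetical.

```python
from itertools import combinations

def build_graph(reply_pairs):
    """Undirected adjacency sets from (user_a, user_b) @reply pairs."""
    adj = {}
    for a, b in reply_pairs:
        if a == b:
            continue  # ignore self-replies
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj

def graph_density(adj):
    """Fraction of possible undirected edges that are present."""
    n = len(adj)
    if n < 2:
        return 0.0
    edges = sum(len(nbrs) for nbrs in adj.values()) / 2
    return 2 * edges / (n * (n - 1))

def clustering_coefficient(adj, node):
    """Fraction of pairs of the node's neighbors that are themselves linked."""
    nbrs = adj.get(node, set())
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return 2 * links / (k * (k - 1))

# Hypothetical one-day @reply edges around user u
graph = build_graph([("u", "v"), ("u", "w"), ("v", "w"), ("u", "x")])
density = graph_density(graph)
clustering_u = clustering_coefficient(graph, "u")
```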
Egocentric social graph
Egonetwork measure        Depressed class     Non-depressed class
#followers/inlinks        26.9 (σ=78.3)       45.32 (σ=90.74)
#followees/outlinks       19.2 (σ=52.4)       40.06 (σ=63.25)
Reciprocity               0.77 (σ=0.09)       1.364 (σ=0.186)
Prestige ratio            0.98 (σ=0.13)       0.613 (σ=0.277)
Graph density             0.01 (σ=0.03)       0.019 (σ=0.051)
Clustering coefficient    0.02 (σ=0.05)       0.011 (σ=0.072)
2-hop neighborhood        104 (σ=82.42)       198.4 (σ=110.3)
Embeddedness              0.38 (σ=0.14)       0.226 (σ=0.192)
#ego components           15.3 (σ=3.25)       7.851 (σ=6.294)
Emotion
Psycholinguistic resource LIWC used to measure positive and negative affect
ANEW lexicon used for computing activation and dominance
• Activation describes the physical intensity of an emotion ("terrified" is higher than "scared")
• Dominance refers to the degree of control in an emotion (anger is dominant, fear is submissive)
Linguistic style
Using the linguistic resource LIWC to recognize 22 specific linguistic styles:
articles, auxiliary verbs, conjunctions, adverbs, personal pronouns, prepositions, functional words, assent, negation, certainty and quantifiers
Emotion + Linguistic style
Depression language
Depression lexicon built from Yahoo! Answers on Mental Health, ~900k Q&A pairs
Association between each word and the regex "depress*" calculated using
• Pointwise mutual information
• Log likelihood ratio
Top 1000 words with the highest tf-idf
Antidepressant usage – list of antidepressants from Wikipedia used to construct drugs lexicon
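The word association step in the lexicon construction can be illustrated with pointwise mutual information, PMI(w, d) = log2( P(w, d) / (P(w) P(d)) ), where d is the event that a text matches "depress*". The sketch below assumes document-level co-occurrence counts and a simple tokenizer; neither detail is given on the slides, and the sample texts are made up.

```python
import math
import re

DEPRESS = re.compile(r"\bdepress\w*", re.IGNORECASE)

def pmi_with_depress(docs, word):
    """PMI (base 2) between `word` and the regex depress*, counted per document.

    Returns None when the PMI is undefined (a zero count).
    """
    n = len(docs)
    tokens = [set(re.findall(r"[a-z']+", d.lower())) for d in docs]
    has_dep = [bool(DEPRESS.search(d)) for d in docs]
    n_word = sum(word in t for t in tokens)
    n_dep = sum(has_dep)
    n_both = sum(word in t and dep for t, dep in zip(tokens, has_dep))
    if 0 in (n_word, n_dep, n_both):
        return None
    # P(w, d) / (P(w) * P(d)) simplifies to n_both * n / (n_word * n_dep)
    return math.log2(n_both * n / (n_word * n_dep))

# Hypothetical Q&A texts
docs = [
    "I feel depressed and have insomnia",
    "insomnia again, so depressing",
    "great movie last night",
    "the movie was fun",
]
score = pmi_with_depress(docs, "insomnia")
```

Words that appear in depression-related texts more often than chance would predict get a positive score.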
Depression language
Theme: Unigrams
Symptoms: anxiety, withdrawal, severe, delusions, adhd, weight, insomnia, drowsiness, suicidal, appetite, dizziness, nausea, episodes, attacks, sleep, seizures, addictive, weaned, swings, dysfunction, blurred, irritability, headache, fatigue, imbalance, nervousness, psychosis, drowsy
Disclosure: fun, play, helped, god, answer, wants, leave, beautiful, suffer, sorry, tolerance, agree, hate, helpful, haha, enjoy, social, talk, save, win, care, love, like, hold, cope, amazing, discuss
Treatment: medication, side-effects, doctor, doses, effective, prescribed, therapy, inhibitor, stimulant, antidepressant, patients, neurotransmitters, prescriptions, psychotherapy, diagnosis, clinical, pills, chemical, counteract, toxicity, hospitalization, sedative, 150mg, 40mg, drugs
Relationships, life: home, woman, she, him, girl, game, men, friends, sexual, boy, someone, movie, favorite, jesus, house, music, religion, her, songs, party, bible, relationship, hell, young, style, church, lord, father, season, heaven, dating
Predicting depressive behavior
Feature vectors
For each attribute, the following four features are computed from its daily time series
• Mean frequency
• Variance
• Mean momentum
• Entropy
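The four summary features can be sketched for a single attribute's daily series. The slides do not define momentum or entropy, so the definitions here are assumptions: momentum is the mean difference between each day's value and the average of the preceding `window` days, and entropy is the Shannon entropy of the series normalized to sum to 1.

```python
import math
from statistics import fmean, pvariance

def summarize(series, window=7):
    """Mean, variance, mean momentum and entropy of one daily time series.

    `window` and the momentum/entropy definitions are illustrative assumptions.
    """
    diffs = [
        series[t] - fmean(series[t - window:t])
        for t in range(window, len(series))
    ]
    total = sum(series)
    entropy = 0.0
    if total > 0:
        for x in series:
            if x > 0:
                p = x / total
                entropy -= p * math.log2(p)
    return {
        "mean": fmean(series),
        "variance": pvariance(series),
        "momentum": fmean(diffs) if diffs else 0.0,
        "entropy": entropy,
    }

# Hypothetical daily post counts
features = summarize([1, 2, 3, 4], window=2)
```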
188 features = 43 attributes x 4 + 4 demographic features
Principal component analysis to reduce number of features
Classifier
• Support Vector Machine classifier
• Radial-basis kernel
• 10-fold cross validation and 100 randomized experimental runs
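Training an SVM requires a solver library, so the sketch below only illustrates the two other ingredients named on the slide: the radial-basis kernel K(x, y) = exp(-gamma * ||x - y||^2) and a 10-fold split of the users. The `gamma` value, the seed, and the function names are assumptions, not taken from the paper.

```python
import math
import random

def rbf_kernel(x, y, gamma=0.1):
    """Radial-basis kernel between two feature vectors (gamma is assumed)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

def k_fold_indices(n, k=10, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, test

# 476 users, as in the cleaned data set
splits = list(k_fold_indices(476, k=10))
```

Repeating the split with different seeds would give the randomized runs mentioned on the slide.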
Results
Feature set            Precision   Recall   Acc. (+ve)   Acc. (mean)
engagement             0.542       0.439    53.212%      55.328%
ego-network            0.627       0.495    58.375%      61.246%
emotion                0.642       0.523    61.249%      64.325%
linguistic style       0.683       0.576    65.124%      68.415%
depression language    0.655       0.592    66.256%      69.244%
demographics           0.452       0.406    47.914%      51.323%
all features           0.705       0.614    68.247%      71.209%
reduced dimensions     0.742       0.629    70.351%      72.384%
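For reference, precision, recall and accuracy for a binary depressed / non-depressed classifier can be computed as below. This uses the standard definitions with label 1 as the depressed class; the labels are made up, and the sketch does not reproduce the slide's "acc. (+ve)" column, whose exact definition is not given here.

```python
def binary_metrics(y_true, y_pred):
    """Return (precision, recall, accuracy) with 1 as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return precision, recall, accuracy

# Hypothetical labels: 1 = depressed, 0 = non-depressed
precision, recall, accuracy = binary_metrics(
    [1, 1, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
)
```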
Discussion
• Implications
• Privacy issues
Conclusion and future work
• 43 different attributes that characterize depressed users of social media
• Crowdsourced gold standard
• Forecast of depression before reported onset
Questions