Mining auditory hallucinations from unsolicited Twitter posts

M. Belousov, M. Dinev, R. M. Morris, N. Berry, S. Bucci, G. Nenadic

University of Manchester Health eResearch Centre (HeRC), The Farr Institute of Health Informatics Research

Portorož, May 2016

Auditory hallucinations: schizophrenia, hearing voices, mental health, psychosis, symptom, sound

Twitter posts: social network, brief message, fewer than 140 characters, 310M active users, share opinions

Unsolicited: spontaneous, unforced, unasked-for

Mining: knowledge discovery, exploratory data analysis, unseen patterns

Research aim

Q: Is it feasible to generate useful datasets from unsolicited Twitter posts regarding auditory hallucinatory experiences to support psychological investigations?

A: Classification model that can predict whether a given post is related to hallucinatory experiences.

Potentially related posts ✅

• I am hearing a scary voice right now, I don’t know if it’s in my head or in television.. Crazy

• If hallucinating is thought of as hearing voices that are not actually real, then these painkillers are causing me to hallucinate like mad

All Twitter posts were paraphrased to preserve anonymity

Unrelated posts ❌

• My grandmom is watching Deliver Us From Evil and I can hear this weird high-pitched voice and want Ralph Sarchie to hold me

• So I was convinced I was hearing stuff. It was so funny because the noise was coming from the kitchen but I thought I was hallucinating

Iterative workflow

1. Define search queries

2. Collect unique posts from Twitter

3. Annotate posts & explore data

4. Predict relatedness of posts to hallucinatory experiences

5. Analyse data

6. Redefine search queries, then repeat from step 2

Data collection

List of defined search queries for the Twitter Search API:

• hallucinating hearing

• (“hear things” OR “hearing things”) “in my head”

• hearing scary things “in my head”

• (hear OR hearing) (“other people” OR “other ppls” OR “other ppl”) thoughts

• (voice OR voices) (commenting OR criticising) (scary OR frightening OR “everything I do”)

• (hear OR hearing) (voice OR voices) (god OR angel OR allah OR spirit OR soul OR “holy spirit” OR djinn OR jinn)

• (hear OR hearing) (voice OR voices) (scary OR devil OR demon OR daemon OR evil OR “evil spirit”)
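The queries above combine three standard search operators: a space for AND, OR between alternatives, and double quotes for exact phrases. A minimal sketch of composing such a query string follows; the helper names `term`, `any_of` and `all_of` are hypothetical, and the code only builds the string, it does not call Twitter.

```python
def term(text):
    """Quote multi-word phrases so they are matched as exact phrases."""
    return f'"{text}"' if " " in text else text


def any_of(*alternatives):
    """Group alternatives with OR inside parentheses."""
    return "(" + " OR ".join(term(a) for a in alternatives) + ")"


def all_of(*parts):
    """Join parts with spaces, which the search syntax treats as AND."""
    return " ".join(parts)


# Rebuild the last query in the list (with straight quotes).
query = all_of(
    any_of("hear", "hearing"),
    any_of("voice", "voices"),
    any_of("scary", "devil", "demon", "daemon", "evil", "evil spirit"),
)
print(query)
```

Keeping the alternatives as plain lists makes it easy to redefine queries in later iterations of the workflow.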

Data annotation

[Figure: data annotation process]

• Two research psychologists manually annotated posts:

  • Assigned classes: related or unrelated to hallucinations

  • Highlighted specific phrases to explain their decisions

• The highlighted words and phrases were later used to identify characteristics of each classification category

RESULT: 401 annotated examples, of which 94 are related to hallucinatory experiences

• The observed inter-annotator agreement (IAA) was 0.85 on 41 examples (10% of the final annotated set)
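The slide does not say which agreement measure the 0.85 refers to. As an illustration only, the sketch below computes two common candidates for two annotators, raw observed agreement and Cohen's kappa, on made-up labels (not the study data).

```python
from collections import Counter


def observed_agreement(a, b):
    """Share of items on which both annotators chose the same label."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohen_kappa(a, b):
    """Agreement corrected for the agreement expected by chance."""
    n = len(a)
    p_o = observed_agreement(a, b)
    counts_a, counts_b = Counter(a), Counter(b)
    p_e = sum(counts_a[l] * counts_b[l] for l in set(a) | set(b)) / (n * n)
    if p_e == 1.0:  # both annotators used a single identical label
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Toy labels for ten posts (hypothetical).
ann_1 = ["related"] * 3 + ["unrelated"] * 7
ann_2 = ["related"] * 2 + ["unrelated"] * 8

print(observed_agreement(ann_1, ann_2))
print(round(cohen_kappa(ann_1, ann_2), 3))
```

With imbalanced classes, as here, kappa is usually lower than raw agreement because much of the agreement on the majority class is expected by chance.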

Data exploration: semantic classes

• Relative (father, friend)

• Communication Tool (phone)

• Audio Device (headphones, TV)

• Drug (cannabis, painkillers)

• Audio Recording (voicemail)

• Possible Hallucination (seeing things, in my head)

• Audio & Visual Media, Apps (song, YouTube, Siri)

• Religious Term (prayer)

• Emotional Support (helpline)

• Own Voice Indicator (my voice, our own voice)

• Fear Expression (scared, creepy)

• Abusive Language (sh*t, hell)

• Stigmatising Language (crazy, insane)
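One simple way to turn such semantic classes into features is dictionary lookup. The sketch below is an assumption about how this could be done, not the authors' implementation; the lexicon contains only the example terms shown on the slide, and `count_mentions` is a hypothetical helper.

```python
import re

# Tiny lexicon built from the slide's example terms only.
LEXICON = {
    "Relative": ["father", "friend"],
    "Audio Device": ["headphones", "tv"],
    "Drug": ["cannabis", "painkillers"],
    "Possible Hallucination": ["seeing things", "in my head"],
    "Fear Expression": ["scared", "creepy"],
    "Stigmatising Language": ["crazy", "insane"],
}


def count_mentions(text, lexicon=LEXICON):
    """Count whole-word matches of each class's terms in the text."""
    lowered = text.lower()
    counts = {}
    for cls, terms in lexicon.items():
        n = sum(
            len(re.findall(r"\b" + re.escape(t) + r"\b", lowered))
            for t in terms
        )
        if n:
            counts[cls] = n
    return counts


print(count_mentions(
    "These painkillers make me hear things in my head, it is creepy and crazy"
))
```

A real lexicon would be much larger and would need spelling variants, which is one reason the pipeline corrects the text before extraction.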

Text classification pipeline

raw (unstructured) text → Text Preprocessing → corrected text → Information Extraction → structured text → Classification → label

• Raw text: Im hearing a scary voice rn,idk if it’s in my head or in TV..craazy

• Corrected text: I am hearing a scary voice right now, I don’t know if it’s in my head or in television.. Crazy

• Semantic classes: fear expr. (scary), possible hallucination (in my head), audio device (television), stigmatising lang. (Crazy)

• POS tags: I/O am/V hearing/V a/D scary/A voice/N right/R now/R I/O don’t/V know/V if/P it’s/L in/P my/D head/N or/& in/P television/N Crazy/A

• Label: related to hallucinatory experience

POS tagset from Gimpel et al. (2011): O - personal pronoun, V - verb, D - determiner, etc.
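The preprocessing step in the example expands abbreviations and repairs elongated words. A minimal sketch of that idea follows, assuming a lookup table and a known-word list; both are tiny and hypothetical, and the slides do not describe the actual correction method.

```python
import re

# Hypothetical lookup tables covering only the slide's example.
ABBREVIATIONS = {
    "im": "I am",
    "rn": "right now",
    "idk": "I don't know",
    "tv": "television",
}
KNOWN_WORDS = {"crazy"}


def squeeze(word):
    """Collapse runs of a repeated character: 'craazy' -> 'crazy'."""
    return re.sub(r"(.)\1+", r"\1", word)


def normalise(text):
    """Return corrected tokens; expects straight apostrophes."""
    tokens = re.findall(r"[A-Za-z']+|[^A-Za-z'\s]+", text)
    corrected = []
    for tok in tokens:
        low = tok.lower()
        if low in ABBREVIATIONS:
            corrected.append(ABBREVIATIONS[low])
        elif low.isalpha() and low not in KNOWN_WORDS and squeeze(low) in KNOWN_WORDS:
            corrected.append(squeeze(low))
        else:
            corrected.append(tok)
    return corrected


print(normalise("Im hearing a scary voice rn,idk if it's in my head or in TV..craazy"))
```

Squeezing is applied only when the squeezed form is a known word, so ordinary words with double letters are left unchanged.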

Information extraction

My grandmom is watching Deliver Us From Evil and I can hear this weird high-pitched voice and want Ralph Sarchie to hold me

• Features extracted from the post: Relative [1], NE (person)* [1], NE (misc)* [1], Neg. sentiment, POS tags

• Key phrase extraction: “hear this weird high-pitched voice”, with features Weird / strange [1], Neg. sentiment, POS tags (V D A A N)

* Stanford NER using the 4-class model trained on the CoNLL 2003 data
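The slide does not explain how key phrases are selected. One plausible reading of the example (V D A A N) is a hearing verb followed by a noun phrase; the sketch below implements that pattern over already-tagged tokens. The trigger list and the function `extract_key_phrase` are assumptions made for illustration.

```python
HEARING_VERBS = ("hear", "hearing", "heard")  # hypothetical trigger list


def extract_key_phrase(tagged, triggers=HEARING_VERBS):
    """Return the first span 'trigger verb, optional D, any A, then N'.

    `tagged` is a list of (token, tag) pairs using the Gimpel et al. tags.
    """
    for i, (tok, tag) in enumerate(tagged):
        if tag != "V" or tok.lower() not in triggers:
            continue
        j = i + 1
        if j < len(tagged) and tagged[j][1] == "D":
            j += 1
        while j < len(tagged) and tagged[j][1] == "A":
            j += 1
        if j < len(tagged) and tagged[j][1] == "N":
            return tagged[i:j + 1]
    return []


# Hand-tagged fragment of the slide's example post.
fragment = [
    ("I", "O"), ("can", "V"), ("hear", "V"), ("this", "D"),
    ("weird", "A"), ("high-pitched", "A"), ("voice", "N"), ("and", "&"),
]
phrase = extract_key_phrase(fragment)
print(" ".join(tok for tok, _ in phrase))
print(" ".join(tag for _, tag in phrase))
```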

Groups of features

Feature group: Features

• Mentions of semantic classes: mentions of each semantic class

• Key phrases: sentiment polarity, sem. classes, POS tags

• Part-of-speech tags: nouns, verbs, adjectives, etc.

• Sentiment polarity: positive, negative or neutral

• Popularity of the post: likes, retweets

• Use of nonstandard language: spelling mistakes, abbreviations

• Number of Twitter entities: URLs, #hashtags, @mentions

• Named entities: persons, locations, organisations

• Lexical distribution: sentences, words, characters
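Two of these groups, the number of Twitter entities and the lexical distribution, can be approximated with simple counts. The sketch below uses naive regular expressions and whitespace splitting; it is an assumed approximation, and `surface_features` is a hypothetical name.

```python
import re

URL_PATTERN = r"https?://\S+"


def surface_features(text):
    """Count Twitter entities and basic lexical units in one post."""
    plain = re.sub(URL_PATTERN, "", text)  # keep URL dots out of sentence counts
    return {
        "urls": len(re.findall(URL_PATTERN, text)),
        "hashtags": len(re.findall(r"(?<!\w)#\w+", plain)),
        "mentions": len(re.findall(r"(?<!\w)@\w+", plain)),
        "sentences": len([s for s in re.split(r"[.!?]+", plain) if s.strip()]),
        "words": len(text.split()),
        "characters": len(text),
    }


post = "I keep hearing voices #scared @friend http://t.co/abc"
print(surface_features(post))
```

Sentence splitting on punctuation is crude for tweets; a tweet-aware tokeniser would be the natural replacement.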

Classification scenario

• 401 labelled examples: 94 related; 307 unrelated

• Three different types of classification methods:

  • Naive Bayes (probabilistic model)

  • Support Vector Machine (geometric model)

  • AdaBoost (boosting of tree models)

• Compare performance with a simple baseline: tf-idf features

Evaluation

Classification performance (F2-score) of various classification methods on two different sets of features:

• NB: proposed features 0.772, baseline features 0.711

• SVM: proposed features 0.743, baseline features 0.751

• AdaBoost: proposed features 0.831 🏆, baseline features 0.486

• Based on ten experiments of stratified 10-fold cross validation

• Baseline features outperform the proposed features only with SVM, and the difference is non-significant (p-value = 0.375)
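The F2-score is the F-beta measure with beta = 2, which weights recall more heavily than precision. That suits a task where missing a related post costs more than reviewing a false alarm. A minimal sketch from confusion counts, with made-up numbers:

```python
def f_beta(tp, fp, fn, beta=2.0):
    """F-beta from true positives, false positives and false negatives."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Same number of errors, different kind (toy counts):
high_recall = f_beta(tp=1, fp=1, fn=0)      # precision 0.5, recall 1.0
high_precision = f_beta(tp=1, fp=0, fn=1)   # precision 1.0, recall 0.5
print(round(high_recall, 3), round(high_precision, 3))
```

Under F1 the two cases would score the same; under F2 the high-recall case scores higher.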

Contribution of features

Features: F2-score

• Mentions of semantic classes: *0.769 ▼

• Key phrases: *0.788 ▼

• Part-of-speech tags: 0.817 ▼

• Sentiment polarity: *0.818 ▼

• Popularity of the post: 0.828 ▼

• Use of nonstandard language: 0.831 ▬

• Number of Twitter entities: 0.832 ▲

• Named entities: 0.832 ▲

• Lexical distribution: 0.833 ▲

• All features: 0.831 ▲

* Statistically significant differences are marked with an asterisk
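The slide does not state how the per-group scores were obtained. One common way to measure the contribution of a feature group is leave-one-group-out ablation: retrain without the group and compare with the full model. The sketch below shows only the loop; `toy_evaluate` and its weights are invented stand-ins for a real cross-validated F2-score.

```python
def ablation(feature_groups, evaluate):
    """Score change when each group is removed from the full feature set."""
    full_score = evaluate(feature_groups)
    deltas = {}
    for group in feature_groups:
        reduced = [g for g in feature_groups if g != group]
        deltas[group] = evaluate(reduced) - full_score
    return full_score, deltas


# Made-up additive scores, for illustration only.
TOY_WEIGHTS = {"semantic_classes": 0.06, "key_phrases": 0.04, "pos_tags": 0.01}


def toy_evaluate(groups):
    return 0.70 + sum(TOY_WEIGHTS[g] for g in groups)


full, deltas = ablation(list(TOY_WEIGHTS), toy_evaluate)
for group, delta in sorted(deltas.items(), key=lambda kv: kv[1]):
    print(f"{group}: {delta:+.3f}")
```

A negative change means the model got worse without the group, so the group was contributing.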

Error analysis (highlights)

• “I do not hear voices, I am not paranoid”: predicted ✅ Related, actual ❌ Unrelated

• “I’m hallucinating I’m hearing hawks! Oh hang on, it is just the television”: predicted ✅ Related, actual ❌ Unrelated

• “The voices which I hear every night tell me to do it”: predicted ❌ Unrelated, actual ✅ Related

Generating dataset for analysis

1. Take the best-performing classification model

2. Predict relatedness for unlabelled examples

3. Combine with the 401 labelled (annotated) examples

RESULT: 4957 examples, of which 546 are potentially related to hallucinatory experiences *

* For comparison, the national survey of Wiles et al. (2006) identified only 62 cases
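The three steps amount to labelling the unannotated posts with the trained model and merging them with the annotated ones. If the 4957 examples include the 401 annotated posts, as step 3 suggests, then 4957 - 401 = 4556 of them carry a predicted rather than an annotated label. The sketch below keeps track of where each label came from; `toy_predict` is a hypothetical keyword rule standing in for the trained classifier.

```python
def build_dataset(labelled, unlabelled, predict):
    """Merge annotated posts with model-labelled posts, keeping the source."""
    dataset = [(text, label, "annotated") for text, label in labelled]
    dataset += [(text, predict(text), "predicted") for text in unlabelled]
    return dataset


def toy_predict(text):
    # Stand-in for the classifier; NOT the model from the slides.
    return "related" if "voices" in text.lower() else "unrelated"


labelled = [
    ("The voices tell me to do it", "related"),
    ("Loud TV next door", "unrelated"),
]
unlabelled = ["I hear voices at night", "Great song on the radio"]

dataset = build_dataset(labelled, unlabelled, toy_predict)
related = [row for row in dataset if row[1] == "related"]
print(len(dataset), len(related))
```

Recording the label source lets later analyses be repeated on the annotated subset alone as a sanity check.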

Preliminary data analysis

[Figure: bar chart on a 0 to 100% scale. Related posts: 72% and 28%; Unrelated posts: 19% and 81%]

• Negative sentiment was significantly associated with posts that indicated the occurrence of auditory hallucinations

• Posts linked to auditory hallucinations were proportionately more frequent between the hours of 11pm and 5am
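The posting-time pattern can be checked by computing, for each class, the share of posts that fall in the 11pm to 5am window. A minimal sketch follows; it assumes timestamps are already in the poster's local time, which the slides do not specify, and the sample timestamps are invented.

```python
from datetime import datetime


def is_night(ts, start_hour=23, end_hour=5):
    """True for times from 23:00 up to, but not including, 05:00."""
    return ts.hour >= start_hour or ts.hour < end_hour


def night_share(timestamps):
    """Fraction of timestamps inside the night window."""
    if not timestamps:
        return 0.0
    return sum(is_night(t) for t in timestamps) / len(timestamps)


# Invented timestamps for illustration.
sample = [
    datetime(2016, 5, 1, 23, 30),
    datetime(2016, 5, 2, 2, 0),
    datetime(2016, 5, 2, 5, 0),
    datetime(2016, 5, 2, 12, 0),
]
print(night_share(sample))
```

Comparing `night_share` for related and unrelated posts gives the proportionate difference the slide refers to.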

Summary

• Experimental methodology to harvest and mine datasets from unsolicited Twitter posts to identify potential psychotic(-like) experiences

• Classification model that can predict the relatedness of posts to auditory hallucinations relatively accurately

• Preliminary data analysis that identified interesting patterns in sentiment polarity and posting time

• Future research: investigate expressions of sleep in Twitter users who report a diagnosis of a psychosis-related disorder

Questions?

Acknowledgements

• Centre for Doctoral Training, School of Computer Science, University of Manchester

• Health eResearch Centre (HeRC), The Farr Institute of Health Informatics Research

• School of Psychological Sciences, University of Manchester