2016 Data Science: Emotion Analysis (English Version)

Post on 12-Apr-2017


Category:

Data & Analytics


Emotion Analysis for Big Data

NTHU CS, Yi-Shin Chen

Hello! I am Yi-Shin Chen

Currently in NTHU CS

Intelligent Data Engineering and Application Lab (IDEA Lab)

You can find me at: yishin@gmail.com

We Promote Diversity at IDEA Lab

More than 50% of our students come from other countries:

Belize, France, St Lucia, Honduras, India, China, Japan, Taiwan, Indonesia, São Tomé

1. Why Emotion Analysis?

There are a few personal reasons.

“I don’t understand women!! Their words are very vague and ambiguous.”

From Carlos Argueta, my first foreign Ph.D. graduate

He was the one who chose the topic of sentiment analysis, and the first in our lab to suffer from depression.

Children Are Bewildering

They don't say, and they cannot say.

2. Emotion Analysis

Let's see what others did and are doing

Natural Language Processing

▷Perform part-of-speech (POS) tagging
▷Understand word meaning
▷Analyze the relationships between words

Need dictionaries & semantic relationships
Word positions affect statement meanings
Need different data for different languages

Example: “This is the best thing happened in my life.” (tagged word by word: Det., Det., NN, PN, Pre., Verb, Verb, Adj.)
Difficult

Data Mining/Machine Learning

▷Collect massive data
▷Manually annotate training data
▷Analyze data with classifiers

Must recollect training data for each language

Low recall rates (<<25%)

Easier?

3. Learning from Experience

Difference between Reality and Practice

Emotion Embedded in Trivia

▷Most trivia are ignored in previous work
• Stop words are the first batch to be removed
→ E.g., often, above, again
• Determiners and pronouns are usually ignored
• Most nouns are considered unimportant

My mom always said school is more important

😒 Angry 😂 Sad 👶 Joy
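A minimal Python sketch of the conventional filtering this slide criticizes. The stop-word set is a small hypothetical list written for this illustration (it contains the slide's examples often, above, and again, plus a few determiners, pronouns, and function words); real toolkits ship their own, longer lists. Applied to the example sentence, the filter keeps the nouns but silently drops an adverb such as "always", which can be exactly the word that carries the emotion.

```python
# Hypothetical, abbreviated stop-word list for illustration only: it holds the
# slide's examples ("often", "above", "again") plus some common function words.
STOP_WORDS = {
    "often", "above", "again", "always", "my", "is", "more",
    "the", "a", "an", "i", "you", "she", "he",
}

def remove_stop_words(sentence: str) -> list[str]:
    """Lowercase, split on whitespace, and drop every token in STOP_WORDS."""
    return [tok for tok in sentence.lower().split() if tok not in STOP_WORDS]

kept = remove_stop_words("My mom always said school is more important")
print(kept)  # ['mom', 'said', 'school', 'important']
```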


Emotional Mistakes

▷Mistakes are everywhere
• Some are careless
→ E.g., Luve you
• Some are intentional
→ E.g., I’m soooooooo happppppy

▷Mistakes are not recorded in dictionaries
• How do we annotate mistakes?
→ Annotation costs A LOT!

Children are our mentors

Mumbling from a mom

▷My one-year-old kid can detect my emotion
• Without seeing my face
• I did not change my tone
• How come she is always right?

▷Guessing
• She did not know grammar
• She did not memorize any dictionary
• My statements might have a lot of mistakes

Goal: Multilingual

4. Overcoming Challenges

Insufficient Research Funds

Free Resources

▷Free data
• As long as they can be legally accessed

▷Open-source software

Philosophy: Slow Life

▷ Our students are often delayed for various reasons
▷ We do not follow the trends
• Usually against common sense in academia

Our approach: No POS tagging, no dictionary, multilingual 😱

Common practice: POS tagging, multiple dictionaries, one language

(Slide labels: Failure, Success)

Teamwork

▷ Implementation team
• Coding
• More coding

▷ Dreaming team
• Reading papers
• Design

▷ Boasting team
• Writing papers
• Generating presentations

▷ Anonymous

Crowdsourcing

Merriam-Webster: “Obtaining needed services, ideas, or content by soliciting contributions from a large group of people, especially an online community”

Cost: $$$

Subconscious Crowdsourcing

▷Crowdsourcing in the subconscious
• Free
• Extract the subconscious from daily-life records
→ Ex1: “computers/companies/product-support/apple” in Delicious tags
→ Ex2: “Trump” “Nickname generator” in search logs
→ Ex3: “School day again #sad” on Twitter

Chun-Hao Chang, Elvis Saravia and Yi-Shin Chen, Subconscious Crowdsourcing: A Feasible Data Collection Mechanism for Mental Disorder Detection on Social Media, The 2016 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM 2016), San Francisco, CA, USA, 18 - 21 August, 2016

5. Case 1: Analyze Emotions from Text

Utilize subconscious emotion patterns

Subconscious Emotion Big Data

▷Twitter, a good public source

Throwing my phone always calms me down #anger

My sister always makes things look much more worse than they seem >:[ #anger

Why my brother always crabby !?!? #rude #youranadult #anger #issues

WHY DOES MY COMPUTER ALWAYS FREEZE??? NEVER FAILS. #anger

Im wanna crazy,if my life always sucks like this. #anger

Hashtags and emoticons represent emotion well; hence they can be treated as annotated answers.
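The idea of using hashtags as free annotations can be sketched as follows. The hashtag-to-label table is hypothetical (#anger and #sad appear on the slides; #joy is added for illustration), and the rule of discarding tweets that carry conflicting emotion hashtags is an assumption, not something the slides state.

```python
import re
from typing import Optional

# Hypothetical mapping from emotion hashtags to labels.
EMOTION_HASHTAGS = {"#anger": "anger", "#sad": "sad", "#joy": "joy"}

def label_from_hashtags(tweet: str) -> Optional[str]:
    """Use emotion hashtags as a free annotation.

    Returns the label if the tweet carries exactly one distinct emotion
    hashtag; returns None if it has none or conflicting ones (assumed rule).
    """
    labels = {EMOTION_HASHTAGS[tag]
              for tag in re.findall(r"#\w+", tweet.lower())
              if tag in EMOTION_HASHTAGS}
    return labels.pop() if len(labels) == 1 else None
```

Non-emotion hashtags such as #rude or #issues are simply ignored by the lookup.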


Collect Emotion Data


Wait! We need a control group.

Not-Emotion Data


Preprocessing Steps

▷Hint: remove troublesome tweets, i.e., those that
o Are too short
→ Too short to extract important features
o Contain too many hashtags
→ Too much information to process
o Are retweets
→ Increase the complexity
o Have URLs
→ Too troublesome to collect the page data

▷Convert user mentions to <usermention> and hashtags to <hashtag>
→ Remove the identifying information. We should not peek at the answers!

It's Big Data anyway.
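The filtering hints above could be implemented roughly as below. This is a sketch under stated assumptions: the thresholds MIN_TOKENS and MAX_HASHTAGS are hypothetical values (the slide gives none), a retweet is assumed to start with "RT @", and URLs are detected with a simple regular expression.

```python
import re
from typing import Optional

# Hypothetical thresholds; the slides do not give exact values.
MIN_TOKENS = 5
MAX_HASHTAGS = 3

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#\w+")

def preprocess(tweet: str) -> Optional[str]:
    """Return the normalized tweet, or None if it should be discarded."""
    if len(tweet.split()) < MIN_TOKENS:                # too short
        return None
    if len(HASHTAG_RE.findall(tweet)) > MAX_HASHTAGS:  # too many hashtags
        return None
    if tweet.startswith("RT @"):                       # retweet (assumed marker)
        return None
    if URL_RE.search(tweet):                           # contains a URL
        return None
    # Hide user identities and the hashtag "answers" so the model cannot peek.
    tweet = MENTION_RE.sub("<usermention>", tweet)
    return HASHTAG_RE.sub("<hashtag>", tweet)
```

Because so much data is available, the sketch simply drops any tweet that fails a check instead of trying to repair it.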


Basic Guidelines

▷ Identify the commonalities and differences between the experimental and control groups
• Analyze the frequency of words
→ TF-IDF (term frequency, inverse document frequency)
• Analyze the co-occurrence between words/patterns
→ Co-occurrence
• Analyze the importance of words
→ Centrality (graph)
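The slides do not say how TF-IDF is applied to the two groups. One simple reading, assumed here, treats all tweets of a group as a single document: a word that occurs in both the emotion group and the control group then gets an IDF of log(2/2) = 0, which is exactly the "common" part to set aside, while words unique to one group keep a positive score. Tokenization is plain lowercase whitespace splitting, another simplification.

```python
import math
from collections import Counter

def tfidf_by_group(groups: dict[str, list[str]]) -> dict[str, dict[str, float]]:
    """Score each term per group, treating a group's tweets as one document.

    tf  = count of the term / total tokens in the group
    idf = log(N / df), N = number of groups, df = groups containing the term
    A term present in every group gets idf = 0, i.e., it is "common".
    """
    counts = {name: Counter(tok for doc in docs for tok in doc.lower().split())
              for name, docs in groups.items()}
    n = len(groups)
    df = Counter(term for c in counts.values() for term in c)  # one vote per group
    scores = {}
    for name, c in counts.items():
        total = sum(c.values())
        scores[name] = {t: (k / total) * math.log(n / df[t]) for t, k in c.items()}
    return scores

s = tfidf_by_group({"emotion": ["I love the new game"],
                    "control": ["3,000 killed in the world by ebola"]})
```

Here "the" scores 0 in both groups, while "love" keeps a positive score only in the emotion group.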


Graph Construction

▷Construct two graphs
• E.g.,
→ Emotion one: I love the World of Warcraft new game
→ Not-emotion one: 3,000 killed in the world by ebola

[Figure: two word graphs with weighted edges (weights between 0 and 1). Emotion graph nodes: I, love, the, World, of, Warcraft, new, game. Not-emotion graph nodes: 3,000, killed, in, the, world, by, ebola.]
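A sketch of building one such graph from word co-occurrence. Assumptions: two words are linked when they appear within a small window of each other (window=2 here, a hypothetical choice), and edge weights are raw counts. The weights on the slide fall between 0 and 1, so they were presumably normalized in a way the slides do not specify.

```python
from collections import defaultdict

def cooccurrence_graph(sentences: list[str], window: int = 2) -> dict[frozenset, int]:
    """Build an undirected word graph; edge weight = co-occurrence count.

    Two words are linked when they appear within `window` positions of each
    other in the same sentence (window=2 links each word to the next two).
    """
    edges: dict[frozenset, int] = defaultdict(int)
    for sentence in sentences:
        words = sentence.lower().split()
        for i, w in enumerate(words):
            for j in range(i + 1, min(i + 1 + window, len(words))):
                if words[j] != w:  # skip self-loops
                    edges[frozenset((w, words[j]))] += 1
    return dict(edges)

g = cooccurrence_graph(["I love the World of Warcraft new game"])
```

Calling the function once on the emotion tweets and once on the not-emotion tweets yields the two graphs to compare.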


Graph Processes

▷Remove the common ones between the two graphs
• Leave only the significant ones that appear in the emotion graph

▷Analyze the centrality of words
• Betweenness, closeness, eigenvector, degree, Katz
→ Can use free/open-source software, e.g., Gephi, GraphDB

▷Analyze the clustering degree
• Clustering coefficient

Graph → Key patterns
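The three graph steps can be sketched in plain Python. Edges are given as word pairs (assumed to contain no self-loops), and only degree centrality and the local clustering coefficient are implemented here. For betweenness, closeness, eigenvector, and Katz centrality a graph library is the practical choice; besides the tools named on the slide, NetworkX offers `betweenness_centrality`, `closeness_centrality`, `eigenvector_centrality`, `katz_centrality`, and `clustering`.

```python
from collections import defaultdict
from itertools import combinations

def emotion_only_edges(emotion_edges, control_edges) -> set[frozenset]:
    """Keep edges that appear in the emotion graph but not in the control graph."""
    control = {frozenset(e) for e in control_edges}
    return {frozenset(e) for e in emotion_edges} - control

def adjacency(edges) -> dict[str, set[str]]:
    """Neighbor sets for an undirected graph given two-word edges."""
    adj = defaultdict(set)
    for e in edges:
        u, v = tuple(e)  # assumes no self-loops (two distinct words per edge)
        adj[u].add(v)
        adj[v].add(u)
    return adj

def degree_centrality(adj) -> dict[str, float]:
    """Degree divided by (n - 1), the usual normalization."""
    n = len(adj)
    return {node: len(nbrs) / (n - 1) for node, nbrs in adj.items()}

def clustering_coefficient(adj, node: str) -> float:
    """Fraction of pairs of `node`'s neighbors that are themselves linked."""
    nbrs = adj[node]
    if len(nbrs) < 2:
        return 0.0
    links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return links / (len(nbrs) * (len(nbrs) - 1) / 2)
```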


Essence Only

Only key phrases → emotion patterns

Ranking Emotion Patterns

▷ Rank the emotion patterns for each emotion
• Frequency, exclusiveness, diversity
• One ranked list for each emotion

Sad, Joy, Anger
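The slides name three ranking criteria but give no formulas. The sketch below uses hypothetical definitions: frequency is how often a pattern matches within one emotion, exclusiveness is the share of the pattern's matches that belong to that emotion, and diversity is the number of distinct words that filled its wildcard. Multiplying them into one score is likewise an assumption.

```python
def rank_patterns(matches: dict[str, dict[str, list[str]]]) -> dict[str, list[tuple[str, float]]]:
    """Return one ranked (pattern, score) list per emotion.

    `matches[emotion][pattern]` lists the words that filled the pattern's
    wildcard "*", one entry per matching tweet (hypothetical input format).
    frequency     = matches of the pattern within the emotion
    exclusiveness = frequency / matches of the pattern across all emotions
    diversity     = number of distinct wildcard fillers
    score         = frequency * exclusiveness * diversity  (assumed combination)
    """
    totals: dict[str, int] = {}
    for patterns in matches.values():
        for p, fillers in patterns.items():
            totals[p] = totals.get(p, 0) + len(fillers)
    ranked = {}
    for emotion, patterns in matches.items():
        scored = []
        for p, fillers in patterns.items():
            freq = len(fillers)
            if freq == 0:
                continue
            exclusiveness = freq / totals[p]
            diversity = len(set(fillers))
            scored.append((p, freq * exclusiveness * diversity))
        ranked[emotion] = sorted(scored, key=lambda x: x[1], reverse=True)
    return ranked
```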


Emotion Pattern Samples

Joy: “finally * my”, “tomorrow !!! *”, “* <hashtag> birthday”, “.+ * yay !”, “:) *”, “! princess *”, “* hehe”, “prom dress *”

Sad: “memories *”, “* without my”, “sucks * <hashtag>”, “* tonight :(”, “* anymore ..”, “felt so *.”, “:( * *”, “:((”

Anger: “my * always”, “shut the *”, “teachers *”, “people say *”, “-.- *”, “understand why *”, “why are *”, “with these *”
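One way such a pattern might be matched against a preprocessed tweet, assuming (the slides do not specify) that tokens are whitespace-separated and that "*" stands for exactly one arbitrary token while every other token must match literally.

```python
def matches_pattern(pattern: str, text: str) -> bool:
    """True if `pattern` occurs as a contiguous run of tokens in `text`.

    Assumption: "*" matches exactly one arbitrary token; all other tokens
    must match literally, ignoring case.
    """
    pat = pattern.lower().split()
    toks = text.lower().split()
    for start in range(len(toks) - len(pat) + 1):
        window = toks[start:start + len(pat)]
        if all(p == "*" or p == t for p, t in zip(pat, window)):
            return True
    return False
```

Because preprocessing replaces hashtags with `<hashtag>`, patterns containing that placeholder match the normalized text directly.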


Precision

[Chart: accuracy (0% to 100%) of Naïve Bayes, SVM, NRCWE, and our approach, with LIWC and without LIWC]

Feedback for Products


Product Preference Analysis

5. Case 2: Analyze the Emotion Status of Individuals

Who has bipolar disorder?

Who has borderline personality disorder?

Collect Patient Data

Support Group


Collect Patient Data

Followers


Wait! A control group is needed.

Collect Data from Ordinary People


Basic Guidelines

▷ Identify the commonalities and differences between the experimental and control groups
• Word/pattern frequency
• Emotion-related data (e.g., flipping rates, occurrence rates)
• Social interaction (e.g., retweets, replies)
• Lifestyle (e.g., online time, staying up late or not)
• Age and gender

→ Features
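The slides list flipping rates and occurrence rates without defining them. A minimal sketch, assuming each user's tweets have already been labeled with one emotion each and are in chronological order: the flipping rate is taken here as the share of consecutive tweets whose emotion changes, and the occurrence rate of an emotion as its share of the user's tweets. Both definitions are assumptions.

```python
from collections import Counter

def emotion_features(labels: list[str]) -> dict[str, float]:
    """Hypothetical per-user features from chronological emotion labels.

    flip_rate:      share of consecutive tweet pairs whose emotion changes
    rate_<emotion>: share of the user's tweets labeled with that emotion
    """
    features: dict[str, float] = {}
    pairs = list(zip(labels, labels[1:]))
    features["flip_rate"] = (
        sum(a != b for a, b in pairs) / len(pairs) if pairs else 0.0
    )
    for emotion, k in Counter(labels).items():
        features[f"rate_{emotion}"] = k / len(labels)
    return features
```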


Apply Classifiers

▷ Utilize the extracted features

▷ Various classifiers
• Neural networks

• Naïve Bayes and Bayesian Belief Networks

• Support Vector Machines

• Random forest
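Any of the listed classifiers can consume such feature vectors. As one example, here is a minimal Gaussian Naive Bayes written from scratch so the sketch has no dependencies; in practice a library implementation (for instance scikit-learn's `GaussianNB`) would be the usual choice. The class name `SimpleGaussianNB`, the small `eps` added to each variance, and the toy feature values with patient/control labels are all hypothetical, not data or code from the study.

```python
import math
from collections import defaultdict

class SimpleGaussianNB:
    """Minimal Gaussian Naive Bayes: each feature is modeled as an independent
    normal distribution per class; prediction picks the class with the highest
    log prior plus summed log likelihoods."""

    def __init__(self, eps: float = 1e-9):
        self.eps = eps  # keeps every variance strictly positive

    def fit(self, X: list[list[float]], y: list[str]) -> "SimpleGaussianNB":
        rows_by_class = defaultdict(list)
        for row, label in zip(X, y):
            rows_by_class[label].append(row)
        self.stats = {}
        for label, rows in rows_by_class.items():
            n = len(rows)
            columns = list(zip(*rows))
            means = [sum(col) / n for col in columns]
            variances = [sum((v - m) ** 2 for v in col) / n + self.eps
                         for col, m in zip(columns, means)]
            self.stats[label] = (math.log(n / len(y)), means, variances)
        return self

    def predict(self, row: list[float]) -> str:
        def log_posterior(label: str) -> float:
            log_prior, means, variances = self.stats[label]
            return log_prior + sum(
                -0.5 * math.log(2 * math.pi * var) - (x - m) ** 2 / (2 * var)
                for x, m, var in zip(row, means, variances)
            )
        return max(self.stats, key=log_posterior)

# Toy usage with hypothetical [flip_rate, rate_joy] features.
X = [[0.8, 0.1], [0.7, 0.2], [0.9, 0.15], [0.1, 0.6], [0.2, 0.5], [0.15, 0.55]]
y = ["patient"] * 3 + ["control"] * 3
clf = SimpleGaussianNB().fit(X, y)
print(clf.predict([0.85, 0.12]))  # prints: patient
```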


Precision

Possible Applications


Election Analysis?


More in the future…

Thank you.

Contact me at: yishin@gmail.com
