Hierarchical emotion classification and emotion component analysis on Chinese micro-blog posts

Hua Xu 1, Weiwei Yang 1, Jiushuo Wang 1, 2

1 State Key Laboratory of Intelligent Technology and Systems, Tsinghua National Laboratory for Information Science and Technology, Department of Computer Science and Technology, Tsinghua University

2 School of Information Science and Engineering, Hebei University of Science and Technology

Expert Systems with Applications, 2015

Presenter: 劉憶年, 2015/8/18




Outline

Introduction

Related work

Emotion classification

Emotion component analysis

Experiment results and analysis

Application

Conclusion


Introduction (1/4)

For years, researchers have been trying to classify the emotions in text automatically.

The views and attitudes, of course, often contain emotions.

Micro-blog posts directly reflect users’ opinions.

The limited length of posts makes emotion classification challenging and calls for more effective feature extraction methods. Internet slang is also hard to cope with because it does not follow language rules.

Emotion, by definition, is a subjective thought or feeling such as happiness or anger, while sentiment addresses objective positive and negative attitudes. A post may contain sentiment but no emotion, as in the following example.

1. The phone broke within two days.


Introduction (2/4)

Currently, most researchers focus on sentiment analysis and on emotion classification over six basic coarse-grained emotion classes: happy, surprised, angry, disgusted, fearful and sad. However, coarse-grained emotions cannot fully depict the emotions in text.

2. This car is not so easy to drive as the ad says. I am so disappointed.

To describe emotions better, fine-grained emotions need to be added to the coarse-grained emotion categories, which forms a hierarchy. Adopting fine-grained emotions also greatly increases the number of classes, which is difficult for flat classifiers, so hierarchical classification is required.


Introduction (3/4)

So far, most work has used English corpora. Few papers report results on Chinese.

A psychological emotion dictionary, an Internet slang dictionary and an emoticon dictionary are employed to segment posts and form the feature space. Effective features are then retained by a combination of the χ²-test, word frequency and pointwise mutual information (PMI). Finally, we employ support vector regression (SVR) and rule sets generated from PMI values to get the classification results, which, as reported later, are very encouraging.


Introduction (4/4)

In this paper, a four-level fine-grained emotion hierarchy with 19 basic emotions is adopted. However, posts usually contain more than one kind of emotion.

So we propose an emotion component analysis (ECA) algorithm to detect the principal emotions in posts and calculate the corresponding ratios according to the classification results, more specifically according to the distances between regression values and class thresholds.


Related work (1/3)

Although probability-based algorithms are quite useful, machine learning approaches are now preferred by researchers.

In order to better classify text, researchers spend time constructing and improving emotion lexicons.

Emotion lexicons bring significant improvement to emotion classification on text.

In addition to classification algorithms and emotion lexicons, the choice of corpus also varies. Some researchers try to classify emotions on blog posts.


Related work (2/3)

Flat classification assigns examples to classes directly, whereas hierarchical classification classifies examples from top to bottom according to a predetermined multi-level class system and obtains the final result at the bottom level. Flat classification is the most widely adopted, but given a large dataset it is difficult for each classifier to distinguish the examples belonging to its class from those of other classes.

In recent years, as micro-blogs have become more and more widely used, micro-blog posts have become a new source of corpora for emotion classification.

Besides, experiments on other kinds of corpora are also reported, e.g. e-mails, novels and Japanese dialog systems.


Related work (3/3)

Our contributions are different. We hierarchically classify Chinese micro-blog posts into 19 fine-grained emotion classes with a machine learning approach and propose an ECA algorithm based on the regression values.

In the segmentation process, a psychological emotion dictionary is adopted to improve the algorithm, which has scientific value for both social network knowledge discovery and data mining.


Emotion classification -- Hierarchy

This hierarchy contains 19 fine-grained emotion classes at the bottom level and 20 leaf nodes if considering neutral, which denotes the non-emotional class.


Emotion classification -- Preprocessing

Usernames: Posts can mention other users with the @ symbol. This part is surely non-emotional, so we remove it by detecting the @ symbol and deleting it together with the username.

Topics: Every user can take part in discussions under a certain topic. To participate, users only need to include the topic in their posts, delimited by two # symbols, e.g. #Emotion Analysis#.

Links: Users can include links in their posts. The links are converted into short links by the micro-blog platform to save space.

Position information: Micro-blog platforms allow users to add position information at the end of posts, which does not help in emotion classification.
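The clean-up steps above can be sketched with a few regular expressions. This is a minimal illustration, not the authors' implementation: the patterns and the name `clean_post` are assumptions, topics and links are assumed to be stripped in the same way as usernames, and position information is left out because the slides do not describe its format.

```python
import re

# Hypothetical patterns; the slides only state which parts are removed.
LINK = re.compile(r"https?://\S+")   # (short) links
TOPIC = re.compile(r"#[^#]+#")       # topics wrapped in two # symbols
USERNAME = re.compile(r"@[\w\-]+")   # @ symbol together with the username


def clean_post(text):
    """Strip links, topics and @usernames, then collapse whitespace."""
    for pattern in (LINK, TOPIC, USERNAME):
        text = pattern.sub(" ", text)
    return " ".join(text.split())


print(clean_post("@alice I love this #Emotion Analysis# http://t.cn/abc123 so much"))
```

Links are removed first so that a `#` inside a URL is not mistaken for a topic delimiter.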


Emotion classification -- Feature extraction

Overall, emoticons can express some more complex emotions, so extracting emoticon features is important.

To mine POS features, we employ the ICTCLAS package to segment posts and then extract adjectives, nouns, verbs, etc. to form the feature space. Meanwhile, two semantic rules are applied. The first is to extract repeated exclamation marks (!) and question marks (?). The second is to put negative words and adjacent adjectives together, since such phrases have the opposite meaning of the original adjectives. Because there may be adverbs between them, we set a distance threshold of 3 according to Chinese language habits.
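The two semantic rules can be illustrated as follows. This is only a sketch under stated assumptions: the input is a list of `(word, pos)` pairs such as a segmenter might produce, the tag `"a"` is assumed to mark adjectives, the negation word list is hypothetical, and "distance threshold 3" is read as "the adjective appears within three tokens after the negative word".

```python
import re

NEGATIONS = {"不", "没", "没有"}  # hypothetical negation word list
MAX_DISTANCE = 3                  # distance threshold from the slides


def repeated_marks(text):
    """Rule 1: extract runs of two or more exclamation or question marks."""
    return re.findall(r"[!！]{2,}|[?？]{2,}", text)


def merge_negations(tokens):
    """Rule 2: join a negative word with an adjective at most 3 tokens later."""
    features = []
    used = set()  # indices of adjectives already merged into a phrase
    for i, (word, pos) in enumerate(tokens):
        if i in used:
            continue
        if word in NEGATIONS:
            window_end = min(i + 1 + MAX_DISTANCE, len(tokens))
            for j in range(i + 1, window_end):
                if tokens[j][1] == "a":  # assumed adjective tag
                    features.append(word + tokens[j][0])
                    used.add(j)
                    break
            else:
                features.append(word)  # no adjective close enough
        else:
            features.append(word)
    return features


print(repeated_marks("What??? No!!"))
print(merge_negations([("不", "d"), ("太", "d"), ("好", "a")]))
```

In this sketch the intervening adverbs stay in the feature list as ordinary words; whether the original system keeps or drops them is not stated in the slides.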


Emotion classification -- Feature selection (1/3)

More than 20,000 words are extracted in the last step, so it is necessary to select effective features from the original feature space. Here the χ²-test, as implemented in Weka, is adopted together with word frequency and PMI.

$$\mathrm{PMI}(t, c) = \log \frac{p(t, c)}{p(t)\,p(c)} \tag{1}$$
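As a worked illustration of Eq. (1), the probabilities can be estimated from post counts. The function below is a sketch, not the authors' code: the argument names are hypothetical, probabilities are assumed to be relative frequencies over posts, and the natural logarithm is assumed because the slides do not give the base.

```python
import math


def pmi(n_tc, n_t, n_c, n):
    """PMI(t, c) = log(p(t, c) / (p(t) * p(c))), estimated from counts.

    n_tc: posts of class c that contain term t
    n_t:  posts that contain term t
    n_c:  posts of class c
    n:    all posts
    """
    if n_tc == 0:
        return float("-inf")  # term never occurs in the class
    p_tc = n_tc / n
    p_t = n_t / n
    p_c = n_c / n
    return math.log(p_tc / (p_t * p_c))


# A term spread evenly over classes has PMI 0; a term that occurs
# only inside the class has a positive PMI.
print(pmi(10, 20, 50, 100))
print(pmi(20, 20, 50, 100))
```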


Emotion classification -- Feature selection (2/3)


Emotion classification -- Feature selection (3/3)

The χ²-test can pick out the words that are highly correlated with classes. However, it can be affected by word frequency, so the word frequency ratio is adopted as auxiliary information.

The selection of low-frequency words depends on PMI, as it is less sensitive to word frequencies. All words whose PMI values exceed a positive threshold are picked out to form the low-frequency word set.
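The two selection criteria can be sketched as below. The slides use Weka for the χ²-test; here the standard χ² statistic for a 2×2 term/class contingency table is written out by hand instead. The function names and the example threshold are assumptions, and the word frequency ratio is omitted because the slides do not define it.

```python
def chi_square(a, b, c, d):
    """Chi-square statistic of a 2x2 term/class contingency table.

    a: posts in the class containing the term
    b: posts outside the class containing the term
    c: posts in the class without the term
    d: posts outside the class without the term
    """
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        return 0.0
    return n * (a * d - b * c) ** 2 / denominator


def low_frequency_words(pmi_by_word, threshold):
    """Keep the words whose PMI exceeds a positive threshold."""
    return {word for word, value in pmi_by_word.items() if value > threshold}


print(chi_square(20, 0, 0, 20))  # term perfectly aligned with the class
print(low_frequency_words({"w1": 1.2, "w2": 0.1, "w3": -0.5}, threshold=0.5))
```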


Emotion classification -- Classification (1/2)

SVR allows us to select the classification threshold dynamically, rather than using a fixed one as in SVM.


Emotion classification -- Classification (2/2)

The class with the maximum distance between its regression value and its threshold is selected as the final result, as it is the most confident one.
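The decision rule above can be written in a few lines. This sketch assumes that each class has its own regression value and threshold and that the "distance" is the signed difference between them; the dictionaries and the name `pick_class` are hypothetical.

```python
def pick_class(scores, thresholds):
    """Return the class whose regression value exceeds its threshold the most."""
    margins = {label: scores[label] - thresholds[label] for label in scores}
    return max(margins, key=margins.get)


# Hypothetical per-class regression values and thresholds.
scores = {"happy": 0.8, "sad": 0.3}
thresholds = {"happy": 0.5, "sad": 0.4}
print(pick_class(scores, thresholds))
```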


Emotion component analysis (1/2)

Usually, a micro-blog post contains more than one kind of emotion, so classification results alone cannot accurately reflect its emotion components. Based on the confidence concept in multi-class classification, we propose an ECA algorithm to detect the principal emotions in a post and calculate their ratios.

3. This flower is picked at the side of road and brings me good mood. If you can find such little nice things in daily life, you will be a happy guy.
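One way to turn the distances between regression values and thresholds into emotion ratios is sketched below. The slides do not reproduce the actual ECA formula, so this is an assumption rather than the authors' algorithm: classes with a positive margin are kept, at most four of them (the number of principal emotions mentioned in the conclusion), and their margins are normalized to sum to 1.

```python
def emotion_components(scores, thresholds, max_emotions=4):
    """Map each principal emotion to its share of the total positive margin."""
    margins = {label: scores[label] - thresholds[label] for label in scores}
    # Keep only classes whose regression value exceeds the threshold,
    # strongest first, limited to the principal emotions.
    positive = sorted(
        ((margin, label) for label, margin in margins.items() if margin > 0),
        reverse=True,
    )[:max_emotions]
    total = sum(margin for margin, _ in positive)
    return {label: margin / total for margin, label in positive}


# Hypothetical regression values with one shared threshold.
scores = {"happy": 0.9, "like": 0.6, "sad": 0.2}
thresholds = {"happy": 0.3, "like": 0.3, "sad": 0.3}
print(emotion_components(scores, thresholds))
```

With these made-up numbers the post is roughly two thirds "happy" and one third "like", while "sad" falls below its threshold and is dropped.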


Emotion component analysis (2/2)


Experiment results and analysis -- Dataset (1/2)

As there is no benchmark dataset for fine-grained emotion classification, we randomly crawled 9960 original Chinese micro-blog posts from Sina Weibo as the dataset, to preserve the authenticity and practicality of the posts.

Two annotators completed the annotation separately. They disagreed on about 35% of the annotations. This is acceptable considering the lack of clear boundaries between emotions and the existence of emotion combinations. Disagreements are resolved by the first author, who chooses one of the competing labels as the final label.


Experiment results and analysis -- Dataset (2/2)


Experiment results and analysis -- Experimental group setting

The psychological emotion dictionary contains more than 52,000 words, which we put into 6 groups. Each group describes one kind of emotion: happy, distressed, surprised, fearful, angry or disgusted.


Experiment results and analysis -- Level results (1/2)


Experiment results and analysis -- Level results (2/2)

The results demonstrate the effectiveness of our feature selection method, which removes many noisy features and retains highly correlated ones.

It turns out that the psychological emotion dictionary does have a positive effect on classification, as it is the only difference between the compared groups.

It turns out that all classifiers perform well and good performance of the whole model can be expected.


Experiment results and analysis -- Hierarchical results (1/2)

In hierarchical classification, each test example is classified from the top level successively to the bottom level.
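The top-down procedure can be sketched as a walk down a class tree. Everything concrete here is hypothetical: the tiny tree, the keyword lists and the `keyword_score` function merely stand in for the per-level classifiers, which in the paper are SVR models.

```python
def classify_top_down(post, node, score):
    """Descend from the root, picking the best-scoring child at each level."""
    while node.get("children"):
        # On ties, max() keeps the first child in the list.
        node = max(node["children"], key=lambda child: score(post, child))
    return node["label"]


def keyword_score(post, node):
    """Hypothetical stand-in for a trained classifier: count keyword hits."""
    return sum(keyword in post for keyword in node["keywords"])


tree = {
    "label": "root",
    "children": [
        {"label": "neutral", "keywords": [], "children": []},
        {
            "label": "emotional",
            "keywords": ["happy", "sad"],
            "children": [
                {"label": "happy", "keywords": ["happy"], "children": []},
                {"label": "sad", "keywords": ["sad"], "children": []},
            ],
        },
    ],
}

print(classify_top_down("I am so happy today", tree, keyword_score))
```

Each lower-level decision only sees the posts routed to it by the level above, which is the property the hierarchical results rely on.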


Experiment results and analysis -- Hierarchical results (2/2)

In flat classification, it is not easy for each classifier to distinguish the examples belonging to its class from those of other classes when given the whole dataset. Hierarchical classification, by contrast, removes most irrelevant examples through the upper-level classifiers and makes classification easier for the lower-level ones.


Experiment results and analysis -- ECA results

We use human judgement to evaluate the ECA results.

Generally, if the analysis result of a post is supported by more than half of the judges, we consider it plausible.
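The plausibility criterion is a strict majority vote, which can be stated as a one-line check. The list-of-booleans representation of the judges' votes is an assumption made for illustration.

```python
def is_plausible(votes):
    """True if strictly more than half of the judges support the result."""
    return sum(votes) * 2 > len(votes)


print(is_plausible([True, True, False]))  # two of three judges agree
print(is_plausible([True, False]))        # exactly half is not enough
```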


Application

First, we will apply our algorithm to consumer behavior analysis.

Second, we can also apply our algorithm to the effect analysis of commercial promotion.

Third, it is possible to track how the emotions of micro-blog users change over time, so that we can follow their happiness, the happiness index of certain areas and so on.


Conclusion (1/4)

This paper focuses on emotion classification and emotion component analysis on Chinese micro-blog posts. We get good classification results on our dataset by applying several optimization methods, which are shown to be effective by the comparison between groups. We also propose an ECA algorithm, which can detect the four principal emotions in posts and calculate their proportions.


Conclusion (2/4)

First, in the application area of social management, the government can find existing problems by analyzing public emotions in social media. Second, a psychological emotion dictionary is adopted in the segmentation process to improve the algorithm, which has scientific value for both social network knowledge discovery and data mining. Third, whereas many researchers now focus on positive/negative or coarse-grained basic emotion classification with 6–7 classes, this work adopts a four-level fine-grained emotion hierarchy with 19 basic emotions.


Conclusion (3/4)

First, this paper employs the ICTCLAS package to segment Chinese posts, but because micro-blog posts contain many colloquial expressions, the Chinese word segmentation results are not very good. Second, due to the complexity of the feature space in the classification process, the feature extraction and feature selection algorithms need to be improved. Third, our ECA algorithm is designed around a limited set of factors; although it is reasonable to a degree, it could be improved in the future.


Conclusion (4/4)

First, we will focus on building a new dictionary that contains more emotional words and micro-blog slang, so that feature extraction can be improved. Second, we will try to improve our ECA algorithm by adding more factors to get better analysis results, for example by redesigning the calculation formula, normalizing the classification values and so on. Third, sarcastic expressions in micro-blog posts require deeper semantic, scenario and contextual analysis, and we leave them for further research.

First, our research could be applied to precision marketing for product recommendation. Second, our research can also be used to develop an opinion analysis system.