
Sentiment in German-language News and Blogs, and the DAX

Robert Remus (1,2), Khurshid Ahmad (2), Gerhard Heyer (1)

(1) Fakultät für Mathematik und Informatik, Universität Leipzig, Germany

(2) School of Computer Science and Statistics, Trinity College Dublin, Ireland

Text Mining Services, 2009


Outline

• Preamble

• A German Case Study
  ◦ Our Corpus of News and Blogs
  ◦ Our German Dictionary of Affect
  ◦ DAX 30

• Results
  ◦ Stylised Variables
  ◦ Non-Normal Distribution

• Summary


Preamble: Assumptions

1. Our approach is data-driven and relies on the assumption that sentiment, as a human quality, is expressed in text and can be identified by a machine using a frequency analysis of words as an approximation

2. Moreover, we assume that there is a possible relation between publications on economics and finance and movements in financial markets
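
A minimal sketch of the frequency analysis named in assumption 1, assuming a simple lowercase word tokenizer and two tiny, hypothetical word lists (POSITIVE and NEGATIVE stand in for a full dictionary of affect; the function names are illustrative and not taken from the study):

```python
import re
from collections import Counter

# Hypothetical toy word lists; a real dictionary of affect is far larger.
POSITIVE = {"gut", "viel", "viele"}
NEGATIVE = {"krise", "streik", "gegen"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-zäöüß]+", text.lower())


def affect_frequencies(text):
    """Return the share of tokens that are positive and negative terms."""
    tokens = tokenize(text)
    if not tokens:
        return {"positive": 0.0, "negative": 0.0}
    counts = Counter(tokens)
    pos = sum(counts[w] for w in POSITIVE)
    neg = sum(counts[w] for w in NEGATIVE)
    return {"positive": pos / len(tokens), "negative": neg / len(tokens)}


sample = "Die Krise ist gut überstanden, aber der Streik geht weiter."
freqs = affect_frequencies(sample)
print(freqs)
```

Computing these shares per day would yield one positive and one negative time series per corpus, which is one plausible way to obtain series that can be compared with an index.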


Our Corpus of News and Blogs

• The corpus is diachronically organised

• The news articles and blog posts were published and posted, respectively, between 2006 and 2008

Corpus   Items   Word tokens   Word types
News     8,812     3,911,104      137,343
Blogs    1,719       431,722       33,325
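
For readers unfamiliar with the two counts in the table: tokens are running words, types are distinct word forms, so the number of types can never exceed the number of tokens. A small sketch, assuming lowercase folding and a generic `\w+` tokenizer (both are assumptions; the slides do not state how the corpus was tokenized):

```python
import re


def type_token_counts(documents):
    """Return (number of word types, number of word tokens) for a corpus."""
    tokens = []
    for doc in documents:
        # \w+ matches Unicode word characters, including German umlauts.
        tokens.extend(re.findall(r"\w+", doc.lower()))
    return len(set(tokens)), len(tokens)


docs = ["Der DAX steigt.", "Der DAX fällt, der Markt zittert."]
n_types, n_tokens = type_token_counts(docs)
print(n_types, n_tokens)
```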


Our German Dictionary of Affect I

• Our study mainly uses the terms categorized as Pos or Neg in Harvard University’s General Inquirer lexicon

• These terms were translated into German by a mixture of human and machine translation, then manually revised and extended by adding inflections, resulting in a German dictionary of the following size:

Polarity   Words
Positive    9,301
Negative   10,697


Our German Dictionary of Affect II

• The frequency of occurrence of negative and positive terms follows a Zipf-like distribution

• Viewed annually, their overall contribution to the news corpus remains constant at around 4% for positive terms and between 2% and 3% for negative terms
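
One common way to check for a Zipf-like distribution is to sort term frequencies by rank and fit a straight line to log frequency against log rank; a slope near -1 indicates classic Zipfian behaviour. The sketch below is only an illustration under that assumption (the slides do not say how the Zipf-like shape was established), and `zipf_slope` is a hypothetical helper name:

```python
import math
from collections import Counter


def zipf_slope(tokens):
    """Least-squares slope of log(frequency) against log(rank).

    Needs at least two distinct tokens.
    """
    freqs = sorted(Counter(tokens).values(), reverse=True)
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Synthetic tokens whose frequencies are exactly proportional to 1 / rank.
tokens = []
for rank, word in enumerate(["a", "b", "c", "d", "e", "f"], start=1):
    tokens.extend([word] * (60 // rank))

slope = zipf_slope(tokens)
print(round(slope, 6))
```

On real data the fit would be restricted to the dictionary terms found in the corpus, and the slope would only approximate -1.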


Our German Dictionary of Affect III

The 10 most frequent “positive” terms (News corpus, 2006–2008)

Word       GI equivalent   f_rel
viele      plenty          0.08%
viel       plenty          0.08%
gut        good            0.07%
grossen    great           0.05%
macht      to create       0.05%
grosse     great           0.05%
geben      to give         0.04%
angebot    offer           0.04%
teil       deal            0.04%
erhalten   to obtain       0.03%
Total                      0.53%


Our German Dictionary of Affect IV

The 10 most frequent “negative” terms (News corpus, 2006–2008)

Word          GI equivalent   f_rel
gegen         against         0.15%
ende          -               0.09%
fall          fall            0.05%
streik        strike          0.04%
krise         crisis          0.04%
kosten        cost            0.04%
finanzkrise   -               0.04%
knapp         short           0.03%
streiks       strike          0.03%
trotz         defiance        0.03%
Total                         0.54%


DAX 30

• Our study uses the DAX 30 (Deutscher Aktien IndeX 30), which comprises the 30 largest and most actively traded German companies listed on the Frankfurt Stock Exchange


Results: Stylised Variables I

Time series       Min     Max    10^4 × Mean
DAX 30           -0.09    0.11      -2.03
News Positive    -1.82    1.66       1.65
News Negative    -1.54    1.69       8.29
Blogs Positive   -2.2     2.08       5.34
Blogs Negative   -2.18    2.48      16.18


Results: Stylised Variables II

Time series      10^2 × Std Dev   Skewness   Kurtosis
DAX 30                 1.61          0.15      11.14
News Positive         28.21         -0.07       5.45
News Negative         35.44         -0.03       1.56
Blogs Positive        45.02         -0.02       1.81
Blogs Negative        68.96          0.04       0.24
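
The statistics in these two tables can be reproduced in outline as follows. This is a sketch under stated assumptions, not the study's procedure: returns are taken to be log returns, ln(x_t / x_{t-1}), of a strictly positive series, moments are population moments, and kurtosis is reported as excess kurtosis (0 for a normal distribution). The slides do not specify any of these choices, and the helper names are hypothetical.

```python
import math


def log_returns(series):
    """Log returns ln(x_t / x_{t-1}); assumes all values are positive."""
    return [math.log(b / a) for a, b in zip(series, series[1:])]


def stylised_stats(xs):
    """Min, max, mean, std dev, skewness and excess kurtosis of a sample."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n  # population variance
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    std = math.sqrt(m2)
    return {
        "min": min(xs),
        "max": max(xs),
        "mean": mean,
        "std": std,
        "skewness": m3 / std ** 3,
        "excess_kurtosis": m4 / m2 ** 2 - 3.0,
    }


# Hypothetical index levels that alternate between +10% and -10% moves.
levels = [100.0, 110.0, 99.0, 108.9, 98.01]
stats = stylised_stats(log_returns(levels))
print(stats)
```

The same two functions could be applied to a daily series of positive or negative term frequencies, provided no day has a frequency of zero.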


Results: Non-Normal Distribution I

Probability distribution
between (St. Dev.)         Normal      DAX
0 to 0.25                  19.74%    33.46%
0.25 to 0.5                18.55%    22.31%
0.5 to 1                   29.98%    27.17%
1 to 1.5                   18.37%    10.24%
1.5 to 2                    8.81%     3.02%
2 to 3                      4.28%     1.44%
3+                          0.27%     2.36%
Total                        100%      100%
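
The Normal column above is the probability that a standard normal variable lies, in absolute value, within each band of standard deviations. The sketch below recomputes that column with `math.erf` and bins an empirical return series the same way. Measuring distance from the sample mean in units of the population standard deviation is an assumption, since the slides do not state the exact binning rule, and the function names are hypothetical.

```python
import math

BINS = [(0, 0.25), (0.25, 0.5), (0.5, 1), (1, 1.5), (1.5, 2), (2, 3),
        (3, math.inf)]


def normal_abs_prob(lo, hi):
    """P(lo <= |Z| < hi) for a standard normal Z."""
    def within(z):
        # P(|Z| <= z) = erf(z / sqrt(2))
        return math.erf(z / math.sqrt(2))
    upper = 1.0 if hi == math.inf else within(hi)
    return upper - within(lo)


def empirical_shares(returns):
    """Share of returns whose distance from the mean falls in each band.

    Assumes the returns are not all identical (non-zero std dev).
    """
    n = len(returns)
    mean = sum(returns) / n
    std = math.sqrt(sum((r - mean) ** 2 for r in returns) / n)
    z = [abs(r - mean) / std for r in returns]
    return [sum(lo <= v < hi for v in z) / n for lo, hi in BINS]


normal_column = [round(100 * normal_abs_prob(lo, hi), 2) for lo, hi in BINS]
print(normal_column)  # [19.74, 18.55, 29.98, 18.37, 8.81, 4.28, 0.27]
```

A series with more mass than the normal in both the innermost and the 3+ band, as the DAX column shows, is peaked and heavy-tailed at the same time.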


Results: Non-Normal Distribution II

Probability distribution               News       News
between (St. Dev.)         Normal    Positive   Negative
0 to 0.25                  19.74%     24.74%     22.77%
0.25 to 0.5                18.55%     23.61%     20.41%
0.5 to 1                   29.98%     30.01%     29.82%
1 to 1.5                   18.37%     11.67%     14.77%
1.5 to 2                    8.81%      4.70%      7.15%
2 to 3                      4.28%      3.57%      4.14%
3+                          0.27%      1.69%      0.94%
Total                        100%       100%       100%


Results: Non-Normal Distribution III

Probability distribution               Blogs      Blogs
between (St. Dev.)         Normal    Positive   Negative
0 to 0.25                  19.74%     19.53%     20.12%
0.25 to 0.5                18.55%     19.88%     19.41%
0.5 to 1                   29.98%     33.14%     31.12%
1 to 1.5                   18.37%     15.74%     15.74%
1.5 to 2                    8.81%      7.34%      8.76%
2 to 3                      4.28%      3.55%      4.38%
3+                          0.27%      0.83%      0.47%
Total                        100%       100%       100%


Summary

It has been shown that

• the distributions of returns of affect content in news and blogs are not normal

• the returns, i.e. the changes, of affect content in German-language
  ◦ news are higher than in the DAX
  ◦ blogs are higher than in German-language news

• the volatility, i.e. the fluctuation, of affect content in German-language
  ◦ news is much higher than in the DAX
  ◦ blogs is much higher than in German-language news
