Sentiment in German-language News and Blogs,
and the DAX
Robert Remus (1, 2), Khurshid Ahmad (2), Gerhard Heyer (1)
(1) Fakultät für Mathematik und Informatik, Universität Leipzig, Germany
(2) School of Computer Science and Statistics, Trinity College Dublin, Ireland
Text Mining Services, 2009
Outline
• Preamble
• A German Case Study
◦ Our Corpus of News and Blogs
◦ Our German Dictionary of Affect
◦ DAX 30
• Results
◦ Stylised Variables
◦ Non-Normal Distribution
• Summary
Preamble: Assumptions
1. Our approach is data-driven and relies on the assumption that sentiment, as a human quality, is expressed in text and can be identified by a machine, using a frequency analysis of words as an approximation
2. Moreover, we assume that there is a possible relation between publications on economics and finance and movements in financial markets
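The first assumption can be illustrated with a minimal sketch: count how often dictionary terms occur in a tokenised text and divide by the total number of tokens. The tiny word lists, the regex tokeniser and the function names `tokenize` and `affect_shares` are illustrative assumptions, not the dictionary or the pipeline used in the study.

```python
import re
from collections import Counter

# Toy word lists for illustration only (assumption); the study's
# dictionary of affect is far larger.
POSITIVE = {"gut", "viel", "viele", "angebot"}
NEGATIVE = {"krise", "streik", "streiks", "finanzkrise"}

def tokenize(text):
    """Lower-case the text and split it into alphabetic tokens."""
    return re.findall(r"[a-zäöüß]+", text.lower())

def affect_shares(text):
    """Return (positive share, negative share) of all tokens in the text."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0, 0.0
    counts = Counter(tokens)
    pos = sum(counts[w] for w in POSITIVE)
    neg = sum(counts[w] for w in NEGATIVE)
    return pos / len(tokens), neg / len(tokens)

sample = "Die Krise trifft viele Banken, der Streik kostet viel Geld."
pos_share, neg_share = affect_shares(sample)
```

Applied per day to a diachronic corpus, such shares would form the kind of time series that can be set against an index, which is what the second assumption requires.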
Our Corpus of News and Blogs
• The corpus is diachronically organised
• The news articles were published, and the blog posts posted, between 2006 and 2008
Corpus   Items   Word tokens   Word types
News     8,812     3,911,104      137,343
Blogs    1,719       431,722       33,325
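Word tokens count every running word, while word types count each distinct word form once, so a corpus can never contain more types than tokens. A minimal sketch of how the three corpus figures could be computed follows; the function name `corpus_stats` and the toy documents are assumptions made for illustration.

```python
from collections import Counter

def corpus_stats(documents):
    """Return (items, word tokens, word types) for tokenised documents."""
    counts = Counter(tok for doc in documents for tok in doc)
    tokens = sum(counts.values())  # every running word counts
    types = len(counts)            # each distinct word form counts once
    return len(documents), tokens, types

# Two hypothetical, already tokenised items.
docs = [
    ["der", "dax", "steigt"],
    ["der", "dax", "fällt", "der", "kurs", "fällt"],
]
items, tokens, types = corpus_stats(docs)
```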
Our German Dictionary of Affect I
• Our study mainly uses the terms categorised as Pos or Neg in Harvard University's General Inquirer (GI) lexicon
• These terms were translated into German by a mixture of human and machine translation, then manually revised and extended by adding inflections, resulting in a German dictionary of the following size:
Polarity   Words
Positive    9,301
Negative   10,697
Our German Dictionary of Affect II
• The frequency of occurrence of negative and positive terms follows a Zipf-like distribution
• Viewed annually, their overall contribution to the news corpus remains constant at around 4% for positive terms and between 2% and 3% for negative terms
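A Zipf-like distribution means that frequency falls off roughly as a power of rank, so log frequency plotted against log rank is close to a straight line. One way to check this, sketched below under the assumption that an ordinary least-squares fit on the log-log values is acceptable, is to estimate that slope. The function name `zipf_slope` and the synthetic counts are illustrative and do not come from the study.

```python
import math

def zipf_slope(frequencies):
    """Least-squares slope of log(frequency) against log(rank)."""
    freqs = sorted((f for f in frequencies if f > 0), reverse=True)
    if len(freqs) < 2:
        raise ValueError("need at least two non-zero frequencies")
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

# Synthetic counts that follow f(r) = 1200 / r exactly (assumption),
# so the fitted slope should be very close to -1.
ideal = [1200 / r for r in range(1, 51)]
slope = zipf_slope(ideal)
```

A slope near -1 corresponds to the classic Zipf case; real term counts would only approximate a straight line.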
Our German Dictionary of Affect III
The 10 most frequent “positive” terms (News corpus, 2006–2008)
Word       GI equivalent   f_rel
viele      plenty          0.08%
viel       plenty          0.08%
gut        good            0.07%
grossen    great           0.05%
macht      to create       0.05%
grosse     great           0.05%
geben      to give         0.04%
angebot    offer           0.04%
teil       deal            0.04%
erhalten   to obtain       0.03%
Total                      0.53%
Our German Dictionary of Affect IV
The 10 most frequent “negative” terms (News corpus, 2006–2008)
Word          GI equivalent   f_rel
gegen         against         0.15%
ende          (none)          0.09%
fall          fall            0.05%
streik        strike          0.04%
krise         crisis          0.04%
kosten        cost            0.04%
finanzkrise   (none)          0.04%
knapp         short           0.03%
streiks       strike          0.03%
trotz         defiance        0.03%
Total                         0.54%
DAX 30
• Our study uses the DAX 30 (Deutscher Aktien IndeX 30), which comprises the 30 largest and most actively traded German companies listed on the Frankfurt Stock Exchange
Results: Stylised Variables I
Time series       Min     Max    10^4 × Mean
DAX 30           -0.09    0.11      -2.03
News Positive    -1.82    1.66       1.65
News Negative    -1.54    1.69       8.29
Blogs Positive   -2.20    2.08       5.34
Blogs Negative   -2.18    2.48      16.18
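The slides do not state how the returns were computed. Minima below -1 cannot arise from simple percentage changes of a non-negative series, so the figures are consistent with log returns; the sketch below therefore assumes r_t = ln(x_t / x_(t-1)). The function name `log_returns` and the sample series are hypothetical.

```python
import math

def log_returns(series):
    """Log returns r_t = ln(x_t / x_(t-1)) of a strictly positive series."""
    if any(x <= 0 for x in series):
        raise ValueError("log returns need strictly positive values")
    return [math.log(cur / prev) for prev, cur in zip(series, series[1:])]

# Hypothetical daily share of negative terms, in percent of all tokens.
neg_share = [2.0, 3.0, 1.5, 3.0, 2.0]
returns = log_returns(neg_share)

# The three statistics reported per time series.
r_min = min(returns)
r_max = max(returns)
r_mean = sum(returns) / len(returns)
```

Days on which a term class does not occur at all would need separate handling, since the logarithm of zero is undefined.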
Results: Stylised Variables II
Time series      10^2 × Std Dev   Skewness   Kurtosis
DAX 30                 1.61          0.15      11.14
News Positive         28.21         -0.07       5.45
News Negative         35.44         -0.03       1.56
Blogs Positive        45.02         -0.02       1.81
Blogs Negative        68.96          0.04       0.24
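Standard deviation, skewness and kurtosis can be derived from the central moments of a return series. The slides do not say which estimator was used or whether the kurtosis column is raw or excess kurtosis; the sketch below assumes population moments and excess kurtosis (0 for a normal distribution). The function name `moments` is illustrative.

```python
import math

def moments(values):
    """Return (std dev, skewness, excess kurtosis) from population moments."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    if m2 == 0:
        raise ValueError("series has zero variance")
    std = math.sqrt(m2)
    skew = m3 / m2 ** 1.5
    # Assumption: excess kurtosis, i.e. the normal distribution scores 0.
    excess_kurt = m4 / m2 ** 2 - 3.0
    return std, skew, excess_kurt

# Small symmetric sample, chosen so the results are easy to verify by hand.
sample = [-2.0, -1.0, 0.0, 1.0, 2.0]
std, skew, kurt = moments(sample)
```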
Results: Non-Normal Distribution I
Probability distribution
between (St. Dev.)         Normal     DAX
0 to 0.25                  19.74%    33.46%
0.25 to 0.5                18.55%    22.31%
0.5 to 1                   29.98%    27.17%
1 to 1.5                   18.37%    10.24%
1.5 to 2                    8.81%     3.02%
2 to 3                      4.28%     1.44%
3+                          0.27%     2.36%
Total                       100%      100%
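The Normal column follows from the standard normal distribution: the probability of lying between a and b standard deviations from the mean, on either side, is P(|Z| < b) - P(|Z| < a), where P(|Z| < k) = erf(k / sqrt(2)). A short sketch using only the Python standard library; the names `BANDS` and `within` are illustrative.

```python
import math

# Band edges in standard deviations; the last band is open-ended.
BANDS = [(0.0, 0.25), (0.25, 0.5), (0.5, 1.0), (1.0, 1.5),
         (1.5, 2.0), (2.0, 3.0), (3.0, math.inf)]

def within(k):
    """P(|Z| < k) for a standard normal variable Z."""
    return 1.0 if math.isinf(k) else math.erf(k / math.sqrt(2.0))

# Two-sided probability mass per band, in percent, rounded to two decimals.
normal_column = [round(100.0 * (within(hi) - within(lo)), 2)
                 for lo, hi in BANDS]
```

The resulting values agree with the Normal column to two decimals, which supports reading the bands as two-sided.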
Results: Non-Normal Distribution II
Probability distribution              News       News
between (St. Dev.)         Normal     Positive   Negative
0 to 0.25                  19.74%     24.74%     22.77%
0.25 to 0.5                18.55%     23.61%     20.41%
0.5 to 1                   29.98%     30.01%     29.82%
1 to 1.5                   18.37%     11.67%     14.77%
1.5 to 2                    8.81%      4.70%      7.15%
2 to 3                      4.28%      3.57%      4.14%
3+                          0.27%      1.69%      0.94%
Total                       100%       100%       100%
Results: Non-Normal Distribution III
Probability distribution              Blogs      Blogs
between (St. Dev.)         Normal     Positive   Negative
0 to 0.25                  19.74%     19.53%     20.12%
0.25 to 0.5                18.55%     19.88%     19.41%
0.5 to 1                   29.98%     33.14%     31.12%
1 to 1.5                   18.37%     15.74%     15.74%
1.5 to 2                    8.81%      7.34%      8.76%
2 to 3                      4.28%      3.55%      4.38%
3+                          0.27%      0.83%      0.47%
Total                       100%       100%       100%
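The empirical columns can be obtained by standardising each return and counting how many observations fall into each band. The sketch below assumes that distances are measured from the sample mean in units of the population standard deviation; the slides do not specify this, and the function name `band_shares` and the sample returns are hypothetical.

```python
import math

BAND_EDGES = [0.0, 0.25, 0.5, 1.0, 1.5, 2.0, 3.0, math.inf]

def band_shares(returns):
    """Percentage of observations whose |z-score| falls in each band."""
    n = len(returns)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(returns) / n
    std = math.sqrt(sum((r - mean) ** 2 for r in returns) / n)
    if std == 0:
        raise ValueError("series has zero variance")
    counts = [0] * (len(BAND_EDGES) - 1)
    for r in returns:
        # Assumption: distance from the sample mean, in standard deviations.
        z = abs(r - mean) / std
        for i in range(len(counts)):
            if BAND_EDGES[i] <= z < BAND_EDGES[i + 1]:
                counts[i] += 1
                break
    return [100.0 * c / n for c in counts]

# Hypothetical returns, for illustration only.
sample = [-0.03, -0.01, 0.0, 0.0, 0.01, 0.03]
shares = band_shares(sample)
```

Comparing such shares with the normal reference values shows where a series carries more or less mass than a normal distribution would, for example in the 3+ band.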
Summary
It has been shown that
• the distributions of returns of affect content in news and blogs are not normal
• the returns, i.e. the changes, of affect content in German-language
◦ news are higher than in the DAX
◦ blogs are higher than in German-language news
• the volatility, i.e. the fluctuation, of affect content in German-language
◦ news is much higher than in the DAX
◦ blogs is much higher than in German-language news