QAP Analysis of Company Co-Mention Network
S. P. Sidorov, A. R. Faizliev, V. A. Balash, A. A. Gudkov,
A. Z. Chekmareva, M. Levshunov, S. V. Mironov
Saratov State University, Saratov, Russian Federation
WAW-2018
The work was supported by grant RFBR (project 16-01-00507)
Summary
1 Introduction to News Analytics
2 Data Description
3 Empirical Results
1. Introduction to News Analytics
Changes in the World of Finance
1973: International Monetary Market, 24/7.
1973-2016: the volume of foreign exchange trading grew more than 100-fold.
1992: electronic trading, CME and Globex.
2001: algorithmic trading, IBM.
2001: news analytics tools.
News Analytics
The most well-known providers of news analytics and data are:
RavenPack (http://www.ravenpack.com/).
Media Sentiment (www.mediasentiment.com/).
Thomson Reuters News Analytics (http://thomsonreuters.com).
News Analytics
News analytics can be described as a measurement of the following quantitative and qualitative characteristics of news:
1 The nature of news;
2 The impact of news;
3 The relevance;
4 The novelty.
News data can be obtained from various sources:
News agencies
Pre-news
Social media (blogs, social networks, etc.)
News:
expected;
unexpected.
Extract from RavenPack data sheet
Figure: Extract from RavenPack data
2. Company Co-Mention Network
Motivation
The company co-mention network is an undirected weighted graph.
Two companies are considered related if a publicly available news item mentioning both of them has been published.
Each company is a node, and each news item that mentions two companies contributes a link between them.
Questions
In our research we would like to find answers to the following questions.
1 What type of degree distribution does the company co-mention network exhibit? What functional form does the clustering-degree relation have for the network?
Many networks have similar degree distributions (Albert, 2002), (Dorogovtsev, 2002b), (Newman, 2003), (Albert, 2005), (Boccaletti, 2006). It turned out that most real networks have scale-free degree distributions (Albert, 2002).
2 Does the analysis of the company co-mention network identify groups? Hypothesis: the network analysis of the company co-mention network reproduces the sector structure of the economy. Each group or cluster of companies might be associated with a particular sector, for example, banking, communication or oil production.
3 Do the frequency, degree centrality, closeness centrality, betweenness centrality and eigenvector centrality of companies vary within clusters?
Within each group (or cluster) of companies, our research aims to find those that are more central than others. It is assumed that the more central a company is within a network, the more influential and important news about it must be.
Our research questions are close to those of (Kim and Barnett, 2008), which investigated whether the patterns of author co-citation can describe the structure of the field of communication.
Data
Company co-mention analysis can be carried out as follows:
1 full texts of all economic and financial news published during a period of time are collected;
2 for the full text of each news item, a list of the companies it mentions is gathered;
3 over the whole set of available news, a weighted co-mention count is determined for each pair of co-mentioned companies;
4 the weighted co-mention counts are accumulated into a symmetric co-mention matrix;
5 the co-mention matrix is analysed statistically, and the results are visualized and interpreted.
Methodology
Table: News items and the companies they mention
c1 c2 c3 c4 c5 c6 c7
N1 + + +
N2 + +
N3 + + +
N4 + +
N5 + +
N6 + +
N7 + + +
N8 + +
Methodology
Table: The matrix of weights
c1 c2 c3 c4 c5 c6 c7
c1 0 3 1 0 0 0 0
c2 3 0 1 0 0 0 0
c3 1 1 0 1 0 0 0
c4 0 0 1 0 2 3 0
c5 0 0 0 2 0 2 0
c6 0 0 0 3 2 0 1
c7 0 0 0 0 0 1 0
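The step from the news table to the matrix of weights can be sketched as follows. This is a minimal illustration, not the authors' implementation. The transcript preserves only how many companies each news item mentions, not which ones, so the assignment in `NEWS` is a hypothetical one chosen to agree with those counts and with the matrix of weights. The function name `co_mention_matrix` is also illustrative, and each co-mention is assumed to add a weight of 1.

```python
from itertools import combinations

# Hypothetical news-to-company assignment (an assumption: the column
# positions of the original table are not recoverable). It matches the
# per-news counts (3, 2, 3, 2, 2, 2, 3, 2) and the matrix of weights.
NEWS = {
    "N1": ["c1", "c2", "c3"],
    "N2": ["c1", "c2"],
    "N3": ["c4", "c5", "c6"],
    "N4": ["c1", "c2"],
    "N5": ["c3", "c4"],
    "N6": ["c4", "c6"],
    "N7": ["c4", "c5", "c6"],
    "N8": ["c6", "c7"],
}
COMPANIES = ["c1", "c2", "c3", "c4", "c5", "c6", "c7"]


def co_mention_matrix(news, companies):
    """Return a symmetric matrix of co-mention counts (unit weight per news item)."""
    index = {name: i for i, name in enumerate(companies)}
    size = len(companies)
    weights = [[0] * size for _ in range(size)]
    for mentioned in news.values():
        # Every unordered pair of distinct companies in one news item adds 1.
        for a, b in combinations(sorted(set(mentioned)), 2):
            i, j = index[a], index[b]
            weights[i][j] += 1
            weights[j][i] += 1
    return weights


W = co_mention_matrix(NEWS, COMPANIES)
for name, row in zip(COMPANIES, W):
    print(name, *row)
```

With this assignment the printed rows coincide with the matrix of weights in the table; other assignments with the same counts could do so as well.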
Methodology
Figure: Weighted co-mention graph for the example (nodes 1 to 7; edge weights 3, 1, 1, 1, 2, 2, 3, 1)
3. Data Description
Data
From February 1, 2015 to February 28, 2015 (i.e. 20 trading days)
Table: Descriptive statistics
δ, day          1
n               28
Sum             234736
Mean            8383.43
Minimum         262
Maximum         15069
St. deviation   5415.21
Median          10653
Daily dynamics of the number of news items
Figure: Daily dynamics of the news flow in February 2015 (x-axis: days; y-axis: the amount of news items per day, ×10^4)
Dynamics of news flow intensity (February 2, 2015)
Figure: Dynamics of news flow intensity (February 2, 2015) with a 1-min window (x-axis: min)
Empirical Results. Sectors of the Economy
All the companies we considered were divided into four sectors of the economy as follows:
Products (2007 companies): Consumer discretionary, Consumer staples, Industrials
Resources (1398 companies): Energy, Raw materials
Services (2375 companies): Financials, Health care, Telecommunications, Utilities
IT sector (548 companies): Information technology
Empirical Results. Key company analysis by sector of the economy
Table: Companies with the highest frequency for the four sectors of the economy. Sector: Products
Table: Companies with the highest frequency for the four sectors of the economy. Sector: Resources
Table: Companies with the highest frequency for the four sectors of the economy. Sector: Services
Table: Companies with the highest frequency for the four sectors of the economy. Sector: IT sector
Empirical Results. Sectors of the Economy
Table: Degree distribution for four sectors of the economy
Subgraph                 Degree exponent γ   R²
Products                 1.06                0.82
Resources                0.91                0.79
Services                 1.14                0.84
Information Technology   0.64                0.65
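The slides report a degree exponent γ and an R² for each subgraph but do not state how they were estimated. A minimal sketch, assuming an ordinary least-squares line fitted to the degree distribution on log-log axes, which is an assumption rather than the authors' confirmed method, could look like this. The function name `fit_power_law` and the degree sequence are illustrative only.

```python
import math
from collections import Counter


def fit_power_law(degrees):
    """Fit P(k) ~ k^(-gamma) by least squares on log-log axes.

    Returns (gamma, r_squared). Zero degrees are skipped because log(0)
    is undefined.
    """
    counts = Counter(k for k in degrees if k > 0)
    total = sum(counts.values())
    xs = [math.log(k) for k in counts]
    ys = [math.log(c / total) for c in counts.values()]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    syy = sum((y - mean_y) ** 2 for y in ys)
    slope = sxy / sxx
    r_squared = (sxy * sxy) / (sxx * syy)
    return -slope, r_squared


# Synthetic degree sequence (illustrative only): the number of nodes with
# degree k is proportional to 1/k^2, so gamma should come out close to 2.
degrees = [k for k in (1, 2, 4, 8, 16) for _ in range(256 // (k * k))]
gamma, r2 = fit_power_law(degrees)
print(f"gamma = {gamma:.2f}, R^2 = {r2:.2f}")
```

Least-squares fits on log-log axes are simple but can be biased on real data; maximum-likelihood estimators are often preferred when the exponent itself is the quantity of interest.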
Empirical Results. Sectors of the Economy
Table: Local clustering coefficient for four sectors of the economy
Subgraph                 Exponent β   R²
Products                 0.74         0.58
Resources                0.44         0.46
Services                 0.70         0.61
Information Technology   0.61         0.54
Empirical Results. Stock Exchanges
We considered separately the companies trading on different stock exchanges.
Table: Edges and co-mentions by SE
Stock exchange   Edges, %   Co-mentions, %
N                36.15      38.23
O                21.54      20.02
L                7.21       7.85
TO               8.25       7.58
DE               3.79       4.03
PA               3.76       3.99
MI               1.97       2.16
HK               0.66       0.63
T                0.76       0.75
AS               1.41       1.39
MC               1.01       1.02
VX               1.1        1.21
NS               0.7        0.4
…                …          …
Other            6.45       10.73
Empirical Results. Stock exchanges (SE) contiguity
Table: Ratio of edges that connected vertices by SE, %
SE   N      O      L      TO     DE     PA     MI     HK     T      Total
N    59.1   32.1   1.4    1.8    0.3    0.4    0.2    0.2    0.5    96.0
O    53.8   38.3   1.2    1.8    0.2    0.3    0.2    0.2    0.4    96.3
L    7.2    3.5    31.7   0.9    8.4    10.9   3.5    0.4    0.4    66.9
TO   7.8    4.7    0.8    80.6   0.1    0.1    0.1    0.1    0.2    94.6
DE   2.8    1.2    16.0   0.3    40.6   9.2    2.8    0.1    1.1    74.2
PA   3.7    1.6    20.9   0.3    9.3    25.5   3.9    0.3    0.4    66.0
MI   3.8    2.0    12.7   0.5    5.5    7.5    47.7   0.5    0.3    80.5
HK   11.8   6.3    4.0    1.2    0.7    1.7    1.3    42.0   2.0    71.0
T    21.7   10.0   3.6    2.3    5.3    2.2    0.9    1.7    31.7   79.4
Empirical Results. Stock exchanges (SE) contiguity
Table: Ratio of co-mentions (sum of edge weights) by SE, %
SE   N      O      L      TO     DE     PA     MI     HK     T      Total
N    62.8   29.4   1.5    1.5    0.3    0.3    0.2    0.1    0.5    96.7
O    56.1   37.5   1.0    1.4    0.2    0.2    0.2    0.2    0.3    97.0
L    7.4    2.6    35.8   0.9    7.8    10.3   3.1    0.5    0.3    68.9
TO   7.6    3.8    1.0    82.5   0.1    0.1    0.1    0.1    0.2    95.6
DE   3.1    0.8    15.1   0.2    41.9   9.7    3.2    0.1    1.0    75.1
PA   3.0    0.9    20.3   0.3    9.8    27.5   4.5    0.2    0.4    66.9
MI   3.5    1.5    11.4   0.3    6.0    8.3    50.4   0.3    0.7    82.5
HK   8.6    4.8    6.8    1.0    0.5    1.5    1.0    34.4   1.3    59.9
T    25.2   9.1    2.7    2.4    5.2    2.2    2.0    1.1    35.4   85.3
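A contiguity table of this kind can be sketched as a row-normalised breakdown of edge weight by the exchange of each endpoint. The exact counting convention behind the tables is not given in the slides, so the version below is an assumption: every vertex contributes the weights of all its incident edges, so edges inside a group are counted from both endpoints. The function name `contiguity_shares`, the labels `A` and `B`, and the use of the small example matrix of weights are illustrative.

```python
from collections import defaultdict

# The 7-company matrix of weights from the Methodology example.
W = [
    [0, 3, 1, 0, 0, 0, 0],
    [3, 0, 1, 0, 0, 0, 0],
    [1, 1, 0, 1, 0, 0, 0],
    [0, 0, 1, 0, 2, 3, 0],
    [0, 0, 0, 2, 0, 2, 0],
    [0, 0, 0, 3, 2, 0, 1],
    [0, 0, 0, 0, 0, 1, 0],
]
# Hypothetical exchange labels for c1..c7 (not taken from the slides).
LABELS = ["A", "A", "A", "B", "B", "B", "B"]


def contiguity_shares(weights, labels):
    """For each group, return the % of its incident edge weight going to each group."""
    totals = defaultdict(lambda: defaultdict(float))
    for i, row in enumerate(weights):
        for j, w in enumerate(row):
            totals[labels[i]][labels[j]] += w
    shares = {}
    for group, by_group in totals.items():
        row_total = sum(by_group.values())
        if row_total > 0:  # skip groups whose vertices are all isolated
            shares[group] = {
                other: 100.0 * w / row_total for other, w in by_group.items()
            }
    return shares


shares = contiguity_shares(W, LABELS)
for group, row in shares.items():
    print(group, {other: round(value, 1) for other, value in row.items()})
```

Passing a 0/1 adjacency matrix instead of the weights gives the edge-based variant of the same breakdown.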
Empirical Results. Stock exchanges
Specifically, we analyzed companies traded on the London SE (488 companies), the New York SE (1,715 companies) and the Tokyo SE (586 companies).
Table: Degree distribution for three stock exchanges
Subgraph      Degree exponent γ   R²
London SE     1.10                0.81
New York SE   1.18                0.84
Tokyo SE      1.18                0.44
Empirical Results. Stock exchanges
Specifically, we analyzed companies traded on the London SE (488 companies), the New York SE (1,715 companies) and the Tokyo SE (586 companies).
Table: Local clustering coefficient for three stock exchanges
Subgraph      Exponent β   R²
London SE     0.68         0.55
New York SE   0.68         0.74
Tokyo SE      0.78         0.74
Empirical Results. Degree distribution
Figure: The degree distribution of the co-mention network of New York companies
Networked Map of Companies
Figure: Networked Map of Companies
QAP Correlation and Regression Analysis
We use QAP correlation analysis to identify correlations between the co-mention network and companies’ sector affiliation, and between the co-mention network and stock exchange affiliation.
Table: Quadratic Assignment Procedure. Correlations and p-values (in parentheses), 500 replications
                      Sector affiliation network   Stock exchange network
Co-mentions network   r = 0.053 (0.000)            r = 0.020 (0.000)
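The QAP correlation test can be sketched as follows: compute the correlation between the off-diagonal entries of two matrices, then repeatedly relabel the vertices of one matrix (permuting its rows and columns together) to build a reference distribution. This is a minimal sketch, not the software used for the table. The function names, the one-sided p-value, the fixed seed and the toy inputs (the example matrix of weights and a hypothetical two-group affiliation) are assumptions made for illustration.

```python
import random


def offdiag_corr(a, b):
    """Pearson correlation over the off-diagonal entries of two square matrices."""
    n = len(a)
    xs = [a[i][j] for i in range(n) for j in range(n) if i != j]
    ys = [b[i][j] for i in range(n) for j in range(n) if i != j]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def qap_correlation(a, b, replications=500, seed=0):
    """Return (observed r, one-sided permutation p-value)."""
    rng = random.Random(seed)
    n = len(a)
    observed = offdiag_corr(a, b)
    hits = 0
    for _ in range(replications):
        perm = list(range(n))
        rng.shuffle(perm)
        # Relabel vertices: rows and columns are permuted together.
        permuted = [[a[perm[i]][perm[j]] for j in range(n)] for i in range(n)]
        if offdiag_corr(permuted, b) >= observed:
            hits += 1
    return observed, hits / replications


# Toy inputs: the example matrix of weights and a hypothetical affiliation
# matrix S, where S[i][j] = 1 if companies i and j share a (made-up) group.
W = [
    [0, 3, 1, 0, 0, 0, 0],
    [3, 0, 1, 0, 0, 0, 0],
    [1, 1, 0, 1, 0, 0, 0],
    [0, 0, 1, 0, 2, 3, 0],
    [0, 0, 0, 2, 0, 2, 0],
    [0, 0, 0, 3, 2, 0, 1],
    [0, 0, 0, 0, 0, 1, 0],
]
groups = ["A", "A", "A", "B", "B", "B", "B"]
S = [[int(i != j and groups[i] == groups[j]) for j in range(7)] for i in range(7)]

r, p = qap_correlation(W, S)
print(f"r = {r:.3f}, p = {p:.3f}")
```

Permuting rows and columns jointly keeps the dependence structure within each matrix intact, which is why the procedure is used instead of an ordinary significance test on the cell-level correlation.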
QAP Correlation and Regression Analysis
Figure: Estimated density after QAP replication for the sector affiliation network
QAP Correlation and Regression Analysis
Table: QAP regression analysis
Coefficient                  Estimate   P-value
Intercept                    0.003      0.000
Sector affiliation           0.015      0.000
Stock exchange affiliation   0.033      0.00
QAP Correlation and Regression Analysis
Autocorrelation for co-mention matrix time series
Table: Correlation coefficients for subgraphs of the 200 largest companies
Month      January   February   March   April   May
January    1.00
February   0.65      1.00
March      0.53      0.62       1.00
April      0.67      0.58       0.52    1.00
May        0.42      0.49       0.55    0.61    1.00
QAP Correlation and Regression Analysis
QAP Logit model
$$ y_{ij}(t+1) = \frac{e^{\beta_0 + \beta_1 C_{ij}(t) + \beta_2 \mathrm{Same}_{ij}}}{1 + e^{\beta_0 + \beta_1 C_{ij}(t) + \beta_2 \mathrm{Same}_{ij}}} $$
C_{ij}(t): number of co-mentions at time t; Same_{ij}: same stock exchange.
QAP Correlation and Regression Analysis
Table: QAP Logit model (subgraphs of the 500 largest companies)
Coefficient             Estimate    P-value
Intercept               -2.378      0.000
Co-mentions (January)   0.198       0.000
Same Stock Exchange     0.5983507   0.00
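To see what these estimates imply, the logistic function can be evaluated for a given pair of companies. The sketch below plugs the three coefficients from the table into the standard logit form; reading the result as the fitted probability of the modelled outcome for the pair is an assumption, since the slides do not spell out the outcome variable, and the function name `fitted_probability` is illustrative.

```python
import math

# Estimates copied from the QAP Logit model table.
INTERCEPT = -2.378
BETA_CO_MENTIONS = 0.198
BETA_SAME_SE = 0.5983507


def fitted_probability(co_mentions, same_exchange):
    """Logistic transform of the linear predictor for one pair of companies."""
    linear = (
        INTERCEPT
        + BETA_CO_MENTIONS * co_mentions
        + BETA_SAME_SE * (1 if same_exchange else 0)
    )
    return 1.0 / (1.0 + math.exp(-linear))


# Compare pairs with and without a shared exchange at several co-mention counts.
for c in (0, 5, 10):
    print(
        c,
        round(fitted_probability(c, False), 3),
        round(fitted_probability(c, True), 3),
    )

# Multiplicative effect of a shared exchange on the odds.
odds_ratio_same_se = math.exp(BETA_SAME_SE)
print(round(odds_ratio_same_se, 2))
```

Because both slope estimates are positive, the fitted value rises with the January co-mention count and is higher for pairs listed on the same exchange.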
THANKS