1 Applications of news analytics in finance: A review


Leela Mitra and Gautam Mitra

ABSTRACT

A review of news analytics and its applications in finance is given in this chapter. In particular, we review the multiple facets of current research and some of the major applications. It is widely recognized that news plays a key role in financial markets. The sources and volumes of news continue to grow. New technologies that enable automatic or semi-automatic news collection, extraction, aggregation and categorization are emerging. Further, machine-learning techniques are used to process the textual input of news stories to determine quantitative sentiment scores. We consider the various types of news available and how these are processed to form inputs to financial models. We report applications of news for prediction of abnormal returns, for trading strategies and for diagnostic applications, as well as the use of news for risk control.

1.1 INTRODUCTION

News (north, east, west, south) streams in from all parts of the globe. There is a strong yet complex relationship between market sentiment and news. The arrival of news continually updates an investor’s understanding and knowledge of the market and influences investor sentiment. There is a growing body of research literature that argues media influences investor sentiment, hence asset prices, asset price volatility and risk (Tetlock, 2007; Da, Engleberg, and Gao, 2009; Barber and Odean, this volume, Chapter 7; diBartolomeo and Warrick, 2005; Mitra, Mitra, and diBartolomeo, 2009; Dzielinski, Rieger, and Talpsepp, this volume, Chapter 11). Traders and other market participants digest news rapidly, revising and rebalancing their asset positions accordingly. Most traders have access to newswires at their desks. As markets react rapidly to news, effective models which incorporate news data are highly sought after. This is not only for trading and fund management, but also for risk control. Major news events can have a significant impact on the market environment and investor sentiment, resulting in rapid changes to the risk structure and risk characteristics of traded assets. Though the relevance of news is widely acknowledged, how to incorporate this effectively, in quantitative models and more generally within the investment decision-making process, is a very open question.

The Handbook of News Analytics in Finance, edited by L. Mitra and G. Mitra. © 2011 John Wiley & Sons.

In considering how news impacts markets, Barber and Odean (this volume, Chapter 7) note “significant news will often affect investors’ beliefs and portfolio goals heterogeneously, resulting in more investors trading than is usual” (high trading volume). It is well known that volume increases on days with information releases (Bamber, Barron, and Stober, 1997; Karpoff, 1987; Busse and Green, 2004). Important news frequently results in large positive or negative returns. Ryan and Taffler (2002) find that, for large firms, a significant portion (65%) of large price changes and volume movements can be linked to publicly available news releases. Sometimes investors may find it difficult to interpret news, resulting in high trading volume without significant price change.

Financial news can be split into regular synchronous announcements (expected news) and event-driven asynchronous announcements (unexpected news). Textual news is frequently unstructured, qualitative data. It is characterized as being non-numeric and hard to quantify. Unlike analysis based on quantified market data, textual news data contain information about the effect of an event and the possible causes of an event. It is natural to expect that the application of these news data will lead to improved analysis (such as predictions of returns and volatility). However, extracting this information in a form that can be applied to the investment decision-making process is extremely challenging.

News has always been a key source of investment information. The volumes and sources of news are growing rapidly. In increasingly competitive markets investors and traders need to select and analyse the relevant news, from the vast amounts available to them, in order to make “good” and timely decisions. A human’s (or even a group of humans’) ability to process this news is limited. As computational capacity grows, technologies are emerging which allow us to extract, aggregate and categorize large volumes of news effectively. Such technology might be applied for quantitative model construction for both high-frequency trading and low-frequency fund rebalancing. Automated news analysis can form a key component driving algorithmic trading desks’ strategies and execution, and the traders who use this technology can shorten the time it takes them to react to breaking stories (that is, reduce latency times).

News Analytics (NA) technology can also be used to aid traditional non-quantitative fund managers in monitoring the market sentiment for particular stocks, companies, brands and sectors. These technologies are deployed to automate filtering, monitoring and aggregation of news. They free managers from the minutiae of repetitive analysis, so that they are able to better target their reading and research, and they reduce the burden of routine monitoring for fundamental managers.

The basic idea behind these NA technologies is to automate human thinking and reasoning. Traders, speculators and private investors anticipate the direction of asset returns as well as the size and the level of uncertainty (volatility) before making an investment decision. They carefully read recent economic and financial news to gain a picture of the current situation. Using their knowledge of how markets behaved in the past under different situations, people will implicitly match the current situation with those situations in the past most similar to the current one. News analytics seeks to introduce technology to automate or semi-automate this approach. By automating the judgement process, the human decision maker can act on a larger, hence more diversified, collection of assets. These decisions are also taken more promptly (reducing latency). Automation or semi-automation of the human judgement process widens the limits of the investment process. Leinweber (2009) refers to this process as intelligence amplification (IA).

As shown in Figure 1.1, news data are an additional source of information that can be harnessed to enhance (traditional) investment analysis. Yet it is important to recognize that NA in finance is a multi-disciplinary field which draws on financial economics, financial engineering, behavioural finance and artificial intelligence (in particular, natural language processing). Expertise in these respective areas needs to be combined effectively for the development of successful applications in this area. Sophisticated machine-learning algorithms applied without an understanding of the structure and dynamics of financial markets and the use of realistic trading assumptions can lead to applications with little commercial use (see Mittermayer and Knolmayer, 2006).

Figure 1.1. A simple representation of news analytics in financial decision making.

The remainder of the chapter is organized as follows. In Section 1.2 we consider the different sources of news and information flows which can be applied for updating (quantitative) investor beliefs and knowledge. Section 1.2.2 covers several aspects of pre-analysis to be considered when using news in trading systems and quantitative models. In Section 1.3 we consider how qualitative text can be converted to quantified metrics which can form inputs to quantitative models. In Section 1.4 we present news-based models; in particular, we consider the computational architecture (Section 1.4.1), applications for trading and fund management (Section 1.4.2) and applications for risk management (Section 1.4.3). In Section 1.4.4 desirable industry applications are outlined. The summary conclusions are presented in Section 1.5.

1.2 NEWS DATA

1.2.1 Data sources

In this section we consider the different sources of news and information flows which can be applied for updating (quantitative) investor beliefs and knowledge. Leinweber (2009) distinguishes four broad classifications of news (informational flows).

1. News: This refers to mainstream media and comprises the news stories produced by reputable sources. These are broadcast via newspapers, radio and television. They are also delivered to traders’ desks on newswire services. Online versions of newspapers are also progressively growing in volume and number.

2. Pre-news: This refers to the source data that reporters research before they write news articles. It comes from primary information sources such as Securities and Exchange Commission reports and filings, court documents and government agencies. It also includes scheduled announcements such as macroeconomic news, industry statistics, company earnings reports and other corporate news.

3. Rumours: These are blogs and websites that broadcast “news” and are less reputable than news and pre-news sources. The quality of these varies significantly. Some may be blogs associated with highly reputable news providers and reporters (for example, the blog of BBC’s Robert Peston). At the other end of the scale some blogs may lack any substance and may be entirely fuelled by rumour.

4. Social media: These websites fall at the lowest end of the reputation scale. Barriers to entry are extremely low and the ability to publish “information” easy. These can be dangerously inaccurate sources of information. However, if carefully applied (with consideration of human behaviour and agendas) there may be some value to be gleaned from these. At a minimum they may help us identify future volatility.

Individual investors pay relatively more attention to the latter two sources of news than institutional investors (Dzielinski, Rieger, and Talpsepp, this volume, Chapter 11; Das and Chen, 2007). Information from the web may be less reliable than mainstream news. However, there may be “collective intelligence” information to be gleaned. That is, if a large group of people have no ulterior motives, then their collective opinion may be useful (Leinweber, 2009, Ch. 10). The SEC does monitor message boards, so there is some, though perhaps far from perfect, checking of information published. This should constrain message board posters’ actions to some extent.

There are services which facilitate retrieval of news data from the web. For example, Google Trends is a free but limited service which provides an historical weekly time-series of the popularity of any given search term. This search engine reports the proportion of positive, negative and neutral stories returned for a given search.

The Securities and Exchange Commission (SEC) provides a lot of useful pre-news. It covers all publicly traded companies (in the US). The Electronic Data Gathering, Analysis and Retrieval (EDGAR) system was introduced in 1996, giving basic access to filings via the web (see http://www.sec.gov/edgar.shtml). Premium access gave tools for analysis of filing information and earlier, priority access to the data. In 2002 filing information was released to the public in real time. Filings remain unstructured text files without semantic web and XML output, though the SEC are in the process of upgrading their information dissemination. High-end resellers electronically dissect and sell on relevant component parts of filings. Managers are obliged to disclose a significant amount of information about a company via SEC filings. This information is naturally valuable to investors. Leinweber introduces the term “molecular search: the idea of looking for patterns and changes in groups of documents.” Such information is scrutinized by researchers and analysts to identify unusual corporate activity and potential investment opportunities. However, mining the large volume of filings to find relationships is challenging. Engleberg and Sankaraguruswamy (2007) note the EDGAR database has 605 different forms and there were 4,249,586 filings between 1994 and 2006. Connotate provides services which allow customized automated collection of SEC filing information for customers (fund managers and traders). Engleberg and Sankaraguruswamy (2007) consider how to use a web crawler to mine SEC filing information through EDGAR.

As stated in Section 1.1, financial news can be split into regular synchronous announcements (scheduled or expected news) and event-driven asynchronous announcements (unscheduled or unexpected news). Mainstream news, rumours, and social media normally arrive asynchronously in an unstructured textual form. A substantial portion of pre-news arrives at pre-scheduled times and generally in a structured form.

Scheduled (news) announcements often have a well-defined numerical and textual content and may be classified as structured data. These include macroeconomic announcements and earnings announcements. Macroeconomic news, particularly economic indicators from the major economies, is widely used in automated trading. It has an impact in the largest and most liquid markets, such as foreign exchange, government debt and futures markets. Firms often execute large and rapid trading strategies. These news events are normally well documented, thus thorough backtesting of strategies is feasible. Since indicators are released on a precise schedule, market participants can be well prepared to deal with them. These strategies often lead to firms fighting to be first to the market; speed and accuracy are the major determinants of success. However, the technology requirements to capitalize on events are substantial. Content publishers often specialize in a few data items and hence trading firms often multisource their data. Thomson Reuters, Dow Jones, and Market News International are a few leading content service providers in this space.

Earnings are a key driving force behind stock prices. Scheduled earnings announcement information is also widely anticipated and used within trading strategies. The pace of response to announcements has accelerated greatly in recent years (see Leinweber, 2009, pp. 104–105). Wall Street Horizon and Media Sentiment (see Munz, 2010) provide services in this space. These technologies allow traders to respond quickly and effectively to earnings announcements.

Event-driven asynchronous news streams in unexpectedly over time. These news items usually arrive as textual, unstructured, qualitative data. They are characterized as being non-numeric and difficult to process quickly and quantitatively. Unlike analysis based on quantified market data, textual news data contain information about the effect of an event and the possible causes of an event. However, to be applied in trading systems and quantitative models they need to be converted to a quantitative input time-series. This could be a simple binary series where the occurrence of a particular event or the publication of a news article about a particular topic is indicated by a one and the absence of the event by a zero. Alternatively, we can try to quantify other aspects of news over time. For example, we could measure news flow (volume of news) or we could determine scores (measures) based on the language sentiment of text or determine scores (measures) based on the market’s response to particular language.
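The two simplest conversions described above, a binary event series and a news-flow count, can be sketched as follows. This is a minimal illustration only: the story records, the topic tags and the `daily_series` helper are hypothetical and do not correspond to any vendor's data format.

```python
from collections import Counter
from datetime import date, timedelta

# Hypothetical tagged stories: (publication date, topic tag).
stories = [
    (date(2010, 3, 1), "earnings"),
    (date(2010, 3, 1), "merger"),
    (date(2010, 3, 3), "earnings"),
    (date(2010, 3, 3), "earnings"),
]


def daily_series(stories, start, end, topic):
    """Return (days, binary indicator for `topic`, total news flow) per calendar day."""
    all_counts = Counter(day for day, _ in stories)
    topic_days = {day for day, tag in stories if tag == topic}
    days = [start + timedelta(days=n) for n in range((end - start).days + 1)]
    # 1 if at least one story on the topic was published that day, else 0.
    indicator = [1 if day in topic_days else 0 for day in days]
    # News flow: number of stories on any topic published that day.
    flow = [all_counts.get(day, 0) for day in days]
    return days, indicator, flow


days, indicator, flow = daily_series(
    stories, date(2010, 3, 1), date(2010, 3, 4), "earnings"
)
print(indicator)  # binary event series for "earnings"
print(flow)       # news flow per day
```

Either series can then be aligned with returns or volatility on the same calendar; sentiment-based scores would replace the zero/one values with a signed magnitude.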

It is important to have access to historical data for effective model development and backtesting. Commercial news data vendors normally provide large historical archives for this purpose. The details of historic news data for global equities provided by RavenPack and Thomson Reuters NewsScope are summarized in Section 1.A (the appendix on p. 25). In the appendix we have summarized some essential information taken from the RavenPack News Analytics - Dow Jones Edition (RavenPack, 2010) and Thomson Reuters NewsScope Sentiment Engine (Thomson Reuters, 2009).

1.2.2 Pre-analysis of news data

Collecting, cleaning and analysing news data is challenging. Major news providers collect and translate headlines and text from a wide range of worldwide sources. For example, the Factiva database provided by Dow Jones holds data from 400 sources, including electronic newswires, newspapers and magazines.

We note there are differences in the volume of news data available for different companies. Larger companies (with more liquid stock) tend to have higher news coverage/news flow. Moniz, Brar, and Davis (2009) observe that the top quintile accounts for 40% of all news articles and the bottom quintile for only 5%. Cahan, Jussa, and Luo (2009) also find news coverage is higher for larger cap companies (see Figure 1.2).

Figure 1.2. Number of news items vs. log market capitalization (taken from Cahan, Jussa, and Luo, 2009).

Classification of news items is important. Major newswire providers tag incoming news stories. A reporter entering a story on to the news systems will often manually tag it with relevant codes. Further, machine-learning algorithms may also be applied to identify relevant tags for a story. These tags turn the unstructured stories into a basic machine-readable form. The tags are often stored in XML format. They reveal the story’s topic areas and other important metadata. For example, they may include information about which company a story is about. Tagged stories held by major newswire providers are also accurately time-stamped. The SEC is pushing to have companies file their reports using XBRL (eXtensible Business Reporting Language). Rich Site Summary (RSS) feeds (an XML format for web content) allow customized, automated analysis of news events from multiple online sources.

Tagged news stories provide us with hundreds of different types of events. To use these stories effectively, we need to distinguish what types of news are relevant for a given model (application). Further, the market may react differently to different types of news. For example, Moniz, Brar, and Davis (2009) find the market seems to react more strongly to corporate earnings-related news than corporate strategic news. They postulate that it is harder to quantify and incorporate strategic news into valuation models, hence it is harder for the market to react appropriately to such news.

Machine-readable XML news feeds can turn news events into exploitable trading signals since they can be used relatively easily to backtest and execute event study-based strategies (see Kothari and Warner, 2005; Campbell, Lo, and MacKinlay, 1996 for in-depth reviews of event study methodology). Leinweber (this volume, Chapter 6) uses Thomson Reuters tagged news data to investigate several news-based event strategies. Elementized news feeds mean the variety of event data available is increasing significantly. News providers also provide archives of historic tagged news which can be used for backtesting and strategy validation. News event algorithmic trading is reported to be gaining acceptance in industry (Schmerken, 2006).

To apply news effectively in asset management and trading decisions we need to be able to identify news which is both relevant and current. This is particularly true for intraday applications, where algorithms need to respond quickly to accurate information. We need to be able to identify an “information event”; that is, we need to be able to distinguish those stories which are reporting on old news (previously reported stories) from genuinely “new” news. As would be expected, Moniz, Brar, and Davis (2009) find markets react strongly when “new” news is released.

Tetlock, Saar-Tsechansky, and Macskassy (2008) undertake an event study which illustrates the impact of news on cumulative abnormal returns (CARs). They use 350,000 news stories about S&P 500 companies appearing in the Wall Street Journal and Dow Jones News Service from 1984 to 2004. Each story’s (language) sentiment is determined using the General Inquirer and a story is classified as either positive or negative. The CARs for each story classification type relative to the date of the news release are shown in Figure 1.3. There seems to be a connection between a news story’s release and CARs. However, there also seems to be some “information leakage” since CARs seem to react before the date of the story’s release. Leinweber (2009) considers that this may be due to the inclusion of me-too stories that refer back to an original release of “new” news. This highlights that, though textual news may have an obvious connection with returns, it needs to be processed carefully and effectively.

In order to deal with potential noise, Reuters identifies relevance scores for different news articles. Such scores measure how pertinent an article is to a particular company and help prevent erroneous links between stories and entities. In particular, after filtering by relevance as measured by RavenPack, Hafez (2009a) obtains a 3× improvement in correlations between a calculated market sentiment measure and out-of-sample returns. Both Reuters and RavenPack include measures for article novelty (uniqueness) which determine repetition among articles and how many similar articles there are for a particular company. In addition, RavenPack (2010) measures event novelty based on more than 200 event categories that are automatically detected in the news. This allows the user to consider not only the first instance of a company event but also to measure how much media attention it receives.
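The effect of relevance and novelty filtering can be illustrated with a small sketch. The vendors' scoring methods are not described here, so everything below is an assumption made for illustration: the `Story` fields, the threshold values and the use of headline word overlap (Jaccard similarity) as a stand-in for a novelty measure are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Story:
    company: str
    headline: str
    relevance: float  # hypothetical score in [0, 1]
    sentiment: float  # hypothetical score in [-1, 1]


def tokens(text):
    """Lower-cased set of words in a headline."""
    return set(text.lower().split())


def jaccard(a, b):
    """Word-overlap similarity between two token sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def filter_stories(stories, min_relevance=0.8, max_similarity=0.6):
    """Keep relevant stories that are not near-repeats of an earlier kept story.

    Assumes `stories` is already in chronological order.
    """
    kept = []
    for story in stories:
        if story.relevance < min_relevance:
            continue  # story only mentions the company in passing
        is_repeat = any(
            earlier.company == story.company
            and jaccard(tokens(earlier.headline), tokens(story.headline)) > max_similarity
            for earlier in kept
        )
        if not is_repeat:
            kept.append(story)
    return kept


stories = [
    Story("ACME", "Acme issues profit warning", 0.90, -0.7),
    Story("ACME", "Acme issues profit warning today", 0.95, -0.6),  # me-too story
    Story("ACME", "Sector roundup mentions Acme", 0.20, 0.1),       # low relevance
    Story("ACME", "Acme appoints new chief executive", 0.90, 0.3),
]
kept = filter_stories(stories)
for story in kept:
    print(story.headline)
```

Only the first report of the profit warning and the unrelated executive appointment survive; the repeated story and the low-relevance roundup are dropped before any sentiment aggregation.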

Several studies also report strong seasonality in news flow at hourly, daily and weekly frequencies (Lo, 2008; Hafez, 2009b; Moniz, Brar, and Davis, 2009). A valuable aspect of pre-analysis of news data is to distinguish periods of unexpected news flow levels from periods of variation due to seasonality, in order to identify periods where significant levels of information are flowing into the market. Hafez (2009b) investigates the seasonality patterns of news arrival. Figures 1.4 and 1.5 show the intraday and intraweek patterns, respectively. He notes that larger volumes of news flow arrive just before the opening of the European, US, and Asian trading sessions. On the intra-week level we can see little news flow takes place at the weekends. During the week, the peak of news flow occurs on Wednesday and Thursday, while the trough falls on Friday. Lo (2008) also notes that the median number of weekday Reuters news alerts is usually between 1,500 and 2,000, while the median for the entire weekend drops to around 130.
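One simple way to separate unexpected news flow from seasonal variation is to compare each observation with a baseline for the same weekday and hour. The sketch below is an assumption-laden illustration rather than the method of any of the cited studies: the median baseline, the `ratio` threshold and the sample counts are all hypothetical.

```python
from collections import defaultdict
from statistics import median


def seasonal_baseline(history):
    """Median story count for each (weekday, hour) bucket.

    `history` is an iterable of (weekday, hour, count), with weekday 0 = Monday.
    """
    buckets = defaultdict(list)
    for weekday, hour, count in history:
        buckets[(weekday, hour)].append(count)
    return {key: median(values) for key, values in buckets.items()}


def abnormal_flow(observation, baseline, ratio=2.0):
    """True if the count exceeds `ratio` times the seasonal baseline."""
    weekday, hour, count = observation
    expected = baseline.get((weekday, hour))
    if not expected:
        return False  # no usable baseline for this bucket
    return count > ratio * expected


# Hypothetical history: busy Wednesday mornings, quiet Saturday mornings.
history = [
    (2, 8, 100), (2, 8, 120), (2, 8, 110),
    (5, 8, 5), (5, 8, 7), (5, 8, 6),
]
baseline = seasonal_baseline(history)

print(abnormal_flow((2, 8, 150), baseline))  # Wednesday: within the usual range
print(abnormal_flow((5, 8, 20), baseline))   # Saturday: far above the usual level
```

The point of the design is that 20 stories on a Saturday morning can carry more information than 150 on a Wednesday morning, which a raw count threshold would miss.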

Figure 1.3. CARs start to respond several days before relevant news is published.

The time of the day when news is released has also been found to be relevant in understanding the connection between market variables and news. Robertson, Geva, and Wolff (2006) find that there is a greater likelihood of events that lead to rising volatility at the start of the day. Boyd, Hu, and Jagannathan (2005) find that market conditions can influence the types of news that are reported. They report that interest rate information dominates in expansionary periods. In contrast, information about future corporate dividends dominates when the markets are contracting.

As would be expected, the informational content of news has a large influence on how markets react to news (Blasco et al., 2005; Boyd, Hu, and Jagannathan, 2005; Liang, 2005; Tetlock, 2007). We discuss how to extract the informational content of news (that is, the sentiment) in Section 1.3. It has been recognized that stock returns react more strongly to “negative” news than “positive” (Tetlock, 2007). There also tends to be a positive sentiment bias; that is, there is a larger volume of “positive” news than “negative” news. Das and Chen (2007) find that a histogram of normalized stock message board sentiment is positively skewed. There are days when messages about a stock are extremely optimistic but there is not a similar level of expression of pessimistic views. RavenPack (2010) also find a positive sentiment bias in company-specific news. This bias is more marked in bull markets than bear markets. They report a ratio of 2 : 1 of positive sentiment to negative sentiment stories in bull markets.

The relationship between different news stories is also an important consideration. Companies may make several announcements that fall under different classifications on the same day. These may or may not be related and may be related to varying degrees. For example, a company may announce a profit warning, resignation of its CEO and provide guidance on its sales outlook. The dependence or independence between different news stories is a consideration.


Figure 1.4. Seasonality—intraday pattern.


1.3 TURNING QUALITATIVE TEXT INTO QUANTIFIED METRICS AND TIME-SERIES

A salient aspect of news analysis is to discover the informational content of news. Converting qualitative text into a machine-readable form is a challenging task. We may wish to distinguish whether a story’s informational content is positive or negative; that is, determine its sentiment. We may go further and try to identify “by how much” the story is positive or negative. In doing this we may try to assign a quantified sentiment score or index to each story. A major difficulty in this process is identifying the context in which a story’s language is to be judged. Sentiment may be defined in terms of how positively or negatively a human (or group of humans) interprets a story; that is, the emotive content of the story for that human. In particular, standards can be defined using experts to classify stories. Some of RavenPack’s classifiers are calibrated using language training sets developed by finance experts. Further, dictionary-based algorithms which use psychology-based interpretations of words may be used. Since different groups of people are affected by events differently and have different interpretations of the same events, conflicts may arise. Moniz, Brar, and Davis (2009) give the example of the term “dividend cuts”. This may be classified as a negative term by a dictionary-based algorithm. In contrast, it may be interpreted positively by market analysts who may believe this indicates the company is saving money and is better positioned to repay its debts. Loughran and McDonald (forthcoming) also consider how context affects interpretation of the tone of text. They note a psychological dictionary like the Harvard-IV-4 may classify words as negative when they do not have a negative financial meaning. They develop an alternative negative word list that better reflects the tone of financial text.
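The dependence of a dictionary-based score on the chosen word list can be seen in a toy example. The word lists below are invented for illustration; they are not the Harvard-IV-4 list or the Loughran and McDonald list, and the `tone` function is a hypothetical scoring rule (net positive word count divided by headline length).

```python
import re

# Toy word lists, invented for this illustration only.
GENERAL_NEGATIVE = {"cut", "cuts", "loss", "liability", "tax"}
FINANCE_NEGATIVE = {"loss", "impairment", "default"}
POSITIVE = {"growth", "profit", "record"}


def tone(text, negative_words, positive_words=POSITIVE):
    """(positive count - negative count) / number of words, or 0.0 for empty text."""
    words = re.findall(r"[a-z]+", text.lower())
    if not words:
        return 0.0
    positives = sum(word in positive_words for word in words)
    negatives = sum(word in negative_words for word in words)
    return (positives - negatives) / len(words)


headline = "Company announces dividend cuts and lower tax liability"

# The same headline scored against two different negative word lists.
print(tone(headline, GENERAL_NEGATIVE))
print(tone(headline, FINANCE_NEGATIVE))
```

With the general-purpose list the headline scores as clearly negative, while the finance-oriented list scores it as neutral, which mirrors the context problem described above.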


Figure 1.5. Seasonality—intraweek pattern.


An attractive alternative is to use market-based measures to interpret and define the importance of news. The markets’ relative change in returns or volatility for a particular asset or asset class, lagged against a relevant news story, can be used to define the sentiment (informational content) of the news story. This approach intrinsically assumes that the market has responded to the news story. Lo (2008) uses this approach for creating the Reuters NewsScope Event Indices. He creates separate indices for market responses to news, in terms of (i) returns and (ii) volatility. So he assumes that sentiment measured in the context of these two variables is different. This approach is quite pragmatic and is focused on using the news content directly in the context that the modeller is interested in. Lavrenko et al. (2000), Moniz, Brar, and Davis (2009), Peramunetilleke and Wong (2002) and Luss and d’Aspremont (2009) also use market-based measures in determining the “sentiment” of news. SemLab (see Vreijling/SemLab, 2010) provides a tool which allows the user to filter news items and examine each item’s impact on market variables. Using this interactive tool, the user is able to define their own tailored context of “sentiment”.

Given a definition of sentiment, machine learning and natural language techniques are frequently used to determine the sentiment of new incoming stories. Hence we can determine sentiment scores over time as news arrives. Such sentiment scores then allow us to develop systematic investment and risk management processes. Linking these sentiment scores to asset returns, trading volumes and volatility or, in other words, discovering the connection between news analysis and financial analytics models is a leading challenge in this domain of application.

The definition of market sentiment is very much context-dependent. In general, we are interested in discovering the “informational content of news”. In this review chapter, for the purpose of (quantitative) modelling applications, we use the two terms “news sentiment” and “informational content of news” interchangeably, and in this section we discuss some of the leading methods of computing/quantifying “sentiment” and other related measures.

We review below Das and Chen (2007) and Lo (2008). The former uses natural language processing and machine learning whereas the latter applies a market-based measure. Both papers cover the following items:

1. A definition of the context of sentiment.
2. Application of algorithms (natural language, machine learning, and linear regression) to calibrate and define sentiment scores.
3. Validation of the effectiveness of the scores by comparing their relationship with relevant asset returns, volumes or volatility.

Das and Chen (2007) use statistical and natural language techniques to extract investor sentiment from stock message boards and generate sentiment indices. They apply their method to 24 technology stocks in the Morgan Stanley High Tech (MSH) Index. A web scraper program is used to download tech sector message board messages. Five algorithms, each with different conceptual underpinnings, are used to classify each message. A voting scheme is then applied to all five classifiers.

Three supplementary databases are used in the classification algorithms.

1. Dictionary is used for determining the nature of the word. For example, is it a noun, adjective or adverb?


2. Lexicon is a collection of hand-picked finance words which form the variables for statistical inference within the algorithms.

3. Grammar is the training corpus of base messages used in determining in-sample statistical information. This information is then applied for use on out-of-sample messages.

The lexicon and grammar jointly determine the context of the sentiment. Each of the classifiers relies on a different approach to message interpretation. They are all analytic, hence computationally efficient.

1. Naive classifier (NC) is based on a word count of positive and negative connotation words. Each word in the lexicon is identified as being positive, negative or neutral. A parsing algorithm negates words if the context requires it. The net word count of all lexicon-matched words is taken. If this value is greater than $1$, we sign the message as a buy. If the value is less than $-1$, the message is a sell. All others are neutral.

2. Vector distance classifier. Each of the $D$ words in the lexicon is assigned a dimension in vector space. The full lexicon then represents a $D$-dimensional unit hypercube and every message can be described as a word vector in this space ($m \in \mathbb{R}^D$). Each hand-tagged message in the training corpus (grammar) is converted into a vector $G_j$ (grammar rule). Each (training) message is pre-classified as positive, negative or neutral. We note that Das and Chen use the terms Buy/Positive, Sell/Negative, and Neutral/Null interchangeably. Each new message is classified by comparison with the cluster of pre-trained vectors (grammar rules) and is assigned the same classification as the vector with which it has the smallest angle. This angle gives a measure of closeness.

3. Discriminant-based classification. NC weights all words within the lexicon equally. The discriminant-based classification method replaces this simple word count with a weighted word count. The weights are based on a simple discriminant function (Fisher Discriminant Statistic). This function is constructed to determine how well a particular lexicon word discriminates between the different message categories ({Buy, Sell, Null}). The function is determined using the pre-classified messages within the grammar. Each word in a message is assigned a signed value, based on its sign in the lexicon multiplied by the discriminant value. Then, as for NC, a net word count is taken. If this value is greater than $0.01$, we sign the message as a buy. If the value is less than $-0.01$, the message is a sell. All others are neutral.

4. Adjective–adverb phrase classifier is based on the assumption that phrases which use adjectives and adverbs emphasize sentiment and require greater weight. This classifier also uses a word count but uses only those words within phrases containing adjectives and adverbs. A “tagger” extracts noun phrases with adjectives and adverbs. A lexicon is used to determine whether these significant phrases indicate positive or negative sentiment. The net count is again considered to determine whether the message has negative or positive overall sentiment.

5. Bayesian Classifier is a multivariate application of Bayes Theorem. It uses the probability a particular word falls within a certain classification and is hence indifferent to the structure of language. We consider three categories ($C = 3$), $c_i$, $i = 1, \ldots, C$. Denote each message $m_j$, $j = 1, \ldots, M$. The set of lexical words is $F = \{w_k\}_{k=1}^{D}$. The total number of lexical words is $D$. We can determine a count of the number of times each lexical item appears in each message, $n(m_j, w_k)$. Given the class of each message in the training set we can determine the frequency with which a lexical word appears in a particular class. We are then able to compute the conditional probability of an incoming message $j$ falling in category $i$, $\Pr(m_j \mid c_i)$, from word-based frequencies. $\Pr(c_i)$ is set to the proportion of messages in the training set classified in class $c_i$. For a new message we are able to compute the probability it falls within class $c_i$ given its component lexicon words, that is $\Pr(c_i \mid m_j)$, through an application of Bayes Theorem. The message is classified as being from the category with the highest probability.
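As an illustration of the Bayesian classifier in item 5, the following is a minimal sketch rather than Das and Chen's implementation. It assumes messages are already tokenized, and it uses add-one (Laplace) smoothing so that unseen lexicon words do not produce zero probabilities; the smoothing and the function names `train` and `classify` are assumptions made for this example.

```python
import math
from collections import Counter, defaultdict


def train(messages, labels, lexicon):
    """Estimate class priors Pr(c_i) and per-class lexicon word counts.

    messages: list of token lists (the hand-tagged training corpus).
    labels:   one class label per message, e.g. "Buy", "Sell", "Null".
    lexicon:  set of words that act as features.
    """
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    for tokens, label in zip(messages, labels):
        for word in tokens:
            if word in lexicon:
                word_counts[label][word] += 1
    priors = {c: n / len(labels) for c, n in class_counts.items()}
    return priors, word_counts


def classify(tokens, priors, word_counts, lexicon):
    """Return the class with the highest posterior Pr(c_i | m_j)."""
    vocab_size = len(lexicon)
    best_class, best_log_post = None, -math.inf
    for c, prior in priors.items():
        total = sum(word_counts[c].values())
        log_post = math.log(prior)
        for word in tokens:
            if word in lexicon:
                # Add-one smoothing (an assumption of this sketch).
                log_post += math.log((word_counts[c][word] + 1) / (total + vocab_size))
        if log_post > best_log_post:
            best_class, best_log_post = c, log_post
    return best_class
```

Working in log probabilities avoids numerical underflow when a message contains many lexicon words; the ranking of classes is unchanged.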

A voting scheme is then applied to all five classifiers. The final classification is based on achieving a majority amongst the five classifiers. If there is no majority the message is not classified. This reduces the number of messages classified but enhances classification accuracy.

Das and Chen also introduce a method to detect message ambiguity. Messages posted on stock message boards are often highly ambiguous. The grammar is often poor and many of the words do not appear in standard dictionaries. They note “Ambiguity is related to the absence of ‘aboutness’.” The General Inquirer has been developed by Harvard University for content analyses of textual data and has been applied to determine an independent optimism score for each message. By using a different definition of sentiment it is ensured there is no bias to a particular algorithm. The optimism score is the difference between the number of optimistic and pessimistic words as a percentage of the total words in the body of the text. This score allows us to rank the relative sentiment of all stories within a classification group. For example, we can rank the relative optimism of all stories which have been classified by their scheme as positive. The mean and standard deviation of the optimism score for different classification types ({Buy, Sell, Null}) can be calculated. They filter in and consider only optimistically scored stories in the positive category. For example, only those stories with optimism scores above the mean value plus one standard deviation are considered. Similarly, they filter in and consider only the most highly pessimistic scores in the negative category. Once the classified stories are further filtered for ambiguity, it is found that the number of false positives declines dramatically.

After the sentiment for each message is determined using the voting algorithm, a daily sentiment index is compiled. The classified messages up to 4 pm each day are used to create the aggregate daily sentiment for each stock. A buy (sell) message increments (decrements) the index by one. These indices are further aggregated across all stocks to obtain an aggregate sentiment for the technology portfolio. A disagreement measure is also constructed:

$$\mathrm{DISAG} = \left| 1 - \left| \frac{B - S}{B + S} \right| \right| \tag{1.1}$$

$B$ ($S$) is the number of buy (sell) messages. This measure lies between 0 (no disagreement) and 1 (high disagreement) and is computed as a daily time-series. The daily MSH index and component stock values are also collected. In addition, trading volatility and volume of stocks are calculated and message volume recorded. All the time-series are normalized.
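The voting scheme, the daily sentiment index and the disagreement measure (1.1) can be sketched together as below. This is an illustrative sketch, not the authors' code: the label strings, the use of `None` for unclassified messages, and the choice to return `None` when there are no buy or sell messages (where (1.1) is undefined) are assumptions made for this example.

```python
from collections import Counter


def majority_vote(labels):
    """Return the label chosen by more than half of the classifiers, else None."""
    label, count = Counter(labels).most_common(1)[0]
    return label if count > len(labels) / 2 else None


def daily_index(votes):
    """Net daily sentiment: +1 per Buy, -1 per Sell, 0 for Null or unclassified."""
    return sum(1 if v == "Buy" else -1 if v == "Sell" else 0 for v in votes)


def disagreement(votes):
    """DISAG = |1 - |(B - S) / (B + S)|| for one day's classified messages."""
    b = votes.count("Buy")
    s = votes.count("Sell")
    if b + s == 0:
        return None  # measure is undefined without buy or sell messages
    return abs(1 - abs((b - s) / (b + s)))
```

For example, a day with equal numbers of buy and sell messages gives a disagreement of 1, while a day on which every classified message is a buy gives 0.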


Das and Chen check that the constructed sentiment indices have a relationship with relevant asset variables. The relationship between the MSH index and the aggregate sentiment index is investigated. The authors plot the two against each other and show that these two series do seem to track each other. The sentiment index is found to be highly autocorrelated out to two trading weeks. Regression analysis is undertaken to investigate the relationship. They conclude sentiment does offer some explanatory power for the level of the index. However, autocorrelation makes it difficult to establish the empirical nature of the relationship.

Das and Chen undertake regression analysis between the individual stock level and the individual stock sentiment level and find there is a significant relationship (the t-statistic of the coefficient is statistically significant). The relationship between first differences is much weaker. We cannot conclude there is a strong predictive ability for forecasting individual stock returns. Sentiment and stock levels are not unrelated, but determining the precise nature of the relationship is difficult.

The authors also provide a graphical display of the relationship between the sentiment measure, disagreement measure, message volume, trading volume and volatility. Sentiment is inversely related to disagreement: as disagreement increases, sentiment falls. Sentiment is positively correlated with posting volume: increased discussion indicates that optimism about the stock is rising. There is a strong relationship between message volume and volatility. This is consistent with Antweiler and Frank (2004). Trading volume and volatility are strongly related to each other.

Lo (2008) develops the Reuters NewsScope Event Indices (NEIs) which are constructed to have “predictive” power for particular asset returns and (realized) volatility. They are constructed in an integrated framework where news, returns and volatility are used in calibrating the indices. The white paper (dated November 2007) considers specifically indices for foreign exchange. However, the method can be applied to other asset classes.

Lo uses news alerts in developing his sentiment indices. These are quick news flashes which are issued when a newsworthy event occurs. They are both timely and relevant. An example of a Reuters NewsScope alert:

    TimeStamp  02 AUG 2007 04:44:26.155
    Alert      Tsunami Warning Issued for Japan’s Western Hokkaido Coast
    JP ASIA NEWS DIS LEN RTRS

The alert comprises three items: (i) TimeStamp, (ii) a short headline, and (iii) tags and metadata. The tags are machine-readable and will often contain information about the topic area. The headlines lend themselves well to machine analysis since they are concise and formed from a small vocabulary. Lo notes the purpose of the indices is to rapidly identify and report market-moving information. Once constructed he undertakes (event study) experiments to validate their quality, developing metrics which have the potential to indicate whether the indices are able to predict significant market movements.

Framework for real-time news analytics

We consider here the framework for developing Reuters NEIs. For a given asset class and related topic area the following parameters are used:


(1) List of keywords and phrases with real-valued weights: $(W_1, \gamma_1), \ldots, (W_k, \gamma_k)$.

(2) A rolling “sentiment” window of size $r$ (say 5/10 minutes).

(3) A rolling calibration window of size $R$ (say 90 days).

Initially a raw score is created. We have $(W_1, \gamma_1), \ldots, (W_k, \gamma_k)$, where $W_1$ is the first keyword and $\gamma_1$ is the weighting for the first keyword. The raw score at time $t$ is assigned by considering the time period $(t - r, t]$. $(w_1, \ldots, w_k)$ is the vector of keyword frequencies in $(t - r, t]$; that is, $w_i$ is the number of times keyword $W_i$ occurred in the last $r$ minutes. The raw score is defined as

$$s_t \equiv \sum_i \gamma_i w_i \tag{1.2}$$

The raw score will tend to be high when the news volume is high. A normalized score is therefore produced using the rolling calibration window. At all times $t$ for the $R$ days in the calibration window, we record

(i) the raw score $s_t$ that would have been assigned,

(ii) the news volume $n_{[t-r,t)}$; that is, the number of words that were observed in the time interval $[t - r, t)$.

The normalized score is determined by comparing the current raw score against the distribution of raw scores in the calibration window, where the news volume equalled the current news volume. This means we only consider those raw scores where the news volume equals the current news volume.

$$S_t \equiv \frac{\left|\{\, t' \in [t - R, t) : n_{[t'-r,t')} = n_{[t-r,t)} \;\&\; s_{t'} < s_t \,\}\right|}{\left|\{\, t' \in [t - R, t) : n_{[t'-r,t')} = n_{[t-r,t)} \,\}\right|} \tag{1.3}$$

We notice the numerator is a subset of the denominator, hence $S_t \le 1$. If $S_t = 0.92$, we can say that 92% of the time when news volume is at the current level, the raw score is less than it currently is. Lo creates an alternative score based on topic codes. Instead of counting word frequencies, the fraction of news alerts (in the last $r$ minutes) tagged with particular topic codes is used.

Naturally, the scoring method is dependent on the list of keywords/topic areas ($W_1, \ldots, W_k$) and the real-valued weights ($\gamma_1, \ldots, \gamma_k$). The lists of keywords/topics were created by selecting the major news categories that related to the asset class (foreign exchange) and creating lists, by hand, of words and topic areas that suggest news relevant to the categories. A tool was created to extract news from periods where high scores were assigned. This news was then manually inspected, so that the developer could determine whether the keywords (topics) were legitimate or needed adjusting.

The optimal weights ($\gamma_1, \ldots, \gamma_k$) for the intraday return sentiment index were determined by regressing the word (topic) frequencies against the intraday asset returns. Similarly the (optimal) weights for the intraday volatility sentiment index were determined by regressing the word (topic) frequencies against the intraday (de-seasonalized) realized volatility. Volatility was observed to show strong seasonality on intraday timescales, hence this series was de-seasonalized prior to derivation of the weights. Returns did not exhibit any seasonality. The time-series are given on an intraday basis, hence to keep the data manageable a random subset of the observations is used in calibration.
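The raw score (1.2) and the normalized score (1.3) can be sketched as follows. This is a minimal illustration under stated assumptions, not the NEI implementation: the keyword weights in the usage note are made up, the calibration history is passed in as a plain list of (raw score, news volume) pairs, and returning `None` when no past window has the same news volume is a choice made for this example.

```python
def raw_score(weights, freqs):
    """Raw score s_t: weighted sum of keyword frequencies in the sentiment window.

    weights: dict keyword -> real-valued weight.
    freqs:   dict keyword -> number of occurrences in the last r minutes.
    """
    return sum(weight * freqs.get(keyword, 0) for keyword, weight in weights.items())


def normalized_score(current_raw, current_volume, history):
    """Normalized score S_t as in (1.3).

    history: list of (raw_score, news_volume) pairs recorded over the
             calibration window [t - R, t).
    Returns the fraction of past windows with the same news volume whose
    raw score was strictly below the current one, or None if there is no
    past window with matching volume.
    """
    same_volume = [s for s, n in history if n == current_volume]
    if not same_volume:
        return None
    return sum(1 for s in same_volume if s < current_raw) / len(same_volume)
```

With hypothetical weights `{"tsunami": 2.0, "warning": 0.5}` and window frequencies `{"tsunami": 1, "warning": 2}`, `raw_score` returns 3.0. Matching on exactly equal news volume follows the definition in the text; in practice one might bucket volumes, but that would be a departure from (1.3).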


Lo notes the determination of weights can be expressed as a more general classification problem. Other techniques might be applied; in particular, machine-learning algorithms such as the perceptron algorithm or support vector machines. He suggests further study is required to find the best approach, but the standard linear regression approach does perform well.

To establish that the final NEIs have empirical significance, Lo undertakes detailed event study analysis. He uses the NEI series to define an event. An event is defined to take place when the index exceeds a certain threshold (say 0.995). He then removes any event that follows within less than one hour of another event. This guards against identifying “new” events which are actually based on old news. The behaviour of exchange rates before and after these events is then studied. Two time-series are considered: the log returns and the deseasonalized squared log returns. He then tests the null hypothesis that the distribution of log returns (or deseasonalized squared log returns) is the same before and after the events. He uses samples of one hour centred on the events.
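The event definition above can be sketched as a simple filter over the index series. The sketch assumes timestamps in seconds in ascending order, and it reads "follows within less than one hour of another event" as a comparison with the most recent threshold crossing, whether or not that crossing was itself kept; both are assumptions of this example, as is the function name `detect_events`.

```python
def detect_events(times, scores, threshold=0.995, min_gap=3600.0):
    """Return the times at which an event is deemed to occur.

    times:     ascending timestamps in seconds.
    scores:    normalized index values at those timestamps.
    threshold: an event requires the index to exceed this level.
    min_gap:   crossings closer than this to the previous crossing are dropped.
    """
    events = []
    last_crossing = None
    for t, s in zip(times, scores):
        if s > threshold:
            if last_crossing is None or t - last_crossing >= min_gap:
                events.append(t)
            # Track every crossing, so a run of closely spaced crossings
            # yields a single event at its start.
            last_crossing = t
    return events
```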

We can visually assess the impact of events on the volatility of the EUR/USD exchange rate.

(1) Figure 1.6 shows the averaged volatility event window. The pre-event (averaged) volatility is shown by a bold line, and the post-event (averaged) volatility is shown by a faint line. There is a peak at the centre where there is a significant increase in volatility.

(2) Figure 1.7 shows the density function of pre-event samples and post-event samples of deseasonalized squared log returns. The shift to the right indicates an upward shift in volatility on average.

As well as visual inspection, statistical tests can be introduced to compare the pre-event and post-event samples. A t-test can be used to test equality of the means in the two samples. Levene’s test can be used to determine whether there has been a change in standard deviation. The $\chi^2$ goodness-of-fit test can be used to determine whether the two samples are likely to have come from different distributions.
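Two of the test statistics mentioned above can be computed with the standard library alone, as in the sketch below. The text does not say which t-test variant is used; Welch's unequal-variance form is chosen here as an assumption, and the Levene statistic uses deviations from the group mean. Both functions return the statistic only; p-values would require the t and F distributions (for example from a statistics package).

```python
import math
from statistics import mean, variance


def welch_t(pre, post):
    """Welch's t statistic for a difference in means (post minus pre)."""
    se = math.sqrt(variance(pre) / len(pre) + variance(post) / len(post))
    return (mean(post) - mean(pre)) / se


def levene_w(pre, post):
    """Levene's W statistic for equality of spread in two samples.

    Uses absolute deviations from each group's mean, then a one-way
    ANOVA-style ratio of between-group to within-group variation.
    """
    groups = [pre, post]
    z = [[abs(x - mean(g)) for x in g] for g in groups]
    z_means = [mean(zi) for zi in z]
    n_total = sum(len(g) for g in groups)
    z_grand = sum(sum(zi) for zi in z) / n_total
    between = sum(len(zi) * (zm - z_grand) ** 2 for zi, zm in zip(z, z_means))
    within = sum((x - zm) ** 2 for zi, zm in zip(z, z_means) for x in zi)
    k = len(groups)
    return (n_total - k) / (k - 1) * between / within
```

A large positive `welch_t` on deseasonalized squared log returns would point to higher average volatility after events, and a large `levene_w` to a change in dispersion.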

Indices and FX implied volatility

Lo finds that the event studies confirm that the constructed event indices, on average, impact realized foreign exchange volatility. He further considers the relationship of the indices to implied volatility. The NEI volatility indices are constructed to predict volatility over 30-minute periods. Implied volatility gives the markets’ expectations of volatility over a much longer horizon, typically 30 days. Event study analysis between implied volatility and the NEI volatility indices shows no evidence of a relationship. Lo suggests that implied volatility and the indices may function as complementary sources of information for risk management, since they intrinsically focus on different time horizons.

Figure 1.6. Pre- and post-event squared returns: shown as bold and faint lines.

1.4 MODELS AND APPLICATIONS

News analytics in finance is the use of technology and algorithms to process news within the investment management process. It allows investors to update their beliefs about the future market environment more effectively. This technology may be geared towards human decision support or it may be used to create automated quantitative strategies. The use of news data in addition to historic market data makes models more proactive and less reactive. The applications broadly fall into two areas: trading and risk control.

1.4.1 Information flow and computational architecture

News analytics in finance focuses on improving IT-based legacy system applications. These improvements come through research and development directed at automating/semi-automating programmed trading, fund rebalancing and risk control applications.

The established good practice of applying these analytics in the traditional manual approach is as follows. News stories and announcements arrive synchronously and asynchronously. In the market, asset (stocks, commodities, FX rates, etc.) prices move (market reactions). The professionals digest these items of information and accordingly make trading and investment decisions and recompute their risk exposures.


Figure 1.7. Distribution of pre- and post-event squared returns in bold and faint lines.


The information flow and the (semi) automation of the corresponding IS architecture is set out in Figure 1.8. There are two streams of information which flow simultaneously: news data and market data. Pre-analysis is applied to news data; it is further filtered and processed by classifiers to compute relevant metrics. This is consolidated with the market data of prices and together they constitute the classical datamart which feeds into whatever relevant model-based applications are sought. A key aspect of these applications is that they set out to provide technology-enabled support to professional decision makers and thereby achieve intelligence amplification (Leinweber, 2009).

1.4.2 Trading and fund management

Generally traders and quantitative fund managers seek to identify and exploit asset mispricings before they are corrected in order to generate alpha. Most simply they may use (quantified) news data to rank stocks and identify which stocks are relatively attractive (unattractive). They may then buy (sell) the highest (lowest) ranking stocks, thereby rebalancing a portfolio towards the desired weights on the selected stocks. Similarly, the news data may be used to identify trading signals for particular stocks. Alternatively, analysts may use factor models to process new sources of news data. (Factor models, which are applied to give updated estimates of future asset returns and volatility, allow us to determine an optimal future portfolio to hold; that is, they tell us which assets to hold and also in what proportions.) Analysts may also use news data to identify and exploit behavioural biases in investor attitudes/reactions that result from market and analyst misreaction to new information. In particular, this can arise due to delayed information diffusion or due to investor inattention and limited ability to process all relevant information instantaneously.
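The simplest use described above, ranking stocks on a quantified news score and then buying the top and selling the bottom of the ranking, can be sketched as follows. The tickers and scores in the usage note are hypothetical, and the function `long_short` is an illustrative name; position sizing and rebalancing rules are left out.

```python
def long_short(scores, n):
    """Split a universe into long and short candidates by sentiment rank.

    scores: dict ticker -> sentiment score (higher means more attractive).
    n:      number of names on each side; should be at most half the
            universe so the two lists do not overlap.
    Returns (longs, shorts), each ordered from highest to lowest score.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:n], ranked[-n:]
```

For example, with scores `{"AAA": 0.8, "BBB": -0.3, "CCC": 0.1, "DDD": -0.9, "EEE": 0.5}` and `n = 2`, the longs are `AAA` and `EEE` and the shorts are `BBB` and `DDD`.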

Stock picking and ranking

Li (2006) uses a simple ranking procedure to identify stocks with positive and negative (financial language) sentiment. He examines form 10-K Securities and Exchange Commission (SEC) filings for non-financial firms between 1994 and 2005. He creates a “risk sentiment measure” which is formed by counting the number of times the words risk, risks, risky, uncertain, uncertainty and uncertainties occur within the management discussion and analysis sections. A strategy which goes long in stocks with a low risk sentiment measure and short in stocks with a high risk sentiment measure is found to produce a reasonable level of returns. Leinweber (2009) notes it is rumoured similar approaches are being applied. The performance of the strategy has deteriorated in recent years, possibly due to wider use of such strategies.

Figure 1.8. Information flow and computational architecture

Moniz, Brar, and Davis (2009) focus on turning news signals into a trading strategy. Equity analysts collect, process and disseminate information on companies to investors. In particular, they use their research to form earnings forecasts for companies. Earnings momentum strategies thus become a proxy for corporate news flow. Moniz et al. note these strategies do not explicitly identify the piece of information that has triggered the change in earnings forecast. They investigate whether news leads earnings revisions. They find that news data can be used to reinforce proxies for news already incorporated in models and that a strategy based on earnings momentum reinforced by news flow is effective.

Event studies based on news events can also provide the cue to fund managers to identify potentially underpriced/overpriced stocks (see the discussion in Section 1.2.2).
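The word count behind Li's (2006) risk sentiment measure, as described earlier in this subsection, can be sketched in a few lines. The tokenization rule (lower-casing and splitting on non-letters) is an assumption of this example; the source specifies only the six words counted, and any further transformation of the count that Li may apply is not reproduced here.

```python
import re

# The six words counted in the management discussion and analysis section.
RISK_WORDS = {"risk", "risks", "risky", "uncertain", "uncertainty", "uncertainties"}


def risk_sentiment(text):
    """Count occurrences of the risk-related words in a block of text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return sum(1 for token in tokens if token in RISK_WORDS)
```

Note that only whole-word matches count, so a word such as "riskless" is not included.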

Factor models

The Efficient Market Hypothesis (EMH) asserts that financial markets are “informationally efficient” so prices of traded assets reflect all known information and update instantaneously to reflect new information. Further, it is assumed that agents act rationally. It is widely accepted within the fund management and trading community that the EMH, particularly in its strong form, does not hold. In the long run, markets may be efficient. But “The long run is a misleading guide to current affairs. In the long run we are all dead,” as John Maynard Keynes said. In the shorter term traders and quantitative fund managers seek to identify and exploit asset mispricings, before these prices correct themselves, in order to generate alpha. In undertaking this process they often seek to gain a competitive advantage by applying improved and differentiating sources of data and information.

The Capital Asset Pricing Model (CAPM) is the classical approach to pricing equities (Sharpe, 1964; Lintner, 1965). Any asset’s return can be split into a component that is correlated with the market’s return and a residual component that is uncorrelated with the market. Under the CAPM, it is assumed that the expected return for the residual component is zero and any stock’s expected return is dependent only on the expected return of the market. The CAPM states that only risk (uncertainty) due to market variability should be priced. Residual risk can be diversified and therefore should not be compensated.

The Arbitrage Pricing Theory (APT) (introduced by Ross, 1976) extends the CAPM to a more general linear model where additional sources of information to market returns are considered. Under the APT (multifactor models) an asset’s expected return is represented as a linear sum of several “risk” (uncertainty) factors that are common to all assets and an asset-specific component. The APT states the investor should be compensated for their exposure to all sources of (non-diversifiable) risk.

Active portfolio managers seek to incorporate their investment insight to “beat the market”. An accurate description of asset price uncertainty is key to the ability to outperform the market. Tetlock, Saar-Tsechansky, and Macskassy (2008) note that an investor’s perception about future asset returns is determined by their knowledge about the company and its prospects; that is, by their “information sets”. They note that these are determined from three main sources: analyst forecasts, quantifiable publicly disclosed accounting variables and linguistic descriptions of the firm’s current and future profit-generating activities. If the first two sources of information are incomplete or biased, the third may give us relevant information for equity prices.
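For reference, the CAPM and APT decompositions described in words earlier in this subsection can be written compactly. The notation is chosen here for illustration and is not taken from the chapter: $r_i$ and $r_M$ denote the excess returns of asset $i$ and of the market, $\theta_i$ and $\varepsilon_i$ are residual terms, $f_k$ are the common factors, and $a_i$, $b_{ik}$ are the asset-specific intercept and factor exposures.

```latex
% CAPM: market component plus a residual that is uncorrelated with the market
r_i = \beta_i \, r_M + \theta_i, \qquad
\operatorname{Cov}(\theta_i, r_M) = 0, \qquad
\mathbb{E}[\theta_i] = 0
\;\;\Longrightarrow\;\;
\mathbb{E}[r_i] = \beta_i \, \mathbb{E}[r_M]

% APT (multifactor model): K common factors plus an asset-specific component
r_i = a_i + \sum_{k=1}^{K} b_{ik} \, f_k + \varepsilon_i
```

In this form a news-based score enters naturally as one more factor $f_k$, which is how the multifactor applications discussed next use it.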

Multifactor models are now widely used by fund managers in constructing alpha-generating strategies (Rosenberg, Reid, and Lanstein, 1985). Identifying the relevant factors (and betas) is a measure of skill. Fund managers are always seeking new sources of advantage. This can be data and factors which translate to “quantitative knowledge”. “Profits may be viewed as the economic rents which accrue to [the] competitive advantage of . . . superior information, superior technology, financial innovation” (Lo, 1997). A “quantcentration” effect is frequently observed. That is, since most fund managers have access to the same sources of data, it is difficult to distinguish between their models and performance. Cahan, Jussa, and Luo (2009) find that news sentiment scores provided by RavenPack act as an orthogonal factor to traditional quantitative factors currently used. Hence they add a diversification benefit to traditional factor models. In particular, they note the value of this source of information during the Credit Crisis, when determining fundamentals (which traditional quant factors are based on) was problematic.

Behavioural biases

Behavioural economists challenge the assumption that market agents act rationally. Instead, they propose that individuals display certain biased behaviour, such as loss aversion (Kahneman and Tversky, 1979), overconfidence (Barber and Odean, 2001), overreaction (DeBondt and Thaler, 1985) and mental accounting (Tversky and Kahneman, 1981). Due to individual behavioural biases investors systematically deviate from optimal trading behaviour (Daniel, Hirshleifer, and Teoh, 2002; Hirshleifer, 2001; Odean and Barber, 1998). Behavioural economists use these biases to explain abnormal returns, rather than risk-based explanations. Naturally, investor behaviour is dependent on individual and group psychology. Some of the research within behavioural finance seeks to understand the mechanisms of human investor behaviour, drawing heavily on the fields of neuroscience and psychology (see, e.g., Peterson, 2007). Lo (2004) proposes a new framework, the Adaptive Market Hypothesis (AMH), which seeks to reconcile market efficiency with behavioural alternatives. This is an evolutionary model, where individuals adapt to a changing environment via simple heuristics.

As noted before, the relationship between news and markets is complex. A number of studies consider how investors react to news releases; in particular, the behavioural and cognitive biases in their reactions to news. Quantitative investors often seek to systematically exploit the anomalies observed in prices arising from investor behavioural biases (Moniz, Brar, and Davis, 2009; Barber and Odean, this volume, Chapter 7; Seasholes and Wu, 2004). There is a commercial fund called MarketPsy which employs strategies that exploit “collective investor misbehaviour” (see http://marketpsy.com/).

Barber and Odean consider evidence for the behavioural bias that individual investors have a tendency to buy attention-grabbing stocks. Attention-grabbing stocks are defined as ones that display abnormal trading volumes, show extreme one-day returns or are mentioned on the Dow Jones News Service. In contrast, professional managers, who are better equipped to assess a wider range of stocks, are less prone to buying attention-grabbing stocks. In particular, institutional investors, who use computers to manage their searches, normally specialize in a particular sector and may consider only those stocks that meet certain criteria. For every buyer there must be a seller, so if one group incurs losses the other group profits. If individual investors fail to react appropriately to news and attention, there is scope for institutional investors to profit. Seasholes and Wu (2004) find individual investors tend to buy stocks that hit an upper price limit. They find an impact on the prices of these attention-grabbing stocks, which reverses to pre-event levels within ten working days. Further, they find a group of professional investors who profit from the biased behaviour of individual investors.

Fang and Peress (2009) consider whether media coverage can help predict the cross-section of future stock returns. They find stocks with no media coverage outperform widely covered stocks even after allowing for well-known risk factors. This is contrary to the findings of Barber and Odean, but it supports the investor recognition hypothesis of Merton (1987). Da, Engleberg, and Gao (2009) also consider how the amount of attention a stock receives affects the cross-section of returns. They use the frequency of Google searches for a particular company as a measure of the amount of attention a stock receives. They find some evidence that changes in investor attention can predict the cross-section of returns. This is most pronounced amongst small-cap stocks.

Some researchers consider how informational flows cause investors to update their expectations in order to explain momentum and reversal effects. DeBondt and Thaler (1985) suggest that investors overreact to recent earnings by placing less emphasis on long-term averages. Daniel, Hirshleifer, and Subrahmanyam (1998) suggest price momentum is a result of investors overreacting to private information, causing prices to be pushed away from fundamentals. In contrast, Hong and Stein (1999) suggest price momentum occurs due to investors underreacting to new information. They suggest information diffuses slowly and is gradually incorporated into prices. Hirshleifer, Lim, and Teoh (2010) find that when there is a significant number of earnings announcements in the market, investors are distracted and underreact to relevant new information, and the post-announcement drift is strong. Investors fail to price the information efficiently, leaving an opportunity for quantitative investors. Scott, Stumpp, and Xu (2003) conclude that price momentum is caused by underreaction of stocks to earnings-related news. This is contrary to prior literature which suggested that price momentum was connected to trading volume.

Chan (2003) finds stocks with major public news exhibit momentum over the following month. In contrast, stocks with large price movements, but an absence of news, tend to show return reversals in the following month. This would support a trading strategy based on momentum reinforced with news signals. Da, Engleberg, and Gao (2009) extend their analysis of Google searches to consider the debate on how momentum works. They find price momentum is stronger in stocks with high levels of Google searches (SVI). This supports the view of Daniel, Hirshleifer, and Subrahmanyam (1998), since one would expect investors to overreact to stocks they are paying close attention to. Gutierrez, Kelley, and Hall (2007) and Hou, Peng, and Xiong (2009) also investigate the relationship between news (information flows) and momentum.


1.4.3 Monitoring risk and risk control

For effective financial risk control, fund management companies need to identify, understand and quantify potential (adverse) outcomes, their related probabilities and the severity of their impacts. This knowledge allows them to assess how best to manage and mitigate risk. Traditionally, historic asset price data have been used to estimate risk measures. These traditional approaches have the disadvantage that they provide ex-post, retrospective measures of risk. They fail to account for developments in the market environment, investor sentiment and knowledge. Incorporating measures or observations of the market environment within the estimation of future portfolio return distributions is important, since market conditions are likely to vary from historic observations. This is particularly important when there are significant changes in the market. In these cases, risk measures, calibrated using historic data alone, fail to capture the true level of risk (see Mitra, Mitra, and diBartolomeo, 2009; diBartolomeo and Warrick, 2005). Recent technological developments have enabled the creation of data-mining tools that can interpret live news feeds (see Section 1.3 and RavenPack, 2010; Brown/Thomson Reuters, 2010; Vreijling/SemLab, 2010). Mitra, Mitra, and diBartolomeo (2009) find that updating risk estimates using news data can provide dynamic (adaptive) measures that account for the market environment. Further, these measures may be useful in identifying and giving early ex-ante warning of extreme risk events.

The risk structure of assets may change over time in response to news. Patton and Verardo (2009) investigate whether the systematic risk (beta) of stocks increases in response to firm-specific news (in the form of earnings announcements). They undertake an event study on the beta of stocks around their earnings announcement dates. The change in beta on the announcement date can be broken down as change due to an increase in the volatility of that stock and change due to an increase in covariance with the index. They find that news releases do have an important impact on the risk of stocks. Further, much of the beta increase arises from an increase in covariance with other stocks. This suggests there could be a contagion effect in the information releases for one stock on the price movements of other stocks. This supports anecdotal evidence that investors will monitor earnings of related stocks when investigating the earnings of a particular stock. They suggest the Credit Crisis (2008) could be viewed as a negative earnings surprise for the market. Correlations were observed to increase during this period.
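One way to read this decomposition is sketched below. It assumes the index return is a fixed-weight sum of constituent returns, so that the covariance of a stock with the index splits into an own-variance term and a term collecting covariances with the other stocks. The function names, weights and return series are hypothetical illustrations and are not taken from Patton and Verardo (2009).

```python
def mean(xs):
    return sum(xs) / len(xs)

def cov(xs, ys):
    """Sample covariance (divides by n - 1)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

def beta_decomposition(returns, weights, i):
    """Split the beta of stock i into own-variance and cross-covariance parts.

    returns: list of return series, one per stock (hypothetical data)
    weights: index weights, assumed constant over the sample
    """
    n_obs = len(returns[0])
    # Index return in each period as the weighted sum of stock returns
    index = [sum(w * r[t] for w, r in zip(weights, returns)) for t in range(n_obs)]
    var_index = cov(index, index)
    own = weights[i] * cov(returns[i], returns[i]) / var_index
    cross = sum(weights[j] * cov(returns[i], returns[j])
                for j in range(len(returns)) if j != i) / var_index
    total = cov(returns[i], index) / var_index
    return own, cross, total

# Made-up daily returns for three stocks
returns = [
    [0.010, -0.020, 0.015, 0.005, -0.010],
    [0.008, -0.015, 0.010, 0.002, -0.012],
    [-0.004, 0.006, 0.012, -0.003, 0.001],
]
weights = [0.5, 0.3, 0.2]
own, cross, total = beta_decomposition(returns, weights, 0)
```

Comparing `own` and `cross` estimated on announcement days with the same quantities on other days would indicate which component drives a change in beta.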

The relationship between public information release and asset price volatility has been widely investigated and noted. Ederington and Lee (1993) find a relationship between macroeconomic announcements and foreign exchange and interest rate futures return volatility. Graham, Nikkinen, and Sahlstrom (2003) find stock prices on the S&P 500 are also influenced by macroeconomic announcements. Kalev et al. (2004) find that a GARCH model for equity returns which incorporates asset-specific news gives improved volatility forecasts. This study is extended in Kalev and Duong (this volume, Chapter 12). Robertson, Geva, and Wolff (2007) also consider a GARCH model which accounts for “content-aware” measures of news.
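A minimal sketch of how a news variable might enter a GARCH(1,1) model is given below. It assumes one possible specification, in which a lagged news count is added to the conditional variance equation; the exact specifications used in the studies cited may differ, and the parameter values and data here are made up for illustration.

```python
def garch_news_variance(residuals, news_counts, omega, alpha, beta, gamma,
                        initial_variance):
    """Conditional variances from a GARCH(1,1) recursion with a lagged news term.

    variance[t] = omega + alpha * residuals[t-1]**2
                  + beta * variance[t-1] + gamma * news_counts[t-1]
    Setting gamma = 0 recovers the plain GARCH(1,1) recursion.
    """
    variances = [initial_variance]
    for t in range(1, len(residuals)):
        next_var = (omega
                    + alpha * residuals[t - 1] ** 2
                    + beta * variances[t - 1]
                    + gamma * news_counts[t - 1])
        variances.append(next_var)
    return variances

residuals = [0.01, -0.02, 0.005, 0.03]   # made-up return shocks
news_counts = [0, 3, 1, 0]               # made-up news items per period
with_news = garch_news_variance(residuals, news_counts,
                                1e-6, 0.1, 0.85, 2e-5, 1e-4)
without_news = garch_news_variance(residuals, news_counts,
                                   1e-6, 0.1, 0.85, 0.0, 1e-4)
```

In practice the parameters would be estimated, for example by maximum likelihood, and the news-augmented forecasts compared with those of the plain model.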

It is observed that volatility is higher in down markets. This is sometimes referred to as the leverage effect. Dzielinski, Rieger, and Talpsepp (this volume, Chapter 11) refer to it as volatility asymmetry. Their investigation concludes it is likely to be driven by the overreaction of private investors to bad news. In line with this theory, they find that an increase in private investor attention to negative news can predict a rise in volatility. Increased private investor attention to negative news is measured by a change in the level of Google searches for negative words related to the macroeconomy, such as recession.

The relationship between equity price volatility and web activity has also been widely investigated. Wysocki (1999) finds that spikes in Yahoo! Message Board activity are good predictors of equity volatility (also volume and excess returns). Antweiler and Frank (2004) have similar findings for equity volatility. An application for traffic analysis from the web was developed by Codexa for Bear Wagner to aid its risk management strategy in predicting (unexpected) high volatility (Leinweber, 2009, Ch. 10, p. 237).

As discussed in Section 1.3, Lo (2008) creates event indices (scores) that are constructed to predict changes in (foreign exchange) volatility. Empirical event studies show these are effective at converting incoming qualitative text (textual news) into quantitative signals that do indicate changes in volatility.

1.4.4 Desirable industry applications

Stock picking, trading, and fund management (Section 1.4.2) and risk control (Section 1.4.3) are established application areas in the finance industry and the use of news analytics (NA) is researched to achieve improved performance. We may use certain news data within quantitative models. We may use it simply to forecast the directional impact of news on asset prices. In more sophisticated models we might wish to determine return predictions. Models which forecast volatility and volume on the basis of news will also find important applications within the investment management process. The following is an itemized list of possible/desirable applications:

• Market surveillance: Responding to the state of the market and taking into consideration the preoccupations of the watchdogs (that is, the regulators), market surveillance is becoming an important application area of quant models. It is gaining in importance because managers, through internal control functions as much as external compliance requirements, wish to have surveillance in place to catch rogue trading and insider information-based trading. An innovative application of NA is to spot patterns which capture these.

• Trader decision support: News data can aid traders in making decisions. News data signals may confirm traders’ existing analyses or may cause them to reconsider their analyses.

• Wolf detection/circuit breaker: Wolf detectors (circuit breakers) are a risk control feature for algorithmic trading built on machine-readable news. Essentially they “break the circuit”, stopping an automated algorithm from trading on a certain asset when particular types of news are released. It is important to try not to shout “Wolf!” when no wolf has actually appeared. These risk control features can be customized to only be tripped when substantive news events have occurred. Alternatively, the algorithms can be turned back on after the nature of the news has been programmatically analysed. This can be done using different features of machine-readable news data (see A Team, 2010).

• News flow algorithms: It is widely recognized that news flow is a good indicator of volume and volatility. As the flow of news about a company rises, the volume traded rises, resulting in more stock price volatility. If news flow can be used effectively to predict volume or volatility spikes then algorithms based on News Volume Weighted Average Price (NVWAP) vs. VWAP on its own may add value for trade execution strategies.

• Post-trade analysis: Assist in proving best execution and trader performance through post-trade analysis.
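The wolf detector described in the list above can be sketched as a small state machine. This is an illustration under stated assumptions rather than a description of any vendor's product: the `NewsItem` fields, the 0 to 1 relevance scale, the 0.8 threshold and the choice of halting topics are all hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class NewsItem:
    asset: str
    relevance: float    # hypothetical 0-to-1 relevance scale
    novelty_count: int  # number of similar earlier items; 0 means novel
    topic: str

@dataclass
class CircuitBreaker:
    halt_topics: set
    min_relevance: float = 0.8          # hypothetical threshold
    halted: set = field(default_factory=set)

    def on_news(self, item):
        """Halt trading on an asset only for substantive, novel news."""
        substantive = (item.topic in self.halt_topics
                       and item.relevance >= self.min_relevance
                       and item.novelty_count == 0)
        if substantive:
            self.halted.add(item.asset)
        return substantive

    def resume(self, asset):
        """Turn the algorithm back on once the news has been analysed."""
        self.halted.discard(asset)

    def may_trade(self, asset):
        return asset not in self.halted

breaker = CircuitBreaker(halt_topics={"MRG", "RES"})
tripped = breaker.on_news(NewsItem("ABC", 0.95, 0, "MRG"))   # substantive
ignored = breaker.on_news(NewsItem("XYZ", 0.30, 0, "MRG"))   # low relevance
```

Requiring relevance and novelty as well as a matching topic is one way to avoid shouting “Wolf!” on repeated or passing mentions.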

News data are likely to add value for investors trading at all frequencies from volatility-based strategies to equity trading.

• Alpha-generating signal: News data can be used in alpha generation at various trading frequencies. News sentiment data may be used within factor models. Cahan, Jussa, and Luo (2009) consider such an application. Their results are positive and they find that such an approach does add value. In particular, they note the value of this source of information during the Credit Crisis, when determining fundamentals (which traditional quant factors are based on) was problematic. News data can also help quant investors to identify the non-rational, biased behaviour of investors, which can then be exploited.

• Stock-screening tool: News data can be used to aid stock screening. In particular, sentiment data may be used to guess the directional movement of future returns. Very good news stocks (e.g., top sentiment quintile) might be selected to be held long and very bad news stocks (e.g., bottom sentiment quintile) might be selected to be held short.

• Fundamental research: News analysis tools may aid traditional non-quant managers by allowing them to undertake market research more efficiently.

• Risk management: The use of news data within risk forecasting can allow for dynamic (adaptive) risk management strategies that are forward-looking and are based on changing market environments. Further, risk analysis using news data can help investors understand event risk and how different kinds of events can impact their portfolio risk profile.

• Compliance/Market abuse: News data may allow regulators to identify potential market abuse and insider trading, perhaps by allowing the regulator to identify market reactions prior to relevant public news releases.
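The stock-screening idea in the list above (long the top sentiment quintile, short the bottom one) can be sketched as follows. The sentiment scores and tickers are made up, and the rule of taking at least one stock per side is an assumption made for small universes.

```python
def quintile_screen(sentiment_by_stock):
    """Return (longs, shorts) from the top and bottom sentiment quintiles."""
    # Rank tickers from most negative to most positive sentiment
    ranked = sorted(sentiment_by_stock, key=sentiment_by_stock.get)
    bucket = max(1, len(ranked) // 5)   # quintile size, at least one stock
    shorts = ranked[:bucket]
    longs = ranked[-bucket:]
    return longs, shorts

# Hypothetical aggregated sentiment scores per stock
scores = {"A": 0.9, "B": 0.1, "C": -0.7, "D": 0.4, "E": -0.2,
          "F": 0.6, "G": -0.5, "H": 0.0, "I": 0.3, "J": -0.9}
longs, shorts = quintile_screen(scores)
```

With ten stocks each quintile holds two names, so the screen goes long the two highest-scoring tickers and short the two lowest.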

1.5 SUMMARY AND DISCUSSIONS

The development of news analytics and its applications to finance through sentiment analysis is gaining progressive acceptance within the investment community. A growing number of academic studies have been conducted; in this chapter we have reviewed these in a summary form. Research by service providers of data and content for the finance industry is also discussed in this chapter and we have identified the applications of news analytics to high-frequency and low-frequency trading as well as in risk control and compliance. The study of news analytics draws upon research from a number of disciplines including natural language processing, artificial intelligence (AI), pattern recognition, text mining, information engineering and financial engineering. We believe news analytics will soon become an important area of study within financial analytics.


1.A APPENDIX: STRUCTURE AND CONTENT OF NEWS DATA

In this appendix we summarize the services offered by two leading providers of news analytics, namely Thomson Reuters (News Analytics) and RavenPack (NewsScore). The information is presented under three main headings: (i) coverage, (ii) method and types of scores, and (iii) example of news data in tabular form.

1.A.1 Details of Thomson Reuters News Analytics equity coverage and available data

(i) Coverage

Real-time and historical equity coverage

Commodities and energy: 39 C&E topics

Equity:
All equities          34,037   100.0%
Active companies      32,719    96.1%
Inactive companies     1,318     3.9%

Equity coverage by region

Americas:   14,785
APAC:       11,055
EMEA:        8,197

Equity coverage updates: Bi-weekly updates for recent changes (de-listings, M&A,IPOs).

History: Available from January 2003 (history kept for delisted companies; symbology changes tracked). Version control procedures enable clients to test “as-was” versions as well as system enhancements. With new enhancements and scoring logic, history is rescored from 2003 and provided to the client.

Data fields: Eighty-two metadata fields including Timestamp (GMT, in milliseconds), linked counts over various time periods which measure repetition, linked item cross-references, language, topics, prevailing sentiment, detailed sentiment, relevance, size of item, broker action, market commentary, number of companies mentioned, position of first mention, news intensity, news source, story type, headline, company identifier, among others.

Delivery mechanisms: Internet/VPN, co-location, dedicated circuits, FTP, Thomson Reuters Quantitative Analytics/market quantitative analytics, and deployed onsite for customers wanting custom analytics and proprietary sources analysed.

News sources: Reuters and a host of third-party sources as standard; able to process customer-specific sources including internet feeds, PDF files, and text from databases.


(ii) Method and types of scores

Timestamp (GMT): DD MM YYYY hh:mm:ss:sss. The date and time of the news item as time-stamped by the network and written to the News Archive. All messages are time-stamped to the nearest millisecond; this time represents the time that the message was transmitted by Reuters across its real-time network.

Item ID: A unique ID, identifying the news item. If a particular news item is scored for multiple assets (companies or commodities and energy topics), it has the same ID in each of the assets’ metadata sets.

Stock RIC: Reuters Instrument Code (RIC) of the equity (or topic code for commodity and energy items) for which the scores apply. Note: because the system scores items at the individual entity level, rather than producing an overall article-level sentiment (which tends to be less accurate for specific entities), a single news article may produce multiple “rows” or images of data, one for each Stock RIC (or C&E topic) in the article.

Feed ID (feed identifier): The identifier for the feed handler service that supplied the news item. Consists of the feed type, followed by the feed service. Useful in determining source or feed credibility, and the patterns and effects of news syndication.

News source: This identifies the publisher of the news item within the feed; for example, the originator of a news story published widely on the internet. It is up to the feed handler to supply a value.

Headline: The headline of the news item. For Thomson Reuters, if the news item was an alert, this is the text of the alert. If it was an article, append or overwrite, then this is the headline.

Relevance: A real-valued number indicating the relevance of the news item to the asset. It is calculated by comparing the relative number of occurrences of the asset with the number of occurrences of other organizations and/or commodities within the text of the item. For stories with multiple assets, the asset with the most mentions will have the highest relevance. An asset with a lower number of mentions will have a lower relevance score.

Number of sent wds/tkns (number of sentiment words/tokens): The number of lexical tokens (words and punctuation) in the sections of the item text that are deemed relevant to the asset. This is the number of words used in the sentiment calculation for this asset. Can be used in conjunction with Total Wds/Tkns to determine the proportion of the news item discussing the asset.

Total wds/tkns: The total number of lexical tokens (words and punctuation) in the item. Can be used in conjunction with Number of Sent Wds/Tkns to determine the proportion of the news item discussing the asset.

First mention: The first sentence in which the scored asset is mentioned. Often, more relevant assets are mentioned towards the beginning of a news item. Can be used in conjunction with Total Sentences to determine the relative position of the first mention in the item.

Total sentences: The total number of sentences in the news item. Can be used in conjunction with First Mention to determine the relative position of the first mention of the asset in the item.
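The two ratios suggested by these field descriptions can be computed directly. The sketch below is illustrative only; the function and argument names are hypothetical and the input values are made up.

```python
def asset_focus(sent_tokens, total_tokens, first_mention, total_sentences):
    """Return (share of the item about the asset, relative first-mention position)."""
    if total_tokens <= 0 or total_sentences <= 0:
        raise ValueError("token and sentence totals must be positive")
    share = sent_tokens / total_tokens          # Sent Wds/Tkns over Total Wds/Tkns
    position = first_mention / total_sentences  # First Mention over Total Sentences
    return share, position

share, position = asset_focus(sent_tokens=120, total_tokens=480,
                              first_mention=2, total_sentences=20)
```

Here a quarter of the item discusses the asset, and the asset first appears one tenth of the way into the item.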

Number of companies: The number of companies in the news item. The CO_IDS field contains a list of company RICs for scoring and is assigned by the feed handler. It is useful to determine if this asset is one of many discussed in the news item (e.g., a round-up article).

Sentiment classification: This field indicates the predominant sentiment class for a news item with respect to this asset. The indicated class is the one with the highest probability. Values are 1 = positive; 0 = neutral; -1 = negative. Scores are assigned to specific entities (or commodity topics) within the news item.

POS (positive sentiment probability): The probability that the sentiment of the news item was positive for the asset. Range 0–1.0. The three probabilities (POS, NEU, NEG) sum to 1.0. Probability scores are assigned to specific entities (or commodity topics) within the news item.

NEU (neutral sentiment probability): The probability that the sentiment of the news item was neutral for the asset. Range 0–1.0. The three probabilities (POS, NEU, NEG) sum to 1.0. Probability scores are assigned to specific entities (or commodity topics) within the news item.

NEG (negative sentiment probability): The probability that the sentiment of the news item was negative for the asset. Range 0–1.0. The three probabilities (POS, NEU, NEG) sum to 1.0. Probability scores are assigned to specific entities (or commodity topics) within the news item.
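The relationship between the three probabilities and the sentiment classification can be illustrated with a short sketch. The tie-breaking order and the rounding tolerance are assumptions, and `net_sentiment` is a hypothetical summary measure introduced here for illustration, not a field supplied by the vendor.

```python
def sentiment_class(pos, neu, neg, tolerance=1e-6):
    """Return 1, 0 or -1 for the class with the highest probability."""
    if abs(pos + neu + neg - 1.0) > tolerance:
        raise ValueError("POS, NEU and NEG should sum to 1.0")
    probabilities = {1: pos, 0: neu, -1: neg}
    # Assumption: on ties the first key in (1, 0, -1) order wins
    return max(probabilities, key=probabilities.get)

def net_sentiment(pos, neg):
    """Hypothetical summary score in [-1, 1]: POS minus NEG."""
    return pos - neg

label = sentiment_class(0.7, 0.2, 0.1)
score = net_sentiment(0.7, 0.1)
```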

Novelty fields (30 in total): Thomson Reuters News Analytics calculates the novelty of the content within a news item by comparing it with a cache of previous news items that contain the current asset. The comparison between items is done using a linguistic fingerprint, and if the news items are similar for that given asset, they are termed “linked”. There are five history periods used in the comparison; by default they are 12 hours, 24 hours, 3 days, 5 days, and 7 days prior to the news item’s Timestamp. Customers with deployed solutions can set their own historical look-back period lengths.

Two sets of scores are given:

• Within feed novelty: News items are only compared with previous items from the same feed.

• Across feed novelty: News items are compared across all feeds attached to the system.

Each set of scores contains the following fields:

LNKD_CNTn: The count of linked articles in a particular time period gives a measure of the novelty of the news being reported; the higher the linked count value, the less novel the story is for the given asset. If the count is zero, then the current item can be considered novel as there are no similar items reporting the story within the history period.


LNKD_IDn, LNKD_IDPVn: The ITEM_IDs of the five most recent and five oldest linked articles for the longest of the history periods. This can be used to cluster similar items. The Across Feed Novelty identifiers are prefixed with an “X”.

Volume fields (10 in total): Thomson Reuters News Analytics calculates the volume of news for each asset. A cache of previous news items is maintained and the number of news items that mention the asset within each of five history periods is calculated. By default, the history periods are 12 hours, 24 hours, 3 days, 5 days, and 7 days prior to the news item’s timestamp and are the same as used in the novelty calculations. Thus direct comparisons between similar and total items within the history periods can be achieved.

Two sets of scores are given:

• Within feed volume: Volume of news items mentioning the asset within the same feed.

• Across feed volume: Volume of news items mentioning the asset across all feeds.

Each set of scores contains the following fields:

ITEM_CNTn: The total count of items within the corresponding history period. The across feed volume identifiers are prefixed with an “X”.
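The counting behind the linked and total item counts can be sketched as follows, using the five default history periods. The sketch assumes the linked items have already been identified (the linguistic fingerprint comparison is not modelled) and treats the window boundary as inclusive, which is an assumption; the timestamps are made up.

```python
from datetime import datetime, timedelta

# Default look-back windows named in the text
WINDOWS = [timedelta(hours=12), timedelta(hours=24),
           timedelta(days=3), timedelta(days=5), timedelta(days=7)]

def window_counts(item_time, earlier_times):
    """Count earlier items falling inside each look-back window."""
    return [sum(1 for t in earlier_times
                if timedelta(0) < item_time - t <= window)
            for window in WINDOWS]

now = datetime(2010, 3, 10, 12, 0)
# Hypothetical earlier items mentioning the asset (hours before the current item)
all_items = [now - timedelta(hours=h) for h in (1, 6, 20, 50, 100, 160)]
# Hypothetical subset judged similar ("linked") to the current item
linked_items = [now - timedelta(hours=h) for h in (6, 100)]

item_counts = window_counts(now, all_items)        # analogous to ITEM_CNTn
linked_counts = window_counts(now, linked_items)   # analogous to LNKD_CNTn
```

Because both sets of counts use the same windows, the ratio of linked to total items in a window gives a simple measure of how much of the recent news flow repeats the current story.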

Item genre: Contains a description of the story genre, such as an imbalance message or Reuters news headline tags for the item (e.g., INTERVIEWS, EXCLUSIVES, WRAPUPS, DEALTALK).

Broker action: Item is reporting the action of a broker in their recommendation of the asset. For example, “Goldman upgrades Microsoft to buy from sell” would contain “UPGRADE” in the Microsoft record.

Commentary: Indicator that the item is discussing general market conditions, such as after-the-bell summaries. May be used to filter/weight news which describes the stock price, something we may already know by consuming a pricing feed.

Product permission code: Permission codes that apply to the record. Thomson Reuters News Analytics currently has two permission levels: LIVE and ARCHIVE. These codes are used to specify what we are allowed to do with the record. LIVE allows usage of data in real time; for example, in an algorithmic trading system. ARCHIVE allows usage for algorithm development and training.

Item type: Indicates the type of news item. The following values are possible:

Alert: The news item was generated as a result of an alert. It consists of a single line of text generally written to report a single fact quickly.

Article: Indicates that the news item was a fresh story. The item consists of a headline and body/story text.

Append: The news item was generated by appending text to an existing story take. The news item consists of a headline and story body, where News Analytics scores the entire body of the text, not just the appended section.

Overwrite: The news item was generated by replacing the entire body text of a news story. It consists of a headline and body where the body is the new version of the body.

Primary news access code (PNAC): A story identifier used to understand the progression of an event’s coverage. The various parts of a story chain (Alert, Article, updates, corrections) share the same PNAC (as well as story date/time). It can be used to see which article follows a set of alerts, to which article the update applies, etc.

Story date/time: Date/time the first alert or take in the story chain was filed (in GMT); therefore, STORY_DATE/TIME is the same for all messages in a story chain.

Take date/time: The date/time of the news item (i.e., when that particular version/take of the story was published). Story date/time will be consistent across the various takes of the story, but take date/time will be specific to that item.

Take sequence number: Sequence number of this alert/take in this story, set to 1 for the first and incremented by 1 for each subsequent alert/take in the story. This can help determine if the item is the second alert or third update to a story, for example.
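A story chain can be reconstructed by grouping items on their PNAC and ordering each group by take sequence number. The sketch below is illustrative; the dictionary keys and the PNAC values are hypothetical and do not reflect the actual field names or identifier format in the feed.

```python
from collections import defaultdict

def build_story_chains(items):
    """Group items by PNAC and order each chain by take sequence number."""
    chains = defaultdict(list)
    for item in items:
        chains[item["pnac"]].append(item)
    for chain in chains.values():
        chain.sort(key=lambda item: item["take_seq"])
    return dict(chains)

# Made-up items, deliberately out of order
items = [
    {"pnac": "nABC123", "take_seq": 2, "item_type": "Article"},
    {"pnac": "nXYZ789", "take_seq": 1, "item_type": "Alert"},
    {"pnac": "nABC123", "take_seq": 1, "item_type": "Alert"},
    {"pnac": "nABC123", "take_seq": 3, "item_type": "Append"},
]
chains = build_story_chains(items)
progression = [item["item_type"] for item in chains["nABC123"]]
```

The resulting `progression` shows how coverage of one event develops from the first alert through later takes.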

Attribution: Organizational source of the story (e.g., Reuters, PR Newswire).

Topic codes: Topic code(s) for the story which annotate what the story is about; for example, corporate results (RES), mergers and acquisitions (MRG), research (RCH), etc.

Company codes: Instrument RIC(s) for the story (i.e., the symbology for those companies mentioned in the story).

(iii) Example of news data in tabular form

Subset of available fields (see Figure 1.9).

Figure 1.9. Thomson Reuters NewsScope sentiments.

1.A.2 Details of RavenPack News Analytics - Dow Jones Edition: Equity coverage and available data

(i) Coverage

Real-time and historical equity coverage

Equity coverage       28,301   (100.0%)
Active companies      22,172    (78.3%)
Inactive companies     6,129    (21.7%)

Coverage by region

Americas    11,950   (42.2%)
Asia         8,858   (31.3%)
Europe       5,859   (20.7%)
Oceania      1,436    (5.1%)
Africa         186    (0.7%)

Available data

Historical
Data format:        Comma separated values (.csv) files
Archive range:      Available since January 1, 2000; point-in-time sensitive; free of survivorship bias
Archive packaging:  All .csv files compressed in .zip on a per year basis
Data fields:        Company analytics: 23 fields including sentiment, novelty, relevance and categories, among others
                    Story coding: 11 fields with coding information from the original story
Download:           Secure web download

Real-time
Connection:         Over the internet or direct leased lines
Software:           Local install of RavenPack Data Gateway + API
Access:             Push feed for real-time + historical query mechanism
API:                For Windows and Linux

(ii) Method and types of scores

TIMESTAMP_UTC: The date/time (yyyy-mm-dd hh:mm:ss.sss) at which the news item was received by RavenPack servers in Coordinated Universal Time (UTC).

COMPANY: This field includes a company identifier in the format ISO_CODE/TICKER. The ISO_CODE is based on the company’s original country of incorporation and TICKER on a local exchange ticker or symbol. If the company detected is a privately held company, there will be no ISO_CODE/TICKER information, only an RP_COMPANY_ID.
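A minimal sketch of parsing these two fields is given below. It assumes the timestamp is supplied as text in exactly the stated format and that the COMPANY field is empty for privately held companies; how a missing value is actually represented in the data is an assumption, and the sample values are made up.

```python
from datetime import datetime, timezone

def parse_timestamp_utc(text):
    """Parse 'yyyy-mm-dd hh:mm:ss.sss' into a timezone-aware UTC datetime."""
    parsed = datetime.strptime(text, "%Y-%m-%d %H:%M:%S.%f")
    return parsed.replace(tzinfo=timezone.utc)

def parse_company(text):
    """Split 'ISO_CODE/TICKER' into its parts; return None if the field is empty."""
    if not text:
        return None
    iso_code, _, ticker = text.partition("/")
    return iso_code, ticker

stamp = parse_timestamp_utc("2010-01-05 14:30:00.123")
company = parse_company("US/IBM")
private = parse_company("")
```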


ISIN: An International Securities Identification Number (ISIN) to identify the company referenced in a story. The ISINs used are accurate at the time of story publication. Only one ISIN is used to identify a company, regardless of the number of securities traded for any particular company. The ISIN used will be the primary ISIN for the company at the time of the story.

RP_COMPANY_ID: A unique and permanent company identifier assigned by RavenPack. Every company tracked is assigned a unique identifier comprised of six alphanumeric characters. The RP_COMPANY_ID field consistently identifies companies throughout the historical archive. RavenPack’s company detection algorithms find only references to companies by information that is accurate at the time of story publication (point-in-time sensitive).

RELEVANCE: A score between 0 and 100 that indicates how strongly related the company is to the underlying news story, with higher values indicating greater relevance. For any news story that mentions a company, RavenPack provides a relevance score. A score of 0 means the company was passively mentioned while a score of 100 means the company was predominant in the news story. Values above 75 are considered significantly relevant. Specifically, a value of 100 indicates that the company identified plays a key role in the news story and is considered highly relevant (context aware). The classifier detecting companies has access to information about each company including short names, long names, abbreviations, security identifiers, subsidiary information, and up-to-date corporate action data. This allows for “point-in-time” detection of companies in the text.

CATEGORIES: An element or “tag” representing a company-specific news announcement or formal event. Relevant stories about companies are classified in a set of predefined event categories following the RavenPack taxonomy. RavenPack automatically detects key news events and, when applicable, identifies the role played by the company. Both the topic and the company’s role in the news story are tagged and categorized. For example, in a news story with the headline “IBM Completes Acquisition of Telelogic AB” the category field includes the tag acquisition-acquirer (since IBM is involved in an acquisition and is the acquirer company). Telelogic would receive the tag acquisition-acquiree in its corresponding record since the company is also involved in the acquisition but as the acquired company. Similarly, a story published as “Xerox Sues Google Over Search-Query Patents” is categorized as a patent-infringement. Xerox receives the tag patent-infringement-plaintiff while Google gets patent-infringement-defendant. By definition, a company linked to a category given its role receives a RELEVANCE score of 100.

ESS—EVENT SENTIMENT SCORE: A granular score between 0 and 100 that represents the news sentiment for a given company by measuring various proxies sampled from the news. The score is determined by systematically matching stories typically categorized by financial experts as having short-term positive or negative share price impact. The strength of the score is derived from training sets where financial experts classified company-specific events and agreed these events generally convey positive or negative sentiment and to what degree. Their ratings are encapsulated in an algorithm that generates a score range between 0 and 100 where higher values indicate more positive sentiment while values below 50 show negative sentiment.
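Because the ESS is reported on a 0 to 100 scale with negative sentiment below 50, it can be convenient to recentre it before use in a model. The sketch below is one possible convention, not part of the RavenPack specification: the function names are hypothetical, and treating exactly 50 as neutral is an assumption.

```python
def ess_to_signed(ess):
    """Map an ESS in [0, 100] to a signed value in [-1.0, 1.0], with 50 -> 0.0."""
    if not 0 <= ess <= 100:
        raise ValueError("ESS must lie between 0 and 100")
    return (ess - 50) / 50.0


def ess_direction(ess):
    """Label the sign of the sentiment; treating 50 as neutral is an assumption."""
    if ess > 50:
        return "positive"
    if ess < 50:
        return "negative"
    return "neutral"


print(ess_to_signed(80), ess_direction(80))
```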

ENS—EVENT NOVELTY SCORE: A score between 0 and 100 that represents how “new” or novel a news story is within a 24-hour time window. The first story reporting a categorized event about one or more companies is considered to be the most novel and receives a score of 100. Subsequent stories within the 24-hour time window about the same event for the same companies receive lower scores.

ENS_KEY—EVENT NOVELTY KEY: An alphanumeric identifier that provides a way to chain or relate stories about the same categorized event for the same companies. The ENS_KEY corresponds to the RP_STORY_ID of the first news story in the sequence of similar events. The identifier allows a user to track similar stories reporting on the same event about the same companies.
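The chaining role of ENS_KEY can be illustrated with a short sketch that groups stories sharing the same key. The records, story identifiers and ENS values below are invented for illustration, and the dictionary layout is an assumption rather than the actual file format.

```python
from collections import defaultdict

# Hypothetical records: S2 repeats the event first reported by S1.
stories = [
    {"RP_STORY_ID": "S1", "ENS": 100, "ENS_KEY": "S1"},
    {"RP_STORY_ID": "S2", "ENS": 75, "ENS_KEY": "S1"},
    {"RP_STORY_ID": "S3", "ENS": 100, "ENS_KEY": "S3"},
]


def chain_by_ens_key(rows):
    """Group story identifiers by ENS_KEY, preserving input order within each chain."""
    chains = defaultdict(list)
    for row in rows:
        chains[row["ENS_KEY"]].append(row["RP_STORY_ID"])
    return dict(chains)


print(chain_by_ens_key(stories))
```

Since the key is the RP_STORY_ID of the first story in the sequence, each chain in this sketch starts with the story whose identifier equals the key, provided the input is in arrival order.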

CSS—COMPOSITE SENTIMENT SCORE: A sentiment score between 0 and 100 that represents the news sentiment of a given story by combining various sentiment analysis techniques. The direction of the score is determined by looking at emotionally charged words and phrases and by matching stories typically rated by experts as having short-term positive or negative share price impact. The strength of the score (values above or below 50, where 50 represents neutral strength) is determined from intraday stock price reactions modeled empirically using tick data from approximately 100 large-cap stocks.

NIP—NEWS IMPACT PROJECTIONS: A score taking values between 0 and 100 that represents the degree of impact a news flash has on the market over the following 2-hour period. The training set for this classifier used tick data for a test set of large-cap companies and looked at the relative volatility of each stock price measured in the 2 hours following a news flash. This NIP score is based on RavenPack’s Market Response Methodology.

PEQ—GLOBAL EQUITIES: A score that represents the news sentiment of the given news item according to the PEQ classifier, which specializes in identifying positive and negative words and phrases in articles about global equities. Scores can take values of 0, 50 or 100 indicating negative, neutral or positive sentiment, respectively. This sentiment score is based on RavenPack’s Traditional Methodology.
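Three-valued scores of this kind (0, 50 or 100) can be decoded with a simple lookup. The sketch below is illustrative only: the names `SCORE_LABELS` and `label_score` are hypothetical, and representing a missing score as `None` is an assumption about how absent values might be stored.

```python
# Mapping stated in the field description: 0 negative, 50 neutral, 100 positive.
SCORE_LABELS = {0: "negative", 50: "neutral", 100: "positive"}


def label_score(score):
    """Return the sentiment label, or None when no score was assigned (assumed)."""
    if score is None:
        return None
    if score not in SCORE_LABELS:
        raise ValueError(f"unexpected classifier score: {score!r}")
    return SCORE_LABELS[score]


print([label_score(s) for s in (0, 50, 100, None)])
```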

BEE—EARNINGS EVALUATIONS: A score that represents the news sentiment of the given story according to the BEE classifier, which specializes in news stories about earnings evaluations. Scores can take values of 0, 50 or 100 indicating negative, neutral or positive sentiment, respectively. This sentiment score is based on RavenPack’s Expert Consensus Methodology.

BMQ—EDITORIALS & COMMENTARY: A score that represents the news sentiment of the given story according to the BMQ classifier, which specializes in short commentary and editorials on global equity markets. Scores can take values of 0, 50 or 100 indicating negative, neutral or positive sentiment, respectively. This sentiment score is based on RavenPack’s Expert Consensus Methodology.

BAM—VENTURE, COMPANY, MERGERS & ACQUISITIONS: A score that represents the news sentiment of the given story according to the BAM classifier, which specializes in news stories about mergers, acquisitions and takeovers. Scores can take values of 0, 50 or 100 indicating negative, neutral or positive sentiment, respectively. This sentiment score is based on RavenPack’s Expert Consensus Methodology and has been trained on stories that lead up to a pre-identified mergers, acquisitions and takeover event.

BCA—REPORTS ON CORPORATE ACTIONS: A score that represents the news sentiment of the given news story according to the BCA classifier, which specializes in reports on corporate action announcements. Scores can take values of 0, 50 or 100 indicating negative, neutral or positive sentiment, respectively. This sentiment score is based on RavenPack’s Expert Consensus Methodology and has been trained on stories that lead up to a pre-identified corporate action announcement.

BER—EARNINGS RELEASES: A score that represents the news sentiment of the given story according to the BER classifier, which specializes in news stories about earnings releases. Scores can take values of 0, 50 or 100 indicating negative, neutral or positive sentiment, respectively. This sentiment score is based on RavenPack’s Expert Consensus Methodology.

ANL-CHG—ANALYST RECOMMENDATIONS & CHANGES: A score that represents the recommendation by an analyst in terms of stock upgrades and downgrades. When the mention of a company in a story matches the criteria for ANL-CHG, scores can take values of 0, 50 or 100, indicating a downgrade, neutral or upgrade rating, depending on recommendations issued by the analyst.

MCQ—MULTI CLASSIFIER FOR EQUITIES: A score that represents the news sentiment based on the tone towards the most relevant company mentioned in a story. The score is derived from a combination of values produced by the BMQ, BEE, BCA and ANL-CHG classifiers. An MCQ score is assigned when a company is mentioned in a headline and tagged with a sentiment value by any of these four classifiers. When the mention of a company in a story matches the criteria for MCQ, scores can take values of 0, 50 or 100 indicating negative, neutral or positive sentiment, respectively.

EDITORIAL_NOVELTY: A single news event may often be reported as a chain of linked stories. This number identifies the order of the story in a news chain. Integer scores have a minimum of 1 and no maximum. A score of 1 indicates this is the first take of the story published, whereas a score of 2 indicates this is the second take.

DJ_ACCESSION_NUMBER: This numeric identifier assigned by Dow Jones identifies to which news chain a given story belongs. Stories that are part of the same chain have the same Dow Jones accession number; those that are part of different chains have a distinct accession number.
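Taken together, DJ_ACCESSION_NUMBER and EDITORIAL_NOVELTY suggest a way to rebuild each news chain in publication order. The following is a hedged sketch under assumed inputs: the rows, accession numbers and story identifiers are invented, and `order_news_chains` is a hypothetical helper rather than part of any vendor toolkit.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical rows, deliberately out of order.
rows = [
    {"RP_STORY_ID": "B1", "DJ_ACCESSION_NUMBER": 2002, "EDITORIAL_NOVELTY": 1},
    {"RP_STORY_ID": "A2", "DJ_ACCESSION_NUMBER": 2001, "EDITORIAL_NOVELTY": 2},
    {"RP_STORY_ID": "A1", "DJ_ACCESSION_NUMBER": 2001, "EDITORIAL_NOVELTY": 1},
]


def order_news_chains(records):
    """Return {accession number: story ids ordered by take number}."""
    chain_key = itemgetter("DJ_ACCESSION_NUMBER")
    # groupby only merges adjacent items, so sort by chain and then by take first.
    ordered = sorted(records, key=itemgetter("DJ_ACCESSION_NUMBER", "EDITORIAL_NOVELTY"))
    return {
        accession: [record["RP_STORY_ID"] for record in group]
        for accession, group in groupby(ordered, key=chain_key)
    }


print(order_news_chains(rows))
```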

RP_STORY_ID—RAVENPACK UNIQUE STORY IDENTIFIER: An alphanumeric character identifier to uniquely identify each news story analyzed. This value is unique across all records.

NEWS_TYPE—TYPE OF NEWS STORY: Classifies the type of news story into one of five categories: HOT-NEWS-FLASH, NEWS-FLASH, FULL-ARTICLE, PRESS-RELEASE and TABULAR-MATERIAL.

Coding file field descriptions

TIMESTAMP_UTC: The date/time (yyyy-mm-dd hh:mm:ss.sss) at which the news item was received by RavenPack servers in Coordinated Universal Time (UTC).
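A timestamp in the stated layout can be parsed with the Python standard library. The sketch below assumes the field arrives as a plain string such as the invented sample shown; the function name is hypothetical.

```python
from datetime import datetime, timezone


def parse_timestamp_utc(text):
    """Parse 'yyyy-mm-dd hh:mm:ss.sss' and attach the UTC time zone."""
    naive = datetime.strptime(text, "%Y-%m-%d %H:%M:%S.%f")
    return naive.replace(tzinfo=timezone.utc)


# Invented sample value, not taken from the data set.
stamp = parse_timestamp_utc("2010-02-11 14:30:05.123")
print(stamp.isoformat())
```

Attaching the time zone explicitly avoids accidental comparison with local-time market data later in a pipeline.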

RP_STORY_ID—RAVENPACK UNIQUE STORY IDENTIFIER: An alphanumeric character identifier to uniquely identify each news story analyzed. This value is unique across all records.

DJ_STORY_ID—DOW JONES UNIQUE STORY IDENTIFIER: An alphanumeric character identifier provided by Dow Jones to uniquely identify each news story analyzed.

DJ_INDUSTRY: Includes metadata tags applied by Dow Jones that identify about 150 industry categories to which the given story relates.

DJ_GOVERNMENT: Includes metadata tags applied by Dow Jones that identify to which government bodies, agencies, representatives and personnel the given story relates.

DJ_NEWS: Metadata tags applied by Dow Jones that identify to which subject the given story relates.

DJ_MARKET: Metadata tags applied by Dow Jones that identify to which market sector or SWIFT currency code the given story relates.

DJ_REGION: Metadata tags applied by Dow Jones that identify more than 200 countries, U.S. states, territories and broader regions to which the given story relates.

DJ_WSJ: Metadata tags applied by Dow Jones that identify The Wall Street Journal topic code to which the story relates.

DJ_COMPANIES: Metadata tags applied by Dow Jones that identify to which of more than 30,000 companies a given story relates.

DJ_COMPANIES_ISIN: Includes transformations to ISINs for the metadata tags applied by Dow Jones that identify to which company a given story relates. The ISINs are transformed using an internal database by RavenPack and include any of the companies tracked by RavenPack along with ISINs sent by Dow Jones in the metadata of the story.

(iii) Example of news data in tabular form

Figure 1.10. Data layout from RavenPack News Analytics—Dow Jones Edition. Panels: Analytics file, data fields 1–8; Analytics file, data fields 9–23; Coding file, data fields 1–7; Coding file, data fields 8–11.
