Crowdsourced alpha

Macquarie Global Quant Conference, Hong Kong, 22 September 2014

Vinesh Jha, CEO

ExtractAlpha

23 September 2014

Agenda

• Motivation: new data
• What is crowdsourced alpha?
• Crowdsourced earnings estimates
• Alpha capture
• Financial bloggers

August, 2007

• Oops! Our models are all pretty much the same!

[Figure: Daily returns to 1-day reversal strategy, August 2007 (Khandani and Lo, 2007); y-axis from -4% to 7%, x-axis from 7/30/2007 to 8/31/2007]
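
A 1-day reversal strategy of the kind charted above can be sketched as follows. This is a minimal illustration, assuming the contrarian weighting usually associated with Lo and MacKinlay (1990), where each stock's weight is the negative of yesterday's return in excess of the equal-weighted market; the function names and the sample numbers are hypothetical and are not taken from the slides.

```python
def reversal_weights(prev_returns):
    """Contrarian weights: short yesterday's winners, long yesterday's losers."""
    n = len(prev_returns)
    market = sum(prev_returns) / n  # equal-weighted market return (assumption)
    return [-(r - market) / n for r in prev_returns]


def reversal_return(prev_returns, next_returns):
    """One-day strategy return, scaled by the capital on the long side."""
    weights = reversal_weights(prev_returns)
    pnl = sum(w * r for w, r in zip(weights, next_returns))
    long_capital = 0.5 * sum(abs(w) for w in weights)
    return pnl / long_capital if long_capital else 0.0


# Made-up returns for four stocks on two consecutive days.
yesterday = [0.02, -0.01, 0.00, -0.01]
today = [-0.01, 0.01, 0.00, 0.02]
print(reversal_return(yesterday, today))
```

The weights sum to zero by construction, so the portfolio is dollar neutral; that is also why many funds running similar signals on the same universe would end up holding similar books.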

Why?

• Pretty much the same universe…
• Pretty much the same modeling techniques…
• Pretty much the same risk models…
• Pretty much the same alphas!

Where to get new data? (1)

• Collect from existing traditional data sets, but dig deeper
  • Detailed financial statements
  • Industry specific
  • Footnotes
  • Conference call transcripts
  • Broker research reports
  • News sentiment


Where to get new data? (2)

• Collect as “exhaust” from alternate sources
  • Transactional data (e.g., point of sale)
  • Consumer behavior (e.g., foot traffic, web traffic)


Where to get new data? (3)

• Crowdsource it!
  • Ask people…
    • To collect data for you (e.g., Premise Data Corp)
    • For their opinions (the subject of this talk)
  • …or collect those opinions from public forums
    • Blogs, social media, Amazon reviews, …

Caveats

• History often limited (or none if you’ve just started asking people)
  • In-sample vs. out-of-sample testing is harder
• Cross-sectional coverage often thin
  • Often US only
• Nasty formats
  • No clean identifiers
  • Need to figure out sentiment


Defining crowdsourced alpha

• The use of multiple humans’ forecasts to make investment decisions
• The humans needn’t know their forecasts are being used
• They needn’t actually be explicit forecasts
  • For example, sentiment on products in a Twitter feed

What we need

• A measurable thing to forecast
  • Which we need help in forecasting
• Humans to make the forecasts
• A platform for submission or collection of the forecasts
  • This often starts with unstructured data
  • Often have to build this yourself!
• An incentive for people to contribute their forecasts
• A way to clean up the noise

What we want

• A diversity of opinions or knowledge
• A good way to measure skill
  • Objective
  • Appropriate to the task and to our expectations
• Persistence of skill
  • Can only find this if there is a diversity of skill levels
• Incentive to forecast well (not just to forecast)
  • Leaderboards, monetary incentives, track record, marketing

Example

• IBES
  • The original crowdsourced alpha!
• Fulfills all of our needs
  • Platform for structured, useful, relatively noise-free forecasts by incentivized humans
• And most of our wants
  • Somewhat diverse opinions, persistent skill
  • Incentives are mixed

We can use the crowd to forecast…

What               Who                                    Where
EPS, revenues      Sell side                              IBES, FactSet, CapIQ
                   Buy side, independents, individuals    Estimize
Returns            Sell side research                     IBES, FactSet, CapIQ, TipRanks
                   Sell side sales desks                  TIM Group
                   Financial bloggers and newsletters     Seeking Alpha, Motley Fool, TipRanks
                   Individuals                            PredictWallStreet, StockTwits
Social sentiment   Twitterers, Facebookists               Gnip, many others
Macro              Governments, economists                Reuters, Gov't, Consensus Economics, Estimize
M&A                Rumor mill                             Mergerize by Estimize
Strategies         Fund managers                          FoF, multistrat firms
                   Algo developers                        Quantopian, Quantconnect


Crowdsourced earnings estimates

• Data from Estimize
  • EPS and revenue estimates
  • November 2011 to 2014 on U.S. stocks
  • Pseudonymous
• Contributor base
  • Buy side, independent, individuals, and students
  • Diversity of backgrounds and forecasting methodologies
  • Users can contribute biographical information

Estimize data

• 25,000 registered users, 75,000 unique viewers of data last quarter
• 4,000 contributors, 17,000 estimates made last quarter
• Coverage (3+ estimates) on 900+ stocks in recent quarters
• Cleaned for errors/noise
• Highly seasonal

How accurate?

• For what % of EPS reports is the Estimize consensus closer to actual EPS than the sell-side consensus is?
  • Just using equally weighted crowd estimates

Coverage          n      % more accurate   Estimize error   Wall Street error
>= 1 analyst      8971   53%               17.3%            17.4%
>= 3 analysts     4916   58%               13.7%            14.5%
>= 10 analysts    1438   62%               11.7%            12.6%
>= 20 analysts    487    62%               12.6%            13.3%
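
The "% more accurate" column can be illustrated with a short sketch. It assumes the crowd consensus is a plain equally weighted mean of the submitted estimates, that accuracy is absolute distance from actual EPS, and that ties count against the crowd; the record layout, the function name `share_more_accurate`, and the sample figures are hypothetical.

```python
from statistics import mean


def share_more_accurate(reports, min_estimates=3):
    """Fraction of reports where the crowd consensus is strictly closer to actual EPS."""
    wins = total = 0
    for report in reports:
        if len(report["crowd"]) < min_estimates:
            continue  # not enough crowd coverage for this report
        crowd_error = abs(mean(report["crowd"]) - report["actual"])
        street_error = abs(report["street"] - report["actual"])
        total += 1
        wins += crowd_error < street_error
    return wins / total if total else float("nan")


# Made-up reports: actual EPS, sell-side consensus, individual crowd estimates.
sample = [
    {"actual": 1.00, "street": 0.95, "crowd": [0.98, 1.01, 1.02]},
    {"actual": 0.50, "street": 0.50, "crowd": [0.45, 0.47, 0.49]},
    {"actual": 2.00, "street": 1.90, "crowd": [2.05]},
]
print(share_more_accurate(sample))
```

Raising `min_estimates` mirrors the rows of the table: fewer reports qualify, but the consensus on those that do is built from more contributors.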

A better benchmark for expectations

                               Estimize                         Wall Street
                               N      1 day   2 day   5 day    N      1 day     2 day     5 day
IC                             4614   0.010   0.016   0.024    4614   (0.018)   (0.012)   (0.001)
Mean return, all surprises     4548   0.14%   0.14%   0.19%    4417   0.08%     0.03%     0.00%
Mean return, > 1% surprises    4059   0.14%   0.13%   0.16%    4107   0.07%     0.02%     -0.01%
Mean return, > 5% surprises    2521   0.20%   0.20%   0.21%    2755   0.13%     0.06%     0.01%
Mean return, > 10% surprises   1654   0.20%   0.25%   0.27%    1849   0.10%     0.05%     -0.09%
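
One way to compute an IC row like the one above is sketched here. The slides do not say how the IC is defined, so this assumes a Spearman rank correlation between the percentage earnings surprise (actual versus a consensus) and the return over the following days; all function names are hypothetical, and a consensus of exactly zero is not handled.

```python
def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Plain Pearson correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def information_coefficient(actual, consensus, forward_returns):
    """Rank correlation between % surprise and the return that follows."""
    surprises = [(a - c) / abs(c) for a, c in zip(actual, consensus)]
    return pearson(average_ranks(surprises), average_ranks(forward_returns))
```

Running the same function once with the crowd consensus and once with the sell-side consensus as the benchmark gives the two halves of the comparison.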

Earnings surprise strategy

[Figure: Cumulative residual return to surprise strategies, 1 day holding vs. 5 day holding; y-axis from -40% to 80%, x-axis from 11/3/2011 to 2/3/2014]

Holding period     1 day    5 day
Ann ret            25.7%    10.7%
Ann SD             19.8%    14.5%
Sharpe             1.30     0.73
% days invested    29%      77%
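
The reported Sharpe ratios are consistent with annualized return divided by annualized standard deviation (25.7% / 19.8% ≈ 1.30). A minimal sketch of how such summary statistics could be produced from a daily return series follows; it assumes 252 trading days per year, arithmetic (non-compounded) annualization, and that days with no position count as zero returns, none of which is stated in the slides.

```python
from statistics import mean, stdev

TRADING_DAYS = 252  # assumed annualization factor; not stated in the slides


def annualized_stats(daily_returns):
    """Annualized return, volatility, Sharpe ratio, and share of days invested.

    `daily_returns` holds one entry per trading day; None marks a day with no
    position, which is treated as a 0.0 return in the statistics.
    """
    filled = [0.0 if r is None else r for r in daily_returns]
    ann_ret = mean(filled) * TRADING_DAYS
    ann_sd = stdev(filled) * TRADING_DAYS ** 0.5
    sharpe = ann_ret / ann_sd if ann_sd else float("nan")
    invested = sum(r is not None for r in daily_returns) / len(daily_returns)
    return ann_ret, ann_sd, sharpe, invested


# Made-up series: invested on two of four days.
print(annualized_stats([0.01, None, -0.005, None]))
```

An event-driven strategy is only invested around announcements, which is why the share of days invested matters when comparing the 1 day and 5 day holding periods.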

Accuracy is persistent

• Require >= 5 prior quarters
• Compute error z score relative to other estimators, adjust by coverage
• Of estimators in the top (bottom) 20% per their prior coverage, what % end up in the top (bottom) 20% going forward?
  • Would be 20% if random

         Current ----->
Prior    Bad      Good     Persistence
Bad      26.4%    19.4%    7.0%
Good     15.4%    21.0%    5.6%
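
The transition table above can be sketched as below. This is an illustration only: it assumes each forecaster already has a single prior-period and current-period skill score (higher is better, for example a negated error z score), and it ignores the coverage adjustment mentioned on the slide. The function names are hypothetical.

```python
def extreme_bucket(scores, share=0.2, best=True):
    """Names of the best (or worst) `share` of forecasters by score."""
    ranked = sorted(scores, key=scores.get, reverse=best)
    return set(ranked[: max(1, round(len(ranked) * share))])


def transition_table(prior, current, share=0.2):
    """For prior bad/good forecasters, the share landing in each current bucket."""
    names = [n for n in prior if n in current]  # need both periods
    prior_scores = {n: prior[n] for n in names}
    current_scores = {n: current[n] for n in names}
    prior_sets = {
        "bad": extreme_bucket(prior_scores, share, best=False),
        "good": extreme_bucket(prior_scores, share, best=True),
    }
    current_sets = {
        "bad": extreme_bucket(current_scores, share, best=False),
        "good": extreme_bucket(current_scores, share, best=True),
    }
    return {
        p: {c: len(p_set & c_set) / len(p_set) for c, c_set in current_sets.items()}
        for p, p_set in prior_sets.items()
    }
```

In the table, the persistence column is the diagonal entry minus the off-diagonal entry of each row (26.4% - 19.4% = 7.0%), which can be read off the returned dictionary in the same way.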

What makes for an accurate estimate?

• Regress estimate-level accuracy (% error) against
  • Track record (+)
    • How good has the analyst been in this sector in the past?
  • Difficulty of forecasting (-)
    • Condition track record on the overall accuracy of the Estimize community
    • Expect less accuracy if everyone’s been inaccurate
  • Amount of coverage (+)
    • More is better, to a point
  • Days to report (-)
    • More recent forecasts contain more information
  • Bias (+)
    • Higher estimates tend to be more accurate
  • Commentary (+)
    • Estimates accompanied by commentary are more accurate

What makes for an accurate estimate?

N = 23,342

Factor            Parameter   T         p
Track record      0.07        9.38      <.0001
Difficulty        (0.03)      (2.69)    0.007
Coverage          0.02        3.34      0.001
Days to report    (0.10)      (11.59)   <.0001
Bias              0.14        21.85     <.0001
Comment           0.02        2.07      0.039

(Parentheses denote negative values.)
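
A regression table of this shape (one coefficient and t-statistic per factor) can be produced with ordinary least squares. The sketch below is a generic pure-Python OLS and is not the specification used in the talk; how the factors were scaled and how accuracy was measured are not given, so the inputs here are placeholders and the function names are hypothetical.

```python
def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(matrix)
    aug = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]  # zero here means the matrix is singular
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def ols(rows, y):
    """Return (coefficients, t-statistics); an intercept is added as column 0."""
    x = [[1.0] + list(r) for r in rows]
    n, k = len(x), len(x[0])
    xtx = [[sum(row[i] * row[j] for row in x) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * target for row, target in zip(x, y)) for i in range(k)]
    inv = invert(xtx)
    beta = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    residuals = [t - sum(b * v for b, v in zip(beta, row)) for row, t in zip(x, y)]
    sigma2 = sum(e * e for e in residuals) / (n - k)  # residual variance
    t_stats = [b / (sigma2 * inv[i][i]) ** 0.5 for i, b in enumerate(beta)]
    return beta, t_stats
```

Each row of `rows` would hold one estimate's factor values (track record, difficulty, coverage, days to report, bias, comment flag) and `y` its accuracy measure; the sign of each coefficient is then compared with the expected sign listed for that factor.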


Alpha capture

• Data from TIM Group
• Sell side trading desks produce short-term (5-30 day) trade ideas for select hedge fund clients
• These are distinct from research desks’ recommendations
• Incentive: paid for in commissions!
• Global, 2006-2014
• 300 brokers, 3000 contributors (“authors”) providing stock level forecasts
• 64% of ideas are Long

Alpha capture event study

[Figure: Cumulative signed mean return to ideas by region (North America, Europe, Asia); y-axis from -0.4% to 0.4%, x-axis from -5 to 20]

• Residual returns
• Require > US$100mm market cap, > US$4 equivalent, > US$1mm ADV
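
The event study can be sketched as follows: apply the liquidity screen, sign each idea's residual return by its direction, average across ideas for each event day, and cumulate. This is a minimal illustration that assumes residual returns are already computed and keyed by event day; the data layout and function names are hypothetical.

```python
def in_universe(market_cap_usd, price_usd, adv_usd):
    """Liquidity screen from the slide: cap > $100mm, price > $4, ADV > $1mm."""
    return market_cap_usd > 100e6 and price_usd > 4 and adv_usd > 1e6


def cumulative_signed_mean_return(events, start=-5, end=20):
    """Cumulated average of direction * residual return for each event day.

    Each event is (direction, returns_by_day): direction is +1 for a long idea
    and -1 for a short idea; returns_by_day maps event day -> residual return.
    """
    curve, running = {}, 0.0
    for day in range(start, end + 1):
        signed = [d * rets[day] for d, rets in events if day in rets]
        if signed:
            running += sum(signed) / len(signed)
        curve[day] = running
    return curve


# Made-up ideas: one long, one short, two event days each.
ideas = [(1, {0: 0.01, 1: 0.02}), (-1, {0: -0.03, 1: 0.02})]
print(cumulative_signed_mean_return(ideas, start=0, end=1))
```

Signing the returns means a short idea followed by a price decline contributes positively, so long and short ideas can be pooled into one curve per region or size bucket.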

Stronger & longer for small caps

[Figure: Cumulative signed mean return to ideas by size (Large, Mid, Small); y-axis from -0.4% to 0.6%, x-axis from -5 to 20]

Ideas just after earnings are weaker

[Figure: Cumulative signed mean return to ideas by earnings date, North America (post-earnings vs. others); y-axis from -0.4% to 0.4%, x-axis from -5 to 20]

Author performance is persistent

• Require at least 3 ideas over the last 2 years
• Measure performance by a blend of average return, Sharpe ratio, hit rate
  • Using residual returns

         Current ----->
Prior    Bad      Good     Persistence
Bad      26.1%    17.9%    8.3%
Good     18.8%    27.6%    8.9%

Combining the insights: The TIM Indicator


Financial bloggers

• Data from TipRanks
• U.S., 2010-2014
• Data from 65 financial blogs: 122,000 recommendations from 4000+ authors on 2000+ stocks, collected in real time
• NLP used to algorithmically determine buy/sell recommendation from the blog
• 84% Long

[Figure: Blogger recommendations per month (Nbuy, Nsell); y-axis from 0 to 3500, x-axis from 201009 to 201403]

Why blogs?

• Many include in-depth research by independent analysts, not captured in other data sets like news or broker research
• Contributors include buy side and industry experts
• Contributors are often compensated for providing original research, and typically disclose their positions
• Editorial standards vary across blogs; for example, some are trying to upsell to premium content

Blogger event study

• Residual returns
• Require > $100mm market cap, > $4, > $1mm ADV

[Figure: Cumulative residual returns: blogger recommendations (Buy, Sell); y-axis from -0.3% to 0.3%, x-axis from -10 to 60]

TipRanks Expert Sentiment Signal (TRESS)

• Same universe, market neutral “deciles”
• 18.8% annualized returns, Sharpe 2.13

[Figure: y-axis from -10% to 80%, x-axis from 9/1/2010 to 6/1/2014]
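
A market neutral decile portfolio is commonly built by going long the highest-scoring tenth of stocks and short the lowest-scoring tenth. The sketch below shows one period of such a spread, assuming equal weights within each leg; how the TRESS portfolios were actually constructed is not described in the slides, and the function name and inputs are hypothetical.

```python
def decile_spread_return(signal, next_return, buckets=10):
    """Equal-weighted return of the top signal bucket minus the bottom bucket."""
    names = sorted(signal, key=signal.get)  # lowest signal first
    size = len(names) // buckets
    if size == 0:
        raise ValueError("need at least one stock per bucket")
    bottom, top = names[:size], names[-size:]
    long_leg = sum(next_return[n] for n in top) / size
    short_leg = sum(next_return[n] for n in bottom) / size
    return long_leg - short_leg


# Made-up universe of 20 stocks whose next-period return rises with the signal.
signal = {f"s{i}": float(i) for i in range(20)}
next_return = {f"s{i}": i / 1000 for i in range(20)}
print(decile_spread_return(signal, next_return))
```

Because the long and short legs carry equal capital, the spread is dollar neutral; a series of these period returns is what an annualized return and Sharpe ratio would be computed from.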

So…

• Lots of interesting new data out there (finally!)
• We need to be OK with limited history for many data sets
• Crowdsourcing, broadly defined, seems to add value!
• Let’s crowdsource more things!

Thanks!

[email protected]

www.extractalpha.com
