The Noise Bottleneck or How Noise Explodes Faster than Data
(Very Brief Note for the Signal-Noise Section in Antifragile)
Nassim N. Taleb
August 25, 2013
The paradox is that an increase in sample size magnifies the role of noise (or luck).
Keywords: Big Data, Fooled by Randomness, Noise/Signal
PRELIMINARY DRAFT
Introduction

It has always been absolutely silly to be exposed to the news. Things are worse today thanks to the web.

We are getting more information, but with constant "consciousness", "desk space", or "visibility". Google News, Bloomberg News, etc. have space for, say, <100 items at any point in time. But there are millions of events every day. As the world becomes more connected, with the global dominating over the local, the number of sources of news is multiplying. But your consciousness remains limited. So we are experiencing a winner-take-all effect in information: like a large movie theater with a small door. Likewise we are getting more data, but the size of the door remains constant while the theater is getting larger.
The winner-take-all effect in information space corresponds to more noise, less signal. In other words, the spurious dominates.

Similarity with the Fooled by Randomness Bottleneck. This is similar to my idea that spurious returns dominate finance as the number of players gets large, swamping the more solid ones. Start with the idea (see Taleb 2001) that as a population of operators in a profession marked by a high degree of randomness increases, the number of stellar results, stellar for completely random reasons, gets larger. The "spurious tail" is therefore the number of persons who rise to the top for no reason other than mere luck, with subsequent rationalizations, analyses, explanations, and attributions. The performance in the "spurious tail" is only a matter of the number of participants, the base population of those who tried. Assuming a symmetric market, if one has for base population 1 million persons with zero skills and ability to predict starting Year 1, there should be 500K spurious winners in Year 2, 250K in Year 3, 125K in Year 4, etc. One can easily see that the size of the winning population in, say, Year 10 depends on the size of the base population in Year 1; doubling the initial population would double the number of straight winners. Injecting skills in the form of better-than-random abilities to predict does not change the story by much. (Note that this idea has been severely plagiarized by someone, about which a bit more soon.)

Because of scalability, the top, say, 300 managers get the bulk of the allocations, with the lion's share going to the top 30. So it is obvious that the winner-take-all effect causes distortions: say there are m initial participants and the "top" k managers are selected; the result is a fraction k/m of managers in play. As the base population gets larger, that is, as m increases linearly, we push into the tail probabilities.

Here read skills for information, noise for spurious performance, and translate the problem into information and news.
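The halving dynamic above can be checked with a short simulation; the coin-flip "market" and the population sizes below are purely illustrative, not a model of any actual market:

```python
import random

random.seed(42)

def spurious_winners(base_population, years):
    """Count zero-skill operators whose record beats the market every year."""
    survivors = base_population
    for _ in range(years):
        # each survivor "wins" this year with probability 1/2, by pure luck
        survivors = sum(random.random() < 0.5 for _ in range(survivors))
    return survivors

# expectation: base_population / 2**years purely lucky "stellar" records
print(spurious_winners(1_000_000, 10))   # close to 1_000_000 / 2**10 ≈ 977
print(spurious_winners(2_000_000, 10))   # doubling the base roughly doubles it
```

The spurious tail depends only on the base population and the number of rounds, never on skill, since every operator flips the same fair coin.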
The paradox: This is quite paradoxical, as we are accustomed to the opposite effect, namely that a large increase in sample size reduces the effect of sampling error; here the narrowness of M puts sampling error on steroids.
Derivations

Let $Z \equiv \left( z_i^j \right)_{1 \le j \le m,\; 1 \le i \le n}$ be an $(n \times m)$-sized population of variations, $m$ population series and $n$ data points per distribution, with $i, j \in \mathbb{N}$; assume "noise" or scale of the distribution $\sigma \in \mathbb{R}^+$, signal $\mu \ge 0$. Clearly $\sigma$ can accommodate distributions with infinite variance, but we need the expectation to be finite. Assume i.i.d. for a start.
Cross Sectional (n = 1)

Special case n = 1: we are just considering news/data without historical attributes.
Let $F^{\leftarrow}$ be the generalized inverse distribution, or the quantile, $F^{\leftarrow}(w) \equiv \inf \{ t \in \mathbb{R} : F(t) \ge w \}$, for all nondecreasing distribution functions $F(x) \equiv P(X < x)$. For distributions without compact support, $w \in (0,1)$; otherwise $w \in [0,1]$. In the case of continuous and increasing distributions, we can write $F^{-1}$ instead.

The signal is in the expectation, so $E(z)$ is the signal, and the scale $\sigma$ of the distribution determines the noise (which for a Gaussian corresponds to the standard deviation). Assume for now that all noises are drawn from the same distribution.
Assume a constant probability "threshold", $z = \frac{k}{m}$, where $k$ is the size of the window of the arrival. Since we assume that $k$ is constant, it matters greatly that the quantile covered shrinks with $m$.
Gaussian Noise

When we set $z$ as the reachable noise, the quantile becomes:
$$F^{-1}(w) = \sqrt{2}\,\sigma\,\operatorname{erfc}^{-1}(2w) + \mu$$

where $\operatorname{erfc}^{-1}$ is the inverse complementary error function.
Of more concern is the survival function, $\overline{F} \equiv \overline{F}(x) \equiv P(X > x)$, and its inverse $\overline{F}^{-1}$:

$$\overline{F}^{-1}_{\sigma,\mu}(z) = -\sqrt{2}\,\sigma\,\operatorname{erfc}^{-1}\!\left( 2\,\frac{k}{m} \right) + \mu \qquad (1)$$
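The sign change between the quantile and the survival inverse is just the reflection property of the inverse complementary error function; writing $\overline{F}^{-1}(z) = F^{-1}(1-z)$ with $z = k/m$:

$$\overline{F}^{-1}_{\sigma,\mu}(z) = \sqrt{2}\,\sigma\,\operatorname{erfc}^{-1}\big(2(1-z)\big) + \mu = -\sqrt{2}\,\sigma\,\operatorname{erfc}^{-1}(2z) + \mu,$$

using the identity $\operatorname{erfc}^{-1}(2 - u) = -\operatorname{erfc}^{-1}(u)$.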
Note that $\sigma$ (noise) is multiplicative, while $\mu$ (signal) is additive.
As information increases, $z$ becomes smaller, and $\overline{F}^{-1}$ moves away in standard deviations. But this is nothing yet by comparison with fat tails.
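For concreteness, the Gaussian case can be sketched with the standard library; `NormalDist.inv_cdf` plays the role of $F^{-1}$ here, and the window size $k = 100$ is an arbitrary illustrative choice:

```python
from statistics import NormalDist

k = 100                 # fixed "window" size (illustrative)
gauss = NormalDist(0.0, 1.0)   # mu = 0, sigma = 1

for m in (1_000, 10_000, 100_000, 1_000_000):
    z = k / m                       # threshold probability shrinks with m
    cutoff = gauss.inv_cdf(1 - z)   # survival inverse: F̄⁻¹(z) = F⁻¹(1 − z)
    print(f"m = {m:>9,}  z = {z:g}  cutoff = {cutoff:.3f} standard deviations")
```

The cutoff creeps up slowly, on the order of $\sqrt{\log(1/z)}$, which is why the Gaussian case is mild compared with what follows.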
Figure 1: Gaussian, $\sigma = \{1, 2, 3, 4\}$. Plot of the quantile $F^{\leftarrow}$ against $z$.
Fat Tailed Noise
Now we take a Student T Distribution as a substitute for the Gaussian:
$$f(x) \equiv \frac{\left( \dfrac{\alpha}{\alpha + \frac{(x-\mu)^2}{\sigma^2}} \right)^{\frac{\alpha+1}{2}}}{\sqrt{\alpha}\,\sigma\,B\!\left( \frac{\alpha}{2}, \frac{1}{2} \right)} \qquad (2)$$
from which we can get the inverse survival function:
$$g^{-1}_{\sigma,\mu}(z) = \mu + \sqrt{\alpha}\,\sigma\,\operatorname{sgn}(1 - 2z)\,\sqrt{\frac{1}{I^{-1}_{\left(1,\,(2z-1)\operatorname{sgn}(1-2z)\right)}\left( \frac{\alpha}{2}, \frac{1}{2} \right)} - 1} \qquad (3)$$
where $I$ is the generalized regularized incomplete Beta function $I_{(z_0,z_1)}(a,b) = \frac{B_{(z_0,z_1)}(a,b)}{B(a,b)}$, and $B_z(a,b)$ the incomplete Beta function $B_z(a,b) \equiv \int_0^z t^{a-1}(1-t)^{b-1}\,dt$. $B(a,b)$ is the Euler Beta function $B(a,b) \equiv \Gamma(a)\Gamma(b)/\Gamma(a+b) \equiv \int_0^1 t^{a-1}(1-t)^{b-1}\,dt$.
Figure 2: Power Law, $\sigma = \{1, 2, 3, 4\}$. Plot of the inverse survival function $g^{\leftarrow}$ against $z$.
As we can see in Figure 2, the noise, and the noise only, explodes in the tails.
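The contrast with the Gaussian can be made concrete in a few lines. For $\alpha = 2$, $\mu = 0$, $\sigma = 1$, the Student T inverse survival function reduces to the closed form $g^{-1}(z) = (2p-1)/\sqrt{2p(1-p)}$ with $p = 1 - z$; this special case is chosen here only because it avoids inverting the incomplete Beta function, and it is enough to exhibit the explosion:

```python
from math import sqrt
from statistics import NormalDist

def t2_isf(z):
    """Inverse survival function of a standard Student T with alpha = 2."""
    p = 1.0 - z
    return (2 * p - 1) / sqrt(2 * p * (1 - p))

gauss = NormalDist()  # mu = 0, sigma = 1

# same tail probability z, wildly different cutoffs:
for z in (1e-2, 1e-4, 1e-6):
    print(f"z = {z:g}  Gaussian = {gauss.inv_cdf(1 - z):6.2f}  "
          f"Student T (alpha = 2) = {t2_isf(z):8.1f}")
```

The Gaussian cutoff grows roughly like $\sqrt{\log(1/z)}$, while the $\alpha = 2$ cutoff grows like $z^{-1/2}$: shrinking the quantile $z = k/m$ by increasing $m$ barely moves the Gaussian threshold, but sends the fat-tailed one off to infinity.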
Part 2 of the discussion to come soon.