noise

GEOPHYSICS, VOL. 63, NO. 4 (JULY-AUGUST 1998); P. 11221124, 2 FIGS.

What is noise?

John A. Scales and Roel Sniederz

Lest men suspect your tale untrue,Keep probability in view

John Gay

The concept of noise plays a crucial role in the statisti-cal analysis of data. As an example of a noisy record considerFigure 1 that shows the ground motion of the seismologicalstation NE51 in St. Petersburg after an earthquake in Egypt.(In earthquake seismology, periods may be orders of magni-tude larger than in exploration seismology, but the principlesare the same.) This time series shows no distinct arrivals orother apparent signatures of an organized nature. Given theproximity of the recording station to a major population cen-ter and to the coast, such a noisy record does not seem to bevery surprising.

But what is noise exactly? In the context of seismic prospect-ing, Dobrin and Savit (1988) define noise as spurious seismicsignals from ground motion not associated with reflections.They have in mind such things as surface waves, near-surfacereverberations and so on; coherent but uninteresting signalin other words. Fair enough. One might dispute the use ofthe term noise here, but these authors are certainly withintheir rights to identify certain signal as being uninteresting.But they go on to speak of incoherent noise, sometimes re-ferred to as random noise ... usually associated with scatteringfrom near-surface irregularities. (Emphasis in the original.)By identifying random noise with incoherency they have sailedinto rough waters. For although the signal associated with scat-tering from near-surface irregularities may well be incoherent(though that is debatable), it is clearly reproducible, so does itmake sense to call it random? And further, there is no law thatsays that random processes must be uncorrelated. (Just take anuncorrelated white process and apply a smoothing operatorto it.)

It turns out to be extraordinarily difficult to give a precisemathematical definition of randomness, so we wont try. (Abrief perusal of randomness in Volume 2 of Knuths great TheArt of Computer Programming is edifying and frustrating inequal measures.) In any case, it is more satisfying undoubtedlyto think in terms of observations of physical experiments. Hereis Parzens (1960) definition, which is as good as any:

Dept. of Geophysics and Center for Wave Phenomena, Colorado School of Mines, Golden, CO 80401. E-mail: [email protected]. of Geophysics, Utrecht University, P.O. Box 80.021, 3508 TA Utrecht, The Netherlands. E-mail: [email protected] 1998 Society of Exploration Geophysicists. All rights reserved.

A random (or chance) phenomenon is an empiricalphenomenon characterized by the property that itsobservation under a given set of circumstances doesnot always lead to the same observed outcomes (sothat there is no deterministic regularity) but ratherto different outcomes in such a way that there isstatistical regularity. By this is meant that numbersexist between 0 and 1 that represent the relative fre-quency with which the different possible outcomesmay be observed in a series of observations of in-dependent occurrences of the phenomenon. ... Arandom event is one whose relative frequency of oc-currence, in a very long sequence of observations ofrandomly selected situations in which the event mayoccur, approaches a stable limit value as the num-ber of observations is increased to infinity; the limitvalue of the relative frequency is called the pro-bability of the random event.

It is precisely this lack of deterministic reproducibility that al-lows us to reduce random noise by averaging over many repeti-tions of the experiment. Using this definition, the incoherentnoise of Dobrin and Savit (1988) is not random.1

But why should we care about the definition of noise? Asgeophysicists, the data at our disposal will always contain somefeatures that we will not bother to explain. If we acceptedour data as being absolutely precise and reproducible, then nomodel whose response disagreed with the observations evento the slightest degree could be correct. But we dont believethat our data are exact and exactly reproducible. And further,because we cannot calculate the exact response of our Earthmodels (because we cannot afford to put all the physics onthe computer) and because we have only approximate mod-els anyway (we cannot use an infinite number of parameters),there are likely to be deterministic aspects of the data that we

1The term coherency spectrum was coined by Wiener to denote theabsolute value of the cross covariance of two signals divided by thesquare root of the product of the respective autocovariance functions(cf. Priestley, 1981, 661). If two stationary processes are uncorrelated,for example, if they are independent, then the coherency spectrum iszero at all frequencies.

1122

What Is Noise? 1123

cannot or do not want to explain. Keep in mind, however, thatwith enough degrees of freedom one can fit any data, even if itsnot worth fitting. And the resulting model might be excessivelycomplicated or physically unreasonable.

In fact, in many situations noise is highly reproducible be-tween different experiments and corresponds therefore to adeterministic process. For example, let us return to the seismo-gram of Figure 1. In Figure 2 the same seismogram is shown(on the same scale) but now the signal before the first arrivingP-wave around 400 s is shown as well. The signal before theP-wave consists purely of ambient noise. It can be seen thatthis noise level is negligible compared to the later parts of thesignal. This means that the signal shown in Figure 1 is Earthresponse that corresponds to a multitude of different arrivalsrather than random noise. Some of these arrivals can be ex-plained by a simple 1-D earth model, but as shown by Neeleand Snieder (1991), this part of the signal also contains exotic

FIG. 1. Vertical ground motion recorded by the NARS sta-tion NE51 in St. Petersburg after an earthquake in Egypt (22November 1995). The time series is low-pass filtered with acorner frequency of 0.1 s.

FIG. 2. Same series as in Figure 1 but for a larger time interval.

arrivals such as body waves that have been converted to sur-face waves. It may be that one chooses not to explain this partof the signal, discarding it as noise.

In doing so, one must have some way of separating those datawe want to explain from those we do not. So a more generaldefinition of noise in the vein of Dobrin and Savit would be:noise is that part of the data that we choose not to explain.2

Now, this noise may or may not be deterministic. Clearly thereis unwanted deterministic data (neglected physics for instance,such as ground rollalthough we personally find ground rollrather interesting, e.g., Gabriels et al. (1987)).

It is not clear to what extent deterministic signals such asunmodeled Earth response can be treated as noise. Howeversome features in geophysical data can usefully be modeled byrandom processes. By this we mean the following. Suppose weisolate some part of the data that we wish to model stochasti-cally. And suppose further that we can construct a probabilitylaw with samples that are statistically indistinguishable fromthe noise we are modeling. Then we can usefully call this signalrandom.

For instance, suppose we use a random number generator ona computer to approximate samples of a normal distribution.The algorithms used by such generators are entirely determin-istic, the output being more properly described as pseudoran-dom numbers. [See Knuth (1981) for details.] Let us supposethat this random number generator is very good and its samplespass any test of Gaussianity that is known. Then, even thoughthese numbers are clearly deterministic, they can reasonablybe modeled as being Gaussian random numbers. Ultimatelyit doesnt really matter whether nature admits truly randomprocesses or not, unless youre doing quantum mechanics.

As a more concrete example, suppose we have a well logthat we have modeled as a Markovian process. In other words,we have estimated the n-dimensional joint distribution func-tion of a Markovian process, one realization of which we taketo be the well log. Operationally, we could do this by makinghistograms to approximate the 1- and 2-D marginals, which aresufficient to describe a Markovian process. Then we generatepseudorandom samples of this model Markovian process. Thisworks exactly like the pseudorandom number generator de-scribed above, but the samples come out in accordance with thisMarkovian distribution that we have estimated. We pass theoriginal well log and one of the pseudorandomly simulated logsto a fancy statistical hypothesis test and it says that at the 99%confidence level, for instance, the two are drawn from the samedistribution. Then, whether or not we believe the real log is a re-alization of a random process, we can usefully model it as such.

So what does all this have to do with geophysics? There arethree main implications:

1) Stacking of data.The notion that averaging over re-peated realizations of an experiment (stacking) reducesnoise (compared to signal) presupposes that the noise inthe different experiments is uncorrelated because onlythen do we get the desired noise-suppression. The crite-rion here is that the correlation in the noise is zero fordifferent experiments.

2We are purposely using the term explain rather than fit sincethe latter seems to be burdened with certain psychological baggage.But note that fitting is nothing more than a quantitative attempt atexplanation.

1124 Scales and Snieder

2) Prescription of a priori errors in Bayesian inversion.Insuch an inversion we need to prescribe the joint distri-bution function of the data errors (e.g., the data covari-ance matrix and mean if the distribution is assumed to beGaussian). It is fine to include signal-generated noisein this (although a purist might argue against this). How-ever, this type of noise will in general be highly correlated(between different samples, between different shots, andbetween different receivers). The correlations are cru-cial here, but in practice they may be difficult to quantify(Gouveia and Scales, 1998).

3) Making the decision how well to fit the data.In a least-squares fitting procedure, one must use chi-square orsome other measure of the misfit to determine how wellthe model explains the data. Here, one needs to knowwhich parts of the data are real (or interesting) and whichpart should be considered noise. (The latter should notbe fitted.) This is not only an issue of noise levels or corre-lations (e.g., knowing the mean and variance of Gaussiannoise), but in some applications, it is important to identifydata and noise in a more subtle way. For example, groundroll may be considered noise that should not be fitted byweird reflectors.

This means that the issue what is noise is of more than aca-demic interest. If we define noise as being that data we choosenot to fit, then we must have a model that explains the rest ofthe data. If not, this could be a sign that the noise is carryingimportant information. The association of noise with nonde-terministic processes may be misleading since the concept ofnoise is also used as the garbage-can of unexplained determin-istic phenomena. A treatment of this type of noise on purelystatistical grounds may lead to conceptual as well as practicalproblems.

REFERENCES

Dobrin, M., and Savit, C. H., 1988, Introduction to geophysical pros-pecting: Mc-Graw Hill Book Co.

Gabriels, P., Snieder, R., and Nolet, G., 1987, In situ measurements ofshear-wave velocity in sediments with higher-mode Rayleigh waves:Geophys. Prosp., 35, 187196.

Gouveia, W., and Scales, J. A., 1998, Bayesian seismic waveform in-version: Parameter estimation and uncertainty analysis: J. Geophys.Res., 103, 27592779.

Knuth, D., 1981, The art of computer programming, Vol II: AddisonWesley.

Neele, F., and Snieder, R., 1991, Are long-period body wave codacaused by lateral heterogeneity?: Geophys. J. Int., 107, 131153.

Parzen, E., 1960, Modern probability theory and its applications: JohnWiley & Sons, Inc.

Priestley, M. B., 1984, Spectral analysis of time series: Academic Press,Inc.

noise

Documents