
Government of India & Government of The Netherlands

DHV CONSULTANTS & DELFT HYDRAULICS with HALCROW, TAHAL, CES, ORG & JPS

VOLUME 8
DATA PROCESSING AND ANALYSIS

Reference Manual


Table of Contents

1 PRODUCTION OF CONTOUR MAPS
1.1 INTRODUCTION
1.1.1 MANUAL AND AUTOMATIC CONTOURING
1.1.2 STRATEGY ADOPTED IN THE DEDICATED SOFTWARE
1.1.3 CLASSIFICATION OF ALGORITHMS
1.1.4 ALGORITHMS INCLUDED
1.2 TREND SURFACE ANALYSIS
1.2.1 BASIC FEATURES
1.2.2 INITIAL CHOICE OF THE POLYNOMIAL
1.2.3 ESTIMATION OF THE COEFFICIENTS
1.2.4 OPTIMAL DEGREE
1.2.5 OPTIMAL FORM
1.2.6 FINAL ACCEPTABILITY
1.2.7 ANALYSIS OF RESIDUES
1.2.8 MATHEMATICAL MANIPULATIONS
1.2.9 SUITABILITY CRITERIA
1.3 KRIGING
1.3.1 BASIC FEATURES
1.3.2 THE VARIOGRAM
1.3.3 GENERATION OF DISCRETE VARIOGRAM
1.3.4 MODELLING OF THE VARIOGRAM
1.3.5 INTERPOLATION BY KRIGING
1.3.6 MATHEMATICAL MANIPULATIONS
1.3.7 SUITABILITY CRITERIA
1.4 UNIVERSAL KRIGING
1.4.1 COMPUTATIONAL DETAILS
1.5 SPLINE FUNCTIONS
1.5.1 BASIC FEATURES
1.5.2 COMPUTATIONAL DETAILS
1.5.3 MATHEMATICAL MANIPULATIONS
1.5.4 SUITABILITY CRITERIA
1.6 SUGGESTED READING

2 TIME SERIES ANALYSIS
2.1 INTRODUCTION
2.2 AUTO/CROSS CORRELATION ANALYSIS
2.2.1 CORRELATION
2.2.2 AUTO/CROSS CORRELATION
2.2.3 CORRELOGRAM
2.3 STATIONARITY ANALYSIS
2.3.1 STATIONARY TIME SERIES
2.3.2 TESTS OF STATIONARITY
2.3.3 LEVELLING OF SERIES
2.4 SPECTRAL ANALYSIS
2.4.1 THE HARMONICS
2.4.2 VARIANCE OR POWER OF A HARMONIC
2.4.3 PERIODOGRAM
2.4.4 CONTINUOUS POWER SPECTRUM
2.4.5 FILTERS
2.5 CO-KRIGING
2.6 SUGGESTED READING


1 PRODUCTION OF CONTOUR MAPS

1.1 INTRODUCTION

Contours of groundwater level (that is, water table and piezometric head) and of other related spatially varying quantities like quality parameters, aquifer characteristics and rainfall are required for a variety of computations aimed at the quantitative estimation of groundwater resources and groundwater quality (contaminant transport). These contours are produced by analysing the following data:

• Coordinates of the gauge points, that is, observation wells / piezometers / pumping test sites / raingauges etc. The coordinates are henceforth termed as spatial data.

• Recorded/estimated information (water table elevations, piezometric elevations, rainfall, aquifer parameters etc.) at the gauge points. This information is henceforth termed as attribute data.

The attribute maintains the same numerical value along a contour and hence a contour is also termed as an iso-line. The contours are usually processed to perform a variety of tasks like gridding/interpolation, integration (spatial averaging, storage estimation) and differentiation (velocity calculation).

1.1.1 MANUAL AND AUTOMATIC CONTOURING

The production of contours and the processing can be accomplished by a manual procedure or by a computer-assisted automatic procedure. The latter has to be based upon a pre-selected algorithm. The relative merits/demerits of the two procedures are as follows:

Relative merits/demerits

The manual procedure essentially involves drawing of contours by visual inspection of the plotted attribute data. The main advantage of this procedure is that it permits assimilation of many tangible as well as intangible hydrogeological features into the contours. This advantage, however, accrues only if the contouring is done at a large enough scale by an experienced and intuitive hydrogeologist.

The main handicap of the manual procedure is its slowness, especially in respect of processing like gridding, integration and differentiation. Such processing would have to be done after the contours are plotted. On the other hand, if the contouring is done by a computer procedure, such processing can be accomplished simultaneously without any significant increase in the computer time.

The other problem of the manual procedure is its inherent subjectivity and lack of consistency. The computer procedure overcomes this problem in the sense that different persons would produce the same set of contours as long as they use the same algorithm. This permits a consistent revision of the contours upon availability of additional data. However, though the human subjectivity is removed, the subjectivity in the choice of the algorithm is added. This may be no less severe than the human subjectivity.

It follows from the preceding discussion that the main incentives for adopting an automatic contouring procedure are as follows:

• It is quick, especially if the contour plotting is to be followed by gridding, etc.

• The result can easily be reproduced

• It permits a consistent revision of the contours upon the availability of additional data.


The main disincentives are as follows:

• Automatically produced contours may be inconsistent with hydrogeological features, especially at aquifer boundaries, surface water bodies and sharp elevation changes.

• There is a considerable subjectivity in the choice of the algorithm.

1.1.2 STRATEGY ADOPTED IN THE DEDICATED SOFTWARE

An optimal strategy has been adopted in the dedicated software, namely, the manual and the automatic procedures are combined, making use of their positive traits.

Right at the beginning, the user selects an appropriate algorithm of automatic contouring from an array of built-in algorithms. The necessary guidelines are given in the subsequent sections.

Thereupon, the computer produces the contours and displays them on the VDU (the computer screen). The contours are studied on the VDU by the hydrogeologist. If necessary, a hard copy of the contours may be taken. He decides whether the contours need any modification. If found necessary, he modifies the contours. The modifications are digitised and stored in the computer memory for subsequent processing. The maximum benefit of such human intervention occurs while incorporating existing hydrogeological features. Thus, the contours can be twisted and fine-tuned to make them compatible with a river boundary or a dyke.

The above strategy shall be implemented in the following steps:

• The user specifies the grid spacing and the contour interval.

• The user selects an algorithm of interpolation from an array of built-in algorithms.

• A user-specified algorithm analyzes the available spatial and attribute data of the gauge points.

• The analysis provides the interpolated spatial data of the attribute at the nodal points of the specified grid. The spatial data at the nodal points are termed as the Raster data.

• The Raster data are processed to generate and store the spatial data of the desired contours. The spatial data of a contour shall essentially comprise coordinates of points lying on it. Such data are termed as Vector data.

• The user, using his professional judgement and guided by relevant information (thematic maps, hydrogeological cross sections etc.), may edit some or all the contours manually.

• Vector data, and hence the Raster data, corresponding to the edited contours are generated and stored together with the data corresponding to the unedited contours (if any).

1.1.3 CLASSIFICATION OF ALGORITHMS

The algorithms of the automatic contouring can be classified as per the following criteria:

Basic structure

The algorithms can involve two kinds of computations. The first kind essentially involves computing a weighted mean of the observed data. There are different criteria for estimating the weights. The second kind involves fitting a surface to the observed data. Mathematically, the surface can be viewed as a functional relation between the attribute and the spatial coordinates.


Reproduction of the observed data

Automatic contouring shall essentially involve interpolation at a large number of points. Many of these points may be overlapping (or almost overlapping) with the locations of the gauge points. There are two possibilities in respect of the interpolated values at such points. The first possibility is that these interpolated values are exactly (or almost) equal to the respective attribute data. An automatic contouring algorithm that permits such exact reproduction of the observed data is termed as an exact algorithm. The second possibility is that the interpolated values may deviate (widely at times) from the respective data. Such algorithms are known as smoothening algorithms. Since the smoothening is mostly accomplished by minimising the sum of squares of the deviations, these may also be termed as least squares algorithms.

It may be noted here that the nomenclature of exact algorithm is a misnomer since an error-free interpolation is in no way assured.

A least squares algorithm smoothens the data. Thus, one may prefer it if the data are known to comprise non-systematic errors (that is, random errors with zero mean). The smoothening in that case may attenuate the data errors.

Deterministic and Stochastic algorithms

There are two ways of looking at the spatial continuity. It may be viewed as a process that can be predicted exactly (for example, by solution of a differential equation). Such a process is termed as a deterministic process. A variable resulting from a deterministic process is known as a deterministic variable. The other viewpoint is that the process is too complex to permit an exact prediction. Such a process is termed as a stochastic process and the resultant variable as a stochastic variable.

Typically, a stochastic variable may be predicted as a mean and a band of possible variation around it. The band width is dependent upon the desired level of confidence. The confidence level has to be less than 100%; the band width corresponding to a complete assurance, i.e., a 100% confidence level, would be infinite. Typical selected confidence levels are 95% and 99%.

The interpolation algorithm may treat the attribute either as a deterministic or as a stochastic variable. The advantage of a stochastic approach is that it permits error estimation of the interpolated attribute. This can assist in evaluation/improvement of the monitoring network.

Local or regional trends

Some algorithms are good at capturing the regional trend whereas others may hold only for local trends.

Neighbourhood algorithms

An algorithm that uses every data point in every calculation can be termed as a unique neighbourhood algorithm. On the other hand, a moving neighbourhood scheme uses only a part of the data for each interpolation. In practice the data points are chosen in the vicinity of the interpolation point. The moving neighbourhood could be selected arbitrarily or rationally.

Theoretical rigor

Some algorithms have a theoretical justification whereas the others are purely empirical. Other things being equal, a theoretically rigorous algorithm is obviously preferable.


1.1.4 ALGORITHMS INCLUDED

The algorithms included in the text are: Trend surface analysis (polynomial approximation), Kriging (including Universal Kriging) and Spline functions.

There are many other methods of interpolation reported in the literature. Some of these are: Nearest neighbourhood method, Rolling mean method, and Weighted moving average. These methods have very little or no theoretical justification and are also devoid of any conceptual elegance. Hence, they have not been included in the dedicated software.

1.2 TREND SURFACE ANALYSIS

1.2.1 BASIC FEATURES

The method involves approximating the functional relation between the attribute (h) and the space coordinates (x,y) by a polynomial of the space coordinates:

h(x,y) ≅ Σj aj fj(x,y)

where aj are the coefficients of the polynomial to be determined and fj are the prescribed terms, i.e., basis functions of the spatial coordinates.

It is a surface fitting, least squares, stochastic, unique neighbourhood technique with some theoretical justification. It is ideally suited for producing small-scale contour maps of the regional trend. It can also be used for isolating the local trends. It is unsuitable for a data set devoid of any regional trend.
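As an illustration of this least squares fitting, a minimal sketch in Python is given below. It is not the dedicated software's implementation: the helper names (poly_basis, fit_trend_surface) are introduced here only for illustration, a full polynomial of the chosen degree is assumed, and the coefficients are estimated with an ordinary least squares solve.

```python
import numpy as np

def poly_basis(x, y, degree=2):
    """Basis terms f_j(x, y) of a full polynomial of the given degree."""
    terms = [x**i * y**j for i in range(degree + 1)
             for j in range(degree + 1 - i)]
    return np.column_stack(terms)

def fit_trend_surface(x, y, h, degree=2):
    """Least squares estimates of the coefficients a_j plus the fit statistics."""
    F = poly_basis(x, y, degree)                 # design matrix of basis functions
    a, *_ = np.linalg.lstsq(F, h, rcond=None)    # coefficients minimising the residual variance
    residues = h - F @ a                         # observed minus computed attribute
    dof = len(h) - F.shape[1]                    # degrees of freedom
    mrv = np.sum(residues**2)                    # minimised residual variance (sum of squares)
    se = np.sqrt(mrv / dof)                      # standard error of the regression
    rsq = 1.0 - mrv / np.sum((h - h.mean())**2)  # goodness of fit (RSQ)
    return a, residues, se, rsq

# Synthetic example: water levels at 30 scattered observation wells
rng = np.random.default_rng(0)
x, y = rng.uniform(0, 10, 30), rng.uniform(0, 10, 30)
h = 100.0 - 0.5 * x + 0.2 * y + rng.normal(0, 0.1, 30)
a, residues, se, rsq = fit_trend_surface(x, y, h, degree=1)
```

Re-running fit_trend_surface with full polynomials of increasing degree and comparing the returned standard errors reproduces the optimal-degree selection described in Section 1.2.4, and the returned RSQ is the acceptability measure of Section 1.2.6.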

1.2.2 INITIAL CHOICE OF THE POLYNOMIAL

A tentative choice of the approximating polynomial is made in accordance with the following criteria:

• The degree of the polynomial may be decided by examining the spatial gradients. In case the gradients are uniform, a first-degree polynomial, comprising three terms, may be chosen. On the other hand, spatially varying gradients may indicate the necessity of adopting a second-degree polynomial comprising six terms.

• The number of the coefficients to be determined should be far less than the number of data points. The difference between the number of data points and the number of unknown coefficients is known as the degree of freedom.

Initially a full polynomial of the chosen degree may be adopted.

1.2.3 ESTIMATION OF THE COEFFICIENTS

The coefficients of the polynomial are estimated by regression analysis. The analysis involves arriving at such estimates of the coefficients which minimise the residual variance, that is, the sum of the squares of the differences between the observed and the computed values of the attribute at the gauging points.


1.2.4 OPTIMAL DEGREE

The optimal degree of the polynomial is decided by the criterion of minimum standard error. The standard error of the regression is defined as the square root of the minimised residual variance divided by the degree of freedom. Thus, regression is carried out sequentially with full polynomials of increasing degree till the standard error reaches a minimum and thereafter starts increasing.

1.2.5 OPTIMAL FORM

The optimal form of the polynomial of the chosen degree is arrived at by checking the significance of the contribution of each term by the t test. The terms not found to be contributing significantly are dropped and the coefficients are recalculated.

The performance of the thus truncated polynomial is compared with that of the full polynomial by the F test. The performance, or rather lack of it, is quantified by the minimised residual variance (MRV). If the MRV of the truncated polynomial is found to be statistically larger than the MRV of the full polynomial, the full polynomial is retained. Otherwise the truncated polynomial is adopted.

1.2.6 FINAL ACCEPTABILITY

The acceptability of the polynomial is checked by computing the goodness of fit (RSQ), given by the following expression:

RSQ = 1 - MRV/VAR

where VAR is the variance of the attribute data. The variance may be viewed as a measure of scatter of the data around their mean. Thus, the goodness of fit represents the fraction of the scatter explained by the polynomial. If RSQ is close to 1, the polynomial fit may be considered acceptable. If it is close to zero (or is negative), it may be inferred that the polynomial fit is just about as good as (or worse than) averaging the attribute data. Thus, the polynomial fit may be rejected.

1.2.7 ANALYSIS OF RESIDUES

A residue at a gauging station is defined as the difference between its attribute data and the polynomial value corresponding to its spatial coordinates. Assuming that the chosen polynomial has explained adequately the regional trend and has smoothened the data noise, a residue can be visualized as a measure of gauging error and/or a local trend at the respective station.

Though it may not be possible to split up the residues among these two components, it may be possible to pick out the predominant component of the residues. This is done by computing the standard residues. A standard residue is the residue divided by the standard error. Further, autocorrelations are computed between the residues at different distances. Having done that, the following two possibilities are examined:

Identification of regional trend

If the standard residues lie in the range (-2 to +2) and the computed autocorrelations are insignificant, it may be inferred that the data are devoid of any local trends and the residues may indeed represent the measurement errors in the data. The contours may thus be treated as a representation of the regional trend of the attribute.


Identification of local trends

If some standard residues lie outside the range (-2 to +2) and the residues are autocorrelated, the presence of local trends may be inferred at 95% confidence level. Local trends imply that the true surface deviates from the polynomial surface consistently in regions larger than the sampling intervals. This invalidates the statistical tests and hence the evolved polynomial.

Therefore, it is necessary to evolve a new polynomial, which generally follows the regional trend only. This is done by revising the fit by adopting a first or at the most second degree full polynomial. The choice between the first and second degree can be made by visually examining the regional trend of the data. Fitting of the polynomial of the chosen degree shall involve only the estimation of coefficients. The statistical tests, which are invalidated due to violation of the basic assumptions, are unnecessary.

The revised polynomial and hence the resulting contours may represent mostly the regional trend. The residues shall represent the local trends and possibly some measurement errors. Assuming the measurement errors to be small enough, the residues may be treated as a representation of the local trends. Local trends may be contoured using Kriging.
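The residue screening described in this sub-section can be outlined as below. This is an illustrative sketch only: it assumes the residues and standard error returned by the fit_trend_surface sketch above, and it approximates the distance-wise autocorrelation of the residues by a single distance-binned correlation (the helper name screen_residues is hypothetical).

```python
import numpy as np

def screen_residues(x, y, residues, se, lag=200.0, tol=0.1):
    """Standard residues and a simple residue correlation at one separation distance."""
    std_res = residues / se                     # standard residues
    outliers = np.abs(std_res) > 2.0            # outside the (-2, +2) range

    # pairs of stations spaced roughly 'lag' apart (within the relative tolerance)
    dist = np.hypot(x[:, None] - x[None, :], y[:, None] - y[None, :])
    i, j = np.where((dist > lag * (1 - tol)) & (dist < lag * (1 + tol)))
    i, j = i[i < j], j[i < j]                   # count each pair once

    # correlation between the residues at the paired stations
    r = np.corrcoef(residues[i], residues[j])[0, 1] if len(i) > 1 else np.nan
    return std_res, outliers, r
```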

1.2.8 MATHEMATICAL MANIPULATIONS

The evolved polynomial relation between the attribute and the spatial coordinates can be employed to perform the following mathematical manipulations.

Interpolation

The attribute at any ungauged point can be estimated by merely substituting the coordinates of the point in the polynomial relation.

Gradients

The polynomial function can be differentiated analytically. Thus, the spatial gradients of the attribute at any point can be estimated by substituting its coordinates in the respective derivatives of the polynomial.

Spatial averaging

The spatial average value of the attribute in any given domain can be estimated by integration of the polynomial over the domain and dividing the integral by the area of the domain.

1.2.9 SUITABILITY CRITERIA

Merits

• Polynomials are well-behaved mathematical functions. With adequate degree, they can theoretically approximate any continuous function. Thus, there is a theoretical rationale in the use of polynomial functions, provided the data display a significant regional trend with little or no local trends.

• A least squares polynomial, apart from providing the spatial variation of the attribute, also smoothens the data. The data smoothening can be viewed as possible attenuation (not elimination) of the data errors, provided there are no significant local trends and the errors are normally distributed with zero mean (that is, there are no systematic errors).


• Further, polynomials can also be used for checking whether a given data set comprises any local trends and subsequently also to quantify the local trends, if present.

Thus, polynomials are suitable for contouring the regional trends on small-scale maps.

Demerits

The polynomials are of little use if the data display only local trends with no or insignificant regional trend.

1.3 KRIGING

1.3.1 BASIC FEATURES

Kriging is a weighted mean, stochastic, exact, moving neighbourhood interpolation technique based upon the theory of regionalised variables. The spatial spread of the moving neighbourhood around the interpolation point may be decided rationally. The technique has theoretical rigour. It is mathematically elegant since, under certain assumptions, it assures the best and unbiased interpolation. However, the underlying assumption, that is, the absence of a regional trend among the attribute data, is very rarely satisfied. Universal Kriging is able to avoid this assumption. The main utility of Kriging thus lies in its ability to produce contours of local trends.

Regionalised variable

Kriging treats the attribute as a regionalised variable. A regionalised variable is a random function with a spatial continuity. The spatial continuity implies that the recorded values of the variable at close locations will be more similar than the values recorded at widely spaced locations. However, the continuity is considered to be too complex to be described by a deterministic function. The spatial continuity is expressed as a variogram. The same variogram is assumed to hold over the entire domain of the variable.

Requirement of stationarity

This uniqueness of the variogram is valid provided the variable is stationary over the entire domain. The stationarity essentially implies that there is no well-defined trend in the spatial variation of the variable. Thus, the variable changes only locally without any regional trend. The assumption of stationarity has the following two components:

• The variable has the same mean value over the entire domain; e.g., it undulates locally around the same mean everywhere. Thus, if the domain is divided into small pockets, then the mean values of the variable in each pocket tend to be the same. This is known as stationarity of the first order. Though there is an unavoidable subjectivity in the sizing of the pockets, the check is easy to make and may be robust enough.

• The second component relates to the spatial covariance of the variable. The spatial covariance is essentially the covariance of the variable values recorded at a certain distance apart, that is, the auto covariance with a lag equal to the distance. The assumption states that the spatial covariance corresponding to any distance is independent of the location. Thus, it is assumed to be unique over the entire domain of the variable. This implies that the local undulations follow a similar spatial trend all over the domain. Thus, if the domain is divided into a number of small pockets, the auto covariance corresponding to any distance tends to be the same in each pocket. This is known as stationarity of second order.


1.3.2 THE VARIOGRAM

Figure 1.1: Variogram

The variogram (see Figure 1.1) is essentially a plot of semi variance (γ) against the distance (h). The semi variance is defined as half the mean of squares of the differences of the recorded values of the attribute at pairs of stations spaced at a given distance.

In accordance with the definition of the semi variance, its value at h equal to zero is half the variance of a data point from itself, which has to be zero. Thus, the variogram has to pass through the origin.

Further, the semi variance for a distance less than the minimum distance between the observation points is not defined. The minimum distance may be quite small in comparison with the total range of h, and the semi variance corresponding to this distance may not be close to zero. This leads to a nugget effect, that is, essentially a vertical rise of the curve at the origin.

Beyond the nugget, the semi variance rises gradually up to a limiting distance beyond which it becomes a constant. This distance is known as the range or span of the variable. The maximum semi variance reached at the range is approximately the variance of the data around their mean and is known as the sill.

The range can be viewed as an effective neighbourhood within which the continuity of the regionalised variable holds. It may be inferred that while Kriging, only data from the effective neighbourhood need to be considered, although there may be some exceptions.

1.3.3 GENERATION OF DISCRETE VARIOGRAM

Given the attribute and the spatial data, the variogram may be prepared in the following steps:

• Select the range of variation of the distance and the discrete distances in the selected range. The range could be from the minimum to the maximum distance between the recording points. Within this range it is desirable to have about ten discrete distances. This will permit an estimation of three to four parameters of a theoretical model of the variogram, with a sufficient degree of freedom. The choice of the discrete distances could be governed by the orders of the most frequently occurring distances.

• Start with the first chosen distance. Pick up the pairs of the recording points spaced at this distance. The semi variance can be estimated reliably only if a large number of such pairs are available uniformly all over the area. For data points that are irregularly distributed in the domain, it may not be possible to find a sufficient number of data points spaced exactly at the desired distance. Therefore, it is necessary to define a tolerance level within which a distance may be deemed to be equal to the desired distance. For example, consider one of the chosen discrete distances to be 200 m and the tolerance as 10 percent. Then all pairs of observation points spaced more than 180 m and less than 220 m could be deemed to be spaced at 200 m. Compute the sum of squares of the differences between the recorded values at the pairs of the recording points deemed to be spaced at the selected distance. Then divide this by twice the number of the pairs. This provides the semi variance corresponding to the distance. Repeat for every other distance. This provides the crucial semi variance (γ) versus distance (h) data (a computational sketch is given below).
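A minimal sketch of these steps is given below; the function name experimental_variogram and the use of a single relative tolerance for all lags are illustrative choices, not the dedicated software's implementation.

```python
import numpy as np

def experimental_variogram(x, y, z, lags, tol=0.1):
    """Discrete semi variance gamma(h) at the chosen lag distances."""
    dist = np.hypot(x[:, None] - x[None, :], y[:, None] - y[None, :])
    dz2 = (z[:, None] - z[None, :]) ** 2
    upper = np.triu(np.ones_like(dist, dtype=bool), k=1)   # count each pair once

    gamma, npairs = [], []
    for h in lags:
        pairs = upper & (dist > h * (1 - tol)) & (dist < h * (1 + tol))
        n = int(pairs.sum())
        # sum of squared differences divided by twice the number of pairs
        gamma.append(dz2[pairs].sum() / (2 * n) if n else np.nan)
        npairs.append(n)
    return np.array(gamma), np.array(npairs)
```

For instance, about ten lags spaced between the minimum and maximum station separations, as suggested above, could be passed as the lags argument.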

1.3.4 MODELLING OF THE VARIOGRAM

The computed values of the semi variance for various distances can theoretically provide the necessary variogram. However, in practice this may pose the following problems:

• The data may be too lumpy to be used as such and may require some smoothening.

• The variogram would be defined at some discrete points whereas during calculations the semi variance may be required at any distance.

• The range may not be directly visible in the plot.

• The discrete point data may not ensure positive definiteness of the resultant matrix, necessary for interpolation.

These problems are overcome by fitting a model to the computed γ data. For smoothening of the lumpy data, the number of data points should be much larger than the number of parameters of the model.

Fitting of model

Thus, the next step is to fit a model to the computed γ data. The model may be a single or multiple functional relations. It may incorporate the nugget and the range as parameters. There could be other parameters too. The available models are linear, De Wijsian, spherical or their combinations.

The model fitting essentially involves choosing an appropriate model and estimating its parameters. The model could be chosen from a set of suggested theoretical models. The criterion for the choice is the closest possible approximation or fit. The parameters are estimated by regression analysis, i.e., minimisation of the weighted squares of the deviations. The weights are held proportional to the corresponding number of pairs of the gauge points.
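A sketch of such a weighted fit, here for a spherical model with a nugget, follows. It is illustrative only: scipy.optimize.curve_fit is used with sigma chosen so that the weights are proportional to the number of pairs, and the inputs are assumed to be the gamma and npairs arrays returned by the previous sketch.

```python
import numpy as np
from scipy.optimize import curve_fit

def spherical(h, nugget, sill, rng):
    """Spherical variogram model with a nugget."""
    h = np.asarray(h, dtype=float)
    g = nugget + (sill - nugget) * (1.5 * h / rng - 0.5 * (h / rng) ** 3)
    return np.where(h < rng, g, sill)

def fit_variogram(lags, gamma, npairs):
    """Weighted least squares fit; weights proportional to the number of pairs."""
    lags = np.asarray(lags, dtype=float)
    ok = np.isfinite(gamma) & (npairs > 0)                  # drop empty lag bins
    lags, gamma, npairs = lags[ok], gamma[ok], npairs[ok]
    sigma = 1.0 / np.sqrt(npairs)                           # curve_fit weights ~ npairs
    p0 = [gamma.min(), gamma.max(), lags.max() / 2.0]       # initial nugget, sill, range
    params, _ = curve_fit(spherical, lags, gamma, p0=p0, sigma=sigma,
                          bounds=([0.0, 0.0, 1e-6], [np.inf, np.inf, np.inf]))
    dof = len(lags) - len(params)                           # should be six or more (see text)
    se = np.sqrt(np.sum(npairs * (gamma - spherical(lags, *params)) ** 2) / dof)  # weighted standard error
    return params, se
```

Competing models (linear, De Wijsian etc.) could be fitted in the same way and compared through the returned standard error.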

The twin tasks of identifying the optimal model and estimating its parameters are accomplished simultaneously. This involves carrying out a series of regression and residual analyses for the various models. Such analyses for each model provide estimates of the corresponding model parameters and the statistics of the fit. The latter essentially quantify the extent to which the model is capable of explaining the variation of the dependent variable (γ in the present case).

The relative performance of various models can be gauged by comparing their fit statistics. Since the models incorporate varying numbers of parameters, the appropriate fit statistic is the standard error, that is, the square root of the minimised residual variance per unit degree of freedom (refer Section 2.2.4). The residual variance is the sum of squares of the differences between the computed and the model-fitted semi variance. The degree of freedom is the difference between the number of data points (e.g., the number of the discrete distances) and the number of model parameters. The degree of freedom should be sufficiently large. Usually a degree of freedom equal to six or more is considered acceptable.

The model displaying the minimum standard error, along with its estimated parameters, is tentatively chosen for subsequent calculations. The corresponding goodness of fit is computed.


Sometimes the goodness of fit of even the best model (that is, the model displaying the minimum standard error) may be poor due to inconsistencies in the data of computed γ. The inconsistent data are termed as outliers. This requires computation of the standard residues at the data points and identification of the outliers. Statistically, an outlier at 95% confidence level may be defined as a data point having an absolute value of the standard residue of more than two. It is necessary to review the attribute data which led to such inconsistent (outlier) values of γ. If the data seem to be doubtful, the entire analysis should be performed again, ignoring such data.

The other possibility is that the fit statistics may be poor due to consistently poor model performance and, as such, there may not be any outlier. This means that none of the trial models is capable of simulating the trend of the computed γ values. This calls for two things. First, it may be ascertained whether the basic assumptions of Kriging are being satisfied. If the assumptions are indeed being satisfied (or are not being violated too severely), it is necessary to define a new problem-specific function, keeping in view the trend of the γ values. If the user is unable to suggest any new function, a general polynomial function may be selected.

1.3.5 INTERPOLATION BY KRIGING

Size of the neighbourhood

Kriging describes the regionalised variable at an interpolation point as a weighted mean of the known values in its neighbourhood. The neighbourhood may be a circle around the interpolation point. In case of severe anisotropy, it may be an ellipse. The size (say, the radius of the circle) of the neighbourhood is governed by the following considerations:

• Theoretically the size of the neighbourhood should be restricted to the range of the variogram. This shall assimilate only the local trends into the simulation.

• The data, within the selected neighbourhood, should not display any trend.

• Preferably, there should be at least eight to twelve data points within the selected neighbourhood.

• If the number of data points within the range is inadequate, the neighbourhood may be extended beyond it to include some more data points. This may, however, partly assimilate the regional trend in the interpolation.

System of equations

The sum of the weights is 1, and the weights are chosen in such a way that the estimation variance is minimised. These requirements lead to a system of (n+1) linear equations (n being the number of observation points within the selected neighbourhood). The equations comprise the n weights and a Lagrangian multiplier as unknowns. The semi variances corresponding to the various distances (between the observation and the interpolation points) appear as the coefficients. These have to be picked up from the variogram. The determinate system of equations is solved for the weights and the Lagrangian multiplier. Thus, the weighted mean, that is, the Kriged (interpolated) estimate of the variable, is arrived at.
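A compact sketch of setting up and solving this (n+1) by (n+1) system for one interpolation point is given below. It is an illustration of the ordinary Kriging equations, not the dedicated software: the whole data set is taken as the neighbourhood, gamma_model is any callable returning the modelled semi variance for a distance, and the estimation variance discussed under the merits in Section 1.3.7 is also returned.

```python
import numpy as np

def ordinary_kriging(xp, yp, x, y, z, gamma_model):
    """Ordinary Kriging estimate and estimation variance at the point (xp, yp)."""
    n = len(x)
    # semi variances between all pairs of observation points
    d = np.hypot(x[:, None] - x[None, :], y[:, None] - y[None, :])
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = gamma_model(d)
    np.fill_diagonal(A[:n, :n], 0.0)        # gamma(0) = 0 by definition
    A[n, n] = 0.0                           # row/column for the Lagrangian multiplier

    # semi variances between the observation points and the interpolation point
    b = np.ones(n + 1)
    b[:n] = gamma_model(np.hypot(x - xp, y - yp))

    sol = np.linalg.solve(A, b)             # the n weights plus the Lagrangian multiplier
    weights, mu = sol[:n], sol[n]
    estimate = weights @ z                  # weighted mean of the observed values
    variance = weights @ b[:n] + mu         # Kriging (estimation) variance
    return estimate, variance
```

For example, gamma_model could wrap the spherical model fitted in the previous sketch, gamma_model = lambda h: spherical(h, *params). Calling ordinary_kriging at every node of the grid yields the Raster data and, from the variances, contours of the standard error.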

1.3.6 MATHEMATICAL MANIPULATIONS

Kriging does not provide an explicit functional relation between the attribute and the spatial coordinates. Its primary capability is an implicit interpolation. This capability permits generation of the raster, i.e., gridded data. These discrete point data can nevertheless be analysed numerically to estimate derivatives and integrals, though with inevitable numerical errors.


Gradients

Spatial gradients of the attribute are estimated by numerical differentiation of the raster data. The calculations are performed in double precision to restrict the round-off errors.

Spatial averaging

The spatial average value of the attribute in any given domain can be estimated by numerical integration of the raster data over the domain and dividing the integral by the area of the domain.

1.3.7 SUITABILITY CRITERIA

Merits

Kriging, apart from yielding an interpolated value, also permits an estimation of its standard error, which in turn leads to an estimation of the confidence band around the interpolated value. With an assumption of a normal distribution of the interpolation error, the band at 95% level of confidence is the interpolated value ± twice the standard error.

Kriging at multiple points can provide the contours of the standard error. These contours permit an evaluation of the network of the observation points used for Kriging. Creation of additional data points in the regions displaying high standard errors may be considered, and vice versa.

Demerits

• Kriging analysis is applicable only if the attribute data are stationary, at least within the selected neighbourhood.

• Kriging describes the variable at an interpolation point as a weighted mean of the known values. The weights are usually positive and their sum is one. Therefore the interpolated value will usually lie in the range defined by the highest and the lowest recorded values used in the analysis. As such, Kriging is not a good extrapolator.

• The weights depend upon the coordinates (that is, the location) of the point at which the interpolation has to be carried out. Thus, each interpolation shall involve setting up and solving a system of equations, followed by computation of the weighted mean. Contouring essentially involves a large number of interpolations. Thus, the computational effort required for obtaining Kriging-assisted contours may be quite large.

1.4 UNIVERSAL KRIGING

In Universal Kriging the assumption of stationarity of the data is avoided. In its place, it is assumed that the mean, instead of being a constant, undergoes a regular or slow change over the domain. The slowly varying mean is known as the drift. Thus, the non-stationary data are assumed to comprise a drift and a residual that is stationary. The residuals are further assumed to be a regionalised variable.

Universal Kriging essentially splits up the raw data into the drift and the residual, and simultaneously performs Kriging on the residuals. The split is accomplished by an additional assumption that the drift (that is, the spatially varying mean) can be described as a polynomial of the spatial coordinates. The degree of the polynomial is assumed to be known a priori and is usually restricted to two.


1.4.1 COMPUTATIONAL DETAILS

Kriging of the residuals will, however, require the variogram of the residuals and not the variogram of the raw (non-stationary) data. Such a variogram cannot be prepared prior to the analysis since the residuals are not known. This calls for an iterative procedure. The procedure shall be as follows (a code outline is given after the list):

• Arrive at an approximate spatial variation of the drift by carrying out trend analysis of the raw data. Trend analysis essentially involves passing a smooth, that is, least squares, polynomial surface through the raw data.

• From the fitted polynomial estimate the drift at each gauge point.

• Estimate the residual at each gauge point by subtracting the estimated drift from the recorded data.

• Arrive at the variogram of the residuals.

• Perform the Universal Kriging analysis on the recorded data employing the evolved variogram of the residuals, and hence split up these data into drift and stationary residuals.

• Arrive at a modified variogram employing the data of residuals obtained in Step 5.

• Compare this variogram with the variogram used for performing the Universal Kriging analysis.

• If the two variograms compare reasonably well, the Universal Kriging just performed can be accepted. Otherwise, go back to Step 5 with the modified variogram.
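The iteration can be outlined as below. This is only a structural sketch under stated assumptions: it reuses the hypothetical helpers from the earlier sketches (poly_basis, fit_trend_surface, experimental_variogram, fit_variogram, spherical), it realises the drift/residual split of Step 5 by generalised least squares with the covariance implied by the fitted variogram, and it stops when successive variogram parameters agree. The dedicated software's actual procedure may differ.

```python
import numpy as np

def residual_variogram_iteration(x, y, z, lags, degree=1, max_iter=10, rtol=0.05):
    """Iterative estimation of the residual variogram for Universal Kriging (sketch)."""
    F = poly_basis(x, y, degree)            # polynomial drift basis of the coordinates
    d = np.hypot(x[:, None] - x[None, :], y[:, None] - y[None, :])

    # Steps 1-3: initial drift by ordinary least squares, and the residuals
    _, residues, _, _ = fit_trend_surface(x, y, z, degree)
    params = None
    for _ in range(max_iter):
        # Steps 4 and 6: variogram of the current residuals
        gamma, npairs = experimental_variogram(x, y, residues, lags)
        new_params, _ = fit_variogram(lags, gamma, npairs)

        # Steps 7-8: accept once successive variograms compare reasonably well
        if params is not None and np.allclose(new_params, params, rtol=rtol):
            break
        params = new_params

        # Step 5 (one possible realisation): re-split the data into drift and
        # residuals by generalised least squares using the fitted variogram
        nugget, sill, rng_ = params
        C = sill - spherical(d, nugget, sill, rng_) + 1e-10 * np.eye(len(z))
        Ci = np.linalg.inv(C)
        a = np.linalg.solve(F.T @ Ci @ F, F.T @ Ci @ z)
        residues = z - F @ a
    return params, residues
```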

1.5 SPLINE FUNCTIONS

1.5.1 BASIC FEATURES

Spline functions are piecewise continuous polynomials of the spatial coordinates satisfying the compatibility conditions at the interfaces between the adjacent polynomials. It is a surface fitting, exact, moving neighbourhood technique that is devoid of any theoretical rationale. It can produce contours of regional as well as local trends.

1.5.2 COMPUTATIONAL DETAILS

The entire domain is divided into a number of zones. An exact polynomial is fitted to the data points from each zone. The coefficients of the pre-stipulated polynomials are computed by satisfying the following requirements: the exact fit, gradients across a part or the entire domain boundary honouring the assigned gradients, and the compatibility among the adjacent polynomials. The compatibility includes the requirements that the attribute and its gradients must be uniquely defined along the boundary between two zones. In other words, the attribute and the gradients as computed from the two corresponding fitted polynomials must be identical. This approach is essentially a numerical equivalent of french curving.
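The zone-wise spline formulation with assigned boundary gradients is not reproduced here, but an analogous exact, piecewise-polynomial surface fit is available in scipy and is sketched below for comparison; the CloughTocher interpolant is C1-continuous over a triangulation of the data points and, unlike the formulation above, cannot honour assigned boundary gradients.

```python
import numpy as np
from scipy.interpolate import CloughTocher2DInterpolator

# Scattered observations (synthetic example)
rng = np.random.default_rng(1)
x, y = rng.uniform(0, 10, 40), rng.uniform(0, 10, 40)
h = 100.0 - 0.4 * x + 0.3 * y + np.sin(x)

# Exact, piecewise cubic surface through the data points
spline = CloughTocher2DInterpolator(np.column_stack([x, y]), h)

# Raster data on a regular grid, ready for contouring
gx, gy = np.meshgrid(np.linspace(0, 10, 101), np.linspace(0, 10, 101))
raster = spline(gx, gy)   # NaN outside the convex hull of the data points
```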

1.5.3 MATHEMATICAL MANIPULATIONS

The evolved polynomials are employed to perform the following mathematical manipulations.

Interpolation

The attribute at any un-gauged point can be estimated by merely substituting the coordinates of the point in the appropriate polynomial relation.


Gradients

The polynomial function can be differentiated analytically. Thus, the spatial gradients of the attribute at any point can be estimated by substituting its coordinates in the respective derivatives of the appropriate polynomial.

Spatial averaging

The spatial average value of the attribute in any given domain can be estimated by dividing the domain into sub-domains falling in the zones of the various polynomials. Subsequently, the corresponding polynomials are integrated over the respective sub-domains. The average is given by the sum of the integrals divided by the area of the domain.

1.5.4 SUITABILITY CRITERIA

Spline functions, unlike the least squares polynomials and Kriging, are devoid of any theoretical rationale.

Their biggest advantage is that they can honour the assigned gradients at the boundary. This may be useful in assimilating the information on subsurface horizontal flow across the boundary into the contours of water table elevation.

They permit a smooth and exact fit to the data with polynomials of low degree. Thus, they are able to contour both the local and regional trends with low degree polynomials.

1.6 SUGGESTED READING

• Bardossy, A. (Ed.), Geostatistical Methods: Recent Developments and Applications in Surface and Subsurface Hydrology. Proceedings of an International Workshop held at Karlsruhe, Germany, from 17 to 19 July 1990, UNESCO (IHP IV), Paris, 1992.

• Chatterjee, S. and Price, B., Regression Analysis by Example. New York: Wiley, 1977.

• Clark, I., Practical Geostatistics. Applied Sciences Publishers, London, 1979.

• Cressie, N., Statistics for Spatial Data. Wiley, New York, 1993.

• David, M., Geostatistical Ore Reserve Estimation. Elsevier Scientific Publishing Company, 1977.

• Davis, John C., Statistics and Data Analysis in Geology. John Wiley and Sons, 1986.

• Delhomme, J.P., Kriging in Hydrosciences. Advances in Water Resources, Vol. 1, No. 5, 1978.

• Deutsch, C. and Journel, A. G., GSLIB: Geostatistical Software Library and User's Guide. Oxford University Press, New York, 1992.

• Isaaks, E. H. and Srivastava, R. M., An Introduction to Applied Geostatistics. Oxford University Press, 1989.

• Neuman, S.P., Role of Geostatistics in Subsurface Hydrology. In Geostatistics for Natural Resources Characterization (ed. G. Verly, et al.), D. Reidel Publishing Company, 1984.


2 TIME SERIES ANALYSIS

2.1 INTRODUCTION

A time series essentially represents the temporal variation of a variable of interest at a given location. A groundwater time series may comprise multiple-time data of water table / piezometric head / rainfall / river stage / quality etc. from a single space point (observation well / piezometer / rain gauge / river gauge station etc.), arranged in chronological order. The interval between the successive data may be uniform throughout the series or may be non-uniform.

The analyses of time series may provide valuable information on the long-term and short-term trends and the relationships between sites and different variables. It may lead to the identification of cycles of varying duration (e.g., daily, barometric, tidal, seasonal and annual). It may also provide tentative estimates of some aquifer parameters. A composite analysis of the time series of water level and of rainfall may provide useful insight into the process of the rainfall recharge. Similarly, an analysis of a water level time series along with the time series of atmospheric pressure or tide level may provide an estimate of the specific storage.

The following analyses of the time series have been incorporated in the dedicated software:

• Auto/cross correlation analysis

• Stationarity analysis

• Spectral (or Fourier or Harmonic) analysis

• Co-Kriging

The details of the analyses are incorporated in the following sub-sections.

2.2 AUTO/CROSS CORRELATION ANALYSIS

2.2.1 CORRELATION

This statistic determines the degree of linear interrelation (that is, scaled similarity) between a given series and a derived series.

A direct linear relation (that is, as one series rises, the other also rises and vice versa) is termed as positive correlation. An inverse linear relation (that is, as one series rises, the other declines and vice versa) is termed as negative correlation. If the rise of one series has apparently no effect on the other, the two series are known to be uncorrelated.

The given series may be termed as a pivotal series. The derived and the pivotal time series must have the same data frequency and an adequately long overlap. A correlation between two series falling in different time spans can be computed by analysing the overlapping period only, that is, by curtailing one or both of the series. If there is no overlapping period, the correlation cannot be estimated. If the two series comprise data at different frequencies, the series of higher frequency (say a series of DWLR data) may be pruned to ensure a compatible frequency.


The derived series can be any of the following:

• The pivotal time series itself, but displaced (or lagged) by a given number of time intervals (known as the lag).

• Any other time series with the same time span and data frequency as that of the pivotal time series.

• Any other time series with the same time span and data frequency as that of the pivotal time series, lagged by a given number of time intervals.

This correlation is essentially a normalised covariance between the pivotal and the derived series. Two positively correlated time series shall have a correlation greater than zero, and it may go up to +1. A correlation of +1 indicates a perfect positive correlation, that is, a direct proportionality of the fluctuations in the two series and hence a perfect linearity with positive gradient between the two variables. Similarly, two negatively correlated time series shall have a correlation less than zero, and it may go down to -1. A correlation of -1 indicates a perfect negative correlation, that is, an inverse proportionality and hence a perfect linearity with negative slope. Two uncorrelated series shall have a zero correlation.

2.2.2 AUTO/CROSS CORRELATION

A correlation between the pivotal and the lagged pivotal time series is known as auto correlation or serial correlation. Further, a correlation between a pivotal time series and any other time series (lagged or unlagged) is known as cross correlation. Both the auto as well as the cross correlation need to be qualified by the lag. It may be noted that a cross correlation between a pivotal and another concurrent unlagged time series is termed as a cross correlation with zero lag. Similarly, an auto correlation with zero lag is the correlation of the pivotal series with itself; the corresponding (unnormalised) covariance is the variance of the pivotal series.
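A minimal numpy sketch of a lagged correlation between two equally sampled, overlapping series follows; the function name lagged_correlation is illustrative only. With the derived series equal to the pivotal series it gives the auto (serial) correlation, otherwise the cross correlation.

```python
import numpy as np

def lagged_correlation(pivot, derived, lag=0):
    """Correlation between a pivotal series and a derived series lagged by `lag` intervals."""
    pivot = np.asarray(pivot, dtype=float)
    derived = np.asarray(derived, dtype=float)
    n = min(len(pivot), len(derived))        # overlapping portion only
    a, b = pivot[:n - lag], derived[lag:n]   # lag-shifted, equal-length portions
    return np.corrcoef(a, b)[0, 1]           # normalised covariance

# Example: auto correlogram of a daily water level series up to lag n/4
levels = np.sin(np.linspace(0, 8 * np.pi, 200)) + np.random.default_rng(2).normal(0, 0.2, 200)
acf = [lagged_correlation(levels, levels, k) for k in range(len(levels) // 4)]
```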

Confidence limits

A direct proportionality between the fluctuations in the two series leads to a correlation value of plus one. Such a perfect positive correlation is very improbable and may be considered to be significant at ten percent confidence level. Generally, a correlation at 95% or higher confidence level is considered to be significant enough. The threshold positive correlation (CLp) for such a confidence level is given by the following equation:

CLp = [-1 + 1.96 √(n-k-1)] / (n-k)    (2.1)

where k is the lag and n is the number of data points.

Similarly, though a correlation value of -1 is required for concluding an inverse proportionality, any negative correlation at 95% or higher confidence level is considered to be significant enough. The threshold negative correlation (CLn) for such a confidence level is given by the following equation:

CLn = [-1 - 1.96 √(n-k-1)] / (n-k)    (2.2)


2.2.3 CORRELOGRAM

A plot of the auto correlation versus the lag is known as the auto correlogram. Similarly, a plot of the cross correlation versus the lag is known as the cross correlogram. The lag is restricted to n/4, n being the number of data in the series (or in the overlapping portions of the series).

The auto correlogram assists in detecting the dependencies through time. The cross correlogram, on the other hand, determines the strength of the relationship between the two related phenomena represented by the two series. It further determines the lag between the two phenomena (say the lag between the occurrence of the rainfall and the resultant recharge to the water table).

2.3 STATIONARITY ANALYSIS

2.3.1 STATIONARY TIME SERIES

A time series is said to be stationary if its properties do not change with time. Thus, theoretically, a truly stationary time series must comprise an array of equal numbers. Such a series would, however, be of no interest to anyone.

In practice a compromise is made and, heuristically, a time series is regarded as stationary if it does not display any long-term trend. It is assumed that the short-term trends do not render a time series non-stationary, provided they occur as statistically similar waves of a statistically constant wavelength (time period). Thus, a stationary time series may be viewed as one comprising a series of nearly similar short-term trends.

2.3.2 TESTS OF STATIONARITY

The series is divided into several (say n) equal segments. Each segment may comprise one or more sequential waves. Since most of the hydrological series display an annual cycle, a wave may comprise an annual hydrograph. Subsequently, the properties of these segments are computed and analysed. The relevant properties and the corresponding inferences are as follows:

Stationarity of mean

The first obvious prerequisite of stationarity is that the mean should be stationary, that is, the mean of each segment should tend to be the same. Stationarity of mean is known as first order stationarity.

However, the means could be unequal not only due to non-stationarity of the mean but also due to too short a segment length. Therefore, this requirement is replaced by a less stringent condition, that is, the absence of any consistent rising or falling trend in the mean of the segment hydrograph. Thus, though the segment mean may vary over the series, the stationarity of mean may still hold provided there is no consistent rising or falling trend in its variation.

Mathematically, this requires that the slope of the regressed linear relation between the segment mean and the time should not be statistically different from zero.

The stepwise procedure for identifying the stationarity of mean shall be as follows (a computational sketch is given after these steps):

• Compute mean of each segment.

• Assume the following linear relation between the segment mean (y) and the time (x) till the middle of the segment:


y = a + bx (2.3)

• Estimate a and b and the minimised residual variance (MRV1) by subjecting the computed means y and the associated times x to a regression analysis.

• Compute t statistic of b as per the following equation:

t = b / √(MSD / SSX)    (2.4)

where:

MSD = MRV1 / (n-2)    (2.5)

n: number of the data points

SSX = Σ xi² - XSQ    (2.6)

XSQ = (Σ xi)² / n    (2.7)

• Compare the computed t statistic with the tabulated t at (n-2) degrees of freedom and a stipulated significance level (usually 95%).

• If the computed t is less than the tabulated t, the computed slope coefficient (b) may be treated as zero and hence the stationarity of mean may be inferred.

If the computed t is greater than the tabulated t, the slope coefficient may be treated as significantly different from zero. As such, a trend in the mean may be inferred. In such a case a check for a second degree (quadratic) trend may be made as follows:

• Add a quadratic term to the assumed relation between y and x.

y = a + bx + cx²    (2.8)

• Estimate a, b and c, and the minimised residual variance (MRV2) by regression analysis.

• Compute F statistic of the added term as per the following expression:

F = [MRV1 / (n-2)] / [MRV2 / (n-3)]    (2.9)

• Compare the computed F statistic with the tabulated F at (n-2) and (n-3) degrees of freedom and a stipulated significance level (usually 95%).

• If the computed F is larger than the tabulated F, it may be inferred that MRV2 is significantly smaller than MRV1, that is, there has been a significant improvement in the fit as the quadratic term was introduced. Thus, a quadratic trend may be inferred. Otherwise a linear trend may be inferred.
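A sketch of these computations on the segment means is given below; scipy.stats is used only to look up the tabulated t and F values, a two-sided t test at the 95% level is assumed, and the function name segment_mean_trend_test is illustrative.

```python
import numpy as np
from scipy import stats

def segment_mean_trend_test(t_mid, seg_means, alpha=0.05):
    """Test the segment means for a linear and, if needed, quadratic trend (equations 2.3 to 2.9)."""
    x, y = np.asarray(t_mid, dtype=float), np.asarray(seg_means, dtype=float)
    n = len(x)

    # linear fit y = a + b*x and its minimised residual variance MRV1
    b, a = np.polyfit(x, y, 1)
    mrv1 = np.sum((y - (a + b * x)) ** 2)
    msd = mrv1 / (n - 2)                            # equation (2.5)
    ssx = np.sum(x ** 2) - np.sum(x) ** 2 / n       # equations (2.6) and (2.7)
    t_stat = b / np.sqrt(msd / ssx)                 # equation (2.4)
    if abs(t_stat) < stats.t.ppf(1 - alpha / 2, n - 2):
        return "mean stationary"

    # quadratic fit y = a + b*x + c*x**2 and MRV2, then the F test of equation (2.9)
    c2, b2, a2 = np.polyfit(x, y, 2)
    mrv2 = np.sum((y - (a2 + b2 * x + c2 * x ** 2)) ** 2)
    f_stat = (mrv1 / (n - 2)) / (mrv2 / (n - 3))
    if f_stat > stats.f.ppf(1 - alpha, n - 2, n - 3):
        return "quadratic trend in mean"
    return "linear trend in mean"
```

The same function applied to the segment standard deviations gives the corresponding check for the stationarity of shape described below.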


Stationarity of shape

The other prerequisite of stationarity is that the segment shape must be stationary, that is, each segment, apart from having the same mean, must also have the same shape. This is termed as second order stationarity.

The test of stationarity of shape, which should be made only if the test of stationarity of mean has been found to hold, involves computing the auto correlogram for each segment. If the auto correlograms of all the segments tend to be the same (i.e., if the autocorrelation changes only with the time lag and not with the position within the time series), the stationarity of shape may be inferred.

However, the auto correlograms may be dissimilar not only due to non-stationarity of the shape but also due to too short a segment length. Therefore, this requirement is replaced by a less stringent condition, that is, the absence of any consistent rising or falling trend in the standard deviation of the segment hydrograph. Thus, though the correlograms may vary over the series, the stationarity of shape may still be assumed to hold approximately provided the mean and the standard deviation of the individual segment hydrographs do not display any consistent rising or falling trend.

Mathematically, this requires that the slope of the regressed linear relation between the segment standard deviation and the time should not be statistically different from zero. The necessary stepwise procedure has already been discussed in the context of ascertaining the stationarity of the mean.

If the computed t is greater than the tabulated t, the slope coefficient may be treated as significantly different from zero. As such, a trend in the shape may be inferred. In such a case a check for a second degree (quadratic) trend may be made as discussed already in the context of determining the nature of the trend of the mean.

2.3.3 LEVELLING OF SERIES

As described earlier, levelling essentially involves transforming a non-stationary time series into a nearly stationary series. This is essential because most analyses of time series are applicable only to stationary time series.

Levelling of mean

This comprises computing and subsequently removing the trends of the macro (say fortnightly, monthly) means from the time series.

The step-wise procedure is as follows. The steps are defined for removing trends of monthly means. However, the same procedure shall hold for removing trends at any other level.

• Compute the monthly means from the available higher frequency (6-hourly, daily, weekly etc.) data contained in the entire series. The series is expected to span a number of years.

• Arrange the means of the first month (say January) of all the years in chronological order. This leads to the time series of the first month's mean.

• Fit a regression line to this time series and apply the t test to check if the slope coefficient is significantly different from zero, that is, whether there is any trend. If a trend is found (that is, if the slope is found to be significantly different from zero), apply the F test to determine whether the trend is linear or quadratic.

• Compute the trend in the data of the first month of each year. The trend in any given year is taken as the difference between the polynomial value at the last and at the given year.


• This is added to all (that is, 4-daily, daily, weekly etc.) data of the first month of the given year. This removes the trend from the data of the first month of all the years.

• Repeat steps 2 to 5 for all the subsequent months.

• This removes the trend of the mean from the entire time series.

The residual series shall be devoid of any trend of the mean at the selected macro level. Thus, it shall display a stationarity of mean. It may also display a stationarity of shape of the annual hydrograph, provided the selected macro period is sufficiently smaller than one year (for example, at the monthly level). However, if the selected macro period is higher, the following additional levelling may be necessary.
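
The above steps may be sketched in code as follows. The sketch is indicative only: it assumes daily data held in a pandas Series with a date index, fits a linear trend unconditionally (the t and F tests described above are omitted), and uses illustrative names that do not belong to the dedicated software.

```python
# Illustrative sketch: levelling of the mean at the monthly level.
import numpy as np
import pandas as pd

def level_monthly_means(y: pd.Series) -> pd.Series:
    """Remove the linear trend of the monthly means from a daily series (illustrative)."""
    out = y.copy()
    last_year = int(y.index.year.max())
    for month in range(1, 13):
        sel = (y.index.month == month)
        # Step 2: the time series of this month's means, one value per year
        means = y[sel].groupby(y.index.year[sel]).mean()
        if len(means) < 3:
            continue                              # too few years to fit a trend
        # Step 3: fit a straight line to the yearly means (t test / F test omitted here)
        poly = np.poly1d(np.polyfit(means.index.values.astype(float), means.values, 1))
        for yr in means.index:
            # Step 4: trend in a year = polynomial value at the last year minus that at this year
            correction = poly(last_year) - poly(yr)
            # Step 5: add the correction to every value of this month in this year
            out[sel & (y.index.year == yr)] += correction
    return out
```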

Levelling of shape

This comprises computing and subsequently removing the trend of the standard deviation of the annual hydrographs from the time series.

The step-wise procedure is as follows:

• Divide the series into sequential annual hydrographs.

• Compute the standard deviation of each annual hydrograph.

• Arrange the standard deviations of all the years in a chronological order. This leads to the time series of the standard deviation.

• Fit a regression line to this time series and apply the t test to check if the slope coefficient is significantly different from zero, that is, whether there is any trend. If a trend is found (that is, if the slope is found to be significantly different from zero), apply the F test to determine whether the trend is linear or quadratic.

• Compute the trend in each year. The trend in any given year is taken as the ratio of the polynomial value at the last year to that at the given year.

• Remove the trend from the data of each year. All the data of a given year are multiplied by the corresponding computed trend. This removes the trend from the data of the given year.

The resultant series shall be devoid of any trend of the standard deviation at the selected macro level. Thus, it may display a stationarity of shape, provided the mean is also stationary.
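
A corresponding sketch of the levelling of shape is given below. As before, it assumes a pandas Series of sub-annual (for example daily) data with a date index, fits a linear trend unconditionally, and uses illustrative names.

```python
# Illustrative sketch: levelling of shape by removing the trend of the annual standard deviations.
import numpy as np
import pandas as pd

def level_annual_std(y: pd.Series) -> pd.Series:
    """Scale each annual hydrograph so that the trend in its standard deviation is removed (illustrative)."""
    out = y.copy()
    annual_sd = y.groupby(y.index.year).std()             # one standard deviation per annual hydrograph
    poly = np.poly1d(np.polyfit(annual_sd.index.values.astype(float), annual_sd.values, 1))  # linear trend
    last_year = int(annual_sd.index.max())
    for yr in annual_sd.index:
        # Trend in a year = ratio of the polynomial value at the last year to that at the given year
        factor = poly(last_year) / poly(yr)
        out[y.index.year == yr] *= factor                  # multiply all of this year's data by the factor
    return out
```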

2.4 SPECTRAL ANALYSIS

A hydrologic time series (say of groundwater level) represents the resultant effect of a number of phenomena, many of which may be periodic (that is, self repeating). Each periodic phenomenon imparts a periodicity to the time series. However, due to their superposition, all these periodicities may not be visible in the time series. Spectral analysis is essentially aimed at breaking up the time series into these hidden periodicities (or cycles), thereby revealing them. This may ultimately facilitate identification of significant or dominant cycles.

Each periodicity is represented by an independent time series in the form of a sinusoid wave (also known as a harmonic) having its own parameters. The parameters include the wavelength (time period), amplitude and starting point. Thus, the first step of the analysis involves computation of the parameters. The parameters are so computed that superposition (summation) of the sinusoids leads to the original time series; further, the sum of the variances of the individual sinusoids equals the variance of the original time series. The computed parameters permit an identification of the predominant cycles.


2.4.1 THE HARMONICS

Consider a time series comprising n attribute data (Yi, i varying from 0 to n-1) at a uniform time interval (say ∆t). Thus, the total span of the time series is (n-1)·∆t. Spectral analysis permits approximation of the attribute (Y) at any time t since the beginning of the time series as the sum of a constant and (n-1) harmonics [Hj, j varying from 1 to (n-1)]:

$$Y(t) = \frac{a_0}{2} + \sum_{j=1}^{n-1} H_j \qquad (2.10)$$

The constant and the jth harmonic are given by the following expressions:

$$a_0 = \frac{2}{n} \sum_{i=0}^{n-1} Y_i \qquad (2.11)$$

$$H_j = a_j \cos(j\theta) + b_j \sin(j\theta) \qquad (2.12)$$

where:

$$\theta = \frac{2\pi t}{(n-1)\,\Delta t} \qquad (2.13)$$

$$a_j = \frac{2}{n} \sum_{i=0}^{n-1} Y_i \cos\left(\frac{2\pi i j}{n}\right) \qquad (2.14)$$

$$b_j = \frac{2}{n} \sum_{i=1}^{n-1} Y_i \sin\left(\frac{2\pi i j}{n}\right) \qquad (2.15)$$

The jth harmonic represents a phenomenon with a time period equal to (n-1)·∆t / j, that is, the total span of the series divided by the harmonic number.
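
The computation of these coefficients may be sketched directly from equations (2.11), (2.14) and (2.15) as follows. This is a plain O(n²) evaluation given for illustration only (the dedicated software would typically use a faster transform), and the names are illustrative.

```python
# Illustrative sketch: direct evaluation of the constant and the harmonic coefficients.
import numpy as np

def harmonic_coefficients(Y):
    """Return a0 (eq. 2.11) and the arrays a_j, b_j (eqs. 2.14 and 2.15) for j = 1 .. n-1."""
    Y = np.asarray(Y, dtype=float)
    n = len(Y)
    i = np.arange(n)
    a0 = (2.0 / n) * Y.sum()                                                   # eq. (2.11)
    a = np.empty(n - 1)
    b = np.empty(n - 1)
    for j in range(1, n):
        a[j - 1] = (2.0 / n) * np.sum(Y * np.cos(2.0 * np.pi * i * j / n))     # eq. (2.14)
        b[j - 1] = (2.0 / n) * np.sum(Y * np.sin(2.0 * np.pi * i * j / n))     # eq. (2.15); the i = 0 term is zero
    return a0, a, b
```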

2.4.2 VARIANCE OR POWER OF A HARMONIC

The variance (or power) of the jth harmonic is given by the following expression:

$$A_j = a_j^2 + b_j^2 \qquad (2.16)$$

The variance of the time series equals the sum of the variances of the individual harmonics. Variance is a measure of the scatter of a time series around the mean. Thus, it can be inferred that the variance of a harmonic is proportional to its relative dominance in the original time series and may therefore be viewed as a measure of the relative significance of the associated phenomenon.


2.4.3 PERIODOGRAM

A plot of the variance versus the harmonic number (j) is known as the periodogram of the time series. Since this plot comprises a discrete number of variance values, it is also known as the discrete power spectrum.

By a visual inspection of the periodogram, one can identify the dominant harmonic numbers, i.e., the serial numbers of the harmonics displaying conspicuously high variance. Knowing the harmonic numbers, the corresponding time periods can be computed.

The cycle of lowest time period that can be identified (or separated) by the spectral analysis is obviously linked to the interval (∆t) of the data. It can be easily visualised that the lowest identifiable time period is 2∆t. Thus, for example, if the data are available at an interval of one day, only the cycles of two days or longer time period can be identified by the analysis. Similarly, if a daily cycle is to be identified, the interval must be 12 hours or less.

The cycle of longest time period that can be identified is apparently linked to the total time domain (that is, the length) of the time series. It may usually be restricted to one fourth of the length of the time series. This implies that for identification of a cycle, the length of the series must be at least four times its time period.
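
The assembly of the periodogram from the harmonic coefficients may be sketched as follows. The function builds on harmonic_coefficients() given earlier; the names remain illustrative and the time periods follow from the definition of θ in equation (2.13).

```python
# Illustrative sketch: the discrete power spectrum (periodogram), i.e. the variance
# A_j of equation (2.16) against the harmonic number j, with the associated time periods.
import numpy as np

def periodogram_from_coefficients(a, b, n, dt=1.0):
    A = a ** 2 + b ** 2                      # eq. (2.16), one value per harmonic j = 1 .. n-1
    j = np.arange(1, n)
    period = (n - 1) * dt / j                # time period of harmonic j (from eq. 2.13)
    return j, period, A

# Example usage (hypothetical daily record, dt in days):
# a0, a, b = harmonic_coefficients(daily_levels)
# j, period, A = periodogram_from_coefficients(a, b, len(daily_levels), dt=1.0)
# dominant = j[np.argsort(A)[::-1][:5]]      # the five harmonics with the highest variance
```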

2.4.4 CONTINUOUS POWER SPECTRUM

One apparent problem with the discrete power spectrum is that it does not permit the estimation of the variance associated with an arbitrary given time period. It permits estimation of variance corresponding only to the discrete time periods, that is, (n-1)·∆t / j with j varying from 1 to n-1. For example, a daily cycle may not be identifiable explicitly, because no discrete period may assume a value of exactly one day.

This problem is overcome by apportioning the total variance of the time series among a set of frequency (inverse of the time period) bands. The derivative of the variance with respect to the frequency (that is, the apportioned variance divided by the width of the frequency band) is known as the continuous spectrum or spectral density function. A plot of the spectral density function versus the frequency is known as the continuous power spectrum. The total area under the spectrum is equal to the total variance of the time series. The area under the curve between any two frequencies represents the corresponding variance. In contrast, the periodogram attributes the variance to individual discrete time periods. In the dedicated software an algorithm known as the Fast Fourier Transform calculates the continuous power spectrum.
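
For illustration, a continuous spectrum may be estimated with a standard FFT-based routine as sketched below. This is not the dedicated software's own implementation, and the variable names are illustrative.

```python
# Illustrative sketch: an FFT-based estimate of the spectral density function.
# The area under the density curve approximates the variance of the (mean-removed) series.
import numpy as np
from scipy.signal import periodogram

def spectral_density(y, fs=1.0):
    """Return frequencies and spectral density; fs is the sampling frequency (e.g. 1.0 per day for daily data)."""
    freq, density = periodogram(y, fs=fs, detrend='constant', scaling='density')
    return freq, density

# Example: variance contributed by cycles with periods between 10 and 30 days
# freq, density = spectral_density(daily_levels, fs=1.0)
# band = (freq >= 1.0 / 30) & (freq <= 1.0 / 10)
# band_variance = np.trapz(density[band], freq[band])
```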

2.4.5 FILTERS

The power spectrum (either discrete or continuous) computed as above is known as the raw power spectrum. This may generally not be satisfactory because of inevitable errors in the variance apportioned to the time periods or to the frequency bands. The error may be on account of the limited resolution of the series and the inevitable data noise. Thus, it is necessary to smoothen the spectrum. This is accomplished by a weighted averaging of the apportioned variances in the neighbourhood. Thus, the smoothened variance in any band is taken as the weighted mean of the variances apportioned to that and to the adjacent bands. The weights are known as the filter.

The filter incorporated in the dedicated software assigns a weight of 0.5 to the band under consideration and weights of 0.25 to the preceding and to the following frequency bands. A more flexible approach suggested by Tukey has also been incorporated.
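
Such a three-point weighted average may be sketched as follows. The treatment of the end bands (here, repetition of the end values) is an assumption made for illustration and may differ from the dedicated software.

```python
# Illustrative sketch: smoothing of a raw spectrum with the weights (0.25, 0.5, 0.25).
import numpy as np

def smooth_spectrum(raw, weights=(0.25, 0.5, 0.25)):
    """Weighted moving average of each spectral estimate with its two neighbours."""
    w = np.asarray(weights, dtype=float)
    raw = np.asarray(raw, dtype=float)
    padded = np.concatenate(([raw[0]], raw, [raw[-1]]))   # repeat the end values at the boundaries
    return np.convolve(padded, w, mode='valid')           # same length as the raw spectrum

# smoothed = smooth_spectrum(density)
```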


2.5 CO-KRIGING

Co-Kriging is essentially an extension of the classical Kriging technique (refer to Section 1.3). Co-Kriging, in the context of time series, is the analysis of concurrent time series of two correlated variables exhibiting random temporal fluctuations. The frequency of data in the two time series may or may not be identical. Co-Kriging permits interpolation of one dependent variable over the other, accounting for their auto-correlations as well as the cross-correlation between them.

In principle, co-Kriging is quite similar to Kriging, except for the requirement of modelling three variograms (in place of one) and more complex computations. The three required variograms comprise the variograms of the two dependent variables in respect of the time, and the cross-variogram of one dependent variable in respect of the other. The cross-variogram is usually difficult to model.
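
As an indication of the first step, an empirical (discrete) cross-variogram for two concurrent, equally spaced series may be computed as sketched below. The fitting of a permissible variogram model and the solution of the co-Kriging system are not shown, and the names are illustrative.

```python
# Illustrative sketch: empirical cross-variogram of two concurrent, equally spaced time series.
import numpy as np

def cross_variogram(x, y, max_lag):
    """gamma_xy(h) = mean of 0.5 * (x[t+h] - x[t]) * (y[t+h] - y[t]) for h = 1 .. max_lag."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    gam = np.empty(max_lag)
    for h in range(1, max_lag + 1):
        gam[h - 1] = 0.5 * np.mean((x[h:] - x[:-h]) * (y[h:] - y[:-h]))
    return gam

# The direct variograms of each series follow from cross_variogram(x, x, max_lag)
# and cross_variogram(y, y, max_lag).
```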

2.6 SUGGESTED READING

• Davis, J. C., Statistics and Data Analysis in Geology. John Wiley and Sons, 1986.

• Law, A. G., Stochastic Analysis of Groundwater Level Time Series in the Western United States. Hydrology Papers, Colorado State University, Fort Collins, Colorado; No. 68, May 1974.

• Myers, D. E., Matrix Formulation of Co-Kriging. Mathematical Geology, Vol. 14, No. 3, 1982.

• Yevjevich, V., Stochastic Processes in Hydrology. Water Resources Publications, Fort Collins, Colorado, 1972.

• Jenkins, G. M., and Watts, D. G., Spectral Analysis and its Applications. Holden-Day, San Francisco, 1968.