

WATER RESOURCES RESEARCH, VOL. 35, NO. 1, PAGES 243-254, JANUARY 1999

Streamflow record extension using power transformations and application to sediment transport

Douglas B. Moog and Peter J. Whiting

Department of Geological Sciences, Case Western Reserve University, Cleveland, Ohio

Robert B. Thomas

Sherwood, Oregon

Abstract. To obtain a representative set of flow rates for a stream, it is often desirable to fill in missing data or extend measurements to a longer time period by correlation to a nearby gage with a longer record. Linear least squares regression of the logarithms of the flows is a traditional and still common technique. However, its purpose is to generate optimal estimates of each day's discharge, rather than the population of discharges, for which it tends to underestimate variance. Maintenance-of-variance-extension (MOVE) equations [Hirsch, 1982] were developed to correct this bias. This study replaces the logarithmic transformation by the more general Box-Cox scaled power transformation, generating a more linear, constant-variance relationship for the MOVE extension. Combining the Box-Cox transformation with the MOVE extension is shown to improve accuracy in estimating order statistics of flow rate, particularly for the nonextreme discharges which generally govern cumulative transport over time. This advantage is illustrated by prediction of cumulative fractions of total bed load transport.

1. Introduction

In hydrologic, geologic, and water quality analysis, mean daily flow data are used to calculate numerous quantities, including fluxes of water, sediment, pollutants, and nutrients; the frequency of different flows; geomorphically effective discharges; and habitat quantity and quality. For a variety of reasons a long, continuous record of mean daily flows may not be available. Flow records, if they exist, may be too short to be hydrologically representative. Equipment malfunctions or seasonal shutdowns may create gaps in the flow record. Thus hydrologists often must extend short records or estimate missing flows.

It is not always necessary to produce a time series with proper serial correlation and a best estimate of each daily mean flow. Instead, it is often sufficient to generate a set of flows with appropriate general characteristics, such as mean, variance, and frequency-duration relationship. This paper is concerned with such cases.

Several techniques have been developed for predicting flows in a short-record stream by establishing a relationship to a base station for periods of concurrent data, then using this relationship to predict flows in the extension period (review by Hirsch et al. [1993]). The index station method [Searcy, 1959] paired base and short-record data of equal exceedance probability in the concurrent period. This technique is limited to the range of flows occurring in the concurrent period. A traditional technique is to express the short station streamflow as a linear function of the base streamflow using ordinary least squares (ols) regression for the set of concurrent data. Matalas and Jacobs [1964] showed that ols underestimates the variance of the extension period, and they increased the variance by adding independent noise (RPN model). Hirsch [1982] demonstrated bias in both ols and RPN, the latter due to excess kurtosis, and proposed linear maintenance-of-variance-extension (MOVE) functions, which were refined by Vogel and Stedinger [1985].

Copyright 1999 by the American Geophysical Union.

Paper number 1998WR900014. 0043-1397/99/1998WR900014$09.00

The purpose of the current investigation is to analyze the characteristics and accuracy of the linear extension equations MOVE and ols for single base stations, using both natural logarithmic (ln) and Box-Cox (BC) transformations, and their application to bed load transport. They are tested by application to pairs drawn from ten long-record Idaho streams. Whereas Hirsch [1982] tested MOVE estimates of low monthly flows, this study applies record extension to mean daily discharges over the full range of flow magnitude. By analysis of cumulative bed load predictions we assess the suitability of each technique for problems which are less sensitive to prediction of rare, extreme events.

2. Record Extension Techniques

2.1. Extension Equations

This study focuses on the use of a single nearby base gage to extend flow records. Multiple base stations are advantageous in some cases [Alley and Burns, 1983], but a well-chosen base station with a strong correlation to the short-record station might be expected to produce better estimates than a group including more weakly correlated stations, the addition of which can actually increase prediction error [Clarke, 1994, p. 263]. Even multiple-gage techniques are built by relating pairs of gages, so accurate extension from a single base station is fundamental.

The flow record of stream pairs is split into four sets, designated $x_C$, $x_E$, $y_C$, and $y_E$, where $x$ indicates discharges (or transformed discharges) of the base station; $y$ indicates the short-record, or simply short, station; subscript $C$ (concurrent) indicates the dates on which discharges at both stations are known; and subscript $E$ (extension) indicates dates on which only the base station flow is known. Since serial correlation is not of interest, the order of dates in these sets is unimportant, and they need not cover continuous periods of time.

[Figure 1: scatterplot of concurrent daily mean flows with the BC/MOVE.1, ln/MOVE.1, and ln/ols extension lines. Horizontal axis: daily mean discharge in m3/s of Selway River near Lowell (logarithmic scale, ticks at 10, 50, 100, and 500).]

Figure 1. An example of concurrent daily mean flows (1986-1995). The plotted extension functions map Selway River (base station) flows into Little Salmon River (short station) estimates. The long-dashed line indicates ordinary least squares regression with natural log transformation (ln/ols), the short-dashed line indicates the MOVE.1 extension technique with natural log transformations (ln/MOVE.1), and the solid line indicates MOVE.1 with the Box-Cox scaled power law transformation (BC/MOVE.1).

Ordinary least squares estimates are generated by

$$\hat{y}_E = m(y_C) + r \frac{S(y_C)}{S(x_C)} [x_E - m(x_C)], \tag{1}$$

where $S$ is the sample standard deviation, $m$ is the mean, $\hat{y}_E$ is the set of estimates derived from $x_E$, and $r$ is the product moment correlation coefficient of the sets $x_C$ and $y_C$. This equation was applied to the natural logarithms of the discharges, leading to line ln/ols in Figure 1, which also shows points representing the known, concurrent flows.

Unfortunately, (1) is not fully suited to the task of streamflow record extension. Its purpose is to generate accurate estimates of each point in $y_E$. However, record extension (in the absence of interest in serial correlation) requires only that the distribution of the set $\hat{y}_E$ approximate that of the true set $y_E$. Hirsch [1982] showed that ols tends to result in a set $\hat{y}_E$ with excessively low variance. The reason may be explained by reference to Figure 1. The ln/ols line represents the best estimate (via least squares) of $y$ for each point in $x$. However, in record extension, what matters are not individual points, but the line itself, specifically, the set of estimates to be produced by the line, and the statistical distribution of that set. If we were to produce an extended flow set $\hat{y}_C$ for the concurrent flows $x_C$ which are plotted as points in Figure 1, ideally, we would like a set having the same variance as the cloud of points. What we actually generate is a set of points falling on the line. In producing the ln/ols line we have essentially discarded all variation about that line as random error. However, that error is, in fact, natural variation, which we would like to retain. Thus, if a line is to be produced, it should have a greater slope, so as to produce a set $\hat{y}_E$ with greater variance.

The MOVE techniques proposed by Hirsch [1982] provide a linear extension function with unbiased mean, like ols, and greater variance. MOVE.1 is the linear function for which the standard deviation and mean of the estimates $\hat{y}_C$ for the concurrent period equal those of the known values, $y_C$. Unlike ols, in MOVE.1 the elements in $x_C$ and $y_C$ are no longer paired through the correlation coefficient, since

$$\hat{y}_E = m(y_C) + \frac{S(y_C)}{S(x_C)} [x_E - m(x_C)], \tag{2}$$

which is (1) with $r = 1$. Because the coefficient of $x_E$ in (2) is larger, the slope of the MOVE.1 line is always higher than that of ols, as may be seen by comparing the ln/MOVE.1 line with the ln/ols line in Figure 1. The higher slope means that the variance of the extended flows for a given set of base streamflows will be higher using MOVE.1. MOVE.2 [Hirsch, 1982] instead produces estimates which, applied to the entire record $x_{CE} \equiv x_C \cup x_E$, would yield the unbiased maximum likelihood estimates of mean and variance for $y_{CE} \equiv y_C \cup y_E$ [Vogel and Stedinger, 1985], making use of $x_E$ data. The MOVE.2 line (not shown) is generally very close to that of MOVE.1. Vogel and Stedinger [1985] refined MOVE.2 by adjusting it to estimate the statistics of the extended record $y_C \cup \hat{y}_E$ but indicated that the result was nearly indistinguishable.
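The contrast between (1) and (2) can be checked numerically. The following Python sketch is illustrative only and is not part of the original analysis: the function names `pearson_r` and `extend` and the flow values are hypothetical. It applies both equations to log-transformed concurrent flows and confirms that the ols estimates have standard deviation $|r| S(y_C)$, whereas the MOVE.1 estimates reproduce $S(y_C)$.

```python
import math
import statistics as st


def pearson_r(x, y):
    """Product moment correlation coefficient of two equal-length samples."""
    mx, my = st.mean(x), st.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def extend(x_c, y_c, x_e, method="move1"):
    """Estimate short-station values from base values x_e.

    method="ols" follows equation (1); method="move1" follows equation (2).
    All inputs are assumed to be already transformed (here, natural logs).
    """
    mx, my = st.mean(x_c), st.mean(y_c)
    slope = st.stdev(y_c) / st.stdev(x_c)
    if method == "ols":
        slope *= pearson_r(x_c, y_c)
    elif method != "move1":
        raise ValueError("method must be 'ols' or 'move1'")
    return [my + slope * (x - mx) for x in x_e]


# Hypothetical concurrent flows in m3/s, for illustration only.
base_q = [12.0, 30.0, 55.0, 90.0, 160.0, 300.0, 420.0]
short_q = [2.0, 3.5, 9.0, 8.0, 30.0, 33.0, 80.0]
x_c = [math.log(q) for q in base_q]
y_c = [math.log(q) for q in short_q]

r = pearson_r(x_c, y_c)
y_ols = extend(x_c, y_c, x_c, method="ols")
y_move = extend(x_c, y_c, x_c, method="move1")

print("S(y_C)          :", st.stdev(y_c))
print("S(ols estimates):", st.stdev(y_ols), "= |r| * S(y_C) with r =", r)
print("S(MOVE.1 est.)  :", st.stdev(y_move))
```

Both sets of estimates share the mean of $y_C$; only the spread differs, which is the property that matters when the target is the distribution of flows rather than each day's value.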

2.2. Transformations

As linear equations, ols and MOVE are best applied to pairs for which scatterplots of the transformed flows show a linear trend. Because this criterion is often not well met by logarithmic transformations, a more general form discussed by Box and Cox [1964] may be employed [see also Cook and Weisberg, 1982, p. 60; Draper and Smith, 1981, p. 225]. This system of scaled power transformations (BC) may be written in terms of flow rate $Q$ as

$$Q' = \begin{cases} \dfrac{Q^{\lambda} - 1}{\lambda}, & \lambda \neq 0 \\ \ln Q, & \lambda = 0 \end{cases} \tag{3}$$

The scaling produces a gradual change in the scatterplot patterns as $\lambda$ is varied to select a transformation with improved linearity and homoscedasticity (i.e., uniformity of variance over the flow domain). It is possible for the inverse of this transformation to become imaginary, and on this basis the Box-Cox transformation was rejected for synthetic streamflow generation by Hirsch [1979]. As shown later, this problem may be avoided by placing constraints on the selected values of $\lambda$.
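A minimal Python sketch of transformation (3) follows. The function name `box_cox` is hypothetical and the code is offered only as an illustration; it also shows why the scaling gives a gradual change as $\lambda$ varies, since the power branch approaches $\ln Q$ as $\lambda$ approaches zero.

```python
import math


def box_cox(q, lam):
    """Scaled power transformation of a positive flow rate q, per equation (3)."""
    if q <= 0.0:
        raise ValueError("flow rate must be positive")
    if lam == 0:
        return math.log(q)
    return (q ** lam - 1.0) / lam


# The scaled form varies smoothly with the exponent: values for small
# exponents approach the natural logarithm (illustrative flow of 50 m3/s).
for lam in (0.5, 0.25, 0.1, 1e-6, 0.0):
    print(lam, box_cox(50.0, lam))
```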

In the aforementioned references, (3) is applied only to the response variable (in this case, the short-record stream), while an unscaled power transformation [Box and Tidwell, 1962] is offered for the predictor variable (base stream). The "R code" computer program of Cook and Weisberg [1994] permits application of (3) to both variables. This option was chosen as more appropriate for streamflow record extension, in which the variables are of identical type. With the transformation (3) the MOVE.1 equation (2) becomes

$$\hat{y}'_E = m(y'_C) + \frac{S(y'_C)}{S(x'_C)} [x'_E - m(x'_C)]. \tag{4}$$

A pair of $\lambda$ values ($\lambda_x$ for the base stream, $\lambda_y$ for the short stream) chosen for the stream pair of Figure 1 resulted in the curve BC/MOVE.1, which more closely follows the overall curvature of the scatterplot. However, the overall improvement may come at a price with respect to extreme flows. For example, at the high base station flows in Figure 1, the Box-Cox transformation replaces the tendency of the log transformation to produce some unrealistically low flows with some unrealistically high flows.

2.3. Selection of Box-Cox Parameters

The Box-Cox parameters $\lambda$ in (3) may be selected analytically or graphically. Cook and Weisberg [1982, p. 62] and Draper and Smith [1981, p. 226] present a maximum-likelihood calculation for the response variable exponent, and Cook and Weisberg [1982, p. 78] describe an iterative procedure for selecting a predictor exponent in the Box-Tidwell transformation. With proper graphic software [e.g., Cook and Weisberg, 1994] the analyst may watch a scatterplot of the short-record station versus the base station as the choices of $\lambda$ are varied and select the pair judged to result in the best combination of linearity and homoscedasticity.

In the course of the present study, $\lambda_x$ and $\lambda_y$ were estimated both graphically and analytically, the latter using a deterministic method customized for the streamflow record extension problem. While graphical determination potentially allows more sophisticated choices, for example, balancing linearity and homoscedasticity for widely varying relationships between streams, an analytical technique may be replicated more precisely by the reader. We do not wish to discourage graphical selection, which may be simpler in many cases, and which we found to be of similar accuracy.

The analytical technique was based on the maximized log likelihood technique as presented by Draper and Smith [1981, p. 226], which is equivalent to the method of Cook and Weisberg [1982], generalized to two variables. (Specifically, rather than selecting the response parameter $\lambda_y$ to maximize the likelihood function for the short-record flows, this function was maximized over the set of pairs $(\lambda_x, \lambda_y)$, subject to constraints discussed below.) For simplicity, the calculations employed dimensionless variables $v$ in place of $Q'$, where [Draper and Smith, 1981, equation 5.3.7a]

$$v = Q' / [G(Q)^{\lambda - 1}]. \tag{5}$$

$G(Q)$ is the geometric mean of the set of flow rates, either $x_C$ or $y_C$. These dimensionless variables simplify the expression for the maximum log likelihood estimator, so that the problem becomes one of selecting the set $(\lambda_x, \lambda_y)$ which minimizes the sum of squares of the residuals when $v$ based on the short record is regressed on $v$ based on the base record, using ordinary least squares regression.

For practical purposes, this minimization was subject to several constraints: (1) a set of discrete possible values for $\lambda_x$ and $\lambda_y$ (-0.67, ±0.5, ±0.33, ±0.25, ±0.1, 0), (2) $\lambda_y$ large enough to avoid complex values on retransformation, and (3) absolute difference $|\lambda_x - \lambda_y| \le 0.34$.

With respect to constraint (1), it is traditional, efficient, and sufficiently precise to limit the potential exponent values to a discrete set, chosen herein to be a limited range of those offered by Cook and Weisberg [1994]. With this finite set and a computer it is simpler and more accurate to calculate the sum of squares for each pair than to employ a more complex iterative algorithm such as that of Cook and Weisberg [1982, p. 78].

The need for constraint 2 may be understood by considering the reverse transformation of (3) as applied to record extension:

$$\hat{y}_E = (\lambda_y \hat{y}'_E + 1)^{1/\lambda_y}. \tag{6}$$

In general, $\hat{y}_E$ in this equation has a singularity at $\lambda_y \hat{y}'_E = -1$ and is complex when $\lambda_y \hat{y}'_E < -1$. Since $\hat{y}'_E$, as the output of the MOVE equation, may be arbitrarily large or small, such values are indeed encountered in practice, though, in this study, only for negative $\lambda_y$ (because positive $\lambda_y$ would require very unlikely values of $\hat{y}'_E$). One could avoid this problem by limiting $\hat{y}'_E$ to some value below $-1/\lambda_y$ (for $\lambda_y < 0$), but, in practice, such limited flows are very erroneous, and it is far preferable to limit $\lambda_y$ to values above those which lead to $\lambda_y \hat{y}'_E < -1$ for any value of $\hat{y}'_E$. This strategy was adopted in this study and is strongly recommended in general.
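The retransformation problem behind constraint 2 can be made concrete with a short Python sketch. It is a hedged illustration rather than the authors' code: the names `box_cox`, `inverse_box_cox`, and `lambda_is_admissible` are hypothetical, and the admissibility check is one plausible reading of the constraint, namely that an exponent is rejected if any transformed estimate would give a non-positive base in (6).

```python
import math


def box_cox(q, lam):
    """Forward transformation (3) of a positive flow rate."""
    return math.log(q) if lam == 0 else (q ** lam - 1.0) / lam


def inverse_box_cox(y_t, lam):
    """Reverse transformation (6); y_t is a transformed estimate."""
    if lam == 0:
        return math.exp(y_t)
    base = lam * y_t + 1.0
    if base <= 0.0:
        # base == 0 is the singularity; base < 0 gives a complex result.
        raise ValueError("lam * y_t <= -1: no real retransformed flow")
    return base ** (1.0 / lam)


def lambda_is_admissible(lam_y, transformed_estimates):
    """True if every transformed estimate can be retransformed to a real flow."""
    return all(lam_y * y + 1.0 > 0.0 for y in transformed_estimates)


# Round trip on an illustrative flow of 37.5 m3/s.
print(inverse_box_cox(box_cox(37.5, -0.33), -0.33))

# With lam_y = -0.5 every transformed flow lies below -1/lam_y = 2, but a
# linear extension line can still return larger values (hypothetical numbers).
print(lambda_is_admissible(-0.5, [1.0, 1.9]))
print(lambda_is_admissible(-0.5, [1.0, 2.5]))
```

For negative exponents the forward transformation is bounded above by $-1/\lambda$, whereas a straight extension line is unbounded, which is why the difficulty arises at high base station flows.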

Constraint 3 regards the absolute difference $|\lambda_x - \lambda_y|$, which was found to closely reflect the degree of curvature of the MOVE.1 line in logarithmic space. In record extension the highest base station flows often lie above those of the concurrent period, or in a domain which is more sparsely populated than the lower discharges. In a scheme which weights each point in the concurrent period equally, the highest flow domain in the extended period thus has less influence on the selection of $\lambda$ values and represents at least to some extent an extrapolation from lower discharges. Highly curved transformations, those with large $|\lambda_x - \lambda_y|$, may accurately reflect a nonlinear relationship over most of the range of concurrent flows but can lead to large errors at high discharges, as compared to the more conservative log transformation ($\lambda_x = \lambda_y = 0$), which is linear in logarithmic space. (This is illustrated in Figure 1.) In practice, it was found that limiting $|\lambda_x - \lambda_y|$ to two increments of the set specified in constraint 1, equivalent to $|\lambda_x - \lambda_y| \le 0.34$, improved the estimation of extreme high flows, with little effect on lower flows. While this constraint is recommended in general, it is not critical, and may be relaxed, particularly when the highest 5% of flows are not of particular importance.

Table 1. Streams Used in Flow Extension Analysis

| Gage Location | USGS Station Number | Record, water years | Record Quality | Drainage Area, km2 | Mean Annual Flow, m3/s | Standard Deviation | Skew Coefficient | Kurtosis Coefficient |
|---|---|---|---|---|---|---|---|---|
| Boise River near Twin Springs | 13185000 | 1912-1995 | good | 2,150 | 33.4 | 0.911 | 0.902 | 2.66 |
| S.F. Payette River at Lowman | 13235000 | 1942-1995 | good | 9,820 | 24.2 | 0.802 | 1.04 | 3.03 |
| Salmon River below Yankee Fork near Clayton | 13296500 | 1922-1971, 1974, 1977-1991 | good | 2,080 | 28.0 | 0.761 | 1.24 | 3.56 |
| Salmon River at Salmon | 13302500 | 1914-1916, 1920-1995 | good | 9,740 | 54.7 | 0.621 | 1.30 | 4.34 |
| Johnson Creek at Yellow Pine | 13313000 | 1929-1995 | fair | 552 | 9.63 | 1.08 | 1.20 | 3.32 |
| Little Salmon River at Riggins | 13316500 | 1952-1954, 1957-1995 | fair | 1,490 | 21.9 | 0.979 | 0.695 | 2.40 |
| Salmon River at White Bird | 13317000 | 1911-1917, 1920-1995 | good | 35,090 | 314 | 0.824 | 1.19 | 3.48 |
| Selway River near Lowell | 13336500 | 1930-1995 | good | 4,950 | 104 | 1.10 | 0.697 | 2.45 |
| Lochsa River near Lowell | 13337000 | 1911-1912, 1930-1995 | good | 3,060 | 79.6 | 1.10 | 0.598 | 2.35 |
| S.F. Clearwater River at Stites | 13338500 | 1965-1995 | good | 2,980 | 28.9 | 1.02 | 0.528 | 2.23 |

The standard deviation, skew coefficient, and kurtosis coefficient are characteristics of the distribution of the logarithms of daily flow. USGS, U.S. Geological Survey; S.F., South Fork.

The pairs of Box-Cox exponents $(\lambda_x, \lambda_y)$ for this test were calculated using the maximized log likelihood technique described earlier, subject to these constraints. That is, the pairs were chosen to minimize the sum of squares of the residuals when $v$ based on the short record is regressed on $v$ based on the base record, using ordinary least squares regression, with the candidate pairs subject to constraints 1, 2, and 3. Note that $v$ is defined by (3) and (5).
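The selection procedure can be sketched as an exhaustive search over the discrete exponent set. The Python code below is an illustration under stated assumptions, not the authors' program: the names `normalized_bc`, `ols_rss`, and `select_lambdas` are hypothetical, the candidate set is taken as listed under constraint 1, constraint 3 is applied as a maximum difference of 0.34, and constraint 2 is represented only by a caller-supplied predicate `admissible`, because it depends on the extension data.

```python
import math
from itertools import product

# Candidate exponents as listed under constraint 1.
CANDIDATES = (-0.67, -0.5, -0.33, -0.25, -0.1, 0.0, 0.1, 0.25, 0.33, 0.5)


def normalized_bc(flows, lam):
    """Variable v of equation (5): transformation (3) divided by G(Q)**(lam - 1)."""
    g = math.exp(sum(math.log(q) for q in flows) / len(flows))  # geometric mean
    if lam == 0:
        return [g * math.log(q) for q in flows]
    scale = lam * g ** (lam - 1.0)
    return [(q ** lam - 1.0) / scale for q in flows]


def ols_rss(x, y):
    """Residual sum of squares when y is regressed on x by ordinary least squares."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return syy - sxy * sxy / sxx


def select_lambdas(base_c, short_c, max_diff=0.34, admissible=lambda lam_y: True):
    """Return (lam_x, lam_y, rss) minimizing the residual sum of squares."""
    best = None
    for lam_x, lam_y in product(CANDIDATES, repeat=2):
        if abs(lam_x - lam_y) > max_diff + 1e-12:  # constraint 3
            continue
        if not admissible(lam_y):  # stand-in for constraint 2
            continue
        rss = ols_rss(normalized_bc(base_c, lam_x), normalized_bc(short_c, lam_y))
        if best is None or rss < best[0]:
            best = (rss, lam_x, lam_y)
    return best[1], best[2], best[0]


# Synthetic check: a pure power law is exactly linear in log space, so the
# search should return the log transformation for both streams.
base_c = [10.0, 20.0, 50.0, 100.0, 200.0, 500.0]
short_c = [0.1 * q ** 1.2 for q in base_c]
lam_x, lam_y, rss = select_lambdas(base_c, short_c)
print(lam_x, lam_y, rss)
```

Because the candidate grid is small (at most 100 pairs), evaluating every pair is inexpensive, which matches the remark that direct calculation is simpler than an iterative algorithm.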

3. Application of Record Extension to Selected Database

The extension techniques were applied to pairs of streams with long gage records, taking one of each pair in turn as the short-record stream for an assumed concurrent period. Then order statistics of the extended and measured records were compared.

The data were drawn from U.S. Geological Survey gages on ten streams in central Idaho, all within the Snake River basin (Table 1). Discharge records are rated by the U.S. Geological Survey as good at all but two sites, where they are fair. All basins receive the bulk of their precipitation as snow. The annual hydrograph is dominated by snowmelt, which occurs from March to June. There exists some diversion for agricultural purposes in many of the basins, but irrigated lands account for no more than 4% of any basin. This test is thus representative of extensions using base stations from a reasonably homogeneous hydrologic region.

The test data may also be characterized by $r^2$ values of the base and short-record daily log-transformed mean flows during the assumed concurrent periods. Figure 2 shows the distribution of these values.

Figure 2. Box plot of squared correlation coefficients for the 162 concurrent data sets used to generate record extensions. The boxes cover the interquartile range. The line inside each box is the median. Whiskers extend to the farthest points not beyond 1.5 times the interquartile distance from the median. The horizontal lines are outliers.

3.1. Design of Tests

Each of the streams was used as both a base station and a short station for each of the others. All except the Salmon River below Yankee Fork used assumed concurrent (i.e., short-record) periods of both 1976-1985 and 1986-1995, leading to 9 × 8 × 2 = 144 pairs. Because it lacked some data in those periods, the Salmon below Yankee Fork was paired with each of the others for 1982-1991, adding another 9 × 2 pairs for a total of 162. For each pair the designated short stream was extended from the assumed concurrent period to each of the years for which an actual record existed in both the base and short stream. For example, with Salmon River at White Bird as base $x$, and Lochsa as short $y$, for a concurrent period of 1976-1985, $x_C$ and $y_C$ correspond to 1976-1985, while $x_E$ and $\hat{y}_E$ cover 1911-1912, 1930-1975, and 1986-1995.
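The count of 162 test cases follows from treating each (base, short, period) combination as ordered. As a small illustrative check in Python (the list layout and variable names are assumptions made for this sketch; the stream names are those of Table 1):

```python
from itertools import permutations

streams = [
    "Boise River near Twin Springs",
    "S.F. Payette River at Lowman",
    "Salmon River below Yankee Fork",
    "Salmon River at Salmon",
    "Johnson Creek at Yellow Pine",
    "Little Salmon River at Riggins",
    "Salmon River at White Bird",
    "Selway River near Lowell",
    "Lochsa River near Lowell",
    "S.F. Clearwater River at Stites",
]
special = "Salmon River below Yankee Fork"
others = [s for s in streams if s != special]

# Nine streams, each ordered (base, short) pair, two concurrent periods.
tests = [(b, s, p)
         for b, s in permutations(others, 2)
         for p in ("1976-1985", "1986-1995")]
general_count = len(tests)

# The remaining stream serves as base and as short station with each other
# stream, for a single concurrent period.
tests += [(special, s, "1982-1991") for s in others]
tests += [(b, special, "1982-1991") for b in others]

print(general_count, len(tests))
```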

For each extension technique, logarithms of the extended flow record, $\hat{y}_{CE} \equiv y_C \cup \hat{y}_E$, were compared with those of the measured record over the full range of exceedance probabilities. (In this paper, exceedance probability refers to the probability that a flow sampled from the test set exceeds a certain discharge.) For each of the 162 stream pairs, three techniques were used to generate the extended record: Box-Cox transformation with MOVE.1 extension, logarithmic transformation with MOVE.1 extension, and logarithmic transformation with ordinary least squares extension.

3.2. Error Metrics

Analyzing the general behaviors of extension techniques over different ranges of discharge requires a metric by which to compare the sets of extended and actual flow rates. Metrics for regression, such as standard error, are based on errors of individual points, where each error represents the difference between the extended and actual flow corresponding to a base stream discharge. This approach is not strictly applicable in this case because we are not attempting to predict individual points. However, because the extension functions are monotonic, the approach is equivalent to measuring error in flow rate percentiles corresponding to each base streamflow and is therefore appropriate.
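Comparing records at fixed exceedance probabilities can be sketched as follows. This Python example is an illustration only: the function names are hypothetical, and the linear interpolation between order statistics is an assumed convention, since the text does not state which plotting position was used.

```python
import math


def flow_at_exceedance(flows, p_exceed):
    """Flow exceeded with probability p_exceed, by linear interpolation
    between the sorted sample values (an assumed convention)."""
    s = sorted(flows)
    h = (1.0 - p_exceed) * (len(s) - 1)
    lo = int(math.floor(h))
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (h - lo) * (s[hi] - s[lo])


def log_errors(estimated, measured, probabilities):
    """Errors in ln Q between two records at matching exceedance probabilities."""
    return [
        math.log(flow_at_exceedance(estimated, p))
        - math.log(flow_at_exceedance(measured, p))
        for p in probabilities
    ]


# Hypothetical records: the estimate is uniformly twice the measured flow,
# so every error equals ln 2 regardless of the order of dates.
measured = [3.0, 8.0, 5.0, 21.0, 13.0, 34.0, 55.0]
estimated = [2.0 * q for q in reversed(measured)]
probs = [0.98, 0.75, 0.50, 0.25, 0.02]
print(log_errors(estimated, measured, probs))
```

Sorting both records before comparison reflects the point made above that the order of dates is unimportant for this application.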

An error metric requires a way to combine, or weight, the errors of individual points. The error weighting itself has two components. The manner in which the residuals are scaled, or normalized, affects the relative weight accorded different levels of discharge. Second, the functional form of the residuals affects the relative weight accorded different degrees of error. Given a specific application, the metric might be dictated by an objective function which provides different values for alternative outcomes. Otherwise, it is not possible to choose a definitively optimal metric, but nonetheless a judicious choice is helpful in assessing the particular merits of different extension techniques.

It is preferable to scale the residuals by computing errors not in flow rate $Q$, but in the logarithm of $Q$ [Hirsch, 1979]. In this way, errors are appropriately weighted relative to the discharge level. Otherwise, the metric would be dominated by the largest discharges in the largest streams. In addition, this transformation tends to improve the normality of discharge histograms, which generally exhibit strongly positive skew [Hirsch, 1982]. Therefore this study calculates errors in terms of $\ln Q$.

The two most common alternatives for the functional dependence of the metric on these errors are to take their absolute values (first order) or squares (second order). The latter is probably more common, employed in the familiar root-mean-squared error (rmse). The use of rmse stems in large part from its identity as the standard error of estimate in linear regression, for which it is useful if the residuals are normally distributed. Second-order metrics are very sensitive to large errors and far more susceptible to domination by a small fraction of outliers than are first-order metrics.

For this study, both first- and second-order metrics were calculated. Their implications are equivalent. It was judged that the first-order metric provided a more general error assessment and a more intuitive interpretation, since use of $\ln Q$ and absolute residuals leads to the mean multiplicative error (MME) [Moog and Jirka, 1998]:

$$\mathrm{MME} = \exp\left[ \frac{1}{N} \sum_{i=1}^{N} \left| \ln \hat{y}_i - \ln y_i \right| \right], \tag{7}$$

where $N$ is the number of estimated data to be averaged. The MME is equal to the geometric mean of $(\hat{y}/y)$ and thus may be regarded as the mean factor by which an estimate is "off," that is, the mean factor, greater than or equal to unity, by which an estimated point would have to be multiplied or divided to equal the true value. Equivalently, $\ln \mathrm{MME}$ is the mean absolute deviation of the logarithms of the flows. The second-order metrics included both the rmse of the logarithms and the chi-squared error of the logarithms, in which the squared residuals are divided by the true $\ln Q$.
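A worked example may help fix the interpretation of (7). In this Python sketch (function names and flow values are hypothetical), one estimate is twice the true flow and the other is half of it, so each is "off" by a factor of 2 and the MME is 2; the rmse of the logarithms is shown alongside for comparison.

```python
import math


def mme(estimated, true):
    """Mean multiplicative error, equation (7)."""
    total = sum(abs(math.log(e) - math.log(t)) for e, t in zip(estimated, true))
    return math.exp(total / len(true))


def rmse_log(estimated, true):
    """Root-mean-squared error of the natural logarithms (second-order metric)."""
    total = sum((math.log(e) - math.log(t)) ** 2 for e, t in zip(estimated, true))
    return math.sqrt(total / len(true))


true_q = [10.0, 10.0]      # hypothetical measured flows, m3/s
est_q = [20.0, 5.0]        # one overestimate and one underestimate, both by a factor of 2

print(mme(est_q, true_q))       # mean factor by which estimates are off
print(rmse_log(est_q, true_q))  # equals ln 2 here, since both log errors have magnitude ln 2
```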

Another metric, the normalized mean error or NME [Wilson and Macleod, 1974],

$$\mathrm{NME} \equiv \frac{1}{N} \sum_{i=1}^{N} \frac{\hat{y}_i - y_i}{y_i}, \tag{8}$$

uses the true value rather than the logarithm as a scaling factor and dispenses with the absolute values so that positive and negative errors cancel, leading to a measure of bias. Positive NME indicates overprediction, while negative NME indicates underprediction.
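The cancellation of signed errors in (8) can be illustrated with a few lines of Python. The function name `nme` and the flow values are hypothetical and serve only as a sketch.

```python
def nme(estimated, true):
    """Normalized mean error, equation (8): mean of signed relative errors."""
    return sum((e - t) / t for e, t in zip(estimated, true)) / len(true)


true_q = [10.0, 10.0]  # hypothetical measured flows, m3/s

# Errors of +20% and -20% cancel, indicating no bias.
print(nme([12.0, 8.0], true_q))

# A factor-of-2 overestimate (+100%) outweighs a factor-of-2 underestimate
# (-50%), so the NME is positive even though the multiplicative errors match.
print(nme([20.0, 5.0], true_q))
```

The second case shows that NME, being scaled by the true value rather than computed on logarithms, is not symmetric with respect to multiplicative over- and underprediction.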

4. Accuracy of Flow Rate Estimates

This section compares the accuracy of flow extension using the different transformation and extension equations presented earlier. The notation follows that defined earlier; specifically, "ln/ols" is extension by ordinary least squares regression (equation (1)) using natural log transformations of the flow rate; "ln/MOVE.1" is extension by the MOVE.1 technique (equation (2)) using natural log transformations (or, equivalently, (3) with λ = 0 and (4)); and "BC/MOVE.1" is MOVE.1 with Box-Cox transformations of the flow rates (equations (3) and (4)).
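The three labeled techniques can be sketched in a few lines. Equations (1) through (4) are not reproduced in this section, so the sketch assumes their standard textbook forms: the unscaled Box-Cox transform (Q^λ - 1)/λ, the ols line through the concurrent-period means with slope r·s_y/s_x, and the MOVE.1 line with slope s_y/s_x (positive correlation assumed). All function names, parameter names, and data values are hypothetical.

```python
import math
import statistics


def box_cox(q, lam):
    """Box-Cox power transform; reduces to ln(q) when lam == 0."""
    return math.log(q) if lam == 0 else (q ** lam - 1.0) / lam


def inverse_box_cox(z, lam):
    """Back-transform; for lam != 0 this requires lam * z + 1 > 0."""
    return math.exp(z) if lam == 0 else (lam * z + 1.0) ** (1.0 / lam)


def fit_extension(x, y, method="move1"):
    """Return (intercept, slope) fitted to transformed concurrent flows."""
    if method not in ("move1", "ols"):
        raise ValueError("method must be 'move1' or 'ols'")
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sx, sy = statistics.stdev(x), statistics.stdev(y)
    slope = sy / sx  # MOVE.1 slope (assumes positive correlation)
    if method == "ols":
        cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)
        slope *= cov / (sx * sy)  # multiply by the correlation coefficient r
    return my - slope * mx, slope


def extend(base_concurrent, short_concurrent, base_extra,
           lam_x=0.0, lam_y=0.0, method="move1"):
    """Estimate short-record flows for the extra base-station flows."""
    x = [box_cox(q, lam_x) for q in base_concurrent]
    y = [box_cox(q, lam_y) for q in short_concurrent]
    a, b = fit_extension(x, y, method)
    return [inverse_box_cox(a + b * box_cox(q, lam_x), lam_y) for q in base_extra]


# Hypothetical concurrent records and two base-station flows to extend from.
base_concurrent = [10.0, 20.0, 40.0, 80.0, 160.0]
short_concurrent = [5.0, 12.0, 18.0, 45.0, 70.0]
base_extra = [15.0, 300.0]

ln_ols = extend(base_concurrent, short_concurrent, base_extra, method="ols")
ln_move1 = extend(base_concurrent, short_concurrent, base_extra)
bc_move1 = extend(base_concurrent, short_concurrent, base_extra,
                  lam_x=0.2, lam_y=0.2)
```

The exponents 0.2 are placeholders; the study selects them by the analytical technique described earlier, which is not reproduced here.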

4.1. General Flows

4.1.1. MOVE versus ols. Figure 3 illustrates the mean multiplicative error exhibited by the three extension techniques in estimating the flow rates having exceedance probabilities from 98 to 2%. Overall, ln/MOVE.1 exhibits smaller MME than ln/ols for flows between 98 and 6% exceedance probability. Hirsch [1982] found that the MOVE techniques removed the bias toward overprediction of extreme low monthly flows exhibited by ols. Comparison of ln/MOVE.1 with ln/ols in Figure 4 shows effective bias correction for daily flows having exceedance probability greater than about 10%. The plot clearly depicts ols overestimation of low flows and underestimation of high flows, as would be expected from its tendency to produce an extended record with too low a variance.
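The variance deficit of ols can be made explicit under the assumption that equations (1) and (2) take their standard forms. Here x and y denote the transformed base and short-record flows; m_x, m_y, s_x, s_y, and r denote the concurrent-period sample means, standard deviations, and correlation coefficient. These symbols are introduced for illustration only.

```latex
\begin{aligned}
\text{ols:}\quad & \hat{y} = m_y + r\,\frac{s_y}{s_x}\,(x - m_x)
  &&\Rightarrow\quad \operatorname{Var}(\hat{y}) = r^2\,\frac{s_y^2}{s_x^2}\,\operatorname{Var}(x), \\
\text{MOVE.1:}\quad & \hat{y} = m_y + \frac{s_y}{s_x}\,(x - m_x)
  &&\Rightarrow\quad \operatorname{Var}(\hat{y}) = \frac{s_y^2}{s_x^2}\,\operatorname{Var}(x).
\end{aligned}
```

If the base-station variance over the extension period is close to s_x^2, the ols estimates have variance near r^2 s_y^2, which falls below s_y^2 whenever |r| < 1, whereas the MOVE.1 estimates have variance near s_y^2. Both lines pass through (m_x, m_y), so the flatter ols line lies above the MOVE.1 line at low flows and below it at high flows.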

The full analysis was also performed using MOVE.2 extension with both Box-Cox and logarithmic transformations. MOVE.2 was found to be slightly more accurate than MOVE.1 at all flow rates, though not enough to compel its use if MOVE.1 is much more convenient. Because their predictions are very close, conclusions regarding MOVE.1 may be applied equally to MOVE.2.


[Figure 3: curves for BC/MOVE.1, ln/MOVE.1, and ln/ols plotted against exceedance probability, %, from 96 (low flows) to 4 (high flows).]

Figure 3. Mean multiplicative error of the tested extension techniques in estimating discharges of various exceedance probabilities.

4.1.2. Box-Cox versus Logarithmic Transformations. Comparing the MOVE techniques using logarithmic (ln/MOVE.1) and Box-Cox (BC/MOVE.1) transformations, Figure 3 shows that the choice of transformation may affect accuracy as much as does the choice of regression. Because the natural logarithm is simply a particular case of the Box-Cox transformation, one might expect the more flexible tool to produce better results. Figure 3 shows this to be the case, at least for flows of exceedance probability greater than 2%. The Box-Cox transformations lead to smaller errors in ranges over which MOVE.1 showed modest or no improvement over ols, using log transformations, while retaining the improvement in the 22 to 8% range.

Figure 4 shows that despite the lower mean errors, the Box-Cox transformations reintroduce some bias, but it is probably negligible for flows smaller than those having a 10% exceedance probability. The magnitude of the BC/MOVE.1 bias reaches that of ln/ols at 2% exceedance probability.

4.2. Extreme Flows

Figures 3 and 4 do not contain information on flows smaller than the 98% exceedance flow or larger than the 2% exceedance flow. These extreme flows are more difficult to estimate accurately, in part because the magnitudes of the discharges may lie outside the range of the concurrent period or in a more sparsely populated domain. In addition, the hydrologic relationship between the streams may differ from that for more moderate flows, a particular problem for functions such as MOVE.1 and ols, which linearly extend the overall trend to more extreme flows. For applications in which extreme flows are of primary importance, other techniques may be preferable [Stedinger et al., 1993]. Nonetheless, extreme flows may be important in cases of record extension, and it is useful to examine how well they are estimated by the techniques under discussion.

[Figure 4: curves for BC/MOVE.1, ln/MOVE.1, and ln/ols plotted against exceedance probability, %, from 96 (low flows) to 4 (high flows).]

Figure 4. "Normalized mean error" of the extension techniques, indicating bias.

[Figure 5: curves for BC/MOVE.1, ln/MOVE.1, and ln/ols plotted against exceedance probability, %, with tick marks at 99.99, 99.5, 99, 97, 95, 90, 70, 50, 30, 10, 5, 3, 1, 0.5, 0.1, and 0.01 (low flows to high flows).]

Figure 5. Mean error of the tested extension techniques for a selected set of exceedance probabilities (indicated by axis tick marks), emphasizing extreme flows.

Mean errors for a set of exceedance probabilities emphasizing extreme discharges are plotted in Figure 5. The improvement afforded by the BC/MOVE.1 technique extends down to the minimum flow rate in the extended record, but ols exhibits lower mean errors for the 2% highest flows. Figure 6 shows that the ols bias tends toward zero for exceedances below 1%, while that of BC/MOVE.1 continues to increase. Thus the daily flows from MOVE.1 exhibit the same bias improvement at low levels that Hirsch [1982] observed for monthly flows, but it is not observed for the 1% highest flows.

[Figure 6: curves for BC/MOVE.1, ln/MOVE.1, and ln/ols plotted against exceedance probability, %, with tick marks at 99.99, 99.5, 99, 97, 95, 90, 70, 50, 30, 10, 5, 3, 1, 0.5, 0.1, and 0.01 (low flows to high flows).]

Figure 6. Bias of the tested extension techniques for a selected set of exceedance probabilities (indicated by axis tick marks), emphasizing extreme flows.

[Figure 7: one panel per exceedance probability, each with box plots for BC/MOVE.1, ln/MOVE.1, and ln/ols.]

Figure 7. Box plots showing the distribution of errors in estimating the logarithms of discharges, lnQ, having exceedance probabilities of (a) 98%, (b) 80%, (c) 50%, (d) 20%, and (e) 2%.

That the BC/MOVE.1 technique performs better at extreme low flows than at extreme high flows is largely because of the method used to select the exponents λ. The analytical technique described earlier weights each point equally, but, even after log transformation, low flows are much more common than high flows. Conversely, we found graphical selection to produce much more similar errors at high and low flows, presumably because the human eye reacts more strongly to the overall point pattern on a scatterplot and less strongly to its density. Thus an analyst selecting Box-Cox exponents may wish to use a graphical technique, adjust the way in which points are weighted, or select different values for different flow domains, depending on the relative importance of each domain to the application.

It should be noted that large errors tend to be associated with poorly matched pairs of base and short-record stations, characterized by low correlation coefficients. By using every combination of the test streams in this analysis, we are including pairs which a judicious analyst would be very unlikely to employ. These are characterized by the lower correlation coefficients in Figure 2. Thus the prevalence of large errors at high flows is probably overemphasized in this study.

4.3. Error Distributions

Examination of the spread of the errors via box plots aids in understanding the causes of the test results. Figure 7 shows the errors of estimate of lnQ at several exceedance probabilities. Each box plot represents 162 values, one for each stream pair.

At median flows the three techniques exhibit roughly equal mean error (Figure 3), and Figure 7c shows the spread of errors to be similar, with a slightly smaller interquartile range for ols offset by its greater bias. However, this case is atypical; at other discharges the error spread is smallest for BC/MOVE.1. In those cases the ln/MOVE.1 box plots show that the main effect of MOVE.1 is to remove the bias exhibited by ols, while the BC/MOVE.1 curve shows that further improvement using the Box-Cox transformation arises from its reduced error spread.

At 2% exceedance probability the mean errors are again similar (Figure 5), because the reduced interquartile range of BC/MOVE.1 is offset by a greater tendency toward positive outliers, as seen in Figure 7e. This trend grows at higher flows, so that while the MOVE.1 techniques (BC/MOVE.1, in particular) usually produce better estimates than ln/ols, even at extreme high flows, large overpredictions are much more likely. This result reflects the risk of extrapolating a relationship based on the generally more limited concurrent flow domain, suggesting that the less sensitive ols technique may lend a desired conservatism above 2% exceedance. It also suggests that a hybrid technique or an entirely different approach (e.g., parametric; Stedinger et al. [1993]) may be desirable, if extreme-flow estimation is a primary focus.

Each box plot in Figure 8 covers 162 points representing the set of mean multiplicative errors for each stream pair across all the exceedance probabilities (98, 96, 94, ..., 2%). These box plots confirm that BC/MOVE.1 produces smaller errors for the great majority of stream pairs, as indicated by its smaller interquartile range and span. It has more outliers, relatively speaking, though they are still smaller than when using ln/ols. Some stream pairs are simply poorly matched and less prone to improvement.

5. Application

In selecting a flow record extension technique, one should consider which range of flows is salient to the intended application. The preceding analysis indicated that use of the Box-Cox transformation, with the described analytical technique for selecting exponents, may lead to large errors in the 2% highest flows in some cases.

[Figure 8: box plots for BC/MOVE.1, ln/MOVE.1, and ln/ols.]

Figure 8. Box plot showing the distribution of the mean error of discharge over all exceedance probabilities. Each box plot contains one value for each stream pair.


[Figure 9: horizontal axis is percentage of sediment, from 0 to 100.]

Figure 9. Mean error of the tested extension techniques in estimating discharges below which various percentages of total bed load are transported.

An important class of problems which is unlikely to depend on extreme flows is that involving cumulative transport over time, for example, transport of water, sediment, or pollutants. We would like to test BC/MOVE.1 and other techniques for such a problem. It is widely recognized that flows of sufficiently low exceedance probability contribute little to the total bed load transport over time, despite their high discharges [Wolman and Miller, 1960; Andrews, 1980; Whiting et al., 1999].

As a test application, we therefore select the question of what discharge levels are needed to transport various fractions of the total bed load. This knowledge is vital to assessing the effects of water withdrawals or climate change on a stream's ability to maintain its channel. We start with a record of daily flow rates, either real or extended. A set of bed load values is calculated for each day using a sediment rating, that is, an equation relating discharge to bed load transport rate. A cumulative bed load transport curve is generated by taking a set of daily flow rates covering the observed range and, for each of these flow rates, summing the bed load transported on all days having an equal or lesser flow rate. The resulting curve tells how much of the total bed load is transported by flows at or below a selected flow rate.
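The construction of the cumulative bed load transport curve can be sketched as follows. This is a minimal illustration: the function name is hypothetical, the rating is passed in as an arbitrary callable, and the power law and flow values in the example are placeholders.

```python
def cumulative_bedload_curve(daily_flows, rating):
    """Return (flows, fractions): for each distinct flow rate, the fraction of
    total bed load transported on days with an equal or lesser flow rate."""
    flows = sorted(daily_flows)
    total = sum(rating(q) for q in flows)
    if total <= 0:
        raise ValueError("total bed load must be positive")
    curve_flows, curve_fractions = [], []
    running = 0.0
    for q in flows:
        running += rating(q)
        if curve_flows and curve_flows[-1] == q:
            # Tied flow: days with an equal flow belong to the same curve point.
            curve_fractions[-1] = running / total
        else:
            curve_flows.append(q)
            curve_fractions.append(running / total)
    return curve_flows, curve_fractions


# Hypothetical record of four daily flows and a placeholder power-law rating.
flows, fractions = cumulative_bedload_curve([2.0, 1.0, 3.0, 2.0], lambda q: q ** 2)
```

In this example the loads are 1, 4, 4, and 9 (total 18), so the curve passes through 1/18 at a flow of 1, 9/18 at a flow of 2, and 18/18 at a flow of 3.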

In a calculation of cumulative bed load transport a realistic record of flow rates is important. For consistency with the preceding analysis we reverse the question and ask not how accurate the predictions of sediment transport are, but how accurately we have estimated the set of discharges $Q_B$ below which certain fractions of the total bed load are transported. By comparing a set of $Q_B$ generated using an extended flow record to a set generated using measured flows, one may calculate the mean error arising from use of a specific extension technique.
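One way to carry out this comparison for a single stream pair is sketched below. The function names, the choice of fractions, the placeholder rating, and the synthetic "extended" record (every flow doubled) are all assumptions for illustration; averaging over fractions here stands in for whatever averaging a given analysis requires.

```python
import math


def discharge_at_fraction(daily_flows, rating, fraction):
    """Smallest recorded flow at or below which at least `fraction`
    of the total bed load is transported."""
    flows = sorted(daily_flows)
    total = sum(rating(q) for q in flows)
    running = 0.0
    for q in flows:
        running += rating(q)
        if running >= fraction * total:
            return q
    return flows[-1]  # guards against round-off when fraction is 1.0


def qb_mean_multiplicative_error(measured, extended, rating, fractions):
    """MME between discharges from the extended and measured records."""
    abs_log_errors = [
        abs(math.log(discharge_at_fraction(extended, rating, f))
            - math.log(discharge_at_fraction(measured, rating, f)))
        for f in fractions
    ]
    return math.exp(sum(abs_log_errors) / len(abs_log_errors))


# Hypothetical records: the "extended" record overstates every flow by a factor of 2.
measured = [float(i) for i in range(1, 11)]
extended = [2.0 * q for q in measured]


def placeholder_rating(q):
    return q ** 2  # placeholder power law, not a calibrated rating


qb_median = discharge_at_fraction(measured, placeholder_rating, 0.5)
mme_qb = qb_mean_multiplicative_error(measured, extended, placeholder_rating,
                                      [0.2, 0.5, 0.8])
```

Because every flow in the synthetic extended record is doubled, each discharge is doubled as well, and the resulting MME is 2.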

These test calculations require bed load sediment ratings. Because the available ratings for the test streams are themselves estimates of varying accuracy, and because we wish to test only the extension techniques, we adopt an identical, typical, but fictitious sediment rating for each test stream, specifically,

$$ Q_s \propto Q^{2.5}, \tag{9} $$

where Q is the flow rate and $Q_s$ is the bed load transport rate [Whiting et al., 1999]. The rating includes no coefficient because any coefficient would have no effect on calculations of the dimensionless bed load fractions.
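The cancellation of the coefficient can be written out in one line. The symbols a (a hypothetical rating coefficient), Q_i (the flow on day i), and F (the cumulative bed load fraction at or below a given discharge) are introduced here for illustration only.

```latex
F(Q_B) \;=\; \frac{\sum_{i:\,Q_i \le Q_B} a\,Q_i^{2.5}}{\sum_{i} a\,Q_i^{2.5}}
       \;=\; \frac{\sum_{i:\,Q_i \le Q_B} Q_i^{2.5}}{\sum_{i} Q_i^{2.5}}
```

The coefficient appears in every term of both sums and so divides out, leaving the fractions dependent only on the exponent and the flow record.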

Figures 9 and 10 illustrate the error in values of $Q_B$ corresponding to 5-95% of the total bed load transport. In Figure 9, mean multiplicative error in estimating $Q_B$ is plotted as a function of the corresponding transported fraction of bed load. The plot generally reflects the discharge error seen in Figure 3, with BC/MOVE.1 improving upon ln/ols where ln/MOVE.1 did not, and exhibiting more consistent accuracy over the range of sediment fractions. Like Figures 7 and 8, box plots of error in ln$Q_B$ (Figure 10) reflect the bias improvement of MOVE.1 and, in ranges where BC/MOVE.1 improves upon ln/MOVE.1 (Figure 10b), the reduced error spread obtained by use of the Box-Cox transformation. A plot of overall MME values for each stream pair (not shown) appears virtually identical to that for the flow rate order statistics (Figure 8).

Among the tested extension equations, MOVE.1 (equation (2)), using Box-Cox-transformed flow rates (equation (3)), provides the most consistent accuracy in predicting discharges needed to carry specified fractions of the total load. This result supports our conclusion that this technique is an improvement on log-transformed regression, for cumulative transport calculations in particular.

6. Conclusions

Both the MOVE.1 extension equation and the use of Box-Cox variable transformations improved mean daily flow record extension using a single base station, resulting in improved estimates of the flows needed to transport cumulative bed load fractions. They succeeded by reducing bias and better approximating the variance of the extended record, and by more accurately conforming to the relationships between pairs of streams. Though more modest for poorly matched streams, the improvement is observed across the full set of pairings and discharges. Therefore the Box-Cox/MOVE.1 technique may be recommended generally for streamflow record extension. However, the risk of large errors for the 2% highest flows is greater when using Box-Cox/MOVE.1 with Box-Cox exponents based on equal weights for each discharge pair in the concurrent record. For applications in which these high flows are of particular importance, an analyst selecting exponents may wish to use a technique which weights high flows more strongly (such as graphical selection) or to select different exponents in this domain. A more conservative approach for high flows is to use least squares regression with log-transformed discharges, which generally avoids large errors but produces a record with excessively low variance. However, the risk of large errors using Box-Cox transformations with MOVE.1 is small with a carefully chosen base station.

[Figure 10: one panel per sediment fraction, each with box plots for BC/MOVE.1, ln/MOVE.1, and ln/ols.]

Figure 10. Box plots showing the distribution of errors in estimating the logarithms of discharges below which (a) 20%, (b) 50%, and (c) 80% of total bed load is transported.

References

Alley, W. M., and A. W. Burns, Mixed-station extension of monthly streamflow records, J. Hydrol. Eng., 109, 1272-1284, 1983.

Andrews, E. D., Effective and bankfull discharges in the Yampa basin, Colorado and Wyoming, J. Hydrol., 46, 311-330, 1980.

Box, G. E. P., and D. R. Cox, An analysis of transformations, J. R. Stat. Soc., Ser. B, 26, 211-246, 1964.

Box, G. E. P., and P. W. Tidwell, Transformations of the independent variables, Technometrics, 4, 531-550, 1962.

Clarke, R. T., Statistical Modelling in Hydrology, John Wiley, New York, 1994.

Cook, R. D., and S. Weisberg, Residuals and Influence in Regression, Chapman and Hall, New York, 1982.

Cook, R. D., and S. Weisberg, An Introduction to Regression Graphics, John Wiley, New York, 1994.

Draper, N., and H. Smith, Applied Regression Analysis, 2nd ed., John Wiley, New York, 1981.

Hirsch, R. M., Synthetic hydrology and water supply reliability, Water Resour. Res., 15, 1603-1615, 1979.


Hirsch, R. M., A comparison of four streamflow record extension techniques, Water Resour. Res., 18, 1081-1088, 1982.

Hirsch, R. M., D. R. Helsel, T. A. Cohn, and E. J. Gilroy, Statistical treatment of hydrologic data, Handbook of Hydrology, edited by D. R. Maidment, chap. 17, pp. 17.1-17.55, McGraw-Hill, New York, 1993.

Matalas, N. C., and B. Jacobs, A correlation procedure for augmenting hydrologic data, U.S. Geol. Surv. Prof. Pap., 434-E, 1964.

Moog, D. B., and G. H. Jirka, Analysis of reaeration equations using mean multiplicative error, J. Environ. Eng., 124, 104-110, 1998.

Searcy, J. K., Flow-duration curves, Manual of Hydrology: Part 2, Low-Flow Techniques, U.S. Geol. Surv. Water Supply Pap., 1542-A, 25 pp., 1959.

Stedinger, J. R., R. M. Vogel, and E. Foufoula-Georgiou, Frequency analysis of extreme events, Handbook of Hydrology, edited by D. R. Maidment, chap. 18, pp. 18.1-18.66, McGraw-Hill, New York, 1993.

Vogel, R. M., and J. R. Stedinger, Minimum variance streamflow record augmentation procedures, Water Resour. Res., 21, 715-723, 1985.

Whiting, P. J., J. F. Stamm, D. B. Moog, and R. L. Orndorff, Sediment transporting flows in headwater streams, Geol. Soc. Am. Bull., in press, 1999.

Wilson, G. T., and N. Macleod, A critical appraisal of empirical equations and models for the prediction of the coefficient of reaeration of deoxygenated water, Water Res., 8, 341-366, 1974.

Wolman, M. G., and J. Miller, Magnitude and frequency of forces in geomorphic processes, J. Geol., 68, 54-74, 1960.

D. B. Moog and P. J. Whiting, Department of Geological Sciences, Case Western Reserve University, 10900 Euclid Avenue, Cleveland, OH 44106-7216. (e-mail: [email protected])

R. B. Thomas, 14851 SE Michael Court, Sherwood, OR 97140.

(Received October 14, 1997; revised September 8, 1998; accepted September 8, 1998.)