representation of daily rainfall distributions using normalized rainfall curves




INTERNATIONAL JOURNAL OF CLIMATOLOGY, VOL. 16, 1157-1163 (1996) 551.521 . I l(4)

REPRESENTATION OF DAILY RAINFALL DISTRIBUTIONS USING NORMALIZED RAINFALL CURVES

IAN T. JOLLIFFE AND PETER B. HOPE Department of Mathematical Sciences, University of Aberdeen, Edward Wright Building, Dunbar Street, Aberdeen AB9 2TY, UK

email: itj@maths.abdn.ac.uk

Received 22 February 1995 Accepted 12 December 1995

ABSTRACT

For daily rainfall the normalized rainfall curve (NRC) provides a plot of the cumulative percentage of rain days (y) against the cumulative percentage of rain amount (x). It is not immediately clear whether the equations that have been widely used to represent the relationship between x and y correspond to a valid probability distribution for daily rainfall amounts. We show that such distributions exist, but that they are truncated, i.e. not all positive values of rainfall have non-zero probability. In practice, the truncation may not be too important, but the form of the equations relating x and y is also determined for a number of well-known probability distributions that are not truncated. The fit of the corresponding NRCs to some published rainfall data is examined.

KEY WORDS: daily rainfall; exponential distribution; gamma distributions; normalized rainfall curves; Weibull distributions

1. INTRODUCTION

Normalized rainfall curves (NRCs) provide a graphical means of representing the observed frequency distribution of daily rainfall amounts at any given rainfall station. The cumulative percentage (or proportion) of raindays, y, is plotted against the cumulative percentage (or proportion) of rainfall amounts, x. The NRCs have been used to display rainfall distributions by a number of authors. Here we focus on their use in the paper by Ananthakrishnan and Soman (1989), subsequently referred to as AS. A number of further references can be found in AS. In AS the relationship between x and y is estimated by

$$x = y\exp[-b(100 - y)^c], \qquad (1)$$

for suitably chosen constants b and c. This relationship is an adaptation of an earlier expression

$$x = ay\exp(by), \qquad (2)$$

where a and b are constants with a = exp(-b), which dates back to Olascoago (1950). Equation (1) is shown by AS to give a good fit to data from various Indian rainfall stations. It is therefore of interest to investigate to what probability distributions for rainfall amounts equation (1) corresponds.

In section 2 we demonstrate that equations (1) and (2) correspond to probability distributions that are truncated in the following sense. There is a lower bound $l_1$ to the possible values for rainfall amounts when equation (1) is valid, and upper and lower bounds $u_2$ and $l_2$ for rainfall amounts in the case of equation (2). Although it appears that this truncation may often not be of practical importance, it is desirable in some circumstances to have probability distributions that allow all non-zero values of x. For this reason we investigate, in section 3, the form of NRCs for some standard probability distributions. The results are mixed: results can be derived for the exponential distribution and for some Weibull distributions, and we show how the corresponding NRCs can be fitted to some of AS's Indian data in section 4. However, there seems to be no readily available analytical solution for the equation relating x and y for general gamma distributions, a class of distributions that is often used to model rainfall data.

CCC 0899-8418/96/101157-07 © 1996 by the Royal Meteorological Society


2. PROBABILITY DISTRIBUTIONS CORRESPONDING TO NRCs

Using the notation of AS, let $r_1, r_2, \ldots, r_N$ denote non-zero rainfall amounts observed on N days at a rainfall station, arranged in ascending order;

$$R = \sum_{i=1}^{N} r_i$$

is the total observed rainfall. Then $x_k$ and $y_k$ are defined as

$$x_k = \sum_{i=1}^{k} r_i / R, \qquad y_k = k/N, \qquad k = 1, 2, \ldots, N.$$

The plotted values $(x_k, y_k)$, k = 1, 2, ..., N, give the observed NRC for the station. Note that AS express $x_k$ and $y_k$ as percentages (i.e. they multiply both quantities defined above by 100). It is more convenient in what follows, and completely equivalent, to treat them as proportions.
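The construction of the observed NRC can be sketched in a few lines of Python. This is only an illustration of the definitions of $x_k$ and $y_k$ given here: the function name `empirical_nrc` and the six daily amounts are hypothetical, not taken from AS.

```python
def empirical_nrc(rain):
    """Return (xs, ys): cumulative proportions of rain amount and of rain days.

    Zero amounts are dropped and the rest sorted in ascending order,
    following the definitions of x_k and y_k (as proportions).
    """
    r = sorted(v for v in rain if v > 0)
    n = len(r)
    total = sum(r)                      # R, the total observed rainfall
    xs, ys = [], []
    running = 0.0
    for k, amount in enumerate(r, start=1):
        running += amount
        xs.append(running / total)      # x_k = (r_1 + ... + r_k) / R
        ys.append(k / n)                # y_k = k / N
    return xs, ys


# Hypothetical daily amounts in mm (one dry day included on purpose).
xs, ys = empirical_nrc([12.0, 0.5, 3.0, 0.0, 40.0, 7.5])
for x, y in zip(xs, ys):
    print(f"y = {y:.2f}  x = {x:.4f}")
```

Plotting `ys` against `xs` gives the observed curve; multiplying both lists by 100 recovers the percentage form used by AS.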

Suppose that the rainfall amounts are observations on a random variable U, which has probability density function (p.d.f.) f(u), u > 0, and cumulative distribution function (c.d.f.)

$$F(u) = \int_0^u f(v)\,dv.$$

Let u be a particular value for U, and consider the relationship between y, the probability of a value of U no greater than u, and x, the proportion of total rainfall accounted for by falls no greater than u. We have

$$y = \int_0^u f(v)\,dv = F(u) \qquad (3)$$

and

$$x = \frac{\int_0^u v f(v)\,dv}{\int_0^\infty v f(v)\,dv}. \qquad (4)$$

In equation (4) the denominator represents the overall mean rainfall, $\mu$, and the numerator is the contribution to the mean of falls not exceeding u. From equation (3), $u = F^{-1}(y)$, and substituting in equation (4) gives

$$x = \frac{1}{\mu}\int_0^{F^{-1}(y)} v f(v)\,dv. \qquad (5)$$

Note that the right-hand side of equation (5) can be written as

$$\frac{1}{\mu}\int_0^{y} F^{-1}(w)\,dw,$$

where w = F(v), so dw = f(v)dv and $v = F^{-1}(w)$. Hence, differentiating both sides of equation (5) with respect to y gives

$$\frac{dx}{dy} = \frac{F^{-1}(y)}{\mu} \qquad (6)$$

and

$$\frac{d^2x}{dy^2} = \frac{1}{\mu f[F^{-1}(y)]}. \qquad (7)$$

If we assume that rainfall can take all positive values we have $F^{-1}(0) = 0$ and as $\mu$ is finite it follows that

$$\frac{dx}{dy} = 0 \quad \text{when } y = 0. \qquad (8)$$


If equation (2) holds, x = ay exp(by) and

$$\frac{dx}{dy} = a(1 + by)\exp(by),$$

which cannot be zero when y = 0 (unless a = 0). Similarly, if equation (1) holds then equation (8) cannot be satisfied.

At first sight, this implies that equations (1) and (2) do not correspond to any probability distribution, but this is not the case if we relax the requirement that all positive values of v are possible.

From equation (1), expressed in terms of proportions rather than percentages, that is $x = y\exp[-b(1-y)^c]$, we find

$$\frac{dx}{dy} = \exp[-b(1-y)^c]\,[1 + bcy(1-y)^{c-1}] \qquad (9)$$

and

$$\frac{d^2x}{dy^2} = bc(1-y)^{c-2}\exp[-b(1-y)^c]\,[2(1-y) - (c-1)y + bcy(1-y)^c]. \qquad (10)$$

The corresponding results for equation (2) are obtained by setting c = 1 in equations (9) and (10). Equation (6) can be rewritten as

$$F^{-1}(y) = \mu\frac{dx}{dy}, \qquad (11)$$

but y = 0 corresponds to the smallest possible value of the random variable for which the distribution function is F. Similarly, y = 1 corresponds to the largest possible value of that random variable. When y = 0, we have from equation (9) that $\mu(dx/dy) = \mu\exp(-b)$, and for y = 1, $\mu(dx/dy) = \mu(1 + b)$.

Hence equation (1) implies that rainfall is restricted to the range $[\mu\exp(-b), \mu(1 + b)]$. If equation (2) is used, the same lower bound applies, but there is no upper bound.

Equations (6), (7), (9), and (10) do not allow us to find a formula for the corresponding probability distribution f(u). However, substituting equation (6) into (7) gives

$$f\left(\frac{dx}{dy}\right) = \left[\frac{d^2x}{dy^2}\right]^{-1}, \qquad (12)$$

where we have taken $\mu = 1$ because the shape of the NRC, which is based on percentages or proportions, is independent of $\mu$. By letting y run from 0 to 1 in small steps, and calculating corresponding values of dx/dy and $d^2x/dy^2$ from equations (9) and (10), we can use equation (12) to plot the function f(u), which is monotone decreasing over all values of u in the range [exp(-b), (1 + b)].
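The stepping procedure just described can be sketched as follows. This is a minimal illustration restricted to the case c = 1, where the curve is x = y exp[-b(1 - y)], so that dx/dy = exp[-b(1 - y)](1 + by) and d²x/dy² = b exp[-b(1 - y)](2 + by). The function name `pdf_trace` and the step count are assumptions made for the example, and b = 3.10 is used purely as an illustrative value.

```python
import math


def pdf_trace(b, steps=200):
    """Trace (u, f(u)) pairs from equation (12) for x = y*exp(-b*(1 - y)), mean 1."""
    points = []
    for i in range(steps + 1):
        y = i / steps
        e = math.exp(-b * (1.0 - y))
        u = e * (1.0 + b * y)          # u = dx/dy, from equation (11) with mean 1
        d2 = b * e * (2.0 + b * y)     # second derivative of x with respect to y
        points.append((u, 1.0 / d2))   # f(u) = 1 / (d2x/dy2), equation (12)
    return points


pts = pdf_trace(3.10)
print("smallest u:", pts[0][0], " largest u:", pts[-1][0])
```

For this case the traced values of u run from exp(-b) to 1 + b, and the traced density falls steadily as u increases.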

3. NRCs FOR SOME STANDARD PROBABILITY DISTRIBUTIONS

We shall see that in the examples of section 4 the limits on the range of rainfall amounts implied by equations (1) and (2) may not be too restrictive. However, it is of interest to investigate the form of NRCs for some standard probability distributions that have been used to model rainfall amounts and which allow all possible positive values. Equation (5) expresses x in terms of y, but with the inverse distribution function $F^{-1}(y)$ appearing as a limit of integration it is not in a very useful form. For some probability distributions it is possible to use equation (5), or its equivalent form equation (6), to obtain an explicit expression for x in terms of y. For example, consider the exponential distribution with p.d.f.

$$f(u) = \frac{1}{\mu}\exp(-u/\mu), \qquad u > 0.$$

For this distribution, $F(u) = 1 - e^{-u/\mu}$, so if y = F(u), then $F^{-1}(y) = -\mu\ln(1 - y)$. Substituting into equation (6) gives

$$\frac{dx}{dy} = -\ln(1 - y),$$

leading to

$$x = y + (1 - y)\ln(1 - y).$$
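As a quick numerical check of this result, the sketch below evaluates the exponential NRC and compares a central-difference estimate of its slope with -ln(1 - y). The function names and the step size h are choices made for this illustration only.

```python
import math


def exponential_nrc(y):
    """NRC of the exponential distribution: x = y + (1 - y) ln(1 - y), 0 <= y < 1."""
    return y + (1.0 - y) * math.log(1.0 - y)


def numerical_slope(y, h=1e-6):
    """Central-difference estimate of dx/dy, to compare with -ln(1 - y)."""
    return (exponential_nrc(y + h) - exponential_nrc(y - h)) / (2.0 * h)


print(exponential_nrc(0.5))                       # about 0.1534
print(numerical_slope(0.5), -math.log(1.0 - 0.5))
```

The curve does not involve the mean, so one exponential NRC serves every station whose daily amounts are exponentially distributed.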

The exponential distribution is a special case, with $\alpha = 1$ and $\beta = 1/\mu$, of the family of gamma distributions with p.d.f.

$$f_\alpha(u) = \frac{\beta^\alpha}{\Gamma(\alpha)}u^{\alpha-1}\exp(-\beta u), \qquad u > 0,$$

and this family has often been used to model daily rainfall amounts; see, for example, Stern and Coe (1984) and references therein.

For gamma distributions, equation (5) can be written $x = F_{\alpha+1}[F_\alpha^{-1}(y)]$, where

$$F_\alpha(u) = \int_0^u f_\alpha(v)\,dv.$$

However, it appears to be impossible to derive explicit expressions relating x to y for general members of this family. The expression for

$$F_\alpha(u) = \int_0^u \frac{\beta^\alpha}{\Gamma(\alpha)}v^{\alpha-1}\exp(-\beta v)\,dv$$

cannot be written in closed form and so cannot be inverted to give $F_\alpha^{-1}(y)$, which is required in equation (6).

The reason for the popularity of gamma distributions in fitting rainfall data is that they provide a flexible family of positively skewed distributions. As rainfall distributions are usually highly skewed, they can often be well-fitted by gamma distributions. An alternative family of distributions that can accommodate a similar range of distributional shapes is the Weibull family, with p.d.f.

$$f(u) = \gamma\delta u^{\gamma-1}\exp(-\delta u^\gamma), \qquad u > 0. \qquad (13)$$

As with the gamma family, the exponential distribution arises as a special case, when $\gamma = 1$ and $\delta = 1/\mu$.

It is possible to solve equation (6) for some members of the Weibull family. First, note that the solution of equation (6) will not depend on $\delta$, so in finding the form of NRCs we can take $\delta = 1$ without loss of generality. We have $F(u) = 1 - \exp(-u^\gamma)$, leading to $F^{-1}(y) = [-\ln(1 - y)]^{1/\gamma}$. When $\gamma = 1/2$, it follows that

$$x = \tfrac{1}{2}\{2y + 2(1-y)\ln(1-y) - (1-y)[\ln(1-y)]^2\},$$

and when $\gamma = 1/3$,

$$x = \tfrac{1}{6}\{6y + 6(1-y)\ln(1-y) - 3(1-y)[\ln(1-y)]^2 + (1-y)[\ln(1-y)]^3\}.$$

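Other cases when $\gamma^{-1}$ is integer can be evaluated similarly, and expressions can also be found in some other situations. For example, when $\gamma = 2/3$,

$$x = 2\Phi\left(\sqrt{2z}\right) - 1 - \frac{2(1-y)}{\sqrt{\pi}}\left(z^{1/2} + \tfrac{2}{3}z^{3/2}\right), \qquad z = -\ln(1-y),$$

where $\Phi$ denotes the c.d.f. of the normal (Gaussian) distribution with mean zero and variance one.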
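Although closed forms are not available in general, the NRCs of this section can be evaluated numerically. The sketch below is one possible approach, not a method used in the paper. It assumes a hand-written power series for the regularized lower incomplete gamma function P(a, z) and a bisection inverse. With these, the gamma NRC is $F_{\alpha+1}[F_\alpha^{-1}(y)]$, and substituting z = -ln(1 - w) in equation (5) gives the Weibull NRC as P(1 + 1/γ, -ln(1 - y)). All function names are hypothetical.

```python
import math


def reg_lower_gamma(a, z, tol=1e-14, max_terms=10000):
    """Regularized lower incomplete gamma function P(a, z), by power series."""
    if z <= 0.0:
        return 0.0
    term = 1.0
    total = 1.0
    for n in range(1, max_terms):
        term *= z / (a + n)            # z^n / ((a+1)(a+2)...(a+n))
        total += term
        if term < tol * total:
            break
    return total * math.exp(a * math.log(z) - z - math.lgamma(a + 1.0))


def inv_reg_lower_gamma(a, p):
    """Solve P(a, z) = p for z by bisection, for 0 <= p < 1."""
    lo, hi = 0.0, 1.0
    while reg_lower_gamma(a, hi) < p:  # expand the bracket until it contains p
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if reg_lower_gamma(a, mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def gamma_nrc(y, alpha):
    """Gamma NRC: x = F_{alpha+1}[F_alpha^{-1}(y)]; the rate beta cancels."""
    return reg_lower_gamma(alpha + 1.0, inv_reg_lower_gamma(alpha, y))


def weibull_nrc(y, gam):
    """Weibull NRC: x = P(1 + 1/gam, -ln(1 - y)); delta cancels."""
    return reg_lower_gamma(1.0 + 1.0 / gam, -math.log(1.0 - y))


for gam in (1.0, 2.0 / 3.0, 0.5, 1.0 / 3.0):
    print(f"gamma = {gam:.3f}: x at y = 0.5 is {weibull_nrc(0.5, gam):.4f}")
```

With α = 1 or γ = 1 both functions reduce to the exponential curve x = y + (1 - y) ln(1 - y), which gives a simple consistency check.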

4. EXAMPLES

Equation (1) is fitted by AS to data from a number of Indian rainfall stations. In order to make the distribution of rainfall fairly homogeneous across each data set studied, different calendar months are analysed separately. As the coefficient of variation (CV) of the data increases, so the NRC based on those data moves towards the top left (north-west) of the diagram, and by changing the constants a, b, and c in equations (1) and (2) a variety of NRCs can be well-fitted by these equations. Similar families of curves can also be generated by varying the parameter $\gamma$ in the Weibull family (equation (13)). This is illustrated in Figures 1 and 2.

It is suggested by AS that the NRCs are defined uniquely by their CVs. This certainly holds for the Weibull cases that we have examined: both the curve and the CV depend only on $\gamma$ and not on $\delta$. For equations (1) and (2), a CV cannot be calculated readily, because equations (5) and (6) do not lead to explicit expressions for the mean and variance of the corresponding probability distributions.
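The dependence of the Weibull CV on $\gamma$ alone can be made concrete. For the p.d.f. in equation (13) the k-th moment is $\delta^{-k/\gamma}\Gamma(1 + k/\gamma)$, a standard result, so $\delta$ cancels in the ratio of standard deviation to mean. The helper name `weibull_cv` below is an assumption for this sketch.

```python
import math


def weibull_cv(gam):
    """Coefficient of variation of a Weibull distribution with shape gam."""
    m1 = math.gamma(1.0 + 1.0 / gam)   # E[U]   with delta = 1
    m2 = math.gamma(1.0 + 2.0 / gam)   # E[U^2] with delta = 1
    return math.sqrt(m2 / (m1 * m1) - 1.0)


for gam in (1.0, 2.0 / 3.0, 0.5, 1.0 / 3.0):
    print(f"gamma = {gam:.3f}: CV = {100.0 * weibull_cv(gam):.1f} per cent")
```

For $\gamma = 1$ this returns 100 per cent, the CV of the exponential distribution.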

Among their data sets, AS take measurements for Mangalore in July and for Bombay in September as special test cases, because they represent extremes in terms of the CV. Mangalore in July has a CV of 96 per cent, whereas Bombay in September has a CV of 224 per cent. To assess the fit of equations (1) and (2) to those data, AS find values of y corresponding to

$$x = 0.03, 0.05, 0.10, 0.20, 0.30, \ldots, 0.80, 0.90 \qquad (14)$$

from the data set for Mangalore, and corresponding to

$$x = 0.02, 0.03, 0.10, 0.20, \ldots, 0.80, 0.90 \qquad (15)$$

for Bombay. The x value in the data, x*, corresponding to y = 0.50 is also included in this set, and equals 0.154 for Mangalore and 0.055 for Bombay. For each value of y thus found, x is calculated using equations (1) and (2), with estimated values of a, b, and c, and the x values calculated are compared with the values in equations (14) and (15). For equation (1), b and c are estimated by fitting the NRC exactly at (x*, 0.5) and (0.5, y*), where y* is the value of y corresponding to x = 0.5. To estimate b in equation (2), the NRC is fitted exactly at (0.5, y*).
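The two-point fitting just described can be sketched in proportion form, assuming equation (1) is written x = y exp[-b(1 - y)^c] and equation (2) is written x = y exp[-b(1 - y)], using a = exp(-b). With percentages, as in AS, the fitted b would be scaled differently. The function names are hypothetical, and y* = 0.85 is an invented value for illustration, since y* is not quoted here; x* = 0.154 is the Mangalore value given in the text.

```python
import math


def fit_eq2(y_star):
    """b such that x = y*exp(-b*(1 - y)) passes through (x, y) = (0.5, y_star)."""
    return math.log(2.0 * y_star) / (1.0 - y_star)


def fit_eq1(x_star, y_star):
    """(b, c) such that x = y*exp(-b*(1 - y)**c) passes through
    (x_star, 0.5) and (0.5, y_star). Requires y_star > 0.5 and x_star < 0.5."""
    big_a = math.log(0.5 / x_star)     # equals b * 0.5**c
    big_b = math.log(2.0 * y_star)     # equals b * (1 - y_star)**c
    c = math.log(big_a / big_b) / math.log(0.5 / (1.0 - y_star))
    b = big_a / 0.5 ** c
    return b, c


def nrc_eq1(y, b, c=1.0):
    """Equation (1) in proportions; c = 1 gives equation (2)."""
    return y * math.exp(-b * (1.0 - y) ** c)


x_star, y_star = 0.154, 0.85           # y_star is hypothetical
b1, c1 = fit_eq1(x_star, y_star)
b2 = fit_eq2(y_star)
print("equation (1): b =", b1, "c =", c1)
print("equation (2): b =", b2)
```

Both fits are exact at their anchor points by construction; any lack of fit shows up only at the other x values in (14) and (15).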

Tables I and II and Figures 1 and 2 give information on the fit of the equations used by AS for these data sets. These also include corresponding information when NRCs based on exponential and Weibull distributions are fitted to data from Mangalore and Bombay respectively. Tables I and II present the mean, standard deviations, and root-mean-square error (= [(mean)² + (standard deviation)²]^{1/2}) of the differences between the fitted x values and those in equation (14).
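The relation between the three summary measures can be illustrated with a short sketch. It assumes the standard deviation is computed with divisor n, in which case RMSE² = mean² + SD² holds exactly; whether the tables use n or n - 1 is not stated. The function name `summarize` and the list of differences are hypothetical.

```python
import math


def summarize(diffs):
    """Return (mean, sd, rmse) of a list of differences, using divisor n."""
    n = len(diffs)
    mean = sum(diffs) / n
    sd = math.sqrt(sum((d - mean) ** 2 for d in diffs) / n)
    rmse = math.sqrt(sum(d * d for d in diffs) / n)
    return mean, sd, rmse


# Hypothetical differences between observed and fitted x values (per cent).
mean, sd, rmse = summarize([0.4, -1.2, 0.9, 1.5, -0.6, 0.3])
print(mean, sd, rmse, math.hypot(mean, sd))
```

The same identity links the columns of Tables I and II: for example, a mean of 0.09 and a standard deviation of 1.46 combine to an RMSE of 1.46 to two decimal places.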

For Mangalore, equation (1) gives a better fit than equation (2), but fitting the curve corresponding to an exponential distribution is as good as equation (1). Note that the CV for the exponential distribution is 100 per cent, close to that of the Mangalore data, which is 96.4 per cent.

For Bombay, equation (1) is much better than equation (2), whereas the Weibull with $\gamma = 1/2$ gives an intermediate fit. It is likely that there are Weibull distributions with fits to the NRC that are competitive with that of equation (2). No attempt has been made to optimize the Weibull fit because of the difficulty of finding an expression for x in terms of y for general values of $\gamma$. In terms of finding the best-fitting Weibull distribution to the NRC of a data set the CV clearly gives some guidance. The CV for a Weibull with $\gamma = 1/2$ is 224 per cent, precisely that of the Bombay data, while the CVs for Weibull distributions with $\gamma = 2/3$ and $\gamma = 1/3$ are 154 per cent and 436 per cent respectively. These distributions give much worse fits (see Table II), which also suggests that to achieve a mean difference (bias) as close to zero as possible $\gamma$ should be taken slightly above 1/2.

Figure 1. Fitted Bombay NRCs using equations (1) and (2) and fitted Mangalore NRC using equation (2), with observed NRCs for Bombay (upper curve) and Mangalore (lower curve). Horizontal axis: x, from 0.0 to 1.0.

Figure 2. Fitted Weibull NRCs with $\gamma = 1/2$, $\gamma = 2/3$ and $\gamma = 1$ (exponential distribution), with observed NRCs for Bombay (upper curve) and Mangalore (lower curve). Horizontal axis: x, from 0.0 to 1.0.

Table I. Means, standard deviations, and root-mean-square errors (RMSE) of differences between observed and fitted x values in NRC curves for Mangalore July rainfall

                     Mean     Standard deviation    RMSE
Equation (2)         0.09     1.46                  1.46
Equation (1)         0.37     0.77                  0.85
Exponential         -0.61     0.59                  0.85

Table II. Means, standard deviations and root-mean-square errors (RMSE) of differences between observed and fitted x values in NRC curves for Bombay September rainfall

                     Mean     Standard deviation    RMSE
Equation (2)         -1.34    6.09                  6.23
Equation (1)         -0.09    0.67                  0.67
Weibull (γ = 1/2)    -2.26    1.18                  2.55
Weibull (γ = 2/3)    10.66    4.70                  11.65
Weibull (γ = 1/3)   -14.99    8.66                  17.30

The values of b fitted by AS using equation (1) are 3.10 and 15.15 for Mangalore and Bombay respectively. The upper and lower bounds for rainfall values implied by these NRCs are $[\mu\exp(-b), \mu(1 + b)]$. The mean $\mu$ is unknown, but if it is estimated by the sample mean $\bar{r}$, we find a range of rainfall values of [1.6 mm, 145.1 mm] and [$4 \times 10^{-6}$ mm, 229 mm] for Mangalore and Bombay respectively.
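The interval stated here scales directly with the mean, which the following sketch makes explicit. The function name `implied_range` is hypothetical, and the mean is left at 1 because the sample means are not quoted in the text, so the outputs are multiples of the mean rather than millimetres.

```python
import math


def implied_range(b, mean=1.0):
    """Interval [mean*exp(-b), mean*(1 + b)] implied by the fitted constant b."""
    return mean * math.exp(-b), mean * (1.0 + b)


for station, b in (("Mangalore", 3.10), ("Bombay", 15.15)):
    lower, upper = implied_range(b)
    print(f"{station}: [{lower:.3g}, {upper:.3g}] x mean, ratio {upper / lower:.3g}")
```

The ratio of upper to lower bound does not depend on the mean. For b = 3.10 it is about 91, in line with the quoted Mangalore range of 1.6 mm to 145.1 mm.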


If equation (2) is fitted, there is no upper bound, but the lower bound is 2.8 mm for Mangalore and 0.005 mm for Bombay. Although the truncation of values for Bombay seems to be of little practical importance, there is a real possibility of observing values for Mangalore outside the allowable range, using either equation (1) or (2).

5. DISCUSSION

Normalized rainfall curves provide a useful means of representing rainfall distributions. However, the choice of formulae for NRCs is problematic. Empirical formulae (1) and (2) are found to correspond to probability distributions for which the p.d.f. cannot be written down explicitly; nor can their means or variances. Furthermore, the distributions are truncated in the sense that rainfall values above and below certain thresholds have zero probability of occurrence. This truncation may not be too serious in practice, but it is nevertheless more realistic, for example when simulating rainfall amounts, to work with NRCs that correspond to standard probability distributions. Unfortunately, although the p.d.f., means and variances can be written down for families of distributions such as gamma and Weibull, there are difficulties in finding explicit expressions for NRCs for such probability distributions; although some progress is possible for Weibull distributions. It appears that, as yet, there is no ideal solution to the problem of modelling NRCs.

ACKNOWLEDGEMENTS

We are grateful to two anonymous referees, whose insights and comments helped considerably in our understanding of the relationships between the NRCs of equations (1) and (2) and corresponding probability distributions.

REFERENCES

Ananthakrishnan, R. and Soman, M. K. 1989. 'Statistical distribution of daily rainfall and its association with the coefficient of variation of rainfall series', Int. J. Climatol., 9, 485-500.

Olascoago, M. J. 1950. 'Some aspects of Argentinian rainfall', Tellus, 2, 312-318.

Stern, R. D. and Coe, R. 1984. 'A model fitting analysis of daily rainfall data', J. Roy. Statist. Soc. Ser. A, 147, 1-34 (including discussion).