Page 1: 16. Regionalization and multivariate analysis. The ...hydrologie.org/ACT/bernier/BER_0273.pdftask [Cavadias (1989, 1990), Ribeiro-Correa et al. (19941. The flood estimation of an ungauged

16. Regionalization and multivariate analysis. The canonical correlation approach

G.S. Cavadias Invited Professor at INRS-Eau 21 Alopekis, Athens 106 75 Greece

Abstract

The purpose of this paper is to present a method of regionalization of flood flows. The paper starts with a comparison of various concepts of homogeneous regions (geographically or hydrologically defined and a priori defined or basin-centered) and goes on to describe the canonical correlation method as applied to the delineation of basin-centered homogeneous regions and the estimation of flood characteristics of ungauged basins. The proposed method enables the user to study the relations between the basin-related and flood-related variables in an intuitive way and, as such, it is a step towards the physical modelling of the flood phenomenon.

Résumé

Cet article présente une discussion du concept des régions homogènes et propose une méthode d’estimation régionale des débits de crue. La première partie de l’article donne une comparaison de différentes définitions des régions homogènes (géographiques, hydrologiques, a priori ou centrées sur le bassin étudié). La deuxième partie présente une description de la méthode de corrélation canonique appliquée à la détermination de la région homogène ayant comme centre le bassin étudié et appliquée aussi à l’estimation des caractéristiques de crue de bassins non jaugés. La méthode proposée donne la possibilité d’étudier les relations entre les variables météorologiques du bassin considéré et ses caractéristiques de crue, et par conséquent, elle peut être considérée comme un pas vers la modélisation physique du phénomène.

16.1. Introductory remarks

The problem of estimating floods with small exceedance probabilities is inherently fraught with difficulties. Even where long records at the site do exist, they are frequently not sufficiently long to estimate the floods required for the design of water resources projects; moreover, large floods may belong to a population other than that of the recorded floods. Strictly speaking, the flood estimation problem can be described as follows: assuming that river flow is a multivariate, non-stationary, seasonal stochastic process in which the variables are the components of the hydrologic cycle, we want to estimate the value of the peak discharge $q_T$ that will be exceeded with probability $(1/T)$ and to determine the probabilities of exceedance of other components of the flood hydrograph (volume, duration, starting date, etc.), taking into account that the floods at different sites of a region are dependent and should really be analyzed together. It is apparent that, thus formulated, the flood estimation problem is intractable.

Currently used methodologies are based on simplifying assumptions which do not adequately reflect the complexity of the phenomenon.

In the case of stations with long records, the usual approach is to select a probability distribution, to use the available maximum annual flood data for estimating its parameters and to compute the flood magnitudes corresponding to specified return periods. In addition to sampling uncertainties, the estimation of flood magnitudes beyond the range of observed floods is also subject to model uncertainties.
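The single-station procedure described above can be sketched in a few lines. This is only an illustration under stated assumptions: it fits a two-parameter log-normal distribution by the moments of the logarithms (the application later in the paper uses a three-parameter log-normal), and the function name `lognormal_quantile` is hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev


def lognormal_quantile(annual_maxima, return_period):
    """T-year flood from a two-parameter log-normal fit (assumed model).

    annual_maxima: positive maximum annual flows at one station.
    return_period: T in years; the exceedance probability is 1/T.
    """
    logs = [math.log(q) for q in annual_maxima]
    mu = mean(logs)        # mean of the log-flows
    sigma = stdev(logs)    # sample standard deviation of the log-flows
    non_exceedance = 1.0 - 1.0 / return_period
    z = NormalDist().inv_cdf(non_exceedance)  # standard normal quantile
    return math.exp(mu + sigma * z)
```

For T = 2 the normal quantile is zero, so the estimate reduces to the geometric mean of the record. For large T the result depends heavily on the assumed distribution, which is the model uncertainty mentioned above.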

In the case of ungauged basins, the flood characteristics are estimated using the known geographical, physiographical and meteorological data of the basin of interest along with the corresponding data of basins with «similar» characteristics which have long flow records. This creates the need for determining «homogeneous» regions the basins of which behave similarly with respect to maximum annual floods.

16.2. The concept of «homogeneous region»

The partition of a large area (continent, country, province, state, etc.) into homogeneous regions for a given purpose [Grigg (1976), Lvovich (1973), Falkenmark (1976), WMO (1986)] is a standard tool in the geophysical sciences. It must be noted, however, that the mapping of hydrological variables in the form of isolines or homogeneous regions becomes less feasible for smaller areas and shorter time periods over which the variables are averaged. While in the case of maximum floods, geographically defined homogeneous regions are convenient for the design engineer, this definition has the disadvantage that geographically contiguous basins may not be hydrologically similar [Linsley (1982), Cunnane (1986), Wiltshire (1986), U.S. Water Resources Council (1977)]. This difficulty led a number of researchers [Mosley (1981), Gottschalk (1985), Wiltshire (1986)] to define homogeneous regions in the space of flood-related variables, e.g. specific mean annual flood, coefficient of variation, coefficient of skew, etc., using multivariate analysis methods with particular emphasis on cluster analysis. Cluster analysis is used to discover «natural» clusters [Dillon and Goldstein (1984)] and is based on the assumption that such clusters exist; however, the existence of natural clusters cannot be taken for granted without prior testing [Rogers (1974), Dubes and Zeng (1987)]. If such clusters do not exist, the final set of «homogeneous regions» depends on the clustering method and the initial partitioning of the space. In addition, if the «regions» are defined in the space of flood-related variables, it is difficult to relate the pattern of homogeneous regions to the topographical, physiographical and meteorological conditions of the area. To overcome this difficulty, some proponents of this approach seek to relate empirically the homogeneous regions in the space of flood-related variables to the geographical coordinates [Mosley (1981), Gottschalk (1985), Wiltshire (1986)].

Because the assignment of a basin to a region presents some problems, particularly near the boundaries, some researchers [Wiltshire (1986)] have introduced the concept of fractional membership. Another way of defining homogeneous regions is to consider each basin as having its own region [Acreman and Wiltshire (1989), Burn (1990a), Cavadias (1990)]. In summary, the determination of homogeneous regions can be made in the spaces of geographical, physiographical, or hydrological variables and the basin of interest may be assigned to a region or be the centre of its own region. A comparison of different types of homogeneous regions is presented in tables (16.1) and (16.2), which show that each type of region has advantages and disadvantages. Consequently it is in the interest of the design engineer to investigate the homogeneous regions in all relevant spaces.


Table 16.1. Comparison of geographically and hydrologically defined homogeneous regions

Geographically defined regions
  Advantages:
    1) Commonly used in climatology and meteorology.
    2) Easy to understand and to relate to geographical, physiographical and meteorological factors.
    3) Facilitates the standardization of flood estimation procedures.
  Disadvantages:
    1) Neighboring basins may not be hydrologically similar.
    2) Not applicable to small areas.

Hydrologically defined regions
  Advantages:
    1) Definition of homogeneous regions is based on variables relevant to the flood estimation problem.
    2) Possibility of defining homogeneous regions centered on the basin of interest.
  Disadvantages:
    1) Delineation of homogeneous regions by cluster analysis which presupposes the existence of clusters.
    2) Difficult to relate to physical causative factors.

Table 16.2. Comparison of a priori defined and basin-centered homogeneous regions (neighborhoods)

A priori defined homogeneous regions
  Advantages: Usually defined by government services and easy to use by the design engineer.
  Disadvantages: Number and boundaries of regions depend on the algorithm, particularly in the absence of «natural» clusters.

Basin-centered homogeneous regions (neighborhoods)
  Advantages: Elimination of the problem of assignment (or fractional assignment) of the basin of interest in a homogeneous region.
  Disadvantages: Delineation by the design engineer depends on his experience.

16.3. The canonical correlation method

It is the purpose of this paper to show that the multivariate method of canonical correlation [Hotelling (1936)], which deals with the relations between two or more groups of variables, i.e. variables belonging to different spaces, is the appropriate tool for accomplishing this complex task [Cavadias (1989, 1990), Ribeiro-Correa et al. (1994)]. The flood estimation of an ungauged basin can be subdivided into two stages:

a) The delineation of homogeneous regions.
b) The estimation of the maximum annual floods using the data of the basins in the homogeneous region.

The canonical correlation method can be used for both stages but, once the homogeneous region has been delineated by canonical correlation, any of the other available methods may be used to estimate the maximum flood. A brief description of the canonical correlation method in the context of flood estimation is given in the appendix. The application of canonical correlation to flood estimation is carried out in two stages:

Stage 1. Analysis of gauged basins

1.1. Selection of the geographical, physiographical and meteorological basin variables $(x_1, \ldots, x_p)$ and the flood-related variables $(q_1, \ldots, q_m)$ where usually $m \le p$. The inclusion of the three types of variables in the set $(x_1, \ldots, x_p)$ enables the user to determine the relative importance of each type of causative variable of the flood phenomenon and choose the appropriate spaces for the homogeneous regions.

1.2. Calculation of the two sets of canonical variables $(v_1, \ldots, v_m)$ and $(w_1, \ldots, w_m)$.

1.3. Estimation of the flood-related canonical variables $(\hat{w}_1, \ldots, \hat{w}_m)$ from the corresponding basin-related canonical variables using equation (16.13) of the appendix.

1.4. Estimation of the flood-related variables $(\hat{q}_1, \ldots, \hat{q}_m)$ using the regressions $\hat{q}_j = f(v_1, \ldots, v_m)$ on the canonical basin variables. It is important to note that the regression equation $\hat{q}_j = f(w_1, \ldots, w_m)$ is equivalent to the regression equation $\hat{q}_j = f(x_1, \ldots, x_p)$. Thus, the canonical variables achieve a conditional reduction of the dimensionality of the space of basin variables from $p$ to $m$, based on their relations with the flood variables. In practical terms, this means that the number of flood-related variables that can be estimated using the linear regressions on the basin variables is equal to the number of significant canonical correlations, which is less than or equal to $m$.

1.5. Examination of the patterns of points in the scatterplots of the pairs of canonical variables $(v_1, v_2)$, $(w_1, w_2)$ etc. with the purpose of determining:

a) the existence of «natural» clusters in these spaces
b) the existence, number and locations of outliers
c) the degree of similarity of point-patterns in the spaces $(v_1, v_2)$ and $(w_1, w_2)$.

1.6. Analysis of the relative importance of various groups of explanatory variables (geographical, physiographical, meteorological). This analysis can be accomplished using the matrices $R_{xv}$ and $R_{qw}$ of the correlation coefficients between the original and the canonical variables (Equations 16.11 and 16.12 of the appendix).

1.7. Study of the estimation errors of the canonical variables $(w_1, \ldots, w_m)$ for the gauged stations. This is achieved by calculating the residual vectors $(w_i - \hat{w}_i)$, $i = 1, 2, \ldots, n$ for each gauged basin in the $m$-dimensional space $(w_1, \ldots, w_m)$. Given the properties of canonical variables, the components of the residual vectors are uncorrelated and independent of the location of the points in the space of the canonical variables $(w_1, w_2, \ldots, w_m)$. Consequently, the scatter diagrams $(w_1, w_2)$, $(w_1, w_3)$, etc. can serve to verify the independence of the error vectors and the existence of outliers. The study of the error vectors described above should be supplemented by a corresponding analysis in the space of the original flood-related variables. For example, if the flood-related variables are the 2-year flood $(q_2)$ and the ratio $(q_{100}/q_2)$ of the 100-year flood to the 2-year flood, the canonical spaces $(v_1, v_2)$ and $(w_1, w_2)$ are two-dimensional. In this case, we can also plot the scatter diagram $(q_2, q_{100}/q_2)$ and include the estimated points $(\hat{q}_2, \hat{q}_{100}/\hat{q}_2)$ and the corresponding residual vectors. This scatter diagram has the advantage of being directly interpretable and is a useful complement to the $(w_1, w_2)$ diagram.

Stage 2. Estimation of the floods of an ungauged basin $z$

2.1. Calculation of the canonical variables $v_1(z), \ldots, v_m(z)$ from equation (16.1) of the appendix, using the basin variables $x_1(z), \ldots, x_p(z)$. This calculation is based on the assumption that the coefficients of equation (16.1) are valid for the ungauged basin $z$.

2.2. Estimation of the flood-related canonical variables $\{\hat{w}_1(z), \ldots, \hat{w}_m(z)\}$ from the regression equations $\hat{w}_j(z) = r_j v_j(z)$ (Eq. 16.13 of the appendix).

2.3. Estimation of the $(1-\alpha)$-neighborhood of the basin $z$ in the space $(w_1, \ldots, w_m)$ using equation (16.14) or (16.15) of the appendix. The choice of the level $(1-\alpha)$ is the result of a compromise between the number and the degree of homogeneity of the basins in the neighborhood [Ribeiro-Correa et al. (1995)].

2.4. Estimation of the flood-related variables of basin $z$, using the basins of its neighborhood, by canonical correlation or any other regional flood estimation method (index flood, regression, etc.).

Regional flood estimates can also be used for basins with long flow records as a complement to single-station estimates because they help clarify the relations between the floods and their causative factors. It is also possible to combine locally obtained and regional estimates using a Bayesian approach. Professor Jacques Bernier contributed substantially to the introduction of Bayesian methods to hydrology and water resources problems [Bernier (1967, 1981)]. In the case of combination of regional and local flood estimates the empirical Bayesian approach is used [Vicens et al. (1975), Kuczera (1982), Bernier (1992)]. It is important to note that the variance of the combined estimate does not exceed the smaller of the variances of the two components [Granger and Newbold (1977)]. This result may contribute to a wider use of regional analysis and empirical Bayesian methodologies for flood estimation of both gauged and ungauged sites.
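The variance property quoted above can be illustrated with the simplest combination rule. The sketch below assumes that the local and regional estimates are unbiased and independent, and uses inverse-variance weights; this is a simplification of the empirical Bayesian procedures cited, not a reproduction of them, and the function name is hypothetical.

```python
def combine_estimates(local, var_local, regional, var_regional):
    """Inverse-variance combination of two independent, unbiased estimates.

    Returns the combined estimate and its variance. Independence of the
    two estimates is an assumption of this sketch.
    """
    total = var_local + var_regional
    weight_local = var_regional / total   # the more precise estimate gets more weight
    combined = weight_local * local + (1.0 - weight_local) * regional
    var_combined = var_local * var_regional / total
    return combined, var_combined
```

Because var_local * var_regional / (var_local + var_regional) is smaller than either variance whenever both are positive, the combined estimate is never less precise than the better of its two components.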

16.4. Application

The canonical correlation method described above is applied to the estimation of the maximum annual floods of the province of Newfoundland in Canada. This case is discussed in detail in Cavadias (1989).

Figure (16.1) shows a map of the province of Newfoundland on which the locations of 21 drainage basins are indicated. The flood-related variables selected are based on the fitted three-parameter log-normal distribution and are:

$q_2$ = the two-year quantile of the distribution
$(q_{100}/q_2)$ = the ratio of the hundred-year to the two-year quantile of the distribution. (This variable is a measure of the dispersion of the distribution).

Based on a preliminary analysis, the following basin variables are selected:

$x_1$ = log [Drainage Area (km²)]
$x_2$ = PLS = log (Per cent of drainage area controlled by lakes and swamps)
$x_3$ = MAR = log (mean annual runoff, mm).

Given that in this case $m = 2$, there are two pairs of canonical variables $(v_1, v_2)$, $(w_1, w_2)$. The values of the canonical correlation coefficients are $r_1 = 0.989$ and $r_2 = 0.76$ and both are significant at the 5 per cent level. Consequently the basin variables are significantly related to both the median and the dispersion of the distribution of maximum annual floods.

Fig. 16.1. Map of Newfoundland


Fig. 16.2. $(v_1, v_2)$ diagram for all basins

Fig. 16.3. $(w_1, w_2)$ diagram for all basins

Figures (16.2) and (16.3) show the scatter diagrams of the canonical variables $(v_1, v_2)$ and $(w_1, w_2)$ where all 21 basins have been used in the computation. An examination of these figures shows that the point-patterns in the two scatter diagrams are similar. The next step is to examine the stability of these patterns by excluding each basin in turn and plotting the two scatter diagrams, including the estimated point for the omitted basin.

Fig. 16.4. $(v_1, v_2)$ diagram without basin G

Fig. 16.5. $(w_1, w_2)$ diagram without basin G

Figures (16.4) and (16.5) show the $(v_1, v_2)$ and $(w_1, w_2)$ diagrams resulting from the computation without basin G. The coordinates of this basin in the diagram $(w_1, w_2)$ are computed using equation (16.13) of the appendix. The omission of basin G does not change the point-patterns appreciably.

Fig. 16.6. Diagram of $(\hat{w}_1, \hat{w}_2)$, $(w_1, w_2)$

Fig. 16.7. Diagram of $(\hat{q}_2, \hat{q}_{100}/\hat{q}_2)$, $(q_2, q_{100}/q_2)$

Figure (16.6) shows the $\{(\hat{w}_1, \hat{w}_2), (w_1, w_2)\}$ scatter plot including the error vectors for each basin. An examination of this plot reveals that:

a) As expected, the vertical components of the error vectors are generally larger than the horizontal components.

b) There are some basins (e.g. L, M, R) with large error vectors. (This may be explained by the small drainage areas of the basins L and R and the location of basin M.) Figure (16.6) is complemented by the intuitively interpretable figure (16.7), which is a scatter diagram of $\{(\hat{q}_2, \hat{q}_{100}/\hat{q}_2), (q_2, q_{100}/q_2)\}$ including the error vectors.

Before proceeding to the determination of the homogeneous region (neighborhood) of an ungauged basin (e.g. basin G) it is useful to examine the importance of each explanatory variable by studying the diagram (Fig. 16.8) of squared correlations between the basin, canonical and flood variables derived from the matrices $R_{xv}$ and $R_{qw}$ of the appendix (Eq. 16.11 and 16.12). This diagram includes only the squared correlations that are significant at the 5 per cent level.

Fig. 16.8. Diagram of squared correlation coefficients (basin variables shown: log (drainage area), log (% area controlled by lakes), log (mean annual runoff))

This diagram shows that:

a) As expected, the variable $\log q_2$ is highly correlated with the first canonical variable $w_1$ which, in turn, is correlated with $v_1$ and the basin variable log (drainage area).

b) The variable $(q_{100}/q_2)$ is highly negatively correlated with $w_2$, which is in turn correlated with the canonical variable $v_2$. This variable is negatively correlated with the variable log (mean annual runoff) and positively correlated with the variable log (per cent of area controlled by lakes and swamps).

The neighborhood of the ungauged basin G is defined by equation (16.15) of the appendix. Using $(1-\alpha) = 0.75$ an elliptical contour is obtained containing the basins (F, Q, K, E). At this point it is also interesting to determine the corresponding 75 per cent neighborhood in the space $(v_1, v_2)$. This neighborhood consists of the basins (F, Q, K, O, E), i.e. it includes the additional basin O. An examination of figure (16.1) shows that the geographical locations and the other basin variables of these basins are reasonably close to those of the ungauged basin G.

Table 16.3. Flood variable estimation for ungauged basin G, neighborhood (F, Q, K, O, E)

BASIN                   DRAINAGE AREA   PLS           MAR            q_2            q_100
                        (km²)           (per cent)    (mm)           (m³/sec)       (m³/sec)
F                       [illegible]     [illegible]   [illegible]    [illegible]    [illegible]
Q                       764             91            1024           240            376
K                       [illegible]     92            [illegible]    [illegible]    [illegible]
O                       [illegible]     96            [illegible]    [illegible]    [illegible]
E                       470             100           1162           86.3           183
OVERALL RANGE           3.90 - 2640     60 - 100      788 - 1364     2.94 - 590.0   5.37 - 760
RANGE IN NEIGHBORHOOD   391 - 1290      91 - 100      8546 - 1082    74 - 240       118 - 376
MEAN IN NEIGHBORHOOD                                                 105.7          183.4
OBSERVED FOR BASIN G    529             95            984            91.3           164


Given the small number of basins in the $(w_1, w_2)$-neighborhood and that basin O is near the boundary of this neighborhood, we may estimate the flood characteristics of basin G using the $(v_1, v_2)$-neighborhood (F, Q, K, O, E). A rough approximation of the variables $q_2(G)$ and $(q_{100}(G)/q_2(G))$ can be obtained using the means of the corresponding variables of the basins in the neighborhood (Tab. 16.3).
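To give a feel for how rough this approximation is, the neighborhood means and the observed values for basin G in Table 16.3 can be compared directly. The short sketch below uses only those four numbers; reading the two flood columns of the table as q_2 and q_100 (in m³/sec) is an assumption, since the column headings are partly illegible in the source, and the helper name is hypothetical.

```python
# Values taken from Table 16.3 (column identities assumed, see lead-in).
neighborhood_mean = {"q2": 105.7, "q100": 183.4}
observed_basin_g = {"q2": 91.3, "q100": 164.0}


def relative_error(estimate, observed):
    """Signed relative error of an estimate with respect to the observation."""
    return (estimate - observed) / observed


errors = {
    name: relative_error(neighborhood_mean[name], observed_basin_g[name])
    for name in neighborhood_mean
}

for name, err in errors.items():
    print(f"{name}: neighborhood mean differs from observed by {100 * err:.1f} per cent")
```

Both neighborhood means overestimate the observed values for basin G, by roughly 16 and 12 per cent respectively.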

The above application shows that the canonical correlation method gives an insight into the relations of the flood-related variables with the geographical, physiographical and hydrological characteristics of the basin and thus brings the user closer to bridging the gap between the statistical model used for the estimation and the physically based model of the flood phenomenon, which is not feasible at the present time.

Appendix

Given $n$ basins, $p$ standardized basin-related variables $x_j$ and $m$ standardized flood-related variables $q_j$ (e.g. quantiles of a fitted probability distribution), where usually $p \ge m$, we seek $m$ pairs of standardized canonical variables $v_j$ and $w_j$ given by the equations:

$v_j = a_{j1} x_1 + a_{j2} x_2 + \ldots + a_{jp} x_p$    (16.1)
$w_j = b_{j1} q_1 + b_{j2} q_2 + \ldots + b_{jm} q_m$    (16.2)
$j = 1, 2, \ldots, m$

which have the following properties:

1) $r(v_j, v_k) = 0$    (16.3)
   $r(w_j, w_k) = 0$    (16.4)
   $r(v_j, w_k) = 0$    (16.5)
   for $j \ne k$, i.e. the canonical variables with different indices are uncorrelated.

2) The first pair of canonical variables $(v_1, w_1)$ has the largest correlation coefficient $r_1(v_1, w_1)$ of all linear combinations of the sets of variables $(x_1, \ldots, x_p)$ and $(q_1, \ldots, q_m)$.

3) The second pair of canonical variables $(v_2, w_2)$ has the largest correlation coefficient $r_2(v_2, w_2)$ of all linear combinations of the sets of variables $(x_1, \ldots, x_p)$ and $(q_1, \ldots, q_m)$ which are uncorrelated with the canonical variables $(v_1, w_1)$, etc.

The canonical variables are calculated as follows:

1) We form the $(n \times p)$ matrix $X = [x_{ij}]$ and the $(n \times m)$ matrix $Q = [q_{ij}]$ where $i = 1, 2, \ldots, n$ and find the $m$ eigenvalues and $m$ eigenvectors of the $(m \times m)$ matrix:

$F = R_{qq}^{-1} R_{qx} R_{xx}^{-1} R_{xq}$    (16.6)

where $R$ is the partitioned correlation matrix of the two sets of variables:

$R = \begin{pmatrix} R_{xx} & R_{xq} \\ R_{qx} & R_{qq} \end{pmatrix}$    (16.7)

2) We use the $(m \times m)$ diagonal matrix $\Lambda$ of the eigenvalues $\lambda_1 \ge \lambda_2 \ge \ldots \ge \lambda_m$ and the $(m \times m)$ matrix $B$ of the eigenvectors of the matrix $F$. In addition, we compute the matrix:

$A = R_{xx}^{-1} R_{xq} B \Lambda^{-1/2}$    (16.8)

3) The columns of the matrices $A$ and $B$ are the coefficients of the canonical variables $v_j$ and $w_j$ respectively given by equations (16.1) and (16.2) or in matrix form:

$V = [v_{ij}] = XA$    (16.9)
$W = [w_{ij}] = QB$    (16.10)

The square roots of the eigenvalues $(\lambda_1, \lambda_2, \ldots, \lambda_m)$ are the correlation coefficients $r_j$ of the corresponding pairs of canonical variables $(v_1, \ldots, v_m)$ and $(w_1, \ldots, w_m)$. The matrices of the correlation coefficients of the original and canonical variables are given by the equations:

$R_{xv} = R_{xx} A$    (16.11)
$R_{qw} = R_{qq} B$    (16.12)

These equations, along with the canonical correlations, help to determine the contributions of each of the original basin variables to the flood variables and therefore to the similarity of point patterns in the diagrams $(v_j, v_k)$ and $(w_j, w_k)$.
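The calculation steps of this appendix can be sketched with NumPy as follows. This is a minimal illustration, not the program used in the paper: the function name is hypothetical, the eigenvectors are rescaled so that the canonical variables have unit variance (which the text implies by calling them standardized), and all eigenvalues are assumed to be strictly positive and distinct.

```python
import numpy as np


def canonical_correlation(X, Q):
    """Canonical variables for basin variables X (n x p) and flood variables Q (n x m)."""
    p = X.shape[1]
    # Standardize both sets of variables.
    Xs = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    Qs = (Q - Q.mean(axis=0)) / Q.std(axis=0, ddof=1)
    # Partitioned correlation matrix R (Eq. 16.7).
    R = np.corrcoef(np.hstack([Xs, Qs]), rowvar=False)
    Rxx, Rxq = R[:p, :p], R[:p, p:]
    Rqx, Rqq = R[p:, :p], R[p:, p:]
    # F = Rqq^-1 Rqx Rxx^-1 Rxq (Eq. 16.6), an m x m matrix.
    F = np.linalg.solve(Rqq, Rqx) @ np.linalg.solve(Rxx, Rxq)
    eigvals, B = np.linalg.eig(F)
    eigvals, B = eigvals.real, B.real
    order = np.argsort(eigvals)[::-1]          # lambda_1 >= lambda_2 >= ...
    lam, B = eigvals[order], B[:, order]
    # Rescale each eigenvector b so that b' Rqq b = 1 (unit-variance w).
    B = B / np.sqrt(np.einsum("ij,ik,kj->j", B, Rqq, B))
    # A = Rxx^-1 Rxq B Lambda^(-1/2) (Eq. 16.8).
    A = np.linalg.solve(Rxx, Rxq) @ B / np.sqrt(lam)
    V, W = Xs @ A, Qs @ B                      # Eq. 16.9 and 16.10
    return lam, A, B, V, W
```

With the outputs of this function, the canonical correlations are `np.sqrt(lam)`, and the matrices of equations (16.11) and (16.12) would be obtained as the products of the corresponding correlation blocks with `A` and `B`.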

The canonical variables $(w_1, \ldots, w_j, \ldots, w_m)$ can also be estimated using the canonical variables of the first set from the equation:

$\hat{w} = \Lambda^{1/2} v$    (16.13)

Given an estimated point $\hat{w} = (\hat{w}_1, \hat{w}_2, \ldots, \hat{w}_j, \ldots, \hat{w}_m)$ in the $m$-dimensional space of the canonical variables and under the normality assumption, the $(1-\alpha)$ per cent confidence region for the point $w = (w_1, \ldots, w_m)$ is given by the equation:

$(w - \hat{w})' (I_m - \Lambda)^{-1} (w - \hat{w}) \le \chi^2(\alpha, m)$    (16.14)

In the special case of $m = 2$ the $(1-\alpha)$ elliptical confidence region is given by the equation:

$\frac{(w_1 - \hat{w}_1)^2}{1 - r_1^2} + \frac{(w_2 - \hat{w}_2)^2}{1 - r_2^2} \le \chi^2(\alpha, 2)$    (16.15)

If $\hat{w}$ is a point representing a basin for which the flood-related variables $(\hat{q}_1, \ldots, \hat{q}_m)$ are estimated from the basin-related variables $(x_1, \ldots, x_p)$, this confidence interval can be interpreted as the $(1-\alpha)$ per cent neighborhood of the point $\hat{w}$ (Ribeiro-Correa et al. 1994). The differences $(w_j - \hat{w}_j)$, $j = 1, \ldots, m$ are the residuals of the regressions of the canonical variables $w_j$ on the corresponding variables $v_j$ (Eq. 16.13).
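For the two-dimensional case, the neighborhood test of equations (16.13) and (16.15) can be sketched as below. The sketch assumes that $\chi^2(\alpha, 2)$ denotes the value exceeded with probability $\alpha$ by a chi-square variable with two degrees of freedom, which has the closed form $-2 \ln \alpha$; the function name and argument layout are hypothetical.

```python
import math


def in_neighborhood(w_gauged, v_target, r, confidence=0.75):
    """Check whether a gauged basin lies in the (1 - alpha) neighborhood (m = 2).

    w_gauged: (w1, w2) of a gauged basin.
    v_target: (v1, v2) of the basin of interest.
    r: (r1, r2), the two canonical correlation coefficients.
    """
    # Estimated centre of the neighborhood: w_hat_j = r_j * v_j (Eq. 16.13).
    w_hat = [r[j] * v_target[j] for j in range(2)]
    # Left-hand side of Eq. 16.15.
    distance = sum(
        (w_gauged[j] - w_hat[j]) ** 2 / (1.0 - r[j] ** 2) for j in range(2)
    )
    alpha = 1.0 - confidence
    # Chi-square quantile with 2 degrees of freedom (closed form, assumed reading).
    threshold = -2.0 * math.log(alpha)
    return distance <= threshold
```

With the canonical correlations of the application (0.989 and 0.76), the denominator of the first term is much smaller than that of the second, so the ellipse is much narrower along the first canonical axis than along the second.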

It must be noted that neighborhoods determined using weighted distances (Burn, 1990 a and b) or Mahalanobis metrics (Formula 16.14) are only approximations to the hydrologic neighborhood of the basin.

Computer programs for canonical correlation usually plot diagrams of $(v_1, w_1)$, $(v_2, w_2)$ etc., i.e. the pairs of canonical variables having maximum correlation coefficients. Given the difficulties in interpreting the canonical variables (e.g. Kendall and Stuart, 1968), it is preferable to plot the uncorrelated pairs of canonical variables $(v_1, v_2)$, $(v_1, v_3)$ ... $(v_j, v_k)$ etc., where $j \ne k$, along with the corresponding scatter diagrams $(w_1, w_2)$ ... $(w_j, w_k)$ of uncorrelated flood-related canonical variables. The pairs of canonical variables $(v_1, v_2)$, $(w_1, w_2)$ etc. respectively define the spaces of linearly transformed basin- and flood-related variables in which the points represent individual basins. If the basin variables are good predictors of the flood-related variables, the patterns of points in the corresponding scatter diagrams are similar.


Bibliography

ACREMAN, M.C. and S.E. WILTSHIRE. (1989) 'The regions are dead: long live the regions. Methods of identifying and dispensing with regions for flood frequency analysis'. IAHS Publ. no. 187, 175-188.

BERNIER, J. (1967) 'Les méthodes bayesiennes en hydrologie statistique'. (Essai de reconciliation de l'hydrologie et du statisticien). First International Hydrology Symposium, Fort Collins: 461-470.

BERNIER, J. (1981) 'Eléments de statistique bayesienne'. EDF Report HE 40/81.06.

BERNIER, J. (1992) 'Modèle regional à deux niveaux d'aléas'. Interim Report NSERC Strategic Grant No STR 0118482, 11 pp.

BURN, D.H. (1990a) 'An appraisal of the «region of influence» approach to flood frequency analysis'. Hydrological Sciences Journal, 35 (2) 149-165.

BURN, D.H. (1990b) 'Evaluation of regional flood frequency analysis with a region of influence approach'. Water Resources Research 26 (10) 2257-2265.

CAVADIAS, G.S. (1989) 'Regional flood estimation by canonical correlation'. Paper presented to the 1989 Annual Conference of the Canadian Society of Civil Engineering, St. John's, Newfoundland.

CAVADIAS, G.S. (1990) 'The canonical correlation approach to regional flood estimation'. Regionalization in Hydrology. Proc. of the Ljubljana Symposium, IAHS Publ. No. 191: 171-178.

CUNNANE, C. (1986) 'Review of statistical models for flood frequency estimation'. Keynote paper in: International Symposium on Flood Frequency and Risk Analysis (Baton Rouge, May 1986). Reidel.

DILLON, W.E. and M. GOLDSTEIN. (1984) Multivariate Analysis, p. 139. John Wiley.

DUBES, R. and G. ZENG. (1987) 'A test for spatial homogeneity in cluster analysis'. Classification 4, 33-56.

FALKENMARK, M. (1976) Water in a Starving World. Westview Press, Boulder, Colorado.

GOTTSCHALK, L. (1985) 'Hydrological regionalization in Sweden'. Hydrol. Sci. J. 30 (1).

GRANGER, C.W.J. and P. NEWBOLD. (1977) Forecasting Economic Time Series, Academic Press.

GRIGG, D. (1976) 'Regions, Models and Classes', in Integrated Models in Geography (ed. by R.J. Chorley) Methuen, London.

HOTELLING, H. (1936) 'Relations between two sets of variates'. Biometrika 28: 321-377.

KENDALL, M.G. and A. STUART. (1968) The Advanced Theory of Statistics, Vol 3. 2nd ed. Charles Griffin & Co. London.

KUCZERA, G. (1982) 'Combining site-specific and regional information: an empirical Bayes' approach'. Water Resour. Res. Vol. 8, No. 2 pp. 306-314.

LINSLEY, R.K. (1982) 'Flood estimates. How good are they?' Wat. Resour. Res. 22 (9).

LVOVICH, M.I. (1973) The World's Water. Mir Publishers, Moscow.

MOSLEY, M.P. (1981) 'Delimitation of New Zealand hydrological regions'. J. Hydrol. 49, 173-192.


RIBEIRO-CORREA, B.; G.S. CAVADIAS; B. CLEMENT and J. ROUSSELLE. (1994) 'Identification of hydrological neighborhoods using canonical correlation analysis'. Journal of Hydrology 173 (1995) 71-89.

ROGERS, A. (1974) Statistical Analysis of Spatial Dispersion. Pion Ltd.

UNITED STATES WATER RESOURCES COUNCIL. (1977) Guidelines for Determining Flow Frequency. USWRC, 2120 Long Island NW, Washington, DC.

VICENS, G.J.; I. RODRIGUEZ-ITURBE and J.C. Jr. SCHAAKE. (1975) 'A Bayesian framework for the use of regional information in hydrology'. Water Resources Res. Vol 11, No. 3 pp. 405-414.

WILTSHIRE, S.E. (1986) 'Regional flood analysis II: multivariate classification of drainage basins in Britain'. Hydrol. Sci. J. 31 (3).

WORLD METEOROLOGICAL ORGANIZATION. (1986) 'Water resource assessment in different hydrological regions'. Paper presented by WMO and UNESCO to the Workshop on Comparative Hydrology, Budapest, 112-212 July 1986.
