Simultaneous estimation of monotone trends and seasonal patterns in time
series of environmental data
By
Mohamed Hussian and Anders Grimvall
2
Outline
Examples of monotone relationships in environmental data
Monotone regression in one or more independent variables
Simultaneous estimation of monotone trends and seasonal patterns
Monte Carlo methods for constrained least squares regression
Simple averaging techniques
3
[Figure: Tot-P concentrations (Brunsbüttel) versus water discharge (Neu Darchau) in the Elbe River, mean values for April 1985-2000. Axes: Monthly mean runoff (m3/s), Monthly mean Tot-P concentrations (mg/l).]
4
[Figure: Average monthly ozone concentrations versus humidity at Ähtäri in central Finland. Axes: Humidity (%), Monthly average ozone (µg/m3). Series: season 1 (Jan-Mar), season 2 (Apr-Jun), season 3 (Jul-Sept), season 4 (Oct-Dec).]
5
[Figure: Monthly mean concentrations of total nitrogen at Brunsbüttel in the Elbe River. Axes: Year, Month, Tot-N concentrations (mg/l).]
6
[Figure: Tot-N concentrations (Brunsbüttel) in the Elbe River, mean values for July 1985-2000. Axes: Year, Tot-N concentrations (mg/l).]
7
Monotone regression
(isotonic or antitonic regression)
Given a set of two-dimensional data $\{(x_i, y_i)\}_{i=1}^{n}$
Sort the data by x into $\{(x_{(i)}, y_{(i)})\}_{i=1}^{n}$
Minimise $\frac{1}{n}\sum_{i=1}^{n} (y_{(i)} - \hat{y}_{(i)})^2$ under the constraints
$\hat{y}_{(1)} \le \hat{y}_{(2)} \le \dots \le \hat{y}_{(n)}$
or
$\hat{y}_{(1)} \ge \hat{y}_{(2)} \ge \dots \ge \hat{y}_{(n)}$
A well-known algorithm used to solve the problem is the PAV Algorithm (Pool-Adjacent-Violators Algorithm) (Ayer, 1955; Barlow et al., 1972; Hanson et al., 1973)
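The pooling idea behind PAV can be illustrated with a minimal Python sketch. It assumes the responses are already sorted by x and that all weights are positive; the function names `pav` and `pav_decreasing` are illustrative and do not come from the slides.

```python
def pav(y, weights=None):
    """Least-squares nondecreasing fit to y (y is assumed sorted by x).

    Assumption for this sketch: all weights are strictly positive.
    """
    n = len(y)
    w = [1.0] * n if weights is None else list(weights)
    # Each block stores [weighted mean, total weight, number of points].
    blocks = []
    for yi, wi in zip(y, w):
        blocks.append([float(yi), float(wi), 1])
        # Pool adjacent blocks for as long as they violate the ordering.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            wt = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / wt, wt, c1 + c2])
    # Expand the blocks back into one fitted value per observation.
    fit = []
    for mean, _, count in blocks:
        fit.extend([mean] * count)
    return fit


def pav_decreasing(y, weights=None):
    """Nonincreasing fit, obtained by negating, fitting, and negating back."""
    return [-v for v in pav([-v for v in y], weights)]
```

Each pooled block becomes one flat level of the fitted step function, and data that are already monotone pass through unchanged.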
8
[Figure: Tot-N concentrations (Brunsbüttel) in the Elbe River, mean values for July 1985-2000, shown together with the PAV output. Axes: Year, Tot-N concentrations (mg/l).]
9
The PAV Algorithm
If the data are already monotone, the PAV algorithm reproduces them.
The solution is a step function.
If there are outliers, the PAV algorithm produces long, flat levels.
The impact of outliers can be reduced by first smoothing the data (Friedman and Tibshirani, 1984).
10
Monotone regression in two independent variables
An algorithm for computing the least squares regression function that is constrained to be nondecreasing in each of several independent variables was developed by R. Dykstra & T. Robertson, 1984. The algorithm was written specifically for two independent variables, and it produces the solution of
$\min_{G \in K} \sum_{i,j} (g_{ij} - x_{ij})^2 \, w_{ij}$
where $(x_{ij})$ is a given two-dimensional array of the original values; $(w_{ij})$ is a nonnegative array of weights; and K is the class of two-dimensional arrays $G = (g_{ij})$ such that
$g_{ij} \le g_{kl}$ whenever $i \le k$ and $j \le l$
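A small Python sketch can show one way to compute such a fit numerically. It uses generic alternating projections with Dykstra's correction terms, which is not claimed to be the published algorithm. It assumes unit weights; the names `isotonic_2d`, `n_iter` and `tol` are illustrative.

```python
import numpy as np


def pav(y):
    """Unweighted least-squares nondecreasing fit to a 1-D sequence."""
    means, counts = [], []
    for v in y:
        means.append(float(v))
        counts.append(1)
        # Pool adjacent blocks while they violate the ordering.
        while len(means) > 1 and means[-2] > means[-1]:
            m2, c2 = means.pop(), counts.pop()
            m1, c1 = means.pop(), counts.pop()
            means.append((m1 * c1 + m2 * c2) / (c1 + c2))
            counts.append(c1 + c2)
    return np.repeat(means, counts)


def isotonic_2d(x, n_iter=500, tol=1e-10):
    """Approximate least-squares fit that is nondecreasing along rows and columns.

    Assumption for this sketch: all weights w_ij are equal to 1.
    """
    x = np.asarray(x, dtype=float)
    g = x.copy()
    p = np.zeros_like(x)  # correction term for the row-wise constraint set
    q = np.zeros_like(x)  # correction term for the column-wise constraint set
    for _ in range(n_iter):
        a = np.apply_along_axis(pav, 1, g + p)      # project each row
        p = g + p - a
        g_new = np.apply_along_axis(pav, 0, a + q)  # project each column
        q = a + q - g_new
        converged = np.max(np.abs(g_new - g)) < tol
        g = g_new
        if converged:
            break
    return g
```

Being nondecreasing along every row and every column is equivalent to the constraint that $g_{ij} \le g_{kl}$ whenever $i \le k$ and $j \le l$, which is why two one-dimensional projections suffice in this sketch. The returned array satisfies the column ordering exactly and the row ordering up to the tolerance.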
11
Limitations of classical algorithms for monotone regression in two or more explanatory variables
Inefficient for relatively small data sets
Cannot handle typical multiple regression data where at least one of the explanatory variables is continuous
Unclear how seasonality can be handled
12
[Figure: Monthly mean concentrations of total nitrogen at Brunsbüttel in the Elbe River. Axes: Year, Month, Tot-N concentrations (mg/l).]
13
[Figure: Example of linear trend with a superimposed trigonometric seasonal pattern. Axes: Month (Jan, Apr, Jul, Oct), Year, y.]
14
Monte Carlo methods for constrained least squares regression
Let $y_1, y_2, \dots, y_N$ denote a time series of data collected over m seasons
Let $\hat{y}_i$ denote the sum of the trend and seasonal components at time i
Determine $\hat{y}_i$ by minimising
$S = \sum_i (y_i - \hat{y}_i)^2$
under the following constraints:
Method I
15
Monte Carlo methods for constrained least squares regression
Monotonicity constraints
$\hat{y}_i$ is either decreasing or increasing for each season
$\hat{y}_i \ge \hat{y}_{i+m}, \; i = 1, \dots, N-m,$
or
$\hat{y}_i \le \hat{y}_{i+m}, \; i = 1, \dots, N-m.$
Seasonality constraints
The seasonal pattern is composed of convex and concave curve pieces, i.e.,
$(\hat{y}_{i-1} - 2\hat{y}_i + \hat{y}_{i+1})(\hat{y}_{i+m-1} - 2\hat{y}_{i+m} + \hat{y}_{i+m+1}) \ge 0, \; i = 2, \dots, N-m-1,$
for all time points belonging to a given season.
Method I
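The two kinds of constraints can be made concrete with a short Python sketch that checks whether a candidate series satisfies them. It assumes a plain list `y_hat` of fitted values and m seasons per year; the function names are illustrative and not taken from the slides.

```python
def satisfies_monotonicity(y_hat, m, increasing=True):
    """Check y_hat[i] <= y_hat[i + m] (or >=) for every i with i + m < N."""
    pairs = zip(y_hat[:-m], y_hat[m:])
    if increasing:
        return all(a <= b for a, b in pairs)
    return all(a >= b for a, b in pairs)


def satisfies_seasonality(y_hat, m):
    """Check that second differences one year apart never differ in sign."""
    n = len(y_hat)

    def second_diff(i):
        return y_hat[i - 1] - 2 * y_hat[i] + y_hat[i + 1]

    # 0-based i = 1, ..., N-m-2 corresponds to i = 2, ..., N-m-1 in the
    # 1-based notation of the constraint.
    return all(second_diff(i) * second_diff(i + m) >= 0
               for i in range(1, n - m - 1))
```

A product of second differences that is nonnegative means the curve is convex (or concave) at time i and again one year later, so the shape of the seasonal pattern is preserved from year to year.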
16
Algorithm
General information
The problem is a classical quadratic optimisation problem.
The computational burden increases rapidly with the number of variables and constraints.
This burden can be a serious problem if the suggested algorithms do not take the special features of the constraints into account.
Method I
17
Algorithm
Theoretical solution
Given a crude initial estimate $\hat{y}_i^0, \; i = 1, \dots, N,$ of $\hat{y}_i, \; i = 1, \dots, N,$
form new estimates $\hat{y}_i^k, \; i = 1, \dots, N,$ k = 1, 2, …,
by employing an updating formula
$\hat{y}_i^{k+1} = \hat{y}_i^k + h \, b_i^k, \; i = 1, \dots, N,$
$b_i^k, \; i = 1, \dots, N$: is a vector defining the shape of the adjustment
$h$: is a scaling factor
Method I
18
Algorithm
[Figure, two panels: Shapes of the functions used for updating the response surface. Axes: Year, Season; vertical scale 0 to 1.]
Method I
19
Algorithm
$h$ is determined in such a way that
$S(h) = \sum_i (y_i - \hat{y}_i^k - h \, b_i^k)^2$
is minimised and the desired constraints are satisfied.
Applying such a solution will reduce the original multivariate optimisation problem to a sequence of univariate optimisation problems.
Method I
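The reduction to univariate problems can be sketched in Python as follows. This is a simplified illustration under several assumptions that are not in the slides: only the monotonicity constraint is enforced, the adjustment shapes are triangular bumps at random time points, and an infeasible step is repeatedly halved. The names `update_step`, `monte_carlo_fit`, `width` and `max_halvings` are illustrative.

```python
import random


def is_monotone(y_hat, m, increasing):
    """Monotonicity across years: compare each value with the one m steps later."""
    pairs = zip(y_hat[:-m], y_hat[m:])
    return all((a <= b) if increasing else (a >= b) for a, b in pairs)


def update_step(y, y_hat, b, m, increasing, max_halvings=30):
    """One update y_hat + h * b, with h chosen by a univariate search."""
    denom = sum(bi * bi for bi in b)
    if denom == 0.0:
        return y_hat
    # Unconstrained minimiser of S(h) = sum_i (y_i - y_hat_i - h * b_i)^2.
    h = sum((yi - fi) * bi for yi, fi, bi in zip(y, y_hat, b)) / denom
    for _ in range(max_halvings):
        candidate = [fi + h * bi for fi, bi in zip(y_hat, b)]
        if is_monotone(candidate, m, increasing):
            return candidate
        h /= 2.0  # shrinking h towards 0 never increases S above S(0)
    return y_hat  # no feasible step found: keep the current estimate


def monte_carlo_fit(y, m, increasing, n_steps=2000, width=3, seed=0):
    rng = random.Random(seed)
    n = len(y)
    y_hat = [sum(y) / n] * n  # crude initial estimate (constant, hence feasible)
    for _ in range(n_steps):
        centre = rng.randrange(n)
        # Hypothetical adjustment shape: a triangular bump around `centre`.
        b = [max(0.0, 1.0 - abs(i - centre) / width) for i in range(n)]
        y_hat = update_step(y, y_hat, b, m, increasing)
    return y_hat
```

Because each accepted step lies between 0 and the unconstrained minimiser of S(h), the residual sum of squares never increases and every iterate remains feasible. A sketch of this kind may stall before reaching the constrained optimum, so it illustrates the updating idea rather than reproducing the full procedure.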
20
[Figure: Response surface satisfying monotonicity and convexity constraints. Axes: Month, Year, Fitted Tot-N concentration (mg/l).]
Method I
21
Simple averaging techniques
Consider $y$ satisfying $y = m(x) + \varepsilon$,
where $x = (x_1, x_2, \dots, x_m)$ denotes a vector of m explanatory variables,
$m(x)$ is assumed to be monotone in $x$ (nondecreasing or nonincreasing).
Nondecreasing case: let $Z_x$ be an initial estimate of $m(x)$, which could be the data itself, then consider
$M_1(x) = \min\{Z_{x'} : x' \ge x\}$
and
$M_2(x) = \max\{Z_{x'} : x' \le x\}$
Method II
22
Simple averaging techniques
Method II
[Figure: Tot-N concentrations (mg/l) versus Year, with a lower limit and an upper limit; annotations: M1 values, M2 values.]
23
For $0 \le \alpha \le 1$, the set of estimators
$M(x) = (1 - \alpha) M_1(x) + \alpha M_2(x)$
are nondecreasing in $x$, and work well for light-tailed error distributions (Strand, 2003; Mukerjee & Stern, 1994).
The estimate of $\alpha$ is the value that minimises
$\sum_x [Z_x - M(x)]^2$
which is
$\hat{\alpha} = \frac{\sum_x [Z_x - M_1(x)][M_2(x) - M_1(x)]}{\sum_x [M_2(x) - M_1(x)]^2}$
Method II
Simple averaging techniques
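For a single explanatory variable, the averaging estimator can be sketched in a few lines of Python. The sketch takes M1 as the running minimum of the initial estimates from the right, M2 as the running maximum from the left, and the fit as (1 - alpha) * M1 + alpha * M2 with the least-squares alpha. It assumes the values `z` are already sorted by x; the function name `simple_averaging` is illustrative.

```python
import numpy as np


def simple_averaging(z):
    """Nondecreasing fit (1 - alpha) * M1 + alpha * M2 for values sorted by x."""
    z = np.asarray(z, dtype=float)
    # M1(x) = min{Z_x' : x' >= x}: running minimum taken from the right.
    m1 = np.minimum.accumulate(z[::-1])[::-1]
    # M2(x) = max{Z_x' : x' <= x}: running maximum taken from the left.
    m2 = np.maximum.accumulate(z)
    diff = m2 - m1
    denom = np.sum(diff ** 2)
    # Least-squares weight; if M1 == M2 everywhere, z is already monotone
    # and any alpha gives the same fit.
    alpha = 0.5 if denom == 0 else float(np.sum((z - m1) * diff) / denom)
    return (1.0 - alpha) * m1 + alpha * m2, alpha
```

Since M1 never exceeds Z and Z never exceeds M2, the least-squares alpha always falls between 0 and 1, and the fit is a convex combination of two nondecreasing sequences. A nonincreasing fit can be obtained by applying the same function to the negated values and negating the result.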
24
Nonincreasing case: the same steps are used to create estimates based on $(X_i, -Y_i)$ instead of $(X_i, Y_i)$, and the signs of the estimates are changed back at the end to get the nonincreasing function.
Seasonality was handled by defining two monotone functions with respect to the seasons having high and low concentration values.
Method II
Simple averaging techniques
25
Method II
Simple averaging techniques
[Figure: Observed response together with two monotone functions (function1, function2); the maximum and the minimum are marked.]
26
[Figure: Monthly mean concentrations of total nitrogen at Brunsbüttel in the Elbe River. Axes: Year, Month, Monthly mean Tot-N concentration (mg/l).]
Method II
27
Method II
[Figure: Fitted monthly mean concentrations of total nitrogen at Brunsbüttel in the Elbe River. Axes: Year, Month, Fitted monthly mean Tot-N concentration (mg/l) based on the observed data.]
28
[Figure: Smoothed monthly mean concentrations of total nitrogen at Brunsbüttel in the Elbe River. Axes: Year, Month, Lightly smoothed monthly mean Tot-N concentration (mg/l), bandwidth 0.05.]
Method II
29
[Figure: Smoothed monthly mean concentrations of total nitrogen at Brunsbüttel in the Elbe River. Axes: Year, Month, Strongly smoothed monthly mean Tot-N concentration (mg/l), bandwidth 0.3.]
Method II
30
Method II
Simple averaging techniques in the multidimensional case
[Figure: Smoothed Tot-N transport (kton/month), February values 1985-2000. Axes: Year, Water discharge levels (10^9 m^3/month).]
31
Method II
Simple averaging techniques in the multidimensional case
[Figure: Fitted Tot-N transport (kton/month), February values 1985-2000. Axes: Year, Water discharge levels (10^9 m^3/month).]
32
Results
The two algorithms performed satisfactorily on water quality data from the Elbe River and other rivers.
Regardless of the features of the data sets that were examined, the sequences of fitted surfaces converged to a function that could be interpreted as a sum of trend and seasonal components.
The components representing irregular variation provided a good starting point for the detection of outliers.
The major drawback of the Monte Carlo algorithm was the computational burden.
33
Results
Simple averaging techniques are efficient and work well for initial estimates that have light-tailed error distributions.
Simple averaging techniques are sensitive to outliers and can have problems with sparse data.
For monotone functions, an alternative to using a large bandwidth is to use a slightly smaller bandwidth and then improve the accuracy by making the estimates monotone.
34
Conclusions
It is possible to combine non-parametric procedures with very natural constraints on the trend and seasonal components of time series of environmental data.
The proposed procedures are so generally applicable that they can form the basis of fully automatic systems for quality assessment and decomposition of time series of environmental data.
Applications involving several explanatory variables or sparse data sets require further methodological work.