
Simultaneous estimation of monotone trends and seasonal patterns in time series of environmental data

By

Mohamed Hussian and Anders Grimvall

Outline

Examples of monotone relationships in environmental data

Monotone regression in one or more independent variables

Simultaneous estimation of monotone trends and seasonal patterns

Monte Carlo methods for constrained least squares regression

Simple averaging techniques

[Figure: Tot-P concentrations (Brunsbüttel) versus water discharge (Neu Darchau) in the Elbe River, mean values for April 1985-2000; x-axis: monthly mean runoff (m3/s), y-axis: monthly mean Tot-P concentrations (mg/l)]

[Figure: Average monthly ozone concentrations versus humidity at Ähtäri in central Finland, by season (Jan-Mar, Apr-Jun, Jul-Sep, Oct-Dec); x-axis: humidity (%), y-axis: monthly average ozone (µg/m3)]

[Figure: Monthly mean concentrations of total nitrogen at Brunsbüttel in the Elbe River; axes: year, month, Tot-N concentrations (mg/l)]

[Figure: Tot-N concentrations (Brunsbüttel) in the Elbe River, mean values for July 1985-2000; x-axis: year, y-axis: Tot-N concentrations (mg/l)]

Monotone regression

(isotonic or antitonic regression)

Given a set of two-dimensional data $\{(x_i, y_i)\}_{i=1}^{n}$, sort the data by $x$ into $\{(x_{(i)}, y_{(i)})\}_{i=1}^{n}$.

Minimise $\sum_{i=1}^{n} (\hat{y}_{(i)} - y_{(i)})^2$ under the constraints

$\hat{y}_{(1)} \le \hat{y}_{(2)} \le \dots \le \hat{y}_{(n)}$

or

$\hat{y}_{(1)} \ge \hat{y}_{(2)} \ge \dots \ge \hat{y}_{(n)}$

A well-known algorithm used to solve the problem is the PAV algorithm (Pool-Adjacent-Violators algorithm) (Ayer et al., 1955; Barlow et al., 1972; Hanson et al., 1973).
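The pooling idea behind PAV can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in the study: it assumes the observations are already sorted by $x$ and that all weights are positive, and the function names `pav` and `antitonic` are hypothetical. Adjacent blocks are pooled into their weighted mean whenever they violate the ordering; the antitonic fit reuses the isotonic one through a sign change.

```python
def pav(y, w=None):
    """Nondecreasing weighted least-squares fit (pool-adjacent-violators).

    Assumes y is already sorted by x and all weights are positive.
    """
    if w is None:
        w = [1.0] * len(y)
    # Each block holds [weighted sum, total weight, number of pooled points].
    blocks = []
    for yi, wi in zip(y, w):
        blocks.append([yi * wi, float(wi), 1])
        # Pool backwards while the previous block mean exceeds the last block mean.
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            s2, w2, c2 = blocks.pop()
            s1, w1, c1 = blocks.pop()
            blocks.append([s1 + s2, w1 + w2, c1 + c2])
    # Expand each block mean back to one fitted value per observation.
    return [s / wt for s, wt, c in blocks for _ in range(c)]


def antitonic(y, w=None):
    """Nonincreasing fit: negate, fit a nondecreasing function, negate back."""
    return [-v for v in pav([-v for v in y], w)]


print(pav([1, 3, 2, 4]))     # [1.0, 2.5, 2.5, 4.0]
print(antitonic([3, 1, 2]))  # [3.0, 1.5, 1.5]
```

Storing block sums rather than running means keeps the pooled values exact for small integer inputs and makes each merge a constant-time operation.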

[Figure: Tot-N concentrations (Brunsbüttel) in the Elbe River, mean values for July 1985-2000, together with the PAV output; x-axis: year, y-axis: Tot-N concentrations (mg/l)]

The PAV Algorithm

If the data are already monotone, the PAV algorithm reproduces them.

The solution is a step function.

If there are outliers, the PAV algorithm produces long, flat levels.

The impact of outliers can be reduced by first smoothing the data (Friedman and Tibshirani, 1984).
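The effect of a single outlier can be seen on a toy series. In this sketch the data are invented for illustration, and `running_median3` is a hypothetical stand-in smoother (a width-3 running median), not the smoother used by Friedman and Tibshirani. One high value forces PAV to pool four observations into one flat level; smoothing first removes the spike, and PAV then simply reproduces the already-monotone smoothed series.

```python
def pav(y):
    """Unweighted nondecreasing least-squares fit (pool-adjacent-violators)."""
    blocks = []  # each block: [sum of pooled values, number of values]
    for v in y:
        blocks.append([float(v), 1])
        # Merge backwards while the previous block mean exceeds the last block mean.
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            s2, c2 = blocks.pop()
            s1, c1 = blocks.pop()
            blocks.append([s1 + s2, c1 + c2])
    return [s / c for s, c in blocks for _ in range(c)]


def running_median3(y):
    """Running median of width 3; the two endpoints are left unchanged."""
    out = list(map(float, y))
    for i in range(1, len(y) - 1):
        out[i] = float(sorted(y[i - 1:i + 2])[1])
    return out


y = [1, 2, 9, 3, 4, 5, 6]        # an outlier at position 2
print(pav(y))                    # [1.0, 2.0, 5.25, 5.25, 5.25, 5.25, 6.0]
print(pav(running_median3(y)))   # [1.0, 2.0, 3.0, 4.0, 4.0, 5.0, 6.0]
```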

Monotone regression in two independent variables

An algorithm for computing the least squares regression function that is constrained to be nondecreasing in each of several independent variables was developed by R. Dykstra and T. Robertson (1984). The algorithm was written specifically for two independent variables, and it produces the solution of

Minimise $\sum_{i} \sum_{j} w_{ij} (x_{ij} - g_{ij})^2$ over $G \in K$,

where $X = (x_{ij})$ is a given two-dimensional array of the original values, $W = (w_{ij})$ is a nonnegative array of weights, and $K$ is the class of two-dimensional arrays $G = (g_{ij})$ such that

$g_{ij} \le g_{kl}$ whenever $i \le k$ and $j \le l$.
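The class $K$ is the intersection of two simpler sets: arrays that are nondecreasing along every row and arrays that are nondecreasing along every column, and the weighted least-squares projection onto each set is just a one-dimensional weighted PAV fit per row or per column. The sketch below alternates these two projections with Dykstra's correction terms, which converge to the projection onto the intersection. It is an illustration under stated assumptions (strictly positive weights, a fixed number of iterations, hypothetical function names), not a transcription of the Dykstra and Robertson algorithm.

```python
import numpy as np


def pav_weighted(y, w):
    """Weighted nondecreasing least-squares fit of one sequence (PAV)."""
    blocks = []  # [weighted sum, total weight, count]
    for yi, wi in zip(y, w):
        blocks.append([yi * wi, wi, 1])
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            s2, w2, c2 = blocks.pop()
            s1, w1, c1 = blocks.pop()
            blocks.append([s1 + s2, w1 + w2, c1 + c2])
    return np.array([s / wt for s, wt, c in blocks for _ in range(c)])


def project_rows(x, w):
    """Project onto arrays that are nondecreasing along each row (index j)."""
    return np.array([pav_weighted(xr, wr) for xr, wr in zip(x, w)])


def isotonic_2d(x, w, n_iter=500):
    """Approximate argmin of sum w_ij (x_ij - g_ij)^2 with g nondecreasing in i and j."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    g = x.copy()
    p = np.zeros_like(g)  # Dykstra correction for the row constraints
    q = np.zeros_like(g)  # Dykstra correction for the column constraints
    for _ in range(n_iter):
        h = project_rows(g + p, w)
        p = g + p - h
        # Columns are handled as the rows of the transposed array.
        g_new = project_rows((h + q).T, w.T).T
        q = h + q - g_new
        g = g_new
    return g


fit = isotonic_2d([[1.0, 3.0], [2.0, 0.0]], np.ones((2, 2)))
print(np.round(fit, 4))  # close to [[1, 5/3], [5/3, 5/3]]
```

Plain alternation of the two projections (without `p` and `q`) would also reach a feasible array, but not necessarily the least-squares one; the correction terms are what make the limit the constrained minimiser.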

Limitations of classical algorithms for monotone regression in two or more explanatory variables

Inefficient for relatively small data sets

Cannot handle typical multiple regression data where at least one of the explanatory variables is continuous

Unclear how seasonality can be handled

[Figure: Monthly mean Tot-N concentrations (mg/l) at Brunsbüttel in the Elbe River, plotted against year and month]

[Figure: Example of linear trend with a superimposed trigonometric seasonal pattern; axes: year, month, y]

Method I

Let $y_1, y_2, \dots, y_N$ denote a time series of data collected over $m$ seasons.

Let $\hat{y}_i$ denote the sum of the trend and seasonal components at time $i$.

Determine $\hat{y}_i$, $i = 1, \dots, N$, by minimising

$S = \sum_{i} (y_i - \hat{y}_i)^2$

under the following constraints:

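Monotonicity constraints

$\hat{y}_i$ is either nondecreasing or nonincreasing within each season:

$\hat{y}_i \le \hat{y}_{i+m}, \quad i = 1, \dots, N-m,$

or

$\hat{y}_i \ge \hat{y}_{i+m}, \quad i = 1, \dots, N-m.$

Seasonality constraints

The seasonal pattern is composed of convex and concave curve pieces, i.e., the second differences at time points belonging to the same season have the same sign in consecutive years:

$(\hat{y}_{i-1} - 2\hat{y}_i + \hat{y}_{i+1})(\hat{y}_{i+m-1} - 2\hat{y}_{i+m} + \hat{y}_{i+m+1}) \ge 0, \quad i = 2, \dots, N-m-1.$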
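For a candidate fit, the objective $S$ and both constraint families can be evaluated directly. This sketch uses 0-based NumPy indexing, checks the nondecreasing variant of the monotonicity constraints by default, and adds a small tolerance `tol` for rounding; `method1_objective` and `satisfies_constraints` are hypothetical names.

```python
import numpy as np


def method1_objective(y, yhat):
    """Residual sum of squares S = sum_i (y_i - yhat_i)^2."""
    y, yhat = np.asarray(y, dtype=float), np.asarray(yhat, dtype=float)
    return float(np.sum((y - yhat) ** 2))


def satisfies_constraints(yhat, m, increasing=True, tol=1e-12):
    """Check the monotonicity and seasonality constraints for a candidate fit."""
    yhat = np.asarray(yhat, dtype=float)
    # Monotonicity: yhat_{i+m} - yhat_i for i = 1..N-m (same season, next cycle).
    step = yhat[m:] - yhat[:-m]
    monotone = np.all(step >= -tol) if increasing else np.all(step <= tol)
    # Second differences d_i = yhat_{i-1} - 2 yhat_i + yhat_{i+1}, i = 2..N-1;
    # d[k] corresponds to i = k + 2 in the 1-based notation of the slide.
    d = yhat[:-2] - 2.0 * yhat[1:-1] + yhat[2:]
    # Seasonality: d_i * d_{i+m} >= 0 for i = 2..N-m-1.
    seasonal = np.all(d[:-m] * d[m:] >= -tol)
    return bool(monotone and seasonal)


t = np.arange(12)
trend_plus_season = 0.25 * t + np.tile([0.0, 2.0, 1.0, 0.0], 3)
print(satisfies_constraints(trend_plus_season, m=4))  # True
```

A linear trend plus a strictly periodic pattern satisfies both families, because its second differences repeat exactly from one cycle to the next.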

Algorithm

General information

The problem is a classical quadratic optimisation problem.

The computational burden increases rapidly with the number of variables and constraints.

This burden can be a serious problem if the suggested algorithms do not take into consideration the special features of the constraints.

Theoretical solution

Given a crude initial estimate $\hat{y}_i^{0}$, $i = 1, \dots, N$, of $\hat{y}_i$, $i = 1, \dots, N$, form new estimates $\hat{y}_i^{k}$, $i = 1, \dots, N$, $k = 1, 2, \dots$, by employing an updating formula

$\hat{y}_i^{k} = \hat{y}_i^{k-1} + h\, b_i^{k}, \quad i = 1, \dots, N,$

where $b_i^{k}$, $i = 1, \dots, N$, is a vector defining the shape of the adjustment and $h$ is a scaling factor.

[Figure: Shapes of the functions used for updating the response surface; two example panels plotted over year (1-16) and season, with values between 0 and 1]

$h$ is determined in such a way that

$S(h) = \sum_{i} (y_i - \hat{y}_i^{k-1} - h\, b_i^{k})^2$

is minimised and the desired constraints are satisfied.

Applying such a solution reduces the original multivariate optimisation problem to a sequence of univariate optimisation problems.
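Because $S(h)$ is a convex parabola in $h$, its unconstrained minimiser is $h^{*} = \sum_i b_i^{k} (y_i - \hat{y}_i^{k-1}) / \sum_i (b_i^{k})^2$. With linear constraints, the feasible values of $h$ form an interval, so clipping $h^{*}$ to that interval gives the constrained minimiser of each univariate problem. The sketch applies this to the nondecreasing monotonicity constraints only (the convexity constraints are omitted). The constant starting value is one possible crude initial estimate, and the randomly placed step-shaped vectors $b$ are a hypothetical choice, not the shapes used in the study; all function names are likewise hypothetical.

```python
import numpy as np


def best_step(y, yhat, b, m):
    """Minimise S(h) = sum (y - yhat - h*b)^2 subject to yhat + h*b being
    nondecreasing within each season (yhat_{i+m} >= yhat_i)."""
    bb = float(b @ b)
    if bb == 0.0:
        return 0.0
    h = float(b @ (y - yhat)) / bb   # unconstrained minimiser of S(h)
    a = yhat[m:] - yhat[:-m]         # current slack in each constraint
    c = b[m:] - b[:-m]               # change of the slack per unit of h
    lo, hi = -np.inf, np.inf
    if np.any(c > 0):
        lo = np.max(-a[c > 0] / c[c > 0])   # a + h*c >= 0 gives h >= -a/c
    if np.any(c < 0):
        hi = np.min(-a[c < 0] / c[c < 0])   # and h <= -a/c when c < 0
    return float(np.clip(h, lo, hi))


def fit_monotone_seasonal(y, m, n_iter=20000, seed=0):
    """Sequence of univariate updates yhat <- yhat + h*b with random shapes b."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    idx = np.arange(len(y))
    yhat = np.full(len(y), y.mean())  # crude, feasible initial estimate
    for _ in range(n_iter):
        season = rng.integers(m)
        start = rng.integers(len(y))
        # Hypothetical shape: shift one season from a random time point onwards.
        b = ((idx % m == season) & (idx >= start)).astype(float)
        yhat = yhat + best_step(y, yhat, b, m) * b
    return yhat


t = np.arange(12)
y = 0.5 * t + np.tile([0.0, 2.0, 1.0], 4)
print(np.round(fit_monotone_seasonal(y, m=3), 3))
```

With these step shapes each update changes a single increment between consecutive years of one season, so the loop behaves like randomised coordinate descent and never leaves the feasible set; the many cheap iterations it needs also hint at why computational burden is listed as the main drawback of the Monte Carlo approach.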

[Figure (Method I): Response surface satisfying monotonicity and convexity constraints; axes: year, month, fitted Tot-N concentration (mg/l)]

Method II

Consider $y$ satisfying $y = m(x)$, where $x = (x_1, x_2, \dots, x_m)$ denotes a vector of $m$ explanatory variables, and $m(x)$ is assumed to be monotone in $x$ (nondecreasing or nonincreasing).

Nondecreasing case: let $Z_x$ be an initial estimate of $m(x)$, which could be the data itself, and consider

$M_1(x) = \min\{Z_{x'} : x' \ge x\}$

and

$M_2(x) = \max\{Z_{x'} : x' \le x\}.$
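For a finite set of design points, $M_1$ and $M_2$ can be computed by brute force over all pairs, reading $x' \ge x$ componentwise when $x$ is a vector. This sketch assumes one value $Z_x$ per observation (so $Z$ may simply be the data), uses $O(n^2 m)$ memory, and `monotone_envelopes` is a hypothetical name.

```python
import numpy as np


def monotone_envelopes(X, z):
    """Lower (M1) and upper (M2) nondecreasing envelopes of z at each point of X.

    M1(x_j) = min{ z_i : x_i >= x_j componentwise }
    M2(x_j) = max{ z_i : x_i <= x_j componentwise }
    """
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]          # a single explanatory variable
    z = np.asarray(z, dtype=float)
    # ge[j, i] is True when x_i >= x_j in every coordinate (i = j included).
    ge = np.all(X[None, :, :] >= X[:, None, :], axis=2)
    le = np.all(X[None, :, :] <= X[:, None, :], axis=2)
    m1 = np.where(ge, z[None, :], np.inf).min(axis=1)
    m2 = np.where(le, z[None, :], -np.inf).max(axis=1)
    return m1, m2


m1, m2 = monotone_envelopes([1, 2, 3, 4], [1, 3, 2, 4])
print(m1.tolist(), m2.tolist())  # [1.0, 2.0, 2.0, 4.0] [1.0, 3.0, 3.0, 4.0]
```

Because each point is compared with itself, $M_1(x) \le Z_x \le M_2(x)$ always holds, which is why the two envelopes act as lower and upper limits for the data.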

[Figure (Method II): Tot-N concentrations (mg/l) versus year, 1984-2002, with the lower limit (M1 values) and the upper limit (M2 values)]

For $0 \le \alpha \le 1$, each estimator in the set

$M_\alpha(x) = (1 - \alpha) M_1(x) + \alpha M_2(x)$

is nondecreasing in $x$, and these estimators work well for light-tailed error distributions (Strand, 2003; Mukerjee & Stern, 1994).

The estimate of $\alpha$ is the value that minimises

$\sum_x [Z_x - M_\alpha(x)]^2,$

which is

$\hat{\alpha} = \dfrac{\sum_x [Z_x - M_1(x)][M_2(x) - M_1(x)]}{\sum_x [M_2(x) - M_1(x)]^2}.$
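In one dimension, with observations sorted by $x$ and no tied $x$ values (assumptions of this sketch), $M_1$ is a reversed running minimum and $M_2$ a running maximum of $Z$, and $\hat{\alpha}$ follows from the closed-form expression. Since $M_1(x) \le Z_x \le M_2(x)$, each numerator term lies between 0 and the corresponding denominator term, so $\hat{\alpha}$ automatically falls in $[0, 1]$. The function name `simple_average_fit` is hypothetical.

```python
import numpy as np


def simple_average_fit(z):
    """Blend of the lower and upper envelopes for a 1-D series sorted by x."""
    z = np.asarray(z, dtype=float)
    m1 = np.minimum.accumulate(z[::-1])[::-1]   # M1(x) = min over x' >= x
    m2 = np.maximum.accumulate(z)               # M2(x) = max over x' <= x
    gap = m2 - m1
    denom = float(np.sum(gap ** 2))
    # Closed-form minimiser of sum (z - (1 - alpha) M1 - alpha M2)^2;
    # if the data are already monotone, M1 = M2 = z and any alpha works.
    alpha = float(np.sum((z - m1) * gap)) / denom if denom > 0 else 0.0
    return (1 - alpha) * m1 + alpha * m2, alpha


fit, alpha = simple_average_fit([1, 3, 2, 4])
print(alpha)          # 0.5
print(fit.tolist())   # [1.0, 2.5, 2.5, 4.0]
```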


Nonincreasing case: the same steps are used to create estimates based on $(X_i, -Y_i)$ instead of $(X_i, Y_i)$, and the signs of the estimates are changed back at the end to obtain the nonincreasing function.

Seasonality was handled by defining two monotone functions with respect to the seasons having high and low concentration values.
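The sign-flip device for the nonincreasing case amounts to a thin wrapper around any nondecreasing fitter. In this sketch the nondecreasing fitter is the one-dimensional $M_1$/$M_2$ blend (observations sorted by $x$, no ties assumed); both function names are hypothetical.

```python
import numpy as np


def blend_nondecreasing(z):
    """Nondecreasing blend of lower and upper envelopes (1-D, sorted by x)."""
    z = np.asarray(z, dtype=float)
    m1 = np.minimum.accumulate(z[::-1])[::-1]   # lower envelope M1
    m2 = np.maximum.accumulate(z)               # upper envelope M2
    gap = m2 - m1
    denom = np.sum(gap ** 2)
    alpha = np.sum((z - m1) * gap) / denom if denom > 0 else 0.0
    return (1 - alpha) * m1 + alpha * m2


def blend_nonincreasing(z):
    """Fit (X_i, -Y_i) with the nondecreasing procedure, then flip the sign back."""
    return -blend_nondecreasing(-np.asarray(z, dtype=float))


print(blend_nonincreasing([4, 2, 3, 1]).tolist())  # [4.0, 2.5, 2.5, 1.0]
```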


[Figure (Method II): Observed response together with function 1 and function 2, marking the maximum and the minimum]

[Figure (Method II): Monthly mean Tot-N concentration (mg/l) plotted against year and month]

[Figure (Method II): Fitted monthly mean concentrations of total nitrogen at Brunsbüttel in the Elbe River; fitted monthly mean Tot-N concentration (mg/l) based on the observed data, plotted against year and month]

[Figure (Method II): Smoothed monthly mean concentrations of total nitrogen at Brunsbüttel in the Elbe River; lightly smoothed monthly mean Tot-N concentration (mg/l), bandwidth 0.05, plotted against year and month]

[Figure (Method II): Smoothed monthly mean concentrations of total nitrogen at Brunsbüttel in the Elbe River; strongly smoothed monthly mean Tot-N concentration (mg/l), bandwidth 0.3, plotted against year and month]

Method II: Simple averaging techniques in the multidimensional case

[Figure: Smoothed Tot-N transport (kton/month), February values 1985-2000, plotted against year and water discharge level (10^9 m^3/month)]

[Figure (Method II, multidimensional case): Fitted Tot-N transport (kton/month), February values 1985-2000, plotted against year and water discharge level (10^9 m^3/month)]

Results

The two algorithms performed satisfactorily on water quality data from the Elbe River and other rivers.

Regardless of the features of the data sets that were examined, the obtained sequences of fitted surfaces converged to a function that could be interpreted as a sum of trend and seasonal components.

The components representing irregular variation provided a good starting point for the detection of outliers.

The major drawback of the Monte Carlo algorithm was the computational burden.

Simple averaging techniques are efficient and work well for initial estimates that have light-tailed error distributions.

Simple averaging techniques are sensitive to outliers and can have problems with sparse data.

For monotone functions, an alternative to using a large bandwidth is to use a slightly smaller bandwidth and then improve the accuracy by making the estimates monotone.
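The last point can be sketched as a two-stage procedure: a light kernel smoothing followed by a monotone correction with PAV. The Gaussian Nadaraya-Watson smoother, the bandwidth used in the example, the synthetic data, and the function names are assumptions made for illustration; the slides do not specify which smoother was used.

```python
import numpy as np


def kernel_smooth(x, y, bandwidth):
    """Nadaraya-Watson estimate at each x with a Gaussian kernel (assumed choice)."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    k = np.exp(-0.5 * ((x[:, None] - x[None, :]) / bandwidth) ** 2)
    return (k @ y) / k.sum(axis=1)


def pav(y):
    """Nondecreasing least-squares fit of a sequence (pool-adjacent-violators)."""
    blocks = []  # [sum of pooled values, count]
    for v in y:
        blocks.append([float(v), 1])
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            s2, c2 = blocks.pop()
            s1, c1 = blocks.pop()
            blocks.append([s1 + s2, c1 + c2])
    return np.array([s / c for s, c in blocks for _ in range(c)])


def smooth_then_monotone(x, y, bandwidth):
    """Lightly smooth y against x, then make the smoothed values nondecreasing in x."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    order = np.argsort(x)
    smoothed = kernel_smooth(x[order], y[order], bandwidth)
    fit = np.empty_like(smoothed)
    fit[order] = pav(smoothed)   # map sorted results back to the input order
    return fit


x = np.linspace(0.0, 1.0, 50)
y = x + 0.2 * np.sin(25.0 * x)   # a rising trend with a superimposed wiggle
fit = smooth_then_monotone(x, y, bandwidth=0.05)
```

The small bandwidth keeps the smoother close to the data, and the PAV step removes the remaining local decreases instead of widening the bandwidth until they disappear.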

Conclusions

It is possible to combine non-parametric procedures with very natural constraints on the trend and seasonal components of time series of environmental data.

The proposed procedures are so generally applicable that they can form the basis of fully automatic systems for quality assessment and decomposition of time series of environmental data.

Applications involving several explanatory variables or sparse data sets require further methodological work.
