time series in r: forecasting and visualisation · ggplot2 package (for graphics) fma package (for...

Post on 24-May-2020

8 Views

Category:

Documents

0 Downloads

Preview:

Click to see full reader

TRANSCRIPT

1

Time Series in R:Forecasting andVisualisation

Time series in R

29 May 2017

Outline

1 ts objects

2 Time plots

3 Lab session 1

4 Seasonal plots

5 Seasonal or cyclic?

6 Lag plots and autocorrelation

7 Lab session 2

2

Time series

Time series consist of sequences of observationscollected over time.We will assume the time periods are equallyspaced.

Time series examplesDaily IBM stock pricesMonthly rainfallAnnual Google profitsQuarterly Australian beer production

3

ts objects and ts function

A time series is stored in a ts object in R:a list of numbersinformation about times those numbers were recorded.

Example

Year Observation

2012 1232013 392014 782015 522016 110

y <- ts(c(123,39,78,52,110), start=2012)

4

ts objects and ts function

For observations that are more frequent than onceper year, add a frequency argument.E.g., monthly data stored as a numerical vector z:

y <- ts(z, frequency=12, start=c(2003, 1))

5

ts objects and ts function

ts(data, frequency, start)

Type of data frequency start example

Annual 1 1995Quarterly 4 c(1995,2)Monthly 12 c(1995,9)Daily 7 or 365.25 1 or c(1995,234)Weekly 52.18 c(1995,23)Hourly 24 or 168 or 8,766 1Half-hourly 48 or 336 or 17,532 1

6

ts objects

Class: “ts”Print and plotting methods available.

ausgdp

## Qtr1 Qtr2 Qtr3 Qtr4## 1971 4612 4651## 1972 4645 4615 4645 4722## 1973 4780 4830 4887 4933## 1974 4921 4875 4867 4905## 1975 4938 4934 4942 4979## 1976 5028 5079 5112 5127## 1977 5130 5101 5072 5069## 1978 5100 5166 5244 5312## 1979 5349 5370 5388 5396## 1980 5388 5403 5442 5482## 1981 5506 5531 5560 5583## 1982 5568 5524 5452 5358## 1983 5303 5320 5408 5531## 1984 5624 5669 5697 5736## 1985 5811 5894 5952 5965## 1986 5943 5924 5935 5979## 1987 6035 6097 6167 6227## 1988 6256 6272 6295 6345## 1989 6413 6468 6497 6511## 1990 6514 6512 6490 6442## 1991 6390 6346 6328 6340## 1992 6362 6389 6433 6491## 1993 6541 6566 6602 6671## 1994 6765 6847 6890 6918## 1995 6962 7018 7083 7134## 1996 7173 7212 7242 7276## 1997 7332 7400 7478 7550## 1998 7618

7

ts objects

start(ausgdp)

## [1] 1971 3

end(ausgdp)

## [1] 1998 1

frequency(ausgdp)

## [1] 4

8

ts objects

Residential electricity sales

elecsales

## Time Series:## Start = 1989## End = 2008## Frequency = 1## [1] 2354 2380 2319 2469 2386 2569 2576 2763 2844## [10] 3001 3108 3358 3076 3181 3222 3176 3431 3527## [19] 3638 3655

9

ts objects

start(elecsales)

## [1] 1989 1

end(elecsales)

## [1] 2008 1

frequency(elecsales)

## [1] 1

10

fpp2

Main package used in this course> library(fpp2)

This loads:

some data for use in examples and exercisesforecast package (for forecasting functions)ggplot2 package (for graphics)fma package (for lots of time series data)expsmooth package (for more time series data)

11

Outline

1 ts objects

2 Time plots

3 Lab session 1

4 Seasonal plots

5 Seasonal or cyclic?

6 Lag plots and autocorrelation

7 Lab session 2

12

ts objects

autoplot(ausgdp)

5000

6000

7000

1975 1980 1985 1990 1995

Time

ausg

dp

13

Time plots

autoplot(a10) + ylab("$ million") + xlab("Year") +ggtitle("Antidiabetic drug sales")

10

20

30

1995 2000 2005

Year

$ m

illio

n

Antidiabetic drug sales

14

Outline

1 ts objects

2 Time plots

3 Lab session 1

4 Seasonal plots

5 Seasonal or cyclic?

6 Lag plots and autocorrelation

7 Lab session 2

15

Lab Session 1

16

Outline

1 ts objects

2 Time plots

3 Lab session 1

4 Seasonal plots

5 Seasonal or cyclic?

6 Lag plots and autocorrelation

7 Lab session 2

17

Time plot

autoplot(a10) + ylab("$ million") + xlab("Year") +ggtitle("Antidiabetic drug sales")

10

20

30

1995 2000 2005

Year

$ m

illio

n

Antidiabetic drug sales

18

Seasonal plot

ggseasonplot(a10, year.labels=TRUE,year.labels.left=TRUE) +ylab("$ million") +ggtitle("Seasonal plot: antidiabetic drug sales")

1991 19911992 1992199319931994 19941995 1995

1996 199619971997

1998199819991999

2000 2000

20012001

20022002

2003 20032004

20042005 2005

2006 2006

2007

2007

2008

2008

10

20

30

Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec

Month

$ m

illio

n

Seasonal plot: antidiabetic drug sales

19

Seasonal polar plotsggseasonplot(a10, polar=TRUE) + ylab("$ million")

Feb

Mar

Apr

May

Jun

Jul

Aug

Sep

Oct

Nov

Dec

Jan/

10

20

Month

$ m

illio

n

year1991

1992

1993

1994

1995

1996

1997

1998

1999

2000

2001

2002

2003

2004

2005

2006

2007

2008

Seasonal plot: a10

20

Seasonal subseries plots

ggsubseriesplot(a10) + ylab("$ million") +ggtitle("Subseries plot: antidiabetic drug sales")

10

20

30

Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec

Month

$ m

illio

n

Subseries plot: antidiabetic drug sales

21

Quarterly Australian Beer Production

beer <- window(ausbeer,start=1992)autoplot(beer)

400

450

500

1995 2000 2005 2010

Time

beer

22

Quarterly Australian Beer Production

ggseasonplot(beer,year.labels=TRUE)

1992

1993

19941995

1996

199719981999

20002001

2002

2003

2004

20052006

2007

20082009

2010

400

450

500

Q1 Q2 Q3 Q4

Quarter

Seasonal plot: beer

23

Quarterly Australian Beer Production

ggsubseriesplot(beer)

400

450

500

Q1 Q2 Q3 Q4

Quarter

beer

24

Outline

1 ts objects

2 Time plots

3 Lab session 1

4 Seasonal plots

5 Seasonal or cyclic?

6 Lag plots and autocorrelation

7 Lab session 2

25

Time series patterns

Trend pattern exists when there is a long-termincrease or decrease in the data.

Seasonal pattern exists when a series is influencedby seasonal factors (e.g., the quarter ofthe year, the month, or day of the week).

Cyclic pattern exists when data exhibit rises andfalls that are not of fixed period (durationusually of at least 2 years).

26

Time series patterns

autoplot(window(elec, start=1980)) +ggtitle("Australian electricity production") +xlab("Year") + ylab("GWh")

8000

10000

12000

14000

1980 1985 1990 1995

Year

GW

h

Australian electricity production

27

Time series patterns

autoplot(bricksq) +ggtitle("Australian clay brick production") +xlab("Year") + ylab("million units")

200

300

400

500

600

1960 1970 1980 1990

Year

mill

ion

units

Australian clay brick production

28

Time series patterns

autoplot(ustreas) +ggtitle("US Treasury Bill Contracts") +xlab("Day") + ylab("price")

86

88

90

0 20 40 60 80 100

Day

pric

e

US Treasury Bill Contracts

29

Time series patterns

autoplot(lynx) +ggtitle("Annual Canadian Lynx Trappings") +xlab("Year") + ylab("Number trapped")

0

2000

4000

6000

1820 1840 1860 1880 1900 1920

Year

Num

ber

trap

ped

Annual Canadian Lynx Trappings

30

Seasonal or cyclic?

Differences between seasonal and cyclic patterns:seasonal pattern constant length; cyclic patternvariable lengthaverage length of cycle longer than length ofseasonal patternmagnitude of cycle more variable thanmagnitude of seasonal pattern

The timing of peaks and troughs is predictable withseasonal data, but unpredictable in the long termwith cyclic data.

31

Seasonal or cyclic?

Differences between seasonal and cyclic patterns:seasonal pattern constant length; cyclic patternvariable lengthaverage length of cycle longer than length ofseasonal patternmagnitude of cycle more variable thanmagnitude of seasonal pattern

The timing of peaks and troughs is predictable withseasonal data, but unpredictable in the long termwith cyclic data.

31

Outline

1 ts objects

2 Time plots

3 Lab session 1

4 Seasonal plots

5 Seasonal or cyclic?

6 Lag plots and autocorrelation

7 Lab session 2

32

Example: Beer production

beer <- window(ausbeer, start=1992)gglagplot(beer)

33

Example: Beer production

lag 7 lag 8 lag 9

lag 4 lag 5 lag 6

lag 1 lag 2 lag 3

400 450 500 400 450 500 400 450 500

400

450

500

400

450

500

400

450

500

Quarter1

2

3

4

34

Lagged scatterplots

Each graph shows yt plotted against yt−k fordifferent values of k.The autocorrelations are the correlationsassociated with these scatterplots.

35

Autocorrelation

Results for first 9 lags for beer data:

r1 r2 r3 r4 r5 r6 r7 r8 r9

-0.102 -0.657 -0.060 0.869 -0.089 -0.635 -0.054 0.832 -0.108

ggAcf(beer)

−0.5

0.0

0.5

4 8 12 16

Lag

AC

F

Series: beer

36

Autocorrelation

r4 higher than for the other lags. This is due tothe seasonal pattern in the data: the peakstend to be 4 quarters apart and the troughs tendto be 2 quarters apart.r2 is more negative than for the other lagsbecause troughs tend to be 2 quarters behindpeaks.Together, the autocorrelations at lags 1, 2, . . . ,make up the autocorrelation or ACF.The plot is known as a correlogram

37

Trend and seasonality in ACF plots

When data have a trend, the autocorrelations forsmall lags tend to be large and positive.When data are seasonal, the autocorrelationswill be larger at the seasonal lags (i.e., atmultiples of the seasonal frequency)When data are trended and seasonal, you see acombination of these effects.

38

Aus monthly electricity production

elec2 <- window(elec, start=1980)autoplot(elec2)

8000

10000

12000

14000

1980 1985 1990 1995

Time

elec

2

39

Aus monthly electricity production

ggAcf(elec2, lag.max=48)

0.00

0.25

0.50

0.75

0 12 24 36 48

Lag

AC

F

Series: elec2

40

Google stock price

autoplot(goog)

400

500

600

700

800

0 200 400 600 800 1000

Time

goog

41

Google stock price

ggAcf(goog, lag.max=100)

0.00

0.25

0.50

0.75

1.00

0 20 40 60 80 100

Lag

AC

F

Series: goog

42

Which is which?

40

60

80

0 20 40 60

chir

ps p

er m

inut

e

1. Daily temperature of cow

7

8

9

10

11

1974 1976 1978

thou

sand

s

2. Monthly accidental deaths

200

400

600

1950 1952 1954 1956 1958 1960

thou

sand

s

3. Monthly air passengers

30

60

90

1860 1880 1900

thou

sand

s

4. Annual mink trappings

0.0

0.5

1.0

12 246 18

AC

F

A

0.0

0.5

1.0

5 10 15

AC

F

B

0.0

0.5

1.0

5 10 15

AC

F

C

0.0

0.5

1.0

12 246 18

AC

F

D

43

Outline

1 ts objects

2 Time plots

3 Lab session 1

4 Seasonal plots

5 Seasonal or cyclic?

6 Lag plots and autocorrelation

7 Lab session 2

44

Lab Session 2

45

top related