Visualizing and forecasting big time series data

Rob J Hyndman

Outline

1 Examples of biggish time series

2 Time series visualisation

3 BLUF: Best Linear Unbiased Forecasts

4 Application: Australian tourism

5 Fast computation tricks

6 hts package for R

7 References

Examples of biggish time series

1. Australian tourism demand

Quarterly data on visitor nights, 1998:Q1 – 2013:Q4
From: National Visitor Survey, based on annual interviews of 120,000 Australians aged 15+, collected by Tourism Research Australia.
Split by 7 states, 27 zones and 76 regions (a geographical hierarchy)
Also split by purpose of travel:
  Holiday
  Visiting friends and relatives (VFR)
  Business
  Other

304 bottom-level series

2. Labour market participation

Australia and New Zealand Standard Classification of Occupations

8 major groups
  43 sub-major groups
    97 minor groups
      359 unit groups
        1023 occupations

Example: statistician
2 Professionals
  22 Business, Human Resource and Marketing Professionals
    224 Information and Organisation Professionals
      2241 Actuaries, Mathematicians and Statisticians
        224113 Statistician


3. Spectacle sales

Monthly UK sales data from 2000 – 2014
Provided by a large spectacle manufacturer
Split by brand (26), gender (3), price range (6), materials (4), and stores (600)
About 1 million bottom-level series

Time series visualisation

Kite diagrams

Line graph profile
Duplicate and flip around the horizontal axis
Fill the colour

Kite diagrams: Victorian tourism

[Figure: kite diagrams of Victorian tourism, shown unscaled (Victoria) and scaled (Victoria: scaled).]

An STL decomposition

[Figure: STL decomposition of tourism demand for holidays in Peninsula; x-axis: time, 2000 to 2010.]

Seasonal stacked bar chart

Place positive values above the origin and negative values below it
Map the bar length to the magnitude
Encode quarters by colours

Seasonal stacked bar chart: VIC

[Figure: seasonal stacked bar chart for Holiday travel; x-axis: Victorian regions, BAA to BEG.]

Corrgram of remainder

Compute the correlations among the remainder components

Render both the sign and magnitude using a colour mapping of two hues

Order variables according to the first principal component of the correlations.

Corrgram of remainder: VIC

[Figure: corrgram of the remainder components for the Victorian region-by-purpose series (for example BEEHol, BEFOth, BEABus); colour scale from −1 upwards.]

Corrgram of remainder: TAS

[Figure: corrgram of the remainder components for the Tasmanian region-by-purpose series (FAA, FBA, FBB, FCA, FCB by Hol, Vis, Bus, Oth); colour scale from −1 upwards.]

Feature analysis

Summarize each time series with a feature vector:

  strength of trend
  lumpiness (variance of annual variances of remainder)
  strength of seasonality
  size of seasonal peak
  size of seasonal trough
  ACF1
  linearity of trend
  curvature of trend
  spectral entropy

Do PCA on feature matrix
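A minimal base-R sketch of the feature-then-PCA idea, on simulated quarterly series. The helper `ts_features()` is hypothetical, it covers only five of the listed features, and the strength measures use the common 1 − Var(remainder)/Var(component + remainder) form, which is an assumption rather than something stated on the slide.

```r
# Hypothetical helper: a few features from an STL decomposition of one series.
ts_features <- function(x) {
  comp <- stl(x, s.window = "periodic")$time.series
  seasonal  <- comp[, "seasonal"]
  trend     <- comp[, "trend"]
  remainder <- comp[, "remainder"]
  c(
    # Assumed definitions of "strength": 1 - Var(R) / Var(component + R)
    trend  = max(0, 1 - var(remainder) / var(trend + remainder)),
    season = max(0, 1 - var(remainder) / var(seasonal + remainder)),
    peak   = max(seasonal),   # size of seasonal peak
    trough = min(seasonal),   # size of seasonal trough
    acf1   = acf(x, lag.max = 1, plot = FALSE)$acf[2]  # lag-1 autocorrelation
  )
}

# Simulated stand-ins for the bottom-level series (not the tourism data)
set.seed(1)
series <- lapply(1:20, function(i) {
  tt <- 1:64
  ts(0.1 * i * tt + (21 - i) * sin(2 * pi * tt / 4) + rnorm(64),
     frequency = 4, start = c(1998, 1))
})

# One row per series, one column per feature
features <- t(sapply(series, ts_features))

# PCA on the standardized feature matrix
pca <- prcomp(features, scale. = TRUE)
print(summary(pca)$importance["Proportion of Variance", 1:2])
```

Plotting `pca$x[, 1:2]` would then give one point per series, which is the kind of display the feature-analysis figures are based on.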

Feature analysis

[Figures: PCA biplot of the feature matrix, with points labelled by series number and loadings for season, entropy, fo.acf, peak, trough, linearity and curvature (standardized PC1 explains 43.2% of the variance); example series plotted against Time, 2000 to 2010; PC1 scores panelled by state (NSW, NT, QLD, SA, TAS, VIC, WA) and by purpose of travel (Bus, Hol, Oth, Vis).]

BLUF: Best Linear Unbiased Forecasts

Hierarchical time series

A hierarchical time series is a collection of several time series that are linked together in a hierarchical structure.

[Figure: tree with Total at the root, children A, B, C, and leaves AA, AB, AC, BA, BB, BC, CA, CB, CC.]

Examples
  Net labour turnover
  Tourism by state and region

$Y_t$ : observed aggregate of all series at time $t$.

$Y_{X,t}$ : observation on series X at time $t$.

$B_t$ : vector of all series at bottom level in time $t$.

$$
y_t = [Y_t, Y_{A,t}, Y_{B,t}, Y_{C,t}]' =
\underbrace{\begin{bmatrix} 1 & 1 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}}_{S}
\underbrace{\begin{bmatrix} Y_{A,t} \\ Y_{B,t} \\ Y_{C,t} \end{bmatrix}}_{B_t}
$$

$$y_t = S B_t$$

Hierarchical time series

[Figure: tree with Total at the root, children A, B, C, and leaves AX, AY, AZ, BX, BY, BZ, CX, CY, CZ.]

$$
y_t =
\begin{bmatrix}
Y_t \\ Y_{A,t} \\ Y_{B,t} \\ Y_{C,t} \\ Y_{AX,t} \\ Y_{AY,t} \\ Y_{AZ,t} \\ Y_{BX,t} \\ Y_{BY,t} \\ Y_{BZ,t} \\ Y_{CX,t} \\ Y_{CY,t} \\ Y_{CZ,t}
\end{bmatrix}
=
\underbrace{\begin{bmatrix}
1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\
1 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 1 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 1 & 1 \\
1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1
\end{bmatrix}}_{S}
\underbrace{\begin{bmatrix}
Y_{AX,t} \\ Y_{AY,t} \\ Y_{AZ,t} \\ Y_{BX,t} \\ Y_{BY,t} \\ Y_{BZ,t} \\ Y_{CX,t} \\ Y_{CY,t} \\ Y_{CZ,t}
\end{bmatrix}}_{B_t}
$$

$$y_t = S B_t$$
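As an illustration, the 13 × 9 summing matrix for this two-level hierarchy can be built in base R and checked against the identity that every series is S times the bottom-level vector. The bottom-level values in `b` are made up for the example.

```r
# Bottom-level series names; the first letter identifies the parent node
bottom <- c("AX", "AY", "AZ", "BX", "BY", "BZ", "CX", "CY", "CZ")
parent <- substr(bottom, 1, 1)

# Rows: Total, then A, B, C, then one identity row per bottom-level series
S <- rbind(
  Total = rep(1, length(bottom)),
  t(sapply(c("A", "B", "C"), function(g) as.numeric(parent == g))),
  diag(length(bottom))
)
rownames(S)[5:13] <- bottom
colnames(S) <- bottom

# Made-up bottom-level observations at a single time point
b <- c(5, 3, 2, 4, 1, 6, 2, 2, 7)

# All 13 series, aggregated consistently
y <- drop(S %*% b)
print(y)
```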

Forecasting notation

Let $\hat{y}_n(h)$ be the vector of initial h-step forecasts, made at time n, stacked in the same order as $y_t$. (They may not add up.)

Reconciled forecasts are of the form:

$$\tilde{y}_n(h) = SP\hat{y}_n(h)$$

for some matrix P.

P extracts and combines base forecasts $\hat{y}_n(h)$ to get bottom-level forecasts.

S adds them up.

General properties: bias

$$\tilde{y}_n(h) = SP\hat{y}_n(h)$$

Assume: base forecasts $\hat{y}_n(h)$ are unbiased:

$$E[\hat{y}_n(h) \mid y_1, \dots, y_n] = E[y_{n+h} \mid y_1, \dots, y_n]$$

Let $\hat{B}_n(h)$ be bottom level base forecasts with $\beta_n(h) = E[\hat{B}_n(h) \mid y_1, \dots, y_n]$.
Then $E[\hat{y}_n(h)] = S\beta_n(h)$.
We want the revised forecasts to be unbiased: $E[\tilde{y}_n(h)] = SPS\beta_n(h) = S\beta_n(h)$.

Revised forecasts are unbiased iff SPS = S.
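The condition SPS = S is easy to check numerically. The sketch below uses the small Total → A, B, C hierarchy and three candidate P matrices: bottom-up, OLS, and a top-down split with fixed equal proportions. The helper `is_unbiased()` is a name chosen for this example only.

```r
# Summing matrix for Total -> A, B, C (rows: Total, A, B, C)
S <- rbind(c(1, 1, 1), diag(3))

# Hypothetical helper: does P preserve unbiasedness, i.e. is SPS equal to S?
is_unbiased <- function(P, S) isTRUE(all.equal(S %*% P %*% S, S))

# Bottom-up: ignore the Total forecast and keep the three bottom-level forecasts
P_bu <- cbind(0, diag(3))

# OLS combination: P = (S'S)^{-1} S'
P_ols <- solve(t(S) %*% S) %*% t(S)

# Top-down with equal proportions: split the Total forecast into thirds
P_td <- cbind(rep(1 / 3, 3), matrix(0, 3, 3))

print(c(bottom_up = is_unbiased(P_bu, S),
        ols       = is_unbiased(P_ols, S),
        top_down  = is_unbiased(P_td, S)))
```

Bottom-up and OLS satisfy the condition because PS is the identity for both; the equal-proportions top-down P gives PS = J/3, so SPS differs from S.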

General properties: variance

$$\tilde{y}_n(h) = SP\hat{y}_n(h)$$

Let the variance of the base forecasts $\hat{y}_n(h)$ be given by

$$\Sigma_h = \mathrm{Var}[\hat{y}_n(h) \mid y_1, \dots, y_n]$$

Then the variance of the revised forecasts is given by

$$\mathrm{Var}[\tilde{y}_n(h) \mid y_1, \dots, y_n] = SP\Sigma_h P'S'.$$

BLUF via trace minimization

Theorem
For any P satisfying SPS = S,

$$\min_P \operatorname{trace}[SP\Sigma_h P'S']$$

has solution $P = (S'\Sigma_h^{\dagger}S)^{-1}S'\Sigma_h^{\dagger}$.

$\Sigma_h^{\dagger}$ is a generalized inverse of $\Sigma_h$.

$$\underbrace{\tilde{y}_n(h)}_{\text{Revised forecasts}} = S(S'\Sigma_h^{\dagger}S)^{-1}S'\Sigma_h^{\dagger}\,\underbrace{\hat{y}_n(h)}_{\text{Base forecasts}}$$

Equivalent to the GLS estimate of the regression $\hat{y}_n(h) = S\beta_n(h) + \varepsilon_h$ where $\varepsilon_h \sim N(0, \Sigma_h)$.

Problem: $\Sigma_h$ hard to estimate.

Optimal combination forecasts

Solution 1: OLS
Assume $\varepsilon_h \approx S\varepsilon_{B,h}$ where $\varepsilon_{B,h}$ is the forecast error at bottom level.

Then $\Sigma_h \approx S\Omega_h S'$ where $\Omega_h = \mathrm{Var}(\varepsilon_{B,h})$.

If the Moore-Penrose generalized inverse is used, then $(S'\Sigma_h^{\dagger}S)^{-1}S'\Sigma_h^{\dagger} = (S'S)^{-1}S'$.

$$\tilde{y}_n(h) = S(S'S)^{-1}S'\hat{y}_n(h)$$

Solution 2: WLS
Suppose we approximate $\Sigma_1$ by its diagonal, and let $\Lambda$ be the inverse of that diagonal matrix.
Easy to estimate, and places weight where we have best forecasts.
Empirically, it gives better forecasts than other available methods.

$$\tilde{y}_n(h) = S(S'\Lambda S)^{-1}S'\Lambda\hat{y}_n(h)$$
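A small numerical sketch of the OLS and WLS combinations on the Total → A, B, C hierarchy. The base forecasts and the one-step forecast variances are invented, and `reconcile()` is a hypothetical helper that forms S(S'ΛS)⁻¹S'Λ applied to the base forecasts, with Λ taken as the inverse of the diagonal variances (an assumption in line with the diagonal approximation above).

```r
# Summing matrix (rows: Total, A, B, C)
S <- rbind(c(1, 1, 1), diag(3))

# Invented base forecasts: Total = 100, but A + B + C = 95, so they do not add up
y_hat <- c(100, 30, 40, 25)

# Invented one-step forecast error variances, one per series
sigma2 <- c(16, 4, 9, 1)

# Hypothetical helper: y_tilde = S (S' Lambda S)^{-1} S' Lambda y_hat
# Lambda = identity gives the OLS combination
reconcile <- function(S, y_hat, Lambda = diag(nrow(S))) {
  P <- solve(t(S) %*% Lambda %*% S, t(S) %*% Lambda)
  drop(S %*% P %*% y_hat)
}

y_ols <- reconcile(S, y_hat)
y_wls <- reconcile(S, y_hat, Lambda = diag(1 / sigma2))

# Both sets of revised forecasts add up; WLS moves the noisier series more
print(rbind(base = y_hat, ols = y_ols, wls = y_wls))
```

Using `solve(A, B)` rather than forming the inverse explicitly is a small numerical-stability choice; it does not change the formula being illustrated.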

Challenges

Computational difficulties in big hierarchies due to size of the S matrix and singular behavior of $(S'\Lambda S)$.

Loss of information in ignoring covariance matrix in computing point forecasts.

Still need to estimate covariance matrix to produce prediction intervals.

$$\tilde{y}_n(h) = S(S'\Lambda S)^{-1}S'\Lambda\hat{y}_n(h)$$

Application: Australian tourism

Australian tourism

Hierarchy:
  States (7)
  Zones (27)
  Regions (82)

Base forecasts
ETS (exponential smoothing) models

[Figures: domestic tourism base forecasts, x-axis 1998 to 2008, for Total, NSW, VIC, Nth.Coast.NSW, Metro.QLD, Sth.WA, X201.Melbourne, X402.Murraylands and X809.Daly.]

Reconciled forecasts

[Figure: reconciled forecasts for selected series; x-axis: 2000 to 2010.]

Forecast evaluation

Select models using all observations;

Re-estimate models using first 12 observations and generate 1- to 8-step-ahead forecasts;

Increase sample size one observation at a time, re-estimate models, generate forecasts until the end of the sample;

In total 24 1-step-ahead, 23 2-steps-ahead, up to 17 8-steps-ahead for forecast evaluation.
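The rolling-origin scheme can be sketched in base R as follows. The series length of 36 is an assumption inferred from the stated counts (24 one-step forecasts from an initial sample of 12), the data are simulated, and a naive forecast stands in for the re-estimated ETS models.

```r
n_obs  <- 36   # assumed series length, inferred from the forecast counts
n_init <- 12   # size of the first estimation sample
max_h  <- 8    # longest forecast horizon

# Simulated series standing in for one tourism series
set.seed(42)
y <- 50 + cumsum(rnorm(n_obs))

# One row per forecast origin, one column per horizon
errors <- matrix(NA_real_, nrow = n_obs - n_init, ncol = max_h)

for (origin in n_init:(n_obs - 1)) {
  # Horizons that still have an observation to compare against
  h_avail <- seq_len(min(max_h, n_obs - origin))
  # Naive forecast (last observed value) as a placeholder for the refitted model
  fc <- rep(y[origin], length(h_avail))
  errors[origin - n_init + 1, h_avail] <- y[origin + h_avail] - fc
}

# Number of forecasts available for evaluation at each horizon
n_forecasts <- colSums(!is.na(errors))
print(n_forecasts)

# Mean absolute error by horizon
print(colMeans(abs(errors), na.rm = TRUE))
```

With these settings the counts fall from 24 at h = 1 to 17 at h = 8, matching the numbers quoted above.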

Forecast evaluation

Hierarchy: states, zones, regions

MAPE                     h = 1   h = 2   h = 4   h = 6   h = 8   Average
Top Level: Australia
  Bottom-up               3.79    3.58    4.01    4.55    4.24      4.06
  OLS                     3.83    3.66    3.88    4.19    4.25      3.94
  WLS                     3.68    3.56    3.97    4.57    4.25      4.04
Level: States
  Bottom-up              10.70   10.52   10.85   11.46   11.27     11.03
  OLS                    11.07   10.58   11.13   11.62   12.21     11.35
  WLS                    10.44   10.17   10.47   10.97   10.98     10.67
Level: Zones
  Bottom-up              14.99   14.97   14.98   15.69   15.65     15.32
  OLS                    15.16   15.06   15.27   15.74   16.15     15.48
  WLS                    14.63   14.62   14.68   15.17   15.25     14.94
Bottom Level: Regions
  Bottom-up              33.12   32.54   32.26   33.74   33.96     33.18
  OLS                    35.89   33.86   34.26   36.06   37.49     35.43
  WLS                    31.68   31.22   31.08   32.41   32.77     31.89

Fast computation tricks

Fast computation: hierarchical data

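[Figure: tree with Total at the root, children A, B, C, and leaves AX, AY, AZ, BX, BY, BZ, CX, CY, CZ.]

$$
y_t =
\underbrace{\begin{bmatrix}
1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\
1 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 1 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 1 & 1 \\
1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1
\end{bmatrix}}_{S} B_t
$$

$$y_t = S B_t$$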

Fast computation: hierarchical data

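[Figure: the same tree, with the series now ordered subtree by subtree.]

$$
y_t =
\begin{bmatrix}
Y_t \\ Y_{A,t} \\ Y_{AX,t} \\ Y_{AY,t} \\ Y_{AZ,t} \\ Y_{B,t} \\ Y_{BX,t} \\ Y_{BY,t} \\ Y_{BZ,t} \\ Y_{C,t} \\ Y_{CX,t} \\ Y_{CY,t} \\ Y_{CZ,t}
\end{bmatrix}
=
\underbrace{\begin{bmatrix}
1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\
1 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 1 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 1 & 1 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1
\end{bmatrix}}_{S} B_t
$$

$$y_t = S B_t$$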

Fast computation: hierarchies

Think of the hierarchy as a tree of trees:

[Figure: Total at the root, with subtrees $T_1, T_2, \dots, T_K$ beneath it.]

Then the summing matrix contains K smaller summing matrices:

$$
S = \begin{bmatrix}
1'_{n_1} & 1'_{n_2} & \cdots & 1'_{n_K} \\
S_1 & 0 & \cdots & 0 \\
0 & S_2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & S_K
\end{bmatrix}
$$

where $1_n$ is an n-vector of ones and tree $T_i$ has $n_i$ terminal nodes.
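To make the block structure concrete, the sketch below assembles the 13 × 9 summing matrix from three one-level subtrees. The helpers `leaf_tree()`, `block_diag()` and `combine_trees()` are names invented for this example; they are not functions from the hts package.

```r
# Summing matrix of a subtree with one parent and n leaves:
# a row of ones on top of an identity matrix
leaf_tree <- function(n) rbind(rep(1, n), diag(n))

# Place a list of matrices along the diagonal of a zero matrix
block_diag <- function(mats) {
  out <- matrix(0, sum(sapply(mats, nrow)), sum(sapply(mats, ncol)))
  r <- 0
  k <- 0
  for (m in mats) {
    out[r + seq_len(nrow(m)), k + seq_len(ncol(m))] <- m
    r <- r + nrow(m)
    k <- k + ncol(m)
  }
  out
}

# Join K subtrees under a new root: a row of ones over blockdiag(S_1, ..., S_K)
combine_trees <- function(mats) {
  rbind(rep(1, sum(sapply(mats, ncol))), block_diag(mats))
}

# Total -> A, B, C, each with three leaves; rows come out subtree by subtree
S <- combine_trees(list(leaf_tree(3), leaf_tree(3), leaf_tree(3)))
print(S)
```

Because `combine_trees()` accepts any list of summing matrices, it can be applied again to its own output to build deeper hierarchies.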

$$
S'\Lambda S =
\begin{bmatrix}
S_1'\Lambda_1 S_1 & 0 & \cdots & 0 \\
0 & S_2'\Lambda_2 S_2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & S_K'\Lambda_K S_K
\end{bmatrix}
+ \lambda_0 J_n
$$

$\lambda_0$ is the top left element of $\Lambda$;
$\Lambda_k$ is a block of $\Lambda$, corresponding to tree $T_k$;
$J_n$ is a matrix of ones;
$n = \sum_k n_k$.

Now apply the Sherman-Morrison formula . . .

$$
(S'\Lambda S)^{-1} =
\begin{bmatrix}
(S_1'\Lambda_1 S_1)^{-1} & 0 & \cdots & 0 \\
0 & (S_2'\Lambda_2 S_2)^{-1} & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & (S_K'\Lambda_K S_K)^{-1}
\end{bmatrix}
- c\,S_0
$$

$S_0$ can be partitioned into $K^2$ blocks, with the $(k, \ell)$ block (of dimension $n_k \times n_\ell$) being

$$(S_k'\Lambda_k S_k)^{-1} J_{n_k,n_\ell} (S_\ell'\Lambda_\ell S_\ell)^{-1}$$

$J_{n_k,n_\ell}$ is an $n_k \times n_\ell$ matrix of ones.

$$c^{-1} = \lambda_0^{-1} + \sum_k 1'_{n_k}(S_k'\Lambda_k S_k)^{-1} 1_{n_k}.$$

Each $S_k'\Lambda_k S_k$ can be inverted similarly.
$S'\Lambda y$ can also be computed recursively.

The recursive calculations can be done in such a way that we never store any of the large matrices involved.
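The block-wise inverse can be checked numerically on a small example. The sketch below uses three one-level subtrees with random positive weights (all values invented), inverts only the small per-subtree matrices, applies the Sherman-Morrison correction, and compares the result with a direct inverse of the full matrix. The full matrices are formed here only for that comparison; the helper names are illustrative.

```r
leaf_tree <- function(n) rbind(rep(1, n), diag(n))
block_diag <- function(mats) {
  out <- matrix(0, sum(sapply(mats, nrow)), sum(sapply(mats, ncol)))
  r <- 0
  k <- 0
  for (m in mats) {
    out[r + seq_len(nrow(m)), k + seq_len(ncol(m))] <- m
    r <- r + nrow(m)
    k <- k + ncol(m)
  }
  out
}

# Three subtrees, each with 3 leaves; invented positive weights
S_k <- list(leaf_tree(3), leaf_tree(3), leaf_tree(3))
set.seed(1)
lambda0  <- 0.5                                        # weight on the Total row
Lambda_k <- lapply(S_k, function(s) diag(runif(nrow(s), 0.5, 2)))

# Small inverses (S_k' Lambda_k S_k)^{-1}, one per subtree
A_inv <- Map(function(s, l) solve(t(s) %*% l %*% s), S_k, Lambda_k)

# u stacks (S_k' Lambda_k S_k)^{-1} 1, so the (k, l) block of S_0 is u_k u_l'
u <- unlist(lapply(A_inv, rowSums))

# c^{-1} = lambda0^{-1} + sum_k 1' (S_k' Lambda_k S_k)^{-1} 1
c_const <- 1 / (1 / lambda0 + sum(sapply(A_inv, sum)))

# Block-diagonal of small inverses minus the rank-one correction c * S_0
fast <- block_diag(A_inv) - c_const * tcrossprod(u)

# Direct inverse of the full S' Lambda S, for comparison only
S      <- rbind(rep(1, 9), block_diag(S_k))
Lambda <- block_diag(c(list(matrix(lambda0)), Lambda_k))
direct <- solve(t(S) %*% Lambda %*% S)

print(all.equal(fast, direct))
```

In a large hierarchy one would avoid forming `fast` as a dense matrix and instead apply the block inverses and the rank-one term directly to the vector S'Λy.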

Fast computation

A similar algorithm has been developed for grouped time series with two groups.
When the time series are not strictly hierarchical and have more than two grouping variables:

  Use sparse matrix storage and arithmetic.

  Use iterative approximation for inverting large sparse matrices.
    Paige & Saunders (1982), ACM Trans. Math. Software


hts package for R

hts: Hierarchical and grouped time series
Methods for analysing and forecasting hierarchical and grouped time series

Version: 4.5
Depends: forecast (≥ 5.0), SparseM
Imports: parallel, utils
Published: 2014-12-09
Author: Rob J Hyndman, Earo Wang and Alan Lee
Maintainer: Rob J Hyndman <Rob.Hyndman at monash.edu>
BugReports: https://github.com/robjhyndman/hts/issues
License: GPL (≥ 2)

Example using R

    library(hts)

    # bts is a matrix containing the bottom level time series
    # nodes describes the hierarchical structure
    y <- hts(bts, nodes=list(2, c(3,2)))

[Figure: the hierarchy described by nodes: Total, two children, with leaves AX, AY, AZ under the first child and two leaves under the second.]

    # Forecast 10-step-ahead using WLS combination method
    # ETS used for each series by default
    fc <- forecast(y, h=10)

forecast.gts function

Usage

    forecast(object, h,
      method = c("comb", "bu", "mo", "tdgsf", "tdgsa", "tdfp"),
      fmethod = c("ets", "rw", "arima"),
      weights = c("sd", "none", "nseries"),
      positive = FALSE,
      parallel = FALSE, num.cores = 2, ...)

Arguments

    object     Hierarchical time series object of class gts.
    h          Forecast horizon
    method     Method for distributing forecasts within the hierarchy.
    fmethod    Forecasting method to use
    positive   If TRUE, forecasts are forced to be strictly positive
    weights    Weights used for "optimal combination" method. When weights = "sd", it takes account of the standard deviation of forecasts.
    parallel   If TRUE, allow parallel processing
    num.cores  If parallel = TRUE, specify how many cores are going to be


References

RJ Hyndman, RA Ahmed, G Athanasopoulos, and HL Shang (2011). "Optimal combination forecasts for hierarchical time series". Computational Statistics & Data Analysis 55(9), 2579–2589.

RJ Hyndman, AJ Lee, and E Wang (2014). Fast computation of reconciled forecasts for hierarchical and grouped time series. Working paper 17/14. Department of Econometrics & Business Statistics, Monash University.

RJ Hyndman, AJ Lee, and E Wang (2014). hts: Hierarchical and grouped time series. cran.r-project.org/package=hts.

RJ Hyndman and G Athanasopoulos (2014). Forecasting: principles and practice. OTexts. OTexts.org/fpp/.

Papers and R code: robjhyndman.com

Email: Rob.Hyndman@monash.edu
