a large canadian database for macroeconomic analysis · a large canadian database for macroeconomic...

37
A Large Canadian Database for Macroeconomic Analysis * Olivier Fortin-Gagnon Maxime Leroux Dalibor Stevanovic § St´ ephane Surprenant This version: June 28, 2019 Abstract This paper describes a large-scale Canadian macroeconomic database in monthly fre- quency. The dataset contains hundreds of Canadian and provincial economic indicators. It is designed to be updated regularly and real-time vintages are publicly available. It relieves users to deal with data changes and methodological revisions. A balanced and stationary panel starting from 1981 is provided. We show five useful features of this dataset for macroeconomic research. First, the factor structure explains a sizeable part of variation in Canadian and provincial aggregate series. Second, the dataset is useful to capture turning points of the Canadian business cycle. Third, the dataset has substantial predictive power when forecasting key macroeconomic indicators. Fourth, the panel can be used to construct measures of macroeconomic uncertainty. Fifth, the dataset can serve for structural analysis through the factor-augmented VAR model. Keywords: Big Data, Factor Model, Forecasting, Structural Analysis. * The third author acknowledges financial support from the Fonds de recherche sur la soci´ et´ e et la culture (Qu´ ebec) and the Social Sciences and Humanities Research Council. Hugo Couture provided excellent research assistance. Desjardins epartement des sciences ´ economiques, Universit´ e du Qu´ ebec ` a Montr´ eal. 315, Ste-Catherine Est, Montr´ eal, QC, H2X 3X2 § Corresponding Author: [email protected]. epartement des sciences ´ economiques, Universit´ e du Qu´ ebec ` a Montr´ eal. 315, Ste-Catherine Est, Montr´ eal, QC, H2X 3X2 epartement des sciences ´ economiques, Universit´ e du Qu´ ebec ` a Montr´ eal. 315, Ste-Catherine Est, Montr´ eal, QC, H2X 3X2

Upload: others

Post on 01-Apr-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: A Large Canadian Database for Macroeconomic Analysis · A Large Canadian Database for Macroeconomic Analysis Olivier Fortin-Gagnony Maxime Lerouxz Dalibor Stevanovicx St ephane Surprenant{

A Large Canadian Database for Macroeconomic Analysis∗

Olivier Fortin-Gagnon† Maxime Leroux‡ Dalibor Stevanovic§

Stephane Surprenant¶

This version: June 28, 2019

Abstract

This paper describes a large-scale Canadian macroeconomic database in monthly fre-

quency. The dataset contains hundreds of Canadian and provincial economic indicators.

It is designed to be updated regularly and real-time vintages are publicly available. It

relieves users to deal with data changes and methodological revisions. A balanced and

stationary panel starting from 1981 is provided. We show five useful features of this

dataset for macroeconomic research. First, the factor structure explains a sizeable part

of variation in Canadian and provincial aggregate series. Second, the dataset is useful to

capture turning points of the Canadian business cycle. Third, the dataset has substantial

predictive power when forecasting key macroeconomic indicators. Fourth, the panel can

be used to construct measures of macroeconomic uncertainty. Fifth, the dataset can serve

for structural analysis through the factor-augmented VAR model.

Keywords: Big Data, Factor Model, Forecasting, Structural Analysis.

∗The third author acknowledges financial support from the Fonds de recherche sur la societe et la culture(Quebec) and the Social Sciences and Humanities Research Council. Hugo Couture provided excellent researchassistance.†Desjardins‡Departement des sciences economiques, Universite du Quebec a Montreal. 315, Ste-Catherine Est, Montreal,

QC, H2X 3X2§Corresponding Author: [email protected]. Departement des sciences economiques, Universite

du Quebec a Montreal. 315, Ste-Catherine Est, Montreal, QC, H2X 3X2¶Departement des sciences economiques, Universite du Quebec a Montreal. 315, Ste-Catherine Est, Montreal,

QC, H2X 3X2

Page 2: A Large Canadian Database for Macroeconomic Analysis · A Large Canadian Database for Macroeconomic Analysis Olivier Fortin-Gagnony Maxime Lerouxz Dalibor Stevanovicx St ephane Surprenant{

1 Introduction

Large datasets are now very popular in the empirical macroeconomic research. Stock and

Watson (2002a,b) have initiated the breakthrough by providing the econometric theory and

showing the benefits in terms of macroeconomic forecasting, while Bernanke et al. (2005) have

inspired the literature on impulse response analysis in the so-called data-rich environment.

Since then, many theoretical and empirical improvements have been made, see Stock and

Watson (2016) for a recent overview. Most of this literature is built on US datasets. Therefore,

McCracken and Ng (2016) proposed a standardized version of a large monthly US dataset that

is regularly updated and publicly available at the FRED (Federal Reserve Economic Data)

website. No such developments have been made with Canadian macroeconomic data, so the

objective of this work is to fill the gap and provide a user-friendly version of a large Canadian

dataset suitable for many types of macroeconomic research. Since Canada is a typical example

of a small open economy, this dataset will also be of interest for a wide range of applications

in international economics.

In this paper we construct a large-scale Canadian macroeconomic database in monthly

frequency and show how it can be useful for empirical macroeconomic analysis with several

illustrative examples. The dataset contains hundreds of Canadian and provincial raw economic

indicators observed from 1914. It is designed to be updated regularly in real-time through

StatCan database and is publicly available.1 It relieves users to deal with data changes and

methodological revisions. We provide a balanced and stationary panel starting from 1981 that

is suitable for work in business cycle fluctuations.

Early attempts to construct large Canadian macroeconomic datasets are Gosselin and Tkacz

(2001) and Galbraith and Tkacz (2007). Boivin et al. (2010) updated and merged data from

those previous studies yielding a panel that covered the period 1969 - 2008 and had 348 monthly

and 87 quarterly series. Then, Bedock and Stevanovic (2017) constructed a new dataset of 124

monthly variables observed from 1981 to 2012. The selection of series was based on Canadian

counterparts of US series in Bernanke et al. (2005) and constrained by availability of StatCan

tables. More recently, Sties (2017) has built a much smaller monthly dataset containing mostly

financial series and few real activity indicators. Stephen Gordon has also been updating some

relevant Canadian indicators2, while the Bank of Canada released its Staff Economic Projections

database3. Our data selection is mainly inspired by McCracken and Ng (2016) when it comes to

major groups of economic variables. Given that Canada is a small open economy, the dataset

contains many international trade, financial flows and natural ressource indicators.

1Data can be accessed here: http://www.stevanovic.uqam.ca/DS LCMD.html.2See Project Link at https://www.ecn.ulaval.ca/sgor.3Data are available here: https://www.bankofcanada.ca/rates/staff-economic-projections/. See also Cham-

pagne et al. (2018)

2

Page 3: A Large Canadian Database for Macroeconomic Analysis · A Large Canadian Database for Macroeconomic Analysis Olivier Fortin-Gagnony Maxime Lerouxz Dalibor Stevanovicx St ephane Surprenant{

We illustrate five useful features of this dataset for macroeconomic research. First, we show

that our panel is likely to present a factor structure and that common factors explain a sizeable

portion of variation in Canadian and provincial aggregate series. The principal component

analysis of the dataset identifies few driving forces of the Canadian economy such as housing

and bond markets, production, wages, exchange rates and energy prices. Second, the dataset is

useful to capture turning points of the Canadian business cycle. Using Probit, Lasso and factor

models we show that this dataset has substantial explanatory power in addition to the standard

term spread predictor. Third, the dataset has substantial predictive power when forecasting

key macroeconomic indicators. Factor models and regularized complete subset regressions

show good performance in forecasting real activity variables such as industrial production,

employment and unemployment rate, as well as CPI and Core CPI inflation. Gains are still

present, though less important, in case of the housing market. Fourth, the panel can be

used to construct measures of macroeconomic uncertainty. Following Jurado et al. (2015), we

estimate the Canadian macroeconomic uncertainty. Our measure suggests only two significant

episodes of uncertainty in Canada: 1981 and 2009 recessions. Fifth, the dataset can serve for

the structural impulse response analysis. We identify and estimate the effects of uncertainty

shocks using the factor-augmented VAR methodology a la Boivin et al. (2018) with our measure

of macroeconomic uncertainty. An exogenous increase in uncertainty is associated with a decline

in real activity and financial markets. The findings are in line with the literature using US data.

The rest of the paper is organized as follows. Section 2 describes the construction of datasets

and performs the factor analysis. Section 3 shows the informational content of this dataset in

detecting recession dates. In Section 4 we conduct a pseudo-out-of-sample forecasting exercise

to test the capability of the dataset to help predicting main Canadian macroeconomic variables.

The measure of macroeconomic uncertainty is proposed in Section 5. Section 6 performs an

impulse response analysis and Section 7 concludes.

2 Datasets

In this section we start by describing the construction of the dataset, and in particular how we

deal with several issues related to availability and statistical properties of the data. Then, we

explore the factor structure of this dataset.

2.1 Construction of datasets

The Canadian monthly database comprises eight different groups of variables: production,

labour, housing, manufacturers’ inventories and orders, money and credit, international trade

and financial flows, prices and stock markets. Whenever available, we included regional data

3

Page 4: A Large Canadian Database for Macroeconomic Analysis · A Large Canadian Database for Macroeconomic Analysis Olivier Fortin-Gagnony Maxime Lerouxz Dalibor Stevanovicx St ephane Surprenant{

covering the Atlantic provinces, Quebec, Ontario, the Prairies and British Columbia, as well

as provincial data. The complete list of series is available in the data appendix. We decided to

include a large number of housing market series since the housing cycle is an important feature

of the business cycle (Leamer (2015)). In addition, given that Canada is a small open economy,

we added more international trade, financial flows and natural ressource indicators than one

usually finds in the US applications.

In building this database, we encountered several problems. Some tables have unfortunately

been discontinued and the new tables seldom go sufficiently far back in time to afford us a

sizeable time frame. Therefore, we combined old and new time series to cope with this problem.

This happened with data on production, housing, orders and import and export. For instance,

GDP data for the period starting in January 1981 and ending today is split across two tables:

379-0027, going from 1981/01 to 2012/01 and 379-0031, starting only on 1997/01. There exist

several procedures to combine two time series that share an overlapping period. de la Escosura

(2016) reviews three splicing procedures and introduces a new one of his own. As he notes, this

aspect of data analysis generally receives little attention with researchers often going for what

he calls retropolation whereby the new time series is reprojected using the growth rates of the

old time series. If the oldest observation of the new series is made at time T , the retropolated

series over the previous time interval is given by:

yt :=

(ynewT

yoldT

)yoldt . (1)

This corresponds to assigning all the measurement adjustment to the level of the old time series.

However, by construction, all increasing time series will be retrospectively skewed upward.

As de la Escosura (2016) notes, this is an undesirable feature if we are studying long term

growth, although it is mostly accurate over long time periods and in economies undergoing deep

structural change, such as developing economies. Linear and non-linear interpolation schemes

would, on the other hand, force the levels of the new series at T and of the old series at some

other reference date, to be preserved, which means assigning all the modification to the observed

growth rates of the old time series in between both references dates.4 The choice of a splicing

method therefore depends on the application and the beliefs of the researchers concerning what

is best measured. In the construction of this database, we privilege the retropolation approach

because we prefer to leave observed growth rates intact. For some series, this involves making

hardly any changes as we can see in Figure 1.

For imports and export series, there usually was a need to aggregate old series before splicing

4Interpolation schemes spare observed levels at specific dates, at the expense of modifying growth rates, thedates being strategically chosen because measurements are believed to be more accurate. de la Fuente Moreno(2014) also proposed a mixed splicing method that allows for a middle ground to be chosen by the researcherthrough a tuning parameter.

4

Page 5: A Large Canadian Database for Macroeconomic Analysis · A Large Canadian Database for Macroeconomic Analysis Olivier Fortin-Gagnony Maxime Lerouxz Dalibor Stevanovicx St ephane Surprenant{

since old and new trade data do not share a common classification system. In the example

provided below of exported consumption goods, we aggregated: section 2 data on food, feed,

beverages and tobacco; major group 4.23 on textile fabricated materials; and major group 5.11

on other consumption goods to approximate the consumer good class of the North American

Product Classification System (NAPCS). As is evident from the examples provided in Figure

1, viewing the old time series as noisy indexes of new time series seems justified by the high

correlations in the overlapping periods.

Another problem concerns the seasonal behavior of five aggregate and ten provincial unem-

ployment duration series, as they were not readily available in a seasonally adjusted format. To

deal with this, we use the SEATS model based decomposition method that is provided along

the X11 type capabilities of the ARIMA-X13-SEATS program of the US Census Bureau.5 As

a sanity check on the viability of the procedure, the Kruskal-Wallis test (Kruskal and Wallis

(1952)) for seasonal behavior is conducted both prior to and after the seasonal adjustment is

performed. The result of the Kruskal-Wallis tests are shown in table 7 in Appendix A. The

tests imply a rejection of the absence of seasonal behavior prior to the adjustments, but do not

allow for rejection of the null hypothesis after the adjustments have been made, as anticipated.

Figure 10, in Appendix A, shows the behavior of the model based adjustment procedure for

two aggregate unemployment duration series in Canada.

Most of the series included in the database must be stationarized. We roughly follow

McCracken and Ng (2016) and Bedock and Stevanovic (2017): most I(1) series are transformed

in the first difference of logarithms; a first difference of levels is applied to unemployment rate

and interest rates; first difference of logarithms is used for all price indexes; and housing data

is featured in logarithms. Transformation codes are reported in data appendix.

Our last concern is to balance the resulting panel since some series have missing observations.

We opted to apply an expectation-maximization algorithm by assuming a factor model to fill

in the blanks as in Stock and Watson (2002b) and McCracken and Ng (2016). We initialize the

algorithm by replacing missing observations with their unconditional mean and then proceed

to estimate a factor model by principal component. The fitted values of this model are used

to replace missing observations. Examples of missing values include export and import series

since the old tables went back only to 1988/01. The resulting balanced and stationary panel

will be used in the rest of this paper.6

5This approach relies on a factorization of the AR lag polynomial of an ARIMA model whereby differentroots of the polynomial can be assigned, respectively, to trend, transitory and seasonal components based on thefact each component will exhibit a different signature in the frequency domain. The ARIMA model is selectedbased on the automatic selection procedure provided by the program which relies on minimizing the BIC. For theimplementation details, the reader is referred to the user manual of the US Census Bureau ARIMA-X13-SEATSprogram. Reader is referred to US Census Bureau (2017) X-13ARIMA-SEATS Reference Manual, available athttps://www.census.gov/ts/x13as/docX13AS.pdf

6This dataset ends on 2018/12 and have been constructed from June 2018 vintage. Some of the series from

5

Page 6: A Large Canadian Database for Macroeconomic Analysis · A Large Canadian Database for Macroeconomic Analysis Olivier Fortin-Gagnony Maxime Lerouxz Dalibor Stevanovicx St ephane Surprenant{

Figure 1: Examples of data splicing

1000000

1500000

2000000

1980 1990 2000 2010 2020

Val

ue GDPGDP_new

(a) Gross domestic product (Canada)

100

200

1960 1980 2000 2020

Val

ue hstart_CANhstart_CAN_new

(b) Housing starts (units, Canada)

0

2000

4000

6000

1980 2000 2020

Val

ue EX_CONS_BP_newEX_CONS_BP

(c) Exports of consumer goods (Canada)

6

Page 7: A Large Canadian Database for Macroeconomic Analysis · A Large Canadian Database for Macroeconomic Analysis Olivier Fortin-Gagnony Maxime Lerouxz Dalibor Stevanovicx St ephane Surprenant{

Table 1: Estimating the number of factors in CAN MD

Canada Canada and regionsBN02 9 7ABC 9 7ON 1 1AH 1 1HL 3 6BN07 7 9AW 6 6

2.2 Number of Factors

Estimating the number of factors is an empirical challenge. Usually the first step is to plot

the eigenvalues of the correlation matrix of data (scree plot) as well as the average explanatory

power of consecutive principal components (trace). These are reported in Figure 2. The results

are typical for macroeconomic panels. There is no clear cut separation among eigenvalues,

except after the first, and the explanatory power grows slowly with the number of factors.

However, we remark that 9 principal components explain 50% of variance of all variables, which

is quite satisfactory. This suggests that the factor representation of Canadian macroeconomy

is an appropriate mean of dimension reduction.

Many statistical decision procedures have been proposed to select the number of factors,

see Mao Takongmo and Stevanovic (2015) for review. Table 1 reports the number of factors

estimated by the following methods: (BN02) Bai and Ng (2002) ICp2 information criterion;

(ABC) modified version of (BN02) by Alessi et al. (2010); (ON) Onatski (2010) test based

on the empirical distribution of eigenvalues; (AH) Ahn and Horenstein (2013) eigenvalue ratio

test; (HL) Hallin and Liska (2007) test for the number of dynamic factors; (BN07) Bai and Ng

(2007) test for the number of dynamic factors; and finally (AW) Amengual and Watson (2007)

information criterion for the number of dynamic factors. (ON) and (AH) are known to be

very conservative and they indeed identify only one source of common variation. (BN02) and

(ABC) suggest 9 static factors for the aggregate panel and 7 in the case of the panel including

additional regional series. The number of factors is estimated between 6 and 9 according to

(BN07) and (AW), but only 3 and 6 following Hallin and Liska (2007) procedure.

It is also common in the literature to verify the stability of the factor structure in terms of

the number of common components. Figure 3 plots the number of factors selected recursively

by Bai and Ng (2002) and Hallin and Liska (2007) methods. We observe that the number

that vintage are not available any more, such that CERI new: Canadian-Dollar Effective Exchange Rate Index.The list of series in data appendix is for this June 2018 vintage. Only few changes have been made since thenfor the actual version of the dataset.

7

Page 8: A Large Canadian Database for Macroeconomic Analysis · A Large Canadian Database for Macroeconomic Analysis Olivier Fortin-Gagnony Maxime Lerouxz Dalibor Stevanovicx St ephane Surprenant{

Figure 2: Eigenvalues and explanatory power of factors

of static factors is increasing since 1990, while the number of dynamic factors is fluctuating

around 3. Many explanations on the time-varying nature of the number of static factors are

plausible: structural changes in terms of the correlation structure, presence of group-specific

factors, finite-sample sensitivity of selection procedures, and so on. We are not investigating

those possibilities but practitioners should be aware of this instability.

2.3 Estimated Factors

The factors estimated over the full sample by principal components are depicted in Figures 4

and 5 alongside their main series. The first factor seems to closely track the evolution of building

permits in Canada, capturing much of the movements related to business cycle frequencies. The

variable best explained by the second factor is the 6-month treasury bills. This variable seems to

dominate the movement in the factor until the last crisis when monetary policy begun holding

rates very low, reducing its volatility. Production in the finance, real estate and insurance

sectors are closely related to the third factor. Same for the late 1980s, both series are very

similar. The fourth and eighth factors capture variations in the slope of the yield curve at two

different points, the eighth factor leaning more heavily on shorter maturities. Unsurprisingly,

that factor becomes less variable after last crisis due to the aforementioned actions by the Bank

of Canada. USD to CAD exchange rate movements seem to dominate the fifth factor, much like

8

Page 9: A Large Canadian Database for Macroeconomic Analysis · A Large Canadian Database for Macroeconomic Analysis Olivier Fortin-Gagnony Maxime Lerouxz Dalibor Stevanovicx St ephane Surprenant{

Figure 3: Number of factors over time

energy prices, especially from 2000 onwards. As the main series of the seventh factor shows,

energy prices became much more volatile in the last two decades, hence the closer tracking in

the later half of the sample. The sixth factor incorporates information about wages in Canada

through the construction wage index. The importance of exchange rates and energy prices

confirms the intuition that a small open economy business cycle should be heavily exposed to

international markets. The stability of factors’ interpretation is analyzed in section A.1.

3 Predicting Recessions

In this section we verify the ability of the dataset in analyzing the Canadian business cycle.

To begin, we need an operational definition of a recession. We assume peaks and troughs are

observed and they coincide with the dates from Cross and Bergevin (2012). Since 1981, C.D.

Howe committee has identified three recessions: June 1981 - October 1982, March 1990 - April

1992 and October 2008 - May 2009. Hence, these are fairly rare events in our dataset so we

will not be able to do a pseudo-out-of-sample forecasting evaluation. We will focus only on the

in-sample capability to correctly identify the turning points and to discover important leading

indicators of the business cycle.

We adopt the static Probit to model the probability of recession since this is the standard

9

Page 10: A Large Canadian Database for Macroeconomic Analysis · A Large Canadian Database for Macroeconomic Analysis Olivier Fortin-Gagnony Maxime Lerouxz Dalibor Stevanovicx St ephane Surprenant{

Figure 4: Factors 1 to 4 and their main series

(a) Factor 1, All building permits (CAN) (b) Factor 2, T.Bill (6 months)

(c) Factor 3, Finance production (d) Factor 4, Bonds* minus bank rate

Notes: Factors are displayed in black and their main components in red. Factors have been estimated over the

full sample and the chosen rotation is indicated by (+) or (-). Factors and series have been reduced by their

respective standard deviation. (*): Average yield on governmental bonds of 5 to 10 year maturity.

approach in the literature. Let Zh,t be a latent lead indicator:

Zh,t = αh + βhXt + uh,t, (2)

with uh,t ∼ N(0, 1), and which satisfies:

Rt+h =

{1 if Zh,t > 0

0 otherwise,

where h is the forecasting horizon. Since Estrella and Mishkin (1998) it is standard practice

to consider the slope of the yield curve as the only predictor. It is usually proxied by the term

spread (TS) which is the difference between the 10-year and 3-month Treasury bills. This will

be our benchmark model. Therefore, the probability of recession is

P(Rt+h = 1|TSt) = Φ(αh + βhTSt). (3)

Then, we consider two ways of including the information from our large macroeconomic

dataset in predicting business cycle turning points. The first is the static Probit where instead

10

Page 11: A Large Canadian Database for Macroeconomic Analysis · A Large Canadian Database for Macroeconomic Analysis Olivier Fortin-Gagnony Maxime Lerouxz Dalibor Stevanovicx St ephane Surprenant{

Figure 5: Factors 5 to 8 and their main series

(a) Factor 5, USDCAD (b) Factor 6, Construction wage (CAN)

(c) Factor 7, IPPI Energy (CAN) (d) Factor 8, Bonds** minus bank rate

Notes: Factors are displayed in black and their main components in red. Factors have been estimated over the

full sample and the chosen rotation is indicated by (+) or (-). Factors and series have been reduced by their

respective standard deviation. (**): Average yield on governmental bonds of 1 to 5 year maturity.

of Xt we consider factors estimated as principal components of Xt:

Zh,t = αh + βhFt + uh,t (4)

Xt = ΛFt + et (5)

The probability of recession is then

P(Rt+h = 1|Ft) = Φ(αh + βhFt). (6)

This is a two-step procedure. First, K principal components are constructed. Second,

Probit model is estimated with those K factors as inputs. Note that this is considered as dense

modelling since all series in Xt are first used to construct Ft.

Another popular way to include a large number of predictors is through a Lasso model.

Following Sties (2017), we use the Logit Lasso model:

P(Rt+h = 1|Xt) =e(αh+βhXt)

1 + e(α+βhXt), (7)

11

Page 12: A Large Canadian Database for Macroeconomic Analysis · A Large Canadian Database for Macroeconomic Analysis Olivier Fortin-Gagnony Maxime Lerouxz Dalibor Stevanovicx St ephane Surprenant{

Figure 6: Predicting recessions: full sample probabilities

(a) 1-month ahead (b) 3-months ahead

(c) 6-months ahead (d) 12-months ahead

Note: This Figure reports the estimated probabilities of recessions from all three models and for horizons 1, 3, 6 and 12 months

ahead. The shaded areas correspond to C.D. Howe recession dates.

with

arg minαh,βh

[RSS + λ

N∑i=1

|βh,i|

].

This is known as sparse modeling since many elements of βh are set to zero. The hyperparameter

λ is selected by cross-validation. As opposed to the factor-Probit model (6), this is a one-step

procedure.

Those three models are evaluated through Estrella and McFadden pseudo-R2s, quadratic

probability score (QPS) and log probability score (LPS). The forecasting horizons are 1 to

12 months. Figure 6 shows the full-sample estimated probabilities for horizons 1, 3, 6 and

12-month ahead. Overall, all three models produce high probabilities during the C.D. Howe

recession dates. Spread and factor probit models produce more volatile probabilities and present

a lot of ‘false’ signals.7 Some of them are interpretable. The peak in 1987 is caused by the

stock market crash, while the 2001 increase in recession probability reflects the U.S. recession.

On the other hand, the Lasso model probabilities are much smoother across all horizons.

Figure 7 shows goodness of fit measures across horizons for all three models. In terms of

7We call a false signal when the estimated probability is high while the C.D. Howe did not classify thatobservation as a recession. Of course, this false signal may also reveal some economic disturbances that werenot pervasive or big enough to be judged as recession by the committee.

12

Page 13: A Large Canadian Database for Macroeconomic Analysis · A Large Canadian Database for Macroeconomic Analysis Olivier Fortin-Gagnony Maxime Lerouxz Dalibor Stevanovicx St ephane Surprenant{

Figure 7: Predicting recessions: goodness of fit

Note: This Figure shows several in-sample goodness-of-fit measures for all three models and for all horizons.

pseudo-R2, Spread model performance is maximized around 8-month ahead which has been

already reported in the literature at least for US economy. Factors have better explanatory

power at short and long horizons. In terms of LPS and QPS, the Lasso model is preferred to

the Probit alternatives, especially in case of the quadratic probability score.

Table 2 reports the 10 most important series of Xt selected by Lasso procedure for horizons

1, 3, 6 and 12 months ahead. For short-term horizons, 1 and 3 months, the most important

predictor is the manufacturing inventory to sales ratio, followed by housing starts in Prairies

and British Columbia. Several labour market indicators are relevant such as the unemployment

duration 27+ weeks and the aggregate employment growth. There are also financial series like

the spread between commercial paper and bank rate, stock market as well as the consumer credit

and money supply. For 6-month horizon the most important predictor is durables inventory to

sales ratio, followed by housing starts and unemployment duration. Term spreads start entering

the top 10 in case of one year ahead prediction. Overall, we find that inventory to sales ratios,

housing starts, labour market and term spreads are the most important leading indicators when

monitoring the Canadian business cycle.

13

Page 14: A Large Canadian Database for Macroeconomic Analysis · A Large Canadian Database for Macroeconomic Analysis Olivier Fortin-Gagnony Maxime Lerouxz Dalibor Stevanovicx St ephane Surprenant{

Table 2: Predicting recessions: top 10 series in Lasso

h=1 h=3 h=6 h=121 MANU-INV-RAT-new MANU-INV-RAT-new DUR-INV-RAT-new G-AVG-5.10.Bank-rate2 hstart-PRA-new hstart-PRA-new hstart-BC-new TBILL-6M.Bank-rate3 UNEMP-DURA-27.-CAN hstart-BC-new hstart-PRA-new PC-PAPER-1M4 EMP-CAN UNEMP-DURA-27.-CAN UNEMP-DURA-27.-CAN hstart-BC-new5 CRED-CONS M3 G-AVG-10p.TBILL-3M hstart-ONT-new6 hstart-BC-new build-Comm-PRA TBILL-6M.Bank-rate UNEMP-DURA-27.-CAN7 PC-3M.Bank-rate EMP-CAN build-Comm-BC CPI-DUR-CAN8 GDP-new build-Ind-BC IPPI-METAL-CAN RT-new9 EMP-CONS-CAN EMP-SALES-CAN PC-3M.Bank-rate hstart-PRA-new10 SP500 WTISPLC DJ-HI CRED-MORT

Note: This table reports 10 most important predictors selected by Lasso.

4 Forecasting Economic Activity

In order to explore the potential for predictive modelling of the CAN-MD database, we perform

a standard out-of-sample forecasting exercise. Let Yt be the variable of interest. If Yt is

stationary, the goal is to forecast its average over h periods:

y(h)t+h = (1200/h)

h∑k=1

yt+k, (8)

where yt ≡ lnYt. If Yt is an I(1) series, then we forecast the average annualized growth rate as

in Stock and Watson (2002b) and McCracken and Ng (2016):

y(h)t+h = (1200/h)ln(Yt+h/Yt). (9)

4.1 Forecasting Models

A large number of forecasting techniques has been proposed to deal with large datasets, see

Kotchoni et al. (2019) for a review and comparison. The goal of this section is to verify whether

the CAN-MD dataset has some relevant and significant forecasting power in predicting key

Canadian macroeconomic series. Therefore, we will use only a subset of time series models and

data-rich methods.

Autoregressive Direct (ARD) The benchmark time series model is the autoregressive

direct (ARD) model, which is specified as:

y(h)t+h = α(h) +

phy∑l=1

ρ(h)l yt−l+1 + et+h, t = 1, . . . , T, (10)

14

Page 15: A Large Canadian Database for Macroeconomic Analysis · A Large Canadian Database for Macroeconomic Analysis Olivier Fortin-Gagnony Maxime Lerouxz Dalibor Stevanovicx St ephane Surprenant{

where yt ≡ lnYt − lnYt−1, h ≥ 1 and phy <∞. A direct prediction of y(h)T+h is deduced from the

model above as follows:

y(h)T+h|T = α(h) +

phy∑l=1

ρ(h)l yT−l+1,

where α(h) and ρ(h) are OLS estimators of α(h) and ρ(h). The optimal phy is selected using the

Bayesian Information Criterion (BIC) for every out-of-sample (OOS) period.

Diffusion Indices (ARDI) The first data-rich model is the (direct) autoregression aug-

mented with diffusion indices from Stock and Watson (2002b):

y(h)t+h = α(h) +

phy∑l=1

ρ(h)l yt−l+1 +

phf∑l=1

Ft−l+1β(h)l + et+h, t = 1, . . . , T (11)

Xt = ΛFt + ut (12)

where Xt is the N -dimensional large informational set, Ft are K(h) static factors and the

superscript h stands for the value of K when forecasting h periods ahead. The optimal values

of hyperparameters phy , phf and K(h) are simultaneously selected by BIC. The h-step ahead

forecast is obtained as:

y(h)T+h|T = α(h) +

phy∑l=1

ρ(h)l yT−l+1 +

phf∑l=1

FT−l+1β(h)l .

The feasible ARDI model is obtained after estimating Ft as the first K(h) principal components

of Xt.8

Targeted Diffusion Indices (T-ARDI) A critique of the ARDI model is that not neces-

sarily all series in Xt are equally important to predict y(h)t+h. The ARDIT model of Bai and Ng

(2008) takes this aspect into account. The idea is first to pre-select a subset X∗t of the series

in Xt, that are relevant for forecasting y(h)t+h. Then, predict y

(h)t+h with ARDI model using X∗t to

calculate factors. X∗t can be constructed as follows:

βlasso = arg minβ

[RSS + λ

N∑i=1

|βi|

](13)

X∗t = {Xi ∈ Xt | βlassoi 6= 0} (14)

8See Stock and Watson (2002a) for technical details on the estimation of Ft as well as their asymptoticproperties.

15

Page 16: A Large Canadian Database for Macroeconomic Analysis · A Large Canadian Database for Macroeconomic Analysis Olivier Fortin-Gagnony Maxime Lerouxz Dalibor Stevanovicx St ephane Surprenant{

The approach uses the LASSO technique to select X∗t by regressing y(h)t+h on all elements of

Xt and using LASSO penalty to discard uninformative predictors.9 As in Bai and Ng (2008)

λ is selected such that around 30 series are kept in X∗t , while the ARDI hyperparameters are

selected by BIC.

Regularized Data-Rich Model Averaging Kotchoni et al. (2019) proposed a new class

of data-rich model averaging techniques that combines pre-selection and regularization with

the complete subset regressions (CSR) of Elliott et al. (2013). The idea of CSR is to generate

a large number of predictions based on different subsets of Xt and then construct the final

forecast as the simple average of the individual forecasts:

y(h)t+h,m = c+ ρyt + βXt,m + εt,m (15)

y(h)T+h|T =

∑Mm=1 y

(h)T+h|T,m

M(16)

where Xt,m contains L series for each model m = 1, . . . ,M .10

Kotchoni et al. (2019) modified the original model in two ways. First, they propose to

preselect a subset of relevant predictors (first step) before applying the CSR algorithm (second

step). This model is labelled Targeted CSR (T-CSR). The initial step is meant to discipline

the behavior of the CSR algorithm ex ante. In this paper, the set of predictors X is reduced

into a subset X∗t at the first step by Lasso as in (13). We consider T-CSR with 10 and with 20

regressors, labelled T-CSR,10 and T-CSR,20 respectively.

Alternatively, one may choose to use the entire set of predictors but discipline the CSR

algorithm ex-post using a Ridge penalization. This model is labelled (CSR-R). Each CSR

predictive regression is estimated as follows

yt+h,m = c+ ρyt + βXt,m (17)

βridge = arg minβ

[RSS + λ

N∑i=1

β2i

], (18)

and then the final forecast is constructed as usual:

y(h)T+h|T =

∑Mm=1 yT+h|T,m

M

Ridge penalization permits to elude multicolinearity that can occur in some subsets of predic-

9Bai and Ng (2008) have also considered the hard threshold case, where a univariate predictive regression isperformed for each variable Xit at the time. The subset X∗t is then obtained by gathering those series whose

coefficients β(h)i have their t-stat larger than the critical value tc.

10L is usually set to 1, 10 or 20 and M is the total number of models considered (up to 5,000 in this paper).

16

Page 17: A Large Canadian Database for Macroeconomic Analysis · A Large Canadian Database for Macroeconomic Analysis Olivier Fortin-Gagnony Maxime Lerouxz Dalibor Stevanovicx St ephane Surprenant{

tors, and produces a well-behaved forecast from every subsample. Ridge parameter λ is selected

as in Hoerl et al. (1975). We consider two specifications of CSR-R: one based on 10 regressors

and another on 20 regressors, labelled CSR-R,10 and CSR-R,20 respectively.

4.2 Pseudo-Out-of-Sample Experiment Design

The pseudo-out-of-sample period is 1991:01 - 2017:12. The forecasting horizons considered

are 1, 3, 6, 9 and 12 months. All models are estimated on the rolling window through the

out-of-sample period. For each model, every evaluation period and forecasting horizon, the

hyperparameters are selected by BIC, except for the Lasso parameter that is tuned by cross

validation. We consider the following variables: industrial production, employment, unem-

ployment rate, consumer price index, core consumer price index, housing starts and the 5-y

mortgage rate. These are typical macroeconomic aggregates that have been forecasted in the

previous literature. All the series except housing starts are modeled as I(1), hence we forecast

the annualized growth rates. The forecasting performance of the above models will be com-

pared on the basis of the Root Mean Square Prediction Error (RMSPE) as is often the case

in forecasting literature. Other metrics could be used but for the sake of simplicity and under

space constraints we stick to the most common one.

4.3 Results

Tables 3 - 5 summarize the results. We report the value of RMSPE ratio with respect to the

reference ARD model as well as the p-value of Diebold-Mariano test. We group the variables

in three categories: real activity (industrial production, employment and unemployment rate),

inflation (CPI and core CPI), housing market (housing starts and 5-year mortgage rate).

Using our large database improves substantially the predictions of real activity series. For

instance, when forecasting industrial production one month ahead, ARDI and CSR-R models

outperform significantly the autoregressive model. For h = 3, improvements are even larger. At

longer horizons, only the CSR-R models show significant ameliorations. The forecasting gains

are between 8% and 19%. We get similar results for two labour market series where CSR-R

models are generally the best.

Table 4 shows that using the large panel greatly improves the prediction of inflation series,

especially for longer horizons. ARDI and CSR-R model are particularly resilient. For instance,

one-year ahead forecasting of CPI inflation is ameliorated by as much as 32% with the CSR-R,20

model. The improvements are in general significant.

Finally, Table 5 reports the results for the housing market. Improvements provided by our

large dataset and the data-rich forecasting techniques are smaller in this case. In case of housing

starts, no model can beat the autoregressive reference at one-month horizon. There are small

17

Page 18: A Large Canadian Database for Macroeconomic Analysis · A Large Canadian Database for Macroeconomic Analysis Olivier Fortin-Gagnony Maxime Lerouxz Dalibor Stevanovicx St ephane Surprenant{

Table 3: Forecasting real activity

Industrial ProductionModels h=1 h=3 h=6 h=12

RMSPE Relative DM RMSPE Relative DM RMSPE Relative DM RMSPE Relative DMARD 0,010 1,000 0,006 1,000 0,005 1,000 0,004 1,000ARDI 0,009 0,929 0,007 0,006 0,858 0,001 0,005 0,946 0,177 0,004 1,066 0,297T-ARDI 0,010 0,990 0,429 0,006 0,944 0,179 0,005 1,075 0,195 0,005 1,267 0,033T-CSR,10 0,010 0,992 0,445 0,006 0,873 0,005 0,005 0,947 0,229 0,004 1,012 0,438T-CSR,20 0,010 1,020 0,364 0,006 0,891 0,033 0,005 1,054 0,256 0,004 1,055 0,275CSR-R,10 0,009 0,929 0,009 0,006 0,856 0,000 0,005 0,904 0,005 0,004 0,914 0,065CSR-R,20 0,009 0,914 0,004 0,006 0,820 0,000 0,005 0,901 0,024 0,004 0,940 0,228

Employmenth=1 h=3 h=6 h=12

RMSPE Relative DM RMSPE Relative DM RMSPE Relative DM RMSPE Relative DMARD 0,002 1,000 0,001 1,000 0,001 1,000 0,001 1,000ARDI 0,002 0,939 0,118 0,001 1,077 0,238 0,001 1,187 0,033 0,001 1,173 0,082T-ARDI 0,002 1,072 0,116 0,001 1,089 0,119 0,001 1,256 0,024 0,001 1,319 0,023T-CSR,10 0,002 1,077 0,133 0,001 0,996 0,477 0,001 1,001 0,494 0,001 1,016 0,440T-CSR,20 0,002 1,160 0,014 0,001 1,099 0,121 0,001 1,075 0,217 0,001 1,150 0,140CSR-R,10 0,002 0,964 0,252 0,001 0,921 0,117 0,001 0,909 0,043 0,001 0,875 0,083CSR-R,20 0,002 0,959 0,233 0,001 0,908 0,143 0,001 0,905 0,095 0,001 0,886 0,149

Unemployment rateh=1 h=3 h=6 h=12

RMSPE Relative DM RMSPE Relative DM RMSPE Relative DM RMSPE Relative DMARD 0,185 1,000 0,110 1,000 0,088 1,000 0,080 1,000ARDI 0,179 0,930 0,119 0,101 0,845 0,119 0,085 0,940 0,297 0,085 1,149 0,131T-ARDI 0,182 0,962 0,239 0,103 0,869 0,090 0,089 1,035 0,394 0,092 1,337 0,014T-CSR,10 0,181 0,953 0,169 0,098 0,792 0,017 0,086 0,962 0,335 0,080 0,997 0,491T-CSR,20 0,187 1,019 0,364 0,101 0,833 0,047 0,090 1,061 0,283 0,083 1,090 0,253CSR-R,10 0,177 0,916 0,003 0,098 0,790 0,002 0,081 0,855 0,007 0,073 0,827 0,025CSR-R,20 0,177 0,909 0,017 0,096 0,750 0,005 0,080 0,832 0,021 0,073 0,837 0,068

Note: This Table reports the root mean squared predictive error (RMSPE), the ratio with respect to the reference ARD model

(Relative) and the p-value of Diebold-Mariano test (DM).

Table 4: Forecasting inflation

CPIh=1 h=3 h=6 h=12

RMSPE Relative DM RMSPE Relative DM RMSPE Relative DM RMSPE Relative DMARD 0,004 1,000 0,002 1,000 0,002 1,000 0,001 1,000ARDI 0,004 0,971 0,200 0,002 0,964 0,218 0,002 0,938 0,281 0,001 0,715 0,028T-ARDI 0,004 1,016 0,379 0,002 0,970 0,350 0,002 1,110 0,223 0,001 0,934 0,335T-CSR,10 0,004 0,998 0,479 0,002 0,951 0,261 0,002 1,040 0,385 0,001 0,815 0,086T-CSR,20 0,004 1,020 0,321 0,002 0,980 0,398 0,002 1,091 0,278 0,001 0,838 0,162CSR-R,10 0,004 0,915 0,002 0,002 0,903 0,016 0,002 0,965 0,382 0,001 0,718 0,002CSR-R,20 0,004 0,898 0,003 0,002 0,904 0,036 0,002 0,968 0,405 0,001 0,680 0,012

Core CPIh=1 h=3 h=6 h=12

RMSPE Relative DM RMSPE Relative DM RMSPE Relative DM RMSPE Relative DMARD 0,003 1,000 0,002 1,000 0,001 1,000 0,001 1,000ARDI 0,003 0,944 0,093 0,002 0,881 0,069 0,001 0,775 0,052 0,001 0,794 0,147T-ARDI 0,003 1,108 0,010 0,002 0,942 0,249 0,001 1,024 0,437 0,001 1,084 0,325T-CSR,10 0,003 1,053 0,175 0,002 1,048 0,304 0,001 0,934 0,333 0,001 0,907 0,291T-CSR,20 0,003 1,049 0,203 0,002 1,050 0,295 0,001 1,004 0,491 0,001 0,936 0,378CSR-R,10 0,003 0,941 0,139 0,002 0,940 0,186 0,001 0,778 0,005 0,001 0,736 0,011CSR-R,20 0,003 0,921 0,086 0,002 0,926 0,187 0,001 0,759 0,024 0,001 0,703 0,041

Note: See Table 3.

18

Page 19: A Large Canadian Database for Macroeconomic Analysis · A Large Canadian Database for Macroeconomic Analysis Olivier Fortin-Gagnony Maxime Lerouxz Dalibor Stevanovicx St ephane Surprenant{

Table 5: Forecasting housing market

Housing startsh=1 h=3 h=6 h=12

RMSPE Relative DM RMSPE Relative DM RMSPE Relative DM RMSPE Relative DMARD 0,100 1,000 0,130 1,000 0,163 1,000 0,221 1,000ARDI 0,100 1,000 0,497 0,135 1,067 0,086 0,171 1,094 0,144 0,214 0,942 0,298T-ARDI 0,103 1,078 0,056 0,143 1,205 0,002 0,188 1,321 0,008 0,238 1,158 0,117T-CSR,10 0,101 1,031 0,242 0,132 1,018 0,374 0,170 1,085 0,171 0,223 1,023 0,402T-CSR,20 0,105 1,109 0,058 0,136 1,094 0,074 0,177 1,171 0,041 0,222 1,014 0,442CSR-R,10 0,101 1,023 0,278 0,128 0,962 0,207 0,159 0,946 0,149 0,212 0,921 0,060CSR-R,20 0,100 1,013 0,393 0,128 0,966 0,239 0,159 0,952 0,239 0,213 0,932 0,182

5-Y mortgage rateh=1 h=3 h=6 h=12

RMSPE Relative DM RMSPE Relative DM RMSPE Relative DM RMSPE Relative DMARD 0,272 1,000 0,178 1,000 0,118 1,000 0,084 1,000ARDI 0,270 0,986 0,413 0,181 1,024 0,349 0,124 1,121 0,083 0,097 1,335 0,011T-ARDI 0,256 0,886 0,106 0,178 0,992 0,463 0,135 1,323 0,037 0,115 1,871 0,000T-CSR,10 0,256 0,882 0,082 0,183 1,049 0,259 0,134 1,308 0,025 0,106 1,585 0,000T-CSR,20 0,263 0,931 0,223 0,186 1,084 0,151 0,135 1,330 0,015 0,115 1,876 0,000CSR-R,10 0,260 0,913 0,002 0,172 0,928 0,032 0,116 0,974 0,280 0,088 1,093 0,077CSR-R,20 0,257 0,892 0,016 0,172 0,931 0,098 0,116 0,975 0,334 0,093 1,219 0,007

Note: See Table 3.

improvements at horizons 3 and 6 months, but they are not significant. The only substantial

increase in forecasting power is at the one-year horizon with CSR-R,10 model, which improves

the prediction by 8%. In case of 5-y mortgage rate, all data-rich models outperform the ARD

at h = 1. The best is T-CSR,10 with a 12% improvement. CSR-R,10 is the best at three-month

horizon, but for longer horizons the autoregressive model is generally the best option.

5 Measuring macroeconomic uncertainty

In this section we demonstrate how our panel can be used to measure macroeconomic uncer-

tainty. To do so, we follow Jurado et al. (2015) and construct the uncertainty measure as the

common stochastic volatility of forecasting errors of many macroeconomic indicators. We did

not find any other study that constructs a measure of macroeconomic uncertainty for Canada

as in Jurado et al. (2015). Ozturk and Sheng (2017) approximate it from the survey of profes-

sional forecasters. Therefore, constructing the Canadian macroeconomic uncertainty measure

is an interesting contribution in itself.

Uncertainty is defined as the conditional volatility of forecasting error of variable yj:

Uyjt(h) =

√E[(yjt+h − E[yjt+h|It])2|It] (19)

where E[yjt+h|It] is the optimal h-step ahead forecast and It measures the information available

to economic agents at time t. Since this can be constructed for any variable j = 1, . . . N , it is

19

Page 20: A Large Canadian Database for Macroeconomic Analysis · A Large Canadian Database for Macroeconomic Analysis Olivier Fortin-Gagnony Maxime Lerouxz Dalibor Stevanovicx St ephane Surprenant{

natural to aggregate individual uncertainties into a common index:

Uyt (h) =Ny→∞

Ny∑j=1

ωjUyjt(h) ≡ Eω[Uy

jt(h)]. (20)

To implement the procedure, one needs to construct the optimal forecasts for many variables,

produce the conditional volatilities of the forecasting errors and then aggregate them up. For

the first part, following Jurado et al. (2015) the forecasting model is:

Xt = ΛFFt + ut

X2t = ΛWWt + vt

yjt+h = ρ(L)yjt + β(L)Ft + γ(L)F 21,t + δ(L)Wt + ejt+h

This is essentially a factor-augmented predictive regression as the ARDI model from the pre-

vious section. The only difference is the inclusion of nonlinearities by considering factors from

squared data, Wt, and adding the square of the first level factor directly in the predictive

regression, F 21,t.

In this application, Xt is our large dimensional Canadian dataset as used in the previous

section. The number of factors in Ft and Wt is selected by Bai and Ng (2002). The lag orders

of ρ(L) and [β(L), γ(L), δ(L)] are fixed to 4 and 2. Forecasting horizons are h = (1, 2, . . . 12)

months.11 The target yjt+h is the jth variable in the dataset. Once the in-sample forecast is

obtained, an univariate stochastic volatility model is estimated on every time series of fore-

casting errors to get Uyjt(h). Finally, the measure of macroeconomic uncertainty is constructed

by aggregating individual uncertainties Uyjt(h). This can be done by taking the cross-sectional

average or by extracting the first principal component. Also, we can aggregate across all series,

which provides the aggregate macroeconomic uncertainty, or condition across sectors or regions.

Figure 8 shows macroeconomic uncertainty constructed as the cross-sectional average of

all individual uncertainties. Several observations are worth mentioning. Uncertainty has a

clear countercyclical behavior since it systematically increases during recessions. Horizontal

lines indicate 1.65 standard deviation above the mean of each series. There are only few

episodes where the uncertainty exceeds 1.65 standard deviation above the mean: 1981 and

2009 recessions. Recall that shaded areas are C.D. Howe recession dates. The committee did

not classify 2001 as a recession like in U.S., neither 2015 when oil prices went down. There are,

however, some moderate increases in the measured uncertainty during those periods.

11We followed Jurado et al. (2015) for this choice of hyperparameters.

20

Page 21: A Large Canadian Database for Macroeconomic Analysis · A Large Canadian Database for Macroeconomic Analysis Olivier Fortin-Gagnony Maxime Lerouxz Dalibor Stevanovicx St ephane Surprenant{

Figure 8: Measure of macroeconomic uncertainty

1985 1990 1995 2000 2005 2010 20150.75

0.8

0.85

0.9

0.95

1

1.05

1.1

1.15

Note: This Figure shows aggregate uncertainty measures for horizons h = 1, 3, 12. Horizontal lines indicate 1.65 standard

deviation above the mean of each series.

6 Impulse Response Analysis

In this section we use the CAN-MD dataset to conduct a structural impulse response analysis

in a data-rich environment. The empirical model is a factor-augmented VAR as in Boivin et al.

(2018). Start with the following state-space factor representation

Xt = ΛFt + ut, (21)

Ft = Φ(L)Ft−1 + et, (22)

where Xt contains N economic and financial indicators, Ft represents K unobserved factors

(N >> K), Λ is a N ×K matrix of factor loadings, ut are idiosyncratic components of Xt that

are uncorrelated at all leads and lags with Ft and with the factor innovations et. We assume

a stationary and an approximate factor model, as some limited cross-section correlations are

allowed among the idiosyncratic components.12 The estimation of the model (21)–(22) is based

on a two-step principal components procedure, where factors are approximated in the first step,

and the factors’ VAR process is estimated in the second step.13

12See Stock and Watson (2016) for an overview of the modern factor analysis literature, and the distinctionbetween exact and approximate factor models.

13Stock and Watson (2002a) prove the consistency of such an estimator in the approximate factor model whenboth cross-section and time sizes, N and T , go to infinity, and without restrictions on N/T . Moreover, theyjustify using Ft as regressor without adjustment. Bai and Ng (2006) furthermore show that PCA estimators

21

Page 22: A Large Canadian Database for Macroeconomic Analysis · A Large Canadian Database for Macroeconomic Analysis Olivier Fortin-Gagnony Maxime Lerouxz Dalibor Stevanovicx St ephane Surprenant{

The core of IRF analysis is identification of structural shocks. To do so, we apply the

contemporaneous timing restrictions procedure as in Stock and Watson (2016) that identi-

fies shocks by restricting only the impact response of a small number of economic indicators.

Following Boivin et al. (2018), start from the moving-average representation of Xt:

Xt = B(L)et + ut, (23)

= Λ[I − Φ(L)L]−1et + ut.

Assume factor innovations et are linear combinations of structural shocks εt = Het where H is a

nonsingular square matrix and E[εtε′t] = I. Thus, the structural moving-average representation

of Xt is obtained by replacing et in (23):

Xt = B∗(L)εt + ut, (24)

= Λ[I − Φ(L)L]−1H−1εt + ut.

To identify the structural shocks εt, assume a recursive structure implying that K − 1

indicators do not respond on impact to certain shocks. Specifically, the contemporaneous

impact matrix of Xt, B?(0), takes the following form where zeros represent timing restrictions

B?0 ≡ B? (0) =

x 0 · · · 0

x x. . . 0

x x. . . 0

x x · · · x...

.... . .

...

x x · · · x

, (25)

To estimate the matrix H, we proceed as in Stock and Watson (2016), noting that B?0:Kεt =

B0:Ket implies B∗0:KB∗′0:K = B0:KΣeB

′0:K , where B0:K contains the first K rows of B0 ≡ B (0) =

Λ, B?0:K = B0:KH

−1, and Σe is the covariance matrix of et. Since B?0:K is a K × K lower

triangular matrix, then it must be the case that B∗0:K can be obtained by performing a Choleski

decomposition of (B0:KΣeB′0:K), i.e.: B∗0:K = Chol(B0:KΣeB

′0:K). It follows that

H = [Chol(B0:KΣeB′0:K)]−1B0:K . (26)

These restrictions allow to just-identify the matrix H and therefore identify the structural

impulse responses in (24). The estimate of H is obtained by replacing B0:K and Σe with their

are√T consistent and asymptotically normal if

√T/N → 0. Inference should take into account the effect of

generated regressors, except when T/N goes to zero.

22

Page 23: A Large Canadian Database for Macroeconomic Analysis · A Large Canadian Database for Macroeconomic Analysis Olivier Fortin-Gagnony Maxime Lerouxz Dalibor Stevanovicx St ephane Surprenant{

estimates in (26). See Boivin et al. (2018) for further details and properties of this identification

procedure.

The goal in this application is to estimate the effects of macroeconomic uncertainty shocks on

Canadian economy. Few studies have been done for US economy, but not for Canada. And since

in this paper we provide a measure of macroeconomic uncertainty, it is an occasion to fill the gap

and estimate its effects on Canadian economy. The uncertainty is proxied by the cross-sectional

average of all individual uncertainties reported in Figure 8. Following Bloom (2009), we impose

the following recursive ordering: [TSX,UNC, TBill3M,CPI,HOURS,EMP, INDPRO], where

TSX stands for the TSX returns, UNC is the uncertainty measure, TBill3M is the 3-month

treasury bill, CPI is the CPI inflation, Hours are worked hours, while EMP and INPROD

are employment and industrial production growth rates respectively. Hence, there are 7 latent

factors in Ft. The uncertainty shock is ordered second in εt. Carriero et al. (2019) also suggest

that macroeconomic uncertainty, as measured by Jurado et al. (2015), should be considered

exogenous for the US economy. The VAR lag order in equation (22) is set to 3.

Figure 9: Dynamic responses to uncertainty shock

Note: This Figure shows dynamic responses of selected variables to an uncertainty shock. The grey areas indicate the 90%

confidence intervals computed using 5000 bootstrap replications.

The results for selected variables are reported in Figure 9. A one-standard deviation increase

23

Page 24: A Large Canadian Database for Macroeconomic Analysis · A Large Canadian Database for Macroeconomic Analysis Olivier Fortin-Gagnony Maxime Lerouxz Dalibor Stevanovicx St ephane Surprenant{

in uncertainty causes a decline in financial markets as indicated from the negative reaction of

TSX and treasury bills. The credit spread increases suggesting higher uncertainty is related

to higher external financing premium. As a result, the volume of credit decreases. On the

real side, employment, industrial production hours worked, housing starts and manufacturing

new orders decline, while the unemployment increases, so that uncertainty could be seen as

a contributor to the business cycle. The prices, however, do not react significantly. Overall,

these results are in line with Bloom (2009) and Jurado et al. (2015) who used VAR models

with similar recursive orderings on US data.

Table 6 reports the importance of uncertainty shocks in explaining macroeconomic fluctu-

ations since 1981. The second column shows the contribution of the uncertainty shock to the

variance of the common component (all 7 common shocks in εt) of the series at a 36-month

horizon. The third column reports the fraction of the variance of those series explained by

the common component, which is simply the R2. Finally, the fourth column is the product of

the second and third columns, and presents the contribution of the uncertainty shock to the

total variance of the series. In other words, the variance decomposition in the fourth column

takes into account the idiosyncratic shocks of the series. Uncertainty shocks are particularly

important, among common shocks, for treasury bills, labour market and housing starts where

they explain around 30% of the variance. They are also relevant for hours, industrial produc-

tion, credit spreads and credit volume with variance decomposition between 8% and 18%. The

third column shows that the common component explains a sizeable fraction of the variabil-

ity of those series, particularly for treasury bills, inflation, employment, industrial production,

housing starts and credit.

7 Conclusion

In this paper we proposed a monthly large-scale Canadian macroeconomic database containing

hundreds of Canadian and provincial economic indicators. It is designed to be updated regularly

through the StatCan database and is publicly available. Real-time vintage are collected as well.

It relieves users to deal with data changes and methodological revisions, and we provided an

already balanced and stationary panel starting in 1981.

The panel seems to have a factor structure where the common factors explain a sizeable

portion of variation in Canadian and provincial aggregate series. The principal component

analysis of the dataset revealed few driving forces of the Canadian economy such as housing

and bond markets, production, wages, exchange rates and energy prices.

The advantages of this dataset for macroeconomic research are shown in several examples.

The panel is useful to capture turning points of the Canadian business cycle. Using Lasso and

factor-probit models we showed that this dataset has substantial explanatory power in addition

24

Page 25: A Large Canadian Database for Macroeconomic Analysis · A Large Canadian Database for Macroeconomic Analysis Olivier Fortin-Gagnony Maxime Lerouxz Dalibor Stevanovicx St ephane Surprenant{

Table 6: Variance decomposition and R2

Series VD: R2: VD:common shocks common comp. total

TSXSP 0,012 0,141 0,002UNC 0,547 0,186 0,102TBill3M 0,318 0,740 0,235CPI 0,008 0,709 0,006HOURS 0,183 0,232 0,043EMP 0,317 0,439 0,139INDPRO 0,102 0,618 0,063NEW ORDERS 0,038 0,341 0,013UNEMP 0,334 0,252 0,084HOUSE STARTS 0,364 0,629 0,229CREDIT SPREAD 0,148 0,118 0,017CREDIT 0,081 0,526 0,043

Notes: The second column reports the contribution of the uncertainty shock to the variance of the common component of the

series at a 36-month horizon. The third column contains the fraction of the total variability of the series explained by the common

component. The fourth column is the product of the previous two and represents the fraction of the total variance of the series.

to the standard term spread predictor.

The information contained in the dataset has substantial predictive power when forecasting

key macroeconomic indicators. Factor models and regularized complete subset regressions were

very resilient in forecasting real activity variables such as industrial production, employment

and unemployment rate, as well as CPI and Core CPI inflation. Gains were less important for

the housing market.

The panel was also used to construct first, to the best of our knowledge, Canadian measures

of macroeconomic uncertainty a la Jurado et al. (2015). The results suggest two significant

episodes of uncertainty: 1981 and 2009 recessions. Finally, we used that measure to show

how the dataset can serve for structural impulse response analysis. We estimated the effects

of uncertainty shocks using the factor-augmented VAR methodology a la Boivin et al. (2018)

with our measure of macroeconomic uncertainty. The results suggested that an increase in

uncertainty is associated with a decline in real activity and financial markets. The findings are

in line with the literature using US data.

References

Ahn, S. K. and Horenstein, A. R. (2013). Eigenvalue ratio test for the number of factors.Econometrica, 81(3):1203–1227.

Alessi, L., Barigozzi, M., and Capasso, M. (2010). Improved penalization for determining the

25

Page 26: A Large Canadian Database for Macroeconomic Analysis · A Large Canadian Database for Macroeconomic Analysis Olivier Fortin-Gagnony Maxime Lerouxz Dalibor Stevanovicx St ephane Surprenant{

number of factors in approximate factor models. Statistics and Probability Letters, 80(23-24):1806–1813.

Amengual, D. and Watson, M. (2007). Consistent estimation of the number of dynamic factorsin large N and T panel. Journal of Business and Economic Statistics, 25(1):91–96.

Bai, J. and Ng, S. (2002). Determining the number of factors in approximate factor models.Econometrica, 70(1):191–221.

Bai, J. and Ng, S. (2006). Confidence intervals for diffusion index forecasts and inference forfactor-augmented regressions. Econometrica, 74:1133–1150.

Bai, J. and Ng, S. (2007). Determining the number of primitive shocks. Journal of Businessand Economic Statistics, 25(1):52–60.

Bai, J. and Ng, S. (2008). Forecasting economic time series using targeted predictors. Journalof Econometrics, 146:304–317.

Bedock, N. and Stevanovic, D. (2017). An empirical study of credit shock transmission in asmall open economy. Canadian Journal of Economics, 50(2):541–570.

Bernanke, B., Boivin, J., and Eliasz, P. (2005). Measuring the effects of monetary policy: afactor-augmented vector autoregressive (FAVAR) approach. Quarterly Journal of Economics,120:387–422.

Bloom, N. (2009). The impact of uncertainty shocks. Econometrica, 77(3):623–685.

Boivin, J., Giannoni, M., and Stevanovic, D. (2010). Monetary transmission in a small openeconomy: more data, fewer puzzles. Technical report, Columbia Business School, ColumbiaUniversity.

Boivin, J., Giannoni, M., and Stevanovic, D. (2018). Dynamic effects of credit shocks in adata-rich environment. Journal of Business & Economic Statistics, 0(0):1–13.

Carriero, A., Clark, T., and Marcellino, M. (2019). The identifying information in vectorautoregressions with time-varying volatilities: An application to endogenous uncertainty.Technical report.

Champagne, J., Poulin-Bellisle, G., and Sekkel, R. (2018). The real-time properties of the bankof canada’s staff output gap estimates. Journal of Money, Credit and Banking, 50(6):1167–1188.

Cross, P. and Bergevin, P. (2012). Turning points: Business cycles in canada since 1926.Technical Report 366, C.D. Howe Institute, Vancouver, British Columbia.

de la Escosura, L. P. (2016). Mismeasuring long-run growth: the bias from splicing nationalaccounts?the case of spain. Cliometrica, 10(3):251–275.

de la Fuente Moreno, A. (2014). A” mixed” splicing procedure for economic time series. Es-tadıstica espanola, 56(183):107–121.

Elliott, G., Gargano, A., and Timmermann, A. (2013). Complete subset regressions. Journalof Econometrics, 177(2):357–373.

Estrella, A. and Mishkin, F. (1998). Predicting us recessions: Financial variables as leadingindicators. Review of Economics and Statistics, 80:45–61.

26

Page 27: A Large Canadian Database for Macroeconomic Analysis · A Large Canadian Database for Macroeconomic Analysis Olivier Fortin-Gagnony Maxime Lerouxz Dalibor Stevanovicx St ephane Surprenant{

Galbraith, J. W. and Tkacz, G. (2007). How far can forecasting models forecast? Forecastcontent horizons for some important macroeconomic variables. Technical report, Bank ofCanada Working Paper No. 2007-1.

Gosselin, M.-A. and Tkacz, G. (2001). Evaluating factor models: An application to forecastinginflation in Canada. Technical report, Bank of Canada Working Paper No. 2001-18.

Hallin, M. and Liska, R. (2007). Determining the number of factors in the general dynamicfactor model. Journal of the American Statistical Association, 102:603–617.

Hoerl, A. E., Kennard, R. W., and Baldwin, K. F. (1975). Ridge regression: some simulations.Communications in Statistics - Theory and Methods, 4:105–123.

Jurado, K., Ludvigson, S., and Ng, S. (2015). Measuring uncertainty. American EconomicReview, 105(3):1177–1216.

Kotchoni, R., Leroux, M., and Stevanovic, D. (2019). Macroeconomic forecast accuracy in adata-rich environment. Journal of Applied Econometrics, (Forthcoming).

Kruskal, W. H. and Wallis, W. A. (1952). Use of ranks in one-criterion variance analysis.Journal of the American statistical Association, 47(260):583–621.

Leamer, E. E. (2015). Housing really is the business cycle: What survives the lessons of 2008-09?Journal of Money, Credit and Banking, 47(S1):43–50.

Mao Takongmo, C. and Stevanovic, D. (2015). Selection of the number of factors in presenceof structural instability: a monte carlo study. Actualite Economique, 91:177–233.

McCracken, M. W. and Ng, S. (2016). Fred-md: A monthly database for macroeconomicresearch. Journal of Business and Economic Statistics, 34(4):574–589.

Onatski, A. (2010). Determining the number of factors from empirical distribution of eigenval-ues. The Review of Economics and Statistics, 92(4):1004–1016.

Ozturk, E. A. and Sheng, X. S. (2017). Measuring global and country-specific uncertainty.Technical report, IMF Working Paper.

Sties, M. (2017). Forecasting recessions in a big data environment. Technical report, Depart-ment of Economics, University of Alberta.

Stock, J. H. and Watson, M. W. (2002a). Forecasting using principal components from a largenumber of predictors. Journal of the American Statistical Association, 97:1167–1179.

Stock, J. H. and Watson, M. W. (2002b). Macroeconomic forecasting using diffusion indexes.Journal of Business and Economic Statistics, 20(2):147–162.

Stock, J. H. and Watson, M. W. (2016). Factor models and structural vector autoregressionsin macroeconomics. Handbook of Macroeconomics, 2A:415–526.

27

Page 28: A Large Canadian Database for Macroeconomic Analysis · A Large Canadian Database for Macroeconomic Analysis Olivier Fortin-Gagnony Maxime Lerouxz Dalibor Stevanovicx St ephane Surprenant{

A Additional results

The series under investigation is split into 12 subsamples, each consisting of (τj)12j=1 observations

specific to a given month. Under the null hypothesis of no seasonal behavior, these subsamplesmust have the same mean. The Kruskal-Wallis test (Kruskal and Wallis (1952)) offers a non-parametric approach to test this hypothesis. In each subsample, observations are assigned arank Rij following their relative magnitudes. If T is the total number of observations, theKruskal-Wallis statistic is given by:

KW =12

T (T + 1)

12∑j=1

τj

(∑τi=1 jRij

τj

)2

− 3(T + 1)approx.∼ χ2(12− 1). (27)

Table 7: Kruskal-Wallis Rank Sum Test Results

Unemployment duration Unajusted Ajustedchi-squared p-value chi-squared p-value

1-4 weeks 425.96 0 1.1603 0.99995-13 weeks 367.71 0 0.58898 114-24 weeks 384.76 0 0.51579 127 weeks and over 153.15 0 1.6229 0.9994Average duration 115.25 0 0.4043 1Average duration by provinceNew Foundland 285.43 0 0.61417 1Prince Edward Island 102.87 0 0.77239 1Nova Scotia 253.99 0 0.79096 1New Brunswick 154.1 0 0.93345 1Quebec 116.84 0 0.91172 1Ontario 108.3 0 0.97177 1Manitoba 204.74 0 0.22762 0.9999Saskatchewan 109.4 0 1.0233 1Alberta 117.34 0 0.95383 0.9999British Columbia 111.28 0 1.0875 1

A.1 Factor interpretation over time

Here, we study the factor interpretation through time by estimating the factor model recursivelysince 1990M12. The resulting time series form the basis of the heatmaps shown in Figures 11and 12. For convenience, variables are grouped in categories, the exact composition of whichare given in the data appendix. Tables 8 and 9 offer a more granular look in the interpretationand stability of factors, reporting the top ten series in terms of average squared loadings oversubperiods. The subperiods have been chosen to match visual changes in some of the heatmaps,facilitating the parallel between the two.

The first factor weighs heavily on housing, manufacturing orders and to a smaller extentlabour market variables. The factor appears overall very stable and this can be confirmed by

28

Page 29: A Large Canadian Database for Macroeconomic Analysis · A Large Canadian Database for Macroeconomic Analysis Olivier Fortin-Gagnony Maxime Lerouxz Dalibor Stevanovicx St ephane Surprenant{

Figure 10: Growth rate of unemployment duration in Canada

−0.2

−0.1

0.0

0.1

0.2

1980 1990 2000 2010 2020Years

UN

EM

P_D

UR

A_2

7._C

AN

UnajustedAdjusted

(a) 27 plus weeks

−0.2

0.0

0.2

1980 1990 2000 2010 2020Years

UN

EM

P_D

UR

A_a

vg_C

AN

UnajustedAdjusted

(b) Mean duration

the ranking of series in three selected subperiods reproduced in Table 8. The top three series,total building permits in different regions, are preserved throughout the whole sample, as wellas their order. We also notice the fading of labour series from the factor by noting that the twounemployment series disappear from the top ten in the later two subperiods. The heatmaps offactors two and three suggest a near inversion of the order of factors between 2000 and 2005.The former captures a combination of production, monetary aggregates, credit measures andsome prices, while the latter is more heavily centered on interest rates and their differentials.This analysis is confirmed in Table 8. The fourth factor is much harder to interpret, exceptperhaps for the earlier half of the sample when it captures mostly production and labourmarket movements. It is, however, interesting to note that a rather neat difference shows upright before the last financial crisis in this factor, just as in the second and third factor. Thischange is in fact also present in other factors as shown in Figure 12. The top series featuredin Table 8 confirm the difficulty of pinning down an interpretation for this factor in the timedimension as the main series change a lot across subperiods. However, we do note remarkablymore stability since the 2008 crisis. The fifth factor displayed in Figure 12 seems very muchcentered on labour market movements early in the sample, while it captures the behavior ofseveral key variables of international trade and price indexes since 2008. Table 9 reveals thatexchange rates were important variables since the second subperiod, but came to dominateonly by the end of the sample. The next factor is related to the labour market movements, inparticular to construction wage indexes. The seventh factor captured trade and stock variablesuntil the financial crisis, after which it is related to labour and price movements. Finally, thelast factor is much more diffuse.

29

Page 30: A Large Canadian Database for Macroeconomic Analysis · A Large Canadian Database for Macroeconomic Analysis Olivier Fortin-Gagnony Maxime Lerouxz Dalibor Stevanovicx St ephane Surprenant{

Figure 11: Heatmaps for factors 1 to 4

(a) Factor 1 (b) Factor 2

(c) Factor 3 (d) Factor 4

Notes: Factors and loadings are estimated recursively using an expanding window. Displayed shades of red

capture squared loadings.

30

Page 31: A Large Canadian Database for Macroeconomic Analysis · A Large Canadian Database for Macroeconomic Analysis Olivier Fortin-Gagnony Maxime Lerouxz Dalibor Stevanovicx St ephane Surprenant{

Figure 12: Heatmaps for factors 5 to 8

(a) Factor 5 (b) Factor 6

(c) Factor 7 (d) Factor 8

Notes: Factors and loadings are estimated recursively using an expanding window. Displayed shades of red

capture squared loadings.

31

Page 32: A Large Canadian Database for Macroeconomic Analysis · A Large Canadian Database for Macroeconomic Analysis Olivier Fortin-Gagnony Maxime Lerouxz Dalibor Stevanovicx St ephane Surprenant{

Table 8: Top ten explained series for factors 1 to 4

Factor 11991M1-2005M1 2005M1-2010M1 2010M1-2017M12DUR INV RAT new build Total CAN build Total CANMANU INV RAT new build Total ONT build Total ONTbuild Total ONT build Total QC build Total QCbuild Total CAN build Total ATL build Total ATLbuild Total QC N DUR INV RAT new build Comm CANN DUR INV RAT new MANU INV RAT new N DUR INV RAT newbuild Total ATL DUR INV RAT new build Ind CANbuild Comm QC build Comm CAN build Comm QCbuild Comm ATL build Ind CAN build Comm ONTbuild Ind CAN build Comm QC build Total BCFactor 21991M1-2005M1 2005M1-2010M1 2010M1-2017M12FIN new TBILL 6M GPI newG AVG 10p.TBILL 3M GOV AVG 1 3Y IP newG AVG 5.10.Bank rate TBILL 3M BSI newCRED T PC PAPER 3M TBILL 6MG AVG 3.5.Bank rate BANK RATE L GDP newCRE BUS GOV AVG 3 5Y TBILL 3MG AVG 1.3.Bank rate PC PAPER 2M PC PAPER 3MTBILL 6M GOV AVG 5 10Y BANK RATE LPC PAPER 3M MORTG 1Y DM newTBILL 3M FIN new GOV AVG 1 3YFactor 31991M1-2005M1 2005M1-2010M1 2010M1-2017M12IP new G AVG 5.10.Bank rate FIN newGPI new G AVG 10p.TBILL 3M G AVG 5.10.Bank rateGOV AVG 1 3Y G AVG 3.5.Bank rate G AVG 10p.TBILL 3MTBILL 6M G AVG 1.3.Bank rate G AVG 3.5.Bank rateBSI new FIN new G AVG 1.3.Bank rateGDP new BSI new BSI newGOV AVG 3 5Y GDP new GDP newTBILL 3M IP new PC PAPER 3MBANK RATE L GPI new TBILL 6MDM new DM new BANK RATE LFactor 41991M1-2005M1 2005M1-2010M1 2010M1-2017M12hstart QC new hstart QC new CPI MINUS FOO CANGDP new hstart CAN new G AVG 5.10.Bank rateBSI new CRED MORT G AVG 3.5.Bank ratehstart CAN new CRED HOUS G AVG 1.3.Bank ratehstart ONT new hstart ONT new hstart CAN newUSDCAD new CPI MINUS FOO CAN G AVG 10p.TBILL 3MCRED MORT GDP new CPI ALL CANSPI new BSI new CPI SHEL CANCRED HOUS GOV AVG 1 3Y CRED TGPI new hstart ATL new GOV AVG 1 3Y

Notes: Factor loadings estimated recursively with an expanding window. Rankings are based on mean squared

loadings over the indicated period.

32

Page 33: A Large Canadian Database for Macroeconomic Analysis · A Large Canadian Database for Macroeconomic Analysis Olivier Fortin-Gagnony Maxime Lerouxz Dalibor Stevanovicx St ephane Surprenant{

Table 9: Top ten explained series for factors 5 to 8

Factor 51991M1-2005M1 2005M1-2010M1 2010M1-2017M12CAN US SEC NETFLOW CAN EQTY NETFLOW USDCAD newCAN EQTY NETFLOW CAN US SEC NETFLOW CERI newCAN SEC NETFLOW CAN SEC NETFLOW IPPI MOTOR CANbuild Total PRA hstart QC new IPPI MACH CANWAGE CON CAN hstart ATL new DJ LOWAGE CON ONT CRED MORT JPYCAD newhstart QC new WAGE CON CAN DJ HIbuild Comm PRA build Comm PRA SP500hstart ATL new WAGE CON ONT DJ CLObuild Ind BC CRED HOUS MANU UNFIL newFactor 61991M1-2005M1 2005M1-2010M1 2010M1-2017M12WAGE CON CAN WAGE CON CAN WAGE CON CANWAGE CON ONT WAGE CON ONT WAGE CON PRAWAGE CON BC WAGE CON PRA CAN US SEC NETFLOWWAGE CON PRA WAGE CON BC CRED HOUSWAGE CON QC WAGE CON QC CRED MORTWAGE CON ATL WAGE CON ATL CAN SEC NETFLOWCAN SEC NETFLOW CERI new WAGE CON ONTbuild Comm BC hstart QC new WAGE CON BCCAN US SEC NETFLOW CRED MORT WAGE CON QCMANU TOT INV new CRED HOUS hstart QC newFactor 71991M1-2005M1 2005M1-2010M1 2010M1-2017M12CERI new CERI new WAGE CON CANDJ LO USDCAD new WAGE CON ONTDJ CLO WAGE CON CAN WAGE CON QCSP500 IPPI MOTOR CAN WAGE CON BCGBPCAD new IPPI CAN WAGE CON PRAIPPI CAN WAGE CON ONT WAGE CON ATLDJ HI GBPCAD new IPPI ENER CANIPPI MOTOR CAN WAGE CON QC hstart ATL newFIN new WAGE CON PRA CPI GOOD CANJPYCAD new WAGE CON BC FOR SEC NETFLOWFactor 81991M1-2005M1 2005M1-2010M1 2010M1-2017M12MANU TOT INV new SP500 IPPI ENER CANDUR TOT INV new DJ CLO SP500UNEMP CAN EOIL BP new DJ CLOEMP MANU CAN DUR UNFIL new G AVG 1.3.Bank rateDJ CLO WTISPLC G AVG 3.5.Bank rateSP500 MANU UNFIL new IPPI CANCERI new IPPI ENER CAN WTISPLCEMP CAN DJ LO G AVG 5.10.Bank rateUNEMP DURA 27. CAN G AVG 3.5.Bank rate G AVG 10p.TBILL 3MDJ HI G AVG 1.3.Bank rate EOIL BP new

Notes: Factor loadings estimated recursively with an expanding window. Rankings are based on mean squared

loadings over the indicated period.

33

Page 34: A Large Canadian Database for Macroeconomic Analysis · A Large Canadian Database for Macroeconomic Analysis Olivier Fortin-Gagnony Maxime Lerouxz Dalibor Stevanovicx St ephane Surprenant{

B Data SetsThe transformation codes are: 1 - no transformation; 2 - first difference; 4 - logarithm; 5 - firstdifference of logarithm.

No. Series Code Series Description Region T-CodePRODUCTION

1 GDP new GDP total CAN 52 BSI new GDP business CAN 53 NBSI new GDP non business CAN 54 GPI new GDP goods CAN 55 SPI new GDP services CAN 56 IP new GDP industrial production CAN 57 NDM new GDP non durable goods CAN 58 DM new GDP durables CAN 59 OILP new GDP mining, petrol and gas CAN 510 CON new GDP construction CAN 511 RT new GDP retail trade CAN 512 WT new GDP wholesale trade CAN 513 PA new GDP public administration CAN 514 FIN new GDP finance and insurance CAN 515 OIL CAN Refined petroleum production CAN 516 EMP CAN Employment total CAN 517 EMP SERV CAN Employment services CAN 518 EMP FOR OIL CAN Employment forestry, fishing, mining, oil and gas CAN 519 EMP CONS CAN Employment construction CAN 520 EMP SALES CAN Employment sales (wholesale and retail trade) CAN 521 EMP FIN CAN Employment finance, insurance and real estate CAN 522 EMP PART CAN Employment part time CAN 523 EMP MANU CAN Employment manufacturing CAN 524 UNEMP CAN Unemployment rate CAN 225 WAGE CON CAN Construction union wage index CAN 526 WAGE CON ATL Construction union wage index ATL 527 WAGE CON QC Construction union wage index QC 528 WAGE CON ONT Construction union wage index ONT 529 WAGE CON PRA Construction union wage index PRA 530 WAGE CON BC Construction union wage index BC 531 UNEMP DURA 1.4 CAN Unemployment duration (1-4 weeks) CAN 5*32 UNEMP DURA 5.13 CAN Unemployment duration (5-13 weeks) CAN 5*33 UNEMP DURA 14.25 CAN Unemployment duration (14-24 weeks) CAN 5*34 UNEMP DURA 27. CAN Unemployment duration (27+ weeks) CAN 5*35 UNEMP DURA avg CAN Unemployment average duration CAN 5*36 TOT HRS CAN Hours worked total CAN 537 GOOD HRS CAN Hours worked goods CAN 5

HOUSING AND CONSTRUCTION38 hstart CAN new Housing starts (units) CAN 439 hstart ATL new Housing starts (units) ATL 440 hstart QC new Housing starts (units) QC 441 hstart ONT new Housing starts (units) ONT 442 hstart PRA new Housing starts (units) PRA 443 hstart BC new Housing starts (units) BC 444 build Total CAN Building permits (tous) CAN 445 build Ind CAN Building permits (industries) CAN 446 build Comm CAN Building permits (commerce) CAN 447 build Total ATL Building permits (tous) ATL 448 build Ind ATL Building permits (industries) ATL 449 build Comm ATL Building permits (commerce) ATL 450 build Total QC Building permits (tous) QC 451 build Ind QC Building permits (industries) QC 452 build Comm QC Building permits (commerce) QC 453 build Total ONT Building permits (tous) ONT 454 build Ind ONT Building permits (industries) ONT 455 build Comm ONT Building permits (commerce) ONT 456 build Total PRA Building permits (tous) PRA 457 build Ind PRA Building permits (industries) PRA 458 build Comm PRA Building permits (commerce) PRA 459 build Total BC Building permits (tous) BC 460 build Ind BC Building permits (industries) BC 461 build Comm BC Building permits (commerce) BC 4

MANUFACTURING, SALES AND INVENTORIES62 MANU N ORD new Manufacturing new orders (total) CAN 563 MANU UNFIL new Manufacturing unfilled orders (total) CAN 564 MANU TOT INV new Manufacturing inventories (total) CAN 565 MANU INV RAT new Manufacturing inventories to shipments ratio (total) CAN 166 N DUR INV RAT new Manufacturing inventories to shipments ratio (durables) CAN 167 DUR N ORD new Manufacturing new orders (durables) CAN 568 DUR UNFIL new Manufacturing unfilled orders (durables) CAN 569 DUR TOT INV new Manufacturing inventories (durables) CAN 570 DUR INV RAT new Manufacturing inventories to shipments ratio (durables) CAN 1

MONEY AND CREDIT71 M3 M3 (gross) CAN 572 M2p M2+ (gross) CAN 573 M BASE1 Monetary base CAN 574 CRED T Total credit CAN 575 CRED HOUS Household credit CAN 576 CRED MORT Mortgage credit CAN 577 CRED CONS Consumption credit CAN 578 CRE BUS Business credit CAN 579 BANK RATE L Bank rate CAN 280 PC PAPER 1M Corporate paper rate (1 month) CAN 281 PC PAPER 2M Corporate paper rate (2 months) CAN 282 PC PAPER 3M Corporate paper rate (3 months) CAN 283 GOV AVG 1 3Y Governmental bonds (average rate) (1-3 years) CAN 284 GOV AVG 3 5Y Governmental bonds (average rate) (3-5 years) CAN 285 GOV AVG 5 10Y Governmental bonds (average rate) (5-10 years) CAN 2

34

Page 35: A Large Canadian Database for Macroeconomic Analysis · A Large Canadian Database for Macroeconomic Analysis Olivier Fortin-Gagnony Maxime Lerouxz Dalibor Stevanovicx St ephane Surprenant{

86 GOV AVG 10pY Governmental bonds (average rate) (10+ years) CAN 287 MORTG 1Y Mortgage rate (1 year) CAN 288 MORTG 5Y Mortgage rate (5 years) CAN 289 TBILL 3M Treasury bills (3 months) CAN 290 TBILL 6M Treasury bills (6 months) CAN 291 PC 3MBank rate Corporate paper rate (3 months) ? Bank rate CAN 192 G AVG 1-3 Bank rate Government bonds (1-3 years) - Bank rate CAN 193 G AVG 1-5 Bank rate Government bonds (3-5 years) - Bank rate CAN 194 G AVG 5-10 Bank rate Government bonds (5-10 years) - Bank rate CAN 195 TBILL 6MBank rate Treasury bond (6 months) - Bank rate CAN 196 G AVG 10pTBILL 3M Government Bonds (10+ years) - Treasury Bond (3 months) CAN 1

INTERNATIONAL TRADE97 RES TOT Total Canada’s official international reserves CAN 598 RES USD Canadian USD reserves CAN 599 RES IMF Canadian reserve position at the IMF CAN 5100 Imp BP new Imports total CAN 5101 IOIL BP new Imports oil CAN 5102 Exp BP new Exports total CAN 5103 EOIL BP new Exports oil CAN 5104 EX ENER BP new Export energy products CAN 5105 EX MINER BP new Exports non-metallic ores CAN 5106 EX METAL BP new Exports metal and other mineral products CAN 5107 EX IND EQUIP BP new Exports industrial machinery, pieces and equipment CAN 5108 EX TRANSP BP new Exports motor vehicules and parts CAN 5109 EX CONS BP new Exports consumption goods CAN 5110 IMP METAL BP new Imports metal and other mineral products CAN 5111 IMP IND EQUIP BP new Imports industrial machinery, pieces and equipment CAN 5112 IMP TRANSP BP new Imports motor vehicules and parts CAN 5113 IMP CONS BP new Imports consumption goods CAN 5114 USDCAD new Exchange rate CADUSD CAN 5115 JPYCAD new Exchange rate CADJPY CAN 5116 GBPCAD new Exchange rate CADGBP CAN 5117 CERI new Canadian-Dollar Effective Exchange Rate Index (CERI) CAN 5

PRICES118 CPI ALL CAN Consumption price index (CPI) (all) CAN 5119 CPI SHEL CAN CPI (shelter) CAN 5120 CPI CLOT CAN CPI (clothing and footwear) CAN 5121 CPI HEA CAN CPI (health and personal care) CAN 5122 CPI MINUS FOO CAN CPI (all minus food) CAN 5123 CPI MINUS FEN CAN CPI (all minus food and energy) CAN 5124 CPI DUR CAN CPI (durable goods) CAN 5125 CPI GOOD CAN CPI (goods) CAN 5126 CPI SERV CAN CPI (services) CAN 5127 IPPI CAN Industrial production price index (IPPI) (all) CAN 5128 IPPI ENER CAN IPPI (energy) CAN 5129 IPPI WOOD CAN IPPI (wood) CAN 5130 IPPI METAL CAN IPPI (metal and construction materials) CAN 5131 IPPI MOTOR CAN IPPI (motor vehicles and parts) CAN 5132 IPPI MACH CAN IPPI (industrial machinery and equipment) CAN 5133 WTISPLC Petroleum price Western Intermediate Select (WTI) (FRED) 5

STOCK MARKETS134 TSX val Toronto stock exchange (value of stocks) 5135 TSX vol Toronto stock exchange (volume of stocks) 5136 DJ HI Dow Jones index (high) 5137 DJ LO Dow Jones index (low) 5138 DJ CLO Dow Jones index (close) 5139 SP500 Standard and Poor’s (500) index 5

PROVINCIAL/REGIONAL SERIES

PRODUCTION140 OIL QC Refined petroleum production QC 5141 OIL ALB Refined petroleum production ALB 5

LABOR MARKET142 EMP NF Employment total NF 5143 EMP SERV NF Employment services NF 5144 EMP FOR OIL NF Employment forestry, fishing, mining, oil and gas NF 5145 EMP CONS NF Employment construction NF 5146 EMP SALES NF Employment sales (wholesale and retail trade) NF 5147 EMP FIN NF Employment finance, insurance and real estate NF 5148 EMP PEI Employment total PEI 5149 EMP SERV PEI Employment services PEI 5150 EMP FOR OIL PEI Employment forestry, fishing, mining, oil and gas PEI 5151 EMP CONS PEI Employment construction PEI 5152 EMP SALES PEI Employment sales (wholesale and retail trade) PEI 5153 EMP FIN PEI Employment finance, insurance and real estate PEI 5154 EMP NS Employment total NS 5155 EMP SERV NS Employment services NS 5156 EMP FOR OIL NS Employment forestry, fishing, mining, oil and gas NS 5157 EMP CONS NS Employment construction NS 5158 EMP SALES NS Employment sales (wholesale and retail trade) NS 5159 EMP FIN NS Employment finance, insurance and real estate NS 5160 EMP NB Employment total NB 5161 EMP SERV NB Employment services NB 5162 EMP FOR OIL NB Employment forestry, fishing, mining, oil and gas NB 5163 EMP CONS NB Employment construction NB 5164 EMP SALES NB Employment sales (wholesale and retail trade) NB 5165 EMP FIN NB Employment finance, insurance and real estate NB 5166 EMP QC Employment total QC 5167 EMP SERV QC Employment services QC 5168 EMP FOR OIL QC Employment forestry, fishing, mining, oil and gas QC 5169 EMP CONS QC Employment construction QC 5170 EMP SALES QC Employment sales (wholesale and retail trade) QC 5171 EMP FIN QC Employment finance, insurance and real estate QC 5

35

Page 36: A Large Canadian Database for Macroeconomic Analysis · A Large Canadian Database for Macroeconomic Analysis Olivier Fortin-Gagnony Maxime Lerouxz Dalibor Stevanovicx St ephane Surprenant{

172 EMP ONT Employment total ONT 5173 EMP SERV ONT Employment services ONT 5174 EMP FOR OIL ONT Employment forestry, fishing, mining, oil and gas ONT 5175 EMP CONS ONT Employment construction ONT 5176 EMP SALES ONT Employment sales (wholesale and retail trade) ONT 5177 EMP FIN ONT Employment finance, insurance and real estate ONT 5178 EMP MAN Employment total MAN 5179 EMP SERV MAN Employment services MAN 5180 EMP FOR OIL MAN Employment forestry, fishing, mining, oil and gas MAN 5181 EMP CONS MAN Employment construction MAN 5182 EMP SALES MAN Employment sales (wholesale and retail trade) MAN 5183 EMP FIN MAN Employment finance, insurance and real estate MAN 5184 EMP SAS Employment total SAS 5185 EMP SERV SAS Employment services SAS 5186 EMP FOR OIL SAS Employment forestry, fishing, mining, oil and gas SAS 5187 EMP CONS SAS Employment construction SAS 5188 EMP SALES SAS Employment sales (wholesale and retail trade) SAS 5189 EMP FIN SAS Employment finance, insurance and real estate SAS 5190 EMP ALB Employment total ALB 5191 EMP SERV ALB Employment services ALB 5192 EMP FOR OIL ALB Employment forestry, fishing, mining, oil and gas ALB 5193 EMP CONS ALB Employment construction ALB 5194 EMP SALES ALB Employment sales (wholesale and retail trade) ALB 5195 EMP FIN ALB Employment finance, insurance and real estate ALB 5196 EMP BC Employment total BC 5197 EMP SERV BC Employment services BC 5198 EMP FOR OIL BC Employment forestry, fishing, mining, oil and gas BC 5199 EMP CONS BC Employment construction BC 5200 EMP SALES BC Employment sales (wholesale and retail trade) BC 5201 EMP FIN BC Employment finance, insurance and real estate BC 5202 UNEMP NF Unemployment rate NF 2203 UNEMP PEI Unemployment rate PEI 2204 UNEMP NS Unemployment rate NS 2205 UNEMP NB Unemployment rate NB 2206 UNEMP QC Unemployment rate QC 2207 UNEMP ONT Unemployment rate ONT 2208 UNEMP MAN Unemployment rate MAN 2209 UNEMP SAS Unemployment rate SAS 2210 UNEMP ALB Unemployment rate ALB 2211 UNEMP BC Unemployment rate BC 2212 UNEMP DURAvg NF Unemployment average duration NF 5*213 UNEMP DURAvg PEI Unemployment average duration PEI 5*214 UNEMP DURAvg NS Unemployment average duration NS 5*215 UNEMP DURAvg NB Unemployment average duration NB 5*216 UNEMP DURAvg QC Unemployment average duration QC 5*217 UNEMP DURAvg ONT Unemployment average duration ONT 5*218 UNEMP DURAvg MAN Unemployment average duration MAN 5*219 UNEMP DURAvg SAS Unemployment average duration SAS 5*220 UNEMP DURAvg ALB Unemployment average duration ALB 5*221 UNEMP DURAvg BC Unemployment average duration BC 5*

MANUFACTURING, SALES AND INVENTORIES222 MANU NF new Manufacturing new orders (total) NF 5223 DUR NF new Manufacturing new orders (durables) NF 5224 MANU PEI new Manufacturing new orders (total) PEI 5225 DUR PEI new Manufacturing new orders (durables) PEI 5226 MANU NS new Manufacturing new orders (total) NS 5227 DUR NS new Manufacturing new orders (durables) NS 5228 MANU NB new Manufacturing new orders (total) NB 5229 DUR NB new Manufacturing new orders (durables) NB 5230 MANU QC new Manufacturing new orders (total) QC 5231 DUR QC new Manufacturing new orders (durables) QC 5232 MANU ONT new Manufacturing new orders (total) ONT 5233 DUR ONT new Manufacturing new orders (durables) ONT 5234 MANU MAN new Manufacturing new orders (total) MAN 5235 DUR MAN new Manufacturing new orders (durables) MAN 5236 MANU SAS new Manufacturing new orders (total) SAS 5237 DUR SAS new Manufacturing new orders (durables) SAS 5238 MANU ALB new Manufacturing new orders (total) ALB 5239 DUR ALB new Manufacturing new orders (durables) ALB 5240 MANU BC new Manufacturing new orders (total) BC 5241 DUR BC new Manufacturing new orders (durables) BC 5

PRICES242 CPI ALL NF CPI (all) NF 5243 CPI MINUS FEN NF CPI (all minus food and energy) NF 5244 CPI ALL PEI CPI (all) PEI 5245 CPI MINUS FEN PEI CPI (all minus food and energy) PEI 5246 CPI ALL NS CPI (all) NS 5247 CPI MINUS FEN NS CPI (all minus food and energy) NS 5248 CPI ALL NB CPI (all) NB 5249 CPI MINUS FEN NB CPI (all minus food and energy) NB 5250 CPI ALL QC CPI (all) QC 5251 CPI MINUS FEN QC CPI (all minus food and energy) QC 5252 CPI ALL ONT CPI (all) ONT 5253 CPI MINUS FEN ONT CPI (all minus food and energy) ONT 5254 CPI ALL MAN CPI (all) MAN 5255 CPI MINUS FEN MAN CPI (all minus food and energy) MAN 5256 CPI ALL SAS CPI (all) SAS 5257 CPI MINUS FEN SAS CPI (all minus food and energy) SAS 5258 CPI ALL ALB CPI (all) ALB 5259 CPI MINUS FEN ALB CPI (all minus food and energy) ALB 5260 CPI ALL BC CPI (all) BC 5261 CPI MINUS FEN BC CPI (all minus food and energy) BC 5262 CPI GOOD NF CPI (goods) NF 5263 CPI SERV NF CPI (services) NF 5

36

Page 37: A Large Canadian Database for Macroeconomic Analysis · A Large Canadian Database for Macroeconomic Analysis Olivier Fortin-Gagnony Maxime Lerouxz Dalibor Stevanovicx St ephane Surprenant{

264 CPI GOOD PEI CPI (goods) PEI 5265 CPI SERV PEI CPI (services) PEI 5266 CPI GOOD NS CPI (goods) NS 5267 CPI SERV NS CPI (services) NS 5268 CPI GOOD NB CPI (goods) NB 5269 CPI SERV NB CPI (services) NB 5270 CPI GOOD QC CPI (goods) QC 5271 CPI SERV QC CPI (services) QC 5272 CPI GOOD ONT CPI (goods) ONT 5273 CPI SERV ONT CPI (services) ONT 5274 CPI GOOD MAN CPI (goods) MAN 5275 CPI SERV MAN CPI (services) MAN 5276 CPI GOOD SAS CPI (goods) SAS 5277 CPI SERV SAS CPI (services) SAS 5278 CPI GOOD ALB CPI (goods) ALB 5279 CPI SERV ALB CPI (services) ALB 5280 CPI GOOD BC CPI (goods) BC 5281 CPI SERV BC CPI (services) BC 5

37