Deep Learning for Macroeconomists
Jesus Fernandez-Villaverde (University of Pennsylvania)
June 10, 2022
Background
• Presentation based on joint work with different coauthors:
1. Solving High-Dimensional Dynamic Programming Problems using Deep Learning, with Galo Nuno,
Roberto Rafael Maura, George Sorg-Langhans, and Maximilian Vogler.
2. Exploiting Symmetry in High-Dimensional Dynamic Programming, with Mahdi Ebrahimi Kahou, Jesse
Perla, and Arnav Sood.
3. Financial Frictions and the Wealth Distribution, with Galo Nuno and Samuel Hurtado.
4. Programming FPGAs for Economics, with Bhagath Cheela, Andre DeHon, and Alessandro Peri.
5. Structural Estimation of Dynamic Equilibrium Models with Unstructured Data, with Sara Casella and
Stephen Hansen.
• All the papers share a common thread: how to compute and take to the data the aggregate dynamics
of models with heterogeneous agents.
Resources
• These slides are available at:
https://www.sas.upenn.edu/~jesusfv/deep-learning_chicago.pdf
• My teaching slides:
https://www.sas.upenn.edu/~jesusfv/teaching.html
• Examples and code at:
1. https://colab.research.google.com/drive/1_4wL6lqA-BsgGWjDZv4J7UtnuZk6TpqW?usp=sharing
2. https://github.com/jesusfv/financial-frictions
Deep learning as a tool for solving models
• Solving models in macro (and I.O., international trade, finance, game theory, ...) is, at its core, a
functional approximation problem.
• Given some states x = {x1, x2, ..., xN}, we want to compute a function (or, more generally, an
operator):
y = h(x)
such as a value function, a policy function, a best response function, a pricing kernel, an allocation, a
probability distribution, ...
• We need to find a function that satisfies some optimality/equilibrium conditions.
• Usually, we know little about the functional form of h(·), or x is high-dimensional.
• Deep learning provides an extremely powerful framework to work through these problems.
Why is deep learning a great functional approximation tool?
• Theory reasons:
1. Deep learning is a universal nonlinear approximator. For example, it can tackle correspondences (e.g.,
multiplicity of equilibria).
2. Deep learning breaks the “curse of dimensionality” (compositional approximations that require only
“local” computations instead of additive ones).
• Practical reasons ⇒ deep learning involves algorithms that are:
1. Easy to code.
2. Implementable with state-of-the-art libraries.
3. Stable.
4. Scalable through massive parallelization.
5. Able to take advantage of dedicated hardware.
A basic example
• Take the canonical RBC model:
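$$\max \; E_0 \sum_{t=0}^{\infty} \beta^t u(c_t, l_t)$$
$$c_t + k_{t+1} = e^{z_t} k_t^{\alpha} l_t^{1-\alpha} + (1-\delta) k_t, \quad \forall\, t > 0$$
$$z_t = \rho z_{t-1} + \sigma \varepsilon_t, \quad \varepsilon_t \sim N(0, 1)$$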
• Examples of objects we are interested in approximating:
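1. Decision rules: $c_t = h(k_t, z_t)$.
2. Conditional expectations: $h(k_t, z_t) = E_t\left\{ \frac{u'(c_{t+1}, l_{t+1})}{u'(c_t, l_t)} \left(1 + \alpha e^{z_{t+1}} k_{t+1}^{\alpha-1} l_{t+1}^{1-\alpha} - \delta\right) \right\}$.
3. Value functions: $h(k_t, z_t) = \max_{\{c_t, l_t\}} \left\{ u(c_t, l_t) + \beta E_t h(k_{t+1}, z_{t+1}) \right\}$.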
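As a sanity check on the second object, the sketch below evaluates the conditional expectation in a special case with a known closed form. The special case (log utility, full depreciation δ = 1, labor fixed at l = 1) and every parameter value are assumptions made only for illustration; they are not part of the model above. Under them the decision rule is c_t = (1 − αβ) e^{z_t} k_t^α, and the Euler equation implies that the conditional expectation equals 1/β at every state.

```python
import math
import random

# Illustrative parameters (assumptions, not taken from the slides).
ALPHA, BETA, RHO, SIGMA = 0.33, 0.96, 0.9, 0.02

def consumption(k, z):
    """Closed-form decision rule for log utility, delta = 1, l = 1."""
    return (1.0 - ALPHA * BETA) * math.exp(z) * k ** ALPHA

def next_capital(k, z):
    """Resource constraint with delta = 1 and l = 1: k' = output - c."""
    return math.exp(z) * k ** ALPHA - consumption(k, z)

def conditional_expectation(k, z, n_draws=10_000, seed=0):
    """Monte Carlo estimate of E_t[u'(c') / u'(c) * (1 + MPK' - delta)]."""
    rng = random.Random(seed)
    c, k_next = consumption(k, z), next_capital(k, z)
    total = 0.0
    for _ in range(n_draws):
        z_next = RHO * z + SIGMA * rng.gauss(0.0, 1.0)
        c_next = consumption(k_next, z_next)
        # With delta = 1 the gross return 1 + MPK - delta reduces to MPK.
        gross_return = ALPHA * math.exp(z_next) * k_next ** (ALPHA - 1.0)
        total += (c / c_next) * gross_return  # u'(c) = 1/c for log utility
    return total / n_draws

h_value = conditional_expectation(k=0.2, z=0.01)
```

In this special case every draw contributes exactly 1/β, so the estimate carries no Monte Carlo noise; with a general utility function or partial depreciation no closed form exists, which is where an approximation of h becomes necessary.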
A parameterized solution
• General idea: substitute h(x) by h^j(x, θ), where θ is a vector of coefficients to be determined by
satisfying some criterion indexed by j.
• Two classical approaches based on the addition of functions:
1. Perturbation methods:
$$h^{PE}(x, \theta) = \theta_0 + \theta_1 (x - x_0) + (x - x_0)' \theta_2 (x - x_0) + \text{H.O.T.}$$
We use implicit-function theorems to find θ.
2. Projection methods:
$$h^{PR}(x, \theta) = \theta_0 + \sum_{m=1}^{M} \theta_m \phi_m(x)$$
where ϕ_m is, for example, a Chebyshev polynomial.
We pick a basis $\{\phi_m(x)\}_{m=0}^{\infty}$ and “project” the optimality/equilibrium conditions against that basis to
find θ.
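A minimal sketch of the projection idea, under a simplifying assumption: instead of projecting equilibrium conditions, it fits the coefficients θ to a known stand-in function by collocation at Chebyshev nodes. The `target` function and the choice M = 8 are hypothetical and serve only to show the mechanics of the basis expansion.

```python
import numpy as np
from numpy.polynomial import chebyshev as cheb

def target(x):
    """Stand-in for the unknown h(x); chosen only for illustration."""
    return np.exp(x) * np.sin(2.0 * x)

M = 8  # highest polynomial degree, so there are M + 1 coefficients
# Collocation points: the M + 1 roots of the Chebyshev polynomial of degree M + 1.
nodes = np.cos((2.0 * np.arange(1, M + 2) - 1.0) * np.pi / (2.0 * (M + 1)))
# With M + 1 points and degree M, the least-squares fit interpolates exactly.
theta = cheb.chebfit(nodes, target(nodes), deg=M)  # theta_0, ..., theta_M

# Assess the approximation error on a fine grid over [-1, 1].
grid = np.linspace(-1.0, 1.0, 201)
max_error = np.max(np.abs(cheb.chebval(grid, theta) - target(grid)))
```

In an actual projection method the residual of the optimality/equilibrium conditions, rather than the distance to a known function, would be set to zero at the collocation points.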
A neural network
• A (one-layer) neural network approximates h(x) by using M times an activation function φ(·):
$$y = h(x) \cong h^{NN}(x; \theta) = \theta_0 + \sum_{m=1}^{M} \theta_m \varphi\Big( \underbrace{\theta_{0,m} + \sum_{n=1}^{N} \theta_{n,m} x_n}_{z_m} \Big)$$
• M is the width of the network.
• We can add more layers (i.e., $x_n$ is transformed into $x_n^1$ by a similar composition of an activation
function, and so on), but notation becomes heavy.
• The number of layers J is the depth of the network.
• We select θ such that hNN (x; θ) is as close to h(x) as possible given some relevant metric (e.g., L2).
• This is called “training” the network (a lot of details need to be filled in here!).
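To make the formula and the notion of training concrete, here is a minimal sketch of a one-layer network fitted by plain gradient descent under an L2 metric. The tanh activation, the width M = 16, the learning rate, and the stand-in target function are all assumptions made for illustration; the parameter names in the `theta` dictionary are hypothetical.

```python
import numpy as np

def h_nn(x, theta):
    """One-layer network: theta_0 + sum_m theta_m * phi(z_m), with phi = tanh."""
    z = theta["b_hidden"] + x @ theta["w_hidden"]        # (batch, M): the z_m terms
    return theta["b_out"] + np.tanh(z) @ theta["w_out"]  # (batch,)

def loss_and_grads(x, y, theta):
    """Mean squared error and its gradient with respect to every weight."""
    a = np.tanh(theta["b_hidden"] + x @ theta["w_hidden"])
    err = theta["b_out"] + a @ theta["w_out"] - y
    # Chain rule through the activation: d tanh(z) / dz = 1 - tanh(z)^2.
    d_z = (2.0 * err[:, None] / len(y)) * theta["w_out"] * (1.0 - a ** 2)
    grads = {
        "b_out": 2.0 * err.mean(),
        "w_out": 2.0 * a.T @ err / len(y),
        "b_hidden": d_z.sum(axis=0),
        "w_hidden": x.T @ d_z,
    }
    return np.mean(err ** 2), grads

rng = np.random.default_rng(0)
N, M = 2, 16  # input dimension and width (illustrative values)
theta = {
    "w_hidden": rng.normal(scale=0.5, size=(N, M)),
    "b_hidden": np.zeros(M),
    "w_out": rng.normal(scale=0.5, size=M),
    "b_out": 0.0,
}
x = rng.uniform(-1.0, 1.0, size=(256, N))
y = np.sin(x[:, 0]) + x[:, 1] ** 2  # stand-in for the unknown h(x)

initial_loss, _ = loss_and_grads(x, y, theta)
for _ in range(2000):  # plain full-batch gradient descent
    _, grads = loss_and_grads(x, y, theta)
    for name in theta:
        theta[name] = theta[name] - 0.05 * grads[name]
final_loss, _ = loss_and_grads(x, y, theta)
```

In practice one would rely on a deep learning library with automatic differentiation rather than hand-coded gradients; the manual version is shown only to keep every step visible.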
Architecture of the network
• We pick the simplest activation function possible:
• Easier to take derivatives (key while training the network).
• We select M and J following the same ideas that one uses to select how many grid points to use in
value function iteration or how many polynomials to use in a Chebyshev approximation:
1. We start with some default M and J and train the network. We assess approximation error.
2. We vary M and J until the approximation error is minimized.
Two classic (yet remarkable) results
Universal approximation theorem: Hornik, Stinchcombe, and White (1989)
A neural network with at least one hidden layer can approximate any Borel measurable function mapping
finite-dimensional spaces to any desired degree of accuracy.
• Under some additional technical conditions:
Breaking the curse of dimensionality: Barron (1993)
A one-layer neural network achieves integrated square errors of order $O(1/M)$, where M is the number
of nodes. In comparison, for series approximations, the integrated square error is of order $O(1/M^{2/N})$,
where N is the dimension of the function to be approximated.
• We can rely on more general theorems by Leshno et al. (1993) and Bach (2017).
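A short numerical illustration of the two rates, ignoring the constants hidden in the O(·) notation (so the numbers show orders of magnitude only). The value M = 1000 and the list of dimensions are arbitrary choices for illustration.

```python
def nn_error_rate(m):
    """Order of the integrated squared error for a one-layer network with m nodes."""
    return 1.0 / m

def series_error_rate(m, n):
    """Order of the integrated squared error for a series with m terms in n dimensions."""
    return 1.0 / m ** (2.0 / n)

M = 1000
# Map each dimension N to the pair (network rate, series rate).
rates = {n: (nn_error_rate(M), series_error_rate(M, n)) for n in (1, 2, 10, 100)}
```

The network rate does not depend on N, while the series rate deteriorates as N grows: the two coincide at N = 2, and for larger N the series bound approaches 1, meaning that adding terms barely helps.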
Practical reasons
• Deep learning involves algorithms that are:
1. Easy to code.
2. Implementable with state-of-the-art libraries.
3. Stable.
4. Scalable through massive parallelization.
5. Able to take advantage of dedicated hardware.
Programming field-programmable gate arrays for economics
Dynamic programming
Solving high-dimensional dynamic programming problems using Deep Learning
• Solving High-Dimensional Dynamic Programming Problems using Deep Learning.
• Our goal is to solve the recursive continuous-time Hamilton-Jacobi-Bellman (HJB) equation globally:
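$$\rho V(x) = \max_{\alpha} \; r(x, \alpha) + \nabla_x V(x) f(x, \alpha) + \frac{1}{2} tr\left(\sigma(x)^T \Delta_x V(x) \sigma(x)\right)$$
$$\text{s.t. } G(x, \alpha) \leq 0 \text{ and } H(x, \alpha) = 0.$$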
• Think about the case where we have many state variables.
• Why continuous time?
• Alternatives for this solution?
Neural networks
• We define four neural networks:
1. V (x;ΘV ) to approximate the value function V (x).
2. α(x;Θα) to approximate the policy function α.
3. µ(x;Θµ) and λ(x;Θλ) to approximate the Karush-Kuhn-Tucker (KKT) multipliers µ and λ.
• To simplify notation, we accumulate all weights in the matrix Θ = (ΘV ,Θα,Θµ,Θλ).
Error criterion I
• The HJB error:
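$$err_{HJB}(x; \Theta) \equiv r(x, \alpha(x; \Theta^\alpha)) + \nabla_x V(x; \Theta^V) f(x, \alpha(x; \Theta^\alpha)) + \frac{1}{2} tr\left[\sigma(x)^T \Delta_x V(x; \Theta^V) \sigma(x)\right] - \rho V(x; \Theta^V)$$
• The policy function error:
$$err_{\alpha}(x; \Theta) \equiv \frac{\partial r(x, \alpha(x; \Theta^\alpha))}{\partial \alpha} + D_\alpha f(x, \alpha(x; \Theta^\alpha))^T \nabla_x V(x; \Theta^V) - D_\alpha G(x, \alpha(x; \Theta^\alpha))^T \mu(x; \Theta^\mu) - D_\alpha H(x, \alpha(x; \Theta^\alpha))^T \lambda(x; \Theta^\lambda),$$
where $D_\alpha G \in R^{L_1 \times M}$, $D_\alpha H \in R^{L_2 \times M}$, and $D_\alpha f \in R^{N \times M}$ are the submatrices of the Jacobian
matrices of G, H, and f, respectively, containing the derivatives with respect to α.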
Error criterion II
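• The constraint error is itself composed of the primal feasibility errors:
$$err_{PF_1}(x; \Theta) \equiv \max\{0, G(x, \alpha(x; \Theta^\alpha))\}$$
$$err_{PF_2}(x; \Theta) \equiv H(x, \alpha(x; \Theta^\alpha)),$$
the dual feasibility error:
$$err_{DF}(x; \Theta) = \max\{0, -\mu(x; \Theta^\mu)\},$$
and the complementary slackness error:
$$err_{CS}(x; \Theta) = \mu(x; \Theta^\mu)^T G(x, \alpha(x; \Theta^\alpha)).$$
• We combine these four errors by using the squared error as our loss criterion:
$$E(x; \Theta) \equiv \left\| err_{HJB}(x; \Theta) \right\|_2^2 + \left\| err_{\alpha}(x; \Theta) \right\|_2^2 + \left\| err_{PF_1}(x; \Theta) \right\|_2^2 + \left\| err_{PF_2}(x; \Theta) \right\|_2^2 + \left\| err_{DF}(x; \Theta) \right\|_2^2 + \left\| err_{CS}(x; \Theta) \right\|_2^2$$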
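The loss criterion can be sketched as a small function that takes the residuals at one state x and returns the sum of squared norms. The function name, the argument names, and the numerical inputs are hypothetical; in an actual implementation the residuals would be computed from the four networks.

```python
import numpy as np

def total_loss(err_hjb, err_alpha, g_val, h_val, mu):
    """Sum of squared L2 norms of the six error terms at one state x.

    err_hjb: scalar HJB residual; err_alpha: policy residual (length M);
    g_val: G(x, alpha) (length L1); h_val: H(x, alpha) (length L2);
    mu: multipliers on the inequality constraints (length L1).
    """
    err_pf1 = np.maximum(0.0, g_val)   # violated inequality constraints
    err_pf2 = h_val                    # violated equality constraints
    err_df = np.maximum(0.0, -mu)      # negative multipliers
    err_cs = mu @ g_val                # complementary slackness
    terms = (err_hjb, err_alpha, err_pf1, err_pf2, err_df, err_cs)
    return sum(float(np.sum(np.square(t))) for t in terms)

# A point where every condition holds, and one where several are violated.
feasible = total_loss(0.0, np.zeros(2), np.array([-1.0, 0.0]), np.zeros(1), np.array([0.0, 0.5]))
violated = total_loss(0.1, np.zeros(2), np.array([0.2, 0.0]), np.zeros(1), np.array([-0.3, 0.5]))
```

The loss is zero only when the HJB equation, the first-order condition, and all KKT conditions hold simultaneously, which is what makes it a usable training target.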
Training
• We train our neural networks by minimizing the error criterion through mini-batch gradient descent
over points drawn from the ergodic distribution of the state vector.
• We start by initializing our network weights, and we perform K learning steps called epochs.
• For each epoch, we draw I points from the state space by simulating from the ergodic distribution.
• Then, we randomly split this sample into B mini-batches of size S . For each mini-batch, we define
the mini-batch error, by averaging the loss function over the batch.
• Finally, we perform mini-batch gradient descent for all network weights, with ηk being the learning
rate in the k-th epoch.
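The training loop described above can be sketched on a toy problem. Everything specific here is an assumption for illustration: the AR(1) state process standing in for the ergodic distribution, the two-parameter linear "network", the residual used as a stand-in error criterion, and the learning-rate schedule.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_states(n_points, rho=0.9, sigma=0.1, burn_in=200):
    """Draw states from the ergodic distribution of an AR(1) (a stand-in state process)."""
    x, draws = 0.0, []
    for t in range(burn_in + n_points):
        x = rho * x + sigma * rng.normal()
        if t >= burn_in:
            draws.append(x)
    return np.array(draws)

def batch_loss_and_grad(theta, x):
    """Mini-batch error: the loss averaged over the batch, plus its gradient."""
    resid = theta[0] + theta[1] * x - (1.0 + 2.0 * x)  # stand-in for the error criterion
    grad = np.array([2.0 * resid.mean(), 2.0 * (resid * x).mean()])
    return np.mean(resid ** 2), grad

K, I, S = 200, 512, 64   # epochs, points per epoch, mini-batch size
B = I // S               # number of mini-batches
theta = np.zeros(2)      # initial weights
for k in range(K):
    eta_k = 0.5 / (1.0 + 0.01 * k)             # learning rate in epoch k
    sample = rng.permutation(simulate_states(I))
    for batch in np.split(sample, B):          # B mini-batches of size S
        _, grad = batch_loss_and_grad(theta, batch)
        theta = theta - eta_k * grad
```

Drawing a fresh sample in every epoch means the weights are never fitted to one fixed grid, which is the main difference from collocation on predetermined nodes.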
[Figure (a): Value with closed-form policy]
[Figure (c): Consumption with closed-form policy]
[Figure (e): HJB error with closed-form policy]
A coda
• Exploiting Symmetry in High-Dimensional Dynamic Programming.
• We introduce the idea of permutation-invariant dynamic programming.
• Intuition.
• Solution has a symmetry structure we can easily exploit using representation theorems, concentration
of measure, and neural networks.
• More generally: how do we tackle models with heterogeneous agents?
Models with heterogeneous agents
The challenge
• To compute and take to the data models with heterogeneous agents, we need to deal with:
1. The distribution of agents Gt .
2. The operator H(·) that characterizes how Gt evolves:
$$G_{t+1} = H(G_t, S_t) \quad \text{or} \quad \frac{\partial G_t}{\partial t} = H(G_t, S_t)$$
given the other aggregate states of the economy St .
• How do we track Gt and compute H(Gt ,St)?
A common approach
• If we are dealing with N discrete types, we keep track of N − 1 weights.
• If we are dealing with continuous types, we extract a finite number of features from Gt :
1. Moments.
2. Q-quantiles.
3. Weights in a mixture of normals...
• We stack either the weights or features of the distribution in a vector µt .
• We assume µt follows the operator h(µt ,St) instead of H(Gt ,St).
• We parametrize h(µt ,St) as hj(µt ,St ; θ).
• We determine the unknown coefficients θ such that an economy where µt follows hj(µt ,St ; θ)
replicates as well as possible the behavior of an economy where Gt follows H(·).
Example: Basic Krusell-Smith model
• Two aggregate variables: aggregate productivity shock and household distribution G_t(a, z), where:
$$\int G_t(a, z)\, da = K_t$$
• We summarize G_t(a, ·) with the log of its mean: µ_t = log K_t (extending to higher moments is simple,
but tedious).
• We parametrize:
$$\underbrace{\log K_{t+1}}_{\mu_{t+1}} = \underbrace{\theta_0(s_t) + \theta_1(s_t) \log K_t}_{h^j(\mu_t, s_t; \theta)}$$
• We determine {θ0(st), θ1(st)} by OLS run on a simulation.
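The OLS step can be sketched as follows. In the actual method the series for log K_t comes from simulating the model; here, as a labeled simplification, the series is generated synthetically from hypothetical "true" coefficients so that the regression has something to recover. The coefficient values, the transition probability, and the noise scale are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical (theta_0, theta_1) for each aggregate state s_t.
true_theta = {0: (0.09, 0.96), 1: (0.10, 0.96)}

# Simulate the aggregate state (two-state Markov chain) and log capital.
T = 5000
s = np.zeros(T, dtype=int)
log_k = np.zeros(T + 1)
log_k[0] = 2.4
for t in range(T):
    if t > 0:
        s[t] = s[t - 1] if rng.uniform() < 0.875 else 1 - s[t - 1]
    th0, th1 = true_theta[s[t]]
    log_k[t + 1] = th0 + th1 * log_k[t] + 0.001 * rng.normal()

# One OLS regression of log K_{t+1} on (1, log K_t) per aggregate state.
theta_hat = {}
for state in (0, 1):
    mask = s == state
    X = np.column_stack([np.ones(mask.sum()), log_k[:-1][mask]])
    theta_hat[state], *_ = np.linalg.lstsq(X, log_k[1:][mask], rcond=None)
```

Each entry of `theta_hat` holds the estimated pair (θ0(s), θ1(s)) for one aggregate state.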
Problems
• Not much guidance regarding feature and parameterization selection in general cases.
• Yes, keeping track of the log of the mean and a linear functional form work well for the basic model.
But what about an arbitrary model?
• Method suffers from “curse of dimensionality”: difficult to implement with many state variables or high
N/higher moments.
• Lack of theoretical foundations (Does it converge? Under which metric?).
How can deep learning help?
• Deep learning addresses challenges:
1. How to extract features from an infinite-dimensional object efficiently.
2. How to parametrize the non-linear operator mapping how distributions evolve.
3. How to tackle the “curse of dimensionality.”
• Given time limitations, today I will discuss the last two points.
• In our notation of y = h(x):
1. y = µt+1.
2. x = (µt , St).
The algorithm in “Financial Frictions and the Wealth Distribution”
• Model with a distribution of households over asset holding and labor productivity.
• We keep track, as in Krusell-Smith, of moments of distribution.
• Algorithm:
1) Start with h0, an initial guess for h.
2) Using the current guess hn, solve the agent’s problem.
3) Construct time series x = {x1, x2, ..., xJ} and y = {y1, y2, ..., yJ} for aggregate variables by simulating
the cross-sectional distribution of agents for J periods (starting at a steady state and with a burn-in).
4) Use (y, x) to train hn+1, a new guess for h.
5) Iterate steps 2)-4) until hn+1 is sufficiently close to hn.
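The structure of the iteration can be sketched with a toy economy. This is only a skeleton under strong assumptions: `simulate_economy` is a hypothetical stand-in for steps 2) and 3), in which the realized law of motion is an assumed linear function of the perceived one, and OLS replaces the neural network in step 4). None of the numbers come from the paper.

```python
import numpy as np

def simulate_economy(h, n_periods=5000, burn_in=500, seed=0):
    """Hypothetical stand-in for steps 2) and 3).

    The real algorithm would solve the agent's problem given the perceived
    law of motion h and simulate the cross-sectional distribution. Here the
    realized law of motion is simply an assumed function of h.
    """
    rng = np.random.default_rng(seed)          # same shocks in every iteration
    a, b = 0.2 + 0.3 * h[0], 0.5 + 0.2 * h[1]  # toy feedback from beliefs to outcomes
    mu = np.empty(n_periods + burn_in + 1)
    mu[0] = a / (1.0 - b)                      # start at the steady state
    for t in range(n_periods + burn_in):
        mu[t + 1] = a + b * mu[t] + 0.01 * rng.normal()
    return mu[burn_in:-1], mu[burn_in + 1:]    # x = mu_t, y = mu_{t+1}

def train(x, y):
    """Step 4): fit the new guess. OLS stands in for training a neural network."""
    X = np.column_stack([np.ones_like(x), x])
    return np.linalg.lstsq(X, y, rcond=None)[0]

h = np.array([0.0, 0.0])                       # step 1): initial guess h_0
for iteration in range(200):                   # step 5): iterate until convergence
    x, y = simulate_economy(h)
    h_new = train(x, y)
    if np.max(np.abs(h_new - h)) < 1e-8:
        break
    h = h_new
```

Reusing the same random shocks in every iteration makes the update a deterministic map of h, so the stopping rule is not disturbed by simulation noise.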
Structural estimation with unstructured data
New data
• Unstructured data: Newspaper articles, business reports, congressional speeches, FOMC meetings
transcripts, satellite data, ...
Motivation
• Unstructured data carries information on:
1. Current state of the economy (Thorsrud, 2017, Bybee et al., 2019).
2. Beliefs about current and future states of the economy.
• An example:
From the minutes of the FOMC meeting of September 17-18, 2019
Participants agreed that consumer spending was increasing at a strong pace. They also expected that, in
the period ahead, household spending would likely remain on a firm footing, supported by strong labor
market conditions, rising incomes, and accommodative financial conditions. [...]
Participants judged that trade uncertainty and global developments would continue to affect firms’
investment spending, and that this uncertainty was discouraging them from investing in their businesses.
[...]
Our goal
• Since:
1. This information might go over and above observable macro series (e.g., agents’ expectations and
sentiment).
2. It might also go further back in history, or be available for developing countries or in real time.
• How do we incorporate unstructured data in the estimation of structural models?
• Potential rewards:
1. Determine more accurately the latent structural states.
2. Reconcile agents’ behavior and macro time series.
3. Could change parameter values (medium-scale DSGE models are typically poorly identified).
Application and data
• Our application:
Text Data: Federal Open Market Committee (FOMC) meeting transcripts.
Model: New Keynesian dynamic stochastic general equilibrium (NK-DSGE) model.
• Our strategy:
Right now:
1. Latent Dirichlet Allocation (LDA) for dimensionality reduction ⇒ from words to topic shares.
2. Cast the linearized DSGE solution in a state-space form.
3. Use LDA output as additional observables in the measurement equation.
4. Estimation with Bayesian techniques.
Going forward:
Model the data generating process for text and macroeconomic data jointly.
Preliminary findings
1. Using FOMC data for estimation sharpens the likelihood.
2. Posterior distributions more concentrated.
3. Especially true for parameters related to the hidden states of the economy and to fiscal policy.
4. FOMC data carries extra information about fiscal policy and government intentions
Latent Dirichlet Allocation (LDA)
• How does it work?
• LDA is a Bayesian statistical model.
• Idea: (i) each document is described by a distribution of K (latent) topics and (ii) each topic is
described by a distribution of words.
• Use word co-occurrence patterns + priors to assign probabilities.
• Key output of the LDA dimensionality reduction: topic shares φt ⇒ amount of time a document spends
talking about each topic k.
• Why do we like it?
• Tracks well the attention people devote to different topics.
• Automated and easily scalable.
• Bayesian model natural to combine with structural models.
DSGE state space representation
Log-linearized DSGE model solution has the form of a generic state-space model:
• Transition equation:
$$\underbrace{s_{t+1}}_{\text{Structural States}} = \Phi_1(\theta) s_t + \Phi_\epsilon(\theta) \epsilon_t, \quad \epsilon_t \sim N(0, I)$$
• Measurement equation:
$$\underbrace{Y_t}_{\text{Macroeconomic Observables}} = \Psi_0(\theta) + \Psi_1(\theta) s_t$$
• θ is the vector that stacks all the structural parameters.
Topic dynamic factor model
• Allow the topic time series φt to depend on the model states:
$$\underbrace{\varphi_t}_{\text{Topic Shares}} = T_0 + T_1 \underbrace{s_t}_{\text{Structural States}} + \Sigma \underbrace{u_t}_{\text{Measurement Error}}, \quad u_t \sim N(0, I)$$
• Interpretable as a dynamic factor model in which the structure of the DSGE model is imposed on
the latent factors.
• Akin to Boivin and Giannoni (2006) and Kryshko (2011).
Augmented measurement equation
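• Augmented measurement equation:
$$\underbrace{\begin{pmatrix} Y_t \\ \varphi_t \end{pmatrix}}_{\text{New vector of observables}} = \begin{pmatrix} \Psi_0(\theta) \\ T_0 \end{pmatrix} + \begin{pmatrix} \Psi_1(\theta) \\ T_1 \end{pmatrix} s_t + \begin{pmatrix} 0_{4 \times 4} & 0_{4 \times k} \\ 0_{k \times 4} & \Sigma \end{pmatrix} u_t, \quad u_t \sim N(0, I)$$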
• If text data carries relevant information, its use should make the estimation more efficient.
• General approach: any numerical machine learning output and structural model (DSGE, IO, labor...)
will work.
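The stacking in the augmented measurement equation can be sketched directly in matrix form. All matrices below are placeholders filled with arbitrary numbers: in an application Φ1, Φϵ, Ψ0, and Ψ1 would come from the DSGE solution at a given θ, and T0, T1, and Σ would be estimated. The number of topics is set to 3 only to keep the example small.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_macro, k_topics = 4, 4, 3   # k_topics is small only for readability

# Placeholder system matrices (assumed values, not a solved model).
Phi1 = np.diag([0.9, 0.9, 0.9, 0.0])
Phi_eps = np.diag([0.1, 0.02, 0.02, 0.05])
Psi0 = np.zeros(n_macro)
Psi1 = rng.normal(size=(n_macro, n_states))

# Topic dynamic factor model parameters (also placeholders).
T0 = np.zeros(k_topics)
T1 = rng.normal(scale=0.4, size=(k_topics, n_states))
Sigma = np.diag([0.2, 0.2, 0.2])

# Augmented measurement equation: stack macro observables and topic shares.
intercept = np.concatenate([Psi0, T0])
loading = np.vstack([Psi1, T1])
noise = np.block([
    [np.zeros((n_macro, n_macro)), np.zeros((n_macro, k_topics))],
    [np.zeros((k_topics, n_macro)), Sigma],
])

# Simulate the states and the augmented observables for a few periods.
s = np.zeros(n_states)
observables = []
for _ in range(50):
    s = Phi1 @ s + Phi_eps @ rng.normal(size=n_states)
    u = rng.normal(size=n_macro + k_topics)
    observables.append(intercept + loading @ s + noise @ u)
observables = np.array(observables)
```

Because the system stays linear and Gaussian, the same filtering machinery used for the model without text applies to the stacked observables.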
FOMC transcripts data
• Available for download from the Federal Reserve website.
• Provide a nearly complete account of every FOMC meeting from the mid-1970s onward.
• Transcripts are divided into two parts:
• FOMC1: members talk about their reading of the current economic situation.
• FOMC2: members talk about monetary policy strategy.
• We are interested in the information on the current state of the economy and the beliefs that
policymakers have about it ⇒ focus on FOMC1 section from 1987 to 2009.
• Total of 180 meetings.
Topic composition
• Example of two topics with K = 20.
[Figure: (a) Topic 8; (b) Topic 14]
Topic shares
[Figure: normalized topic share over time, 1990 to 2005, for Topic 14 "Inflation" and Topic 18 "Fiscal Policy"]
New Keynesian DSGE model
Conventional New Keynesian DSGE model with price rigidities:
Agents:
• Representative household.
• Perfectly competitive final good producer.
• Continuum of intermediate good producers with market power.
• Fiscal authority that taxes and spends.
• Monetary authority that sets the nominal interest rate.
States:
• 4 exogenous states: demand, productivity, government expenditure, monetary policy.
• Modelled as AR(1)s.
Parameters:
• 12 structural parameters to estimate.
Bayesian estimation
• We estimate two models for comparison:
• New Keynesian DSGE model alone (standard).
• New Keynesian DSGE Model + measurement equation with topic shares (new).
• Total parameters to estimate: 12 structural parameters (θ) + 120 topic dynamic factor model
parameters (T0, T1, Σ assuming covariances are 0).
• Pick standard priors on structural parameters.
• Priors on topic related parameters?
• Use MLE to get an idea of where they are.
• Use conservative approach: center all parameters quite tightly around 0.
• Random-walk Metropolis-Hastings to obtain draws from the posterior.
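A minimal sketch of a random-walk Metropolis-Hastings sampler for a scalar parameter. The `log_posterior` function is a stand-in (a normal kernel) chosen so the result is easy to check; in the application it would be the log prior plus the log likelihood of the state-space model. The step size, number of draws, and burn-in length are illustrative choices.

```python
import math
import random

def log_posterior(theta):
    """Stand-in log posterior kernel: a N(1, 0.5^2) density up to a constant."""
    return -0.5 * ((theta - 1.0) / 0.5) ** 2

def random_walk_metropolis(log_post, theta0, n_draws, step, seed=0):
    rng = random.Random(seed)
    theta, lp = theta0, log_post(theta0)
    draws, accepted = [], 0
    for _ in range(n_draws):
        proposal = theta + step * rng.gauss(0.0, 1.0)   # symmetric random-walk proposal
        lp_prop = log_post(proposal)
        # Accept with probability min(1, posterior ratio); 1 - random() lies in (0, 1].
        if math.log(1.0 - rng.random()) < lp_prop - lp:
            theta, lp, accepted = proposal, lp_prop, accepted + 1
        draws.append(theta)
    return draws, accepted / n_draws

draws, acceptance_rate = random_walk_metropolis(
    log_posterior, theta0=0.0, n_draws=20_000, step=1.0
)
kept = draws[2_000:]                      # discard a burn-in
posterior_mean = sum(kept) / len(kept)
```

With 132 parameters the proposal would be multivariate, and its covariance is typically tuned to reach a reasonable acceptance rate.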
Likelihood comparison
[Figure: likelihood comparison, one panel per structural parameter. Legend: No Topics (left), With Topics (right).]
Posterior distributions for structural parameters
[Figure: posterior distributions, one panel per structural parameter. Legend: No Topics, With Topics.]
Posterior distributions for selected topic parameters
[Figure: prior and posterior distributions for selected topic parameters. Panel titles: International Trade, Consumer Credit, Inflation, Fiscal Policy. Axis labels: Demand Shock, TFP, Govt Expenditure, Monetary Shock. Legend: Prior, Prior Mean, Posterior, Posterior Mean.]
Going forward: Joint model
• Extend the model in two ways:
1. Model the dependence of latent topics on the hidden structural states directly.
2. Allow for autocorrelation among topic shares.
• Instead of first creating φt and then using it for estimation, we want to model the generating process of
both text and macroeconomic observables together.
• Why a joint model?
• Conjecture the topic composition and topic share will be more precise and more interpretable as a result.
• Properly take into account the uncertainty around the topic shares.
Going forward: Joint model
• New vector of augmented states:
$$\tilde{s}_t = \begin{pmatrix} s_t \\ \varphi_{t-1} \end{pmatrix}$$
• New vector of observables:
$$\tilde{Y}_t = \begin{pmatrix} Y_t \\ w_t \end{pmatrix}$$
• Stacks both the “traditional” observables Y_t and the text w_t.
Joint state space representation
• Transition equation (linear):
$$s_{t+1} = \Phi_1(\theta) s_t + \Phi_\epsilon(\theta) \epsilon_t \quad \leftarrow \text{same as before}$$
$$\varphi_t = T_s s_t + T_\varphi \varphi_{t-1} + \Sigma_e e_t \quad \leftarrow \text{new}$$
• Measurement equation (non-linear):
$$\tilde{Y}_t \sim p(\tilde{Y}_t | \tilde{s}_t) = \begin{pmatrix} \Psi_1(\theta) s_t \\ p(w_t | s_t) \end{pmatrix}$$
• Challenges: algorithm for estimation, impact of choice of priors.
LDA details
LDA assumes the following generative process for each document W of length N:
1. Choose topic proportions φ ∼ Dir(ϑ). The Dirichlet distribution has dimensionality K; thus, the
dimensionality of the topic variable is assumed to be known and fixed.
2. For each word n = 1, ..., N:
2.1 Choose one of K topics z_n ∼ Multinomial(φ).
2.2 Choose a term w_n from p(w_n | z_n, β), a multinomial probability conditioned on the topic z_n. β is a K × V
matrix where $\beta_{i,j} = p(w^j = 1 | z^i = 1)$. We assign a symmetric Dirichlet prior with V dimensions and
hyperparameter η to each β_k.
Posterior:
$$p(\varphi, z, \beta | W, \vartheta, \eta) = \frac{p(\varphi, z, W, \beta | \vartheta, \eta)}{p(W | \vartheta, \eta)}$$
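The generative process can be sketched by sampling one synthetic document. The sizes K, V, and N and the hyperparameter values are arbitrary illustrative choices, and words are represented by integer indices into a hypothetical vocabulary.

```python
import numpy as np

rng = np.random.default_rng(0)
K, V, N = 3, 8, 50            # topics, vocabulary size, document length (illustrative)
vartheta, eta = 0.5, 0.5      # symmetric Dirichlet hyperparameters (illustrative)

# K x V matrix: each row is one topic's distribution over terms.
beta = rng.dirichlet(np.full(V, eta), size=K)
# Step 1: topic proportions for the document.
phi = rng.dirichlet(np.full(K, vartheta))

# Step 2.1: draw a topic for each word position.
z = rng.choice(K, size=N, p=phi)
# Step 2.2: draw a term for each word from its topic's distribution.
words = np.array([rng.choice(V, p=beta[topic]) for topic in z])

# Realized share of each topic in this document.
topic_share_in_document = np.bincount(z, minlength=K) / N
```

Estimation runs this logic in reverse: given only `words` for many documents, the posterior over φ, z, and β is approximated, typically by variational methods or Gibbs sampling.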
Model equilibrium equations (log-linearized)
$$c_t - d_t = E_t\{c_{t+1} - d_{t+1} - R_t + \Pi_{t+1}\}$$
$$\Pi_t = \kappa\left((1 + \phi c)\, c_t + \phi g\, g_t - (1 + \phi) A_t\right) + \beta E_t \Pi_{t+1}$$
$$R_t = \gamma \Pi_t + m_t$$
$$l_t = y_t - A_t = c\, c_t + g\, g_t - A_t$$
Exogenous States Processes
$$\log d_t = \rho_d \log d_{t-1} + \sigma_d \varepsilon_{d,t}$$
$$\log A_t = \rho_A \log A_{t-1} + \sigma_A \varepsilon_{A,t}$$
$$\log g_t = \rho_g \log g_{t-1} + \sigma_g \varepsilon_{g,t}$$
$$m_t = \sigma_m \varepsilon_{m,t}$$
Parameters Recap

Steady-state-related parameters
| Param | Description |
| --- | --- |
| β | Discount factor |
| g | SS govt expenditure/GDP |

Endogenous propagation parameters
| Param | Description |
| --- | --- |
| ϕ | Inverse Frisch elasticity |
| ζP | Fraction of fixed prices |
| γ | Taylor rule elasticity |

Exogenous shocks parameters
| Param | Description |
| --- | --- |
| ρD | Persistence demand shock |
| ρA | Persistence TFP |
| ρG | Persistence govt expenditure |
| σD | s.d. demand shock innovation |
| σA | s.d. TFP shock |
| σG | s.d. govt expenditure shock |
| σM | s.d. monetary shock |
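State space matrices
Law of motion for the states of the economy is:
$$\underbrace{\begin{pmatrix} d_{t+1} \\ A_{t+1} \\ g_{t+1} \\ m_{t+1} \end{pmatrix}}_{s_{t+1}} = \underbrace{\begin{pmatrix} \rho_d & 0 & 0 & 0 \\ 0 & \rho_A & 0 & 0 \\ 0 & 0 & \rho_g & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}}_{\Phi_1(\theta)} \underbrace{\begin{pmatrix} d_t \\ A_t \\ g_t \\ m_t \end{pmatrix}}_{s_t} + \underbrace{\begin{pmatrix} \sigma_d & 0 & 0 & 0 \\ 0 & \sigma_A & 0 & 0 \\ 0 & 0 & \sigma_g & 0 \\ 0 & 0 & 0 & \sigma_m \end{pmatrix}}_{\Phi_\epsilon(\theta)} \underbrace{\begin{pmatrix} \varepsilon_{d,t+1} \\ \varepsilon_{A,t+1} \\ \varepsilon_{g,t+1} \\ \varepsilon_{m,t+1} \end{pmatrix}}_{\epsilon_t}$$
and for observables:
$$\underbrace{\begin{pmatrix} \log c_t \\ \log \Pi_t \\ \log R_t \\ \log l_t \end{pmatrix}}_{Y_t} = \underbrace{\begin{pmatrix} \log(1 - g) \\ 0 \\ -\log \beta \\ 0 \end{pmatrix}}_{\Psi_0(\theta)} + \underbrace{\begin{pmatrix} a_1 & a_2 & a_3 & a_4 \\ b_1 & b_2 & b_3 & b_4 \\ \gamma b_1 & \gamma b_2 & \gamma b_3 & 1 + b_4 \\ c a_1 & c a_2 - 1 & 1 + c a_3 & c a_4 \end{pmatrix}}_{\Psi_1(\theta)} \underbrace{\begin{pmatrix} d_t \\ A_t \\ g_t \\ m_t \end{pmatrix}}_{s_t}$$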
Prior distribution I

Steady-state-related parameters
| Param | Description | Domain | Density | Param 1 | Param 2 |
| --- | --- | --- | --- | --- | --- |
| β | Discount factor | (0, 1) | Beta | 0.95 | 0.05 |
| g | SS govt expenditure/GDP | (0, 1) | Beta | 0.35 | 0.05 |

Endogenous propagation parameters
| Param | Description | Domain | Density | Param 1 | Param 2 |
| --- | --- | --- | --- | --- | --- |
| ϕ | Inverse Frisch elasticity | R+ | Gamma | 1 | 0.1 |
| ζP | Fraction of fixed prices | (0, 1) | Beta | 0.4 | 0.1 |
| γ | Taylor rule elasticity | R+ | Gamma | 1.5 | 0.25 |

Prior distribution II

Exogenous shocks parameters
| Param | Description | Domain | Density | Param 1 | Param 2 |
| --- | --- | --- | --- | --- | --- |
| ρD | Persistence demand shock | (0, 1) | Uniform | 0 | 1 |
| ρA | Persistence TFP | (0, 1) | Uniform | 0 | 1 |
| ρG | Persistence govt expenditure | (0, 1) | Uniform | 0 | 1 |
| σD | s.d. demand shock innovation | R+ | InvGamma | 0.05 | 0.2 |
| σA | s.d. TFP shock | R+ | InvGamma | 0.05 | 0.2 |
| σG | s.d. govt expenditure shock | R+ | InvGamma | 0.05 | 0.2 |
| σM | s.d. monetary shock | R+ | InvGamma | 0.05 | 0.2 |

Topic parameters
| Param | Description | Domain | Density | Param 1 | Param 2 |
| --- | --- | --- | --- | --- | --- |
| T0,k | Topic baseline | R | Normal | 0 | 0.1 |
| T1,k,s | Topic elasticity to states | R | Normal | 0 | 0.4 |
| σk | s.d. measurement error | R+ | InvGamma | 0.2 | 0.1 |
Topics ranked by pro-cyclicality
Details on p(wt |st)
We assume the following generative process for a document:
1. Draw $\varphi_t | \varphi_{t-1}, s_t, T_\varphi, T_s, \Sigma_e \sim N(\varphi_0 + T_\varphi \varphi_{t-1} + T_s s_t, \Sigma_e)$.
2. To map this representation of topic shares into the simplex, use the softmax function:
$$f(\varphi_t)_i = \frac{\exp \varphi_{t,i}}{\sum_j \exp \varphi_{t,j}}.$$
3. For each word n = 1, ..., N:
3.1 Draw topic $z_{n,t} \sim Mult(f(\varphi_t))$.
3.2 Draw term $w_{n,t} \sim Mult(\beta_{t,z})$.
Then, the distribution of text conditional on the states, p(wt |st), is given by:
$$p(w_t | s_t) = \int p(\varphi_t | s_t) \left( \prod_{n=1}^{N} \sum_{z_{n,t}} p(z_{n,t} | \varphi_t)\, p(w_{n,t} | z_{n,t}, \beta_t) \right) d\varphi_t$$
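Steps 1 and 2 can be sketched as follows. The dimensions and every parameter value (φ0, Tφ, Ts, Σe, the previous topic vector, and the state) are placeholders chosen for illustration. Subtracting the maximum inside the softmax is a standard numerical safeguard that leaves the result unchanged.

```python
import numpy as np

def softmax(phi):
    """Map an unrestricted vector into the simplex."""
    shifted = phi - np.max(phi)   # avoids overflow; the shares are unaffected
    weights = np.exp(shifted)
    return weights / weights.sum()

rng = np.random.default_rng(0)
K, n_states = 4, 2                          # illustrative sizes
phi0 = np.zeros(K)
T_phi = 0.5 * np.eye(K)
T_s = rng.normal(size=(K, n_states))
Sigma_e = 0.1 * np.eye(K)

phi_prev = np.zeros(K)
s_t = np.array([0.3, -0.1])
# Step 1: draw the unrestricted topic vector given its lag and the states.
phi_t = rng.multivariate_normal(phi0 + T_phi @ phi_prev + T_s @ s_t, Sigma_e)
# Step 2: map it into topic shares.
shares = softmax(phi_t)
```

The resulting `shares` vector then plays the role of the multinomial parameter in step 3.1.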
Conclusions
• Deep learning has tremendous potential for macroeconomics.
• Great theoretical and practical reasons for it.
• Many potential applications.