# DifferentiableStateSpaceModels

**Warning:** This package is a proof of concept. While the code will remain working with proper use of a Julia manifest, we can't guarantee ongoing support and maintenance, so you should be prepared to modify the source and help maintain it if using this code for projects.

For a more complete example with the code below and Bayesian estimation, see the rbc estimation notebook.

## Development and Benchmarking

See development.md for contributing code and running benchmarks

# Model Class

The model follows Schmitt-Grohe and Uribe (2004) timing convention. The system takes a nonlinear expectational difference equation including all first-order conditions for decisions and the system evolution equations,

where

In addition, we consider an observation equation - which might be noisy, for

where

Assume that there is a non-stochastic steady state of this problem as

## Perturbation Solution

Define the deviation from the non-stochastic steady state as

The solution finds the first or second order perturbation around that non-stochastic steady state, and yields

where

and with the policy equation,

and finally, substitution in for the observation equation

## First Order Solutions

Perturbation approximates the above model equations, where

and

and

where

This is a linear state-space model and if the priors and shocks are Gaussian, a marginal likelihood can be evaluated with classic methods such as a Kalman Filter. The output of the perturbation can be used manually, or in conjunction with DifferenceEquations.jl.

Second-order solutions are defined similarly. See the estimation notebook for more details.

## Gradients

All of the above use standard solution methods. The primary contribution of this package is that all of these model elements are **differentiable**. Hence, these gradeients can be composed for use in applications such as optimization, gradient-based estimation methods, and with DifferenceEquations.jl which provides differentiable simulations and likelihoods for state-space models. That is, if we think of a perturbation solver mapping

# Examples

## Model Primitives

Models are defined using a Dynare-style DSL using Symbolics.jl. The list of primitives are:

- The list of variables for the controls
$y$ , state$x$ , and deep parameters$p$ . - The set of equations
$H$ as a function of$p, y(t), y(t+1), x(t),$ and$x(t+1)$ . No$t-1$ timing is allowed. - The loading of shocks
$\eta$ as a fixed matrix of constants - The shock covariance Cholesky factor
$\Gamma$ as a function of parameters$p$ - The observation equation
$Q$ as a fixed matrix. - The Cholesky factor of the observation errors,
$\Omega$ as a function of parameters$p$ . At this point only a diagonal matrix is supported. - Either the steady state equations for all of
$y$ and$x$ in closed form as a function of$p$ , or initial conditions for the nonlinear solution to solve for the steady state as functions of$p$

## Defining Models

Install this package with `] add DifferentiableStateSpaceModels`

, then the full code to create the RBC model is

```
∞ = Inf
@variables α, β, ρ, δ, σ, Ω_1
@variables t::Integer, k(..), z(..), c(..), q(..)
x = [k, z] # states
y = [c, q] # controls
p = [α, β, ρ, δ, σ, Ω_1] # parameters
H = [1 / c(t) - (β / c(t + 1)) * (α * exp(z(t + 1)) * k(t + 1)^(α - 1) + (1 - δ)),
c(t) + k(t + 1) - (1 - δ) * k(t) - q(t),
q(t) - exp(z(t)) * k(t)^α,
z(t + 1) - ρ * z(t)] # system of model equations
# analytic solutions for the steady state. Could pass initial values and run solver and use initial values with steady_states_iv
steady_states = [k(∞) ~ (((1 / β) - 1 + δ) / α)^(1 / (α - 1)),
z(∞) ~ 0,
c(∞) ~ (((1 / β) - 1 + δ) / α)^(α / (α - 1)) -
δ * (((1 / β) - 1 + δ) / α)^(1 / (α - 1)),
q(∞) ~ (((1 / β) - 1 + δ) / α)^(α / (α - 1))]
Γ = [σ;;] # matrix for the 1 shock. The [;;] notation just makes it a matrix rather than vector in julia
η = [0; -1;;] # η is n_x * n_ϵ matrix. The [;; ]notation just makes it a matrix rather than vector in julia
# observation matrix. order is "y" then "x" variables, so [c,q,k,z] in this example
Q = [1.0 0 0 0; # select c as first "z" observable
0 0 1.0 0] # select k as second "z" observable
# diagonal cholesky of covariance matrix for observation noise (so these are standard deviations). Non-diagonal observation noise not currently supported
Ω = [Ω_1, Ω_1]
# Generates the files and includes if required. If the model is already created, then just loads
overwrite_model_cache = true
model_rbc = @make_and_include_perturbation_model("rbc_notebook_example", H, (; t, y, x, p, steady_states, Γ, Ω, η, Q, overwrite_model_cache))
```

After generation of the model, they can be included as any other julia files in your code (e.g. `include(joinpath(pkgdir(DifferentiableStateSpaceModels), ".function_cache","my_model.jl"))`

) or moved somewhere more convenient.

Inclusion through the `@make_and_include_perturbation_model`

creates the model automatically; after direct inclusion through a julia file, you can create a model with `m = PerturbationModel(Main.my_model)`

.

## Solving Perturbations

Assuming the above model was created and loaded in one way or another as `m`

```
p_f = (ρ = 0.2, δ = 0.02, σ = 0.01, Ω_1 = 0.01) # Fixed parameters
p_d = (α = 0.5, β = 0.95) # Pseudo-true values
sol = generate_perturbation(model_rbc, p_d, p_f) # Solution to the first-order RBC
sol_2 = generate_perturbation(model_rbc, p_d, p_f, Val(2)); # Solution to the second-order RBC
@show sol.retcode, sol_2.retcode, verify_steady_state(m, p_d, p_f) # the final call checks that the analytically provided steady-state solution is correct
```

The perturbation solution (in the canonical form described in the top section) can be queried from the resulting solution. A few examples for the first order solution are below,

```
@show sol.y, sol.x # steady states y_ss and x_ss These are the values such that y ≡ ŷ + sol.y and x ≡ x̂ + sol.x
@show sol.g_x # the policy
@show sol.A, sol.B # the evolution equation of the state, so that x̂' = A x̂ + B ϵ
@show sol.C, sol.D; # the evolution equation of the state, so that z = C x̂ + ν with variance of ν as D D'.
@show sol.x_ergodic_var; # covariance matrix of the ergodic distribution of x̂, which is mean zero since x̂ ≡ x - x_ss
```

## Functions of Perturbation Solutions (and their Derivatives)

The core feature of this library is to enable gradients of the perturbation solutions with respect to parameters (i.e., anything in the `p_d`

vector). To show this, we will construct a function which uses the resulting law of motion and finds the gradient of the results with respect to this value.

```
function IRF(p_d, ϵ_0; m, p_f, steps)
sol = generate_perturbation(m, p_d, p_f) # First-order perturbation by default, pass Val(2) as additional argument to do 2nd order.
x = sol.B * ϵ_0 # start after applying impulse with the model's shock η and Γ(p)
for _ in 1:steps
# A note: you cannot use mutating expressions here with most AD code. i.e. x .= sol.A * x wouldn't work
# For more elaborate simuluations, you would want to use DifferenceEquations.jl in practice
x = sol.A * x # iterate forward using first-order observation equation
end
return [0, 1]' * sol.C * x # choose the second observable using the model's C observation equation since first-order
end
m = model_rbc # ensure notebook executed above
p_f = (ρ=0.2, δ=0.02, σ=0.01, Ω_1=0.01) # not differentiated
p_d = (α=0.5, β=0.95) # different parameters
steps = 10 # steps ahead to forecast
ϵ_0 = [1.0] # shock size
IRF(p_d, ϵ_0; m, p_f, steps) # Function works on its own, calculating perturbation
```

## Derivatives of the Perturbation Solvers

The perturbation solver fills a cache for values used for calculating derivatives.

For example,

```
using Zygote
function f(params; m, p_f)
p_d = (α=params[1], β=params[2]) # Differentiated parameters
sol = generate_perturbation(m, p_d, p_f) # Default is first-order.
return sum(sol.A) # An ad-hoc example: reducing the law-of-motion matrix into one number
end
# To call it
m = PerturbationModel(Main.my_model)
p_f = (ρ=0.2, δ=0.02, σ=0.01, Ω_1=0.01)
param_val = [0.5, 0.95] # as a vector, but not required
display(f(param_val; m, p_f)) # Function works on its own, calculating perturbation
# Query the solution
@assert f(param_val; m, p_f) ≈ 7.366206154679124
# But you can also get its gradient with Zygote/etc.
display(gradient(params -> f(params; m, p_f), param_val))
# Result check
gradient(params -> f(params; m, p_f), param_val)[1] ≈ [61.41968376547458, 106.44095661062319]
```

However, the real benefit is that this function can itself be differentiated, to find gradients with respect to the deep parameters `p_d`

and the shock `ϵ_0`

```
# Using the Zygote auto-differentiation library already loaded above
p_d = (α=0.5, β=0.95) # different parameters
ϵ_0 = [1.0] # shock size
IRF_grad = gradient((p_d, ϵ_0) -> IRF(p_d, ϵ_0; m, p_f, steps), p_d, ϵ_0) # "closes" over the m, p_f, and steps to create a new function, and differentiates it with respect to other arguments
```

## Simulating Data with DifferenceEquations

## Simulating with DifferenceEquations.jl

The manual iteration of the state-space model from the perturbation solution is possible, but can be verbose and difficult to achieve efficiency for gradients. One benefit of this package is that it creates state-space models in a form consistent with DifferenceEquations.jl which can be easily simulated, visualized, and estimated.

To do this, we will calculate a perturbation solution then simulate it for various `x_0`

drawn from the ergodic solution.

```
p_f = (ρ = 0.2, δ = 0.02, σ = 0.01, Ω_1 = 0.01) # Fixed parameters
p_d = (α = 0.5, β = 0.95) # Pseudo-true values
m = model_rbc # ensure notebook executed above
sol = generate_perturbation(m, p_d, p_f) # Solution to the first-order RBC
# Simulate T observations from a random initial condition
T = 20
# draw from ergodic distribution for the initial condition
x_iv = MvNormal(sol.x_ergodic_var)
problem = LinearStateSpaceProblem(sol, x_iv, (0, T))
plot(solve(problem))
```

The `LinearStateSpaceProblem`

type is automatically constructed from the underlying perturbation. However, we can override any of these options, or pass in our own noise rather than simulate it for a particular experiment

```
noise = Matrix([1.0; zeros(T-1)]') # the ϵ shocks are "noise" in DifferenceEquations for SciML compatibility
x_iv = [0.0, 0.0] # can pass in a single value rather than a distribution
problem = LinearStateSpaceProblem(sol, x_iv, (0, T); noise)
plot(solve(problem))
```

To demonstrate the composition of gradients between DifferenceEquations and DifferentiableStateSpaceModels lets adapt this function to simulates an impulse with fixed noise and looks at the final observable

```
function last_observable(p_d, noise, x_iv; m, p_f, T)
sol = generate_perturbation(m, p_d, p_f)
problem = LinearStateSpaceProblem(sol, x_iv, (0, T);noise, observables_noise = nothing) # removing observation noise
return solve(problem).z[end][2] # return 2nd argument of last observable
end
T = 100
noise = Matrix([1.0; zeros(T-1)]') # the ϵ shocks are "noise" in DifferenceEquations for SciML compatibility
x_iv = [0.0, 0.0] # can pass in a single value rather than a distribution
p_f = (ρ = 0.2, δ = 0.02, σ = 0.01, Ω_1 = 0.01) # Fixed parameters
p_d = (α = 0.5, β = 0.95) # Pseudo-true values
m = model_rbc # ensure notebook executed above
last_observable(p_d, noise, x_iv; m, p_f, T)
```

And, as before, we can calculate gradients with respect to the underlying `p_d`

parameters, but also with respect to the noise which will demonstrate a key benefit of these methods, as they can let us do a joint likelihood of the latent variables in cases where they cannot be easily marginalized out (e.g., non-Gaussian or nonlinear). Note that the dimensionality of this gradient is over 100.

`gradient((p_d, noise, x_iv) -> last_observable(p_d, noise, x_iv; m, p_f, T), p_d, noise, x_iv)`

Finally, we can use a simple utility functions to investigate an IRF.

```
p_f = (ρ = 0.2, δ = 0.02, σ = 0.01, Ω_1 = 0.01) # Fixed parameters
p_d = (α = 0.5, β = 0.95) # Pseudo-true values
m = model_rbc
sol = generate_perturbation(model_rbc, p_d, p_f)
ϵ0 = [1.0]
T = 10
sim = irf(sol, ϵ0, T)
plot(sim)
```

## Sneak Peak at SciML Compatible Functionality

Finally, there are a variety of features of SciML which are supported. For example, parallel simulations of ensembles and associated summary statistics.

```
# Simulate multiple trajectories with T observations
trajectories = 40
x_iv = MvNormal(sol.x_ergodic_var)
problem = LinearStateSpaceProblem(sol, x_iv, (0, T))
# Solve multiple trajectories and plot an ensemble
ensemble_results = solve(EnsembleProblem(problem), DirectIteration(), EnsembleThreads();
trajectories)
summ = EnsembleSummary(ensemble_results) # see SciML documentation. Calculates median and other quantles automatically.
summ.med # median values for the "x" simulated ensembles
plot(summ, fillalpha= 0.2) # plots by default show the median and quantiles of both variables. Modifying transparency as an example
```

## Calculate sequence of observables

We can use the underlying state-space model to easily simulate states and observables

```
# Simulate T observations
T = 20
p_f = (ρ = 0.2, δ = 0.02, σ = 0.01, Ω_1 = 0.01) # Fixed parameters
p_d = (α = 0.5, β = 0.95) # Pseudo-true values
sol = generate_perturbation(model_rbc, p_d, p_f) # Solution to the first-order RBC
x_iv = MvNormal(sol.x_ergodic_var) # draw initial conditions from the ergodic distribution
problem = LinearStateSpaceProblem(sol, x_iv, (0, T))
sim = solve(problem, DirectIteration())
ϵ = sim.W # store the underlying noise in the simulation
# Collapse to simulated observables as a matrix - as required by current DifferenceEquations.jl likelihood
# see https://github.com/SciML/DifferenceEquations.jl/issues/55 for direct support of this datastructure
z_rbc = hcat(sim.z...)
```