ParameterHandling


ParameterHandling.jl is an experiment in handling constrained tunable parameters of models.

The Parameter Handling Problem

Consider the following common situation: you have a function build_model that maps a collection of parameters θ to a model of some kind:

model = build_model(θ)

The model might, for example, be a function that maps some input x to some sort of prediction y:

y = model(x)

where x and y could essentially be anything that you like. You might also wish to somehow "learn" or "tune" or "infer" the parameters θ by plugging build_model into some other function, let's call it learn, that tries out various parameter values in some clever way and determines which ones are good -- think loss minimisation / objective maximisation, (approximate) Bayesian inference, etc. We won't worry about exactly what procedure learn employs to try out different parameter values, but suppose that learn has the interface:

learned_θ = learn(build_model, initial_θ)

So far so good, but now consider how one actually goes about writing build_model. There are more or less two things that must be written:

  1. θ must be in a format that learn knows how to handle. A popular approach is to require that θ be a Vector of Real numbers -- or, rather, some concrete subtype of Real.
  2. The code required to turn θ into model inside build_model mustn't be too onerous to write, read, or modify.

While the first point is fairly straightforward, the second point is a bit subtle, so it's worth dwelling on it a little.

For the sake of concreteness, let's suppose that we adopt the convention that θ is a Vector{Float64}. In the case of linear regression, we might assume that θ comprises a length-D "weight vector" w and a scalar "bias" b. So the function to build the model might be something like

function build_model(θ::Vector{Float64})
    return x -> dot(θ[1:end-1], x) + θ[end]
end

The easiest way to see that this is a less than ideal solution is to consider what this function would look like if θ was, say, a NamedTuple with fields w and b:

function build_model(θ::NamedTuple)
    return x -> dot(θ.w, x) + θ.b
end

This version of the function is much easier to read -- moreover if you want to inspect the values of w and b at some other point in time, you don't need to know precisely how to chop up the vector.

The latter approach also seems less bug-prone -- suppose for some reason one refactored the code so that the first element of θ became b and the last D elements became w; any code that depended upon the original ordering would now be incorrect and would likely fail silently. The NamedTuple approach simply doesn't have this issue.

Granted, in this simple case it's not too much of a problem, but it's easy to find situations in which things become considerably more difficult. For example, suppose that we instead had pretty much any kind of neural network, Gaussian process, ODE, or really just any model with more than a couple of distinct parameters. From the perspective of writing complicated models, implementing things in terms of a single vector of parameters that is manually chopped up is an extremely bad design choice. It simply doesn't scale.

However, a single vector of e.g. Float64s is extremely convenient when writing general purpose optimisers / approximate inference routines -- Optim.jl and AdvancedHMC.jl being two obvious examples.

The ParameterHandling.jl Approach

ParameterHandling.jl aims to give you the best of both worlds by providing the tools required to automate the transformation between a "structured" representation (e.g. a nested NamedTuple, Dict, etc.) and a "flattened" representation (e.g. Vector{Float64}) of your model parameters.

The function flatten eats a structured representation of some parameters, returning the flattened representation and a function that converts the flattened thing back into its structured representation.

flatten is implemented recursively, with a very small number of base-implementations that don't themselves call flatten.
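For example, a minimal round trip (sketched here with made-up parameter values; the ordering of elements in the flat vector is an implementation detail) might look like:

using ParameterHandling: flatten

θ = (w = [1.0, 2.0, 3.0], b = 0.5)   # structured representation
v, unflatten = flatten(θ)            # v is a Vector{Float64} of length 4

θ_new = unflatten(v .+ 1.0)          # perturb the flat vector and rebuild the structure
# θ_new is again a NamedTuple with fields w and b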

You should expect to occasionally have to extend flatten to handle your own types and, if you wind up doing this for a type from Base that this package doesn't yet cover, a PR including that implementation will be very welcome.

See test/parameters.jl for a couple of examples that utilise flatten to do something similar to the task described above.
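As a rough end-to-end sketch of the task described above, the snippet below wires a NamedTuple-parameterised build_model into Optim.jl via flatten; the data, loss, and optimiser settings are purely illustrative and not part of this package:

using LinearAlgebra: dot
using Optim
using ParameterHandling: flatten

build_model(θ::NamedTuple) = x -> dot(θ.w, x) + θ.b

# Illustrative data drawn from a known linear model.
xs = [randn(3) for _ in 1:50]
ys = [dot([1.0, -2.0, 0.5], x) + 0.3 for x in xs]
loss(model) = sum(abs2, model(x) - y for (x, y) in zip(xs, ys))

initial_θ = (w = zeros(3), b = 0.0)
flat_θ, unflatten = flatten(initial_θ)

# Optimise over the flat vector; rebuild the structured parameters inside the objective.
result = Optim.optimize(v -> loss(build_model(unflatten(v))), flat_θ, LBFGS())
learned_θ = unflatten(Optim.minimizer(result))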

Dealing with Constrained Parameters

It is very common to need to handle constraints on parameters, e.g. it may be necessary for a particular scalar to always be positive. While flatten is great for changing between representations of your parameters, it doesn't really have anything to say about this constraint problem.

For this we introduce a collection of new AbstractParameter types (whether we really need them to have some mutual supertype is unclear at present) that play nicely with flatten and allow one to specify that e.g. a particular scalar must remain positive, or should be fixed across iterations. See src/parameters.jl and test/parameters.jl for more examples.

The approach to implementing these types typically revolves around some kind of Deferred / delayed computation. For example, a Positive parameter is represented by an "unconstrained" number, and a "transform" that maps from the entire real line to the positive half. The value of a Positive is given by the application of this transform to the unconstrained number. flattening a Positive yields a length-1 vector containing the unconstrained number, rather than the value represented by the Positive object. For example

julia> using ParameterHandling: value, flatten, Positive

julia> x_unconstrained = log(1.0) # Specify unconstrained value.
0.0

julia> x = Positive(x_unconstrained) # Construct a number that should remain positive.
Positive{Float64,Bijectors.Exp{0}}(0.0, Bijectors.Exp{0}())

julia> value(x) # Get the constrained value by applying the transform.
1.0

julia> v, unflatten = flatten(x); # Supports the `flatten` interface.

julia> v
1-element Array{Float64,1}:
 0.0

julia> new_v = randn(1) # Pick a random new value.
1-element Array{Float64,1}:
 1.1220600582508566

julia> value(unflatten(new_v)) # Obtain constrained value.
3.071174489325673

It is straightforward to implement your own parameters that interoperate with those already written by implementing value and flatten for them. You might want to do this if this package doesn't currently support the functionality that you need.
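As a rough illustration of what that can look like, the sketch below defines a hypothetical bounded parameter that keeps a value inside (lower, upper) via a logistic transform. The type name and internals are invented for this example, and the exact method signatures to extend may differ between package versions:

import ParameterHandling: value, flatten

# Hypothetical parameter type: stores an unconstrained number plus the bounds.
struct MyBounded{T<:Real}
    unconstrained::T
    lower::T
    upper::T
end

# The constrained value is obtained by mapping the real line onto (lower, upper).
function value(x::MyBounded)
    s = 1 / (1 + exp(-x.unconstrained))  # logistic transform into (0, 1)
    return x.lower + (x.upper - x.lower) * s
end

# Flattening exposes only the unconstrained number to e.g. an optimiser.
function flatten(x::MyBounded)
    unflatten(v) = MyBounded(first(v), x.lower, x.upper)
    return [x.unconstrained], unflatten
end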

Gotchas

  1. Integers typically don't take part in the kind of optimisation procedures that this package is designed to handle. Consequently, flatten(::Integer) produces an empty vector.
  2. deferred has some type-stability issues when used in conjunction with abstract types. For example, flatten(deferred(Normal, 5.0, 4.0)) won't infer properly. A simple workaround is to write a function normal(args...) = Normal(args...) and work with deferred(normal, 5.0, 4.0) instead, as sketched below.
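A minimal version of that workaround, assuming Distributions.jl provides Normal, might look like:

using Distributions: Normal
using ParameterHandling: deferred, flatten, value

# Wrapping the constructor in a plain function sidesteps the inference issue.
normal(args...) = Normal(args...)

x = deferred(normal, 5.0, 4.0)   # rather than deferred(Normal, 5.0, 4.0)
v, unflatten = flatten(x)
value(unflatten(v))              # reconstructs Normal(5.0, 4.0)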
