GenerativeModels.jl

This library contains a collection of generative models. It uses trainable ConditionalDists.jl that can be used in conjuction with Flux.jl models. Probability measures such as KL divergence are defined in IPMeasures.jl This package aims to make experimenting with new models as easy as possible.

As an example, check out how to build a conventional variational autoencoder (VAE) that reconstructs MNIST below.

Reconstructing MNIST

First we load the MNIST training dataset

using MLDatasets, Flux
train_x, _ = MNIST.traindata(Float32)
flat_x = reshape(train_x, :, size(train_x,3)) |> gpu
data = Flux.Data.DataLoader(flat_x, batchsize=200, shuffle=true);

and define some parameters for a VAE with an input length xlength and latent vector of zlength.

using ConditionalDists

xlength = size(flat_x, 1)
zlength = 2
hdim    = 512
hd2     = Int(hdim/2)

We define an encoder with diagonal variance on the latent dimension, which is just a Flux model wrapped in a ConditionalMvNormal. The Flux model must return a tuple with the appropriate number of parameters - in case of a MvNormal two: mean and variance. Hence, the SplitLayer returns two vectors of zlength, one of which (the variance) is constrained to be positive.

using ConditionalDists: SplitLayer

# mapping that will be trained to output mean and variance
enc_map = Chain(Dense(xlength, hdim, relu),
                Dense(hdim, hd2, relu),
                SplitLayer(hd2, [zlength,zlength], [identity,softplus]))
# conditional encoder (can be called e.g. like `rand(encoder,x)`, see ConditionalDists.jl)
encoder = ConditionalMvNormal(enc_map)

The decoder will return a Multivariate Normal with scalar variance:

dec_map = Chain(Dense(zlength, hd2, relu),
                Dense(hd2, hdim, relu),
                SplitLayer(hdim, [xlength,1], σ))
decoder = ConditionalMvNormal(dec_map)

Now we can create the VAE model and train it to maximize the ELBO.

using GenerativeModels

model = VAE(zlength, encoder, decoder) |> gpu
loss(x) = -elbo(model,x)

ps = Flux.params(model)
opt = ADAM()

for e in 1:50
    @info "Epoch $e" loss(flat_x)
    Flux.train!(loss, ps, data, opt)
end

Some test reconstructions and the corresponding latent space are shown below:

model = model |> cpu
test_x, test_y = MNIST.testdata(Float32)
p1 = plot_reconstructions(model, test_x[:,:,1:6])