The MCMCChainsStorage.jl package provides options for storing your
MCMCChains.jl chains on disk without using serialization. Serialization is
not suitable for long-term storage; or for sharing your chains to colleagues
with different operating systems, Julia versions, or even without Julia.
MCMCChainsStorage.jl solves these problems.
Currently only storage in HDF5 file formats is supported, but other storage options may be added in the future.
MCMCChainsStorage.jl is in the general Julia registry. That means all you
need to do to install it is to start Julia, activate your desired
environment, enter the package management context (type ]), and issue the
command
pkg> add MCMCChainsStorageThe MCMCChainsStorage package depends on the MCMCChains and the HDF5
packages. If you do not have these packages installed on your system,
installing MCMCChainsStorage will install them automatically.
The packages provides methods for Base.read and Base.write that read an
MCMCChains object from or write it to HDF5 storage:
using HDF5
using MCMCChains
using MCMCChainsStorage
# Construct a chain and write it out...
chain = Chains(randn(500, 2, 4), [:a, :b])
h5open("an_hdf5_file.h5", "w") do f
write(f, chain)
end
# ...and we can get it back
chain = h5open("an_hdf5_file.h5", "r") do f
read(f, Chains)
endReading and writing preserves the sections of the chain, so if you have metadata stored in, for example, the "internals" section, it will be written out and read back properly.
It is also possible to write a chain to a group in a larger HDF5 file:
h5open("another_hdf5_file.h5", "w") do f
g = create_group(f, "a_chain")
write(g, chain)
end
chain = h5open("another_hdf5_file.h5", "r") do f
read(f["a_chain"], Chains)
endThe package provides one additional utility function: if your model returns a named tuple of generated quantities, then you can call
model = ... # Construct a Turing model
trace = Turing.sample(model, ...) # Construct a chain, of shape `(nsamp, nparams, nchain)`
full_trace = append_generated_quantities(trace, Turing.generated_quantities(model, trace))to obtain an MCMCChains object that incorporates both the original samples
and the generated quantities.
The chain is stored with one group for each section (parameters, internals,
etc). Each "name" within the section is stored as a separate HDF5 data set, so
arrays in the chain will be placed in data sets named "x[1]", "x[2]", etc.
Compression is enabled by default; currently there is no way to change this
default, but why would you want to? An advantage of this format is that generic
tools like h5ls will produce a reasonable description of the chain; and it is
straightforward to reconstruct the chain without too much code in any language
that can interface with the HDF5 storage format.