This package provides an interface for compressing multiple arrays of the same size in sequence. The arrays can have up to four dimensions. The intended application is storing snapshots of an iterative process, such as a simulation or an optimization. Since these processes can require many iterations, compressing the snapshots can save RAM. The package uses the ZFP compression algorithm.
This code implements a vector-like interface for accessing compressed arrays at different time indexes, so to understand the code you should first read the Julia documentation on indexing interfaces. Basically, I had to implement a method for the Base.getindex function, which governs whether a type can be indexed like an array or vector. I also wrote a method for the Base.append! function to add new arrays to the sequential collection of compressed arrays. I also use functions like fill and map, so reading the documentation on these functions might also help.
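The vector-like interface described above can be sketched with a toy type. This is only an illustration under stated assumptions: ToySeq is a hypothetical name, it is not the package's SeqCompressor, and it performs no real compression (each snapshot is simply stored as raw bytes in place of a ZFP stream). It shows how methods for Base.getindex and Base.append! make a custom type behave like a growing vector of arrays.

```julia
# Hypothetical stand-in for a sequence of compressed arrays.
# No real compression happens here: each snapshot is stored as raw bytes.
struct ToySeq{T,N}
    dims::NTuple{N,Int}            # shape shared by every snapshot
    chunks::Vector{Vector{UInt8}}  # one byte buffer per time index
end

# Constructor mirroring the "element type, then sizes" calling style.
ToySeq(::Type{T}, dims::Vararg{Int,N}) where {T,N} =
    ToySeq{T,N}(dims, Vector{UInt8}[])

# Number of snapshots stored so far.
Base.length(s::ToySeq) = length(s.chunks)

# append! adds one more snapshot at the end of the sequence.
function Base.append!(s::ToySeq{T,N}, A::Array{T,N}) where {T,N}
    size(A) == s.dims ||
        throw(DimensionMismatch("expected size $(s.dims), got $(size(A))"))
    # Copy the array's memory into a byte buffer (placeholder for compression).
    push!(s.chunks, collect(reinterpret(UInt8, vec(A))))
    return s
end

# getindex is what makes the seq[t] syntax work.
function Base.getindex(s::ToySeq{T,N}, t::Int) where {T,N}
    # Turn the stored bytes back into an array of the original shape.
    bytes = s.chunks[t]
    return reshape(collect(reinterpret(T, bytes)), s.dims)
end

seq = ToySeq(Float32, 4, 4)
A = rand(Float32, 4, 4)
append!(seq, A)
@assert seq[1] == A   # indexing returns the snapshot stored at time index 1
```

Because the syntax seq[t] is lowered to a call to getindex(seq, t), defining that one method is enough for integer indexing. Range or colon indexing would need additional methods.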
Here is a simple example of its usage. Imagine that the arrays A1 to A3 are snapshots of an iterative process.
using SequentialZfpCompression
using Test
# Let's define a few arrays to compress
A1 = rand(Float32, 100,100,100)
A2 = rand(Float32, 100,100,100)
A3 = rand(Float32, 100,100,100)
# Initializing the compressed array sequence
compSeq = SeqCompressor(Float32, 100, 100, 100)
# Compressing the arrays
append!(compSeq, A1)
append!(compSeq, A2)
append!(compSeq, A3)
# Asserting the decompressed array is the same
@test compSeq[1] == A1
@test compSeq[2] == A2
@test compSeq[3] == A3
# Dumping to a file
save("myarrays.szfp", compSeq)
# Reading it back
compSeq2 = load("myarrays.szfp")
# Asserting the loaded data is the same
@test compSeq[:] == compSeq2[:]
Lossy compression is achieved by specifying additional keyword arguments for SeqCompressor, which are tol::Real, precision::Int, and rate::Real. If none are specified (as in the example above), the compression is lossless (i.e. reversible). The lossy compression parameters are:
- tol defines the maximum absolute error that is tolerated.
- precision controls the precision, bounding a weak relative error; see the ZFP FAQ.
- rate fixes the bits used per value.
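As a rough sketch of the lossy mode, the snippet below passes the tol keyword to the same constructor used in the lossless example and then measures the error after decompression. The tolerance value and the names lossySeq and maxerr are illustrative choices, not recommendations from the package.

```julia
using SequentialZfpCompression

A = rand(Float32, 100, 100, 100)
tol = 1e-3  # illustrative tolerance, not a recommended value

# Same constructor as in the lossless example, plus the tol keyword.
lossySeq = SeqCompressor(Float32, 100, 100, 100; tol=tol)
append!(lossySeq, A)

# Largest pointwise difference between the original and decompressed array.
maxerr = maximum(abs.(lossySeq[1] .- A))
println("maximum absolute error: ", maxerr)
```

With tol set, the printed error is expected to stay at or below the tolerance, while an exact comparison such as lossySeq[1] == A will generally no longer hold.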
This package has two workflows for compression. It can compress the array into a Vector{UInt8} and keep it in memory, or it can slice the array, compress each slice, and save the slices to different files, one per thread.
To use this out-of-core approach, you have four options:
- Use the inmemory=false keyword to SeqCompressor. This will create the files for you in tempdir().
- Specify the filepaths::Vector{String} keyword argument with a list of folders, one for each thread.
- Specify the filepaths::String keyword argument with just one folder that will hold all the files.
- Specify the envVarPath::String keyword argument with the name of an environment variable that holds the path to the folder that will hold all the files. This might be useful if you are using a SLURM cluster, which allows you to access the local node storage via the SLURM_TMPDIR environment variable.
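The out-of-core options can be tried roughly as follows. This is a sketch that assumes each keyword is accepted on its own, as the list suggests. The folders come from mktempdir() so nothing permanent is written, and MY_SCRATCH_DIR is a made-up variable name used only for illustration.

```julia
using SequentialZfpCompression

A = rand(Float32, 100, 100, 100)

# inmemory=false: let the package create its files in the temporary directory.
seqTmp = SeqCompressor(Float32, 100, 100, 100; inmemory=false)
append!(seqTmp, A)

# filepaths as a String: keep all files in a single folder (a throwaway one here).
seqDir = SeqCompressor(Float32, 100, 100, 100; filepaths=mktempdir())
append!(seqDir, A)

# envVarPath: read the folder from an environment variable.
# MY_SCRATCH_DIR is a made-up name; on SLURM you would pass "SLURM_TMPDIR".
ENV["MY_SCRATCH_DIR"] = mktempdir()
seqEnv = SeqCompressor(Float32, 100, 100, 100; envVarPath="MY_SCRATCH_DIR")
append!(seqEnv, A)
```

Since no lossy keyword is given, indexing any of these sequences (for example seqDir[1]) should return an array equal to A, just as in the in-memory case.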
Development checklist:
- Add bound checking
- Add documentation for each method
- Add support for compression rate, tolerance and precision
- Add support for parallel compression
- Add support to compress the array in slices, one for each thread
- Add support to dump the struct to a file and read it back
- Make typing more robust
- Add more save methods for the multifile case