A Julia interface to liblzo2.
Install:

```julia
using Pkg
Pkg.add("LibLZO")
```
Compress and decompress bytes:

```julia
using LibLZO

const lorem = b"""Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor
incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud
exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor
in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur
sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est
laborum."""

c_default = compress(lorem) # default: LZO1X_1
@assert length(c_default) < length(lorem)

decompressed = decompress(c_default) # default: LZO1X
@assert decompressed == lorem
```
Specify the LZO algorithm:

```julia
# By Type
c_lzo1y = compress(LZO1Y, lorem)
d_lzo1y = decompress(LZO1Y, c_lzo1y)
@assert d_lzo1y == lorem

# By Symbol
c_lzo1z = compress(:LZO1Z, lorem)
d_lzo1z = decompress(:LZO1Z, c_lzo1z)
@assert d_lzo1z == lorem

# By String
c_lzo1f = compress("LZO1F", lorem)
d_lzo1f = decompress("LZO1F", c_lzo1f)
@assert d_lzo1f == lorem

# With kwargs (works with Type, Symbol, and String versions)
c_lzo1c = compress(LZO1C, lorem; compression_level=9)
d_lzo1c = decompress(LZO1C, c_lzo1c)
@assert d_lzo1c == lorem

# By struct (saves on working memory allocations)
lzo = LZO1X_999()
c_lzo1x = compress(lzo, lorem)
d_lzo1x = decompress(lzo, c_lzo1x)
@assert d_lzo1x == lorem
```
This package exports the following methods for interfacing with liblzo2:
`compress(algo, src::AbstractVector{UInt8})`
: compress data in `src` using algorithm `algo` and return the result.

`unsafe_compress!(algo, dest::AbstractVector{UInt8}, src::AbstractVector{UInt8})`
: compress data from `src` to `dest` using algorithm `algo` with no overflow checking and return the number of bytes overwritten at the front of `dest`.

`decompress(algo, src::AbstractVector{UInt8})`
: decompress data in `src` using algorithm `algo` and return the result.

`decompress!(algo, dest::AbstractVector{UInt8}, src::AbstractVector{UInt8})`
: decompress data in `src` to `dest` in-place using algorithm `algo` and return the number of bytes written, or throw an exception if `dest` is not large enough to hold the decompressed version of `src`.

`unsafe_decompress!(algo, dest::AbstractVector{UInt8}, src::AbstractVector{UInt8})`
: decompress data from `src` to `dest` using algorithm `algo` with no overflow checking and return the number of bytes overwritten at the front of `dest`.

`optimize!(algo, src::AbstractVector{UInt8})`
: attempt to reduce the size of data in `src` that was compressed using algorithm `algo`, operating on `src` in-place.

`unsafe_optimize!(algo, dest::AbstractVector{UInt8}, src::AbstractVector{UInt8})`
: attempt to reduce the size of data in `src` that was compressed using algorithm `algo` by decompressing it into `dest` with no overflow checking, operating on `src` in-place.
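For example, the in-place variants return a byte count that can be used to trim a preallocated destination. The sizing formula below is the worst-case expansion liblzo2 documents for the LZO1X family (input length + input length/16 + 64 + 3), not a constant exported by this package, so treat this as a sketch:

```julia
using LibLZO

src = b"a sample payload, a sample payload, a sample payload, a sample payload"

# Worst-case compressed size for LZO1X, per the liblzo2 documentation.
dest = Vector{UInt8}(undef, length(src) + length(src) ÷ 16 + 64 + 3)

n = unsafe_compress!(LZO1X_1, dest, src) # number of bytes written at the front of dest
compressed = dest[1:n]

@assert decompress(LZO1X, compressed) == src
```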
The package also exports the following LZO algorithm types, representing all the compression, decompression, and optimization functions exported by liblzo2:
- LZO1X family:
  - `LZO1X_1` (also exported as `LZO1X` and `LZO`)
  - `LZO1X_1_11`
  - `LZO1X_1_12`
  - `LZO1X_1_15`
  - `LZO1X_999`
- LZO1Y family:
  - `LZO1Y_1` (also exported as `LZO1Y`)
  - `LZO1Y_999`
- LZO1Z family:
  - `LZO1Z_999` (also exported as `LZO1Z`)
- LZO1 family:
  - `LZO1`
  - `LZO1_99`
- LZO1A family:
  - `LZO1A`
  - `LZO1A_99`
- LZO1B family:
  - `LZO1B`
  - `LZO1B_99`
- LZO1C family:
  - `LZO1C`
  - `LZO1C_99`
  - `LZO1C_999`
- LZO1F family:
  - `LZO1F_1` (also exported as `LZO1F`)
  - `LZO1F_999`
- LZO2A family:
  - `LZO2A_999` (also exported as `LZO2A`)
The following matrix describes which methods are available for which LZO family types:
| Family | `compress` | `unsafe_compress!` | `decompress`/`decompress!` | `unsafe_decompress!` | `optimize!` | `unsafe_optimize!` |
|---|---|---|---|---|---|---|
| LZO1X | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| LZO1Y | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| LZO1Z | ✅ | ✅ | ✅ | ✅ | ⛔ | ⛔ |
| LZO1 | ✅ | ✅ | ⛔ | ✅ | ⛔ | ⛔ |
| LZO1A | ✅ | ✅ | ⛔ | ✅ | ⛔ | ⛔ |
| LZO1B | ✅ | ✅ | ✅ | ✅ | ⛔ | ⛔ |
| LZO1C | ✅ | ✅ | ✅ | ✅ | ⛔ | ⛔ |
| LZO1F | ✅ | ✅ | ✅ | ✅ | ⛔ | ⛔ |
| LZO2A | ✅ | ✅ | ✅ | ✅ | ⛔ | ⛔ |
The `unsafe_*` methods are truly unsafe: if the user-supplied destination vector is not large enough to hold the output, the behavior is undefined and very bad things could happen. Unless you are absolutely sure about what you are doing and are in complete control of the source data, use the `compress`, `decompress`/`decompress!`, and `optimize!` methods instead.
This should go without saying, but if you attempt to decompress data with a different algorithm than was used to compress the data, undefined behavior will occur. This will typically result in an exception being thrown by the safe `decompress`/`decompress!` methods, but not always: for instance, data compressed by the LZO1X family can usually be decompressed using the LZO1Y family without throwing an exception, but the output will be gibberish. There is no good way for the `decompress`/`decompress!` methods to detect the algorithm used to compress the data, so it is up to the user to keep the compression and decompression algorithms aligned.
That being said, all algorithms in the same algorithm family use the same underlying decompression function. You can, for instance, compress data using `LZO1X_1` and decompress it using `LZO1X_999` without any issues.
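As a quick illustration, output from the high-effort `LZO1X_999` compressor decodes with the same `LZO1X` decompressor used for `LZO1X_1` output:

```julia
using LibLZO

data = repeat(b"The quick brown fox jumps over the lazy dog. ", 20)

c_fast = compress(LZO1X_1, data)   # fast compressor
c_best = compress(LZO1X_999, data) # slower, higher-ratio compressor

# One shared decompression function serves the whole LZO1X family.
@assert decompress(LZO1X, c_fast) == data
@assert decompress(LZO1X, c_best) == data
```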
The safe decompression method `decompress` works by attempting to decompress the source into a destination vector, catching the overrun exception returned by the liblzo2 library, then increasing the size of the destination and trying again until the decompression succeeds. This means the method may take several times longer and make more allocations than expected to successfully decompress data. If decompression speed or memory efficiency is paramount, use the in-place `decompress!` or `unsafe_decompress!` methods, but be warned that "unsafe" means UNSAFE (see above).
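As a sketch, if the decompressed size is known up front (say, because it was stored alongside the compressed data; that bookkeeping is assumed here, not provided by the package), a preallocated destination and `decompress!` sidesteps the grow-and-retry allocations of the safe method:

```julia
using LibLZO

original = repeat(b"abcdefgh", 100)
compressed = compress(LZO1X, original)
original_length = length(original) # assume this was recorded when the data were compressed

# One allocation, sized exactly to the known decompressed length.
buffer = Vector{UInt8}(undef, original_length)
n = decompress!(LZO1X, buffer, compressed) # throws if buffer is too small

@assert buffer[1:n] == original
```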
The optimization feature of the LZO1X and LZO1Y family of algorithms attempts to shuffle around literal copy commands to save on storage and potentially improve decompression speed. Because of the rarity of the special circumstances required to shuffle the literal copy commands around, optimization is expected to reduce the number of bytes necessary to store the compressed data by at most 0.01%, and benchmarks I have performed on modern processors show no difference in decompression speeds between optimized and unoptimized data. The methods are included for completeness' sake, but I do not recommend their use.
Because the compressed data have to be completely decompressed into memory during the optimization process, the safe `optimize!` method has to allocate enough memory to guarantee the decompressed data will fit, and that is typically 100 times more memory than necessary. If you insist on optimizing your data, I recommend running `unsafe_optimize!` immediately after `compress`, using the original source data as the destination for `unsafe_optimize!`. This guarantees that there will be just enough memory available to perform the optimization:
```julia
lorem_vec = Vector{UInt8}(lorem) # mutable copy of the source; b"..." literals are read-only and must not be written to
lorem_copy = copy(lorem_vec)     # just to prove everything works (not necessary in production)
compressed = compress(LZO1X, lorem_vec)
unsafe_optimize!(LZO1X, lorem_vec, compressed) # use the original data as the output location for the decompression
@assert lorem_vec == lorem_copy # still the same
```