The BSDiff package is a pure Julia implementation of the bsdiff tool for computing and applying binary diffs of files. It supports reading and writing both Colin Percival's classic bsdiff format and Matthew Endsley's modified format. The package offers the same API as the command-line tools:
bsdiff(old, new, [ patch ])
bspatch(old, [ new, ] patch)
The `bsdiff` command computes a patch file given `old` and `new` files. By default it generates patch files in the classic bsdiff format. This format emits control data, diff data and new data in three separately compressed sections and is typically more compact than the Endsley format. The Endsley format interleaves control, diff and new data in a single compressed section, which means that it can be written and applied in a fully streamed fashion, but the patch files tend to be slightly larger. The format can be selected by passing the `format = :classic` or `format = :endsley` option.
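As a minimal sketch, assuming BSDiff is installed and using two made-up input files (`old.bin` and `new.bin`), the following writes the same diff in both formats and checks that each patch round-trips. The file names and contents are illustrative only. Which format comes out smaller depends on the data, so no sizes are claimed here.

```julia
using BSDiff

cd(mktempdir())
# Hypothetical input files, created here so the example is self-contained.
write("old.bin", repeat("The quick brown fox jumps over the lazy dog.\n", 200))
write("new.bin", repeat("The quick brown cat jumps over the lazy dog!\n", 200))

# Same inputs, two patch formats (classic is the default).
bsdiff("old.bin", "new.bin", "patch.classic")
bsdiff("old.bin", "new.bin", "patch.endsley"; format = :endsley)

# Either patch reproduces new.bin; bspatch detects the format on its own.
for patch in ("patch.classic", "patch.endsley")
    out = bspatch("old.bin", "out.bin", patch)
    @assert read(out) == read("new.bin")
end

println("classic: ", filesize("patch.classic"), " bytes; ",
        "endsley: ", filesize("patch.endsley"), " bytes")
```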
The `bspatch` command applies a patch file to an `old` file to produce a `new` file. It can auto-detect the patch file format from the magic string in the patch header, so it is generally not necessary to specify the format. If you only want to apply a specific format of patch, you can pass the same `format` option and `bspatch` will error unless the patch has the expected format.
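To illustrate how detection from a magic string can work, here is a sketch of a hypothetical helper, `patch_format`, which is not part of BSDiff's API. It assumes the conventional headers of the two formats: the 8-byte magic `BSDIFF40` for classic patches and the 16-byte magic `ENDSLEY/BSDIFF43` for Endsley patches.

```julia
# Hypothetical helper, NOT part of BSDiff: guess a patch's format from its
# leading magic string. The magic values are assumed conventional headers.
const CLASSIC_MAGIC = b"BSDIFF40"          # 8 bytes
const ENDSLEY_MAGIC = b"ENDSLEY/BSDIFF43"  # 16 bytes

function patch_format(io::IO)
    # Read up to 16 bytes; fewer are returned if the input is shorter.
    header = read(io, length(ENDSLEY_MAGIC))
    if header == ENDSLEY_MAGIC
        return :endsley
    elseif length(header) >= length(CLASSIC_MAGIC) &&
           header[1:length(CLASSIC_MAGIC)] == CLASSIC_MAGIC
        return :classic
    else
        error("unrecognized patch format")
    end
end

# Convenience method for file paths.
patch_format(path::AbstractString) = open(patch_format, path)
```

Checking the longer Endsley magic first keeps the test unambiguous, since the classic check only looks at the first 8 bytes.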
The public API for the `BSDiff` package consists of the following functions:
bsdiff(old, new, [ patch ]; format = [ :classic | :endsley ]) -> patch
Compute a binary patch that will transform the content of `old` into the content of `new`. All arguments can be strings or IO handles. If no `patch` argument is provided, the patch data is written to a temporary file whose path is returned.
The `old` argument can also be a 2-tuple of strings and/or IO handles, in which case the first is used as the old data and the second is used as a precomputed index of the old data, as computed by `bsindex`. Since indexing the old data is the slowest part of generating a diff, precomputing this and reusing it can significantly speed up generating diffs from the same old file to multiple different new files.
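A sketch of the reuse pattern described above, assuming BSDiff is installed and using made-up file names (`old.bin`, `new1.bin` through `new3.bin`, and `old.idx`): the index is computed once with `bsindex` and then passed as the second element of the `old` tuple on every call to `bsdiff`.

```julia
using BSDiff

cd(mktempdir())
# Hypothetical inputs: one old file and several new versions of it.
base = repeat("line of shared content\n", 500)
write("old.bin", base)
for i in 1:3
    write("new$i.bin", base * "appended revision $i\n")
end

# Index the old data once; this is the expensive step.
index = bsindex("old.bin", "old.idx")

# Reuse the index for every diff by passing (old, index) as the first argument.
for i in 1:3
    bsdiff(("old.bin", index), "new$i.bin", "patch$i.bin")
end

# Each patch should still reproduce its new file from the old one.
for i in 1:3
    @assert read(bspatch("old.bin", "patch$i.bin")) == read("new$i.bin")
end
```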
The `format` keyword argument selects the patch format to generate. The value must be one of the symbols `:classic` or `:endsley`, indicating a bsdiff patch format. The classic format is generated by default, but the Endsley format can be selected with `bsdiff(old, new, patch, format = :endsley)`.
bspatch(old, [ new, ] patch; format = [ :classic | :endsley ]) -> new
Apply a binary patch given by the `patch` argument to the content of `old` to produce the content of `new`. All arguments can be strings or IO handles. If no `new` argument is provided, the new data is written to a temporary file whose path is returned.
Note that the optional argument is the middle argument, which is a bit unusual but makes the argument order when passing all three paths consistent with the `bspatch` command and with the `bsdiff` function.
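A minimal sketch of the two call forms, assuming BSDiff is installed; the file names are made up for illustration.

```julia
using BSDiff

cd(mktempdir())
write("old.txt", "Goodbye, world.\n")   # hypothetical example files
write("new.txt", "Hello, world!\n")
bsdiff("old.txt", "new.txt", "old-to-new.patch")

# Three arguments: same order as the bspatch command; output goes to new.
out1 = bspatch("old.txt", "restored.txt", "old-to-new.patch")

# Two arguments: the optional middle argument is omitted, so the result is
# written to a temporary file whose path is returned.
out2 = bspatch("old.txt", "old-to-new.patch")

@assert read(out1, String) == read(out2, String) == "Hello, world!\n"
```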
By default `bspatch` auto-detects the patch format, so the `format` keyword argument is usually unnecessary. If you wish to restrict the format of patch that will be accepted, however, you can use this keyword argument: `bspatch` will raise an error unless the patch file has the indicated format.
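A sketch of restricting the accepted format, assuming BSDiff is installed and using made-up file names. The text above only says that an error is raised on a mismatch, so the example catches any exception rather than assuming a specific exception type.

```julia
using BSDiff

cd(mktempdir())
write("old.txt", "Goodbye, world.\n")   # hypothetical example files
write("new.txt", "Hello, world!\n")
bsdiff("old.txt", "new.txt", "endsley.patch"; format = :endsley)

# Accepted: the patch really is in the Endsley format.
bspatch("old.txt", "ok.txt", "endsley.patch"; format = :endsley)

# Rejected: insisting on :classic should make bspatch raise an error.
# The exception type is not documented here, so catch anything.
rejected = try
    bspatch("old.txt", "bad.txt", "endsley.patch"; format = :classic)
    false
catch err
    true
end
@assert rejected
```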
bsindex(old, [ index ]) -> index
Save index data (a sorted suffix array) for the content of `old` into `index`. All arguments can be strings or IO handles. If no `index` argument is provided, the index data is saved to a temporary file whose path is returned.
The index can be passed to `bsdiff` to speed up the diff computation by passing `(old, index)` as the first argument instead of just `old`. Since indexing the old data is the slowest part of generating a diff, precomputing this and reusing it can significantly speed up generating diffs from the same old file to multiple different new files.
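To make "sorted suffix array" concrete, here is an illustrative sketch; it is not BSDiff's implementation, and `naive_suffix_array` and `find_prefix` are hypothetical names. It sorts every suffix start position (1-based) of the data, then uses binary search over the sorted suffixes to locate a byte string. That kind of lookup is what makes matching new data against old data fast once the index exists. The naive construction is far slower than what a real implementation would use.

```julia
# Illustrative only: a naive sorted suffix array, O(n^2 log n) to build.
# Suffix i is data[i:end]; start positions are sorted by comparing suffixes
# lexicographically, with a shorter prefix sorting first.
function naive_suffix_array(data::AbstractVector{UInt8})
    n = length(data)
    return sort!(collect(1:n); by = i -> view(data, i:n))
end

# Return the start of some suffix that begins with `query`, or `nothing`.
# Truncating suffixes to length(query) keeps them in sorted order, so a
# plain binary search over the suffix array works.
function find_prefix(data::AbstractVector{UInt8}, sa::Vector{Int},
                     query::AbstractVector{UInt8})
    n = length(data)
    lo, hi = 1, length(sa)
    while lo <= hi
        mid = (lo + hi) >>> 1
        suffix = view(data, sa[mid]:n)
        head = view(suffix, 1:min(length(query), length(suffix)))
        c = cmp(head, query)
        if c == 0
            return sa[mid]
        elseif c < 0
            lo = mid + 1
        else
            hi = mid - 1
        end
    end
    return nothing
end
```

For example, the suffix array of the bytes of `"banana"` is `[6, 4, 2, 1, 5, 3]`, because the suffixes in sorted order are `a`, `ana`, `anana`, `banana`, `na`, `nana`.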
```julia
julia> cd(mktempdir())

julia> open("goodbye.txt", write=true) do io
           println(io, "Goodbye, world.")
       end

julia> open("hello.txt", write=true) do io
           println(io, "Hello, world!")
       end

julia> using BSDiff

julia> patch = bsdiff("goodbye.txt", "hello.txt");

julia> bspatch("goodbye.txt", "hello_copy.txt", patch)
"hello_copy.txt"

julia> read(ans, String)
"Hello, world!\n"
```
Even though this package produces patch files that are compatible with the classic and Endsley `bsdiff` tools, the patch files it generates may not be identical for a few reasons:
- The bzip2 compression used by the package and by the commands may have different settings and produce different results; in general, compression libraries like bzip2 don't guarantee perfect reproducibility.
- The uncompressed patch produced by this package is sometimes better than the one produced by the command-line tool due to a bug in the way the command uses `memcmp` to do string comparison. See this pull request for details.
The exact output produced by this library will also not necessarily remain identical in the future. There are many valid patches for the same `old` and `new` data, and improvements to the speed and quality of the patch generation algorithm may lead to different outputs. However, the patch format is simple and stable: newer versions of the package are guaranteed to be able to apply patches produced by older versions and vice versa.