BinaryTemplates.jl

Create a binary template from byte chunks at known offsets and file sizes.
Author mkitti
Started In
July 2022

BinaryTemplates


BinaryTemplates.jl assembles a file from chunks placed at byte offsets. This is useful for quickly creating large, sparse files, such as data containers whose data has not yet been added.

Example Usage

Below we create a 4 mebibyte (MiB) template with chunks at byte offsets 0, 4096, and 4193280. The chunks are 1024, 2048, and 512 bytes in length.

The template itself takes up only 3856 bytes. The template can be applied to a file that does not yet exist, in which case it creates a 4 MiB file.

The apply_template function checks that it is only overwriting zeros in the file. If it finds non-zero values where the chunks should be, then it will throw an exception. The function returns a BinaryTemplate representing the data that it overwrote with the chunks. This backup template is also written to a backup file.

Overwriting non-zero values can be opted into by passing ensure_zero = false. By applying the template a second time, we obtain a second backup template. This second backup template should be equal to the original template that we applied.

julia> using BinaryTemplates

julia> binary_template_4MiB = BinaryTemplate(4*1024^2, [0, 4096, 4*1024^2-1024], [rand(UInt8, 1024), rand(UInt8, 2048), rand(UInt8, 512)])
BinaryTemplate:
    expected_file_size: 4.000 MiB

    Offsets            Length     Chunk Checksum
    ------------------ ---------- --------------
    0x0000000000000000       1024     0x68028575
    0x0000000000001000       2048     0x8c68f913
    0x00000000003ffc00        512     0x1ec370e3


julia> Base.summarysize(binary_template_4MiB)
3856

julia> fn = tempname(); apply_template(fn, binary_template_4MiB)
BinaryTemplate:
    expected_file_size: 0 bytes

    Offsets            Length     Chunk Checksum
    ------------------ ---------- --------------
    0x0000000000000000          0     0x00000000
    0x0000000000001000          0     0x00000000
    0x00000000003ffc00          0     0x00000000

julia> filesize(fn)
4194304

julia> backup = apply_template(fn, binary_template_4MiB)
ERROR: Non-zero value found in C:\Users\KITTIS~1\AppData\Local\Temp\jl_vQiVuwO6WZ when applying template. Use keyword `ensure_zero = false` to override.
Stacktrace:
 [1] error(s::String)
   @ Base .\error.jl:33
 [2] apply_template(target_filename::String, t::BinaryTemplate; backup_filename::String, ensure_zero::Bool, truncate::Bool)
   @ BinaryTemplates c:\Users\kittisopikulm\.julia\dev\BinaryTemplates\src\io.jl:161
 [3] apply_template(target_filename::String, t::BinaryTemplate)
   @ BinaryTemplates c:\Users\kittisopikulm\.julia\dev\BinaryTemplates\src\io.jl:148
 [4] top-level scope
   @ REPL[211]:1

julia> backup = apply_template(fn, binary_template_4MiB; ensure_zero = false)
BinaryTemplate:
    expected_file_size: 4.000 MiB

    Offsets            Length     Chunk Checksum
    ------------------ ---------- --------------
    0x0000000000000000       1024     0x68028575
    0x0000000000001000       2048     0x8c68f913
    0x00000000003ffc00        512     0x1ec370e3


julia> backup == binary_template_4MiB
true
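
The zero check and backup behaviour shown in the session above can be approximated with Base I/O alone. The following is a minimal sketch, not the package's implementation: `overwrite_chunks` is a hypothetical helper introduced here for illustration, it returns plain byte vectors rather than a BinaryTemplate, and it assumes the target file already exists at its full size.

```julia
# Hypothetical helper for illustration only; this is not BinaryTemplates.apply_template.
function overwrite_chunks(filename, offsets, chunks; ensure_zero = true)
    backup = Vector{UInt8}[]
    open(filename, "r+") do io
        # First pass: read what is currently stored at each chunk location.
        for (offset, chunk) in zip(offsets, chunks)
            seek(io, offset)
            existing = read(io, length(chunk))
            if ensure_zero && !all(iszero, existing)
                error("Non-zero value found in $filename at offset $offset")
            end
            push!(backup, existing)
        end
        # Second pass: write only after every location has passed the check.
        for (offset, chunk) in zip(offsets, chunks)
            seek(io, offset)
            write(io, chunk)
        end
    end
    return backup
end

fn = tempname()
write(fn, zeros(UInt8, 8192))    # an all-zero 8 KiB target file
offsets = [0, 4096]
chunks = [rand(0x01:0xff, 16), rand(0x01:0xff, 32)]    # strictly non-zero bytes

first_backup = overwrite_chunks(fn, offsets, chunks)    # backs up zeros
second_backup = overwrite_chunks(fn, offsets, chunks; ensure_zero = false)
```

In this sketch the first backup holds only zeros, while the second backup holds the chunks written by the first call, which mirrors the `backup == binary_template_4MiB` comparison above. Checking every location before writing anything means a failed check leaves the file untouched; whether the package orders its checks the same way is not stated here.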

Applications

This templating technique can be used to create large HDF5 files where the metadata is confined to a few chunks. See the HDF5BinaryTemplates.jl package. The file can be written very efficiently since apply_template only writes the metadata chunks. It uses seek to skip over regions between the chunks, such as where the datasets might be.
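
The seek-based write described above can be sketched with Julia's Base I/O functions. This is an illustration under stated assumptions rather than the package's own code: `write_chunks_at_offsets` is a hypothetical name, and the sketch assumes the target file may be freshly created or truncated.

```julia
# Hypothetical helper for illustration only; not part of BinaryTemplates.jl.
function write_chunks_at_offsets(filename, total_size, offsets, chunks)
    open(filename, "w") do io
        # Set the final file size up front; unwritten regions read back as zeros.
        truncate(io, total_size)
        for (offset, chunk) in zip(offsets, chunks)
            seek(io, offset)     # jump over the gap instead of writing zeros
            write(io, chunk)
        end
    end
    return filename
end

offsets = [0, 4096, 4 * 1024^2 - 1024]
chunks = [rand(UInt8, 1024), rand(UInt8, 2048), rand(UInt8, 512)]
fn = write_chunks_at_offsets(tempname(), 4 * 1024^2, offsets, chunks)
```

Only 3584 bytes are written explicitly for a 4 MiB file. Whether the gaps actually occupy disk space depends on the operating system and file system, so the file is sparse only where those support it.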