CodecBGZF.jl

BGZF codecs for TranscodingStreams.jl
Author jakobnissen
Started In
August 2020


Codec for BGZF files

This package implements an efficient codec for BGZF files. A BGZF file is a concatenation of small gzip blocks. Because the format is blocked, it allows random access and significantly faster de/compression, since blocks can be processed independently of each other.
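To make the blocked layout concrete, here is a minimal sketch that walks a BGZF byte buffer and lists where each block starts. It is not this package's implementation. The helper name `block_starts` is hypothetical, and the code assumes the common layout in which the `BC` extra subfield comes first in the gzip header, so that the block size field (BSIZE, total block size minus one) sits at byte 16 of the block. The test data is the 28-byte empty EOF block defined by the BGZF specification.

```julia
# The standard empty BGZF block that terminates every BGZF file.
const EOF_BLOCK = UInt8[
    0x1f, 0x8b, 0x08, 0x04, 0x00, 0x00, 0x00, 0x00, 0x00, 0xff,
    0x06, 0x00, 0x42, 0x43, 0x02, 0x00, 0x1b, 0x00, 0x03, 0x00,
    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
]

# Hypothetical helper: return the zero-based file offset of every block.
function block_starts(data::AbstractVector{UInt8})
    starts = Int[]
    pos = 1  # Julia arrays are 1-based
    while pos <= length(data)
        pos + 17 <= length(data) || error("truncated block header")
        # Every block is a gzip member, so it begins with the gzip magic bytes.
        data[pos] == 0x1f && data[pos + 1] == 0x8b ||
            error("no gzip magic at offset $(pos - 1)")
        # Assumption: the 'B','C' subfield is the first extra subfield.
        data[pos + 12] == 0x42 && data[pos + 13] == 0x43 ||
            error("BC subfield not found where this sketch expects it")
        # BSIZE is a little-endian UInt16 holding (block size - 1).
        bsize = Int(data[pos + 16]) | (Int(data[pos + 17]) << 8)
        push!(starts, pos - 1)
        pos += bsize + 1
    end
    return starts
end

starts = block_starts(vcat(EOF_BLOCK, EOF_BLOCK))
println(starts)
```

Because each block records its own size, a reader can skip from block to block without decompressing anything, which is what makes both random access and parallel decompression possible.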

The package has the following notable features:

  • Correctness above all: The BGZF format is well specified, and the package must read and write in compliance with the specification. This includes validating the given checksums, decompression lengths, and the trailing EOF block.
  • Integration with the Julia ecosystem: The package is a codec for the TranscodingStreams.jl package.
  • Speed: This package should be the fastest Julia implementation of a BGZF parser. This is achieved by leveraging LibDeflate.jl, and by doing de/compression in a multithreaded and asynchronous manner.
  • Convenient random access with virtual file offsets.
  • Creation of GZI index files directly from compressed bgzipped files.
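As background for the virtual file offsets mentioned above: the BGZF specification defines a virtual offset as a 64-bit value whose upper 48 bits hold the byte offset of a block within the compressed file and whose lower 16 bits hold the offset within that block's decompressed data. The sketch below illustrates that packing only. The function names are hypothetical, and it makes no claim about how this package's `VirtualOffset` type is represented internally.

```julia
# Hypothetical helper: pack a (block offset, in-block offset) pair
# into a 64-bit virtual offset as laid out in the BGZF specification.
function pack_virtual_offset(block_offset::Integer, inblock_offset::Integer)
    0 <= block_offset < (1 << 48) ||
        throw(ArgumentError("block offset must fit in 48 bits"))
    0 <= inblock_offset < (1 << 16) ||
        throw(ArgumentError("in-block offset must fit in 16 bits"))
    return (UInt64(block_offset) << 16) | UInt64(inblock_offset)
end

# Hypothetical helper: split a virtual offset back into its two parts.
unpack_virtual_offset(v::UInt64) = (Int(v >> 16), Int(v & 0xffff))

v = pack_virtual_offset(4096, 25)
println(unpack_virtual_offset(v))
```

Seeking to a virtual offset therefore means seeking the underlying file to the block offset, decompressing that one block, and skipping the in-block offset's worth of bytes.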

API

High level API

  • BGZFDecompressorStream(io::IO; nthreads=Threads.nthreads()) - create a decompressing TranscodingStream.
  • BGZFCompressorStream(io::IO; nthreads=Threads.nthreads(), compresslevel=6) - create a compressing TranscodingStream compressing to level compresslevel.
  • gzi(io::IO) - return a Vector{UInt8} representing the GZI index for a BGZF file io. To be used like this: gzi(open("/path/to/file.bgz"))
  • VirtualOffset(s::BGZFDecompressorStream) - get an object representing the current offset of the stream. You can obtain the block offset and the in-block offset with offsets(v).
  • seek(s::BGZFDecompressorStream, v::VirtualOffset) - seek the stream to the given offset.
  • Because the streams are TranscodingStreams, you can expect the usual IO-related functions to work on them.
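The functions listed above might be combined as in the following sketch. It assumes the names can be imported from the `CodecBGZF` module as listed, and that closing the compressor stream finalizes the file and closes the wrapped file handle, as is usual for TranscodingStreams. The temporary path and the sample data are made up for illustration.

```julia
using CodecBGZF: BGZFCompressorStream, BGZFDecompressorStream, VirtualOffset, gzi

data = repeat("ACGT\n", 100_000)              # sample payload spanning several blocks
path = joinpath(mktempdir(), "example.bgz")   # illustrative temporary file

# Compress: write through the compressing stream, then close it to finish the file.
out = BGZFCompressorStream(open(path, "w"); compresslevel=6)
write(out, data)
close(out)

# Decompress: read some bytes, remember the position, read on, then seek back.
inp = BGZFDecompressorStream(open(path))
head = read(inp, 1000)        # first 1000 decompressed bytes
mark = VirtualOffset(inp)     # position just after `head`
chunk = read(inp, 50)
seek(inp, mark)               # return to the remembered position
rest = read(inp)              # everything from `mark` to the end
close(inp)

# Build a GZI index; `open(f, path)` closes the file once `gzi` returns.
index = open(gzi, path)
```

If the streams behave as described, `vcat(head, rest)` contains the same bytes as `data`, and `rest` begins with `chunk`.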