"...where ignorance is bliss, 'tis folly to be wise"
-Thomas Gray, "Ode on a Distant Prospect of Eton College"
TruncatedStreams provides types that meet the following four criteria:
- Inherit from `Base.IO`;
- Transparently pass all basic IO methods through to a wrapped `IO` object, except...
- Lie about `eof`, and...
- Do not read a single byte more from the wrapped `IO` object than what is necessary to determine EOF.
using TruncatedStreams
io = IOBuffer(collect(0x00:0xff))
fixed_io = FixedLengthIO(io, 10) # pretend EOF occurs after the first 10 bytes are read
@assert read(fixed_io) == collect(0x00:0x09)
@assert eof(fixed_io) == true
@assert eof(io) == false # a lie, but a useful one!
@assert peek(io) == 0x0a # read exactly 10 bytes from io and not a byte more
sentinel_io = SentinelIO(io, [0x10, 0x11]) # pretend EOF occurs as soon as the sentinel is read
@assert read(sentinel_io) == collect(0x0a:0x0f)
@assert eof(sentinel_io) == true
@assert eof(io) == false
@assert peek(io) == 0x12 # the sentinel is consumed, but not a byte more
close(io)
Julia basically offers two methods for reading some but not all the bytes from an IO object:
- `read(::IO, ::Integer)`, which reads up to some number of bytes from an IO object, allocating and appending to a `Vector{UInt8}` to hold everything it reads; or
- `readuntil(::IO, ::Vector{UInt8})`, which reads bytes from an IO object until a sentinel vector is found, again allocating and appending to a `Vector{UInt8}` to hold everything it reads.
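For comparison, here is a minimal Base-only sketch that mirrors the TruncatedStreams example above using these two functions. Both calls copy the bytes they read into a newly allocated `Vector{UInt8}`, and `readuntil` (with its default `keep=false`) consumes the sentinel from the stream without including it in the result.

```julia
io = IOBuffer(collect(0x00:0xff))

# read(::IO, ::Integer) copies up to 10 bytes into a freshly allocated Vector{UInt8}
first_ten = read(io, 10)

# readuntil(::IO, ::Vector{UInt8}) copies everything before the sentinel into another
# new Vector{UInt8}; with keep=false (the default) the sentinel is consumed from `io`
# but left out of the returned vector
before_sentinel = readuntil(io, [0x10, 0x11])

# the first byte after the sentinel
next_byte = peek(io)
```

This works fine when each chunk comfortably fits in memory, but every chunk has to be materialized as a vector before you can parse anything out of it.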
But what if you find yourself in the following situation:
- You want to read values of many different types from an IO object.
- You know you can safely read some number of bytes from the IO object (either a fixed number or until some sentinel is reached).
- You do not want to (or cannot) read everything from the IO object into memory at once.
This may seem like a contrived situation, but consider an IO object representing a concatenated series of very large files, like what you might see in a TAR or ZIP archive:
- You want to treat each file in the archive like a file on disk, reading an arbitrary number of values of arbitrary types from the file.
- The file either starts with a header that tells you how many bytes long the file is, or ends with a sentinel that tells you when to stop reading.
- You do not want to (or cannot) read the entire file into memory before parsing.
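To make the bookkeeping concrete, the following sketch uses only Base and a hypothetical layout (an assumption for illustration; neither TAR nor ZIP works exactly like this) in which every entry is a `UInt32` byte count followed by that many bytes of `UInt16` values. The helper `write_entry` is likewise hypothetical. Reading the entries back with plain Base means keeping a running tally of the bytes left in each entry and updating it by hand after every read:

```julia
# Hypothetical layout (illustration only): a UInt32 byte count, then the payload bytes.
function write_entry(io::IO, vals::Vector{UInt16})
    write(io, UInt32(sizeof(vals)))  # sizeof of a Vector gives its size in bytes
    write(io, vals)                  # raw UInt16 values in host byte order
end

archive = IOBuffer()
write_entry(archive, UInt16[1, 2, 3])
write_entry(archive, UInt16[40, 50])
seekstart(archive)

entries = Vector{Vector{UInt16}}()
while !eof(archive)
    remaining = Int(read(archive, UInt32))  # bytes left in the current entry
    vals = UInt16[]
    while remaining > 0
        push!(vals, read(archive, UInt16))
        remaining -= sizeof(UInt16)         # tally must be kept in sync after every read
    end
    push!(entries, vals)
end
```

Every additional type read from an entry is one more place where the tally can drift out of sync with the stream.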
Enter `TruncatedStreams`. This package exports types that inherit from `Base.IO` and wrap other `Base.IO` objects with one purpose in mind: to lie about EOF. This means you can wrap your IO object and blindly read from it until it signals EOF, just like you would any other IO object. And, if the wrapped IO object supports it, you can write to the stream, seek to a position, skip bytes, mark and reset positions, or perform whatever other basic IO operation you can think of. You never have to worry about whether you remembered to add or subtract the right number of bytes from your running tally, or whether your buffered read accidentally captured half of the sentinel at the end.
Abstraction is ignorance, and ignorance is bliss.
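Because the wrappers are ordinary `IO` objects, code written against the generic `IO` interface does not need to know where its input really ends. As a sketch, the hypothetical function `read_all_uint16` below simply reads `UInt16` values until `eof` returns `true`. It is exercised here on a plain `IOBuffer` so that it runs with Base alone; since `FixedLengthIO` and `SentinelIO` inherit from `Base.IO`, the same function should accept either of them unchanged.

```julia
# Hypothetical parser: reads UInt16 values (in host byte order) until the stream
# reports EOF, with no knowledge of where its input "really" ends.
function read_all_uint16(io::IO)
    vals = UInt16[]
    while !eof(io)
        push!(vals, read(io, UInt16))
    end
    return vals
end

buf = IOBuffer(UInt8[0x01, 0x00, 0x02, 0x00, 0x03, 0x00])
parsed = read_all_uint16(buf)  # UInt16[1, 2, 3] on a little-endian machine
```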
using Pkg; Pkg.add("TruncatedStreams")
`FixedLengthIO` wraps an `IO` object and will read from it until a certain number of bytes is read, after which `FixedLengthIO` will act as if it has reached the end of the file:
julia> using TruncatedStreams
julia> io = IOBuffer(collect(0x00:0xff));
julia> fio = FixedLengthIO(io, 10); # Only read the next 10 bytes
julia> read(fio, UInt64) # First 8 bytes
0x0706050403020100
julia> read(fio) # Everything else
2-element Vector{UInt8}:
0x08
0x09
julia> eof(fio) # It's a lie, but it's a useful one!
true
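The core trick behind `FixedLengthIO` can be sketched in a few lines of Base Julia. The type below, `CountedIO`, is hypothetical and is not how TruncatedStreams actually implements `FixedLengthIO`. It relies on the assumption that Base's generic fallbacks for `read(io, n)` and `read(io)` are built on `eof` and `read(io, UInt8)`, which are the only two methods it overrides:

```julia
# A minimal sketch of a fixed-length wrapper using only Base.
# `CountedIO` is a hypothetical name; it is NOT TruncatedStreams' FixedLengthIO.
mutable struct CountedIO{S<:IO} <: IO
    wrapped::S
    remaining::Int  # bytes still allowed to be read through this wrapper
end

# Lie about EOF: report it once the byte budget is spent (or the wrapped stream really ends).
Base.eof(s::CountedIO) = s.remaining <= 0 || eof(s.wrapped)

# Read one byte from the wrapped stream and charge it against the budget.
function Base.read(s::CountedIO, ::Type{UInt8})
    eof(s) && throw(EOFError())
    byte = read(s.wrapped, UInt8)
    s.remaining -= 1
    return byte
end

io = IOBuffer(collect(0x00:0xff))
cio = CountedIO(io, 10)
head = read(cio, 4)  # Base's generic read(::IO, ::Integer) falls back on the methods above
rest = read(cio)     # stops at the fake EOF after the tenth byte
```

Every byte passes through `read(::CountedIO, ::Type{UInt8})`, so the wrapped buffer is left positioned exactly after the tenth byte. Reading one byte at a time keeps the sketch short but is slow; a production wrapper would presumably also override bulk methods such as `Base.unsafe_read`.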
`SentinelIO` wraps an `IO` object and will read from it until a sentinel is found, after which `SentinelIO` will act as if it has reached the end of the file, discarding the sentinel:
julia> using TruncatedStreams
julia> io = IOBuffer(collect(0x00:0xff));
julia> sio = SentinelIO(io, [0x10, 0x11, 0x12]); # Only read until [0x10, 0x11, 0x12] is found
julia> read(sio, UInt64) # First 8 bytes
0x0706050403020100
julia> read(sio) # Everything else
8-element Vector{UInt8}:
0x08
0x09
0x0a
0x0b
0x0c
0x0d
0x0e
0x0f
julia> eof(sio) # It's a lie, but it's a useful one!
true
julia> peek(io) # Note that the sentinel is no longer in the wrapped IO
0x13
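A sentinel wrapper needs a little more machinery, because it cannot know whether the next byte belongs to the sentinel without looking ahead. The hypothetical `UntilIO` below (again a sketch, not the package's implementation) keeps a lookahead buffer exactly as long as the sentinel and reports EOF when that buffer matches. One assumption worth flagging: if the wrapped stream ends before the sentinel appears, this sketch simply treats the real end of the stream as EOF, which may differ from what `SentinelIO` does.

```julia
# A minimal sketch of a sentinel-terminated wrapper using only Base.
# `UntilIO` is a hypothetical name; it is NOT TruncatedStreams' SentinelIO.
mutable struct UntilIO{S<:IO} <: IO
    wrapped::S
    sentinel::Vector{UInt8}
    lookahead::Vector{UInt8}  # bytes read from `wrapped` but not yet returned
end

UntilIO(io::IO, sentinel::AbstractVector{UInt8}) =
    UntilIO(io, Vector{UInt8}(sentinel), UInt8[])

# Top the lookahead buffer up to the sentinel's length (or until the wrapped stream ends).
function fill_lookahead!(s::UntilIO)
    while length(s.lookahead) < length(s.sentinel) && !eof(s.wrapped)
        push!(s.lookahead, read(s.wrapped, UInt8))
    end
    return s
end

# EOF when the lookahead matches the sentinel (already consumed from `wrapped`),
# or when no bytes are left at all.
function Base.eof(s::UntilIO)
    fill_lookahead!(s)
    return isempty(s.lookahead) || s.lookahead == s.sentinel
end

# Hand out the oldest buffered byte.
function Base.read(s::UntilIO, ::Type{UInt8})
    eof(s) && throw(EOFError())
    return popfirst!(s.lookahead)
end

io = IOBuffer(collect(0x00:0xff))
uio = UntilIO(io, [0x10, 0x11, 0x12])
data = read(uio)  # 0x00 through 0x0f; the sentinel itself is not returned
```

The sketch never holds more than `length(sentinel)` unreturned bytes, so once EOF is reported the sentinel has been consumed from `io` and nothing after it has been touched.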