CBOR.jl

A Concise Binary Object Representation (RFC 7049) serialization library in Julia

CBOR.jl is a Julia package for working with the CBOR data format, providing straightforward encoding and decoding for Julia types.

About CBOR

The Concise Binary Object Representation (CBOR) is a data format based on an extension of the JSON data model. Its stated design goals include small code size, small message size, and extensibility without the need for version negotiation. The format is formally defined in RFC 7049.

Usage

Add the package

using Pkg
Pkg.add("CBOR")

and load the module

using CBOR

Encoding and Decoding

Encoding and decoding follow the simple pattern

bytes = encode(data)

data = decode(bytes)

where bytes is of type Array{UInt8, 1}. The data returned from decode() usually has the same type as the value passed into encode(); when the type differs (a Tuple, for example, decodes to an Array{Any, 1}), it still contains the original data.

Primitive Integers

All Signed and Unsigned types, except Int128 and UInt128, are encoded as CBOR Type 0 (non-negative values) or Type 1 (negative values)

> encode(21)
1-element Array{UInt8,1}: 0x15

> encode(-135713)
5-element Array{UInt8,1}: 0x3a 0x00 0x02 0x12 0x20

> bytes = encode(typemax(UInt64))
9-element Array{UInt8,1}: 0x1b 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff

> decode(bytes)
18446744073709551615
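
The byte patterns above follow from the CBOR head layout: the top three bits of the first byte carry the major type, and the low five bits carry either the value itself (below 24) or a marker for a 1, 2, 4 or 8 byte big-endian argument. A negative integer n is stored as -1 - n under Type 1. The following is a minimal sketch of that rule using only Base Julia; cbor_head, big_endian and encode_int are hypothetical helper names chosen for illustration, not part of CBOR.jl, and the package's own implementation may differ.

```julia
# Sketch of CBOR integer encoding (major types 0 and 1).
# All function names here are hypothetical, not CBOR.jl API.

# Split an unsigned value into big-endian bytes.
big_endian(u::Unsigned) = [UInt8((u >> (8 * i)) & 0xff) for i in (sizeof(u) - 1):-1:0]

# Build the head: 3-bit major type plus the argument in its shortest form.
function cbor_head(major::Integer, arg::UInt64)
    mt = UInt8(major) << 5
    if arg < 24
        return UInt8[mt | UInt8(arg)]            # value fits in the first byte
    elseif arg <= typemax(UInt8)
        return UInt8[mt | 0x18, UInt8(arg)]      # 1-byte argument
    elseif arg <= typemax(UInt16)
        return vcat(mt | 0x19, big_endian(UInt16(arg)))
    elseif arg <= typemax(UInt32)
        return vcat(mt | 0x1a, big_endian(UInt32(arg)))
    else
        return vcat(mt | 0x1b, big_endian(arg))  # 8-byte argument
    end
end

# Type 0 for non-negative values, Type 1 (storing -1 - n) for negative ones.
function encode_int(n::Integer)
    if n >= 0
        return cbor_head(0, UInt64(n))
    else
        return cbor_head(1, UInt64(-1 - n))
    end
end
```

With these definitions, encode_int(21), encode_int(-135713) and encode_int(typemax(UInt64)) produce the same bytes as the three encode calls shown above.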

Byte Strings

An AbstractVector{UInt8} is encoded as CBOR Type 2

> encode(UInt8[x*x for x in 1:10])
11-element Array{UInt8, 1}: 0x4a 0x01 0x04 0x09 0x10 0x19 0x24 0x31 0x40 0x51 0x64

Strings

Strings are encoded as CBOR Type 3

> encode("Valar morghulis")
16-element Array{UInt8,1}: 0x6f 0x56 0x61 0x6c 0x61 ... 0x68 0x75 0x6c 0x69 0x73

> bytes = encode("אתה יכול לקחת את סוס אל המים, אבל אתה לא יכול להוכיח שום דבר אמיתי")
119-element Array{UInt8,1}: 0x78 0x75 0xd7 0x90 0xd7 ... 0x99 0xd7 0xaa 0xd7 0x99

> decode(bytes)
"אתה יכול לקחת את סוס אל המים, אבל אתה לא יכול להוכיח שום דבר אמיתי"

Floats

Float64, Float32 and Float16 are encoded as CBOR Type 7

> encode(1.23456789e-300)
9-element Array{UInt8, 1}: 0xfb 0x01 0xaa 0x74 0xfe 0x1c 0x13 0x2c 0x0e

> bytes = encode(Float32(pi))
5-element Array{UInt8, 1}: 0xfa 0x40 0x49 0x0f 0xdb

> decode(bytes)
3.1415927f0
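
In the outputs above, the first byte names the float width (0xf9 for Float16, 0xfa for Float32, 0xfb for Float64) and the remaining bytes are the IEEE 754 bit pattern in big-endian order. A minimal sketch of that layout in Base Julia follows; encode_float and big_endian are hypothetical names used only for illustration and are not CBOR.jl functions.

```julia
# Sketch of the CBOR float layout (major type 7).
# `encode_float` and `big_endian` are hypothetical helpers, not CBOR.jl API.

# Split an unsigned value into big-endian bytes.
big_endian(u::Unsigned) = [UInt8((u >> (8 * i)) & 0xff) for i in (sizeof(u) - 1):-1:0]

# One method per width: marker byte, then the raw IEEE 754 bits.
encode_float(x::Float16) = vcat(0xf9, big_endian(reinterpret(UInt16, x)))
encode_float(x::Float32) = vcat(0xfa, big_endian(reinterpret(UInt32, x)))
encode_float(x::Float64) = vcat(0xfb, big_endian(reinterpret(UInt64, x)))
```

Under these assumptions, encode_float(Float32(pi)) yields the same five bytes as the encode(Float32(pi)) call above.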

Arrays

AbstractVector and Tuple types, except of course AbstractVector{UInt8}, are encoded as CBOR Type 4

> bytes = encode((-7, -8, -9))
4-element Array{UInt8, 1}: 0x83 0x26 0x27 0x28

> decode(bytes)
3-element Array{Any, 1}: -7 -8 -9

> bytes = encode(["Open", 1, 4, 9.0, "the pod bay doors hal"])
39-element Array{UInt8, 1}: 0x85 0x64 0x4f 0x70 0x65 ... 0x73 0x20 0x68 0x61 0x6c

> decode(bytes)
5-element Array{Any, 1}: "Open" 1 4 9.0 "the pod bay doors hal"

> bytes = encode([log2(x) for x in 1:10])
91-element Array{UInt8, 1}: 0x8a 0xfb 0x00 0x00 0x00 ... 0x4f 0x09 0x79 0xa3 0x71

> decode(bytes)
10-element Array{Any, 1}: 0.0 1.0 1.58496 2.0 2.32193 2.58496 2.80735 3.0 3.16993 3.32193

Maps

An AbstractDict type is encoded as CBOR Type 5

> d = Dict()
> d["GNU's"] = "not UNIX"
> d[Float64(ℯ)] = [2, "+", 0.718281828459045]

> bytes = encode(d)
38-element Array{UInt8, 1}: 0xa2 0x65 0x47 0x4e 0x55 ... 0x28 0x6f 0x8a 0xd2 0x56

> decode(bytes)
Dict{Any,Any} with 2 entries:
  "GNU's"           => "not UNIX"
  2.718281828459045 => Any[0x02, "+", 0.718281828459045]

Tagging

To tag one of the above types, encode a Tag whose first argument is a non-negative integer and whose second argument is the data you want to tag.

> bytes = encode(Tag(80, "web servers"))

> data = decode(bytes)
0x50=>"web servers"

There exists an IANA registry which assigns certain meanings to tags; for example, a string tagged with a value of 32 is to be interpreted as a Uniform Resource Identifier (URI). To decode a tagged CBOR data item and automatically interpret the meaning of the tag, use decode_with_iana.

For example, a Julia BigInt type is encoded as an Array{UInt8, 1} containing the bytes of its hexadecimal representation, and tagged with a value of 2 (non-negative) or 3 (negative)

> b = BigInt(factorial(20))
2432902008176640000

> bytes = encode(b * b * -b)
34-element Array{UInt8,1}: 0xc3 0x58 0x1f 0x13 0xd4 ... 0xff 0xff 0xff 0xff 0xff

To decode bytes without interpreting the meaning of the tag, use decode

> decode(bytes)
0x03 => UInt8[0x96, 0x58, 0xd1, 0x85, 0xdb .. 0xff 0xff 0xff 0xff 0xff]

To decode bytes and to interpret the meaning of the tag, use decode_with_iana

> decode_with_iana(bytes)
-14400376622525549608547603031202889616850944000000000000
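
RFC 7049 describes the bignum layout as a tag byte (0xc2 for tag 2, 0xc3 for tag 3) followed by a byte string holding the magnitude in big-endian order; for a negative n the stored magnitude is -1 - n. The sketch below shows that layout in Base Julia. encode_bignum is a hypothetical helper written for illustration, it is limited to payloads of at most 255 bytes, and it is not claimed to match the package's internal code.

```julia
# Sketch of the RFC 7049 bignum layout (tags 2 and 3).
# `encode_bignum` is a hypothetical helper, not CBOR.jl API.
function encode_bignum(n::BigInt)
    tag = n >= 0 ? 0xc2 : 0xc3           # major type 6 with tag number 2 or 3
    m = n >= 0 ? n : -1 - n              # negative values store -1 - n
    payload = UInt8[]
    while m > 0                          # peel off bytes, most significant first
        pushfirst!(payload, UInt8(m & 0xff))
        m >>= 8
    end
    len = length(payload)
    len <= 255 || error("payload too long for this sketch")
    # Byte string head: short lengths fit in the first byte.
    head = len < 24 ? UInt8[0x40 | UInt8(len)] : UInt8[0x58, UInt8(len)]
    return vcat(tag, head, payload)
end
```

The RFC's own test vectors, 2^64 under tag 2 and -2^64 - 1 under tag 3, both come out as a nine-byte payload of 0x01 followed by eight zero bytes.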

Currently, only BigInt is supported for automatically tagged encoding and decoding; more Julia types will be added in the future.

Composite Types

A generic DataType that isn't one of the above types is encoded through encode using reflection. This is supported only if all of the fields of the type belong to one of the above types.

For example, say you have a user-defined type Point

mutable struct Point
    x::Int64
    y::Float64
    space::String
end

point = Point(1, 3.4, "Euclidean")

When point is passed into encode, it is first converted to a Dict containing the symbolic names of its fields as keys associated to their respective values and a "type" key associated to the type's symbolic name, like so

Dict{Any, Any} with 4 entries:
  "x"     => 0x01
  "type"  => "Point"
  "y"     => 3.4
  "space" => "Euclidean"

The Dict is then encoded as CBOR Type 5.
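
The conversion described above can be approximated with Julia's reflection functions fieldnames and getfield. This is a minimal sketch, assuming field names and the type name become String keys as in the Dict shown; struct_to_dict is a hypothetical name and the package's actual conversion may differ in detail.

```julia
# Sketch of turning a struct into a Dict via reflection.
# `struct_to_dict` is a hypothetical helper, not CBOR.jl API.
mutable struct Point
    x::Int64
    y::Float64
    space::String
end

function struct_to_dict(obj)
    T = typeof(obj)
    # Start with the "type" entry holding the type's symbolic name.
    d = Dict{Any, Any}("type" => string(nameof(T)))
    for name in fieldnames(T)
        d[string(name)] = getfield(obj, name)   # field name => field value
    end
    return d
end

point_dict = struct_to_dict(Point(1, 3.4, "Euclidean"))
```

Here point_dict has four entries: the three fields plus the "type" key.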

Indefinite length collections

To encode a collection of indefinite length, wrap any iterator in the CBOR.UndefLength type. The iterator's eltype determines whether the result is an indefinite-length byte string, text string, map, or array, so make sure the iterator reports its eltype. The eltype mapping is:

Vector{UInt8} -> bytestring
String -> text string
Pair -> Dict
Any -> List

If the eltype is unknown, but you still want to enforce it, use this constructor:

CBOR.UndefLength{String}(iter)

First, create a Julia iterator of unknown length

function producer(ch::Channel)
    for i in 1:10
        put!(ch,i*i)
    end
end
iter = Channel(producer)

Then encode it with UndefLength

> bytes = encode(UndefLength(iter))
18-element Array{UInt8, 1}: 0x9f 0x01 0x04 0x09 0x10 ... 0x18 0x51 0x18 0x64 0xff

> decode(bytes)
[1, 4, 9, 16, 25, 36, 49, 64, 81, 100]
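
The encoded bytes above show the indefinite-length framing: an opening byte 0x9f (Type 4 with no length), each item encoded in turn, and a closing 0xff break byte. Because no length is written up front, items can be emitted as the iterator yields them. The sketch below reproduces that framing in Base Julia for small non-negative integers only; encode_stream and squares_producer are hypothetical names, not CBOR.jl functions.

```julia
# Sketch of an indefinite-length array: 0x9f, each item, then the 0xff break byte.
# `encode_stream` is a hypothetical helper; it only handles integers in 0:255.
function encode_stream(iter)
    out = UInt8[0x9f]                   # major type 4, indefinite length
    for item in iter                    # items are written as they arrive
        0 <= item <= 255 || error("item out of range for this sketch")
        item >= 24 && push!(out, 0x18)  # marker: a one-byte argument follows
        push!(out, UInt8(item))
    end
    push!(out, 0xff)                    # break byte ends the array
    return out
end

# A producer of unknown length, in the same style as the example above.
function squares_producer(ch::Channel)
    for i in 1:10
        put!(ch, i * i)
    end
end

stream_bytes = encode_stream(Channel(squares_producer))
```

For the ten squares, stream_bytes has the same 18 bytes as the encode(UndefLength(iter)) output shown above.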

When encoding an indefinite-length Map, either produce the key and then the value for each key-value pair, or produce Pairs.

function cubes(ch::Channel)
    for i in 1:10
        put!(ch, i)       # key
        put!(ch, i*i*i)   # value
    end
end

> bytes = encode(UndefLength{Pair}(Channel(cubes)))
34-element Array{UInt8, 1}: 0xbf 0x01 0x01 0x02 0x08 ... 0x0a 0x19 0x03 0xe8 0xff

> decode(bytes)
Dict(7=>343,4=>64,9=>729,10=>1000,2=>8,3=>27,5=>125,8=>512,6=>216,1=>1)

Note that when an indefinite length CBOR Type 2 or Type 3 is decoded, the result is a concatenation of the individual elements.

function producer(ch::Channel)
    for c in ["F", "ire", " ", "and", " ", "Blo", "od"]
        put!(ch,c)
    end
end

> bytes = encode(UndefLength{String}(Channel(producer)))
23-element Array{UInt8, 1}: 0x7f 0x61 0x46 0x63 0x69 ... 0x6f 0x62 0x6f 0x64 0xff

> decode(bytes)
"Fire and Blood"

Caveats

Encoding UInt128 and Int128 values isn't supported; use a BigInt instead.

Decoding CBOR data that isn't well-formed is unpredictable.