## LinLogQuantization.jl

Linear and logarithmic quantization for Julia arrays
Author milankl
Popularity
3 Stars
Updated Last
1 Year Ago
Started In
March 2021

# LinLogQuantization.jl

Linear and logarithmic quantisation for Julia arrays into 8, 16, 24 or 32-bit. Quantisation is a lossy compression method that divides the range of values in an array in equi-distant quantums and encodes those from 0 to `2^n-1` where `n` is the number of bits available. The quantums are either equi-distant in linear space or in logarithmic space, which has a denser encoding for values close to the minimum in trade-off with a less dense encoding close to the maximum.

Linear quantization takes values in (-∞,∞) (no `NaN` or `Inf`) logarithmic quantization is only supported for values in [0,∞).

## Usage: Linear quantization

Linear quantisation of n-dimensional arrays (any number format that can be converted to `Float64` is supported, including `Float32, Float16`) into 8, 16, 24 or 32 bit is achieved via

```julia> A = rand(Float32,1000)
julia> L = LinQuant8Array(A)
1000-element LinQuantArray{UInt8,1}:
0xc2
0x19
0x3e
0x5b
⋮```

and similarly with `LinQuant16Array, LinQuant24Array, LinQuant32Array`. Decompression via

```julia> Array(L)
1000-element Array{Float32,1}:
0.76074356
0.09858093
0.24355145
0.357177
⋮```

`Array{T}()` optionally takes a type parameter `T` such that decompression to other number formats than the default `Float32` is possible involves a rounding error which follows a round-to-nearest in linear space.

### Logarithmic quantisation

In a similar way, `LogQuant8Array, LogQuant16Array, LogQuant24Array, LogQuant32Array` compresses an n-dimensional array (non-negative elements only) via logarithmic quantisation.

```julia> A = rand(Float32,100,100)
julia> A[1,1] = 0
julia> L = LogQuant16Array(A)
100×100 LogQuantArray{UInt16,2}:
0x0000  0xf22d  0xfdf6  0xf3e8  0xf775  …
0xe3dc  0xfdc0  0xedb5  0xed47  0xee5b
0xde3d  0xbe58  0xb541  0xf573  0x9885
0xf38b  0xfefe  0xea2f  0xfbb6  0xf0d2
0xd0d2  0xfe1f  0xff60  0xf6cd  0xec26
0xffa6  0xe621  0xf14d  0xfb2c  0xf50c  …
0xfcb7  0xe6fb  0xf237  0xecd5  0xfb0a
0xe4ed  0xf86f  0xf83d  0xff86  0xb686
⋮                                  ⋱```

Exception occurs for 0, which is mapped to `0x0`. `Ox1` to `0xff...ff` are then the available bitpatterns to encode the range from `minimum(A)` to `maximum(A)` logarithmically. By default the rounding mode for logarithmic quantisation is round-to-nearest in linear space. Alternatively, a second argument can be either `:linspace` or `:logspace`, which allows for round-to-nearest in logarithmic space. Decompression as with linear quantisation via the `Array()` function.

## Theory

To compress an array `A`, the minimum and maximum is obtained

```Amin = minimum(A)
Amax = maximum(A)```

which allows the calculation of `Δ`, the inverse of the spacing between two quantums

`Δ = (2^n-1)/(Amax-Amin)`

where `n` is the number of bits used for quantisation. For every element `a` in `A` the corresponding quantum `q` which is closest in linear space is calculated via

`q = T(round((a-Amin)*Δ))`

where `round` is the round-to-nearest function for integers and `T` the conversion function to 24-bit unsigned integers `UInt24` (or `UInt8, UInt16` for other choices of `n`). Consequently, an array of all `q` and `Amin,Amax` have to be stored to allow for decompression, which is obtained by reversing the conversion from `a` to `q`. Note that the rounding error is introduced as the `round` function cannot be inverted.

Logarithmic quantisation distributes the quantums logarithmically, such that more bitpatterns are reserved for values close to the minimum and fewer close to the maximum in `A`. Logarithmic quantisation can be generalised to negative values by introducing a sign-bit, however, we limit our application here to non-negative values. We obtain the minimum and maximum value in `A` as follows

```Alogmin = log(minpos(A))
Alogmax = log(maximum(A))```

where zeros are ignored in the `minpos` function, which instead returns the smallest positive value. The inverse spacing `Δ` is then

`Δ = (2^n-2)/(logmax-logmin)`

Note, that only `2^n-1` (and not 2^n as for linear quantisation) bitpatterns are used to resolve the range between minimum and maximum, as we want to reserve the bitpattern `0x000000` for zero. The corresponding quantum `q` for `a` `A` is then

`q = T(round(c + Δ*log(a)))+0x1`

unless `a=0` in which case `q=0x000000`. The constant `c` can be set as `-Alogmin*Δ` such that we obtain essentially the same compression function as for linear quantisation, except that every element `a` in `A` is converted to their logarithm first. However, rounding to nearest in logarithmic space will therefore be achieved, which is a biased rounding mode, that has a bias away from zero. We can correct this round-to-nearest in logarithmic space rounding mode with

`c = 1/2 - Δ*log(minimum(A)*(exp(1/Δ)+1)/2)`

which yields round-to-nearest in linear space. See next section.

## Round to nearest in linear or logarithmic space

For a logarithmic integer system with base `b` (i.e. only `0,b,b²,b³,...` are representable), for example, we have

```log_b(1) = 0
log_b(√b) = 0.5
log_b(b) = 1
log_b(√b³) = 1.5
log_b(b²) = 2```

such that `q*√b` is always halfway between two representable numbers `q,q2` in logarithmic space, which will be the threshold for round up or down in the `round` function. `q*√b` is not halfway in linear space, which is always at `q + (q*b - q)/2`. For simplicity we can set `q=1`, and for `b=2` we find that

`√2 = 1.41... != 1.5 = 1 + (2-1)/2`

Round-to-nearest in log-space therefore rounds the values between 1.41... and 1.5 to 2, which will introduce an away-from-zero bias. As halfway in log-space is reached by multiplication with `√b`, this can be corrected to halfway in linear space by adding a constant `c_b` in log-space, such that conversion from halfway in linear space, i.e. `1+(b-1)/2` should yield halway in log-space, i.e. 0.5

`c_b + log_b(1+(b-1)/2) = 0.5`

So, for `b=2` we have `c_b = 0.5 - log2(1.5) ≈ -0.085`. Hence, a small number will be subtracted before rounding is applied to reduce the away-from-zero bias.

Figure A1. Schematic to illustrate round-to-nearest in linear vs logarithmic space for logarithmic number systems.

We now generalise the logarithmic system, such that the distance `dlog = 1/Δ` between two representable numbers (i.e. quantums) is not necessarily 1 (in log-space) and we allow for an offset as done in the logarithmic quantisation. Let `min` be the offset (i.e. the minimum of the uncompressed array) and `dlin` the spacing between the first two representable quantums `min,q2`. Then the logarithm of halfway in linear space, `log_b(min + dlin/2)`, should map to `0.5`.

`c_b + (log_b(min + dlin/2) - log_b(min))/dlog = 0.5`

With `dlin = b^(log_b(min) + dlog) - min` this can be transformed into

`c_b = 1/2 - 1/dlog*log_b((b^dlog + 1)/2)`

and combined with the offset correction `-log_b(min)*Δ` to form either

```c = -log(min)*Δ,   (round-to-nearest in log-space)
c = 1/2 - Δ*log(minimum(A)*(exp(1/Δ)+1)/2)    (round-to-nearest in linear-space)```

with `b = ℯ`, so that only the natural logarithm has to be computed for every element in the uncompressed array.

## Benchmarking

Approximate throughputs are (via `@btime`)

Method 8 bit 16 bit 24 bit 32 bit
Linear
compression 1350 MB/s 1350 MB/s 50 MB/s 1350 MB/s
decompression 4700 MB/s 4700 MB/s 4000 MB/s 3600 MB/s
Logarithmic
compression 285 MB/s 285 MB/s 40 MB/s 285 MB/s
decompression 250 MB/s 250 MB/s 250 MB/s 500 MB/s

24-bit quantisation is via `UInt24` from the `BitIntegers` package, which introduces a drastic slow-down.

## Installation

LinLogQuantization.jl is registered, so just do

`julia>] add LinLogQuantization`

where `]` opens the package manager

### Used By Packages

No packages found.