Julia package to determine effective number of uncorrelated data points in a correlated data stream via an O(log N)
online binning algorithm.
This package uses the online logarithmic binning algorithm discussed in Refs. [1] and [2], but uses the numerically stable first-pass pairwise algorithm from Ref. [3] to update the mean and variance accumulators. Importantly, the binning analysis is online in the sense that the whole data stream of size O(N)
need not be stored. Instead, a much smaller data stream of size O(log N)
needs to be. This makes properly assessing the correlated errors generated from Markov Chain Monte Carlo simulations practical to update on-the-fly [4].
First, construct an empty BinningAccumulator
with of T <: Number
parametric type. Let's take the default T = Float64
as an example.
# Initialize a BinningAccumulator{T}
bacc = BinningAccumulator()
We currently only support Float
types, i.e. T <: AbstractFloat
or T
is a Complex{Float#}
. The tested types are listed in OLB_tested_numbers
.
Then, push!
either a single value or a data stream (sequence of values of itr
type) to the BinningAccumulator
. The online analysis will be taken care of automatically.
# push! a single value into the BinningAccumulator
# Values of incorrect type are converted to the correct type internally
push!(bacc, 1)
# push! a data stream into the BinningAccumulator
push!(bacc, [1, 2, 3, 4, 3, 2, 1])
One can then calculate the following statistics from the BinningAccumulator
at any binning level = lvl
:
mean(bacc::BinningAccumulator; level = lvl) # arithmetic mean
var(bacc::BinningAccumulator; level = lvl) # sample variance
std(bacc::BinningAccumulator; level = lvl) # sample standard deviation
var_of_mean(bacc::BinningAccumulator; level = lvl) # variance of the mean
std_error(bacc::BinningAccumulator; level = lvl) # standard error of the mean
The binning level
is optional. By default, the binning level
is set to level = 0
. This level, accessed by bacc[level = 0]
, represents the unbinnned statistics from of the original data stream. The LevelAccumulator
s from any binning level
can also be extracted using the overloaded []
notation as
julia> bacc[level = 0]
LevelAccumulator{Float64} with online fields:
level = 0
num_bins = 6
Taccum = 15.0
Saccum = 5.5
Paccum = PairAccumulator{Float64}(false, [0.0, 1.0])
Calculated Level Statistics:
Current Mean = 2.5
Current Variance = 1.1
Current Std. Deviation = 1.0488088481701516
Current Var. of the Mean = 0.18333333333333335
Current Std. Error = 0.42817441928883765
julia> bacc[level = 1]
LevelAccumulator{Float64} with online fields:
level = 1
num_bins = 2
Taccum = 5.0
Saccum = 2.0
Paccum = PairAccumulator{Float64}(false, [0.0, 2.5])
Calculated Level Statistics:
Current Mean = 2.5
Current Variance = 2.0
Current Std. Deviation = 1.4142135623730951
Current Var. of the Mean = 1.0
Current Std. Error = 1.0
The online binning functionality works by combining the method described in [1], where one keeps track of several accumulators, with the first-pass pairwise algorithm from [3]. The online (i.e. O(1)
) quantities that are obtained from this process are the Tvalue
, Svalue
, mean
or var
iance. These statistics are online in the sense that they are simple function of online accumulators, and so we emphasize their calculation is still amortized with complexity O(1)
. This is despite that the mean
, var
iance, etc. are not updated continuously; only
Using the notation of Ref. [3], the
is given by
where the mean is given by
The variance is then given by
These two quantities can be computed online with the first-pass pairwise algorithm given an additional two elements
where
We implement this with three nested Accumulator
struct
s: the outermost BinningAccumulator
, the middle-level LevelAccumulator
, and the innermost PairAccumulator
. The BinningAccumulator
stores a Vector
of LevelAccumulators
, each of which store their own PairAccumulator
.
The PairAccumulator
struct
is the outward facing element of the BinningAccumulator
in that it takes in data directly from the data stream. After a pair of values has been imported from the stream, then LevelAccumulator
, where the PairAccumulator
is reset!
. At the same time, the outermost BinningAccumulator
passes the PairAccumulator
in the LevelAccumulator
at the next binning level, where the whole process is repeated again, except the accumulated
The fact that the data stream is processed in pairs before being passed along to the other binning levels inherently leads to a bin_depth
given by
-
BinningAnalysis.jl
for a very similar Julia package which served as an inspiration for this one. Our package does not have as broad of a scope as theirs.- One of the authors of
BinningAnalysis.jl
also wrote Ref. [4] which gives a great introduction to the statistical analysis of Monte Carlo data.
- One of the authors of
-
OnlineStats.jl
for many more online statistics and routines, most of which are beyond the scope of this package.
[1] See Section D of M. Wallerberger's Efficient estimation of autocorrelation spectra (2018) at arXiv:1810.05079.
[2] Ambegaokar, V. and Troyer, M. Estimating errors reliably in Monte Carlo simulations of the Ehrenfest model, American Journal of Physics 78, 150-157 (2010) doi.org/10.1119/1.3247985
[3] Chan, T. F., Golub, G. H., & LeVeque, R. J. (1983). Algorithms for Computing the Sample Variance: Analysis and Recommendations. The American Statistician, 37(3), 242–247. doi.org/10.2307/2683386
[4] Bauer, C. Simulating and machine learning quantum criticality in a nearly antiferromagnetic metal. PhD Dissertation, 2020. Specifically Section 2.6.