
This package provides tools to estimate standard errors and autocorrelation times of correlated time series. A typical example is a Markov chain obtained in a Metropolis Monte Carlo simulation.
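For intuition: when samples are correlated, the naive estimate `std(x)/sqrt(N)` underestimates the true uncertainty of the mean. A quick toy illustration (a hypothetical AR(1) series, not part of the package):

```
using Statistics

# Correlated toy series: AR(1) process with coefficient 0.9
x = zeros(10_000)
for i in 2:length(x)
    x[i] = 0.9 * x[i-1] + randn()
end

# The naive standard error treats all samples as independent and comes out
# significantly too small; the binning tools below correct for this by
# grouping correlated neighbors into effectively independent bins.
naive = std(x) / sqrt(length(x))
```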

**Binning tools:**

- Logarithmic Binning
  - Size complexity: `O(log(N))`
  - Time complexity: `O(N)`
- Full Binning (all bin sizes that work out evenly)

**Statistical resampling methods:**

- Jackknife resampling.

As usual, you can install the registered package with

`] add BinningAnalysis`

Note that there is BinningAnalysisPlots.jl, which defines some Plots.jl recipes for `LogBinner` and `FullBinner` to facilitate visualizing the error convergence.
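As a rough sketch of the intended workflow (assuming the recipes hook into `plot` directly, which is the usual convention for Plots.jl recipe packages):

```
using Plots, BinningAnalysis, BinningAnalysisPlots

B = LogBinner(rand(10_000))
plot(B) # error convergence vs. bin size via the recipe (assumed entry point)
```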

## Binning tools

### Logarithmic Binning

```
using BinningAnalysis, Statistics

B = LogBinner()
# By default, 2^32-1 ≈ 4 billion values can be added to the binner. This
# limit can be tuned with the `capacity` keyword argument.
push!(B, 4.2)
append!(B, [1, 2, 3]) # push multiple values at once
x = mean(B)
Δx = std_error(B) # standard error of the mean
tau_x = tau(B) # autocorrelation time

# Alternatively, you can provide a time series directly in the constructor
x = rand(100)
B = LogBinner(x)
Δx = std_error(B)
```
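For intuition: the "logarithmic" refers to the bin sizes 1, 2, 4, 8, ... that the binner tracks, and the error estimate converges once the bin size exceeds the autocorrelation time. A naive reference sketch of the same idea (not the package implementation, which works incrementally in `O(log(N))` memory):

```
using Statistics

# Standard error of the mean at each power-of-two bin size, computed by
# repeatedly averaging neighboring pairs.
function naive_log_binning(x::Vector{Float64})
    errors = Float64[]
    while length(x) >= 2
        push!(errors, sqrt(var(x) / length(x)))
        x = [(x[2i-1] + x[2i]) / 2 for i in 1:length(x) ÷ 2]
    end
    return errors # plateaus once bins are effectively independent
end
```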

### Full Binning

```
B = FullBinner() # <: AbstractVector (lightweight wrapper)
push!(B, 2.0)
append!(B, [1, 2, 3])
x = mean(B)
Δx = std_error(B) # standard error of the mean

# Alternatively, you can provide a time series directly in the constructor
x = rand(100)
F = FullBinner(x)
push!(F, 2.0) # modifies x, as F is just a thin wrapper around it
Δx = std_error(F)
```
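Full binning instead evaluates the standard error for every bin size that divides the series evenly. A naive sketch of the estimate for a single bin size (an illustrative helper, not the package internals):

```
using Statistics

# Standard error of the mean for one bin size: average within bins,
# then treat the bin means as (approximately) independent samples.
function binned_std_error(x::AbstractVector, binsize::Int)
    nbins = length(x) ÷ binsize
    bin_means = [mean(@view x[(i-1)*binsize+1 : i*binsize]) for i in 1:nbins]
    return sqrt(var(bin_means) / nbins)
end
```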

### Incremental Binning

```
# An IncrementBinner stores pushed values in progressively coarser averages,
# starting with no averaging. The number of values averaged per entry
# doubles after every `blocksize` saved entries.
B = IncrementBinner(0.0, blocksize=50)
for x in rand(10_000)
    push!(B, x)
end
# Effective indices of the saved values,
# i.e. [1, 2, ..., 49, 50, 51.5, 53.5, ..., 146.5, 148.5, 151.5, ...]
xs = indices(B)
# The averaged values themselves
ys = values(B)
```
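This yields a logarithmically compressed trace of a long time series (roughly `blocksize` saved entries per doubling of the averaging window), which is convenient for plotting long simulations without storing every sample.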

## Resampling methods

### Jackknife

```
x = rand(100)
xmean, Δx = jackknife(identity, x) # jackknife estimates for the mean and standard error of <x>
# In this example:
# isapprox(Δx, std(x)/sqrt(length(x))) == true
x_inv_mean, Δx_inv = jackknife(identity, 1 ./ x) # jackknife estimates for the mean and standard error of <1/x>

# Multiple time series
x = rand(100)
y = rand(100)
# The inputs of the function `g` must be provided as arguments to `jackknife`.
g(x, y, xy) = x * y / xy # <x><y> / <xy>
g_mean, Δg = jackknife(g, x, y, x .* y)
```
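For reference, the jackknife recomputes the statistic on leave-one-out means and estimates the error from the spread of those recomputations. A naive `O(N^2)` sketch of the textbook estimator for a function of one mean (not the package implementation):

```
using Statistics

# Leave-one-out jackknife for g(<x>): evaluate g on each leave-one-out
# mean, then use the (n-1)/n-scaled spread as the squared standard error.
function naive_jackknife(g, x::AbstractVector)
    n = length(x)
    g_loo = [g(mean(x[1:n .!= i])) for i in 1:n]
    g_hat = mean(g_loo)
    Δg = sqrt((n - 1) / n * sum(abs2, g_loo .- g_hat))
    return g_hat, Δg
end
```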

### Error Propagator

```
# Essentially a LogBinner that can hold multiple variables. Errors can be
# derived for functions which depend on these variables. The memory overhead
# of this type is O(N_args^2 log(N_samples)), making it much cheaper than
# jackknife for few variables.
ep = ErrorPropagator(N_args=1)
push!(ep, rand())
append!(ep, rand(99))

# Mean and error of the (first) input
xmean = mean(ep, 1)
Δx = std_error(ep, 1)

# To compute the mean and error of a function, we need its gradient
f(x) = x.^2
dfdx(x) = 2x
y = mean(ep, f)[1]
Δy = std_error(ep, dfdx)[1]

# Error propagator with multiple variables:
ep = ErrorPropagator(N_args=3)
# Multiple time series
x = rand(100)
y = rand(100)
append!(ep, x, y, x .* y)

# Means and standard errors of the inputs:
xs = means(ep)
Δxs = std_errors(ep)

# Mean and error of a function dependent on x, y, and xy.
# Note that this function takes a vector input.
g(v) = v[1] * v[2] / v[3] # <x><y> / <xy>
dgdx(v) = [v[2]/v[3], v[1]/v[3], -v[1]*v[2]/v[3]^2]
g_mean = mean(ep, g)
Δg = std_error(ep, dgdx)
```
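As the gradient arguments suggest, this is standard first-order error propagation: `Δf ≈ sqrt(∇f(x̄)' * Σ * ∇f(x̄))`, with `Σ` the covariance matrix of the input means. In the one-variable example above this reduces to `Δy ≈ abs(dfdx(xmean)) * Δx`.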

## Convenience wrapper

If you want to calculate the standard error of an existing time series, you can use the convenience wrapper `std_error(x[; method=:log])`. It takes a keyword argument `method`, which can be `:log`, `:full`, or `:jackknife`.

```
ts = rand(1000);
std_error(ts) # default is logarithmic binning
std_error(ts, method=:full)
std_error(ts, method=:jackknife)
```

## Supported types

All statistical tools should work with number-like (`<: Number`) and array-like (`<: AbstractArray`) elements. Regarding complex numbers, we follow base Julia and define `var(x) = var(real(x)) + var(imag(x))`.
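For example, a complex-valued time series can be treated like any other (a minimal sketch):

```
using BinningAnalysis, Statistics

ts = rand(ComplexF64, 1000)
x = mean(ts)       # complex mean
Δx = std_error(ts) # real-valued error, combining real and imaginary parts
```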

If you observe unexpected behavior please file an issue!

## References

- J. Gubernatis, N. Kawashima, and P. Werner, Quantum Monte Carlo Methods: Algorithms for Lattice Models, Cambridge University Press (2016)
- V. Ambegaokar and M. Troyer, Estimating errors reliably in Monte Carlo simulations of the Ehrenfest model, American Journal of Physics 78, 150 (2010)