Optimal histogram binning based on piecewise constant model.
Paper: Studies in Astronomical Time Series Analysis. VI. Bayesian Block Representations [https://arxiv.org/abs/1207.5578]
Have you ever hated the default histogram binning rules in your favourite analysis and plotting library?
You can solve your problem by relying on
This package provides the function
bayesian_blocks, which determines the bin sequence that maximises the probability of observing your data, assuming a histogram can describe it.
In other words, the package implements a complicated algorithm that returns the optimal histogram, respecting some simple constraints you can customise.
If you can't take it anymore, go directly to the usage examples or look at the file
make_plot.jl, or if you want to know all the internal details look at
Four ingredients determine the optimal histogram:
- The likelihood of a given bin.
- The apriori probability of observing that bin.
- The maximum resolution at which you want to separate the data (default is
- The minimum possible number of observations in each bin (default is
How can you influence these factors?
It is not modifiable. For a long time now, the Likelihood principle and its natural Bayesian extension have been part of the toolbox of every statistician: it is a solid principle that we can reasonably trust :)
It is modifiable. The package implements a wide choice of possibilities. Indeed you can choose the prior among the following alternatives:
BIC(default): Bayesian information criterion, requires no parameters and is asymptotically consistent.
AIC: Akaike information criterion: minimises prediction error, requires no parameters (in some cases adds too many bins, but the problem can be solved using (3) and (4)).
HQIC: Hannan-Quinn criterion, has intermediate behaviour between BIC and AIC, is close to consistency, and tries to minimise prediction error.
FPR(p): Scargle Criterion, a data bin is added if it has a false positive rate lower than
Geometric(gamma): varying the parameter
gammachanges the average number of bins allowed.
Pearson(p): This is useful when you want bins containing about
Nis the total number of events.
NoPrior: for non-Bayesians, it always requires tuning (3) and (4).
?, You can implement a customised prior by following the examples in the
If you need to alter this parameter, add the keyword argument
resolution = ?to the
bayesian_blocksfunction. A typical value might be
Like (3), you can configure it via
min_counts = ?.
Thank you for reading this (brief?) introductory section.
using Pkg Pkg.add("BayesHistogram")
for looking at every option available type in the repl
using Plots, BayesHistogram X = exp.(randn(5000)./3) bl = bayesian_blocks(X) # plot using "to_pdf" support, density = to_pdf(bl) plot(support, density) # or using "edges" parameter and an histogramming procedure histogram(X, bins=bl.edges, normalize = :pdf)
we can change the prior as follows:
bl = bayesian_blocks(X, prior=AIC(), resolution=40) plot(to_pdf(bl)...) bl = bayesian_blocks(X, prior=FPR(0.2), resolution=40) plot!(to_pdf(bl)...)
we can also plot the errors:
plot(to_pdf(bl)..., color="black") scatter!(bl.centers, bl.heights, yerr = bl.error_heights, color="black")
we can also estimate averages and their errors:
estimate(bl) do v; v end # result: (1.0610279949641472, 0.014646660658986687) # the result can be refined by increasing the number of integration points estimate(bl,100) do v; v^2 end # result: (1.2574274634942957, 0.021142852215391358)