KernelDensitySJ.jl

Bandwidth selection using Sheather and Jones algorithm (sometimes known as width.SJ or bw.SJ)
Author rasmushenningsson
Popularity
1 Star
Updated Last
1 Year Ago
Started In
May 2020

KernelDensitySJ.jl

Build Status Build Status Coverage Coverage

Overview

KernelDensitySJ.jl is a package for Kernel Density Estimation and function estimation by Gaussian Kernel Smoothing.

More specifically, this package implements accurate, robust and reasonably fast algorithms for:

  • Kernel Density Estimation in one dimension using Sheather and Jones algorithm for bandwidth selection, also known as width.SJ or bw.SJ. (The specific version of the algorithm is sometimes referred to as "solve-the-equation".) Only Gaussian kernels are supported.
  • Evaluation of the density.
  • Gaussian Kernel Smoothing, i.e. given a set of points (x and y values), computes a weighted average of the y values with weights coming from gaussian kernels placed at each x value.

Extra care is taken to handle outliers and extreme values for different parameters, ensuring that the results are accurate for a very wide range of inputs.

Installation

Start the Julia REPL, press ']' to enter package mode and type

add KernelDensitySJ

Usage Example

Let's start with some random points on the x-axis for a simple example.

using KernelDensitySJ
px = -1 .+ 2.0.*rand(40)

Now we can estimate the bandwidth

bw = bwsj(px)

and evaluate the density

x = range(-1.0,stop=1.0,length=201)
yd = density(px, bw, x)

density

If we also have y values for the points

py = cos.(π.*px) .+ 0.2.*randn(40)

then we can make a smooth estimate of the underlying function by

ys = smooth(px, py, bw, x)

smooth

The desired precision can be controlled by the rtol and atol keyword arguments.

Here is how it looks like with a couple of different values for the bandwidth: density2 smooth2

Technical Details

The bandwidth is computed by solving equation (12) in [Sheather and Jones, 1991] using root-finding methods. To solve the equation both accurately and fast, this packages computes and successively refines lower and upper bounds of the target function for a given bandwidth until the bounds no longer includes 0, which is enough to bracket the root.

The density and smooth functions use a similar approach by refining upper and lower bounds until the desired tolerance has been achieved. It's common that the weight is very close to zero for points far away from the evaluation point and this strategy ensures that we do not spend unnecessary computing power there, while still having accurate bounds for the error.

The upper and lower bounds are computed by splitting the estimated functions into intervals where the functions are convex/concave and using Jensen's inequality for one direction and a linear approximation for the other.

References

  • Sheather, S. J., & Jones, M. C. (1991). A reliable data‐based bandwidth selection method for kernel density estimation. Journal of the Royal Statistical Society: Series B (Methodological), 53(3), 683-690.

Other Kernel Density Estimation Packages

The following packages also implement Kernel Density Estimation in Julia: