KDEstimation.jl

Provides a general framework for implementing and performing Kernel Density Estimation
Author m-wells
Started in October 2019

KDEstimation (Kernel Density Estimation)


The purpose of this package is to provide a general framework for implementing Kernel Density Estimation methods.

Univariate KDE

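The density estimator

$$\hat{f}(x) = \frac{1}{nh} \sum_{i=1}^{n} K\left(\frac{x - x_i}{h}\right)$$

where

  • $\hat{f}$ is the estimator
  • $K$ is the kernel function
  • $h$ is the bandwidth
  • $x_1, \dots, x_n$ are the sample points

can be evaluated using one of three implemented methods.

  • Direct()
    • $\mathcal{O}(n^2)$ where $n$ is the sample size
  • Binned()
    • $\mathcal{O}(M^2)$ where $M$ is the number of evaluation points
    • $M = 4096$ by default
  • FFT()
    • $\mathcal{O}(M \log M)$ where $M$ is the number of evaluation points
    • $M = 4096$ by default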
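As a rough illustration of what direct evaluation involves, the sketch below computes the estimator in plain Julia with a hand-written Gaussian kernel. The names `gauss` and `direct_kde` are hypothetical and are not part of KDEstimation.jl; the package's own `Direct()` method may be organized differently.

```julia
# Standard normal density, used here as the kernel K.
gauss(u) = exp(-u^2 / 2) / sqrt(2π)

# Direct evaluation of f̂(x) = 1/(n h) * Σ K((x - xᵢ)/h).
# `direct_kde` is a hypothetical helper, not a KDEstimation.jl function.
function direct_kde(x::Real, data::AbstractVector{<:Real}, h::Real)
    n = length(data)
    return sum(gauss((x - xi) / h) for xi in data) / (n * h)
end

data = [-1.2, -0.4, 0.0, 0.3, 0.9, 1.7]   # made-up sample
h = 0.5                                    # arbitrary bandwidth
grid = range(-6, 6; length=1201)
density = [direct_kde(x, data, h) for x in grid]

# Riemann-sum check that the estimate integrates to about 1.
area = sum(density) * step(grid)
println("area ≈ ", round(area; digits=4))
```

Each evaluation point costs one kernel evaluation per sample point, which is why the direct approach becomes expensive for large samples and why binned or FFT-based evaluation is attractive.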

Multivariate KDE (work in progress)

Kernels implemented

The kernels are described in the Wikipedia article "Kernel (statistics)".

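| Kernel            | Support   |
|-------------------|-----------|
| Biweight          | bounded   |
| Cosine            | bounded   |
| Epanechnikov      | bounded   |
| Logistic          | unbounded |
| Normal            | unbounded |
| SymTriangularDist | bounded   |
| Triweight         | bounded   |
| Uniform           | bounded   |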

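This package uses Distributions.jl to supply kernels such that

$$K_h(u) = \frac{1}{h} K\left(\frac{u}{h}\right) = \mathrm{pdf}(D(0, h), u)$$

where

$$K(u) = \mathrm{pdf}(D(0, 1), u)$$

and $D$ is one of the kernels listed in the table above.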
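The scaling relationship between a kernel and its bandwidth-scaled version can be checked numerically. The sketch below uses a hand-written Epanechnikov kernel as a stand-in for a Distributions.jl `pdf` call so that it runs without any packages; `K` and `K_h` are illustrative names, not identifiers from this package.

```julia
# Epanechnikov kernel on [-1, 1], written by hand as a stand-in for
# the pdf of a location-scale distribution with loc = 0, scale = 1.
K(u) = abs(u) <= 1 ? 0.75 * (1 - u^2) : 0.0

# Scaled kernel: K_h(u) = K(u / h) / h, supported on [-h, h].
K_h(u, h) = K(u / h) / h

h = 0.4
us = range(-1, 1; length=2001)

# Riemann-sum check that the scaled kernel still integrates to about 1.
area = sum(K_h(u, h) for u in us) * step(us)
println("integral of K_h ≈ ", round(area; digits=4))
println("K_h vanishes outside [-h, h]: ", K_h(1.01h, h) == 0.0)
```

Shrinking `h` narrows the kernel and raises its peak by the same factor, so the area under it is unchanged.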

Note: for the Uniform distribution, Distributions.jl defines (loc, scale) = (a, b-a), where a and b are the lower and upper bounds, respectively. This package accounts for this inconsistency when evaluating the Uniform kernel.

Bandwidth selection via Least Squares Cross Validation

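The objective function to minimize is given by

$$\mathrm{LSCV}(h) = \int \hat{f}_h(x)^2 \, dx - \frac{2}{n} \sum_{i=1}^{n} \hat{f}_{h,-i}(x_i)$$

where

$$\hat{f}_{h,-i}(x) = \frac{1}{(n-1)h} \sum_{j \neq i} K\left(\frac{x - x_j}{h}\right)$$

is the leave-one-out estimator, computed without the sample point $x_i$.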

This has also been implemented using Direct, Binned, and FFT methods.
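To make the least squares cross validation criterion concrete, here is a minimal sketch for a Gaussian kernel with direct evaluation. It assumes the standard form of the criterion: the integral of the squared estimate minus twice the mean of the leave-one-out estimates at the sample points. The names `phi` and `lscv_objective` are hypothetical, and the grid search stands in for the optimizer the package actually uses.

```julia
# Normal density with standard deviation s (a Gaussian kernel of bandwidth s).
phi(u, s) = exp(-u^2 / (2 * s^2)) / (s * sqrt(2π))

# LSCV(h) = ∫ f̂(x)² dx - (2/n) Σᵢ f̂₋ᵢ(xᵢ) for a Gaussian kernel.
# `lscv_objective` is a hypothetical name, not the package's function.
function lscv_objective(data::AbstractVector{<:Real}, h::Real)
    n = length(data)
    int_f2 = 0.0   # accumulates n² * ∫ f̂²
    loo = 0.0      # accumulates (n - 1) * Σᵢ f̂₋ᵢ(xᵢ)
    for i in 1:n, j in 1:n
        d = data[i] - data[j]
        # Two Gaussians of width h convolve to one of width sqrt(2) * h,
        # which gives the integral of f̂² in closed form.
        int_f2 += phi(d, sqrt(2) * h)
        if i != j
            loo += phi(d, h)
        end
    end
    return int_f2 / n^2 - 2 * loo / (n * (n - 1))
end

data = [-1.9, -1.1, -0.7, -0.3, -0.1, 0.2, 0.4, 0.8, 1.3, 2.1]  # made-up sample
hs = range(0.05, 2.0; length=200)
scores = [lscv_objective(data, h) for h in hs]
h_best = hs[argmin(scores)]
println("grid-search LSCV bandwidth ≈ ", round(h_best; digits=3))
```

The double loop makes the quadratic cost of direct evaluation explicit. With very small samples such as this one the criterion can be flat or favor large bandwidths, so the result should be read as an illustration only.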

Example usage

using KDEstimation, Distributions
# set a seed for reproducibility
using StableRNGs
rng = StableRNG(1111)
# generate random data
x = randn(rng, 100)
rot = rule_of_thumb2(Normal,x)
println("rule of thumb: ", rot)
lscv_res = lscv(Normal,x,FFT())

Output:

rule of thumb: 0.3955940866915174

LSCV{Normal,FFT(4096),1}
Results of Optimization Algorithm
 * Algorithm: Golden Section Search
 * Search Interval: [0.289408, 0.389348]
 * Minimizer: 3.457826e-01
 * Minimum: -2.834224e-01
 * Iterations: 33
 * Convergence: max(|x - x_upper|, |x - x_lower|) <= 2*(1.5e-08*|x|+2.2e-16): true
 * Objective Function Calls: 34

Visualization using Plots.jl

using Plots; pyplot()
plot(lscv_res)

[Figure: output of plot(lscv_res)]

h = minimizer(lscv_res)
fkde = kde(Normal, h, x, FFT())
frot = kde(Normal, rot, x, FFT())
# these are callable
@show fkde(0.3);
@show frot(-2);

Output:

fkde(0.3) = 0.38237039523949345
frot(-2) = 0.04546902308913938

plot(fkde, label="LSCV", lw=2)
plot!(frot, label="Rule of thumb", lw=2)

[Figure: density estimates for the LSCV and rule-of-thumb bandwidths]

Further Reading

This work has been heavily influenced by Artur Gramacki's book "Nonparametric Kernel Density Estimation and Its Computational Aspects".