This package provides a basic Julia implementation of the Hampel filter [1], which is a robust method for detecting and replacing outliers in a univariate time series. Compared to convolutional filters, the Hampel filter is less likely to smooth edges and better at removing isolated spikes without affecting the rest of the data. At the same time, it can be tuned to be less aggressive than a standard median filter.
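Given the values $x_1, \ldots, x_n$, let $m$ be their median. The Hampel criterion identifies $x_k$ as an outlier if
$$|x_k - m| > t S,$$
where $S$ is a measure of the spread of the values and $t$ is a threshold parameter.
By default, the spread $S$ is the median absolute deviation (MAD), scaled so that it estimates the standard deviation of normally distributed data.
For example: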
julia> using HampelOutliers, Statistics, StatsBase
julia> x = collect(1:11); x[5] = -6;
julia> m, S = median(x), mad(x, normalize=true)
(6.0, 4.447806655516805)
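At the default threshold, the value at index 5, which lies $|x_5 - m| = 12 \approx 2.7S$ from the median, is identified as an outlier: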
julia> findall( Hampel.identify(x) )
1-element Vector{Int64}:
5
If we change the threshold to 3, no values are identified as outliers, because $|x_5 - m| = 12$ is less than $3S \approx 13.3$:
julia> findall( Hampel.identify(x, threshold=3) )
Int64[]
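The criterion in the example above can also be checked by hand. The following is a minimal sketch that uses only the Statistics standard library. The helper name hampel_flags is hypothetical and is not part of the package, and the threshold 2.5 is chosen only for illustration; it is not claimed to be the package default.

```julia
using Statistics

# Hypothetical helper (not the package API): flag values whose distance
# from the median exceeds t times the scaled MAD.
function hampel_flags(x, t)
    scale = 1.4826022185056018          # makes the MAD estimate the normal std
    m = median(x)
    S = scale * median(abs.(x .- m))
    return abs.(x .- m) .> t * S
end

x = collect(1:11); x[5] = -6
flags_loose  = hampel_flags(x, 2.5)     # 12 > 2.5 * 4.4478, so index 5 is flagged
flags_strict = hampel_flags(x, 3)       # 12 < 3 * 4.4478, so nothing is flagged
```

Here findall(flags_loose) returns only index 5, while flags_strict contains no true entries, consistent with the threshold=3 result shown above.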
In the context of a time series, the Hampel criterion and replacement are usually applied over a moving window. In this package, the window length is always odd, and you specify its half-width $K$, so that a full window holds $2K+1$ values. For example:
julia> x = @. cos((0:10) / 5);
julia> x[[5, 6]] .= [9, -3];
julia> findall( Hampel.filter(x, 1) .!= x )
Int64[]
julia> findall( Hampel.filter(x, 2) .!= x )
2-element Vector{Int64}:
5
6
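To make the moving-window idea concrete, here is a minimal sketch of a nonrecursive filter written with only the Statistics standard library. The function name moving_hampel is hypothetical, and three choices are assumptions of the sketch rather than documented package behavior: a threshold of 2, truncated windows at the ends, and replacement of flagged values by the window median.

```julia
using Statistics

# Hypothetical helper (not the package API): nonrecursive moving-window
# Hampel filter. K is the half-width, so interior windows hold 2K+1 values.
# Assumptions: threshold t = 2, windows truncated at the ends, and flagged
# values replaced by the window median.
function moving_hampel(x, K; t=2)
    n = length(x)
    y = float.(x)                       # new array; unflagged values pass through
    for k in 1:n
        w = x[max(1, k - K):min(n, k + K)]
        m = median(w)
        S = 1.4826 * median(abs.(w .- m))
        if abs(x[k] - m) > t * S
            y[k] = m
        end
    end
    return y
end

x = @. cos((0:10) / 5)
x[[5, 6]] .= [9, -3]
changed_K1 = findall(moving_hampel(x, 1) .!= x)
changed_K2 = findall(moving_hampel(x, 2) .!= x)
```

Under these assumptions the sketch changes nothing for half-width 1 and changes indices 5 and 6 for half-width 2. With only three values in the window, the two adjacent spikes inflate the spread enough to hide each other; with five values, the smooth neighbors dominate the median and the MAD.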
One may also specify integer weights for the values in the window. Each value is replicated according to its weight before the median and spread are calculated.
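The effect of integer weights can be illustrated by expanding a window explicitly. This is a sketch of the replication idea only: the helper name replicate and the sample values are hypothetical, and it does not show how weights are passed to the package.

```julia
using Statistics

# Hypothetical helper: expand a window by repeating each value w times.
replicate(vals, wts) = [v for (v, w) in zip(vals, wts) for _ in 1:w]

window  = [0.9, 0.8, 9.0, -3.0, 0.4]
weights = [1, 1, 5, 1, 1]              # illustrative: emphasize the central value
expanded = replicate(window, weights)  # 9 values, five of them equal to 9.0

m_plain    = median(window)            # median of the raw window
m_weighted = median(expanded)          # median after replication
```

In this example the unweighted median is 0.8, so the central value 9.0 looks like an outlier, while the heavily weighted median is 9.0 itself. Placing more weight on the central value therefore makes the filter less inclined to replace it.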
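If replaced values are used immediately in the calculations for following values, the filter is called recursive. That is, if $y$ denotes the filtered sequence and $K$ is the half-width, the window used to compute $y_k$ contains the already-filtered values $y_{k-K}, \ldots, y_{k-1}$ in place of $x_{k-K}, \ldots, x_{k-1}$.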
You can obtain the recursive form by using the mutating Hampel.filter! with the same vector as both arguments, as shown here:
julia> t = 0:40;
julia> x = @. sign(cos(3t)) + 0.1*sin(t/4);
julia> y = Hampel.filter(x, 4); # nonrecursive
julia> count(x .!= y)
8
julia> Hampel.filter!(x, x, 4); # recursive
julia> count(x .!= y)
17
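The difference between the two forms comes down to which array each window is read from. The sketch below makes that explicit on a five-point example. The function hampel_sketch and its recursive keyword are hypothetical, and the threshold of 2, truncated windows, and replacement by the window median are assumptions of the sketch.

```julia
using Statistics

# Hypothetical helper (not the package API). With recursive=true each window
# is read from the output y, so earlier replacements feed into later windows;
# otherwise it is read from the untouched input x.
# Assumptions: t = 2, truncated windows, replacement by the window median.
function hampel_sketch(x, K; t=2, recursive=false)
    n = length(x)
    y = float.(x)                       # new array, initially equal to x
    src = recursive ? y : x
    for k in 1:n
        w = src[max(1, k - K):min(n, k + K)]
        m = median(w)
        S = 1.4826 * median(abs.(w .- m))
        if abs(src[k] - m) > t * S
            y[k] = m
        end
    end
    return y
end

x = [0.0, 1.0, 10.0, 0.0, 1.2]
y_plain = hampel_sketch(x, 1)                  # replaces only the spike at index 3
y_rec   = hampel_sketch(x, 1; recursive=true)  # also replaces index 4
```

In the nonrecursive pass, the window around index 4 still contains the spike 10.0, which inflates the spread and leaves 0.0 alone. In the recursive pass the spike has already been replaced by 1.0, so the same window is [1.0, 0.0, 1.2] and the value 0.0 is now flagged.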
At the ends of the sequence, the window refers to fictitious values that lie outside the sequence. The boundary keyword argument specifies how these situations are handled. The options are:
- :truncate (the default) means that the window is truncated at the ends of the sequence.
- :repeat means that the sequence is extended by repeating the first and last values.
- :reflect means that the sequence is extended by reflecting across the boundaries.
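The three options can be pictured as different ways of mapping out-of-range window indices back into the sequence. The following sketch uses a hypothetical helper, window_values, to show the values a window would see under each option. The exact reflection convention is an assumption: here the mirror is taken about the end value without repeating it, and the package may instead use the variant that duplicates the end value.

```julia
# Hypothetical helper (not the package API): values seen by a window of
# half-width K centered at index k, under each boundary option.
function window_values(x, k, K, boundary)
    n = length(x)
    idx = (k - K):(k + K)
    if boundary == :truncate
        return [x[i] for i in idx if 1 <= i <= n]
    elseif boundary == :repeat
        return [x[clamp(i, 1, n)] for i in idx]
    elseif boundary == :reflect
        # Assumed convention: mirror about the end value without repeating it,
        # so index 0 maps to 2 and index n+1 maps to n-1 (requires K < n).
        mirror = i -> i < 1 ? 2 - i : (i > n ? 2n - i : i)
        return [x[mirror(i)] for i in idx]
    else
        throw(ArgumentError("unknown boundary option: $boundary"))
    end
end

x = [10, 20, 30, 40, 50]
w_truncate = window_values(x, 1, 2, :truncate)   # [10, 20, 30]
w_repeat   = window_values(x, 1, 2, :repeat)     # [10, 10, 10, 20, 30]
w_reflect  = window_values(x, 1, 2, :reflect)    # [30, 20, 10, 20, 30]
```

Truncation shortens the window near the ends, so the median and spread there are computed from fewer values, while the other two options keep the window at its full length of $2K+1$.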
Footnotes
1. J. Astola and P. Kuosmanen, Fundamentals of Nonlinear Digital Filtering (CRC Press, Boca Raton, FL, USA, 1997).