FastRunningMedian.jl

Efficient running median for Julia
Author Firionus
Popularity
6 Stars
Updated Last
1 Year Ago
Started In
September 2020

FastRunningMedian.jl

This Julia Package allows you to calculate a running median - fast.

Installation

In Julia, execute:

]add FastRunningMedian

High-level API

# FastRunningMedian.running_medianFunction.

running_median(input, window_size, tapering=:symmetric)

Run a median filter of window_size over the input array and return the result.

Taperings

The tapering decides the behaviour at the ends of the input. All taperings are mirror symmetric with respect to the middle of the input array. The available taperings are:

  • :symmteric or :sym: Ensure that the window is symmetric around each point of the output array by always growing or shrinking the window by 2. The output has the same length as the input if window_size is odd. If window_size is even, the output has one element less.
  • :asymmetric or :asym: Always adds or removes one element when calculating the next output value. Creates asymmetric windowing at the edges of the array. If the input is N long, the output is N+window_size-1 elements long.
  • :asymmetric_truncated or :asym_trunc: The same as asymmetric, but truncated at beginning and end to match the size of :symmetric.
  • :none or :no: No tapering towards the ends. If the input has N elements, the output is only N-window_size+1 long.

If you choose an even window_size, the elements of the output array lie in the middle between the input elements on a continuous underlying axis.

Performance

The underlying algorithm should scale as O(N log w) with the input size N and the window_size w.

source

Taperings Visualized

Each data point is shown as a cross and the windows are visualized as colored boxes, the input is grey.

Tapering Examples

Performance Comparison

Benchmark Comparison

For large window sizes, this package performs even better than calling runmed in R, which uses the Turlach implementation written in C. For small window sizes, the Stuetzle implementation in R still outperforms this package, but the overhead from RCall doesn't seem worth it. Development of a fast implementation for small window sizes is ongoing, see the corresponding issues for details.

In contrast to this package, SortFilters.jl supports arbitrary probability levels, for example to calculate quantiles.

You can find the Notebook used to create the above graph in the benchmark folder. I ran it on an i7-2600K with 8 GB RAM while editing and browsing in the background.

Stateful API

FastRunningMedian provides a stateful API that can be used for streaming data, e. g. to reduce RAM consumption, or build your own high-level API.

# FastRunningMedian.MedianFilterType.

MedianFilter(first_val::T, window_size::Int) where T <: Real

Construct a stateful running median filter.

Manipulate with grow!, roll!, shrink!. Query with median, length, window_size, isfull.

source

# FastRunningMedian.grow!Function.

grow!(mf::MedianFilter, val)

Grow mf with the new value val.

If mf would grow beyond maximum window size, an error is thrown. In this case you probably wanted to use roll!.

The new element is pushed onto the end of the circular buffer.

source

# FastRunningMedian.roll!Function.

roll!(mf::MedianFilter, val)

Roll the window over to the next position by replacing the first and oldest element in the ciruclar buffer with the new value val.

Will error when mf is not full yet - in this case you must first grow! mf to maximum capacity.

source

# FastRunningMedian.shrink!Function.

shrink!(mf::MedianFilter)

Shrinks mf by removing the first and oldest element in the circular buffer.

Will error if mf contains only one element as a MedianFilter with zero elements would not have a median.

source

# FastRunningMedian.medianFunction.

median(mf::MedianFilter)

Determine the current median in mf.

Implementation

If the number of elements in MedianFilter is odd, the low_heap is always one element bigger than the high_heap. The top element of the low_heap then is the median.

If the number of elements in MedianFilter is even, both heaps are the same size and the median is the mean of both top elements.

source

# Base.lengthFunction.

length(mf::MedianFilter)

Returns the number of elements in the stateful median filter mf.

This number is equal to the length of the internal circular buffer.

source

# FastRunningMedian.window_sizeFunction.

window_size(mf::MedianFilter)

Returns the window_size of the stateful median filter mf.

This number is equal to the capacity of the internal circular buffer.

source

# FastRunningMedian.isfullFunction.

isfull(mf::MedianFilter)

Returns true, when the length of the stateful median filter mf equals its window_size.

source

Sources

W. Hardle, W. Steiger 1995: Optimal Median Smoothing. Published in Journal of the Royal Statistical Society, Series C (Applied Statistics), Vol. 44, No. 2 (1995), pp. 258-264. https://doi.org/10.2307/2986349

(I did not implement their custom double heap, but used two heaps from DataStructures.jl)

Keywords

Running Median is also known as Rolling Median or Moving Median.

Used By Packages