# FastRunningMedian.jl

This Julia Package allows you to calculate a running median - fast.

## Installation

In Julia, execute:

`]add FastRunningMedian`

## High-level API

#
** FastRunningMedian.running_median** —

*Function*.

`running_median(input, window_size, tapering=:symmetric)`

Run a median filter of `window_size`

over the input array and return the result.

**Taperings**

The tapering decides the behaviour at the ends of the input. All taperings are mirror symmetric with respect to the middle of the input array. The available taperings are:

`:symmteric`

or`:sym`

: Ensure that the window is symmetric around each point of the output array by always growing or shrinking the window by 2. The output has the same length as the input if`window_size`

is odd. If`window_size`

is even, the output has one element less.`:asymmetric`

or`:asym`

: Always adds or removes one element when calculating the next output value. Creates asymmetric windowing at the edges of the array. If the input is N long, the output is N+window_size-1 elements long.`:asymmetric_truncated`

or`:asym_trunc`

: The same as asymmetric, but truncated at beginning and end to match the size of`:symmetric`

.`:none`

or`:no`

: No tapering towards the ends. If the input has N elements, the output is only N-window_size+1 long.

If you choose an even `window_size`

, the elements of the output array lie in the middle between the input elements on a continuous underlying axis.

**Performance**

The underlying algorithm should scale as O(N log w) with the input size N and the window_size w.

## Taperings Visualized

Each data point is shown as a cross and the windows are visualized as colored boxes, the input is grey.

## Performance Comparison

For large window sizes, this package performs even better than calling `runmed`

in R, which uses the Turlach implementation written in C. For small window sizes, the Stuetzle implementation in R still outperforms this package, but the overhead from RCall doesn't seem worth it. If you want to add a fast implementation for small window sizes to this package, feel free to put in a PR or open an issue with any questions you might have.

In contrast to this package, SortFilters.jl supports arbitrary probability levels, for example to calculate quantiles.

You can find the Notebook used to create the above graph in the `benchmark`

folder. I ran it on an i7-2600K with 8 GB RAM while editing and browsing in the background.

## Stateful API

FastRunningMedian provides a stateful API that can be used for streaming data, e. g. to reduce RAM consumption, or build your own high-level API.

#
** FastRunningMedian.MedianFilter** —

*Type*.

`MedianFilter(first_val::T, window_size::Int) where T <: Real`

Construct a stateful running median filter.

Manipulate with `grow!`

, `roll!`

, `shrink!`

. Query with `median`

, `length`

, `window_size`

, `isfull`

.

#
** FastRunningMedian.grow!** —

*Function*.

`grow!(mf::MedianFilter, val)`

Grow mf with the new value `val`

.

Returns the updated median. If mf would grow beyond maximum window size, an error is thrown. In this case you probably wanted to use `roll!`

.

The new element is pushed onto the end of the circular buffer.

#
** FastRunningMedian.roll!** —

*Function*.

`roll!(mf::MedianFilter, val)`

Roll the window over to the next position by replacing the first and oldest element in the ciruclar buffer with the new value `val`

.

Will error when `mf`

is not full yet - in this case you must first `grow!`

mf to maximum capacity.

#
** FastRunningMedian.shrink!** —

*Function*.

`shrink!(mf::MedianFilter)`

Shrinks `mf`

by removing the first and oldest element in the circular buffer.

Returns the updated median. Will error if mf contains only one element as a MedianFilter with zero elements would not have a median.

#
** FastRunningMedian.median** —

*Function*.

`median(mf::MedianFilter)`

Determine the current median in `mf`

.

**Implementation**

If the number of elements in MedianFilter is odd, the low_heap is always one element bigger than the high_heap. The top element of the low_heap then is the median.

If the number of elements in MedianFilter is even, both heaps are the same size and the median is the mean of both top elements.

#
** Base.length** —

*Function*.

`length(mf::MedianFilter)`

Returns the number of elements in the stateful median filter `mf`

.

This number is equal to the length of the internal circular buffer.

#
** FastRunningMedian.window_size** —

*Function*.

`window_size(mf::MedianFilter)`

Returns the window_size of the stateful median filter `mf`

.

This number is equal to the capacity of the internal circular buffer.

#
** FastRunningMedian.isfull** —

*Function*.

`isfull(mf::MedianFilter)`

Returns true, when the length of the stateful median filter `mf`

equals its window_size.

## Sources

W. Hardle, W. Steiger 1995: Optimal Median Smoothing. Published in Journal of the Royal Statistical Society, Series C (Applied Statistics), Vol. 44, No. 2 (1995), pp. 258-264. https://doi.org/10.2307/2986349

(I did not implement their custom double heap, but used two heaps from DataStructures.jl)

## Keywords

Running Median is also known as Rolling Median or Moving Median.