# Motif recognition

Appveyor |
---|

This module proposes a detection algorithm based on JEREMY BUHLER and MARTIN TOMPA's paper "*Finding Motifs Using Random Projections*" which is well adapted for categorical time-series. It is now part of the package CategoricalTimeSeries.jl

The algorithm although very precise is not exact. Therefore, when you are done detecting potential motifs with the `detect_motifs`

function, you can refine your results with `find_motifs`

for an exact search.

For a quick overview, have a look at the **examples** provided **at the end of this readme**.

The main functions return instances of a class called **pattern**:

**pattern — Class**

A class storing useful information about found motifs in a time-series. An array of `pattern`

instances is returned when the searching algorithm is done running.

Attributes:

**shape**(Array{Any,1}): Array containing the shape (or contour) of the first found repetition of the motif.**repetitions**(Array{Array{Any,1},1}): all the different shapes from the motif's repetitions, they can vary a bit from one to the next.**positions**(Array{Int,1}): the positions at which the different repetitions of the motif were found.

## Main functions

**detect_motifs**

`detect_motifs(ts, w, d, t = w - d; iters = 1000, tolerance = 0.95)`

Detects all motifs of length 'w' occuring more often than chance, being identical to each other up to 'd' differences inside of imput time-series 'ts'. 'ts' can be a single vector containing one timeseries, a vector of vectors or a matrix containing several timeseries.
Returns an array of `pattern`

, inside of which the patterns are classified by how frequently they are observed. The first elements is therefore the most frequently observed motif, and so on.

Parameters:

ts(Array{Any,1}, Array{Array{Any,1}, 1}, Array{Any, 2}): input time-series in which motifs are searched for.w(Int): length of motifs to look for.d(Int): allowed errors (differences) between motifs repetitions.t = w - d(Int): size of the masks to use for random projection in the detection (defaults to w - d).iters = 1000(Int): the numbers of iterations for the random projection process (defaults to 1000)tolerance = 0.95(Float): threshold of motif identification. If set to 1, only matrix entries that are strictly superior to the (probabilistic) threshold are taken into account. Defaults to 0.7, meaning that matrix entries need to be bigger than 0.7*threshold.

Returns:

motifs: list of`pattern`

instances sorted by frequency of occurence. motifs[1] is therefore the most frequent motif, motifs[2] the second most observed and so on.

**find_motifs**

`find_motifs(ts, shape, d)`

Given a motif of shape 'shape' (array{any,1}), looks for all the repetitions of it which differ only up to 'd' differences inside of the input time-series 'ts'. Input:

Parameters:

ts(Array{Any,1}) : time-series in which to look for motifsshape(Array{Any,1}): shape (aray{any,1}) of the motif to look for.d(Int): allowed errors (differences) between motifs

Returns:

motif: an instance of`pattern`

containing the found repetition of the input 'shape'.

## Plotting

To help visualize results, two simple plotting functions are provided.

**plot_motif**

`plot_motif(m::pattern)`

Plots all repetitions of an input `pattern`

instance on top of each other to see how similar they are to each other.

Parameters:

m: Instance of the`pattern`

class

**plot_motif**

`plot_motif(m::pattern, ts)`

Plots all repetitions of an input `pattern`

instance on top of the input time-series 'ts' to better visualize their repartition in time.

Parameters:

m: Instance of the`pattern`

classts(Array{Any,1}): Input time-series

## Example

From Michael Brecker's improvisation over the piece "confirmation", we extract a time-series of pitch intervals (difference from one note to the next). A spectral envelope analysis reveals a peak at period 6~7, so we look for motifs of length 7 and allow for 1 error between them. After detection, we visualize the most frequent motif:

```
using MotifRecognition
using DelimitedFiles
using Plots
data = readdlm(joinpath(pkgdir(MotifRecognition), "test", "confirmation"))
pitch = mod.(data, 12) #Removing octave position: not needed
intervals = pitch[2:end] .- pitch[1:end-1] #getting interval time-series.
m = detect_motifs(intervals, 7, 1; iters = 700, tolerance = 0.7)
plot(m[1]) #plotting most frequent motif
```

We notice that the motif `[-1, -2, 10, -10, 2, 3, 5]`

seems to be the underlying (consensus) shape. In musical notation, this motif would look like this (written in C major):

We do an exact search with 1 error allowed to check if our previous detection missed any repetitions, and plot the found motif on top of each other:

```
consensus_shape = [-1, -2, 10, -10, 2, 3, 5]
motif = find_motifs(intervals, consensus_shape, 1)
plot(motif)
```

Here, we obtain the same plot as before but this is not necessarily always the case. Knowing the consensus motif usually allows to find its repetitions more efficiently.

Now, we visualize the repetitions of the motif in the time-series:

`plot(motif, intervals)`

### Citing

If you used this module in a scientific publication, please consider citing the package it came from:

```
@article{nelias2021categoricaltimeseries,
title={CategoricalTimeSeries. jl: A toolbox for categorical time-series analysis},
author={Nelias, Corentin},
journal={Journal of Open Source Software},
volume={6},
number={67},
pages={3733},
year={2021}
}
```