OutlierDetectionData.jl is a package to download and read common outlier detection datasets. This package is a part of OutlierDetection.jl, the outlier detection ecosystem for Julia.
API Overview
The API is currently simple: we provide a single namespace per dataset collection. A dataset collection, such as ODDS, bundles multiple outlier detection datasets. For each dataset collection, the following methods are provided:
List all available datasets in the collection:
list()
List a subset of datasets starting with prefix:
list(prefix::Union{AbstractString, Regex})
Load a single dataset with the given name. If the file does not exist locally, it is downloaded automatically. Currently, the data is returned as a tuple containing X::DataFrame and y::Vector{Int}, where X holds the features with one observation per row and y represents the labels, with "normal" indicating inliers and "outlier" indicating outliers:
load(name::AbstractString)
Example:
The following example shows how you can load the "cardio" dataset from the ODDS collection.
using OutlierDetectionData: ODDS
X, y = ODDS.load("cardio")
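The three methods can be combined as in the following sketch, which only uses the calls described above. It assumes that list returns a collection of dataset names, that a Regex prefix is matched against the start of each name, and that y holds one label per row of X; none of this is confirmed here beyond the signatures shown.

```julia
using OutlierDetectionData: ODDS

# Names of all datasets in the ODDS collection.
all_names = ODDS.list()

# Names starting with a string prefix, or with a regular expression
# (assumed to be matched against the start of the name).
c_names = ODDS.list("c")
regex_names = ODDS.list(r"c")

# Load a dataset; it is downloaded first if it is not available locally.
X, y = ODDS.load("cardio")

# X has one observation per row, y is assumed to have one label per observation.
@show size(X)
@show length(y)
```

Note that the first call to load for a given dataset needs network access.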
Available Collections:
The available collections are:
- ODDS, Outlier Detection DataSets, Shebuti Rayana, 2016
- ELKI, On the Evaluation of Unsupervised Outlier Detection, Campos et al., 2016
- TSAD, The UCR Time Series Archive, Dau et al., 2018
For the TSAD collection, the class with the fewest members is chosen as the anomaly class and all other classes are defined as normal. If multiple classes share the fewest members, the lexically first of them is chosen.
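The class-selection rule for TSAD can be illustrated with a small, self-contained sketch. The helpers anomaly_class and binarize are hypothetical and not part of the package; the sketch reads the tie-breaking rule as applying when several classes share the smallest member count, and it assumes that "lexically first" refers to the string representation of the class labels.

```julia
# Hypothetical helper: pick the class with the fewest members,
# breaking ties by taking the lexically first class label.
function anomaly_class(labels::AbstractVector)
    counts = Dict{eltype(labels), Int}()
    for label in labels
        counts[label] = get(counts, label, 0) + 1
    end
    fewest = minimum(values(counts))
    candidates = [c for (c, n) in counts if n == fewest]
    # Assumption: lexical order is taken over the string form of the labels.
    return first(sort(candidates; by = string))
end

# Hypothetical helper: map the anomaly class to "outlier", all others to "normal".
function binarize(labels::AbstractVector)
    anomaly = anomaly_class(labels)
    return [l == anomaly ? "outlier" : "normal" for l in labels]
end

labels = ["2", "1", "3", "1", "3", "1"]
anomaly = anomaly_class(labels)   # class "2" has a single member
y = binarize(labels)
```

With a tie, such as the labels ["b", "a", "c", "c"], both "a" and "b" have one member, so this sketch selects "a".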
Licenses
Please make sure that you check and accept the licenses of the individual datasets before publishing your work. This package is licensed under the terms of the MIT license.