Probability distributions and measures for finite sample spaces whose elements are labeled (that is, sample spaces consisting of the class pool of a `CategoricalArray`).
Designed for performance in machine learning applications. For example, probabilistic classifiers in MLJ typically predict the `UnivariateFiniteVector` objects defined in this package.
For probability distributions over integers see the Distributions.jl package, whose methods the current package extends.
using Pkg
Pkg.add("CategoricalDistributions")
The sample space of the `UnivariateFinite` distributions provided by this package is the class pool of a `CategoricalArray`:
using CategoricalDistributions
using CategoricalArrays
import Distributions
import UnicodePlots # for optional pretty display
data = ["no", "yes", "no", "maybe", "maybe", "no",
        "maybe", "no", "maybe", "no"] |> categorical
julia> d = Distributions.fit(UnivariateFinite, data)
UnivariateFinite{Multiclass{3}}
┌ ┐
maybe ┤■■■■■■■■■■■■■■■■■■■■■■■■■■■■ 0.4
no ┤■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■ 0.5
yes ┤■■■■■■■ 0.1
└ ┘
julia> pdf(d, "no")
0.5
julia> mode(d)
CategoricalValue{String, UInt32} "no"
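As a rough illustration of what `Distributions.fit` computes, the sketch below assumes that the fitted probabilities are the relative frequencies of each class in the data, which is consistent with the output above. The labels `"a"`, `"b"`, `"c"` and the variable names are made up for this example.

```julia
using CategoricalDistributions
using CategoricalArrays
import Distributions

# Hypothetical toy data: "a" occurs twice, "b" and "c" once each.
obs = categorical(["a", "b", "a", "c"])
d = Distributions.fit(UnivariateFinite, obs)

# Assumption: each fitted probability is count / total.
p_a = pdf(d, "a")        # expected to be 2/4
p_b = pdf(d, "b")        # expected to be 1/4
most_likely = mode(d)    # a CategoricalValue wrapping the most probable label
```

The value returned by `mode` is a `CategoricalValue`, so it can be compared with a raw label using `==`.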
A `UnivariateFinite` distribution can also be constructed directly from a probability vector:
julia> d2 = UnivariateFinite(["no", "yes"], [0.15, 0.85], pool=data)
UnivariateFinite{Multiclass{3}}
┌ ┐
no ┤■■■■■■ 0.15
yes ┤■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■ 0.85
└ ┘
A `UnivariateFinite` distribution tracks all classes in the pool:
julia> levels(d2)
3-element Vector{String}:
"maybe"
"no"
"yes"
julia> pdf(d2, "maybe")
0.0
julia> pdf(d2, "okay")
ERROR: DomainError with Value okay not in pool. :
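The two cases above (a label in the pool but outside the support, and a label outside the pool altogether) can be told apart programmatically. The following is a minimal sketch, assuming the error raised is a `DomainError` as in the session above. The variable names are illustrative only.

```julia
using CategoricalDistributions
using CategoricalArrays

data = categorical(["no", "yes", "maybe"])
d2 = UnivariateFinite(["no", "yes"], [0.15, 0.85], pool=data)

# "maybe" is in the pool but was given no probability, so it gets zero mass.
p_maybe = pdf(d2, "maybe")

# "okay" is not in the pool at all; catch the resulting error explicitly.
okay_in_pool = try
    pdf(d2, "okay")
    true
catch err
    err isa DomainError || rethrow()
    false
end
```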
Arrays of `UnivariateFinite` distributions are defined using the same constructor. Broadcasting methods, such as `pdf`, are optimized for such arrays:
julia> v = UnivariateFinite(["no", "yes"], [0.1, 0.2, 0.3, 0.4], augment=true, pool=data)
4-element UnivariateFiniteArray{Multiclass{3}, String, UInt32, Float64, 1}:
UnivariateFinite{Multiclass{3}}(no=>0.9, yes=>0.1)
UnivariateFinite{Multiclass{3}}(no=>0.8, yes=>0.2)
UnivariateFinite{Multiclass{3}}(no=>0.7, yes=>0.3)
UnivariateFinite{Multiclass{3}}(no=>0.6, yes=>0.4)
julia> pdf.(v, "no")
4-element Vector{Float64}:
0.9
0.8
0.7
0.6
Query the `UnivariateFinite` doc-string for advanced constructor options.
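Judging from the output above, `augment=true` appears to treat the supplied numbers as probabilities for the last listed class and to assign the remaining mass to the first. The sketch below checks that reading. It is an interpretation of the example, not a statement of the documented behavior, and the name `p_yes` is hypothetical.

```julia
using CategoricalDistributions
using CategoricalArrays

data = categorical(["no", "yes", "maybe"])
p_yes = [0.1, 0.2, 0.3, 0.4]

# One distribution per entry of `p_yes`, over the pool of `data`.
v = UnivariateFinite(["no", "yes"], p_yes, augment=true, pool=data)

yes_probs = pdf.(v, "yes")   # should reproduce the supplied probabilities
no_probs  = pdf.(v, "no")    # should hold the complementary mass

# Elements of the array are ordinary UnivariateFinite distributions.
first_no = pdf(v[1], "no")
```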
A (non-standard) implementation of `pdf` allows for extraction of the full probability array:
julia> L = levels(data)
3-element Vector{String}:
"maybe"
"no"
"yes"
julia> pdf(v, L)
4×3 Matrix{Float64}:
0.0 0.9 0.1
0.0 0.8 0.2
0.0 0.7 0.3
0.0 0.6 0.4
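To make the layout of this matrix concrete, here is a small sketch that relates it to the broadcast form of `pdf`. It assumes, as the output above suggests, that rows index the distributions in `v` and columns follow the order of `L`.

```julia
using CategoricalDistributions
using CategoricalArrays

data = categorical(["no", "yes", "maybe"])
v = UnivariateFinite(["no", "yes"], [0.1, 0.2, 0.3, 0.4], augment=true, pool=data)

L = levels(data)
P = pdf(v, L)   # one row per distribution, one column per level

# Locate the column for "no" and compare it with the broadcast result.
j = findfirst(==("no"), L)
no_column_matches = P[:, j] ≈ pdf.(v, "no")

# Each row here is a proper probability vector, so it should sum to one.
row_sums = vec(sum(P, dims=2))
```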
There is, in fact, no enforcement that probabilities in a `UnivariateFinite` distribution sum to one, only that they belong to a type `T` for which `zero(T)` is defined. In particular, `UnivariateFinite` objects implement arbitrary non-negative, signed, or complex measures over a finite labeled set.
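As a hedged sketch of the point above, the following builds an unnormalized, non-negative measure. It assumes the constructor accepts weights that do not sum to one, as the paragraph states. The weights and variable names are invented for illustration.

```julia
using CategoricalDistributions
using CategoricalArrays

data = categorical(["no", "yes", "maybe"])

# Hypothetical weights with total mass 5.0 rather than 1.0.
m = UnivariateFinite(["no", "yes"], [2.0, 3.0], pool=data)

w_yes = pdf(m, "yes")     # the weight assigned to "yes"
w_maybe = pdf(m, "maybe") # unassigned labels get zero(Float64)

# Total mass of the measure, summed over every label in the pool.
total_mass = sum(pdf(m, label) for label in levels(data))
```

Dividing each weight by `total_mass` would recover an ordinary probability distribution over the same pool.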
- A new type `UnivariateFinite{S}` for representing probability distributions over the pool of a `CategoricalArray`, that is, over finite labeled sets. Here `S` is a subtype of `OrderedFactor` from ScientificTypesBase.jl, if the pool is ordered, or of `Multiclass` if the pool is unordered.
- A new array type `UnivariateFiniteArray{S} <: AbstractArray{<:UnivariateFinite{S}}` for efficiently manipulating arrays of `UnivariateFinite` distributions.
- Implementations of `rand` for generating random samples of a `UnivariateFinite` distribution.
- Implementations of the `pdf`, `logpdf`, `mode` and `modes` methods of Distributions.jl, with efficient broadcasting over the new array type.
- Implementation of `Distributions.fit` from Distributions.jl for `UnivariateFinite` distributions.
- A single constructor for constructing `UnivariateFinite` distributions and arrays thereof, from arrays of probabilities.
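Two of the items listed above, `rand` and `logpdf`, are not shown elsewhere in this README. The sketch below is a minimal illustration under two assumptions: that `rand(d, n)` draws `n` samples in the usual Julia style, and that `logpdf` is reachable as `Distributions.logpdf`. Variable names are illustrative only.

```julia
using CategoricalDistributions
using CategoricalArrays
import Distributions

data = categorical(["no", "yes", "maybe"])
d = UnivariateFinite(["no", "yes"], [0.15, 0.85], pool=data)

# A single draw is a CategoricalValue from the pool.
x = rand(d)

# Assumption: an integer argument requests that many independent draws.
xs = rand(d, 5)

# Log-probability of a label, expected to agree with log(pdf(d, label)).
lp_yes = Distributions.logpdf(d, "yes")
```

Since `"maybe"` carries zero probability here, every draw should be either `"no"` or `"yes"`.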
The initial release of this package is based almost entirely on code originally residing in MLJBase.jl with contributions from Anthony Blaom, Thibaut Lienart, Samuel Okon, and Chad Scherrer. These contributions are not reflected in the current repository's commit history.