Learning Algebraic Varieties

Welcome to the LearningAlgebraicVarieties package from the article Learning Algebraic Varieties from Samples by P. Breiding, S. Kalisnik, B. Sturmfels and M. Weinstein.

Special thanks to V. Valve, who helped update the PHCurve diagrams.

To install the package, open a new Julia session and type

]add LearningAlgebraicVarieties

After the installation is complete, the command

using LearningAlgebraicVarieties

loads all the functions into the current session.

All functions accept m data points in ℝ^n or ℙ^(n-1) as an n×m matrix Ω whose columns are the data points; i.e., as a two-dimensional array.
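As a minimal sketch of this input format, the snippet below builds a sample from the unit circle in ℝ^2. It follows the convention stated in the argument descriptions of this README, namely that the columns of Ω are the data points. The circle and the sample size m = 50 are illustrative choices, not part of the package.

```julia
# Sample m points from the unit circle in ℝ^2 and store them as the columns of Ω.
m = 50
θ = range(0, 2π, length = m + 1)[1:m]   # m angles; the endpoint is dropped to avoid a duplicate point

Ω = Matrix{Float64}(undef, 2, m)        # one column per data point
for (j, t) in enumerate(θ)
    Ω[:, j] = [cos(t), sin(t)]
end

size(Ω)   # (2, 50)
```

A matrix built this way can be passed as the first argument to the functions described below.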

We provide some datasets in the JLD2 data format. Download datasets.jld2 from this repository, navigate to its folder, and use

using JLD2 # add this package with ]add JLD2
@load "datasets.jld2"

Your session should now contain a dictionary named data that holds some datasets.

How to make dimension diagrams

Here is an example:

DimensionDiagrams(Ω, true)

plots all dimension diagrams for the data in projective space. On the other hand,

DimensionDiagrams(Ω, false, diagrams = [:CorrSum, :BoxCounting], eps_ticks = 10)

plots the dimension diagrams CorrSum and BoxCounting for data in Euclidean space. The estimates are computed for 10 values of ϵ between 0 and 1.

The complete syntax of the DimensionDiagrams function is as follows.

DimensionDiagrams(
    Ω::Array{T,2},
    projective::Bool;
    diagrams  = [:CorrSum, :BoxCounting, :NPCA, :MLE, :ANOVA, :PHCurve],
    eps_ticks = 25,
    fontsize = 16,
    lw = 4
    ) where {T <: Number}

Here:

Ω is a matrix whose columns are the data points.
projective = false: makes diagrams in Euclidean space.
projective = true: makes diagrams in projective space.

There are some optional arguments.

diagrams: lists the dimension estimators to be plotted.
eps_ticks = k: places k evenly spaced values of ϵ in [0,1]; the dimension estimates are computed at those values.
fontsize: sets the font size of the axes.
lw: sets the line width.

How to compute multivariate Vandermonde matrices

Here is an example.

MultivariateVandermondeMatrix(Ω, 2, true)

computes the multivariate Vandermonde matrix for the sample Ω and all monomials of degree 2. Passing true restricts the monomials to degree exactly 2, which corresponds to homogeneous equations. On the other hand, the Vandermonde matrix with all monomials of degree at most 2 is computed by

MultivariateVandermondeMatrix(Ω, 2, false)

It is also possible to define the exponents involved. For example,

exponents = [[1,0,0], [1,1,1]]
MultivariateVandermondeMatrix(Ω, exponents)

computes the multivariate Vandermonde matrix for Ω ⊂ ℝ^3 and the monomials x_1 and x_1 x_2 x_3.
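To make the construction concrete, here is a minimal sketch of a multivariate Vandermonde matrix for a given list of exponents. The helper vandermonde_sketch is hypothetical and is not the package's implementation. It assumes that rows correspond to data points and columns to monomials; the package may order or orient its matrix differently.

```julia
# Hypothetical helper, not the package's implementation: evaluate each monomial at each data point.
# Row i corresponds to the i-th column (data point) of Ω, column j to the j-th exponent vector.
function vandermonde_sketch(Ω::AbstractMatrix{<:Number}, exponents::Vector{Vector{Int}})
    n, m = size(Ω)
    @assert all(length(e) == n for e in exponents) "each exponent vector needs one entry per coordinate"
    return [prod(Ω[:, i] .^ e) for i in 1:m, e in exponents]
end

# Two points in ℝ^3, stored as columns: (1, 3, 5) and (2, 4, 6).
Ω = [1.0 2.0;
     3.0 4.0;
     5.0 6.0]
exponents = [[1, 0, 0], [1, 1, 1]]      # the monomials x_1 and x_1 x_2 x_3

U = vandermonde_sketch(Ω, exponents)    # [1.0 15.0; 2.0 48.0]
```

Each row lists the values of x_1 and x_1 x_2 x_3 at one data point, so a vector in the kernel of U gives the coefficients of a polynomial with that support that vanishes on the sample.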

Here is the full syntax:

MultivariateVandermondeMatrix(Ω::Array{T},
                              d::Int64,
                              homogeneous_equations::Bool)

MultivariateVandermondeMatrix(Ω::Array{T},
                              exponents::Vector)

where

Ω is a matrix whose columns are the data points.
d is the degree of the monomials.
homogeneous_equations = true restricts the space of monomials to monomials of degree d.
homogeneous_equations = false computes all monomials of degree at most d.
exponents is an array of exponent vectors.
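The difference between the two settings of homogeneous_equations can be illustrated by listing the exponent vectors involved. The helpers below are hypothetical and only sketch the enumeration; the order in which the package lists its monomials may differ.

```julia
# Hypothetical helper: all exponent vectors in n variables of total degree exactly d.
function exponents_of_degree(n::Int, d::Int)
    n == 1 && return [[d]]
    result = Vector{Vector{Int}}()
    for k in d:-1:0                                   # exponent of the first variable
        for tail in exponents_of_degree(n - 1, d - k) # exponents of the remaining variables
            push!(result, vcat(k, tail))
        end
    end
    return result
end

# Hypothetical helper: all exponent vectors of total degree at most d.
exponents_up_to_degree(n::Int, d::Int) =
    reduce(vcat, [exponents_of_degree(n, k) for k in 0:d])

length(exponents_of_degree(3, 2))      # 6 = binomial(4, 2) monomials of degree exactly 2
length(exponents_up_to_degree(3, 2))   # 10 = binomial(5, 2) monomials of degree at most 2
```

In three variables the homogeneous setting with d = 2 thus involves 6 monomials, while the non-homogeneous setting involves 10.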

How to find equations

Here is an example.

FindEquations(Ω, :with_svd, 2, true)

finds homogeneous equations of degree 2 using SVD to compute the kernel of the Vandermonde matrix, while

FindEquations(Ω, :with_qr, 3, false)

finds all polynomials of degree at most 3 and uses QR to compute the kernel of the Vandermonde matrix.

To find all equations with support x_1 x_2 and x_1^2 using the reduced row echelon form to compute the kernel, type

exponents = [[1,1], [2,0]]
FindEquations(Ω, :with_rref, exponents)

A multivariate Vandermonde matrix may be passed to FindEquations:

M = MultivariateVandermondeMatrix(Ω, 2, false)
FindEquations(M, :with_svd, τ)

where τ is a tolerance value.
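The role of the tolerance can be sketched with plain linear algebra. The snippet below does not call the package. It assumes that the equations are read off from the right singular vectors of the Vandermonde matrix whose singular values fall below τ, and it uses a hypothetical tolerance of 1e-8 on a sample from the unit circle.

```julia
using LinearAlgebra

# Sample 20 points on the unit circle x^2 + y^2 = 1, stored as columns.
θ = range(0, 2π, length = 21)[1:20]
Ω = permutedims(hcat(cos.(θ), sin.(θ)))              # 2×20 matrix

# Monomials 1, x, y, x^2, xy, y^2 (all monomials of degree at most 2).
exponents = [[0, 0], [1, 0], [0, 1], [2, 0], [1, 1], [0, 2]]
U = [prod(Ω[:, i] .^ e) for i in 1:size(Ω, 2), e in exponents]   # 20×6 matrix

# Right singular vectors with singular value below τ span the numerical kernel.
τ = 1e-8                                             # hypothetical tolerance
F = svd(U)
kernel = F.V[:, F.S .< τ]

size(kernel, 2)                                      # 1: a single equation is found
c = kernel[:, 1] ./ kernel[4, 1]                     # normalize the coefficient of x^2 to 1
# c ≈ [-1, 0, 0, 1, 0, 1], i.e. x^2 + y^2 - 1 = 0
```

A larger τ admits more approximate equations, while a smaller τ may miss equations when the sample is noisy. If there are fewer data points than monomials, svd(U, full = true) is needed to obtain the whole kernel.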

The full syntax of FindEquations is as follows.

FindEquations(Ω::Array{T,2},
              alg::Symbol,
              d::Int64,
              homogeneous_equations::Bool)
              where  {T<:Number}

FindEquations(Ω::Array{T,2},
              alg::Symbol,
              exponents::Array{Array{Int64,1},1})
              where  {T<:Number}

FindEquations(M::MultivariateVandermondeMatrix,
              alg::Symbol,
              τ::Float64)

Here:

Ω is a matrix whose columns are the data points.
alg is the algorithm that should be used (one of :with_svd, :with_qr, :with_rref).
d is the degree of the equations.
homogeneous_equations = true restricts the search space to homogeneous polynomials.
homogeneous_equations = false computes all polynomials of degree at most d.
exponents is an array of exponent vectors.
τ is the tolerance value.

Distances

Computing distances is a key aspect in both dimension estimation and persistent homology. Here are the functions with which we compute distances.

To compute the scaled Euclidean distances between the data points in Ω, type

ScaledEuclidean(Ω)

On the other hand, to compute the scaled Fubini-Study distances between the data points in Ω, type

ScaledFubiniStudy(Ω)
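As a rough sketch of what such distance matrices might look like, the helpers below compute pairwise Euclidean and Fubini-Study distances for data stored as columns. Both helpers are hypothetical and do not reproduce the package's code. In particular, it is an assumption here that "scaled" means dividing by the largest pairwise distance so that all entries lie in [0,1].

```julia
using LinearAlgebra

# Hypothetical helper; "scaled" is ASSUMED to mean division by the largest pairwise distance.
function scaled_euclidean_sketch(Ω::AbstractMatrix{<:Real})
    m = size(Ω, 2)
    D = [norm(Ω[:, i] - Ω[:, j]) for i in 1:m, j in 1:m]
    return D ./ maximum(D)
end

# Hypothetical helper with the same scaling assumption.
function scaled_fubini_study_sketch(Ω::AbstractMatrix{<:Real})
    m = size(Ω, 2)
    # Fubini-Study distance: the angle between the lines spanned by two points.
    line_angle(x, y) = acos(clamp(abs(dot(x, y)) / (norm(x) * norm(y)), 0.0, 1.0))
    D = [i == j ? 0.0 : line_angle(Ω[:, i], Ω[:, j]) for i in 1:m, j in 1:m]
    return D ./ maximum(D)
end

# Three points in ℝ^2, stored as columns: (1, 0), (0, 1) and (-1, 0).
Ω = [1.0 0.0 -1.0;
     0.0 1.0  0.0]

DE = scaled_euclidean_sketch(Ω)      # DE[1, 3] == 1.0, the largest distance
DF = scaled_fubini_study_sketch(Ω)   # DF[1, 3] == 0.0, both points span the same line
```

The example shows the difference between the two settings: the first and third point are farthest apart in Euclidean space but represent the same point in projective space.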

Finally, the ellipsoid-driven complex is encoded in a distance matrix. It is computed by typing

EllipsoidDistances(Ω, f, λ)

where f is a vector of polynomials of the type provided by DynamicPolynomials.jl and λ sets the ratio between the principal axes of the ellipsoids.