Distributed High-Performance symbolic regression in Julia
377 Stars
Updated Last
11 Months Ago
Started In
January 2021

SymbolicRegression.jl searches for symbolic expressions which optimize a particular objective.

Latest release Documentation Forums Paper
version Dev Discussions Paper
Build status Coverage
CI Coverage Status

Check out PySR for a Python frontend.



Cite this software


Install in Julia with:

using Pkg

The heart of this package is the EquationSearch function, which takes a 2D array (shape [features, rows]) and attempts to model a 1D array (shape [rows]) using analytic functional forms.

Run with:

using SymbolicRegression

X = randn(Float32, 5, 100)
y = 2 * cos.(X[4, :]) + X[1, :] .^ 2 .- 2

options = SymbolicRegression.Options(
    binary_operators=[+, *, /, -],
    unary_operators=[cos, exp],

hall_of_fame = EquationSearch(
    X, y, niterations=40, options=options,

You can view the resultant equations in the dominating Pareto front (best expression seen at each complexity) with:

dominating = calculate_pareto_frontier(hall_of_fame)

This is a vector of PopMember type - which contains the expression along with the score. We can get the expressions with:

trees = [member.tree for member in dominating]

Each of these equations is a Node{T} type for some constant type T (like Float32).

You can evaluate a given tree with:

tree = trees[end]
output, did_succeed = eval_tree_array(tree, X, options)

The output array will contain the result of the tree at each of the 100 rows. This did_succeed flag detects whether an evaluation was successful, or whether encountered any NaNs or Infs during calculation (such as, e.g., sqrt(-1)).

Constructing trees

You can also manipulate and construct trees directly. For example:

using SymbolicRegression

options = Options(;
    binary_operators=[+, -, *, ^, /], unary_operators=[cos, exp, sin]
x1, x2, x3 = [Node(; feature=i) for i=1:3]
tree = cos(x1 - 3.2 * x2) - x1^3.2

This tree has Float64 constants, so the type of the entire tree will be promoted to Node{Float64}.

We can convert all constants (recursively) to Float32:

float32_tree = convert(Node{Float32}, tree)

We can then evaluate this tree on a dataset:

X = rand(Float32, 3, 100)
output, did_succeed = eval_tree_array(tree, X, options)

Exporting to SymbolicUtils.jl

We can view the equations in the dominating Pareto frontier with:

dominating = calculate_pareto_frontier(hall_of_fame)

We can convert the best equation to SymbolicUtils.jl with the following function:

eqn = node_to_symbolic(dominating[end].tree, options)
println(simplify(eqn*5 + 3))

We can also print out the full pareto frontier like so:


for member in dominating
    complexity = compute_complexity(member, options)
    loss = member.loss
    string = string_tree(member.tree, options)


Code structure

SymbolicRegression.jl is organized roughly as follows. Rounded rectangles indicate objects, and rectangles indicate functions.

(if you can't see this diagram being rendered, try pasting it into

flowchart TB
    op --> ES
    d --> ES
    subgraph ES[EquationSearch]
        direction TB
        IP --> p1
        IP --> p2
        subgraph p1[Thread 1]
            direction LR
            pop1 --> src[s_r_cycle]
            src --> opt[optimize_and_simplify_population]
            opt --> pop1
        subgraph p2[Thread 2]
            direction LR
            pop2 --> src2[s_r_cycle]
            src2 --> opt2[optimize_and_simplify_population]
            opt2 --> pop2
        pop1 --> hof
        pop2 --> hof
        hof --> migration
        pop1 <-.-> migration
        pop2 <-.-> migration
    ES --> output([HallOfFame])

The HallOfFame objects store the expressions with the lowest loss seen at each complexity.

The dependency structure of the code itself is as follows:

    AdaptiveParsimony --> Mutate
    AdaptiveParsimony --> Population
    AdaptiveParsimony --> RegularizedEvolution
    AdaptiveParsimony --> SingleIteration
    AdaptiveParsimony --> SymbolicRegression
    CheckConstraints --> Mutate
    CheckConstraints --> SymbolicRegression
    Complexity --> CheckConstraints
    Complexity --> HallOfFame
    Complexity --> LossFunctions
    Complexity --> Mutate
    Complexity --> Population
    Complexity --> SearchUtils
    Complexity --> SingleIteration
    Complexity --> SymbolicRegression
    ConstantOptimization --> Mutate
    ConstantOptimization --> SingleIteration
    Core --> AdaptiveParsimony
    Core --> CheckConstraints
    Core --> Complexity
    Core --> ConstantOptimization
    Core --> HallOfFame
    Core --> InterfaceDynamicExpressions
    Core --> LossFunctions
    Core --> Migration
    Core --> Mutate
    Core --> MutationFunctions
    Core --> PopMember
    Core --> Population
    Core --> Recorder
    Core --> RegularizedEvolution
    Core --> SearchUtils
    Core --> SingleIteration
    Core --> SymbolicRegression
    Dataset --> Core
    HallOfFame --> SearchUtils
    HallOfFame --> SingleIteration
    HallOfFame --> SymbolicRegression
    InterfaceDynamicExpressions --> LossFunctions
    InterfaceDynamicExpressions --> SymbolicRegression
    LossFunctions --> ConstantOptimization
    LossFunctions --> HallOfFame
    LossFunctions --> Mutate
    LossFunctions --> PopMember
    LossFunctions --> Population
    LossFunctions --> SymbolicRegression
    Migration --> SymbolicRegression
    Mutate --> RegularizedEvolution
    MutationFunctions --> Mutate
    MutationFunctions --> Population
    MutationFunctions --> SymbolicRegression
    Operators --> Core
    Operators --> Options
    Options --> Core
    OptionsStruct --> Core
    OptionsStruct --> Options
    PopMember --> ConstantOptimization
    PopMember --> HallOfFame
    PopMember --> Migration
    PopMember --> Mutate
    PopMember --> Population
    PopMember --> RegularizedEvolution
    PopMember --> SingleIteration
    PopMember --> SymbolicRegression
    Population --> Migration
    Population --> RegularizedEvolution
    Population --> SearchUtils
    Population --> SingleIteration
    Population --> SymbolicRegression
    ProgramConstants --> Core
    ProgramConstants --> Dataset
    ProgressBars --> SearchUtils
    ProgressBars --> SymbolicRegression
    Recorder --> Mutate
    Recorder --> RegularizedEvolution
    Recorder --> SingleIteration
    Recorder --> SymbolicRegression
    RegularizedEvolution --> SingleIteration
    SearchUtils --> SymbolicRegression
    SingleIteration --> SymbolicRegression
    Utils --> CheckConstraints
    Utils --> ConstantOptimization
    Utils --> Options
    Utils --> PopMember
    Utils --> SingleIteration
    Utils --> SymbolicRegression

Bash command to generate dependency structure from src directory (requires vim-stream):

echo 'stateDiagram-v2'
for f in *.jl; do
    for line in $(cat $f | grep -e 'import \.\.' -e 'import \.'); do
        echo $(echo $line | vims -s 'dwf:d$' -t '%s/^\.*//g' '%s/Module//g') $(basename "$f" .jl);
done | vims -l 'f a--> ' | sort

Search options