SoleData.jl

Manage unstructured and multimodal datasets!
Author aclai-lab
Popularity
11 Stars
Updated Last
5 Months Ago
Started In
October 2021

SoleData.jl – Datasets for data-driven symbolic AI

Stable Dev Build Status Coverage

In a nutshell

SoleData.jl defines optimized structures for representing logisets, that is, sets of logical interpretations. Logisets are the symbolic counterpart to Machine Learning datasets.

Examples

Propositional Logic

Symbolic AI treats tabular dataset (e.g., the Iris flower dataset) as a set of propositional interpretations (or propositional logiset), onto which formulas of propositional logic are interpreted.

julia> using SoleData, MLJBase;

julia> X = PropositionalLogiset(MLJBase.load_iris())
PropositionalLogiset (6.16 KBs)
├ # instances:                  150
├ # features:                   5
└ Table: ...

julia> φ = parseformula(
           "sepal_length > 5.8 ∧ sepal_width < 3.0 ∨ target == \"setosa\"";
           atom_parser = a->Atom(parsecondition(SoleData.ScalarCondition, a; featuretype = SoleData.VariableValue))
       )
SyntaxBranch: (sepal_length > 5.8 ∧ sepal_width < 3.0) ∨ target == setosa

julia> check(φ, X, 10) # Check the formula on a single instance
true

julia> satmask = check(φ, X); # Check the formula on the whole dataset

julia> slicedataset(X, satmask)
PropositionalLogiset (3.66 KBs)
├ # instances:                  79
├ # features:                   5
└ Table: ...


julia> slicedataset(X, (!).(satmask))
PropositionalLogiset (3.38 KBs)
├ # instances:                  71
├ # features:                   5
└ Table: ...

Modal Logic

Symbolic AI treats non-tabular datasets (e.g., datasets of time-series or images) as sets of interpretations (logisets) of more-than-propositional logics, that can express relational patterns. In the following example, a time-series dataset such as NATOPS is interpreted via a modal logic formalism based on intervals and Allen's (or Interval Algebra) relations. On each time series in NATOPS, we hereby check the following temporal property, encoded via a modal logical formula: "there an interval where V1 is always higher than -0.54, and such that there exists a later interval where either V3 is lower than -0.78, or V5 is higher than -0.84."

julia> Xdf, y = SoleData.load_arff_dataset("NATOPS");

julia> X = scalarlogiset(Xdf)
SupportedLogiset with 1 support (343.08 MBs)
├ worldtype:                   SoleLogics.Interval{Int64}
├ featvaltype:                 Float64
├ featuretype:                 SoleData.AbstractUnivariateFeature
├ frametype:                   SoleLogics.FullDimensionalFrame{1, SoleLogics.Interval{Int64}}
├ # instances:                 360
├ usesfullmemo:                true
├[BASE] UniformFullDimensionalLogiset of channel size (51,) (342.91 MBs)
│ ├ size × eltype:              (51, 51, 360, 48) × Float64
│ └ features:                   48 -> SoleData.AbstractUnivariateFeature[max[V1], min[V1], max[V2], min[V2], ..., min[V22], max[V23], min[V23], max[V24], min[V24]]
└[SUPPORT 1] FullMemoset (0 memoized values, 174.42 KBs))

julia> φ = parseformula(
           "⟨G⟩(min[V1] > -0.54 ∧ ⟨L⟩(max[V3] < -0.78 ∨ min[V5] > -0.84))",
           SoleLogics.diamondsandboxes(SoleLogics.IARelations);
           atom_parser = a->Atom(parsecondition(SoleData.ScalarCondition, a; featvaltype = Float64)),
       );
SyntaxBranch: ⟨G⟩(min[V1] > -0.54 ∧ ⟨L⟩(max[V3] < -0.78 ∨ min[V5] > -0.84))

julia> syntaxstring(φ; variable_names_map = names(Xdf)) |> println
⟨G⟩(min[X[Hand tip l]] > -0.54 ∧ ⟨L⟩(max[Z[Hand tip l]] < -0.78 ∨ min[Y[Hand tip r]] > -0.84))

julia> check(φ, X) # Query each instance
360-element Vector{Bool}:
 1
 1
 1
 1
 1
 1
 1
 1
 0
 1
 0
 1
 0
 1
 0
 1
 1
 1
 1
...

About

The package is developed by the ACLAI Lab @ University of Ferrara.

SoleData.jl provides the data layer for Sole.jl, an open-source framework for symbolic machine learning.