This package makes a distinction between the machine type and the scientific type of a Julia object:

- The machine type refers to the Julia type being used to represent the object (for instance, `Float64`).
- The scientific type is one of the types defined in ScientificTypesBase.jl reflecting how the object should be interpreted (for instance, `Continuous` or `Multiclass`).
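The distinction can be illustrated with a minimal sketch, assuming the default convention shipped with this package is active. The variable names `x` and `n` are hypothetical and chosen only for illustration.

```julia
using ScientificTypes

x = 3.14
typeof(x)    # machine type: how Julia stores the value (Float64)
scitype(x)   # scientific type: how the value should be interpreted (Continuous)

n = 42
typeof(n)    # machine type (Int64 on a 64-bit system)
scitype(n)   # scientific type under the default convention (Count)
```

Note that the same machine type can be given different scientific interpretations; an integer column might be a count in one data set and a class label in another.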
To install the package:

using Pkg
Pkg.add("ScientificTypes")

This package is intended for:

- developers of statistical and scientific software who want to articulate their data type requirements in a generic, purpose-oriented way, and who are furthermore happy to adopt an existing convention about what data types should be used for what purpose (a convention first developed for the MLJ ecosystem, but useful in a general context)
The module `ScientificTypes` defined in this repo re-exports the scientific types and associated methods defined in ScientificTypesBase.jl and provides:

- a collection of `scitype` definitions that articulate a default convention
- a `coerce` function, for changing machine types to reflect a specified scientific interpretation (scientific type)
- an `autotype` function for "guessing" the intended scientific type of data
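The three pieces listed above can be sketched on small inputs. This is an illustration under assumptions, not a complete tour of the API: the names `v`, `w`, `c`, `tbl` and `suggestions` are hypothetical, and the table is a plain named tuple of vectors, which is assumed to be accepted because it is a Tables.jl-compatible column table. What `autotype` suggests depends on its rules and on the data, so no output is shown for it.

```julia
using ScientificTypes

# scitype: the default convention applied to a vector of integers
v = [1, 2, 3, 2, 1]
scitype(v)                  # AbstractVector{Count}

# coerce: change the machine type to match a scientific interpretation
w = coerce(v, Continuous)   # integers become floats
eltype(w)                   # Float64
elscitype(w)                # Continuous

c = coerce(v, Multiclass)   # integers become categorical values
elscitype(c)                # Multiclass{3}, one class per distinct value

# autotype: ask for suggested scientific types for the columns of a table
tbl = (x = [1, 2, 1, 2, 1, 2, 1, 2], y = rand(8))
suggestions = autotype(tbl) # a dictionary keyed on column name
```

The suggestions returned by `autotype` are only a guess and are worth reviewing before they are passed on to `coerce`.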
For more information and examples please refer to the manual.
using ScientificTypes, DataFrames
X = DataFrame(
a = randn(5),
b = [-2.0, 1.0, 2.0, missing, 3.0],
c = [1, 2, 3, 4, 5],
d = [0, 1, 0, 1, 0],
e = ['M', 'F', missing, 'M', 'F'],
)
sch = schema(X)

will print
┌───────┬────────────────────────────┬─────────────────────────┐
│ names │ scitypes │ types │
├───────┼────────────────────────────┼─────────────────────────┤
│ a │ Continuous │ Float64 │
│ b │ Union{Missing, Continuous} │ Union{Missing, Float64} │
│ c │ Count │ Int64 │
│ d │ Count │ Int64 │
│ e │ Union{Missing, Unknown} │ Union{Missing, Char} │
└───────┴────────────────────────────┴─────────────────────────┘
Detail is obtained in the obvious way; for example:
julia> sch.names
(:a, :b, :c, :d, :e)

To specify instead that `b` should be regarded as `Count`, and that both `d` and `e` are `Multiclass`, we use the `coerce` function:
Xc = coerce(X, :b=>Count, :d=>Multiclass, :e=>Multiclass)
schema(Xc)

which prints
┌───────┬───────────────────────────────┬────────────────────────────────────────────────┐
│ names │ scitypes │ types │
├───────┼───────────────────────────────┼────────────────────────────────────────────────┤
│ a │ Continuous │ Float64 │
│ b │ Union{Missing, Count} │ Union{Missing, Int64} │
│ c │ Count │ Int64 │
│ d │ Multiclass{2} │ CategoricalValue{Int64, UInt32} │
│ e │ Union{Missing, Multiclass{2}} │ Union{Missing, CategoricalValue{Char, UInt32}} │
└───────┴───────────────────────────────┴────────────────────────────────────────────────┘
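The result of a coercion can also be checked programmatically rather than by reading the printed table. The sketch below repeats the coercion of columns `b` and `d` on a reduced table; as an assumption made to keep it free of the DataFrames dependency, the table is a named tuple of vectors rather than a `DataFrame`.

```julia
using ScientificTypes

# A reduced version of the table above, keeping only columns b and d
X = (
    b = [-2.0, 1.0, 2.0, missing, 3.0],
    d = [0, 1, 0, 1, 0],
)

Xc = coerce(X, :b => Count, :d => Multiclass)
sch = schema(Xc)

sch.names      # (:b, :d)
sch.scitypes   # (Union{Missing, Count}, Multiclass{2})
sch.types      # the corresponding machine types
```

Comparing `sch.scitypes` against the types a downstream algorithm expects is one way to validate input data before use.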
ScientificTypes is based on code from MLJScientificTypes.jl (now deprecated) and in particular builds on contributions of Anthony Blaom (@ablaom), Thibaut Lienart (@tlienart), Samuel Okon (@OkonSamuel), and others not recorded in the ScientificTypes commit history.
ScientificTypes.jl 2.0 implements the `DefaultConvention`, which coincides with the deprecated MLJ convention of MLJScientificTypes.jl 0.4.8. The code at ScientificTypes 1.1.2 (which defined only the API) became ScientificTypesBase.jl 1.0.