FeatureTransforms.jl provides utilities for performing feature engineering in machine learning pipelines with support for AbstractArrays and Tables.
There are a few key parts to the Transforms.jl API, refer to the documentation for each to learn more.
Transforms are callable types that define certain operations to be performed on data, for example, normalizating or computing a linear combination. Refer to the Guide to Transforms to learn how they are defined and used on various types of input.- The
apply,apply!andapply_appendmethods are used to implementTransforms in various ways. Consult the Examples Section for a guide to some typical use cases. See also the example below. - The Transform Interface is used when you want to encapsulate sequences of
Transforms in an end-to-end feature engineering pipeline. - For a full list of currently implemented
Transforms, consult the API.
julia> using Pkg; Pkg.add("FeatureTransforms")Load in the dependencies and construct some toy data.
julia> using DataFrames, FeatureTransforms
julia> df = DataFrame(:a=>[1, 2, 3, 4, 5], :b=>[5, 4, 3, 2, 1], :c=>[2, 1, 3, 1, 3])
5×3 DataFrame
Row │ a b c
│ Int64 Int64 Int64
─────┼─────────────────────
1 │ 1 5 2
2 │ 2 4 1
3 │ 3 3 3
4 │ 4 2 1
5 │ 5 1 3Next, we construct the Transform that we want to perform on the data.
This can be done one of three ways:
applywhich does not mutate the underlying data,apply!which does mutate the underlying data,apply_appendwhich willapplytransform thenappendthe result to a copy of the input.
All Transforms support the non-mutating apply and apply_append methods, but any Transform that changes the type or dimension of the input does not support the mutating apply!.
In any case, the return type will be the same as the input, so if you provide an Array you get back an Array, and if you provide a Table you get back a Table.
Here we are working with a DataFrame, so the return will always be a DataFrame:
julia> p = Power(3);
julia> FeatureTransforms.apply(df, p; cols=[:a], header=[:a3])
5×1 DataFrame
Row │ a3
│ Int64
─────┼───────
1 │ 1
2 │ 8
3 │ 27
4 │ 64
5 │ 125
julia> FeatureTransforms.apply!(df, p; cols=[:a])
5×3 DataFrame
Row │ a b c
│ Int64 Int64 Int64
─────┼─────────────────────
1 │ 1 5 2
2 │ 8 4 1
3 │ 27 3 3
4 │ 64 2 1
5 │ 125 1 3
julia> FeatureTransforms.apply_append(df, p; cols=[:a], header=[:a3])
5×4 DataFrame
Row │ a b c a3
│ Int64 Int64 Int64 Int64
─────┼────────────────────────────
1 │ 1 5 2 1
2 │ 2 4 1 8
3 │ 3 3 3 27
4 │ 4 2 1 64
5 │ 5 1 3 125
As an extra convenience, you can call the Transform type directly, which emulates calling apply:
julia> ohe = OneHotEncoding(1:3);
julia> lc = LinearCombination([1, -10]);
julia> ohe_df = ohe(df; cols=[:c], header=[:cat1, :cat2, :cat3])
julia> lc_df = lc(df; cols=[:a, :b], header=[:ab]);
julia> df = hcat(df, lc_df, ohe_df)
5×7 DataFrame
Row │ a b c ab cat1 cat2 cat3
│ Int64 Int64 Int64 Int64 Bool Bool Bool
─────┼─────────────────────────────────────────────────
1 │ 1 5 2 -49 false true false
2 │ 8 4 1 -32 true false false
3 │ 27 3 3 -3 false false true
4 │ 64 2 1 44 true false false
5 │ 125 1 3 115 false false true