FeatureTransforms.jl

Transformations for performing feature engineering in machine learning applications

FeatureTransforms

FeatureTransforms.jl provides utilities for performing feature engineering in machine learning pipelines with support for AbstractArrays and Tables.

Getting Started

The FeatureTransforms.jl API has a few key parts; refer to the documentation for each to learn more.

  1. Transforms are callable types that define certain operations to be performed on data, for example, normalizing or computing a linear combination. Refer to the Guide to Transforms to learn how they are defined and used on various types of input.
  2. The apply, apply! and apply_append methods are used to apply Transforms in various ways. Consult the Examples Section for a guide to some typical use cases. See also the example below.
  3. The Transform Interface is used when you want to encapsulate sequences of Transforms in an end-to-end feature engineering pipeline.
  4. For a full list of currently implemented Transforms, consult the API.

Installation

julia> using Pkg; Pkg.add("FeatureTransforms")

Quickstart

Load in the dependencies and construct some toy data.

julia> using DataFrames, FeatureTransforms

julia> df = DataFrame(:a=>[1, 2, 3, 4, 5], :b=>[5, 4, 3, 2, 1], :c=>[2, 1, 3, 1, 3])
5×3 DataFrame
 Row │ a      b      c     
     │ Int64  Int64  Int64 
─────┼─────────────────────
   1 │     1      5      2
   2 │     2      4      1
   3 │     3      3      3
   4 │     4      2      1
   5 │     5      1      3

Next, we construct the Transform that we want to perform on the data and apply it. A Transform can be applied in one of three ways:

  1. apply, which does not mutate the underlying data,
  2. apply!, which does mutate the underlying data,
  3. apply_append, which applies the Transform and then appends the result to a copy of the input.

All Transforms support the non-mutating apply and apply_append methods, but any Transform that changes the type or dimension of the input does not support the mutating apply!.
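
One way to see why a type-changing operation cannot run in place is sketched below in plain Julia, with no FeatureTransforms calls. This illustrates the general constraint only and is an assumption about the motivation, not the package's implementation; the vector x is made up for the example.

```julia
# Plain Julia illustration: an in-place update must fit the existing element type.
x = [1, 2, 3]        # Vector{Int64}

# Non-mutating: a new Vector{Float64} is allocated, x is left untouched.
y = x ./ 2

# Mutating: results must be written back into the Int64 storage,
# so a non-integer result (0.5) cannot be stored and Julia throws.
threw_inexact = try
    x .= x ./ 2
    false
catch err
    err isa InexactError
end

println(eltype(y))
println(threw_inexact)
```

The non-mutating call is free to return a container of a different type or shape, which is why it can be supported for every Transform.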

In any case, the return type will be the same as the input, so if you provide an Array you get back an Array, and if you provide a Table you get back a Table. Here we are working with a DataFrame, so the return will always be a DataFrame:

julia> p = Power(3);

julia> FeatureTransforms.apply(df, p; cols=[:a], header=[:a3])
5×1 DataFrame
 Row │ a3    
     │ Int64 
─────┼───────
   1 │     1
   2 │     8
   3 │    27
   4 │    64
   5 │   125

julia> FeatureTransforms.apply_append(df, p; cols=[:a], header=[:a3])
5×4 DataFrame
 Row │ a      b      c      a3    
     │ Int64  Int64  Int64  Int64 
─────┼────────────────────────────
   1 │     1      5      2      1
   2 │     2      4      1      8
   3 │     3      3      3     27
   4 │     4      2      1     64
   5 │     5      1      3    125

julia> FeatureTransforms.apply!(df, p; cols=[:a])
5×3 DataFrame
 Row │ a      b      c     
     │ Int64  Int64  Int64 
─────┼─────────────────────
   1 │     1      5      2
   2 │     8      4      1
   3 │    27      3      3
   4 │    64      2      1
   5 │   125      1      3
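
The same methods also accept plain arrays, in which case an array comes back. The following is a minimal sketch, assuming the array methods take the data and the Transform as the two positional arguments (as in the table calls above) and need no cols or header keywords; the vector x is illustrative.

```julia
using FeatureTransforms

x = [1, 2, 3]
p = Power(3)

# Non-mutating: returns a new array holding the elementwise cube of x.
y = FeatureTransforms.apply(x, p)

# Calling the Transform directly is the convenience form of apply.
z = p(x)

println(y)
println(typeof(y))
```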

As an extra convenience, you can call a Transform directly, which is equivalent to calling apply:

julia> ohe = OneHotEncoding(1:3);

julia> lc = LinearCombination([1, -10]);

julia> ohe_df = ohe(df; cols=[:c], header=[:cat1, :cat2, :cat3]);

julia> lc_df = lc(df; cols=[:a, :b], header=[:ab]);

julia> df = hcat(df, lc_df, ohe_df)
5×7 DataFrame
 Row │ a      b      c      ab     cat1   cat2   cat3  
     │ Int64  Int64  Int64  Int64  Bool   Bool   Bool  
─────┼─────────────────────────────────────────────────
   1 │     1      5      2    -49  false   true  false
   2 │     8      4      1    -32   true  false  false
   3 │    27      3      3     -3  false  false   true
   4 │    64      2      1     44   true  false  false
   5 │   125      1      3    115  false  false   true
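
To check the ab and cat columns by hand, the arithmetic can be reproduced in plain Julia. This is a sketch of what the output values correspond to, not the package's implementation; the variable names are illustrative and the input values are copied from the a, b and c columns of the table above.

```julia
# Column values as they appear in the final DataFrame.
a = [1, 8, 27, 64, 125]
b = [5, 4, 3, 2, 1]
c = [2, 1, 3, 1, 3]

# LinearCombination([1, -10]) over (a, b): 1*a + (-10)*b, elementwise.
ab = 1 .* a .+ (-10) .* b

# OneHotEncoding(1:3) over c: one Bool column per category 1, 2, 3.
onehot = [ci == k for ci in c, k in 1:3]   # 5×3 Matrix{Bool}

println(ab)
println(onehot[1, :])
```

Row 1, for example, gives 1*1 + (-10)*5 = -49, and c = 2 sets only the second category column to true.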
