Lale.jl is a Julia wrapper of Python's Lale library for semi-automated data science. Lale makes it easy to automatically select algorithms and tune hyperparameters of pipelines that are compatible with scikit-learn, in a type-safe fashion.
More details of the design can be found in the paper: Lale AutoML@KDD.
Instructions for Lale developers can be found here.
For a quick demo, see the Lale Notebook Demo, or view it with NBViewer.
- automation: provides a consistent high-level interface to existing pipeline search tools, including Hyperopt, GridSearchCV, and SMAC
- correctness checks: uses JSON Schema to catch mistakes when there is a mismatch between hyperparameters and their type, or between data and operators
- interoperability: supports a growing library of transformers and estimators
Here is an example of a typical Lale pipeline using the following processing elements: Principal Component Analysis (PCA), NoOp (no operation), Random Forest Regression (RFR), and Decision Tree Regression (DTree):
```julia
lalepipe = (PCA + NoOp) >> (RFR | DTree)
laleopt  = LalePipeOptimizer(lalepipe, max_evals = 10, cv = 3)
laletr   = fit!(laleopt, Xtrain, Ytrain)
pred     = transform!(laletr, Xtest)
```
The block of code above jointly searches the hyperparameters of both the Random Forest and Decision Tree learners, selects whichever learner performs better, and at the same time tunes the hyperparameters of the PCA.
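Conceptually, the `|` choice means the optimizer searches over two candidate pipeline shapes, tuning each candidate's hyperparameters along the way. A sketch of the equivalent hand-written candidates (same names as the snippet above):

```julia
# the optimizer effectively chooses between these two candidates,
# tuning PCA plus the chosen learner's hyperparameters in each
candidate1 = (PCA + NoOp) >> RFR
candidate2 = (PCA + NoOp) >> DTree
```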
- The pipe combinator, `p1 >> p2`, first runs sub-pipeline `p1` and then pipes its output into sub-pipeline `p2`.
- The union combinator, `p1 + p2`, runs sub-pipelines `p1` and `p2` separately over the same data, and then concatenates the output columns of both.
- The or combinator, `p1 | p2`, creates an algorithmic choice for the optimizer to search and select whichever of `p1` and `p2` yields better results.
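As a minimal sketch of the three combinators in action, reusing the `laleoperator` constructors from the full example below:

```julia
using Lale

pca   = laleoperator("PCA")
noop  = laleoperator("NoOp", "lale")
rfr   = laleoperator("RandomForestRegressor")
dtree = laleoperator("DecisionTreeRegressor")

piped   = pca >> rfr     # pipe: PCA output feeds the regressor
unioned = pca + noop     # union: PCA columns concatenated with the untouched columns
choice  = rfr | dtree    # or: the optimizer picks the better learner
```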
Lale is in the Julia General package registry. The latest release can be installed from the Julia prompt:
```julia
julia> using Pkg
julia> Pkg.update()
julia> Pkg.add("Lale")
```
or use Julia's pkg shell, which is triggered by typing `]`:
```julia
julia> ]
pkg> update
pkg> add Lale
```
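As a quick sanity check after installation (Lale.jl wraps the Python Lale library, so this assumes a working Python backend with Lale available):

```julia
julia> using Lale

julia> pca = laleoperator("PCA");   # constructs the wrapped scikit-learn PCA
```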
```julia
using Lale
using DataFrames: DataFrame

# load the iris data and split it into feature/target tables
iris = getiris()
Xreg = iris[:, 1:3] |> DataFrame
Yreg = iris[:, 4]   |> Vector
Xcl  = iris[:, 1:4] |> DataFrame
Ycl  = iris[:, 5]   |> Vector

# regression dataset
regsplit = train_test_split(Xreg, Yreg; testprop = 0.20)
trXreg, trYreg, tstXreg, tstYreg = regsplit

# classification dataset
clsplit = train_test_split(Xcl, Ycl; testprop = 0.20)
trXcl, trYcl, tstXcl, tstYcl = clsplit

# lale operators
pca     = laleoperator("PCA")
rb      = laleoperator("RobustScaler")
noop    = laleoperator("NoOp", "lale")
rfr     = laleoperator("RandomForestRegressor")
rfc     = laleoperator("RandomForestClassifier")
treereg = laleoperator("DecisionTreeRegressor")

# Lale regression
lalepipe  = (pca + noop) >> (rfr | treereg)
lale_hopt = LalePipeOptimizer(lalepipe, max_evals = 10, cv = 3)
laletrain = fit(lale_hopt, trXreg, trYreg)
lalepred  = transform(laletrain, tstXreg)
score(:rmse, lalepred, tstYreg) |> println

# Lale classification
lalepipe  = (rb + pca) >> rfc
lale_hopt = LalePipeOptimizer(lalepipe, max_evals = 10, cv = 3)
laletrain = fit(lale_hopt, trXcl, trYcl)
lalepred  = transform(laletrain, tstXcl)
score(:accuracy, lalepred, tstYcl) |> println
```
Moreover, Lale is compatible with the AutoMLPipeline `@pipeline` syntax:
```julia
# regression pipeline
regpipe  = @pipeline (pca + rb) |> rfr
regmodel = fit(regpipe, trXreg, trYreg)
regpred  = transform(regmodel, tstXreg)
regperf(x, y) = score(:rmse, x, y)
regperf(regpred, tstYreg) |> println
crossvalidate(regpipe, Xreg, Yreg, regperf)

# classification pipeline
clpipe  = @pipeline (pca + noop) |> rfc
clmodel = fit(clpipe, trXcl, trYcl)
clpred  = transform(clmodel, tstXcl)
classperf(x, y) = score(:accuracy, x, y)
classperf(clpred, tstYcl) |> println
crossvalidate(clpipe, Xcl, Ycl, classperf)
```