MLJBalancing.jl

A package with exported learning networks that combine resampling methods from Imbalance.jl and classification models from MLJ
Author: JuliaAI
Popularity: 5 stars
Last updated: 8 months ago
Started in: September 2023

MLJBalancing

A package providing composite models wrapping class imbalance algorithms from Imbalance.jl with classifiers from MLJ.

โฌ Installation

import Pkg;
Pkg.add("MLJBalancing")

🚅 Sequential Resampling

This package allows chaining resampling methods from Imbalance.jl with classification models from MLJ. Construct a BalancedModel object, specifying the model (a classifier) and any number of resamplers (also called balancers, typically oversamplers and/or undersamplers).

📖 Example

Construct the resamplers and the model

using MLJ, MLJBalancing

# Load the model types; Imbalance and MLJLinearModels must be installed
SMOTENC = @load SMOTENC pkg=Imbalance verbosity=0
TomekUndersampler = @load TomekUndersampler pkg=Imbalance verbosity=0
LogisticClassifier = @load LogisticClassifier pkg=MLJLinearModels verbosity=0

oversampler = SMOTENC(k=5, ratios=1.0, rng=42)
undersampler = TomekUndersampler(min_ratios=0.5, rng=42)

logistic_model = LogisticClassifier()

Wrap them all in BalancedModel

balanced_model = BalancedModel(model=logistic_model, balancer1=oversampler, balancer2=undersampler)

Here the training data is passed to balancer1 and then to balancer2, and the output of balancer2 is used to train the classifier model. When balanced_model is used for prediction, the resamplers balancer1 and balancer2 are bypassed.

In general, any number of balancers can be passed to BalancedModel, and the user can give each balancer an arbitrary keyword name when passing it.
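
The fit-time and predict-time behaviour described above can be illustrated with a small standalone sketch. It is not the package's implementation: toy_oversample, toy_undersample, toy_fit and fit_balanced are hypothetical stand-ins written with only the standard library, and the data is made up. The point is only that resamplers are applied in order during training and play no role when predicting.

```julia
using Random

# Toy oversampler (hypothetical): duplicate random rows of the smaller classes
# until every class is as large as the largest one.
function toy_oversample(X, y; rng=Random.default_rng())
    classes = unique(y)
    target = maximum(count(==(c), y) for c in classes)
    Xo, yo = copy(X), copy(y)
    for c in classes
        idx = findall(==(c), y)
        extra = rand(rng, idx, target - length(idx))  # sample with replacement
        append!(Xo, X[extra])
        append!(yo, y[extra])
    end
    return Xo, yo
end

# Toy undersampler (hypothetical): keep at most max_per_class random rows per class.
function toy_undersample(X, y; max_per_class=4, rng=Random.default_rng())
    keep = Int[]
    for c in unique(y)
        idx = findall(==(c), y)
        append!(keep, first(shuffle(rng, idx), max_per_class))
    end
    sort!(keep)
    return X[keep], y[keep]
end

# Toy classifier (hypothetical): predict the class whose mean is nearest to x.
function toy_fit(X, y)
    classes = unique(y)
    means = [sum(X[y .== c]) / count(==(c), y) for c in classes]
    return x -> classes[argmin(abs.(means .- x))]
end

# Training: pass the data through each balancer in order, then fit the classifier.
# The returned predictor never calls the balancers.
function fit_balanced(fit, balancers, X, y)
    for balance in balancers
        X, y = balance(X, y)
    end
    return fit(X, y), y
end

rng = Random.Xoshiro(42)
X = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 5.0, 5.2]
y = ["neg", "neg", "neg", "neg", "neg", "neg", "neg", "neg", "pos", "pos"]

balancers = [
    (X, y) -> toy_oversample(X, y; rng=rng),
    (X, y) -> toy_undersample(X, y; max_per_class=4, rng=rng),
]
predictor, y_train = fit_balanced(toy_fit, balancers, X, y)
```

After the two steps the classifier sees four rows of each class, and predictor is applied directly to new inputs without any resampling.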

At this point, balanced_model behaves like one single model

You can fit, predict, cross-validate and hyperparameter-tune it like any other MLJ model. Here is an example of hyperparameter tuning:

r1 = range(balanced_model, :(balancer1.k), lower=3, upper=10)
r2 = range(balanced_model, :(balancer2.min_ratios), lower=0.1, upper=0.9)

tuned_balanced_model = TunedModel(
    model=balanced_model,
    tuning=Grid(goal=4),
    resampling=CV(nfolds=4),
    range=[r1, r2],
    measure=cross_entropy
);

# X is a table of features and y the corresponding class labels (not defined in this snippet)
mach = machine(tuned_balanced_model, X, y);
fit!(mach, verbosity=0);
fitted_params(mach).best_model

🚆🚆 Parallel Resampling with Balanced Bagging

The package also implements bagging over probabilistic classifiers in which the majority class is repeatedly undersampled, T times, down to the size of the minority class. This undersampling scheme was proposed as part of the EasyEnsemble algorithm in the paper "Exploratory Undersampling for Class-Imbalance Learning" by Xu-Ying Liu, Jianxin Wu, and Zhi-Hua Zhou, where AdaBoost models were used and their output scores were averaged.

Construct a BalancedBaggingClassifier

Here you must specify a probabilistic model, and you may optionally specify the number of bags T and the random number generator rng. If T is not specified, it is set to the ratio between the majority and minority class counts. If rng is not specified, default_rng() is used.

using Random  # provides Xoshiro
LogisticClassifier = @load LogisticClassifier pkg=MLJLinearModels verbosity=0
logistic_model = LogisticClassifier()
bagging_model = BalancedBaggingClassifier(model=logistic_model, T=10, rng=Random.Xoshiro(42))
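
To make the undersampling scheme concrete, here is a minimal standalone sketch of how T balanced bags could be drawn. It is not the package's code: balanced_bags is a hypothetical helper, it handles binary targets only, and two details are assumptions of the sketch, namely that majority rows are sampled without replacement within a bag and that the default T is the majority/minority ratio rounded to the nearest integer.

```julia
using Random

# Draw bags of row indices. Each bag keeps every minority row plus an equally
# sized random subset of the majority rows (assumption: without replacement).
function balanced_bags(y; T=nothing, rng=Random.default_rng())
    classes = unique(y)
    length(classes) == 2 || error("this sketch handles binary targets only")
    idx_a = findall(==(classes[1]), y)
    idx_b = findall(==(classes[2]), y)
    minority, majority = length(idx_a) <= length(idx_b) ? (idx_a, idx_b) : (idx_b, idx_a)
    # Assumption: when T is not given, round the majority/minority ratio.
    n_bags = T === nothing ? round(Int, length(majority) / length(minority)) : T
    return [vcat(minority, shuffle(rng, majority)[1:length(minority)]) for _ in 1:n_bags]
end

y = vcat(fill("majority", 90), fill("minority", 10))
bags = balanced_bags(y; rng=Random.Xoshiro(42))
```

With 90 majority and 10 minority rows this yields nine bags of twenty indices each. In a bagging ensemble, one copy of the classifier would be trained per bag.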

Now it behaves like one single model

You can fit, predict, cross-validate and hyperparameter-tune it like any other probabilistic MLJ model. The input X must be a table (e.g., a dataframe).

mach = machine(bagging_model, X, y)
fit!(mach)
pred = predict(mach, X)  # probabilistic predictions, one per row of X