This is a simple package. See below for a list of more complex packages for linear regression in Julia.
Because I keep finding myself thinking, "I need some simple linear regression here...", and missing the level of abstraction halfway in between `X \ y` and GLM.jl, without lots of additional dependencies. I keep running into this Discourse thread and wishing I could just write `using LinearRegression`.

Alistair.jl would fit the bill, but it hasn't been maintained and doesn't work with Julia 1+.
Linear regression based on vector and matrix inputs and outputs: `lr = linregress(X, y)`. `X` can be a vector (1D inputs; each element is one observation) or a matrix (multivariate inputs; each row is one observation, each column a feature). `y` can be a vector (1D outputs; each element is one observation) or a matrix (multivariate outputs; each row is one observation, each column a target).
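As a quick sketch of what such a fit boils down to, here is the equivalent plain Base-Julia computation (the data below is made up; with this package you would just call `linregress(X, y)`):

```julia
using LinearAlgebra

# Made-up data: 4 observations, 2 features (one row per observation).
X = [1.0 2.0;
     2.0 1.0;
     3.0 4.0;
     4.0 3.0]
# Targets generated exactly from slopes [2, -1] and intercept 0.5,
# so the least-squares fit should recover them.
y = X * [2.0, -1.0] .+ 0.5

# With the default intercept term, the fit amounts to appending a
# column of ones and solving the least-squares problem:
Xb = [X ones(size(X, 1))]
β = Xb \ y   # ≈ [2.0, -1.0, 0.5]
```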
Weighted linear regression: `lr = linregress(X, y, weights)`, where `weights` is a vector holding each observation's weight.
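A sketch of what the weighted fit computes, assuming the usual weighted-least-squares definition (data and weights below are made up):

```julia
using LinearAlgebra

# Made-up 1D data with an exact linear relationship y = 3x + 1.
x = [1.0, 2.0, 3.0, 4.0]
y = 3.0 .* x .+ 1.0
w = [1.0, 2.0, 1.0, 0.5]   # made-up per-observation weights

# Weighted least squares is equivalent to rescaling each row of the
# design matrix (and the target) by the square root of its weight:
Xb = [x ones(length(x))]
s = sqrt.(w)
β = (s .* Xb) \ (s .* y)   # the data is exactly linear, so β ≈ [3.0, 1.0]
```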
Intercept/bias term: by default, a column of ones is implicitly appended to account for the intercept term. You can disable this, forcing the regression to go through the origin, by passing the `intercept=false` keyword argument.
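With `intercept=false`, no ones column is appended, so the Base-Julia equivalent is just the bare least-squares solve (made-up data):

```julia
# Made-up data passing exactly through the origin: y = 2x.
x = [1.0, 2.0, 3.0]
y = 2.0 .* x

# Without the intercept there is no appended ones column;
# reshape turns the input vector into a 3×1 design matrix.
β = reshape(x, :, 1) \ y   # a single slope and no bias: β ≈ [2.0]
```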
Choice of solver: by default, QR factorization (`X \ y`) is used to solve the linear system. You can choose a solver explicitly by passing the `method` keyword argument. Currently implemented choices are `method=SolveQR()` (QR factorization, the default) and `method=SolveCholesky()` (Cholesky factorization; can be faster, but numerically less accurate).
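The two solver choices correspond roughly to the following Base-Julia computations; the Cholesky route solves the normal equations, which squares the condition number of the problem, hence the accuracy caveat. The data below is made up:

```julia
using LinearAlgebra

# Made-up overdetermined system (ones column written out by hand).
X = [1.0 1.0;
     1.0 2.0;
     1.0 3.0;
     1.0 4.0]
y = [1.1, 1.9, 3.2, 3.8]

# QR route: what `X \ y` does for a dense rectangular X.
β_qr = qr(X) \ y

# Cholesky route: solve the normal equations XᵀXβ = Xᵀy.
# Cheaper for tall, skinny X, but numerically less accurate.
β_chol = cholesky(Symmetric(X' * X)) \ (X' * y)

# On a well-conditioned problem like this one, the two agree closely.
```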
Predicting: `ytest = lr(Xtest)` evaluates the fitted model on new inputs.
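With the default intercept, prediction amounts to applying the slopes and adding the bias; a sketch with made-up coefficient values:

```julia
# Made-up fitted coefficients: two slopes followed by the bias.
β = [2.0, -1.0, 0.5]
Xtest = [1.0 1.0;
         2.0 2.0]

# Multiply by the slopes and add the bias to each prediction:
ytest = Xtest * β[1:end-1] .+ β[end]   # ≈ [1.5, 2.5]
```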
Extracting coefficients: `β = coef(lr)`, which includes the intercept/bias in the last position if `intercept=true` (the default). You can explicitly obtain the slopes and the intercept/bias by calling `LinearRegression.slope(lr)` and `LinearRegression.bias(lr)`.
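Since `coef(lr)` stores the bias in the last position (when `intercept=true`), the slope/bias accessors presumably just split the coefficient vector; a sketch with made-up values:

```julia
# Made-up coefficient vector, laid out as coef(lr) with intercept=true:
# slopes first, bias last.
β = [2.0, -1.0, 0.5]

slopes = β[1:end-1]   # the slopes: [2.0, -1.0]
bias = β[end]         # the intercept/bias: 0.5
```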
I'm happy to receive issue reports and pull requests, though I am likely to say no to proposals that would significantly increase the scope of this package (see below for other packages with more features).
Out of scope for this package:

- Being as comprehensive as SciML's LinearSolve.jl (on the other hand, this package has far fewer dependencies).
- Ridge regression (use MultivariateStats.jl instead, or convince me it really should be part of LinearRegression.jl as well).
- Handling of DataFrames (use GLM.jl instead).
- Lots of regression statistics (use GLM.jl instead).
- Different (non-Gaussian) observation models (use GLM.jl instead).
- Sparse regression (use SparseRegression.jl instead).
- Bayesian linear regression (use BayesianLinearRegressors.jl instead).
- Online estimation (use OnlineStats.jl instead).
Want to suggest another package to recommend here? Feel free to open a pull request! (: