GLFixedEffectModels.jl

Fast estimation of generalized linear models with high dimensional categorical variables in Julia
Author jmboehm
Started In
January 2019

This package estimates generalized linear models with high dimensional categorical variables. It builds on Matthieu Gomez's FixedEffects.jl and Amrei Stammann's Alpaca.

Installation

] add https://github.com/jmboehm/GLFixedEffectModels.jl.git

Example use

using GLFixedEffectModels, GLM, Distributions
using RDatasets, CategoricalArrays

# Load iris and build a binary outcome: 1.0 if SepalLength > 5.0, else 0.0
df = dataset("datasets", "iris")
df.binary = zeros(Float64, size(df,1))
df[df.SepalLength .> 5.0,:binary] .= 1.0
df.SpeciesDummy = categorical(df.Species)

# Add a random three-level categorical variable, used below for clustering
idx = rand(1:3,size(df,1),1)
a = ["A","B","C"]
df.Random = vec([a[i] for i in idx])
df.RandomCategorical = categorical(df.Random)

# Logit with a Species fixed effect
m = @formula binary ~ SepalWidth + fe(SpeciesDummy)
x = nlreg(df, m, Binomial(), LogitLink(), start = [0.2] )

# Same model with a second regressor and two-way clustered standard errors
m = @formula binary ~ SepalWidth + PetalLength + fe(SpeciesDummy)
nlreg(df, m, Binomial(), LogitLink(), Vcov.cluster(:SpeciesDummy,:RandomCategorical) , start = [0.2, 0.2] )

Documentation

The main function is nlreg(), which returns a GLFixedEffectModel <: RegressionModel.

nlreg(df, formula::FormulaTerm,
    distribution::Distribution,
    link::GLM.Link,
    vcov::CovarianceEstimator; ...)

The positional arguments are:

  • df: A table, for example a DataFrame.
  • formula: A formula created using @formula.
  • distribution: A Distribution. See the documentation of GLM.jl for valid distributions.
  • link: A Link function. See the documentation of GLM.jl for valid link functions.
  • vcov: A CovarianceEstimator used to compute the variance-covariance matrix. It can be omitted, as the first example above shows.

The optional arguments are:

  • save::Union{Bool, Symbol} = false: Whether to save residuals and any estimated fixed effects in a dataframe. Use save = :residuals to save only residuals, and save = :fe to save only fixed effects.
  • method::Symbol: The computation method. The default is :cpu. The alternative, :gpu, requires CuArrays; in that case, set double_precision = false to use Float32. This option works the same way as in the FixedEffectModels.jl package.
  • contrasts::Dict = Dict(): An optional Dict of contrast codings for each categorical variable in the formula. Any unspecified variables will have DummyCoding.
  • maxiter::Integer = 1000: Maximum number of iterations in the Newton-Raphson routine.
  • maxiter_center::Integer = 10000: Maximum number of iterations for the centering procedure.
  • double_precision::Bool: Whether the demeaning operation uses Float64 rather than Float32. Defaults to true.
  • dev_tol::Real: Tolerance level for the first stopping condition of the maximization routine.
  • rho_tol::Real: Tolerance level for the step-halving in the maximization routine.
  • step_tol::Real: Tolerance level that accounts for rounding errors inside the step-halving routine.
  • center_tol::Real: Tolerance level for the stopping condition of the centering algorithm. Defaults to 1e-8 if double_precision = true, 1e-6 otherwise.
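
As a rough illustration of how the positional and optional arguments fit together, the sketch below passes vcov explicitly and sets a few of the keywords listed above. The keyword values are arbitrary choices for demonstration, not recommended settings, and the explicit `using CategoricalArrays` is an assumption about where `categorical` comes from in current package versions.

```julia
using GLFixedEffectModels, GLM, Distributions
using RDatasets, CategoricalArrays

# Same iris setup as in the example above.
df = dataset("datasets", "iris")
df.binary = Float64.(df.SepalLength .> 5.0)
df.SpeciesDummy = categorical(df.Species)

m = @formula binary ~ SepalWidth + fe(SpeciesDummy)

# All five positional arguments, followed by optional keywords.
# The values below are illustrative only.
res = nlreg(df, m, Binomial(), LogitLink(),
    Vcov.cluster(:SpeciesDummy);   # vcov passed explicitly
    start = [0.2],                 # starting value, as in the example above
    save = :fe,                    # keep only the estimated fixed effects
    method = :cpu,                 # documented default
    maxiter = 500,                 # cap on Newton-Raphson iterations
    double_precision = true,       # demean in Float64
    center_tol = 1e-10)            # tighter than the documented 1e-8 default
```

The returned object should be a GLFixedEffectModel, as described at the top of this section.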

Things that still need to be implemented

  • Better default starting values
  • Bias correction
  • Weights
  • Better StatsBase interface & prediction
  • Better benchmarking
  • Integration with RegressionTables.jl

Related Julia packages

  • FixedEffectModels.jl estimates linear models with high dimensional categorical variables (and with or without endogenous regressors).
  • FixedEffects.jl is a package for fast pseudo-demeaning operations using LSMR. Both this package and FixedEffectModels.jl build on this.
  • Alpaca.jl is a wrapper to the Alpaca R package, which solves the same tasks as this package.
  • GLM.jl estimates generalized linear models, but without explicit support for categorical regressors.
  • Econometrics.jl provides routines to estimate multinomial logit and other models.
  • RegressionTables.jl will, in the future, support pretty printing of results from this package.
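
To give a feel for what the demeaning ("centering") step does and what maxiter_center and center_tol control, here is a minimal sketch in plain Julia. It is not the package's implementation: FixedEffects.jl uses LSMR, whereas this sketch swaps in simple alternating projections, and it assumes a weighted demeaning of the kind described in Stammann (2018). The names `center_sketch` and `weighted_group_means` are hypothetical helpers introduced only for this illustration.

```julia
# Hypothetical helper (not part of GLFixedEffectModels.jl or FixedEffects.jl).
# Removes weighted group means for each grouping variable in turn and repeats
# until a full sweep changes no entry by more than `center_tol`.
function center_sketch(v::Vector{Float64}, w::Vector{Float64},
                       groupings::Vector{Vector{Int}};
                       center_tol::Float64 = 1e-8,
                       maxiter_center::Int = 10_000)
    x = copy(v)
    for iter in 1:maxiter_center
        x_old = copy(x)
        for g in groupings
            ngroups = maximum(g)
            wsum  = zeros(ngroups)   # sum of weights per group
            wxsum = zeros(ngroups)   # weighted sum of x per group
            for i in eachindex(x)
                wsum[g[i]]  += w[i]
                wxsum[g[i]] += w[i] * x[i]
            end
            for i in eachindex(x)
                x[i] -= wxsum[g[i]] / wsum[g[i]]
            end
        end
        # Stop once a full sweep no longer moves any entry noticeably.
        if maximum(abs.(x .- x_old)) < center_tol
            return x, iter
        end
    end
    return x, maxiter_center
end

# Weighted mean of x within each group of g (used only to check the result).
function weighted_group_means(x, w, g)
    return [sum(w[g .== k] .* x[g .== k]) / sum(w[g .== k]) for k in 1:maximum(g)]
end

# Small two-way example: 3 x 4 crossed groups with unequal weights.
g1 = repeat(1:3, inner = 4)
g2 = repeat(1:4, outer = 3)
v  = [sin(i) for i in 1:12]
w  = [1.0 + 0.1 * i for i in 1:12]

x, iters = center_sketch(v, w, [g1, g2])
```

After the call, the weighted means of x within every group of g1 and g2 should be numerically close to zero, which is the sense in which the variable has been "centered". With more fixed-effect dimensions or poorly connected groups, this simple scheme can need many sweeps, which is one motivation for a dedicated solver such as the one in FixedEffects.jl.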

References

Fong, D. C. and Saunders, M. (2011). LSMR: An Iterative Algorithm for Sparse Least-Squares Problems. SIAM Journal on Scientific Computing.

Stammann, A. (2018). Fast and Feasible Estimation of Generalized Linear Models with High-Dimensional k-way Fixed Effects. Mimeo, Heinrich-Heine University Düsseldorf.
