GREMLModels.jl is a Julia package for fitting variance component models that are structured according to known, dense, relationship matrices. The intended use case is the analysis of genetic and environmental components of variance for quantitative traits, when genetic relationship matrices are computed from genome-wide SNP data.
GREMLModels.jl can fit models of the form
In this model
where
The variance component parameters can be defined as functions of the vector of parameters
The package can be installed from github with
(@v1.6) pkg> add https://github.com/espenmei/GREMLModels.jl
Here is an example with simulated data showing how the package may be used to fit two variance components and two covariates for the means. The model for the means can be defined using the @formula
macro together with a DataFrame
storing the covariates and the response variable. The model for the covariance can be defined by providing a vector with relationship matrices.
using GREMLModels, LinearAlgebra, DataFrames, StatsModels
n, m = 1000, 2000
v = randn(n, m)
R1 = Symmetric(v * v' / m)
R2 = Diagonal(ones(n))
V = 2R1 + 4R2
y = cholesky(V).L * randn(n)
dat = DataFrame(y = y, x = rand(n), z = rand(n))
r = [R1, R2]
m = fit(GREMLModel, @formula(y ~ 1 + x + z), dat, r)
This should give an output reasonably similar to this
logLik -2 logLik AIC AICc BIC
-2314.5590 4629.1180 4639.1180 4639.1784 4663.6568
Variance component parameters:
Comp. Est. Std. Error
θ₁ 2.0943 -
θ₂ 4.0674 -
Fixed-effects parameters:
────────────────────────────────────────────────────
Coef. Std. Error z Pr(>|z|)
────────────────────────────────────────────────────
(Intercept) -0.179071 0.203213 -0.88 0.3782
x -0.0029327 0.26284 -0.01 0.9911
z 0.435445 0.274538 1.59 0.1127
────────────────────────────────────────────────────
Not all arguments are available when fitting a model this way. Sometimes it may be useful to first define a model with the relevant arguments and in a next step to fit that model.
y = dat[!,:y]
X = Matrix(hcat(ones(n), dat[!,2:3]))
dat = GREMLData(y, X, r)
m2 = GREMLModel(dat, [1.0, 1.0], [0.0, 0.0])
fit!(m2)
This first creates an object of type GREMLData
that is used to hold the data input to a GREMLModel
. Then a model, represented by an object of type GREMLModel
is created from a constructor that accepts a GREMLData
, a vector of initial values for the variance component parameters [1.0, 1.0]
and a vector of lower bounds for those parameters [0.0, 0.0]
. And in the last line that model is fitted fit!(m2)
. The !
signals that the function modifies the model, which is a convention in julia.
Consider the model for responses obtained from offspring
where
The genetic relatedness matrices
In order to fit the model, the first step is to compute the gneetic relationship matrices. The genetic relationships will be estimated from a simulated dataset of 10000 SNPs from 6000 parent-offspring trios stored in plink format. The julia package SnpArrays
is used to read snp-data and compute genetic relationships.
using VCModels, LinearAlgebra, DataFrames, StatsModels, SnpArrays
gpath = joinpath(@__DIR__, "data", "trio")
trio = SnpData(gpath)
A = 2 * grm(trio.snparray)
Because the snp-data was arranged according to families within roles, the genetic relationship matrix can be partitioned as
K = div(trio.people, 3)
mid = 1:K
pid = K+1:2K
oid = 2K+1:3*K
Amm = A[mid, mid]
App = A[pid, pid]
Aoo = A[oid, oid]
Dpm = A[pid, mid] + A[mid, pid]
Dom = A[oid, mid] + A[mid, oid]
Dop = A[oid, pid] + A[pid, oid]
The off-diagonal blocks are added to compute the expressions within parenthesis in the covariance expression above. These relationship matrices are collected in a vector as before
R = Diagonal(ones(K))
r = [Amm, App, Aoo, Dpm, Dom, Dop, R]
It is also necessary to define a matrix of covariates for the means, which is just a constant here
X = ones(K)
The dataset also contains some simulated responses
y = parse.(Float64, trio.person_info[oid, :phenotype])
A challenge with this model is that some of the variance terms represent covariances, which puts restrictions on which values the parameters can take. If we collect the genetic parameters in a covariance matrix
this should be positive semidefinite. One way to enforce this is to let the parameters of the model be the elements of the lower-triangular Cholesky factorization
If
Then
This can be implemented by overriding the default VCModels.transform!
method. This is used to define the
function GREMLModels.transform!(δ::Vector, θ::Vector)
δ[1] = θ[1]^2
δ[2] = θ[2]^2 + θ[4]^2
δ[3] = θ[3]^2 + θ[5]^2 + θ[6]^2
δ[4] = θ[2] * θ[1]
δ[5] = θ[3] * θ[1]
δ[6] = θ[3] * θ[2] + θ[5] * θ[4]
δ[7] = θ[7]
end
The last element is the residual variance. We can try to fit this model by initializing L to identity and setting the lower bound for the diagonal elements to zero
dat = GREMLData(y, X, r)
m1 = GREMLModel(dat, [1.0, 0.0, 0.0, 1.0, 0.0, 1.0, 1.0], [0.0, -Inf, -Inf, 0.0, -Inf, 0.0, 0.0])
fit!(m1)
This should give an output similar to
logLik -2 logLik AIC AICc BIC
-17772.7032 35545.4065 35561.4065 35561.4305 35615.0026
Variance component parameters:
Comp. Est. Std. Error
θ₁ 1.9214 -
θ₂ 0.1848 -
θ₃ 1.1346 -
θ₄ -1.6814 -
θ₅ -1.2530 -
θ₆ 0.4947 -
θ₇ 9.3309 -
Fixed-effects parameters:
────────────────────────────────────────
Coef. Std. Error z Pr(>|z|)
────────────────────────────────────────
x1 2.07909 0.0448462 46.36 <1e-99
────────────────────────────────────────
Note that the variance components reported in the output are not the variance components in the model, they are the elements of
m1.δ
7-element Vector{Float64}:
3.6919273370876997
2.861211241117614
3.101860361178375
0.35508630372540173
2.1799820994344636
2.3164093450653436
9.330854714731743