Iterative hard thresholding as a multiple regression model for GWAS
Iterative hard thresholding - a multiple regression approach to analyzing data from Genome-Wide Association Studies (GWAS)
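
To give a feel for what "iterative hard thresholding" means, here is a minimal sketch for ordinary least squares. It is not the package's implementation: the function names `hard_threshold` and `iht_sketch`, the fixed step size, and the fixed iteration count are all assumptions made for illustration. Each iteration takes a gradient step and then keeps only the k largest-magnitude coefficients.

```julia
using LinearAlgebra

# Keep the k largest-magnitude entries of b and set the rest to zero.
# (Hypothetical helper, not part of MendelIHT.)
function hard_threshold(b::AbstractVector, k::Integer)
    out = zero(b)
    keep = partialsortperm(abs.(b), 1:k, rev=true)
    out[keep] = b[keep]
    return out
end

# Plain IHT for least squares: gradient step, then projection onto k-sparse vectors.
# Assumes a fixed, conservative step size of 1 / opnorm(X)^2.
function iht_sketch(X::AbstractMatrix, y::AbstractVector, k::Integer; iters::Integer=200)
    step = 1 / opnorm(X)^2
    b = zeros(size(X, 2))
    for _ in 1:iters
        b = hard_threshold(b + step * (X' * (y - X * b)), k)
    end
    return b
end

# Toy example with an identity design matrix, so the answer is easy to check by eye.
X = Matrix(1.0I, 5, 5)
y = [3.0, 0.0, -2.0, 0.1, 0.0]
b_hat = iht_sketch(X, y, 2)
println(b_hat)
```

With k = 2, only the two largest effects (positions 1 and 3) survive; the small effect at position 4 is thresholded to zero.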

Download and install Julia. Within Julia, copy and paste the following:

using Pkg
Pkg.add("MendelIHT")

This package supports Julia v1.5+ on Mac, Linux, and Windows machines.


Quick Start

The following examples use the data in the data directory. The PLINK files are stored as normal.bed, normal.bim, and normal.fam.

# load package & cd to data directory
using MendelIHT

# select k SNPs in PLINK file
result = iht("normal", 9, Normal) # run IHT with k = 9
result = iht("normal", 10, Normal, covariates="covariates.txt") # separately include covariates, k = 10
result = iht("normal", 10, Normal, covariates="covariates.txt", phenotypes="phenotypes.txt") # phenotypes are stored separately

# run cross validation to determine best k
mses = cross_validate("normal", Normal, path=1:20) # test k = 1, 2, ..., 20
mses = cross_validate("normal", Normal, path=[1, 5, 10, 15, 20]) # test k = 1, 5, 10, 15, 20
mses = cross_validate("normal", Normal, path=1:20, covariates="covariates.txt") # separately include covariates
mses = cross_validate("normal", Normal, path=1:20, covariates="covariates.txt", phenotypes="phenotypes.txt") # if phenotypes are in separate file

# other distributions
result = iht("plinkfile", 10, Bernoulli) # logistic regression with k = 10
result = iht("plinkfile", 10, Poisson) # Poisson regression with k = 10
result = iht("plinkfile", 10, NegativeBinomial, est_r=:Newton) # Negative Binomial regression + nuisance parameter estimation

# Multivariate regression (multiple quantitative phenotypes)
result = iht("plinkfile", 10, MvNormal, phenotypes=[6, 7]) # phenotypes stored in 6th and 7th column of .fam file
result = iht("plinkfile", 10, MvNormal, phenotypes="phenotypes.txt") # phenotypes stored in a separate file
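
Once cross validation has finished, the sparsity level is typically chosen as the candidate with the smallest error. The sketch below assumes that `cross_validate` returns one error per entry of `path`, in the same order; the `mses` values are made up for illustration and do not come from the package's data.

```julia
# Candidate sparsity levels, matching the `path` argument passed to cross_validate.
path = [1, 5, 10, 15, 20]

# Hypothetical cross-validation errors, one per candidate k (made-up numbers).
mses = [2.31, 1.47, 1.02, 1.09, 1.15]

# Pick the candidate with the smallest error.
best_k = path[argmin(mses)]
println("best k = ", best_k)
```

The selected `best_k` can then be passed as the second argument to `iht` for the final fit.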

Please see our latest documentation for more detail.

Citation and Reproducibility

See our paper for algorithmic details. If you use MendelIHT.jl, please cite:

  title={{Iterative hard thresholding in genome-wide association studies: Generalized linear models, prior weights, and double sparsity}},
  author={Chu, Benjamin B and Keys, Kevin L and German, Christopher A and Zhou, Hua and Zhou, Jin J and Sobel, Eric M and Sinsheimer, Janet S and Lange, Kenneth},
  publisher={Oxford University Press}

In the figures subfolder, one can find all the code to reproduce the figures and tables in our paper. Some syntax may be outdated, so please file an issue if you encounter any problem with reproducibility.

Bug fixes and user support

If you encounter a bug or need user support, please open a new issue on GitHub. For bug reports, please provide as much detail as possible, ideally a short, reproducible sequence of code that leads to the error.

PRs and feature requests are welcome!