SimGBS.jl

A simple method to simulate Genotyping-by-Sequencing (GBS) data.
Author: anshess
Started in: January 2021

SimGBS: A Julia Package to Simulate Genotyping-by-Sequencing (GBS) Data


Introduction

SimGBS is a versatile method for simulating Genotyping-by-Sequencing (GBS) data, and it can be used with any reference genome. Users can adjust parameters to customise the GBS setting, such as the choice of restriction enzyme and the sequencing depth. Because SimGBS takes a gene-drop approach, users can also specify the demographic history and define the population structure by supplying a pedigree file. Like real sequencers, SimGBS writes its simulated reads in FASTQ format.
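The virtual digestion and size-selection steps described above can be illustrated with a minimal Julia sketch. It is not SimGBS's own API: the names `digest` and `size_select` are hypothetical, ApeKI (recognition site G^CWGC) is only an example enzyme, and the 64 to 600 bp size window is an illustrative assumption. Strands and sticky-end overhangs are ignored.

```julia
# Hypothetical sketch of virtual digestion and size selection; not SimGBS's API.

# ApeKI recognition site G^CWGC (W = A or T); the enzyme cuts after the first G.
const APEKI_SITE = r"GC[AT]GC"

# Cut `seq` `cut_offset` bases into every (possibly overlapping) site match and
# return the fragments in genome order. Strands and overhangs are ignored.
function digest(seq::AbstractString; site::Regex=APEKI_SITE, cut_offset::Int=1)
    s = uppercase(seq)  # assumes plain ASCII nucleotides (A, C, G, T, N)
    # Index of the last base kept on the left-hand side of each cut.
    cuts = [m.offset + cut_offset - 1 for m in eachmatch(site, s; overlap=true)]
    fragments = String[]
    start = 1
    for c in cuts
        push!(fragments, s[start:c])
        start = c + 1
    end
    push!(fragments, s[start:end])  # tail after the last cut
    return filter(!isempty, fragments)
end

# Keep fragments whose length falls inside the window (illustrative defaults).
size_select(frags; minlen::Int=64, maxlen::Int=600) =
    filter(f -> minlen <= length(f) <= maxlen, frags)
```

For example, `size_select(digest(chromosome_sequence))` would return the fragments retained after size selection, where `chromosome_sequence` stands for one sequence read from the reference FASTA file.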

Installation

SimGBS.jl is registered in the General registry. It can be installed using Pkg.add:

julia> import Pkg; Pkg.add("SimGBS")

or, from the Pkg REPL mode (press ] at the julia> prompt):

julia> ] 
pkg> add SimGBS

Input

  • Reference genome of the target species in FASTA format (e.g., xxx.fasta.gz or xxx.fa.gz)

  • A list of Illumina barcodes (e.g., GBS_Barcodes.txt)

  • (optional) Pedigree file (e.g., small.ped)
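The gene-drop idea behind the optional pedigree input can be sketched in a few lines of Julia. This is a simplified, hypothetical illustration rather than SimGBS's implementation: it follows a single biallelic locus without recombination, `gene_drop` is an invented name, and the pedigree is an in-memory vector of `(id, sire, dam)` tuples (founders coded as `0, 0`), not the actual pedigree file format.

```julia
using Random

# Hypothetical gene-drop sketch for a single biallelic locus, no recombination.
# `ped` holds (id, sire, dam) tuples ordered so that parents precede offspring;
# founders are coded with sire == dam == 0. `founder_haps` gives founder alleles.
function gene_drop(ped::Vector{NTuple{3,Int}}, founder_haps::Dict{Int,NTuple{2,Int}};
                   rng::AbstractRNG=Random.default_rng())
    haps = Dict{Int,NTuple{2,Int}}()
    for (id, sire, dam) in ped
        if sire == 0 && dam == 0
            haps[id] = founder_haps[id]  # founders keep their given alleles
        else
            # Mendelian sampling: each parent passes one of its two alleles at random
            haps[id] = (haps[sire][rand(rng, 1:2)], haps[dam][rand(rng, 1:2)])
        end
    end
    return haps
end
```

Passing a seeded generator, e.g. `gene_drop(ped, founders; rng=MersenneTwister(1))`, makes a simulated population reproducible.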

Output

  • GBS fragments generated by virtual digestion (e.g., rawGBStags.txt)

  • GBS fragments retained after fragment size selection (e.g., GBStags.txt)

  • Haplotypes, SNP genotypes and QTL genotypes (e.g., hap.txt, snpGeno.txt and qtlGeno.txt)

  • Basic information about the simulated GBS experiment (e.g., keyFile.txt)

  • Simulated GBS reads in FASTQ format (e.g., xxxxx.fastq)

etc.
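To show what a single simulated read looks like in FASTQ format, here is a minimal Julia sketch. It is an assumption-laden illustration, not SimGBS's output routine: `fastq_record` is a hypothetical name, the read is single-end and consists of a barcode followed by the start of a GBS fragment, and every base gets the same Phred score, encoded with the standard Phred+33 offset.

```julia
# Hypothetical sketch: build one single-end FASTQ record (4 lines).
# The read is the barcode followed by the fragment, truncated to `readlen`
# bases, with a constant Phred quality encoded as ASCII (Phred+33).
function fastq_record(id::AbstractString, barcode::AbstractString, fragment::AbstractString;
                      readlen::Int=100, phred::Int=30)
    seq = first(barcode * fragment, readlen)          # first `readlen` characters
    qual = repeat(Char(phred + 33), length(seq))      # one quality symbol per base
    return "@$(id)\n$(seq)\n+\n$(qual)\n"
end
```

Records can then be appended to a file, e.g. `open("sample.fastq", "w") do io; write(io, fastq_record("read1", "ACGT", fragment)); end`. Real simulators also vary base qualities and introduce sequencing errors, which this sketch leaves out.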

Overview

For more information, please visit the documentation page.

Citation

Please cite the following if you use SimGBS.jl:

What's Next?

The following tools are recommended for downstream analyses of GBS data:

  • snpGBS: a simple bioinformatics workflow to identify single nucleotide polymorphisms (SNPs) from Genotyping-by-Sequencing (GBS) data.

  • KGD: R code for the analysis of genotyping-by-sequencing (GBS) data, primarily to construct a genomic relationship matrix for the genotyped individuals.

  • GUSLD: An R package for estimating linkage disequilibrium using low and/or high coverage sequencing data without requiring filtering with respect to read depth.

  • SMAP: a software package that analyzes read mapping distributions and performs haplotype calling to create multi-allelic molecular markers.
