SimGBS is a versatile method for simulating Genotyping-by-Sequencing (GBS) data. It works with any genome of choice. Users can adjust parameters to customise the GBS settings, such as the choice of restriction enzyme and the sequencing depth. Because it takes a gene-drop approach, users can also specify the demographic history and define the population structure by supplying a pedigree file. Like a real sequencer, SimGBS writes its output in FASTQ format.
SimGBS.jl is registered in the General registry. It can be installed using Pkg.add:
julia> import Pkg;Pkg.add("SimGBS")
or simply
julia> ]
pkg> add SimGBS
Inputs:
- Reference genome of the target species in FASTA format (e.g., xxx.fasta.gz / xxx.fa.gz)
- A list of Illumina barcodes (e.g., GBS_Barcodes.txt)
- (optional) Pedigree file (e.g., small.ped)
Outputs:
- GBS fragments generated by virtual digestion (e.g., rawGBStags.txt)
- Selected GBS fragments after fragment size selection (e.g., GBStags.txt)
- Haplotypes, SNP and QTL genotypes (e.g., hap.txt, snpGeno.txt and qtlGeno.txt)
- Basic information about the simulated GBS experiment (e.g., keyFile.txt)
- Simulated GBS reads in FASTQ format (e.g., xxxxx.fastq)
etc.
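
To make the first two outputs more concrete, the sketch below illustrates the general idea of virtual digestion followed by fragment size selection on a toy sequence. It is a simplified illustration and not SimGBS.jl's own implementation: the function names `digest` and `size_select`, the toy sequence, and the size bounds are all hypothetical, and only one strand is considered. The enzyme pattern assumes ApeKI, which recognises GCWGC (W = A or T) and cuts after the first G.

```julia
# Simplified illustration only; these helpers are hypothetical and are
# NOT part of the SimGBS.jl API.

# Cut `seq` at every occurrence of `site`. `offset` is the number of bases of
# the recognition site that stay on the upstream fragment.
function digest(seq::String, site::Regex, offset::Int)
    fragments = String[]
    start = 1
    # overlap = true so that overlapping sites (e.g. GCAGCAGC) are all found
    for m in eachmatch(site, seq; overlap = true)
        cut = m.offset + offset          # first base of the downstream fragment
        push!(fragments, seq[start:cut-1])
        start = cut
    end
    push!(fragments, seq[start:end])     # remainder after the last cut
    return fragments
end

# Keep only fragments whose length lies within [lower, upper].
size_select(fragments, lower, upper) =
    filter(f -> lower <= length(f) <= upper, fragments)

apeki = r"GC[AT]GC"                      # assumed ApeKI site: G^CWGC
genome = "AAAGCAGCTTTTTTGCTGCAA"         # toy sequence, not real data

raw_tags = digest(genome, apeki, 1)      # ["AAAG", "CAGCTTTTTTG", "CTGCAA"]
tags = size_select(raw_tags, 5, 8)       # ["CTGCAA"]

println(raw_tags)
println(tags)
```

In this toy example the fragments before size selection play the role of rawGBStags.txt and the retained fragments play the role of GBStags.txt. The real files may carry additional information that this sketch does not model.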
For more information, please visit the documentation page.
Please cite the following if you use SimGBS.jl:
The following tools are recommended for downstream analyses of GBS data:
- snpGBS: a simple bioinformatics workflow to identify single nucleotide polymorphisms (SNPs) from Genotyping-by-Sequencing (GBS) data.
- KGD: R code for the analysis of genotyping-by-sequencing (GBS) data, primarily to construct a genomic relationship matrix for the genotyped individuals.
- GUSLD: an R package for estimating linkage disequilibrium using low and/or high coverage sequencing data without requiring filtering with respect to read depth.
- SMAP: a software package that analyzes read mapping distributions and performs haplotype calling to create multi-allelic molecular markers.