TypedFASTX.jl

FASTX records with typed sequences and optional qualities.
Author AntonOresten
Started In
July 2023

MIT license

TypedFASTX.jl is a Julia package for working with FASTA and FASTQ files using typed records. It is largely based on BioJulia's FASTX.jl package, whose records are untyped, i.e. agnostic to the kind of data they contain. Besides the sequence field, the TypedRecord type has a description field and an optional quality field. TypedFASTX.jl aims to improve readability and reduce errors when dealing with different types of biological sequences. It also lets you define separate methods for specific record types.
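To illustrate what per-record-type methods buy you, here is a minimal sketch in plain Julia. The types `SeqKind`, `DNA`, `Protein`, and `ToyRecord`, and the functions `kind` and `gc_content`, are hypothetical stand-ins invented for this example; they are not TypedFASTX.jl's actual definitions, only a picture of the general idea of dispatching on the record type.

```julia
# Stand-in types for illustration only; these are NOT TypedFASTX.jl's definitions.
abstract type SeqKind end
struct DNA <: SeqKind end
struct Protein <: SeqKind end

# A record whose type parameter says what kind of sequence it holds.
struct ToyRecord{K<:SeqKind}
    description::String
    sequence::String
end

# Separate methods per record type, selected by multiple dispatch.
kind(::ToyRecord{DNA}) = "nucleotide"
kind(::ToyRecord{Protein}) = "protein"

# Only defined for DNA records; calling it on a protein record is a MethodError.
gc_content(r::ToyRecord{DNA}) = count(in(('G', 'C')), r.sequence) / length(r.sequence)

dna = ToyRecord{DNA}("Mickey Smith", "GATTACA")
aa = ToyRecord{Protein}("Ricky Smith", "SMITH")

println(kind(dna))
println(kind(aa))
println(gc_content(dna))
```

With untyped records, a mistake such as computing GC content on a protein sequence would only surface at runtime inside the function, if at all; with typed records the call fails immediately because no matching method exists.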

Performance

A TypedRecord generally takes up less memory than a FASTX.jl record, since BioSequences.jl's LongSequence type stores sequence data more compactly. However, this design may be slightly slower than, for instance, storing each field in its own vector, because of the overhead needed to keep it flexible and user-friendly.

TypedFASTX.jl is somewhat slower than FASTX.jl at writing records to files, because sequences must be encoded back to ASCII bytes (done through string interpolation) to be stored in FASTA/FASTQ format. One benchmark showed writing records taking about twice as long as with FASTX.jl. Reading should be almost as fast as plain FASTX.jl, including sequence type conversions.
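The memory claim can be made concrete with some back-of-the-envelope arithmetic. The sketch below assumes the 2-bit and 4-bit nucleotide encodings that BioSequences.jl documents for its DNA alphabets, and compares only the sequence payload against one byte per symbol as in FASTA/FASTQ text. The helper `payload_bytes` is hypothetical, and object overhead, descriptions, and qualities are ignored.

```julia
# Bytes needed to hold `n` symbols at `bits` bits per symbol (payload only).
# Hypothetical helper for illustration; not part of any package.
payload_bytes(n::Integer, bits::Integer) = cld(n * bits, 8)

n = 1_000                             # sequence length in nucleotides
ascii_bytes = payload_bytes(n, 8)     # one byte per symbol, as in FASTA/FASTQ text
four_bit_bytes = payload_bytes(n, 4)  # 4-bit DNA encoding (assumed)
two_bit_bytes = payload_bytes(n, 2)   # 2-bit DNA encoding, no ambiguity codes (assumed)

println("ASCII: $ascii_bytes bytes, 4-bit: $four_bit_bytes bytes, 2-bit: $two_bit_bytes bytes")
```

The same packing is what makes writing slower: each packed symbol has to be expanded back to an ASCII byte before it can be written to a file.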

Installation

You can install TypedFASTX from the Julia REPL. Type ] to enter the Pkg REPL mode and run:

(@v1.9) pkg> add TypedFASTX

Example usage

julia> using TypedFASTX

julia> mickey = DNARecord("Mickey Smith", "GATTACA", "quA1!Ty") # quality is optional
DNARecord (FASTQ):
 description: "Mickey Smith"
    sequence: "GATTACA"
     quality: "quA1!Ty"

julia> sequence(mickey)
7nt DNA Sequence:
GATTACA

julia> sequence(String, mickey)
"GATTACA"

julia> error_rate(mickey)
0.14653682578684113

julia> description(mickey)
"Mickey Smith"

julia> identifier(mickey)
"Mickey"

julia> ricky = AARecord("Ricky Smith", "SMITH")
AARecord (FASTA):
 description: "Ricky Smith"
    sequence: "SMITH"

julia> sequence(ricky)
5aa Amino Acid Sequence:
SMITH
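The `error_rate` value in the session above is consistent with the mean per-base error probability under Phred+33 quality encoding. The sketch below reproduces that arithmetic in plain Julia. Treat it as an assumption about what the function computes rather than something taken from the package's source; `phred`, `error_probability`, and `mean_error` are hypothetical helper names.

```julia
# Assumption: qualities are Phred+33 encoded, so Q = Int(char) - 33.
phred(c::Char) = Int(c) - 33

# Probability that a base call with Phred score q is wrong: 10^(-q/10).
error_probability(q::Real) = 10.0^(-q / 10)

# Mean per-base error probability over a quality string.
mean_error(quality::AbstractString) =
    sum(error_probability(phred(c)) for c in quality) / length(quality)

println(mean_error("quA1!Ty"))  # approximately 0.1465
```

The result is dominated by the single `!` character, which encodes a Phred score of 0 and therefore an error probability of 1; the other six bases contribute very little.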

Check out the documentation for more detailed information on how to use the package.
