TypedFASTX.jl is a Julia package for working with FASTA and FASTQ files using typed records. It is largely based on BioJulia's FASTX.jl package, whose records are untyped, i.e. they are agnostic to the kind of data they contain. In addition to the sequence field, the TypedRecord type has a description field and an optional quality field. TypedFASTX.jl aims to improve readability and reduce potential errors when dealing with different types of biological sequences. It also lets you define different methods for specific record types.
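To illustrate what defining methods for specific record types means in practice, here is a minimal, self-contained sketch of dispatch on a typed record. It does not use TypedFASTX.jl: `ToyRecord`, `SeqKind`, `DNA`, `AminoAcid`, `unit`, and `summarize` are hypothetical names invented for this example, and the package's own types may be laid out differently.

```julia
# Toy stand-ins for illustration only; these are NOT the TypedFASTX.jl types.
abstract type SeqKind end
struct DNA <: SeqKind end
struct AminoAcid <: SeqKind end

# The kind of sequence is carried in the type parameter K.
struct ToyRecord{K<:SeqKind}
    description::String
    sequence::String
end

# One method per record type; Julia selects the method from the type parameter.
unit(::ToyRecord{DNA}) = "nt"
unit(::ToyRecord{AminoAcid}) = "aa"

# A generic method that works for any record kind by calling the typed ones.
summarize(r::ToyRecord) = string(length(r.sequence), unit(r), " ", r.description)

dna = ToyRecord{DNA}("Mickey Smith", "GATTACA")
protein = ToyRecord{AminoAcid}("Ricky Smith", "SMITH")

println(summarize(dna))      # 7nt Mickey Smith
println(summarize(protein))  # 5aa Ricky Smith
```

With untyped records, the same distinction would have to be made at run time, for example by inspecting the sequence contents.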
TypedRecords generally take up less memory than FASTX.jl records, since BioSequences.jl's LongSequence type stores sequence information more efficiently. However, this approach might be slightly slower than, for instance, storing each field in its own vector, because of the overhead needed to keep it flexible and user-friendly. TypedFASTX.jl is a little slower than FASTX.jl at writing records to files, as the sequences need to be encoded back to ASCII bytes (which is done through string interpolation) to be stored in FASTA/FASTQ format. One benchmark showed that writing records takes about twice as long as with FASTX.jl. Reading should be almost as fast as using plain FASTX.jl (including sequence type conversions).
You can install TypedFASTX from the Julia REPL. Type ] to enter the Pkg REPL mode and run:
(@v1.9) pkg> add TypedFASTX
julia> using TypedFASTX
julia> mickey = DNARecord("Mickey Smith", "GATTACA", "quA1!Ty") # quality is optional
DNARecord (FASTQ):
description: "Mickey Smith"
sequence: "GATTACA"
quality: "quA1!Ty"
julia> sequence(mickey)
7nt DNA Sequence:
GATTACA
julia> sequence(String, mickey)
"GATTACA"
julia> error_rate(mickey)
0.14653682578684113
julia> description(mickey)
"Mickey Smith"
julia> identifier(mickey)
"Mickey"
julia> ricky = AARecord("Ricky Smith", "SMITH")
AARecord (FASTA):
description: "Ricky Smith"
sequence: "SMITH"
julia> sequence(ricky)
5aa Amino Acid Sequence:
SMITH
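The error_rate value shown for mickey can be reproduced by hand. The sketch below assumes that the quality string uses Phred+33 encoding and that the error rate is the mean of the per-base error probabilities; both are assumptions about how the value is derived, not a description of the package's implementation. The variable names are made up for this example.

```julia
# Assumption: Phred+33 encoding, i.e. score = ASCII code of the character - 33.
quality = "quA1!Ty"
phred_scores = [Int(c) - 33 for c in quality]

# A Phred score Q corresponds to an error probability of 10^(-Q/10).
error_probs = [10.0^(-q / 10) for q in phred_scores]

# Assumption: the record's error rate is the mean per-base error probability.
mean_error = sum(error_probs) / length(error_probs)

println(phred_scores)                  # [80, 84, 32, 16, 0, 51, 88]
println(round(mean_error; digits=4))   # 0.1465
```

The result is dominated by the `!` character, which has a Phred score of 0 and therefore an error probability of 1.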
Check out the documentation for more detailed information on how to use the package.