Load FASTA files that contain DNA strings and process them for downstream tasks
Author: kchu25
Started in May 2022



This package provides routines that load the DNA sequences from a specified FASTA file and transform them into representations useful for downstream machine learning tasks, e.g. one-hot/WYK encoded vectors, k-mer-frequency-preserving shuffled sequences, Markov background estimates, and partitioned datasets for K-fold cross-validation (for FASTA files with labels). As of now, all sequences in the FASTA file must have the same length, and every sequence must be a string over the DNA alphabet {A,C,G,T}.
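To make the input constraints and two of the listed outputs concrete, here is a minimal, language-agnostic sketch in Python. It does not use this package's API: the function names (`parse_fasta`, `validate`, `one_hot`, `background_frequencies`) and the A, C, G, T column order are assumptions made for illustration only, and the background estimate shown is the simplest (zeroth-order) case.

```python
from collections import Counter

ALPHABET = "ACGT"  # assumed column order for the one-hot rows


def parse_fasta(text):
    """Parse FASTA-formatted text into a list of (header, sequence) pairs."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks).upper()))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks).upper()))
    return records


def validate(records):
    """Check the two stated constraints: equal length and alphabet {A,C,G,T}."""
    lengths = {len(seq) for _, seq in records}
    if len(lengths) > 1:
        raise ValueError(f"sequences differ in length: {sorted(lengths)}")
    for header, seq in records:
        bad = set(seq) - set(ALPHABET)
        if bad:
            raise ValueError(f"{header!r} has non-ACGT symbols: {sorted(bad)}")


def one_hot(seq):
    """Encode a sequence as one 4-element row per base, in ALPHABET order."""
    return [[1 if base == symbol else 0 for symbol in ALPHABET] for base in seq]


def background_frequencies(records):
    """Zeroth-order Markov background: overall frequency of each nucleotide."""
    counts = Counter()
    for _, seq in records:
        counts.update(seq)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no nucleotides to estimate a background from")
    return {symbol: counts[symbol] / total for symbol in ALPHABET}


# Hypothetical two-record input used only for this illustration.
example = ">seq1\nACGT\n>seq2\nAACC\n"
records = parse_fasta(example)
validate(records)
encoded = [one_hot(seq) for _, seq in records]
background = background_frequencies(records)
```

Validating before encoding means a file with mixed lengths or ambiguity codes such as N fails early with a clear message, rather than producing ragged or all-zero rows downstream.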


Coming Soon