A Julia package to fold all local and long-range RNA duplexes.
Started in November 2016.


Project Status: WIP – Initial development is in progress, but there has not yet been a stable, usable release suitable for the public.

EvoDuplexes performs RNA suffix array traversal through chromosomal multiple alignment files (MAF format) to fold and score all local and long-range RNA duplexes while allowing bulges and mismatches. Duplexes are scored with both free-energy models and phylogenetic likelihood.

Full documentation is coming soon. The package requires Julia v0.6.4. In the meantime, comprehensive examples of API usage for the EvoDuplexes package can be found in test/runtests.jl.

The executable can be run using evo-duplex.jl in /bin. Its help output lists all available options:

$ julia evo-duplex.jl -h
usage: evo-duplex.jl --tree TREE [--cons-regions CONS-REGIONS]
                     --gene-regions GENE-REGIONS --maf MAF
                     [--model-load MODEL-LOAD]
                     [--model-data MODEL-DATA] [--model-train]
                     [-o OUTPUT] [--cons-branch CONS-BRANCH]
                     [--sig-ratio SIG-RATIO] [--output-all]
                     [--max-distance MAX-DISTANCE]
                     [--max-duplex-length MAX-DUPLEX-LENGTH]
                     [--max-bulges MAX-BULGES]
                     [--max-mismatches MAX-MISMATCHES]
                     [--max-deltag MAX-DELTAG] [-h]

optional arguments:
  --tree TREE           Phylogenetic tree with neutral branch lengths
                        (in newick format)
  --cons-regions CONS-REGIONS
                        BED file containing conserved regions
  --gene-regions GENE-REGIONS
                        BED file containing gene units to allow
                        long-range folding within
  --maf MAF             Directory with MAF files named by chromosome
                        (chr1.maf.gz...) (default: "../maf")
  --model-load MODEL-LOAD
                        Load pre-trained IsolationForest models,
  --model-data MODEL-DATA
                        Load training data from `.jlt` file, output
                        `.evt.jls` file
  --model-train         Train the IsolationForest models from scratch
                        using input regions, output `.evt.jls` file
  -o, --output OUTPUT   Output prefix (default: "evoduplex")
  --cons-branch CONS-BRANCH
                        When conserved regions are given, use this
                        branch length multiplier (type: Float64,
                        default: 0.33)
  --sig-ratio SIG-RATIO
                        Output the top fraction of outliers (type:
                        Float64, default: 0.05)
  --output-all          Output all folds regardless of prediction
  --max-distance MAX-DISTANCE
                        Set a limit on the maximum distance between a
                        left/right arm of a duplex (type: Int64,
                        default: 2000)
  --max-duplex-length MAX-DUPLEX-LENGTH
                        Set the maximum duplex length that will be
                        accessible from the suffix array (type: Int64,
                        default: 50)
  --max-bulges MAX-BULGES
                        Set the maximum number of bulges to allow for
                        a duplex (type: Int64, default: 3)
  --max-mismatches MAX-MISMATCHES
                        Set the maximum number of mismatches to allow
                        for a duplex (type: Int64, default: 3)
  --max-deltag MAX-DELTAG
                        Set the maximum allowable deltaG value (type:
                        Float64, default: -8.0)
  -h, --help            show this help message and exit
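
As a rough illustration of how the options above combine, the following sketch assembles one possible invocation and prints it as a dry run. Only the flag names come from the help text. The file names (`species.nwk`, `genes.bed`), the output prefix, and the chosen limits are hypothetical placeholders. The help text does not say whether one of the `--model-*` options is mandatory, so including `--model-train` here is an assumption.

```shell
#!/bin/sh
# Dry-run sketch: builds an evo-duplex.jl command line and prints it.
# All paths and values below are hypothetical placeholders, not files shipped with the package.

TREE="species.nwk"      # newick tree with neutral branch lengths (hypothetical name)
GENES="genes.bed"       # BED file of gene units (hypothetical name)
MAF_DIR="../maf"        # directory holding chr1.maf.gz, ... (the documented default)
OUT="evoduplex_run1"    # output prefix (hypothetical)

# Flags are taken from the help text; --model-train is an assumed choice of model option.
CMD="julia evo-duplex.jl --tree $TREE --gene-regions $GENES --maf $MAF_DIR --model-train --max-distance 1000 --max-mismatches 2 -o $OUT"

# Print the command instead of running it.
echo "$CMD"
```

To run the command rather than print it, replace the final `echo "$CMD"` with an unquoted `$CMD` from inside the bin directory. This relies on the paths containing no spaces.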