Pseudo-likelihood maximization in Julia. If you use this algorithm, you should cite:
- M. Ekeberg, C. Lovkvist, Y. Lan, M. Weigt, E. Aurell, Improved contact prediction in proteins: Using pseudolikelihood to infer Potts models, Phys. Rev. E 87, 012707 (2013)
- M. Ekeberg, T. Hartonen, E. Aurell, Fast pseudolikelihood maximization for direct-coupling analysis of protein structure from many homologous amino-acid sequences, arXiv:1401.4832 (supplementary material)
The present software is a Julia implementation of the above-mentioned papers, written with no reference to the original MATLAB implementation.
The code requires Julia version 1.5 or later.
The package is in Julia's General Registry and can be installed from the REPL by typing ] (to enter Julia's package manager) and then:
(@v1.x) pkg> add PlmDCA
The code internally uses NLopt, which provides a Julia interface to the free/open-source NLopt library.
To load the code, just type:
julia> using PlmDCA
The functions in this package are written to maximize performance. Most
computationally heavy functions can use multiple threads (start Julia with
the -t option or set the JULIA_NUM_THREADS environment variable).
For more information on how to set the number of threads correctly, please
refer to the online Julia Documentation on Multi-Threading.
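As a quick sanity check, the number of threads available to the current session can be inspected from Julia itself. The snippet below is a minimal sketch that uses only the standard library; the thread count of 4 in the hint is an arbitrary example value.

```julia
using Base.Threads: nthreads

# Report how many threads this Julia session was started with.
n = nthreads()
println("Julia is running with $n thread(s)")

if n == 1
    # The value 4 is only an example; `-t auto` lets Julia pick a value.
    println("Hint: restart with `julia -t 4` (or `-t auto`) to enable multithreading")
end
```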
The software provides two main functions, plmdca(filename::String, ...)
and plmdca_sym(filename::String, ...) (respectively the asymmetric and
symmetric coupling versions of the algorithm). Empirically, the asymmetric
version turns out to be both faster and more accurate. Both functions
take as input the name of a (possibly zipped) multiple sequence alignment file.
We also provide another function, mutualinfo(filename::String, ...), to
compute the mutual information score.
There are a number of possible algorithmic strategies for the
optimization problem. As far as local gradient-based optimization is
concerned, the following :symbols select the different methods:
:LD_MMA, :LD_SLSQP, :LD_LBFGS, :LD_TNEWTON_PRECOND,
:LD_TNEWTON_PRECOND_RESTART, :LD_TNEWTON, :LD_VAR2, :LD_VAR1
After some experiments, we found that the best compromise between
accuracy and speed is achieved by the low-storage BFGS method
:LD_LBFGS, which is the default method in the code. The other
methods can be selected by changing the default optional argument
(e.g. method=:LD_SLSQP).
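A typical call might look like the sketch below. The file name is hypothetical, and the assumption that the alignment is a (possibly compressed) FASTA file is ours; only the function names and the method keyword come from the description above. The call is guarded so that nothing runs if the file is missing.

```julia
using PlmDCA

# Hypothetical alignment file; replace with the path to your own data.
msa_file = "my_alignment.fasta.gz"

if isfile(msa_file)
    # Asymmetric version with the default optimizer (:LD_LBFGS).
    res_default = plmdca(msa_file)

    # Same inference with a different optimizer, selected via `method`.
    res_slsqp = plmdca(msa_file, method = :LD_SLSQP)
else
    @warn "Alignment file not found; nothing to run" msa_file
end
```

The symmetric version, plmdca_sym, is called the same way with the file name as its first argument.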
The functions return a value of type PlmOut (say X) with 4 fields:
- X.Jtensor: the coupling matrix J[ri,rj,i,j], a symmetrized q x q x N x N array, where N is the number of residues in the multiple sequence alignment, and q is the alphabet "size" (typically 21 for proteins).
- X.htensor: the external field h[r_i,i], a q x N array.
- X.pslike: the pseudolikelihood.
- X.score: a vector of Tuple{Int,Int,Float64} containing the candidate contacts in descending score order (residue1, residue2, score12).
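Since the score field is a plain vector of tuples, it can be post-processed with ordinary Julia code. The sketch below keeps the best-scoring pairs that are at least a few positions apart along the sequence. The helper top_contacts, its keyword names, and the default separation of 5 are illustrative choices and are not part of the package; the mock data stands in for X.score.

```julia
# Keep the `k` best-scoring pairs whose positions are at least `minsep` apart.
# `score` is assumed to be sorted in descending score order, as described above.
function top_contacts(score::Vector{Tuple{Int,Int,Float64}}; k::Int = 10, minsep::Int = 5)
    filtered = filter(t -> abs(t[1] - t[2]) >= minsep, score)
    return filtered[1:min(k, length(filtered))]
end

# Mock data standing in for `X.score` (values are made up for illustration).
mock_score = [(1, 9, 0.91), (2, 3, 0.85), (4, 20, 0.40), (5, 6, 0.12)]

for (i, j, s) in top_contacts(mock_score; k = 2)
    println("residues $i and $j: score $s")
end
```

With a real result X, the call would be top_contacts(X.score).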
- The minimal Julia version for using this code is 1.3 (package versions <= v0.2.0).
- From package version 0.3.0 on, the minimal Julia requirement is 1.5 (although the oldest version we test is v1.6).