AssigningSecondaryStructure provides a way to assign loops, helices, and strands to protein backbones using a simplified version of the DSSP algorithm.
The BioStructures.jl and ProteinSecondaryStructures.jl packages both provide interfaces for more sophisticated secondary structure assignment, but both call the DSSP binary from DSSP_jll.jl under the hood. That requires writing each structure to a file first, which adds significant overhead.
The package is registered in the General registry and can be installed from the REPL with `]add AssigningSecondaryStructure`.
The `assign_secondary_structure` function takes a vector of atom coordinate arrays, one per chain, each of size (3, 3, L), where L is the number of residues. The first axis holds the x, y, and z coordinates, the second the atom types (N, CA, C), and the third the residues.
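If your coordinates come in a residue-major layout instead, such as an L×4×3 array indexed as (residue, atom, xyz), similar to the input layout PyDSSP uses, `permutedims` can reorder the axes into the (xyz, atom, residue) layout described above. This is a minimal sketch: the helper name `to_xyz_atom_residue` is hypothetical and not part of the package, and it assumes the first three atoms along the atom axis are N, CA, and C in that order.

```julia
# Hypothetical helper (not part of AssigningSecondaryStructure):
# reorder an array indexed (residue, atom, xyz) into (xyz, atom, residue).
function to_xyz_atom_residue(coords::AbstractArray{<:Real,3})
    size(coords, 3) == 3 || throw(ArgumentError("last axis must hold x, y, z"))
    # Keep only the first three atoms, assumed to be N, CA, C in that order
    backbone = coords[:, 1:3, :]
    # permutedims with (3, 2, 1) swaps the residue axis and the xyz axis
    return permutedims(backbone, (3, 2, 1))
end

L = 5
residue_major = rand(L, 4, 3)   # (residue, atom, xyz), e.g. N, CA, C, O
xyz_major = to_xyz_atom_residue(residue_major)
size(xyz_major)                 # (3, 3, 5)
```

For real coordinates, the result would then be wrapped in a vector with one element per chain, e.g. `assign_secondary_structure([xyz_major])`.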
For example, reading a two-chain structure with BioStructures.jl:

```julia
julia> using BioStructures

julia> coords_vector = map(collectchains(read("test/data/1ZAK.pdb", PDBFormat))) do chain
           # backboneselector yields N, CA, C, O per residue (assumes all four are present);
           # keep only N, CA, C
           reshape(coordarray(chain, backboneselector), 3, 4, :)[:, 1:3, :]
       end;

julia> using AssigningSecondaryStructure

julia> assign_secondary_structure(coords_vector) # 2 chains
2-element Vector{Vector{Int64}}:
 [1, 1, 1, 1, 3, 3, 3, 3, 3, 3 … 2, 2, 2, 2, 2, 2, 2, 1, 1, 1]
 [1, 1, 1, 1, 3, 3, 3, 3, 3, 3 … 2, 2, 2, 2, 2, 2, 2, 1, 1, 1]
```
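The result is one integer vector per chain. To print it in the familiar one-letter DSSP style, the integers can be mapped to characters. The sketch below assumes 1 = loop, 2 = helix, 3 = strand, i.e. PyDSSP's `-`, `H`, `E` ordering shifted to 1-based indexing; that mapping is an assumption here, so check it against the package before relying on it. The names `SS_LETTERS`, `ss_string`, and `ss_counts` are hypothetical.

```julia
# Assumed code-to-letter mapping: 1 => loop ('-'), 2 => helix ('H'), 3 => strand ('E')
const SS_LETTERS = ('-', 'H', 'E')

# Hypothetical helper: turn one chain's integer codes into a string like "--EEEHHHH-"
ss_string(codes::AbstractVector{<:Integer}) = join(SS_LETTERS[c] for c in codes)

# Hypothetical helper: count how many residues fall into each class
function ss_counts(codes::AbstractVector{<:Integer})
    counts = Dict(letter => 0 for letter in SS_LETTERS)
    for c in codes
        counts[SS_LETTERS[c]] += 1
    end
    return counts
end

chain = [1, 1, 3, 3, 3, 2, 2, 2, 2, 1]
ss_string(chain)   # "--EEEHHHH-"
```

Applied to the example above, `ss_string.(assign_secondary_structure(coords_vector))` would give one string per chain.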
This package was originally ported from the PyDSSP package, created by Shintaro Minami. The code has since been rewritten to follow the 1983 paper by Kabsch and Sander more closely, and to be more Julian, readable, and efficient, at the cost of no longer being differentiable like the PyDSSP version. The time complexity is still quadratic in the number of residues, so it may be slow for large proteins. We plan to write a more efficient version using k-d trees.
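At the core of DSSP is the Kabsch and Sander electrostatic energy between the C=O group of one residue i and the N-H group of another residue j, E = 0.084 × 332 × (1/r(ON) + 1/r(CH) − 1/r(OH) − 1/r(CN)) kcal/mol, with a hydrogen bond counted when E < −0.5 kcal/mol. Evaluating this for every residue pair is what makes the computation quadratic. The sketch below shows that all-pairs loop; it is not this package's implementation. It assumes O and H positions are already available as 3×L matrices (the package itself only takes N, CA, and C, so in practice they would have to be estimated from backbone geometry), and the names `hbond_energy` and `hbond_map` are hypothetical.

```julia
using LinearAlgebra: norm

# Kabsch-Sander constants: q1*q2 = 0.42e * 0.20e, times f = 332 (kcal/mol * Å)
const Q1Q2F = 0.084 * 332.0
const HBOND_CUTOFF = -0.5  # kcal/mol

# Energy between the C=O of acceptor residue i and the N-H of donor residue j
function hbond_energy(C, O, N, H)
    r_ON = norm(O - N)
    r_CH = norm(C - H)
    r_OH = norm(O - H)
    r_CN = norm(C - N)
    return Q1Q2F * (1 / r_ON + 1 / r_CH - 1 / r_OH - 1 / r_CN)
end

# All-pairs hydrogen-bond map: O(L^2) energy evaluations for L residues.
# Inputs are 3×L matrices whose columns are per-residue C, O, N, H positions.
function hbond_map(C::AbstractMatrix, O::AbstractMatrix, N::AbstractMatrix, H::AbstractMatrix)
    L = size(C, 2)
    bonded = falses(L, L)  # bonded[i, j]: C=O of residue i accepts from N-H of residue j
    for j in 1:L, i in 1:L
        abs(i - j) < 2 && continue  # this sketch skips a residue and its direct neighbours
        E = hbond_energy(view(C, :, i), view(O, :, i), view(N, :, j), view(H, :, j))
        bonded[i, j] = E < HBOND_CUTOFF
    end
    return bonded
end
```

A spatial index such as a k-d tree could restrict the inner loop to pairs whose atoms lie within a distance cutoff, since distant pairs contribute energies far above the −0.5 kcal/mol threshold.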