ChemistryFeaturization.jl

Unified graph building and featurizing for Weave.jl, AtomicGraphNets.jl, and (maybe soon) more!
Author: aced-differentiate
Popularity: 16 stars
Last updated: 24 days ago
Started: July 2020

Documentation is starting to be built in the wiki!

This package currently focuses on bulk systems. For organic molecules, MolecularGraph.jl is recommended. PubChem stores many molecular features for the compounds it catalogs, and that data can be accessed via PubChemCrawler.jl.

Features

Graph-building and featurization from CIF files

  • Build graphs (as SimpleWeightedGraphs) from CIF files using PyCall to pymatgen functions
  • Visualization using GraphPlot: see the visualize_graph function in graph_functions.jl. It can make pretty pictures like the ones below, whether the graph is simple or more complicated. The thickness of a connection indicates the weight of that edge in the graph, with higher weights for nearer neighbors.

[Images: graph_EuMgTl2, graph_K4W414O14]

(NB: this animation's syntax is slightly out of date; a new one is to come!)

  • Flexible featurization (currently one-hot style) and decoding: choose which features to include, the level of discretization, etc., and decode feature vectors directly to check their values:

julia> features = Symbol.(["Group", "Row", "Block", "Atomic mass", "Atomic radius", "X"])
6-element Array{Symbol,1}:
 :Group
 :Row
 :Block
 Symbol("Atomic mass")
 Symbol("Atomic radius")
 :X

julia> atom_feature_vecs, featurization = make_feature_vectors(features)
[ Info: 16 elements were dropped so that all features are defined.

julia> decode_feature_vector(atom_feature_vecs["Si"], featurization)
Dict{Symbol,Any} with 6 entries:
  Symbol("Atomic mass")   => (27.1071, 53.2064)
  Symbol("Atomic radius") => (0.955, 1.19)
  :Group                  => 14
  :Row                    => 3
  :Block                  => "p"
  :X                      => (1.684, 2.012)
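To make the idea of one-hot featurization and decoding concrete, here is a minimal, self-contained sketch in plain Julia. It is not the package's implementation: the helper names onehot_bin, decode_bin, and onehot_category are hypothetical, and the equal-width binning is an assumption made for illustration (the package may space its bins differently). Continuous values are mapped to the bin that contains them, and decoding recovers the bounds of that bin, which is why the decoded output above shows ranges rather than exact values.

```julia
# Hypothetical helpers for illustration; NOT part of ChemistryFeaturization.jl.

# One-hot encode a continuous value into `nbins` equal-width bins over [lo, hi].
# (Equal-width spacing is an assumption of this sketch.)
function onehot_bin(value::Real, lo::Real, hi::Real, nbins::Integer)
    lo <= value <= hi || throw(ArgumentError("value $value is outside [$lo, $hi]"))
    edges = range(lo, hi; length = nbins + 1)
    # Index of the last edge <= value; the top edge is assigned to the last bin.
    idx = min(searchsortedlast(edges, value), nbins)
    onehot = zeros(Float32, nbins)
    onehot[idx] = 1
    return onehot
end

# Decode a one-hot vector back to the (lower, upper) bounds of its active bin.
function decode_bin(onehot::AbstractVector, lo::Real, hi::Real)
    idx = findfirst(==(1), onehot)
    idx === nothing && throw(ArgumentError("vector has no active bin"))
    edges = range(lo, hi; length = length(onehot) + 1)
    return (edges[idx], edges[idx + 1])
end

# One-hot encode a categorical value against an explicit list of categories.
function onehot_category(value, categories::AbstractVector)
    idx = findfirst(isequal(value), categories)
    idx === nothing && throw(ArgumentError("unknown category $value"))
    onehot = zeros(Float32, length(categories))
    onehot[idx] = 1
    return onehot
end

# Illustrative values only (not real element data).
x = onehot_bin(2.5, 0.0, 4.0, 4)                   # Float32[0, 0, 1, 0]
bounds = decode_bin(x, 0.0, 4.0)                   # (2.0, 3.0)
block = onehot_category("p", ["s", "p", "d", "f"]) # Float32[0, 1, 0, 0]

# A full feature vector is the concatenation of the per-feature encodings.
feature_vec = vcat(x, block)
```

Using more bins gives a finer discretization at the cost of a longer feature vector. To decode a concatenated vector, the bin count and range of each feature must be recorded alongside it.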

SMILES input

Sean to add...

Requirements

  • Julia 1.4+
  • packages listed in Project.toml
  • PyCall must have access to the pymatgen and rdkit Python packages, which can be installed with Conda.jl: Conda.add("pymatgen"; channel="conda-forge") and Conda.add("rdkit"; channel="conda-forge")
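
One possible setup sequence for the Python dependencies is sketched below. It assumes you want PyCall to use the private Python distribution managed by Conda.jl, which is what setting ENV["PYTHON"] to an empty string before building PyCall requests. If your PyCall is configured against a different Python, install pymatgen and rdkit into that environment instead.

```julia
# Setup sketch; assumes PyCall should use Conda.jl's private Python.
using Pkg
Pkg.add(["Conda", "PyCall"])

# An empty PYTHON setting tells PyCall's build step to use Conda.jl's Python.
ENV["PYTHON"] = ""
Pkg.build("PyCall")

# Install the Python packages named in the requirements above.
using Conda
Conda.add("pymatgen"; channel="conda-forge")
Conda.add("rdkit"; channel="conda-forge")

# After restarting Julia, the imports can be checked with:
#   using PyCall
#   pyimport("pymatgen")
#   pyimport("rdkit")
```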

Future Plans

  • "hybrid" featurizations using features from multiple paradigms if available
  • possibly more input formats, e.g. SELFIES