Glove
Implements GloVe (Global Vectors for Word Representation).
using Pkg
Pkg.add(url="https://github.com/domluna/Glove.jl.git")
See benchmark/perf.jl for a usage example.
Here's the rough idea:
- Take text and make a LookupTable. This is a dictionary that maps words to ids and vice versa. Any preprocessing should be done before this step.
- Use weightedsums to get the weighted co-occurrence sum totals. This returns a CooccurenceDict.
- Convert the CooccurenceDict to a CooccurenceVector. This gives faster indexing when we train the model.
- Initialize a Model and train it with the CooccurenceVector using the adagrad! method.
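The four steps above can be sketched in plain Julia. This is a rough, self-contained illustration and not the package's code: build_lookup, cooccurrence_sums, to_vector and train_adagrad! are hypothetical names, and the 1/distance weighting, the symmetric counting, the xmax/alpha defaults and the accumulator start value are assumptions taken from the usual GloVe formulation. See benchmark/perf.jl for the real API.

```julia
using Random

# NOTE: illustrative sketch only. None of these names are the Glove.jl API.

# Step 1: map each distinct token to an integer id, and back.
function build_lookup(tokens::Vector{String})
    word2id = Dict{String,Int}()
    id2word = String[]
    for t in tokens
        if !haskey(word2id, t)
            push!(id2word, t)
            word2id[t] = length(id2word)
        end
    end
    return word2id, id2word
end

# Step 2: weighted co-occurrence sums. A pair d positions apart adds 1/d
# (assumed weighting), counted in both directions.
function cooccurrence_sums(ids::Vector{Int}, window::Int)
    counts = Dict{Tuple{Int,Int},Float64}()
    for i in eachindex(ids)
        for d in 1:min(window, length(ids) - i)
            a, b = ids[i], ids[i + d]
            counts[(a, b)] = get(counts, (a, b), 0.0) + 1 / d
            counts[(b, a)] = get(counts, (b, a), 0.0) + 1 / d
        end
    end
    return counts
end

# Step 3: flatten the dictionary into a sorted vector of (i, j, x) triples,
# so training iterates over contiguous memory instead of hashing.
to_vector(counts) = sort!([(i, j, x) for ((i, j), x) in counts])

# Step 4: AdaGrad on the GloVe objective
#   sum over (i, j) of f(x) * (w_i . wc_j + b_i + bc_j - log(x))^2
# Returns the loss accumulated during the final epoch.
function train_adagrad!(W, Wc, b, bc, pairs; epochs=50, lr=0.05, xmax=100.0, alpha=0.75)
    # Accumulators start at 1 so the first step divides by 1 (an assumption here).
    gW, gWc = ones(size(W)), ones(size(Wc))
    gb, gbc = ones(length(b)), ones(length(bc))
    loss = 0.0
    for _ in 1:epochs
        loss = 0.0
        for (i, j, x) in pairs
            f = x < xmax ? (x / xmax)^alpha : 1.0
            diff = sum(W[k, i] * Wc[k, j] for k in axes(W, 1)) + b[i] + bc[j] - log(x)
            loss += 0.5 * f * diff^2
            g = f * diff
            for k in axes(W, 1)
                gw, gwc = g * Wc[k, j], g * W[k, i]
                W[k, i] -= lr * gw / sqrt(gW[k, i])
                Wc[k, j] -= lr * gwc / sqrt(gWc[k, j])
                gW[k, i] += gw^2
                gWc[k, j] += gwc^2
            end
            b[i] -= lr * g / sqrt(gb[i])
            bc[j] -= lr * g / sqrt(gbc[j])
            gb[i] += g^2
            gbc[j] += g^2
        end
    end
    return loss
end

# Usage on a toy corpus (preprocessing is just whitespace splitting here).
tokens = String.(split("the cat sat on the mat the dog sat on the rug"))
word2id, id2word = build_lookup(tokens)
ids = [word2id[t] for t in tokens]
counts = cooccurrence_sums(ids, 2)
pairs = to_vector(counts)

Random.seed!(1)
dim, n = 8, length(id2word)
W, Wc = (rand(dim, n) .- 0.5) ./ dim, (rand(dim, n) .- 0.5) ./ dim
b, bc = zeros(n), zeros(n)
first_loss = train_adagrad!(W, Wc, b, bc, pairs; epochs=1)
last_loss = train_adagrad!(W, Wc, b, bc, pairs; epochs=50)
```

The flattening in step 3 is the point of the dictionary-to-vector conversion: the training loop only ever walks the triples in order, so it never needs a hash lookup.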
It's pretty fast at this point. On a single core it's roughly 3x slower than the optimized C version.
TODO
- [ ] More docs.
- [ ] See if precompile(args...) does anything
- [ ] Notebook example (has to have emojis)
- [ ] Multi-threading